
This article was downloaded by: [University of Alberta] on 7 January 2009. Access details: [subscription number 713587337]. Publisher: Informa Healthcare. Informa Ltd, registered in England and Wales, registered number 1072954; registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Encyclopedia of Biopharmaceutical Statistics


Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713172960

Adjustment for Covariates


Thomas T. Permutt, U.S. Food and Drug Administration, Rockville, Maryland, U.S.A.
Online Publication Date: 23 April 2003

To cite this section: Permutt, Thomas T. (2003) 'Adjustment for Covariates', Encyclopedia of Biopharmaceutical Statistics, 1:1, 18-21.




Adjustment for Covariates


Thomas Permutt
U.S. Food and Drug Administration, Rockville, Maryland, U.S.A.

INTRODUCTION

The techniques of analysis of covariance are employed in three mathematically similar but conceptually very different kinds of problem. Examples of all three kinds arise in connection with the development of pharmaceutical products. In the first case, a regression model is expected to fit the data well enough to serve as the basis for prediction. In testing the stability of a drug product, for example, the potency may be modeled as a linear function of time, and the possibility of different lines for different batches of the product needs to be allowed for. The purpose of the statistical analysis is to ensure, with a stated degree of confidence, that the potency at a given time will be within given limits.

The second and perhaps widest application of analysis of covariance is in observational studies, such as arise in the postmarketing phase of drug development. It may be desired, for example, to study the association of some outcome with exposure to a drug. It is necessary to adjust for covariates that may be systematically associated both with the outcome and with the exposure and so induce a spurious relationship between the outcome and the exposure. In such studies the unexplained variation is typically high, so the model is not expected to fit the individual observations well. It must, however, include all the important potential confounders and must have at least approximately the right functional form, if a causal relationship, or the absence of one, between the outcome and the exposure is to be inferred.

The third kind of application of analysis of covariance, although the first historically,[1] is to randomized, controlled experiments such as clinical trials of the efficacy of new drugs. In such experiments, adjustment for covariates is optional in a sense, because the validity of unadjusted comparisons is ensured by randomization. Adjustments properly planned and executed, however, can reduce the probabilities of inferential errors and so help to control the size, cost, and time of clinical trials.

The modeling problem is straightforward, well covered in textbooks, and, strictly speaking, not a matter of adjustment. The observational problem, in contrast, is essentially intractable from the standpoint of formal statistical inference; but heuristic methods have had wide application and discussion. We focus here on the adjustment for covariates in the experimental setting. This problem has had relatively little attention in the literature, partly because early writings[1] are largely complete, correct, and still sufficient. Unfortunately, the more recent literature on modeling and on observational studies has been misapplied to the experimental problem. Either a well-fitting model is thought to be required, as in the first problem, or the analysis is supposed to be heuristic, as in the second. In fact, a rigorous theory of analysis of covariance in controlled experiments can be developed, even in the absence of a good model for the covariate effects.

ADJUSTING FOR BASELINE VALUES

Consider the case of a randomized trial of two treatments, with a continuous measure of outcome (Y) that is also measured at baseline (X). If the populations are normal or the samples are large, the treatments might be compared by a two-sample t-test on the difference between group means of Y. Alternatively, the change from baseline, Y - X, might be analyzed in the same way. The difference between groups in the mean of Y and the difference between groups in the mean of Y - X have the same expectation, because the expected difference between groups in X is zero. We therefore have two unbiased estimators of the same parameter. They have different variances, according to how well the baseline predicts the outcome. If the variances (within treatment groups) of baseline and outcome are the same and the correlation is r, then the standard errors are in the ratio (2 - 2r)^(1/2). The adjusted estimator is better if r > 0.5.
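The standard-error comparison just stated is simple enough to tabulate directly. The following short sketch (my own illustration in Python; the function name is an assumption, not from the article) evaluates the ratio (2 - 2r)^(1/2) for a few correlations:

```python
import math

def se_ratio_change_vs_raw(r: float) -> float:
    """Standard error of the change-from-baseline comparison relative to
    the raw-outcome comparison, when baseline and outcome have equal
    within-group variance and correlation r: sqrt(2 - 2r)."""
    return math.sqrt(2.0 - 2.0 * r)

for r in (0.3, 0.5, 0.7):
    print(f"r = {r}: change/raw SE ratio = {se_ratio_change_vs_raw(r):.3f}")
# The change score is the less variable estimator exactly when the
# ratio falls below 1, i.e. when r > 0.5, matching the text.
```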
Encyclopedia of Biopharmaceutical Statistics, DOI: 10.1081/E-EBS-120007378
Copyright © 2003 by Marcel Dekker, Inc. All rights reserved.

The opinions expressed are those of the author and not necessarily of the U.S. Food and Drug Administration.


Of course, there is no need to choose. The average of the two estimators has standard error proportional to (1.25 - r)^(1/2), which is less than either of the two whenever 0.25 < r < 0.75. This average can be written as the difference between treatment groups in the mean of Y - 0.5X. So Y - 0.5X is a less variable measure of outcome than either the raw score Y or the difference from baseline Y - X, whenever the correlation is between 0.25 and 0.75. This can, but need not, be viewed as fitting parallel straight lines with slope 0.5 to the two groups and measuring the vertical distance between them.

Naturally, there is no need to choose 0.5 either. The difference in group means of any statistic of the form Y - bX can be used to estimate the treatment effect. The smallest variance, and so the most sensitive test, is achieved when b happens to coincide with the least-squares common slope, but the variance does not increase steeply as b moves away from this optimal value. Thus, even a very rough a priori guess for b is likely to perform better than either of the special cases b = 0 (no adjustment) and b = 1 (subtract the baseline).

Finally, there is no need to guess. The least-squares slope, calculated from the data, can be used for b, without any consequences beyond the loss of a degree of freedom for error. Asymptotic theory for the resulting adjusted estimator of the treatment effect was given by Robinson,[2] and an exact, small-sample theory by Tukey.[3]

In general, then, the best way to adjust for a baseline value is neither to ignore it nor to subtract it, but to subtract a fraction of it. The fraction will be estimated from the data, simultaneously with the treatment effect, by analysis of covariance. There is no need to check the assumption that the outcome is linearly related to the baseline value, because this assumption plays no role in the analysis. If it did, it would taint not only the analysis of covariance: after all, the unadjusted analysis also assumes a linear relationship, with slope 0, and the change-from-baseline analysis assumes a slope of 1.
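The progression above — ignore the baseline, subtract it, or subtract an estimated fraction of it — can be checked numerically. The following is a small simulation sketch (my own illustration in Python with NumPy; the simulated model and all names are assumptions, not from the article), computing the treatment effect all three ways, with the ANCOVA slope obtained by least squares simultaneously with the effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200        # patients per arm
r = 0.6        # baseline-outcome correlation
effect = 1.0   # true treatment effect

# Simulate a two-arm randomized trial: baseline X, treatment indicator Z,
# outcome Y with unit within-group variance and corr(X, Y) = r.
x = rng.normal(size=2 * n)
z = np.repeat([0.0, 1.0], n)
y = effect * z + r * x + rng.normal(scale=np.sqrt(1.0 - r**2), size=2 * n)

# b = 0: unadjusted difference in group means of Y.
unadjusted = y[z == 1].mean() - y[z == 0].mean()

# b = 1: difference in group means of the change from baseline, Y - X.
change = (y - x)[z == 1].mean() - (y - x)[z == 0].mean()

# b estimated: regress Y on intercept, treatment, and baseline; the
# common slope and the treatment effect are fitted simultaneously.
design = np.column_stack([np.ones_like(y), z, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
ancova = coef[1]

# All three estimates should be near the true effect of 1.0; over
# repeated trials the ANCOVA estimate has the smallest variance.
print(unadjusted, change, ancova)
```

Note that the design matrix carries its own intercept column; the fitted `coef[2]` is the estimated common slope b, which should land close to the simulated correlation of 0.6 here.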


OTHER COVARIATES

Any single, prespecified covariate can be adjusted for in much the same way as a baseline measurement of the outcome variable. That is, the mean of a linear function Y - bX may be compared across treatment groups, the coefficient b being estimated, simultaneously with the treatment effect, by least squares. Again, the much-tested assumption of a linear relationship between Y and X is superfluous. Two other critical assumptions are sometimes neglected, however.

First, the covariate must be unaffected by treatment. While it is possible to give an interpretation of analysis of covariance adjusting for intermediate causes, this interpretation is not often useful in clinical trials. Any covariate measured before randomization is acceptable. With care, some covariates measured later may be assumed to be unaffected by treatment: the weather, for example, in a study of seasonal allergies. It may be noted that, while analysis of covariance is not usually appropriate for variables in the causal pathway, some of the advantages of analysis of covariance are shared by instrumental-variables techniques[4] that are appropriate.

Second, the covariate is assumed to be prespecified. Model-searching procedures are unavoidable in observational studies, for there are typically many potential confounding variables whose effects must be considered and eliminated if necessary. Alarmingly little is known about the statistical properties of such procedures, however, and what is known is not generally encouraging. It is usual, although unjustifiable, to ignore the searching process in reporting the results, presenting simply the chosen model, its estimates, and its optimistic estimates of variability. Randomized trials are radically different from observational studies in this respect. There is no confounding, because a covariate cannot be systematically associated with treatment if it is not affected by treatment and if treatment is assigned at random. The purpose of analysis of covariance in randomized studies is to reduce the random variability of the estimated treatment effects by eliminating some of what would otherwise be unexplained variance in the observations. This difference has implications for the choice of covariates, which will be discussed in the next section.

CHOICE OF COVARIATES

Whereas a confounder in an observational study is a variable correlated both with the outcome and with the treatment, a useful covariate in a randomized trial is a variable correlated just with the outcome. The greater the absolute correlation, the greater the reduction in residual variance and so also in the standard error of the estimated treatment effect. This benefit is realized whether the treatment groups happen to be balanced with respect to the covariate or not. It is neither necessary nor useful, therefore, to choose covariates retrospectively, on the basis of imbalance.[5] It is accordingly safe to prespecify, in the protocol for a randomized trial, a covariate, or a few covariates, unaffected by treatment but likely to be correlated with the outcome. Analysis of covariance, adjusting for these covariates, may then be carried out and relied on, without any justification after the fact. The probability of Type I error will be controlled by significance testing, and the probability of Type II error will be less than if covariates were not used.

The improvement, however, depends on the correlations (and partial correlations) between the covariates and the outcome, and these may not be perfectly known ahead of time. It might therefore seem advantageous to determine the correlations for some candidate covariates with the data in view, and select a subset that explains a high proportion of the variance of the outcome. With care, it is possible to specify an unambiguous algorithm for selecting a model and to control the probability of Type I error.[3] It is not known, however, whether such procedures have any advantage with respect to Type II error over simply prespecifying the model. In practice, in critical efficacy trials the relevant covariates will often be apparent in advance; and when they are not, it may not be any easier or better to specify a set of candidates and an algorithm for choosing among them than to specify a single model.

The properties of models with large numbers of covariates are not well understood. Various rules of thumb relating the number of variables to the sample size have been given, but none has any compelling theoretical justification. Furthermore, searches in large sets of potential models probably share some of the defects of models with many covariates, even if the chosen model has only a few covariates.

NONLINEAR MODELS

The word linear in the context of the analysis of covariance may be understood in two senses. In many applications, the model is linear in the covariates. However, a model with polynomial or other nonlinear covariate effects is still linear in the coefficients, and the least-squares estimators are consequently linear functions of the outcome measurements, so the theory of the general linear model applies. In contrast, logistic, proportional-hazards, and Poisson regression models all involve covariates in a more fundamentally nonlinear way.

Nonlinear covariate effects can be added to an analysis of covariance without difficulty. The most common examples are the 1/0 variables used to represent categorical covariates, but polynomial, logarithmic, exponential, and other functions may sometimes be useful. It is important to bear in mind, however, that in randomized trials the purpose of the covariate model is to reduce unexplained variance. Thus, nonlinear terms should be introduced when they are expected to explain substantial variance in the outcome, and not simply because it is feared that the assumption of a linear relationship between the outcome and the covariate may be violated.

Conversely, trials with outcomes that are successes or failures, survival times, or small numbers of events are analyzed by methods that are nonlinear in the second sense. Recent theoretical developments (the generalized linear model) and computer programs have tended to emphasize the analogies between these methods and the linear model. Some of the same principles undoubtedly apply when such methods are used to analyze randomized trials. For example, if a model selection procedure is used, it is vital to understand the statistical properties of the procedure as a whole, rather than simply to report the nominal standard errors and p-values of the model that happens to be chosen. On the other hand, the similarity in form may conceal important differences in mathematical structure between linear and nonlinear models, and the linear results must not be casually assumed to have nonlinear analogs. It is not clear, for example, that the robustness of linear models against misspecification in randomized trials carries over to all the nonlinear cases.

INTERACTION

If the difference in mean outcome between treatments changes as a covariate changes, there is said to be a treatment-by-covariate interaction. In a drug trial, such a finding would have important implications. In the extreme case, the treatment effect might change direction as the covariate changed. That is, a drug that was beneficial in one subset of patients, identified by the covariate, would be harmful in a different subset. Clearly such a drug would be effective. Equally clearly, for such a drug to be useful, the populations in which it was beneficial and harmful would need to be characterized. In less extreme cases, where the magnitude but not the direction of the treatment effect changes, considerations of risk and benefit might also make it very desirable to estimate the effect in different subgroups.

The question of interaction often arises in connection with analysis of covariance, but it really has little to do with adjustment for covariates. Everything in the preceding paragraph is equally true whether the covariate in question is adjusted for, ignored, or even unmeasured. Furthermore, if the treatment main effect is to be estimated, it is still better to estimate it by analysis of covariance, even without an interaction term, than by the unadjusted difference in means. As with the assumption of linearity, the analysis of covariance is not invalidated by violation of the assumption of parallelism, for this assumption plays no role in the analysis. Also, as with linearity, if this assumption were crucial, its failure would taint the unadjusted analysis as well, which also assumes parallel regressions of the outcome on the covariate, but forces them to have slope 0.

The possibility of interaction should be taken into account whenever it appears at all probable that different groups may respond differently. The reason for this is practical and concerns the interpretation and application of the results of a successful trial. However, the presence of interaction, or, what is more common, the inability to rule interaction in or out with confidence, should not be seen as invalidating analysis of covariance nor, especially, as a reason to prefer unadjusted analysis.
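The point about parallelism can be made concrete with a small least-squares sketch (my own illustration, not from the article; the simulated model is an assumption). Adding a treatment-by-covariate product column to the design matrix lets the two slopes differ, while the model without it still yields a sensible estimate of the average treatment effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
z = np.repeat([0.0, 1.0], n)   # treatment indicator
x = rng.normal(size=2 * n)     # prespecified covariate, mean zero

# Simulated truth includes an interaction: the effect of treatment
# is 1 + 0.5*x, so the regressions are NOT parallel.
y = (1.0 + 0.5 * x) * z + 0.8 * x + rng.normal(size=2 * n)

# ANCOVA without an interaction term (parallel lines assumed) ...
d0 = np.column_stack([np.ones_like(y), z, x])
b0, *_ = np.linalg.lstsq(d0, y, rcond=None)

# ... and with a treatment-by-covariate interaction column z*x.
d1 = np.column_stack([np.ones_like(y), z, x, z * x])
b1, *_ = np.linalg.lstsq(d1, y, rcond=None)

# With randomization and a mean-zero covariate, the no-interaction
# model's treatment coefficient still targets the average effect
# (about 1.0); the interaction model additionally estimates how the
# effect changes with the covariate (about 0.5).
print(b0[1], b1[1], b1[3])
```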

REFERENCES
1. Fisher, R.A. Statistical Methods for Research Workers, 14th Ed.; Oliver and Boyd: Edinburgh, 1970; 272-286.
2. Robinson, J. J. R. Stat. Soc., Ser. B 1973, 35, 368-376.
3. Tukey, J.W. Control. Clin. Trials 1993, 14, 266-285.
4. Angrist, J.D.; Imbens, G.W.; Rubin, D.B. J. Am. Stat. Assoc. 1996, 91, 444-455.
5. Permutt, T. Stat. Med. 1990, 9, 1455-1462.
