Tan FES. Confounding in (non-) randomized comparison studies. OA Epidemiology 2013 Dec 30;1(3):21.

Competing interests: none declared. Conflict of interests: none declared. All authors contributed to conception and design, manuscript preparation, read and approved the final manuscript. All authors abide by the Association for Medical Ethics (AME) ethical rules of disclosure.

Licensee OA Publishing London 2013. Creative Commons Attribution License (CC-BY)

Section: Methods Development

Confounding in (non-) randomized comparison studies

FES Tan Department of Methodology and Statistics University of Maastricht

Correspondence: Frans E.S. Tan, Department of Methodology & Statistics, University of Maastricht, P.O.Box 616, 6200 MD Maastricht, The Netherlands, tel:+3143882278,
fax: +31433 618388, e-mail: frans.tan@maastrichtuniversity.nl

Abstract

The term confounding is commonly used in epidemiologic research. However, its use sometimes leads to confusion, because the literature defines confounding according to at least two different underlying concepts. We review these two concepts, taking as a reference the concept and formalization of confounding based on the notion of comparability. This comparability concept matches the common understanding of confounding among epidemiologists. We argue that it is not always appropriate to define confounding according to the comparability concept and then check for its existence using the non-collapsibility approach, as is often done in standard textbooks. Implications for regression models are discussed, and adjustment based on propensity scores is briefly mentioned as a promising method to adjust for confounding as defined according to the comparability concept.

1. Introduction

Researchers often rely on existing statistical techniques such as graphical representations of the data, procedures to estimate population parameters, and calculations for statistical testing. However, the results of such analyses should be interpreted with great care. In particular, if the causal relationship between variables is of interest, then the correctness of a causal interpretation depends on how much is known about the underlying causal system. This paper considers the validation problem of how to assess the causal effect of an explanatory factor on an outcome. It is well known that the association between these factors is a combination of the direct effect, indirect effects and joint effects, the latter indicating the amount of association attributable to a common cause [1, 2]. In general, factors that are responsible for the joint effects should be identified and proper adjustments should be made in the analysis to determine the causal relationship. To locate the various sources of association, information about the underlying causal system is very helpful [3]. Extraneous factors that are responsible for the joint effects are closely related to the concept of confounding, which plays a central role in the analysis of causal effects in epidemiologic research. The common understanding of confounding among epidemiologists is the lack of comparability between an unexposed/control group and an exposed/treated group. The general idea is that there is no confounding if the two groups are fully comparable with respect to all extraneous factors. Otherwise the observed group effect could be (partly) due to differences in the distribution of some extraneous factors. This common intuition of confounding more or less matches the adequacy of the control group principle, a term introduced, and whose underlying concept was formalized, by Wickramaratne and Holford (W&H) [4].
This comparability concept will be explained in more depth in section two. Many textbooks do not make a distinction between this comparability concept and the so-called non-collapsibility principle, where the crude association measure is compared with the stratum specific one. In section three it will be argued that the
collapsibility principle is not always a proper way to check for, and adjust for, confounding as defined according to the comparability concept. A promising alternative that has received increasing attention over the last decade is the propensity score method. It aims to directly balance the comparison groups with respect to all potential confounders, provided that confounding is defined according to the comparability concept. Finally, in section four some conclusions are drawn.

2. Adequacy of the control group

The comparability concept has been formalized by W&H [4], and a non-technical version of it is elaborated here. Consider a comparison study (either randomized or non-randomized) in which we would like to ascertain the discrepancy between the results of two treatment conditions that can be attributed to the treatment difference only. Ideally, we would compare a treated group with the same group in the hypothetical situation in which it had not been given the treatment. We will denote the first group as the exposed group, and the second one as the hypothetical control group. Let us consider an outcome R (quantitative or qualitative) to be used to establish the treatment effect, i.e. to compare the exposed group with the hypothetical group. If R is quantitative, then one could be interested in the expected difference $E(R_e - R_h)$, where $R_e$ is the outcome in the exposed group and $R_h$ is the outcome in the hypothetical group. The expectation is the average value across samples or simply the average value at population level. If R is qualitative (say, binary (0/1) without loss of generality), then one could be interested in the log(odds ratio) or rate ratio. The discrepancy between these expected values (probabilities if R is binary) is exclusively accounted for by the treatment difference, because, apart from the treatment difference, the two groups are fully comparable to each other. The discrepancy between the two expected outcomes is called the expected causal effect [5]. However, in a real life situation, we do not have such a hypothetical control group. Instead a comparison group is observed that hopefully resembles the
hypothetical control group. Denote this group as the unexposed group with outcome $R_u$; this group serves as a proxy for the hypothetical group. The variable of primary interest that identifies the exposed group and the (proxy) unexposed group will be denoted by F. If the expected outcome in the unexposed group differs from that of the hypothetical group, i.e. $E(R_u) \neq E(R_h)$, then there is confounding. The (extraneous) factors that are responsible for confounding are called confounders. It should be emphasized that no reference to any underlying statistical model has been made. Hence the existence of confounding according to this definition is model independent. Furthermore, if there is (no) confounding, it will remain so no matter what association measure between the explanatory variable F and the outcome R is used. This concept of confounding has also been described in several papers [e.g. 6 - 9]. W&H call this concept the adequacy of the control group; for convenience we will call it the comparability concept. W&H also made the distinction between population and sample confounding, but in this paper we will only consider confounding at population level, because in the statistical literature bias (e.g. due to confounding) is basically defined at population level. Some authors [9] have even classified confounding at sample level as spurious confounding. Be that as it may, Morabia [10] mentioned that the epidemiological concept of confounding has had a convoluted history, and the above-mentioned concept is considered to lead to the modern understanding of confounding. However, it should be noted that the comparability concept dates back to the work of Neyman [11] in the 1920s and Rubin [5, 12] in the 1970s. This concept is also known as the potential outcome or counterfactual approach.
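The definition above can be made concrete with a minimal sketch. All outcome values below are invented for illustration (think of systolic blood pressure), and the variable names are assumptions, not notation from W&H. In practice the hypothetical control outcomes are never observed; they are written out here only to show how the observed contrast splits into the expected causal effect and a confounding term.

```python
from statistics import mean

# Hypothetical outcomes, invented for illustration only.
r_e = [150, 155, 160, 165]  # exposed group, as observed
r_h = [140, 145, 150, 155]  # same subjects had they not been treated (never observed)
r_u = [130, 135, 140, 145]  # actual unexposed comparison group (the proxy)

# Expected causal effect: E(R_e) - E(R_h)
causal_effect = mean(r_e) - mean(r_h)

# What a study can actually estimate: E(R_e) - E(R_u)
observed_effect = mean(r_e) - mean(r_u)

# Confounding in the comparability sense: E(R_h) differs from E(R_u)
confounding = mean(r_h) - mean(r_u)

# The observed contrast is the causal effect plus the confounding term.
decomposition_holds = observed_effect == causal_effect + confounding
```

With these made-up numbers the unexposed group is not an adequate proxy, so the observed contrast overstates the causal effect by exactly the confounding term.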
In what follows, we first reflect on the conditions and implications of no confounding before discussing the problem of how to detect and adjust for confounders (according to the comparability concept). The general consensus is that the expected causal effect of a specified treatment can be determined in the absence of confounding. One way to accomplish this is to design a study in which no confounders are involved. It can be shown [4] that if a potential confounder C (that may be correlated with the outcome R (using
the common Pearson product-moment correlation)) is uncorrelated to the variable of primary interest F, i.e. $\rho(C, F) = 0$, then C is not a confounder. There will be no confounding if none of the potential confounders is correlated with F. This sufficient condition for no confounding has several implications, two of which are mentioned briefly.

2.1. Balanced designs

Consider a study in which the distribution of a potential confounder AGE (C) among men is the same as that among women. Suppose that SEX (F) (with levels men/women) is the variable of primary interest. Table 1 shows the number of patients in each of the cells of the 2 x 2 cross-classified table.

==Insert table 1==

The balanced design ensures that the sufficient condition is satisfied, i.e. $\rho(\mathrm{AGE}, \mathrm{SEX}) = 0$. Hence AGE is not a confounder according to the comparability concept. To clarify the meaning of this result, consider for the sake of argument that the outcome R is systolic blood pressure (SBP). Suppose that the design is not balanced, e.g. there is an over-representation of elderly men, who have a higher average SBP level. If we do not account for this AGE imbalance, then any observed difference in SBP between men and women could be partly due to the difference in SBP between elderly and young people, i.e. the sex effect is confounded by the age effect. To prevent this possible confounding effect, it is sufficient to have a balanced design.
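As a minimal sketch, the zero correlation in the balanced design can be verified directly from the cell counts of Table 1. The 0/1 coding of SEX and AGE and the unbalanced counts used for contrast are assumptions introduced here for illustration only.

```python
from math import sqrt

def pearson_from_counts(counts):
    """Pearson correlation of two coded variables given {(x, y): frequency}."""
    n = sum(counts.values())
    mean_x = sum(x * k for (x, _), k in counts.items()) / n
    mean_y = sum(y * k for (_, y), k in counts.items()) / n
    cov = sum((x - mean_x) * (y - mean_y) * k for (x, y), k in counts.items()) / n
    var_x = sum((x - mean_x) ** 2 * k for (x, _), k in counts.items()) / n
    var_y = sum((y - mean_y) ** 2 * k for (_, y), k in counts.items()) / n
    return cov / sqrt(var_x * var_y)

# Assumed coding: sex 1 = man, 0 = female; age 0 = young, 1 = old.
# Keys are (sex, age); values are the cell counts of Table 1.
balanced = {(1, 0): 100, (1, 1): 100, (0, 0): 100, (0, 1): 100}

# Hypothetical unbalanced design with an over-representation of old men.
unbalanced = {(1, 0): 60, (1, 1): 140, (0, 0): 100, (0, 1): 100}

rho_balanced = pearson_from_counts(balanced)
rho_unbalanced = pearson_from_counts(unbalanced)
```

In the balanced table the correlation is zero, so the sufficient condition holds; in the invented unbalanced table it is positive, so AGE can no longer be ruled out as a confounder on this ground.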

2.2. Randomized trials

The predominant argument in favor of randomization is the expectation (over samples) of comparability between the two comparison groups with respect to all characteristics other than the difference in intervention status. Despite the fact that for a particular sample, the (observed) unexposed group may differ from the hypothetical
control group with respect to several potential confounders, it is known that the correlation between the primary variable F and all potential confounders is expected to be zero, i.e. over all randomizations the comparison groups will be balanced [13]. Hence, when randomization is performed properly, there will be no confounding at population level. Nevertheless, for a particular randomization the comparison groups may be unbalanced [7, 8]. This is the reason why many researchers still make adjustments for this imbalance at sample level. However, this paper does not deal with the question of whether or not one should make adjustments in randomized trials. As already indicated in the introduction, it will be argued in this paper that adjusting for confounding according to the collapsibility concept does not necessarily solve the problem of confounding according to the comparability concept. For the sake of discussion, and to fix a reference definition, confounding in the rest of the paper is understood according to the comparability concept.
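The difference between balance in expectation and imbalance in a particular randomization can be illustrated with a small simulation. This is only a sketch: the ages, the group size, the number of repetitions and the seed are arbitrary assumptions, not values from the paper.

```python
import random
from statistics import mean

rng = random.Random(2013)      # fixed seed so the sketch is reproducible
ages = list(range(30, 70))     # 40 hypothetical subjects aged 30..69

def age_imbalance(rng, ages):
    """Randomize subjects into two equal groups; return the difference in mean age."""
    shuffled = ages[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return mean(shuffled[:half]) - mean(shuffled[half:])

# Repeat the randomization many times.
imbalances = [age_imbalance(rng, ages) for _ in range(2000)]

# Averaged over randomizations the groups are balanced ...
average_imbalance = mean(imbalances)

# ... although a single randomization can be clearly unbalanced.
largest_imbalance = max(abs(d) for d in imbalances)
```

The average imbalance is close to zero, which corresponds to no confounding at population level, while the largest single imbalance shows why adjustment at sample level is still debated.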

3. Conditions for confounding and methods to detect confounding

So far we can summarize that sufficient conditions for a variable C not being a confounder are:

1. C is uncorrelated to the variable F of primary interest, i.e. $\rho(C, F) = 0$, or
2. C is uncorrelated to the outcome R, conditional on F, i.e. $\rho(C, R \mid F) = 0$.

These sufficient conditions are also mentioned by other authors [4, 7]. It should be mentioned that some authors [7, 14] claimed that condition (1) should be checked at sample level, leading to the concept of confounding at sample level [4]. If the sufficient conditions for no confounding are not fulfilled, then some extra conditions are still necessary for C to be a confounder. It is possible that the observed unexposed group differs from the hypothetical control group with respect to the distribution of baseline characteristics, while the expected outcome R still remains unchanged because the effects of these factors balance each other out. This is why the conditions as stated in
the literature are only necessary but not sufficient for confounding. Basically, the necessary conditions for C being a confounder are the negation of the sufficient conditions for no confounding. On top of this, it is necessary for confounding that C is not a mediator, i.e. C is not itself affected by the variable of primary interest on the causal path to the outcome. Unfortunately, a direct comparison between the unexposed and the hypothetical control group is not feasible, because the latter is not observed. Hence alternative adjustment methods have to be used to deal with confounding in general. In many textbooks [15-17], regression methods are used for detecting confounders. These methods follow the non-collapsibility concept, which, as we will discuss below, does not fully match the comparability concept.

3.1. Non-collapsibility

To explain the principle of non-collapsibility, consider a study about the effect of surgical treatments on symptoms of malnutrition one week after the operation. A comparison is made between men and women (SEX). The covariate AGE is considered to be a potential confounder. Consider further the following regression models expressed at population level (assume for the moment that there are no other potential confounders except AGE).
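Linear:

$$E(R) = \beta_0 + \beta_1\,\mathrm{SEX} \tag{1}$$

$$E(R) = \beta_0^{*} + \beta_1^{*}\,\mathrm{SEX} + \beta_2^{*}\,\mathrm{AGE} \tag{2}$$

Logistic:

$$\mathrm{logit}\,P(R = 1) = \beta_0 + \beta_1\,\mathrm{SEX} \tag{3}$$

$$\mathrm{logit}\,P(R = 1) = \beta_0^{*} + \beta_1^{*}\,\mathrm{SEX} + \beta_2^{*}\,\mathrm{AGE} \tag{4}$$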

If the stratum-specific sex effect $\beta_1^{*}$ (model 2 if R is quantitative and model 4 if R is binary) is equal to the marginal sex effect $\beta_1$ (model 1 if R is quantitative and model 3 if R is binary), then and only then can the joint distribution of SEX and AGE be collapsed (marginalized over the distribution of AGE) to determine the sex effect. (Non-) collapsibility is often associated with the absence (presence) of confounding [17]. However, this is not entirely true.

Tables 2a, 2b and 2c show the results of the (fictitious) study.

==Insert table 2a, 2b, 2c about here==

Tables 1 and 2a are obtained by marginalizing the conditional tables 2b and 2c, summing over the outcome and the age frequencies, respectively. Table 2a shows that 50% of the patients have symptoms of malnutrition one week after an operation. A comparison is made between men and women (variable F of primary interest) and, because the AGE distribution (table 1) among men is the same as that among women, AGE is not a confounder. The effect of SEX can be estimated using model (3). Table 3a shows the results of the analysis. It turns out that the estimated marginal sex effect is equal to $\hat{\beta}_1 = 0.605$ with standard error $\mathrm{s.e.}(\hat{\beta}_1) = 0.202$. The corresponding marginal odds ratio is equal to $\widehat{OR} = 1.83$, with a 95% confidence interval (1.43, 2.23). However, in contrast to the linear regression situation, the estimate of the stratum-specific sex effect in model 4 is equal to $\hat{\beta}_1^{*} = 1.558$ with standard error $\mathrm{s.e.}(\hat{\beta}_1^{*}) = 0.369$. The corresponding estimated stratum-specific odds ratio is equal to $\widehat{OR} = 4.75$, with a 95% confidence interval (4.2, 9.26). Apparently, the odds ratio can change substantially when calculated within the age strata separately, despite the fact that SEX is uncorrelated to AGE, i.e. the cross-classification table (2b, 2c) cannot be collapsed into table 2a without distorting the odds ratio. This phenomenon of non-collapsibility while C is not a confounder is due to the non-linear relationship between the outcome R (malnutrition) and F (SEX). Which model (3 or 4) should be used depends on the research question. If one is not interested in the odds ratio within each age stratum separately, then model 3 can be chosen and the estimated marginal odds ratio of $\widehat{OR} = 1.83$ is then an unbiased estimate of the true causal effect. On the other hand, if one is more interested in the AGE-specific sex effect, then AGE should be included in the model. Note that the estimated marginal sex effect is closer to the null (odds ratio closer to one), though estimated more precisely, than the stratum-specific sex effect [18]. This statistical artifact is independent of the data or subject content and will hold whenever $\rho(C, F) = 0$.
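The odds ratios quoted above can be recomputed from the cell counts of the fictitious study. The following is a minimal sketch: the counts are those of Tables 2a to 2c, where the count of old men with malnutrition is taken as 20, the value implied by the marginal table, and the function names are illustrative choices. The standard error uses Woolf's formula for the log odds ratio of a 2 x 2 table, which for the marginal table coincides with what model (3) estimates.

```python
from math import log, sqrt

# Malnutrition counts as (no, yes) per sex.
marginal = {"man": (85, 115), "female": (115, 85)}  # Table 2a
young = {"man": (5, 95), "female": (20, 80)}        # age code 0
old = {"man": (80, 20), "female": (95, 5)}          # age code 1

def odds_ratio(table):
    """Odds of malnutrition in men divided by the odds in women."""
    no_m, yes_m = table["man"]
    no_f, yes_f = table["female"]
    return (yes_m * no_f) / (no_m * yes_f)

def woolf_se(table):
    """Standard error of the log odds ratio (Woolf): sqrt of the summed reciprocal counts."""
    return sqrt(sum(1 / cell for row in table.values() for cell in row))

or_marginal = odds_ratio(marginal)
or_young = odds_ratio(young)
or_old = odds_ratio(old)

log_or_marginal = log(or_marginal)   # marginal sex effect on the logit scale
se_log_or_marginal = woolf_se(marginal)
```

Both age strata give the same odds ratio, yet it differs from the marginal one, even though AGE and SEX are uncorrelated in this design.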

Non-collapsibility does not always imply confounding and may be just a reflection of the discrepancy between stratum-specific and population-average effects [14] for non-linear relationships. Moreover, not only can non-collapsibility occur in logistic regression while there is no confounding, as shown above, but collapsibility can also occur when there is confounding [14]. In contrast to confounding according to the comparability concept, non-collapsibility depends on the chosen statistical model and association measure. The concept of collapsibility was actually introduced by Bishop, Fienberg and Holland [19] in the context of multivariate log-linear modeling (multidimensional cross-classified table analysis). When analyzing the relationship between variables (e.g. the relationship between F and R), they discussed procedures and developed rules for variable reduction to simplify the analysis. However, they made no connection to confounding. An approach that has gained popularity among researchers in the last decade concerns methods that attempt to balance the comparison groups using propensity scores, introduced by Rosenbaum and Rubin [20]. Several authors have given a good introduction to this topic (see e.g. [21] and the references therein). It should be emphasized that adjustments based on propensity scores are only appropriate for dealing with confounding according to the comparability concept and not necessarily according to the non-collapsibility concept.
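That non-collapsibility depends on the chosen association measure can be checked on the same fictitious malnutrition data. The sketch below uses the risk difference instead of the odds ratio; the choice of the risk difference as the contrasting measure is an illustration added here, not an analysis from the paper, and the count of 20 old men with malnutrition is again the value implied by the marginal table.

```python
# Malnutrition counts as (no, yes) per sex, from Tables 2a to 2c.
marginal = {"man": (85, 115), "female": (115, 85)}
young = {"man": (5, 95), "female": (20, 80)}   # age code 0
old = {"man": (80, 20), "female": (95, 5)}     # age code 1

def risk(no, yes):
    """Proportion of patients with malnutrition."""
    return yes / (no + yes)

def risk_difference(table):
    """Risk of malnutrition in men minus the risk in women."""
    return risk(*table["man"]) - risk(*table["female"])

rd_marginal = risk_difference(marginal)
rd_young = risk_difference(young)
rd_old = risk_difference(old)
```

On the risk difference scale the marginal and both stratum-specific effects coincide, so these data are collapsible for the risk difference while they are not for the odds ratio.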

3.3. Propensity scores method

A propensity score is the probability of a subject being assigned to the exposed group given all relevant baseline potential confounders. The idea is that for all subjects with the same propensity score, the distribution of the covariates will be similar between the comparison groups. Thus the confounding effects will be removed, provided that there are no unmeasured potential confounders. In practice, there may be many
confounders involved that should be adjusted for. The specification of the regression model should ideally be guided by subject content knowledge, biological plausibility and the data [22]. Information about the causal network [3] could help the researcher in the analysis of causal effects. The propensity scores can be estimated by means of a logistic model that specifies the probability of being assigned to the exposed group as a function of all potential confounders. Austin [21], for example, has elaborated some methods based on propensity scores, one of which is to specify a regression model with the estimated propensity score as a covariate. It can then be shown that the effect of F on R will not be confounded according to the comparability concept. Note that there are other adjustment methods based on propensity scores; a good overview is given by Austin [21].
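The balancing idea can be sketched with a single binary confounder, in which case the propensity score is simply the proportion exposed within each confounder level and no logistic model is needed. All counts below are invented for illustration. The adjustment shown is inverse probability weighting, one of the other propensity-score-based adjustments, and not the covariate-adjustment approach described above.

```python
# Hypothetical counts: (confounder C, exposure F) -> (number of subjects, number of events)
data = {
    (0, 1): (20, 4),  (0, 0): (80, 8),
    (1, 1): (80, 48), (1, 0): (20, 10),
}
total = sum(n for n, _ in data.values())

def propensity(c):
    """Estimated probability of being exposed given C = c."""
    exposed = data[(c, 1)][0]
    return exposed / (exposed + data[(c, 0)][0])

def crude_risk(f):
    """Unadjusted event risk in exposure group f."""
    subjects = sum(data[(c, f)][0] for c in (0, 1))
    events = sum(data[(c, f)][1] for c in (0, 1))
    return events / subjects

def weighted_risk(f):
    """Event risk in group f, each subject weighted by 1 / P(own exposure | C)."""
    weighted_events = 0.0
    for c in (0, 1):
        p = propensity(c) if f == 1 else 1 - propensity(c)
        weighted_events += data[(c, f)][1] / p
    return weighted_events / total

crude_rd = crude_risk(1) - crude_risk(0)        # distorted by the imbalance in C
ipw_rd = weighted_risk(1) - weighted_risk(0)    # comparison groups balanced on C
```

In these invented data the risk difference is 0.10 within both levels of C. The weighted contrast recovers that value, whereas the crude contrast is inflated because exposed subjects mostly have C = 1. As stated in the text, this works only if there are no unmeasured potential confounders.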

4. Conclusion

Given the common intuition of confounding in the analysis of causal effects, it makes sense to formalize confounding according to the comparability concept. If confounding at population level is of main interest, then randomization ensures that there will be no confounding and the causal effect can be estimated by the observed (unadjusted marginal) treatment effect. This will be the case for quantitative as well as qualitative outcomes. However, for non-randomized studies, the concept of non-collapsibility can be used to search for confounders only in the context of linear regression modeling, provided that C is not a mediator. In this case, collapsibility and comparability are equivalent concepts. For logistic regression modeling, collapsibility does not necessarily imply non-confounding and non-collapsibility does not necessarily imply confounding in the sense of comparability. Standard textbooks would do well to emphasize that the concept of collapsibility is different from that of comparability and that the common understanding of what confounding is follows the concept of comparability.
An important implication is that confounding according to the comparability concept is model independent and does not depend on the association measure between the comparison groups and the outcome. Non-collapsibility, however, does depend on the statistical model and on the chosen association measure. It should be emphasized that no conclusion is drawn about which concept is better, although the comparability concept has a more intuitive interpretation and is therefore more appealing. Consequently, when using non-linear models, scientists should not define confounding according to the comparability concept and then check and adjust for it using the non-collapsibility principle. Under the comparability concept, balancing the comparison groups based on propensity scores seems a promising method and should preferably be included in the curriculum for statistics education.

LITERATURE

1. Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, vol 8, 96-146.
2. Saris, W., Stronkhorst, H. (1984). Causal modeling in nonexperimental research. Amsterdam: Sociometric research foundation.
3. Greenland, S., Brumback, B. (2002). An overview of relations among causal modeling methods. International epidemiological association; 31: 1030-1037.
4. Wickramaratne, P.J., Holford, T.R. (1987). Confounding in epidemiologic studies: the adequacy of the control group as a measure of confounding. Biometrics, 43: 751-765.
5. Rubin, D.B. (1974). Estimating causal effects of treatments in randomized and nonrandomized treatments. Journal of Educational Psychology, 66: 688-701.
6. Anderson, S., Auquier, A., Hauck, W.W., Oakes, D., Vandaele, W., Weisberg, H.I. (1980). Statistical methods for comparative studies. New York: Wiley.
7. Miettinen, O.S., Cook, E.F. (1981). Confounding: Essence and detection. American Journal of Epidemiology, 96: 383-388.
8. Greenland, S., Robins, J.M. (2009). Identifiability, exchangeability and confounding revisited. Epidemiologic Perspective and Innovations. doi: 10.1186/1742-5573-6-4.
9. Schall, R., Zucchini, W. (1990). Model selection and the estimation of odds ratios in the presence of extraneous factors. Statistics in Medicine, 9: 1131-1141.
10. Morabia, A. (2011). History of the modern epidemiological concept of confounding. Journal of Epidemiology and Community Health, 65: 297-300.
11. Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 1990; 5: 465-480.
12. Rubin, D.B. (1978). Bayesian inference for causal effects: the role of randomization. Annals of Statistics, 6: 34-58.
13. Senn, S. (1994). Testing for baseline balance in clinical trials. Statistics in Medicine, 13: 1715-1726.
14. Greenland, S., Robins, J.M., Pearl, J. (1999). Confounding and collapsibility in causal inference. Statistical Science, 14(1): 29-48.
15. Breslow, N.E., Day, N.E. (1987). Statistical methods in cancer research, volume II. The design and analysis of cohort studies. Lyon: IARC scientific publications.
16. Hosmer, D.W., Lemeshow, S. (1989). Applied logistic regression. New York: Wiley.
17. Kleinbaum, D.G., Kupper, L.L., Morgenstern, H. (1982). Epidemiologic research. Belmont, California: Lifetime learning publications.
18. Robinson, L.D., Jewell, N.P. (1991). Some surprising results about covariate adjustment in logistic regression models. International Statistical Review, 58(2): 227-240.
19. Bishop, Y.M.N., Fienberg, S.E., Holland, P.W. (1975). Discrete multivariate analysis; theory and practice. Cambridge, Mass.: MIT Press.
20. Rosenbaum, P.R., Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70: 41-55.
21. Austin, P.C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46: 399-424.
22. McNamee, R. (2004). Regression modeling and other methods to control confounding. Journal of occupational and environmental medicine, 62: 500-506.

Table 1. A balanced design

                  Age
Sex         young    old    Total
man           100    100      200
female        100    100      200

Table 2. Malnutrition study

Table 2a. Marginal table of sex versus malnutrition

              Malnutrition
Sex            no    yes    Total
man            85    115      200
female        115     85      200

Table 2b. Conditional table of sex versus malnutrition amongst young men and women (Age: young, code 0)

              Malnutrition
Sex            no    yes    Total
man             5     95      100
female         20     80      100

Table 2c. Conditional table of sex versus malnutrition amongst old men and women (Age: old, code 1)

              Malnutrition
Sex            no    yes    Total
man            80     20      100
female         95      5      100

OR(RF) = 1.83, OR(RF|C = 0) = 4.75, OR(RF|C = 1) = 4.75

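Table 3a. Output of the logistic regression analysis of the malnutrition study, model (3): $\mathrm{logit}\,P(R = 1) = \beta_0 + \beta_1\,\mathrm{SEX}$

            Estimate    s.e.     Wald       OR      95% C.I. OR*
Sex            0.605    0.202      8.932    1.830   (1.43, 2.23)
constant      -0.302    0.143      4.466

Table 3b. Output of the logistic regression analysis of the malnutrition study, model (4): $\mathrm{logit}\,P(R = 1) = \beta_0^{*} + \beta_1^{*}\,\mathrm{SEX} + \beta_2^{*}\,\mathrm{AGE}$

            Estimate    s.e.     Wald       OR      95% C.I. OR
Sex            1.558    0.369     17.784    4.750   (4.2, 9.26)
Age           -4.331    0.369    137.388
constant       1.386    0.235     34.723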
