(AS08)
EPM304 Advanced Statistical Methods in Epidemiology
This document contains a copy of the study material located within the computer
assisted learning (CAL) session.
If you have any questions regarding this document or your course, please contact
DLsupport via DLsupport@lshtm.ac.uk.
Important note: this document does not replace the CAL material found on your
module CDROM. When studying this session, please ensure you work through the
CDROM material first. This document can then be used for revision purposes to
refer back to specific sessions.
These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of
the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale
or further copying.
London School of Hygiene & Tropical Medicine September 2013 v2.0
Objectives
By the end of this session you will be able to:
This session should take you between 1h 15m and 2 hours to complete.
AS01
SM07, SM08, SM09
SM11, AS05
log(y) = α + Σ βi xi
       = α + β1x1 + β2x2 + ...
where:
y is a measure of disease occurrence,
xi are the explanatory variables (or categories within them),
and α and βi (i = 1, ..., I) are the regression parameters, which have to be estimated from the data.
In each of the following models, what is the disease outcome measure, and in what
form is the disease outcome measure modelled:
a) a Poisson regression model?
b) a logistic regression model?
Interaction: Button: clouds picture (pop up box appears and text and an interaction
appear on bottom RHS):
For Poisson regression the outcome measure y is the rate, but remember it is log(y)
that is modelled.
log(y) = α + β1x1 + β2x2
If we want the model equation on the original scale of y (so rates or odds), we need
to exponentiate both sides of the equation. Using the laws of logarithms, we have
y = exp(α) × exp(β1x1) × exp(β2x2)
Then the rates (or odds) for the four combinations of exposure are:

        E1-                 E1+
E2-     exp(α)              exp(α) × exp(β1)
E2+     exp(α) × exp(β2)    exp(α) × exp(β1) × exp(β2)

so that the effects of the two exposures multiply together.
Note that exp(β1) is a rate ratio for the effect of E1 if the outcome y is a rate, and it
is an odds ratio for the effect of E1 if the outcome y is an odds. Similarly for exp(β2).
Pop-up for each cell of the table:
Interaction: Button: "exp(α)" (pop-up box appears):
In this case there is no exposure to E1 or E2, so x1 = 0 and x2 = 0 and the rate y is
given by
exp(α) × exp(β1 × 0) × exp(β2 × 0) = exp(α) × exp(0) × exp(0) = exp(α)
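The multiplicative structure of the four exposure combinations can be checked numerically. The following is a minimal sketch in Python. The parameter values (a baseline of 2, with ratios of 1.5 for E1 and 3 for E2) are hypothetical and chosen only for illustration, and the function name is not part of the course material.

```python
import math

def multiplicative_outcome(alpha, beta1, beta2, x1, x2):
    """Rate (or odds) under the model log(y) = alpha + beta1*x1 + beta2*x2."""
    return math.exp(alpha) * math.exp(beta1 * x1) * math.exp(beta2 * x2)

# Hypothetical parameter values, for illustration only.
alpha, beta1, beta2 = math.log(2.0), math.log(1.5), math.log(3.0)

# One entry per exposure combination, keyed by (x1, x2).
table = {
    (x1, x2): multiplicative_outcome(alpha, beta1, beta2, x1, x2)
    for x1 in (0, 1)
    for x2 in (0, 1)
}

baseline = table[(0, 0)]
ratio_e1 = table[(1, 0)] / baseline    # equals exp(beta1)
ratio_e2 = table[(0, 1)] / baseline    # equals exp(beta2)
ratio_both = table[(1, 1)] / baseline  # equals exp(beta1) * exp(beta2)

print(f"Ratio for E1 alone:   {ratio_e1:.2f}")
print(f"Ratio for E2 alone:   {ratio_e2:.2f}")
print(f"Ratio for E1 and E2:  {ratio_both:.2f}")
```

With these illustrative values the joint ratio is the product of the two separate ratios, which is what "multiplicative" means here.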
The simplest alternative to the multiplicative model is the additive model. The
general form of an additive model is:
y = α + Σ βi xi
Notice that this model is additive on the original scale of y, hence the term
additive model. The outcome y is modelled on its original scale.
This type of model is suitable for modelling rates and risks.
Interaction: Tabs: Rates:
In cohort studies, we have person-time data and y is the rate.
rate = α + Σ βi xi
risk = α + Σ βi xi
This is the additive risk model.
The parameters βi represent risk differences, whereas in logistic regression, where
we have a multiplicative model, they represent a log(odds ratio).
Note: This model involves neither logs nor odds, although it is often termed "logistic
regression with additive risks". A better term would be binomial regression. Note
also that this model is for risks (not odds), so it may not be used for case-control
studies.
y = α + β1x1 + β2x2
Under this model, the rates (or risks) for the four combinations of exposure are
shown opposite. You can click the button below the table to compare this with the
multiplicative format.
Notice how the effect due to E1 (+β1) and the effect due to E2 (+β2) add to give
the combined effect (+β1 + β2).
Notice also how the effect of E1, as measured by β1, is a rate (or risk) difference.
The effect of E2, as measured by β2, is likewise a rate (or risk) difference.
In contrast, with a multiplicative model β1 and β2 were each (i) a log(odds ratio), if
the log(odds of outcome) was modelled, or (ii) a log(rate ratio), if the log(rate of
outcome) was modelled.

Additive model
        E1-        E1+
E2-     α          α + β1
E2+     α + β2     α + β1 + β2
Since the outcome is modelled on the original scale, there is no need to exponentiate
the coefficients.
Interaction: Button: α (pop up box appears):
In this case, there is no exposure to E1 nor to E2. So x1 = 0 and
x2 = 0, and the rate y is given by:
y = α + β1x1 + β2x2
  = α + (β1 × 0) + (β2 × 0)
  = α
Interaction: Button: α + β1 (pop up box appears):
In this case, there is exposure to E1 but not to E2. So x1 = 1 and
x2 = 0, and the rate y is given by:
y = α + β1x1 + β2x2
  = α + (β1 × 1) + (β2 × 0)
  = α + β1
Interaction: Button: α + β2 (pop up box appears):
In this case, there is exposure to E2 but not to E1. So x1 = 0 and
x2 = 1, and the rate y is given by:
y = α + β1x1 + β2x2
  = α + (β1 × 0) + (β2 × 1)
  = α + β2
Interaction: Button:
50 to 64 years,
65 to 74 years.
The outcome, y, is the ischemic heart disease (IHD) mortality rate (per 1000 person-years).
In a multiplicative Poisson model, log(y) = log(rate) is modelled.
In an additive rate model, y = rate is modelled.
IHD mortality rates per 1000 person-years (deaths/person-years)

Smoking (x1)     50-64 years = 0       65-74 years = 1
No/ex = 0        2.76 (240/86863)      8.78 (243/27692)
Yes = 1          5.60 (376/67131)      12.32 (293/23786)
Use the button to swap to a table of rate ratios for the effect of age group,
separately for smokers and non-smokers, and rate differences for the effect of age,
separately for smokers and non-smokers. On the basis of these rate ratios and rate
differences, do you think the effects of smoking and age combine multiplicatively or
additively?
Interaction: Button: Swap (table on centre bottom changes to the following):
Rate ratios and differences for effect of age by smoking

                             Non/ex-smoker    Smoker
Rate ratio for age:          3.18             2.20
Rate difference for age:     6.02             6.72
Interaction: Button: The rate differences for the effect of age are similar (6.02
compared with 6.72) for non/ex-smokers and smokers. The rate ratios for age are
not so similar for non/ex-smokers and smokers (3.18 for non/ex-smokers compared
to 2.20 for smokers). This suggests that the effects of smoking and age might
combine additively, but perhaps not multiplicatively, in which case an interaction
term might be needed if we fit a multiplicative model.
We could form a similar table to summarise the effect of smoking, separately for age
group 50-64 years old, and age group 65-74 years old. If we did this, we would find
that the rate ratio for smoking in individuals aged 50-64 years old is 2.03 and the
rate ratio for smoking in individuals aged 65-74 years old is 1.40. And the rate
difference between smokers and non/ex-smokers is 2.84 for individuals aged 50-64
years old, compared to 3.54 for individuals aged 65-74 years old.
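These stratum-specific comparisons can be reproduced directly from the four observed rates. The following is a minimal sketch in Python, assuming the rates per 1000 person-years quoted in the data table; the helper names are illustrative only.

```python
# Observed IHD mortality rates per 1000 person-years.
# Keys are (smoking, age): smoking = 1 for current smokers, age = 1 for 65-74 years.
rates = {(0, 0): 2.76, (0, 1): 8.78, (1, 0): 5.60, (1, 1): 12.32}

def effect_of_age(smoking):
    """Rate ratio and rate difference for age within one smoking group."""
    younger, older = rates[(smoking, 0)], rates[(smoking, 1)]
    return round(older / younger, 2), round(older - younger, 2)

def effect_of_smoking(age):
    """Rate ratio and rate difference for smoking within one age group."""
    non_smoker, smoker = rates[(0, age)], rates[(1, age)]
    return round(smoker / non_smoker, 2), round(smoker - non_smoker, 2)

for smoking, label in ((0, "non/ex-smokers"), (1, "smokers")):
    rr, rd = effect_of_age(smoking)
    print(f"Age effect in {label}: RR = {rr:.2f}, RD = {rd:.2f}")

for age, label in ((0, "50-64 years"), (1, "65-74 years")):
    rr, rd = effect_of_smoking(age)
    print(f"Smoking effect at {label}: RR = {rr:.2f}, RD = {rd:.2f}")
```

Similar rate differences across strata point towards additive effects, while similar rate ratios would point towards multiplicative effects.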
Multiplicative Poisson model with smoking and age, and the interaction between them

              Coefficient   Standard Error   z        P > |z|   95% Confidence Interval
Smoking        0.7066       0.0826            8.55    < 0.001    0.545 to  0.869
Age            1.1556       0.0910           12.70    < 0.001    0.977 to  1.334
Smoking.Age   -0.3674       0.1198           -3.07      0.002   -0.602 to -0.133
Constant       1.0163       0.0645           15.74    < 0.001    0.890 to  1.143

Log likelihood = -1225.701
Multiplicative Poisson model with smoking and age, without interaction

              Coefficient   Standard Error   z        P > |z|   95% Confidence Interval
Smoking        0.5345       0.0597            8.95    < 0.001    0.417 to  0.652
Age            0.9426       0.0591           15.95    < 0.001    0.827 to  1.058
Constant       1.1178       0.0527           21.21    < 0.001    1.014 to  1.221

Log likelihood = -1230.411
In the model without interaction, what is the rate ratio for smoking, to 2 decimal
places?
RR (Smoking) =
In the model with interaction, what is the rate ratio for the additional joint effect of
smoking and age, to 2 decimal places?
RR (Smoking.Age) =
Interaction: Calculation: RR (Smoking) =____:
Correct Response 1.71 (pop up box appears):
Correct
That's right, the rate ratio is given as the exponential of the coefficient for smoking:
RR = exp(0.5345) = 1.71
Incorrect Response (pop up box appears):
Sorry, the rate ratio should be calculated as the exponential of the coefficient for
smoking (because the coefficient is the log rate ratio):
RR = exp(0.5345) = 1.71
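The same exponentiation applies to the confidence limits, which are also on the log scale. A minimal sketch in Python, assuming the smoking coefficient and 95% limits quoted in the table for the model without interaction; the variable names are illustrative.

```python
import math

# Smoking row of the Poisson model without interaction (log rate ratio scale).
coefficient = 0.5345
ci_lower, ci_upper = 0.417, 0.652

# Exponentiate to move from log rate ratios to rate ratios.
rate_ratio = math.exp(coefficient)
rr_lower, rr_upper = math.exp(ci_lower), math.exp(ci_upper)

print(f"RR (Smoking) = {rate_ratio:.2f} (95% CI {rr_lower:.2f} to {rr_upper:.2f})")
```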
log(rate) = 1.12 + 0.53x1 + 0.94x2
rate = exp(1.12 + 0.53x1 + 0.94x2)
     = exp(1.12) × exp(0.53x1) × exp(0.94x2)
From this we can obtain the fitted mortality rates (i.e. those predicted under this
model), for each of the four combinations of smoking and age. The fitted mortality
rates are given in the table opposite.
Interaction: Scroll over log(rate):
Log rate
Interaction: Scroll over 1.12:
Baseline log rate (i.e. log rate in non/ex-smokers who are aged 50-64 years
old)
Interaction: Scroll over 0.53 x1:
Log RR for smoking
Interaction: Scroll over 0.94 x2:
Log RR for age
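The fitted rates for all four groups follow from this equation. A minimal sketch in Python, assuming the coefficients rounded to 2 decimal places as in the model equation (1.12, 0.53, 0.94); the function name is illustrative.

```python
import math

def fitted_rate(x1, x2):
    """Fitted IHD mortality rate per 1000 person-years, model without interaction.

    x1 = 1 for current smokers, x2 = 1 for age 65-74 years.
    """
    return math.exp(1.12 + 0.53 * x1 + 0.94 * x2)

labels = {
    (0, 0): "non/ex-smoker, 50-64 years",
    (0, 1): "non/ex-smoker, 65-74 years",
    (1, 0): "smoker, 50-64 years",
    (1, 1): "smoker, 65-74 years",
}

fitted = {key: round(fitted_rate(*key), 2) for key in labels}

for key, label in labels.items():
    print(f"{label}: {fitted[key]:.2f}")
```

Because the coefficients are rounded, these values can differ slightly from rates computed with the full-precision estimates.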
Interaction: Tabs: Interaction:
Multiplicative model with an interaction:
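log(rate) = 1.02 + 0.71x1 + 1.16x2 - 0.37x1x2
Fitted mortality rate for current smokers aged 65-74 years: 13.33 under the model without interaction, 12.43 under the model with interaction.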
Additive rate model with smoking and age, and the interaction between them

              Coefficient   Standard Error   z        P > |z|   95% Confidence Interval
Smoking        2.8380       0.3395            8.36    < 0.001    2.173 to  3.503
Age            6.0120       0.5905           10.18    < 0.001    4.855 to  7.160
Smoking.Age    0.7053       0.9747            0.72      0.469   -1.205 to  2.616
Constant       2.7630       0.1783           15.49    < 0.001    2.413 to  3.113
Additive rate model with smoking and age, without interaction

              Coefficient   Standard Error   z        P > |z|   95% Confidence Interval
Smoking        2.9246       0.3189            9.17    < 0.001    2.300 to  3.550
Age            6.2789       0.4712           13.33    < 0.001    5.355 to  7.202
Constant       2.7394       0.1746           15.69    < 0.001    2.397 to  3.082

Log likelihood = -1225.964
In the model without interaction, what is the rate difference for smoking, to 2
decimal places?
RD (Smoking) =
In the model with interaction, what is the rate difference for the additional joint
effect of smoking and age, to 2 decimal places?
RD (Smoking.Age) =
Interaction: Calculation: RD (Smoking) =____:
Correct Response 2.92 (pop up box appears):
Correct
That's right, the rate difference is the coefficient for smoking in the table:
RD = 2.92
Incorrect Response (pop up box appears):
Sorry, the rate difference is the coefficient for smoking in the table since the
outcome is modelled on the original scale:
RD = 2.92
Interaction: Calculation: RD (Smoking.Age) =____:
Correct Response 0.71 (pop up box appears):
Correct
Yes, the rate difference is the coefficient for smoking.age in the table:
RD = 0.71
Incorrect Response (pop up box appears):
Sorry, the rate difference is the coefficient for smoking.age in the table since the
outcome is modelled on the original scale:
RD = 0.71
Fitted mortality rate for a current smoker aged 65-74 years: 2.74 + 2.92 + 6.28 = 11.94
Model:
rate = 2.74 + 2.92 smoking + 6.28 age
Interaction: Button: Explanation (pop up box appears):
Explanation
In this example, no interaction term has been added to the additive model, so the
effects of smoking and age are assumed to combine additively.
The fitted mortality rate for an individual aged 50-64 years old who is a non/ex-smoker is 2.74.
The fitted mortality rate for an individual aged 65-74 years old who is a smoker is
obtained by adding together the baseline rate, the effect of smoking and the effect
of age: 2.74 + 2.92 + 6.28 = 11.94.
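The fitted rates for all four groups, and how far they sit from the observed rates, can be obtained in the same way. A minimal sketch in Python, assuming the rounded coefficients quoted above (2.74, 2.92, 6.28) and the observed rates from the data table; the function and variable names are illustrative.

```python
def additive_rate(smoking, age):
    """Fitted rate per 1000 person-years under the additive model without interaction."""
    return 2.74 + 2.92 * smoking + 6.28 * age

# Observed rates per 1000 person-years, keyed by (smoking, age).
observed = {(0, 0): 2.76, (0, 1): 8.78, (1, 0): 5.60, (1, 1): 12.32}

fitted = {key: round(additive_rate(*key), 2) for key in observed}
residual = {key: round(observed[key] - fitted[key], 2) for key in observed}

for key in observed:
    print(f"smoking={key[0]}, age={key[1]}: "
          f"fitted {fitted[key]:.2f}, observed {observed[key]:.2f}, "
          f"difference {residual[key]:+.2f}")
```

No exponentiation is needed because the outcome is modelled on its original scale.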
Fitted mortality rates estimated from the additive rate model with interaction:

Smoking (x1)             Age group (x2)
                         50-64 years (=0)        65-74 years (=1)
None/ex-smoker (=0)      2.76                    2.76 + 6.01 = 8.77
Current smoker (=1)      2.76 + 2.84 = 5.60      2.76 + 2.84 + 6.01 + 0.71 = 12.32

Model:
rate = 2.76 + 2.84 smoking + 6.01 age + 0.71 smoking.age
Interaction: Button: Explanation (pop up box appears):
In this model the effects of smoking and age do not combine additively. The fitted
mortality rate for a non/ex-smoker aged 50-64 years old is 2.76. The estimated
mortality rate for an individual aged 65-74 years old who is a smoker is: 2.76 + 2.84
+ 6.01 + 0.71 = 12.32.
In our example, both rate differences are greater than zero so we can interpret the
interaction coefficient in the following way.
An interaction coefficient close to zero suggests additive effects.
A large negative interaction coefficient suggests less than additive effects.
A large positive interaction coefficient suggests greater than additive effects
(multiplicative, perhaps).
In the additive model with interaction opposite, the interaction term is small (0.71)
and there is no evidence that it differs from zero (p = 0.47), suggesting additive effects.
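The z statistic and P-value for the interaction term can be recovered from its coefficient and standard error. A minimal sketch in Python of a two-sided Wald test, assuming the values reported for Smoking.Age in the additive model (coefficient 0.7053, standard error 0.9747) and a standard normal reference distribution; the function name is illustrative.

```python
import math

def wald_test(coefficient, standard_error):
    """Return the Wald z statistic and its two-sided P-value (standard normal)."""
    z = coefficient / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = wald_test(0.7053, 0.9747)
print(f"z = {z:.2f}, P = {p_value:.3f}")
```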
If the effects of the exposures combine multiplicatively then they cannot combine
additively, and vice versa. However, in reality it may not be possible to
distinguish between the models.
No interaction on the multiplicative scale means there is an interaction on the
additive scale, although it might not be reflected by P-values from the hypothesis
tests. Hence, when reporting interaction results, it is important to specify the
scale e.g. there is (no) heterogeneity of rate ratios or there is (no)
heterogeneity of rate differences.
Interaction tests have low power and should be interpreted cautiously: large P-values
suggest the data are compatible with no interaction, but may simply mean there was not
enough power to detect an interaction.
Conducting lots of tests for interaction may lead to evidence for one or more
interactions just by chance.
The fitted values of the IHD mortality rates for the above additive model with an
interaction (2.76, 8.77, 5.60, 12.32) are exactly equal to the observed data. The
fitted values for the multiplicative model with an interaction (2.77, 8.85, 5.64,
12.43) are different to the observed data, but only because of rounding error
(because in the calculations we worked with all values to only 2 decimal places).
The fitted values from a model will always be exactly the same as the observed
data when there are as many parameters in the model as there are data points
(the models with interactions each have 4 parameters, which are estimated from
the 4 observed rates in the data).
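The rounding explanation can be checked by computing the fitted rates of the multiplicative model with interaction twice: once with the coefficients to 4 decimal places and once rounded to 2. A minimal sketch in Python. The interaction coefficient is entered as negative (-0.3674), which is an assumption here: it is the sign needed to reproduce the fitted value of 12.43 quoted above. The function and variable names are illustrative.

```python
import math

# Coefficients of the multiplicative Poisson model with interaction (log scale).
# The negative sign on the interaction term is an assumption (see the text above).
coefficients = {"constant": 1.0163, "smoking": 0.7066, "age": 1.1556, "interaction": -0.3674}

def fitted_rates(coefs, decimals=None):
    """Fitted rates per 1000 person-years, optionally rounding coefficients first."""
    if decimals is not None:
        coefs = {name: round(value, decimals) for name, value in coefs.items()}
    rates = {}
    for smoking in (0, 1):
        for age in (0, 1):
            log_rate = (coefs["constant"]
                        + coefs["smoking"] * smoking
                        + coefs["age"] * age
                        + coefs["interaction"] * smoking * age)
            rates[(smoking, age)] = round(math.exp(log_rate), 2)
    return rates

precise = fitted_rates(coefficients)              # coefficients to 4 decimal places
rounded = fitted_rates(coefficients, decimals=2)  # coefficients to 2 decimal places

for key in precise:
    print(f"smoking={key[0]}, age={key[1]}: "
          f"precise {precise[key]:.2f}, rounded {rounded[key]:.2f}")
```

With the more precise coefficients the fitted rates agree with the observed rates to 2 decimal places, as expected for a model with as many parameters as data points.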
points). With more covariates (anything from 5 or more), such models are
unnecessarily complicated. We would like to have a model that is "as simple as
possible, but no simpler".
In this example, we would either fit
a multiplicative model with an interaction (since there was evidence against the
null hypothesis of no interaction), or,
an additive model without an interaction (as the data were compatible with the
null hypothesis of no interaction).
The additive model is preferable, based on statistical considerations alone, because it
describes the data with fewer parameters (3 rather than 4).
make them easier to work with, and this is one of the reasons why they are much
more commonly used. Additive models tend to have convergence problems and
therefore they generally take longer to fit (sometimes they fail to converge).
Also, the Wald-based confidence intervals and P-values from additive models
can be misleading.
more stages of the process. Data on the effects of a particular exposure on risk are
used to make inferences about the stage or stages at which that exposure has its
effect. For the joint effect of two exposures, the general conclusion is that if they act
at the same stage their effects can be expected to combine additively, while if they
act at different stages their effects can be expected to combine multiplicatively.
Section 6: Summary
Multiplicative Models:
Ratios are useful for studying aetiology
Many effects combine multiplicatively
Models usually converge
Models easily fitted in standard software
Additive Models:
Differences are useful public health measures
Some effects known to combine additively
Models sometimes do not converge