Stats - Repeated Measures

HPH 560
Prof. Tia Palermo

Program in Public Health
Lecture 9: Repeated Measures I
October 24, 2013
Lecture outline
Repeated measures
Lab
Articles discussion
Repeated measures and longitudinal data
Examples of data collection

Echocardiograms measured at 1, 3, and 6 days
after admission to hospital for a brain
hemorrhage
Groups of patients in a urinary incontinence
study assembled from different treatment
centers
Susceptibility to tuberculosis measured in
family members
What are some characteristics/issues with these types of studies?
The outcomes are _______________
across observations
Predictor variables can be associated
with different _____________________
___________ across observations

Echocardiograms on the same person over
time are more similar to one another than to
those on other people
Groups of patients from a single center may
yield similar responses because of treatment
protocol variations from center to center, or
because of the persons/machines providing measurements
Multilevel nature
Patients may be clustered within a surgeon
Multiple surgeons may be clustered within
institutions
Can you think of how household survey data
might be clustered?
What about your own data for this class?
Analysis example: LDL levels

Test differences in LDL by pill type
Continuous outcome, multiple pill types
(none, tablet, capsule, coated): what test do
you want to use in bivariate analysis?
Data: repeated measures on 6 subjects (n=24)
Table 8.1
                                 pill type
subject               none   tablet  capsule   coated   subject average
1                    44.50     7.30     3.40    12.40             16.90
2                    33.00    21.00    23.10    25.40             25.63
3                    19.10     5.00    11.80    22.00             14.48
4                     9.40     4.60     4.60     5.80              6.10
5                    71.30    23.30    25.60    68.20             47.10
6                    51.20    38.00    36.00    52.60             44.45
pill type average    38.08    16.53    17.42    31.07             25.78
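As a quick check on the table, the sketch below recomputes the subject and pill-type averages in Python. This code is not from the lecture: DATA, subject_averages and pill_type_averages are illustrative names, and the row order is assumed to match subjects 1 to 6.

```python
from statistics import mean

# Table 8.1 values: one row per subject (1-6), columns in the order of PILL_TYPES.
# DATA and the helper names below are illustrative, not from the lecture.
PILL_TYPES = ["none", "tablet", "capsule", "coated"]
DATA = [
    [44.5, 7.3, 3.4, 12.4],
    [33.0, 21.0, 23.1, 25.4],
    [19.1, 5.0, 11.8, 22.0],
    [9.4, 4.6, 4.6, 5.8],
    [71.3, 23.3, 25.6, 68.2],
    [51.2, 38.0, 36.0, 52.6],
]


def subject_averages(data):
    """Row means: each subject's average over the four pill types."""
    return [mean(row) for row in data]


def pill_type_averages(data):
    """Column means: each pill type's average over the six subjects."""
    return [mean(col) for col in zip(*data)]


if __name__ == "__main__":
    for subject, avg in enumerate(subject_averages(DATA), start=1):
        print(f"subject {subject}: {avg:.2f}")
    for name, avg in zip(PILL_TYPES, pill_type_averages(DATA)):
        print(f"{name}: {avg:.2f}")
    print(f"overall: {mean(v for row in DATA for v in row):.2f}")
```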
Analysis example: LDL cholesterol

. anova LDL pilltype

                       Number of obs =      24     R-squared     =  0.2183
                       Root MSE      = 18.9649     Adj R-squared =  0.1010

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |   2008.6017     3   669.533901      1.86     0.1687
                     |
            pilltype |   2008.6017     3   669.533901      1.86     0.1687
                     |
            Residual |  7193.36328    20   359.668164
          -----------+----------------------------------------------------
               Total |  9201.96498    23   400.085434
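To see where the one-way sums of squares come from, here is a minimal Python sketch that reproduces the pill-type and residual SS and the F statistic from the Table 8.1 values. It illustrates the arithmetic only and is not Stata's algorithm; the function name is hypothetical, and no p-value is computed because the standard library has no F distribution.

```python
from statistics import mean

# Table 8.1: rows = subjects, columns = pill types (none, tablet, capsule, coated).
DATA = [
    [44.5, 7.3, 3.4, 12.4],
    [33.0, 21.0, 23.1, 25.4],
    [19.1, 5.0, 11.8, 22.0],
    [9.4, 4.6, 4.6, 5.8],
    [71.3, 23.3, 25.6, 68.2],
    [51.2, 38.0, 36.0, 52.6],
]


def one_way_anova(data):
    """One-way ANOVA on pill type that ignores which subject each value came from.

    Hypothetical helper for illustration.
    """
    n_subjects, n_pills = len(data), len(data[0])
    values = [v for row in data for v in row]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    # Each pill-type mean is based on n_subjects observations.
    ss_pill = n_subjects * sum((mean(col) - grand) ** 2 for col in zip(*data))
    ss_resid = ss_total - ss_pill
    df_pill = n_pills - 1
    df_resid = len(values) - n_pills
    f_stat = (ss_pill / df_pill) / (ss_resid / df_resid)
    return {"ss_pill": ss_pill, "ss_resid": ss_resid, "ss_total": ss_total,
            "df_pill": df_pill, "df_resid": df_resid, "F": f_stat}


if __name__ == "__main__":
    for key, value in one_way_anova(DATA).items():
        print(key, round(value, 4))
```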
Interpreting ANOVA results

Is there a difference according to this analysis?
What is an important assumption of ANOVA?
Analysis example: LDL

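The model
$$\mathrm{LDL}_{ij} = \mu + \mathrm{SUBJ}_i + \mathrm{PILLTYPE}_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma_\varepsilon^2)$$
where i indexes subjects and j indexes pill types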
Take into account clustering by subject (two-way ANOVA)
. anova LDL subject pilltype

                       Number of obs =      24     R-squared     =  0.8256
                       Root MSE      =  10.344     Adj R-squared =  0.7326

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  7596.98166     8   949.622708      8.88     0.0002
                     |
             subject |  5588.37996     5   1117.67599     10.45     0.0002
            pilltype |   2008.6017     3   669.533901      6.26     0.0057
                     |
            Residual |  1604.98332    15   106.998888
          -----------+----------------------------------------------------
               Total |  9201.96498    23   400.085434
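A companion sketch for the two-way (additive) model, assuming a complete, balanced table as in Table 8.1. Because every subject has all four pill types, the total SS splits exactly into subject, pill-type, and residual pieces, and the one-way residual SS equals the two-way subject SS plus the two-way residual SS. The function name is hypothetical.

```python
from statistics import mean

# Table 8.1: rows = subjects, columns = pill types (none, tablet, capsule, coated).
DATA = [
    [44.5, 7.3, 3.4, 12.4],
    [33.0, 21.0, 23.1, 25.4],
    [19.1, 5.0, 11.8, 22.0],
    [9.4, 4.6, 4.6, 5.8],
    [71.3, 23.3, 25.6, 68.2],
    [51.2, 38.0, 36.0, 52.6],
]


def two_way_anova(data):
    """Additive two-way ANOVA (subject + pill type) for a complete, balanced table.

    Hypothetical helper for illustration.
    """
    n_subjects, n_pills = len(data), len(data[0])
    values = [v for row in data for v in row]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_subject = n_pills * sum((mean(row) - grand) ** 2 for row in data)
    ss_pill = n_subjects * sum((mean(col) - grand) ** 2 for col in zip(*data))
    # In a balanced design the three pieces add up to the total SS.
    ss_resid = ss_total - ss_subject - ss_pill
    df_subject, df_pill = n_subjects - 1, n_pills - 1
    df_resid = df_subject * df_pill
    ms_resid = ss_resid / df_resid
    return {
        "ss_subject": ss_subject,
        "ss_pill": ss_pill,
        "ss_resid": ss_resid,
        "df_resid": df_resid,
        "F_subject": (ss_subject / df_subject) / ms_resid,
        "F_pill": (ss_pill / df_pill) / ms_resid,
    }


if __name__ == "__main__":
    res = two_way_anova(DATA)
    for key, value in res.items():
        print(key, round(value, 4))
    # Share of the one-way residual SS that is now attributed to subjects.
    print(res["ss_subject"] / (res["ss_subject"] + res["ss_resid"]))
```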
Previous example
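Once we control for subject, we see that there
are significant differences by pill type (F = 6.26,
p = 0.0057, vs. F = 1.86, p = 0.1687 in the one-way ANOVA).
Reduction in residual SS from the one-way to the two-way model:
7193 - 1604 = 5589
5589/7193 = 0.78
78% of the residual variation in the one-way model has been
attributed to subject-to-subject variation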
How does interpretation change?

Pill types are now compared within subjects rather than between
subjects.
What differences do you see?
Taking the correlation into account may alter results and
conclusions
A large subject effect means observations on one subject are
quite different from those on another subject.
In this example, a large portion (78%) of what was residual
variance in the first model has been attributed to
subject-to-subject variation
Model equations for LDL example

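$$\mathrm{LDL}_{ij} = \mu + \mathrm{SUBJ}_i + \mathrm{PILLTYPE}_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \text{i.i.d. } N(0, \sigma_\varepsilon^2)$$
Assumes that the subject effects SUBJ_i are selected
from a distribution of possible subject effects,
independently of the error term ε_ij
Subjects are assumed to be a random sample
from a larger population of subjects
Including the subject effect induces a correlation among
outcomes measured on the same subject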
Subject effect
In the model on the previous slide, the subject term takes
into account the effect of each subject (the subject effect)
It simultaneously raises or lowers all the
measurements on that subject.
Model assumes that subject effects are selected
from a distribution of possible subject effects,
independently of the error term.
i.e., subjects are assumed to be a random sample
from the larger population of subjects to which we
wish to draw inferences.
Correlations within subjects

We can calculate how highly correlated measurements
within the same person are (the covariance between two
observations on the same subject)
Observations on the same subject are modeled as
correlated through their shared random subject effect
The larger the subject effects in relation to the error
term, the larger the correlation
Relatively large subject effects mean the observations
differ between subjects but are more similar within
subjects
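Why the subject effect induces a correlation can be written out in a few lines. This assumes the model LDL_ij = μ + SUBJ_i + PILLTYPE_j + ε_ij with SUBJ_i ~ i.i.d. N(0, σ²_subj), independent of ε_ij ~ i.i.d. N(0, σ²_ε); the symbol σ²_subj is notation introduced here. Because μ and PILLTYPE_j are fixed and the ε terms are independent, the only random piece that two observations on subject i (pill types j ≠ k) share is SUBJ_i:

```latex
\begin{aligned}
\operatorname{Var}(\mathrm{LDL}_{ij}) &= \sigma^2_{\mathrm{subj}} + \sigma^2_{\varepsilon} \\
\operatorname{Cov}(\mathrm{LDL}_{ij}, \mathrm{LDL}_{ik}) &= \operatorname{Var}(\mathrm{SUBJ}_i) = \sigma^2_{\mathrm{subj}}, \qquad j \neq k \\
\operatorname{Corr}(\mathrm{LDL}_{ij}, \mathrm{LDL}_{ik}) &= \frac{\sigma^2_{\mathrm{subj}}}{\sigma^2_{\mathrm{subj}} + \sigma^2_{\varepsilon}}
\end{aligned}
```

The correlation grows as σ²_subj grows relative to σ²_ε, which is the "larger subject effects, larger correlation" point above.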
Subject-to-subject variability
Subject-to-subject variability simultaneously
raises/lowers all the observations on a subject
This induces a correlation
The variability of an individual measurement
can be separated into
(1) that due to subjects and
(2) that due to residual variance
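One common way to put numbers on this split, for a balanced design like Table 8.1, is the ANOVA (method-of-moments) estimator, which uses E[MS_subject] = σ²_ε + kσ²_subj and E[MS_residual] = σ²_ε, with k = 4 observations per subject. The sketch below applies it to the two-way ANOVA mean squares (1117.67599 and 106.998888) and gives a within-subject correlation of about 0.70. It is an illustration under those assumptions; a mixed-model fit can differ (for example, when the moment estimate of σ²_subj would be negative). The function name is hypothetical.

```python
def icc_from_mean_squares(ms_subject, ms_residual, obs_per_subject):
    """ANOVA (method-of-moments) variance components for a balanced design.

    Hypothetical helper. Returns (sigma2_subject, sigma2_error, within-subject correlation).
    Uses E[MS_subject] = sigma2_error + k * sigma2_subject and E[MS_residual] = sigma2_error.
    """
    # Truncate at zero: a variance cannot be negative.
    sigma2_subject = max((ms_subject - ms_residual) / obs_per_subject, 0.0)
    sigma2_error = ms_residual
    icc = sigma2_subject / (sigma2_subject + sigma2_error)
    return sigma2_subject, sigma2_error, icc


if __name__ == "__main__":
    # Mean squares from the two-way ANOVA output; 4 pill types per subject.
    s2_subj, s2_err, rho = icc_from_mean_squares(1117.67599, 106.998888, 4)
    print(f"subject variance {s2_subj:.1f}, residual variance {s2_err:.1f}, correlation {rho:.2f}")
```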
Clustered data may change:

Analysis approach
Results
Point estimates based on the assumption of
independent data are often quite good
It is the estimation of the standard errors and the
tests that goes awry when we fail to accommodate
correlated data
Ignoring correlation can make p-values too small or too large
Hierarchical/multilevel data analysis
Hierarchical data analysis

Use knowledge of the variation between and
within units of analysis (e.g., physicians) to
quantify how unreliable individual unit averages
are, and adjust those averages accordingly
Failing to do so might overstate the degree to
which units differ
Could have adverse consequences for the
interpretation of results
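One standard way to make this adjustment is to shrink each unit's raw average toward the overall mean, pulling harder on units with few observations (an empirical Bayes, BLUP-style estimate under a random effects model). The sketch assumes the between-unit and within-unit variances are already known; in practice they would be estimated. The physician averages, sizes, and the function name are all hypothetical.

```python
def shrink_unit_means(unit_means, unit_sizes, grand_mean, var_between, var_within):
    """Shrink each unit's raw average toward the grand mean (hypothetical helper).

    reliability = var_between / (var_between + var_within / n): a unit with few
    observations has low reliability and is pulled further toward the grand mean.
    """
    shrunk = []
    for ybar, n in zip(unit_means, unit_sizes):
        reliability = var_between / (var_between + var_within / n)
        shrunk.append(grand_mean + reliability * (ybar - grand_mean))
    return shrunk


if __name__ == "__main__":
    # Hypothetical physicians: raw averages and number of patients each.
    raw = [80.0, 60.0, 75.0]
    sizes = [2, 50, 10]
    print(shrink_unit_means(raw, sizes, grand_mean=70.0, var_between=25.0, var_within=100.0))
```

The physician with only 2 patients moves from 80 to about 73.3, while the one with 50 patients barely moves (60 to about 60.7), so the apparent spread between units shrinks.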
Issues with hierarchical data

Observations taken within the same subgroup
in a hierarchy are often more similar to one
another than to observations in different
subgroups, other things being equal.
Data clustered in the same level are likely to
be correlated
Standard statistical methods such as ANOVA, OLS,
and logistic regression assume that observations
are independent.
Analysis strategies for hierarchical data

1. Separate analyses for each subgroup
2. Analyses at highest level in the hierarchy
3. Analyses on derived variables
Longitudinal (panel) data

Collect data repeatedly over time on the same
individuals (panel datasets)
We are interested in the change in the value of a
variable within a subject
Why is panel data desirable?

Ability to resolve causal ordering
Ability to study effect of a treatment on an
outcome
Allows us to control for differences across the
units/people being studied
Sometimes this is called heterogeneity
Panel data allow us to control for unobservables
(at least those that are stable over time)
These differences can be known to the individual but
not to the researcher
e.g., tastes, beliefs, abilities, skills, constraints
Problems with panel data

Attrition => missing data
Correlation (statistical dependence) across
observations from the same individual
Conventional methods assume independence
Estimated standard errors tend to be too low, so
test statistics tend to be too high (=> more prone
to make a type I error than the alpha level suggests)
Conventional parameter estimates may be
statistically inefficient (true standard errors higher
than necessary)
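For the simplest case, estimating an overall mean from m measurements per person with within-person correlation ρ, a standard result is that the true variance is inflated by the design effect 1 + (m - 1)ρ relative to the independence formula. The sketch below assumes that setting (equal cluster sizes, a mean as the target); function names are illustrative. It shows the "SE too low" direction only: for comparisons made within subjects, such as pill type in the LDL example, ignoring the correlation can instead make SEs too large.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation for a mean when every cluster has cluster_size observations."""
    return 1.0 + (cluster_size - 1) * icc


def corrected_se(naive_se, cluster_size, icc):
    """Scale an independence-based SE of a mean by the square root of the design effect."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


if __name__ == "__main__":
    # 4 measurements per person, within-person correlation 0.7 (illustrative values).
    print(design_effect(4, 0.7))        # variance is about 3.1 times the naive variance
    print(corrected_se(1.0, 4, 0.7))    # SE is about 1.76 times the naive SE
```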
Methods we can use to correct for dependence/correlation
1. Analysis on derived variables/change scores
2. Robust standard errors
3. Generalized estimating equations (GEE)
4. Random effects models
5. Fixed effects models
(1) Analysis on derived variables

Calculate a simple, focused variable for each
cluster that can be used in a more
straightforward analysis
e.g., calculate a change score (after minus before)
instead of analyzing the before and after values separately
Some bivariate approaches

1. Analysis of difference scores
2. Repeated measures analysis
Analysis of difference scores
Before and after (paired t-test)
Multivariate approach
Change score as the dependent variable
Regress subsequent outcomes on baseline
controls + baseline outcome (control for initial
value)
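A paired t-test is simply a one-sample t-test on the difference scores. A minimal Python sketch with hypothetical before/after values (the function name is illustrative):

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t-test as a one-sample t-test on difference scores (hypothetical helper).

    Returns (mean difference, SE of the mean difference, t statistic, degrees of freedom).
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs), se, mean(diffs) / se, n - 1


if __name__ == "__main__":
    # Hypothetical before/after measurements on five people.
    before = [10, 12, 9, 14, 11]
    after = [12, 15, 9, 17, 12]
    print(paired_t(before, after))
```

In practice, scipy.stats.ttest_rel in Python, or Stata's paired form `ttest var1 == var2`, returns the same t statistic together with a p-value.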
Change score: Regression of difference in birthweight on centered initial age
. reg delwght cinitage

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   198) =    0.39
       Model |  163789.382     1  163789.382           Prob > F      =  0.5308
    Residual |  82265156.7   198  415480.589           R-squared     =  0.0020
-------------+------------------------------           Adj R-squared = -0.0031
       Total |  82428946.1   199  414215.809           Root MSE      =  644.58

------------------------------------------------------------------------------
     delwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cinitage |   8.891816   14.16195     0.63   0.531    -19.03579    36.81942
       _cons |     191.64   45.57854     4.20   0.000     101.7583    281.5217
------------------------------------------------------------------------------
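The two steps behind the regression above (form the change score, center the covariate) can be sketched in Python with hypothetical data. Because the covariate is centered at its sample mean, the OLS intercept equals the mean change, which is how _cons = 191.64 can be read (mean change at the average initial age). The variable names mirror the Stata names, but the data and helper functions are made up for illustration.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x (hypothetical helper)."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def change_score_regression(init_wt, last_wt, init_age):
    """Regress the change score (last - first) on initial age centered at its mean."""
    delwght = [last - first for first, last in zip(init_wt, last_wt)]
    age_bar = mean(init_age)
    cinitage = [age - age_bar for age in init_age]
    return simple_ols(cinitage, delwght)


if __name__ == "__main__":
    # Hypothetical first and last birthweights (g) and initial ages (years).
    init_wt = [3000, 3100, 3200, 3300, 3400]
    last_wt = [3150, 3270, 3390, 3510, 3630]
    init_age = [18, 20, 22, 24, 26]
    print(change_score_regression(init_wt, last_wt, init_age))  # (intercept, slope)
```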
Regression of final bweight on centered initial (baseline) age, adjusting for first birthweight

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   19.33
       Model |  10961363.1     2  5480681.54           Prob > F      =  0.0000
    Residual |  55866154.3   197   283584.54           R-squared     =  0.1640
-------------+------------------------------           Adj R-squared =  0.1555
       Total |  66827517.4   199   335816.67           Root MSE      =  532.53

------------------------------------------------------------------------------
    lastwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cinitage |   24.90948   11.81727     2.11   0.036     1.604886    48.21408
    initwght |   .3628564   .0660366     5.49   0.000      .232627    .4930858
       _cons |   2113.619   202.7309    10.43   0.000     1713.817     2513.42
------------------------------------------------------------------------------
Note: in this type of observational study, using the baseline value of birthweight as a covariate is not a
reliable way to check the dependence of the change in outcome on a covariate (birth order).
This is OK in randomized studies, where there is no dependence between treatment and baseline
values.
(2) Robust standard errors

AKA Huber-White standard errors, sandwich
estimates, empirical standard errors
Use the vce(robust) or vce(cluster clustvar) option in Stata
No effect on coefficients
Generally higher SEs
Uses the vector of residuals in the calculation of
standard errors
Hence, empirical vs. model-based
Robust SEs, continued
Theoretical justification depends on a
moderately large sample
With fewer than 100 individuals, or with many
x-variables, they might not work well.
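To make "uses the vector of residuals" concrete: for the simplest model (estimating an overall mean), the cluster-robust variance adds up the residuals within each cluster, squares those sums, and divides by n². The sketch below compares this with the usual independence-based SE, clustering the Table 8.1 values by subject. It omits the small-sample scaling factor that Stata applies, and the function name is hypothetical.

```python
import math
from statistics import mean

# Table 8.1: each inner list is one subject (cluster) with four pill-type values.
DATA = [
    [44.5, 7.3, 3.4, 12.4],
    [33.0, 21.0, 23.1, 25.4],
    [19.1, 5.0, 11.8, 22.0],
    [9.4, 4.6, 4.6, 5.8],
    [71.3, 23.3, 25.6, 68.2],
    [51.2, 38.0, 36.0, 52.6],
]


def mean_se_naive_and_robust(clusters):
    """SE of the overall mean: independence formula vs. cluster-robust sandwich formula.

    Hypothetical helper. For an intercept-only model, (X'X)^-1 = 1/n, and the
    sandwich "meat" is the sum over clusters of (sum of residuals in the cluster)^2,
    so the robust variance is meat / n^2. No small-sample correction is applied.
    """
    values = [v for cluster in clusters for v in cluster]
    n = len(values)
    b0 = mean(values)
    naive_var = sum((v - b0) ** 2 for v in values) / (n - 1) / n
    meat = sum(sum(v - b0 for v in cluster) ** 2 for cluster in clusters)
    robust_var = meat / n ** 2
    return math.sqrt(naive_var), math.sqrt(robust_var)


if __name__ == "__main__":
    naive, robust = mean_se_naive_and_robust(DATA)
    print(f"naive SE {naive:.3f}, cluster-robust SE {robust:.3f}")
```

Here the point estimate (the mean) is unchanged, but the cluster-robust SE is roughly 1.5 times the naive one because values from the same subject move together.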
(3) Generalized estimating equations

Fixes the inefficiency problem
Takes into account correlation over time when
estimating both the SEs and the coefficients
xtgee command in Stata
GEE continued
Must consider & specify:
Distributional family of outcome (normal, binary,
binomial)
What link function to use?
Logit, identity, log, negative binomial, power, probit
What correlation structure will be used, or tentatively assumed?
Exchangeable
Autoregressive
Unstructured, non-stationary, stationary
Working correlation and robust standard errors
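The working correlation structures differ in how the correlation between two time points depends on the distance between them. A small Python sketch building exchangeable and first-order autoregressive matrices (function names are hypothetical, not Stata options):

```python
def exchangeable(n_times, rho):
    """Working correlation with the same correlation rho for every pair of time points."""
    return [[1.0 if j == k else rho for k in range(n_times)] for j in range(n_times)]


def ar1(n_times, rho):
    """First-order autoregressive working correlation: rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n_times)] for j in range(n_times)]


if __name__ == "__main__":
    print("exchangeable:")
    for row in exchangeable(4, 0.5):
        print(row)
    print("AR(1):")
    for row in ar1(4, 0.5):
        print(row)
```

Under AR(1), correlations decay with the gap between time points (0.5, 0.25, 0.125), whereas exchangeable keeps every off-diagonal entry at 0.5.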
GEE: note on SE
Model-based SEs are sensitive to the correlation structure
that is specified
Default is exchangeable
This means the correlations between the dependent variable at
different points in time are all the same
Use estat wcorr to see the estimated correlations
However, you can add the vce(robust) option to get
robust SEs that do not depend on the structure being correct
GEE: unstructured SE
, vce(robust) corr(uns)
Won't impose a structure
More confident because you don't impose a
structure that might be incorrect
Downside: many correlations for the model to
estimate when there are many time points
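This downside grows quickly: an unstructured matrix for T time points has T(T - 1)/2 distinct correlations, versus one for exchangeable. A tiny counting sketch; the function name and structure labels are illustrative, not Stata option names.

```python
def n_working_corr_params(n_times, structure, order=1):
    """Number of distinct correlation parameters for T = n_times (hypothetical helper)."""
    if structure == "exchangeable":
        return 1
    if structure in ("ar", "stationary"):
        # 'order' parameters: AR coefficients, or correlations at lags 1..order.
        return order
    if structure == "unstructured":
        return n_times * (n_times - 1) // 2  # one correlation per pair of time points
    raise ValueError(f"unknown structure: {structure}")


if __name__ == "__main__":
    for t in range(3, 11):
        print(t, n_working_corr_params(t, "unstructured"))
```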
GEE: more on correlation

No goodness of fit measures to tell us which
correlation structure is best
Fit unstructured if there are not too many time periods
Then run estat wcorr
If there is no pattern, go with exchangeable (i.e., correlation
not changing over time)
GEE: pros and cons

Cannot deal with more than one level of
clustering
With many time points (t > 5), estimates of the large
number of unique correlations may be unreliable;
more parsimonious options:
Autoregressive of order #
Stationary of order #
GEE can handle missing data under the assumption that
data are missing completely at random
Use xtset so Stata knows which observations come
from which time points
GEE: Hypothesis tests and confidence intervals
Hypothesis testing with GEE uses Wald tests
Coef/SE is approximately standard normal (forms a z statistic)
Estimate ± 1.96*SE forms a 95% confidence interval
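The Wald z, two-sided p-value, and interval can be sketched with Python's statistics.NormalDist. The coefficient and SE are hypothetical and the function name is illustrative. For a logit link, exponentiating the estimate and the interval limits gives an odds ratio and its CI, the form in which GEE logistic results are usually reported.

```python
import math
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Wald z statistic, two-sided p-value, and normal-theory CI (hypothetical helper)."""
    std_normal = NormalDist()
    z = coef / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    return z, p_value, (coef - crit * se, coef + crit * se)


if __name__ == "__main__":
    # Hypothetical log-odds coefficient and robust SE from a logit-link GEE.
    z, p, (lo, hi) = wald_summary(0.4, 0.1)
    print(f"z = {z:.2f}, p = {p:.5f}")
    print(f"OR = {math.exp(0.4):.2f} (95% CI {math.exp(lo):.2f}-{math.exp(hi):.2f})")
```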
References
Allison, P. 2013. Course notes from Longitudinal Data Analysis Using Stata. Statistical Horizons, Boston, MA, June 2013.
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. Stata Press, College Station, TX.
Vittinghoff, E., Glidden, D. V., Shiboski, S. C., and McCulloch, C. E. 2008. Regression Methods in Biostatistics. Springer-Verlag, New York.
Kacanek et al. (2013) Article
Table 2: Effect of Exposure to IPV on Each Non-Adherence Outcome: GEE Results¹

                            | Condom non-adherence,               | Condom non-adherence,               | Diaphragm
                            | control arm                         | intervention arm                    | non-adherence
IPV indicator (vs. none)    | OR (95% CI)      | aOR (95% CI)²    | OR (95% CI)      | aOR (95% CI)²    | OR (95% CI)      | aOR (95% CI)²
----------------------------+------------------+------------------+------------------+------------------+------------------+-----------------
Any IPV                     | 1.47 (1.29-1.65) | 1.41 (1.24-1.61) | 1.43 (1.25-1.63) | 1.47 (1.28-1.69) | 1.31 (1.13-1.50) | 1.24 (1.06-1.45)
Fear of Violence            | 1.49 (1.3-1.71)  | 1.45 (1.26-1.68) | 1.36 (1.18-1.57) | 1.37 (1.18-1.59) | 1.41 (1.20-1.66) | 1.28 (1.07-1.53)
Emotional Abuse             | 1.40 (1.22-1.60) | 1.34 (1.16-1.54) | 1.29 (1.11-1.49) | 1.34 (1.16-1.54) | 1.31 (1.12-1.55) | 1.31 (1.09-1.56)
Physical Violence           | 1.44 (1.18-1.76) | 1.39 (1.13-1.70) | 1.65 (1.33-2.04) | 1.66 (1.33-2.08) | 1.47 (1.16-1.86) | 1.26 (0.97-1.64)
Forced Sex                  | 2.15 (1.74-2.66) | 1.99 (1.59-2.48) | 1.55 (1.22-1.96) | 1.56 (1.22-1.99) | 1.30 (1.01-1.68) | 1.35 (1.02-1.78)
Any physical or sexual IPV  | 1.75 (1.48-2.07) | 1.66 (1.39-1.98) | 1.60 (1.34-1.92) | 1.66 (1.39-1.98) | 1.36 (1.12-1.65) | 1.28 (1.03-1.58)

IPV=Intimate partner violence, OR=Odds Ratio (unadjusted), aOR=adjusted Odds Ratio, CI=Confidence Interval, GEE=Generalized estimating equations
¹ Separate models fit for each of the six IPV types and indicators
² Adjusted models controlled for age, site, number of sex partners, male partner infidelity
Table 3: Effect of change in IPV exposure from baseline to 12 months and condom and diaphragm non-adherence at 12 months
Condom non-adherence
in intervention arm
OR (95% CI)
aOR(95% CI)
Diaphragm non-adherence
OR (95% CI)
aOR(95%CI)
Any IPV
Persisting
Incident
Remitting
None
2.33(1.67-3.22)
1.75 (1.15-2.7)
1.47 (1.06-2.04)
1.00
2.2 (1.54-3.1)
1.69(1.08-2.6)
1.38 (0.99-1.94)
1.00
1.54 (1.09-2.17)
1.25 (0.77-2.0)
0.95 (0.68-1.33)
1.00
1.53 (1.06-2.2)
1.2 (0.74-1.95)
0.99 (0.7-1.39)
1.00
2.17(1.56-3.12)
1.45 (0.90-2.27)
1.61 (1.16-2.27)
1.00
2.0 (1.39-2.9)
1.42 (0.88-1.3)
1.51 (1.07-2.1)
1.00
Fear of Violence
Persisting
Incident
Remitting
None
2.08 (1.41-3.03)
2.08 (1.37-3.12)
1.79 (1.28-2.44)
1.00
1.88 (1.25-2.8)
2.0 (1.32-2.1)
1.67(1.2-2.3)
1.00
1.23 (0.81-1.89)
1.61 (1.01- 2.63)
1.09 (0.79-1.51)
1.00
1.22 (0.79-1.89)
1.61 (0.99-2.6)
1.15 (0.82-1.6)
1.64 (1.09-2.5)
2.04 (1.27-3.23 )
1.72 (1.23-2.38)
1.00
1.38 (0.9-2.1)
1.98 (1.23-3.2)
1.55 (1.11-2.2)
1.00
Emotional Abuse
Persisting
Incident
Remitting
None
2.08(1.41-3.03)
1.39 (0.91-2.13)
1.09 (0.77-1.52)
1.00
2.0 (1.37-3.1)
1.25 (0.80-1.95)
1.02 (0.72-1.46)
1.00
1.45 (0.94-2.27)
1.49 (0.92-2.38)
1.19 (0.83-1.72)
1.00
1.36 (0.87-2.1)
1.4 (0.86-2.3)
1.2 (0.82-1.74)
1.00
2.33 (1.47-3.70)
1.41 (0.89-2.22)
1.18 (0.83-2.87)
1.00
2.3 (1.43-3.7)
1.34 (0.83-2.2)
1.09 (0.76-2.58)
1.00
Condom non-adherence
in control arm
OR (95% CI)
aOR(95% CI)
Jago et al. (2005)
