
Repeated-Measures ANOVA

PSY 211
11-26-07

HW#9 (Computer Assignment) due Wednesday

A. Similarities to t-tests

 ANOVA is like the between-group t-test but can
include more than two groups
o Anxiety treatment study comparing
medication, exercise, and therapy
 Repeated-measures ANOVA is like the
repeated-measures (within-subject) t-test but
can include more than two time points
o Physical therapy study examining pain
levels at pre-treatment, post-treatment, and
3-month follow-up

B. Examples of Repeated-measures ANOVAs

[Figure: Pain]

[Figure: life satisfaction across 10 time points]
Ignore the dashed lines. The red line shows a
repeated-measures ANOVA for life satisfaction
across 10 different time points among people
diagnosed with a disability.

[Figure: Comparison Group vs. High Schizotypy group]
Two different repeated-measures ANOVAs: The
comparison group (blue line) has a non-significant
ANOVA, whereas the High Schizotypy group (red line)
has a significant ANOVA.

C. Hypothesis Testing

 Example of hypothesis testing, where a
researcher is examining how the dependent
variable (e.g. pain score) differs across three
time points:

H0: μ1 = μ2 = μ3
(The subscripts 1, 2, and 3 signify different time points.)
or
At the population level, average scores will not
differ across any time point.

HA: At the population level, the mean for at least one
time point will significantly differ from the others

 Note: HA will be true under a wide variety of
conditions…

μ1 ≠ μ2, μ2 ≠ μ3, and μ1 ≠ μ3 (all means are different)

μ1 = μ3, but μ2 is different (at least one mean differs)


 Alternative hypothesis is true if any of the means
significantly (reliably) differ from the others

 Just like ANOVA uses the F statistic to test
whether there are between-group differences,
the repeated-measures ANOVA uses the F
statistic to examine whether means differ across
time points

D. Significance Testing with the F Statistic

 Remember from last time that the
(between-group) ANOVA uses the following
rationale for the F statistic

F = (total amount of variability in scores) / (variability due to chance)

 F is near 1 if non-significant, large if significant


 The rationale for the repeated-measures ANOVA
is very similar, but this approach is a bit more
powerful
 As the F formula states, F tends to be larger
when chance variability (a.k.a. “error”) is low
 There are two main sources of chance
variability:
1. Individual differences, such as personality
factors that impact how different people will
respond
2. Uncontrolled environmental variables, such
as temperature, time of day, the
experimenter’s gender, posters on the wall,
weather, and the hundreds of other small
factors beyond the experimenter’s control
that might inadvertently impact the
participant
 The repeated-measures ANOVA is unique in that
it controls for the first source of error (individual
differences) because each participant acts as
their own control
 The basic rationale for the F statistic is the
same, but by controlling for this source of error,
there is a good chance that F will be larger;
the study has greater power, and fewer
participants are needed

F = (total amount of variability in scores) / (variability due to chance)

Basic formula is the same, but error due to individual
differences has been controlled for through complex
statistical techniques.

 Let’s consider an example…

Pain scores: Repeated-measures treatment study

Participant   Intake   Discharge   Follow-up
Ellen         10       5           5
Jim           8        3           3
Andrea        9        4           5
Brian         9        3           4
Corey         8        5           6

F = (total amount of variability in scores) / (variability due to chance)

Total variability = total amount of differences in scores
Variability due to chance = average amount of variability
within each time point

The total variability is probably large because pain
scores differ by a maximum of about 7 points across
all people and time points

The variability due to chance (error, or variability
within each time point) is smaller, with most scores
differing within each time point by about 3 points

F = (big number) / (small number) = big F, so probably significant

Because the repeated-measures design controls for
some of the individual differences (such as Jim having
less overall pain than Ellen), the chance variability in
the F formula is probably even less than it appears
when we try to eyeball it!

F = (big number) / (tiny number) = huge F, so probably significant!

 As you can see, it is very difficult to estimate
F by visually inspecting the data
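
Since eyeballing is unreliable, the example pain scores can be run through the standard textbook sums-of-squares calculation instead. The following is only a minimal sketch in Python; it is not part of PSY 211 and not the SPSS procedure. It assumes the usual formula F = MS(time) / MS(error) rather than the simplified ratio above, and all variable names are made up for illustration.

```python
# Pain scores from the example table
# (columns = Intake, Discharge, Follow-up).
scores = {
    "Ellen":  [10, 5, 5],
    "Jim":    [8, 3, 3],
    "Andrea": [9, 4, 5],
    "Brian":  [9, 3, 4],
    "Corey":  [8, 5, 6],
}

def mean(values):
    return sum(values) / len(values)

rows = list(scores.values())
n = len(rows)       # number of participants
k = len(rows[0])    # number of time points
grand_mean = mean([x for row in rows for x in row])

time_means = [mean([row[j] for row in rows]) for j in range(k)]
subject_means = [mean(row) for row in rows]

# Split the total variability into its sources.
ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
ss_time = n * sum((m - grand_mean) ** 2 for m in time_means)
ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
ss_error = ss_total - ss_time - ss_subjects

ms_time = ss_time / (k - 1)

# Repeated-measures F: individual differences (ss_subjects)
# have been removed from the error term.
f_repeated = ms_time / (ss_error / ((k - 1) * (n - 1)))

# Between-group F for comparison: the same scores treated as three
# separate groups, so individual differences stay in the error term.
f_between = ms_time / ((ss_total - ss_time) / (k * (n - 1)))

print(f"Repeated-measures F({k - 1}, {(k - 1) * (n - 1)}) = {f_repeated:.3f}")
print(f"Between-group F({k - 1}, {k * (n - 1)}) = {f_between:.3f}")
```

For these five participants the repeated-measures F comes out at roughly 64, compared with roughly 34 when the same scores are treated as three separate groups. The only difference is that the variability between participants has been taken out of the error term.
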
E. Post hoc Tests

 Unfortunately, even if F is statistically significant,
it only tells us that somewhere there is a mean
difference for at least one of the time points
 Post hoc tests are secondary statistical tests
used to clarify which differences are statistically
significant (kind of like doing several t-tests all at
the same time)
 Looking at the above data, we might guess…
o Intake differs from Discharge
o Intake differs from Follow-up
o Discharge and Follow-up do not differ
 The F test tells us whether any groups differ,
and the post hoc test tells us how the groups
differ (e.g. at Intake, scores are reliably higher)

 When doing a repeated-measures ANOVA,
there are the same options for post hoc tests as
when doing a (regular) ANOVA
 Remember: We’ll use the Least Significant
Difference test, but some researchers prefer
others
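
The guesses above can be checked with unadjusted pairwise comparisons. The following is a rough sketch in Python, assuming that LSD comparisons for repeated measures amount to one paired t-test per pair of time points (the SPSS footnote describes LSD as "equivalent to no adjustments"). The function name `paired_t` and the variable names are illustrative only, and the critical value 2.776 is the standard two-tailed t-table entry for df = 4 at the .05 level.

```python
from itertools import combinations
from math import sqrt

time_points = ["Intake", "Discharge", "Follow-up"]

# Example pain scores: one row per participant
# (Ellen, Jim, Andrea, Brian, Corey).
scores = [
    [10, 5, 5],
    [8, 3, 3],
    [9, 4, 5],
    [9, 3, 4],
    [8, 5, 6],
]

# Two-tailed critical t for df = n - 1 = 4 at alpha = .05.
T_CRIT = 2.776

def paired_t(a, b):
    """Return (mean difference, t) for a paired t-test of a versus b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    std_error = sqrt(variance / n)
    return mean_diff, mean_diff / std_error

results = {}
for i, j in combinations(range(len(time_points)), 2):
    first = [row[i] for row in scores]
    second = [row[j] for row in scores]
    mean_diff, t = paired_t(first, second)
    significant = abs(t) > T_CRIT
    results[(time_points[i], time_points[j])] = (mean_diff, t, significant)
    print(f"{time_points[i]} vs {time_points[j]}: "
          f"mean difference = {mean_diff:.2f}, t(4) = {t:.2f}, "
          f"{'significant' if significant else 'not significant'}")
```

With only five participants this is an illustration rather than a real analysis, but the pattern matches the guesses: Intake differs from both Discharge and Follow-up, while the Discharge and Follow-up difference (t of about -2.45) does not reach the critical value.
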
F. SPSS Examples

 Repeated-measures ANOVA is a bit complicated
to run in SPSS, so it is not a requirement of PSY
211
 See the following web site if you would ever like
to run one:
http://academic.reed.edu/psychology/RDDAwebsite/spssguide/anova.html

 Reading and understanding the Output is less
difficult, so I will give you some Output to
interpret

Descriptive Statistics

            Mean     Std. Deviation   N
Intake      7.8500   1.56525          20
Discharge   5.3000   2.36421          20
Follow-up   5.4500   2.06410          20

Tests of Within-Subjects Effects
Measure: MEASURE_1

Source                                 Type III Sum of Squares   df       Mean Square   F        Sig.
factor1          Sphericity Assumed    81.900                    2        40.950        27.412   .000
                 Greenhouse-Geisser    81.900                    1.348    60.768        27.412   .000
                 Huynh-Feldt           81.900                    1.414    57.933        27.412   .000
                 Lower-bound           81.900                    1.000    81.900        27.412   .000
Error(factor1)   Sphericity Assumed    56.767                    38       1.494
                 Greenhouse-Geisser    56.767                    25.607   2.217
                 Huynh-Feldt           56.767                    26.860   2.113
                 Lower-bound           56.767                    19.000   2.988

Pairwise Comparisons (post hoc test)
Measure: MEASURE_1

(I) factor1   (J) factor1   Mean Difference (I-J)   Std. Error   Sig.(a)   95% CI Lower Bound(a)   95% CI Upper Bound(a)
1             2             2.550*                  .489         .000      1.526                   3.574
              3             2.400*                  .387         .000      1.591                   3.209
2             1             -2.550*                 .489         .000      -3.574                  -1.526
              3             -.150                   .244         .545      -.660                   .360
3             1             -2.400*                 .387         .000      -3.209                  -1.591
              2             .150                    .244         .545      -.360                   .660

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Red Box: Overview of descriptive statistics for each
time point

Green Box: These are the degrees of freedom (df)
values. Remember, the critical t value depended on
the df value. The critical F value depends on two df
values, one related to the number of time points
(here, 3 - 1 = 2) and one related to sample size
(here, (3 - 1) x (20 - 1) = 38). These values help
determine the critical F value, used to determine
whether a finding will be significant, and they are used
in the APA-style write-up too.

Blue Box: F value and p value. If p is less than .05
(p<.05), the finding is significant, indicating that there
is at least one reliable mean difference across time
points.

Orange Box: These values indicate the mean
differences across two time points. For example, the
first indicates a mean difference of 2.55 points
between time 1 (Intake) and time 2 (Discharge). The
next number indicates that there is a 2.40 point
difference between time 1 (Intake) and time 3
(Follow-up). Stars (*) indicate a statistically significant
(reliable) difference.

Pink Box: These are the precise p values for the post
hoc tests. If p<.05, the difference is statistically
significant. All mean differences in the Orange Box
that had a star also have a p<.05.
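
The df and F values in the Output can be reproduced from the sums of squares. The following is a small sketch in Python, assuming the standard sphericity-assumed formulas (df for the effect = time points - 1; df for error = (time points - 1) x (participants - 1); F = Mean Square for the effect divided by Mean Square for error). The function name `rm_anova_summary` is hypothetical, chosen only for this illustration.

```python
def rm_anova_summary(ss_effect, ss_error, n_participants, n_time_points):
    """Return (df_effect, df_error, F) from the two sums of squares."""
    df_effect = n_time_points - 1
    df_error = (n_time_points - 1) * (n_participants - 1)
    ms_effect = ss_effect / df_effect   # Mean Square for factor1
    ms_error = ss_error / df_error      # Mean Square for Error(factor1)
    return df_effect, df_error, ms_effect / ms_error

# "Sphericity Assumed" sums of squares from the Output above,
# with N = 20 participants and 3 time points.
df1, df2, f_value = rm_anova_summary(81.900, 56.767, 20, 3)
print(f"F({df1}, {df2}) = {f_value:.2f}")  # prints "F(2, 38) = 27.41"
```

The printed values are the ones that go into the APA-style write-up.
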

APA Style Write-up:


Pain scores significantly differed across time, F(2, 38)
= 27.41, p < .05. A post hoc LSD test indicated that
pain scores at discharge and follow-up were
significantly lower than at intake.

A non-significant result…

Descriptive Statistics

            Mean     Std. Deviation   N
Intake      7.8000   1.70448          20
Discharge   7.3000   1.65752          20
Follow-up   7.2000   1.82382          20

Tests of Within-Subjects Effects
Measure: MEASURE_1

Source                                 Type III Sum of Squares   df       Mean Square   F      Sig.
factor1          Sphericity Assumed    4.133                     2        2.067         .959   .392
                 Greenhouse-Geisser    4.133                     1.613    2.562         .959   .377
                 Huynh-Feldt           4.133                     1.741    2.374         .959   .383
                 Lower-bound           4.133                     1.000    4.133         .959   .340
Error(factor1)   Sphericity Assumed    81.867                    38       2.154
                 Greenhouse-Geisser    81.867                    30.653   2.671
                 Huynh-Feldt           81.867                    33.075   2.475
                 Lower-bound           81.867                    19.000   4.309

Pairwise Comparisons
Measure: MEASURE_1

(I) factor1   (J) factor1   Mean Difference (I-J)   Std. Error   Sig.(a)   95% CI Lower Bound(a)   95% CI Upper Bound(a)
1             2             .500                    .516         .344      -.579                   1.579
              3             .600                    .520         .263      -.489                   1.689
2             1             -.500                   .516         .344      -1.579                  .579
              3             .100                    .332         .766      -.594                   .794
3             1             -.600                   .520         .263      -1.689                  .489
              2             -.100                   .332         .766      -.794                   .594

Based on estimated marginal means
a. Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).

Pain scores did not significantly differ across time,
F(2, 38) = 0.96, ns. Thus, there is no reliable evidence
that pain changed over the course of treatment.
