January 7, 2009 -

morning session
1
Statistics Micro Mini

Multi-factor ANOVA


January 5-9, 2008

Beth Ayers
Thursday Sessions
ANOVA
One-way ANOVA
Two-way ANOVA
ANCOVA
Within subject
Between subject
Repeated measures
MANOVA
etc.
What is ANOVA?
ANalysis Of VAriance

Partitions the observed variance based on
explanatory variables

Compare partitions to test significance of
explanatory variables

Some Terminology
Between-subjects design: each subject
participates in one and only one group

Within-subjects design: the same group of
subjects serves in more than one treatment
Subject is now a factor

Mixed design: a study which has both between-
and within-subject factors

Repeated measures: general term for any study
in which multiple measurements are taken
on the same subject
Can be either multiple treatments or several
measurements over time
ANOVA
Use variances and variance-like
quantities to study the equality or
non-equality of population means

So, although it is analysis of variance we
are actually analyzing means, not
variances

There are other methods which analyze
the variances between groups
ANOVA
Typical exploratory analysis includes
Tabulation of the number of subjects in each
experimental group

Side-by-side box plots

Statistics about each group
At least mean and standard deviation, can
include 5-number summary and information on
skewness

Table of means for each experimental group
Notation
If we have k groups, denote the means
of the groups as $\mu_1, \mu_2, \dots, \mu_k$

Student i in group j has observation
$y_{ij} = \mu_j + \varepsilon_{ij}$

where the $\varepsilon_{ij}$ are independent, distributed $N(0, \sigma^2)$
Can combine this and say subjects from
group j have distribution $N(\mu_j, \sigma^2)$

With random assignment, the sample
mean for any treatment group is
representative of the population mean
for that group

Assumptions
1. The errors $\varepsilon_{ij}$ are normally distributed

2. Across the conditions, the errors have
equal spread. Often referred to as
equal variances.
Rule of thumb: the assumption is met if the
largest variance is less than twice the
smallest variance
If the variances are unequal, need to make a
correction! This is usually $\alpha/2$.

3. The errors are independent from each
other
Checking the assumptions
Use the residuals, which are the
estimates of the $\varepsilon_{ij}$

1. Normality: look at the normal probability plot
2. Equal variances: look at the residual versus fitted plot
3. Independence: hard to check, often assumed from the study
design

For mild violations of the assumptions,
there are options for correction

When the assumptions are not
met the p-value is simply
wrong!!
One-way ANOVA
One-way ANOVA is used when
Only testing the effect of one explanatory
variable
Each subject has only one treatment or
condition
Thus a between-subjects design

Used to test for differences among two
or more independent groups

Gives the same results as the two-sample t-test
if the explanatory variable has 2 levels
Hypothesis Testing
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$

$H_1$: the $\mu$'s are not all equal

The alternative hypothesis $H_1: \mu_1 \neq \dots \neq \mu_k$ is wrong!

The null hypothesis is called the overall null and is
the hypothesis tested by ANOVA

If the overall null is rejected, must do more specific
hypothesis testing to determine which means are
different, often referred to as contrasts
Terminology
The sample variance is the sum of the squared
deviations from the mean divided by the degrees of
freedom

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1} = \frac{SS}{df}$$

A mean square (MS) is a variance-like quantity
calculated as SS/df

$$MS = \frac{SS}{df}$$
One-way ANOVA
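In one-way ANOVA we work with two mean
square quantities

$MS_{\text{within}}$: the mean square within-groups
$MS_{\text{between}}$: the mean square between-groups

$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} \qquad MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}}$$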
Within vs. Between
One-way ANOVA
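For each individual group we have

$$\frac{SS_i}{df_i} = \frac{\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_i - 1}$$

So the estimate of $MS_{\text{within}}$ is

$$MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} = \frac{\sum_{i=1}^{k} SS_i}{\sum_{i=1}^{k} (n_i - 1)} = \frac{\sum_{i=1}^{k} SS_i}{N - k}$$

And the estimate of $MS_{\text{between}}$ is

$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2}{k - 1}$$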
Mean Squares
What do these values mean?

$MS_{\text{within}}$ is considered a true estimate of $\sigma^2$ that is
unaffected by whether the null or alternative
hypothesis is true

$MS_{\text{between}}$ is considered a good estimate of $\sigma^2$ only
when the null hypothesis is true
If the alternative is true, values of $MS_{\text{between}}$ tend to be
inflated

Thus, we can look at the ratio of the two mean
square values to evaluate the null hypothesis
Testing the Hypothesis
The F-test looks at the variation among the group
means relative to the variation within the sample

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{SS_{\text{between}} / df_{\text{between}}}{SS_{\text{within}} / df_{\text{within}}} = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}$$

The F-statistic tends to be larger if the alternative
hypothesis is true than if the null hypothesis is true

When the null hypothesis is true, the test statistic F has an F(k-1, N-k) distribution
What does the F ratio tell us?
$F = MS_{\text{between}} / MS_{\text{within}}$

The denominator is always an estimate of $\sigma^2$
(under both the null and alternative hypotheses)

The numerator is either another estimate of $\sigma^2$
(under the null) or is inflated (under the
alternative)

If the null is true, values of F are close to 1

If the alternative is true, values of F are larger

How large F must be to reject the null depends on the degrees of
freedom
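The behaviour of the F ratio can be illustrated with a small simulation. This is only a sketch under assumed settings that are not from the slides: four groups of 25, normal errors with a standard deviation of 5, and made-up population means. It averages F over many simulated experiments, once with equal means and once with one mean shifted.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F = MS_between / MS_within for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def mean_f(group_means, sigma=5.0, n=25, reps=2000, seed=1):
    """Average F over simulated experiments with the given true group means."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in group_means]
        total += f_statistic(groups)
    return total / reps

# Assumed (hypothetical) population means, chosen only for illustration.
mean_f_null = mean_f([50, 50, 50, 50])  # overall null is true
mean_f_alt = mean_f([50, 50, 55, 50])   # one population mean differs
print(round(mean_f_null, 2), round(mean_f_alt, 2))
```

With equal population means the average F should land near 1; with one mean shifted it should be several times larger, which is the inflation of the numerator described above.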
The ANOVA table
When running an ANOVA, statistical packages will
return an ANOVA table summarizing the SS, MS, df,
F-statistic, and p-value
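Source                        SS                                            df                                            MS                      F                                            Sig
Group (Treatment, between)    $SS_{\text{between}}$                         $df_{\text{between}}$                         $MS_{\text{between}}$   $MS_{\text{between}} / MS_{\text{within}}$   p-value
Residual (Error, within)      $SS_{\text{within}}$                          $df_{\text{within}}$                          $MS_{\text{within}}$
Total                         $SS_{\text{between}} + SS_{\text{within}}$    $df_{\text{between}} + df_{\text{within}}$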
Example
Suppose we want to know if typing
speed varies across majors

Use 4 majors: Biology, Business,
English, and Mathematics

$H_0$: typing speed is the same for
students of all majors
$H_0: \mu_{\text{Bio}} = \mu_{\text{Business}} = \mu_{\text{Eng}} = \mu_{\text{Math}}$

$H_1$: typing speed varies across the
majors
$H_1$: at least one of the means is different
Box plots
Summary
Major         $n_i$   Mean   Variance
Biology       25      45.3   24.7
Business      25      47.6   25.4
English       25      55.6   38.8
Mathematics   25      45.1   20.1
The largest variance is less than twice the smallest
variance ($38.8 < 2 \times 20.1 = 40.2$). Use $\alpha = 0.05$.
Degrees of Freedom
How many groups do we have?

What is the sample size?

Using these values:
What is $df_{\text{within}}$?

What is $df_{\text{between}}$?
Degrees of Freedom
How many groups do we have?
There are k = 4 groups: Biology, English,
Business, and Mathematics

What is the sample size?
There are N = 100 students

Using these values,
What is $df_{\text{between}}$?
$k - 1 = 4 - 1 = 3$
What is $df_{\text{within}}$?
$N - k = 100 - 4 = 96$
Sample Output
Source                        SS        Df   MS       F        Sig
Group (Treatment, between)    1807.49    3   602.50   22.091   0.000
Residual (Error, within)      2618.20   96    27.17
Total                         4425.69   99
Our estimate of $\sigma^2$ is 27.17

The numerator MS = 602.5 and appears
to be highly inflated
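As a rough check, the ANOVA table can be approximately rebuilt from the group sizes, means, and variances in the Summary table. The sketch below uses only those rounded summary values, so it should be expected to agree with the package output only approximately.

```python
# Summary statistics copied from the Summary table (rounded values).
n = [25, 25, 25, 25]
means = [45.3, 47.6, 55.6, 45.1]
variances = [24.7, 25.4, 38.8, 20.1]

k = len(n)
N = sum(n)
grand_mean = sum(ni * m for ni, m in zip(n, means)) / N

# SS_between: weighted squared deviations of group means from the grand mean.
ss_between = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, means))
# SS_within: each group variance times its own degrees of freedom.
ss_within = sum((ni - 1) * v for ni, v in zip(n, variances))

df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Rule of thumb for equal variances: largest variance < 2 * smallest variance.
equal_var_ok = max(variances) < 2 * min(variances)

print(f"SS_between={ss_between:.1f} df={df_between} MS={ms_between:.1f}")
print(f"SS_within={ss_within:.1f} df={df_within} MS={ms_within:.2f}")
print(f"F={f_stat:.2f} equal-variance rule met: {equal_var_ok}")
```

This gives an F of roughly 22.3, close to the 22.091 reported above; the small gap presumably comes from rounding in the summary table.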
Results
F-statistic = 22.1
P-value: <0.0005

Conclusion: the average words per
minute differs for at least one of the
majors

To make stronger statements need to do
further testing
Checking the assumptions
Further Analysis
If $H_0$ is rejected, we conclude that not all
the $\mu$'s are equal

Would like to make statements about
where there are differences

Can use planned or unplanned
comparisons (or contrasts)
Planned comparisons are interesting
comparisons decided on before analysis
Unplanned comparisons occur after seeing the
results
Be careful not to go fishing for results
Contrasts
A simple contrast hypothesis compares
two population means
$H_0: \mu_1 = \mu_5$

A complex contrast hypothesis has
multiple population means on either side
$H_0: (\mu_1 + \mu_2) / 2 = \mu_3$
$H_0: (\mu_1 + \mu_2) / 2 = (\mu_3 + \mu_4 + \mu_5) / 3$
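A contrast can be estimated directly from the group means once each mean is given a coefficient, with the coefficients summing to zero. The sketch below uses the usual standard error formula, the square root of $MS_{\text{within}} \sum_i c_i^2 / n_i$, together with the means and $MS_{\text{within}}$ from the typing-speed example. The particular contrast, the average of Biology and Business against English, is an assumed illustration and not one chosen in the slides.

```python
import math

# Group means and sizes from the typing-speed example; MS_within from its ANOVA table.
means = {"Biology": 45.3, "Business": 47.6, "English": 55.6, "Mathematics": 45.1}
n = {major: 25 for major in means}
ms_within = 27.17
df_within = 96

def contrast(coefs):
    """Return (estimate, standard error, t) for a contrast with the given coefficients."""
    assert abs(sum(coefs.values())) < 1e-12, "contrast coefficients must sum to zero"
    estimate = sum(c * means[g] for g, c in coefs.items())
    se = math.sqrt(ms_within * sum(c ** 2 / n[g] for g, c in coefs.items()))
    return estimate, se, estimate / se

# Hypothetical complex contrast: (mu_Bio + mu_Business) / 2 = mu_Eng
est, se, t = contrast({"Biology": 0.5, "Business": 0.5, "English": -1.0})
print(f"estimate={est:.2f} se={se:.3f} t={t:.2f} on {df_within} df")
```

The t value would then be compared with a t distribution on the within-group degrees of freedom, subject to the conditions for planned or unplanned comparisons.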


Planned Comparisons
Most statistical packages allow you to
enter custom planned contrast
hypotheses

The p-values are only valid under strict
conditions
The conditions maintain Type-1 error rate
across the whole experiment

Computer packages assume that you
have checked the assumptions of the
ANOVA test
Conditions for Planned Comparisons
Contrasts are selected before looking at the
results; they are planned, not post-hoc

Must be ignored if the overall null is not rejected!

Each contrast is based on independent
information from other contrasts

The number of planned comparisons must not be
more than the corresponding degrees of freedom
(k-1 in one-way ANOVA)
Unplanned Comparisons
What if we notice a possible interesting
difference when looking at the results?

Can do comparisons but need to adjust
the $\alpha$-level to control for Type-1 error

One common method is to use Tukey's
simultaneous confidence intervals to
compare any and all pairs of group
population means
This procedure takes multiple comparisons
into consideration to preserve the $\alpha$ level
Other Options
Bonferroni correction for the number of
comparisons done

Dunnett's test

Scheffé procedure
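As a small illustration of the Bonferroni idea, the overall $\alpha$ is divided by the number of comparisons made. The sketch assumes that all pairwise comparisons among the four majors of the typing example are tested at an overall $\alpha$ of 0.05.

```python
from itertools import combinations

majors = ["Biology", "Business", "English", "Mathematics"]
alpha = 0.05

pairs = list(combinations(majors, 2))  # every pairwise comparison
alpha_per_test = alpha / len(pairs)    # Bonferroni-adjusted level for each test

print(len(pairs), round(alpha_per_test, 4))
```

Each individual comparison would then be judged against the smaller per-test level rather than 0.05.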
Tukey's Multiple Comparisons for
previous example
Conclusions
In the table on the previous page,
1 = Biology, 2 = Business, 3 = English,
4 = Mathematics

Biology, Business, and Mathematics are
all significantly different from English

There are no other significant
differences

Additional sample output
Below is the same output from a
different software package

Comparison to Regression
Sample regression output
Which major is our baseline?
Comparison to Regression
F-statistic = 22.1, p-value < 0.0005
This is the same F-statistic and p-value as the
ANOVA in the Sample Output table

At least one of the explanatory variables
is important; this corresponds to the
rejection of the null: at least one of the
means differs
Comparison to Regression
Note that Biology is the baseline and 45.3 is the
mean WPM for Biology students

Note that Business and Mathematics are not
significant

Agrees with post-hoc comparisons that neither
Business nor Mathematics is significantly different
from Biology, but English is

To make further conclusions will need to look at
multiple comparisons, such as the previous
Tukey intervals
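The link between the regression coefficients and the group means can be sketched as follows. With indicator (dummy) coding and Biology as the baseline, the fitted intercept equals the Biology sample mean and each remaining coefficient equals that major's mean minus the Biology mean. The values below are derived from the rounded means in the Summary table, not copied from the regression output.

```python
# Group means from the Summary table of the typing-speed example.
means = {"Biology": 45.3, "Business": 47.6, "English": 55.6, "Mathematics": 45.1}
baseline = "Biology"

# Intercept = baseline mean; each coefficient = that group's mean minus the baseline mean.
intercept = means[baseline]
coefs = {major: round(m - intercept, 1) for major, m in means.items() if major != baseline}

print(intercept, coefs)
```

Each coefficient's test therefore compares one major with Biology only, which is why comparisons that do not involve the baseline still need contrasts or Tukey intervals.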

Regression
The conclusions about the overall null
hypothesis will be the same

In regression can make statements
comparing groups to baseline

To make more conclusive statements
will need to do more analysis

ANOVA and either planned or post-hoc
comparisons will do the same thing and
is often easier
One-way ANOVA Power
Two different SAT prep courses charge
$1200 for a two month course. An
(unethical) experiment would be to
randomize students into one of the two
courses or take no course

What information is needed to calculate
power for this one-way ANOVA?
Sample size
Within group variance ($\sigma^2$)
Estimated or minimally interesting outcome
means for each group
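Estimate of $\sigma^2$
Based on previous years, we know that
95% of the student scores on SATs fall
between 900 and 1500

About 95% of a normal distribution lies within 2 standard
deviations of the mean, so this range spans about $4\sigma$

$\sigma = (1500 - 900)/4 = 150$

$\sigma^2 = 150^2$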

Minimally interesting outcome
What is the minimal average benefit,
in points gained, that would justify the
program?
The minimally interesting outcome is based
on previous knowledge

For this example we'll try several
different values



sd[treatment]
Different applets will define things slightly
differently. Find an applet you understand.

For the applet I will show you, they require
sd[treatment]. From their definition this is
calculated as

$$\text{sd[treatment]} = \sqrt{\frac{\sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}{k - 1}}$$

Where $\mu_i$ is the $i$th group mean, $\bar{\mu}$ is the mean of the group means, and
k = the number of groups

Ready to go to power applet
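The sd[treatment] quantity can be computed with a few lines of code. This is a minimal sketch: the group means are hypothetical (a no-course group at an arbitrary 1200 and two courses that each add 50 points), since the slides do not say how the effect sizes were turned into group means.

```python
import math

def sd_treatment(group_means):
    """Square root of sum_i (mu_i - mu_bar)^2 / (k - 1) over the k group means."""
    k = len(group_means)
    mu_bar = sum(group_means) / k
    return math.sqrt(sum((mu - mu_bar) ** 2 for mu in group_means) / (k - 1))

# Hypothetical means: no course, course A (+50 points), course B (+50 points).
hypothetical_means = [1200, 1250, 1250]
sd_trt = sd_treatment(hypothetical_means)
print(round(sd_trt, 2))
```

Only the spread of the means matters here, so changing the 1200 baseline leaves the result unchanged.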
Calculating the power
Let $\sigma$ = 150, n = 50, effect = 50 points
Power = 0.3811

Let $\sigma$ = 150, n = 100, effect = 50 points
Power = 0.6772

Let $\sigma$ = 150, n = 50, effect = 100 points
Power = 0.9367

Let $\sigma$ = 150, n = 50, effect = 25 points
Power = 0.1245

Calculating the power
Let $\sigma$ = 100, n = 50, effect = 50 points
Power = 0.7276

Let $\sigma$ = 100, n = 100, effect = 50 points
Power = 0.9622

Let $\sigma$ = 100, n = 50, effect = 100 points
Power = 0.997

Let $\sigma$ = 100, n = 50, effect = 25 points
Power = 0.2294

Moving past One-way ANOVA
What if we have two categorical explanatory
variables?

What if we have categorical and quantitative
explanatory variables?

What if subjects have more than one treatment?

What if there is more than one response
variable?

And many other combinations
Two-way ANOVA
Suppose we now have two categorical
explanatory variables

Is there a significant $X_1$ effect?
Is there a significant $X_2$ effect?
Are there significant interaction effects?

If $X_1$ has k levels and $X_2$ has m levels,
then the analysis is often referred to as
a k by m ANOVA or k x m ANOVA

Terminology
If the interaction is significant, the
model is called an interaction model

If the interaction is not significant, the
model is called an additive model

Explanatory variables are often referred
to as factors
Assumptions
The assumptions are the same as in
One-way ANOVA

1. The errors $\varepsilon_{ij}$ are normally distributed

2. Across the conditions, the errors have equal
spread. Often referred to as equal variances.

3. The errors are independent from each other
Two-way ANOVA
Two-way (or multi-way) ANOVA is an
appropriate analysis method for a study
with a quantitative outcome and two (or
more) categorical explanatory variables.

The assumptions are Normality, equal
variance, and independent errors.
Results
Results are again displayed in an ANOVA
table

Will have one line for each term in the
model. For a model with two factors, we
will have one line for each factor and
one line for the interaction. We will also
have a line for the error and the total.

See next page.
The ANOVA table
Source        SS   df           MS   F   Sig
Factor 1           k-1
Factor 2           m-1
Interaction        (k-1)(m-1)
Error              N-k*m        *
Total              N-1
The MS(error), denoted by * in the above table, is
the true estimate of $\sigma^2$

The MS in each row is that row's SS/df

The F-statistic for each factor or interaction row is that row's MS/MS(error)
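The degrees of freedom in the table can be tallied with a short helper. The sketch below assumes the 4 majors and 100 students of the typing example together with a hypothetical second factor with 2 levels; the function name is made up for illustration.

```python
def two_way_df(k, m, N):
    """Degrees of freedom for each row of a k by m ANOVA table with interaction."""
    df = {
        "Factor 1": k - 1,
        "Factor 2": m - 1,
        "Interaction": (k - 1) * (m - 1),
        "Error": N - k * m,
        "Total": N - 1,
    }
    # The model rows plus the error row must add up to the total.
    assert df["Factor 1"] + df["Factor 2"] + df["Interaction"] + df["Error"] == df["Total"]
    return df

df = two_way_df(k=4, m=2, N=100)
print(df)
```

Dropping a non-significant interaction moves its degrees of freedom into the error row.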
Exploratory Analysis
Table of means

Interaction or profile plots
An interaction plot is a way to look at
outcome means for two factors
simultaneously
A plot with parallel lines suggests an additive
model
A plot with non-parallel lines suggests an
interaction model
Note that an interaction plot should NOT be
the deciding factor in whether or not to run
an interaction model
Example
Continuing with the previous example,
suppose we'd like to add gender as an
explanatory variable

$X_1$: Major, 4 levels
$X_2$: Gender, 2 levels

Response: words per minute typed

We will fit a 4 by 2 ANOVA
Table of Means and Counts
Means         Male   Female   Overall
Biology       45.5   45.2     45.4
Business      48.6   46.9     47.6
English       55.3   55.9     55.6
Mathematics   45.6   44.6     45.1
Overall       48.9   47.9     48.4

Counts        Male   Female
Biology       14     11
Business      10     15
English       14     11
Mathematics   12     13

Note: this table should also include the standard
error of each of the means.
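Because the cells have unequal counts, the "Overall" entries are count-weighted averages of the cell means rather than simple averages. The sketch below recomputes them from the two tables above and also lists the male minus female gap within each major, which gives a rough numeric preview of what an interaction plot would show.

```python
majors = ["Biology", "Business", "English", "Mathematics"]
# (male, female) cell means and cell counts copied from the tables above.
cell_means = {"Biology": (45.5, 45.2), "Business": (48.6, 46.9),
              "English": (55.3, 55.9), "Mathematics": (45.6, 44.6)}
cell_counts = {"Biology": (14, 11), "Business": (10, 15),
               "English": (14, 11), "Mathematics": (12, 13)}

def weighted_mean(values, weights):
    """Average of values weighted by the number of subjects behind each one."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Overall mean per major (row margins).
row_means = {mj: round(weighted_mean(cell_means[mj], cell_counts[mj]), 1) for mj in majors}
# Overall mean per gender (column margins).
male_mean = round(weighted_mean([cell_means[mj][0] for mj in majors],
                                [cell_counts[mj][0] for mj in majors]), 1)
female_mean = round(weighted_mean([cell_means[mj][1] for mj in majors],
                                  [cell_counts[mj][1] for mj in majors]), 1)
# Male minus female difference within each major.
gaps = {mj: round(cell_means[mj][0] - cell_means[mj][1], 1) for mj in majors}

print(row_means, male_mean, female_mean)
print(gaps)
```

The gaps are all small and fairly similar across majors, which would show up as roughly parallel lines; as noted elsewhere in these slides, the plot alone should not decide whether to fit an interaction.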
Interaction plots
Interaction plots
There are two ways to do an interaction plot.
Both are legitimate. Ease of interpretation is the
final criterion for which to use.

If one explanatory variable has more levels than
the other, interpretation is often easier if the
explanatory variable with more levels defines the
x-axis

If one explanatory variable is quantitative but
has been categorized and the other is
categorical, interpretation is often easier if the
categorized quantitative variable defines the x-axis.
Example: age, 20-29, 30-39, 40-49, etc.
Results
Typical output:






The last column contains the p-values
Always check interaction first!
If the interaction is not significant, rerun
without it
Results
Updated results







Now we can interpret the main effects. We can
see that major is significant but that gender is
not.
Checking the assumptions
Notes
If the interaction is significant, do not
check the main effects. The main
effects should always be kept if the
interaction is significant.

Note that due to the groups of students,
you will see vertical lines in the residual
versus predicted plot. This is due to the
fact that all students with a particular
combination of the factors will have the
same predicted value.
Example 2
Using the same variables, let's look at a
different outcome
Table of Means: Example 2
              Male   Female   Overall
Biology       37.9   45.8     41.2
Business      39.9   45.0     43.0
English       45.3   60.0     51.8
Mathematics   41.8   50.0     46.1
Overall       41.3   49.8     51.2
Typical SPSS Exploratory Analysis
Interaction plots: Example 2
Results: Example 2
Results






Note that the interaction is significant
In this case both main effects are also
significant, however since the interaction is
significant we would keep them even if they
were not
Example 2
Example 2
Example 3
Again, using the same variables, let's
look at a different outcome
Table of Means: Example 3
              Male   Female   Overall
Biology       47.9   47.2     47.6
Business      50.2   48.1     49.0
English       54.8   62.1     58.1
Mathematics   52.0   48.4     50.1
Overall       51.3   51.1     58.0
Interaction Plots: Example 3
Results: Example 3
Results






In this case, the interaction and major
are significant, but gender is not.
Since the interaction is significant, leave
gender in the model.
Example 3
Example: Ginkgo for Memory
A study was performed to test the
memory effects of the herbal medicine
Ginkgo biloba in healthy people.
Subjects received a daily dosage
(placebo, 120mg, 250mg) for two
months. Subjects also received one of
two types of mnemonic training. All
subjects were given a memory test
before the study and again at the end.
The response variable is the difference
(after - before) in memory test scores.
There were 18 subjects randomly
assigned to each combination of levels.
Exploratory Analysis
Exploratory Analysis

SPSS ANOVA output
Conclusions?
ANOVA output
Conclusions?
Estimated Profile Plot
Post-hoc Comparisons
Since there are only two levels of
training and there is a significant
training effect, we don't need multiple
comparisons for training
Residual plot
No problems
Further Analysis
If there had been an interaction, we
could create a table indicating which
differences were significant
ANCOVA
Analysis of Covariance

At least one quantitative and one categorical
explanatory variable

In general, the main interest is the effects of
the categorical variable and the quantitative
variable is considered to be a control variable

It is a blending of regression and ANOVA
Example
Suppose that we have two different
math tutors and would like to compare
performance on the final math test

We also have time on tutor and would
like to use that as another explanatory
variable
Exploratory Analysis
Compare Regression and ANCOVA
Regression






ANCOVA

Compare Regression and ANCOVA
Note that the p-value for the interaction
is the same in both models

The interaction is not significant, so drop
it and rerun
Compare Regression and ANCOVA
Regression







ANCOVA