Beruflich Dokumente
Kultur Dokumente
Analysis of Variation
Math 243 Lecture
R. Pruim
[7.25]
[8.875]
[10.11]
Informal Investigation
Graphical investigation:
side-by-side box plots
multiple histograms
Whether the differences between the groups are
significant depends on
the difference in the means
the standard deviations of each group
the sample sizes
ANOVA determines P-value from the F statistic
days
10
9
8
7
6
5
A
treatment
Assumptions of ANOVA
each group is approximately normal
check
Normality Check
We should check for normality using:
assumptions about population
histograms for each group
normal quantile plot for each group
With such small data sets, there really isnt a
really good way to check normality from data,
but we make the common assumption that
physical measurements of people tend to be
normally distributed.
treatment
A
B
P
N
8
8
9
Mean
7.250
8.875
10.111
Median
7.000
9.000
10.000
StDev
1.669
1.458
1.764
(x i x )
(x
xi )
ij
Between MSG
F=
=
Within
MSE
A large F is evidence against H0, since it
indicates that there is more difference
between groups than within groups.
F
6.45
P
0.006
R ANOVA Output
treatment
Residuals
( xij xi )
SUMMARY
Groups
Column 1
Column 2
Column 3
Count
6.884
df
MS
F
P-value F crit
2 2.563667 10.21575 0.008394 4.737416
7 0.250952
9
group
mean
6.00
6.00
6.00
5.95
5.95
5.95
5.95
7.53
7.53
7.53
WITHIN
difference:
data - group mean
plain
squared
-0.70
0.490
0.00
0.000
0.70
0.490
-0.45
0.203
0.25
0.063
0.45
0.203
-0.25
0.063
-0.03
0.001
-0.33
0.109
0.37
0.137
1.757
0.25095714
BETWEEN
difference
group mean - overall mean
plain
squared
-0.4
0.194
-0.4
0.194
-0.4
0.194
-0.5
0.240
-0.5
0.240
-0.5
0.240
-0.5
0.240
1.1
1.188
1.1
1.188
1.1
1.188
5.106
2.55275
F = 2.5528/0.25025 = 10.21575
F
6.45
P
0.006
(x
ij
obs
xi )
(x
ij
x)
obs
F
6.45
(x
obs
P
0.006
x)
F = MSG / MSE
(P-values for the F statistic are in Table E)
F
6.45
P
0.006
P-value
comes from
F(DFG,DFE)
So How big is F?
Since F is
(
=
ij
n 1
SST
=
= MST
DFT
(
x
xi )
ij
ni 1
2
i
2
2
2
I
SSE
2
sp =
= MSE
DFE
so MSE is the
pooled estimate
of variance
In Summary
SST = (x ij x ) = s (DFT)
2
obs
SSE = (x ij x i ) = si (df i )
2
obs
groups
SSG = (x i x) =
2
obs
n (x
i
x)
groups
SS
MSG
SSE +SSG = SST; MS =
; F=
DF
MSE
R2 Statistic
R2 gives the percent of variance due to between
group variation
SS[Between ] SSG
R =
=
SS[Total ]
SST
2
Level
A
B
P
N
8
8
9
Pooled StDev =
Mean
7.250
8.875
10.111
1.641
StDev
1.669
1.458
1.764
F
6.45
P
0.006
Multiple Comparisons
Once ANOVA indicates that the groups do not all
have the same means, we can compare them two
by two using the 2-sample t test
We need to adjust our p-value threshold because we
are doing multiple tests with the same data.
There are several methods for doing this.
If we really just want to test the difference between one
pair of treatments, we should set the study up that way.
95% confidence
Family error rate = 0.0500
Individual error rate = 0.0199
-3.685
0.435
-4.863
-0.859
Only P vs A is significant
(both values have same sign)
Tukeys Method in R
Tukey multiple comparisons of means
95% family-wise confidence level
diff
lwr
upr
B-A 1.6250 -0.43650 3.6865
P-A 2.8611 0.85769 4.8645
P-B 1.2361 -0.76731 3.2395