Beruflich Dokumente
Kultur Dokumente
3]
5. 1. Basic concepts
If there are more than 2 treatments the problem is to determine which means
are significantly different. This process can take 2 forms.
*
5. 2. Error rates
Selection of the most appropriate mean separation test is heavily influenced
by the error rate. Recall that a Type I error occurs when one incorrectly
rejects a null true H0.
Example
An experimenter conducts 100 experiments with 5 treatments each.
In each experiment, there are 10 possible pairwise comparisons:
Total possible pairwise comparisons:
t (t 1)
2
COMPUTATION OF EER
The EER is difficult to compute because Type I errors are not independent.
But it is possible to compute an upper bound for the EER by assuming that
the probability of a Type I error in any comparison is and is independent
of the other comparisons. In this case:
45
Partial null hypothesis: Suppose there are 10 treatments and one shows a
significant effect while the other 9 are equal. ANOVA will reject Ho.
x
Yi.
x Y..
There is a probability of
making a Type I error between
each of the 9 similar treatments
The upper bound EER
Treatment number
10
setting t = 9 p=9(9-1)/2=36
EER=0.84
Examples
Table 4.1. Results (mg dry weight) of an experiment (CRD) to determine
the effect of seed treatment by acids on the early growth of rice seedlings.
Treatment
Replications
Control
HCl
Propionic
Butyric
4.23
3.85
3.75
3.66
4.38
3.78
3.65
3.67
4.1
3.91
3.82
3.62
3.99
3.94
3.69
3.54
4.25
3.86
3.73
3.71
Overall
Total
Mean
Yi.
Yi.
20.95
19.34
18.64
18.2
4.19
3.87
3.73
3.64
19
1.0113
Treatment
Exp. error
3
16
0.8738
0.1376
Mean
Squares
0.2912
0.0086
F
33.87
N
1.21
1.34
1.45
1.31
1.19
1.41
1.45
1.32
1.17
1.38
1.51
1.28
1.23
1.29
1.39
1.35
1.29 1.14
1.36 1.42 1.37
1.44
1.41 1.27 1.37
1.32
Overall
Total
6 7.23
8 10.89
5 7.24
7 9.31
26 34.67
Mean
1.20
1.36
1.45
1.33
1.33
df
25
3
22
Sum of
Squares
0.2202
0.1709
0.0493
Mean
Squares
0.05696
0.00224
25.41
EER under the complete null hypothesis: all population means are equal
EER under a partial null hypothesis: some means are equal but some
differ.
SAS subdivides the error rates into:
CER = comparison-wise error rate
EERC = experiment-wise error rate under complete null hypothesis (the
standard EER)
EERP = experiment-wise error rate under a partial null hypothesis.
MEER = maximum experiment-wise error rate under any complete or
partial null hypothesis.
Statistical methods for making two or more inferences while controlling the
probability of making at least one Type I error are called simultaneous
inference methods:
Multiple comparison techniques divide themselves into two groups:
Fixed-range tests: Those which can provide confidence intervals and
tests of hypothesis. One range for testing all differences in balanced
designs
Multiple-stage tests: Those which are essentially only tests of
hypothesis. Variable ranges.
5. 3. 1. Fixed-range tests
These tests provide one range for testing all differences in balanced
designs and can provide confidence intervals.
Many fixed-range procedures are available and considerable controversy
exists as to which procedure is most appropriate.
We will present four commonly used procedures starting from the less
conservative (high power) to the more conservative (lower power):
LSD
Dunnett
Tukey
Scheffe.
MSE
2
for equal r (SAS: LSD test)
r
Mean
4.19
3.87
3.73
3.64
LSD
a
b
c
c
MSE (
1 1
) for unequal r (SAS: repeated t test)
r1 r2
Feed-C
1.33 b
Feed-A
1.36 b
Feed-B
1.45 a
The 5% LSD for comparing the Control vs. Feed-B (1.45-1.20=0.25) is,
1
6
1
5
LSD 0.025= 2.074 0.00224( ) = 0.0595, and | Y i - Y i' | > 0.0595 Reject H0
Because of the unequal replication, the LSD values and the length of the
confidence intervals vary among different pairs of mean comparisons
A vs. Control
A vs. B
A vs. C
B vs. C
C vs. Control
= 0.0531
= 0.0560
= 0.0509
= 0.0575
= 0.0546
General considerations
The LSD test is much safer when the means to be compared are selected
in advance of the experiment.
It is primarily intended for use when there is no predetermined structure
to the treatments (e.g. in variety trials).
The LSD test is the only test for which the error rate equals the
comparison wise error rate. This is often regarded as too liberal (i.e. to
ready to reject the null hypothesis).
It has been suggested that the EEER can be held to the level by
performing the overall ANOVA test at the level and making further
comparisons only if the F test is significant (Fisher's Protected LSD
test). However, it was then demonstrated that this assertion is false if
there are more than three means.
5. 3. 1. 2. Dunnett's Method
Compare a control with each of several other treatments
Dunnett's test holds the maximum EER under any complete or partial
null hypothesis (MEER) < .
MSE
2
, for equal r
r
MSE (
1 1
) for unequal r
r1 r2
From Table 4-1, MSE = 0.0086 with 16 df and p= 3, t*/2=2.59 (Table A9)
2
5
0.152= least significant difference between control and any other treatment.
Control - HC1= 4.19 - 3.87= 0.32. Since 0.32>0.152 (DLSD) Significant!
The 95% simultaneous confidence intervals for all three differences are
computed as = Y o - Y i DLSD. The limits of these differences are,
Control
Control
Control
butyric
HC1
propionic
=
=
=
0.32 0.15
0.46 0.15
0.55 0.15
That is, we have 95% confidence that the three true differences fall
simultaneously within the above ranges.
For unequal replication: Table 5-1
To compare control with feed-C, t*0.025, 22 , p=3= 2.517 (from SAS)
1
6
1
7
5. 3. 1. 3. Tukey's w procedure
Tukey's test was designed for all possible pairwise comparisons.
The test is sometimes called "honestly significant difference test" HSDT
It controls the MEER when the sample sizes are equal.
It uses a statistic similar to the LSD but with a number q,(p, MSE df) that is
obtained from Table A8 (distribution (Y MAX Y MIN ) / sY )
w = q,(p, MSE df)
MSE
r
MSE (
1 1
) / 2 for unequal r (SAS manual)
r1 r2
0.0086
= 0.168 (Note that w= 0.168 > DLSD= 0.152)
5
Tukey critical value is larger than that of Dunnett because the Tukey family
of contrasts is larger (all pairs of means).
Table 4.1
Treatment
Control
HCl
Propionic
Butyric
Mean
4.19
3.87
3.73
3.64
a
b
b c
c
For unequal r, as in Table 5.1, the contrast between control with feed-C,
1
6
1
7
5. 3. 1. 4. Scheffe's F test
Scheffe's test is compatible with the overall ANOVA F test in that it
never declares a contrast significant if the overall F test is nonsignificant.
Scheffe's test controls the MEER for ANY set of contrasts including
pairwise comparisons.
Since this procedure allows for more kinds of comparisons, it is less
sensitive in finding significant differences than other pairwise
comparison procedures.
For pairwise comparisons with equal r, the Scheffe's critical difference
SCD has a similar structure as that described for previous tests.
2
for equal r
r
MSE
MSE (
1 1
) for unequal r
r1 r2
From Table 4-1, MSE = 0.0086 with dfTR= 3, dfMSE= 16, and r= 5
SCD 0.05= 3 * 3.24
Treatment
Control
HCl
Propionic
Butyric
Mean
4.19
3.87
3.73
3.64
0.0086
2
= 0.183 (Note that SCD= 0.183 > w= 0.168)
5
a
b
b c
c
1 1
3 * 3.05 0.00224( ) = 0.0796
6 7
Y i. with ci 0 (or
We will reject the hypothesis (Ho) that the contrast Q= 0 if the absolute
value of Q is larger than a critical vale FS.
This is the general form for Scheffe's F test:
Critical value FS = df TR F (df TR , df MSE )
ci2
MSE
ri
ci2
2/ r .
ri
Example
If we want to compare the control vs. the average of the three acids in Table
4.1, the contrast coefficients +3 -1 -1 1 are multiplied by the MEANS.
Q= 4.190 * 3 + 3.868 * (-1) + 3.728 * (-1) + 3.640 * (-1)= 1.334
Critical value:
FS 0.05, (3, 16) = 3 * 3.24 0.0086(32 ( 1) 2 ( 1) 2 ( 1) 2 ) / 5 = 0.4479
Q> Fs, therefore we reject Ho. The control (4.190-mg) is significantly
different from the average of the three acid treatments (3.745-mg).
11
5. 3. 2. Multiple-stage tests
Multiple range tests should only be used with balanced designs since they are
inefficient with unbalanced ones.
Smallest
Y1
p =2
p =3
Y2
p =2
p =4
Y3
p =2
p =3
Y4
Largest mean
12
MSE
r
p= t, t-1, ..., 2
3.0
3.65
4.05
Wp
Treatment
Control
HCl
Propionic
Butyric
Mean
4.19
3.87
3.73
3.64
Wp
a
b
c
c
Assume the following partial null hypothesis (means from smallest to largest)
Y 1 Y 2 Y 3 Y 4 Y 5 Y 6 Y 7 Y 8 Y 9 Y 10
There are 5 NS independent pairs compared by LSD
EERP=1-(1-0.05)5=0.23 Higher EERP than Tukey!
13
p<3
In Table 4.1
q(p; p, dfMSE)
MSE
r
2
0.0253
3.49
(p, 16)
Critical value
4
0.05
3.65
0.05
4.05
0.145
0.151
0.168
>SNK
=SNK
=SNK
Mean
4.19
3.87
3.73
3.64
F5
a
b
b c
c
14
Data
10, 17
10, 11, 12, 12, 13, 14, 15, 16, 16, 17, 18
14, 15, 16, 16, 17, 18, 19, 20, 20, 21, 22
16,21
Mean
13.5
14.0
18.0
18.5
NS
15