6.0 ANOVA AND EXPERIMENTAL DESIGN
e.g. We use ANOVA to compare the chlorophyll contents of 4 kinds of leaves: rambutan,
durian, ciku and cempedak.
Hypotheses:
Ho: μ1 = μ2 = μ3 = μ4 (no difference among the mean chlorophyll contents)
Ha: At least one mean is different.
In this case we use a Completely Randomized Design.
6.1 Completely Randomized Design
e.g. In the example above, for each tree we analyse 3 leaves, each of them taken at
random.
6.1.1 Running ANOVA
Arrange the data as follows:
_______________________________________________________
Treatment/                   Replicate
Population      1      2      3      ........    Total
_______________________________________________________
1               Y11    Y12    Y13    ........    T1
2               Y21    Y22    Y23    ........    T2
3               Y31    Y32    Y33    ........    T3
.               .      .      .                  .
t               Yt1    Yt2    Yt3    ........    Tt
_______________________________________________________
Biostatistics Notes: ANOVA
Grand total: G = T1 + T2 + ........ + Tt
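6.1.2 Calculations
TSS = Total Sum of Squares
    $= \sum_i \sum_j Y_{ij}^2 - G^2/n$
SSB = Sum of Squares Between Treatments
    $= \sum_i T_i^2/n_i - G^2/n$
SSW = Sum of Squares Within Treatments
    $= \mathrm{TSS} - \mathrm{SSB}$
Here n_i is the number of observations for treatment i and n is the total number of observations.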
ANOVA Table:
__________________________________________________________________
Source of Variation    SS     df       MS (= SS/df)          F
__________________________________________________________________
Between Populations    SSB    t − 1    sB² = SSB/(t − 1)     F = sB²/sW²
Within Populations     SSW    n − t    sW² = SSW/(n − t)
Total                  TSS    n − 1
__________________________________________________________________
6.2 Example for Completely Randomized Design
A pharmaceutical company wants to test the effectiveness of its new drug against
aspirin in curing headaches. A placebo was also used in the test. The variable
measured was the number of hours for which the ‘patients’ were relieved of their
headaches. The data collected are shown below. Are the three treatments similarly
effective in relieving a person of headaches?
______________________________________________________
Treatment     Observations (hours)              Total
______________________________________________________
Placebo       5.0     2.0     3.5               10.50
Drug          0.33    0.25    1.0     0.5        2.08
Aspirin       0.25    1.33    0.5     0.5        2.58
______________________________________________________
                                            G = 15.16
Analysis:
Number of treatments = 3
Sample size: n1 = 3, n2 = 4, n3 = 4, n = 11
TSS $= \sum_i \sum_j Y_{ij}^2 - G^2/n$
    $= (5.0)^2 + (2.0)^2 + \dots + (0.5)^2 + (0.5)^2 - (15.16)^2/11$
    $= 45.00 - 20.89$
    $= 24.11$
SSB $= \sum_i T_i^2/n_i - G^2/n$
    $= \left[ (10.5)^2/3 + (2.08)^2/4 + (2.58)^2/4 \right] - 20.89$
    $= 39.50 - 20.89$
    $= 18.61$
SSW $= \mathrm{TSS} - \mathrm{SSB}$
    $= 24.11 - 18.61$
    $= 5.50$
ANOVA Table
______________________________________________________
Source of Variation     SS       df    MS      F
______________________________________________________
Between Populations     18.61    2     9.31    13.49
Within Populations      5.50     8     0.69
______________________________________________________
Total                   24.11    10
______________________________________________________
F0.05, 2, 8 = 4.46
Since F(=13.49) is larger than F0.05 (= 4.46) we reject Ho and conclude that at least one of the
means is different. Thus the three treatments are not similarly effective in relieving a person
of headaches.
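The calculations of Section 6.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `one_way_anova` and the variable `headache` are hypothetical, and the code simply applies the TSS, SSB and SSW formulas of Section 6.1.2 to the data in the table above.

```python
def one_way_anova(groups):
    """Sums of squares and F ratio for a Completely Randomized Design.

    groups is a list of lists, one inner list of observations per treatment.
    """
    n = sum(len(g) for g in groups)            # total number of observations
    t = len(groups)                            # number of treatments
    grand_total = sum(sum(g) for g in groups)  # G
    correction = grand_total ** 2 / n          # G^2 / n

    tss = sum(y ** 2 for g in groups for y in g) - correction
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ssw = tss - ssb

    msb = ssb / (t - 1)                        # mean square between
    msw = ssw / (n - t)                        # mean square within
    return {"TSS": tss, "SSB": ssb, "SSW": ssw,
            "df_between": t - 1, "df_within": n - t,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Placebo, new drug, aspirin (hours), as in the data table of Section 6.2
headache = [[5.0, 2.0, 3.5],
            [0.33, 0.25, 1.0, 0.5],
            [0.25, 1.33, 0.5, 0.5]]

result = one_way_anova(headache)
for key, value in result.items():
    print(key, round(value, 2))
```

Because the code does not round the intermediate values, its F ratio can differ from the hand calculation in the second decimal place.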
6.3 Randomized Block Design
Example: For the chlorophyll study that was mentioned at the beginning of the chapter, let
us assume that there is an external factor that may influence our results. Let us assume that
the analysis of the chlorophyll takes a long time to complete and that we can only manage 4
analyses per day. In this case we cannot just do leaves A (rambutan) on the first day, B
(durian) on the second day and so on. This is because the experimental conditions may not be
similar on each day. Thus the factor ‘day’ may add to the variation in our results. This
‘unwanted’ variation can be blocked (or removed) by properly designing our experiment. In
this case we should analyse one replicate from each kind of leaf each day. The design is
shown below:
________________________________________________
Day          Sample Analysed
________________________________________________
1            A1      B1      C1      D1
2            A2      B2      C2      D2
3            A3      B3      C3      D3
4            A4      B4      C4      D4
________________________________________________
For this design A represents rambutan leaves, B represents durian leaves, C represents ciku
leaves, and D represents cempedak leaves. With this design we are able to block the effect of
the day. As an example, suppose the experimental conditions on day 2 favour higher values;
this will then be felt by all 4 kinds of leaves. Thus in terms of differences among the
populations, the effect of day has been removed. Results of a Block Design should be arranged
as follows:
_______________________________________________________________
Treatments                  Block (Day)                 Total
(Population)     1       2       ..........    b        (Treatments)
_______________________________________________________________
1                Y11     Y12     ..........    Y1b      T1
2                Y21     Y22     ..........    Y2b      T2
.                .       .                     .        .
.                .       .                     .        .
t                Yt1     Yt2     ..........    Ytb      Tt
_______________________________________________________________
Total (Block)    B1      B2      ..........    Bb       G
_______________________________________________________________
Yij : Observation for Population i and Block j
t : Number of Populations or treatments
b : Number of Blocks ( eg no. of days)
n : Total number of observations = bt
Ti : Total for Treatments i
Bj : Total for Block j
G : Grand Total
Calculations
TSS $= \sum_i \sum_j Y_{ij}^2 - G^2/n$
SST = Sum of Squares for Treatments
    $= \sum_i T_i^2/b - G^2/n$
SSB = Sum of Squares for Blocks
    $= \sum_j B_j^2/t - G^2/n$
SSE = Sum of Squares for Error
    $= \mathrm{TSS} - \mathrm{SST} - \mathrm{SSB}$
ANOVA Table
__________________________________________________________________
Source of Variation    SS     df                MS     F
__________________________________________________________________
Treatments             SST    t − 1             MST    MST/MSE
Blocks                 SSB    b − 1             MSB    MSB/MSE
Error                  SSE    (b − 1)(t − 1)    MSE
__________________________________________________________________
Total                  TSS    bt − 1
__________________________________________________________________
6.3.1 Example for Randomized Block Design
We want to study the effect of 3 kinds of insecticides on the growth of string beans.
For the experiment we were provided with 4 plots in which to plant the string beans. Since
we feel that the soil conditions of the plots may differ, we need to design our experiment
such that the effects of the different soil conditions will be blocked or removed. In
this case we make 3 rows of bunds in each plot, maintaining a suitable distance between
the rows. Each row is planted with 100 seeds of string beans and then maintained
under the insecticide assigned to that row. The insecticides were randomly assigned
to the rows within a plot so that each insecticide appeared in one row in all 4 plots.
The response of interest was the number of seedlings that emerged per row. The data
obtained are shown below:
______________________________________________________
                          Plot
Insecticide     1      2      3      4      Total
______________________________________________________
1               56     49     65     60     230
2               84     78     94     93     349
3               80     72     83     85     320
______________________________________________________
Total           220    199    242    238    899
______________________________________________________
$G^2/n = 899^2/12 = 67{,}350.08$
TSS $= 56^2 + 49^2 + \dots + 85^2 - 67{,}350.08 = 2334.92$
SST $= \sum_i T_i^2/b - G^2/n$
    $= (230^2 + 349^2 + 320^2)/4 - 67{,}350.08 = 1925.17$
SSB $= \sum_j B_j^2/t - G^2/n$
    $= (220^2 + 199^2 + 242^2 + 238^2)/3 - 67{,}350.08 = 386.25$
SSE $= \mathrm{TSS} - \mathrm{SST} - \mathrm{SSB}$
    $= 2334.92 - 1925.17 - 386.25 = 23.50$
ANOVA Table
__________________________________________________________________
Source of Variation    SS         df    MS        F
__________________________________________________________________
Treatments             1925.17    2     962.59    962.59/3.92 = 245.56
Blocks                 386.25     3     128.75    128.75/3.92 = 32.84
Error                  23.50      6     3.92
Total                  2334.92    11
__________________________________________________________________
The F test for treatments, with the hypotheses
Ho: μ1 = μ2 = μ3 (no difference among treatment means)
Ha: at least one of the treatment means is different
makes use of the F statistic MST/MSE. Since the computed value of F (= 245.56) is greater
than the tabulated F value (F0.05, 2, 6 = 5.14), we reject Ho and conclude that at
least one of the treatment means is different.
The test for blocks, with the hypotheses
Ho: μ1 = μ2 = μ3 = μ4 (no difference among block means)
Ha: at least one of the block means is different
makes use of the F statistic MSB/MSE. Since the computed value of F (= 32.84) is greater
than the tabulated F value (F0.05, 3, 6 = 4.76), we reject Ho and conclude that at
least one of the block means is different.
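The block-design arithmetic of Section 6.3.1 can be checked with a short Python sketch. The function name `randomized_block_anova` and the variable `seedlings` are hypothetical; the code only applies the TSS, SST, SSB and SSE formulas of Section 6.3 to the insecticide data, with one row per treatment and one column per block.

```python
def randomized_block_anova(table):
    """ANOVA for a Randomized Block Design.

    table[i][j] is the observation for treatment i in block j.
    """
    t = len(table)                 # number of treatments
    b = len(table[0])              # number of blocks
    n = t * b
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n                    # G^2 / n

    treatment_totals = [sum(row) for row in table]       # T_i
    block_totals = [sum(col) for col in zip(*table)]     # B_j

    tss = sum(y ** 2 for row in table for y in row) - correction
    sst = sum(T ** 2 for T in treatment_totals) / b - correction
    ssb = sum(B ** 2 for B in block_totals) / t - correction
    sse = tss - sst - ssb

    mse = sse / ((t - 1) * (b - 1))
    return {"TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse,
            "F_treatments": (sst / (t - 1)) / mse,
            "F_blocks": (ssb / (b - 1)) / mse}


# Seedlings emerged per row: 3 insecticides (rows) x 4 plots (columns)
seedlings = [[56, 49, 65, 60],
             [84, 78, 94, 93],
             [80, 72, 83, 85]]

rbd = randomized_block_anova(seedlings)
for key, value in rbd.items():
    print(key, round(value, 2))
```

The F ratios printed here use the unrounded MSE (23.50/6), so they differ slightly from the table values, which divide by the rounded 3.92.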
6.4 Randomized Block Design With Missing Data.
When we use the Randomized Block Design, we must ensure that every treatment is
observed in the same blocks (replicates). If we lose a data value we can either
repeat the experiment or estimate the missing value. Let us
look back at example 6.3.1. Suppose we have 3 insecticides and just 3 blocks, and
suppose we were not able to get data for Insecticide 2 in Plot 3. We label this value
as Y.
________________________________________________________
                      Plot
Insecticide     1      2      3             Total
________________________________________________________
1               56     49     65            170
2               84     78     Y             (162 + Y)
3               80     72     83            235
________________________________________________________
Total           220    199    (148 + Y)     (567 + Y)
________________________________________________________
To estimate the value of Y we use the formula:
$y_{ij} = \dfrac{t\,T_i' + b\,B_j' - G'}{(t - 1)(b - 1)}$
b : number of blocks
t : number of treatments
T_i' : Ti without the value of Y
B_j' : Bj without the value of Y
G' : G without the value of Y
For the example above:
$Y_{23} = \dfrac{(3 \times 162) + (3 \times 148) - 567}{2 \times 2} = \dfrac{363}{4} = 90.75$
Rounding to the nearest whole number, Y23 = 91
The data are now complete:
______________________________________________________
                      Plot
Insecticide     1      2      3      Total
______________________________________________________
1               56     49     65     170
2               84     78     91     253
3               80     72     83     235
______________________________________________________
Total           220    199    239    658
______________________________________________________
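The ANOVA can now be carried out. It should be noted that although the total
number of observations is 9, one value was generated (i.e. estimated) rather than measured.
One df is therefore lost from the error term, so the error df is 3 instead of 4 and the
total df is only 7, not 8.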
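The missing-value formula and the adjusted degrees of freedom can be written out as a small Python sketch. The function name `estimate_missing` is hypothetical; the arguments are the totals T', B' and G' computed without the missing observation, exactly as defined above.

```python
def estimate_missing(t, b, treatment_total, block_total, grand_total):
    """Estimate one missing value in a Randomized Block Design.

    treatment_total, block_total and grand_total are T', B' and G',
    i.e. the totals computed WITHOUT the missing observation.
    """
    return (t * treatment_total + b * block_total - grand_total) / ((t - 1) * (b - 1))


t, b = 3, 3                                    # 3 insecticides, 3 plots
y23 = estimate_missing(t, b, treatment_total=162, block_total=148, grand_total=567)
print(y23)                                     # 90.75
print(round(y23))                              # 91

# One df is lost for the estimated value
error_df = (t - 1) * (b - 1) - 1
total_df = t * b - 1 - 1
print(error_df, total_df)                      # 3 7
```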
6.5 Latin Square Design
The Latin Square Design is used when we feel that there may be 2 external factors
that may influence our results. Let us go back to the example on chlorophyll contents (Section
6.3). Suppose we can only do one chlorophyll analysis a day. As such we cannot use a
block design because we will not be able to block the effect of day. We need to do 4 analyses
a day to do that. To overcome this problem we can get help from 3 colleagues who will do
the analysis with us. However, we have now introduced another external factor (analysts) that
may affect our results. This is not a problem, because with a proper design we can also block
the second external factor. This is shown below where the treatments are I (rambutan), II
(durian), III (ciku) and IV (cempedak):
                    Analyst (K)
Day (R)     A       B       C       D       Total
1           I       II      III     IV      R1
2           II      III     IV      I       R2
3           III     IV      I       II      R3
4           IV      I       II      III     R4
Total       K1      K2      K3      K4      G
Note that the treatments are arranged such that each leaf is analysed on each day (thus
blocking the effects of day) and by each analyst (thus blocking the effects of analysts).
Note also that in a Latin Square Design, the number of treatments = number of levels of
external factor 1 = number of levels of external factor 2. That is why it is called a
square design.
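The layout shown above is a cyclic arrangement: each row is the previous row shifted one place to the left. A minimal Python sketch of how such a square could be generated and checked follows; the function names `cyclic_latin_square` and `is_latin_square` are hypothetical and are not part of the notes.

```python
def cyclic_latin_square(treatments):
    """Build a t x t square in which each row is the previous row shifted left by one."""
    t = len(treatments)
    return [[treatments[(row + col) % t] for col in range(t)] for row in range(t)]


def is_latin_square(square):
    """True if every treatment appears exactly once in each row and each column."""
    t = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == t and rows_ok and cols_ok


# Rows are days 1 to 4, columns are analysts A to D
layout = cyclic_latin_square(["I", "II", "III", "IV"])
for day, row in enumerate(layout, start=1):
    print(day, row)
print(is_latin_square(layout))
```

This reproduces the systematic square of the table. Whether and how the rows, columns or treatment labels should additionally be shuffled at random is a design decision that the sketch does not address.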
Calculations:
TSS $= \sum Y_{ijk}^2 - G^2/n$
SSR $= \sum_j R_j^2/t - G^2/n$
SSC $= \sum_k K_k^2/t - G^2/n$
SST $= \sum_i T_i^2/t - G^2/n$
SSE $= \mathrm{TSS} - \mathrm{SST} - \mathrm{SSR} - \mathrm{SSC}$
Example: chlorophyll contents measured with the layout above (data table not reproduced here). The row totals are R1 = 91.7, R2 = 85.1, R3 = 89.3, R4 = 90.0; the column totals are K1 = 57.3, K2 = 125.6, K3 = 69.4, K4 = 103.8; and the grand total is G = 356.1.
To calculate the total for each treatment, add the 4 observations of that treatment:
TI = 94.3, TII = 100.2, TIII = 72.2, TIV = 89.4
SST $= \dfrac{(94.3)^2 + (100.2)^2 + (72.2)^2 + (89.4)^2}{4} - \dfrac{(356.1)^2}{16} = 108.98$
SSR $= \dfrac{(91.7)^2 + (85.1)^2 + (89.3)^2 + (90.0)^2}{4} - \dfrac{(356.1)^2}{16} = 5.9$
SSC $= \dfrac{(57.3)^2 + (125.6)^2 + (69.4)^2 + (103.8)^2}{4} - \dfrac{(356.1)^2}{16} = 736.9$
TSS $= (15.5)^2 + (16.3)^2 + \dots + (30.3)^2 + (21.6)^2 - \dfrac{(356.1)^2}{16} = 875.6$
SSE $= \mathrm{TSS} - \mathrm{SST} - \mathrm{SSR} - \mathrm{SSC}$
    $= 23.8$
ANOVA Table:
_________________________________________________________
Source         SS        df    MS       F
_________________________________________________________
Treatments     108.98    3     36.33    9.15
Row (R)        5.9       3     1.97     0.50
Column (K)     736.9     3     245.6    61.87
Error          23.81     6     3.97
_________________________________________________________
F0.05, 3, 6 = 4.76
The test for differences among treatment means, with the hypotheses
Ho: μ1 = μ2 = μ3 = μ4 (no difference among treatment means)
Ha: at least one of the treatment means is different
makes use of the F statistic MS Treatment/MS Error. Since the computed value of F (= 9.15)
is greater than the tabulated F value (F0.05, 3, 6 = 4.76), we reject Ho and conclude
that at least one of the treatment means is different. This means that at least one of the
4 kinds of leaves has a chlorophyll content different from the rest.
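Since only the totals of the Latin Square example are quoted in the text, the sums of squares can be reproduced from those totals alone. The Python sketch below does this; the helper name `ss_from_totals` is hypothetical, and TSS = 875.6 is taken from the text as given because the 16 individual observations are not listed.

```python
def ss_from_totals(totals, obs_per_total, correction):
    """Sum of squares for one source: sum(total^2) / obs_per_total - G^2/n."""
    return sum(x ** 2 for x in totals) / obs_per_total - correction


t = 4                      # treatments = rows = columns
n = t * t                  # 16 observations
G = 356.1                  # grand total quoted in the text
correction = G ** 2 / n

sst = ss_from_totals([94.3, 100.2, 72.2, 89.4], t, correction)    # treatments
ssr = ss_from_totals([91.7, 85.1, 89.3, 90.0], t, correction)     # rows (days)
ssc = ss_from_totals([57.3, 125.6, 69.4, 103.8], t, correction)   # columns (analysts)

tss = 875.6                # taken from the text; raw observations are not available
sse = tss - sst - ssr - ssc

df_effect = t - 1                  # 3 for treatments, rows and columns
df_error = (t - 1) * (t - 2)       # 6
mse = sse / df_error

f_treatments = (sst / df_effect) / mse
f_rows = (ssr / df_effect) / mse
f_columns = (ssc / df_effect) / mse
print(round(sst, 2), round(ssr, 2), round(ssc, 2), round(sse, 2))
print(round(f_treatments, 2), round(f_rows, 2), round(f_columns, 2))
```

The small differences from the F column of the table come from rounding the mean squares before dividing.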
6.7 Factorial Experiment
When we used the Block and the Latin Square Designs we tried to remove external factors that
may affect our results. Note that in both cases there is only one kind of treatment involved. In
the examples, the treatment of concern is the kind of leaf.
In Factorial Experiments (or Designs), we are dealing with more than one kind of treatment (or
factor). E.g.:
(i) An experiment was carried out to study the antibody response using 4
preparations of vaccine and 6 kinds of additives. In this case, the treatments (or
factors) are the kinds of vaccine and the kinds of additive.
(ii) An experiment was carried out to study the effects of salinity and temperature on
the growth of juvenile groupers. The fish were cultured at 3 levels of salinity,
namely 30‰, 20‰ and 10‰, and 2 levels of temperature, namely 30 °C and
20 °C.
In both cases we want to know the effects of both factors and also the interaction between
them.
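The ANOVA table for a 2-factor factorial experiment, with a levels of Factor A, b levels of Factor B and r replicates per combination, is as follows:
______________________________________________________________
Source              df                SS      MS      F
______________________________________________________________
Factor A            a − 1             SSA     MSA     MSA/MSE
Factor B            b − 1             SSB     MSB     MSB/MSE
Interaction (AB)    (a − 1)(b − 1)    SSAB    MSAB    MSAB/MSE
Error               ab(r − 1)         SSE     MSE
______________________________________________________________
Total               abr − 1           TSS
______________________________________________________________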
Analysis
Let’s consider example (ii). Suppose only one observation was made for each combination of
factors. Suppose the data are as follows:
Increase in Weight (g) of Juvenile Grouper After 2 Weeks:
Note that in this case the total df is 5, df for factor A is 2, the df for Factor B is 1, and the df
for interactions is 2. There is no more df left for us to calculate MS of error. This means that
we are not able to run ANOVA using these data. However, we do have one option in that we
can assume there is no interaction between the factors.
Calculations:
TSS $= \sum_i \sum_j Y_{ij}^2 - G^2/n$
    $= (52.7^2 + 65.1^2 + \dots + 67.2^2) - 363.9^2/6$
    $= 22331.99 - 22070.54$
    $= 261.45$
SSA $= \sum_i A_i^2/n_A - G^2/n$
    $= 22079.79 - 22070.54$
    $= 9.25$
SSB $= \sum_j B_j^2/n_B - G^2/n$
    $= 22320.15 - 22070.54$
    $= 249.61$
SSE $= \mathrm{TSS} - \mathrm{SSA} - \mathrm{SSB}$
    $= 2.59$
Here nA = number of replicates that make up each total Ai
= b
nB = number of replicates that make up each total Bj
= a
ANOVA Table
______________________________________________________
Source        df    SS        MS        F
______________________________________________________
Factor A      2     9.25      4.63      3.56
Factor B      1     249.61    249.61    192.0
Error         2     2.59      1.30
______________________________________________________
Total         5     261.45
______________________________________________________
F0.05, 1, 2 = 18.51
F0.05, 2, 2 = 19.00
Conclusions
(i) For Factor A:
Ho: μ1 = μ2 = μ3 (no difference among treatment means)
Ha: At least one mean is different
Since F (= 3.56) is less than F0.05, 2, 2 (= 19.00), we are not able to reject Ho. We
conclude that the differences in mean weights of fish exposed to different salinities are
not significant at α = 0.05.
(ii) For Factor B:
Ho: μ1 = μ2 (no difference among treatment means)
Ha: At least one mean is different
Since F (= 192.0) is more than F0.05, 1, 2 (= 18.51), we reject Ho and conclude that the
difference in mean weights of fish exposed to different temperatures is significant at α =
0.05.
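The step from sums of squares to F ratios and decisions in this no-interaction analysis can be sketched in Python as follows. The function name `f_ratios` and the dictionary keys are hypothetical; the sums of squares, degrees of freedom and critical values are the ones quoted above, and the raw weights are not needed.

```python
def f_ratios(effects, ss_error, df_error):
    """Return {source: (MS, F)} given {source: (SS, df)} and the error term."""
    ms_error = ss_error / df_error
    return {name: (ss / df, (ss / df) / ms_error)
            for name, (ss, df) in effects.items()}


effects = {"Factor A (salinity)": (9.25, 2),
           "Factor B (temperature)": (249.61, 1)}
table = f_ratios(effects, ss_error=2.59, df_error=2)

# Tabulated critical values at alpha = 0.05, as quoted in the text
critical = {"Factor A (salinity)": 19.00,      # F0.05, 2, 2
            "Factor B (temperature)": 18.51}   # F0.05, 1, 2

reject = {name: table[name][1] > critical[name] for name in table}
for name, (ms, f) in table.items():
    print(name, round(ms, 2), round(f, 2), "reject Ho" if reject[name] else "do not reject Ho")
```

With the unrounded error mean square (2.59/2 = 1.295) the F ratios come out marginally different from the tabulated ones, but the decisions are the same.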
If we have at least 2 replicates for each combination of the 2 factors, we do not have
to assume that there is no interaction, as we would have enough df. Suppose the data
are as follows:
                           Salinity (Factor A)
Temperature (Factor B)     10       20       30       Total
20                         59.0     61.7     69.0     343.9
                           47.8     52.1     54.3
30                         63.5     75.4     71.0     404.8
                           ...      ...      62.9
TSS $= 59.0^2 + 47.8^2 + 63.5^2 + \dots + 71.0^2 + 62.9^2 - (748.7)^2/12$
    $= 717.33$
SSA $= \sum_i A_i^2/n_A - G^2/n$
    $= 51.15$
SSB $= \sum_j B_j^2/n_B - G^2/n$
    $= 309.11$
SSAB $= \sum_i \sum_j (AB)_{ij}^2/n_{AB} - \mathrm{SSA} - \mathrm{SSB} - G^2/n$
To get the values of (AB)ij we need to produce another table, as below. For this table
the 2 replicates of each combination are added together.
                       Salinity (A)
Temperature (B)     10        20        30
20                  106.8     113.8     123.3
30                  ...       ...       133.9
SSAB $= \dfrac{(106.8)^2 + (113.8)^2 + \dots + (133.9)^2}{2} - 51.15 - 309.11 - \dfrac{(748.7)^2}{12}$
     $= 47108.3 - 51.15 - 309.11 - 46712.64$
     $= 35.4$
SSE $= \mathrm{TSS} - \mathrm{SSA} - \mathrm{SSB} - \mathrm{SSAB}$
    $= 717.33 - 51.15 - 309.11 - 35.4$
    $= 321.67$
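ANOVA Table
______________________________________________________
Source                 SS        df    MS        F
______________________________________________________
Salinity (A)           51.15     2     25.58     0.48
Temperature (B)        309.11    1     309.11    5.77
Interaction (A × B)    35.4      2     17.70     0.33
Error                  321.67    6     53.61
______________________________________________________
Total                  717.33    11
______________________________________________________
F0.05, 2, 6 = 5.14
F0.05, 1, 6 = 5.99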
Conclusion:
Based on the above results and with α = 0.05, we find no significant effect of salinity
on the growth of the fish. The calculated F value is 0.48, which is much smaller
than the critical value of 5.14. The same situation is found for temperature, where the
calculated F value (= 5.77) is also less than the critical value (= 5.99). The interaction
between salinity and temperature was also found to be not significant, since its F value (=
0.33) is less than the critical value (= 5.14).
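The full two-factor analysis with replication can be sketched in Python as below. The function name `two_factor_anova` and the variable `growth` are hypothetical. Important assumption: two of the twelve observations at 30 °C (salinity 10 and 20, second replicate) do not appear in the data table, so the values 67.8 and 64.2 used here are back-calculated placeholders chosen to agree with the quoted block total (404.8) and sums of squares; they are not source data.

```python
def two_factor_anova(cells):
    """Two-factor ANOVA with r replicates per combination.

    cells[j][i] is the list of replicates for level j of Factor B (row)
    and level i of Factor A (column).
    """
    b = len(cells)                  # levels of Factor B
    a = len(cells[0])               # levels of Factor A
    r = len(cells[0][0])            # replicates per combination
    n = a * b * r
    values = [y for row in cells for cell in row for y in cell]
    correction = sum(values) ** 2 / n                       # G^2 / n

    a_totals = [sum(sum(cells[j][i]) for j in range(b)) for i in range(a)]
    b_totals = [sum(sum(cell) for cell in row) for row in cells]
    cell_totals = [sum(cell) for row in cells for cell in row]   # (AB)ij

    tss = sum(y ** 2 for y in values) - correction
    ssa = sum(x ** 2 for x in a_totals) / (b * r) - correction
    ssb = sum(x ** 2 for x in b_totals) / (a * r) - correction
    ssab = sum(x ** 2 for x in cell_totals) / r - ssa - ssb - correction
    sse = tss - ssa - ssb - ssab

    mse = sse / (a * b * (r - 1))
    return {"TSS": tss, "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
            "F_A": (ssa / (a - 1)) / mse,
            "F_B": (ssb / (b - 1)) / mse,
            "F_AB": (ssab / ((a - 1) * (b - 1))) / mse}


# Columns: salinity 10, 20, 30. Rows: temperature 20, 30.
growth = [
    [[59.0, 47.8], [61.7, 52.1], [69.0, 54.3]],   # 20 degrees C
    [[63.5, 67.8], [75.4, 64.2], [71.0, 62.9]],   # 30 degrees C; 67.8 and 64.2 are ASSUMED
]

fact = two_factor_anova(growth)
for key, value in fact.items():
    print(key, round(value, 2))
```

Under this assumption the sums of squares agree with the hand calculation to within rounding, and the three F ratios round to 0.48, 5.77 and 0.33.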