
One-way Analysis of Variance (available on E-book)

In basic statistics, the F-distribution is used in: (1) making inferences about two population variances
i.e., homogeneity of variance test, and (2) analysis of variance (ANOVA). In this class, we will cover
only the ANOVA test.
Fisher's F-distribution
If $\sigma_1^2 = \sigma_2^2$ and $s_1^2$ and $s_2^2$ are sample variances from independent simple random samples of size $n_1$
and $n_2$, respectively, drawn from normal populations, then
$$F = \frac{s_1^2}{s_2^2}$$
follows the F-distribution with $n_1 - 1$ degrees of freedom in the numerator and $n_2 - 1$ degrees of freedom
in the denominator.

E.g., if samples are drawn of size n1=8 from Population 1 and size n2=5 from Population 2, then F has df
= 7, 4 (i.e., (8-1), (5-1)).
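Population 1: $n_1 = 8$, $s_1^2 = \dfrac{\sum (x_{1i} - \bar{x}_1)^2}{8 - 1}$, df = 7
Population 2: $n_2 = 5$, $s_2^2 = \dfrac{\sum (x_{2i} - \bar{x}_2)^2}{5 - 1}$, df = 4
$$F = \frac{s_1^2}{s_2^2} \quad \text{has df} = 7 \text{ and } 4$$
Note that df for F is always stated as first numerator df and then denominator df.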
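As a minimal sketch of the ratio above, the following Python snippet computes F and its degrees of freedom for two samples of sizes 8 and 5. The sample values are made up purely for illustration and are not taken from the course material.

```python
import statistics

# Hypothetical samples (illustrative values only), sizes n1 = 8 and n2 = 5.
sample1 = [12.1, 9.8, 11.4, 10.7, 13.2, 9.5, 12.8, 10.9]
sample2 = [10.2, 11.1, 9.9, 10.6, 10.4]

s1_sq = statistics.variance(sample1)  # sample variance, divisor n1 - 1
s2_sq = statistics.variance(sample2)  # sample variance, divisor n2 - 1

f_ratio = s1_sq / s2_sq
# Numerator df is stated first, then denominator df.
df = (len(sample1) - 1, len(sample2) - 1)

print(f"F = {f_ratio:.3f} with df = {df}")
```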

Characteristics of the F-distribution


1. F > 0.
2. The F-distribution is not symmetric; it is skewed to the right.
3. The F-distribution is asymptotic to the horizontal axis on the right hand side.
4. As df increase, the high point of the F-distribution approaches 1.
5. The shape of the F-distribution depends upon the degrees of freedom in the numerator and
denominator (see Figure 10 above). This is similar to Student's t-distribution, whose shape
depends upon the degrees of freedom.
6. The total area under the curve is 1.

Finding critical values of the F-distribution using Table V.

Find the critical F-value for a right-tailed test with $\alpha$ = 0.05, degrees of freedom in the numerator = 10 and
degrees of freedom in the denominator = 6.

The critical value is written $F_{0.05,\,10,\,6}$, where the subscripts are, in order:
0.05 = area in the right tail (i.e., $\alpha$ or significance level)
10 = df of numerator
6 = df of denominator
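When a printed table is not at hand, the same critical value can be approximated numerically. The sketch below is one possible approach, not the method used in the course: it integrates the F density with Simpson's rule and then searches for the point that leaves an area of $\alpha$ in the right tail. The function names `f_pdf`, `f_cdf` and `f_critical` are illustrative, and the integration is only intended for numerator df of about 3 or more, where the density is finite at zero.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F-distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = (0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                      - (d1 + d2) * math.log(d1 * x + d2))
               - math.log(x) - log_beta)
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, n=2000):
    """P(F <= x), by composite Simpson's rule on [0, x] with n (even) panels."""
    if x <= 0:
        return 0.0
    h = x / n
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def f_critical(alpha, d1, d2):
    """Right-tailed critical value: the x for which P(F > x) = alpha."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1 - alpha:  # widen until the target is bracketed
        lo, hi = hi, hi * 2
    for _ in range(50):                   # bisection on the bracket
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(f_critical(0.05, 10, 6), 2))
```

Printed F tables list this entry as 4.06, which the numerical result should reproduce to two decimal places.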

Analysis of Variance (ANOVA) is an inferential method that is used to test the equality of three or more
population means. ANOVA is an extension of the t-test for independent samples (Section 10.2).
H0: $\mu_1 = \mu_2 = \dots = \mu_k$
H1: not all means are equal
For example, for k = 3 the null and alternative hypotheses are:
H0: $\mu_1 = \mu_2 = \mu_3$
H1: $\mu_1 = \mu_2 \ne \mu_3$, or
    $\mu_1 \ne \mu_2 = \mu_3$, or
    $\mu_1 = \mu_3 \ne \mu_2$, or
    $\mu_1 \ne \mu_2 \ne \mu_3$

Assumptions of a One-Way ANOVA


1. There are k simple random samples from k populations.
2. The k samples are independent of each other, that is, the subjects in one group
cannot be related in any way to subjects in a second group.
3. The populations are normally distributed.
4. The populations have the same variance; that is, each treatment group has
population variance $\sigma^2$.

[Figure: Population 1, Population 2, Population 3]

ANOVA Test Using the F-distribution: Hypothesis Test Regarding Three or More Means with
Unknown $\sigma$
Assumptions:
k simple random samples from k populations.
k samples are independent of each other.
k populations are normally distributed.
The populations have the same variance, $\sigma^2$.

Step 1: A claim is made regarding the means of three or more populations. The null and alternative hypotheses are written as:
H0: $\mu_1 = \mu_2 = \dots = \mu_k$
H1: not all means are equal
Step 2: Select a level of significance, $\alpha$, and find the right-tailed critical value for the F-distribution with df = (k-1),
(n1+n2+...+nk-k). The rejection region (or critical region) is the set of all values of the test statistic to the right of the critical
F-value, $F_{\alpha,\,(k-1),\,(n_1+n_2+\dots+n_k-k)}$.

Step 3: Calculate the test statistic or calculated F-value:

a. Calculate the grand mean of the combined data set, $\bar{x}$, by adding up all the observations and dividing by the number of
observations.
b. Find the sample mean for each population or treatment ($\bar{x}_1$ = sample mean from population 1; $\bar{x}_2$ = sample mean from
population 2; and so on).
c. Find the sample variance for each population ($s_1^2$ = sample variance from population 1; $s_2^2$ = sample variance from
population 2; and so on).
d. Calculate the mean square due to treatment. (Another name for mean square is variance, which is equal to the mean of
the squared deviations about $\bar{x}$.)

$$\text{MST} = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \dots + n_k(\bar{x}_k - \bar{x})^2}{k - 1},$$
where $n_1$ is the sample size from population 1;
$n_2$ is the sample size from population 2; and so on
k is the number of populations, or treatment levels.
The numerator in the computation of MST is called the sum of squares treatment or SST.
e. Calculate the mean square due to error:

$$\text{MSE} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2}{(n_1 + n_2 + \dots + n_k) - k}.$$
The numerator in the computation of MSE is called the sum of squares error or SSE.
f. Calculate the F test statistic:

$$F = \frac{\text{MST (mean square due to treatment)}}{\text{MSE (mean square due to error)}}$$

The calculations in Step 3 are reported in an ANOVA table as shown below.

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                    F-Statistic
Treatment             SST              k-1                  MST = SST/(k-1)                F = MST/MSE
Error                 SSE              n1+n2+...+nk-k       MSE = SSE/(n1+n2+...+nk-k)
Total                 SS               n1+n2+...+nk-1
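The quantities in the ANOVA table can be computed directly from the raw samples. The following is a minimal sketch of Step 3 in Python; the function name `one_way_anova` and the small data set are hypothetical and chosen only so that the arithmetic is easy to check by hand.

```python
import statistics

def one_way_anova(groups):
    """Return the one-way ANOVA table quantities for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # SST: squared deviations of the group means about the grand mean,
    # each weighted by its sample size.
    sst = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: (n_i - 1) * s_i^2 summed over the groups.
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)

    df_treatment, df_error = k - 1, n_total - k
    mst, mse = sst / df_treatment, sse / df_error
    return {
        "SST": sst, "SSE": sse, "SS_total": sst + sse,
        "df_treatment": df_treatment, "df_error": df_error,
        "MST": mst, "MSE": mse, "F": mst / mse,
    }

# Tiny made-up data set (hypothetical values): k = 3 groups of 3 observations.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(name, value)
```

For this toy data the group means are 2, 3 and 6 and every sample variance is 1, so SSE = 6 and MSE = 1, which makes the remaining entries easy to verify by hand.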

Step 4: Draw a conclusion:

Compare the calculated F-value (or F test statistic) to the critical F-value and state whether or not H0 is rejected at the
specified $\alpha$.
If $F > F_{\alpha,\,(k-1),\,(n_1+n_2+\dots+n_k-k)}$, reject H0; otherwise do not reject H0.
Interpret the conclusion in the context of the problem.

ANOVA: Blood Glucose Levels of Rats

Problem: Researcher Jelodar Gholamali wanted to determine the effectiveness of various
treatments on glucose levels of diabetic rats. He randomly assigned diabetic albino rats to four
treatment groups. Group 1 rats served as a control group and were fed a regular diet. Group 2
rats were served a regular diet supplemented with an herb, fenugreek. Group 3 rats were served a
regular diet supplemented with garlic. Group 4 rats were served a regular diet supplemented
with onion. The basis for the study is that Persian folklore states that diets supplemented with
fenugreek, garlic, or onion help to treat diabetes. After 15 days of treatment, the blood glucose
was measured in milligrams per deciliter (mg/dL). The results are presented in the table below.
Carry out a test of the relevant null hypothesis to test the claim made by Persian folklore that
fenugreek, garlic, and onion help treat diabetes. Use $\alpha$ = 0.05. Show all 4 steps of the test of
hypothesis.
Control   Fenugreek   Garlic   Onion
288.1     229.1       177.4    299.7
296.8     240.7       202.2    258.3
267.8     239.4       163.1    286.8
256.7     207.7       184.7    244.0
292.1     225.7       197.9    267.1
282.9     230.8       164.6    297.1
260.3     206.6       193.9    249.9
283.8     213.3       158.1    265.1

Step 1: A claim is made regarding the means of the four populations. The null and alternative hypotheses are
written as:
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
H1: not all means are equal
Step 2: Select $\alpha$ = 0.05 and find the right-tailed critical value for the F-distribution with df = (k-1), (n1+n2+n3+n4-k)
or df = 3, 28.

$F_{0.05,\,3,\,28} \approx 2.95$

Step 3:
a. Calculate the grand mean of the entire data set:
$$\bar{x} = \frac{288.1 + 296.8 + \dots + 249.9 + 265.1}{32} = 238.49$$
b. Find the sample mean of each population, where Control = Population 1, Fenugreek = Population 2, Garlic =
Population 3 and Onion = Population 4.
$$\bar{x}_1 = \frac{288.1 + 296.8 + \dots + 283.8}{8} = 278.56$$
$$\bar{x}_2 = \frac{229.1 + 240.7 + \dots + 213.3}{8} = 224.16$$
$$\bar{x}_3 = \frac{177.4 + 202.2 + \dots + 158.1}{8} = 180.24$$
$$\bar{x}_4 = \frac{299.7 + 258.3 + \dots + 265.1}{8} = 271.00$$

c. Find the sample variance for each population.
$$s_1^2 = \frac{(288.1 - 278.56)^2 + (296.8 - 278.56)^2 + \dots + (283.8 - 278.56)^2}{8 - 1} = 225.77$$
$$s_2^2 = \frac{(229.1 - 224.16)^2 + (240.7 - 224.16)^2 + \dots + (213.3 - 224.16)^2}{8 - 1} = 181.99$$
$$s_3^2 = \frac{(177.4 - 180.24)^2 + (202.2 - 180.24)^2 + \dots + (158.1 - 180.24)^2}{8 - 1} = 291.03$$
$$s_4^2 = \frac{(299.7 - 271.00)^2 + (258.3 - 271.00)^2 + \dots + (265.1 - 271.00)^2}{8 - 1} = 448.58$$

d. Compute MST:
$$\text{MST} = \frac{8(278.56 - 238.49)^2 + 8(224.16 - 238.49)^2 + 8(180.24 - 238.49)^2 + 8(271.00 - 238.49)^2}{4 - 1} = \frac{50{,}087.33}{3} = 16{,}695.8$$

e. Compute MSE:
$$\text{MSE} = \frac{(8 - 1)225.77 + (8 - 1)181.99 + (8 - 1)291.03 + (8 - 1)448.58}{32 - 4} = \frac{8{,}031.59}{28} = 286.84$$

f. Compute the F test statistic.
$$F = \frac{\text{Mean square due to treatment}}{\text{Mean square due to error}} = \frac{\text{MST}}{\text{MSE}} = \frac{16{,}695.8}{286.84} = 58.21$$

ANOVA Table:
Source of Variation   Sum of Squares   Degrees of Freedom    Mean Square        F-Test Statistic
Between               50,087.33        k-1 = 4-1 = 3         MST = 16,695.80    calc F = 58.21
Within                8,031.59         n1+n2+n3+n4-k = 28    MSE = 286.84
Total                 58,118.92        n1+n2+n3+n4-1 = 31

Step 4: Conclusion: Because the calculated F-statistic = 58.21 is greater than the critical F = 2.95,
reject H0 at the 0.05 significance level. At least one of the population means is different from the
others.
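The worked example can be checked with a few lines of Python. This is a sketch using only the standard library; the data are the four columns from the problem statement, and the critical value is the one reported as "F crit" in the Excel output later in this handout. Because the code does not round the means and variances to two decimals, its sums of squares differ slightly from a hand calculation and agree with the Excel figures instead.

```python
import statistics

# Blood glucose (mg/dL) after 15 days, from the table in the problem statement.
groups = {
    "Control":   [288.1, 296.8, 267.8, 256.7, 292.1, 282.9, 260.3, 283.8],
    "Fenugreek": [229.1, 240.7, 239.4, 207.7, 225.7, 230.8, 206.6, 213.3],
    "Garlic":    [177.4, 202.2, 163.1, 184.7, 197.9, 164.6, 193.9, 158.1],
    "Onion":     [299.7, 258.3, 286.8, 244.0, 267.1, 297.1, 249.9, 265.1],
}

k = len(groups)
n_total = sum(len(v) for v in groups.values())
grand_mean = sum(sum(v) for v in groups.values()) / n_total

sst = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2 for v in groups.values())
sse = sum((len(v) - 1) * statistics.variance(v) for v in groups.values())
mst = sst / (k - 1)
mse = sse / (n_total - k)
f_stat = mst / mse

# Critical value for alpha = 0.05 with df = (3, 28), copied from the Excel output.
F_CRIT = 2.946685

print(f"grand mean = {grand_mean:.2f}")
print(f"SST = {sst:.2f}, SSE = {sse:.2f}")
print(f"MST = {mst:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print("reject H0" if f_stat > F_CRIT else "do not reject H0")
```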

Excel: ANOVA: Single Factor

Step 1: Enter the raw data in columns A, B, C, ... for each sample (or treatment).
Step 2: From the menu bar, select Tools/Data Analysis/ANOVA: Single Factor.
Note: Be sure the Analysis ToolPak is activated. This is
done by selecting the Tools menu and choosing Add-Ins.
Check the box for the Analysis ToolPak and select OK.

Step 3: With the cursor in the Input Range: box, highlight the data. Click OK.
Perform the calculations using Excel.

Row   A         B           C        D
1     Control   Fenugreek   Garlic   Onion
2     288.1     229.1       177.4    299.7
3     296.8     240.7       202.2    258.3
4     267.8     239.4       163.1    286.8
5     256.7     207.7       184.7    244.0
6     292.1     225.7       197.9    267.1
7     282.9     230.8       164.6    297.1
8     260.3     206.6       193.9    249.9
9     283.8     213.3       158.1    265.1

Anova: Single Factor

SUMMARY
Groups      Count   Sum      Average    Variance
Control     8       2228.5   278.5625   225.7713
Fenugreek   8       1793.3   224.1625   181.9884
Garlic      8       1441.9   180.2375   291.0341
Onion       8       2168.0   271        448.58

ANOVA
Source of Variation    SS         df   MS         F         P-value    F crit
Between Groups (SST)   50090.69   3    16696.9    58.2091   3.74E-12   2.946685
Within Groups (SSE)    8031.616   28   286.8434
Total                  58122.31   31

In the output, F is the F-statistic (or calculated F) and F crit is the critical F.
Logic of the ANOVA test


If H0 is TRUE, MST will approximately equal MSE and the calculated F will be approximately
equal to 1.
If H0 is FALSE, MST will tend to be greater than MSE and the calculated F will tend to exceed 1.
If the k samples are taken from populations with different means, then MST will be
considerably greater than MSE, owing to the wider dispersion of the sample means ($\bar{x}_i$)
about the grand mean ($\bar{x}$); see figure below.
If MST is so large that in comparison to MSE it yields a calculated F-value > the critical F-value, we conclude that the sample means are significantly different and there must be at
least one pair of samples whose means differ significantly.

[Figure: dispersion of the sample means about the grand mean $\bar{x}$, in two panels: "H0 is TRUE" and "H0 is FALSE"]
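The behaviour described above can be illustrated with a small simulation. The sketch below is an illustration under assumed settings, not part of the course material: it draws four samples of size 8 from normal populations with a common standard deviation of 10 (all of these values, and the population means, are hypothetical) and averages the calculated F over many repetitions, once with equal population means and once with unequal means.

```python
import random
import statistics

def f_statistic(groups):
    """Calculated F = MST / MSE for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    mst = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
              for g in groups) / (k - 1)
    mse = sum((len(g) - 1) * statistics.variance(g)
              for g in groups) / (n_total - k)
    return mst / mse

def average_f(pop_means, sigma=10.0, n=8, reps=1000, seed=1):
    """Average calculated F over repeated samples from normal populations."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in pop_means]
        values.append(f_statistic(groups))
    return statistics.mean(values)

f_h0_true = average_f([100, 100, 100, 100])   # equal population means
f_h0_false = average_f([100, 110, 120, 130])  # unequal population means

print("average F when H0 is true :", round(f_h0_true, 2))
print("average F when H0 is false:", round(f_h0_false, 2))
```

With equal population means the average F should land close to 1 (its long-run mean is df2/(df2 - 2), slightly above 1), while with unequal means it should be far larger.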

ANOVA test is always a one-tailed test:


A significant result occurs only if MST > MSE, i.e., if the calculated F > 1; thus, a right-tailed test is always used in ANOVA.
Whenever MST < MSE, the result is never considered significant.

Tukey's Test Using the Studentized Range Distribution: Hypothesis Test Comparing Two
Means (see Section 13.2, available under Course Compass).
Assumptions:
k simple random samples from k populations.
k samples are independent of each other.
k populations are normally distributed.
The populations have the same variance, $\sigma^2$.

Step 1: A claim is made regarding the two population means ($\mu_i$ and $\mu_j$).
Two-Tailed Test
H0: $\mu_i = \mu_j$
H1: $\mu_i \ne \mu_j$ (that is, $\mu_i < \mu_j$ or $\mu_i > \mu_j$)

Step 2: Determine the critical value, $q_{\alpha,\,(n_1+n_2+\dots+n_k-k),\,k}$, where $\alpha$ = significance level,
(n1+n2+...+nk-k) = df for error, and k = number of treatment means being compared.
Step 3: (a) Compute the pairwise differences, $\bar{x}_i - \bar{x}_j$, where $\bar{x}_i > \bar{x}_j$.
(b) Compute the test statistic,
$$q = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{\dfrac{s^2}{2}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}$$
Note that $s^2$ is the mean square due to error, MSE, from the ANOVA table; $n_i$ is the sample
size from population i; and $n_j$ is the sample size from population j.

Step 4: Draw a conclusion:

Compare the calculated q (or q statistic) to the critical value, $q_{\alpha,\,(n_1+n_2+\dots+n_k-k),\,k}$, and state
whether or not H0 is rejected at the specified $\alpha$.
If $q \ge q_{\alpha,\,(n_1+n_2+\dots+n_k-k),\,k}$, reject H0; otherwise do not reject H0.
Interpret the conclusion in the context of the problem.

Compare all pairwise differences to identify which population means are considered equal.
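The test statistic in Step 3 is straightforward to compute once MSE is known. The following sketch evaluates q for a single pair of means; the helper name `tukey_q` is illustrative, and the inputs are the Control and Garlic averages and the within-groups mean square from the Excel output for the rat data. The critical value still has to be read from a studentized range table, so it is deliberately not computed here.

```python
import math

def tukey_q(mean_i, mean_j, n_i, n_j, mse):
    """Studentized range statistic for one pair of sample means.

    mean_i should be the larger of the two means, so that q is positive.
    """
    standard_error = math.sqrt((mse / 2) * (1 / n_i + 1 / n_j))
    return (mean_i - mean_j) / standard_error

# Control vs. Garlic: averages and MSE taken from the Excel ANOVA output.
q = tukey_q(278.5625, 180.2375, 8, 8, 286.8434)
print(f"q = {q:.2f}")
```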

Tukey's Test Using the Studentized Range Distribution: Example

              Control   Fenugreek   Garlic   Onion
              288.1     229.1       177.4    299.7
              296.8     240.7       202.2    258.3
              267.8     239.4       163.1    286.8
              256.7     207.7       184.7    244.0
              292.1     225.7       197.9    267.1
              282.9     230.8       164.6    297.1
              260.3     206.6       193.9    249.9
              283.8     213.3       158.1    265.1
Sample mean   278.56    224.16      180.24   271.00

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F-Statistic
Treatment             50,087.33        3                    16,695.80     58.21
Error                 8,031.59         28                   286.84
Total                 58,118.92        31


Step 1: State the null and alternative hypotheses.

Step 2: Determine the critical value, $q_{\alpha,\,(n_1+n_2+n_3+n_4-k),\,k}$, where $\alpha$ = significance level,
(n1+n2+n3+n4-k) = df for error, and k = number of treatment means being compared.
$\alpha$ = ________
k = ______________
n1+n2+n3+n4-k = __________________________

Step 3: (a) Compute the pairwise difference, $\bar{x}_i - \bar{x}_j$, where $\bar{x}_i > \bar{x}_j$.

(b) Compute the test statistic,
$$q = \frac{(\bar{x}_i - \bar{x}_j) - (\mu_i - \mu_j)}{\sqrt{\dfrac{s^2}{2}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}$$
where $\mu_i - \mu_j = 0$ under H0.

Step 4: Conclusion. Provide a conclusion and the statistical justification for the conclusion, and
interpret your conclusion in the context of the problem.

Repeat this procedure for all pairwise differences in sample means.


Comparison, H0 and H1   Difference, $\bar{x}_i - \bar{x}_j$   Test Statistic, q   Critical Value   Conclusion


Summary of Tukey's Test (arrange sample means from highest to lowest and draw a line under means that are not significantly different, p. 695):
