
STAT3010: Lecture 4

CHAPTER 9: ANALYSIS OF VARIANCE


Analysis of Variance (ANOVA) is one of the most widely used
statistical techniques for testing the equality of population
means. It is used when more than two treatment means are
compared.
Recall from STAT 2020/2010: hypothesis testing concerning a
difference between two means. The data could have been
independent, paired, or pooled:
For example: The Chapin Insight Test is a psychological test
designed to measure how accurately the subject appraises
other people. The possible scores on the test range from 0 to
41. During the development of the Chapin test, it was given to
several different groups of people. Here are the results for male
and female college students majoring in the liberal arts:
Group   Sex      n     $\bar{x}$   s
1       Male     133   25.34       5.05
2       Female   162   24.94       5.44

Do these data support the contention that female and male
students differ in average social insight?
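As a refresher on the two-sample case, the comparison can be sketched numerically. The snippet below is a minimal sketch in Python (not part of the original SAS materials), assuming the two samples are independent and using the unpooled standard error; the function name `two_sample_t` is hypothetical.

```python
import math


def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic using the unpooled standard error (an assumption)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / se


# Summary statistics from the Chapin Insight Test table above
t_stat = two_sample_t(25.34, 5.05, 133, 24.94, 5.44, 162)
print(round(t_stat, 2))
```

A t statistic this close to zero would give little evidence that the two population means differ; the formal decision still depends on the chosen significance level.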


For the ANOVA test, we want to test the equality or difference
of more than two treatment means, so our hypotheses are:

With assumptions:
1.
2.
3.
4.

Background Logic (Section 9.1, Page 408)


Example 9.1: Variation in Time to Relief of Symptoms Between
and Within Treatments
An experiment is conducted in which three treatments are
compared with respect to their effectiveness. For the purpose
of this example, effectiveness is evaluated in terms of time to
relief of symptoms, reported in minutes. We assume that the
distributions of time to relief are approximately normal. The test
of interest is as follows:

Fifteen subjects are randomly selected to participate in the
investigation. Five subjects are randomly assigned to each
treatment and each subject reports the time to relief of
symptoms, in minutes, following their assigned treatment.
Sample data and summary stats follow:


Treatment 1   Treatment 2   Treatment 3
29.0          25.1          20.1
29.2          25.0          20.0
29.1          25.0          19.9
28.9          24.9          19.8
28.8          25.0          20.2

Summary Statistics by Treatment

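Treatment 1:   $\bar{x}_1 =$      $s_1^2 =$      $s_1 =$

Treatment 2:   $\bar{x}_2 =$      $s_2^2 =$      $s_2 =$

Treatment 3:   $\bar{x}_3 =$      $s_3^2 =$      $s_3 =$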

Summary statistics show:

So far, the summary statistics show a small amount of
within-group variability, since all of the individual standard
deviations are quite small. Now, suppose that the population
means are equal (i.e., $H_0: \mu_1 = \mu_2 = \mu_3$ is true). We can
then assume that the three samples are drawn from the same
population and pool all of the observations together (i.e., N = 15).

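options ls=80 ps=60 nodate;

* Pool all 15 times to relief, in minutes, into one sample;
data relief;
  input x;
  cards;
29
29.2
29.1
28.9
28.8
25.1
25
25
24.9
25
20.1
20
19.9
19.8
20.2
;
run;

* List the pooled observations;
proc print data=relief;
run;

* Overall mean and standard deviation of the pooled sample;
proc means data=relief;
  var x;
run;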
The SAS System
The MEANS Procedure
Analysis Variable : x

 N          Mean       Std Dev       Minimum       Maximum
15    24.6666667     3.8130728    19.8000000    29.2000000

The pooled standard deviation (about 3.81 minutes) is much larger
than the standard deviation within any single treatment group,
which indicates a large amount of between-group variability.

Between-group variability is the difference between the mean
time to relief for each group and the overall (pooled) mean.
Our individual group means seem to be quite different from the
overall mean, which suggests large between-group variability.
If the means in each group are very similar, then the
between-group variability is small.
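The contrast between the two kinds of variability can be checked directly on the Example 9.1 data. The following is a minimal sketch in Python using only the standard library (the notes themselves use SAS); the variable names are illustrative, not from the source.

```python
from statistics import mean, stdev

# Times to relief (minutes) from Example 9.1
groups = {
    "Treatment 1": [29.0, 29.2, 29.1, 28.9, 28.8],
    "Treatment 2": [25.1, 25.0, 25.0, 24.9, 25.0],
    "Treatment 3": [20.1, 20.0, 19.9, 19.8, 20.2],
}

# Pool all 15 observations, as in the SAS data step
pooled = [x for values in groups.values() for x in values]
overall_mean = mean(pooled)

# Within-group spread: standard deviation inside each treatment
within_sd = {name: stdev(values) for name, values in groups.items()}

# Between-group spread: distance of each treatment mean from the overall mean
mean_gap = {name: mean(values) - overall_mean for name, values in groups.items()}

for name in groups:
    print(f"{name}: sd = {within_sd[name]:.3f}, mean - overall = {mean_gap[name]:+.3f}")
print(f"Pooled: mean = {overall_mean:.4f}, sd = {stdev(pooled):.4f}")
```

Each within-group standard deviation is well under one minute, while the treatment means sit several minutes from the overall mean, which is the pattern described above.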
In ANOVA, we compare the variation within samples (which is
small in this example) to the variation between samples (which is
large in this example) to assess the equality of the population means.
If the observations within a sample are similar in value (i.e., small
within-sample variation) and the means differ across samples
(large between-sample variation), then a real difference is said
to exist in the population means (reject $H_0$).
So what have we learned?

What causes variability?


Within groups: participants who receive the same treatment
differ only by individual (random) variation; if their times to
relief are more or less the same, the within-group variability is low.
Between groups: if one treatment has a much greater effect
than the others, then the mean of that group will be very
different from the others, thus increasing between-group
variability.
In ANOVA, we wish to test the following:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

vs.

$H_a$: at least two of the means are not equal

where k = the number of populations under consideration.


To test $H_0$, we compute two estimates of the population
variance ($\sigma^2$).
First estimate:

Second estimate:

Formula for Within Treatment Variation:

Formula for Between Treatment Variation:

The test statistic in ANOVA is based on the ratio of these two
estimates:


The test statistic follows an F distribution (Table B.4A and B.4B):

Let's use our example to find these values:


Treatment 1   Treatment 2   Treatment 3
29.0          25.1          20.1
29.2          25.0          20.0
29.1          25.0          19.9
28.9          24.9          19.8
28.8          25.0          20.2

Summary Statistics by Treatment

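Treatment 1:   $\bar{x}_1 = 29$      $s_1^2 = 0.025$      $s_1 = 0.158$

Treatment 2:   $\bar{x}_2 = 25$      $s_2^2 = 0.005$      $s_2 = 0.071$

Treatment 3:   $\bar{x}_3 = 20$      $s_3^2 = 0.025$      $s_3 = 0.158$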
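The two variance estimates and their ratio can be worked out from these summary statistics alone. The following is a minimal sketch in Python, assuming equal group sizes (n = 5 per treatment, as in Example 9.1) and using the mean-square formulas from the ANOVA table in Section 9.2; the variable names are illustrative.

```python
# Summary statistics from Example 9.1 (n = 5 subjects per treatment)
n = 5
means = [29.0, 25.0, 20.0]
variances = [0.025, 0.005, 0.025]

k = len(means)                 # number of treatments
N = n * k                      # total sample size
grand_mean = sum(means) / k    # valid here only because group sizes are equal

# Between-treatment estimate: spread of the group means around the grand mean
ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)

# Within-treatment estimate: pooled sample variance
ms_within = sum((n - 1) * v for v in variances) / (N - k)

f_stat = ms_between / ms_within
print(f"MS_b = {ms_between:.4f}, MS_w = {ms_within:.4f}, F = {f_stat:.1f}")
```

The between-treatment estimate is thousands of times larger than the within-treatment estimate, so the ratio is far from 1.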


If the two estimates of $\sigma^2$ are close in value, then F will be
approximately equal to 1, which gives us no reason to reject
$H_0$. However, if the variation between samples ($s_b^2$) is large
and the variation within samples ($s_w^2$) is small, then F will be
large, and we would reject $H_0$.
These aren't our only guidelines for a conclusion; we need a
critical value from the F distribution as well. To get this
critical value, we need two degrees of freedom: the numerator
degrees of freedom ($df_1 = k - 1$) and the denominator degrees
of freedom ($df_2 = nk - k$ when each of the k groups has n
observations, or in general $df_2 = N - k$). We find this F critical
value using Table B.4A or B.4B.
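For Example 9.1 the degrees of freedom and a critical value can be sketched as follows. This is a minimal Python sketch under two loudly stated assumptions: the significance level 0.05 is chosen for illustration (the notes do not state one), and the closed-form expression used here holds only when the numerator degrees of freedom equal 2. The function name `f_critical_df1_equals_2` is hypothetical; in general the value is read from the F table.

```python
def f_critical_df1_equals_2(df2, alpha):
    """Upper-tail F critical value, valid ONLY when the numerator df is 2.

    For F(2, df2), P(F > x) = (1 + 2x/df2) ** (-df2/2), which inverts in closed form.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


k, N = 3, 15               # Example 9.1: three treatments, fifteen subjects
df1, df2 = k - 1, N - k    # numerator and denominator degrees of freedom
alpha = 0.05               # ASSUMPTION: illustrative level, not given in the notes

f_crit = f_critical_df1_equals_2(df2, alpha)
print(df1, df2, round(f_crit, 2))
```

With 2 and 12 degrees of freedom, the computed value should agree with the corresponding entry in the F table at the same significance level.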

Decision:


Notation and Examples (Section 9.2, Page 413)


To decide whether to reject the null hypothesis, we organize
the calculations in an ANOVA table. Here are the formulas that
make up the ANOVA table:
Analysis of Variance Table

Source of Variation   Sums of Squares (SS)                                   Degrees of Freedom (df)   Mean Squares (MS)                 F
Between               $SS_b = \sum n_j (\bar{X}_{.j} - \bar{X}_{..})^2$      $k - 1$                   $s_b^2 = MS_b = SS_b / (k - 1)$   $F = MS_b / MS_w$
Within                $SS_w = \sum \sum (X_{ij} - \bar{X}_{.j})^2$           $N - k$                   $s_w^2 = MS_w = SS_w / (N - k)$
Total                 $SS_{total} = \sum \sum (X_{ij} - \bar{X}_{..})^2$     $N - 1$
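The table formulas can be applied to raw data in a few lines. The following is a minimal sketch in Python (standard library only), not the SAS procedure used in the course; the function name `anova_table` and the dictionary layout are hypothetical choices made for illustration.

```python
def anova_table(groups):
    """One-way ANOVA table entries from a list of samples (one list per treatment)."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    group_means = [sum(g) / len(g) for g in groups]

    # Sums of squares, following the table formulas term by term
    ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (N - k)
    return {
        "Between": {"SS": ss_b, "df": k - 1, "MS": ms_b, "F": ms_b / ms_w},
        "Within": {"SS": ss_w, "df": N - k, "MS": ms_w},
        "Total": {"SS": ss_total, "df": N - 1},
    }


# Times to relief (minutes) from Example 9.1
data = [
    [29.0, 29.2, 29.1, 28.9, 28.8],
    [25.1, 25.0, 25.0, 24.9, 25.0],
    [20.1, 20.0, 19.9, 19.8, 20.2],
]
table = anova_table(data)
for source, row in table.items():
    print(source, {key: round(value, 4) for key, value in row.items()})
```

A useful check on any ANOVA table is that the Between and Within rows add up to the Total row, both for the sums of squares and for the degrees of freedom.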
