
CHAPTER 26

ANalysis Of VAriance (ANOVA)

Comparing two or more population means


One Way Analysis of
Variance
 The analysis of variance is a procedure that tests whether differences exist between two or more population means.
 To do this, the technique analyzes the sample variances.
One Way Analysis of Variance:
Example
 A magazine publisher wants to
compare three different styles of
covers for a magazine that will be
offered for sale at supermarket
checkout lines.
 She assigns 60 stores at random to
the three styles of covers and
records the number of magazines
that are sold in a one-week period.
One Way Analysis of Variance:
Example
 How do five bookstores in the same
city differ in the demographics of
their customers?
 A market researcher asks 50
customers of each store to respond
to a questionnaire. One variable of
interest is the customer’s age.
Idea Behind ANOVA –
two types of variability
1. Within-group variability
2. Between-group variability
[Figure: two dot plots of observations in Treatments 1, 2, and 3, each with sample means x̄1 = 10, x̄2 = 15, x̄3 = 20.
Left panel: a small variability within the samples makes it easier to draw a conclusion about the population means.
Right panel: the sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.]
Idea behind ANOVA: recall the two-
sample t-statistic
 Difference between two means, pooled variance, both sample sizes equal to n:

   t = (x̄ − ȳ) / ( s_p √(1/n + 1/n) )

so that

   t² = n (x̄ − ȳ)² / (2 s_p²)

 Numerator of t²: measures variation between the groups in terms of the difference between their sample means.
 Denominator: measures variation within groups by the pooled estimator of the common variance.
 If the within-group variation is small, the same variation between groups produces a larger statistic and a more significant result.
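This connection can be checked numerically: for k = 2 groups, the squared pooled two-sample t-statistic equals the one-way ANOVA F-statistic. A minimal Python sketch (the two samples here are made-up illustration values, not data from the text):

```python
# Two made-up samples of equal size n (illustration only)
x = [529, 658, 793, 514, 663]
y = [804, 630, 774, 717, 679]
n = len(x)

mx, my = sum(x) / n, sum(y) / n

# Pooled estimate of the common variance
sp2 = (sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)) / (2 * n - 2)

# Squared two-sample t-statistic: n * (x-bar - y-bar)^2 / (2 * sp^2)
t2 = n * (mx - my) ** 2 / (2 * sp2)

# One-way ANOVA F for the same two groups
grand = (mx + my) / 2
msg = n * (mx - grand) ** 2 + n * (my - grand) ** 2  # df = k - 1 = 1
mse = sp2                                            # pooled variance = MSE when k = 2
F = msg / mse

print(abs(t2 - F) < 1e-9)  # prints True: t^2 equals F when k = 2
```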
One Way Analysis of Variance:
Example
 Example 1
– An apple juice manufacturer is planning to develop a new product – a liquid concentrate.
– The marketing manager has to decide how to
market the new product.
– Three strategies are considered:
 Emphasize the convenience of using the product.
 Emphasize the quality of the product.
 Emphasize the product’s low price.
One Way Analysis of Variance

 Example 1 - continued
– An experiment was conducted as follows:
 In three cities an advertising campaign was launched.
 In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
 The weekly sales were recorded for twenty weeks following the beginning of the campaigns.
One Way Analysis of Variance

Weekly sales

Convenience  Quality  Price
529          804      672
658          630      531
793          774      443
514          717      596
663          679      602
719          604      502
711          620      659
606          697      689
461          706      675
529          615      512
498          492      691
663          719      733
604          787      698
495          699      776
485          572      561
557          523      572
353          584      469
557          634      581
542          580      679
614          624      532
One Way Analysis of Variance

 Solution
– The data are quantitative
– The problem objective is to compare
sales in three cities.
– We hypothesize that the three
population means are equal
Defining the Hypotheses

• Solution
H0: μ1 = μ2 = μ3
HA: At least two means differ

 To build the statistic needed to test the hypotheses, use the following notation:
Notation
Independent samples are drawn from k populations (treatment groups).

                     Population 1   Population 2   …   Population k
First observation    x11            x12            …   x1k
Second observation   x21            x22            …   x2k
 ⋮                    ⋮              ⋮                  ⋮
Last observation     x(n1,1)        x(n2,2)        …   x(nk,k)
Sample size          n1             n2             …   nk
Sample mean          x̄1             x̄2             …   x̄k

 X is the “response variable”.
 The variables’ values are called “responses”.
Terminology

 In the context of this problem…
Response variable – weekly sales
Responses – actual sale values
Experimental unit – weeks in the three cities
when we record sales figures.
Factor – the criterion by which we classify the
populations (the treatments). In this problem
the factor is the marketing strategy.
Factor levels – the population (treatment)
names. In this problem factor levels are the 3
marketing strategies: 1) convenience, 2) quality,
3) price
The rationale of the test statistic

Two types of variability are employed when testing for the equality of the population means:

1. Within-sample variability
2. Between-sample variability

H0: μ1 = μ2 = μ3
HA: At least two means differ
The rationale behind the test statistic – I
 If the null hypothesis is true, we
would expect all the sample means to
be close to one another (and as a
result, close to the grand mean).
 If the alternative hypothesis is true,
at least some of the sample means
would differ.
 Thus, we measure variability between
sample means.
Variability between sample means
• The variability between the sample
means is measured as the sum of
squared distances between each mean
and the grand mean.

This sum is called the Sum of Squares for Groups (SSG).

In our example, treatments are represented by the different advertising strategies.
Sum of squares for treatment
groups (SSG)

   SSG = Σ_{j=1}^{k} n_j (x̄_j − x̄)²

where k is the number of treatments, n_j is the size of sample j, x̄_j is the mean of sample j, and x̄ is the grand mean.

 Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SSG. Thus, a large SSG indicates large variation between the sample means, which supports HA.
Sum of squares for treatment
groups (SSG)
 Solution – continued
Calculate SSG:

   x̄1 = 577.55, x̄2 = 653.00, x̄3 = 608.65

The grand mean is

   x̄ = (n1 x̄1 + n2 x̄2 + … + nk x̄k) / (n1 + n2 + … + nk) = 613.07

   SSG = Σ_{j=1}^{k} n_j (x̄_j − x̄)²
       = 20(577.55 − 613.07)² + 20(653.00 − 613.07)² + 20(608.65 − 613.07)²
       = 57,512.23
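The SSG arithmetic can be reproduced in a few lines of Python (group sizes and sample means taken from the example):

```python
n = [20, 20, 20]                  # group sizes
means = [577.55, 653.00, 608.65]  # sample means from the example

# Grand mean: weighted average of the sample means
grand = sum(nj * xbar for nj, xbar in zip(n, means)) / sum(n)

# Sum of squares for groups: SSG = sum of n_j * (xbar_j - grand)^2
ssg = sum(nj * (xbar - grand) ** 2 for nj, xbar in zip(n, means))

print(round(grand, 2), round(ssg, 2))  # 613.07 57512.23
```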
Sum of squares for treatment
groups (SSG)
Is SSG = 57,512.23 large
enough to reject H0 in favor
of HA?
The rationale behind the test statistic – II

 Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.
 Therefore, even though the sample means may markedly differ from one another, SSG must be judged relative to the “within samples variability”.
Within samples variability
 The variability within samples is
measured by adding all the squared
distances between observations and
their sample means.
This sum is called the Sum of Squares for Error (SSE).

 In our example, this is the sum of all squared differences between sales in city j and the sample mean of city j (over all three cities).
Sum of squares for errors (SSE)
 Solution – continued
Calculate SSE:

   s1² = 10,775.00, s2² = 7,238.11, s3² = 8,670.24

   SSE = Σ_{j=1}^{k} Σ_{i=1}^{n_j} (x_ij − x̄_j)² = (n1 − 1)s1² + (n2 − 1)s2² + (n3 − 1)s3²
       = (20 − 1)(10,775.00) + (20 − 1)(7,238.11) + (20 − 1)(8,670.24)
       = 506,983.50
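Likewise for SSE; a short Python check using the sample variances above (with these rounded variances the total comes to 506,983.65, which matches the slide's 506,983.50 up to rounding):

```python
n = [20, 20, 20]                          # group sizes
variances = [10775.00, 7238.11, 8670.24]  # sample variances from the example

# Pool the within-group squared deviations: SSE = sum of (n_j - 1) * s_j^2
sse = sum((nj - 1) * s2 for nj, s2 in zip(n, variances))

print(round(sse, 2))  # 506983.65 (the slide's 506,983.50 uses unrounded variances)
```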
Sum of squares for errors (SSE)

Is SSG = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that all the means are equal?
The mean sum of squares
To perform the test we need to calculate the mean squares as follows:

Mean Square for treatment Groups:

   MSG = SSG / (k − 1) = 57,512.23 / (3 − 1) = 28,756.12

Mean Square for Error:

   MSE = SSE / (n − k) = 506,983.50 / (60 − 3) = 8,894.45
Calculation of the test statistic
   F = MSG / MSE = 28,756.12 / 8,894.45 = 3.23

with the following degrees of freedom: ν1 = k − 1 and ν2 = n − k.

Required conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
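The mean squares and the F-statistic follow directly; a quick Python check using the sums of squares computed earlier:

```python
k, n = 3, 60                    # number of treatments, total sample size
ssg, sse = 57512.23, 506983.50  # sums of squares from the example

msg = ssg / (k - 1)  # mean square for groups
mse = sse / (n - k)  # mean square for error
F = msg / mse        # test statistic, df (k - 1, n - k) = (2, 57)

print(round(F, 2))   # 3.23
```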
The F test rejection region

And finally the hypothesis test:

H0: μ1 = μ2 = … = μk
HA: At least two means differ

Test statistic: F = MSG / MSE
Rejection region: F > F_{α, k−1, n−k}
The F test

H0: μ1 = μ2 = μ3
HA: At least two means differ

Test statistic: F = MSG / MSE = 28,756.12 / 8,894.45 = 3.23

Rejection region: F ≥ F_{α, k−1, n−k} = F_{0.05, 3−1, 60−3} = 3.15

 Since 3.23 > 3.15, there is sufficient evidence to reject H0 in favor of HA and conclude that at least one of the mean sales differs from the others.
The F test p-value
 Use Excel to find the p-value:
   fx → Statistical → F.DIST.RT(3.23, 2, 57) = .0469

[Figure: density of the F distribution with the area to the right of 3.23 shaded; p-value = P(F > 3.23) = .0469]
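Outside Excel, the same p-value can be obtained from any F-distribution routine. A sketch using scipy.stats (assuming scipy is available; Excel's F.DIST.RT corresponds to the survival function of the F distribution):

```python
from scipy import stats

F, df1, df2 = 3.23, 2, 57

# P(F > 3.23) under F(2, 57) — the same quantity as Excel's F.DIST.RT(3.23, 2, 57)
p_value = stats.f.sf(F, df1, df2)

# 5% critical value F_{0.05, 2, 57}
f_crit = stats.f.ppf(0.95, df1, df2)

print(round(p_value, 4), round(f_crit, 2))
```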
Excel single factor ANOVA

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Convenience 20 11551 577.55 10775.00
Quality 20 13060 653.00 7238.11
Price 20 12173 608.65 8670.24

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 57512 2 28756 3.23 0.0468 3.16
Within Groups 506984 57 8894

Total 564496 59

SS(Total) = SSG + SSE
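The whole ANOVA table can be reproduced from the raw data; a sketch using scipy.stats.f_oneway (assuming scipy is available) on the 20 weekly sales figures per strategy:

```python
from scipy import stats

# Weekly sales for the three strategies (raw data from the example)
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
           492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

# One-way ANOVA: reproduces the Between/Within rows of the table above
F, p_value = stats.f_oneway(convenience, quality, price)
print(round(F, 2))  # 3.23, matching the table
```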
Multiple Comparisons

 When the null hypothesis is rejected, it may be desirable to find which mean(s) differ and in what rank order.
 Two statistical inference procedures geared at doing this are presented:
– “regular” confidence interval calculations
– Bonferroni adjustment
Multiple Comparisons
 Two means are considered different
if the confidence interval for the
difference between the
corresponding sample means does not
contain 0. In this case the larger
sample mean is believed to be
associated with a larger population
mean.
 How do we calculate the confidence
intervals?
“Regular” Method
 This method builds on the equal variances
confidence interval for the difference
between two means.
 The CI is improved by using MSE rather than s_p² (we use ALL the data to estimate the common variance instead of only the data from two samples):

   (x̄i − x̄j) ± t_{α/2, n−k} · s · √(1/ni + 1/nj)

   d.f. = n − k,  s = √MSE
Experiment-wise Type I error rate
(the effective Type I error)
 The preceding “regular” method may result in an increased probability of committing a Type I error.
 The experiment-wise Type I error rate is the probability of committing at least one Type I error at significance level α. It is calculated by:

   experiment-wise Type I error rate = 1 − (1 − α)^g

where g is the number of pairwise comparisons (i.e., g = C(k,2) = k(k − 1)/2).
 For example, if α = .05 and k = 4, then g = 6 and the experiment-wise Type I error rate = 1 − (.95)^6 = 1 − .735 = .265.
 The Bonferroni adjustment determines the required Type I error probability per pairwise comparison (α*) to secure a pre-determined overall α.
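A quick Python check of the α = .05, k = 4 example:

```python
a = 0.05  # significance level per comparison
k = 4     # number of populations

g = k * (k - 1) // 2     # number of pairwise comparisons
rate = 1 - (1 - a) ** g  # experiment-wise Type I error rate

print(g, round(rate, 3))  # 6 0.265
```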
Bonferroni Adjustment
 The procedure:
– Compute the number of pairwise comparisons g [g = k(k − 1)/2], where k is the number of populations.
– Set α* = α/g, where α is the true probability of making at least one Type I error (called the experiment-wise Type I error).
– Calculate the following CI for μi − μj:

   (x̄i − x̄j) ± t_{α*/2, n−k} · s · √(1/ni + 1/nj)

   d.f. = n − k,  s = √MSE
Bonferroni Method
 Example - continued
– Rank the effectiveness of the marketing
strategies (based on mean weekly sales).
– Use the Bonferroni adjustment method
 Solution
– The sample mean sales were 577.55, 653.00, 608.65.
– We calculate g = k(k − 1)/2 = 3(2)/2 = 3.
– We set α* = .05/3 = .0167, thus t_{.0167/2, 60−3} = 2.467 (Excel).
– Note that s = √MSE = √8,894.45 = 94.31

   x̄1 − x̄2 = 577.55 − 653.00 = −75.45
   x̄1 − x̄3 = 577.55 − 608.65 = −31.10
   x̄2 − x̄3 = 653.00 − 608.65 = 44.35

   t_{α*/2, n−k} · s · √(1/ni + 1/nj) = 2.467 × 94.31 × √(1/20 + 1/20) = 73.57
Bonferroni Method: The Three
Confidence Intervals
   (x̄i − x̄j) ± t_{α*/2, n−k} · s · √(1/ni + 1/nj),  d.f. = n − k, s = √MSE

   x̄1 − x̄2 = 577.55 − 653.00 = −75.45
   x̄1 − x̄3 = 577.55 − 608.65 = −31.10
   x̄2 − x̄3 = 653.00 − 608.65 = 44.35

   t_{α*/2, n−k} · s · √(1/ni + 1/nj) = 2.467 × 94.31 × √(1/20 + 1/20) = 73.57

   μ1 − μ2: −75.45 ± 73.57 → (−149.02, −1.88)
   μ1 − μ3: −31.10 ± 73.57 → (−104.67, 42.47)
   μ2 − μ3: 44.35 ± 73.57 → (−29.22, 117.92)

There is a significant difference between μ1 and μ2.
Bonferroni Method: Conclusions
Resulting from Confidence Intervals
Do we have evidence to distinguish two means?
 Group 1 Convenience: sample mean 577.55
 Group 2 Quality: sample mean 653
 Group 3 Price: sample mean 608.65
1  2 :  75.45  73.57 (149.02, 1.88)
1  3 :  31.10  73.57 (104.67, 42.47)
2  3 :44.35  73.57 (29.22,117.92)
 List the group numbers in increasing order of their sample
means; connecting overhead lines mean no significant
difference 1 3 2
