Analysis of Variance
Ankitha DK
10SJCCMIB006
What is ANOVA?
Group i has
• ni = # of individuals in group i
• xij = value for individual j in group i
• x̄i = mean for group i
• si = standard deviation for group i
How ANOVA works (outline)
ANOVA measures two sources of variation in the data and
compares their relative sizes
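Variation BETWEEN groups: $\sum_{\text{obs}} (\bar{x}_i - \bar{x})^2$ (each group mean compared with the overall mean $\bar{x}$)
Variation WITHIN groups: $\sum_{\text{obs}} (x_{ij} - \bar{x}_i)^2$ (each value compared with its own group mean)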
The ANOVA F-statistic is the ratio of the Between Group
Variation to the Within Group Variation:
$F = \frac{\text{Between}}{\text{Within}} = \frac{MSG}{MSE}$
A large F is evidence against H0, since it indicates that there is
more difference between groups than within groups.
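As a rough illustration of the outline above, here is a minimal Python sketch that computes the between-group and within-group mean squares and their ratio. The function name `f_statistic` and the toy data are made up for illustration and are not part of the slides; the degrees of freedom (k - 1 and n - k) are the ones used later in the deck.

```python
from statistics import mean

def f_statistic(groups):
    """Return (MSG, MSE, F) for a list of groups of numeric observations."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Between-group variation: group means compared with the overall mean
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: each value compared with its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return msg, mse, msg / mse

# Hypothetical toy data (not from the slides)
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
msg, mse, f = f_statistic(toy_groups)
print(f"MSG = {msg:.2f}, MSE = {mse:.2f}, F = {f:.2f}")
```

In the toy data the group means are far apart relative to the spread inside each group, so F comes out large.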
One Way Analysis of Variance
Example
An apple juice manufacturer is planning to develop a new
product, a liquid concentrate.
The marketing manager has to decide how to market the new
product.
Three strategies are considered:
• Emphasize convenience of using the product.
• Emphasize the quality of the product.
• Emphasize the product’s low price.
Example continued
An experiment was conducted as follows:
• In three cities an advertising campaign was launched.
• In each city only one of the three characteristics (convenience,
quality, and price) was emphasized.
• The weekly sales were recorded for twenty weeks following
the beginning of the campaigns.
Weekly sales
Convenience   Quality   Price
529           804       672
658           630       531
793           774       443
514           717       596
663           679       602
719           604       502
711           620       659
606           697       689
461           706       675
529           615       512
498           492       691
663           719       733
604           787       698
495           699       776
485           572       561
557           523       572
353           584       469
557           634       581
542           580       679
614           624       532
Solution
The data are interval (weekly sales figures).
The problem objective is to compare sales in three cities.
We hypothesize that the three population means are equal.
Defining the Hypotheses
• Solution
H0: μ1 = μ2 = μ3
H1: At least two means differ
[Figure: two dot plots of Treatment 1, Treatment 2, and Treatment 3, each with sample means x̄1 = 10, x̄2 = 15, x̄3 = 20.]
Left panel: A small variability within the samples makes it easier to draw a conclusion about the population means.
Right panel: The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
The rationale behind the test statistic – I
$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$
Solution – continued
Calculate SST
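The SST calculation can be sketched in a few lines of Python. This is only an illustration: it uses the three city means that follow from the sales table (577.55, 653.00, and 608.65, with 20 weeks each), and the variable names are invented here.

```python
# Group sizes and sample means for Convenience, Quality, Price
n_j = [20, 20, 20]
means = [577.55, 653.00, 608.65]

# Grand mean is the weighted average of the group means
grand_mean = sum(n * m for n, m in zip(n_j, means)) / sum(n_j)

# SST = sum over groups of n_j * (group mean - grand mean)^2
sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(n_j, means))

# MST divides SST by its degrees of freedom, k - 1
mst = sst / (len(means) - 1)

print(f"SST = {sst:.2f}, MST = {mst:.2f}")
```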
$F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$
with the following degrees of freedom: $\nu_1 = k - 1$ and $\nu_2 = n - k$
Required Conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
The F test rejection region
H0: μ1 = μ2 = … = μk
H1: At least two means differ
Test statistic: $F = \frac{MST}{MSE}$
R.R.: $F > F_{\alpha,\,k-1,\,n-k}$
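The decision rule can be sketched in Python for the juice example (k = 3 groups, n = 60 observations). Two assumptions are made here: the significance level α = 0.05 is not stated on this slide, and the critical value uses a closed form that holds only when the numerator degrees of freedom equal 2 (that is, k = 3). For other values of k a statistics library or an F table would be needed. The helper name `f_crit_df1_2` is hypothetical.

```python
def f_crit_df1_2(alpha, df2):
    """Critical value F_{alpha, 2, df2}.

    Valid only for numerator df = 2, where P(F > f) = (1 + 2f/df2)^(-df2/2)
    can be inverted in closed form.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

k, n = 3, 60                 # three campaigns, 20 weeks each
alpha = 0.05                 # assumed significance level
f_observed = 28756.12 / 8894.45   # MST / MSE from the slides

f_critical = f_crit_df1_2(alpha, n - k)
reject_h0 = f_observed > f_critical

print(f"F = {f_observed:.2f}, critical value = {f_critical:.2f}, reject H0: {reject_h0}")
```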
The F test
H0: μ1 = μ2 = μ3
H1: At least two means differ
Test statistic: $F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$
p-value = P(F > 3.23) = .0468
[Figure: plot of the F distribution illustrating the p-value, horizontal axis from 0 to 4.]
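As a cross-check on the p-value, here is a small Python sketch. It relies on the fact that for an F distribution with 2 numerator degrees of freedom the upper-tail probability has a simple closed form; this shortcut applies only because k = 3 in this example, and the function name `f_sf_df1_2` is made up for illustration.

```python
def f_sf_df1_2(f, df2):
    """Upper-tail probability P(F > f) for an F(2, df2) distribution.

    Closed form valid only when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

f_observed = 28756.12 / 8894.45     # MST / MSE from the slides
p_value = f_sf_df1_2(f_observed, 57)  # denominator df = n - k = 60 - 3

print(f"p-value = {p_value:.4f}")
```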
Excel single factor ANOVA
Xm15-01.xls
SUMMARY
Groups        Count     Sum   Average   Variance
Convenience      20   11551    577.55   10775.00
Quality          20   13060    653.00    7238.11
Price            20   12173    608.65    8670.24

ANOVA
Source of Variation       SS   df      MS      F   P-value   F crit
Between Groups         57512    2   28756   3.23    0.0468     3.16
Within Groups         506984   57    8894
Total                 564496   59
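The SS, df, MS, and F columns of the Excel output can be reproduced from the raw weekly sales with the Python standard library alone. The sketch below is one way to do it, not a description of how Excel computes the table; the variable names are chosen here for illustration.

```python
from statistics import mean

# Weekly sales from the example, one list per advertising strategy
sales = {
    "Convenience": [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                    498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    "Quality":     [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                    492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    "Price":       [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
                    691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
}

k = len(sales)
n = sum(len(v) for v in sales.values())
grand_mean = mean(x for v in sales.values() for x in v)
group_means = {name: mean(v) for name, v in sales.items()}

# Between-groups, within-groups, and total sums of squares
ssg = sum(len(v) * (group_means[name] - grand_mean) ** 2 for name, v in sales.items())
sse = sum((x - group_means[name]) ** 2 for name, v in sales.items() for x in v)
ss_total = sum((x - grand_mean) ** 2 for v in sales.values() for x in v)

msg = ssg / (k - 1)
mse = sse / (n - k)
f = msg / mse

print(f"Between Groups  SS={ssg:.0f}  df={k - 1}  MS={msg:.0f}  F={f:.2f}")
print(f"Within Groups   SS={sse:.0f}  df={n - k}  MS={mse:.0f}")
print(f"Total           SS={ss_total:.0f}  df={n - 1}")
```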
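In the ANOVA table above, F = MSG / MSE.
If we ignore the groups and treat all n observations as one sample, the sample variance is
$s^2 = \frac{\sum_{\text{obs}} (x_{ij} - \bar{x})^2}{n - 1} = \frac{SST}{DFT} = MST$
So SST = (n − 1)s², and MST = s². That is, SST and MST measure the TOTAL
variation in the data set. (Here SST is the total sum of squares, not the treatment sum of squares used in the F test slides; the between-group sum of squares is written SSG.)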
Connections between SSE, MSE, and standard deviation
Remember:
$s_i^2 = \frac{\sum_j (x_{ij} - \bar{x}_i)^2}{n_i - 1} = \frac{SS[\text{Within Group } i]}{df_i}$
So SS[Within Group i] = (s_i²)(df_i)
This means that we can compute SSE from the standard deviations and sizes (df)
of each group:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_I - 1)s_I^2}{n - I}$
$s_p^2 = \frac{(df_1)s_1^2 + (df_2)s_2^2 + \dots + (df_I)s_I^2}{df_1 + df_2 + \dots + df_I}$
so MSE is the pooled estimate of variance:
$s_p^2 = \frac{SSE}{DFE} = MSE$
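To make the pooled-variance formula concrete, here is a short Python sketch that rebuilds SSE and MSE from the group sizes and variances reported in the Excel SUMMARY table. The dictionary layout and variable names are chosen for illustration only.

```python
# (count, sample variance) per group, taken from the Excel SUMMARY table
groups = {
    "Convenience": (20, 10775.00),
    "Quality":     (20, 7238.11),
    "Price":       (20, 8670.24),
}

# SSE is the sum of (df_i) * s_i^2 over the groups
sse = sum((n_i - 1) * var_i for n_i, var_i in groups.values())
# DFE is the sum of the group degrees of freedom, which equals n - I
dfe = sum(n_i - 1 for n_i, _ in groups.values())

mse = sse / dfe            # pooled estimate of variance, s_p^2
pooled_sd = mse ** 0.5     # pooled estimate of standard deviation, s_p

print(f"SSE = {sse:.2f}, DFE = {dfe}, MSE = {mse:.2f}, s_p = {pooled_sd:.2f}")
```

Because the three groups have equal sizes, the pooled variance here is simply the average of the three group variances.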
In Summary
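$SST = \sum_{\text{obs}} (x_{ij} - \bar{x})^2 = s^2 (DFT)$
$SSE = \sum_{\text{obs}} (x_{ij} - \bar{x}_i)^2 = \sum_{\text{groups}} s_i^2 (df_i)$
$SSG = \sum_{\text{obs}} (\bar{x}_i - \bar{x})^2 = \sum_{\text{groups}} n_i (\bar{x}_i - \bar{x})^2$
$SSE + SSG = SST; \quad MS = \frac{SS}{DF}; \quad F = \frac{MSG}{MSE}$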
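The identity SSE + SSG = SST in the summary is stated without proof. A short derivation, written in the same notation, splits each deviation from the overall mean into a within-group part and a between-group part:

```latex
\begin{align*}
SST &= \sum_{i}\sum_{j} (x_{ij} - \bar{x})^2
     = \sum_{i}\sum_{j} \bigl[(x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})\bigr]^2 \\
    &= \sum_{i}\sum_{j} (x_{ij} - \bar{x}_i)^2
     + 2\sum_{i} (\bar{x}_i - \bar{x}) \sum_{j} (x_{ij} - \bar{x}_i)
     + \sum_{i} n_i (\bar{x}_i - \bar{x})^2 \\
    &= SSE + 0 + SSG
\end{align*}
```

The middle term vanishes because the deviations from a group's own mean sum to zero within each group.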
R² Statistic
R² gives the proportion (often quoted as a percent) of the total
variation that is due to between-group variation
$R^2 = \frac{SS[\text{Between}]}{SS[\text{Total}]} = \frac{SSG}{SST}$
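Applied to the juice example, the ratio can be computed directly from the sums of squares in the Excel ANOVA table (57512 between groups, 564496 total). This is a minimal sketch with illustrative variable names.

```python
ssg = 57512        # SS Between Groups, from the Excel ANOVA table
ss_total = 564496  # SS Total, from the Excel ANOVA table

r_squared = ssg / ss_total

# Report as a percent of total variation
print(f"R^2 = {r_squared:.4f} ({r_squared:.1%} of the total variation)")
```

Roughly a tenth of the variation in weekly sales is associated with the advertising strategy; the rest is variation within cities.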
         A         B
B   -3.685
     0.435
P   -4.863    -3.238
    -0.859     0.766
These give 98.01% CI’s for each pairwise difference.
Only P vs A is significant (both values have the same sign).
98% CI for A-P is (-4.86, -0.86)
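The "same sign" rule can be expressed in a few lines of Python. The interval endpoints below are copied from the output above; the pair labels and the helper `is_significant` are introduced here for illustration.

```python
# Confidence-interval endpoints for each pairwise difference, from the output above
intervals = {
    ("A", "B"): (-3.685, 0.435),
    ("A", "P"): (-4.863, -0.859),
    ("B", "P"): (-3.238, 0.766),
}

def is_significant(lower, upper):
    """A difference is significant when the interval excludes zero,
    i.e. both endpoints have the same sign."""
    return lower > 0 or upper < 0

significant_pairs = [pair for pair, (lower, upper) in intervals.items()
                     if is_significant(lower, upper)]

print(significant_pairs)
```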
Tukey’s Method in R
Tukey multiple comparisons of means
95% family-wise confidence level