
Siti Nor Jannah bt Ahmad
Siti Shahida bt Kamel
Zamriyah bt Abu Samah

• Analysis of variance (ANOVA) is a statistical method for making simultaneous comparisons between two or more means.

• ANOVA is a general technique that can be used to test the hypothesis that the means among two or more groups are equal, under the assumption that the sampled populations are normally distributed.

• Analysis of variance can be used to test differences among several means for significance without increasing the Type I error rate.
• To begin, let us consider the effect of temperature on a passive component such as a resistor.

• We select three different temperatures and observe their effect on the resistors.

• This experiment can be conducted by measuring all the participating resistors before placing n resistors each in three different ovens.

• Each oven is heated to a selected temperature. Then we measure the resistors again after, say, 24 hours and analyze the responses, which are the differences between the measurements taken before and after exposure to the temperatures.

• The temperature is called a factor.

• The different temperature settings are called levels. In this example there are three levels or settings of the factor Temperature.
Different types of ANOVA

What is a factor?
A factor is an independent treatment variable whose settings (values) are controlled and varied by the experimenter. The intensity setting of a factor is the level. Levels may be quantitative numbers or, in many cases, simply "present" or "not present" ("0" or "1").

The 1-way ANOVA
In the experiment, there is only one factor, temperature, and the analysis of variance that we will be using to analyze the effect of temperature is called a one-way or one-factor ANOVA.

The 2-way or 3-way ANOVA
We could have opted to also study the effect of positions in the oven. In this case there would be two factors, temperature and oven position. Here we speak of a two-way or two-factor ANOVA.

Furthermore, we may be interested in a third factor, the effect of time. Now we deal with a three-way or three-factor ANOVA.
• You may use ANOVA whenever you have 2 or more independent groups.
• You must use ANOVA whenever you have 3 or more independent groups.

One-way ANOVA
• 1 factor, e.g. smoking status (never, former, current)

Two-way ANOVA
• 2 factors, e.g. gender and smoking status

Three-way ANOVA
• 3 factors, e.g. gender, smoking status and beer consumption
• F(2,27) = 8.80, p < .05
◦ F = test statistic
◦ 2, 27 = degrees of freedom
  – 2 = df between groups
  – 27 = df within groups
◦ 8.80 = obtained value of F
◦ p < .05 = if the null hypothesis were true, the probability of obtaining an F this large or larger would be less than 5%
• Reject the null hypothesis.
• Some of the group means differ significantly from each other.
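
As a rough check of the reported result, the upper-tail probability of F(2,27) = 8.80 can be computed directly. The sketch below uses a closed form that holds only when the numerator degrees of freedom equal 2, as they do here; for other degrees of freedom a statistics library would be needed. The function name `f_upper_tail_df2` is a hypothetical helper, not part of the slides.

```python
def f_upper_tail_df2(f_value: float, df_within: float) -> float:
    """P(F > f_value) for an F distribution with (2, df_within) degrees of freedom.

    Assumption: this closed form is valid only for numerator df = 2.
    """
    return (1.0 + 2.0 * f_value / df_within) ** (-df_within / 2.0)


p_value = f_upper_tail_df2(8.80, 27)
print(f"p = {p_value:.4f}")
print("reject H0 at the 5% level" if p_value < 0.05 else "do not reject H0")
```

The probability comes out well below .05, which agrees with the decision to reject the null hypothesis.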
• Example
◦ An apple juice manufacturer is planning to develop a new product, a liquid concentrate.
◦ The marketing manager has to decide how to market the new product.
◦ Three strategies are considered:
  – Emphasize convenience of using the product.
  – Emphasize the quality of the product.
  – Emphasize the product's low price.
• Example continued
◦ An experiment was conducted as follows:
  – In three cities an advertisement campaign was launched.
  – In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
  – The weekly sales were recorded for twenty weeks following the beginning of the campaigns.
Weekly sales

Convenience   Quality   Price
529           804       672
658           630       531
793           774       443
514           717       596
663           679       602
719           604       502
711           620       659
606           697       689
461           706       675
529           615       512
498           492       691
663           719       733
604           787       698
495           699       776
485           572       561
557           523       572
353           584       469
557           634       581
542           580       679
614           624       532
• In the context of this problem…
◦ Response variable – weekly sales
◦ Responses – actual sale values
◦ Experimental unit – the weeks in the three cities when we record sales figures.
◦ Factor – the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy.
◦ Factor levels – the population (treatment) names. In this problem the factor levels are the three marketing strategies: convenience, quality, and price.
• Solution
◦ The data are interval.
◦ The problem objective is to compare sales in three cities.
◦ We hypothesize that the three population means are equal.
• Solution
  H0: $\mu_1 = \mu_2 = \mu_3$
  H1: At least two means differ

To build the statistic needed to test the hypotheses use the following notation:
• If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean).
• If the alternative hypothesis is true, at least some of the sample means would differ.
• Thus, we measure variability between sample means.
• The variability between the sample means is measured as the sum of squared distances between each mean and the grand mean. This sum is called the Sum of Squares for Treatments (SST). In our example treatments are represented by the different advertising strategies.

$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$

where there are $k$ treatments, $n_j$ is the size of sample $j$, $\bar{x}_j$ is the mean of sample $j$, and $\bar{\bar{x}}$ is the grand mean.

Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, large SST indicates large variation between sample means, which supports H1.
• Solution – continued
Calculate SST

$\bar{x}_1 = 577.55$, $\bar{x}_2 = 653.00$, $\bar{x}_3 = 608.65$

The grand mean is calculated by

$$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k} = 613.07$$

$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2 = 20(577.55 - 613.07)^2 + 20(653.00 - 613.07)^2 + 20(608.65 - 613.07)^2 = 57{,}512.23$$
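
The SST arithmetic can be reproduced from the three sample means alone. This is a minimal sketch; the variable names are illustrative, and the sample sizes and means are the ones quoted in the example.

```python
# Sample sizes and sample means for Convenience, Quality, Price (from the example)
sample_sizes = [20, 20, 20]
sample_means = [577.55, 653.00, 608.65]

# Grand mean: size-weighted average of the sample means
n_total = sum(sample_sizes)
grand_mean = sum(n * m for n, m in zip(sample_sizes, sample_means)) / n_total

# Sum of Squares for Treatments: weighted squared distances from the grand mean
sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(sample_sizes, sample_means))

print(f"grand mean = {grand_mean:.2f}")
print(f"SST = {sst:,.2f}")
```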
• Large variability within the samples weakens the "ability" of the sample means to represent their corresponding population means.
• Therefore, even though sample means may markedly differ from one another, SST must be judged relative to the "within samples variability".
• The variability within samples is measured by adding all the squared distances between observations and their sample means. This sum is called the Sum of Squares for Error (SSE).
• In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all the three cities).
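• Solution – continued
Calculate SSE

$s_1^2 = 10{,}775.00$, $s_2^2 = 7{,}238.11$, $s_3^2 = 8{,}670.24$

$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$$

$$= (20 - 1)(10{,}775.00) + (20 - 1)(7{,}238.11) + (20 - 1)(8{,}670.24) = 506{,}983.50$$

(The variances are shown rounded to two decimals; the total is computed from the unrounded values.)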
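
The same SSE step, sketched from the rounded sample variances in the summary. Because the inputs are rounded to two decimals, the result differs from 506,983.50 by a small rounding amount. Variable names are illustrative.

```python
# Sample sizes and (rounded) sample variances for Convenience, Quality, Price
sample_sizes = [20, 20, 20]
sample_variances = [10775.00, 7238.11, 8670.24]

# SSE pools the within-sample variation: sum of (n_j - 1) * s_j^2
sse = sum((n - 1) * s2 for n, s2 in zip(sample_sizes, sample_variances))

# Mean square for error divides SSE by its degrees of freedom, n - k
df_within = sum(sample_sizes) - len(sample_sizes)
mse = sse / df_within

print(f"SSE = {sse:,.2f}, df = {df_within}, MSE = {mse:,.2f}")
```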
To perform the test we need to calculate the mean squares as follows:

$$MST = \frac{SST}{k-1} = \frac{57{,}512.23}{3-1} = 28{,}756.12 \qquad MSE = \frac{SSE}{n-k} = \frac{506{,}983.50}{60-3} = 8{,}894.45$$

$$F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$$

with the following degrees of freedom: $\nu_1 = k-1$ and $\nu_2 = n-k$.

Required Conditions:
1. The populations tested are normally distributed.
2. The variances of all the populations tested are equal.
And finally the hypothesis test:

H0: $\mu_1 = \mu_2 = \dots = \mu_k$
H1: At least two means differ

Test statistic: $F = \dfrac{MST}{MSE}$
R.R.: $F > F_{\alpha,\,k-1,\,n-k}$

For the example:

H0: $\mu_1 = \mu_2 = \mu_3$
H1: At least two means differ

Test statistic: $F = MST/MSE = 28{,}756.12 / 8{,}894.45 = 3.23$
R.R.: $F > F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,3-1,\,60-3} \approx 3.16$

Since 3.23 > 3.16, there is sufficient evidence to reject H0 in favor of H1, and argue that at least one of the mean sales is different from the others.

Anova: Single Factor

SUMMARY
Groups        Count   Sum     Average   Variance
Convenience   20      11551   577.55    10775.00
Quality       20      13060   653.00    7238.11
Price         20      12173   608.65    8670.24

ANOVA
Source of Variation   SS       df   MS      F      P-value   F crit
Between Groups        57512    2    28756   3.23   0.0468    3.16
Within Groups         506984   57   8894

Total                 564496   59

SS(Total) = SST + SSE
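
The whole one-way analysis can also be run from the raw weekly sales instead of the summary statistics. The sketch below uses only the Python standard library; `one_way_anova` is a hypothetical helper written for this illustration, and the p-value line relies on a closed form that is valid only because the between-groups degrees of freedom equal 2 in this example.

```python
from statistics import mean

# Weekly sales per city/strategy, taken from the example data
sales = {
    "Convenience": [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                    498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    "Quality": [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    "Price": [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
              691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
}


def one_way_anova(groups):
    """Return (SST, SSE, F, df_between, df_within) for a dict of samples."""
    samples = list(groups.values())
    all_values = [x for sample in samples for x in sample]
    k, n = len(samples), len(all_values)
    grand_mean = mean(all_values)
    means = [mean(sample) for sample in samples]
    # Between-treatments variation
    sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-samples variation
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    f_stat = (sst / (k - 1)) / (sse / (n - k))
    return sst, sse, f_stat, k - 1, n - k


sst, sse, f_stat, df1, df2 = one_way_anova(sales)

# Closed-form upper-tail probability; assumption: only valid when df1 == 2
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

print(f"SST = {sst:,.2f}  SSE = {sse:,.2f}")
print(f"F({df1},{df2}) = {f_stat:.2f}, p = {p_value:.4f}")
```

The figures should agree with the Between Groups and Within Groups rows of the table above.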


• Fixed effects
◦ If all possible levels of a factor are included in our analysis we have a fixed-effect ANOVA.
◦ The conclusion of a fixed-effect ANOVA applies only to the levels studied.
• Random effects
◦ If the levels included in our analysis represent a random sample of all the possible levels, we have a random-effect ANOVA.
◦ The conclusion of the random-effect ANOVA applies to all the levels (not only those studied).
• In some ANOVA models the test statistic of the fixed-effects case may differ from the test statistic of the random-effects case.
• Fixed and random effects – examples
◦ Fixed effects – the advertisement example. All the levels of the marketing strategies were included.
◦ Random effects – to determine if there is a difference in the production rate of 50 machines, four machines are randomly selected and their production recorded.
• Example
◦ Suppose that in the advertisement example, two factors are to be examined:
  – The effects of the marketing strategy on sales.
      Emphasis on convenience
      Emphasis on quality
      Emphasis on price
  – The effects of the selected media on sales.
      Advertise on TV
      Advertise in newspapers
• Solution
◦ We may attempt to analyze combinations of levels, one from each factor, using one-way ANOVA.
◦ The treatments will be:
  – Treatment 1: Emphasize convenience and advertise on TV
  – Treatment 2: Emphasize convenience and advertise in newspapers
  – …
  – Treatment 6: Emphasize price and advertise in newspapers
• Solution
◦ The hypotheses tested are:
  H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6$
  H1: At least two means differ.
• Solution
– In each one of six cities sales are recorded for ten weeks.
– In each city a different combination of marketing emphasis and media usage is employed.

           City 1    City 2    City 3    City 4    City 5    City 6
Emphasis   Convnce   Convnce   Quality   Quality   Price     Price
Media      TV        Paper     TV        Paper     TV        Paper

• The p-value = .0452.

• We conclude that there is evidence that differences exist in the mean weekly sales among the six cities.
• These results raise some questions:
◦ Are the differences in sales caused by the different marketing strategies?
◦ Are the differences in sales caused by the different media used for advertising?
◦ Are there combinations of marketing strategy and media that interact to affect the weekly sales?
• The current experimental design cannot provide answers to these questions.
• A new experimental design is needed.
                               Factor A: Marketing strategy
Factor B: Advertising media    Convenience     Quality         Price
TV                             City 1 sales    City 3 sales    City 5 sales
Newspapers                     City 2 sales    City 4 sales    City 6 sales

Are there differences in the mean sales caused by different marketing strategies?
Test whether mean sales of "Convenience", "Quality", and "Price" significantly differ from one another.

H0: $\mu_{Conv.} = \mu_{Quality} = \mu_{Price}$
H1: At least two means differ

Calculations are based on the sum of squares for factor A, SS(A).


Are there differences in the mean sales caused by different advertising media?
Test whether mean sales of "TV" and "Newspapers" significantly differ from one another.

H0: $\mu_{TV} = \mu_{Newspapers}$
H1: The means differ

Calculations are based on the sum of squares for factor B, SS(B).

Are there differences in the mean sales caused by interaction between marketing strategy and advertising medium?
Test whether mean sales of certain cells are different from the level expected.

Calculations are based on the sum of squares for interaction, SS(AB).

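$$SS(A) = rb\sum_{i=1}^{a} (\bar{x}[A]_i - \bar{\bar{x}})^2 = (10)(2)\{(\bar{x}_{conv.} - \bar{\bar{x}})^2 + (\bar{x}_{quality} - \bar{\bar{x}})^2 + (\bar{x}_{price} - \bar{\bar{x}})^2\}$$

$$SS(B) = ra\sum_{j=1}^{b} (\bar{x}[B]_j - \bar{\bar{x}})^2 = (10)(3)\{(\bar{x}_{TV} - \bar{\bar{x}})^2 + (\bar{x}_{Newspaper} - \bar{\bar{x}})^2\}$$

$$SS(AB) = r\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{x}[AB]_{ij} - \bar{x}[A]_i - \bar{x}[B]_j + \bar{\bar{x}})^2$$

$$SSE = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} (x_{ijk} - \bar{x}[AB]_{ij})^2$$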
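• Test for the difference between the levels of the main factors A and B

$$F = \frac{MS(A)}{MSE} \qquad F = \frac{MS(B)}{MSE}$$

where $MS(A) = SS(A)/(a-1)$, $MS(B) = SS(B)/(b-1)$, and $MSE = SSE/(n-ab)$.

Rejection region: $F > F_{\alpha,\,a-1,\,n-ab}$ for factor A and $F > F_{\alpha,\,b-1,\,n-ab}$ for factor B.

• Test for interaction between factors A and B

$$F = \frac{MS(AB)}{MSE}, \qquad MS(AB) = \frac{SS(AB)}{(a-1)(b-1)}$$

Rejection region: $F > F_{\alpha,\,(a-1)(b-1),\,n-ab}$
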
Required Conditions:
1. The response distribution is normal.
2. The treatment variances are equal.
3. The samples are independent.

Media        Convenience   Quality   Price
TV           491           677       575
TV           712           627       614
TV           558           590       706
TV           447           632       484
TV           479           683       478
TV           624           760       650
TV           546           690       583
TV           444           548       536
TV           582           579       579
TV           672           644       795
Newspaper    464           689       803
Newspaper    559           650       584
Newspaper    759           704       525
Newspaper    557           652       498
Newspaper    528           576       812
Newspaper    670           836       565
Newspaper    534           628       708
Newspaper    657           798       546
Newspaper    557           497       616
Newspaper    474           841       587
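
The two-factor sums of squares can be computed directly from this table. The following is a minimal sketch for a balanced design (the same number of replicates r in every cell), using only the standard library; the dictionary layout and variable names are choices made for this illustration.

```python
from statistics import mean

# sales[medium][strategy] -> ten weekly sales figures from the table
sales = {
    "TV": {
        "Convenience": [491, 712, 558, 447, 479, 624, 546, 444, 582, 672],
        "Quality": [677, 627, 590, 632, 683, 760, 690, 548, 579, 644],
        "Price": [575, 614, 706, 484, 478, 650, 583, 536, 579, 795],
    },
    "Newspaper": {
        "Convenience": [464, 559, 759, 557, 528, 670, 534, 657, 557, 474],
        "Quality": [689, 650, 704, 652, 576, 836, 628, 798, 497, 841],
        "Price": [803, 584, 525, 498, 812, 565, 708, 546, 616, 587],
    },
}
media = list(sales)                # levels of factor B
strategies = list(sales["TV"])     # levels of factor A
a, b, r = len(strategies), len(media), 10
n = a * b * r

# Cell, marginal, and grand means (marginals are means of cell means: balanced design)
cell = {(m, s): mean(sales[m][s]) for m in media for s in strategies}
mean_a = {s: mean([cell[m, s] for m in media]) for s in strategies}
mean_b = {m: mean([cell[m, s] for s in strategies]) for m in media}
grand = mean([x for m in media for s in strategies for x in sales[m][s]])

ss_a = r * b * sum((mean_a[s] - grand) ** 2 for s in strategies)
ss_b = r * a * sum((mean_b[m] - grand) ** 2 for m in media)
ss_ab = r * sum((cell[m, s] - mean_a[s] - mean_b[m] + grand) ** 2
                for m in media for s in strategies)
sse = sum((x - cell[m, s]) ** 2
          for m in media for s in strategies for x in sales[m][s])

mse = sse / (n - a * b)
f_a = ss_a / (a - 1) / mse
f_b = ss_b / (b - 1) / mse
f_ab = ss_ab / ((a - 1) * (b - 1)) / mse

print(f"SS(A) = {ss_a:.1f}  SS(B) = {ss_b:.1f}  SS(AB) = {ss_ab:.1f}  SSE = {sse:.1f}")
print(f"F(A) = {f_a:.2f}  F(B) = {f_b:.2f}  F(AB) = {f_ab:.2f}")
```

In the spreadsheet output that follows, factor A (marketing strategy) appears as the Columns row and factor B (advertising media) as the Sample row.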
• Example – continued
◦ Test of the difference in mean sales between the three marketing strategies
  H0: $\mu_{conv.} = \mu_{quality} = \mu_{price}$
  H1: At least two mean sales are different

ANOVA
Source of Variation   SS         df   MS        F      P-value   F crit
Sample                13172.0    1    13172.0   1.42   0.2387    4.02
Columns               98838.6    2    49419.3   5.33   0.0077    3.17
Interaction           1609.6     2    804.8     0.09   0.9171    3.17
Within                501136.7   54   9280.3

Total                 614757.0   59

Columns = Factor A (marketing strategies)

  F = MS(A)/MSE = MS(Marketing strategy)/MSE = 5.33
  Fcritical = $F_{\alpha,\,a-1,\,n-ab} = F_{.05,\,3-1,\,60-(3)(2)} = 3.17$ (p-value = .0077)

◦ At 5% significance level there is evidence to infer that differences in weekly sales exist among the marketing strategies.
• Example – continued
◦ Test of the difference in mean sales between the two advertising media
  H0: $\mu_{TV} = \mu_{Newspaper}$
  H1: The two mean sales differ

Sample row of the ANOVA table = Factor B (advertising media)

  F = MS(B)/MSE = MS(Media)/MSE = 1.42
  Fcritical = $F_{\alpha,\,b-1,\,n-ab} = F_{.05,\,2-1,\,60-(3)(2)} = 4.02$ (p-value = .2387)

◦ At 5% significance level there is insufficient evidence to infer that differences in weekly sales exist between the two advertising media.
• Example – continued
◦ Test for interaction between factors A and B
  H0: $\mu_{TV*conv.} = \mu_{TV*quality} = \dots = \mu_{newsp.*price}$
  H1: At least two means differ

Interaction row of the ANOVA table = Interaction AB (Marketing*Media)

  F = MS(AB)/MSE = MS(Marketing*Media)/MSE = .09
  Fcritical = $F_{\alpha,\,(a-1)(b-1),\,n-ab} = F_{.05,\,(3-1)(2-1),\,60-(3)(2)} = 3.17$ (p-value = .9171)

◦ At 5% significance level there is insufficient evidence to infer that the two factors interact to affect the mean weekly sales.
• To compare 2 or more means in a single test we use ANOVA.

• The type of ANOVA test to use is decided by the number of FACTORS in the experiment.

• The ANOVA will only tell whether there is a significant difference and gives no information on which mean(s) are different.

• Further pairwise comparisons of the means are required to gain further information on which mean(s) are different.

• Pairwise testing of means can increase the probability of Type I errors.

If we have to do pairwise t-tests after the ANOVA anyway, why not just do them and forget the ANOVA? That is a choice one can make, but the ANOVA may return a result of no significant difference in a single test, saving a lot of time and effort, and pairwise testing increases the probability of false results.
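
To give a feel for how quickly the Type I error risk grows with pairwise testing, the sketch below computes the chance of at least one false positive across all pairwise comparisons. It assumes, purely for illustration, that the pairwise tests are independent, which real pairwise t-tests on shared groups are not, so the numbers are approximate. The function name is hypothetical.

```python
from math import comb


def familywise_error_rate(num_groups: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error over all pairwise tests.

    Assumption (illustrative only): the pairwise tests are independent.
    """
    num_tests = comb(num_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** num_tests


for k in (3, 6):
    print(f"{k} groups -> {comb(k, 2)} pairwise tests, "
          f"familywise error rate ~ {familywise_error_rate(k):.3f}")
```

With three groups (three pairwise tests) the rate is already well above the nominal 5%, and with six groups (fifteen tests) it exceeds one half under this simplification.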
Thank You