
1

2
 The one-way analysis of variance is used to test
the claim that three or more population means
are equal
 This is an extension of the two independent
samples t-test
 One-way ANOVA – An analysis of variance
procedure using one dependent and one
independent variable.

3
 The response variable is the variable you’re
comparing
 The factor variable is the categorical variable
being used to define the groups
◦ We will assume k samples (groups)
 It is called one-way because each value is classified in
exactly one way
◦ Examples include comparisons by gender, race, political
party, color, etc.

4
To use the one-way ANOVA test, the following
assumptions must hold:

◦ The populations under study are normally
distributed.
◦ The samples are drawn randomly, and each
sample is independent of the other samples.
◦ All the populations from which the sample
values are obtained have the same unknown
population variance; that is, for k populations,

$$ \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 $$

5
 There is a “family” of F
Distributions.
 Each member of the family is
determined by two parameters:
◦ the numerator degrees of freedom
◦ the denominator degrees of freedom.
 F cannot be negative, and it is a
continuous distribution.
 The F distribution is positively
skewed.
 Its values range from 0 to ∞
 As F → ∞ the curve approaches
the X-axis.

6
 Only one classification factor is considered.
[Layout diagram: each row is a treatment 1, 2, …, i (a level of the factor); each column is a replicate 1, …, j (an object assigned to the given treatment); the cells hold the response/outcome/dependent variable (the samples).]
7
 H0: µ1 = µ2 = µ3 = ... = µk
◦ All population means are equal
◦ No treatment effect
 Ha: Not all µi are equal
◦ At least 2 population means are different
◦ Treatment effect
◦ Writing Ha as µ1 ≠ µ2 ≠ ... ≠ µk is wrong
[Figure: two panels of density curves f(X) against X, one with µ1 = µ2 = µ3 and one where µ3 is separated from µ1 = µ2, annotated with "mean square (variance) within" and "mean square among".]

8
 If the null hypothesis is true,
◦ we would expect all the sample means to be close
to one another (and as a result, close to the grand
mean).

 If the alternative hypothesis is true,


◦ at least some of the sample means would differ.

9
 Variation
◦ Variation is the sum of the squares of the
deviations between each value and the mean of
the values.
 As long as the values are not identical,
there will be variation
 Denoted as SS for Sum of Squares

10
 Are all of the values identical?
◦ No, so there is some variation in the data
◦ This is called the total variation
◦ Denoted SS(Total) for the total Sum of
Squares (variation)
◦ Sum of Squares is another name for variation

11
 VARIATION BETWEEN GROUPS
◦ Are all of the sample means identical?
 No, so there is some variation between the groups
 For each data value, look at the difference between its
group mean and the overall mean. This is called the
between group variation
 Sometimes called the variation due to the factor
 Denoted SS(A) for Sum of Squares (variation)
between the groups

$$ (\bar{x}_i - \bar{x})^2 $$

12
 VARIATION WITHIN GROUPS
◦ Are all of the values within each group identical?
 No, there is some variation within the groups.
 For each data value, we look at the difference between that
value and the mean of its group. This is called the within
group variation
 Sometimes called the error variation
 Denoted SS(E) for Sum of Squares (variation) within
the groups

$$ (x_{ij} - \bar{x}_i)^2 $$

13
Variation is described by sums of squares.

The total variation is partitioned as follows:

SS(Total) = SS(Between) + SS(Within)

14
 ONE-WAY ANOVA TABLE

Source             SS    df    MS    F
Between (Factor)
Within (Error)
Total

15
One-way Analysis of Variance

Source   DF       SS       MS      F      P
Factor    2   2510.5   1255.3  93.44  0.000
Error    12    161.2     13.4
Total    14   2671.7

“Source” means “find the components of variation in this column”
“DF” means “degrees of freedom”
“SS” means “sums of squares”
“MS” means “mean square”
“F” means “F test statistic”
“P” means “P-value of the F test”
16
One-way Analysis of Variance

Source   DF       SS       MS      F      P
Factor    2   2510.5   1255.3  93.44  0.000
Error    12    161.2     13.4
Total    14   2671.7

“Factor” means “Variability between groups” or “Variability due to the factor of interest”
“Error” means “Variability within groups” or “unexplained random variation”
“Total” means “Total variation from the grand mean”
17
One-way Analysis of Variance

Source   DF    SS            MS    F
Factor   a-1   SS(Between)   MSA   MSA/MSE
Error    n-a   SS(Error)     MSE
Total    n-1   SS(Total)

MSA = SS(Between)/(a-1)
MSE = SS(Error)/(n-a)
n-1 = (a-1) + (n-a)
SS(Total) = SS(Between) + SS(Error)
18
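The table relations above can be sketched in a few lines of Python. This is only an illustration: the function name `anova_table` and its argument names are made up here, and the inputs are the sums of squares from the software output shown on the earlier slides (3 groups, 15 observations).

```python
def anova_table(ss_between, ss_error, a, n):
    """Complete a one-way ANOVA table from the two sums of squares.

    a = number of groups, n = total number of observations.
    """
    df_between, df_error = a - 1, n - a
    msa = ss_between / df_between   # mean square between groups (factor)
    mse = ss_error / df_error       # mean square within groups (error)
    return {
        "df": (df_between, df_error, n - 1),
        "SS": (ss_between, ss_error, ss_between + ss_error),
        "MSA": msa,
        "MSE": mse,
        "F": msa / mse,
    }


# Sums of squares taken from the example output (Factor 2510.5, Error 161.2).
table = anova_table(2510.5, 161.2, a=3, n=15)
print(f"df = {table['df']}, MSA = {table['MSA']:.2f}, "
      f"MSE = {table['MSE']:.2f}, F = {table['F']:.2f}")
```

The degrees of freedom and SS(Total) are obtained by addition, so only the two sums of squares and the two counts are needed as inputs.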


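$$ SST = \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n} $$

$$ SSE = \sum_{obs} (x_{ij} - \bar{x}_i)^2 $$

$$ SSA = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_i \frac{\left(\sum_j x_{ij}\right)^2}{n_i} - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n} $$

$$ SST = SSA + SSE; \qquad MS = \frac{SS}{DF}; \qquad F = \frac{MSA}{MSE} $$
19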
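 If the means are equal, F = MSA / MSE ≈ 1.
 If MSA is close to MSE, then both have the same source of variation.
 Only reject H0 if F is large; the test is always one-tailed.
 Reject H0 if F > F(α; k - 1, n - k); otherwise, do not reject H0.
[Figure: F density curve on an axis starting at 0, with the rejection region of area α in the upper tail beyond the critical value F(α; k - 1, n - k).]
20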
As production manager, you want to see if three filling
machines have different mean filling times. You assign
15 similarly trained and experienced workers, 5 per
machine, to the machines. At the 5% level of
significance, is there a difference in mean filling times?

Mach1 Mach2 Mach3


25.40 23.40 20.00
26.31 21.80 22.20
24.10 23.50 19.75
23.74 22.75 20.60
25.10 21.60 20.40

21
The summary statistics for the three filling
machines are shown in the table below.

              Mach 1   Mach 2   Mach 3
Sample size        5        5        5
Total         124.65   113.05   102.95

22
 The null hypothesis is that the means are all equal:
◦ H0: All machines have equal mean filling times
 The alternative hypothesis is that at least one of the means is different:
◦ H1: Not all machines have equal mean filling times

23
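$$ SSA = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_i \frac{\left(\sum_j x_{ij}\right)^2}{n_i} - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n} $$

$$ = \left[ \frac{124.65^2}{5} + \frac{113.05^2}{5} + \frac{102.95^2}{5} \right] - \frac{(340.65)^2}{15} $$

$$ = 7783.326 - 7736.162 $$

$$ = 47.164 $$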

24
$$ SST = \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n} $$

$$ = \left[ 25.4^2 + 26.31^2 + 24.1^2 + \cdots + 20.4^2 \right] - 7736.162 $$

$$ = 7794.379 - 7736.162 $$

$$ = 58.2172 $$

25
SST = SSA + SSE
SSE = SST − SSA

= 58.2172 − 47.164
= 11.0532
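
As a check on the hand calculation, the shortcut formulas for SSA, SST and SSE can be evaluated directly. This is a minimal sketch using only the filling-time data from the example; the variable names are illustrative.

```python
# Filling times for the three machines, as given in the example.
machines = {
    "Mach1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Mach2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Mach3": [20.00, 22.20, 19.75, 20.60, 20.40],
}

n = sum(len(v) for v in machines.values())             # total observations
grand_total = sum(sum(v) for v in machines.values())   # sum of all x_ij
correction = grand_total ** 2 / n                      # (sum of x_ij)^2 / n

# SSA: sum over groups of (group total)^2 / n_i, minus the correction term
ssa = sum(sum(v) ** 2 / len(v) for v in machines.values()) - correction
# SST: sum of all squared observations, minus the correction term
sst = sum(x ** 2 for v in machines.values() for x in v) - correction
# SSE follows by subtraction, since SST = SSA + SSE
sse = sst - ssa

print(f"SSA = {ssa:.4f}, SST = {sst:.4f}, SSE = {sse:.4f}")
```

Carrying full precision through the calculation is what gives SST = 58.2172 rather than the 58.217 obtained from the rounded intermediate values.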

26
Source               SS        df   MS   F
Between (Machines)   47.1640
Within (Error)       11.0532
Total                58.2172

27
 Filling in the degrees of freedom gives this …

Source               SS        df            MS   F
Between (Machines)   47.1640   3 - 1 = 2
Within (Error)       11.0532   15 - 3 = 12
Total                58.2172   15 - 1 = 14

28
 Completing the MS gives …

Source               SS        df            MS        F
Between (Machines)   47.1640   3 - 1 = 2     23.5820
Within (Error)       11.0532   15 - 3 = 12    0.9211
Total                58.2172   15 - 1 = 14

29
 Adding F to the table …

Source               SS        df            MS        F
Between (Machines)   47.1640   3 - 1 = 2     23.5820   25.60
Within (Error)       11.0532   15 - 3 = 12    0.9211
Total                58.2172   15 - 1 = 14

30
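H0: µ1 = µ2 = µ3
H1: Not all means are equal
α = .05
ν1 = 2, ν2 = 12
Critical value: F(.05; 2, 12) = 3.89

Test statistic:
F = MSA / MSE = 23.5820 / .9211 = 25.6

Decision:
Reject H0 at α = .05, since 25.6 > 3.89

Conclusion:
There is evidence that the three filling machines do not all have the same mean filling time
[Figure: F density curve with the rejection region of area α = .05 to the right of the critical value 3.89.]
31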
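
The critical value and the P-value for this example can be checked without tables, because an F distribution with exactly 2 numerator degrees of freedom has a closed-form upper tail: P(F > f) = (1 + 2f/ν2)^(-ν2/2). The sketch below relies on that special case only; the function names are made up for illustration, and a general F distribution would need a statistics library instead.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 numerator df and df2 denominator df."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha, df2):
    """Value c with P(F > c) = alpha; valid only for 2 numerator df."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


f_stat = 23.5820 / 0.9211             # MSA / MSE from the ANOVA table
crit = f_critical_df1_2(0.05, 12)     # critical value for alpha = .05
p_value = f_upper_tail_df1_2(f_stat, 12)

print(f"F = {f_stat:.2f}, critical value = {crit:.2f}, P = {p_value:.6f}")
print("Reject H0" if f_stat > crit else "Do not reject H0")
```

The P-value is far below .05, which agrees with the decision reached by comparing F with the critical value.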
One-way ANOVA: time versus Machine

Source DF SS MS F P
Machine 2 47.164 23.582 25.60 0.000
Error 12 11.053 0.921
Total 14 58.217

S = 0.9597 R-Sq = 81.01% R-Sq(adj) = 77.85%

Individual 95% CIs For Mean Based on


Pooled StDev
Level N Mean StDev -------+---------+---------+---------+--
1 5 24.930 1.032 (-----*-----)
2 5 22.610 0.882 (-----*-----)
3 5 20.590 0.959 (-----*-----)
-------+---------+---------+---------+--
20.8 22.4 24.0 25.6

Pooled StDev = 0.960

32
33
 An experiment was performed to determine whether
the annealing temperature of ductile iron affects its
tensile strength. Five specimens were annealed at each
of four temperatures. The tensile strength (in ksi) was
measured for each temperature. The results are
presented in the following table. Can you conclude that
there are differences among the mean strengths?

Temperature (°C)   Sample Values
750                19.72  20.88  19.63  18.68  17.89
800                16.01  20.04  18.10  20.28  20.53
850                16.66  17.38  14.49  18.21  15.58
900                16.93  14.49  16.15  15.53  13.25

34
Temperature (°C)   Sample size (n)   Total
750
800
850
900

35
36
One-way ANOVA: strength versus Temperature

Source DF SS MS F P
Temperature 3 58.65 19.55 8.49 0.001
Error 16 36.84 2.30
Total 19 95.49

S = 1.517 R-Sq = 61.42% R-Sq(adj) = 54.19%

Individual 95% CIs For Mean Based on


Pooled StDev
Level N Mean StDev -+---------+---------+---------+--------
750 5 19.360 1.133 (------*------)
800 5 18.992 1.924 (------*------)
850 5 16.464 1.467 (------*-------)
900 5 15.270 1.439 (------*-------)
-+---------+---------+---------+--------
14.0 16.0 18.0 20.0

Pooled StDev = 1.517
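
The same table can be reproduced from the definitions of the sums of squares (squared deviations from the group means and the grand mean) rather than from the shortcut formulas. The function below is an illustrative sketch, not the routine the software uses; `one_way_anova` is a hypothetical name.

```python
def one_way_anova(groups):
    """One-way ANOVA from the definitional sums of squares.

    groups: list of lists, one list of observations per factor level.
    """
    values = [x for g in groups for x in g]
    n, a = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group variation: each value contributes (group mean - grand mean)^2
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: each value contributes (value - its group mean)^2
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msa, mse = ssa / (a - 1), sse / (n - a)
    return {"means": means, "SSA": ssa, "SSE": sse, "SST": ssa + sse,
            "MSA": msa, "MSE": mse, "F": msa / mse}


# Tensile strengths (ksi) at 750, 800, 850 and 900 degrees C, from the example.
strength = [
    [19.72, 20.88, 19.63, 18.68, 17.89],
    [16.01, 20.04, 18.10, 20.28, 20.53],
    [16.66, 17.38, 14.49, 18.21, 15.58],
    [16.93, 14.49, 16.15, 15.53, 13.25],
]
result = one_way_anova(strength)
print(f"SSA = {result['SSA']:.2f}, SSE = {result['SSE']:.2f}, "
      f"SST = {result['SST']:.2f}, F = {result['F']:.2f}")
```

The group means, sums of squares and F statistic should agree with the software output above to the displayed precision.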

37
38
Confidence interval for each mean, µi

$$ \bar{x}_i \pm t_{\alpha/2,\,n-a} \sqrt{\frac{MSE}{n_i}} $$

39
1 1
( X 1 − X 2 ) ± t MSE  n + n 
1 2

 where t is obtained from the t table with degrees of


freedom (n - k).
 MSE = [SSE/(n - k)]
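
As an illustration of the interval for a difference of two means, the sketch below applies it to Machines 1 and 2 of the filling-time example. The helper name `pairwise_ci` is hypothetical, and the t value 2.179 is assumed to be the two-sided 95% table value for 12 degrees of freedom; check it against your own t table.

```python
from math import sqrt


def pairwise_ci(mean1, mean2, n1, n2, mse, t_crit):
    """Confidence interval for mu1 - mu2 using the MSE from the ANOVA table."""
    half_width = t_crit * sqrt(mse * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - half_width, diff + half_width


# Sample means 24.930 and 22.610, 5 observations each, MSE = 0.9211.
# t_crit = 2.179 is an assumed table value (95% two-sided, 12 df).
lower, upper = pairwise_ci(24.930, 22.610, 5, 5, mse=0.9211, t_crit=2.179)
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

Because MSE pools the variation from all groups, its degrees of freedom are n - k = 12 even though only two of the three machines are being compared.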

40
 When the null hypothesis is rejected, it may
be desirable to find which mean(s) is (are)
different.
 Two statistical inference procedures, geared
at doing this, are presented:
◦ “regular” confidence interval calculations
◦ Tukey test

41
 Two means are considered different if the
confidence interval for the difference
between the corresponding sample
means does not contain 0.
 In this case the larger sample mean is
believed to be associated with a larger
population mean.

42
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Machine

Individual confidence level = 97.94%

Machine = 1 subtracted from:

Machine Lower Center Upper ----+---------+---------+---------+-----


2 -3.9381 -2.3200 -0.7019 (------*-----)
3 -5.9581 -4.3400 -2.7219 (------*-----)
----+---------+---------+---------+-----
-5.0 -2.5 0.0 2.5

Machine = 2 subtracted from:

Machine Lower Center Upper ----+---------+---------+---------+-----


3 -3.6381 -2.0200 -0.4019 (------*-----)
----+---------+---------+---------+-----
-5.0 -2.5 0.0 2.5
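
For equal group sizes, the Tukey intervals above have the form (difference of sample means) ± q·sqrt(MSE/n), where q is a studentized range critical value. The sketch below recomputes them under that assumption. The value q = 3.77 is taken as the table value for 3 groups and 12 error degrees of freedom at the 5% level; treat it as an assumption, since the software output does not print it.

```python
from itertools import combinations
from math import sqrt

means = {1: 24.930, 2: 22.610, 3: 20.590}   # sample means per machine
n_per_group = 5
mse = 0.9211
q_crit = 3.77   # assumed studentized range table value (3 groups, 12 df, 5%)

half_width = q_crit * sqrt(mse / n_per_group)

intervals = {}
for i, j in combinations(means, 2):
    center = means[j] - means[i]   # "Machine = i subtracted from" machine j
    intervals[(i, j)] = (center - half_width, center, center + half_width)

for (i, j), (lo, center, hi) in intervals.items():
    print(f"Machine {j} - Machine {i}: {lo:.4f}  {center:.4f}  {hi:.4f}")
```

None of the three intervals contains 0, so every pair of machines is judged to differ in mean filling time.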

43
44
 Exactly two classification factors are considered.
[Layout diagram: rows are the levels 1, 2, … of Factor A; columns are the levels 1, 2, …, j of Factor B.]

45
 The standard two-way ANOVA tests are valid under the
following conditions:

◦ The design must be complete
 Observations are taken on every possible treatment
◦ The design must be balanced
 The number of replicates is the same for each treatment
◦ The number of replicates per treatment, n, must be at least 2
◦ Within any treatment, the observations are a simple random
sample from a normal population
◦ The sample observations are independent of each other (the
samples are not matched or paired in any way)
◦ The population variance is the same for all treatments.


46
Source        Df            Sum of Squares (SS), Mean Square (MS), and F value

A             a-1           Row effect
$$ SSA = \frac{1}{bn} \sum_{i=1}^{a} x_{i..}^2 - \frac{x_{...}^2}{abn}, \qquad MSA = \frac{SSA}{a-1}, \qquad F_{test} = \frac{MSA}{MSE} $$

B             b-1           Column effect
$$ SSB = \frac{1}{an} \sum_{j=1}^{b} x_{.j.}^2 - \frac{x_{...}^2}{abn}, \qquad MSB = \frac{SSB}{b-1}, \qquad F_{test} = \frac{MSB}{MSE} $$

Interaction   (a-1)(b-1)    Interaction effect
$$ SSAB = \frac{1}{n} \sum_{i=1}^{a} \sum_{j=1}^{b} x_{ij.}^2 - \frac{x_{...}^2}{abn} - SSA - SSB, \qquad MSAB = \frac{SSAB}{(a-1)(b-1)}, \qquad F_{test} = \frac{MSAB}{MSE} $$

Error         ab(n-1)
$$ SSE = SST - SSA - SSB - SSAB, \qquad MSE = \frac{SSE}{ab(n-1)} $$

Total         abn-1
$$ SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk}^2 - \frac{x_{...}^2}{abn} $$

47
 A chemical engineer is studying the effects of various reagents
and catalyst on the yield of a certain process. Yield is expressed
as a percentage of a theoretical maximum. 4 runs of the process
were made for each combination of 3 reagents and 4 catalysts.
Construct an ANOVA table and test whether there is an interaction
effect between reagents and catalysts.
Catalyst   Reagent 1      Reagent 2      Reagent 3
A          86.8  82.4     93.4  85.2     77.9  89.6
           86.7  83.5     94.8  83.1     89.9  83.7
B          71.9  72.1     74.5  87.1     87.5  82.7
           80.0  77.4     71.9  84.1     78.3  90.1
C          65.5  72.4     66.7  77.1     72.7  77.8
           76.6  66.7     76.7  86.1     83.5  78.8
D          63.9  70.4     73.7  81.6     79.8  75.7
           77.2  81.2     84.2  84.9     80.5  72.9
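
One way to build the requested table is to apply the two-way sum-of-squares formulas directly. The sketch below assumes each catalyst/reagent cell holds the four yields printed in its two-by-two block of the table, and it treats catalyst as the row factor A (a = 4) and reagent as the column factor B (b = 3). The variable names are illustrative, and the F statistics still need to be compared with critical values from an F table.

```python
# data[catalyst] -> three cells (reagents 1, 2, 3), four replicate yields each
data = {
    "A": [[86.8, 82.4, 86.7, 83.5], [93.4, 85.2, 94.8, 83.1], [77.9, 89.6, 89.9, 83.7]],
    "B": [[71.9, 72.1, 80.0, 77.4], [74.5, 87.1, 71.9, 84.1], [87.5, 82.7, 78.3, 90.1]],
    "C": [[65.5, 72.4, 76.6, 66.7], [66.7, 77.1, 76.7, 86.1], [72.7, 77.8, 83.5, 78.8]],
    "D": [[63.9, 70.4, 77.2, 81.2], [73.7, 81.6, 84.2, 84.9], [79.8, 75.7, 80.5, 72.9]],
}
cells = list(data.values())
a, b, n = len(cells), len(cells[0]), len(cells[0][0])

grand_total = sum(x for row in cells for cell in row for x in cell)
cf = grand_total ** 2 / (a * b * n)   # correction term x...^2 / (abn)

# Row (catalyst) and column (reagent) sums of squares from the marginal totals
ss_a = sum(sum(sum(cell) for cell in row) ** 2 for row in cells) / (b * n) - cf
ss_b = sum(sum(sum(cells[i][j]) for i in range(a)) ** 2 for j in range(b)) / (a * n) - cf
# Interaction: cell-total sum of squares with both main effects removed
ss_ab = sum(sum(cell) ** 2 for row in cells for cell in row) / n - cf - ss_a - ss_b
ss_t = sum(x ** 2 for row in cells for cell in row for x in cell) - cf
ss_e = ss_t - ss_a - ss_b - ss_ab

mse = ss_e / (a * b * (n - 1))
rows = [("Catalyst (A)", a - 1, ss_a), ("Reagent (B)", b - 1, ss_b),
        ("Interaction", (a - 1) * (b - 1), ss_ab)]
f_values = {}
for name, df, ss in rows:
    f_values[name] = (ss / df) / mse
    print(f"{name:<13} df={df:<3} SS={ss:9.2f} MS={ss / df:8.2f} F={f_values[name]:.2f}")
print(f"{'Error':<13} df={a * b * (n - 1):<3} SS={ss_e:9.2f} MS={mse:8.2f}")
print(f"{'Total':<13} df={a * b * n - 1:<3} SS={ss_t:9.2f}")
```

The interaction F statistic is the one to compare first, using (a-1)(b-1) = 6 and ab(n-1) = 36 degrees of freedom, since the main-effect tests are only straightforward to interpret when the interaction is not significant.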
48