Engineering Statistics (ANOVA)

The one-way analysis of variance is used to test the claim that three or more population means are equal.
This is an extension of the two independent samples t-test.
One-way ANOVA: an analysis of variance procedure using one dependent and one independent variable.
The response variable is the variable you are comparing.
The factor variable is the categorical variable used to define the groups.
◦ We will assume k samples (groups).
The design is called "one-way" because each value is classified in exactly one way.
◦ Examples include comparisons by gender, race, political party, color, etc.
To use the one-way ANOVA test, the following assumptions must hold:
◦ The populations under study are normally distributed.
◦ The samples are drawn randomly, and each sample is independent of the other samples.
◦ All the populations from which the sample values are obtained have the same unknown population variance; that is, for k populations,
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$
There is a “family” of F distributions.
Each member of the family is determined by two parameters:
◦ the numerator degrees of freedom
◦ the denominator degrees of freedom.
F cannot be negative, and it is a continuous distribution.
The F distribution is positively skewed.
Its values range from 0 to ∞.
As F → ∞ the curve approaches the X-axis.
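Only one classification factor is considered.
[Diagram: one-way data layout. Rows are the treatments 1, 2, …, i (the levels of the factor); columns are the replicates 1, …, j (the objects assigned to a given treatment); the entries are the samples of the response (outcome, dependent variable).]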
H0: µ1 = µ2 = µ3 = ... = µk
◦ All population means are equal
◦ No treatment effect
Ha: Not all µi are equal
◦ At least 2 population means are different
◦ Treatment effect
◦ Writing Ha as µ1 ≠ µ2 ≠ ... ≠ µk is wrong
[Figure: two sketches of the densities f(X), one for the case µ1 = µ2 = µ3 and one for a case where a mean differs, annotated with the mean square (variance) within groups and the mean square among groups.]
If the null hypothesis is true,
◦ we would expect all the sample means to be close to one another (and as a result, close to the grand mean).
If the alternative hypothesis is true,
◦ at least some of the sample means would differ.
Variation
◦ Variation is the sum of the squared deviations between each value and the mean of the values.
As long as the values are not all identical, there will be variation.
Denoted SS, for Sum of Squares.
Are all of the values identical?
◦ No, so there is some variation in the data
◦ This is called the total variation
◦ Denoted SS(Total) for the total Sum of Squares (variation)
◦ Sum of Squares is another name for variation
VARIATION BETWEEN GROUPS
◦ Are all of the sample means identical?
No, so there is some variation between the groups.
For each data value, look at the difference between its group mean and the overall mean. This is called the between-group variation.
Sometimes called the variation due to the factor.
Denoted SS(A) for Sum of Squares (variation) between the groups. Each data value contributes
$(\bar{x}_i - \bar{x})^2$
VARIATION WITHIN GROUPS
◦ Are the values within each group identical?
No, there is some variation within the groups.
For each data value, look at the difference between that value and the mean of its group. This is called the within-group variation.
Sometimes called the error variation.
Denoted SS(E) for Sum of Squares (variation) within the groups. Each data value contributes
$(x_{ij} - \bar{x}_i)^2$
Variation is described by sums of squares.
The total variation is partitioned as follows:
SS(Total) = SS(Between) + SS(Within)
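The partition of the total variation can be checked numerically. The following is a minimal sketch using only the Python standard library; the three small groups are hypothetical illustration data, and the function name `sums_of_squares` is an assumption made for this example.

```python
# Partition of the total sum of squares for grouped data.
# The groups below are hypothetical values chosen for easy hand checking.

def sums_of_squares(groups):
    """Return (ss_total, ss_between, ss_within) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Total variation: every value against the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Between groups: group mean against grand mean, once per value.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Within groups: every value against its own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return ss_total, ss_between, ss_within

groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
ss_total, ss_between, ss_within = sums_of_squares(groups)
print(ss_total, ss_between, ss_within)
```

For these values the between-group and within-group sums of squares add up to the total, which is the identity the slide states.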
ONE-WAY ANOVA TABLE
Source            SS   df   MS   F
Between (Factor)
Within (Error)
Total
One-way Analysis of Variance
Source  DF  SS      MS      F      P
Factor  2   2510.5  1255.3  93.44  0.000
Error   12  161.2   13.4
Total   14  2671.7
“Source” means “find the components of variation in this column”
“DF” means “degrees of freedom”
“SS” means “sums of squares”
“MS” means “mean square”
“F” means “F test statistic”
“Factor” means “variability between groups” or “variability due to the factor of interest”
“Error” means “variability within groups” or “unexplained random variation”
“Total” means “total variation from the grand mean”
Source  DF   SS           MS   F
Factor  a-1  SS(Between)  MSA  MSA/MSE
Error   n-a  SS(Error)    MSE
Total   n-1  SS(Total)
Here a is the number of groups (written k on other slides) and n is the total number of observations.
MSA = SS(Between)/(a-1)
MSE = SS(Error)/(n-a)
n-1 = (a-1) + (n-a)
SS(Total) = SS(Between) + SS(Error)

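$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n}$
$SSE = \sum_{obs} (x_{ij} - \bar{x}_i)^2$
$SSA = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_i \frac{\left(\sum_j x_{ij}\right)^2}{n_i} - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n}$
$SST = SSA + SSE; \quad MS = \frac{SS}{DF}; \quad F = \frac{MSA}{MSE}$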
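If the means are equal, F = MSA / MSE ≈ 1, where MSA is the treatment mean square (also written MST).
Reject H0 only if F is large.
[Figure: F distribution with the rejection region of area α in the right tail, beyond the critical value F(α; k – 1, n – k). Do not reject H0 to the left of the critical value.]
The test is always one-tailed.
If MSA is close to MSE, then both reflect the same source of variation.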
As production manager, you want to see if three filling
machines have different mean filling times. You assign
15 similarly trained and experienced workers, 5 per
machine, to the machines. At the 5% level of
significance, is there a difference in mean filling times?
Mach1  Mach2  Mach3
25.40  23.40  20.00
26.31  21.80  22.20
24.10  23.50  19.75
23.74  22.75  20.60
25.10  21.60  20.40
The summary statistics for the three filling machines are shown in the table below.
Row          Mach 1  Mach 2  Mach 3
Sample size  5       5       5
Total        124.65  113.05  102.95
The null hypothesis H0 is that the means are all equal:
◦ H0: All machines have equal mean filling times
The alternative hypothesis is that at least one of the means is different:
◦ H1: Not all machines have equal mean filling times
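$SSA = \sum_{obs} (\bar{x}_i - \bar{x})^2 = \sum_i \frac{\left(\sum_j x_{ij}\right)^2}{n_i} - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n}$
$= \left[\frac{124.65^2}{5} + \frac{113.05^2}{5} + \frac{102.95^2}{5}\right] - \frac{(340.65)^2}{15}$
$= 7783.326 - 7736.162$
$= 47.164$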
$SST = \sum_{obs} (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - \frac{\left(\sum_i \sum_j x_{ij}\right)^2}{n}$
$= \left[25.4^2 + 26.31^2 + 24.1^2 + \cdots + 20.4^2\right] - 7736.162$
$= 7794.379 - 7736.162$
$= 58.2172$
SST = SSA + SSE
SSE = SST − SSA
= 58.2172 − 47.164
= 11.0532
Entering the sums of squares in the ANOVA table, then filling in the degrees of freedom, completing the MS column, and adding F gives:
Source              SS       df           MS       F
Between (Machines)  47.1640  3 - 1 = 2    23.5820  25.60
Within (Error)      11.0532  15 - 3 = 12  0.9211
Total               58.2172  15 - 1 = 14
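The hand calculation for the filling machines can be reproduced with a short script. This is a sketch using the computational (shortcut) formulas for SST and SSA and only the standard library; the function name `one_way_anova` and the dictionary keys are assumptions made for this example.

```python
# One-way ANOVA for the filling-machine data, using the shortcut formulas
# SST = sum(x^2) - T^2/n and SSA = sum(T_i^2 / n_i) - T^2/n.

machines = {
    "Mach1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Mach2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Mach3": [20.00, 22.20, 19.75, 20.60, 20.40],
}

def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table as a dict."""
    n = sum(len(g) for g in groups)          # total number of observations
    a = len(groups)                          # number of groups
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n        # T^2 / n
    sst = sum(x * x for g in groups for x in g) - correction
    ssa = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = sst - ssa
    msa = ssa / (a - 1)
    mse = sse / (n - a)
    return {"ssa": ssa, "sse": sse, "sst": sst,
            "df_a": a - 1, "df_e": n - a,
            "msa": msa, "mse": mse, "f": msa / mse}

table = one_way_anova(list(machines.values()))
for key, value in table.items():
    print(f"{key:5s} {value:.4f}")
```

The printed values can be compared line by line with the table built up on the slides.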
H0: µ1 = µ2 = µ3
H1: Not all means are equal
α = .05
ν1 = 2, ν2 = 12
Critical value: F(.05; 2, 12) = 3.89
Test statistic: F = MSA / MSE = 23.5820 / 0.9211 = 25.6
Decision: Reject H0 at α = .05
Conclusion: There is evidence that the three filling machines have different mean filling times.
[Figure: F distribution with the rejection region of area α = .05 to the right of the critical value 3.89.]
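The critical value 3.89 can be cross-checked without a table in this particular case. As a sketch: when the numerator degrees of freedom equal 2, the upper-tail probability of the F distribution has the closed form P(F > f) = (1 + 2f/ν2)^(−ν2/2). This shortcut does not apply to other numerator degrees of freedom, where an F table or a statistics library is needed. The function names below are illustrative.

```python
# Upper-tail probability and critical value of F(2, df2).
# Valid ONLY for numerator degrees of freedom equal to 2.

def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with (2, df2) degrees of freedom."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

def f_critical_df1_2(alpha, df2):
    """Value c with P(F > c) = alpha for an F(2, df2) distribution."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

alpha = 0.05
df2 = 12
f_stat = 25.6  # test statistic from the filling-machine example

critical = f_critical_df1_2(alpha, df2)
p_value = f_upper_tail_df1_2(f_stat, df2)
reject = f_stat > critical

print(f"critical value = {critical:.3f}")
print(f"p-value        = {p_value:.6f}")
print("reject H0" if reject else "do not reject H0")
```

The computed critical value rounds to the tabulated 3.89, and the test statistic lies far inside the rejection region.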
One-way ANOVA: time versus Machine
Source DF SS MS F P
Machine 2 47.164 23.582 25.60 0.000
Error 12 11.053 0.921
Total 14 58.217
S = 0.9597 R-Sq = 81.01% R-Sq(adj) = 77.85%
Individual 95% CIs For Mean Based on Pooled StDev
Level N Mean StDev -------+---------+---------+---------+--
1 5 24.930 1.032 (-----*-----)
2 5 22.610 0.882 (-----*-----)
3 5 20.590 0.959 (-----*-----)
-------+---------+---------+---------+--
20.8 22.4 24.0 25.6
Pooled StDev = 0.960
An experiment was performed to determine whether the annealing temperature of ductile iron affects its tensile strength. Five specimens were annealed at each of four temperatures. The tensile strength (in ksi) was measured for each specimen. The results are presented in the following table. Can you conclude that there are differences among the mean strengths?
Temperature (°C)  Sample Values
750               19.72  20.88  19.63  18.68  17.89
800               16.01  20.04  18.10  20.28  20.53
850               16.66  17.38  14.49  18.21  15.58
900               16.93  14.49  16.15  15.53  13.25
Temperature (°C)  Sample size (n)  Total
750               5                96.80
800               5                94.96
850               5                82.32
900               5                76.35
One-way ANOVA: strength versus Temperature
Source DF SS MS F P
Temperature 3 58.65 19.55 8.49 0.001
Error 16 36.84 2.30
Total 19 95.49
S = 1.517 R-Sq = 61.42% R-Sq(adj) = 54.19%
Individual 95% CIs For Mean Based on Pooled StDev
Level N Mean StDev -+---------+---------+---------+--------
750 5 19.360 1.133 (------*------)
800 5 18.992 1.924 (------*------)
850 5 16.464 1.467 (------*-------)
900 5 15.270 1.439 (------*-------)
-+---------+---------+---------+--------
14.0 16.0 18.0 20.0
Pooled StDev = 1.517
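The software output for the annealing experiment can be checked from the raw data. The sketch below uses the definitional (deviation) form of the sums of squares rather than the shortcut formulas, and only the standard library; the variable names are assumptions made for this example.

```python
# One-way ANOVA for the tensile-strength data, using the definitions
# SSA = sum n_i (mean_i - grand_mean)^2 and SSE = sum (x_ij - mean_i)^2.

strength = {
    750: [19.72, 20.88, 19.63, 18.68, 17.89],
    800: [16.01, 20.04, 18.10, 20.28, 20.53],
    850: [16.66, 17.38, 14.49, 18.21, 15.58],
    900: [16.93, 14.49, 16.15, 15.53, 13.25],
}

all_values = [x for values in strength.values() for x in values]
n = len(all_values)            # total number of observations
a = len(strength)              # number of temperature levels
grand_mean = sum(all_values) / n

ssa = 0.0
sse = 0.0
for temperature, values in strength.items():
    total = sum(values)
    mean = total / len(values)
    # Per-level summary: temperature, sample size, total, mean.
    print(temperature, len(values), round(total, 2), round(mean, 3))
    ssa += len(values) * (mean - grand_mean) ** 2
    sse += sum((x - mean) ** 2 for x in values)

msa = ssa / (a - 1)
mse = sse / (n - a)
f_stat = msa / mse
print(f"SSA={ssa:.2f}  SSE={sse:.2f}  MSA={msa:.2f}  MSE={mse:.2f}  F={f_stat:.2f}")
```

The sums of squares, mean squares, and F statistic can be compared with the Temperature and Error rows of the output above.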
Confidence interval for each mean, µi:
$\bar{x}_i \pm t_{\alpha/2,\,n-a} \sqrt{\frac{MSE}{n_i}}$
1 1
( X 1 − X 2 ) ± t MSE  n + n 
1 2
where t is obtained from the t table with degrees of

freedom (n - k).
MSE = [SSE/(n - k)]
40
When the null hypothesis is rejected, it may be desirable to find which mean or means are different.
Two statistical inference procedures for doing this are presented:
◦ “regular” confidence interval calculations
◦ Tukey test
Two means are considered different if the
confidence interval for the difference
between the corresponding sample
means does not contain 0.
In this case the larger sample mean is
believed to be associated with a larger
population mean.
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Machine
Individual confidence level = 97.94%
Machine = 1 subtracted from:
Machine Lower Center Upper ----+---------+---------+---------+-----

2 -3.9381 -2.3200 -0.7019 (------*-----)
3 -5.9581 -4.3400 -2.7219 (------*-----)
----+---------+---------+---------+-----
-5.0 -2.5 0.0 2.5
Machine = 2 subtracted from:
Machine Lower Center Upper ----+---------+---------+---------+-----

3 -3.6381 -2.0200 -0.4019 (------*-----)
----+---------+---------+---------+-----
-5.0 -2.5 0.0 2.5
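The Tukey intervals above can be reproduced approximately by hand. In this sketch the studentized range value q = 3.77 (α = 0.05, 3 groups, 12 error degrees of freedom) is read from a standard studentized range table rather than computed, and the means and MSE come from the machine ANOVA output; the function name is an assumption made for this example.

```python
import math

MSE = 0.9211
N_PER_GROUP = 5
MEANS = {1: 24.930, 2: 22.610, 3: 20.590}
Q_CRIT = 3.77  # q(0.05; 3, 12) from a studentized range table (hard-coded assumption)

def tukey_interval(mean_i, mean_j, n_i, n_j, mse, q):
    """Simultaneous interval for mean_j - mean_i (level i subtracted from level j)."""
    half_width = q * math.sqrt(mse / 2.0 * (1.0 / n_i + 1.0 / n_j))
    center = mean_j - mean_i
    return center - half_width, center, center + half_width

intervals = {}
for i, j in [(1, 2), (1, 3), (2, 3)]:
    intervals[(i, j)] = tukey_interval(MEANS[i], MEANS[j],
                                       N_PER_GROUP, N_PER_GROUP, MSE, Q_CRIT)
    lower, center, upper = intervals[(i, j)]
    print(f"Machine {j} - Machine {i}: {lower:.4f}  {center:.4f}  {upper:.4f}")
```

None of the three intervals contains 0, which agrees with the conclusion that all pairs of machines differ in mean filling time.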
Two classification factors are considered.
[Diagram: two-way data layout with the levels of Factor A (1, 2, …) as rows and the levels of Factor B (1, 2, …, j) as columns.]
The standard two-way ANOVA tests are valid under the following conditions:
◦ The design must be complete: observations are taken on every possible treatment.
◦ The design must be balanced: the number of replicates is the same for each treatment.
◦ The number of replicates per treatment must be at least 2.
◦ Within any treatment, the observations are a simple random sample from a normal population.
◦ The sample observations are independent of each other (the samples are not matched or paired in any way).
◦ The population variance is the same for all treatments.
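Two-way ANOVA table, with a levels of factor A, b levels of factor B, and n replicates per treatment. For each source the entries are the degrees of freedom (df), the sum of squares (SS), the mean square (MS), and the F value.
A (row effect): df = a - 1; $SSA = \frac{1}{bn}\sum_{i=1}^{a} x_{i..}^2 - \frac{x_{...}^2}{abn}$; $MSA = \frac{SSA}{a-1}$; $F_{test} = \frac{MSA}{MSE}$
B (column effect): df = b - 1; $SSB = \frac{1}{an}\sum_{j=1}^{b} x_{.j.}^2 - \frac{x_{...}^2}{abn}$; $MSB = \frac{SSB}{b-1}$; $F_{test} = \frac{MSB}{MSE}$
Interaction (interaction effect): df = (a - 1)(b - 1); $SSAB = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} x_{ij.}^2 - \frac{x_{...}^2}{abn} - SSA - SSB$; $MSAB = \frac{SSAB}{(a-1)(b-1)}$; $F_{test} = \frac{MSAB}{MSE}$
Error: df = ab(n - 1); $SSE = SST - SSA - SSB - SSAB$; $MSE = \frac{SSE}{ab(n-1)}$
Total: df = abn - 1; $SST = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} x_{ijk}^2 - \frac{x_{...}^2}{abn}$
MSE is the estimate of the error variance: $\hat{\sigma}^2_{Error} = MSE$.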
A chemical engineer is studying the effects of various reagents and catalysts on the yield of a certain process. Yield is expressed as a percentage of a theoretical maximum. Four runs of the process were made for each combination of 3 reagents and 4 catalysts. Construct an ANOVA table and test whether there is an interaction effect between reagents and catalysts.
Catalyst  Reagent 1            Reagent 2            Reagent 3
A         86.8 82.4 86.7 83.5  93.4 85.2 94.8 83.1  77.9 89.6 89.9 83.7
B         71.9 72.1 80.0 77.4  74.5 87.1 71.9 84.1  87.5 82.7 78.3 90.1
C         65.5 72.4 76.6 66.7  66.7 77.1 76.7 86.1  72.7 77.8 83.5 78.8
D         63.9 70.4 77.2 81.2  73.7 81.6 84.2 84.9  79.8 75.7 80.5 72.9
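The two-way ANOVA table for this exercise can be built with the sum-of-squares formulas for a balanced design. The sketch below uses only the standard library. It assumes that each catalyst occupies two printed lines of the data table and that each reagent contributes two values per line, giving four runs per cell; it also treats catalyst as factor A (rows) and reagent as factor B (columns). The function name and dictionary keys are hypothetical.

```python
# Two-way ANOVA with replication for the catalyst-by-reagent yield data.
# cells[i][j] holds the n = 4 runs for catalyst i and reagent j
# (cell assignment is an assumption about the flattened table layout).

cells = [
    [[86.8, 82.4, 86.7, 83.5], [93.4, 85.2, 94.8, 83.1], [77.9, 89.6, 89.9, 83.7]],  # A
    [[71.9, 72.1, 80.0, 77.4], [74.5, 87.1, 71.9, 84.1], [87.5, 82.7, 78.3, 90.1]],  # B
    [[65.5, 72.4, 76.6, 66.7], [66.7, 77.1, 76.7, 86.1], [72.7, 77.8, 83.5, 78.8]],  # C
    [[63.9, 70.4, 77.2, 81.2], [73.7, 81.6, 84.2, 84.9], [79.8, 75.7, 80.5, 72.9]],  # D
]

def two_way_anova(cells):
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand_total = sum(x for row in cells for cell in row for x in cell)
    correction = grand_total ** 2 / (a * b * n)          # x...^2 / (abn)
    sst = sum(x * x for row in cells for cell in row for x in cell) - correction
    row_totals = [sum(sum(cell) for cell in row) for row in cells]
    col_totals = [sum(sum(cells[i][j]) for i in range(a)) for j in range(b)]
    ssa = sum(t * t for t in row_totals) / (b * n) - correction
    ssb = sum(t * t for t in col_totals) / (a * n) - correction
    ssab = (sum(sum(cell) ** 2 for row in cells for cell in row) / n
            - correction - ssa - ssb)
    sse = sst - ssa - ssb - ssab
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1),
          "E": a * b * (n - 1), "T": a * b * n - 1}
    mse = sse / df["E"]
    f = {"A": ssa / df["A"] / mse, "B": ssb / df["B"] / mse,
         "AB": ssab / df["AB"] / mse}
    return {"ss": {"A": ssa, "B": ssb, "AB": ssab, "E": sse, "T": sst},
            "df": df, "mse": mse, "f": f}

result = two_way_anova(cells)
for source in ("A", "B", "AB", "E", "T"):
    print(source, result["df"][source], round(result["ss"][source], 2))
print("F values:", {k: round(v, 2) for k, v in result["f"].items()})
```

To test for interaction, the interaction F value is compared with the critical value F(α; 6, 36) from an F table.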
