
Ch. 12 The Analysis of Variance

• Example.

◦ A quality characteristic of electric motors is motor vibration.

◦ Does the mean amount of vibration depend on the type of bearing?

◦ An experiment:

◦◦ 5 brands of bearings

◦◦ 6 motors tested per brand

◦◦ ⇒ 5 × 6 = 30 vibration measurements recorded

◦ This is an example of a completely randomized single factor experiment.

◦ The single factor is bearing brand.

◦ There are 5 factor levels. Each brand is a level. Levels are sometimes called treatments.

◦ Each treatment has 6 observations, or replicates.


[Figure: side-by-side boxplots of motor vibration (microns) for bearing brands V1 to V5]
◦ A side-by-side boxplot indicates that there is variability

1. within each treatment

2. between different treatments

◦ The Analysis of Variance (ANOVA) is used to
determine whether the amount of variation between
different treatments is larger than the amount of
variation within each treatment.

◦ Alternative viewpoint: ANOVA analyzes a signal-to-noise ratio.
• An Ideal World: No Noise

◦ Without noise or error, the signal can be discerned without the use of statistics.

◦ e.g. 1: In a perfectly controlled environment, 4 measurements at each of 3 different factor levels were recorded:

Level A  Level B  Level C
13       10       13
13       10       13
13       10       13
13       10       13

◦ Without noise, variation between groups can be clearly seen (where it exists).
• The Real World: A Noisy Place

◦ With noise or error, the signal cannot be discerned without the use of statistics.

◦ e.g. 2: In a realistic environment, 4 measurements at each of 3 different factor levels were recorded:

Level A  Level B  Level C
14       12       15
12       10       14
15       11       12
13        8       11

◦ With noise, variation between groups (where it exists) is obscured (partially drowned out) by the noise.
◦ noise ⟺ variability within each level

◦ $\sigma^2$ = variance within each level; estimate it with the MSE
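To make "noise = variability within each level" concrete, here is a minimal sketch (assuming Python 3.8+ and only the standard library; the helper name `within_level_summary` is hypothetical) that computes each level's sample mean and sample variance for e.g. 1 and e.g. 2. Every level of e.g. 1 has variance 0, while every level of e.g. 2 has a positive variance.

```python
from statistics import mean, variance

# Data from e.g. 1 (no noise) and e.g. 2 (noisy), one list per factor level.
example1 = {"A": [13, 13, 13, 13], "B": [10, 10, 10, 10], "C": [13, 13, 13, 13]}
example2 = {"A": [14, 12, 15, 13], "B": [12, 10, 11, 8], "C": [15, 14, 12, 11]}

def within_level_summary(data):
    """Return {level: (sample mean, sample variance)}; the variance uses divisor n - 1."""
    return {level: (mean(xs), variance(xs)) for level, xs in data.items()}

for name, data in [("e.g. 1", example1), ("e.g. 2", example2)]:
    for level, (m, s2) in within_level_summary(data).items():
        print(f"{name} level {level}: mean = {m:.2f}, variance = {s2:.3f}")
```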


• The Completely Randomized Design (Balanced Case)

◦ $I$ independent random samples of measurements are taken:

$$
\begin{aligned}
&X_{11}, X_{12}, \ldots, X_{1J}\\
&X_{21}, X_{22}, \ldots, X_{2J}\\
&\qquad\vdots\\
&X_{I1}, X_{I2}, \ldots, X_{IJ}
\end{aligned}
$$

◦ There are $I$ factor levels or treatments.

◦ There are $J$ replicates per treatment (balanced design).


◦ The $j$th measurement from the $i$th treatment can be modelled as
$$X_{ij} = \mu_i + \varepsilon_{ij}$$
where

1. $\mu_i$ is the expected value of all measurements in the $i$th treatment group;

2. $\varepsilon_{ij}$ is the amount by which the $j$th replicate measurement differs from its expected value. This is called a random error, and we assume
$$E[\varepsilon_{ij}] = 0 \quad\text{and}\quad V(\varepsilon_{ij}) = \sigma^2.$$

◦ Since the measurements are independent of each other, the random errors $\varepsilon_{ij}$ are independent.
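The model can be simulated directly. The sketch below is illustrative only: it assumes normally distributed errors (the model above requires only $E[\varepsilon_{ij}] = 0$ and $V(\varepsilon_{ij}) = \sigma^2$), and the function name `simulate_crd`, the means, and the noise level are hypothetical choices.

```python
import random

def simulate_crd(mu, J, sigma, seed=None):
    """Draw X[i][j] = mu[i] + eps[i][j] with independent eps ~ Normal(0, sigma**2).

    Normal errors are an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    return [[m + rng.gauss(0.0, sigma) for _ in range(J)] for m in mu]

# Hypothetical treatment means and noise level, for illustration only.
X = simulate_crd(mu=[13.0, 10.0, 13.0], J=4, sigma=1.5, seed=1)
for i, row in enumerate(X, start=1):
    print(f"treatment {i}:", [round(x, 2) for x in row])
```

Setting `sigma=0.0` reproduces the noise-free table of e.g. 1 exactly; any positive `sigma` gives data that look like e.g. 2.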

◦ For e.g. 1, $\sigma^2 = 0$. For e.g. 2, $\sigma^2 > 0$ (estimate it using the MSE).
• Summarizing and Estimating Within Treatment Variability

◦ An estimate of $\sigma^2$ is based on the error sum of squares
$$SSE = \sum_{i=1}^{I}\sum_{j=1}^{J}\left(X_{ij} - \bar{X}_{i\cdot}\right)^2$$
where $\bar{X}_{i\cdot}$ denotes the $i$th treatment sample average, for $i = 1, 2, \ldots, I$.

◦ Dividing by the degrees of freedom remaining after estimating $I$ means gives an unbiased estimate of $\sigma^2$, the mean squared error
$$MSE = \frac{SSE}{I(J-1)}$$

◦ Note that the MSE is the average of the $I$ sample variances (this holds only in the balanced case).
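A short sketch (Python 3.8+ standard library; `sse_mse` is a hypothetical helper name) computes $SSE$ and $MSE$ for the e.g. 2 data and checks the balanced-case identity that the MSE equals the average of the $I$ sample variances.

```python
from statistics import mean, variance

def sse_mse(groups):
    """Error sum of squares and MSE for a balanced design (I groups of J replicates)."""
    I, J = len(groups), len(groups[0])
    assert all(len(g) == J for g in groups), "balanced design required"
    means = [mean(g) for g in groups]
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return sse, sse / (I * (J - 1))

groups = [[14, 12, 15, 13], [12, 10, 11, 8], [15, 14, 12, 11]]  # e.g. 2
sse, mse = sse_mse(groups)
print(sse, mse)                                # 23.75 and about 2.639
print(mean(variance(g) for g in groups))      # same value as the MSE (balanced case)
```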
• Summarizing and Estimating Between Treatment Variability

◦ The expected value of $X_{ij}$ is
$$E[X_{ij}] = E[\mu_i] + E[\varepsilon_{ij}] = \mu_i$$

◦ Differences between the $I$ treatment groups are reflected in the variability of $\mu_1, \mu_2, \ldots, \mu_I$.

◦ Estimates of these expected values are the $I$ sample averages $\bar{X}_{1\cdot}, \bar{X}_{2\cdot}, \ldots, \bar{X}_{I\cdot}$.
◦ The variability in these averages is related to the sum of squares of the differences between the sample averages and the grand average:
$$SSTr = J\sum_{i=1}^{I}\left(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}\right)^2, \qquad \bar{X}_{\cdot\cdot} = \frac{1}{I}\sum_{i=1}^{I}\bar{X}_{i\cdot}$$
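For the e.g. 2 data, the treatment sum of squares can be computed as follows. This is a minimal sketch using the Python 3.8+ standard library; `sstr_mstr` is a hypothetical helper name.

```python
from statistics import mean

def sstr_mstr(groups):
    """Treatment sum of squares J * sum_i (mean_i - grand)**2 and MSTr = SSTr / (I - 1)."""
    I, J = len(groups), len(groups[0])
    means = [mean(g) for g in groups]
    grand = mean(means)               # average of the I treatment averages
    sstr = J * sum((m - grand) ** 2 for m in means)
    return sstr, sstr / (I - 1)

groups = [[14, 12, 15, 13], [12, 10, 11, 8], [15, 14, 12, 11]]  # e.g. 2
print(sstr_mstr(groups))   # (24.5, 12.25)
```

Here the treatment averages are 13.5, 10.25 and 13, the grand average is 12.25, and $SSTr = 4\,(1.25^2 + 2^2 + 0.75^2) = 24.5$.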

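◦ One can show that, if all expected values are equal (and the errors are normally distributed),
$$\frac{SSTr}{\sigma^2} = \frac{J}{\sigma^2}\sum_{i=1}^{I}\left(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}\right)^2$$
has a $\chi^2$ distribution on $I - 1$ degrees of freedom.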
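A Monte Carlo check is not a proof, but it makes the claim tangible. The sketch below assumes normal errors, all treatment means equal to 0, and a hypothetical design with $I = 5$, $J = 6$, $\sigma = 2$; the simulated mean and variance of $SSTr/\sigma^2$ should come out close to 4 and 8, the mean and variance of a $\chi^2$ variable on $I - 1 = 4$ degrees of freedom.

```python
import random
from statistics import fmean, variance

def sstr_over_sigma2(I, J, sigma, rng):
    """One simulated value of SSTr / sigma**2 when all I treatment means are equal (0 here)."""
    means = [fmean(rng.gauss(0.0, sigma) for _ in range(J)) for _ in range(I)]
    grand = fmean(means)
    return J * sum((m - grand) ** 2 for m in means) / sigma ** 2

rng = random.Random(2024)          # fixed seed so the run is reproducible
I, J, sigma = 5, 6, 2.0            # hypothetical design; sigma is arbitrary
draws = [sstr_over_sigma2(I, J, sigma, rng) for _ in range(20000)]

# A chi-squared variable on I - 1 = 4 d.f. has mean 4 and variance 8.
print(f"simulated mean {fmean(draws):.2f}, simulated variance {variance(draws):.2f}")
```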
◦ Therefore, if all expected values are equal, then
$$E[MSTr] = E\left[\frac{SSTr}{I-1}\right] = \sigma^2$$

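◦ If the expected values differ, then
$$E[MSTr] = \sigma^2 + \text{extra variation} = \sigma^2 + \frac{J}{I-1}\sum_{i=1}^{I}\left(\mu_i - \bar{\mu}\right)^2, \qquad \bar{\mu} = \frac{1}{I}\sum_{i=1}^{I}\mu_i$$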

◦ Comparing $MSTr$ with an unbiased estimate of $\sigma^2$ tells us whether there is extra variation among the expected values.
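The "extra variation" term can be checked by simulation. This sketch assumes normal errors, $J = 6$, $\sigma = 1$, and two hypothetical mean vectors: one with all means equal (so $E[MSTr] = \sigma^2 = 1$) and one with a single shifted mean, for which $\sigma^2 + \frac{J}{I-1}\sum_i(\mu_i - \bar{\mu})^2 = 1 + \frac{6}{4}(0.8) = 2.2$. The helper names are hypothetical.

```python
import random
from statistics import fmean

def mstr_draw(mu, J, sigma, rng):
    """One simulated MSTr for treatment means mu, J replicates each, Normal(0, sigma**2) errors."""
    means = [fmean(m + rng.gauss(0.0, sigma) for _ in range(J)) for m in mu]
    grand = fmean(means)
    return J * sum((xb - grand) ** 2 for xb in means) / (len(mu) - 1)

def expected_mstr(mu, J, sigma):
    """sigma**2 + J / (I - 1) * sum_i (mu_i - mu_bar)**2."""
    mu_bar = fmean(mu)
    return sigma ** 2 + J * sum((m - mu_bar) ** 2 for m in mu) / (len(mu) - 1)

rng = random.Random(11)
results = {}
for mu in ([0, 0, 0, 0, 0], [0, 0, 0, 0, 1]):       # hypothetical mean vectors
    results[tuple(mu)] = fmean(mstr_draw(mu, 6, 1.0, rng) for _ in range(20000))
    print(mu, f"simulated {results[tuple(mu)]:.3f}, formula {expected_mstr(mu, 6, 1.0):.3f}")
```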
• Comparing Between and Within Variability

◦ If all treatment expected values are equal, then both $MSTr$ and $MSE$ are unbiased estimates of $\sigma^2$.

◦ Then the F ratio
$$f = \frac{MSTr}{MSE}$$
will tend to be near 1.

◦ If the treatment group expected values vary, then $f$ will tend to be larger than 1, since

◦◦ $MSE$ still estimates $\sigma^2$

◦◦ $MSTr$ tends to be larger than $\sigma^2$
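The following simulation sketch (normal errors, $I = 5$, $J = 6$, $\sigma = 1$, and hypothetical mean vectors, all assumptions for illustration) averages $f$ over many simulated experiments. With equal means the average should be close to $d_2/(d_2 - 2) = 25/23 \approx 1.09$, the mean of an F distribution with 25 denominator degrees of freedom; with one shifted mean it is well above 1.

```python
import random
from statistics import fmean

def f_statistic(groups):
    """f = MSTr / MSE for a balanced one-way layout (list of I lists of J values)."""
    I, J = len(groups), len(groups[0])
    means = [fmean(g) for g in groups]
    grand = fmean(means)
    mstr = J * sum((m - grand) ** 2 for m in means) / (I - 1)
    mse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (I * (J - 1))
    return mstr / mse

def mean_f(mu, J, sigma, reps, seed):
    """Average f over `reps` simulated experiments with treatment means `mu`."""
    rng = random.Random(seed)
    return fmean(
        f_statistic([[m + rng.gauss(0.0, sigma) for _ in range(J)] for m in mu])
        for _ in range(reps)
    )

print(mean_f([0, 0, 0, 0, 0], J=6, sigma=1.0, reps=5000, seed=3))  # near 1
print(mean_f([0, 0, 0, 0, 2], J=6, sigma=1.0, reps=5000, seed=3))  # well above 1
```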

◦ To decide whether $f$ is significantly larger than 1, we consult the F table, i.e. we test

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$$

$$H_a: \text{at least one mean differs}$$

Test statistic:
$$f = \frac{MSTr}{MSE}$$
p-value:
$$P(F > f)$$

◦ numerator degrees of freedom: $I - 1$

◦ denominator degrees of freedom: $I(J - 1)$
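Instead of bracketing the p-value with an F table, $P(F > f)$ can be computed exactly. The sketch below is a swapped-in technique, not part of the original notes: it uses the identity $P(F > f) = I_x(d_2/2,\, d_1/2)$ with $x = d_2/(d_2 + d_1 f)$, where $I_x$ is the regularized incomplete beta function, evaluated by a continued fraction (modified Lentz method, as in Numerical Recipes). It uses only the standard library; the function names are hypothetical. If SciPy is available, `scipy.stats.f.sf(f, dfn, dfd)` computes the same quantity.

```python
from math import exp, lgamma, log

def _betacf(a, b, x, max_iter=300, eps=3e-15):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_pvalue(f, df1, df2):
    """Upper-tail p-value P(F > f) for an F(df1, df2) random variable."""
    if f <= 0:
        return 1.0
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

print(f_pvalue(2.76, 4, 25))   # close to .05, matching the F-table entry for (4, 25) d.f.
```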


• Motor-vibration Example

◦ The 5 treatment averages and sample variances are

1. $\bar{X}_{1\cdot} = 13.68$, $S_1^2 = 1.43$

2. $\bar{X}_{2\cdot} = 15.95$, $S_2^2 = 1.36$

3. $\bar{X}_{3\cdot} = 13.67$, $S_3^2 = 0.67$

4. $\bar{X}_{4\cdot} = 14.73$, $S_4^2 = 0.88$

5. $\bar{X}_{5\cdot} = 13.08$, $S_5^2 = 0.23$

$I = 5$ and $J = 6$.

$$MSE = 0.913, \qquad SSTr = 30.85$$
so
$$MSTr = \frac{30.85}{4} = 7.71, \qquad f = \frac{MSTr}{MSE} = \frac{7.71}{0.913} = 8.45$$
Degrees of freedom: $I - 1 = 4$ and $I(J - 1) = 25$.

From the F table with 4 and 25 degrees of freedom:

Upper-tail probability   Critical value
.100                     2.18
.050                     2.76
.010                     4.18
.001                     6.49

$$P(F > f) = P(F > 8.45) < .001$$
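The motor-vibration calculation can be reproduced from the summary statistics alone, using the balanced-case fact that MSE is the average of the sample variances. In this sketch (Python 3.8+ standard library) the results come out as $MSE = 0.914$ and $SSTr = 30.88$; the small differences from the slide's 0.913 and 30.85 presumably come from the slide using unrounded data, while $f$ still rounds to 8.45.

```python
from statistics import fmean

# Summary statistics from the slide (J = 6 motors per brand).
means = [13.68, 15.95, 13.67, 14.73, 13.08]
variances = [1.43, 1.36, 0.67, 0.88, 0.23]
I, J = len(means), 6

mse = fmean(variances)                      # balanced case: MSE = average sample variance
grand = fmean(means)
sstr = J * sum((m - grand) ** 2 for m in means)
mstr = sstr / (I - 1)
f = mstr / mse
print(f"MSE = {mse:.3f}, SSTr = {sstr:.2f}, MSTr = {mstr:.2f}, f = {f:.2f}")
# MSE = 0.914, SSTr = 30.88, MSTr = 7.72, f = 8.45

# F-table critical values for (4, 25) d.f., copied from the slide.
critical = {0.100: 2.18, 0.050: 2.76, 0.010: 4.18, 0.001: 6.49}
print("reject H0 at level", min(a for a, c in critical.items() if f > c))   # 0.001
```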

◦ We conclude that there really are differences in performance among the different motor bearing brand means.
• Another Viewpoint: Analyzing or Breaking Down Variation

◦ Example.

◦◦ Metal plate-connected trusses used for roof support.

◦◦ Plate lengths (in inches): 4, 6, 8, 10, 12

◦◦ Response measurements: axial stiffness index (ASI, kips/in)

◦◦ 7 independent measurements per plate length ⇒ $J = 7$ replicates

◦ This is an example of a balanced CRD with $I = 5$ factor levels (plate lengths).

◦ Does variation in plate length have any effect on the true mean axial stiffness?
[Figure: Scatterplot of Plate-connected Trusses Data; axial stiffness index versus plate length]
◦ We will analyze the variation in the ASI measurements:

variation in ASI = variation due to possible differences in plate length + variation due to noise (error)

i.e.
$$SST = SSTr + SSE$$
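The raw ASI measurements are not listed here, so this sketch checks the decomposition $SST = SSTr + SSE$ on the e.g. 2 data instead, with $SST = \sum_i\sum_j (X_{ij} - \bar{X}_{\cdot\cdot})^2$. It assumes Python 3.8+ and a balanced layout; `sum_of_squares` is a hypothetical helper name.

```python
from statistics import fmean

def sum_of_squares(groups):
    """Return (SST, SSTr, SSE) for a balanced one-way layout."""
    J = len(groups[0])
    means = [fmean(g) for g in groups]
    grand = fmean(x for g in groups for x in g)   # equals the average of the means when balanced
    sst = sum((x - grand) ** 2 for g in groups for x in g)
    sstr = J * sum((m - grand) ** 2 for m in means)
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return sst, sstr, sse

groups = [[14, 12, 15, 13], [12, 10, 11, 8], [15, 14, 12, 11]]  # e.g. 2 data
sst, sstr, sse = sum_of_squares(groups)
print(sst, sstr + sse)   # 48.25 48.25
```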
◦ Model:
$$X_{ij} = \mu_i + \varepsilon_{ij}$$
where

◦◦ $\mu_i$ is the expected ASI for the $i$th plate length group (treatment group), and

◦◦ $\varepsilon_{ij}$ is the random disturbance associated with the $j$th measurement in the $i$th treatment group.

◦ Estimates of the expected values for each of the $I = 5$ treatment groups are

Plate length (in)               4     6     8     10    12
$\bar{x}_{i\cdot}$ (kips/in)    333   368   375   407   437

◦ From this, and the boxplot (or scatterplot), there appears to be a difference among the expected values.

◦ Is this difference real, or is it due to noise?

◦ ANOVA calculations: test for a difference among the means.

◦◦ $SST$ = total sum of squares = 75621.27 (recall: this summarizes all variability in the data set, $SST = \sum_{i=1}^{I}\sum_{j=1}^{J}(x_{ij} - \bar{x}_{\cdot\cdot})^2$).

◦◦ $SSTr = J\sum_{i=1}^{I}(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 = 43932$ (this is the sum of squares attributable to variation between treatment groups).

$$MSTr = \frac{SSTr}{I-1} = \frac{43932}{4} = 10983$$

$$SSE = SST - SSTr = 31689.27$$

$$MSE = \frac{SSE}{I(J-1)} = \frac{31689.27}{30} = 1056.309$$

$$f = \frac{MSTr}{MSE} = \frac{10983}{1056.309} = 10.4$$

p-value (from the F table with 4 and 30 degrees of freedom):
$$P(F > f) = P(F > 10.4) < .001$$
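The truss arithmetic can be checked from the treatment averages and the total sum of squares given on the slide; $SST$ is taken as given because the raw measurements are not listed. A minimal Python 3.8+ sketch:

```python
from statistics import fmean

# Treatment averages for plate lengths 4, 6, 8, 10, 12 in, and the total SS, from the slide.
xbar = [333, 368, 375, 407, 437]
sst = 75621.27
I, J = len(xbar), 7

grand = fmean(xbar)                              # 384.0
sstr = J * sum((m - grand) ** 2 for m in xbar)   # 43932.0
sse = sst - sstr                                 # about 31689.27
mstr, mse = sstr / (I - 1), sse / (I * (J - 1))
f = mstr / mse
print(f"SSTr = {sstr:.0f}, MSTr = {mstr:.0f}, SSE = {sse:.2f}, MSE = {mse:.3f}, f = {f:.1f}")
# SSTr = 43932, MSTr = 10983, SSE = 31689.27, MSE = 1056.309, f = 10.4
```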

◦ We conclude that there is strong evidence of a difference among the expected values of the ASI measurements: the p-value is below .001, so $H_0$ is rejected at the 5% level (and even at the 0.1% level).

• A Summary: the ANOVA Table

Source of variation   d.f.       SS     MS     f
Treatments            I − 1      SSTr   MSTr   MSTr/MSE
Error                 I(J − 1)   SSE    MSE
Total                 IJ − 1     SST
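The table can be assembled programmatically. This is a minimal sketch for the balanced case only (Python 3.8+ standard library; `anova_table` and `cell` are hypothetical helper names), applied here to the e.g. 2 data. The Total row's SS is computed as $SSTr + SSE$, relying on the decomposition $SST = SSTr + SSE$.

```python
from statistics import fmean

def anova_table(groups):
    """Rows (source, d.f., SS, MS, f) of the balanced one-way ANOVA table."""
    I, J = len(groups), len(groups[0])
    means = [fmean(g) for g in groups]
    grand = fmean(means)
    sstr = J * sum((m - grand) ** 2 for m in means)
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mstr, mse = sstr / (I - 1), sse / (I * (J - 1))
    return [
        ("Treatments", I - 1, sstr, mstr, mstr / mse),
        ("Error", I * (J - 1), sse, mse, None),
        ("Total", I * J - 1, sstr + sse, None, None),  # uses SST = SSTr + SSE
    ]

def cell(value, spec):
    """Format a number, or return an empty cell for entries the table leaves blank."""
    return "" if value is None else format(value, spec)

groups = [[14, 12, 15, 13], [12, 10, 11, 8], [15, 14, 12, 11]]  # e.g. 2 data
print(f"{'Source':<11}{'d.f.':>5}{'SS':>8}{'MS':>8}{'f':>7}")
for source, df, ss, ms, f in anova_table(groups):
    print(f"{source:<11}{df:>5}{ss:>8.2f}{cell(ms, '.3f'):>8}{cell(f, '.2f'):>7}")
```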
Exercise. Suppose 8 observations were taken on 3
different levels of a factor giving an MSE of 30 and an
SST of 850. Is there evidence that the three factor level
means differ?
