
The one-way analysis of variance (one-way ANOVA) is sometimes referred to as the one-way
layout, and in the experimental design context as a single factor experiment in a completely
random design. It may also be thought of as the extension of the two-sample t-test for normal
means to more than two samples. No matter what it is called, it is the underlying model and
assumptions that characterize the one-way ANOVA model.
The assumptions apply to the data setup: for the random variable of interest, the response variable
Y, a data set in the form of observations on Y is available as k independent random samples of
size $n_j$, for $j = 1, 2, \ldots, k$, one from each of k normal populations. The k population variances are equal.
More notation: $X \sim N(\mu, \sigma^2)$ means that X is a normally distributed random variable with mean $\mu$
and variance $\sigma^2$. We will denote a collection of n independent and normally distributed random
variables $X_1, X_2, \ldots, X_n$, with means $\mu_i$ and common variance $\sigma^2$, as $X_i \sim \mathrm{NID}(\mu_i, \sigma^2)$.
$Y_{ij}$ will denote the ith sample value (observation, response) from the jth random sample, both as
the random variable $Y_{ij}$ and the data point $Y_{ij}$. Here $i = 1, 2, \ldots, n_j$ and $j = 1, 2, \ldots, k$. We can represent
the data as

Sample:      1            2          . . .      k
           Y_11         Y_12         . . .    Y_1k
           Y_21         Y_22         . . .    Y_2k
            .            .                      .
            .            .                      .
           Y_{n_1 1}    Y_{n_2 2}    . . .    Y_{n_k k}

k random samples from $N(\mu_1, \sigma^2),\; N(\mu_2, \sigma^2),\; \ldots,\; N(\mu_k, \sigma^2)$.
There are $N = \sum_{j=1}^{k} n_j$ independent random variables; $n_1$ are $\mathrm{NID}(\mu_1, \sigma^2)$, $n_2$ are $\mathrm{NID}(\mu_2, \sigma^2)$, ...,
$n_k$ are $\mathrm{NID}(\mu_k, \sigma^2)$, and there are k + 1 unknown parameters. Similarly there are N
data points. Note the sample sizes need not be equal. We will use the notation $n_j = n$ if the sample
sizes are equal. That is, in the equal sample size case N = kn. The design is said to be balanced in
the equal sample size case. Otherwise it is unbalanced.
In practice, one-way ANOVA data arise according to two data collection structures. 1) From a
sample survey, in which case the populations studied are referred to as k categories of a
classification variable. 2) From a designed experiment, where the populations studied are
referred to as k levels of a factor.
An example of a sample survey is a comparative study of mortgage interest rates in four regions of
the country (southeast, west, northeast, midwest), where a random sample of counties is selected
in each region and the current median interest rate recorded. Here region is the classification
variable and k = 4 categories, the four regions in the study. Note that it is not uncommon to
interchange the survey and experimental design terminology. However, our examples will be for
designed experiments.
Handsheet Example
A pulp mill's performance is based on pulp brightness as measured by a reflectance meter. An
operator prepares a pulp handsheet from a container of unbleached pulp and then reads the
reflectance using a brightness tester. An experiment was conducted to compare 4 operators, A, B,
C, D, in making the handsheets and reading their brightness. The plan was to have each operator
prepare 5 handsheets and measure brightness. Initially, 20 containers each holding enough pulp to
make a handsheet were randomly assigned to the 4 operators, 5 to an operator, and then the 5
handsheets for each operator were randomly ordered for reading. Here operator is the factor and
there are k = 4 levels. The reflectance data are

Operator
A B C D
59.8 59.8 60.7 61.0
60.0 60.2 60.5 60.8
60.8 60.4 60.7 60.6
60.8 59.9 60.9 60.5
59.8 60.0 60.3
(Note one of D's handsheets was accidentally destroyed).
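A sketch of how these data might be keyed into SAS for the analyses that follow (the data set name reflect1 and the variable names operator and bright are assumptions, not names from the notes):

* read the handsheet reflectance data, one operator-value pair at a time;
data reflect1;
  input operator $ bright @@;
datalines;
A 59.8  B 59.8  C 60.7  D 61.0
A 60.0  B 60.2  C 60.5  D 60.8
A 60.8  B 60.4  C 60.7  D 60.6
A 60.8  B 59.9  C 60.9  D 60.5
A 59.8  B 60.0  C 60.3
;
run;
* level sample sizes, means and variances, used repeatedly below;
proc means data=reflect1 n mean var;
  class operator;
  var bright;
run;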
A statistical model is a representation of the response, treated as a random variable, in terms of
other random variables and of parameters. The one-way ANOVA model, with fixed effects (see
Note 2 below), is
$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij},$$
where $\sum_{j=1}^{k} n_j \tau_j = 0$, and where $\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2)$, $j = 1, 2, \ldots, k$; $i = 1, 2, \ldots, n_j$.
Note 1: $\mu$ is called the grand mean, or overall mean, and is thought of as the mean of the
population means, or of the level means. It is an unknown parameter. If the model were $Y_{ij} = \mu + \varepsilon_{ij}$, it
would be inadequate and unrealistic.
Note 2: The jth population mean (or jth factor level mean), which we would denote as $\mu_j$, does not
appear explicitly in the model. It does appear as $\mu_j = \mu + \tau_j$, and $\tau_j = \mu_j - \mu$ appears explicitly in
the model as an unknown parameter. Each $\tau_j$ measures the difference between the mean of the jth
population and the overall population mean. It is called the effect of category j in survey data and
of level j in experimental design data. The $\tau_j$ are k parameters, called effects, and since they
represent deviations from a grand mean, and since there are k specified, or fixed, populations or
fixed levels in the one-way ANOVA, they are called fixed effects. In other words the model and
analysis apply only to the k specified populations or levels. In the balanced case the $\tau_j$ are required
to sum to zero, and in the unbalanced case the $n_j \tau_j$ are required to sum to zero.
Note 2a: We shall study the one-way ANOVA with random effects after we complete the fixed
effects analysis.
Note 3: If the model were $Y_{ij} = \mu + \tau_j$ it would still be inadequate and unrealistic if there were any
differences within a level. To account for within-level differences, the model adds $\varepsilon_{ij}$, the random
variable that measures the difference $Y_{ij} - \mu_j$. It is called the error, or experimental error, or
random error, component or term of the model. Therefore, the $\tau_j$'s measure the between sample, or
between level, variation and the errors measure the within sample, or within level, variation. The
errors account for all the variation in the data that the model does not explicitly represent.
Note 4: Y is random through $\varepsilon$, the only random component of the model. This means that
$E(Y_{ij}) = \mu + \tau_j$ and $\mathrm{Var}(Y_{ij}) = \sigma^2$. The model has k+2 parameters: $\mu, \tau_1, \tau_2, \ldots, \tau_k, \sigma^2$.
Point estimators of the parameters (note the dot notation implying summation):

Parameter      Estimator
$\mu$          $\hat\mu = \bar Y_{..} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} Y_{ij} / N$, where $N = \sum_{j=1}^{k} n_j$
$\tau_j$       $\hat\tau_j = \bar Y_{.j} - \bar Y_{..}$, where $\bar Y_{.j} = \sum_{i=1}^{n_j} Y_{ij} / n_j$
$\sigma^2$     $\hat\sigma^2 = s^2 = \sum_{j=1}^{k} (n_j - 1)\, s_j^2 \,/\, (N - k)$,
               where $s_j^2 = \sum_{i=1}^{n_j} (Y_{ij} - \bar Y_{.j})^2 / (n_j - 1)$ = the sample variance for level j
Note 1: $\sum_{j=1}^{k} n_j \hat\tau_j = 0$.
Note 2: The point estimator of a difference between effects, $\tau_j - \tau_{j'}$, is $\hat\tau_j - \hat\tau_{j'} = \bar Y_{.j} - \bar Y_{.j'}$.
Note 3: Some other notation is $T_{..}$ for the grand total and $T_{.j}$ for the jth sample total.
Note 4: The estimator of $\sigma^2$ is a weighted average of the level sample variances, where the weights
are the level degrees of freedom.
Note 5: For the reflectance data (k = 4, N = 19):
$\bar Y_{..} = 60.39474$, $\hat\tau_1 = \hat\tau_A = -0.15474$, $\hat\tau_2 = \hat\tau_B = -0.33474$, $\hat\tau_3 = \hat\tau_C = 0.22526$, $\hat\tau_4 = \hat\tau_D = 0.33026$;
$s_1^2 = 0.2680$, $s_2^2 = 0.0580$, $s_3^2 = 0.0520$, $s_4^2 = 0.04917$, and $\hat\sigma^2 = 0.11063$;
and, for example, $\hat\tau_4 - \hat\tau_2 = 0.665$ and $\hat\tau_4 - \hat\tau_3 = 0.105$.
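As a check of the weighted average, using the level variances just listed:
$$\hat\sigma^2 = \frac{4(0.2680) + 4(0.0580) + 4(0.0520) + 3(0.04917)}{15} = \frac{1.6595}{15} = 0.11063.$$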



It is important to make some observations about the estimators. In fact these observations will
apply to virtually all estimators in ANOVA models.
1) The estimators are intuitive.
2) The estimators are least squares estimators (LSE). This means that the estimators of the k+1
explicit model parameters, $\mu$ and the $\tau_j$, are derived by minimizing the sum of squares of deviations of
the response variable, $Y_{ij}$, from its expected value $\mu + \tau_j$. That is, the least squares estimators
$\hat\mu$ and $\hat\tau_j$ are obtained by minimizing the sum of squares
$$\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j)^2$$
with respect to $\mu$ and the $\tau_j$. The estimator of $\sigma^2$ is obtained by first finding the expected value of the minimized
sum of squares, and then adjusting it to find an unbiased estimator of $\sigma^2$. (Recall $\hat\theta$ is an
unbiased estimator of $\theta$ if $E(\hat\theta) = \theta$ no matter what the true value of $\theta$.)


Least Squares Estimation of $\mu$ and $\tau_j$
Let SS = $\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j)^2$. Then
$$\frac{\partial\, SS}{\partial \mu} = -2 \sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j)
\qquad \text{and} \qquad
\frac{\partial\, SS}{\partial \tau_j} = -2 \sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j), \quad j = 1, 2, \ldots, k.$$
Since $\sum_{j=1}^{k} n_j \tau_j = 0$,
$$\frac{\partial\, SS}{\partial \mu} = -2 (N \bar Y_{..} - N\mu),
\qquad \text{and} \qquad
\frac{\partial\, SS}{\partial \tau_j} = -2 (n_j \bar Y_{.j} - n_j \mu - n_j \tau_j).$$
The least squares estimators of $\mu$ and $\tau_j$ are denoted $\hat\mu$ and $\hat\tau_j$, and are found by setting the
derivatives equal to zero, replacing $\mu$ and $\tau_j$ by their estimators, and solving for the estimators. That
is,
$$-2(N \bar Y_{..} - N\hat\mu) = 0 \;\Rightarrow\; \hat\mu = \bar Y_{..}$$
$$-2(n_j \bar Y_{.j} - n_j\hat\mu - n_j\hat\tau_j) = 0, \quad j = 1, 2, \ldots, k \;\Rightarrow\; \hat\tau_j = \bar Y_{.j} - \bar Y_{..}$$
Least Squares Estimator of $\sigma^2$
Denote by SSE the value of SS when the parameters are replaced by their estimators. That is,
$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \hat\mu - \hat\tau_j)^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar Y_{.j})^2 = \sum_{j=1}^{k} (n_j - 1)\, s_j^2,$$
and in terms of the random errors,
$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (\varepsilon_{ij} - \bar\varepsilon_{.j})^2, \qquad \text{where } \bar\varepsilon_{.j} = \sum_{i=1}^{n_j} \varepsilon_{ij} / n_j.$$
Note that SSE is the minimized value of SS.
We are looking for E(SSE). Recall the expected value of a sum is the sum of expected values.
Therefore,
$$E(SSE) = \sum_{j=1}^{k}\sum_{i=1}^{n_j} E(\varepsilon_{ij} - \bar\varepsilon_{.j})^2,$$
and we can write $(\varepsilon_{ij} - \bar\varepsilon_{.j})^2 = \varepsilon_{ij}^2 - 2\varepsilon_{ij}\bar\varepsilon_{.j} + \bar\varepsilon_{.j}^2$. Straightforwardly,
$$E(\varepsilon_{ij}^2) = \sigma^2, \qquad E(\varepsilon_{ij}\varepsilon_{i'j}) = 0, \qquad E(\varepsilon_{ij}\bar\varepsilon_{.j}) = \sigma^2/n_j, \qquad E(\bar\varepsilon_{.j}^2) = \sigma^2/n_j.$$
Therefore,
$$E(SSE) = N\sigma^2 - 2k\sigma^2 + k\sigma^2 = (N-k)\sigma^2,$$
and therefore the least squares estimator of $\sigma^2$ is
$$\hat\sigma^2 = SSE/(N-k).$$
Note that least squares estimation does not require normality, only the response's expected value.
3) The minimized sum of squares plays an important role in ANOVA. It is called the error sum of
squares, the residual sum of squares, or the within sample sum of squares. We'll denote it as SSE.
That is,
$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \hat\mu - \hat\tau_j)^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar Y_{.j})^2.$$
Note also that $s^2 = SSE/(N-k)$.
4) The estimators of $\mu$ and the $\tau_j$ are maximum likelihood estimators (MLE). The maximum
likelihood estimator of $\sigma^2$ is $(N-k)s^2/N$.
Recall maximum likelihood estimation requires a probability distribution, and the MLEs are such that the
likelihood function is maximized with respect to the unknown parameters.
In the one-way ANOVA the likelihood function is
$$L = \prod_{j=1}^{k}\prod_{i=1}^{n_j} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(Y_{ij} - \mu - \tau_j)^2}{2\sigma^2}\right),$$
and maximization of the likelihood function with respect to the k+2 parameters gives $\hat\mu$, the $\hat\tau_j$, and $\hat\sigma^2_{ML}$.
Maximum Likelihood Estimation of $\mu$, $\tau_j$, $\sigma^2$
Let L denote the likelihood function,
$$L = \prod_{j=1}^{k}\prod_{i=1}^{n_j} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(Y_{ij} - \mu - \tau_j)^2}{2\sigma^2}\right),$$
and therefore
$$\ln L = -\frac{N}{2}\ln 2\pi - \frac{N}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j)^2,$$
which means
$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j),$$
$$\frac{\partial \ln L}{\partial \tau_j} = \frac{1}{\sigma^2}\sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j), \quad j = 1, 2, \ldots, k,$$
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{N}{2}\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j)^2.$$
Setting the k+2 derivatives equal to zero and solving gives
$$\hat\mu_{ML} = \bar Y_{..}, \qquad \hat\tau_{j,ML} = \bar Y_{.j} - \bar Y_{..},$$
$$\hat\sigma^2_{ML} = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar Y_{.j})^2}{N} = \frac{SSE}{N} = \frac{(N-k)\hat\sigma^2}{N}.$$
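As a numerical illustration with the reflectance data, $SSE = \sum_j (n_j - 1)s_j^2 = 1.6595$, so the maximum likelihood estimate of $\sigma^2$ would be $SSE/N = 1.6595/19 \approx 0.0873$, smaller than the unbiased least squares estimate $s^2 = SSE/(N-k) = 0.1106$.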





The key to the ANOVA is the breakdown, or decomposition, of the total sum of squares. This
decomposition will apply to all our ANOVA models. It is known as the fundamental identity of the
analysis of variance.
The fundamental identity applied to the one-way ANOVA model begins by writing the deviation
of each observed response from the mean response, $Y_{ij} - \bar Y_{..}$, as
$$Y_{ij} - \bar Y_{..} = (Y_{ij} - \bar Y_{.j}) + (\bar Y_{.j} - \bar Y_{..}),$$
and then squaring,
$$(Y_{ij} - \bar Y_{..})^2 = (Y_{ij} - \bar Y_{.j})^2 + (\bar Y_{.j} - \bar Y_{..})^2 + 2(Y_{ij} - \bar Y_{.j})(\bar Y_{.j} - \bar Y_{..}),$$
and then summing over all the data (the cross-product term sums to zero), gives
$$\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar Y_{..})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar Y_{.j})^2 + \sum_{j=1}^{k} n_j (\bar Y_{.j} - \bar Y_{..})^2.$$
More Terminology and Notation
The left hand side is the total sum of squares and it will be denoted SSTO. It measures the total
variation, relative to the mean, in the data.
The total sum of squares has been decomposed into two components:

$\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar Y_{.j})^2$ = within samples sum of squares
= within levels sum of squares
= within groups sum of squares
= residual sum of squares = error sum of squares = SSE
= $\sum_{j=1}^{k} (n_j - 1)\, s_j^2$

$\sum_{j=1}^{k} n_j (\bar Y_{.j} - \bar Y_{..})^2$ = between samples sum of squares
= between levels sum of squares
= between groups sum of squares
= factor sum of squares
= treatment sum of squares = SST

Note 1: In the balanced case SST = $n \sum_{j=1}^{k} (\bar Y_{.j} - \bar Y_{..})^2$.
Note 2: SSE = $\sum_{j=1}^{k} (n_j - 1)\, s_j^2$, a weighted sum of the level variances where the weights are the
individual degrees of freedom, and in the balanced case SSE = $(n-1)\sum_{j=1}^{k} s_j^2$.
Note 3: A computing formula for SST that uses totals is
$$SST = \sum_{j=1}^{k} T_{.j}^2 / n_j \;-\; T_{..}^2 / N.$$
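As a check with the reflectance totals ($T_{.A} = 301.2$, $T_{.B} = 300.3$, $T_{.C} = 303.1$, $T_{.D} = 242.9$, $T_{..} = 1147.5$):
$$SST = \frac{301.2^2}{5} + \frac{300.3^2}{5} + \frac{303.1^2}{5} + \frac{242.9^2}{4} - \frac{1147.5^2}{19} = 69304.33 - 69302.96 = 1.37.$$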

Note 4: The fundamental identity is SSTO = SSE + SST. Informally, given SSTO, the bigger is
SST the smaller is SSE. The bigger is SST the more different the effects and the more different the
means. The smaller is SSE the better the model.
Note 5: On degrees of freedom: 1) The total degrees of freedom in a data set of N observations is
N-1, where one degree of freedom in the data set is associated with the grand mean. Therefore
SSTO has N-1 degrees of freedom. Similarly, each level in the one-way ANOVA has $n_j - 1$ degrees
of freedom (each level "carries" $n_j - 1$ degrees of freedom). This in turn implies that SSE has
$N - k = \sum_{j=1}^{k}(n_j - 1)$ degrees of freedom. The k levels have k-1 degrees of freedom, which implies SST has
k-1 degrees of freedom. The degrees of freedom add corresponding to the fundamental identity's
breakdown. That is, corresponding to SSTO = SST + SSE, is N-1 = (k-1) + (N-k). 2) When a sum
of squares is divided by its degrees of freedom, the result is called a mean square (MS). The two
mean squares in the one-way ANOVA are the treatment mean square, MST = SST/(k-1), and the
error mean square, MSE = SSE/(N-k) = $\hat\sigma^2$.
Note 5a: When the model term $\mu + \tau_j$ is ignored, the estimated variance is SSTO/(N-1) = $s_y^2$, say,
and it is important to distinguish between $s^2 = \hat\sigma^2$ and $s_y^2 = SSTO/(N-1)$.
Note 6: The first four columns of the ANOVA table summarize these calculations:

Source of     Sum of     Degrees of    Mean
Variation     Squares    Freedom       Square
(SV)          (SS)       (DF)          (MS)
Treatment     SST        k-1           MST
Error         SSE        N-k           MSE = s^2 = $\hat\sigma^2$
Total         SSTO       N-1

Calculation by hand for the reflectance data gives the ANOVA table

SV          SS       DF    MS
Operator    1.3700    3    0.457
Error       1.6595   15    0.11063
Total       3.0295   18

Note the estimated error variance is $\hat\sigma^2 = 0.11063$, compared to $s_Y^2 = 3.0295/18 = 0.16831$ had
the operator effects been ignored. To determine whether the operator effects are significantly
different a hypothesis test is required.
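A sketch of the same computation in SAS, assuming the data set reflect1 from the earlier sketch (the names are assumptions):

proc glm data=reflect1;
  class operator;
  model bright = operator;   * one-way fixed effects ANOVA;
run; quit;

The GLM output should reproduce the table above: SST = 1.3700 with 3 df, SSE = 1.6595 with 15 df, and MSE = 0.11063.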
The canonical one-way ANOVA null hypothesis is
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \quad \Longleftrightarrow \quad H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$$
$$H_1: \text{at least one inequality in } H_0.$$
Note that the k variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ are assumed equal under both hypotheses, and normality is
also common to both hypotheses. Note also that not rejecting $H_0$ means the model is
$Y_{ij} = \mu + \varepsilon_{ij}$.
Two digressions on statistical theory
The first digression is on likelihood ratio tests. The likelihood ratio method of hypothesis test
construction applies for any null hypothesis and any alternative hypothesis. Denote by $L(\omega)$ the
likelihood function restricted to the null hypothesis, and denote by $L(\Omega)$ the likelihood function
restricted to the alternative hypothesis. Let $L(\hat\omega)$ denote the likelihood function evaluated at $\hat\omega$,
where $\hat\omega$ denotes the maximum likelihood estimators of the unknown parameters under $H_0$. Similarly,
let $L(\hat\Omega)$ denote the likelihood function evaluated at $\hat\Omega$, where $\hat\Omega$ denotes the maximum likelihood
estimators of the unknown parameters under $H_1$. The likelihood ratio test is to reject $H_0$ in favor of $H_1$ if
$$\lambda = \frac{L(\hat\omega)}{L(\hat\Omega)} \le c, \qquad \text{where } P(\lambda \le c) = \alpha, \text{ the significance level of the test.}$$
The ratio is called the likelihood ratio test statistic, and the test procedure is to reject $H_0$ in favor
of $H_1$ if the ratio is small. The ratio will be small when its denominator is large relative to its
numerator, indicating that the data are more consistent with $H_1$ than with $H_0$. Usually c cannot be
found explicitly, but the condition $\lambda \le c$ may be manipulated to obtain an equivalent condition.
For the ANOVA null and alternative hypotheses, the two maximized likelihood functions under $H_0$ and $H_1$ are
$$L(\hat\omega) = \frac{1}{(2\pi)^{N/2} (\hat\sigma_0^2)^{N/2}} \exp\!\left(-\frac{1}{2\hat\sigma_0^2}\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar Y_{..})^2\right),$$
and
$$L(\hat\Omega) = \frac{1}{(2\pi)^{N/2} (\hat\sigma_1^2)^{N/2}} \exp\!\left(-\frac{1}{2\hat\sigma_1^2}\sum_{j=1}^{k}\sum_{i=1}^{n_j} (Y_{ij} - \bar Y_{.j})^2\right),$$
where $\hat\sigma_0^2 = SSTO/N$ and $\hat\sigma_1^2 = SSE/N$. Therefore, the likelihood ratio test statistic is
$$\lambda = \frac{L(\hat\omega)}{L(\hat\Omega)} = \frac{(\hat\sigma_0^2)^{-N/2}\, e^{-N/2}}{(\hat\sigma_1^2)^{-N/2}\, e^{-N/2}} = \left(\frac{\hat\sigma_1^2}{\hat\sigma_0^2}\right)^{N/2} = \left(\frac{SSE/N}{SSTO/N}\right)^{N/2} = \left(\frac{SSE}{SSTO}\right)^{N/2}.$$
The likelihood ratio test is to reject $H_0$ in favor of $H_1$ if $\lambda = L(\hat\omega)/L(\hat\Omega) \le c$, where $P(\lambda \le c) = \alpha$, the
significance level of the test. This means the likelihood ratio test is to reject $H_0$ in favor of $H_1$ if
$$\left(\frac{SSE}{SSTO}\right)^{N/2} \le c, \qquad \text{or if} \qquad \frac{SSE}{SSTO} \le c_1.$$
That is, the smaller is the within sample sum of squares
relative to the total sum of squares, the more we are led to the alternative hypothesis that the means
are different.
Equivalently,
$$\frac{SSE}{SSTO} = \frac{SSE}{SSE + SST} = \frac{1}{1 + SST/SSE} \le c_1
\;\Longleftrightarrow\; \frac{SST}{SSE} \ge c_2
\;\Longleftrightarrow\; \frac{SST/(k-1)}{SSE/(N-k)} \ge c_2\,\frac{N-k}{k-1} = c_3, \text{ say}.$$
Therefore, the likelihood ratio test is to reject $H_0$ in favor of $H_1$ if
$$\frac{SST/(k-1)}{SSE/(N-k)} \ge \text{a constant } C, \qquad \text{or equivalently if} \qquad \frac{MST}{MSE} \ge \text{a constant } C.$$
The second digression is on the F distribution. (Recall the F distribution notes.)
The two digressions are connected by what follows. The fundamental identity applied to the one-
way ANOVA model partitions SSTO into SST + SSE. A theorem in sampling distribution theory
says that under the conditions of the null hypothesis, $SST/\sigma^2 \sim \chi^2$ with k-1 degrees of freedom,
$SSE/\sigma^2 \sim \chi^2$ with N-k degrees of freedom, and SST and SSE are independent random variables.
Therefore, the ratio MST/MSE has an F distribution with k-1 numerator degrees of freedom and N-
k denominator degrees of freedom. In fact, it is not uncommon to refer to MST/MSE as the one-
way ANOVA F statistic. Therefore if $\alpha$ is the significance level for the canonical one-way
ANOVA hypothesis test, then $H_0$ is rejected if MST/MSE > F(1-$\alpha$; k-1, N-k). That is, C = F(1-$\alpha$; k-1, N-k).
The p-value for the canonical one-way ANOVA hypothesis test, frequently referred to as
"the F test", is P(F(k-1, N-k) > MST/MSE).
Note that if $H_0$ is not true, MST/MSE does not have an F distribution. Its distribution is known as
the noncentral F distribution, a three-parameter pdf, where the three parameters are m, n, and
$\lambda = \sum_{j=1}^{k} n_j \tau_j^2 / \sigma^2$. The noncentral F pdf is given by: if $X \sim F(m, n, \lambda)$, then
$$f(x) = \sum_{r=0}^{\infty} \frac{e^{-\lambda/2}\,(\lambda/2)^r}{r!}\;
\frac{(m/n)^{m/2+r}\; x^{m/2+r-1}}{B\!\left(\frac{m}{2}+r,\; \frac{n}{2}\right)\,\left(1 + \frac{m}{n}x\right)^{(m+n)/2+r}}, \qquad x > 0.$$
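As a sketch of how the noncentral F distribution is used, the power of the F test can be computed from the noncentrality parameter; in SAS the PROBF function accepts an optional noncentrality argument. The layout and effect sizes below are illustrative assumptions, not values from the notes:

data power;
  k = 4;  nj = 5;  N = k*nj;  sigma2 = 0.11;          * assumed balanced layout and error variance;
  * assumed effects tau = (-0.3, -0.1, 0.1, 0.3), which sum to zero;
  lambda = nj*((-0.3)**2 + (-0.1)**2 + 0.1**2 + 0.3**2) / sigma2;   * sum n_j tau_j^2 / sigma^2;
  fcrit  = finv(0.95, k-1, N-k);                       * central F critical value;
  power  = 1 - probf(fcrit, k-1, N-k, lambda);         * noncentral F tail probability;
run;
proc print data=power; run;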
Based on the theory we add a column to the ANOVA table for the F statistic:

Source of     Sum of     Degrees of    Mean
Variation     Squares    Freedom       Square                     F
(SV)          (SS)       (DF)          (MS)
Treatment     SST        k-1           MST                        MST/MSE
Error         SSE        N-k           MSE = s^2 = $\hat\sigma^2$
Total         SSTO       N-1

For the reflectance data, F = 0.457/0.11063 = 4.13, and the ANOVA table is

SV          SS       DF    MS        F
Operator    1.3700    3    0.457     4.13
Error       1.6595   15    0.11063
Total       3.0295   18
With significance level $\alpha$ = 0.05, the significance level based decision rule is to reject $H_0$
if F > F(.95; 3, 15) = 3.29. Therefore, since F = 4.13, $H_0$ is rejected, and the conclusion is that the
operators are significantly different in preparing the handsheets. They do have a significant effect
on the reflectance.
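The critical value and p-value quoted here can be checked in a short SAS data step (a sketch; the data set name is arbitrary):

data ftest;
  fcrit  = finv(0.95, 3, 15);        * F(.95; 3, 15) = 3.29;
  pvalue = 1 - probf(4.13, 3, 15);   * p-value of the observed F = 4.13;
run;
proc print data=ftest; run;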
Some non-theory motivation for the F test
1. From basic statistics, $E(s_j^2) = \sigma^2$, and therefore
$$E(\hat\sigma^2) = E(s^2) = E\!\left(\frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{N-k}\right) = E\!\left(\frac{SSE}{N-k}\right) = \sigma^2.$$
That is, $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$ whether $H_0$ is true or not.
2. Consider MST. $E(MST) = E(SST/(k-1))$, and we can write
$$SST = \sum_{j=1}^{k} n_j (\bar Y_{.j} - \bar Y_{..})^2 = \sum_{j=1}^{k} n_j \left(\tau_j + (\bar\varepsilon_{.j} - \bar\varepsilon_{..})\right)^2,$$
where $\bar\varepsilon_{.j} = \sum_{i=1}^{n_j} \varepsilon_{ij}/n_j$ and $\bar\varepsilon_{..} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} \varepsilon_{ij}/N$. By squaring and taking expected values,
$$E(SST) = (k-1)\sigma^2 + \sum_{j=1}^{k} n_j \tau_j^2, \qquad \text{and therefore} \qquad E(MST) = \sigma^2 + \frac{\sum_{j=1}^{k} n_j \tau_j^2}{k-1}.$$
3. From 1), $s^2$ is an unbiased estimator of $\sigma^2$ whether the null hypothesis is true or not. From 2),
MST is an unbiased estimator of $\sigma^2$ only if the null hypothesis is true, in which case we "expect"
the ratio MST/MSE to be approximately 1. Thus, large values of the ratio lead to rejecting the null
hypothesis. (Small values, less than 1, mean the residual variation is greater than the variation
among the samples.)
4. We add an expected mean square column to the ANOVA table:

Source of     Sum of     Degrees of    Mean                                      Expected
Variation     Squares    Freedom       Square                     F              Mean Square (EMS)
(SV)          (SS)       (DF)          (MS)
Treatment     SST        k-1           MST                        MST/MSE        $\sigma^2 + \sum_{j=1}^{k} n_j\tau_j^2/(k-1)$
Error         SSE        N-k           MSE = s^2 = $\hat\sigma^2$                                 $\sigma^2$
Total         SSTO       N-1

For the reflectance data, the six columns of the ANOVA table are therefore

SV          SS       DF    MS        F      EMS
Operator    1.3700    3    0.457     4.13   $\sigma^2 + [5(\tau_1^2 + \tau_2^2 + \tau_3^2) + 4\tau_4^2]/3$
Error       1.6595   15    0.11063          $\sigma^2$
Total       3.0295   18

If the F test does not reject $H_0$ then we can behave as if the N observations are a single random
sample from a common normal distribution.
Recall the notation for the pth quantile of Student's t distribution with $\nu$ degrees of freedom is
t(p; $\nu$). That is, t(p; $\nu$) denotes the value of Student's t distribution with $\nu$ degrees of freedom with
area p to its left, and area 1-p to its right.
For example, if p = 0.975 and if $\nu$ = 15 then t(.975; 15) = 2.131 (from most of the books), t(.975; 15)
= 2.131449536 (from EXCEL, TINV), t(.975; 15) = 2.13145 (from SAS, TINV). Note that
t(.975; 15) = t((1-.05/2); 15).
There are several confidence intervals that can be applied to the one-way ANOVA after the F test
finds significance, that is, after the F test rejects $H_0$. Also recall the error degrees of freedom are N-k.
1. 100(1-$\alpha$)% CI on $\mu$:  $\bar Y_{..} \pm t(1-\alpha/2;\, N-k)\sqrt{MSE/N}$
2. 100(1-$\alpha$)% CI on $\mu_j$:  $\bar Y_{.j} \pm t(1-\alpha/2;\, N-k)\sqrt{MSE/n_j}$
3. 100(1-$\alpha$)% CI on $\tau_j$:  $\hat\tau_j \pm t(1-\alpha/2;\, N-k)\sqrt{MSE\left(\frac{1}{n_j} - \frac{1}{N}\right)}$
4. 100(1-$\alpha$)% comparisonwise CI (also called "one-at-a-time" CI or individual CI) on $\tau_j - \tau_{j'}$:
$$(\hat\tau_j - \hat\tau_{j'}) \pm t(1-\alpha/2;\, N-k)\; SE(\hat\tau_j - \hat\tau_{j'}),$$
where the standard error of $(\hat\tau_j - \hat\tau_{j'})$ is $SE(\hat\tau_j - \hat\tau_{j'}) = \sqrt{MSE\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}$.
Since the comparisonwise, one-at-a-time confidence intervals are not independent, they cannot be
interpreted beyond the individual confidence interval. This leads to simultaneous confidence
intervals.
5. 100(1-$\alpha$)% experimentwise CI (also called simultaneous CI or Bonferroni CI) on $\tau_j - \tau_{j'}$ are
the same as in 4) except $t(1-\alpha/2;\, N-k)$ is replaced by $t(1-\alpha/(2m);\, N-k)$ for m
simultaneous confidence intervals.
6. 100(1-$\alpha$)% CI on $\sigma^2$ (and $\sigma$): Recall $\chi^2(p; \nu)$ is the value of a chi-square random variable with $\nu$
degrees of freedom that has area p to its left. The confidence interval is
$$\left(\frac{SSE}{\chi^2(1-\alpha/2;\, N-k)},\; \frac{SSE}{\chi^2(\alpha/2;\, N-k)}\right).$$
The confidence interval for $\sigma$ takes the square roots of the interval values for $\sigma^2$.
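A sketch of intervals 1 and 6 computed in a SAS data step from the reflectance summary numbers (the data set name is arbitrary; the numeric inputs are the values quoted in these notes):

data ci;
  alpha = 0.05;  N = 19;  k = 4;
  mse = 0.11063;  sse = 1.6595;  ybar = 60.39474;
  t     = tinv(1 - alpha/2, N - k);             * t(0.975; 15);
  mu_lo = ybar - t*sqrt(mse/N);                 * interval 1, CI on mu;
  mu_hi = ybar + t*sqrt(mse/N);
  s2_lo = sse / cinv(1 - alpha/2, N - k);       * interval 6, CI on sigma^2;
  s2_hi = sse / cinv(alpha/2, N - k);
run;
proc print data=ci; run;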

Reflectance Experiment
Let's use $\alpha$ = 0.05, which implies 95% CIs, and let t(0.975; 15) = 2.13.
1. $\mu$: $60.39474 \pm 2.13\sqrt{0.1106333/19} = 60.39474 \pm 2.13(0.07631)$; the 95% CI on $\mu$ is (60.2322, 60.5573).
2. $\mu_j$: $\bar Y_{.j} \pm 2.13\sqrt{0.11063333/5}$ for Operators A, B, C and $\bar Y_{.4} \pm 2.13\sqrt{0.11063333/4}$ for Operator D.
This leads to

Operator    95% CI on $\mu_j$
A           (59.9232, 60.5568)
B           (59.7432, 60.3768)
C           (60.3032, 60.9368)
D           (60.3708, 61.0792)

3. $\tau_j$: $\hat\tau_j \pm 2.13\sqrt{0.11063333\left(\frac{1}{5} - \frac{1}{19}\right)}$ for Operators A, B, C and
$\hat\tau_4 \pm 2.13\sqrt{0.11063333\left(\frac{1}{4} - \frac{1}{19}\right)}$ for Operator D. This leads to

Operator    95% CI on $\tau_j$
A           (-0.4267, 0.1172)
B           (-0.6067, -0.0628)
C           (-0.0467, 0.4972)
D           (0.0155, 0.6450)

4. Comparisonwise CIs on $\tau_j - \tau_{j'}$: $(\hat\tau_j - \hat\tau_{j'}) \pm 2.13\, SE(\hat\tau_j - \hat\tau_{j'})$,
where $SE(\hat\tau_j - \hat\tau_{j'}) = \sqrt{0.11063333\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}$.
5. Experimentwise CIs on $\tau_j - \tau_{j'}$: same as comparisonwise except 2.13 is replaced by, for m = 6
comparisons, t(0.99583333; 15) = 3.03628, which, of course, will make the confidence intervals
wider.
(3.03628 comes from SAS -- data one; t=tinv(.99583333,15); proc print; run; -- or from EXCEL
TINV, where the probability argument is (.05/12)*2.)
The LSD test compares all (or as many as desired) pairs of effects. It is related to the
comparisonwise confidence intervals for $\tau_j - \tau_{j'}$. For a given pair, $\tau_j$ and $\tau_{j'}$, and a given
significance level $\alpha$, define
$$LSD = t(1-\alpha/2;\, N-k)\sqrt{MSE\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}.$$
If $|\hat\tau_j - \hat\tau_{j'}| > LSD$, conclude $\tau_j$ and $\tau_{j'}$ are significantly different.
For the reflectance data, with $\alpha$ = 0.05, for comparing $\tau_1$ and $\tau_2$, $\tau_1$ and $\tau_3$, and $\tau_2$ and $\tau_3$,
LSD = 0.448, which implies $\tau_2$ and $\tau_3$ are significantly different (operators B and C). For
comparing $\tau_1$ and $\tau_4$, $\tau_2$ and $\tau_4$, and $\tau_3$ and $\tau_4$, LSD = 0.475, which implies $\tau_1$ and $\tau_4$, and $\tau_2$ and $\tau_4$, are
significantly different (operators A and D, and operators B and D).

-----x-------x-----0--------x---x----------  scale
     $\hat\tau_2$     $\hat\tau_1$         $\hat\tau_3$  $\hat\tau_4$
(lines drawn under the scale connect groups of effects not declared significantly different: here $\hat\tau_2$ with $\hat\tau_1$, $\hat\tau_1$ with $\hat\tau_3$, and $\hat\tau_3$ with $\hat\tau_4$)
The Bonferroni test compares m pairs of effects simultaneously. It is related to the experimentwise
confidence intervals for $\tau_j - \tau_{j'}$. For a given pair, $\tau_j$ and $\tau_{j'}$, a given significance level $\alpha$, and a
given m, define
$$BLSD = t(1-\alpha/(2m);\, N-k)\sqrt{MSE\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}.$$
If $|\hat\tau_j - \hat\tau_{j'}| > BLSD$, conclude $\tau_j$ and $\tau_{j'}$ are significantly different.
For the reflectance data, with $\alpha$ = 0.05 and m = 6, for comparing $\tau_1$ and $\tau_2$, $\tau_1$ and $\tau_3$, $\tau_2$ and $\tau_3$,
BLSD = 0.639, and for comparing $\tau_1$ and $\tau_4$, $\tau_2$ and $\tau_4$, $\tau_3$ and $\tau_4$, BLSD = 0.677, implying no
significant pair-wise differences. As we've seen, the Bonferroni t value is 3.03628.
Tukey's test is an experimentwise comparison of pairs of effects that uses the studentized range
(usually denoted by q in tables in various experimental design books, Tables 13 and 14 in
Mendenhall and Sincich -- and tabled in SAS). For a given pair, $\tau_j$ and $\tau_{j'}$, define
$$TSD = q(1-\alpha;\, k,\, N-k)\sqrt{\frac{MSE}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}.$$
If $|\hat\tau_j - \hat\tau_{j'}| > TSD$, conclude $\tau_j$ and $\tau_{j'}$ are significantly different.
For the reflectance data, with $\alpha$ = 0.05, q(.95, 4, 15) = 4.08 from Mendenhall and Sincich, p. 921,
Table 13. TSD = 0.607 for comparing $\tau_1$ and $\tau_2$, $\tau_1$ and $\tau_3$, $\tau_2$ and $\tau_3$, implying no differences;
and TSD = 0.644 for comparing $\tau_1$ and $\tau_4$, $\tau_2$ and $\tau_4$, $\tau_3$ and $\tau_4$, implying $\tau_2$ and $\tau_4$ are significantly
different (operators B and D).

-----x-------x-----0--------x---x----------  scale
     $\hat\tau_2$     $\hat\tau_1$         $\hat\tau_3$  $\hat\tau_4$
(lines drawn under the scale connect groups of effects not declared significantly different: here $\hat\tau_2$, $\hat\tau_1$, $\hat\tau_3$ form one group and $\hat\tau_1$, $\hat\tau_3$, $\hat\tau_4$ another; only $\hat\tau_2$ and $\hat\tau_4$ are separated)
Scheffé's test is an experimentwise test of pairs of effects that uses the F distribution. For a given $\alpha$
and a given pair, $\tau_j$ and $\tau_{j'}$, define
$$SSD = \sqrt{(k-1)\,F(1-\alpha;\, k-1,\, N-k)}\;\sqrt{MSE\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)}.$$
If $|\hat\tau_j - \hat\tau_{j'}| > SSD$, conclude $\tau_j$ and $\tau_{j'}$ are significantly different.
For the reflectance data, with $\alpha$ = 0.05, F(.95; 3, 15) = 3.29, and SSD = 0.6609 for comparing $\tau_1$
and $\tau_2$, $\tau_1$ and $\tau_3$, $\tau_2$ and $\tau_3$, implying no differences; SSD = 0.701 for comparing $\tau_1$ and $\tau_4$, $\tau_2$ and
$\tau_4$, $\tau_3$ and $\tau_4$, implying no differences. There are no significant differences among pairs of effects by
Scheffé's test.
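In SAS, all four of these multiple comparison procedures can be requested on the MEANS statement of PROC GLM. A sketch, again assuming the data set reflect1 used earlier:

proc glm data=reflect1;
  class operator;
  model bright = operator;
  means operator / lsd bon tukey scheffe alpha=0.05;   * pairwise comparisons by each method;
run; quit;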
One Way Random Effects ANOVA
An example: Suppose a chemical engineering company wishes to study a new experimental
method of determining the percentage of methyl alcohol in a specimen of a compound taken from
a single batch. A very large number of laboratories make such determinations. A random sample of
k = 6 laboratories is selected by the company to participate in the study. Each laboratory prepares 4
batches of the compound, randomly selects a specimen from the batch and measures the percentage
of methyl alcohol. The data are (in % methyl alcohol)

Laboratory
1        2        3        4        5        6
85.06    84.99    84.48    84.10    84.63    85.10
85.25    84.28    84.72    84.55    84.37    85.04
84.87    84.88    85.10    84.05    84.89    84.87
84.98    85.01    85.07    84.65    84.93    84.75

At first glance a balanced one-way fixed effects ANOVA model would seem appropriate. That is,
$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$. However, the company is not interested in comparing these 6 laboratories. It is
not interested in estimating or testing for laboratory differences. Its primary interest is in the mean
percentage of methyl alcohol. There is a laboratory effect, $\tau_j$, for the jth laboratory. For the four
specimens from the jth laboratory, $\tau_j$ is added to the mean percentage of methyl alcohol over all of
the large number of laboratories, $\mu$. However, the condition that the 6 laboratories are a random
sample from a large population of laboratories means that $\tau_j$ is an observation of a random variable
called the "laboratory effect". The $\tau_j$ are assumed to be NID(0, $\sigma_\tau^2$). That is, the $\tau_j$, j = 1, 2, ..., 6, are
assumed to be a random sample of 6 from a normal distribution with mean zero and variance $\sigma_\tau^2$.
The model will now be called a random effects model. Specimen differences within a given
laboratory are still modeled by $\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2)$. The six random variables $\tau_1, \tau_2, \ldots, \tau_6$ are
assumed independent of the 24 random variables $\varepsilon_{ij}$. The two variances, $\sigma_\tau^2$ and $\sigma^2$, are called
variance components, and the model is sometimes called a one-way variance components model.
(See sections 5.1, 5.2, 5.3 on pages 147-151 of Lawson, or google variance components.)
The balanced one-way random effects model is, in general,
$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, k,$$
where $\tau_j \sim \mathrm{NID}(0, \sigma_\tau^2)$, $\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2)$, and the $\tau_j$ and $\varepsilon_{ij}$ are k + N = k(n+1) independent
random variables, where N = kn.
This model has 3 parameters: $\mu$ and the two variance components, $\sigma_\tau^2$ and $\sigma^2$. It is not uncommon
to think of $\sigma_\tau^2$ as measuring variation between levels (between laboratories in the example) and
$\sigma^2$ as measuring variation within levels (within laboratories in the example).


Recall, in the fixed effects model, $E(Y_{ij}) = \mu + \tau_j$, $\mathrm{Var}(Y_{ij}) = \mathrm{Var}(\varepsilon_{ij}) = \sigma^2$, and the $\varepsilon_{ij}$, and
hence the $Y_{ij}$, are independent random variables.
For the random effects model, $E(Y_{ij}) = \mu$ and $\mathrm{Var}(Y_{ij}) = \sigma_\tau^2 + \sigma^2$. Consider the following
covariances:
$$\mathrm{Cov}(Y_{ij}, Y_{ij'}) = \mathrm{Cov}(\mu + \tau_j + \varepsilon_{ij},\; \mu + \tau_{j'} + \varepsilon_{ij'}) = 0$$
$$\mathrm{Cov}(Y_{ij}, Y_{i'j'}) = \mathrm{Cov}(\mu + \tau_j + \varepsilon_{ij},\; \mu + \tau_{j'} + \varepsilon_{i'j'}) = 0$$
$$\mathrm{Cov}(Y_{ij}, Y_{i'j}) = \mathrm{Cov}(\mu + \tau_j + \varepsilon_{ij},\; \mu + \tau_j + \varepsilon_{i'j}) = \sigma_\tau^2.$$
Note that this last covariance, $\mathrm{Cov}(Y_{ij}, Y_{i'j}) = \sigma_\tau^2$, means that sample values for a given level are
correlated. The correlation is $\mathrm{Corr}(Y_{ij}, Y_{i'j}) = \sigma_\tau^2/(\sigma_\tau^2 + \sigma^2)$.
A note on terminology: $\sigma_\tau^2 + \sigma^2$ is called the total variance and $\rho = \sigma_\tau^2/(\sigma_\tau^2 + \sigma^2)$ is called the
intraclass correlation. Viewed as a percentage, $100\rho\%$ gives the percentage of total variation
accounted for by variation in the effect random variable.
In the methyl alcohol / laboratory example, specimens within a laboratory are correlated, and the
percentage of total variation in measuring methyl alcohol attributed to the laboratories is $100\rho\%$.
Point estimation of $\mu$ is straightforward: $\hat\mu = \bar Y_{..}$.
For the methyl alcohol / laboratory example, $\hat\mu = \bar Y_{..}$ = 84.775833%.
Point estimation of the variance components is not as straightforward.
The traditional variance components estimator is known as an ANOVA estimator. In general,
ANOVA estimators of variance components are based on equating mean squares in the fixed
effects ANOVA table to their expected mean squares. (These estimators are called method of
moments estimators in section 5.4.1 of Lawson.)
Therefore, in the one-way random effects model, ANOVA estimators of $\sigma_\tau^2$ and $\sigma^2$ start with the
ANOVA table as if the effects were fixed. The only differences will be in the expected mean
square column and the irrelevancy of the F statistic. Recall that in the one-way balanced fixed
effects model E(MST) = $\sigma^2 + n\sum_{j=1}^{k}\tau_j^2/(k-1)$.
$E(s^2) = E(MSE) = \sigma^2$, as in the fixed effects case, and therefore the ANOVA estimator of $\sigma^2$ is
$\hat\sigma^2$ = MSE = $s^2$, just as in the fixed effects one-way ANOVA.
Based on the derivation of E(MST), in the random effects model $E(MST) = n\sigma_\tau^2 + \sigma^2$. Therefore, the ANOVA estimator of
$\sigma_\tau^2$ is found by equating MST to its expected value with $\sigma^2$ replaced by its ANOVA estimator,
$\hat\sigma^2 = s^2$ = MSE, and solving for $\sigma_\tau^2$. This gives the ANOVA estimator of $\sigma_\tau^2$,
$$s_\tau^2 = \hat\sigma_\tau^2 = (MST - s^2)/n = (MST - MSE)/n.$$
From the estimators of $\sigma_\tau^2$ and $\sigma^2$ we can also find the following ANOVA estimators:
$$\widehat{\mathrm{Var}}(Y_{ij}) = s_\tau^2 + s^2, \qquad \widehat{\mathrm{Cov}}(Y_{ij}, Y_{i'j}) = s_\tau^2, \qquad \widehat{\mathrm{Corr}}(Y_{ij}, Y_{i'j}) = r = s_\tau^2/(s_\tau^2 + s^2).$$
It gives the same results for the example.
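The ANOVA (method of moments) estimators above can be reproduced in SAS; the sketch below keys in the methyl alcohol data and runs PROC VARCOMP (the data set name methyl and variable names lab, pct are assumptions). PROC MIXED with REML is shown as an alternative; for balanced data with non-negative estimates it typically agrees with the ANOVA estimators.

data methyl;
  input lab pct @@;
datalines;
1 85.06 1 85.25 1 84.87 1 84.98
2 84.99 2 84.28 2 84.88 2 85.01
3 84.48 3 84.72 3 85.10 3 85.07
4 84.10 4 84.55 4 84.05 4 84.65
5 84.63 5 84.37 5 84.89 5 84.93
6 85.10 6 85.04 6 84.87 6 84.75
;
run;
proc varcomp method=type1 data=methyl;   * ANOVA (method of moments) estimators;
  class lab;
  model pct = lab;
run;
proc mixed data=methyl cl;               * REML estimates with confidence limits;
  class lab;
  model pct = ;
  random lab;
run;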
Note 5: 100(1-$\alpha$)% confidence intervals on $\mu$, $\sigma^2$, $\sigma_\tau^2$, and $\sigma_\tau^2/\sigma^2$:
1) $\mu$: $\hat\mu = \bar Y_{..}$, $\mathrm{Var}(\bar Y_{..}) = (n\sigma_\tau^2 + \sigma^2)/N$, $\widehat{\mathrm{Var}}(\bar Y_{..}) = MST/N$. The 100(1-$\alpha$)% CI on $\mu$ is
$$\bar Y_{..} \pm t(1-\alpha/2;\, k-1)\sqrt{MST/N}.$$
2) $\sigma^2$: $\hat\sigma^2 = s^2 = MSE$. Theory: $(N-k)s^2/\sigma^2 \sim \chi^2(N-k)$. A 100(1-$\alpha$)% CI on $\sigma^2$ is
$$\left(0,\; \frac{(N-k)s^2}{\chi^2(\alpha;\, N-k)}\right) = \left(0,\; \frac{SSE}{\chi^2(\alpha;\, N-k)}\right).$$
3) $\sigma_\tau^2$: $\hat\sigma_\tau^2 = (MST - s^2)/n$. Theory: $(k-1)MST/(n\sigma_\tau^2 + \sigma^2) \sim \chi^2(k-1)$. An at least 100(1-$\alpha$)% CI on $\sigma_\tau^2$ is
$$\left(0,\; \frac{SST}{n\,\chi^2(\alpha;\, k-1)}\right).$$
4) A 100(1-$\alpha$)% CI on $\sigma_\tau^2/\sigma^2$ is
$$\left(\frac{MST}{MSE\; F(1-\alpha/2;\, k-1,\, N-k)} - 1,\;\; \frac{MST}{MSE\; F(\alpha/2;\, k-1,\, N-k)} - 1\right).$$
Methyl alcohol / laboratory example
95% CI on $\mu$ is (84.52%, 85.03%)
95% CI on $\sigma^2$ is (0, 0.1341)
An at least 95% CI on $\sigma_\tau^2$ is (0, 0.26)
$\hat\sigma_\tau^2/\hat\sigma^2 = 0.6035$; the 95% CI on $\sigma_\tau^2/\sigma^2$ is (0.0094, 20.72)

A model check is a normal probability plot of the k level means. The theory is that they are
NID($\mu$, $(n\sigma_\tau^2 + \sigma^2)/n$). (See mthylalc1.sas for the methyl alcohol / laboratory example.)


Nested ANOVA
We've now studied the one-way ANOVA model with fixed effects and the one-way ANOVA
model with random effects. In the reflectance data example the operator effects were fixed, and in
the methyl alcohol example the laboratory effects were random. A model that has both random and
fixed effects is known as a mixed model. An important, yet simple mixed model is a nested
ANOVA model. (See Section 5.6 of Lawson). Nested ANOVA models apply to nested, or
hierarchical experimental designs, and to sample surveys where there is subsampling.
Bolt diameter example
An industrial plant manufactures bolts, with a target bolt diameter of 0.5 in., in three shifts a day.
The plant manager wants to compare the three shifts. Shift is the fixed effect. The plant manager
decides to conduct the study over a six month period, and randomly samples six days on which to
measure each shift's production. Shift 1: day 86, day 8, day 42, day 61, day 39, day 56; Shift 2; day
75, day 33, day 93, day 117, day 142, day 65; Shift 3; day 43, day 65, day 1, day 52, day 14, day
68. Day is the random factor. From each shift's production a random sample of 4 bolts is selected
on the given day. Days are said to be nested within shifts, and, of course samples are nested within
days. The data are
Shift 1 2
Day 1 2 3 4 5 6 1 2 3 4 5 6
.4974 .5021 .4974 .4967 .4992 .4990 .4993 .4998 .4975 .5056 .4987 .4982
.4994 .5006 .5007 .5069 .5047 .4984 .5028 .5012 .4961 .4993 .4977 .4963
.5017 .4976 .4959 .5008 .4997 .4991 .4977 .5008 .4967 .5000 .5008 .4982
.4972 .5010 .4987 .4975 .5014 .5015 .5000 .5013 .5000 .5013 .4991 .4988
Shift 3
Day 1 2 3 4 5 6
.4958 .5008 .4996 .4988 .4975 .4987
.4978 .5000 .4990 .4962 .4987 .4968
.5003 .4963 .5001 .4961 .5001 .4952
.4978 .4979 .4991 .5008 .4988 .4981
(see bolts.dat)
The ANOVA model for a balanced design with a levels of a single fixed factor, with b levels of
one random factor nested in the fixed factor, and n samples nested in the random factor is
$Y_{ijk}$ = response for level i of the fixed factor, level j of the random nested factor, and the kth sample
$$Y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{k(ij)}, \qquad i = 1, 2, \ldots, a;\; j = 1, 2, \ldots, b;\; k = 1, 2, \ldots, n,$$
where $\mu$ is the grand mean;
$\alpha_i$ is the effect of level i of the fixed factor, where $\sum_{i=1}^{a} \alpha_i = 0$;
$\beta_{j(i)}$ is the random effect of level j of the factor nested in the fixed factor, where $\beta_{j(i)} \sim \mathrm{NID}(0, \sigma_b^2)$;
$\varepsilon_{k(ij)}$ is the random error of the kth sample, where $\varepsilon_{k(ij)} \sim \mathrm{NID}(0, \sigma^2)$;
and where the $\beta$'s and $\varepsilon$'s are independent of each other.
In the bolt diameter example, $Y_{253}$ = .5008 is the observed value of the third sample from shift 2 on
day 142 (the 5th day for shift 2), and before the data were collected, the model for $Y_{253}$ is
$$Y_{253} = \mu + \alpha_2 + \beta_{5(2)} + \varepsilon_{3(25)},$$
where $E(Y_{253}) = \mu + \alpha_2$. $\beta_{5(2)}$ is a normally distributed random variable with mean zero and variance $\sigma_b^2$,
$\varepsilon_{3(25)}$ is a normally distributed random variable with mean zero and variance $\sigma^2$, and $\beta_{5(2)}$ and $\varepsilon_{3(25)}$ are
two independent random variables. Also a = 3, b = 6, n = 4.
Note that in the model the fixed factor could also be random, in which case the model is not mixed,
but is a variance components model with three variance components. Note also that, as in the one-
way ANOVA model with fixed effects, $\mu_i = \mu + \alpha_i$ is the mean (expected value) for level i of the
fixed factor. Note also there are abn responses (data points), bn at each level of the fixed factor.
The model has a + 1 fixed effect parameters, $\mu$ and $\alpha_1, \alpha_2, \ldots, \alpha_a$. Their point estimators are the
same as in the fixed effects model. This means point estimation of $\mu$, $\mu_i$, $\alpha_i$, and $\alpha_i - \alpha_{i'}$ is as in the
fixed effects one-way ANOVA model. That is,
$$\hat\mu = \bar Y_{...}, \qquad \hat\mu_i = \bar Y_{i..}, \qquad \hat\alpha_i = \bar Y_{i..} - \bar Y_{...}, \qquad \hat\alpha_i - \hat\alpha_{i'} = \bar Y_{i..} - \bar Y_{i'..}.$$
The fundamental identity applied to the mixed nested model begins with
$$Y_{ijk} - \bar Y_{...} = (\bar Y_{i..} - \bar Y_{...}) + (\bar Y_{ij.} - \bar Y_{i..}) + (Y_{ijk} - \bar Y_{ij.}).$$
Squaring and summing gives the left hand side, SSTO. That is,
$$\text{left hand side} = SSTO = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (Y_{ijk} - \bar Y_{...})^2$$
measures the total variation in the experiment.
Squaring and summing on the right hand side leaves three terms, since each of the three cross
product terms sums to zero.
The first right hand side term is
$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (\bar Y_{i..} - \bar Y_{...})^2 = bn\sum_{i=1}^{a} (\bar Y_{i..} - \bar Y_{...})^2 = SSA,$$
the fixed factor sum of squares, where SSA measures the variation among the fixed factor levels.
The second right hand side term is
$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (\bar Y_{ij.} - \bar Y_{i..})^2 = n\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar Y_{ij.} - \bar Y_{i..})^2 = SSB,$$
the nested random factor sum of squares, where
SSB measures the variation among the random factor levels within each level of the fixed factor.
The third right hand side term is
$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (Y_{ijk} - \bar Y_{ij.})^2 = SSE,$$
the error sum of squares.
The degrees of freedom break down correspondingly:
SSTO carries abn-1 degrees of freedom, SSA carries a-1 degrees of freedom, SSB carries
a(b-1), and SSE carries ab(n-1).
The ANOVA table is (see the end of the notes for a derivation of E(MSA) and E(MSB))

Source           df         SS     MS            EMS                                                     F
Fixed factor     a-1        SSA    MSA           $\sigma^2 + n\sigma_b^2 + bn\sum_{i=1}^{a}\alpha_i^2/(a-1)$      MSA/MSB
Random factor    a(b-1)     SSB    MSB           $\sigma^2 + n\sigma_b^2$                                  MSB/MSE
(nested)
Error            ab(n-1)    SSE    MSE = $s^2$   $\sigma^2$
Total            abn-1      SSTO

(Note the F statistics -- we will discuss them later.)
The ANOVA point estimators of the variance components are
$$\hat\sigma^2 = MSE = s^2, \qquad \hat\sigma_b^2 = (MSB - MSE)/n.$$
The ANOVA point estimator of $\mathrm{Var}(Y_{ijk}) = \sigma_b^2 + \sigma^2$, the total variance, is $\hat\sigma_b^2 + \hat\sigma^2$.
The intraclass correlation, $\rho$, is the same ratio as in the one-way random effects model. That is,
$\rho = \sigma_b^2/(\sigma_b^2 + \sigma^2)$, and its ANOVA point estimator is $\hat\sigma_b^2/(\hat\sigma_b^2 + \hat\sigma^2)$.
The F tests
1) Motivated by the expected mean squares, the canonical null hypothesis,
$H_0: \mu_1 = \mu_2 = \cdots = \mu_a \;\Longleftrightarrow\; H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$, is tested, at level $\alpha$, by the decision
rule: Reject $H_0$ if F = MSA/MSB > F(1-$\alpha$; a-1, a(b-1)).
2) Motivated by the expected mean squares, $H_0: \sigma_b^2 = 0$ vs. $H_1: \sigma_b^2 > 0$ is tested, at level $\alpha$, by
the decision rule: Reject $H_0$ if F = MSB/MSE > F(1-$\alpha$; a(b-1), ab(n-1)).
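A sketch of the nested analysis in SAS, assuming the bolt data have been read into a data set named bolts with variables shift (fixed), day (random, nested within shift) and diam (the names are assumptions):

proc glm data=bolts;
  class shift day;
  model diam = shift day(shift);
  random day(shift) / test;   * prints the EMS and tests shift against day(shift);
run; quit;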
100(1-$\alpha$)% confidence intervals

parameter                 confidence interval
$\mu$                     $\bar Y_{...} \pm t(1-\alpha/2;\, a(b-1))\sqrt{MSB/(abn)}$
$\mu_i = \mu + \alpha_i$  $\bar Y_{i..} \pm t(1-\alpha/2;\, a(b-1))\sqrt{MSB/(bn)}$
$\alpha_i$                $(\bar Y_{i..} - \bar Y_{...}) \pm t(1-\alpha/2;\, a(b-1))\sqrt{MSB(a-1)/(bna)}$
$\alpha_i - \alpha_{i'}$  $(\hat\alpha_i - \hat\alpha_{i'}) \pm t(1-\alpha/2;\, a(b-1))\sqrt{2\,MSB/(bn)}$
$\sigma^2$                $\left(\dfrac{ab(n-1)MSE}{\chi^2(1-\alpha/2;\, ab(n-1))},\; \dfrac{ab(n-1)MSE}{\chi^2(\alpha/2;\, ab(n-1))}\right)$ or $\left(0,\; \dfrac{ab(n-1)MSE}{\chi^2(\alpha;\, ab(n-1))}\right)$
$\sigma_b^2$              $\left(0,\; \dfrac{a(b-1)MSB}{n\,\chi^2(\alpha;\, a(b-1))}\right)$ *

* This last confidence interval is not an exact 100(1-$\alpha$)% confidence interval. There is no exact
confidence interval for $\sigma_b^2$. This one has confidence coefficient that is at least 100(1-$\alpha$)%.
The Randomized Complete Block (RCB) Design and Analysis
The design and analysis of a randomized complete block experiment will be the 4th ANOVA
model that we study.
First, an example.
Rubber Compound Example
4 rubber compounds, A, B, C, D are to be tested for tensile strength. The objective is to choose the
best compound and estimate the testing error. 20 strips, each of the same dimensions, were cut as
specimens for the tensile strength test, and were given numbers 1-20. Using a table of random
numbers the specimens were selected in the following order:
4 13 12 6 3 14 1 20 9 16 7 19 10 5 11 15 17 8 18 2.
Compound A was tested on the first 5 specimens, 4, 13, 12, 6, 3; Compound B on the next 5
specimens, 14, 1, 20, 9, 16; and so on for compounds C and D. The response is the tensile strength,
in psi. This design is a completely random design and the correct analysis is the one-way fixed
effects ANOVA. The experimental design and the responses are in the SAS .dat file
rbbrcmpdscr.dat and printing it gives (in rbbrcmpdscr.sas).
Obs specimen compound psi
1 4 a 233
2 13 a 210
3 12 a 231
4 6 a 161
5 3 a 203
6 14 b 173
7 1 b 162
8 20 b 196
9 9 b 232
10 16 b 237
11 7 c 146
12 19 c 188
13 10 c 210
14 5 c 200
15 11 c 148
16 15 d 231
17 17 d 151
18 8 d 189
19 18 d 144
20 2 d 161
In fact, unfortunately after this experiment was analyzed, it was learned that the 20 specimens were
prepared by five different technicians, named 1, 2, 3, 4, 5.
Specimens 1, 2, 6, 7 were prepared by Technician 1
" 11, 14, 17, 18 " " " Technician 2
" 4, 9, 12, 16 " " " Technician 3
" 3, 10, 13, 15 " " " Technician 4
" 5, 8, 19, 20 " " " Technician 5
A better designed experiment would have used this information and tested the four compounds,
randomized on each technician's specimens. Suppose this had been done, with the following
randomization:
Technician 1 Technician 2 Technician 3 Technician 4 Technician 5
spec cmpnd spec cmpnd spec cmpnd spec cmpnd spec cmpnd
1 c 11 b 4 a 3 b 5 d
2 d 14 d 9 d 10 a 8 b
6 a 17 c 12 c 13 c 19 a
7 b 18 a 16 b 15 d 20 c
If the experiment had been run this way, and assuming the same tensile strengths, and where
compound1 denotes the original compound applied to the specimen, and compound2 is the
supposed hypothetical compound randomized with the technician, then
Obs specimen compound1 psi technician compound2
1 1 b 162 1 c
2 2 d 161 1 d
3 3 a 203 4 b
4 4 a 233 3 a
5 5 c 200 5 d
6 6 a 161 1 a
7 7 c 146 1 b
8 8 d 189 5 b
9 9 b 232 3 d
10 10 c 210 4 a
11 11 c 148 2 b
12 12 a 231 3 c
13 13 a 210 4 c
14 14 b 173 2 d
15 15 d 231 4 d
16 16 b 237 3 b
17 17 d 151 2 c
18 18 d 144 2 a
19 19 c 188 5 a
20 20 b 196 5 c
Had the experiment been run this way, that is with the four compounds randomized for each
technician's specimens and been analyzed as a randomized complete block experiment, we would
have found significant differences among the compounds, with compound D significantly stronger
than compounds A and B, and the testing error would have been estimated to be 7.18 psi.
(We will return to this example.)
Second, the example suggests some general comments about statistical experimental design.
(google randomized complete block experiments; see sections 4.1 - 4.5 on pp.115 - 125 in Lawson;
pp. 48,49 of "Design, Data, And Analysis", Edited by Colin L. Mallows, Wiley, 1987 in
blackboard)
An engineering experiment is a test or run or series of runs, generally to confirm knowledge about
a system or to learn new knowledge about a system. The experiment will represent a change in the
operation of a system, and therefore should be comparative. At the beginning phase the engineer
should formulate the problem, state the objectives of the experiment, specify the response
variable(s), specify the factors and their experimental levels (treatments) and think about the
inference space. (Note: we have used and will continue to use the term treatment, a term borrowed
from agricultural research, for our factors, such as machines, processes, operators, shifts, etc. and
their levels). An efficient experimental design should be directed toward a specific goal, be
planned by subject matter experts, enable clear learning that is affected as little as possible by
experimental error and should yield correct data analyses.
The design phase of the experiment should address the nature of the experimental material, the
specification and selection of experimental units, and the application of treatments to experimental
units. This design phase is guided by the concepts of replication, randomization and blocking.
Genuine replication means carrying out runs of each treatment several times. This means that several
experimental units are used for each of the treatments. Variation among replicates can provide
estimates of experimental error. The more replications the more precise are estimates of effects.
Finally, replication may protect against bad experimental units by detecting outlier responses.
Randomization is the process of assigning treatments to experimental units by a random
mechanism so that the assignment is objective, impersonal and unbiased. It ensures that the
probability that any experimental unit receives any particular treatment is a matter of chance.
Random allocation is such that each treatment is equally likely to be applied to each experimental
unit. The principle of randomization is: whenever experimental units are allocated to treatments in
an experiment, it should be done by a random process using equal probabilities. Factors which are
suspected to be causally important but which are not actually part of the experimental structure
should be randomized out. It makes valid the statistical inference procedures, by buying insurance
against biased estimates of effects, biased estimates of experimental error and invalid hypothesis
tests.
Experimental error may obscure important effects, or may lead to believing in effects that don't
really exist. Blocking is an important experimental design feature that is specifically imposed by
the experimenter to reduce the effects of experimental error. It is sometimes referred to as "local
control".A block is a portion of the experimental material that is believed to be more homogeneous
than the aggregate of all the material. An experimental design with blocking is called a block
design, and block designs are either complete or incomplete. A familiar experimental design
prescription is "block what you can and randomize what you can't".
Third, in the completely randomized one-way ANOVA fixed effects design and analysis for
comparing levels of a single factor there is no blocking restriction on the randomization. The
randomized complete block (RCB) experimental design reduces variability in the errors, i.e. it
reduces experimental error by measuring the effect of another factor, called the blocking factor.
The blocking factor is generally not of interest in itself. It serves to reduce the error sum of
squares. At the design stage of the experiment the blocks represent a restriction on the complete
randomization of the completely randomized experimental design. The experimenter decides that
the blocking factor's blocks possess greater homogeneity within blocks than between blocks.
The language again comes from agriculture experiments, where a full set of treatments would
appear in plots within a given homogeneous block of land. This would be done over several
blocks, where each block's plots of land received the full set of treatments under study. For
example, suppose 3 treatments were to be compared, and that 15 plots of land laid out in a 5 plot
by 3 plot grid were available for the study so that each treatment could be applied to 5 plots. A
completely random experimental design would randomly allocate the 3 treatments to the 15 plots,
without regard to where in the grid they were. The randomized complete block design would argue
that adjacent plots are more homogenous, and would restrict the randomization by forming 5
blocks of 3 plots each, and then randomly assigning the 3 treatments to the 3 plots in each block.
Suppose the objective of the experiment is to compare k levels of a factor, with respect to the
response, Y. Suppose further that the experimental units that are to receive the levels (treatments)
have been blocked (grouped) into blocks of homogeneous units. In a RCB experimental design the
k levels (treatments) are all randomly allocated within each block. Therefore the block size, the
number of experimental units in a block, must be k. Hence, the term randomized complete block.
For example, in a machine output experiment, a given operation is performed by 4 different
machines (the levels of a factor called machine) on each of 5 different days (the blocks), and Y, the
number of units produced is measured. Since each machine runs on each day the blocks are
complete. If the block size is less than k, in the general case, then the experimental design is called
an incomplete block design.
In summary then a randomized complete block experimental design has a single factor of interest,
with k levels, randomly allocated within each of b blocks, each block being of size k.
Note that in the rubber compound/technician experimental design, the 5 technicians were the
blocks.
The randomized complete block experiment model is the following. Let $Y_{ij}$ denote the response in
block i from level j, or treatment j, of the factor. The model for the response in a RCB design is
$$Y_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, b;\; j = 1, 2, \ldots, k,$$
where $\mu$ is the mean response over all levels and blocks; $\tau_j$ is the fixed effect of level j of the
factor, or of treatment j, with $\sum_{j=1}^{k} \tau_j = 0$; $\beta_i$ is the effect of block i, where $\sum_{i=1}^{b} \beta_i = 0$ if the block effects
are fixed, or where $\beta_i \sim \mathrm{NID}(0, \sigma_b^2)$ if the block effects are random; and $\varepsilon_{ij}$ is the experimental error
associated with level j and block i, $\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2)$. If the block effects are random they are
assumed independent of the experimental errors.
With fixed block effects there are k + b + 2 parameters, and with random block effects there are k
+ 3 parameters.
Note 1: $\mu_j = \mu + \tau_j$ may be of interest, whether the blocks are random or fixed.
Note 2: $\mu_{ij} = \mu + \tau_j + \beta_i$ may be of interest if the block effects are fixed.
Note 3: With fixed block effects, $E(Y_{ij}) = \mu_{ij}$ and $\mathrm{Var}(Y_{ij}) = \sigma^2$.
Note 4: With random block effects, $E(Y_{ij}) = \mu_j$ and $\mathrm{Var}(Y_{ij}) = \sigma_b^2 + \sigma^2$.
Note 5: Usually block effects are fixed.
Girder example
In a 1983 study of four methods for predicting the shear strength for steel plate girders, nine
girders were used as blocks. Each of the four methods, abbreviated as Aa, Ka, Le, Ca was
randomly assigned to each girder and used to predict the strength of the girder. The response was
Y, the ratio of predicted to observed load. The data are summarized (with k =4, b = 9).
Method
Aa Ka Le Ca
girder
1 0.772 1.186 1.061 1.025
2 0.744 1.151 0.992 0.905
3 0.767 1.322 1.063 0.930
4 0.745 1.339 1.062 0.899
5 0.725 1.200 1.065 0.871
6 0.844 1.402 1.178 1.004
7 0.831 1.365 1.037 0.853
8 0.867 1.537 1.086 0.858
9 0.859 1.559 1.052 0.805
The fences are Method
Aa Ka Le Ca
lower fence .5965 .897 1.0325 .75
upper fence .9925 1.705 1.0845 1.038
Recall our notation uses $Y_{ij}$ for both the random variable representation of $Y_{ij}$ and the observed
value of $Y_{ij}$. For example, the response from method 2 (Ka) and girder 6 is both
$$Y_{62} = \mu + \beta_6 + \tau_2 + \varepsilon_{62}$$
and $Y_{62}$ = 1.402.
The basic descriptive statistics from a RCB experiment are
$\sum_{i=1}^{b} Y_{ij} = T_{.j}$ = the total response for level j, j = 1, 2, ..., k; (e.g. $T_{.3}$ = 9.596 = $T_{Le}$)
$\sum_{j=1}^{k} Y_{ij} = T_{i.}$ = the total response for block i, i = 1, 2, ..., b; (e.g. $T_{8.}$ = 4.348)
$\bar Y_{.j} = \sum_{i=1}^{b} Y_{ij}/b$ = the mean response for level j; (e.g. $\bar Y_{.3} = \bar Y_{Le}$ = 1.0662)
$\bar Y_{i.} = \sum_{j=1}^{k} Y_{ij}/k$ = the mean response for block i; (e.g. $\bar Y_{8.}$ = 1.087)
$s_j^2 = \sum_{i=1}^{b} (Y_{ij} - \bar Y_{.j})^2/(b-1)$ = the sample variance for level j; (e.g. $s_3^2 = s_{Le}^2$ = 0.0024384)
$SE(\bar Y_{.j}) = \sqrt{s_j^2/b}$ = the standard error of $\bar Y_{.j}$; (e.g. $SE(\bar Y_{.3}) = SE(\bar Y_{Le})$ = 0.0164602)
The following SAS code will give these basic descriptive statistics for the four methods of
predicting girder strength. (in girders1.sas)
proc sort; by method;
proc means mean var sum stderr;
by method;
var strength;
method=Aa Analysis Variable : strength
Mean Variance Sum Std Error

0.7948889 0.0030364 7.1540000 0.0183677

method=Ca Analysis Variable : strength


Mean Variance Sum Std Error

0.9055556 0.0051160 8.1500000 0.0238421

method=Ka Analysis Variable : strength


Mean Variance Sum Std Error

1.3401111 0.0213251 12.0610000 0.0486771

method=Le Analysis Variable : strength


Mean Variance Sum Std Error

1.0662222 0.0024384 9.5960000 0.0164602


Recall, inference interest is in the treatments, and not the blocks.
Point Estimation
$\hat\tau_j = \bar Y_{.j} - \bar Y_{..}$ = estimator of the effect of treatment j (level j of the factor).
$\hat\mu = \bar Y_{..} = \sum_{i=1}^{b}\sum_{j=1}^{k} Y_{ij}/N$ = estimator of the overall mean, N = kb.
$\hat\mu_j = \hat\mu + \hat\tau_j = \bar Y_{.j}$ = estimator of treatment j (level j of the factor) mean response.
$\hat\tau_j - \hat\tau_{j'} = \bar Y_{.j} - \bar Y_{.j'}$ = estimator of the difference between two
treatment (level) effects = estimator of the difference between two treatment (level)
mean responses.
For the girder example, $\hat\mu$ = 1.0266944 (from proc means mean; var strength), and from the method
means we find that
$$\hat\tau_{Aa} = -0.2318, \qquad \hat\tau_{Ca} = -0.1211, \qquad \hat\tau_{Ka} = 0.3134, \qquad \hat\tau_{Le} = 0.0395.$$
Note $\sum_{j=1}^{4} \hat\tau_j = 0$. From the $\hat\tau_j$ we can calculate any of the six $\hat\tau_j - \hat\tau_{j'}$. The six method differences are

Ka - Le    0.27389
Ka - Ca    0.43456
Ka - Aa    0.54522
Le - Ca    0.16067
Le - Aa    0.27133
Ca - Aa    0.11067

Note that these could be found from the following SAS code, ignoring for now all the output
except the level differences.
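A minimal sketch of such code, assuming the girder data are in a data set named girders with variables strength, method and girder (the names are assumptions):

proc glm data=girders;
  class method girder;
  model strength = method girder;
  lsmeans method / pdiff stderr cl;     * level means, pairwise differences and CIs;
  estimate 'Ka - Le' method 0 0 1 -1;   * one explicit difference; levels in the default alphabetical order Aa Ca Ka Le;
run; quit;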
The fundamental identity applied to the RCB design begins with
$$Y_{ij} - \bar Y_{..} = (\bar Y_{i.} - \bar Y_{..}) + (\bar Y_{.j} - \bar Y_{..}) + (Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y_{..}).$$
Squaring and summing gives the left hand side as usual,
$$SSTO = \sum_{i=1}^{b}\sum_{j=1}^{k} (Y_{ij} - \bar Y_{..})^2, \qquad \text{with } bk-1 \text{ degrees of freedom}.$$
The right hand side is
$$k\sum_{i=1}^{b} (\bar Y_{i.} - \bar Y_{..})^2 + b\sum_{j=1}^{k} (\bar Y_{.j} - \bar Y_{..})^2 + \sum_{i=1}^{b}\sum_{j=1}^{k} (Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y_{..})^2$$
$$= k\sum_{i=1}^{b} \hat\beta_i^2 + b\sum_{j=1}^{k} \hat\tau_j^2 + \sum_{i=1}^{b}\sum_{j=1}^{k} \left(Y_{ij} - (\hat\mu + \hat\beta_i + \hat\tau_j)\right)^2$$
$$= SSB + SST + SSE.$$
Note 1: $\hat\beta_i = \bar Y_{i.} - \bar Y_{..}$ is the estimator of the ith block effect, assuming blocks are fixed.
Note 2: SSB is the sum of squares due to the blocks, and carries b-1 degrees of freedom.
Note 3: SST is the sum of squares due to the levels (treatments) of the factor, and carries k-1
degrees of freedom.
Note 4: SSE is the sum of squares of the residuals. That is, $SSE = \sum_{i=1}^{b}\sum_{j=1}^{k} e_{ij}^2$, where
$e_{ij} = Y_{ij} - (\hat\mu + \hat\beta_i + \hat\tau_j)$.
Note 5: SSE carries kb - (1 + (b-1) + (k-1)) = kb - (b+k-1) = (k-1)(b-1) degrees of freedom.
Note 6: The fundamental identity does not distinguish between levels of the factor and blocks.
Mathematically, it doesn't know the difference. Similarly the first four columns of the ANOVA
table (below) do not distinguish blocks from factor levels, and of course they
treat the block effects as fixed.
The full six column ANOVA table is

SV        DF            SS      MS            F           EMS
Factor    k-1           SST     MST           MST/s^2     $\sigma^2 + b\sum_{j=1}^{k}\tau_j^2/(k-1)$
Blocks    b-1           SSB     MSB
Error     (k-1)(b-1)    SSE     MSE = s^2                 $\sigma^2$
Total     kb-1          SSTO

Note 1: MSE = $s^2$ = the point estimator of $\sigma^2$.
Note 2: For fixed block effects, $E(MSB) = \sigma^2 + k\sum_{i=1}^{b}\beta_i^2/(b-1)$, and for random block effects,
$E(MSB) = \sigma^2 + k\sigma_b^2$.
Note 3: The F statistic tests, in general, the canonical hypothesis
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \;\Longleftrightarrow\; \tau_1 = \tau_2 = \cdots = \tau_k = 0.$$
From the ANOVA table, $s^2 = MSE = 0.00690984$, so $s = 0.08313$. Note that ignoring the
methods and the girders has $s_Y = \sqrt{1.76913564/35} = 0.22483$, and ignoring the girders, as in a
one-way ANOVA, has $s = \sqrt{(0.16583617 + 0.0894139)/32} = 0.08933$.
The F statistic and its virtually zero p-value lead us to argue the four methods are highly
significantly different in predicting girder strength.
Note 4: If $H_0$ is not rejected, we would report $\hat\mu$ and a confidence interval for $\mu$, and $s^2$ and a
confidence interval for $\sigma^2$. If $H_0$ is rejected then confidence intervals on $\tau_j - \tau_{j'}$ should be reported
and at least one multiple comparison procedure reported. The
100(1-$\alpha$)% confidence interval on $\tau_j - \tau_{j'}$ is
$$(\hat\tau_j - \hat\tau_{j'}) \pm t(1-\alpha/2;\, (k-1)(b-1))\, s\sqrt{2/b}.$$
Note 5: $\mathrm{Var}(\hat\tau_j) = \dfrac{(k-1)\sigma^2}{bk}$, $\mathrm{Var}(\hat\tau_j - \hat\tau_{j'}) = 2\sigma^2/b$, $\mathrm{Var}(\bar Y_{.j}) = \sigma^2/b$.
In the girder example $H_0$ was rejected. The 95% confidence interval on any $\tau_j - \tau_{j'}$ is
$(\hat\tau_j - \hat\tau_{j'}) \pm 0.08088$ from the ANOVA table, where t(0.975; 24) = 2.064, or from SAS, t(0.975, 24)
= 2.0639, and from the SAS code we saw earlier.
Duncan's multiple comparison procedure applied to the girder example has
$$\bar Y_{(1)} = \bar Y_{Aa} = 0.795, \quad \bar Y_{(2)} = \bar Y_{Ca} = 0.906, \quad \bar Y_{(3)} = \bar Y_{Le} = 1.066, \quad \bar Y_{(4)} = \bar Y_{Ka} = 1.340.$$
Each mean is based on b = 9 values, so $SE(\bar Y_{(j)}) = \sqrt{MSE/9} = 0.02771$. Let $\alpha$ = 0.05. From SAS
proc anova;
class method girder;
model strength = method girder;
means method / duncan;

Number of Means (p)    2         3         4
Critical Range (Wp)    .08088    .08494    .08756

Therefore, Duncan's MCP declares the four methods are all significantly different.
Note 6a: Tukey's method and SNK's method also declare the four methods different. Scheffe's
method declares Aa and Ca not significantly different.
Note 7: The F statistic, MSB/s
2
tests block effects equal to zero, in the fixed blocks case, and
2
b

equals zero in the random blocks case. In the girder example the block (girder) F statistic is 1.62,
and its p-value is 0.17, so that we would argue the 9 girders used in the experiment are not
significantly different, in the fixed case; and if these nine girders were a random sample from a
larger girder population they have no significant variability in measuring girder strength, in the
random case. In fact, in this latter case of random girder effects, 001069 . 0 4 / ) s MSB (
2 2
b
,
which implies the estimated percentage of the total variation attributed to the girders is only 13%.
Note 8: (Efficiency) If the analysis had ignored the blocks and analyzed the data as a completely
random design (CR) by the one-way ANOVA fixed effects model, the error sum of squares would
have been SSE + SSB (0.2553 for the girder example), with k(b-1) degrees of freedom (32 for the
girder example), and the error mean square would have been (SSE + SSB)/k(b-1). (For the girder
example it would have been .007978 compared to .00691). Since k(b-1) is bigger than (k-1)(b-1),
ignoring the blocks gives more error degrees of freedom and a more powerful F test for testing
treatment effects. Based on this kind of an argument a randomized complete block design is said to
be more efficient than the corresponding completely random design if MSE(CR)>MSE(RCB).
That is, if
MSE(CR) = (SSB + SSE)/[k(b-1)]  >  SSE/[(k-1)(b-1)] = MSE(RCB),
which reduces to: the RCB design is more efficient than the CR design if SSB > SSE/(k-1).
(For the girder example, SSB = 0.089414 and SSE = 0.165836, so SSB > SSE/3 = 0.0553 and the RCB design is more efficient than a CR design.)
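As a quick numerical check of this efficiency argument, here is a minimal data-step sketch; the two mean squares are the girder values quoted above, and the variable names are ours.

data _null_;
  mse_cr  = 0.007978;            /* error mean square when the girder blocks are ignored */
  mse_rcb = 0.00691;             /* error mean square from the RCB analysis              */
  rel_eff = mse_cr / mse_rcb;    /* informal estimated relative efficiency of RCB to CR  */
  put rel_eff=;
run;

The ratio is about 1.15, i.e. roughly 15% more observations would be needed in a completely random design to match the precision of the RCB design. This is only an informal measure; a more careful definition of relative efficiency also adjusts for the differing error degrees of freedom.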
DUNCAN'S TEST
Duncan's Multiple Comparisons Procedure (Duncan's MCP)
It is usually referred to as Duncan's multiple range test for multiple comparisons, and is recommended for balanced comparisons. It compares any number of means based on a comparisonwise error rate of α.

First decide on k, the number of means to compare. Then array the k means that are being compared from smallest to largest. In other words, find the order statistics of the means. That is, if k means are to be compared, find
Ȳ(1) ≤ Ȳ(2) ≤ ... ≤ Ȳ(k).
If each Ȳ(i), i = 1, 2,..., k, is based on n values, compute SE(Ȳ(i)) = sqrt(s²/n) = SE(Ȳ), where s² = MSE from the ANOVA table.
(Duncan's MCP is not recommended if the means are based on different sample sizes, say n_i. It can still be used with n = k / Σ_{i=1}^{k} (1/n_i), the harmonic mean of the sample sizes.)
Duncan's procedure compares any subset of, say, p means, where 2 ≤ p ≤ k. For example, if we did not reject H₀: μ(1) = μ(k), then we would argue there is no difference among the k means.
Duncan's procedure is based on the least significant range (LSR) for the p means. Call this least significant range W_p. Then W_p = r_p SE(Ȳ), where r_p is called the least significant studentized range (LSSR), and is tabled in some statistics books, in software and on the internet. The r_p value, for a given p, depends on α, the significance level, and on the error degrees of freedom. Any comparison among p means is declared significant if the absolute difference of the means exceeds W_p. We will see in an example the SAS output for Duncan's test.
Here is the Wikipedia description of Duncan's multiple comparison procedure.
In statistics, Duncan's new multiple range test (MRT) is a multiple comparison procedure
developed by David B. Duncan in 1955. Duncan's MRT belongs to the general class of multiple
comparison procedures that use the studentized range statistic q_r to compare sets of means.
Duncan's new multiple range test (MRT) is a variant of the Student-Newman-Keuls method that uses increasing alpha levels to calculate the critical values in each step of the Newman-Keuls procedure. Duncan's MRT attempts to control the family-wise error rate (FWE) at α_ew = 1 - (1 - α_pc)^(k-1) when comparing k means, where k is the number of groups. This results in a higher FWE than the unmodified Newman-Keuls procedure, which has an FWE of α_ew = 1 - (1 - α_pc)^(k/2).
David B. Duncan developed this test as a modification of the Student-Newman-Keuls method that would have greater power. Duncan's MRT is especially protective against false negative (Type II) error at the expense of having a greater risk of making false positive (Type I) errors. Duncan's test is commonly used in agronomy and other agricultural research.
Duncan's test has been criticised as being too liberal by many statisticians including Henry Scheffé and John W. Tukey. Duncan argued that a more liberal procedure was appropriate because in real world practice the global null hypothesis H₀ = "All means are equal" is often false and thus
traditional statisticians overprotect a probably false null hypothesis against type I errors. Duncan
later developed the DuncanWaller test which is based on Bayesian principles. It uses the obtained
value of F to estimate the prior probability of the null hypothesis being true.
The main criticisms raised against Duncan's procedure are:
Duncan's MRT does not control the family-wise error rate at the nominal alpha level, a problem it inherits from the Student-Newman-Keuls method.
The increased power of Duncan's MRT over Newman-Keuls comes from intentionally raising the alpha levels (Type I error rate) in each step of the Newman-Keuls procedure and not from any real improvement on the SNK method.
A Wine Tasting Experiment
In an experiment to compare 5 California red wines, say A, B, C, D, E, ten subjects (the tasters)
were selected, and each subject was given a random sequence of glasses of wine, with suitable
time between tastes. Each subject rated each wine on taste, aroma and dryness on a 10 point scale,
with 1 being low and 10 being high. The response Y is the sum of the ratings. The experiment is a
RCB experiment with five treatments, the wines and 10 blocks, the subjects (the tasters). For
example subject 3 tasted wines in the following time sequence--C E B D A; subject 8 tasted the
wines in the time sequence--B E A C D; etc. The order statistics of the means are
Ȳ(1) = Ȳ_E = 20.9,   Ȳ(2) = Ȳ_D = 21.3,   Ȳ(3) = Ȳ_C = 22.9,   Ȳ(4) = Ȳ_A = 24.1,   Ȳ(5) = Ȳ_B = 25.4.
(The data are in a file wine.dat, and the five means are from proc means mean; by wine; var rating;)
Each mean is based on b = 10 values, and the ANOVA table gives s² = MSE = 1.0866667, and therefore SE(Ȳ) = sqrt(MSE/10) = 0.329646
(from proc anova; class wine subject; model rating = wine subject;).
There are 36 error degrees of freedom. Assume α = 0.05. Duncan's MCP permits comparing 2 means, comparing 3 means, comparing 4 means and comparing 5 means. That is, p = 2 or 3 or 4 or 5. The following SAS code will give the ANOVA table, the canonical F test on the five wines and the Duncan multiple comparison procedure.
proc anova;
class wine subject;
model rating = wine subject;
means wine / duncan lines;
The output is
The ANOVA Procedure
Class Level Information
Class Levels Values
wine 5 a b c d e
subject 10 1 2 3 4 5 6 7 8 9 10
Number of Observations Read 50
Number of Observations Used 50
The ANOVA Procedure
Dependent Variable: rating
Source DF Sum of Squares Mean Square F Value Pr > F
Model 13 386.5600000 29.7353846 27.36 <.0001
Error 36 39.1200000 1.0866667
Corrected Total 49 425.6800000
R-Square Coeff Var Root MSE rating Mean
0.908100 4.548137 1.042433 22.92000
Source DF Anova SS Mean Square F Value Pr > F
wine 4 142.4800000 35.6200000 32.78 <.0001
subject 9 244.0800000 27.1200000 24.96 <.0001
The ANOVA Procedure
Duncan's Multiple Range Test for rating
Note: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.
Alpha 0.05
Error Degrees of Freedom 36
Error Mean Square 1.086667
Number of Means 2 3 4 5
Critical Range 0.945 0.994 1.026 1.048
Means with the same letter are not significantly different.
Duncan Grouping      Mean      N    wine
      A            25.4000    10    b
      B            24.1000    10    a
      C            22.9000    10    c
      D            21.3000    10    d
      D            20.9000    10    e
The summary of Duncan's procedure is
p                      2       3       4       5
r_p                   2.87    3.02    3.11    3.18
W_p (= r_p SE(Ȳ))    0.946   0.996   1.025   1.048
Of the ten pairwise differences, i.e. with p = 2, the difference between B and E is the largest and it exceeds W_2. In fact, all ten pairwise differences exceed W_2 except the E and D difference. There are three comparisons among p = 3 means: E and C, D and A, C and B, and their differences all exceed W_3. There are two comparisons among p = 4 means: E and A, D and B, and both differences exceed W_4, and the p = 5 comparison, between E and B, exceeds W_5.
The result then of Duncan's multiple comparison procedure is
E D   C   A   B
with E and D grouped together (not significantly different) and every other pair of wines declared significantly different, in agreement with the SAS Duncan grouping above.
BIB
Balanced Incomplete Block (BIB) Designs and Analysis
As we've seen a randomized complete block design is such that each block contains each of the k
treatments exactly once. Hence the term complete block design.
Each of the 4 compounds was tested on each technician's specimens. Each subject rated each of the
5 wines. Each girder's strength was predicted by each of the 4 methods.
If the blocks are unable to accommodate all of the treatments, for whatever reason, the design is
called an incomplete block design. In incomplete block designs interest is in testing t treatments
and there are available b blocks, exactly the same primary objective as in randomized complete
block designs. However, the blocks can not accommodate all t treatments. In fact, we could have
an entire course on incomplete block designs and their analysis.
For example, suppose in the rubber compound / technician example there were four technicians
and each technician could only prepare two specimens. A possible experimental design is
Technician Compound
1 A C
2 D B
3 C A
4 B D
This is an incomplete block design, where the blocks are the technicians. However it is not the
kind of incomplete block design that we shall study.
We will only study, and briefly, designs that are called balanced incomplete block designs.
For example, recall the oven position / material type example from Mallows. Suppose there were
seven material types and seven ovens, where each oven had only three positions. A balanced
incomplete block design is
Oven
1 2 3 4 5 6 7
Material B G A C C A D
Type F F E D G B G
E A C F B D E
As another example, recall the ten wine tasters. Suppose each taster could only effectively taste
and rate three wines. A balanced incomplete block design is
Taster
1 2 3 4 5 6 7 8 9 10
Wines A B B D A E B C E C
B D A A E A C E B D
C A E C C D D B D E
A third example from child educational / psychological testing is where three tests are to be
compared for their efficacy. Six children are available for the study but each child can only take
two tests in a given day without tiring. A balanced incomplete block design is
Test
Child 1 2 3
1 x x
2 x x
3 x x
4 x x
5 x x
6 x x
In each of these three examples every pair of treatments (material type, wine, test) occurs together
in a block (oven, taster, child) the same number of times. In the material type example the number
is one, in the wine example the number is three, in the test example the number is two. It is this
requirement that serves to characterize a balanced incomplete block design. (See Lawson, sections 7.1 - 7.3 on pages 255 - 261.)
There are five balanced incomplete block design parameters t, b, r, k, λ, where
t denotes the number of factor levels (treatments)
b denotes the number of blocks
k is the block size, i.e. the number of experimental units in a block, assumed the same for each block, and where k < t
r denotes the number of times each treatment appears, i.e. each treatment appears in r blocks, where r is the same for each treatment
λ is the number of times each pair of treatments occurs together in a block, where each pair of treatments appears together within a block an equal number of times. That is, each pair of treatments appears together in λ blocks.
In the oven position / material type example, t = 7, b = 7, k = 3, r = 3, λ = 1.
In the wine taster example, t = 5, b = 10, k = 3, r = 6, λ = 3.
In the child education example, t = 3, b = 6, k = 2, r = 4, λ = 2.
The example that we will work is a variation on a problem in Miller & Freund's Probability and
Statistics for Engineers by Richard A. Johnson, the book we use for our undergraduate course.
Here is the book problem, on p.429: "An industrial engineer tests 4 different shop-floor layouts by
having each of 6 work crews construct a subassembly and measuring the construction times (in
minutes). Test whether the 4 floor layouts produce different assembly times and whether some of
the work crews are consistently faster in constructing this subassembly than the others". Here the
crews are the blocks, the layouts are the treatments, and in the book problem the experiment is a
randomized complete block experiment. However, suppose because of time constraints, union
contract terms, salary issues, etc. each crew could not work on each layout. Hence the variation.
Suppose the crews worked on the layouts in the following experimental design
          layout 1   layout 2   layout 3   layout 4
crew a       x          x
crew b       x                     x
crew c                             x          x
crew d                  x                     x
crew e       x                                x
crew f                  x          x
Then this is a balanced incomplete block design with parameters t = 4, b = 6, k = 2, r = 3, λ = 1.
In a BIB design, no treatment appears more than once in any block. A BIB design where b = t is
called symmetric. (The oven position / material type example is a symmetric BIB design). The five
BIB design parameters must satisfy the three BIB design conditions:
(i) rt = bk = the total number of experimental units = the total number of responses = n
(ii) λ(t-1) = r(k-1), so that λ = r(k-1)/(t-1) must be an integer
(iii) b ≥ t (Fisher's inequality)
The three BIB design conditions are conditions for the existence of a balanced incomplete block design.
Note that in the four examples the three BIB design conditions are met. Also note that, for example, with t = 5, b = 7, k = 3, r = 4, λ = 2, conditions (ii) and (iii) are met, but (i) is not, which means there is no BIB design with these parameters. For example, if t = 9, b = 12, r = 4, k = 3, λ = 1 there is a BIB design.
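A minimal data-step sketch for checking the three conditions for a candidate parameter set is given below; it uses the t = 5, b = 7 set just mentioned, and the variable names are ours.

data _null_;
  t = 5; b = 7; k = 3; r = 4; lambda = 2;      /* candidate BIB parameters           */
  cond1 = (r*t = b*k);                         /* (i)   rt = bk                      */
  cond2 = (lambda*(t-1) = r*(k-1));            /* (ii)  lambda(t-1) = r(k-1)         */
  cond3 = (b >= t);                            /* (iii) Fisher's inequality b >= t   */
  put cond1= cond2= cond3=;                    /* 1 = condition met, 0 = not met     */
run;

For this parameter set the output should show cond1=0 cond2=1 cond3=1, in agreement with the remark above that no BIB design exists here.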
Two Notes on condition (ii)
1) A given treatment appears in r blocks. In any block that contains the given treatment there are (k-1) other treatments. The right hand side is the number of experimental units used by the other treatments in the blocks that contain the given treatment. Since each of the other (t-1) treatments appears with the given treatment λ times, the left hand side equals the right hand side. That is, the r(k-1) experimental units contain the other t-1 treatments, λ times each.
2) If r(k-1)/(t-1) is not an integer, there is no BIB design with these r, k and t.

There are large issues dealing with construction of balanced incomplete block designs. There is a
lot of modern research on construction of these designs, especially in conjunction with computer
technology. However, a famous reference in balanced incomplete block design construction is
Bose, R. C., "On the Construction of Balanced Incomplete Block Designs," Ann. Eugenics, 9, 353-
400, 1939.
The analysis of a BIB designed experiment is not as straightforward as the RCB design. In fact, if we ignored the incomplete block design and computed SSB and SST and SSE as if the design were a RCB one, we would find that SSB + SST + SSE ≠ SSTO. Block effects can not be separated from treatment, or factor level, effects because all treatments don't appear in each block. An adjustment needs to be made in SST to account for not all the treatments appearing in each block.
For example, consider the balanced incomplete block design in the wine tasting experiment. The
data are, after grouping the results by each wine, (in winebib.dat)
a 24 1 c 25 1 e 15 3
a 28 2 c 25 4 e 20 5
a 19 3 c 22 5 e 25 6
a 24 4 c 22 7 e 20 8
a 22 5 c 24 8 e 20 9
a 26 6 c 21 10 e 19 10
b 26 1 d 24 2
b 28 2 d 23 4
b 22 3 d 24 6
b 26 7 d 23 7
b 27 8 d 19 9
b 23 9 d 20 10
Since Ȳ.. = 22.8667, SSTO = Σ (Y_ij - Ȳ..)², summed over the 30 observed responses, = 265.4667. Had we ignored the incomplete block design and computed the treatment (wine) sum of squares as 6 Σ_{j=1}^{5} (Ȳ.j - Ȳ..)², then, since
Ȳ.a = 23.8333, Ȳ.b = 25.3333, Ȳ.c = 23.1667, Ȳ.d = 22.1667, Ȳ.e = 19.8333,
SST = 100.8; and had we ignored the incomplete block design and computed the subject (block) sum of squares as 3 Σ_{i=1}^{10} (Ȳi. - Ȳ..)², SSB = 177.46667. Therefore some kind of adjustment is required, because SSB + SST > SSTO and, of course, SSE can't be negative. In fact PROC ANOVA would give this senseless output. (winebib.sas)
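A way to get a sensible analysis of the wine BIB data in SAS is PROC GLM, which, unlike PROC ANOVA, adjusts the treatment sum of squares for the incomplete blocks. Here is a minimal sketch; it assumes winebib.dat has been read into a data set named winebib with variables wine, rating and taster, in the column order shown above.

proc glm data=winebib;
  class taster wine;
  model rating = taster wine;         /* blocks (tasters) entered first                */
  lsmeans wine / pdiff adjust=tukey;  /* adjusted wine means and pairwise comparisons  */
run;

With the blocks entered first, the Type I (sequential) sum of squares for wine is the block-adjusted treatment sum of squares of the kind derived below; the Type III sum of squares gives the same adjustment here.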
For our crew/layout example, the response is the construction time, in minutes, for the crew to
complete the layout, and where the layouts are the levels of the layout factor, the treatments, and
where the crews are the blocks. Here are the times:
          layout 1   layout 2   layout 3   layout 4     T_i.
crew a      48.2       53.1                             101.3
crew b      50.7                  49.9                  100.6
crew c                            50.0       60.1       110.1
crew d                 50.6                  57.5       108.1
crew e      47.1                             55.3       102.4
crew f                 57.2       53.5                  110.7
T_.j       146.0      160.9      153.4      172.9       633.2 = T_..

We can observe that crew b was the fastest crew and layout 1 had the fastest time, and crew e on layout 1 was the fastest, and crew c on layout 4 was the slowest. Furthermore, Ȳ.. = 52.766667.
Here is the senseless output from running proc anova; class crew layout; model time =
crew layout; (in layouts.sas)
Dependent Variable: time
Sum of
Source DF Squares Mean Square F Value Pr > F
crew 5 53.1066667 10.6213333 Infty <.0001
layout 3 131.7400000 43.9133333 Infty <.0001
Error 3 0.0000000 0.0000000
Corrected Total 11 180.7066667
As we shall see there is a way to tell SAS to compute the correct sum of squares. But, what is the
correct sum of squares? The next few pages of these notes will derive the correct sum of squares.
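For the record, the way to tell SAS to compute the adjusted sum of squares is simply to use PROC GLM instead of PROC ANOVA. A minimal sketch for the crew/layout data, assuming a data set named layouts with variables crew, layout and time, is

proc glm data=layouts;
  class crew layout;
  model time = crew layout;   /* crews (blocks) first, so the Type I SS for layout is adjusted for crews */
run;

The Type I (sequential) sum of squares for layout, and equally its Type III sum of squares, should reproduce the adjusted treatment sum of squares SST(adj) derived in the pages that follow.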
To compute the correct treatment sum of squares, we introduce the incomplete block design incidence matrix, typically denoted N = (n_ij), where
n_ij = 1 if treatment j appears in block i
     = 0 if treatment j does not appear in block i.
In general, therefore, the incidence matrix is a b x t matrix of ones and zeroes. Note that
Σ_{i=1}^{b} n_ij = r,   Σ_{j=1}^{t} n_ij = k,   and   Σ_{i=1}^{b} Σ_{j=1}^{t} n_ij = n,
where n is the total number of responses.
For the crew/layout BIB example the incidence matrix (rows = crews a-f, columns = layouts 1-4) is

        1 1 0 0
        1 0 1 0
N  =    0 0 1 1
        0 1 0 1
        1 0 0 1
        0 1 1 0
The BIB analysis comes from applying the least squares estimation criterion to the BIB model in the context of a general linear model. The BIB model is the RCB model with the BIB design conditions.
The model for a BIB experimental design is
Y_ij = n_ij (μ + τ_j + β_i + ε_ij),
where the conditions on μ, the τ_j, the β_i and the errors are the same as for the RCB design, and where the three BIB design conditions are a part of the model.
For example, Y_23 = 49.9 and Y_23 = μ + τ_3 + β_2 + ε_23, and Y_24 = 0, where 0 means crew b does not work on layout 4.
Assume the block effects are fixed. The term intrablock analysis is used to describe this fixed block effects case. The intrablock analysis follows from the least squares estimation criterion applied to the model. It produces the following system of b + t + 1 equations in the b + t + 1 unknowns μ̂, τ̂_j, β̂_i:

T_.. = n μ̂ + r Σ_{j=1}^{t} τ̂_j + k Σ_{i=1}^{b} β̂_i

T_.j = r μ̂ + r τ̂_j + Σ_{i=1}^{b} n_ij β̂_i ,   j = 1, 2,..., t

T_i. = k μ̂ + Σ_{j=1}^{t} n_ij τ̂_j + k β̂_i ,   i = 1, 2,..., b

For example, see the overhead for a derivation of the equation for j = 2. That is, the derivation of
T_.2 = r μ̂ + r τ̂_2 + Σ_{i=1}^{b} n_i2 β̂_i
is on the overhead.
The system of b + t + 1 equations in the b + t + 1 unknowns is not a full rank system of equations,
and therefore the system has many possible solutions.
For the crew/layout example, the system of 11 equations in the 11 unknowns can be represented by
the coefficient table below (LHS is the total on the left hand side of each equation; the body gives the coefficients of μ̂, τ̂_1,..., τ̂_4, β̂_1,..., β̂_6, with . standing for 0):

  LHS     μ̂   τ̂1  τ̂2  τ̂3  τ̂4   β̂1  β̂2  β̂3  β̂4  β̂5  β̂6
 633.2   12    3    3    3    3    2    2    2    2    2    2
 146.0    3    3    .    .    .    1    1    .    .    1    .     the 4 treatment
 160.9    3    .    3    .    .    1    .    .    1    .    1     equations
 153.4    3    .    .    3    .    .    1    1    .    .    1
 172.9    3    .    .    .    3    .    .    1    1    1    .
 101.3    2    1    1    .    .    2    .    .    .    .    .     the 6 block
 100.6    2    1    .    1    .    .    2    .    .    .    .     equations
 110.1    2    .    .    1    1    .    .    2    .    .    .
 108.1    2    .    1    .    1    .    .    .    2    .    .
 102.4    2    1    .    .    1    .    .    .    .    2    .
 110.7    2    .    1    1    .    .    .    .    .    .    2
The solution to the system of equations provides the adjustment to the treatment totals for
differences in the block totals.
Just as in the RCB design, the blocks are experimental error reducers, and therefore no adjustment needs to be made for the block effect estimates. In fact, the estimates of the block effects are the same as in the RCB analysis. That is, β̂_i = (T_i./k) - Ȳ.. . Then SSTO and SSB are computed exactly as in the RCB design. In the BIB design, SSB = k Σ_{i=1}^{b} β̂_i².
In the crew/layout example, the estimates of μ and of the six crew effects (the block effects) are
μ̂ = Ȳ.. = 633.2/12 = 52.7667,
β̂_1 = -2.1167,  β̂_2 = -2.4667,  β̂_3 = 2.2833,  β̂_4 = 1.2833,  β̂_5 = -1.5667,  β̂_6 = 2.5833.
Then SSTO = 180.7067 and SSB = 2 Σ_{i=1}^{6} β̂_i² = 2(26.55333) = 53.1067.
Note that these are, in fact, part of the senseless output from PROC ANOVA. It's SST that needs to
be adjusted.
The BIB adjustment of the treatment totals
The adjustment comes about by removing the block effect estimators β̂_i from the t treatment equations. First, each one of the b block equations is multiplied by n_ij / k. Then these multiplied block equations are summed and subtracted from each of the t treatment equations. This eliminates the estimated block effects from each treatment equation, and at the same time eliminates μ̂ and the estimated treatment effects that are different from τ̂_j. The result of this block effect elimination process is the t equations, j = 1, 2,..., t:

T_.j - Σ_{i=1}^{b} n_ij T_i. / k  =  r τ̂_j - (1/k) Σ_{i=1}^{b} Σ_{j'=1}^{t} n_ij n_ij' τ̂_j'
The left hand side shows that each treatment total T_.j is "adjusted" by subtracting from it
Σ_{i=1}^{b} n_ij T_i. / k,
the total response for those blocks where treatment j appears, divided by the block size. That is, the adjusted treatment total is, say,
(AT)_j = T_.j - Σ_{i=1}^{b} n_ij T_i. / k = [k T_.j - Σ_{i=1}^{b} n_ij T_i.] / k = Q_j / k.
Note, importantly, that the adjusted treatment totals sum to zero.
The right hand side is the estimated treatment effect multiplied by a coefficient that depends on the BIB design parameters and leads to the estimator of τ_j.
The least squares estimator of τ_j, the effect of treatment j (level j), is the adjusted effect
τ̂_j = k (AT)_j / (λ t) = Q_j / (λ t).
The adjusted treatment sum of squares is
SST(adj) = Σ_{j=1}^{t} (AT)_j τ̂_j = Σ_{j=1}^{t} Q_j² / (λ k t).
The fundamental identity for the BIB design is SSTO = SSB + SST(adj) + SSE.
For computing by hand, SSE is typically computed by subtraction.
The F statistic
F = MST(adj) / MSE = [SST(adj)/(t-1)] / [SSE/(n-b-t+1)]
tests the canonical null hypothesis of no treatment effects. Under the null hypothesis, the F statistic has an F distribution with t-1 numerator degrees of freedom and n-b-t+1 denominator degrees of freedom. Our informal motivation for the F test is E(MST(adj)) = σ² + r Σ_{j=1}^{t} τ_j²/(t-1).
Also, although not necessarily of interest, MSB = SSB/(b-1), and, assuming the block effects are random,
E(MSB) = σ² + k σ_b² + [ Σ_{i=1}^{b} ( Σ_{j=1}^{t} n_ij τ_j )² ] / [k(b-1)],
while, assuming the block effects are fixed,
E(MSB) = σ² + [k/(b-1)] Σ_{i=1}^{b} β_i² + [ Σ_{i=1}^{b} ( Σ_{j=1}^{t} n_ij τ_j )² ] / [k(b-1)].
In the crew / layout BIB design example the adjusted treatment totals are
(AT)_1 = 146 - (101.3 + 100.6 + 102.4)/2 = 146 - 304.3/2 = 146 - 152.15 = -6.15
(AT)_2 = 160.9 - (101.3 + 108.1 + 110.7)/2 = 160.9 - 320.1/2 = 160.9 - 160.05 = 0.85
(AT)_3 = 153.4 - (100.6 + 110.1 + 110.7)/2 = 153.4 - 321.4/2 = 153.4 - 160.7 = -7.3
(AT)_4 = 172.9 - (110.1 + 108.1 + 102.4)/2 = 172.9 - 320.6/2 = 172.9 - 160.3 = 12.6.
The adjusted layout effects, and the unadjusted layout effects (Ȳ.j - Ȳ.., with Ȳ.. = 52.76667), are

layout     Q_j    adjusted effect τ̂_j   unadjusted effect Ȳ.j - Ȳ..     Ȳ.j
   1     -12.3         -3.075                  -4.1000               48.66667
   2       1.7          0.425                   0.8666               53.63333
   3     -14.6         -3.650                  -1.6334               51.13333
   4      25.2          6.300                   4.8666               57.63333
Therefore, SST(adj) = 125.2975, and SSE = 180.7067-53.1067-125.2975 = 2.3025, and MST(adj)
= 41.7658, MSE = 0.7675, F = 54.4180, with p-value = 0.0041. We would argue that the four floor
layouts produce significantly different assembly times.
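Since the layouts are declared different, a natural follow-up is a set of block-adjusted means and pairwise comparisons. A minimal sketch, with the same assumed data set and variable names as in the PROC GLM sketch earlier, is

proc glm data=layouts;
  class crew layout;
  model time = crew layout;
  lsmeans layout / stderr pdiff;   /* crew-adjusted layout means with pairwise p-values */
run;

The F statistic reported by PROC GLM for layout should match the hand-computed F = 54.42 above, up to rounding.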
FACTORIAL EXPERIMENTS
(Chapter 3 of Lawson is titled Factorial Designs)
Factors: There are many ways to think of a factorial experiment. The term factor is one of those words that cannot be defined precisely, but whose meaning we all understand. In statistical
design and analysis of experiments factor denotes any feature of the experimental conditions which
may vary from experimental unit to experimental unit in a run of the experiment. A factor could be
temperature, type of furnace, plant layout, reactant, pH, catalyst, operator, plant, type of alloy,
laboratory, machine, method, shift, weeks, region, wine, pressure, compound and so on. Blocks, in
this general sense are factors, usually called blocking factors. Girders, technicians, crews, strips,
positions, tasters, etc. were blocking factors. The different forms (or values) of a factor used in the
actual experiment (or study) are called levels. A combination of factor levels is called a
treatment. The key nature of factorial experiments is that different factors are studied
simultaneously to assess their effects on the response variable under study. Sometimes we say the
factors are crossed. Crossed is usually used in contrast to nested. (Recall our nested model where
there were two factors that were not studied simultaneously, but where one factor's levels, the days, were nested in the other factor's levels, the shifts).
A factorial experiment is called complete or full if each possible treatment is used in the
experiment, and it is called balanced if each treatment is used the same number of times. The three
factor Latin square example is a balanced factorial experiment that is not full.
A factor is called a treatment factor if its application to experimental units is planned or
experimentally controlled. A treatment factor's levels are typically randomized in their allocation
to experimental units. A factor is called a classification factor if its levels correspond to
classifications of the experimental units. Classification factors are not typically randomized.
For example, consider an experiment to study the product yield of an undesirable chemical by-
product. There are two treatment factors, catalysts at 3 levels, and pressure at 2 levels. Therefore,
there are six treatments, the 6 combinations of the levels of the two factors: say catalyst 1 and
pressure level 1, catalyst 1 and pressure level 2, catalyst 2 and pressure level 1, catalyst 2 and
pressure level 2, catalyst 3 and pressure level 1, catalyst 3 and pressure level 2. The study is to be
carried out at two different laboratories. Laboratory is a classification factor at 2 levels. This
factorial experiment would be described as a 3x2x2 factorial experiment, or as a 3x2 factorial
experiment with classification by 2 laboratories. Two questions to answer could be: does either catalyst or pressure have a significant effect on yield? Are the laboratories consistent with each other?
Factors may also be classified as quantitative, where the levels correspond to well defined
numerical values of a variable, such as time, temperature, pressure, humidity, size, angle, speed,
concentration, etc. Here the levels can be ordered in a meaningful way, and it is desirable to
arrange for equal spacing of the levels, if possible. If a factors levels admit of no well defined
ordering, the factor is designated qualitative, as for example, the load prediction methods in the
girder study. Some qualitative factors may be orderable, but not in a well defined sense, such as
small, medium, large levels of a size factor, or 0-4 weeks, 4-8 weeks, 8-24 weeks, more than 24
weeks levels of a time factor.
As we have already seen, factors are also classified as fixed or random. A fixed factor is one
whose levels are specified by the experiment or study and such that results apply to these levels
only. The levels of a random factor that are used in the experiment are a random sample of levels
from a large population of levels. Results are expected to apply to the population of levels. Also, as
we've seen, the analysis of experiments that involve one or more random factors may be called
variance components analysis.
Initially, we will study factorial experiments with 2 factors, factor A at a levels and factor B at b
levels, in a full axb factorial experiment, implying ab treatments, or ab treatment combinations; i
will index A's levels and j will index B's levels. Each treatment is a combination of a factor A level
and a factor B level. In the pressure-temperature example there were 2 factors, each at two levels,
and a total of 4 treatments. Later we will study factorial experiments with k factors each at two
levels. The treatments in a factorial experiment may be run in a completely random design, a
randomized complete block design, a BIB design, etc. We will also assume, at first, that both
factors are fixed. It will be made clear when the model assumes random factors.
Replication is a "funny" word in experimental design. It is used in the sense of repeating. We may
think a replicate of a factorial experiment defines one run of all the ab treatments, and n replicates
means the experiment was run, replicated n times, hoping to match as carefully as possible the first
run. We do use this latter terminology. However, if only one run of the experiment was done, we
say there was no replication. For example, if the pressure-temperature example had been run 3
times we would say it was replicated 3 times, or it was run in 3 replicates, where n = 3. If it was
run once, as in the example, we would say there was no replication, or that it was not replicated,
where actually n=1. Usually factorial experiments are run in more than one replicate, and they are
usually balanced, meaning each of the ab treatments is run in each of the n replicates. In the
balanced factorial experiment in n replicates there will be n responses to each of the ab treatments.
In practice, the factorial experimenter needs to distinguish between true replication and sampling.
(For more discussion and some references see an article by M.J. Bayarri and A.M.Mayoral,
Bayesian Design of "Successful" Replications in The American Statistician, August 2002).
Up to now all of our models have been additive. Additive models assume no interaction among
factors. The model for a factorial experiment allows for the analysis of interaction between
factors. If interaction exists, speaking informally, differences between effects of two levels of one
factor are not the same for each level of the other factor. Recall in the pressure-temperature
example, the difference in yield between T
H
and T
L
at P
L
was different from the difference in yield
between T
H
and T
L
at P
H
, and similarly the difference in yield between P
H
and P
L
at T
L
was
different from the difference in yield between P
H
and P
L
at T
H
. We would say there is an
interaction effect between temperature and pressure on the yield.
Factorial Experiment (No replication)
Temperature-Catalyst Example: an axb factorial experiment with no replication.
The objective is to determine if temperature (factor A) or type of catalyst (factor B) has an effect
on the setting time of a new plastic. The response is Y, the elapsed setting time, in minutes, to a
uniform criterion of hardness.
Temperature is a quantitative factor and the temperature settings, the levels for the A factor, are 25°C, 50°C, 75°C. There are four types of catalyst in the study, A, B, C, D. These are the four levels of the qualitative factor, catalyst, factor B.
The 12 treatment combinations were randomly assigned to be applied to 12 experimental units, the moulds, in a completely random experimental design:

Mould   Treatment = (factor A, factor B)   Setting Time (Mins.)
  1              (50, A)                          27
  2              (25, D)                          24
  3              (25, B)                          28
  4              (75, A)                          30
  5              (75, C)                          26
  6              (50, C)                          23
  7              (25, C)                          22
  8              (75, B)                          32
  9              (75, D)                          29
 10              (25, A)                          25
 11              (50, D)                          23
 12              (50, B)                          29
The setting times arranged in a 3x4 table are
                     Catalyst
Temperature (°C)    A     B     C     D
      25           25    28    22    24
      50           27    29    23    23
      75           30    32    26    29
Notes: a=3, b=4, ab=12, n=1. Catalyst and temperature are treatment factors. The factorial
experiment is complete, or full, and balanced. Temperature is a quantitative factor, with equally
spaced levels. Catalyst is a qualitative (non-orderable) factor. Both are fixed factors. There is no
replication.
The model for the axb factorial experiment with no replication and both factors fixed is
Y_ij = μ + α_i + β_j + (αβ)_ij + ε_ij ;   i = 1, 2,..., a;   j = 1, 2,..., b,
where
Y_ij = the response at level i of factor A and level j of factor B
μ = the overall mean response
α_i = the effect of level i of factor A, with Σ_{i=1}^{a} α_i = 0
β_j = the effect of level j of factor B, with Σ_{j=1}^{b} β_j = 0
(αβ)_ij = the effect of level i of factor A and level j of factor B together
        = the interaction effect of factor A and factor B, where
          Σ_{i=1}^{a} (αβ)_ij = Σ_{j=1}^{b} (αβ)_ij = 0 (recall the temperature-pressure example)
ε_ij = experimental error associated with the experimental unit that received treatment combination (ij), ε_ij ~ NID(0, σ²).
Note 1: There are ab + a + b effect parameters, but with the conditions of summing to zero, there
are actually (a-1) + (b-1) main effect parameters (factor effect parameters), and (a-1)(b-1)
interaction effect parameters. These will correspond to degrees of freedom in the ANOVA table.
The other parameters are μ and σ². The ε_ij are ab independent normally distributed random variables.
Note 2: Note the main effect and interaction effect terminology.
Note 3: If all the interaction effects are zero, the model is an additive model, or as it's also
sometimes called, a main effects model.
Note 4: The difference in an additive model and an interaction model can be summarized by

                      additive          interaction
E(Y_ij - Y_i'j)      α_i - α_i'        (α_i - α_i') + ((αβ)_ij - (αβ)_i'j)
E(Y_ij - Y_ij')      β_j - β_j'        (β_j - β_j') + ((αβ)_ij - (αβ)_ij')

E(Y_ij - Y_i'j) is the expected difference in the response to factor A at levels i and i', holding factor B fixed at level j, and E(Y_ij - Y_ij') is the expected difference in response to factor B at levels j and j', holding factor A fixed at level i. For example,
E(Y_50,A - Y_75,A) = (α_50 - α_75) + ((αβ)_50,A - (αβ)_75,A),
whereas if the model is additive, E(Y_50,A - Y_75,A) is α_50 - α_75.
At this point we should note the "no replication" implication on the interaction effects. Even
though we call (αβ)_ij an interaction effect and we call ε_ij the experimental error, and these terms
appear separately in the model, statistically there is no way to distinguish them. We could assume
an additive model, and hope for the best. (Note the RCB model is an additive one, which assumes
no interaction between levels of the factor and the blocks--this being reasonable given the nature of
the RCB design). Let's not assume a main effects model until we've at least applied the
fundamental identity to the model.
The least squares point estimators of μ and the effects are
μ̂ = Ȳ.. ,   α̂_i = Ȳ_i. - Ȳ.. ,   β̂_j = Ȳ.j - Ȳ.. ,   (αβ)^_ij = Y_ij - Ȳ_i. - Ȳ.j + Ȳ.. ,
where ^ denotes the least squares estimate. Note that (αβ)^_ij = Y_ij - μ̂ - α̂_i - β̂_j = e_ij.
The estimates for the example are
μ̂ = Ȳ.. = 26.5
α̂_i = Ȳ_i. - Ȳ.. :   α̂_1 = -1.75,  α̂_2 = -1.00,  α̂_3 = 2.75   (Ȳ_1. = 24.75, Ȳ_2. = 25.5, Ȳ_3. = 29.25)
β̂_j = Ȳ.j - Ȳ.. :   β̂_1 = 0.833,  β̂_2 = 3.167,  β̂_3 = -2.833,  β̂_4 = -1.167   (Ȳ.1 = 27.33, Ȳ.2 = 29.67, Ȳ.3 = 23.67, Ȳ.4 = 25.33)
(αβ)^_ij = Y_ij - μ̂ - α̂_i - β̂_j = e_ij ; the interaction effect estimates = the residuals are

                     Catalyst
Temperature (°C)     A        B        C        D
      25           -.583     .083     .083     .417
      50            .667     .333     .333   -1.333
      75           -.083    -.417    -.417     .917
Note the estimated effects of factor A sum to zero, the estimated effects of factor B sum to zero,
and the estimated interaction effects sum to zero in both directions.
The fundamental ANOVA identity breaks down SSTO = Σ_{i=1}^{a} Σ_{j=1}^{b} (Y_ij - Ȳ..)² into
b Σ_{i=1}^{a} (Ȳ_i. - Ȳ..)²  +  a Σ_{j=1}^{b} (Ȳ.j - Ȳ..)²  +  Σ_{i=1}^{a} Σ_{j=1}^{b} (Y_ij - Ȳ_i. - Ȳ.j + Ȳ..)²,
and since α̂_i = Ȳ_i. - Ȳ.., β̂_j = Ȳ.j - Ȳ.., (αβ)^_ij = Y_ij - Ȳ_i. - Ȳ.j + Ȳ.. = e_ij, SSTO equals
b Σ_{i=1}^{a} α̂_i²  +  a Σ_{j=1}^{b} β̂_j²  +  Σ_{i=1}^{a} Σ_{j=1}^{b} (αβ)^_ij² ,
and we identify SSA, the sum of squares due to factor A, as b Σ_{i=1}^{a} α̂_i²; SSB, the sum of squares due to factor B, as a Σ_{j=1}^{b} β̂_j²; and SSAB, the sum of squares due to the interaction, as Σ_{i=1}^{a} Σ_{j=1}^{b} (αβ)^_ij². From this the first four columns of the ANOVA table would be

Source            SS     DF           MS       Example:  SV          SS       DF    MS
Factor A          SSA    a-1          MSA                Temp.       46.5      2    23.25
Factor B          SSB    b-1          MSB                Ctlyst.     60.333    3    20.111
AB interaction    SSAB   (a-1)(b-1)   MSAB               TC inter.    4.167    6    0.6945
Total             SSTO   ab-1                            Total      111       11
The expected values of the mean squares are
E(MSA) = σ² + b Σ_{i=1}^{a} α_i²/(a-1),   E(MSB) = σ² + a Σ_{j=1}^{b} β_j²/(b-1),
E(MSAB) = σ² + Σ_{i=1}^{a} Σ_{j=1}^{b} (αβ)_ij²/[(a-1)(b-1)],
and therefore at this point it looks like we can't estimate σ², unless we assume all the interaction effects are zero, i.e. the additive model. If the interaction effects are not zero there is no way to test for the significance of the main effects, since the F test requires s², the ANOVA estimate of σ². That is, unless we can distinguish SSAB from SSE we can go no further than point estimation. (Unless we have an "outside" estimate of σ².)
Recall the interaction plot. If we believe there is parallelism in the interaction plot then we would not feel uncomfortable about an additive model, and the analysis would be the same as the RCB analysis, with interest in both factors. If the plot says "interaction is present", we would examine the residuals. In the example the plot suggests interaction may be present, mainly because of catalyst D and temperature 50°C, where the setting time should have been higher. The residuals, or interactions, are in the table above.
Note: We could remove the "bad cell", at temperature 50°C and catalyst D, with residual -1.33333, and replace it with
Y^R_24 = Ȳ'_2. + Ȳ'_.4 - Ȳ'_.. ,
where ' means ignore temperature 50 and catalyst 4 (catalyst D).
This implies
Y^R_24 = (27 + 29 + 23)/3 + (24 + 29)/2 - (25 + 28 + 22 + 30 + 32 + 26)/6 = 25.67,
and we would redo the analysis with Y_24 = 26.
Doing this leads to the residuals

-.333 .333 .333 -.333
.167 -.167 -.167 .167
.167 -.167 -.167 .167
There are no misbehaving residuals, and now the additive model has SSE = 0.667, and s² = 0.667/5 = 0.133 (one degree of freedom is lost due to the "removal").
This "removal of the bad cell" is not really recommended.
In fact, at this point it looks like there may be interaction present at temperature 50 and catalyst D
(we should ask questions like is 23 a typo, was the time incorrectly read?, was something wrong
with the experimental unit?, is there a "physics" reason?, can we "investigate" this response?).
There is a significance test for interaction in this case where there is no replication. It is known as Tukey's test for (non)additivity. (See p. 78 of Lawson). It is actually a test for Σ_{i=1}^{a} Σ_{j=1}^{b} (αβ)_ij² = 0 (recall E(MSAB)). The test breaks out one degree of freedom from the (a-1)(b-1) error degrees of freedom and identifies it with the interaction of the two factors.
To do the test, compute
P = Σ_{i=1}^{a} Σ_{j=1}^{b} α̂_i β̂_j Y_ij   and   Q = Σ_{i=1}^{a} Σ_{j=1}^{b} α̂_i² β̂_j²,
and SSNA = P²/Q, SSENA = SSAB - SSNA and MSENA = SSENA/(ab - a - b).
(For our example P = -4.0855, Q = 233.79735, SSNA = 0.0714, SSENA = 4.167 - 0.0714 = 4.0956, MSENA = 4.0956/5 = 0.819.)
The null hypothesis is H₀: the model is an additive model. If the null hypothesis is true,
F = SSNA/MSENA ~ F(1, ab - a - b) = t²(ab - a - b).
(In the example, F = 0.087, with p-value = 0.78, F(.95, 1, 5) = 6.61, additivity is not rejected, and the model can be analyzed with no interaction effects. In other words we can treat SSAB as SSE.)
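One way to carry out Tukey's one-degree-of-freedom test in SAS, without computing P and Q by hand, is the usual squared-fitted-values trick. Here is a sketch under the assumption that the data are in a data set named plastic with variables temp, catalyst and time (all names are ours).

proc glm data=plastic;
  class temp catalyst;
  model time = temp catalyst;     /* additive fit                          */
  output out=addfit p=pred;       /* save the fitted values                */
run;
data addfit;
  set addfit;
  pred2 = pred*pred;              /* squared fitted values as a covariate  */
run;
proc glm data=addfit;
  class temp catalyst;
  model time = temp catalyst pred2;
run;

The one-degree-of-freedom sum of squares for pred2 should equal SSNA, and its F test is Tukey's test for nonadditivity.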
ANOVA table for the temperature-catalyst example, additive model.
SV             SS       DF    MS              F       EMS
Temperature    46.5      2    23.25          33.48    σ² + 4 Σ_{i=1}^{3} α_i²/2
Catalyst       60.333    3    20.111         28.96    σ² + 3 Σ_{j=1}^{4} β_j²/3
Error           4.167    6    0.6945 = s²             σ²
Total         111       11
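A minimal SAS sketch for this additive analysis, with the same assumed data set and variable names as in the sketch above, is

proc anova data=plastic;
  class temp catalyst;
  model time = temp catalyst;   /* additive (main effects) model; the error line plays the role of SSE */
run;

The F values printed for temp and catalyst should match 33.48 and 28.96 above, up to rounding.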
With replication
The axb factorial experiment with r replications
Suppose an axb factorial experiment, with factor A at a levels and factor B at b levels is replicated
r times. Assume both factors are fixed. There are rab responses, r responses to each of the ab
treatments.
Assume each replicate is run as a completely random experimental design. That is, each replicate is
run as a completely random experimental design with ab treatments randomly assigned to the ab
experimental units in each replicate. The experiment requires rab experimental units.
The ANOVA model for Y_ijk, the response to treatment (i,j) in replicate k, is
Y_ijk = μ + α_i + β_j + (αβ)_ij + ρ_k + ε_ijk ;   i = 1, 2,..., a;   j = 1, 2,..., b;   k = 1, 2,..., r,
where
μ = the overall mean response
α_i = the effect of level i of factor A, with Σ_{i=1}^{a} α_i = 0
β_j = the effect of level j of factor B, with Σ_{j=1}^{b} β_j = 0
(αβ)_ij = the effect of level i of factor A and level j of factor B together
        = the interaction effect of factor A and factor B, where
          Σ_{i=1}^{a} (αβ)_ij = Σ_{j=1}^{b} (αβ)_ij = 0
ρ_k = the effect of replication k, with Σ_{k=1}^{r} ρ_k = 0
ε_ijk = experimental error associated with the experimental unit that received treatment combination (ij) in replicate k, ε_ijk ~ NID(0, σ²).
Note 1: There are ab + a + b + r effect parameters, but with the conditions of summing to zero, there are actually (a-1) + (b-1) main effect parameters (factor effect parameters), (a-1)(b-1) interaction effect parameters and r-1 replication effect parameters. These will correspond to degrees of freedom in the ANOVA table. The other parameters are μ and σ². The ε_ijk are abr independent normally distributed random variables.
Note 2: The model assumes no interaction between the treatments and the replicates.
Point Estimation
The least squares point estimator of μ, the overall population mean response, is, as usual, Ȳ..., and the estimators of the effects are

Effect       Estimator
α_i          Ȳ_i.. - Ȳ...
β_j          Ȳ_.j. - Ȳ...
(αβ)_ij      Ȳ_ij. - Ȳ_i.. - Ȳ_.j. + Ȳ...  =  Ȳ_ij. - μ̂ - α̂_i - β̂_j
ρ_k          Ȳ_..k - Ȳ...

The least squares point estimator of σ² is
s² = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{r} e_ijk² / [(ab-1)(r-1)].
Here, e_ijk is the residual from treatment (ij) in replicate k, and is
e_ijk = Y_ijk - Ê(Y_ijk) = Y_ijk - (μ̂ + α̂_i + β̂_j + (αβ)^_ij + ρ̂_k),
or in terms of means,
e_ijk = Y_ijk - Ȳ_ij. - Ȳ_..k + Ȳ... .
Of course s² will be MSE in the ANOVA table.
The fundamental ANOVA identity gives the following decomposition of the total sum of squares,
SSTO = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{r} (Y_ijk - Ȳ...)² = SSA + SSB + SSAB + SSR + SSE, where
SSA = br Σ_{i=1}^{a} α̂_i², carrying a-1 degrees of freedom, is the factor A sum of squares,
SSB = ar Σ_{j=1}^{b} β̂_j², carrying b-1 degrees of freedom, is the factor B sum of squares,
SSAB = r Σ_{i=1}^{a} Σ_{j=1}^{b} (αβ)^_ij², carrying (a-1)(b-1) degrees of freedom, is the interaction sum of squares,
SSR = ab Σ_{k=1}^{r} ρ̂_k², carrying r-1 degrees of freedom, is the replication sum of squares,
SSE = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{r} e_ijk², carrying (ab-1)(r-1) degrees of freedom, is the error sum of squares.
The ANOVA table is
SV              DF            SS     MS         EMS                                                   F
Factor A        a-1           SSA    MSA        σ² + br Σ_{i=1}^{a} α_i²/(a-1)                        see
Factor B        b-1           SSB    MSB        σ² + ar Σ_{j=1}^{b} β_j²/(b-1)                        Note 3
AB interaction  (a-1)(b-1)    SSAB   MSAB       σ² + r Σ_{i=1}^{a} Σ_{j=1}^{b} (αβ)_ij²/[(a-1)(b-1)]  below
Replications    r-1           SSR    MSR        σ² + ab Σ_{k=1}^{r} ρ_k²/(r-1)
Error           (ab-1)(r-1)   SSE    MSE = s²   σ²
Total           abr-1         SSTO
Note 3: All F tests are of the form F = MS/s². Assuming the given null hypothesis H₀ is true, F ~ F(DF corresponding to the MS, (ab-1)(r-1)). In the fixed effects case that we are assuming, there are four null hypotheses, each of the form H₀: no effect due to the source in question, and rejection takes place, as usual, in the upper tail of the F distribution. For example, the canonical null hypothesis for factor B is H₀: β_1 = β_2 = ... = β_b = 0. If we do not reject H₀, we would argue that factor B has no effect on the response. The F statistic for this hypothesis test is F = MSB/MSE, which, assuming H₀ true, has an F distribution with b-1 numerator degrees of freedom and (ab-1)(r-1) denominator degrees of freedom.
Note 4: If the experiment is not truly run as replications, it is called an axb factorial with r
observations per cell. Then the replications DF and SS are part of the error, and the error DF are
ab(r-1) and the error sum of squares is SSE+SSR. That is, there is no real replication so that the
variation from repeated observations is part of random error.
Typically, in this case a (random) sample of r responses is taken at each of the ab treatments of an axb factorial experiment. The model is now
Y_ijk = μ + α_i + β_j + (αβ)_ij + ε_ijk ;   i = 1, 2,..., a;   j = 1, 2,..., b;   k = 1, 2,..., n.
The only difference between this model and the replicated axb factorial experiment is that the error term in this model represents sampling error, as opposed to experimental error in the replicated model. In fact, one needs to be aware of the distinction, especially if the data set is summarized. The best way to distinguish the two is to think of the sampling error as equaling experimental error + replication effect. Also, in the sampling model e_ijk = Y_ijk - Ȳ_ij. .
Note 5: The replications could be blocks in a RCB design with ab treatments in each block. In this
case the replications source of variation in the ANOVA table would be identified with the name of
the blocking factor.
Note 6: Caution: The F test in Note 3 for the AB interaction should be done before the F tests for the main effects. If it rejects H₀, then we argue there is interaction present and it may not be clear what the F tests for the main effects mean. In this situation the interaction plots become important.
Note 7: If factor A or factor B is quantitative at equally spaced levels, its sum of squares and
degrees of freedom can be broken down into one degree of freedom sum of squares corresponding
to mutually orthogonal contrasts. If factor A or factor B is qualitative and if its F test shows
significance, then a multiple comparison procedure should be applied, and confidence intervals
may also be of interest.
Temperature/Catalyst Experiment in r =3 Replicates
Consider an experiment where the objective is to determine if temperature (factor A) or type of
catalyst (factor B) has an effect on the setting time of a new plastic. The response is Y, the elapsed
setting time, in minutes, to a uniform criterion of hardness.
Temperature is a quantitative factor and the temperature settings, the levels for the A factor, are 25°C, 50°C, 75°C. There are four types of catalyst in the study, A, B, C, D. These are the four levels of the qualitative factor, catalyst, factor B.
It was decided to run three replicates, where in each replicate the 12 treatment combinations were
randomly assigned to be applied to 12 experimental units, the moulds, in a completely random
experimental design. There are 36 experimental units, 36 moulds in the experiment. The
experimental design is
Replicate 1 Replicate 2 Replicate 3
mould (A,B) mould (A,B) mould (A,B)
1 50,A 1 50,C 1 75,D
2 25,D 2 25,C 2 50,B
3 25,B 3 25,D 3 50,A
4 75,A 4 75,D 4 25,D
5 75,C 5 25,A 5 50,D
6 50,C 6 50,A 6 75,C
7 25,C 7 75,B 7 25,A
8 75,B 8 75,C 8 25,B
9 75,D 9 25,B 9 25,C
10 25,A 10 75,A 10 50,C
11 50,D 11 50,D 11 75,B
12 50,B 12 50,B 12 75,A

The results of the experimental design, where the response is the setting time, in minutes, may be
summarized as

temperature catalyst
A B C D
25 25 28 22 24
24 25 26 22
28 26 25 27
50 27 29 23 23
28 25 27 26
27 20 23 24
75 30 32 26 29
31 27 24 26
28 26 23 26

or we could let SAS summarize the results by SAS code such as the sketch below.
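A minimal sketch, assuming the replicated data have been read into a data set named plastic_rep with variables temp, catalyst, rep and time (all names are ours), is

proc means data=plastic_rep mean n;
  class temp catalyst;   /* one summary line per treatment combination */
  var time;
run;

This prints the 12 cell means and counts; adding rep to the CLASS statement would break the summary down by replicate as well.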
Another example (an axb factorial with sampling)
An experiment was done to see if there are differences in the quality of steel produced by 5
different types of rolling machines. The engineer believed there may also be differences in the
feedstock obtained from 3 different suppliers. For the experiment 10 samples of feedstock were
selected from each supplier, the 30 experimental units, and 2 samples were randomly assigned to
each machine. The response, Y is the ductility of the product, a measure of quality of steel, the
bigger the better. If machine is factor A and supplier is factor B then the experiment is a 5x3
factorial experiment with 2 samples on each treatment (a=5,b=3,n=2). Machine is a treatment
factor and supplier is a classification factor. Both are qualitative and fixed. The model for the
response is
Y_ijk = μ + α_i + β_j + (αβ)_ij + ε_ijk ,
where i indexes the five machines, j indexes the 3 suppliers, and k indexes the 2 samples for each machine/supplier treatment combination. The parameters are μ, the overall mean ductility for the 5 machines and 3 suppliers; α_i, the main effect of machine i on ductility; β_j, the main effect of supplier j on ductility; (αβ)_ij, the interaction effect of machine i and supplier j on ductility; and σ², the error variance. There are nab = 30 responses, and 30 random variables ε_ijk assumed NID(0, σ²).
The ductility results from the experiment are
Machine
1 2 3 4 5
Supplier 1 8.03 7.76 8.17 7.91 8.58
7.55 6.36 8.52 9.04 8.01

2 7.26 7.90 7.18 8.55 7.97
6.05 7.79 7.26 8.43 8.03

3 8.65 8.21 9.64 8.51 7.77
8.29 7.39 8.78 8.34 7.87
Note that the summary table does not distinguish a sampling model from a replicated model. In
other words, given a summary table, one would not know whether the responses were from a
replicated design or a sampling design.
The cell means Ȳ_ij. and the marginal means are (with rounding)

                         Machine
                1      2      3      4      5     Ȳ_.j.
Supplier 1    7.79   7.06   8.35   8.48   8.30    7.99
         2    6.66   7.85   7.22   8.49   8.00    7.64
         3    8.47   7.80   9.21   8.43   7.82    8.35
Ȳ_i..         7.64   7.57   8.26   8.46   8.04    Ȳ... = 7.99
Estimated Variances of Estimators
Estimator          Replications Model        Sampling Model
μ̂                 MSE/rab                   MSE/nab
α̂_i               MSE(a-1)/abr              MSE(a-1)/abn
β̂_j               MSE(b-1)/abr              MSE(b-1)/abn
(αβ)^_ij           MSE(a-1)(b-1)/abr         MSE(a-1)(b-1)/abn
ρ̂_k               MSE(r-1)/abr              -
α̂_i - α̂_i'       2MSE/br                   2MSE/bn
β̂_j - β̂_j'       2MSE/ar                   2MSE/an
An axb factorial with sampling and A and B random
The model is a variance components model
Y_ijk = μ + α_i + β_j + (αβ)_ij + ε_ijk ,
where Y_ijk is the kth sample response to factor A at level i and factor B at level j,
α_i ~ NID(0, σ_a²),   β_j ~ NID(0, σ_b²),   (αβ)_ij ~ NID(0, σ_ab²),   ε_ijk ~ NID(0, σ²),
and μ is the overall mean, equal to E(Y_ijk). The a + b + ab + abn random variables are all independent. Here n is the sample size. The model has five parameters.
The ANOVA table is
SV      DF            SS     MS         EMS                          F
A       a-1           SSA    MSA        σ² + n σ_ab² + bn σ_a²       see
B       b-1           SSB    MSB        σ² + n σ_ab² + an σ_b²       note
AB      (a-1)(b-1)    SSAB   MSAB       σ² + n σ_ab²                 below
Error   ab(n-1)       SSE    MSE = s²   σ²
Total   abn-1         SSTO
From the fundamental identity,
SSTO = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{n} (Y_ijk - Ȳ...)²,   SSA = bn Σ_{i=1}^{a} (Ȳ_i.. - Ȳ...)²,   SSB = an Σ_{j=1}^{b} (Ȳ_.j. - Ȳ...)²,
SSAB = n Σ_{i=1}^{a} Σ_{j=1}^{b} (Ȳ_ij. - Ȳ_i.. - Ȳ_.j. + Ȳ...)²,   SSE = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{n} (Y_ijk - Ȳ_ij.)².
Of course, SSTO = SSA + SSB + SSAB + SSE. Note the sums of squares are calculated the same
way as if the factors were fixed. The first four ANOVA table columns are the same whether the
factors are fixed or random.
Point Estimators of the Five Parameters
μ̂ = Ȳ...  and  s² = MSE.
The ANOVA estimators of the variance components are
σ̂_ab² = (MSAB - MSE)/n,   σ̂_a² = (MSA - MSE - n σ̂_ab²)/(bn),   σ̂_b² = (MSB - MSE - n σ̂_ab²)/(an).
Note: We initially test H₀: σ_ab² = 0 by F = MSAB/MSE. The theory is that if this null hypothesis is true, then F ~ F((a-1)(b-1), ab(n-1)).
a) if H₀: σ_ab² = 0 is rejected, test H₀: σ_a² = 0 by F = MSA/MSAB. The theory is that if this null hypothesis is true, then F ~ F(a-1, (a-1)(b-1)); and also test H₀: σ_b² = 0 by F = MSB/MSAB. The theory is that if this null hypothesis is true, then F ~ F(b-1, (a-1)(b-1)).
b) if H₀: σ_ab² = 0 is not rejected,
i) do a) above, or
ii) in a) above use MSE instead of MSAB, or
iii) pool (combine) SSAB and SSE for doing the tests of σ_a² and σ_b². Then
F = MSA (or MSB) / [(SSAB + SSE)/(abn - a - b + 1)] ~ F(a-1 or b-1, abn - a - b + 1), if H₀ is true.
(A SAS sketch of how these tests can be requested is given below.)
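A sketch of how these expected-mean-square-based tests can be requested in SAS, assuming the ductility data are in a data set named steel with the variables machine, supplier and ductility used in the code below, is

proc glm data=steel;
  class machine supplier;
  model ductility = machine supplier machine*supplier;
  random machine supplier machine*supplier / test;   /* EMS-based F tests; MSAB is the error term for the main effects */
run;

The RANDOM statement with the TEST option prints the expected mean squares and constructs the F tests with the appropriate denominators, which here means MSAB for the two main-effect variance components.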
Example: Assume in the ductility experiment the machines and suppliers are random. The first four columns of the ANOVA table, the estimate of μ, the estimate of σ² and the F test for σ_ab² are from
proc anova;
class machine supplier;
model ductility = machine supplier machine*supplier;
μ̂ = 7.993333
Source DF Anova SS Mean Square
machine 4 3.59880000 0.89970000
supplier 2 2.47104667 1.23552333
machine*supplier 8 5.88672000 0.73584000
Error 15 3.49790000 0.23319333 = s²
Corrected Total 29 15.45446667
From the ANOVA table, the F statistic for testing σ_ab² = 0 is .73584/.23319333 = 3.16, with p-value = 0.0262 from EXCEL.
Therefore, assuming we are p-value engineers, the F test for H₀: σ_ab² = 0 rejects this null hypothesis. We argue there is significant variability among the random interaction effects.
We can obtain the ANOVA estimators of the variance components σ_a², σ_b², σ_ab² from the mean squares in the ANOVA table. For example, σ̂_ab² = (.73584 - .23319333)/2 = 0.2513.
However, it may be easier to let PROC VARCOMP, or some software, do the calculations. The
SAS code
proc varcomp method=type1;
class machine supplier;
model ductility = machine supplier machine*supplier;
will give the ANOVA estimates of the variance components and the EMS column.
Dependent Variable: ductility
Type 1 Analysis of Variance
Sum of
Source DF Squares Mean Square Expected Mean Square
machine 4 3.598800 0.899700 Var(Error) + 2Var(machine*supplier)
+ 6 Var(machine)
supplier 2 2.471047 1.235523 Var(Error) + 2Var(machine*supplier)
+ 10 Var(supplier)
machine*supplier 8 5.886720 0.735840 Var(Error) + 2Var(machine*supplier)
Error 15 3.497900 0.233193 Var(Error)
Corrected Total 29 15.454467 . .
Type 1 Estimates
Variance Component Estimate
Var(machine) 0.02731
Var(supplier) 0.04997
Var(machine*supplier) 0.25132
Var(Error) 0.23319
Note the ANOVA estimates are the same as the MIVQUE estimates and the REML estimates. The MLEs are slightly different. Recall, in proc varcomp, method=type1 gives the ANOVA (Type 1) estimates, and method=ml and method=reml give the maximum likelihood estimates and the restricted maximum likelihood estimates. Here is the SAS code
proc varcomp method=ml;
class machine supplier;
model ductility = machine supplier machine*supplier;
proc varcomp method=reml;
class machine supplier;
model ductility = machine supplier machine*supplier;
and the results
MIVQUE=ANOVA=REML MLE
Var(machine) 0.02731 0
Var(supplier) 0.04997 0.0033222
Var(machine*supplier) 0.25132 0.27863
Var(Error) 0.23319 0.23319
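PROC MIXED is another route to the REML estimates. A minimal sketch, with the same assumed data set and variable names as above, is

proc mixed data=steel method=reml covtest;
  class machine supplier;
  model ductility = ;                        /* intercept-only fixed part           */
  random machine supplier machine*supplier;  /* all three effects treated as random */
run;

The COVTEST option adds approximate Wald tests for the variance components; these are not the same as the exact F tests discussed here, and with only a few levels per factor they should be read with caution.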
Since H₀: σ_ab² = 0 was rejected, the F tests for the significance of σ_a² and of σ_b² have MSAB in the denominator of the F statistic. That is,
H₀: σ_a² = 0 vs. σ_a² > 0 is tested by F = MSA/MSAB = 1.2227, p-value = 0.37345
H₀: σ_b² = 0 vs. σ_b² > 0 is tested by F = MSB/MSAB = 1.6791, p-value = 0.24611
We would conclude the interaction effect variance is significant, and the main effect variances are not significant. We could argue that almost 45% of the estimated total variance is attributed to the interaction effect, using the ANOVA or MIVQUE or REML estimators.
Note: Suppose we were α = 0.01 significance level engineers. Then F = 3.16 leads to not rejecting H₀: σ_ab² = 0. Then, using iii) in the note above, and pooling SSAB and SSE, gives for H₀: σ_a² = 0,
F = MSA / [(5.88672 + 3.4979)/23] = .8997/.40803 = 2.205 with p-value = 0.10018, from EXCEL,
and for H₀: σ_b² = 0,
F = 1.235523/.40803 = 3.028 with p-value = 0.068, from EXCEL.
Both results are very different from the results based on rejecting H₀: σ_ab² = 0.
Repeated Measures
An application of axb factorial models is known as repeated measures designs, and where the
repeated measures are over time, the analysis is referred to as longitudinal data analysis. Repeated
measures data arises when a number of measurements are taken from each of several experimental
units allocated to one or more treatments, usually at various times. When the model is over time
the data can be viewed as a time series. There is a huge literature on repeated measures designs and
analysis. We will consider two such models.
The first model deals with a single level of a factor. We assume a response is to be studied over
time. The experimental entities are referred to as subjects, and the subjects in the study are
assumed to be a random sample from a large population. (Think of, for example, a random sample
of machine operators who are selected for a training program). Repeated measures of the response
are made over time at fixed, and preferably equally spaced time intervals. Assume a random
sample of b subjects, where each subject's response is measured at a time periods.
For example, suppose a random sample of 8 people take part in a study of a lithium carbonate
formulation in a pill. On a given day, each person is given a dose of 300 milligrams, and blood
samples are taken three, six, nine and twelve hours later, and the amount of lithium in the sample is
recorded. The data are
Person Lithium -- hours after drug administration
3 6 9 12
1 .467 .300 .233 .160
2 .533 .300 .200 .178
3 .467 .233 .200 .156
4 .520 .280 .200 .145
5 .680 .440 .360 .244
6 .458 .320 .218 .111
7 .433 .367 .300 .133
8 .510 .325 .215 .165
The data are in a SAS .dat file called lithium.dat. Here it is
1 3 .467 5 9 .36
1 6 .3 5 12 .244
1 9 .233 6 3 .458
1 12 .16 6 6 .320
2 3 .533 6 9 .218
2 6 .3 6 12 .111
2 9 .2 7 3 .433
2 12 .178 7 6 .367
3 3 .467 7 9 .300
3 6 .233 7 12 .133
3 9 .2 8 3 .510
3 12 .156 8 6 .325
4 3 .52 8 9 .215
4 6 .28 8 12 .165
4 9 .2
4 12 .145
5 3 .68
5 6 .44
It is common in longitudinal analysis to plot the time series of values.
The plots are from the SAS code in lithium.sas; a sketch of such code follows.
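Here is a sketch of code that produces one plot of the eight time series, assuming the three columns of lithium.dat have been read into a data set named lithium as person, hour and lithium (names are ours).

proc sgplot data=lithium;
  series x=hour y=lithium / group=person markers;   /* one connected profile per subject */
run;

Each subject's four measurements appear as one connected profile, the usual "spaghetti plot" of longitudinal data.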
In general the data will look like an axb factorial with no replication (or with sampling where n = 1). The model is a randomized complete block model with time the fixed factor of interest, at a levels, and where the subjects are the blocks, and random. The model therefore is a mixed model with no interaction and no replication. The model is
Y_ij = μ + τ_i + β_j + ε_ij ;   i = 1, 2,..., a;   j = 1, 2,..., b,
where Σ_{i=1}^{a} τ_i = 0, β_j ~ NID(0, σ_b²), ε_ij ~ NID(0, σ²), the β_j and the ε_ij are independent random effects, μ is the overall mean, τ_i is the fixed effect of the ith time period on the response, β_j is the random effect of subject j, and there is no time-subject interaction.
There is nothing new here. But what is important about this application is that since factor A is real time, with equally spaced time periods, SSA can be broken down into a-1 mutually orthogonal polynomial time effect sums of squares, each with one degree of freedom, to test for the significance of the individual effect.
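One way to obtain this breakdown in SAS is with orthogonal polynomial contrasts on the four equally spaced time points (standard coefficients -3 -1 1 3, 1 -1 -1 1, -1 3 -3 1 for four levels). A sketch with the assumed variable names person, hour and lithium is

proc glm data=lithium;
  class hour person;
  model lithium = hour person;
  contrast 'linear time'    hour -3 -1  1  3;   /* one df each; assumes class levels ordered 3, 6, 9, 12 */
  contrast 'quadratic time' hour  1 -1 -1  1;
  contrast 'cubic time'     hour -1  3 -3  1;
run;

The three contrast sums of squares should add up to SSA, the time sum of squares with a - 1 = 3 degrees of freedom.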
Duncan's multiple comparison procedure applied to the subjects finds subject 5 significantly
different from the seven other subjects.
The second model extends the single-level-of-a-factor model to several levels. Suppose there are a factor levels. For each level, a random sample of n subjects is selected or designed. The n subjects are nested in the levels of the factor. A given subject, at a given level, has repeated responses at fixed intervals of time, implying the factor and time are as in an axb factorial. Let α_i be the effect of fixed factor level i, let τ_j be the effect of fixed time factor level j, and let (ατ)_ij be the interaction effect of factor and time. Finally, let β_k(i) be the random effect of subject k nested in factor level i. Then the model is
Y_ijk = μ + α_i + τ_j + (ατ)_ij + β_k(i) + ε_ijk ;   i = 1, 2,..., a;   j = 1, 2,..., b;   k = 1, 2,..., n,
where
Σ_{i=1}^{a} α_i = Σ_{j=1}^{b} τ_j = Σ_{i=1}^{a} (ατ)_ij = Σ_{j=1}^{b} (ατ)_ij = 0,
β_k(i) ~ NID(0, σ_β²),   ε_ijk ~ NID(0, σ²),
and all random variables are independent.
For example, suppose in the lithium carbonate study there were a = 4 formulations, three, A, B, C,
were pills with different chemical substances, and D a liquid solution. Suppose
n = 2 of the eight people were randomly assigned to each formulation. The data are
Person (nested in formulation) Lithium -- hours after drug administration
3 6 9 12
1 C .433 .289 .256 .187
2 A .476 .333 .240 .210
3 C .502 .345 .308 .295
4 B .680 .440 .360 .320
5 D .467 .233 .133 .133
6 A .533 .300 .200 .187
7 D .433 .210 .205 .150
8 B .543 .333 .267 .267
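A sketch of a mixed-model analysis of this nested repeated measures layout, assuming a data set named lithium2 with variables formulation, person, hour and lithium (names are ours), is

proc mixed data=lithium2;
  class formulation person hour;
  model lithium = formulation hour formulation*hour;  /* fixed: formulation, time, and their interaction */
  random person(formulation);                         /* random subject effect nested in formulation     */
run;

The fixed-effect F tests correspond to the α_i, τ_j and (ατ)_ij terms of the model above, and the nested RANDOM statement supplies the β_k(i) term.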