
MTH6134 Statistical Modelling II

September 2011
1 Introduction
In the course Statistical Modelling I, regression models were studied in which there was a dependent variable, $Y$, and $p-1$ explanatory variables, $X_1, \ldots, X_{p-1}$. The variable $Y$ was continuous quantitative and the $X$s were quantitative. The focus of the course was the general linear model given by

$$Y = X\beta + \varepsilon,$$

where $Y = (Y_1, \ldots, Y_n)'$ is the vector of responses, $X$ is the $n \times p$ design matrix, $\beta = (\beta_0, \beta_1, \ldots, \beta_{p-1})'$ is the parameter vector and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is the error vector. Note that the number $p$ of columns of $X$ is greater than the number $p-1$ of explanatory variables, since the matrix has an additional column in which all elements are equal to one. It was assumed that $\varepsilon$ has zero mean vector and covariance matrix $\sigma^2 I_n$, where $I_n$ denotes the identity matrix of order $n$.

In this course, we consider the case where the $X$s are qualitative variables or factors. We first study the completely randomized design, which is a generalization of the two-sample t test. We then meet the randomized block design, which is an extension of the matched pairs t test. Factorial designs are then introduced which enable interactions between different factors to be investigated. Next, we study nested designs, where factors are nested within other factors. Finally, we show how the theory of linear models may be developed within the framework of linear algebra, using ideas such as subspace and dimension. Throughout the course, the statistical computing package GenStat will be used to illustrate the main ideas.
1.1 Example of qualitative explanatory variable
Suppose that we wish to compare three types of petrol. The mileage per gallon was measured
in 12 cars of the same model and the data are given below.
Mileage Petrol
24.0 A
25.0 A
24.3 A
25.5 A
25.3 B
26.5 B
26.4 B
27.0 B
27.6 B
23.3 C
24.0 C
24.7 C
Here there is one factor, namely the type of petrol, with three levels A, B and C. The response variable mileage is continuous. An appropriate model for these data would be

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij},$$

where $Y_{ij}$ is the mileage of the $j$th car using the $i$th petrol type, for $i = 1, 2, 3$ and $j = 1, 2, \ldots, r_i$, $\mu$ is an overall mean (or some other constant - see later), $\tau_i$ is the effect of the $i$th petrol type and $\varepsilon_{ij}$ is a random error.
In this module we will learn to model data similar to this example and more complex data
sets with several factors.
1.2 Observational and experimental studies
The above data could have arisen from two different situations and the above model would be appropriate in both cases.
In a simple random sample 12 cars are randomly selected from a population of cars of
the same type, then a record is made of the type of petrol the owners are using and the fuel
consumption is recorded. This is an observational study.
In a completely randomized design petrol types A, B and C are randomly allocated to
4, 5 and 3 cars respectively, and the fuel consumption from each car is recorded. This is an
experimental study.
In this course you will learn about analysis of data from experimental studies and especially
completely randomized designs. However, many of the methods are also used in observational
studies.
Sometimes in experiments the randomization is restricted in a randomized block design,
in which similar individuals are grouped so that one in each group receives each treatment.
We will also learn how to analyse data from this and other more complex designs.
Some of you will learn more about designing experiments in the Design of Experiments module.
2 Completely Randomized Design
2.1 Introduction
This chapter considers the analysis of variance for a completely randomized design (CRD), also known as one-way ANOVA. We wish to compare $t$ treatments, for example, varieties of wheat, different fertilisers, different blood pressure pills, and so on, which are the levels of a single factor. Suppose that we have $n$ experimental units, observational units or plots, such as plots of land or patients. Then we allocate the treatments to the units in some random way.

Suppose that $r_i$ units receive treatment $i$ for $i = 1, 2, \ldots, t$, so that $\sum_{i=1}^t r_i = n$. Then we measure a response $y_{ij}$ on the $j$th unit of the $i$th treatment, which could represent the yield of wheat, yield of potatoes or change in blood pressure. The basic one-way model for a CRD is given by

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

for $i = 1, 2, \ldots, t$ and $j = 1, 2, \ldots, r_i$, where $\mu$ is an overall mean (or some other constant - see later), $\tau_i$ is the effect of the $i$th treatment and $\varepsilon_{ij}$ is random error. It is assumed that $\varepsilon_{ij} \sim N(0, \sigma^2)$, all independent. This implies $E(Y_{ij}) = \mu + \tau_i$ and $Var(Y_{ij}) = \sigma^2$.

We are interested in whether there are differences between the treatments. The null hypothesis is $H_0: \tau_1 = \tau_2 = \ldots = \tau_t$ and the alternative is $H_1$: at least two of the parameters are different. We can test this hypothesis by finding the analysis of variance (ANOVA) table.
2.2 Analysis of variance
There is an algorithm to find the ANOVA table. First note that we can represent the data in the form below.

Treatment 1   Treatment 2   ...   Treatment t
$y_{11}$      $y_{21}$      ...   $y_{t1}$
$y_{12}$      $y_{22}$      ...   $y_{t2}$
...           ...                 ...
$y_{1r_1}$    $y_{2r_2}$    ...   $y_{tr_t}$

Define the total sum of squares

$$S_G = \sum_{i=1}^t \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{..})^2,$$
the treatment sum of squares

$$S_T = \sum_{i=1}^t r_i (\bar{y}_{i.} - \bar{y}_{..})^2$$

and the residual sum of squares

$$S_E = \sum_{i=1}^t \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{i.})^2,$$

where $\bar{y}_{i.}$ and $\bar{y}_{..}$ are the $i$th group mean and overall mean respectively. Note that the sums of squares are always non-negative. Similar to the analysis of variance in Statistical Modelling I, the decomposition of the total sum of squares is

$$S_G = S_T + S_E.$$
Let $T_1 = \sum_{j=1}^{r_1} y_{1j}, \ldots, T_t = \sum_{j=1}^{r_t} y_{tj}$ and $G = \sum_{i=1}^t T_i$. Then

$$\sum_{i=1}^t \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^t \sum_{j=1}^{r_i} \left( y_{ij} - \frac{G}{n} \right)^2 = \sum_{i=1}^t \sum_{j=1}^{r_i} \left( y_{ij}^2 - 2 y_{ij}\frac{G}{n} + \frac{G^2}{n^2} \right)$$
$$= \sum_{i=1}^t \sum_{j=1}^{r_i} y_{ij}^2 - \frac{2G}{n} \sum_{i=1}^t \sum_{j=1}^{r_i} y_{ij} + n\frac{G^2}{n^2} = \sum_{i=1}^t \sum_{j=1}^{r_i} y_{ij}^2 - \frac{G^2}{n}.$$
Similarly it can be shown that

$$S_T = \sum_{i=1}^t \frac{T_i^2}{r_i} - \frac{G^2}{n}$$

and

$$S_E = \sum_{i=1}^t \sum_{j=1}^{r_i} y_{ij}^2 - \sum_{i=1}^t \frac{T_i^2}{r_i}.$$

Verify these two equations as an exercise.
The algorithm to find the ANOVA table is as follows:

1. Calculate the treatment totals $T_1 = \sum_{j=1}^{r_1} y_{1j}, \ldots, T_t = \sum_{j=1}^{r_t} y_{tj}$.

2. Calculate the grand total $G = \sum_{i=1}^t T_i$.

3. Calculate the quantity $G^2/n$, which is often referred to as the correction factor.

4. Find the treatment sum of squares
$$S_T = \sum_{i=1}^t \frac{T_i^2}{r_i} - \frac{G^2}{n}.$$

5. Find the total sum of squares
$$S_G = \sum_{i=1}^t \sum_{j=1}^{r_i} y_{ij}^2 - \frac{G^2}{n}.$$

6. Find the residual sum of squares (also known as the error sum of squares)
$$S_E = S_G - S_T = \sum_{i=1}^t \sum_{j=1}^{r_i} y_{ij}^2 - \sum_{i=1}^t \frac{T_i^2}{r_i}.$$
The ANOVA table for a CRD is given below.

Source      SS     df     MS                   F
Treatments  S_T    t-1    M_T = S_T/(t-1)      M_T/M_E
Residual    S_E    n-t    M_E = S_E/(n-t)
Total       S_G    n-1

The quantities $M_T$ and $M_E$ are called the mean squares for treatments and residual, respectively. Note that $S_T$ and $S_E$ in the table always add up to $S_G$. Similarly the degrees of freedom $t-1$ for Treatments and $n-t$ for Residual always sum to $n-1$, that is the degrees of freedom for Total.
The total sum of squares can be partitioned as

$$\sum_{i=1}^t \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^t r_i (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^t \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{i.})^2,$$

where $\bar{y}_{i.}$ and $\bar{y}_{..}$ are the $i$th group mean and the overall mean respectively. In other words, the total variability in a completely randomized design which we observe in the response variable (i.e. total sum of squares) can be divided into two sources - the between group sum of squares and the within group sum of squares.

The ANOVA table for a CRD can be written as

Source          SS     df     MS                   F
Between groups  S_T    t-1    M_T = S_T/(t-1)      M_T/M_E
Within groups   S_E    n-t    M_E = S_E/(n-t)
Total           S_G    n-1
where $M_T$ denotes the mean square for treatments and $M_E$ denotes the residual mean square.

In order to test for differences between the treatments we use the F test with test statistic

$$F = \frac{M_T}{M_E}.$$

If $H_0$ is true, then

$$F \sim F_{t-1, n-t},$$

where $F_{\nu_1, \nu_2}$ denotes Fisher's F distribution with $\nu_1$ and $\nu_2$ degrees of freedom. So we reject $H_0$ at the $100\alpha\%$ level of significance if $F > F_{t-1,n-t,\alpha}$, where $F_{t-1,n-t,\alpha}$ denotes the per cent point corresponding to $\alpha$ which can be looked up in the New Cambridge Statistical Tables.
Example 2.1 Petrol example revisited.
Suppose that we wish to compare three types of petrol. The mileage per gallon was measured
in 12 cars of the same model and the data are given below.
A B C
24.0 25.3 23.3
25.0 26.5 24.0
24.3 26.4 24.7
25.5 27.0
27.6
So the treatment totals are $T_A = 98.8$, $T_B = 132.8$ and $T_C = 72.0$. Thus, the grand total is

$$G = 98.8 + 132.8 + 72.0 = 303.6,$$

so that the correction factor is

$$\frac{G^2}{n} = \frac{303.6^2}{12} = 7681.08.$$

Next, the treatment sum of squares is

$$S_T = \frac{98.8^2}{4} + \frac{132.8^2}{5} + \frac{72^2}{3} - 7681.08 = 14.448.$$

Similarly, the total sum of squares is

$$S_G = 7700.78 - 7681.08 = 19.70,$$

so that the residual sum of squares is

$$S_E = 19.70 - 14.448 = 5.252.$$

Hence, the ANOVA table is given below.

Source    SS      df  MS      F
Petrols   14.448  2   7.224   12.379
Residual  5.252   9   0.5836
Total     19.70   11

To test for differences between the types of petrol at the 1% level of significance we compare the observed value of $F = 12.379$ with the per cent point $F_{2,9,0.01} = 8.022$. Since $F > F_{2,9,0.01}$ we reject the null hypothesis of no differences and conclude that there are differences.

Alternatively we can find the p-value, which is the probability $P = P(F_{2,9} > 12.379)$. From tables, we have $F_{2,9,0.005} = 10.11$ and $F_{2,9,0.001} = 16.39$, so that $0.001 < P < 0.005$. Using software the exact value can be found as $P = 0.003$. Thus, there is strong evidence against $H_0$, that is, strong evidence that there are differences between the types of petrol.
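These calculations are easy to reproduce in software. The following is a minimal sketch in Python with numpy and scipy (illustrative only; the course itself uses GenStat), following the six-step algorithm of Section 2.2 for the petrol data:

```python
import numpy as np
from scipy import stats

# Petrol data: mileage per gallon for each of the three petrol types
groups = {
    "A": np.array([24.0, 25.0, 24.3, 25.5]),
    "B": np.array([25.3, 26.5, 26.4, 27.0, 27.6]),
    "C": np.array([23.3, 24.0, 24.7]),
}

t = len(groups)                               # number of treatments
n = sum(len(y) for y in groups.values())      # total number of observations
G = sum(y.sum() for y in groups.values())     # grand total

cf = G**2 / n                                                  # correction factor
S_T = sum(y.sum()**2 / len(y) for y in groups.values()) - cf   # treatment SS
S_G = sum((y**2).sum() for y in groups.values()) - cf          # total SS
S_E = S_G - S_T                                                # residual SS

M_T, M_E = S_T / (t - 1), S_E / (n - t)       # mean squares
F = M_T / M_E
p = stats.f.sf(F, t - 1, n - t)               # upper-tail probability

print(f"S_T={S_T:.3f}, S_E={S_E:.3f}, F={F:.3f}, p={p:.4f}")
# scipy's built-in one-way ANOVA gives the same F statistic and p-value:
print(stats.f_oneway(*groups.values()))
```

Running this reproduces $F = 12.379$ and $P \approx 0.003$ as found above.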
To see why the F test uses the ratio of the mean squares $M_T$ and $M_E$ we calculate the expected mean squares $E(M_T)$ and $E(M_E)$, that is the expected values of $M_T$ and $M_E$. In doing so we use the fact that the variance of every random variable $Y$ can be written as $Var(Y) = E(Y^2) - [E(Y)]^2$ so that $E(Y^2) = Var(Y) + [E(Y)]^2$. The expected value of the residual sum of squares is equal to

$$E(S_E) = E\left( \sum_{i=1}^t \sum_{j=1}^{r_i} y_{ij}^2 - \sum_{i=1}^t \frac{T_i^2}{r_i} \right) = \sum_{i=1}^t \sum_{j=1}^{r_i} E(y_{ij}^2) - \sum_{i=1}^t \frac{1}{r_i} E(T_i^2)$$
$$= \sum_{i=1}^t \sum_{j=1}^{r_i} \left( Var(y_{ij}) + [E(y_{ij})]^2 \right) - \sum_{i=1}^t \frac{1}{r_i} \left( Var(T_i) + [E(T_i)]^2 \right)$$
$$= \sum_{i=1}^t \sum_{j=1}^{r_i} \left( \sigma^2 + (\mu + \tau_i)^2 \right) - \sum_{i=1}^t \frac{1}{r_i} \left( r_i \sigma^2 + [r_i(\mu + \tau_i)]^2 \right)$$
$$= n\sigma^2 + \sum_{i=1}^t r_i (\mu + \tau_i)^2 - t\sigma^2 - \sum_{i=1}^t r_i (\mu + \tau_i)^2 = (n - t)\sigma^2$$

and so the expected mean square for residual is equal to

$$E(M_E) = E\left( \frac{S_E}{n-t} \right) = \sigma^2.$$
This shows that the mean square for residual $M_E$ is an unbiased estimator of $\sigma^2$. Similarly, the treatment sum of squares has expectation

$$E(S_T) = E\left( \sum_{i=1}^t \frac{T_i^2}{r_i} - \frac{G^2}{n} \right) = \sum_{i=1}^t \frac{1}{r_i} E(T_i^2) - \frac{1}{n} E(G^2)$$
$$= \sum_{i=1}^t \frac{1}{r_i} \left( r_i \sigma^2 + [r_i(\mu + \tau_i)]^2 \right) - \frac{1}{n} \left( n\sigma^2 + \left[ \sum_{i=1}^t r_i(\mu + \tau_i) \right]^2 \right)$$
$$= t\sigma^2 + \sum_{i=1}^t r_i(\mu + \tau_i)^2 - \sigma^2 - \frac{1}{n} \left[ \sum_{i=1}^t r_i(\mu + \tau_i) \right]^2$$
$$= (t-1)\sigma^2 + \sum_{i=1}^t r_i(\mu^2 + 2\mu\tau_i + \tau_i^2) - \frac{1}{n} \left[ \mu\sum_{i=1}^t r_i + \sum_{i=1}^t r_i \tau_i \right]^2$$
$$= (t-1)\sigma^2 + n\mu^2 + 2\mu\sum_{i=1}^t r_i \tau_i + \sum_{i=1}^t r_i \tau_i^2 - \frac{1}{n} \left( n^2\mu^2 + 2n\mu\sum_{i=1}^t r_i \tau_i + \left[ \sum_{i=1}^t r_i \tau_i \right]^2 \right)$$
$$= (t-1)\sigma^2 + \sum_{i=1}^t r_i \tau_i^2 - \frac{1}{n} \left[ \sum_{i=1}^t r_i \tau_i \right]^2 = (t-1)\sigma^2 + \sum_{i=1}^t r_i (\tau_i - \bar{\tau})^2,$$

where we have set $\bar{\tau} = \frac{1}{n} \sum_{i=1}^t r_i \tau_i$. Hence the expected mean square for treatments is equal to

$$E(M_T) = E\left( \frac{S_T}{t-1} \right) = \sigma^2 + \frac{\sum_{i=1}^t r_i (\tau_i - \bar{\tau})^2}{t-1}.$$
It follows that $\tau_i - \bar{\tau} = 0$ for every $i$ if and only if the null hypothesis $H_0: \tau_1 = \ldots = \tau_t$ is true. Hence if $H_0$ is true, then $E(M_T) = \sigma^2$. In other words: if $H_0$ is true, then the mean square for treatments is also an unbiased estimator of $\sigma^2$.

Now we can understand the logic behind the F test: $M_E$ is always an unbiased estimator of $\sigma^2$ and $M_T$ is unbiased for $\sigma^2$ if and only if $H_0$ is true. We expect that for a given set of data, both $M_E$ and $M_T$ will not be too far away from their expected values. This implies: if $H_0$ is true, then $F = M_T/M_E$ will be close to 1. However, if $H_0$ is not true, then $M_T$ will tend to be greater than $M_E$, and so $F$ will be greater than 1. Hence large values of $F$ provide evidence against the null hypothesis of no treatment differences, and $H_0$ is rejected at significance level $\alpha$ if $F$ is larger than the appropriate per cent point $F_{t-1,n-t,\alpha}$ of the F distribution.
2.3 Least squares estimation
Recall that the model is

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

for $i = 1, 2, \ldots, t$ and $j = 1, 2, \ldots, r_i$. We can fit the model using the method of least squares, which is equivalent to the method of maximum likelihood if $\varepsilon_{ij} \sim N(0, \sigma^2)$. The least squares estimates $\hat{\mu}$ and $\hat{\tau}_1, \ldots, \hat{\tau}_t$ of $\mu$ and $\tau_1, \ldots, \tau_t$ are obtained by minimizing

$$S = \sum_{i=1}^t \sum_{j=1}^{r_i} (y_{ij} - \mu - \tau_i)^2.$$

From the estimates we obtain the fitted values as $\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i$ and the corresponding residuals are $e_{ij} = y_{ij} - \hat{y}_{ij}$.
In order to minimize $S$ we calculate partial derivatives. Now,

$$\frac{\partial S}{\partial \mu} = -2\sum_{i=1}^t \sum_{j=1}^{r_i} (y_{ij} - \mu - \tau_i) = 0 \;\Rightarrow\; n\hat{\mu} + \sum_{i=1}^t r_i \hat{\tau}_i = G$$

and

$$\frac{\partial S}{\partial \tau_k} = -2\sum_{j=1}^{r_k} (y_{kj} - \mu - \tau_k) = 0 \;\Rightarrow\; r_k \hat{\mu} + r_k \hat{\tau}_k = T_k$$

for $k = 1, 2, \ldots, t$. So we have $t+1$ equations in $t+1$ unknowns. These are known as the normal equations. However, replacing $k$ by $i$, since the sum of the second equation for $\hat{\tau}_i$ over $i = 1, \ldots, t$ gives the first equation, the equations have infinitely many solutions. We say that the model is overparameterized, since there is one more parameter than is really needed.
To see this more clearly, we re-write the equations using matrix notation. Thus the equations are equivalent to the following system of equations:

$$\begin{pmatrix} n & r_1 & r_2 & \cdots & r_t \\ r_1 & r_1 & 0 & \cdots & 0 \\ r_2 & 0 & r_2 & & \vdots \\ \vdots & \vdots & & \ddots & 0 \\ r_t & 0 & \cdots & 0 & r_t \end{pmatrix} \begin{pmatrix} \hat{\mu} \\ \hat{\tau}_1 \\ \hat{\tau}_2 \\ \vdots \\ \hat{\tau}_t \end{pmatrix} = \begin{pmatrix} G \\ T_1 \\ T_2 \\ \vdots \\ T_t \end{pmatrix} \qquad (2.1)$$
Note that the $(t+1) \times (t+1)$ matrix on the left-hand side has rank $t$ and that its first column is the sum of the remaining columns. Similarly the first row is the sum of the remaining ones. It is straightforward to verify that any vector of length $t+1$ which is of the form

$$\begin{pmatrix} c \\ -c \\ \vdots \\ -c \end{pmatrix} + \begin{pmatrix} 0 \\ T_1/r_1 \\ \vdots \\ T_t/r_t \end{pmatrix}$$

is a solution, where $c$ can be any real number. In other words,

$$\hat{\mu} = c, \quad \hat{\tau}_1 = \frac{T_1}{r_1} - c, \; \ldots, \; \hat{\tau}_t = \frac{T_t}{r_t} - c$$

are a set of least squares estimates for any choice of $c$. Sometimes it is convenient to use $\bar{y}_{i.}$ to denote the average $T_i/r_i$ of the responses for treatment $i$, in which case the estimates of the $\tau$s become $\hat{\tau}_i = \bar{y}_{i.} - c$ for $i = 1, \ldots, t$. It is important to note that regardless of the choice of $c$ the fitted values are always the same. In particular,

$$\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = c + \bar{y}_{i.} - c = \bar{y}_{i.}$$

does not depend on $c$. Similarly the residuals are equal to $e_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i.}$ irrespective of the choice of $c$.
In order to get a unique set of least squares estimates we need to fix a value for $c$. In principle, we could choose any value. However, a choice that is in line with the interpretation of $\mu$ as the overall mean is to define $c$ as the mean $G/n$ of all the responses, for which $\bar{y}_{..}$ is a shorthand notation. This is equivalent to imposing the constraint

$$\sum_{i=1}^t r_i \tau_i = 0$$

on the $\tau$s. Thus for $c = \bar{y}_{..}$ or equivalently under the constraint $\sum_{i=1}^t r_i \hat{\tau}_i = 0$ the least squares estimates are:

$$\hat{\mu} = \bar{y}_{..}, \quad \hat{\tau}_1 = \bar{y}_{1.} - \bar{y}_{..}, \; \ldots, \; \hat{\tau}_t = \bar{y}_{t.} - \bar{y}_{..}$$

Sometimes other values such as $c = 0$ are chosen, but we will not consider such alternative choices here.
In the module Statistical Modelling I you met the normal equations in the form

$$X'X\beta = X'y,$$

which at first sight looks different from the normal equations given in (2.1). However, if we define a matrix $X$ with $n$ rows and $t+1$ columns as well as vectors $y$ of length $n$ and $\beta$ of length $t+1$ appropriately by

$$X = \begin{pmatrix} 1 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 0 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 0 & \cdots & 1 \end{pmatrix}, \quad y = \begin{pmatrix} y_{11} \\ \vdots \\ y_{1r_1} \\ \vdots \\ y_{t1} \\ \vdots \\ y_{tr_t} \end{pmatrix} \quad \text{and} \quad \beta = \begin{pmatrix} \mu \\ \tau_1 \\ \vdots \\ \tau_t \end{pmatrix}$$
then it is not difficult to see that

$$X'X = \begin{pmatrix} n & r_1 & r_2 & \cdots & r_t \\ r_1 & r_1 & 0 & \cdots & 0 \\ r_2 & 0 & r_2 & & \vdots \\ \vdots & \vdots & & \ddots & 0 \\ r_t & 0 & \cdots & 0 & r_t \end{pmatrix} \quad \text{and} \quad X'y = \begin{pmatrix} G \\ T_1 \\ T_2 \\ \vdots \\ T_t \end{pmatrix},$$

which shows that the normal equations in (2.1) are exactly of the same form as those in Statistical Modelling I. Note, however, that as we had observed before $X'X$ has rank $t$ and so does not have an inverse. Hence, we cannot calculate the least squares estimates by means of the formula $\hat{\beta} = (X'X)^{-1}X'y$. As we have seen, we can get around this problem by imposing the constraint $\sum_{i=1}^t r_i \hat{\tau}_i = 0$ to make the estimates unique.
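The rank deficiency and the constrained solution can both be seen numerically. Below is a small illustrative numpy sketch (not part of the original notes) which builds the $(t+1)$-column design matrix $X$ for the petrol data, confirms that $X'X$ is singular, and recovers the constrained estimates directly from the group means:

```python
import numpy as np

r = [4, 5, 3]                      # replications for petrol types A, B, C
y = np.array([24.0, 25.0, 24.3, 25.5,          # A
              25.3, 26.5, 26.4, 27.0, 27.6,    # B
              23.3, 24.0, 24.7])               # C

# Overparameterized design matrix: intercept plus one indicator per treatment
X = np.column_stack([np.ones(sum(r))] +
                    [np.repeat([i == k for i in range(3)], r).astype(float)
                     for k in range(3)])

print(np.linalg.matrix_rank(X.T @ X))   # rank t = 3, not t+1 = 4: singular

# Constrained estimates (sum_i r_i * tau_i = 0): mu_hat = grand mean,
# tau_i_hat = group mean - grand mean
mu_hat = y.mean()
group_means = [g.mean() for g in np.split(y, np.cumsum(r)[:-1])]
tau_hat = [m - mu_hat for m in group_means]
print(mu_hat, tau_hat)   # 25.3 and [-0.6, 1.26, -1.3]
```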
2.4 Checking model assumptions
The random errors $\varepsilon_{ij}$ in the one-way ANOVA model are assumed

1. to have mean $E(\varepsilon_{ij}) = 0$,
2. to have equal variances $Var(\varepsilon_{ij}) = \sigma^2$,
3. to be independent and
4. to have a normal distribution.

As part of the analysis we want to check if these assumptions do at least hold approximately for the data we are looking at. To this end we use the residuals $e_{ij} = y_{ij} - \bar{y}_{i.}$ from fitting the model. Using the method of least squares implies that the sum of the residuals is always equal to zero, that is

$$\sum_{i=1}^t \sum_{j=1}^{r_i} e_{ij} = 0$$

and so this is always in agreement with the first assumption.

In order to check the other assumptions plots similar to those you met in Statistical Modelling I can be used:

- A normal probability plot of the residuals can be used to check normality. Similarly, a histogram of the residuals can be used to check symmetry and whether the distribution has a single mode, but this is less accurate than a normal probability plot.
- To check for independence we look at a plot of the residuals against the order in which the responses were collected. In the simplest case, the residuals would be plotted against the observation numbers $1, \ldots, n$.
- To check for evidence that $\sigma^2$ is constant, we plot the residuals against the treatments or against the fitted values. In other words, we look at the residuals for each group separately. Another possibility is to create a plot showing a separate boxplot for each treatment.
More formally, we can use Bartlett's test to check the assumption of equal variances. If we denote the variances for the $t$ groups corresponding to the treatments by $\sigma_1^2, \ldots, \sigma_t^2$ then this tests the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \ldots = \sigma_t^2$ that all the variances are the same against the alternative $H_1$ that at least two variances are different.

For every $i = 1, \ldots, t$ let

$$s_i^2 = \frac{1}{r_i - 1} \sum_{j=1}^{r_i} (y_{ij} - \bar{y}_{i.})^2$$

be the sample variance for the $i$th treatment group. The test statistic of Bartlett's test is then defined as

$$B = \frac{K}{1 + L}$$

where

$$K = (n - t)\log(M_E) - \sum_{i=1}^t (r_i - 1)\log(s_i^2)$$

and

$$L = \frac{1}{3(t-1)} \left( \sum_{i=1}^t \frac{1}{r_i - 1} - \frac{1}{n - t} \right).$$

For large $n$ and normally distributed data, $B \sim \chi^2_{t-1}$ approximately if $H_0: \sigma_1^2 = \sigma_2^2 = \ldots = \sigma_t^2$ is true. The test rejects $H_0$ at the $100\alpha\%$ level of significance if $B > \chi^2_{t-1,\alpha}$, where $\chi^2_{t-1,\alpha}$ is the per cent point corresponding to $\alpha$ of the chi-square distribution with $t-1$ degrees of freedom. This can be looked up in the New Cambridge Statistical Tables. Since Bartlett's test is sensitive to the normality assumption, its use is only recommended if the data can be assumed to be normal.
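Bartlett's test is also available in standard software. The sketch below (an illustration using Python's scipy rather than the course's GenStat) computes the statistic from the formulas above for the petrol data and checks it against the library version:

```python
import numpy as np
from scipy import stats

samples = [np.array([24.0, 25.0, 24.3, 25.5]),
           np.array([25.3, 26.5, 26.4, 27.0, 27.6]),
           np.array([23.3, 24.0, 24.7])]
t = len(samples)
n = sum(len(s) for s in samples)

s2 = [s.var(ddof=1) for s in samples]                 # sample variances s_i^2
M_E = sum((len(s) - 1) * v for s, v in zip(samples, s2)) / (n - t)

K = (n - t) * np.log(M_E) - sum((len(s) - 1) * np.log(v)
                                for s, v in zip(samples, s2))
L = (sum(1 / (len(s) - 1) for s in samples) - 1 / (n - t)) / (3 * (t - 1))
B = K / (1 + L)

print(B, stats.chi2.sf(B, t - 1))     # statistic and approximate p-value
print(stats.bartlett(*samples))       # scipy's built-in version agrees
```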
2.5 Further analyses
The ANOVA F test allows us to test if the treatments have different effects on the response variable. If the null hypothesis of no differences between the treatments is rejected, then we can conclude that some treatments influence the response in a different way than others. The F test does, however, not tell us where these differences lie. In the petrol Example 2.1 we saw, for example, that the null hypothesis of no differences between the types A, B and C of petrol is rejected at the 1% level. This result does however not tell us if the effect of A is different from the effect of B, or if B and C are different, etc.

Frequently, we have some idea about where the differences could lie before analyzing or even before collecting the data. In this case, we can formulate one or more specific hypotheses prior to the analysis. Tests of such hypotheses are called pre-planned comparisons and are the subject of this section.

For example, suppose that we want to compare treatments 1 and 2. Then we wish to test $H_0: \tau_1 = \tau_2$, or, equivalently, $H_0: \tau_1 - \tau_2 = 0$ against $H_1: \tau_1 - \tau_2 \neq 0$. This is equivalent to testing if the population means for the two groups corresponding to the treatments 1 and 2 are the same.
Since each of the random variables $Y_{ij}$ in the model equation of the one-way ANOVA model has mean $E(Y_{ij}) = \mu + \tau_i$ it is straightforward to see that the mean of the responses for treatment 1 has expected value

$$E(\bar{y}_{1.}) = E\left( \frac{1}{r_1} \sum_{j=1}^{r_1} y_{1j} \right) = \frac{1}{r_1} \sum_{j=1}^{r_1} E(y_{1j}) = \frac{1}{r_1} \sum_{j=1}^{r_1} (\mu + \tau_1) = \mu + \tau_1,$$

where for convenience of notation we ignore the distinction between the random variables $Y_{ij}$ and the corresponding values $y_{ij}$. Similarly, we have $E(\bar{y}_{2.}) = \mu + \tau_2$. Hence the difference $\bar{y}_{1.} - \bar{y}_{2.}$ of the means is an unbiased estimator of $\tau_1 - \tau_2$, since

$$E(\bar{y}_{1.} - \bar{y}_{2.}) = E(\bar{y}_{1.}) - E(\bar{y}_{2.}) = \mu + \tau_1 - (\mu + \tau_2) = \tau_1 - \tau_2.$$
Next, since all the random errors and hence the responses are independent we have (by using only facts about variances from Introduction to Statistics) that

$$Var(\bar{y}_{1.} - \bar{y}_{2.}) = Var(\bar{y}_{1.}) + Var(\bar{y}_{2.}) = \frac{\sigma^2}{r_1} + \frac{\sigma^2}{r_2}.$$

So, under $H_0: \tau_1 - \tau_2 = 0$,

$$\bar{y}_{1.} - \bar{y}_{2.} \sim N\left( 0, \sigma^2\left( \frac{1}{r_1} + \frac{1}{r_2} \right) \right),$$

and thus

$$\frac{\bar{y}_{1.} - \bar{y}_{2.}}{\sqrt{\sigma^2\left( \frac{1}{r_1} + \frac{1}{r_2} \right)}} \sim N(0, 1).$$

The $\sigma^2$ in the denominator is unknown, so we replace it with an estimate. This idea is exactly the same as the one that was used in Introduction to Statistics to derive t tests. Which estimate should we use? At this point it is important to note that although we are currently interested in the comparison of treatments 1 and 2 we are still in the situation of the one-way ANOVA model where there are $t$ treatments in total.
We saw earlier that in this model the residual mean square $M_E$ is an unbiased estimator of $\sigma^2$. Hence, we replace the $\sigma^2$ with $M_E$. It follows that under $H_0: \tau_1 - \tau_2 = 0$ the test statistic

$$T = \frac{\bar{y}_{1.} - \bar{y}_{2.}}{\sqrt{M_E\left( \frac{1}{r_1} + \frac{1}{r_2} \right)}}$$

has a $t_{n-t}$ distribution, where the degrees of freedom $n-t$ are those for Residual in the ANOVA table. The null hypothesis $H_0: \tau_1 - \tau_2 = 0$ is then rejected at the $100\alpha\%$ level if the modulus $|T|$ is greater than $t_{n-t,\alpha/2}$. Note that the point corresponding to $\alpha/2$ is used since we are testing $H_0$ against the two-sided alternative $H_1: \tau_1 - \tau_2 \neq 0$.

In a similar way, the equality of any other pair of treatment effects can be tested. To this end, we only need to replace the values of $\bar{y}_{1.}$ and $\bar{y}_{2.}$ as well as the replications $r_1$ and $r_2$ in the formula for $T$ by the appropriate values for the treatments that are to be compared.
Example 2.2 Petrol example revisited.
Suppose that before the analysis we had decided to compare petrol types A and C. The first of these has treatment total $T_A = 98.8$ and replication 4, and so the corresponding mean is 24.7. Similarly, the treatment total for C is $T_C = 72$ and since the replication is 3 this gives rise to a mean of 24.

From the ANOVA table in Example 2.1 we find the residual mean square as $M_E = 0.5836$ and so the value of the test statistic for testing the null hypothesis that the effects of A and C are equal is

$$T = \frac{24.7 - 24.0}{\sqrt{0.5836\left( \frac{1}{4} + \frac{1}{3} \right)}} = 1.199.$$

The residual degrees of freedom in the ANOVA table are equal to $n - t = 9$ and so the critical value for testing the null hypothesis at the 5% level can be found in Table 10 of the New Cambridge Statistical Tables to be equal to $t_{9,0.025} = 2.262$. Since $|T| < t_{9,0.025}$ there is no evidence at the 5% significance level of a difference between A and C. We would have reached the same conclusion if we had used the p-value.
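A quick numerical check of this comparison (an illustrative Python sketch using the numbers of Example 2.2, not part of the original notes):

```python
from scipy import stats

mean_A, mean_C = 24.7, 24.0
r_A, r_C = 4, 3
M_E, df = 0.5836, 9                    # residual mean square and df, Example 2.1

T = (mean_A - mean_C) / (M_E * (1 / r_A + 1 / r_C)) ** 0.5
p = 2 * stats.t.sf(abs(T), df)         # two-sided p-value
print(T, p, stats.t.ppf(0.975, df))    # |T| = 1.199 < 2.262, so do not reject
```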
In general, we may be interested in a pre-planned comparison involving not only two, but three or more of the treatments. To this end, we consider tests of contrasts, which are linear combinations of the $\tau$s in the model equation of the one-way ANOVA. More precisely, these are defined as follows: any linear combination

$$\sum_{i=1}^t \lambda_i \tau_i \quad \text{with} \quad \sum_{i=1}^t \lambda_i = 0,$$

where $\lambda_1, \ldots, \lambda_t$ are known real numbers, is called a contrast in the one-way ANOVA model. We will be testing null hypotheses of the form $H_0: \sum_{i=1}^t \lambda_i \tau_i = 0$ against alternatives given by $H_1: \sum_{i=1}^t \lambda_i \tau_i \neq 0$. Note that the null hypothesis $H_0: \tau_1 - \tau_2 = 0$ for comparing treatments 1 and 2 considered earlier is exactly of this form since $\tau_1 - \tau_2 = \sum_{i=1}^t \lambda_i \tau_i$ with $\lambda_1 = 1$, $\lambda_2 = -1$ and $\lambda_3 = \ldots = \lambda_t = 0$.
The derivation of the test for testing the null hypothesis $H_0: \tau_1 - \tau_2 = 0$ was motivated by the fact that $E(\bar{y}_{1.} - \bar{y}_{2.}) = \tau_1 - \tau_2$. Another way of looking at this is as follows: if the $\tau$s in the contrast $\sum_{i=1}^t \lambda_i \tau_i$, where $\lambda_1 = 1$, $\lambda_2 = -1$ and $\lambda_3 = \ldots = \lambda_t = 0$, are replaced with the means of the treatment groups, then

$$E\left( \sum_{i=1}^t \lambda_i \bar{y}_{i.} \right) = \sum_{i=1}^t \lambda_i \tau_i.$$

That is, the linear combination of the group means which uses the coefficients of the contrast is an unbiased estimator of the contrast. This does not only hold for the specific contrast corresponding to the comparison of treatments 1 and 2, but is true in general, since for any contrast $\sum_{i=1}^t \lambda_i \tau_i$ we have

$$E\left( \sum_{i=1}^t \lambda_i \bar{y}_{i.} \right) = \sum_{i=1}^t \lambda_i E(\bar{y}_{i.}) = \sum_{i=1}^t \lambda_i (\mu + \tau_i) = \mu\sum_{i=1}^t \lambda_i + \sum_{i=1}^t \lambda_i \tau_i = \sum_{i=1}^t \lambda_i \tau_i.$$

Note that in the final step of this calculation we have used that by definition the $\lambda$s which define the contrast add to zero, that is $\sum_{i=1}^t \lambda_i = 0$.
A test for testing $H_0: \sum_{i=1}^t \lambda_i \tau_i = 0$ can now be derived along the same lines as for comparing only two treatments. First, since all the responses are independent the variance of $\sum_{i=1}^t \lambda_i \bar{y}_{i.}$ is equal to

$$Var\left( \sum_{i=1}^t \lambda_i \bar{y}_{i.} \right) = \sum_{i=1}^t \lambda_i^2 Var(\bar{y}_{i.}) = \sum_{i=1}^t \lambda_i^2 \frac{\sigma^2}{r_i} = \sigma^2 \sum_{i=1}^t \frac{\lambda_i^2}{r_i}.$$

Secondly, if $H_0$ is true, then

$$\sum_{i=1}^t \lambda_i \bar{y}_{i.} \sim N\left( 0, \sigma^2 \sum_{i=1}^t \frac{\lambda_i^2}{r_i} \right),$$

and so

$$\frac{\sum_{i=1}^t \lambda_i \bar{y}_{i.}}{\sqrt{\sigma^2 \sum_{i=1}^t \frac{\lambda_i^2}{r_i}}} \sim N(0, 1).$$

Finally, substituting the residual mean square $M_E$ for $\sigma^2$ in the denominator gives the test statistic

$$T = \frac{\sum_{i=1}^t \lambda_i \bar{y}_{i.}}{\sqrt{M_E \sum_{i=1}^t \frac{\lambda_i^2}{r_i}}}.$$

If $H_0: \sum_{i=1}^t \lambda_i \tau_i = 0$ is true, then this has a $t_{n-t}$ distribution. So when testing this null hypothesis against the alternative $H_1: \sum_{i=1}^t \lambda_i \tau_i \neq 0$ we reject $H_0$ at the $100\alpha\%$ level of significance if the modulus $|T|$ is greater than $t_{n-t,\alpha/2}$.
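The general contrast test is a one-line extension of the pairwise test. The following is a hedged Python sketch (an illustrative helper, not from the notes); the function name and interface are hypothetical:

```python
import numpy as np
from scipy import stats

def contrast_test(lam, means, reps, M_E, df):
    """t test of H0: sum_i lam_i * tau_i = 0 in a one-way ANOVA.

    lam   -- contrast coefficients (must sum to zero)
    means -- treatment group means
    reps  -- replications r_i
    M_E   -- residual mean square; df -- residual degrees of freedom
    """
    lam, means, reps = map(np.asarray, (lam, means, reps))
    assert abs(lam.sum()) < 1e-12, "coefficients must sum to zero"
    T = (lam * means).sum() / np.sqrt(M_E * (lam**2 / reps).sum())
    return T, 2 * stats.t.sf(abs(T), df)

# Example 2.3 below: compare B with the average of A and C
print(contrast_test([1, -2, 1], [24.7, 26.56, 24.0], [4, 5, 3], 0.5836, 9))
```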
To make things more specific, suppose we have $t = 3$ treatments. Moreover, suppose we believe that treatments 1 and 3 have similar effects. Rather than comparing treatment 2 with 1 and 3 separately, we may wish to test if the mean response for treatment 2 is different from the average of the mean responses for treatments 1 and 3. This gives rise to the null hypothesis $H_0: (\tau_1 + \tau_3)/2 = \tau_2$, or, equivalently, $H_0: \tau_1 - 2\tau_2 + \tau_3 = 0$, which is tested against $H_1: \tau_1 - 2\tau_2 + \tau_3 \neq 0$. So here the contrast involved in the hypotheses is $\sum_{i=1}^3 \lambda_i \tau_i$ where $\lambda_1 = 1$, $\lambda_2 = -2$ and $\lambda_3 = 1$, which add to zero as required. Hence the test statistic for testing $H_0$ against $H_1$ is

$$T = \frac{\bar{y}_{1.} - 2\bar{y}_{2.} + \bar{y}_{3.}}{\sqrt{M_E\left( \frac{1}{r_1} + \frac{4}{r_2} + \frac{1}{r_3} \right)}}$$

which has a $t_{n-t}$ distribution if $H_0$ is true.
Example 2.3 Petrol example revisited.
Suppose that we wish to compare petrol type B with A and C. The treatment totals, replications and means for A, B, and C are:

Treatment  Total  Replication  Mean
A          98.8   4            24.70
B          132.8  5            26.56
C          72.0   3            24.00

Using these and the residual mean square $M_E = 0.5836$ from the ANOVA in Example 2.1 the value of the test statistic for testing $H_0: \tau_1 - 2\tau_2 + \tau_3 = 0$ against $H_1: \tau_1 - 2\tau_2 + \tau_3 \neq 0$ is

$$T = \frac{24.7 - 2 \times 26.56 + 24.0}{\sqrt{0.5836\left( \frac{1}{4} + \frac{4}{5} + \frac{1}{3} \right)}} = -4.919.$$

The critical value for the test at the 1% level of significance can be looked up in Table 10 of the New Cambridge Statistical Tables as $t_{9,0.005} = 3.250$. Since $|T| = 4.919$ is greater than this we conclude that petrol type B is different from the others.
2.6 Method of orthogonal contrasts
We have seen how contrasts can be used to test specific hypotheses in the one-way ANOVA model. In this section we look at the method of orthogonal contrasts which allows us to test $t-1$ orthogonal contrasts. These contrasts are chosen such that the treatment sum of squares $S_T$ can be partitioned into $t-1$ independent components, one for each test.

In general, there are many sets of $t-1$ contrasts which allow us to decompose $S_T$ into independent components and we can choose the contrasts which correspond to the comparisons of the treatments we are most interested in, provided the contrasts are orthogonal. We will explain in a moment what exactly that means.

In what follows we will be considering contrasts of the form

$$L_k = \sum_{i=1}^t \lambda_{ki} \tau_i$$

where $L_k$ is only a shorthand notation for the expression on the right-hand side of the equation. The first subscript $k$ of $\lambda_{ki}$ is only used to distinguish the coefficients associated with $L_k$ from those of other contrasts such as $L_\ell$, which would be denoted as $\lambda_{\ell 1}, \ldots, \lambda_{\ell t}$. Note that since $L_k$ is a contrast we have $\sum_{i=1}^t \lambda_{ki} = 0$. Similarly, for $L_\ell$ we have $\sum_{i=1}^t \lambda_{\ell i} = 0$.

We can now define what orthogonality of contrasts means: two contrasts $L_k = \sum_{i=1}^t \lambda_{ki} \tau_i$ and $L_\ell = \sum_{i=1}^t \lambda_{\ell i} \tau_i$ are orthogonal to each other if

$$\sum_{i=1}^t \frac{\lambda_{ki}\lambda_{\ell i}}{r_i} = 0,$$

where as usual $r_i$ is the replication of treatment $i$. A set of more than two contrasts is said to be orthogonal if every pair of contrasts in the set is orthogonal in the sense of the definition above. Note that the condition for orthogonality simplifies if all treatments have the same replication, that is if $r_1 = \ldots = r_t$. In that case, the above condition for orthogonality of $L_k$ and $L_\ell$ is equivalent to

$$\sum_{i=1}^t \lambda_{ki}\lambda_{\ell i} = 0.$$
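Checking orthogonality is a one-line computation. The sketch below is an illustrative Python helper (the function name is hypothetical, not from the notes):

```python
import numpy as np

def orthogonal(lam_k, lam_l, reps, tol=1e-12):
    """True if the contrasts with coefficients lam_k and lam_l are
    orthogonal, i.e. sum_i lam_ki * lam_li / r_i = 0."""
    lam_k, lam_l, reps = map(np.asarray, (lam_k, lam_l, reps))
    return abs((lam_k * lam_l / reps).sum()) < tol

# The two contrasts of Example 2.4 below, with replications 4, 5, 3
print(orthogonal([1, 0, -1], [4, -7, 3], [4, 5, 3]))   # True
```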
Example 2.4 Petrol example revisited.
Bearing in mind that the $t = 3$ treatments 1, 2 and 3 are, respectively, the petrol types A, B and C, the replications are $r_1 = 4$, $r_2 = 5$ and $r_3 = 3$. We might be interested in the contrast $L_1 = \tau_1 - \tau_3$ with coefficients $\lambda_{11} = 1$, $\lambda_{12} = 0$ and $\lambda_{13} = -1$. Moreover, we might be interested in the contrast $L_2 = 4\tau_1 - 7\tau_2 + 3\tau_3$ whose coefficients are $\lambda_{21} = 4$, $\lambda_{22} = -7$ and $\lambda_{23} = 3$. Now $L_1$ and $L_2$ are orthogonal since

$$\sum_{i=1}^t \frac{\lambda_{1i}\lambda_{2i}}{r_i} = \frac{1 \times 4}{4} - \frac{0 \times 7}{5} - \frac{1 \times 3}{3} = 0.$$

Try to find another pair of orthogonal contrasts yourself.
Now suppose that as in the petrol example there are $t = 3$ treatments and $n$ responses. Moreover, suppose that $L_1 = \sum_{i=1}^t \lambda_{1i}\tau_i$ and $L_2 = \sum_{i=1}^t \lambda_{2i}\tau_i$ are any two orthogonal contrasts. To test $H_0: L_1 = 0$ we use the test statistic

$$T = \frac{\sum_{i=1}^t \lambda_{1i}\bar{y}_{i.}}{\sqrt{M_E \sum_{i=1}^t \frac{\lambda_{1i}^2}{r_i}}}$$

from Section 2.5, which has a $t_{n-t}$ distribution if $H_0: L_1 = 0$ is true. Note that $n-t$ is the number of residual degrees of freedom in the ANOVA. Similarly, to test $H_0: L_2 = 0$ we use the test statistic

$$T = \frac{\sum_{i=1}^t \lambda_{2i}\bar{y}_{i.}}{\sqrt{M_E \sum_{i=1}^t \frac{\lambda_{2i}^2}{r_i}}}$$

which also has a $t_{n-t}$ distribution if $H_0: L_2 = 0$ is true.

Recall that if a random variable $T$ has a $t_\nu$ distribution, then $T^2$ has an $F_{1,\nu}$ distribution. Applying this fact to $L_1$ and $L_2$ respectively it follows that

$$\frac{\left( \sum_{i=1}^t \lambda_{1i}\bar{y}_{i.} \right)^2}{M_E \sum_{i=1}^t \frac{\lambda_{1i}^2}{r_i}} \sim F_{1,n-t}$$

if $H_0: L_1 = 0$ is true and that

$$\frac{\left( \sum_{i=1}^t \lambda_{2i}\bar{y}_{i.} \right)^2}{M_E \sum_{i=1}^t \frac{\lambda_{2i}^2}{r_i}} \sim F_{1,n-t}$$

if $H_0: L_2 = 0$ is true.
Furthermore, it can be shown that the treatment sum of squares can be decomposed as follows:

$$S_T = \frac{\left( \sum_{i=1}^t \lambda_{1i}\bar{y}_{i.} \right)^2}{\sum_{i=1}^t \frac{\lambda_{1i}^2}{r_i}} + \frac{\left( \sum_{i=1}^t \lambda_{2i}\bar{y}_{i.} \right)^2}{\sum_{i=1}^t \frac{\lambda_{2i}^2}{r_i}}.$$
We illustrate this decomposition of the treatment sum of squares for the petrol data and the
two orthogonal contrasts from Example 2.4.
Example 2.5 Petrol example revisited.
For the petrol types A, B and C with replications $r_1 = 4$, $r_2 = 5$ and $r_3 = 3$ consider again the orthogonal contrasts $L_1 = \tau_1 - \tau_3$ with coefficients $\lambda_{11} = 1$, $\lambda_{12} = 0$ and $\lambda_{13} = -1$, and $L_2 = 4\tau_1 - 7\tau_2 + 3\tau_3$ whose coefficients are $\lambda_{21} = 4$, $\lambda_{22} = -7$ and $\lambda_{23} = 3$. For these set

$$S_1 = \frac{\left( \sum_{i=1}^t \lambda_{1i}\bar{y}_{i.} \right)^2}{\sum_{i=1}^t \frac{\lambda_{1i}^2}{r_i}}, \qquad S_2 = \frac{\left( \sum_{i=1}^t \lambda_{2i}\bar{y}_{i.} \right)^2}{\sum_{i=1}^t \frac{\lambda_{2i}^2}{r_i}}.$$

Then

$$S_1 + S_2 = \frac{(\bar{y}_{1.} - \bar{y}_{3.})^2}{\frac{1}{4} + \frac{1}{3}} + \frac{(4\bar{y}_{1.} - 7\bar{y}_{2.} + 3\bar{y}_{3.})^2}{\frac{16}{4} + \frac{49}{5} + \frac{9}{3}}$$
$$= \frac{\frac{1}{16}T_1^2 - \frac{1}{6}T_1 T_3 + \frac{1}{9}T_3^2}{\frac{7}{12}} + \frac{\left( T_1 - \frac{7}{5}T_2 + T_3 \right)^2}{\frac{84}{5}}$$
$$= \frac{T_1^2}{6} + \frac{7T_2^2}{60} + \frac{T_3^2}{4} - \frac{T_1 T_2 + T_1 T_3 + T_2 T_3}{6}$$
$$= \frac{T_1^2}{4} + \frac{T_2^2}{5} + \frac{T_3^2}{3} - \frac{T_1^2 + T_2^2 + T_3^2 + 2T_1 T_2 + 2T_1 T_3 + 2T_2 T_3}{12}$$
$$= \frac{T_1^2}{4} + \frac{T_2^2}{5} + \frac{T_3^2}{3} - \frac{G^2}{12} = S_T$$

as required.
Both of the properties which have been illustrated before for a one-way ANOVA model with $t = 3$ treatments generalize to situations with an arbitrary number of treatments: first, the treatment sum of squares can be partitioned into $t-1$ independent components. Secondly, if such a component is divided by the residual mean square, then the resulting ratio has an $F_{1,n-t}$ distribution, where $n-t$ is the number of residual degrees of freedom. The following theorem states the general result.

Theorem 2.1 Suppose that there are $t$ treatments and that $L_1, L_2, \ldots, L_{t-1}$ are orthogonal contrasts, where $L_k = \sum_{i=1}^t \lambda_{ki}\tau_i$ for $k = 1, \ldots, t-1$. Then

$$S_T = S_1 + \ldots + S_{t-1}$$

where

$$S_k = \frac{\left( \sum_{i=1}^t \lambda_{ki}\bar{y}_{i.} \right)^2}{\sum_{i=1}^t \frac{\lambda_{ki}^2}{r_i}},$$

which are called the contrast sums of squares. Further, $F_k = S_k/M_E \sim F_{1,n-t}$ under $H_0: \sum_{i=1}^t \lambda_{ki}\tau_i = 0$.

The contrasts in Theorem 2.1 are often referred to as single degree of freedom contrasts since each of them accounts for one degree of freedom of the treatment sum of squares. A useful consequence of the decomposition of $S_T$ into the contrast sums of squares is that the tests of the orthogonal contrasts can be shown as part of the ANOVA table. We will see how this can be done in GenStat. Moreover, breaking down the treatment sum of squares into various components is a general idea that will be further explored in subsequent chapters.

In the following example, first the one-way ANOVA as described in Section 2.2 is carried out. This is followed by illustrating the calculations involved in testing $t-1$ orthogonal contrasts. Finally, an expanded ANOVA table from GenStat is shown which includes the tests of the contrasts.
Example 2.6 Cereals.
Suppose that we wish to compare five machines which put cereals into packets. For each of these there are ten observations available.

- Machines 1 and 2 are five years old and have recently been serviced.
- Machines 3 and 4 are also five years old but due to be serviced.
- Machine 5 is one year old.

The data below give the deviations from 500 grammes for the five machines for each of 10 cereal packets. The final row of the table contains the treatment totals $T_1, \ldots, T_5$.
Machine
1 2 3 4 5
4.0 6.3 3.1 7.3 -3.0
-2.1 1.6 11.7 8.7 4.0
3.0 1.8 4.8 7.8 0.9
0.9 6.5 6.7 4.4 1.6
2.7 3.6 3.3 4.1 3.4
5.7 0.9 5.2 8.2 2.3
1.5 -1.8 8.1 6.0 0.2
-0.6 4.9 3.6 2.1 3.3
-4.0 0.9 6.5 7.2 4.4
-3.1 -0.4 9.3 5.1 2.1
8.0 24.3 62.3 60.9 19.2
So the grand total is $G = 174.7$, so that the correction factor is $G^2/n = 610.4018$. Next, the treatment sum of squares is

$$S_T = \frac{1}{10}(8.0^2 + 24.3^2 + 62.3^2 + 60.9^2 + 19.2^2) - 610.4018 = 250.9212.$$

Similarly, the total sum of squares is

$$S_G = 1179.03 - 610.4018 = 568.6282,$$

so that the residual sum of squares is

$$S_E = 568.6282 - 250.9212 = 317.7070.$$

Hence, the ANOVA table is given below.

Source    SS        df  MS     F
Machines  250.9212  4   62.73  8.885
Residual  317.7070  45  7.06
Total     568.6282  49

The p-value is $P = P(F_{4,45} > 8.885) < 0.001$. Thus, there is overwhelming evidence against $H_0$ that there are no differences between the machines.
We may be interested in the following comparisons:

- comparison of serviced machines: $L_1 = \tau_1 - \tau_2$,
- comparison of unserviced machines: $L_2 = \tau_3 - \tau_4$,
- comparison of serviced machines with the new one: $L_3 = \tau_1 + \tau_2 - 2\tau_5$,
- comparison of unserviced ones with all others: $L_4 = 2\tau_1 + 2\tau_2 - 3\tau_3 - 3\tau_4 + 2\tau_5$.
It is easily verified that all of these contrasts are orthogonal. For each $k = 1, \ldots, 4$ we can now calculate the contrast sum of squares $S_k$ corresponding to the contrast $L_k$ and the value of $F_k = S_k/M_E$ in Theorem 2.1 which is used for testing the null hypothesis $H_0: L_k = 0$.

For $L_1$ we have

$$S_1 = \frac{(0.8 - 2.43)^2}{0.1 + 0.1} = 13.2845,$$

so that $F_1 = 13.2845/7.06 = 1.88$. So the p-value is $P = P(F_{1,45} > 1.88) > 0.1$. Similarly, for $L_2$ we have

$$S_2 = \frac{(6.23 - 6.09)^2}{0.1 + 0.1} = 0.0980,$$

so that $F_2 = 0.0980/7.06 = 0.014$. Obviously, $P > 0.1$. Next, for $L_3$ we have

$$S_3 = \frac{(0.8 + 2.43 - 2 \times 1.92)^2}{0.1 + 0.1 + 0.4} = 0.6202,$$

so that $F_3 = 0.6202/7.06 = 0.088$. Again, it is clear that $P > 0.1$. Finally, for $L_4$ we have

$$S_4 = \frac{\{2(0.8 + 2.43 + 1.92) - 3(6.23 + 6.09)\}^2}{3 \times 0.4 + 2 \times 0.9} = 236.9185,$$

so that $F_4 = 236.9185/7.06 = 33.56$. This time, $P < 0.001$. Hence, the result of the overall F test is due to the difference between the new and serviced machines and the unserviced ones. Note that the sum of the above contrast sums of squares is

$$13.2845 + 0.0980 + 0.6202 + 236.9185 = 250.9212,$$

which is the treatment sum of squares.
Finally, we present an augmented ANOVA table from GenStat which in addition to the overall F test also includes the tests of the contrasts $L_1, \ldots, L_4$.
Variate: y
Source of variation d.f. s.s. m.s. v.r. F pr.
machine 4 250.921 62.730 8.89 <.001
L1 1 13.285 13.285 1.88 0.177
L2 1 0.098 0.098 0.01 0.907
L3 1 0.620 0.620 0.09 0.768
L4 1 236.919 236.919 33.56 <.001
Residual 45 317.707 7.060
Total 49 568.628
It is not difficult to verify that what is presented in the table agrees with the previous calculations.
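The decomposition in Theorem 2.1 can also be verified numerically. Below is an illustrative Python sketch (not from the notes) using the treatment means of Example 2.6:

```python
import numpy as np

# Treatment means and replications for the five machines (Example 2.6)
means = np.array([0.8, 2.43, 6.23, 6.09, 1.92])
reps = np.full(5, 10)

contrasts = [np.array(c) for c in
             ([1, -1, 0, 0, 0],        # L1: serviced machines
              [0, 0, 1, -1, 0],        # L2: unserviced machines
              [1, 1, 0, 0, -2],        # L3: serviced vs new
              [2, 2, -3, -3, 2])]      # L4: unserviced vs all others

S = [(c @ means)**2 / (c**2 / reps).sum() for c in contrasts]
print([round(s, 4) for s in S])        # 13.2845, 0.098, 0.6202, 236.9185
print(round(sum(S), 4))                # 250.9212 = S_T, as Theorem 2.1 states
```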
2.7 Unplanned comparisons
In general, specific hypotheses about differences between the treatments should be formulated before looking at the data. Frequently, however, following the ANOVA F test, multiple comparisons are performed which were not specified in advance. For example, all pairwise hypotheses of the form $H_0: \tau_k - \tau_l = 0$ might be tested. If there are $t$ treatments, then there are $t(t-1)/2$ possible pairs that could be tested. In this section we briefly look at the most common method for comparing pairs of treatments.
2.7.1 Least significant difference (LSD) method
Suppose all $t$ treatments have the same replication $r$. This method is a two-step procedure which gives a lower bound, called the least significant difference (LSD), for the difference $\bar{y}_{k.} - \bar{y}_{l.}$ of the group means for treatments $k$ and $l$ at which the two treatments can be declared as having different effects. It works as follows:

1. Carry out the ANOVA F test of $H_0: \tau_1 = \ldots = \tau_t$ at the $100\alpha\%$ level of significance.

2. Only if $H_0$ is rejected compute
$$LSD = t_{n-t,\frac{\alpha}{2}} \sqrt{M_E \frac{2}{r}},$$
where $n-t$ are the residual degrees of freedom and $M_E$ is the residual mean square from the ANOVA. For any pair of treatments $k$ and $l$ reject $H_0: \tau_k - \tau_l = 0$ if
$$|\bar{y}_{k.} - \bar{y}_{l.}| > LSD.$$

The first step of the procedure controls the probability of incorrectly identifying at least one pair of treatments as different. Note that $|\bar{y}_{k.} - \bar{y}_{l.}| > LSD$ is true if and only if the t test from Section 2.5 rejects $H_0: \tau_k - \tau_l = 0$, that is if and only if

$$|T| = \frac{|\bar{y}_{k.} - \bar{y}_{l.}|}{\sqrt{M_E\left( \frac{1}{r} + \frac{1}{r} \right)}} > t_{n-t,\alpha/2}.$$

If the treatments have unequal replication, it is just as easy to carry out all pairwise t tests.
Example 2.7 Cereal example revisited.
Suppose that we wish to compare the five machines at the 5% significance level. Then, for the LSD method, we have

$$LSD = t_{45,0.025}\sqrt{M_E \frac{2}{10}} = 2.014\sqrt{7.06 \times \frac{2}{10}} = 2.394.$$

Now, the treatment means are $\bar{y}_{1.} = 0.8$, $\bar{y}_{2.} = 2.43$, $\bar{y}_{3.} = 6.23$, $\bar{y}_{4.} = 6.09$ and $\bar{y}_{5.} = 1.92$. Thus there are no significant differences between machines 1, 2 and 5, or between machines 3 and 4, but there are significant differences between machines from the different clusters (1, 2, 5) and (3, 4).
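The two-step procedure is easy to automate. A minimal Python sketch (illustrative only, using the numbers of Example 2.7):

```python
import numpy as np
from itertools import combinations
from scipy import stats

means = {1: 0.8, 2: 2.43, 3: 6.23, 4: 6.09, 5: 1.92}   # machine means
M_E, df, r = 7.06, 45, 10

# Step 2 (assuming the overall F test of step 1 has rejected H0)
LSD = stats.t.ppf(0.975, df) * np.sqrt(M_E * 2 / r)
print(round(LSD, 3))                                    # 2.394

for k, l in combinations(means, 2):
    if abs(means[k] - means[l]) > LSD:
        print(f"machines {k} and {l} differ significantly")
```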
2.8 Regression approach to ANOVA
In this section we take a second look at the relationship between the one-way ANOVA model and the multiple regression model from Statistical Modelling I. When we considered least squares estimation in Sec. 2.3 we saw that for the one-way ANOVA we can define a matrix $X$ and vectors $y$ and $\beta$ such that the normal equations (2.1) can be written as $X'X\beta = X'y$. However, due to the overparameterization the matrix $X'X$ did not have an inverse and so we could not use the formula $(X'X)^{-1}X'y$ to estimate $\beta$.
We now see how the constraint $\sum_{i=1}^t r_i \tau_i = 0$ can be used to define another matrix $\tilde{X}$ and a corresponding vector $\tilde{\beta}$ such that the vector $\tilde{\beta}$ can be estimated by means of the formula $(\tilde{X}'\tilde{X})^{-1}\tilde{X}'y$. Moreover, the resulting estimates, fitted values etc. agree with what was derived in previous sections.
The following model

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

for $i = 1, \ldots, t$ and $j = 1, \ldots, r_i$, can be written as an equivalent multiple regression model

$$y_{ij} = \beta_0 + \beta_1 x_{1j} + \cdots + \beta_t x_{tj} + \varepsilon_{ij}$$

where $x_{ij}$ is the value of an indicator variable $X_i$ which is equal to

$$x_{ij} = \begin{cases} 1 & \text{if the observational unit has received treatment } i \\ 0 & \text{otherwise.} \end{cases}$$

The $y$s and the $\varepsilon$s are the same as before and the $\beta$s are defined as follows: $\beta_0 = \mu$ and $\beta_i = \tau_i$ for $i = 1, \ldots, t$.
For simplicity, let $i = 1, 2, 3$ and $j = 1, 2$, i.e. all treatments have replication $r_i = 2$. Then under treatment 3, we have $x_{1j} = 0$, $x_{2j} = 0$ and $x_{3j} = 1$, so that

$$y_{3j} = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \beta_3 x_{3j} + \varepsilon_{3j} = \beta_0 + \beta_3 + \varepsilon_{3j} = \mu + \tau_3 + \varepsilon_{3j}$$

which illustrates the equivalence of the models.
The following table shows more clearly that the above multiple regression model and the one-way ANOVA model are indeed equivalent, since if for each observational unit we write down the values of the explanatory variables and substitute those in the multiple regression equation, then we get exactly the one-way ANOVA equation. Thus the multiple regression model changes only the representation, but not the meaning, of the one-way ANOVA model.

Treatment applied to a unit | $x_{1j}$ | $x_{2j}$ | $x_{3j}$ | response from multiple regression model
1 | 1 | 0 | 0 | $y_{1j} = \beta_0 + \beta_1 + \varepsilon_{1j} = \mu + \tau_1 + \varepsilon_{1j}$
2 | 0 | 1 | 0 | $y_{2j} = \beta_0 + \beta_2 + \varepsilon_{2j} = \mu + \tau_2 + \varepsilon_{2j}$
3 | 0 | 0 | 1 | $y_{3j} = \beta_0 + \beta_3 + \varepsilon_{3j} = \mu + \tau_3 + \varepsilon_{3j}$
Thus we have a multiple regression model with explanatory variables $X_1$, $X_2$ and $X_3$. In matrix notation this reads

$$y = X\beta + \varepsilon,$$

where

$$y = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \\ y_{31} \\ y_{32} \end{pmatrix}, \quad X = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} \quad \text{and} \quad \varepsilon = \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{pmatrix}.$$
The first column of $X$ is the sum of the remaining three. This is known as aliasing, since the effect of the intercept cannot be distinguished from the combined effect of the explanatory variables $X_1$, $X_2$ and $X_3$. It is called intrinsic aliasing because the aliasing arises as a consequence of the overparameterization.

As in Sec. 2.3, because of the aliasing $X'X$ has no inverse and so $\beta$ cannot be estimated by means of the formula $(X'X)^{-1}X'y$. In that section the constraint $\sum_{i=1}^t r_i \tau_i = 0$ was used to get a set of unique least squares estimates. We can use this to redefine $X$ and $\beta$ in such a way that $X'X$ does have an inverse. First, the constraint $\sum_{i=1}^t r_i \tau_i = 0$ implies that

$$\tau_t = -\frac{r_1}{r_t}\tau_1 - \frac{r_2}{r_t}\tau_2 - \ldots - \frac{r_{t-1}}{r_t}\tau_{t-1}. \qquad (2.2)$$

That is, the last treatment parameter $\tau_t$ can be expressed as a combination of the other parameters $\tau_1, \ldots, \tau_{t-1}$.
Expressing $\tau_t$ as a combination of $\tau_1, \ldots, \tau_{t-1}$ motivates another multiple regression model, which is also equivalent to the ANOVA model. The model equation for this is

$$y_{ij} = \beta_0 + \beta_1 \tilde{x}_{1j} + \cdots + \beta_{t-1}\tilde{x}_{(t-1)j} + \varepsilon_{ij}$$

where $\beta_0 = \mu$ and $\beta_i = \tau_i$ for $i = 1, \ldots, t-1$ as before. Further, $\tilde{x}_{ij}$ is the value of a dummy variable $\tilde{X}_i$ equal to

$$\tilde{x}_{ij} = \begin{cases} 1 & \text{if the unit has received treatment } i \\ -\frac{r_i}{r_t} & \text{if the unit has received treatment } t \\ 0 & \text{otherwise.} \end{cases}$$
For the above example the constraint $\sum_{i=1}^t r_i \tau_i = 0$ then implies $\tau_3 = -\tau_1 - \tau_2$ since all three treatments have the same replication. Then the model equation is

$$y_{ij} = \beta_0 + \beta_1 \tilde{x}_{1j} + \beta_2 \tilde{x}_{2j} + \varepsilon_{ij}$$

where $\beta_0 = \mu$ and $\beta_i = \tau_i$ for $i = 1, 2$ and

$$\tilde{x}_{ij} = \begin{cases} 1 & \text{if the unit has received treatment } i \\ -1 & \text{if the unit has received treatment } 3 \\ 0 & \text{otherwise.} \end{cases}$$
In matrix notation this model is

$$y = \tilde{X}\tilde{\beta} + \varepsilon$$

where

$$\tilde{X} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \end{pmatrix} \quad \text{and} \quad \tilde{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}.$$
By setting up a table which is similar to the one for the previous multiple regression model it is easy to see that this new regression model is again equivalent to the one-way ANOVA model:

Treatment | $\tilde{x}_{1j}$ | $\tilde{x}_{2j}$ | response from second multiple regression model
1 | 1 | 0 | $y_{1j} = \beta_0 + \beta_1 + \varepsilon_{1j} = \mu + \tau_1 + \varepsilon_{1j}$
2 | 0 | 1 | $y_{2j} = \beta_0 + \beta_2 + \varepsilon_{2j} = \mu + \tau_2 + \varepsilon_{2j}$
3 | -1 | -1 | $y_{3j} = \beta_0 - \beta_1 - \beta_2 + \varepsilon_{3j} = \mu - \tau_1 - \tau_2 + \varepsilon_{3j} = \mu + \tau_3 + \varepsilon_{3j}$
Note that $\tilde{X}$ has one column less than $X$ and that $\tilde{\beta}$ contains one parameter less than $\beta$. However, the matrix $\tilde{X}'\tilde{X}$ possesses an inverse and so $\tilde{\beta}$ can be estimated as in Statistical Modelling I using the formula $\hat{\tilde{\beta}} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'y$. This gives estimates $\hat{\mu}$, $\hat{\tau}_1$ and $\hat{\tau}_2$ from which $\hat{\tau}_3 = -\hat{\tau}_1 - \hat{\tau}_2$ can be obtained.

This regression approach generalizes to any number of treatments $t$ and to situations where the treatments have unequal replication, which is now illustrated for the petrol example. Note that in general to find $\hat{\tau}_t$ equation (2.2) has to be used, with $\tau_i$ replaced by $\hat{\tau}_i$.
Example 2.8 Petrol example revisited.
Recall that we wish to compare three types of petrol. Further, we have $r_1 = 4$, $r_2 = 5$ and $r_3 = 3$. Using equation (2.2) the design matrix $\tilde{X}$ is

$$\tilde{X} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -\frac{4}{3} & -\frac{5}{3} \\ 1 & -\frac{4}{3} & -\frac{5}{3} \\ 1 & -\frac{4}{3} & -\frac{5}{3} \end{pmatrix}.$$

Now,

$$\tilde{X}'\tilde{X} = \frac{1}{3}\begin{pmatrix} 36 & 0 & 0 \\ 0 & 28 & 20 \\ 0 & 20 & 40 \end{pmatrix}, \quad (\tilde{X}'\tilde{X})^{-1} = \frac{1}{60}\begin{pmatrix} 5 & 0 & 0 \\ 0 & 10 & -5 \\ 0 & -5 & 7 \end{pmatrix} \quad \text{and} \quad \tilde{X}'y = \begin{pmatrix} 303.6 \\ 2.8 \\ 12.8 \end{pmatrix}$$

and so

$$\hat{\tilde{\beta}} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'y = \begin{pmatrix} 25.30 \\ -0.60 \\ 1.26 \end{pmatrix}.$$

Thus the estimates of the parameters are $\hat{\mu} = 25.3$, $\hat{\tau}_1 = -0.6$ and $\hat{\tau}_2 = 1.26$. From these we obtain

$$\hat{\tau}_3 = -\frac{r_1}{r_3}\hat{\tau}_1 - \frac{r_2}{r_3}\hat{\tau}_2 = \frac{4}{3} \times 0.6 - \frac{5}{3} \times 1.26 = 0.8 - 2.1 = -1.3,$$

which are exactly of the form given in Sec. 2.3: $\hat{\mu} = \bar{y}_{..}$ and $\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..}$ for $i = 1, 2, 3$. Hence, the fitted values are

$$\hat{y}_{1.} = \hat{\mu} + \hat{\tau}_1 = 25.3 - 0.6 = 24.7,$$
$$\hat{y}_{2.} = \hat{\mu} + \hat{\tau}_2 = 25.3 + 1.26 = 26.56$$

and

$$\hat{y}_{3.} = \hat{\mu} + \hat{\tau}_3 = 25.3 - 1.3 = 24.0,$$

as before.
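These matrix calculations can be reproduced numerically. The following is an illustrative numpy sketch (not part of the notes) which builds $\tilde{X}$ for the petrol data and solves the normal equations directly:

```python
import numpy as np

r = [4, 5, 3]
y = np.array([24.0, 25.0, 24.3, 25.5,
              25.3, 26.5, 26.4, 27.0, 27.6,
              23.3, 24.0, 24.7])

# Build X-tilde row by row: treatment i < t contributes (1, indicator),
# the last treatment contributes (1, -r_1/r_t, ..., -r_{t-1}/r_t)
rows = []
for i, ri in enumerate(r):
    for _ in range(ri):
        if i < len(r) - 1:
            x = [1.0 if k == i else 0.0 for k in range(len(r) - 1)]
        else:
            x = [-rk / r[-1] for rk in r[:-1]]
        rows.append([1.0] + x)
Xt = np.array(rows)

beta = np.linalg.solve(Xt.T @ Xt, Xt.T @ y)
print(beta)                        # [25.3, -0.6, 1.26]
tau_t = -(np.array(r[:-1]) @ beta[1:]) / r[-1]
print(tau_t)                       # -1.3, from equation (2.2)
```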
Further, for the model $y = \tilde{X}\tilde{\beta} + \varepsilon$ it can be verified that the ANOVA table for testing the overall significance of the regression in the Statistical Modelling I notes is exactly the same as the ANOVA table in Example 2.1.

GenStat uses the indicator variables $X_i$ to perform the calculations for ANOVA, but uses a different constraint. It fits

$$y_{ij} = \beta_0 + \beta_2 x_{2j} + \cdots + \beta_t x_{tj} + \varepsilon_{ij}.$$

Note that there is no $X_1$ in this model formula. This is equivalent to using the constraint $\tau_1 = 0$ in our ANOVA model.

Under this constraint, the parameter $\mu$ is interpreted as the mean response from treatment 1 and, for $i = 2, \ldots, t$, $\tau_i$ is the difference in mean response between treatment $i$ and treatment 1. This is sometimes called the baseline parameterization and is especially useful if one treatment is a standard, or control, treatment. Clearly, the interpretation of individual parameters changes, although the overall model does not. The ANOVA, the F test and all tests comparing contrasts will be exactly the same.
2.9 Random effects model
So far we have assumed that the parameters $\tau_1, \ldots, \tau_t$ for the treatment effects are constants. In this case, the treatments are said to have fixed effects. This is appropriate when the particular treatments investigated are the focus of interest; for example when they are different drugs, or varieties of wheat etc. we wish to compare.

Another common situation occurs when experimenters are interested in a factor with a large number of possible levels. The main interest is then not in comparing particular levels of the factor, but in finding out if there is variability due to that factor. Since an experiment cannot be performed with all the factor levels, a random sample of $t$ of them is selected. We say that such a factor is random. We make inferences about the entire population of factor levels, not the individual levels that happen to be selected. This section considers the one-way analysis of variance with a single random factor. We assume that the population of factor levels is infinite or at least very large.
Example 2.9 Ratings.
A company wants to look at its personnel officers' ratings of new recruits. Four personnel officers are chosen at random from the population of personnel officers and each officer is randomly assigned three new recruits to assess. Interest is not in these four particular personnel officers, but the whole population of personnel officers. Here, the factor personnel officer is random.
The model equation for the one-way ANOVA random effects model is

$$y_{ij} = \mu + a_i + \varepsilon_{ij}$$

for $i = 1, 2, \ldots, t$ and $j = 1, 2, \ldots, r_i$, where $y_{ij}$ is the response and $\varepsilon_{ij}$ is random error. As before it is assumed that $\varepsilon_{ij} \sim N(0, \sigma^2)$ for every $i$ and $j$, and that all the $\varepsilon_{ij}$ are independent. The parameter $\mu$ represents the overall mean. The effect of level $i$ of the random factor is denoted by $a_i$. This change in the notation for the treatment effects highlights that each $a_i$ is now regarded as a random variable $a_i \sim N(0, \sigma_a^2)$, whereas the $\tau_i$ in previous sections were constants. It is assumed that $a_1, \ldots, a_t$ are independent and also that $a_1, \ldots, a_t$ and the random errors $\varepsilon_{ij}$ are independent.
As a consequence of these assumptions we have $E(y_{ij}) = \mu$ and $Var(y_{ij}) = \sigma_a^2 + \sigma^2$ for every $i$ and every $j$, where the quantities $\sigma_a^2$ and $\sigma^2$ are called variance components. Note how these expressions for the expected value and the variance are different from the corresponding ones in Section 2.1. Moreover, although all the random errors $\varepsilon_{ij}$ are independent, the $y_{ij}$ are not independent. This is another difference from the model with fixed treatment effects. More specifically, since different responses $y_{ij}$ and $y_{ij'}$ for treatment $i$ depend on $a_i$, they will be correlated, so that for $j \neq j'$

$$cov(y_{ij}, y_{ij'}) = E([y_{ij} - E(y_{ij})][y_{ij'} - E(y_{ij'})]) = E([a_i + \varepsilon_{ij}][a_i + \varepsilon_{ij'}])$$
$$= E(a_i^2) + E(a_i\varepsilon_{ij'}) + E(a_i\varepsilon_{ij}) + E(\varepsilon_{ij}\varepsilon_{ij'})$$
$$= E(a_i^2) + E(a_i)E(\varepsilon_{ij'}) + E(a_i)E(\varepsilon_{ij}) + E(\varepsilon_{ij})E(\varepsilon_{ij'}) = E(a_i^2) = \sigma_a^2.$$

Note that in this calculation we have used the independence assumptions of the model, and also the fact that by assumption $E(a_i) = E(\varepsilon_{ij}) = E(\varepsilon_{ij'}) = 0$. You should also note that since $E(a_i) = 0$ it follows that $Var(a_i) = E(a_i^2)$.

On the other hand, responses $y_{ij}$ and $y_{i'j'}$ for different treatments $i$ and $i'$ are not correlated, that is $cov(y_{ij}, y_{i'j'}) = 0$ for all $j$ and $j'$.
We are interested in testing the null hypothesis $H_0: \sigma_a^2 = 0$ against $H_1: \sigma_a^2 > 0$. The null hypothesis says that the random treatment effects are all the same, whereas in view of $Var(y_{ij}) = \sigma_a^2 + \sigma^2$ the alternative $H_1$ states that there is variability in the responses due to the factor.

Now, it is perhaps surprising that this test can be done by using exactly the same ANOVA table as for the one-way ANOVA model with fixed treatment effects in Section 2.2. More specifically, to test $H_0: \sigma_a^2 = 0$ against $H_1: \sigma_a^2 > 0$ we can use the ANOVA F test in exactly the same way as before. It is important to note, however, that although the calculations and the ANOVA table will be identical, the interpretation is different.

The logic for using the test statistic $F = M_T/M_E$ in order to test $H_0: \sigma_a^2 = 0$ against $H_1: \sigma_a^2 > 0$ is the same as the one described in Section 2.2, although the details do slightly differ. In short, $M_E$ is always an unbiased estimator of $\sigma^2$ and $M_T$ is an unbiased estimator of $\sigma^2$ if $H_0: \sigma_a^2 = 0$ is true. Thus if $H_0$ is true, then $F$ will be close to one. Hence $H_0$ is rejected if $F$ is large. To show how the expected mean square $E(M_T) = E(S_T/(t-1))$ for treatments depends on $\sigma_a^2$ we first calculate $E(S_T)$. Now

$$E(S_T) = E\left( \sum_{i=1}^t \frac{T_i^2}{r_i} - \frac{G^2}{n} \right) = \sum_{i=1}^t \frac{E(T_i^2)}{r_i} - \frac{E(G^2)}{n}$$

and

$$E(T_i^2) = Var(T_i) + [E(T_i)]^2 = Var\left( \sum_{j=1}^{r_i} y_{ij} \right) + \left[ E\left( \sum_{j=1}^{r_i} y_{ij} \right) \right]^2$$
$$= \sum_{j=1}^{r_i} Var(y_{ij}) + 2\sum_{j=1}^{r_i - 1}\sum_{j'=j+1}^{r_i} cov(y_{ij}, y_{ij'}) + \left[ \sum_{j=1}^{r_i} E(y_{ij}) \right]^2$$
$$= r_i(\sigma_a^2 + \sigma^2) + 2\binom{r_i}{2}\sigma_a^2 + [r_i\mu]^2.$$

Similarly,

$$E(G^2) = Var(G) + [E(G)]^2 = Var\left( \sum_{i=1}^t T_i \right) + \left[ E\left( \sum_{i=1}^t T_i \right) \right]^2 = \sum_{i=1}^t Var(T_i) + \left[ \sum_{i=1}^t E(T_i) \right]^2$$
$$= \sum_{i=1}^t \left( r_i(\sigma_a^2 + \sigma^2) + 2\binom{r_i}{2}\sigma_a^2 \right) + \left[ \sum_{i=1}^t r_i\mu \right]^2 = n(\sigma_a^2 + \sigma^2) + \sum_{i=1}^t 2\binom{r_i}{2}\sigma_a^2 + [n\mu]^2.$$

Hence, it follows that

$$E(S_T) = \sum_{i=1}^t \frac{E(T_i^2)}{r_i} - \frac{E(G^2)}{n}$$
$$= \sum_{i=1}^t \frac{r_i(\sigma_a^2 + \sigma^2) + 2\binom{r_i}{2}\sigma_a^2 + [r_i\mu]^2}{r_i} - \frac{n(\sigma_a^2 + \sigma^2) + \sum_{i=1}^t 2\binom{r_i}{2}\sigma_a^2 + [n\mu]^2}{n}$$
$$= t(\sigma_a^2 + \sigma^2) + \sum_{i=1}^t (r_i - 1)\sigma_a^2 + n\mu^2 - (\sigma_a^2 + \sigma^2) - \frac{\sigma_a^2}{n}\sum_{i=1}^t r_i(r_i - 1) - n\mu^2$$
$$= (t-1)(\sigma_a^2 + \sigma^2) + (n-t)\sigma_a^2 - \frac{\sigma_a^2}{n}\sum_{i=1}^t r_i(r_i - 1)$$
$$= (t-1)\sigma^2 + \left( n - 1 - \frac{1}{n}\sum_{i=1}^t r_i^2 + 1 \right)\sigma_a^2 = (t-1)\sigma^2 + \frac{n^2 - \sum_{i=1}^t r_i^2}{n}\sigma_a^2$$

and so

$$E(M_T) = \frac{E(S_T)}{t-1} = \sigma^2 + \frac{n^2 - \sum_{i=1}^t r_i^2}{n(t-1)}\sigma_a^2,$$

which shows that $E(M_T) = \sigma^2$ if and only if $H_0: \sigma_a^2 = 0$ is true.
As usual, the expected residual mean square is $E(M_E) = \sigma^2$. So, if we want to estimate $\sigma_a^2$, a natural unbiased estimator would be

$$\hat{\sigma}_a^2 = \frac{n(t-1)}{n^2 - \sum_{i=1}^t r_i^2}(M_T - M_E).$$

Note that this could give a negative estimate of $\sigma_a^2$, in which case we take the estimate to be zero. Also note that if each treatment is replicated $r$ times so that $r_i = r$ for every $i$, then the expressions for $E(M_T)$ and $\hat{\sigma}_a^2$ simplify to $E(M_T) = \sigma^2 + r\sigma_a^2$ and

$$\hat{\sigma}_a^2 = \frac{M_T - M_E}{r}$$

respectively.
Example 2.10 Ratings example revisited.
Each of the randomly selected four personnel officers rates three new employees. The data are given below.

Personnel officer
1   2   3   4
55  80  85  65
69  96  90  91
57  95  89  72

The treatment totals are $T_1 = 181$, $T_2 = 271$, $T_3 = 264$ and $T_4 = 228$ respectively and so the grand total is $G = 944$. Moreover, $\sum_{i=1}^4 \sum_{j=1}^3 y_{ij}^2 = 76612$. By carrying out the ANOVA as described in Section 2.2 it is then easily verified that the ANOVA table is as follows:

Source             SS       df  MS      F
Personnel officer  1699.33  3   566.44  6.96
Residual           651.33   8   81.42
Total              2350.67  11

If $H_0: \sigma_a^2 = 0$ is true, then the F statistic has an $F_{3,8}$ distribution and to test $H_0$ at the 5% level of significance we compare $F = 6.96$ with $F_{3,8,0.05} = 4.066$ from Table 12(b) of the New Cambridge Statistical Tables. Since $F > F_{3,8,0.05}$ we reject $H_0$ and conclude that there is variability due to different personnel officers.

The estimates of the variance components are $\hat{\sigma}^2 = 81.42$ and $\hat{\sigma}_a^2 = (566.44 - 81.42)/3 = 161.67$, which shows that the variation between personnel officers is large compared to the variation within an officer.
Sometimes there are good reasons to believe that there is certain to be variability in the responses due to a random factor. In that case the test of $H_0: \sigma_a^2 = 0$ will not be very informative since it can only confirm what is already known. In such cases, estimation and confidence intervals are more useful. Then it is usually easier to interpret the ratio $\sigma_a^2/\sigma^2$ of the variance components or the proportion of the total variance accounted for by the factor, that is

$$\frac{\sigma_a^2}{\sigma_a^2 + \sigma^2},$$

which is also known as the intraclass correlation. In the example above, $\hat{\sigma}_a^2 = 161.67$ is only interpretable relative to $\hat{\sigma}^2 = 81.42$.
Next we present confidence intervals for both of these quantities. Simple formulas are, however, only available if there is equal replication. We therefore restrict ourselves to the case where $r_i = r$ for all $i = 1, \ldots, t$.

If all treatments have the same replication $r$ then it can be shown that

$$\frac{M_T/(\sigma^2 + r\sigma_a^2)}{M_E/\sigma^2} \sim F_{t-1,n-t}.$$

So we may write

$$P\left( F_{t-1,n-t,1-\frac{\alpha}{2}} \leq \frac{M_T/(\sigma^2 + r\sigma_a^2)}{M_E/\sigma^2} \leq F_{t-1,n-t,\frac{\alpha}{2}} \right) = 1 - \alpha,$$

which may be rearranged to yield

$$P\left( L \leq \frac{\sigma_a^2}{\sigma^2} \leq U \right) = 1 - \alpha,$$

where

$$L = \frac{1}{r}\left( \frac{M_T}{M_E}\frac{1}{F_{t-1,n-t,\frac{\alpha}{2}}} - 1 \right) \quad \text{and} \quad U = \frac{1}{r}\left( \frac{M_T}{M_E}\frac{1}{F_{t-1,n-t,1-\frac{\alpha}{2}}} - 1 \right).$$

Thus, $(L, U)$ is a $100(1-\alpha)\%$ confidence interval for $\sigma_a^2/\sigma^2$. Similarly, a $100(1-\alpha)\%$ confidence interval for $\sigma_a^2/(\sigma_a^2 + \sigma^2)$ is given by

$$\left( \frac{L}{L+1}, \frac{U}{U+1} \right).$$
Example 2.11 Ratings example revisited.
Recall that $t = 4$, $r = 3$, $M_T = 566.44$ and $M_E = 81.42$. From tables, we have $F_{3,8,0.025} = 5.416$ and $F_{3,8,0.975} = 1/F_{8,3,0.025} = 1/14.54 = 0.0688$. Thus, we have

$$L = \frac{1}{3}\left( \frac{566.44}{81.42 \times 5.416} - 1 \right) = 0.095 \quad \text{and} \quad U = \frac{1}{3}\left( \frac{566.44}{81.42 \times 0.0688} - 1 \right) = 33.373,$$

so that a 95% confidence interval for $\sigma_a^2/\sigma^2$ is $(0.095, 33.373)$. Hence, a 95% confidence interval for $\sigma_a^2/(\sigma_a^2 + \sigma^2)$ is $(0.087, 0.971)$. Note that, since the sample size is small, we have wide confidence limits.
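The variance component estimates and both confidence intervals are quickly checked in software. An illustrative Python sketch using the numbers of the ratings example (the per cent points come from scipy rather than tables):

```python
from scipy import stats

t_, r = 4, 3
n = t_ * r
M_T, M_E = 566.44, 81.42

sigma2_hat = M_E
sigma2_a_hat = max((M_T - M_E) / r, 0.0)     # truncate at zero if negative
print(sigma2_a_hat)                          # 161.67

# 95% confidence interval for sigma_a^2 / sigma^2
Fu = stats.f.ppf(0.975, t_ - 1, n - t_)      # F_{3,8,0.025} = 5.416
Fl = stats.f.ppf(0.025, t_ - 1, n - t_)      # F_{3,8,0.975} = 0.0688
L = ((M_T / M_E) / Fu - 1) / r
U = ((M_T / M_E) / Fl - 1) / r
print(L, U)                                  # about (0.095, 33.37)
print(L / (L + 1), U / (U + 1))              # CI for the intraclass correlation
```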
3 Randomized Block Design
3.1 Introduction
The one-way ANOVA model for a completely randomized design assumes that differences in the responses can be attributed to the treatments. Any additional variation in the responses is regarded as being due to random error and is assumed to be roughly the same for each group of observations.

Frequently, the units for which responses are collected can be divided into groups which share some characteristic. Examples are adjacent plots of land in a field, animals from the same litter, measurements taken by the same person and so forth. By grouping similar units in blocks the variation due to different litters, persons etc. can be separated from the variation which is due to the treatments. Here, we consider situations where the grouping into blocks is achieved by means of a qualitative variable which is consequently called a block factor.

We will see that taking into account the variation between blocks gives a more sensitive test for detecting differences between the treatments. More specifically, this chapter considers the analysis of variance of the randomized block design, which is a generalization of the matched pairs t test from Introduction to Statistics. The matched pairs t test gives a better test of differences than the two-sample t test because it takes account of the variation between, for example, people by using blocks of matched or similar units. Here, we extend these ideas to larger blocks.
Suppose that we have $b$ blocks, each of size $t$, so that the total number of observations is $n = tb$. We want to compare the $t$ treatments. One unit in each block is allocated to each treatment at random. So the data can be represented in the form given below.

                         Block
                 1       2       ...  b       Total
Treatment  1     y_11    y_12    ...  y_1b    T_1
           2     y_21    y_22    ...  y_2b    T_2
           ...   ...     ...          ...     ...
           t     y_t1    y_t2    ...  y_tb    T_t
Total            B_1     B_2     ...  B_b     G

Note that each treatment appears exactly once in each block. Hence each treatment $i$ has replication $r_i = b$. The response to treatment $i$ in block $j$ is denoted by $y_{ij}$, which is essentially the same notation as in Chapter 2. The totals $T_1, \ldots, T_t$ in the margins are the treatment totals which are calculated in exactly the same way as before and the block totals $B_1, \ldots, B_b$ are merely the column sums. More formally, we have $B_j = \sum_{i=1}^t y_{ij}$ for $j = 1, \ldots, b$. Of course both the treatment totals and the block totals add up to the grand total $G$.
3.2 Analysis of variance
The model for the above randomized block design is
\[ y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij} \]
for $i = 1, 2, \dots, t$ and $j = 1, 2, \dots, b$, where $y_{ij}$ is the response to treatment $i$ in the $j$th
block, $\mu$ is the overall mean, $\tau_i$ is the effect of the $i$th treatment, $\beta_j$ is the effect of the $j$th
block and $\varepsilon_{ij}$ is random error. It is assumed that $\varepsilon_{ij} \sim N(0, \sigma^2)$, all independent. Note that
the above is a fixed effects model, that is $\mu$, $\tau_1, \dots, \tau_t$ and $\beta_1, \dots, \beta_b$ are all constants.

The main interest is in testing if there are differences between the treatments. The null
hypothesis is $H_0: \tau_1 = \dots = \tau_t$ and the alternative is $H_1$: at least two of the $\tau$s are
different. In addition it is possible to test $H'_0: \beta_1 = \dots = \beta_b$ against $H'_1$: at least two of the
$\beta$s are different. Usually, this second test is of lesser interest, because assumed differences
between blocks are the reason for using a randomized block design. We can carry out both
tests by finding the ANOVA table.
The calculations for carrying out the analysis of variance are similar to those described in
Section 2.2. The total sum of squares is
\[ S_G = \sum_{i=1}^t \sum_{j=1}^b y_{ij}^2 - \frac{G^2}{n}. \]
Next, the treatment sum of squares is
\[ S_T = \sum_{i=1}^t \frac{T_i^2}{b} - \frac{G^2}{n}. \]
Note that the formulas for $S_G$ and $S_T$ are exactly the same as before, we only need to bear
in mind that each treatment has replication $r_i = b$.

In addition to $S_G$ and $S_T$ there is now a sum of squares for blocks which is called the block
sum of squares. It is defined by
\[ S_B = \sum_{j=1}^b \frac{B_j^2}{t} - \frac{G^2}{n}. \]
Note that $S_B$ is defined in a way similar to $S_T$. All the sums of squares are non-negative.
Given the above sums of squares the residual sum of squares is now
\[ S_E = S_G - S_T - S_B. \]
Thus as for the one-way ANOVA we have a decomposition of the total sum of squares which
is now given by
\[ S_G = S_T + S_B + S_E. \]
Note that if we ignored the blocking and carried out a one-way ANOVA then $S_G$ and $S_T$
would be the same as for the randomized block design, but the residual sum of squares $S_E$
would be larger. In other words, the ANOVA for the randomized block design eliminates
the variation due to blocks from the residual sum of squares.
The ANOVA table for a randomized block design is as follows.
Source       SS     df           MS                          F
Blocks       S_B    b - 1        M_B = S_B / (b-1)           F_B = M_B / M_E
Treatments   S_T    t - 1        M_T = S_T / (t-1)           F_T = M_T / M_E
Residual     S_E    (t-1)(b-1)   M_E = S_E / ((t-1)(b-1))
Total        S_G    n - 1
We wish to test $H_0: \tau_1 = \dots = \tau_t$ that there are no differences in the response due to
treatments. If $H_0$ is true, then
\[ F_T = \frac{M_T}{M_E} \sim F_{t-1,\,(t-1)(b-1)}. \]
Hence $H_0$ is rejected at the $100\alpha\%$ level of significance if $F_T > F_{t-1,\,(t-1)(b-1),\,\alpha}$.

We can also test $H'_0: \beta_1 = \dots = \beta_b$ that there are no differences due to blocks, that is, there
is no point in blocking. If $H'_0$ is true, then
\[ F_B = \frac{M_B}{M_E} \sim F_{b-1,\,(t-1)(b-1)} \]
and $H'_0$ is rejected at the $100\alpha\%$ level of significance if $F_B > F_{b-1,\,(t-1)(b-1),\,\alpha}$. Note that, as
in the one-way ANOVA, $M_E$ is our estimate of $\sigma^2$.
Example 3.1 Litters.
Suppose that we wish to compare the weight gain in pigs under five diets. A randomized
block design was used in which there were four litters and each of the five pigs in each litter
was randomly assigned to a different diet. The data together with the treatment and block
totals are given below.

                Litter
           1    2    3    4   Total
       1   6    4   10    4    24
       2   8    3   11    2    24
Diet   3   9    5    9    5    28
       4   5    2    8    1    16
       5   8    3    7    4    22
Total     36   17   45   16   114
The total sum of squares is
\[ S_G = 6^2 + 8^2 + \dots + 1^2 + 4^2 - \frac{114^2}{20} = 810 - \frac{114^2}{20} = 160.2. \]
Next, the treatment sum of squares is
\[ S_T = \frac{24^2}{4} + \frac{24^2}{4} + \frac{28^2}{4} + \frac{16^2}{4} + \frac{22^2}{4} - \frac{114^2}{20}
= \frac{2676}{4} - \frac{114^2}{20} = 19.2. \]
Similarly, the block sum of squares is
\[ S_B = \frac{36^2}{5} + \frac{17^2}{5} + \frac{45^2}{5} + \frac{16^2}{5} - \frac{114^2}{20}
= \frac{3866}{5} - \frac{114^2}{20} = 123.4. \]
So the residual sum of squares is
\[ S_E = 160.2 - 19.2 - 123.4 = 17.6. \]
Hence, the ANOVA table is given below.

Source      SS      df   MS      F
Litters     123.4    3   41.13   28.04
Diets        19.2    4    4.80    3.27
Residual     17.6   12    1.467
Total       160.2   19
In order to test for differences between the treatments at the 5% level of significance the
value of $F_T = 3.27$ is compared with $F_{4,12,0.05} = 3.259$ which can be looked up in Table 12(b)
of the New Cambridge Statistical Tables. Since $F_T$ is greater than $F_{4,12,0.05}$ we reject the
null hypothesis of no differences at the 5% level. Similarly, since $F_B = 28.04$ is greater than
$F_{3,12,0.05} = 3.490$ we also reject $H'_0$ of no effects due to litters at the 5% level of significance.

Note that, if the blocking was ignored, the ANOVA table would be as given below.

Source      SS      df   MS    F
Diets        19.2    4   4.8   0.51
Residual    141.0   15   9.4
Total       160.2   19

Now $F = 0.51$ is smaller than $F_{4,15,0.05} = 3.056$ and so the F test in the one-way ANOVA
would not reject the null hypothesis of no differences at the 5% level of significance. So the
conclusion would have been different! This illustrates the importance of blocking.
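The whole table can be reproduced directly from the sums-of-squares formulas. The following
is a short numpy sketch (Python is assumed here for illustration; the practicals use GenStat):

    import numpy as np

    # rows = diets (treatments), columns = litters (blocks)
    y = np.array([[6, 4, 10, 4],
                  [8, 3, 11, 2],
                  [9, 5,  9, 5],
                  [5, 2,  8, 1],
                  [8, 3,  7, 4]], dtype=float)
    t, b = y.shape
    n, G = y.size, y.sum()

    SG = (y ** 2).sum() - G ** 2 / n                   # 160.2
    ST = (y.sum(axis=1) ** 2).sum() / b - G ** 2 / n   # 19.2
    SB = (y.sum(axis=0) ** 2).sum() / t - G ** 2 / n   # 123.4
    SE = SG - ST - SB                                  # 17.6

    MT, MB = ST / (t - 1), SB / (b - 1)
    ME = SE / ((t - 1) * (b - 1))
    print(MT / ME, MB / ME)                            # F_T = 3.27 and F_B = 28.0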
Example 3.1 illustrates that if the blocking is effective, that is if there is variation due to the
blocks, then the test of $H_0: \tau_1 = \dots = \tau_t$ for differences between the treatments using $F_T$
will be more sensitive than the F test from the one-way ANOVA. Of course, things are not
always as extreme as in the example where substantially different conclusions were reached.
In general, if blocking is effective, $M_E$ will be smaller, often substantially smaller, in the
randomized block design than in the completely randomized design, since the variation
between blocks has been separated from the residual variance. This will lead to a larger
value of $F_T$ than $F$, as $M_T$ will be the same. However, if blocking is ineffective, the test will
be less sensitive, as $M_E$ will be about the same, but is based on fewer degrees of freedom.
As for the one-way ANOVA model, we can check model assumptions: A normal probability
plot of the residuals can be used to check normality. To check that $\sigma^2$ is constant, we plot
the residuals against the fitted values.

When comparing any pair of treatments, the variance of the difference can be calculated as
in Section 2.5. We only need to observe that here all treatments have the same replication
which is equal to the number of blocks $b$, i.e. $r_k = r_l = b$ for all $k$ and $l$. Thus the standard
error of the difference (SED), i.e. the square root of the variance, is given by
\[ \mathrm{SED} = \sqrt{\frac{2\sigma^2}{b}}. \]
$\sigma^2$ will of course be estimated by $M_E$ from the randomized block design.

Moreover pre-planned comparisons of treatments using contrasts can be carried out as in
Sec. 2.5 and 2.6. For unplanned comparisons we can use the LSD method from Sec. 2.7. For
both pre-planned and unplanned comparisons the formulas in Sec. 2.5, 2.6 and 2.7 apply.
All of these involve the residual mean square $M_E$ which has now to be taken from the
ANOVA table for the randomized block design. Also, all these methods require us to use the
t distribution or the F distribution. Since we use $M_E$ from the randomized block design, the
residual degrees of freedom are equal to $(t-1)(b-1)$. Thus, we need to use the $t_{(t-1)(b-1)}$
or $F_{1,(t-1)(b-1)}$ distribution when a randomized block design is used.
3.3 Least squares estimation
To fit the model
\[ y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij} \]
to the data we use again the method of least squares. The least squares estimates $\hat\mu$, $\hat\tau_1, \dots, \hat\tau_t$
and $\hat\beta_1, \dots, \hat\beta_b$ of the parameters $\mu$, $\tau_1, \dots, \tau_t$ and $\beta_1, \dots, \beta_b$ are obtained by minimizing
\[ S = \sum_{i=1}^t \sum_{j=1}^b (y_{ij} - \mu - \tau_i - \beta_j)^2. \]
To this end $S$ is differentiated with respect to $\mu$, with respect to $\tau_i$ and with respect to $\beta_j$.
Equating the resulting equations to zero then gives the normal equations. Now,
\[ \frac{\partial S}{\partial \mu} = -2 \sum_{i=1}^t \sum_{j=1}^b (y_{ij} - \mu - \tau_i - \beta_j) = 0
\;\Longrightarrow\; n\hat\mu + b \sum_{i=1}^t \hat\tau_i + t \sum_{j=1}^b \hat\beta_j = G, \]
\[ \frac{\partial S}{\partial \tau_i} = -2 \sum_{j=1}^b (y_{ij} - \mu - \tau_i - \beta_j) = 0
\;\Longrightarrow\; b\hat\mu + b\hat\tau_i + \sum_{j=1}^b \hat\beta_j = T_i, \]
\[ \frac{\partial S}{\partial \beta_j} = -2 \sum_{i=1}^t (y_{ij} - \mu - \tau_i - \beta_j) = 0
\;\Longrightarrow\; t\hat\mu + \sum_{i=1}^t \hat\tau_i + t\hat\beta_j = B_j. \]
In total these are $1 + t + b$ equations in $1 + t + b$ unknowns: one equation for $\mu$, $t$ equations
for the $\tau$s and $b$ equations for the $\beta$s. As in Section 2.3 the normal equations are over-
parameterized and there are infinitely many solutions to the system. Now there are two
more parameters than are really needed.

Also as in Section 2.3 we employ constraints to get a unique solution $\hat\mu$, $\hat\tau_1, \dots, \hat\tau_t$ and
$\hat\beta_1, \dots, \hat\beta_b$. This time there is one constraint for the $\tau$s and one constraint for the $\beta$s. For
the $\tau$s we use
\[ \sum_{i=1}^t \hat\tau_i = 0 \]
and the corresponding constraint for the $\beta$s is
\[ \sum_{j=1}^b \hat\beta_j = 0. \]
Note that for the randomized block design considered here the constraint $\sum_{i=1}^t \hat\tau_i = 0$ is
equivalent to the constraint $\sum_{i=1}^t r_i \hat\tau_i = 0$ we have used before, since all treatments have
replication $r_i = b$.
By using the constraints $\sum_{i=1}^t \hat\tau_i = 0$ and $\sum_{j=1}^b \hat\beta_j = 0$ it follows from the normal equations
that the estimates of $\mu$, $\tau_i$ and $\beta_j$ are respectively equal to
\[ \hat\mu = G/n = \bar y_{..}, \qquad
\hat\tau_i = \frac{T_i}{b} - \hat\mu = \bar y_{i.} - \bar y_{..}, \qquad
\hat\beta_j = \frac{B_j}{t} - \hat\mu = \bar y_{.j} - \bar y_{..} \]
for $i = 1, \dots, t$ and $j = 1, \dots, b$. As before $\bar y_{..}$ is the average of all the responses and $\bar y_{i.}$ is the
mean of the responses which receive treatment $i$. Moreover, $\bar y_{.j}$ is the mean of the responses
in block $j$.
The fitted values are
\[ \hat y_{ij} = \hat\mu + \hat\tau_i + \hat\beta_j = \bar y_{..} + (\bar y_{i.} - \bar y_{..}) + (\bar y_{.j} - \bar y_{..}) = \bar y_{i.} + \bar y_{.j} - \bar y_{..} \]
for $i = 1, \dots, t$ and $j = 1, \dots, b$. Hence the residuals are equal to
\[ e_{ij} = y_{ij} - \hat y_{ij} = y_{ij} - \bar y_{i.} - \bar y_{.j} + \bar y_{..} \]
for $i = 1, \dots, t$ and $j = 1, \dots, b$. Note that as in Section 2.3 both the fitted values $\hat y_{ij}$ and the
residuals $e_{ij}$ do not depend on the particular constraints used to fit the normal equations.
It was noted earlier that $S_G$ and $S_T$ are calculated in the same way as when we are doing
a one-way ANOVA. Consequently, as in Section 2.3 the sums of squares can be written in
terms of the treatment means and the overall mean. We have
\[ S_G = \sum_{i=1}^t \sum_{j=1}^b (y_{ij} - \bar y_{..})^2 \]
and
\[ S_T = \sum_{i=1}^t b(\bar y_{i.} - \bar y_{..})^2 \]
as all treatments have replication $r_i = b$. Moreover, it is not difficult to verify that
\[ S_B = \sum_{j=1}^b t(\bar y_{.j} - \bar y_{..})^2 \]
and that
\[ S_E = \sum_{i=1}^t \sum_{j=1}^b (y_{ij} - \bar y_{i.} - \bar y_{.j} + \bar y_{..})^2 = \sum_{i=1}^t \sum_{j=1}^b e_{ij}^2. \]
These alternative expressions show more clearly that all the sums of squares are nonnegative.
Of course the decomposition
\[ S_G = S_T + S_B + S_E \]
still holds if the above expressions for $S_G$, $S_T$, $S_B$ and $S_E$ are used.
3.4 Regression approach to fitting the model
As for the one-way ANOVA there is an equivalent multiple regression model for the random-
ized block design, which allows us to estimate the parameters in the model equation
\[ y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij} \]
by using the equation $\hat{\boldsymbol\beta} = (X'X)^{-1}X'y$ from Statistical Modelling I. In order to be able to use
this equation we impose the same constraints on the parameters as in Section 3.3, that is
$\sum_{i=1}^t \tau_i = 0$ and $\sum_{j=1}^b \beta_j = 0$. These two constraints mean that we have to use two sets of
dummy variables.

The model equation for this regression model is
\[ y_{ij} = \mu + \tau_1 x_{1ij} + \dots + \tau_{t-1} x_{(t-1)ij} + \beta_1 z_{1ij} + \dots + \beta_{b-1} z_{(b-1)ij} + \varepsilon_{ij}, \]
where for $k = 1, \dots, t-1$
\[ x_{kij} = \begin{cases} 1 & \text{if } i = k \\ -1 & \text{if } i = t \\ 0 & \text{otherwise} \end{cases} \]
and for $l = 1, \dots, b-1$
\[ z_{lij} = \begin{cases} 1 & \text{if } j = l \\ -1 & \text{if } j = b \\ 0 & \text{otherwise.} \end{cases} \]
In matrix notation the multiple regression model for the randomized block design when using
the constraints $\sum_{i=1}^t \tau_i = 0$ and $\sum_{j=1}^b \beta_j = 0$ can be written as
\[ y = \tilde X \tilde\beta + \varepsilon, \]
where
\[ y = (y_{11}, \dots, y_{1b}, \dots, y_{t1}, \dots, y_{tb})', \qquad
\tilde\beta = (\mu, \tau_1, \dots, \tau_{t-1}, \beta_1, \dots, \beta_{b-1})', \qquad
\varepsilon = (\varepsilon_{11}, \dots, \varepsilon_{1b}, \dots, \varepsilon_{t1}, \dots, \varepsilon_{tb})'. \]
The matrix $\tilde X$ has $n = tb$ rows and $t + b - 1$ columns. The first column is for the overall
mean, followed by $t - 1$ columns containing the values of $x_{1ij}, \dots, x_{(t-1)ij}$ as defined above
and $b - 1$ columns containing the values of $z_{1ij}, \dots, z_{(b-1)ij}$. We explain the definition of $\tilde X$
below for the litters Example 3.1.

By using the formula $(\tilde X'\tilde X)^{-1}\tilde X'y$ we obtain estimates $\hat\mu$ of the overall mean, $\hat\tau_1, \dots, \hat\tau_{t-1}$ of
the treatment effects and $\hat\beta_1, \dots, \hat\beta_{b-1}$ of the block effects, which are exactly the same as those
in Section 3.3. By using the constraint $\sum_{i=1}^t \hat\tau_i = 0$ we can then find $\hat\tau_t$ as
\[ \hat\tau_t = -\sum_{i=1}^{t-1} \hat\tau_i. \]
Similarly, $\hat\beta_b$ can be obtained using the formula
\[ \hat\beta_b = -\sum_{j=1}^{b-1} \hat\beta_j. \]
From the estimates we can obtain fitted values and residuals and these also agree exactly
with those derived in Section 3.3. We now return to the litters Example 3.1 and look at how
the design matrix $\tilde X$ is defined.
Example 3.2 Litter example revisited.
Recall that we wish to compare five diets and that each diet is assigned to one pig in each of
four litters, so that there are $t = 5$ treatments and $b = 4$ blocks. The design matrix $\tilde X$ is shown
below:

     1 |  1  0  0  0 |  1  0  0
     1 |  1  0  0  0 |  0  1  0
     1 |  1  0  0  0 |  0  0  1
     1 |  1  0  0  0 | -1 -1 -1
    ---+-------------+----------
     1 |  0  1  0  0 |  1  0  0
     1 |  0  1  0  0 |  0  1  0
     1 |  0  1  0  0 |  0  0  1
     1 |  0  1  0  0 | -1 -1 -1
    ---+-------------+----------
     1 |  0  0  1  0 |  1  0  0
     1 |  0  0  1  0 |  0  1  0
     1 |  0  0  1  0 |  0  0  1
     1 |  0  0  1  0 | -1 -1 -1
    ---+-------------+----------
     1 |  0  0  0  1 |  1  0  0
     1 |  0  0  0  1 |  0  1  0
     1 |  0  0  0  1 |  0  0  1
     1 |  0  0  0  1 | -1 -1 -1
    ---+-------------+----------
     1 | -1 -1 -1 -1 |  1  0  0
     1 | -1 -1 -1 -1 |  0  1  0
     1 | -1 -1 -1 -1 |  0  0  1
     1 | -1 -1 -1 -1 | -1 -1 -1
In the matrix, the first four rows correspond to the replications of treatment 1 in the
blocks or litters 1, ..., 4, followed by four rows for the replications of treatment 2 in the
blocks 1, ..., 4 and so forth for the remaining three treatments. The horizontal lines indicate
this grouping.

Turning to the columns note that the vertical lines separate the column for the overall
mean from the columns for the treatments and those for the blocks. The first column of $\tilde X$
has all elements equal to 1, to account for the overall mean $\mu$. The second column basically
indicates if a pig had received treatment 1. If so, the column contains a 1 in the corresponding
row and a 0 otherwise. However, when it comes to rows for the final treatment we have to
be a bit careful and put a $-1$ in rows 17-20 of column 2. In a similar way, the next column
essentially indicates if a pig had received treatment 2. Note that now we have a 1 in rows 5-8,
a $-1$ in rows 17-20, and everywhere else a 0. The columns corresponding to the treatments 3
and 4 (that is columns 4 and 5) are filled in a similar manner. Note that there is no column
for treatment 5.

The final three columns of the matrix are for the blocks and a similar coding is used to
represent if a pig was in block 1, block 2, block 3 or block 4. If we look at the rows for
treatment 1 first, then we can see that the first column for blocks has a 1 in its first row since
the row corresponds to a pig in the first block; rows 2 and 3 contain a zero since the pig was
neither in block 2 nor in block 3. It was also not in block 4, but since this is the last
block we put a $-1$ in that row. Similarly, the second column for blocks has a 1 in row 2 since
that row is for a pig in block 2, a $-1$ in row four and a 0 in rows one and three. Finally the
third column for blocks contains a 1 in row three, a $-1$ in row four and a 0 in rows one and
two. Note that there is no column for block 4 but that for the pig in row four, which was
in block 4, we have a $-1$ in all three columns for the blocks.

The same pattern is repeated for the four rows corresponding to the pigs which received
treatment 2. It is also repeated for the four rows corresponding to the pigs which received
treatment 3 and so forth.
Note that when $\tilde X$ is multiplied by the $\tilde\beta$ vector defined earlier, then for every row this
gives
\[ \mu + \tau_i + \beta_j, \]
where $\tau_i$ is for the treatment $i$ which the pig represented by the row had received and $\beta_j$ is
for the block $j$ the pig was in. This shows that the multiple regression model is equivalent
to the ANOVA model for the randomized block design.
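The construction of $\tilde X$ and the solution of the normal equations can be mimicked in a few
lines of code. The sketch below (Python/numpy assumed for illustration; the helper
sum_to_zero is our own, not part of any library) rebuilds the matrix for the litters data and
evaluates $(\tilde X'\tilde X)^{-1}\tilde X'y$:

    import numpy as np

    t, b = 5, 4
    y = np.array([[6, 4, 10, 4],
                  [8, 3, 11, 2],
                  [9, 5,  9, 5],
                  [5, 2,  8, 1],
                  [8, 3,  7, 4]], dtype=float).ravel()  # y_11,...,y_1b,...,y_t1,...,y_tb

    def sum_to_zero(levels, n_levels):
        # sum-to-zero dummy coding: level k gets a 1 in column k,
        # the last level gets -1 in every column
        Z = np.zeros((len(levels), n_levels - 1))
        for row, lev in enumerate(levels):
            if lev == n_levels - 1:
                Z[row, :] = -1.0
            else:
                Z[row, lev] = 1.0
        return Z

    treat = np.repeat(np.arange(t), b)      # treatment index of each observation
    block = np.tile(np.arange(b), t)        # block index of each observation
    X = np.column_stack([np.ones(t * b),
                         sum_to_zero(treat, t),
                         sum_to_zero(block, b)])

    beta = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta[0])       # mu-hat = 114/20 = 5.7
    print(beta[1:t])     # tau-hat_1,...,tau-hat_{t-1}; tau-hat_t = -(their sum)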
3.5 Fixed and random effects models
So far, we have assumed that the treatments and blocks have fixed effects. However, either
treatments or blocks or both can also have random effects. As for the one-way ANOVA, the
analysis is unchanged, but interpretation is different.

We consider only the case where the treatments have fixed and the blocks have random
effects. For instance, in Example 3.1 we might have selected a few litters out of all those
available. The model equation would then be modified to read
\[ y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij}, \]
where all terms are defined as in Section 3.2 except that the block effects are now represented
by random variables $b_1, \dots, b_b$. It would then be assumed that $b_j \sim N(0, \sigma_b^2)$ for $j = 1, \dots, b$,
all independent. Moreover, all the $b_j$ and all the $\varepsilon_{ij}$ are assumed to be independent as well.
It is clear from the above that $E(y_{ij}) = \mu + \tau_i$, $Var(y_{ij}) = \sigma_b^2 + \sigma^2$, $Cov(y_{ij}, y_{i'j}) = \sigma_b^2$ for
$i \ne i'$, $Cov(y_{ij}, y_{i'j'}) = 0$ for $i \ne i'$, $j \ne j'$ and $Cov(y_{ij}, y_{ij'}) = 0$ for $j \ne j'$. The expected
block mean square is then equal to
\[ E(M_B) = \sigma^2 + t\sigma_b^2 \]
and the test of block differences is now of $H'_0: \sigma_b^2 = 0$ against $H'_1: \sigma_b^2 > 0$, but the hypotheses
of the test for treatment differences are still the same as before. For the randomized block
design, both tests are performed as described in Section 3.2 by using $F_T$ and $F_B$ from the
ANOVA table.
4 Factorial Designs
4.1 Introduction
So far we have considered situations where the treatments were unstructured. Many
experiments do however involve treatments which have a factorial structure. This means
that the treatments are all combinations of the levels of two or more factors. A design in
which the treatments are all combinations of the levels of several factors is called a factorial
design. In this chapter we look at the analysis of variance for such factorial designs.

Suppose there are two factors A and B with $n_A$ and $n_B$ levels respectively. In a factorial
experiment each treatment is a combination of a level of A with a level of B. Thus in total
there are $t = n_A n_B$ treatments. All treatments have the same replication $r$ and so the
number of observations is $n = r n_A n_B$. Of course the numbers $n_A$ and $n_B$ can be different.
Example 4.1 Steel.
A steel company studied the effect of carbon content and tempering temperature on the
strength of steel. The carbon content was high or low and the temperature was high or low
as well. So there was a factor A carbon content with $n_A = 2$ levels and another factor B
temperature with $n_B = 2$ levels. The $t = 2 \times 2 = 4$ treatments are given below.

Treatment   Description
    1       Low carbon, low temperature
    2       Low carbon, high temperature
    3       High carbon, low temperature
    4       High carbon, high temperature
□
4.2 Main effects and interactions
Suppose that for each treatment in Example 4.1 we had made $r = 3$ observations and
obtained the following means:

               Levels of A
Levels of B     1     2
     1         20    30
     2         40    50
The means may be plotted as follows:

[Means plot: mean response (vertical axis, 10 to 60) against the two levels of A, with squares
marking level 1 of B and triangles marking level 2 of B; plot omitted here.]
From the table reporting the means and the plot we can see that at level 1 of B the effect
of changing A from 1 to 2 is $30 - 20 = 10$. Similarly, at level 2 of B the effect of changing
the level of A from 1 to 2 is $50 - 40 = 10$. Thus we have looked at the effect of changing
the level of A separately for each level of B. If we want to summarize the effect of A in a
single measure then it is natural to average the above differences at level 1 and level 2 of B,
that is to consider
\[ \tfrac{1}{2}\big((30 - 20) + (50 - 40)\big) = \tfrac{1}{2}(30 + 50) - \tfrac{1}{2}(20 + 40) = 10. \]
This is called the main effect of A. Note that $(30 + 50)/2$ is the mean of all responses at
level 2 of A and that $(20 + 40)/2$ is the mean of all responses at level 1 of A.

In a similar way, at level 1 of A the effect of changing B from 1 to 2 is $40 - 20 = 20$ and at
level 2 of A the effect of changing the level of B from 1 to 2 is $50 - 30 = 20$. Hence the main
effect of B is
\[ \tfrac{1}{2}\big((40 - 20) + (50 - 30)\big) = \tfrac{1}{2}(40 + 50) - \tfrac{1}{2}(20 + 30) = 20. \]
Here $(40 + 50)/2$ is the mean of all responses at level 2 of B and $(20 + 30)/2$ the mean of all
responses at level 1 of B.

From these calculations and the plot we see that the effect of changing the level of A is the
same at each level of B and vice versa. We say there is no interaction between factors
A and B. If the squares were joined by a line, and similarly, if the triangles were joined
by a second line then these lines would be parallel. Parallel or almost parallel lines are
characteristic for situations where there is no interaction.
A similar plot can be drawn if the factors have more than two levels. Each symbol (square
and triangle here) then simply represents the mean of all the responses for one particular
combination of the levels. As we will see the calculation of the main effects is more
complicated if there are more than two levels. However, in a plot similar to the one shown
here, the lines connecting the means would still be roughly parallel.
Now suppose the means had been as follows:

               Levels of A
Levels of B     1     2
     1         20    40
     2         50    12

The corresponding plot of the means is shown below. For clarity of presentation here the
lines connecting the means have been drawn.

[Means plot: mean response (10 to 60) against the two levels of A, with connected lines for
levels 1 and 2 of B; the two lines cross. Plot omitted here.]
The main effect of A is
\[ \tfrac{1}{2}\big((40 - 20) + (12 - 50)\big) = \tfrac{1}{2}(40 + 12) - \tfrac{1}{2}(20 + 50) = -9, \]
and the main effect of B is
\[ \tfrac{1}{2}\big((50 - 20) + (12 - 40)\big) = \tfrac{1}{2}(50 + 12) - \tfrac{1}{2}(20 + 40) = 1. \]
Now, as can be easily seen, the effect of changing the level of A from 1 to 2 depends on
the level of B and vice versa. At the first level of B, the effect of changing A from 1 to
2 is $40 - 20 = 20$, but at the second level of B, the effect of changing A from 1 to 2 is
$12 - 50 = -38$. Since the effect of A depends on the level chosen for B (and vice versa), this
is called an interaction between the factors.

The lines connecting the means are not parallel. This is typical if there is an interaction.
Here the lines cross, but in other cases the interaction could give rise to a funnel shape.
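These calculations are simple enough to check in code. A minimal numpy sketch (assumed
here purely for illustration) computes both main effects and a difference-of-differences
measure of interaction from the table of cell means:

    import numpy as np

    # cell means: rows = levels of B, columns = levels of A
    m = np.array([[20.0, 40.0],
                  [50.0, 12.0]])

    main_A = m[:, 1].mean() - m[:, 0].mean()    # -9.0
    main_B = m[1, :].mean() - m[0, :].mean()    #  1.0
    # difference of differences; zero would indicate no interaction
    interaction = (m[0, 0] + m[1, 1]) - (m[0, 1] + m[1, 0])
    print(main_A, main_B, interaction)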
4.3 Analysis of variance for two factors
The analysis of variance for a factorial design allows us to carry out separate tests for the
main effects of the factors and for the interactions. The idea is to split the treatment sum
of squares into sums of squares for the main effects and the interactions and to compare the
corresponding mean squares with the residual mean square. This is similar to what we did
in Section 2.6 on orthogonal contrasts. We first look at the two-factor design.

Suppose there are two factors A and B with $n_A$ and $n_B$ levels, respectively, and $r$ replicates
of each combination. For example, with $n_A = n_B = r = 2$, the data can be represented as
in the following $2 \times 2$ table:
                        Factor B
Factor A          1                  2           Total
   1       y_{111} y_{112}    y_{121} y_{122}     A_1
   2       y_{211} y_{212}    y_{221} y_{222}     A_2
Total           B_1                B_2             G
The general two-way ANOVA model for a two-factor design is
\[ y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk} \]
for $i = 1, 2, \dots, n_A$, $j = 1, 2, \dots, n_B$ and $k = 1, 2, \dots, r$, where $y_{ijk}$ is the response of the
$k$th unit at the $i$th level of factor A and the $j$th level of factor B. The parameter $\mu$ is the
overall mean, $\alpha_i$ is the effect of the $i$th level of factor A, $\beta_j$ is the effect of the $j$th level of
factor B and $\gamma_{ij}$ is the effect of interaction between the $i$th level of factor A and the $j$th level of
factor B. Note that some books write the interaction term as $(\alpha\beta)_{ij}$. All these parameters
are unknown constants. That is the model is a fixed effects model. The $\varepsilon_{ijk}$ are random
errors. It is assumed that $\varepsilon_{ijk} \sim N(0, \sigma^2)$, all independent.
The pictures we looked at earlier suggest that the main effects have little meaning if there is
an interaction. So we test for interaction first, and, if there is evidence for it, we do not try
testing the main effects. In that case we report a table of means for the treatments, that is
a table of the $\bar y_{ij.}$, the mean response from units with level $i$ of factor A and level $j$ of factor
B, $i = 1, \dots, n_A$, $j = 1, \dots, n_B$, for the $t = n_A n_B$ combinations of the levels of A with the
levels of B.
Thus we first test the null hypothesis of no interaction, that is $H_0^{AB}$: all $\gamma_{ij}$ are equal,
against the alternative $H_1^{AB}$: at least two $\gamma_{ij}$ are different. If this is not rejected, then for
the main effect of factor A we can test $H_0^A: \alpha_1 = \dots = \alpha_{n_A}$ against $H_1^A$: at least two $\alpha_i$ are
different. Similarly, if the null hypothesis of no interaction is not rejected, for factor B we
test $H_0^B: \beta_1 = \dots = \beta_{n_B}$ against $H_1^B$: at least two $\beta_j$ are different.
Frequently, in addition to giving the above model equation, the definition of the model
for the two-factor design includes the following constraints: $\sum_{i=1}^{n_A} \alpha_i = 0$, $\sum_{j=1}^{n_B} \beta_j = 0$,
$\sum_{i=1}^{n_A} \gamma_{ij} = 0$ and $\sum_{j=1}^{n_B} \gamma_{ij} = 0$. This does not change the ANOVA however and how we test
the hypotheses. We will only impose these constraints later when we will be looking at the
least squares estimation.
For the ANOVA we first calculate the total, treatment and residual sums of squares in
essentially the same way as for the one-way ANOVA model. Subsequently, the treatment
sum of squares is broken down into three independent components for the factors A and B
and their interaction.

The total sum of squares is
\[ S_G = \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \sum_{k=1}^r y_{ijk}^2 - \frac{G^2}{n}, \]
where $G$ is the grand total. Note that although this involves three summations the formula
only says that we need to add up the squares of all the responses and subtract from this the
correction factor.

We next find the treatment sum of squares, ignoring the factorial structure. Since there are
$t = n_A n_B$ different treatments, we have
\[ S_T = \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \frac{T_{ij}^2}{r} - \frac{G^2}{n}, \]
where $T_{ij}$ is the total from units with level $i$ for factor A and level $j$ of factor B. Again, al-
though the notation is more complicated this is similar to the calculation of $S_T$ in Section 2.2.
The residual sum of squares is then given by
\[ S_E = S_G - S_T. \]
Now, the sum of squares due to factor A is
\[ S_A = \sum_{i=1}^{n_A} \frac{A_i^2}{n_B r} - \frac{G^2}{n}, \]
where $A_i$ is the total from level $i$ of factor A. Similarly, the sum of squares due to factor
B is
\[ S_B = \sum_{j=1}^{n_B} \frac{B_j^2}{n_A r} - \frac{G^2}{n}, \]
where $B_j$ is the total from level $j$ of factor B. Finally, the interaction sum of squares is
defined as
\[ S_{AB} = S_T - S_A - S_B. \]
Note that
\[ S_G = S_T + S_E \]
and further that
\[ S_T = S_A + S_B + S_{AB}. \]
The ANOVA table is given below. It does not show $S_T$, but only the components $S_A$, $S_B$,
$S_{AB}$. It is easy to verify that similar to the sums of squares the degrees of freedom for
factor A, factor B and the interaction add up to $t - 1 = n_A n_B - 1$, that is to the degrees
of freedom for treatments in the one-way ANOVA. Also note that the residual degrees of
freedom are equal to $n_A n_B (r - 1) = n - t$ and that the total degrees of freedom are equal
to $n_A n_B r - 1 = n - 1$, that is both degrees of freedom are equal to their counterparts in the
one-way ANOVA with $t = n_A n_B$ unstructured treatments.
Source        SS      df                MS      F
Factor A      S_A     n_A - 1           M_A     F_A = M_A / M_E
Factor B      S_B     n_B - 1           M_B     F_B = M_B / M_E
Interaction   S_AB    (n_A-1)(n_B-1)    M_AB    F_AB = M_AB / M_E
Residual      S_E     n - t             M_E
Total         S_G     n - 1
Note that, as before, $M_E$ is our estimate of $\sigma^2$. If $H_0^{AB}$ is true, then $F_{AB} \sim F_{(n_A-1)(n_B-1),\,n-t}$,
where $n = n_A n_B r$ and $t = n_A n_B$. If there is no interaction and $H_0^A$ is true, then
$F_A \sim F_{n_A-1,\,n-t}$. Similarly, if there is no interaction and $H_0^B$ is true, then $F_B \sim F_{n_B-1,\,n-t}$.
Example 4.2 Bread.
A bakery supplies sliced wrapped bread to supermarkets. A factorial experiment was carried
out to investigate the effects of the height of a shelf (A) and the width of a shelf (B) in 12
supermarkets on the shelf-life of bread. There were three levels for A, bottom, middle and
top, and two levels for B, regular and wide. The data are given below.

                        Width
               Regular    Wide    Total
        Bottom  47 43    46 40     176
Height  Middle  62 68    67 71     268
        Top     41 39    42 46     168
Total            300      312      612
A plot of the mean responses for the $t = 6$ treatments is shown below. From this we may
conjecture that there is no interaction and that only the factor height has some effect. The
ANOVA will allow us to carry out the corresponding tests.

[Means plot: mean shelf-life (40 to 70) against shelf height (bottom, middle, top), with one
line for regular and one for wide shelves; plot omitted here.]
The total sum of squares is
\[ S_G = 32854 - 31212 = 1642. \]
Next, the treatment sum of squares is
\[ S_T = \tfrac{1}{2}(90^2 + 86^2 + 130^2 + 138^2 + 80^2 + 88^2) - 31212 = 1580. \]
So the residual sum of squares is
\[ S_E = 1642 - 1580 = 62. \]
Now, the main effects sums of squares are
\[ S_A = \tfrac{1}{4}(176^2 + 268^2 + 168^2) - 31212 = 1544 \]
and
\[ S_B = \tfrac{1}{6}(300^2 + 312^2) - 31212 = 12. \]
Thus, the interaction sum of squares is
\[ S_{AB} = 1580 - 1544 - 12 = 24. \]
Hence, the ANOVA table is given below.

Source        SS     df   MS       F
Height       1544     2   772.00   74.73
Width          12     1    12.00    1.16
Interaction    24     2    12.00    1.16
Residual       62     6    10.33
Total        1642    11
From tables, we have $F_{2,6,0.05} = 5.143$. Thus, the null hypothesis of no interaction is not
rejected at the 5% level of significance, since $F_{AB} = 1.16$ is smaller than $F_{2,6,0.05}$. For width,
$F_B = 1.16$ is smaller than $F_{1,6,0.05} = 5.987$, and so there is also no evidence of differences
between widths at the 5% level. However, again from tables, we have $F_{2,6,0.001} = 27.0$.
Thus, as $F_A = 74.73$ is larger than $F_{2,6,0.001}$, the p-value for differences between heights is
$P < 0.001$, and so there is very strong evidence that the mean shelf life differs for different
heights.
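The table can be reproduced with most regression software. A minimal sketch using Python's
statsmodels (assumed here for illustration; in the practicals GenStat produces the same
table) is:

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    bread = pd.DataFrame({
        'life':   [47, 43, 46, 40, 62, 68, 67, 71, 41, 39, 42, 46],
        'height': ['bottom'] * 4 + ['middle'] * 4 + ['top'] * 4,
        'width':  ['regular', 'regular', 'wide', 'wide'] * 3,
    })

    # main effects of height and width plus their interaction
    model = smf.ols('life ~ C(height) * C(width)', data=bread).fit()
    print(sm.stats.anova_lm(model))   # SS 1544, 12, 24 and residual 62, as above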
4.4 Least squares estimation
Least squares estimation for the two-factor or two-way ANOVA model is similar to what
was done in Sections 2.3 and 3.3. To fit the model
\[ y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk} \]
for $i = 1, \dots, n_A$, $j = 1, \dots, n_B$ and $k = 1, \dots, r$ we minimize
\[ S = \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \sum_{k=1}^r (y_{ijk} - \mu - \alpha_i - \beta_j - \gamma_{ij})^2 \]
to get least squares estimates $\hat\mu$, $\hat\alpha_1, \dots, \hat\alpha_{n_A}$, $\hat\beta_1, \dots, \hat\beta_{n_B}$ and $\hat\gamma_{11}, \dots, \hat\gamma_{n_A n_B}$ of the model
parameters $\mu$, $\alpha_1, \dots, \alpha_{n_A}$, $\beta_1, \dots, \beta_{n_B}$ and $\gamma_{11}, \dots, \gamma_{n_A n_B}$.

To this end, $S$ is differentiated with respect to $\mu$, with respect to $\alpha_i$, with respect to $\beta_j$
and with respect to $\gamma_{ij}$. Equating these partial derivatives to zero then gives the normal
equations. As before, the model is overparameterized so that without imposing additional
constraints there is no unique solution to the normal equations. We therefore employ the
constraints
\[ \sum_{i=1}^{n_A} \hat\alpha_i = 0, \qquad \sum_{j=1}^{n_B} \hat\beta_j = 0, \qquad
\sum_{j=1}^{n_B} \hat\gamma_{ij} = 0, \; i = 1, \dots, n_A, \quad \text{and} \quad
\sum_{i=1}^{n_A} \hat\gamma_{ij} = 0, \; j = 1, \dots, n_B. \]
By using these we get
\[ \frac{\partial S}{\partial \mu} = -2 \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \sum_{k=1}^r (y_{ijk} - \mu - \alpha_i - \beta_j - \gamma_{ij}) = 0
\;\Longrightarrow\; n\hat\mu = G, \]
\[ \frac{\partial S}{\partial \alpha_i} = -2 \sum_{j=1}^{n_B} \sum_{k=1}^r (y_{ijk} - \mu - \alpha_i - \beta_j - \gamma_{ij}) = 0
\;\Longrightarrow\; n_B r(\hat\mu + \hat\alpha_i) = A_i, \]
\[ \frac{\partial S}{\partial \beta_j} = -2 \sum_{i=1}^{n_A} \sum_{k=1}^r (y_{ijk} - \mu - \alpha_i - \beta_j - \gamma_{ij}) = 0
\;\Longrightarrow\; n_A r(\hat\mu + \hat\beta_j) = B_j, \]
\[ \frac{\partial S}{\partial \gamma_{ij}} = -2 \sum_{k=1}^r (y_{ijk} - \mu - \alpha_i - \beta_j - \gamma_{ij}) = 0
\;\Longrightarrow\; T_{ij} = r(\hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_{ij}) \]
and so the estimates of the model parameters are:
\[ \hat\mu = G/n = \bar y_{...}, \qquad
\hat\alpha_i = \frac{A_i}{n_B r} - \hat\mu = \bar y_{i..} - \bar y_{...}, \qquad
\hat\beta_j = \frac{B_j}{n_A r} - \hat\mu = \bar y_{.j.} - \bar y_{...}, \]
\[ \hat\gamma_{ij} = \frac{T_{ij}}{r} - \hat\mu - \hat\alpha_i - \hat\beta_j = \bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y_{...}, \]
where $\bar y_{...} = G/n$, $\bar y_{i..} = A_i/(n_B r)$ and $\bar y_{.j.} = B_j/(n_A r)$.
From the estimates we obtain the fitted values as
\[ \hat y_{ijk} = \hat\mu + \hat\alpha_i + \hat\beta_j + \hat\gamma_{ij}
= \bar y_{...} + (\bar y_{i..} - \bar y_{...}) + (\bar y_{.j.} - \bar y_{...}) + (\bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y_{...})
= \bar y_{ij.} \]
and the residuals
\[ e_{ijk} = y_{ijk} - \bar y_{ij.} \]
for $i = 1, \dots, n_A$, $j = 1, \dots, n_B$ and $k = 1, \dots, r$. Note that as in Sections 2.3 and 3.3 the
fitted values and the residuals do not depend on the constraints we have imposed to get a
unique solution of the normal equations.
As for the models considered in the previous chapters there are equivalent formulas for the
sums of squares in Section 4.3 which show more clearly that all the sums of squares are
nonnegative. We include them here for completeness:
\[ S_G = \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \sum_{k=1}^r (y_{ijk} - \bar y_{...})^2 \]
\[ S_A = n_B r \sum_{i=1}^{n_A} (\bar y_{i..} - \bar y_{...})^2 \]
\[ S_B = n_A r \sum_{j=1}^{n_B} (\bar y_{.j.} - \bar y_{...})^2 \]
\[ S_{AB} = r \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} (\bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y_{...})^2 \]
\[ S_E = \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \sum_{k=1}^r (y_{ijk} - \bar y_{ij.})^2 \]
Of course we still have the decomposition
\[ S_G = S_A + S_B + S_{AB} + S_E \]
of the total sum of squares if the alternative forms of the sums of squares are used. The for-
mulas in Section 4.3 are more convenient for hand calculations, but note that the alternative
expressions given above are also often used to define the sums of squares for a two-factor
design.
4.4.1 Orthogonal contrasts from one-way model
There is a simple relationship between the decomposition of $S_T$ into $S_A$, $S_B$ and $S_{AB}$ for a
two-factor design and the method of orthogonal contrasts in the one-way ANOVA. In fact,
if both factors A and B have two levels, then $S_A$, $S_B$ and $S_{AB}$ are exactly the contrast
sums of squares $S_k$ defined in Theorem 2.1 of Section 2.6 for three orthogonal contrasts $L_k$,
$k = 1, 2, 3$.

The following example illustrates this relationship.
Example 4.3 Laboratories.
Consider a $2 \times 2$ factorial experiment in which there are two laboratories (factor A), each
with 16 rats, and two types of atmosphere (factor B). In each laboratory, eight rats are
put into a dusty atmosphere and eight into a dust-free atmosphere. After three months,
measurements are taken on the lung function of each rat. So the four treatments are as
given below and each of them has replication $r = 8$.

Treatment   Factor A   Factor B   Description
    1          1          1       Laboratory 1, dust-free atmosphere
    2          1          2       Laboratory 1, dusty atmosphere
    3          2          1       Laboratory 2, dust-free atmosphere
    4          2          2       Laboratory 2, dusty atmosphere

To avoid confusion, in the one-way ANOVA model label the treatment effects $\tau'_1, \dots, \tau'_4$.
Then, from the table above, it is clear that
\[ \tau'_1 = \alpha_1 + \beta_1 + \gamma_{11}; \qquad
\tau'_2 = \alpha_1 + \beta_2 + \gamma_{12}; \qquad
\tau'_3 = \alpha_2 + \beta_1 + \gamma_{21}; \qquad
\tau'_4 = \alpha_2 + \beta_2 + \gamma_{22}. \]
Then, you should have no difficulties in verifying that the contrasts
\[ L_1 = \tau'_1 + \tau'_2 - \tau'_3 - \tau'_4 \]
\[ L_2 = \tau'_1 - \tau'_2 + \tau'_3 - \tau'_4 \]
\[ L_3 = \tau'_1 - \tau'_2 - \tau'_3 + \tau'_4 \]
are mutually orthogonal.

The contrast $L_1$ compares the two laboratories and $L_2$ compares the two atmospheres. The
contrast $L_3$ compares the difference between the dust-free and dusty atmospheres in labora-
tory 1 with the corresponding difference between the atmospheres in laboratory 2. If there
is no interaction, the difference between the two types of atmosphere should be roughly the
same in each lab. Thus the contrasts $L_1$ and $L_2$ represent the main effects of the factors and
$L_3$ represents the interaction. In terms of the factorial parametrization, these are
\[ L_1 = \alpha_1 + \beta_1 + \gamma_{11} + \alpha_1 + \beta_2 + \gamma_{12}
- (\alpha_2 + \beta_1 + \gamma_{21} + \alpha_2 + \beta_2 + \gamma_{22}) = 2(\alpha_1 - \alpha_2) \]
\[ L_2 = \alpha_1 + \beta_1 + \gamma_{11} + \alpha_2 + \beta_1 + \gamma_{21}
- (\alpha_1 + \beta_2 + \gamma_{12} + \alpha_2 + \beta_2 + \gamma_{22}) = 2(\beta_1 - \beta_2) \]
\[ L_3 = \alpha_1 + \beta_1 + \gamma_{11} + \alpha_2 + \beta_2 + \gamma_{22}
- (\alpha_1 + \beta_2 + \gamma_{12} + \alpha_2 + \beta_1 + \gamma_{21})
= \gamma_{11} + \gamma_{22} - \gamma_{12} - \gamma_{21}. \]
(In $L_1$ and $L_2$ the interaction terms cancel because of the constraints on the $\gamma_{ij}$.)
It is clear that these correspond to the main effect of A, the main effect of B and the
interaction respectively.

It can be shown that the contrast sum of squares $S_1$ for $L_1$ in the one-way ANOVA model
(see Section 2.6) is equal to $S_A$ for the two-factor design, the contrast sum of squares $S_2$ for
$L_2$ is equal to $S_B$, and the contrast sum of squares $S_3$ for $L_3$ is equal to $S_{AB}$. The numerical
equalities can be easily seen in GenStat, as in Practical 6.

It follows that the test statistics $F_1 = S_1/M_E$, $F_2 = S_2/M_E$ and $F_3 = S_3/M_E$ for the
orthogonal contrasts in the one-way ANOVA model are respectively equal to $F_A$, $F_B$ and
$F_{AB}$ for the two-factor design. Moreover, the distributions are the same.
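The numerical equality can also be verified outside GenStat. The sketch below (Python/numpy
assumed for illustration) generates an arbitrary balanced $2 \times 2$ data set and checks that
the three contrast sums of squares match $S_A$, $S_B$ and $S_{AB}$:

    import numpy as np

    rng = np.random.default_rng(1)
    nA = nB = 2
    r = 8
    y = 5 + rng.normal(size=(nA, nB, r))   # any balanced 2x2 data will do

    G, n = y.sum(), y.size
    T = y.sum(axis=2).ravel()              # treatment totals T_1,...,T_4
    SA = (y.sum(axis=(1, 2)) ** 2).sum() / (nB * r) - G ** 2 / n
    SB = (y.sum(axis=(0, 2)) ** 2).sum() / (nA * r) - G ** 2 / n
    ST = (T ** 2).sum() / r - G ** 2 / n
    SAB = ST - SA - SB

    def contrast_ss(c):
        # SS for a contrast in the treatment means with common replication r
        L = np.dot(c, T / r)
        return r * L ** 2 / np.sum(c ** 2)

    c1 = np.array([1,  1, -1, -1])   # main effect of A
    c2 = np.array([1, -1,  1, -1])   # main effect of B
    c3 = np.array([1, -1, -1,  1])   # interaction
    print(contrast_ss(c1) - SA,
          contrast_ss(c2) - SB,
          contrast_ss(c3) - SAB)     # all three differences are ~0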
4.5 Random effects models
So far, we have considered the fixed effects ANOVA model for a two-factor design, where the
parameters representing the effects of the factors A and B and their interaction are unknown
constants. In principle, however, the effects of both factors may be random. In this case the
model is called a random effects model. Alternatively, one factor may have random and
the other factor fixed effects. Models of this type are known as mixed models.

Here we consider the random effects model. The model equation for this is similar to the one
for the fixed effects model in Section 4.3, but now the $\alpha$, $\beta$ and $\gamma$ parameters are replaced by
random variables. More precisely, the model equation for a two-factor or two-way random
effects ANOVA is
\[ y_{ijk} = \mu + a_i + b_j + g_{ij} + \varepsilon_{ijk} \]
for $i = 1, \dots, n_A$, $j = 1, \dots, n_B$ and $k = 1, \dots, r$, where $y_{ijk}$, $\mu$ and $\varepsilon_{ijk}$ have the same
meaning as before, but the effect of the $i$th level of factor A is represented by a random
variable $a_i$, the effect of the $j$th level of factor B by a random variable $b_j$ and the interaction
effect for the levels $i$ of A and $j$ of B by a random variable $g_{ij}$. This is appropriate, for
instance, if the $n_A$ levels of A and the $n_B$ levels of B are chosen at random.
It is assumed that $a_i \sim N(0, \sigma_A^2)$, $b_j \sim N(0, \sigma_B^2)$, $g_{ij} \sim N(0, \sigma_{AB}^2)$ and $\varepsilon_{ijk} \sim N(0, \sigma^2)$, all
independent. Note that the independence assumption being made here says that $a_1, \dots, a_{n_A}$,
$b_1, \dots, b_{n_B}$, $g_{11}, \dots, g_{n_A n_B}$ and all the $\varepsilon_{ijk}$ are mutually independent.

This model is appropriate for situations where the variability in the responses may be at-
tributed to the factors and their interaction, but where the specific factor levels used are not
of primary interest. Thus the random effects model is used when the levels of A and B have
actually been or can be assumed to have been chosen at random from larger sets of possible
levels.
The variances $\sigma_A^2$, $\sigma_B^2$, $\sigma_{AB}^2$ and $\sigma^2$ are the variance components, and in order to test if
there is variation in the data due to factor A, due to factor B or due to the interaction we
test hypotheses about $\sigma_A^2$, $\sigma_B^2$ and $\sigma_{AB}^2$. Also, we may wish to estimate the variance
components which gives us an idea about the size of the effects.

The model equation implies that $E(y_{ijk}) = \mu$ and
\[ Var(y_{ijk}) = \sigma_A^2 + \sigma_B^2 + \sigma_{AB}^2 + \sigma^2. \]
We are interested in testing $H_0^{AB}: \sigma_{AB}^2 = 0$ for the interaction, $H_0^A: \sigma_A^2 = 0$ for the main
effect of factor A and $H_0^B: \sigma_B^2 = 0$ for the main effect of factor B. As before, the calculation
of sums of squares and mean squares remains unchanged. More precisely, $S_A$, $S_B$ and $S_{AB}$
are computed as in Section 4.3. However, the F statistics for testing some of the hypotheses
will be different.

To determine the test statistics, we examine the expected mean squares. From this we can
see which mean squares should be compared to test the hypotheses. Although the model
assumes that all the $a_i$, $b_j$, $g_{ij}$ and $\varepsilon_{ijk}$ random variables are independent, this is not true
for the random variables $y_{ijk}$ representing the responses. More precisely, from the model
assumptions it follows that the covariances are given as follows:
\[ cov(y_{ijk}, y_{i'j'k'}) = \begin{cases}
\sigma_A^2 + \sigma_B^2 + \sigma_{AB}^2 & \text{if } i = i',\ j = j',\ k \ne k' \\
\sigma_A^2 & \text{if } i = i',\ j \ne j' \\
\sigma_B^2 & \text{if } i \ne i',\ j = j' \\
0 & \text{if } i \ne i',\ j \ne j'.
\end{cases} \]
When calculating the expected mean squares these covariances need to be taken into account.
It can be shown that
\[ E(M_A) = \sigma^2 + n_B r \sigma_A^2 + r \sigma_{AB}^2, \]
\[ E(M_B) = \sigma^2 + n_A r \sigma_B^2 + r \sigma_{AB}^2, \]
\[ E(M_{AB}) = \sigma^2 + r \sigma_{AB}^2, \]
\[ E(M_E) = \sigma^2. \]
To test $H_0^{AB}: \sigma_{AB}^2 = 0$, we look at
\[ F_{AB} = \frac{M_{AB}}{M_E} \]
which has an $F_{(n_A-1)(n_B-1),\,n-t}$ distribution, where $n - t = n_A n_B (r - 1)$, if $H_0^{AB}$ is true.
However, to test $H_0^A: \sigma_A^2 = 0$, we look at
\[ F_A = \frac{M_A}{M_{AB}} \]
which is distributed as $F_{n_A-1,\,(n_A-1)(n_B-1)}$ if $H_0^A$ is true. Similarly, to test $H_0^B: \sigma_B^2 = 0$, we
look at
\[ F_B = \frac{M_B}{M_{AB}} \]
which, if $H_0^B$ is true, has an $F_{n_B-1,\,(n_A-1)(n_B-1)}$ distribution. Note that the tests for the main
effects of A and B are not the same as for the fixed effects model. The two-way ANOVA table
for the random effects model is identical to the one for the fixed effects model in Section 4.3,
except that the test statistics are now calculated by the above formulas. Also when carrying
out tests of the main effects or when calculating p-values the $F_{n_A-1,\,(n_A-1)(n_B-1)}$ distribution
has to be used for factor A and the $F_{n_B-1,\,(n_A-1)(n_B-1)}$ distribution for factor B.
From the formulas for the expected mean squares unbiased estimators of the variance com-
ponents can be easily derived as
\[ \hat\sigma^2 = M_E, \qquad
\hat\sigma_{AB}^2 = \frac{M_{AB} - M_E}{r}, \qquad
\hat\sigma_A^2 = \frac{M_A - M_{AB}}{n_B r} \qquad \text{and} \qquad
\hat\sigma_B^2 = \frac{M_B - M_{AB}}{n_A r}. \]
Note that these formulas can give negative values in which case we take the estimate of the
variance component to be equal to zero.
Example 4.4 Quality control.
An engineer carried out a factorial experiment in order to investigate the effect of the factors
robot (A) and machines (B) on the breaking strength of metallic sheets. Three robots
and four machines were selected at random and for each combination the breaking strength
of two sheets was measured. The data are given below.

                        Machine
             1         2         3         4      Total
       1   112 113   113 118   111 112   113 111    903
Robot  2   113 115   113 114   114 112   118 117    916
       3   119 117   115 118   117 122   123 119    950
Total       689       691       688       701      2769
The ANOVA table for the random effects model is as follows.

Source          SS        df   MS       F stat.   p-val.
Robot (A)       147.250    2   73.625    9.00     < 0.025
Machine (B)      17.792    3    5.931    0.72     > 0.1
Interaction      49.083    6    8.181    2.07     > 0.1
Residual         47.500   12    3.958
Total           261.625   23
The only significant effect at the 5% level is due to robots. One practical implication of
this analysis could be that in order to reduce the variability in the responses perhaps some
maintenance work is needed to make sure that the robots perform in a more uniform way.

For the random effects model, the estimates of the variance components are
\[ \hat\sigma^2 = 3.958, \qquad
\hat\sigma_{AB}^2 = \frac{8.181 - 3.958}{2} = 2.11, \qquad
\hat\sigma_A^2 = \frac{73.625 - 8.181}{4 \times 2} = 8.18, \qquad
\hat\sigma_B^2 = \frac{5.931 - 8.181}{3 \times 2} = -0.38. \]
The estimate of $\sigma_B^2$ is negative. As was indicated before in such cases we then set the estimate
equal to $\hat\sigma_B^2 = 0$, which is also in line with the non-significant F test for machines in the
above ANOVA table.
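The variance component estimates follow mechanically from the mean squares. A short
Python sketch (assumed here for illustration) with the truncation at zero built in:

    # mean squares from the ANOVA table of Example 4.4
    nA, nB, r = 3, 4, 2
    MA, MB, MAB, ME = 73.625, 5.931, 8.181, 3.958

    s2 = ME
    s2_AB = max((MAB - ME) / r, 0.0)          # 2.11
    s2_A = max((MA - MAB) / (nB * r), 0.0)    # 8.18
    s2_B = max((MB - MAB) / (nA * r), 0.0)    # raw value -0.38, truncated to 0
    print(s2, s2_AB, s2_A, s2_B)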
4.6 Three-factor design
The previous sections have considered the two-way ANOVA for a two-factor design. The
models and methods can however be extended to experiments involving three or more factors.
For example, the three-way ANOVA is for situations where there are three factors.
We explain the model and the ANOVA table. More details for three and more factors are
presented in the module Design of Experiments which is offered in Semester B.

Suppose that there are three factors A, B and C with $n_A$, $n_B$ and $n_C$ levels respectively.
The treatments are again the combinations of the levels of the factors and so in total there
are $t = n_A n_B n_C$ of them. Also we assume that each combination or treatment is replicated
$r$ times. We only consider the fixed effects model.
The model for the three-factor fixed effects design is
\[ y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}
+ (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}, \]
where $y_{ijkl}$ is the response from the $l$th replicate with level $i$ of A, level $j$ of B and level $k$
of C, for $i = 1, \dots, n_A$, $j = 1, \dots, n_B$, $k = 1, \dots, n_C$, $l = 1, \dots, r$, $\mu$ is the overall mean,
$\alpha_i$ is a parameter for the main effect of factor A at level $i$, $\beta_j$ is a parameter for the main
effect of factor B at level $j$ and $\gamma_k$ is a parameter for the main effect of factor C at level $k$,
$(\alpha\beta)_{ij}$ is a parameter for the two-factor interaction of factors A and B at levels $i$ and $j$
respectively, $(\alpha\gamma)_{ik}$ is a parameter for the two-factor interaction of factors A and C at levels
$i$ and $k$ respectively and $(\beta\gamma)_{jk}$ is a parameter for the two-factor interaction of factors B and
C at levels $j$ and $k$ respectively, $(\alpha\beta\gamma)_{ijk}$ is a parameter for the three-factor interaction of
factors A, B and C at levels $i$, $j$ and $k$ respectively, which measures how much any two-factor
interaction differs across levels of the third factor, and $\varepsilon_{ijkl} \stackrel{iid}{\sim} N(0, \sigma^2)$.
As before, we start by testing the highest order term, here the three-way interaction, the
null hypothesis being
\[ H_0^{ABC}: (\alpha\beta\gamma)_{111} = \dots = (\alpha\beta\gamma)_{n_A n_B n_C}. \]
If this is not rejected, then we test hypotheses for the two-factor interactions,
\[ H_0^{AB}: (\alpha\beta)_{11} = \dots = (\alpha\beta)_{n_A n_B}, \]
\[ H_0^{AC}: (\alpha\gamma)_{11} = \dots = (\alpha\gamma)_{n_A n_C} \]
and
\[ H_0^{BC}: (\beta\gamma)_{11} = \dots = (\beta\gamma)_{n_B n_C}. \]
If we fail to reject either of the null hypotheses concerning the interactions involving factor
A, $H_0^{AB}$ and $H_0^{AC}$, then we test
\[ H_0^A: \alpha_1 = \dots = \alpha_{n_A}. \]
If we fail to reject either of the null hypotheses concerning the interactions involving factor
B, $H_0^{AB}$ and $H_0^{BC}$, then we test
\[ H_0^B: \beta_1 = \dots = \beta_{n_B}. \]
If we fail to reject either of the null hypotheses concerning the interactions involving factor
C, $H_0^{AC}$ and $H_0^{BC}$, then we test
\[ H_0^C: \gamma_1 = \dots = \gamma_{n_C}. \]
The analysis of variance for a three-factor design is obtained as in the following table:
Source     SS       df                       MS       F
A          S_A      n_A - 1                  M_A      F_A = M_A / M_E
B          S_B      n_B - 1                  M_B      F_B = M_B / M_E
C          S_C      n_C - 1                  M_C      F_C = M_C / M_E
AB         S_AB     (n_A-1)(n_B-1)           M_AB     F_AB = M_AB / M_E
AC         S_AC     (n_A-1)(n_C-1)           M_AC     F_AC = M_AC / M_E
BC         S_BC     (n_B-1)(n_C-1)           M_BC     F_BC = M_BC / M_E
ABC        S_ABC    (n_A-1)(n_B-1)(n_C-1)    M_ABC    F_ABC = M_ABC / M_E
Residual   S_E      n - t                    M_E
Total      S_G      n - 1
The distributions of the test statistics under the null hypotheses follow the pattern of the
ANOVA as usual. Under $H_0^{ABC}$, $F_{ABC} \sim F_{(n_A-1)(n_B-1)(n_C-1),\,n-t}$. Under $H_0^{AB}$,
$F_{AB} \sim F_{(n_A-1)(n_B-1),\,n-t}$, under $H_0^{AC}$, $F_{AC} \sim F_{(n_A-1)(n_C-1),\,n-t}$ and under $H_0^{BC}$,
$F_{BC} \sim F_{(n_B-1)(n_C-1),\,n-t}$. Under $H_0^A$, $F_A \sim F_{n_A-1,\,n-t}$, under $H_0^B$, $F_B \sim F_{n_B-1,\,n-t}$
and under $H_0^C$, $F_C \sim F_{n_C-1,\,n-t}$.
The sums of squares are:
\[ S_G = \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \sum_{k=1}^{n_C} \sum_{l=1}^r (y_{ijkl} - \bar y_{....})^2, \]
\[ S_A = n_B n_C r \sum_{i=1}^{n_A} (\bar y_{i...} - \bar y_{....})^2, \]
\[ S_B = n_A n_C r \sum_{j=1}^{n_B} (\bar y_{.j..} - \bar y_{....})^2, \]
\[ S_C = n_A n_B r \sum_{k=1}^{n_C} (\bar y_{..k.} - \bar y_{....})^2, \]
\[ S_{AB} = n_C r \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} (\bar y_{ij..} - \bar y_{i...} - \bar y_{.j..} + \bar y_{....})^2, \]
\[ S_{AC} = n_B r \sum_{i=1}^{n_A} \sum_{k=1}^{n_C} (\bar y_{i.k.} - \bar y_{i...} - \bar y_{..k.} + \bar y_{....})^2, \]
\[ S_{BC} = n_A r \sum_{j=1}^{n_B} \sum_{k=1}^{n_C} (\bar y_{.jk.} - \bar y_{.j..} - \bar y_{..k.} + \bar y_{....})^2, \]
\[ S_{ABC} = r \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \sum_{k=1}^{n_C}
(\bar y_{ijk.} - \bar y_{ij..} - \bar y_{i.k.} - \bar y_{.jk.} + \bar y_{i...} + \bar y_{.j..} + \bar y_{..k.} - \bar y_{....})^2, \]
where, as before, $\bar y_{....}$ is the overall mean, $\bar y_{i...}$ is the mean of observations with the $i$th level
of factor A, $\bar y_{ij..}$ is the mean of observations with the $i$th level of A and the $j$th level of B,
$\bar y_{ijk.}$ is the mean of observations with the $i$th level of A, the $j$th level of B and the $k$th level
of C and other terms are defined in a similar way. As usual,
\[ S_E = S_G - S_A - S_B - S_C - S_{AB} - S_{AC} - S_{BC} - S_{ABC}. \]
Degrees of freedom follow the same pattern as with two-factor designs. Mean squares are
obtained by dividing the sums of squares by the degrees of freedom. F-ratios for each effect
are obtained by dividing its mean square by the residual mean square.
In Practical 6, we saw that in order to carry out the two-way ANOVA for a two-factor design
with factors A and B, in GenStat the treatment structure has to be specified as

    A*B

which tells GenStat to use a model that includes the main effect of A, the main effect of
B and the interaction of A and B. The corresponding treatment structure for a three-way
ANOVA with factors A, B and C is

    A*B*C

which is a short-hand notation for the following GenStat expression:

    A + B + C + A.B + A.C + B.C + A.B.C

The various terms in the model specified by this expression have the following meaning:
The single letters represent the main effects of A, B and C. The term A.B stands for the
two-factor interaction of the factors A and B. The term A.B.C represents the three-factor
interaction of A, B and C.
We can again use means plots to get an idea of which effects are important. For example, a
plot showing the means for the combinations of the levels of A with the levels of B may
indicate that there is or is not an A.B interaction. Unfortunately, the A.B.C interaction
cannot be visualized in this way, and so we can only look at the means plots for the three
possible pairs of factors.
Example 4.5 Exercise
The number of minutes of exercise on a bicycle until a person is fatigued was recorded. The
three factors were gender (A), amount of body fat (B) and smoking history (C). Each factor
has two levels. The data from a factorial design where each combination is replicated three
times are given below.
                                      Smoking History
                                      Light       Heavy
                          Male      24 29 25    18 19 23
               Low Fat
                          Female    20 22 18    15 10 11
Amount of
Body Fat                  Male      15 15 12    15 20 13
               High Fat
                          Female    16  9 11    10 14  6
We first look at the GenStat means plots for the three possible pairs of factors in Figure 1.
The lines in plots involving the factor gender are nearly parallel and hence suggest that there
is no interaction between gender and amount of body fat or gender and smoking history. On
the other hand, the lines in the means plot for body fat and smoking history are not parallel
which suggests that there may be an interaction between the two factors.

[Figure 1: Means plots for Example 4.5, one panel for each of the three pairs of factors;
plots omitted here.]
The ANOVA table from GenStat is as follows:
Source of variation d.f. s.s. m.s. v.r. F pr.
gender 1 181.500 181.500 20.74 <.001
fat 1 253.500 253.500 28.97 <.001
smoking 1 73.500 73.500 8.40 0.010
gender.fat 1 13.500 13.500 1.54 0.232
gender.smoking 1 13.500 13.500 1.54 0.232
fat.smoking 1 73.500 73.500 8.40 0.010
gender.fat.smoking 1 1.500 1.500 0.17 0.684
Residual 16 140.000 8.750
Total 23 750.500
To interpret the results we start with looking at the three-factor interaction. This allows
us to answer questions such as if the interaction between, for example, fat and smoking is
the same at each level of gender, that is for men and women. Similarly, if the interaction
between gender and smoking was different at different levels of fat then there would be a
three-factor interaction. Since the p-value of 0.684 is very large, clearly there is no evidence
for a three-factor interaction here. Had the three-factor interaction been significant, then
it would not make much sense to look at the other tests, because it would be difficult to
interpret the effects. In this case we would report a table of means for the eight treatment
combinations.

Since the three-factor interaction is not significant we continue with looking at the results
for the three two-factor interactions. At the 5% level of significance only the two-factor
interaction for amount of body fat and smoking history is significant. This means that the
effects of fat and smoking cannot be described in simple terms. In other words, when trying
to describe the effects of fat and smoking it is not possible to say how the time until a person
gets fatigued depends on the amount of body fat without taking into account the smoking
history and vice versa. Thus we would report a table of the four means of the responses for
the combinations of amount of body fat and smoking history.

The two two-factor interactions involving gender are not significant at the 5% level. That
means that the effect of gender is neither influenced by the amount of body fat nor by the
smoking history. In other words, the effect of gender is additive. Considering men instead of
women affects the response in (roughly) the same way no matter what the levels of amount
of body fat and smoking history are.

That gender matters is further supported by the fact that the corresponding main effect
is significant. Looking at this main effect makes sense, because gender does not interact
with the other factors. Main effects of fat and smoking history are also significant at the 5%
level, but because of the interaction of those two factors, these main effects do not have a
clear interpretation.
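The GenStat table above can be reproduced with other software. A minimal sketch in Python
with statsmodels (assumed here for illustration) builds a data frame from the data of
Example 4.5 and fits the full factorial model:

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    cells = {
        ('male',   'low',  'light'): [24, 29, 25],
        ('male',   'low',  'heavy'): [18, 19, 23],
        ('female', 'low',  'light'): [20, 22, 18],
        ('female', 'low',  'heavy'): [15, 10, 11],
        ('male',   'high', 'light'): [15, 15, 12],
        ('male',   'high', 'heavy'): [15, 20, 13],
        ('female', 'high', 'light'): [16, 9, 11],
        ('female', 'high', 'heavy'): [10, 14, 6],
    }
    rows = [{'gender': g, 'fat': f, 'smoking': s, 'minutes': y}
            for (g, f, s), ys in cells.items() for y in ys]
    exercise = pd.DataFrame(rows)

    # the formula mirrors the GenStat treatment structure gender*fat*smoking
    model = smf.ols('minutes ~ C(gender) * C(fat) * C(smoking)',
                    data=exercise).fit()
    print(sm.stats.anova_lm(model))    # reproduces the ANOVA table above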
5 Nested Designs
5.1 Introduction
In the factorial models discussed so far, every level of one factor appears with every level of
every other factor. The factors are said to be crossed. An alternative situation is when the
factors are nested.

In this chapter we consider two-way nested designs or classifications where there are again
two factors A and B. The factor B is said to be nested within A if every level of B appears
with only a single level of A in the design. This means that if A has $n_A$ levels then for every
$i = 1, \dots, n_A$ there is a distinct set $J_i$ of $n_B$ levels of B such that the levels of B in $J_i$ only
appear with the level $i$ of A. Thus in total there are $n_A n_B$ levels of B. An example may help
to clarify how this works.
Example 5.1 Suppliers.
A company purchases raw material from three different suppliers. The company wishes to
determine if the purity of the raw material is the same from each supplier. There are four
batches of raw material available from each supplier and three determinations of purity are
taken from each batch.

Here the factor A is supplier and has $n_A = 3$ levels. The second factor B is batch which
is nested within A since for each supplier there are four batches of raw material. Batch 1
from supplier 1 is however not related in any way to batch 1 from supplier 2 or batch 1 from
supplier 3 and a similar statement applies if we look at batch 2 or batch 3. Note the difference
with a factorial design where batch 1 would have to be the same across all three suppliers.
Similarly, batch 2 (and batch 3) would need to be the same for each of the three suppliers.
□
We can also represent a nested design by drawing a picture. The diagram below is for the
situation in Example 5.1 but for other nested two-factor designs it would look similar. Note
that in addition to the factors, the replications corresponding to the three determinations of
purity from each batch are shown.

Supplier             1                  2                  3
Batch           1   2   3   4      1   2   3   4      1   2   3   4
Replication        replications 1, 2 and 3 under every batch

Figure 2: Schematic representation of two-factor nested design in Example 5.1
The main difference with factorial designs where the factors are crossed is that nesting
removes the connections between the experimental units that exist in a factorial design. For
instance, in the bread Example 4.2 the level 2 of the width factor B always referred to wide
shelves, regardless of the level of A with which it was paired. In Example 5.1 however, the
second batch from supplier 1 has nothing to do with the second batch from supplier 2 or the
second batch from supplier 3.

To keep the notation simple, for a design where B is nested within A we denote the responses
by $y_{ijk}$ as in Chapter 4. Here the subscript $i$ indicates the level of factor A for $i = 1, \dots, n_A$.
The subscript $j$ stands for the level of B and it runs from $j = 1, \dots, n_B$. We assume that
each combination of a level $i$ of A with a level $j$ of B is replicated $r$ times. The subscript $k$
then indicates the replication for $k = 1, \dots, r$. Note however that two responses such as $y_{131}$
and $y_{232}$ which have the same subscript for factor B do in fact use different levels of B due
to the nesting.

Obviously, these ideas can be extended to more than two levels of nesting. We will however
only look at two-factor nested designs. We consider fixed effects, random effects and mixed
models. As in Chapter 4, sums of squares, degrees of freedom and mean squares in the
ANOVA table are the same, but there are differences in the tests that we carry out.
5.2 Nested fixed effects
The fixed effects model for the two-factor nested design, which is also known as the
two-stage nested hierarchical classification, is
\[ y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk} \]
for $i = 1, \dots, n_A$, $j = 1, \dots, n_B$ and $k = 1, \dots, r$. In this equation, the $\alpha_i$ and $\beta_{j(i)}$ parameters
are unknown constants and the $\varepsilon_{ijk}$ are random errors. As for the other fixed effects models
we have considered, it is assumed that $\varepsilon_{ijk} \sim N(0, \sigma^2)$ and that all the random errors are
independent. Further, it is assumed that $\sum_{i=1}^{n_A} \alpha_i = 0$ and $\sum_{j=1}^{n_B} \beta_{j(i)} = 0$ for every $i$.
For the ANOVA we start with calculating the total sum of squares $S_G$. This is done as for
the crossed two-factor design in Section 4.3 and so
\[ S_G = \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \sum_{k=1}^r y_{ijk}^2 - \frac{G^2}{n}. \]
Also the treatment sum of squares and the residual sum of squares are calculated as in
Section 4.3. Hence the treatment sum of squares is
\[ S_T = \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \frac{T_{ij}^2}{r} - \frac{G^2}{n}, \]
where as before $T_{ij}$ is the total for the combination of the level $i$ of A and the level $j$ of B.
Note however that for every $i$ the total $T_{ij}$ is only computed for the levels $j$ of B within the
level $i$ of A. The residual sum of squares is then
\[ S_E = S_G - S_T. \]
Next, the treatment sum of squares is decomposed into the sum of squares due to factor A
and the sum of squares due to factor B nested within A. Again, the former of these
is calculated as in Section 4.3 and hence given by
\[ S_A = \sum_{i=1}^{n_A} \frac{A_i^2}{n_B r} - \frac{G^2}{n}, \]
where $A_i$ is the total for level $i$ of A.

Subtracting $S_A$ from $S_T$ then gives the sum of squares due to factor B nested within A,
denoted by $S_{B(A)}$. We thus have
\[ S_{B(A)} = S_T - S_A. \]
The ANOVA table for the fixed effects model is given below.

Source              SS       df        MS       F
Factor A            S_A      n_A - 1   M_A      F_A = M_A / M_E
Factor B within A   S_B(A)   t - n_A   M_B(A)   F_B(A) = M_B(A) / M_E
Residual            S_E      n - t     M_E
Total               S_G      n - 1
Appropriate test statistics for testing the effect of A and the effect of B within A are again
motivated by looking at the expected mean squares. The assumptions made for the fixed
effects model imply that
\[ E(M_A) = \sigma^2 + \frac{n_B r}{n_A - 1} \sum_{i=1}^{n_A} \alpha_i^2 \]
and
\[ E(M_{B(A)}) = \sigma^2 + \frac{r}{t - n_A} \sum_{i=1}^{n_A} \sum_{j=1}^{n_B} \beta_{j(i)}^2, \]
where $t = n_A n_B$. Also, as usual, we have $E(M_E) = \sigma^2$.

For the effect of factor A we test the null hypothesis $H_0^A: \alpha_1 = \dots = \alpha_{n_A}$ by means of
the test statistic $F_A = M_A/M_E$ which has an $F_{n_A-1,\,n-t}$ distribution if $H_0^A$ is true, where
$n = n_A n_B r$ and $t = n_A n_B$. Similarly, for B we test $H_0^B: \beta_{1(i)} = \dots = \beta_{n_B(i)}$ for all $i$ by using
$F_{B(A)} = M_{B(A)}/M_E$ which has an $F_{t-n_A,\,n-t}$ distribution if $H_0^B$ is true.
Example 5.2 Schools.
Suppose we wish to investigate differences in test scores for three selected schools and two
named teachers. Each teacher teaches two different classes.

In this case, schools can be regarded as a factor A with $n_A = 3$ levels and teacher is a
second factor B nested within A. Within each school $n_B = 2$ different levels of B are used
and the replication is $r = 2$. The table below gives the average test scores for each class.

School   Teacher               Total
   1        1     25   29       79
            2     14   11
   2        1     11    6       57
            2     22   18
   3        1     17   20       44
            2      5    2
                               180
Since we are interested in the specific schools and the named teachers we use the fixed
effects model to analyze the data. The grand total is $G = 180$ and so the correction factor
is equal to $G^2/12 = 2700$. The total sum of squares is

$$S_G = 25^2 + 14^2 + \ldots + 2^2 - \frac{180^2}{12} = 3466 - 2700 = 766$$

and for the treatment sum of squares we obtain

$$S_T = \frac{1}{2}(54^2 + 25^2 + 17^2 + 40^2 + 37^2 + 7^2) - 2700 = 3424 - 2700 = 724.$$

Hence the residual sum of squares is

$$S_E = 766 - 724 = 42.$$
Next, the sum of squares due to schools is

$$S_A = \frac{79^2 + 57^2 + 44^2}{4} - \frac{180^2}{12} = 2856.5 - 2700 = 156.5$$

and, from this and the treatment sum of squares, the sum of squares due to teachers nested
within schools is obtained as

$$S_{B(A)} = 724 - 156.5 = 567.5.$$
Hence, the ANOVA table is given below.

Source                     SS       df    MS       F
Schools                    156.5     2     78.25   11.2
Teachers within Schools    567.5     3    189.17   27.0
Residual                    42.0     6      7.00
Total                      766.0    11
Carrying out tests at the 5% level, for schools the value of $F_A = 11.2$ is compared
with $F_{2,6,0.05} = 5.143$ from Table 12(b) of the New Cambridge Statistical Tables. Since
$F_A > F_{2,6,0.05}$ we reject the null hypothesis of no effect due to schools at the 5% level.
Similarly, for teachers within schools $F_{B(A)} = 27.0$ is compared with $F_{3,6,0.05} = 4.757$, which
again leads to a rejection of the null hypothesis of no differences between teachers within
schools.
We can also find the p-values. From tables, we have $F_{2,6,0.01} = 10.92$ and so the p-value
for schools is $P < 0.01$; that is, there is strong evidence of differences. Again, from tables,
we have $F_{3,6,0.001} = 23.70$. Thus, the p-value for teachers is $P < 0.001$, and so there is very
strong evidence for differences between teachers at the same school. The finding regarding
differences between teachers may be of particular interest since it may indicate that some
action is needed.
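Although the course uses GenStat for these computations, the arithmetic of Example 5.2 is easy to check directly. The following sketch, written in Python with NumPy and SciPy rather than GenStat, reproduces the sums of squares, the F statistics and the p-values above; the array layout and variable names are our own choices, not part of the course material.

```python
import numpy as np
from scipy import stats

# scores[school][teacher] holds the r = 2 class averages from Example 5.2
scores = np.array([[[25, 29], [14, 11]],
                   [[11,  6], [22, 18]],
                   [[17, 20], [ 5,  2]]], dtype=float)
nA, nB, r = scores.shape                  # 3 schools, 2 teachers, 2 classes
n, t = nA * nB * r, nA * nB               # n = 12 observations, t = 6 classes

G = scores.sum()                          # grand total, 180
CF = G**2 / n                             # correction factor, 2700
SG = (scores**2).sum() - CF                               # total SS, 766
ST = (scores.sum(axis=2)**2 / r).sum() - CF               # treatment SS, 724
SA = (scores.sum(axis=(1, 2))**2 / (nB * r)).sum() - CF   # schools SS, 156.5
SBA, SE = ST - SA, SG - ST                # 567.5 and 42

MA, MBA, ME = SA / (nA - 1), SBA / (t - nA), SE / (n - t)
FA, FBA = MA / ME, MBA / ME               # fixed effects: both tested against M_E
print(FA, stats.f.sf(FA, nA - 1, n - t))      # about 11.2, p < 0.01
print(FBA, stats.f.sf(FBA, t - nA, n - t))    # about 27.0, p < 0.001
```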
For the fixed effects model, in addition to carrying out the ANOVA we may wish to compare
group means corresponding to certain levels of the fixed factors. For example, in the fixed
effects analysis in Example 5.2 interest could be in comparing schools 1 and 2.
We may wish to find confidence intervals (CIs) for
- the factor level means of the fixed factor;
- differences between two such means.
We may also do pre-planned comparisons of group means for two levels of the fixed factor,
which can be done in a way similar to the pre-planned comparisons in Chapter 2.
If both A and B have fixed effects then the model assumptions imply that

$$Var(\bar{y}_{i..}) = \frac{\sigma^2}{n_B r},$$

which can be estimated by

$$\widehat{Var}(\bar{y}_{i..}) = \frac{M_E}{n_B r}.$$

Thus, a $100(1-\alpha)\%$ confidence interval for the population mean $E(\bar{y}_{i..}) = \mu + \alpha_i$ for level $i$
of A is

$$\bar{y}_{i..} \pm t_{n-t,\,\alpha/2} \sqrt{\frac{M_E}{n_B r}}$$

and a $100(1-\alpha)\%$ confidence interval for the difference $E(\bar{y}_{i..}) - E(\bar{y}_{i'..}) = \alpha_i - \alpha_{i'}$ of the
population means for levels $i$ and $i'$ is

$$\bar{y}_{i..} - \bar{y}_{i'..} \pm t_{n-t,\,\alpha/2} \sqrt{\frac{2M_E}{n_B r}},$$

where as before $n = n_A n_B r$ and $t = n_A n_B$. Note that $\sqrt{2M_E/(n_B r)}$ is the standard error of the
difference (SED).
Example 5.3 Schools example revisited.
We see that $\bar{y}_{1..} = 79/4 = 19.75$, $M_E = 7.00$, $n_B = 2$ and $r = 2$. Thus, a 95% confidence
interval for the mean test score for school 1 is

$$19.75 \pm t_{6,0.025} \sqrt{\frac{7.00}{4}} = 19.75 \pm 2.447 \times \frac{\sqrt{7.00}}{2} = 19.75 \pm 3.24, \text{ or } (16.51, 22.99).$$

Also, since $\bar{y}_{2..} = 57/4 = 14.25$, a 95% confidence interval for the mean difference in test
scores between schools 1 and 2 is

$$(19.75 - 14.25) \pm 2.447 \sqrt{\frac{7.00}{2}} = 5.50 \pm 4.58, \text{ or } (0.92, 10.08). \qquad \square$$
Note that the fact that the 95% confidence interval for $\alpha_i - \alpha_{i'}$ in Example 5.3 does not
contain the value 0 is equivalent to rejecting, at the 5% significance level, the null hypothesis
that the means for schools 1 and 2 are equal by means of the test statistic

$$T = \frac{\bar{y}_{i..} - \bar{y}_{i'..}}{SED},$$

which is similar to the statistic for pre-planned comparisons in Section 2.5.
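As a quick numerical check of Example 5.3 and of the statistic $T$, the following lines (an illustrative Python sketch, with $M_E = 7.00$ taken from the ANOVA table) reproduce both intervals and the equivalent t test.

```python
from scipy import stats

ME, nB, r, n, t = 7.00, 2, 2, 12, 6
ybar1, ybar2 = 79 / 4, 57 / 4             # school means, 19.75 and 14.25
tq = stats.t.ppf(0.975, n - t)            # t_{6,0.025} = 2.447

half = tq * (ME / (nB * r)) ** 0.5        # half-width for a single mean
print(ybar1 - half, ybar1 + half)         # (16.51, 22.99)

SED = (2 * ME / (nB * r)) ** 0.5          # standard error of the difference
diff = ybar1 - ybar2                      # 5.50
print(diff - tq * SED, diff + tq * SED)   # (0.92, 10.08)
print(diff / SED)                         # T = 2.94 > 2.447, so reject at 5%
```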
Assuming that schools and teachers both have fixed effects may not be a particularly ap-
propriate assumption since, for example, the results in Example 5.2 then only apply to the
three specific schools and the two teachers per school who participated in the study. It
may be more realistic to assume that teachers or schools or both have random effects. This
motivates the consideration of random effects and mixed effects versions of the model for a
nested design, to which we turn next.
5.3 Mixed effects models

The model equation for the mixed model where the factor A has fixed effects and the nested
factor B within A has random effects is given by

$$y_{ijk} = \mu + \alpha_i + b_{j(i)} + \epsilon_{ijk},$$

where the constant $\alpha_i$ represents the effect of the $i$th level of A and $b_{j(i)}$ represents the effect
of the $j$th level of factor B within the $i$th level of A. Here, it is assumed that $\sum_{i=1}^{n_A} \alpha_i = 0$
and that $b_{j(i)} \sim N(0, \sigma_B^2)$. Moreover, all the $b_{j(i)}$ are assumed to be mutually independent
and also to be independent of the random errors $\epsilon_{ijk} \sim N(0, \sigma^2)$.
The ANOVA table for the mixed effects model is given below.

Source               SS        df        MS        F
Factor A             S_A       n_A - 1   M_A       F_A = M_A / M_B(A)
Factor B within A    S_B(A)    t - n_A   M_B(A)    F_B(A) = M_B(A) / M_E
Residual             S_E       n - t     M_E
Total                S_G       n - 1

This has exactly the same sums of squares and degrees of freedom as the ANOVA for the
fixed effects model, but $F_A$ is different.
It can then be shown that the expected mean squares for the mixed effects model are as
follows:

$$E(M_A) = \sigma^2 + \frac{n_B r}{n_A - 1} \sum_{i=1}^{n_A} \alpha_i^2 + r\sigma_B^2, \qquad E(M_{B(A)}) = \sigma^2 + r\sigma_B^2, \qquad E(M_E) = \sigma^2.$$
These show that we can get unbiased estimators of the variance components as follows:

$$\hat{\sigma}^2 = M_E \qquad \text{and} \qquad \hat{\sigma}_B^2 = \frac{M_{B(A)} - M_E}{r}.$$
If A has fixed and B has random effects, then the null hypothesis for A is $H_0^A: \alpha_i = 0$ for
all $i$. Motivated by the expected mean squares given above, we use

$$F_A = \frac{M_A}{M_{B(A)}}$$

to test for the effect of A. Under $H_0^A$, this has an $F_{n_A - 1,\, t - n_A}$ distribution, where $n = n_A n_B r$
is the total number of observations and $t = n_A n_B$ as before.
The null hypothesis for B is $H_0^B: \sigma_B^2 = 0$. From the expected mean squares we can see that
for B we should use the test statistic

$$F_{B(A)} = \frac{M_{B(A)}}{M_E}.$$

Under $H_0^B$, this follows the $F_{t - n_A,\, n - t}$ distribution.
Example 5.4 Supplier example revisited.
Recall that there are four batches of raw material from each supplier and three determinations
of purity are taken from each batch. We assume that the batches were chosen at random.
The data in appropriate units are given below.

Supplier        1                  2                  3
Batch       1   2   3   4      1   2   3   4      1   2   3   4
            1  -2  -2   1      1   0  -1   0      2  -2   1   3
           -1  -3   0   4     -2   4   0   3      4   0  -1   2
            0  -4   1   0     -3   2  -2   2      0   2   2   1
Total       0  -9  -1   5     -4   6  -3   5      6   0   2   6
Since we are interested in comparing the specific suppliers (A) and since the batches (B)
were chosen at random, we use the mixed model for the nested design.
The supplier totals are $A_1 = -5$, $A_2 = 4$ and $A_3 = 14$, so that the grand total is $G = 13$
and the correction factor is equal to $G^2/36 = 4.69$. The total sum of squares is

$$S_G = 153 - 4.69 = 148.31.$$
Next, the treatment sum of squares is

$$S_T = \frac{1}{3}\left(0^2 + (-9)^2 + (-1)^2 + 5^2 + (-4)^2 + 6^2 + (-3)^2 + 5^2 + 6^2 + 0^2 + 2^2 + 6^2\right) - 4.69 = 89.67 - 4.69 = 84.98$$

and so the residual sum of squares is equal to

$$S_E = 148.31 - 84.98 = 63.33.$$
The sum of squares due to suppliers is

$$S_A = \frac{1}{12}\left((-5)^2 + 4^2 + 14^2\right) - 4.69 = 19.75 - 4.69 = 15.06$$

and the sum of squares due to batches nested within suppliers is

$$S_{B(A)} = 84.98 - 15.06 = 69.92.$$
Hence, the ANOVA table is given below.

Source                      SS        df    MS     F
Suppliers                   15.06      2    7.53   0.97
Batches within Suppliers    69.92      9    7.77   2.94
Residual                    63.33     24    2.64
Total                      148.31     35
The test statistic $F_A = 0.97$ for suppliers is smaller than $F_{2,9,0.05} = 4.256$, so there is no
evidence of differences between suppliers at the 5% level of significance. On the other hand,
$F_{B(A)} = 2.94$ is greater than $F_{9,24,0.05} = 2.30$, which indicates that at the 5% level there is
variability due to batches within suppliers. Note that the latter result not only applies to
the actual batches used in the study but more generally to the populations of batches that
could be obtained from each of the three suppliers.
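The mixed-model calculations in Example 5.4 can be verified in the same way. The sketch below is again Python rather than GenStat; the signs in the data array are reconstructed so that the batch and supplier totals agree with the calculations above.

```python
import numpy as np

# purity[supplier][batch] holds the r = 3 determinations (coded units)
purity = np.array([[[ 1, -1,  0], [-2, -3, -4], [-2,  0,  1], [1, 4, 0]],
                   [[ 1, -2, -3], [ 0,  4,  2], [-1,  0, -2], [0, 3, 2]],
                   [[ 2,  4,  0], [-2,  0,  2], [ 1, -1,  2], [3, 2, 1]]],
                  dtype=float)
nA, nB, r = purity.shape        # 3 suppliers, 4 batches, 3 determinations
n, t = nA * nB * r, nA * nB
G = purity.sum()                # grand total, 13
CF = G**2 / n                   # correction factor, 4.69

SG = (purity**2).sum() - CF                               # 148.31
ST = (purity.sum(axis=2)**2 / r).sum() - CF               # 84.98
SA = (purity.sum(axis=(1, 2))**2 / (nB * r)).sum() - CF   # 15.06
SBA, SE = ST - SA, SG - ST                                # 69.92 and 63.33

MA, MBA, ME = SA / (nA - 1), SBA / (t - nA), SE / (n - t)
print(MA / MBA)                 # mixed model: F_A = M_A / M_B(A) = 0.97
print(MBA / ME)                 # F_B(A) = M_B(A) / M_E = 2.94
print(ME, (MBA - ME) / r)       # variance component estimates, 2.64 and 1.71
```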
For the model with fixed effects of A and random effects of B, in addition to carrying out
the ANOVA we may wish to compare group means corresponding to certain levels of the
fixed factor. For example, in the analysis in Example 5.4 interest could be in
the mean difference between suppliers 1 and 2.
We may wish to find confidence intervals (CIs) for
- the factor level means of the fixed factor;
- differences between two such means.
We may also do pre-planned comparisons of group means for two levels of the fixed factor,
which can be done in a way similar to the pre-planned comparisons in Chapter 2.
If A has fixed and B has random effects, then the mixed model assumptions imply that

$$Var(\bar{y}_{i..}) = \frac{\sigma^2 + r\sigma_B^2}{n_B r}$$

and, since $E(M_{B(A)}) = \sigma^2 + r\sigma_B^2$, this can be estimated unbiasedly by

$$\widehat{Var}(\bar{y}_{i..}) = \frac{M_{B(A)}}{n_B r}.$$

Now, a $100(1-\alpha)\%$ confidence interval for $E(\bar{y}_{i..}) = \mu + \alpha_i$ for level $i$ of A is

$$\bar{y}_{i..} \pm t_{t - n_A,\,\alpha/2} \sqrt{\frac{M_{B(A)}}{n_B r}}$$

and a $100(1-\alpha)\%$ confidence interval for the difference $E(\bar{y}_{i..}) - E(\bar{y}_{i'..}) = \alpha_i - \alpha_{i'}$ of the
population means for levels $i$ and $i'$ is

$$\bar{y}_{i..} - \bar{y}_{i'..} \pm t_{t - n_A,\,\alpha/2} \sqrt{\frac{2M_{B(A)}}{n_B r}}.$$

Here the SED is $\sqrt{2M_{B(A)}/(n_B r)}$. Again, $T = (\bar{y}_{i..} - \bar{y}_{i'..})/SED$ can be used to test if the two means are
equal. In the mixed effects model, however, this has a $t_{t - n_A}$ distribution if the null hypothesis
of no differences is true.
Example 5.5 Suppliers example revisited.
Recall that $\bar{y}_{1..} = -5/12 = -0.42$ and $M_{B(A)} = 7.77$. Thus, a 95% confidence interval for
the mean purity for supplier 1 is

$$-0.42 \pm t_{9,0.025} \sqrt{\frac{7.77}{12}} = -0.42 \pm 2.262 \times 0.805 = -0.42 \pm 1.82,$$

that is, $(-2.24, 1.40)$.
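A short numerical check of this interval (a Python sketch, with $M_{B(A)} = 7.77$ taken from the ANOVA table):

```python
from scipy import stats

MBA, nB, r, t, nA = 7.77, 4, 3, 12, 3
ybar1 = -5 / 12                          # supplier 1 mean, about -0.42
tq = stats.t.ppf(0.975, t - nA)          # t_{9,0.025} = 2.262
half = tq * (MBA / (nB * r)) ** 0.5      # 2.262 * 0.805 = 1.82
print(ybar1 - half, ybar1 + half)        # about (-2.24, 1.40)
```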
In addition to the mixed model we have considered, there is another version of the model
where the non-nested factor A has random and the nested factor B has fixed effects. We do
not present any details for this model, since it is unusual in practice, but only note that in
this case the effects of the factors are tested by using exactly the same test statistics as for
the fixed effects model.
5.4 Nested random effects

In the random effects model all of the fixed effects parameters are replaced by random
variables. The random effects model equation is thus

$$y_{ijk} = \mu + a_i + b_{j(i)} + \epsilon_{ijk},$$

where the random variable $a_i$ represents the effect of the $i$th level of factor A and $b_{j(i)}$
represents the effect of the $j$th level of factor B within the $i$th level of A. All other terms in
the equation are defined as before. It is assumed that $a_i \sim N(0, \sigma_A^2)$, $b_{j(i)} \sim N(0, \sigma_B^2)$ and
$\epsilon_{ijk} \sim N(0, \sigma^2)$, all independent.
The random effects model is appropriate when the levels of the factors are randomly chosen
or can be assumed to have been randomly chosen from a larger population of possible levels.
For example, in a study similar to the one in Example 5.2 the schools could have been
selected at random from all schools in the borough of Tower Hamlets. Likewise, within each
school the teachers could have been chosen at random from all teachers in the school.
It can then be shown that the expected mean squares for the random effects model are as
follows:

$$E(M_A) = \sigma^2 + n_B r \sigma_A^2 + r\sigma_B^2, \qquad E(M_{B(A)}) = \sigma^2 + r\sigma_B^2, \qquad E(M_E) = \sigma^2.$$
When A is random and B is random, for the effect of A we test $H_0^A: \sigma_A^2 = 0$. Motivated by
the expected mean squares, we use

$$F_A = \frac{M_A}{M_{B(A)}},$$

which under $H_0^A$ follows the $F_{n_A - 1,\, t - n_A}$ distribution, where $n = n_A n_B r$ is the total number of
observations and $t = n_A n_B$ as before. For the effect of B the null hypothesis is $H_0^B: \sigma_B^2 = 0$.
From the expected mean squares we use the test statistic

$$F_{B(A)} = \frac{M_{B(A)}}{M_E}.$$

Under $H_0^B$, this has an $F_{t - n_A,\, n - t}$ distribution.
6 Linear Algebra approach to ANOVA

In the previous chapters we have been concerned with the analysis of various ANOVA models.
We now show how the analysis of such models can be treated within the framework of Linear
Algebra.
The key concept from Linear Algebra I which we will use in this context is the orthogonal
projection of a vector onto a subspace of the vector space $\mathbb{R}^n$. We also use the scalar
product $v'u$ and the length $\|v\| = \sqrt{v'v}$.
Let $U$ be a subspace of $\mathbb{R}^n$ with $\dim(U) = d$ and let $\{u_1, \ldots, u_d\}$ be an orthogonal basis
of $U$. Then, from Linear Algebra I, it is known that for any vector $v \in \mathbb{R}^n$ the orthogonal
projection of $v$ onto $U$, which we denote by $P_U v$, is defined by

$$P_U v = \sum_{i=1}^{d} \frac{v'u_i}{u_i'u_i} u_i.$$

The orthogonal projection $P_U v$ of $v$ onto $U$ is a vector in $U$. Moreover, $P_U v$ minimizes the
distance between $v$ and $U$. Figure 3 below gives an illustration.

[Figure 3: Orthogonal projection. The vector $v$ in $\mathbb{R}^n$, its projection $P_U v$ lying in the subspace $U$, and the difference $v - P_U v$.]

The vector $v - P_U v$ is orthogonal to every vector in $U$ and so is an element of the orthogonal
complement $U^\perp$ of $U$. Furthermore, Pythagoras' theorem shows that $\|v\|^2 = \|P_U v\|^2 + \|v - P_U v\|^2$.
In order to see how the orthogonal projection is related to the analysis of variance we consider
the one-way ANOVA model from Chapter 2. To illustrate the concepts and calculations we
use the petrol data from Example 2.1.

Example 6.1 Petrol example revisited.
In Example 2.1 the three petrol types A, B and C were used on 4, 5 and 3 cars, respectively.
Consider the data vector $y$ in which the observations for A come first, followed by those for
B and C, and the following vectors $u_1$, $u_2$ and $u_3$:

$$y = (24.0, 25.0, 24.3, 25.5, 25.3, 26.5, 26.4, 27.0, 27.6, 23.3, 24.0, 24.7)',$$
$$u_1 = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)',$$
$$u_2 = (0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0)',$$
$$u_3 = (0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)'. \qquad \square$$
The vector $u_i$ contains a 1 in a row if the corresponding unit has treatment $i$. In general, if
there are $t$ treatments, then there are $t$ vectors $u_1, \ldots, u_t$. The vector space which is spanned
by these vectors is denoted by $V_T$ and is called the treatment subspace, that is,

$$V_T = \mathrm{Span}(u_1, \ldots, u_t).$$

It is easy to verify that $u_1, \ldots, u_t$ are linearly independent and so the dimension of $V_T$ is
$\dim(V_T) = t$. Furthermore, $u_1, \ldots, u_t$ are mutually orthogonal, which shows that $\{u_1, \ldots, u_t\}$
is an orthogonal basis of $V_T$.
For the petrol data we can easily compute the orthogonal projection of the data vector $y$
onto the treatment subspace $V_T$. It follows that

$$P_{V_T} y = \sum_{i=1}^{3} \frac{y'u_i}{u_i'u_i} u_i = \frac{T_1}{4} u_1 + \frac{T_2}{5} u_2 + \frac{T_3}{3} u_3 = (24.70, 24.70, 24.70, 24.70, 26.56, 26.56, 26.56, 26.56, 26.56, 24.00, 24.00, 24.00)'.$$
We can recognize $P_{V_T} y$ as the vector of fitted values in the one-way ANOVA model, and
$y - P_{V_T} y$ is the vector of residuals.
By applying Pythagoras' theorem we see that

$$\|y\|^2 = \|P_{V_T} y\|^2 + \|y - P_{V_T} y\|^2,$$

that is,

$$\sum_{i=1}^{t} \sum_{j=1}^{r_i} y_{ij}^2 = \sum_{i=1}^{t} \frac{T_i^2}{r_i} + \|y - P_{V_T} y\|^2,$$

and so $\|y - P_{V_T} y\|^2$ is the residual sum of squares $S_E$ in the one-way ANOVA model.
The vector $y - P_{V_T} y$ is the orthogonal projection of $y$ onto $V_T^\perp$. Further, $\dim(V_T^\perp) = n - t$
is equal to the residual degrees of freedom.
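These projections are easy to reproduce numerically. The NumPy sketch below (an illustration, not the course's GenStat workflow) builds the indicator vectors $u_1$, $u_2$, $u_3$ for the petrol data, projects $y$ onto $V_T$ and recovers the fitted values and the residual sum of squares.

```python
import numpy as np

y = np.array([24.0, 25.0, 24.3, 25.5,           # petrol A
              25.3, 26.5, 26.4, 27.0, 27.6,     # petrol B
              23.3, 24.0, 24.7])                # petrol C
groups = [range(0, 4), range(4, 9), range(9, 12)]
u = [np.zeros(len(y)) for _ in groups]
for ui, g in zip(u, groups):
    ui[list(g)] = 1.0                           # indicator vectors u_1, u_2, u_3

PVTy = sum((y @ ui) / (ui @ ui) * ui for ui in u)   # fitted values (group means)
print(PVTy)                      # 24.70 for A, 26.56 for B, 24.00 for C
SE = (y - PVTy) @ (y - PVTy)     # squared length of the residual vector
print(SE)                        # residual sum of squares, about 5.25
```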
The space $V_T$ contains another subspace $V_0$ which is defined by

$$V_0 = \{cu_0 : c \in \mathbb{R}\},$$

where $u_0$ is the column vector of length $n$ with all elements equal to 1. Clearly $\dim(V_0) = 1$.
The orthogonal projection of the data vector $y$ onto $V_0$ is

$$P_{V_0} y = \frac{y'u_0}{u_0'u_0} u_0 = \frac{G}{n} u_0$$

and

$$\|P_{V_0} y\|^2 = \frac{G^2}{n},$$

which is the correction factor.
We now consider the orthogonal projection $P_{V_0}(P_{V_T} y)$ of $P_{V_T} y$ onto $V_0$. We can see that

$$P_{V_0}(P_{V_T} y) = \frac{(P_{V_T} y)'u_0}{u_0'u_0} u_0 = \frac{G}{n} u_0 = P_{V_0} y.$$

We can picture this in a similar way to Figure 3, but with $\mathbb{R}^n$ replaced by $V_T$ and $U$ by
the one-dimensional subspace $V_0$.
Again, by applying Pythagoras' theorem it follows that

$$\|P_{V_T} y\|^2 = \|P_{V_0} y\|^2 + \|P_{V_T} y - P_{V_0} y\|^2,$$

that is,

$$\sum_{i=1}^{t} \frac{T_i^2}{r_i} = \frac{G^2}{n} + \|P_{V_T} y - P_{V_0} y\|^2,$$

and so $\|P_{V_T} y - P_{V_0} y\|^2$ is the treatment sum of squares $S_T$ in the one-way ANOVA model.
The vector $P_{V_T} y - P_{V_0} y$ is the orthogonal projection $P_{V_T \cap V_0^\perp} y$ of $y$ onto the subspace $V_T \cap V_0^\perp$, which is called the orthogonal complement of $V_0$ in $V_T$. It can be shown that
$\dim(V_T \cap V_0^\perp) = \dim(V_T) - \dim(V_0) = t - 1$, which is equal to the degrees of freedom for
treatments.
Thus, by first considering the orthogonal projection of the data vector $y$ onto the treatment
subspace $V_T$ and secondly the orthogonal projection of $P_{V_T} y$ onto $V_0$, we have shown that

$$\|y\|^2 = \|P_{V_T} y\|^2 + \|y - P_{V_T} y\|^2 = \|P_{V_0} y\|^2 + \|P_{V_T} y - P_{V_0} y\|^2 + \|y - P_{V_T} y\|^2$$

and we were able to relate the squared lengths above to terms in the one-way ANOVA model.
The following schematic representation shows how we have decomposed the data vector into
the sum of three vectors, which are orthogonal projections of $y$ onto three subspaces and
respectively lie in $V_0$, $V_T \cap V_0^\perp$ and $V_T^\perp$. Each of these subspaces corresponds to one term in
the one-way ANOVA model:

$$y = P_{V_0} y + (P_{V_T} y - P_{V_0} y) + (y - P_{V_T} y)$$
$$\mathbb{R}^n = V_0 \oplus (V_T \cap V_0^\perp) \oplus V_T^\perp$$

with dimensions

$$n = 1 + (t - 1) + (n - t)$$

and

$$\|y\|^2 = \|P_{V_0} y\|^2 + \|P_{V_T} y - P_{V_0} y\|^2 + \|y - P_{V_T} y\|^2,$$

or

$$\sum_{i=1}^{t} \sum_{j=1}^{r_i} y_{ij}^2 = \frac{G^2}{n} + S_T + S_E.$$
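The whole three-part decomposition can be checked numerically. In the sketch below (illustrative; the fitted values are those computed in Example 6.1) the three orthogonal pieces are formed explicitly, and their squared lengths recover $G^2/n$, $S_T$ and $S_E$.

```python
import numpy as np

y = np.array([24.0, 25.0, 24.3, 25.5, 25.3, 26.5, 26.4, 27.0, 27.6,
              23.3, 24.0, 24.7])
n, t = len(y), 3
u0 = np.ones(n)
PV0y = (y @ u0) / n * u0                       # (G/n) u_0, every entry 25.3
PVTy = np.repeat([24.70, 26.56, 24.00], [4, 5, 3])   # fitted values, Example 6.1

pieces = [PV0y, PVTy - PV0y, y - PVTy]         # lie in V_0, V_T cap V_0-perp, V_T-perp
print(np.allclose(y, sum(pieces)))             # True: the decomposition is exact
print([round(float(p @ p), 2) for p in pieces])    # [7681.08, 14.45, 5.25]
print(round(float(y @ y), 2))                  # 7700.78, the sum of the three pieces
```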
We can now give precise definitions of sums of squares etc. with respect to a subspace $U$ of $\mathbb{R}^n$.

Definitions. Let $U$ be any subspace of $\mathbb{R}^n$ and let $y$ be the data vector.
(a) The sum of squares for $U$ is $\|P_U y\|^2$.
(b) The degrees of freedom for $U$ are $\dim(U)$.
(c) The mean square for $U$ is $\|P_U y\|^2 / \dim(U)$.
The decomposition of $\mathbb{R}^n$ for the one-way ANOVA model can be shown in an ANOVA table
similar to the one in Section 2.2. Note that the subspaces are not part of the table and that we
have not included a column for the F statistics.

Subspace         Source        SS                    df      MS
V_0              Mean          G^2/n                 1       G^2/n
V_T ∩ V_0^⊥      Treatments    S_T                   t - 1   M_T
V_T^⊥            Residual      S_E                   n - t   M_E
R^n              Total         Σ_i Σ_j y_ij^2        n
Comparing this table with the one in Section 2.2, we can see that the rows for Treatments
and for Residual are exactly the same. Here we have an additional row for Mean, which
corresponds to the parameter $\mu$ in the model equation. However, if we subtract the sum of
squares for Mean from the sum of squares for Total we get $S_G$, and similarly, subtracting
the degrees of freedom for Mean (which are equal to 1) from the degrees of freedom for Total
(which are equal to $n$) we get $n - 1$, which is the degrees of freedom for Total in
the table from Section 2.2. It is in this sense that the total sum of squares $S_G$ in the usual
ANOVA table is corrected for the effect of the mean, and this also explains why $G^2/n$ is
called the correction factor, although we have seen that it is simply the sum of squares for
the subspace $V_0$.