
These notes explain in some detail the technique of extracting, from masses of data, the main features of the relationship hidden or implied in the tabulated figures. Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable can be predicted from the other or the others.

RELATIONS BETWEEN VARIABLES.

The concept of a relation between two variables, such as between family income and family expenditures, is a familiar one. We can distinguish between a functional relation and a statistical relation.

Functional relation between two variables.

A functional relation between two variables is expressed by a mathematical formula
Y = f(X)
where X is the independent variable and Y the dependent one.
Example: sales in rands and number of units sold. If the selling price is R5, the relation is expressed by
Y = 5X

Number of units and Rand sales during three recent periods were as follows:
period   number of units sold   Rand sales
1        5                      25
2        10                     50
3        15                     75
These observations are plotted in the figure below (not reproduced).

A statistical relation, unlike a functional relation, is not a perfect one. In general, the observations for a statistical relation do not fall directly on the curve of relationship. For a certain set of observations the following plot is drawn (figure not reproduced). This relation is not a perfect one. There is a scattering of points suggesting that some variation of Y is not accounted for by X. The plotted line indicates the general tendency by which Y varies with changes in X. The scattering of points around the line represents variation in Y which is not associated with X, and which is usually considered to be of a random nature.

Basic concepts

A regression model is a formal means of expressing the two essential ingredients

of a statistical relation:

1. A tendency of the dependent variable to vary with the independent variable or

variables in a systematic fashion.

2. A scattering of observations around the curve of statistical relationship.

These two characteristics are embodied in a regression model by postulating that:

1. In the population of observations associated with the sampled process, there

is a probability distribution of Y for each level of X. [E(Y) = h(X)]

2. The means of these probability distributions vary in some systematic fashion with X.

Regression analysis serves three major purposes:

1. description

2. control

3. prediction

We will start with the case where there is only one independent variable and the regression function is linear. In this case we consider the following model:
(2.1)  Y_i = β_0 + β_1 X_i + ε_i
where:
Y_i is the value of the response in the ith trial;
β_0 and β_1 are parameters;
X_i is a known constant, namely the value of the independent variable in the ith trial;
ε_i is a random error term with mean E(ε_i) = 0 and variance σ²(ε_i) = σ², and ε_i and ε_j are uncorrelated, so that cov(ε_i, ε_j) = 0 for all i, j such that i ≠ j.

This model is said to be simple, linear in the parameters, and linear in the independent

variable.

Let us notice that:
1. The observed value of Y in the ith trial is the sum of two components: the term β_0 + β_1 X_i and the random term ε_i. Hence Y_i is a random variable.
2. Since E(ε_i) = 0, it follows that
(2.2)  E(Y_i) = E(β_0 + β_1 X_i + ε_i) = β_0 + β_1 X_i + E(ε_i) = β_0 + β_1 X_i
Therefore the regression function for simple linear regression is
(2.3)  E(Y) = β_0 + β_1 X
3. The observed value of Y in the ith trial differs from the value of the regression function by the error term amount ε_i.
4. The error terms ε_i are assumed to have constant variance σ². It therefore follows that the variance of the response Y_i is:
(2.4)  σ²(Y_i) = σ².

The parameters β_0 and β_1 in regression model (2.1) are called regression coefficients. β_1 is the slope of the regression line. It indicates the change in the mean of the probability distribution of Y per unit increase in X. The parameter β_0 is the Y intercept of the regression line. If the scope of the model includes X = 0, β_0 gives the mean of the probability distribution of Y at X = 0. When the scope of the model does not cover X = 0, β_0 does not have any particular meaning as a separate term in the regression model.

We shall denote the (X, Y) observations for the first trial by (X_1, Y_1), for the second trial by (X_2, Y_2), and in general for the ith trial by (X_i, Y_i), where i = 1, 2, ..., n. To find estimators for β_0 and β_1 we use the method of least squares. According to the method of least squares, the estimators of β_0 and β_1 are those values b_0 and b_1, respectively, that minimize the criterion Q given by:
(2.8)  Q = Σ_{i=1}^{n} (Y_i − β_0 − β_1 X_i)²
To find them we differentiate Q with respect to β_0 and β_1 and solve the system of equations
∂Q/∂β_0 = 0   and   ∂Q/∂β_1 = 0.
From these we get the following:
(2.9a)  Σ Y_i = n b_0 + b_1 Σ X_i
(2.9b)  Σ X_iY_i = b_0 Σ X_i + b_1 Σ X_i²
The equations (2.9a) and (2.9b) are called normal equations; b_0 and b_1 are called point estimators of β_0 and β_1, respectively. Solving them, one gets the following solutions:
(2.10a)  b_1 = [Σ X_iY_i − (Σ X_i)(Σ Y_i)/n] / [Σ X_i² − (Σ X_i)²/n] = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)²
(2.10b)  b_0 = (1/n)(Σ Y_i − b_1 Σ X_i) = Ȳ − b_1 X̄
where Ȳ and X̄ have the usual meaning.
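To make the formulas concrete, here is a minimal Python sketch (assuming NumPy is available; variable names are arbitrary) that evaluates (2.10a) and (2.10b) for the ten-observation data set used in the ANOVA example later in these notes; it gives b_0 = 10 and b_1 = 2:

    import numpy as np

    # Data set used in the ANOVA example later in these notes
    X = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
    Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
    n = len(X)

    # (2.10a): b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    # (2.10b): b0 = Ybar - b1 * Xbar
    b0 = Y.mean() - b1 * X.mean()

    print(b0, b1)          # 10.0 2.0 for these data
    fitted = b0 + b1 * X   # (2.13): fitted values Yhat_i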

(2.11) THEOREM (Gauss–Markov): Under the conditions of model (2.1), the least squares estimators b_0 and b_1 in (2.10) are unbiased and have minimum variance among all unbiased linear estimators. Hence
E(b_0) = β_0
E(b_1) = β_1
In the regression function (2.3):
E(Y) = β_0 + β_1 X
it is natural that we estimate the regression function as follows:
(2.12)  Ŷ = b_0 + b_1 X
where Ŷ is the value of the estimated regression function at the level X of the independent variable. For the observations in the sample, we will call Ŷ_i the fitted value for the ith observation, given by:
(2.13)  Ŷ_i = b_0 + b_1 X_i,   i = 1, 2, 3, ..., n.

Residuals.
The ith residual is the difference between the observed value Y_i and the corresponding fitted value Ŷ_i. Denoting this residual by e_i, we can write
(2.16)  e_i = Y_i − Ŷ_i = Y_i − b_0 − b_1 X_i
We need to distinguish between the model error term value ε_i = Y_i − E(Y_i) and the residual e_i = Y_i − Ŷ_i. The former involves the vertical deviation of Y_i from the unknown population regression line, and hence is unknown. On the other hand, the residual is the observed vertical deviation of Y_i from the fitted regression line. Residuals are useful for studying inferences in regression analysis.

The residuals have a number of properties:
1. The sum of the residuals is zero:
(2.17)  Σ_{i=1}^{n} e_i = 0
Proof: Let us recall the normal equation (2.9a), Σ Y_i = n b_0 + b_1 Σ X_i. Hence
Σ_{i=1}^{n} e_i = Σ Y_i − n b_0 − b_1 Σ X_i = 0.
2. The sum of the squared residuals, Σ_{i=1}^{n} e_i², is a minimum (it is the minimized criterion Q).
3. The sum of the observed values Y_i equals the sum of the fitted values Ŷ_i:
(2.18)  Σ_{i=1}^{n} Y_i = Σ_{i=1}^{n} Ŷ_i
This condition is implicit in the first normal equation:
Σ Ŷ_i = Σ(b_0 + b_1 X_i) = n b_0 + b_1 Σ X_i = Σ Y_i.
4. The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the level of the independent variable in the ith trial:
(2.19)  Σ_{i=1}^{n} X_i e_i = 0
This follows from the second normal equation (2.9b):
Σ_{i=1}^{n} X_i e_i = Σ X_i(Y_i − b_0 − b_1 X_i) = Σ X_iY_i − b_0 Σ X_i − b_1 Σ X_i² = 0.
5. Σ_{i=1}^{n} Ŷ_i e_i = 0.
6. The regression line always goes through the point (X̄, Ȳ).

PROBLEMS

Let us notice that the Y_i come from different probability distributions with different means, depending upon the level of X_i. To estimate the error variance σ² we start with
(2.21)  SSE = Σ_{i=1}^{n} (Y_i − Ŷ_i)² = Σ(Y_i − b_0 − b_1 X_i)² = Σ e_i²
where SSE stands for error sum of squares or residual sum of squares.
The sum of squares SSE has n − 2 degrees of freedom associated with it (two are lost because both β_0 and β_1 had to be estimated in obtaining the Ŷ_i). Hence the appropriate mean square, denoted by MSE, is
(2.22)  MSE = SSE/(n − 2) = Σ(Y_i − Ŷ_i)²/(n − 2) = Σ(Y_i − b_0 − b_1 X_i)²/(n − 2) = Σ e_i²/(n − 2)
where MSE stands for error mean square or residual mean square.

We verify that MSE is an unbiased estimator of σ², i.e. that E(SSE) = (n − 2)σ². Using b_0 = Ȳ − b_1X̄,
SSE = Σ(Y_i − Ŷ_i)² = Σ(Y_i − b_0 − b_1X_i)² = Σ(Y_i − Ȳ + b_1X̄ − b_1X_i)²
    = Σ Y_i² − nȲ² + (Σ X_i² − nX̄²) b_1² + 2nX̄ b_1Ȳ − 2 b_1 Σ X_iY_i
so that
E(SSE) = Σ E(Y_i²) − n E(Ȳ²) + Σ(X_i − X̄)² E(b_1²) + 2nX̄ E(b_1Ȳ) − 2 Σ X_i E(Y_ib_1),
where we used Σ X_i² − nX̄² = Σ(X_i − X̄)². The terms are evaluated as follows (throughout, E(Y_i) = β_0 + β_1X_i, Var(Y_i) = σ², and the Y_i are independent):
Σ E(Y_i²) = Σ[σ² + (β_0 + β_1X_i)²] = nσ² + nβ_0² + 2β_0β_1 Σ X_i + β_1² Σ X_i²
n E(Ȳ²) = n[σ²/n + (β_0 + β_1X̄)²] = σ² + nβ_0² + 2nβ_0β_1X̄ + nβ_1²X̄²
Σ(X_i − X̄)² E(b_1²) = Σ(X_i − X̄)² [Var(b_1) + β_1²] = σ² + β_1² Σ X_i² − nβ_1²X̄²
(using E(b_1) = β_1 and Var(b_1) = σ²/Σ(X_i − X̄)²).
Since b_1 = Σ_j [(X_j − X̄)/Σ(X_l − X̄)²] Y_j, the covariance of b_1 with Ȳ is (σ²/n) Σ_j (X_j − X̄)/Σ(X_l − X̄)² = 0, hence
E(b_1Ȳ) = Cov(b_1, Ȳ) + E(b_1)E(Ȳ) = β_0β_1 + β_1²X̄
and
2nX̄ E(b_1Ȳ) = 2nβ_0β_1X̄ + 2nβ_1²X̄².
Similarly, Cov(Y_i, b_1) = σ²(X_i − X̄)/Σ(X_j − X̄)², so
E(Y_ib_1) = σ²(X_i − X̄)/Σ(X_j − X̄)² + β_1(β_0 + β_1X_i)
and, since Σ X_i(X_i − X̄) = Σ(X_i − X̄)²,
2 Σ X_i E(Y_ib_1) = 2σ² + 2nβ_0β_1X̄ + 2β_1² Σ X_i².
Collecting the five pieces:
E(SSE) = [nσ² + nβ_0² + 2nβ_0β_1X̄ + β_1² Σ X_i²]
       − [σ² + nβ_0² + 2nβ_0β_1X̄ + nβ_1²X̄²]
       + [σ² + β_1² Σ X_i² − nβ_1²X̄²]
       + [2nβ_0β_1X̄ + 2nβ_1²X̄²]
       − [2σ² + 2nβ_0β_1X̄ + 2β_1² Σ X_i²]
       = (n − 2)σ².
In the collection we used Σ(X_i − X̄) = 0 and Σ X_i² − nX̄² = Σ(X_i − X̄)². Finally
E(SSE) = (n − 2)σ²
and therefore
E(MSE) = E[SSE/(n − 2)] = (n − 2)σ²/(n − 2) = σ²,
i.e. MSE is an unbiased estimator of σ².
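The result E(MSE) = σ² can also be checked by simulation. The following sketch (assuming NumPy; the parameter values β_0 = 10, β_1 = 2, σ = 3 are chosen only for illustration) repeatedly generates data from model (2.1) and averages the resulting MSE values; the average comes out close to σ² = 9:

    import numpy as np

    rng = np.random.default_rng(0)
    beta0, beta1, sigma = 10.0, 2.0, 3.0          # illustrative parameter values
    X = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
    n = len(X)

    mses = []
    for _ in range(20_000):
        Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=n)   # model (2.1)
        b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
        b0 = Y.mean() - b1 * X.mean()
        e = Y - (b0 + b1 * X)
        mses.append(np.sum(e ** 2) / (n - 2))                    # MSE, (2.22)

    print(np.mean(mses))   # close to sigma**2 = 9, illustrating E(MSE) = sigma^2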

Useful computational formulas for SSE are:
(2.24a)  SSE = Σ Y_i² − b_0 Σ Y_i − b_1 Σ X_iY_i
(2.24b)  SSE = Σ(Y_i − Ȳ)² − [Σ(X_i − X̄)(Y_i − Ȳ)]² / Σ(X_i − X̄)²
(2.24c)  SSE = Σ Y_i² − (Σ Y_i)²/n − [Σ X_iY_i − (Σ X_i)(Σ Y_i)/n]² / [Σ X_i² − (Σ X_i)²/n]

PROBLEMS
Problem: derive the computational formulas (2.24a), (2.24b) and (2.24c) for SSE.
Solution:
SSE = Σ(Y_i − Ŷ_i)² = Σ(Y_i² − 2Y_iŶ_i + Ŷ_i²)
    = Σ[Y_i² − Y_iŶ_i − Ŷ_i(Y_i − Ŷ_i)]
    = Σ Y_i² − Σ Y_i(b_0 + b_1X_i) − Σ(b_0 + b_1X_i)(Y_i − Ŷ_i)
    = Σ Y_i² − b_0 Σ Y_i − b_1 Σ X_iY_i − b_0 Σ(Y_i − Ŷ_i) − b_1 Σ X_i(Y_i − Ŷ_i)
    = Σ Y_i² − b_0 Σ Y_i − b_1 Σ X_iY_i
since Σ(Y_i − Ŷ_i) = 0 and Σ X_i(Y_i − Ŷ_i) = Σ X_ie_i = 0. This gives
(2.24a)  SSE = Σ Y_i² − b_0 Σ Y_i − b_1 Σ X_iY_i
Next, start from (2.21), SSE = Σ(Y_i − b_0 − b_1X_i)², and substitute (2.10b), b_0 = Ȳ − b_1X̄, together with (2.10a):
SSE = Σ[Y_i − Ȳ − b_1(X_i − X̄)]²
    = Σ(Y_i − Ȳ)² − 2b_1 Σ(X_i − X̄)(Y_i − Ȳ) + b_1² Σ(X_i − X̄)²
    = Σ(Y_i − Ȳ)² − 2[Σ(X_i − X̄)(Y_i − Ȳ)]²/Σ(X_i − X̄)² + [Σ(X_i − X̄)(Y_i − Ȳ)]²/Σ(X_i − X̄)²
    = Σ(Y_i − Ȳ)² − [Σ(X_i − X̄)(Y_i − Ȳ)]²/Σ(X_i − X̄)²,
which is
(2.24b)  SSE = Σ(Y_i − Ȳ)² − [Σ(X_i − X̄)(Y_i − Ȳ)]² / Σ(X_i − X̄)²
Finally, let us notice that
Σ(X_i − X̄)(Y_i − Ȳ) = Σ(X_iY_i − X_iȲ − Y_iX̄ + X̄Ȳ) = Σ X_iY_i − (Σ X_i)(Σ Y_i)/n
and, for any numbers Z_i,
Σ(Z_i − Z̄)² = Σ Z_i² − (1/n)(Σ Z_i)(Σ Z_i) = Σ Z_i² − (Σ Z_i)²/n.
Using the latter for Z_i = Y_i and for Z_i = X_i in (2.24b) gives
(2.24c)  SSE = Σ Y_i² − (Σ Y_i)²/n − [Σ X_iY_i − (Σ X_i)(Σ Y_i)/n]² / [Σ X_i² − (Σ X_i)²/n]

The normal error regression model is:
(2.25)  Y_i = β_0 + β_1 X_i + ε_i
where:
Y_i is the value of the response in the ith trial;
β_0 and β_1 are parameters;
X_i is a known constant, namely the value of the independent variable in the ith trial;
ε_i are independent N(0, σ²), i = 1, 2, ..., n.

PROBLEMS
Problem: show that under the normal error model (2.25), Y_i has a N(β_0 + β_1X_i, σ²) distribution.
Solution: Since Y_i is a linear transformation of the normally distributed random variable ε_i, Y_i has a normal distribution, and
E(Y_i) = E(β_0 + β_1X_i + ε_i) = β_0 + β_1X_i + E(ε_i) = β_0 + β_1X_i
Var(Y_i) = Var(β_0 + β_1X_i + ε_i) = Var(ε_i) = σ²
Hence under the normal model Y_i is N(β_0 + β_1X_i, σ²).
Alternatively, since ε_i is N(0, σ²), the corresponding density function is
f_{ε_i}(z) = f_{N(0,σ²)}(z) = (1/√(2πσ²)) exp(−z²/(2σ²)).
Since Y_i = β_0 + β_1X_i + ε_i,
P(Y_i ≤ y) = P(ε_i ≤ y − β_0 − β_1X_i) = F_{ε_i}(y − β_0 − β_1X_i)
Hence
f_{Y_i}(y) = (∂/∂y) F_{Y_i}(y) = (∂/∂y) F_{ε_i}(y − β_0 − β_1X_i) = f_{ε_i}(y − β_0 − β_1X_i)
           = f_{N(0,σ²)}(y − β_0 − β_1X_i) = (1/√(2πσ²)) exp[−(y − β_0 − β_1X_i)²/(2σ²)],
so again under the normal model Y_i is N(β_0 + β_1X_i, σ²).

The likelihood function for the normal error model (2.25), given the sample observations Y_1, ..., Y_n, is:
(2.26)  L(β_0, β_1, σ²) = Π_{i=1}^{n} (1/√(2πσ²)) exp[−(1/(2σ²))(Y_i − β_0 − β_1X_i)²]
The values of β_0, β_1 and σ² which maximize this likelihood function are the maximum likelihood estimators. These are:
parameter   maximum likelihood estimator
β_0         b_0, the same as (2.10b)
β_1         b_1, the same as (2.10a)
σ²          (2.27)  σ̂² = Σ_{i=1}^{n}(Y_i − Ŷ_i)²/n
Thus, the maximum likelihood estimator of σ² is biased, and the ordinary MSE as given in (2.22) is used instead.

PROBLEMS.

Question 1.

The results of a certain experiment are shown below.
i     1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
X_i   5.5  4.8  4.7  3.9  4.5  6.2  6.0  5.2  4.7  4.3  4.9  5.4  5.0  6.3  4.6  4.3  5.0  5.9  4.1  4.7
Y_i   3.1  2.3  3.0  1.9  2.5  3.7  3.4  2.6  2.8  1.6  2.0  2.9  2.3  3.2  1.8  1.4  2.0  3.8  2.2  1.5
Summary calculation results are: Σ X_i = 100.0, Σ Y_i = 50.0, Σ X_i² = 509.12, Σ Y_i² = 134.84, Σ X_iY_i = 257.66.
a) Obtain the least squares estimates of β_0 and β_1, and state the estimated regression function.
b) Obtain the point estimate of the mean of Y when the X score is 5.0.
c) What is the point estimate of the change in the mean response when the X score increases by one?

Question 2

For the following set of data:
X_i   30  20  60  80  40  50  60  30  70  60
Y_i   73  50  128 170 87  108 135 69  148 132
1) Obtain the estimated regression function. 2) Interpret b_0 and b_1.

Question 3.

Prove theorem 2.11.

Question 4.

Prove the following statements:

1) The sum of the observed values Y_i equals the sum of the fitted values Ŷ_i:
Σ_{i=1}^{n} Y_i = Σ_{i=1}^{n} Ŷ_i
2) The regression line always goes through the point (X̄, Ȳ).
Question 5.
Prove the following statements:
1) The sum of the residuals is zero:
Σ_{i=1}^{n} e_i = 0
2) The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the level of the independent variable in the ith trial:
Σ_{i=1}^{n} X_i e_i = 0

Throughout the remainder of this course, unless otherwise stated, we assume that the normal error model (2.25) is applicable. The model is
(3.1)  Y_i = β_0 + β_1 X_i + ε_i
where
β_0 and β_1 are parameters,
X_i are known constants,
ε_i are independent N(0, σ²).

Inferences concerning β_1.
We wish to test
H_0: β_1 = 0
H_a: β_1 ≠ 0
The reason for this is that β_1 = 0 indicates that there is no linear association between X and Y. To develop the appropriate test we first have to find the sampling distribution of b_1, our point estimator of β_1.
Sampling distribution of b_1:
(3.2)  b_1 = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)²

Theorem: For the model (3.1), the sampling distribution of b_1 is normal, with
(3.3a)  E(b_1) = β_1
(3.3b)  Var(b_1) = σ² / Σ(X_i − X̄)²
Proof: Let us notice that b_1 is a linear combination of the Y_i:
b_1 = Σ(X_i − X̄)(Y_i − Ȳ)/Σ(X_i − X̄)² = Σ(X_i − X̄)Y_i/Σ(X_i − X̄)² − Ȳ Σ(X_i − X̄)/Σ(X_i − X̄)²
but Σ(X_i − X̄) = Σ X_i − nX̄ = Σ X_i − Σ X_i = 0, hence
(3.5)  b_1 = Σ [(X_i − X̄)/Σ(X_j − X̄)²] Y_i
According to our model the coefficients (X_i − X̄)/Σ(X_j − X̄)² are constants (since the X_i are fixed), and we know that a linear combination of independent normally distributed random variables is normally distributed. The unbiasedness of the point estimator b_1 was stated earlier in the Gauss–Markov theorem (2.11). We denote the coefficients in (3.5) by
(3.5a)  k_i = (X_i − X̄)/Σ(X_j − X̄)²
The variance of b_1 is calculated using the fact that the Y_i are independent random variables, each with variance σ²:
Var(b_1) = Var(Σ k_iY_i) = Σ k_i² Var(Y_i) = σ² Σ k_i² = σ² Σ(X_i − X̄)²/[Σ(X_j − X̄)²]² = σ²/Σ(X_i − X̄)².
Replacing σ² by its unbiased estimator MSE gives the estimated variance of b_1:
(3.9)  s²(b_1) = MSE/Σ(X_i − X̄)² = [Σ(Y_i − Ŷ_i)²/(n − 2)] / Σ(X_i − X̄)².

Let us notice that the k_i have a number of properties:
1. Σ k_i = 0
2. Σ k_iX_i = 1
Proof of 2:
Σ k_iX_i = Σ k_i(X_i − X̄) + X̄ Σ k_i = Σ k_i(X_i − X̄) = Σ(X_i − X̄)(X_i − X̄)/Σ(X_j − X̄)² = 1.
3. Σ k_i² = 1/Σ(X_i − X̄)²

Sampling distribution of (b_1 − β_1)/s(b_1).
We will use the following facts.
If Z_1, ..., Z_n are independent random variables, each N(0,1), then Σ Z_i² is χ²(n).
If Z is N(0,1), V is χ²(n), and Z and V are independent, then T = Z/√(V/n) has a t distribution with n degrees of freedom.
If Z_1, ..., Z_n are independent random variables, each N(0,1), then Z̄ and (Z_1 − Z̄, ..., Z_n − Z̄) are independent.
Using the above theorems one can show that
(3.10)  (b_1 − β_1)/s(b_1)
is distributed as t(n−2) for the model (3.1). The reason for the n−2 degrees of freedom is that the two parameters (β_0 and β_1) need to be estimated; hence two degrees of freedom are lost.

Since (b_1 − β_1)/s(b_1) is distributed as t(n−2), we can make the following probability statement:
(3.12)  P[t(α/2; n−2) ≤ (b_1 − β_1)/s(b_1) ≤ t(1 − α/2; n−2)] = 1 − α
where t(α/2; n−2) denotes the (α/2)100 percentile of the t distribution with n−2 degrees of freedom. Since the t distribution is symmetric, we know that
(3.13)  t(α/2; n−2) = −t(1 − α/2; n−2)
Therefore we get the following confidence interval for β_1:
(3.14)  P[b_1 − t(1 − α/2; n−2) s(b_1) ≤ β_1 ≤ b_1 + t(1 − α/2; n−2) s(b_1)] = 1 − α,
which we can rewrite as the confidence limits
(3.15)  b_1 ± t(1 − α/2; n−2) s(b_1)
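A short sketch of the interval (3.15) in Python (assuming NumPy and SciPy), using the same ten-observation data set as the worked example later in the notes; it reproduces the 95% interval (1.89, 2.11) computed there:

    import numpy as np
    from scipy import stats

    X = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
    Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
    n = len(X)

    Sxx = np.sum((X - X.mean()) ** 2)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    MSE = np.sum((Y - b0 - b1 * X) ** 2) / (n - 2)

    s_b1 = np.sqrt(MSE / Sxx)                      # (3.9)
    t_crit = stats.t.ppf(0.975, df=n - 2)          # t(1 - alpha/2; n-2), alpha = 0.05
    ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)  # (3.15): roughly (1.89, 2.11)
    print(ci)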

Tests concerning β_1.
(3.16)  H_0: β_1 = 0
        H_a: β_1 ≠ 0
Since (b_1 − β_1)/s(b_1) is distributed as t(n−2), the test statistic is
(3.17)  t* = b_1/s(b_1)
The decision rule (at the level of significance α) is
(3.17a)  if |t*| ≤ t(1 − α/2; n−2), conclude H_0;
         if |t*| > t(1 − α/2; n−2), conclude H_a.
Computer output usually reports the P-value together with the value of the test statistic. In this way, one can conduct a test at any desired level of significance by comparing the P-value with the specified level α. Users of computer output need to be careful to ascertain whether one-sided or two-sided P-values are furnished.

Sampling distribution of b_0.
(3.19)  b_0 = Ȳ − b_1X̄
For the model (3.1) (normal error model), the sampling distribution of b_0 is normal with the following parameters:
(3.20a)  E(b_0) = β_0
(3.20b)  Var(b_0) = σ² Σ X_i²/[n Σ(X_i − X̄)²] = σ² [1/n + X̄²/Σ(X_i − X̄)²]
When σ² is replaced by its unbiased estimator MSE, we obtain
(3.21)  s²(b_0) = MSE Σ X_i²/[n Σ(X_i − X̄)²] = MSE [1/n + X̄²/Σ(X_i − X̄)²]
Moreover,
(3.22)  (b_0 − β_0)/s(b_0)
is distributed as t(n−2) for the model (3.1). The confidence limits for β_0 are calculated in the same manner as those for β_1. The confidence limits for β_0 are as follows:
(3.23)  b_0 ± t(1 − α/2; n−2) s(b_0)

Let X_h denote the level of X for which we wish to estimate the mean response. X_h may be a value which occurred in the sample, or it may be some other value of the independent variable within the scope of the model. The mean response when X = X_h is denoted by E(Y_h). Formula (2.12) gives us the point estimator Ŷ_h of E(Y_h):
(3.27)  Ŷ_h = b_0 + b_1 X_h
We will consider now the sampling distribution of Ŷ_h. For model (3.1), the sampling distribution of Ŷ_h is normal, with
(3.28a)  E(Ŷ_h) = E(Y_h)
(3.28b)  Var(Ŷ_h) = σ² [1/n + (X_h − X̄)²/Σ(X_i − X̄)²]
When MSE is substituted for σ² in (3.28b), we obtain s²(Ŷ_h), the estimated variance of Ŷ_h:
(3.30)  s²(Ŷ_h) = MSE [1/n + (X_h − X̄)²/Σ(X_i − X̄)²]
Statement:
(3.31)  (Ŷ_h − E(Y_h))/s(Ŷ_h) is distributed as t(n−2) for model (3.1).
A 1 − α confidence interval for E(Y_h) is given by:
(3.32)  Ŷ_h ± t(1 − α/2; n−2) s(Ŷ_h)

The basic idea of a prediction interval is to choose a range in the distribution of Y within which most of the observations will fall, and to declare that the next observation will fall in this range. In general, when the regression parameters are known, the 1 − α prediction limits for Y_h(new) are:
(3.33)  E(Y_h) ± z(1 − α/2) σ
When the parameters are unknown, the 1 − α prediction limits are:
(3.35)  Ŷ_h ± t(1 − α/2; n−2) s(Y_h(new))
where
(3.37)  s²(Y_h(new)) = s²(Ŷ_h) + MSE
Using (3.30), one can write
(3.37a)  s²(Y_h(new)) = MSE [1 + 1/n + (X_h − X̄)²/Σ(X_i − X̄)²]
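The interval estimates (3.32) and (3.35) can be computed directly from these formulas; a sketch (assuming NumPy/SciPy, same example data, X_h = 55, 90% level) reproduces the intervals (118.3, 121.7) and (114.6, 125.4) obtained in the worked example:

    import numpy as np
    from scipy import stats

    X = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
    Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
    n = len(X)
    Sxx = np.sum((X - X.mean()) ** 2)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    MSE = np.sum((Y - b0 - b1 * X) ** 2) / (n - 2)

    Xh = 55.0
    Yh = b0 + b1 * Xh
    s2_mean = MSE * (1 / n + (Xh - X.mean()) ** 2 / Sxx)      # (3.30)
    s2_new = MSE * (1 + 1 / n + (Xh - X.mean()) ** 2 / Sxx)   # (3.37a)
    t_crit = stats.t.ppf(0.95, df=n - 2)                      # 90% two-sided limits

    ci_mean = (Yh - t_crit * np.sqrt(s2_mean), Yh + t_crit * np.sqrt(s2_mean))  # (3.32)
    pi_new = (Yh - t_crit * np.sqrt(s2_new), Yh + t_crit * np.sqrt(s2_new))     # (3.35)
    print(ci_mean, pi_new)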

Basic notation. The analysis of variance is based on the partitioning of sums of squares and degrees of freedom associated with the response variable Y. To illustrate this approach we will consider the following data:
Run i   X_i   Y_i
1       30    73
2       20    50
3       60    128
4       80    170
5       40    87
6       50    108
7       60    135
8       30    69
9       70    148
10      60    132
The first figure (not reproduced) shows Ȳ together with the observed Y_i and the total deviations
(3.41)  Y_i − Ȳ
The measure of total variation, denoted by SSTO, is equal to
(3.42)  SSTO = Σ(Y_i − Ȳ)²
Here SSTO stands for total sum of squares. If SSTO = 0, all observations are the same. When we use the regression approach, the variation reflecting the uncertainty in the data is that of the Y observations around the regression line:
(3.43)  Y_i − Ŷ_i
These deviations are shown in the next figure. The measure of variation in the data with the regression model is the sum of the squared deviations (3.43):
(3.44)  SSE = Σ(Y_i − Ŷ_i)²
If SSE = 0, all observations fall on the fitted regression line.

In our case
SSTO = 13,660 and SSE = 60.
What accounts for this difference? The difference is another sum of squares:
(3.45)  SSR = Σ(Ŷ_i − Ȳ)²
where SSR stands for regression sum of squares. These deviations are shown in the next figure. Each deviation is simply the difference between the fitted value on the regression line and the mean of the fitted values Ȳ (recall from (2.18) that the mean of the fitted values Ŷ_i is Ȳ). We can decompose the total deviation as follows:
Y_i − Ȳ          =          Ŷ_i − Ȳ          +          Y_i − Ŷ_i
(total deviation)   (deviation of regression value around mean)   (deviation around regression line)
The next figure shows this decomposition for one of the observations.

It is a very interesting property that the corresponding sums of squared deviations have the same relationship:
(3.48)  Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σ(Y_i − Ŷ_i)²
or, using the notation in (3.42), (3.44) and (3.45):
(3.48a)  SSTO = SSR + SSE
Proof:
Σ(Y_i − Ȳ)² = Σ[(Ŷ_i − Ȳ) + (Y_i − Ŷ_i)]²
            = Σ[(Ŷ_i − Ȳ)² + (Y_i − Ŷ_i)² + 2(Ŷ_i − Ȳ)(Y_i − Ŷ_i)]
            = Σ(Ŷ_i − Ȳ)² + Σ(Y_i − Ŷ_i)² + 2 Σ(Ŷ_i − Ȳ)(Y_i − Ŷ_i)
The last term on the right is zero, since
2 Σ(Ŷ_i − Ȳ)(Y_i − Ŷ_i) = 2 Σ Ŷ_i(Y_i − Ŷ_i) − 2Ȳ Σ(Y_i − Ŷ_i)
and
2Ȳ Σ(Y_i − Ŷ_i) = 2Ȳ Σ e_i = 0   (by (2.17))
and
2 Σ Ŷ_i(Y_i − Ŷ_i) = 2 Σ Ŷ_ie_i = 2 Σ(b_0 + b_1X_i)e_i = 2b_0 Σ e_i + 2b_1 Σ X_ie_i = 0   (by (2.17) and (2.19)).
Hence (3.48) follows.
Computational formulas.
The definitional formulas for SSTO, SSR and SSE presented above are often not convenient for hand computation. Useful formulas for SSTO and SSR are:
(3.49)  SSTO = Σ Y_i² − (Σ Y_i)²/n = Σ Y_i² − nȲ²
(3.50a)  SSR = b_1 [Σ X_iY_i − (Σ X_i)(Σ Y_i)/n] = b_1 Σ(X_i − X̄)(Y_i − Ȳ)
or
(3.50b)  SSR = b_1² Σ(X_i − X̄)²
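A quick numerical check of the partitioning (3.48a) and of (3.50b) (assuming NumPy, ten-observation data of this section): it returns SSTO = 13660, SSR = 13600, SSE = 60 and confirms SSTO = SSR + SSE:

    import numpy as np

    X = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
    Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)

    Sxx = np.sum((X - X.mean()) ** 2)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    fitted = b0 + b1 * X

    SSTO = np.sum((Y - Y.mean()) ** 2)        # (3.42)
    SSE = np.sum((Y - fitted) ** 2)           # (3.44)
    SSR = b1 ** 2 * Sxx                       # (3.50b)
    print(SSTO, SSR, SSE, np.isclose(SSTO, SSR + SSE))   # 13660, 13600, 60, True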

PROBLEMS.

2. Prove (3.49), (3.50a) and (3.50b).
Solution: Recall that SSTO = Σ(Y_i − Ȳ)², SSE = Σ(Y_i − Ŷ_i)², SSR = Σ(Ŷ_i − Ȳ)² and SSTO = SSR + SSE.
For (3.49),
SSTO = Σ(Y_i − Ȳ)² = Σ(Y_i² − 2Y_iȲ + Ȳ²) = Σ Y_i² − 2nȲ² + nȲ² = Σ Y_i² − nȲ² = Σ Y_i² − (Σ Y_i)²/n.
For SSR, using (2.18) (the mean of the fitted values Ŷ_i is Ȳ),
SSR = Σ(Ŷ_i² − 2Ŷ_iȲ + Ȳ²) = Σ Ŷ_i² − 2nȲ² + nȲ² = Σ Ŷ_i² − nȲ²
    = Σ(b_0 + b_1X_i)² − nȲ²
    = Σ(b_0² + 2b_0b_1X_i + b_1²X_i²) − nȲ²
    = n(Ȳ − b_1X̄)² + 2(Ȳ − b_1X̄)b_1nX̄ + b_1² Σ X_i² − nȲ²
    = nȲ² − 2nb_1X̄Ȳ + nb_1²X̄² + 2nb_1X̄Ȳ − 2nb_1²X̄² + b_1² Σ X_i² − nȲ²
    = b_1² Σ X_i² − nb_1²X̄² = b_1² [Σ X_i² − (Σ X_i)²/n] = b_1² Σ(X_i − X̄)²,
which is (3.50b). Alternatively,
SSR = Σ(Ŷ_i − Ȳ)² = Σ(b_0 + b_1X_i − Ȳ)² = Σ(Ȳ − b_1X̄ + b_1X_i − Ȳ)² = Σ b_1²(X_i − X̄)² = b_1² Σ(X_i − X̄)²,
so again we get (3.50b). Finally, writing one factor of b_1 out using (2.10a),
SSR = b_1² Σ(X_i − X̄)² = b_1 [b_1 Σ(X_i − X̄)²] = b_1 [Σ X_iY_i − (Σ X_i)(Σ Y_i)/n] = b_1 Σ(X_i − X̄)(Y_i − Ȳ),
since b_1 = [Σ X_iY_i − (Σ X_i)(Σ Y_i)/n]/Σ(X_i − X̄)² and b_0 = Ȳ − b_1X̄; this is (3.50a).

Degrees of freedom.
Corresponding to the partitioning of the total sum of squares SSTO there is a partitioning of the associated degrees of freedom. SSE has n − 2 degrees of freedom. SSR has one degree of freedom. Therefore SSTO has n − 1 degrees of freedom.
Mean squares.
(3.51)  MSR = SSR/1 = SSR
and the error mean square MSE was defined in (2.22):
(3.52)  MSE = SSE/(n − 2)
In our example we have SSR = 13,600 and SSE = 60, hence MSR = 13,600/1 = 13,600 and MSE = 60/8 = 7.5.

Basic table. The breakdowns of the total sum of squares and associated degrees of freedom are displayed in the form of an analysis of variance table (ANOVA table):
source of variation   SS                     df     MS                E(MS)
regression            SSR = Σ(Ŷ_i − Ȳ)²      1      MSR = SSR/1       σ² + β_1² Σ(X_i − X̄)²
error                 SSE = Σ(Y_i − Ŷ_i)²    n − 2  MSE = SSE/(n−2)   σ²
total                 SSTO = Σ(Y_i − Ȳ)²     n − 1
For our example:
source of variation   SS      df   MS
regression            13600   1    13600
error                 60      8    7.5
total                 13660   9

Sometimes a modified ANOVA table with a finer decomposition is utilized. Recall that by (3.49):
SSTO = Σ Y_i² − (Σ Y_i)²/n = Σ Y_i² − nȲ²
In the modified ANOVA table, the total uncorrected sum of squares, denoted by SSTOU, is defined as:
(3.53)  SSTOU = Σ Y_i²
and the correction for the mean sum of squares, denoted by SS(correction for mean), is defined as nȲ². The modified table is:
source of variation      SS                                 df     MS
regression               SSR = Σ(Ŷ_i − Ȳ)²                  1      MSR = SSR/1
error                    SSE = Σ(Y_i − Ŷ_i)²                n − 2  MSE = SSE/(n−2)
correction for mean      SS(correction for mean) = nȲ²      1
total uncorrected        SSTOU = Σ Y_i²                     n

F test of β_1 = 0 versus β_1 ≠ 0.
The analysis of variance provides us with a battery of highly useful tests for regression models. For the simple regression case considered here, the analysis of variance provides us with a test for:
(3.58)  H_0: β_1 = 0
        H_a: β_1 ≠ 0
Test statistic:
(3.59)  F* = MSR/MSE
The decision rule:
(3.61)  If F* ≤ F(1 − α; 1, n−2), conclude H_0;
        If F* > F(1 − α; 1, n−2), conclude H_a,
where F(1 − α; 1, n−2) is the (1 − α)100 percentile of the appropriate F distribution.

Equivalence of F test and t test.
For a given α level, the F test of β_1 = 0 versus β_1 ≠ 0 is equivalent algebraically to the two-tailed t test. To see this, recall from (3.50b) that
SSR = b_1² Σ(X_i − X̄)²
Thus, we can write:
F* = MSR/MSE = (SSR/1)/(SSE/(n−2)) = b_1² Σ(X_i − X̄)²/MSE
Since s²(b_1) = MSE/Σ(X_i − X̄)², we have
(3.62)  F* = b_1² Σ(X_i − X̄)²/MSE = b_1²/s²(b_1) = [b_1/s(b_1)]² = (t*)²
Corresponding to this relation between t* and F*, we have the following relation between the required percentiles of the t and F distributions used in the tests:
[t(1 − α/2; n−2)]² = F(1 − α; 1, n−2).
Thus, at a given α level, we can use either the t test or the F test for testing β_1 = 0 versus β_1 ≠ 0. Whenever one test leads to H_0, so will the other, and correspondingly for H_a. The t test is more flexible since it can be used for one-sided alternatives involving β_1 (β_1 > 0 or β_1 < 0), while the F test cannot.
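The algebraic equivalence (3.62) is easy to verify numerically; the sketch below (assuming NumPy/SciPy, same data) checks both F* = (t*)² and [t(0.975; 8)]² = F(0.95; 1, 8):

    import numpy as np
    from scipy import stats

    X = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
    Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
    n = len(X)
    Sxx = np.sum((X - X.mean()) ** 2)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    MSE = np.sum((Y - b0 - b1 * X) ** 2) / (n - 2)

    t_star = b1 / np.sqrt(MSE / Sxx)
    F_star = (b1 ** 2 * Sxx) / MSE                 # MSR/MSE, since MSR = SSR/1
    print(np.isclose(F_star, t_star ** 2))         # True
    print(np.isclose(stats.t.ppf(0.975, 8) ** 2,
                     stats.f.ppf(0.95, 1, 8)))     # True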

Descriptive measures of association between X and Y in the regression model.
Coefficient of determination.
We saw earlier that SSTO measures the variation in the observations Y_i, or the uncertainty in predicting Y, when no account of the independent variable X is taken. SSE measures the variation in the Y_i when a regression model utilizing the independent variable X is employed. A natural measure of the effect of X in reducing the variation in the Y_i (the uncertainty in predicting Y) is
(3.69)  r² = (SSTO − SSE)/SSTO = SSR/SSTO = 1 − SSE/SSTO
Since 0 ≤ SSE ≤ SSTO, it follows that
(3.70)  0 ≤ r² ≤ 1
We may interpret r² as the proportionate reduction of total variation associated with the use of the independent variable X. Thus, the larger r² is, the more the total variation of Y is reduced by introducing the independent variable X. The limiting values of r² occur as follows:
1. If all observations fall on the fitted regression line, SSE = 0 and r² = 1. In this case the independent variable accounts for all the variation in the observations Y_i.
2. If the slope of the fitted regression line is zero (b_1 = 0), so that Ŷ_i = Ȳ, then SSE = SSTO and r² = 0. In this case there is no linear association between X and Y in the sample data, and the independent variable is of no help in reducing the variation in the observations Y_i with linear regression.

Coefficient of correlation.
(3.71)  r = ±√(r²)
is called the coefficient of correlation. A plus or minus sign is attached to this measure according to whether the slope of the fitted regression line is positive or negative. The range is
(3.72)  −1 ≤ r ≤ 1
r does not have such a clear-cut interpretation as r².
A direct computational formula for r, which automatically furnishes the proper sign, is:
(3.73)  r = Σ(X_i − X̄)(Y_i − Ȳ) / [Σ(X_i − X̄)² Σ(Y_i − Ȳ)²]^{1/2}
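A sketch of (3.69) and (3.73) (assuming NumPy, same data); both routes give r² ≈ 0.9956 and r ≈ 0.9978, matching the example below:

    import numpy as np

    X = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
    Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)

    Sxx = np.sum((X - X.mean()) ** 2)
    Syy = np.sum((Y - Y.mean()) ** 2)
    Sxy = np.sum((X - X.mean()) * (Y - Y.mean()))

    b1 = Sxy / Sxx
    b0 = Y.mean() - b1 * X.mean()
    SSE = np.sum((Y - b0 - b1 * X) ** 2)

    r2 = 1 - SSE / Syy             # (3.69), coefficient of determination
    r = Sxy / np.sqrt(Sxx * Syy)   # (3.73), carries the sign of b1 automatically
    print(r2, r)                   # approx. 0.9956 and 0.9978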

EXAMPLES

For our data we get the following:
X_i   Y_i   X_iY_i   X_i²    Ŷ_i = b_0 + b_1X_i   e_i = Y_i − Ŷ_i   e_i²
30    73    2190     900     70                    3                 9
20    50    1000     400     50                    0                 0
60    128   7680     3600    130                   −2                4
80    170   13600    6400    170                   0                 0
40    87    3480     1600    90                    −3                9
50    108   5400     2500    110                   −2                4
60    135   8100     3600    130                   5                 25
30    69    2070     900     70                    −1                1
70    148   10360    4900    150                   −2                4
60    132   7920     3600    130                   2                 4
500   1100  61800    28400   1100                  0                 60   ← totals
Therefore
X̄ = (1/n) Σ X_i = 500/10 = 50
Ȳ = (1/n) Σ Y_i = 1100/10 = 110
b_1 = [Σ X_iY_i − (Σ X_i)(Σ Y_i)/n] / [Σ X_i² − (Σ X_i)²/n] = [61800 − (500)(1100)/10] / [28400 − 500²/10] = 2.0
b_0 = Ȳ − b_1X̄ = 110 − 2(50) = 10
So
Ŷ_i = 10 + 2 X_i
In our case
SSE = Σ(Y_i − Ŷ_i)² = Σ e_i² = 60
MSE = SSE/(n − 2) = 60/8 = 7.5
s²(b_1) = MSE/Σ(X_i − X̄)² = MSE/[Σ X_i² − (Σ X_i)²/n] = 7.5/(28400 − 500²/10) = 7.5/3400 = 0.002206,
so s(b_1) = √0.002206 = 0.046968.
The 95% confidence interval for β_1 is
(b_1 − t(1 − α/2; n−2) s(b_1), b_1 + t(1 − α/2; n−2) s(b_1))
= (2 − 2.306 × 0.046968, 2 + 2.306 × 0.046968) = (1.89, 2.11)
where t(1 − α/2; n−2) = t(0.975; 8) = 2.306.
Test for β_1:
H_0: β_1 = 0
H_a: β_1 ≠ 0
The test statistic is
t* = b_1/s(b_1) = 2/0.046968 = 42.582
The decision rule in our case is
If |t*| ≤ t(1 − α/2; n−2), conclude H_0;
If |t*| > t(1 − α/2; n−2), conclude H_a.
In our case |t*| = 42.582 > 2.306 = t(0.975; 8) = t(1 − α/2; n−2), so we conclude H_a (β_1 ≠ 0), which means that there is a linear association between X and Y.

The P-value for our calculated value of the test statistic is
P(t(8) > t* = 42.58) < 0.0005.
For inferences concerning β_0, first we calculate
s²(b_0) = MSE Σ X_i²/[n Σ(X_i − X̄)²] = 7.5 × 28400/(10 × 3400) = 6.2647
One can also use
s²(b_0) = MSE [1/n + X̄²/Σ(X_i − X̄)²] = 7.5 [1/10 + 50²/3400] = 6.2647
The 90% confidence interval for β_0 is
(b_0 − t(1 − α/2; n−2) s(b_0), b_0 + t(1 − α/2; n−2) s(b_0))
= (10 − 1.860 × 2.5029, 10 + 1.860 × 2.5029) = (5.34, 14.66)
where t(1 − α/2; n−2) = t(0.95; 8) = 1.860.
We would like to find the confidence interval for the mean response corresponding to the level of the explanatory variable denoted by X_h. Take X_h = 55:
s²(Ŷ_h) = MSE [1/n + (X_h − X̄)²/Σ(X_i − X̄)²]
s²(Ŷ_55) = 7.5 [1/10 + (55 − 50)²/3400] = 0.80515
Hence
s(Ŷ_55) = √0.80515 = 0.8973
Using the regression function we get
Ŷ_55 = b_0 + b_1 × 55 = 10 + 2 × 55 = 120
The 90% confidence interval for the mean E(Y_55) is given by
(Ŷ_h − t(1 − α/2; n−2) s(Ŷ_h), Ŷ_h + t(1 − α/2; n−2) s(Ŷ_h))
= (120 − 1.860 × 0.8973, 120 + 1.860 × 0.8973) = (118.3, 121.7)
where t(1 − α/2; n−2) = t(0.95; 8) = 1.860.
For a new observation we know that
s²(Y_h(new)) = s²(Ŷ_h) + MSE
Let X_h(new) = 55. We get
s²(Y_55(new)) = s²(Ŷ_55) + MSE = 0.80515 + 7.5 = 8.3052
and
s(Y_55(new)) = √8.3052 = 2.8819
The prediction interval is given by
(Ŷ_h − t(1 − α/2; n−2) s(Y_h(new)), Ŷ_h + t(1 − α/2; n−2) s(Y_h(new)))
For the 90% confidence level we get
(120 − 1.860 × 2.8819, 120 + 1.860 × 2.8819) = (114.6, 125.4)
where t(1 − α/2; n−2) = t(0.95; 8) = 1.860.

Analysis of variance table.
Basic table. The breakdowns of the total sum of squares and associated degrees of freedom are displayed in the form of an analysis of variance table (ANOVA table):
source of variation   SS                     df     MS                E(MS)
regression            SSR = Σ(Ŷ_i − Ȳ)²      1      MSR = SSR/1       σ² + β_1² Σ(X_i − X̄)²
error                 SSE = Σ(Y_i − Ŷ_i)²    n − 2  MSE = SSE/(n−2)   σ²
total                 SSTO = Σ(Y_i − Ȳ)²     n − 1
For our example:
source of variation   SS      df   MS
regression            13600   1    13600
error                 60      8    7.5
total                 13660   9
F test of β_1 = 0 versus β_1 ≠ 0.
The hypotheses:
H_0: β_1 = 0
H_a: β_1 ≠ 0
Test statistic:
F* = MSR/MSE = 13600/7.5 = 1813.3
The decision rule:
If F* ≤ F(1 − α; 1, n−2), conclude H_0;
If F* > F(1 − α; 1, n−2), conclude H_a.
If α = 0.05 then F(1 − α; 1, n−2) = F(0.95; 1, 8) = 5.32.
Since F* = 1813.3 > 5.32 we conclude H_a, that β_1 ≠ 0.
Coefficient of determination:
r² = (SSTO − SSE)/SSTO = SSR/SSTO = 1 − SSE/SSTO = 1 − 60/13660 = 0.99561
Coefficient of correlation:
r = √(r²) = √0.99561 = 0.9978
(a plus sign since the slope of the fitted regression line is positive).

PROBLEMS.

Question 1.
The results of a certain experiment are shown below.
i     1   2   3   4   5   6   7    8   9
X_i   7   6   5   1   5   4   7    3   4
Y_i   97  86  78  10  75  62  101  39  53
i     10  11   12  13  14  15   16  17  18
X_i   2   8    5   2   5   7    1   4   5
Y_i   33  118  65  25  71  105  17  49  68
Σ(X_i − X̄)² = 74.5,  Σ(Y_i − Ȳ)(X_i − X̄) = 1098.
1) Obtain the estimated regression function. 2) Plot the estimated regression function and the data. 3) Interpret b_0 and b_1. 4) Find the 90% confidence intervals for β_0 and β_1, and interpret them. 5) Test H_0: β_1 = 0 versus H_a: β_1 ≠ 0 using t* and ANOVA, using α = 0.05.
6) Find the 90% confidence interval for the mean of the response variable corresponding to the level of the explanatory variable equal to 5.
7) Find 95% prediction limits for a new observation of the response variable corresponding to the level of the explanatory variable equal to 5.
8) Obtain the residuals e_i. 9) Estimate σ² and σ.
Question 2.
The results of a certain experiment are shown below.
i     1   2   3   4   5   6   7   8   9   10
X_i   1   0   2   0   3   1   0   1   2   0
Y_i   16  9   17  12  22  13  8   15  19  11
1) Obtain the estimated regression function. 2) Plot the estimated regression function and the data. 3) Interpret b_0 and b_1. 4) Find the 95% confidence intervals for β_0 and β_1, and interpret them. 5) Test H_0: β_1 = 0 versus H_a: β_1 ≠ 0 using t* and ANOVA, using α = 0.05. 6) Find the 95% confidence interval for the mean of the response variable corresponding to the level of the explanatory variable equal to 3.
7) Find 90% prediction limits for a new observation of the response variable corresponding to the level of the explanatory variable equal to 3.
8) Obtain the residuals e_i. 9) Estimate σ² and σ. 10) Compute Σ e_i².
Question 3.
In a test of the alternatives H_0: β_1 ≤ 0 versus H_a: β_1 > 0, a student concluded H_0. Does this conclusion imply that there is no linear association between X and Y? Explain.
Question 4.
Show that b_0 as defined in (3.19) is an unbiased estimator of β_0.
Question 5.
Obtain the likelihood function for the sample observations Y_1, ..., Y_n given X_1, ..., X_n if the normal model is assumed to be applicable.
Question 6.
The following data were obtained in the study of solution concentration.
i     1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
X_i   9    9    9    7    7    7    5    5    5    3    3    3    1    1    1
Y_i   0.07 0.09 0.08 0.16 0.17 0.21 0.49 0.58 0.53 1.22 1.15 1.07 2.84 2.57 3.10
Summary calculational results are: Σ X_i = 75, Σ Y_i = 14.33, Σ X_i² = 495, Σ Y_i² = 29.2117, Σ X_iY_i = 32.77.
1) Fit a linear regression function. 2) Perform an F test to determine whether or not there is lack of fit of a linear regression function. Use α = 0.05.
Question 7.
The following data were obtained in a certain study.
i     1   2   3   4   5   6    7    8    9    10   11   12
X_i   1   1   1   2   2   2    2    4    4    4    5    5
Y_i   6.2 5.8 6.0 9.7 9.8 10.3 10.2 17.8 17.9 18.3 21.9 22.1
Summary calculational results are: Σ X_i = 33, Σ Y_i = 156, Σ X_i² = 117, Σ Y_i² = 2448.5, Σ X_iY_i = 534.
1) Fit a linear regression function. 2) Perform an F test to determine whether or not there is lack of fit of a linear regression function. Use α = 0.05.

RESIDUALS
A residual e_i, as defined in (2.16), is
e_i = Y_i − Ŷ_i
As such, it may be regarded as the observed error, in distinction to the unknown true error ε_i in the regression model:
(4.2)  ε_i = Y_i − E(Y_i)
Properties of residuals.
The mean of the n residuals e_i is, by (2.17),
(4.3)  ē = (1/n) Σ e_i = 0
where ē denotes the mean of the residuals.
The variance of the n residuals is defined as follows:
(4.4)  Σ(e_i − ē)²/(n − 2) = Σ e_i²/(n − 2) = SSE/(n − 2) = MSE
If the model is appropriate, MSE is an unbiased estimator of the variance of the error terms, σ².
Standardized residuals.
Since the standard deviation of the error terms ε_i is σ, which is estimated by √MSE, we define the standardized residual as follows:
(4.5)  (e_i − ē)/√MSE = e_i/√MSE
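A minimal sketch of (4.5) (assuming NumPy, and reusing the ten-observation example data): compute the residuals, divide by √MSE, and check what fraction falls between −1 and +1:

    import numpy as np

    X = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
    Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
    n = len(X)

    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    e = Y - (b0 + b1 * X)                    # residuals (2.16)
    MSE = np.sum(e ** 2) / (n - 2)

    std_resid = e / np.sqrt(MSE)             # standardized residuals (4.5)
    print(std_resid)
    print(np.mean(np.abs(std_resid) <= 1))   # fraction within +/- 1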

The following are six important departures from model (3.1), the simple linear regression model with normal errors:
1) The regression function is not linear.
2) The error terms do not have constant variance.
3) The error terms are not independent.
4) The model fits all but one or a few outlier observations.
5) The error terms are not normally distributed.
6) One or several important independent variables have been omitted from the model.
We take up now some informal ways in which graphs of residuals can be analyzed to provide information on whether any of the six types of departures from the simple linear regression model (3.1) are present. To study this we plot the residuals against the explanatory variable X. The following prototype residual plots (a)–(d) are considered (figures not reproduced).
Figure (a) shows the prototype residual plot against X if the linear model is appropriate. The residuals should tend to fall within a horizontal band centered around 0, displaying no systematic tendency to be positive or negative.
Figure (b) shows a prototype departure from the linear regression model indicating the need for a curvilinear regression function. Here the residuals tend to vary in a systematic fashion between being positive and negative. A similar interpretation applies if the picture is convex (like x²).
Figure (c) shows a prototype situation in which the error terms do not have constant variance (the error variance increases with X). A similar picture (but oriented differently) arises when the error variance decreases with X.
Whenever data are obtained in a time sequence it is a good idea to plot the residuals against time, even though time has not been explicitly incorporated as a variable in the model. The purpose is to see if there is any correlation between the error terms over time. An example of a time-related effect is shown in figure (d). If the errors are time independent we would expect the residuals to follow the pattern in figure (a).

Presence of outliers.
Outliers are extreme observations. In a residual plot, they are points that lie far beyond the scatter of the remaining residuals, perhaps four or more standard deviations from zero. The figure referred to here (not reproduced) presents standardized residuals and contains one outlier, which is circled. Outliers can create great difficulty. When we encounter one, our first suspicion is that the observation resulted from a mistake or other extraneous effect, and hence should be discarded. On the other hand, outliers may convey significant information. A safe rule is to discard an outlier if there is direct evidence that it represents an error in recording, a miscalculation, a malfunction of equipment, or a similar type of circumstance.

The normality of the error terms can be studied informally by examining the residuals in a variety of graphic ways. One can construct a histogram of the residuals and see if gross departures from normality are shown by it. Another possibility is to determine whether, say, about 68 percent of the standardized residuals e_i/√MSE fall between −1 and +1, or about 90 percent fall between −1.64 and +1.64. (If the sample size is small, the corresponding t values would be used.)
Another possibility is to prepare a normal probability plot of the residuals. This is a plot of the ordered residuals against their expected values under normality. To find the expected value of the i-th smallest residual under normality we use the following expression:
(4.6)  √MSE · z[(i − 0.375)/(n + 0.25)]
where z(A), as usual, denotes the A·100 percentile of the standard normal distribution (so that F_{N(0,1)}(z(A)) = A).
One method of assessing the linearity of the normal probability plot is to calculate the coefficient of correlation (3.73) relating the residuals e_i to their expected values under normality. A high value of the coefficient of correlation, say 0.9 or more, is indicative of normality.

The following test is used for determining whether or not a specified regression function adequately fits the data. The test assumes that the observations Y for a given X are 1) independent, 2) normally distributed, and 3) the distributions of Y have the same variance σ². The lack of fit test requires repeat observations at one or more X levels. Repeated trials for the same level of the independent variable of the type described are called replications. The resulting observations are called replicates.
Decomposition of SSE.
Pure error component. The basic idea for the first component of SSE rests on the fact that there are replications at some levels of X. Let us denote the i-th observation for the j-th level of X by Y_{i,j}, where i = 1, 2, ..., n_j (n_j is the number of observations at the j-th level of X) and j = 1, 2, ..., c (c is the number of observed levels of X). Let
Ȳ_j = (1/n_j) Σ_{i=1}^{n_j} Y_{i,j}
be the sample mean of Y at the j-th level of X. The sum of squared deviations of Y at the j-th level of X is equal to
(4.8)  Σ_{i=1}^{n_j} (Y_{i,j} − Ȳ_j)²
Then we add these sums of squares over all levels of X and denote the total by SSPE:
(4.9)  SSPE = Σ_{j=1}^{c} Σ_{i=1}^{n_j} (Y_{i,j} − Ȳ_j)²
SSPE stands for pure error sum of squares. The degrees of freedom associated with SSPE are n − c (where n = Σ_{j=1}^{c} n_j is the number of all observations). The pure error mean square MSPE is given by:
(4.11)  MSPE = SSPE/(n − c)
Lack of fit component. The second component of SSE is:
(4.12)  SSLF = SSE − SSPE
where SSLF denotes the lack of fit sum of squares. It can be shown that
(4.13)  SSLF = Σ_{j=1}^{c} n_j (Ȳ_j − Ŷ_j)²
where Ŷ_j denotes the fitted value when X = X_j. There are c − 2 degrees of freedom associated with SSLF. Thus, the lack of fit mean square is
(4.14)  MSLF = SSLF/(c − 2)

F test.
Test statistic:
(4.15)  F* = MSLF/MSPE
The hypotheses:
(4.17)  H_0: E(Y) = β_0 + β_1 X
        H_a: E(Y) ≠ β_0 + β_1 X
The decision rule:
(4.18)  If F* ≤ F(1 − α; c − 2, n − c), conclude H_0;
        If F* > F(1 − α; c − 2, n − c), conclude H_a.

EXAMPLE (the solution concentration data of Question 6):
i     1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
X_i   9    9    9    7    7    7    5    5    5    3    3    3    1    1    1
Y_i   0.07 0.09 0.08 0.16 0.17 0.21 0.49 0.58 0.53 1.22 1.15 1.07 2.84 2.57 3.10
Summary calculational results are: Σ X_i = 75, Σ Y_i = 14.33, Σ X_i² = 495, Σ Y_i² = 29.2117, Σ X_iY_i = 32.77.
1) Fit a linear regression function.
2) Perform an F test to determine whether or not there is lack of fit of a linear regression function. Use α = 0.05.

Solution: 1) We have
b_1 = [Σ X_iY_i − (Σ X_i)(Σ Y_i)/n] / [Σ X_i² − (Σ X_i)²/n] = [32.77 − (75)(14.33)/15] / [495 − 75²/15] = −0.324
and
b_0 = Ȳ − b_1X̄ = (1/n) Σ Y_i − b_1 (1/n) Σ X_i = (1/15)(14.33) + 0.324 × (1/15)(75) = 2.5753
Therefore
Ŷ_i = 2.5753 − 0.324 X_i
2) F test for lack of fit.
We have c = 5 levels of X and 3 replicates at each level (hence each n_j = 3 and n = 15). The sample means of Y at each level of X are
Ȳ_1 = (1/3)(3.10 + 2.57 + 2.84) = 2.8367 at level X = 1
Ȳ_2 = (1/3)(1.07 + 1.15 + 1.22) = 1.1467 at level X = 3
Ȳ_3 = (1/3)(0.53 + 0.58 + 0.49) = 0.5333 at level X = 5
Ȳ_4 = (1/3)(0.21 + 0.17 + 0.16) = 0.18 at level X = 7
Ȳ_5 = (1/3)(0.08 + 0.09 + 0.07) = 0.08 at level X = 9
SSPE = Σ_j Σ_i (Y_{i,j} − Ȳ_j)²
     = (2.84 − 2.8367)² + (2.57 − 2.8367)² + (3.10 − 2.8367)²
     + (1.22 − 1.1467)² + (1.15 − 1.1467)² + (1.07 − 1.1467)²
     + (0.49 − 0.5333)² + (0.58 − 0.5333)² + (0.53 − 0.5333)²
     + (0.16 − 0.18)² + (0.17 − 0.18)² + (0.21 − 0.18)²
     + (0.07 − 0.08)² + (0.09 − 0.08)² + (0.08 − 0.08)²
     = 0.1574
MSPE = SSPE/(n − c) = 0.1574/(15 − 5) = 0.01574
SSE = Σ(Y_i − Ŷ_i)² = Σ Y_i² − b_0 Σ Y_i − b_1 Σ X_iY_i = 29.2117 − 2.5753 × 14.33 + 0.324 × 32.77 = 2.9251
SSLF = SSE − SSPE = 2.9251 − 0.1574 = 2.7677
MSLF = SSLF/(c − 2) = 2.7677/(5 − 2) = 0.92257
The hypotheses:
H_0: E(Y) = β_0 + β_1 X
H_a: E(Y) ≠ β_0 + β_1 X
Test statistic:
F* = MSLF/MSPE = 0.92257/0.01574 = 58.613
The decision rule:
If F* ≤ F(1 − α; c − 2, n − c), conclude H_0;
If F* > F(1 − α; c − 2, n − c), conclude H_a.
F(1 − α; c − 2, n − c) = F(0.95; 3, 10) = 3.71.
Since F* = 58.613 > 3.71 = F(1 − α; c − 2, n − c), we conclude H_a.
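The whole lack-of-fit calculation above can be reproduced in a few lines (assuming NumPy/SciPy); the sketch below recovers SSPE ≈ 0.157, F* ≈ 58.6 and the critical value 3.71:

    import numpy as np
    from scipy import stats

    X = np.array([9, 9, 9, 7, 7, 7, 5, 5, 5, 3, 3, 3, 1, 1, 1], dtype=float)
    Y = np.array([0.07, 0.09, 0.08, 0.16, 0.17, 0.21, 0.49, 0.58, 0.53,
                  1.22, 1.15, 1.07, 2.84, 2.57, 3.10])
    n = len(X)

    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    SSE = np.sum((Y - b0 - b1 * X) ** 2)

    levels = np.unique(X)                                    # the c distinct X levels
    c = len(levels)
    SSPE = sum(np.sum((Y[X == x] - Y[X == x].mean()) ** 2) for x in levels)  # (4.9)
    SSLF = SSE - SSPE                                        # (4.12)

    F_star = (SSLF / (c - 2)) / (SSPE / (n - c))             # (4.15)
    F_crit = stats.f.ppf(0.95, c - 2, n - c)                 # F(0.95; 3, 10) = 3.71
    print(F_star, F_crit, F_star > F_crit)                   # approx. 58.6, 3.71, True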

PROBLEMS

Question 1.
Distinguish between:
1) residual and standardized residual
2) E(ε_i) = 0 and ē = 0
3) error term and residual.
Question 2.
The fitted values and residuals are
i     1     2     3     4     5     6     7     8     9     10
Ŷ_i   2.92  2.33  2.25  1.58  2.08  3.51  3.34  2.67  2.25  1.91
e_i   0.18  −0.03 0.75  0.32  0.42  0.19  0.06  −0.07 0.55  −0.31
i     11    12    13    14    15    16    17    18    19    20
Ŷ_i   2.42  2.84  2.50  3.59  2.16  1.91  2.50  3.26  1.74  2.25
e_i   −0.42 0.06  −0.20 −0.39 −0.36 −0.51 −0.50 0.54  0.46  −0.75
a) Plot the residuals e_i against the fitted values Ŷ_i. What departures from regression model (3.1) can be studied from this plot? What are your findings?
b) Prepare a normal probability plot. Also calculate the coefficient of correlation between the ordered residuals and their expected values under normality. What is your conclusion?
Question 3.
The following data were obtained in the study of solution concentration.
i     1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
X_i   8    8    8    6    6    6    4    4    4    2    2    2    0    0    0
Y_i   0.65 0.87 0.79 0.15 0.18 0.22 0.51 0.57 0.55 1.23 1.17 1.09 2.91 2.61 3.11
a) Fit a linear regression function.
b) Perform an F test to determine whether or not there is lack of fit of a linear regression function. Use α = 0.025.
c) Does the test in part b) indicate what regression function is appropriate when it leads to the conclusion that lack of fit of a linear regression function exists? Explain.
Question 4.
The following data were obtained in a study of the relation between diastolic blood pressure (Y) and age (X) for boys 5 to 13 years old.
i     1   2   3   4   5   6   7   8
X_i   5   8   11  7   13  12  12  6
Y_i   63  67  74  64  75  69  90  60
a) Assuming regression model (3.1) is appropriate, obtain the estimated regression function and plot the residuals e_i against X_i. What does your residual plot show?
b) Omit observation 7 from the data and obtain the estimated regression line based on the remaining seven observations. Compare this estimated regression function to that obtained in part a). What can you conclude about the effect of observation 7?
c) Using your fitted regression function in part b), obtain a 99 percent prediction interval for a new observation Y at X = 12. Does observation Y_7 fall outside this prediction interval? What is the significance of this?
Question 5.
The following data were obtained in a certain study.
i     1   2   3   4   5   6    7    8    9    10   11   12
X_i   1   1   1   2   2   3    3    3    3    5    5    5
Y_i   4.8 4.9 5.1 7.9 8.3 10.9 10.8 11.3 11.1 16.5 17.3 17.1
Summary calculational results are: Σ X_i = 34, Σ Y_i = 126, Σ X_i² = 122, Σ Y_i² = 1554.66, Σ X_iY_i = 434.
1) Fit a linear regression function.
2) Perform an F test to determine whether or not there is lack of fit of a linear regression function. Use α = 0.05.

General approach.
The least squares criterion (2.8):
Q = Σ_{i=1}^{n} (Y_i − β_0 − β_1 X_i)²
weights each observation equally. There are times when some observations should receive greater weight and others smaller weight. The weighted least squares criterion for simple linear regression is
(5.31)  Q_w = Σ_{i=1}^{n} w_i (Y_i − β_0 − β_1 X_i)²
where w_i is the given weight of the ith observation. Minimizing Q_w with respect to β_0 and β_1 leads to the normal equations
(5.32)  Σ w_iY_i = b_0 Σ w_i + b_1 Σ w_iX_i
        Σ w_iX_iY_i = b_0 Σ w_iX_i + b_1 Σ w_iX_i²
Solving, we get:
(5.33a)  b_1 = [Σ w_iX_iY_i − (Σ w_iX_i)(Σ w_iY_i)/Σ w_i] / [Σ w_iX_i² − (Σ w_iX_i)²/Σ w_i]
(5.33b)  b_0 = (Σ w_iY_i − b_1 Σ w_iX_i)/Σ w_i
Note that if all weights are equal, so that w_i is the same for all i, the normal equations (5.32) for weighted least squares reduce to the ones for unweighted least squares (2.10). A very popular choice of weights is
(5.35)  w_i = 1/X_i²
This particular weight relates to the error term variance (the case where σ_i² = kX_i²), which is frequently encountered in business and economics.
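A minimal sketch of the weighted least squares formulas (5.33a) and (5.33b) (assuming NumPy; the small data set here is purely illustrative, with weights w_i = 1/X_i² as in (5.35)):

    import numpy as np

    # Illustrative data only; any (X, Y) with error variance growing with X would do.
    X = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
    Y = np.array([5.1, 9.8, 14.3, 21.0, 24.2, 32.5])
    w = 1.0 / X ** 2                                         # (5.35)

    Sw = np.sum(w)
    SwX = np.sum(w * X)
    SwY = np.sum(w * Y)
    SwXY = np.sum(w * X * Y)
    SwXX = np.sum(w * X ** 2)

    b1 = (SwXY - SwX * SwY / Sw) / (SwXX - SwX ** 2 / Sw)    # (5.33a)
    b0 = (SwY - b1 * SwX) / Sw                               # (5.33b)
    print(b0, b1)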

PROBLEMS

Question 1.
Data from a study of the relation between the size of a bid in million rands (X) and the cost to the firm of preparing the bid in thousands of rands (Y) for 12 recent bids are presented in the table below:
i     1     2     3     4    5    6     7     8     9      10   11    12
X_i   2.13  1.21  11.0  6.0  5.6  6.91  2.97  3.35  10.39  1.1  4.36  8.0
Y_i   15.5  11.1  62.6  35.4 24.9 28.1  15.0  23.2  42.0   10   20    47.5
The scatter plot strongly suggests that the error variance increases with X. Fit the weighted least squares regression line using weights w_i = 1/X_i².
Question 2.
Data from a study of computer-assisted learning by 12 students, showing the total number of responses in completing a lesson (X) and the cost of computer time (Y, in cents), follow:
i     1   2   3   4   5   6   7   8   9   10  11  12
X_i   16  14  22  10  14  17  10  13  19  12  18  11
Y_i   77  70  85  50  62  70  52  63  88  57  81  54
Fit the weighted least squares regression line using weights w_i = 1/X_i².

MATRICES.
A matrix A has elements denoted by a_{i,k}, where i refers to the row and k to the column:
A = [a_{i,k}]
Example:
         Column 1   Column 2   Column 3
Row 1     6          5          1
Row 2    13          7          2
Row 3    −1          5          2
A matrix with r rows and k columns will be referred to as an r×k matrix.
The transpose of a matrix A, denoted by A′, is the matrix obtained by interchanging the corresponding columns and rows of A. For example, let A be the 3×2 matrix given by
A = | 2  5 |
    | 7 10 |
    | 3  4 |
that is, a_{1,1} = 2, a_{1,2} = 5, a_{2,1} = 7, a_{2,2} = 10, a_{3,1} = 3, a_{3,2} = 4. Then
A′ = | 2  7  3 |
     | 5 10  4 |
that is, writing B = A′, b_{1,1} = a_{1,1}, b_{1,2} = a_{2,1}, b_{1,3} = a_{3,1}, b_{2,1} = a_{1,2}, b_{2,2} = a_{2,2}, b_{2,3} = a_{3,2}.
Two matrices A and B are said to be equal if they have the same dimension and all corresponding elements are equal.
Let us notice that if Y is an n×1 matrix (a column vector) then
Y = | Y_1 |       and   Y′ = [ Y_1  Y_2  ...  Y_n ]
    | Y_2 |
    |  :  |
    | Y_n |
Adding or subtracting two matrices requires that they have the same dimension. The sum of two matrices is another matrix whose elements each consist of the sum of the corresponding elements of the two matrices, e.g.:
| 1 4 |   | 2 3 |   | 1+2 4+3 |   | 3 7 |
| 2 5 | + | 1 4 | = | 2+1 5+4 | = | 3 9 |
| 3 6 |   | 1 3 |   | 3+1 6+3 |   | 4 9 |
The difference of two matrices is another matrix whose elements each consist of the difference of the corresponding elements of the two matrices, e.g.:
| 1 4 |   | 2 3 |   | 1−2 4−3 |   | −1 1 |
| 2 5 | − | 1 4 | = | 2−1 5−4 | = |  1 1 |
| 3 6 |   | 1 3 |   | 3−1 6−3 |   |  2 3 |
Multiplication of a matrix A by a matrix B: the product AB is only defined when the number of columns in A equals the number of rows in B. In general, if A has dimension r×c and B has dimension c×s, then AB has dimension r×s, with elements
(AB)_{i,j} = Σ_{k=1}^{c} a_{i,k} b_{k,j}
(we multiply the elements of the i-th row of A by the corresponding elements of the j-th column of B and put the sum of the products as the (i,j)-th element of the resulting matrix).

The determinant of A will be denoted by either |A| or det(A). For 2×2 and 3×3 matrices,
det | a b | = ad − cb
    | c d |
det | a b c | = aei − afh − bdi + cdh + bfg − ceg
    | d e f |
    | g h i |
A formula for det(A) in the case of higher dimensions is given in Theorem 19.

Special types of matrices.
Symmetric matrix: if A = A′ then A is said to be a symmetric matrix.
Diagonal matrix: a diagonal matrix is a square matrix whose off-diagonal elements are all zeros, e.g.:
C = | 4 0  0 0 |
    | 0 1  0 0 |
    | 0 0 −1 0 |
    | 0 0  0 7 |
so if C = [c_{i,j}] then c_{i,j} = 0 for i ≠ j.
Identity matrix: the identity matrix, denoted by I, is a matrix whose diagonal elements are equal to 1 and all other elements equal to zero, e.g.
I_{4×4} = | 1 0 0 0 |
          | 0 1 0 0 |
          | 0 0 1 0 |
          | 0 0 0 1 |
Inverse matrix: let A be a square matrix. If there exists a matrix B such that AB = I, then B is called the inverse of A, denoted by A⁻¹. Also, if AB = I, then it can be shown that BA = I. When there exists a matrix B such that AB = BA = I, the matrix A is said to be nonsingular; in the contrary case, A is said to be singular.

Theorem 1
If a matrix has an inverse, the inverse is unique.
Theorem 2
If A has an inverse, then A⁻¹ has an inverse and (A⁻¹)⁻¹ = A.
Theorem 3
If A and B are nonsingular matrices, then AB has an inverse and (AB)⁻¹ = B⁻¹A⁻¹. This can be extended to any finite number of matrices.
Theorem 4
If A is a nonsingular matrix and k is a nonzero constant, then (kA)⁻¹ = (1/k)A⁻¹.
Theorem 5
If A and B are m×n matrices, and a, b are scalars, then
(aA)′ = (Aa)′ = A′a = aA′
and
(aA + bB)′ = aA′ + bB′.
Theorem 6
If A is any matrix, then (A′)′ = A.
Theorem 7
Let A and B be any matrices such that AB is defined; then (AB)′ = B′A′. This can be extended to any finite number of matrices.
Theorem 8
If D is a diagonal matrix, then D = D′.
Theorem 9
If A is any matrix, then AA′ and A′A are symmetric.
Theorem 10
If A is a nonsingular matrix, then A′ and A⁻¹ are nonsingular and (A⁻¹)′ = (A′)⁻¹.
The matrices A, B, C discussed in the remainder of this section are assumed to have size n×n.

Theorem 11
det(A) = det(A′).
Corollary 12
Any theorem about det(A) that is true for rows (columns) of a matrix A is true for columns (rows).
Theorem 13
If two rows (columns) of a matrix are interchanged, the determinant of the matrix changes sign.
Theorem 14
If each element of the i-th row of an n×n matrix A contains a given factor k, then we may write det(A) = k·det(B), where the rows of B are the same as the rows of A except that the factor k has been factored out of each element of the i-th row of A.
Theorem 15
If each element of a row of the matrix A is zero, then det(A) = 0.
Theorem 16
If two rows of a matrix A are identical, then det(A) = 0.
Theorem 17
The determinant of a matrix is not changed if the elements of the i-th row are multiplied by a scalar k and the results are added to the corresponding elements of the h-th row, h ≠ i.
Theorem 18
If A and B are n×n matrices, then det(AB) = det(A)·det(B). This result can be extended to any finite number of matrices.

Let A be any m×n matrix. From this matrix, if one deletes any set of r < m rows and any set of s < n columns, the matrix of the remaining elements is a submatrix of A. If A is an n×n matrix and the i-th row and j-th column are deleted, the determinant of the remaining matrix, denoted by m_{ij}, is called the minor of a_{ij}. We call A_{ij} the cofactor of the element a_{ij}, where
A_{ij} = (−1)^{i+j} m_{ij}
Theorem 19
Let A_{ij} be the cofactor of a_{ij}. Then
det(A) = Σ_{j=1}^{n} a_{ij} A_{ij}
for any i.
Theorem 20
Let A be nonsingular (this is equivalent to det(A) ≠ 0). Then
A⁻¹ = (1/det(A)) [A_{ij}]′
where [A_{ij}] is the matrix of cofactors.

Example:
Let A = | 1 0 1 |
        | 2 1 2 |
        | 0 4 6 |
Then det(A) = 6, and the cofactors (writing det[a b; c d] for a 2×2 determinant with rows separated by semicolons) are
A_{1,1} = (−1)^{1+1} det[1 2; 4 6] = −2,    A_{1,2} = (−1)^{1+2} det[2 2; 0 6] = −12,
A_{1,3} = (−1)^{1+3} det[2 1; 0 4] = 8,     A_{2,1} = (−1)^{2+1} det[0 1; 4 6] = 4,
A_{2,2} = (−1)^{2+2} det[1 1; 0 6] = 6,     A_{2,3} = (−1)^{2+3} det[1 0; 0 4] = −4,
A_{3,1} = (−1)^{3+1} det[0 1; 1 2] = −1,    A_{3,2} = (−1)^{3+2} det[1 1; 2 2] = 0,
A_{3,3} = (−1)^{3+3} det[1 0; 2 1] = 1.
Hence
[A_{i,j}] = | −2 −12  8 |        [A_{i,j}]′ = |  −2   4  −1 |
            |  4   6 −4 |                     | −12   6   0 |
            | −1   0  1 |                     |   8  −4   1 |
and
A⁻¹ = (1/6)[A_{i,j}]′ = | −1/3   2/3  −1/6 |
                        |  −2     1     0  |
                        |  4/3  −2/3   1/6 |
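The cofactor computation above can be checked with NumPy (a sketch; numpy.linalg.inv uses a different algorithm internally but returns the same inverse here):

    import numpy as np

    A = np.array([[1.0, 0.0, 1.0],
                  [2.0, 1.0, 2.0],
                  [0.0, 4.0, 6.0]])

    print(np.linalg.det(A))                     # 6.0 (up to floating point)
    A_inv = np.linalg.inv(A)
    print(A_inv)                                # matches (1/6) * adjugate above
    print(np.allclose(A @ A_inv, np.eye(3)))    # True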

RANK OF MATRICES
An n×m matrix A is said to be of rank r if the size of the largest nonsingular square submatrix of A is r. This definition is rather difficult to apply directly to find the rank of any given matrix. Each of the following operations is called an elementary transformation of a matrix A:
1) The interchange of two rows (or two columns) of A.
2) The multiplication of the elements of a row (or a column) of A by the same nonzero scalar k.
3) The addition of the elements of a row (or a column) of A, after they have been multiplied by a scalar k, to the corresponding elements of another row (or another column) of A.
We define the inverse of an elementary transformation as the transformation that restores the resulting matrix to the original form of A.
Theorem 21
The inverse of an elementary transformation of a matrix is an elementary transformation of the same type.
Theorem 22
The size and rank of a matrix are not altered by an elementary transformation of the matrix.
Two matrices that have the same size and rank are said to be equivalent.
Theorem 23
Every elementary matrix has an inverse of the same type.
Theorem 24
Any nonsingular matrix can be written as the product of elementary matrices.
Theorem 25
The size and rank of the matrix A are not altered by premultiplying and postmultiplying A by elementary matrices.
Theorem 26
If the matrices A and B are nonsingular, then for any matrix C the matrices AC, CB, and ACB all have the same rank (provided that all multiplications are defined).
Theorem 27
If A is an m×n matrix of rank r, then there exist nonsingular matrices P and Q such that PAQ is equal to one of
I,   [ I  0 ],   | I |,   | I 0 |
                 | 0 |    | 0 0 |
(where I is the r×r identity matrix), depending on m, n, r.
Theorem 28
Two matrices A and B of the same size are equivalent if and only if B can be obtained from A by premultiplying and postmultiplying by a finite number of elementary matrices.
Theorem 29
The rank of the product of matrices A and B cannot exceed the rank of either A or B.

PROBLEMS

Question 1.

1 −1 0

a) Find the determinant of A 2 1 1

1 0 3

b) Find A −1

Question 2

Suppose that you want to perform the following operations on 33 matrix A

1) Interchange the first and third row

2) Interchange the first and third columns

3) Multiply the first row by -2 and add the result to the third row

4) Multiply the first column by -2 and add the result to the third column

5) Multiply the second row by -2 and add the result to the third row

6) Multiply the second column by -2 and add the result to the third column

7) Mmultiply the second row by 1/2

8) Multiply the third column by 7

If A =
     0  4  2
     4  2  0
     2  0  1
find the resulting matrix after performing the eight transformations.

Question 3

For the matrices A and X below (the layout here is a reconstruction of the garbled original, with A taken as 4×4 and X as a 4×1 column vector so that both products are defined),

A =   1   2  −1   1          X =   3
      2   1   0   3                4
      0  −3   2   1                1
     −3   0  −1  −5                1

find AX and X′A.

REGRESSION EXAMPLE.

In regression analysis, one basic matrix is the vector Y (n×1), consisting of the n
observations on the dependent variable (written as a column vector):

(6.4)    Y = [ Y_1, Y_2, ..., Y_n ]′

Note that the transpose Y′ is the row vector:

(6.5)    Y′ = [ Y_1  Y_2  ...  Y_n ]

Another basic matrix in regression analysis is the X (n×2) matrix, which is defined
as follows for simple regression analysis:

(6.6)    X =
         1   X_1
         1   X_2
         :   :
         1   X_n

The matrix X consists of a column of 1's and a column containing the n values of
the independent variable X. Note that the transpose of X is

(6.7)    X′ =
         1    1    ...  1
         X_1  X_2  ...  X_n

The regression model

Y_i = E(Y_i) + ε_i        i = 1, 2, ..., n

can be written compactly in matrix notation.

First, let us define the (n×1) vector of mean responses:

(6.9)    E(Y) = [ E(Y_1), E(Y_2), ..., E(Y_n) ]′

and the vector of error terms:

(6.10)   ε = [ ε_1, ε_2, ..., ε_n ]′

Using Y in (6.4) we can write

Y = E(Y) + ε

because

Y_1     E(Y_1) + ε_1
Y_2  =  E(Y_2) + ε_2
:       :
Y_n     E(Y_n) + ε_n

Thus, the observations vector Y equals the sum of two vectors, a vector
containing the expected values and another containing the error terms.

Let us define the vector β of the regression coefficients as follows:

(6.13)   β = [ β_0, β_1 ]′

Then the product Xβ, where X is defined in (6.6), is

(6.14)   Xβ =
         1   X_1                β_0 + β_1 X_1
         1   X_2   [ β_0 ]   =  β_0 + β_1 X_2
         :   :     [ β_1 ]      :
         1   X_n                β_0 + β_1 X_n

Since β_0 + β_1 X_i = E(Y_i), we see that Xβ is the vector of expected values E(Y_i)
for the simple linear regression model, i.e. E(Y) = Xβ, where E(Y) is defined in (6.9).

Another product frequently needed is Y′Y:

(6.15)   Y′Y = [ Y_1  Y_2  ...  Y_n ] [ Y_1, Y_2, ..., Y_n ]′ = Σ Y_i²

Note that Y′Y is a 1×1 matrix, or a scalar. In a compact way we can write the sum of
squares as Y′Y = Σ Y_i².

We also will need X′X:

(6.16)   X′X = [ n        Σ X_i
                 Σ X_i    Σ X_i² ]

and X′Y:

(6.17)   X′Y = [ Σ Y_i
                 Σ X_i Y_i ]

The principal inverse matrix in regression analysis is the inverse of
the matrix X′X. Its determinant is

det [ n       Σ X_i
      Σ X_i   Σ X_i² ]  =  n Σ X_i² − (Σ X_i)²  =  n [ Σ X_i² − (Σ X_i)²/n ]  =  n Σ(X_i − X̄)²

Hence

(6.26)   (X′X)^(−1) = [  Σ X_i² / ( n Σ(X_i − X̄)² )      −Σ X_i / ( n Σ(X_i − X̄)² )
                        −Σ X_i / ( n Σ(X_i − X̄)² )         n / ( n Σ(X_i − X̄)² )     ]

Since Σ X_i = n X̄ we can simplify (6.26):

(6.27)   (X′X)^(−1) = [  Σ X_i² / ( n Σ(X_i − X̄)² )      −X̄ / Σ(X_i − X̄)²
                        −X̄ / Σ(X_i − X̄)²                  1 / Σ(X_i − X̄)²   ]

A random vector or a random matrix contains elements which are random variables.

Expectation of a random vector or matrix.

Suppose we have the following observation vector

Y = [ Y_1, Y_2, Y_3 ]′

The expected value of Y is a vector, denoted by E(Y), which is defined as follows:

E(Y) = [ E(Y_1), E(Y_2), E(Y_3) ]′

In general we can use the following notation:

(6.41)   E(Y) = [ E(Y_i) ]        i = 1, 2, ..., n

and for a random matrix Y with dimension n×p, the expectation is

(6.42)   E(Y) = [ E(Y_{i,k}) ]    i = 1, 2, ..., n;   k = 1, 2, ..., p

Consider again the random vector Y = [ Y_1, Y_2, Y_3 ]′.

Each random variable (coordinate) has a variance σ²(Y_i) and any two random
variables have a covariance σ(Y_i, Y_j). We can assemble these in a matrix called the
variance-covariance matrix of Y, denoted by σ²(Y):

(6.43)   σ²(Y) = [ σ²(Y_1)       σ(Y_1, Y_2)   σ(Y_1, Y_3)
                   σ(Y_2, Y_1)   σ²(Y_2)       σ(Y_2, Y_3)
                   σ(Y_3, Y_1)   σ(Y_3, Y_2)   σ²(Y_3)     ]

Let us notice that

σ²(Y) = E{ [ Y_1 − E(Y_1), Y_2 − E(Y_2), Y_3 − E(Y_3) ]′ [ Y_1 − E(Y_1)  Y_2 − E(Y_2)  Y_3 − E(Y_3) ] }

It follows readily that

(6.44)   σ²(Y) = E[ (Y − E(Y)) (Y − E(Y))′ ]

If Y = [ Y_1, Y_2, ..., Y_n ]′, in general we can write

(6.45)   σ²(Y) = [ σ²(Y_1)       σ(Y_1, Y_2)   ...  σ(Y_1, Y_n)
                   σ(Y_2, Y_1)   σ²(Y_2)       ...  σ(Y_2, Y_n)
                   :             :             ...  :
                   σ(Y_n, Y_1)   σ(Y_n, Y_2)   ...  σ²(Y_n)     ]

Basic theorems

Suppose we premultiply the random vector Y by a constant matrix A:

(6.46)   W = AY

Then

(6.47)   E(A) = A

(6.48)   E(W) = E(AY) = A E(Y)

(6.49)   σ²(W) = σ²(AY) = A σ²(Y) A′
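A small numerical illustration of (6.46)-(6.49) is sketched below (an addition to the notes, not part of them); the matrix A, the mean vector and the variance-covariance matrix are made-up values chosen only to show the mechanics.

import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])               # constant matrix

E_Y   = np.array([5.0, 3.0])              # assumed E(Y)
var_Y = np.array([[2.0, 0.5],
                  [0.5, 1.0]])            # assumed sigma^2(Y)

E_W   = A @ E_Y                           # (6.48): E(W) = A E(Y)
var_W = A @ var_Y @ A.T                   # (6.49): sigma^2(W) = A sigma^2(Y) A'

print(E_W)                                # [2. 8.]
print(var_W)                              # [[2. 1.], [1. 4.]]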

PROBLEMS

Question 1.

For the matrices below obtain A + B, A − B, AC, AB′, B′A

A =  1  4        B =  1  3        C =  3  8  1
     2  6             1  4             5  4  0
     3  8             2  5

Question 2.

For the matrices below obtain A + C, A − C, B′A, AC′

A =  2  1        B =  6        C =  3  8
     3  5             9             8  6
     5  7             3             5  1
     4  8             1             2  4

State the dimension of each resulting matrix.

Question 3.

Find the inverse of each of the following matrices

A =  2  4        B =   4   3   2
     3  1              6   5  10
                      10   1   6

Question 4.

Consider the following functions of the random variables Y_1, Y_2 and Y_3:

W_1 = Y_1 + Y_2 + Y_3
W_2 = Y_1 − Y_2
W_3 = Y_1 − Y_2 − Y_3

a) State the above in matrix notation.
b) Find the expectation of the random vector W = [ W_1, W_2, W_3 ]′.
c) Find the variance-covariance matrix of W.

Example:

a) Suppose the error terms ε_1, ε_2, ε_3 have constant variance σ²(ε_i) = σ² and are uncorrelated, so
σ(ε_i, ε_j) = 0 for i ≠ j. We can then write the variance-covariance matrix for the
random vector ε as follows (using the fact that E(ε_i) = 0, i = 1, 2, 3):

σ²(ε) = E(ε ε′) = [ E(ε_1ε_1)  E(ε_1ε_2)  E(ε_1ε_3)        σ²  0   0
                    E(ε_2ε_1)  E(ε_2ε_2)  E(ε_2ε_3)   =    0   σ²  0    =  σ² I
                    E(ε_3ε_1)  E(ε_3ε_2)  E(ε_3ε_3) ]      0   0   σ²

b) Let us consider W = AY where

W = [ W_1, W_2 ]′  (2×1),     A = [ 1  −1 ; 1  1 ]  (2×2),     Y = [ Y_1, Y_2 ]′  (2×1)

Then

W = AY = [ Y_1 − Y_2, Y_1 + Y_2 ]′

E(W) = [ E(Y_1) − E(Y_2), E(Y_1) + E(Y_2) ]′

and

σ²(W) = A σ²(Y) A′ = [ 1  −1 ] [ σ²(Y_1)      σ(Y_1, Y_2) ] [  1   1 ]
                     [ 1   1 ] [ σ(Y_2, Y_1)  σ²(Y_2)     ] [ −1   1 ]

      = [ σ²(Y_1) + σ²(Y_2) − 2σ(Y_1, Y_2)      σ²(Y_1) − σ²(Y_2)
          σ²(Y_1) − σ²(Y_2)                      σ²(Y_1) + σ²(Y_2) + 2σ(Y_1, Y_2) ]

The simple linear regression model is

(6.50)   Y_i = β_0 + β_1 X_i + ε_i        i = 1, 2, ..., n

This implies

(6.51)   Y_1 = β_0 + β_1 X_1 + ε_1
         Y_2 = β_0 + β_1 X_2 + ε_2
         :
         Y_n = β_0 + β_1 X_n + ε_n

and in matrix terms

(6.52)   Y = [ Y_1, ..., Y_n ]′,    X = [ 1  X_1 ; 1  X_2 ; ... ; 1  X_n ],    β = [ β_0, β_1 ]′,    ε = [ ε_1, ..., ε_n ]′

Now we can write

(6.53)   Y  =  X β  +  ε
        (n×1)  (n×2)(2×1)  (n×1)

since

Y_1      1   X_1               ε_1      β_0 + β_1 X_1 + ε_1
Y_2   =  1   X_2   [ β_0 ]  +  ε_2   =  β_0 + β_1 X_2 + ε_2
:        :   :     [ β_1 ]     :        :
Y_n      1   X_n               ε_n      β_0 + β_1 X_n + ε_n

With respect to the error terms, model (3.1) assumes that E(ε_i) = 0
and σ²(ε_i) = σ², i = 1, 2, ..., n, and that the ε_i are independent normal random
variables. The condition E(ε_i) = 0 in matrix terms is:

(6.54)   E(ε) = [ E(ε_1), E(ε_2), ..., E(ε_n) ]′ = [ 0, 0, ..., 0 ]′ = 0

The condition that the error terms have constant variance σ² and that
all covariances σ(ε_i, ε_j) = 0 for i ≠ j (since the ε_i are independent) is expressed in matrix
terms through the variance-covariance matrix:

(6.55)   σ²(ε) = [ σ²  0   ...  0
                   0   σ²  ...  0        =  σ² I
                   :   :   ...  :
                   0   0   ...  σ² ]

Thus the normal error model (3.1) in matrix terms is:

(6.56′)  Y  =  X β  +  ε
        (n×1)  (n×2)(2×1)  (n×1)

where:
    β is the vector of parameters
    X is the matrix of known constants, namely, the values of the independent variable
    ε is a vector of independent normal random variables with E(ε) = 0
    and σ²(ε) = σ² I.

The least squares normal equations

(6.57)   n b_0 + b_1 Σ X_i = Σ Y_i
         b_0 Σ X_i + b_1 Σ X_i² = Σ X_i Y_i

in matrix terms are:

(6.58)   X′X b = X′Y

where b is the vector of the least squares regression coefficients:

(6.58a)  b = [ b_0, b_1 ]′

One can verify that

X′X = [ n        Σ X_i              X′Y = [ Σ Y_i
        Σ X_i    Σ X_i² ]                   Σ X_i Y_i ]

Using this result we have

[ n       Σ X_i  ] [ b_0 ]   =   [ Σ Y_i     ]
[ Σ X_i   Σ X_i² ] [ b_1 ]       [ Σ X_i Y_i ]

or

n b_0 + b_1 Σ X_i = Σ Y_i
b_0 Σ X_i + b_1 Σ X_i² = Σ X_i Y_i

To obtain the estimated regression coefficients from the normal equations

X′X b = X′Y

by matrix methods, we premultiply both sides by the inverse of X′X (we
assume this exists):

(X′X)^(−1) X′X b = (X′X)^(−1) X′Y

so that we find, since (X′X)^(−1) X′X = I and Ib = b:

(6.59)   b = (X′X)^(−1) X′Y

The estimators b_0 and b_1 in b are the same as those given earlier in (2.10a)
and (2.10b).

Example:

Let us find the estimated regression coefficients for the following data:

   X_i    Y_i     X_i Y_i    X_i²
   30     73      2190       900
   20     50      1000       400
   60     128     7680       3600
   80     170     13600      6400
   40     87      3480       1600
   50     108     5400       2500
   60     135     8100       3600
   30     69      2070       900
   70     148     10360      4900
   60     132     7920       3600
   500    1100    61800      28400    ← TOTALS

Therefore

n = 10,   Σ Y_i = 1100,   Σ X_i = 500,   Σ X_i² = 28400,   Σ X_i Y_i = 61800

Let us now use (6.26) to evaluate (X′X)^(−1). We have

n Σ(X_i − X̄)² = n Σ X_i² − (Σ X_i)² = 10(28400) − (500)² = 34 000

Therefore

(X′X)^(−1) = [ 28400/34000    −500/34000
               −500/34000      10/34000  ]

We also use (6.17) to evaluate X′Y:

X′Y = [ Σ Y_i     ]   =   [ 1100  ]
      [ Σ X_i Y_i ]       [ 61800 ]

hence

b = [ b_0 ]  =  (X′X)^(−1) X′Y  =  [ 28400/34000    −500/34000 ] [ 1100  ]   =   [ 10 ]
    [ b_1 ]                        [ −500/34000      10/34000  ] [ 61800 ]       [  2 ]

so b_0 = 10 and b_1 = 2.

To reduce the error in calculations we can write

(X′X)^(−1) = (1 / n Σ(X_i − X̄)²) [ Σ X_i²    −Σ X_i
                                   −Σ X_i      n     ]

In our case

(X′X)^(−1) = (1/34000) [ 28400   −500
                          −500     10 ]

and

b = (1 / n Σ(X_i − X̄)²) [ Σ X_i²   −Σ X_i ] [ Σ Y_i     ]   =   (1/34000) [ 28400   −500 ] [ 1100  ]   =   [ 10 ]
                         [ −Σ X_i    n     ] [ Σ X_i Y_i ]                 [  −500     10 ] [ 61800 ]       [  2 ]
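The computation of b = (X′X)^(−1) X′Y for the data above can be reproduced with a few lines of NumPy (an added sketch, not part of the original notes).

import numpy as np

X_vals = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)

X = np.column_stack([np.ones_like(X_vals), X_vals])   # n x 2 design matrix: column of 1's and X

XtX = X.T @ X                      # [[10, 500], [500, 28400]]
XtY = X.T @ Y                      # [1100, 61800]
b = np.linalg.solve(XtX, XtY)      # numerically preferable to forming (X'X)^{-1} explicitly

print(b)                           # [10.  2.]   i.e. b_0 = 10, b_1 = 2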

Let the vector of the fitted values Ŷ_i be denoted by Ŷ:

(6.64)   Ŷ = [ Ŷ_1, Ŷ_2, ..., Ŷ_n ]′     (n×1)

and the vector of residuals e_i = Y_i − Ŷ_i be denoted by e:

(6.65)   e = [ e_1, e_2, ..., e_n ]′      (n×1)

In matrix notation we have

(6.66)   Ŷ = Xb

because

Ŷ_1      1   X_1               b_0 + b_1 X_1
Ŷ_2   =  1   X_2   [ b_0 ]  =  b_0 + b_1 X_2
:        :   :     [ b_1 ]     :
Ŷ_n      1   X_n               b_0 + b_1 X_n

Similarly:

(6.67)   e = Y − Ŷ = Y − Xb

Sums of squares.

We begin with SSTO. We know that

(6.68)   SSTO = Σ Y_i² − n Ȳ² = Σ Y_i² − (Σ Y_i)²/n

We also know from (6.15) that Y′Y = Σ Y_i².

Let 1 denote the (n×1) column vector all of whose elements are 1. Using this we have

(6.69)   (1/n) Y′11′Y = (1/n) [ Y_1  Y_2  ...  Y_n ] 1 1′ [ Y_1, Y_2, ..., Y_n ]′
                      = (1/n) (Σ Y_i)(Σ Y_i) = (Σ Y_i)²/n

Hence

(6.70a)  SSTO = Y′Y − (1/n) Y′11′Y

In the same way as for Σ Y_i², we obtain that SSE = Σ e_i² in matrix terms is:

(6.70b)  SSE = e′e = (Y − Xb)′(Y − Xb)

which can be shown to equal:

(6.70c)  SSE = Y′Y − b′X′Y

For SSR = SSTO − SSE in matrix terms we have:

(6.70d)  SSR = b′X′Y − (1/n) Y′11′Y.

Example.

For the data of the previous example,

Y′Y = Σ Y_i² = 134 660

and

b = [ 10, 2 ]′,    X′Y = [ 1100, 61 800 ]′

Hence:

b′X′Y = [ 10   2 ] [ 1100   ]  =  10(1100) + 2(61 800) = 134 600
                   [ 61 800 ]

and

SSE = Y′Y − b′X′Y = 134 660 − 134 600 = 60

Here Y′ = [ 73  50  128  170  87  108  135  69  148  132 ], so that

Y′1 = 1′Y = Σ Y_i = 1100

and

(1/n) Y′11′Y = (1/10)(1100)(1100) = 121 000

and finally:

SSR = b′X′Y − (1/n) Y′11′Y = 134 600 − 121 000 = 13 600
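The matrix expressions (6.70a)-(6.70d) can be checked numerically for the same data; the sketch below is an addition to the notes and simply repeats the computation with NumPy.

import numpy as np

X_vals = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
X = np.column_stack([np.ones_like(X_vals), X_vals])
b = np.linalg.solve(X.T @ X, X.T @ Y)

n = len(Y)
J = np.ones((n, n))                                  # n x n matrix of 1's

SSTO = Y @ Y - Y @ J @ Y / n                         # (6.70a): Y'Y - (1/n) Y'JY   -> 13660
SSE  = Y @ Y - b @ (X.T @ Y)                         # (6.70c): Y'Y - b'X'Y        -> 60
SSR  = b @ (X.T @ Y) - Y @ J @ Y / n                 # (6.70d): b'X'Y - (1/n) Y'JY -> 13600

print(SSTO, SSE, SSR)                                # 13660.0  60.0  13600.0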

An example of a quadratic form of the observations Y_i when n = 2 is:

(6.71)   5Y_1² + 6Y_1Y_2 + 4Y_2²

Note that this expression is a second-degree polynomial containing terms
involving the squares of the observations and the cross product. We can
express (6.71) in matrix terms as follows:

(6.71a)  [ Y_1  Y_2 ] [ 5  3 ] [ Y_1 ]   =   Y′AY
                      [ 3  4 ] [ Y_2 ]

where A is a symmetric matrix of coefficients.

In general, a quadratic form is defined as:

(6.72)   Y′AY = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{i,j} Y_i Y_j       where a_{i,j} = a_{j,i}

A is a symmetric n×n matrix and is called the matrix of the quadratic form.

The ANOVA sums of squares SSTO, SSR and SSE are all quadratic forms.
To see this, we need to express the matrix forms for these sums of squares
in (6.70). We do this by using

(6.73)   1 1′ = J

where the product of the (n×1) and (1×n) vectors of 1's is the n×n matrix J all of whose elements are 1's.
Also, the transpose of b in (6.59) can be obtained using the facts
(A + B)′ = A′ + B′,  (AB)′ = B′A′  and  (A^(−1))′ = (A′)^(−1):

(6.74)   b′ = [ (X′X)^(−1) X′Y ]′ = Y′X(X′X)^(−1)

by noting that X′X is a symmetric matrix so that it equals its transpose.

Hence

(6.75a)  SSTO = Y′ [ I − (1/n) J ] Y

(6.75b)  SSR = Y′ [ X(X′X)^(−1)X′ − (1/n) J ] Y

(6.75c)  SSE = Y′ [ I − X(X′X)^(−1)X′ ] Y

Each of these sums of squares can now be seen to be of the form Y′AY.
It can be shown that the three A matrices

(6.76a)  I − (1/n) J

(6.76b)  X(X′X)^(−1)X′ − (1/n) J

(6.76c)  I − X(X′X)^(−1)X′

are symmetric. Hence, SSTO, SSR, and SSE are quadratic forms, with the matrices
of the quadratic forms given in (6.76).

Regression coefficients

The variance-covariance matrix of b:

(6.77)   σ²(b) = [ σ²(b_0)        σ(b_0, b_1)
                   σ(b_1, b_0)    σ²(b_1)     ]       (2×2)

is

(6.78)   σ²(b) = σ² (X′X)^(−1)

or, using (6.27):

(6.78a)  σ²(b) = [  σ² Σ X_i² / ( n Σ(X_i − X̄)² )      −X̄ σ² / Σ(X_i − X̄)²
                   −X̄ σ² / Σ(X_i − X̄)²                  σ² / Σ(X_i − X̄)²      ]

When MSE is substituted for σ² in (6.78a) we have

(6.79)   s²(b) = [  MSE Σ X_i² / ( n Σ(X_i − X̄)² )      −X̄ MSE / Σ(X_i − X̄)²
                   −X̄ MSE / Σ(X_i − X̄)²                  MSE / Σ(X_i − X̄)²    ]       (2×2)

where s²(b) is the estimated variance-covariance matrix of b. In (6.78a), we can
recognize the variances of b_0 (3.20b) and b_1 (3.3b) and the covariance of b_0
and b_1.

Estimation.

Mean response.

To estimate the mean response at X_h, let us define the vector

(6.81)   X_h = [ 1, X_h ]′   (2×1)       or       X_h′ = [ 1   X_h ]

The fitted value in matrix notation is

(6.82)   Ŷ_h = X_h′ b

since

X_h′ b = [ 1   X_h ] [ b_0 ]  =  b_0 + b_1 X_h = Ŷ_h
                     [ b_1 ]

The variance of Ŷ_h, given earlier in (3.28b), is in matrix notation:

(6.83)   σ²(Ŷ_h) = σ² X_h′ (X′X)^(−1) X_h = X_h′ σ²(b) X_h

where σ²(b) is the variance-covariance matrix of the regression coefficients
in (6.78). Therefore σ²(Ŷ_h) is a function of the variances σ²(b_0), σ²(b_1) and
of the covariance σ(b_0, b_1).

The estimated variance of Ŷ_h, given earlier in (3.30), is in matrix notation:

(6.84)   s²(Ŷ_h) = MSE · X_h′ (X′X)^(−1) X_h = X_h′ s²(b) X_h

where s²(b) is the estimated variance-covariance matrix of the regression
coefficients in (6.79).

The estimated variance s²(Ŷ_h(new)), given earlier in (3.37), is in matrix notation:

(6.85)   s²(Ŷ_h(new)) = MSE + s²(Ŷ_h) = MSE + X_h′ s²(b) X_h = MSE [ 1 + X_h′ (X′X)^(−1) X_h ]

Example:

1) For the data of the examples previously considered,

(X′X)^(−1) = [ 10     500   ]^(−1)  =  [ 28400/34000    −500/34000 ]  =  [  0.8352941    −0.0147059  ]
             [ 500    28400 ]          [ −500/34000      10/34000  ]     [ −0.0147059     0.00029412 ]

We found earlier that MSE = 7.5. Hence

s²(b) = MSE (X′X)^(−1) = 7.5 [  0.8352941    −0.0147059  ]   =   [  6.264706     −0.1102941 ]
                             [ −0.0147059     0.00029412 ]       [ −0.1102941     0.0022059 ]

Thus

s²(b_0) = 6.264706      and      s²(b_1) = 0.0022059

For the same data we will find s²(Ŷ_h) when X_h = 55:

s²(Ŷ_55) = X_h′ s²(b) X_h = [ 1   55 ] [  6.264706     −0.1102941 ] [ 1  ]   =   0.8052
                                       [ −0.1102941     0.0022059 ] [ 55 ]
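The estimated variance-covariance matrix s²(b) and the value s²(Ŷ_h) at X_h = 55 can be verified with the same data (an added sketch; MSE is computed as SSE/(n − 2)).

import numpy as np

X_vals = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
X = np.column_stack([np.ones_like(X_vals), X_vals])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y

n, p = X.shape
SSE = Y @ Y - b @ (X.T @ Y)
MSE = SSE / (n - p)                          # 60 / 8 = 7.5

s2_b = MSE * XtX_inv                         # (6.79): s^2(b) = MSE (X'X)^{-1}
X_h = np.array([1.0, 55.0])
s2_Yhat_h = X_h @ s2_b @ X_h                 # (6.84): s^2(Yhat_h) = X_h' s^2(b) X_h

print(s2_b)                                  # [[ 6.2647 -0.1103], [-0.1103  0.0022]]
print(s2_Yhat_h)                             # approx 0.8051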

The regression results for weighted least squares can be stated in matrix algebra
using the following notation:

(6.88)   W =  [ w_1   0    ...  0
        (n×n)   0     w_2  ...  0
                :     :    ...  :
                0     0    ...  w_n ]

is the diagonal matrix containing the weights w_i.

The weighted normal equations (5.32) can be rewritten as

(6.89)   X′WX b = X′WY

and the weighted least squares estimators are:

(6.90)   b = (X′WX)^(−1) X′WY       (2×1)

The variance-covariance matrix of the weighted least squares estimators is:

(6.91)   σ²(b) = σ² (X′WX)^(−1)     (2×2)

and the estimated variance-covariance matrix is:

(6.92)   s²(b) = MSE_w (X′WX)^(−1)  (2×2)

where MSE_w is based on the weighted squared deviations:

(6.92a)  MSE_w = Σ w_i (Y_i − Ŷ_i)² / (n − 2)
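A minimal sketch of the weighted least squares computations (6.89)-(6.92a) follows (an addition to the notes); the weights w_i = 1/X_i are purely illustrative, chosen as if the error variance grew with X.

import numpy as np

X_vals = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
w = 1.0 / X_vals                                   # hypothetical weights (assumption, not from the notes)

X = np.column_stack([np.ones_like(X_vals), X_vals])
W = np.diag(w)                                     # (6.88): diagonal weight matrix

b_w = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)    # (6.90): b = (X'WX)^{-1} X'WY

resid = Y - X @ b_w
n, p = X.shape
MSE_w = np.sum(w * resid**2) / (n - p)             # (6.92a)
s2_b_w = MSE_w * np.linalg.inv(X.T @ W @ X)        # (6.92)

print(b_w, MSE_w)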

Residuals

For the analysis of residuals, it is useful to recognize that each residual e_i
can be expressed as a linear combination of the observations Y_i. It can be shown
that e, defined in (6.65), equals:

(6.93)   e = (I − H) Y
        (n×1)  (n×n)  (n×1)

where

(6.93a)  H = X(X′X)^(−1)X′       (n×n)

Note from (6.76c) that the matrix I − H is the matrix of the quadratic form
(6.75c) for SSE = Σ e_i².

The square n×n matrix H is called the hat matrix and plays an important
role in regression analysis.

The variance-covariance matrix of e can be derived by means of (6.49):

σ²(W) = σ²(AY) = A σ²(Y) A′

Since e = (I − H)Y, then

σ²(e) = (I − H) σ²(Y) (I − H)′

Now σ²(Y) = σ² I for the normal error model.

The matrix I − H has a special property: it is symmetric and idempotent,

(I − H)(I − H) = I − H

Hence

σ²(e) = σ² (I − H) I (I − H) = σ² (I − H).
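The hat matrix and the estimated variance-covariance matrix of the residuals, MSE (I − H), can be computed directly (an added sketch for the same data as above).

import numpy as np

X_vals = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
X = np.column_stack([np.ones_like(X_vals), X_vals])

H = X @ np.linalg.inv(X.T @ X) @ X.T          # (6.93a): hat matrix
I = np.eye(len(Y))

e = (I - H) @ Y                               # (6.93): residuals as a linear combination of Y
print(np.allclose((I - H) @ (I - H), I - H))  # True: I - H is idempotent

MSE = (Y @ (I - H) @ Y) / (len(Y) - 2)        # SSE = Y'(I - H)Y, divided by n - 2
s2_e = MSE * (I - H)                          # estimated variance-covariance matrix of e
print(np.round(np.diag(s2_e), 3))             # estimated variances of the residuals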

PROBLEMS.

Question 1.

Assume that the normal regression model is applicable.

For the following data given by:

i 1 2 3 4 5

Xi 8 4 0 -4 -8

Y i 7.8 9 10.2 11 11.7

using matrix method find:

1) Y ′ Y 2) X ′ X 3) X ′ Y

4) b 5) ANOVA table

6) covariance-variance matrix s 2 b

Question 2.

Find the matrix A of the quadratic form:

3Y_1² + 10Y_1Y_2 + 6Y_2²

Question 3.

Assume that the normal regression model is applicable.

For the following data given by:


i 1 2 3 4 5 6

Xi 4 1 2 3 3 4

Y i 16 5 10 15 13 22

using matrix method find:

1) Y ′ Y 2) X ′ X 3) X ′ Y 4) b 5) ANOVA table

6) covariance-variance matrix s 2 b

7) Ŷ    8) s²(Ŷ_h(new)) when X_h = 3.5

Question 4.

For the matrix

1 0 4

A 0 3 0

4 0 9

find the quadratic form of the observations Y 1 , Y 2 and Y 3 .

Question 5.

The results of a certain experiments are shown below

i 1 2 3 4 5 6 7 8 9

Xi 7 6 5 1 5 4 7 3 4

Y i 97 86 78 10 75 62 101 39 53

i 10 11 12 13 14 15 16 17 18

Xi 2 8 5 2 5 7 1 4 5

Y i 33 118 65 25 71 105 17 49 68

∑X i − X 2 74. 5, ∑Y i − Y X i − X 1098.

Find: 1) Y ′ Y 2) X ′ X 3) X ′ Y

4) b 5) ANOVA table

Question 6.

The results of a certain experiments are shown below

i 1 2 3 4 5 6 7 8 9 10

Xi 1 0 2 0 3 1 0 1 2 0

Y i 16 9 17 12 22 13 8 15 19 11

Summary calculational results are: ∑ X i 10, ∑ Y i 142, ∑ X 2i 20

∑ Y 2i 2194, ∑ X i Y i 182.

Find: 1) Y ′ Y 2) X ′ X 3) X ′ Y

4) b 5) ANOVA table

Question 7.

The following data were obtained in a certain study.

i 1 2 3 4 5 6 7 8 9 10 11 12

Xi 1 1 1 2 2 2 2 4 4 4 5 5

Y i 6.2 5.8 6 9.7 9.8 10.3 10.2 17.8 17.9 18.3 21.9 22.1

Summary calculational results are: ∑ X i 33, ∑ Y i 156, ∑ X 2i 117

∑ Y 2i 2448. 5, ∑ X i Y i 534.

Find: 1) Y ′ Y 2) X ′ X 3) X ′ Y

4) b 5) ANOVA table


Question 8.

Consider the simple linear regression model

Prove the following

SSR b ′ X ′ Y − 1n Y ′ 11 ′ Y

Question 9.

1) Define the quadratic form.

2) Find the matrix A of the quadratic form:

3Y_1² + 10Y_1Y_2 + 6Y_2²

3) Find the quadratic form for SSR.

Multiple regression analysis is one of the most widely used of all statistical
tools. In many practical situations a number of key independent variables
affect the response variable in important and distinctive ways. Furthermore,
in such cases one will find that predictions of the response variable based on
a model containing only a single independent variable are too imprecise to be
useful. A more complex model, containing additional independent variables,
is more helpful in providing sufficiently precise predictions of the response
variable.

(7.1)    Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ε_i

is called a first-order model with two independent variables. It is linear in the
parameters and linear in the independent variables. Y_i denotes the response in the
i-th trial, and X_{i,1} and X_{i,2} are the values of the two independent variables in the
i-th trial. The parameters of the model are β_0, β_1 and β_2, and the random error
term is ε_i. Assuming that E(ε_i) = 0, the regression function for model (7.1) is:

(7.2)    E(Y) = β_0 + β_1 X_1 + β_2 X_2

Analogous to simple linear regression, where the regression function E(Y) = β_0 + β_1 X
is a line, the regression function (7.2) is a plane.

Note that a point on the response plane corresponds to the mean response E(Y)
at the given combination of levels of X_1 and X_2. A plot of the response plane also shows
a series of observations Y_i corresponding to given levels of the two independent
variables (X_{i,1}, X_{i,2}); each vertical rule in such a plot represents the
difference between Y_i and the mean E(Y_i). Hence, the vertical distance from Y_i to
the response plane represents the error term ε_i = Y_i − E(Y_i). Frequently the regression
function in multiple regression is called a regression surface or a response surface.

Meaning of regression coefficients.

Consider the regression function (7.2). The parameter β_0 is the Y intercept of the regression plane. If the
scope of the model includes X_1 = 0, X_2 = 0, then β_0 gives the mean response at X_1 = 0,
X_2 = 0. Otherwise β_0 does not have any particular meaning as a separate
term in the regression model. The parameter β_1 indicates the change in the mean
response per unit increase in X_1 when X_2 is held constant. Likewise, β_2 indicates the
change in the mean response per unit increase in X_2 when X_1 is held constant.
The parameters β_1 and β_2 are frequently called partial regression coefficients
because they reflect the partial effect of one independent variable when the other
independent variable is included in the model and is held constant. We can readily
establish the meaning of β_1 and β_2 by calculus, taking partial derivatives of the
response surface (7.2) with respect to X_1 and X_2 in turn:

∂E(Y)/∂X_1 = β_1         ∂E(Y)/∂X_2 = β_2.

We consider now the case where there are p − 1 independent variables X_1, ..., X_{p−1}.
The model

(7.5)    Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ... + β_{p−1} X_{i,p−1} + ε_i

is called a first-order model with p − 1 independent variables. It can also be written:

(7.5a)   Y_i = β_0 + Σ_{k=1}^{p−1} β_k X_{i,k} + ε_i

Assuming that E(ε_i) = 0, the response function for model (7.5) is:

(7.6)    E(Y_i) = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ... + β_{p−1} X_{i,p−1}

This response function is a hyperplane, which is a plane in more than two
dimensions.

The X variables may represent different independent variables. The general linear regression model
is:

(7.7)    Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ... + β_{p−1} X_{i,p−1} + ε_i

where:
    β_0, β_1, β_2, ..., β_{p−1} are parameters
    X_{i,1}, X_{i,2}, ..., X_{i,p−1} are known constants
    ε_i are independent N(0, σ²)
    i = 1, 2, ..., n

The response function for model (7.7) is

(7.8)    E(Y_i) = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ... + β_{p−1} X_{i,p−1}

This implies that the observations Y_i are independent normal variables with mean E(Y_i)
given by (7.8) and with constant variance σ².

It is a remarkable property of matrix algebra that the results for the general
linear regression model (7.7) appear exactly the same in matrix notation
as those for the simple linear regression model (6.56′).

To express the general linear regression model in matrix terms, we need to
define the following matrices:

(7.17a)  Y = [ Y_1, Y_2, ..., Y_n ]′                          (n×1)

(7.17b)  X =  [ 1   X_{1,1}   X_{1,2}   ...   X_{1,p−1}
        (n×p)   1   X_{2,1}   X_{2,2}   ...   X_{2,p−1}
                :   :         :         ...   :
                1   X_{n,1}   X_{n,2}   ...   X_{n,p−1} ]

(7.17c)  β = [ β_0, β_1, ..., β_{p−1} ]′                      (p×1)

(7.17d)  ε = [ ε_1, ε_2, ..., ε_n ]′                          (n×1)

Note that the Y and ε vectors are the same as for simple regression. The β vector
contains additional regression parameters, and the X matrix contains a column of 1's
as well as a column of the n values for each of the p − 1 X variables in the
regression model. The row subscript for each element X_{i,k} in the X matrix identifies
the trial, and the column subscript identifies the X variable.

In matrix terms, the general linear regression model (7.7) is:

(7.18)   Y  =  X β  +  ε
        (n×1)  (n×p)(p×1)  (n×1)

where:
    Y is a vector of observations
    β is a vector of parameters
    X is a matrix of constants
    ε is a vector of independent normal random variables with expectation
    E(ε) = 0 and variance-covariance matrix σ²(ε) = σ² I

Consequently, the random vector Y has expectation

(7.18a)  E(Y) = Xβ

and the variance-covariance matrix of Y is:

(7.18b)  σ²(Y) = σ² I

Let the vector of the estimated regression coefficients be denoted by b:

(7.19)   b = [ b_0, b_1, ..., b_{p−1} ]′                      (p×1)

The least squares normal equations for the general linear regression model (7.18)
are:

(7.20)   X′X  b  =  X′Y
        (p×p)(p×1)  (p×n)(n×1)

and the least squares estimators are:

(7.21)   b = (X′X)^(−1) X′Y
        (p×1)   (p×p)   (p×1)

For the model (7.18), these least squares estimators are also maximum likelihood
estimators and have all the properties stated before: they are unbiased, minimum
variance unbiased, and sufficient.

Let the vector of the fitted values Ŷ_i be denoted by Ŷ and the vector of the
residual terms e_i = Y_i − Ŷ_i be denoted by e:

(7.22a)  Ŷ = [ Ŷ_1, Ŷ_2, ..., Ŷ_n ]′      (n×1)

(7.22b)  e = [ e_1, e_2, ..., e_n ]′       (n×1)

The fitted values are represented by

(7.23)   Ŷ = Xb

and the residual terms by:

(7.24)   e = Y − Ŷ = Y − Xb

The sums of squares are:

(7.25)   SSTO = Y′Y − (1/n) Y′11′Y

(7.26)   SSR = b′X′Y − (1/n) Y′11′Y

(7.27)   SSE = e′e = (Y − Xb)′(Y − Xb) = Y′Y − b′X′Y

where 1 is an n×1 vector of 1's.

SSTO, as usual, has n − 1 degrees of freedom associated with it. SSE has n − p degrees
of freedom associated with it since p parameters need to be estimated in the regression
function for model (7.18). Finally, SSR has p − 1 degrees of freedom associated with
it, representing the number of X variables X_1, ..., X_{p−1}.

(7.28)   MSR = SSR / (p − 1)

(7.29)   MSE = SSE / (n − p)

The expectation of MSE is σ², as for simple regression. If p − 1 = 2 we have

E(MSR) = σ² + [ β_1² Σ(X_{i,1} − X̄_1)² + β_2² Σ(X_{i,2} − X̄_2)² + 2 β_1 β_2 Σ(X_{i,1} − X̄_1)(X_{i,2} − X̄_2) ] / 2

Note that if both β_1 and β_2 equal zero, E(MSR) = σ². Otherwise E(MSR) > σ².

The ANOVA table is:

Source of variation    SS                               df       MS
Regression             SSR = b′X′Y − (1/n) Y′11′Y       p − 1    MSR = SSR/(p − 1)
Error                  SSE = Y′Y − b′X′Y                n − p    MSE = SSE/(n − p)
Total                  SSTO = Y′Y − (1/n) Y′11′Y        n − 1

To test whether there is a regression relation between the dependent variable Y
and the set of X variables X_1, ..., X_{p−1} we test the following alternatives:

(7.30a)  H_0: β_1 = β_2 = ... = β_{p−1} = 0
         H_a: not all β_k (k = 1, 2, ..., p − 1) equal zero

We use the test statistic

(7.30b)  F* = MSR / MSE

The decision rule is

(7.30c)  If F* ≤ F(1 − α; p − 1, n − p), conclude H_0
         If F* > F(1 − α; p − 1, n − p), conclude H_a

Note that when p − 1 = 1, this test reduces to the F test in (3.61).

The coefficient of multiple determination R² is defined as follows:

(7.31)   R² = SSR / SSTO = 1 − SSE / SSTO

It measures the proportionate reduction of the total variation in Y associated with
the use of the set of X variables X_1, ..., X_{p−1}. Just as for r² we have

(7.32)   0 ≤ R² ≤ 1

R² assumes the value 0 when all b_k = 0 (k = 1, ..., p − 1) and takes on the
value 1 when all observations fall directly on the fitted response surface,
i.e. when Ŷ_i = Y_i for all i.

The coefficient of multiple correlation R is the positive square root of R²:

(7.34)   R = √(R²)

The least squares estimators are unbiased:

(7.35)   E(b) = β

The variance-covariance matrix σ²(b):

(7.36)   σ²(b) =  [ σ²(b_0)          σ(b_0, b_1)       ...   σ(b_0, b_{p−1})
         (p×p)      σ(b_1, b_0)      σ²(b_1)           ...   σ(b_1, b_{p−1})
                    :                :                 ...   :
                    σ(b_{p−1}, b_0)  σ(b_{p−1}, b_1)   ...   σ²(b_{p−1})     ]

is given by:

(7.37)   σ²(b) = σ² (X′X)^(−1)

The estimated variance-covariance matrix s²(b):

(7.38)   s²(b) =  [ s²(b_0)          s(b_0, b_1)       ...   s(b_0, b_{p−1})
         (p×p)      s(b_1, b_0)      s²(b_1)           ...   s(b_1, b_{p−1})
                    :                :                 ...   :
                    s(b_{p−1}, b_0)  s(b_{p−1}, b_1)   ...   s²(b_{p−1})     ]

is given by

(7.39)   s²(b) = MSE (X′X)^(−1)

From s²(b), one can obtain s²(b_0), s²(b_1), or whatever other variance is needed, or any
needed covariance.

Interval estimation of β_k

(7.40)   (b_k − β_k) / s(b_k)  ~  t(n − p)        k = 0, 1, 2, ..., p − 1

Hence the confidence limits for β_k with 1 − α confidence coefficient are:

(7.41)   b_k ± t(1 − α/2; n − p) s(b_k)

Let us notice that s(b_k) can be obtained using (7.38) and (7.39).

Tests for β_k

To test

(7.42a)  H_0: β_k = 0
         H_a: β_k ≠ 0

we may use the test statistic

(7.42b)  t* = b_k / s(b_k)

and the decision rule is

(7.42c)  If |t*| ≤ t(1 − α/2; n − p), conclude H_0
         If |t*| > t(1 − α/2; n − p), conclude H_a

The number of degrees of freedom is n − p.

As with simple regression, a test of whether or not β_k = 0 in multiple
regression models can also be conducted by means of an F test.

Joint inferences

The boundary of the joint confidence region for all p of the β_k regression parameters
(k = 0, 1, ..., p − 1) with confidence coefficient 1 − α is:

(7.43)   (b − β)′ X′X (b − β) / (p · MSE)  ≤  F(1 − α; p, n − p)

The region defined by this boundary is generally difficult to obtain and interpret.
The Bonferroni joint confidence intervals are easy to obtain and interpret.
If g parameters are to be estimated jointly (where g ≤ p), the confidence
limits with family confidence coefficient 1 − α are:

(7.44)   b_k ± B s(b_k)

where

(7.44a)  B = t(1 − α/(2g); n − p)

For given values of X_1, X_2, ..., X_{p−1} denoted by X_{h,1}, X_{h,2}, ..., X_{h,p−1}, the mean
response is denoted by E(Y_h). We define the vector X_h:

(7.45)   X_h = [ 1, X_{h,1}, X_{h,2}, ..., X_{h,p−1} ]′

so the mean response to be estimated is:

(7.46)   E(Y_h) = X_h′ β

The estimated mean response corresponding to X_h, denoted by Ŷ_h, is:

(7.47)   Ŷ_h = X_h′ b

This estimator is unbiased:

(7.48)   E(Ŷ_h) = E(X_h′ b) = X_h′ E(b) = X_h′ β = E(Y_h)

and its variance is:

(7.49)   σ²(Ŷ_h) = σ² X_h′ (X′X)^(−1) X_h = X_h′ σ²(b) X_h

Note that the variance σ²(Ŷ_h) is a function of the variances σ²(b_k) of the regression
coefficients and of the covariances σ(b_k, b_j) between pairs of the regression
coefficients, just as in simple regression. The estimated variance s²(Ŷ_h) is given by

(7.50)   s²(Ŷ_h) = MSE · X_h′ (X′X)^(−1) X_h = X_h′ s²(b) X_h

The 1 − α confidence limits for E(Y_h) are:

(7.51)   Ŷ_h ± t(1 − α/2; n − p) s(Ŷ_h)

To test whether

(7.54)   E(Y) = β_0 + β_1 X_1 + ... + β_{p−1} X_{p−1}

is an appropriate response surface for the data at hand requires repeat observations,
as for simple regression analysis. Thus, with two independent variables, repeat
observations require that X_1 and X_2 each remain at given levels from trial to trial.
The procedures described for the F test for lack of fit in simple linear regression
are applicable to multiple regression. Once the ANOVA table has been calculated,
SSE is decomposed into a pure error and a lack of fit component. The pure error sum of
squares SSPE is obtained by first calculating, for each replicate group, the sum
of squared deviations of the Y observations around the group mean, where a replicate
group has the same values for the X_1, X_2, ..., X_{p−1} variables. Suppose that there are c
replicate groups with distinct sets of levels for the X variables, and let the mean of the Y
observations for the j-th group be denoted by Ȳ_j. Then the sum of squares for
the j-th group is given by (4.8), and the pure error sum of squares is the sum of these
sums, as given in (4.9). The lack of fit sum of squares is SSLF = SSE − SSPE, as
indicated in (4.12). The number of degrees of freedom associated with SSPE
is n − c, and the number of degrees of freedom associated with SSLF is
(n − p) − (n − c) = c − p.

The F test is conducted as described in (4.15), but with the degrees of freedom modified
to those just stated.

Therefore:

MSPE = SSPE / (n − c)

The second component of SSE is:

SSLF = SSE − SSPE

where SSLF denotes the lack of fit sum of squares.
Thus, the lack of fit mean square is

MSLF = SSLF / (c − p)

The test statistic is

F* = MSLF / MSPE

The hypotheses are

H_0: E(Y) = β_0 + β_1 X_1 + ... + β_{p−1} X_{p−1}
H_a: E(Y) ≠ β_0 + β_1 X_1 + ... + β_{p−1} X_{p−1}

The decision rule is

If F* ≤ F(1 − α; c − p, n − c), conclude H_0
If F* > F(1 − α; c − p, n − c), conclude H_a

Prediction limits for a new observation Y_{h(new)} corresponding to X_h, the specified
values of the X variables, are:

(7.55)   Ŷ_h ± t(1 − α/2; n − p) s(Ŷ_{h(new)})

where

(7.55a)  s²(Ŷ_{h(new)}) = MSE + X_h′ s²(b) X_h = MSE [ 1 + X_h′ (X′X)^(−1) X_h ]

When the mean of m new observations at X_h is to be predicted,
the 1 − α prediction limits are:

(7.56)   Ŷ_h ± t(1 − α/2; n − p) s(Ȳ_{h(new)})

where:

(7.56a)  s²(Ȳ_{h(new)}) = MSE/m + X_h′ s²(b) X_h = MSE [ 1/m + X_h′ (X′X)^(−1) X_h ]

Simultaneous prediction limits for g new observations at g different levels
of X_h with family confidence 1 − α are given by:

(7.57)   Ŷ_h ± S s(Ŷ_{h(new)})

where

(7.57a)  S² = g F(1 − α; g, n − p)

and s²(Ŷ_{h(new)}) is given by (7.55a).

Alternatively, the Bonferroni simultaneous prediction limits can be used.
For g predictions with 1 − α family confidence coefficient, they are:

(7.58)   Ŷ_h ± B s(Ŷ_{h(new)})

where

(7.58a)  B = t(1 − α/(2g); n − p)

Derivation of the normal equations for two independent variables. The least squares
criterion is

Q = Σ (Y_i − b_0 − b_1 X_{i,1} − b_2 X_{i,2})²

Differentiating with respect to b_0:

∂Q/∂b_0 = ∂/∂b_0 Σ_i (Y_i − b_0 − b_1 X_{i,1} − b_2 X_{i,2})² = 2 Σ_i (−Y_i + b_0 + b_1 X_{i,1} + b_2 X_{i,2})

So

∂Q/∂b_0 = 0  ⇒  2 Σ_i (−Y_i + b_0 + b_1 X_{i,1} + b_2 X_{i,2}) = 0

Therefore the first normal equation is

n b_0 + b_1 Σ X_{i,1} + b_2 Σ X_{i,2} = Σ Y_i

Next,

∂Q/∂b_1 = ∂/∂b_1 Σ_i (Y_i − b_0 − b_1 X_{i,1} − b_2 X_{i,2})² = 2 Σ_i (−X_{i,1} Y_i + X_{i,1} b_0 + b_1 X_{i,1}² + X_{i,1} b_2 X_{i,2})

and

∂Q/∂b_1 = 0  ⇒  Σ_i (−X_{i,1} Y_i + X_{i,1} b_0 + b_1 X_{i,1}² + X_{i,1} b_2 X_{i,2}) = 0

Hence the second normal equation is

b_0 Σ X_{i,1} + b_1 Σ X_{i,1}² + b_2 Σ X_{i,1} X_{i,2} = Σ X_{i,1} Y_i

Finally,

∂Q/∂b_2 = ∂/∂b_2 Σ_i (Y_i − b_0 − b_1 X_{i,1} − b_2 X_{i,2})² = 2 Σ_i (−X_{i,2} Y_i + X_{i,2} b_0 + X_{i,2} b_1 X_{i,1} + b_2 X_{i,2}²)

So

∂Q/∂b_2 = 0  ⇒  Σ_i (−X_{i,2} Y_i + X_{i,2} b_0 + X_{i,2} b_1 X_{i,1} + b_2 X_{i,2}²) = 0

and the third normal equation is

b_0 Σ X_{i,2} + b_1 Σ X_{i,1} X_{i,2} + b_2 Σ X_{i,2}² = Σ X_{i,2} Y_i

EXAMPLE 1:

Let us consider the following data:

i Y i X i,1 X i,2

1 162 274 2450

2 120 180 3254

3 223 375 3802

4 131 205 2838

5 67 86 2347

6 169 265 3782

7 81 98 3008

8 192 330 2450

9 116 195 2137

10 55 53 2560

11 252 430 4020

12 232 372 4427

13 144 236 2660

14 103 157 2088

15 212 370 2605

The linear model in use is:

(7.59)   Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ε_i

Basic calculations

The vector Y (15×1) contains the observations Y_i from the table above, and the X matrix (15×3)
consists of a column of 1's followed by the columns of X_{i,1} and X_{i,2} values.

1)  X′X = [ 15           3626           44 428
            3626         1 067 614      11 419 181
            44 428       11 419 181     139 063 428 ]


2)  X′Y = [ 2259
            647 107
            7 096 619 ]

3)  (X′X)^(−1) = [  1.246348416           2.129664176×10^(−4)    −4.156712541×10^(−4)
                    2.129664176×10^(−4)   7.732903033×10^(−6)    −7.030251792×10^(−7)
                   −4.156712541×10^(−4)  −7.030251792×10^(−7)     1.977185133×10^(−7)  ]

Algebraic equivalents

Note that X′X is

(7.63)   X′X = [ n            Σ X_{i,1}           Σ X_{i,2}
                 Σ X_{i,1}    Σ X_{i,1}²          Σ X_{i,1} X_{i,2}
                 Σ X_{i,2}    Σ X_{i,1} X_{i,2}   Σ X_{i,2}²        ]

so in our case n = 15, Σ X_{i,1} = 3626, Σ X_{i,2} = 44 428, Σ X_{i,1}² = 1 067 614,
Σ X_{i,1} X_{i,2} = 11 419 181 and Σ X_{i,2}² = 139 063 428.

Also

(7.64)   X′Y = [ Σ Y_i
                 Σ Y_i X_{i,1}
                 Σ Y_i X_{i,2} ]

hence in our case Σ Y_i = 2259, Σ Y_i X_{i,1} = 647 107 and Σ Y_i X_{i,2} = 7 096 619.

The estimated regression coefficients are

b = (X′X)^(−1) X′Y = [  1.246348416           2.129664176×10^(−4)    −4.156712541×10^(−4) ] [ 2259      ]   [ 3.452611738  ]   [ b_0 ]
                     [  2.129664176×10^(−4)   7.732903033×10^(−6)    −7.030251792×10^(−7) ] [ 647 107   ] = [ 0.4960049761 ] = [ b_1 ]
                     [ −4.156712541×10^(−4)  −7.030251792×10^(−7)     1.977185133×10^(−7) ] [ 7 096 619 ]   [ 0.0091990805 ]   [ b_2 ]

and the estimated regression function is

Ŷ = 3.453 + 0.496 X_1 + 0.00920 X_2

Fitted values and residuals

Ŷ = Xb gives the fitted values, and e = Y − Ŷ the residuals:

   i     Y_i     Ŷ_i          e_i
   1     162     161.8957      0.1043
   2     120     122.6673     −2.6673
   3     223     224.4294     −1.4294
   4     131     131.2406     −0.2406
   5     67      67.6993      −0.6993
   6     169     169.6849     −0.6849
   7     81      79.7319       1.2681
   8     192     189.6720      2.3280
   9     116     119.8320     −3.8320
   10    55      53.2905       1.7095
   11    252     253.7151     −1.7151
   12    232     228.6908      3.3092
   13    144     144.9793     −0.9793
   14    103     100.5331      2.4669
   15    212     210.9381      1.0619

Analysis of variance

We use (7.25)-(7.29). The basic quantities needed are

Y′Y = Σ Y_i² = 394 107

(1/n) Y′11′Y = (Σ Y_i)²/n = (2259)²/15 = 340 205.4

Thus

SSTO = Y′Y − (1/n) Y′11′Y = 394 107 − 340 205.4 = 53 901.6

and

SSE = Y′Y − b′X′Y = 394 107 − [ 3.452611738   0.4960049761   0.0091990805 ] [ 2259, 647 107, 7 096 619 ]′ = 56.89

Finally we have

SSR = SSTO − SSE = 53 901.6 − 56.89 = 53 844.71

The ANOVA table

Source of variation    SS                  df           MS
Regression             SSR = 53 844.71     p − 1 = 2    MSR = SSR/(p − 1) = 26 922.36
Error                  SSE = 56.89         n − p = 12   MSE = SSE/(n − p) = 4.741
Total                  SSTO = 53 901.6     n − 1 = 14

The hypotheses are

H_0: β_1 = β_2 = 0
H_a: not both β_1 and β_2 equal zero

We use the test statistic

F* = MSR / MSE = 26 922.36 / 4.741 ≈ 5678

The decision rule is

If F* ≤ F(1 − α; p − 1, n − p), conclude H_0
If F* > F(1 − α; p − 1, n − p), conclude H_a

Assuming that α = 0.05, from the table we get

F(1 − α; p − 1, n − p) = F(0.95; 2, 12) = 3.89

Since F* ≈ 5678 > F(0.95; 2, 12) = 3.89

we conclude H_a, that Y is related to X_1 and X_2.

R² = SSR / SSTO = 53 844.716 / 53 901.6 = 0.9989

Thus, when X_1 and X_2 are considered, the variation in Y
is reduced by 99.9 percent.
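The whole fit and ANOVA for Example 1 can be reproduced with NumPy (an added sketch, not part of the notes; X1, X2 and Y below are simply the columns of the data table above).

import numpy as np

X1 = np.array([274, 180, 375, 205, 86, 265, 98, 330, 195, 53, 430, 372, 236, 157, 370], dtype=float)
X2 = np.array([2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450, 2137, 2560, 4020, 4427, 2660, 2088, 2605], dtype=float)
Y  = np.array([162, 120, 223, 131, 67, 169, 81, 192, 116, 55, 252, 232, 144, 103, 212], dtype=float)

X = np.column_stack([np.ones_like(Y), X1, X2])        # design matrix (7.17b)
n, p = X.shape

b = np.linalg.solve(X.T @ X, X.T @ Y)                 # (7.21): about [3.453, 0.496, 0.0092]

SSTO = Y @ Y - Y.sum()**2 / n                         # (7.25)
SSE  = Y @ Y - b @ (X.T @ Y)                          # (7.27)
SSR  = SSTO - SSE                                     # (7.26)

MSR, MSE = SSR / (p - 1), SSE / (n - p)
F_star = MSR / MSE                                    # (7.30b): about 5680
R2 = SSR / SSTO                                       # (7.31): about 0.999

print(b, SSTO, SSE, SSR, F_star, R2)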

An algebraic form of SSE for the case of two independent variables is:

(7.66)   SSE = Y′Y − b′X′Y = Σ Y_i² − [ b_0  b_1  b_2 ] [ Σ Y_i, Σ Y_i X_{i,1}, Σ Y_i X_{i,2} ]′
             = Σ Y_i² − b_0 Σ Y_i − b_1 Σ Y_i X_{i,1} − b_2 Σ Y_i X_{i,2}

Joint estimation of β_1 and β_2. Let us use α = 0.1.
First we need to estimate s²(b):

(7.67)   s²(b) = MSE (X′X)^(−1)

       = 4.740 [  1.246348416           2.129664176×10^(−4)    −4.156712541×10^(−4)
                  2.129664176×10^(−4)   7.732903033×10^(−6)    −7.030251792×10^(−7)
                 −4.156712541×10^(−4)  −7.030251792×10^(−7)     1.977185133×10^(−7)  ]

       = [  5.907462         0.0010094778       −0.0019702758
            0.0010094778     3.6653946×10^(−5)  −3.3323622×10^(−6)
           −0.0019702758    −3.3323622×10^(−6)   9.371928×10^(−7)   ]

The two elements we require are:

s²(b_1) = 3.6653946×10^(−5)      or      s(b_1) = 0.006054
s²(b_2) = 9.371928×10^(−7)       or      s(b_2) = 0.0009681

Next, for g = 2, from the table

B = t(1 − 0.1/(2·2); 12) = t(0.975; 12) = 2.179

and finally for β_1

0.4960 − 2.179(0.006054) ≤ β_1 ≤ 0.4960 + 2.179(0.006054)

or

0.483 ≤ β_1 ≤ 0.509

and for β_2 simultaneously

0.009199 − 2.179(0.0009681) ≤ β_2 ≤ 0.009199 + 2.179(0.0009681)

or

0.0071 ≤ β_2 ≤ 0.0113

With family confidence coefficient 0.90 we conclude that β_1 falls
between 0.483 and 0.509 and that β_2 falls between 0.0071 and 0.0113.

Suppose that we would like to estimate the expected (mean) value of Y
when X_{h,1} = 220 and X_{h,2} = 2500.

We define

X_h = [ 1, 220, 2500 ]′

The point estimate of the mean of Y is, by (7.47),

Ŷ_h = X_h′ b = [ 1   220   2500 ] [ 3.452611738, 0.4960049761, 0.0091990805 ]′ = 135.57

The estimated variance, by (7.50) and using the results in (7.67), is:

s²(Ŷ_h) = X_h′ s²(b) X_h = 0.46638

and

s(Ŷ_h) = 0.68292

Assume that the confidence coefficient for the interval estimate of E(Y_h) is
to be 0.95. We then need t(1 − α/2; n − p) = t(0.975; 12) = 2.179, and

135.57 − 2.179(0.68292) ≤ E(Y_h) ≤ 135.57 + 2.179(0.68292)

so

134.1 ≤ E(Y_h) ≤ 137.1

Thus, with confidence coefficient 0.95 we estimate that the mean of Y
at levels X_1 = 220 and X_2 = 2500 is somewhere between 134.1 and 137.1.
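The interval estimate for E(Y_h) at X_{h,1} = 220, X_{h,2} = 2500 follows the same matrix recipe; the sketch below is an addition and assumes scipy.stats is available for the t quantile.

import numpy as np
from scipy.stats import t

X1 = np.array([274, 180, 375, 205, 86, 265, 98, 330, 195, 53, 430, 372, 236, 157, 370], dtype=float)
X2 = np.array([2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450, 2137, 2560, 4020, 4427, 2660, 2088, 2605], dtype=float)
Y  = np.array([162, 120, 223, 131, 67, 169, 81, 192, 116, 55, 252, 232, 144, 103, 212], dtype=float)

X = np.column_stack([np.ones_like(Y), X1, X2])
n, p = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
MSE = (Y @ Y - b @ (X.T @ Y)) / (n - p)

X_h = np.array([1.0, 220.0, 2500.0])
Y_hat_h = X_h @ b                                   # (7.47): about 135.57
s_Y_hat_h = np.sqrt(MSE * X_h @ XtX_inv @ X_h)      # (7.50): about 0.683

t_val = t.ppf(0.975, n - p)                         # t(0.975; 12) = 2.179
print(Y_hat_h - t_val * s_Y_hat_h, Y_hat_h + t_val * s_Y_hat_h)    # about (134.1, 137.1)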

Prediction limits for new observations.

Suppose that we would like to predict Y at two sets of levels of the independent variables:

             A        B
X_{h,1}      220      375
X_{h,2}      2500     3500

In this case g = 2. To determine which simultaneous prediction intervals are best
here, we find S as given in (7.57a) and B as given in (7.58a), assuming
the family confidence coefficient 0.90.

S² = g F(1 − α; g, n − p) = 2 F(0.90; 2, 12) = 2(2.81) = 5.62

so

S = √5.62 = 2.3707

and

B = t(1 − α/(2g); n − p) = t(0.975; 12) = 2.179

Hence, the Bonferroni limits are more efficient here (they give shorter intervals).

For explanatory-variable level A we have

X_A = [ 1, 220, 2500 ]′

The point estimate of the mean of Y is, by (7.47),

Ŷ_A = X_A′ b = 135.57

and

s²(Ŷ_A) = X_A′ s²(b) X_A = 0.46638,      MSE = 4.7403

Hence by (7.55a)

s²(Ŷ_{A(new)}) = MSE + s²(Ŷ_A) = 4.7403 + 0.46638 = 5.2067

or

s(Ŷ_{A(new)}) = 2.28182

For level B,

X_B = [ 1, 375, 3500 ]′

Ŷ_B = X_B′ b = 221.65

and

s²(Ŷ_B) = X_B′ s²(b) X_B = 0.76026

Hence

s²(Ŷ_{B(new)}) = MSE + s²(Ŷ_B) = 4.7403 + 0.76026 = 5.5006

and

s(Ŷ_{B(new)}) = √5.5006 = 2.3453

We found before that B = 2.179. The simultaneous Bonferroni prediction intervals
with family confidence coefficient 0.90 are Ŷ_h ± B s(Ŷ_{h(new)}), so

135.57 − 2.179(2.28182) ≤ Y_{A(new)} ≤ 135.57 + 2.179(2.28182)
221.65 − 2.179(2.34536) ≤ Y_{B(new)} ≤ 221.65 + 2.179(2.34536)

or

130.6 ≤ Y_{A(new)} ≤ 140.5
216.5 ≤ Y_{B(new)} ≤ 226.8

Standardized regression coefficients facilitate comparisons between regression
coefficients. It is difficult to compare regression coefficients directly because of
differences in the units involved. Consider the fitted response function:

Ŷ = 200 + 20 000 X_1 + 0.2 X_2

One may be tempted to conclude that X_1 is the only important independent variable
and that X_2 has little effect on the dependent variable.

Suppose that the units are:

Y    in rands
X_1  in thousands of rands
X_2  in cents

In that event, the effect on the mean response of a 1000R increase in X_1 when
X_2 is constant would be exactly the same as the effect of a 1000R increase in X_2
when X_1 is constant.

Standardized regression coefficients, also called beta coefficients,
are defined as follows:

(7.69)   B_k = b_k (s_k / s_Y) = b_k [ Σ(X_{i,k} − X̄_k)² / Σ(Y_i − Ȳ)² ]^(1/2)

where s_k and s_Y are the standard deviations of the X_k and Y observations,
respectively (the n − 1 divisors cancel in the ratio).

For our example

B_1 = 0.496 (191 089 / 53 902)^(1/2) = 0.93389

B_2 = 0.00920 (7 473 616 / 53 902)^(1/2) = 0.10833
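The standardized (beta) coefficients in (7.69) can be computed directly from the fitted coefficients and the corrected sums of squares (an added sketch, again for the Example 1 data).

import numpy as np

X1 = np.array([274, 180, 375, 205, 86, 265, 98, 330, 195, 53, 430, 372, 236, 157, 370], dtype=float)
X2 = np.array([2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450, 2137, 2560, 4020, 4427, 2660, 2088, 2605], dtype=float)
Y  = np.array([162, 120, 223, 131, 67, 169, 81, 192, 116, 55, 252, 232, 144, 103, 212], dtype=float)

X = np.column_stack([np.ones_like(Y), X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ Y)

# (7.69): B_k = b_k * s_k / s_Y  (the n-1 divisors cancel in the ratio)
SSY = np.sum((Y - Y.mean())**2)
B1 = b[1] * np.sqrt(np.sum((X1 - X1.mean())**2) / SSY)
B2 = b[2] * np.sqrt(np.sum((X2 - X2.mean())**2) / SSY)

print(B1, B2)        # about 0.934 and 0.108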

PROBLEMS

Question 1.

The following data were obtained in certain experiment:

i Y i X i,1 X i,2

1 64 4 2

2 73 4 4

3 61 4 2

4 76 4 4

5 72 6 2

6 80 6 4

7 71 6 2

8 83 6 4

9 83 8 2

10 89 8 4

11 86 8 2

12 93 8 4

13 88 10 2

14 95 10 4

15 94 10 2

16 100 10 4

Assume that regression model (7.1) with independent normal errors is appropriate.

1) Find the estimated regression coefficients.

2)Test whether there is a regression relation using 0. 01.

3) Estimate 1 and 2 jointly by the Bonferroni procedure using

99 percent family confidence coefficient.

4) Obtain an interval estimate of EY h when X h,1 5 and X h,2 4.

Use a 99 percent confidence coefficient.

5) Obtain an ANOVA table and use it to test whether there is a regression

relation using 0. 01.

6) Obtain the residuals.

Question 2.

The following data were obtained in certain experiment:


i Yi X i,1 X i,2 i Yi X i,1 X i,2

1 58 7 5.11 11 121 17 11.02

2 152 18 16.72 12 112 12 9.51

3 41 5 3.20 13 50 6 3.79

4 93 14 7.03 14 82 12 6.45

5 101 11 10.98 15 48 8 4.60

6 38 5 4.04 16 127 15 13.86

7 203 23 22.07 17 140 17 13.03

8 78 9 7.03 18 155 21 15.21

9 117 16 10.62 19 39 6 3.64

10 44 5 4.76 20 90 11 9.57

1) Find the estimated regression coefficients.

2)Test whether there is a regression relation using 0. 01.

3) Estimate 1 and 2 jointly by the Bonferroni procedure using

99 percent family confidence coefficient.

4) Obtain an interval estimate of EY h when X h,1 5 and X h,2 3. 2.

Use a 99 percent confidence coefficient.

5) Obtain an ANOVA table and use it to test whether there is a regression

relation using 0. 01.

6) Obtain the residuals.

7) Calculate the coefficient of multiple determination R 2 .

8) Obtain the simultaneous interval estimates for five levels of X :

1 2 3 4 5

X1 5 6 10 14 20

X 2 3.2 4.8 7.0 10.0 18.0

using 95 percent confidence coefficient.

Question 3.

Consider the multiple regression model:

Y_i = β_1 X_{i,1} + β_2 X_{i,2} + ε_i

where the ε_i are independent, normally distributed random errors, N(0, σ²).

1) Derive the least squares estimators for 1 and 2

2) Obtain the maximum likelihood estimators for 1 and 2 .

Question 4.

A pharmaceutical company testing a new pain-killing drug tests the drug on 20 people

suffering from arthritis. The time elapsed, in minutes, from taking the drug until noticeable

relief in pain is detected, is to be predicted from dosage (in grams) and the age

of patient (in years). The results are given below


i Time (Y i ) Dosage (X i,1 ) Age (X i,2 )

1 11 2 59

2 3 2 57

3 20 2 22

4 25 2 12

5 27 2 18

6 15 5 40

7 10 5 64

8 34 5 27

9 14 5 54

10 34 5 22

11 35 7 33

12 28 7 49

13 23 7 29

14 21 7 32

15 33 7 20

16 27 10 43

17 8 10 61

18 3 10 69

19 12 10 62

20 14 10 61

1) Find the estimated regression coefficients.

2)Test whether there is a regression relation using 0. 01.

3) Estimate 1 and 2 jointly by the Bonferroni procedure using

99 percent family confidence coefficient.

4) Obtain an interval estimate of EY h when X h,1 6 and X h,2 45.

Use a 99 percent confidence coefficient.

5) Obtain an ANOVA table and use it to test whether there is a regression

relation using 0. 01.

6) Obtain the residuals.

7) Calculate the coefficient of multiple determination R 2 .

8) Obtain the simultaneous interval estimates for five levels of X :

1 2 3 4 5

X1 5 6 2 4 5

X 2 41 39 52 61 75

using 95 percent confidence coefficient.

Question 5.

A large discount department store chain advertises on television (X 1 ),

on the radio (X 2 ), and in newspapers (X 3 ). A sample of 12 of its stores in

a certain area showed the following advertising expenditures and revenues

during a given month. ( All figures are in thousands of rands)


i Revenues (Y i ) X i,1 X i,2 X i,3

1 84 13 5 2

2 84 13 7 1

3 80 8 6 3

4 50 9 5 3

5 20 9 3 1

6 68 13 5 1

7 34 12 7 2

8 30 10 3 2

9 54 8 5 2

10 40 10 5 3

11 57 5 6 2

12 46 5 7 2

1) Find the estimated regression coefficients.

2)Test whether there is a regression relation using 0. 01.

3) Estimate 1 , 2 and 3 jointly by the Bonferroni procedure using

99 percent family confidence coefficient.

4) Obtain an interval estimate of EY h when X h,1 11, X h,2 6 and X h,3 2.

Use a 99 percent confidence coefficient.

5) Obtain an ANOVA table and use it to test whether there is a regression

relation using 0. 01.

6) Obtain the residuals.

7) Calculate the coefficient of multiple determination R 2 .

8) Obtain the simultaneous interval estimates for two levels of X :

i 1 2

X 1 11 15

X2 7 9

X3 2 3

using 95 percent confidence coefficient.

Question 6.

Assume that the normal regression model is applicable.

For the following data given by:

i 1 2 3 4 5

Xi 8 4 0 -4 -8

Y i 7.8 9 10.2 11 11.7

using matrix method find:

1) Y ′ Y

2) X ′ X

3) X ′ Y

4) b

5) Test H_0: β_1 = 0 versus H_a: β_1 ≠ 0 using ANOVA, with α = 0.05.

6) covariance-variance matrix s 2 b


MULTICOLLINEARITY AND ITS EFFECTS

In multiple regression analysis the relation between the independent variables and

the dependent one is of prime interest. Questions that are frequently asked include:

1. What is the relative importance of the effects of the different independent variables ?

2. What is the magnitude of the effect of a given independent variable on the dependent

one?

3. Can any independent variable be dropped from the model because it has little or no

effect on the dependent one ?

4. Should any independent variables not yet included in the model be considered

for possible inclusion ?

If the independent variables included in the model are uncorrelated among themselves

and uncorrelated with any other independent variables that are related to the dependent

variable but omitted from the model, relatively simple answers can be given.

Unfortunately, in many situations the independent variables tend to be correlated

among themselves and with other variables that are related to the dependent variable but

are not included in the model. When the independent variables are correlated among

themselves, intercorrelation or multicollinearity among them is said to exist.

The table below contains data for a small-scale experiment on the effect of crew size (X_1)
and level of bonus pay (X_2) on crew productivity score (Y). It is easy to show that X_1
and X_2 are uncorrelated here, that is, r²_12 = 0, where r²_12 denotes the coefficient of
simple determination between X_1 and X_2. We will use the notation SSR(X_1, X_2)
and SSE(X_1, X_2) to indicate explicitly that the two independent variables are in the model,
SSR(X_1) and SSE(X_1) to show that only the one independent variable X_1 is in the model
(the case of simple linear regression), and SSR(X_2) and SSE(X_2) for the case of X_2 only.

Trial i X i,1 X i,2 Y i

1 4 2 42

2 4 2 39

3 4 3 48

4 4 3 51

5 6 2 49

6 6 2 53

7 6 3 61

8 6 3 60

ANOVA tables I

a) Regression of Y on X_1 and X_2

Ŷ = 0.375 + 5.375 X_1 + 9.250 X_2

Source of variation    SS                         df    MS
Regression             SSR(X_1, X_2) = 402.250    2     MSR(X_1, X_2) = 201.125
Error                  SSE(X_1, X_2) = 17.625     5     MSE(X_1, X_2) = 3.525
Total                  SSTO = 419.875             7

b) Regression of Y on X_1

Ŷ = 23.500 + 5.375 X_1

Source of variation    SS                         df    MS
Regression             SSR(X_1) = 231.125         1     MSR(X_1) = 231.125
Error                  SSE(X_1) = 188.750         6     MSE(X_1) = 31.458
Total                  SSTO = 419.875             7

c) Regression of Y on X_2

Ŷ = 27.250 + 9.250 X_2

Source of variation    SS                         df    MS
Regression             SSR(X_2) = 171.125         1     MSR(X_2) = 171.125
Error                  SSE(X_2) = 248.750         6     MSE(X_2) = 41.458
Total                  SSTO = 419.875             7

Note that the estimated regression coefficients for X_1 and for X_2 are the same, whether only the
given independent variable is included in the model or both independent variables are included.
This is a result of the two independent variables being uncorrelated.

Another important feature is related to the sums of squares. Note from a)
that the error sum of squares when both X_1 and X_2 are included in the model is
SSE(X_1, X_2) = 17.625. When only X_1 is included in the model, the error sum
of squares is SSE(X_1) = 188.750. We may ascribe the difference

SSE(X_1) − SSE(X_1, X_2) = 188.750 − 17.625 = 171.125

to the effect of X_2. We shall denote this difference by SSR(X_2 | X_1):

(8.1)   SSR(X_2 | X_1) = SSE(X_1) − SSE(X_1, X_2)

This is equal in our case to SSR(X_2). The reason for this is that X_1 and X_2
are uncorrelated. The story is the same for the independent variable X_1. Let

(8.2)   SSR(X_1 | X_2) = SSE(X_2) − SSE(X_1, X_2)

In our example we have

SSR(X_1 | X_2) = SSE(X_2) − SSE(X_1, X_2) = 248.750 − 17.625 = 231.125

and again it is equal to SSR(X_1).

The table III.1 below contains data for study of the relation of body fat Y to triceps

skinfold thickness X 1 and thigh circumferences X 2 based on sample of 20 healthy

females.

Table III.1

X i,1 X i,2 Yi

19.50 43.10 11.90


24.70 48.80 22.80

30.70 51.90 18.70

29.80 54.30 20.10

19.10 42.20 12.90

25.60 53.90 21.70

31.40 58.50 27.10

27.90 52.10 25.40

22.10 49.90 21.30

25.50 53.50 19.30

31.10 56.60 25.40

30.40 56.70 27.20

18.70 46.50 11.70

19.70 44.20 17.80

14.60 42.70 12.80

29.50 54.40 23.90

27.70 53.30 22.60

30.20 58.60 25.40

22.70 48.20 14.80

25.20 51.00 21.10

The triceps skinfold thickness (X_1) and thigh circumference (X_2) are highly
correlated, as a scatter plot of the data makes clear. The coefficient of simple correlation
between these two variables is equal to 0.92.

ANOVA tables II

a) Regression of Y on X_1 and X_2

Ŷ = −19.174 + 0.224 X_1 + 0.6594 X_2

Source of variation    SS                         df    MS
Regression             SSR(X_1, X_2) = 385.44     2     MSR(X_1, X_2) = 192.72
Error                  SSE(X_1, X_2) = 109.95     17    MSE(X_1, X_2) = 6.47
Total                  SSTO = 495.39              19

b) Regression of Y on X_1

Ŷ = −1.496 + 0.8572 X_1

Source of variation    SS                         df    MS
Regression             SSR(X_1) = 352.27          1     MSR(X_1) = 352.27
Error                  SSE(X_1) = 143.12          18    MSE(X_1) = 7.95
Total                  SSTO = 495.39              19

c) Regression of Y on X_2

Ŷ = −23.634 + 0.8566 X_2

Source of variation    SS                         df    MS
Regression             SSR(X_2) = 381.97          1     MSR(X_2) = 381.97
Error                  SSE(X_2) = 113.42          18    MSE(X_2) = 6.30
Total                  SSTO = 495.39              19

Note first that the regression coefficient for X_1 is not the same in a) and b).
Thus, the effect ascribed to X_1 by the fitted response function varies here, depending
upon whether only X_1 or both X_1 and X_2 are in the model.
The reason for that is the correlation between X_1 and X_2.

Note from table II a) that the error sum of squares when both X_1 and X_2
are included in the model is SSE(X_1, X_2) = 109.95. When only X_2 is included
in the model, the error sum of squares is SSE(X_2) = 113.42, as seen from table II c).
Using (8.2) we obtain

SSR(X_1 | X_2) = SSE(X_2) − SSE(X_1, X_2) = 113.42 − 109.95 = 3.47

When we fit a regression function containing only X_1, we also obtain a measure of the
reduction in the variation of Y associated with X_1, namely SSR(X_1). For our example,
table II b) indicates that SSR(X_1) = 352.27, which is not the same as
SSR(X_1 | X_2) = 3.47. The reason for the large difference is the high positive
correlation between X_1 and X_2.

The story is the same for the other independent variable.

The important conclusion is: when the independent (explanatory) variables are
correlated, there is no unique sum of squares which can be ascribed to an independent
variable as reflecting its effect in reducing the total variation in Y. The reduction
in the total variation ascribed to an independent variable must be viewed in the
context of the other independent variables included in the model, whenever the
independent variables are correlated.

The terms SSR(X_1 | X_2) and SSR(X_2 | X_1) are called extra sums of squares,
since they indicate the additional or extra reduction in the error sum of squares
achieved by introducing an additional independent variable.

Let us consider the first-order regression model with two independent variables:

(8.4)   Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ε_i        (full model)

If the test on β_1 indicates it is zero, the regression model (8.4) would become:

Y_i = β_0 + β_2 X_{i,2} + ε_i

If the test on β_2 indicates it is zero, the regression model (8.4) would become:

Y_i = β_0 + β_1 X_{i,1} + ε_i

However, if the separate tests indicate that β_1 = 0 and β_2 = 0, that does not
necessarily imply that

Y_i = β_0 + ε_i

since neither of the tests considers the alternative that not both β_1 and β_2 equal zero.
The proper test for the existence of a regression relation,

H_0: β_1 = β_2 = 0
H_a: not both β_1 and β_2 equal zero

is the F test of (7.30):

F* = MSR / MSE

It can happen that such a regression relation exists between Y and the set of X
variables, yet all of the individual tests on the regression coefficients lead to the
conclusion that they are equal to zero; this is a typical symptom of multicollinearity.

Earlier, in (8.2), we defined the extra sum of squares

(8.7a)  SSR(X_1 | X_2) = SSE(X_2) − SSE(X_1, X_2)

Likewise, we defined in (8.1):

(8.7b)  SSR(X_2 | X_1) = SSE(X_1) − SSE(X_1, X_2)

These extra sums of squares reflect the reduction in the error sum of squares achieved by adding
an independent variable to the model, given that another independent variable is already
in the model.

Any reduction in the error sum of squares is, of course, equal to the same increase in
the regression sum of squares, since always

SSTO = SSR + SSE,      so      SSE = SSTO − SSR

Hence, an extra sum of squares can also be thought of as the increase in the regression
sum of squares achieved by introducing the new variable. We can state

(8.8a)  SSR(X_1 | X_2) = SSR(X_1, X_2) − SSR(X_2)

and

(8.8b)  SSR(X_2 | X_1) = SSR(X_1, X_2) − SSR(X_1)

Proof:

SSR(X_1 | X_2) = SSE(X_2) − SSE(X_1, X_2) = [ SSTO − SSR(X_2) ] − [ SSTO − SSR(X_1, X_2) ]
              = SSR(X_1, X_2) − SSR(X_2)

The same is true for (8.8b).

Extension to three or more X variables is straightforward. We can define:

(8.9)   SSR(X_3 | X_1, X_2) = SSE(X_1, X_2) − SSE(X_1, X_2, X_3)

SSR(X_3 | X_1, X_2) measures the reduction in the error sum of squares which is
achieved by introducing X_3 into the regression model when X_1 and X_2 are already in
the model.
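Extra sums of squares are easiest to see by fitting the nested models directly. The sketch below (an addition to the notes) reproduces SSR(X_2 | X_1) and SSR(X_1 | X_2) for the crew productivity data above using ordinary least squares fits.

import numpy as np

X1 = np.array([4, 4, 4, 4, 6, 6, 6, 6], dtype=float)
X2 = np.array([2, 2, 3, 3, 2, 2, 3, 3], dtype=float)
Y  = np.array([42, 39, 48, 51, 49, 53, 61, 60], dtype=float)

def sse(X_cols):
    """Error sum of squares for a least squares fit with an intercept."""
    X = np.column_stack([np.ones_like(Y)] + list(X_cols))
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ b
    return resid @ resid

SSE_1  = sse([X1])             # SSE(X1)     = 188.750
SSE_2  = sse([X2])             # SSE(X2)     = 248.750
SSE_12 = sse([X1, X2])         # SSE(X1, X2) =  17.625

print(SSE_1 - SSE_12)          # (8.1): SSR(X2 | X1) = 171.125
print(SSE_2 - SSE_12)          # (8.2): SSR(X1 | X2) = 231.125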

Decomposition of SSR

The regression sum of squares can be decomposed into extra sums of
squares. Let us consider the case of three X variables (X_1, X_2, X_3). We begin
with the following equality for variable X_1:

(8.10)  SSTO = SSR(X_1) + SSE(X_1)

where the notation now shows explicitly that X_1 is in the model (Y_i = β_0 + β_1 X_{i,1} + ε_i).
From (8.7b) we have:

SSE(X_1) = SSR(X_2 | X_1) + SSE(X_1, X_2)

Using it in (8.10) one can get:

(8.10a) SSTO = SSR(X_1) + SSR(X_2 | X_1) + SSE(X_1, X_2)

From (8.9) we have

SSE(X_1, X_2) = SSR(X_3 | X_1, X_2) + SSE(X_1, X_2, X_3)

and hence

(8.10b) SSTO = SSR(X_1) + SSR(X_2 | X_1) + SSR(X_3 | X_1, X_2) + SSE(X_1, X_2, X_3)

For multiple regression with three independent variables, using our notation, we
can write the equivalent of (8.10):

(8.11)  SSTO = SSR(X_1, X_2, X_3) + SSE(X_1, X_2, X_3)

Hence

(8.12)  SSR(X_1, X_2, X_3) = SSR(X_1) + SSR(X_2 | X_1) + SSR(X_3 | X_1, X_2)

Thus, the regression sum of squares has been decomposed into marginal components,
each associated with one degree of freedom. Of course the order of the independent
variables is arbitrary. For instance:

(8.13)  SSR(X_1, X_2, X_3) = SSR(X_3) + SSR(X_1 | X_3) + SSR(X_2 | X_1, X_3)

We can define extra sums of squares for two or more independent variables at a time
and obtain still other decompositions. We can define

(8.14)  SSR(X_2, X_3 | X_1) = SSE(X_1) − SSE(X_1, X_2, X_3)

Thus, SSR(X_2, X_3 | X_1) represents the reduction in the error sum of
squares which is achieved by introducing X_2 and X_3 into the regression model already
containing X_1. There are two degrees of freedom associated with SSR(X_2, X_3 | X_1),
and also

(8.14a) SSR(X_2, X_3 | X_1) = SSR(X_2 | X_1) + SSR(X_3 | X_1, X_2)

With SSR(X_2, X_3 | X_1) we can make use of the decomposition

(8.15)  SSR(X_1, X_2, X_3) = SSR(X_1) + SSR(X_2, X_3 | X_1).

The corresponding ANOVA table for three X variables is:

Source of variation    SS                        df       MS
Regression             SSR(X_1, X_2, X_3)        3        MSR(X_1, X_2, X_3)
  X_1                  SSR(X_1)                  1        MSR(X_1)
  X_2 | X_1            SSR(X_2 | X_1)            1        MSR(X_2 | X_1)
  X_3 | X_1, X_2       SSR(X_3 | X_1, X_2)       1        MSR(X_3 | X_1, X_2)
Error                  SSE(X_1, X_2, X_3)        n − 4    MSE(X_1, X_2, X_3)
Total                  SSTO                      n − 1

and for two X variables:

Source of variation    SS                        df       MS
Regression             SSR(X_1, X_2)             2        MSR(X_1, X_2)
  X_1                  SSR(X_1)                  1        MSR(X_1)
  X_2 | X_1            SSR(X_2 | X_1)            1        MSR(X_2 | X_1)
Error                  SSE(X_1, X_2)             n − 3    MSE(X_1, X_2)
Total                  SSTO                      n − 1

Recall that the coefficient of simple determination r² (which measures the usefulness of X
in predicting Y) is

r² = (SSTO − SSE) / SSTO = SSR / SSTO = 1 − SSE / SSTO

A coefficient of partial determination measures the marginal contribution of one
X variable, when all the others are already included in the model.

Let us consider a first-order multiple regression model with two independent variables:

Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ε_i

SSE(X_2) measures the variation in Y when X_2 is included in the model. SSE(X_1, X_2)
measures the variation in Y when both X_1 and X_2 are included in the model. Hence the
relative marginal reduction in the variation in Y associated with X_1, when X_2 is already
in the model, is:

[ SSE(X_2) − SSE(X_1, X_2) ] / SSE(X_2)

This measure is the coefficient of partial determination between Y and X_1, given
that X_2 is in the model. We denote it by r²_{Y1.2}:

(8.16)  r²_{Y1.2} = [ SSE(X_2) − SSE(X_1, X_2) ] / SSE(X_2) = 1 − SSE(X_1, X_2) / SSE(X_2)

Using (8.7a) we get

(8.16a) r²_{Y1.2} = SSR(X_1 | X_2) / SSE(X_2)

The coefficient of partial determination between Y and X_2, given that X_1 is in the
model, is defined by

(8.17)  r²_{Y2.1} = [ SSE(X_1) − SSE(X_1, X_2) ] / SSE(X_1) = 1 − SSE(X_1, X_2) / SSE(X_1) = SSR(X_2 | X_1) / SSE(X_1)

For our example given in table III.1 we have:

r²_{Y1.2} = (113.42 − 109.95) / 113.42 = 0.0306

and

r²_{Y2.1} = (143.12 − 109.95) / 143.12 = 0.2318
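The coefficients of partial determination follow directly from the error sums of squares already listed in ANOVA tables II; the sketch below (an addition to the notes) simply plugs those numbers into (8.16) and (8.17).

SSE_X2   = 113.42        # SSE(X2) from table II c)
SSE_X1   = 143.12        # SSE(X1) from table II b)
SSE_X1X2 = 109.95        # SSE(X1, X2) from table II a)

r2_Y1_2 = (SSE_X2 - SSE_X1X2) / SSE_X2    # (8.16): about 0.0306
r2_Y2_1 = (SSE_X1 - SSE_X1X2) / SSE_X1    # (8.17): about 0.2318

print(r2_Y1_2, r2_Y2_1)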

General case

independent variables in the model is as follows:

SSRX 1 ∣X 2 ,X 3

(8.18a) r 2Y 1.2,3 SSEX

(X 1 to be added when X 2 and X 3 are in model)

1

SSRX 2 ∣X 1 ,X 3

(8.18b) r 2Y 2.1,3 SSEX 2

SSRX 3 ∣X 1 ,X 2

(8.18c)

r 2Y 3.1,2 SSEX 3

Note that in the subscripts to r 2 , the entries to the left of the dot show in turn

the variable taken as the response (dependent one), then the variable being added.

The entries to the right of the dot show variables already in the model.

The coefficient of partial determination can take values between 0 and 1.

The square root of a coefficient of partial determination is called a coefficient of partial correlation and is denoted by r_{Y1.2}, r_{Y1.2,3}, ..., depending on the model.

TESTS CONCERNING THE REGRESSION COEFFICIENTS IN MULTIPLE REGRESSION

We have already discussed how to conduct two types of tests concerning the regression coefficients in the multiple regression model. We will summarize these tests and then take up some additional types of tests.

To test whether there is a regression relation between Y and the set of X variables, the hypotheses are

(8.21) H_0: β_1 = β_2 = ... = β_{p−1} = 0
       H_a: not all β_k (k = 1, 2, ..., p−1) equal 0

and the test statistic is

(8.22) F* = [ SSR(X_1, ..., X_{p−1}) / (p − 1) ] ÷ [ SSE(X_1, ..., X_{p−1}) / (n − p) ] = MSR / MSE

Decision rule:

If F* ≤ F(1 − α; p − 1, n − p), conclude H_0
If F* > F(1 − α; p − 1, n − p), conclude H_a
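As an illustration, the decision rule can be evaluated with scipy; the sums of squares, n and p below are hypothetical values, not taken from the notes:

```python
from scipy import stats

n, p = 20, 4                    # hypothetical sample size and number of parameters
ssr, sse_f = 480.0, 120.0       # hypothetical SSR(X1, ..., X_{p-1}) and SSE(X1, ..., X_{p-1})

f_star = (ssr / (p - 1)) / (sse_f / (n - p))   # F* = MSR / MSE, per (8.22)
f_crit = stats.f.ppf(0.95, p - 1, n - p)       # F(1 - alpha; p - 1, n - p) with alpha = 0.05
print(f_star, f_crit, f_star > f_crit)         # True -> conclude Ha
```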

To test whether a single regression coefficient β_k equals zero, the hypotheses are

(8.23) H_0: β_k = 0
       H_a: β_k ≠ 0

and the test statistic is

(8.24) F* = [ SSR(X_k | X_1, ..., X_{k−1}, X_{k+1}, ..., X_{p−1}) / 1 ] ÷ [ SSE(X_1, ..., X_{p−1}) / (n − p) ] = MSR(X_k | X_1, ..., X_{k−1}, X_{k+1}, ..., X_{p−1}) / MSE

Decision rule:

If F* ≤ F(1 − α; 1, n − p), conclude H_0
If F* > F(1 − α; 1, n − p), conclude H_a

An equivalent test statistic is

(8.25) t* = b_k / s(b_k)

and the decision rule is

(7.42c) If |t*| ≤ t(1 − α/2; n − p), conclude H_0
        If |t*| > t(1 − α/2; n − p), conclude H_a

Sometimes we wish to test whether several regression coefficients are simultaneously equal to zero. The approach is that of the general linear test.

For the general multiple regression model:

(8.29) Y_i = β_0 + β_1 X_{i,1} + ... + β_{p−1} X_{i,p−1} + ε_i

we wish to test

(8.30) H_0: β_q = β_{q+1} = ... = β_{p−1} = 0
       H_a: not all β_k (k = q, q+1, ..., p−1) equal 0

where, for convenience, we arrange the model so that the last p − q coefficients are the ones to be tested. We first fit the full model and obtain SSE(X_1, ..., X_{p−1}). Then we fit the reduced model:

(8.31) Y_i = β_0 + β_1 X_{i,1} + ... + β_{q−1} X_{i,q−1} + ε_i   ← reduced model

and obtain SSE(X_1, ..., X_{q−1}). Finally we use the general linear test statistic

(8.32) F* = { [ SSE(X_1, ..., X_{q−1}) − SSE(X_1, ..., X_{p−1}) ] / [ (n − q) − (n − p) ] } ÷ { SSE(X_1, ..., X_{p−1}) / (n − p) }

or equivalently

(8.32a) F* = [ SSR(X_q, X_{q+1}, ..., X_{p−1} | X_1, ..., X_{q−1}) / (p − q) ] / MSE(X_1, ..., X_{p−1})

Decision rule:

If F* ≤ F(1 − α; p − q, n − p), conclude H_0
If F* > F(1 − α; p − q, n − p), conclude H_a
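A minimal sketch of the general linear test (8.32): fit the full and the reduced model, difference the error sums of squares, and compare F* with the F percentile. The data and the choice q = 3, p = 5 are hypothetical:

```python
import numpy as np
from scipy import stats

def sse(y, X):
    """Error sum of squares of an OLS fit of y on the columns of X (intercept included)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ beta
    return float(r @ r)

rng = np.random.default_rng(1)
n = 30
X = rng.normal(size=(n, 4))                          # X1, ..., X4; we test beta3 = beta4 = 0
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(size=n)

p, q = 5, 3                                          # full model: p = 5 parameters, reduced model: q = 3
sse_full = sse(y, X)                                 # SSE(X1, X2, X3, X4)
sse_red = sse(y, X[:, :2])                           # SSE(X1, X2)

f_star = ((sse_red - sse_full) / (p - q)) / (sse_full / (n - p))
f_crit = stats.f.ppf(0.95, p - q, n - p)             # F(1 - alpha; p - q, n - p), alpha = 0.05
print(f_star, f_crit, f_star > f_crit)
```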

Correlation transformation.

Roundoff errors may enter the regression calculations. Most of these errors are made primarily when the inverse of X′X is calculated. Some variables have substantially different magnitudes, so that entries in the X′X matrix may cover a wide range, say from 15 to 49 000 000. A solution for this condition is to transform the variables and thereby reparameterize the regression model. The transformation which we consider is called the correlation transformation. It makes all entries in the X′X matrix for the transformed variables fall between −1 and 1 inclusive. We shall illustrate the correlation transformation for the case of two independent variables. The basic regression model we assume is:

(11.1) Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ε_i

The first step is to use the deviations X_{i,1} − X̄_1 instead of X_{i,1} and X_{i,2} − X̄_2 instead of X_{i,2}. To use deviations we have to modify model (11.1):

Y_i = (β_0 + β_1 X̄_1 + β_2 X̄_2) + β_1 (X_{i,1} − X̄_1) + β_2 (X_{i,2} − X̄_2) + ε_i

or

(11.2) Y_i = β′_0 + β_1 (X_{i,1} − X̄_1) + β_2 (X_{i,2} − X̄_2) + ε_i

where

(11.2a) β′_0 = β_0 + β_1 X̄_1 + β_2 X̄_2

It can be shown that the least squares estimator of β′_0 is always Ȳ. Hence, we can rewrite (11.2) as follows:

(11.3) Y_i − Ȳ = β_1 (X_{i,1} − X̄_1) + β_2 (X_{i,2} − X̄_2) + ε_i

The second step in developing the correlation transformation is to express each deviation in units of its standard deviation:

(11.4) (Y_i − Ȳ)/s_Y,   (X_{i,1} − X̄_1)/s_1,   (X_{i,2} − X̄_2)/s_2

where

(11.5a) s_Y = [ Σ (Y_i − Ȳ)² / (n − 1) ]^{1/2}

(11.5b) s_1 = [ Σ (X_{i,1} − X̄_1)² / (n − 1) ]^{1/2}

(11.5c) s_2 = [ Σ (X_{i,2} − X̄_2)² / (n − 1) ]^{1/2}

The final step in obtaining the correlation transformation is to use the following functions of the standardized variables in (11.4):

(11.6a) Y′_i = (1/√(n − 1)) (Y_i − Ȳ)/s_Y

(11.6b) X′_{i,1} = (1/√(n − 1)) (X_{i,1} − X̄_1)/s_1

(11.6c) X′_{i,2} = (1/√(n − 1)) (X_{i,2} − X̄_2)/s_2

The regression model with the transformed variables Y′, X′_1 and X′_2 is a simple extension of model (11.3):

(11.7) Y′_i = β′_1 X′_{i,1} + β′_2 X′_{i,2} + ε′_i

It is easy to show that the new parameters β′_1 and β′_2 and the original parameters β_0, β_1 and β_2 in (11.1) are related as follows:

(11.8a) β_1 = (s_Y / s_1) β′_1

(11.8b) β_2 = (s_Y / s_2) β′_2

(11.8c) β_0 = Ȳ − β_1 X̄_1 − β_2 X̄_2

Thus, the new regression coefficients ′1 and ′2 and the original regression

coefficients 1 and 2 are related by simple scaling factors involving ratios

of standard deviations.

The X matrix for the transformed variables is

      | X′_{1,1}   X′_{1,2} |
X  =  | X′_{2,1}   X′_{2,2} |
      |    :          :     |
      | X′_{n,1}   X′_{n,2} |

The X′X matrix is:

(11.9) X′X = | Σ (X′_{i,1})²          Σ X′_{i,1} X′_{i,2} |
             | Σ X′_{i,1} X′_{i,2}    Σ (X′_{i,2})²       |

Let us consider the elements of this matrix. First, we have

Σ (X′_{i,1})² = Σ [ (X_{i,1} − X̄_1) / (s_1 √(n − 1)) ]² = Σ (X_{i,1} − X̄_1)² / [ (n − 1) s²_1 ] = 1

Similarly:

Σ (X′_{i,2})² = 1

Finally:

Σ X′_{i,1} X′_{i,2} = Σ [ (X_{i,1} − X̄_1) / (s_1 √(n − 1)) ] [ (X_{i,2} − X̄_2) / (s_2 √(n − 1)) ]
                    = Σ (X_{i,1} − X̄_1)(X_{i,2} − X̄_2) / [ (n − 1) s_1 s_2 ]
                    = Σ (X_{i,1} − X̄_1)(X_{i,2} − X̄_2) / { [Σ (X_{i,1} − X̄_1)²]^{1/2} [Σ (X_{i,2} − X̄_2)²]^{1/2} }
                    = r_{1,2}

where r_{1,2} is the coefficient of simple correlation between X_1 and X_2.

Therefore the X′X matrix for the transformed variables, denoted by r_XX, is

(11.10) r_XX = X′X = | 1         r_{1,2} |
                     | r_{1,2}   1       |

and is called the correlation matrix of the independent variables. Hence

(11.11) r_XX^{−1} = 1/(1 − r²_{1,2}) |  1         −r_{1,2} |
                                     | −r_{1,2}    1       |

and

(11.12) X′Y = | r_{Y,1} |
              | r_{Y,2} |

where r_{Y,1} and r_{Y,2} are the coefficients of correlation between Y and X_1 and between Y and X_2, respectively. Hence, the estimated regression coefficients for the reparameterized model (11.7) are

(11.13) | b′_1 | = 1/(1 − r²_{1,2}) |  1         −r_{1,2} | | r_{Y,1} | = 1/(1 − r²_{1,2}) | r_{Y,1} − r_{1,2} r_{Y,2} |
        | b′_2 |                    | −r_{1,2}    1       | | r_{Y,2} |                    | r_{Y,2} − r_{1,2} r_{Y,1} |

The return to the estimated regression coefficients for the original model is accomplished by employing the relations in (11.8):

(11.14a) b_1 = (s_Y / s_1) b′_1

(11.14b) b_2 = (s_Y / s_2) b′_2

(11.14c) b_0 = Ȳ − b_1 X̄_1 − b_2 X̄_2
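The whole procedure (transform the variables, solve the correlation system (11.13), transform back via (11.14)) can be sketched in a few lines of Python; the simulated data are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 25
X = rng.normal(size=(n, 2)) @ np.array([[1.0, 0.6], [0.0, 0.8]])  # correlated X1, X2
y = 3 + 1.2 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.5, size=n)

def transform(v):
    """Correlation transformation (11.6): center and divide by s * sqrt(n - 1)."""
    return (v - v.mean()) / (v.std(ddof=1) * np.sqrt(n - 1))

Yp = transform(y)
Xp = np.column_stack([transform(X[:, 0]), transform(X[:, 1])])

r_xx = Xp.T @ Xp                          # correlation matrix of the X variables, (11.10)
r_xy = Xp.T @ Yp                          # correlations of Y with X1 and X2, (11.12)
b_prime = np.linalg.solve(r_xx, r_xy)     # standardized coefficients, (11.13)

# back to the original scale, (11.14)
s_y, s_1, s_2 = y.std(ddof=1), X[:, 0].std(ddof=1), X[:, 1].std(ddof=1)
b1 = (s_y / s_1) * b_prime[0]
b2 = (s_y / s_2) * b_prime[1]
b0 = y.mean() - b1 * X[:, 0].mean() - b2 * X[:, 1].mean()
print(b0, b1, b2)
```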

Multicollinearity

Informal methods.

The presence of serious multicollinearity among the independent variables is often signaled by the following informal diagnostics:

1. Large changes in the estimated regression coefficients when a variable is added

or deleted, or when an observation is altered or deleted.

2. Nonsignificant results in individual tests on the regression coefficients for

important independent variables.

3. Estimated regression coefficients with an algebraic sign that is the opposite

of that expected from theoretical considerations or prior experience.

4. Large coefficients of correlation between pairs of independent variables in the

correlation matrix r XX .

5. Wide confidence intervals for the regression coefficients representing important

independent variables.

Variance inflation factors. Recall that the variance-covariance matrix of the estimated regression coefficients is:

(11.27) σ²(b) = σ² (X′X)^{−1}

To reduce roundoff errors in calculating (X′X)^{−1}, we noted that it is desirable to first transform the variables by means of the correlation transformation. The variance-covariance matrix of the estimated standardized regression coefficients is

(11.28) σ²(b′) = (σ′)² r_XX^{−1}

where r_XX is the matrix of pairwise simple correlation coefficients among the independent variables, as illustrated in (11.10) for p − 1 = 2 independent variables, and (σ′)² is the error term variance for the transformed model.

Note from (11.28) that the variance of b′_k (k = 1, ..., p − 1) is equal to the product of the error term variance (σ′)² and the kth diagonal element of the matrix r_XX^{−1}. This second factor is called the variance inflation factor (VIF). It can be shown that the variance inflation factor for b′_k, denoted by VIF_k, is

(11.29) VIF_k = (1 − R²_k)^{−1},   k = 1, 2, ..., p − 1

where R²_k is the coefficient of multiple determination when X_k is regressed on the p − 2 other X variables in the model. Hence, we have

(11.30) σ²(b′_k) = (σ′)² VIF_k = (σ′)² / (1 − R²_k)

The variance inflation factor VIF_k is equal to 1 when R²_k = 0, i.e., when X_k is not linearly related to the other X variables. When R²_k ≠ 0, then VIF_k is greater than 1, indicating an inflated variance for b′_k.
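A sketch of how VIF_k could be computed directly from its definition (11.29), by regressing each X variable on the remaining ones; the data are hypothetical and x2 is built to be nearly collinear with x1:

```python
import numpy as np

def vif(X):
    """Variance inflation factors: regress each column of X on the remaining columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - (resid @ resid) / (((X[:, j] - X[:, j].mean()) ** 2).sum())
        out.append(1.0 / (1.0 - r2))      # (11.29): VIF_k = 1 / (1 - R^2_k)
    return np.array(out)

rng = np.random.default_rng(3)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=50)   # nearly collinear with x1
x3 = rng.normal(size=50)
print(vif(np.column_stack([x1, x2, x3])))   # VIF for x1 and x2 will be large
```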

We now consider the identification of observations which are outlying or extreme, i.e., observations which are well separated from the remainder of the data. These outlying observations may involve large residuals and often have a dramatic effect on the fitted least squares regression function.

We know from (6.93) that the least squares residuals can be expressed as a linear combination of the observations Y_i by means of the hat matrix:

(11.44) e = (I − H)Y

The hat matrix H is given by (6.93a):

(11.45) H = X(X′X)^{−1}X′

and the fitted values are expressed in terms of the observations Y_i by:

(11.46) Ŷ = HY

Further, we noted that

(11.47) σ²(e) = σ²(I − H)

so that the variance of the residual e_i, denoted by σ²(e_i), is:

(11.48) σ²(e_i) = σ²(1 − h_{i,i})

where h_{i,i} is the ith element of the main diagonal of the hat matrix, and it is equal to:

(11.49) h_{i,i} = X_i′ (X′X)^{−1} X_i

where X_i is the column vector

(11.49a) X_i = [1, X_{i,1}, X_{i,2}, ..., X_{i,p−1}]′

The diagonal element h_{i,i} of the hat matrix is called the leverage of the ith observation. Thus, a large leverage value h_{i,i} indicates that the ith observation is distant from the center of the X observations. The mean leverage value is

(11.51) h̄ = Σ h_{i,i} / n = p / n

A commonly used rule flags observations whose leverage exceeds twice this mean. Hence, leverage values greater than 2p/n are considered by this rule to indicate outlying observations with regard to the X values.

Since σ²(e_i) = σ²(1 − h_{i,i}), an unbiased estimator of this variance is:

(11.54) s²(e_i) = MSE (1 − h_{i,i})

The ratio of e_i to s(e_i) is called the studentized residual and will be denoted by e*_i:

(11.55) e*_i = e_i / s(e_i)

Let Ŷ_{i(i)} denote the fitted value for the ith case obtained when the regression is fitted to the observations excluding the ith one (this observation is deleted). The residual

(11.56) d_i = Y_i − Ŷ_{i(i)}

is called a deleted residual and is denoted by d_i.

The studentized deleted residual, denoted by d*_i, is:

(11.57) d*_i = d_i / s(d_i)

Fortunately, the studentized deleted residuals d*_i can be calculated without having to fit the regression function with the ith observation omitted. It can be shown that:

(11.58) d*_i = e_i [ (n − p − 1) / ( SSE(1 − h_{i,i}) − e²_i ) ]^{1/2}

and they follow the t distribution with n − p − 1 degrees of freedom.

To identify outlying Y observations, we examine the studentized deleted residuals for large absolute values and use the appropriate t distribution to ascertain how far in the tails such outlying values fall.
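The leverage values (11.49) and studentized deleted residuals (11.58) can be computed as in the following sketch (simulated data; in practice the X and Y of the problem at hand would be used):

```python
import numpy as np

def leverage_and_deleted_residuals(y, X):
    """Hat-matrix diagonal (leverage) and studentized deleted residuals, per (11.49) and (11.58)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    p = Xd.shape[1]                                 # number of parameters
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T        # hat matrix (11.45)
    h = np.diag(H)                                  # leverages h_ii
    e = y - H @ y                                   # residuals e = (I - H) Y
    sse_full = float(e @ e)
    d_star = e * np.sqrt((n - p - 1) / (sse_full * (1 - h) - e ** 2))
    return h, d_star

rng = np.random.default_rng(4)
X = rng.normal(size=(16, 2))
y = 5 + X @ np.array([2.0, 1.0]) + rng.normal(size=16)
h, d_star = leverage_and_deleted_residuals(y, X)
print(h > 2 * 3 / 16)      # rule of thumb: leverage above 2p/n flags an outlying X observation
print(np.abs(d_star))      # compare with a t(n - p - 1) percentile to flag outlying Y values
```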

PROBLEMS

Question 1

The following data were obtained in a certain experiment:

i Y i X i,1 X i,2

1 64 4 2

2 73 4 4

3 61 4 2

4 76 4 4

5 72 6 2

6 80 6 4

7 71 6 2

8 83 6 4

9 83 8 2

10 89 8 4

11 86 8 2

12 93 8 4

13 88 10 2

14 95 10 4

15 94 10 2

16 100 10 4

a) Fit the first-order simple regression model for relating Y to X 1 .

State the fitted regression function.

b) Find the estimated regression coefficients for full model (Y on X 1 and X 2 ).

c) Does SSR(X_1) equal SSR(X_1 | X_2) here? If not, is the difference substantial?

d) Calculate the coefficient of simple correlation between X 1 and X 2 .

The diagonal elements of the hat matrix are:

i 1 2 3 4 5 6 7 8

h i,i 0.237 0.237 0.237 0.237 0.137 0.137 0.137 0.137


i 9 10 11 12 13 14 15 16

h i,i 0.137 0.137 0.137 0.137 0.237 0.237 0.237 0.237

e) Identify any outlying X observations using the hat matrix method.

f) Obtain the studentized deleted residuals and identify any outlying Y observations.

Question 2.

The following data were obtained in a certain experiment:

i Y i X i,1 X i,2 i Y i X i,1 X i,2

1 58 7 5.11 11 121 17 11.02

2 152 18 16.72 12 112 12 9.51

3 41 5 3.20 13 50 6 3.79

4 93 14 7.03 14 82 12 6.45

5 101 11 10.98 15 48 8 4.60

6 38 5 4.04 16 127 15 13.86

7 203 23 22.07 17 140 17 13.03

8 78 9 7.03 18 155 21 15.21

9 117 16 10.62 19 39 6 3.64

10 44 5 4.76 20 90 11 9.57

1) Find the estimated regression coefficients for full model (Y on X 1 and X 2 ).

2) Fit the first-order simple linear regression model for relating Y and X 2

3) Does SSR(X_2) equal SSR(X_2 | X_1) here? If not, is the difference substantial?

4) Calculate the coefficient of simple correlation between X 1 and X 2 .

The diagonal elements of the hat matrix are:

i 1 2 3 4 5 6 7 8 9 10

h i,i 0.091 0.194 0.131 0.268 0.149 0.141 0.429 0.067 0.135 0.165

i 11 12 13 14 15 16 17 18 19 20

h i,i 0.179 0.059 0.110 0.156 0.095 0.128 0.097 0.230 0.112 0.073

5) Identify any outlying X observations using the hat matrix method.

6) Obtain the studentized deleted residuals and identify any outlying Y observations.

Question 3.

For a certain experiment the first-order regression model with two independent

variables was used. The calculated diagonal elements of the hat matrix are:

i 1 2 3 4 5 6 7 8

h i,i 0.237 0.237 0.237 0.237 0.137 0.137 0.137 0.137

i 9 10 11 12 13 14 15 16

h i,i 0.137 0.137 0.137 0.137 0.237 0.237 0.237 0.237

1) Describe the use of the hat matrix for identifying outlying X observations.

2) Identify any outlying X observations using the hat matrix method.

The all-possible-regressions selection procedure calls for an examination of all possible regression models involving the potential X variables and identification of "good" subsets according to some criterion. The following criteria can be used:

R²_p criterion

The R²_p criterion calls for an examination of the coefficient of multiple determination R² in order to select one or several subsets of X variables. We show the number of parameters in the regression model as a subscript of R². Thus, R²_p indicates that there are p parameters, or p − 1 predictor variables, in the regression equation on which R²_p is based.

Since R²_p is a ratio of sums of squares:

(12.1) R²_p = SSR_p / SSTO = 1 − SSE_p / SSTO

and the denominator is constant for all possible regressions, R²_p varies inversely with the error sum of squares SSE_p. But we know that SSE_p can never increase as additional independent variables are included in the model. Thus, R²_p will be a maximum when all p − 1 potential X variables are included in the regression model. The reason for using the R²_p criterion with the all-possible-regressions approach therefore cannot be to maximize R²_p. Rather, the intent is to find the point where adding more X variables is not worthwhile because it leads to a very small increase in R²_p. Often, this point is reached when only a limited number of X variables is included in the regression model. Clearly, the determination of where diminishing returns set in is a judgmental one.

Example 12.1. The following table contains the R²_p values for all possible regression models for a certain data set (54 observations) with 4 potential X variables:

X variables        p    df   SSE_p    R²_p    MSE_p    C_p
none               1    53   3.9728   0       0.0750   1721.6
X1                 2    52   3.4960   0.120   0.0672   1510.7
X2                 2    52   2.5762   0.352   0.0495   1100.1
X3                 2    52   2.2154   0.442   0.0426   939.0
X4                 2    52   1.8777   0.527   0.0361   788.3
X1, X2             3    51   2.2324   0.438   0.0438   948.6
X1, X3             3    51   1.4073   0.646   0.0276   580.3
X1, X4             3    51   1.8759   0.528   0.0368   789.5
X2, X3             3    51   0.7431   0.813   0.0146   283.7
X2, X4             3    51   1.3922   0.650   0.0273   573.5
X3, X4             3    51   1.2455   0.687   0.0244   508.0
X1, X2, X3         4    50   0.1099   0.972   0.0022   3.1
X1, X2, X4         4    50   1.3905   0.650   0.0278   574.8
X1, X3, X4         4    50   1.1157   0.719   0.0223   452.1
X2, X3, X4         4    50   0.4653   0.883   0.0093   161.7
X1, X2, X3, X4     5    49   0.1098   0.972   0.0022   5.0

Using the R²_p values one can notice that the use of the subset (X_1, X_2, X_3) appears to be reasonable, since there is little further increase in R²_p after these three X variables are included in the model.

MSE_p or R²_a criterion.

Since R²_p does not take account of the number of parameters in the model, and since max R²_p can never decrease as p increases, the use of the adjusted coefficient of multiple determination R²_a

(12.2) R²_a = 1 − [(n − 1)/(n − p)] (SSE_p / SSTO) = 1 − MSE_p / [SSTO/(n − 1)]

has been suggested as a criterion which takes the number of parameters in the model into account through the degrees of freedom. It can be seen from (12.2) that R²_a increases if and only if MSE_p decreases, since SSTO/(n − 1) is fixed for the given Y observations. Hence, R²_a and MSE_p are equivalent criteria. We shall consider here the criterion MSE_p. Note that MSE_p can, indeed, increase as p increases, when the reduction in SSE_p becomes so small that it is not sufficient to offset the loss of an additional degree of freedom. Users of the MSE_p criterion either seek to find the subset of X variables that minimizes MSE_p, or one or several subsets for which MSE_p is so close to the minimum that adding more variables is not worthwhile. Using the table from example 12.1 one can see that the subset (X_1, X_2, X_3) appears to be best under the MSE_p (R²_a) criterion too.

C p criterion.

This criterion is concerned with the total mean squared error of the n fitted values for each of the various subset regression models. The mean squared error concept involves a bias component and a random error component. Here, the mean squared error pertains to the fitted values for the regression model employed. The bias component for the ith fitted value is:

(12.3) E(Ŷ_i) − E(Y_i)

where E(Ŷ_i) is the expectation of the ith fitted value for the given regression model and E(Y_i) is the true mean response. The random error component for Ŷ_i is simply σ²(Ŷ_i), its variance. The mean squared error for Ŷ_i is then the sum of the squared bias and the variance:

(12.4) [E(Ŷ_i) − E(Y_i)]² + σ²(Ŷ_i)

The total mean squared error for all n fitted values is the sum of the n individual mean squared errors:

(12.5) Σ_{i=1}^{n} [E(Ŷ_i) − E(Y_i)]² + Σ_{i=1}^{n} σ²(Ŷ_i)

The criterion measure, denoted by Γ_p, is simply the total mean squared error divided by σ², the true error variance:

(12.6) Γ_p = (1/σ²) { Σ_{i=1}^{n} [E(Ŷ_i) − E(Y_i)]² + Σ_{i=1}^{n} σ²(Ŷ_i) }

The model which includes all p − 1 potential X variables is assumed to have been carefully chosen, so that MSE(X_1, ..., X_{p−1}) is an unbiased estimator of σ². It can then be shown that an estimator of Γ_p is C_p:

(12.7) C_p = SSE_p / MSE(X_1, ..., X_{p−1}) − (n − 2p)

where SSE_p is the error sum of squares for the fitted subset regression model with p parameters (i.e., with p − 1 predictor variables).

When there is no bias in the regression model with p − 1 predictor variables, so that E(Ŷ_i) ≡ E(Y_i), the expected value of C_p is approximately p:

(12.8) E( C_p | E(Ŷ_i) ≡ E(Y_i) ) ≈ p

Thus, when the C_p values for all possible regression models are plotted against p, those models with little bias will tend to fall near the line C_p = p. Models with substantial bias will tend to fall considerably above this line.

In using the C_p criterion, one seeks to identify subsets of X variables for which
(1) the C_p value is small and
(2) the C_p value is near p.

Sets of X variables with small C_p values have a small total mean squared error, and when the C_p value is also near p, the bias of the regression model is small. The regression model based on the subset of X variables with the smallest C_p value may still involve substantial bias. In that case, one may at times prefer a regression model based on a somewhat larger subset of X variables for which the C_p value is slightly larger but which does not involve a substantial bias component.
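The three criteria R²_p, MSE_p and C_p can be evaluated for every subset of X variables with a short enumeration, as in the following sketch (the data are simulated and only meant to mimic the structure of a table such as the one in example 12.1):

```python
import numpy as np
from itertools import combinations

def subset_criteria(y, X):
    """R^2_p, MSE_p and C_p for every subset of the predictor columns of X."""
    n, k = X.shape
    ssto = float(((y - y.mean()) ** 2).sum())

    def sse(cols):
        Xd = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        r = y - Xd @ beta
        return float(r @ r)

    mse_full = sse(range(k)) / (n - (k + 1))     # MSE of the full model estimates sigma^2
    rows = []
    for m in range(k + 1):
        for cols in combinations(range(k), m):
            p = m + 1                            # number of parameters in this subset model
            e = sse(cols)
            rows.append((cols,
                         1 - e / ssto,           # R^2_p, per (12.1)
                         e / (n - p),            # MSE_p
                         e / mse_full - (n - 2 * p)))   # C_p, per (12.7)
    return rows

rng = np.random.default_rng(5)
X = rng.normal(size=(54, 4))
y = 1 + X[:, 1] + 0.5 * X[:, 2] + 0.2 * rng.normal(size=54)
for cols, r2, mse, cp in subset_criteria(y, X):
    print(cols, round(r2, 3), round(mse, 4), round(cp, 1))
```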

Identification of "best" subsets by use of an algorithm.

In some cases the pool of potential X variables contains many variables. An automatic search procedure that develops sequentially the subset of X variables to be included in the regression model may be helpful in those cases. It was developed to economize on computational effort, as compared with the all-possible-regressions approach, while arriving at a reasonably good subset of independent variables.

Stepwise Regression.

Stepwise regression uses t statistics (and related prob-values) to determine the importance (or significance) of the independent (explanatory) variables in various regression models. In this context the t statistic indicates that the independent variable is significant at the α level if and only if the related p-value is less than α. This implies that we can reject H_0: β_j = 0 in favor of H_a: β_j ≠ 0 with the probability of a Type I error equal to α. Before beginning the stepwise procedure we choose a value of α_entry, which we call "the probability of a Type I error related to entering an independent variable into the regression model". We also choose α_stay, which we call "the probability of a Type I error related to retaining an independent variable that was previously entered into the model". The SAS default values are α_entry = 0.5 and α_stay = 0.15.

Then the stepwise regression is performed as follows:

Step 1. The stepwise procedure considers all possible one-independent-variable regression models of the form

y = β_0 + β_1 x_j

Each different model includes a different potential independent variable. For each model the t statistic (and p-value) related to testing H_0: β_1 = 0 versus H_a: β_1 ≠ 0 is calculated. If the largest absolute value of the t statistics is significant (that is, the corresponding smallest p-value < α_entry), the corresponding variable, say X_1, is included; we then consider the model

y = β_0 + β_1 x_1

and the output for step one reports: variable entered. If the t statistic does not indicate that X_1 is significant at the α_entry level, then the stepwise procedure terminates by choosing the model

y = β_0

Step 2. The stepwise procedure considers all possible two-independent-variable models (with one variable already selected in the first step). For each model the t statistic related to testing H_0: β_2 = 0 versus H_a: β_2 ≠ 0 is calculated. The variable with the biggest absolute value of the t statistic is included if it is significant (corresponding p-value < α_entry).

Further steps. The stepwise procedure continues by adding independent variables one at a time: a variable is added if and only if it has the largest t statistic (in absolute value) among the variables not yet in the model and this t statistic is significant (the corresponding p-value < α_entry). After adding an independent variable, the stepwise procedure checks all the independent variables already included in the model and removes any of them that is not significant at the α_stay level. The stepwise procedure terminates when all independent variables not in the model are insignificant at the α_entry level or when the variable to be added to the model is the one just removed from it.

Notice: In some packages, F statistics (= t²) are used in the reports instead of t statistics.
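A simplified sketch of the stepwise idea is given below. It uses the statsmodels package for the t tests (an arbitrary choice; the notes describe the SAS procedure), it removes at most one variable per step, and the thresholds alpha_entry = alpha_stay = 0.15 are illustrative only:

```python
import numpy as np
import statsmodels.api as sm

def stepwise(y, X, names, alpha_entry=0.15, alpha_stay=0.15):
    """p-value based stepwise selection, in the spirit of the procedure described above."""
    selected, last_removed = [], None
    while True:
        # try to enter the remaining variable with the smallest p-value
        candidates = [v for v in names if v not in selected]
        best, best_p = None, 1.0
        for v in candidates:
            cols = [names.index(u) for u in selected + [v]]
            fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
            if fit.pvalues[-1] < best_p:          # p-value of the variable just added
                best, best_p = v, fit.pvalues[-1]
        # stop if nothing can enter, or if the candidate is the variable just removed
        if best is None or best_p >= alpha_entry or best == last_removed:
            break
        selected.append(best)
        # drop the least significant previously entered variable, if it fails alpha_stay
        cols = [names.index(u) for u in selected]
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        worst = int(np.argmax(fit.pvalues[1:]))   # skip the intercept
        if fit.pvalues[1 + worst] > alpha_stay:
            last_removed = selected.pop(worst)
        else:
            last_removed = None
    return selected

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 4))
y = 2 + 1.5 * X[:, 0] - X[:, 2] + rng.normal(size=60)
print(stepwise(y, X, ["X1", "X2", "X3", "X4"]))
```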

MAXR SAS PROCEDURE.

This method does not settle on a single model. Instead, it looks for the “best”

one-variable model, the “best” two-variable model, and so forth.

The MAXR method begins by finding the one-variable model that produces the highest R². Then another variable, the one that yields the greatest increase in R², is added. Once the two-variable model is obtained, each variable in the model is compared to each variable not in the model. For each comparison, MAXR determines whether removing one variable and replacing it with the other would increase R². After comparing all possible switches, the one that produces the largest increase in R² is made. Comparisons begin again, and the procedure continues until no switch could increase R². The two-variable model thus achieved is considered the "best" two-variable model the technique can find. Another variable is then added to the model, and the comparing-and-switching process is repeated to find the "best" three-variable model, and so forth.

The difference between the stepwise technique and the maximum improvement method is that all switches are evaluated before any switch is made in the MAXR method. In the stepwise method, the "worst" variable may be removed without considering what adding the "best" remaining variable might accomplish.

The basic regression models considered so far have assumed that the random error terms ε_i are either uncorrelated random variables or independent normal random variables. Many regression applications involve time series data. For such data, the assumption of uncorrelated or independent error terms is often not appropriate. Error terms that are correlated over time are said to be autocorrelated or serially correlated.

The simple linear regression model with the random error terms following a first-order autoregressive process is:

(13.1) Y_t = β_0 + β_1 X_t + ε_t
       ε_t = ρ ε_{t−1} + u_t

where:
ρ is a parameter such that |ρ| < 1
u_t are independent N(0, σ²)

Each error term in model (13.1) consists of a fraction of the previous error term plus a new disturbance term u_t. The parameter ρ is called the autocorrelation parameter.
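For intuition, model (13.1) is easy to simulate; the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho = 100, 0.7                       # |rho| < 1: autocorrelation parameter
u = rng.normal(scale=1.0, size=n)       # independent disturbances u_t ~ N(0, sigma^2)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]    # first-order autoregressive errors, (13.1)

x = np.linspace(0, 10, n)
y = 2.0 + 0.5 * x + eps                 # Y_t = beta0 + beta1 X_t + eps_t
```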

Multiple regression

The multiple regression model with the random errors following a first-order autoregressive process is:

(13.2) Y_t = β_0 + β_1 X_{t,1} + ... + β_{p−1} X_{t,p−1} + ε_t
       ε_t = ρ ε_{t−1} + u_t

where:
ρ is a parameter such that |ρ| < 1
u_t are independent N(0, σ²)

The Durbin-Watson test assumes the first-order autoregressive error model (13.1) or (13.2). Because correlated errors in applications tend to show positive serial correlation, the usual test alternatives are:

(13.3) H_0: ρ = 0
       H_a: ρ > 0

The test statistic is

(13.4) D = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e²_t

where n is the number of observations.

The critical values d_L and d_U are given in tables, with the decision rule:

(13.5) If D > d_U, conclude H_0
       If D < d_L, conclude H_a
       If d_L ≤ D ≤ d_U, the test is inconclusive
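The statistic D in (13.4) is straightforward to compute from the residuals of a fitted regression, as in this small sketch (the residual vector is made up for illustration; d_L and d_U still have to be read from the tables):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic (13.4) computed from the residuals of a fitted regression."""
    e = np.asarray(e, dtype=float)
    return float(((e[1:] - e[:-1]) ** 2).sum() / (e ** 2).sum())

# hypothetical residuals, ordered in time
e = np.array([0.5, 0.3, 0.4, -0.1, -0.3, -0.4, -0.2, 0.1, 0.3, 0.2])
print(durbin_watson(e))    # compare with d_L and d_U from the tables at the chosen alpha
```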

PROBLEMS

Question 1

For the following data:

t 1 2 3 4 5 6 7 8

X t 2.052 2.026 2.002 1.949 1.942 1.887 1.986 2.053

Y t 102.9 101.5 100.8 98.0 97.3 93.5 97.5 102.2

t 9 10 11 12 13 14 15 16

X t 2.102 2.113 2.058 2.060 2.035 2.080 2.102 2.150

Y t 105.0 107.2 105.1 103.9 103.0 104.8 105.0 107.2

1) Fit a simple linear regression line.

2) Conduct a formal test for positive autocorrelation using α = 0.05.

Question 2.

The fitted values and residuals of a regression analysis are given below

i 1 2 3 4 5 6 7

Y i 2.92 2.33 2.25 1.58 2.08 3.51 3.34

e i 0.18 -0.03 0.75 0.32 0.42 0.19 0.06

i 8 9 10 11 12 13 14 15

Y i 2.42 2.84 2.50 3.59 2.16 1.91 2.50 3.26

e i -0.42 0.06 -0.20 -0.39 -0.36 -0.51 -0.50 0.54

Assume that the simple linear regression model with the random

terms following a first-order autoregressive process is appropriate.

Conduct a formal test for positive autocorrelation using α = 0.05.

Question 3

The following data were obtained in a certain experiment:


i Yi X i,1 X i,2

1 64 4 2

2 73 4 4

3 61 4 2

4 76 4 4

5 72 6 2

6 80 6 4

7 71 6 2

8 83 6 4

9 83 8 2

10 89 8 4

11 86 8 2

12 93 8 4

13 88 10 2

14 95 10 4

15 94 10 2

16 100 10 4

The data summary is given below in matrix form:

        | 16   112    48 |                   |  99/80   −7/80   −3/16 |
X′X  =  | 112  864   336 |      (X′X)^{−1} = | −7/80     1/80     0   |
        | 48   336   160 |                   | −3/16      0      1/16 |

        | 1308 |
X′Y  =  | 9510 |           Y′Y = 108896
        | 3994 |

Assume that the first-order regression model with independent normal errors is appropriate.

1) Find the estimated regression coefficients.

2) Obtain an ANOVA table and use it to test whether there is a regression relation, using α = 0.05.

3) Estimate β_1 and β_2 jointly by the Bonferroni procedure, using an 80 percent family confidence coefficient.

The following values of the standard normal integral may be useful:

(1/√(2π)) ∫_{−1}^{1} exp(−x²/2) dx = 0.68269

(1/√(2π)) ∫_{−2}^{2} exp(−x²/2) dx = 0.9545

(1/√(2π)) ∫_{−3}^{3} exp(−x²/2) dx = 0.9973

(1/√(2π)) ∫_{−4}^{4} exp(−x²/2) dx = 0.99994
