
MATH2831/2931

August 2, 2013

ANOVA.

Confidence intervals for the mean.

Prediction intervals.

So far we have taken the model with a single independent variable to be the true model, i.e. we have presumed that $\mu_{y|x}$ is related to $x$ linearly.

NOTE 1: Prediction of the response will be poor in situations in which there are several independent variables, each affecting the response.

NOTE 2: Prediction of the response will be poor in situations in which the presumption that $\mu_{y|x}$ is related to $x$ linearly is false over the range of the variables considered.

regression line.

If the true unknown model is linear in more than one variable, i.e.
$$\mu_{y|x} = \beta_0 + \beta_1 x_1 + \beta_2 x_2,$$
then the standard least squares estimate derived so far,
$$b_1 = \frac{S_{xy}}{S_{xx}},$$
is in general a biased estimate for $\beta_1$.

We need a methodology to assess the quality of our regression line!

Analysis of the quality of an estimated regression line can be handled by an ANOVA approach.

ANOVA procedure: considers the total variation in the dependent variable as subdivided into meaningful components, which are then observed and treated in a systematic fashion (recall Week 1 Lecture 3 - partitioning of variance).

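Fundamental identity:
$$SS_{total} = SS_{reg} + SS_{res}$$
Regression sum of squares (variation explained by the fit):
$$SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$
Residual sum of squares (variation left unexplained by the fit):
$$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$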
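The partition can be checked numerically. The following is a minimal sketch in plain Python; the data and the helper names `fit_line` and `sums_of_squares` are made up for illustration and are not part of the lecture.

```python
# Minimal sketch: partition of the total sum of squares in simple linear regression.
# The data below are hypothetical and only serve to illustrate the identity.

def fit_line(x, y):
    """Least squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def sums_of_squares(x, y):
    """Return (SS_total, SS_reg, SS_res) for the fitted line."""
    b0, b1 = fit_line(x, y)
    ybar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - ybar) ** 2 for yi in y)
    ss_reg = sum((fi - ybar) ** 2 for fi in fitted)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_reg, ss_res


x = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical responses
ss_total, ss_reg, ss_res = sums_of_squares(x, y)
print(ss_total, ss_reg + ss_res)  # the two numbers agree up to rounding error
```

The identity holds only for the least squares line; for any other line the cross term does not vanish.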

Week 2 Lecture 3 - The ANOVA table

Question: Why is testing $H_0: \beta_1 = 0$ of particular interest?
Because it tells us whether the predictor is useful for explaining the response.

t-test: the statistic for testing $H_0: \beta_1 = 0$ was
$$T = \frac{b_1}{\hat{\sigma}/\sqrt{S_{xx}}}.$$
This statistic ($T$) has a t-distribution with $n-2$ degrees of freedom under $H_0$.

From our distribution results (Week 2 Lecture 1) we can also derive that under $H_0$
$$F = T^2 = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2} \sim F_{1,n-2}.$$


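Theorem: One can express the regression sum of squares $SS_{reg}$ as a function of the least squares estimate $b_1$ as follows:
$$SS_{reg} = b_1^2 S_{xx}.$$
With $SS_{reg} = b_1^2 S_{xx}$ and $\hat{\sigma}^2 = SS_{res}/(n-2)$, the above statistic, under $H_0$,
$$F = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2},$$
can be written as
$$F = \frac{SS_{reg}/1}{SS_{res}/(n-2)}$$
(the variation explained by the regression per degree of freedom, relative to the residual variation per degree of freedom).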


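Proof:
$$\begin{aligned}
SS_{reg} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (b_0 + b_1 x_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (\bar{y} - b_1 \bar{x} + b_1 x_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} b_1^2 (x_i - \bar{x})^2 \\
&= b_1^2 S_{xx}.
\end{aligned}$$
So
$$F = \frac{SS_{reg}/1}{SS_{res}/(n-2)}.$$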
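As a numerical sanity check of the theorem and of the relation $F = T^2$, here is a small sketch in plain Python. The data are hypothetical, and the variable names are chosen for illustration only.

```python
import math

# Hypothetical data, used only to check the two identities numerically.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Least squares estimates
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Sums of squares computed directly from their definitions
ss_reg = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)
ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Estimate of the error variance, then the t and F statistics for H0: beta1 = 0
sigma2_hat = ss_res / (n - 2)
t_stat = b1 / math.sqrt(sigma2_hat / sxx)
f_stat = (ss_reg / 1) / (ss_res / (n - 2))

print(ss_reg, b1 ** 2 * sxx)   # agree up to rounding error
print(f_stat, t_stat ** 2)     # agree up to rounding error
```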


With $F$ as test statistic we obtain a test of
$$H_0: \beta_1 = 0$$
against
$$H_1: \beta_1 \neq 0$$
at significance level $\alpha$ by using the critical region
$$F > F_{\alpha;1,n-2}.$$
Computation of p-value:
$$p = \Pr(F \geq f \mid \beta_1 = 0)$$
where $f$ is the observed value of $F$ (given $\beta_1 = 0$, $F \sim F_{1,n-2}$).

Source        Sum of Squares   Degrees of freedom   Mean Square                                    F
Regression    $SS_{reg}$       $1$                  $MS_{reg} = SS_{reg}/1$                        $MS_{reg}/\hat{\sigma}^2$
Residual      $SS_{res}$       $n-2$                $MS_{res} = SS_{res}/(n-2) = \hat{\sigma}^2$
Total         $SS_{total}$     $n-1$
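The table can be assembled directly from the data. The sketch below is one possible way to do it in plain Python; the function name `anova_table` and the data are hypothetical, and no p-value is computed because the standard library has no F distribution.

```python
def anova_table(x, y):
    """Rows (source, sum of squares, df, mean square, F) for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    ss_total = sum((yi - ybar) ** 2 for yi in y)
    ss_reg = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

    ms_reg = ss_reg / 1          # regression mean square
    ms_res = ss_res / (n - 2)    # residual mean square, the estimate of sigma^2
    return [
        ("Regression", ss_reg, 1, ms_reg, ms_reg / ms_res),
        ("Residual", ss_res, n - 2, ms_res, None),
        ("Total", ss_total, n - 1, None, None),
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical responses
table = anova_table(x, y)
for source, ss, df, ms, f in table:
    ms_text = "" if ms is None else f"{ms:.4f}"
    f_text = "" if f is None else f"{f:.2f}"
    print(f"{source:<12}{ss:>10.4f}{df:>5}{ms_text:>12}{f_text:>12}")
```

The observed F in the first row would then be compared with the critical value $F_{\alpha;1,n-2}$ taken from tables or statistical software.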

When the null hypothesis is rejected, i.e. the computed F-statistic exceeds the critical value $f_\alpha(1, n-2)$, the conclusion is:

there is a significant amount of variation in the response accounted for by the postulated model (simple linear regression).

NOTE: the t-test allows for testing against both two-sided and one-sided alternative hypotheses, whereas the ANOVA F-test is restricted to testing against the two-sided alternative.

Week 2 Lecture 3 - Example 1: market model of stock returns

The monthly rate of return on a stock ($R$) is linearly related to the monthly return on the overall stock market ($R_m$):
$$R = \beta_0 + \beta_1 R_m + \varepsilon$$
$R_m$ is taken to be the monthly rate of return on some major stock market index.

RECALL: $\beta_1 > 1$ indicates that the stock's rate of return is more sensitive to the overall market than average.

Scatter plot of Host International returns (y-axis) versus overall market returns (x-axis) with fitted regression line.

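Fitted line is $\hat{R} = 0.14 + 1.60 R_m$, with $\hat{\sigma} = 9.27$ and $S_{xx} = 1117.90$.

RECALL: a $100(1-\alpha)$ percent confidence interval for $\beta_1$ is
$$\left( b_1 - t_{\alpha/2;n-2} \frac{\hat{\sigma}}{\sqrt{S_{xx}}},\; b_1 + t_{\alpha/2;n-2} \frac{\hat{\sigma}}{\sqrt{S_{xx}}} \right).$$
95% confidence interval for $\beta_1$:
$$\left( 1.60 - 2.002 \frac{9.27}{\sqrt{1117.90}},\; 1.60 + 2.002 \frac{9.27}{\sqrt{1117.90}} \right) = (1.04, 2.16),$$
which doesn't contain 1.
A value of 1 for the slope does not seem plausible based on the data.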
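The interval arithmetic can be reproduced in a few lines of Python. This sketch uses only the summary values quoted in the example (slope 1.60, $\hat{\sigma} = 9.27$, $S_{xx} = 1117.90$, critical value 2.002); the variable names are illustrative.

```python
import math

# Summary values quoted in the example
b1 = 1.60           # fitted slope
sigma_hat = 9.27    # estimated error standard deviation
sxx = 1117.90       # S_xx
t_crit = 2.002      # upper 2.5% point of t with 58 degrees of freedom, as used in the example

# Standard error of the slope and the 95% confidence interval
se_b1 = sigma_hat / math.sqrt(sxx)
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1

contains_one = lower <= 1.0 <= upper
print(round(lower, 2), round(upper, 2))  # prints 1.04 2.16
print(contains_one)                      # prints False
```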

Hypothesis testing equivalent:
$$H_0: \beta_1 = 1$$
versus
$$H_1: \beta_1 \neq 1$$
Test statistic:
$$\frac{b_1 - 1}{\hat{\sigma}/\sqrt{S_{xx}}} = \frac{1.60 - 1}{9.27/\sqrt{1117.90}} = 2.16.$$
So if $T \sim t_{58}$, the p-value for the test is
$$p = \Pr(|T| \geq 2.16) = 2\Pr(T \geq 2.16) = 0.0349.$$
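The test statistic and an approximate p-value can be reproduced as follows. This is a sketch only: the Python standard library has no t distribution, so the tail probability is obtained here by integrating the t density with Simpson's rule. The helpers `t_pdf` and `t_upper_tail` are written for this illustration; in practice one would normally read the p-value from statistical software or tables.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma keeps the normalising constant stable for larger df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + t * t / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate Pr(T >= t) for t >= 0 using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    # By symmetry Pr(T >= 0) = 0.5, so subtract the area between 0 and t
    return 0.5 - total * h / 3


# Summary values quoted in the example
b1, sigma_hat, sxx, df = 1.60, 9.27, 1117.90, 58

t_stat = (b1 - 1.0) / (sigma_hat / math.sqrt(sxx))   # test of H0: beta1 = 1
p_value = 2 * t_upper_tail(abs(t_stat), df)          # two-sided p-value
print(round(t_stat, 2), p_value)
```

Small differences from the quoted p-value can arise from rounding in the summary values.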

Week 2 Lecture 3 - Example 2: Risk assessment from financial reports

DATA Collection:

Want company financial reports to provide information helpful for assessing risk.

Seven accounting-determined measures of risk (available from a company's financial reports):

Dividend payout, current ratio, asset size, asset growth, leverage, variability in earnings, covariability in earnings.

annual reports.

Experiment:
Data sent to a random sample of 500 financial analysts, of which 209 responded.
Mean rating assigned by the 209 analysts recorded for each of the 25 stocks.

Mean rating by analysts taken as a reasonable surrogate of market risk for each stock.

AIM: Want to predict market risk from accounting measures: the response is market risk, the predictors are the (multiple) accounting measures.

Estimated market risk (y-axis) versus log(asset size) (x-axis).

Fitted line is $\hat{y} = 8.143 - 0.412x$ and $R^2 = 0.21$.

Week 2 Lecture 3 - Prediction in the simple linear regression model

The reason for building a simple linear regression model is often to predict a new response value when the value of the predictor is known.

Example: risk assessment data

Riskiness of a stock rated by 209 financial analysts (response).

Predictors are various accounting-determined measures of risk.

Aim: Simple linear regression model for risk with asset size as predictor.

Outcome: For a company not assessed by the financial analysts we can determine asset size from company reports and predict risk using the fitted model.

Week 2 Lecture 3 - Confidence intervals for the mean and prediction intervals

Prediction of a new response value when the predictor is $x_0$:
$$\hat{y}(x_0) = b_0 + b_1 x_0.$$
True conditional mean of the response at $x_0$:
$$\beta_0 + \beta_1 x_0$$
New response value $y_0$ when the predictor is $x_0$: write
$$y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0,$$
with $\varepsilon_0$ independent of previous responses, normal with mean zero and variance $\sigma^2$.
Want to find a confidence interval for the conditional mean, and an interval which covers $y_0$ with specified confidence (prediction interval).

The confidence interval for the conditional mean will reflect our uncertainty due to estimating $\beta_0$, $\beta_1$.

The prediction interval will reflect our uncertainty due to estimating $\beta_0$, $\beta_1$, and the level of variation of the responses about the conditional mean (captured by our estimate of $\sigma^2$).

First we'll define a statistic which can be used for constructing a confidence interval for the conditional mean at $x_0$.
Consider $\hat{y}(x_0) = b_0 + b_1 x_0$.

$\hat{y}(x_0)$ is a Gaussian random variable.

$E(\hat{y}(x_0)) = E(b_0 + b_1 x_0) = \beta_0 + \beta_1 x_0$.

$\mathrm{Var}(\hat{y}(x_0)) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)$.

Week 2 Lecture 3 - Confidence interval for the mean

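$$\begin{aligned}
\mathrm{Var}(\hat{y}(x_0)) &= \mathrm{Var}(b_0 + b_1 x_0) \\
&= \mathrm{Var}(b_0) + x_0^2 \mathrm{Var}(b_1) + 2 x_0 \mathrm{Cov}(b_0, b_1) \\
&= \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) + x_0^2 \frac{\sigma^2}{S_{xx}} - 2 x_0 \frac{\bar{x} \sigma^2}{S_{xx}} \\
&= \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right). \qquad (1)
\end{aligned}$$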
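Result (1) can also be checked from a different angle, which is not the route taken in the derivation above. Since $b_0 = \bar{y} - b_1 \bar{x}$, the prediction is a linear combination $\hat{y}(x_0) = \sum_i c_i y_i$ with weights $c_i = 1/n + (x_0 - \bar{x})(x_i - \bar{x})/S_{xx}$, so for independent responses with common variance $\sigma^2$ its variance is $\sigma^2 \sum_i c_i^2$. The sketch below, with hypothetical data and an arbitrary $x_0$, compares $\sum_i c_i^2$ with the factor in (1).

```python
# Hypothetical data and prediction point, used only for a numerical check.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
x0 = 4.5
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Weights c_i such that y_hat(x0) = sum_i c_i * y_i
weights = [1 / n + (x0 - xbar) * (xi - xbar) / sxx for xi in x]

# The weighted sum reproduces the usual prediction b0 + b1 * x0
pred_from_weights = sum(c * yi for c, yi in zip(weights, y))
pred_from_line = b0 + b1 * x0

# Var(y_hat(x0)) / sigma^2 computed two ways
factor_from_weights = sum(c * c for c in weights)
factor_from_formula = 1 / n + (x0 - xbar) ** 2 / sxx

print(pred_from_weights, pred_from_line)
print(factor_from_weights, factor_from_formula)
```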

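$\hat{y}(x_0)$ is normally distributed, so
$$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\sigma \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim N(0, 1).$$
As in previous lectures
$$\frac{(n-2)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$$
independently of $\hat{y}(x_0)$ (since $\hat{y}(x_0)$ is a linear combination of $b_0$, $b_1$, both independent of $\hat{\sigma}^2$).

We have
$$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim t_{n-2}.$$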

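Write $t_{\alpha/2,n-2}$ for the upper $100 \times \alpha/2$ percentage point of the t distribution with $n-2$ degrees of freedom, so that
$$\Pr\left( -t_{\alpha/2,n-2} \leq \frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \leq t_{\alpha/2,n-2} \right) = 1 - \alpha.$$
Rearranging the inequalities gives a $100(1-\alpha)$ percent confidence interval for the conditional mean $\beta_0 + \beta_1 x_0$:
$$\hat{y}(x_0) \pm t_{\alpha/2,n-2}\, \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.$$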
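A confidence interval for the conditional mean can be computed along these lines. This is a minimal sketch with hypothetical data; the function name `mean_confidence_interval` is made up for illustration, and the critical value is passed in by hand (here 3.182, the tabulated upper 2.5% point of t with 3 degrees of freedom) because the Python standard library has no t quantile function.

```python
import math


def mean_confidence_interval(x, y, x0, t_crit):
    """Interval y_hat(x0) -/+ t_crit * sigma_hat * sqrt(1/n + (x0 - xbar)^2 / Sxx)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    # Estimate of sigma from the residual sum of squares on n - 2 degrees of freedom
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ss_res / (n - 2))

    y_hat = b0 + b1 * x0
    half_width = t_crit * sigma_hat * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width


x = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical responses
t_crit = 3.182                    # upper 2.5% point of t with n - 2 = 3 df (from tables)

ci_centre = mean_confidence_interval(x, y, 3.0, t_crit)   # x0 equal to the sample mean of x
ci_edge = mean_confidence_interval(x, y, 5.0, t_crit)     # x0 at the edge of the data
print(ci_centre, ci_edge)
```

The interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from $\bar{x}$, because of the $(x_0 - \bar{x})^2 / S_{xx}$ term.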

Be able to use the ANOVA table to answer questions regarding the suitability of a postulated statistical model.

the mean.