
# MATH2831/2931

August 2, 2013

ANOVA.

## Confidence intervals for the mean.

Prediction intervals.

So far we have taken the model with a single independent variable to be the true model, i.e. we have presumed that $\mu_{y|x}$ is related to $x$ linearly.

NOTE 1: Prediction of the response will be poor in situations in which there are several independent variables, each affecting the response.

NOTE 2: Prediction of the response will be poor in situations in which the presumption that $\mu_{y|x}$ is related to $x$ linearly is false in the range of the variables considered.

regression line.

If the true unknown model is linear in more than one variable, i.e.

$$\mu_{y|x} = \beta_0 + \beta_1 x_1 + \beta_2 x_2,$$

then the standard least squares estimate derived so far,

$$b_1 = \frac{S_{xy}}{S_{xx}},$$

is in general a biased estimate for $\beta_1$.

We need a methodology to assess the quality of our regression line!

Analysis of the quality of an estimated regression line can be handled by an ANOVA approach.

ANOVA procedure: considers the total variation in the dependent variable as subdivided into meaningful components, which are then observed and treated in a systematic fashion (recall Week 1 Lecture 3 - partitioning of variance).

Fundamental identity:

$$SS_{total} = SS_{reg} + SS_{res}$$

Regression sum of squares (variation explained by the fit):

$$SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$

Residual sum of squares:

$$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
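The fundamental identity can be checked numerically. The sketch below is only an illustration: the data are made-up numbers, and the helper names `fit_line` and `sums_of_squares` are hypothetical, not part of the lecture.

```python
def fit_line(x, y):
    """Least squares estimates b0, b1 for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(x, y):
    """Return (SS_total, SS_reg, SS_res) for the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_reg, ss_res


# Made-up illustration data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.9, 4.1, 5.2, 5.8]

ss_total, ss_reg, ss_res = sums_of_squares(x, y)
print("SS_total        =", ss_total)
print("SS_reg + SS_res =", ss_reg + ss_res)
```

The two printed values should agree up to floating point rounding.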

## Week 2 Lecture 3 - The ANOVA table

Question: Why is testing $H_0: \beta_1 = 0$ of particular interest? If $\beta_1 = 0$, the predictor is not useful for explaining the response.

t-test: the statistic for testing $H_0: \beta_1 = 0$ was

$$T = \frac{b_1}{\hat{\sigma} / \sqrt{S_{xx}}}.$$

This statistic ($T$) has a t-distribution with $n - 2$ degrees of freedom under $H_0$.

From our distribution results (Week 2 Lecture 1) we can also derive that under $H_0$

$$F = T^2 = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2} \sim F_{1, n-2}.$$


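Theorem: One can express the regression sum of squares $SS_{reg}$ as a function of the least squares estimate $b_1$ as follows:

$$SS_{reg} = b_1^2 S_{xx}.$$

With $SS_{reg} = b_1^2 S_{xx}$ and $\hat{\sigma}^2 = SS_{res}/(n-2)$, the above statistic, under $H_0$,

$$F = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2}$$

can be written as

$$F = \frac{SS_{reg}/1}{SS_{res}/(n-2)}$$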

variation).


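Proof:

$$
\begin{aligned}
SS_{reg} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (b_0 + b_1 x_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (\bar{y} - b_1 \bar{x} + b_1 x_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} b_1^2 (x_i - \bar{x})^2 \\
&= b_1^2 S_{xx}.
\end{aligned}
$$

So

$$F = \frac{SS_{reg}/1}{SS_{res}/(n-2)}.$$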
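As a sanity check on the theorem, the following sketch computes both sides of $SS_{reg} = b_1^2 S_{xx}$ and compares the F-statistic with $T^2$. The data are made up for illustration, and the variable names are assumptions rather than anything defined in the lecture.

```python
import math

# Made-up illustration data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.9, 4.1, 5.2, 5.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Sums of squares from their definitions.
fitted = [b0 + b1 * xi for xi in x]
ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# t-statistic for H0: beta1 = 0 and the ANOVA F-statistic.
sigma_hat = math.sqrt(ss_res / (n - 2))
t_stat = b1 / (sigma_hat / math.sqrt(s_xx))
f_stat = (ss_reg / 1) / (ss_res / (n - 2))

print("SS_reg      =", ss_reg)
print("b1^2 * S_xx =", b1 ** 2 * s_xx)
print("F           =", f_stat)
print("T^2         =", t_stat ** 2)
```

Each pair of printed values should agree up to floating point rounding.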


With $F$ as test statistic we obtain a test of

$$H_0: \beta_1 = 0$$

against

$$H_1: \beta_1 \neq 0$$

at significance level $\alpha$ by using the critical region

$$F > F_{\alpha; 1, n-2}.$$

Computation of p-value:

$$p = \Pr(F \geq f \mid \beta_1 = 0)$$

where $f$ is the observed value of $F$ (given $\beta_1 = 0$, $F \sim F_{1, n-2}$).

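| Source | Sum of Squares | Degrees of freedom | Mean Square | F |
|---|---|---|---|---|
| Regression | $SS_{reg}$ | $1$ | $MS_{reg} = SS_{reg}/1$ | $MS_{reg}/\hat{\sigma}^2$ |
| Residual | $SS_{res}$ | $n-2$ | $MS_{res} = SS_{res}/(n-2) = \hat{\sigma}^2$ | |
| Total | $SS_{total}$ | $n-1$ | | |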
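The ANOVA table can be assembled directly from the sums of squares. This is a minimal sketch under stated assumptions: `anova_table` is a hypothetical helper name and the data are made-up numbers, so treat it as an illustration rather than a reference implementation.

```python
def anova_table(x, y):
    """Return ANOVA rows (source, SS, df, MS, F) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)

    ms_reg = ss_reg / 1        # regression mean square, 1 degree of freedom
    ms_res = ss_res / (n - 2)  # residual mean square, the estimate of sigma^2
    return [
        ("Regression", ss_reg, 1, ms_reg, ms_reg / ms_res),
        ("Residual", ss_res, n - 2, ms_res, None),
        ("Total", ss_total, n - 1, None, None),
    ]


# Made-up illustration data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.9, 4.1, 5.2, 5.8]

table = anova_table(x, y)
for source, ss, df, ms, f in table:
    ms_text = "" if ms is None else f"{ms:.4f}"
    f_text = "" if f is None else f"{f:.2f}"
    print(f"{source:<12}{ss:>10.4f}{df:>5}{ms_text:>12}{f_text:>12}")
```

To complete the test, the F value in the first row would be compared with the critical value $F_{\alpha;1,n-2}$ taken from tables or statistical software.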

When the null hypothesis is rejected, i.e. the computed F-statistic exceeds the critical value $f_{\alpha}(1, n-2)$, the conclusion is:

there is a significant amount of variation in the response accounted for by the postulated model (simple linear regression).

NOTE: the t-test allows for testing against both two-sided and one-sided alternative hypotheses, whereas the ANOVA F-test is restricted to testing against the two-sided alternative.

## Week 2 Lecture 3 - Example 1: market model of stock returns

The monthly rate of return on a stock ($R$) is linearly related to the monthly return on the overall stock market ($R_m$):

$$R = \beta_0 + \beta_1 R_m + \varepsilon$$

$R_m$ is taken to be the monthly rate of return on some major stock market index.

RECALL: $\beta_1 > 1$ indicates that the stock's rate of return is more sensitive to the overall market than average.

Figure: Scatter plot of Host International returns (y-axis) versus overall market returns (x-axis) with fitted regression line.

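Fitted line is $\hat{R} = 0.14 + 1.60 R_m$, with $\hat{\sigma} = 9.27$ and $S_{xx} = 1117.90$.

RECALL: a $100(1-\alpha)$ percent confidence interval for $\beta_1$ is

$$\left( b_1 - t_{\alpha/2; n-2} \frac{\hat{\sigma}}{\sqrt{S_{xx}}},\; b_1 + t_{\alpha/2; n-2} \frac{\hat{\sigma}}{\sqrt{S_{xx}}} \right).$$

95% confidence interval for $\beta_1$ (with $n - 2 = 58$ degrees of freedom, $t_{0.025; 58} = 2.002$):

$$\left( 1.60 - 2.002 \frac{9.27}{\sqrt{1117.90}},\; 1.60 + 2.002 \frac{9.27}{\sqrt{1117.90}} \right) = (1.04, 2.16),$$

which doesn't contain 1.

A value of 1 for the slope does not seem plausible based on the data.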
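The interval arithmetic can be reproduced in a few lines. In this sketch the function name `slope_confidence_interval` is hypothetical; the inputs are the values quoted in the example, with the critical value 2.002 taken as given rather than computed.

```python
import math


def slope_confidence_interval(b1, sigma_hat, s_xx, t_crit):
    """Confidence interval b1 -/+ t_crit * sigma_hat / sqrt(S_xx) for the slope."""
    half_width = t_crit * sigma_hat / math.sqrt(s_xx)
    return b1 - half_width, b1 + half_width


# Values quoted in the example; t_crit is the tabulated upper 2.5% point.
lower, upper = slope_confidence_interval(
    b1=1.60, sigma_hat=9.27, s_xx=1117.90, t_crit=2.002
)
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
print("Contains 1:", lower <= 1 <= upper)
```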

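Hypothesis testing equivalent:

$$H_0: \beta_1 = 1 \quad \text{versus} \quad H_1: \beta_1 \neq 1$$

Test statistic:

$$\frac{b_1 - 1}{\hat{\sigma} / \sqrt{S_{xx}}} = \frac{1.60 - 1}{9.27 / \sqrt{1117.90}} = 2.16.$$

So if $T \sim t_{58}$, the p-value for the test is

$$p = \Pr(|T| \geq 2.16) = 2 \Pr(T \geq 2.16) = 0.0349.$$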
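The test statistic and p-value can be approximated with the standard library alone. The sketch below integrates the t density numerically with Simpson's rule; the helper names `t_density` and `t_upper_tail` are hypothetical, and in practice a statistics package or tables would normally supply the tail probability.

```python
import math


def t_density(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + t * t / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate Pr(T >= t) for t >= 0 using composite Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    # By symmetry Pr(T >= 0) = 0.5, so subtract the area between 0 and t.
    return 0.5 - total * h / 3


# Values quoted in the example.
b1, sigma_hat, s_xx, df = 1.60, 9.27, 1117.90, 58

t_stat = (b1 - 1) / (sigma_hat / math.sqrt(s_xx))
p_value = 2 * t_upper_tail(abs(t_stat), df)
print(f"T = {t_stat:.2f}, two-sided p-value = {p_value:.4f}")
```

Because the unrounded statistic is used here, the printed p-value may differ slightly in the last digits from the value obtained with $T$ rounded to 2.16.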

## Week 2 Lecture 3 - Example 2: Risk assessment from financial reports

DATA Collection:

Want company financial reports to provide information helpful for assessing risk.

Seven accounting-determined measures of risk (available from a company's financial reports):

Dividend payout, current ratio, asset size, asset growth, leverage, variability in earnings, covariability in earnings.

annual reports.

Experiment

Data were sent to a random sample of 500 financial analysts, of whom 209 responded.

The mean rating assigned by the 209 analysts was recorded for each of the 25 stocks.

The mean rating by analysts is taken as a reasonable surrogate of market risk for each stock.

AIM: Want to predict market risk from accounting measures: the response is market risk, the predictors are accounting measures (multiple).

Figure: Estimated market risk (y-axis) versus log(asset size) (x-axis). Fitted line is $\hat{y} = 8.143 - 0.412x$ and $R^2 = 0.21$.

## Week 2 Lecture 3 - Prediction in the simple linear regression model

The reason for building a simple linear regression model is often to predict a new response value when the value of the predictor is known.

Example: risk assessment data

Riskiness of a stock rated by 209 financial analysts (response).

Predictors are various accounting-determined measures of risk.

Aim: Simple linear regression model for risk with asset size as predictor.

Outcome: For a company not assessed by the financial analysts we can determine asset size from company reports and predict risk using the fitted model.

## Week 2 Lecture 3 - Confidence intervals for the mean and prediction intervals

Prediction of a new response value when the predictor is $x_0$:

$$\hat{y}(x_0) = b_0 + b_1 x_0.$$

True conditional mean of the response at $x_0$:

$$\beta_0 + \beta_1 x_0$$

New response value $y_0$ when the predictor is $x_0$: write

$$y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0,$$

with $\varepsilon_0$ independent of previous responses, normal with mean zero and variance $\sigma^2$.

Want to find a confidence interval for the conditional mean, and an interval which covers $y_0$ with specified confidence (prediction interval).

The confidence interval for the conditional mean will reflect our uncertainty due to estimating $\beta_0$, $\beta_1$.

The prediction interval will reflect our uncertainty due to estimating $\beta_0$, $\beta_1$, and the level of variation of the responses about the conditional mean (captured by our estimate of $\sigma^2$).

First we'll define a statistic which can be used for constructing a confidence interval for the conditional mean at $x_0$.

Consider $\hat{y}(x_0) = b_0 + b_1 x_0$.

$\hat{y}(x_0)$ is a Gaussian random variable.

$$E(\hat{y}(x_0)) = E(b_0 + b_1 x_0) = \beta_0 + \beta_1 x_0.$$

$$\mathrm{Var}(\hat{y}(x_0)) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)$$

## Week 2 Lecture 3 - Confidence interval for the mean

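$$
\begin{aligned}
\mathrm{Var}(\hat{y}(x_0)) &= \mathrm{Var}(b_0 + b_1 x_0) \\
&= \mathrm{Var}(b_0) + x_0^2 \, \mathrm{Var}(b_1) + 2 x_0 \, \mathrm{Cov}(b_0, b_1) \\
&= \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) + x_0^2 \frac{\sigma^2}{S_{xx}} - 2 x_0 \frac{\bar{x} \sigma^2}{S_{xx}} \\
&= \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).
\end{aligned}
\tag{1}
$$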
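A small simulation can make the variance formula concrete. The sketch below repeatedly generates responses from a known line, refits the model, and compares the empirical variance of $\hat{y}(x_0)$ with $\sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)$. All parameter values and the helper name `fitted_value_at` are assumptions chosen for illustration.

```python
import random
import statistics


def fitted_value_at(x, y, x0):
    """Fit y = b0 + b1 * x by least squares and return b0 + b1 * x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0 + b1 * x0


random.seed(0)

# Hypothetical design points and true parameters (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
beta0, beta1, sigma = 2.0, 0.5, 1.5
x0 = 10.0

# Simulate many data sets and record the fitted value at x0 each time.
draws = []
for _ in range(20000):
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    draws.append(fitted_value_at(x, y, x0))

n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
theoretical = sigma ** 2 * (1 / n + (x0 - x_bar) ** 2 / s_xx)
empirical = statistics.variance(draws)

print("theoretical variance:", theoretical)
print("empirical variance:  ", empirical)
```

The two variances should be close but not identical, since the empirical value carries simulation error.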

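$\hat{y}(x_0)$ is normally distributed, so

$$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\sigma \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim N(0, 1).$$

As in previous lectures,

$$\frac{(n-2)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$$

independently of $\hat{y}(x_0)$ (since $\hat{y}(x_0)$ is a linear combination of $b_0$, $b_1$, both independent of $\hat{\sigma}^2$).

We have

$$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim t_{n-2}.$$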

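Write $t_{\alpha/2, n-2}$ for the upper $100 \cdot \alpha/2$ percentage point of the t distribution with $n - 2$ degrees of freedom. Then

$$\Pr\left( -t_{\alpha/2, n-2} \leq \frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \leq t_{\alpha/2, n-2} \right) = 1 - \alpha,$$

so a $100(1-\alpha)$ percent confidence interval for the conditional mean $\beta_0 + \beta_1 x_0$ is

$$\hat{y}(x_0) \pm t_{\alpha/2, n-2} \, \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.$$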
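The interval for the conditional mean can be sketched in code as follows. The function name `mean_confidence_interval` and the data are hypothetical, and the critical value is passed in as a number read from t tables rather than computed.

```python
import math


def mean_confidence_interval(x, y, x0, t_crit):
    """Confidence interval for the conditional mean beta0 + beta1 * x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # Estimate sigma from the residual sum of squares on n - 2 degrees of freedom.
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ss_res / (n - 2))

    y_hat0 = b0 + b1 * x0
    half_width = t_crit * sigma_hat * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat0 - half_width, y_hat0 + half_width


# Made-up illustration data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.9, 4.1, 5.2, 5.8]

# Upper 2.5% point of the t distribution with n - 2 = 4 degrees of freedom.
t_crit = 2.776

near = mean_confidence_interval(x, y, x0=3.5, t_crit=t_crit)
far = mean_confidence_interval(x, y, x0=10.0, t_crit=t_crit)
print("95% CI at x0 = 3.5: ", near)
print("95% CI at x0 = 10.0:", far)
```

The interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from the centre of the observed predictor values, which follows from the $(x_0 - \bar{x})^2$ term.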

Be able to use the ANOVA table to answer questions regarding suitability of a postulated statistical model.

the mean.