
MATH2831/2931

Linear Models/ Higher Linear Models.

August 2, 2013

Week 2 Lecture 3 - Last lecture:

Review of Hypothesis testing two-sided and one-sided tests.

Hypothesis testing for $\beta_0$ and $\beta_1$.

Example 1: zinc concentrations in plants.

Example 2: sales and advertising data.

Week 2 Lecture 3 - This lecture:

The analysis of variance table

Inference on $\beta_1$: some further examples on testing and ANOVA.

Confidence intervals for the mean.

Prediction intervals.

Week 2 Lecture 3 - The ANOVA table

So far we have considered the model with a single independent variable to be the true model, i.e. we have presumed that $\mu_{y|x}$ is related to $x$ linearly.

NOTE 1: Prediction of the response will be poor in situations in which there are several independent variables, each affecting the response.

NOTE 2: Prediction of the response will be poor in situations in which the presumption that $\mu_{y|x}$ is related to $x$ linearly is false over the range of variables considered.

Here we use ANOVA to analyze the quality of the estimated regression line.


If the true unknown model is linear in more than one variable $x$, i.e.
$\mu_{y|x} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$,
then the standard least squares estimate derived so far,
$b_1 = \frac{S_{xy}}{S_{xx}}$,
which is calculated considering only $x_1$, is a biased estimate of $\beta_1$.

The bias will be a function of the additional coefficient $\beta_2$.

We need a methodology to assess the quality of our regression line!
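A small numerical sketch of the bias claim. It relies on the standard result $E(b_1) = \beta_1 + \beta_2 S_{x_1 x_2} / S_{x_1 x_1}$, which is not derived on these slides and is taken here as an assumption. The data and coefficient values are hypothetical, and the responses are generated without noise so that $b_1$ coincides with its expected value.

```python
# Hypothetical setting: the true mean is linear in BOTH x1 and x2.
beta0, beta1, beta2 = 1.0, 2.0, 3.0          # assumed coefficients
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]          # correlated with x1

# Noise-free responses, so b1 equals its expected value exactly.
y = [beta0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]


def centred_cross_product(u, v):
    """Return S_uv = sum((u_i - u_bar) * (v_i - v_bar))."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


s11 = centred_cross_product(x1, x1)
s12 = centred_cross_product(x1, x2)
s1y = centred_cross_product(x1, y)

b1 = s1y / s11                 # slope from regressing y on x1 only
bias = beta2 * s12 / s11       # bias term predicted by the assumed formula

print(f"b1 = {b1:.4f}, beta1 = {beta1}, predicted bias = {bias:.4f}")
```

In this made-up example the one-variable slope overshoots $\beta_1$ by exactly the predicted bias term, which vanishes only when $\beta_2 = 0$ or when $x_1$ and $x_2$ are uncorrelated in the sample.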


Analysis of the quality of an estimated regression line can be handled by an ANOVA approach.

ANOVA procedure: considers the total variation in the dependent variable as subdivided into meaningful components, which are then observed and treated in a systematic fashion (recall Week 1 Lecture 3, partitioning of variance).

Fundamental identity:
$SS_{\mathrm{total}} = SS_{\mathrm{reg}} + SS_{\mathrm{res}}$

Regression sum of squares (variation explained by the fit):
$SS_{\mathrm{reg}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$.

Residual sum of squares (variation unexplained by the fit):
$SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
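The fundamental identity can be checked numerically. The sketch below assumes the usual definition $SS_{\mathrm{total}} = \sum_i (y_i - \bar{y})^2$ from the partitioning of variance; the function names and the data are hypothetical, chosen only for illustration.

```python
def fit_line(x, y):
    """Least squares intercept b0 and slope b1 = Sxy / Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(x, y):
    """Return (SS_total, SS_reg, SS_res) for the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_reg, ss_res


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ss_total, ss_reg, ss_res = sums_of_squares(x, y)
print(f"SS_total = {ss_total:.4f}, SS_reg + SS_res = {ss_reg + ss_res:.4f}")
```

The two printed quantities agree up to floating point rounding, as the identity requires for a least squares fit with an intercept.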



Question: Why is testing $H_0: \beta_1 = 0$ of particular interest?
Answer: It helps answer the question of whether the predictor is useful for explaining the response.
t-test: the statistic for testing $H_0: \beta_1 = 0$ was
$T = \frac{b_1}{\hat{\sigma} / \sqrt{S_{xx}}}$.

This statistic ($T$) has a t-distribution with $n-2$ degrees of freedom under $H_0$.

From our distribution results (Week 2 Lecture 1) we can also derive that, under $H_0$,
$F = T^2 = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2}$
has an F distribution with 1 and $n-2$ degrees of freedom.



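Theorem: One can express the regression sum of squares $SS_{\mathrm{reg}}$ as a function of the least squares estimate $b_1$ as follows:
$SS_{\mathrm{reg}} = b_1^2 S_{xx}$
With $SS_{\mathrm{reg}} = b_1^2 S_{xx}$ and $\hat{\sigma}^2 = SS_{\mathrm{res}}/(n-2)$, the above statistic
$F = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2}$
can be written as
$F = \frac{SS_{\mathrm{reg}}/1}{SS_{\mathrm{res}}/(n-2)}$

(the ratio of the variation explained by the model to the scaled residual variation).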



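Proof:
$SS_{\mathrm{reg}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$= \sum_{i=1}^{n} (b_0 + b_1 x_i - \bar{y})^2$
$= \sum_{i=1}^{n} (\bar{y} - b_1 \bar{x} + b_1 x_i - \bar{y})^2$
$= \sum_{i=1}^{n} b_1^2 (x_i - \bar{x})^2$
$= b_1^2 S_{xx}$.

So
$F = \frac{SS_{\mathrm{reg}}/1}{SS_{\mathrm{res}}/(n-2)}$
has an $F_{1,n-2}$ distribution under $H_0: \beta_1 = 0$, as claimed.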
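The theorem and the relation $F = T^2$ can be verified numerically. This is only a sketch on hypothetical data; the variable names are illustrative and not part of the lecture.

```python
import math

# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx                       # least squares slope
b0 = y_bar - b1 * x_bar              # least squares intercept
fitted = [b0 + b1 * xi for xi in x]

ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sigma_hat_sq = ss_res / (n - 2)      # estimate of the error variance

# t statistic for H0: beta1 = 0 and the ANOVA F statistic.
t_stat = b1 / math.sqrt(sigma_hat_sq / sxx)
f_stat = (ss_reg / 1) / (ss_res / (n - 2))

print(f"SS_reg = {ss_reg:.4f}, b1^2 * Sxx = {b1 ** 2 * sxx:.4f}")
print(f"F = {f_stat:.4f}, T^2 = {t_stat ** 2:.4f}")
```

Both pairs of printed values agree up to rounding, which is what the proof above establishes in general.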



With $F$ as test statistic we obtain a test of
$H_0: \beta_1 = 0$
against
$H_1: \beta_1 \neq 0$
at significance level $\alpha$ by using the critical region
$F > F_{\alpha;1,n-2}$.
Computation of the p-value:
$p = \Pr(F \geq f \mid \beta_1 = 0)$
where $f$ is the observed value of $F$ (given $\beta_1 = 0$, $F \sim F_{1,n-2}$).


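Source       Sum of Squares          Degrees of freedom   Mean Square                                                        F
Regression   $SS_{\mathrm{reg}}$     1                    $MS_{\mathrm{reg}} = SS_{\mathrm{reg}}/1$                          $MS_{\mathrm{reg}}/\hat{\sigma}^2$
Residual     $SS_{\mathrm{res}}$     $n-2$                $MS_{\mathrm{res}} = SS_{\mathrm{res}}/(n-2) = \hat{\sigma}^2$
Total        $SS_{\mathrm{total}}$   $n-1$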
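The entries of the ANOVA table can be assembled directly from the sums of squares. The following is a minimal sketch; the function `anova_table` and the data are hypothetical, and the total row is filled in using the fundamental identity rather than computed separately.

```python
def anova_table(x, y):
    """Return ANOVA rows (source, SS, df, MS) and the F statistic."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_reg = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ms_reg = ss_reg / 1
    ms_res = ss_res / (n - 2)        # this is sigma_hat squared
    rows = [
        ("Regression", ss_reg, 1, ms_reg),
        ("Residual", ss_res, n - 2, ms_res),
        # Total row uses SS_total = SS_reg + SS_res.
        ("Total", ss_reg + ss_res, n - 1, None),
    ]
    return rows, ms_reg / ms_res


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

rows, f_stat = anova_table(x, y)
for source, ss, df, ms in rows:
    ms_text = "" if ms is None else f"{ms:.4f}"
    print(f"{source:<12}{ss:>10.4f}{df:>5}{ms_text:>12}")
print(f"F = {f_stat:.2f}")
```

The degrees of freedom add up in the same way as the sums of squares: $1 + (n-2) = n-1$.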


When the null hypothesis is rejected, i.e. the computed F-statistic exceeds the critical value $f_{\alpha}(1, n-2)$, the conclusion is:

there is a significant amount of variation in the response accounted for by the postulated model (simple linear regression).

NOTE: the t-test allows for testing against both two-sided and one-sided alternative hypotheses, whereas the ANOVA F-test is restricted to testing against the two-sided alternative.

Week 2 Lecture 3 - Example 1: market model of stock returns

The monthly rate of return on a stock ($R$) is linearly related to the monthly return on the overall stock market ($R_m$):
$R = \beta_0 + \beta_1 R_m + \varepsilon$
$R_m$ is taken to be the monthly rate of return on some major stock market index.
RECALL:

The coefficient $\beta_1$ is called the beta coefficient of the stock.

$\beta_1 > 1$ indicates that the stock's rate of return is more sensitive to the overall market than average.

$\beta_1 < 1$ indicates that it is less sensitive than average.

Question: what is the estimate of $\beta_1$, and is it significantly different from 1?


Figure: scatter plot of Host International returns (y-axis) versus overall market returns (x-axis), with fitted regression line.

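Fitted line is $\hat{R} = 0.14 + 1.60 R_m$

$\hat{\sigma} = 9.27$, $S_{xx} = 1117.90$.

RECALL: the $100(1-\alpha)$ percentage confidence interval for $\beta_1$ is
$\left( b_1 - t_{\alpha/2;n-2} \frac{\hat{\sigma}}{\sqrt{S_{xx}}},\; b_1 + t_{\alpha/2;n-2} \frac{\hat{\sigma}}{\sqrt{S_{xx}}} \right)$.
95% confidence interval for $\beta_1$:
$\left( 1.60 - 2.002 \frac{9.27}{\sqrt{1117.90}},\; 1.60 + 2.002 \frac{9.27}{\sqrt{1117.90}} \right) = (1.04, 2.16)$,
which doesn't contain 1.
A value of 1 for the slope does not seem plausible based on the data.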
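The interval above can be reproduced with a few lines of arithmetic. This sketch plugs in the values quoted on the slide (slope 1.60, residual standard error 9.27, $S_{xx} = 1117.90$, critical value 2.002); the function name is hypothetical.

```python
import math


def slope_confidence_interval(b1, sigma_hat, sxx, t_crit):
    """Return (lower, upper) = b1 -/+ t_crit * sigma_hat / sqrt(Sxx)."""
    half_width = t_crit * sigma_hat / math.sqrt(sxx)
    return b1 - half_width, b1 + half_width


# Values quoted on the slide for the Host International example.
lower, upper = slope_confidence_interval(
    b1=1.60, sigma_hat=9.27, sxx=1117.90, t_crit=2.002
)
print(f"({lower:.2f}, {upper:.2f})")  # prints (1.04, 2.16)
```

Since the lower end of the interval is above 1, the value $\beta_1 = 1$ is excluded at the 95% level.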

Hypothesis testing equivalent:
$H_0: \beta_1 = 1$
versus
$H_1: \beta_1 \neq 1$
Test statistic:
$T = \frac{b_1 - 1}{\hat{\sigma} / \sqrt{S_{xx}}} = \frac{1.60 - 1}{9.27 / \sqrt{1117.90}} = 2.16$.
So if $T \sim t_{58}$, the p-value for the test is
$p = \Pr(|T| \geq 2.16) = 2 \Pr(T \geq 2.16) = 0.0349$,
so that we reject $H_0: \beta_1 = 1$ at the 5% level.
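The test statistic and p-value can be checked without statistical tables. The sketch below integrates the $t_{58}$ density numerically with Simpson's rule; this is an illustrative approach chosen to avoid external libraries, not the method used in the lecture, and the function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p_value(t_obs, df, steps=2000):
    """Pr(|T| >= |t_obs|), via Simpson's rule on [0, |t_obs|]."""
    upper = abs(t_obs)
    h = upper / steps                      # steps must be even
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * t_pdf(k * h, df)
    central_mass = 2 * (h / 3) * total     # Pr(-|t| <= T <= |t|) by symmetry
    return 1 - central_mass


# Values quoted on the slide; df = 58 as stated there.
t_stat = (1.60 - 1) / (9.27 / math.sqrt(1117.90))
p_value = two_sided_p_value(t_stat, df=58)
print(f"T = {t_stat:.2f}, p = {p_value:.4f}")
```

The computed p-value is close to 0.035 and below 0.05, consistent with rejecting $H_0: \beta_1 = 1$ at the 5% level. Small differences from the slide's figure can arise from rounding of the inputs.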

Week 2 Lecture 3 - Example 2: Risk assessment from financial reports

DATA Collection:

Investors are interested in the riskiness of a stock.

They want company financial reports to provide information helpful for assessing risk.

Seven accounting-determined measures of risk are available from a company's financial reports:

Dividend payout, current ratio, asset size, asset growth, leverage, variability in earnings, covariability in earnings.

These were computed for 25 well-known stocks based on annual reports.

Experiment:
The data were sent to a random sample of 500 financial analysts, of whom 209 responded.
The mean rating assigned by the 209 analysts was recorded for each of the 25 stocks.

The mean rating by analysts is taken as a reasonable surrogate for the market risk of each stock.

AIM: predict market risk from the accounting measures. The response is market risk; the predictors are the (multiple) accounting measures.

Week 2 Lecture 3 - Example 2: Risk assessment example

Figure: estimated market risk (y-axis) versus log(asset size) (x-axis).

Fitted line is $\hat{y} = 8.143 - 0.412x$ and $R^2 = 0.21$.

Week 2 Lecture 3 - Prediction in the simple linear regression model

The reason for building a simple linear regression model is often to predict a new response value when the value of the predictor is known.
Example: risk assessment data

Riskiness of a stock rated by 209 financial analysts (response).

Predictors are various accounting-determined measures of risk.

Aim: a simple linear regression model for risk with asset size as predictor.

Outcome: for a company not assessed by the financial analysts, we can determine asset size from company reports and predict risk using the fitted model.

Week 2 Lecture 3 - Confidence intervals for the mean and prediction intervals

Prediction of a new response value when the predictor is $x_0$:
$\hat{y}(x_0) = b_0 + b_1 x_0$.
True conditional mean of the response at $x_0$:
$\beta_0 + \beta_1 x_0$
New response value $y_0$ when the predictor is $x_0$: write
$y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$,
with $\varepsilon_0$ independent of previous responses, normal with mean zero and variance $\sigma^2$.
We want to find a confidence interval for the conditional mean, and an interval which covers $y_0$ with specified confidence (prediction interval).

Week 2 Lecture 3 - Confidence and Prediction Intervals

The confidence interval for the conditional mean will reflect our uncertainty due to estimating $\beta_0$, $\beta_1$.

The prediction interval will reflect our uncertainty due to estimating $\beta_0$, $\beta_1$, and the level of variation of the responses about the conditional mean (captured by our estimate of $\sigma^2$).

First we'll define a statistic which can be used for constructing a confidence interval for the conditional mean at $x_0$.
Consider $\hat{y}(x_0) = b_0 + b_1 x_0$.

$\hat{y}(x_0)$ is a Gaussian random variable.

$E(\hat{y}(x_0)) = E(b_0 + b_1 x_0) = \beta_0 + \beta_1 x_0$.

$\mathrm{Var}(\hat{y}(x_0)) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)$.

Week 2 Lecture 3 - Confidence interval for the mean

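$\mathrm{Var}(\hat{y}(x_0)) = \mathrm{Var}(b_0 + b_1 x_0)$
$= \mathrm{Var}(b_0) + x_0^2 \mathrm{Var}(b_1) + 2 x_0 \mathrm{Cov}(b_0, b_1)$
$= \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) + x_0^2 \frac{\sigma^2}{S_{xx}} - 2 x_0 \frac{\bar{x} \sigma^2}{S_{xx}}$
$= \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)$.   (1)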


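$\hat{y}(x_0)$ is normally distributed, so
$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\sigma \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim N(0, 1)$.

As in previous lectures,
$\frac{(n-2) \hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$
independently of $\hat{y}(x_0)$ (since $\hat{y}(x_0)$ is a linear combination of $b_0$, $b_1$, both independent of $\hat{\sigma}^2$).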



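We have
$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim t_{n-2}$.

Write $t_{\alpha/2,n-2}$ for the upper $100 \times \alpha/2$ percentage point of the t distribution with $n-2$ degrees of freedom. Then
$\Pr\left( -t_{\alpha/2,n-2} \leq \frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \leq t_{\alpha/2,n-2} \right) = 1 - \alpha$.

Confidence interval for $\beta_0 + \beta_1 x_0$:
$\hat{y}(x_0) \pm t_{\alpha/2,n-2} \, \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$.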
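A minimal sketch of how the confidence interval for the mean at $x_0$ might be computed. The function name, the data, and the `predict` flag are hypothetical. The prediction interval branch uses the standard form with an extra 1 under the square root (for the new error $\varepsilon_0$); that formula is not derived in this section and is included here as an assumption. The critical value is supplied by the caller, here 3.182, the tabulated upper 2.5% point of the t distribution with 3 degrees of freedom.

```python
import math


def interval_at_x0(x, y, x0, t_crit, predict=False):
    """Interval at x0: for the conditional mean, or for a new response."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ss_res / (n - 2))

    # Variance factor 1/n + (x0 - x_bar)^2 / Sxx from equation (1).
    var_factor = 1 / n + (x0 - x_bar) ** 2 / sxx
    if predict:
        var_factor += 1    # assumed extra term for the new error epsilon_0

    centre = b0 + b1 * x0
    half_width = t_crit * sigma_hat * math.sqrt(var_factor)
    return centre - half_width, centre + half_width


# Hypothetical data, made up for illustration only (n = 5, so n - 2 = 3).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_crit = 3.182  # upper 2.5% point of t with 3 degrees of freedom

ci = interval_at_x0(x, y, x0=3.0, t_crit=t_crit)
pi = interval_at_x0(x, y, x0=3.0, t_crit=t_crit, predict=True)
print(f"95% CI for the mean at x0 = 3: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% prediction interval at x0 = 3: ({pi[0]:.3f}, {pi[1]:.3f})")
```

Both intervals are centred at $\hat{y}(x_0)$, and both widen as $x_0$ moves away from $\bar{x}$; the prediction interval is always the wider of the two.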

Week 2 Lecture 3 - Learning Expectations.

Understand the quantities within the analysis of variance table.

Be able to use the ANOVA table to answer questions regarding the suitability of a postulated statistical model.

Be able to formulate and evaluate a confidence interval for the mean.

Be able to formulate and evaluate a prediction interval.