Original Title: MATH2931 Lecture 6

August 2, 2013

ANOVA.

Prediction intervals.

independent variable to be the true model, i.e. presumption that $\mu_{y|x}$ is related to $x$ linearly.

in which there are several independent variables, each

affecting response.

in which presumption that $\mu_{y|x}$ is related to $x$ linearly is false in range of variables considered.

regression line.

x, i.e.

$$\mu_{y|x} = \beta_0 + \beta_1 x_1 + \beta_2 x_2,$$

then standard least squares estimate derived so far

$$b_1 = \frac{S_{xy}}{S_{xx}}$$

for $\beta_1$.

regression line !

handled by an ANOVA approach.

variable as subdivided into meaningful components which are

then observed and treated in a systematic fashion.

(recall Week 1 Lecture 3 - partitioning of variance.)

Fundamental identity:

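$$SS_{\mathrm{total}} = SS_{\mathrm{reg}} + SS_{\mathrm{res}}$$

Regression sum of squares (variation explained by the fit)

$$SS_{\mathrm{reg}} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$

$$SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$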
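The identity above can be checked numerically. The following is a minimal sketch in Python, assuming the usual definition $SS_{\mathrm{total}} = \sum (y_i - \bar{y})^2$ and using a small made-up data set; the data and the function name `sums_of_squares` are illustrative assumptions, not part of the lecture.

```python
# Hypothetical toy data, chosen only for illustration (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def sums_of_squares(x, y):
    """Return (ss_total, ss_reg, ss_res) for a simple linear regression fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx               # least squares slope
    b0 = y_bar - b1 * x_bar        # least squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_reg, ss_res


ss_total, ss_reg, ss_res = sums_of_squares(x, y)
print(ss_total, ss_reg + ss_res)  # the two numbers agree up to rounding error
```

Any data set with at least two distinct x values should give the same agreement, since the identity is algebraic rather than data dependent.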

Question: Why is testing $H_0: \beta_1 = 0$ of particular interest?

Answer: It helps answer the question of whether the predictor is useful for explaining the response.

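t-test: the statistic for testing $H_0: \beta_1 = 0$ was

$$T = \frac{b_1}{\hat{\sigma}/\sqrt{S_{xx}}},$$

which has a $t$ distribution with $n-2$ degrees of freedom under $H_0$.

From this one can derive that under $H_0$

$$F = T^2 = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2}$$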

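Theorem: One can express the regression sum of squares $SS_{\mathrm{reg}}$ as a function of the least squares estimate $b_1$ as follows:

$$SS_{\mathrm{reg}} = b_1^2 S_{xx}$$

With $SS_{\mathrm{reg}} = b_1^2 S_{xx}$ and $\hat{\sigma}^2 = SS_{\mathrm{res}}/(n-2)$, the above statistic, under $H_0$,

$$F = \frac{b_1^2 S_{xx}}{\hat{\sigma}^2}$$

can be written as

$$F = \frac{SS_{\mathrm{reg}}/1}{SS_{\mathrm{res}}/(n-2)}$$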

variation).

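Proof:

$$\begin{aligned}
SS_{\mathrm{reg}} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (b_0 + b_1 x_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (\bar{y} - b_1 \bar{x} + b_1 x_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} b_1^2 (x_i - \bar{x})^2 \\
&= b_1^2 S_{xx}.
\end{aligned}$$

So

$$F = \frac{SS_{\mathrm{reg}}/1}{SS_{\mathrm{res}}/(n-2)}$$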
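The two facts just established, $SS_{\mathrm{reg}} = b_1^2 S_{xx}$ and $F = T^2$, can be confirmed numerically. This is a minimal sketch in Python on a made-up data set; the data and all variable names are illustrative assumptions, not taken from the lecture.

```python
import math

# Hypothetical toy data, for illustration only (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.4, 5.1, 8.3, 9.0, 12.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sigma2_hat = ss_res / (n - 2)          # estimate of the error variance

# t statistic for H0: beta1 = 0 and the ANOVA F statistic.
t_stat = b1 / math.sqrt(sigma2_hat / s_xx)
f_stat = (ss_reg / 1) / (ss_res / (n - 2))

print(t_stat ** 2, f_stat)             # equal up to rounding error
print(ss_reg, b1 ** 2 * s_xx)          # equal up to rounding error
```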

With $F$ as test statistic we obtain a test of

$$H_0: \beta_1 = 0$$

against

$$H_1: \beta_1 \neq 0$$

at significance level $\alpha$ by using the critical region

$$F > F_{\alpha;1,n-2}.$$

Computation of p-value:

$$p = \Pr(F \geq f \mid \beta_1 = 0)$$

where $f$ is the observed value of $F$ (given $\beta_1 = 0$, $F \sim F_{1,n-2}$).

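Source       Sum of Squares         Degrees of freedom   Mean Square                                                                F
Regression   $SS_{\mathrm{reg}}$    $1$                  $MS_{\mathrm{reg}} = SS_{\mathrm{reg}}/1$                                  $MS_{\mathrm{reg}}/\hat{\sigma}^2$
Residual     $SS_{\mathrm{res}}$    $n-2$                $MS_{\mathrm{res}} = SS_{\mathrm{res}}/(n-2) = \hat{\sigma}^2$
Total        $SS_{\mathrm{total}}$  $n-1$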

exceeds a critical value $f_\alpha(1, n-2)$ the conclusion is:

accounted for by the postulated model (simple linear regression)

NOTE: the t-test allows for testing both two-sided and one-sided alternative hypotheses, whereas the ANOVA F-test is restricted to testing against the two-sided alternative.
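The ANOVA table for simple linear regression can be assembled directly from the two sums of squares and the sample size. The sketch below is one possible way to do this in Python; the function name `anova_table` and the numerical inputs are hypothetical and serve only to illustrate the layout of the table.

```python
def anova_table(ss_reg, ss_res, n):
    """Rows of the simple linear regression ANOVA table as dictionaries."""
    ms_reg = ss_reg / 1
    ms_res = ss_res / (n - 2)      # this is the estimate of sigma^2
    return [
        {"source": "Regression", "ss": ss_reg, "df": 1,
         "ms": ms_reg, "f": ms_reg / ms_res},
        {"source": "Residual", "ss": ss_res, "df": n - 2,
         "ms": ms_res, "f": None},
        {"source": "Total", "ss": ss_reg + ss_res, "df": n - 1,
         "ms": None, "f": None},
    ]


# Hypothetical sums of squares for n = 20 observations (illustration only).
table = anova_table(ss_reg=120.0, ss_res=90.0, n=20)
for row in table:
    print(row)
```

The F value in the first row would then be compared with the critical value of the F distribution with 1 and n - 2 degrees of freedom, which has to come from tables or a statistics library.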

returns

Monthly rate of return on a stock ($R$) is linearly related to monthly return on the overall stock market ($R_m$).

$$R = \beta_0 + \beta_1 R_m + \varepsilon$$

$R_m$ is taken to be the monthly rate of return on some major stock market index

RECALL:

overall market than average


returns (x-axis) with fitted regression line.


Fitted line is $\hat{R} = 0.14 + 1.60 R_m$

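RECALL: the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is

$$\left( b_1 - t_{\alpha/2;n-2} \frac{\hat{\sigma}}{\sqrt{S_{xx}}},\; b_1 + t_{\alpha/2;n-2} \frac{\hat{\sigma}}{\sqrt{S_{xx}}} \right).$$

95% confidence interval for $\beta_1$:

$$\left( 1.60 - 2.002 \cdot \frac{9.27}{\sqrt{1117.90}},\; 1.60 + 2.002 \cdot \frac{9.27}{\sqrt{1117.90}} \right) = (1.04, 2.16),$$

which doesn't contain 1.

A value of 1 for the slope does not seem plausible based on the data.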
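The interval arithmetic above can be reproduced in a few lines. This is a minimal sketch in Python using the numbers quoted in the lecture, reading 9.27 as the estimate of $\sigma$ and 2.002 as the critical value $t_{0.025;58}$; the function name `slope_confidence_interval` is an illustrative assumption.

```python
import math


def slope_confidence_interval(b1, sigma_hat, s_xx, t_crit):
    """Confidence interval for the slope: b1 -/+ t_crit * sigma_hat / sqrt(S_xx)."""
    half_width = t_crit * sigma_hat / math.sqrt(s_xx)
    return b1 - half_width, b1 + half_width


# Values quoted in the lecture for the stock returns example.
lower, upper = slope_confidence_interval(
    b1=1.60, sigma_hat=9.27, s_xx=1117.90, t_crit=2.002
)
print(round(lower, 2), round(upper, 2))
print(lower <= 1.0 <= upper)  # is a slope of 1 inside the interval?
```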


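Hypothesis testing equivalent:

$$H_0: \beta_1 = 1 \quad \text{versus} \quad H_1: \beta_1 \neq 1$$

Test statistic:

$$T = \frac{b_1 - 1}{\hat{\sigma}/\sqrt{S_{xx}}} = \frac{1.60 - 1}{9.27/\sqrt{1117.90}} = 2.16.$$

So if $T \sim t_{58}$, the p-value for the test is

$$p = \Pr(|T| \geq 2.16) = 2\Pr(T \geq 2.16) = 0.0349$$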
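The test statistic and an approximate p-value can be recomputed without statistical tables. The sketch below, in Python with only the standard library, integrates the $t_{58}$ density numerically with Simpson's rule; this numerical approach and the helper names `t_pdf` and `two_sided_p_value` are assumptions made for illustration, not the method used in the lecture. In practice a statistics library would normally supply the t distribution function.

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_obs, df, steps=2000):
    """Approximate 2 * Pr(T >= |t_obs|) using Simpson's rule on [0, |t_obs|]."""
    b = abs(t_obs)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3       # Pr(0 <= T <= |t_obs|)
    return 2 * (0.5 - central_area)


# Values quoted in the lecture: b1 = 1.60, sigma_hat = 9.27, S_xx = 1117.90.
t_stat = (1.60 - 1) / (9.27 / math.sqrt(1117.90))
p_value = two_sided_p_value(t_stat, df=58)
print(round(t_stat, 2), p_value)
```

Because the unrounded statistic is used rather than 2.16, the printed p-value may differ slightly in the last digits from the value quoted above.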

financial reports

DATA Collection:

for assessing risk.

a company's financial reports)

leverage, variability in earnings, covariability in earnings.

annual reports.


Experiment

Data sent to a random sample of 500 financial analysts of which

209 responded

Mean rating assigned by the 209 analysts recorded for each of the

25 stocks

market risk for each stock

response is market risk, predictors are accounting measures

(multiple).

Fitted line is $\hat{y} = 8.143 - 0.412x$ and $R^2 = 0.21$

regression model

Reason for building a simple linear regression model is often to

predict a new response value when the value of the predictor is

known.

Example: risk assessment data

Predictors are various accounting determined measures of risk.

Aim: Simple linear regression model for risk with asset size as

predictor.

analysts we can determine asset size from company reports

and predict risk using the fitted model.

prediction intervals

Prediction of a new response value when predictor is $x_0$:

$$\hat{y}(x_0) = b_0 + b_1 x_0.$$

True conditional mean of response at $x_0$:

$$\beta_0 + \beta_1 x_0$$

New response value $y_0$ when predictor is $x_0$: write

$$y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0,$$

$\varepsilon_0$ independent of previous responses, normal with mean zero, variance $\sigma^2$.

Want to find confidence interval for conditional mean, and interval

which covers y0 with specified confidence (prediction interval).

uncertainty due to estimating $\beta_0$, $\beta_1$.

estimating $\beta_0$, $\beta_1$, and the level of variation of the responses about the conditional mean (captured by our estimate of $\sigma^2$).

confidence interval for the conditional mean at $x_0$.

Consider $\hat{y}(x_0) = b_0 + b_1 x_0$.

$$E(\hat{y}(x_0)) = E(b_0 + b_1 x_0) = \beta_0 + \beta_1 x_0.$$

$$\mathrm{Var}(\hat{y}(x_0)) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)$$

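$$\begin{aligned}
\mathrm{Var}(\hat{y}(x_0)) &= \mathrm{Var}(b_0 + b_1 x_0) \\
&= \mathrm{Var}(b_0) + x_0^2 \mathrm{Var}(b_1) + 2 x_0 \mathrm{Cov}(b_0, b_1) \\
&= \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) + x_0^2 \frac{\sigma^2}{S_{xx}} - 2 x_0 \frac{\bar{x} \sigma^2}{S_{xx}} \\
&= \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right).
\end{aligned} \tag{1}$$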
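The variance formula for the fitted value at $x_0$ can be sanity-checked by simulation. The following is a rough sketch in Python: it repeatedly generates responses from a linear model with normal errors, refits the line, and compares the sample variance of the fitted value at $x_0$ with the formula. The design points, the true parameter values, the seed and the number of replications are all hypothetical choices made for illustration.

```python
import random

random.seed(0)

# Hypothetical fixed design and true parameters (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
beta0, beta1, sigma = 1.0, 2.0, 1.5
x0 = 6.0

n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)


def fitted_value_at_x0():
    """Simulate one data set, fit the line, and return b0 + b1 * x0."""
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    return b0 + b1 * x0


draws = [fitted_value_at_x0() for _ in range(20000)]
mean_draw = sum(draws) / len(draws)
simulated_var = sum((d - mean_draw) ** 2 for d in draws) / (len(draws) - 1)
theoretical_var = sigma ** 2 * (1 / n + (x0 - x_bar) ** 2 / s_xx)

print(simulated_var, theoretical_var)  # should be close, not identical
```

The agreement is only approximate because the simulated variance carries Monte Carlo error; increasing the number of replications should bring the two values closer.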

$$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\sigma \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim N(0, 1).$$

As in previous lectures

$$\frac{(n-2)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$$

independently of $\hat{y}(x_0)$ (since $\hat{y}(x_0)$ is a linear combination of $b_0$, $b_1$, both independent of $\hat{\sigma}^2$).

We have

$$\frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim t_{n-2}.$$

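distribution with $n-2$ degrees of freedom,

$$\Pr\left( -t_{\alpha/2,n-2} \leq \frac{\hat{y}(x_0) - \beta_0 - \beta_1 x_0}{\hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}} \leq t_{\alpha/2,n-2} \right) = 1 - \alpha$$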

$$\hat{y}(x_0) \pm t_{\alpha/2,n-2}\, \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
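The confidence interval for the conditional mean can be computed as follows. This is a minimal sketch in Python on a made-up data set; the data and the function name `mean_response_ci` are illustrative assumptions. The critical value is passed in by hand, here $t_{0.025;3} = 3.182$ for $n = 5$ observations, since the standard library has no t distribution.

```python
import math


def mean_response_ci(x, y, x0, t_crit):
    """Return (y_hat(x0), lower, upper) for the conditional mean at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ss_res / (n - 2))
    y_hat0 = b0 + b1 * x0
    half_width = t_crit * sigma_hat * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat0, y_hat0 - half_width, y_hat0 + half_width


# Hypothetical toy data, for illustration only (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.3, 3.6, 6.4, 7.5, 10.2]

centre = mean_response_ci(x, y, x0=3.0, t_crit=3.182)  # at the mean of x
far = mean_response_ci(x, y, x0=6.0, t_crit=3.182)     # outside the data range
print(centre)
print(far)
```

Comparing the two results shows the effect of the $(x_0 - \bar{x})^2$ term: the interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from it.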

regarding suitability of a postulated statistical model.

the mean.

