STAT 3008: Applied Linear Regression

2014-15 Term 2
Assignment #2
Due: February 25th, 2015 (Wednesday) at 5:00pm
This assignment covers material from Chapters 2 and 3 of the lecture notes.
** Please submit a hardcopy of the R code and R output for Problem 2 and Problem 4.
You need to show your calculations in detail in order to obtain full marks.
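Problem 1: Consider the multiple linear regression:

$$Y = X\beta + e, \quad \text{with } E(e) = 0_{n \times 1} \text{ and } \operatorname{Var}(e) = \sigma^2 I_n,$$

where $Y$ and $e$ are $n \times 1$, $X$ is $n \times (p+1)$, and $\beta$ is $(p+1) \times 1$.

In Section 3.2, we derived that

$$\hat{\beta} = (X'X)^{-1}X'Y, \quad \text{and} \quad E(RSS) = E(\hat{e}'\hat{e}) = E(Y'Y - Y'X(X'X)^{-1}X'Y) = \sigma^2(n - p - 1).$$

(a) Evaluate $E(\hat{Y}'\hat{Y})$ in terms of $X$, $\beta$, $\sigma^2$, $n$ and $p$.
(b) Prove or disprove the following: $E(Y'Y) = E(\hat{Y}'\hat{Y}) + E(\hat{e}'\hat{e})$.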

Problem 2: Consider the dataset htwt.txt from the alr3 library. Suppose that we want to fit a
simple linear regression with Wt as the response and Ht as the predictor.
(a) Draw a scatterplot of the data using the plot function in R.
(b) Obtain the least squares estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\sigma}^2$ using the lm and summary functions.
What are the standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$?
(c) Obtain the ANOVA table of the above regression using the anova function.
Does the ANOVA table suggest independence between the response and the predictor?
(d) Based on the summary function in R, test the hypotheses $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$ at
$\alpha = 0.05$ using the t-test. What is the corresponding p-value?
(e) Repeat part (d) if the hypotheses are now $H_0: \beta_1 = 2.0$ vs $H_1: \beta_1 \neq 2.0$.
(f) Suppose we are interested in the data point (Ht,Wt)=(166.8, 58.2) in the dataset.
Construct a 95% confidence interval of its fitted value $\widehat{Wt}$.
(g) Find a 99% prediction interval for Wt based on a new observation with Ht=166.8.
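
The R functions named in parts (a) to (g) fit together roughly as in the sketch below. It is only an illustration of the workflow: the data frame `d` is simulated (it is not the htwt dataset), and the null value `b = 0.5` and the predictor value `Ht = 170` are hypothetical choices made for this example.

```r
# Simulated stand-in data (an assumption; this is NOT the htwt dataset)
set.seed(1)
d <- data.frame(Ht = runif(20, 150, 185))
d$Wt <- -40 + 0.6 * d$Ht + rnorm(20, sd = 5)

plot(Wt ~ Ht, data = d)        # scatterplot of response against predictor
fit <- lm(Wt ~ Ht, data = d)   # simple linear regression
s   <- summary(fit)
coef(s)                        # estimates, standard errors, t values, p-values
s$sigma^2                      # estimate of sigma^2
anova(fit)                     # ANOVA table

# Two-sided t-test of H0: beta1 = b (b = 0.5 is a hypothetical null value)
b      <- 0.5
t_stat <- (coef(s)["Ht", "Estimate"] - b) / coef(s)["Ht", "Std. Error"]
p_val  <- 2 * pt(-abs(t_stat), df = fit$df.residual)

# Intervals at a chosen predictor value (170 is hypothetical)
new <- data.frame(Ht = 170)
predict(fit, new, interval = "confidence", level = 0.95)
predict(fit, new, interval = "prediction", level = 0.99)
```

The p-value printed by summary only covers the null value 0, which is why the test against a non-zero null value is computed by hand from the estimate and its standard error.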


Problem 3: Consider a multiple linear regression with n=3 and p=0 (i.e. no predictor!):
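$$X = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}, \quad E(Y) = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}, \quad \operatorname{Var}(Y) = \begin{pmatrix} 3 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$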

(a) Compute $(X'X)^{-1}$ and show that $(X'X)^{-1}X'Y$ simplifies into $\bar{y} = (y_1 + y_2 + y_3)/3$.
(b) Compute $H = X(X'X)^{-1}X'$ and express $Y'(I-H)Y$ in terms of $y_1$, $y_2$ and $y_3$. What is the
value of $E(Y'(I-H)Y)$?
(c) Based on the fact that $E(Y'AY) = E[\operatorname{tr}(Y'AY)] = \operatorname{tr}[A\,E(YY')]$, compute the value of
$E(Y'(I-H)Y)$.
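
The fact quoted in part (c) rests on three standard properties of the trace and of expectation. A brief sketch of the reasoning, assuming $A$ is a fixed (non-random) matrix and $Y$ has finite second moments, is:

```latex
\begin{align*}
E(Y'AY) &= E[\operatorname{tr}(Y'AY)]
          && \text{a scalar equals its own trace} \\
        &= E[\operatorname{tr}(AYY')]
          && \operatorname{tr}(BC) = \operatorname{tr}(CB) \\
        &= \operatorname{tr}[A\,E(YY')]
          && \text{trace and expectation are both linear} \\
        &= \operatorname{tr}\{A\,[\operatorname{Var}(Y) + E(Y)E(Y)']\}
          && \operatorname{Var}(Y) = E(YY') - E(Y)E(Y)'
\end{align*}
```

The last line is the practical form: it needs only the mean vector and the covariance matrix of $Y$.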
Problem 4: Let Y = (21, 25, 19, 24, 36, 36, 24, 10), X1 = (3, 9, 4, 3, 7, 9, 4, 1) and
X2 =(3, 9, 4, 3, 7, 9, 4, 2). Suppose we want to model the response Y by X1, X2 and the
intercept using the multiple linear regression.
(a) Based on matrix operations in R (i.e. X%*%Y, t(X), solve(X) on page 21 of Chapter 3),
show that $\hat{\beta}$ = (16.5784, 10.1144, -8.3464), and compute the following quantities:
$\hat{Y}$, $\hat{e}$, SYY, RSS, SSreg, $\hat{\sigma}^2$, $\widehat{\operatorname{Var}}(\hat{\beta})$ and $R^2$

(Note: In R, a command like RSS <- t(y)%*%y-t(y)%*%X%*%solve(t(X)%*%X)%*%t(X)%*%y
will assign RSS as a 1x1 matrix object instead of a numeric object. You may want to use
the command as.numeric(RSS) to bring it back to a scalar quantity.)
(b) The ANOVA table below compares Model 1: $E(Y|X) = \beta_0$ and Model 2: $E(Y|X) = \beta_0 + \beta_1 x_1$:

Source       df   SS       MS       F0       p-value
Regression    1   329.82   329.82   10.523   0.0176
Residual      6   188.05    31.34
Total         7   517.87

Suppose we want to test the hypotheses
$H_0: E(Y|X) = \beta_0 + \beta_1 x_1$ vs $H_1: E(Y|X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Based on the above ANOVA table and the results in part (a), construct the appropriate
ANOVA table. What decision and conclusion can you make from the table?
(c) Consider a new data point (x1, x2)=(-2.5, -3). What is the best point estimator for the
response, and a 95% prediction interval for the response?
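
The note in part (a) about 1x1 matrix objects can be seen directly in a short sketch using the Problem 4 data. The variable names (`XtX_inv`, `RSS_mat`, `sigma2_hat`) are choices made for this illustration and are not taken from the lecture notes.

```r
# Data from Problem 4
y  <- c(21, 25, 19, 24, 36, 36, 24, 10)
x1 <- c(3, 9, 4, 3, 7, 9, 4, 1)
x2 <- c(3, 9, 4, 3, 7, 9, 4, 2)

# Design matrix with an intercept column
X <- cbind(1, x1, x2)

# Least squares estimate (X'X)^{-1} X'y via matrix operations
XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y

# RSS from the matrix formula: the result is a 1x1 matrix, not a number
RSS_mat <- t(y) %*% y - t(y) %*% X %*% XtX_inv %*% t(X) %*% y
is.matrix(RSS_mat)   # TRUE
dim(RSS_mat)         # 1 1

# Convert to a plain numeric scalar before using it in further arithmetic
RSS        <- as.numeric(RSS_mat)
sigma2_hat <- RSS / (length(y) - ncol(X))
```

The conversion matters as soon as the value is combined with another matrix: an elementwise product such as `RSS_mat * XtX_inv` stops with a "non-conformable arrays" error, whereas `RSS * XtX_inv` scales every entry as intended.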
- End of the Assignment -
