# STAT 3008: Applied Regression Analysis

2020-21 Term 1
Assignment #1 (Problem 2b revised)

## Due: October 9th, 2020 (Friday) at 11:30pm

This assignment covers material up to Section 2.4 of the lecture notes.
You need to show your calculations in detail in order to obtain full marks.
Please submit a hard copy of the R code and results for Problems 5(a) and 5(b).

Problem 1 [30 points]: Suppose the following regression model is fitted to a data set with
observations $\{(x_i, y_i),\ i = 1, 2, \dots, n\}$:

$$y_i = \beta x_i + e_i, \qquad e_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

(a) [9 points] Based on the least squares method and the fact that $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-1}$ (df = n-1
since df = n from the data and df = 1 from estimating $\beta$), compute the least squares
estimates for $\beta$ and $\sigma^2$.

(b) [5 points] Is $\hat{\beta}$ an unbiased estimator for $\beta$? Verify.

(c) [3 points] Show that the fitted regression line passes through the point
$$\left(\overline{x^2},\ \overline{xy}\right) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^2,\ \frac{1}{n}\sum_{i=1}^{n} x_i y_i\right),$$
but not the average point $(\bar{x}, \bar{y})$.

(d) [7 points] Derive the maximum likelihood estimates (MLE) $\tilde{\beta}$ and $\tilde{\sigma}^2$.
(e) [6 points] Suppose $(x_1, x_2, x_3, x_4, x_5) = (1, 2, 3, 4, 5)$ and $(y_1, y_2, y_3, y_4, y_5) = (3, 8, 11, 17, 20)$.
What are the values of the least squares estimates $\hat{\beta}$ and $\hat{\sigma}^2$? Does the sum of residuals
equal zero?
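
As a way to check hand calculations for this problem, the no-intercept model can be fitted numerically. The sketch below uses made-up data (the vectors `x` and `y` are hypothetical, not the values in part (e)) and only relies on facts stated above: the residual degrees of freedom are n-1, and the fitted line passes through $(\overline{x^2}, \overline{xy})$.

```r
# Hypothetical data, deliberately different from the values in part (e).
x <- c(2, 4, 5, 7, 9)
y <- c(5, 9, 12, 15, 22)
n <- length(x)

# "- 1" in the formula removes the intercept, giving the model y = beta * x + e.
fit <- lm(y ~ x - 1)
beta_hat <- unname(coef(fit))

# summary()$sigma^2 is RSS divided by the residual df, which is n - 1 here.
sigma2_hat <- summary(fit)$sigma^2
rss <- sum(resid(fit)^2)

# Check the point from part (c): the fitted value at x = mean(x^2) is mean(x * y).
on_line <- isTRUE(all.equal(beta_hat * mean(x^2), mean(x * y)))

print(c(beta_hat = beta_hat, sigma2_hat = sigma2_hat))
print(on_line)
print(sum(resid(fit)))  # inspect the sum of residuals for this data set
```

A numerical check of this kind does not replace the derivations asked for in parts (a) to (d); it only helps catch arithmetic slips.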

Problem 2 [18 points]: Suppose a simple linear regression is fitted to the data $\{(x_i, y_i),\ i = 1, 2, \dots, n\}$
with $x_1 = x_2 = \dots = x_{n-1} = a$ and $x_n = a + n\delta$.
(a) [5 points] Show that $SXX = n(n-1)\delta^2$.
(b) [7 points] Show that the OLS estimate for $\beta_1$ is $\hat{\beta}_1 = \dfrac{1}{n\delta}\left(y_n - \bar{y}_{n-1}\right)$, where
$\bar{y}_{n-1} = \dfrac{1}{n-1}\sum_{i=1}^{n-1} y_i$, i.e. the average of the first $(n-1)$ values $y_i$.

(c) [6 points] Do you think the regression line obtained from the OLS estimates would pass
through Points A and B below? Verify.
Point A: $(x, y) = (a, \bar{y}_{n-1})$    Point B: $(x, y) = (x_n, y_n)$
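
The identities in parts (a) and (b) can be sanity-checked numerically before attempting the proofs. The following is a minimal sketch with hypothetical values for `n`, `a` and `delta` and simulated responses; it compares the stated formulas with what `lm()` returns.

```r
set.seed(1)

# Hypothetical design: the first n - 1 x-values equal a, the last is a + n * delta.
n <- 6
a <- 2
delta <- 0.5
x <- c(rep(a, n - 1), a + n * delta)

# Simulated responses (the true line 1 + 3x is an arbitrary choice).
y <- rnorm(n, mean = 1 + 3 * x)

# Part (a): SXX computed directly versus the stated closed form.
sxx <- sum((x - mean(x))^2)
sxx_formula <- n * (n - 1) * delta^2

# Part (b): OLS slope from lm() versus the stated closed form.
fit <- lm(y ~ x)
slope_lm <- unname(coef(fit)[2])
slope_formula <- (y[n] - mean(y[-n])) / (n * delta)

print(c(sxx = sxx, sxx_formula = sxx_formula))
print(c(slope_lm = slope_lm, slope_formula = slope_formula))
```

Agreement for one simulated data set is only a plausibility check; the assignment still asks for a general argument.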

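Problem 3 [10 points]: Consider the residuals $\{\hat{e}_i\}$ from the simple linear regression:

$$\hat{e}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i, \qquad i = 1, 2, \dots, n$$

where $\hat{\beta}_1 = SXY/SXX$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ are the OLS estimates for $\beta_1$ and $\beta_0$.

Show that $\{\hat{e}_i,\ i = 1, 2, \dots, n\}$ are uncorrelated with the explanatory variables $\{x_i,\ i = 1, 2, \dots, n\}$.

That is, the sample covariance satisfies
$$\hat{\sigma}(x, \hat{e}) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(\hat{e}_i - \bar{\hat{e}}) = 0.$$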
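
The claim in Problem 3 can be illustrated numerically. The sketch below simulates a hypothetical data set, fits the regression with `lm()`, and evaluates the sample covariance between `x` and the residuals; `cov()` uses the same $1/(n-1)$ divisor as the formula above.

```r
set.seed(2)

# Hypothetical simulated data; the true coefficients are arbitrary.
x <- runif(20, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(20)

fit <- lm(y ~ x)
e_hat <- resid(fit)

# Sample covariance between x and the residuals, with divisor n - 1.
cov_xe <- cov(x, e_hat)

print(cov_xe)  # zero up to floating-point rounding error
```

The result is not exactly zero in floating-point arithmetic, so a tolerance is needed when comparing it with 0.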

Problem 4 [22 points]: Suppose a simple linear regression is fitted to the data $\{(x_1, y_1), \dots, (x_{19}, y_{19})\}$,
with $E(Y \mid X = x) = \beta_0 + \beta_1 x$ and $\mathrm{Var}(Y \mid X = x) = \sigma^2$.

The coefficient table and ANOVA table below show some of the estimated values:

(a) [11 points] Replicate the two tables above, and fill in ALL the missing values (to 5 significant
figures) in the two tables.
(The p-values can be obtained from R commands like `1 - pf(F0, df1, df2)` for the
right-tail probability of $F_{df1,\,df2}$, or `pt(t0, d)` for the cdf of $t_d$.)
(b) [3 points] Based on the results in part (a), what is the sample correlation coefficient between
x and y? That is,
$$r_{xy} = \widehat{\mathrm{Corr}}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}.$$
(c) [8 points] Based on the results in part (a), test whether $\beta_1 = -0.2$ at $\alpha = 0.05$.
You should set up the 4 steps of hypothesis testing as on Ch2 page 65.
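
The R commands mentioned in part (a) can be tried out as follows. This is a sketch only: the statistic `t0` is a hypothetical value, not one taken from the tables, and the residual degrees of freedom are assumed to be 19 - 2 = 17 because two coefficients are estimated from 19 observations.

```r
# Hypothetical t statistic; replace with the value computed from the tables.
t0 <- -2.5
d <- 19 - 2   # residual degrees of freedom, assuming n = 19 and two coefficients

# Two-sided p-value from the t distribution.
p_t <- 2 * (1 - pt(abs(t0), d))

# For a single slope tested against zero, the F statistic is the square of the
# t statistic, and the right-tail F probability gives the same p-value.
F0 <- t0^2
p_F <- 1 - pf(F0, 1, d)

# Critical value for a two-sided test at alpha = 0.05.
t_crit <- qt(1 - 0.05 / 2, d)

print(c(p_t = p_t, p_F = p_F, t_crit = t_crit))
```

For part (c) the null value is -0.2 rather than 0, so the t statistic there has to be formed from the difference between the estimated slope and -0.2 before these functions are applied.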

Problem 5 (R problem) [20 points]: The R library ‘alr3’ contains the “segreg” data, which
contains the electricity consumption (in KWH) and mean temperature (in F) for a building at
the University of Minnesota Twin Cities campus for 39 months in 1988-1992.
(https://www.rdocumentation.org/packages/alr3/versions/2.0.5/topics/segreg)
Suppose that we are interested in how the electricity consumption (`y = segreg$C`), which is
driven primarily by the use of air conditioning, is affected by the monthly mean temperature
(`x = segreg$Temp`).
(a) [10 points] Based on R code similar to that from Ch2 page 23, obtain the OLS
estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\sigma}^2$.
(b) [6 points] Based on the `plot` and `abline` functions as in Ch1 page 26, generate the
scatterplot of the data, and add the regression line obtained in part (a) to the plot.
(c) [4 points] Suppose an outlier is defined as an observation $(x_i, y_i)$ with $|\hat{e}_i| > 2\hat{\sigma}$. Do you
think there is an outlier in the data set? Verify.
(Note: A more precise definition of an outlier will be introduced in Chapter 7, which
removes the impact of the outlier $(x_i, y_i)$ itself when computing $\hat{\sigma}$.)
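
The workflow for Problem 5 might look roughly like the sketch below. It is not the code from the lecture notes: to stay self-contained it uses a simulated stand-in data frame `dat` with hypothetical values and the same column names (`Temp`, `C`) as `segreg`. With the `alr3` package installed, `library(alr3)` should make the real `segreg` data available to use in place of `dat`.

```r
set.seed(3)

# Simulated stand-in for segreg (hypothetical values, 39 rows like the real data).
dat <- data.frame(Temp = runif(39, min = 10, max = 80))
dat$C <- 60 + 0.4 * dat$Temp + rnorm(39, sd = 5)

# (a) OLS fit; sigma_hat^2 is RSS divided by n - 2.
fit <- lm(C ~ Temp, data = dat)
beta_hat <- coef(fit)
sigma_hat <- summary(fit)$sigma
sigma2_hat <- sigma_hat^2
print(beta_hat)
print(sigma2_hat)

# (b) Scatterplot with the fitted regression line added.
plot(dat$Temp, dat$C,
     xlab = "Mean temperature (F)",
     ylab = "Electricity consumption (KWH)")
abline(fit)

# (c) Observations whose absolute residual exceeds 2 * sigma_hat.
flagged <- which(abs(resid(fit)) > 2 * sigma_hat)
print(dat[flagged, ])
```

Because the stand-in data are simulated, the printed estimates and flagged rows say nothing about the actual `segreg` data.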
