STAT 3008 Applied Regression Analysis
Department of Statistics
The Chinese University of Hong Kong
2020/21 Term 1
Dr. LEE Pak Kuen, Philip

Multiple Linear Regression (or Multiple Regression):
- Mean function: a linear function of the explanatory variables $x_1, x_2, \ldots, x_p$:
$$E(Y \mid X_1 = x_1, \ldots, X_p = x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p, \qquad \text{where } p \ge 2$$
- Variance function: $Var(Y \mid X_1 = x_1, \ldots, X_p = x_p) = \sigma^2$
- Parameters: $(\beta_0, \beta_1, \ldots, \beta_p, \sigma^2)$
Chapter Outline
- Section 3.1: Random Vector
- Section 3.2: Model Setup
- Section 3.3: Ordinary Least Squares (OLS) Estimates
- Section 3.4: Properties of the OLS Estimates
- Section 3.5: Maximum Likelihood Estimates
- Section 3.6: Analysis of Variance (ANOVA)
- Section 3.7: Confidence Intervals and Tests
- Appendix: Useful Formulas from Linear Algebra

Section 3.1: Random Vector
Random Variable and Random Vector
- Random Variable: a numeric quantity that takes different values with certain probabilities.
- Random Vector: a vector whose components are random variables. Notation:
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}, \qquad \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$

Multivariate Normal Distribution
Consider an $n$-dimensional random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ with $\mathbf{X} \sim N_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu}$ is the $n \times 1$ mean vector and $\boldsymbol{\Sigma}$ ($n \times n$) is the variance-covariance matrix for $\mathbf{X}$.

The joint probability density function (pdf) is
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$$

Bivariate Normal Distribution ($n = 2$): with
$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \qquad \boldsymbol{\Sigma}^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix} 1/\sigma_1^2 & -\rho/(\sigma_1\sigma_2) \\ -\rho/(\sigma_1\sigma_2) & 1/\sigma_2^2 \end{pmatrix},$$
the joint pdf is then reduced to
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{2/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$$
$$= \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} \right] \right\}$$
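To make the bivariate formula concrete, here is a minimal R sketch that evaluates it directly and checks it against a standard density routine. It assumes the mvtnorm package is installed, and all parameter values below are arbitrary choices for illustration:

library(mvtnorm)                 # for dmvnorm(); assumed to be installed
mu <- c(1, 2); s1 <- 1; s2 <- 1.5; rho <- 0.6   # assumed parameter values
Sigma <- matrix(c(s1^2, rho*s1*s2, rho*s1*s2, s2^2), 2, 2)
x <- c(1.3, 2.4)                 # point at which to evaluate the pdf
q <- (x[1]-mu[1])^2/s1^2 + (x[2]-mu[2])^2/s2^2 -
     2*rho*(x[1]-mu[1])*(x[2]-mu[2])/(s1*s2)
f <- exp(-q/(2*(1-rho^2))) / (2*pi*s1*s2*sqrt(1-rho^2))  # the formula above
f - dmvnorm(x, mean=mu, sigma=Sigma)                     # difference ~ 0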
Graphical Illustration: see below.

Bivariate Normal Distribution: Joint pdf
The joint pdf of the bivariate normal distribution has elliptical contours: points on the same ellipse share the same pdf value $f(\mathbf{x})$.
[Figure: surface plot of the bivariate normal joint pdf over the $(x_1, x_2)$ plane, with its elliptical contours.]

Interactive Density Plot for Bivariate Normal (not required):

nsim<-100000 # Number of simulations of bivariate normal (x,y)
x <- rnorm(nsim)
y <- 2 + x*rnorm(nsim,1,.1) + rnorm(nsim)
library(MASS)
den3d <- kde2d(x, y) # 2-D kernel density estimate
install.packages("plotly",repos="http://cran.rstudio.com/", dependencies=TRUE)
library(plotly)
plot_ly(x=den3d$x, y=den3d$y, z=den3d$z) %>% add_surface()
Section 3.2: Model Setup

Errors: $E(e_1) = \cdots = E(e_n) = 0$, i.e. $E(\mathbf{e}) = \mathbf{0}_{n \times 1}$.
Variance-covariance matrix of $\mathbf{e}$:
$$Var(\mathbf{e}) = \begin{pmatrix} Cov(e_1, e_1) & Cov(e_1, e_2) & \cdots & Cov(e_1, e_n) \\ Cov(e_2, e_1) & Cov(e_2, e_2) & \cdots & Cov(e_2, e_n) \\ \vdots & & \ddots & \vdots \\ Cov(e_n, e_1) & Cov(e_n, e_2) & \cdots & Cov(e_n, e_n) \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 \mathbf{I}_n$$
Terms vs Explanatory Variables (x-variables)
- Explanatory Variable (EV): the original data you collect, e.g. height, weight, color, gender.
- Simple Linear Regression (Ch 2): x-variable = EV.

Matrix Notation for Multiple Linear Regression
Regression model:
$$E(Y \mid X_1 = x_1, \ldots, X_p = x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p, \qquad Var(Y \mid X_1 = x_1, \ldots, X_p = x_p) = \sigma^2$$
Multiple linear regression in matrix notation: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, with dimensions $n \times 1$, $n \times (p+1)$, $(p+1) \times 1$ and $n \times 1$:
$$\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \mathbf{e} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$
The $i$th row is $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i$, with $E(\mathbf{e}) = \mathbf{0}_{n \times 1}$ and
$$Var(\mathbf{e}) = \begin{pmatrix} Var(e_1) & Cov(e_1, e_2) & \cdots & Cov(e_1, e_n) \\ Cov(e_2, e_1) & Var(e_2) & \cdots & Cov(e_2, e_n) \\ \vdots & & \ddots & \vdots \\ Cov(e_n, e_1) & \cdots & \cdots & Var(e_n) \end{pmatrix} = \sigma^2 \mathbf{I}_n$$
Assumptions on $\{e_i\}$: (i) mean zero, (ii) equal variance, and (iii) the $\{e_i\}$ are uncorrelated with each other.
Quantities in bold are either vectors or matrices (not scalars). Example: $\mathbf{Y}, \mathbf{X}, \mathbf{e}, \boldsymbol{\beta}, \mathbf{I}_n$.

Matrix Notation for Simple Linear Regression [Chapter 3 => Chapter 2]: put $p = 1$ into $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$:
$$\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_{11} \\ 1 & x_{21} \\ \vdots & \vdots \\ 1 & x_{n1} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad \mathbf{e} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$

We will estimate the parameter vector $\boldsymbol{\beta}$ and study its properties in vector form, as illustrated in the sketch below.
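As a small illustration of the notation, the following R sketch builds $\mathbf{Y}$, $\mathbf{X}$, $\boldsymbol{\beta}$ and $\mathbf{e}$ for a simulated data set and checks the dimensions. All values here (n, p, beta) are assumptions chosen for illustration:

set.seed(1)
n <- 5; p <- 2                             # 5 cases, 2 terms (assumed)
X <- cbind(1, matrix(rnorm(n*p), n, p))    # n x (p+1) design matrix with intercept column
beta <- c(2, 1, -0.5)                      # (p+1) x 1 parameter vector (assumed values)
e <- rnorm(n)                              # errors: mean 0, equal variance, uncorrelated
Y <- X %*% beta + e                        # the model Y = X beta + e
dim(X); dim(Y)                             # 5 x 3 and 5 x 1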
Section 3.3: Ordinary Least Squares (OLS) Estimates

Matrix Notation for Multiple Regression
Multiple regression in matrix form: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$.
Consider the sum of squared distances from $y_i$ to $\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$:
$$g(\boldsymbol{\beta}) = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left[ y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}) \right]^2 = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$$
$$= \mathbf{Y}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{Y}'\mathbf{Y} - 2\mathbf{Y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \qquad \text{(Equation 1)}$$
Put $\partial g(\boldsymbol{\beta}) / \partial \boldsymbol{\beta} = \mathbf{0}$ => solve for the OLS estimates.
Product Rule: for $f(\boldsymbol{\beta}) = \boldsymbol{\beta}'\mathbf{M}\boldsymbol{\beta}$,
$$\frac{\partial f(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \mathbf{M}\boldsymbol{\beta} + \mathbf{M}'\boldsymbol{\beta} = (\mathbf{M} + \mathbf{M}')\boldsymbol{\beta}$$
Example 2: $f(\boldsymbol{\beta}) = \beta_1^2 + 2\beta_2^2 + \log(\beta_3)$ => $\dfrac{\partial f}{\partial \boldsymbol{\beta}} = \left( 2\beta_1, \; 4\beta_2, \; 1/\beta_3 \right)'$
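A quick numerical check of Example 2 in R. It assumes the numDeriv package is installed, and the evaluation point is an arbitrary choice with $\beta_3 > 0$:

library(numDeriv)                      # for grad(); assumed to be installed
f <- function(b) b[1]^2 + 2*b[2]^2 + log(b[3])
b0 <- c(1, 2, 3)                       # arbitrary evaluation point
grad(f, b0)                            # numerical gradient
c(2*b0[1], 4*b0[2], 1/b0[3])           # analytic gradient (2b1, 4b2, 1/b3)'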
OLS Estimates for Multiple Linear Regression
Equation (1): want to minimize
$$g(\boldsymbol{\beta}) = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{Y}'\mathbf{Y} - 2\mathbf{Y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$
Differentiating $g$ wrt $\boldsymbol{\beta}$, using $\partial(\mathbf{c}'\boldsymbol{\beta})/\partial\boldsymbol{\beta} = \mathbf{c}$ and $\partial(\boldsymbol{\beta}'\mathbf{M}\boldsymbol{\beta})/\partial\boldsymbol{\beta} = (\mathbf{M} + \mathbf{M}')\boldsymbol{\beta}$:
$$\frac{\partial g(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -2(\mathbf{Y}'\mathbf{X})' + \left[ \mathbf{X}'\mathbf{X} + (\mathbf{X}'\mathbf{X})' \right]\boldsymbol{\beta} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$
Put $\partial g(\boldsymbol{\beta})/\partial \boldsymbol{\beta} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0}_{(p+1) \times 1}$
$$\Rightarrow \quad \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y} \quad \Rightarrow \quad \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
Since $g$ is a convex function in $\boldsymbol{\beta}$, $\hat{\boldsymbol{\beta}}$ minimizes the function $g$.

Geometry of the OLS estimates:
(1) Response: $\mathbf{Y} = (y_1, y_2, \ldots, y_n)'$.
(2) The space spanned by the following $(p+1)$ vectors: $(1, 1, \ldots, 1)'$, $(x_{11}, x_{21}, \ldots, x_{n1})'$, ..., $(x_{1p}, x_{2p}, \ldots, x_{np})'$.
(3) $\mathbf{X}\boldsymbol{\beta}$ is a vector in the space.
(4) OLS estimates: $\mathbf{X}\hat{\boldsymbol{\beta}}$ is the projection of $\mathbf{Y}$ onto the space => the residual $\hat{\mathbf{e}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}$ should be orthogonal to the above $(p+1)$ vectors in the space. That is, $\mathbf{X}'\hat{\mathbf{e}} = \mathbf{0}_{(p+1) \times 1}$.
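The orthogonality $\mathbf{X}'\hat{\mathbf{e}} = \mathbf{0}$ can be verified numerically; a minimal R sketch with simulated data (all values are assumptions for illustration):

set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))           # intercept plus p = 2 terms
Y <- X %*% c(2, 1, -0.5) + rnorm(n)         # simulate Y = X beta + e
betahat <- solve(t(X) %*% X) %*% t(X) %*% Y # OLS estimate (X'X)^{-1} X'Y
ehat <- Y - X %*% betahat                   # residual vector
t(X) %*% ehat                               # (p+1) x 1 vector of numerical zeros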
Given that $E(\mathbf{e}\mathbf{e}') = \sigma^2\mathbf{I}_n$, how do we evaluate $E(\mathbf{Y}'\mathbf{A}\mathbf{Y})$, where $\mathbf{A}$ is a constant matrix?
Answer: we need the trace operation of a matrix. In particular, for a random matrix $\mathbf{X}_{m \times m}$, trace and expectation can be interchanged:
$$tr(E(\mathbf{X})) = \sum_{i=1}^m E(x_{ii}) = E\left( \sum_{i=1}^m x_{ii} \right) = E(tr(\mathbf{X}))$$
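Although the derivation continues beyond this excerpt, the standard answer the trace operation leads to is the identity $E(\mathbf{Y}'\mathbf{A}\mathbf{Y}) = tr(\mathbf{A}\,Var(\mathbf{Y})) + E(\mathbf{Y})'\mathbf{A}\,E(\mathbf{Y})$. The sketch below verifies it by simulation; all values are assumptions for illustration:

set.seed(2)
n <- 4; mu <- 1:4; sigma2 <- 2                   # assumed mean vector and variance
A <- matrix(runif(n*n), n, n)                    # arbitrary constant matrix
q <- replicate(50000, {
  Y <- mu + rnorm(n, sd = sqrt(sigma2))          # Var(Y) = sigma2 * I_n
  drop(t(Y) %*% A %*% Y)                         # quadratic form Y'AY
})
mean(q)                                          # Monte Carlo estimate of E(Y'AY)
sigma2*sum(diag(A)) + drop(t(mu) %*% A %*% mu)   # tr(A sigma2 I) + mu'A mu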
Fuel Data: R Code for Least Squares Estimates

library(car); library(alr3) # Load the alr3 library (fuel2001 data)
Fuel=1000*fuel2001$FuelC/fuel2001$Pop # Define the Fuel variable
Tax=fuel2001$Tax; Dlic=1000*fuel2001$Drivers/fuel2001$Pop; Income=fuel2001$Income/1000
logMiles=log(fuel2001$Miles,2) # Define the 4 terms
data=cbind(Tax,Dlic,Income,logMiles,Fuel); var(data) # Compute the sample covariance matrix

### Multiple Linear Regression ###
fit<-lm(Fuel~Tax+Dlic+Income+logMiles) # Model under H1
summary(fit)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 154.1928   194.9062   0.791 0.432938
Tax          -4.2280     2.0301  -2.083 0.042873 *
Dlic          0.4719     0.1285   3.672 0.000626 ***
Income       -6.1353     2.1936  -2.797 0.007508 **
logMiles     18.5453     6.4722   2.865 0.006259 **
---
Residual standard error: 64.89 on 46 degrees of freedom
Multiple R-squared: 0.5105, Adjusted R-squared: 0.4679
F-statistic: 11.99 on 4 and 46 DF, p-value: 9.331e-07

### Multiple Linear Regression - Matrix Algebra ###
Intercept=rep(1,length(Tax))
X=cbind(Intercept,Tax,Dlic,Income,logMiles); Y=Fuel # Construct the X matrix and Y vector
n<-length(Fuel); p<-dim(X)[[2]]-1 # Compute n and p
BetaHat=solve(t(X)%*%X)%*%t(X)%*%Y; t(BetaHat) # OLS estimates for beta

     Intercept       Tax      Dlic    Income logMiles
[1,]  154.1928 -4.227983 0.4718712 -6.135331 18.54527

RSS=t(Y)%*%Y-t(Y)%*%X%*%solve(t(X)%*%X)%*%t(X)%*%Y; RSS # Compute the RSS
[1,] 193700

sigma2hat=RSS/(n-p-1); sigma2hat # Note sqrt(4210.87) = 64.89122, the residual standard error
[1,] 4210.87

fit1<-lm(Y~X[,-1]); summary(fit1) # The same fit via lm(), using the non-intercept columns of X

                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      154.1928   194.9062   0.791 0.432938
X[, -1]Tax        -4.2280     2.0301  -2.083 0.042873 *
X[, -1]Dlic        0.4719     0.1285   3.672 0.000626 ***
X[, -1]Income     -6.1353     2.1936  -2.797 0.007508 **
X[, -1]logMiles   18.5453     6.4722   2.865 0.006259 **
Residual standard error: 64.89 on 46 degrees of freedom
Multiple R-squared: 0.5105, Adjusted R-squared: 0.4679
F-statistic: 11.99 on 4 and 46 DF, p-value: 9.331e-07

The matrix-algebra formulas $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ and $\hat{\sigma}^2 = RSS/(n-p-1)$ reproduce the lm() output exactly.
Section 3.4: Properties of the OLS Estimates

Regression model: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, $E(\mathbf{e}) = \mathbf{0}_{n \times 1}$, $Var(\mathbf{e}) = \sigma^2\mathbf{I}_n$.

Variance (variance-covariance matrix) of $\mathbf{Y}$:
$$Var(\mathbf{Y}) = E\left[ (\mathbf{Y} - \boldsymbol{\mu})(\mathbf{Y} - \boldsymbol{\mu})' \right] = \begin{pmatrix} Var(y_1) & Cov(y_1, y_2) & \cdots & Cov(y_1, y_n) \\ Cov(y_2, y_1) & Var(y_2) & \cdots & Cov(y_2, y_n) \\ \vdots & & \ddots & \vdots \\ Cov(y_n, y_1) & \cdots & \cdots & Var(y_n) \end{pmatrix}$$

Variance of $\mathbf{A}\mathbf{Y}$ for a constant matrix $\mathbf{A}$:
$$Var(\mathbf{A}\mathbf{Y}) = E\left[ \{\mathbf{A}\mathbf{Y} - E(\mathbf{A}\mathbf{Y})\}\{\mathbf{A}\mathbf{Y} - E(\mathbf{A}\mathbf{Y})\}' \right] = E\left[ \mathbf{A}(\mathbf{Y} - E(\mathbf{Y}))\{\mathbf{A}(\mathbf{Y} - E(\mathbf{Y}))\}' \right]$$
$$= \mathbf{A}\, E\left[ (\mathbf{Y} - E(\mathbf{Y}))(\mathbf{Y} - E(\mathbf{Y}))' \right] \mathbf{A}' = \mathbf{A}\, Var(\mathbf{Y})\, \mathbf{A}'$$
Results: $E(\mathbf{A}\mathbf{Y}) = \mathbf{A}E(\mathbf{Y})$ and $Var(\mathbf{A}\mathbf{Y}) = \mathbf{A}\,Var(\mathbf{Y})\,\mathbf{A}'$.
Example
Let
$$\mathbf{A} = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \qquad \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},$$
with $E(Y_1) = 5$, $E(Y_2) = 0$, $Var(Y_1) = 1$, $Var(Y_2) = 2$, $Cov(Y_1, Y_2) = 0.5$.
Method 1 (first principle):
$$\mathbf{A}\mathbf{Y} = \begin{pmatrix} Y_1 \\ 2Y_1 + Y_2 \end{pmatrix}, \qquad E(\mathbf{A}\mathbf{Y}) = \begin{pmatrix} 5 \\ 10 \end{pmatrix}$$
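The same example can be checked by applying the results $E(\mathbf{A}\mathbf{Y}) = \mathbf{A}E(\mathbf{Y})$ and $Var(\mathbf{A}\mathbf{Y}) = \mathbf{A}\,Var(\mathbf{Y})\,\mathbf{A}'$ directly; a short R sketch using the numbers above:

A <- matrix(c(1, 2, 0, 1), 2, 2)           # A = (1 0; 2 1), filled column by column
mu <- c(5, 0)                              # E(Y)
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)   # Var(Y1)=1, Var(Y2)=2, Cov(Y1,Y2)=0.5
A %*% mu                                   # E(AY) = (5, 10)'
A %*% Sigma %*% t(A)                       # Var(AY) = A Var(Y) A'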
Properties of the OLS Estimates
Model: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, $E(\mathbf{e}) = \mathbf{0}_{n \times 1}$, $Var(\mathbf{e}) = \sigma^2\mathbf{I}_n$. OLS estimates: $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.
Property 1: $\hat{\boldsymbol{\beta}}$ is an unbiased estimate for $\boldsymbol{\beta}$:
$$E(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{Y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta}) = \boldsymbol{\beta}$$
Property 2: the variance of $\hat{\boldsymbol{\beta}}$ is given by
$$Var(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,Var(\mathbf{Y})\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$

Section 3.5: Maximum Likelihood Estimates

Assuming normal errors, maximizing the log-likelihood over $(\boldsymbol{\beta}, \sigma^2)$ gives the same normal equations as OLS, $-2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\tilde{\boldsymbol{\beta}} = \mathbf{0}$, and hence
$$\tilde{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \qquad \tilde{\sigma}^2 = \frac{1}{n}(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}}) = \frac{RSS}{n}$$
Hence (1) $\tilde{\boldsymbol{\beta}}$ is an unbiased estimate for $\boldsymbol{\beta}$, while $\tilde{\sigma}^2$ is biased:
$$E(\tilde{\sigma}^2 \mid X) = \frac{n-p-1}{n}\sigma^2 \to \sigma^2 \quad \text{as } n \to \infty$$

Summary:
  OLS estimates: $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, $\hat{\sigma}^2 = \dfrac{RSS}{n-p-1}$
  MLE: $\tilde{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ (same), $\tilde{\sigma}^2 = \dfrac{RSS}{n}$ (different!!)
Conclusion: OLS estimates are preferred over the MLE, as (1) no distribution assumption on $\mathbf{e}$ is required for OLS, and (2) the bias from $\tilde{\sigma}^2$ could be significant when $n < 30$.
Put $p = 1$ => same results as in Section 2.3!!!
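The bias of $\tilde{\sigma}^2$ versus the unbiasedness of $\hat{\sigma}^2$ shows up clearly in a small simulation; a sketch under assumed values ($n = 20$, $p = 3$, $\sigma^2 = 4$, true $\boldsymbol{\beta} = (1, \ldots, 1)'$):

set.seed(3)
n <- 20; p <- 3; sigma2 <- 4
X <- cbind(1, matrix(rnorm(n*p), n, p))
est <- replicate(5000, {
  Y <- X %*% rep(1, p+1) + rnorm(n, sd = sqrt(sigma2))   # simulate the model
  rss <- sum(lm.fit(X, Y)$residuals^2)                   # RSS from the OLS fit
  c(ols = rss/(n-p-1), mle = rss/n)
})
rowMeans(est)   # ols ~ 4 (unbiased); mle ~ 4*(n-p-1)/n = 3.2 (biased downward)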
Section 3.6: Analysis of Variance (ANOVA)

Model 1: $E(Y \mid X = x) = \beta_0$.
$\beta_0$ can be estimated by minimizing $g(\beta_0) = \sum_{i=1}^n (y_i - \beta_0)^2$, which gives $\hat{\beta}_0 = \bar{y}$ and hence $RSS_1 = \sum_{i=1}^n (y_i - \bar{y})^2 = SYY$.
Model 2: $E(Y \mid X = x) = \beta_0 + \beta_1 x$.
[Section 2.1]:
$$RSS_2 = \sum_{i=1}^n \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2 = SYY - \frac{SXY^2}{SXX}$$
ANOVA Table: Fuel Data
Example: Fuel Consumption. Consider Model 1: $y = \beta_0 + e$ vs Model 2: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + e$.
Hypotheses: $H_0: E(Y \mid \mathbf{X}) = \beta_0$ ($k = 0$ terms) vs $H_1: E(Y \mid \mathbf{X}) = \boldsymbol{\beta}'\mathbf{X}$ ($p = 4$ terms).
Solutions: the ANOVA table is given by

Source        df    SS        MS         F
Regression     4    201994    50498.5    11.992
Residual      46    193700     4210.87
Total         50    395694

Test statistic: under $H_0$, $F_0 = \dfrac{201994/4}{193700/46} = 11.992$.
p-value $= \Pr(F_{4,46} > 11.992) = 9.33 \times 10^{-7} < 0.05 = \alpha$.
Decision: since p-value $< \alpha$, we reject $H_0$ at $\alpha = 0.05$.
Conclusion: we have sufficient evidence that the multiple linear regression is the appropriate model vs the constant mean model.

Fuel Data: R Code for the ANOVA Table

library(car); library(alr3) # Load the alr3 library
Fuel=1000*fuel2001$FuelC/fuel2001$Pop # Define the Fuel variable
Tax=fuel2001$Tax; Dlic=1000*fuel2001$Drivers/fuel2001$Pop; Income=fuel2001$Income/1000
logMiles=log(fuel2001$Miles,2) # Define the 4 terms
fit<-lm(Fuel~Tax+Dlic+Income+logMiles) # Model under H1
fit0<-lm(Fuel~1) # Model under H0
anova(fit0,fit)

Analysis of Variance Table
Model 1: Fuel ~ 1
Model 2: Fuel ~ Tax + Dlic + Income + logMiles
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     50 395694
2     46 193700  4    201994 11.992 9.331e-07 ***

The output table looks weird: we need to reorganize the terms a bit to obtain the standard ANOVA table shown above.

Coefficient of Determination R^2
$$R^2 = \frac{SSreg}{SYY} = \frac{201994}{395694} = 0.5105$$
=> the 4 terms explain about 51% of the variability in Fuel.

R2<-1-sum(fit$residual^2)/sum(fit0$residual^2) # R^2 = 1 - RSS2/RSS1
[1] 0.5104804

For the tests below we need $Var(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ [Section 3.3].
Test for the Dependence of One Term
Hypotheses: $H_0: \beta_k = \beta_k^*$, $\beta_i$ arbitrary for $i \ne k$ vs $H_1: \beta_k \ne \beta_k^*$, $\beta_i$ arbitrary for $i \ne k$.
Let $V_k$ denote the diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$ corresponding to $\beta_k$, so that $Var(\hat{\beta}_k) = \sigma^2 V_k$. Hence
$$\frac{\hat{\beta}_k - \beta_k}{\sqrt{\sigma^2 V_k}} \sim N(0, 1). \qquad \text{Also,} \qquad \frac{(n-p-1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p-1}$$
$$\Rightarrow \quad t_0 = \frac{\hat{\beta}_k - \beta_k^*}{\sqrt{\hat{\sigma}^2 V_k}} \sim t_{n-p-1} \quad \text{under } H_0$$
A 100(1-$\alpha$)% confidence interval for $\beta_k$: $\hat{\beta}_k \pm t_{n-p-1, \alpha/2}\, se(\hat{\beta}_k)$, where $se(\hat{\beta}_k) = \sqrt{\hat{\sigma}^2 V_k}$.

Test for $\beta_k = 0$ vs ANOVA
Hypotheses: $H_0: \beta_k = 0$, $\beta_i$ arbitrary for $i \ne k$ vs $H_1: \beta_k \ne 0$, $\beta_i$ arbitrary for $i \ne k$. Equivalently,
$$H_0: \text{Model 1}: E[Y \mid \mathbf{X}] = \beta_0 + \beta_1 x_1 + \cdots + \beta_{k-1} x_{k-1} + \beta_{k+1} x_{k+1} + \cdots + \beta_p x_p$$
$$H_1: \text{Model 2}: E[Y \mid \mathbf{X}] = \beta_0 + \beta_1 x_1 + \cdots + \beta_{k-1} x_{k-1} + \beta_k x_k + \beta_{k+1} x_{k+1} + \cdots + \beta_p x_p$$
F-statistic: under $H_0$ (Model 1),
$$F_0 = \frac{(RSS_1 - RSS_2)/(p - [p-1])}{RSS_2/(n-p-1)} = \frac{SSreg/1}{\hat{\sigma}^2} \sim F_{1, n-p-1}$$
One can show that $SSreg = \hat{\beta}_k^2 / V_k$. Therefore
$$t_0^2 = \frac{\hat{\beta}_k^2}{\hat{\sigma}^2 V_k} = F_0 \quad \Rightarrow \quad \text{the two tests are equivalent!!}$$
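The equivalence $t_0^2 = F_0$ can be checked in R; a sketch with simulated data (variable names and values below are assumptions for illustration):

set.seed(4)
n <- 30
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2*x1 + rnorm(n)                 # the true coefficient of x2 is 0
fit2 <- lm(y ~ x1 + x2)                  # Model 2: includes x2
fit1 <- lm(y ~ x1)                       # Model 1: drops x2 (H0: beta_k = 0)
t0 <- summary(fit2)$coefficients["x2", "t value"]
F0 <- anova(fit1, fit2)$F[2]
c(t0^2, F0)                              # the two values coincide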
Confidence Intervals on the Predicted y-value
Fitted multiple linear regression line: $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$.
Consider a new term (or predictor) value $\mathbf{x}_* = (1, x_{*1}, \ldots, x_{*p})'$. We would like to obtain (i) a point estimate $\tilde{y}_*$ and (ii) a prediction interval for $y_*$.
Point estimate for $y_*$: $\tilde{y}_* = \hat{\beta}_0 + \hat{\beta}_1 x_{*1} + \cdots + \hat{\beta}_p x_{*p} = \mathbf{x}_*'\hat{\boldsymbol{\beta}}$.
A 100(1-$\alpha$)% prediction interval for $y_*$: $\tilde{y}_* \pm t_{n-p-1, \alpha/2}\, se_{pred}(\tilde{y}_*)$,
where $se_{pred}(\tilde{y}_* \mid X = \mathbf{x}_*) = \hat{\sigma}\sqrt{1 + \mathbf{x}_*'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_*}$ is the estimated standard error of $\tilde{y}_* \mid X = \mathbf{x}_*$, with
$$Var(\tilde{y}_* \mid \mathbf{x}_*) = Var(\mathbf{x}_*'\hat{\boldsymbol{\beta}} \mid \mathbf{x}_*) + Var(e) = \mathbf{x}_*'\,Var(\hat{\boldsymbol{\beta}} \mid \mathbf{x}_*)\,\mathbf{x}_* + \sigma^2 = \sigma^2\left( 1 + \mathbf{x}_*'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_* \right)$$

Confidence Intervals on the Fitted y-value
Fitted multiple linear regression line: $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$.
Consider a data point $\mathbf{x} = (1, x_1, \ldots, x_p)'$ within the data set. We would like to obtain (i) a point estimate $\hat{y}$ and (ii) a confidence interval for $\hat{y}$.
Point estimate for $y$: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p = \mathbf{x}'\hat{\boldsymbol{\beta}}$.
A 100(1-$\alpha$)% confidence interval for $\hat{y}$: $\hat{y} \pm t_{n-p-1, \alpha/2}\, se_{fit}(\hat{y})$,
where $se_{fit}(\hat{y} \mid X = \mathbf{x}) = \hat{\sigma}\sqrt{\mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}$ is the estimated standard error of the fitted value $\hat{y} \mid X = \mathbf{x}$.
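In R, both intervals are available from predict(); a minimal sketch with simulated data (all values are assumptions for illustration):

set.seed(5)
x <- rnorm(25); y <- 1 + 0.5*x + rnorm(25)
fit <- lm(y ~ x)
new <- data.frame(x = 1.2)                   # the new point x*
predict(fit, new, interval = "confidence")   # CI for the fitted value at x*
predict(fit, new, interval = "prediction")   # wider PI for a new response y*

The prediction interval is always wider than the confidence interval at the same point, reflecting the extra "+1" (the variance of a new error $e$) under the square root.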
Chapter Summary
- OLS estimates: $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, $\hat{\sigma}^2 = RSS/(n-p-1)$
- Distribution of the estimates: when $n$ is large,
$$\hat{\boldsymbol{\beta}} \sim N_{p+1}\left( \boldsymbol{\beta},\, \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \right), \qquad \frac{(n-p-1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p-1}$$
- MLE: $\tilde{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, $\tilde{\sigma}^2 = RSS/n$
- ANOVA: $H_0$: Model 1 ($k$ terms) vs $H_1$: Model 2 ($p$ terms). Under $H_0$,
$$F_0 = \frac{(RSS_1 - RSS_2)/(p - k)}{RSS_2/(n-p-1)} \sim F_{p-k,\, n-p-1}$$
- Test for $H_0: \beta_k = \beta_k^*$: under $H_0$, $t_0 = \dfrac{\hat{\beta}_k - \beta_k^*}{\sqrt{\hat{\sigma}^2 V_k}} \sim t_{n-p-1}$
- A 100(1-$\alpha$)% prediction interval for $y_*$: $\mathbf{x}_*'\hat{\boldsymbol{\beta}} \pm t_{n-p-1, \alpha/2}\, \hat{\sigma}\sqrt{1 + \mathbf{x}_*'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_*}$