STAT 3008 Applied Regression Analysis
Department of Statistics
The Chinese University of Hong Kong
2020/21 Term 1
Dr. LEE Pak Kuen, Philip

Multiple Linear Regression (or Multiple Regression):
- Mean function: a linear function of the explanatory variables $x_1, x_2, \ldots, x_p$:
$$E(Y \mid X_1 = x_1, \ldots, X_p = x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p, \qquad \text{where } p \ge 2$$
- Variance function: $Var(Y \mid X_1 = x_1, \ldots, X_p = x_p) = \sigma^2$
- Parameters: $(\beta_0, \beta_1, \ldots, \beta_p, \sigma^2)$
Chapter Outline
- Section 3.1: Random Vector
- Section 3.2: Model Setup
- Section 3.3: Ordinary Least Squares (OLS) Estimates
- Section 3.4: Properties of the OLS Estimates
- Section 3.5: Maximum Likelihood Estimates
- Section 3.6: Analysis of Variance (ANOVA)
- Section 3.7: Confidence Intervals and Tests
- Appendix: Useful Formulas from Linear Algebra

Section 3.1: Random Vector
Random Variable and Random Vector
- Random Variable: a numeric quantity that takes different values with certain probabilities.
- Random Vector: a vector whose components are random variables. Notation:
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}, \qquad \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$

Multivariate Normal Distribution
Consider an $n$-dimensional random vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)'$ with $\mathbf{X} \sim N_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu}$ is the $n \times 1$ mean vector and $\boldsymbol{\Sigma}$ ($n \times n$) is the variance-covariance matrix for $\mathbf{X}$.

The joint probability density function (pdf) is
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$$

Bivariate Normal Distribution ($n = 2$): with
$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \qquad \boldsymbol{\Sigma}^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix} 1/\sigma_1^2 & -\rho/(\sigma_1\sigma_2) \\ -\rho/(\sigma_1\sigma_2) & 1/\sigma_2^2 \end{pmatrix},$$
the joint pdf is then reduced to
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{2/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$$
$$= \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} \right] \right\}$$
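To make the bivariate formula concrete, here is a minimal R sketch that evaluates it directly and checks it against a standard density routine. It assumes the mvtnorm package is installed, and all parameter values below are arbitrary choices for illustration:

library(mvtnorm)                 # for dmvnorm(); assumed to be installed
mu <- c(1, 2); s1 <- 1; s2 <- 1.5; rho <- 0.6   # assumed parameter values
Sigma <- matrix(c(s1^2, rho*s1*s2, rho*s1*s2, s2^2), 2, 2)
x <- c(1.3, 2.4)                 # point at which to evaluate the pdf
q <- (x[1]-mu[1])^2/s1^2 + (x[2]-mu[2])^2/s2^2 -
     2*rho*(x[1]-mu[1])*(x[2]-mu[2])/(s1*s2)
f <- exp(-q/(2*(1-rho^2))) / (2*pi*s1*s2*sqrt(1-rho^2))  # the formula above
f - dmvnorm(x, mean=mu, sigma=Sigma)                     # difference ~ 0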
Graphical Illustration: see below.

Bivariate Normal Distribution: Joint pdf
The joint pdf of the bivariate normal distribution has elliptical contours: points on the same ellipse share the same pdf value $f(\mathbf{x})$.
[Figure: surface plot of the bivariate normal joint pdf over the $(x_1, x_2)$ plane, with its elliptical contours.]

Interactive Density Plot for Bivariate Normal (not required):

nsim<-100000 # Number of simulations of bivariate normal (x,y)
x <- rnorm(nsim)
y <- 2 + x*rnorm(nsim,1,.1) + rnorm(nsim)
library(MASS)
den3d <- kde2d(x, y) # 2-D kernel density estimate
install.packages("plotly",repos="http://cran.rstudio.com/", dependencies=TRUE)
library(plotly)
plot_ly(x=den3d$x, y=den3d$y, z=den3d$z) %>% add_surface()
Section 3.2: Model Setup

Errors: $E(e_1) = \cdots = E(e_n) = 0$, i.e. $E(\mathbf{e}) = \mathbf{0}_{n \times 1}$.
Variance-covariance matrix of $\mathbf{e}$:
$$Var(\mathbf{e}) = \begin{pmatrix} Cov(e_1, e_1) & Cov(e_1, e_2) & \cdots & Cov(e_1, e_n) \\ Cov(e_2, e_1) & Cov(e_2, e_2) & \cdots & Cov(e_2, e_n) \\ \vdots & & \ddots & \vdots \\ Cov(e_n, e_1) & Cov(e_n, e_2) & \cdots & Cov(e_n, e_n) \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 \mathbf{I}_n$$
Terms vs Explanatory Variables (x-variables)
- Explanatory Variable (EV): the original data you collect, e.g. height, weight, color, gender.
- Simple Linear Regression (Ch 2): x-variable = EV.

Matrix Notation for Multiple Linear Regression
Regression model:
$$E(Y \mid X_1 = x_1, \ldots, X_p = x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p, \qquad Var(Y \mid X_1 = x_1, \ldots, X_p = x_p) = \sigma^2$$
Multiple linear regression in matrix notation: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, with dimensions $n \times 1$, $n \times (p+1)$, $(p+1) \times 1$ and $n \times 1$:
$$\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \mathbf{e} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$
The $i$th row is $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i$, with $E(\mathbf{e}) = \mathbf{0}_{n \times 1}$ and
$$Var(\mathbf{e}) = \begin{pmatrix} Var(e_1) & Cov(e_1, e_2) & \cdots & Cov(e_1, e_n) \\ Cov(e_2, e_1) & Var(e_2) & \cdots & Cov(e_2, e_n) \\ \vdots & & \ddots & \vdots \\ Cov(e_n, e_1) & \cdots & \cdots & Var(e_n) \end{pmatrix} = \sigma^2 \mathbf{I}_n$$
Assumptions on $\{e_i\}$: (i) mean zero, (ii) equal variance, and (iii) the $\{e_i\}$ are uncorrelated with each other.
Quantities in bold are either vectors or matrices (not scalars). Example: $\mathbf{Y}, \mathbf{X}, \mathbf{e}, \boldsymbol{\beta}, \mathbf{I}_n$.

Matrix Notation for Simple Linear Regression [Chapter 3 => Chapter 2]: put $p = 1$ into $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$:
$$\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_{11} \\ 1 & x_{21} \\ \vdots & \vdots \\ 1 & x_{n1} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad \mathbf{e} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$

We will estimate the parameter vector $\boldsymbol{\beta}$ and study its properties in vector form, as illustrated in the sketch below.
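As a small illustration of the notation, the following R sketch builds $\mathbf{Y}$, $\mathbf{X}$, $\boldsymbol{\beta}$ and $\mathbf{e}$ for a simulated data set and checks the dimensions. All values here (n, p, beta) are assumptions chosen for illustration:

set.seed(1)
n <- 5; p <- 2                             # 5 cases, 2 terms (assumed)
X <- cbind(1, matrix(rnorm(n*p), n, p))    # n x (p+1) design matrix with intercept column
beta <- c(2, 1, -0.5)                      # (p+1) x 1 parameter vector (assumed values)
e <- rnorm(n)                              # errors: mean 0, equal variance, uncorrelated
Y <- X %*% beta + e                        # the model Y = X beta + e
dim(X); dim(Y)                             # 5 x 3 and 5 x 1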
Section 3.3: Ordinary Least Squares (OLS) Estimates

Matrix Notation for Multiple Regression
Multiple regression in matrix form: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$.
Consider the sum of squared distances from $y_i$ to $\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$:
$$g(\boldsymbol{\beta}) = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left[ y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}) \right]^2 = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$$
$$= \mathbf{Y}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{Y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{Y}'\mathbf{Y} - 2\mathbf{Y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \qquad \text{(Equation 1)}$$
Put $\partial g(\boldsymbol{\beta}) / \partial \boldsymbol{\beta} = \mathbf{0}$ => solve for the OLS estimates.
Product Rule: for $f(\boldsymbol{\beta}) = \boldsymbol{\beta}'\mathbf{M}\boldsymbol{\beta}$,
$$\frac{\partial f(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \mathbf{M}\boldsymbol{\beta} + \mathbf{M}'\boldsymbol{\beta} = (\mathbf{M} + \mathbf{M}')\boldsymbol{\beta}$$
Example 2: $f(\boldsymbol{\beta}) = \beta_1^2 + 2\beta_2^2 + \log(\beta_3)$ => $\dfrac{\partial f}{\partial \boldsymbol{\beta}} = \left( 2\beta_1, \; 4\beta_2, \; 1/\beta_3 \right)'$
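A quick numerical check of Example 2 in R. It assumes the numDeriv package is installed, and the evaluation point is an arbitrary choice with $\beta_3 > 0$:

library(numDeriv)                      # for grad(); assumed to be installed
f <- function(b) b[1]^2 + 2*b[2]^2 + log(b[3])
b0 <- c(1, 2, 3)                       # arbitrary evaluation point
grad(f, b0)                            # numerical gradient
c(2*b0[1], 4*b0[2], 1/b0[3])           # analytic gradient (2b1, 4b2, 1/b3)'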
OLS Estimates for Multiple Linear Regression
Equation (1): want to minimize
$$g(\boldsymbol{\beta}) = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{Y}'\mathbf{Y} - 2\mathbf{Y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$
Differentiating $g$ wrt $\boldsymbol{\beta}$, using $\partial(\mathbf{c}'\boldsymbol{\beta})/\partial\boldsymbol{\beta} = \mathbf{c}$ and $\partial(\boldsymbol{\beta}'\mathbf{M}\boldsymbol{\beta})/\partial\boldsymbol{\beta} = (\mathbf{M} + \mathbf{M}')\boldsymbol{\beta}$:
$$\frac{\partial g(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -2(\mathbf{Y}'\mathbf{X})' + \left[ \mathbf{X}'\mathbf{X} + (\mathbf{X}'\mathbf{X})' \right]\boldsymbol{\beta} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$
Put $\partial g(\boldsymbol{\beta})/\partial \boldsymbol{\beta} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0}_{(p+1) \times 1}$
$$\Rightarrow \quad \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y} \quad \Rightarrow \quad \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
Since $g$ is a convex function in $\boldsymbol{\beta}$, $\hat{\boldsymbol{\beta}}$ minimizes the function $g$.

Geometry of the OLS estimates:
(1) Response: $\mathbf{Y} = (y_1, y_2, \ldots, y_n)'$.
(2) The space spanned by the following $(p+1)$ vectors: $(1, 1, \ldots, 1)'$, $(x_{11}, x_{21}, \ldots, x_{n1})'$, ..., $(x_{1p}, x_{2p}, \ldots, x_{np})'$.
(3) $\mathbf{X}\boldsymbol{\beta}$ is a vector in the space.
(4) OLS estimates: $\mathbf{X}\hat{\boldsymbol{\beta}}$ is the projection of $\mathbf{Y}$ onto the space => the residual $\hat{\mathbf{e}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}$ should be orthogonal to the above $(p+1)$ vectors in the space. That is, $\mathbf{X}'\hat{\mathbf{e}} = \mathbf{0}_{(p+1) \times 1}$.
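The orthogonality $\mathbf{X}'\hat{\mathbf{e}} = \mathbf{0}$ can be verified numerically; a minimal R sketch with simulated data (all values are assumptions for illustration):

set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))           # intercept plus p = 2 terms
Y <- X %*% c(2, 1, -0.5) + rnorm(n)         # simulate Y = X beta + e
betahat <- solve(t(X) %*% X) %*% t(X) %*% Y # OLS estimate (X'X)^{-1} X'Y
ehat <- Y - X %*% betahat                   # residual vector
t(X) %*% ehat                               # (p+1) x 1 vector of numerical zeros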
Given that $E(\mathbf{e}\mathbf{e}') = \sigma^2\mathbf{I}_n$, how do we evaluate $E(\mathbf{Y}'\mathbf{A}\mathbf{Y})$, where $\mathbf{A}$ is a constant matrix?
Answer: we need the trace operation of a matrix. In particular, for a random matrix $\mathbf{X}_{m \times m}$, trace and expectation can be interchanged:
$$tr(E(\mathbf{X})) = \sum_{i=1}^m E(x_{ii}) = E\left( \sum_{i=1}^m x_{ii} \right) = E(tr(\mathbf{X}))$$
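Although the derivation continues beyond this excerpt, the standard answer the trace operation leads to is the identity $E(\mathbf{Y}'\mathbf{A}\mathbf{Y}) = tr(\mathbf{A}\,Var(\mathbf{Y})) + E(\mathbf{Y})'\mathbf{A}\,E(\mathbf{Y})$. The sketch below verifies it by simulation; all values are assumptions for illustration:

set.seed(2)
n <- 4; mu <- 1:4; sigma2 <- 2                   # assumed mean vector and variance
A <- matrix(runif(n*n), n, n)                    # arbitrary constant matrix
q <- replicate(50000, {
  Y <- mu + rnorm(n, sd = sqrt(sigma2))          # Var(Y) = sigma2 * I_n
  drop(t(Y) %*% A %*% Y)                         # quadratic form Y'AY
})
mean(q)                                          # Monte Carlo estimate of E(Y'AY)
sigma2*sum(diag(A)) + drop(t(mu) %*% A %*% mu)   # tr(A sigma2 I) + mu'A mu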
Fuel Data: R Code for Least Squares Estimates

library(car); library(alr3) # Load the alr3 library (fuel2001 data)
Fuel=1000*fuel2001$FuelC/fuel2001$Pop # Define the Fuel variable
Tax=fuel2001$Tax; Dlic=1000*fuel2001$Drivers/fuel2001$Pop; Income=fuel2001$Income/1000
logMiles=log(fuel2001$Miles,2) # Define the 4 terms
data=cbind(Tax,Dlic,Income,logMiles,Fuel); var(data) # Compute the sample covariance matrix

### Multiple Linear Regression ###
fit<-lm(Fuel~Tax+Dlic+Income+logMiles) # Model under H1
summary(fit)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 154.1928   194.9062   0.791 0.432938
Tax          -4.2280     2.0301  -2.083 0.042873 *
Dlic          0.4719     0.1285   3.672 0.000626 ***
Income       -6.1353     2.1936  -2.797 0.007508 **
logMiles     18.5453     6.4722   2.865 0.006259 **
---
Residual standard error: 64.89 on 46 degrees of freedom
Multiple R-squared: 0.5105, Adjusted R-squared: 0.4679
F-statistic: 11.99 on 4 and 46 DF, p-value: 9.331e-07

### Multiple Linear Regression - Matrix Algebra ###
Intercept=rep(1,length(Tax))
X=cbind(Intercept,Tax,Dlic,Income,logMiles); Y=Fuel # Construct the X matrix and Y vector
n<-length(Fuel); p<-dim(X)[[2]]-1 # Compute n and p
BetaHat=solve(t(X)%*%X)%*%t(X)%*%Y; t(BetaHat) # OLS estimates for beta

     Intercept       Tax      Dlic    Income logMiles
[1,]  154.1928 -4.227983 0.4718712 -6.135331 18.54527

RSS=t(Y)%*%Y-t(Y)%*%X%*%solve(t(X)%*%X)%*%t(X)%*%Y; RSS # Compute the RSS
[1,] 193700

sigma2hat=RSS/(n-p-1); sigma2hat # Note sqrt(4210.87) = 64.89122, the residual standard error
[1,] 4210.87

fit1<-lm(Y~X[,-1]); summary(fit1) # The same fit via lm(), using the non-intercept columns of X

                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      154.1928   194.9062   0.791 0.432938
X[, -1]Tax        -4.2280     2.0301  -2.083 0.042873 *
X[, -1]Dlic        0.4719     0.1285   3.672 0.000626 ***
X[, -1]Income     -6.1353     2.1936  -2.797 0.007508 **
X[, -1]logMiles   18.5453     6.4722   2.865 0.006259 **
Residual standard error: 64.89 on 46 degrees of freedom
Multiple R-squared: 0.5105, Adjusted R-squared: 0.4679
F-statistic: 11.99 on 4 and 46 DF, p-value: 9.331e-07

The matrix-algebra formulas $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ and $\hat{\sigma}^2 = RSS/(n-p-1)$ reproduce the lm() output exactly.
Section 3.4: Properties of the OLS Estimates

Regression model: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, $E(\mathbf{e}) = \mathbf{0}_{n \times 1}$, $Var(\mathbf{e}) = \sigma^2\mathbf{I}_n$.

Variance (variance-covariance matrix) of $\mathbf{Y}$:
$$Var(\mathbf{Y}) = E\left[ (\mathbf{Y} - \boldsymbol{\mu})(\mathbf{Y} - \boldsymbol{\mu})' \right] = \begin{pmatrix} Var(y_1) & Cov(y_1, y_2) & \cdots & Cov(y_1, y_n) \\ Cov(y_2, y_1) & Var(y_2) & \cdots & Cov(y_2, y_n) \\ \vdots & & \ddots & \vdots \\ Cov(y_n, y_1) & \cdots & \cdots & Var(y_n) \end{pmatrix}$$

Variance of $\mathbf{A}\mathbf{Y}$ for a constant matrix $\mathbf{A}$:
$$Var(\mathbf{A}\mathbf{Y}) = E\left[ \{\mathbf{A}\mathbf{Y} - E(\mathbf{A}\mathbf{Y})\}\{\mathbf{A}\mathbf{Y} - E(\mathbf{A}\mathbf{Y})\}' \right] = E\left[ \mathbf{A}(\mathbf{Y} - E(\mathbf{Y}))\{\mathbf{A}(\mathbf{Y} - E(\mathbf{Y}))\}' \right]$$
$$= \mathbf{A}\, E\left[ (\mathbf{Y} - E(\mathbf{Y}))(\mathbf{Y} - E(\mathbf{Y}))' \right] \mathbf{A}' = \mathbf{A}\, Var(\mathbf{Y})\, \mathbf{A}'$$
Results: $E(\mathbf{A}\mathbf{Y}) = \mathbf{A}E(\mathbf{Y})$ and $Var(\mathbf{A}\mathbf{Y}) = \mathbf{A}\,Var(\mathbf{Y})\,\mathbf{A}'$.
Example
Let
$$\mathbf{A} = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \qquad \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},$$
with $E(Y_1) = 5$, $E(Y_2) = 0$, $Var(Y_1) = 1$, $Var(Y_2) = 2$, $Cov(Y_1, Y_2) = 0.5$.
Method 1 (first principle):
$$\mathbf{A}\mathbf{Y} = \begin{pmatrix} Y_1 \\ 2Y_1 + Y_2 \end{pmatrix}, \qquad E(\mathbf{A}\mathbf{Y}) = \begin{pmatrix} 5 \\ 10 \end{pmatrix}$$
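The same example can be checked by applying the results $E(\mathbf{A}\mathbf{Y}) = \mathbf{A}E(\mathbf{Y})$ and $Var(\mathbf{A}\mathbf{Y}) = \mathbf{A}\,Var(\mathbf{Y})\,\mathbf{A}'$ directly; a short R sketch using the numbers above:

A <- matrix(c(1, 2, 0, 1), 2, 2)           # A = (1 0; 2 1), filled column by column
mu <- c(5, 0)                              # E(Y)
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)   # Var(Y1)=1, Var(Y2)=2, Cov(Y1,Y2)=0.5
A %*% mu                                   # E(AY) = (5, 10)'
A %*% Sigma %*% t(A)                       # Var(AY) = A Var(Y) A'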
Properties of the OLS Estimates
Model: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, $E(\mathbf{e}) = \mathbf{0}_{n \times 1}$, $Var(\mathbf{e}) = \sigma^2\mathbf{I}_n$. OLS estimates: $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$.
Property 1: $\hat{\boldsymbol{\beta}}$ is an unbiased estimate for $\boldsymbol{\beta}$:
$$E(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{Y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta}) = \boldsymbol{\beta}$$
Property 2: the variance of $\hat{\boldsymbol{\beta}}$ is given by
$$Var(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,Var(\mathbf{Y})\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$

Section 3.5: Maximum Likelihood Estimates

Assuming normal errors, maximizing the log-likelihood over $(\boldsymbol{\beta}, \sigma^2)$ gives the same normal equations as OLS, $-2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\tilde{\boldsymbol{\beta}} = \mathbf{0}$, and hence
$$\tilde{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \qquad \tilde{\sigma}^2 = \frac{1}{n}(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}})'(\mathbf{Y} - \mathbf{X}\tilde{\boldsymbol{\beta}}) = \frac{RSS}{n}$$
Hence (1) $\tilde{\boldsymbol{\beta}}$ is an unbiased estimate for $\boldsymbol{\beta}$, while $\tilde{\sigma}^2$ is biased:
$$E(\tilde{\sigma}^2 \mid X) = \frac{n-p-1}{n}\sigma^2 \to \sigma^2 \quad \text{as } n \to \infty$$

Summary:
  OLS estimates: $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, $\hat{\sigma}^2 = \dfrac{RSS}{n-p-1}$
  MLE: $\tilde{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ (same), $\tilde{\sigma}^2 = \dfrac{RSS}{n}$ (different!!)
Conclusion: OLS estimates are preferred over the MLE, as (1) no distribution assumption on $\mathbf{e}$ is required for OLS, and (2) the bias from $\tilde{\sigma}^2$ could be significant when $n < 30$.
Put $p = 1$ => same results as in Section 2.3!!!
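The bias of $\tilde{\sigma}^2$ versus the unbiasedness of $\hat{\sigma}^2$ shows up clearly in a small simulation; a sketch under assumed values ($n = 20$, $p = 3$, $\sigma^2 = 4$, true $\boldsymbol{\beta} = (1, \ldots, 1)'$):

set.seed(3)
n <- 20; p <- 3; sigma2 <- 4
X <- cbind(1, matrix(rnorm(n*p), n, p))
est <- replicate(5000, {
  Y <- X %*% rep(1, p+1) + rnorm(n, sd = sqrt(sigma2))   # simulate the model
  rss <- sum(lm.fit(X, Y)$residuals^2)                   # RSS from the OLS fit
  c(ols = rss/(n-p-1), mle = rss/n)
})
rowMeans(est)   # ols ~ 4 (unbiased); mle ~ 4*(n-p-1)/n = 3.2 (biased downward)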
Section 3.6: Analysis of Variance (ANOVA)

Model 1: $E(Y \mid X = x) = \beta_0$.
$\beta_0$ can be estimated by minimizing $g(\beta_0) = \sum_{i=1}^n (y_i - \beta_0)^2$, which gives $\hat{\beta}_0 = \bar{y}$ and hence $RSS_1 = \sum_{i=1}^n (y_i - \bar{y})^2 = SYY$.
Model 2: $E(Y \mid X = x) = \beta_0 + \beta_1 x$.
[Section 2.1]:
$$RSS_2 = \sum_{i=1}^n \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2 = SYY - \frac{SXY^2}{SXX}$$
ANOVA Table: Fuel Data
Example: Fuel Consumption. Consider Model 1: $y = \beta_0 + e$ vs Model 2: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + e$.
Hypotheses: $H_0: E(Y \mid \mathbf{X}) = \beta_0$ ($k = 0$ terms) vs $H_1: E(Y \mid \mathbf{X}) = \boldsymbol{\beta}'\mathbf{X}$ ($p = 4$ terms).
Solutions: the ANOVA table is given by

Source        df    SS        MS         F
Regression     4    201994    50498.5    11.992
Residual      46    193700     4210.87
Total         50    395694

Test statistic: under $H_0$, $F_0 = \dfrac{201994/4}{193700/46} = 11.992$.
p-value $= \Pr(F_{4,46} > 11.992) = 9.33 \times 10^{-7} < 0.05 = \alpha$.
Decision: since p-value $< \alpha$, we reject $H_0$ at $\alpha = 0.05$.
Conclusion: we have sufficient evidence that the multiple linear regression is the appropriate model vs the constant mean model.

Fuel Data: R Code for the ANOVA Table

library(car); library(alr3) # Load the alr3 library
Fuel=1000*fuel2001$FuelC/fuel2001$Pop # Define the Fuel variable
Tax=fuel2001$Tax; Dlic=1000*fuel2001$Drivers/fuel2001$Pop; Income=fuel2001$Income/1000
logMiles=log(fuel2001$Miles,2) # Define the 4 terms
fit<-lm(Fuel~Tax+Dlic+Income+logMiles) # Model under H1
fit0<-lm(Fuel~1) # Model under H0
anova(fit0,fit)

Analysis of Variance Table
Model 1: Fuel ~ 1
Model 2: Fuel ~ Tax + Dlic + Income + logMiles
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     50 395694
2     46 193700  4    201994 11.992 9.331e-07 ***

The output table looks weird: we need to reorganize the terms a bit to obtain the standard ANOVA table shown above.

Coefficient of Determination R^2
$$R^2 = \frac{SSreg}{SYY} = \frac{201994}{395694} = 0.5105$$
=> the 4 terms explain about 51% of the variability in Fuel.

R2<-1-sum(fit$residual^2)/sum(fit0$residual^2) # R^2 = 1 - RSS2/RSS1
[1] 0.5104804

For the tests below we need $Var(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ [Section 3.3].
Test for the Dependence of One Term
Hypotheses: $H_0: \beta_k = \beta_k^*$, $\beta_i$ arbitrary for $i \ne k$ vs $H_1: \beta_k \ne \beta_k^*$, $\beta_i$ arbitrary for $i \ne k$.
Let $V_k$ denote the diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$ corresponding to $\beta_k$, so that $Var(\hat{\beta}_k) = \sigma^2 V_k$. Hence
$$\frac{\hat{\beta}_k - \beta_k}{\sqrt{\sigma^2 V_k}} \sim N(0, 1). \qquad \text{Also,} \qquad \frac{(n-p-1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p-1}$$
$$\Rightarrow \quad t_0 = \frac{\hat{\beta}_k - \beta_k^*}{\sqrt{\hat{\sigma}^2 V_k}} \sim t_{n-p-1} \quad \text{under } H_0$$
A 100(1-$\alpha$)% confidence interval for $\beta_k$: $\hat{\beta}_k \pm t_{n-p-1, \alpha/2}\, se(\hat{\beta}_k)$, where $se(\hat{\beta}_k) = \sqrt{\hat{\sigma}^2 V_k}$.

Test for $\beta_k = 0$ vs ANOVA
Hypotheses: $H_0: \beta_k = 0$, $\beta_i$ arbitrary for $i \ne k$ vs $H_1: \beta_k \ne 0$, $\beta_i$ arbitrary for $i \ne k$. Equivalently,
$$H_0: \text{Model 1}: E[Y \mid \mathbf{X}] = \beta_0 + \beta_1 x_1 + \cdots + \beta_{k-1} x_{k-1} + \beta_{k+1} x_{k+1} + \cdots + \beta_p x_p$$
$$H_1: \text{Model 2}: E[Y \mid \mathbf{X}] = \beta_0 + \beta_1 x_1 + \cdots + \beta_{k-1} x_{k-1} + \beta_k x_k + \beta_{k+1} x_{k+1} + \cdots + \beta_p x_p$$
F-statistic: under $H_0$ (Model 1),
$$F_0 = \frac{(RSS_1 - RSS_2)/(p - [p-1])}{RSS_2/(n-p-1)} = \frac{SSreg/1}{\hat{\sigma}^2} \sim F_{1, n-p-1}$$
One can show that $SSreg = \hat{\beta}_k^2 / V_k$. Therefore
$$t_0^2 = \frac{\hat{\beta}_k^2}{\hat{\sigma}^2 V_k} = F_0 \quad \Rightarrow \quad \text{the two tests are equivalent!!}$$
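The equivalence $t_0^2 = F_0$ can be checked in R; a sketch with simulated data (variable names and values below are assumptions for illustration):

set.seed(4)
n <- 30
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2*x1 + rnorm(n)                 # the true coefficient of x2 is 0
fit2 <- lm(y ~ x1 + x2)                  # Model 2: includes x2
fit1 <- lm(y ~ x1)                       # Model 1: drops x2 (H0: beta_k = 0)
t0 <- summary(fit2)$coefficients["x2", "t value"]
F0 <- anova(fit1, fit2)$F[2]
c(t0^2, F0)                              # the two values coincide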
Confidence Intervals on the Predicted y-value
Fitted multiple linear regression line: $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$.
Consider a new term (or predictor) value $\mathbf{x}_* = (1, x_{*1}, \ldots, x_{*p})'$. We would like to obtain (i) a point estimate $\tilde{y}_*$ and (ii) a prediction interval for $y_*$.
Point estimate for $y_*$: $\tilde{y}_* = \hat{\beta}_0 + \hat{\beta}_1 x_{*1} + \cdots + \hat{\beta}_p x_{*p} = \mathbf{x}_*'\hat{\boldsymbol{\beta}}$.
A 100(1-$\alpha$)% prediction interval for $y_*$: $\tilde{y}_* \pm t_{n-p-1, \alpha/2}\, se_{pred}(\tilde{y}_*)$,
where $se_{pred}(\tilde{y}_* \mid X = \mathbf{x}_*) = \hat{\sigma}\sqrt{1 + \mathbf{x}_*'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_*}$ is the estimated standard error of $\tilde{y}_* \mid X = \mathbf{x}_*$, with
$$Var(\tilde{y}_* \mid \mathbf{x}_*) = Var(\mathbf{x}_*'\hat{\boldsymbol{\beta}} \mid \mathbf{x}_*) + Var(e) = \mathbf{x}_*'\,Var(\hat{\boldsymbol{\beta}} \mid \mathbf{x}_*)\,\mathbf{x}_* + \sigma^2 = \sigma^2\left( 1 + \mathbf{x}_*'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_* \right)$$

Confidence Intervals on the Fitted y-value
Fitted multiple linear regression line: $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$.
Consider a data point $\mathbf{x} = (1, x_1, \ldots, x_p)'$ within the data set. We would like to obtain (i) a point estimate $\hat{y}$ and (ii) a confidence interval for $\hat{y}$.
Point estimate for $y$: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p = \mathbf{x}'\hat{\boldsymbol{\beta}}$.
A 100(1-$\alpha$)% confidence interval for $\hat{y}$: $\hat{y} \pm t_{n-p-1, \alpha/2}\, se_{fit}(\hat{y})$,
where $se_{fit}(\hat{y} \mid X = \mathbf{x}) = \hat{\sigma}\sqrt{\mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}$ is the estimated standard error of the fitted value $\hat{y} \mid X = \mathbf{x}$.
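In R, both intervals are available from predict(); a minimal sketch with simulated data (all values are assumptions for illustration):

set.seed(5)
x <- rnorm(25); y <- 1 + 0.5*x + rnorm(25)
fit <- lm(y ~ x)
new <- data.frame(x = 1.2)                   # the new point x*
predict(fit, new, interval = "confidence")   # CI for the fitted value at x*
predict(fit, new, interval = "prediction")   # wider PI for a new response y*

The prediction interval is always wider than the confidence interval at the same point, reflecting the extra "+1" (the variance of a new error $e$) under the square root.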
Chapter Summary
- OLS estimates: $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, $\hat{\sigma}^2 = RSS/(n-p-1)$
- Distribution of the estimates: when $n$ is large,
$$\hat{\boldsymbol{\beta}} \sim N_{p+1}\left( \boldsymbol{\beta},\, \sigma^2(\mathbf{X}'\mathbf{X})^{-1} \right), \qquad \frac{(n-p-1)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p-1}$$
- MLE: $\tilde{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, $\tilde{\sigma}^2 = RSS/n$
- ANOVA: $H_0$: Model 1 ($k$ terms) vs $H_1$: Model 2 ($p$ terms). Under $H_0$,
$$F_0 = \frac{(RSS_1 - RSS_2)/(p - k)}{RSS_2/(n-p-1)} \sim F_{p-k,\, n-p-1}$$
- Test for $H_0: \beta_k = \beta_k^*$: under $H_0$, $t_0 = \dfrac{\hat{\beta}_k - \beta_k^*}{\sqrt{\hat{\sigma}^2 V_k}} \sim t_{n-p-1}$
- A 100(1-$\alpha$)% prediction interval for $y_*$: $\mathbf{x}_*'\hat{\boldsymbol{\beta}} \pm t_{n-p-1, \alpha/2}\, \hat{\sigma}\sqrt{1 + \mathbf{x}_*'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_*}$