Course outline:
- Review: Week 1
- Probability: Weeks 2–4
- Estimation: Weeks 5–8
- Hypothesis testing: Weeks 9–10
- Linear regression: Weeks 11–12
- Video lectures: Weeks 1–5 (VL)
Linear regression:
- Idea;
- Estimation using the LSE (the BLUE estimator and its relation to the MLE);
- Partition of the variability of the variable;
- Testing: i) slope; ii) intercept; iii) regression line; iv) correlation coefficient;
- Matrix notation;
- LSE estimates;
- Tests;
- R-squared and adjusted R-squared.
Appendix
Simple linear regression in matrix form

The design matrix is:

\[
X = \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1,p-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{n,p-1}
\end{bmatrix}.
\]
Predicted values are:

\[
\hat{y} = X\hat{\beta}, \qquad \hat{\beta} = \left(X^\top X\right)^{-1} X^\top y.
\]

The residuals are:

\[
\hat{\varepsilon} = y - \hat{y} = y - X\hat{\beta}.
\]
The LSE minimizes the sum of squared residuals:

\[
\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = (y - X\beta)^\top (y - X\beta).
\]

Expanding,

\[
(y - X\beta)^\top (y - X\beta) = y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta,
\]

and setting the derivative with respect to $\beta$ to zero:

\[
0 = -2X^\top y + X^\top X\beta + X^\top X\beta = -2X^\top y + 2X^\top X\beta,
\]

so that $X^\top X \hat{\beta} = X^\top y$, and hence

\[
\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y.
\]
\[
\hat{\beta} = \left(X^\top X\right)^{-1} X^\top Y.
\]

The corresponding vector of fitted (or predicted) values of $y$ is:

\[
\hat{Y} = X\hat{\beta},
\]

and the vector of residuals:

\[
\hat{\varepsilon} = Y - X\hat{\beta} = Y - \hat{Y}
\]

gives the differences between the observed and fitted values.
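As a quick numerical check, the estimator $\hat\beta = (X^\top X)^{-1}X^\top Y$, the fitted values, and the residuals can be computed directly from the normal equations. A minimal NumPy sketch; the data below are invented purely for illustration:

```python
import numpy as np

# Illustrative data: n = 5 observations, one covariate plus an intercept column.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])   # design matrix [1_n, x]

# Normal equations: (X'X) beta_hat = X'y (solving is more stable than inverting)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_fitted = X @ beta_hat        # fitted values Y_hat = X beta_hat
residuals = y - y_fitted       # residuals eps_hat = Y - Y_hat

print(beta_hat)                # [intercept estimate, slope estimate]
```

A useful by-product of the normal equations is that the residuals are orthogonal to every column of $X$, which the test below exploits.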
The model is

\[
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \qquad \text{for } i = 1, 2, \ldots, n,
\]

with

\[
\underset{[n\times 1]}{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\underset{[p\times 1]}{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}, \quad
\underset{[n\times 1]}{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}, \quad
\underset{[n\times p]}{X} = \begin{bmatrix}
1 & x_{11} & \cdots & x_{1,p-1} \\
1 & x_{21} & \cdots & x_{2,p-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \cdots & x_{n,p-1}
\end{bmatrix}.
\]

Write the model in matrix form as follows:

\[
\underset{[n\times 1]}{Y} = \underset{[n\times p]}{X}\,\underset{[p\times 1]}{\beta} + \underset{[n\times 1]}{\varepsilon}.
\]
Introduction

To apply linear regression properly:
- Effects of the covariates (explanatory variables) must be additive;
- The variance must be homoskedastic (constant); otherwise, use the AutoRegressive Conditional Heteroskedasticity (ARCH) model of Robert Engle (2003 Nobel Prize in Economics);
- Errors must be independent of the explanatory variables, with mean zero (weak assumptions);
- Errors must be Normally distributed, and hence symmetric (needed only for testing, i.e., the strong assumptions).
Linear models

The first model can be written in matrix-vector form as:

\[
y = \begin{bmatrix}
1 & x_1 \\
1 & x_2 \\
1 & x_3 \\
\vdots & \vdots \\
1 & x_n
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \varepsilon
= \underbrace{[\,\mathbf{1}_n \;\; x\,]}_{X}\,\beta + \varepsilon.
\]
Linear models

The second model (the cubic) given above can be written in matrix-vector form as:

\[
y = \underbrace{\begin{bmatrix}
1 & x_1 & x_1^2 & x_1^3 \\
1 & x_2 & x_2^2 & x_2^3 \\
1 & x_3 & x_3^2 & x_3^3 \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_n & x_n^2 & x_n^3
\end{bmatrix}}_{X}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix} + \varepsilon.
\]
The third model (three groups with two observations each) can be written as:

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix}
= \underbrace{\begin{bmatrix}
1 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 1
\end{bmatrix}}_{X}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \varepsilon.
\]
Marginal effects

Assume that we have the multiple regression model of the form:

\[
y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1} + \varepsilon.
\]

Assume that $x_k$ is a continuous variable, so that if we increase it by one unit while holding the values of the other variables fixed, the value of $y$ becomes:

\[
y^{\text{new}} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k (x_k + 1) + \cdots + \beta_{p-1} x_{p-1} + \varepsilon.
\]

Since $E[\varepsilon] = 0$, the marginal effect of $x_k$ is:

\[
\beta_k = E[y^{\text{new}}] - E[y],
\]

and is therefore the expected increase (or decrease) in the value of $y$ whenever you increase the value of $x_k$ by one unit.
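The marginal-effect interpretation can be verified numerically: with the error term at its mean of zero, raising $x_k$ by one unit while holding the other covariates fixed shifts the expected response by exactly $\beta_k$. A small sketch with made-up coefficients and data:

```python
import numpy as np

# Hypothetical coefficients beta = (beta_0, beta_1, beta_2) and one observation.
beta = np.array([1.5, 2.0, -0.7])
x = np.array([1.0, 3.0, 2.0])          # leading 1 stands in for the intercept

y_old = beta @ x                        # E[y], since E[eps] = 0 drops the error
x_new = x + np.array([0.0, 1.0, 0.0])   # raise x_1 by one unit, hold x_2 fixed
y_new = beta @ x_new                    # E[y_new]

print(y_new - y_old)                    # equals beta_1 (here 2.0, up to rounding)
```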
Assumptions

The residual terms $\varepsilon_i$ satisfy the following:

\[
E[\varepsilon_i \mid X = x] = 0, \qquad \text{for } i = 1, 2, \ldots, n;
\]
\[
\operatorname{Var}(\varepsilon_i \mid X = x) = \sigma^2, \qquad \text{for } i = 1, 2, \ldots, n;
\]
\[
\operatorname{Cov}(\varepsilon_i, \varepsilon_j \mid X = x) = 0, \qquad \text{for all } i \neq j.
\]
The variance of $\hat{\beta}_k$ is

\[
\operatorname{Var}\bigl(\hat{\beta}_k\bigr) = \sigma^2 c_{kk},
\]

where $c_{kk}$ is the $k$th diagonal element of $(X^\top X)^{-1}$, with estimated standard error

\[
se\bigl(\hat{\beta}_k\bigr) = s\sqrt{c_{kk}}.
\]

Under the normality (strong) assumption, we have:

\[
\frac{\hat{\beta}_k - \beta_k}{se\bigl(\hat{\beta}_k\bigr)} \sim t(n - p).
\]

A $100(1 - \alpha)\%$ confidence interval for $\beta_k$ is given by:

\[
\hat{\beta}_k \pm t_{1-\alpha/2,\, n-p}\; se\bigl(\hat{\beta}_k\bigr).
\]
For $H_0: \beta_k = \beta_{k0}$ vs. $H_1: \beta_k > \beta_{k0}$, reject $H_0$ when

\[
T = \frac{\hat{\beta}_k - \beta_{k0}}{se\bigl(\hat{\beta}_k\bigr)} > t_{1-\alpha,\, n-p}.
\]

For $H_0: \beta_k = \beta_{k0}$ vs. $H_1: \beta_k < \beta_{k0}$, reject $H_0$ when

\[
T = \frac{\hat{\beta}_k - \beta_{k0}}{se\bigl(\hat{\beta}_k\bigr)} < -t_{1-\alpha,\, n-p}.
\]
Adjusted R-Squared

The coefficient of determination is:

\[
R^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}.
\]

In the simple linear regression model, the R-squared provides a descriptive measure of the success of the regressor variables in explaining the variation in the dependent variable.

The R-squared will always increase when additional regressor variables are added, even if the added regressor variables do not strongly influence the dependent variable.

An alternative is to correct it for the number of regressor variables present. Thus, we define the adjusted R-squared:

\[
R_a^2 = 1 - \frac{SSE/(n-p)}{SST/(n-1)} = 1 - \frac{s^2}{MST} = 1 - \frac{n-1}{n-p}\left(1 - R^2\right).
\]
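Both definitions, and the algebraic identity connecting them, are easy to verify numerically. A short sketch (the sums of squares are taken from the exercise later in these notes, with $n = 20$ and $p = 3$):

```python
# Sums of squares from the exercise: n = 20 observations, p = 3 coefficients.
n, p = 20, 3
SST, SSE = 53.82, 11.67

R2 = 1 - SSE / SST                                  # coefficient of determination
R2_adj = 1 - (SSE / (n - p)) / (SST / (n - 1))      # adjusted R-squared, defining form

# The shortcut form gives the same number:
R2_adj_alt = 1 - (n - 1) / (n - p) * (1 - R2)

print(round(R2, 3), round(R2_adj, 3))               # 0.783 0.758
```

Note that the penalty factor $(n-1)/(n-p)$ exceeds 1, so $R_a^2 < R^2$ whenever the model has more than one coefficient.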
The mean squared error is

\[
MSE = s^2 = \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{n - p}.
\]

CI for $\sigma^2$:

\[
\left( \frac{SSE}{\chi^2_{1-\alpha/2}(n-p)},\; \frac{SSE}{\chi^2_{\alpha/2}(n-p)} \right);
\]

CI for $\sigma$:

\[
\left( \sqrt{\frac{SSE}{\chi^2_{1-\alpha/2}(n-p)}},\; \sqrt{\frac{SSE}{\chi^2_{\alpha/2}(n-p)}} \right).
\]
ANOVA

Source      Sum of squares         Degrees of freedom   Mean square     F         p-value
Regression  SSM = Σᵢ (ŷᵢ − ȳ)²     DFM = p − 1          MSM = SSM/DFM   MSM/MSE   1 − F_DFM,DFE(F)
Error       SSE = Σᵢ (yᵢ − ŷᵢ)²    DFE = n − p          MSE = SSE/DFE
Total       SST = Σᵢ (yᵢ − ȳ)²     DFT = n − 1          MST = SST/DFT
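The table's bookkeeping, including the decomposition SST = SSM + SSE that holds for any model with an intercept, can be reproduced from raw data. A NumPy sketch on invented data:

```python
import numpy as np

# Made-up data: n = 6 observations, p = 2 coefficients (intercept + slope).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.2, 4.9, 6.1])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

SSM = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares, DFM = p - 1
SSE = np.sum((y - y_hat) ** 2)          # error sum of squares,      DFE = n - p
SST = np.sum((y - y.mean()) ** 2)       # total sum of squares,      DFT = n - 1

F = (SSM / (p - 1)) / (SSE / (n - p))   # the F ratio MSM/MSE from the table
print(F)
```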
Summary of the regression quantities:

- $\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y$;
- $R^2 = 1 - SSE/SST$, with $R = \sqrt{R^2}$;
- $R_a^2 = 1 - \dfrac{SSE/(n-p)}{SST/(n-1)}$, with $R_a = \sqrt{R_a^2}$;
- $se\bigl(\hat{\beta}_k\bigr) = \sqrt{\operatorname{Cov}(\hat{\beta})_{kk}}$;
- $t = \hat{\beta}_k / se\bigl(\hat{\beta}_k\bigr)$, with p-value $2\bigl(1 - T_{n-p}(|t|)\bigr)$;
- $CI\bigl(\hat{\beta}_k\bigr) = \hat{\beta}_k \pm t_{1-\alpha/2}(n-p)\, se\bigl(\hat{\beta}_k\bigr)$.

Covariance matrix:

\[
\operatorname{Cov}\bigl(\hat{\beta}\bigr) = s^2 \left(X^\top X\right)^{-1}.
\]
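These summary quantities chain together directly: the covariance matrix $s^2 (X^\top X)^{-1}$ yields the standard errors, which in turn yield the t statistics. A NumPy sketch on invented data:

```python
import numpy as np

# Invented data: n = 8 observations, intercept plus one covariate (p = 2).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.0, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

s2 = resid @ resid / (n - p)        # s^2 = SSE / (n - p)
cov_beta = s2 * XtX_inv             # Cov(beta_hat) = s^2 (X'X)^{-1}
se = np.sqrt(np.diag(cov_beta))     # se(beta_hat_k): sqrt of kth diagonal entry
t_stats = beta_hat / se             # t statistics for H0: beta_k = 0

print(t_stats)
```

For $p = 2$ the slope's diagonal entry collapses to $s^2 / \sum_i (x_i - \bar{x})^2$, which the test checks against the matrix computation.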
Exercise regression

Given is the following linear regression:

\[
Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i.
\]

For our sample with 20 observations we have $\sum_{i=1}^{20} (y_i - \bar{y})^2 = 53.82$ and:

\[
\sum_{i=1}^{20} \hat{\varepsilon}_i^2 = 11.67.
\]
Exercise regression

a. Solution: $s^2 = \sum_{i=1}^{20} \hat{\varepsilon}_i^2 / (n - p) = 11.67/17 = 0.69$.

c. Solution: for $D = \beta_1 - \beta_2$,

\[
\operatorname{Var}\bigl(\hat{D}\bigr) = s^2\, (0, 1, -1) \left(X^\top X\right)^{-1} (0, 1, -1)^\top = 0.69 \times 0.22 = 0.151.
\]
Exercise regression

c. Solution (cont.): $se(\hat{D}) = \sqrt{\operatorname{Var}(\hat{D})} = \sqrt{0.151} = 0.389$.

F&T page 163: $t_{0.975}(17) = 2.110$, thus the 95% CI for $\beta_1 - \beta_2$ is:

\[
\left(\hat{\beta}_1 - \hat{\beta}_2 - t_{0.975}(17)\, se(\hat{D}),\; \hat{\beta}_1 - \hat{\beta}_2 + t_{0.975}(17)\, se(\hat{D})\right) = (-0.84, 0.80).
\]

d. Solution: SST = 53.82; SSE = 11.67; SSM = 42.14; MSM = 42.14/2 = 21.07; MSE = 11.67/17 = 0.687; F = 21.07/0.687 = 30.68.

$F_{0.01}(2, 17) = 6.112$, thus $X_1$ and $X_2$ are jointly significant even for $\alpha = 0.01$.
Variables: SALARY, DRAFT, YRSEXP, PLAYED, STARTED, CITYPOP, OFFBACK.

Correlations:

          SALARY   DRAFT   YRSEXP  PLAYED  STARTED  CITYPOP
DRAFT     -0.454
YRSEXP     0.345   -0.059
PLAYED     0.212   -0.108   0.646
STARTED    0.440   -0.253   0.557   0.633
CITYPOP    0.077    0.126   0.129   0.193   0.178
OFFBACK    0.179   -0.209  -0.050  -0.043  -0.081   -0.067
ANOVA Table

Source      Degrees of freedom   Sum of Squares   Mean Squares        F-Ratio   Prob(> F)
Regression  p − 1                SSM              MSM = SSM/(p − 1)   MSM/MSE   p-value
Error       n − p                SSE              MSE = SSE/(n − p)
Total       n − 1                SST              MST = SST/(n − 1)

\[
R^2 = \frac{SSM}{SST}.
\]
\[
R_a^2 = 1 - \frac{SSE/(n-p)}{SST/(n-1)} = 1 - \frac{s^2}{S_y^2}.
\]

\[
s = \sqrt{s^2} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-p}}.
\]

It gives the average deviation of the actual $y$ against that predicted by the regression equation.

\[
F = \frac{MSM}{MSE}.
\]
Coef        SE Coef    T       p
361663      43734      8.17    0.000
-19139      3674       -5.21   0.000
21301       6370       3.34    0.001
-7948       3281       -2.42   0.017
12965       3189       4.07    0.000
-0.000699   0.003176   -0.22   0.826
82941       38241      2.17    0.032
ANOVA Table:

Analysis of Variance
SOURCE       DF    SS            MS            F       p
Regression   6     4.20463E+12   7.00772E+11   16.87   0.000
Error        162   6.72970E+12   41541379329
Total        168   1.09343E+13
Coef         SE Coef      T        p
11.7509      0.0814       144.42   0.000
0.07332      0.01471      4.98     0.000
-0.009815    0.007607     -1.29    0.199
0.026380     0.007596     3.47     0.001
0.00000001   0.00000001   0.70     0.482
0.18741      0.08691      2.16     0.033
0.9334       0.1242       7.52     0.000
Analysis of Variance
SOURCE       DF    SS        MS       F       p
Regression   6     43.3145   7.2191   32.50   0.000
Error        162   35.9891   0.2222
Total        168   79.3035
For the simple linear regression ($p = 2$):

\[
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.
\]

Hence

\[
X^\top X = \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix},
\]

and

\[
\left(X^\top X\right)^{-1} = \frac{1}{\underbrace{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}_{=\, n \sum_{i=1}^{n} (x_i - \bar{x})^2}}
\begin{bmatrix} \sum_{i=1}^{n} x_i^2 & -\sum_{i=1}^{n} x_i \\ -\sum_{i=1}^{n} x_i & n \end{bmatrix}.
\]

Thus:

\[
X^\top y = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix}.
\]

Hence

\[
\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix}
= \left(X^\top X\right)^{-1} X^\top y
= \frac{1}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}
\begin{bmatrix}
\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i \\[4pt]
n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i
\end{bmatrix}.
\]
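The closed-form expressions for $\hat\beta_0$ and $\hat\beta_1$ translate directly into code, and the slope agrees with the usual mean-centred formula $\hat\beta_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$. A NumPy sketch on invented data:

```python
import numpy as np

# Invented data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 4.1, 6.0, 8.2, 9.9])
n = len(x)

# Shared denominator: n * sum(x^2) - (sum(x))^2 = n * sum((x - xbar)^2)
denom = n * np.sum(x**2) - np.sum(x)**2
b0 = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / denom
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom

# Equivalent mean-centred form of the slope:
b1_alt = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
print(b0, b1)
```

The intercept also satisfies $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$, which is a convenient cross-check.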