Lecture 2: Ordinary Least Squares and the Simple Linear Regression Model
Key observation: If we plot $n$ observations $(Y_i, X_i)$, $i = 1, \ldots, n$, on a dependent variable $Y$ and an independent variable $X$ in a scatterplot, then the points $(X_i, Y_i)$ are not on any straight line
$$Y = \alpha + \beta X$$
Econ 414
[Figure: scatterplot of HOUSEPRICE (vertical axis, $Y$) against SQUAREFT (horizontal axis, $X$), with the straight line $Y = \alpha + \beta X$.]
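How do we find the hidden straight line if there are omitted variables?

Ordinary Least Squares (OLS) method: compute the residuals for $i = 1, \ldots, n$
$$u_i = Y_i - \alpha - \beta X_i$$
and the sum of squared residuals
$$S(\alpha, \beta) = \sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2$$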
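The sum of squared residuals can be evaluated directly for any candidate line. The sketch below is only an illustration: the function name `sum_squared_residuals` and the four data points are made up for this example and do not come from the lecture.

```python
def sum_squared_residuals(alpha, beta, x, y):
    """Return S(alpha, beta) = sum over i of (Y_i - alpha - beta * X_i)^2."""
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))

# Made-up illustrative data (assumption, not the HOUSEPRICE data).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.0, 6.0, 9.0]

# Two candidate lines: Y = 0 and Y = 0.5 + 2 X.
s_flat = sum_squared_residuals(0.0, 0.0, x, y)
s_line = sum_squared_residuals(0.5, 2.0, x, y)
print(s_flat, s_line)
```

A smaller value of S means the candidate line lies closer to the points, which is why OLS picks the pair of coefficients with the smallest S.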
The sum of squared residuals depends on $\alpha, \beta$ (as is obvious from the figure). Minimize $S(\alpha, \beta)$ with respect to the coefficients $\alpha, \beta$.
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n \qquad (1)$$
$$\sum_{i=1}^{n} c = c + c + \cdots + c = nc \qquad (2)$$
Squares
$$(a + b + c)^2 = a^2 + b^2 + c^2 + 2ab + 2bc + 2ac \qquad (3)$$
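OLS solution

By (3), with $a = Y_i$, $b = -\alpha$ and $c = -\beta X_i$,
$$\begin{aligned}
S(\alpha, \beta) &= \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2 \\
&= \sum_{i=1}^{n} \left( Y_i^2 + \alpha^2 + \beta^2 X_i^2 - 2\alpha Y_i - 2\beta X_i Y_i + 2\alpha\beta X_i \right) \\
&= \sum_{i=1}^{n} Y_i^2 + n\alpha^2 + \beta^2 \sum_{i=1}^{n} X_i^2 - 2\alpha \sum_{i=1}^{n} Y_i - 2\beta \sum_{i=1}^{n} X_i Y_i + 2\alpha\beta \sum_{i=1}^{n} X_i
\end{aligned}$$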
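First take $\alpha = 0$. Then
$$S(\beta) = \sum_{i=1}^{n} Y_i^2 + \beta^2 \sum_{i=1}^{n} X_i^2 - 2\beta \sum_{i=1}^{n} X_i Y_i = A + B\beta^2 + C\beta$$
with $A, B, C$ constants. The graph of such a function (if $B > 0$) is a parabola with a minimum (see figure).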
[Figure: graph of the parabola $A + B\beta^2 + C\beta$.]
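Math reminder
$$\frac{dA}{d\beta} = 0, \qquad \frac{d(C\beta)}{d\beta} = C\,\frac{d\beta}{d\beta} = C, \qquad \frac{d(B\beta^2)}{d\beta} = B\,\frac{d\beta^2}{d\beta} = 2B\beta$$
Using this, and the fact that the slope is 0 in the minimum,
$$\frac{dS(\beta)}{d\beta} = 2B\beta + C = 0$$
or
$$\hat\beta = -\frac{C}{2B}$$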
Now
$$B = \sum_{i=1}^{n} X_i^2, \qquad C = -2 \sum_{i=1}^{n} X_i Y_i$$
and
$$\hat\beta = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$$
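The one-parameter case can be checked numerically. The following is a minimal sketch under the assumption that the line has no intercept, so only the slope is chosen. The helper names `slope_through_origin` and `s_of_beta` and the data are hypothetical.

```python
def slope_through_origin(x, y):
    """Minimize S(beta) = A + B*beta^2 + C*beta for the line Y = beta * X."""
    B = sum(xi * xi for xi in x)                     # B = sum of X_i^2
    C = -2.0 * sum(xi * yi for xi, yi in zip(x, y))  # C = -2 * sum of X_i * Y_i
    if B == 0:
        raise ValueError("all X_i are zero, so S(beta) has no unique minimum")
    return -C / (2.0 * B)                            # same as sum(X*Y) / sum(X^2)

def s_of_beta(beta, x, y):
    """Sum of squared residuals for the line Y = beta * X."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))

# Made-up illustrative data (assumption).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.0, 6.0, 9.0]

beta_hat = slope_through_origin(x, y)
s_min = s_of_beta(beta_hat, x, y)
# Moving away from beta_hat in either direction increases S(beta).
print(beta_hat, s_min < s_of_beta(beta_hat - 0.1, x, y), s_min < s_of_beta(beta_hat + 0.1, x, y))
```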
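In the general case
$$S(\alpha, \beta) = \sum_{i=1}^{n} Y_i^2 + n\alpha^2 + \beta^2 \sum_{i=1}^{n} X_i^2 - 2\alpha \sum_{i=1}^{n} Y_i - 2\beta \sum_{i=1}^{n} X_i Y_i + 2\alpha\beta \sum_{i=1}^{n} X_i$$
This is a quadratic function in $\alpha, \beta$. Its graph has the same shape as a parabola, but in 3D. In the minimum the slope is again 0.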
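Now we have a slope in the $\alpha$-direction and a slope in the $\beta$-direction. These are the partial derivatives with respect to $\alpha$ and $\beta$:
$$\frac{\partial S(\alpha, \beta)}{\partial \alpha} \quad \text{and} \quad \frac{\partial S(\alpha, \beta)}{\partial \beta}$$
They are computed like ordinary derivatives, except that the other argument is kept constant: for the slope/derivative in the $\alpha$-direction we keep $\beta$ constant and for the slope/derivative in the $\beta$-direction we keep $\alpha$ constant. With
$$S(\alpha, \beta) = \sum_{i=1}^{n} Y_i^2 + n\alpha^2 + \beta^2 \sum_{i=1}^{n} X_i^2 - 2\alpha \sum_{i=1}^{n} Y_i - 2\beta \sum_{i=1}^{n} X_i Y_i + 2\alpha\beta \sum_{i=1}^{n} X_i$$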
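we have
$$\frac{\partial S(\alpha, \beta)}{\partial \alpha} = 2n\alpha - 2\sum_{i=1}^{n} Y_i + 2\beta \sum_{i=1}^{n} X_i$$
$$\frac{\partial S(\alpha, \beta)}{\partial \beta} = -2\sum_{i=1}^{n} X_i Y_i + 2\alpha \sum_{i=1}^{n} X_i + 2\beta \sum_{i=1}^{n} X_i^2$$
In the minimum the slope in both directions is 0. Hence the minimizing values $\hat\alpha, \hat\beta$ satisfy
$$2n\hat\alpha - 2\sum_{i=1}^{n} Y_i + 2\hat\beta \sum_{i=1}^{n} X_i = 0$$
$$-2\sum_{i=1}^{n} X_i Y_i + 2\hat\alpha \sum_{i=1}^{n} X_i + 2\hat\beta \sum_{i=1}^{n} X_i^2 = 0$$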
These are two (linear) equations in the two unknowns $\hat\alpha, \hat\beta$, called the normal equations.
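Because the normal equations are two linear equations in two unknowns, they can be solved mechanically. The sketch below divides both equations by 2, writes them as n·α + β·ΣX = ΣY and α·ΣX + β·ΣX² = ΣXY, and applies Cramer's rule. The function name `solve_normal_equations` and the data are assumptions made for illustration.

```python
def solve_normal_equations(x, y):
    """Solve  n*a + b*sum(X) = sum(Y)  and  a*sum(X) + b*sum(X^2) = sum(X*Y)."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx          # determinant of the 2x2 coefficient matrix
    if det == 0:
        raise ValueError("all X_i are equal, so there is no unique solution")
    alpha_hat = (sy * sxx - sx * sxy) / det
    beta_hat = (n * sxy - sx * sy) / det
    return alpha_hat, beta_hat

# Made-up illustrative data (assumption).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.0, 6.0, 9.0]
alpha_hat, beta_hat = solve_normal_equations(x, y)
print(alpha_hat, beta_hat)
```

The determinant is zero exactly when all the observations of X are equal, in which case no unique line is determined.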
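Solving the first normal equation for $\hat\alpha$ gives
$$\hat\alpha = \bar{Y} - \hat\beta \bar{X}$$
with $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ the sample averages, and substituting this into the second normal equation gives
$$\hat\beta \left( \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \right) = \sum_{i=1}^{n} X_i Y_i - \frac{1}{n} \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i$$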
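Now
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i^2 - 2X_i\bar{X} + \bar{X}^2) = \sum_{i=1}^{n} X_i^2 - 2n\bar{X}^2 + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2$$
and
$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} = \sum_{i=1}^{n} X_i Y_i - \frac{1}{n} \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i$$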
so that
$$\hat\beta = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
Hence the values of $\alpha, \beta$ that minimize the sum of squared residuals are
$$\hat\beta = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$\hat\alpha = \bar{Y} - \hat\beta \bar{X}$$
These are the Ordinary Least Squares (OLS) solutions to the problem of fitting a straight line to the points in a scatterplot. The least squares line
$$\hat{Y} = \hat\alpha + \hat\beta X$$
is the straight line that fits the scatterplot best (see figure).
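The closed-form OLS solution translates into a few lines of code. This is a minimal sketch in plain Python. The function name `ols_fit` and the data are hypothetical, and it assumes the X observations are not all equal.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) from the deviations-from-means formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all X_i are equal, so the slope is not determined")
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Made-up illustrative data (assumption).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.0, 6.0, 9.0]
alpha_hat, beta_hat = ols_fit(x, y)
print(f"least squares line: Y = {alpha_hat} + {beta_hat} X")
```

For this data the sample averages are 2.5 and 5.5, the cross-product sum is 10 and the sum of squared deviations of X is 5, so the slope is 2 and the intercept is 0.5.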
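The residuals with respect to the least squares line are the OLS residuals
$$e_i = Y_i - \hat\alpha - \hat\beta X_i$$
By the normal equations
$$0 = \sum_{i=1}^{n} Y_i - n\hat\alpha - \hat\beta \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta X_i) = \sum_{i=1}^{n} e_i$$
$$0 = \sum_{i=1}^{n} X_i Y_i - \hat\alpha \sum_{i=1}^{n} X_i - \hat\beta \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i (Y_i - \hat\alpha - \hat\beta X_i) = \sum_{i=1}^{n} X_i e_i$$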
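Hence
$$\sum_{i=1}^{n} e_i = \frac{1}{n} \sum_{i=1}^{n} e_i = 0$$
$$\sum_{i=1}^{n} X_i e_i = \frac{1}{n} \sum_{i=1}^{n} X_i e_i = 0$$
In words: The sample average (and sum) of the OLS residuals is 0, and the sample covariance of these residuals and $X$ is also 0 (because the residuals average to 0, this covariance equals $\frac{1}{n} \sum_{i=1}^{n} X_i e_i$).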
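Both residual properties are easy to verify numerically. The sketch below refits the line on made-up data and checks the two sums. The helper name `ols_fit` and the data are assumptions for illustration only.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) from the deviations-from-means formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat

# Made-up illustrative data (assumption).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.0, 6.0, 9.0]
alpha_hat, beta_hat = ols_fit(x, y)

# OLS residuals e_i = Y_i - alpha_hat - beta_hat * X_i.
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]

sum_e = sum(residuals)                                  # should be 0
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))   # should be 0
print(residuals, sum_e, sum_xe)
```

With other data the two sums may differ from 0 by floating-point rounding error, so a comparison against a small tolerance is safer than an exact equality test.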
These are all consequences of the fact that we minimize the sum of squared residuals.

How good is the fit of the straight line to the scatterplot? Define the fitted value
$$\hat{Y}_i = \hat\alpha + \hat\beta X_i$$
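Then
$$\sum_{i=1}^{n} e_i \hat{Y}_i = \sum_{i=1}^{n} e_i (\hat\alpha + \hat\beta X_i) = \hat\alpha \sum_{i=1}^{n} e_i + \hat\beta \sum_{i=1}^{n} e_i X_i = 0$$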
Note that this implies that the sample covariance between the OLS residuals and the OLS fitted values is 0: OLS decomposes $Y_i$ into two parts (residual and fitted value) that have covariance 0, i.e. are uncorrelated. Using this we find
$$\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \frac{1}{n} \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \frac{1}{n} \sum_{i=1}^{n} e_i^2$$
The sample variance of $Y$ is equal to the sum of the sample variance of $\hat{Y}$ and the sample variance of $e$, or

Total variance = Explained variance + Unexplained variance
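The step behind the variance decomposition can be written out as follows. This is a sketch of the standard argument, using only $Y_i = \hat{Y}_i + e_i$ together with the two results $\sum e_i = 0$ and $\sum e_i \hat{Y}_i = 0$.

```latex
\begin{align*}
\sum_{i=1}^{n} (Y_i - \bar{Y})^2
  &= \sum_{i=1}^{n} \bigl( (\hat{Y}_i - \bar{Y}) + e_i \bigr)^2 \\
  &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2
     + 2 \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y}) e_i \\
  &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2
     + 2 \Bigl( \sum_{i=1}^{n} \hat{Y}_i e_i - \bar{Y} \sum_{i=1}^{n} e_i \Bigr) \\
  &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2
\end{align*}
```

Dividing both sides by $n$ gives the decomposition. Because the residuals sum to 0, the sample average of the fitted values equals $\bar{Y}$, so the first term on the right is indeed $n$ times the sample variance of $\hat{Y}$.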
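Define the $R^2$ as
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$
This is the fraction of the total variance that is explained by the fitted straight line. Note that $R^2 = 1$ if and only if $\hat{Y}_i = Y_i = \hat\alpha + \hat\beta X_i$ for all $i$, i.e. if all observations are on the straight line.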
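As an illustration, the $R^2$ and the variance decomposition can be computed together. The sketch below uses made-up data and a hypothetical helper `r_squared`. It assumes that neither the X nor the Y observations are all equal, so no denominator is 0.

```python
def r_squared(x, y):
    """Return (R^2, total, explained, unexplained) sums of squares for the OLS line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                / sum((xi - x_bar) ** 2 for xi in x))
    alpha_hat = y_bar - beta_hat * x_bar
    fitted = [alpha_hat + beta_hat * xi for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)                       # sum (Y_i - Ybar)^2
    explained = sum((fi - y_bar) ** 2 for fi in fitted)              # sum (Yhat_i - Ybar)^2
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # sum e_i^2
    return explained / total, total, explained, unexplained

# Made-up illustrative data (assumption).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.0, 6.0, 9.0]
r2, total, explained, unexplained = r_squared(x, y)
print(r2, total, explained + unexplained)
```

For this data the total sum of squares is 21, of which 20 is explained and 1 is unexplained, so the $R^2$ is 20/21.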
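Also $R^2 = 0$ if and only if $\hat{Y}_i = \bar{Y}$ for all $i$, i.e. $\hat\alpha + \hat\beta X_i = \bar{Y}$, or, using $\hat\alpha = \bar{Y} - \hat\beta \bar{X}$,
$$\hat\beta (\bar{X} - X_i) = 0$$
for all $i = 1, \ldots, n$. If the $X_i$ are not all equal, then this can only be the case if $\hat\beta = 0$. In that case $X$ does not help in explaining $Y$.

These are the extreme values for $R^2$. We have
$$0 \le R^2 \le 1$$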