
Econ 414

Lecture 2: Ordinary Least Squares and the Simple Linear Regression Model
Key observation: If we plot n observations
$$(Y_i, X_i), \quad i = 1, \ldots, n$$
on a dependent variable Y and an independent variable X in a scatterplot, then the points $(X_i, Y_i)$ are not on any straight line
$$Y = \alpha + \beta X$$

Reason: Omitted variables that affect Y (besides X).

The contribution of the omitted variables to Y is the residual u
$$u = Y - \alpha - \beta X$$

See the figure for an illustration.

[Figure: scatterplot "HOUSEPRICE vs. SQUAREFT", with HOUSEPRICE (Y) on the vertical axis and SQUAREFT (X) on the horizontal axis, together with a straight line $Y = \alpha + \beta X$]

How do we find the hidden straight line if there are omitted variables?

Ordinary Least Squares (OLS) method

Compute the residuals for $i = 1, \ldots, n$
$$u_i = Y_i - \alpha - \beta X_i$$

Compute the sum of squared residuals
$$S(\alpha, \beta) = \sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2$$

The sum of squared residuals depends on $\alpha, \beta$ (as is obvious from the figure).

Minimize $S(\alpha, \beta)$ with respect to the coefficients $\alpha, \beta$.

The last step is not obvious and will be justified later.
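The two computational steps (residuals, then their sum of squares) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sum_squared_residuals` and the four data points are made up for the example and do not come from the lecture.

```python
def sum_squared_residuals(alpha, beta, xs, ys):
    """Return S(alpha, beta) = sum over i of (Y_i - alpha - beta * X_i)**2."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (not the house price data from the figure).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

# Two candidate lines: S differs, so S really is a function of (alpha, beta).
s_first = sum_squared_residuals(0.5, 1.4, xs, ys)
s_second = sum_squared_residuals(0.0, 1.0, xs, ys)
print(s_first, s_second)
```

The first candidate line passes much closer to the points than the second, and its sum of squared residuals is correspondingly smaller.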

Math reminder

Summation
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$

Hence
(1) $\sum_{i=1}^{n} c = c + c + \cdots + c = nc$
(2) $\sum_{i=1}^{n} c x_i = c x_1 + c x_2 + \cdots + c x_n = c \sum_{i=1}^{n} x_i$

Squares
(3) $(a + b + c)^2 = a^2 + b^2 + c^2 + 2ab + 2bc + 2ac$

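OLS solution
$$S(\alpha, \beta) = \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2 =$$
by (3)
$$= \sum_{i=1}^{n} \left( Y_i^2 + \alpha^2 + \beta^2 X_i^2 - 2\alpha Y_i - 2\beta X_i Y_i + 2\alpha\beta X_i \right) =$$
by (1) and (2)
$$= \sum_{i=1}^{n} Y_i^2 + n\alpha^2 + \beta^2 \sum_{i=1}^{n} X_i^2 - 2\alpha \sum_{i=1}^{n} Y_i - 2\beta \sum_{i=1}^{n} X_i Y_i + 2\alpha\beta \sum_{i=1}^{n} X_i$$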

Consider first the case $\alpha = 0$
$$S(\beta) = \sum_{i=1}^{n} Y_i^2 + \beta^2 \sum_{i=1}^{n} X_i^2 - 2\beta \sum_{i=1}^{n} X_i Y_i$$

This is an expression in $\beta$ of the form
$$A + B\beta^2 + C\beta$$

with A, B, C constants. The graph of such a function (if B > 0) is a parabola with a minimum (see figure).

[Figure: graph of the parabola $A + B\beta^2 + C\beta$ as a function of $\beta$]

How do we find the minimum? At the $\beta$ that minimizes $S(\beta)$ the slope of $S(\beta)$ is 0. The slope of $S(\beta)$ at $\beta$ is the derivative of $S(\beta)$
$$\frac{dS(\beta)}{d\beta}$$

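Math reminder
$$\frac{dA}{d\beta} = 0, \qquad \frac{d(C\beta)}{d\beta} = C\,\frac{d\beta}{d\beta} = C, \qquad \frac{d(B\beta^2)}{d\beta} = B\,\frac{d\beta^2}{d\beta} = 2B\beta$$

Using this
$$\frac{dS(\beta)}{d\beta} = 2B\beta + C$$

The value $\hat\beta$ of $\beta$ that minimizes $S(\beta)$ is the solution to
$$2B\hat\beta + C = 0$$

or
$$\hat\beta = -\frac{C}{2B}$$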

Now
$$B = \sum_{i=1}^{n} X_i^2, \qquad C = -2 \sum_{i=1}^{n} X_i Y_i$$

and
$$\hat\beta = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$$
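As a quick numerical illustration of the case $\alpha = 0$, the slope of the least squares line through the origin is the ratio of two sums. The sketch below assumes made-up data; the function name `slope_through_origin` is hypothetical and not part of the lecture.

```python
def slope_through_origin(xs, ys):
    """Slope that minimizes sum (Y_i - beta * X_i)**2, i.e. the case alpha = 0."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)  # this is the constant B, which must be > 0
    if sum_xx == 0:
        raise ValueError("all X values are 0; the parabola has no unique minimum")
    return sum_xy / sum_xx  # equals -C / (2B) with C = -2 * sum_xy


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.5]
beta_hat = slope_through_origin(xs, ys)
print(beta_hat)
```

Moving the slope slightly away from `beta_hat` in either direction increases the sum of squared residuals, which is what the parabola picture predicts.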

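Next, the case with $\alpha, \beta$
$$S(\alpha, \beta) = \sum_{i=1}^{n} Y_i^2 + n\alpha^2 + \beta^2 \sum_{i=1}^{n} X_i^2 - 2\alpha \sum_{i=1}^{n} Y_i - 2\beta \sum_{i=1}^{n} X_i Y_i + 2\alpha\beta \sum_{i=1}^{n} X_i$$

This is a quadratic function in $\alpha, \beta$. Its graph has the same shape as a parabola, but in 3D. At the minimum the slope is again 0.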

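Now we have a slope in the $\alpha$-direction and a slope in the $\beta$-direction. These are the partial derivatives with respect to $\alpha$ and $\beta$:
$$\frac{\partial S(\alpha, \beta)}{\partial \alpha} \quad \text{and} \quad \frac{\partial S(\alpha, \beta)}{\partial \beta}$$

They are computed like ordinary derivatives, except that the other argument is kept constant: for the slope/derivative in the $\alpha$-direction we keep $\beta$ constant, and for the slope/derivative in the $\beta$-direction we keep $\alpha$ constant. With
$$S(\alpha, \beta) = \sum_{i=1}^{n} Y_i^2 + n\alpha^2 + \beta^2 \sum_{i=1}^{n} X_i^2 - 2\alpha \sum_{i=1}^{n} Y_i - 2\beta \sum_{i=1}^{n} X_i Y_i + 2\alpha\beta \sum_{i=1}^{n} X_i$$

we have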

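$$\frac{\partial S(\alpha, \beta)}{\partial \alpha} = 2n\alpha - 2 \sum_{i=1}^{n} Y_i + 2\beta \sum_{i=1}^{n} X_i$$
$$\frac{\partial S(\alpha, \beta)}{\partial \beta} = -2 \sum_{i=1}^{n} X_i Y_i + 2\beta \sum_{i=1}^{n} X_i^2 + 2\alpha \sum_{i=1}^{n} X_i$$

$S(\alpha, \beta)$ is minimal at the value of $(\alpha, \beta)$ where the slope in both directions is 0. Hence, writing $\hat\alpha, \hat\beta$ for the minimizing values,
$$\frac{\partial S(\hat\alpha, \hat\beta)}{\partial \alpha} = 2n\hat\alpha - 2 \sum_{i=1}^{n} Y_i + 2\hat\beta \sum_{i=1}^{n} X_i = 0$$
$$\frac{\partial S(\hat\alpha, \hat\beta)}{\partial \beta} = -2 \sum_{i=1}^{n} X_i Y_i + 2\hat\beta \sum_{i=1}^{n} X_i^2 + 2\hat\alpha \sum_{i=1}^{n} X_i = 0$$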

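These are two (linear) equations in the two unknowns $\hat\alpha, \hat\beta$, called the normal equations.

Solution:
1. Solve the first equation for $\hat\alpha$
$$\hat\alpha = \frac{1}{n} \sum_{i=1}^{n} Y_i - \hat\beta\, \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{Y} - \hat\beta \bar{X}$$
with $\bar{Y}, \bar{X}$ the sample averages of Y, X.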

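2. Substitute this solution in the second equation
$$-\sum_{i=1}^{n} X_i Y_i + \hat\beta \sum_{i=1}^{n} X_i^2 + \left( \frac{1}{n} \sum_{i=1}^{n} Y_i - \hat\beta\, \frac{1}{n} \sum_{i=1}^{n} X_i \right) \sum_{i=1}^{n} X_i = 0$$

and
$$\hat\beta \left( \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2 \right) = \sum_{i=1}^{n} X_i Y_i - \frac{1}{n} \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i$$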
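The two substitution steps above solve a 2x2 linear system by hand. As a cross-check, the same system can be solved numerically. The sketch below uses Cramer's rule instead of substitution (a swapped-in but equivalent technique); the function name `solve_normal_equations` and the data are assumptions made for the example.

```python
def solve_normal_equations(xs, ys):
    """Solve the two normal equations for (alpha_hat, beta_hat).

    After dividing both equations by 2 they read
        n * alpha      + sum(X) * beta    = sum(Y)
        sum(X) * alpha + sum(X^2) * beta  = sum(X * Y)
    which is a 2x2 linear system, solved here with Cramer's rule.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # 0 when all X_i are equal
    if det == 0:
        raise ValueError("all X values are equal; the slope is not determined")
    alpha_hat = (sy * sxx - sx * sxy) / det
    beta_hat = (n * sxy - sx * sy) / det
    return alpha_hat, beta_hat


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
alpha_hat, beta_hat = solve_normal_equations(xs, ys)
print(alpha_hat, beta_hat)
```

Dividing the numerator and the determinant of the slope by n gives exactly the last equation of step 2.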

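Now
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} \left( X_i^2 - 2 X_i \bar{X} + \bar{X}^2 \right) = \sum_{i=1}^{n} X_i^2 - 2n\bar{X}^2 + n\bar{X}^2 =$$
$$= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2$$

and
$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} \left( X_i Y_i - \bar{Y} X_i - \bar{X} Y_i + \bar{X}\bar{Y} \right) =$$
$$= \sum_{i=1}^{n} X_i Y_i - n\bar{Y}\bar{X} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} = \sum_{i=1}^{n} X_i Y_i - \frac{1}{n} \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i$$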
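Both identities are easy to confirm numerically. The following sketch evaluates the left-hand and right-hand sides on made-up data; the variable names are chosen for the example only.

```python
# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# First identity: sum (X_i - X_bar)^2 = sum X_i^2 - (1/n) (sum X_i)^2
lhs_sq = sum((x - x_bar) ** 2 for x in xs)
rhs_sq = sum(x * x for x in xs) - sum(xs) ** 2 / n

# Second identity: sum (X_i - X_bar)(Y_i - Y_bar) = sum X_i Y_i - (1/n) sum X_i sum Y_i
lhs_cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
rhs_cross = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

print(lhs_sq, rhs_sq, lhs_cross, rhs_cross)
```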

Using these results we find for $\hat\beta$
$$\hat\beta = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

Hence the values of $\alpha, \beta$ that minimize the sum of squared residuals are
$$\hat\beta = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$\hat\alpha = \bar{Y} - \hat\beta \bar{X}$$

These are the Ordinary Least Squares (OLS) solutions to the problem of fitting a straight line to the points in a scatterplot. The least squares line
$$\hat{Y} = \hat\alpha + \hat\beta X$$

is the straight line that fits the scatterplot best (see figure).
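The two OLS formulas translate directly into code. The sketch below is a minimal pure-Python version under stated assumptions: the function name `ols_fit` and the data are made up for illustration, and no statistics library is used.

```python
def ols_fit(xs, ys):
    """Return (alpha_hat, beta_hat) of the least squares line for the points (X_i, Y_i)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are equal; the slope is not determined")
    beta_hat = sxy / sxx                  # slope
    alpha_hat = y_bar - beta_hat * x_bar  # intercept
    return alpha_hat, beta_hat


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
alpha_hat, beta_hat = ols_fit(xs, ys)
fitted = [alpha_hat + beta_hat * x for x in xs]  # points on the least squares line
print(alpha_hat, beta_hat, fitted)
```

The guard on `sxx` reflects the one case in which the formula for the slope breaks down: if every observation has the same X, the denominator is 0.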


Note: If we divide numerator and denominator by n or n - 1, then
$$\hat\beta = \frac{\text{Sample covariance of } X \text{ and } Y}{\text{Sample variance of } X}$$

From the OLS solution for $\hat\alpha$
$$\bar{Y} = \hat\alpha + \hat\beta \bar{X}$$

In words: The point $(\bar{X}, \bar{Y})$ is on the least squares line.
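Both remarks can be checked numerically. The sketch below computes the slope as a covariance-to-variance ratio with divisor n and with divisor n - 1, and then evaluates the least squares line at the sample average of X. The helper names and the data are assumptions made for the example.

```python
def mean(values):
    return sum(values) / len(values)


def cross_deviation_sum(us, vs):
    """Return sum over i of (u_i - u_bar) * (v_i - v_bar)."""
    u_bar, v_bar = mean(us), mean(vs)
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
n = len(xs)

sxy = cross_deviation_sum(xs, ys)
sxx = cross_deviation_sum(xs, xs)

# The divisor cancels, so n and n - 1 give the same slope.
beta_div_n = (sxy / n) / (sxx / n)
beta_div_n_minus_1 = (sxy / (n - 1)) / (sxx / (n - 1))

alpha_hat = mean(ys) - beta_div_n * mean(xs)
line_at_x_bar = alpha_hat + beta_div_n * mean(xs)  # should equal the average of Y
print(beta_div_n, beta_div_n_minus_1, line_at_x_bar, mean(ys))
```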

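The residuals with respect to the least squares line are the OLS residuals
$$e_i = Y_i - \hat\alpha - \hat\beta X_i$$

From the normal equations
$$0 = -n\hat\alpha + \sum_{i=1}^{n} Y_i - \hat\beta \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta X_i) = \sum_{i=1}^{n} e_i$$
$$0 = \sum_{i=1}^{n} X_i Y_i - \hat\beta \sum_{i=1}^{n} X_i^2 - \hat\alpha \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i (Y_i - \hat\alpha - \hat\beta X_i) = \sum_{i=1}^{n} X_i e_i$$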

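Hence
$$\sum_{i=1}^{n} e_i = \frac{1}{n} \sum_{i=1}^{n} e_i = 0$$
$$\sum_{i=1}^{n} X_i e_i = \frac{1}{n} \sum_{i=1}^{n} X_i e_i = 0$$

In words: The sample average (and sum) of the OLS residuals is 0, and the sample covariance of these residuals and X is also 0 (the sample covariance equals $\frac{1}{n} \sum_{i=1}^{n} X_i e_i - \bar{X}\bar{e}$, and both terms are 0).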
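These two properties hold for any data set, up to floating-point rounding. The sketch below fits the line on made-up data and then computes both sums; `ols_fit` is a hypothetical helper written for the example.

```python
def ols_fit(xs, ys):
    """Return (alpha_hat, beta_hat) from the OLS formulas."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
alpha_hat, beta_hat = ols_fit(xs, ys)

# OLS residuals e_i = Y_i - alpha_hat - beta_hat * X_i
residuals = [y - alpha_hat - beta_hat * x for x, y in zip(xs, ys)]

sum_e = sum(residuals)                               # first normal equation
sum_xe = sum(x * e for x, e in zip(xs, residuals))   # second normal equation
print(sum_e, sum_xe)  # both are 0 up to rounding error
```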

These are all consequences of the fact that we minimize the sum of squared residuals.

How good is the fit of the straight line to the scatterplot? Define the fitted value
$$\hat{Y}_i = \hat\alpha + \hat\beta X_i$$

then by the definitions
$$Y_i = \hat{Y}_i + e_i$$

Because the OLS residuals have average 0
$$\bar{Y} = \bar{\hat{Y}}$$

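In words: $Y$ and $\hat{Y}$ have the same sample average.

Using this we have
$$(Y_i - \bar{Y})^2 = (\hat{Y}_i - \bar{Y})^2 + e_i^2 + 2 e_i (\hat{Y}_i - \bar{Y})$$

If we take the sum over i we first observe
$$\sum_{i=1}^{n} e_i (\hat{Y}_i - \bar{Y}) = \sum_{i=1}^{n} e_i \hat{Y}_i - \bar{Y} \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} e_i (\hat\alpha + \hat\beta X_i) = \hat\alpha \sum_{i=1}^{n} e_i + \hat\beta \sum_{i=1}^{n} e_i X_i = 0$$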

Note that this implies that the sample covariance between the OLS residuals and the OLS fitted values is 0: OLS decomposes $Y_i$ into two parts (residual and fitted value) that have covariance 0, i.e. are uncorrelated. Using this we find
$$\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \frac{1}{n} \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \frac{1}{n} \sum_{i=1}^{n} e_i^2$$

The sample variance of $Y$ is equal to the sum of the sample variance of $\hat{Y}$ and the sample variance of $e$, or

Total variance = Explained variance + Unexplained variance
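The decomposition can be verified on any data set. The sketch below computes the three variances (with divisor n, as in the formula) for made-up data; `ols_fit` is a hypothetical helper defined for the example.

```python
def ols_fit(xs, ys):
    """Return (alpha_hat, beta_hat) from the OLS formulas."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
n = len(xs)
y_bar = sum(ys) / n

alpha_hat, beta_hat = ols_fit(xs, ys)
fitted = [alpha_hat + beta_hat * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

total_var = sum((y - y_bar) ** 2 for y in ys) / n
explained_var = sum((f - y_bar) ** 2 for f in fitted) / n
unexplained_var = sum(e ** 2 for e in residuals) / n  # residuals have average 0

print(total_var, explained_var + unexplained_var)
```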

A measure of goodness of fit is
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

This is the fraction of the total variance that is explained by the fitted straight line. Note that $R^2 = 1$ if and only if $Y_i = \hat{Y}_i = \hat\alpha + \hat\beta X_i$ for all i, i.e. if all observations are on the straight line.

Also $R^2 = 0$ if and only if
$$\hat{Y}_i = \hat\alpha + \hat\beta X_i = \bar{Y} - \hat\beta (\bar{X} - X_i) = \bar{Y}$$

or
$$\hat\beta (\bar{X} - X_i) = 0$$

for all $i = 1, \ldots, n$. If the $X_i$ are not all equal, then this can only be the case if $\hat\beta = 0$. In that case X does not help in explaining Y. These are the extreme values for $R^2$. We have
$$0 \le R^2 \le 1$$
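To make the two extreme cases concrete, the sketch below computes $R^2$ for three made-up data sets: a typical one, one where all points are exactly on a line, and one where the fitted slope is 0. The function name `r_squared` and all data are assumptions made for the example.

```python
def r_squared(xs, ys):
    """Return R^2 = explained sum of squares / total sum of squares for the OLS line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    total = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or total == 0:
        raise ValueError("R^2 is not defined when all X or all Y values are equal")
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    explained = sum((alpha_hat + beta_hat * x - y_bar) ** 2 for x in xs)
    return explained / total


# Made-up illustrative data sets.
r2_typical = r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0])
r2_perfect = r_squared([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])  # Y = 1 + 2X exactly
r2_flat = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 1.0])     # fitted slope is 0
print(r2_typical, r2_perfect, r2_flat)
```

In the third data set Y still varies, but none of that variation lines up with X, so the fitted line is horizontal and explains nothing.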
