
# ECON 2007 - Term 1

## Additional Notes and Proofs on OLS

Alexandros Theloudis

November 7, 2015

We are interested in estimating the relationship between $x$ and $y$ in the population. We assume that there is a linear association between the two (Assumption SLR.1):

$$y = \beta_0 + \beta_1 x + u \qquad (1)$$

where $\beta_0$ and $\beta_1$ are the parameters of interest and $u$ is an error term. The population is not directly observable as it is assumed to be infinite; however, we can still learn a lot about this relationship by relying on random samples from the population. A random sample can be denoted as $\{(x_i, y_i): i = 1, \dots, n\}$; this notation implies that we have $n$ observations $(x_i, y_i)$, where the pairs $(x_i, y_i)$ are independently drawn from the same population (we say that the $(x_i, y_i)$ are i.i.d.: independent and identically distributed). This is Assumption SLR.2.
The error terms are assumed to have zero conditional mean: conditional on $x$, the expected value of $u$ in the population is 0. This can be written as $E(u\mid x) = 0$; it implies that knowing $x$ conveys no information about the mean of $u$ (Assumption SLR.4).
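The question is: how can we get $\beta_0$ and $\beta_1$? Well, we cannot, unless we have an infinite amount of information. However, given a random sample of observations, we can compute the OLS estimates $\hat\beta_0$ and $\hat\beta_1$. Under the assumptions of these notes (including SLR.3 and SLR.5, introduced below), the OLS estimators are BLUE: Best Linear Unbiased Estimators of the parameters of interest $\beta_0$ and $\beta_1$. How do we get $\hat\beta_0$ and $\hat\beta_1$?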
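To make the setup concrete, here is a minimal sketch that draws a random sample from a hypothetical population satisfying SLR.1, SLR.2 and SLR.4. The parameter values ($\beta_0 = 1$, $\beta_1 = 2$), the uniform distribution for $x$, the normal error and the function name `draw_sample` are illustrative assumptions, not part of the notes.

```python
import random

def draw_sample(n, beta0=1.0, beta1=2.0, sigma=1.0, seed=None):
    """Draw n i.i.d. observations (x_i, y_i) from y = beta0 + beta1 * x + u.

    Hypothetical population: x ~ Uniform(0, 10) and u ~ Normal(0, sigma^2),
    with u drawn independently of x, so that E[u | x] = 0 (SLR.4).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):                     # each draw is independent (SLR.2)
        x = rng.uniform(0.0, 10.0)
        u = rng.gauss(0.0, sigma)          # error term, unobserved in practice
        xs.append(x)
        ys.append(beta0 + beta1 * x + u)   # linear model (SLR.1)
    return xs, ys

xs, ys = draw_sample(n=100, seed=42)
print(len(xs), len(ys))
```

In a real application only `xs` and `ys` would be observed; $\beta_0$, $\beta_1$ and the $u_i$ stay unknown.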

## 1 Derivation of OLS Estimators

There are two equivalent ways to obtain $\hat\beta_0$ and $\hat\beta_1$ given the three assumptions above. The first is the Method of Moments; the second is minimizing the Sum of Squared Residuals.

## 1.1 Method of Moments

$E[u\mid x] = 0$ implies that $E[u] = 0$ and $E[xu] = 0$.¹ From (1) we can write:

$$\begin{aligned}
E[u] &= E[y - \beta_0 - \beta_1 x] = 0\\
E[xu] &= E[x(y - \beta_0 - \beta_1 x)] = 0
\end{aligned}$$

These equations hold in the population. Their sample analogues are:

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) &= 0\\
\frac{1}{n}\sum_{i=1}^n x_i\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) &= 0
\end{aligned}$$

where the summation over $i$ implies summation over all sample observations.

Alexandros Theloudis: Department of Economics, University College London, Gower Street, WC1E 6BT London (email: alexandros.theloudis.10@ucl.ac.uk).

¹ From the Law of Iterated Expectations: $E(u) = E[E(u\mid x)] = E[0] = 0$. Also: $E(xu) = E[E(xu\mid x)] = E[xE(u\mid x)] = E[x \cdot 0] = 0$.

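Working on the first equation, we can write:

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) &= 0\\
\frac{1}{n}\sum_{i=1}^n y_i - \hat\beta_1\frac{1}{n}\sum_{i=1}^n x_i &= \frac{1}{n}\,n\hat\beta_0\\
\frac{1}{n}\sum_{i=1}^n y_i - \hat\beta_1\frac{1}{n}\sum_{i=1}^n x_i &= \hat\beta_0\\
\bar y - \hat\beta_1\bar x &= \hat\beta_0 \qquad (2)
\end{aligned}$$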

To reach the last line, we make use of the formula for calculating a sample average: for any generic variable $z$, $\bar z = \frac{1}{n}\sum_{i=1}^n z_i$.

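Working on the second equation, we can write:

$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n x_i\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) &= 0\\
\frac{1}{n}\sum_{i=1}^n x_i\left(y_i - (\bar y - \hat\beta_1\bar x) - \hat\beta_1 x_i\right) &\overset{(2)}{=} 0\\
\sum_{i=1}^n \left(x_i y_i - x_i\bar y + x_i\hat\beta_1\bar x - \hat\beta_1 x_i^2\right) &= 0\\
\sum_{i=1}^n \left[x_i(y_i - \bar y) - \hat\beta_1 x_i(x_i - \bar x)\right] &= 0\\
\hat\beta_1\sum_{i=1}^n x_i(x_i - \bar x) &= \sum_{i=1}^n x_i(y_i - \bar y)\\
\hat\beta_1 &= \frac{\sum_{i=1}^n x_i(y_i - \bar y)}{\sum_{i=1}^n x_i(x_i - \bar x)} \qquad (3)
\end{aligned}$$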
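Expression (3) is the OLS estimator $\hat\beta_1$ provided that $\sum_{i=1}^n x_i(x_i - \bar x) \neq 0$. This is guaranteed by Assumption SLR.3 (the sample values $x_i$ are not all identical). Substituting (3) into (2) we get the OLS estimator $\hat\beta_0$. Having obtained $\hat\beta_0$ and $\hat\beta_1$ we can then get:

Predicted/fitted values: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$

Residuals: $\hat u_i = y_i - \hat y_i$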
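A minimal sketch of equations (3) and (2) in plain Python, assuming the data arrive as two equal-length lists; the name `ols_fit` and the toy data are hypothetical. It also checks that the residuals satisfy the two sample moment conditions the estimators were built from.

```python
def ols_fit(xs, ys):
    """Return (beta0_hat, beta1_hat, fitted, residuals) using (3) and then (2)."""
    # SLR.3: the x_i must not all be identical, otherwise (3) divides by zero.
    if all(x == xs[0] for x in xs):
        raise ValueError("no sample variation in x: SLR.3 fails")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    denom = sum(x * (x - x_bar) for x in xs)                          # denominator of (3)
    beta1_hat = sum(x * (y - y_bar) for x, y in zip(xs, ys)) / denom  # (3)
    beta0_hat = y_bar - beta1_hat * x_bar                             # (2)
    fitted = [beta0_hat + beta1_hat * x for x in xs]                  # y-hat_i
    resid = [y - f for y, f in zip(ys, fitted)]                       # u-hat_i
    return beta0_hat, beta1_hat, fitted, resid

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
b0, b1, fitted, resid = ols_fit(xs, ys)
print(b0, b1)
# Sample moment conditions: both sums are zero up to floating-point rounding.
print(sum(resid), sum(x * r for x, r in zip(xs, resid)))
```

For this toy data set $\bar x = 2.5$, $\bar y = 4.75$, so $\hat\beta_1 = 9.5/5 = 1.9$ and $\hat\beta_0 = 4.75 - 1.9 \times 2.5 = 0$, up to rounding.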

## 1.2 Sum of Squared Residuals

An alternative (but equivalent) way to obtain the OLS estimators is by minimizing the sum of squared residuals (SSR hereafter). What really is a residual $\hat u_i$? Think about observation $i$ with $(x_i, y_i)$; for $x_i$ the OLS regression line predicts a $y$-value of $\hat y_i$. How far is $\hat y_i$ from the actual $y_i$? This information is given by $\hat u_i$! In other words, $\hat u_i$ is the vertical distance between $y_i$ and $\hat y_i$; it can be positive or negative depending on whether the actual point lies above or below the OLS regression line.

If we want our OLS regression line to fit the data well, then we must make the distances $\hat u_i$ small for all $i$. How do we do that? One way would be to minimize the sum of residuals $\sum_{i=1}^n \hat u_i$. But positive and negative residuals cancel out, and for the OLS line this sum is equal to 0 by construction (it is the first sample moment condition above). Instead we can minimize the SSR: the smaller this sum is, the closer the OLS regression line is to our sample observations.
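The sketch below illustrates why the plain sum of residuals cannot serve as the fit criterion: every line through $(\bar x, \bar y)$ has residuals that sum to zero, yet such lines fit the data very differently. The toy data and the helper name `residual_sum_and_ssr` are hypothetical.

```python
def residual_sum_and_ssr(b0, b1, xs, ys):
    """Sum of residuals and sum of squared residuals for the line b0 + b1 * x."""
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    return sum(resid), sum(r * r for r in resid)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

results = {}
for slope in (0.0, 1.9, 5.0):
    intercept = y_bar - slope * x_bar   # forces the line through (x_bar, y_bar)
    results[slope] = residual_sum_and_ssr(intercept, slope, xs, ys)
    total, ssr = results[slope]
    print(f"slope {slope}: sum of residuals {total:+.1e}, SSR {ssr:.2f}")
```

All three residual sums are zero up to rounding, while the SSR equals 18.75, 0.70 and 48.75 respectively; only the SSR singles out the slope 1.9, which is the OLS slope for these data.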
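Analytically:

$$\min_{\hat\beta_0, \hat\beta_1}\sum_{i=1}^n \hat u_i^2 \;\equiv\; \min_{\hat\beta_0, \hat\beta_1}\sum_{i=1}^n (y_i - \hat y_i)^2 \;\equiv\; \min_{\hat\beta_0, \hat\beta_1}\sum_{i=1}^n \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2$$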

We need to find $\hat\beta_0$ and $\hat\beta_1$ so that the above SSR is minimized. Assuming that the above function is well behaved, we derive the first order conditions with respect to $\hat\beta_0$ and $\hat\beta_1$ and set them equal to 0. For convenience, we first expand the above expression:

$$\sum_{i=1}^n \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2 = \sum_{i=1}^n \left[y_i^2 + (\hat\beta_0 + \hat\beta_1 x_i)^2 - 2y_i(\hat\beta_0 + \hat\beta_1 x_i)\right]$$

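The first order conditions are:

With respect to $\hat\beta_0$:

$$\begin{aligned}
\sum_{i=1}^n 2\left(\hat\beta_0 + \hat\beta_1 x_i\right) - \sum_{i=1}^n 2y_i &= 0\\
n\hat\beta_0 + \hat\beta_1 n\bar x &= n\bar y\\
\bar y - \hat\beta_1\bar x &= \hat\beta_0
\end{aligned}$$

Notice that we have now reached equation (2) above.

With respect to $\hat\beta_1$:

$$\begin{aligned}
\sum_{i=1}^n 2\left(\hat\beta_0 + \hat\beta_1 x_i\right)x_i - \sum_{i=1}^n 2y_i x_i &= 0\\
\sum_{i=1}^n \left(\bar y - \hat\beta_1\bar x + \hat\beta_1 x_i\right)x_i &\overset{(2)}{=} \sum_{i=1}^n y_i x_i\\
\sum_{i=1}^n \bar y x_i + \hat\beta_1\sum_{i=1}^n x_i(x_i - \bar x) &= \sum_{i=1}^n y_i x_i\\
\hat\beta_1\sum_{i=1}^n x_i(x_i - \bar x) &= \sum_{i=1}^n x_i(y_i - \bar y)\\
\hat\beta_1 &= \frac{\sum_{i=1}^n x_i(y_i - \bar y)}{\sum_{i=1}^n x_i(x_i - \bar x)}
\end{aligned}$$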
This is exactly the expression we found for $\hat\beta_1$ in (3), so the two ways of obtaining the OLS estimators are equivalent. As before, to obtain an equation for $\hat\beta_0$, one only needs to replace $\hat\beta_1$ in (2) with the expression in (3).
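As a numerical check of this equivalence, the sketch below minimizes the SSR by plain gradient descent and compares the result with the closed form (3) and (2). Gradient descent is a generic optimizer swapped in for illustration only, not a method used in these notes; the step size, iteration count and toy data are assumptions chosen so that the iteration converges for this example.

```python
def minimize_ssr(xs, ys, step=0.01, iters=5000):
    """Minimize sum_i (y_i - b0 - b1 * x_i)^2 by gradient descent starting at (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
        # Partial derivatives of the SSR; setting them to 0 gives the first order conditions.
        g0 = -2.0 * sum(resid)
        g1 = -2.0 * sum(x * r for x, r in zip(xs, resid))
        b0 -= step * g0
        b1 -= step * g1
    return b0, b1

def closed_form(xs, ys):
    """OLS estimates from (3) and (2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = sum(x * (y - y_bar) for x, y in zip(xs, ys)) / sum(x * (x - x_bar) for x in xs)
    return y_bar - b1 * x_bar, b1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
print(minimize_ssr(xs, ys))
print(closed_form(xs, ys))
```

Both calls return approximately $(0, 1.9)$. The numerical optimizer only approaches the minimizer iteratively, whereas the first order conditions deliver it exactly; with a much larger step size the iterations would diverge instead.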

## 2 Unbiasedness
Every time we draw a new random sample from the population, the estimates $\hat\beta_0$ and $\hat\beta_1$ will be different. The question we are asking now is: does the expected value of these estimates equal the unknown true values $\beta_0$ and $\beta_1$ in the population? Under Assumptions SLR.1 to SLR.4, the answer is yes. It will turn out that $E[\hat\beta_0] = \beta_0$ and $E[\hat\beta_1] = \beta_1$, and thus we say that the OLS estimators are unbiased.

## 2.1 Proof for $\hat\beta_1$
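Following the lecture notes, we set $s_x^2 = \sum_{i=1}^n (x_i - \bar x)^2$. From (3) we have:

$$\hat\beta_1 = \frac{\sum_{i=1}^n x_i(y_i - \bar y)}{\sum_{i=1}^n x_i(x_i - \bar x)}$$

Notice that:

$$\sum_{i=1}^n \bar x(x_i - \bar x) = \sum_{i=1}^n x_i\bar x - \sum_{i=1}^n \bar x\bar x = n\bar x\bar x - n\bar x\bar x = 0$$

Similarly we can show that $\sum_{i=1}^n \bar y(x_i - \bar x) = 0$ and $\sum_{i=1}^n \bar x(y_i - \bar y) = 0$. Working on (3) we get:

$$\begin{aligned}
\hat\beta_1 &= \frac{\sum_{i=1}^n x_i(y_i - \bar y)}{\sum_{i=1}^n x_i(x_i - \bar x)} = \frac{\sum_{i=1}^n x_i(y_i - \bar y) - \sum_{i=1}^n \bar x(y_i - \bar y)}{\sum_{i=1}^n x_i(x_i - \bar x) - \sum_{i=1}^n \bar x(x_i - \bar x)}\\
&= \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)}\\
&= \frac{\sum_{i=1}^n (x_i - \bar x)y_i}{\sum_{i=1}^n (x_i - \bar x)^2}\\
&= \frac{\sum_{i=1}^n (x_i - \bar x)y_i}{s_x^2}
\end{aligned}$$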
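$$\begin{aligned}
\hat\beta_1 &\overset{(1)}{=} \frac{\sum_{i=1}^n (x_i - \bar x)(\beta_0 + \beta_1 x_i + u_i)}{s_x^2}\\
&= \beta_0\underbrace{\frac{\sum_{i=1}^n (x_i - \bar x)}{s_x^2}}_{=0} + \beta_1\underbrace{\frac{\sum_{i=1}^n (x_i - \bar x)x_i}{s_x^2}}_{=1} + \frac{\sum_{i=1}^n (x_i - \bar x)u_i}{s_x^2}\\
&= \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar x)u_i}{s_x^2}
\end{aligned}$$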
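The algebra above can be checked numerically when the errors are known, which is only possible in a simulation. The sketch below assumes a hypothetical population ($\beta_0 = 1$, $\beta_1 = 2$, uniform $x$, standard normal $u$) and verifies three facts used in the proof: $\sum_i (x_i - \bar x) = 0$, that (3) equals $\sum_i (x_i - \bar x) y_i / s_x^2$, and that $\hat\beta_1 - \beta_1 = \sum_i (x_i - \bar x) u_i / s_x^2$.

```python
import random

rng = random.Random(0)
beta0, beta1, n = 1.0, 2.0, 50                    # hypothetical true parameters and sample size
xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
us = [rng.gauss(0.0, 1.0) for _ in range(n)]      # errors: observable only in a simulation
ys = [beta0 + beta1 * x + u for x, u in zip(xs, us)]

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s2x = sum((x - x_bar) ** 2 for x in xs)           # s_x^2 as defined in these notes (no 1/n)

# Slope from (3) and from the deviation form derived above.
beta1_hat_3 = sum(x * (y - y_bar) for x, y in zip(xs, ys)) / sum(x * (x - x_bar) for x in xs)
beta1_hat_dev = sum((x - x_bar) * y for x, y in zip(xs, ys)) / s2x
# The term that carries all the randomness of beta1_hat given the x_i.
sampling_error = sum((x - x_bar) * u for x, u in zip(xs, us)) / s2x

print(sum(x - x_bar for x in xs))                  # zero up to rounding
print(beta1_hat_3 - beta1_hat_dev)                 # zero up to rounding
print(beta1_hat_dev - (beta1 + sampling_error))    # zero up to rounding
```

The last difference is zero for any draw of the errors: given the $x_i$, all the randomness in $\hat\beta_1$ comes from the term $\sum_i (x_i-\bar x)u_i/s_x^2$.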
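Now let's take expectations on both sides, conditional on the data $x_1, x_2, \dots, x_n$ we have available:

$$E[\hat\beta_1\mid x_1,\dots,x_n] = E\left[\beta_1 + \frac{\sum_{i=1}^n (x_i-\bar x)u_i}{s_x^2}\,\middle|\, x_1,\dots,x_n\right]$$

$$= E[\beta_1\mid x_1,\dots,x_n] + E\left[\frac{\sum_{i=1}^n (x_i-\bar x)u_i}{s_x^2}\,\middle|\, x_1,\dots,x_n\right]$$

(because the expectation operator is linear)

$$= \beta_1 + E\left[\frac{\sum_{i=1}^n (x_i-\bar x)u_i}{s_x^2}\,\middle|\, x_1,\dots,x_n\right]$$

(because $\beta_1$ is just a number)

$$= \beta_1 + \frac{1}{s_x^2}\sum_{i=1}^n (x_i-\bar x)\,E[u_i\mid x_1,\dots,x_n]$$

(because conditional on $x_1,\dots,x_n$, any function of the $x_i$ is treated as a constant and can be taken outside the expectation)

$$= \beta_1 + \frac{1}{s_x^2}\sum_{i=1}^n (x_i-\bar x)\,E[u_i\mid x_i]$$

(because under random sampling, SLR.2, $u_i$ is independent of $x_j$ for $j \neq i$)

$$E[\hat\beta_1\mid x_1,\dots,x_n] = \beta_1 \qquad (4)$$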

(because of SLR.4, $E[u_i\mid x_i] = 0$). We are not done, though. We have to prove that the unconditional expectation of $\hat\beta_1$ is equal to $\beta_1$. By the Law of Iterated Expectations:

$$E[\hat\beta_1] = E\big[E[\hat\beta_1\mid x_1,\dots,x_n]\big] \overset{(4)}{=} E[\beta_1] = \beta_1$$
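A Monte Carlo experiment illustrates the result: keep $x_1, \dots, x_n$ fixed, redraw the errors many times, and average the resulting $\hat\beta_1$. The population values, distributions, sample size and number of replications below are illustrative assumptions. The average should be close to $\beta_1$, though not exactly equal, because only finitely many samples are drawn.

```python
import random

def slope_hat(xs, ys):
    """OLS slope in the deviation form sum_i (x_i - x_bar) y_i / s_x^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    s2x = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * y for x, y in zip(xs, ys)) / s2x

rng = random.Random(1)
beta0, beta1 = 1.0, 2.0                              # hypothetical population parameters
xs = [rng.uniform(0.0, 10.0) for _ in range(30)]     # held fixed across replications

reps = 5000
estimates = []
for _ in range(reps):
    # Fresh errors each replication; E[u | x] = 0 holds by construction (SLR.4).
    ys = [beta0 + beta1 * x + rng.gauss(0.0, 1.0) for x in xs]
    estimates.append(slope_hat(xs, ys))

mean_estimate = sum(estimates) / reps
print(f"average beta1_hat over {reps} samples: {mean_estimate:.4f}")
```

Conditioning on fixed $x_i$ mirrors the conditional argument leading to (4); redrawing the $x_i$ as well would mirror the unconditional statement.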

## 2.2 Proof for $\hat\beta_0$

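From (2) we have that

$$\hat\beta_0 = \bar y - \hat\beta_1\bar x$$

Notice that averaging (1) over the sample gives $\bar y = \beta_0 + \beta_1\bar x + \bar u$, so that (2) now becomes:

$$\begin{aligned}
\hat\beta_0 &= \beta_0 + \beta_1\bar x + \bar u - \hat\beta_1\bar x\\
&= \beta_0 + (\beta_1 - \hat\beta_1)\bar x + \bar u
\end{aligned}$$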

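Now let's take expectations on both sides, conditional on the data $x_1, x_2, \dots, x_n$ we have available:

$$E[\hat\beta_0\mid x_1,\dots,x_n] = E\left[\beta_0 + (\beta_1-\hat\beta_1)\bar x + \bar u\,\middle|\,x_1,\dots,x_n\right]$$

$$= E[\beta_0\mid x_1,\dots,x_n] + E\left[(\beta_1-\hat\beta_1)\bar x\,\middle|\,x_1,\dots,x_n\right] + E[\bar u\mid x_1,\dots,x_n]$$

(because the expectation operator is linear)

$$= \beta_0 + \bar x\,E\left[\beta_1-\hat\beta_1\,\middle|\,x_1,\dots,x_n\right] + E[\bar u\mid x_1,\dots,x_n]$$

(because $\beta_0$ is just a number; also, conditional on $x_1,\dots,x_n$, $\bar x$ is non-random)

$$= \beta_0 + E\left[\frac{1}{n}\sum_{i=1}^n u_i\,\middle|\,x_1,\dots,x_n\right]$$

(because by (4) it has already been proved that $E[\hat\beta_1\mid x_1,\dots,x_n] = \beta_1$)

$$= \beta_0 + \frac{1}{n}\sum_{i=1}^n E[u_i\mid x_i]$$

(by linearity and, as before, random sampling)

$$E[\hat\beta_0\mid x_1,\dots,x_n] = \beta_0 \qquad (5)$$

(from SLR.4). As before, by the Law of Iterated Expectations one can show that $E[\hat\beta_0] = \beta_0$.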
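The same kind of simulation can check the decomposition $\hat\beta_0 = \beta_0 + (\beta_1 - \hat\beta_1)\bar x + \bar u$ sample by sample, and that the average of $\hat\beta_0$ across samples with fixed $x_i$ is close to $\beta_0$. As before, all parameter values and distributions are assumptions made for illustration.

```python
import random

rng = random.Random(2)
beta0, beta1 = 1.0, 2.0                             # hypothetical population parameters
xs = [rng.uniform(0.0, 10.0) for _ in range(30)]    # fixed regressors
n = len(xs)
x_bar = sum(xs) / n
s2x = sum((x - x_bar) ** 2 for x in xs)

reps = 5000
max_identity_gap = 0.0
beta0_hats = []
for _ in range(reps):
    us = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + u for x, u in zip(xs, us)]
    y_bar = sum(ys) / n
    b1 = sum((x - x_bar) * y for x, y in zip(xs, ys)) / s2x
    b0 = y_bar - b1 * x_bar                          # equation (2)
    u_bar = sum(us) / n
    # Decomposition derived above: b0 = beta0 + (beta1 - b1) * x_bar + u_bar.
    gap = abs(b0 - (beta0 + (beta1 - b1) * x_bar + u_bar))
    max_identity_gap = max(max_identity_gap, gap)
    beta0_hats.append(b0)

print(max_identity_gap)            # zero up to rounding
print(sum(beta0_hats) / reps)      # close to beta0 = 1.0
```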

## 3 Variance of the OLS Estimator

Here we impose an additional assumption:

$$Var[u_i\mid x_i] = \sigma^2, \quad \forall i$$

This is Assumption SLR.5 (homoskedasticity) according to the lecture notes. Recall from before that:

$$\hat\beta_1 = \beta_1 + \frac{\sum_{i=1}^n (x_i-\bar x)u_i}{s_x^2}$$

We are now interested in the conditional variance $Var[\hat\beta_1\mid x_1,\dots,x_n]$:

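$$Var[\hat\beta_1\mid x_1,\dots,x_n] = Var\left[\beta_1 + \frac{\sum_{i=1}^n (x_i-\bar x)u_i}{s_x^2}\,\middle|\,x_1,\dots,x_n\right]$$

$$= Var[\beta_1\mid x_1,\dots,x_n] + Var\left[\frac{\sum_{i=1}^n (x_i-\bar x)u_i}{s_x^2}\,\middle|\,x_1,\dots,x_n\right] + 2Cov\left(\beta_1, \frac{\sum_{i=1}^n (x_i-\bar x)u_i}{s_x^2}\,\middle|\,x_1,\dots,x_n\right)$$

(because by the properties of the variance, $Var(a+b) = Var(a) + Var(b) + 2Cov(a,b)$)

$$= Var\left[\frac{\sum_{i=1}^n (x_i-\bar x)u_i}{s_x^2}\,\middle|\,x_1,\dots,x_n\right]$$

(because $\beta_1$ is just a number, so it does not vary or covary with any random variable)

$$= \left(\frac{1}{s_x^2}\right)^2\sum_{i=1}^n (x_i-\bar x)^2\,Var[u_i\mid x_1,\dots,x_n]$$

(because conditional on $x_1,\dots,x_n$, any function of $x$ is treated as a constant; by the properties of the variance, $Var(cz) = c^2 Var(z)$ where $c$ is a constant and $z$ is a random variable; and under random sampling the $u_i$ are uncorrelated across $i$ conditional on $x_1,\dots,x_n$, so the variance of the sum is the sum of the variances)

$$= \left(\frac{1}{s_x^2}\right)^2\sum_{i=1}^n (x_i-\bar x)^2\,Var[u_i\mid x_i]$$

$$= \left(\frac{1}{s_x^2}\right)^2\sum_{i=1}^n (x_i-\bar x)^2\,\sigma^2$$

(because of Assumption SLR.5)

$$Var[\hat\beta_1\mid x_1,\dots,x_n] = \sigma^2\,\frac{1}{s_x^2} \qquad (6)$$

as $\sum_{i=1}^n (x_i-\bar x)^2 = s_x^2$. The derivation of the conditional variance of $\hat\beta_0$ follows the same logic. Can you derive it?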
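Equation (6) can be compared with a simulation: holding the $x_i$ fixed and drawing homoskedastic errors with a chosen $\sigma$, the sample variance of $\hat\beta_1$ across replications should be close to $\sigma^2 / s_x^2$. The value $\sigma = 1.5$ and the other settings below are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(3)
beta0, beta1, sigma = 1.0, 2.0, 1.5                 # hypothetical values
xs = [rng.uniform(0.0, 10.0) for _ in range(25)]    # fixed regressors
x_bar = sum(xs) / len(xs)
s2x = sum((x - x_bar) ** 2 for x in xs)

reps = 10000
slopes = []
for _ in range(reps):
    # Var[u_i | x_i] = sigma^2 for every i (SLR.5)
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    slopes.append(sum((x - x_bar) * y for x, y in zip(xs, ys)) / s2x)

simulated = statistics.variance(slopes)             # sample variance across replications
formula = sigma ** 2 / s2x                          # equation (6)
print(f"simulated {simulated:.5f} vs formula {formula:.5f}")
```

The two numbers agree only up to simulation noise, which shrinks as `reps` grows.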

## 4 Goodness-of-Fit ($R^2$)
How much of the variation in $y$ is explained by variation in $x$? If we are interested in that question, we are after the coefficient of determination $R^2$. $R^2$ gives us a sense of the goodness-of-fit of our regression; i.e. it tells us what fraction of the sample variation in $y$ is explained by variation in $x$.

What do we mean by "variation in $y$"? The sum of squared distances of $y_i$ from the sample mean $\bar y$, denoted $\sum_{i=1}^n (y_i-\bar y)^2$, informs us about the spread of $y_i$. Notice that if we divide this expression by $n-1$ we get the sample variance of $y_i$. For what follows we will work on $\sum_{i=1}^n (y_i-\bar y)^2$. As $y_i = \hat u_i + \hat y_i$ we can write:
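$$\begin{aligned}
\sum_{i=1}^n (y_i-\bar y)^2 &= \sum_{i=1}^n (\hat u_i + \hat y_i - \bar y)^2\\
&= \sum_{i=1}^n \left[\hat u_i^2 + (\hat y_i-\bar y)^2 + 2\hat u_i(\hat y_i - \bar y)\right]\\
&= \sum_{i=1}^n \hat u_i^2 + \sum_{i=1}^n (\hat y_i-\bar y)^2 + 2\sum_{i=1}^n \hat u_i(\hat y_i-\bar y) \qquad (7)
\end{aligned}$$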

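The last term in (7) is 0. To see why, we can use $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ and write:

$$\begin{aligned}
\sum_{i=1}^n \hat u_i(\hat y_i-\bar y) &= \sum_{i=1}^n \hat u_i(\hat\beta_0 + \hat\beta_1 x_i - \bar y)\\
&= \sum_{i=1}^n \hat u_i(\hat\beta_0 + \hat\beta_1 x_i) - \sum_{i=1}^n \hat u_i\bar y\\
&= \hat\beta_0\sum_{i=1}^n \hat u_i + \hat\beta_1\sum_{i=1}^n \hat u_i x_i - \bar y\sum_{i=1}^n \hat u_i\\
&= 0
\end{aligned}$$

where the last line comes from our sample moment conditions $\sum_{i=1}^n \hat u_i = 0$ and $\sum_{i=1}^n \hat u_i x_i = 0$. Hence $\sum_{i=1}^n \hat u_i(\hat y_i-\bar y) = 0$. Going back to (7), we now have: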
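$$\begin{aligned}
\sum_{i=1}^n (y_i-\bar y)^2 &= \sum_{i=1}^n \hat u_i^2 + \sum_{i=1}^n (\hat y_i-\bar y)^2\\
SST &= SSR + SSE
\end{aligned}$$

where:

SST: Total sum of squares (variation in $y$)

SSR: Sum of squared residuals (unexplained variation in $y$)

SSE: Explained sum of squares (variation in $\hat y$, i.e. the variation in $y$ explained by $x$)

Dividing across by SST we get:

$$\begin{aligned}
1 &= \frac{SSR}{SST} + \frac{SSE}{SST}\\
\frac{SSE}{SST} &= 1 - \frac{SSR}{SST}\\
R^2 \equiv \frac{SSE}{SST} &= 1 - \frac{SSR}{SST}
\end{aligned}$$

$R^2$ is the fraction of the sample variation in $y$ that is explained by $x$.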
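Putting the pieces of this section together, the sketch below fits OLS to a small hypothetical data set (slope in the deviation form from Section 2.1), checks that the cross term in (7) vanishes and that SST = SSR + SSE, and computes $R^2$ both ways. The helper name `r_squared_parts` is hypothetical.

```python
def r_squared_parts(xs, ys):
    """Return (SST, SSR, SSE, cross term of (7)) for the OLS fit of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total sum of squares
    ssr = sum(r ** 2 for r in resid)                               # sum of squared residuals
    sse = sum((f - y_bar) ** 2 for f in fitted)                    # explained sum of squares
    cross = sum(r * (f - y_bar) for r, f in zip(resid, fitted))    # last term of (7)
    return sst, ssr, sse, cross

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
sst, ssr, sse, cross = r_squared_parts(xs, ys)
print(sst, ssr + sse, cross)
print("R^2 =", sse / sst, "=", 1 - ssr / sst)
```

For these data SST = 18.75, SSR = 0.70 and SSE = 18.05, so $R^2 = 18.05/18.75 \approx 0.963$.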