
Lecture Notes

Autumn 2010

## Dr. Getinet Haile, University of Mannheim

1. Introduction
Introduction & CLRM, Autumn Term 2010 1
What is econometrics?
Econometrics = economic statistics ∩ economic theory ∩ mathematics (according to Ragnar Frisch)
Important notions:
- Data are perceived as realizations of random variables
- Parameters are real numbers, not random variables
- Joint distributions of random variables depend on parameters
- A model is a set of restrictions on the joint distribution of variables

Motivating Example
Mincer equation: the influence of schooling on wages
$$\ln(\mathit{WAGE}_i) = \beta_1 + \beta_2 S_i + \beta_3 \mathit{TENURE}_i + \beta_4 \mathit{EXPR}_i + \varepsilon_i$$
Notation:
- Logarithm of the wage rate: $\ln(\mathit{WAGE}_i)$
- Years of schooling: $S_i$
- Experience in the current job: $\mathit{TENURE}_i$
- Experience in the labor market: $\mathit{EXPR}_i$
Estimation of the parameters $\beta_k$, where $\beta_2$ is the coefficient on schooling $S_i$

The importance of relationships (this section draws on Angrist, 2009)
Relationships among variables are what empirical analysis is all about
Consider:
- a dependent variable $y_i$, and
- a vector of $k$ explanatory variables, $x_i$.
Let $r_i = (y_i, x_i')'$, $i = 1, \dots, N$, be i.i.d.
Our interest is in the relationship between $y_i$ and the explanatory variables, with the ultimate agenda of:
- Description: How does $y_i$ usually vary with $x_i$?
- Prediction: Can we use $x_i$ to forecast $y_i$?
- Causality: What is the effect of elements of $x_i$ on $y_i$?
Generally we look for relationships that hold on average. "On average" relationships among variables are summarised by the Conditional Expectation Function (CEF),
$$E[y_i \mid x_i] \equiv h(x_i) \quad\Rightarrow\quad y_i = h(x_i) + \varepsilon_i, \quad\text{where } E[\varepsilon_i \mid x_i] = 0.$$

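To make the CEF concrete, the following is a minimal sketch on simulated data. The data-generating process, the variable names, and the use of cell means as the sample analogue of $h(x_i)$ are all assumptions made for illustration; they are not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: x_i is years of schooling (discrete), y_i is log wage.
n = 5000
x = rng.integers(8, 17, size=n)                     # values 8, ..., 16
y = 1.0 + 0.08 * x + rng.normal(0.0, 0.5, size=n)   # assumed data-generating process

# Sample analogue of the CEF h(x) = E[y | x]: the mean of y within each x cell.
levels = np.unique(x)
cef = {int(v): float(y[x == v].mean()) for v in levels}

# CEF residuals eps_i = y_i - h(x_i).
eps = y - np.array([cef[int(v)] for v in x])

# Within every cell of x the residuals average to zero, the sample
# counterpart of E[eps_i | x_i] = 0.
cell_means = {int(v): float(eps[x == v].mean()) for v in levels}
```

By construction the residuals have mean zero within every cell of `x`, which mirrors the property $E[\varepsilon_i \mid x_i] = 0$ of the CEF error.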
General regression equations
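Generalization: $y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i$
Index for observations $i = 1, 2, \dots, n$ and regressors $k = 1, 2, \dots, K$
$$\underset{(1\times 1)}{y_i} = \underset{(1\times K)}{\beta'}\;\underset{(K\times 1)}{x_i} + \underset{(1\times 1)}{\varepsilon_i}$$
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix} \quad\text{and}\quad x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iK} \end{pmatrix}$$
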
The key problem of econometrics: we deal with non-experimental data
This raises issues of unobservable variables, interdependence, endogeneity, and causality
Examples:
- Ability bias in the Mincer equation
- Reverse causality problem if unemployment is regressed on a liberalization index
- Causal effect of the police force on crime: the size of the police force is not an independent outcome
- Simultaneity problem in a demand-price equation

2. The CLRM: Parameter Estimation by OLS
Hayashi p. 6/15-18
Classical linear regression model (CLRM)
$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = \underset{(1\times K)}{x_i'}\;\underset{(K\times 1)}{\beta} + \varepsilon_i$$
- $y_i$: dependent variable, observed
- $x_i' = (x_{i1}, x_{i2}, \dots, x_{iK})$: explanatory variables, observed
- $\beta' = (\beta_1, \beta_2, \dots, \beta_K)$: unknown parameters
- $\varepsilon_i$: disturbance component, unobserved
- $b' = (b_1, b_2, \dots, b_K)$: estimator of $\beta'$
- $e_i = y_i - x_i'b$: estimated residual

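For convenience we introduce matrix notation
$$\underset{(n\times 1)}{y} = \underset{(n\times K)}{X}\;\underset{(K\times 1)}{\beta} + \underset{(n\times 1)}{\varepsilon}$$
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{12} & x_{13} & \dots & x_{1K} \\ 1 & x_{22} & x_{23} & \dots & x_{2K} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n2} & x_{n3} & \dots & x_{nK} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
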
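Written out in full: a system of linear equations
$$\begin{aligned}
y_1 &= \beta_1 + \beta_2 x_{12} + \dots + \beta_K x_{1K} + \varepsilon_1 \\
y_2 &= \beta_1 + \beta_2 x_{22} + \dots + \beta_K x_{2K} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_1 + \beta_2 x_{n2} + \dots + \beta_K x_{nK} + \varepsilon_n
\end{aligned}$$
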
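We estimate the linear model and choose $b$ such that the SSR is minimized
Obtain an estimator $b$ of $\beta$ by minimizing the SSR (sum of squared residuals):
$$\underset{b}{\operatorname{argmin}}\; S(b) = \underset{b}{\operatorname{argmin}} \sum_{i=1}^{n} e_i^2 = \underset{b}{\operatorname{argmin}} \sum_{i=1}^{n} (y_i - x_i'b)^2$$
Differentiation with respect to $b_1, b_2, \dots, b_K$ gives the FOCs:
$$\begin{aligned}
(1)\quad & \frac{\partial S(b)}{\partial b_1} \overset{!}{=} 0 \;\Rightarrow\; \sum_{i=1}^{n} e_i = 0 \\
(2)\quad & \frac{\partial S(b)}{\partial b_2} \overset{!}{=} 0 \;\Rightarrow\; \sum_{i=1}^{n} e_i x_{i2} = 0 \\
& \qquad\vdots \\
(K)\quad & \frac{\partial S(b)}{\partial b_K} \overset{!}{=} 0 \;\Rightarrow\; \sum_{i=1}^{n} e_i x_{iK} = 0
\end{aligned}$$
The FOCs can be conveniently written in matrix notation: $X'e = 0$
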
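As a supplementary step that is not on the slide, the same conditions can be obtained by differentiating the SSR directly in matrix form. The sketch below assumes the standard matrix-calculus rules $\partial(b'a)/\partial b = a$ and $\partial(b'Mb)/\partial b = 2Mb$ for symmetric $M$.

```latex
\begin{aligned}
S(b) &= (y - Xb)'(y - Xb) = y'y - 2\,b'X'y + b'X'Xb \\
\frac{\partial S(b)}{\partial b} &= -2X'y + 2X'Xb = -2X'(y - Xb) = -2X'e \overset{!}{=} 0
\;\Rightarrow\; X'e = 0
\end{aligned}
```

Each row of $X'e = 0$ is one of the $K$ scalar conditions: the first row is $\sum_i e_i = 0$ when the first column of $X$ is a constant.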
The system of $K$ equations is solved by matrix algebra
$$X'e = X'(y - Xb) = X'y - X'Xb = 0$$
Premultiplying by $(X'X)^{-1}$:
$$(X'X)^{-1}X'y - (X'X)^{-1}X'Xb = 0$$
$$(X'X)^{-1}X'y - Ib = 0$$
OLS estimator:
$$b = (X'X)^{-1}X'y$$
Alternatively:
$$b = \left(\frac{1}{n}X'X\right)^{-1}\frac{1}{n}X'y = \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1}\frac{1}{n}\sum_{i=1}^{n} x_i y_i$$

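The closed form above can be checked numerically. The following is a minimal sketch with NumPy on simulated data; the helper name `ols`, the sample size, and the parameter values are assumptions for illustration only.

```python
import numpy as np

def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return b = (X'X)^{-1} X'y by solving the normal equations X'X b = X'y."""
    # np.linalg.solve avoids forming the explicit inverse of X'X.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Simulated data (an assumption for illustration): n = 200, K = 3, first column a constant.
rng = np.random.default_rng(42)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 0.5, -2.0])
y = X @ beta + rng.normal(size=n)

b = ols(X, y)
e = y - X @ b                                       # residuals

# The "alternative" form with sample averages gives the same estimate.
Sxx = sum(np.outer(x_i, x_i) for x_i in X) / n      # (1/n) sum of x_i x_i'
Sxy = sum(x_i * y_i for x_i, y_i in zip(X, y)) / n  # (1/n) sum of x_i y_i
b_avg = np.linalg.solve(Sxx, Sxy)
```

Up to floating-point error, `X.T @ e` is the zero vector (the FOCs), and `b` and `b_avg` coincide. Solving the normal equations rather than inverting $X'X$ is a numerical design choice, not something the slides prescribe.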
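Zoom into the matrices $X'X$ and $X'y$
$$b = \left(\frac{1}{n}X'X\right)^{-1}\frac{1}{n}X'y = \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1}\frac{1}{n}\sum_{i=1}^{n} x_i y_i$$
$$\sum_{i=1}^{n} x_i x_i' = \begin{pmatrix}
\sum x_{i1}^2 & \sum x_{i1}x_{i2} & \sum x_{i1}x_{i3} & \dots & \sum x_{i1}x_{iK} \\
\sum x_{i1}x_{i2} & \sum x_{i2}^2 & & & \sum x_{i2}x_{iK} \\
\vdots & & \ddots & & \vdots \\
\sum x_{i1}x_{iK} & \sum x_{i2}x_{iK} & \dots & & \sum x_{iK}^2
\end{pmatrix}$$
$$\sum_{i=1}^{n} x_i y_i = \begin{pmatrix} \sum x_{i1}y_i \\ \sum x_{i2}y_i \\ \sum x_{i3}y_i \\ \vdots \\ \sum x_{iK}y_i \end{pmatrix}$$
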
3. Assumptions of the CLRM
Hayashi p. 3-13
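The four core assumptions of the CLRM
1.1 Linearity: $y_i = x_i'\beta + \varepsilon_i$
1.2 Strict exogeneity: $E(\varepsilon_i \mid X) = 0$
    $\Rightarrow E(\varepsilon_i) = 0$ and $Cov(\varepsilon_i, x_{ik}) = E(\varepsilon_i x_{ik}) = 0$
1.3 No exact multicollinearity: $P(\operatorname{rank}(X) = K) = 1$
    No linear dependencies in the data matrix
1.4 Spherical disturbances: $Var(\varepsilon_i \mid X) = E(\varepsilon_i^2 \mid X) = \sigma^2$ and $Cov(\varepsilon_i, \varepsilon_j \mid X) = E(\varepsilon_i \varepsilon_j \mid X) = 0$ for $i \neq j$
    $\Rightarrow E(\varepsilon_i^2) = \sigma^2$ and $Cov(\varepsilon_i, \varepsilon_j) = 0$ by the LTE (see Hayashi p. 18)
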
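Assumption 1.3 can be illustrated with a rank check. The sketch below uses a hypothetical pair of dummy variables that sum to the constant (the variable names are invented for the example), so that the three-column data matrix has rank 2 and $X'X$ is singular.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
const = np.ones(n)
female = rng.integers(0, 2, size=n).astype(float)   # hypothetical dummy variable
male = 1.0 - female                                 # exact linear function of the other columns

X_bad = np.column_stack([const, female, male])      # const = female + male
X_ok = np.column_stack([const, female])

rank_bad = np.linalg.matrix_rank(X_bad)             # smaller than the number of columns
rank_ok = np.linalg.matrix_rank(X_ok)               # equal to the number of columns
```

Dropping one of the linearly dependent columns restores full column rank, after which $(X'X)^{-1}$ exists.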
Interpreting the parameters of different types of linear equations
- Linear model $y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i$: a one unit increase in the independent variable $x_{ik}$ increases the dependent variable by $\beta_k$ units
- Semi-log form $\log(y_i) = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i$: a one unit increase in the independent variable increases the dependent variable approximately by $100 \cdot \beta_k$ percent
- Log linear model $\log(y_i) = \beta_1 \log(x_{i1}) + \beta_2 \log(x_{i2}) + \dots + \beta_K \log(x_{iK}) + \varepsilon_i$: a one percent increase in $x_{ik}$ increases the dependent variable $y_i$ approximately by $\beta_k$ percent

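The word "approximately" in the semi-log interpretation can be quantified. The following sketch compares the approximation $100 \cdot \beta_k$ with the exact percent change $100\,(e^{\beta_k} - 1)$; the function names and the coefficient values are hypothetical and chosen only to show how the gap grows.

```python
import math

def approx_pct_change(beta_k: float) -> float:
    """Approximate percent change in y for a one unit increase in x_k (semi-log form)."""
    return 100.0 * beta_k

def exact_pct_change(beta_k: float) -> float:
    """Exact percent change in y implied by log(y) rising by beta_k."""
    return 100.0 * (math.exp(beta_k) - 1.0)

# Illustrative coefficient values (hypothetical, not estimates from the lecture).
comparison = {
    b: (approx_pct_change(b), exact_pct_change(b))
    for b in (0.01, 0.05, 0.10, 0.50)
}
```

For $\beta_k = 0.05$ the two numbers differ by roughly 0.13 percentage points, whereas for $\beta_k = 0.5$ the exact change is about 64.9 percent against an approximation of 50 percent, so the rule of thumb is best reserved for small coefficients.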
Some important laws
Law of Total Expectation (LTE):
$$E_X[E_{Y|X}(Y \mid X)] = E_Y(Y)$$
Double Expectation Theorem (DET):
$$E_X[E_{Y|X}(g(Y) \mid X)] = E_Y(g(Y))$$
Law of Iterated Expectations (LIE):
$$E_{Z|X}[E_{Y|X,Z}(Y \mid X, Z) \mid X] = E_{Y|X}(Y \mid X)$$
Generalized DET:
$$E_X[E_{Y|X}(g(X, Y) \mid X)] = E_{X,Y}(g(X, Y))$$
Linearity of Conditional Expectations:
$$E_{Y|X}[g(X)Y \mid X] = g(X)\,E_{Y|X}[Y \mid X]$$

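The Law of Total Expectation can be verified by hand on a small discrete distribution. The joint probabilities below are made up for this sketch; any valid joint distribution would serve.

```python
# Hypothetical joint distribution P(X = x, Y = y) on a small grid.
joint = {
    (0, 1): 0.10, (0, 2): 0.30,
    (1, 1): 0.25, (1, 2): 0.05,
    (2, 1): 0.10, (2, 2): 0.20,
}

# Marginal distribution of X.
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Conditional expectation E(Y | X = x) for each x.
e_y_given_x = {
    x: sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x[x]
    for x in p_x
}

# Left-hand side of the LTE: average the conditional means over the distribution of X.
lhs = sum(e_y_given_x[x] * p_x[x] for x in p_x)

# Right-hand side: unconditional mean of Y computed directly from the joint distribution.
rhs = sum(y * p for (_, y), p in joint.items())
```

Both sides equal 1.55 for this distribution, which is $1 \cdot 0.45 + 2 \cdot 0.55$.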
4. Finite sample properties of the OLS estimator
Hayashi p. 27-31
Finite sample properties of $b = (X'X)^{-1}X'y$
1. $E(b) = \beta$: unbiasedness of the estimator
   - Holds for any sample size
   - Holds under assumptions 1.1 - 1.3
2. $Var(b \mid X) = \sigma^2 (X'X)^{-1}$: conditional variance of $b$
   - Conditional variance depends on the data
   - Holds under assumptions 1.1 - 1.4
3. $Var(\hat\beta \mid X) \geq Var(b \mid X)$
   - $\hat\beta$ is any other linear unbiased estimator of $\beta$
   - Holds under assumptions 1.1 - 1.4

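Some key results from mathematical statistics
$$\underset{(n\times 1)}{z} = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}, \qquad \underset{(m\times n)}{A} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}$$
A new random variable (vector): $\underset{(m\times 1)}{v} = \underset{(m\times n)}{A}\;\underset{(n\times 1)}{z}$
$$\underset{(m\times 1)}{E(v)} = \begin{pmatrix} E(v_1) \\ E(v_2) \\ \vdots \\ E(v_m) \end{pmatrix} = A\,E(z)$$
$$\underset{(m\times m)}{Var(v)} = A\,Var(z)\,A'$$
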
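The two rules $E(v) = A\,E(z)$ and $Var(v) = A\,Var(z)\,A'$ have exact sample counterparts, because sample means and sample covariances are linear in the same way. The sketch below checks this on simulated draws; the matrix `A` and the distribution of `z` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

n, m, draws = 3, 2, 10_000
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])          # fixed (non-random) m x n matrix

# Each column of Z is one draw of the random vector z (hypothetical distribution).
Z = rng.normal(loc=[[1.0], [0.0], [-2.0]], scale=[[1.0], [2.0], [0.5]], size=(n, draws))
V = A @ Z                                 # the corresponding draws of v = A z

mean_z, mean_v = Z.mean(axis=1), V.mean(axis=1)
cov_z, cov_v = np.cov(Z), np.cov(V)       # rows are variables, columns are draws
```

Here `mean_v` equals `A @ mean_z` and `cov_v` equals `A @ cov_z @ A.T` up to floating-point error, whatever the distribution of `z`.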

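The OLS estimator's unbiasedness
$E(b) = \beta \;\Leftrightarrow\; E(b - \beta) = 0$, where $b - \beta$ is the sampling error
$$\begin{aligned}
b &= (X'X)^{-1}X'y \\
&= (X'X)^{-1}X'(X\beta + \varepsilon) \\
&= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon \\
&= \beta + (X'X)^{-1}X'\varepsilon
\end{aligned}$$
$$b - \beta = (X'X)^{-1}X'\varepsilon$$
$$E(b - \beta \mid X) = (X'X)^{-1}X'E(\varepsilon \mid X) = 0 \quad\text{under assumption 1.2}$$
$$E_X(E(b \mid X)) = E_X(\beta) = \beta = E(b) \quad\text{by the LTE}$$
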
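Unbiasedness is a statement about the average of $b$ over repeated samples, which a small Monte Carlo experiment can illustrate. The following sketch holds $X$ fixed and redraws the disturbances; the parameter values, the normal disturbances, and the number of replications are assumptions of the simulation, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(123)

n, reps = 100, 5000
beta = np.array([2.0, -1.0, 0.5])                            # "true" parameters chosen for the simulation
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # held fixed across replications
A = np.linalg.solve(X.T @ X, X.T)                            # A = (X'X)^{-1} X'

# Each column of eps is one draw of the disturbance vector with E(eps | X) = 0.
eps = rng.normal(scale=1.5, size=(n, reps))
Y = (X @ beta)[:, None] + eps                                # one simulated y per column
B = A @ Y                                                    # one OLS estimate b per column

mean_b = B.mean(axis=1)                                      # Monte Carlo estimate of E(b | X)
sampling_error = B - beta[:, None]                           # b - beta, replication by replication
```

In each replication the sampling error equals `A @ eps` for that draw, and averaging over replications brings `mean_b` close to `beta`. Individual estimates still scatter around the truth; only their average is pinned down.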
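We show that $Var(b \mid X) = \sigma^2 (X'X)^{-1}$
$$\begin{aligned}
Var(b \mid X) &= Var(b - \beta \mid X) \\
&= Var((X'X)^{-1}X'\varepsilon \mid X) = Var(A\varepsilon \mid X) \\
&= A\,Var(\varepsilon \mid X)\,A' = A\sigma^2 I_n A' = \sigma^2 A I_n A' = \sigma^2 AA' \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}
\end{aligned}$$
Note:
- $\beta$ is non-random
- $b - \beta$ is the sampling error
- $A = (X'X)^{-1}X'$
- $Var(\varepsilon \mid X) = \sigma^2 I_n$
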
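Sketch of the proof of the Gauss Markov theorem
$$Var(\hat\beta \mid X) \geq Var(b \mid X)$$
$$\begin{aligned}
Var(\hat\beta \mid X) &= Var(\hat\beta - \beta \mid X) = Var[(D + A)\varepsilon \mid X] \\
&= (D + A)\,Var(\varepsilon \mid X)\,(D' + A') = \sigma^2 (D + A)(D' + A') \\
&= \sigma^2 (DD' + DA' + AD' + AA') = \sigma^2 [DD' + (X'X)^{-1}] \\
&\geq \sigma^2 (X'X)^{-1} = Var(b \mid X)
\end{aligned}$$
where
- $C$ is a function of $X$
- $\hat\beta = Cy$
- $D = C - A$
- $A \equiv (X'X)^{-1}X'$
The cross terms vanish because unbiasedness of $\hat\beta$ requires $DX = 0$, so $DA' = DX(X'X)^{-1} = 0$ and $AD' = (DA')' = 0$. The inequality holds in the matrix sense because $DD'$ is positive semidefinite.
Details of proof: Hayashi pages 29 - 30
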
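The structure of the proof can be reproduced numerically. The sketch below builds one particular alternative linear unbiased estimator $\hat\beta = Cy$ with $C = A + D$ and $DX = 0$. The way $D$ is constructed (an arbitrary matrix `G` multiplied by the annihilator $M = I - X(X'X)^{-1}X'$) is an assumption made for this example, as are all numerical values.

```python
import numpy as np

rng = np.random.default_rng(5)

n, K, sigma2 = 30, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                                  # OLS: b = A y

# Build some D with D X = 0 by multiplying an arbitrary K x n matrix G
# by the annihilator M, which satisfies M X = 0.
M = np.eye(n) - X @ A
G = rng.normal(size=(K, n))                        # arbitrary (hypothetical) choice
D = G @ M
C = A + D                                          # alternative linear estimator: beta_hat = C y

var_ols = sigma2 * XtX_inv                         # Var(b | X)
var_alt = sigma2 * (C @ C.T)                       # Var(beta_hat | X) = sigma^2 C C'
gap = var_alt - var_ols                            # equals sigma^2 D D'
eigenvalues = np.linalg.eigvalsh((gap + gap.T) / 2)
```

Since `C @ X` is the identity, the alternative estimator is unbiased, and the variance gap equals $\sigma^2 DD'$, whose eigenvalues are nonnegative. That is the matrix sense in which OLS has the smaller variance.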
The OLS estimator is BLUE
- OLS is best: it has the smallest conditional variance among linear unbiased estimators
  Holds by the Gauss Markov theorem: $Var(\hat\beta \mid X) \geq Var(b \mid X)$
- OLS is linear
  Holds under assumption 1.1
- OLS is unbiased
  Holds under assumptions 1.1 - 1.3

5. Hypothesis Testing under Normality
Hayashi p. 33-45
Hypothesis testing
- Economic theory provides hypotheses about parameters
- If the theory is right $\Rightarrow$ testable implications
- But: hypotheses can't be tested without distributional assumptions
Distributional assumption: normality assumption about the conditional distribution of $\varepsilon$: $\varepsilon \mid X \sim MVN(0, \sigma^2 I_n)$ [Assumption 1.5]

Some facts from multivariate statistics
Vector of random variables: $x = (x_1, x_2, \dots, x_n)'$
Expectation vector:
$$E(x) = \mu = (\mu_1, \mu_2, \dots, \mu_n)' = (E(x_1), E(x_2), \dots, E(x_n))'$$
Variance-covariance matrix:
$$Var(x) = \Sigma = \begin{pmatrix}
Var(x_1) & Cov(x_1, x_2) & \dots & Cov(x_1, x_n) \\
Cov(x_1, x_2) & Var(x_2) & & \vdots \\
\vdots & & \ddots & \vdots \\
Cov(x_1, x_n) & \dots & & Var(x_n)
\end{pmatrix}$$
$y = c + Ax$; $c$, $A$ non-random vector/matrix
$$E(y) = (E(y_1), E(y_2), \dots, E(y_n))' = c + A\mu$$
$$Var(y) = A\Sigma A'$$
$$x \sim MVN(\mu, \Sigma) \;\Rightarrow\; y = c + Ax \sim MVN(c + A\mu, A\Sigma A')$$

Application of the facts from multivariate statistics and the assumptions 1.1 - 1.5
$$\underbrace{b - \beta}_{\text{sampling error}} = (X'X)^{-1}X'\varepsilon$$
Assuming $\varepsilon \mid X \sim MVN(0, \sigma^2 I_n)$:
$$b - \beta \mid X \sim MVN\left((X'X)^{-1}X'E(\varepsilon \mid X),\; (X'X)^{-1}X'\sigma^2 I_n X(X'X)^{-1}\right)$$
$$\Rightarrow\; b - \beta \mid X \sim MVN\left(0,\; \sigma^2 (X'X)^{-1}\right)$$
Note that $Var(b \mid X) = \sigma^2 (X'X)^{-1}$
The OLS estimator is conditionally normally distributed if $\varepsilon \mid X$ is multivariate normal

Testing hypotheses about individual parameters (t-test)
Null hypothesis: $H_0: \beta_k = \bar\beta_k$, with $\bar\beta_k$ a hypothesized value, a real number
Alternative hypothesis: $H_A: \beta_k \neq \bar\beta_k$
Under assumption 1.5, $\varepsilon \mid X \sim MVN(0, \sigma^2 I_n)$; if $H_0$ is true, $E(b_k) = \bar\beta_k$
Test statistic:
$$t_k = \frac{b_k - \bar\beta_k}{\sqrt{\sigma^2 [(X'X)^{-1}]_{kk}}} \sim N(0, 1)$$
Note: $[(X'X)^{-1}]_{kk}$ is the element in the $k$-th row and $k$-th column of $(X'X)^{-1}$

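The nuisance parameter $\sigma^2$ can be estimated
$$\sigma^2 = E(\varepsilon_i^2 \mid X) = Var(\varepsilon_i \mid X) = E(\varepsilon_i^2) = Var(\varepsilon_i)$$
We don't know $\varepsilon_i$, but we use the estimator $e_i = y_i - x_i'b$
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(e_i - \frac{1}{n}\sum_{i=1}^{n} e_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2 = \frac{1}{n}e'e$$
(the second equality uses $\sum_i e_i = 0$, the first FOC when the model contains a constant)
$\hat\sigma^2$ is a biased estimator:
$$E(\hat\sigma^2 \mid X) = \frac{n-K}{n}\sigma^2$$
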
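An unbiased estimator of $\sigma^2$
For $s^2 = \frac{1}{n-K}\sum_{i=1}^{n} e_i^2 = \frac{1}{n-K}e'e$ we get an unbiased estimator:
$$E(s^2 \mid X) = \frac{1}{n-K}E(e'e \mid X) = \sigma^2$$
$$E\left(E(s^2 \mid X)\right) = E(s^2) = \sigma^2$$
Using this provides an unbiased estimator of $Var(b \mid X) = \sigma^2 (X'X)^{-1}$:
$$\widehat{Var}(b \mid X) = s^2 (X'X)^{-1}$$
t-statistic under $H_0$:
$$t_k = \frac{b_k - \bar\beta_k}{\sqrt{[\widehat{Var}(b \mid X)]_{kk}}} = \frac{b_k - \bar\beta_k}{SE(b_k)} = \frac{b_k - \bar\beta_k}{\sqrt{\widehat{Var}(b_k \mid X)}} \sim t(n-K)$$
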
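The pieces $s^2$, $SE(b_k)$ and $t_k$ can be assembled in a few lines. The sketch below is one possible arrangement under the formulas above; the function name `t_statistic`, its 0-based column index, and the simulated data are assumptions for illustration.

```python
import numpy as np

def t_statistic(X: np.ndarray, y: np.ndarray, k: int, beta_bar: float = 0.0):
    """Return (b_k, SE(b_k), t_k) for H0: beta_k = beta_bar; k is a 0-based column index."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = (e @ e) / (n - K)                 # unbiased estimator of sigma^2
    se_k = np.sqrt(s2 * XtX_inv[k, k])     # standard error of b_k
    return b[k], se_k, (b[k] - beta_bar) / se_k

# Simulated data (assumed for illustration); the true coefficient on column 1 is 0.8.
rng = np.random.default_rng(2010)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.8, 0.0]) + rng.normal(size=n)

b1, se1, t1_zero = t_statistic(X, y, k=1, beta_bar=0.0)   # H0: beta_1 = 0 (false here)
_, _, t1_true = t_statistic(X, y, k=1, beta_bar=0.8)      # H0: beta_1 = 0.8 (true here)
```

The statistic is then compared with critical values of the $t(n-K)$ distribution, which this sketch deliberately does not compute. A large `t1_zero` speaks against the false null, while `t1_true` should be of moderate size.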
Decision rule for the t-test
1. $H_0: \beta_k = \bar\beta_k$ (often $\bar\beta_k = 0$), $H_A: \beta_k \neq \bar\beta_k$
2. Given $\bar\beta_k$, the OLS estimate $b_k$ and $s^2$, we compute $t_k = \frac{b_k - \bar\beta_k}{SE(b_k)}$
3. Fix the significance level $\alpha$ of the two-sided test
4. Fix the non-rejection and rejection regions $\Rightarrow$ decision
Remark:
- $\sqrt{\sigma^2 [(X'X)^{-1}]_{kk}}$: standard deviation of $b_k \mid X$
- $\sqrt{s^2 [(X'X)^{-1}]_{kk}}$: standard error of $b_k \mid X$

Testing joint hypotheses (F-test/Wald test)
Write the hypothesis as:
$$H_0: \underset{(\#r\times K)}{R}\;\underset{(K\times 1)}{\beta} = \underset{(\#r\times 1)}{r}$$
- $R$: matrix of real numbers
- $\#r$: number of restrictions
Replacing $\beta' = (\beta_1, \beta_2, \dots, \beta_K)$ by the estimator $b' = (b_1, b_2, \dots, b_K)$:
$$Rb = r$$

Definition of the F-test statistic
Properties of $Rb$ (under $H_0$):
$$E(Rb \mid X) = R\,E(b \mid X) = R\beta = r$$
$$Var(Rb \mid X) = R\,Var(b \mid X)\,R' = R\sigma^2 (X'X)^{-1}R'$$
$$Rb \mid X \sim MVN\left(R\beta,\; R\sigma^2 (X'X)^{-1}R'\right)$$
Using an additional important fact from multivariate statistics:
$$z = (z_1, z_2, \dots, z_m)' \sim MVN(\mu, \Sigma) \;\Rightarrow\; (z - \mu)'\Sigma^{-1}(z - \mu) \sim \chi^2(m)$$
Result applied: Wald statistic
$$(Rb - r)'[\sigma^2 R(X'X)^{-1}R']^{-1}(Rb - r) \sim \chi^2(\#r)$$

Properties of the F-test statistic
Replacing $\sigma^2$ by its unbiased estimate $s^2 = \frac{1}{n-K}\sum_{i=1}^{n} e_i^2 = \frac{1}{n-K}e'e$ and dividing by $\#r$ gives the F-ratio:
$$F = \frac{(Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)/\#r}{(e'e)/(n-K)} = (Rb - r)'[R\,\widehat{Var}(b \mid X)\,R']^{-1}(Rb - r)/\#r \sim F(\#r, n-K)$$
Note: the F-test is one-sided
Proof: see Hayashi p. 41

Decision rule of the F-test
1. Specify $H_0$ in the form $R\beta = r$ and $H_A: R\beta \neq r$.
2. Calculate the F-statistic.
3. Look up the entry in the table of the F-distribution for $\#r$ and $n-K$ at the given significance level.
4. The null is not rejected at significance level $\alpha$ if $F$ is less than $F_\alpha(\#r, n-K)$.

Alternative representation of the F-statistic
Minimization of the unrestricted sum of squared residuals:
$$\min_{b} \sum_{i=1}^{n} (y_i - x_i'b)^2 \;\rightarrow\; SSR_U$$
Minimization of the restricted sum of squared residuals (subject to $R\tilde b = r$):
$$\min_{\tilde b} \sum_{i=1}^{n} (y_i - x_i'\tilde b)^2 \;\rightarrow\; SSR_R$$
F-ratio:
$$F = \frac{(SSR_R - SSR_U)/\#r}{SSR_U/(n-K)}$$

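The two representations of the F-ratio give the same number, which can be confirmed numerically. The sketch below tests two exclusion restrictions on simulated data. The restriction matrix, the data-generating process, and the helper name `ssr` are assumptions for illustration; for exclusion restrictions the restricted fit simply drops the excluded columns.

```python
import numpy as np

def ssr(X: np.ndarray, y: np.ndarray) -> float:
    """Sum of squared OLS residuals from regressing y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return float(e @ e)

rng = np.random.default_rng(99)
n, K = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)   # simulated; H0 below is true

# H0: beta_3 = 0 and beta_4 = 0, written as R beta = r with #r = 2 restrictions.
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
r = np.zeros(2)
num_r = R.shape[0]

# Wald form of the F-ratio.
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = (e @ e) / (n - K)
d = R @ b - r
F_wald = float(d @ np.linalg.solve(R @ XtX_inv @ R.T, d)) / num_r / s2

# SSR form: the restricted model drops the last two regressors.
ssr_u = ssr(X, y)
ssr_r = ssr(X[:, :2], y)
F_ssr = ((ssr_r - ssr_u) / num_r) / (ssr_u / (n - K))
```

`F_wald` and `F_ssr` agree up to floating-point error, and `ssr_r` can never be smaller than `ssr_u` because the restricted minimization searches over a smaller set.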
6. Confidence intervals and goodness of fit measures
Hayashi p. 38/20
Duality of t-test and confidence interval
Under $H_0: \beta_k = \bar\beta_k$:
$$t_k = \frac{b_k - \bar\beta_k}{SE(b_k)} \sim t(n-K)$$
Probability of non-rejection:
$$P\left(-t_{\alpha/2}(n-K) \leq t_k \leq t_{\alpha/2}(n-K)\right) = 1 - \alpha$$
- $-t_{\alpha/2}(n-K)$: lower critical value
- $t_{\alpha/2}(n-K)$: upper critical value
- $t_k$: random variable (value of the test statistic)
- $1 - \alpha$: fixed number
$$\Rightarrow\; P\left(b_k - SE(b_k)\,t_{\alpha/2}(n-K) \leq \bar\beta_k \leq b_k + SE(b_k)\,t_{\alpha/2}(n-K)\right) = 1 - \alpha$$

The confidence interval
Confidence interval for $\beta_k$:
$$P\left(b_k - SE(b_k)\,t_{\alpha/2}(n-K) \leq \beta_k \leq b_k + SE(b_k)\,t_{\alpha/2}(n-K)\right) = 1 - \alpha$$
The confidence bounds are random variables!
- $b_k - SE(b_k)\,t_{\alpha/2}(n-K)$: lower bound
- $b_k + SE(b_k)\,t_{\alpha/2}(n-K)$: upper bound
Wrong interpretation: the true parameter $\beta_k$ lies with probability $1 - \alpha$ within the bounds of the confidence interval
Problem: the confidence bounds are not fixed; they are random!
$H_0$ is rejected at significance level $\alpha$ if the hypothesized value does not lie within the confidence bounds of the $1 - \alpha$ interval.

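That the bounds, not the parameter, are random can be seen in a coverage simulation: across repeated samples the interval moves, and it contains the fixed true $\beta_k$ in roughly a share $1 - \alpha$ of them. The sketch below makes one loud simplification: the $t(n-K)$ critical value is approximated by the standard normal quantile, which is reasonable only because $n - K = 198$ is large. All numerical settings are assumptions of the simulation.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2024)

n, K, reps, alpha = 200, 2, 2000, 0.05
beta = np.array([1.0, 0.5])                              # true parameters of the simulation
X = np.column_stack([np.ones(n), rng.normal(size=n)])    # held fixed across replications
XtX_inv = np.linalg.inv(X.T @ X)

# ASSUMPTION: with n - K = 198 degrees of freedom the t critical value is
# approximated by the standard normal quantile (about 1.96).
crit = NormalDist().inv_cdf(1 - alpha / 2)

covered = 0
for _ in range(reps):
    y = X @ beta + rng.normal(size=n)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se = np.sqrt((e @ e) / (n - K) * XtX_inv[1, 1])
    lower, upper = b[1] - crit * se, b[1] + crit * se    # the bounds change with every sample
    covered += int(lower <= beta[1] <= upper)

coverage = covered / reps
```

The resulting `coverage` should be close to 0.95, up to simulation noise and the normal approximation. For small $n - K$ the exact $t$ quantile would be needed instead.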
Coefficient of determination: uncentered $R^2$
Measure of the variability of the dependent variable: $\sum_i y_i^2 = y'y$
Decomposition of $y'y$:
$$\begin{aligned}
y'y &= (\hat y + e)'(\hat y + e) \\
&= \hat y'\hat y + 2\hat y'e + e'e \\
&= \hat y'\hat y + e'e
\end{aligned}$$
(the cross term vanishes because $\hat y'e = b'X'e = 0$ by the FOCs)
$$R^2_{uc} \equiv 1 - \frac{e'e}{y'y}$$
A good model explains much, and therefore the residual variation is very small compared to the explained variation.

Coefficient of determination: centered $R^2$ and adjusted $\bar R^2$
Use the centered $R^2$ if there is a constant in the model ($x_{i1} = 1$)
$$\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} e_i^2$$
$$R^2_c \equiv 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar y)^2} \equiv 1 - \frac{SSR}{SST}$$
Note that $R^2_{uc}$ and $R^2_c$ both lie in the interval $[0, 1]$ but describe different models. They are not comparable!
$\bar R^2$ is constructed with a penalty for heavy parametrization:
$$\bar R^2 = 1 - \frac{SSR/(n-K)}{SST/(n-1)} = 1 - \frac{n-1}{n-K}\,\frac{SSR}{SST}$$
The $\bar R^2$ is an accepted model selection criterion

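The three measures can be computed side by side. The following sketch assumes a model with a constant and uses simulated data in which a second regressor is irrelevant by construction; the function name `r_squared` and all numbers are illustrative.

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> dict:
    """Uncentered, centered, and adjusted R^2 for an OLS fit of y on X (X includes a constant)."""
    n, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    ssr = e @ e                                   # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)             # total sum of squares around the mean
    return {
        "uncentered": 1.0 - ssr / (y @ y),
        "centered": 1.0 - ssr / sst,
        "adjusted": 1.0 - (n - 1) / (n - K) * ssr / sst,
    }

rng = np.random.default_rng(11)
n = 80
x = rng.normal(size=n)
noise_regressor = rng.normal(size=n)              # irrelevant by construction
y = 3.0 + 1.2 * x + rng.normal(size=n)

small = r_squared(np.column_stack([np.ones(n), x]), y)
large = r_squared(np.column_stack([np.ones(n), x, noise_regressor]), y)
```

Adding the irrelevant regressor cannot lower the centered $R^2$, which is why the adjusted version, with its factor $(n-1)/(n-K)$, is the one suited to comparing models of different size.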
Alternative goodness of fit measures
- Akaike criterion (AIC): $\log\left(\frac{SSR}{n}\right) + \frac{2K}{n}$
- Schwarz criterion (SBC): $\log\left(\frac{SSR}{n}\right) + \frac{\log(n)K}{n}$
Note:
- Both criteria include a penalty term for heavy parametrization
- Select the model with the smallest AIC/SBC

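Model selection with the two criteria amounts to computing them for each candidate and taking the minimum. The sketch below uses the formulas exactly as stated above; the function name `information_criteria`, the candidate models, and the simulated data are assumptions for illustration.

```python
import numpy as np

def information_criteria(X: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Return (AIC, SBC) as log(SSR/n) + 2K/n and log(SSR/n) + log(n)K/n."""
    n, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    log_avg_ssr = np.log((e @ e) / n)
    return float(log_avg_ssr + 2 * K / n), float(log_avg_ssr + np.log(n) * K / n)

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)            # simulated; x genuinely matters

candidates = {
    "constant only": np.ones((n, 1)),
    "constant + x": np.column_stack([np.ones(n), x]),
}
scores = {name: information_criteria(X, y) for name, X in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_sbc = min(scores, key=lambda name: scores[name][1])
```

Because $\log(n) > 2$ once $n$ exceeds about 7.4, the SBC penalizes each additional parameter more heavily than the AIC and therefore tends to favor smaller models.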