
Econometrics 2, Fall 2005

Generalized Method of Moments (GMM) Estimation

Heino Bohn Nielsen

Outline
(1) Introduction and motivation
(2) Moment conditions and identification
(3) A model class: instrumental variables (IV) estimation
(4) Method of moments (MM) estimation
    Examples: mean, OLS, and linear IV
(5) Generalized method of moments (GMM) estimation
    Properties: consistency and asymptotic distribution
(6) Efficient GMM
    Example: two-stage least squares
(7) Comparison with maximum likelihood
    Pseudo-ML estimation
(8) Empirical example: the C-CAPM model

Introduction

Generalized method of moments (GMM) is a general estimation principle: estimators are derived from so-called moment conditions.

Three main motivations:
(1) Many estimators can be seen as special cases of GMM, which gives a unifying framework for comparison.
(2) Maximum likelihood estimators have the smallest variance in the class of consistent and asymptotically normal estimators. But ML requires a full description of the DGP and a correct specification; GMM is an alternative based on minimal assumptions.
(3) GMM estimation is often possible where a likelihood analysis is extremely difficult: we only need a partial specification of the model. A leading case is models with rational expectations.

Moment Conditions and Identification

A moment condition is a statement involving the data and the parameters:

$$g(\theta_0) = E[f(w_t, z_t, \theta_0)] = 0, \qquad (*)$$

where θ is a K × 1 vector of parameters; f(·) is an R-dimensional vector of (non-linear) functions; w_t contains model variables; and z_t contains instruments.

If we knew the expectation, we could solve the equations in (∗) to find θ₀.

If there is a unique solution, so that

$$E[f(w_t, z_t, \theta)] = 0 \quad \text{if and only if} \quad \theta = \theta_0,$$

then we say that the system is identified.

Identification is essential for doing econometrics. Two ideas:
(1) Is the model constructed so that θ₀ is unique (identification)?
(2) Are the data informative enough to determine θ₀ (empirical identification)?

Instrumental Variables Estimation

In many applications, the moment condition has the specific form

$$f(w_t, z_t, \theta) = \underbrace{u(w_t, \theta)}_{(1 \times 1)} \cdot \underbrace{z_t}_{(R \times 1)},$$

where the R instruments in z_t are multiplied by the disturbance term, u(w_t, θ). You can think of u(w_t, θ) as the equivalent of an error term.

The moment condition becomes

$$g(\theta_0) = E[u(w_t, \theta_0) \cdot z_t] = 0,$$

stating that the instruments are uncorrelated with the error term of the model.

This class of estimators is referred to as instrumental variables (IV) estimators. The function u(w_t, θ) may be linear or non-linear in θ.

Example: Moment Condition From RE

Consider a monetary policy rule, where the interest rate depends on expected future inflation:

$$r_t = \beta E[\pi_{t+1} \mid I_t] + \epsilon_t.$$

Noting that

$$x_{t+1} = E[x_{t+1} \mid I_t] + v_t,$$

where v_t is the expectation error, we can write the model as

$$r_t = \beta E[\pi_{t+1} \mid I_t] + \epsilon_t = \beta\pi_{t+1} + (\epsilon_t - \beta v_t) = \beta\pi_{t+1} + u_t.$$

Under rational expectations, the expectation error, v_t, should be orthogonal to the information set, I_t, and for z_t ∈ I_t we have the moment condition

$$E[u_t \cdot z_t] = E[(r_t - \beta\pi_{t+1}) \cdot z_t] = 0.$$

This is enough to identify β.

Method of Moments (MM) Estimator

For a given sample, w_t and z_t (t = 1, 2, ..., T), we cannot calculate the expectation. We replace it with sample averages to obtain the analogous sample moments:

$$g_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} f(w_t, z_t, \theta).$$

We can derive an estimator, θ̂_MM, as the solution to g_T(θ̂_MM) = 0.

To find an estimator, we need at least as many equations as we have parameters. The order condition for identification is R ≥ K.
- R = K is called exact identification. The estimator is denoted the method of moments estimator, θ̂_MM.
- R > K is called over-identification. The estimator is denoted the generalized method of moments estimator, θ̂_GMM.

Example: MM Estimator of the Mean

Assume that y_t is a random variable drawn from a population with expectation μ₀. We have a single moment condition:

$$g(\mu_0) = E[f(y_t, \mu_0)] = E[y_t - \mu_0] = 0,$$

where f(y_t, μ₀) = y_t − μ₀.

For a sample, y_1, y_2, ..., y_T, we state the corresponding sample moment condition:

$$g_T(\hat\mu) = \frac{1}{T}\sum_{t=1}^{T}(y_t - \hat\mu) = 0.$$

The MM estimator of the mean μ₀ is the solution,

$$\hat\mu_{MM} = \frac{1}{T}\sum_{t=1}^{T} y_t,$$

which is the sample average.
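
As a minimal sketch (not from the original slides), the sample moment condition can be checked numerically; the data here are made up for illustration:

```python
import numpy as np

y = np.array([1.2, 0.7, 1.9, 1.1])          # toy sample
mu_mm = y.mean()                             # solves (1/T) sum (y_t - mu) = 0
print(np.isclose(np.sum(y - mu_mm), 0.0))    # the moment condition holds exactly
```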

Example: OLS as an MM Estimator

Consider the linear regression model of y_t on x_t (K × 1):

$$y_t = x_t'\beta_0 + \epsilon_t. \qquad (**)$$

Assume that (∗∗) represents the conditional expectation:

$$E[y_t \mid x_t] = x_t'\beta_0 \quad \text{so that} \quad E[\epsilon_t \mid x_t] = 0.$$

That implies the K unconditional moment conditions

$$g(\beta_0) = E[x_t\epsilon_t] = E[x_t(y_t - x_t'\beta_0)] = 0,$$

which we recognize as the minimal assumption for consistency of the OLS estimator.

We define the corresponding sample moment conditions as

$$g_T(\hat\beta) = \frac{1}{T}\sum_{t=1}^{T} x_t\left(y_t - x_t'\hat\beta\right) = \frac{1}{T}\sum_{t=1}^{T} x_t y_t - \frac{1}{T}\sum_{t=1}^{T} x_t x_t'\,\hat\beta = 0.$$

And the MM estimator is derived as the unique solution,

$$\hat\beta_{MM} = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\sum_{t=1}^{T} x_t y_t,$$

provided that $\sum_{t=1}^{T} x_t x_t'$ is non-singular.

Method of moments is one way to motivate the OLS estimator; it highlights the minimal (or identifying) assumptions for OLS.
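
A small numerical sketch of this derivation on simulated data (the variable names and numbers are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 500, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])  # rows are x_t'
beta0 = np.array([1.0, 0.5, -0.3])
y = X @ beta0 + rng.normal(size=T)           # y_t = x_t' beta0 + eps_t

# Solving (1/T) sum x_t (y_t - x_t' beta) = 0 gives beta = (X'X)^{-1} X'y
beta_mm = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_mm)                               # close to beta0 for large T
```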

Example: Under-Identification

Consider again a regression model

$$y_t = x_t'\beta_0 + \epsilon_t = x_{1t}'\gamma_0 + x_{2t}'\delta_0 + \epsilon_t.$$

Assume that the K₁ variables in x_{1t} are predetermined, while the K₂ = K − K₁ variables in x_{2t} are endogenous. That implies

$$E[x_{1t}\epsilon_t] = 0 \quad (K_1 \times 1) \qquad (I)$$
$$E[x_{2t}\epsilon_t] \neq 0 \quad (K_2 \times 1). \qquad (II)$$

We have K parameters in β₀ = (γ₀', δ₀')', but only K₁ < K moment conditions (i.e. K₁ equations to determine K unknowns). The parameters are not identified and cannot be estimated consistently.

Example: Simple IV Estimator

Assume K₂ new variables, z_{2t}, that are correlated with x_{2t} but uncorrelated with ε_t:

$$E[z_{2t}\epsilon_t] = 0. \qquad (III)$$

The K₂ moment conditions in (III) can replace (II). To simplify notation, we define

$$x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix}_{(K \times 1)} \quad \text{and} \quad z_t = \begin{pmatrix} x_{1t} \\ z_{2t} \end{pmatrix}_{(K \times 1)}.$$

Here x_t are model variables, z_{2t} are new instruments, and z_t are instruments. We say that x_{1t} are instruments for themselves.

Using (I) and (III) we have K moment conditions:

$$g(\beta_0) = \begin{pmatrix} E[x_{1t}\epsilon_t] \\ E[z_{2t}\epsilon_t] \end{pmatrix} = E[z_t\epsilon_t] = E[z_t(y_t - x_t'\beta_0)] = 0,$$

which are sufficient to identify the K parameters in β.

The corresponding sample moment conditions are given by

$$g_T(\hat\beta) = \frac{1}{T}\sum_{t=1}^{T} z_t\left(y_t - x_t'\hat\beta\right) = 0.$$

The method of moments estimator is the unique solution,

$$\hat\beta_{MM} = \left(\sum_{t=1}^{T} z_t x_t'\right)^{-1}\sum_{t=1}^{T} z_t y_t,$$

provided that $\sum_{t=1}^{T} z_t x_t'$ is non-singular.

Note the following:
(1) We need the instruments to identify the parameters.
(2) The MM estimator coincides with the simple IV estimator.
(3) The procedure only works with exactly K₂ instruments (i.e. R = K).
(4) Non-singularity of $\sum_{t=1}^{T} z_t x_t'$ requires relevant instruments.
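
A numerical sketch of the simple IV estimator with one endogenous regressor (all names and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
z2 = rng.normal(size=T)                  # instrument
u = rng.normal(size=T)
x2 = 0.8 * z2 + 0.5 * u                  # endogenous regressor, driven by u
eps = u + 0.3 * rng.normal(size=T)       # error term, correlated with x2
X = np.column_stack([np.ones(T), x2])    # x_t = (1, x_2t)'
Z = np.column_stack([np.ones(T), z2])    # z_t = (1, z_2t)', so R = K = 2
y = X @ np.array([1.0, 2.0]) + eps

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # (sum z_t x_t')^{-1} sum z_t y_t
print(beta_iv)   # near (1, 2); OLS on (X, y) would be biased here
```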

Generalized Method of Moments Estimation

The case R > K is called over-identification: there are more equations than parameters and, in general, no solution to g_T(θ) = 0. Instead we minimize the distance from g_T(θ) to zero, where the distance is measured by the quadratic form

$$Q_T(\theta) = g_T(\theta)'W_T\, g_T(\theta),$$

and W_T is an R × R symmetric and positive definite weight matrix.

The GMM estimator depends on the weight matrix:

$$\hat\theta_{GMM}(W_T) = \arg\min_{\theta}\ g_T(\theta)'W_T\, g_T(\theta).$$

Distances and Weight Matrices

Consider a simple example with two moment conditions,

$$g_T(\theta) = \begin{pmatrix} g_a \\ g_b \end{pmatrix},$$

where the dependence on T and θ is suppressed.

First consider a simple weight matrix, W_T = I₂:

$$Q_T(\theta) = g_T(\theta)'W_T\, g_T(\theta) = \begin{pmatrix} g_a & g_b \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} g_a \\ g_b \end{pmatrix} = g_a^2 + g_b^2,$$

which is the square of the simple distance from g_T(θ) to zero. Here the coordinates are equally important.

Alternatively, look at a different weight matrix:

$$Q_T(\theta) = g_T(\theta)'W_T\, g_T(\theta) = \begin{pmatrix} g_a & g_b \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} g_a \\ g_b \end{pmatrix} = 2g_a^2 + g_b^2,$$

which attaches more weight to the first coordinate in the distance.
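
A two-line numerical illustration of the same point (the moment values are made up):

```python
import numpy as np

g = np.array([0.3, -0.5])                 # (g_a, g_b)
for W in (np.eye(2), np.diag([2.0, 1.0])):
    print(g @ W @ g)                      # 0.34, then 0.43
```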

Consistency: Why Does it Work?

Assume that a law of large numbers (LLN) applies to f(w_t, z_t, θ), i.e.

$$\frac{1}{T}\sum_{t=1}^{T} f(w_t, z_t, \theta) \to E[f(w_t, z_t, \theta)] \quad \text{for } T \to \infty.$$

That requires IID observations or stationarity and weak dependence.

If the moment conditions are correct, g(θ₀) = 0, then GMM is consistent,

$$\hat\theta_{GMM}(W_T) \to \theta_0 \quad \text{as } T \to \infty,$$

for any positive definite W_T.

Intuition: if a LLN applies, then g_T(θ) converges to g(θ). Since θ̂_GMM(W_T) minimizes the distance from g_T(θ) to zero, it will be a consistent estimator of the solution to g(θ₀) = 0. The weight matrix, W_T, has to be positive definite, so that we put a positive, non-zero weight on all moment conditions.

Asymptotic Distribution

Assume a central limit theorem applies to f(w_t, z_t, θ₀), i.e.

$$\sqrt{T}\, g_T(\theta_0) = \frac{1}{\sqrt{T}}\sum_{t=1}^{T} f(w_t, z_t, \theta_0) \to N(0, S),$$

where S is the asymptotic variance.

Then it holds, for any positive definite weight matrix W, that the asymptotic distribution of the GMM estimator is given by

$$\sqrt{T}\left(\hat\theta_{GMM} - \theta_0\right) \to N(0, V).$$

The asymptotic variance is given by

$$V = \left(D'WD\right)^{-1} D'WSWD\left(D'WD\right)^{-1},$$

where

$$D = E\left[\frac{\partial f(w_t, z_t, \theta_0)}{\partial\theta'}\right]$$

is the expected value of the R × K matrix of first derivatives of the moments.

Efficient GMM Estimation

The variance of θ̂_GMM depends on the weight matrix, W_T. The efficient GMM estimator has the smallest possible (asymptotic) variance. Intuition: a moment with small variance is informative and should have a large weight.

It can be shown that the optimal weight matrix, W_T^opt, has the property

$$\text{plim}\ W_T^{opt} = S^{-1}.$$

With the optimal weight matrix, W = S⁻¹, the asymptotic variance simplifies to

$$V = \left(D'S^{-1}D\right)^{-1} D'S^{-1}SS^{-1}D\left(D'S^{-1}D\right)^{-1} = \left(D'S^{-1}D\right)^{-1}.$$

The best moment conditions have a small S and a large D:
- A small S means that the sample variation of the moment (the noise) is small.
- A large D means that the moment condition is strongly violated if θ ≠ θ₀; the moment is very informative on the true value, θ₀.

Hypothesis testing can be based on the asymptotic distribution:

$$\hat\theta_{GMM} \overset{a}{\sim} N\left(\theta_0,\ T^{-1}\hat V\right).$$

An estimator of the asymptotic variance is given by

$$\hat V = \left(D_T'S_T^{-1}D_T\right)^{-1},$$

where

$$D_T = \frac{\partial g_T(\theta)}{\partial\theta'} = \frac{1}{T}\sum_{t=1}^{T}\frac{\partial f(w_t, z_t, \theta)}{\partial\theta'} \qquad (R \times K)$$

is the sample average of the first derivatives, and S_T is an estimator of S = T · V[g_T(θ)]. If the observations are independent, a consistent estimator is

$$S_T = \frac{1}{T}\sum_{t=1}^{T} f(w_t, z_t, \theta)\, f(w_t, z_t, \theta)'.$$

Estimation of the weight matrix is typically the trickiest part of GMM.
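
As a sketch, the variance estimator under independence can be computed as follows, assuming the T × R matrix of moment contributions and the derivative matrix are available (the function and argument names are illustrative):

```python
import numpy as np

def gmm_variance(f_t: np.ndarray, D_T: np.ndarray) -> np.ndarray:
    """f_t: T x R moments f(w_t, z_t, theta_hat); D_T: R x K derivative."""
    T = f_t.shape[0]
    S_T = f_t.T @ f_t / T                               # (1/T) sum f_t f_t'
    V_hat = np.linalg.inv(D_T.T @ np.linalg.solve(S_T, D_T))
    return V_hat / T                                    # Var(theta_hat) estimate
```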

Test of Overidentifying Moment Conditions

Recall that K moment conditions are sufficient to estimate the K parameters in θ. If R > K, we can test the validity of the R − K overidentifying moment conditions: by MM estimation we can set K moment conditions equal to zero, and if all R conditions are valid, then the remaining R − K moments should also be close to zero.

From the CLT we have

$$g_T(\theta_0) \to N\left(0, T^{-1}S\right).$$

If we use the optimal weights, W_T^opt → S⁻¹, then

$$J = T \cdot g_T(\hat\theta_{GMM})'\, W_T^{opt}\, g_T(\hat\theta_{GMM}) = T \cdot Q_T(\hat\theta_{GMM}) \to \chi^2(R - K).$$

This is the J-test, or Hansen test, for overidentifying restrictions; in linear models it is often referred to as the Sargan test.

Note that J is not a test of the validity of the model or the underlying economic theory: it only considers whether the R − K extra moments are in line with the K identifying moments.

Computational Issues

The estimator is defined by minimizing Q_T(θ). Minimization works with the K first-order conditions

$$\frac{\partial Q_T(\theta)}{\partial\theta} = \frac{\partial\left(g_T(\theta)'W_T\, g_T(\theta)\right)}{\partial\theta} = 0, \qquad (K \times 1)$$

which can sometimes be solved analytically but often require numerical optimization.

We need an optimal weight matrix, W_T^opt, but that depends on the parameters! Two-step efficient GMM:
(1) Choose an initial weight matrix, e.g. W_[1] = I_R, and find a consistent but inefficient first-step GMM estimator

$$\hat\theta_{[1]} = \arg\min_{\theta}\ g_T(\theta)'W_{[1]}\, g_T(\theta).$$

(2) Find the optimal weight matrix, W_[2]^opt, based on θ̂_[1], and find the efficient estimator

$$\hat\theta_{[2]} = \arg\min_{\theta}\ g_T(\theta)'W_{[2]}^{opt}\, g_T(\theta).$$

The estimator is not unique, as it depends on the initial weight matrix W_[1].
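
A compact sketch of the two-step procedure (returning the J statistic from the previous slide as well) for a generic moment function; the helper names and the choice of scipy's Nelder-Mead optimizer are illustrative, not part of the slides:

```python
import numpy as np
from scipy.optimize import minimize

def gmm_two_step(moments, theta_init, data):
    """moments(theta, data) returns the T x R matrix of f(w_t, z_t, theta)."""
    def Q(theta, W):
        g = moments(theta, data).mean(axis=0)          # g_T(theta)
        return g @ W @ g

    R = moments(theta_init, data).shape[1]
    # Step 1: identity weights give a consistent but inefficient estimate.
    step1 = minimize(Q, theta_init, args=(np.eye(R),), method="Nelder-Mead")
    # Optimal weights: S_T^{-1} evaluated at the first-step estimate.
    f_t = moments(step1.x, data)
    T = f_t.shape[0]
    W_opt = np.linalg.inv(f_t.T @ f_t / T)
    # Step 2: efficient GMM, plus the Hansen J statistic T * Q_T(theta_hat).
    step2 = minimize(Q, step1.x, args=(W_opt,), method="Nelder-Mead")
    return step2.x, T * step2.fun
```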

Iterated GMM estimator:
From the estimator θ̂_[2] it is natural to update the weights, W_[3]^opt, and update θ̂_[3]. We can switch between estimating W_[·]^opt and θ̂_[·] until convergence. Iterated GMM does not depend on the initial weight matrix, and the two approaches are asymptotically equivalent.

Continuously updated GMM estimator:
A third approach is to recognize from the outset that the weight matrix depends on the parameters, and minimize

$$Q_T(\theta) = g_T(\theta)'W_T(\theta)\, g_T(\theta).$$

That is never possible to solve analytically.
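
A minimal sketch of the continuously updated variant, re-evaluating the weight matrix at every candidate θ (same illustrative conventions as the two-step sketch above):

```python
import numpy as np
from scipy.optimize import minimize

def gmm_cu(moments, theta_init, data):
    def Q(theta):
        f_t = moments(theta, data)
        g = f_t.mean(axis=0)
        S = f_t.T @ f_t / f_t.shape[0]        # W_T(theta) = S_T(theta)^{-1}
        return g @ np.linalg.solve(S, g)
    return minimize(Q, theta_init, method="Nelder-Mead").x
```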

Example: 2SLS

Consider again a regression model

$$y_t = x_t'\beta_0 + \epsilon_t = x_{1t}'\gamma_0 + x_{2t}'\delta_0 + \epsilon_t,$$

where E[x_{1t}ε_t] = 0 and E[x_{2t}ε_t] ≠ 0.

Assume that you have R > K valid instruments in z_t, so that

$$g(\beta_0) = E[z_t\epsilon_t] = E[z_t(y_t - x_t'\beta_0)] = 0.$$

The corresponding sample moments are given by

$$g_T(\beta) = \frac{1}{T}\sum_{t=1}^{T} z_t\left(y_t - x_t'\beta\right) = \frac{1}{T}Z'(Y - X\beta), \qquad (R \times 1)$$

where Y (T × 1), X (T × K), and Z (T × R) are the stacked data matrices.

In this case we cannot solve g_T(β) = 0 directly; Z'X is R × K and not invertible.

Instead, we derive the GMM estimator by minimizing the criterion function

$$Q_T(\beta) = g_T(\beta)'W_T\, g_T(\beta) = \left(T^{-1}Z'(Y - X\beta)\right)'W_T\left(T^{-1}Z'(Y - X\beta)\right) = T^{-2}\left(Y'ZW_TZ'Y - 2\beta'X'ZW_TZ'Y + \beta'X'ZW_TZ'X\beta\right).$$

We take the first derivative, and the GMM estimator is the solution to

$$\frac{\partial Q_T(\beta)}{\partial\beta} = -2T^{-2}X'ZW_TZ'Y + 2T^{-2}X'ZW_TZ'X\beta = 0.$$

We find

$$\hat\beta_{GMM}(W_T) = \left(X'ZW_TZ'X\right)^{-1}X'ZW_TZ'Y,$$

depending on W_T.

To estimate the optimal weight matrix, W_T^opt = S_T^{-1}, we use the estimator

$$S_T = \frac{1}{T}\sum_{t=1}^{T} f(w_t, z_t, \hat\beta)\, f(w_t, z_t, \hat\beta)' = \frac{1}{T}\sum_{t=1}^{T}\hat\epsilon_t^2\, z_t z_t',$$

which allows for general heteroskedasticity of the disturbance term.

For the asymptotic distribution, we recall that

$$\hat\beta_{GMM} \overset{a}{\sim} N\left(\beta_0,\ T^{-1}\left(D'S^{-1}D\right)^{-1}\right).$$

The derivative is given by

$$D_T = \frac{\partial g_T(\beta)}{\partial\beta'} = \frac{\partial\left(T^{-1}\sum_{t=1}^{T} z_t(y_t - x_t'\beta)\right)}{\partial\beta'} = -T^{-1}\sum_{t=1}^{T} z_t x_t', \qquad (R \times K)$$

so the variance of the estimator becomes

$$\hat V\!\left[\hat\beta_{GMM}\right] = T^{-1}\left(D_T'W_T^{opt}D_T\right)^{-1} = T^{-1}\left(\left(T^{-1}\sum_{t=1}^{T} x_t z_t'\right)\left(T^{-1}\sum_{t=1}^{T}\hat\epsilon_t^2 z_t z_t'\right)^{-1}\left(T^{-1}\sum_{t=1}^{T} z_t x_t'\right)\right)^{-1} = \left(\left(\sum_{t=1}^{T} x_t z_t'\right)\left(\sum_{t=1}^{T}\hat\epsilon_t^2 z_t z_t'\right)^{-1}\left(\sum_{t=1}^{T} z_t x_t'\right)\right)^{-1}.$$

Note that this is the heteroskedasticity consistent (HC) variance estimator of White. GMM with allowance for heteroskedastic errors automatically produces heteroskedasticity consistent standard errors!

If we assume that the error terms are IID, the optimal weight matrix simplifies to

$$S_T = \frac{\hat\sigma^2}{T}\sum_{t=1}^{T} z_t z_t' = T^{-1}\hat\sigma^2 Z'Z,$$

where σ̂² is a consistent estimator of σ².

In this case the efficient GMM estimator becomes

$$\hat\beta_{GMM} = \left(X'ZS_T^{-1}Z'X\right)^{-1}X'ZS_T^{-1}Z'Y = \left(X'Z\left(T^{-1}\hat\sigma^2 Z'Z\right)^{-1}Z'X\right)^{-1}X'Z\left(T^{-1}\hat\sigma^2 Z'Z\right)^{-1}Z'Y = \left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}X'Z(Z'Z)^{-1}Z'Y,$$

which is identical to the two-stage least squares (2SLS) estimator.

The variance of the estimator is

$$\hat V\!\left[\hat\beta_{GMM}\right] = T^{-1}\left(D_T'S_T^{-1}D_T\right)^{-1} = \hat\sigma^2\left(X'Z(Z'Z)^{-1}Z'X\right)^{-1},$$

which again coincides with the 2SLS variance.
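
A sketch putting the pieces together: 2SLS via the projection on the instruments, with both the IID-based variance and the White/HC variance from the general GMM formula (X, Z, y stacked as above; the function name is illustrative):

```python
import numpy as np

def two_sls(X, Z, y):
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)     # projection on the instruments
    XPX = X.T @ P @ X                         # X'Z (Z'Z)^{-1} Z'X
    beta = np.linalg.solve(XPX, X.T @ P @ y)
    e = y - X @ beta
    V_iid = (e @ e / len(y)) * np.linalg.inv(XPX)    # sigma^2 (X'PX)^{-1}
    # HC variance: ((sum x z')(sum e^2 z z')^{-1}(sum z x'))^{-1}
    S = (Z * (e ** 2)[:, None]).T @ Z
    A = X.T @ Z
    V_hc = np.linalg.inv(A @ np.linalg.solve(S, A.T))
    return beta, V_iid, V_hc
```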

Pseudo-ML (PML) Estimation

The first-order conditions for ML estimation can be seen as the sample counterpart to a moment condition:

$$\frac{1}{T}\sum_{t=1}^{T} s_t(\theta) = 0 \quad \text{corresponds to} \quad E[s_t(\theta)] = 0,$$

and ML becomes a special case of GMM.

θ̂_ML is consistent under weaker assumptions than those maintained by ML. The FOC for a normal regression model corresponds to

$$E[x_t(y_t - x_t'\beta)] = 0,$$

which is weaker than the assumption that the entire distribution is correctly specified: OLS is consistent even if ε_t is not normal.

An ML estimation that maximizes a likelihood function different from the true model's likelihood is referred to as a pseudo-ML or quasi-ML estimator. Note that the variance matrix is then no longer the inverse information.

(My Unfair) Comparison of ML and GMM

                   Maximum Likelihood                     Generalized Method of Moments
Assumptions:       Full specification.                    Partial specification/weak assumptions.
                   Know the density apart from θ₀.        Moment conditions: E[f(data; θ₀)] = 0.
                                                          Strong economic assumptions.

Efficiency:        Cramér-Rao lower bound                 Efficient based on the moment conditions.
                   (smallest possible variance).          Larger than Cramér-Rao.

Typical approach:  Statistical description of the data.   Estimate deep parameters of an
                   Misspecification testing.              economic model.
                   Restrictions recover economics.

Robustness:        First-order conditions should hold!    Moment conditions should hold!
                   PML is a GMM interpretation of ML.     Weights and variances can
                   Use the larger PML variance.           be made robust.

Example: The C-CAPM Model

Consider the consumption-based capital asset pricing (C-CAPM) model of Hansen and Singleton (1982). A representative agent maximizes the discounted value of lifetime utility subject to a budget constraint:

$$\max\ E\left[\sum_{s=1}^{\infty}\delta^s u(c_{t+s})\ \Big|\ I_t\right] \quad \text{s.t.} \quad A_{t+1} = (1 + r_{t+1})A_t + y_{t+1} - c_{t+1},$$

where A_t is financial wealth, y_t is income, 0 < δ < 1 is a discount factor, and I_t is the information set at time t.

The first-order condition is given by the Euler equation

$$u'(c_t) = E\left[\delta\, u'(c_{t+1})\, R_{t+1}\ \big|\ I_t\right],$$

where u'(·) is the derivative of the utility function and R_{t+1} = 1 + r_{t+1} is the return factor.

Now assume a constant relative risk aversion (CRRA) utility function:

$$u(c_t) = \frac{c_t^{1-\gamma}}{1-\gamma}, \qquad \gamma < 1,$$

so that u'(c_t) = c_t^{−γ}. That gives the explicit Euler equation

$$E\left[\delta\, c_{t+1}^{-\gamma} R_{t+1}\ \big|\ I_t\right] - c_t^{-\gamma} = 0.$$

To ensure stationarity, we reformulate:

$$E\left[\delta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} R_{t+1} - 1\ \Big|\ I_t\right] = 0,$$

which is a conditional moment condition.

That implies the unconditional moment conditions

$$E[f(c_{t+1}, c_t, R_{t+1}; z_t; \delta, \gamma)] = E\left[\left(\delta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} R_{t+1} - 1\right)z_t\right] = 0,$$

for all variables z_t ∈ I_t included in the information set.

To estimate the parameters, θ = (δ, γ)', we need at least R = 2 instruments in z_t. We try with R = 3 instruments:

$$z_t = \left(1,\ \frac{c_t}{c_{t-1}},\ R_t\right)'.$$

That produces the moment conditions

$$E\left[\delta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} R_{t+1} - 1\right] = 0$$
$$E\left[\left(\delta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} R_{t+1} - 1\right)\frac{c_t}{c_{t-1}}\right] = 0$$
$$E\left[\left(\delta\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} R_{t+1} - 1\right)R_t\right] = 0,$$

for t = 1, 2, ..., T.

The model is formally identified but turns out to be poorly determined: weak instruments, little variation in the data, or a wrong model!
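
As a sketch, the C-CAPM moment contributions can be coded in the T × R form used by the GMM sketches above (array names and alignment conventions are illustrative assumptions):

```python
import numpy as np

def ccapm_moments(theta, data):
    delta, gamma = theta
    c, R = data                                    # consumption, return factor
    growth = c[2:] / c[1:-1]                       # c_{t+1} / c_t
    u = delta * growth ** (-gamma) * R[2:] - 1.0   # Euler-equation "error"
    Z = np.column_stack([np.ones_like(u),          # z_t = (1, c_t/c_{t-1}, R_t)'
                         c[1:-1] / c[:-2],
                         R[1:-1]])
    return u[:, None] * Z                          # T x 3 moment contributions
```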

Results for US data, 1959:3 to 1978:12 (standard errors in parentheses):

Method    Weights  Lags  δ̂                γ̂                T    J      DF  p-val
2-Step    HC       1     0.9987 (0.0086)  0.8770 (3.6792)  237  0.434  1   0.510
Iterated  HC       1     0.9982 (0.0044)  1.0249 (1.8614)  237  1.068  1   0.301
CU        HC       1     0.9981 (0.0044)  0.9549 (1.8629)  237  1.067  1   0.302
2-Step    HAC      1     0.9987 (0.0092)  0.8876 (4.0228)  237  0.429  1   0.513
Iterated  HAC      1     0.9980 (0.0045)  0.8472 (1.8757)  237  1.091  1   0.296
CU        HAC      1     0.9977 (0.0045)  0.7093 (1.8815)  237  1.086  1   0.297
2-Step    HC       2     0.9975 (0.0066)  0.0149 (2.6415)  236  1.597  3   0.660
Iterated  HC       2     0.9968 (0.0045)  0.0210 (1.7925)  236  3.579  3   0.311
CU        HC       2     0.9958 (0.0046)  0.5526 (1.8267)  236  3.501  3   0.321
2-Step    HAC      2     0.9970 (0.0068)  0.1872 (2.7476)  236  1.672  3   0.643
Iterated  HAC      2     0.9965 (0.0047)  0.2443 (1.8571)  236  3.685  3   0.298
CU        HAC      2     0.9952 (0.0048)  0.9094 (1.9108)  236  3.591  3   0.309
