GMM Stata Implementation ESS 2017

GMM: Stata implementation and tests
Giovanni Bruno
Bocconi University
Econometrics - ESS, 2016-2017

1 Stata implementation
Notation
ivregress 2sls
ivregress gmm
2 Specification tests
Non-robust Hausman test
Robust Hausman test
Hansen-Sargan test of overidentifying restrictions
Testing for weak instruments
Limited Information ML (LIML)
3 Panel data models with strictly exogenous instruments

Notation
$z$ contains all available exogenous variables ==> if some of the $x$ are exogenous, say $x_1$, they have to be part of $z$, and we write $z = (x_1\ z_1)$, referring to
  $x_1$ as included exogenous variables, with $k_1$ = the number of such variables
  $z_1$ as excluded exogenous variables (strictly speaking, the instrumental variables for many authors), with $L_1$ = the number of such variables
The remaining variables in $x$, say $x_2$, are the endogenous variables in the model, with $k_2$ = the number of such variables
==> $k = k_1 + k_2$ and $L = k_1 + L_1$
==> The necessary order condition $L \geq k$ is satisfied if and only if the number of endogenous variables is no greater than the number of excluded exogenous variables, that is, $L_1 \geq k_2$
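As a worked count, consider the drug-expenditure example used in these slides, where hi_empunion is the single endogenous regressor. The included exogenous variables cancel from both sides of the order condition, so only $L_1$ and $k_2$ matter:

```latex
\begin{align*}
L - k &= (k_1 + L_1) - (k_1 + k_2) = L_1 - k_2, \\
\text{instrument ssiratio only:}\quad L_1 - k_2 &= 1 - 1 = 0 \quad \text{(just identified)}, \\
\text{instruments ssiratio and multlc:}\quad L_1 - k_2 &= 2 - 1 = 1 \quad \text{(one overidentifying restriction)}.
\end{align*}
```

The second case agrees with the single degree of freedom of the Hansen J statistic reported for the overidentified model in the section on overidentifying restrictions.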
ivregress 2sls
TSLS is implemented by the command ivregress 2sls followed by the name of the dependent variable y, the names of the included exogenous variables x1 and, within parentheses, the names of the endogenous variables x2 to the left of the equals sign = and the names of the excluded exogenous variables z1 to the right of =, as follows
ivregress 2sls depvar indepvars (endog_vars = instruments), options
. * IV estimation of a just-identified model with single endog regressor

.
. ivregress 2sls ldrugexp (hi_empunion = ssiratio) totchr age female blhisp linc, first
First-stage regressions

                                                  Number of obs   =      10089
                                                  F(  6, 10082)   =     138.32
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.0761
                                                  Adj R-squared   =     0.0755
                                                  Root MSE        =     0.4672

 hi_empunion |      Coef.  Std. Err.        t   P>|t|  [95% Conf.   Interval]
-------------+---------------------------------------------------------------
      totchr |   .0127865   .0036225     3.53   0.000    .0056856    .0198874
         age |  -.0086323    .000713   -12.11   0.000     -.01003   -.0072347
      female |    -.07345   .0094932    -7.74   0.000   -.0920586   -.0548414
      blhisp |    -.06268   .0127687    -4.91   0.000   -.0877091   -.0376509
        linc |   .0483937   .0056768     8.52   0.000    .0372661    .0595212
    ssiratio |  -.1916432   .0141289   -13.56   0.000   -.2193387   -.1639477
       _cons |   1.028981   .0574094    17.92   0.000    .9164466    1.141514

Instrumental variables (2SLS) regression          Number of obs   =      10089
                                                  Wald chi2(6)    =    1919.06
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.0640
                                                  Root MSE        =     1.3177

    ldrugexp |      Coef.  Std. Err.        z   P>|z|  [95% Conf.   Interval]
-------------+---------------------------------------------------------------
 hi_empunion |  -.8975913   .2079185    -4.32   0.000   -1.305104   -.4900786
      totchr |   .4502655   .0104189    43.22   0.000    .4298449    .4706861
         age |  -.0132176   .0028749    -4.60   0.000   -.0188523   -.0075829
      female |   -.020406   .0315408    -0.65   0.518   -.0822249    .0414129
      blhisp |  -.2174244   .0386745    -5.62   0.000   -.2932249   -.1416238
        linc |   .0870018   .0220144     3.95   0.000    .0438543    .1301493
       _cons |    6.78717   .2554343    26.57   0.000    6.286528    7.287812

Instrumented:  hi_empunion
Instruments:   totchr age female blhisp linc ssiratio
.
ivregress gmm
GMM estimation of the linear model is implemented by ivregress gmm, with the same syntax as for TSLS, as follows
ivregress gmm depvar indepvars (endog_vars = instruments), options
The GMM weighting matrix
The weighting matrix in the optimal two-step GMM estimator is
\[ A = \left( \frac{Z' S Z}{n} \right)^{-1}. \tag{1} \]
$A$ is a consistent estimate of the inverse of $\mathrm{Var}\left( \frac{1}{\sqrt{n}} Z'\varepsilon \right)$, the variance-covariance matrix of the sample moments. Choices of $S$ are the following:
If $\varepsilon$ is homoskedastic and independent, $S = I$ (the resulting GMM estimator collapses to TSLS). It is implemented through the ivregress gmm option wmatrix(unadjusted).
If $\varepsilon$ is heteroskedastic and independent, $S$ is diagonal:
\[ S = \begin{pmatrix} e_1^2 & 0 & \cdots & 0 \\ 0 & e_2^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & e_n^2 \end{pmatrix} \]
with $e_i = y_i - x_i'\hat{\beta}_{TSLS}$, $i = 1, \ldots, n$. It is implemented through the ivregress gmm option wmatrix(robust). It is the default.
If errors are clustered, then $S$ is a block-diagonal matrix with generic block equal to the outer product of the residuals belonging to the corresponding cluster. Residuals are taken from a one-step consistent regression (TSLS):
\[ S = \begin{pmatrix} S_1 & 0 & \cdots & 0 \\ 0 & S_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & S_N \end{pmatrix} \]
with $S_i = e_i e_i'$, where $e_i = y_i - x_i'\hat{\beta}_{TSLS}$ is the vector of residual observations belonging to cluster $i = 1, \ldots, N$. It is implemented through the ivregress gmm option wmatrix(cluster cluster_var).
With time-series data and $\varepsilon$ heteroskedastic and serially correlated, the optimal weighting matrix $A$ may be assembled using the Newey-West heteroskedasticity- and autocorrelation-consistent (HAC) estimator. This is implemented by specifying wmatrix(hac kernel #), which requests a weighting matrix using the specified kernel (see below) with # lags. The bandwidth of a kernel is equal to the number of lags plus one.
Specifying wmatrix(hac kernel opt) requests an HAC weighting matrix using the specified kernel, with the lag order selected by Newey and West's (1994) optimal lag-selection algorithm. Specifying wmatrix(hac kernel) requests an HAC weighting matrix using the specified kernel and n-2 lags, where n is the sample size.
There are three kernels available for HAC weighting matrices:
  bartlett or nwest requests the Bartlett (Newey-West) kernel;
  parzen or gallant requests the Parzen (Gallant 1987) kernel; and
  quadraticspectral or andrews requests the quadratic spectral (Andrews 1991) kernel.
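The three ways of setting the lag order can be sketched on simulated data. Everything in the block below (variable names, sample size, coefficient values, the AR(1) error) is an illustrative assumption and not part of the drug-expenditure example; the data must be declared as a time series with tsset before an HAC weighting matrix can be requested.

```stata
* Simulated time series: all names and parameter values are hypothetical
clear
set seed 2017
set obs 200
generate t = _n
tsset t
generate z1 = rnormal()
generate z2 = rnormal()
generate x1 = rnormal()
generate v  = rnormal()
generate e  = rnormal()
* Structural error: correlated with v (so x2 is endogenous) and AR(1)
generate u  = 0.5*v + e
replace  u  = 0.5*u[_n-1] + 0.5*v + e in 2/l
generate x2 = 0.8*z1 + 0.5*z2 + v
generate y  = 1 + 0.5*x1 + x2 + u

* Bartlett (Newey-West) kernel with 2 lags
ivregress gmm y x1 (x2 = z1 z2), wmatrix(hac bartlett 2)
* Same kernel, lag order chosen by the Newey-West (1994) algorithm
ivregress gmm y x1 (x2 = z1 z2), wmatrix(hac bartlett opt)
* Parzen kernel with 2 lags
ivregress gmm y x1 (x2 = z1 z2), wmatrix(hac parzen 2)
```

Omitting the lag argument, as in wmatrix(hac bartlett), would fall back on the n-2 lags described above, which is rarely a sensible choice in a sample of this size.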
Iterative GMM
The GMM procedure can be iterated by adding the option igmm to ivregress gmm. The resulting estimator is asymptotically equivalent to the two-step estimator. Hall (2005) suggests that it may have better finite-sample performance.
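A minimal sketch of the weighting-matrix options side by side, run on simulated data rather than on the data set used in these slides. The variable names, the cluster identifier id, and all parameter values are assumptions made for illustration only.

```stata
* Simulated cross-section: all names and parameter values are hypothetical
clear
set seed 2017
set obs 1000
generate id = ceil(_n/10)                      // 100 clusters of 10 observations
generate z1 = rnormal()
generate z2 = rnormal()
generate x1 = rnormal()
generate v  = rnormal()
generate u  = (0.5*v + rnormal())*exp(0.3*z1)  // endogenous and heteroskedastic
generate x2 = 0.8*z1 + 0.5*z2 + v
generate y  = 1 + 0.5*x1 + x2 + u

quietly ivregress 2sls y x1 (x2 = z1 z2)
estimates store tsls
quietly ivregress gmm y x1 (x2 = z1 z2), wmatrix(unadjusted)
estimates store gmm_unadj
quietly ivregress gmm y x1 (x2 = z1 z2), wmatrix(robust)
estimates store gmm_het
quietly ivregress gmm y x1 (x2 = z1 z2), wmatrix(cluster id)
estimates store gmm_clu
quietly ivregress gmm y x1 (x2 = z1 z2), wmatrix(robust) igmm
estimates store gmm_iter

* Coefficients with standard errors underneath, one column per estimator
estimates table tsls gmm_unadj gmm_het gmm_clu gmm_iter, b(%9.5f) se
```

The tsls and gmm_unadj columns should report the same coefficients, reflecting the collapse of GMM to TSLS when S = I.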
Robust standard errors
The less efficient, but computationally simpler and still consistent, TSLS estimator is often used in estimation. Its robust variance-covariance matrix $\mathrm{Var}(\hat{\beta}_{TSLS})$ is consistently estimated as
\[ \widehat{\mathrm{Var}}(\hat{\beta}_{TSLS}) = \left( X' P_{[Z]} X \right)^{-1} X' P_{[Z]} S P_{[Z]} X \left( X' P_{[Z]} X \right)^{-1}, \]
where $S$ is chosen according to the various departures from homoskedasticity and independence spelled out above. The Stata implementation of the variance-covariance estimators is through the following ivregress options: vce(unadjusted), vce(robust), vce(cluster cluster_var), vce(hac kernel ...)
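The vce() options change only the standard errors, not the TSLS point estimates. A minimal sketch on simulated data follows; the variable names, the cluster identifier id, and the parameter values are all hypothetical.

```stata
* Simulated cross-section: all names and parameter values are hypothetical
clear
set seed 2017
set obs 1000
generate id = ceil(_n/10)                      // 100 clusters of 10 observations
generate z1 = rnormal()
generate z2 = rnormal()
generate x1 = rnormal()
generate v  = rnormal()
generate u  = (0.5*v + rnormal())*exp(0.3*z1)  // endogenous and heteroskedastic
generate x2 = 0.8*z1 + 0.5*z2 + v
generate y  = 1 + 0.5*x1 + x2 + u

quietly ivregress 2sls y x1 (x2 = z1 z2)       // vce(unadjusted) is the 2sls default
estimates store se_unadj
quietly ivregress 2sls y x1 (x2 = z1 z2), vce(robust)
estimates store se_robust
quietly ivregress 2sls y x1 (x2 = z1 z2), vce(cluster id)
estimates store se_cluster

* Identical coefficients in every column; only the standard errors differ
estimates table se_unadj se_robust se_cluster, b(%9.5f) se
```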
Results for four GMM estimators

    Variable |      TwoSLS     GMM_het     GMM_clu   TwoSLS_~f
-------------+------------------------------------------------
 hi_empunion |    -0.98993    -0.99328    -1.03587    -0.98993
             |     0.20459     0.20467     0.20438     0.19221
      totchr |     0.45121     0.45095     0.44822     0.45121
             |     0.01031     0.01031     0.01325     0.01051
         age |    -0.01414    -0.01415    -0.01185    -0.01414
             |     0.00290     0.00290     0.00626     0.00278
      female |    -0.02784    -0.02817    -0.02451    -0.02784
             |     0.03217     0.03219     0.02919     0.03117
      blhisp |    -0.22371    -0.22310    -0.20907    -0.22371
             |     0.03958     0.03960     0.05018     0.03870
        linc |     0.09427     0.09446     0.09573     0.09427
             |     0.02188     0.02190     0.01474     0.02123
       _cons |     6.87519     6.87782     6.72769     6.87519
             |     0.25789     0.25800     0.50588     0.24528
--------------------------------------------------------------
                                                  legend: b/se
.
Non-robust Hausman test
We test exogeneity of $X_2$, maintaining instrument validity: $E(\varepsilon|Z) = 0$, which implies $E(\varepsilon|X_1) = 0$
==> $H_0: E(\varepsilon|X_1\ X_2) = E(\varepsilon|X_1)$
A conventional Hausman test ([2]) can be implemented, based on Hausman's statistic measuring the statistical difference between the IV and OLS estimates. It would not be robust to heteroskedastic or clustered errors, though.
Robust Hausman test
A robust version of the test is implemented through the control-function approach, recasting endogeneity as a misspecification problem in the structural equation (see [5] and [1])
\[ y = X\beta + \varepsilon, \tag{2} \]
$X = (X_1\ X_2)$, $\beta = (\beta_1'\ \beta_2')'$, $Z = (X_1\ Z_1)$ and $\varepsilon = u + \nu\pi$.
$u$ satisfies $E(u|X) = 0$, and $\nu$ is the $n \times k_2$ matrix of the errors in the $k_2$ first-stage equations. NB: $\nu$ is what makes $X_2$ endogenous.
Replacing $\nu$ in (2) with the residuals from the first-stage regressions, $\hat{\nu} = M_{[Z]} X_2$ (with $M_{[Z]} \equiv I - P_{[Z]}$), makes the H test a simple test of joint significance for $\pi$ in the auxiliary OLS regression
\[ y = X\beta + M_{[Z]} X_2 \pi + u. \tag{3} \]
The test works since, under the alternative of $\pi \neq 0$, OLS estimation of the auxiliary regression yields the TSLS estimator of $\beta$.
The H test can easily be made robust to heteroskedasticity and/or clustered errors by testing the joint significance of $\pi$ via test, after estimating (3) with regress and a suitable robust option:
  with heteroskedasticity: vce(robust)
  with heteroskedasticity and cluster correlation: vce(cluster clustervar).
The above is not necessary, though. The various versions of the H test can be implemented directly in Stata through the ivregress postestimation command estat endogenous.
. * Robust Durbin-Wu-Hausman test of endogeneity implemented by estat endogenous

. ivregress 2sls ldrugexp (hi_empunion = ssiratio) $xlist, vce(robust)

                                                  Wald chi2(6)    =    2000.86
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.0640
                                                  Root MSE        =     1.3177

             |               Robust
    ldrugexp |      Coef.  Std. Err.        z   P>|z|  [95% Conf.   Interval]
-------------+---------------------------------------------------------------
 hi_empunion |  -.8975913   .2211268    -4.06   0.000   -1.330992   -.4641908
      totchr |   .4502655   .0101969    44.16   0.000      .43028     .470251
         age |  -.0132176   .0029977    -4.41   0.000   -.0190931   -.0073421
      female |   -.020406   .0326114    -0.63   0.531   -.0843232    .0435113
      blhisp |  -.2174244   .0394944    -5.51   0.000    -.294832   -.1400167
        linc |   .0870018   .0226356     3.84   0.000    .0426368    .1313668
       _cons |    6.78717   .2688453    25.25   0.000    6.260243    7.314097

Instrumented:  hi_empunion
Instruments:   totchr age female blhisp linc ssiratio
.
. estat endogenous
Tests of endogeneity
Ho: variables are exogenous

Robust score chi2(1)            =   24.935  (p = 0.0000)
Robust regression F(1,10081)    =  26.4333  (p = 0.0000)
.
. estat endogenous, forcenonrobust
Tests of endogeneity
Ho: variables are exogenous

Durbin (score) chi2(1)          =  25.2819  (p = 0.0000)
Wu-Hausman F(1,10081)           =  25.3253  (p = 0.0000)
Robust score chi2(1)            =   24.935  (p = 0.0000)
Robust regression F(1,10081)    =  26.4333  (p = 0.0000)
.
. * Robust Durbin-Wu-Hausman test of endogeneity implemented manually

. global xlist totchr age female blhisp linc
. quietly regress hi_empunion ssiratio $xlist
. quietly predict v1hat, resid
. quietly regress ldrugexp hi_empunion v1hat $xlist, vce(robust)
. test v1hat
( 1) v1hat = 0
F( 1, 10081) = 26.43
Prob > F = 0.0000
.
Hansen-Sargan Test
If the population moment conditions are true, then the minimized GMM criterion function $Q(\hat{\beta}_{TSLS})$ should not be significantly different from zero. This provides a test for the validity of the $L - k$ over-identifying moment conditions based on the following statistic (Hansen-Sargan test)
\[ S \equiv nQ(\hat{\beta}_{TSLS}) \overset{a}{\sim} \chi^2(L - k). \]
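The slides do not write out the criterion function $Q$. Assuming the standard quadratic form in the sample moments, weighted by the matrix $A$ of equation (1), the statistic can be spelled out as follows; here $\hat{e}$ denotes the residual vector at the estimate, and $S$ inside the quadratic form is the error-covariance choice from the weighting-matrix discussion, not the test statistic itself.

```latex
\begin{align*}
Q(\beta) &= \left[ \tfrac{1}{n} Z'(y - X\beta) \right]' A \left[ \tfrac{1}{n} Z'(y - X\beta) \right],
\qquad A = \left( \tfrac{1}{n} Z' S Z \right)^{-1}, \\
n\,Q(\hat{\beta}) &= \hat{e}' Z \left( Z' S Z \right)^{-1} Z' \hat{e},
\qquad \hat{e} = y - X\hat{\beta}.
\end{align*}
```

For the chi-squared approximation to hold, $Z'SZ/n$ must estimate the variance of the sample moments in levels, so under homoskedasticity $S$ has to carry the error-variance estimate (that is, $\hat{\sigma}^2 I$ rather than $I$). In a just-identified model $L - k = 0$, the sample moments are set exactly to zero, and there is nothing to test.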
. * Test of overidentifying restrictions following ivregress gmm

. quietly ivregress gmm ldrugexp (hi_empunion = ssiratio multlc) ///
> $xlist, wmatrix(robust)
.
. estat overid
Test of overidentifying restriction:
Hansen's J chi2(1) = 1.04754 (p = 0.3061)
.
Testing for weak instruments
Staiger and Stock's rule of thumb: partial F tests in the first-stage regression > 10. It is not rigorous, rejects weak instruments too often, and has no obvious implementation when there is more than one endogenous variable.
Stock and Yogo's (2005) two tests overcome all of the above difficulties. Both are based on the minimum eigenvalue of a matrix analog of the partial F test, a statistic introduced by Cragg and Donald (1993) to test nonidentification.
Importantly, the large-sample properties of both tests have been derived under the assumption of homoskedastic and independent errors: caution must be taken.
Both procedures are implemented by the ivregress postestimation command estat firststage.
. * Weak instrument tests - just-identified model

. quietly ivregress 2sls ldrugexp (hi_empunion = ssiratio) $xlist, vce(robust)
.
. estat firststage, forcenonrobust all /// implements the Stock and
> ///Yogo (2005) weak instrument tests
>
First-stage regression summary statistics

             |            Adjusted     Partial       Robust
    Variable |    R-sq.      R-sq.       R-sq.   F(1,10082)   Prob > F
-------------+--------------------------------------------------------
 hi_empunion |   0.0761     0.0755      0.0179      65.7602     0.0000

Minimum eigenvalue statistic = 183.98

Critical Values                       # of endogenous regressors:    1
Ho: Instruments are weak              # of excluded instruments:     1

                                   |     5%     10%     20%     30%
2SLS relative bias                 |         (not available)

                                   |    10%     15%     20%     25%
2SLS Size of nominal 5% Wald test  |  16.38    8.96    6.66    5.53
LIML Size of nominal 5% Wald test  |  16.38    8.96    6.66    5.53
Montiel Olea and Pflueger (2013) derive a test for weak instruments that extends Stock and Yogo's to heteroskedasticity and cluster correlation. It is implemented in Stata by the user-written command weakivtest, run after ivregress.
. weakivtest /// weakivtest (user-written)

> /// implements the weak
> /// instrument test of Montiel Olea and
> /// Pflueger (2013).
> /// extends Stock and Yogo (2005)
> /// to accommodate heteroskedasticity
> /// and cluster correlation
> /// It is a postestimation command for
> /// ivregress.
>
(obs=10089)
Montiel-Pflueger robust weak instrument test

Effective F statistic:       65.760
Confidence level alpha:          5%

Critical Values              TSLS      LIML
% of Worst Case Bias
tau=5%                     37.418    37.418
tau=10%                    23.109    23.109
tau=20%                    15.062    15.062
tau=30%                    12.039    12.039
.
Limited Information ML (LIML)
LIML is an ML estimator maintaining that the errors in the structural and first-stage equations are jointly normal. It is not full-information ML because it is based on reduced-form first-stage equations, rather than fully specified structural equations for the included endogenous variables. It often has better finite-sample properties, but is less robust than TSLS. It is implemented by ivregress liml.
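A minimal sketch comparing LIML with TSLS in an overidentified model, on simulated data. The variable names and parameter values are hypothetical and chosen only so that the block runs on its own.

```stata
* Simulated cross-section: all names and parameter values are hypothetical
clear
set seed 2017
set obs 500
generate z1 = rnormal()
generate z2 = rnormal()
generate x1 = rnormal()
generate v  = rnormal()
generate u  = 0.5*v + rnormal()     // correlation with v makes x2 endogenous
generate x2 = 0.8*z1 + 0.5*z2 + v
generate y  = 1 + 0.5*x1 + x2 + u

quietly ivregress 2sls y x1 (x2 = z1 z2)
estimates store tsls
quietly ivregress liml y x1 (x2 = z1 z2)
estimates store liml

* Coefficients with standard errors underneath
estimates table tsls liml, b(%9.5f) se
```

With two excluded instruments for one endogenous regressor the two columns generally differ; dropping z2 makes the model just identified, in which case LIML and TSLS coincide.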
Panel data models with strictly exogenous instruments
The conventional panel-data transformations, group-mean deviations and partial deviations, can be applied to yield consistent panel-data IV-GMM estimators only if there is a matrix $Z$ of strictly exogenous variables.
The FE-TSLS estimator is simply computed by applying TSLS to the variables y, Z and X transformed into group-mean deviations.
The RE-TSLS estimator is simply computed by applying TSLS to the variables y, Z and X transformed into partial deviations.
For both estimators to be consistent, it is required that $E(\varepsilon|Z) = 0$.
FE-TSLS and RE-TSLS are implemented in Stata by xtivreg with the options fe and re (the default), respectively. Otherwise the syntax is the same as that of ivregress 2sls.
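The equivalence between FE-TSLS and TSLS on group-mean deviations can be checked on a simulated panel. This is a sketch under stated assumptions: the panel dimensions, variable names, and parameter values are all hypothetical, and the d_ prefix is just a naming choice for the transformed variables.

```stata
* Simulated balanced panel: all names and parameter values are hypothetical
clear
set seed 2017
set obs 200                          // 200 panel units
generate id = _n
generate a  = rnormal()              // time-invariant unit effect
expand 5                             // 5 periods per unit
bysort id: generate t = _n
xtset id t
generate z1 = rnormal()
generate z2 = rnormal()
generate x1 = rnormal()
generate v  = rnormal()
generate u  = 0.5*v + rnormal()      // correlation with v makes x2 endogenous
generate x2 = 0.8*z1 + 0.5*z2 + v + a
generate y  = 1 + 0.5*x1 + x2 + a + u

xtivreg y x1 (x2 = z1 z2), fe        // FE-TSLS
xtivreg y x1 (x2 = z1 z2), re        // RE-TSLS (re is the default)

* FE-TSLS by hand: TSLS on group-mean deviations, no constant
foreach var of varlist y x1 x2 z1 z2 {
    bysort id: egen double m_`var' = mean(`var')
    generate double d_`var' = `var' - m_`var'
}
ivregress 2sls d_y d_x1 (d_x2 = d_z1 d_z2), noconstant
```

The slope coefficients from the last command should match those from xtivreg, fe; the reported standard errors need not match, since the manual regression does not account for the degrees of freedom used up by the group means.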
[1] A. C. Cameron and P. K. Trivedi. Microeconometrics Using Stata, Revised Edition. Stata Press, College Station, TX, 2010.
[2] J. Hausman. Specification tests in econometrics. Econometrica, 46:1251-1271, 1978.
[3] J. L. Montiel Olea and C. Pflueger. A robust test for weak instruments. Journal of Business & Economic Statistics, pages 358-369, 2013.
[4] J. H. Stock and M. Yogo. Testing for weak instruments in linear IV regression. In D. W. Andrews and J. H. Stock, editors, Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, pages 80-108. Cambridge University Press, Cambridge, 2005.
[5] J. M. Wooldridge. Econometric Analysis of Cross Section and Panel Data. The MIT Press, Cambridge, MA, 2nd edition, 2010.
