
Stata implementation Specification tests Panel data models with strictly exogenous instruments

GMM: Stata implementation and tests

Giovanni Bruno
Bocconi University

Econometrics - ESS, 2016-2017



1 Stata implementation
Notation
ivregress 2sls
ivregress gmm

2 Specification tests
Non-robust Hausman test
Robust Hausman test
Hansen-Sargan test of overidentifying restrictions
Testing for weak instruments
Limited Information ML (LIML)

3 Panel data models with strictly exogenous instruments



Notation
z contains all available exogenous variables $\Rightarrow$ if some of the x are exogenous, say $x_1$, they have to be part of z, and we write $z = (x_1\ z_1)$, referring to
$x_1$ as the included exogenous variables, with $k_1$ = the number of such variables;
$z_1$ as the excluded exogenous variables (strictly speaking, the instrumental variables for many authors), with $L_1$ = the number of such variables.
The remaining variables in x, say $x_2$, are the endogenous variables in the model, with $k_2$ = the number of such variables
$\Rightarrow k = k_1 + k_2$ and $L = k_1 + L_1$
$\Rightarrow$ The necessary order condition $L \ge k$ is satisfied if and only if the number of endogenous variables is no greater than the number of excluded exogenous variables, that is, $L_1 \ge k_2$.

ivregress 2sls

TSLS is implemented by the command ivregress 2sls followed by the name of the dependent variable y, the names of the included exogenous variables x1 and, within parentheses, the names of the endogenous variables x2 to the left of the equal sign = and the names of the excluded exogenous variables z1 to the right of =, as follows:
ivregress 2sls depvar indepvars (endog_vars = instruments), options
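As a minimal sketch of what ivregress 2sls computes, the numpy code below applies the TSLS formula $\hat b_{TSLS} = (X'P_{[Z]}X)^{-1}X'P_{[Z]}y$ to simulated data. The function name `tsls` and the simulated design are illustrative assumptions, not the data used in the logs; Stata's own implementation also handles collinearity, weights and reporting, which this sketch ignores.

```python
import numpy as np


def tsls(y, X, Z):
    """TSLS estimate b = (X'P_Z X)^{-1} X'P_Z y.

    X = (X1 X2) holds all regressors and Z = (X1 Z1) all instruments;
    both include the constant column.
    """
    # First stage: fitted values P_Z X from regressing each column of X on Z
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: X'P_Z X = Xhat'X and X'P_Z y = Xhat'y
    return np.linalg.solve(Xhat.T @ X, Xhat.T @ y)


# Simulated just-identified model (illustrative design)
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)              # included exogenous regressor
z1 = rng.normal(size=n)              # excluded instrument
v = rng.normal(size=n)               # first-stage error
x2 = 0.5 * x1 + 0.8 * z1 + v         # endogenous regressor
u = 0.6 * v + rng.normal(size=n)     # structural error, correlated with v
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
Z = np.column_stack([np.ones(n), x1, z1])
b_tsls = tsls(y, X, Z)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("TSLS:", b_tsls.round(3), "OLS:", b_ols.round(3))
```

In this just-identified design TSLS coincides with the simple IV estimator $(Z'X)^{-1}Z'y$, while OLS is biased upward for the coefficient on x2 because the structural error loads on the first-stage error v.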

ivregress 2sls

. * IV estimation of a just-identified model with single endog regressor


.
. ivregress 2sls ldrugexp (hi_empunion = ssiratio) totchr age female blhisp linc, first

First-stage regressions

Number of obs = 10089


F( 6, 10082) = 138.32
Prob > F = 0.0000
R-squared = 0.0761
Adj R-squared = 0.0755
Root MSE = 0.4672

hi_empunion Coef. Std. Err. t P>|t| [95% Conf. Interval]

totchr .0127865 .0036225 3.53 0.000 .0056856 .0198874


age -.0086323 .000713 -12.11 0.000 -.01003 -.0072347
female -.07345 .0094932 -7.74 0.000 -.0920586 -.0548414
blhisp -.06268 .0127687 -4.91 0.000 -.0877091 -.0376509
linc .0483937 .0056768 8.52 0.000 .0372661 .0595212
ssiratio -.1916432 .0141289 -13.56 0.000 -.2193387 -.1639477
_cons 1.028981 .0574094 17.92 0.000 .9164466 1.141514


ivregress 2sls

Instrumental variables (2SLS) regression Number of obs = 10089


Wald chi2(6) = 1919.06
Prob > chi2 = 0.0000
R-squared = 0.0640
Root MSE = 1.3177

ldrugexp Coef. Std. Err. z P>|z| [95% Conf. Interval]

hi_empunion -.8975913 .2079185 -4.32 0.000 -1.305104 -.4900786


totchr .4502655 .0104189 43.22 0.000 .4298449 .4706861
age -.0132176 .0028749 -4.60 0.000 -.0188523 -.0075829
female -.020406 .0315408 -0.65 0.518 -.0822249 .0414129
blhisp -.2174244 .0386745 -5.62 0.000 -.2932249 -.1416238
linc .0870018 .0220144 3.95 0.000 .0438543 .1301493
_cons 6.78717 .2554343 26.57 0.000 6.286528 7.287812

Instrumented: hi_empunion
Instruments: totchr age female blhisp linc ssiratio

.

ivregress gmm

GMM estimation of the linear model is implemented by ivregress gmm, with the same syntax as for TSLS:
ivregress gmm depvar indepvars (endog_vars = instruments), options

ivregress gmm

The GMM weighting matrix

The weighting matrix in the optimal two-step GMM estimator is
$$A = \left(\frac{1}{n} Z'SZ\right)^{-1}. \quad (1)$$
A is a consistent estimate of the inverse of $\mathrm{Var}\left(\frac{1}{\sqrt{n}} Z'\varepsilon\right)$, the variance-covariance matrix of the sample moments. Choices of S are the following:

ivregress gmm

If $\varepsilon$ is homoskedastic and independent, S = I, up to a scale factor that does not affect the estimator (the resulting GMM estimator collapses to TSLS). This is implemented through the ivregress gmm option wmatrix(unadjusted).
If $\varepsilon$ is heteroskedastic and independent, S is diagonal:
$$S = \begin{pmatrix} e_1^2 & 0 & \cdots & 0 \\ 0 & e_2^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & e_n^2 \end{pmatrix}$$
with $e_i = y_i - x_i'\hat b_{TSLS}$, $i = 1, \dots, n$. This is implemented through the ivregress gmm option wmatrix(robust), which is the default.

ivregress gmm

If errors are clustered, then S is a block-diagonal matrix whose generic block is the outer product of the residuals belonging to the corresponding cluster. Residuals are taken from a consistent one-step regression (TSLS):
$$S = \begin{pmatrix} S_1 & 0 & \cdots & 0 \\ 0 & S_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & S_N \end{pmatrix}$$
with $S_i = e_i e_i'$, where $e_i = y_i - X_i\hat b_{TSLS}$ is the vector of residual observations belonging to cluster $i = 1, \dots, N$. This is implemented through the ivregress gmm option wmatrix(cluster cluster_var).
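The two-step estimator with the weighting matrix (1) can be sketched in numpy as below, with S chosen as in the three cases above (unadjusted, robust, cluster). This is an illustration under stated assumptions, not Stata's code: the helper names `tsls`, `zsz` and `gmm2`, the simulated data and the hypothetical cluster identifiers `groups` are made up for the example, and no finite-sample corrections are applied.

```python
import numpy as np


def tsls(y, X, Z):
    """One-step TSLS estimate (X'P_Z X)^{-1} X'P_Z y."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.solve(Xhat.T @ X, Xhat.T @ y)


def zsz(Z, e, kind="robust", clusters=None):
    """Z'SZ for the choices of S above, given one-step TSLS residuals e."""
    n = Z.shape[0]
    if kind == "unadjusted":   # S = sigma^2 I (the scale does not affect the estimator)
        return (e @ e / n) * (Z.T @ Z)
    if kind == "robust":       # S = diag(e_1^2, ..., e_n^2)
        return (Z * e[:, None] ** 2).T @ Z
    if kind == "cluster":      # S block diagonal with blocks e_g e_g'
        out = np.zeros((Z.shape[1], Z.shape[1]))
        for g in np.unique(clusters):
            s_g = Z[clusters == g].T @ e[clusters == g]   # Z_g' e_g
            out += np.outer(s_g, s_g)
        return out
    raise ValueError(f"unknown kind: {kind}")


def gmm2(y, X, Z, kind="robust", clusters=None):
    """Two-step GMM: (X'Z A Z'X)^{-1} X'Z A Z'y with A = (Z'SZ/n)^{-1}."""
    n = Z.shape[0]
    e = y - X @ tsls(y, X, Z)                          # step 1: TSLS residuals
    A = np.linalg.inv(zsz(Z, e, kind, clusters) / n)   # step 2: weighting matrix (1)
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ A @ XZ.T, XZ @ A @ (Z.T @ y))


# Simulated over-identified model: L1 = 2 excluded instruments, k2 = 1
rng = np.random.default_rng(1)
n = 2000
groups = rng.integers(0, 100, size=n)       # hypothetical cluster identifiers
x1 = rng.normal(size=n)
z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
x2 = x1 + z @ np.array([0.7, 0.4]) + v
u = (0.5 * v + rng.normal(size=n)) * np.exp(0.3 * x1)   # heteroskedastic, endogenous
y = 1.0 + 0.5 * x1 - 1.0 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
Z = np.column_stack([np.ones(n), x1, z])
b_het = gmm2(y, X, Z, "robust")
b_clu = gmm2(y, X, Z, "cluster", clusters=groups)
```

Two sanity checks follow from the algebra: with S proportional to I the estimator reduces to TSLS, and treating every observation as its own cluster reproduces the robust choice of S.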

ivregress gmm

With time-series data and $\varepsilon$ heteroskedastic and serially correlated, the optimal weighting matrix A may be assembled by using the Newey-West heteroskedasticity- and autocorrelation-consistent (HAC) estimator. This is implemented by specifying wmatrix(hac kernel #), which requests a weighting matrix using the specified kernel (see below) with # lags. The bandwidth of a kernel is equal to the number of lags plus one.
Specifying wmatrix(hac kernel opt) requests an HAC weighting matrix using the specified kernel, with the lag order selected by Newey and West's (1994) optimal lag-selection algorithm. Specifying wmatrix(hac kernel) requests an HAC weighting matrix using the specified kernel and $n-2$ lags, where n is the sample size.

ivregress gmm

There are three kernels available for HAC weighting matrices:


bartlett or nwest requests the Bartlett (Newey-West)
kernel;
parzen or gallant requests the Parzen (Gallant 1987) kernel;
and
quadraticspectral or andrews requests the quadratic
spectral (Andrews 1991) kernel.
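The three kernels and the HAC construction of $Z'SZ$ described above can be sketched as follows. The kernel formulas are the standard Bartlett, Parzen and quadratic spectral weights, evaluated at $j/(\text{lags}+1)$ to match a bandwidth equal to the number of lags plus one. Truncating the quadratic spectral kernel at `lags` and omitting any small-sample scaling are assumptions of this sketch, not a description of Stata's internals; the function names are illustrative.

```python
import math
import numpy as np


def bartlett(z):
    z = abs(z)
    return 1.0 - z if z <= 1.0 else 0.0


def parzen(z):
    z = abs(z)
    if z <= 0.5:
        return 1.0 - 6.0 * z**2 + 6.0 * z**3
    if z <= 1.0:
        return 2.0 * (1.0 - z) ** 3
    return 0.0


def quadratic_spectral(z):
    if z == 0:
        return 1.0
    a = 6.0 * math.pi * z / 5.0
    return 25.0 / (12.0 * math.pi**2 * z**2) * (math.sin(a) / a - math.cos(a))


def hac_zsz(Z, e, lags, kernel=bartlett):
    """HAC estimate of Z'SZ: Gamma_0 + sum_{j=1..lags} k(j/(lags+1)) (Gamma_j + Gamma_j'),
    where Gamma_j = sum_t (z_t e_t)(z_{t-j} e_{t-j})' and lags + 1 is the bandwidth."""
    g = Z * e[:, None]                 # moment contributions z_t e_t
    out = g.T @ g                      # Gamma_0: the heteroskedasticity-robust term
    for j in range(1, lags + 1):
        gamma_j = g[j:].T @ g[:-j]
        out += kernel(j / (lags + 1)) * (gamma_j + gamma_j.T)
    return out


# Illustrative AR(1) residuals; in practice e would be TSLS residuals
rng = np.random.default_rng(2)
T = 500
Z = np.column_stack([np.ones(T), rng.normal(size=T)])
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.5 * e[t - 1] + rng.normal()

S_bart = hac_zsz(Z, e, lags=4)                              # Bartlett (Newey-West), 4 lags
S_qs = hac_zsz(Z, e, lags=4, kernel=quadratic_spectral)     # QS truncated at 4 lags (assumption)
A = np.linalg.inv(S_bart / T)                               # weighting matrix as in (1)
```

With zero lags the HAC estimate reduces to the heteroskedasticity-robust $Z'SZ$ with $S = \mathrm{diag}(e_t^2)$.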

ivregress gmm

Iterative GMM

The GMM procedure can be iterated by adding the option igmm to ivregress gmm. The resulting estimator is asymptotically equivalent to the two-step estimator. Hall (2005) suggests that it may have better finite-sample performance.
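A minimal sketch of the iteration behind the igmm option, assuming a heteroskedasticity-robust weighting matrix and a simple convergence rule on the coefficients. The tolerance, the iteration cap and the names `gmm_given_A` and `igmm` are illustrative choices, not Stata's defaults.

```python
import numpy as np


def gmm_given_A(y, X, Z, A):
    """GMM estimate for a fixed weighting matrix A: (X'Z A Z'X)^{-1} X'Z A Z'y."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ A @ XZ.T, XZ @ A @ (Z.T @ y))


def igmm(y, X, Z, tol=1e-8, max_iter=500):
    """Iterated GMM with robust weights A = (Z'SZ/n)^{-1}, S = diag(e_i^2).

    Starts from TSLS (A proportional to (Z'Z)^{-1}); the first update is the
    two-step estimator, later updates re-estimate A from the newest residuals.
    """
    n = len(y)
    b = gmm_given_A(y, X, Z, np.linalg.inv(Z.T @ Z / n))
    for it in range(1, max_iter + 1):
        e = y - X @ b
        A = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)
        b_new = gmm_given_A(y, X, Z, A)
        if np.max(np.abs(b_new - b)) < tol:
            return b_new, it
        b = b_new
    raise RuntimeError("iterated GMM did not converge")


# Simulated over-identified model with heteroskedastic errors (illustrative)
rng = np.random.default_rng(3)
n = 2000
x1 = rng.normal(size=n)
z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
x2 = x1 + z @ np.array([0.7, 0.4]) + v
u = (0.5 * v + rng.normal(size=n)) * np.exp(0.3 * x1)
y = 1.0 + 0.5 * x1 - 1.0 * x2 + u
X = np.column_stack([np.ones(n), x1, x2])
Z = np.column_stack([np.ones(n), x1, z])

b_igmm, n_iter = igmm(y, X, Z)
```

In a just-identified model the weighting matrix is irrelevant, so the loop stops after one update at the simple IV estimate.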

ivregress gmm

Robust standard errors

The less efficient, but computationally simpler and still consistent, TSLS estimator is often used in estimation. Its robust variance-covariance matrix $\mathrm{Var}(\hat b_{TSLS})$ is consistently estimated as
$$\widehat{\mathrm{Var}}(\hat b_{TSLS}) = \left(X'P_{[Z]}X\right)^{-1} X'P_{[Z]}\,S\,P_{[Z]}X \left(X'P_{[Z]}X\right)^{-1},$$
where S is chosen according to the various departures from homoskedasticity and independence spelled out above. The Stata implementation of these variance-covariance estimators is through the following ivregress options: vce(unadjusted), vce(robust), vce(cluster cluster_var), vce(hac kernel ...).
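The sandwich formula above can be written directly in numpy. This is a sketch: the function `tsls_vce` and the simulated data are hypothetical, and it applies none of the small-sample adjustments Stata may use (for example with the small option or in the cluster case).

```python
import numpy as np


def tsls_vce(y, X, Z, kind="robust", clusters=None):
    """TSLS coefficients and sandwich covariance
    (X'P X)^{-1} X'P S P X (X'P X)^{-1}, with P = P_[Z] and X'P S P X = Xhat' S Xhat."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # P_[Z] X
    bread = np.linalg.inv(Xhat.T @ X)                 # (X'P_[Z] X)^{-1}
    b = bread @ (Xhat.T @ y)
    e = y - X @ b                                     # residuals use X, not Xhat
    if kind == "unadjusted":                          # S = sigma^2 I
        meat = (e @ e / len(y)) * (Xhat.T @ Xhat)
    elif kind == "robust":                            # S = diag(e_i^2)
        meat = (Xhat * e[:, None] ** 2).T @ Xhat
    elif kind == "cluster":                           # S block diagonal, blocks e_g e_g'
        meat = np.zeros((X.shape[1], X.shape[1]))
        for g in np.unique(clusters):
            s_g = Xhat[clusters == g].T @ e[clusters == g]
            meat += np.outer(s_g, s_g)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return b, bread @ meat @ bread


rng = np.random.default_rng(5)
n = 3000
x1, z1, v = rng.normal(size=(3, n))
x2 = 0.5 * x1 + 0.8 * z1 + v
u = (0.6 * v + rng.normal(size=n)) * np.exp(0.3 * x1)   # heteroskedastic errors
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u
X = np.column_stack([np.ones(n), x1, x2])
Z = np.column_stack([np.ones(n), x1, z1])

b, V_rob = tsls_vce(y, X, Z, "robust")
_, V_unadj = tsls_vce(y, X, Z, "unadjusted")
print("robust se:", np.sqrt(np.diag(V_rob)).round(4))
print("unadjusted se:", np.sqrt(np.diag(V_unadj)).round(4))
```

With S proportional to I the sandwich collapses to $\hat\sigma^2(X'P_{[Z]}X)^{-1}$, the vce(unadjusted) case.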

ivregress gmm

Results for four GMM estimators

Variable        TwoSLS     GMM_het    GMM_clu    TwoSLS_~f
hi_empunion   -0.98993   -0.99328   -1.03587   -0.98993
               0.20459    0.20467    0.20438    0.19221
totchr         0.45121    0.45095    0.44822    0.45121
               0.01031    0.01031    0.01325    0.01051
age           -0.01414   -0.01415   -0.01185   -0.01414
               0.00290    0.00290    0.00626    0.00278
female        -0.02784   -0.02817   -0.02451   -0.02784
               0.03217    0.03219    0.02919    0.03117
blhisp        -0.22371   -0.22310   -0.20907   -0.22371
               0.03958    0.03960    0.05018    0.03870
linc           0.09427    0.09446    0.09573    0.09427
               0.02188    0.02190    0.01474    0.02123
_cons          6.87519    6.87782    6.72769    6.87519
               0.25789    0.25800    0.50588    0.24528

legend: b/se

.

Non-robust Hausman test

We test exogeneity of X2 while maintaining instrument validity:
$E(\varepsilon|Z) = 0$, which implies $E(\varepsilon|X_1) = 0$
$\Rightarrow H_0: E(\varepsilon|X_1, X_2) = E(\varepsilon|X_1)$
A conventional Hausman test ([2]) can be implemented, based on the Hausman statistic measuring the statistical difference between IV and OLS estimates. It is not robust to heteroskedastic and clustered errors, though.
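The non-robust Hausman statistic can be sketched as follows. The sketch assumes that both covariance matrices are scaled by the OLS error variance (in the spirit of the sigmamore option of Stata's hausman command), which makes their difference positive semi-definite with rank $k_2$; a generalized inverse is then needed because the difference is singular. Names and the simulated design are illustrative.

```python
import math
import numpy as np


def hausman(y, X, Z):
    """Non-robust Hausman statistic comparing TSLS with OLS.

    H = d' D^- d, d = b_TSLS - b_OLS, D = s2 [(X'P_Z X)^{-1} - (X'X)^{-1}],
    with the OLS error variance s2 in both terms so that D is p.s.d.
    Returns H and rank(D), which equals k2 (the degrees of freedom).
    """
    n = len(y)
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    b_iv = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
    e = y - X @ b_ols
    s2 = e @ e / n
    D = s2 * (np.linalg.inv(Xhat.T @ X) - np.linalg.inv(X.T @ X))
    d = b_iv - b_ols
    # Generalized inverse through the eigen-decomposition of the singular matrix D
    w, Q = np.linalg.eigh(D)
    keep = w > 1e-8 * w.max()
    H = float(np.sum((Q[:, keep].T @ d) ** 2 / w[keep]))
    return H, int(keep.sum())


rng = np.random.default_rng(6)
n = 5000
x1, z1, v = rng.normal(size=(3, n))
x2 = 0.5 * x1 + 0.8 * z1 + v
u = 0.6 * v + rng.normal(size=n)        # endogeneity: H0 is false
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u
X = np.column_stack([np.ones(n), x1, x2])
Z = np.column_stack([np.ones(n), x1, z1])

H, df = hausman(y, X, Z)
p_value = math.erfc(math.sqrt(H / 2))    # chi2(1) upper tail, valid for df = 1 only
```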

Robust Hausman test


A robust version of the test is implemented through the control-function approach, recasting endogeneity as a misspecification problem in the structural equation (see [5] and [1])
$$y = Xb + \varepsilon, \quad (2)$$
$X = (X_1\ X_2)$, $b = (b_1'\ b_2')'$, $Z = (X_1\ Z_1)$ and $\varepsilon = u + \nu\rho$, where u satisfies $E(u|X) = 0$ and $\nu$ is the $n \times k_2$ matrix of the errors in the $k_2$ first-stage equations. NB: $\nu$ is what makes $X_2$ endogenous.
Replacing $\nu$ in (2) with the residuals from the first-stage regressions, $\hat\nu = M_{[Z]}X_2$ (with $M_{[Z]} \equiv I - P_{[Z]}$), turns the H test into a simple test of joint significance for $\rho$ in the auxiliary OLS regression
$$y = Xb + M_{[Z]}X_2\,\rho + u. \quad (3)$$
The test works since, under the alternative $\rho \neq 0$, OLS estimation of the auxiliary regression yields the TSLS estimator of b.
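The equivalence behind the control-function approach can be checked numerically: OLS on the auxiliary regression (3) returns exactly the TSLS coefficients on X, and the coefficient on the first-stage residuals estimates $\rho$. A minimal sketch on simulated data, where the design (including $\rho = 0.6$) is an assumption of the example:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
x1, z1, v = rng.normal(size=(3, n))
x2 = 0.5 * x1 + 0.8 * z1 + v              # first-stage error v makes x2 endogenous
eps = 0.6 * v + rng.normal(size=n)        # eps = u + v * rho with rho = 0.6
y = 1.0 + 2.0 * x1 - 1.0 * x2 + eps

X = np.column_stack([np.ones(n), x1, x2])
Z = np.column_stack([np.ones(n), x1, z1])

# First-stage residuals: nu_hat = M_[Z] X2
nu_hat = x2 - Z @ np.linalg.lstsq(Z, x2, rcond=None)[0]

# Auxiliary OLS regression (3): y on X and nu_hat
coef = np.linalg.lstsq(np.column_stack([X, nu_hat]), y, rcond=None)[0]
b_aux, rho_hat = coef[:-1], coef[-1]

# TSLS for comparison
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_tsls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
print(b_aux.round(4), b_tsls.round(4), round(rho_hat, 3))
```

The equality is algebraic: X and $(P_{[Z]}X,\ M_{[Z]}X_2)$ span the same space once $\hat\nu$ is added, and $P_{[Z]}X$ is orthogonal to $\hat\nu$.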

Robust Hausman test

The H test could easily be robustified for heteroskedasticity and/or clustered errors by testing the joint significance of $\rho$ via test after estimating (3) with regress and a suitable robust option:
with heteroskedasticity: vce(robust);
with heteroskedasticity and cluster correlation: vce(cluster clustervar).
The above is not necessary, though. The various versions of the H test can be immediately implemented in Stata through the ivregress postestimation command estat endogenous.
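The robustified version can be sketched as an OLS regression of (3) followed by a heteroskedasticity-robust Wald test on the coefficients of the first-stage residuals, reported as F = W/q like test after regress. The HC1 scaling n/(n-k) mirrors regress, vce(robust); the helper names and the simulated designs are illustrative assumptions, and the cluster-robust variant is not shown.

```python
import numpy as np


def robust_f(W, y, idx):
    """OLS of y on W; heteroskedasticity-robust Wald test that coef[idx] = 0, divided by q."""
    n, k = W.shape
    coef = np.linalg.lstsq(W, y, rcond=None)[0]
    e = y - W @ coef
    bread = np.linalg.inv(W.T @ W)
    V = n / (n - k) * bread @ ((W * e[:, None] ** 2).T @ W) @ bread  # HC1 sandwich
    r = coef[idx]
    return float(r @ np.linalg.solve(V[np.ix_(idx, idx)], r)) / len(idx)


def robust_endog_test(y, X, Z, X2):
    """Regress y on X and first-stage residuals M_[Z] X2; robust F test on their coefficients."""
    V_hat = X2 - Z @ np.linalg.lstsq(Z, X2, rcond=None)[0]
    W = np.column_stack([X, V_hat])
    idx = list(range(X.shape[1], W.shape[1]))
    return robust_f(W, y, idx)


def simulate(rho, seed, n=5000):
    """Illustrative design with heteroskedastic errors; rho = 0 means x2 is exogenous."""
    rng = np.random.default_rng(seed)
    x1, z1, v = rng.normal(size=(3, n))
    x2 = 0.5 * x1 + 0.8 * z1 + v
    u = (rho * v + rng.normal(size=n)) * np.exp(0.3 * x1)
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    Z = np.column_stack([np.ones(n), x1, z1])
    return y, X, Z, x2[:, None]


F_endog = robust_endog_test(*simulate(rho=0.6, seed=4))
F_exog = robust_endog_test(*simulate(rho=0.0, seed=5))
print(round(F_endog, 2), round(F_exog, 2))
```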

Robust Hausman test

. * Robust Durbin-Wu-Hausman test of endogeneity implemented by estat endogenous


. ivregress 2sls ldrugexp (hi_empunion = ssiratio) $xlist, vce(robust)

Instrumental variables (2SLS) regression Number of obs = 10089


Wald chi2(6) = 2000.86
Prob > chi2 = 0.0000
R-squared = 0.0640
Root MSE = 1.3177

Robust
ldrugexp Coef. Std. Err. z P>|z| [95% Conf. Interval]

hi_empunion -.8975913 .2211268 -4.06 0.000 -1.330992 -.4641908


totchr .4502655 .0101969 44.16 0.000 .43028 .470251
age -.0132176 .0029977 -4.41 0.000 -.0190931 -.0073421
female -.020406 .0326114 -0.63 0.531 -.0843232 .0435113
blhisp -.2174244 .0394944 -5.51 0.000 -.294832 -.1400167
linc .0870018 .0226356 3.84 0.000 .0426368 .1313668
_cons 6.78717 .2688453 25.25 0.000 6.260243 7.314097

Instrumented: hi_empunion
Instruments: totchr age female blhisp linc ssiratio

.
. estat endogenous

Tests of endogeneity
Ho: variables are exogenous

Robust score chi2(1) = 24.935 (p = 0.0000)


Robust regression F(1,10081) = 26.4333 (p = 0.0000)

.

Robust Hausman test

. estat endogenous,forcenonrobust

Tests of endogeneity
Ho: variables are exogenous

Durbin (score) chi2(1) = 25.2819 (p = 0.0000)


Wu-Hausman F(1,10081) = 25.3253 (p = 0.0000)
Robust score chi2(1) = 24.935 (p = 0.0000)
Robust regression F(1,10081) = 26.4333 (p = 0.0000)

.

Robust Hausman test

. * Robust Durbin-Wu-Hausman test of endogeneity implemented manually


. global xlist totchr age female blhisp linc

. quietly regress hi_empunion ssiratio $xlist

. quietly predict v1hat, resid

. quietly regress ldrugexp hi_empunion v1hat $xlist, vce(robust)

. test v1hat

( 1) v1hat = 0

F( 1, 10081) = 26.43
Prob > F = 0.0000

.

Hansen-Sargan test of overidentifying restrictions

Hansen-Sargan Test

If the population moment conditions are true, then the minimized GMM criterion function $Q(\hat b_{TSLS})$ should not be significantly different from zero. This provides a test for the validity of the $L - k$ over-identifying moment conditions based on the following statistic (Hansen-Sargan test):
$$S \equiv n\,Q(\hat b_{TSLS}) \overset{a}{\sim} \chi^2(L-k).$$
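The Hansen-Sargan statistic can be sketched in two versions: the Sargan form above, evaluated at TSLS with $A = (s^2 Z'Z/n)^{-1}$, and Hansen's J, evaluated at the two-step GMM estimate with a heteroskedasticity-robust A, the kind of statistic reported by estat overid after ivregress gmm. Stata may compute the weighting matrix at a different stage than this sketch, so numbers need not match exactly; function names and data are illustrative.

```python
import math
import numpy as np


def tsls_resid(y, X, Z):
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return y - X @ np.linalg.solve(Xhat.T @ X, Xhat.T @ y)


def sargan(y, X, Z):
    """n Q(b_TSLS) with A = (s2 Z'Z/n)^{-1}, which simplifies to e'P_Z e / (e'e/n)."""
    e = tsls_resid(y, X, Z)
    Pe = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
    return float(e @ Pe / (e @ e / len(y)))


def hansen_j(y, X, Z):
    """n Q(b_GMM) = n gbar' A gbar, with robust A from TSLS residuals (two-step GMM)."""
    n = len(y)
    e1 = tsls_resid(y, X, Z)
    A = np.linalg.inv((Z * e1[:, None] ** 2).T @ Z / n)
    XZ = X.T @ Z
    b = np.linalg.solve(XZ @ A @ XZ.T, XZ @ A @ (Z.T @ y))
    gbar = Z.T @ (y - X @ b) / n            # sample moments at the GMM estimate
    return float(n * gbar @ A @ gbar)


rng = np.random.default_rng(8)
n = 2000
x1 = rng.normal(size=n)
z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
x2 = x1 + z @ np.array([0.7, 0.4]) + v
u = 0.5 * v + rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 1.0 * x2 + u
y_bad = y + 0.5 * z[:, 1]                  # second instrument now violates E(eps|Z) = 0
X = np.column_stack([np.ones(n), x1, x2])
Z = np.column_stack([np.ones(n), x1, z])
df = Z.shape[1] - X.shape[1]               # L - k = 1

J_valid, S_valid, J_bad = hansen_j(y, X, Z), sargan(y, X, Z), hansen_j(y_bad, X, Z)
p_valid = math.erfc(math.sqrt(J_valid / 2))   # chi2(1) p-value (df = 1 only)
```

In a just-identified model the moment conditions can be met exactly, so the statistic is zero and carries no information, which is why the test needs $L > k$.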

Hansen-Sargan test of overidentifying restrictions

. * Test of overidentifying restrictions following ivregress gmm


. quietly ivregress gmm ldrugexp (hi_empunion = ssiratio multlc) ///
> $xlist, wmatrix(robust)

.
. estat overid

Test of overidentifying restriction:

Hansen's J chi2(1) = 1.04754 (p = 0.3061)

.

Testing for weak instruments

Staiger and Stock's rule of thumb: the partial F statistic of the excluded instruments in the first-stage regression should exceed 10. The rule is not rigorous, rejects weak instruments too often, and has no obvious implementation when there is more than one endogenous variable.
Stock and Yogo's (2005) two tests overcome all of the above difficulties. Both are based on the minimum eigenvalue of a matrix analog of the partial F statistic, a statistic introduced by Cragg and Donald (1993) to test nonidentification.
Importantly, the large-sample properties of both tests have been derived under the assumption of homoskedastic and independent errors: caution must be taken.
Both procedures are implemented by the ivregress postestimation command estat firststage.
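The minimum eigenvalue statistic can be sketched as follows, using the usual definition $\hat\Sigma_{VV}^{-1/2} X_2'P_{[\tilde Z_1]}X_2\,\hat\Sigma_{VV}^{-1/2}/L_1$, with $\tilde Z_1 = M_{[X_1]}Z_1$ and $\hat\Sigma_{VV}$ the first-stage residual covariance with $n - L$ degrees of freedom (the degrees-of-freedom convention is an assumption of this sketch). With a single endogenous regressor it collapses to the non-robust partial F statistic, which the check at the end verifies; critical values such as those tabulated in the output are not computed here.

```python
import numpy as np


def resid(A, B):
    """M_[A] B: residuals from regressing each column of B on A."""
    return B - A @ np.linalg.lstsq(A, B, rcond=None)[0]


def min_eigenvalue_stat(X1, X2, Z1):
    """Cragg-Donald minimum eigenvalue statistic (non-robust).

    X1: n x k1 included exogenous (with constant), X2: n x k2 endogenous,
    Z1: n x L1 excluded instruments.
    """
    n, L1 = Z1.shape
    L = X1.shape[1] + L1
    Z1t = resid(X1, Z1)                                   # instruments net of X1
    PX2 = Z1t @ np.linalg.lstsq(Z1t, X2, rcond=None)[0]   # P_[Z1t] X2
    num = X2.T @ PX2 / L1
    V = resid(np.column_stack([X1, Z1]), X2)              # first-stage residuals
    Svv = V.T @ V / (n - L)
    w, Q = np.linalg.eigh(Svv)
    Svv_inv_half = Q @ np.diag(w ** -0.5) @ Q.T           # symmetric Svv^{-1/2}
    return float(np.linalg.eigvalsh(Svv_inv_half @ num @ Svv_inv_half).min())


rng = np.random.default_rng(9)
n = 3000
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
Z1 = rng.normal(size=(n, 2))
V_err = rng.normal(size=(n, 2))
X2_one = (Z1 @ np.array([0.3, 0.1]) + V_err[:, 0])[:, None]   # one endogenous regressor
X2_two = np.column_stack([Z1 @ np.array([0.3, 0.1]) + V_err[:, 0],
                          Z1 @ np.array([0.05, 0.4]) + V_err[:, 1]])

# With k2 = 1 the statistic is the usual (non-robust) partial F of the excluded instruments
L1, L = Z1.shape[1], X1.shape[1] + Z1.shape[1]
rss_r = np.sum(resid(X1, X2_one) ** 2)
rss_u = np.sum(resid(np.column_stack([X1, Z1]), X2_one) ** 2)
F_partial = ((rss_r - rss_u) / L1) / (rss_u / (n - L))
g_one = min_eigenvalue_stat(X1, X2_one, Z1)
g_two = min_eigenvalue_stat(X1, X2_two, Z1)
```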

Testing for weak instruments

. * Weak instrument tests - just-identified model


. quietly ivregress 2sls ldrugexp (hi_empunion = ssiratio) $xlist, vce(robust)

.
. estat firststage, forcenonrobust all /// implements the Stock and
> ///Yogo (2005) weak instrument tests
>

First-stage regression summary statistics

                     Adjusted   Partial    Robust
Variable      R-sq.  R-sq.      R-sq.      F(1,10082)   Prob > F
hi_empunion   0.0761 0.0755     0.0179     65.7602      0.0000



Testing for weak instruments

Minimum eigenvalue statistic = 183.98

Critical Values                       # of endogenous regressors: 1
Ho: Instruments are weak              # of excluded instruments:  1

                                       5%      10%     20%     30%
2SLS relative bias                     (not available)

                                       10%     15%     20%     25%
2SLS Size of nominal 5% Wald test      16.38   8.96    6.66    5.53
LIML Size of nominal 5% Wald test      16.38   8.96    6.66    5.53

Testing for weak instruments

Montiel Olea and Pflueger (2013) derive a new test for weak instruments that extends the Stock and Yogo test to heteroskedasticity and cluster correlation. It is implemented in Stata by the user-written command weakivtest, run after ivregress.
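A sketch of the effective F statistic for a single endogenous regressor follows. Its form, $\hat\pi'(\tilde Z_1'\tilde Z_1)\hat\pi / \mathrm{tr}(\hat V_\pi \tilde Z_1'\tilde Z_1)$ with $\hat V_\pi$ a heteroskedasticity-robust covariance of the first-stage coefficients, is an assumption based on our reading of Montiel Olea and Pflueger (2013), and the HC0 covariance without small-sample scaling may differ from what weakivtest uses. With one instrument it reduces to the squared robust t statistic, consistent with the output below, where the effective F equals the robust first-stage F (65.760).

```python
import numpy as np


def resid(A, B):
    """M_[A] B: residuals from regressing B on A."""
    return B - A @ np.linalg.lstsq(A, B, rcond=None)[0]


def effective_f(X1, x2, Z1):
    """Effective F (assumed form): pi'(Z1t'Z1t)pi / tr(V_pi Z1t'Z1t),
    with pi the first-stage coefficients on the excluded instruments (X1
    partialled out) and V_pi their HC0 heteroskedasticity-robust covariance."""
    Z1t = resid(X1, Z1)
    xt = resid(X1, x2)
    ZZ = Z1t.T @ Z1t
    pi = np.linalg.solve(ZZ, Z1t.T @ xt)
    v = xt - Z1t @ pi                                  # first-stage residuals
    bread = np.linalg.inv(ZZ)
    V_pi = bread @ ((Z1t * v[:, None] ** 2).T @ Z1t) @ bread
    return float(pi @ ZZ @ pi / np.trace(V_pi @ ZZ))


rng = np.random.default_rng(10)
n = 3000
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
Z1 = rng.normal(size=(n, 2))
# Heteroskedastic first stage (illustrative)
x2 = Z1 @ np.array([0.2, 0.1]) + rng.normal(size=n) * np.exp(0.5 * Z1[:, 0])

F_eff = effective_f(X1, x2, Z1)
F_eff_one = effective_f(X1, x2, Z1[:, :1])    # one instrument: squared robust t statistic
```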

Testing for weak instruments

. weakivtest /// weakivtest (user-written)


> /// implements the weak
> /// instrument test of Montiel Olea and
> /// Pflueger (2013).
> /// extends Stock and Yogo (2005)
> /// to accommodate heteroskedasticity
> /// and cluster correlation
> /// It is a postestimation command for
> /// ivregress.
>
(obs=10089)

Montiel-Pflueger robust weak instrument test

Effective F statistic: 65.760


Confidence level alpha: 5%

Critical Values TSLS LIML

% of Worst Case Bias


tau=5% 37.418 37.418
tau=10% 23.109 23.109
tau=20% 15.062 15.062
tau=30% 12.039 12.039

.

Limited Information ML (LIML)

LIML

LIML is an ML estimator maintaining that the errors in the structural and first-stage equations are jointly normal. It is not full-information ML because it is based on reduced-form first-stage equations rather than on fully specified structural equations for the included endogenous variables. It often has better finite-sample properties than TSLS, but it is less robust. It is implemented by ivregress liml.
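LIML can be computed as a k-class estimator. The sketch below assumes the standard k-class representation, with $\kappa$ the smallest eigenvalue of $(W'M_{[Z]}W)^{-1}W'M_{[X_1]}W$ for $W = (y\ X_2)$; the function name and simulated data are illustrative, and the sketch does not reproduce the standard errors reported by ivregress liml.

```python
import numpy as np


def resid(A, B):
    """M_[A] B: residuals from regressing B on A."""
    return B - A @ np.linalg.lstsq(A, B, rcond=None)[0]


def liml(y, X1, X2, Z1):
    """LIML as a k-class estimator.

    kappa = smallest eigenvalue of (W'M_Z W)^{-1} (W'M_X1 W), W = (y X2);
    b = [X'(I - kappa M_Z) X]^{-1} X'(I - kappa M_Z) y, X = (X1 X2).
    """
    Z = np.column_stack([X1, Z1])
    X = np.column_stack([X1, X2])
    W = np.column_stack([y, X2])
    W_x1, W_z = resid(X1, W), resid(Z, W)
    kappa = float(np.min(np.linalg.eigvals(np.linalg.solve(W_z.T @ W_z, W_x1.T @ W_x1)).real))
    MzX, Mzy = resid(Z, X), resid(Z, y)
    lhs = X.T @ X - kappa * (X.T @ MzX)
    rhs = X.T @ y - kappa * (X.T @ Mzy)
    return np.linalg.solve(lhs, rhs), kappa


rng = np.random.default_rng(11)
n = 3000
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
Z1 = rng.normal(size=(n, 3))
v = rng.normal(size=n)
x2 = Z1 @ np.array([0.3, 0.2, 0.1]) + v
y = 1.0 + 0.5 * X1[:, 1] - 1.0 * x2 + 0.6 * v + rng.normal(size=n)

b_liml, kappa = liml(y, X1, x2, Z1)               # over-identified: kappa >= 1
b_just, kappa_just = liml(y, X1, x2, Z1[:, :1])   # just-identified: kappa = 1, LIML = TSLS
```

Setting $\kappa = 1$ gives TSLS and $\kappa = 0$ gives OLS; in a just-identified model $\kappa$ equals 1, so LIML and TSLS coincide.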

Panel data models with strictly exogenous instruments

The conventional panel-data transformations, group-mean deviations and partial deviations, can be applied to yield consistent panel-data IV-GMM estimators only if there is a matrix Z of strictly exogenous variables.

The FE-TSLS estimator is computed by applying TSLS to the variables y, Z and X transformed in group-mean deviations.
The RE-TSLS estimator is computed by applying TSLS to the variables y, Z and X transformed in partial deviations.
For both estimators to be consistent, it is required that $E(\varepsilon|Z) = 0$.
FE-TSLS and RE-TSLS are implemented in Stata by xtivreg with the options fe and re (the default), respectively. Otherwise, the syntax is the same as that of ivregress 2sls.
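A minimal sketch of FE-TSLS on a simulated balanced panel: demean y, X and Z within units and apply TSLS. The design, with unit effects correlated with the regressors and a strictly exogenous instrument, and the names `within` and `fe_tsls` are assumptions of the example; RE-TSLS would instead use partial deviations, which require estimates of the variance components and are not shown.

```python
import numpy as np


def within(A, ids):
    """Group-mean deviations: subtract each unit's time average."""
    A = np.asarray(A, dtype=float)
    out = A.copy()
    for g in np.unique(ids):
        rows = ids == g
        out[rows] -= A[rows].mean(axis=0)
    return out


def fe_tsls(y, X, Z, ids):
    """FE-TSLS: TSLS on y, X and Z in group-mean deviations (no constant needed)."""
    yd, Xd, Zd = within(y, ids), within(X, ids), within(Z, ids)
    Xhat = Zd @ np.linalg.lstsq(Zd, Xd, rcond=None)[0]
    return np.linalg.solve(Xhat.T @ Xd, Xhat.T @ yd)


rng = np.random.default_rng(12)
N, T = 300, 5
ids = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)[ids]                # unit effects, correlated with x1 and x2
x1 = alpha + rng.normal(size=N * T)            # included exogenous
z1 = rng.normal(size=N * T)                    # strictly exogenous excluded instrument
v = rng.normal(size=N * T)
x2 = 0.5 * x1 + 0.8 * z1 + alpha + v           # endogenous regressor
y = 2.0 * x1 - 1.0 * x2 + alpha + 0.6 * v + rng.normal(size=N * T)

X = np.column_stack([x1, x2])
Z = np.column_stack([x1, z1])
b_fe = fe_tsls(y, X, Z, ids)
```

Because demeaning wipes out anything constant within a unit, adding unit-specific intercepts to y leaves the estimate unchanged.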

[1] A. C. Cameron and P. K. Trivedi.
Microeconometrics Using Stata, Revised Edition.
Stata Press, College Station, TX, 2010.
[2] J. Hausman.
Specification tests in econometrics.
Econometrica, 46:1251-1271, 1978.
[3] J. L. Montiel Olea and C. Pflueger.
A robust test for weak instruments.
Journal of Business & Economic Statistics, pages 358-369, 2013.
[4] J. H. Stock and M. Yogo.
Testing for weak instruments in linear IV regression.
In D. W. K. Andrews and J. H. Stock, editors, Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, pages 80-108. Cambridge University Press, Cambridge, 2005.
[5] J. M. Wooldridge.
Econometric Analysis of Cross Section and Panel Data.
The MIT Press, Cambridge, MA, 2nd edition, 2010.
