
SEMINAR PAPER

Difference GMM vs. System GMM
Estimating Dynamic Panel Data Models: the case of Acemoglu et al. 2008

Seminar:

The Econometrics of Economic Consultancy


Interpreting Regression Estimates for Policy Makers

Supervisor:

Dr. Benedikt Heid

Date:

4 October 2015

Author:

Tobias Grohmann
1149805 (Matr.nr.)
P&E MA(6)


1 Introduction
In their paper "Income and Democracy", Acemoglu, Johnson, Robinson and Yared (2008)
(henceforth AJRY 2008) use the Arellano-Bond difference GMM estimator (Arellano
and Bond 1991) for dynamic panel data to show that there is no positive effect of
income per capita on democracy. They argue that their results invalidate the so-called
modernization theory, according to which higher income per capita causes a country to
be democratic. However, the AJRY 2008 results have been met with criticism in
the literature. For example, Heid, Langer and Larch (2012) show that using the
so-called system GMM estimator (Arellano and Bover 1995; Blundell and Bond 1998)
instead of the difference GMM estimator yields a statistically significant positive
correlation between income and democracy (Heid, Langer, and Larch 2012, 166), and they
thus reject the AJRY 2008 conclusion. The authors argue that the Arellano-Bond difference
GMM estimator is ill suited for the dynamic panel estimated by AJRY 2008. This is
because income and democracy are highly persistent in the data and, in comparison
with the system GMM estimator, the difference GMM estimator does not perform well
with highly persistent variables.
In this paper I take a closer look at the theoretical and methodological differences
between the difference GMM and system GMM estimators. The overall aim is to make the
reader aware, from a practitioner's point of view, of the conditions under which to use
the difference GMM or the system GMM estimator. In line with Heid, Langer and
Larch, I conclude that if the underlying data generating process can most accurately
be modeled as an autoregressive (AR) process with high persistence, i.e. with a strong
dependence of the dependent variable on its own lag, then practitioners should choose
system GMM over difference GMM. To this end I present arguments that make
the identification assumptions of the system GMM estimator more plausible for the
AJRY 2008 data than those of the difference GMM estimator. Moreover, I reproduce
the results from AJRY 2008 and present system GMM estimates for their baseline
panel.

The paper is structured as follows. In section 2 I briefly present the
data, the AJRY 2008 model and initial estimation results for pooled OLS and fixed
effects (FE) OLS regressions, and I discuss why both approaches fail for the
dynamic panel at hand. In sections 3 and 4 I derive the difference GMM and system
GMM estimators and present estimation results for both. In particular, I
argue in section 4 why system GMM estimation should be preferred over difference
GMM estimation for the AJRY 2008 panel. Section 6 concludes the paper.

2 AJRY 2008: data, model and estimates
Dynamic Panel Data
The difference GMM and system GMM estimators are used for the econometric
analysis of dynamic economic relationships in panel data. Econometric analysis of
panel data means that researchers observe many different individuals over time, so
that the underlying data contains a total of $N \times T$ individual observations. A
typical characteristic of such dynamic panel data is "large $N$, small $T$", i.e. we
have many observed individuals but few observations over time. The observed
individuals might be consumers, firms or countries. In their baseline panel AJRY
2008 use 945 country-period observations, from 1960 to 2000 in five-year intervals. The
focus of AJRY 2008 is on the relationship between the countries' levels of democracy
and their respective levels of income per capita; more specifically, they investigate the
dependence of democracy on income. To measure this relationship AJRY 2008 use
two different indicators of democracy: first, the Freedom House measure of
democracy and, second, the Polity measure of democracy. They use these two
measures in order to test for robustness across indicators. Income is measured
as GDP per capita in PPP terms.
Panel data allows investigating dynamic economic relationships, i.e. economic
relationships in which variables adjust over time. In the AJRY 2008 estimation, the
levels of democracy adjust over time. This means that the current level of democracy
depends not only on the list of specified regressors, including the income level of the
previous period, but also on the value of the dependent variable, democracy, in the
previous period. In modeling practice, dynamic relationships are characterized by the
presence of a lagged dependent variable among the regressors. For the AJRY 2008
model this means that lagged levels of democracy are included in the list of regressors
and appear on the right-hand side of the model specification. Because the dependent
variable depends on earlier realizations of itself, the data generating process
is called an autoregressive process. The AJRY 2008 model characterizes the
autoregressive relationship as an AR(1) process, i.e. an autoregressive process of order
one, where democracy at time $t$ depends only on the democracy level in period $t-1$.
Yet, as I show later, including the lagged dependent variable introduces endogeneity
with respect to this variable, which means that simple linear OLS regression
yields inconsistent estimates. The difference GMM and system GMM estimators are
strategies to resolve this endogeneity problem; I explain later how. For now note that,
while the estimate of interest is the parameter on the exogenous variable
income per capita, the difficulty in estimating the autoregressive process in AJRY
2008 is to control accurately for the parameter on lagged democracy, i.e. the lagged
dependent variable.

The AJRY 2008 model
Given this background, AJRY 2008 model the current level of democracy as a function
of the level of income and the level of democracy in the previous period. Let me be
more specific. In the AJRY 2008 model every observation has two indices: one that
indicates the individual $i$, and another that indicates the time period $t$ in which
the observation is made. For example, $d_{i,t}$ denotes the level of democracy in
country $i$ at time $t$.
In the AJRY 2008 model $d_{i,t}$ is the dependent variable, and the data generating
process for current levels of democracy is modeled as an autoregressive process of
order one, a so-called AR(1) process. This means that, in the model, current levels of
the dependent variable depend, potentially among other factors, on the level of that
variable in the previous period (and only the previous period, not on even earlier
periods). That is, $d_{i,t}$ depends on $d_{i,t-1}$. The AJRY 2008 model is a model of
such an AR(1) process:

$$ d_{i,t} = \alpha d_{i,t-1} + \gamma y_{i,t-1} + x'_{i,t-1}\beta + \mu_t + \delta_i + u_{i,t} \tag{1} $$

where $d_{i,t}$ is the level of democracy in country $i$ at time $t$. $d_{i,t-1}$ is the
lagged level of democracy in country $i$ at time $t-1$, where $\alpha$ measures the causal
effect of lagged levels of democracy on current levels of democracy. $y_{i,t-1}$ is the
lagged log level of income in country $i$ at time $t-1$, where $\gamma$ is the parameter of
interest, which measures the causal effect of lagged income on current levels of
democracy. $x_{i,t-1}$ are other exogenous regressors; $\mu_t$ are time effects that
capture common shocks to the democracy levels of all countries; $\delta_i$ are unobserved,
time-invariant country effects; and $u_{i,t}$ is an error term that captures all other
omitted factors, with $E(u_{i,t}) = 0$ for all $i$ and $t$ (c.f. Acemoglu et al. 2008, 814-6).
As anticipated earlier, although the aim of AJRY 2008 is to estimate $\gamma$, the
difficulty is to estimate $\alpha$ correctly, since the lagged dependent variable
$d_{i,t-1}$ is endogenous. Hence, for the methodological purposes of this paper, i.e.
the derivation of the difference GMM and system GMM estimators, I simplify equation (1) to

$$ d_{i,t} = \alpha d_{i,t-1} + v_{i,t} \tag{2} $$

with

$$ v_{i,t} = \delta_i + u_{i,t} \tag{3} $$

That is, for the remainder of this paper I work exclusively with the autoregressive
process of $d_{i,t}$. Since the model is linear in parameters, the other regressors can
easily be included after the derivation of the two GMM estimators. Of course,
the estimation results reported here are based on the full AJRY 2008 model in (1).
It is now easy to see why naive OLS estimation of this model is biased and
inconsistent. Since $d_{i,t}$ depends on the time-invariant individual effects $\delta_i$,
$d_{i,t-1}$ also depends on $\delta_i$. This means that $d_{i,t-1}$ is endogenous, and we
face so-called dynamic panel bias (Nickell 1981) when estimating the model with OLS. The
OLS coefficient estimate of $\alpha$ is upward-biased, because the lagged dependent variable
$d_{i,t-1}$ is positively correlated with the error term $v_{i,t} = \delta_i + u_{i,t}$.
Due to space restrictions, I do not provide a proof of the inconsistency of the OLS
estimate here. What this short discussion shows, however, is that introducing a lagged
dependent variable among the regressors makes OLS estimation biased and inconsistent,
so that naive OLS estimation of the dynamic panel does not yield
accurate coefficient estimates.
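
The direction of this bias can be illustrated with a short calculation. What follows is a sketch rather than a proof, and it rests on assumptions that go beyond the text: the simplified model (2) without further regressors, $|\alpha| < 1$, a process that has been running long enough to be mean stationary, and errors $u_{i,t}$ that are serially uncorrelated, homoskedastic with variance $\sigma_u^2$ and uncorrelated with $\delta_i$, whose variance is written $\sigma_\delta^2$.

```latex
% Pooled OLS in d_{i,t} = \alpha d_{i,t-1} + v_{i,t}, with v_{i,t} = \delta_i + u_{i,t}.
% Under mean stationarity: d_{i,t-1} = \delta_i/(1-\alpha) + \sum_{j \ge 0} \alpha^j u_{i,t-1-j}.
\begin{align*}
\operatorname{plim}_{N\to\infty} \hat{\alpha}_{OLS}
  &= \alpha + \frac{\operatorname{Cov}(d_{i,t-1}, v_{i,t})}{\operatorname{Var}(d_{i,t-1})} \\
\operatorname{Cov}(d_{i,t-1}, v_{i,t})
  &= \operatorname{Cov}(d_{i,t-1}, \delta_i) = \frac{\sigma_\delta^2}{1-\alpha} \\
\operatorname{Var}(d_{i,t-1})
  &= \frac{\sigma_\delta^2}{(1-\alpha)^2} + \frac{\sigma_u^2}{1-\alpha^2} \\
\Rightarrow\quad
\operatorname{plim}_{N\to\infty} \hat{\alpha}_{OLS} - \alpha
  &= \frac{\sigma_\delta^2/(1-\alpha)}{\sigma_\delta^2/(1-\alpha)^2 + \sigma_u^2/(1-\alpha^2)} > 0
\end{align*}
```

For example, with $\alpha = 0.5$ and $\sigma_\delta^2 = \sigma_u^2 = 1$ the asymptotic bias under these assumptions is $2/(4 + 4/3) = 0.375$, so pooled OLS would converge to 0.875 rather than 0.5.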
The results for simple pooled OLS regression are reported in column (1) of Table 1 in
the appendix. The OLS estimate of the coefficient on $y_{i,t-1}$, the log of GDP per
capita in $t-1$, is 0.072 (standard error 0.010) and statistically highly significant.
AJRY 2008 argue that, while this estimate illustrates the well-documented positive
relationship between income and democracy (Acemoglu et al. 2008, 817), it is still
quantitatively small: "This point estimate implies that a 10 percent increase in GDP per
capita is associated with an increase in the Freedom House score of less than 0.007,
which is very small (for comparison, the gap between the United States and Colombia
today is 0.5)" (ibid.). The implied cumulative effect of income on democracy reported in
row (3) of Table 1 is also quantitatively small, at 0.0024.¹

Fixed effects in AJRY
AJRY 2008 use different strategies to avoid the inconsistency of pooled OLS
estimation of AR(1) processes. The first strategy I want to discuss is fixed effects
(FE) OLS estimation. Broadly, this strategy aims at eliminating the unobservable,
time-invariant country fixed effects $\delta_i$ in order to eliminate the inconsistency
caused by the dependence of $d_{i,t-1}$ on $\delta_i$. This is achieved by a so-called
Within transformation. It is called Within transformation because the resulting model,
which no longer contains the country-specific effects, does not look at the
cross-sectional relation between income and democracy but captures the effect of income
on democracy within each country, that is, it investigates which effect income levels
in one country have on democracy levels in that country.
For a Within transformation, the time average of equation (2) is subtracted from the
equation itself, that is:

$$
\begin{aligned}
d_{i,t} &= \alpha d_{i,t-1} + \delta_i + u_{i,t} \\
-\big(\bar{d}_{i\cdot} &= \alpha \bar{d}_{i\cdot} + \delta_i + \bar{u}_{i\cdot}\big) \\
d_{i,t} - \bar{d}_{i\cdot} &= \alpha\,(d_{i,t-1} - \bar{d}_{i\cdot}) + (\delta_i - \delta_i) + (u_{i,t} - \bar{u}_{i\cdot})
\end{aligned}
\tag{4}
$$

Here it can easily be seen that the time-invariant fixed effects $\delta_i$ are eliminated
by subtracting their mean, which is $\delta_i$ itself. FE OLS estimation then
proceeds by estimating $\alpha$ in the transformed model by standard OLS. The results for
the AJRY 2008 model, of course including all other regressors, are reported in column (2)
of Table 1. The coefficient on $y_{i,t-1}$ is 0.010, which is not statistically
significant. It is even lower than the pooled OLS estimate: given the FE OLS estimate, a
10 percent increase in GDP per capita is associated with a 0.001 increase in the Freedom
House measure of democracy.

¹ The cumulative effect is computed as $\hat{\gamma} / (1 - \hat{\alpha})$, i.e. the
coefficient on log GDP per capita in $t-1$ divided by one minus the coefficient on
democracy in $t-1$.
However, while the Within transformation eliminates some of the inconsistency of pooled
OLS, it does not eliminate all of it. To see this, recall that on the right-hand side we
now have $(d_{i,t-1} - \bar{d}_{i\cdot})$, which is correlated with
$(u_{i,t} - \bar{u}_{i\cdot})$: by construction, $\bar{u}_{i\cdot}$ contains $u_{i,t-1}$,
which is correlated with $d_{i,t-1}$. Additionally, $u_{i,t}$ is correlated with
$\bar{d}_{i\cdot}$ because the latter average contains $d_{i,t}$. This inconsistency does
not vanish as $N$ becomes large, but only as $T$ becomes large. However, this is typically
not the case in panel data, in which we have large $N$ and small $T$. Note that, in
contrast to pooled OLS estimation, the FE OLS estimate is downward-biased. In fact, the
OLS and FE OLS estimates can be considered upper and lower bounds for the
autoregressive coefficient $\alpha$ on $d_{i,t-1}$ (e.g. Bond 2002, 144). For the AJRY 2008
panel these bounds are 0.706 from the OLS regression and 0.379 from FE OLS.
We should therefore expect all reasonable estimates of $\alpha$ to lie between these
bounds.
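
The bracketing of the true coefficient by pooled OLS and FE OLS can be illustrated with a small simulation. The sketch below uses artificial data, not the AJRY 2008 panel; the function names, sample sizes, the value $\alpha = 0.5$ and the standard normal effects and errors are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_panel(n, t, alpha, rng, burn_in=50):
    """Simulate d_it = alpha * d_i,t-1 + delta_i + u_it and return an (n, t) array."""
    delta = rng.normal(size=n)             # country fixed effects
    d = delta / (1.0 - alpha)              # start at each country's long-run mean
    out = np.empty((n, t))
    for s in range(burn_in + t):
        d = alpha * d + delta + rng.normal(size=n)
        if s >= burn_in:                   # keep only the last t periods
            out[:, s - burn_in] = d
    return out

def pooled_ols(d):
    """Slope from regressing d_it on d_i,t-1 with a common intercept."""
    y, x = d[:, 1:].ravel(), d[:, :-1].ravel()
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / (x_c @ x_c)

def within_ols(d):
    """Slope after subtracting country-specific time averages (Within transformation)."""
    y, x = d[:, 1:], d[:, :-1]
    y_w = (y - y.mean(axis=1, keepdims=True)).ravel()
    x_w = (x - x.mean(axis=1, keepdims=True)).ravel()
    return (x_w @ y_w) / (x_w @ x_w)

rng = np.random.default_rng(0)
panel = simulate_panel(n=2000, t=8, alpha=0.5, rng=rng)
alpha_ols = pooled_ols(panel)
alpha_fe = within_ols(panel)
print(f"pooled OLS: {alpha_ols:.3f}  true: 0.500  within (FE): {alpha_fe:.3f}")
```

In such a large-$N$, small-$T$ design the pooled OLS estimate lies above and the FE estimate below the true value, which is the pattern behind the bounds 0.706 and 0.379 quoted for the AJRY 2008 panel.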
To conclude, the FE OLS estimation employed by AJRY 2008 as a strategy to
circumvent the inconsistency of pooled OLS estimation also fails to provide
consistent estimates. Because the panel has large $N$ and small $T$ and the data
generating process is autoregressive, pooled OLS as well as FE OLS estimation is
inconsistent. In the following I turn to the second, alternative strategy that
AJRY 2008 pursue to avoid the inconsistency of OLS estimation.

3 The DiffGMM estimator
AJRY 2008 anticipate this remaining endogeneity problem that arises with the
introduction of the lagged dependent variable and estimate the coefficient on income
with the difference GMM estimator, first introduced by Arellano and Bond (1991). The
difference GMM estimator also eliminates the country fixed effects $\delta_i$ from the
equation of the data generating process by differencing.
However, instead of the Within transformation used earlier, the difference GMM
estimator uses so-called first differencing. In addition to eliminating the
time-invariant country fixed effects, first differencing makes it possible to construct
instruments for the potentially endogenous regressors in the estimation, i.e. the lagged
dependent variable $d_{i,t-1}$. These instruments are then used for the estimation of
$\alpha$. If these instruments are valid and strong, that is, if they are uncorrelated
with the error term and highly correlated with the original regressors (here
$d_{i,t-1}$), we avoid the inconsistency caused by the endogeneity of $d_{i,t-1}$. Note
that the difference GMM estimator as well as the system GMM estimator, which I discuss
later, are general-purpose estimators that use instruments from within the model
equations. This means that these estimators can be used even when the
researcher has no specific external instrumentation strategy at hand. Let me explain the
derivation of the difference GMM estimator in detail. I begin with the construction of
the instruments.

Construction of Instruments for diffGMM
Consider again equation (2), where all other regressors except $d_{i,t-1}$ are dropped
from the original model. First differencing this equation means subtracting from each
variable its value in the previous period. Thus,

$$
\begin{aligned}
d_{i,t} &= \alpha d_{i,t-1} + \delta_i + u_{i,t} \\
d_{i,t} - d_{i,t-1} &= \alpha\,(d_{i,t-1} - d_{i,t-2}) + (\delta_i - \delta_i) + (u_{i,t} - u_{i,t-1}) \\
d_{i,t} - d_{i,t-1} &= \alpha\,(d_{i,t-1} - d_{i,t-2}) + (u_{i,t} - u_{i,t-1}) \\
\Delta d_{i,t} &= \alpha\,\Delta d_{i,t-1} + \Delta u_{i,t}
\end{aligned}
\tag{5}
$$

Hence, first differencing equation (2) eliminates $\delta_i$ from the equation. Note that
by this first-difference transformation the error term itself becomes a time series,
namely a moving average of order one, MA(1), with a unit root. These properties of the
error term will become important later.
Now, to construct instruments for the estimation of $\alpha$, consider period $t = 3$,
which is the first period in which we observe the relationship in equation (5):

$$ d_{i,3} - d_{i,2} = \alpha\,(d_{i,2} - d_{i,1}) + (u_{i,3} - u_{i,2}) $$

In this first-difference equation for period $t = 3$, $d_{i,1}$ is a valid and strong
instrument for $(d_{i,2} - d_{i,1})$. This is because it is highly correlated with
$(d_{i,2} - d_{i,1})$ but uncorrelated with $(u_{i,3} - u_{i,2})$. Thus, the democracy
level in $t = 1$ is a good instrument for the change in democracy between $t = 1$ and
$t = 2$.

Further, consider period $t = 4$:

$$ d_{i,4} - d_{i,3} = \alpha\,(d_{i,3} - d_{i,2}) + (u_{i,4} - u_{i,3}) $$

We can see here that not only $d_{i,2}$ but also $d_{i,1}$ is a valid instrument for
$(d_{i,3} - d_{i,2})$. Both are highly correlated with $(d_{i,3} - d_{i,2})$ but
uncorrelated with $(u_{i,4} - u_{i,3})$. If one continues in this fashion and increases
$t$ until $T$, then for each additional period one obtains one additional valid
instrument, so that in period $t = T$ we have $T - 2$ instruments
$(d_{i,1}, d_{i,2}, \dots, d_{i,T-2})$ for the difference $(d_{i,T-1} - d_{i,T-2})$. The
total number of moment conditions in the set is given by
$m_d = \tfrac{1}{2}(T-1)(T-2)$. This set of instruments can then be used to define the
$(T-2) \times m_d$ instrument matrix $Z_{di}$:

$$
Z_{di} =
\begin{pmatrix}
d_{i,1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & d_{i,1} & d_{i,2} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & d_{i,1} & \cdots & d_{i,T-2}
\end{pmatrix}
$$

As I explain in the following, the instrument matrix $Z_{di}$ can now be used to
formulate moment conditions for estimating the difference equations (5) in a GMM
framework.² Before I explain these moment conditions and their role in the
difference GMM estimator, however, let me make some remarks on the GMM framework and
the feasibility and efficiency of the GMM estimator.
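
The block structure of the instrument matrix described above can be sketched in a few lines of code. The function name `diff_gmm_instruments` and the example values are hypothetical, and the sketch assumes a balanced panel in which all $T$ levels of one country are available.

```python
import numpy as np

def diff_gmm_instruments(d_i):
    """Build the difference GMM instrument matrix for one country.

    d_i holds the levels d_i1, ..., d_iT. Row r belongs to the differenced
    equation for period t = r + 3 and carries d_i1, ..., d_i,t-2 in its own
    block of columns; all other entries are zero.
    """
    d_i = np.asarray(d_i, dtype=float)
    big_t = d_i.size
    n_cols = (big_t - 1) * (big_t - 2) // 2      # total number of instruments m_d
    z = np.zeros((big_t - 2, n_cols))
    col = 0
    for row in range(big_t - 2):
        width = row + 1                          # t - 2 instruments for period t
        z[row, col:col + width] = d_i[:width]
        col += width
    return z

# Example with T = 5 periods: 3 differenced equations and 6 instruments.
z_example = diff_gmm_instruments([0.1, 0.2, 0.3, 0.4, 0.5])
print(z_example)
```

Each additional period adds one row and a block that is one column wider than the previous one, which is why the number of instruments grows quadratically in $T$.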

GMM framework, moment conditions and efficiency
As already anticipated, the difference GMM estimator is a linear IV estimator. Such
estimators are usually derived within the generalized method of moments (GMM) framework,
where the instruments are used to form moment conditions. Note, however, that the GMM IV
estimator for the difference equations in (5) is just-identified for $T = 3$
and over-identified for any number of periods beyond $T = 3$. To see this, note that the
regressors in the above system of difference equations are the differenced lagged
dependent variables $\Delta d_{i,t-1} = (d_{i,2} - d_{i,1}, \dots, d_{i,T-1} - d_{i,T-2})$.
Their total number is $K = T - 2$, whereas the number of instruments $L$ is given by
$L = m_d = \tfrac{1}{2}(T-1)(T-2)$. For $T = 3$ we have $K = L = 1$, while for $T = 4$ we
already have $K = 2$ and $L = 3$. In the latter case the number of instruments is greater
than the number of regressors, $L > K$. This means that we cannot simply use the standard
linear IV estimator for just-identified models, but have to derive the optimal GMM
estimator for the over-identified model given the instruments and moment conditions
that I anticipated above.

² Note that the instrumentation strategy that I describe here is only valid for balanced
panels, in which the data is complete, i.e. in which all $t$-observations are available
for every individual $i$. The strategy is not valid for unbalanced panels. See
Roodman (2009, 104 and 107) for ways around this problem.
Generally speaking, the GMM estimator chooses the value of the parameter $\alpha$ that
minimizes the following quadratic criterion function:

$$ J_N(\alpha) = \left( \sum_{i=1}^{N} g(d_i, \alpha) \right)' W_N \left( \sum_{i=1}^{N} g(d_i, \alpha) \right) \tag{6} $$

where $g(d_i, \alpha)$ are the (unconditional) sample moment conditions and $W_N$ is a
weighting matrix. Let me briefly discuss these arguments of $J_N$ in turn.
First, the sample moment conditions $g(d_i, \alpha)$ are functions of the parameter
$\alpha$ that we want to estimate and of the observed data. In the simplified AJRY 2008
model that I discuss here, the observed data is an $N \times (T-2)$ matrix of the
differenced democracy levels $\Delta d_{i,t}$ of every country from period $t = 3$
onwards. Since the difference GMM estimator is an IV estimator, the population moment
conditions are given by the orthogonality of the differenced error term with respect to
the instruments. For every $i$, $t = 3, \dots, T$ and $s \geq 2$ we have

$$ E[\Delta u_{i,t}\, d_{i,t-s}] = E[d_{i,t-s}\, \Delta u_{i,t}] = 0 \tag{7} $$

and thus,

$$ E[\Delta u_i' Z_{di}] = E[Z_{di}' \Delta u_i] = 0 \tag{8} $$

Hence we assume that, in the population, the instruments in $Z_{di}$ are uncorrelated
with the differenced errors $\Delta u_{i,t}$. Here $\Delta u_i$ is the $(T-2) \times 1$
vector $\Delta u_i = (u_{i,3} - u_{i,2}, \dots, u_{i,T} - u_{i,T-1})'$. Its entries are
given by $\Delta u_{i,t} = \Delta d_{i,t} - \alpha\, \Delta d_{i,t-1}$ and thus, as
anticipated above, the moment conditions are functions of $\alpha$ and of the differenced
lagged dependent variable $\Delta d_{i,t-1}$.
Second, the weighting matrix $W_N$, if chosen optimally, guarantees efficient and
feasible estimation of $\alpha$; only then is the resulting GMM estimator optimal. If the
model were just-identified, one could simply choose the identity matrix $I_N$ for $W_N$.
However, since the model is over-identified, the choice of $W_N$ requires more care. An
optimal choice of $W_N$ is also necessary because the errors in the first-differenced
model cannot be expected to be spherical. Assuming homoskedastic, spherical
differenced errors would result in a loss of precision, and therefore in inefficient
estimates (e.g. Roodman 2009, 109f.). To see why the differenced errors are
not spherical, consider that the error differences $\Delta u_{i,t}$ are not independent.
More specifically, $\Delta u_{i,t} = u_{i,t} - u_{i,t-1}$ is correlated with
$\Delta u_{i,t-1} = u_{i,t-1} - u_{i,t-2}$ because they share $u_{i,t-1}$ (c.f. Roodman
2009, 110). In fact, the variance-covariance matrix of the differenced error term is

$$ E[\Delta u_i\, \Delta u_i'] = \sigma_u^2\, G \tag{9} $$

with

$$
G =
\begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & -1 \\
0 & 0 & \cdots & -1 & 2
\end{pmatrix}
$$

where $G$ has dimension $(T-2) \times (T-2)$. The entries in this matrix are determined
by the first-order moving average MA(1) structure of the differenced error term³ (e.g.
Baltagi 2008, 149).
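
The tridiagonal pattern with 2 on the diagonal and -1 next to it can be checked numerically. The following sketch draws artificial i.i.d. level errors, differences them and compares their sample covariance matrix with $\sigma_u^2 G$; the helper name `build_g`, the normal distribution and all parameter values are assumptions made for illustration.

```python
import numpy as np

def build_g(dim):
    """Tridiagonal matrix with 2 on the diagonal and -1 on the first off-diagonals."""
    return 2.0 * np.eye(dim) - np.eye(dim, k=1) - np.eye(dim, k=-1)

rng = np.random.default_rng(1)
n, t, sigma_u = 200_000, 6, 1.5
u = rng.normal(scale=sigma_u, size=(n, t))    # i.i.d. level errors u_it
du = np.diff(u, axis=1)[:, 1:]                # differenced errors for periods 3..T
emp_cov = du.T @ du / n                       # sample analogue of E[du_i du_i']
theory = sigma_u**2 * build_g(t - 2)          # sigma_u^2 * G
max_gap = np.abs(emp_cov - theory).max()

print(np.round(emp_cov / sigma_u**2, 2))      # should be close to G
print(f"largest deviation from sigma_u^2 * G: {max_gap:.4f}")
```

Differences that are two or more periods apart share no common $u_{i,t}$, which is why all entries beyond the first off-diagonals are zero.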
To obtain an efficient estimator of the differenced model with non-spherical errors,
a method is used that is similar to the one for estimating simple linear models
with heteroskedastic errors. The method for the latter case is generalized least
squares (GLS) estimation. It falls within the group of so-called weighted least squares
(WLS) estimators, where GLS is the most efficient WLS estimator for linear models. Let me
explain. Recall that the OLS estimator minimizes the sum of squared errors, that is, it
minimizes the quadratic criterion function $J(\beta) = \sum_{i=1}^{N} u_i^2$. If the
errors $u_i$ are heteroskedastic, i.e. they do not have constant variance, the quadratic
form of the criterion function effectively gives some observations more weight than is
warranted. This makes the OLS estimates inefficient. To avoid this inefficiency, the
model with the heteroskedastic errors is weighted by the inverse of the
variance-covariance matrix of the errors. This counteracts the undue weighting of some
of the observations by the quadratic form of $J(\beta)$. The transformed criterion
function is then given by

$$ J(\beta) = \sum_{i=1}^{N} u_i'\, \Omega^{-1} u_i, \qquad \Omega^{-1} = W_N \tag{10} $$

³ The differenced errors $\Delta u_{i,t}$ may be modelled as an MA(1) process with a unit
root: $\Delta u_{i,t} = u_{i,t} - \theta u_{i,t-1}$, with $\theta = 1$ (unit root) and
$u_{i,t} \sim iid(0, \sigma_u^2)$. The entries in $G$ are given by
$E[(u_{i,t} - u_{i,t-1})^2] = 2\sigma_u^2$ and
$E[(u_{i,t} - u_{i,t-1})(u_{i,t-1} - u_{i,t-2})] = -\sigma_u^2$. Since we consider only
one lag, all the other entries are 0. For orthogonal differenced errors the entries on
the diagonal are 1, and all others 0. (Roodman 2009, 110)
In order to obtain efficient estimates of $\beta$, we use a two-step procedure. First,
since $\Omega$ is generally unknown, it has to be estimated. To this end, we assume some
arbitrary variance-covariance matrix $\hat{\Omega}$, whose inverse we then use for $W_N$,
for example $\hat{\Omega}^{-1} = I_N = W_N$. With this suboptimal choice
$W_N = \hat{\Omega}^{-1}$ we compute a preliminary first-step estimate $\hat{\beta}_1$ and
use its residuals to calculate the estimated variance-covariance matrix $\hat{\Omega}$ of
the errors. This completes the first step. In the second step, $\hat{\Omega}^{-1}$ is
then used to compute $\hat{\beta}_{GLS}$, the optimal and efficient GLS estimate of the
parameter $\beta$. In this sense, $W_N = \hat{\Omega}^{-1} = [\widehat{Var}(u)]^{-1}$,
i.e. the optimal weighting matrix is the inverse of
the variance-covariance matrix of the idiosyncratic disturbances.

Derivation of difference GMM estimator
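The same two-step procedure is used to find the optimal IV GMM estimator for the
differenced AJRY 2008 model. To begin, consider again the criterion function
$J_N(\alpha)$ with the unconditional sample moment conditions
$\sum_{i=1}^{N} Z_{di}' \Delta u_i$ and some preliminary choice for $W_N$ of the form

$$ W_N = \left( \sum_{i=1}^{N} Z_{di}'\, \Delta u_i\, \Delta u_i'\, Z_{di} \right)^{-1} \tag{11} $$

This yields

$$ J_N(\alpha) = \left( \sum_{i=1}^{N} Z_{di}' \Delta u_i \right)' W_N \left( \sum_{i=1}^{N} Z_{di}' \Delta u_i \right) \tag{12} $$

Let $\Delta d$ and $\Delta d_{-1}$ denote the vectors that stack $\Delta d_{i,t}$ and
$\Delta d_{i,t-1}$ over all $i$ and $t$, and let $Z_d$ stack the matrices $Z_{di}$, so
that $\sum_{i} Z_{di}' \Delta u_i = Z_d'(\Delta d - \alpha \Delta d_{-1})$.
Differentiating $J_N(\alpha)$ with respect to $\alpha$ and solving the first-order
condition

$$ \frac{\partial J_N(\alpha)}{\partial \alpha} = -2\, \Delta d_{-1}' Z_d\, W_N\, Z_d' (\Delta d - \alpha\, \Delta d_{-1}) = 0 $$

for $\alpha$ yields the consistent first-step linear IV GMM estimator

$$ \hat{\alpha}_{d1} = \left( \Delta d_{-1}' Z_d\, W_N\, Z_d' \Delta d_{-1} \right)^{-1} \Delta d_{-1}' Z_d\, W_N\, Z_d' \Delta d \tag{13} $$

Note that this has the form of the standard linear IV GMM estimator. This completes the
first step of the procedure explained before. For the second step choose

$$ W_N = \left( \sum_{i=1}^{N} Z_{di}'\, \Delta\hat{u}_i\, \Delta\hat{u}_i'\, Z_{di} \right)^{-1} \tag{14} $$

where $\Delta\hat{u}_i$ are the residuals from the initial first-step difference GMM
estimator. This yields the following optimal two-step difference GMM estimator

$$ \hat{\alpha}_{d2} = \left( \Delta d_{-1}' Z_d\, W_N\, Z_d' \Delta d_{-1} \right)^{-1} \Delta d_{-1}' Z_d\, W_N\, Z_d' \Delta d \tag{15} $$

Note that this optimal difference GMM estimator makes no additional assumptions
about the distribution of $u_{i,t}$, i.e. it does not assume homoskedastic or even
identically distributed errors to arrive at an optimal estimator for $\alpha$.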
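
The two-step procedure can be sketched for the simplified AR(1) model without further regressors. The code below is an illustration on simulated data, not a reimplementation of xtabond2 or of the AJRY 2008 estimates: the function names, the sample size, $\alpha = 0.5$ and the normal errors are assumptions, the first step uses the tridiagonal matrix $G$ in the weighting matrix, and the second step uses first-step residuals.

```python
import numpy as np

def simulate_panel(n, t, alpha, rng, burn_in=50):
    """Simulate d_it = alpha * d_i,t-1 + delta_i + u_it and return an (n, t) array."""
    delta = rng.normal(size=n)
    d = delta / (1.0 - alpha)
    out = np.empty((n, t))
    for s in range(burn_in + t):
        d = alpha * d + delta + rng.normal(size=n)
        if s >= burn_in:
            out[:, s - burn_in] = d
    return out

def instruments(d_i):
    """Instrument matrix for one unit: row for period t holds d_i1, ..., d_i,t-2."""
    big_t = d_i.size
    z = np.zeros((big_t - 2, (big_t - 1) * (big_t - 2) // 2))
    col = 0
    for row in range(big_t - 2):
        z[row, col:col + row + 1] = d_i[:row + 1]
        col += row + 1
    return z

def difference_gmm(d):
    """Return the one-step and two-step difference GMM estimates of alpha."""
    big_t = d.shape[1]
    dd = np.diff(d, axis=1)               # columns: first differences for t = 2..T
    y, x = dd[:, 1:], dd[:, :-1]          # differenced d_it and d_i,t-1 for t = 3..T
    zs = [instruments(d_i) for d_i in d]
    g = 2 * np.eye(big_t - 2) - np.eye(big_t - 2, k=1) - np.eye(big_t - 2, k=-1)

    zx = sum(z.T @ x_i for z, x_i in zip(zs, x))   # sum_i Z_i' (lagged difference)
    zy = sum(z.T @ y_i for z, y_i in zip(zs, y))   # sum_i Z_i' (current difference)

    def solve(w):
        # Scalar version of (x'Z W Z'x)^{-1} x'Z W Z'y
        return (zx @ w @ zy) / (zx @ w @ zx)

    w1 = np.linalg.pinv(sum(z.T @ g @ z for z in zs))          # first-step weights
    alpha_1 = solve(w1)
    resid = y - alpha_1 * x                                    # first-step residuals
    w2 = np.linalg.pinv(sum(np.outer(z.T @ e, z.T @ e) for z, e in zip(zs, resid)))
    alpha_2 = solve(w2)
    return alpha_1, alpha_2

rng = np.random.default_rng(3)
panel = simulate_panel(n=2000, t=6, alpha=0.5, rng=rng)
alpha_1, alpha_2 = difference_gmm(panel)
print(f"one-step: {alpha_1:.3f}  two-step: {alpha_2:.3f}  true: 0.500")
```

With moderate persistence as in this artificial example, both steps should recover a value near the true coefficient; the sections that follow discuss why this breaks down when the series is highly persistent.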
AJRY 2008 use this difference GMM estimator to obtain the estimates presented in
columns (3)-(6) of Table 1. Columns (3) and (4) report the estimates of the first-step
GMM estimator $\hat{\alpha}_{d1}$ with different choices of the variance-covariance
matrix of the differenced disturbances in the weighting matrix $W_N$. Column (3) uses
$W_N = \left( \sum_{i=1}^{N} Z_{di}'\, I\, Z_{di} \right)^{-1}$ with the identity matrix
$I$, which is implemented by the h(1) option of the xtabond2 command in Stata. Column (4)
uses the more appropriate choice
$W_N = \left( \sum_{i=1}^{N} Z_{di}'\, G\, Z_{di} \right)^{-1}$, implemented by h(2) in
xtabond2. Note that only for the latter choice do the $\hat{\alpha}_{d1}$ estimates for
the lagged dependent variable, which are all highly significant, lie
between the bounds set by the OLS and FE OLS estimates reported above. Thus, I
only discuss the results in columns (4) and (6).
In fact, the column (4) estimates are those reported by AJRY 2008. Given this choice,
the estimated coefficient $\hat{\gamma}_{d1}$ on log GDP per capita in $t-1$ is -0.129
with standard error 0.076, which is only weakly significant. Note, however, that this
estimate is now negative and seems to refute the central tenet of modernization theory
that higher income per capita causes a country to be democratic. Indeed, with this
estimate, a 10 percent increase in income would be associated with a 0.012 decrease in
democracy. The cumulative effect of income on democracy is reported as -0.253. The
results in column (6) for the two-step difference GMM estimator with $W_N$ as in
equation (14) confirm this finding, although the negative effect of income on democracy
is now smaller by an order of magnitude, at -0.012 (standard error 0.048).
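
The "10 percent" magnitudes quoted for the different estimators are consistent with multiplying the coefficient on log income by $\ln(1.1)$. The short sketch below reproduces this arithmetic for the coefficients reported in the text; the function name is hypothetical, and the calculation assumes that the quoted figures were indeed obtained this way.

```python
import math

def effect_of_income_change(gamma, pct_increase=10.0):
    """Change in the democracy score implied by a given percent rise in GDP per capita.

    Because income enters in logs, the implied change is gamma * ln(1 + pct/100).
    """
    return gamma * math.log(1.0 + pct_increase / 100.0)

# Coefficients on lagged log GDP per capita as reported in the text.
estimates = {
    "pooled OLS": 0.072,
    "FE OLS": 0.010,
    "difference GMM (one-step)": -0.129,
}
effects = {name: effect_of_income_change(g) for name, g in estimates.items()}
for name, value in effects.items():
    print(f"{name}: {value:+.4f}")
```

The results round to the values used in the text: just under 0.007 for pooled OLS, about 0.001 for FE OLS and about -0.012 for one-step difference GMM.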


However, there are concerns about these results and the conclusions that AJRY 2008
draw from them. To see this, consider that the optimality of GMM estimators depends not
only on the choice of $W_N$ but also on the choice of instruments by which we seek
to avoid the endogeneity introduced by the lagged dependent variable $d_{i,t-1}$. As it
turns out, the instruments used by the difference GMM estimator may be weak
instruments for the AJRY 2008 dynamic panel. This is exactly where Heid, Langer and
Larch (2012) direct their critique of the AJRY 2008 estimation, which leads them to use
the system GMM estimator instead of the difference GMM estimator. In the following
section I discuss why the difference GMM instruments are indeed weak
instruments for the AJRY 2008 panel. I then proceed to derive the system GMM
estimator and its specific instruments.

4 The System GMM estimator
Weak difference GMM instruments
As anticipated, Heid, Langer and Larch (2012) observe that income as well as
democracy in the AJRY 2008 panel are highly persistent. This means that the values
of these variables do not deviate systematically from their path over time, so that
the model approaches a random walk. This persistence in the variables⁴ causes bias
and imprecision in the difference GMM estimator, because the level instruments, lagged
two or more periods, for the difference equations in (5) are in fact weak instruments.
As Blundell and Bond (2000, 325) write, "The instruments used in the standard
first-differenced GMM estimator become less informative in two important cases. First,
as the value of the autoregressive parameter $\alpha$ increases towards unity; and
second, as the variance of the [country]-specific effects [$\delta_i$] increases relative
to the variance of the transitory shocks [$u_{i,t}$]." To see this, consider the case of
$T = 3$ of the simplified AJRY 2008 model

$$ d_{i,t} = \alpha d_{i,t-1} + \delta_i + u_{i,t} \tag{16} $$

⁴ I focus here on the autoregressive dependent variable. However, see Blundell and
Bond (2000) for a discussion of persistence of both the autoregressive dependent
variable and other exogenous regressors.

As anticipated earlier, the difference GMM estimator of this model is just-identified
for $T = 3$, and we have only one orthogonality condition,
$E[d_{i,1}\, \Delta u_{i,3}] = 0$. The first stage of the 2SLS regression for this case
is a regression of $\Delta d_{i,2}$ on $d_{i,1}$. To obtain the relevant equation,
evaluate (16) at $t = 2$ and subtract $d_{i,1}$ from both sides.
This gives

$$ \Delta d_{i,2} = (\alpha - 1)\, d_{i,1} + \delta_i + u_{i,2} \tag{17} $$

Blundell and Bond (1998) show that the least squares estimator of $(\alpha - 1)$ in (17)
is biased upwards, towards zero, with

$$ \operatorname{plim}\, (\widehat{\alpha - 1}) = (\alpha - 1)\, \frac{c}{\sigma_\delta^2 / \sigma_u^2 + c}, \qquad \text{where } c = \frac{1 - \alpha}{1 + \alpha} \tag{18} $$

They find that the F-statistic for the first-stage instrumental variable regression
converges to $\chi^2_1(\tau)$, a noncentral chi-squared distribution with one degree of
freedom and noncentrality parameter

$$ \tau = \tau(\alpha, \sigma_\delta^2, \sigma_u^2) \tag{19} $$

This noncentrality parameter approaches zero as $\alpha \to 1$, as well as for decreasing
values of $\sigma_u^2$ and increasing values of $\sigma_\delta^2$. In this case the IV
estimator performs poorly, and Blundell and Bond (1998) attribute the bias and poor
precision of the first-difference GMM estimator to the problem of weak instruments, where
this problem is best characterized by the concentration parameter $\tau$ (c.f. Baltagi
2008, 161; but also Blundell, Bond, and Windmeijer 2000).
Hence, if the dependent variable (here $d_{i,t}$) is close to a random walk, then lagged
levels of the dependent variable are only weakly correlated with subsequent first
differences of that series, and thus they are only weak instruments. In other words, for
the case at hand, "past levels convey little information about future changes, so
untransformed lags are weak instruments for transformed variables" (Roodman 2009,
114).
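
The shrinking first-stage coefficient can be made concrete with a simulation of the $T = 3$ case. The sketch below regresses $\Delta d_{i,2}$ on $d_{i,1}$ in artificial data and compares the slope with the probability limit $(\alpha - 1)\, c / (\sigma_\delta^2/\sigma_u^2 + c)$, $c = (1-\alpha)/(1+\alpha)$. The function names, the normal distributions, unit variances and the stationary initial condition are assumptions made for illustration.

```python
import numpy as np

def first_stage_slope(alpha, var_delta, var_u, n, rng):
    """OLS slope from regressing the first difference in period 2 on the level in period 1."""
    delta = rng.normal(scale=np.sqrt(var_delta), size=n)
    # Stationary start: long-run mean plus a stationary AR(1) deviation.
    d1 = delta / (1 - alpha) + rng.normal(scale=np.sqrt(var_u / (1 - alpha**2)), size=n)
    d2 = alpha * d1 + delta + rng.normal(scale=np.sqrt(var_u), size=n)
    dd2 = d2 - d1
    d1_c = d1 - d1.mean()
    return (d1_c @ (dd2 - dd2.mean())) / (d1_c @ d1_c)

def plim_slope(alpha, var_delta, var_u):
    """Probability limit of the first-stage slope under mean stationarity."""
    c = (1 - alpha) / (1 + alpha)
    return (alpha - 1) * c / (var_delta / var_u + c)

rng = np.random.default_rng(2)
results = []
for alpha in (0.2, 0.5, 0.8, 0.95):
    sim = first_stage_slope(alpha, 1.0, 1.0, 400_000, rng)
    formula = plim_slope(alpha, 1.0, 1.0)
    results.append((alpha, sim, formula))
    print(f"alpha={alpha:.2f}  simulated={sim:.4f}  formula={formula:.4f}")
```

As $\alpha$ moves towards one, the slope moves towards zero, so the level $d_{i,1}$ carries less and less information about the subsequent change $\Delta d_{i,2}$.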
To account for the random-walk-like behavior of the dependent variable, Blundell and
Bond (1998) introduce a stationarity restriction on the initial conditions of the data
generating process, namely that

$$ E\left[ \left( d_{i,1} - \frac{\delta_i}{1 - \alpha} \right) \delta_i \right] = 0, $$

which means that $d_{i,t}$ converges towards its mean $\delta_i / (1 - \alpha)$ for each
individual from period $t = 2$ onwards (c.f. Baltagi 2008, 161). This stationarity
restriction, or persistence assumption, accounts for the
effect that individuals with higher initial deviations will have slower subsequent
changes as they converge to the long-run mean (Roodman 2009, 115). Thus, deviations from
the long-run mean do not correlate with the country fixed effects $\delta_i$, which leads
to the assumption $E[\Delta d_{i,t-1}\, \delta_i] = 0$ that past changes of democracy are
orthogonal to the country fixed effects for all $i$ and $t$. This is to say that
$E[d_{i,t-1}\, \delta_i]$ is time-invariant.
This assumption is then used to construct new instruments for the estimation of the
AJRY 2008 model.

A new set of instruments
To see this, in addition to $E[\Delta d_{i,t-1}\, \delta_i] = 0$, consider the further
assumption that the country fixed effects $\delta_i$ are uncorrelated with $u_{i,t}$, such
that $E[u_{i,t}\, \delta_i] = 0$. With these two assumptions we then obtain additional
$T - 2$ non-redundant linear level moment conditions

$$ E[\Delta d_{i,t-s}\, (\delta_i + u_{i,t})] = E[\Delta d_{i,t-s}\, v_{i,t}] = 0 \tag{20} $$

for $t = 3, 4, \dots, T$ and $1 \leq s \leq t - 2$. Based on these assumptions, the system
GMM estimator proposes to use the differences $\Delta d_{i,t-1}$ as a new set of
instruments for the levels $d_{i,t-1}$ of the lagged dependent variable. It thus exploits
a new set of instruments from within the system that was not available for the
difference GMM estimator.
However, as Roodman (2009) emphasizes, these assumptions are not trivial. For, if we
instrument $d_{i,t-1}$ with $\Delta d_{i,t-1}$, "which [...] contains the fixed effect
$\mu_i$ [and] yet we assume that the levels equation error, $v_{i,t}$, contains $\mu_i$
too, [this then] makes the proposition that the instrument is orthogonal to the error,
that [$E[\Delta d_{i,t-1}\,v_{i,t}] = 0$], counterintuitive" (Roodman 2009, 114). This
proposition can in fact hold only if the stationarity restriction on the initial
conditions of the dependent variable, which I have introduced earlier, holds, that is,
if the country fixed effect and the parameter $\alpha$ of the lagged dependent
variable offset each other in expectation for the whole panel.
Hence, if the data generating process is indeed persistent with respect to $d_{i,t}$,
then $d_{i,t-1}$ can be instrumented with $\Delta d_{i,t-1}$. The elements of
$\Delta d_{i,t-1}$ are a valid set of instruments because, as anticipated earlier, they
are orthogonal to the error term $v_{i,t} = \mu_i + \varepsilon_{i,t}$ if the stationarity
assumption holds. They are also an informative set of instruments because, even if
$\alpha \to 1$ and/or $Var(\mu_i)/Var(\varepsilon_{i,t}) \to \infty$, the level
$d_{i,t-1}$ remains correlated with $\Delta d_{i,t-1}$. In short, "[for] random
walk-like variables, past changes may indeed be more predictive of current levels than
past levels are of current changes" (Roodman 2009, 114).
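The contrast between the two kinds of instruments can be made concrete with a small simulation. The following is a minimal sketch, not the estimators used in this paper: it assumes a stationary AR(1) panel with three periods, normally distributed fixed effects and errors with unit variance, and it compares two just-identified IV estimators, one based on the differenced equation (instrument: the level from two periods back) and one based on the level equation (instrument: the lagged difference). All function names are hypothetical.

```python
import random
import statistics


def simulate_panel(n, alpha, rng, sigma_mu=1.0, sigma_eps=1.0):
    """Draw (d1, d2, d3) for n units from a stationary AR(1) with fixed effects."""
    panel = []
    for _ in range(n):
        mu = rng.gauss(0.0, sigma_mu)
        # Stationary start: d1 deviates from the long-run mean mu / (1 - alpha)
        # by a draw that is independent of the fixed effect mu.
        w1 = rng.gauss(0.0, sigma_eps / (1.0 - alpha ** 2) ** 0.5)
        d1 = mu / (1.0 - alpha) + w1
        d2 = alpha * d1 + mu + rng.gauss(0.0, sigma_eps)
        d3 = alpha * d2 + mu + rng.gauss(0.0, sigma_eps)
        panel.append((d1, d2, d3))
    return panel


def iv_difference(panel):
    """Differenced equation for t = 3, instrumented with the level d1."""
    num = sum(d1 * (d3 - d2) for d1, d2, d3 in panel)
    den = sum(d1 * (d2 - d1) for d1, d2, d3 in panel)
    return num / den


def iv_level(panel):
    """Level equation for t = 3, instrumented with the difference d2 - d1."""
    num = sum((d2 - d1) * d3 for d1, d2, d3 in panel)
    den = sum((d2 - d1) * d2 for d1, d2, d3 in panel)
    return num / den


def median_abs_error(alpha, n=500, reps=200, seed=0):
    """Median absolute estimation error of both IV estimators over reps panels."""
    rng = random.Random(seed)
    err_diff, err_level = [], []
    for _ in range(reps):
        panel = simulate_panel(n, alpha, rng)
        err_diff.append(abs(iv_difference(panel) - alpha))
        err_level.append(abs(iv_level(panel) - alpha))
    return statistics.median(err_diff), statistics.median(err_level)


if __name__ == "__main__":
    for alpha in (0.3, 0.9):
        diff_err, level_err = median_abs_error(alpha)
        print(f"alpha={alpha}: difference IV {diff_err:.3f}, level IV {level_err:.3f}")
```

Under these assumptions the level-equation estimate should typically lie much closer to the true coefficient than the difference-equation estimate when the series is highly persistent (here 0.9). The sketch deliberately uses one moment condition per estimator, so it illustrates the argument rather than reproducing difference or system GMM.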
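Given this discussion and similar to the difference GMM case, the level moment
conditions from (20) can be stacked, for each country $i$, into the matrix with
$T - 2$ rows

$$Z_{li} = \begin{pmatrix}
\Delta d_{i,2} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & \Delta d_{i,2} & \Delta d_{i,3} & \cdots & 0 & \cdots & 0 \\
\vdots & & & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & \Delta d_{i,2} & \cdots & \Delta d_{i,T-1}
\end{pmatrix}$$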

The system GMM estimator
The system GMM estimator utilizes this additional set of level moment conditions as
well as the difference moment conditions to estimate dynamic panel data models. To
this purpose it combines the difference equations and the level equations into one
larger system of equations. This "involves building a stacked data set with twice the
observations" (Roodman 2009, 115), where the difference equations are followed by
the level equations. Since both assume the same linear functional form and
specification of the data generating process, the whole system can be estimated in
one single operation that applies to both sets of equations.
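Following Roodman (2009) the new larger data set is obtained by left-multiplying the
original set by the following transformation matrix

$$\Delta^{+} = \begin{pmatrix} \Delta \\ I \end{pmatrix}$$

where $\Delta$ stands for the differencing operation and $I$ for the identity matrix.
Thus, the resulting combined data set is

$$d_i^{+} = \begin{pmatrix} \Delta d_i \\ d_i \end{pmatrix}, \qquad
d_{i,-1}^{+} = \begin{pmatrix} \Delta d_{i,-1} \\ d_{i,-1} \end{pmatrix}.$$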
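For the simplified AJRY 2008 model $d_{i,t} = \alpha d_{i,t-1} + \mu_i + \varepsilon_{i,t}$
both $d_{i,t}^{+}$ and $d_{i,t-1}^{+}$ stack $2(T - 2)$ observations per country, that is,
they observe democracy over all countries for $t > 2$ in differenced form as well as
in level form. In this model the stacked error $v_i^{+}$ of the level error
$v_{i,t} = \mu_i + \varepsilon_{i,t}$ is given by

$$v_i^{+} = \begin{pmatrix} \Delta v_i \\ v_i \end{pmatrix}$$

and the corresponding moment conditions are expressed as

$$E[Z_{si}'\,v_i^{+}] = 0 \tag{21}$$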
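The stacking step can be sketched in a few lines. This is a minimal illustration under the assumption of one balanced series per country; the helper names are hypothetical, and the sketch keeps every level observation, whereas the estimation sample described here only uses the periods for which instruments exist.

```python
def difference_matrix(n_periods):
    """(n_periods - 1) x n_periods first-difference matrix with rows (..., -1, 1, ...)."""
    return [[-1 if c == r else 1 if c == r + 1 else 0 for c in range(n_periods)]
            for r in range(n_periods - 1)]


def identity_matrix(n):
    """n x n identity matrix as a list of rows."""
    return [[1 if c == r else 0 for c in range(n)] for r in range(n)]


def stack_system(series):
    """Left-multiply one unit's series by [Delta; I]: differences first, then levels."""
    n = len(series)
    transform = difference_matrix(n) + identity_matrix(n)
    return [sum(a * b for a, b in zip(row, series)) for row in transform]


if __name__ == "__main__":
    print(stack_system([1.0, 3.0, 6.0]))
```

For the series 1, 3, 6 the stacked vector contains the two first differences followed by the three levels, which is the "differences on top of levels" layout of the combined data set.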

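where

$$Z_{si} = \begin{pmatrix} Z_{di} & 0 \\ 0 & Z_{li}^{*} \end{pmatrix}
= \begin{pmatrix}
Z_{di} & 0 & 0 & \cdots & 0 \\
0 & \Delta d_{i,2} & 0 & \cdots & 0 \\
0 & 0 & \Delta d_{i,3} & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \Delta d_{i,T-1}
\end{pmatrix}$$

with $Z_{di}$ defined as for the difference GMM estimator and $Z_{li}$ as defined for
the level equation. Note, however, that $Z_{li}^{*}$ is the non-redundant subset of
$Z_{li}$ (cf. Blundell, Bond, and Windmeijer 2000, sec. 7.1). Due to space restrictions
I cannot discuss the choice of $Z_{li}^{*}$ over $Z_{li}$; see, however, Roodman
(2009, 116).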
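To make the block structure of the system instrument matrix tangible, the sketch below builds it for a single country. It assumes that the difference block has the usual Arellano-Bond layout (the row for period t holds all levels dated t-2 and earlier) and that the level block is the diagonal of lagged first differences; the function names are hypothetical and the series must have at least three periods.

```python
def difference_instruments(d):
    """Difference block: the row for period t = 3..T holds the levels d_1..d_{t-2}."""
    n_periods = len(d)
    n_cols = sum(t - 2 for t in range(3, n_periods + 1))
    rows, offset = [], 0
    for t in range(3, n_periods + 1):
        row = [0.0] * n_cols
        lags = d[: t - 2]  # d_1, ..., d_{t-2}
        row[offset: offset + len(lags)] = lags
        offset += len(lags)
        rows.append(row)
    return rows


def level_instruments(d):
    """Non-redundant level block: diagonal matrix of the differences dated 2..T-1."""
    n_periods = len(d)
    deltas = [d[t - 1] - d[t - 2] for t in range(2, n_periods)]
    size = len(deltas)
    return [[deltas[r] if c == r else 0.0 for c in range(size)] for r in range(size)]


def system_instruments(d):
    """Block-diagonal combination: difference block on top, level block below."""
    z_d, z_l = difference_instruments(d), level_instruments(d)
    n_d, n_l = len(z_d[0]), len(z_l[0])
    top = [row + [0.0] * n_l for row in z_d]
    bottom = [[0.0] * n_d + row for row in z_l]
    return top + bottom


if __name__ == "__main__":
    for row in system_instruments([1.0, 2.0, 4.0, 7.0, 11.0]):
        print(row)
```

With five periods the difference block has three rows and 1 + 2 + 3 = 6 columns, the level block adds three rows and three columns, and the combined matrix is therefore 6 x 9.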


The derivation of the system GMM estimator $\hat{\alpha}_{s}$ proceeds in a two-step
procedure analogous to the derivation of the difference GMM estimator. Note,
however, that there are different proposals for how to define the weighting matrix
$W_N$ for the first-step estimator $\hat{\alpha}_{s1}$. As Roodman argues, it cannot
follow the same arguments as for the difference GMM estimator, because "the fixed
effects are present in the levels errors" (Roodman 2009, 117), and thus defining a
unique $W_N$ is infeasible. The choice of $W_N$ should be such that it minimizes
arbitrariness while setting $W_N$ to what it would be in the simplest case. For
example, consider the variance of the level error $v_{i,t} = \mu_i + \varepsilon_{i,t}$
for some $i, t$, which gives the diagonal entries in $Var[v_i]$:

$$Var[v_{i,t}] = Var[\mu_i + \varepsilon_{i,t}]
= Var[\mu_i] + 2\,Cov[\mu_i, \varepsilon_{i,t}] + Var[\varepsilon_{i,t}]
= Var[\mu_i] + 0 + 1,$$

where the idiosyncratic variance is normalised to one. In the simplest case we choose
$Var[\mu_i] = 0$ as an a priori estimate for the variance of the time-invariant
country fixed effects. This means that we treat the error term $v_{i,t}$ as only
dependent on the individual idiosyncratic errors $\varepsilon_{i,t}$, i.e.
$v_{i,t} = \varepsilon_{i,t}$. Hence, the preliminary choice of the variance-covariance
matrix of the error in the combined data set is given by the following block matrix

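$$\widehat{Var}[v_i^{+}] = Var[\Delta^{+}\varepsilon_i] = \sigma_\varepsilon^2 H,
\qquad H = \begin{pmatrix} G & \Delta \\ \Delta' & I \end{pmatrix} \tag{22}$$

where $G$ is defined as above for the preliminary choice of the variance-covariance
matrix in the first-step difference GMM estimator. $\sigma_\varepsilon^2 I$ is the
variance-covariance matrix of the errors in the level equations with
$\varepsilon_{i,t} \sim IID(0, \sigma_\varepsilon^2)$, where the entries on the diagonal
are given by $E[\varepsilon_{i,t}^2] = \sigma_\varepsilon^2$ and all other entries are 0,
since the errors are assumed to be uncorrelated across individuals and serially
uncorrelated. So far, I follow Roodman's specification of $Var[v_i^{+}]$. Yet, he does
not expand on $\Delta$ and $\Delta'$. The entries in $\Delta$ that pair a differenced
error with the level error of the same period are given by
$E[\Delta\varepsilon_{i,t}\,\varepsilon_{i,t}] = E[\varepsilon_{i,t}^2 -
\varepsilon_{i,t-1}\varepsilon_{i,t}] = \sigma_\varepsilon^2 - 0 = \sigma_\varepsilon^2$.
The entries that pair a differenced error with the level error of the preceding period
are $E[\Delta\varepsilon_{i,t}\,\varepsilon_{i,t-1}] = -\sigma_\varepsilon^2$, and all
remaining entries are 0, since, again, idiosyncratic errors are serially uncorrelated
and uncorrelated across individuals. The block is therefore $\sigma_\varepsilon^2$ times
the differencing matrix itself. The entries in $\Delta'$ follow analogously, with
$E[\varepsilon_{i,t}\,\Delta\varepsilon_{i,t}] = \sigma_\varepsilon^2$. Thus

$$W_N = \left(\sum_{i=1}^{N} Z_{si}'\,\widehat{Var}[v_i^{+}]\,Z_{si}\right)^{-1}
= \left(\sigma_\varepsilon^2 \sum_{i=1}^{N} Z_{si}'\,H\,Z_{si}\right)^{-1} \tag{23}$$

Note that in their original contribution Blundell and Bond (1998) set the entries in
$\Delta$ and $\Delta'$ to 0.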
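The block structure of this preliminary variance-covariance matrix can be checked numerically. The sketch below is an illustration under stated assumptions, not xtabond2's implementation: idiosyncratic errors are serially uncorrelated with unit variance, the fixed-effect variance is set to zero, and the stacked error is obtained by applying the transformation "differences on top of levels" to the error vector. The function names and the `cross_blocks` switch are hypothetical.

```python
def matmul_transpose(a, b):
    """Return a times the transpose of b for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def system_h(n_errors, cross_blocks=True):
    """H = A A' with A = [Delta; I] for n_errors unit-variance, uncorrelated errors.

    cross_blocks=False zeroes the off-diagonal blocks, which is how the text
    describes the original Blundell and Bond (1998) choice.
    """
    delta = [[-1 if c == r else 1 if c == r + 1 else 0 for c in range(n_errors)]
             for r in range(n_errors - 1)]
    eye = [[1 if c == r else 0 for c in range(n_errors)] for r in range(n_errors)]
    stacked = delta + eye
    h = matmul_transpose(stacked, stacked)
    if not cross_blocks:
        k = n_errors - 1  # number of differenced rows
        for r in range(len(h)):
            for c in range(len(h)):
                if (r < k) != (c < k):
                    h[r][c] = 0
    return h


if __name__ == "__main__":
    for row in system_h(4):
        print(row)
```

Under these assumptions the upper-left block is the familiar band matrix with 2 on the diagonal and -1 next to it, the lower-right block is the identity, and the cross block equals the differencing matrix, so it contains -1 as well as 1.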
Given this characterization of $W_N$ the system GMM estimator is then derived in the
same two-step procedure as the difference GMM estimator. The estimation results of
the AJRY 2008 panel with system GMM are given in columns (7) to (10) of Table 1.
Columns (7) and (8) give the estimates for the first-step estimation with the Blundell
and Bond (1998) and the Roodman (2009) choice of $W_N$ respectively. Analogously,
columns (9) and (10) give the results for the optimal two-step system GMM estimator.
Note that all four estimates of the $\mathit{Democracy}_{t-1}$ coefficient lie between
the bounds given by the OLS and FE OLS estimates. Moreover, the coefficient for
$\mathit{Log\ GDP\ per\ capita}_{t-1}$ is in all cases positive and significant. For
example, the two-step system GMM point estimate in column (10) is 0.099 (standard
error = 0.013), which implies that a 10 percent increase in income is associated with a
0.009 unit increase in the Freedom House democracy score
($0.099 \times \ln 1.1 \approx 0.009$). The cumulative effect of income is
$0.099/(1 - 0.608) = 0.253$. That is, in the long run, a 10 percent increase in income
is associated with a 0.024 increase in democracy ($0.253 \times \ln 1.1 \approx 0.024$).
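The arithmetic behind these figures can be reproduced directly from the column (10) estimates in Table 1. The sketch below assumes the log-level interpretation of the income coefficient (a change in log income of ln 1.1 for a 10 percent increase); the function names are hypothetical.

```python
import math


def long_run_effect(beta_income, alpha_democracy):
    """Cumulative effect of income: beta / (1 - alpha)."""
    return beta_income / (1.0 - alpha_democracy)


def effect_of_income_change(coefficient, pct_change):
    """Change in the democracy score for a given percentage change in income."""
    return coefficient * math.log(1.0 + pct_change / 100.0)


if __name__ == "__main__":
    beta, alpha = 0.099, 0.608  # column (10) of Table 1
    cumulative = long_run_effect(beta, alpha)
    print(round(cumulative, 3))
    print(round(effect_of_income_change(beta, 10), 3))
    print(round(effect_of_income_change(cumulative, 10), 3))
```

Multiplying by 0.10 instead of ln 1.1 is a common linear approximation; it gives roughly 0.010 for the short-run and 0.025 for the long-run effect.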
Generally, these results contradict the AJRY 2008 finding that there is no causal
effect of income on democracy and thus support, rather than refute, the
modernization theory. The results are in line with HLL, although their reported
numbers differ because they choose, for example, to treat
$\mathit{Log\ GDP\ per\ capita}_{t-1}$ as endogenous and include it in the GMM-style
set of instruments.


5 Conclusion
To conclude, the difference and system GMM estimators are powerful tools to
estimate dynamic panel data models with autoregressive processes for so-called small
T, large N panels. Both estimators use instruments which are available from within
the system of equations when no external instruments are available to the researcher.
However, as the previous discussion concerning the Acemoglu et al. 2008 paper has
shown, the difference GMM estimator performs poorly when the dependent variable
(here the democracy level) is persistent.
The problem with persistent variables in small T, large N panels is widespread and
by now well known. For example, Blundell and Bond (2000), similarly to Heid,
Langer and Larch, reevaluate a difference GMM estimation of capital and labor
coefficients in a Cobb-Douglas production function. In contrast to the difference
GMM estimation, they find that the system GMM estimator not only provides more
precise estimates but also yields a capital coefficient that is not so low as to reject
the hypothesis of constant returns to scale.
In a similar manner, Bobba and Coviello (2007) reject the hypothesis that, contrary to
popular opinion, education has no positive effect on democracy (e.g. Acemoglu et al.
2008). While Acemoglu et al. again use the difference GMM estimator to support
their thesis, Bobba and Coviello show that using the system GMM estimator
overturns the result on education such that the coefficient for lagged education is now
positive and significant.
All these cases show that "a careful examination of the original series and
consideration of the system GMM estimator can usefully overcome many of the
disappointing features of the standard [difference] GMM estimator for dynamic panel
models" (Blundell et al. 2000 in Baltagi 2008: 161). This result is important especially
for policy makers, since the choice of estimator alone can imply different policy
choices. Thus, not only researchers but also policy makers and their
consultants should pay a great deal of attention to which data and methods are used in
the econometric literature, even, or probably especially, in the case of renowned
contributors to the literature.


7 References
Acemoglu, Daron, Simon Johnson, James A. Robinson, and Pierre Yared. 2008.
"Income and Democracy." American Economic Review 98 (3): 808–42.
Arellano, Manuel, and Stephen Bond. 1991. "Some Tests of Specification for Panel
Data: Monte Carlo Evidence and an Application to Employment Equations."
Review of Economic Studies 58 (2): 277–97.
Arellano, Manuel, and Olympia Bover. 1995. "Another Look at the Instrumental
Variable Estimation of Error-Components Models." Journal of Econometrics 68:
29–51.
Baltagi, Badi H. 2008. Econometric Analysis of Panel Data. 4th ed. Wiley.
Blundell, Richard, and Stephen Bond. 1998. "Initial Conditions and Moment
Restrictions in Dynamic Panel Data Models." Journal of Econometrics 87 (1):
115–43.
Blundell, Richard, and Stephen Bond. 2000. "GMM Estimation with Persistent Panel
Data: An Application to Production Functions." Econometric Reviews 19 (3): 321–40.
Blundell, Richard, Stephen Bond, and Frank Windmeijer. 2000. "Estimation in
Dynamic Panel Data Models: Improving on the Performance of the Standard
GMM Estimator." WP 00/12.
Bobba, Matteo, and Decio Coviello. 2007. "Weak Instruments and Weak
Identification in Estimating the Effects of Education on Democracy." Economics
Letters 96: 301–6.
Bond, Stephen R. 2002. "Dynamic Panel Data Models: A Guide to Micro Data
Methods and Practice." Portuguese Economic Journal 1 (2): 141–62.
Heid, Benedikt, Julian Langer, and Mario Larch. 2012. "Income and Democracy:
Evidence from System GMM Estimates." Economics Letters 116 (2): 166–69.
Roodman, David. 2009. "How to Do xtabond2: An Introduction to Difference and
System GMM in Stata." Stata Journal 9 (1): 86–136.


Appendix
Table 1
Income and Democracy, Acemoglu et al. (2008), 5 Year Panel
Base sample, 1960-2000
Dependent variable is Democracy_t (Freedom House Index)

                            Pooled OLS FE OLS     diffGMM-1             diffGMM-2             sysGMM-1              sysGMM-2
                                                  H(1)       H(2)       H(1)       H(2)       H(2)       H(3)       H(2)       H(3)
                            (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)        (10)

Democracy_{t-1}             0.706***   0.379***   0.236      0.489***   0.327***   0.528***   0.529***   0.570***   0.608***   0.608***
                            (0.035)    (0.051)    (0.102)    (0.085)    (0.049)    (0.047)    (0.076)    (0.061)    (0.027)    (0.027)
Log GDP per capita_{t-1}    0.072***   0.010      -0.128     -0.129*    0.010      -0.012     0.109***   0.094***   0.099***   0.099***
                            (0.010)    (0.035)    (0.118)    (0.076)    (0.056)    (0.048)    (0.030)    (0.023)    (0.013)    (0.013)
Cumulative effect of income 0.246      0.017      -0.168     -0.253     0.014      -0.026     0.231      0.218      0.253      0.253
                            [0.00]     [0.76]     [0.24]     [0.09]     [0.86]     [0.80]     [0.00]     [0.00]     [0.00]     [0.00]
AR(1)                                             [0.000]    [0.000]    [0.000]    [0.000]    [0.000]    [0.000]    [0.000]    [0.000]
AR(2)                                             [0.902]    [0.448]    [0.736]    [0.410]    [0.318]    [0.299]    [0.304]    [0.304]
Observations                945        945        838        838        838        838        945        945        945        945
Countries                   150        150        127        127        127        127        150        150        150        150

Notes: The base sample is taken from Acemoglu et al. 2008. Unbalanced panel from 1960 to 2000 with data at five-year intervals, where the start date refers to the dependent variable
(i.e., t=1960, so t-1=1955). The dependent variable is the Augmented Freedom House Political Rights index. ***, **, and * report statistical significance at the 1, 5, and 10 percent levels
respectively. Standard errors in parentheses and p-values in brackets. Pooled OLS and fixed effects (FE OLS) regressions use robust standard errors clustered by country. All GMM
estimates use robust standard errors and treat the lagged democracy measure as predetermined. diffGMM-1, sysGMM-1 and diffGMM-2, sysGMM-2 are the first-step and two-step
estimations respectively. For the diffGMM estimations H(1) denotes H=I as the initial choice of the variance-covariance matrix of the differenced error terms. H(2) sets H as defined
in Blundell and Bond 1998. For the sysGMM estimations H(2) sets H as defined in Blundell and Bond 1998, and H(3) sets H as defined in Roodman 2009, the xtabond2 default
setting. The implied cumulative effect of income is the coefficient estimate of log GDP per capita_{t-1} divided by (1 - the coefficient estimate of democracy_{t-1}), with the p-value from a nonlinear test of its significance in
brackets. AR(1) and AR(2) report the p-values for first and second order autocorrelated disturbances in the first differences equations.
