
Journal of Econometrics 142 (2008) 539–552

Fixed effects instrumental variables estimation in correlated random coefficient panel data models

Irina Murtazashvili (a), Jeffrey M. Wooldridge (b,*)

(a) Department of Economics, University of Pittsburgh, Pittsburgh, PA 15260, USA
(b) Department of Economics, Michigan State University, East Lansing, MI 48824-1038, USA
Available online 6 September 2007
Abstract
We provide a set of conditions sufficient for consistency of a general class of fixed effects instrumental variables (FE-IV)
estimators in the context of a correlated random coefficient panel data model, where one ignores the presence of
individual-specific slopes. We discuss cases where the assumptions are met and violated. Monte Carlo simulations verify that the
FE-IV estimator of the population averaged effect performs notably better than other standard estimators, provided a full
set of period dummies is included. We also propose a simple test of selection bias in unbalanced panels when we suspect the
slopes may vary by individual.
© 2007 Elsevier B.V. All rights reserved.
JEL classification: C23; C33
Keywords: Correlated random coefficient model; Population averaged effect; Average treatment effect; Fixed effects; Instrumental
variables
1. Introduction
In both cross-section and panel data settings, there is substantial interest in estimating population averaged
effects (PAEs), including average treatment effects (ATEs), in the correlated random coefficient (CRC) model.
Models with both exogenous explanatory variables and endogenous regressors have been investigated in
recent years. Angrist (1991) discusses the conditions for consistency of ATE estimates in models with binary
endogenous variables and no exogenous covariates. A set of sufficient assumptions required for consistent
ATE estimates with continuous endogenous regressors in a CRC model can be found in Wooldridge (2003).
Both papers study estimation with random sampling from a cross-section.
The possibility that treatment effects might depend on individual-specific heterogeneity motivated Imbens
and Angrist (1994) to introduce the local ATE (LATE) as an evaluation parameter, which provides a useful
interpretation of the instrumental variables estimator when the effect of a binary treatment varies across units.
That emphasis on LATE led to a reinterpretation of IV estimates in many empirical applications, and spurred
a great deal of research on interpreting IV estimators in a variety of contexts. Heckman and Vytlacil (2005) provide
a recent unification, including a discussion of whether we should be interested in parameters such as LATE.

www.elsevier.com/locate/jeconom
0304-4076/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2007.09.001
* Corresponding author. Tel.: +1 517 353 5972; fax: +1 517 432 1068.
E-mail address: wooldri1@msu.edu (J.M. Wooldridge).

The understanding that IV generally consistently estimates LATE in simple settings is useful, but often we
are interested in estimating the expected effect for a randomly drawn unit from the underlying population.
Plus, strict interpretation of LATE as the ATE among units induced into treatment by the switching of an
instrumental variable (such as program eligibility) is limited to special cases. Here we study estimation of
population average effects, or ATEs, in a general panel data model with heterogeneous slopes. By estimating
population average effects we can easily estimate the aggregate effects of various policies, such as increasing
the amount of job training among the population of manufacturing workers.
Wooldridge (2005a) studied general fixed effects estimators with strictly exogenous regressors in the CRC
model with panel data, and derived conditions under which generalized fixed effects estimators (generalized
in the sense that they sweep away unit-specific trends) are consistent for the PAE. In this paper, we study the
model in Wooldridge (2005a) but, in addition to allowing correlation between the instruments and the
unobserved heterogeneity, we allow some explanatory variables to be correlated with the idiosyncratic error.
The main result is a set of sufficient conditions under which fixed effects instrumental variables (FE-IV)
estimators consistently estimate the PAE, even when the individual-specific slopes are ignored. The results
include the commonly used FE two stage least squares estimator (FE-2SLS) as a special case, but also more
general FE-IV estimators that sweep away individual-specific time trends. The conditions are most likely to
apply when the endogenous explanatory variables are continuous, as in Wooldridge (2003) for the
cross-sectional case.
The remainder of the paper is organized as follows. In Section 2 we introduce the model and briefly review
existing results. Section 3 contains the main consistency result, and Section 4 covers examples where the
conditions will (and will not) hold. Section 5 contains a Monte Carlo study that shows how the FE-IV
estimator, with a full set of time period dummies, outperforms its obvious competitors. The simulation results
support the results in Sections 3 and 4.
In Section 6, we expand on earlier work by allowing the random trend part of the structural equation to be
misspecified. Interestingly, it is still possible to estimate the averaged slopes under reasonable assumptions.
Section 7 considers unbalanced panels, characterizes the nature of any sample selection problem, and proposes
simple variable addition tests that can be used when the slopes are thought to be individual-specific. Section 8
contains a brief conclusion.
2. Model specification and previous results
The model of interest is a CRC model studied in Wooldridge (2005a). For a random draw $i$ from the
population, the model is
\[ y_{it} = w_t a_i + x_{it} b_i + u_{it}, \quad t = 1, \ldots, T, \tag{2.1} \]
where $y_{it}$ is a dependent variable, $w_t$ is a $1 \times J$ vector of aggregate time variables, which we treat as
nonrandom, $a_i$ is a $J \times 1$ vector of individual-specific slopes on the aggregate variables, $x_{it}$ is a
$1 \times K$ vector of endogenous covariates that change across time, $b_i$ is a $K \times 1$ vector of
individual-specific slopes, and $u_{it}$ is an idiosyncratic error. As discussed in Wooldridge (2005a), we require
$J < T$. So, if we have two time periods, we can only allow a scalar individual-specific intercept, $a_i$. If $T = 3$,
we can allow individual-specific linear trends, too. Higher order trend terms are allowed as $T$ increases.
Eq. (2.1) is a CRC model when the individual-specific slopes, $b_i$ (as well as the elements in $a_i$), are allowed to
be correlated with $x_{it}$. For example, a simple CRC wage equation might look like
\[ \log(wage_{it}) = a_{i1} + a_{i2} t + b_{i1}\, training_{it} + b_{i2}\, union_{it} + b_{i3}\, married_{it} + u_{it}, \tag{2.2} \]
where, in addition to the standard level effect $a_{i1}$, each individual is allowed to have his or her own unobserved
growth in wages, $a_{i2}$. In addition, the time-varying explanatory variables have individual-specific returns. The
variable training might be hours spent in job training, and the CRC model allows the return to training to be
individual-specific and correlated with the amount of training, as a standard model of human capital
accumulation would suggest.
Wooldridge (2005a) studied the consistency of fixed effects estimators of (2.1) that sweep out the $a_i$ but act
as if $b_i = \beta$ for all $i$. To describe Wooldridge's main result, and the extension here, write
$b_i = \beta + d_i$, where $\beta = E(b_i)$ is the PAE, and substitute into (2.1):
\[ y_{it} = w_t a_i + x_{it}\beta + (x_{it} d_i + u_{it}) \equiv w_t a_i + x_{it}\beta + v_{it}, \tag{2.3} \]
where $v_{it} \equiv x_{it} d_i + u_{it}$. We eliminate $a_i$ by regressing, for each $i$, $y_{it}$ on $w_t$,
$t = 1, \ldots, T$, and $x_{it}$ on $w_t$, $t = 1, \ldots, T$, and keeping the residuals, $\ddot{y}_{it}$ and
$\ddot{x}_{it}$, respectively. This gives the equations
\[ \ddot{y}_{it} = \ddot{x}_{it} b_i + \ddot{u}_{it} = \ddot{x}_{it}\beta + (\ddot{x}_{it} d_i + \ddot{u}_{it}) \equiv \ddot{x}_{it}\beta + \ddot{v}_{it}, \quad t = 1, \ldots, T. \tag{2.4} \]
The fixed effects estimator studied by Wooldridge (2005a) is just the pooled OLS estimator from (2.4). We
control the amount of individual-specific detrending by choosing $w_t$ appropriately.
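The unit-specific detrending step behind (2.4) can be sketched in a few lines. This is a minimal illustration assuming a balanced panel stored as NumPy arrays; the function name `detrend_by_unit` and the array layout are choices made here, not part of the original paper.

```python
import numpy as np

def detrend_by_unit(a, w):
    """Residuals from regressing each unit's time series on w_t.

    a : array of shape (N, T) or (N, T, K), one time series per unit
    w : array of shape (T, J) with J < T, the aggregate time variables
    """
    a = np.asarray(a, dtype=float)
    w = np.asarray(w, dtype=float)
    n_periods, n_trend = w.shape
    if n_trend >= n_periods:
        raise ValueError("need J < T so that residual variation remains")
    # Residual-maker matrix M = I - W (W'W)^{-1} W'. Because w_t is common
    # to all units, the same M applies to every unit.
    m = np.eye(n_periods) - w @ np.linalg.solve(w.T @ w, w.T)
    # Contract M with the time axis (axis 1) of a, then restore (N, T, ...) order.
    return np.moveaxis(np.tensordot(m, a, axes=(1, 1)), 0, 1)

if __name__ == "__main__":
    T = 5
    t = np.arange(1, T + 1)
    w_fe = np.ones((T, 1))                       # w_t = 1: usual time-demeaning
    w_trend = np.column_stack([np.ones(T), t])   # w_t = (1, t): unit-specific linear trends
    y = np.random.default_rng(0).normal(size=(3, T))
    print(detrend_by_unit(y, w_fe).sum(axis=1))  # each row of demeaned data sums to ~0
```

With `w_fe` the function reproduces the usual within transformation; with `w_trend` it also removes a separate linear trend for each unit, at the cost of one more degree of freedom per unit.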
An assumption used by Wooldridge (2005a) is the standard strict exogeneity assumption conditional on
$(a_i, b_i)$:
\[ E(u_{it} \mid x_{i1}, \ldots, x_{iT}, a_i, b_i) = 0, \quad t = 1, \ldots, T. \tag{2.5} \]
Using a simple iterated expectations argument, Wooldridge shows that, under the additional assumption
\[ E(b_i \mid \ddot{x}_{it}) = E(b_i), \quad t = 1, \ldots, T, \tag{2.6} \]
the fixed effects estimator is consistent for the PAE, $\beta$.
Consistency of the usual FE estimator relies heavily on assumption (2.5), which rules out traditional
simultaneity, time-varying measurement error, correlation between time-varying omitted factors (in $u_{it}$) and
the elements of $x_{it}$, and models with lagged dependent variables or other kinds of regressors where changes in
$u_{it}$ may feed back into changes in $x_{i,t+h}$ for $h \ge 1$. In the case where $b_i = \beta$, methods that first
eliminate $a_i$ and then apply instrumental variables (usually 2SLS) have become a standard tool for the applied
economist. Here, we study such estimators but allow for individual-specific slopes, $b_i$.
Let $z_{it}$ be a $1 \times L$ vector of instrumental variables, with $L \ge K$. Let $\ddot{z}_{it}$ be the detrended
instruments from the individual-specific regressions of $z_{it}$ on $w_t$, $t = 1, \ldots, T$. Then we can estimate (2.4)
using instruments $\ddot{z}_{it}$ for unit $i$ in time period $t$. Whether we just use pooled 2SLS (the estimator we
focus on here) or a more sophisticated generalized method of moments (GMM) estimator, the moment conditions we use are
\[ E(\ddot{z}_{it}' \ddot{v}_{it}) = 0, \quad t = 1, \ldots, T. \tag{2.7} \]
In the next section, we study consistency of the FE-2SLS estimator under conditions that relax those in
Wooldridge (2005a).
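Pooled 2SLS on the detrended data, the estimator built on the moment conditions (2.7), can be sketched as follows. This is an illustrative implementation assuming the detrended observations have already been stacked over units and periods; the function name `pooled_2sls` is hypothetical.

```python
import numpy as np

def pooled_2sls(y, x, z):
    """Pooled 2SLS of y on x using instruments z (all already detrended).

    y : (n,)   stacked detrended outcomes
    x : (n, K) stacked detrended regressors
    z : (n, L) stacked detrended instruments, with L >= K
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    if z.shape[1] < x.shape[1]:
        raise ValueError("order condition fails: need L >= K")
    # First stage: project each regressor on the instruments.
    pi_hat = np.linalg.lstsq(z, x, rcond=None)[0]
    x_hat = z @ pi_hat
    # Second stage: beta_hat = (x_hat' x)^{-1} x_hat' y.
    return np.linalg.solve(x_hat.T @ x, x_hat.T @ y)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z = rng.normal(size=(500, 1))
    u = rng.normal(size=500)
    x = 0.8 * z + 0.5 * u[:, None] + rng.normal(size=(500, 1))  # x is correlated with u
    y = 2.0 * x[:, 0] + u
    print(pooled_2sls(y, x, z))
```

When $L = K$ the result coincides with the simple IV formula $(Z'X)^{-1}Z'y$, and when the regressors serve as their own instruments it reduces to pooled OLS.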
3. Conditions for consistent FE-IV estimation
In order to ensure that (2.7) holds, we place conditions separately on the relationship between the
instruments and idiosyncratic errors and the instruments and the unobserved effects. Plus, of course, there is
always a standard rank condition.
Assumption 3.1. With the definitions in Section 2,
\[ E(u_{it} \mid z_{i1}, z_{i2}, \ldots, z_{iT}) = 0, \quad t = 1, \ldots, T. \tag{3.1} \]
Assumption 3.1 is stronger than we need (as will be clear, $E(\ddot{z}_{it}' \ddot{u}_{it}) = 0$, $t = 1, \ldots, T$, would
suffice) but (3.1) is a natural strict exogeneity assumption on the instruments. Assumption 3.1 is common in simultaneous
equations models with panel data, as well as models with other kinds of endogeneity that induce correlation
between $x_{it}$ and $u_{it}$, such as omitted variables and measurement error. Assumption 3.1 rules out lagged
dependent variables among the instruments, as well as other nonstrictly exogenous instruments, and so its
application to dynamic models is limited unless sufficient strictly exogenous instruments are available. When
$z_{it} = x_{it}$, so that the covariates are strictly exogenous, Wooldridge (2005a) included $a_i$ and $b_i$ in the
conditioning set, as in (2.5). When the unit-specific trend function is correctly specified, this stronger form of
the assumption is essentially harmless. But in Section 6 we will investigate the behavior of the FE-IV estimator
when the unit-specific trends have been misspecified.
The second component of the error term in (2.4) is $\ddot{x}_{it} d_i$, and we need assumptions such that
$\ddot{z}_{it}$ is uncorrelated with $\ddot{x}_{it} d_i$. This requires some care because $\ddot{x}_{it}$ contains
endogenous elements. (That is, we allow components of $\ddot{x}_{it}$ to be endogenous even after removing
unit-specific intercepts and trends.) The first assumption mimics the key assumption from Wooldridge (2005a),
except that we replace the covariates with the instruments:
Assumption 3.2. $b_i$ is mean independent of all the unit-specific detrended $\ddot{z}_{it}$, that is,
\[ E(b_i \mid \ddot{z}_{it}) = E(b_i) = \beta, \quad t = 1, \ldots, T. \tag{3.2} \]
Because the $\ddot{z}_{it}$ are net either of a time average or, more generally, level and trend effects, Assumption 3.2
maintains mean independence of the heterogeneous slopes and deviations of the instruments from long-run
levels or trends. Of course, in the case where the instruments are assumed, in each time period, to be
independent of all heterogeneity, Assumption 3.2 automatically holds. Assumption 3.2 is practically much
weaker than full independence because it allows $b_i$ to be arbitrarily correlated with systematic components of
$z_{it}$; we cover some examples in Section 4. (Wooldridge (2005a) contains a discussion for the case of strictly
exogenous $x_{it}$.)
Generally, the richer is $w_t$, the more likely (3.2) is to hold. For example, the usual FE-IV estimator takes out
time averages from the instruments, and this might not be enough to ensure (3.2) if the instruments are
trending differently across units $i$. On the other hand, adding more aggregate factors to $w_t$ reduces the
variation in $\ddot{z}_{it}$, generally leading to less efficient IV estimators. Not surprisingly, in deciding what to
include in $w_t$ we confront the usual tradeoff between efficiency and consistency.
Unfortunately, Assumptions 3.1 and 3.2 are not enough to conclude that the IV estimator is consistent. Instead,
we employ a constant conditional covariance assumption.
Assumption 3.3. For $j = 1, \ldots, K$,
\[ Cov(\ddot{x}_{itj}, b_{ij} \mid \ddot{z}_{it}) = Cov(\ddot{x}_{itj}, b_{ij}), \quad t = 1, \ldots, T. \tag{3.3} \]
Importantly, (3.3) allows the detrended covariates and the random coefficient to be correlated, and the
covariance may change over time; in fact, there is no restriction on the temporal pattern of
$Cov(\ddot{x}_{itj}, b_{ij})$. But the covariance conditional on the detrended IVs is assumed not to depend on
$\ddot{z}_{it}$. (In any case, the covariances $Cov(\ddot{x}_{itj}, b_{ij})$ do not depend on $i$ because of random
sampling in the cross-sectional dimension. As we are conditioning only on $\ddot{z}_{it}$, the restriction is that the
covariance conditional on $\ddot{z}_{it}$ does not depend on $\ddot{z}_{it}$; we have no need to place restrictions on
other conditional covariances.)
Assumption 3.3 extends to the panel data case a condition used by Wooldridge (2003) for the pure
cross-sectional case. An important difference is that Assumption 3.3 applies to the detrended covariates and
instruments. Importantly, we allow the unconditional covariances to change arbitrarily over time. Of course, if
$b_{ij} = \beta_j$ for all $i$, then (3.3) is trivially true because both sides are zero.
Assumptions 3.1–3.3 imply that the key orthogonality conditions (2.7) hold, and these conditions can be
used in a GMM framework. For simplicity, we focus here on the fixed effects two stage least squares
estimator, FE-2SLS (interpreted in the general sense of eliminating $a_i$ from (2.1)). To ensure consistency of
the FE-2SLS estimator we add a standard rank condition.
Assumption 3.4. (i) $\mathrm{rank}\left(\sum_{t=1}^{T} E(\ddot{z}_{it}' \ddot{x}_{it})\right) = K$.
(ii) $\mathrm{rank}\left(\sum_{t=1}^{T} E(\ddot{z}_{it}' \ddot{z}_{it})\right) = L$.
Practically speaking, the first part of Assumption 3.4 is most important; it means that, after netting out
individual-specific trends, there is still sufficient correlation between the instruments and regressors. Part (ii)
requires sufficient variation in the detrended instruments. It would be violated if, say, we specify $w_t = (1, t)$
and $z_{it}$ contains an element that is constant across $t$ for all $i$ (such as gender) or changes by the same value in
each time period (such as a person's age when the length of the sampling period is constant).
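The failure of part (ii) described above is easy to check numerically. The sketch below assumes $w_t = (1, t)$ and uses made-up series for a single unit; the variable names and the values of the `idiosyncratic` series are hypothetical.

```python
import numpy as np

T = 6
t = np.arange(1, T + 1)
w = np.column_stack([np.ones(T), t])                  # w_t = (1, t)
# Residual-maker matrix for the unit-specific regression on w_t.
m = np.eye(T) - w @ np.linalg.solve(w.T @ w, w.T)

gender = np.full(T, 1.0)                              # constant across t
age = 30.0 + (t - 1)                                  # rises by one unit each period
idiosyncratic = np.array([0.3, 1.1, 0.2, 2.5, 1.9, 3.0])  # hypothetical instrument

for name, series in [("gender", gender), ("age", age), ("idiosyncratic", idiosyncratic)]:
    resid = m @ series
    # A detrended instrument that is identically zero contributes nothing
    # to sum_t E(z'z), so the rank condition in part (ii) fails for it.
    print(name, "all zero after detrending:", bool(np.allclose(resid, 0.0)))
```

Only the series with movement around its own level and trend survives the detrending, which is why such variables are the useful instruments once $w_t$ includes a trend.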
Proposition 3.1. Under Assumptions 3.1–3.4, the FE-IV estimator is consistent for $\beta$, provided a
full set of time period dummies is included in (2.4).
Proof. Under Assumption 3.2, $E(d_{ij} \mid \ddot{z}_{it}) = 0$, $j = 1, \ldots, K$, for all $t$, and so
\[ E(\ddot{x}_{itj} d_{ij} \mid \ddot{z}_{it}) = Cov(\ddot{x}_{itj}, d_{ij} \mid \ddot{z}_{it}) = Cov(\ddot{x}_{itj}, b_{ij} \mid \ddot{z}_{it}). \]
But by Assumption 3.3, the conditional covariances equal the corresponding unconditional covariances, say
$\gamma_{tj}$, and so $E(\ddot{x}_{itj} d_{ij} \mid \ddot{z}_{it}) = \gamma_{tj}$, $j = 1, \ldots, K$, $t = 1, \ldots, T$. Since
$\ddot{x}_{it} d_i = \ddot{x}_{it1} d_{i1} + \ddot{x}_{it2} d_{i2} + \cdots + \ddot{x}_{itK} d_{iK}$, we have
shown that $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it}) = \gamma_{t1} + \cdots + \gamma_{tK} \equiv \theta_t$. Therefore, we can
write $\ddot{x}_{it} d_i = \theta_t + r_{it}$, where $E(r_{it} \mid \ddot{z}_{it}) = 0$, $t = 1, \ldots, T$. Now we plug this
expression for $\ddot{x}_{it} d_i$ into Eq. (2.4):
\[ \ddot{y}_{it} = \theta_t + \ddot{x}_{it}\beta + (r_{it} + \ddot{u}_{it}), \quad t = 1, \ldots, T. \tag{3.4} \]
As we have just shown, Assumptions 3.2 and 3.3 imply that $E(r_{it} \mid \ddot{z}_{it}) = 0$. Assumption 3.1 implies that
$E(\ddot{u}_{it} \mid \ddot{z}_{it}) = 0$. Thus, the composite error in (3.4) satisfies
$E(r_{it} + \ddot{u}_{it} \mid \ddot{z}_{it}) = 0$, $t = 1, \ldots, T$, and so any IV method
that uses instruments $\ddot{z}_{it}$ at time $t$ consistently estimates $\beta$. In particular, under the rank condition in
Assumption 3.4, and standard finite moment conditions, the FE-2SLS estimator is consistent and
$\sqrt{N}$-asymptotically normal. This completes the proof. □
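The role of the period intercepts in this argument can be made explicit with one worked step. The sketch below uses only the quantities $\theta_t$, $r_{it}$ and $\ddot{u}_{it}$ from the proof; the symbol $\mu_t$ for the mean of the detrended instruments is introduced here for illustration and does not appear in the original.

```latex
% If the period intercepts are omitted, theta_t stays in the error
% \ddot{v}_{it} = \theta_t + r_{it} + \ddot{u}_{it}, and by iterated expectations
\[
E(\ddot{z}_{it}' \ddot{v}_{it})
  = E(\ddot{z}_{it})' \theta_t
  + E\big[ \ddot{z}_{it}' \, E(r_{it} + \ddot{u}_{it} \mid \ddot{z}_{it}) \big]
  = \mu_t' \theta_t ,
\qquad \mu_t \equiv E(\ddot{z}_{it}) .
\]
```

Under this reading, pooled estimation without period intercepts relies on $\sum_{t=1}^{T} \mu_t' \theta_t = 0$, which can fail when the detrended instruments have period-specific means and the covariances $\theta_t$ change over time. Estimating a separate intercept for each period removes $\theta_t$ from the error, leaving $r_{it} + \ddot{u}_{it}$, which has zero mean conditional on $\ddot{z}_{it}$.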
Proposition 3.1 contains an important empirical lesson: unless there are very good reasons to the contrary,
one should include a full set of time effects in a fixed effects IV analysis. Even if the model does not originally
contain separate time period intercepts (itself a questionable premise), the estimating equation generally
should if one wants to allow correlated random slope coefficients.
Because the error term in (3.4), $r_{it} + \ddot{u}_{it}$, is generally heteroskedastic and serially correlated (at a
minimum due to the presence of $\ddot{x}_{it} d_i$), inference should be carried out using a fully robust variance
matrix for $\hat{\beta}$. Typically this is straightforward for pooled 2SLS where all instruments have been detrended
prior to estimation.
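One common way to obtain such a fully robust variance matrix is the cluster-robust (by cross-sectional unit) sandwich form for pooled 2SLS. The sketch below is an illustration under that assumption, not necessarily the exact estimator the authors used; the function name is hypothetical, no small-sample or degrees-of-freedom adjustment is applied, and any period dummies are assumed to be included among the columns of both `x` and `z`.

```python
import numpy as np

def fe2sls_cluster_robust(y, x, z):
    """Pooled 2SLS on detrended data with a unit-clustered sandwich variance.

    y : (N, T), x : (N, T, K), z : (N, T, L), all already detrended by unit.
    Returns (beta_hat, V_hat).
    """
    n_units, n_periods, k = x.shape
    n_inst = z.shape[2]
    X = x.reshape(n_units * n_periods, k)
    Z = z.reshape(n_units * n_periods, n_inst)
    Y = y.reshape(n_units * n_periods)
    # First stage fitted values, then the 2SLS coefficient vector.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)
    # Residuals use the actual regressors, not the first-stage fitted values.
    resid = (Y - X @ beta).reshape(n_units, n_periods)
    # Unit-level scores: sum over t of x_hat_it' * resid_it.
    scores = np.einsum("ntk,nt->nk", X_hat.reshape(n_units, n_periods, k), resid)
    bread = np.linalg.inv(X_hat.T @ X_hat)
    V = bread @ (scores.T @ scores) @ bread
    return beta, V

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    N, T = 300, 4
    z = rng.normal(size=(N, T, 1))
    u = rng.normal(size=(N, T))
    x = 0.8 * z + 0.5 * u[..., None] + rng.normal(size=(N, T, 1))
    y = 2.0 * x[..., 0] + u
    beta, V = fe2sls_cluster_robust(y, x, z)
    print(beta, np.sqrt(np.diag(V)))
```

Summing the scores within each unit before forming the outer product is what allows arbitrary heteroskedasticity and serial correlation in the composite error of (3.4).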
4. Examples
To see how Proposition 3.1 applies, suppose $x_{it}$ is linearly related to $z_{it}$ with heterogeneous linear trends for
each element of $x_{it}$:
\[ x_{it} = g_i \Gamma + t\, h_i \Psi + z_{it} \Pi + q_{it}, \quad t = 1, \ldots, T. \tag{4.1} \]
Initially, take $w_t = (1, t)$, so the regressors and instruments are linearly detrended before applying pooled
2SLS. Assume the instruments also have heterogeneous linear trends, which are removed by individual-specific
detrending. Then Assumption 3.2 simply requires that the idiosyncratic movements in $z_{it}$ are uncorrelated with
$b_i$, a weak requirement on instrumental variables. For Assumption 3.3, write
$\ddot{x}_{it} = \ddot{z}_{it} \Pi + \ddot{q}_{it}$, $t = 1, \ldots, T$, so that
$Cov(\ddot{x}_{it}, b_i \mid \ddot{z}_{it}) = Cov[(\ddot{z}_{it} \Pi + \ddot{q}_{it}), b_i \mid \ddot{z}_{it}] = Cov(\ddot{q}_{it}, b_i \mid \ddot{z}_{it})$,
$t = 1, \ldots, T$, under Assumption 3.2. Thus, provided
\[ Cov(\ddot{q}_{it}, b_i \mid \ddot{z}_{it}) = Cov(\ddot{q}_{it}, b_i), \quad t = 1, \ldots, T, \tag{4.2} \]
we can use $\ddot{z}_{it}$ as IVs for $\ddot{x}_{it}$ to obtain a consistent estimate of the PAE, $\beta$, in Eq. (3.4). One might
even assume that $(q_{i1}, \ldots, q_{iT}, b_i)$ is independent of $(z_{i1}, \ldots, z_{iT})$, which is sufficient for (4.2)
(as well as for Assumption 3.2).
It is possible that the FE-IV estimator is consistent even if we only demean the regressors and instruments,
provided the instruments satisfy a stronger exogeneity assumption. In other words, even though $x_{it}$ contains
individual-specific linear trends, we ignore that in our estimation procedure. To see why we can still get
consistency, demean $x_{it}$ to get
\[ x_{it} - \bar{x}_i = [t - (T + 1)/2]\, h_i \Psi + (z_{it} - \bar{z}_i) \Pi + (q_{it} - \bar{q}_i), \quad t = 1, \ldots, T. \tag{4.3} \]
Now, if $[(q_{it} - \bar{q}_i), b_i]$ is independent of $(z_{it} - \bar{z}_i)$ for each $t$, then (3.2) holds for
$\ddot{z}_{it} = (z_{it} - \bar{z}_i)$ and (4.2) also holds. Therefore,
\[ Cov(x_{it} - \bar{x}_i, b_i \mid z_{it} - \bar{z}_i) = [t - (T + 1)/2]\, \Psi' Cov(h_i, b_i) = Cov(x_{it} - \bar{x}_i, b_i) \]
for each $t$, which means that Assumption 3.3 holds: while the conditional covariances are not generally zero,
or even constant over time, they do not depend on $z_{it} - \bar{z}_i$. So, the FE-IV estimator will be consistent provided
we include a full set of year dummies in estimation.
What happens if we have a binary endogenous variable, $x_{it}$? Assumption 3.3 is unlikely to hold. To see why,
take the case $w_t \equiv 1$, $t = 1, \ldots, T$, which corresponds to the usual unobserved effects model with CRC. Then,
$\ddot{x}_{it} = x_{it} - \bar{x}_i$, $t = 1, \ldots, T$, and we need $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it})$ not to depend on
$\ddot{z}_{it}$. Now, by iterated expectations, with $z_i = (z_{i1}, \ldots, z_{iT})$,
\[ E(\ddot{x}_{it} d_i \mid \ddot{z}_{it}) = E[E(\ddot{x}_{it} d_i \mid d_i, z_i) \mid \ddot{z}_{it}] = E[d_i E(\ddot{x}_{it} \mid d_i, z_i) \mid \ddot{z}_{it}]. \tag{4.4} \]
Standard models for binary responses, with $z_{it}$ strictly exogenous conditional on $d_i$, would have
$P(x_{it} = 1 \mid d_i, z_i)$ depending on $d_i$ and $z_{it}$ in a nonlinear way. For concreteness, suppose
$P(x_{it} = 1 \mid d_i, z_i)$ follows a probit model,
\[ P(x_{it} = 1 \mid d_i, z_i) = P(x_{it} = 1 \mid d_i, z_{it}) = \Phi(\alpha_0 + \alpha_1 d_i + z_{it} \alpha_2). \tag{4.5} \]
Then
\[ E(\ddot{x}_{it} \mid d_i, z_i) = \Phi(\alpha_0 + \alpha_1 d_i + z_{it} \alpha_2) - T^{-1} \sum_{r=1}^{T} \Phi(\alpha_0 + \alpha_1 d_i + z_{ir} \alpha_2) \equiv g_t(d_i, z_i) \tag{4.6} \]
and so, by (4.4),
\[ E(\ddot{x}_{it} d_i \mid \ddot{z}_{it}) = E[d_i g_t(d_i, z_i) \mid \ddot{z}_{it}]. \tag{4.7} \]
Even if $d_i$ is independent of $\ddot{z}_{it}$ (a sensible strengthening of Assumption 3.2), (4.7) generally depends on
$\ddot{z}_{it}$. Thus, assuming $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it})$ does not depend on $\ddot{z}_{it}$ is rather strong for a
binary endogenous explanatory variable $x_{it}$. (Heckman (1997) contains a detailed discussion of the behavioral
implications of this assumption in different empirical studies.) In a cross-sectional context, Wooldridge (1997)
proposes a modified set of assumptions that are sufficient for consistent estimation of the ATE, $\beta$, with a binary
endogenous variable, but, applied to the current setup, $P(x_{it} = 1 \mid d_i, z_{it})$ would have to follow a linear
probability model.
In a cross-sectional setting, Card (2001) shows that the analogue of Assumption 3.3 can also be violated in
the case of continuous explanatory variables due to heteroskedasticity in the variance matrix of $(x_i, b_i)$ given $z_i$.
(With a pure cross-section, there are no time subscripts and, of course, no unit-specific demeaning or
detrending.) In an earnings equation where $x_i$ includes schooling, Card rejects
$Cov(x_i, b_i \mid z_i) = Cov(x_i, b_i)$ using IQ score as a proxy for unobserved ability (an element of $b_i$) and a binary
indicator for college proximity as an instrument for education. In our panel data setup, Assumption 3.3 allows
$Cov(x_{it}, b_i \mid z_{it})$ to depend on $z_{it}$, as it generally would if $x_{it}$ and $z_{it}$ contain persistent
heterogeneity correlated with $b_i$. Using a generalized fixed effects approach, we need only assume
$Cov(\ddot{x}_{it}, b_i \mid \ddot{z}_{it})$ does not depend on $\ddot{z}_{it}$, and this is much more plausible
when we think the unit-specific detrending successfully eliminates the time-constant heterogeneity in $x_{it}$ and $z_{it}$.
More recently, in a cross-sectional setting, Wooldridge (2005b) proposes conditions that allow $Cov(x_i, b_i \mid z_i)$
to depend on $z_i$, but these do not apply directly to the panel data case with time-constant heterogeneity that
can be correlated with the covariates and instruments.
5. Finite sample behavior of the FE-IV estimator
In this section we provide evidence on the finite sample properties of the FE-IV estimator of the PAE in a CRC
panel data model. Because one of the most commonly used applications of CRC panel data models is the
usual unobserved effects model with a random coefficient, we first assume $w_t \equiv 1$, $t = 1, \ldots, T$, in (2.1), as in
the second part of the first example from Section 4. Also, for scalar processes $x_{it}$ and $z_{it}$, we assume a linear
relationship between $x_{it}$ and $z_{it}$, with a linear trend for $x_{it}$. We use Monte Carlo simulations to draw the data
and check the properties of the estimator. The number of replications is 500, and the results of the experiment
are presented for cross-sectional sample sizes of 100, 400, and 800 for two time horizons, $T = 5$ and $T = 10$.
The population average values are $\beta = 2$ and $\alpha = 3$. For $t = 1, \ldots, T$, the endogenous explanatory variable is
generated as
\[ x_{it} = \lambda_{xz} z_{it} + \lambda_{xu} u_{it} + \lambda_{xa} a_i + \xi b_i + \xi t d_i + \sqrt{1 - \lambda_{xz}^2 - \lambda_{xu}^2 - \lambda_{xa}^2 - \xi^2 (1 + t)^2}\; e_{it}, \tag{5.1} \]
where $u_{it}, e_{it} \sim \mathrm{Normal}(0, 1)$, $a_i \sim \mathrm{Normal}(\alpha, 1)$, $b_i = \beta + d_i$,
$d_i \sim \mathrm{Normal}(0, \sigma_b^2)$, and $\lambda_{xz}$, $\lambda_{xu}$, $\lambda_{xa}$, and $\xi$ are
constants. Further, the instrument is generated as $z_{it} = \lambda_{za} a_i + \sqrt{1 - \lambda_{za}^2}\; m_{it}$, where $a_i$ is
defined above, $m_{it} \sim \mathrm{Normal}(t, 1)$, and $\lambda_{za}$ is the population correlation coefficient between
$z_{it}$ and $a_i$, $t = 1, \ldots, T$.
In our reported simulations we use $\sigma_b^2 = 1$. When $\lambda_{za} = 0$, the coefficients $\lambda_{xz}$, $\lambda_{xu}$,
and $\lambda_{xa}$ from (5.1) are the population correlation coefficients between $x_{it}$ and $z_{it}$, $x_{it}$ and $u_{it}$,
and $x_{it}$ and $a_i$, $t = 1, \ldots, T$, respectively. The population correlation between $x_{it}$ and $b_i$ when
$\lambda_{za} = 0$ is $\xi(1 + t)$, $t = 1, \ldots, T$. We use the coefficient on the error term in (5.1) to ensure that
$x_{it}$ has unit variance when $\lambda_{za} = 0$. When $\lambda_{za} \neq 0$,
$Var(x_{it}) = 1 + 2\lambda_{xz}\lambda_{za}\lambda_{xa}$, which is only slightly greater than one for our choices of the
$\lambda$ parameters. The relevant covariances are $Cov(x_{it}, u_{it}) = \lambda_{xu}$,
$Cov(x_{it}, a_i) = \lambda_{xz}\lambda_{za} + \lambda_{xa}$, and $Cov(x_{it}, z_{it}) = \lambda_{xz} + \lambda_{xa}\lambda_{za}$.
For the endogenous explanatory variable defined in (5.1), Assumption 3.3 is met:
$Cov(\ddot{x}_{it}, b_i \mid \ddot{z}_{it}) = Cov(\ddot{x}_{it}, b_i) = \xi(1 + t)$, $t = 1, \ldots, T$.
The dependent variable $y_{it}$ is generated as
\[ y_{it} = a_i + x_{it} b_i + u_{it}, \quad t = 1, \ldots, T, \tag{5.2} \]
where $a_i$, $b_i$, $u_{it}$, and $x_{it}$ are defined above. Among other estimators, we obtain the FE-IV estimator in (5.2)
acting as if $b_i = \beta$. Based on the first example from Section 4, we know this FE-IV estimator is consistent for
$x_{it}$ generated as in (5.1) provided we include a full set of time dummies, even though we only demean the
regressor and the instrument while ignoring the individual-specific linear trend in the regressor.
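The data generating process in (5.1) and (5.2) can be written down directly. The following is a sketch under the reading of (5.1) given in this section, in which the coefficient on $e_{it}$ makes $Var(x_{it}) = 1$ when $\lambda_{za} = 0$ and $\sigma_b = 1$; the function name, argument names, and seeding scheme are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def simulate_crc_panel(n_units, n_periods, xi, beta, alpha,
                       lam_xz, lam_xu, lam_xa, lam_za, sigma_b=1.0, seed=0):
    """Draw (y, x, z, b) following (5.1)-(5.2); y, x, z are (N, T), b is (N,)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_periods + 1)                    # t = 1, ..., T
    a = rng.normal(alpha, 1.0, size=(n_units, 1))      # a_i ~ Normal(alpha, 1)
    d = rng.normal(0.0, sigma_b, size=(n_units, 1))    # d_i ~ Normal(0, sigma_b^2)
    b = beta + d                                       # b_i = beta + d_i
    u = rng.normal(size=(n_units, n_periods))
    e = rng.normal(size=(n_units, n_periods))
    m = rng.normal(loc=t, scale=1.0, size=(n_units, n_periods))  # m_it ~ Normal(t, 1)
    z = lam_za * a + np.sqrt(1.0 - lam_za**2) * m
    # Squared coefficient on e_it in (5.1); it must stay nonnegative for every t.
    e_scale_sq = 1.0 - lam_xz**2 - lam_xu**2 - lam_xa**2 - xi**2 * (1 + t) ** 2
    if np.any(e_scale_sq < 0):
        raise ValueError("parameter values leave no room for the e_it term")
    x = (lam_xz * z + lam_xu * u + lam_xa * a
         + xi * b + xi * t * d + np.sqrt(e_scale_sq) * e)
    y = a + x * b + u                                  # structural equation (5.2)
    return y, x, z, b.ravel()

if __name__ == "__main__":
    # Illustrative call; these values are the ones the text reports for Panel A of Table 1.
    y, x, z, b = simulate_crc_panel(400, 5, xi=0.12, beta=2.0, alpha=3.0,
                                    lam_xz=0.20, lam_xu=0.40, lam_xa=0.20, lam_za=0.25)
    print(y.shape, x.shape, z.shape, b.shape)
```

The check on `e_scale_sq` matters in practice: because the term $\xi^2(1 + t)^2$ grows with $t$, a larger $T$ requires a smaller $\xi$, which matches the smaller value of $\xi$ the text reports for $T = 10$.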
Table 1 presents simulation results for the CRC model for $\lambda_{xu} = 0.40$, $\lambda_{xa} = 0.20$, $\lambda_{xz} = 0.20$,
and $\lambda_{za} = 0.25$. The implied correlation between $x_{it}$ and $z_{it}$ is about 0.245, which seems to be a reasonable
value for panel data. For comparison, we used a data set provided with Wooldridge (2002) on domestic route air fares for
1,149 routes in the United States for 1997–2000. (The data set is called AIRFARE.) The correlation between the
log of air fare (an endogenous explanatory variable in a passenger demand equation) and the instrumental
variable candidate, the concentration ratio on the route, is about −0.22, which has a magnitude in the range
of 0.245.
Panel A of Table 1 reports the simulation outcomes for $T = 5$, where $\xi = 0.12$, while Panel B covers the case
$T = 10$, where $\xi = 0.06$. When $\xi = 0.12$, the correlation between $x_{i1}$ and $b_i$ is slightly less than 0.24; when
$\xi = 0.06$, the correlation is just below 0.12. Columns 1–6 contain the mean, standard deviation (SD), root
mean squared error (RMSE), lower quartile (LQ), median, and upper quartile (UQ) of the PAE estimates
from 500 replications. Rows of the table report statistics for usual pooled ordinary least squares (POLS)
estimates on the original data, the usual fixed effects estimates (FE-OLS), which is just pooled OLS on the
time-demeaned data, pooled instrumental variables (IV) estimates using the original data, the FE-IV estimates
without period dummy variables (FE-IV without dummies), and FE-IV estimates when a full set of period
dummy variables is included (FE-IV with dummies).
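The six summary statistics reported in the columns of Table 1 can be computed from a vector of replication estimates as sketched below. The function name is hypothetical, and the use of the sample standard deviation (`ddof=1`) and NumPy's default percentile interpolation are assumptions; the text does not say which conventions were used.

```python
import numpy as np

def summarize_replications(estimates, true_value):
    """Mean, SD, RMSE, lower quartile, median, upper quartile of Monte Carlo estimates."""
    est = np.asarray(estimates, dtype=float)
    lq, med, uq = np.percentile(est, [25, 50, 75])
    return {
        "mean": float(est.mean()),
        "sd": float(est.std(ddof=1)),                         # sample SD across replications
        "rmse": float(np.sqrt(np.mean((est - true_value) ** 2))),
        "lq": float(lq),
        "median": float(med),
        "uq": float(uq),
    }

if __name__ == "__main__":
    print(summarize_replications([1.0, 2.0, 3.0, 4.0, 5.0], true_value=3.0))
```

Because the RMSE combines squared bias and variance, an estimator with a larger SD can still have the smaller RMSE when its bias is small.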
From the table we see that the POLS estimates are roughly 1.5 times larger than the true value of $\beta$ in the
100, 400, and 800 observation samples. One source of bias of the POLS estimates is the correlation between
the unobserved heterogeneity $a_i$ and the regressor $x_{it}$. A second source of bias in the POLS estimates is the
endogeneity of the regressor $x_{it}$, with correlation coefficient $\rho_{xu}$ very close to 0.4. A third source of bias (and
inconsistency) is the correlation between $x_{it}$ and $b_i$.
The within transformation eliminates $a_i$, and so the correlation between $x_{it}$ and $a_i$ is not a source of bias for
the usual FE-OLS estimator. But FE-OLS still produces a biased estimator of $\beta$ for the last two reasons
mentioned above. The bias in the FE-OLS estimator is much lower than for POLS, but the bias is still on the
order of 30%.
The pooled IV estimator (that is, without removing time averages and without time period dummies)
actually has a larger bias than the FE-OLS estimator, a finding that is not too surprising because the
instruments are correlated with $a_i$. Using the FE transformation combined with IV eliminates the dependence
between $z_{it}$ and $a_i$ because $z_{it} = \lambda_{za} a_i + \sqrt{1 - \lambda_{za}^2}\; m_{it}$. Therefore, the FE-IV estimator
(without time dummies) has a smaller bias and considerably smaller RMSE than the pooled IV estimator. More
importantly, the FE-IV estimator with period dummies has the lowest RMSE among all estimators for all the sample
sizes and both time horizons. Plus, the RMSE of the FE-IV estimator with time dummies falls quickly as the sample
size, $N$, grows. Without period dummies, the FE-IV estimates of $\beta$ are biased by at least 20%, and the bias
does not disappear as $N \to \infty$. As $T$ increases, the RMSE of the FE-IV estimates without dummies
decreases but it is still higher than the one for the FE-IV estimates when the period dummy variables are
included. Thus, even though the structural model (5.2) does not contain a time trend, inclusion of a full set of
period dummies ensures the consistency of the FE-IV estimation.
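The contrast between FE-IV with and without period dummies can be reproduced in miniature. The sketch below assumes the scalar design of (5.1) and (5.2) with the parameter values the text reports for Panel A; function names, the number of replications, and the seed are illustrative choices. In a balanced panel, including a full set of period dummies in the demeaned equation is equivalent to also subtracting each period's cross-sectional mean, which is what `fe_iv` does.

```python
import numpy as np

def draw_panel(rng, n, T, xi=0.12, beta=2.0, alpha=3.0,
               lam_xz=0.20, lam_xu=0.40, lam_xa=0.20, lam_za=0.25):
    """One draw from (5.1)-(5.2) with sigma_b = 1."""
    t = np.arange(1, T + 1)
    a = rng.normal(alpha, 1.0, size=(n, 1))
    d = rng.normal(size=(n, 1))
    u, e = rng.normal(size=(n, T)), rng.normal(size=(n, T))
    z = lam_za * a + np.sqrt(1 - lam_za**2) * rng.normal(loc=t, size=(n, T))
    c = np.sqrt(1 - lam_xz**2 - lam_xu**2 - lam_xa**2 - xi**2 * (1 + t) ** 2)
    x = lam_xz * z + lam_xu * u + lam_xa * a + xi * (beta + d) + xi * t * d + c * e
    y = a + x * (beta + d) + u
    return y, x, z

def fe_iv(y, x, z, time_dummies):
    """Just-identified FE-IV slope for scalar x and z in a balanced panel."""
    def within(v):
        v = v - v.mean(axis=1, keepdims=True)          # remove unit means
        if time_dummies:
            v = v - v.mean(axis=0, keepdims=True)      # partial out period dummies
        return v
    yd, xd, zd = within(y), within(x), within(z)
    return (zd * yd).sum() / (zd * xd).sum()

def run(n=400, T=5, reps=100, seed=0):
    """Average FE-IV estimates over replications, with and without period dummies."""
    rng = np.random.default_rng(seed)
    out = {"without": [], "with": []}
    for _ in range(reps):
        y, x, z = draw_panel(rng, n, T)
        out["without"].append(fe_iv(y, x, z, time_dummies=False))
        out["with"].append(fe_iv(y, x, z, time_dummies=True))
    return {k: float(np.mean(v)) for k, v in out.items()}

if __name__ == "__main__":
    print(run())
```

In this design the instrument has a common time trend through $m_{it}$ and the covariance between the demeaned regressor and $b_i$ changes with $t$, so leaving out the period dummies lets that time pattern load onto the slope estimate.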
Not surprisingly, the FE-OLS estimator has a smaller SD than the FE-IV estimator (both without time
dummies). Typically, methods that treat regressors as exogenous have substantially less sampling variation
than their IV counterparts because the correlation between the instrument and regressor is typically well below
one, as in the current simulation.
The difference between the FE-IV estimates with and without time dummies illustrates the trade-off
between bias and variance. The FE-IV estimates without time period dummy variables are always less variable
than the FE-IV with time dummies. This is hardly surprising, as including more explanatory variables (the
time dummies in this case) that are correlated with the instrument induces multicollinearity into the IV
estimates. The instrument, $z_{it}$, is constructed to be correlated with time dummies, and so the FE-IV estimator
with time dummies is less precise than that without. But, of course, the estimator without time dummies
suffers from substantial bias even though the structural model does not contain separate period intercepts. The
RMSE for the FE-IV estimator that includes a full set of dummies is much lower than the estimator that
does not.

Table 1

                             (1)     (2)     (3)     (4)     (5)     (6)
Estimator   Time dummies?    Mean    SD      RMSE    LQ      Median  UQ

A: Usual unobserved effects CRC model for β = 2 and T = 5
N = 100
POLS        No               3.363   0.189   1.377   3.238   3.356   3.486
FE-OLS      No               2.616   0.138   0.642   2.527   2.621   2.711
IV          No               2.752   0.225   0.781   2.612   2.761   2.901
FE-IV       No               2.423   0.214   0.484   2.288   2.429   2.558
FE-IV       Yes              1.945   0.407   0.407   1.711   1.980   2.208
N = 400
POLS        No               3.369   0.091   1.372   3.299   3.366   3.434
FE-OLS      No               2.623   0.067   0.635   2.575   2.626   2.667
IV          No               2.745   0.110   0.760   2.666   2.740   2.818
FE-IV       No               2.428   0.096   0.455   2.362   2.423   2.498
FE-IV       Yes              1.988   0.177   0.213   1.887   1.997   2.101
N = 800
POLS        No               3.373   0.063   1.375   3.330   3.366   3.412
FE-OLS      No               2.625   0.046   0.637   2.596   2.624   2.655
IV          No               2.753   0.076   0.764   2.700   2.750   2.801
FE-IV       No               2.436   0.068   0.458   2.389   2.437   2.480
FE-IV       Yes              2.004   0.131   0.182   1.919   2.009   2.091

B: Usual unobserved effects CRC model for β = 2 and T = 10
N = 100
POLS        No               3.204   0.157   1.223   3.097   3.195   3.314
FE-OLS      No               2.534   0.106   0.562   2.469   2.531   2.603
IV          No               2.397   0.123   0.440   2.324   2.395   2.475
FE-IV       No               2.277   0.115   0.331   2.208   2.276   2.351
FE-IV       Yes              2.013   0.283   0.313   1.841   2.020   2.210
N = 400
POLS        No               3.196   0.077   1.202   3.146   3.193   3.247
FE-OLS      No               2.528   0.056   0.545   2.490   2.527   2.565
IV          No               2.392   0.061   0.417   2.450   2.393   2.431
FE-IV       No               2.270   0.060   0.305   2.231   2.274   2.308
FE-IV       Yes              1.995   0.138   0.186   1.901   2.002   2.092
N = 800
POLS        No               3.194   0.054   1.200   3.155   3.194   3.224
FE-OLS      No               2.525   0.040   0.541   2.498   2.523   2.551
IV          No               2.388   0.042   0.410   2.357   2.387   2.416
FE-IV       No               2.268   0.041   0.299   2.241   2.267   2.294
FE-IV       Yes              1.992   0.100   0.160   1.926   1.993   2.062

We also conducted simulations with more variability in the random coefficient, namely, $\sigma_b^2 = 4$, so that
the SD of $b_i$ is double that in Table 1. The results of these simulations are not included here but are
available on request. With more variability in $b_i$, the bias induced by failing to include time dummies in the
FE-IV estimation is more pronounced (even though, remember, the structural model does not include time effects).
For example, with $T = 5$ and $N = 800$, the RMSE of the FE-IV estimator without dummies is about 1.36,
compared with about 0.22 for the estimator that does include the dummies.
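The six columns in the simulation tables are summary statistics over Monte Carlo replications. A minimal sketch of how such a summary might be computed is given here, assuming that RMSE is measured around the true $\beta$ and that LQ and UQ denote the lower and upper quartiles. The function name `summarize_replications` and the illustrative draws are hypothetical and are not the paper's replications.

```python
import numpy as np


def summarize_replications(estimates, beta_true):
    """Summary statistics of a vector of Monte Carlo estimates (one per replication)."""
    est = np.asarray(estimates, dtype=float)
    return {
        "mean": est.mean(),
        "sd": est.std(ddof=1),  # sample SD; the divisor is an assumption
        "rmse": np.sqrt(np.mean((est - beta_true) ** 2)),
        "lq": np.percentile(est, 25),
        "median": np.median(est),
        "uq": np.percentile(est, 75),
    }


# Illustrative draws only: an estimator centred on 2 with SD 0.1.
rng = np.random.default_rng(0)
draws = 2.0 + 0.1 * rng.standard_normal(1000)
stats = summarize_replications(draws, beta_true=2.0)
```

For an unbiased estimator the RMSE and the SD nearly coincide, while a biased estimator shows an RMSE well above its SD, which is the pattern to look for when comparing rows.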
For the next set of simulations, we take $w_t = (1, t)$, $t = 1, \ldots, T$, in (2.1), so that each cross-sectional
unit has its own linear trend. In particular, we generate $y_{it}$ as

$$y_{it} = a_{i0} + a_{i1} t + x_{it} b_i + u_{it}, \quad t = 1, \ldots, T, \tag{5.3}$$

where $a_{i0}$ and $a_{i1}$ are independent Normal($a$, 1) random variables and $b_i$ and $u_{it}$ are defined above.
The endogenous explanatory variable $x_{it}$ is generated as

$$x_{it} = \lambda_{xz} z_{it} + \lambda_{xu} u_{it} + \lambda_{xa} (a_{i0} + a_{i1}) + \xi b_i + \xi t d_i + \sqrt{1 - \lambda_{xz}^2 - \lambda_{xu}^2 - 2\lambda_{xa}^2 - \xi^2 (1 + t)^2}\; e_{it}, \tag{5.4}$$

and the instrument is generated as $z_{it} = \lambda_{za} a_{i0} + \sqrt{1 - \lambda_{za}^2}\; m_{it}$. Again, the coefficient
on $e_{it}$ is chosen so that $\mathrm{Var}(x_{it}) = 1$ if $\lambda_{za} = 0$. We use the same values for the $\lambda$
parameters as in Table 1, and we take $\sigma_b = 1$. (Simulation findings for the case $\sigma_b = 2$ are available on
request.) Because the structural model (5.3) contains a time trend, the default is to include a full set of
time period dummies in the various estimation methods. For comparison, we include the FE-IV estimator without
time period dummies.
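The data-generating process in (5.3) and (5.4) can be sketched as follows. This is only an illustration under stated assumptions: the $\lambda$ and $\xi$ values, the mean of the trend coefficients, and the choice of independent standard normal draws for $u_{it}$, $e_{it}$, and $m_{it}$ are all hypothetical, since the values actually used are defined in an earlier part of the paper. The sketch takes $d_i = b_i - \beta$ with $\sigma_b = 1$.

```python
import numpy as np


def simulate_random_trend(n, t_max, beta=2.0, lam_xz=0.3, lam_xu=0.3,
                          lam_xa=0.3, lam_za=0.0, xi=0.1, a_mean=0.0, seed=0):
    """Draw (y, x, z) as N x T arrays following (5.3)-(5.4); all defaults are hypothetical."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, t_max + 1)                  # periods 1, ..., T
    a0 = a_mean + rng.standard_normal((n, 1))    # unit-specific intercept
    a1 = a_mean + rng.standard_normal((n, 1))    # unit-specific trend slope
    d = rng.standard_normal((n, 1))              # d_i = b_i - beta, sigma_b = 1
    b = beta + d
    u = rng.standard_normal((n, t_max))          # assumed N(0, 1) idiosyncratic error
    m = rng.standard_normal((n, t_max))
    e = rng.standard_normal((n, t_max))
    z = lam_za * a0 + np.sqrt(1.0 - lam_za ** 2) * m
    # Squared coefficient on e_it, chosen so that Var(x_it) = 1 when lam_za = 0.
    scale2 = 1.0 - lam_xz ** 2 - lam_xu ** 2 - 2.0 * lam_xa ** 2 - xi ** 2 * (1 + t) ** 2
    if np.any(scale2 < 0):
        raise ValueError("parameters imply a negative variance for the e_it term")
    x = (lam_xz * z + lam_xu * u + lam_xa * (a0 + a1)
         + xi * b + xi * t * d + np.sqrt(scale2) * e)
    y = a0 + a1 * t + x * b + u
    return y, x, z


y, x, z = simulate_random_trend(n=100_000, t_max=5)
var_by_period = x.var(axis=0)   # each entry should be close to 1
```

The variance check works because $\xi b_i + \xi t d_i = \xi\beta + \xi(1 + t) d_i$ has variance $\xi^2 (1 + t)^2$ when $\sigma_b = 1$, which is exactly the term subtracted under the square root.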
The rows of Table 2 report statistics for POLS with time dummies, fixed effects with time dummies, pooled
instrumental variables with time dummies, FE-IV estimates with time dummies, and FE-IV estimates without
time dummies. As in Table 1, the simulation findings are unambiguous: fixed effects IV with a full set of time
dummies is superior, by far, to the other estimation methods, for all combinations of N and T. Perhaps not
surprisingly, when $y_{it}$ is itself trending, the consequences of omitting aggregate time effects are much more
detrimental than in the previous case.
The simulation findings are perhaps not too surprising: the only estimator that is essentially unbiased for the
PAE removes the unobserved effect (or, more generally, the individual-specific trends), includes a full set of
aggregate time effects, and instruments for the endogenous explanatory variable. Nevertheless, it is useful to
see that the theoretical findings in Section 3 have practically important implications: the FE-IV estimator with
time dummies is robust to correlation between the random coefficients and the explanatory variable, at least
for assumptions that can be met by continuous endogenous explanatory variables.
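The role of the time dummies can be illustrated with a small, self-contained sketch. It is not the paper's simulation design: the data-generating process below is a hypothetical one in which the instrument has a common time trend and $\mathrm{Cov}(x_{it}, b_i)$ changes with $t$, and the estimator is the just-identified FE-IV slope for a scalar regressor. In a balanced panel, including unit effects and a full set of time dummies is equivalent to two-way demeaning, which is what `two_way_demean` does.

```python
import numpy as np


def unit_demean(a):
    """Subtract unit-specific time averages (fixed effects transformation)."""
    return a - a.mean(axis=1, keepdims=True)


def two_way_demean(a):
    """Remove unit and period means; equals FE plus time dummies in a balanced panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


def fe_iv(y, x, z, time_dummies=True):
    """Just-identified FE-IV slope for scalar x and scalar z stored as N x T arrays."""
    transform = two_way_demean if time_dummies else unit_demean
    yd, xd, zd = transform(y), transform(x), transform(z)
    return np.sum(zd * yd) / np.sum(zd * xd)


def simulate(n, t_max, beta=2.0, xi=0.3, seed=0):
    """Hypothetical DGP: trending instrument, slope heterogeneity correlated with x."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, t_max + 1)
    a = rng.standard_normal((n, 1))           # unit effect
    d = rng.standard_normal((n, 1))           # d_i = b_i - beta
    u = rng.standard_normal((n, t_max))       # structural error
    z = 0.5 * t + rng.standard_normal((n, t_max))
    x = z + 0.5 * a + 0.5 * u + xi * t * d + rng.standard_normal((n, t_max))
    y = a + x * (beta + d) + u                # no time effects in the structural model
    return y, x, z


y, x, z = simulate(n=5000, t_max=5)
b_td = fe_iv(y, x, z, time_dummies=True)      # with a full set of time dummies
b_no_td = fe_iv(y, x, z, time_dummies=False)  # without time dummies
```

In this design the composite error contains $x_{it} d_i$, whose mean grows with $t$. Without time dummies that mean stays in the error and is correlated with the trending instrument; the dummies absorb it, so only the version with dummies is centred on $\beta = 2$.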
6. Estimation with misspecified random trends
The consistency result in Proposition 3.1 assumes that the random trend in model (2.1) is actually $w_t a_i$, so
that unit-specific detrending in Eq. (2.1) eliminates the unobserved heterogeneity, $a_i$. But what if we have the
individual-specific trends incorrect in the structural model?
It turns out that we can extend Proposition 3.1 to allow for misspecification in the random trends. We now
take the structural model to be

$$y_{it} = g_i(t) + x_{it} b_i + u_{it}, \quad t = 1, \ldots, T, \tag{6.1}$$

where $g_i(t)$ is the unknown trend function for unit i. In estimation, we act as if $g_i(t) = w_t a_i$, and we still
apply the FE-IV estimator that ignores variation in the slopes, $b_i$. In other words, the estimator is exactly the
same one studied in Section 3 but under possible misspecification of the unit-specific trend function.
When we detrend (6.1), we no longer necessarily eliminate the random trend component. Instead,

$$\ddot{y}_{it} = \ddot{g}_i(t) + \ddot{x}_{it} b_i + \ddot{u}_{it} = \ddot{g}_i(t) + \ddot{x}_{it} \beta + (\ddot{x}_{it} d_i + \ddot{u}_{it}), \quad t = 1, \ldots, T, \tag{6.2}$$

where $d_i \equiv b_i - \beta$ and the double-dot notation indicates residuals from regressing on $w_t$, $t = 1, \ldots, T$.
Because we apply instrumental variables estimation to (6.2), we adopt the same conditions as in Proposition 3.1,
but we now add an assumption to handle the extra, unknown trend term, $\ddot{g}_i(t)$.
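The unit-specific detrending behind the double-dot notation can be sketched as a per-unit least squares regression on $w_t$. The example below is an illustration with hypothetical names and values: a trend of the assumed form $w_t a_i$ is removed exactly, whereas a quadratic $g_i(t)$ leaves a nonzero $\ddot{g}_i(t)$ when $w_t = (1, t)$.

```python
import numpy as np


def detrend(a, W):
    """Residuals from regressing each row of a (N x T) on the columns of W (T x J)."""
    a = np.asarray(a, dtype=float)
    coef, *_ = np.linalg.lstsq(W, a.T, rcond=None)   # J x N coefficients, one column per unit
    return a - (W @ coef).T


T = 6
t = np.arange(1, T + 1)
W = np.column_stack([np.ones(T), t])                 # rows are w_t = (1, t)

rng = np.random.default_rng(0)
a_i = rng.standard_normal((3, 2))                    # unit-specific intercepts and slopes
linear_trend = a_i @ W.T                             # g_i(t) = w_t a_i: correctly specified
c_i = np.array([[1.0], [0.5], [-2.0]])
quadratic_trend = c_i * t ** 2                       # g_i(t) = c_i t^2: misspecified

resid_linear = detrend(linear_trend, W)              # numerically zero
resid_quadratic = detrend(quadratic_trend, W)        # nonzero remainder
```

The nonzero remainder in the second case is the term that Assumption 6.1 requires to be uncorrelated with the detrended instruments.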
Assumption 6.1. The unit-specific trends are uncorrelated with the detrended instruments, that is,

$$\mathrm{Cov}[\ddot{z}_{it}, g_i(t)] = 0, \quad t = 1, \ldots, T, \tag{6.3}$$

where $\{\ddot{z}_{it} : t = 1, \ldots, T\}$ are the residuals from the regression of $z_{it}$ on $w_t$, $t = 1, \ldots, T$.
Proposition 6.1. In addition to Assumptions 3.1–3.4, assume Assumption 6.1 holds. Then the FE-IV estimator
is consistent for $\beta$, again provided a full set of time period dummies is included.
Proof. The proof is a simple modification of the proof of Proposition 3.1. With a full set of time period
dummies, we can assume that $E[g_i(t)] = 0$ (because the time period dummies effectively demean $g_i(t)$ for each
t). The error term now contains $\ddot{g}_i(t)$, and so we need, in addition to the steps in Proposition 3.1,
$E[\sum_{t=1}^{T} \ddot{z}_{it}' \ddot{g}_i(t)] = 0$. But, because unit-specific detrending is a symmetric and idempotent
projection, $\sum_{t=1}^{T} \ddot{z}_{it}' \ddot{g}_i(t) = \sum_{t=1}^{T} \ddot{z}_{it}' g_i(t)$, and so (6.3), along with the earlier assumptions,
Table 2
Estimator  Time dummies?  (1) Mean  (2) SD  (3) RMSE  (4) LQ  (5) Median  (6) UQ
A: Random trend CRC model for $\beta = 2$ and $T = 5$
N = 100
POLS Yes 4.293 0.300 2.303 4.096 4.284 4.475
FE-OLS Yes 2.673 0.182 0.697 2.555 2.671 2.782
IV Yes 2.929 0.850 1.247 2.444 2.941 3.496
FE-IV Yes 2.000 0.626 0.642 1.635 2.057 2.383
FE-IV No 13.414 1.411 11.422 12.464 13.221 14.225
N = 400
POLS Yes 4.308 0.144 2.312 4.201 4.307 4.411
FE-OLS Yes 2.663 0.085 0.679 2.607 2.666 2.721
IV Yes 3.004 0.411 1.073 2.704 3.023 3.292
FE-IV Yes 2.013 0.269 0.301 1.835 2.019 2.204
FE-IV No 13.406 0.665 11.406 12.915 13.340 13.878
N = 800
POLS Yes 4.296 0.097 2.294 4.225 4.295 4.363
FE-OLS Yes 2.660 0.060 0.671 2.617 2.658 2.700
IV Yes 2.996 0.278 1.038 2.809 2.993 3.171
FE-IV Yes 1.996 0.187 0.223 1.874 2.005 2.130
FE-IV No 13.351 0.478 11.328 13.049 13.318 13.654
B: Random trend CRC model for $\beta = 2$ and $T = 10$
N = 100
POLS Yes 4.789 0.407 2.820 4.522 4.814 5.051
FE-OLS Yes 2.651 0.178 0.687 2.539 2.656 2.761
IV Yes 2.916 1.042 1.401 2.357 2.976 2.615
FE-IV Yes 1.968 0.619 0.641 1.603 2.001 2.384
FE-IV No 15.933 0.771 13.919 15.367 15.902 16.479
N = 400
POLS Yes 4.808 0.190 2.815 4.678 4.808 4.943
FE-OLS Yes 2.662 0.089 0.678 2.600 2.662 2.718
IV Yes 3.000 0.504 1.137 2.659 2.993 3.361
FE-IV Yes 1.981 0.311 0.338 1.767 1.978 2.203
FE-IV No 15.900 0.406 13.875 15.633 15.890 16.177
N = 800
POLS Yes 4.788 0.144 2.784 4.682 4.779 4.885
FE-OLS Yes 2.663 0.062 0.674 2.618 2.660 2.703
IV Yes 3.000 0.360 1.061 2.759 3.026 3.243
FE-IV Yes 1.997 0.201 0.234 1.855 1.998 2.132
FE-IV No 15.904 0.289 13.888 15.693 15.895 16.098
implies $E(\sum_{t=1}^{T} \ddot{z}_{it}' \ddot{v}_{it}) = 0$, where $\ddot{v}_{it} = \ddot{g}_i(t) + \ddot{x}_{it} d_i + \ddot{u}_{it}$. This orthogonality
condition implies consistency of FE-2SLS under the rank condition Assumption 3.4. □
Proposition 6.1 is a rather straightforward extension of Proposition 3.1, but its implications are practically
important. If our choice of $w_t$ effectively removes the unit-specific heterogeneity in the instruments, then the
fact that the individual-specific trends in the structural model might be misspecified does not affect our ability
to consistently estimate $\beta$. In most applications our interest is in $\beta$, and Proposition 6.1 implies some
additional robustness of the FE-IV estimator. Determining the trends in $z_{it}$, without confounding factors, is
likely to be easier than specifying the trend in (6.1), where we must deduce the random trends affecting $y_{it}$
that are not due to trends in $x_{it}$.
As a general example of where Assumption 6.1 is reasonable, assume that

$$z_{it} = w_t H_i + q_{it}, \quad t = 1, \ldots, T, \tag{6.4}$$

where $\{q_{it} : t = 1, \ldots, T\}$ is a general time series process and $H_i$ is a $J \times L$ matrix of unobserved
heterogeneity. Then $\ddot{z}_{it} = \ddot{q}_{it}$ so, provided the idiosyncratic movements $\{q_{it} : t = 1, \ldots, T\}$ are
uncorrelated with $g_i(t)$ (a reasonable assumption), Assumption 6.1 holds. Note that $H_i$ is allowed to be
arbitrarily correlated with the unknown trend functions $g_i(t)$. This example makes the point that if $w_t$
adequately captures the trends in the exogenous variables then we need not have $g_i(t)$ correctly modeled.
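The claim that detrending (6.4) leaves only the idiosyncratic part, $\ddot{z}_{it} = \ddot{q}_{it}$, can be checked numerically. The sketch below uses the residual-maker matrix $M = I_T - W(W'W)^{-1}W'$, where $W$ stacks the $w_t$; the dimensions and random draws are hypothetical.

```python
import numpy as np

N, T, J, L = 4, 6, 2, 3
W = np.column_stack([np.ones(T), np.arange(1, T + 1)])    # T x J, rows are w_t = (1, t)
M = np.eye(T) - W @ np.linalg.solve(W.T @ W, W.T)         # residual maker for W

rng = np.random.default_rng(1)
H = rng.standard_normal((N, J, L))                        # H_i: J x L heterogeneity per unit
q = rng.standard_normal((N, T, L))                        # q_it: idiosyncratic movements
z = np.einsum("tj,njl->ntl", W, H) + q                    # z_it = w_t H_i + q_it

z_dd = np.einsum("ts,nsl->ntl", M, z)                     # detrended instruments
q_dd = np.einsum("ts,nsl->ntl", M, q)                     # detrended idiosyncratic part
max_gap = np.max(np.abs(z_dd - q_dd))                     # zero up to rounding error
```

Because $M W = 0$, the heterogeneity $H_i$ drops out whatever its distribution, which is why $H_i$ may be arbitrarily correlated with $g_i(t)$.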
We can also apply Proposition 6.1 to a class of models with only one source of heterogeneity but where the
effect of the heterogeneity on $y_{it}$ changes over time in an unrestricted manner. Specifically, the model is

$$y_{it} = \eta_t c_i + x_{it} b_i + u_{it}, \quad t = 1, \ldots, T, \tag{6.5}$$

so that $g_i(t) = \eta_t c_i$ for unknown constants $\eta_t$ (sometimes called the factor loads). This model applies to
wage equations when the return to unobserved ability, $c_i$, varies over time, or to a production function when
unobserved managerial skill has time-varying effects on firm output. Ahn et al. (2001) (ALS) study (6.5) with
$b_i = \beta$ and with strictly exogenous regressors ($z_{it} = x_{it}$). With strictly exogenous covariates, we can draw
conclusions for estimating $\beta$ when $b_i$ varies across i and the trend function used in estimation differs from
$\eta_t c_i$. Condition (6.3) becomes

$$\mathrm{Cov}(\ddot{x}_{it}, c_i) = 0, \quad t = 1, 2, \ldots, T, \tag{6.6}$$

which is a fairly weak assumption concerning the relationship between the detrended regressors and the
time-constant heterogeneity in the equation of interest, (6.5). In other words, consistency of the (generalized) FE
estimator for $\beta = E(b_i)$ is ensured when the explanatory variables are strictly exogenous (with respect to
$\{u_{it} : t = 1, \ldots, T\}$), $E(b_i \mid \ddot{x}_{it}) = E(b_i)$ for all t, (6.6) holds, and we include a full set of time period
dummies. Importantly, these assumptions do not restrict the second moment matrix of $u_i$, either conditionally or
unconditionally. Proposition 6.1 allows us to conclude the FE-IV estimator is consistent in some applications
where some elements of $x_{it}$ are correlated with $u_{it}$.
In some cases, interest lies in estimating the $\eta_t$ in model (6.5) (where a normalization, such as $\eta_1 = 1$, is
needed for identification). For the case of strictly exogenous covariates, ALS study estimators of $\beta$ and the
$\eta_t$ when $b_i = \beta$. They propose conditional least squares, which is consistent only when $u_i$ has a scalar
variance matrix, as well as GMM procedures that allow the second moments of $u_i$ to be unspecified. Our analysis
here shows that an initial consistent estimator of $\beta$ is available under the extra assumption (6.6). (The usual
FE estimator is neither more nor less robust than ALS's conditional least squares estimator: Assumption (6.6) is
very different from second moment assumptions on $u_i$.) If one then imposes $E(u_i \mid x_i, c_i) = 0$ and
$\mathrm{Var}(u_i \mid x_i, c_i) = \sigma_u^2 I_T$, the residuals $\hat{v}_{it} = y_{it} - x_{it} \hat{\beta}$ can be used to estimate $\sigma_c^2$, $\sigma_u^2$, and
the $\eta_t$ by finding the variance matrix of $v_i$, where $v_{it} = \eta_t c_i + u_{it}$, as a function of
$(\sigma_c^2, \sigma_u^2, \eta_2, \ldots, \eta_T)$. Alternatively, we can drop the second moment assumption and use two-step GMM
estimators to simplify estimation of the $\eta_t$ after preliminary estimation of $\beta$. Because our goal here is not
to estimate the unit-specific trend function, we do not pursue these possibilities in detail.
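To make the second-moment route concrete: with $v_{it} = \eta_t c_i + u_{it}$, $\mathrm{Var}(c_i) = \sigma_c^2$, $c_i$ uncorrelated with $u_i$, and a scalar variance matrix for $u_i$, the variance matrix of $v_i$ is $\sigma_c^2 \eta\eta' + \sigma_u^2 I_T$ with $\eta = (1, \eta_2, \ldots, \eta_T)'$. The sketch below builds that matrix and recovers the parameters from its elements. It is an illustration of the identification logic only, not an estimator proposed in the paper; the function names and parameter values are hypothetical, and it assumes $T \geq 3$ and nonzero factor loads.

```python
import numpy as np


def implied_cov(sigma2_c, sigma2_u, eta):
    """Var(v_i) = sigma2_c * eta eta' + sigma2_u * I_T for v_it = eta_t c_i + u_it."""
    eta = np.asarray(eta, dtype=float)
    return sigma2_c * np.outer(eta, eta) + sigma2_u * np.eye(eta.size)


def params_from_cov(omega):
    """Back out (sigma2_c, sigma2_u, eta) from the matrix, using eta_1 = 1."""
    T = omega.shape[0]
    eta = np.ones(T)
    for t in range(1, T):
        s = next(k for k in range(1, T) if k != t)   # a third period, different from 1 and t
        eta[t] = omega[t, s] / omega[0, s]           # off-diagonal ratio equals eta_t
    sigma2_c = omega[0, 1] / eta[1]                  # Cov(v_i1, v_i2) = sigma2_c * eta_2
    sigma2_u = omega[0, 0] - sigma2_c                # Var(v_i1) = sigma2_c + sigma2_u
    return sigma2_c, sigma2_u, eta


eta_true = np.array([1.0, 1.5, 0.5, 2.0])
omega = implied_cov(sigma2_c=2.0, sigma2_u=1.0, eta=eta_true)
sigma2_c_hat, sigma2_u_hat, eta_hat = params_from_cov(omega)
```

In practice `omega` would be replaced by the sample variance matrix of the residuals $\hat{v}_{it}$, and a minimum distance or GMM step would use all of its elements rather than a few ratios.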
Sometimes, one wants to estimate the heterogeneity in the trend functions $g_i(t)$. While we have nothing
specific to say about this problem (it would require a large-T as well as large-N framework), Proposition
6.1 might be helpful in future work. In particular, again take the case where $b_i = \beta$ for all i. Then we can
first obtain a consistent estimator of $\beta$ using FE-IV (or just FE when the $x_{it}$ are strictly exogenous), and
again obtain the residuals, $y_{it} - x_{it} \hat{\beta}$. We can use the time series of residuals to estimate $g_i(t)$ for each
unit i. Any discussion of consistency requires $T \to \infty$. In conducting inference on the $g_i(t)$ we might be able
to ignore the estimation error in $\hat{\beta}$ if we allow N to grow fast enough relative to T.
7. Unbalanced panels and a test for selection bias
Unbalanced panel data sets are common, and it is useful to know when applying FE-IV to an unbalanced
panel nevertheless results in consistent estimation of $\beta = E(b_i)$. We extend the framework in Semykina and
Wooldridge (2005) to allow for general random trends as well as random slope coefficients in the context of
model (2.1). In particular, unlike in the previous section, we assume that we have the individual-specific trends
correctly specified. (Allowing for a misspecified trend means we would have to assume selection does not
depend on unobserved heterogeneity.) Unlike in Semykina and Wooldridge (2005), we do not consider testing
or correcting for selection that depends on idiosyncratic factors.
For each random draw i from the cross-section, let $s_i = (s_{i1}, \ldots, s_{iT})$ be the vector of selection indicators
such that $s_{it} = 1$ if we use time period t for unit i. Now, the individual-specific detrending can be done only
using the time periods with $s_{it} = 1$. Let $T_i = \sum_{t=1}^{T} s_{it}$ be the number of observed time periods for unit i.
Then, assuming that $T_i > J$, we obtain $\ddot{y}_{it}$ as

$$\ddot{y}_{it} = y_{it} - w_t \left( \sum_{r=1}^{T} s_{ir} w_r' w_r \right)^{-1} \left( \sum_{r=1}^{T} s_{ir} w_r' y_{ir} \right). \tag{7.1}$$
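Eq. (7.1) amounts to a unit-by-unit regression on $w_t$ that uses only the selected periods. A minimal sketch for a single unit follows; the function name `detrend_selected`, the use of NaN to mark unobserved periods, and the example values are assumptions made for illustration.

```python
import numpy as np


def detrend_selected(y_i, s_i, W):
    """Detrend one unit's series as in (7.1), using only periods with s_it = 1.

    y_i : length-T series (entries for unselected periods are ignored)
    s_i : length-T selection indicators (0/1)
    W   : T x J matrix whose rows are w_t
    Returns a length-T array with NaN in the unselected periods.
    """
    s = np.asarray(s_i, dtype=bool)
    Ws = W[s]                                        # selected rows of W
    ys = np.asarray(y_i, dtype=float)[s]             # selected observations
    coef = np.linalg.solve(Ws.T @ Ws, Ws.T @ ys)     # needs T_i > J and full column rank
    resid = np.full(s.size, np.nan)
    resid[s] = ys - Ws @ coef
    return resid


T = 6
t = np.arange(1, T + 1)
W = np.column_stack([np.ones(T), t])                 # w_t = (1, t), so J = 2
s_i = np.array([1, 1, 0, 1, 1, 0])                   # T_i = 4 observed periods
noise = np.array([0.3, -0.2, 0.0, 0.1, -0.4, 0.0])
y_i = np.where(s_i == 1, 1.0 + 2.0 * t + noise, np.nan)

y_dd = detrend_selected(y_i, s_i, W)
```

The same function would be applied to each element of $x_{it}$ and $z_{it}$, so that all detrended variables are based on the same selected periods.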
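Of course, we observe $\ddot{y}_{it}$ only when $s_{it} = 1$. With similar definitions for $\ddot{x}_{it}$ and $\ddot{z}_{it}$, we can write the
FE-2SLS estimator on the unbalanced panel as in Semykina and Wooldridge (2005):

$$
\begin{aligned}
\hat{\beta} ={}& \left[ \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{z}_{it} \right) \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{z}_{it}' \ddot{z}_{it} \right)^{-1} \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{z}_{it}' \ddot{x}_{it} \right) \right]^{-1} \\
& \times \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{z}_{it} \right) \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{z}_{it}' \ddot{z}_{it} \right)^{-1} \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{z}_{it}' \ddot{y}_{it} \right) \\
={}& \beta + \left[ \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{z}_{it} \right) \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{z}_{it}' \ddot{z}_{it} \right)^{-1} \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{z}_{it}' \ddot{x}_{it} \right) \right]^{-1} \\
& \times \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{x}_{it}' \ddot{z}_{it} \right) \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{z}_{it}' \ddot{z}_{it} \right)^{-1} \left( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} s_{it} \ddot{z}_{it}' \ddot{v}_{it} \right),
\end{aligned} \tag{7.2}
$$

where $\ddot{v}_{it} = \ddot{x}_{it} d_i + \ddot{u}_{it}$. From Semykina and Wooldridge (2005), it suffices that $E(s_{it} \ddot{z}_{it}' \ddot{v}_{it}) = 0$ for all t,
along with a rank condition on the selected sample. The first assumption we impose is strict exogeneity of
selection conditional on $(Z_i, a_i, b_i)$.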
Assumption 7.1. With the previous definitions,

$$E(u_{it} \mid z_i, s_i) = 0, \quad t = 1, \ldots, T. \tag{7.3}$$

Assumption 7.1 formalizes the notion that selection is not correlated with the idiosyncratic part of the error
term. Eq. (7.3) allows unrestricted correlation between $s_i$ and $(a_i, b_i)$, as well as with the instruments, $z_i$.
However, the next assumption rules out systematic correlation between the random slopes and selection:
Assumption 7.2. $b_i$ is mean independent of all the detrended $\ddot{z}_{it}$ and the selection indicators, that is,

$$E(b_i \mid \ddot{z}_{it}, s_{it}) = E(b_i) = \beta, \quad t = 1, \ldots, T. \tag{7.4}$$

Similarly, we add the selection indicator to Assumption 3.3 as well as to the rank condition,
Assumption 3.4:
Assumption 7.3. For $j = 1, \ldots, K$,

$$\mathrm{Cov}(\ddot{x}_{itj}, b_{ij} \mid \ddot{z}_{it}, s_{it}) = \mathrm{Cov}(\ddot{x}_{itj}, b_{ij}), \quad t = 1, \ldots, T. \tag{7.5}$$

Assumption 7.4. (i) $\mathrm{rank}\left( \sum_{t=1}^{T} E(s_{it} \ddot{z}_{it}' \ddot{x}_{it}) \right) = K$.
(ii) $\mathrm{rank}\left( \sum_{t=1}^{T} E(s_{it} \ddot{z}_{it}' \ddot{z}_{it}) \right) = L$.
Proposition 7.1. Under Assumptions 7.1–7.4, the FE-IV estimator applied to the selected sample is consistent
for $\beta$, provided a full set of time period dummies is included.
The proof of Proposition 7.1 is a straightforward modification of that of Proposition 3.1, once we add selection
indicators, as in Eq. (7.2). With individual-specific slopes, the conditions for consistent estimation by fixed
effects methods under sample selection are not as straightforward as in the case $b_i = \beta$. With common slopes
we can get away with Assumptions 7.1 and 7.4 while allowing selection to depend in an unrestricted way on $a_i$.
Consequently, it is useful to have a simple method of testing for selection bias in unbalanced panels when we
think the coefficients might vary by individual.
We focus on an alternative to Assumption 7.2. Because $\ddot{z}_{it}$ depends on $(z_{i1}, \ldots, z_{iT})$ and $(s_{i1}, \ldots, s_{iT})$,
sufficient for (7.4) is

$$E(b_i \mid z_{i1}, \ldots, z_{iT}, s_{i1}, \ldots, s_{iT}) = E(b_i) = \beta, \quad t = 1, \ldots, T, \tag{7.6}$$

and, as a practical matter, there may not be much difference between (7.6) and (7.4). Eq. (7.6) suggests an
alternative to the null of no selection:

$$E(b_i \mid z_{i1}, \ldots, z_{iT}, s_{i1}, \ldots, s_{iT}) = E(b_i \mid s_{i1}, \ldots, s_{iT}) = E(b_i \mid T_i), \tag{7.7}$$

where the first equality implies that we are looking specifically for correlation between $b_i$ and selection and
the second equality means that $b_i$ (which does not vary over time) is correlated only with the total number
of time periods available for unit i, an assumption very similar to the Mundlak-type assumption
$E(b_i \mid z_{i1}, \ldots, z_{iT}) = E(b_i \mid \bar{z}_i)$.
The alternative (7.7) is convenient for obtaining a specification test because $E(b_i \mid T_i)$ takes on at most T
different values. With $\dim(w_t) = J$, we can only use the cross-section observations with $T_i \geq J + 1$. Therefore,
let $d_{i,J+1} = 1[T_i = J + 1], \ldots, d_{i,T-1} = 1[T_i = T - 1]$ be dummy variables for the possible values taken on by
$T_i$, with $T_i = T$ taken as the base group. Then we augment the original equation by including interactions
between these dummy variables and the explanatory variables:

$$y_{it} = w_t a_i + x_{it} \beta + d_{i,J+1} x_{it} \delta_1 + d_{i,J+2} x_{it} \delta_2 + \cdots + d_{i,T-1} x_{it} \delta_{T-J-1} + v_{it}. \tag{7.8}$$

We remove $a_i$ by regressing $y_{it}$ and each element of $x_{it}$ on $w_t$ using the selected sample (for each i). The
detrended equation is

$$\ddot{y}_{it} = \ddot{x}_{it} \beta + d_{i,J+1} \ddot{x}_{it} \delta_1 + d_{i,J+2} \ddot{x}_{it} \delta_2 + \cdots + d_{i,T-1} \ddot{x}_{it} \delta_{T-J-1} + \ddot{v}_{it} \tag{7.9}$$

and we estimate this equation by FE-2SLS using instruments $(\ddot{z}_{it}, d_{i,J+1} \ddot{z}_{it}, d_{i,J+2} \ddot{z}_{it}, \ldots, d_{i,T-1} \ddot{z}_{it})$. The null
hypothesis of no selection bias is $H_0: \delta_1 = \delta_2 = \cdots = \delta_{T-J-1} = 0$. A fully robust Wald statistic is appropriate.
Naturally, one may be selective about which elements of $x_{it}$ are thought to have time-varying coefficients, and
only those (detrended) elements would be used in constructing the interactions in (7.9).
This simple test is attractive because it tests the most pertinent issue: Do the estimated slope coefficients
differ significantly over the subpanels that use different numbers of time periods? If they do, then not only is
there evidence that the slopes vary across individuals, but also that selection is correlated with those slopes
and therefore causes inconsistency in the FE-2SLS estimator on the unbalanced panel.
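The interaction terms in (7.8) and (7.9) can be built directly from the selection indicators. The sketch below constructs the dummies $d_{i,J+1}, \ldots, d_{i,T-1}$ and interacts them with one detrended variable; it covers only the construction of the regressors (or instruments), not the FE-2SLS step or the robust Wald statistic. The function name `selection_interactions` and the example panel are hypothetical.

```python
import numpy as np


def selection_interactions(v_dd, s, J):
    """Interact a detrended N x T variable with dummies for T_i = J+1, ..., T-1.

    Returns
    keep    : length-N boolean mask of usable units (T_i >= J + 1)
    dummies : N x (T - J - 1) array of 0/1 indicators, with T_i = T as base group
    inter   : N x T x (T - J - 1) array of d_{i,r} * v_dd_it, zero where s_it = 0
    """
    s = np.asarray(s, dtype=bool)
    T = s.shape[1]
    T_i = s.sum(axis=1)                              # observed periods per unit
    keep = T_i >= J + 1
    values = np.arange(J + 1, T)                     # J+1, ..., T-1
    dummies = T_i[:, None] == values[None, :]
    v = np.where(s, v_dd, 0.0)                       # drop unobserved periods
    inter = dummies[:, None, :] * v[:, :, None]
    return keep, dummies.astype(int), inter


# Example with T = 5 and J = 1: dummies for T_i = 2, 3, 4.
s = np.array([[1, 1, 1, 1, 1],     # T_i = 5, base group
              [1, 1, 1, 1, 0],     # T_i = 4
              [1, 0, 1, 0, 1],     # T_i = 3
              [0, 0, 1, 0, 0]])    # T_i = 1, too short to detrend
x_dd = np.arange(20, dtype=float).reshape(4, 5)      # stand-in for a detrended regressor
keep, dummies, inter = selection_interactions(x_dd, s, J=1)
```

Applying the same function to the detrended instruments gives the additional instruments listed after (7.9); rows with `keep` equal to `False` would be dropped before estimation.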
8. Conclusion
This paper suggests a set of conditions sufficient for applying the standard IV approach to the estimation of
PAEs in a CRC panel data model with continuous endogenous explanatory variables. Assumptions 3.1–3.4
ensure consistent FE-IV estimation of the population averaged slopes, $\beta$, even ignoring individual-specific
slopes. Monte Carlo simulations suggest the proposed FE-IV estimator of the PAEs performs better than
other estimators in finite samples for the case of continuous endogenous explanatory variables, provided a full
set of time period dummies is included in the model.
A natural direction for future work is to relax homoskedasticity of $E(\ddot{x}_{it} d_i \mid \ddot{z}_{it})$; Card (2001) showed how
the analogous assumption can fail in a cross-sectional environment. Recently, Murtazashvili (2006) shows how
this assumption can be relaxed using a control function approach by putting restrictions on the reduced forms
of the endogenous elements of $x_{it}$ (restrictions that can be met for continuous variables) and by modeling
the conditional covariances.
Acknowledgment
Three anonymous referees provided very helpful comments, as did participants at the Michigan State
University econometrics workshop.
References
Angrist, J.D., 1991. Instrumental variables estimation of average treatment effects in econometrics and epidemiology. National Bureau of Economic Research Technical Working Paper Number 115.
Ahn, S.C., Lee, Y.H., Schmidt, P., 2001. GMM estimation of linear panel data models with time-varying individual effects. Journal of Econometrics 101, 219–255.
Card, D., 2001. Estimating the return to schooling: progress on some persistent econometric problems. Econometrica 52, 1199–1218.
Heckman, J.J., 1997. Instrumental variables: a study of implicit behavioral assumptions used in making program evaluations. Journal of Human Resources 32, 441–462.
Heckman, J.J., Vytlacil, E., 2005. Structural equations, treatment effects, and econometric policy evaluation. Econometrica 73, 669–738.
Imbens, G., Angrist, J.D., 1994. Identification and estimation of local average treatment effects. Econometrica 62, 467–476.
Murtazashvili, I., 2006. A control function approach to estimation of correlated random coefficient panel data models. Mimeo, Michigan State University Department of Economics.
Semykina, A., Wooldridge, J.M., 2005. Estimating panel data models in the presence of endogeneity and selection: theory and application. Mimeo, Michigan State University Department of Economics.
Wooldridge, J.M., 1997. On two stage least squares estimation of the average treatment effect in a random coefficient model. Economics Letters 56, 129–133.
Wooldridge, J.M., 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA.
Wooldridge, J.M., 2003. Further results on instrumental variables estimation of the average treatment effect in the correlated random coefficient model. Econometric Theory 79, 185–191.
Wooldridge, J.M., 2005a. Fixed effects and related estimators in correlated random coefficient and treatment effect panel data models. Review of Economics and Statistics 87, 385–390.
Wooldridge, J.M., 2005b. Unobserved heterogeneity and estimation of average partial effects. In: Andrews, D.W.K., Stock, J.H. (Eds.), Identification and Inference for Econometric Models: A Festschrift in Honor of Thomas J. Rothenberg. Cambridge University Press, Cambridge, pp. 27–55.