Nonstationary Panels, Panel Cointegration, and Dynamic Panels (Advances in Econometrics), Badi H. Baltagi, JAI Press (NY), 2000

LIST OF CONTRIBUTORS
Badi H. Baltagi: Texas A&M University, Department of Economics, College Station, TX 77843-4228, USA. E-mail: badi@econ.tamu.edu
M. Douglas Berg: Department of Economics and International Business, Sam Houston State University, Huntsville, TX 77341, USA
Richard Blundell: Institute for Fiscal Studies and University College London, UK. E-mail: r.blundell@ucl.ac.uk
Stephen Bond: Institute for Fiscal Studies and Nuffield College, Oxford, UK. E-mail: steve.bond@nuffield.ox.ac.uk
Jörg Breitung: Humboldt University Berlin, Institute of Statistics and Econometrics, Spandauer Strasse 1, D-10178 Berlin, Germany. Fax: +49.30.2093.5712; E-mail: breitung@wiwi.hu-berlin.de
Min-Hsien Chiang: National Cheng-Kung University, Institute of International Business, Tainan, Taiwan. Fax: 886-6-2766459; E-mail: mchiang@mail.ncku.edu.tw
Alain Hecq: University Maastricht, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Fax: +31-43-388 48 74
Nazrul Islam: Department of Economics, Emory University, Atlanta, GA 30322-2240, USA. Fax: 404-727-4639; E-mail: nislam@emory.edu
Chihwa Kao: Syracuse University, Center for Policy Research, Syracuse, NY 13244-1020, USA. Fax: 315-443-1081; E-mail: cdkao@maxwell.syr.edu
Heikki Kauppi: University of Helsinki, Department of Economics, P.O. Box 54 (Unioninkatu 37), FIN-00014 University of Helsinki, Finland. Fax: +358-9-1917980; E-mail: heikki.kauppi@helsinki.fi
Qi Li: Department of Economics, Texas A&M University, College Station, TX 77843 and Department of Economics, University of Guelph, Guelph, Ontario, N1G 2W1 Canada. E-mail: qi@econ.tamu.edu
Chris Murray: Department of Economics, University of Houston, Houston, TX 77204-5882, USA. Fax: (713) 743-3798; E-mail: cjmurray@uh.edu
Franz C. Palm: University Maastricht, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Fax: +31-43-388 48 74
David H. Papell: Department of Economics, University of Houston, Houston, TX 77204-5882, USA. Fax: (713) 743-3798; E-mail: dpapell@uh.edu
Peter Pedroni: Indiana University, Department of Economics, Bloomington, IN 47405, USA. E-mail: ppedroni@indiana.edu
Aman Ullah: Department of Economics, University of California, Riverside, CA 92521, USA
Jean-Pierre Urbain: University Maastricht, Department of Quantitative Economics, P.O. Box 616, 6200 MD Maastricht, The Netherlands. Fax: +31-43-388 48 74; E-mail: j.urbain@ke.unimaas.nl
Frank Windmeijer: Institute for Fiscal Studies, 7 Ridgmount Street, London WC1E 7AE, UK. Fax: +44.(0)20.7323.4780; E-mail: f.windmeijer@ifs.org.uk
Showen Wu: Department of Finance and Managerial Economics, State University of New York at Buffalo, Buffalo, NY 14260, USA
Yong Yin: Department of Economics, State University of New York at Buffalo, Buffalo, NY 14260, USA. Fax: 716-645-2127; E-mail: yyin@buffalo.edu
INTRODUCTION
Badi H. Baltagi, Thomas B. Fomby and R. Carter Hill
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 1–5. Copyright © 2000 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0688-2
Twenty-two years ago, the first special issue on panel data econometrics was published by the Annales de l'INSEE. It consisted of two volumes, edited by Mazodier (1978), containing a who's who of economists and econometricians working on panel data. Since then, several books on panel data have been written, including the Econometric Society monograph by Hsiao (1986), a two-volume collection of classic papers on the subject by Maddala (1993), a Handbook edited by Matyas & Sevestre (1996), whose second edition contained 33 chapters, and a textbook by Baltagi (1995a). Several special issues of journals with a panel data theme have also appeared since 1978; these include Raj & Baltagi (1992), Matyas (1992), Carraro et al. (1993), Baltagi (1995b), Sevestre (1999) and Banerjee (1999). There have been nine international conferences on panel data since the first conference at INSEE; the most recent was held at the University of Geneva in June 2000. Panel data econometrics continues to have an important impact on today's empirical economic studies: a Journal of Economic Literature search using the words "panel data" returned 2780 citations between 1980 and 2000.
This volume is dedicated to two recent, intensively researched areas in the econometrics of panel data: nonstationary panels and dynamic panels; see the survey chapter by Baltagi & Kao in this volume. The volume includes eleven refereed chapters on this subject written by twenty authors. The editors are grateful to the authors and referees for their cooperation.
The chapter by Baltagi & Kao surveys the literature on nonstationary panels, cointegration in panels and dynamic panels. Panel unit root tests are considered first, and several important chapters are reviewed, including a summary of the finite sample properties of these unit root tests obtained from extensive simulations. Spurious regressions in panel data are considered next, followed by panel cointegration tests, with a summary of their finite sample properties from Monte Carlo experiments. Estimation and inference in panel cointegration models are then considered, and the chapter concludes with a review of developments in dynamic panel data models over the last five years.
The chapter by Blundell, Bond & Windmeijer reviews recent developments in the estimation of dynamic panel data models using the generalized method of moments (GMM). In particular, this chapter focuses on the system GMM estimator derived by Blundell & Bond (1998), which relies on relatively mild restrictions on the initial condition process. This system GMM estimator encompasses the GMM estimator based on the non-linear moment conditions available in the dynamic error components model. Monte Carlo experiments and asymptotic variance calculations show that this extended GMM estimator can offer considerable efficiency gains in situations where the first-differenced GMM estimator performs poorly.
The chapter by Pedroni develops methods for estimating and testing hypotheses about cointegrating vectors in dynamic panels. In particular, this chapter proposes methods based on fully modified OLS principles which account for considerable heterogeneity across individual members of the panel. The asymptotic properties of various estimators are compared based on pooling along the within and between dimensions of the panel. Monte Carlo simulations show that the group mean estimator is well behaved even in relatively small samples under a variety of scenarios.
The chapter by Hecq, Palm & Urbain extends the concept of serial correlation common features analysis to nonstationary panel data models.
This analysis is motivated both by the need to study and test for common structures and comovements in panel data with autocorrelation present and by the gain in efficiency from pooling. The authors propose sequential testing procedures and study their performance in a small-scale Monte Carlo experiment. Concentrating on the fixed effects model, they define homogeneous panel common feature models and give a series of steps to implement these tests. The tests are used to investigate the liquidity constraints model for 22 OECD and G7 countries. The presence of a panel common feature vector is rejected at the 5% nominal level.
The chapter by Breitung studies the local power of panel unit root test statistics against a sequence of local alternatives. In particular, this chapter finds that the Levin & Lin (1993) (LL) and Im, Pesaran & Shin (1997) (IPS) tests suffer from a severe loss of power if individual specific trends are included. Breitung suggests a test statistic that does not employ a bias adjustment and whose power, in Monte Carlo experiments, is substantially higher than that of the LL or IPS tests. This chapter also finds that the power of the LL and IPS tests is sensitive to the specification of the deterministic terms.
The chapter by Kao & Chiang studies the limiting distributions of the ordinary least squares (OLS), fully modified OLS (FMOLS) and dynamic OLS (DOLS) estimators in a panel cointegrated regression model. This chapter shows that the OLS, FMOLS and DOLS estimators are all asymptotically normally distributed. However, the asymptotic distribution of the OLS estimator has a non-zero mean. Extensive Monte Carlo experiments show that the OLS estimator has a non-negligible bias in finite samples, that the FMOLS estimator does not in general improve on the OLS estimator, and that the DOLS estimator outperforms both OLS and FMOLS.
The chapter by Murray & Papell proposes a panel unit root test in the presence of structural change. In particular, this chapter proposes a unit root test for non-trending data in a heterogeneous panel with a one-time change in the mean. The date of the break is endogenously determined. The resulting test allows for both serial and contemporaneous correlation, both of which are often found to be important in the panel unit root context. Murray & Papell conduct two power experiments for panels of non-trending, stationary series with a one-time change in means and find that conventional panel unit root tests generally have very low power. They then conduct the same experiments using methods that test for unit roots in the presence of structural change and find that the power of the test is much improved.
The chapter by Kauppi develops a new limit theory for panel data that may be cross-sectionally heterogeneous in a fairly general way.
This limit theory builds upon the concepts of joint convergence in probability and in distribution for double-indexed processes developed by Phillips & Moon (1999a). It is applied to a panel regression model with regressors generated by an autoregressive process with a root local to unity. The main results are the following: (i) the usual pooled panel OLS estimator is invalid for inference; (ii) a bias-corrected pooled OLS estimator proves to be NT consistent, with an asymptotic normal distribution centered on the true parameter value, irrespective of whether the regressors have near or exact unit roots. This positive result holds only in the special case where the model does not exhibit any deterministic effects, such as individual intercepts. (iii) The fully modified panel estimator of Phillips & Moon (1999a) is also subject to severe bias effects if the regressors are nearly rather than exactly cointegrated. These theoretical results are confirmed by Monte Carlo experiments.
The chapter by Yin & Wu proposes stationarity tests for a heterogeneous panel data model. The authors consider the case of serially correlated errors in the level and trend stationary models. The proposed panel tests utilize the Kwiatkowski, Phillips, Schmidt & Shin (1992) test and the Leybourne & McCabe (1994) test from the time series literature. Two different ways of pooling information from the independent tests are used: the group mean and the Fisher-type tests. Monte Carlo experiments reveal good small sample performance in terms of size and power.
The chapter by Berg, Li & Ullah considers the problem of estimating a semiparametric partially linear dynamic panel data model with disturbances that follow a one-way error component structure. Two new semiparametric instrumental variable (IV) estimators are proposed for the coefficient of the parametric component. These are shown to be more efficient than the ones suggested by Li & Stengos (1996) and Li & Ullah (1998) because they make full use of the error component structure. This is confirmed by Monte Carlo experiments.
The chapter by Islam conducts a Monte Carlo study to investigate the small sample properties of dynamic panel data estimators. Although there are extensive Monte Carlo studies on this subject, this study customizes the design to the estimation of the growth convergence equation using the Summers-Heston data. Islam concludes that OLS estimation of the growth convergence equation is likely to give misleading results. At the same time, indiscriminate use of panel estimators is risky, and one should make a judicious choice among them.
REFERENCES
Only references that are not cited later in the volume are given here.
Baltagi, B. H. (1995b). Editor's Introduction: Panel Data. Journal of Econometrics, 68, 1–4.
Banerjee, A. (1999). Panel Data Unit Roots and Cointegration: An Overview. Oxford Bulletin of Economics and Statistics, 61, 607–629.
Carraro, C., Peracchi, F., & Weber, G. (Eds.) (1993). The Econometrics of Panels and Pseudo Panels. Journal of Econometrics, 59, 1–211.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Maddala, G. S. (Ed.) (1993). The Econometrics of Panel Data. Vols. 1 and 2. Cheltenham: Edward Elgar.
Matyas, L. (Ed.) (1992). Modelling Panel Data. Structural Change and Economic Dynamics, 3, 291–384.
Matyas, L., & Sevestre, P. (Eds.) (1996). The Econometrics of Panel Data: Handbook of Theory and Applications. Dordrecht: Kluwer Academic Publishers.
Mazodier, P. (Ed.) (1978). The Econometrics of Panel Data. Annales de l'INSEE, 30/31.
Raj, B., & Baltagi, B. (1992). Editors' Introduction and Overview: Panel Data Analysis. Empirical Economics, 17, 1–8.
Sevestre, P. (1999). 1977–1997: Changes and Continuities in Panel Data. Annales d'Économie et de Statistique, 55–56, 15–25.
NONSTATIONARY PANELS, COINTEGRATION IN PANELS AND DYNAMIC PANELS: A SURVEY

Badi H. Baltagi and Chihwa Kao
ABSTRACT
This chapter provides an overview of topics in nonstationary panels: panel unit root tests, panel cointegration tests, and estimation of panel cointegration models. In addition it surveys recent developments in dynamic panel data models.
I. INTRODUCTION
Two important areas in the econometrics of panel data that have received much attention recently are dynamic panel data¹ and nonstationary panel time series models.² This special issue focuses on these two topics. With the growing use of cross-country data over time to study purchasing power parity, growth convergence and international R&D spillovers, the focus of panel data econometrics has shifted towards studying the asymptotics of macro panels with large N (number of countries) and large T (length of the time series) rather than the usual asymptotics of micro panels with large N and small T. In fact, the limiting distributions of double-indexed integrated processes had to be developed; see Phillips & Moon (1999a). The fact that T is allowed to increase to infinity in macro panel data generated two strands of ideas. The first rejected the homogeneity of the regression parameters implicit in the use of a pooled regression model in favor of heterogeneous regressions, i.e. one for each country; see Pesaran & Smith (1995), Im, Pesaran & Shin (1997), Lee, Pesaran & Smith (1997), Pesaran, Shin & Smith (1999) and Pesaran & Zhao (1999), to mention a few. This literature critically relies on T being large to estimate each country's regression separately. The second strand applied time series procedures to panels, worrying about nonstationarity, spurious regressions and cointegration. Adding the cross-section dimension to the time-series dimension offers an advantage in testing for nonstationarity and cointegration. The hope of the econometrics of nonstationary panel data is to combine the best of both worlds: the methods for dealing with nonstationary data from the time series literature and the increased data and power from the cross-section. Under certain assumptions, the added cross-section dimension can act as repeated draws from the same distribution. Thus, as the time and cross-section dimensions increase, panel test statistics and estimators can be derived which converge in distribution to normally distributed random variables. However, the use of such panel data methods is not without its critics; see Maddala, Wu & Liu (2000), who argue that panel data unit root tests do not rescue purchasing power parity (PPP). In fact, the results on PPP with panels are mixed, depending on the group of countries studied, the period of study and the type of unit root test used. More damaging is the argument by Maddala et al. that, for PPP, panel data tests are the wrong answer to the low power of unit root tests in single time series. After all, the null hypothesis of a single unit root is different from the null hypothesis of a panel unit root for the PPP hypothesis. Using the same line of criticism, Maddala (1999) argued that panel unit root tests did not help settle the question of growth convergence among countries.
However, this work was useful in spurring much-needed research into dynamic panel data models. Also, Quah (1996) argued that the basic issue of whether poor countries catch up with the rich can never be answered by the use of traditional panels. Instead, Quah suggested formulating and estimating models of income dynamics. One can find numerous applications of time series methods applied to panels in recent years. Examples from the purchasing power parity literature include Bernard & Jones (1996), Jorion & Sweeney (1996), MacDonald (1996), Oh (1996), Wu (1996), Coakley & Fuertes (1997), Culver & Papell (1997), Papell (1997), O'Connell (1998), Choi (1999a), Andersson & Lyhagen (1999), and Canzoneri, Cumby & Diba (1999), to mention a few. On health care expenditures, see McCoskey & Selden (1998) and Gerdtham & Löthgren (1998). On growth and convergence, see Islam (1995), Evans & Karras (1996), Sala-i-Martin (1996), Lee, Pesaran & Smith (1997), and McCoskey & Kao (1999a). On international R&D spillovers, see Funk (1998) and Kao, Chiang & Chen (1999). On exchange rate models, see Groen & Kleibergen (1999) and Groen (1999). On savings and investment models, see Coakley, Kulasi & Smith (1996) and Moon & Phillips (1998).
The first part of this chapter surveys some of the developments in nonstationary panel models that have occurred since the mid-1990s. Two other recent surveys on this subject are Phillips & Moon (1999b) on multi-indexed processes and Banerjee (1999) on panel unit root and cointegration tests. We focus on the following three topics: (1) panel unit root tests, (2) panel cointegration tests, and (3) estimation and inference in panel cointegration models. The discussion of each topic is illustrated by examples taken from the aforementioned list of references. Section 2 reviews panel unit root tests, while Section 3 discusses panel spurious regression models. Section 4 considers panel cointegration tests, while Section 5 discusses panel cointegration models. Section 6 reviews some recent developments in dynamic panels and Section 7 gives our conclusion.
A word on notation. We write the integral $\int_0^1 W(s)\,ds$ as $\int W$ when there is no ambiguity over limits. We define $\Omega^{1/2}$ to be any matrix such that $\Omega = (\Omega^{1/2})(\Omega^{1/2})'$, use $\Rightarrow$ to denote weak convergence, $\xrightarrow{p}$ to denote convergence in probability, I(0) and I(1) to signify a time series that is integrated of order zero and one, respectively, and $W_Z(r) = W(r) - \left[\int WZ'\right]\left[\int ZZ'\right]^{-1} Z(r)$ to denote an $L_2$ projection residual of $W(r)$ on $Z(r)$.
II. PANEL UNIT ROOT TESTS

Testing for unit roots in time series studies is now common practice among applied researchers and has become an integral part of econometrics courses. However, testing for unit roots in panels is more recent; see Levin & Lin (1992), Im, Pesaran & Shin (1997), Harris & Tzavalis (1999), Maddala & Wu (1999), Choi (1999a), and Hadri (1999). Earlier exceptions are Bhargava et al. (1982), Boumahdi & Thomas (1991), Breitung & Meyer (1994), and Quah (1994). Bhargava et al. proposed a test for random walk residuals in a dynamic model with fixed effects. They suggested a modified Durbin-Watson (DW) statistic based on fixed effects residuals and two other test statistics based on differenced OLS residuals. In typical micro panels with $N \to \infty$, they recommended their modified DW statistic. Boumahdi & Thomas (1991) proposed a generalization of the Dickey-Fuller (DF) test for unit roots in panel data to assess the efficiency of the French capital market using 140 French stock prices over the period January 1973 to February 1986. Breitung & Meyer (1994) applied various modified DF test statistics to test for unit roots in a panel of contracted wages negotiated at the firm and industry level for Western Germany over the period 1972–1987. Quah (1994) suggested a test for a unit root in a panel data model without fixed effects where both N and T go to infinity at the same rate, such that N/T is constant. Levin & Lin (1992) generalized this model to allow for fixed effects, individual deterministic trends and heterogeneous serially correlated errors. They assumed that both N and T tend to infinity, but that T increases at a faster rate than N, with $N/T \to 0$. Even though this literature grew from time series and panel data, the way in which N, the number of cross-section units, and T, the length of the time series, tend to infinity is crucial for determining the asymptotic properties of estimators and tests proposed for nonstationary panels; see Phillips & Moon (1999a). Several approaches are possible, including: (i) sequential limits, where one index, say N, is fixed and T is allowed to increase to infinity, giving an intermediate limit. Then, by letting N tend to infinity subsequently, a sequential limit theory is obtained. Phillips & Moon (1999b) argued that these sequential limits are easy to derive and are helpful in extracting quick asymptotics. However, Phillips and Moon provided a simple example that illustrates how sequential limits can sometimes give misleading asymptotic results. (ii) A second approach, used by Quah (1994) and Levin & Lin (1992), is to allow the two indexes, N and T, to pass to infinity along a specific diagonal path in the two-dimensional array. This path can be determined by a monotonically increasing functional relation of the type $T = T(N)$, which applies as the index $N \to \infty$.
Phillips & Moon (1999b) showed that the limit theory obtained by this approach is dependent on the specific functional relation $T = T(N)$, and the assumed expansion path may not provide an appropriate approximation for a given $(T, N)$ situation. (iii) A third approach is a joint limit theory allowing both N and T to pass to infinity simultaneously without placing specific diagonal path restrictions on the divergence. Some control over the relative rate of expansion may have to be exercised in order to get definitive results. Phillips & Moon argued that, in general, joint limit theory is more robust than either sequential limit or diagonal path limit theory. However, it is usually more difficult to derive and requires stronger conditions, such as the existence of higher moments, that allow for uniformity in the convergence arguments. The multi-index asymptotic theory in Phillips & Moon (1999a, b) is applied to joint limits in which $T \to \infty$, $N \to \infty$ and $(T/N) \to \infty$, i.e. to situations where the time series sample is large relative to the cross-section sample. However, the general approach given there is also applicable to situations in which $(T/N) \to 0$, although different limit results will generally obtain in that case.
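A. Levin & Lin (1992) Tests
Consider the model

$$y_{it} = \rho_i y_{i,t-1} + z_{it}'\gamma_i + u_{it}, \qquad i = 1, \dots, N;\ t = 1, \dots, T, \qquad (1)$$

where $z_{it}$ is the deterministic component and $u_{it}$ is a stationary process. $z_{it}$ could be zero, one, the fixed effects, $\mu_i$, or fixed effect as well as a time trend, $t$. The Levin & Lin (LL) tests assume that the $u_{it}$ are iid$(0, \sigma_u^2)$ and $\rho_i = \rho$ for all i. LL are interested in testing the null hypothesis $H_0: \rho = 1$ against the alternative hypothesis $H_a: \rho < 1$. Let $\hat\rho$ be the OLS estimator of $\rho$ in (1) and define $z_t = (z_{1t}, \dots, z_{Nt})'$,

$$h(t, s) = z_t' \left( \sum_{\tau=1}^{T} z_\tau z_\tau' \right)^{-1} z_s, \qquad (2)$$

$$\tilde u_{it} = u_{it} - \sum_{s=1}^{T} h(t, s) u_{is} \quad \text{and} \quad \tilde y_{it} = y_{it} - \sum_{s=1}^{T} h(t, s) y_{is}. \qquad (3)$$

Then we have

$$\sqrt{N}\,T(\hat\rho - 1) = \frac{\dfrac{1}{\sqrt N} \sum_{i=1}^{N} \dfrac{1}{T} \sum_{t=1}^{T} \tilde y_{i,t-1} \tilde u_{it}}{\dfrac{1}{N} \sum_{i=1}^{N} \dfrac{1}{T^2} \sum_{t=1}^{T} \tilde y_{i,t-1}^2}$$

and the corresponding t-statistic, under the null hypothesis, is given by

$$t_\rho = \frac{(\hat\rho - 1) \sqrt{\sum_{i=1}^{N} \sum_{t=1}^{T} \tilde y_{i,t-1}^2}}{s_e},$$

where

$$s_e^2 = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde u_{it}^2.$$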
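For the fixed effects case ($z_{it} = \mu_i$), the projection in (2)-(3) amounts to subtracting each unit's time mean, so $\hat\rho$, $t_\rho$ and $s_e$ come from a pooled within regression. The Python sketch below illustrates this under stated assumptions: a balanced panel stored as plain lists, iid errors (no serial correlation correction), the lagged and current series each demeaned over $t = 1, \dots, T$, and $s_e^2$ computed from the fitted residuals rather than from the unobservable $\tilde u_{it}$. The names `simulate_ar1_panel` and `ll_fixed_effects_stats` are hypothetical; this is an illustration, not Levin & Lin's own code.

```python
import math
import random


def simulate_ar1_panel(n, t_len, rho, seed=0):
    """Hypothetical helper: N independent series y_it = rho * y_{i,t-1} + e_it,
    e_it ~ N(0, 1), started at y_i0 = 0 and returned as [y_i0, ..., y_iT]."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        series = [0.0]
        for _ in range(t_len):
            series.append(rho * series[-1] + rng.gauss(0.0, 1.0))
        panel.append(series)
    return panel


def ll_fixed_effects_stats(panel):
    """Pooled LL regression for z_it = mu_i (individual intercepts only).

    Each unit's lagged and current values are demeaned over t = 1..T, then a
    single pooled OLS slope is fitted. Returns (rho_hat, t_rho, sqrt(N) T (rho_hat - 1)).
    """
    n = len(panel)
    t_len = len(panel[0]) - 1
    sxx = sxy = 0.0
    demeaned = []
    for series in panel:
        lagged, current = series[:-1], series[1:]      # y_{i,t-1} and y_it, t = 1..T
        mx, my = sum(lagged) / t_len, sum(current) / t_len
        xs = [v - mx for v in lagged]
        ys = [v - my for v in current]
        demeaned.append((xs, ys))
        sxx += sum(x * x for x in xs)
        sxy += sum(x * y for x, y in zip(xs, ys))
    rho_hat = sxy / sxx
    # s_e^2: mean squared pooled residual over all N * T observations
    sse = sum((y - rho_hat * x) ** 2 for xs, ys in demeaned for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n * t_len))
    t_rho = (rho_hat - 1.0) * math.sqrt(sxx) / s_e
    return rho_hat, t_rho, math.sqrt(n) * t_len * (rho_hat - 1.0)
```

For a simulated random walk panel, `ll_fixed_effects_stats(simulate_ar1_panel(20, 200, 1.0))` typically gives $\hat\rho$ a little below one, by roughly $3/T$; this downward shift of the within estimator is what the centring terms for the $\mu_i$ case in the LL limiting distributions correct for.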
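Assume that there exist a scaling matrix $D_T$ and a piecewise continuous function $Z(r)$ such that

$$D_T^{-1} z_{[Tr]} \to Z(r)$$

uniformly for $r \in [0, 1]$. For a fixed N, we have

$$\frac{1}{\sqrt N} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} \tilde y_{i,t-1} \tilde u_{it} \Rightarrow \frac{1}{\sqrt N} \sum_{i=1}^{N} \int W_{iZ}\, dW_{iZ}$$

and

$$\frac{1}{N} \sum_{i=1}^{N} \frac{1}{T^2} \sum_{t=1}^{T} \tilde y_{i,t-1}^2 \Rightarrow \frac{1}{N} \sum_{i=1}^{N} \int W_{iZ}^2,$$

as $T \to \infty$. Next, we assume that $\int W_{iZ}\, dW_{iZ}$ and $\int W_{iZ}^2$ are independent across i and have finite second moments. Then it follows that

$$\frac{1}{\sqrt N} \sum_{i=1}^{N} \left( \int W_{iZ}\, dW_{iZ} - E\left[\int W_{iZ}\, dW_{iZ}\right] \right) \Rightarrow N\left(0, \mathrm{Var}\left[\int W_{iZ}\, dW_{iZ}\right]\right)$$

and

$$\frac{1}{N} \sum_{i=1}^{N} \int W_{iZ}^2 \xrightarrow{p} E\left[\int W_{iZ}^2\right]$$

as $N \to \infty$, by the Lindeberg-Lévy central limit theorem and a law of large numbers, respectively. The following moments are taken from Levin & Lin (1992):

$$
\begin{array}{lcccc}
z_{it} & E\left[\int W_{iZ}\, dW_{iZ}\right] & \mathrm{Var}\left[\int W_{iZ}\, dW_{iZ}\right] & E\left[\int W_{iZ}^2\right] & \mathrm{Var}\left[\int W_{iZ}^2\right] \\
0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{3} \\
1 & 0 & \tfrac{1}{3} & \tfrac{1}{2} & ? \\
\mu_i & -\tfrac{1}{2} & \tfrac{1}{12} & \tfrac{1}{6} & \tfrac{1}{45} \\
(\mu_i, t) & -\tfrac{1}{2} & \tfrac{1}{60} & \tfrac{1}{15} & \tfrac{11}{6300}
\end{array}
\qquad (4)
$$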
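The entries of (4) can be checked approximately by simulation. The sketch below is an illustration, not Levin & Lin's derivation: it replaces W by a Gaussian random walk with `steps` increments of variance 1/`steps`, projects the lagged path on the deterministic terms to form $W_Z$, and averages the two functionals over `reps` replications. The function names, the case labels `"none"`, `"mu_i"` and `"trend"`, and the default sizes are assumptions; the estimates carry Monte Carlo and discretization error, so they only approximate the tabulated fractions.

```python
import math
import random


def project_out(values, case):
    """Least-squares residuals of values[0..m-1] on the deterministic terms:
    nothing ("none"), a constant ("mu_i"), or a constant and a linear trend ("trend")."""
    m = len(values)
    if case == "none":
        return list(values)
    mean_v = sum(values) / m
    resid = [v - mean_v for v in values]
    if case == "trend":
        centred = [k - (m - 1) / 2.0 for k in range(m)]
        slope = sum(c * r for c, r in zip(centred, resid)) / sum(c * c for c in centred)
        resid = [r - slope * c for r, c in zip(resid, centred)]
    return resid


def brownian_moments(case, steps=200, reps=2000, seed=1):
    """Monte Carlo mean and variance of int W_Z dW_Z and int W_Z^2 on [0, 1]."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 / steps)
    a_draws, b_draws = [], []
    for _ in range(reps):
        increments = [rng.gauss(0.0, sd) for _ in range(steps)]
        lagged, level = [], 0.0
        for d in increments:            # lagged[k] approximates W(k / steps)
            lagged.append(level)
            level += d
        w_z = project_out(lagged, case)
        # When deterministic terms are present W_Z sums to zero, so the projected
        # part of the increments drops out and the raw increments can be used.
        a_draws.append(sum(w * d for w, d in zip(w_z, increments)))
        b_draws.append(sum(w * w for w in w_z) / steps)

    def mean_var(xs):
        mu = sum(xs) / len(xs)
        return mu, sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)

    e_a, v_a = mean_var(a_draws)
    e_b, v_b = mean_var(b_draws)
    return {"E_WdW": e_a, "Var_WdW": v_a, "E_W2": e_b, "Var_W2": v_b}
```

With the default settings, `brownian_moments("mu_i")` should land near $-1/2$, $1/12$, $1/6$ and $1/45$; larger `steps` and `reps` tighten the approximation at the cost of run time.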
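Using (4), Levin & Lin (1992) obtain the following limiting distributions of $\sqrt{N}\,T(\hat\rho - 1)$ and $t_\rho$:

$$
\begin{array}{lll}
z_{it} & \text{coefficient statistic} & t\text{-statistic} \\
0 & \sqrt{N}\,T(\hat\rho - 1) \Rightarrow N(0, 2) & t_\rho \Rightarrow N(0, 1) \\
1 & \sqrt{N}\,T(\hat\rho - 1) \Rightarrow N(0, 2) & t_\rho \Rightarrow N(0, 1) \\
\mu_i & \sqrt{N}\,T(\hat\rho - 1) + 3\sqrt{N} \Rightarrow N\left(0, \tfrac{51}{5}\right) & \sqrt{1.25}\, t_\rho + \sqrt{1.875 N} \Rightarrow N(0, 1) \\
(\mu_i, t) & \sqrt{N}\left(T(\hat\rho - 1) + 7.5\right) \Rightarrow N\left(0, \tfrac{2895}{112}\right) & \sqrt{\tfrac{448}{277}}\left(t_\rho + \sqrt{3.75 N}\right) \Rightarrow N(0, 1)
\end{array}
\qquad (5)
$$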
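As a consistency check, the centring terms and variances in (5) for the rows $z_{it} = 0$, $\mu_i$ and $(\mu_i, t)$ can be recovered from the moments in (4) by a first-order delta-method calculation on the ratio of the two functionals. The sketch below makes one simplifying assumption of its own: the numerator $\int W_{iZ}\,dW_{iZ}$ and denominator $\int W_{iZ}^2$ are treated as uncorrelated, since (4) reports no covariances. The $z_{it} = 1$ row is not used. Exact rational arithmetic (`fractions.Fraction`) keeps the results as fractions; the names `TABLE_4` and `ll_delta_method` are hypothetical.

```python
from fractions import Fraction as F

# Rows of (4): (E[int W dW], Var[int W dW], E[int W^2], Var[int W^2])
TABLE_4 = {
    "0": (F(0), F(1, 2), F(1, 2), F(1, 3)),
    "mu_i": (F(-1, 2), F(1, 12), F(1, 6), F(1, 45)),
    "(mu_i, t)": (F(-1, 2), F(1, 60), F(1, 15), F(11, 6300)),
}


def ll_delta_method(mean_a, var_a, mean_b, var_b):
    """Delta-method moments of the LL statistics built from a = int W dW and
    b = int W^2, treating a and b as uncorrelated (assumption of this sketch).

    Returns (centre, var_coef, centre_t_sq, var_t) such that
      sqrt(N) * (T(rho_hat - 1) - centre) => N(0, var_coef), and
      t_rho - sqrt(N) * mean_a / sqrt(mean_b) => N(0, var_t),
    where centre_t_sq = mean_a**2 / mean_b is the squared per-sqrt(N) centring of t_rho.
    """
    centre = mean_a / mean_b
    var_coef = var_a / mean_b**2 + mean_a**2 * var_b / mean_b**4
    centre_t_sq = mean_a**2 / mean_b
    var_t = var_a / mean_b + mean_a**2 * var_b / (4 * mean_b**3)
    return centre, var_coef, centre_t_sq, var_t


for case, row in TABLE_4.items():
    print(case, *ll_delta_method(*row))
```

For $\mu_i$ this gives a centre of $-3$, variance $51/5$, $t$-centring $\sqrt{3/2}\,\sqrt N$ and $t$-variance $4/5$; rescaling $t_\rho$ by $\sqrt{5/4} = \sqrt{1.25}$ turns the centring into $\sqrt{1.875 N}$, which is the $\mu_i$ row of (5). The $(\mu_i, t)$ row works the same way, with $-15/2$, $2895/112$, $3.75$ and $277/448$.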
Sequential limit theory, i.e. $T \to \infty$ followed by $N \to \infty$, is used to derive the limiting distributions in (5). If $u_{it}$ is stationary but serially correlated rather than iid, the asymptotic distributions of $\hat\rho$ and $t_\rho$ need to be modified to account for the serial correlation. Harris & Tzavalis (1999) also derived unit root tests for (1) with $z_{it} = \{0\}$, $\{\mu_i\}$, or $\{(\mu_i, t)\}$ when the time dimension of the panel, T, is fixed. This is the typical case for micro panel studies. The main results are:
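$$
\begin{array}{ll}
z_{it} = 0: & \sqrt{N}(\hat\rho - 1) \Rightarrow N\left(0, \dfrac{2}{T(T-1)}\right) \\
z_{it} = \mu_i: & \sqrt{N}\left(\hat\rho - 1 + \dfrac{3}{T+1}\right) \Rightarrow N\left(0, \dfrac{3(17T^2 - 20T + 17)}{5(T-1)(T+1)^3}\right) \\
z_{it} = (\mu_i, t): & \sqrt{N}\left(\hat\rho - 1 + \dfrac{15}{2(T+2)}\right) \Rightarrow N\left(0, \dfrac{15(193T^2 - 728T + 1147)}{112(T+2)^3(T-2)}\right)
\end{array}
$$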
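The fixed-T results above translate directly into a standardized statistic $\sqrt{N}(\hat\rho - 1 + B(T))/\sqrt{V(T)}$ that is compared with standard normal critical values. The sketch below simply codes the three cases; the names `harris_tzavalis_moments` and `harris_tzavalis_stat` and the case labels are hypothetical, $\hat\rho$ is assumed to come from the pooled regression matching the chosen deterministic terms, and the trend case needs $T > 2$.

```python
import math


def harris_tzavalis_moments(t_len, case):
    """Fixed-T centring term B(T) and variance V(T), so that under H0
    sqrt(N) * (rho_hat - 1 + B(T)) => N(0, V(T)) as N grows."""
    T = t_len
    if case == "none":
        return 0.0, 2.0 / (T * (T - 1))
    if case == "mu_i":
        return (3.0 / (T + 1),
                3.0 * (17 * T**2 - 20 * T + 17) / (5.0 * (T - 1) * (T + 1) ** 3))
    if case == "trend":
        return (15.0 / (2.0 * (T + 2)),
                15.0 * (193 * T**2 - 728 * T + 1147) / (112.0 * (T + 2) ** 3 * (T - 2)))
    raise ValueError("case must be 'none', 'mu_i' or 'trend'")


def harris_tzavalis_stat(rho_hat, n, t_len, case):
    """Standardized statistic; large negative values are evidence against H0."""
    bias, var = harris_tzavalis_moments(t_len, case)
    return math.sqrt(n) * (rho_hat - 1.0 + bias) / math.sqrt(var)
```

A useful cross-check: as T grows, $T \cdot B(T) \to 3$ and $T^2 V(T) \to 51/5$ in the $\mu_i$ case, and $T \cdot B(T) \to 7.5$ and $T^2 V(T) \to 2895/112$ in the trend case, which are exactly the LL centring terms and variances in (5).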
Harris & Tzavalis (1999) also showed that assuming T tends to infinity at a faster rate than N, as in LL, rather than T fixed, as is the case in micro panels, yields tests which are substantially undersized and have low power, especially when T is small. Recently, Frankel & Rose (1996), Oh (1996) and Lothian (1996) tested the PPP hypothesis using panel data. All of these articles use LL tests, and some of them report evidence supporting the PPP hypothesis. O'Connell (1998), however, showed that the LL tests suffer from significant size distortion in the presence of correlation among contemporaneous cross-sectional error terms. O'Connell highlighted the importance of controlling for cross-sectional dependence when testing for a unit root in panels of real exchange rates. He showed that, once cross-sectional dependence is controlled for, no evidence against the null of a random walk can be found in panels of up to 64 real exchange rates. Virtually all of the existing nonstationary panel literature assumes cross-sectional independence. The assumption of independence across i is rather strong, but it is needed to satisfy the requirements of the Lindeberg-Lévy central limit theorem. Moreover, as pointed out by Quah (1994), modeling cross-sectional dependence is involved because individual observations in a cross-section have no natural ordering. Driscoll & Kraay (1998) presented a simple extension of common nonparametric covariance matrix estimation techniques which yields standard errors that are robust to very general forms of spatial and temporal dependence as the time dimension becomes large. In a recent paper, Conley (1999) presented a spatial model of dependence among agents using a metric of economic distance that provides cross-sectional data with a structure similar to time-series data. Conley proposed a generalized method of moments (GMM) estimator for such dependent data and a class of nonparametric covariance matrix estimators that allow for a general form of dependence characterized by economic distance.
B. Im, Pesaran & Shin (1997) Tests
The LL test is restrictive in the sense that it requires $\rho$ to be homogeneous across i. As Maddala (1999) pointed out, the null may be fine for testing convergence in growth among countries, but the alternative restricts every country to converge at the same rate.
Im, Pesaran & Shin (1997) (IPS) allow for a heterogeneous coefficient of $y_{i,t-1}$ and propose an alternative testing procedure based on averaging individual unit root test statistics. IPS suggested an average of the augmented DF (ADF) tests when $u_{it}$ is serially correlated with different serial correlation properties across cross-sectional units, i.e. $u_{it} = \sum_{j=1}^{p_i} \varphi_{ij} u_{i,t-j} + \varepsilon_{it}$. Substituting this $u_{it}$ in (1), we get

$$y_{it} = \rho_i y_{i,t-1} + \sum_{j=1}^{p_i} \varphi_{ij}\, \Delta y_{i,t-j} + z_{it}'\gamma_i + \varepsilon_{it}. \qquad (6)$$

The null hypothesis is $H_0: \rho_i = 1$ for all i, and the alternative hypothesis is $H_a: \rho_i < 1$ for at least one i. The IPS t-bar statistic is defined as the average of the individual ADF statistics:

$$\bar t = \frac{1}{N} \sum_{i=1}^{N} t_{\rho_i}, \qquad (7)$$
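where $t_{\rho_i}$ is the individual t-statistic for testing $H_0: \rho_i = 1$ in (6). It is known that, for a fixed N,

$$t_{\rho_i} \Rightarrow \frac{\int_0^1 W_{iZ}\, dW_{iZ}}{\left[\int_0^1 W_{iZ}^2\right]^{1/2}} = t_{iT} \qquad (8)$$

as $T \to \infty$. IPS assume that the $t_{iT}$ are iid and have finite mean and variance. Then

$$\frac{1}{\sqrt N} \sum_{i=1}^{N} \frac{t_{iT} - E[t_{iT} \mid \rho_i = 1]}{\sqrt{\mathrm{Var}[t_{iT} \mid \rho_i = 1]}} \Rightarrow N(0, 1) \qquad (9)$$

as $N \to \infty$ by the Lindeberg-Lévy central limit theorem. Hence

$$t_{IPS} = \frac{\sqrt{N}\left(\bar t - E[t_{iT} \mid \rho_i = 1]\right)}{\sqrt{\mathrm{Var}[t_{iT} \mid \rho_i = 1]}} \Rightarrow N(0, 1) \qquad (10)$$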
as $T \to \infty$ followed by $N \to \infty$ sequentially. The values of $E[t_{iT} \mid \rho_i = 1]$ and $\mathrm{Var}[t_{iT} \mid \rho_i = 1]$ have been computed by IPS via simulations for different values of T and $p_i$. In this volume, Breitung (2000) studies the local power of the LL and IPS test statistics against a sequence of local alternatives. Breitung finds that the LL and IPS tests suffer from a dramatic loss of power if individual specific trends are included. This is due to the bias correction, which also removes the mean under the sequence of local alternatives. The simulation results indicate that the power of the LL and IPS tests is very sensitive to the specification of the deterministic terms. McCoskey & Selden (1998) applied the IPS test to test for unit roots in per capita national health care expenditures (HE) and gross domestic product (GDP) for a panel of OECD countries. McCoskey & Selden rejected the null hypothesis that these two series contain unit roots. Gerdtham & Löthgren (1998) claimed that the stationarity found by McCoskey & Selden is driven by the omission of time trends in their ADF regression in (6). Using the IPS test with a time trend, Gerdtham & Löthgren found that both HE and GDP are nonstationary. They concluded that HE and GDP are cointegrated around linear trends, following the results of McCoskey & Kao (1999b).
C. Combining P-Values Tests
Let $G_{iT_i}$ be a unit root test statistic for the i-th group in (1) and assume that, as $T_i \to \infty$, $G_{iT_i} \Rightarrow G_i$. Let $p_i$ be the p-value of a unit root test for cross-section i, i.e. $p_i = F(G_{iT_i})$, where $F(\cdot)$ is the distribution function of the random variable $G_i$. Maddala & Wu (1999) and Choi (1999a) proposed a Fisher-type test

$$P = -2 \sum_{i=1}^{N} \ln p_i, \qquad (11)$$
which combines the p-values from unit root tests for each cross-section i to test for a unit root in panel data. Under the null hypothesis, P is distributed as $\chi^2$ with 2N degrees of freedom as $T_i \to \infty$ for all N. Maddala et al. (1999) argued that the IPS and Fisher tests relax the restrictive assumption of the LL test that $\rho_i$ is the same under the alternative. Both the IPS and Fisher tests combine information based on individual unit root tests. However, the Fisher test has the advantage over the IPS test that it does not require a balanced panel. Also, the Fisher test can use different lag lengths in the individual ADF regressions and can be applied to any other unit root test. The disadvantage is that the p-values have to be derived by Monte Carlo simulations. Choi (1999a) echoes similar advantages of the Fisher test: (1) the cross-sectional dimension, N, can be either finite or infinite, (2) each group can have different types of nonstochastic and stochastic components, (3) the time series dimension, T, can be different for each i, and (4) the alternative hypothesis allows some groups to have unit roots while others may not. When N is large, Choi (1999a) proposed a Z test,

$$Z = \frac{1}{\sqrt N} \sum_{i=1}^{N} \frac{-2 \ln p_i - 2}{2}, \qquad (12)$$

since, under the null hypothesis, $E[-2 \ln p_i] = 2$ and $\mathrm{Var}[-2 \ln p_i] = 4$. Assume that the $p_i$'s are iid and use the Lindeberg-Lévy central limit theorem to get

$$Z \Rightarrow N(0, 1)$$
as $T_i \to \infty$ followed by $N \to \infty$.³ Choi (1999a) applied the $Z$ test in (12) and the IPS test in (7) to panel data of real exchange rates and provided evidence in favor of the PPP hypothesis. Choi claimed that this is due to the improved finite sample power of the Fisher test. Maddala & Wu (1999) and Maddala et al. (1999) find that the Fisher test is superior to the IPS test, but they argue that these panel unit root tests still do not rescue the PPP hypothesis. When allowance is made for the deficiency in the panel data unit root tests and panel estimation methods, support for PPP turns out to be weak.

D. Residual Based LM Test

Hadri (1999) proposed a residual based Lagrange Multiplier (LM) test for the null that the time series for each $i$ are stationary around a deterministic trend against the alternative of a unit root in panel data. Consider the following model

$$y_{it} = z_{it}'\gamma + r_{it} + \varepsilon_{it} \qquad (13)$$

where $z_{it}$ is the deterministic component, $r_{it}$ is a random walk, $r_{it} = r_{i,t-1} + u_{it}$ with $u_{it} \sim \text{iid}(0, \sigma_u^2)$, and $\varepsilon_{it}$ is a stationary process. Model (13) can be written as

$$y_{it} = z_{it}'\gamma + e_{it} \qquad (14)$$

where $e_{it} = \sum_{j=1}^{t} u_{ij} + \varepsilon_{it}$.
Let $\hat e_{it}$ be the residuals from the regression in (14) and $\hat\sigma_e^2$ be the estimate of the error variance. Also, let $S_{it}$ be the partial sum process of the residuals, $S_{it} = \sum_{j=1}^{t} \hat e_{ij}$. Then the LM statistic is

$$LM = \frac{1}{N} \sum_{i=1}^{N} \frac{\frac{1}{T^2} \sum_{t=1}^{T} S_{it}^2}{\hat\sigma_e^2}$$
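The LM statistic above is straightforward to compute once the residuals are available. The sketch below covers only the simplest deterministic component, $z_{it} = \{\mu_i\}$, so the residuals from (14) are deviations from unit means. It assumes a balanced panel and estimates $\hat\sigma_e^2$ by the pooled residual variance without any serial-correlation correction; these choices and the name `hadri_lm` are illustrative assumptions, not Hadri's exact implementation.

```python
def hadri_lm(panel):
    """LM statistic for z_it = {mu_i} (unit-specific constant only).

    panel: list of N equal-length series y_i (lists of floats).
    Assumptions (illustration only): residuals are deviations from each
    unit's mean, and sigma_e^2 is the pooled residual variance with no
    correction for serial correlation.
    """
    n, t_len = len(panel), len(panel[0])
    if any(len(series) != t_len for series in panel):
        raise ValueError("this sketch assumes a balanced panel")
    resid = []
    for series in panel:
        mean = sum(series) / t_len
        resid.append([y - mean for y in series])
    # pooled estimate of the error variance
    sigma2 = sum(e * e for r in resid for e in r) / (n * t_len)
    total = 0.0
    for r in resid:
        partial, acc = 0.0, 0.0
        for e in r:
            partial += e          # S_it = sum_{j<=t} e_ij
            acc += partial ** 2
        total += acc / t_len ** 2
    return total / (n * sigma2)
```

For serially independent stationary data the statistic should sit near 1/6, which is the expected value of $\int V^2$ for a Brownian bridge $V$, the limit relevant to this intercept-only case. For random walks it grows with $T$, which is what drives rejection of the stationarity null.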
It can be shown that

$$LM \xrightarrow{p} E\left[\int W_{iZ}^2\right]$$

as $T \to \infty$ followed by $N \to \infty$, provided $E\left[\int W_{iZ}^2\right] < \infty$. Also,

$$\frac{\sqrt{N}\left(LM - E\left[\int W_{iZ}^2\right]\right)}{\sqrt{\mathrm{Var}\left[\int W_{iZ}^2\right]}} \Rightarrow N(0, 1)$$

as $T \to \infty$ followed by $N \to \infty$.

E. Finite Sample Properties of Unit Root Tests

Extensive simulations have been conducted to explore the finite sample performance of panel unit root tests, e.g. Karlsson & Löthgren (1999), Im et al. (1997), Maddala & Wu (1999), and Choi (1999a). Choi (1999a) studied the small sample properties of the IPS t-bar test in (7) and Fisher's test in (11). Choi's major findings are the following: (1) The empirical sizes of the IPS and Fisher tests are reasonably close to their nominal size of 0.05 when N is small. But the Fisher test shows mild size distortions at N = 100, which is expected from the asymptotic theory. Overall, the IPS t-bar test has the most stable size. (2) In terms of size-adjusted power, the Fisher test seems to be superior to the IPS t-bar test. (3) When a linear time trend is included in the model, the power of all tests decreases considerably.
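The Fisher statistic (11) and Choi's $Z$ statistic (12) compared above need nothing but the individual p-values. The sketch below assumes the $p_i$ come from independent unit root tests and lie in $(0, 1]$. It computes $P$, its upper-tail p-value from the $\chi^2$ distribution with $2N$ degrees of freedom (using the closed form that exists for even degrees of freedom, so only the standard library is needed), and $Z$. The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df % 2 != 0 or df <= 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!, built incrementally
        total += term
    return math.exp(-half) * total


def fisher_p(pvalues):
    """P = -2 * sum(ln p_i) as in (11), with its chi2(2N) p-value."""
    p_stat = -2.0 * sum(math.log(p) for p in pvalues)
    return p_stat, chi2_sf_even_df(p_stat, 2 * len(pvalues))


def choi_z(pvalues):
    """Z = (1/sqrt(N)) * sum((-2 ln p_i - 2) / 2) as in (12)."""
    n = len(pvalues)
    return sum((-2.0 * math.log(p) - 2.0) / 2.0 for p in pvalues) / math.sqrt(n)
```

Small individual p-values make $-2 \ln p_i$ large, so both $P$ and $Z$ reject the unit root null for large values. For instance, three p-values all equal to $e^{-1}$ give $P = 6$ and an upper-tail probability of $8.5e^{-3} \approx 0.42$.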
III. SPURIOUS REGRESSION IN PANEL DATA

Entorf (1997) studied spurious fixed effects regressions when the true model involves independent random walks with and without drifts. Entorf found that for $T \to \infty$ and $N$ finite, the nonsense regression phenomenon holds for spurious fixed effects models and inference based on t-values can be highly misleading. Kao (1999) and Phillips & Moon (1999a) derived the asymptotic distributions of the least squares dummy variable estimator and various conventional statistics from the spurious regression in panel data. Consider a spurious regression model for all $i$ using panel data:

$$y_{it} = x_{it}'\beta + z_{it}'\gamma + e_{it} \qquad (15)$$

where $x_{it} = x_{i,t-1} + \varepsilon_{it}$ and $e_{it}$ is I(1). The OLS estimator of $\beta$ is

$$\hat\beta = \left[\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde x_{it}\tilde x_{it}'\right]^{-1}\left[\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde x_{it}\tilde y_{it}\right] \qquad (16)$$

where $\tilde y_{it}$ is defined in (3) and $\tilde x_{it} = x_{it} - \sum_{s=1}^{T} h(t, s)\,x_{is}$.
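It is known that if a time-series regression for a given $i$ is performed in model (15), the OLS estimator of $\beta$ is spurious. It is easy to see that

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}' \xrightarrow{p} E\left[\int W_{iZ}W_{iZ}'\right]\Omega_\varepsilon$$

and

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T}\tilde x_{it}\tilde y_{it} \xrightarrow{p} E\left[\int W_{iZ}W_{iZ}'\right]\Omega_{\varepsilon u}$$

as $T \to \infty$ followed by $N \to \infty$ by a sequential limit theory, where

$$E\left[\int W_{iZ}W_{iZ}'\right] = \begin{cases} \frac{1}{2} I_k & \text{if } z_{it} = \{0\} \\ \frac{1}{6} I_k & \text{if } z_{it} = \{\mu_i\} \\ \frac{1}{15} I_k & \text{if } z_{it} = \{\mu_i, t\} \end{cases}$$

Then we have

$$\hat\beta \xrightarrow{p} \Omega_\varepsilon^{-1}\Omega_{\varepsilon u} \qquad (17)$$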
Equation (17) shows that the OLS estimator of $\beta$, $\hat\beta$, is consistent for its true value, $\Omega_\varepsilon^{-1}\Omega_{\varepsilon u}$. In a single time-series regression, by contrast, the estimator is spurious because the noise, $e_{it}$, is as strong as the signal, $x_{it}$, since both $e_{it}$ and $x_{it}$ are I(1). In the panel regression (15) with a large number of cross-sections, the strong noise of $e_{it}$ is attenuated by pooling the data and a consistent estimate of $\beta$ can be extracted. The asymptotics of the OLS estimator are very different from those of the spurious regression in pure time series. This has an important consequence for residual-based cointegration tests in panel data, because the null distribution of residual-based cointegration tests depends on the asymptotics of the OLS estimator. This point is explained further in the next section.
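A small simulation can illustrate (17). The sketch below assumes $z_{it} = \{\mu_i\}$ (so $\tilde x_{it}$ and $\tilde y_{it}$ are deviations from unit means), a scalar regressor, and Gaussian innovations. When $y_{it}$ and $x_{it}$ are independent random walks, $\Omega_{\varepsilon u} = 0$, so the pooled estimator should settle near zero as $N$ grows, whereas a single time-series regression ($N = 1$) does not settle down. The function names and simulation settings are hypothetical.

```python
import random


def pooled_within_slope(xs, ys):
    """Pooled OLS slope after removing unit means (z_it = {mu_i}), scalar x."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        mx = sum(x) / len(x)
        my = sum(y) / len(y)
        for a, b in zip(x, y):
            num += (a - mx) * (b - my)
            den += (a - mx) ** 2
    return num / den


def random_walk(t_len, rng):
    """Driftless Gaussian random walk of length t_len starting at zero."""
    level, path = 0.0, []
    for _ in range(t_len):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def spurious_panel_slope(n, t_len, seed=0):
    """Regress independent random walks y_i on x_i, pooling n units."""
    rng = random.Random(seed)
    xs = [random_walk(t_len, rng) for _ in range(n)]
    ys = [random_walk(t_len, rng) for _ in range(n)]
    return pooled_within_slope(xs, ys)
```

Calling `spurious_panel_slope(1, 100, seed)` for different seeds typically gives widely scattered slopes, while `spurious_panel_slope(200, 100, seed)` stays close to the probability limit $\Omega_\varepsilon^{-1}\Omega_{\varepsilon u} = 0$.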
IV. PANEL COINTEGRATION TESTS

A. Kao Tests

Kao (1999) presented two types of cointegration tests in panel data, the DF and ADF type tests. The DF type tests from Kao can be calculated from the estimated residuals in (15) as:

$$\hat e_{it} = \rho\,\hat e_{i,t-1} + v_{it} \qquad (18)$$

where $\hat e_{it} = \tilde y_{it} - \tilde x_{it}'\hat\beta$. In order to test the null hypothesis of no cointegration, the null can be written as $H_0: \rho = 1$. The OLS estimate of $\rho$ and the t-statistic are given as:
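$$\hat\rho = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{it}\hat e_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{i,t-1}^2}$$

and

$$t_\rho = \frac{(\hat\rho - 1)\sqrt{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{i,t-1}^2}}{s_e}$$

where $s_e^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T}(\hat e_{it} - \hat\rho\,\hat e_{i,t-1})^2$. Kao proposed the following four DF type tests by assuming $z_{it} = \{\mu_i\}$:

$$DF_\rho = \frac{\sqrt{N}\,T(\hat\rho - 1) + 3\sqrt{N}}{\sqrt{10.2}},$$

$$DF_t = \sqrt{1.25}\,t_\rho + \sqrt{1.875N},$$

$$DF_\rho^* = \frac{\sqrt{N}\,T(\hat\rho - 1) + \frac{3\sqrt{N}\,\hat\sigma_v^2}{\hat\sigma_{0v}^2}}{\sqrt{3 + \frac{36\hat\sigma_v^4}{5\hat\sigma_{0v}^4}}},$$

and

$$DF_t^* = \frac{t_\rho + \frac{\sqrt{6N}\,\hat\sigma_v}{2\hat\sigma_{0v}}}{\sqrt{\frac{\hat\sigma_{0v}^2}{2\hat\sigma_v^2} + \frac{3\hat\sigma_v^2}{10\hat\sigma_{0v}^2}}},$$

where $\hat\sigma_v^2 = \hat\Sigma_u - \hat\Sigma_{u\varepsilon}\hat\Sigma_\varepsilon^{-1}\hat\Sigma_{\varepsilon u}$ and $\hat\sigma_{0v}^2 = \hat\Omega_u - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u}$. While $DF_\rho$ and $DF_t$ are based on the strong exogeneity of the regressors and errors, $DF_\rho^*$ and $DF_t^*$ are for cointegration with an endogenous relationship between regressors and errors. For the ADF test, we can run the following regression:

$$\hat e_{it} = \rho\,\hat e_{i,t-1} + \sum_{j=1}^{p}\vartheta_j\,\Delta\hat e_{i,t-j} + v_{itp} \qquad (19)$$

With the null hypothesis of no cointegration, the ADF test statistic can be constructed as:

$$ADF = \frac{t_{ADF} + \frac{\sqrt{6N}\,\hat\sigma_v}{2\hat\sigma_{0v}}}{\sqrt{\frac{\hat\sigma_{0v}^2}{2\hat\sigma_v^2} + \frac{3\hat\sigma_v^2}{10\hat\sigma_{0v}^2}}}$$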
where $t_{ADF}$ is the t-statistic of $\rho$ in (19). The asymptotic distributions of $DF_\rho$, $DF_t$, $DF_\rho^*$, $DF_t^*$, and $ADF$ converge to a standard normal distribution $N(0, 1)$ by the sequential limit theory.

B. Residual Based LM Test

McCoskey & Kao (1998) derived a residual-based test for the null of cointegration rather than the null of no cointegration in panels. This test is an extension of the LM test and the locally best invariant (LBI) test for an MA unit root in the time series literature; see Harris & Inder (1994) and Shin (1994). Under the null, the asymptotics no longer depend on the asymptotic properties of the estimated spurious regression; rather, the asymptotics of the estimation of a cointegrated relationship are needed. For models which allow the cointegrating vector to change across the cross-sectional observations, the asymptotics depend merely on the time series results, as each cross-section is estimated independently. For models with common slopes, the estimation is done jointly and therefore the asymptotic theory is based on the joint estimation of a cointegrated relationship in panel data. For the residual based test of the null of cointegration, it is necessary to use an efficient estimation technique for cointegrated variables. In the time series literature a variety of methods have been shown to be asymptotically efficient. These include the fully modified (FM) estimator of Phillips & Hansen (1990) and the dynamic least squares (DOLS) estimator as proposed by Saikkonen (1991) and Stock & Watson (1993). For panel data, Kao & Chiang (2000) showed that both the FM and DOLS methods can produce estimators which are asymptotically normally distributed with zero means. The model presented allows for varying slopes and intercepts:

$$y_{it} = \alpha_i + x_{it}'\beta_i + e_{it} \qquad (20)$$

$$x_{it} = x_{i,t-1} + \varepsilon_{it} \qquad (21)$$

$$e_{it} = \gamma_{it} + u_{it} \qquad (22)$$

and $\gamma_{it} = \gamma_{i,t-1} + \theta u_{it}$, where $u_{it}$ are i.i.d.$(0, \sigma_u^2)$. The null hypothesis of cointegration is equivalent to $\theta = 0$. The test statistic proposed by McCoskey & Kao (1998) is defined as follows:
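$$LM = \frac{1}{N}\sum_{i=1}^{N}\frac{\frac{1}{T^2}\sum_{t=1}^{T}S_{it}^2}{\hat\sigma_e^2} \qquad (23)$$

where $S_{it}$ is the partial sum process of the residuals, $S_{it} = \sum_{j=1}^{t}\hat e_{ij}$, and $\hat\sigma_e^2$ is defined in McCoskey and Kao. The asymptotic result for the test is:

$$\sqrt{N}(LM - \mu_v) \Rightarrow N(0, \sigma_v^2) \qquad (24)$$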
The moments, $\mu_v$ and $\sigma_v^2$, can be found through Monte Carlo simulation. The limiting distribution of LM is then free of nuisance parameters and robust to heteroskedasticity. Urban economists have long sought to explain the relationship between urbanization levels and output. McCoskey & Kao (1999a) revisited this question and tested the long run stability of a production function including urbanization using nonstationary panel data techniques. McCoskey and Kao applied the IPS test and the LM test in (23) and showed that a long run relationship between urbanization, output per worker and capital per worker cannot be rejected for the sample of thirty developing countries or the sample of twenty-two developed countries over the period 1965-1989. They do find, however, that the sign and magnitude of the impact of urbanization vary considerably across the countries. These results offer new insights and potential for dynamic urban models rather than the simple cross-section approach.

C. Pedroni Tests

Pedroni (1997a) also proposed several tests for the null hypothesis of no cointegration in a panel data model that allows for considerable heterogeneity. His tests can be classified into two categories. The first set is similar to the tests discussed above, and involves averaging test statistics for cointegration in the time series across cross-sections. The second set groups the statistics such that, instead of averaging across statistics, the averaging is done in pieces so that the limiting distributions are based on limits of piecewise numerator and denominator terms. The first set of statistics includes a form of the average of the Phillips & Ouliaris (1990) statistic:
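$$Z_{\hat\rho} = \sum_{i=1}^{N}\frac{\sum_{t=1}^{T}(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i)}{\sum_{t=1}^{T}\hat e_{i,t-1}^2} \qquad (25)$$

where $\hat e_{it}$ is estimated from (15), and $\hat\lambda_i = \frac{1}{2}(\hat\sigma_i^2 - \hat s_i^2)$, for which $\hat\sigma_i^2$ and $\hat s_i^2$ are the individual long-run and contemporaneous variances, respectively, of the residual $\hat e_{it}$. For his second set of statistics, Pedroni defines four panel test statistics. Let $\hat\Omega_i$ be a consistent estimate of $\Omega_i$, the long-run variance-covariance matrix. Define $\hat L_i$ to be the lower triangular Cholesky decomposition of $\hat\Omega_i$ such that in the scalar case $\hat L_{22i} = \hat\sigma_\varepsilon$ and $\hat L_{11i}^2 = \hat\sigma_u^2$, where $\hat\sigma_u^2$ is the long-run conditional variance. In this survey we consider only one of these statistics:

$$Z_{t_{NT}} = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat L_{11i}^{-2}(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i)}{\sqrt{\tilde\sigma_{NT}^2\sum_{i=1}^{N}\sum_{t=2}^{T}\hat L_{11i}^{-2}\hat e_{i,t-1}^2}} \qquad (26)$$

where

$$\tilde\sigma_{NT}^2 = \frac{1}{N}\sum_{i=1}^{N}\frac{\hat\sigma_i^2}{\hat L_{11i}^2}.$$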
It should be noted that Pedroni bases his test on the averages of the numerator and denominator terms, respectively, rather than the average of the statistics as a whole. Using results on convergence of functionals of Brownian motion, Pedroni finds the following result:

$$Z_{t_{NT}} + 1.73\sqrt{N} \Rightarrow N(0, 0.93).$$

Note that this distribution applies to the model including an intercept and not including a time trend. Asymptotic results for other model specifications can be found in Pedroni (1997a). The convergence in distribution is based on individual convergence of the numerator and denominator terms. What is the intuition behind rejection of the null hypothesis? Using the average of the overall test statistic allows more ease in interpretation: rejection of the null hypothesis means that enough of the individual cross-sections have statistics far away from the means predicted by theory were they to be generated under the null. Pedroni (1999) derived asymptotic distributions and critical values for several residual based tests of the null of no cointegration in panels where there are multiple regressors. The model includes regressions with individual specific fixed effects and time trends. Considerable heterogeneity is allowed across individual members of the panel with regard to the associated cointegrating vectors and the dynamics of the underlying error process. Pedroni (1997b) showed that for tests of the null of no cointegration, the appropriate weighting matrix of a GLS based estimator must be constructed using the long run conditional covariance matrix between individual members of the panel in order to eliminate nuisance parameters associated with member specific dynamics. Pedroni (1997b) found that the violation of cross-sectional independence does not appear to play a significant role for the conclusions in
favor of weak long run PPP, provided that one also includes common time dummies in the regression. Pedroni (2000) also demonstrated how it is possible to construct a test that can be employed to test whether or not members of a panel with heterogeneous short run dynamics converge to a single common steady state.

D. Likelihood-Based Cointegration Test

Larsson, Lyhagen & Löthgren (1998) presented a likelihood-based (LR) panel test of cointegrating rank in heterogeneous panel models based on the average of the individual rank trace statistics developed by Johansen (1995). The proposed LR-bar statistic is very similar to the IPS t-bar statistic in (7) through (10). In Monte Carlo simulations, Larsson et al. investigated the small sample properties of the standardized LR-bar statistic. They found that the proposed test requires a large time series dimension: even if the panel has a large cross-sectional dimension, the size of the test will be severely distorted when T is small. Groen & Kleibergen (1999) proposed a likelihood-based framework for cointegration analysis in panels of a fixed number of vector error correction models. Maximum likelihood estimators of the cointegrating vectors are constructed using iterated generalized method of moments (GMM) estimators. Using these estimators Groen and Kleibergen construct likelihood ratio statistics, $LR(B \mid A)$, to test for a common cointegration rank across the individual vector error correction models, both with heterogeneous and homogeneous cointegrating vectors. Interestingly, the limiting distribution of $LR(B \mid A)$ is invariant to the covariance matrix of the error terms, which implies that $LR(B \mid A)$ is robust with respect to the choice of covariance matrix. Let us define $LR_s(r \mid k)$ as the summation of the $N$ individual trace statistics

$$LR_s(r \mid k) = \sum_{i=1}^{N} LR_i(r \mid k) \qquad (27)$$

where $LR_i(r \mid k)$ is the $i$-th Johansen likelihood ratio statistic, so that

$$LR_i(r \mid k) \Rightarrow \mathrm{tr}\left\{\int dB_{k-r,i}B_{k-r,i}'\left[\int B_{k-r,i}B_{k-r,i}'\right]^{-1}\int B_{k-r,i}\,dB_{k-r,i}'\right\}$$

as $T \to \infty$. Now for a fixed $N$, it is clear that
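$$LR_s(r \mid k) = \sum_{i=1}^{N} LR_i(r \mid k) \Rightarrow \sum_{i=1}^{N}\mathrm{tr}\left\{\int dB_{k-r,i}B_{k-r,i}'\left[\int B_{k-r,i}B_{k-r,i}'\right]^{-1}\int B_{k-r,i}\,dB_{k-r,i}'\right\} \qquad (28)$$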
as $T \to \infty$ by a continuous mapping theorem. It follows that $LR_s(r \mid k)$ is asymptotically equivalent to $LR(B \mid A)$ when $N$ is fixed and $T$ is large. This means that, as far as the asymptotics of the proposed test statistics in this chapter are concerned, nothing is lost by assuming that the covariance matrix has zero off-diagonal covariances. More importantly, tests based on cross-independence like (27) will perform just as well (asymptotically) as tests based on cross-dependence such as $LR(B \mid A)$. Groen and Kleibergen verified that the likelihood-based cointegration tests proposed by Larsson et al. in (27) are robust with respect to cross-dependence in panel data. The (asymptotic) equivalence of $LR_s(r \mid k)$ and $LR(B \mid A)$ found in Groen and Kleibergen has profound implications for econometricians and applied economists, e.g. there exist tests and estimators based on cross-independence which are equivalent to tests and estimators based on cross-dependence in nonstationary panel time series. Define $\overline{LR}(r \mid k)$ to be the average of $LR_i(r \mid k)$:
$$\overline{LR}(r \mid k) = \frac{1}{N}LR_s(r \mid k) = \frac{1}{N}\sum_{i=1}^{N}LR_i(r \mid k).$$
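The average $\overline{LR}(r \mid k)$ is simple to compute once the individual Johansen trace statistics are available; the harder part is the pair of moments used to standardize it, which, as for the IPS t-bar, come from simulation. The sketch below assumes a single stochastic trend ($k - r = 1$) and no deterministic terms, approximates the moments of the trace limit $(\int W\,dW)^2 / \int W^2$ with a discretized Brownian motion, and then forms $\sqrt{N}(\overline{LR} - E)/\sqrt{\mathrm{Var}}$ using the moments of a single unit. The function names, grid size and replication count are assumptions; tabulated moments (e.g. in Larsson et al.) should be used for other values of $k - r$ or with deterministic terms.

```python
import math
import random


def simulate_trace_moments(reps, t_len, seed=0):
    """Monte Carlo mean and variance of the trace limit when k - r = 1.

    With one stochastic trend and no deterministic terms the limit is
    (int W dW)^2 / int W^2, approximated here on a grid of t_len steps.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        w = num = den = 0.0
        for _ in range(t_len):
            dw = rng.gauss(0.0, 1.0) / math.sqrt(t_len)
            num += w * dw          # Ito sum for int W dW (left endpoint)
            den += w * w / t_len   # Riemann sum for int W^2
            w += dw
        draws.append(num * num / den)
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var


def standardized_lr_bar(trace_stats, mean, var):
    """sqrt(N) * (LR-bar - mean) / sqrt(var), moments supplied by the caller."""
    n = len(trace_stats)
    lr_bar = sum(trace_stats) / n
    return math.sqrt(n) * (lr_bar - mean) / math.sqrt(var)
```

Because the units are independent, centering and scaling the average by the mean and variance of a single $LR_i(r \mid k)$ (with the $\sqrt{N}$ factor) is the same as standardizing $\overline{LR}(r \mid k)$ by its own mean and variance.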
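It can be shown that

$$\frac{\overline{LR}(r \mid k) - E[\overline{LR}(r \mid k)]}{\sqrt{\mathrm{Var}[\overline{LR}(r \mid k)]}} \Rightarrow N(0, 1)$$

as $T \to \infty$ followed by $N \to \infty$ by a continuous mapping theorem and a central limit theorem, provided $E[\overline{LR}(r \mid k)]$ and $\mathrm{Var}[\overline{LR}(r \mid k)]$ are bounded. Define

$$\overline{LR}(B \mid A) = \frac{1}{N}LR(B \mid A). \qquad (29)$$

For a fixed $N$, it is easy to show that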
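$$\overline{LR}(B \mid A) = \frac{1}{N}LR(B \mid A) \Rightarrow \frac{1}{N}\sum_{i=1}^{N}\mathrm{tr}\left\{\int dB_{k-r,i}B_{k-r,i}'\left[\int B_{k-r,i}B_{k-r,i}'\right]^{-1}\int B_{k-r,i}\,dB_{k-r,i}'\right\} = \frac{1}{N}\sum_{i=1}^{N}Z_{ki}$$

where $Z_{ki} = \mathrm{tr}\left\{\int dB_{k-r,i}B_{k-r,i}'\left[\int B_{k-r,i}B_{k-r,i}'\right]^{-1}\int B_{k-r,i}\,dB_{k-r,i}'\right\}$, as $T \to \infty$. Then

$$\frac{\frac{1}{N}\sum_{i=1}^{N}Z_{ki} - E\left[\frac{1}{N}\sum_{i=1}^{N}Z_{ki}\right]}{\sqrt{\mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N}Z_{ki}\right]}} \Rightarrow N(0, 1)$$

as $N \to \infty$, since $B_{k-r,i}$ and $B_{k-r,j}$ are independent for $i \neq j$. It implies that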
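$$\frac{\overline{LR}(B \mid A) - E[\overline{LR}(B \mid A)]}{\sqrt{\mathrm{Var}[\overline{LR}(B \mid A)]}} \Rightarrow \frac{\frac{1}{N}\sum_{i=1}^{N}Z_{ki} - E\left[\frac{1}{N}\sum_{i=1}^{N}Z_{ki}\right]}{\sqrt{\mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N}Z_{ki}\right]}} \Rightarrow N(0, 1)$$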
as $T \to \infty$ followed by $N \to \infty$. The above discussion indicates that $\overline{LR}(r \mid k)$ and $\overline{LR}(B \mid A)$ are also equivalent when $T$ and $N$ are large. Groen & Kleibergen (1999) applied $LR(B \mid A)$ to a data set of exchange rates and appropriate monetary fundamentals. They found strong evidence for the validity of the monetary exchange rate model within a panel of vector error correction models for three major European countries, whereas the results based on individual vector error correction models for each of these countries separately are less supportive.

E. Finite Sample Properties

McCoskey & Kao (1999b) conducted Monte Carlo experiments to compare the size and power of different residual based tests for cointegration in
heterogeneous panel data: varying slopes and varying intercepts. Two of the tests are constructed under the null hypothesis of no cointegration. These tests are based on the average ADF test and Pedroni's pooled tests in (25) and (26). The third test, the McCoskey & Kao LM test in (23), is based on the null hypothesis of cointegration. Wu & Yin (1999) performed a similar comparison for panel tests in which they consider only tests for which the null hypothesis is that of no cointegration. Wu & Yin compared ADF statistics with maximum eigenvalue statistics, pooling information on means and p-values, respectively. They found that the average ADF performs better with respect to power and their maximum eigenvalue based p-value test performs better with regard to size. The test of the null hypothesis of cointegration was originally proposed in response to the low power of the tests of the null of no cointegration, especially in the time series case. Further, in cases where economic theory predicted a long run steady state relationship, it seemed that a test of the null of cointegration rather than the null of no cointegration would be appropriate. The results from the Monte Carlo study showed that the McCoskey & Kao LM test outperforms the other two tests. Of the two reasons for the introduction of the test of the null hypothesis of cointegration, low power and attractiveness of the null, the introduction of the cross-section dimension of the panel solves one: all of the tests show decent power when used with panel data. For those applications where the null of cointegration is more logical than the null of no cointegration, McCoskey & Kao (1999b), at a minimum, conclude that using the McCoskey & Kao LM test does not compromise the ability of the researcher in determining the underlying nature of the data. Recently, Hall et al.
(1999) proposed a new approach based on principal components analysis to test for the number of common stochastic trends driving the nonstationary series in a panel data set. The test is consistent even if there is a mixture of I(0) and I(1) series in the sample. This makes it unnecessary to pretest the panel for unit roots. It also has the advantage of solving the problem of dimensionality encountered in large panel data sets.
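The Kao (1999) statistics $DF_\rho$ and $DF_t$ and Pedroni's group statistic (25) from this section reduce to a handful of pooled sums over the residuals. The sketch below assumes the residuals $\hat e_{it}$ have already been obtained from the within regression (15) with $z_{it} = \{\mu_i\}$ and that the panel is balanced. It omits $DF_\rho^*$, $DF_t^*$ and $ADF$, which need the long-run variance estimates $\hat\sigma_v^2$ and $\hat\sigma_{0v}^2$. For $\hat\lambda_i$ it uses a Bartlett kernel with a user-chosen lag truncation applied directly to $\hat e_{it}$, following the wording under (25); the kernel, the default of 4 lags and all function names are our assumptions, not code from Kao or Pedroni.

```python
import math


def kao_df(resid):
    """DF_rho and DF_t of Kao (1999) for z_it = {mu_i}, from pooled residuals.

    resid: list of N balanced residual series e_hat_i from the within regression.
    Returns (rho_hat, DF_rho, DF_t); DF*_rho, DF*_t and ADF are not computed.
    """
    n, t_len = len(resid), len(resid[0])
    num = den = 0.0
    for e in resid:
        for t in range(1, t_len):
            num += e[t] * e[t - 1]
            den += e[t - 1] ** 2
    rho = num / den
    # s_e^2 = (1/NT) * sum of squared residuals of the pooled AR(1) fit (18)
    ssr = sum((e[t] - rho * e[t - 1]) ** 2 for e in resid for t in range(1, t_len))
    s_e = math.sqrt(ssr / (n * t_len))
    t_rho = (rho - 1.0) * math.sqrt(den) / s_e
    df_rho = (math.sqrt(n) * t_len * (rho - 1.0) + 3.0 * math.sqrt(n)) / math.sqrt(10.2)
    df_t = math.sqrt(1.25) * t_rho + math.sqrt(1.875 * n)
    return rho, df_rho, df_t


def bartlett_lambda(v, lags):
    """lambda = (long-run variance - variance) / 2 with a Bartlett kernel.

    Equals sum_{s=1}^{lags} (1 - s/(lags+1)) * gamma_s, gamma_s uncentered.
    """
    t_len = len(v)
    lam = 0.0
    for s in range(1, lags + 1):
        gamma_s = sum(v[t] * v[t - s] for t in range(s, t_len)) / t_len
        lam += (1.0 - s / (lags + 1.0)) * gamma_s
    return lam


def pedroni_group_rho(resid, lags=4):
    """Raw sum in (25); not yet centered or scaled for comparison with N(0, 1)."""
    total = 0.0
    for e in resid:
        lam = bartlett_lambda(e, lags)  # assumption: kernel applied to e_hat_it
        num = sum(e[t - 1] * (e[t] - e[t - 1]) - lam for t in range(1, len(e)))
        den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
        total += num / den
    return total
```

Both Kao statistics are compared with the left tail of $N(0, 1)$: strongly negative values are evidence against the null of no cointegration. The raw sum returned by `pedroni_group_rho` still has to be normalized before it can be compared with a normal distribution; see Pedroni (1997a) for the appropriate scaling and moments.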
V. ESTIMATION AND INFERENCE IN PANEL COINTEGRATION MODELS

This section discusses the issues that arise in estimation and inference of cointegrated panel regression models. The asymptotic properties of the estimators of the regression coefficients and the associated statistical tests are different from those of the time series cointegration regression models. Some
of these differences have become apparent in recent works by Kao & Chiang (2000), Phillips & Moon (1999a) and Pedroni (1996). The panel cointegration models are directed at studying questions that surround long run economic relationships typically encountered in macroeconomic and financial data. Such a long run relationship is often predicted by economic theory and it is then of central interest to estimate the regression coefficients and test whether they satisfy theoretical restrictions. Kao & Chen (1995) showed that the OLS estimator in panel cointegrated models is asymptotically normal but still asymptotically biased. Chen, McCoskey & Kao (1999) investigated the finite sample properties of the OLS estimator, the t-statistic, the bias-corrected OLS estimator, and the bias-corrected t-statistic. They found that the bias-corrected OLS estimator does not improve over the OLS estimator in general. The results of Chen et al. suggested that alternatives, such as the fully modified (FM) estimator or the dynamic OLS (DOLS) estimator, may be more promising in cointegrated panel regressions. Phillips & Moon (1999a) and Pedroni (1996) proposed an FM estimator, which can be seen as a generalization of Phillips & Hansen (1990). In this volume, Kao & Chiang (2000) propose an alternative approach based on a panel dynamic least squares (DOLS) estimator, which builds upon the work of Saikkonen (1991) and Stock & Watson (1993). Next, we provide a brief discussion of OLS estimation in a panel cointegrated model. Consider the following panel regression:

$$y_{it} = x_{it}'\beta + z_{it}'\gamma_i + u_{it} \qquad (30)$$

where $\{y_{it}\}$ are $1 \times 1$, $\beta$ is a $k \times 1$ vector of the slope parameters, $z_{it}$ is the deterministic component, and $\{u_{it}\}$ are the stationary disturbance terms. We assume that $\{x_{it}\}$ are $k \times 1$ integrated processes of order one for all $i$, where $x_{it} = x_{i,t-1} + \varepsilon_{it}$. Under these specifications, (30) describes a system of cointegrated regressions, i.e. $y_{it}$ is cointegrated with $x_{it}$. The OLS estimator of $\beta$ is

$$\hat\beta_{OLS} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}'\right]^{-1}\left[\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde x_{it}\tilde y_{it}\right]. \qquad (31)$$

It is easy to show that

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T}\tilde x_{it}\tilde x_{it}' \xrightarrow{p} \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}E[\zeta_{2i}] \qquad (32)$$

and
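$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{T}\tilde x_{it}u_{it} \xrightarrow{p} \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}E[\zeta_{1i}] \qquad (33)$$

using sequential limit theory, where

$$E[\zeta_{1i}] = \begin{cases} \Delta_{\varepsilon ui} & \text{if } z_{it} = \{0\} \\ -\frac{1}{2}\Omega_{\varepsilon ui} + \Delta_{\varepsilon ui} & \text{if } z_{it} = \{\mu_i\} \\ -\frac{1}{2}\Omega_{\varepsilon ui} + \Delta_{\varepsilon ui} & \text{if } z_{it} = \{\mu_i, t\} \end{cases}$$

and

$$E[\zeta_{2i}] = \begin{cases} \frac{1}{2}\Omega_{\varepsilon i} & \text{if } z_{it} = \{0\} \\ \frac{1}{6}\Omega_{\varepsilon i} & \text{if } z_{it} = \{\mu_i\} \\ \frac{1}{15}\Omega_{\varepsilon i} & \text{if } z_{it} = \{\mu_i, t\} \end{cases}$$

Here

$$\Omega_i = \begin{bmatrix} \Omega_{ui} & \Omega_{u\varepsilon i} \\ \Omega_{\varepsilon ui} & \Omega_{\varepsilon i} \end{bmatrix}$$

is the long-run covariance matrix of $(u_{it}, \varepsilon_{it}')'$; also

$$\Delta_i = \begin{bmatrix} \Delta_{ui} & \Delta_{u\varepsilon i} \\ \Delta_{\varepsilon ui} & \Delta_{\varepsilon i} \end{bmatrix}$$

is the one-sided long-run covariance. For example, when $z_{it} = \{\mu_i\}$, we get

$$\sqrt{N}\,T(\hat\beta_{OLS} - \beta) - \sqrt{N}\,\delta_{NT} \Rightarrow N\left(0,\; 6\,\Omega_\varepsilon^{-1}\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Omega_{u.\varepsilon i}\Omega_{\varepsilon i}\right)\Omega_\varepsilon^{-1}\right)$$

where $\Omega_\varepsilon = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Omega_{\varepsilon i}$, $\Omega_{u.\varepsilon i} = \Omega_{ui} - \Omega_{u\varepsilon i}\Omega_{\varepsilon i}^{-1}\Omega_{\varepsilon ui}$, and

$$\delta_{NT} = \left[\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T}(x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1}\frac{1}{N}\sum_{i=1}^{N}\left[\Omega_{\varepsilon i}^{1/2}\left(\int \tilde W_i\,dW_i'\right)\Omega_{\varepsilon i}^{-1/2}\Omega_{\varepsilon ui} + \Delta_{\varepsilon ui}\right].$$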
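With $z_{it} = \{\mu_i\}$, the OLS estimator (31) is simply pooled least squares on unit-demeaned data. The simulation below assumes a scalar regressor that is a driftless random walk and errors independent of the regressor innovations, so that $\Omega_{\varepsilon u} = 0$, $\Delta_{\varepsilon u} = 0$ and the bias term $\delta_{NT}$ vanishes. Under these assumptions the estimate should land very close to $\beta$ even for moderate $N$ and $T$, in line with the $\sqrt{N}\,T$ rate above. The function names and data-generating choices are hypothetical.

```python
import random


def simulate_cointegrated_panel(n, t_len, beta, seed=0):
    """y_it = mu_i + beta * x_it + u_it with x_it a random walk (scalar x).

    Assumption: u_it and the innovations of x_it are independent N(0, 1),
    so the endogeneity terms Omega_eps_u and Delta_eps_u are zero.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)          # individual effect mu_i
        x, x_path, y_path = 0.0, [], []
        for _ in range(t_len):
            x += rng.gauss(0.0, 1.0)      # I(1) regressor
            x_path.append(x)
            y_path.append(mu + beta * x + rng.gauss(0.0, 1.0))
        xs.append(x_path)
        ys.append(y_path)
    return xs, ys


def within_ols(xs, ys):
    """Pooled OLS on unit-demeaned data, i.e. (31) with z_it = {mu_i}."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num += sum((a - mx) * (b - my) for a, b in zip(x, y))
        den += sum((a - mx) ** 2 for a in x)
    return num / den
```

This is the same estimator that is used in the spurious case of Section III; the difference is that here $u_{it}$ is stationary, so the estimate converges to $\beta$ itself rather than to $\Omega_\varepsilon^{-1}\Omega_{\varepsilon u}$.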
Kao & Chiang (2000) in this volume studied the limiting distributions of the FM and DOLS estimators in a cointegrated regression and showed they are asymptotically normal. Phillips & Moon (1999a) and Pedroni (1996) also obtained similar results for the FM estimator. The reader is referred to the cited papers for further details. Kao and Chiang also investigated the finite sample properties of the OLS, FM, and DOLS estimators. They found that: (i) the OLS estimator has a non-negligible bias in finite samples, (ii) the FM estimator does not improve over the OLS estimator in general, and (iii) the DOLS estimator may be more promising than the OLS or FM estimators in estimating cointegrated panel regressions. Choi (1999b) extended Kao & Chiang (2000) to study asymptotic properties of the OLS, Within and GLS estimators for an error component model. The error component model involves both stationary and nonstationary regressors. Choi's simulation results indicated that the feasible GLS estimator is more efficient than the Within estimator. Choi (1999c) studied instrumental variable estimation for an error component model with stationary and nearly nonstationary regressors. Phillips & Moon (1999a) studied various regressions between two panel vectors that may or may not have cointegrating relations, and presented a fundamental framework for studying sequential and joint limit theories in nonstationary panel data. In particular, Phillips and Moon studied the regression limit theory of nonstationary panels when both N and T go to infinity. Their limit theory allows for both sequential limits, where $T \to \infty$ followed by $N \to \infty$, and joint limits, where $T, N \to \infty$ simultaneously. Phillips and Moon require that $N/T \to 0$, so that these results apply to macro panel data with moderate N and large T, and not to micro panel data with large N and small T.
The panel models considered allow for four cases: (i) panel spurious regression, where there is no time series cointegration, (ii) heterogeneous panel cointegration, where each individual has its own specific cointegration relation, (iii) homogeneous panel cointegration, where individuals have the same cointegration relation, and (iv) near-homogeneous panel cointegration, where individuals have slightly different cointegration relations determined by the value of a localizing parameter. Phillips & Moon (1999a) investigated these four models and developed panel asymptotics for regression coefficients and tests using both sequential and joint limit arguments. In all cases considered the pooled estimator is consistent and has a normal limiting distribution. In fact, for the spurious panel regression, Phillips & Moon (1999a) showed that under quite weak regularity conditions, the pooled least squares estimator of the slope coefficient $\beta$ is $\sqrt{N}$ consistent for the long-run average relation parameter and has a limiting normal distribution. Also, Moon & Phillips (1998a) showed that a limiting cross-section regression with time averaged data is also $\sqrt{N}$ consistent for $\beta$ and has a limiting normal distribution. This is different from the pure time series spurious regression, where the limit of the OLS estimator of $\beta$ is a nondegenerate random variate that is a functional of Brownian motions and is therefore not consistent for $\beta$. The idea in Phillips & Moon (1999a) is that independent cross-section data in the panel add information, and this leads to a stronger overall signal than in the pure time series case. Pesaran & Smith (1995) studied limiting cross-section regressions with time averaged data and established consistency under restrictive assumptions on the heterogeneous panel model. This differs from Phillips & Moon (1999a) in that the former use an average of the cointegrating coefficients, which is different from the long run average regression coefficient. This requires the existence of cointegrating time series relations, whereas the long run average regression coefficient is defined irrespective of the existence of individual cointegrating relations and relies only on the long run average variance matrix of the panel. Phillips & Moon (1999a) also showed that for the homogeneous and near homogeneous cointegration cases, a consistent estimator of the long run regression coefficient can be constructed, which they call a pooled FM estimator. They showed that this estimator has a faster convergence rate than the simple cross-section and time series estimators. See also Phillips & Moon (1999b) for a concise review. In fact, the latter paper also shows how to extend the above ideas to models with individual effects in the data generating process. For the panel spurious regression with individual specific deterministic trends, estimates of the trend coefficients are obtained in the first step, and the detrended data are pooled and used in a least squares regression to estimate $\beta$ in the second step. Two different detrending procedures are used, based on OLS and GLS regressions. OLS detrending leads to an asymptotically more efficient estimator of the long run average coefficient in pooled regression than GLS detrending.
Phillips & Moon (1999b) explain that the residuals after time series GLS detrending have more cross section variation than they do after OLS detrending, and this produces greater variation in the limit distribution of the pooled regression estimator of the long run average coefficient. Moon & Phillips (1999) investigate the asymptotic properties of the Gaussian MLE of the localizing parameter in local to unity dynamic panel regression models with deterministic and stochastic trends. Moon and Phillips find that for the homogeneous trend model, the Gaussian MLE of the common localizing parameter is $\sqrt{N}$ consistent, while for the heterogeneous trends model, it is inconsistent. The latter inconsistency is due to the presence of an infinite number of incidental parameters (as $N \to \infty$) for the individual trends. Unlike the fixed effects dynamic panel data model, where this inconsistency due to the incidental parameter problem disappears as $T \to \infty$, the inconsistency of the localizing parameter in the Moon and Phillips model persists even when both N and T go to infinity. Pesaran, Shin & Smith (1999) derived the asymptotics of a pooled mean group (PMG) estimator. The PMG estimation constrains the long run coefficients to be identical, but allows the short run and adjustment coefficients as well as the error variances to differ across the cross-sectional dimension. Recently, Binder, Hsiao & Pesaran (2000) considered estimation and inference in panel vector autoregressions (PVARs) with fixed effects when T is finite and N is large. A maximum likelihood estimator as well as unit root and cointegration tests are proposed based on a transformed likelihood function. This MLE is shown to be consistent and asymptotically normally distributed irrespective of the unit root and cointegrating properties of the PVAR model. The tests proposed are based on standard chi-square and normally distributed statistics. Binder et al. also show that the conventional GMM estimators based on standard orthogonality conditions break down if the underlying time series contain unit roots. Monte Carlo evidence is provided which favors MLE over GMM in small samples. In this volume, Kauppi (2000) develops a new joint limit theory where the panel data may be cross-sectionally heterogeneous in a general way. This limit theory builds upon the concepts of joint convergence in probability and in distribution for double indexed processes by Phillips & Moon (1999a) and develops new versions of the law of large numbers and the central limit theorem that apply in panels where the data may be cross-sectionally heterogeneous in a fairly general way. Kauppi demonstrates how this joint limit theory can be applied to derive asymptotics for a panel regression where the regressors are generated by a local to unit root process with heterogeneous localizing coefficients across cross-sections.
Kauppi discusses issues that arise in the estimation and inference of panel cointegrated regressions with near integrated regressors. Kauppi shows that a bias corrected pooled OLS estimator for a common cointegrating parameter has an asymptotic normal distribution centered on the true value irrespective of whether the regressor has a near or an exact unit root. However, if the regression model contains individual effects and/or deterministic trends, then Kauppi's bias corrected pooled OLS still produces asymptotic bias. Kauppi also shows that the panel FM estimator is subject to asymptotic bias, regardless of how individual effects and/or deterministic trends are included, if the regressors are nearly rather than exactly integrated. This indicates that much care should be taken in interpreting empirical results achieved by the recent panel cointegration methods that assume exact unit roots when near unit roots are equally plausible.
Kao et al. (1999) apply the asymptotic theory of panel cointegration developed by Kao & Chiang (2000) to the Coe & Helpman (1995) international R&D spillover regression. Using a sample of 21 OECD countries and Israel, they re-examine the effects of domestic and foreign R&D capital stocks on total factor productivity of these countries. They find that OLS with bias-correction, the fully modified (FM) and the dynamic OLS (DOLS) estimators produce different predictions about the impact of foreign R&D on total factor productivity (TFP). However, all the estimators support the result that domestic R&D is related to TFP. Kao et al.'s empirical results indicate that the estimated coefficients in the Coe and Helpman regressions are subject to estimation bias. Given the superiority of DOLS over FM as suggested by Kao & Chiang (2000), Kao et al. leaned towards rejecting the Coe and Helpman hypothesis that international R&D spillovers are trade related. Funk (1998) examined the relationship between trade patterns and international R&D spillovers among the OECD countries using the panel cointegration methods developed by Kao (1999), Kao & Chiang (2000), and Pesaran, Shin & Smith (1999). Using randomly simulated bilateral trade patterns, Funk found that the choice of weights used in constructing foreign R&D stocks is informative of the avenue of spillover transmission when panel cointegration methods are employed. A re-examination of the relationship between import patterns and R&D spillovers found no evidence to link the patterns of R&D spillovers to the patterns of imports. Funk found strong evidence indicating that exporters receive substantial R&D spillovers from their customers.
VI. DYNAMIC PANEL DATA MODELS

This section surveys recent developments in dynamic panel data models. The dynamic panel data regression is characterized by two sources of persistence over time: autocorrelation due to the presence of a lagged dependent variable among the regressors, and individual effects characterizing the heterogeneity among the individuals:

$$y_{it} = \delta y_{i,t-1} + x_{it}'\beta + \mu_i + u_{it} \qquad (34)$$

for i = 1, 2, ..., N and t = 1, 2, ..., T. Here $\delta$ is a scalar, $x_{it}$ is k × 1, $\mu_i$ denotes the i-th individual's effect and $u_{it}$ is the remainder disturbance. Basic introductions to this topic are found in Hsiao (1986), Baltagi (1995) and Matyas & Sevestre (1996). Applications using this model are too many to enumerate. These include employment equations, see Arellano & Bond (1991), liquor demand, see Baltagi & Griffin (1995), growth convergence, see Islam (1995) and
Nerlove (1999), life cycle labor supply models, see Ziliak (1997), and demand for gasoline, see Baltagi & Griffin (1997), to mention a few. It is well known that for typical micro-panels, where a large number of firms or individuals (N) is observed over a short period of time (T), the fixed effects (FE) estimator is biased and inconsistent (since T is fixed and $N \to \infty$); see Nickell (1981) and, more recently, Kiviet (1995, 1999). Monte Carlo results have shown that first order asymptotic properties do not necessarily yield correct inference in finite samples. Therefore, Kiviet (1995) examined higher order asymptotics, which may approximate the actual finite sample properties more closely and lead to better inference. In particular, Kiviet (1995) considered the simple dynamic linear panel data model with serially uncorrelated disturbances and strongly exogenous regressors and derived an approximation for the bias of the FE estimator. Subtracting a consistent estimator of this bias from the original FE estimator yields a corrected FE estimator. This corrected FE estimator performed well in simulations when compared with eight other consistent instrumental variable or GMM estimators.4 In macro-panels, for example those studying long run growth, the data cover a large number of countries N over a moderate T. In this case, T is not very small relative to N, so some researchers may still favor the FE estimator, arguing that its bias may not be large. Judson & Owen (1999) performed Monte Carlo experiments for N = 20 or 100 and T = 5, 10, 20 and 30 and found that the bias of the FE estimator can be sizeable, even when T = 30. The bias of the FE estimator increases with $\delta$ and decreases with T. Even for T = 30, this bias could be as much as 20% of the true value of the coefficient of interest.
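The FE bias has a closed form in the simplest case. For the pure autoregression $y_{it} = \delta y_{i,t-1} + \mu_i + u_{it}$ with a stationary start and T regression periods, Nickell (1981) gives the large-N limit

$$\operatorname{plim}_{N\to\infty}(\hat\delta_{FE}-\delta) = -\frac{1+\delta}{T-1}\left[1-\frac{1}{T}\frac{1-\delta^T}{1-\delta}\right]\left\{1-\frac{2\delta}{(1-\delta)(T-1)}\left[1-\frac{1-\delta^T}{T(1-\delta)}\right]\right\}^{-1}$$

The NumPy sketch below evaluates this expression and compares it with one Monte Carlo draw of the within estimator. The function names and simulation settings are illustrative choices, not taken from the studies cited, and the sketch covers only the no-regressor case, so it does not reproduce the Judson & Owen (1999) designs.

```python
import numpy as np


def nickell_bias(delta, T):
    """Large-N limit of (delta_FE - delta) for y_it = delta*y_{i,t-1} + mu_i + u_it,
    with T regression periods and a stationary start (Nickell, 1981)."""
    if not -1 < delta < 1:
        raise ValueError("the formula requires |delta| < 1")
    a = 1 - (1 - delta ** T) / (T * (1 - delta))
    num = -(1 + delta) / (T - 1) * a
    den = 1 - 2 * delta / ((1 - delta) * (T - 1)) * a
    return num / den


def simulate_fe_estimate(delta, T, N, burn=50, seed=0):
    """One Monte Carlo draw of the FE (within) estimate of delta."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=N)                                  # individual effects mu_i
    y = mu / (1 - delta) + rng.normal(size=N) / np.sqrt(1 - delta ** 2)
    path = np.empty((N, T + 1))                              # y_i0, ..., y_iT
    for t in range(burn + T + 1):
        y = delta * y + mu + rng.normal(size=N)
        if t >= burn:
            path[:, t - burn] = y
    lag = path[:, :-1] - path[:, :-1].mean(axis=1, keepdims=True)
    cur = path[:, 1:] - path[:, 1:].mean(axis=1, keepdims=True)
    return float((lag * cur).sum() / (lag ** 2).sum())


for T in (5, 10, 30):
    print(f"T={T:2d}  Nickell limit={nickell_bias(0.5, T):+.3f}")
print("one Monte Carlo draw, T=5:", simulate_fe_estimate(0.5, T=5, N=20000) - 0.5)
```

For $\delta = 0$ the expression reduces to $-1/T$. In the cases computed here its magnitude falls as T grows and rises with $\delta$, which matches the pattern described above.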
Judson & Owen (1999) recommend the corrected FE estimator proposed by Kiviet (1995) as the best choice, followed by GMM, and, for long panels, the computationally simpler Anderson & Hsiao (1982) estimator. This last estimator first differences the data to get rid of the individual effects and then uses lagged predetermined variables in levels as instruments.5 Arellano & Bond (1991) proposed GMM procedures that are more efficient than the Anderson & Hsiao (1982) estimator. Ahn & Schmidt (1995) derive additional nonlinear moment restrictions not exploited by the Arellano & Bond (1991) GMM estimator.6 Ahn & Schmidt (1995, 1997) also give a complete count of the set of orthogonality conditions corresponding to a variety of assumptions imposed on the disturbances and the initial conditions of the dynamic panel data model. While many of the moment conditions are nonlinear in the parameters, Ahn & Schmidt (1997) propose a linearized GMM estimator that is asymptotically as efficient as the nonlinear GMM estimator. They also provide simple moment tests of the validity of these nonlinear restrictions. In addition, they investigate the circumstances under which the optimal GMM estimator is equivalent to a
linear instrumental variable estimator. They find that these circumstances are quite restrictive and go beyond uncorrelatedness and homoskedasticity of the errors. Ahn & Schmidt (1995) provide some evidence on the efficiency gains from the nonlinear moment conditions, which supports their use in practice. By employing all these conditions, the resulting GMM estimator is asymptotically efficient and has the same asymptotic variance as the MLE under normality. In fact, Hahn (1997) considers the asymptotically efficient estimation of the dynamic panel data model with sequential moment restrictions in an environment with i.i.d. observations, and shows that GMM based on a set of instruments that grows as $N \to \infty$ attains the semiparametric efficiency bound of the model. Hahn (1997) explains how Fourier series or polynomials may be used as the set of instruments for efficient estimation. In a limited Monte Carlo comparison, Hahn finds that this estimator has finite sample properties similar to those of the Keane & Runkle (1992) and/or Schmidt et al. (1992) estimators when the latter estimators are efficient. In cases where the latter estimators are not efficient, the Hahn efficient estimator outperforms both estimators in finite samples. Recently, Wansbeek & Bekker (1996) considered a simple dynamic panel data model with no exogenous regressors and with disturbances $u_{it}$ and random effects $\mu_i$ that are independent and normally distributed. They derived an expression for the optimal instrumental variable estimator, i.e. one with minimal asymptotic variance. A striking result is the difference in efficiency between the IV and ML estimators. They find that for regions of the autoregressive parameter which are likely in practice, ML is superior.
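To make the contrast between the Anderson & Hsiao (1982) and Arellano & Bond (1991) estimators concrete, here is a minimal NumPy sketch for the pure autoregression $y_{it} = \delta y_{i,t-1} + \mu_i + u_{it}$ with i.i.d. errors and no exogenous regressors. Both estimators first difference away $\mu_i$. The Anderson-Hsiao version instruments $\Delta y_{i,t-1}$ with the single level $y_{i,t-2}$; the one-step Arellano-Bond version uses all available lagged levels $y_{i1}, \ldots, y_{i,t-2}$ for each differenced equation, weighted by the tridiagonal covariance pattern of $\Delta u_{it}$. The simulation design and all names are hypothetical, and standard errors, two-step weighting and exogenous regressors are left out.

```python
import numpy as np


def simulate_ar1_panel(delta, N, T, sigma_mu=1.0, burn=50, seed=0):
    """Hypothetical DGP: y_it = delta*y_{i,t-1} + mu_i + u_it, i.i.d. N(0, 1) errors."""
    rng = np.random.default_rng(seed)
    mu = sigma_mu * rng.normal(size=N)
    y = mu / (1 - delta)
    out = np.empty((N, T))
    for t in range(burn + T):
        y = delta * y + mu + rng.normal(size=N)
        if t >= burn:
            out[:, t - burn] = y
    return out                                       # columns are periods 1..T


def anderson_hsiao(y):
    """IV on first differences: instrument dy_{i,t-1} with the level y_{i,t-2}."""
    dy = np.diff(y, axis=1)                          # dy[:, k] = y_{k+1} - y_k (0-based)
    z = y[:, :-2]                                    # y_{t-2}
    return float((z * dy[:, 1:]).sum() / (z * dy[:, :-1]).sum())


def arellano_bond_one_step(y):
    """One-step difference GMM; the equation for period t (0-based, t >= 2)
    uses the levels y_0, ..., y_{t-2} as instruments."""
    N, T = y.shape
    dy = np.diff(y, axis=1)
    n_eq = T - 2
    n_inst = (T - 2) * (T - 1) // 2
    # covariance pattern of the differenced errors under i.i.d. u_it
    H = 2 * np.eye(n_eq) - np.eye(n_eq, k=1) - np.eye(n_eq, k=-1)
    A = np.zeros((n_inst, n_inst))
    zx = np.zeros(n_inst)
    zy = np.zeros(n_inst)
    for i in range(N):
        Z = np.zeros((n_eq, n_inst))                 # block-diagonal instrument matrix
        col = 0
        for r, t in enumerate(range(2, T)):
            Z[r, col:col + t - 1] = y[i, :t - 1]
            col += t - 1
        A += Z.T @ H @ Z
        zx += Z.T @ dy[i, :-1]                       # regressor dy_{t-1}
        zy += Z.T @ dy[i, 1:]                        # dependent variable dy_t
    W = np.linalg.pinv(A)
    return float((zx @ W @ zy) / (zx @ W @ zx))


y = simulate_ar1_panel(delta=0.5, N=5000, T=6)
print("Anderson-Hsiao:", anderson_hsiao(y))
print("Arellano-Bond :", arellano_bond_one_step(y))
```

With a moderate $\delta$ and a large N both estimates should be close to the true value; the weak-instrument problems of the differenced estimators become visible as $\delta$ approaches one.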
The gap between IV (or GMM) and ML can be narrowed by adding moment restrictions of the type considered by Ahn & Schmidt (1995). Hence, Wansbeek & Bekker (1996) find support for adding these nonlinear moment restrictions and warn against the loss in efficiency, relative to MLE, incurred by ignoring them. Blundell & Bond (1998) revisit the importance of exploiting the initial condition in generating efficient estimators of the dynamic panel data model when T is small. They consider a simple autoregressive panel data model with no exogenous regressors

$$y_{it} = \delta y_{i,t-1} + \mu_i + u_{it} \qquad (35)$$

with $E(\mu_i) = 0$, $E(u_{it}) = 0$ and $E(\mu_i u_{it}) = 0$ for i = 1, 2, ..., N; t = 1, 2, ..., T. Blundell & Bond (1998) focus on the case where T = 3, and therefore there is
only one orthogonality condition, given by $E(y_{i1}\Delta u_{i3}) = 0$, so that $\delta$ is just-identified. In this case, the first stage IV regression is obtained by running $\Delta y_{i2}$ on $y_{i1}$. Note that this regression can be obtained from (35) evaluated at t = 2 by subtracting $y_{i1}$ from both sides of the equation, i.e.

$$\Delta y_{i2} = (\delta - 1) y_{i1} + \mu_i + u_{i2} \qquad (36)$$

Since we expect $E(y_{i1}\mu_i) > 0$, $(\hat\delta - 1)$ will be biased upwards with

$$\operatorname{plim}(\hat\delta - 1) = (\delta - 1)\,\frac{c}{c + (\sigma^2_\mu/\sigma^2_u)} \qquad (37)$$

where $c = (1 - \delta)/(1 + \delta)$. The bias term effectively scales the estimated coefficient on the instrumental variable $y_{i1}$ towards zero. They also find that the F-statistic of the first stage IV regression converges to $\chi^2_1$ with noncentrality parameter

$$\tau = \frac{(\sigma^2_u c)^2}{\sigma^2_\mu + \sigma^2_u c} \to 0 \quad \text{as } \delta \to 1 \qquad (38)$$
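The two Blundell & Bond (1998) expressions for T = 3 are easy to evaluate numerically: the first-stage coefficient on $y_{i1}$ has probability limit $(\delta-1)c/(c+\sigma^2_\mu/\sigma^2_u)$, and the first-stage noncentrality parameter is $\tau = (\sigma^2_u c)^2/(\sigma^2_\mu + \sigma^2_u c)$, with $c = (1-\delta)/(1+\delta)$. The short Python sketch below simply codes these two formulas (the function names are hypothetical) so that one can see how quickly the instrument weakens as $\delta \to 1$ or as $\sigma^2_\mu/\sigma^2_u$ grows.

```python
def bb_c(delta):
    """c = (1 - delta) / (1 + delta)."""
    return (1 - delta) / (1 + delta)


def first_stage_plim(delta, var_ratio):
    """plim of the first-stage coefficient on y_i1; var_ratio = sigma2_mu / sigma2_u."""
    c = bb_c(delta)
    return (delta - 1) * c / (c + var_ratio)


def concentration(delta, s2_mu, s2_u):
    """Noncentrality (concentration) parameter tau of the first-stage F-statistic."""
    c = bb_c(delta)
    return (s2_u * c) ** 2 / (s2_mu + s2_u * c)


for d in (0.0, 0.5, 0.8, 0.9, 0.99):
    print(f"delta={d:4.2f}  plim={first_stage_plim(d, 1.0):+.4f}  "
          f"tau={concentration(d, 1.0, 1.0):.4f}")
```

With $\sigma^2_\mu = \sigma^2_u = 1$, for example, $\tau$ equals 0.5 at $\delta = 0$ and shrinks towards zero as $\delta$ approaches one; with no individual effects ($\sigma^2_\mu = 0$) the first-stage coefficient is unbiased.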
As $\tau \to 0$, the instrumental variable estimator performs poorly. Hence, Blundell and Bond attribute the bias and the poor precision of the first difference GMM estimator to the problem of weak instruments described in Nelson & Startz (1990) and Staiger & Stock (1997), and characterize this weak IV by its concentration parameter $\tau$. Next, Blundell & Bond (1998) show that an additional mild stationarity restriction on the initial conditions process allows the use of an extended system GMM estimator that uses lagged differences of $y_{it}$ as instruments for equations in levels, in addition to lagged levels of $y_{it}$ as instruments for equations in first differences; see Arellano & Bover (1995). The system GMM estimator is shown to have dramatic efficiency gains over the basic first difference GMM as $\delta \to 1$ and as $(\sigma^2_\mu/\sigma^2_u)$ increases. In fact, for T = 4 and $(\sigma^2_\mu/\sigma^2_u) = 1$, the asymptotic variance ratio of the first difference GMM estimator to this system GMM estimator is 1.75 for $\delta = 0$, and increases to 3.26 for $\delta = 0.5$ and 55.4 for $\delta = 0.9$. This clearly demonstrates that the levels restrictions suggested by Arellano & Bover (1995) remain informative in cases where first differenced instruments become weak. Things improve for first difference GMM as T increases. However, with short T and persistent series, the Blundell and Bond findings support the use of the extra moment conditions. These results are reviewed and corroborated in Blundell, Bond & Windmeijer (2000) in this volume, using Monte Carlo experiments as well as an empirical example. In fact, simulations that include weakly exogenous covariates find large finite sample bias and very low precision for the standard first differenced
estimator. However, the system GMM estimator not only improves the precision but also reduces the finite sample bias. The empirical application revisits the estimates of the capital and labor coefficients in a Cobb-Douglas production function considered by Griliches & Mairesse (1998). Using data on 509 R&D performing US manufacturing companies observed over 8 years (1982–1989), the standard GMM estimator that uses moment conditions on the first differenced model finds a low estimate of the capital coefficient and low precision for all coefficients estimated. However, the system GMM estimator gives reasonable and more precise estimates of the capital coefficient, and constant returns to scale is not rejected. Blundell et al. conclude that a careful examination of the original series and consideration of the system GMM estimator can usefully overcome many of the disappointing features of the standard GMM estimator for dynamic panel models. Hahn (1999) also examines the role of the initial condition imposed by the Blundell & Bond (1998) estimator. This is done by numerically comparing the semiparametric information bounds for the case that incorporates the stationarity of the initial condition and the case that does not. Hahn (1999) finds that the efficiency gain can be substantial. Ziliak (1997) asks whether the bias/efficiency trade-off for the GMM estimator considered by Tauchen (1986) for the time series case is still binding in panel data, where the sample size is normally larger than 500. For time series data, Tauchen (1986) shows that even for T = 50 or 75 there is a bias/efficiency trade-off as the number of moment conditions increases. Therefore, Tauchen recommends the use of sub-optimal instruments in small samples. This result was also corroborated by Andersen & Sørensen (1996), who argue that GMM using too few moment conditions is just as bad as GMM using too many moment conditions.
This problem becomes more pronounced with panel data, since the number of moment conditions increases dramatically as the number of strictly exogenous variables and the number of time series observations increase. Even though it is desirable from an asymptotic efficiency point of view to include as many moment conditions as possible, it may be infeasible or impractical to do so in many cases. For example, T = 10 and five strictly exogenous regressors generate 500 moment conditions for GMM. Ziliak (1997) performs an extensive set of Monte Carlo experiments for a dynamic panel data model and finds that the same trade-off between bias and efficiency exists for GMM as the number of moment conditions increases, and that one is better off with sub-optimal instruments. In fact, Ziliak finds that GMM performs well with suboptimal instruments, but is not recommended for panel data applications when all the moments are exploited for estimation.7 Ziliak estimates a life cycle labor supply model under uncertainty based on 532
men observed over 10 years of data (1978–1987) from the Panel Study of Income Dynamics. The sample was restricted to continuously married, continuously working prime age men aged 22–51 in 1978. These men were paid an hourly wage or salaried and could not be piece-rate workers or self-employed. Ziliak finds that the downward bias of GMM is quite severe as the number of moment conditions expands, outweighing the gains in efficiency. Ziliak reports estimates of the intertemporal substitution elasticity, which is the focal point of interest in the labor supply literature. This measures the intertemporal changes in hours of work due to an anticipated change in the real wage. For GMM, this estimate changes from 0.519 to 0.093 when the number of moment conditions used in GMM is increased from 9 to 212. The standard error of this estimate drops from 0.36 to 0.07. Ziliak attributes this bias to the correlation between the sample moments used in estimation and the estimated weight matrix. Interestingly, Ziliak finds that the forward filter 2SLS estimator proposed by Keane & Runkle (1992) performs best in terms of the bias/efficiency trade-off and is recommended. Forward filtering eliminates all forms of serial correlation while still maintaining orthogonality with the initial instrument set. Schmidt, Ahn & Wyhowski (1992) argued that filtering is irrelevant if one exploits all sample moments during estimation. However, in practice, the number of moment conditions increases with the number of time periods T and the number of regressors K and can become computationally intractable. In fact, for T = 15 and K = 10, the number of moment conditions for Schmidt et al. (1992) is T(T − 1)K/2, which amounts to 1050 restrictions, highlighting the computational burden of this approach.
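The forward filter can be sketched as follows, assuming the T × T covariance matrix $\Sigma$ of $u_{it}$ is known (in practice it is estimated). One way to build such a filter, and the one assumed here, is to take the upper triangular Cholesky factor F of $\Sigma^{-1}$, so that $F\Sigma F' = I$. Because F is upper triangular, the filtered error in period t combines only the current and future errors, so instruments that are predetermined as of period t stay orthogonal to it. The AR(1) error covariance and the names in this NumPy sketch are hypothetical.

```python
import numpy as np


def forward_filter(sigma):
    """Upper triangular F with F @ sigma @ F.T = I. Row t of F has nonzero weights
    only on periods t, ..., T, so the filtered error at t uses current and future errors."""
    L = np.linalg.cholesky(np.linalg.inv(sigma))      # inv(sigma) = L @ L.T, L lower
    return L.T                                        # upper triangular factor


# Hypothetical error covariance: stationary AR(1) errors, rho = 0.6, T = 5 periods
T, rho = 5, 0.6
idx = np.arange(T)
sigma = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)
F = forward_filter(sigma)
print(np.round(F @ sigma @ F.T, 10))                  # identity: serial correlation removed
```

The filtered equation would then be estimated by 2SLS with the original predetermined instruments; that step is omitted here.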
In addition, Ziliak argues that the overidentifying restrictions are less likely to be satisfied, possibly due to the weak correlation between the instruments and the endogenous regressors.8 In this case, the forward filter 2SLS estimator is desirable, yielding less bias than GMM and sizeable gains in efficiency. In fact, for the life cycle labor example, the forward filter 2SLS estimate of the intertemporal substitution elasticity was 0.135 for 9 moment conditions compared to 0.296 for 212 moment conditions. The standard error of these estimates dropped from 0.32 to 0.09. The practical problem of not being able to use more moment conditions, as well as the statistical problem of the trade-off between small sample bias and efficiency, prompted Ahn & Schmidt (1999a) to pose the following questions: Under what conditions can we use a smaller set of moment conditions without incurring any loss of asymptotic efficiency? In other words, under what conditions are some moment conditions redundant in the sense that utilizing them does not improve efficiency? These questions were first dealt with by Im, Ahn, Schmidt & Wooldridge (1999), who considered panel data models with strictly exogenous explanatory variables. They argued that, for example, with
ten strictly exogenous time-varying variables and six time periods, the number of moment conditions available for the random effects (RE) model is 360, and this reduces to 300 moment conditions for the FE model. GMM utilizing all these moment conditions leads to an efficient estimator. However, these moment conditions exceed what the simple RE and FE estimators use. Im et al. (1999) provide the assumptions under which this efficient GMM estimator reduces to the simpler FE or RE estimator. In other words, Im et al. (1999) show the redundancy of the moment conditions that these simple estimators do not use. Ahn & Schmidt (1999a) provide a more systematic method by which redundant instruments can be found and generalize this result to models with time-varying individual effects. However, both papers deal only with strictly exogenous regressors. In a related paper, Ahn & Schmidt (1999b) consider the cases of strictly and weakly exogenous regressors. They show that the GMM estimator takes the form of an instrumental variables estimator if the assumption of no conditional heteroskedasticity (NCH) holds. Under this assumption, the efficiency of standard estimators can often be established by showing that the moment conditions not utilized by these estimators are redundant. However, Ahn & Schmidt (1999b) conclude that the NCH assumption necessarily fails if the full set of moment conditions for the dynamic panel data model is used. In this case, there is clearly a need to find modified versions of GMM, with a reduced set of moment conditions, that lead to estimates with reasonable finite sample properties. Crepon, Kramarz & Trognon (1997) argue that for the dynamic panel data model, when one considers a set of orthogonality conditions, the parameters can be divided into parameters of interest (like $\delta$) and nuisance parameters (like the second order terms in the autoregressive error component model).
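The moment condition counts quoted in this section follow from simple formulas. The Python sketch below assumes one natural counting convention for each case; this is our reading, not necessarily the exact bookkeeping of the cited papers. With strictly exogenous regressors every equation may use all T periods of the K regressors, giving T·T·K conditions in levels and (T − 1)·T·K after first differencing. The Schmidt et al. (1992) formula T(T − 1)K/2 is coded exactly as stated in the text (it arises, for instance, if differenced equation t uses $x_{i1}, \ldots, x_{i,t-1}$). For the lagged dependent variable in the Arellano & Bond (1991) setting, differenced equation t uses $y_{i1}, \ldots, y_{i,t-2}$.

```python
def strictly_exog_levels(T, K):
    """Each of the T level equations may use all T*K regressor values."""
    return T * T * K


def strictly_exog_fd(T, K):
    """Each of the T - 1 first-differenced equations may use all T*K regressor values."""
    return (T - 1) * T * K


def schmidt_et_al(T, K):
    """T(T - 1)K/2, e.g. differenced equation t using x_i1, ..., x_{i,t-1}."""
    return T * (T - 1) * K // 2


def arellano_bond_lagged_y(T):
    """Differenced equation t = 3..T uses y_i1, ..., y_{i,t-2}: (T - 1)(T - 2)/2 in total."""
    return (T - 1) * (T - 2) // 2


print("Ziliak example, T=10, K=5:     ", strictly_exog_levels(10, 5))
print("Im et al., RE, T=6, K=10:      ", strictly_exog_levels(6, 10))
print("Im et al., FE, T=6, K=10:      ", strictly_exog_fd(6, 10))
print("Schmidt et al., T=15, K=10:    ", schmidt_et_al(15, 10))
print("Arellano-Bond lagged y, T=3:   ", arellano_bond_lagged_y(3))
```

Under these conventions the formulas reproduce 500, 360 and 300, give 1050 for T = 15 and K = 10, and yield a single condition for T = 3, matching the just-identified Blundell & Bond (1998) case discussed earlier.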
They show that the elimination of such nuisance parameters using their empirical counterparts does not entail an efficiency loss when only the parameters of interest are estimated. In fact, Sevestre and Trognon, in chapter 6 of Matyas & Sevestre (1996), argue that if one is only interested in $\delta$, then one can reduce the number of orthogonality restrictions without loss in efficiency as far as $\delta$ is concerned. However, the estimates of the other nuisance parameters are not generally as efficient as those obtained from the full set of orthogonality conditions. The Alonso-Borrego & Arellano (1999) paper is also motivated by the finite sample bias in panel data instrumental variable estimators when the instruments are weak. The dynamic panel model generates many overidentifying restrictions even for moderate values of T. Also, the number of instruments increases with T, but the quality of these instruments is often poor because they tend to be only weakly correlated with first differenced
endogenous variables that appear in the equation. Limited information maximum likelihood (LIML) is strongly preferred to 2SLS if the number of instruments gets large as the sample size tends to infinity. Hillier (1990) showed that the alternative normalization rules adopted by LIML and 2SLS are at the root of their different sampling behavior. Hillier (1990) also showed that a symmetrically normalized 2SLS estimator has properties similar to those of LIML. Following Hillier (1990), Alonso-Borrego & Arellano (1999) derive a symmetrically normalized GMM (SNM) estimator and compare it with ordinary GMM and LIML analogues by means of simulations. Monte Carlo and empirical results show that GMM can exhibit large biases when the instruments are poor, while LIML and SNM remain nearly unbiased. However, LIML and SNM always had a larger interquartile range than GMM. For T = 4, N = 100, $\sigma^2_\mu = 0.2$ and $\sigma^2_u = 1$, the bias for $\delta = 0.5$ was 6.9% for GMM, 1.7% for SNM and 1.7% for LIML. This bias increases to 17.8% for GMM, 3.7% for SNM and 4.1% for LIML for $\delta = 0.8$. Alvarez & Arellano (1997) studied the asymptotic properties of FE, one-step GMM and non-robust LIML for a first-order autoregressive model when both N and T tend to infinity with $(N/T) \to c$ for $0 \le c < 2$. For T < N, the GMM bias is always smaller than the FE bias, and the LIML bias is smaller than the other two. In a fixed T framework, GMM and LIML are asymptotically equivalent, but as T increases, LIML has a smaller asymptotic bias than GMM. These results provide some theoretical support for LIML over GMM.9 Wansbeek & Knaap (1999) consider a simple dynamic panel data model with a time trend and heterogeneous coefficients on the lagged dependent variable and the time trend, i.e.

$$y_{it} = \delta_i y_{i,t-1} + \eta_i t + \mu_i + u_{it} \qquad (39)$$
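The double-differencing step used by Wansbeek & Knaap (1999) can be checked numerically. Writing the individual effect as $\mu_i$ and the unit-specific trend coefficient as $\eta_i$ (the notation is ours), a first difference of $\mu_i + \eta_i t$ leaves $\eta_i$, and a second difference removes it as well, so that $\Delta^2 y_{it} = \delta_i \Delta^2 y_{i,t-1} + \Delta^2 u_{it}$. The small NumPy sketch below, with randomly drawn hypothetical values, verifies only this algebra; it does not implement the modified OLS, IV, GMM or LIML estimators of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 8
mu = rng.normal(size=(N, 1))           # individual effects mu_i (hypothetical draws)
eta = rng.normal(size=(N, 1))          # heterogeneous trend coefficients eta_i
t = np.arange(1, T + 1)
nuisance = mu + eta * t                # mu_i + eta_i * t for every (i, t)

d1 = np.diff(nuisance, axis=1)         # first difference: mu_i drops out, eta_i remains
d2 = np.diff(nuisance, n=2, axis=1)    # second difference: eta_i drops out as well
print(np.allclose(d1, eta), np.allclose(d2, 0.0))   # True True
```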
This model results from Islam's (1995) version of Solow's model of growth convergence among countries. Wansbeek & Knaap (1999) show that double differencing gets rid of the individual country effects ($\mu_i$) in the first round of differencing and of the heterogeneous coefficient on the time trend ($\eta_i$) in the second round. Modified OLS, IV and GMM methods are adapted to this model, and LIML is suggested as a viable alternative to GMM to guard against the small sample bias of GMM. Macroeconomic data are subject to measurement error, and Wansbeek & Knaap (1999) show how these estimators can be modified to account for measurement error that is white noise. For example, GMM is modified so that it discards the orthogonality conditions that rely on the absence of measurement error. Jimenez-Martin (1998) performs Monte Carlo experiments to study the performance of the Holtz-Eakin (1988) test for the presence of individual
heterogeneity effects in dynamic small T unbalanced panel data models. The design of the experiment includes both endogenous and time-invariant regressors in addition to the lagged dependent variable. The test behaves correctly for a moderate autoregressive coefficient. However, when this coefficient approaches unity, the presence of an additional regressor sharply affects the power and the size of the test. The power of this test is higher when the variance of the specific effects increases (they are easier to detect), when the sample size increases, when the data set is balanced (for a given number of cross-section units) and when the regressors are strictly exogenous.

A. Heterogeneous Dynamic Panel Data Models

The fundamental assumption underlying pooled homogeneous parameter models has been called into question. Robertson & Symons (1992) warned about the bias from pooled estimators when the estimated model is dynamic and homogeneous when in fact the true model is static and heterogeneous. Pesaran & Smith (1995) argued in favor of dynamic heterogeneous models when N and T are large. In this case, pooled homogeneous estimators are inconsistent, whereas an average estimator of heterogeneous parameters can lead to consistent estimates as N and T tend to infinity. Maddala, Srivastava & Li (1994) argued that shrinkage estimators are superior to either heterogeneous or homogeneous parameter estimates, especially for prediction purposes. In fact, Maddala, Trost, Li & Joutz (1997) considered the problem of estimating short run and long run elasticities of residential demand for electricity and natural gas for each of 49 states over the period 1970–1990.10 They conclude that individual heterogeneous state estimates were hard to interpret and had the wrong signs. Pooled data regressions were not valid because the hypothesis of homogeneity of the coefficients was rejected.
They recommend shrinkage estimators if one is interested in obtaining elasticity estimates for each state, since these give more reliable results. Baltagi & Griffin (1997) compare short run and long run estimates as well as forecasts for pooled homogeneous, individual heterogeneous and shrinkage estimators of a dynamic demand model for gasoline across 18 OECD countries over the period 1960–1990. Based on one, five and ten year forecasts and the plausibility of the short run and long run elasticity estimates, the results are in favor of pooling. Similar results were obtained for a dynamic model of cigarette demand across 46 states over the period 1963–1992; see Baltagi, Griffin & Xiong (2000). In chapter 8 of Matyas & Sevestre (1996), Pesaran, Smith & Im investigated the small sample properties of various estimators of the long run coefficients
for a dynamic heterogeneous panel data model using Monte Carlo experiments. Their findings indicate that the mean group estimator performs reasonably well for large T. However, when T is small, the mean group estimator could be seriously biased, particularly when N is large relative to T. Pesaran & Zhao (1999) examine the effectiveness of alternative bias-correction procedures in reducing the small sample bias of these estimators using Monte Carlo experiments. An interesting finding is that when the coefficient of the lagged dependent variable is greater than or equal to 0.8, none of the bias correction procedures seems to work. Hsiao, Pesaran & Tahmiscioglu (1999) suggest a Bayesian approach for estimating the mean parameters of a dynamic heterogeneous panel data model. The coefficients are assumed to be normally distributed across cross-sectional units, and the Bayes estimator is implemented using Markov Chain Monte Carlo methods. Hsiao et al. argue that Bayesian methods can be a viable alternative in the estimation of mean coefficients in dynamic panel data models even when the initial observations are treated as fixed constants. They establish the asymptotic equivalence of this Bayes estimator and the mean group estimator proposed by Pesaran & Smith (1995). The asymptotics are carried out for both N and $T \to \infty$ with $\sqrt{N}/T \to 0$. Monte Carlo experiments show that this Bayes estimator has better sampling properties than other estimators for both small and moderate size T. Hsiao et al. also caution against the use of the mean group estimator unless T is sufficiently large relative to N. The bias in the mean coefficient of the lagged dependent variable appears to be serious when T is small and the true value of this coefficient is larger than 0.6. Hsiao et al. apply their methods to estimate the q-investment model using a panel of 273 US firms over the period 1972–1993.
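The mean group idea of Pesaran & Smith (1995) is to estimate each unit's equation separately and average the estimates. The NumPy sketch below uses a hypothetical design (a heterogeneous AR(1) with unit-specific intercepts and $\delta_i$ drawn uniformly, estimated by unit-by-unit OLS) and contrasts the mean group estimate with a pooled within estimator that imposes a common $\delta$. It is only meant to make the small-T caution above tangible: each unit-level OLS estimate of $\delta_i$ is biased downwards when T is short, and that bias carries over to the average.

```python
import numpy as np


def simulate_heterogeneous_ar1(N, T, low=0.2, high=0.6, burn=50, seed=0):
    """Hypothetical DGP: y_it = a_i + delta_i * y_{i,t-1} + u_it, delta_i ~ U(low, high)."""
    rng = np.random.default_rng(seed)
    delta = rng.uniform(low, high, size=N)
    a = rng.normal(size=N)
    y = a / (1 - delta)
    out = np.empty((N, T))
    for t in range(burn + T):
        y = a + delta * y + rng.normal(size=N)
        if t >= burn:
            out[:, t - burn] = y
    return out, delta


def mean_group(y):
    """Unit-by-unit OLS of y_it on (1, y_{i,t-1}); average the slope estimates."""
    slopes = []
    for yi in y:
        X = np.column_stack([np.ones(len(yi) - 1), yi[:-1]])
        coef, *_ = np.linalg.lstsq(X, yi[1:], rcond=None)
        slopes.append(coef[1])
    return float(np.mean(slopes))


def pooled_fe(y):
    """Homogeneous within estimator that ignores slope heterogeneity."""
    lag = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True)
    cur = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)
    return float((lag * cur).sum() / (lag ** 2).sum())


for T in (10, 200):
    y, delta = simulate_heterogeneous_ar1(N=50, T=T)
    print(f"T={T:3d}  mean delta_i={delta.mean():.3f}  "
          f"MG={mean_group(y):.3f}  pooled FE={pooled_fe(y):.3f}")
```

The two printed rows should show the mean group bias shrinking as T grows. The pooled estimate need not approach the simple average of the $\delta_i$ even for long T, since it ignores the slope heterogeneity.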
VII. CONCLUSION
This survey gives a brief overview of some of the main results in the econometrics of nonstationary panels as well as recent developments in dynamic panels. There has been an immense amount of research in this area recently with the demand for empirical studies exceeding the supply of econometric theory developed for these models. As this survey indicates, several issues have been resolved but a lot remains to be done.
ACKNOWLEDGMENTS
The authors would like to thank R. Carter Hill, M. H. Pesaran and an anonymous referee for their helpful comments and suggestions. Baltagi was
funded by the Advanced Research Program, Texas Higher Education Board. Kao was supported by a grant from the Chiang Ching-kuo Foundation for International Scholarly Exchange.
NOTES
1. A collection of dynamic panel data routines can be found at: http://www.cemfi.es/~arellano/#dpd.
2. Chiang & Kao (2000) have recently put together a fairly comprehensive set of subroutines, NPT 1.0, for studying nonstationary panel data. NPT 1.0 can be downloaded from http://web.syr.edu/~cdkao.
3. Testing for cointegration in panel data by combining p-values is a straightforward extension of the testing procedures in this section. For cointegration tests, the relevant model is equation (15). We let $G_{iT_i}$ be a test for the null of no cointegration and apply the same tests and asymptotic theory in this section.
4. Kiviet (1999) extends this derivation to the case of weakly exogenous variables and examines to what degree this order of approximation is determined by the initial conditions of the dynamic panel model.
5. Arellano (1989) found that using lagged differences of predetermined variables as instruments is not recommended, since the resulting estimator has a singularity point and very large variances over a significant range of the parameter values.
6. See also Arellano & Bover (1995), chapter 8 of Baltagi (1995) and chapters 6 and 7 of Matyas & Sevestre (1996) for more details.
7. For a Hausman & Taylor (1981) type model, Metcalf (1996) shows that using fewer instruments may lead to a more powerful Hausman specification test. Asymptotically, more instruments lead to more efficient estimators. However, the asymptotic bias of the less efficient estimator will also be greater as the null hypothesis of no correlation is violated. Metcalf argues that if the bias increases at the same rate as the variance (as the null is violated) for the less efficient estimator, then the power of the Hausman test will increase. This is due to the fact that the test statistic is linear in variance but quadratic in bias.
8. See the growing literature on weak instruments by Nelson & Startz (1990), Bekker (1994), Angrist & Krueger (1995), Bound, Jaeger & Baker (1995) and Staiger & Stock (1997), to mention a few.
9. An alternative one-step method that achieves the same asymptotic efficiency as robust GMM or LIML estimators is the maximum empirical likelihood estimation method; see Imbens (1997). This maximizes a multinomial pseudo-likelihood function subject to the orthogonality restrictions. These estimators are invariant to normalization because they are maximum likelihood estimators.
10. Maddala et al. (1997) also provide a unified treatment of classical, Bayes and empirical Bayes procedures for estimating this model.
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics, 68, 5–27.
Ahn, S. C., & Schmidt, P. (1997). Efficient Estimation of Dynamic Panel Data Models: Alternative Assumptions and Simplified Estimation. Journal of Econometrics, 76, 309–321.
Ahn, S. C., & Schmidt, P. (1999a). Modified Generalized Instrumental Variables Estimation of Panel Data Models with Strictly Exogenous Instrumental Variables. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models (pp. 171–198). Cambridge: Cambridge University Press.
Ahn, S. C., & Schmidt, P. (1999b). Estimation of Linear Panel Data Models Using GMM. In: Generalized Method of Moments Estimation (pp. 211–247). Cambridge: Cambridge University Press.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalized Instrumental Variable Estimation Using Panel Data. Journal of Business and Economic Statistics, 17, 36–49.
Alvarez, J., & Arellano, M. (1997). The Time Series and Cross-section Asymptotics of Dynamic Panel Data Estimators. Working paper, CEMFI, Madrid.
Andersen, T. G., & Sørensen, R. E. (1996). GMM Estimation of a Stochastic Volatility Model: A Monte Carlo Study. Journal of Business and Economic Statistics, 14, 328–352.
Anderson, T. W., & Hsiao, C. (1982). Formulation and Estimation of Dynamic Models Using Panel Data. Journal of Econometrics, 18, 47–82.
Andersson, J., & Lyhagen, J. (1999). A Long Memory Panel Unit Root Test: PPP Revisited. Working paper, Economics and Finance, No. 303, Stockholm School of Economics, Sweden.
Angrist, J. D., & Krueger, A. B. (1995). Split Sample Instrumental Variable Estimates of Return to Schooling. Journal of Business and Economic Statistics, 13, 225–235.
Arellano, M. (1989). A Note on the Anderson-Hsiao Estimator for Panel Data. Economics Letters, 31, 337–341.
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. Review of Economic Studies, 58, 277–297.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental Variables Estimation of Error-Component Models. Journal of Econometrics, 68, 29–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley.
Baltagi, B. H., & Griffin, J. M. (1995). A Dynamic Demand Model for Liquor: The Case for Pooling. Review of Economics and Statistics, 77, 545–553.
Baltagi, B. H., & Griffin, J. M. (1997). Pooled Estimators vs. Their Heterogeneous Counterparts in the Context of Dynamic Demand for Gasoline. Journal of Econometrics, 77, 303–327.
Baltagi, B. H., Griffin, J. M., & Xiong, W. (2000). To Pool or Not to Pool: Homogeneous Versus Heterogeneous Estimators Applied to Cigarette Demand. Review of Economics and Statistics, 82, 117–126.
Banerjee, A. (1999). Panel Data Unit Roots and Cointegration: An Overview. Oxford Bulletin of Economics and Statistics, 61, 607–629.
Bekker, P. A. (1994). Alternative Approximations to the Distributions of Instrumental Variables Estimators. Econometrica, 62, 657–682.
Bernard, A., & Jones, C. (1996). Productivity Across Industries and Countries: Time Series Theory and Evidence. Review of Economics and Statistics, 78, 135–146.
Bhargava, A., Franzini, L., & Narendranathan, W. (1982). Serial Correlation and Fixed Effects Models. Review of Economic Studies, 49, 533–549.
Binder, M., Hsiao, C., & Pesaran, M. H. (2000). Estimation and Inference in Short Panel Vector Autoregressions With Unit Roots and Cointegration. Working paper, Department of Economics, University of Maryland.
Blundell, R. W., & Bond, S. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics, 87, 115–143.
Blundell, R. W., Bond, S., & Windmeijer, F. (2000). Estimation in Dynamic Panel Data Models: Improving on the Performance of the Standard GMM Estimator. Advances in Econometrics, 15, forthcoming.
Boumahdi, R., & Thomas, A. (1991). Testing for Unit Roots Using Panel Data. Economics Letters, 37, 77–79.
Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variables is Weak. Journal of the American Statistical Association, 90, 443–450.
Breitung, J. (2000). The Local Power of Some Unit Root Tests for Panel Data. Advances in Econometrics, 15, forthcoming.
Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different Bargaining Levels Cointegrated? Applied Economics, 26, 353–361.
Canzoneri, M. B., Cumby, E. E., & Diba, B. (1999). Relative Labor Productivity and the Real Exchange Rate in the Long Run: Evidence for a Panel of OECD Countries. Journal of International Economics, 47, 245–266.
Chen, B., McCoskey, S., & Kao, C. (1999). Estimation and Inference of a Cointegrated Regression in Panel Data: A Monte Carlo Study. American Journal of Mathematical and Management Sciences, 19, 75–114.
Chiang, M. H., & Kao, C. (2000). Nonstationary Panel Time Series Using NPT 1.0: A User Guide. Manuscript, Center for Policy Research, Syracuse University.
Choi, I. (1999a). Unit Root Tests for Panel Data. Working paper, Department of Economics, Kookmin University, Korea.
Choi, I. (1999b). Asymptotic Analysis of a Nonstationary Error Component Model. Working paper, Department of Economics, Kookmin University, Korea.
Choi, I. (1999c). Instrumental Variables Estimation of a Nearly Nonstationary Error Component Model. Working paper, Department of Economics, Kookmin University, Korea.
Coakley, J., & Fuertes, A. M. (1997). New Panel Unit Root Tests of PPP. Economics Letters, 57, 17-22.
Coakley, J., Kulasi, F., & Smith, R. (1996). Current Account Solvency and the Feldstein-Horioka Puzzle. Economic Journal, 106, 620-627.
Coe, D., & Helpman, E. (1995). International R&D Spillovers. European Economic Review, 39, 859-887.
Conley, T. G. (1999). GMM Estimation with Cross Sectional Dependence. Journal of Econometrics, 92, 1-45.
Crepon, B., Kramarz, F., & Trognon, A. (1997). Parameters of Interest, Nuisance Parameters and Orthogonality Conditions: An Application to Autoregressive Error Components Models. Journal of Econometrics, 82, 135-156.
Culver, S. E., & Papell, D. H. (1997). Is There a Unit Root in the Inflation Rate? Evidence from Sequential Break and Panel Data Model. Journal of Applied Econometrics, 35, 155-160.
Driscoll, J. C., & Kraay, A. C. (1998). Consistent Covariance Matrix Estimation with Spatially Dependent Panel Data. Review of Economics and Statistics, 80, 549-560.
Entorf, H. (1997). Random Walks with Drifts: Nonsense Regression and Spurious Fixed-Effect Estimation. Journal of Econometrics, 80, 287-296.
Evans, P., & Karras, G. (1996). Convergence Revisited. Journal of Monetary Economics, 37, 249-265.
Frankel, J. A., & Rose, A. K. (1996). A Panel Project on Purchasing Power Parity: Mean Reversion Within and Between Countries. Journal of International Economics, 40, 209-224.
Funk, M. (1998). Trade and International R&D Spillovers Among OECD Countries. Working paper, Department of Economics, St. Louis University, St. Louis.
Gerdtham, U. G., & Löthgren, M. (1998). On Stationarity and Cointegration of International Health Expenditure and GDP. Working paper, Economics and Finance, No. 232, Stockholm School of Economics, Sweden.
Griliches, Z., & Mairesse, J. (1998). Production Functions: The Search for Identification. In: S. Strom (Ed.), Essays in Honour of Ragnar Frisch. Econometric Society Monograph Series, Cambridge: Cambridge University Press.
Groen, J. J. J. (1999). The Monetary Exchange Rate Model as A Long-run Phenomenon. Journal of International Economics, forthcoming.
Groen, J. J. J., & Kleibergen, F. (1999). Likelihood-Based Cointegration Analysis in Panels of Vector Error Correction Models. Discussion paper 99055/4, Tinbergen Institute, The Netherlands.
Hadri, K. (1999). Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root in Panel Data with Serially Correlated Errors. Manuscript, Department of Economics and Accounting, University of Liverpool, United Kingdom.
Hahn, J. (1997). Efficient Estimation of Panel Data Models With Sequential Moment Restrictions. Journal of Econometrics, 79, 1-21.
Hahn, J. (1999). How Informative is the Initial Condition in the Dynamic Panel Model with Fixed Effects? Journal of Econometrics, 93, 309-326.
Hall, S., Lazarova, S., & Urga, G. (1999). A Principal Components Analysis of Common Stochastic Trends in Heterogeneous Panel Data: Some Monte Carlo Evidence. Oxford Bulletin of Economics and Statistics, 61, 749-767.
Harris, D., & Inder, B. (1994). A Test of the Null Hypothesis of Cointegration. In: C. P. Hargreaves (Ed.), Nonstationary Time Series Analysis and Cointegration. New York: Oxford University Press.
Harris, R. D. F., & Tzavalis, E. (1999). Inference for Unit Roots in Dynamic Panels Where the Time Dimension is Fixed. Journal of Econometrics, 91, 201-226.
Hausman, J. A., & Taylor, W. E. (1981). Panel Data and Unobservable Individual Effects. Econometrica, 49, 1377-1398.
Hillier, G. H. (1990). On the Normalization of Structural Equations: Properties of Direction Estimators. Econometrica, 58, 1181-1194.
Holtz-Eakin, D. (1988). Testing for Individual Effects in Autoregressive Models. Journal of Econometrics, 39, 297-307.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Hsiao, C., Pesaran, M. H., & Tahmiscioglu, K. (1999). Bayes Estimation of Short-run Coefficients in Dynamic Panel Data Models. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models (pp. 268-296). Cambridge: Cambridge University Press.
Im, K. S., Ahn, S. C., Schmidt, P., & Wooldridge, J. M. (1999). Efficient Estimation of Panel Data Models with Strictly Exogenous Explanatory Variables. Journal of Econometrics, 93, 177-201.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. Manuscript, Department of Applied Economics, University of Cambridge, United Kingdom.
Imbens, G. (1997). One-Step Estimators for Over-identified Generalized Method of Moments Models. Review of Economic Studies, 64, 359-383.
Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, 110, 1127-1170.
Jimenez-Martin, S. (1998). On the Testing of Heterogeneity Effects in Dynamic Unbalanced Panel Data Models. Economics Letters, 58, 157-163.
Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.
Jorion, P., & Sweeney, R. (1996). Mean Reversion in Real Exchange Rates: Evidence and Implications for Forecasting. Journal of International Money and Finance, 15, 535-550.
Judson, R. A., & Owen, A. L. (1999). Estimating Dynamic Panel Data Models: A Guide for Macroeconomists. Economics Letters, 65, 9-15.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1-44.
Kao, C., & Chiang, M. H. (2000). On the Estimation and Inference of a Cointegrated Regression in Panel Data. Advances in Econometrics, 15, forthcoming.
Kao, C., & Chen, B. (1995). On the Estimation and Inference for Cointegration in Panel Data when the Cross-Section and Time-Series Dimensions are Comparable. Manuscript, Center for Policy Research, Syracuse University.
Kao, C., Chiang, M. H., & Chen, B. (1999). International R&D Spillovers: An Application of Estimation and Inference in Panel Cointegration. Oxford Bulletin of Economics and Statistics, 61, 691-709.
Karlsson, S., & Löthgren, M. (1999). On the Power and Interpretation of Panel Unit Root Tests. Working paper, Economics and Finance, No. 299, Stockholm School of Economics, Sweden.
Kauppi, H. (2000). Panel Data Limit Theory and Asymptotic Analysis of a Panel Regression with Near Integrated Regressors. Advances in Econometrics, 15, forthcoming.
Keane, M. P., & Runkle, D. E. (1992). On the Estimation of Panel-data Models with Serial Correlation When Instruments are Not Strictly Exogenous. Journal of Business and Economic Statistics, 10, 1-9.
Kiviet, J. F. (1995). On Bias, Inconsistency and Efficiency of Some Estimators in Dynamic Panel Data Models. Journal of Econometrics, 68, 53-78.
Kiviet, J. F. (1999). Expectations of Expansions for Estimators in a Dynamic Panel Data Model: Some Results for Weakly Exogenous Regressors. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models (pp. 199-225). Cambridge: Cambridge University Press.
Larsson, R., Lyhagen, J., & Löthgren, M. (1998). Likelihood-Based Cointegration Tests In Heterogeneous Panels. Working paper, Economics and Finance, No. 250, Stockholm School of Economics, Sweden.
Lee, K., Pesaran, M. H., & Smith, R. (1997). Growth and Convergence in a Multi-Country Empirical Stochastic Solow Model. Journal of Applied Econometrics, 12, 357-392.
Levin, A., & Lin, C. F. (1992). Unit Root Test in Panel Data: Asymptotic and Finite Sample Properties. Discussion paper No. 92-93, University of California at San Diego.
Lothian, J. R. (1996). Multi-Country Evidence on the Behavior of Purchasing Power Parity Under the Current Float. Journal of International Money and Finance, 16, 19-35.
MacDonald, R. (1996). Panel Unit Root Tests and Real Exchange Rates. Economics Letters, 50, 7-11.
Maddala, G. S. (1999). On the Use of Panel Data Methods with Cross Country Data. Annales d'Economie et de Statistique, 55-56, 429-448.
Maddala, G. S., Srivastava, V. K., & Li, H. (1994). Shrinkage Estimators for the Estimation of Short-run and Long-run Parameters From Panel Data Models. Working paper, Ohio State University, Ohio.
Maddala, G. S., Trost, R. P., Li, H., & Joutz, F. (1997). Estimation of Short-run and Long-run Elasticities of Energy Demand from Panel Data Using Shrinkage Estimators. Journal of Business and Economic Statistics, 15, 90-100.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and A New Simple Test. Oxford Bulletin of Economics and Statistics, 61, 631-652.
Maddala, G. S., Wu, S., & Liu, P. (2000). Do Panel Data Rescue Purchasing Power Parity (PPP) Theory? In: J. Krishnakumar & E. Ronchetti (Eds.), Panel Data Econometrics: Future Directions (pp. 35-51). Amsterdam: North-Holland.
Mátyás, L., & Sevestre, P. (Eds.) (1996). The Econometrics of Panel Data: A Handbook of Theory and Applications. Dordrecht: Kluwer Academic Publishers.
McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel Data. Econometric Reviews, 17, 57-84.
McCoskey, S., & Kao, C. (1999a). Testing the Stability of a Production Function with Urbanization as a Shift Factor: An Application of Non-Stationary Panel Data Techniques. Oxford Bulletin of Economics and Statistics, 61, 671-690.
McCoskey, S., & Kao, C. (1999b). Comparing Panel Data Cointegration Tests with an Application of the Twin Deficits Problems. Working paper, Center for Policy Research, Syracuse University, New York.
McCoskey, S., & Selden, T. (1998). Health Care Expenditures and GDP: Panel Data Unit Root Test Results. Journal of Health Economics, 17, 369-376.
Metcalf, G. E. (1996). Specification Testing in Panel Data with Instrumental Variables. Journal of Econometrics, 71, 291-307.
Moon, H. R., & Phillips, P. C. B. (1998). A Reinterpretation of the Feldstein-Horioka Regressions from a Nonstationary Panel Viewpoint. Working paper, Department of Economics, Yale University.
Moon, H. R., & Phillips, P. C. B. (1999). Maximum Likelihood Estimation in Panels with Incidental Trends. Oxford Bulletin of Economics and Statistics, 61, 711-747.
Nelson, C., & Startz, R. (1990). The Distribution of the Instrumental Variables Estimator and Its t-ratio When the Instrument Is A Poor One. Journal of Business, 63, S125-S140.
Nerlove, M. (1999). Properties of Alternative Estimators of Dynamic Panel Models: An Empirical Analysis of Cross-country Data for the Study of Economic Growth. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models (pp. 136-170). Cambridge: Cambridge University Press.
Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417-1426.
O'Connell, P. G. J. (1998). The Overvaluation of Purchasing Power Parity. Journal of International Economics, 44, 1-19.
Oh, K. Y. (1996). Purchasing Power Parity and Unit Roots Tests Using Panel Data. Journal of International Money and Finance, 15, 405-418.
Papell, D. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float. Journal of International Economics, 43, 313-332.
Pedroni, P. (1996). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997a). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests with an Application to the PPP Hypothesis. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997b). Cross Sectional Dependence in Cointegration Tests of Purchasing Power Parity in Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1999). Critical Values for Cointegration Tests in Heterogeneous Panels with Multiple Regressors. Oxford Bulletin of Economics and Statistics, 61, 653-678.
Pedroni, P. (2000). Testing for Convergence to Common Steady States in Nonstationary Heterogeneous Panels. Working paper, Department of Economics, Indiana University.
Pesaran, M. H., & Smith, R. (1995). Estimating Long-run Relationships From Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79-113.
Pesaran, M. H., Shin, Y., & Smith, R. (1999). Pooled Mean Group Estimation of Dynamic Heterogeneous Panels. Journal of the American Statistical Association, 94, 621-634.
Pesaran, M. H., & Zhao, Z. (1999). Bias Reduction in Estimating Long-run Relationships From Dynamic Heterogeneous Panels. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panels and Limited Dependent Variable Models (pp. 297-322). Cambridge: Cambridge University Press.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99-125.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica, 67, 1057-1111.
Phillips, P. C. B., & Moon, H. (1999b). Nonstationary Panel Data Analysis: An Overview of Some Recent Developments. Econometric Reviews, forthcoming.
Phillips, P. C. B., & Ouliaris, S. (1990). Asymptotic Properties of Residual Based Tests for Cointegration. Econometrica, 58, 165-193.
Quah, D. (1994). Exploiting Cross Section Variation for Unit Root Inference in Dynamic Data. Economics Letters, 44, 9-19.
Quah, D. (1996). Empirics for Economic Growth and Convergence. European Economic Review, 40, 1353-1375.
Robertson, D., & Symons, J. (1992). Some Strange Properties of Panel Data Estimators. Journal of Applied Econometrics, 7, 175-189.
Saikkonen, P. (1991). Asymptotically Efficient Estimation of Cointegrating Regressions. Econometric Theory, 58, 1-21.
Sala-i-Martin, X. (1996). The Classical Approach to Convergence Analysis. Economic Journal, 106, 1019-1036.
Schmidt, P., Ahn, S. C., & Wyhowski, D. (1992). Comment. Journal of Business and Economic Statistics, 10, 10-14.
Shin, Y. (1994). A Residual Based Test of the Null of Cointegration Against the Alternative of No Cointegration. Econometric Theory, 10, 91-115.
Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression With Weak Instruments. Econometrica, 65, 557-586.
Stock, J. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems. Econometrica, 61, 783-820.
Stock, J., & Watson, M. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems. Econometrica, 61, 783-820.
Tauchen, G. (1986). Statistical Properties of Generalized Method of Moments Estimators of Structural Parameters Obtained From Financial Market Data. Journal of Business and Economic Statistics, 4, 397-416.
Wansbeek, T. J., & Bekker, P. (1996). On IV, GMM and ML in a Dynamic Panel Data Model. Economics Letters, 51, 145-152.
Wansbeek, T. J., & Knaap, T. (1999). Estimating a Dynamic Panel Data Model with Heterogenous Trends. Annales d'Economie et de Statistique, 55-56, 331-349.
Wooldridge, J. M. (1997). Multiplicative Panel Data Models Without the Strict Exogeneity Assumption. Econometric Theory, 13, 667-678.
Wu, S., & Yin, Y. (1999). Tests for Cointegration in Heterogeneous Panel: A Monte Carlo Study. Working paper, Department of Economics, State University of New York at Buffalo, New York.
Wu, Y. (1996). Are Real Exchange Rates Nonstationary? Evidence from a Panel Data Set. Journal of Money, Credit and Banking, 28, 54-63.
Ziliak, J. P. (1997). Efficient Estimation with Panel Data When Instruments are Predetermined: An Empirical Comparison of Moment-condition Estimators. Journal of Business and Economic Statistics, 15, 419-431.
ESTIMATION IN DYNAMIC PANEL DATA MODELS: IMPROVING ON THE PERFORMANCE OF THE STANDARD GMM ESTIMATOR
Richard Blundell, Stephen Bond and Frank Windmeijer
ABSTRACT
This chapter reviews developments to improve on the poor performance of the standard GMM estimator for highly autoregressive panel series. It considers the use of the system GMM estimator that relies on relatively mild restrictions on the initial condition process. This system GMM estimator encompasses the GMM estimator based on the non-linear moment conditions available in the dynamic error components model and has substantial asymptotic efficiency gains. Simulations that include weakly exogenous covariates find large finite sample biases and very low precision for the standard first-differenced estimator. The use of the system GMM estimator not only greatly improves the precision but also greatly reduces the finite sample bias. An application to panel production function data for the U.S. is provided and confirms these theoretical and experimental findings.
1. INTRODUCTION
Much of the recent literature on dynamic panel data estimation has focused on providing optimal linear Generalised Method of Moments (GMM) estimators under relatively weak auxiliary assumptions about the exogeneity of the covariate processes and the properties of the heterogeneity and error term processes. A standard approach is to first-difference the equation to remove permanent unobserved heterogeneity, and to use lagged levels of the series as instruments for the predetermined and endogenous variables in first-differences (see Anderson & Hsiao (1981), Holtz-Eakin, Newey & Rosen (1988) and Arellano & Bond (1991)). However, in dynamic panel data models where the series are highly autoregressive and the number of time series observations is moderately small, this standard GMM estimator has been found to have large finite sample bias and poor precision in simulation studies (see the experimental evidence and theoretical discussions in Ahn & Schmidt (1995) and Alonso-Borrego & Arellano (1999), for example). The poor performance of the standard GMM panel data estimator is also reflected in empirical experience with estimation on relatively short panels with highly persistent data. To quote from the extensive review of production function estimation (one of the original applications for panel data estimation) by Griliches & Mairesse (1998): "In empirical practice, the application of panel methods to micro-data produced rather unsatisfactory results: low and often insignificant capital coefficients and unreasonably low estimates of returns to scale." One simple explanation of these findings in the production function context is that lagged levels of the series provide weak instruments for first-differenced variables in this case (see Blundell & Bond (2000)). One response to these findings has been to consider the use of further moment conditions that have improved properties for the estimates of the parameters of interest. For example, Ahn & Schmidt (1995) consider the non-linear moment conditions implied by the standard error components formulation and show that asymptotic variance ratios can be considerably improved.
Blundell & Bond (1998) consider alternative estimators that require further restrictions on the initial conditions process, designed to improve the properties of the standard first-differenced instrumental variables estimator. This also provides the motivation for the discussion in this chapter. The idea is to consider the performance of a system GMM estimator that relies on relatively mild restrictions on the initial condition process to improve the performance of the GMM estimator in the dynamic panel data context. The material presented draws extensively from the existing literature. For example, Arellano & Bover (1995) and Blundell & Bond (1998) show that mean stationarity in an AR(1) panel data model is sufficient to justify the use of lagged differences of the dependent variable as instruments for equations in levels, in addition to lagged levels as instruments for equations in first-differences. This result naturally extends to models with weakly exogenous covariates. The Monte Carlo simulations and asymptotic variance calculations reported in this chapter show that this extended GMM estimator can offer considerable efficiency gains in the situations where the standard first-differenced GMM estimator performs poorly. Given this restriction on the initial conditions, the system GMM estimator is also shown to encompass the GMM estimator based on the non-linear moment conditions available in the dynamic error components model (see Ahn & Schmidt (1995)). The system GMM estimator has substantial asymptotic efficiency gains relative to this non-linear GMM estimator, and these are reflected in their finite sample properties.
The chapter is organised in the following way. The next section reviews the standard error components structure for a linear dynamic panel data model and lays out the underlying assumptions. Recalling that Within Groups, GLS and OLS on the levels and first-differenced models all suffer from bias even when the cross-section dimension is large, this section also briefly considers the biases that occur for standard panel data estimators in dynamic models. Section 3 then presents the linear GMM estimator for this model that uses lagged information to instrument current differences in a first-differenced specification. Section 4 then outlines the problem of weak instruments in this case. Following the discussion in Ahn & Schmidt (1995), Section 5 considers the use of further non-linear moment conditions that are implied by the model outlined in Section 2. Section 6 derives a linear moment restriction for the levels model using initial condition restrictions, and this is then incorporated into the full system GMM estimator. Asymptotic variance comparisons among these various GMM estimators are given in Section 8. The detailed discussion in these earlier sections uses an AR(1) model, and the extension to a multivariate setting is presented in Section 9.
Finally, before moving to the Monte Carlo results and empirical application, over-identification tests are reviewed. The Monte Carlo results presented in Section 11 are the first in the literature to consider the properties of these GMM estimators in dynamic models with weakly exogenous regressors. As this is perhaps the most common case in empirical applications, these results have an important bearing on applied work. The analysis finds both a large bias and very low precision for the standard first-differenced estimator when the individual series are highly autoregressive. The use of the system GMM estimator not only greatly improves the precision but also greatly reduces the finite sample bias. Exploiting the non-linear moment conditions also provides significant gains compared to the standard first-differenced GMM estimator, but these gains are much less dramatic than those provided by the system GMM estimator when the initial conditions restriction is valid. The empirical application returns to the Griliches and Mairesse discussion. The application uses production function data for the U.S. and confirms the Griliches and Mairesse findings for the capital and labor coefficients in a Cobb-Douglas model. Using the standard first-differenced GMM estimator, the estimated coefficient on capital is very low and all coefficient estimates have poor precision. Constant returns to scale is easily rejected. Moreover, an examination of the individual series suggests that they are highly autoregressive, hinting at a weak instruments problem for standard GMM on these data. These production function results are improved by using the system estimator. The capital coefficient is now more precise and takes a reasonable value, and constant returns to scale is not rejected. These Monte Carlo and empirical results indicate that a careful examination of the original series and use of the system GMM estimator can overcome many of the disappointing features of the standard GMM estimator in the context of highly persistent series.
2. DYNAMIC MODELS AND THE BIASES FROM STANDARD PANEL DATA ESTIMATORS
To analyse the properties of estimators of the parameters in linear dynamic panel data models we consider an autoregressive panel data model of the form
$$y_{it} = \alpha y_{i,t-1} + \beta' x_{it} + u_{it} \tag{2.1}$$
$$u_{it} = \eta_i + v_{it} \tag{2.2}$$
for $i = 1, \ldots, N$ and $t = 2, \ldots, T$, where $\eta_i + v_{it}$ is the usual error components decomposition of the error term; $N$ is large, $T$ is fixed and $|\alpha| < 1$.¹ This model specification is sufficient to cover most of the standard cases encountered in linear dynamic panel applications. Allowing the inclusion of $x_{i,t-1}$ provides the autoregressive panel data model $y_{it} = \alpha y_{i,t-1} + \beta_1 x_{it} + \beta_2 x_{i,t-1} + \eta_i + v_{it}$, which has the corresponding common factor restricted ($\beta_2 = -\alpha\beta_1$) form $y_{it} = \beta_1 x_{it} + f_i + \xi_{it}$, with $\xi_{it} = \alpha\xi_{i,t-1} + v_{it}$ and $\eta_i = (1-\alpha) f_i$. In our Monte Carlo study and application to panel data production function equations presented in Sections 11 and 12 we allow for the inclusion of $x_{it}$ regressors, but for the evaluation of the various estimators we use an AR(1) model with unobserved individual-specific effects
$$y_{it} = \alpha y_{i,t-1} + u_{it}, \qquad u_{it} = \eta_i + v_{it} \tag{2.3}$$
for $i = 1, \ldots, N$ and $t = 2, \ldots, T$.² At the outset we will assume that $\eta_i$ and $v_{it}$ have the familiar error components structure in which
$$E(\eta_i) = 0, \quad E(v_{it}) = 0, \quad E(v_{it}\eta_i) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T \tag{2.4}$$
and
$$E(v_{it}v_{is}) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t \neq s. \tag{2.5}$$
In addition there is the standard assumption concerning the initial conditions $y_{i1}$ (see Ahn & Schmidt (1995), for example)
$$E(y_{i1}v_{it}) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T. \tag{2.6}$$
These standard assumptions (2.4), (2.5) and (2.6) imply moment restrictions that are sufficient to (identify and) estimate $\alpha$ for $T \geq 3$.³ Further restrictions on the initial conditions define a mean stationary process as
$$y_{i1} = \frac{\eta_i}{1-\alpha} + \varepsilon_{i1} \quad \text{and} \quad E(\varepsilon_{i1}) = E(\eta_i\varepsilon_{i1}) = 0 \quad \text{for } i = 1, \ldots, N, \tag{2.7}$$
and a covariance stationary process by further specifying
$$E(v_{it}^2) = \sigma_v^2 \ \text{ for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T, \qquad E(\varepsilon_{i1}^2) = \frac{\sigma_v^2}{1-\alpha^2} \ \text{ for } i = 1, \ldots, N. \tag{2.8}$$
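To make the initial-condition assumptions concrete, the sketch below simulates a panel satisfying (2.3) to (2.8). It is a minimal illustration assuming NumPy and normally distributed $\eta_i$ and $v_{it}$ (normality is our assumption, not part of the model); the helper names `simulate_ar1_panel` and `stationary_variance` are hypothetical. Drawing $y_{i1}$ as $\eta_i/(1-\alpha) + \varepsilon_{i1}$ with $\operatorname{var}(\varepsilon_{i1}) = \sigma_v^2/(1-\alpha^2)$ should give every period the same cross-section variance $\sigma_\eta^2/(1-\alpha)^2 + \sigma_v^2/(1-\alpha^2)$.

```python
import numpy as np


def simulate_ar1_panel(n, T, alpha, sigma_eta=1.0, sigma_v=1.0, seed=0):
    """Hypothetical helper: simulate y_it = alpha*y_{i,t-1} + eta_i + v_it, t = 2..T.

    The initial observation y_i1 is drawn from the covariance-stationary
    distribution in (2.7)-(2.8). Normal errors are an illustrative assumption.
    Returns an (n, T) array whose column 0 holds y_i1.
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, size=n)
    # (2.7): y_i1 = eta_i / (1 - alpha) + eps_i1, with eps_i1 independent of eta_i
    # (2.8): var(eps_i1) = sigma_v^2 / (1 - alpha^2)
    eps1 = rng.normal(0.0, sigma_v / np.sqrt(1.0 - alpha**2), size=n)
    y = np.empty((n, T))
    y[:, 0] = eta / (1.0 - alpha) + eps1
    for t in range(1, T):
        v = rng.normal(0.0, sigma_v, size=n)  # serially uncorrelated, homoskedastic v_it
        y[:, t] = alpha * y[:, t - 1] + eta + v
    return y


def stationary_variance(alpha, sigma_eta=1.0, sigma_v=1.0):
    """var(y_it) implied by (2.7)-(2.8)."""
    return sigma_eta**2 / (1.0 - alpha) ** 2 + sigma_v**2 / (1.0 - alpha**2)


if __name__ == "__main__":
    y = simulate_ar1_panel(n=100_000, T=6, alpha=0.5)
    print("theoretical variance:", stationary_variance(0.5))
    print("sample variance by period:", y.var(axis=0).round(3))
```

If the initial draw were instead set to, say, $y_{i1} = 0$, the sample variances would drift across periods, which is one simple way to see what the stationarity restrictions add.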
For completeness and to conclude this brief outline of the dynamic error components model, we consider the biases from the standard panel data estimators in this model. We consider here the biases found under covariance stationarity (for more details see Baltagi (1995) and Hsiao (1986)). The asymptotic bias of the simple OLS estimator for $\alpha$ in model (2.3) is given by
$$\operatorname{plim}(\hat\alpha_{OLS} - \alpha) = (1-\alpha)\,\frac{\sigma_\eta^2/\sigma_v^2}{\sigma_\eta^2/\sigma_v^2 + k}, \qquad \text{with } k = \frac{1-\alpha}{1+\alpha},$$
where $\sigma_\eta^2 = E(\eta_i^2)$, and therefore the OLS estimator is biased upwards, with $\alpha < \operatorname{plim}(\hat\alpha_{OLS}) < 1$.
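The asymptotic bias of the Within Groups estimator for $\alpha$ has been documented by Nickell (1981) and is given by
$$\operatorname{plim}(\hat\alpha_{WG} - \alpha) = -\,\frac{\dfrac{1+\alpha}{T-1}\left(1 - \dfrac{1}{T}\,\dfrac{1-\alpha^T}{1-\alpha}\right)}{1 - \dfrac{2\alpha}{(1-\alpha)(T-1)}\left(1 - \dfrac{1-\alpha^T}{T(1-\alpha)}\right)},$$
and so, when $\alpha > 0$, $\operatorname{plim}(\hat\alpha_{WG}) < \alpha$. When the model is transformed into first-differences to eliminate the unobserved individual heterogeneity component $\eta_i$, $\Delta y_{it} = \alpha\Delta y_{i,t-1} + \Delta u_{it}$, the asymptotic bias of the OLS estimator is given by
$$\operatorname{plim}(\hat\alpha_{OLSd} - \alpha) = -\frac{1+\alpha}{2},$$
and so $\operatorname{plim}(\hat\alpha_{OLSd}) = \dfrac{\alpha - 1}{2} < 0$.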
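The three asymptotic bias expressions can be evaluated directly. The pure-Python sketch below codes them as written above (the function names are hypothetical, and $k$ and $T$ follow the definitions in the formulas). A useful sanity check: at $\alpha = 0$ the Within Groups expression collapses to $-1/T$.

```python
def ols_levels_bias(alpha, ratio):
    """plim(alpha_OLS - alpha) for OLS in levels; ratio = sigma_eta^2 / sigma_v^2."""
    k = (1.0 - alpha) / (1.0 + alpha)
    return (1.0 - alpha) * ratio / (ratio + k)


def within_groups_bias(alpha, T):
    """Nickell (1981) plim(alpha_WG - alpha), coded as written in the text."""
    # Numerator and denominator share the factor 1 - (1 - alpha^T) / (T (1 - alpha)).
    a = 1.0 - (1.0 - alpha**T) / (T * (1.0 - alpha))
    numerator = (1.0 + alpha) / (T - 1) * a
    denominator = 1.0 - 2.0 * alpha / ((1.0 - alpha) * (T - 1)) * a
    return -numerator / denominator


def ols_differences_bias(alpha):
    """plim(alpha_OLSd - alpha) for OLS on the first-differenced model."""
    return -(1.0 + alpha) / 2.0


if __name__ == "__main__":
    alpha, ratio, T = 0.5, 1.0, 5
    for name, bias in [("OLS levels", ols_levels_bias(alpha, ratio)),
                       ("Within Groups", within_groups_bias(alpha, T)),
                       ("OLS differences", ols_differences_bias(alpha))]:
        print(f"{name:16s} bias {bias:+.4f}  plim {alpha + bias:.4f}")
```

The ordering OLS-levels above $\alpha$ and Within Groups below $\alpha$ is what makes these two estimators useful as rough upper and lower reference points for a consistent estimate.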
3. A FIRST-DIFFERENCED GMM ESTIMATOR

3.1. The Standard Moment Conditions
In the absence of any further restrictions on the process generating the initial conditions, the autoregressive error components model (2.3)-(2.6) implies the following $m_d = 0.5(T-1)(T-2)$ orthogonality conditions, which are linear in the parameter $\alpha$:
$$E(y_{i,t-s}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \ldots, T \text{ and } 2 \leq s \leq t-1, \tag{3.1}$$
where $\Delta u_{it} = u_{it} - u_{i,t-1}$. These depend only on the assumed absence of serial correlation in the time-varying disturbances $v_{it}$, together with the restriction (2.6). The moment restrictions in (3.1) can be expressed more compactly as $E(Z_{di}'\Delta u_i) = 0$, where $Z_{di}$ is the $(T-2) \times m_d$ matrix given by
$$Z_{di} = \begin{bmatrix} y_{i1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{i,T-2} \end{bmatrix},$$
and $\Delta u_i$ is the $(T-2)$ vector $(\Delta u_{i3}, \Delta u_{i4}, \ldots, \Delta u_{iT})'$.
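The block-diagonal layout of $Z_{di}$ can be built mechanically. The sketch below assumes NumPy and uses a hypothetical helper name; it fills the row for date $t$ with the levels $y_{i1}, \ldots, y_{i,t-2}$ in a fresh block of columns, so the column count comes out as $m_d = (T-1)(T-2)/2$.

```python
import numpy as np


def build_z_diff(y_i):
    """Hypothetical helper: the (T-2) x m_d instrument matrix Z_di of (3.1).

    y_i holds the levels (y_i1, ..., y_iT). The row for the differenced
    equation at date t (t = 3..T) contains y_i1, ..., y_{i,t-2} in its own
    block of columns and zeros elsewhere.
    """
    y_i = np.asarray(y_i, dtype=float)
    T = y_i.shape[0]
    m_d = (T - 1) * (T - 2) // 2
    Z = np.zeros((T - 2, m_d))
    col = 0
    for row, t in enumerate(range(3, T + 1)):
        n_inst = t - 2  # number of valid lagged levels at date t
        Z[row, col:col + n_inst] = y_i[:n_inst]
        col += n_inst
    assert col == m_d  # every column is used exactly once
    return Z


if __name__ == "__main__":
    print(build_z_diff([1.0, 2.0, 3.0, 4.0, 5.0]))  # T = 5, so Z_di is 3 x 6
```

The number of columns grows quadratically in $T$, which is one reason why instrument counts become an issue in longer panels.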
The Generalised Method of Moments (GMM) estimator based on these moment conditions minimises the quadratic distance $\Delta u' Z_d W_N Z_d' \Delta u$ for some metric $W_N$, where $Z_d'$ is the $m_d \times N(T-2)$ matrix $(Z_{d1}', Z_{d2}', \ldots, Z_{dN}')$ and $\Delta u'$ is the $N(T-2)$ vector $(\Delta u_1', \Delta u_2', \ldots, \Delta u_N')$. This gives the GMM estimator for $\alpha$ as
$$\hat\alpha_d = \left(\Delta y_{-1}' Z_d W_N Z_d' \Delta y_{-1}\right)^{-1} \Delta y_{-1}' Z_d W_N Z_d' \Delta y,$$
where $\Delta y_i$ is the $(T-2)$ vector $(\Delta y_{i3}, \Delta y_{i4}, \ldots, \Delta y_{iT})'$, $\Delta y_{i,-1}$ is the $(T-2)$ vector $(\Delta y_{i2}, \Delta y_{i3}, \ldots, \Delta y_{i,T-1})'$, and $\Delta y$ and $\Delta y_{-1}$ are stacked across individuals in the same way as $\Delta u$. Alternative choices for the weights $W_N$ give rise to a set of GMM estimators based on the moment conditions in (3.1), all of which are consistent for large $N$ and finite $T$, but which differ in their asymptotic efficiency.⁴ In general the optimal weights are given by
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{di}'\,\Delta\hat u_i\,\Delta\hat u_i'\,Z_{di}\right)^{-1}, \tag{3.2}$$
where $\Delta\hat u_i$ are residuals from an initial consistent estimator. We refer to this as the two-step GMM estimator.⁵ In the absence of any additional knowledge about the process for the initial conditions, this estimator is asymptotically efficient in the class of estimators based on the linear moment conditions (3.1) (see Hansen (1982) and Chamberlain (1987)).
3.2. Homoskedasticity
Ahn & Schmidt (1995) show that additional linear moment conditions are available if the $v_{it}$ disturbances are homoskedastic through time, i.e. if
$$E(v_{it}^2) = \sigma_i^2 \quad \text{for } t = 2, \ldots, T. \tag{3.3}$$
This implies $T-3$ orthogonality restrictions of the form
$$E(y_{i,t-2}\,\Delta u_{i,t-1} - y_{i,t-1}\,\Delta u_{it}) = 0 \quad \text{for } t = 4, \ldots, T, \tag{3.4}$$
and allows a further $T-3$ columns to be added to the instrument matrix $Z_{di}$. The additional columns $Z_{hi}$ are
$$Z_{hi} = \begin{bmatrix} y_{i2} & 0 & \cdots & 0 \\ -y_{i3} & y_{i3} & \cdots & 0 \\ 0 & -y_{i4} & \cdots & 0 \\ \vdots & \vdots & \ddots & y_{i,T-2} \\ 0 & 0 & \cdots & -y_{i,T-1} \end{bmatrix}.$$
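A companion sketch for $Z_{hi}$, again assuming NumPy and using hypothetical helper names. It builds the $(T-2) \times (T-3)$ matrix and checks that $Z_{hi}'\Delta u_i$ reproduces the terms $y_{i,t-2}\Delta u_{i,t-1} - y_{i,t-1}\Delta u_{it}$ of (3.4) evaluated directly.

```python
import numpy as np


def build_z_homosk(y_i):
    """Hypothetical helper: the (T-2) x (T-3) matrix Z_hi for the moments (3.4).

    Rows correspond to Delta u_i3, ..., Delta u_iT (row s-3 for Delta u_is).
    The column for date t (t = 4..T) holds y_{i,t-2} in the row of
    Delta u_{i,t-1} and -y_{i,t-1} in the row of Delta u_it.
    y_i[0] is y_i1, so y_is is y_i[s-1].
    """
    y_i = np.asarray(y_i, dtype=float)
    T = y_i.shape[0]
    Z = np.zeros((T - 2, T - 3))
    for col, t in enumerate(range(4, T + 1)):
        Z[t - 4, col] = y_i[t - 3]   # y_{i,t-2} multiplies Delta u_{i,t-1}
        Z[t - 3, col] = -y_i[t - 2]  # -y_{i,t-1} multiplies Delta u_it
    return Z


def homosk_moments(y_i, du_i):
    """Direct evaluation of y_{i,t-2} Delta u_{i,t-1} - y_{i,t-1} Delta u_it, t = 4..T.

    du_i holds (Delta u_i3, ..., Delta u_iT), so Delta u_is is du_i[s-3].
    """
    T = len(y_i)
    return np.array([y_i[t - 3] * du_i[t - 4] - y_i[t - 2] * du_i[t - 3]
                     for t in range(4, T + 1)])


if __name__ == "__main__":
    y_i = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    du_i = np.array([0.3, -1.2, 0.7])
    print(build_z_homosk(y_i))
    print(np.allclose(build_z_homosk(y_i).T @ du_i, homosk_moments(y_i, du_i)))
```

In practice these columns would simply be appended to $Z_{di}$ (for example with `np.hstack`) before forming the GMM weights.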
Calculation of the one-step and two-step GMM estimators then proceeds exactly as described above.
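Putting the pieces of Section 3.1 together, the sketch below computes one-step and two-step first-differenced GMM estimates of $\alpha$ on simulated data. It assumes NumPy, normal errors and covariance-stationary initial conditions, and all function names are hypothetical. For the one-step weights it uses $W_N = (N^{-1}\sum_i Z_{di}' H Z_{di})^{-1}$, where $H$ has 2 on the main diagonal and $-1$ on the first off-diagonals. That is the usual Arellano & Bond (1991) choice, but it is our assumption here rather than something stated in this section. The two-step weights follow (3.2), using one-step residuals.

```python
import numpy as np


def simulate_ar1_panel(n, T, alpha, sigma_eta=1.0, sigma_v=1.0, seed=0):
    """Hypothetical helper: covariance-stationary AR(1) panel with normal errors."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, size=n)
    y = np.empty((n, T))
    y[:, 0] = eta / (1 - alpha) + rng.normal(0.0, sigma_v / np.sqrt(1 - alpha**2), size=n)
    for t in range(1, T):
        y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(0.0, sigma_v, size=n)
    return y


def build_z_diff(y_i):
    """(T-2) x m_d instrument matrix of (3.1): row for date t holds y_i1..y_{i,t-2}."""
    T = len(y_i)
    Z = np.zeros((T - 2, (T - 1) * (T - 2) // 2))
    col = 0
    for row, t in enumerate(range(3, T + 1)):
        Z[row, col:col + t - 2] = y_i[:t - 2]
        col += t - 2
    return Z


def fd_gmm(y):
    """One-step and two-step first-differenced GMM estimates of alpha from an (N, T) panel."""
    N, T = y.shape
    dy = np.diff(y, axis=1)                  # Delta y_i2, ..., Delta y_iT
    dy_dep, dy_lag = dy[:, 1:], dy[:, :-1]   # Delta y_it and Delta y_{i,t-1}, t = 3..T
    Z = [build_z_diff(y[i]) for i in range(N)]
    zx = sum(Z[i].T @ dy_lag[i] for i in range(N))  # Z_d' Delta y_{-1}
    zy = sum(Z[i].T @ dy_dep[i] for i in range(N))  # Z_d' Delta y

    def estimate(W):
        # Scalar version of (Dy_{-1}' Z W Z' Dy_{-1})^{-1} Dy_{-1}' Z W Z' Dy
        return (zx @ W @ zy) / (zx @ W @ zx)

    # One-step weights (assumed Arellano-Bond H matrix)
    H = 2.0 * np.eye(T - 2) - np.eye(T - 2, k=1) - np.eye(T - 2, k=-1)
    W1 = np.linalg.pinv(sum(Z[i].T @ H @ Z[i] for i in range(N)) / N)
    alpha1 = estimate(W1)

    # Two-step weights from (3.2), built from one-step residuals Delta u-hat_i
    du = dy_dep - alpha1 * dy_lag
    g = [Z[i].T @ du[i] for i in range(N)]
    W2 = np.linalg.pinv(sum(np.outer(gi, gi) for gi in g) / N)
    alpha2 = estimate(W2)
    return alpha1, alpha2


if __name__ == "__main__":
    for alpha in (0.5, 0.9):
        y = simulate_ar1_panel(n=500, T=4, alpha=alpha, seed=1)
        a1, a2 = fd_gmm(y)
        print(f"true alpha {alpha}: one-step {a1:.3f}, two-step {a2:.3f}")
```

Re-running the example with many seeds at $\alpha = 0.9$ is a quick way to see the imprecision discussed in Section 4; the pseudo-inverse is used only to keep the sketch robust to near-singular weight matrices.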
4. WEAK INSTRUMENTS
The instruments used in the standard first-differenced GMM estimator become less informative in two important cases: first, as the value of the autoregressive parameter $\alpha$ increases towards unity; and second, as the variance of the individual effects $\eta_i$ increases relative to the variance of $v_{it}$. To examine this further consider the case with $T = 3$. In this case, the moment conditions corresponding to the standard GMM estimator reduce to a single orthogonality condition. The corresponding method of moments estimator reduces to a simple two stage least squares (2SLS) estimator, with first stage (instrumental variable) regression
$$\Delta y_{i2} = \pi_d\, y_{i1} + r_i \quad \text{for } i = 1, \ldots, N.$$
For a sufficiently high autoregressive parameter $\alpha$ or a sufficiently high relative variance of the individual effects, the least squares estimate of the reduced form coefficient $\pi_d$ can be made arbitrarily close to zero. In this case the instrument $y_{i1}$ is only weakly correlated with $\Delta y_{i2}$. To see this notice that the model (2.3) implies that
$$\Delta y_{i2} = (\alpha - 1)\,y_{i1} + \eta_i + v_{i2} \quad \text{for } i = 1, \ldots, N. \tag{4.1}$$
The least squares estimator of $(\alpha - 1)$ in (4.1) is generally biased upwards, towards zero, since we expect $E(y_{i1}\eta_i) > 0$. Assuming covariance stationarity and letting $\sigma_\eta^2 = \operatorname{var}(\eta_i)$ and $\sigma_v^2 = \operatorname{var}(v_{it})$, the plim of $\hat\pi_d$ is given by
$$\operatorname{plim}\hat\pi_d = (\alpha - 1)\,\frac{k}{\sigma_\eta^2/\sigma_v^2 + k}, \qquad \text{with } k = \frac{1-\alpha}{1+\alpha}. \tag{4.2}$$
The bias term effectively scales the estimated coefficient on the instrumental variable $y_{i1}$ toward zero. We find that $\operatorname{plim}\hat\pi_d \to 0$ as $\alpha \to 1$ or as $(\sigma_\eta^2/\sigma_v^2) \to \infty$, which are the cases in which the first stage F-statistic is $O_p(1)$. A graph showing both $\operatorname{plim}\hat\pi_d$ and $\alpha - 1$ against $\alpha$ is given in Fig. 1, for $\sigma_\eta^2 = \sigma_v^2$, $T = 3$.
Fig. 1. $\operatorname{plim}\hat\pi_d$ and $\alpha - 1$, $\sigma_\eta^2 = \sigma_v^2$, $T = 3$. Source: Blundell & Bond (1998).
We are interested in inferences using this first-differenced instrumental variable (IV) estimator when $\pi_d$ is local to zero, that is where the instrument $y_{i1}$ is only weakly correlated with $\Delta y_{i2}$. Following Nelson & Startz (1990a, b) and Staiger & Stock (1997) we characterise this problem of weak instruments using the concentration parameter. First note that the F-statistic for the first stage instrumental variable regression converges to a noncentral chi-squared with one degree of freedom. The concentration parameter is then the corresponding noncentrality parameter, which we label $\tau$ in this case. The IV estimator performs poorly when $\tau$ approaches zero. Assuming covariance stationarity, $\tau$ has the following simple characterisation in terms of the parameters of the AR model
$$\tau = \frac{(\sigma_v^2 k)^2}{\sigma_\eta^2 + \sigma_v^2 k}, \qquad \text{with } k = \frac{1-\alpha}{1+\alpha}.$$
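Both (4.2) and the concentration parameter (denoted $\tau$ here) depend only on $\alpha$ and the two variances, so their collapse as $\alpha \to 1$ can be tabulated directly. The pure-Python sketch below (function names hypothetical) codes the two closed forms and prints them for $\sigma_\eta^2 = \sigma_v^2 = 1$, matching the parameter values used for Figs. 1 and 2.

```python
def k_factor(alpha):
    """k = (1 - alpha) / (1 + alpha), as defined in the text."""
    return (1.0 - alpha) / (1.0 + alpha)


def plim_pi_d(alpha, s2_eta=1.0, s2_v=1.0):
    """plim of the first-stage coefficient pi_d in (4.2)."""
    k = k_factor(alpha)
    return (alpha - 1.0) * k / (s2_eta / s2_v + k)


def concentration_tau(alpha, s2_eta=1.0, s2_v=1.0):
    """Concentration parameter tau = (s2_v k)^2 / (s2_eta + s2_v k)."""
    k = k_factor(alpha)
    return (s2_v * k) ** 2 / (s2_eta + s2_v * k)


if __name__ == "__main__":
    print(" alpha   alpha-1   plim pi_d       tau")
    for alpha in (0.0, 0.3, 0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"{alpha:6.2f}  {alpha - 1:8.3f}  {plim_pi_d(alpha):10.4f}  {concentration_tau(alpha):8.5f}")
```

Increasing `s2_eta` relative to `s2_v` lowers $\tau$ at every $\alpha$, which is the second weak-instrument case described above.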
The performance of the standard GMM differenced estimator in this AR(1) specification can therefore be seen to deteriorate as $\alpha \to 1$, as well as for decreasing values of $\sigma_v^2$ and for increasing values of $\sigma_\eta^2$. To illustrate this further, Fig. 2 provides a plot of the concentration parameter $\tau$ against $\alpha$ for the case $\sigma_\eta^2 = \sigma_v^2 = 1$, $T = 3$. Blundell & Bond (2000) note that the finite sample bias of the first-differenced GMM estimator for the AR(1) model with weak instruments is likely to be in the direction of the Within Groups estimator. This is because the (one-step) first-differenced GMM estimator coincides with a 2SLS estimator based on the orthogonal deviations transformation of Arellano & Bover (1995), and 2SLS estimators are biased in the direction of OLS in the presence of weak instruments (see, for example, Bound, Jaeger & Baker (1995)).⁶ We explore the finite sample behaviour of the first-differenced GMM estimator further in Section 11 below.
Fig. 2. Concentration parameter $\tau$, $\sigma_\eta^2 = \sigma_v^2 = 1$, $T = 3$. Source: Blundell & Bond (1998).
5. NON-LINEAR MOMENT CONDITIONS

5.1. Standard Assumptions

The standard assumptions (2.4), (2.5) and (2.6) also imply non-linear moment conditions that are not exploited by the standard linear first-differenced GMM estimator described in Section 3.1. Ahn & Schmidt (1995) show that there are a further $T-3$ non-linear moment conditions, which can be written as

$$E(u_{it}\,\Delta u_{i,t-1}) = 0 \quad \text{for } t = 4, 5, \ldots, T \tag{5.1}$$

and which could be expected to improve efficiency. These conditions relate directly to the absence of serial correlation in $v_{it}$ and do not require homoskedasticity. Thus, under the standard assumptions, the complete set of second-order moment conditions available is (3.1) and (5.1). Asymptotic efficiency comparisons reported in Ahn & Schmidt (1995) confirm that these non-linear moments are particularly informative in cases where $\alpha$ is close to unity and/or where $\sigma_\eta^2/\sigma_v^2$ is high.
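Each condition in (5.1) is a quadratic in $\alpha$, because both $u_{it} = y_{it} - \alpha y_{i,t-1}$ and $\Delta u_{i,t-1} = \Delta y_{i,t-1} - \alpha\Delta y_{i,t-2}$ are linear in $\alpha$. The plain-Python sketch below returns the coefficients of the sample analogue for one period $t \ge 4$. The function names and the list-of-lists data layout are illustrative assumptions. A non-linear GMM routine would stack these moments across $t$ alongside the linear conditions (3.1).

```python
def as_moment_coefficients(y, t):
    """Sample analogue of E(u_it * Delta u_{i,t-1}) in (5.1) as c0 + c1*a + c2*a**2.

    y[i] holds y_i1, ..., y_iT at list positions 0, ..., T-1 (hypothetical layout);
    t is the 1-based period of the condition and must satisfy 4 <= t <= T.
    """
    if t < 4:
        raise ValueError("conditions (5.1) start at t = 4")
    c0 = c1 = c2 = 0.0
    for yi in y:
        y_t, y_t1, y_t2, y_t3 = yi[t - 1], yi[t - 2], yi[t - 3], yi[t - 4]
        dy_t1 = y_t1 - y_t2      # Delta y_{i,t-1}
        dy_t2 = y_t2 - y_t3      # Delta y_{i,t-2}
        # (y_t - a*y_t1) * (dy_t1 - a*dy_t2), expanded in powers of a
        c0 += y_t * dy_t1
        c1 -= y_t * dy_t2 + y_t1 * dy_t1
        c2 += y_t1 * dy_t2
    n = len(y)
    return c0 / n, c1 / n, c2 / n


def as_moment(y, t, a):
    """Evaluate the quadratic sample moment at a trial value a of alpha."""
    c0, c1, c2 = as_moment_coefficients(y, t)
    return c0 + c1 * a + c2 * a * a
```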
5.2. Homoskedasticity

Under the homoskedasticity through time restriction (3.3), there is one further non-linear moment condition available, in addition to (3.1), (3.4) and (5.1) (see Ahn & Schmidt (1995)). This can be written as

$$E(\bar u_i\,\Delta u_{i3}) = 0, \quad \text{where } \bar u_i = \frac{1}{T-1}\sum_{t=2}^{T} u_{it}. \tag{5.2}$$
Thus, under the homoskedasticity assumption in addition to the standard assumptions, the complete set of moment conditions available comprises the linear conditions (3.1) and (3.4), and the non-linear conditions (5.1) and (5.2).
6. INITIAL CONDITIONS AND A LEVELS GMM ESTIMATOR

In addition to the standard assumptions set out in Section 2, we now consider the additional assumption

$$E(\eta_i\,\Delta y_{i2}) = 0 \quad \text{for } i = 1, \ldots, N. \tag{6.1}$$

Notice that, given (2.3)–(2.6), which specify $y_{i2}$ given $y_{i1}$, assumption (6.1) is a restriction on the initial conditions process generating $y_{i1}$.⁷ If this initial conditions restriction holds in addition to the standard assumptions (2.4), (2.5) and (2.6), the following $T-2$ linear moment conditions are valid:

$$E(u_{it}\,\Delta y_{i,t-1}) = 0 \quad \text{for } t = 3, 4, \ldots, T. \tag{6.2}$$

Moreover, given the standard assumptions, these linear moment conditions imply the $T-3$ non-linear moment conditions given in (5.1), and render these non-linear conditions redundant for estimation. Thus the complete set of second-order moment restrictions implied by (2.3)–(2.6) and (6.1) can be implemented as a linear GMM estimator. To consider when the first-differences $\Delta y_{it}$ are uncorrelated with the individual effects, notice that for the AR(1) model (2.3)

$$\Delta y_{it} = \alpha^{t-2}\,\Delta y_{i2} + \sum_{s=0}^{t-3} \alpha^{s}\,\Delta u_{i,t-s},$$

so that $\Delta y_{it}$ will be uncorrelated with $\eta_i$ if and only if $\Delta y_{i2}$ is uncorrelated with $\eta_i$. This is precisely assumption (6.1). To guarantee this, we require the initial conditions restriction
$$E\left[\left(y_{i1} - \frac{\eta_i}{1-\alpha}\right)\eta_i\right] = 0,$$

which is satisfied under mean stationarity of the $y_{it}$ process, as defined by (2.3)–(2.8). To show that the moment conditions (6.2) remain informative when $\alpha$ approaches unity or $\sigma_\eta^2/\sigma_v^2$ becomes large, we again consider the case of $T = 3$. Here we can use one equation in levels, $y_{i3} = \alpha y_{i2} + \eta_i + v_{i3}$, for which the instrument available is $\Delta y_{i2}$, and the first-stage regression is $y_{i2} = \pi_l\,\Delta y_{i2} + r_i$. In this case, assuming covariance stationarity, the probability limit of $\hat\pi_l$ is given by⁸

$$\operatorname{plim}\hat\pi_l = \frac{1}{2}, \tag{6.3}$$

and therefore this moment condition stays informative for high values of $\alpha$, in contrast to the moment condition available for the first-differenced model. The $0.5(T+1)(T-2)$ linear moment conditions (3.1) and (6.2) comprise the full set of second-order moment conditions under mean stationarity in conjunction with the standard assumptions listed in Section 2, and form the basis for a system GMM estimator which will be discussed in the next section. However, as this system GMM estimator combines the moment conditions for the model in first-differences with those for the model in levels, we also consider a simpler GMM levels estimator, based on the $m_l = 0.5(T-1)(T-2)$ moment conditions

$$E(u_{it}\,\Delta y_{i,t-s}) = 0 \quad \text{for } t = 3, \ldots, T \text{ and } 1 \le s \le t-2,$$

that relate only to the equations in levels. These can be expressed as $E(Z_{li}'u_i) = 0$, where $Z_{li}$ is the $(T-2) \times m_l$ matrix given by

$$Z_{li} = \begin{pmatrix} \Delta y_{i2} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \Delta y_{i2} & \Delta y_{i3} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \Delta y_{i2} & \cdots & \Delta y_{i,T-1} \end{pmatrix} \tag{6.4}$$

and $u_i$ is the $(T-2)$ vector $(u_{i3}, u_{i4}, \ldots, u_{iT})'$. Calculation of the one-step and two-step GMM estimators then proceeds in a similar way to that described above. In this case, though, unless $\sigma_\eta^2 = 0$, there is no one-step GMM estimator that is asymptotically equivalent to the two-step estimator, even in the special case of i.i.d. disturbances.⁹
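The block structure of (6.4) can be built mechanically. The following is a minimal plain-Python sketch for one individual, with hypothetical function names. Row $t$, for $t = 3, \ldots, T$, carries $\Delta y_{i2}, \ldots, \Delta y_{i,t-1}$ in its own block of columns, so the matrix has $T-2$ rows and $m_l = 0.5(T-1)(T-2)$ columns. The companion function forms the individual contribution $Z_{li}'u_i$ at a trial value of $\alpha$.

```python
def levels_instruments(y):
    """Instrument matrix Z_li of (6.4) for one individual.

    y lists y_i1, ..., y_iT (T >= 3). Returns T-2 rows (equations t = 3..T)
    and m_l = (T-1)(T-2)/2 columns, as nested lists.
    """
    T = len(y)
    dy = {t: y[t - 1] - y[t - 2] for t in range(2, T + 1)}   # Delta y_it, 1-based t
    m_l = (T - 1) * (T - 2) // 2
    rows, col = [], 0
    for t in range(3, T + 1):            # levels equation for period t
        row = [0.0] * m_l
        for s in range(2, t):            # instruments Delta y_i2 .. Delta y_{i,t-1}
            row[col] = dy[s]
            col += 1
        rows.append(row)
    return rows


def levels_moments(y, alpha):
    """Z_li' u_i with u_it = y_it - alpha * y_{i,t-1} for t = 3..T."""
    z = levels_instruments(y)
    u = [y[t - 1] - alpha * y[t - 2] for t in range(3, len(y) + 1)]
    return [sum(z[r][c] * u[r] for r in range(len(u))) for c in range(len(z[0]))]


if __name__ == "__main__":
    print(levels_instruments([1.0, 2.0, 4.0, 7.0]))
    # T = 4: [[1.0, 0.0, 0.0], [0.0, 1.0, 2.0]]
```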
7. A SYSTEM GMM ESTIMATOR

7.1. The Optimal Combination of Differenced and Levels Estimators

Calculation of the GMM estimator using the full set of linear moment conditions (3.1) and (6.2) can be based on a stacked system comprising all $T-2$ equations in first-differences and the $T-2$ equations in levels corresponding to periods $3, \ldots, T$, for which instruments are observed. The $m_s = 0.5(T+1)(T-2)$ moment conditions are¹⁰

$$E(y_{i,t-s}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \ldots, T \text{ and } 2 \le s \le t-1 \tag{7.1}$$

$$E(u_{it}\,\Delta y_{i,t-1}) = 0 \quad \text{for } t = 3, \ldots, T. \tag{7.2}$$

These can be expressed as $E(Z_{si}'p_i) = 0$, where $p_i = (\Delta u_i', u_i')'$ and

$$Z_{si} = \begin{pmatrix} Z_{di} & 0 \\ 0 & Z_{li}^{p} \end{pmatrix} = \begin{pmatrix} Z_{di} & 0 & 0 & \cdots & 0 \\ 0 & \Delta y_{i2} & 0 & \cdots & 0 \\ 0 & 0 & \Delta y_{i3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \Delta y_{i,T-1} \end{pmatrix},$$

with $Z_{di}$ as defined in Section 3, and $Z_{li}^{p}$ the non-redundant subset of $Z_{li}$. The calculation of the two-step GMM estimator is then analogous to that described above. Again in this case, unless $\sigma_\eta^2 = 0$, there is no one-step GMM estimator that is asymptotically equivalent to the two-step estimator, even in the special case of i.i.d. disturbances.¹¹ The system GMM estimator is clearly a combination of the GMM differenced estimator and a GMM levels estimator that uses only (7.2). This combination is linear for the system 2SLS estimator, which is given by
$$\hat\alpha_s = \left(q_{-1}'Z_s(Z_s'Z_s)^{-1}Z_s'q_{-1}\right)^{-1} q_{-1}'Z_s(Z_s'Z_s)^{-1}Z_s'q,$$

where $q_i = (\Delta y_i', y_i')'$. Because

$$q_{-1}'Z_s(Z_s'Z_s)^{-1}Z_s'q_{-1} = \Delta y_{-1}'Z_d(Z_d'Z_d)^{-1}Z_d'\Delta y_{-1} + y_{-1}'Z_l^{p}(Z_l^{p\prime}Z_l^{p})^{-1}Z_l^{p\prime}y_{-1},$$

the system 2SLS estimator is equivalent to the linear combination

$$\hat\alpha_s = \omega\,\hat\alpha_d + (1-\omega)\,\hat\alpha_l^{p},$$

where $\hat\alpha_d$ and $\hat\alpha_l^{p}$ are the 2SLS first-differenced and levels estimators respectively, with the levels estimator utilising only the $T-2$ moment conditions (7.2), and

$$\omega = \frac{\Delta y_{-1}'Z_d(Z_d'Z_d)^{-1}Z_d'\Delta y_{-1}}{\Delta y_{-1}'Z_d(Z_d'Z_d)^{-1}Z_d'\Delta y_{-1} + y_{-1}'Z_l^{p}(Z_l^{p\prime}Z_l^{p})^{-1}Z_l^{p\prime}y_{-1}} = \frac{\hat\pi_d'Z_d'Z_d\hat\pi_d}{\hat\pi_d'Z_d'Z_d\hat\pi_d + \hat\pi_l'Z_l^{p\prime}Z_l^{p}\hat\pi_l},$$

where $\hat\pi_d$ and $\hat\pi_l$ are the OLS estimates of the first-stage regression coefficients underlying these 2SLS estimators. From (4.2) and (6.3) it follows that $\omega \to 0$ if $\alpha \to 1$ and/or $(\sigma_\eta^2/\sigma_v^2) \to \infty$, so in these cases all the weight for the system estimator is given to the informative levels moment conditions (7.2).

7.2. Homoskedasticity

In the case where the initial conditions satisfy restriction (6.1) and the $v_{it}$ satisfy restriction (3.3), Ahn & Schmidt (1995, equation (12b)) show that the $T-2$ homoskedasticity restrictions (3.4) and (5.2) can be replaced by a set of $T-2$ moment conditions

$$E(y_{it}u_{it} - y_{i,t-1}u_{i,t-1}) = 0 \quad \text{for } t = 3, \ldots, T,$$

which are all linear in the parameter $\alpha$. The non-linear condition (5.2) is again redundant for estimation given (6.1), and the complete set of second-order moment restrictions implied by (2.3)–(2.6), (3.3) and (6.1) can be implemented as a linear GMM estimator.
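The decomposition $\hat\alpha_s = \omega\hat\alpha_d + (1-\omega)\hat\alpha_l^{p}$ in Section 7.1 can be checked numerically in the simplest case, $T = 3$. There each block has a single instrument: $y_{i1}$ for the differenced equation and $\Delta y_{i2}$ for the levels equation. The plain-Python sketch below simulates a covariance-stationary AR(1) panel and computes the three 2SLS estimators and the weight $\omega$. The normal errors, the sample size and the seed are illustrative choices. As (4.2) and (6.3) imply, $\omega$ shrinks towards zero as $\alpha$ rises.

```python
import random


def simulate_ar1_t3(n, alpha, sig_eta=1.0, sig_v=1.0, seed=1):
    """Covariance-stationary AR(1) panel with T = 3; returns lists (y1, y2, y3)."""
    rng = random.Random(seed)
    sd_w = sig_v / (1.0 - alpha ** 2) ** 0.5     # sd of the stationary deviation
    y1, y2, y3 = [], [], []
    for _ in range(n):
        eta = rng.gauss(0.0, sig_eta)
        a = eta / (1.0 - alpha) + rng.gauss(0.0, sd_w)   # y_i1 from the stationary law
        b = alpha * a + eta + rng.gauss(0.0, sig_v)
        c = alpha * b + eta + rng.gauss(0.0, sig_v)
        y1.append(a)
        y2.append(b)
        y3.append(c)
    return y1, y2, y3


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def system_2sls_t3(y1, y2, y3):
    """Return (alpha_d, alpha_l, alpha_s, omega) for T = 3."""
    dy2 = [b - a for a, b in zip(y1, y2)]
    dy3 = [c - b for b, c in zip(y2, y3)]
    # Differenced equation dy3 = alpha*dy2 + dv3, instrument y1:
    # a_d = dy2' P dy2 (= pi_d' Z_d' Z_d pi_d), b_d = dy2' P dy3
    a_d = dot(y1, dy2) ** 2 / dot(y1, y1)
    b_d = dot(y1, dy2) * dot(y1, dy3) / dot(y1, y1)
    # Levels equation y3 = alpha*y2 + u3, instrument dy2
    a_l = dot(dy2, y2) ** 2 / dot(dy2, dy2)
    b_l = dot(dy2, y2) * dot(dy2, y3) / dot(dy2, dy2)
    alpha_d, alpha_l = b_d / a_d, b_l / a_l
    # Stacked system with block-diagonal instruments: the projections add up
    alpha_s = (b_d + b_l) / (a_d + a_l)
    omega = a_d / (a_d + a_l)
    return alpha_d, alpha_l, alpha_s, omega


if __name__ == "__main__":
    for alpha in (0.3, 0.9):
        est = system_2sls_t3(*simulate_ar1_t3(20000, alpha))
        print(alpha, ["%.3f" % v for v in est])
```

Because the instrument matrix is block-diagonal, the system projection is the sum of the two block projections. That is why the identity holds exactly in every sample, and not only in the limit.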
8. ASYMPTOTIC VARIANCE COMPARISONS

To quantify the gains in asymptotic efficiency that result from exploiting the linear moment conditions (6.2), Table 1 reports the ratio of the asymptotic variance of the standard first-differenced GMM estimator described in Section 3.1 to the asymptotic variance of the system GMM estimator described in Section 7.1. These asymptotic variance ratios are calculated assuming both covariance stationarity and homoskedasticity. They are presented for $T = 3$ and $T = 4$, for two fixed values of $\sigma_\eta^2/\sigma_v^2$, and for a range of values of the autoregressive parameter $\alpha$. For comparison, we also reproduce from Ahn & Schmidt (1995) the corresponding asymptotic variance ratios comparing first-differenced GMM to the non-linear GMM estimator, which uses the quadratic moment conditions (5.1) but not the extra linear moment conditions (6.2). In the $T = 3$ case there are no quadratic moment restrictions available. These calculations suggest that exploiting conditions (6.2) can result in dramatic efficiency gains when $T = 3$, particularly at high values of $\alpha$ and high values of $\sigma_\eta^2/\sigma_v^2$. These are indeed the cases where we find the instruments used to obtain the first-differenced estimator to be weak. In the $T = 4$ case we still find dramatic efficiency gains at high values of $\alpha$. Comparison to the results for the non-linear GMM estimator also shows that the gains from exploiting conditions (6.2) can be much larger than the gains from simply exploiting the non-linear restrictions (5.1). In the Monte Carlo simulations presented in Section 11 we investigate whether similar improvements are found in finite samples.

Table 1. Asymptotic Variance Ratios

                                  T = 3                                      T = 4
α                    0.0     0.3     0.5     0.8     0.9        0.0     0.3     0.5     0.8     0.9
σ_η²/σ_v² = 0.25
  SYS               1.33    1.89    2.91   13.10   47.91       1.40    1.77    2.42    8.88   30.90
  NON-LINEAR         n/a                                       1.29    1.33    1.35    1.41    1.45
σ_η²/σ_v² = 1.00
  SYS               1.33    2.15    4.00   28.00  121.33       1.75    2.31    3.26   13.97   55.40
  NON-LINEAR         n/a                                       1.67    1.91    2.10    2.42    2.54

Source: Blundell & Bond (1998).
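For $T = 3$ the SYS ratios have a closed form. There is one differenced moment, $g_d = y_{i1}\Delta u_{i3}$, and one levels moment, $g_l = \Delta y_{i2}u_{i3}$. The efficient GMM variance is $(G'S^{-1}G)^{-1}$. Under covariance stationarity with independent, homoskedastic errors, the elements of $S$ can be written down directly. These elements were derived for this illustration and are not printed in the chapter. The plain-Python sketch below normalises $\sigma_v^2 = 1$. In our check, its output agrees with the $T = 3$ SYS entries of Table 1 to two decimals. It also returns exactly 4/3 at $\alpha = 0$, whatever the value of $\sigma_\eta^2/\sigma_v^2$.

```python
def avar_ratio_t3(alpha, r):
    """Asymptotic variance of first-differenced GMM over system GMM, T = 3.

    r = sigma_eta^2 / sigma_v^2 with sigma_v^2 normalised to 1. Assumes
    covariance stationarity and independent homoskedastic errors.
    Moments: g_d = y_i1 * Delta u_i3 and g_l = Delta y_i2 * u_i3.
    """
    var_y1 = r / (1.0 - alpha) ** 2 + 1.0 / (1.0 - alpha ** 2)
    s11 = 2.0 * var_y1                                  # Var(g_d)
    s22 = 2.0 * (r + 1.0) / (1.0 + alpha)               # Var(g_l)
    s12 = -(r / (1.0 - alpha) + 1.0 / (1.0 + alpha))    # Cov(g_d, g_l)
    # Jacobians are +c and -c with c = 1/(1+alpha); c cancels in the ratio
    # AVar_d = s11 / c**2 and 1/AVar_s = c**2 (s11 + s22 + 2 s12) / det(S)
    det = s11 * s22 - s12 ** 2
    return s11 * (s11 + s22 + 2.0 * s12) / det


if __name__ == "__main__":
    alphas = (0.0, 0.3, 0.5, 0.8, 0.9)
    for r in (0.25, 1.0):
        print(r, [round(avar_ratio_t3(a, r), 2) for a in alphas])
```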
9. MULTIVARIATE DYNAMIC PANEL DATA MODELS

In this section the dynamic panel data model with additional regressors is considered.¹² In particular, we focus on the model

$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + u_{it}, \qquad u_{it} = \eta_i + v_{it} \tag{9.1}$$

where $x_{it}$ is a scalar. The error components $\eta_i$ and $v_{it}$ again satisfy the conditions (2.4)–(2.6). The $x_{it}$ process is correlated with the individual effects $\eta_i$, and we consider three possible correlation structures between the $x_{it}$ process and the $v_{it}$ error process that determine the instruments that can be used to estimate $\alpha$ and $\beta$. First, the $x_{it}$ process is strictly exogenous:

$$E(x_{is}v_{it}) = 0 \quad \text{for } s = 1, \ldots, T;\ t = 2, \ldots, T. \tag{9.2}$$

Secondly, the $x_{it}$ process is weakly exogenous, or predetermined:

$$\begin{aligned} E(x_{is}v_{it}) &= 0 \quad \text{for } s = 1, \ldots, t;\ t = 2, \ldots, T \\ E(x_{is}v_{it}) &\neq 0 \quad \text{for } s = t+1, \ldots, T;\ t = 2, \ldots, T \end{aligned} \tag{9.3}$$

and thirdly, the $x_{it}$ process is endogenously determined:

$$\begin{aligned} E(x_{is}v_{it}) &= 0 \quad \text{for } s = 1, \ldots, t-1;\ t = 2, \ldots, T \\ E(x_{is}v_{it}) &\neq 0 \quad \text{for } s = t, \ldots, T;\ t = 2, \ldots, T. \end{aligned} \tag{9.4}$$

We are especially interested in the case when the $x_{it}$ process is endogenously determined, which includes simultaneous processes, but also measurement error. For the GMM first-differenced estimator, the $0.5(T-1)(T-2)$ moment conditions (3.1)

$$E(y_{i,t-s}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \ldots, T \text{ and } 2 \le s \le t-1$$

remain valid. When the $x_{it}$ process is strictly exogenous, the following additional $T(T-2)$ moment conditions are valid:

$$E(x_{is}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \ldots, T \text{ and } 1 \le s \le T. \tag{9.5}$$

When $x_{it}$ is predetermined there are only the $0.5(T+1)(T-2)$ additional moment conditions

$$E(x_{i,t-s}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \ldots, T \text{ and } 1 \le s \le t-1, \tag{9.6}$$

whereas when $x_{it}$ is endogenously determined only the following $0.5(T-1)(T-2)$ additional moment conditions are valid:

$$E(x_{i,t-s}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \ldots, T \text{ and } 2 \le s \le t-1. \tag{9.7}$$

For the non-linear GMM estimator, moment conditions (5.1) remain valid, and no further moment conditions result from the presence of $x_{it}$ variables.
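The instrument counts quoted above can be checked by enumerating the admissible (equation, instrument) pairs. In this plain-Python sketch, the labels "strict", "predetermined" and "endogenous" are our own shorthand for (9.2)–(9.4), and the function names are illustrative. Each pair $(t, r)$ means that $x_{ir}$ (or $y_{ir}$) is a valid instrument for the first-differenced equation in period $t$.

```python
def fd_x_instruments(T, x_type):
    """(t, r) pairs with E(x_ir * Delta u_it) = 0, for t = 3..T.

    x_type: "strict" (9.5), "predetermined" (9.6) or "endogenous" (9.7).
    """
    pairs = []
    for t in range(3, T + 1):
        if x_type == "strict":
            periods = range(1, T + 1)       # every x_is, s = 1..T
        elif x_type == "predetermined":
            periods = range(1, t)           # x_{i,t-s}, 1 <= s <= t-1
        elif x_type == "endogenous":
            periods = range(1, t - 1)       # x_{i,t-s}, 2 <= s <= t-1
        else:
            raise ValueError(f"unknown x_type: {x_type}")
        pairs.extend((t, r) for r in periods)
    return pairs


def fd_y_instruments(T):
    """(t, r) pairs for conditions (3.1): y_{i,t-s}, 2 <= s <= t-1."""
    return [(t, r) for t in range(3, T + 1) for r in range(1, t - 1)]


if __name__ == "__main__":
    T = 5
    print("y:", len(fd_y_instruments(T)))
    for kind in ("strict", "predetermined", "endogenous"):
        print(kind, len(fd_x_instruments(T, kind)))
```

For $T = 5$ this prints 6 conditions for $y$, and 15, 9 and 6 for the strictly exogenous, predetermined and endogenous cases. These match $0.5(T-1)(T-2)$, $T(T-2)$, $0.5(T+1)(T-2)$ and $0.5(T-1)(T-2)$.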
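For the system GMM estimator, we first consider under what conditions both $\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$. In order to illustrate this, we specify the following process for the regressor:

$$x_{it} = \rho x_{i,t-1} + \tau\eta_i + e_{it}.$$

Thus $\tau \neq 0$ allows the level of $x_{it}$ to be correlated with $\eta_i$, and the covariance properties between $v_{it}$ and $e_{is}$ determine whether $x_{it}$ is strictly exogenous, predetermined or endogenously determined. First notice that

$$\Delta x_{it} = \rho^{t-2}\,\Delta x_{i2} + \sum_{s=0}^{t-3} \rho^{s}\,\Delta e_{i,t-s},$$

so that $\Delta x_{it}$ will be correlated with $\eta_i$ if and only if $\Delta x_{i2}$ is correlated with $\eta_i$. To guarantee $E[\Delta x_{i2}\eta_i] = 0$ we require the initial conditions restriction

$$E\left[\left(x_{i1} - \frac{\tau\eta_i}{1-\rho}\right)\eta_i\right] = 0, \tag{9.8}$$

which is satisfied under mean stationarity of the $x_{it}$ process. Given this restriction, writing $\Delta y_{it}$ as

$$\Delta y_{it} = \alpha^{t-2}\,\Delta y_{i2} + \sum_{s=0}^{t-3} \alpha^{s}\left(\beta\,\Delta x_{i,t-s} + \Delta u_{i,t-s}\right) \tag{9.9}$$

shows that $\Delta y_{it}$ will be correlated with $\eta_i$ if and only if $\Delta y_{i2}$ is correlated with $\eta_i$. To guarantee $E[\Delta y_{i2}\eta_i] = 0$ we then require the similar initial conditions restriction

$$E\left[\left(y_{i1} - \frac{1}{1-\alpha}\left(\frac{\beta\tau\eta_i}{1-\rho} + \eta_i\right)\right)\eta_i\right] = 0, \tag{9.10}$$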
which would again be satisfied under stationarity. Thus, there are additional moment restrictions available for the equations in levels when the $y_{it}$ and $x_{it}$ processes are both mean stationary. Whilst jointly stationary means are sufficient to ensure that both $\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$, this condition is stronger than is necessary. For example, if the conditional model (9.1) has generated the $y_{it}$ series for a sufficiently long time prior to our sample period for any influence of the true initial conditions to be negligible, then an expression analogous to (9.9) shows that $\Delta y_{it}$ will be uncorrelated with $\eta_i$ provided that $\Delta x_{it}$ is uncorrelated with $\eta_i$, even if the mean of $x_{it}$ (and hence $y_{it}$) is time-varying. Moreover, we can note that it is perfectly possible for $\Delta x_{it}$ to be uncorrelated with $\eta_i$ in cases where $\Delta y_{it}$ is correlated with $\eta_i$ (for example, when (9.8) holds or $\tau = 0$ but (9.10) is not satisfied). However, given (9.9), it seems very unlikely that $\Delta y_{it}$ will be uncorrelated with $\eta_i$ in contexts where $\Delta x_{it}$ is correlated with $\eta_i$. When both $\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$, the extra moment conditions for the GMM system estimator are, as before, (7.2),

$$E(u_{it}\,\Delta y_{i,t-1}) = 0 \quad \text{for } t = 3, \ldots, T,$$

and

$$E(u_{it}\,\Delta x_{it}) = 0 \quad \text{for } t = 2, \ldots, T \tag{9.11}$$

in the case where $x_{it}$ is strictly exogenous or predetermined; or

$$E(u_{it}\,\Delta x_{i,t-1}) = 0 \quad \text{for } t = 3, \ldots, T, \tag{9.12}$$

when $x_{it}$ is endogenously determined. Therefore, when for example $x_{it}$ is endogenous, the GMM system estimator is based on the moment conditions (7.1), (9.7), (7.2) and (9.12).
10. TESTS OF OVERIDENTIFYING RESTRICTIONS

The standard test of the validity of the moment conditions used in the GMM estimation procedure is the Sargan test of overidentifying restrictions (see Sargan (1958) and the development for GMM in Hansen (1982)). For the GMM estimator in the first-differenced model this test statistic is given by

$$\mathrm{Sar}_d = \frac{1}{N}\,\Delta\hat u' Z_d W_N Z_d' \Delta\hat u,$$

where $W_N$ is the optimal weight matrix as in (3.2) and $\Delta\hat u$ are the two-step residuals in the differenced model. In general, under the null that the moment conditions are valid, $\mathrm{Sar}_d$ is asymptotically chi-squared distributed with $m_d - k$ degrees of freedom, where $m_d$ is the number of moment conditions and $k$ is the number of estimated parameters. For the system estimator, the same test is readily defined; call this test $\mathrm{Sar}_s$. A test for the validity of the levels moment conditions that are utilised by the system estimator is then obtained as the difference between $\mathrm{Sar}_s$ and $\mathrm{Sar}_d$:

$$\text{Dif-Sar} = \mathrm{Sar}_s - \mathrm{Sar}_d, \tag{10.1}$$

and Dif-Sar is asymptotically chi-squared distributed with $m_s - m_d$ degrees of freedom under the null that the levels moment conditions are valid.
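Once the per-individual moment contributions $g_i = Z_i'\hat u_i$ and a weight matrix are available, the statistics above reduce to a quadratic form and a chi-squared tail probability. The plain-Python sketch below assumes that the caller supplies $W_N$, for example the inverse of the average outer product of the one-step moment contributions, as in (3.2). The chi-squared tail uses the closed forms that exist for integer degrees of freedom, and all function names are illustrative. Dif-Sar can be negative in finite samples; in that case the sketch reports a p-value of one.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared(df), integer df >= 1."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # odd df: erfc term plus a finite series
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def sargan(moments, weight):
    """(1/N) (sum_i g_i)' W (sum_i g_i), where each g_i = Z_i' u_i has length m."""
    n, m = len(moments), len(moments[0])
    g = [sum(gi[j] for gi in moments) for j in range(m)]
    return sum(g[a] * weight[a][b] * g[b] for a in range(m) for b in range(m)) / n


def dif_sargan(sar_s, sar_d, m_s, m_d):
    """Dif-Sar of (10.1) with its m_s - m_d degrees of freedom and p-value."""
    stat, df = sar_s - sar_d, m_s - m_d
    return stat, df, chi2_sf(stat, df)
```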
11. MONTE CARLO RESULTS

This section illustrates the performance of the various estimators, as discussed above, for a dynamic multivariate panel data model. In particular, the effect of weak instruments and the potential gains from exploiting initial conditions restrictions are investigated. The model specification is

$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + \eta_i + v_{it} \tag{11.1}$$

$$x_{it} = \rho x_{i,t-1} + \tau\eta_i + \theta v_{it} + e_{it} \tag{11.2}$$

with

$$\eta_i \sim N(0, \sigma_\eta^2); \quad v_{it} \sim N(0, \sigma_v^2); \quad e_{it} \sim N(0, \sigma_e^2),$$

and the initial observations are drawn from the covariance stationary distribution. Although these errors are homoskedastic, we do not consider any of the additional moment conditions that require homoskedasticity in the simulated estimators. We choose the error process parameters in such a way that the $x_{it}$ process is highly persistent for high values of $\rho$. Further, $x_{it}$ is positively correlated with $\eta_i$, and the value of $\theta$ is negative to mimic the effects of measurement error. The values of the parameters that are kept fixed in the various Monte Carlo simulations presented below are

$$\beta = 1, \quad \tau = 0.25, \quad \theta = -0.1, \quad \sigma_\eta^2 = 1, \quad \sigma_v^2 = 1, \quad \sigma_e^2 = 0.16.$$
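The design can be approximated in a few lines of plain Python. Two shortcuts are our own and are not the chapter's procedure. First, each series starts at its stationary mean conditional on $\eta_i$, and a burn-in of 50 periods stands in for drawing the initial observations from the exact covariance stationary distribution. Second, only the two simplest estimators of $\rho$ are computed: OLS in levels and Within Groups. The autoregressive coefficients $\alpha$ and $\rho$ are left as arguments. With $\rho = 0.5$ and $T = 4$, the two estimators show the upward and downward biases reported for OLS and WG in Table 2.

```python
import random


def simulate_design(n, t_obs, alpha, rho, beta=1.0, tau=0.25, theta=-0.1,
                    sig_eta=1.0, sig_v=1.0, sig_e=0.4, burn=50, seed=0):
    """Simulate (11.1)-(11.2); returns y[i], x[i] (t_obs values each) and eta[i].

    sig_e = 0.4 corresponds to sigma_e^2 = 0.16.
    """
    rng = random.Random(seed)
    ys, xs, etas = [], [], []
    for _ in range(n):
        eta = rng.gauss(0.0, sig_eta)
        # start at the stationary means given eta (the burn-in is an approximation)
        x = tau * eta / (1.0 - rho)
        y = (beta * x + eta) / (1.0 - alpha)
        yi, xi = [], []
        for s in range(burn + t_obs):
            v = rng.gauss(0.0, sig_v)
            e = rng.gauss(0.0, sig_e)
            x = rho * x + tau * eta + theta * v + e      # (11.2)
            y = alpha * y + beta * x + eta + v           # (11.1)
            if s >= burn:
                yi.append(y)
                xi.append(x)
        ys.append(yi)
        xs.append(xi)
        etas.append(eta)
    return ys, xs, etas


def ols_ar1(panel):
    """Pooled OLS of x_it on x_{i,t-1} without intercept (all means are zero here)."""
    num = den = 0.0
    for s in panel:
        for t in range(1, len(s)):
            num += s[t] * s[t - 1]
            den += s[t - 1] ** 2
    return num / den


def within_ar1(panel):
    """Within Groups: OLS after removing individual means over t = 2..T."""
    num = den = 0.0
    for s in panel:
        cur, lag = s[1:], s[:-1]
        mc, ml = sum(cur) / len(cur), sum(lag) / len(lag)
        for a, b in zip(cur, lag):
            num += (a - mc) * (b - ml)
            den += (b - ml) ** 2
    return num / den


if __name__ == "__main__":
    _, x, _ = simulate_design(500, 4, alpha=0.5, rho=0.5)
    print("OLS rho:", round(ols_ar1(x), 3), " WG rho:", round(within_ar1(x), 3))
```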
The parameters that are varied in the simulations are the autoregressive coefficients $\alpha$ and $\rho$. We consider four designs, with $\alpha$ and $\rho$ both taking the values 0.5 and 0.95. The case with $\alpha = 0.5$ and $\rho = 0.95$ resembles the production function data that will be analysed in the next section. The sample size is $N = 500$, and the simulation results for the various estimators are presented in Tables 2 and 3 for $T = 4$ and in Tables 4 and 5 for $T = 8$. Means, standard deviations and root mean squared errors (rmse) from 10,000 simulations are tabulated for the OLS levels estimator (OLS), the Within Groups estimator (WG), the GMM first-differenced estimator (DIF), the non-linear GMM estimator (AS),¹³ the levels GMM estimator (LEV), and the system GMM estimator (SYS). Thus, for the case of estimating the AR(1) model for $x_{it}$, DIF uses the moment conditions (3.1); AS uses the moment conditions (3.1) and (5.1); LEV uses the moment conditions (6.4); and SYS uses the moment conditions (3.1) and (6.2). The reported results are for the two-step GMM estimators.

Table 2. Monte Carlo results, T = 4, ρ = 0.5, β = 1, N = 500

                  OLS                WG                 DIF                AS                 LEV                SYS
             Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse
ρ           0.762 0.017 0.263 -0.036 0.030 0.538  0.496 0.090 0.091  0.501 0.075 0.075  0.502 0.059 0.059  0.500 0.055 0.055
α = 0.5
  α         0.820 0.011 0.320  0.010 0.031 0.491  0.469 0.131 0.135  0.516 0.095 0.096  0.512 0.070 0.070  0.512 0.060 0.061
  β         0.775 0.053 0.231  0.318 0.080 0.687  0.915 0.420 0.428  1.006 0.351 0.351  1.029 0.336 0.337  1.015 0.257 0.257
α = 0.95
  α         0.990 0.001 0.040  0.300 0.032 0.651  0.350 0.487 0.773  0.840 0.242 0.266  0.980 0.029 0.042  0.979 0.033 0.044
  β         0.583 0.053 0.420  0.194 0.075 0.809 -0.195 0.994 1.554  0.790 0.524 0.565  1.004 0.289 0.289  1.000 0.232 0.232

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.

Table 3. Monte Carlo results, T = 4, ρ = 0.95, β = 1, N = 500

                  OLS                WG                 DIF                AS                 LEV                SYS
             Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse
ρ           0.997 0.002 0.047  0.221 0.032 0.729  0.472 0.825 0.954  0.868 0.221 0.235  0.961 0.144 0.145  0.953 0.096 0.096
α = 0.5
  α         0.650 0.014 0.151  0.089 0.031 0.412  0.466 0.103 0.109  0.500 0.065 0.065  0.518 0.053 0.056  0.514 0.044 0.046
  β         0.830 0.034 0.174  0.551 0.090 0.458  0.517 1.438 1.522  1.021 0.461 0.461  1.078 0.160 0.178  1.075 0.153 0.170
α = 0.95
  α         0.962 0.001 0.012  0.661 0.026 0.290  0.907 0.104 0.112  0.936 0.072 0.074  0.957 0.008 0.010  0.956 0.010 0.011
  β         0.904 0.026 0.100  0.465 0.089 0.543  0.233 1.769 1.928  0.863 0.853 0.864  1.020 0.091 0.093  1.020 0.090 0.092

Table 4. Monte Carlo results, T = 8, ρ = 0.5, β = 1, N = 500

                  OLS                WG                 DIF                AS                 LEV                SYS
             Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse
ρ           0.762 0.012 0.262  0.265 0.018 0.236  0.494 0.034 0.035  0.495 0.025 0.026  0.503 0.029 0.029  0.501 0.024 0.024
α = 0.5
  α         0.820 0.007 0.320  0.311 0.017 0.190  0.480 0.040 0.045  0.497 0.029 0.029  0.523 0.034 0.041  0.511 0.027 0.029
  β         0.775 0.034 0.228  0.490 0.045 0.512  0.930 0.136 0.153  0.944 0.134 0.145  1.041 0.157 0.162  0.997 0.124 0.124
α = 0.95
  α         0.990 0.001 0.040  0.662 0.016 0.289  0.548 0.177 0.440  0.969 0.030 0.036  0.982 0.007 0.032  0.979 0.011 0.031
  β         0.581 0.035 0.421  0.388 0.044 0.613  0.226 0.356 0.852  0.972 0.134 0.137  0.979 0.108 0.110  0.983 0.101 0.103

Table 5. Monte Carlo results, T = 8, ρ = 0.95, β = 1, N = 500

                  OLS                WG                 DIF                AS                 LEV                SYS
             Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse   Mean  St D  rmse
ρ           0.997 0.001 0.047  0.591 0.017 0.359  0.676 0.222 0.350  0.903 0.061 0.077  0.973 0.022 0.032  0.958 0.031 0.032
α = 0.5
  α         0.650 0.009 0.150  0.396 0.015 0.106  0.480 0.033 0.039  0.508 0.024 0.025  0.523 0.022 0.032  0.518 0.021 0.028
  β         0.830 0.022 0.171  0.796 0.040 0.208  0.800 0.290 0.352  1.099 0.125 0.159  1.084 0.058 0.101  1.075 0.059 0.095
α = 0.95
  α         0.962 0.001 0.012  0.882 0.009 0.068  0.927 0.025 0.034  0.956 0.007 0.009  0.957 0.002 0.007  0.957 0.003 0.007
  β         0.902 0.017 0.100  0.745 0.040 0.258  0.615 0.400 0.555  1.016 0.118 0.119  1.017 0.028 0.033  1.019 0.031 0.036

Tables 2 and 4 present results for $\rho = 0.5$. The row labelled $\rho$ presents the results for the estimates of $\rho$ in model (11.2), where the various GMM estimators only utilise lagged information on $x$ as instruments, and potential information from the lagged values of $y$ is not used. Our results for the DIF and SYS estimators can therefore be compared to those reported in, for example, Blundell & Bond (1998) and Alonso-Borrego & Arellano (1999). As expected, the OLS estimates are biased upwards and the WG estimates are biased downwards. In this experiment, where $x_{it}$ is not highly persistent and the instruments available for the equations in first-differences are not weak, all four GMM estimators are virtually unbiased. The AS, LEV and SYS estimators all provide an improvement in precision compared to the standard DIF estimator. As we would expect from the asymptotic variance ratios in Table 1, there is a greater gain in precision from using SYS rather than AS at $T = 4$, although in Table 4 we can observe that this difference becomes very small at $T = 8$. The next two rows in Tables 2 and 4 present the estimation results for $\alpha$ and $\beta$ in model (11.1) when $\alpha = 0.5$ and $\rho = 0.5$. The OLS estimates for $\alpha$ are biased upwards, whereas those for $\beta$ are biased downwards. The WG estimates for $\alpha$ and $\beta$ are both biased downwards. Again, as expected, since both the $y$ and $x$ series have a low degree of persistence, the four GMM estimators perform quite well in this experiment. The SYS estimator has the smallest rmse for both parameters, but the gains are not dramatic at $T = 8$. The final two rows in Tables 2 and 4 are for the model with $\alpha = 0.95$ and $\rho = 0.5$.
As this makes the $y$ process highly persistent, the DIF estimator suffers from a serious weak instrument bias, as well as being very imprecise. We can notice that the DIF estimates of $\alpha$ and $\beta$ are both biased downwards, in the direction of the Within Groups estimates. The AS estimator is better behaved, as a result of exploiting the non-linear moment conditions (5.1). However, the LEV and SYS estimators, which exploit the initial conditions restrictions, provide more dramatic gains in precision, particularly for the estimation of $\alpha$ and particularly in the case with $T = 4$. With $T = 8$, the LEV and SYS estimates of $\alpha$ are biased upwards, in the direction of the OLS estimate, but still dominate on the rmse criterion.

Tables 3 and 5 present the results for the cases where the $x_{it}$ process is highly persistent, with $\rho = 0.95$. The estimates for $\rho$ show the familiar pattern: OLS is biased upwards, WG is biased downwards, and DIF is biased downwards towards WG as a result of weak instruments. The AS estimator provides a substantial improvement in both bias and precision. However, the LEV and SYS estimators provide more dramatic gains, particularly when $T = 4$. When $\alpha = 0.5$, the DIF estimator estimates $\alpha$ quite well, but the DIF estimate of $\beta$ is very imprecise, biased downwards and on average very similar to the WG estimate of $\beta$. The AS, LEV and SYS estimates of $\alpha$ are all close to the true value. The AS estimates of $\beta$ are much less biased than DIF but still imprecise, particularly at $T = 4$. The LEV and SYS estimates of $\beta$ show a little finite-sample bias, but again dominate in terms of rmse. This experiment is intended to capture salient features of the production function data we consider in Section 12, notably a highly persistent explanatory variable that is measured with error, and a significant autoregressive parameter that is not close to one. The simulation results confirm that the system GMM estimator has reasonable properties in this context. When both $\alpha$ and $\rho$ are equal to 0.95 the estimators display a similar pattern. One surprise is that the LEV and SYS estimators actually estimate both parameters better than in the experiments with $\alpha = 0.5$, and the gain from using either of these estimators compared to AS is rather more striking in this case. Also, the DIF estimator now estimates $\alpha$ quite well (though not $\beta$); this may be because, by increasing $\alpha$ whilst keeping the variances of $\eta_i$ and $v_{it}$ fixed, we have greatly increased the variance of the $y_{it}$ series. To investigate the size properties of the Sargan test of overidentifying restrictions, we present in Figures 3–14 p-value plots (see Davidson & MacKinnon, 1996) for the Sargan test statistics for the DIF and SYS GMM estimators. We also present the p-value plots for the Dif-Sar statistic as defined in (10.1), testing the validity of the additional levels moment conditions exploited by the SYS estimator.
The x-axis of the p-value plots represents the nominal size using the asymptotic critical values of the corresponding chi-squared distributions; the y-axis represents the actual size of the test statistics in the experiments. Figures 3–6 are the p-value plots for the Sargan tests for the GMM estimators in the univariate model for $x_{it}$, (11.2). When $\rho = 0.5$, the distributions of the test statistics are all very close to the asymptotic distribution, with a slight over-rejection when $T = 8$. When the series are persistent ($\rho = 0.95$), the tests over-reject, especially for larger $T$, with the Dif-Sar test having the largest size distortion when $T = 4$. Figures 7–14 present the p-value plots for the Sargan test statistics for the multivariate dynamic panel data model (11.1). These appear to be well behaved in the case with $\alpha = 0.5$ and $\rho = 0.5$. In general, the Dif-Sar test is oversized when either $y$ or $x$ or both are persistent. An interesting case is when $\alpha = 0.5$, $\rho = 0.95$ and $T = 8$. The $\mathrm{Sar}_s$ and Dif-Sar tests are considerably oversized in this case, whereas the $\mathrm{Sar}_d$ test has the correct size.
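A p-value plot in the sense of Davidson & MacKinnon (1996) is the empirical distribution function of the simulated p-values, evaluated on a grid of nominal sizes. A correctly sized test lies on the 45° line. The plain-Python sketch below, with illustrative function names, computes the plotted points. It checks them on draws that are exactly chi-squared with 2 degrees of freedom, for which the tail probability is $\exp(-x/2)$.

```python
import math
import random


def pvalue_plot(pvalues, sizes):
    """Actual rejection frequency at each nominal size: share of p-values <= size."""
    n = len(pvalues)
    return [sum(1 for p in pvalues if p <= a) / n for a in sizes]


def chi2_2_pvalues(reps, seed=0):
    """p-values of statistics drawn exactly from chi-squared(2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        stat = rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
        out.append(math.exp(-stat / 2.0))   # P(chi2_2 > stat)
    return out


if __name__ == "__main__":
    sizes = [0.01, 0.05, 0.10, 0.25, 0.50]
    print(pvalue_plot(chi2_2_pvalues(20000), sizes))
```

With simulated Sargan or Dif-Sar statistics in place of the exact chi-squared draws, points above the 45° line indicate over-rejection, as in Figs. 3–14.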
Fig. 3. p-value plot, $\rho = 0.5$, $T = 4$.
Fig. 4. p-value plot, $\rho = 0.95$, $T = 4$.
Fig. 5. p-value plot, $\rho = 0.5$, $T = 8$.
Fig. 6. p-value plot, $\rho = 0.95$, $T = 8$.
Fig. 7. $\alpha = 0.5$, $\rho = 0.5$, $T = 4$.
Fig. 8. $\alpha = 0.5$, $\rho = 0.95$, $T = 4$.
Fig. 9. $\alpha = 0.5$, $\rho = 0.5$, $T = 8$.
Fig. 10. $\alpha = 0.5$, $\rho = 0.95$, $T = 8$.
Fig. 11. $\alpha = 0.95$, $\rho = 0.5$, $T = 4$.
Fig. 12. $\alpha = 0.95$, $\rho = 0.95$, $T = 4$.
Fig. 13. $\alpha = 0.95$, $\rho = 0.5$, $T = 8$.
Fig. 14. $\alpha = 0.95$, $\rho = 0.95$, $T = 8$.
12. AN APPLICATION: THE COBB–DOUGLAS PRODUCTION FUNCTION

As Griliches and Mairesse (1998) have argued, the estimation of production functions has highlighted the poor performance of standard GMM estimators for short panels. Here we use the problem of estimating production function parameters to evaluate the practical significance of the alternative estimators reviewed in this chapter. In particular, attention is focused on the estimation of the Cobb–Douglas production function

$$\begin{aligned} y_{it} &= \beta_n n_{it} + \beta_k k_{it} + \gamma_t + (\eta_i + v_{it} + m_{it}) \\ v_{it} &= \rho v_{i,t-1} + e_{it}, \qquad |\rho| < 1 \\ e_{it},\ m_{it} &\sim \mathrm{MA}(0) \end{aligned} \tag{12.1}$$

where $y_{it}$ is log sales of firm $i$ in year $t$, $n_{it}$ is log employment, $k_{it}$ is log capital stock and $\gamma_t$ is a year-specific intercept reflecting, for example, a common technology shock. Of the error components, $\eta_i$ is an unobserved time-invariant firm-specific effect, $v_{it}$ is a possibly autoregressive (productivity) shock and $m_{it}$ reflects serially uncorrelated (measurement) errors. Constant returns to scale would imply $\beta_n + \beta_k = 1$, but this is not necessarily imposed. Interest is in the consistent estimation of the parameters $(\beta_n, \beta_k, \rho)$ when the number of firms ($N$) is large and the number of years ($T$) is fixed. We maintain that both employment ($n_{it}$) and capital ($k_{it}$) are potentially correlated with the firm-specific effects ($\eta_i$), and with both productivity shocks ($e_{it}$) and measurement errors ($m_{it}$). The model has a dynamic (common factor) representation

$$y_{it} = \beta_n n_{it} - \rho\beta_n n_{i,t-1} + \beta_k k_{it} - \rho\beta_k k_{i,t-1} + \rho y_{i,t-1} + (\gamma_t - \rho\gamma_{t-1}) + \left(\eta_i(1-\rho) + e_{it} + m_{it} - \rho m_{i,t-1}\right) \tag{12.2}$$

or

$$y_{it} = \pi_1 n_{it} + \pi_2 n_{i,t-1} + \pi_3 k_{it} + \pi_4 k_{i,t-1} + \pi_5 y_{i,t-1} + \gamma_t^* + (\eta_i^* + w_{it}) \tag{12.3}$$

subject to two non-linear (common factor) restrictions, $\pi_2 = -\pi_1\pi_5$ and $\pi_4 = -\pi_3\pi_5$. Given consistent estimates of the unrestricted parameter vector $\pi = (\pi_1, \pi_2, \pi_3, \pi_4, \pi_5)$ and $\mathrm{var}(\hat\pi)$, these restrictions can be (tested and) imposed using minimum distance to obtain the restricted parameter vector $(\beta_n, \beta_k, \rho)$. Notice that $w_{it} = e_{it} \sim \mathrm{MA}(0)$ if there are no measurement errors ($\mathrm{var}(m_{it}) = 0$), and $w_{it} \sim \mathrm{MA}(1)$ otherwise.
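The minimum distance step can be sketched as follows. For a fixed $\rho$, the restricted problem is linear in $\beta_n$ and $\beta_k$, so both have closed forms and only $\rho$ needs a numerical search. The search here is a coarse grid followed by golden-section refinement. Two simplifications in this plain-Python sketch are assumptions and not the chapter's procedure. First, the weights are a diagonal stand-in for $\mathrm{var}(\hat\pi)^{-1}$, so covariances between the $\hat\pi_j$ are ignored. Second, the search is restricted to $|\rho| < 1$. The minimised criterion is the analogue of the Comfac statistic, and it is asymptotically $\chi^2(2)$ only when the full inverse covariance matrix is used. The function names and the numbers in the usage lines are hypothetical.

```python
def restricted_pi(bn, bk, rho):
    """(pi_1, ..., pi_5) implied by (12.2): pi_2 = -pi_1*pi_5, pi_4 = -pi_3*pi_5."""
    return (bn, -rho * bn, bk, -rho * bk, rho)


def md_common_factor(pi_hat, w, grid_size=1999, iters=100):
    """Minimise sum_j w_j (pi_hat_j - h_j(bn, bk, rho))^2 over (bn, bk, rho).

    w holds five positive weights (a diagonal stand-in for var(pi_hat)^{-1}).
    Returns (bn, bk, rho, criterion).
    """
    def concentrated(rho):
        # closed-form bn, bk given rho (one-variable weighted least squares each)
        bn = (w[0] * pi_hat[0] - w[1] * rho * pi_hat[1]) / (w[0] + w[1] * rho ** 2)
        bk = (w[2] * pi_hat[2] - w[3] * rho * pi_hat[3]) / (w[2] + w[3] * rho ** 2)
        h = restricted_pi(bn, bk, rho)
        q = sum(wj * (pj - hj) ** 2 for wj, pj, hj in zip(w, pi_hat, h))
        return q, bn, bk

    lo_bound, hi_bound = -0.999, 0.999
    step = (hi_bound - lo_bound) / (grid_size - 1)
    grid = [lo_bound + i * step for i in range(grid_size)]
    r0 = min(grid, key=lambda r: concentrated(r)[0])
    lo, hi = max(lo_bound, r0 - step), min(hi_bound, r0 + step)
    g = (5 ** 0.5 - 1) / 2                      # golden-section ratio
    for _ in range(iters):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if concentrated(a)[0] < concentrated(b)[0]:
            hi = b
        else:
            lo = a
    rho = (lo + hi) / 2
    q, bn, bk = concentrated(rho)
    return bn, bk, rho, q


if __name__ == "__main__":
    # hypothetical unrestricted estimates that satisfy the restrictions exactly
    print(md_common_factor(restricted_pi(0.5, 0.4, 0.6), [1.0] * 5))
```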
12.1. Data and Results The data used is a balanced panel of 509 R&D-performing U.S. manufacturing companies observed for 8 years, 198289. These data were kindly made available to us by Bronwyn Hall, and are similar to those used in Mairesse & Hall (1996), although the sample of 509 rms used here is larger than the nal sample of 442 rms used in Mairesse & Hall (1996). Capital stock and employment are measured at the end of the rms accounting year, and sales is used as a proxy for output. Further details of the data construction can be found in Mairesse & Hall (1996). Table 6 reports results for the basic production function, not imposing constant returns to scale, for a range of estimators. We report results for both the unrestricted model (12.3) and the restricted model (12.1), where the common factor restrictions are tested and imposed using minimum distance.14 We report results here for the one-step GMM estimators, for which inference based on the asymptotic variance matrix has been found to be more reliable than for the (asymptotically) more efcient two-step estimator. Simulations suggest that the loss in precision that results from not using the optimal weight matrix is unlikely to be large (cf. Blundell & Bond, 1998). As expected in the presence of rm-specic effects, OLS levels appears to give an upwards-biased estimate of the coefcient on the lagged dependent variable, whilst Within Groups appears to give a downwards-biased estimate of this coefcient. Note that even using OLS, we reject the hypothesis that = 1, and even using Within Groups we reject the hypothesis that = 0. Although the pattern of signs on current and lagged regressors in the unrestricted models are consistent with the AR(1) error-component specication, the common factor restrictions are rejected for both these estimators. 
They also reject constant returns to scale.15 The validity of lagged levels dated t − 2 as instruments in the first-differenced equations is clearly rejected by the Sargan test of overidentifying restrictions. This is consistent with the presence of measurement errors. Instruments dated t − 3 (and earlier) are accepted, and the test of common factor restrictions is easily passed in these first-differenced GMM results. However, the estimated coefficient on the lagged dependent variable is barely higher than the Within Groups estimate. Indeed, the differenced GMM parameter estimates are all very close to the Within Groups results. The estimate of $\beta_k$ is low and statistically weak, and the constant returns to scale restriction is rejected.

The validity of lagged levels dated t − 3 (and earlier) as instruments in the first-differenced equations, combined with lagged first-differences dated t − 2 as instruments in the levels equations, appears to be marginal in the system GMM estimator. However, we have seen that these tests do have some tendency to overreject in samples of this size. Moreover, the Dif-Sar statistic that specifically tests the additional moment conditions used in the levels equations accepts their validity at the 10% level. The system GMM parameter estimates appear to be reasonable. The estimated coefficient on the lagged dependent variable is higher than the Within Groups estimate, but well below the OLS levels estimate. The common factor restrictions are easily accepted, and the estimate of $\beta_k$ is both higher and better determined than the differenced GMM estimate. The constant returns to scale restriction is easily accepted in the system GMM results.16

Blundell & Bond (2000) explore this data in more detail and conclude that the system GMM estimates in the final column of Table 6 are their preferred results. In particular, they find that the individual series used here are highly persistent, and that the instruments available for the first-differenced equations are only weakly correlated with the explanatory variables in first-differences. This is consistent with the similarity between the first-differenced GMM and Within Groups results. Blundell & Bond (2000) also find that when constant returns to scale is imposed on the production function (it is not rejected in the preferred system GMM results), the results obtained using the first-differenced GMM estimator become more similar to the system GMM estimates.

Table 6. Production Function Estimates

          OLS Levels     Within Groups  DIF t−2        DIF t−3        SYS t−2        SYS t−3
n_t       0.479 (0.029)  0.488 (0.030)  0.513 (0.089)  0.499 (0.101)  0.629 (0.106)  0.472 (0.112)
n_{t−1}   0.423 (0.031)  0.023 (0.034)  0.073 (0.093)  0.147 (0.113)  0.092 (0.108)  0.278 (0.120)
k_t       0.235 (0.035)  0.177 (0.034)  0.132 (0.118)  0.194 (0.154)  0.361 (0.129)  0.398 (0.152)
k_{t−1}   0.212 (0.035)  0.131 (0.025)  0.207 (0.095)  0.105 (0.110)  0.326 (0.104)  0.209 (0.119)
y_{t−1}   0.922 (0.011)  0.404 (0.029)  0.326 (0.052)  0.426 (0.079)  0.462 (0.051)  0.602 (0.098)
m1        2.60           8.89           6.21           4.84           8.14           6.53
m2        2.06           1.09           1.36           0.69           0.59           0.35
Sar       –              –              0.001          0.073          0.000          0.032
Dif-Sar   –              –              –              –              0.001          0.102
β_n       0.538 (0.025)  0.488 (0.030)  0.583 (0.085)  0.515 (0.099)  0.773 (0.093)  0.479 (0.098)
β_k       0.266 (0.032)  0.199 (0.033)  0.062 (0.079)  0.225 (0.126)  0.231 (0.075)  0.492 (0.074)
ρ         0.964 (0.006)  0.512 (0.022)  0.377 (0.049)  0.448 (0.073)  0.509 (0.048)  0.565 (0.078)
Comfac    0.000          0.000          0.014          0.711          0.012          0.772
CRS       0.000          0.000          0.000          0.006          0.922          0.641

Asymptotic standard errors in parentheses. Year dummies included in all models. m1 and m2 are tests for first- and second-order serial correlation, asymptotically N(0, 1). We test the levels residuals for OLS levels, and the first-differenced residuals in all other columns. Comfac is a minimum distance test of the non-linear common factor restrictions imposed in the restricted models. P-values are reported (also for Sar and Dif-Sar). CRS is a Wald test of the constant returns to scale hypothesis β_n + β_k = 1 in the restricted models. P-values are reported. Source: Blundell & Bond (2000). For the one-step GMM estimators, t − s indicates that levels of the three series (y, n, k) dated t − s and all observed longer lags are used as instruments for the first-differenced equations. SYS estimators use lagged differences of the three series dated t − s + 1 as instruments for the levels equations.
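The Dif-Sar statistic in Table 6 is a difference-in-Sargan test. The sketch below is minimal and assumes the standard construction. It takes the Sargan statistic of the system estimator, subtracts the Sargan statistic of the first-differenced estimator, and refers the difference to a χ² distribution. The degrees of freedom equal the number of additional (levels) moment conditions.

The function names and the Sargan values in this sketch are hypothetical. Table 6 reports only p-values. The χ² tail probability is computed from the series for the regularized incomplete gamma function, so the sketch needs only the standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared(df).

    Uses the series for the regularized lower incomplete gamma
    function P(a, z) with a = df/2 and z = x/2 (stdlib only).
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a            # n = 0 term: 1 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)   # z^n / (a (a+1) ... (a+n))
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def difference_sargan(sargan_sys: float, df_sys: int,
                      sargan_dif: float, df_dif: int) -> tuple[float, int, float]:
    """Difference-in-Sargan test of the extra levels moment conditions.

    The system estimator uses every moment condition of the
    first-differenced estimator plus (df_sys - df_dif) additional ones.
    A negative difference is assigned a p-value of 1.
    """
    stat = sargan_sys - sargan_dif
    df = df_sys - df_dif
    return stat, df, chi2_sf(stat, df)


# Hypothetical Sargan statistics, for illustration only.
stat, df, p = difference_sargan(sargan_sys=80.0, df_sys=60,
                                sargan_dif=62.0, df_dif=48)
print(f"Dif-Sargan = {stat:.1f} on {df} df, p-value = {p:.3f}")
```

In applied work a library χ² survival function would normally replace `chi2_sf`. The hand-rolled version just keeps the sketch dependency-free.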
13. SUMMARY AND CONCLUSIONS

The aim of this chapter has been to review developments in the recent literature which have tried to improve on the poor performance of the standard first-differenced GMM estimator for highly autoregressive panel series by using additional moment conditions. In particular, we discuss the use of the system GMM estimator that relies on relatively mild restrictions on the initial conditions process. This system GMM estimator encompasses the GMM estimator based on the non-linear moment conditions available in the dynamic error components model and has substantial asymptotic efficiency gains relative to this non-linear GMM estimator. The chapter systematically sets out the assumptions required and moment conditions used by each estimator and provides a Monte Carlo simulation comparison as well as an application to production function estimation. The simulation results are the first in the literature to consider the properties of these GMM estimators in dynamic models with endogenous regressors. Our analysis suggests that similar issues arise in this case to those that have been found in previous Monte Carlo studies for the AR(1) model. In particular, we find both a large bias and very low precision for the standard first-differenced estimator when the individual series are highly persistent. By exploiting instruments available for the equations in levels, the system GMM estimator can both greatly improve the precision and greatly reduce the finite sample bias when these additional moment conditions are valid. Intermediate results are found for the non-linear GMM estimator considered, which suggests that this estimator could also be useful in applications with persistent series where the validity of the initial conditions restrictions required for the system GMM estimator is rejected.

The empirical application uses company accounts data for the US to estimate a simple Cobb-Douglas production function. For the standard GMM estimator that uses moment conditions only for the first-differenced equations, we confirm the problems noted by Griliches and Mairesse: the estimated coefficient on capital is very low, all coefficient estimates are imprecise, and constant returns to scale is easily rejected. We note that the first-differenced GMM results are similar to the Within Groups results, which suggests there may be a problem of weak instruments. This suggestion is consistent with the persistence of the underlying sales, employment and capital stock series. The additional moment conditions used by the system GMM estimator are not rejected in this context, and lead to a marked improvement in the empirical results.

Taken together, these Monte Carlo and empirical results suggest that careful consideration of the underlying series and comparisons between different panel data estimators can be useful in detecting situations where the standard first-differenced GMM estimator is likely to be subject to serious weak instruments biases. Where appropriate, the use of the system GMM estimator offers a simple and powerful alternative that can overcome many of the disappointing features of the standard first-differenced GMM estimator in the context of highly persistent series.
ACKNOWLEDGMENTS
This research is part of the programme of research at the ESRC Centre for the Micro-Economic Analysis of Fiscal Policy at IFS. Financial support from the ESRC is gratefully acknowledged.
NOTES
1. All of the estimators discussed and their properties extend in an obvious fashion to higher order autoregressive models.
2. Extensions to dynamic models with additional regressors are considered in Section 9.
3. With T = 3, the absence of serial correlation in $v_{it}$ (2.5) and predetermined initial conditions (2.6) are required to identify $\alpha$ (in the absence of any strictly exogenous instruments). With T > 3, $\alpha$ can be identified in the presence of suitably low order moving average autocorrelation in $v_{it}$.
4. These estimators are all based on the normalisation (2.3). Alonso-Borrego & Arellano (1999) consider a symmetrically normalised instrumental variable estimator based on the normalisation invariance of the standard LIML estimator.
5. As a choice of $W_N$ to yield the initial consistent estimator, Arellano & Bond (1991) suggest

$$W_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_{di}' H_d Z_{di} \right)^{-1}$$

where $H_d$ is the $(T-2) \times (T-2)$ matrix given by

$$H_d = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ 0 & -1 & 2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 2 \end{pmatrix}$$

which can be calculated in one step. The use of this $H_d$ matrix accounts for the first-order moving average structure in $\Delta u_{it}$ induced by the first-differencing transformation. Note that when the $v_{it}$ are i.i.d., the one-step and two-step estimators are asymptotically equivalent in this model. We follow this suggestion in the Monte Carlo simulations in Section 11.
6. As shown by Arellano & Bover (1995), OLS on the model transformed to orthogonal deviations coincides with the Within Groups estimator.
7. In this section we focus only on moment conditions that are valid under heteroskedasticity. The case with homoskedasticity and assumption (6.1) is considered in Section 7.2.
8. This corrects the expression for plim l as given in Blundell and Bond (1998, p. 125).
9. As a choice of $W_N$ to yield the initial consistent estimator, we use
$$W_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_{li}' Z_{li} \right)^{-1}$$

in the Monte Carlo simulations reported below.
10. The use of moment conditions $E(u_{it} \Delta y_{i,t-s}) = 0$ for $s > 1$ can be shown to be redundant, given (7.1) and (7.2). For balanced panels, the T − 2 equations in levels may be replaced by a single levels equation for period T, with (7.2) replaced by the equivalent moment conditions $E(u_{iT} \Delta y_{i,T-s}) = 0$ for $s = 1, \ldots, T-2$. However, this approach does not extend easily to the case of unbalanced panels.
11. For an analysis of the potential loss in efficiency due to specific choices of the initial weight matrix for these system estimators, see Windmeijer (2000). As a choice of $W_N$ to yield the initial consistent estimator, we use

$$W_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_{si}' H_s Z_{si} \right)^{-1}$$

in our Monte Carlo simulations, where $H_s$ is the matrix

$$H_s = \begin{pmatrix} H_d & 0 \\ 0 & I_{T-2} \end{pmatrix},$$

$I_{T-2}$ is the $(T-2)$ identity matrix and $H_d$ is defined in Section 3.
12. Here we only consider moment conditions that do not require any homoskedasticity assumptions.
13. Define $s_i = [u_{i3} - u_{i2}, \ldots, u_{iT} - u_{i,T-1}, u_{i4}(u_{i3} - u_{i2}), \ldots, u_{iT}(u_{i,T-1} - u_{i,T-2})]'$ and

$$Z_{nli} = \begin{pmatrix} Z_{di} & 0 \\ 0 & I_{T-3} \end{pmatrix};$$

then the non-linear moment conditions can be written as $E[Z_{nli}' s_i] = 0$. As an initial weight matrix we use

$$W_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_{nli}' Z_{nli} \right)^{-1},$$

see Meghir & Windmeijer (1999).
14. The unrestricted results are computed using DPD98 for GAUSS (see Arellano & Bond, 1998).
15. The table reports p-values from minimum distance tests of the common factor restrictions and Wald tests of the constant returns to scale restrictions.
16. One puzzle is that we find little evidence of second-order serial correlation in the first-differenced residuals (i.e. an MA(1) component in the error term in levels), although the use of instruments dated t − 2 is strongly rejected. It may be that the $e_{it}$ productivity shocks are also MA(1), in a way that happens to offset the appearance of serial correlation that would otherwise result from measurement errors.
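Notes 5, 9, 11 and 13 all build the initial (one-step) weight matrix the same way: as the inverse of a cross-section average of instrument cross-products. The numpy sketch below is minimal and covers notes 5 and 11 for the AR(1) model. For note 9, pass an identity matrix in place of H.

The helper names are hypothetical, as is the layout of `Z` (one row per differenced equation, t = 3, ..., T). A pseudo-inverse stands in for the inverse in case the average is singular.

```python
import numpy as np


def h_d(T: int) -> np.ndarray:
    """(T-2) x (T-2) matrix with 2 on the diagonal and -1 on the first
    off-diagonals: the MA(1) structure of differenced i.i.d. errors."""
    m = T - 2
    return 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)


def h_s(T: int) -> np.ndarray:
    """Block-diagonal H_s = diag(H_d, I_{T-2}) for the stacked system."""
    m = T - 2
    out = np.zeros((2 * m, 2 * m))
    out[:m, :m] = h_d(T)
    out[m:, m:] = np.eye(m)
    return out


def ab_instruments(y: np.ndarray) -> np.ndarray:
    """Arellano-Bond instrument matrix Z_di for the AR(1) model: the
    differenced equation for period t (t = 3, ..., T) is instrumented
    by the levels y_{i1}, ..., y_{i,t-2}."""
    m = len(y) - 2
    Z = np.zeros((m, m * (m + 1) // 2))
    col = 0
    for row in range(m):          # row 0 corresponds to t = 3
        lags = y[: row + 1]       # y_{i1}, ..., y_{i,t-2}
        Z[row, col: col + row + 1] = lags
        col += row + 1
    return Z


def one_step_weight(Zs: list[np.ndarray], H: np.ndarray) -> np.ndarray:
    """W_N = (N^{-1} sum_i Z_i' H Z_i)^{-1}; pinv guards against singularity."""
    N = len(Zs)
    A = sum(Z.T @ H @ Z for Z in Zs) / N
    return np.linalg.pinv(A)


# Hypothetical data, used only to check shapes and symmetry.
rng = np.random.default_rng(0)
N, T = 50, 6
ys = rng.normal(size=(N, T))
Zs = [ab_instruments(y) for y in ys]
W = one_step_weight(Zs, h_d(T))
print(W.shape)
```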
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics, 68, 5–28.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalised Instrumental-Variable Estimation using Panel Data. Journal of Business and Economic Statistics, 17, 36–49.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models with Error Components. Journal of the American Statistical Association, 76, 598–606.
Arellano, M., & Bond, S. R. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. Review of Economic Studies, 58, 277–297.
Arellano, M., & Bond, S. R. (1998). Dynamic Panel Data Estimation using DPD98 for GAUSS. http://www.ifs.org.uk/staff/steve_b.shtml.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental-Variable Estimation of Error-Components Models. Journal of Econometrics, 68, 29–52.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley.
Bhargava, A., & Sargan, J. D. (1983). Estimating Dynamic Random Effects Models from Panel Data Covering Short Time Periods. Econometrica, 51, 1635–1659.
Blundell, R. W., & Bond, S. R. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics, 87, 115–143.
Blundell, R. W., & Bond, S. (2000). GMM Estimation with Persistent Panel Data: An Application to Production Functions. Econometric Reviews, 19(3), 321–340.
Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation when the Correlation between the Instruments and the Endogenous Explanatory Variable is Weak. Journal of the American Statistical Association, 90, 443–450.
Chamberlain, G. (1987). Asymptotic Efficiency in Estimation with Conditional Moment Restrictions. Journal of Econometrics, 34, 305–334.
Davidson, R., & MacKinnon, J. G. (1996). Graphical Methods for Investigating the Size and Power of Hypothesis Tests. Manchester School, 66, 1–26.
Griliches, Z., & Mairesse, J. (1998). Production Functions: the Search for Identification. In: S. Strom (Ed.), Essays in Honour of Ragnar Frisch. Econometric Society Monograph Series, Cambridge: Cambridge University Press.
Hansen, L. P. (1982). Large Sample Properties of Generalised Method of Moment Estimators. Econometrica, 50, 1029–1054.
Holtz-Eakin, D., Newey, W., & Rosen, H. S. (1988). Estimating Vector Autoregressions with Panel Data. Econometrica, 56, 1371–1396.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Mairesse, J., & Hall, B. H. (1996). Estimating the Productivity of Research and Development in French and US Manufacturing Firms: An Exploration of Simultaneity Issues with GMM Methods. In: K. Wagner & B. Van Ark (Eds), International Productivity Differences and, Their Explanations (pp. 285–315). Elsevier Science.
Meghir, C., & Windmeijer, F. (1999). Moment Conditions for Dynamic Panel Data Models with Multiplicative Individual Effects in the Conditional Variance. Annales d'Économie et de Statistique, 55/56, 317–330.
Nelson, C. R., & Startz, R. (1990a). Some Further Results on the Exact Small Sample Properties of the Instrumental Variable Estimator. Econometrica, 58, 967–976.
Nelson, C. R., & Startz, R. (1990b). The Distribution of the Instrumental Variable Estimator and Its t-ratio When the Instrument is A Poor One. Journal of Business, 63, S125–S140.
Nickell, S. J. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417–1426.
Sargan, J. D. (1958). The Estimation of Economic Relationships Using Instrumental Variables. Econometrica, 26, 329–338.
Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression with Weak Instruments. Econometrica, 65, 557–586.
Windmeijer, F. (2000). Efficiency Comparisons for a System GMM Estimator in Dynamic Panel Data Models. In: R. D. H. Heijmans, D. S. G. Pollock & A. Satorra (Eds), Innovations in Multivariate Statistical Analysis. A Festschrift for Heinz Neudecker (pp. 175–184). Kluwer Academic Publishers.
FULLY MODIFIED OLS FOR HETEROGENEOUS COINTEGRATED PANELS

Peter Pedroni
ABSTRACT
This chapter uses fully modified OLS principles to develop new methods for estimating and testing hypotheses for cointegrating vectors in dynamic panels in a manner that is consistent with the degree of cross sectional heterogeneity that has been permitted in recent panel unit root and panel cointegration studies. The asymptotic properties of various estimators are compared based on pooling along the within and between dimensions of the panel. Monte Carlo simulations of the small sample properties show that the group mean estimator behaves well even in relatively small samples under a variety of scenarios.
I. INTRODUCTION
In this chapter we develop methods for estimating and testing hypotheses for cointegrating vectors in dynamic time series panels. In particular, we propose methods based on fully modified OLS principles which are able to accommodate considerable heterogeneity across individual members of the panel. Indeed, one important advantage to working with a cointegrated panel approach of this type is that it allows researchers to selectively pool the long run information contained in the panel while permitting the short run dynamics and fixed effects to be heterogeneous among different members of the panel. An important convenience of the fully modified approach that we propose here is that in addition to producing asymptotically unbiased estimators, it also produces nuisance parameter free standard normal distributions. In this way, inferences can be made regarding common long run relationships which are asymptotically invariant to the considerable degree of short run heterogeneity that is prevalent in the dynamics typically associated with panels that are composed of aggregate national data.

A. Nonstationary Panels and Heterogeneity

Methods for nonstationary time series panels, including unit root and cointegration tests, have been gaining increased acceptance in a number of areas of empirical research. Early examples include Canzoneri, Cumby & Diba (1996), Chinn & Johnson (1996), Chinn (1997), Evans & Karras (1996), Neusser & Kugler (1998), Obstfeld & Taylor (1996), Oh (1996), Papell (1997), Pedroni (1996b), Taylor (1996) and Wu (1996), with many more since. These studies have for the most part been limited to applications which simply ask whether or not particular series appear to contain unit roots or are cointegrated. In many applications, however, it is also of interest to ask whether or not common cointegrating vectors take on particular values. In this case, it would be helpful to have a technique that allows one to test such hypotheses about the cointegrating vectors in a manner that is consistent with the very general degree of cross sectional heterogeneity that is permitted in such panel unit root and panel cointegration tests. In general, the extension of conventional nonstationary methods such as unit root and cointegration tests to panels with both cross section and time series dimensions holds considerable promise for empirical research, considering the abundance of data which is available in this form.
In particular, such methods provide an opportunity for researchers to exploit some of the attractive theoretical properties of nonstationary regressions while addressing in a natural and direct manner the small sample problems that have in the past often hindered the practical success of these methods. For example, it is well known that superconsistent rates of convergence associated with many of these methods can provide empirical researchers with an opportunity to circumvent more traditional exogeneity requirements in time series regressions. Yet the low power of many of the associated statistics has often impeded the ability to take full advantage of these properties in small samples. By allowing data to be pooled in the cross sectional dimension, nonstationary panel methods have the potential to improve upon these small sample limitations. Conversely, the use of nonstationary time series asymptotics provides an opportunity to make panel methods more amenable to pooling aggregate level data by allowing researchers to selectively pool the long run information contained in the panel, while allowing the short run dynamics to be heterogeneous among different members of the panel.

Initial methodological work on nonstationary panels focused on testing for unit roots in univariate panels. Quah (1994) derived standard normal asymptotic distributions for testing unit roots in homogeneous panels as both the time series and cross sectional dimensions grow large. Levin & Lin (1993) derived distributions under more general conditions that allow for heterogeneous fixed effects and time trends. More recently, Im, Pesaran & Shin (1995) study the small sample properties of unit root tests in panels with heterogeneous dynamics and propose alternative tests based on group mean statistics. In practice, however, empirical work often involves relationships within multivariate systems. Toward this end, Pedroni (1993, 1995) studies the properties of spurious regressions and residual based tests for the null of no cointegration in dynamic heterogeneous panels. This chapter continues this line of research by proposing a convenient method for estimating and testing hypotheses about common cointegrating vectors in a manner that is consistent with the degree of heterogeneity permitted in these panel unit root and panel cointegration studies. In particular, we address here two key sources of cross member heterogeneity that are particularly important in dealing with dynamic cointegrated panels. One such source of heterogeneity manifests itself in the familiar fixed effects form. These reflect differences in mean levels among the variables of different individual members of the panel, and we model these by including individual specific intercepts.
The second key source of heterogeneity in such panels comes from differences in the way that individuals respond to short run deviations from equilibrium cointegrating vectors that develop in response to stochastic disturbances. In keeping with earlier panel unit root and panel cointegration papers, we model this form of heterogeneity by allowing the associated serial correlation properties of the error processes to vary across individual members of the panel.

B. Related Literature

Since the original version of this paper, Pedroni (1996a),1 many more papers have contributed to our understanding of hypothesis testing in cointegrated panels. For example, Kao & Chiang (1997) extended their original paper on the least squares dummy variable model in cointegrated panels, Kao & Chen (1995), to include a comparison of the small sample properties of a dynamic OLS estimator with other estimators, including an FMOLS estimator similar to Pedroni (1996a). Specifically, Kao & Chiang (1997) demonstrated that a panel dynamic OLS estimator has the same asymptotic distribution as the type of panel FMOLS estimator derived in Pedroni (1996a), and showed that the small sample size distortions for such an estimator were often smaller than those of certain forms of the panel FMOLS estimator. The asymptotic theory in these earlier papers was generally based on sequential limit arguments (allowing the sample sizes T and N to grow large sequentially), whereas Phillips & Moon (1999) subsequently provided a rigorous and more general study of the limit theory in nonstationary panel regressions under joint convergence (allowing T and N to grow large concurrently). Phillips & Moon (1999) also provided a set of regularity conditions under which convergence in sequential limits implies convergence in joint limits, and considered these properties in the context of an FMOLS estimator, although they do not specifically address the small sample properties of feasible versions of the estimators. More recently, Mark & Sul (1999) also study a similar form of the panel dynamic OLS estimator first proposed by Kao & Chiang (1997). They compare the small sample properties of a weighted versus unweighted version of the estimator and find that the unweighted version generally exhibits smaller size distortion than the weighted version. In this chapter we report new small sample results for the group mean panel FMOLS estimator that was originally proposed in Pedroni (1996a). An advantage of the group mean estimator over the other pooled panel FMOLS estimators proposed in Pedroni (1996a) is that the t-statistic for this estimator allows for a more flexible alternative hypothesis.
This is because the group mean estimator is based on the so-called between dimension of the panel, while the pooled estimators are based on the within dimension of the panel. Accordingly, the group mean panel FMOLS provides a consistent test of a common value for the cointegrating vector under the null hypothesis against values of the cointegrating vector that need not be common under the alternative hypothesis, while the pooled within dimension estimators do not. Furthermore, as Pesaran & Smith (1995) argue in the context of OLS regressions, when the true slope coefficients are heterogeneous, group mean estimators provide consistent point estimates of the sample mean of the heterogeneous cointegrating vectors, while pooled within dimension estimators do not. Rather, as Phillips & Moon (1999) demonstrate, when the true cointegrating vectors are heterogeneous, pooled within dimension estimators provide consistent point estimates of the average regression coefficient, not the sample mean of the cointegrating vectors. Both of these features of the group mean estimator are often important in practical applications.

Finally, the implementation of the feasible form of the between dimension group mean estimator also has advantages over the other estimators in the presence of heterogeneity of the residual dynamics around the cointegrating vector. As was demonstrated in Pedroni (1996a), in the presence of such heterogeneity, the pooled panel FMOLS estimator requires a correction term that depends on the true cointegrating vector. For a specific null value for a cointegrating vector, the t-statistic is well defined, but of course this is of little use per se when one would like to estimate the cointegrating vector. One solution is to obtain a preliminary estimate of the cointegrating vector using OLS. However, although the OLS estimator is superconsistent, it still contains a second order bias in the presence of endogeneity, which is not eliminated asymptotically. Accordingly, this bias leads to size distortion, which is not necessarily eliminated even when the sample size grows large in the panel dimension. Consequently, this type of approach based on a first stage OLS estimate was not recommended in Pedroni (1996a), and it is not surprising that Monte Carlo simulations have shown large size distortions for such estimators. Even when the null hypothesis was imposed without using an OLS estimator, the size distortions for this type of estimator were large, as reported in Pedroni (1996a). Similarly, Kao & Chiang (1997) also found large size distortions for such estimators when OLS estimates were used in the first stage for the correction term. By contrast, the feasible version of the between dimension group mean based estimator does not suffer from these difficulties, even in the presence of heterogeneous dynamics. As we will see, the size distortions for this estimator are minimal, even in panels of relatively modest dimensions.
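The between versus within distinction can be seen in the simplest possible setting. This sketch uses plain OLS rather than FMOLS: it omits the fully modified correction that the chapter's group mean estimator applies to each member. The slopes β_i, the fixed effects and the regressor scales are hypothetical draws.

The group mean is the simple average of the member-by-member slopes. The pooled within estimator equals a weighted average of the same slopes, with weights proportional to each member's $\sum_t (x_{it} - \bar{x}_i)^2$. Members whose I(1) regressors wander more therefore receive more weight.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 20, 100

# Hypothetical heterogeneous cointegrating slopes and fixed effects.
beta_i = rng.normal(loc=1.0, scale=0.5, size=N)
alpha_i = rng.normal(size=N)
# Heterogeneous innovation scales for the I(1) regressors.
sigma_i = rng.uniform(0.5, 2.0, size=N)

x = np.cumsum(sigma_i[:, None] * rng.normal(size=(N, T)), axis=1)  # random walks
y = alpha_i[:, None] + beta_i[:, None] * x + rng.normal(size=(N, T))

# Demean within each member (removes the fixed effects alpha_i).
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)

sxx = (xd * xd).sum(axis=1)            # per-member sum of squares
sxy = (xd * yd).sum(axis=1)
b_i = sxy / sxx                        # member-by-member OLS slopes

group_mean = b_i.mean()                # between dimension: simple average
pooled_within = sxy.sum() / sxx.sum()  # within dimension: pooled OLS
weights = sxx / sxx.sum()              # pooled = sum(weights * b_i)

print(f"mean of true beta_i : {beta_i.mean():.3f}")
print(f"group-mean estimate : {group_mean:.3f}")
print(f"pooled within       : {pooled_within:.3f}")
```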
The remainder of the chapter is structured as follows. In Section 2, we introduce the econometric models of interest for heterogeneous cointegrated panels. We then present a number of theoretical results for estimators designed to be asymptotically unbiased and to provide nuisance parameter free asymptotic distributions which are standard normal when applied to heterogeneous cointegrated panels and can be used to test hypotheses regarding common cointegrating vectors in such panels. In Section 3 we study the small sample properties of these estimators and propose feasible FMOLS statistics that perform relatively well in realistic panels with heterogeneous dynamics. In Section 4 we enumerate the algorithm used to construct these statistics and briefly describe a few examples of their uses. Finally, in Section 5 we offer conclusions and discuss a number of related issues in the ongoing research on estimation and inference in cointegrated panels.
II. ASYMPTOTIC RESULTS FOR FULLY MODIFIED OLS IN HETEROGENEOUS COINTEGRATED PANELS
In this section we study asymptotic properties of cointegrating regressions in dynamic panels with common cointegrating vectors and suggest how a fully modified OLS estimator can be constructed to deal with complications introduced by the presence of parameter heterogeneity in the dynamics and fixed effects across individual members. We begin, however, by discussing the basic form of a cointegrating regression in such panels and the problems associated with unmodified OLS estimators.

A. Cointegrating Regressions in Heterogeneous Panels

Consider the following cointegrated system for a panel of i = 1, ..., N members,

$$y_{it} = \alpha_i + \beta x_{it} + \mu_{it}, \qquad x_{it} = x_{i,t-1} + \varepsilon_{it} \tag{1}$$

where the vector error process $\xi_{it} = (\mu_{it}, \varepsilon_{it})'$ is stationary with asymptotic covariance matrix $\Omega_i$. Thus, the variables $x_i$, $y_i$ are said to cointegrate for each member of the panel, with cointegrating vector $\beta$, if $y_{it}$ is integrated of order one. The term $\alpha_i$ allows the cointegrating relationship to include member specific fixed effects. In keeping with the cointegration literature, we do not require exogeneity of the regressors. As usual, $x_i$ can in general be an m dimensional vector of regressors, which are not cointegrated with each other. In this case, we partition $\xi_{it} = (\mu_{it}, \varepsilon_{it}')'$ so that the first element is a scalar series and the second element is an m dimensional vector of the differences in the regressors, $\varepsilon_{it} = x_{it} - x_{i,t-1} = \Delta x_{it}$, so that when we construct

$$\Omega_i = \begin{pmatrix} \Omega_{11i} & \Omega_{21i}' \\ \Omega_{21i} & \Omega_{22i} \end{pmatrix} \tag{2}$$

then $\Omega_{11i}$ is the scalar long run variance of the residual $\mu_{it}$, $\Omega_{22i}$ is the $m \times m$ long run covariance among the $\varepsilon_{it}$, and $\Omega_{21i}$ is an $m \times 1$ vector that gives the long run covariance between the residual $\mu_{it}$ and each of the $\varepsilon_{it}$. However, for simplicity and convenience of notation, we will refer to $x_i$ as univariate in the remainder of this chapter. Each of the results of this study generalizes in an obvious and straightforward manner to the vector case, unless otherwise indicated.2
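System (1) can be made concrete with a small simulation. The sketch is minimal and every parameter choice in it is hypothetical. A member-specific AR(1) coefficient φ_i for $\mu_{it}$ stands in for heterogeneous short-run dynamics. A member-specific loading θ_i of $\mu_{it}$'s innovation on $\varepsilon_{it}$ makes $x_{it}$ endogenous (a nonzero $\Omega_{21i}$). The function `panel_ols` computes the within-dimension panel OLS estimator that Proposition 1.1 of this chapter analyses.

```python
import numpy as np


def simulate_panel(N: int, T: int, beta: float,
                   seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Simulate y_it = alpha_i + beta x_it + mu_it, x_it = x_{i,t-1} + eps_it.

    Hypothetical heterogeneity: mu_it is AR(1) with member-specific
    coefficient phi_i, and its innovation loads on eps_it with
    member-specific weight theta_i, so x_it is endogenous.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=N)                 # fixed effects alpha_i
    phi = rng.uniform(0.0, 0.6, size=N)        # residual persistence
    theta = rng.uniform(0.2, 0.8, size=N)      # endogenous feedback
    eps = rng.normal(size=(N, T))              # eps_it = Delta x_it
    u = theta[:, None] * eps + rng.normal(size=(N, T))
    mu = np.zeros((N, T))
    mu[:, 0] = u[:, 0]
    for t in range(1, T):
        mu[:, t] = phi * mu[:, t - 1] + u[:, t]
    x = np.cumsum(eps, axis=1)                 # I(1) regressors
    y = alpha[:, None] + beta * x + mu
    return x, y


def panel_ols(x: np.ndarray, y: np.ndarray) -> float:
    """Within (fixed effects) panel OLS estimator of beta."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return float((xd * yd).sum() / (xd * xd).sum())


x, y = simulate_panel(N=20, T=200, beta=2.0)
b_ols = panel_ols(x, y)
print(f"panel OLS estimate (true beta = 2): {b_ols:.4f}")
```

Because $x_{it}$ is I(1), a single draw typically lands close to β (superconsistency). The second-order bias and the nuisance-parameter dependence of the standardized distribution only become visible in a Monte Carlo over repeated draws.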
In order to explore the asymptotic properties of estimators as both the cross sectional dimension, N, and the time series dimension, T, grow large, we will make assumptions similar in spirit to Pedroni (1995) regarding the degree of dependency across both these dimensions. In particular, for the time series dimension, we will assume that the conditions of the multivariate functional central limit theorems used in Phillips & Durlauf (1986) and Park & Phillips (1988) hold for each member of the panel as the time series dimension grows large. Thus, we have

Assumption 1.1 (invariance principle): The process $\xi_{it}$ satisfies a multivariate functional central limit theorem such that the convergence as $T \to \infty$ for the partial sum

$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} \xi_{it} \Rightarrow B_i(r, \Omega_i)$$

holds for any given member, i, of the panel, where $B_i(r, \Omega_i)$ is Brownian motion defined over the real interval $r \in [0, 1]$, with asymptotic covariance $\Omega_i$.

This assumption indicates that the multivariate functional central limit theorem, or invariance principle, holds over time for any given member of the panel. This places very little restriction on the temporal dependency and heterogeneity of the error process, and encompasses, for example, a broad class of stationary ARMA processes. It also allows the serial correlation structure to be different for individual members of the panel. Specifically, the asymptotic covariance matrix, $\Omega_i$, varies across individual members, and is given by $\Omega_i \equiv \lim_{T \to \infty} E[T^{-1} (\sum_{t=1}^{T} \xi_{it})(\sum_{t=1}^{T} \xi_{it})']$, which can also be decomposed as $\Omega_i = \Omega_i^o + \Gamma_i + \Gamma_i'$, where $\Omega_i^o$ is the contemporaneous covariance and $\Gamma_i$ is a weighted sum of autocovariances. The off-diagonal terms, $\Omega_{21i}$, of these individual $\Omega_i$ matrices capture the endogenous feedback effect between $y_{it}$ and $x_{it}$, which is also permitted to vary across individual members of the panel. For several of the estimators that we propose, it will be convenient to work with a triangularization of this asymptotic covariance matrix. Specifically, we will refer to this lower triangular matrix of $\Omega_i$ as $L_i$, whose elements are related as follows

$$L_{11i} = \left(\Omega_{11i} - \Omega_{21i}^2/\Omega_{22i}\right)^{1/2}, \quad L_{12i} = 0, \quad L_{21i} = \Omega_{21i}/\Omega_{22i}^{1/2}, \quad L_{22i} = \Omega_{22i}^{1/2} \tag{3}$$

Estimation of the asymptotic covariance matrix can be based on any one of a number of consistent kernel estimators, such as the Newey & West (1987) estimator. Next, for the cross sectional dimension, we will employ the standard panel data assumption of independence. Hence we have:

Assumption 1.2 (cross sectional independence): The individual processes are assumed to be independent cross sectionally, so that $E[\xi_{it} \xi_{jt}'] = 0$ for all $i \neq j$.
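The sketch below is minimal and uses numpy. It applies the kernel approach mentioned above, a Newey & West (1987) Bartlett-kernel estimate of $\Omega_i$ for one member, and then computes the triangularization (3) for scalar $x_{it}$. The lag truncation (8), the helper names and the simulated $\xi_{it}$ are hypothetical choices.

The elements in (3) satisfy $\Omega_i = L_i' L_i$, which gives a quick correctness check. The Bartlett weights keep the estimate positive semidefinite, so the square root in $L_{11i}$ is well defined.

```python
import numpy as np


def long_run_cov(xi: np.ndarray, lags: int) -> np.ndarray:
    """Newey-West (Bartlett kernel) estimate of the long-run covariance
    Omega = Omega^o + Gamma + Gamma' for a T x k array xi."""
    T = xi.shape[0]
    xc = xi - xi.mean(axis=0)
    omega = xc.T @ xc / T                  # contemporaneous Omega^o
    for s in range(1, lags + 1):
        gamma = xc[s:].T @ xc[:-s] / T     # sample autocovariance at lag s
        w = 1.0 - s / (lags + 1.0)         # Bartlett weight
        omega += w * (gamma + gamma.T)
    return omega


def triangular_L(omega: np.ndarray) -> np.ndarray:
    """Lower-triangular L_i of equation (3) for a 2 x 2 Omega_i."""
    o11, o21, o22 = omega[0, 0], omega[1, 0], omega[1, 1]
    return np.array([
        [np.sqrt(o11 - o21 ** 2 / o22), 0.0],
        [o21 / np.sqrt(o22),            np.sqrt(o22)],
    ])


# Hypothetical single-member data: endogenous, serially correlated mu.
rng = np.random.default_rng(1)
T = 1000
e = rng.normal(size=T)                     # eps_it = Delta x_it
u = 0.5 * e + rng.normal(size=T)           # feedback from eps to mu
mu = np.zeros(T)
for t in range(1, T):
    mu[t] = 0.4 * mu[t - 1] + u[t]         # AR(1) residual
xi = np.column_stack([mu, e])              # xi_it = (mu_it, eps_it)'
omega = long_run_cov(xi, lags=8)
L = triangular_L(omega)
print(np.round(L.T @ L - omega, 12))       # should be (numerically) zero
```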
More generally, the asymptotic covariance matrix for a panel of dimension N × T is block diagonal with the ith diagonal block given by the asymptotic covariance for member i. This type of assumption is typical of our panel data approach, and we will be using this condition in the formal derivation of the asymptotic distribution of our panel cointegration statistics. For panels that exhibit common disturbances that are shared across individual members, it will be convenient to capture this form of cross sectional dependency by the use of a common time dummy, which is a fairly standard panel data technique. For panels with even richer cross sectional dependencies, one might think of estimating a full non-diagonal N × N matrix of $\Omega_{ij}$ elements, and then premultiplying the errors by this matrix in order to achieve cross sectional independence. This would require the time series dimension to grow much more quickly than the cross sectional dimension, and in most cases one hopes that a common time dummy will suffice. While the derivation of most of the asymptotic results of this chapter is relegated to the mathematical appendix, it is worth discussing briefly here how we intend to make use of assumptions 1.1 and 1.2 in providing asymptotic distributions for the panel statistics that we consider in the next two subsections. In particular, we will employ here simple and somewhat informal sequential limit arguments by first evaluating the limits as the T dimension grows large for each member of the panel in accordance with assumption 1.1 and then evaluating the sums of these statistics as the N dimension grows large under the independence assumption of 1.2.3 In this manner, as N grows large we obtain standard distributions as we average the random functionals for each member that are obtained in the initial step as a consequence of letting T grow large.
Consequently, we view the restriction that rst T and then N as a relatively strong restriction that ensures these conditions, and it is possible that in many circumstances a weaker set of restrictions that allow N and T to grow large concurrently, but with restrictions on the relative rates of growth might deliver similar results. In general, for heterogeneous error processes, such restrictions on the rate of growth of N relative to T can be expected to depend in part on the rate of convergence of the particular kernel estimators used to eliminate the nuisance parameters, and we can expect that our iterative T and then N requirements proxy for the fact that in practice our asymptotic approximations will be more accurate in panels with relatively large T dimensions as compared to the N dimension. Alternatively, under a more pragmatic interpretation, one can simply think of letting T for xed N reect the fact that typically for the panels in which we are interested, it is the
time series dimension that can be expected to grow in actuality, rather than the cross-sectional dimension, which is in practice fixed. Thus, $T \to \infty$ is in a sense the true asymptotic feature in which we are interested, and this leads to statistics that are characterized as sums of i.i.d. Brownian motion functionals. For practical purposes, however, we would like to be able to characterize these statistics for the general case in which $N$ is large. In this case we take $N \to \infty$ as a convenient benchmark for characterizing the distribution, provided that we understand $T$ to be the dominant asymptotic feature of the data.

B. Asymptotic Properties of Panel OLS

Next, we consider the properties of a number of statistics that might be used for a cointegrated panel as described by (1) under assumptions 1.1 and 1.2 regarding the time series and cross-sectional dependencies in the data. The first statistic that we examine is a standard panel OLS estimator of the cointegrating relationship. It is well known that the conventional single equation OLS estimator of the cointegrating vector is asymptotically biased, and that its standardized distribution depends on nuisance parameters associated with the serial correlation structure of the data. There is no reason to believe that this would be otherwise for the panel OLS estimator. The following proposition confirms this suspicion.⁴

Proposition 1.1 (Asymptotic Bias of the Panel OLS Estimator). Consider a standard panel OLS estimator for the coefficient $\beta$ of panel (1), under assumptions 1.1 and 1.2, given as

$$\hat\beta_{NT} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\bar x_i)^2\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\bar x_i)(y_{it}-\bar y_i)$$
where $\bar x_i$ and $\bar y_i$ refer to the individual-specific means. Then:
(a) The estimator is asymptotically biased, and its asymptotic distribution depends on nuisance parameters associated with the dynamics of the underlying processes.
(b) Only in the special case in which the regressors are strictly exogenous and the dynamics are homogeneous across members of the panel can valid inferences be made from the standardized distribution of $\hat\beta_{NT}$ or its associated t-statistic.
As the proof of proposition 1.1 given in the appendix makes clear, the source of the problem is the endogeneity of the regressors under the usual
PETER PEDRONI
assumptions regarding cointegrated systems. While an exogeneity assumption is common in many treatments of cross-sectional panels, for dynamic cointegrated panels such strict exogeneity is by most standards not acceptable. It is stronger than the standard exogeneity assumption for static panels, because it implies the absence of any dynamic feedback from the regressors at all frequencies. Clearly, the problem of asymptotic bias and data dependency arising from the endogenous feedback effect cannot be expected to diminish in the context of such panels. Kao & Chen (1995) document this bias for a panel of cointegrated time series in the special case in which the dynamics are homogeneous. For the conventional time series case, a number of methods have been devised to deal with the consequences of such endogenous feedback effects. In what follows, we develop an approach for cointegrated panels based on fully modified OLS principles similar in spirit to those used by Phillips & Hansen (1990).

C. Pooled Fully Modified OLS Estimators for Heterogeneous Panels

Phillips & Hansen (1990) proposed a semi-parametric correction to the OLS estimator that eliminates the second order bias induced by the endogeneity of the regressors. The same principle can also be applied to the panel OLS estimator explored in the previous subsection. The key difference in constructing our estimator for the panel data case is the need to account for the heterogeneity that is present in the fixed effects as well as in the short-run dynamics. These features lead us to modify the form of the standard single equation fully modified OLS estimator. We will also find that the presence of fixed effects has the potential to alter the asymptotic distributions in a nontrivial manner.
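For reference, the panel OLS estimator of proposition 1.1 is the familiar within estimator: demean each member's series and pool. The sketch below is a minimal NumPy version for a scalar regressor, assuming balanced $(N, T)$ arrays. The function name and the simulated design are illustrative choices, not part of the chapter. In particular, the error term loads on the innovations of $x_{it}$ with a coefficient of 0.5, which makes the regressor endogenous.

```python
import numpy as np

def panel_ols_within(y, x):
    """Within (fixed effects) panel OLS slope for balanced (N, T) arrays.

    beta_NT = [sum_i sum_t (x_it - xbar_i)^2]^(-1)
              * sum_i sum_t (x_it - xbar_i)(y_it - ybar_i)
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    xd = x - x.mean(axis=1, keepdims=True)   # remove member-specific means
    yd = y - y.mean(axis=1, keepdims=True)
    return float((xd * yd).sum() / (xd ** 2).sum())

rng = np.random.default_rng(1)
N, T, beta = 20, 200, 2.0
alpha = rng.uniform(2.0, 4.0, size=(N, 1))           # heterogeneous fixed effects
eps = rng.standard_normal((N, T))                     # innovations of x_it
x = np.cumsum(eps, axis=1)                            # I(1) regressors

# Sanity check: with no error term the fixed effects are removed exactly.
beta_exact = panel_ols_within(alpha + beta * x, x)

# Endogenous case: mu_it loads on eps_it, so the regressor is not exogenous.
mu = 0.5 * eps + rng.standard_normal((N, T))
beta_endog = panel_ols_within(alpha + beta * x + mu, x)
print(beta_exact, beta_endog)
```

In the endogenous case the estimate is typically close to 2.0 because of superconsistency. Across repeated draws, however, it is not centred on the true value. That second order bias is what proposition 1.1 describes and what the fully modified correction is designed to remove.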
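The following proposition establishes an important preliminary result. It helps build intuition for the role of heterogeneity and for the consequences of dealing with both the temporal and cross-sectional dimensions in fully modified OLS estimators.

Proposition 1.2 (Asymptotic Distribution of the Pooled Panel FMOLS Estimator). Consider a panel FMOLS estimator for the coefficient $\beta$ of panel (1) given by

$$\hat\beta^*_{NT} - \beta = \left(\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it}-\bar x_i)^2\right)^{-1}\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(\sum_{t=1}^{T}(x_{it}-\bar x_i)\mu^*_{it} - T\hat\gamma_i\right),$$

where

$$\mu^*_{it} = \mu_{it} - \frac{\hat L_{21i}}{\hat L_{22i}}\Delta x_{it}, \qquad \hat\gamma_i = \hat\Gamma_{21i} + \hat\Omega^0_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\hat\Gamma_{22i} + \hat\Omega^0_{22i}\right),$$

and $\hat L_i$ is a lower triangular decomposition of $\hat\Omega_i$ as defined in (2) above. Then, under assumptions 1.1 and 1.2, the estimator $\hat\beta^*_{NT}$ converges to the true value at rate $T\sqrt{N}$, and is distributed as

$$T\sqrt{N}\left(\hat\beta^*_{NT} - \beta\right) \Rightarrow N(0, v), \qquad v = \begin{cases} 2 & \text{if } \bar x_i = \bar y_i = 0,\\ 6 & \text{otherwise,} \end{cases}$$

as $T \to \infty$ and $N \to \infty$.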
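The infeasible estimator of proposition 1.2 can be traced numerically in a simulation, where the true residuals $\mu_{it}$ are known. The sketch below is one way to do this, and it rests on several assumptions:

- $\hat\Omega_i$ and its one-sided part $\hat\Gamma_i + \hat\Omega^0_i$ are estimated with a Bartlett kernel and a fixed lag $K$.
- The one-sided terms follow the Phillips & Hansen (1990) convention $\sum_{j \ge 0} E[\xi_{i0}\xi_{ij}']$.
- $\Omega_i = L_i' L_i$ is taken to be the decomposition in (2).
- $x_{i0} = 0$.

Helper names are hypothetical. The function returns both $\hat\beta^*_{NT} - \beta$ and the t-statistic of corollary 1.2, which shares the same denominator.

```python
import numpy as np

def long_run_cov(u, K):
    """Bartlett-kernel estimates for a (T, k) series u_t.

    Returns (omega, delta): omega is the long-run covariance and
    delta = Gamma(0) + sum_{j=1..K} w_j Gamma(j) is its one-sided part,
    where Gamma(j)[a, b] = (1/T) sum_t u[t, a] * u[t + j, b].
    """
    T = u.shape[0]
    gamma0 = u.T @ u / T
    omega, delta = gamma0.copy(), gamma0.copy()
    for j in range(1, K + 1):
        w = 1.0 - j / (K + 1.0)                # Bartlett weight
        gj = u[:-j].T @ u[j:] / T
        omega += w * (gj + gj.T)
        delta += w * gj
    return omega, delta

def pooled_fmols_error(x, mu, K=4):
    """(beta*_NT - beta, t-statistic) from true residuals mu; x, mu are (N, T)."""
    N, T = x.shape
    dx = np.diff(x, axis=1, prepend=0.0)       # Delta x_it, assuming x_i0 = 0
    xd = x - x.mean(axis=1, keepdims=True)
    num = den = 0.0
    for i in range(N):
        # Column 0 is mu_it, column 1 is Delta x_it.
        omega, delta = long_run_cov(np.column_stack([mu[i], dx[i]]), K)
        L22 = np.sqrt(omega[1, 1])
        L21 = omega[1, 0] / L22
        L11 = np.sqrt(omega[0, 0] - omega[1, 0] ** 2 / omega[1, 1])
        gamma = delta[1, 0] - (L21 / L22) * delta[1, 1]   # gamma_i
        mu_star = mu[i] - (L21 / L22) * dx[i]              # mu*_it
        num += (xd[i] @ mu_star - T * gamma) / (L11 * L22)
        den += (xd[i] @ xd[i]) / L22 ** 2
    err = num / den
    return err, err * np.sqrt(den)

rng = np.random.default_rng(2)
N, T = 20, 200
e = rng.standard_normal((N, T, 2))
eps = e[:, :, 1]
mu = e[:, :, 0] + 0.5 * eps                    # endogenous: corr(mu, eps) != 0
x = np.cumsum(eps, axis=1)
err, t_stat = pooled_fmols_error(x, mu)
print(T * np.sqrt(N) * err, t_stat)
```

Because the regressors are demeaned, this sketch corresponds to the fixed effects case, for which the proposition gives $v = 6$.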
As the proposition indicates, when proper modifications are made to the estimator, the corresponding asymptotic distribution is free of the nuisance parameters associated with any member-specific serial correlation patterns in the data. Notice also that this fully modified panel OLS estimator is asymptotically unbiased both for the standard case without intercepts and for the fixed effects model with heterogeneous intercepts. The only difference is in the size of the variance: it equals 2 in the standard case and 6 in the case with heterogeneous intercepts, both for univariate $x_{it}$. More generally, when $x_{it}$ is an $m$-dimensional vector, the specific values of $v$ will also be a function of the dimension $m$. The associated t-statistics, however, will not depend on the specific values of $v$, as we shall see.

The fact that this estimator is normally distributed, rather than following unit root asymptotics as in Phillips & Hansen (1990), derives from the fact that these unit root distributions are being averaged over the cross-sectional dimension. Specifically, this averaging process produces normal distributions whose variance depends only on the moments of the underlying Brownian motion functionals that describe the properties of the integrated variables. This is achieved by constructing the estimator in a way that isolates the idiosyncratic components of the underlying Wiener processes. The result is a sum of standard and independently distributed Brownian motions whose moments can be computed algebraically, as the proof of the proposition makes clear. The estimators $\hat L_{11i}$ and $\hat L_{22i}$ correspond to the long-run standard errors of the conditional process $\mu_{it}$ and the marginal process $\Delta x_{it}$, respectively. They act to purge the contribution of these idiosyncratic elements to the endogeneity- and serial-correlation-adjusted statistic $\sum_{t=1}^{T}(x_{it}-\bar x_i)\mu^*_{it} - T\hat\gamma_i$.
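The roles of $\hat L_{11i}$ and $\hat L_{22i}$ are easiest to see from the decomposition itself. The sketch below assumes that (2) factors the $2 \times 2$ long-run covariance of $(\mu_{it}, \Delta x_{it})$ as $\Omega_i = L_i' L_i$, with $L_i$ lower triangular. Under that assumption, $L_{22i} = \Omega_{22i}^{1/2}$, $L_{21i} = \Omega_{21i}/\Omega_{22i}^{1/2}$ and $L_{11i} = (\Omega_{11i} - \Omega_{21i}^2/\Omega_{22i})^{1/2}$. So $L_{11i}$ is the long-run standard deviation of $\mu_{it}$ conditional on $\Delta x_{it}$. This is not the factor returned by `numpy.linalg.cholesky`, which satisfies $\Omega = CC'$. The helper name is hypothetical.

```python
import numpy as np

def lower_triangular_L(omega):
    """Lower-triangular L with omega = L' L, for omega ordered as (mu, dx).

    Hypothetical helper; assumes this matches the convention of (2).
    """
    o11, o21, o22 = omega[0, 0], omega[1, 0], omega[1, 1]
    L22 = np.sqrt(o22)                    # long-run s.d. of the marginal process
    L21 = o21 / L22
    L11 = np.sqrt(o11 - o21 ** 2 / o22)   # long-run s.d. of mu given dx
    return np.array([[L11, 0.0], [L21, L22]])

omega = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = lower_triangular_L(omega)
print(np.allclose(L.T @ L, omega))                   # True
print(L[0, 0] ** 2)                                  # approximately 1.64 = 2.0 - 0.6**2 / 1.0
print(np.allclose(L, np.linalg.cholesky(omega)))     # False: a different factorization
```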
The variance is larger for the fixed effects model, in which heterogeneous intercepts are included, for the following reason. In the presence of unit roots, the variation from the cross terms of the sample averages $\bar x_i$ and $\bar y_i$ grows large over time at the same rate $T$, so their effect is not eliminated asymptotically from the distribution of $T\sqrt{N}(\hat\beta^*_{NT} - \beta)$.⁵ However, since the contribution to the variance can be computed analytically, as in the proof of proposition 1.2, this in itself poses no difficulties for inference. Nevertheless, upon consideration of these expressions, it also becomes apparent that there should exist a metric that directly adjusts for this effect in the distribution and consequently renders the distribution standard normal. In fact, as the following corollary indicates, it is possible to construct a t-statistic from this fully modified panel OLS estimator whose distribution is invariant to this effect.

Corollary 1.2 (Asymptotic Distribution of the Pooled Panel FMOLS t-statistic). Consider the following t-statistic for the FMOLS panel estimator of $\beta$ as defined in proposition 1.2 above. Then, under the same assumptions as in proposition 1.2, the statistic is standard normal,

$$t_{\hat\beta^*_{NT}} = \left(\hat\beta^*_{NT} - \beta\right)\left(\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it}-\bar x_i)^2\right)^{1/2} \Rightarrow N(0, 1)$$
as $T \to \infty$ and $N \to \infty$, for both the standard model without intercepts and the fixed effects model with heterogeneous estimated intercepts. Again, the derivation in the appendix makes the reason apparent. The numerator of the fully modified estimator $\hat\beta^*_{NT}$ is a sum of mixture normals with zero mean, whose variance depends only on the properties of the Brownian motion functionals associated with the quadratic $\sum_{t=1}^{T}(x_{it}-\bar x_i)^2$. The t-statistic constructed using this expression is therefore asymptotically standard normal. This holds regardless of the value of $v$ associated with the distribution of $T\sqrt{N}(\hat\beta^*_{NT} - \beta)$, so it also does not depend on the dimensionality of $x_{it}$ in the general vector case.

Note, however, that this differs from the conventional single equation case studied by Phillips & Hansen (1990). To ensure that the distribution of this t-statistic is free of nuisance parameters when applied to heterogeneous panels, the usual asymptotic variance estimator in the denominator is replaced with the estimator $\hat L_{22i}^{2}$. By construction, this is an estimator of the asymptotic variance of the differences of the regressors, and it can be estimated accordingly. This is in contrast to the t-statistic for the conventional single equation fully modified OLS, which uses an estimator for the conditional
asymptotic variance from the residuals of the cointegrating regression. This distinction may appear puzzling at first. It stems from the fact that, in heterogeneous panels, the contribution from the conditional variance of the residuals is idiosyncratic to the cross-sectional member, and must be adjusted for directly in the construction of the numerator of the $\hat\beta^*_{NT}$ estimator itself, before averaging over cross sections. Thus, the conditional variance has already been implicitly accounted for in the construction of $\hat\beta^*_{NT}$. All that is required is that the variance from the marginal process $\Delta x_{it}$ be purged from the quadratic $\sum_{t=1}^{T}(x_{it}-\bar x_i)^2$. Finally, note that proposition 1.2 and its corollary 1.2 have been specified in terms of a transformation, $\mu^*_{it}$, of the true residuals. In Section III we consider various strategies for specifying these statistics in terms of observables, and we examine the small sample properties of the resulting feasible statistics.

D. A Group Mean Fully Modified OLS t-Statistic

Before proceeding to the small sample properties, we first consider one additional asymptotic result that will be of use. Recently, Im, Pesaran & Shin (1995) proposed using a group mean statistic to test for unit roots in panel data. They note that under certain circumstances, panel unit root tests may suffer from the fact that the pooled variance estimators need not be asymptotically independent of the pooled numerator and denominator terms of the fixed effects estimator. Notice, however, that the fully modified panel OLS statistics in proposition 1.2 and corollary 1.2 have been constructed without a pooled variance estimator. Instead, the numerator and denominator terms have been purged of any influence from the nuisance parameters prior to summing over $N$. Furthermore, the asymptotic distribution of the numerator is centered around zero. As a result, the covariance between the summed terms of the numerator and denominator does not play the role in the asymptotic distribution of $T\sqrt{N}(\hat\beta^*_{NT} - \beta)$ or $t_{\hat\beta^*_{NT}}$ that it would otherwise play.

Nevertheless, it is also interesting to consider a fully modified OLS group mean statistic in the present context. The group mean t-statistic is useful because it allows one to entertain a somewhat broader class of hypotheses under the alternative. Specifically, we can think of the distinction as follows. The t-statistic for the pooled panel estimator described in corollary 1.2 can be used to test the null hypothesis $H_0: \beta_i = \beta_0$ for all $i$ versus the alternative hypothesis $H_A: \beta_i = \beta_A \neq \beta_0$ for all $i$, where $\beta_0$ is the
hypothesized common value of $\beta$ under the null, and $\beta_A$ is some alternative value of $\beta$ that is also common to all members of the panel. By contrast, the group mean fully modified t-statistic can be used to test the null hypothesis $H_0: \beta_i = \beta_0$ for all $i$ versus the alternative hypothesis $H_A: \beta_i \neq \beta_0$ for all $i$. The values of $\beta_i$ are then not constrained to be homogeneous across members under the alternative hypothesis. The following proposition gives the precise form of the panel fully modified OLS t-statistic that we propose, together with its asymptotic distribution.

Proposition 1.3 (Asymptotic Distribution of the Panel FMOLS Group Mean t-Statistic). Consider the following group mean FMOLS t-statistic for $\beta$ of the cointegrated panel (1). Then, under assumptions 1.1 and 1.2, the statistic is standard normal,

$$\bar t_{\hat\beta^*_{NT}} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\hat L_{11i}^{-1}\left(\sum_{t=1}^{T}(x_{it}-\bar x_i)^2\right)^{-1/2}\left(\sum_{t=1}^{T}(x_{it}-\bar x_i)y^*_{it} - T\hat\gamma_i\right) \Rightarrow N(0, 1),$$

where

$$y^*_{it} = (y_{it}-\bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}}\Delta x_{it}, \qquad \hat\gamma_i = \hat\Gamma_{21i} + \hat\Omega^0_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\hat\Gamma_{22i} + \hat\Omega^0_{22i}\right),$$

and $\hat L_i$ is a lower triangular decomposition of $\hat\Omega_i$ as defined in (2) above. The limit holds as $T \to \infty$ and $N \to \infty$, for both the standard model without intercepts and the fixed effects model with heterogeneous intercepts.

Note that the asymptotic distribution of this group mean statistic is also invariant to whether the standard model without intercepts or the fixed effects model with heterogeneous intercepts has been estimated. Just as with the t-statistic of corollary 1.2, the asymptotic distribution of this panel group mean t-statistic is also independent of the dimensionality of $x_{it}$ in the more general vector case. Thus, we have presented two different types of t-statistics:
- a pooled panel OLS based fully modified t-statistic, based on the within dimension of the panel;
- a group mean fully modified OLS t-statistic, based on the between dimension of the panel.

Both are asymptotically unbiased, free of nuisance parameters, and invariant to whether or not idiosyncratic fixed effects have been estimated. Furthermore, we have characterized the asymptotic distribution of the fully modified panel OLS estimator itself. It is also asymptotically unbiased and free of nuisance parameters. In this case, however, one should be aware that while the distribution is a centered normal, its variance depends on whether heterogeneous intercepts have been estimated and on the dimensionality of the
vector of regressors. In the remainder of this chapter we investigate the small sample properties of feasible statistics associated with these asymptotic results and discuss examples of their application.
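A feasible version of the group mean statistic in proposition 1.3 can be computed member by member. The sketch below makes several assumptions that go beyond the text:

- $\hat\mu_{it}$ comes from a first-stage OLS regression with an intercept for each member.
- The first observation is dropped so that $\Delta x_{it}$ is available.
- The Bartlett lag follows the rule $K = 4(T/100)^{2/9}$ adopted in Section III.
- The one-sided terms follow the Phillips & Hansen (1990) convention.
- To test a general null $\beta_0$, each member's term is written as $(\hat\beta^*_i - \beta_0)\hat L_{11i}^{-1}(\sum_t (x_{it}-\bar x_i)^2)^{1/2}$. This coincides with the $i$th summand of proposition 1.3 when $\beta_0 = 0$.

Function names are hypothetical.

```python
import numpy as np

def newey_west_lag(T):
    """Nearest integer to 4 * (T / 100) ** (2 / 9)."""
    return int(round(4.0 * (T / 100.0) ** (2.0 / 9.0)))

def long_run_cov(u, K):
    """Bartlett-kernel long-run covariance and one-sided part of a (T, k) series."""
    T = u.shape[0]
    gamma0 = u.T @ u / T
    omega, delta = gamma0.copy(), gamma0.copy()
    for j in range(1, K + 1):
        w = 1.0 - j / (K + 1.0)
        gj = u[:-j].T @ u[j:] / T              # gj[a, b] = (1/T) sum_t u[t, a] u[t + j, b]
        omega += w * (gj + gj.T)
        delta += w * gj
    return omega, delta

def group_mean_fmols_t(y, x, beta0):
    """Group mean FMOLS t-statistic for H0: beta_i = beta0 for all i."""
    dx = np.diff(x, axis=1)                    # Delta x_it for t = 2, ..., T
    x, y = x[:, 1:], y[:, 1:]                  # align levels with dx
    N, T = x.shape
    K = newey_west_lag(T)
    t_sum = 0.0
    for i in range(N):
        xd = x[i] - x[i].mean()
        yd = y[i] - y[i].mean()
        mu_hat = yd - (xd @ yd) / (xd @ xd) * xd          # first-stage OLS residuals
        omega, delta = long_run_cov(np.column_stack([mu_hat, dx[i]]), K)
        L22 = np.sqrt(omega[1, 1])
        L21 = omega[1, 0] / L22
        L11 = np.sqrt(omega[0, 0] - omega[1, 0] ** 2 / omega[1, 1])
        gamma = delta[1, 0] - (L21 / L22) * delta[1, 1]
        y_star = yd - (L21 / L22) * dx[i]                 # y*_it
        sxx = xd @ xd
        beta_i = (xd @ y_star - T * gamma) / sxx          # member-i FMOLS estimate
        t_sum += (beta_i - beta0) * np.sqrt(sxx) / L11
    return t_sum / np.sqrt(N)

rng = np.random.default_rng(3)
N, T, beta = 10, 100, 2.0
e = rng.standard_normal((N, T, 2))
x = np.cumsum(e[:, :, 1], axis=1)
y = rng.uniform(2.0, 4.0, size=(N, 1)) + beta * x + e[:, :, 0] + 0.5 * e[:, :, 1]
print(newey_west_lag(10), newey_west_lag(100))     # 2 4
print(group_mean_fmols_t(y, x, beta0=2.0))          # null true
print(group_mean_fmols_t(y, x, beta0=1.9))          # null false
```

Each member contributes its own $\hat L_{11i}$, $\hat L_{21i}$, $\hat L_{22i}$ and $\hat\gamma_i$ before averaging. This is what lets the statistic accommodate heterogeneous dynamics without a pooled variance estimator.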
III. SMALL SAMPLE PROPERTIES OF FEASIBLE PANEL FULLY MODIFIED OLS STATISTICS
In this section we investigate the small sample properties of the pooled and group mean panel FMOLS estimators developed in the previous section. We discuss two alternative feasible estimators associated with the panel FMOLS estimator of proposition 1.2 and its t-statistic, which were defined only in terms of the true residuals. These estimators perform reasonably well in idealized situations. More generally, however, their size distortions have the potential to be fairly large in small samples, as was reported in Pedroni (1996a). By contrast, we find that the group mean test statistics do very well and exhibit relatively little size distortion, even in relatively small panels and even in the presence of substantial cross-sectional heterogeneity of the error process associated with the dynamics around the cointegrating vector. Consequently, we first discuss some of the basic properties of the feasible versions of the pooled estimators and the associated difficulties in small samples. We then focus on reporting the small sample properties of the group mean test statistics, which are found to do extremely well provided that the time series dimension is not smaller than the cross-sectional dimension.

A. General Properties of the Feasible Estimators

Before reporting the results for the between dimension group mean test statistic, we discuss the general properties of various feasible forms of the within dimension pooled panel fully modified OLS statistics, and the consequences of these properties in small samples. One obvious candidate for a feasible estimator based on proposition 1.2 would be simply to construct the statistic in terms of estimated residuals. These can be obtained from the initial $N$ single equation OLS regressions associated with the cointegrating regression for (1). Since the single equation OLS estimator is superconsistent, one might hope that this produces a reasonably well-behaved statistic for the panel FMOLS estimator.
The potential problem with this reasoning is that, although the OLS regression is superconsistent, it is also asymptotically biased in general. While this is a second order effect for the conventional
single series estimator, for panels the effect has the potential to become first order as $N$ grows large. Another possibility might appear to be to construct the feasible panel FMOLS estimator for proposition 1.2 in terms of the original data series, $y^*_{it} = (y_{it} - \bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}}\Delta x_{it}$, along the lines of how it is often done for the conventional single series case. However, this turns out to be correct only in very specialized cases. More generally, for heterogeneous panels, this introduces an asymptotic bias that depends on the true value of the cointegrating relationship and on the relative volatility of the series involved in the regression. The following proposition makes this relationship precise.

Proposition 2.1 (Regarding Feasible Pooled Panel FMOLS). Under the conditions of proposition 1.2 and corollary 1.2, consider the panel FMOLS estimator for the coefficient $\beta$ of panel (1) given by

$$\hat\beta^*_{NT} = \left(\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it}-\bar x_i)^2\right)^{-1}\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(\sum_{t=1}^{T}(x_{it}-\bar x_i)y^*_{it} - T\hat\gamma_i\right),$$

where

$$y^*_{it} = (y_{it}-\bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}}\Delta x_{it} + \frac{\hat L_{11i}-\hat L_{22i}}{\hat L_{22i}}\,\beta\,(x_{it}-\bar x_i),$$

and $\hat\gamma_i$ and $\hat L_i$ are defined as before. Then the statistics $T\sqrt{N}(\hat\beta^*_{NT} - \beta)$ and $t_{\hat\beta^*_{NT}}$ constructed from this estimator are numerically equivalent to the ones defined in proposition 1.2 and corollary 1.2.

This proposition shows why it is difficult to construct a reliable point estimator from the naive FMOLS estimator simply by using a transformation of $y^*_{it}$ analogous to the single equation case. As the proposition makes explicit, such an estimator would in general depend on the true value of the parameter that it is intended to estimate, except in very specialized cases, which we discuss below. On the other hand, this does not necessarily rule out an estimator based on proposition 2.1 for the purpose of testing a particular hypothesis about a cointegrating relationship in heterogeneous panels. By using the hypothesized null value of $\beta$ in the expression for $y^*_{it}$, proposition 2.1 can, at least in principle, be employed to construct a feasible FMOLS statistic to test the null hypothesis that $\beta_i = \beta_0$ for all $i$. However, as was reported in Pedroni (1996a), even in this case the small sample performance of the statistic is often subject to relatively large size distortion. Proposition 2.1 also provides us with an opportunity to examine the consequences of ignoring heterogeneity associated with the serial correlation
dynamics of the error process for this type of estimator. In particular, the modification involved in this estimator differs in two respects from the conventional time series fully modified OLS estimator. First, it includes the estimators $\hat L_{11i}$ and $\hat L_{22i}$, which premultiply the numerator and denominator terms to control for the idiosyncratic serial correlation properties of individual cross-sectional members prior to summing over $N$. Second, and more importantly, the transformation of the dependent variable $y^*_{it}$ includes an additional term, $\frac{\hat L_{11i}-\hat L_{22i}}{\hat L_{22i}}\beta(x_{it}-\bar x_i)$. This term is eliminated only in two special cases:
(1) The elements $L_{11i}$ and $L_{22i}$ are identical for all members of the panel and do not need to be indexed by $i$. This corresponds to the case in which the serial correlation structure of the data is homogeneous for all members of the panel.
(2) The elements $L_{11i}$ and $L_{22i}$ may be heterogeneous across members of the panel, but $L_{11i} = L_{22i}$ for each member. This corresponds to the case in which the asymptotic variances of the dependent and independent variables are the same.

Conversely, the effect of this term increases as (1) the dynamics become more heterogeneous across the panel, and (2) the relative volatility of the variables $x_{it}$ and $y_{it}$ becomes more different for individual members of the panel. For most panels of interest, these are likely to be important practical considerations. On the other hand, if the data are known to be relatively homogeneous or simple in their serial correlation structure, the imprecise estimation of these elements will make this type of estimator less attractive than one that implicitly imposes these known restrictions.

B. Monte Carlo Simulation Results

We now study small sample properties in a series of Monte Carlo simulations.
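Before turning to the simulation results, the algebra behind proposition 2.1 and the two special cases above can be checked numerically. The sketch below does three things:

- It compares the estimator of proposition 2.1 with $\beta$ plus the error term of proposition 1.2.
- It shows that the estimator depends on the value of $\beta$ used in $y^*_{it}$.
- It shows that this dependence disappears when $\hat L_{11i} = \hat L_{22i}$.

The values for $\hat L_{11i}$, $\hat L_{21i}$, $\hat L_{22i}$ and $\hat\gamma_i$ are arbitrary illustrations, since the identity does not require them to be estimates. The sketch also assumes $x_{i0} = 0$, and the function names are hypothetical.

```python
import numpy as np

def prop21_estimate(x, y, L11, L21, L22, gamma, beta_in_ystar):
    """Pooled estimator of proposition 2.1; y* uses the supplied beta value."""
    N, T = x.shape
    dx = np.diff(x, axis=1, prepend=0.0)            # assumes x_i0 = 0
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    num = den = 0.0
    for i in range(N):
        y_star = (yd[i] - L21[i] / L22[i] * dx[i]
                  + (L11[i] - L22[i]) / L22[i] * beta_in_ystar * xd[i])
        num += (xd[i] @ y_star - T * gamma[i]) / (L11[i] * L22[i])
        den += (xd[i] @ xd[i]) / L22[i] ** 2
    return num / den

def prop12_error(x, mu, L11, L21, L22, gamma):
    """beta*_NT - beta from proposition 1.2, written with the true residuals."""
    N, T = x.shape
    dx = np.diff(x, axis=1, prepend=0.0)
    xd = x - x.mean(axis=1, keepdims=True)
    num = den = 0.0
    for i in range(N):
        mu_star = mu[i] - L21[i] / L22[i] * dx[i]
        num += (xd[i] @ mu_star - T * gamma[i]) / (L11[i] * L22[i])
        den += (xd[i] @ xd[i]) / L22[i] ** 2
    return num / den

rng = np.random.default_rng(4)
N, T, beta = 5, 60, 2.0
x = np.cumsum(rng.standard_normal((N, T)), axis=1)
mu = rng.standard_normal((N, T))
y = rng.uniform(2.0, 4.0, size=(N, 1)) + beta * x + mu
L11, L22 = rng.uniform(0.5, 2.0, N), rng.uniform(0.5, 2.0, N)
L21, gamma = rng.uniform(-0.5, 0.5, N), rng.uniform(-0.2, 0.2, N)

# Numerical equivalence when y* uses the true beta.
print(np.isclose(prop21_estimate(x, y, L11, L21, L22, gamma, beta),
                 beta + prop12_error(x, mu, L11, L21, L22, gamma)))    # True
# With heterogeneous L11 != L22 the result moves with the beta plugged in ...
print(prop21_estimate(x, y, L11, L21, L22, gamma, 0.0))
# ... but not when L11 = L22 for every member (special case 2).
print(np.isclose(prop21_estimate(x, y, L22, L21, L22, gamma, 0.0),
                 prop21_estimate(x, y, L22, L21, L22, gamma, 5.0)))    # True
```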
Given the difficulties associated with the feasible versions of the within dimension pooled panel fully modified OLS estimators based on proposition 2.1, discussed in the previous subsection, it is not surprising that these tend to exhibit relatively large size distortions in certain scenarios, as reported in Pedroni (1996a). Kao & Chiang (1997) subsequently also confirmed the poor small sample properties of the within dimension pooled panel fully modified estimator, using a version in which a first stage OLS estimate was used for the adjustment term. Such results should not be surprising, given that in the presence of endogeneity the first stage OLS estimator introduces a second order bias that is not eliminated asymptotically. Consequently, this bias leads to size distortion for the panel that is not necessarily eliminated even when the sample size grows large. By contrast, the feasible version of the
between dimension group mean estimator does not require such an adjustment term, even in the presence of heterogeneous serial correlation dynamics, and does not suffer from the same size distortion.⁶ We therefore focus here on reporting the small sample Monte Carlo results for the between dimension group mean estimator, and refer readers to Pedroni (1996a) for simulation results on the feasible versions of the within dimension pooled estimators. To facilitate comparison with the conventional time series literature, we use as a starting point a few Monte Carlo simulations analogous to those in Phillips & Loretan (1991) and Phillips & Hansen (1990), from their original work on FMOLS estimators for conventional time series. Following these studies, we model the errors of the data generating process as a vector MA(1) process and consider the consequences of varying certain key parameters. In particular, for the purposes of the Monte Carlo simulations, we model the data generating process for the cointegrated panel (1) under assumptions 1.1 and 1.2 as

$$y_{it} = \alpha_i + \beta x_{it} + \mu_{it}, \qquad x_{it} = x_{i,t-1} + \varepsilon_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T.$$

We model the vector error process $\xi_{it} = (\mu_{it}, \varepsilon_{it})'$ as a vector moving average process given by

$$\xi_{it} = \eta_{it} - \Theta_i\,\eta_{i,t-1}, \qquad \eta_{it} \sim \text{i.i.d. } N(0, \Psi_i), \tag{3}$$

where $\Theta_i$ is a $2 \times 2$ coefficient matrix and $\Psi_i$ is a $2 \times 2$ contemporaneous covariance matrix. To accommodate the potentially heterogeneous nature of these dynamics among different members of the panel, we index these parameters by the subscript $i$, and we draw them from uniform distributions according to the particular experiment. Likewise, for each of the experiments we draw the fixed effects $\alpha_i$ from a uniform distribution, $\alpha_i \sim U(2.0, 4.0)$.
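The data generating process in (3) is straightforward to simulate. The sketch below assumes the indexing $\Theta_i = [\theta_{jk,i}]$ and $\Psi_i = [\psi_{jk,i}]$, with the first row and column referring to $\mu_{it}$. It draws the parameters using the ranges of the first experiment, which are described next, and it starts each $x_{it}$ at zero. The function names and the zero initial condition are assumptions of this sketch, not details given in the text.

```python
import numpy as np

def draw_experiment_one(N, rng):
    """Member-specific Theta_i, Psi_i and alpha_i with the first experiment's ranges."""
    theta = np.empty((N, 2, 2))
    theta[:, 0, 0] = rng.uniform(-0.1, 0.7, N)
    theta[:, 0, 1] = rng.uniform(0.0, 0.8, N)
    theta[:, 1, 0] = rng.uniform(0.0, 0.8, N)      # theta_21i: positive values only
    theta[:, 1, 1] = rng.uniform(0.2, 1.0, N)
    psi = np.empty((N, 2, 2))
    psi[:, 0, 0] = psi[:, 1, 1] = 1.0
    psi[:, 0, 1] = psi[:, 1, 0] = rng.uniform(-0.85, 0.85, N)
    alpha = rng.uniform(2.0, 4.0, N)
    return theta, psi, alpha

def simulate_panel(N, T, beta, theta, psi, alpha, rng):
    """y_it = alpha_i + beta x_it + mu_it, x_it = x_i,t-1 + eps_it, with
    xi_it = (mu_it, eps_it)' = eta_it - Theta_i eta_i,t-1, eta_it ~ N(0, Psi_i)."""
    y = np.empty((N, T))
    x = np.empty((N, T))
    for i in range(N):
        eta = rng.multivariate_normal(np.zeros(2), psi[i], size=T + 1)
        xi = eta[1:] - eta[:-1] @ theta[i].T        # row t: eta_t - Theta_i eta_{t-1}
        mu, eps = xi[:, 0], xi[:, 1]
        x[i] = np.cumsum(eps)                       # x_i0 = 0 (assumption)
        y[i] = alpha[i] + beta * x[i] + mu
    return y, x

rng = np.random.default_rng(5)
theta, psi, alpha = draw_experiment_one(10, rng)
y, x = simulate_panel(10, 50, 2.0, theta, psi, alpha, rng)
print(y.shape, x.shape)     # (10, 50) (10, 50)
```

The later experiments change only the range used for $\theta_{21i}$.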
We first consider, as a benchmark, an experiment that captures much of the richness of the error process studied in Phillips & Loretan (1991) while also permitting considerable heterogeneity among individual members of the panel. In their study, Phillips & Loretan (1991), following Phillips & Hansen (1990), fix the parameters $\theta_{11} = 0.3$, $\theta_{12} = 0.4$, $\theta_{22} = 0.6$, $\psi_{11} = \psi_{22} = 1.0$ and $\beta = 2.0$, and then permit $\theta_{21}$ and $\psi_{21}$ to vary. The coefficient $\theta_{21}$ is particularly interesting, since a non-zero value for this parameter reflects an absence of even weak exogeneity for the regressors in the cointegrating regression associated with (1). This effect is captured by the term $L_{21i}$ in the panel FMOLS statistics. For our heterogeneous panel, we therefore set $\psi_{11i} = \psi_{22i} = 1.0$ and $\beta = 2.0$, and draw the remaining parameters from uniform
distributions that are centered around the parameter values set by Phillips & Loretan (1991), but deviate by up to 0.4 in either direction for the elements of $\Theta_i$ and by up to 0.85 in either direction for $\psi_{21i}$. Thus, in our first experiment, the parameters are drawn as follows: $\theta_{11i} \sim U(-0.1, 0.7)$, $\theta_{12i} \sim U(0.0, 0.8)$, $\theta_{21i} \sim U(0.0, 0.8)$, $\theta_{22i} \sim U(0.2, 1.0)$ and $\psi_{21i} \sim U(-0.85, 0.85)$. This specification achieves considerable heterogeneity across individual members, and it also allows the key parameters $\theta_{21i}$ and $\psi_{21i}$ to span the set of values considered in Phillips and Loretan's study. In this first experiment we restrict $\theta_{21i}$ to span only the positive values that Phillips and Loretan considered for this parameter. In several cases Phillips and Loretan found negative values of $\theta_{21}$ to be particularly problematic in terms of size distortion for many of the conventional test statistics applied to pure time series. In our subsequent experiments we therefore also consider the consequences of drawing negative values for this coefficient. In each case, the asymptotic covariances were estimated individually for each member $i$ of the cross section using the Newey & West (1987) estimator. To set the lag length for the bandwidth, we employ the data dependent scheme recommended in Newey & West (1994), which sets the lag truncation to the nearest integer given by

$$K = 4\left(\frac{T}{100}\right)^{2/9},$$

where $T$ is the number of sample observations over time. Since we consider small sample results for panels ranging in dimension from $T = 10$ to $T = 100$ in increments of 10, the lag truncation ranges from 2 to 4. For the cross-sectional dimension, we consider $N = 10$, $N = 20$ and $N = 30$ for each of these values of $T$.

Results for the first experiment, with $\theta_{21i} \sim U(0.0, 0.8)$, are reported in Table I of Appendix B. The first column of results reports the bias of the point estimator, and the second column reports the associated standard error of the sampling distribution. The biases are small, at $-0.058$, even in extreme cases in which both the $N$ and $T$ dimensions are as small as $N = 10$, $T = 10$, and they become minuscule as the $T$ dimension grows larger. At $N = 10$, $T = 30$ the bias is already down to $-0.009$, and at $T = 100$ it falls to $-0.001$. This should be anticipated, since the estimators are superconsistent and converge at rate $T\sqrt{N}$, so that even for relatively small dimensions the estimators are extremely precise. Furthermore, the Monte Carlo simulations confirm that the bias falls more quickly with growth in the $T$ dimension than with growth in the $N$ dimension. For example, the biases are much smaller for $T = 30$, $N = 10$ than for $T = 10$, $N = 30$ in all of the experiments. The standard errors in column two confirm that the sampling variance around these
biases is also very small. Similar results continue to hold in subsequent experiments with negative moving average coefficients, regardless of the data generating process for the serial correlation. The first thing to note, therefore, is that these estimators are extremely accurate even in panels with very heterogeneous serial correlation dynamics, fixed effects and endogenous regressors. Of course, these findings on bias should not come as a surprise given the superconsistency results presented in the previous section. A more central concern for the purposes of inference is the small sample behavior of the associated t-statistic and the possibility of size distortion. To examine this, we consider the small sample sizes of the test under the null hypothesis for various nominal sizes based on the asymptotic distribution. Specifically, the last two columns report the Monte Carlo small sample rejection rates of two-sided tests of the null hypothesis $\beta = 2.0$ at nominal 5% and 10% levels, respectively.

As a general rule, we find that the size distortions in these small samples are remarkably small provided that the time series dimension, $T$, is not smaller than the cross-sectional dimension, $N$. This condition arises primarily from the estimation of the fixed effects. The number of fixed effects, $\alpha_i$, grows with the $N$ dimension of the panel. Each of these $N$ fixed effects, on the other hand, is estimated consistently only as $T$ grows large, so that $\hat\alpha_i - \alpha_i$ goes to zero only as $T$ grows large. Accordingly, we require $T$ to grow faster than $N$ in order to eliminate this effect asymptotically for the panel. As a practical consequence, small sample size distortion tends to be high when $N$ is large relative to $T$ and decreases as $T$ becomes large relative to $N$, which can be anticipated in any fixed effects model.
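The size columns are Monte Carlo rejection frequencies: the share of replications in which the two-sided test rejects a true null at the asymptotic $N(0,1)$ critical value. A minimal sketch using only the Python standard library follows. The function name is hypothetical. Applied to draws generated under an alternative, the same computation gives power figures of the kind discussed later.

```python
from statistics import NormalDist

def rejection_rates(t_stats, levels=(0.05, 0.10)):
    """Share of |t| above the two-sided N(0, 1) critical value at each nominal level."""
    n = len(t_stats)
    rates = {}
    for a in levels:
        crit = NormalDist().inv_cdf(1.0 - a / 2.0)    # about 1.960 for 5%, 1.645 for 10%
        rates[a] = sum(abs(t) > crit for t in t_stats) / n
    return rates

print(rejection_rates([0.0, 1.0, 2.0, -2.5, 1.8]))
# {0.05: 0.4, 0.1: 0.6}
```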
As the results in Table I show, when $N$ exceeds $T$ the size distortions are large, with actual sizes exceeding 30% and 40% when $T = 10$ and $N$ grows from 10 to 20 and 30. This is an unattractive scenario, since the tests are then likely to reject the null hypothesis when rejection is not warranted. However, these are extreme cases, because the techniques are designed for the opposite case, in which the $T$ dimension is reasonably large relative to the $N$ dimension. In these cases, even when the $T$ dimension is only slightly larger than the $N$ dimension, and even when the two are comparable, we find that the size distortion is remarkably small. For example, Table I shows that with $N = 20$, $T = 40$ the sizes of the nominal 5% and 10% tests become 4.5% and 9.3%, respectively. Similarly, for $N = 10$, $T = 30$ the Monte Carlo sizes become 6.1% and 11%, respectively, and for $N = 30$, $T = 60$ they become 4.7% and 9.6%. As the $T$ dimension grows even larger for a fixed $N$ dimension, the tests tend to become slightly undersized, with the actual size
becoming slightly smaller than the nominal size. In this case the small sample tests actually become slightly more conservative than one would anticipate from the asymptotic critical values.

Next, we consider the case in which the values of $\theta_{21i}$ span negative numbers. For the experiment reported in Table II of Appendix B we draw this coefficient from $\theta_{21i} \sim U(-0.8, 0.0)$. Large negative values for moving average coefficients are well known to create size distortion for such estimators, and we anticipate higher small sample distortion in this case. Interestingly, the biases of the point estimate become slightly positive in this case, although, as mentioned before, they remain very small. The small sample size distortions follow the same pattern: they tend to be largest when $T$ is small relative to $N$ and decrease as $T$ grows larger. As anticipated, they tend to be higher than in the case in which $\theta_{21i}$ spans only positive values. However, the values still fall within a fairly reasonable range, considering that all values of $\theta_{21i}$ are negative. For example, with $N = 10$, $T = 100$ we obtain 6.3% and 12% for the 5% and 10% nominal sizes, respectively. For $N = 20$, $T = 100$ they become 9% and 15.6%, respectively. These are still remarkably small compared to the size distortions reported in Phillips & Loretan (1991) for the conventional time series case.

Finally, we ran a third experiment in which the values of $\theta_{21i}$ span both positive and negative values, drawn from $\theta_{21i} \sim U(-0.4, 0.4)$. We consider this a fairly realistic case, and it corresponds closely to the range of moving average coefficients estimated in the purchasing power parity study contained in Pedroni (1996a). We find that the group mean estimator and test statistic perform very well in this situation.
The Monte Carlo simulation results for this case are reported in Table III of Appendix B. Whereas the biases for the case with large positive values of $\psi_{21i}$ in Table I were negative, and those for the case with large negative values in Table II were positive, here we find the biases to be positive and often even smaller in absolute value than in either of the first two cases. Most importantly, we find the size distortions for the t-statistic to be much smaller here than in the case where we have exclusively negative values for $\psi_{21i}$. For example, with N = 30 and T as small as 60, we find the actual sizes of the nominal 5% and 10% tests to be 5.4% and 10.5%. Again, the small sample sizes for the test are generally quite close to the asymptotic nominal sizes provided that the T dimension is not smaller than the N dimension. Consequently, it appears that even when some members of the panel exhibit negative moving average coefficients, as long as other members exhibit positive values, the distortions tend to be averaged out so that the small sample sizes for the group mean statistic stay
PETER PEDRONI
very close to the asymptotic sizes. Thus, we conclude that in general, when the T dimension is not smaller than the N dimension, the asymptotic normality result appears to provide a very good benchmark for the sampling distribution under the null hypothesis, even in relatively small samples with heterogeneous serial correlation dynamics. Finally, although power is generally not a concern for such panel tests, since it is generally quite high, it is worth mentioning the small sample power properties of the group mean estimator. Specifically, we checked the small sample power of the test against the alternative hypothesis by generating 10,000 draws from the DGP associated with case 3 above with $\beta = 1.9$. For the test of the null hypothesis that $\beta = 2.0$ against the alternative hypothesis that $\beta = 1.9$, we found that the power of the test at the 10% level reached 100% for N = 10 when T was 40 or more (98.2% when T = 30), reached 100% for N = 20 when T was 30 or more, and, for N = 30, reached 100% already when T was 20 or more. Consequently, considering the high power and the relatively small size distortion, we find the small sample properties of the estimator and associated t-statistic to be extremely well behaved in the cases for which it was designed.
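The "actual size" and "power" figures discussed above are rejection frequencies computed across Monte Carlo draws. Below is a minimal Python sketch of that calculation. It assumes two-sided tests that use standard normal (asymptotic) critical values. The helper name `rejection_rate` is hypothetical, and the draws are stand-ins rather than output from the chapter's DGP.

```python
from statistics import NormalDist

import numpy as np


def rejection_rate(t_stats, nominal=0.05):
    """Share of Monte Carlo draws in which a two-sided test at the given nominal
    level rejects H0, using standard normal (asymptotic) critical values."""
    crit = NormalDist().inv_cdf(1.0 - nominal / 2.0)  # about 1.960 at 5%, 1.645 at 10%
    t = np.asarray(t_stats, dtype=float)
    return float(np.mean(np.abs(t) > crit))


# Stand-in draws: exact N(0,1) values play the role of simulated t-statistics
# under H0 (their rejection rate estimates the size), and the same draws shifted
# by 3 mimic a hypothetical alternative (their rejection rate estimates power).
rng = np.random.default_rng(0)
t_null = rng.standard_normal(10_000)
size_5, size_10 = rejection_rate(t_null, 0.05), rejection_rate(t_null, 0.10)
power_5 = rejection_rate(t_null + 3.0, 0.05)
```

In the experiments described above, size corresponds to this rate computed from t-statistics for $\beta = 2.0$ on data generated with $\beta = 2.0$. Power is the same rate computed on data generated with $\beta = 1.9$.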
IV. ESTIMATION ALGORITHM AND SOME EXAMPLES OF APPLICATIONS⁷

In this section we describe the algorithm for computing the panel FMOLS estimators and their associated test statistics, and then discuss a few examples of their use. In summary, we can compute any one of the desired statistics by performing the following steps:

1. Estimate the panel regression and collect the residuals. Specifically, one should estimate the desired panel cointegration regression, making sure to include any desired intercepts or common time dummies in the regression, and then collect the residuals $\hat\mu_{it}$ for each of the members of the panel. If the slopes are homogeneous, the common time dummy effects can be eliminated more simply by first subtracting the cross-sectional mean for each period $t$ from each variable. Thus, construct $y_{it} - \bar y_t$ and $x_{it} - \bar x_t$, where $\bar y_t = N^{-1}\sum_{i=1}^{N} y_{it}$ and $\bar x_t = N^{-1}\sum_{i=1}^{N} x_{it}$, prior to estimating the regression and prior to the following steps.

2. Estimate the long run covariances and autocovariances of the errors. Use the estimated residuals from step 1 plus the differences of each of the regressors to construct a vector error series $\hat\xi_{it} = (\hat\mu_{it}, \Delta x_{it}')'$. Note that the second element is a vector of dimension $m$, where $m$ corresponds to the number of regressors. Now use any long run covariance matrix estimator,
such as the Newey-West (1987) estimator, to estimate the elements of the long run covariance $\hat\Omega_i$ and the autocovariances $\hat\Gamma_i$. This can be done by applying the estimator to the entire $(m+1)$-vector $\hat\xi_{it} = (\hat\mu_{it}, \Delta x_{it}')'$ to produce an $(m+1)\times(m+1)$ long run covariance matrix and autocovariance matrix. The elements of $\hat\Omega_i$ and $\hat\Gamma_i$ then correspond to partitions of the $(m+1)\times(m+1)$ long run covariance matrix and autocovariance matrix, respectively. Specifically, the upper left scalar element of the $(m+1)\times(m+1)$ long run covariance matrix corresponds to $\hat\Omega_{11i}$. The lower $m\times m$ partition corresponds to $\hat\Omega_{22i}$, an $m\times m$ matrix representing the long run covariance among the regressors, and the remaining $m$ elements in the column below the upper left scalar element correspond to $\hat\Omega_{21i}$. Since the covariance matrix is symmetric, $\hat\Omega_{12i} = \hat\Omega_{21i}'$. The same mapping applies to the partitions of the $(m+1)\times(m+1)$ autocovariance matrix and the elements of $\hat\Gamma_i$, except that, unlike $\hat\Omega_i$, the autocovariance matrix $\hat\Gamma_i$ is not symmetric, so $\hat\Gamma_{12i} \neq \hat\Gamma_{21i}'$, and these elements must be extracted from the corresponding column and row partitions separately. Once $\hat\Omega_i$ has been constructed, apply a Cholesky style triangularization to obtain the elements of the lower triangular matrix $\hat L_i$. Finally, we will use an estimate of the standard contemporaneous covariance matrix, $\hat\Omega^{o}_i$, for the elements of $\hat\xi_{it} = (\hat\mu_{it}, \Delta x_{it}')'$, similarly partitioned.

3. Construct the estimator. Now we have all of the pieces required to construct the estimators. Each estimator uses a serial correlation correction term, $\hat\gamma_i$, which can be constructed from the pieces obtained in step 2 above as
$$\hat\gamma_i \equiv \hat\Gamma_{21i} + \hat\Omega^{o}_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\hat\Gamma_{22i} + \hat\Omega^{o}_{22i}\right)$$
Next, using the elements of $\hat L_i$, the expression
$$y^*_{it} = (y_{it} - \bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}}\,\Delta x_{it}$$
can be constructed from the original data. The final step is to construct the cross product terms between $y^*_{it}$ and $(x_{it} - \bar x_i)$. This is now sufficient to compute either the point estimators or the associated t-statistics for any of the statistics.

It is worth noting two points here. The difference between the panel within dimension estimators and the group mean between dimension estimators lies in the way in which the cross product terms are computed. For the within dimension statistics, the cross product terms are computed by summing over the T and N dimensions separately for the numerator and the denominator. For the group mean between dimension statistics, the cross product terms are computed by summing over the T dimension for the numerator and denominator separately, and then summing over the N dimension for the entire ratio. Consequently, the first point to note is that the algorithm as applied to the
group mean estimator describes the same steps that one would take if one were estimating N different conventional FMOLS estimators and then taking the average of these. The same is true for the group mean t-statistic, which is constructed from the N individual FMOLS t-statistics (as $N^{-1/2}$ times their sum). Thus, if one already has a routine to estimate the conventional time series FMOLS estimator, then the group mean panel FMOLS estimator is extremely simple and convenient to estimate. The second point to note is that for the panel FMOLS within dimension estimator we have used the estimates of $\hat\Omega_i$, $\hat\Gamma_i$, $\hat\Omega^{o}_i$ and $\hat\gamma_i$ to compute the weighted panel variances. But it is equally feasible to compute the unweighted panel variances by first averaging the values of $\hat\Omega_i$, $\hat\Gamma_i$ and $\hat\Omega^{o}_i$ across members before applying the transformations. Whether or not the two different treatments have much consequence for the estimate is likely to depend on how heterogeneous the values of $\Omega_i$ are across individual members.

Next, we briefly describe a few examples of the use of these panel FMOLS estimators. One obvious application is to the exchange rate literature, and in particular the purchasing power parity literature. Long run absolute or strong purchasing power parity predicts that nominal exchange rates and aggregate price ratios among countries should be cointegrated with a unit cointegrating vector, so that the real exchange rate is stationary. However, panel unit root tests based on Levin & Lin (1993) have generally found mixed results; see for example Oh (1996), Papell (1997) and Wu (1996), among others. On the other hand, panel cointegration tests based on Pedroni (1995, 1997a) have generally rejected the null of no cointegration; see for example Canzoneri, Cumby & Diba (1996), Chinn (1997) and Taylor (1996), among others. By contrast, long run relative or weak purchasing power parity simply predicts that the nominal exchange rate and aggregate price ratios will be cointegrated, though not necessarily with a unit cointegrating vector.
The panel FMOLS estimators presented in this paper are an obvious way to distinguish between these two hypotheses, and Pedroni (1996a, 1999) uses these panel FMOLS estimators to show that only the relative, weak form of purchasing power parity holds for a panel of post Bretton Woods period floating exchange rates. The latter paper contrasts results for both a parametric group mean DOLS estimator and a nonparametric group mean FMOLS estimator for the weak purchasing power parity test. In a similar spirit, Alexius & Nilson (2000), Canzoneri, Cumby & Diba (1996) and Chinn (1997) apply these panel FMOLS tests from Pedroni (1996a) to test the Samuelson-Balassa hypothesis that long run movements of real exchange rates are driven by differences in long run relative productivities among countries. Other examples of the use of these panel FMOLS tests come from the growth literature. Neusser & Kugler (1998) use the tests from Pedroni (1996a) to investigate the connection between financial development and growth. Kao,
Chiang & Chen (1999) use a panel FMOLS estimator and compare it to a panel DOLS estimator to investigate the connection between research and development expenditure and growth. Keller & Pedroni (1999) use the group mean panel estimator presented in this chapter to study the mechanism by which imported R&D impacts growth at the industry level, and demonstrate the attractiveness of the more flexible form of the group mean estimator. Canning & Pedroni (1999) use the same group mean panel FMOLS test as a first step estimator to construct a test for the direction of long run causality between public infrastructure and long run growth. Finally, Pedroni & Wen (2000) make use of the group mean panel FMOLS estimator as a first step estimator in an overlapping generations model to identify the position of the U.S., Japanese and European economies relative to the golden rule, and the extent to which social security transfer programs can move economies closer to this position. This is just a brief summary of the application of these estimators to two literatures, the exchange rate and growth literatures. Needless to say, many potential applications exist beyond these two literatures.
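The estimation algorithm described at the start of this section can be sketched in Python with NumPy for the group mean (between dimension) estimator with a single regressor. This is a minimal sketch under our own assumptions, not the author's program. It rests on the following choices:

- Kernel: Bartlett (Newey-West) weights with a rule-of-thumb lag length $\lfloor 4(n/100)^{2/9}\rfloor$.
- Triangular factor: $\hat L_{22i} = \hat\Omega_{22i}^{1/2}$, $\hat L_{21i} = \hat\Omega_{21i}/\hat\Omega_{22i}^{1/2}$ and $\hat L_{11i} = (\hat\Omega_{11i} - \hat\Omega_{21i}^2/\hat\Omega_{22i})^{1/2}$, so that $\hat L_{21i}/\hat L_{22i} = \hat\Omega_{21i}/\hat\Omega_{22i}$.
- Autocovariance orientation: $\hat\Gamma_i$ is oriented so that $\hat\Gamma_{21i}$ collects lagged regressor differences times current residuals.
- t-statistics: each member's statistic is $t_i = (\hat\beta^*_i - \beta_0)\,\hat L_{11i}^{-1}\big(\sum_t (x_{it} - \bar x_i)^2\big)^{1/2}$, and these are combined as $N^{-1/2}\sum_i t_i$.

The function names and the simulated data are hypothetical. `demean_common_time` implements the optional cross-sectional demeaning from step 1.

```python
import numpy as np


def demean_common_time(Z):
    """Step 1 option: subtract the cross-sectional mean at each t (Z is N x T)."""
    return Z - Z.mean(axis=0, keepdims=True)


def bartlett_covariances(xi, lags):
    """Contemporaneous covariance, Bartlett-weighted autocovariance sum and long
    run covariance of the columns of xi (n x k)."""
    n = xi.shape[0]
    omega0 = xi.T @ xi / n
    gamma = np.zeros_like(omega0)
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)
        # entry [a, b] = sum_t xi[t - j, a] * xi[t, b] / n (lagged row, current column)
        gamma += w * (xi[:-j].T @ xi[j:]) / n
    return omega0, gamma, omega0 + gamma + gamma.T


def fmols_member(y, x, beta0, lags=None):
    """FMOLS slope and t-statistic for one panel member with one regressor."""
    dx = np.diff(x)
    y, x = y[1:], x[1:]                       # align levels with first differences
    n = x.size
    if lags is None:
        lags = int(4 * (n / 100.0) ** (2.0 / 9.0))  # rule-of-thumb lag length (assumption)
    yd, xd = y - y.mean(), x - x.mean()       # member-specific intercept removed
    resid = yd - (xd @ yd) / (xd @ xd) * xd   # step 1: OLS residuals
    om0, gam, om = bartlett_covariances(np.column_stack([resid, dx]), lags)
    l22 = np.sqrt(om[1, 1])                   # step 2: triangular factor of the LRV
    l21 = om[1, 0] / l22
    l11 = np.sqrt(om[0, 0] - om[1, 0] ** 2 / om[1, 1])
    ratio = l21 / l22
    # step 3: serial correlation correction and transformed dependent variable
    gamma_hat = gam[1, 0] + om0[1, 0] - ratio * (gam[1, 1] + om0[1, 1])
    y_star = yd - ratio * dx
    sxx = xd @ xd
    b = (xd @ y_star - n * gamma_hat) / sxx
    t = (b - beta0) * np.sqrt(sxx) / l11
    return b, t


def group_mean_fmols(Y, X, beta0):
    """Average the N member slopes; group mean t = N^{-1/2} * sum of member t's."""
    out = np.array([fmols_member(Y[i], X[i], beta0) for i in range(Y.shape[0])])
    return out[:, 0].mean(), out[:, 1].sum() / np.sqrt(out.shape[0])


# Hypothetical cointegrated panel with an endogenous I(1) regressor
rng = np.random.default_rng(42)
N, T, beta = 10, 200, 2.0
E = rng.standard_normal((N, T))               # regressor innovations
U = 0.5 * E + rng.standard_normal((N, T))     # errors correlated with them
X = np.cumsum(E, axis=1)
Y = rng.uniform(2.0, 4.0, size=(N, 1)) + beta * X + U
b_gm, t_gm = group_mean_fmols(Y, X, beta0=beta)
```

With homogeneous slopes, one could apply `demean_common_time` to both `Y` and `X` before calling `group_mean_fmols`, as in step 1. A within dimension (pooled) version would instead accumulate the weighted numerator and denominator cross products over members before taking a single ratio.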
V. DISCUSSION OF FURTHER RESEARCH AND CONCLUDING REMARKS

We have explored in this chapter methods for testing and making inferences about cointegrating vectors in heterogeneous panels based on fully modified OLS principles. When properly constructed to take account of potential heterogeneity in the idiosyncratic dynamics and fixed effects associated with such panels, the asymptotic distributions for these estimators can be made to be centered around the true value and will be free of nuisance parameters. Furthermore, based on Monte Carlo simulations, we have shown that in particular the t-statistic constructed from the between dimension group mean estimator performs very well in that it exhibits relatively little small sample size distortion. To date, the techniques developed in this study have been employed successfully in a number of applications, and it will be interesting to see if the panel FMOLS methods developed in this paper fare equally well in other scenarios. The area of research and application of nonstationary panel methods is rapidly expanding, and we take this opportunity to remark on a few further issues of current and future research as they relate to the subject of this chapter. As we have already discussed, the between dimension group mean estimator has an advantage over the within dimension pooled estimators presented in this chapter in that it permits a more flexible alternative hypothesis that allows for heterogeneity of the cointegrating vector. In many cases it is not known a priori
whether heterogeneity of the cointegrating vector can be ruled out, and it would be particularly useful to test the null hypothesis that the cointegrating vectors are heterogeneous in such panels with heterogeneous dynamics. In this context, Pedroni (1998) provides a technique that allows one to test such a null hypothesis against the alternative hypothesis that they are homogeneous, and demonstrates how the technique can be used to test whether convergence in the Solow growth model occurs to distinct versus common steady states for the Summers and Heston data set. Another important issue that is often raised for these types of panels pertains to the assumption of cross sectional independence, as per assumption 1.2 in this chapter. The standard approach is to use common time dummies, which in many cases is sufficient to deal with cross sectional dependence. However, in some cases common time dummies may not be sufficient, particularly when the cross sectional dependence is not limited to contemporaneous effects and is dynamic in nature. Pedroni (1997b) proposes an asymptotic covariance weighted GLS approach to deal with such dynamic cross sectional dependence for the case in which the time series dimension is considerably larger than the cross sectional dimension, and applies the panel fully modified form of the test to the purchasing power parity hypothesis using monthly OECD exchange rate data. It is interesting to note, however, that for this particular application, taking account of such cross sectional dependencies does not appear to impact the conclusions, and it is possible that in many cases cross sectional dependence does not play as large a role as one might anticipate once common time dummies have been included, although this remains an open question. Another important issue is parametric versus nonparametric estimation of nuisance parameters.
Clearly, any of the estimators presented here can be implemented by taking care of the nuisance parameter effects either nonparametrically, using kernel estimators, or parametrically, as for example using dynamic OLS corrections. Generally speaking, nonparametric estimation tends to be more robust, since one does not need to assume a specific parametric form. On the other hand, since nonparametric estimation relies on fewer assumptions, it generally requires more data than parametric estimation. Consequently, for conventional time series tests, when data are limited it is often worth making specific parametric assumptions. For panels, on the other hand, the greater abundance of data suggests an opportunity to take advantage of the greater robustness of nonparametric methods, though ultimately the choice may simply be a matter of taste. The Monte Carlo simulation results provided here demonstrate that even in the presence of considerable heterogeneity, nonparametric correction methods do very well for the group mean estimator and the corresponding t-statistic.
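For contrast with the kernel-based correction, here is a minimal Python sketch of the parametric alternative mentioned above. It uses a dynamic OLS regression for each member, regressing $y_t$ on a constant, $x_t$ and the leads and lags $\Delta x_{t+j}$, $j = -k, \dots, k$, and then averages the member slopes as a group mean. The lead/lag length $k = 2$, the function names and the simulated DGP are illustrative assumptions of ours. This is not the DOLS estimator of Pedroni (1999) or of Kao, Chiang & Chen (1999), and no standard errors are computed.

```python
import numpy as np


def dols_member(y, x, k=2):
    """Dynamic OLS slope for one panel member: regress y_t on a constant, x_t
    and the leads and lags Delta x_{t+j}, j = -k, ..., k (single regressor)."""
    dx = np.diff(x)                           # dx[s] = x[s+1] - x[s] = Delta x_{s+1}
    rows, targets = [], []
    for t in range(k + 1, x.size - k):        # periods where all leads and lags exist
        lead_lag = [dx[t + j - 1] for j in range(-k, k + 1)]  # Delta x_{t+j}
        rows.append([1.0, x[t]] + lead_lag)
        targets.append(y[t])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return float(coef[1])                     # coefficient on x_t


def group_mean_dols(Y, X, k=2):
    """Average of the member DOLS slopes (Y and X are N x T arrays)."""
    return float(np.mean([dols_member(Y[i], X[i], k) for i in range(Y.shape[0])]))


# Hypothetical panel: endogenous I(1) regressor, heterogeneous intercepts
rng = np.random.default_rng(7)
N, T, beta = 10, 300, 2.0
E = rng.standard_normal((N, T))
X = np.cumsum(E, axis=1)
Y = rng.uniform(2.0, 4.0, size=(N, 1)) + beta * X + 0.5 * E + rng.standard_normal((N, T))
b_dols = group_mean_dols(Y, X)
```

The leads and lags absorb the correlation between the regression error and $\Delta x_{it}$ by assuming a finite parametric form. The kernel approach avoids that assumption, at the cost of needing more data per member.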
NOTES
1. The results in section 2 and appendix A first appeared in Pedroni (1996a). The Indiana University working paper series is available at http://www.indiana.edu/ iuecon/ workpaps/
2. In fact, the computer program which accompanies this paper also allows one to implement these tests for any arbitrary number of regressors. It is available upon request from the author at ppedroni@indiana.edu
3. See Phillips & Moon (1999) for a recent formal study of the regularity conditions required for the use of sequential limit theory in panel data, and a set of conditions under which sequential limits imply joint limits, including the case in which the long run variances differ among members of the panel.
4. These results are for the OLS estimator when the variables are cointegrated. A related stream of the literature studies the properties of the panel OLS estimator when the variables are not cointegrated and the regression is spurious. See for example Entorf (1997), Kao (1999), Phillips & Moon (1999) and Pedroni (1993, 1997a) on spurious regression in nonstationary panels.
5. A separate issue pertains to differences between the sample averages and the true population means. Since we are treating the asymptotics sequentially, this difference goes to zero as T grows large prior to averaging over N, and thus does not impact the limiting distribution. Otherwise, more generally, we would require that the ratio N/T goes to zero as N and T grow large in order to ensure that these differences do not impact the limiting distribution. We return to this point in the discussion of the small sample properties in section 3.2.
6. Of course, this is not to say that all within dimension estimators will necessarily suffer from this particular form of size distortion, and it is likely that some forms of the pooled FMOLS estimator will be better behaved than others. Nevertheless, given the other attractive features of the between dimension group mean estimator, we focus here on reporting the very attractive small sample properties of this estimator.
7. I am grateful to an anonymous referee for suggesting this section.
ACKNOWLEDGMENTS
I thank especially Bob Cumby, Bruce Hansen, Roger Moon, Peter Phillips, Norman Swanson and Pravin Trivedi and two anonymous referees for helpful comments and suggestions on various earlier versions, and Maria Arbatskaya for research assistance. The paper has also benefited from presentations at the June 1996 North American Econometric Society Summer Meetings, the April 1996 Midwest International Economics Meetings, and workshop seminars at Rice University-University of Houston, Southern Methodist University, The Federal Reserve Bank of Kansas City, U. C. Santa Cruz and Washington University. The current version of the paper was completed while I was a visitor at the Department of Economics at Cornell University, and I thank the members of the Department for their generous hospitality. A computer program
which implements these tests is available upon request from the author at ppedroni@indiana.edu
REFERENCES
Alexius, A., & Nilson, J. (2000). Real Exchange Rates and Fundamentals: Evidence from 15 OECD Countries. Open Economies Review, forthcoming.
Canning, D., & Pedroni, P. (1999). Infrastructure and Long Run Economic Growth. CAE Working paper No. 9909, Cornell University.
Canzoneri, M., Cumby, R., & Diba, B. (1996). Relative Labor Productivity and the Real Exchange Rate in the Long Run: Evidence for a Panel of OECD Countries. NBER Working paper No. 5676.
Chinn, M. (1997). Sectoral Productivity, Government Spending and Real Exchange Rates: Empirical Evidence for OECD Countries. NBER Working paper No. 6017.
Chinn, M., & Johnson, L. (1996). Real Exchange Rate Levels, Productivity and Demand Shocks: Evidence from a Panel of 14 Countries. NBER Working paper No. 5709.
Entorf, H. (1997). Random Walks and Drifts: Nonsense Regression and Spurious Fixed-Effect Estimation. Journal of Econometrics, 80, 287-296.
Evans, P., & Karras, G. (1996). Convergence Revisited. Journal of Monetary Economics, 37, 249-265.
Im, K., Pesaran, H., & Shin, Y. (1995). Testing for Unit Roots in Heterogeneous Panels. Working paper, Department of Economics, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1-44.
Kao, C., & Chen, B. (1995). On the Estimation and Inference of a Cointegrated Regression in Panel Data When the Cross-section and Time-series Dimensions Are Comparable in Magnitude. Working paper, Department of Economics, Syracuse University.
Kao, C., & Chiang, M. (1997). On the Estimation and Inference of a Cointegrated Regression in Panel Data. Working paper, Department of Economics, Syracuse University.
Kao, C., Chiang, M., & Chen, B. (1999). International R&D Spillovers: An Application of Estimation and Inference in Panel Cointegration. Oxford Bulletin of Economics and Statistics, 61(4), 691-709.
Keller, W., & Pedroni, P. (1999). Does Trade Affect Growth? Estimating R&D Driven Models of Trade and Growth at the Industry Level. Working paper, Department of Economics, Indiana University and University of Texas.
Levin, A., & Lin, F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite-sample Properties. Working paper, Department of Economics, U. C. San Diego.
Mark, N., & Sul, D. (1999). A Computationally Simple Cointegration Vector Estimator for Panel Data. Working paper, Department of Economics, Ohio State University.
Neusser, K., & Kugler, M. (1998). Manufacturing Growth and Financial Development: Evidence from OECD Countries. Review of Economics and Statistics, 80, 638-646.
Newey, W., & West, K. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703-708.
Newey, W., & West, K. (1994). Automatic Lag Selection in Covariance Matrix Estimation. Review of Economic Studies, 61, 631-653.
Obstfeld, M., & Taylor, A. (1996). International Capital-Market Integration over the Long Run: The Great Depression as a Watershed. Working paper, Department of Economics, U. C. Berkeley.
Oh, K. (1996). Purchasing Power Parity and Unit Root Tests Using Panel Data. Journal of International Money and Finance, 15, 405-418.
Papell, D. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float. Journal of International Economics, 43, 313-332.
Pedroni, P. (1993). Panel Cointegration. Chapter 2 in Panel Cointegration, Endogenous Growth and Business Cycles in Open Economies, Columbia University Dissertation. Ann Arbor, MI: UMI Publishers.
Pedroni, P. (1995). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests, With an Application to the PPP Hypothesis. Working paper No. 95013, Department of Economics, Indiana University.
Pedroni, P. (1996a). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of Purchasing Power Parity. Working paper No. 96020, Department of Economics, Indiana University.
Pedroni, P. (1996b). Human Capital, Endogenous Growth, & Cointegration for Multi-Country Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997a). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests, With an Application to the PPP Hypothesis; New Results. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997b). On the Role of Cross Sectional Dependency in Dynamic Panel Unit Root and Panel Cointegration Exchange Rate Studies. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1998). Testing for Convergence to Common Steady States in Nonstationary Heterogeneous Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1999). Purchasing Power Parity Tests in Cointegrated Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P., & Wen, Y. (2000). Government and Dynamic Efficiency. Working paper, Department of Economics, Cornell University and Indiana University.
Pesaran, H., & Smith, R. (1995). Estimating Long Run Relationships from Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79-114.
Phillips, P., & Durlauf, S. (1986). Multiple Time Series Regressions with Integrated Processes. Review of Economic Studies, 53, 473-495.
Phillips, P., & Hansen, B. (1990). Statistical Inference in Instrumental Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99-125.
Phillips, P., & Loretan, M. (1991). Estimating Long-run Economic Equilibria. Review of Economic Studies, 58, 407-436.
Phillips, P., & Moon, H. (1999). Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica, 67, 1057-1112.
Quah, D. (1994). Exploiting Cross-Section Variation for Unit Root Inference in Dynamic Data. Economics Letters, 44, 9-19.
Taylor, A. (1996). International Capital Mobility in History: Purchasing Power Parity in the Long Run. NBER Working paper No. 5742.
Wu, Y. (1996). Are Real Exchange Rates Nonstationary? Evidence from a Panel-Data Test. Journal of Money, Credit and Banking, 28, 54-63.
MATHEMATICAL APPENDIX A
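Proposition 1.1: We establish notation here which will be used throughout the remainder of the appendix. Let $Z_{it} = Z_{it-1} + \xi_{it}$, where $\xi_{it} = (\mu_{it}, \varepsilon_{it})'$. Then, by virtue of assumption 1.1 and the functional central limit theorem,
$$T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}' \Rightarrow \int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)' + \Gamma_i + \Omega^{o}_i \tag{A1}$$
$$T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\tilde Z_{it}' \Rightarrow \int_{r=0}^{1}\tilde B(r,\Omega_i)\,\tilde B(r,\Omega_i)'\,dr \tag{A2}$$
for all $i$, where $\tilde Z_{it} = Z_{it} - \bar Z_i$ refers to the demeaned discrete time process and $\tilde B(r,\Omega_i)$ is demeaned vector Brownian motion with asymptotic covariance $\Omega_i$. This vector can be decomposed as $\tilde B(r,\Omega_i) = L_i'\tilde W_i(r)$, where $L_i = \Omega_i^{1/2}$ is the lower triangular decomposition of $\Omega_i$ and
$$\tilde W_i(r) = \begin{pmatrix} W_{1i}(r) - \int_0^1 W_{1i}(r)\,dr \\ W_{2i}(r) - \int_0^1 W_{2i}(r)\,dr \end{pmatrix}$$
is a vector of demeaned standard Brownian motion, with $W_{1i}$ independent of $W_{2i}$. Under the null hypothesis, the statistic can be written in these terms as
$$T\sqrt{N}\,(\hat\beta_{NT} - \beta) = \frac{N^{-1/2}\sum_{i=1}^{N}\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)_{21}}{N^{-1}\sum_{i=1}^{N}\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\tilde Z_{it}'\right)_{22}} \tag{A3}$$
Based on (A1), as $T \to \infty$, the bracketed term of the numerator converges to
$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{21} + \Gamma_{21i} + \Omega^{o}_{21i} \tag{A4}$$
the first term of which can be decomposed as
$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{21} = L_{11i}L_{22i}\left(\int_0^1 W_{2i}\,dW_{1i} - W_{1i}(1)\int_0^1 W_{2i}\,dr\right) + L_{21i}L_{22i}\left(\int_0^1 W_{2i}\,dW_{2i} - W_{2i}(1)\int_0^1 W_{2i}\,dr\right) \tag{A5}$$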
In order for the distribution of the estimator to be unbiased, it will be necessary that the expected value of the expression in (A4) be zero. But although the expected value of the first bracketed term in (A5) is zero, the expected value of the second bracketed term is given as
$$E\left[L_{21i}L_{22i}\left(\int_0^1 W_{2i}\,dW_{2i} - W_{2i}(1)\int_0^1 W_{2i}\,dr\right)\right] = -\frac{1}{2}L_{21i}L_{22i} \tag{A6}$$
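The value in (A6) can be checked using two standard facts about Brownian motion. The first is Itô's formula $\int_0^1 W\,dW = \tfrac12\big(W(1)^2 - 1\big)$. The second is the covariance $E[W(1)W(r)] = r$ for $0 \le r \le 1$. This worked step is added for illustration:

```latex
\begin{aligned}
E\left[\int_0^1 W_{2i}\,dW_{2i}\right] &= \tfrac12\left(E\left[W_{2i}(1)^2\right] - 1\right) = 0,\\
E\left[W_{2i}(1)\int_0^1 W_{2i}(r)\,dr\right] &= \int_0^1 E\left[W_{2i}(1)\,W_{2i}(r)\right]dr = \int_0^1 r\,dr = \tfrac12,\\
E\left[L_{21i}L_{22i}\left(\int_0^1 W_{2i}\,dW_{2i} - W_{2i}(1)\int_0^1 W_{2i}\,dr\right)\right] &= L_{21i}L_{22i}\left(0 - \tfrac12\right) = -\tfrac12\,L_{21i}L_{22i}.
\end{aligned}
```

Since $L_{22i} > 0$, this term vanishes only when $L_{21i} = 0$.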
Thus, given that the asymptotic covariance matrix, $\Omega_i$, must have positive diagonals, the expected value of the expression (A4) will be zero only if $L_{21i} = \Gamma_{21i} = \Omega^{o}_{21i} = 0$, which corresponds to strict exogeneity of regressors for all members of the panel. Finally, even if such strict exogeneity does hold, the variance of the numerator will still be influenced by the parameters $L_{11i}$, $L_{22i}$, which reflect the idiosyncratic serial correlation patterns in the individual cross sectional members. Unless these are homogeneous across members of the panel, they will lead to non-trivial data dependencies in the asymptotic distribution.

Proposition 1.2: Continuing with the same notation as above, the fully modified statistic can be written under the null hypothesis as
$$T\sqrt{N}\,(\hat\beta^*_{NT} - \beta) = \frac{N^{-1/2}\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)\left(1, -\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i\right]}{N^{-1}\sum_{i=1}^{N}\hat L_{22i}^{-2}\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\tilde Z_{it}'\right)_{22}} \tag{A7}$$
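Thus, based on (A1), as $T \to \infty$, the bracketed term of the numerator converges to
$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{21} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{22} + \Gamma_{21i} + \Omega^{o}_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\Gamma_{22i} + \Omega^{o}_{22i}\right) - \hat\gamma_i \tag{A8}$$
which can be decomposed into the elements of $\tilde W_i$ such that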
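$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{21} = L_{11i}L_{22i}\left(\int_0^1 W_{2i}\,dW_{1i} - W_{1i}(1)\int_0^1 W_{2i}\,dr\right) + L_{21i}L_{22i}\left(\int_0^1 W_{2i}\,dW_{2i} - W_{2i}(1)\int_0^1 W_{2i}\,dr\right) \tag{A9}$$
$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,dB(r,\Omega_i)'\right)_{22} = L_{22i}^2\left(\int_0^1 W_{2i}\,dW_{2i} - W_{2i}(1)\int_0^1 W_{2i}\,dr\right) \tag{A10}$$
where the index $r$ has been omitted for notational simplicity. Thus, if a consistent estimator of $\Omega_i$ is employed, so that $\hat\Omega_i \to \Omega_i$ and consequently $\hat L_i \to L_i$ and $\hat\gamma_i \to \gamma_i$, then
$$\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)\left(1, -\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i\right] \Rightarrow \int_0^1 W_{2i}(r)\,dW_{1i}(r) - W_{1i}(1)\int_0^1 W_{2i}(r)\,dr \tag{A11}$$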
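where the mean and variance of this expression are given by
$$E\left[\int_0^1 W_{2i}\,dW_{1i} - W_{1i}(1)\int_0^1 W_{2i}\,dr\right] = 0 \tag{A12}$$
$$E\left[\left(\int_0^1 W_{2i}\,dW_{1i}\right)^2 - 2W_{1i}(1)\int_0^1 W_{2i}\,dr\int_0^1 W_{2i}\,dW_{1i} + W_{1i}(1)^2\left(\int_0^1 W_{2i}\,dr\right)^2\right] = \frac{1}{2} - \frac{2}{3} + \frac{1}{3} = \frac{1}{6} \tag{A13}$$
respectively. Now that this expression has been rendered void of any idiosyncratic components associated with the original $\tilde B(r,\Omega_i)$, by virtue of assumption 1.2 and a standard central limit theorem argument,
$$\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left(\int_0^1 W_{2i}(r)\,dW_{1i}(r) - W_{1i}(1)\int_0^1 W_{2i}(r)\,dr\right) \Rightarrow N\!\left(0, \tfrac{1}{6}\right) \tag{A14}$$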
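as $N \to \infty$. Next, consider the bracketed term of the denominator of (A3), which, based on (A2), as $T \to \infty$, converges to
$$\left(\int_{r=0}^{1}\tilde B(r,\Omega_i)\,\tilde B(r,\Omega_i)'\,dr\right)_{22} = L_{22i}^2\left(\int_0^1 W_{2i}(r)^2\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^2\right) \tag{A15}$$
Thus,
$$\hat L_{22i}^{-2}\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\tilde Z_{it}'\right)_{22} \Rightarrow \int_0^1 W_{2i}(r)^2\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^2 \tag{A16}$$
which has finite variance and a mean given by
$$E\left[\int_0^1 W_{2i}(r)^2\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^2\right] = \frac{1}{2} - \frac{1}{3} = \frac{1}{6} \tag{A17}$$
Again, since this expression has been rendered void of any idiosyncratic components associated with the original $\tilde B(r,\Omega_i)$, by virtue of assumption 1.2 and a standard law of large numbers argument,
$$\frac{1}{N}\sum_{i=1}^{N}\left(\int_0^1 W_{2i}(r)^2\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^2\right) \to \frac{1}{6} \tag{A18}$$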
as $N \to \infty$. Thus, by iterated weak convergence and an application of the continuous mapping theorem, $T\sqrt{N}\,(\hat\beta^*_{NT} - \beta) \Rightarrow N(0, 6)$ for this case, where heterogeneous intercepts have been estimated. Next, recognize that $T^{-1/2}\bar y_i \Rightarrow \bar W_{1i} = \int_0^1 W_{1i}(r)\,dr$ and $T^{-1/2}\bar x_i \Rightarrow \bar W_{2i} = \int_0^1 W_{2i}(r)\,dr$ as $T \to \infty$. Setting $\bar W_{1i} = \bar W_{2i} = 0$ for the case where $\bar y_i = \bar x_i = 0$ then gives, as a special case of (A13) and (A17), the results for the distribution in the case with no estimated intercepts. In this case the mean given by (A12) remains zero, but the variance in (A13) becomes 1/2 and the mean in (A17) also becomes 1/2. Thus, $T\sqrt{N}\,(\hat\beta^*_{NT} - \beta) \Rightarrow N(0, 2)$ for this case.

Corollary 1.2: In terms of earlier notation, the statistic can be rewritten as:
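$$t_{\hat\beta^*_{NT}} = \frac{N^{-1/2}\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)\left(1, -\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i\right]}{\left(N^{-1}\sum_{i=1}^{N}\hat L_{22i}^{-2}\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\tilde Z_{it}'\right)_{22}\right)^{1/2}} \tag{A19}$$
where the numerator converges to the same expression as in proposition 1.2, and the term under the square root in the denominator converges to the same value as in proposition 1.2. Since the distribution of the numerator is centered around zero, the asymptotic distribution of $t_{\hat\beta^*_{NT}}$ will simply be the distribution of the numerator divided by the square root of this value from the denominator. Since
$$E\left[\left(\int_0^1 W_{2i}\,dW_{1i}\right)^2 - 2W_{1i}(1)\int_0^1 W_{2i}\,dr\int_0^1 W_{2i}\,dW_{1i} + W_{1i}(1)^2\left(\int_0^1 W_{2i}\,dr\right)^2\right] = E\left[\int_0^1 W_{2i}^2\,dr - \left(\int_0^1 W_{2i}\,dr\right)^2\right] \tag{A20}$$
by (A13) and (A17), regardless of whether or not $\bar W_{1i}$, $\bar W_{2i}$ are set to zero, it follows that $t_{\hat\beta^*_{NT}} \Rightarrow N(0, 1)$ irrespective of whether $\bar x_i$, $\bar y_i$ are estimated or not.

Proposition 1.3: Write the statistic as:
$$\bar t_{\hat\beta^*_{NT}} = N^{-1/2}\sum_{i=1}^{N}\hat L_{11i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\,\xi_{it}'\right)\left(1, -\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i\right]\left[\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\tilde Z_{it}'\right)_{22}\right]^{-1/2} \tag{A21}$$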
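Then the first bracketed term converges to
$$L_{11i}L_{22i}\left(\int_0^1 W_{2i}(r)\,dW_{1i}(r) - W_{1i}(1)\int_0^1 W_{2i}(r)\,dr\right) \sim N\!\left(0,\; L_{11i}^2L_{22i}^2\left(\int_0^1 W_{2i}(r)^2\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^2\right)\right) \tag{A22}$$
conditionally on $W_{2i}$, by virtue of the independence of $W_{2i}(r)$ and $dW_{1i}(r)$. Since the square root of the second bracketed term converges to
$$L_{22i}\left(\int_0^1 W_{2i}(r)^2\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^2\right)^{1/2} \tag{A23}$$
then, taken together, for $\hat L_i \to L_i$, (A21) becomes a standardized sum of i.i.d. standard normals regardless of whether or not $\bar W_{1i}$, $\bar W_{2i}$ are set to zero, and thus $\bar t_{\hat\beta^*_{NT}} \Rightarrow N(0, 1)$ by a standard central limit theorem argument, irrespective of whether $\bar x_i$, $\bar y_i$ are estimated or not.

Proposition 2.1: Insert the expression for $y^*_{it}$ into the numerator and use $y_{it} - \bar y_i = \beta(x_{it} - \bar x_i) + \mu_{it}$ to give
$$\hat\beta^*_{NT} = \frac{\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(\sum_{t=1}^{T}(x_{it} - \bar x_i)\left(\mu_{it} - \frac{\hat L_{21i}}{\hat L_{22i}}\Delta x_{it}\right) - T\hat\gamma_i\right)}{\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it} - \bar x_i)^2} + \beta\,\frac{\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(1 + \frac{\hat L_{11i} - \hat L_{22i}}{\hat L_{22i}}\right)\sum_{t=1}^{T}(x_{it} - \bar x_i)^2}{\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it} - \bar x_i)^2} \tag{A24}$$
Since $\hat L_{22i}^{-2} = \hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(1 + \frac{\hat L_{11i} - \hat L_{22i}}{\hat L_{22i}}\right)$, the last term in (A24) reduces to $\beta$, thereby giving the desired result.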
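The Brownian motion moments used above can be checked numerically. The checks cover:

- the zero mean in (A12);
- the value 1/6 in (A13) and (A17);
- the limiting variances 6 and 2 for $T\sqrt N(\hat\beta^*_{NT} - \beta)$, with and without estimated intercepts.

The Python/NumPy sketch below approximates $W_{1i}$ and $W_{2i}$ by scaled Gaussian random walks on a grid. The discretization, grid size, replication count and the helper name `brownian_functionals` are our own illustrative choices. The simulated moments therefore match the analytical values only up to Monte Carlo and discretization error.

```python
import numpy as np


def brownian_functionals(n_rep=10_000, n_steps=200, seed=0):
    """Simulate the limit functionals in (A11) and (A16), plus their undemeaned
    versions, using Gaussian random walks on a grid of n_steps points in [0, 1]."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    dW1 = rng.standard_normal((n_rep, n_steps)) * np.sqrt(dt)
    dW2 = rng.standard_normal((n_rep, n_steps)) * np.sqrt(dt)
    W1_end = dW1.sum(axis=1)                    # W1(1)
    W2 = np.cumsum(dW2, axis=1)                 # W2 at r = dt, 2dt, ..., 1
    W2_left = W2 - dW2                          # W2 at the left end of each step (Ito sum)
    int_W2_dW1 = np.sum(W2_left * dW1, axis=1)  # approximates int W2 dW1
    int_W2 = W2.mean(axis=1)                    # approximates int W2 dr
    int_W2_sq = (W2 ** 2).mean(axis=1)          # approximates int W2^2 dr
    num = int_W2_dW1 - W1_end * int_W2          # numerator functional, (A11)
    den = int_W2_sq - int_W2 ** 2               # denominator functional, (A16)
    return num, den, int_W2_dW1, int_W2_sq


num, den, num0, den0 = brownian_functionals()
print(num.mean(), (num ** 2).mean(), den.mean())  # approximately 0, 1/6, 1/6
print((num ** 2).mean() / den.mean() ** 2)        # approximately 6 (intercepts estimated)
print((num0 ** 2).mean() / den0.mean() ** 2)      # approximately 2 (no intercepts)
```

The ratio $E[\text{num}^2]/(E[\text{den}])^2$ is the variance of the limit $\big(N^{-1/2}\sum_i \text{num}_i\big)/\big(N^{-1}\sum_i \text{den}_i\big)$ implied by (A14) and (A18).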
APPENDIX B
Table I. Small Sample Performance of Group Mean Panel FMOLS with Heterogeneous Dynamics, Case 1: $\psi_{21i} \sim U(0.0, 0.8)$

N    T     bias     std error   5% size   10% size
10   10    -0.058   0.115       0.282     0.362
10   20    -0.018   0.047       0.084     0.145
10   30    -0.009   0.029       0.061     0.110
10   40    -0.006   0.020       0.035     0.076
10   50    -0.004   0.016       0.027     0.062
10   60    -0.003   0.012       0.020     0.049
10   70    -0.002   0.010       0.016     0.044
10   80    -0.002   0.009       0.014     0.040
10   90    -0.002   0.008       0.014     0.038
10   100   -0.001   0.007       0.014     0.037
20   10    -0.034   0.079       0.291     0.378
20   20    -0.012   0.033       0.100     0.166
20   30    -0.006   0.020       0.076     0.132
20   40    -0.004   0.014       0.045     0.093
20   50    -0.003   0.011       0.039     0.081
20   60    -0.003   0.009       0.028     0.066
20   70    -0.002   0.007       0.026     0.059
20   80    -0.002   0.006       0.021     0.055
20   90    -0.002   0.006       0.020     0.050
20   100   -0.001   0.005       0.018     0.052
30   10    -0.049   0.061       0.386     0.470
30   20    -0.017   0.025       0.156     0.234
30   30    -0.009   0.015       0.107     0.177
30   40    -0.006   0.011       0.072     0.133
30   50    -0.004   0.008       0.059     0.118
30   60    -0.003   0.007       0.047     0.096
30   70    -0.003   0.006       0.039     0.086
30   80    -0.002   0.005       0.035     0.073
30   90    -0.002   0.004       0.032     0.077
30   100   -0.002   0.004       0.030     0.076
Notes: Based on 10,000 independent draws of the cointegrated system (1)-(3), with $\beta = 2.0$, 1i ~ U(2.0, 4.0), $\Omega_{11i} = \Omega_{22i} = 1.0$, $\Omega_{21i} \sim U(-0.85, 0.85)$ and $\psi_{11i} \sim U(0.1, 0.7)$, $\psi_{12i} \sim U(0.0, 0.8)$, $\psi_{21i} \sim U(0.0, 0.8)$, $\psi_{22i} \sim U(0.2, 1.0)$.
Table II. Small Sample Performance of Group Mean Panel FMOLS with Heterogeneous Dynamics, Case 2: $\psi_{21i} \sim U(-0.8, 0.0)$
Notes: Based on 10,000 independent draws of the cointegrated system (1)(3), with = 2.0, 1i ~ U(2.0, 4.0), 11i = 22i = 1.0, 21i ~ U(0.85, 0.85) and 11i ~ U(0.1, 0.7), 12i ~ U(0.8, 0.0), 21i ~ U(0.8, 0.0), 22i ~ U(0.2, 1.0).
Table III. Small Sample Performance of Group Mean Panel FMOLS with Heterogeneous Dynamics. Case 3: 21i ~ U(-0.4, 0.4)
Notes: Based on 10,000 independent draws of the cointegrated system (1)-(3), with = 2.0, 1i ~ U(2.0, 4.0), 11i = 22i = 1.0, 21i ~ U(-0.85, 0.85) and 11i ~ U(0.1, 0.7), 12i ~ U(-0.4, 0.4), 21i ~ U(-0.4, 0.4), 22i ~ U(0.2, 1.0).
TESTING FOR COMMON CYCLICAL FEATURES IN NONSTATIONARY PANEL DATA MODELS
TESTING FOR COMMON CYCLICAL FEATURES IN NONSTATIONARY PANEL DATA MODELS

Alain Hecq, Franz C. Palm and Jean-Pierre Urbain
ABSTRACT
In this chapter we extend the concept of serial correlation common features to panel data models. The analysis is motivated both by the need for a methodology to study and test systematically for common structures and comovements in autocorrelated panel data, and by the efficiency gains that pooling can bring. We propose sequential testing procedures and study their properties in a small-scale Monte Carlo analysis. Finally, we apply the framework to the well-known permanent income hypothesis for 22 OECD countries, 1950-1992.
I. INTRODUCTION
In economics it is often of interest to test whether a set of time series moves together, that is, whether the series are driven by some common factors. The vast literature on cointegration has focused on long-run comovements of nonstationary time series. More recently, some authors have analyzed short-run comovements between stationary time series or between first-differenced cointegrated I(1) series (see Tiao & Tsay, 1989; Engle & Kozicki, 1993; Gouriéroux & Peaucelle, 1993; Vahid & Engle, 1993; Vahid & Engle, 1997; Ahn, 1997). Among these approaches, the concept of serial correlation common features (SCCF hereafter) introduced by Engle & Kozicki (1993) has proved useful. It means that stationary time series move together in the sense that there exist linear combinations of these variables that yield white noise processes. These common feature vectors are measures for analyzing short-run relationships between economic variables suggested by economic theory, such as relative purchasing power parity (Gouriéroux & Peaucelle, 1993), the permanent income hypothesis (Campbell & Mankiw, 1990; Jobert, 1995), cross-country real interest rate differentials (Kugler & Neusser, 1993), real business cycle models (Issler & Vahid, 1996), convergence of economies (Beine & Hecq, 1997, 1998) and Okun's Law (Candelon & Hecq, 2000). Serial correlation common features imply the existence of a reduced number of common dynamic factors explaining short-run comovements in economic variables. A companion form of the common features models is the common factor representation, which has been used in macroeconomics for some decades (see e.g. Engle & Watson, 1981; Geweke, 1977; Lumsdaine & Prasad, 1997; Singleton, 1980). Beyond economic considerations, the reduced-rank restrictions implied by common features are likely to reduce the number of parameters to be estimated. In general, imposing common cyclical feature restrictions when they are appropriate will increase estimation efficiency (Lütkepohl, 1991) and forecast accuracy (Vahid & Issler, 1999). Also, as for unit root and cointegration tests, the power of common cyclical feature procedures may be low in small samples (Beine & Hecq, 1999). The power of these tests might be increased by relying on panel data instead of time series data only. Consequently, in this paper we propose to extend these models by testing for serial correlation common features in a panel data framework.
In order to avoid confusion, it is worth noting that standard panel data models with common parameter structures already imply a common feature structure, namely the one that allows the behavior of N individuals to be pooled. The poolability assumption often made in panel data analysis may, however, be far too strong. An investigator may want to test which poolability restrictions are supported by the data and which have to be rejected. We propose to generalize the SCCF approach and apply it to search for common cyclical features in panel data. In particular, we investigate whether there exist linear combinations of the variables for individual or entity i which are white noise for all i, in other words, whose weights are identical across all entities. Developing a methodology to analyze and test for common cyclical features in panel data is of theoretical and practical importance, since common cyclical feature restrictions are less restrictive than the assumption of identical parameters across individuals usually made in panel data modeling. Some purists might not call this type of analysis panel data analysis. Indeed, in the situations we are interested in, N will be relatively small compared to its value in usual panel data, and T is assumed large (with T → ∞ asymptotics). Many macroeconomic studies deal with 15 to 50 annual observations for 20 to 100 countries, regions, industries or big firms. In those cases, the border between pure panel analysis (N → ∞) and pure time series analysis (T → ∞) is fuzzy. Far from impoverishing panel data analysis, taking medium or long time series into account raises interesting new issues, such as testing for unit roots or cointegration in panel data (see inter alia Levin & Lin, 1993; Pesaran & Smith, 1995; Evans & Karas, 1996a; Kao, 1999; Pedroni, 1997a; Phillips & Moon, 1999b, and Phillips & Moon, 1999a, for the asymptotic theory, and the recent issue of the Oxford Bulletin of Economics and Statistics, 1999). The chapter is organized as follows. Section II provides an example of common features between consumption and income that is implied by economic theory and likely to be common to data for different countries. In Section III we review the concept of serial correlation common features. Section IV extends it to panel data. As we study differences and similarities in macroeconomic series for different countries, we concentrate our analysis on the fixed effects model (see Hsiao, 1986). Section V describes estimation procedures. In Section VI simulation results are reported. In Section VII we present an empirical analysis of the liquidity-constrained consumption model for 22 OECD countries and the G7. Section VIII concludes.
II. AN EXAMPLE OF COMMON FEATURES

To further motivate this chapter, consider the permanent income hypothesis (PIH hereafter) and the heterogeneous consumer model proposed by Campbell & Mankiw (1990, 1991). These authors consider two groups of agents who receive disposable incomes $y_{1t}$ and $y_{2t}$ in fixed proportions λ and (1 − λ) of total income, respectively, such that $y_{1t} = \lambda y_t$, $y_{2t} = (1-\lambda)y_t$ and $y_t = y_{1t} + y_{2t}$. Agents in the first group are subject to liquidity constraints and therefore consume their current income, while agents in the second group consume their permanent income. We obtain the following system:
$$c_{1t} = y_{1t} = \lambda y_t, \qquad c_{2t} = y^P_{2t} = (1-\lambda)\,y^P_t,$$
$$y_{1t} = y^P_{1t} + y^T_{1t}, \qquad y_{2t} = y^P_{2t} + y^T_{2t},$$

where $c_{it}$ is the consumption of agent i, and $y^P_{it}$ and $y^T_{it}$ are the permanent and transitory components of the income of agent i, which are assumed to be I(1) and I(0), respectively. Aggregating over agents we get $c_t = y^P_{1t} + y^T_{1t} + y^P_{2t} = y^P_t + y^T_{1t}$, and thus:

$$c_t = y^P_t + \lambda y^T_t \qquad (1)$$
$$y_t = y^P_t + y^T_t, \qquad (2)$$
which shows that aggregate consumption and income share a common trend $y^P_t$. Because a fraction λ of income accrues to individuals who consume their current income rather than their permanent income, this model has been labelled the λ model by Campbell & Mankiw (1990, 1991). It is also easily seen that for λ = 0 we obtain the permanent income model. To stress the common cycle component, let us take the first difference of aggregate consumption $c_t = c_{1t} + c_{2t}$. Substituting the income shares into total income, we obtain $c_t = \lambda y_t + (1-\lambda)y^P_t$, which in first differences can be written as

$$\Delta c_t = \lambda \Delta y_t + (1-\lambda)\Delta y^P_t. \qquad (3)$$

Consequently, assuming that permanent income is a martingale, the consumption function can be tested by the regression $\Delta c_t = \lambda \Delta y_t + (1-\lambda)\varepsilon_t$. However, $\varepsilon_t$ is a martingale difference which is, by construction, not orthogonal to $\Delta y_t$. Therefore this equation cannot be consistently estimated by OLS, but instrumental variables (IV) estimators are appropriate. With a few exceptions, such as Vahid & Engle (1993) and Jobert (1995), most empirical studies do not use the cointegrating vector as a valid instrument when testing equation (3) by IV, and may therefore suffer from an omitted variable problem. Vahid & Engle (1993) made the connection with the common feature hypothesis that $\Delta c_t - \lambda\Delta y_t$ is white noise,¹ with $[1\ \ -\lambda]$ the associated normalized common feature vector. Empirical studies have shown that λ is usually significantly different from zero, with estimates in the range 0.3 to 0.5 for most countries. Therefore, in order to test for the existence of one short-run relationship common to a set of countries and to improve the power of common feature tests, a pooled common features test in panel seems appropriate. Using the cross-section dimension in the estimation could also give rise to substantial efficiency gains.
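To make the link between the λ model and the common feature vector explicit, the worked step below combines (2) and (3) under the text's assumption that permanent income is a martingale, $\Delta y^P_t = \varepsilon_t$, and under an additional simplifying assumption of ours: that $\varepsilon_t$ is uncorrelated with the transitory component $\Delta y^T_t$.

```latex
\begin{aligned}
\Delta c_t - \lambda\,\Delta y_t
  &= \big[\lambda\,\Delta y_t + (1-\lambda)\,\varepsilon_t\big] - \lambda\,\Delta y_t
   = (1-\lambda)\,\varepsilon_t, \\
\operatorname{Cov}(\Delta y_t,\ \varepsilon_t)
  &= \operatorname{Cov}(\varepsilon_t + \Delta y^T_t,\ \varepsilon_t)
   = \operatorname{Var}(\varepsilon_t) \neq 0 .
\end{aligned}
```

The first line shows why $[1\ \ -\lambda]$ removes the serially correlated transitory dynamics. The second shows why OLS of $\Delta c_t$ on $\Delta y_t$ is inconsistent, so that predetermined instruments (lags and the lagged cointegrating relation) are needed.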
III. COMMON FEATURES IN TIME SERIES

In the context of time series analysis, serial correlation common features means that there exist linear combinations of (stationary) economic time series which are white noise processes. Consider a cointegrated VAR of order p = 2 with a reduced-rank autoregressive coefficient matrix, written in its VECM form, for consumption and income, t = 1, ..., T:
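$$\begin{bmatrix}\Delta c_t\\ \Delta y_t\end{bmatrix} = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix} + \begin{bmatrix}\delta_1\\ \delta_2\end{bmatrix}\begin{bmatrix}\gamma_{21} & \gamma_{22}\end{bmatrix}\begin{bmatrix}\Delta c_{t-1}\\ \Delta y_{t-1}\end{bmatrix} + \begin{bmatrix}\alpha_1\\ \alpha_2\end{bmatrix}\begin{bmatrix}\beta_1 & \beta_2\end{bmatrix}\begin{bmatrix}c_{t-1}\\ y_{t-1}\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{bmatrix} \qquad (4)$$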
where $\mu_1$ and $\mu_2$ are constant drift terms and $[\varepsilon_{1t}, \varepsilon_{2t}]'$ is a bivariate white noise process with non-singular covariance matrix Σ. If consumption is chosen as the normalized variable, $-\beta_2/\beta_1$ is the long-run income elasticity. A distinction can be made at this stage between a strong and a weak form reduced rank structure, as put forward by Hecq, Palm & Urbain (1997, 2000). The Strong Form Reduced Rank Structure (SF) is the original formulation proposed by Engle & Kozicki (1993), in which the long-run and short-run matrices share the same left null space. It corresponds to δ = α in system (4). In this case, there exists a common feature vector $\tilde\beta = [1\ \ -\lambda]'$ such that premultiplying expression (4) by $\tilde\beta'$ yields a white noise. In the less restrictive model, labelled the Weak Form Reduced Rank Structure (WF), δ ≠ α, and a linear combination of first differences in deviation from the long-run equilibrium is a white noise:

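$$[1\ \ -\lambda]\left(\begin{bmatrix}\Delta c_t\\ \Delta y_t\end{bmatrix} - \begin{bmatrix}\alpha_1\\ \alpha_2\end{bmatrix}[\beta_1\ \ \beta_2]\begin{bmatrix}c_{t-1}\\ y_{t-1}\end{bmatrix}\right) = [1\ \ -\lambda]\begin{bmatrix}\mu_1 + \varepsilon_{1t}\\ \mu_2 + \varepsilon_{2t}\end{bmatrix} \qquad (5)$$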
Formal definitions of the strong and the weak form are given in Hecq, Palm & Urbain (1997, 2000), where their consequences for common cycles and the associated inference issues are also analyzed. Hecq et al. (1997) also consider a mixed form combining the strong and the weak form. Common features relationships give information on short-run comovements. These relationships may come from economic theory (relative purchasing power parity, PIH) or from stylized facts (convergence, Real Business Cycle (RBC) models), and give the dynamic common factor within the system, e.g. $\gamma_{21}\Delta c_{t-1} + \gamma_{22}\Delta y_{t-1}$ in the WF case. The orthogonal complement of $\tilde\beta$, labelled $\tilde\beta_\perp$ (with $\tilde\beta'\tilde\beta_\perp = 0$), gives the factor loadings of the common dynamics in the equations; in system (4), $\tilde\beta_\perp = [\lambda\ \ 1]'$. Note that these common dynamic factors should not be confused with common cycles.
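The strong form can be checked numerically: if both loading vectors in (4) are proportional to $\tilde\beta_\perp = [\lambda\ \ 1]'$, premultiplying by $[1\ \ -\lambda]$ removes the common dynamic factor and the error-correction term, leaving white noise plus a constant. The sketch below simulates such a system; every numerical value (λ = 0.4, the loadings, γ, the drifts, a unit cointegrating vector) is an illustrative assumption, not an estimate from the chapter.

```python
import numpy as np


def lag1_autocorr(x):
    """First-order sample autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(x[1:] @ x[:-1] / (x @ x))


def simulate_sf_vecm(T=20_000, lam=0.4, seed=0):
    """Simulate a bivariate VECM shaped like (4) under the strong form (delta = alpha).

    All parameter values are hypothetical. Returns a (T, 2) array of (dc_t, dy_t).
    """
    rng = np.random.default_rng(seed)
    loading = np.array([lam, 1.0])      # spans the orthogonal complement of [1, -lam]
    alpha = 0.3 * loading               # long-run adjustment coefficients
    delta = alpha                       # strong form: short-run loadings equal alpha
    gamma = np.array([0.5, 0.5])        # [gamma_21, gamma_22] of the common dynamic factor
    beta = np.array([1.0, -1.0])        # cointegrating vector: c - y is stationary
    mu = np.array([0.01, 0.01])         # drift terms
    X = np.zeros((T + 1, 2))            # levels (c_t, y_t)
    dX = np.zeros((T + 1, 2))           # first differences
    eps = rng.standard_normal((T + 1, 2))
    for t in range(1, T + 1):
        dX[t] = mu + delta * (gamma @ dX[t - 1]) + alpha * (beta @ X[t - 1]) + eps[t]
        X[t] = X[t - 1] + dX[t]
    return dX[1:]


dX = simulate_sf_vecm()
combo = dX[:, 0] - 0.4 * dX[:, 1]       # premultiply by the co-feature vector [1, -lam]
print("lag-1 autocorrelation of dc:         ", lag1_autocorr(dX[:, 0]))
print("lag-1 autocorrelation of dc - lam*dy:", lag1_autocorr(combo))
```

By construction the combination equals $\varepsilon_{1t} - \lambda\varepsilon_{2t}$ plus a constant, so its sample autocorrelation should be near zero, while $\Delta c_t$ itself inherits the common dynamic factor.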
Common cycles are defined in a specific trend-cycle decomposition as the stationary part of the time series left after removing the permanent components. Vahid & Engle (1993) show that the existence of s common feature vectors (of the SCCF or SF type) leads to n − s common cycles in the multivariate Beveridge-Nelson decomposition. Vahid & Engle (1997) extend this definition to nonsynchronous cycles. Hecq, Palm & Urbain (2000) propose a Beveridge-Nelson decomposition for the WF that allows for a reduced number of common cycles. The weak form reduced rank structure will not be explicitly considered in the sequel, as we want to focus on extending the standard serial correlation common feature analysis to panel data. We use the terms common dynamic features, common cyclical features and common dynamic factors as synonyms to denote reduced rank structures in the short-run dynamics of the first-differenced VAR or the VECM. In the simple bivariate model (4), the serial correlation common feature hypothesis may also be written in terms of moment conditions such as:
$$E[(\Delta c_t - \lambda\Delta y_t)\,W_t] = 0, \qquad (6)$$
where E[·] is the expectation operator and $W_t = \{1, \Delta c_{t-1}, \ldots, \Delta c_{t-k}, \Delta y_{t-1}, \ldots, \Delta y_{t-l}, z_{t-1}\}$ is a set of instruments consisting of a constant term, the lags of both variables and the deviation from the long-run relationship, $z_{t-1} \equiv c_{t-1} + (\beta_2/\beta_1)\,y_{t-1}$. Adopting a two-step approach,² there are two obvious ways to test for SCCF. The first is to carry out a canonical correlation analysis between consumption and income on the one hand and the set of instruments on the other. Non-significant squared canonical correlations reveal the existence of linear combinations which yield white noise processes. Alternatively, one can use generalized method of moments (GMM) estimators exploiting the moment condition (6). A test of the overidentifying restrictions implied by (6) is a test of serial correlation common features. Canonical correlation estimation has the advantage that results do not depend on the choice of normalization of the moment conditions. Moreover, it is more convenient when we test for the number of common feature vectors. In this paper we treat the problem in a GMM framework for several reasons. Firstly, there is at most one common feature vector in a bivariate system. Secondly, this framework can be more easily extended to panel data models. Finally, the normalization imposed in IV estimation by giving one variable a coefficient equal to one increases the power of the test compared with tests based on canonical correlations.³
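The canonical correlation route mentioned above can be sketched in a few lines: the squared canonical correlations between the endogenous block and the instruments solve an eigenproblem, and the eigenvector attached to a near-zero root estimates a co-feature vector. The sketch computes only the correlations and that vector (no test statistic), and the data-generating process is a hypothetical stand-in in which exogenous regressors play the role of the lagged variables in $W_t$.

```python
import numpy as np


def canonical_cofeature(Y, W):
    """Squared canonical correlations between Y (T x n) and instruments W (T x k).

    Returns the squared correlations in ascending order and the canonical vector
    for the smallest one, normalized so that its first element equals 1.
    The constant instrument is handled by demeaning both blocks.
    """
    Y = Y - Y.mean(axis=0)
    W = W - W.mean(axis=0)
    T = Y.shape[0]
    Syy, Sww, Syw = Y.T @ Y / T, W.T @ W / T, Y.T @ W / T
    # Eigenproblem: Syy^{-1} Syw Sww^{-1} Swy a = rho^2 a
    M = np.linalg.solve(Syy, Syw @ np.linalg.solve(Sww, Syw.T))
    eigval, eigvec = np.linalg.eig(M)
    order = np.argsort(eigval.real)
    rho2 = eigval.real[order]
    a = eigvec[:, order[0]].real
    return rho2, a / a[0]


# Hypothetical data: both series load on one predictable factor f,
# so [1, -lam] removes everything the instruments can predict.
rng = np.random.default_rng(1)
T, lam = 20_000, 0.4
W = rng.standard_normal((T, 3))                  # stand-in for lagged variables
f = W @ np.array([1.0, -0.5, 0.8])
Y = np.column_stack([lam * f + rng.standard_normal(T), f + rng.standard_normal(T)])
rho2, a = canonical_cofeature(Y, W)
print("squared canonical correlations:", rho2)
print("normalized co-feature vector:", a)
```

A smallest squared canonical correlation close to zero, with its vector close to $[1\ \ -\lambda]$, is the pattern the text describes as revealing a white noise combination.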
IV. EXTENSION TO PANEL DATA MODELS

Analyses comparing, for instance, the PIH with the λ model frequently concentrate on one country, very often the USA. In order to support the generality of the theory, some authors extend their empirical investigation to several countries (Campbell & Mankiw, 1991; Evans & Karas, 1996b). However, it is difficult to claim that results for different countries are uncorrelated. Since it is not possible to estimate a pure time series model, such as a VAR with 2N endogenous variables in the bivariate case, with relatively few observations for a large number of individuals, alternatives must be found. One solution would be to analyze the system under separation in common features (Hecq, Palm & Urbain, 1999), an extension of separation in cointegration (Granger & Haldrup, 1997; Konishi & Granger, 1993). Under separation in common features, the common feature matrix is block-diagonal, with each block corresponding to one individual i only. Treating the issue in the complete system with separation in common features avoids the efficiency loss of an analysis of the marginal model for individual i, since separation does not require block-diagonality of the disturbance covariance matrix. This solution is, however, difficult to implement for more than two or three individuals. We illustrate this point via a small Monte Carlo experiment, whose precise specification is given in Section VI. Consider a DGP made out of bivariate systems similar to (4), with δ = α (SCCF hypothesis), for two and five individuals, respectively. The only cross-sectional relations are due to a non-diagonal disturbance covariance matrix. Complete separation in cointegration and in common features, as well as absence of bidirectional short-run Granger causality, are thus maintained. Using a standard canonical correlation framework (see inter alia Hecq, Palm & Urbain, 1997), we perform a serial correlation common feature analysis in the marginal model for the first individual, ignoring the disturbance cross-correlations.
Alternatively, under separation in common features, we test for the number (s = 2 or s = 5) of common feature vectors, one for each individual, in the complete system. We then constrain the common feature space to be block-diagonal (see Hecq, Palm & Urbain, 1999) and estimate the vector for the first individual. In Table 1, we report for 5,000 replications the median and the spread (interquartile range) of the bias, the χ² test statistics for the overidentifying restrictions implied by the presence of common features, and a small-sample adjusted version of these statistics (χ²ss; Hecq, 1999). Although separation in common features holds at the level of the DGP, some efficiency loss, as measured by the spread, is observed in the marginal model compared to the full system for T ≥ 25 when N = 2 and for T ≥ 50 when N = 5. For smaller sample sizes, however, the dispersion in the full system is too high, and its test statistics reject the presence of two and five common feature vectors, respectively, too often.

Table 1. Monte Carlo Results (Separated vs. Marginal Systems)

N = 2     Marginal model                              Separated system
T         bias 0.5   bias 0.75-0.25   χ²(2)   χ²ss(2)   bias 0.5   bias 0.75-0.25   χ²(8)    χ²ss(8)
10        0.056      0.310            14.64   6.22      0.040      0.441            70.98    12.8
25        0.026      0.155            7.56    5.20      0.027      0.138            18.36    7.14
50        0.011      0.104            6.30    5.04      0.013      0.090            10.16    6.16
100       0.005      0.068            4.86    4.42      0.007      0.059            6.66     5.14

N = 5     Marginal model                              Separated system
T         bias 0.5   bias 0.75-0.25   χ²(2)   χ²ss(2)   bias 0.5   bias 0.75-0.25   χ²(14)   χ²ss(14)
10        0.061      0.299            14.14   5.86      -          -                -        -
25        0.025      0.152            7.82    5.44      0.019      0.241            99.76    35.04
50        0.012      0.100            6.30    5.18      0.011      0.087            62.88    15.26
100       0.006      0.069            5.58    5.04      0.007      0.052            25.18    9.38

These illustrative Monte Carlo results call for an extension to a (possibly nonstationary) panel common feature analysis. Let the subscript i = 1, ..., N indicate the different groups/entities/units, t = 1, ..., T denote the sample period and j = 1, ..., n index the variables for each group/entity. We assume that the n-dimensional vector of observed I(1) variables for entity i, $X_{i,t}$, is generated by a $p_i$-th order cointegrated VAR which can be expressed in error-correction form as follows:

$$\Delta X_{i,t} = \mu_i + \tau_t + \alpha_i\beta_i' X_{i,t-1} + \sum_{j=1}^{p_i-1}\Gamma_{i,j}\,\Delta X_{i,t-j} + \varepsilon_{i,t}, \qquad t = 1, \ldots, T, \quad i = 1, \ldots, N, \qquad (7)$$
where $\mu_i$ denotes fixed individual effects, $\tau_t$ denotes a vector of deterministic time effects, $\alpha_i$ and $\beta_i$ are $n \times r_i$ matrices of full column rank, with $r_i$ being the cointegrating rank ($r_i < n$), and $\varepsilon_{i,t}$ is a disturbance. The vector $\varepsilon_t = (\varepsilon'_{1,t}, \ldots, \varepsilon'_{N,t})'$ is an nN × 1 homoskedastic Gaussian mean innovation process relative to $\mathcal{X}_{t-1} = \{X_{i,t-j},\ i = 1, \ldots, N;\ j < t\}$, with non-singular contemporaneous covariance matrix Σ, the (i, j)-th block of which is $E(\varepsilon_{i,t}\varepsilon'_{j,t}) = \Sigma_{i,j}$. Note that one could allow for random individual effects in expression (7). This would lead to an error-component structure of $\varepsilon_{i,t}$ similar to that used in the panel data literature. For system (7), we define a homogeneous SCCF panel model as follows:

Definition 1. A panel model is called a homogeneous panel common feature model if there exists, for i = 1, ..., N, an $(n \times s_i)$ matrix $\tilde\beta_i$, with $\tilde\beta_i = \tilde\beta_j$ for all i, j = 1, ..., N, whose columns span the individual co-feature space, such that $\tilde\beta_i'\Delta X_{i,t}$ is an $s_i$-dimensional white noise process for each individual.

This definition applies to the case where the individual co-feature matrices $\tilde\beta_i$, and hence their column ranks $s_i$, are the same across all individuals. A typical dynamic panel data model with fixed effects $\mu_i$ and deterministic time effects $\tau_t$ arises as a special case of (7) when the parameters $\alpha_i$, $\beta_i$, $\Gamma_{i,j}$ and $\Sigma_{i,i}$ are the same across entities i (see e.g. Hoogstrate, 1998). In order to clarify the nature of the hypotheses underlying the panel common feature restrictions, in the next subsection, following Groen & Kleibergen (1999) for panel cointegration, we consider a model resulting from sequentially testing and imposing restrictions on a high-dimensional unrestricted VECM.

A. A Panel VECM Representation

We are interested in testing for cointegration and serial correlation common features with respect to the n I(1) time series in the vector $X_{i,t}$ within a dynamic model for N individuals i. Without loss of generality, we consider a large VECM with one lag in the first differences, i.e. a VAR with two lags in levels. The generalization to higher-order dynamics is immediate by substituting $\Gamma_{ij}(L)$ for $\Gamma_{ij}$ in (8), but it makes the notation heavy. For the sake of simplicity, we consider the model without any time dummies. For t = 1, ..., T we may write the nN-dimensional system as:

$$\Delta X_t = \begin{bmatrix}\Pi_{11} & \cdots & \Pi_{1N}\\ \vdots & \ddots & \vdots\\ \Pi_{N1} & \cdots & \Pi_{NN}\end{bmatrix} X_{t-1} + \begin{bmatrix}\Gamma_{11} & \cdots & \Gamma_{1N}\\ \vdots & \ddots & \vdots\\ \Gamma_{N1} & \cdots & \Gamma_{NN}\end{bmatrix}\Delta X_{t-1} + u_t, \qquad (8)$$
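where $\Delta X_t = (\Delta X'_{1,t}, \ldots, \Delta X'_{N,t})'$, $u_t = (u'_{1,t}, \ldots, u'_{N,t})'$ and $X_{t-1} = (X'_{1,t-1}, \ldots, X'_{N,t-1})'$ are vectors of dimension nN × 1, or more concisely

$$\Delta X_t = \Pi_{ur}X_{t-1} + \Gamma_{ur}\Delta X_{t-1} + u_t, \qquad (9)$$

where $\Pi_{ur}$ and $\Gamma_{ur}$ are nN × nN matrices and $u_t = \mu + \varepsilon_t$, with $\mu = (\mu'_1, \ldots, \mu'_N)'$ and $\varepsilon_t = (\varepsilon'_{1,t}, \ldots, \varepsilon'_{N,t})'$ being nN × 1 vectors and $\varepsilon_t \sim N(0, \Sigma)$.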
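$$\underset{(nN \times nN)}{\Sigma} = \begin{bmatrix}\Sigma_{11} & \cdots & \Sigma_{1N}\\ \vdots & \ddots & \vdots\\ \Sigma_{N1} & \cdots & \Sigma_{NN}\end{bmatrix} \qquad (10)$$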
When $\Pi_{ur} = 0$, system (9) is non-cointegrated. The approach presented here can also be applied to non-cointegrated systems; obviously, in such systems the WF and SF reduced rank structures are identical. Without imposing any zero block restrictions, the large unrestricted model (8) is not estimable in practice. Consequently, restrictions have to be considered. We first describe cointegrating restrictions before introducing serial correlation common feature restrictions.

1. Cointegrating Restrictions in a Panel VAR

We first consider restrictions on the long-run matrix $\Pi_{ur}$ in the unrestricted VECM. Two types (A and B) of sequences of hypotheses naturally arise in panel data. The hypotheses involved in a sequence can be tested either sequentially or jointly.

A1: Absence of long-run Granger causality (see Granger & Lin, 1995) between the individual subgroups, i.e. $\Pi_{ur}$ is block-diagonal with blocks $\Pi_{ij} = 0$ for i ≠ j.
A2: Cointegration in the absence of long-run Granger causality, i.e. $\Pi_{ii} = \alpha_i\beta_i'$, with $\alpha_i$ and $\beta_i$ being $n \times r_i$ matrices of rank $r_i$, i = 1, ..., N.
A3: Homogeneous panel cointegration, i.e. $\beta_i = \beta_1$, i = 1, ..., N; $r = Nr_1$.
B1: Cointegration, i.e. $\Pi_{ur} = \alpha\beta'$, with α and β being nN × r matrices of rank r.
B2: Complete separation in cointegration (see Granger & Haldrup, 1997), i.e. α and β are block-diagonal with typical blocks $\alpha_i$ and $\beta_i$, respectively, of rank $r_i$, such that a typical block of $\Pi_{ur}$ is $\alpha_i\beta_i'$ as defined in A2, and $r = \sum_{i=1}^{N} r_i$.
B3: Homogeneous panel cointegration, i.e. $\beta_i = \beta_1$, i = 1, ..., N; $r = Nr_1$.

When the first two sets of restrictions in either sequence hold, the following restricted structure arises:

$$\Delta X_t = \begin{bmatrix}\alpha_1\beta_1' & & 0\\ & \ddots & \\ 0 & & \alpha_N\beta_N'\end{bmatrix} X_{t-1} + \begin{bmatrix}\Gamma_{11} & \cdots & \Gamma_{1N}\\ \vdots & \ddots & \vdots\\ \Gamma_{N1} & \cdots & \Gamma_{NN}\end{bmatrix}\Delta X_{t-1} + u_t. \qquad (11)$$
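The separated structure in (11) is easy to build and inspect numerically. The sketch below, using hypothetical dimensions (N = 3 units, n = 2 variables, ranks $r_i$ = 1, 1, 0) and random loadings, forms $\Pi_{ur} = \mathrm{diag}(\alpha_i\beta_i')$ as in A1-A2 and checks that it coincides with the B-route $\alpha\beta'$ with block-diagonal α and β, with total rank $r = \sum_i r_i$. The helper `block_diag` is written out to keep the example NumPy-only.

```python
import numpy as np


def block_diag(*blocks):
    """Place 2-D arrays along the diagonal of a zero matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out


def separated_long_run_matrix(alphas, betas):
    """Pi_ur = diag(alpha_i beta_i'), i.e. hypotheses A1 and A2 jointly."""
    return block_diag(*(a @ b.T for a, b in zip(alphas, betas)))


# Hypothetical example: N = 3 units, n = 2 variables, cointegrating ranks 1, 1, 0.
rng = np.random.default_rng(0)
n, ranks = 2, [1, 1, 0]
alphas = [rng.standard_normal((n, r)) for r in ranks]
betas = [rng.standard_normal((n, r)) for r in ranks]
Pi = separated_long_run_matrix(alphas, betas)

# B-route: block-diagonal alpha and beta (nN x r) give the same matrix.
Pi_B = block_diag(*alphas) @ block_diag(*betas).T
print(Pi.shape, np.linalg.matrix_rank(Pi), np.allclose(Pi, Pi_B))
```

A unit with $r_i = 0$ simply contributes a zero block, which is how non-cointegrated individuals enter the separated system.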
When it is appropriate to add a restricted trend in the cointegration space, we replace $X_{t-1}$ by $X^*_{t-1} = (X'_{t-1}, t)'$. For N fixed, a likelihood ratio statistic for testing (11) versus (8) can be obtained as the sum of two conditional likelihood ratio statistics testing the sets of restrictions {A1, A2} or {B1, B2}. Next, homogeneity of panel cointegration can be tested using a likelihood ratio test. A decomposition similar to {A1, A2} is proposed by Groen & Kleibergen (1999). The main problem with this approach is that under A1, i.e. absence of long-run Granger causality, the usual tests have an unknown asymptotic distribution, as the possible presence of cointegration interferes with the block-diagonality of $\Pi_{ur}$. On the other hand, once the cointegrating rank in the unrestricted VECM has been fixed, a test statistic with separation as the null hypothesis has an asymptotic χ² distribution. It is worth mentioning that although model (11) looks rather specific, it is less restrictive than the models used in the dynamic panel literature, where quite frequently, in addition to separation in cointegration, the same parameter structure is assumed to hold across individuals (see inter alia the overview in Phillips & Moon, 1999b). Occasionally, complete separation is relaxed to requiring β to be block-diagonal, leaving α unrestricted (Larsson & Lyhagen, 1999).

2. Common Feature Restrictions

Imposing serial correlation common feature and short-run Granger non-causality restrictions, system (11) becomes:

$$\Delta X_t = \begin{bmatrix}\alpha_1\beta_1' & & 0\\ & \ddots & \\ 0 & & \alpha_N\beta_N'\end{bmatrix} X_{t-1} + \begin{bmatrix}\Gamma^*_{11} & & 0\\ & \ddots & \\ 0 & & \Gamma^*_{NN}\end{bmatrix}\Delta X_{t-1} + u_t, \qquad (12)$$

where the starred short-run blocks satisfy the common feature restrictions of the corresponding individual. As for the cointegrating restrictions, this model may be obtained by imposing two of the following three hypotheses on (11):

C1: Serial correlation common features: there exists an (nN × s) matrix $\tilde\beta$ such that $\tilde\beta'\Delta X_t$ is an s-dimensional white noise, with $s = \sum_{i=1}^{N} s_i$.
C2: Absence of short-run Granger causality between the individual subgroups: $\Gamma_{ur}$ is block-diagonal, i.e. $\Gamma_{ij} = 0$ for i ≠ j.
C3: Separation in common features: the matrix $\tilde\beta$ is block-diagonal, with the $(n \times s_i)$ matrix $\tilde\beta_i$ being a typical block on the main diagonal, $s = \sum_{i=1}^{N} s_i$.
C4: Homogeneity of common features: $\tilde\beta_i = \tilde\beta_1$, i = 1, ..., N; $s = Ns_1$.

Actually, hypothesis C2 is implicit when one stacks VECMs.
Restriction C3 is developed in Hecq, Palm & Urbain (1999) for the SCCF as well as for the weak form structure. Here again, a likelihood ratio statistic for testing model (12) versus (11) can be obtained as the sum of two conditional likelihood ratio statistics testing either {C1, C2} or {C2, C3}. This means that we can first test for common cyclical features under the maintained hypothesis of short-run Granger non-causality C2. Alternatively, we can first test for absence of short-run causality and then test for SCCF, since both sequences of restrictions imply separation in common features. This result is derived from Proposition 3.3 in Hecq, Palm & Urbain (1999), which states that under separation in cointegration and block-diagonality of the long-run matrix, the presence of common features implies that the co-feature matrix is block-diagonal.
V. GMM ESTIMATION
To test for common features in a time series context, we have the choice between GMM estimators applied in a regression framework and a canonical correlation procedure based on maximum likelihood (ML) estimation. Both methods have their advantages and drawbacks. ML estimation is fully efficient and likelihood ratio tests are asymptotically most powerful. GMM estimators are easier to implement, but they are in general not fully efficient. In this section we present a GMM estimator that will be used in our empirical analysis of a bivariate system for consumption and income, for the case where at most one serial correlation common feature vector exists. For each individual, let us split $\Delta X_{i,t} = (\Delta y_{i,t}, \Delta z_{i,t})'$ and let the bivariate DGP be

$$\Delta y_{i,t} = \mu_i + \lambda^*_i\,\Delta z_{i,t} + \eta_{i,t}, \qquad (13)$$

$$\Delta z_{i,t} = \alpha_i\,(y_i - \beta^*_i z_i)_{t-1} + \sum_{k=1}^{p_i-1}\gamma^{(i)}_{1,k}\,\Delta y_{i,t-k} + \sum_{k=1}^{p_i-1}\gamma^{(i)}_{2,k}\,\Delta z_{i,t-k} + \varepsilon_{i,t}, \qquad (14)$$

where the second equation, for $\Delta z_{i,t}$, is just one row of the VECM (11), with normalized cointegrating vector $\beta_i = [1, -\beta^*_i]'$. Both $\Delta y$ and $\Delta z$ are autocorrelated, as the disturbances $\eta_{i,t}$ depend on lagged values of $\Delta y_{i,t}$ and $\Delta z_{i,t}$ and on the error correction mechanism. Under the null of serial correlation common features for individual i, $\eta_{i,t}$ is a white noise process and the normalized SCCF vector is given by $\tilde\beta_i = [1, -\lambda^*_i]'$. In practice (Vahid & Engle, 1993, 1997), after the cointegration analysis in the first step, the GMM procedure is as follows. First, regress the explanatory variables $\Delta z_t$ on the whole set of instruments (i.e. lags of $\Delta X_t$ and the cointegrating relations) to obtain the best linear prediction $\Delta\hat z_t$. Then regress $\Delta y_t$ on a constant term and $\Delta\hat z_t$; this gives an estimate of the potential serial correlation common feature coefficient $\lambda^*_i$. Finally, test the validity of the overidentifying restrictions implied by the common feature vector using Hansen's (1982) χ² test.
A. Heterogeneous Independent Case

When the observations on individuals are assumed cross-sectionally independent, a joint test for the existence of one individual-specific (heterogeneous) common feature vector can be obtained by computing the χ² statistic for the SCCF restrictions for each individual, $\xi_i \sim \chi^2(\nu_i)$, with the same number of variables for each i but possibly different dynamics, and with or without cointegrating vectors. The number of degrees of freedom is then $\nu_i = n(p_i - 1) + r_i - (n - 1)$, since $s_i$ equals one. Using the standard central limit theorem for large N, we then have

$$\frac{\sum_{i=1}^{N}\xi_i - \nu}{(2\nu)^{1/2}} \overset{a}{\sim} N(0, 1), \qquad \text{where } \nu = \sum_{k=1}^{N}\nu_k. \qquad (15)$$
This procedure is, however, not appropriate in the presence of cross-correlation, a phenomenon pointed out inter alia by O'Connell (1998) for panel unit root tests. The size distortions increase with N and with the cross-correlation. While these distortions are DGP-dependent, we observe empirical sizes of about 20% (nominal size 5%) for T = 25 and N = 10, as well as for T = 50 and N = 25, in a Monte Carlo experiment similar to the one presented in Section VI.⁴

B. Homogeneous and Heterogeneous Dependent Case

In most cases disturbances across individuals i will be at least contemporaneously correlated, i.e. $\Sigma_{ij} \neq 0$ for some i ≠ j, and/or $\Sigma_{ii}$ is non-diagonal for some i. For instance, when testing for PPP in panel data, contemporaneous disturbance correlation arises because one country must serve as a benchmark. Also, for a given country, consumption and income cannot be assumed independent. One way to deal with this cross-country correlation is to include a common time dummy in the panel. This solution was pursued by Pedroni (1997b) in the context of panel cointegration tests, but time dummies appear not to capture all the correlation; see O'Connell (1998). Another solution, which we use here, is to account for cross-correlation by GLS- or SUR-type corrections. These corrections require T > N, and the asymptotics we consider are mainly based on T → ∞ while N is fixed or at least grows at a lower rate than T. Assuming that all the variables in levels are I(1), we first test, for each individual i, for the existence of a cointegrating relationship using standard time series-based procedures. If the null hypothesis of no cointegration is rejected, the cointegrating vector(s) are treated as known in the subsequent analysis. An alternative to the time series-based cointegration analysis is to rely on a test procedure designed for cointegrated panel models, which could possibly be more powerful. However, the asymptotic arguments used in panel cointegration analysis are mainly based on large-N asymptotics and independence across units, while we are dealing here with fixed-N cases that allow for dependence across units. Existing Monte Carlo simulations furthermore reveal some problems when cross-correlation exists (see inter alia McCoskey & Kao, 1998b; Pedroni, 1997b). Moreover, the properties of common feature test statistics will be affected by the outcome of the cointegration analysis. Indeed, if one erroneously imposes an identical (homogeneous) cointegrating matrix $\beta^*_i$ for all i, while for some j cointegration does not hold or holds with a cointegrating matrix different from $\beta^*_i$, the probability of rejecting the SCCF restrictions will tend to increase. Before presenting the GMM estimator, we present the model under common features in general terms. Under separation C3, model (11) can be written as

$$\underbrace{\begin{bmatrix}\tilde\beta_1' & 0 & \cdots & 0\\ 0 & \tilde\beta_2' & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \tilde\beta_N'\end{bmatrix}}_{s \times nN}\underbrace{\begin{bmatrix}\Delta X_{1t}\\ \Delta X_{2t}\\ \vdots\\ \Delta X_{Nt}\end{bmatrix}}_{nN \times 1} = \underbrace{\begin{bmatrix}\tilde\beta_1' & 0 & \cdots & 0\\ 0 & \tilde\beta_2' & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \tilde\beta_N'\end{bmatrix}}_{s \times nN}\underbrace{\begin{bmatrix}u_{1t}\\ u_{2t}\\ \vdots\\ u_{Nt}\end{bmatrix}}_{nN \times 1} \qquad (16)$$

with $s = \sum_{i=1}^{N} s_i$ and $u_t = (u'_{1t}, u'_{2t}, \ldots, u'_{Nt})'$ being IIN(0, Σ).
Under the homogeneity assumption C4, model (16) specializes to

$$(I_N \otimes \tilde\beta_1')\,\Delta X_t = (I_N \otimes \tilde\beta_1')\,u_t. \qquad (17)$$
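Under C4 the block-diagonal co-feature matrix of (16) collapses to a Kronecker product, as in (17). The sketch below builds $I_N \otimes \tilde\beta_1$ for a hypothetical bivariate case with $\tilde\beta_1 = [1, -\lambda]'$ and checks that it annihilates a block-diagonal short-run matrix whose blocks load on $[\lambda\ \ 1]'$. The random block coefficients are illustrative assumptions.

```python
import numpy as np


def homogeneous_cofeature(beta1, N):
    """nN x (N s1) co-feature matrix I_N kron beta1 under C4; its transpose appears in (17)."""
    return np.kron(np.eye(N), beta1)


lam, N = 0.4, 3
beta1 = np.array([[1.0], [-lam]])                 # n = 2, s1 = 1, normalized as [1, -lam]'
B = homogeneous_cofeature(beta1, N)               # shape (2N, N), block-diagonal

# Hypothetical block-diagonal short-run matrix: each block loads on [lam, 1]',
# so beta1' Gamma_ii = 0 holds block by block.
rng = np.random.default_rng(0)
blocks = [np.array([[lam], [1.0]]) @ rng.standard_normal((1, 2)) for _ in range(N)]
Gamma = np.zeros((2 * N, 2 * N))
for i, G in enumerate(blocks):
    Gamma[2 * i:2 * i + 2, 2 * i:2 * i + 2] = G
print(B.shape, np.abs(B.T @ Gamma).max())         # maximum is zero up to rounding
```

Writing the homogeneous co-feature matrix as a Kronecker product makes explicit that only the $s_1(n - s_1)$ free coefficients of $\tilde\beta_1$ need to be estimated, which is the restriction exploited in (22).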
As in (13) and (14), we partition the vector $\Delta X_{it}$ as $[\Delta y'_{it}, \Delta z'_{it}]'$, where $\Delta y_{it}$ and $\Delta z_{it}$ are $s_i \times 1$ and $(n - s_i) \times 1$ subvectors. The matrix $\tilde\beta_i$ is normalized (without loss of generality) as $\tilde\beta_i' = [I_{s_i}, -\lambda^*_i]$. Under this normalization, system (16) can be expressed as
145
y1t y2t yNt

s 1
* 0 0 1 0 * 0 2 = 0 0 * 0 0 N
s (nN s)
z1t z2t zNt

(nN s) 1
ut +
(18)
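or, more compactly,

$$\Delta y_t=B\,\Delta z_t+v_t \tag{19}$$

with $\Delta y_t=[\Delta y_{1t}',\dots,\Delta y_{Nt}']'$, $B=\mathrm{diag}(\tilde\beta_i^{*\prime})$, $\Delta z_t=[\Delta z_{1t}',\dots,\Delta z_{Nt}']'$, $v_t=\tilde\beta' u_t$ and $\tilde\beta=\mathrm{diag}(\tilde\beta_i)$. Transposing (19) and writing the model for a sample of T observations, we get

$$\underset{T\times s}{Y}=\underset{T\times(nN-s)}{Z}\;\underset{(nN-s)\times s}{B'}+\underset{T\times s}{V} \tag{20}$$

or, in vectorized form,

$$\underset{Ts\times 1}{y^*}=\underset{Ts\times(ns-\sum_i s_i^2)}{Z^*}\;\underset{(ns-\sum_i s_i^2)\times 1}{\tilde\beta^*}+\underset{Ts\times 1}{v^*} \tag{21}$$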
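with $y^*=\mathrm{vec}(Y)$, $v^*=\mathrm{vec}(V)$ and $Z^*=\mathrm{diag}(I_{s_i}\otimes Z_i)$, where $Z_i=[\Delta z_{itl}]$ is of dimension $T\times(n-s_i)$, with $t=1,\dots,T$ and $l=1,\dots,n-s_i$; $\tilde\beta^*$ is a vector whose typical i-th subvector equals $\mathrm{vec}(\tilde\beta_i^*)$. Under the homogeneity assumption C4, $\tilde\beta_i^*=\tilde\beta_1^*$, $i=1,\dots,N$, and $s=Ns_1$, so that the system (21) specializes to $y^*=Z_r^*\tilde\beta_r^*+v^*$ with the $[TNs_1\times s_1(n-s_1)]$ matrix

$$Z_r^*=\begin{pmatrix}I_{s_1}\otimes Z_1\\ I_{s_1}\otimes Z_2\\ \vdots\\ I_{s_1}\otimes Z_N\end{pmatrix} \tag{22}$$

and the $[s_1(n-s_1)\times 1]$ vector $\tilde\beta_r^*=\mathrm{vec}(\tilde\beta_1^*)$. The parameter vectors $\tilde\beta^*$ and $\tilde\beta_r^*$ can be estimated by GMM provided we have a $(Ts\times k)$ matrix of instrumental variables W such that $E(W'v^*)=0$ and k is equal to or larger than the number of unknown parameters in $\tilde\beta^*$ (or $\tilde\beta_r^*$). The GMM estimator solving $W'v^*=0$ using the weighting matrix S is given by

$$\hat{\tilde\beta}^*_{GMM}=\left[Z^{*\prime}WS^{-1}W'Z^*\right]^{-1}Z^{*\prime}WS^{-1}W'y^*. \tag{23}$$

The optimal weighting matrix is $S=W'\Sigma W$, where $\Sigma=E(v^*v^{*\prime})=\Sigma_v\otimes I_T$ and $\Sigma_v=E(v_tv_t')=\tilde\beta'\Omega\tilde\beta$. When $\Sigma$ is unknown, it has to be replaced by a consistent estimate. The asymptotic covariance matrix of $\hat{\tilde\beta}^*_{GMM}$ with the optimal weighting matrix S is given by

$$\mathrm{Var}(\hat{\tilde\beta}^*_{GMM})=\left[Z^{*\prime}W(W'\Sigma W)^{-1}W'Z^*\right]^{-1}. \tag{24}$$

Under homogeneity C4, $\tilde\beta_r^*$ can be estimated by expression (23), replacing $Z^*$ by $Z_r^*$. When the number of instruments k is strictly larger than the number of parameters in $\tilde\beta^*$ (or $\tilde\beta_r^*$) to be estimated, these overidentifying restrictions can be tested using the well-known minimum distance criterion

$$\min\;(v^{*\prime}W)(W'\Sigma W)^{-1}(W'v^*), \tag{25}$$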
which has an asymptotic $\chi^2$-distribution with degrees of freedom equal to k minus the number of estimated parameters.

Some remarks on the choice of the instruments are in order. We can determine the order $p_i$ of the VAR for each country i using, for instance, information criteria. The lagged first differences of $X_{it}$, $\Delta X_{i,t-j}$, $j=1,\dots,p_i-1$, and the lagged long-run relations can be used to yield $n(p_i-1)+r_i$ instruments $W_i$ for $Z_i$ in (19), taking $W=\mathrm{diag}(I_{s_i}\otimes W_i)$, where $r_i$ is the cointegrating rank of individual i. As is well known, the OLS estimator regressing $y^*$ on $\hat Z^*$, where $\hat Z^*=W(W'W)^{-1}W'Z^*$ are the projections of $Z^*$ on W, can be obtained as a GMM estimator by selecting $S=I_{Ts}$ in (23) and taking $W(W'W)^{-1}W'$ as instruments. Similarly, the GLS estimator regressing $y^*$ on $\hat Z^*=W(W'\Sigma_*^{-1}W)^{-1}W'\Sigma_*^{-1}Z^*$, with $\Sigma_*$ being the disturbance covariance matrix of the (multivariate) regression of $Z^*$ on W, can be obtained from (23) by taking $S=\Sigma$ and using $W(W'\Sigma_*^{-1}W)^{-1}W'\Sigma_*^{-1}$ as instruments instead of W.

In the empirical analysis in Section VII, we consider a fixed effects model because in the macroeconomic application we study the population and not a sample. Adding fixed effects to the model (21) for the case we analyze, i.e. $s_i=1$, $i=1,\dots,N$, and $n=2$, yields

$$y^*=Z_\mu\mu\;[+\,Z_\lambda\lambda]+Z_r^*\tilde\beta_r^*+v^*, \tag{26}$$

where $Z_\mu=\iota_T\otimes I_N$ and $Z_\lambda=I_T\otimes\iota_N$, with $\iota_T$ and $\iota_N$ being unit vectors of dimension T and N, respectively; the bracketed term is present only when time effects are included. Let $J_N$ denote an $N\times N$ matrix of ones, so that $Z_\lambda Z_\lambda'=I_T\otimes J_N$, and the projection on $Z_\lambda$ is $I_T\otimes\bar J_N$ with $\bar J_N=J_N/N$. This matrix averages over individuals. Time means are similarly defined by $Z_\mu Z_\mu'=J_T\otimes I_N$, and it is shown in Baltagi (1995, p. 28) that the projection on $Z_\mu$ is $\bar J_T\otimes I_N$. The GMM estimator of $\tilde\beta_r^*$ in (26) is then

$$\hat{\tilde\beta}^*_{r,GMM}=\left(\hat Z_r^{*\prime}Q\Sigma^{-1}Q\hat Z_r^*\right)^{-1}\hat Z_r^{*\prime}Q\Sigma^{-1}Qy^*, \tag{27}$$

where $Q=I_{NT}-\bar J_T\otimes I_N$ for the model with only individual effects and $Q=I_T\otimes I_N-\bar J_T\otimes I_N-I_T\otimes\bar J_N+\bar J_T\otimes\bar J_N$ when time dummies are present. The estimator (27) with $\hat Z_r^*=W(W'\Sigma_*^{-1}W)^{-1}W'\Sigma_*^{-1}Z_r^*$ will be referred to as the GLS-LSDV estimator. When the matrix $\Sigma$ is replaced by the identity matrix, a less efficient estimator arises, which will be denoted the LSDV estimator. The asymptotic covariance matrix of $\hat{\tilde\beta}^*_{r,GMM}$ with the optimal weighting matrix S is then given by

$$\mathrm{Var}(\hat{\tilde\beta}^*_{r,GMM})=\left[Z_r^{*\prime}QW(W'\Sigma W)^{-1}W'QZ_r^*\right]^{-1}. \tag{28}$$
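The following sketch (Python/NumPy) illustrates (22) to (25) in the case $s_i=1$ used later, where $Z_r^*$ simply stacks the $T\times(n-1)$ blocks $Z_i$ and $W=\mathrm{diag}(W_i)$. It is a two-step feasible GMM under our own assumptions: step 1 uses $S=W'W$ (the 2SLS choice), and $\Sigma_v$ is then estimated from the step-1 residuals to form $S=W'(\hat\Sigma_v\otimes I_T)W$. The function names and the simulated data are hypothetical, and the within transformation Q of (27) is not included.

```python
import numpy as np

def block_diag(blocks):
    """Block-diagonal matrix diag(B_1, ..., B_N) built with NumPy only."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

def gmm(y, Z, W, S):
    """GMM estimator (23): [Z'W S^-1 W'Z]^-1 Z'W S^-1 W'y."""
    ZW = Z.T @ W
    return np.linalg.solve(ZW @ np.linalg.solve(S, ZW.T), ZW @ np.linalg.solve(S, W.T @ y))

def two_step_gmm(y_list, Z_list, W_list):
    """Pooled co-feature estimate for s_i = 1 with cross-correlated disturbances."""
    N, T = len(y_list), len(y_list[0])
    y = np.concatenate(y_list)              # y* = vec(Y): individual i in rows iT, ..., iT + T - 1
    Z = np.vstack(Z_list)                   # Z*_r of (22) with s_1 = 1
    W = block_diag(W_list)                  # W = diag(W_i)
    b1 = gmm(y, Z, W, W.T @ W)              # step 1: S = W'W (2SLS)
    V = (y - Z @ b1).reshape(N, T).T        # T x N residual matrix
    sigma_v = V.T @ V / T                   # N x N estimate of Sigma_v; singular if T < N
    S = W.T @ np.kron(sigma_v, np.eye(T)) @ W   # W' (Sigma_v kron I_T) W
    b2 = gmm(y, Z, W, S)                    # step 2: efficient GMM
    ZW = Z.T @ W
    var = np.linalg.inv(ZW @ np.linalg.solve(S, ZW.T))    # covariance (24)
    g = W.T @ (y - Z @ b2)
    J = float(g @ np.linalg.solve(S, g))    # criterion (25), asymptotically chi2(k - #params)
    return b2, var, J, W.shape[1] - Z.shape[1]

# Hypothetical data: N = 3 individuals, T = 200, k = 3 instruments each, true value 0.5.
rng = np.random.default_rng(0)
T, N, k, b_true = 200, 3, 3, 0.5
sigma_true = 0.5 * np.eye(N) + 0.5          # cross-correlated disturbances
Vd = rng.multivariate_normal(np.zeros(N), sigma_true, size=T)
y_list, Z_list, W_list = [], [], []
for i in range(N):
    W_i = rng.normal(size=(T, k))
    z_i = W_i @ np.ones(k) + rng.normal(size=T) + 0.5 * Vd[:, i]   # endogenous regressor
    y_list.append(b_true * z_i + Vd[:, i])
    Z_list.append(z_i.reshape(-1, 1))
    W_list.append(W_i)
b, var, J, df = two_step_gmm(y_list, Z_list, W_list)
print(b, np.sqrt(np.diag(var)), J, df)
```

The estimated $\hat\Sigma_v$ is $N\times N$ and built from T residual vectors, so it cannot be inverted when T < N; this is one concrete way to see why the GLS/SUR-type correction needs T > N.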
A test for the validity of the overidentifying restrictions is obtained using (25) and is readily seen to be a test of the null hypothesis C4, i.e. of the null of homogeneity of common features: $\tilde\beta_i^*=\tilde\beta_1^*$, $i=1,\dots,N$, with $s=Ns_1$ and $s_i=s_1=1$, $i=1,\dots,N$. In this specific case, the number of degrees of freedom for the overidentifying restrictions test (25) is given by

$$\sum_{i=1}^{N}\left[n(p_i-1)+r_i-(n-1)\right]+(n-1)(N-1),$$

where n, $p_i$ and $r_i$ are, respectively, the number of variables, the number of lags and the number of cointegrating relations for each i. Note that the term $(n-1)(N-1)$ arises as a consequence of the pooled estimation of the common feature vector: imposing a common co-feature vector decreases the number of parameters to be estimated by $(n-1)(N-1)$. More generally, one could naturally extend the analysis (in the case n > 2) and consider similar analyses for $s_1=1,\dots,n-1$. Sequentially testing, for $s_1=1,\dots,n-1$, the validity of the underlying overidentifying restrictions with (25) provides a direct way to test the number of common co-features in a GMM set-up, provided we first properly normalize the co-feature matrix as above. A somewhat similar use of GMM to detect the dimension of the common feature space, albeit in a pure time series context, is discussed in Vahid & Engle (1997). In the next section, we evaluate the merits of this analysis (for $s_i=s_1=1$, $i=1,\dots,N$) in a small Monte Carlo experiment.
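The degrees-of-freedom formula can be checked mechanically. The hypothetical helper below implements it literally. With $n=2$, $p_i=2$ (one lagged difference) and $r_i=1$ it reproduces the df column of Table 2, and, assuming $p_i=p^*+1$ for every country, it also reproduces the df entries 65 and 52 reported in Table 4 for $p^*=1$.

```python
def overid_df(n, p, r):
    """Degrees of freedom of test (25) with s_i = 1:
    sum_i [n(p_i - 1) + r_i - (n - 1)] + (n - 1)(N - 1).

    n -- number of variables per individual
    p -- VAR orders in levels, one per individual (p_i = p* + 1)
    r -- cointegrating ranks, one per individual
    """
    if len(p) != len(r):
        raise ValueError("p and r must have one entry per individual")
    N = len(p)
    per_individual = sum(n * (p_i - 1) + r_i - (n - 1) for p_i, r_i in zip(p, r))
    return per_individual + (n - 1) * (N - 1)

# Monte Carlo design: n = 2, VAR(2) in levels, one cointegrating vector each.
print([overid_df(2, [2] * N, [1] * N) for N in (1, 2, 5, 10, 25)])  # [2, 5, 14, 29, 74]

# 22 OECD countries with p* = 1: all cointegrated, or only nine cointegrated.
print(overid_df(2, [2] * 22, [1] * 22), overid_df(2, [2] * 22, [1] * 9 + [0] * 13))  # 65 52
```

Algebraically the expression collapses to k minus (n - 1), i.e. the total number of instruments minus the n - 1 free parameters of the pooled co-feature vector.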
VI. MONTE CARLO SIMULATIONS

In this section we present some illustrative Monte Carlo evidence on the usefulness of the common feature test statistic (25) presented above for panel data. The data are generated as if there exists a huge VECM with both common feature and cointegrating restrictions. Under the null of reduced rank structures, the bivariate DGP for each of N individuals assumes the existence of one cointegrating vector and of a single common feature vector. It has the form:
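$$
\begin{pmatrix}\Delta y_{i,t}\\ \Delta z_{i,t}\end{pmatrix}
=\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix}
+\begin{pmatrix}0.25\\ 0.5\end{pmatrix}(1\;\;-1)\begin{pmatrix}y_{i,t-1}\\ z_{i,t-1}\end{pmatrix}
+\begin{pmatrix}0.5\\ 1\end{pmatrix}(0.6\;\;-0.3)\begin{pmatrix}\Delta y_{i,t-1}\\ \Delta z_{i,t-1}\end{pmatrix}
+\begin{pmatrix}\varepsilon_{i1,t}\\ \varepsilon_{i2,t}\end{pmatrix},
$$

where the $\mu$'s are generated from uniform distributions, $\mu_1\sim U(0,0.3)$ and $\mu_2\sim U(-0.25,0.15)$, so that $E(\mu_1)=0.15$ and $E(\mu_2)=-0.05$. The normalized common feature vector is $(1,-0.5)'$ and the normalized cointegrating vector is simply $(1,-1)'$. For each individual i, $(\varepsilon_{i1,t},\varepsilon_{i2,t})'$ is bivariate Gaussian with covariance matrix $\Omega_{ii}$. The cross-contemporaneous correlation matrices between individuals i and j are all equal to $\Omega_{ij}$, so that the panel VECM covariance matrix is given by (10) with

$$\Omega_{ii}=\begin{pmatrix}1 & 0.8\\ 0.8 & 1\end{pmatrix},\qquad \Omega_{ij}=\begin{pmatrix}0.7 & 0.6\\ 0.6 & 0.75\end{pmatrix}.$$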
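A minimal simulation of this DGP is sketched below in Python/NumPy. Assumptions not stated in the text: the intercepts $\mu$ are drawn once per individual, pre-sample values are zero, and 50 burn-in observations are discarded. The optional heteroskedastic scaling follows the construction of note 5. The final check confirms that $\Delta y_{i,t}-0.5\,\Delta z_{i,t}$ reduces to an intercept plus an innovation combination, i.e. the co-feature vector annihilates both the error-correction and the lagged-difference terms.

```python
import numpy as np

ALPHA = np.array([0.25, 0.5])                      # adjustment coefficients
BETA = np.array([1.0, -1.0])                       # cointegrating vector
GAMMA = np.outer([0.5, 1.0], [0.6, -0.3])          # [[0.30, -0.15], [0.60, -0.30]]
OMEGA_II = np.array([[1.0, 0.8], [0.8, 1.0]])
OMEGA_IJ = np.array([[0.7, 0.6], [0.6, 0.75]])

def panel_covariance(N, heteroskedastic=False):
    """2N x 2N covariance of (eps_1t', ..., eps_Nt')': Omega_ii on the diagonal
    blocks and Omega_ij off the diagonal; optionally rescaled as in note 5."""
    omega = np.kron(np.ones((N, N)), OMEGA_IJ) + np.kron(np.eye(N), OMEGA_II - OMEGA_IJ)
    if heteroskedastic:
        g = 1.0 + 4.0 * np.arange(N)                   # g = (1, 5, 9, ...)'
        G = np.kron(np.outer(g, g), np.ones((2, 2)))   # G = gg' kron R
        omega = G * omega                              # Hadamard product G o Omega
    return omega

def simulate_panel(N, T, rng, heteroskedastic=False, burn=50):
    """Simulate the bivariate panel VECM; arrays are indexed [t, i, variable]."""
    mu = np.column_stack([rng.uniform(0.0, 0.3, N), rng.uniform(-0.25, 0.15, N)])
    eps = rng.multivariate_normal(np.zeros(2 * N), panel_covariance(N, heteroskedastic),
                                  size=T + burn).reshape(T + burn, N, 2)
    X = np.zeros((T + burn, N, 2))
    dX = np.zeros((T + burn, N, 2))
    for t in range(2, T + burn):
        ecm = X[t - 1] @ BETA                      # y_{i,t-1} - z_{i,t-1}, one per individual
        dX[t] = mu + np.outer(ecm, ALPHA) + dX[t - 1] @ GAMMA.T + eps[t]
        X[t] = X[t - 1] + dX[t]
    return X[burn:], dX[burn:], eps[burn:], mu

rng = np.random.default_rng(42)
X, dX, eps, mu = simulate_panel(N=10, T=50, rng=rng)
cof = np.array([1.0, -0.5])                        # normalized common feature vector
w = dX @ cof                                       # T x N co-feature combinations
# Only the intercept and the innovation combination survive:
print(np.allclose(w, mu @ cof + eps @ cof))        # True
```

Passing `heteroskedastic=True` reproduces, under the stated assumptions, the kind of series shown in Fig. 2, where the disturbance variance of individual i is scaled by $g_i^2$.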
We have added a heterogeneous structure increasing with N.⁵ Figures 1 and 2 illustrate a realization of the DGP for 10 individuals and two variables, comparing processes without (Fig. 1) and with (Fig. 2) this additional heteroscedasticity.

Fig. 1. A Realization of the DGP for 10 Individuals.
Fig. 2. A Realization of the DGP with Additional Heteroscedasticity.

From this DGP we see that, under the assumption of reduced rank, the short-run dynamic matrices (for each i) are simply given by

$$\begin{pmatrix}0.30 & -0.15\\ 0.60 & -0.30\end{pmatrix},$$

while under the alternative we chose to arbitrarily fix one element to zero:

$$\begin{pmatrix}0.30 & 0.00\\ 0.60 & -0.30\end{pmatrix}.$$
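Whether a co-feature vector exists can be read off the left null space of $[\alpha,\Gamma]$, with $\alpha=(0.25,0.5)'$ the adjustment vector and $\Gamma$ the short-run matrix above. The sketch below (Python/NumPy; the helper `cofeature_space` and its SVD tolerance are our own choices) recovers $(1,-0.5)'$ under the null and finds no co-feature vector under the alternative.

```python
import numpy as np

alpha = np.array([[0.25], [0.5]])
gamma_null = np.array([[0.30, -0.15], [0.60, -0.30]])
gamma_alt = np.array([[0.30, 0.00], [0.60, -0.30]])

def cofeature_space(alpha, gammas, tol=1e-10):
    """Basis of {b : b'alpha = 0 and b'Gamma_j = 0 for all j}, i.e. the left
    null space of [alpha, Gamma_1, ...], computed from an SVD."""
    M = np.hstack([alpha] + list(gammas))
    u, sv, _ = np.linalg.svd(M)
    rank = int(np.sum(sv > tol))
    return u[:, rank:]            # columns span the co-feature space (possibly empty)

b_null = cofeature_space(alpha, [gamma_null])
b_alt = cofeature_space(alpha, [gamma_alt])
print(b_null[:, 0] / b_null[0, 0])   # approximately [1, -0.5]
print(b_alt.shape[1])                # 0: no co-feature vector under the alternative
```

Setting a single element of $\Gamma$ to zero is enough to make $[\alpha,\Gamma]$ full row rank, which is what the overidentifying restrictions test is meant to detect.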
We consider three sample sizes, i.e. T = 10, 25 and 50, and five cases for the number of individuals, i.e. N = 1, 2, 5, 10 and 25. We report the median and the spread (interquartile range) of the bias of the GMM panel estimator, as well as the median of the standard deviation of $\hat{\tilde\beta}^*_{r,GMM}$. We also report the empirical size (nominal size being 5%) and the empirical size-adjusted power of the overidentifying restrictions test statistic; df denotes the number of degrees of freedom. Because of the heavy computational time required for these simulations, 5,000 replications were used for N = 1, 2, 5; 2,000 for N = 10; and 1,000 for N = 25. The results are presented in Table 2. One can directly observe that the bias is small and decreases as T and/or N increase. The accuracy of the estimates, measured both by the spread and by the standard deviation of the estimate, also increases with T and/or N. We interpret these illustrative findings as evidence in favor of the pooling estimator. No substantial size distortions are observed. Note that the values of N retained in these simulations are clearly too small to assess the validity of a central limit theorem based on large-N asymptotics.

Table 2. Monte Carlo Results (GMM estimation and test statistic)

          T    bias     bias       σ(β̃*_r,GMM)  χ²(df)  size (%)  size-adj.
               median   Q75−Q25    median                          power (%)
N = 1     10   0.0123   0.2228     0.156        (2)     7.88       9.90
          25   0.0101   0.1387     0.098        (2)     5.58      19.78
          50   0.0067   0.0944     0.070        (2)     5.54      34.68
N = 2     10   0.0136   0.1817     0.106        (5)     4.98       8.56
          25   0.0069   0.1057     0.079        (5)     6.18      16.58
          50   0.0034   0.0726     0.057        (5)     5.72      31.52
N = 5     10   0.0045   0.1409     0.067        (14)    3.96       7.26
          25   0.0044   0.0751     0.060        (14)    5.68      12.52
          50   0.0021   0.0460     0.047        (14)    5.74      24.82
N = 10    25   0.0022   0.0658     0.046        (29)    4.70      11.00
          50   0.0020   0.0377     0.038        (29)    4.80      21.55
N = 25    50   0.0002   0.0398     0.029        (74)    5.80      13.80
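The size and size-adjusted power columns of Table 2 can be computed from simulated test statistics as sketched below (Python/NumPy). The chapter does not spell out its size-adjustment procedure; the sketch assumes the usual one, in which power is evaluated at the empirical 95% quantile of the statistics simulated under the null. The χ² draws in the example are placeholders, not the chapter's simulations.

```python
import numpy as np

def size_and_adjusted_power(stats_null, stats_alt, crit, level=0.05):
    """Empirical size (%) at the asymptotic critical value `crit`, and
    size-adjusted power (%) at the empirical (1 - level) quantile of the
    statistics simulated under the null."""
    stats_null = np.asarray(stats_null, dtype=float)
    stats_alt = np.asarray(stats_alt, dtype=float)
    size = 100.0 * np.mean(stats_null > crit)
    crit_adj = np.quantile(stats_null, 1.0 - level)   # size-corrected critical value
    power = 100.0 * np.mean(stats_alt > crit_adj)
    return size, power

# Placeholder draws: chi2(2) under the null, noncentral chi2(2) under the alternative.
rng = np.random.default_rng(1)
null_stats = rng.chisquare(2, size=5000)
alt_stats = rng.noncentral_chisquare(2, 5.0, size=5000)
crit_95 = -2.0 * np.log(0.05)   # 95% quantile of chi2(2), exact for df = 2
size, power = size_and_adjusted_power(null_stats, alt_stats, crit_95)
print(round(size, 2), round(power, 2))
```

Size adjustment makes power comparable across designs whose empirical size deviates from 5%, such as the N = 1, T = 10 case with a size of 7.88%.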
VII. EMPIRICAL ANALYSIS

The data we use are taken from the Penn World Tables Mark 5.6 (see Summers & Heston, 1991).⁶ Thanks to the homogeneity of their definitions, these data are extremely useful and have been extensively used in the empirical literature. However, the data are certainly not free of measurement errors, because the price to pay for obtaining long series of homogeneous data for more than 150 countries is the reliance on a set of hypotheses, approximations and interpolations. Because of both the quality of the data and the underlying theoretical motivation, we limit our analysis to 22 OECD countries for the sample period 1950–1992 (up to 1991 for Greece and 1990 for Portugal).⁷ The data extracted are Y = RGDPL, real GDP per capita (Laspeyres index) in 1985 international prices, and C = C × Y/100, where the PWT variable C is the real consumption share of GDP in 1985 international prices. This last operation is necessary to obtain consumption in levels rather than as a percentage of income.⁸ Figure 3 plots the 44 series, namely the consumption and income variables for the OECD countries. The picture also argues in favor of having tools to model this information.

Fig. 3. Consumption and Output Series for the 22 OECD Countries.

Lower-case c and y denote the natural logarithms of C and Y, respectively. Table 3 reports time series statistics for each country. The first column of Table 3 lists, in alphabetical order, the names of the countries as well as the date of joining the OECD.⁹ Column 2 gives the quality ranking of the data as presented in Summers & Heston (1991). It is seen that, for the most part, the quality of the data is reasonable. Columns 3 and 4 give the value of the Augmented Dickey-Fuller unit root test for consumption and income, respectively. All tests include both a constant term and a trend. The number of lags necessary to whiten the residuals is given in parentheses. Columns 5 and 6 give, respectively, the value of the Engle-Granger Augmented Dickey-Fuller cointegration test for each country separately and the long-run elasticity of consumption (as the dependent variable). Column 7 gives the order of the VAR($p_i$) in levels, where $p_i$ is determined using the multivariate Hannan-Quinn criterion (HQC). These lags, as well as the presence of an error correction term, will determine the instruments to be used in the common feature test statistics. In Table 3, a * indicates that the individual unit root or cointegration test statistic rejects the null at a 5% nominal level.

Table 3. Time Series Statistics (Individual countries)

Country (OECD entry)   Qual.  ADF c_t   ADF y_t   EG        β*_i   HQC
Australia (1971)       A      1.21(4)   0.93(2)   1.46(1)   0.95   3
Austria (1961)         A      0.82(0)   1.25(2)   3.59(0)*  1.00   1
Belgium (1961)         A      1.43(1)   0.74(1)   2.36(0)   0.94   1
Canada (1961)          A      1.50(1)   1.80(1)   3.89(1)*  1.00   1
Denmark (1961)         A      0.94(0)   0.94(0)   3.69(0)*  0.82   1
Finland (1969)         A      2.48(1)   0.20(2)   1.69(3)   0.98   4
France (1961)          A      0.11(2)   0.04(1)   1.96(0)   0.98   2
Germany (1961)         A      2.18(2)   3.10(2)   1.69(2)   1.07   2
Greece (1961)          A      0.58(0)   0.01(0)   0.79(0)   0.97   1
Iceland (1961)         B+     2.64(1)   2.23(1)   4.52(0)*  1.04   1
Ireland (1961)         A      2.54(1)   2.82(1)   3.76(2)*  0.81   1
Italy (1961)           A      0.61(1)   0.77(1)   1.86(1)   1.09   4
Japan (1964)           A      0.91(0)   0.48(1)   4.75(1)*  0.92   4
Luxembourg (1961)      A      1.45(1)   3.32(4)   2.16(4)   1.34   4
Netherlands (1961)     A      0.71(2)   0.20(2)   3.07(1)   1.08   4
New Zealand (1973)     A      2.26(0)   1.52(0)   5.93(0)*  1.02   1
Norway (1961)          A      1.29(1)   1.76(1)   1.83(1)   0.80   1
Portugal (1961)        A      3.54(3)*  2.95(3)   3.07(3)   0.88   3
Spain (1961)           A      1.25(0)   1.34(0)   2.99(0)   0.94   1
Sweden (1961)          A      0.70(1)   0.30(1)   3.58(1)*  0.81   2
Switzerland (1961)     B+     0.03(4)   1.69(2)   3.28(0)   0.92   2
Turkey (1961)          C      3.26(2)   3.48(0)*  1.73(0)   0.85   1
UK (1961)              A      3.61(1)*  3.62(1)*  2.13(0)   1.04   3
USA (1961)             A      1.75(0)   2.05(0)   4.08(0)*  1.15   2

It emerges that, except for Portugal, the UK and Turkey, we cannot reject the unit root hypothesis for consumption and income. Using the Engle-Granger cointegration test, the null hypothesis of non-cointegration is rejected for nine countries, with a long-run elasticity $\beta_i^*$ close to 1.¹⁰ Consequently, we will use the cointegrating vectors as instruments in six different versions: four homogeneous cases and two heterogeneous ones. We proceed in two steps. In the first step the cointegrating vectors are estimated; they are then used as instruments in the second step to estimate the common feature vectors. The results are reported in Table 4. The homogeneous cases refer to a panel estimation of a common cointegrating vector, that is, the parameters are assumed to be the same across countries, and the contemporaneous disturbance correlation across countries and across variables for a given country is ignored. Absence of short-run Granger causality between countries is assumed throughout steps 1 and 2.
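The first step, estimating the long-run elasticity that defines the cointegrating vectors $(1,-\beta^*)'$ used as instruments, can be sketched as follows for the cases discussed below. This is a simplified Python/NumPy illustration under our own assumptions: static OLS regressions of $c_{it}$ on $y_{it}$, the group mean taken as the plain average of the individual slopes, pooled OLS with a common intercept, and LSDV with individual intercepts. The data in the example are simulated, not the PWT series.

```python
import numpy as np

def long_run_elasticities(c, y):
    """First-step long-run elasticity estimates from c_it = a_i + b_i y_it + e_it.

    c, y: T x N arrays (log consumption and log income, one column per country).
    Returns (individual OLS slopes, group mean, pooled OLS slope, LSDV slope)."""
    T, N = c.shape
    indiv = np.empty(N)
    for i in range(N):
        X = np.column_stack([np.ones(T), y[:, i]])
        indiv[i] = np.linalg.lstsq(X, c[:, i], rcond=None)[0][1]
    group_mean = indiv.mean()                               # average of individual slopes
    Xp = np.column_stack([np.ones(T * N), y.ravel()])      # common intercept
    pooled = np.linalg.lstsq(Xp, c.ravel(), rcond=None)[0][1]
    yd = y - y.mean(axis=0)                                 # within (country-demeaned) data
    cd = c - c.mean(axis=0)
    lsdv = float(np.sum(yd * cd) / np.sum(yd * yd))         # individual intercepts
    return indiv, group_mean, pooled, lsdv

# Hypothetical data: 22 countries, 40 years, true elasticity 0.95.
rng = np.random.default_rng(3)
T, N = 40, 22
y = np.cumsum(rng.normal(0.02, 0.03, size=(T, N)), axis=0) + rng.normal(8.0, 0.5, size=N)
a = rng.normal(-0.3, 0.05, size=N)
c = a + 0.95 * y + rng.normal(0.0, 0.02, size=(T, N))
indiv, gm, pooled, lsdv = long_run_elasticities(c, y)
print(round(gm, 3), round(pooled, 3), round(lsdv, 3))
```

The lagged relation $c_{i,t-1}-\hat\beta^*y_{i,t-1}$ built from any of these estimates would then enter the instrument set $W_i$ described in Section V.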
Because most panel cointegration test statistics assume independence across individuals, we cannot, strictly speaking, rely on these panel cointegration test statistics. However, because the estimators of the cointegrating vectors are still consistent, we use them to obtain estimates for four different cases. As Table 3 shows, even when the absence of cointegration is not rejected, the elasticity is close to one. We first analyze a version in which we assume that there exists a homogeneous cointegrating relationship for all the countries with a coefficient $\beta^*$ equal to one (see the upper panel of Table 4). Similar results are obtained using Johansen's MLE-based procedure. A second version uses the group mean (GM) estimator of Pesaran et al. (1997); that is, we average the cointegrating vectors over the 22 individuals.

Table 4. Common Features within 22 OECD Countries

Long-run elasticity (step 1)          p*     β̃*_r,GM  NGM    β̃*_r,GMM  σ(β̃*_r,GMM)  Test     df    p-val
β*_i = 1 (all i)                      1      0.770    3.71   0.745     0.050        148.98   65    < 0.001
                                      2      0.769    6.14   0.660     0.036        173.65   109   < 0.001
                                      3      0.770    4.43   0.704     0.031        211.27   153   0.001
                                      p*_i                   0.718     0.036        156.04   93    < 0.001
β*_i = β*_GM = 0.979 (all i)          1      0.829    5.36   0.768     0.051        146.67   65    < 0.001
                                      2      0.804    6.54   0.670     0.036        176.61   109   < 0.001
                                      3      0.793    4.95   0.710     0.031        214.06   153   < 0.001
                                      p*_i                   0.728     0.036        156.92   93    < 0.001
β*_i = β*_OLS = 0.939 (all i)         1      0.870    5.74   0.814     0.050        131.96   65    < 0.001
                                      2      0.837    5.12   0.687     0.036        170.16   109   < 0.001
                                      3      0.822    3.93   0.727     0.031        206.93   153   0.002
                                      p*_i                   0.738     0.036        145.01   93    < 0.001
β*_i = β*_LSDV = 0.968 (all i)        1      0.855    6.03   0.782     0.051        142.93   65    < 0.001
                                      2      0.821    6.25   0.677     0.036        175.97   109   < 0.001
                                      3      0.804    4.94   0.715     0.031        213.50   153   0.001
                                      p*_i                   0.733     0.036        155.12   93    < 0.001
β*_i ≠ β*_j (i, j with cointegration) 1      0.814    6.89   0.782     0.053        138.45   52    < 0.001
                                      2      0.726    6.16   0.647     0.036        158.74   96    < 0.001
                                      3      0.755    4.46   0.696     0.031        210.03   140   < 0.001
                                      p*_i                   0.707     0.037        146.50   80    < 0.001
β*_i = 1 (i with cointegration)       1      0.865    1.59   0.810     0.056        115.25   52    < 0.001
                                      2      0.784    3.89   0.682     0.037        144.00   96    0.001
                                      3      0.775    2.72   0.734     0.033        192.33   140   0.002
                                      p*_i                   0.750     0.040        131.56   80    < 0.001
A third version uses the usual pooled OLS estimator. The last one allows for intercept heterogeneity and is the usual LSDV estimator. Note that the pooled FM-OLS estimator proposed by Pedroni (1997a), which assumes independence across units, gives a point estimate of 0.971 for the 22 OECD countries and 1.021 for the G7 countries, the latter not being significantly different from one. Both results are very close to those obtained with the LSDV and OLS estimators, so the results of the common cyclical feature analysis obtained with Pedroni's FM-OLS estimator are not reported. For the two heterogeneous cases, we impose cointegration for the nine countries for which the Engle-Granger ADF test is significant. In step 2, we use as instruments the cointegrating vectors of the countries for which we reject the null. Note that Phillips-Hansen Fully Modified OLS estimation was also used to test the assumption of unit long-run elasticity formally. The null of unit long-run elasticity was rejected in all cases of cointegration except three (Austria, Canada and New Zealand). Two different cases are therefore considered: (i) for the nine countries, we take the estimated value of $\beta_i^*$ given by the long-run regression; (ii) we fix these values to 1. The maximum lag length for a country is four, so that $p^*=p-1$ takes the values 0, 1, 2 or 3 across countries. The following cases are considered: p* is fixed uniformly to 1, 2 and 3, respectively, or p* is set to the country-specific value $p_i^*$ determined using the HQ criterion. Note that over-estimating the lag length will reduce the power of the test statistics (Beine & Hecq, 1999). The results of the two panel common feature statistics are presented. For the heterogeneous cases, the first two columns present the group mean estimates (denoted by $\hat{\tilde\beta}^*_{r,GM}$) as well as the value of the Normal test statistic (NGM) in (15), which tests for the significance of one common feature vector.
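The country-specific orders $p_i$ (and hence $p_i^*=p_i-1$) come from the multivariate Hannan-Quinn criterion. A minimal Python/NumPy sketch, assuming the criterion in the form $\mathrm{HQ}(p)=\ln\det\hat\Sigma_p+2\ln\ln T\cdot pn^2/T$ (as in, e.g., Lütkepohl, 1991), with a constant in every VAR and a common effective sample T across candidate orders; the helper name and the simulated VAR(2) are hypothetical.

```python
import numpy as np

def hq_lag_order(X, p_max=4):
    """Return the VAR order p in 1..p_max minimizing the Hannan-Quinn criterion
    HQ(p) = ln det(Sigma_p) + 2 ln(ln T) p n^2 / T, where T is the common
    effective sample (the first p_max observations serve as pre-sample)."""
    T_full, n = X.shape
    T = T_full - p_max
    Y = X[p_max:]
    best_p, best_hq = 1, np.inf
    for p in range(1, p_max + 1):
        lags = [X[p_max - j:T_full - j] for j in range(1, p + 1)]
        R = np.column_stack([np.ones(T)] + lags)         # constant plus p lags
        coef, *_ = np.linalg.lstsq(R, Y, rcond=None)
        U = Y - R @ coef                                 # T x n residuals
        _, logdet = np.linalg.slogdet(U.T @ U / T)
        hq = logdet + 2.0 * np.log(np.log(T)) * p * n**2 / T
        if hq < best_hq:
            best_p, best_hq = p, hq
    return best_p

# Hypothetical stationary VAR(2) with a strong second lag.
rng = np.random.default_rng(7)
A1, A2 = np.diag([0.5, 0.3]), np.diag([-0.4, -0.5])
X = np.zeros((2000, 2))
for t in range(2, 2000):
    X[t] = A1 @ X[t - 1] + A2 @ X[t - 2] + rng.normal(size=2)
p_hat = hq_lag_order(X)
print(p_hat, p_hat - 1)    # VAR order in levels (typically 2) and the implied p*
```

Fitting every candidate on the same effective sample keeps the log-determinants comparable; the short annual samples of the application make the choice of $p_{max}$ and the penalty more consequential than in this long simulated series.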
The next columns present the value of the common feature elasticity for the homogeneous dependent case (denoted by $\hat{\tilde\beta}^*_{r,GMM}$), the associated standard errors, denoted by $\sigma(\hat{\tilde\beta}^*_{r,GMM})$, and the value of the test of the overidentifying restrictions implied by the common feature vector (column labelled Test), which is asymptotically $\chi^2(df)$ under the null, with the column df indicating the degrees of freedom of these statistics. The final column, labelled p-val, reports the associated p-values. Note that in the second step we always allow for nonzero contemporaneous disturbance correlation. It appears that the estimated coefficients $\hat{\tilde\beta}^*_{r,GMM}$ and $\hat{\tilde\beta}^*_{r,GM}$ are too high compared with a priori expectations. Moreover, we reject the null of a panel common feature model with both test statistics. Table 5 presents the results for the G7. The results are similar to those for the panel of 22 countries. However, in several situations we cannot reject the null of one homogeneous common feature vector. In these cases, we imposed the unlikely hypotheses of a homogeneous cointegrating vector with a lag order uniformly fixed to p* = 3. Finally, we want to note the implications for empirical modeling that follow from a restriction between the number of variables n and the sum of the numbers of cointegrating vectors and common feature vectors. From Vahid & Engle (1993), Theorem 1, it follows that the common feature space and the cointegration space are linearly independent. This means that the sum of the number of common feature vectors (s) and the number of cointegrating vectors (r) should be less than or equal to the number of variables (n): $r+s\le n$. In a panel context, under the absence of long- and short-run Granger causality, this has obvious but different implications depending on whether the common feature vectors and cointegrating vectors are homogeneous or heterogeneous.
Table 5. Common Features within G7 Countries

Long-run elasticity (step 1)             p*     β̃*_r,GM  NGM    β̃*_r,GMM  σ(β̃*_r,GMM)  Test    df   p-val
β*_i = 1 = β*_LSDV (all i)               1      0.866    2.47   1.042     0.087        32.83   20   0.035
                                         2      0.763    2.37   0.856     0.060        53.70   34   0.017
                                         3      0.755    1.81   0.872     0.052        67.05   48   0.036
                                         p*_i                   0.884     0.058        50.84   30   0.010
β*_i = β*_GM = 1.035 (all i)             1      0.893    1.64   1.021     0.082        31.51   20   0.048
                                         2      0.777    1.815  0.857     0.060        50.25   34   0.036
                                         3      0.766    1.49   0.878     0.052        62.75   48   0.075*
                                         p*_i                   0.892     0.057        46.22   30   0.029
β*_i = β*_OLS = 1.023 (all i)            1      0.882    1.75   1.036     0.084        32.06   20   0.043
                                         2      0.771    1.89   0.856     0.060        51.11   34   0.030
                                         3      0.762    1.51   0.876     0.052        63.87   48   0.062*
                                         p*_i                   0.890     0.057        47.84   30   0.021
β*_i ≠ β*_j (i, j with cointegration)    1      0.818    6.02   0.894     0.074        49.07   16   < 0.001
                                         2      0.710    3.58   0.723     0.053        52.66   30   0.006
                                         3      0.737    2.13   0.787     0.047        64.46   44   0.024
                                         p*_i                   0.800     0.051        46.61   26   0.008
β*_i = β*_j = 1 (i, j with cointegration) 1     0.875    2.68   1.029     0.089        27.69   16   0.034
                                         2      0.753    2.60   0.859     0.062        47.49   30   0.022
                                         3      0.764    1.66   0.894     0.053        60.14   44   0.053*
                                         p*_i                   0.917     0.061        43.97   26   0.015
A misspecification of the number of homogeneous cointegrating vectors may, for instance, constrain the dimension of the homogeneous common feature space too heavily and lead to flawed inference regarding the existence of common features. A last remark seems in order. Although we can formally reject the existence of a common homogeneous co-feature relation in this OECD data set, one should be aware that our results do not per se imply the absence of SCCF for some of the countries taken individually.
VIII. CONCLUSION
In this chapter we extended the serial correlation common feature analysis to nonstationary panel data models. Concentrating upon the fixed effects model, we defined homogeneous panel common feature models and gave a series of steps for implementing the corresponding tests. We then applied this framework to investigate the liquidity constraints model for 22 OECD countries and for the G7 countries. At a 5% nominal level, we reject the presence of a panel common feature vector. From the empirical analysis we can draw several (tentative) conclusions. First, in a country-by-country analysis, there is evidence of cointegration between consumption and income for slightly less than half of the countries in the sample. The cointegrating vector appears to be homogeneous across these countries, with a long-run consumption elasticity close to one. Second, for the sample of 22 countries, the existence of one homogeneous SF (SCCF) common feature vector is rejected in most instances when using the test proposed in (15). For the sample of G7 countries, in several instances, the occurrence of a homogeneous SF common feature vector is not rejected. Notice that this restriction is obviously less restrictive when it only applies to seven countries. However, the p-values are quite low, and the non-rejection of the null hypothesis occurs when the model might be misspecified, in particular because we have maintained a homogeneous lag length of 3. Third, the overidentifying restrictions implied by the assumption of a homogeneous common feature vector are rejected in all instances in the sample of 22 countries. For the G7 countries, again there is occasionally evidence in favor of the overidentifying restrictions. Again, it is not surprising that the assumption of homogeneous common features is rejected more frequently than the assumption of homogeneous cointegration.
While consumption and income are closely linked to each other in the long run, short-run deviations are generally possible and can be realized through saving or borrowing.
Our model representation is not, stricto sensu, a dynamic panel, because only part of the dynamics is common to all individuals. However, it does part of the job. Indeed, while no size distortions were noticed in our Monte Carlo results, the power of the test statistics can be increased by going a step further towards dynamic panel data models if the null hypothesis of a panel common cyclical feature model is not rejected. Conversely, if the null is rejected, it is not worth imposing further common restrictions; this is a clue for considering less restrictive models, such as heterogeneous or group-homogeneous models. A bootstrap procedure could certainly be undertaken to find the distribution of the test statistics. This is also perhaps the place to choose more flexible models, such as the non-synchronous common cycle model (Vahid & Engle, 1997) or the weak form common feature analysis (Hecq, Palm & Urbain, 1997).
ACKNOWLEDGMENT
Support from METEOR through the research project "Dynamic and Nonstationary Panels: Theoretical and Empirical Issues" is gratefully acknowledged. The authors want to thank two anonymous referees and the coeditor for useful comments on a previous version of this paper. The usual disclaimer applies. The GAUSS routines and the data that have been used in this paper are available from http://www.employ.unimaas.nl/j.urbain
NOTES
1. Note that Vahid & Engle (1997) have extended their framework to the case where a linear combination is an MA(q) process rather than white noise. They labelled this model the non-synchronous common cycle.
2. The first step checks for the presence of cointegrating relationships; then, given the estimated cointegrating relations, the common feature analysis is carried out in a second step. An alternative is to use a joint estimation procedure that exploits both the cointegration and common feature restrictions using a switching algorithm (Hansen & Johansen, 1998; Hecq, 1999).
3. See Anderson & Vahid (1996) for the connection between GMM and canonical correlation estimators.
4. Complete results are available upon request.
5. The operation is the following. Consider an N-dimensional vector with increment four, $g=(1,5,9,\dots)'$. We form an $nN\times nN$ matrix $G=gg'\otimes R$, with R an $n\times n$ matrix with all elements equal to 1. Then the heteroskedastic disturbance covariance matrix $\Omega^*$ is given by $\Omega^*=G\odot\Omega$, with $\Omega$ given in (10) and $\odot$ the elementwise (Hadamard) product.
6. The data may be downloaded from different internet sites, such as http://www.nber.org/pwt56.html or http://datacentre.epas.utoronto.ca:5620/pwt.
7. For computational convenience, we have balanced the panel in this study and considered neither Greece nor Portugal.
8. We did not consider here a slightly different model in which real government expenditures are subtracted from output. Indeed, as raised by Evans & Karras (1996b), the model should be extended to take care of the potential substitutability or complementarity between private and public goods. Without a fine distinction of the components of government expenditures, it might be desirable to extend the model to take a third variable into account. It is also possible to consider a simple alternative model, in which all public goods are substitutes for private ones, by subtracting G from Y.
9. Other countries joined the OECD later: the Czech Republic in 1995, Korea in 1996, Poland in 1996 and Mexico in 1994. We drop them because the ending year of our data set is 1992. Also note that the OECD has its origin in the Organization for European Economic Cooperation, which grouped European countries. This organization was charged with administering United States aid under the Marshall Plan to reconstruct Europe after World War II. Consequently, for countries that did not participate in this project from the beginning, homogeneity of cointegration and/or common features might be rejected for that reason.
10. As noted in Section IV, the main part of the approach presented in this paper also applies to non-cointegrated systems.
REFERENCES
Ahn, S. K. (1997). Inference of Vector Autoregressive Models with Cointegration and Scalar Components. Journal of the American Statistical Association, 92, 350–356.
Anderson, H., & Vahid, F. (1996). Testing Multiple Equation Systems for Common Nonlinear Components. Working paper, Department of Economics, Texas A&M University.
Banerjee, A. (Ed.) (1999). Testing for Unit Roots and Cointegration Using Panel Data: Theory and Applications. Oxford Bulletin of Economics and Statistics, 61, 607–629.
Baltagi, B. (1995). Econometric Analysis of Panel Data. New York: John Wiley.
Beine, M., & Hecq, A. (1997). Asymmetric Shocks Inside Future EMU. Journal of Economic Integration, 12, 131–140.
Beine, M., & Hecq, A. (1998). Codependence and Convergence, an Application to the EC Economies. Journal of Policy Modeling, 20, 403–426.
Beine, M., & Hecq, A. (1999). Inference in Codependence: Some Monte Carlo Results and Applications. Annales d'Economie et de Statistique, 54, 69–90.
Campbell, J. Y., & Mankiw, N. G. (1990). Permanent Income, Current Income, and Consumption. Journal of Business and Economic Statistics, 8, 265–279.
Campbell, J. Y., & Mankiw, N. G. (1991). The Response of Consumption to Income: A Cross-Country Investigation. European Economic Review, 35, 723–767.
Candelon, B., & Hecq, A. (2000). Stability of the Unemployment-Activity Relationship in a Codependent System. Applied Economics Letters, forthcoming.
Engle, R. F., & Kozicki, S. (1993). Testing for Common Features (with comments). Journal of Business and Economic Statistics, 11, 369–395.
Engle, R. F., & Watson, M. W. (1981). A One-Factor Multivariate Time Series Model of Metropolitan Wages. Journal of the American Statistical Association, 76, 545–565.
Evans, P., & Karras, G. (1996a). Convergence Revisited. Journal of Monetary Economics, 37, 249–265.
Evans, P., & Karras, G. (1996b). Private and Government Consumption with Liquidity Constraints. Journal of International Money and Finance, 2, 255–266.
Geweke, J. (1977). The Dynamic Factor Analysis of Economic Time Series. In: D. J. Aigner & A. S. Goldberger (Eds), Latent Variables in Socio-Economic Models. Amsterdam: North-Holland.
Gouriéroux, C., & Peaucelle, I. (1993). Séries codépendantes: application à l'hypothèse de parité du pouvoir d'achat. In: Macroéconomie, Développements Récents. Paris: Economica.
Granger, C. W. J., & Lin, J. L. (1995). Causality in the Long Run. Econometric Theory, 11, 530–536.
Granger, C. W. J., & Haldrup, N. (1997). Separation in Cointegrated Systems and P-T Decompositions. Oxford Bulletin of Economics and Statistics, 59, 449–463.
Greene, W. H. (1993). Econometric Analysis. New York: MacMillan.
Groen, J. J., & Kleibergen, F. (1999). Likelihood-Based Cointegration Analysis in Panels of Vector Error Correction Models. Discussion Paper TI 99-055/4, Tinbergen Institute, Erasmus University Rotterdam.
Hamilton, J. D. (1994). Time Series Analysis. Princeton: Princeton University Press.
Hansen, L. P. (1982). Large Sample Properties of Generalized Method of Moments Estimators. Econometrica, 50, 1029–1054.
Hansen, P. R., & Johansen, S. (1998). Workbook on Cointegration. Oxford: Oxford University Press.
Hecq, A. (1999). On the Usefulness of Considering Common Serial Features and Cointegrating Restrictions. Working paper RM/99/017, University of Maastricht.
Hecq, A., Palm, F. C., & Urbain, J. P. (1997). Testing for Common Cycles in VAR Models with Cointegration. Working paper RM/97/031, University of Maastricht (revised 1998).
Hecq, A., Palm, F. C., & Urbain, J. P. (1999). Separation and Weak Exogeneity in Cointegrated VAR Models with Common Features. Mimeo, University of Maastricht.
Hecq, A., Palm, F. C., & Urbain, J. P. (2000). Permanent-Transitory Decomposition in VAR Models with Cointegration and Common Cycles. Oxford Bulletin of Economics and Statistics, forthcoming.
Hoogstrate, A. J. (1998). Dynamic Panel Data Models: Theory and Macroeconomic Applications. Ph.D. Thesis, University of Maastricht.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. Mimeo, University of Cambridge.
Issler, J. V., & Vahid, F. (1996). Common Cycles in Macroeconomic Aggregates. Mimeo.
Jobert, T. (1995). Tendances et cycles communs à la consommation et au revenu: implications pour le modèle de revenu permanent. Economie et Prévision, 121, 19–38.
Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.
Kugler, P., & Neusser, K. (1993). International Real Interest Rate Equalization: A Multivariate Time-Series Approach. Journal of Applied Econometrics, 8, 163–174.
Kunst, R., & Neusser, K. (1990). Cointegration in a Macroeconomic System. Journal of Applied Econometrics, 5, 351–365.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44.
Konishi, T., & Granger, C. W. J. (1993). Separation in Cointegrated Systems. Working paper, Department of Economics, University of California, San Diego.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite Sample Properties. Working paper, Department of Economics, University of California, San Diego.
Larsson, R., & Lyhagen, J. (1999). Likelihood-Based Inference in Multivariate Panel Cointegration Models. Working paper 331, Stockholm School of Economics.
Lumsdaine, R. L., & Prasad, E. (1997). Identifying the Common Components in International Economic Fluctuations. NBER Working paper 5984.
Lütkepohl, H. (1991). Introduction to Multiple Time Series Analysis. Berlin: Springer Verlag.
McCoskey, S., & Kao, C. (1998a). A Residual-Based Test of the Null of Cointegration in Panel Data. Econometric Reviews, 17, 57–84.
McCoskey, S., & Kao, C. (1998b). A Monte Carlo Comparison of Tests for Cointegration in Panel Data. Mimeo.
O'Connell, P. (1998). The Overvaluation of Purchasing Power Parity. Journal of International Economics, 44, 1–19.
Pedroni, P. (1997a). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997b). Cross Sectional Dependence in Cointegration Tests of Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pesaran, M. H., Shin, Y., & Smith, R. P. (1997). Pooled Estimation of Long-Run Relationships in Dynamic Heterogeneous Panels. Working paper, Department of Economics, University of Cambridge.
Pesaran, M. H., & Smith, R. P. (1995). Estimating Long-Run Relationships from Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79–113.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Moon, H. (1999b). Nonstationary Panel Data Analysis: An Overview of Some Recent Developments. Econometric Reviews, forthcoming.
Singleton, K. (1980). A Latent Time Series Model of the Cyclical Behavior of Interest Rates. International Economic Review, 21, 559–575.
Summers, R., & Heston, A. (1991). The Penn World Table (Mark 5): An Expanded Set of International Comparisons, 1950–1988. Quarterly Journal of Economics, 106, 327–368.
Tiao, G. C., & Tsay, R. S. (1989). Model Specification in Multivariate Time Series. Journal of the Royal Statistical Society (Series B), 51, 157–213.
Vahid, F., & Engle, R. F. (1993). Common Trends and Common Cycles. Journal of Applied Econometrics, 8, 341–360.
Vahid, F., & Engle, R. F. (1997). Codependent Cycles. Journal of Econometrics, 80, 199–221.
Vahid, F., & Issler, J. V. (1999). The Importance of Common-Cyclical Features in VAR Analysis: A Monte-Carlo Study. Presented at ESEM99 in Madrid, Spain.
THE LOCAL POWER OF SOME UNIT ROOT TESTS FOR PANEL DATA
Jörg Breitung
ABSTRACT
To test the hypothesis of a difference stationary time series against a trend stationary alternative, Levin & Lin (1993) and Im, Pesaran & Shin (1997) suggest bias adjusted t-statistics. Such corrections are necessary to account for the nonzero mean of the t-statistic in the case of an OLS detrending method. In this chapter the local power of panel unit root statistics against a sequence of local alternatives is studied. It is shown that the local power of the test statistics is affected by two different terms. The first term represents the asymptotic effect on the bias due to the detrending method, and the second term is the usual location parameter of the limiting distribution under the sequence of local alternatives. It is argued that both terms can offset each other, so that the test has no power against the sequence of local alternatives. These results suggest constructing test statistics based on alternative detrending methods. We consider a class of t-statistics that do not require a bias correction. The results of a Monte Carlo experiment suggest that avoiding the bias can improve the power of the test substantially.
I. INTRODUCTION
In a panel data set, a variable $y_{it}$ is observed for cross section units $i = 1, \ldots, N$ in $t = 1, \ldots, T$ time periods. A well known problem with such data is
unobserved heterogeneity (e.g. Hsiao (1986) and Baltagi (1995)). In a univariate time series context, heterogeneity may result in individual specific means and short run dynamics. For illustration, consider an autoregressive process of the form

$$y_{it} = \mu_i + \alpha_i y_{i,t-1} + \varepsilon_{it}, \tag{1}$$

where the error term $\varepsilon_{it}$ is assumed to be uncorrelated across $i$ and $t$. In this model individual heterogeneity is represented by the individual specific parameters $\mu_i$, $\alpha_i$ and $\sigma_i^2 = E(\varepsilon_{it}^2)$. If there are no further assumptions on the parameters, then the data for each cross section unit can be analyzed separately by running N different regressions. In this case we take no advantage of pooling the data and, thus, inference may be very inefficient. The other extreme is to ignore possible heterogeneity altogether and estimate a pooled regression with $\mu_1 = \cdots = \mu_N$, $\alpha_1 = \cdots = \alpha_N$ and $\sigma_1^2 = \cdots = \sigma_N^2$. Of course, ignoring heterogeneity in the data may result in biased estimates (e.g. Baltagi (1995) p. 3f). Traditional panel data analysis adopts a compromise between these two extremes and assumes that individual heterogeneity can be represented by an individual specific intercept $\mu_i$ alone. Furthermore, one often encounters additional assumptions on the individual effect $\mu_i$, for example, that it is random and uncorrelated with the regressors. The latter model is known as the random-effects model.

It is not surprising that early work on tests for unit roots in panel data starts from the Dickey-Fuller type regression with individual specific intercepts (e.g. Breitung (1992)). Levin & Lin (henceforth: LL) (1993) and Im, Pesaran & Shin (henceforth: IPS) (1997) consider more general models by allowing for individual specific short run dynamics and time trends. It is well known that the usual dummy variable estimator (or within-group estimator) of dynamic models suffers from the so-called Nickell bias (Nickell 1981). The same is true if individual specific time trends are estimated by using the dummy-variable approach.
LL (1993) construct a bias adjusted t-statistic to test the null hypothesis of a unit root process. Unfortunately, bias adjusted test statistics for the model with a constant or a time trend suffer from a severe loss of power. For example, the power of the LL (1993) test without an intercept (and thus without the need to correct for the Nickell bias) against a stationary alternative with an autoregressive coefficient of 0.9 is virtually unity for N = 25 and T = 25. For the bias adjusted test statistic in the model with individual specific intercepts (trends), the power against the same alternative drops to 0.45 (0.25). Furthermore, IPS (1997) observe a serious size bias if the bias adjusted LL statistic is augmented with lagged differences.
If there is only a constant in the model, the problem is easily resolved by subtracting the first observation instead of the mean. As argued in Schmidt & Phillips (1992), the first observation is the best estimator of the constant under the hypothesis of a random walk. Furthermore, subtracting the first observation instead of the mean avoids the Nickell bias and, therefore, the test does not require a bias correction (cf. Breitung & Meyer (1994)). To study the asymptotic properties we compare the local power of the bias adjusted test statistics. Our analysis demonstrates that the local power of the test depends on two different terms. The first term represents the asymptotic effect on the bias due to the detrending method, and the second term is the usual location parameter of the limiting distribution under the sequence of local alternatives. It is shown that if the long-run variances are estimated consistently, the two terms cancel each other out, so that the test statistic is centered around zero under the local alternative. Levin & Lin (1993) suggest estimating the long-run variances with a nonparametric estimator computed from the first differences of the series. An attractive property of this approach is that under the alternative the nonparametric estimator tends to zero, so that the resulting test statistic has power against the sequence of local alternatives. A class of t-statistics is suggested that does not require a bias correction. These tests are based on the t-statistic from a simple least-squares regression of transformed variables, and it is shown that their limiting distribution is standard normal. The results of our Monte Carlo experiments suggest that avoiding the detrending bias may improve the power of the test substantially. The rest of this chapter is organized as follows. In Section II the details of the test statistics are given. The local power of the tests is analyzed in Section III.
In Section IV a class of t-statistics is suggested in order to avoid the detrending bias. Since the tests are based on asymptotic properties, it is interesting to consider their relative performance in small samples. This problem is studied in Section V by using Monte Carlo simulations. Furthermore, the actual power against a sequence of local alternatives is investigated by means of Monte Carlo simulations. Section VI offers some conclusions and makes suggestions for further research.

Finally, a word on the notational conventions applied in this chapter. A standard Brownian motion is written as $W_i(r)$. Although there are different Brownian motions for different cross section units $i$, we sometimes drop the index $i$ for convenience. This has no consequences for the final results, since they depend on the expectation of the stochastic functionals. Furthermore, if there is no risk of misunderstanding, we drop the limits and the argument $r$ (or $dr$). For example, the term $\int_0^1 r W_i(r)\,dr$ will be written compactly as $\int rW$. A detrended Brownian motion is represented as $V(r) \equiv V = W - \int W - 12(r - \tfrac{1}{2})\int (r - \tfrac{1}{2})W$. As usual in this kind of literature, we use $[a]$ to indicate the integer part of $a$. The proofs of the lemmas and theorems can be found in the working paper version (Breitung 1999).
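The constants that drive the bias corrections can be checked analytically. The sketch below is our own illustration and is not part of the chapter. It assumes that $V$ is the residual of the $L^2[0,1]$ projection of $W$ on $\{1, r\}$, whose projection kernel is $K(r,s) = 4 - 6r - 6s + 12rs$ ($K = 1$ for demeaning only). Using $E[W(r)W(s)] = \min(r,s)$ and the Itô rule $E[W(s)\,dW(r)] = 1\{r < s\}\,dr$, one obtains $E\int V^2 = 1/2 - \iint K(r,s)\min(r,s)$ and $E\int V\,dV = -\iint_{r<s} K(r,s)$. Exact rational arithmetic evaluates both. The helper names are hypothetical.

```python
from fractions import Fraction

# Kernels stored as {(a, b): coefficient}, meaning sum(coef * r**a * s**b).
DEMEAN = {(0, 0): Fraction(1)}                           # projection on {1}
DETREND = {(0, 0): Fraction(4), (1, 0): Fraction(-6),    # projection on {1, r}
           (0, 1): Fraction(-6), (1, 1): Fraction(12)}


def int_min(a, b):
    """Exact double integral of r**a * s**b * min(r, s) over [0, 1]^2."""
    below = Fraction(1, (b + 2) * (a + b + 3))                        # region s < r
    above = (Fraction(1, a + 2) - Fraction(1, a + b + 3)) / (b + 1)   # region s > r
    return below + above


def int_upper(a, b):
    """Exact integral of r**a * s**b over the triangle 0 < r < s < 1."""
    return (Fraction(1, a + 1) - Fraction(1, a + b + 2)) / (b + 1)


def moments(kernel):
    """Return (E int V dV, E int V^2) for V = W - P(W).

    int V dV equals int V dW because V is orthogonal to constants, and
    E int W dW = 0, so only the projection term contributes to the first moment."""
    e_vdv = -sum(c * int_upper(a, b) for (a, b), c in kernel.items())
    e_v2 = Fraction(1, 2) - sum(c * int_min(a, b) for (a, b), c in kernel.items())
    return e_vdv, e_v2


print(moments(DEMEAN))    # (Fraction(-1, 2), Fraction(1, 6))
print(moments(DETREND))   # (Fraction(-1, 2), Fraction(1, 15))
```

For the detrended case this gives $a = -1/2$ and $E\int V^2 = 1/15$. The value $1/15$ reappears as the slope coefficient in Lemma 3.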
II. THE TEST STATISTICS

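Assume that the variable $y_{it}$ can be represented as

$$y_{it} = \mu_i + \beta_i t + x_{it}, \qquad t = 1, 2, \ldots, T, \tag{2}$$

where $x_{it}$ is generated by the autoregressive process

$$x_{it} = \sum_{k=1}^{p+1} \alpha_{ik}\, x_{i,t-k} + \varepsilon_{it} \tag{3}$$

and $x_{is} = 0$ for $s \le 0$. It is assumed that $\varepsilon_{it}$ is white noise with $E(\varepsilon_{it}^2) = \sigma_i^2$ and $E|\varepsilon_{it}|^{2+\delta} < \infty$ for all $i, t$ and some $\delta > 0$. Furthermore, $\varepsilon_{it}$ is assumed to be independent of $\varepsilon_{js}$ for $i \ne j$ and all $t$ and $s$. The null hypothesis is that the process is difference stationary, i.e.

$$H_0:\ \rho_i \equiv \sum_{k=1}^{p+1} \alpha_{ik} - 1 = 0 \quad \text{for all } i = 1, \ldots, N. \tag{4}$$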
Under the alternative we assume that $y_{it}$ is (trend) stationary, that is, $\rho_i < 0$ for all $i$. The assumptions concerning $\varepsilon_{it}$ ensure that a functional central limit theorem applies, such that

$$T^{-1/2} \sum_{t=1}^{[rT]} \Delta x_{it} \Rightarrow \omega_i W_i(r),$$

where $W_i(r)$ is a standard Brownian motion, $\omega_i^2 = \lim_{T\to\infty} E(T^{-1} S_{iT}^2)$ and $S_{iT} = \sum_{t=1}^{T} \Delta x_{it}$ (e.g. Phillips & Solo (1992)). The parameter $\omega_i^2$ is sometimes called the long-run variance, since it is computed as $2\pi$ times the spectral density at frequency zero.

LL (1993) suggest a test procedure against the alternative $\rho_1 = \cdots = \rho_N < 0$. Let $e_{it}$ ($v_{i,t-1}$) denote the residuals from a regression of $\Delta y_{it}$ ($y_{i,t-1}$) on $1, t, \Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}$. Furthermore, let $\tilde e_{it} = e_{it}/\sigma_i$ and $\tilde v_{it} = v_{it}/\sigma_i$, where in practice $\sigma_i^2$ is estimated using the residuals $e_{it}$. The LL test is based on the bias adjusted t-statistic for $\rho = 0$ in the regression

$$\tilde e_{it} = \rho\, \tilde v_{i,t-1} + \tilde\varepsilon_{it}.$$

LL (1993) show that under the null hypothesis the ordinary t-statistic tends to minus infinity if a constant or a time trend is included in the model. Therefore, they suggest a bias adjusted test statistic given by
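$$\text{LL} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}\left[\tilde e_{it}\tilde v_{i,t-1} - (\omega_i/\sigma_i)\,a_T\right]}{b_T\sqrt{\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde v_{i,t-1}^2}}, \tag{5}$$

where $a_T$ and $b_T$ are the small sample analogs of

$$a = E\int V\,dV, \tag{6}$$

$$b^2 = \frac{\operatorname{var}\left[\int V\,dV\right]}{E\int V^2}, \tag{7}$$

and $V \equiv V(r)$ is a detrended Brownian motion. LL (1993) suggest using a nonparametric estimator¹ for $\omega_i^2$ based on the first differences of the data.

IPS (1997) relax the assumption of a common parameter under the alternative. Accordingly, model (2) is estimated for each cross section unit separately, yielding an individual specific Dickey-Fuller t-statistic $\tau_i$. The IPS statistic is given by

$$\text{IPS} = N^{-1/2}\sum_{i=1}^{N}\frac{\tau_i - m_T}{\sigma_T},$$

where $\tau_i$ is the usual augmented Dickey-Fuller t-statistic for cross section unit $i$, and $m_T$, $\sigma_T^2$ are small sample analogs of

$$m = E\left[\frac{\int V\,dV}{\left(\int V^2\right)^{1/2}}\right], \tag{8}$$

$$\sigma^2 = \operatorname{var}\left[\frac{\int V\,dV}{\left(\int V^2\right)^{1/2}}\right]. \tag{9}$$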
IPS (1997) provide tables for various values of T and the lag order p. As for the LL test, these tables assume that the panels are balanced, that is, all cross section units have the same number of time periods T.
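The IPS statistic averages standardized individual Dickey-Fuller t-statistics. Below is a minimal NumPy sketch, assuming $p = 0$ (no lagged differences) and a constant plus a linear trend in each unit. The helper names `df_t_trend` and `ips_statistic` are hypothetical. The moments passed in the usage lines are placeholders only: as noted above, the proper values of $m_T$ and $\sigma_T^2$ come from the IPS (1997) tables for the given T and p.

```python
import numpy as np


def df_t_trend(y):
    """Dickey-Fuller t-statistic with a constant and a linear trend and no
    lag augmentation (p = 0): regress dy_t on (1, t, y_{t-1}) by OLS and
    return the t-ratio of the coefficient on y_{t-1}."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                  # Delta y_t, t = 1..T
    ylag = y[:-1]                                    # y_{t-1}
    T = dy.size
    X = np.column_stack([np.ones(T), np.arange(1, T + 1), ylag])
    coef, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (T - X.shape[1])            # OLS residual variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])  # standard error of coef[2]
    return coef[2] / se


def ips_statistic(taus, m_T, var_T):
    """N^{-1/2} * sum_i (tau_i - m_T) / sigma_T with sigma_T = sqrt(var_T)."""
    taus = np.asarray(taus, dtype=float)
    return np.sum(taus - m_T) / np.sqrt(taus.size * var_T)


rng = np.random.default_rng(0)
panel = np.cumsum(rng.standard_normal((20, 101)), axis=1)   # N = 20 random walks
taus = [df_t_trend(row) for row in panel]
# Placeholder moments for illustration only; use the IPS (1997) tables in practice.
stat = ips_statistic(taus, m_T=-2.151, var_T=0.597)
print(round(float(stat), 3))
```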
III. LOCAL POWER

In this section we study the local power of alternative test procedures. The sequence of local alternatives is given by

$$y_{it} = \mu_i + \beta_i t + x_{it}, \tag{10}$$

where

$$x_{it} = \left(1 - \frac{c}{T\sqrt{N}}\right) x_{i,t-1} + \varepsilon_{it}, \qquad c > 0. \tag{11}$$

To analyze the asymptotic behavior of the tests, it is important to specify the relationship between N and T (see Phillips & Moon (1999)). For our analysis it is convenient to apply sequential limits denoted by $(T, N \to \infty)_{seq}$, wherein $T \to \infty$ is followed by $N \to \infty$. Although such an asymptotic framework is more restrictive than a joint limit and requires moment conditions that are difficult to verify (see IPS (1997)), we follow Kao (1999), Moon & Phillips (1999) and others and apply a sequential limit. Whether our results continue to hold under a joint limit theory is an interesting problem for future research. We further assume that the initial value $y_{i0}$ is fixed or stochastic with a finite variance. When the initial conditions are allowed to go into the remote past, the initial condition plays a role in the limiting distribution of the process (e.g. Phillips & Lee (1996)). In what follows, however, we neglect such complications in order to keep the analysis reasonably simple. In the following Lemma we state the important fact that under the local alternative the limiting process of $x_{it}$ is the same as under the null hypothesis.

Lemma 1: Under the local alternative given in (10)-(11) and a sequential limit $(T, N \to \infty)_{seq}$ we have $T^{-1/2} x_{i,[rT]} \Rightarrow \sigma_i W_i(r)$ for $0 \le c < \infty$.

This is an important difference from the asymptotic theory in the usual time series context, where under the local alternative the limiting process is an Ornstein-Uhlenbeck process (cf. Phillips (1987)). The probability limits of the tests depend on the parameters $\omega_i$ and $\sigma_i$. First, we consider the theoretical value of $\omega_i^2$ under the local alternative.
Lemma 2: Under the local alternative (10)-(11) we have

$$\omega_i^2 = \lim_{T\to\infty} E(T^{-1} x_{iT}^2) = \sigma_i^2.$$
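Lemma 2 should be read in the sequential framework adopted above: first $T \to \infty$, then $N \to \infty$. The following is a minimal sketch of our own, assuming the AR(1) local alternative (11) with $x_{i0} = 0$. For fixed N the scaled variance converges as $T \to \infty$ to $\sigma_i^2(1 - e^{-2g})/(2g)$ with $g = c/\sqrt{N}$, and this tends to $\sigma_i^2$ only as N grows. The function names are hypothetical.

```python
import math


def scaled_var_xT(c, N, T, sigma2=1.0):
    """Exact E(T^{-1} x_T^2) for x_t = alpha x_{t-1} + eps_t with
    alpha = 1 - c/(T sqrt(N)), x_0 = 0 and Var(eps_t) = sigma2."""
    alpha = 1.0 - c / (T * math.sqrt(N))
    if alpha == 1.0:
        return sigma2                           # random walk: E(x_T^2) = T * sigma2
    return sigma2 * (1.0 - alpha ** (2 * T)) / (T * (1.0 - alpha ** 2))


def limit_in_T(c, N, sigma2=1.0):
    """Limit of scaled_var_xT as T -> infinity with N fixed:
    sigma2 * (1 - exp(-2g)) / (2g) with g = c/sqrt(N)."""
    g = c / math.sqrt(N)
    if g == 0.0:
        return sigma2
    return sigma2 * (1.0 - math.exp(-2.0 * g)) / (2.0 * g)


# Sequential limit: first T -> infinity (large T), then N -> infinity.
for N in (25, 10**4, 10**8):
    print(N, round(scaled_var_xT(5, N, T=10**6), 4), round(limit_in_T(5, N), 4))
```

For c = 5 the scaled variance is about 0.43 when N = 25 and approaches one as N increases.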
In what follows we derive the main result by assuming that $\omega_i^2$ is estimated consistently for all values of $c \ge 0$. First, we present the local power in a model without any deterministic terms. In this case no bias adjustment is required and the test can be based on the usual t-statistic of the pooled sample (Quah 1994).

Theorem 1: Under the sequence of local alternatives given in (10)-(11) with $\mu_i = 0$ and $\beta_i = 0$, the t-statistic for $\rho = 0$ in the pooled regression $\Delta y_{it} = \rho\, y_{i,t-1} + \varepsilon_{it}$ is asymptotically distributed as $N(-c/\sqrt{2}, 1)$.

In Breitung (1999) it is shown that the same local power is obtained if the individual mean $\mu_i$ is removed by subtracting the first observation, or if in addition a common time trend $\beta_1 = \beta_2 = \cdots = \beta_N$ is assumed. Next we consider the bias corrected test statistics. Under the local alternative the bias adjusted (BA) statistic due to LL (1993) converges to the limit
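$$BA^*(c) = \lim_{N,T\to\infty}\frac{E\left(N^{-1/2}T^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde e_{it}\tilde v_{i,t-1}\right) - N^{-1/2}\sum_{i=1}^{N}(\omega_i/\sigma_i)\,a}{b\sqrt{E\left(N^{-1}T^{-2}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde v_{i,t-1}^2\right)}}.$$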
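Note that numerator and denominator are normalized so that both converge to a fixed limit. Since

$$\tilde e_{it}\tilde v_{i,t-1} = \left[\sigma_i^{-1}\varepsilon_{it} - \frac{c}{T\sqrt{N}}\,\tilde v_{i,t-1}\right]\tilde v_{i,t-1},$$

the limit can be written as

$$BA^*(c) = \lim_{N,T\to\infty}\frac{\sqrt{N}\left[N^{-1}\sum_{i=1}^{N}E(T_i) - a\right]}{b\sqrt{E\int V^2}} - \frac{c\,E\int V^2}{b\sqrt{E\int V^2}}, \tag{12}$$

where we use $\omega_i/\sigma_i = 1$ under the local alternative and

$$T_i = T^{-1}\sum_{t=1}^{T}\sigma_i^{-1}\varepsilon_{it}\tilde v_{i,t-1}.$$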
It turns out that the limit of the bias adjusted statistic depends on two different terms on the right hand side of (12). The first term is due to the detrending method represented by the statistic $T_i$. The second term is proportional to $E\int V^2$ and is similar to the usual location parameter in the asymptotic distribution under the null hypothesis. For example, in the simple regression model $y_t = \beta x_t + u_t$ with stationary variables, the location parameter is proportional to $E(x_t^2)$. It is important to notice that the expectation of $T_i$ enters the test statistic with the factor $\sqrt{N}$ and, therefore, for the asymptotic analysis the expectation must be determined with an accuracy up to $O(N^{-1/2})$. The following Lemma provides an approximation of this expectation that is sufficient for our purpose.

Lemma 3: Under the local alternative given in (10)-(11) the asymptotic expectation of $T_i$ is given by

$$\lim_{T\to\infty} E(T_i) = -0.5 + \frac{1}{15}\,\frac{c}{\sqrt{N}} + o(N^{-1/2}).$$
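Because $T_i$ is bilinear in the innovations, its expectation under (11) can also be evaluated exactly for finite T, without simulation. This is our own cross-check, not the Monte Carlo procedure used in the chapter. It assumes $p = 0$, $\sigma_i = 1$, $\mu_i = \beta_i = 0$, a zero initial value, and detrending of $y_{i,t-1}$ on a constant and a linear trend. The function name is hypothetical.

```python
import numpy as np


def expected_T_i(c_over_sqrt_n, T=200):
    """Exact finite-T value of E(T_i), T_i = T^{-1} sum_t eps_t * v_{t-1}, under (11).

    y_{s-1} = sum_{u < s} alpha**(s-1-u) eps_u, so E(eps_t y_{s-1}) = alpha**(s-1-t)
    for s > t and 0 otherwise. With v = (I - P) y_lag and P the projection on (1, t),
    E(T_i) = -T^{-1} * sum_{s > t} P[t, s] * alpha**(s-1-t)."""
    alpha = 1.0 - c_over_sqrt_n / T            # 1 - c/(T sqrt(N)) written via c/sqrt(N)
    tt = np.arange(1, T + 1)
    X = np.column_stack([np.ones(T), tt])
    P = X @ np.linalg.solve(X.T @ X, X.T)      # projection on constant and trend
    lag = tt[None, :] - 1 - tt[:, None]        # s - 1 - t (row t, column s)
    weights = np.where(lag >= 0, alpha ** np.maximum(lag, 0), 0.0)
    return -np.sum(P * weights) / T


base = expected_T_i(0.0)                       # equals -(T - 2) / (2T) = -0.495 for T = 200
slope = (expected_T_i(0.01) - base) / 0.01     # numerical derivative in c/sqrt(N)
print(round(float(base), 4), round(float(slope), 4))
```

For T = 200 the intercept is exactly $-(T-2)/(2T) = -0.495$ and the slope is about 0.064. Both are close to the limits $-1/2$ and $1/15$ and to the simulated values reported in the text.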
Since the result of Lemma 3 is crucial for the local power of the bias adjusted test, the accuracy of the approximation is investigated in a Monte Carlo experiment. First, we generate 10,000 realizations of $T_i$ by letting T = 200 and c = 5, and repeat the experiment with various values for N.² If Lemma 3 holds, a regression of the sample means of $T_i$ on $c/\sqrt{N}$ and a constant should yield an estimate of the intercept close to $-0.5$ and a slope of roughly $1/15 \approx 0.067$. Using $N \in \{30, 35, 40, \ldots, 500\}$, the following regression function was obtained for the 71 realizations:

$$\hat E(T_i) \approx \underset{(0.00060)}{-0.495} + \underset{(0.0016)}{0.0629}\,\frac{c}{\sqrt{N}},$$

where standard errors are given in parentheses. The estimated slope coefficient is only slightly smaller than 0.067 and, therefore, the approximation in Lemma 3 seems to perform fairly well in finite samples. Now we present the limiting distribution of the bias adjusted test statistic.

Theorem 2: Consider a sequence of local alternatives given in (10)-(11). If the estimator for $\omega_i$ converges weakly to $\omega_i$, the bias adjusted test statistic is asymptotically distributed as $N(0, 1)$.

It turns out that the bias adjusted test can fail to have power against the sequence of local alternatives. This finding suggests that the power may be improved by a modification that avoids the bias correction altogether. Such a modified test procedure is suggested in Section IV. It is important to notice that the test suggested by LL (1993) employs a nonparametric estimator that converges to zero for a stationary alternative. In the univariate time series context, unit root tests are inconsistent if the long-run variance is estimated by using the differences of the time series (cf. Phillips & Ouliaris (1990), Theorem 5.3). Therefore, Phillips & Perron (1988) estimate $\omega_i^2$ by using the residuals of the autoregression. In a panel data framework, however, this approach yields a test that has no power against the sequence of local alternatives.

Finally, the local power of the IPS test is investigated. As in the case of the bias adjusted statistic considered above, the probability limit of the test statistic depends on two terms. The first term is due to the detrending method and depends on
$$T_i^* = \frac{\sum_{t=1}^{T}\sigma_i^{-1}\varepsilon_{it}\tilde v_{i,t-1}}{\sqrt{\sum_{t=1}^{T}\tilde v_{i,t-1}^2}}.$$
Since this statistic is a ratio of correlated random variables, an analytic evaluation of this bias is very complicated. To obtain a suitable approximation we apply a simulation technique similar to the one used to check the reliability of Lemma 3. Using the same setup as before, the following approximation is found for the expectation of $T_i^*$:

$$\hat E(T_i^*) \approx \underset{(0.0030)}{-2.151} + \underset{(0.0077)}{0.212}\,\frac{c}{\sqrt{N}}. \tag{14}$$

This approximation can be used to compute the limiting distribution of the IPS test given in the following theorem.

Theorem 3: For a sequence of local alternatives given in (10)-(11) the IPS test is asymptotically distributed as $N(\mu_c^{IPS}, 1)$, where

$$\mu_c^{IPS} = \frac{c}{\sigma}\left[\lim_{T\to\infty}\left.\frac{\partial E(T_i^*)}{\partial (c/\sqrt{N})}\right|_{c=0} - E\left(\int V^2\right)^{1/2}\right]$$

and $\sigma$ is given in (9).
Again we find that the local power depends on two terms. Our Monte Carlo experiment suggests that the derivative of $E(T_i^*)$ is positive, so that the detrending bias implies a substantial loss of power. Using 10,000 Monte Carlo replications, the expression $E(\int V^2)^{1/2}$ is estimated as 0.243. Using the value $\sigma_{100}^2 = 0.597$, which is taken from the values reported in IPS (1997), we obtain

$$\mu_c^{IPS} = c\,(0.212 - 0.243)/\sqrt{0.597} \approx -0.0401\,c.$$
It turns out that the asymptotic mean function has a relatively small slope of roughly $-0.04$, compared to the slope of $-1/\sqrt{2} \approx -0.707$ for the case without a deterministic trend (see Theorem 1).
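The arithmetic behind the two slopes can be reproduced in a few lines. The inputs are the numbers quoted above, and the variable names are ours.

```python
import math

# Slope of the location parameter mu_c (per unit of c) in the limiting N(mu_c, 1) law.
slope_no_det = -1.0 / math.sqrt(2.0)            # Theorem 1: mu_c = -c/sqrt(2)

d_ET_star = 0.212       # slope of E(T*_i) in c/sqrt(N), from approximation (14)
e_sqrt_int_v2 = 0.243   # simulated value of E[(int V^2)^{1/2}] quoted in the text
var_100 = 0.597         # IPS variance for T = 100 quoted in the text
slope_ips = (d_ET_star - e_sqrt_int_v2) / math.sqrt(var_100)   # Theorem 3

print(f"{slope_no_det:.3f} {slope_ips:.4f}")    # -0.707 -0.0401
```

The square root of 0.597 in the denominator reflects the standardization of the IPS statistic by $\sigma_T$ rather than by the variance itself.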
IV. TEST STATISTICS WITHOUT BIAS ADJUSTMENT

From the local power analysis we found that the bias corrections used for the LL and IPS tests may imply a severe loss of power. It is therefore desirable to avoid the bias term when constructing the t-statistics. If the model includes only a constant, such an unbiased statistic is easily obtained by subtracting the first observation instead of the individual mean. This is the approach used in Breitung & Meyer (1994). In this section we consider a class of test statistics that do not involve a bias term.³ To facilitate the exposition, we assume that the data are generated by an AR(1) process and, thus, no augmentation with lagged differences is needed. For higher order processes, $\Delta y_{it}$ and $y_{i,t-1}$ are replaced by the residuals from the regressions of $\Delta y_{it}$ and $y_{i,t-1}$ on $\Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}$. Furthermore, to correct for individual specific variances, the series are adjusted as in the case of the LL statistic.

The idea is to transform the variables $\Delta y_{it}$ and $y_{i,t-1}$ such that the usual regression t-statistic can be used to test the unit root hypothesis. For this purpose we define the $T \times 1$ vectors $\Delta y_i = [\Delta y_{i1}, \ldots, \Delta y_{iT}]'$ and $x_i = [y_{i0}, \ldots, y_{i,T-1}]'$. In order to construct the test statistic we use the transformed vectors $y_i^* = A\,\Delta y_i = [y_{i1}^*, \ldots, y_{iT}^*]'$ and $x_i^* = B x_i = [x_{i1}^*, \ldots, x_{iT}^*]'$ such that

$$E(y_{it}^*\, x_{it}^*) = 0 \tag{15}$$
for all $i$ and $t$. Imposing further assumptions to rule out degenerate cases, it is possible to show that a t-statistic based on the transformed variables has a standard normal limiting distribution.

Theorem 4: Let $\Delta y_{it}$ be white noise with $E(\Delta y_{it}) = \beta_i$, $E(\Delta y_{it} - \beta_i)^2 = \sigma_i^2 > 0$ and $E(\Delta y_{it} - \beta_i)^4 < \infty$. Under assumption (15) and

$$\lim_{T\to\infty} E(T^{-1} y_i^{*\prime} y_i^*) > 0, \qquad \lim_{T\to\infty} E(T^{-1} x_i^{*\prime} A A' x_i^*) > 0,$$

the statistic

$$UB = \frac{\sum_{i=1}^{N}\sigma_i^{-2}\, y_i^{*\prime} x_i^*}{\sqrt{\sum_{i=1}^{N}\sigma_i^{-2}\, x_i^{*\prime} A A' x_i^*}}$$

has a standard normal limiting distribution as $(N, T \to \infty)_{seq}$.

A simple way to satisfy assumption (15) is to use an upper triangular matrix $A$ in which the elements of each row sum to zero. In other words, only the present and future observations are used to transform the differences $\Delta y_{it}$. A well known example of such a transformation is the Helmert transformation given by

$$y_{it}^* = s_t\left[\Delta y_{it} - \frac{1}{T-t}\left(\Delta y_{i,t+1} + \cdots + \Delta y_{iT}\right)\right], \qquad t = 1, 2, \ldots, T-1, \tag{16}$$
where $s_t^2 = (T-t)/(T-t+1)$. This transformation is also used in Arellano & Bover (1995), for example. An important property of this transformation is that whenever $\Delta y_{it}$ is a white noise process with constant variance, the same is true for $y_{it}^*$. Obviously, if $y_{it}$ is a random walk with an (individual specific) time trend, then $y_{it}^*$ has zero mean and is uncorrelated with $y_{i,t-1}$. The matrix $B$ is chosen such that $E(x_{it}^*) = 0$ and $E(y_{it}^* x_{it}^*) = 0$. A possible transformation with the desired properties is

$$x_{it}^* = y_{i,t-1} - y_{i1} - \frac{t-1}{T}\, y_{iT}. \tag{17}$$

Note that $T^{-1} y_{iT} = T^{-1}\sum_{t=1}^{T}\Delta y_{it}$ is an estimator of $\beta_i$ and, thus, the transformed variable $x_{it}^*$ is adjusted for a time trend. It is easy to verify that in this case $y_{it}^*$ and $x_{it}^*$ are uncorrelated. Furthermore, since the transformation matrix $A$ corresponding to the Helmert transformation (16) satisfies $AA' = I$, we conclude from Theorem 4 that the t-statistic for $H_0: \rho^* = 0$ in the pooled regression

$$y_{it}^* = \rho^* x_{it}^* + e_{it}^*, \qquad t = 2, 3, \ldots, T-1, \tag{18}$$

has a standard normal limiting distribution. To compute the local power function of this test statistic we need an approximation for

$$E(T_i^*) = E\left(T^{-1}\sum_{t=1}^{T-1} y_{it}^* x_{it}^*\right)$$

that is accurate up to $O(N^{-1/2})$. As for the LL and IPS statistics, such an approximation is obtained by fitting a regression function to the simulated values of $T_i^*$:

$$\hat E(T_i^*) \approx \underset{(0.0021)}{0.0104} - \underset{(0.0104)}{0.0407}\,\frac{c}{\sqrt{N}}. \tag{19}$$
Since the test statistic is constructed to have an expectation of zero under the null hypothesis, we expect to find a constant close to zero. The estimated constant is indeed quite small but nevertheless significant. The slope coefficient is significantly negative, so the test seems to have a local power larger than the size. The following theorem presents further details on the local power of the UB statistic.

Theorem 5: For a sequence of local alternatives given in (10)-(11) the UB test is asymptotically distributed as $N(\mu_c^{UB}, 1)$, where

$$\mu_c^{UB} = c\sqrt{6}\,\lim_{T\to\infty}\left.\frac{\partial E(T_i^*)}{\partial (c/\sqrt{N})}\right|_{c=0}.$$
It is interesting to compare the local power of the IPS and the UB test. Since $\sqrt{6} \times 0.0407 \approx 0.0997 > 2 \times 0.0401$, the location parameter of the UB statistic is more than twice as large in absolute value as that of the IPS statistic. Again, however, we emphasize that this comparison is inappropriate, because the IPS test is more general than the UB test in that it allows for a heterogeneous autoregressive parameter under the alternative.
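The UB statistic relies on two properties of the Helmert transformation (16): its rows sum to zero, and $AA' = I$. Both are easy to verify numerically. The sketch below uses NumPy. `helmert_matrix` is a hypothetical helper, and $A$ has dimension $(T-1) \times T$ as implied by $t = 1, \ldots, T-1$ in (16).

```python
import numpy as np


def helmert_matrix(T):
    """(T-1) x T matrix A of the Helmert transformation (16).

    Row t (t = 1, ..., T-1) has s_t in column t and -s_t/(T-t) in columns
    t+1, ..., T, with s_t^2 = (T-t)/(T-t+1); columns before t are zero,
    so A is upper triangular and uses only present and future differences."""
    A = np.zeros((T - 1, T))
    for t in range(1, T):
        s_t = np.sqrt((T - t) / (T - t + 1))
        A[t - 1, t - 1] = s_t
        A[t - 1, t:] = -s_t / (T - t)
    return A


T = 6
A = helmert_matrix(T)
print(np.allclose(A @ A.T, np.eye(T - 1)))    # True: orthonormal rows, AA' = I
print(np.allclose(A @ np.ones(T), 0.0))       # True: each row sums to zero

# A constant drift beta in Delta y_it is annihilated, so y* = A(beta + eps) = A eps.
rng = np.random.default_rng(0)
eps = rng.standard_normal(T)
print(np.allclose(A @ (0.7 + eps), A @ eps))  # True
```

Because $AA' = I$, white noise with constant variance stays white noise after the transformation. This is why the denominator of the UB statistic reduces to the ordinary regression standard error.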
V. SMALL SAMPLE PROPERTIES

The asymptotic properties of the tests do not depend on the number of lagged differences that are used to account for higher order autoregressive models. However, as noted by IPS (1997), for a small number of time periods T the null distribution may be substantially affected by the augmentation lag. They therefore present tables for the mean and the variance of $\tau_i$ that depend on the type of deterministics (constant/trend), the number of time periods T and the augmentation lag p. From the usual Dickey-Fuller test for univariate time series it is known that the power of the test deteriorates substantially with an increasing augmentation lag. It is therefore expected that the power of panel unit root tests is also affected by the choice of the augmentation lag. To study the robustness of the size and power of the tests considered in the previous sections, we generate time series according to the process

$$x_{it} = \alpha\, x_{i,t-1} + \varepsilon_{it} \tag{20}$$
and $y_{it} = \mu_i + \beta_i t + x_{it}$. The initial values of the process are set equal to zero. The errors are i.i.d. with $\varepsilon_{it} \sim N(0, 1)$. Since all tests are invariant to the parameters $\mu_i$ and $\beta_i$, these parameters are set equal to zero. For the bias and variance corrections of the LL and IPS tests, the tabulated values in LL (1993) and IPS (1997) are used. To represent a typical regional panel data set, we let T = 30 (years) and N = 20 (countries). All rejection frequencies are computed from 1000 realizations with a nominal significance level of 0.05.

Table 1 presents the rejection frequencies for the different tests. For p > 0 the LL test turns out to be quite conservative. This was also observed by IPS (1997) and, therefore, the values for the mean and variance of this test should also be tabulated for different augmentation lags. With respect to power, the LL and IPS tests perform roughly similarly for p = 0. For p > 0 the IPS test is more powerful than the LL test, at least if the critical values of the LL test are not adjusted for different augmentation lags. The UB statistic suggested in Section IV appears to be substantially more powerful than the LL and IPS tests. Furthermore, the size of the UB test is fairly robust with respect to the augmentation lag. Notice that for the UB test no tables are required for different values of p and T.

Table 1. Empirical size and power for T = 30 and N = 20

  α        LL      IPS     UB
  p = 0
  1.00    0.025   0.046   0.073
  0.95    0.048   0.076   0.127
  0.90    0.189   0.198   0.396
  0.80    0.801   0.723   0.897
  p = 1
  1.00    0.005   0.053   0.069
  0.95    0.009   0.077   0.213
  0.90    0.041   0.152   0.417
  0.80    0.277   0.544   0.807
  p = 2
  1.00    0.001   0.045   0.038
  0.95    0.001   0.072   0.147
  0.90    0.001   0.118   0.260
  0.80    0.002   0.365   0.508
  p = 3
  1.00    0.000   0.040   0.053
  0.95    0.000   0.056   0.195
  0.90    0.000   0.107   0.266
  0.80    0.000   0.257   0.418

Note: Empirical sizes (α = 1.00) and powers computed from 1000 Monte Carlo replications of model (20). p denotes the number of lagged differences. The nominal size is 0.05.

In the next Monte Carlo experiment we consider the validity of the theoretical results for the actual power of the test. For this purpose we set $\alpha = 1 - 20/(T\sqrt{N})$. If the test does not have power against such an alternative, we expect the power of the test to tend to the size as $N \to \infty$ and $T \to \infty$. In our Monte Carlo comparison we also include a variant of the LL test that estimates the long-run variances by using the regression residuals instead of the first differences of the process. As shown in Section III, such a test has a local power equal to the size. The critical values for this test are computed by Monte Carlo simulations. The respective test is denoted as LL*. Table 2 presents the outcome of this Monte Carlo experiment. As predicted by Theorem 2, the power of the LL* test is close to the size for all N and T. All other tests appear to converge to a limit larger than the size, where the limiting power of the UB test is nearly twice as large as the limiting power of the IPS test. The original LL test turns out to have power against the local alternative, but its power is substantially smaller than the power of the IPS and UB statistics.

Table 2. Power against local alternatives

     N      T      LL      LL*     IPS     UB
  N and T → ∞
    25     25    0.378   0.064   0.384   0.668
    50     50    0.269   0.056   0.300   0.660
    70     70    0.210   0.033   0.296   0.608
   100    100    0.170   0.050   0.261   0.579
  T fixed, N → ∞
    50     25    0.235   0.038   0.342   0.575
    70     25    0.156   0.038   0.313   0.535
   100     25    0.090   0.028   0.273   0.450
  N fixed, T → ∞
    25     50    0.415   0.061   0.419   0.724
    25     70    0.378   0.020   0.421   0.742
    25    100    0.298   0.028   0.402   0.783

Note: This table reports the rejection rates computed from 1000 replications of model (20) with α = 1 - 20/(T√N). The significance level is 0.05. The statistic LL* is constructed similarly to the LL test but uses the residuals from the autoregressions to estimate $\omega_i^2$. For this test the values for the expectation and variance are computed by additional Monte Carlo simulations.

The findings of the Monte Carlo experiment can be compared to the results of our theoretical analysis. From Theorem 3 the IPS test is expected to have a limiting power of $\Phi(-1.645 + 20 \times 0.0401) = 0.199$, where $\Phi(\cdot)$ denotes the c.d.f. of the standard normal distribution. The empirical power for N = 100 and T = 100 is 0.261, which is higher than the predicted power based on Theorem 3. This may be due to the simulation error when using (14). An analogous calculation using the results for the UB statistic yields a limiting power of $\Phi(-1.645 + 20 \times 0.0997) = 0.636$. Since the empirical power for N = 100 and T = 100 is 0.579, the value derived from Theorem 5 using (19) tends to be too high. Finally, it is interesting to note that the power of the tests appears to deteriorate with fixed T and increasing N. For the LL test the local power seems to tend slowly to the size as N is fixed and $T \to \infty$.
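The predicted limiting powers follow from $\Phi(-1.645 + c\,|\text{slope}|)$ with c = 20. The short sketch below reproduces them with the standard library only. `norm_cdf` is our own helper based on `math.erf`.

```python
import math


def norm_cdf(x):
    """Standard normal c.d.f. Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


z_05 = -1.645                       # 5% critical value; the tests reject for small values
c = 20                              # alpha = 1 - 20/(T sqrt(N)) in Table 2

slope_ips = 0.0401                  # |mu_c^IPS| / c from Theorem 3
slope_ub = math.sqrt(6) * 0.0407    # |mu_c^UB| / c from Theorem 5 and (19)

# Under N(mu_c, 1) with mu_c = -slope * c, P(reject) = Phi(z_05 + slope * c).
power_ips = norm_cdf(z_05 + c * slope_ips)
power_ub = norm_cdf(z_05 + c * slope_ub)
print(f"{slope_ub:.4f} {power_ips:.2f} {power_ub:.2f}")   # 0.0997 0.20 0.64
```

These values match the limiting powers 0.199 and 0.636 quoted above, up to rounding.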
VI. CONCLUSION
In this chapter we have considered the local power of some well known tests and of a new test for unit roots in panel data. We found that the LL and IPS tests suffer from a severe loss of power if individual specific trends are included. Therefore, a class of test statistics is suggested that does not employ a bias adjustment, and the power of this test is found to be substantially higher than that of the LL and IPS tests. Furthermore, it turns out that the LL test is very sensitive to the augmentation lag. It is therefore recommended to use tables for the mean and variance that take the lag augmentation of the test into account. The results further indicate that the power of the tests is very sensitive to the specification of the deterministic terms. If there is only a constant or a joint linear trend, then subtracting the first observation yields a very powerful test. Including individual specific trends when they are unnecessary leads to a dramatic loss of power. Hence, in practice it is desirable to have a test for a common deterministic trend against the alternative of individual specific time trends. As pointed out by a referee, there are other detrending methods that may be used to construct an improved test procedure. A natural candidate is the quasi-difference detrending suggested by Elliot, Rothenberg & Stock (1996) (see also Phillips & Xiao (1998)). Unfortunately, it can be shown that a t-statistic computed from quasi-differenced data also suffers from a (Nickell type) bias, so that again a bias correction is required to obtain a reasonable test procedure. Nevertheless, a test procedure based on quasi differences may perform better than test procedures with OLS detrending. In this chapter our strategy is to avoid the bias term altogether. The comparison of our approach to a test procedure based on quasi differences is left for future research.
JÖRG BREITUNG
ACKNOWLEDGMENTS
The research for this paper was carried out within the SFB 373 at the Humboldt University Berlin and the METEOR research project Dynamic and Nonstationary Panels: Theoretical and Empirical Issues. I thank Carsten Trenkler and two referees for their helpful comments and suggestions.
NOTES
1. In LL (1993) the test statistic is divided by $\hat\sigma_{NT}$, which is computed as the overall standard deviation of $\tilde e_{it}$. However, since $\tilde e_{it}$ is already adjusted for its standard deviation, we can drop $\hat\sigma_{NT}$ when computing the test statistic.
2. I repeated the experiment for different values of c and T. The results turn out to be fairly robust.
3. Another possibility is to use alternative estimation methods like the Generalized Method of Moments (GMM). Breitung (1997) applies second differences and obtains a unit root test without bias adjustment by using an appropriate GMM estimator.
REFERENCES
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental-Variable Estimation of Error-Components Models. Journal of Econometrics, 68, 29–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley and Sons.
Breitung, J. (1992). Dynamische Modelle für die Paneldatenanalyse (Dynamic Models for the Analysis of Panel Data). PhD dissertation, Haag + Herchen, Frankfurt.
Breitung, J. (1997). Testing for Unit Roots in Panel Data Using a GMM Approach. Statistical Papers, 38, 253–269.
Breitung, J. (1999). The Local Power of Some Unit Root Tests for Panel Data. SFB 373 Discussion paper, No. 69/1999, Humboldt University Berlin.
Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different Bargaining Levels Cointegrated? Applied Economics, 26, 353–361.
Cheung, Y.-W., & Lai, K. S. (1995). Lag Order and Critical Values of the Augmented Dickey-Fuller Test. Journal of Business and Economic Statistics, 13, 277–280.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series With a Unit Root. Journal of the American Statistical Association, 74, 427–431.
Elliott, G., Rothenberg, T. J., & Stock, J. H. (1996). Efficient Tests for an Autoregressive Unit Root. Econometrica, 64, 813–836.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. DAE Working paper, No. 9526, University of Cambridge, revised version.
Kao, C. (1999). Spurious Regression and Residual-based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties. Working paper, Department of Economics, University of California San Diego.
Moon, H. R., & Phillips, P. C. B. (1999). Estimation of Autoregressive Roots Near Unity Using Panel Data. Mimeo, Yale University.
Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417–1426.
Phillips, P. C. B. (1987). Towards a Unified Asymptotic Theory of Autoregression. Biometrika, 74, 535–548.
Phillips, P. C. B., & Lee, C. C. (1996). Efficiency Gains from Quasi-Differencing Under Nonstationarity. In: P. M. Robinson & M. Rosenblatt (Eds), Essays in Memory of E. J. Hannan (pp. 300–314).
Phillips, P. C. B., & Moon, H. R. (1999). Linear Regression Limit Theory for Nonstationary Panel Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Ouliaris, S. (1990). Asymptotic Properties of Residual Based Tests for Cointegration. Econometrica, 58, 165–193.
Phillips, P. C. B., & Perron, P. (1988). Testing for a Unit Root in Time Series Regression. Biometrika, 75, 335–346.
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. Annals of Statistics, 20, 971–1001.
Phillips, P. C. B., & Xiao, Z. (1998). A Primer on Unit Root Testing. Journal of Economic Surveys, 12, 423–467.
Quah, D. (1994). Exploiting Cross-Section Variation for Unit Root Inference in Dynamic Data. Economics Letters, 44, 9–19.
Schmidt, P., & Phillips, P. C. B. (1992). LM Test for a Unit Root in the Presence of Deterministic Trends. Oxford Bulletin of Economics and Statistics, 54, 257–287.
ON THE ESTIMATION AND INFERENCE OF A COINTEGRATED REGRESSION IN PANEL DATA

Chihwa Kao and Min-Hsien Chiang
ABSTRACT
In this chapter, we study the asymptotic distributions of the ordinary least squares (OLS), fully modified OLS (FMOLS), and dynamic OLS (DOLS) estimators in cointegrated regression models in panel data. We show that the OLS, FMOLS, and DOLS estimators are all asymptotically normally distributed. However, the asymptotic distribution of the OLS estimator is shown to have a non-zero mean. Monte Carlo results illustrate the sampling behavior of the proposed estimators and show that (1) the OLS estimator has a non-negligible bias in finite samples, (2) the FMOLS estimator does not improve over the OLS estimator in general, and (3) the DOLS estimator outperforms both the OLS and FMOLS estimators.
I. INTRODUCTION
Evaluating the statistical properties of data along the time dimension has proven to be very different from analysis of the cross-section dimension. As economists have gained access to better data with more observations across time, understanding these properties has grown increasingly important. An area of particular concern in time-series econometrics has been the use of nonstationary data. With the desire to study the behavior of cross-sectional data
CHIHWA KAO & MIN-HSIEN CHIANG
over time and the increasing use of panel data, e.g. the Summers and Heston (1991) data, one new research area is examining the properties of nonstationary time-series data in panel form. It is an intriguing question to ask: how exactly does this hybrid style of data combine the statistical elements of traditional cross-sectional analysis and time-series analysis? In particular, what is the correct way to analyze nonstationarity, the spurious regression problem, and cointegration in panel data? Given the immense interest in testing for unit roots and cointegration in time-series data, not much attention has been paid to testing for unit roots in panel data. The only theoretical studies we know of in this area are Breitung & Meyer (1994); Quah (1994); Levin & Lin (1993); Im, Pesaran & Shin (1995); and Maddala & Wu (1999). Breitung & Meyer (1994) derived the asymptotic normality of the Dickey-Fuller test statistic for panel data with a large cross-section dimension and a small time-series dimension. Quah (1994) studied a unit root test for panel data that simultaneously have extensive cross-section and time-series variation. He showed that the asymptotic distribution for the proposed test is a mixture of the standard normal and Dickey-Fuller-Phillips asymptotics. Levin & Lin (1993) derived the asymptotic distributions of unit root tests for panel data and showed that the power of these tests increases dramatically as the cross-section dimension increases. Im et al. (1995) critiqued the Levin and Lin panel unit root statistics and proposed alternatives. Maddala & Wu (1999) provided a comparison of the tests of Im et al. (1995) and Levin & Lin (1993), and suggested a new test based on the Fisher test. Recently, some attention has been given to cointegration tests and estimation of regression models in panel data, e.g. Kao (1999), McCoskey & Kao (1998), Pedroni (1996, 1997) and Phillips & Moon (1999).
Kao (1999) studied a spurious regression in panel data, along with the asymptotic properties of the ordinary least squares (OLS) estimator and other conventional statistics. Kao showed that the OLS estimator is consistent for its true value, but the t-statistic diverges, so that inferences about the regression coefficient, $\beta$, are wrong with a probability that goes to one. Furthermore, Kao examined the Dickey-Fuller (DF) and augmented Dickey-Fuller (ADF) tests of the null hypothesis of no cointegration in panel data. McCoskey & Kao (1998) proposed further tests for the null hypothesis of cointegration in panel data. Pedroni (1997) derived asymptotic distributions for residual-based tests of cointegration for both homogeneous and heterogeneous panels. Pedroni (1996) proposed a fully modified estimator for heterogeneous panels. Phillips & Moon (1999) developed both sequential limit and joint limit theories for nonstationary panel data. Pesaran & Smith (1995) are not directly concerned with cointegration but do touch on a number of related issues, including the potential
Panel Cointegration
problems of homogeneity misspecification for cointegrated panels. See the survey paper by Baltagi & Kao (2000) in this volume. This chapter makes two main contributions. First, it adds to the literature by suggesting a computationally simpler dynamic OLS (DOLS) estimator in panel cointegrated regression models. Second, it provides a serious study of the finite sample properties of the OLS, fully modified OLS (FMOLS), and DOLS estimators. Section 2 introduces the model and assumptions. Section 3 develops the asymptotic theory for the OLS, FMOLS and DOLS estimators. Section 4 gives the limiting distributions of the FMOLS and DOLS estimators for heterogeneous panels. Section 5 presents Monte Carlo results to illustrate the finite sample properties of the OLS, FMOLS, and DOLS estimators. Section 6 summarizes the findings. The proofs of Theorems 1, 2, and 4 are not presented, since they can be found in Phillips & Moon (1999) and Pedroni (1997). The appendix contains the proofs of Theorems 3 and 5.

A word on notation. We write the integral $\int_0^1 W(s)\,ds$ as $\int W$ when there is no ambiguity over limits. We define $\Omega^{1/2}$ to be any matrix such that $\Omega = (\Omega^{1/2})(\Omega^{1/2})'$. We use $\|A\|$ to denote $\{\mathrm{tr}(A'A)\}^{1/2}$, $|A|$ to denote the determinant of $A$, $\Rightarrow$ to denote weak convergence, $\xrightarrow{p}$ to denote convergence in probability, $[x]$ to denote the largest integer $\le x$, I(0) and I(1) to signify a time series that is integrated of order zero and one, respectively, and $BM(\Omega)$ to denote Brownian motion with covariance matrix $\Omega$.
II. THE MODEL AND ASSUMPTIONS

Consider the following fixed effect panel regression:
$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}, \qquad i = 1, \ldots, N,\; t = 1, \ldots, T, \tag{1}$$
where $\{y_{it}\}$ are $1 \times 1$, $\beta$ is a $k \times 1$ vector of the slope parameters, $\{\alpha_i\}$ are the intercepts, and $\{u_{it}\}$ are the stationary disturbance terms. We assume that $\{x_{it}\}$ are $k \times 1$ integrated processes of order one for all $i$, where $x_{it} = x_{i,t-1} + \varepsilon_{it}$. Under these specifications, (1) describes a system of cointegrated regressions, i.e. $y_{it}$ is cointegrated with $x_{it}$. The initialization of this system is $y_{i0} = x_{i0} = O_p(1)$ as $T \to \infty$, for all $i$. The individual constant term $\alpha_i$ can be extended into general deterministic time trends such as $\alpha_{0i} + \alpha_{1i}t + \cdots + \alpha_{pi}t^p$.

Assumption 1. The asymptotic theory employed in this paper is a sequential limit theory established by Phillips & Moon (1999) in which $T \to \infty$ followed by $N \to \infty$.
Next, we characterize the innovation vector $w_{it} = (u_{it}, \varepsilon_{it}')'$. We assume that $w_{it}$ is a linear process that satisfies the following assumption.

Assumption 2. For each $i$, we assume:
(a) $w_{it} = \Pi(L)\eta_{it} = \sum_{j=0}^{\infty}\Pi_j\eta_{i,t-j}$, $\sum_{j=0}^{\infty} j^a\|\Pi_j\| < \infty$, $|\Pi(1)| \neq 0$, for some $a > 1$;
(b) $\eta_{it}$ is i.i.d. with zero mean, variance matrix $\Sigma_\eta$, and finite fourth-order cumulants.

Assumption 2 implies (e.g. Phillips & Solo, 1992) that the partial sum process $\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]} w_{it}$ satisfies the following multivariate invariance principle:
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]} w_{it} \Rightarrow B_i(r) = BM_i(\Omega) \quad \text{as } T \to \infty \text{ for all } i, \tag{2}$$
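where $B_i = (B_{ui}, B_{\varepsilon i}')'$. The long-run covariance matrix of $\{w_{it}\}$ is given by
$$\Omega = \sum_{j=-\infty}^{\infty} E(w_{ij}w_{i0}') = \Pi(1)\Sigma_\eta\Pi(1)' = \Sigma + \Gamma + \Gamma' = \begin{pmatrix} \Omega_u & \Omega_{u\varepsilon} \\ \Omega_{\varepsilon u} & \Omega_\varepsilon \end{pmatrix}, \tag{3}$$
where
$$\Gamma = \sum_{j=1}^{\infty} E(w_{ij}w_{i0}') = \begin{pmatrix} \Gamma_u & \Gamma_{u\varepsilon} \\ \Gamma_{\varepsilon u} & \Gamma_\varepsilon \end{pmatrix} \quad\text{and}\quad \Sigma = E(w_{i0}w_{i0}') = \begin{pmatrix} \Sigma_u & \Sigma_{u\varepsilon} \\ \Sigma_{\varepsilon u} & \Sigma_\varepsilon \end{pmatrix} \tag{4}$$
are partitioned conformably with $w_{it}$.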
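Assumption 3. $\Omega_\varepsilon$ is non-singular, i.e. $\{x_{it}\}$ are not cointegrated.

Define
$$\Omega_{u.\varepsilon} = \Omega_u - \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\Omega_{\varepsilon u}. \tag{5}$$
Then, $B_i$ can be rewritten as
$$B_i = \begin{pmatrix} B_{ui} \\ B_{\varepsilon i} \end{pmatrix} = \begin{pmatrix} \Omega_{u.\varepsilon}^{1/2} & \Omega_{u\varepsilon}\Omega_\varepsilon^{-1/2} \\ 0 & \Omega_\varepsilon^{1/2} \end{pmatrix}\begin{pmatrix} V_i \\ W_i \end{pmatrix}, \tag{6}$$
where $(V_i, W_i')' = BM(I)$ is a standardized Brownian motion. Define the one-sided long-run covariance
$$\Delta = \Sigma + \Gamma = \sum_{j=0}^{\infty} E(w_{ij}w_{i0}') = \begin{pmatrix} \Delta_u & \Delta_{u\varepsilon} \\ \Delta_{\varepsilon u} & \Delta_\varepsilon \end{pmatrix}.$$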
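The following short check shows why (6) holds and why $\Omega_{u.\varepsilon}$ acts as the long-run variance of $u_{it}$ conditional on $\varepsilon_{it}$. It assumes symmetric square roots, so that $\Omega_\varepsilon^{-1/2}\Omega_\varepsilon^{-1/2} = \Omega_\varepsilon^{-1}$. It also uses the fact that $V_i(1)$ and $W_i(1)$ are independent with identity covariance.

```latex
\begin{aligned}
\operatorname{Var}\big(B_{ui}(1)\big)
  &= \Omega_{u.\varepsilon} + \Omega_{u\varepsilon}\Omega_\varepsilon^{-1/2}\Omega_\varepsilon^{-1/2}\Omega_{\varepsilon u}
   = \Omega_{u.\varepsilon} + \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\Omega_{\varepsilon u} = \Omega_u, \\
\operatorname{Cov}\big(B_{ui}(1), B_{\varepsilon i}(1)\big)
  &= \Omega_{u\varepsilon}\Omega_\varepsilon^{-1/2}\Omega_\varepsilon^{1/2} = \Omega_{u\varepsilon}, \qquad
\operatorname{Var}\big(B_{\varepsilon i}(1)\big) = \Omega_\varepsilon .
\end{aligned}
```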
Here we assume that the panels are homogeneous, i.e. the variances are constant across the cross-section units. We will relax this assumption in Section 4 to allow for different variances for different i.

Remark 1. The benefits of using panel data models have been discussed extensively by Hsiao (1986) and Baltagi (1995), though Hsiao and Baltagi assume that the time dimension is small while the cross-section dimension is large. However, in international trade, open macroeconomics, urban and regional economics, public finance, and finance, panel data usually have long time-series and cross-section dimensions. The data of Summers & Heston (1991) are a notable example.
Remark 2. The advantage of using the sequential limit theory is that it offers a quick and easy way to derive the asymptotics, as demonstrated by Phillips & Moon (1999). Phillips & Moon also provide detailed treatments of the connections between the sequential limit theory and the joint limit theory.

Remark 3. If one wants to obtain a consistent estimate of $\beta$ in (1) or wants to test some restrictions on $\beta$, then an individual time-series regression or a multiple time-series regression is probably enough. So what are the advantages of using the (N, T) asymptotics, e.g. the sequential asymptotics in Assumption 1, instead of T asymptotics? One of the advantages is that we can get a normal approximation of the limit distributions of the estimators and test statistics with the convergence rate $\sqrt{N}T$. More importantly, the biases of the estimators and test statistics can be reduced when N and T are large. For example, later in this paper we will show that the biases of the OLS, FMOLS, and DOLS estimators in Table 2 were reduced by half when the sample size was changed from (N = 1, T = 20) to (N = 20, T = 20). However, in order to obtain asymptotic normality using the (N, T) asymptotics we need to make some strong assumptions; for example, in this paper we assume that the error terms are independent across i.

Remark 4. The results in this chapter require that the regressors are not cointegrated. Assuming that I(1) regressors are not cointegrated with each other is indeed restrictive. The authors are currently investigating this issue.
III. OLS, FMOLS, AND DOLS ESTIMATORS

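Let us first study the limiting distribution of the OLS estimator for equation (1). The OLS estimator of $\beta$ is
$$\hat\beta_{OLS} = \left[\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1}\left[\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)(y_{it} - \bar y_i)\right]. \tag{7}$$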
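Estimator (7) is the pooled within (fixed-effects) regression. The following minimal sketch implements it for a single regressor (k = 1), which is a simplification made here so that no matrix inversion is needed. Each unit's time means are removed before the cross-products are pooled across units. The function name is illustrative.

```python
from typing import List


def within_ols(y: List[List[float]], x: List[List[float]]) -> float:
    """Pooled within OLS slope, eq. (7), for a scalar regressor.

    y[i][t] and x[i][t] hold unit i's observation at time t.
    """
    sxx = 0.0  # sum_i sum_t (x_it - xbar_i)^2
    sxy = 0.0  # sum_i sum_t (x_it - xbar_i)(y_it - ybar_i)
    for yi, xi in zip(y, x):
        xbar = sum(xi) / len(xi)
        ybar = sum(yi) / len(yi)
        for yt, xt in zip(yi, xi):
            sxx += (xt - xbar) ** 2
            sxy += (xt - xbar) * (yt - ybar)
    return sxy / sxx


# Two units with different intercepts (1 and 5) but a common slope of 2:
x = [[0.0, 1.0, 3.0], [2.0, 2.5, 4.0]]
y = [[1.0 + 2.0 * v for v in x[0]], [5.0 + 2.0 * v for v in x[1]]]
print(within_ols(y, x))
```

Demeaning by unit removes $\alpha_i$. The example therefore recovers the common slope of 2 even though the intercepts differ.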
All the limits in Theorems 1–6 are taken as $T \to \infty$ followed by $N \to \infty$ sequentially, as in Assumption 1. First, we present the following theorem:

Theorem 1. If Assumptions 1–3 hold, then
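(a) $T(\hat\beta_{OLS} - \beta) \xrightarrow{p} -3\Omega_\varepsilon^{-1}\Omega_{\varepsilon u} + 6\Omega_\varepsilon^{-1}\Delta_{\varepsilon u}$,
(b) $\sqrt{N}T(\hat\beta_{OLS} - \beta) - \sqrt{N}\delta_{NT} \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon})$,
where
$$\delta_{NT} = \left[\frac{1}{N}\sum_{i=1}^N \frac{1}{T^2}\sum_{t=1}^T (x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^N\left(\Omega_\varepsilon^{1/2}\int \tilde W_i\,dW_i'\,\Omega_\varepsilon^{-1/2}\Omega_{\varepsilon u} + \Delta_{\varepsilon u}\right)\right]$$
and $\tilde W_i = W_i - \int W_i$.

The normality of the OLS estimator in Theorem 1 comes naturally: when summing across i, the non-standard asymptotic distribution due to the unit root in the time dimension is smoothed out. From Theorem 1 we note an interesting interpretation of the asymptotic covariance matrix, $6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}$: the term $\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}$ can be seen as the long-run noise-to-signal ratio. We also note that $-\frac{1}{2}\Omega_{\varepsilon u}$ is due to the endogeneity of the regressor $x_{it}$, and $\Delta_{\varepsilon u}$ is due to the serial correlation. (The first term inside $\delta_{NT}$ has mean $-\frac{1}{2}\Omega_{\varepsilon u}$, since $E\int \tilde W_i\,dW_i' = -\frac{1}{2}I_k$.) Because $\frac{1}{N}\sum_{i=1}^N \frac{1}{T^2}\sum_{t=1}^T (x_{it} - \bar x_i)(x_{it} - \bar x_i)' \xrightarrow{p} \frac{1}{6}\Omega_\varepsilon$, it can be shown easily that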
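$$\delta_{NT} \xrightarrow{p} -3\Omega_\varepsilon^{-1}\Omega_{\varepsilon u} + 6\Omega_\varepsilon^{-1}\Delta_{\varepsilon u}.$$
If $w_{it} = (u_{it}, \varepsilon_{it}')'$ are i.i.d., then $\Delta_{\varepsilon u} = \Omega_{\varepsilon u}$ and
$$\delta_{NT} \xrightarrow{p} 3\Omega_\varepsilon^{-1}\Omega_{\varepsilon u},$$
which was examined by Kao & Chen (1995). Let $\hat\Omega_\varepsilon$, $\hat\Omega_{\varepsilon u}$, $\hat\Delta_\varepsilon$, and $\hat\Delta_{\varepsilon u}$ be consistent estimates of $\Omega_\varepsilon$, $\Omega_{\varepsilon u}$, $\Delta_\varepsilon$, and $\Delta_{\varepsilon u}$, respectively. Then from (b) in Theorem 1, we can define a bias-corrected OLS estimator, $\hat\beta^+_{OLS}$,
$$\hat\beta^+_{OLS} = \hat\beta_{OLS} - \frac{\hat\delta_{NT}}{T},$$
such that
$$\sqrt{N}T(\hat\beta^+_{OLS} - \beta) \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}),$$
where
$$\hat\delta_{NT} = -3\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u} + 6\hat\Omega_\varepsilon^{-1}\hat\Delta_{\varepsilon u}.$$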
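For a scalar regressor (k = 1, assumed here only to keep the arithmetic visible), the bias term and the corrected estimator reduce to a few lines. The long-run moments are passed in as plain numbers. They stand in for kernel estimates, which are not computed here. The function names and the numbers in the example are hypothetical.

```python
def ols_bias_term(omega_e: float, omega_eu: float, delta_eu: float) -> float:
    """delta_hat_NT = -3 * Omega_e^{-1} Omega_eu + 6 * Omega_e^{-1} Delta_eu (k = 1)."""
    return (-3.0 * omega_eu + 6.0 * delta_eu) / omega_e


def bias_corrected_ols(beta_ols: float, T: int, omega_e: float,
                       omega_eu: float, delta_eu: float) -> float:
    """beta_plus_OLS = beta_OLS - delta_hat_NT / T."""
    return beta_ols - ols_bias_term(omega_e, omega_eu, delta_eu) / T


# i.i.d. case: Delta_eu equals Omega_eu, so the bias term collapses to 3 * Omega_eu / Omega_e.
print(ols_bias_term(2.0, 0.5, 0.5))
print(bias_corrected_ols(2.1, 20, 2.0, 0.5, 0.5))
```

The i.i.d. example illustrates the special case above. With $\Omega_\varepsilon = 2$ and $\Omega_{\varepsilon u} = \Delta_{\varepsilon u} = 0.5$, the bias term is $3 \times 0.5 / 2 = 0.75$.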
Chen, McCoskey & Kao (1999) investigated the finite sample properties of the OLS estimator in (7), the t-statistic, the bias-corrected OLS estimator, and the bias-corrected t-statistic. They found that the bias-corrected OLS estimator does not improve over the OLS estimator in general. The results of Chen et al. suggest that alternatives, such as the FMOLS estimator or the DOLS estimator (e.g. Saikkonen, 1991; Stock & Watson, 1993), may be more promising in
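cointegrated panel regressions. Thus, we begin our study by examining the limiting distribution of the FMOLS estimator, $\hat\beta_{FM}$. The FMOLS estimator is constructed by making corrections for endogeneity and serial correlation to the OLS estimator $\hat\beta_{OLS}$ in (7). Define
$$u^+_{it} = u_{it} - \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\varepsilon_{it}, \tag{8}$$
$$\hat u^+_{it} = u_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\varepsilon_{it}, \tag{9}$$
$$y^+_{it} = y_{it} - \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\Delta x_{it}, \tag{10}$$
and
$$\hat y^+_{it} = y_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it}. \tag{11}$$
Note that
$$\begin{pmatrix} u^+_{it} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} 1 & -\Omega_{u\varepsilon}\Omega_\varepsilon^{-1} \\ 0 & I_k \end{pmatrix}\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix},$$
which has the long-run covariance matrix
$$\begin{pmatrix} \Omega_{u.\varepsilon} & 0 \\ 0 & \Omega_\varepsilon \end{pmatrix},$$
where $I_k$ is a $k \times k$ identity matrix. The endogeneity correction is achieved by modifying the variable $y_{it}$ in (1) with the transformation
$$\hat y^+_{it} = y_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it} = \alpha_i + x_{it}'\beta + u_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it}.$$
The serial correlation correction term has the form
$$\hat\Delta^+_{\varepsilon u} = \hat\Delta_{\varepsilon u} - \hat\Delta_\varepsilon\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u},$$
where $\hat\Delta_{\varepsilon u}$ and $\hat\Delta_\varepsilon$ are kernel estimates of $\Delta_{\varepsilon u}$ and $\Delta_\varepsilon$. Therefore, the FMOLS estimator is
$$\hat\beta_{FM} = \left[\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1}\sum_{i=1}^N\left[\sum_{t=1}^T (x_{it} - \bar x_i)\hat y^+_{it} - T\hat\Delta^+_{\varepsilon u}\right]. \tag{12}$$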
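The steps (7) to (12) can be chained into the following minimal sketch, written for a single regressor (k = 1). The sketch makes several assumptions that are not part of the chapter:

- $\hat\Omega$ and $\hat\Delta$ are computed with Bartlett weights in the spirit of the pooled kernel estimator discussed in Remark 5. The Monte Carlo section reports using a Bartlett window of lag five, which is the default here.
- The residuals come from a first-stage within OLS.
- The usage example relies on a hypothetical DGP with $u_{it} = 0.5\varepsilon_{it} + e_{it}$.

All function names are illustrative.

```python
import random
from typing import List, Tuple


def bartlett_lrv(w: List[List[Tuple[float, float]]], lag: int):
    """Pooled Bartlett-kernel estimates (Omega, Delta) for w_it = (u_it, eps_it).

    Omega = Sigma + Gamma + Gamma' and Delta = Sigma + Gamma, where Gamma is the
    kernel-weighted sum of Gamma_tau = (1/T) sum_t w_it w_{i,t-tau}', averaged over i.
    """
    n = len(w)
    omega = [[0.0, 0.0], [0.0, 0.0]]
    delta = [[0.0, 0.0], [0.0, 0.0]]
    for wi in w:
        T = len(wi)
        for a in range(2):
            for b in range(2):
                sig = sum(wt[a] * wt[b] for wt in wi) / T
                gam = gam_tr = 0.0  # weighted Gamma[a][b] and Gamma[b][a]
                for tau in range(1, lag + 1):
                    k = 1.0 - tau / (lag + 1.0)  # Bartlett weight
                    gam += k * sum(wi[t][a] * wi[t - tau][b] for t in range(tau, T)) / T
                    gam_tr += k * sum(wi[t][b] * wi[t - tau][a] for t in range(tau, T)) / T
                omega[a][b] += (sig + gam + gam_tr) / n
                delta[a][b] += (sig + gam) / n
    return omega, delta


def panel_fmols(y, x, lag=5):
    """FMOLS slope (12) for k = 1; returns (beta_fm, beta_ols).

    y[i], x[i] have length T + 1; observation 0 only serves to form dx at t = 1.
    """
    units, sxx, sxy = [], 0.0, 0.0
    for yi, xi in zip(y, x):
        xs, ys = xi[1:], yi[1:]
        xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
        xd = [v - xbar for v in xs]
        yd = [v - ybar for v in ys]
        dx = [xi[t] - xi[t - 1] for t in range(1, len(xi))]
        units.append((xd, yd, dx))
        sxx += sum(a * a for a in xd)
        sxy += sum(a * b for a, b in zip(xd, yd))
    beta_ols = sxy / sxx  # within OLS, eq. (7)
    # w_it = (u_hat_it, eps_it), with eps_it = dx_it.
    w = [[(b - beta_ols * a, d) for a, b, d in zip(xd, yd, dx)] for xd, yd, dx in units]
    om, de = bartlett_lrv(w, lag)
    ratio = om[0][1] / om[1][1]                            # Omega_ue * Omega_e^{-1}
    delta_plus = de[1][0] - de[1][1] * om[1][0] / om[1][1]  # Delta+_eu
    num = 0.0
    for xd, yd, dx in units:
        # sum_t (x - xbar) * y_plus with y_plus = y - ratio * dx, as in (11);
        # demeaned y gives the same sum because sum_t (x - xbar) = 0.
        num += sum(a * (b - ratio * d) for a, b, d in zip(xd, yd, dx))
        num -= len(xd) * delta_plus
    return num / sxx, beta_ols


def simulate(n, T, beta=2.0, rho=0.5, seed=0):
    """Hypothetical DGP: x_it is a random walk and u_it = rho * eps_it + e_it."""
    rng = random.Random(seed)
    y, x = [], []
    for _ in range(n):
        alpha, level = rng.uniform(0, 10), 0.0
        xi, yi = [], []
        for _ in range(T + 1):
            eps = rng.gauss(0, 1)
            level += eps
            xi.append(level)
            yi.append(alpha + beta * level + rho * eps + rng.gauss(0, 1))
        x.append(xi)
        y.append(yi)
    return y, x


y, x = simulate(20, 100)
print(panel_fmols(y, x))
```

In this hypothetical DGP the errors are serially uncorrelated. As a result $\hat\Delta^+_{\varepsilon u}$ is close to zero, and the correction works mainly through the endogeneity adjustment of $y_{it}$.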
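Now, we state the limiting distribution of $\hat\beta_{FM}$.

Theorem 2. If Assumptions 1–3 hold, then $\sqrt{N}T(\hat\beta_{FM} - \beta) \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon})$.

It can be shown easily that, when the individual-specific intercept $\alpha_i$ is excluded, the limiting distribution of $\hat\beta_{FM}$ becomes
$$\sqrt{N}T(\hat\beta_{FM} - \beta) \Rightarrow N(0, 2\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}). \tag{13}$$

Remark 5. Once the estimates of $w_{it}$, denoted $\hat w_{it}$, were obtained, we used
$$\hat\Sigma = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \hat w_{it}\hat w_{it}' \tag{14}$$
to estimate $\Sigma$. $\Omega$ was estimated by
$$\hat\Omega = \frac{1}{N}\sum_{i=1}^N\left[\frac{1}{T}\sum_{t=1}^T \hat w_{it}\hat w_{it}' + \frac{1}{T}\sum_{\tau=1}^{l}\varpi_{\tau l}\sum_{t=\tau+1}^{T}\left(\hat w_{it}\hat w_{i,t-\tau}' + \hat w_{i,t-\tau}\hat w_{it}'\right)\right], \tag{15}$$
where $\varpi_{\tau l}$ is a weight function or a kernel. Using Phillips & Durlauf (1986) and sequential limit theory, $\hat\Sigma$ and $\hat\Omega$ can be shown to be consistent for $\Sigma$ and $\Omega$.

Remark 6. The distribution results for $\hat\beta_{FM}$ require that $\sqrt{N}(\hat\Omega - \Omega)$ does not diverge as N grows large. However, $\hat\Omega - \Omega$ may not be small when T is fixed. It follows that $\sqrt{N}(\hat\Omega - \Omega)$ may be non-negligible in panel data with finite samples.

Next, we propose a DOLS estimator, $\hat\beta_D$, which uses the past and future values of $\Delta x_{it}$ as additional regressors. We then show that the limiting distribution of $\hat\beta_D$ is the same as that of the FMOLS estimator, $\hat\beta_{FM}$. But first, we need the following additional assumption:

Assumption 4. The spectral density matrix $f_{ww}(\lambda)$ is bounded away from zero and has full rank for all $i$, i.e. $f_{ww}(\lambda) \ge \delta I$, $\lambda \in [0, \pi]$, $\delta > 0$.

When Assumptions 2 and 4 hold, the process $\{u_{it}\}$ can be written as (see Saikkonen, 1991)
$$u_{it} = \sum_{j=-\infty}^{\infty} c_{ij}\varepsilon_{i,t+j} + v_{it} \tag{16}$$
for all $i$, where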
$\sum_{j=-\infty}^{\infty}\|c_{ij}\| < \infty$, $\{v_{it}\}$ is stationary with zero mean, and $\{v_{it}\}$ and $\{\varepsilon_{it}\}$ are uncorrelated not only contemporaneously but also at all lags and leads. In practice, the leads and lags may be truncated while retaining (16) approximately, so that
$$u_{it} = \sum_{j=-q}^{q} c_{ij}\varepsilon_{i,t+j} + \dot v_{it}$$
for all $i$. This is because $\{c_{ij}\}$ are assumed to be absolutely summable, i.e. $\sum_{j=-\infty}^{\infty}\|c_{ij}\| < \infty$.
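We also need to require that $q$ tends to infinity with $T$ at a suitable rate:

Assumption 5. $q \to \infty$ as $T \to \infty$ such that $q^3/T \to 0$, and
$$T^{1/2}\sum_{|j|>q}\|c_{ij}\| \to 0 \tag{17}$$
for all $i$.

We then substitute (16) into (1) to get
$$y_{it} = \alpha_i + x_{it}'\beta + \sum_{j=-q}^{q} c_{ij}\varepsilon_{i,t+j} + \dot v_{it}, \tag{18}$$
where $\dot v_{it} = v_{it} + \sum_{|j|>q} c_{ij}\varepsilon_{i,t+j}$.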
Therefore, we obtain the DOLS estimator of $\beta$, $\hat\beta_D$, by running the following regression:
$$y_{it} = \alpha_i + x_{it}'\beta + \sum_{j=-q}^{q} c_{ij}\Delta x_{i,t+j} + \dot v_{it}. \tag{19}$$
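Regression (19) can be sketched in pooled form for one regressor as follows. The sketch assumes that the lead and lag coefficients are common across units ($c_{ij} = c_j$), which is a simplification of (19) in keeping with the homogeneous panel. The defaults of four lags and two leads mirror the Monte Carlo section. Demeaning each unit absorbs $\alpha_i$, and a small Gaussian-elimination solver stands in for a linear-algebra library. The DGP in the example is hypothetical, and all names are illustrative.

```python
import random
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (m[r][n] - sum(m[r][k] * s[k] for k in range(r + 1, n))) / m[r][r]
    return s


def panel_dols(y, x, lags=4, leads=2):
    """Pooled DOLS slope from (19) with common c_j; y[i], x[i] cover t = 0..T."""
    p = 2 + lags + leads  # x_it plus dx_{i,t+j} for j = -lags..leads
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for yi, xi in zip(y, x):
        T = len(xi) - 1
        dx = [0.0] + [xi[t] - xi[t - 1] for t in range(1, T + 1)]  # dx[0] unused
        ts = range(1 + lags, T - leads + 1)  # t with all required leads and lags
        rows = [[xi[t]] + [dx[t + j] for j in range(-lags, leads + 1)] for t in ts]
        ys = [yi[t] for t in ts]
        means = [sum(col) / len(rows) for col in zip(*rows)]  # within transformation
        ybar = sum(ys) / len(ys)
        for row, yv in zip(rows, ys):
            z = [v - mu for v, mu in zip(row, means)]
            for a in range(p):
                xty[a] += z[a] * (yv - ybar)
                for b in range(p):
                    xtx[a][b] += z[a] * z[b]
    return solve(xtx, xty)[0]  # coefficient on x_it


def simulate(n, T, beta=2.0, seed=1):
    """Hypothetical DGP: u_it = e_it + 0.5 * eps_it + 0.3 * eps_{i,t-1}."""
    rng = random.Random(seed)
    y, x = [], []
    for _ in range(n):
        alpha = rng.uniform(0, 10)
        eps = [rng.gauss(0, 1) for _ in range(T + 1)]
        level, xi, yi = 0.0, [], []
        for t in range(T + 1):
            level += eps[t]
            u = rng.gauss(0, 1) + 0.5 * eps[t] + (0.3 * eps[t - 1] if t > 0 else 0.0)
            xi.append(level)
            yi.append(alpha + beta * level + u)
        x.append(xi)
        y.append(yi)
    return y, x


y, x = simulate(20, 100)
print(panel_dols(y, x))
```

In this DGP, $u_{it}$ depends only on the current and first lagged $\varepsilon$. The included leads and lags therefore absorb the correlation, leaving $\dot v_{it} = e_{it}$.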
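Next, we show that $\hat\beta_D$ has the same limiting distribution as $\hat\beta_{FM}$ in Theorem 2.

Theorem 3. If Assumptions 1–5 hold, then $\sqrt{N}T(\hat\beta_D - \beta) \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon})$.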
IV. HETEROGENEOUS PANELS

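This chapter so far assumes that the panel data are homogeneous. The substantial heterogeneity exhibited by actual data in the cross-sectional dimension may restrict the practical applicability of the FMOLS and DOLS estimators. Also, the estimators in Sections 2 and 3 are not easily extended to cases of broader cross-sectional heterogeneity, since the variances and biases are specified in terms of asymptotic covariance parameters that are assumed to be shared cross-sectionally. In this section, we propose an alternative representation of the panel FMOLS estimator for heterogeneous panels. Before we discuss the FMOLS estimator we need the following assumptions:

Assumption 6. We assume the panels are heterogeneous, i.e. $\Omega_i$, $\Delta_i$ and $\Sigma_i$ vary across $i$. We also assume that the invariance principle in (2), the representation (16), and condition (17) in Assumption 5 still hold.

Let
$$x^*_{it} = \hat\Omega_{\varepsilon i}^{-1/2}x_{it}, \qquad \hat u^*_{it} = \hat\Omega_{u.\varepsilon i}^{-1/2}\hat u^+_{it}, \tag{20}$$
$$\hat u^+_{it} = u_{it} - \hat\Omega_{u\varepsilon i}\hat\Omega_{\varepsilon i}^{-1}\varepsilon_{it}, \tag{21}$$
$$\hat y^+_{it} = y_{it} - \hat\Omega_{u\varepsilon i}\hat\Omega_{\varepsilon i}^{-1}\Delta x_{it} - \hat\Omega_{u.\varepsilon i}^{1/2}\left(\hat\Omega_{u.\varepsilon i}^{-1/2}x_{it} - x^*_{it}\right)'\beta, \tag{22}$$
and
$$\hat y^*_{it} = \hat\Omega_{u.\varepsilon i}^{-1/2}\hat y^+_{it}, \tag{23}$$
where $\hat\Omega_i$ and $\hat\Delta_i$ are consistent estimators of $\Omega_i$ and $\Delta_i$, respectively, and
$$\hat\Omega_{u.\varepsilon i} = \hat\Omega_{ui} - \hat\Omega_{u\varepsilon i}\hat\Omega_{\varepsilon i}^{-1}\hat\Omega_{\varepsilon ui}.$$
Similar to Pedroni (1996), the correction term $\hat\Omega_{u.\varepsilon i}^{1/2}(\hat\Omega_{u.\varepsilon i}^{-1/2}x_{it} - x^*_{it})'\beta$ is needed in (22) in the heterogeneous panel. We note that (22) will be the same as (11) only if this correction term is zero. Also, (22) requires knowing the true $\beta$. In practice, $\beta$ in (22) can be replaced by a preliminary OLS estimate, $\hat\beta_{OLS}$. Therefore, let
$$\hat y^{++}_{it} = y_{it} - \hat\Omega_{u\varepsilon i}\hat\Omega_{\varepsilon i}^{-1}\Delta x_{it} - \hat\Omega_{u.\varepsilon i}^{1/2}\left(\hat\Omega_{u.\varepsilon i}^{-1/2}x_{it} - x^*_{it}\right)'\hat\beta_{OLS}$$
and
$$\hat y^*_{it} = \hat\Omega_{u.\varepsilon i}^{-1/2}\hat y^{++}_{it}.$$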
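Assumption 7. $\Omega_{\varepsilon i}$ is nonsingular for all $i$.

Then, we define the FMOLS estimator for heterogeneous panels as
$$\hat\beta^*_{FM} = \left[\sum_{i=1}^N\sum_{t=1}^T (x^*_{it} - \bar x^*_i)(x^*_{it} - \bar x^*_i)'\right]^{-1}\sum_{i=1}^N\left[\sum_{t=1}^T (x^*_{it} - \bar x^*_i)\hat y^*_{it} - T\hat\Delta^*_{\varepsilon ui}\right], \tag{24}$$
where
$$\hat\Delta^*_{\varepsilon ui} = \hat\Omega_{\varepsilon i}^{-1/2}\hat\Delta^+_{\varepsilon ui}\hat\Omega_{u.\varepsilon i}^{-1/2}$$
and
$$\hat\Delta^+_{\varepsilon ui} = \hat\Delta_{\varepsilon ui} - \hat\Delta_{\varepsilon i}\hat\Omega_{\varepsilon i}^{-1}\hat\Omega_{\varepsilon ui}.$$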
Theorem 4. If Assumptions 1–2 and 6–7 hold, then $\sqrt{N}T(\hat\beta^*_{FM} - \beta) \Rightarrow N(0, 6I_k)$.

The DOLS estimator for heterogeneous panels, $\hat\beta^*_D$, can be obtained by running the following regression:
$$y^*_{it} = \alpha_i + x^{*\prime}_{it}\beta + \sum_{j=-q_i}^{q_i} c_{ij}\Delta x^*_{i,t+j} + \dot v^*_{it}, \tag{25}$$
where $\dot v^*_{it}$ is defined similarly as in (18). Note that in (25) different lag truncations, $q_i$, may have to be used because the error terms are heterogeneous across $i$. Therefore, we need to assume that $q_i$ tends to infinity with $T$ at a suitable rate for all $i$:

Assumption 8. $q_i \to \infty$ as $T \to \infty$ such that $q_i^3/T \to 0$, and
$$T^{1/2}\sum_{|j|>q_i}\|c_{ij}\| \to 0 \tag{26}$$
for all $i$.

In the following theorem we show that $\hat\beta^*_D$ also has the same limiting distribution as $\hat\beta^*_{FM}$.
Theorem 5. If Assumptions 1–2 and 6–8 hold, then $\sqrt{N}T(\hat\beta^*_D - \beta) \Rightarrow N(0, 6I_k)$.

Remark 7. Theorems 4 and 5 show that the limiting distributions of $\hat\beta^*_{FM}$ and $\hat\beta^*_D$ are free of nuisance parameters.

Remark 8. We now consider a linear hypothesis that involves the elements of the coefficient vector $\beta$. We show that hypothesis tests constructed using the FMOLS and DOLS estimators have asymptotic chi-squared distributions. The null hypothesis has the form
$$H_0: R\beta = r, \tag{27}$$
where $r$ is an $m \times 1$ known vector and $R$ is a known $m \times k$ matrix describing the restrictions. A natural test statistic of the Wald test using $\hat\beta_{FM}$ or $\hat\beta_D$ for homogeneous panels is
$$W = \frac{1}{6}NT^2(R\hat\beta_D - r)'\left[R\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}R'\right]^{-1}(R\hat\beta_D - r). \tag{28}$$
Remark 9. For the heterogeneous panels, a natural statistic using $\hat\beta^*_{FM}$ or $\hat\beta^*_D$ to test the null hypothesis is
$$W^* = \frac{1}{6}NT^2(R\hat\beta^*_D - r)'\left[RR'\right]^{-1}(R\hat\beta^*_D - r). \tag{29}$$
It is clear that $W$ and $W^*$ converge in distribution to a chi-squared random variable with $m$ degrees of freedom, $\chi^2_m$, as $T \to \infty$ followed by $N \to \infty$ sequentially under the null hypothesis. Hence, we establish the following results: $W \Rightarrow \chi^2_m$ and $W^* \Rightarrow \chi^2_m$. Because the FMOLS and the DOLS estimators have the same asymptotic distributions, it is easy to verify that the Wald statistics based on the FMOLS estimator share the same limiting distributions as those based on the DOLS estimator.
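With a single coefficient and a single restriction ($k = m = 1$, assumed here for brevity), (28) and its $\chi^2_1$ p-value reduce to scalar arithmetic. The long-run quantities are passed in as numbers standing in for estimates. Passing an $\Omega$ ratio of 1 gives $W^*$ in (29). The function names and the inputs in the example are hypothetical.

```python
import math


def wald_scalar(beta_hat, r, n, T, omega_e, omega_u_e, R=1.0):
    """Wald statistic (28) for H0: R * beta = r with k = m = 1.

    W = (1/6) N T^2 (R beta_hat - r)^2 / (R Omega_e^{-1} Omega_{u.e} R).
    With omega_e = omega_u_e = 1 this is W* in (29).
    """
    d = R * beta_hat - r
    return n * T ** 2 * d * d / (6.0 * R * R * omega_u_e / omega_e)


def chi2_1_sf(w):
    """P(chi^2_1 > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(w / 2.0))


w = wald_scalar(2.01, 2.0, 20, 100, 1.0, 1.0)
print(w, chi2_1_sf(w))
```

For example, an estimate of 2.01 against $H_0: \beta = 2$ with N = 20, T = 100 and unit long-run ratio gives $W \approx 3.33$. The p-value is a little under 0.07, so the null is not rejected at the 5% level.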
V. MONTE CARLO SIMULATIONS

The ultimate goal of this Monte Carlo study is to compare the sample properties of OLS, FMOLS, and DOLS for two models: a homogeneous panel
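and a heterogeneous panel. The simulations were performed on a Sun SparcServer 1000 and an Ultra Enterprise 3000. GAUSS 3.2.31 and COINT 2.0 were used to perform the simulations. Random numbers for the error terms, $(u^*_{it}, \varepsilon^*_{it})$, for Sections 5A, 5B and 5D were generated by the GAUSS procedure RNDNS. At each replication, we generated a sequence of N(T + 1000) random numbers and then split it into N series so that each series had the same mean and variance. The first 1,000 observations were discarded for each series. $\{u^*_{it}\}$ and $\{\varepsilon^*_{it}\}$ were constructed with $u^*_{i0} = 0$ and $\varepsilon^*_{i0} = 0$.

In order to compare the performance of the OLS, FMOLS, and DOLS estimators, the following data generating process (DGP) was used:
$$y_{it} = \alpha_i + \beta x_{it} + u_{it}, \qquad x_{it} = x_{i,t-1} + \varepsilon_{it}, \tag{30}$$
where $(u_{it}, \varepsilon_{it})'$ follows an ARMA(1, 1) process:
$$\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}\begin{pmatrix} u_{i,t-1} \\ \varepsilon_{i,t-1} \end{pmatrix} + \begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} + \begin{pmatrix} 0.3 & 0.4 \\ \theta_{21} & 0.6 \end{pmatrix}\begin{pmatrix} u^*_{i,t-1} \\ \varepsilon^*_{i,t-1} \end{pmatrix}$$
with
$$\begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} \overset{iid}{\sim} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{21} \\ \sigma_{21} & 1 \end{pmatrix}\right).$$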
The design in (30) nests several important special cases. First, when $\begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}$ is replaced by $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ and $\theta_{21}$ and $\sigma_{21}$ are constant across $i$, the DGP becomes the homogeneous panel in Section 5A. Second, when $\begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}$ is replaced by $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$, and $\theta_{21}$ and $\sigma_{21}$ are random variables that differ across $i$, the DGP is the heterogeneous panel in Section 5D.

A. Homogeneous Panel

To compare the performance of the OLS, FMOLS, and DOLS estimators for the homogeneous panel we conducted Monte Carlo experiments based on a
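design similar to that of Phillips & Hansen (1990) and Phillips & Loretan (1991):
$$y_{it} = \alpha_i + \beta x_{it} + u_{it}, \qquad x_{it} = x_{i,t-1} + \varepsilon_{it}, \qquad i = 1, \ldots, N,\; t = 1, \ldots, T,$$
where
$$\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} + \begin{pmatrix} 0.3 & 0.4 \\ \theta_{21} & 0.6 \end{pmatrix}\begin{pmatrix} u^*_{i,t-1} \\ \varepsilon^*_{i,t-1} \end{pmatrix} \tag{31}$$
with
$$\begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} \overset{iid}{\sim} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{21} \\ \sigma_{21} & 1 \end{pmatrix}\right).$$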
We generated $\alpha_i$ from a uniform distribution, U[0, 10], and set $\beta = 2$. From Theorems 1–3 we know that the asymptotic results depend upon the variances and covariances of the errors $u_{it}$ and $\varepsilon_{it}$. The design in (31) is a good one since the endogeneity of the system is controlled by only two parameters, $\theta_{21}$ and $\sigma_{21}$. We allowed $\theta_{21}$ and $\sigma_{21}$ to vary and considered values of {0.8, 0.4, 0.0, 0.8} for $\sigma_{21}$ and {0.8, 0.4, 0.4} for $\theta_{21}$. The estimate of the long-run covariance matrix in (15) was obtained by using the procedure KERNEL in COINT 2.0 with a Bartlett window. The lag truncation number was set arbitrarily at five. Results with other kernels, such as the Parzen and quadratic spectral kernels, are not reported, because no essential differences were found for most cases. Next, we recorded the results from our Monte Carlo experiments that examined the finite-sample properties of the OLS estimator, $\hat\beta_{OLS}$; the FMOLS estimator, $\hat\beta_{FM}$; and the DOLS estimator, $\hat\beta_D$. The results we report are based on 10,000 replications and are summarized in Tables 1–4 and Figures 1–8. The FMOLS estimator was obtained by using a Bartlett window of lag length five as in (15). Four lags and two leads were used for the DOLS estimator. Table 1 reports the Monte Carlo means and standard deviations (in parentheses) of $(\hat\beta_{OLS} - \beta)$, $(\hat\beta_{FM} - \beta)$, and $(\hat\beta_D - \beta)$ for sample sizes T = N = (20, 40, 60). The biases of the OLS estimator, $\hat\beta_{OLS}$, decrease at a rate of T. For example, with $\sigma_{21} = 0.8$ and $\theta_{21} = 0.8$, the bias at T = 20 is 0.201 and at T = 40 is 0.104. Also, the biases increase in $\sigma_{21}$ (with $\theta_{21} > 0$) and decrease in $\theta_{21}$.
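Design (31) is easy to reproduce for experimentation. The following minimal sketch draws one panel and also evaluates the implied long-run covariance $\Omega = \Pi(1)\Sigma\Pi(1)'$ from (3), with $\Pi(1) = I + \begin{pmatrix} 0.3 & 0.4 \\ \theta_{21} & 0.6 \end{pmatrix}$. It does not replicate the GAUSS procedure RNDNS or the 1,000-observation burn-in. Discarding i.i.d. draws does not change their distribution, and pre-sample shocks are set to zero as in the text. The sketch also assumes $x_{i0} = 0$ and $|\sigma_{21}| \le 1$. The function names are illustrative.

```python
import math
import random


def simulate_31(n, T, theta21, sigma21, beta=2.0, seed=0):
    """Draw one panel (y, x) from design (31); each is a list of n lists of length T."""
    rng = random.Random(seed)
    chol = math.sqrt(1.0 - sigma21 ** 2)  # Cholesky factor for corr(u*, eps*) = sigma21
    y, x = [], []
    for _ in range(n):
        alpha = rng.uniform(0.0, 10.0)  # alpha_i ~ U[0, 10]
        u_prev = e_prev = 0.0           # u*_{i0} = eps*_{i0} = 0
        level = 0.0                     # x_{i0} = 0 (assumption)
        yi, xi = [], []
        for _ in range(T):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            u_star, e_star = z1, sigma21 * z1 + chol * z2
            u = u_star + 0.3 * u_prev + 0.4 * e_prev
            eps = e_star + theta21 * u_prev + 0.6 * e_prev
            level += eps                # x_it = x_{i,t-1} + eps_it
            xi.append(level)
            yi.append(alpha + beta * level + u)
            u_prev, e_prev = u_star, e_star
        y.append(yi)
        x.append(xi)
    return y, x


def long_run_cov(theta21, sigma21):
    """Omega = Pi(1) Sigma* Pi(1)' for design (31), with Pi(1) = I + MA matrix."""
    p = [[1.3, 0.4], [theta21, 1.6]]
    s = [[1.0, sigma21], [sigma21, 1.0]]
    ps = [[sum(p[i][k] * s[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(ps[i][k] * p[j][k] for k in range(2)) for j in range(2)] for i in range(2)]


omega = long_run_cov(0.4, 0.4)
omega_u_e = omega[0][0] - omega[0][1] ** 2 / omega[1][1]  # Omega_{u.eps} as in (5)
print(omega, omega_u_e)
```

The analytic $\Omega$ makes it possible to compare kernel estimates such as (15) against their population targets for a given $(\theta_{21}, \sigma_{21})$.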
Table 1.
D OLS 21 = 0.4 FM D OLS D 21 = 0.8 FM
Means Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators
OLS
21 = 0.8 FM
21 = 0.8 T = 20
T = 40
T = 60
21 = 0.4 T = 20
0.201 (0.049) 0.104 (0.019) 0.070 (0.010)
0.176 (0.044) 0.099 (0.017) 0.069 (0.009)
0.001 (0.040) 0.000 (0.013) 0.000 (0.007)
0.097 (0.032) 0.049 (0.012) 0.033 (0.007)
0.113 (0.035) 0.062 (0.013) 0.042 (0.007)
0.002 (0.033) 0.001 (0.011) 0.000 (0.006)
0.022 (0.011) 0.011 (0.004) 0.007 (0.002)
0.069 (0.016) 0.036 (0.006) 0.024 (0.003)
0.009 (0.009) 0.004 (0.003) 0.003 (0.002)
T = 40
T = 60
21 = 0.0 T = 20
0.132 (0.038) 0.066 (0.014) 0.044 (0.007)
0.064 (0.025) 0.038 (0.009) 0.027 (0.005)
0.001 (0.027) 0.001 (0.027) 0.000 (0.005)
0.082 (0.030) 0.041 (0.011) 0.027 (0.006)
0.068 (0.029) 0.038 (0.011) 0.026 (0.006)
0.002 (0.031) 0.001 (0.009) 0.001 (0.005)
0.014 (0.013) 0.007 (0.005) 0.005 (0.002)
0.073 (0.018) 0.037 (0.006) 0.025 (0.003)
0.003 (0.013) 0.001 (0.004) 0.001 (0.002)
T = 40
T = 60
21 = 0.8 T = 20
0.079 (0.027) 0.039 (0.009) 0.026 (0.005)
0.002 (0.015) 0.005 (0.005) 0.004 (0.003)
0.001 (0.017) 0.001 (0.005) 0.000 (0.003)
0.059 (0.026) 0.029 (0.009) 0.019 (0.005)
0.019 (0.022) 0.012 (0.008) 0.009 (0.004)
0.002 (0.026) 0.001 (0.008) 0.001 (0.008)
0.005 (0.016) 0.002 (0.006) 0.001 (0.003)
0.069 (0.021) 0.035 (0.007) 0.023 (0.004)
0.006 (0.017) 0.003 (0.005) 0.002 (0.003)
T = 40
T = 60
0.029 (0.016) 0.015 (0.006) 0.009 (0.003)
0.038 (0.012) 0.018 (0.004) 0.011 (0.002)
0.007 (0.008) 0.003 (0.002) 0.002 (0.001)
0.019 (0.017) 0.009 (0.006) 0.007 (0.003)
0.036 (0.015) 0.018 (0.005) 0.012 (0.002)
0.007 (0.014) 0.003 (0.004) 0.002 (0.002)
0.114 (0.034) 0.057 (0.012) 0.038 (0.007)
0.012 (0.028) 0.011 (0.009) 0.010 (0.005)
0.000 (0.031) 0.000 (0.009) 0.000 (0.005)
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS estimator.
Table 2.
Means Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators for Different N and T
(N, T)      OLS            FM(5)          FM(2)          D(4,2)         D(2,1)
(1, 20)     0.135 (0.184)  0.104 (0.196)  0.122 (0.189)  0.007 (0.297)  0.031 (0.211)
(1, 40)     0.070 (0.093)  0.059 (0.012)  0.065 (0.092)  0.001 (0.106)  0.015 (0.090)
(1, 60)     0.047 (0.063)  0.041 (0.064)  0.043 (0.061)  0.001 (0.064)  0.009 (0.057)
(1, 120)    0.024 (0.032)  0.023 (0.031)  0.022 (0.031)  0.001 (0.029)  0.004 (0.027)
(20, 20)    0.082 (0.030)  0.068 (0.029)  0.075 (0.029)  0.002 (0.031)  0.017 (0.028)
(20, 40)    0.042 (0.016)  0.039 (0.015)  0.039 (0.015)  0.001 (0.015)  0.008 (0.014)
(20, 60)    0.028 (0.010)  0.027 (0.010)  0.026 (0.009)  0.000 (0.009)  0.006 (0.009)
(20, 120)   0.014 (0.005)  0.014 (0.005)  0.013 (0.005)  0.000 (0.005)  0.003 (0.004)
(40, 20)    0.081 (0.022)  0.066 (0.021)  0.073 (0.021)  0.001 (0.022)  0.017 (0.019)
(40, 40)    0.041 (0.011)  0.038 (0.011)  0.038 (0.011)  0.001 (0.009)  0.008 (0.009)
(40, 60)    0.028 (0.007)  0.026 (0.007)  0.025 (0.007)  0.001 (0.007)  0.005 (0.006)
(40, 120)   0.014 (0.004)  0.014 (0.004)  0.013 (0.003)  0.000 (0.003)  0.003 (0.004)
(60, 20)    0.080 (0.017)  0.067 (0.017)  0.073 (0.017)  0.002 (0.018)  0.016 (0.016)
(60, 40)    0.041 (0.009)  0.038 (0.009)  0.038 (0.009)  0.001 (0.008)  0.008 (0.008)
(60, 60)    0.027 (0.006)  0.026 (0.006)  0.025 (0.006)  0.001 (0.005)  0.005 (0.005)
(60, 120)   0.014 (0.003)  0.014 (0.003)  0.012 (0.003)  0.000 (0.003)  0.003 (0.003)
(120, 20)   0.079 (0.012)  0.066 (0.012)  0.072 (0.012)  0.002 (0.012)  0.016 (0.011)
(120, 40)   0.041 (0.006)  0.037 (0.006)  0.037 (0.006)  0.001 (0.006)  0.008 (0.005)
(120, 60)   0.027 (0.004)  0.026 (0.004)  0.025 (0.004)  0.001 (0.004)  0.005 (0.004)
(120, 120)  0.014 (0.002)  0.014 (0.002)  0.013 (0.002)  0.000 (0.002)  0.003 (0.002)
Note: (a) Bartlett windows with lag lengths 5 and 2 are used for the FMOLS(5) and FMOLS(2) estimators. (b) 4 lags and 2 leads, and 2 lags and 1 lead, are used for the DOLS(4,2) and DOLS(2,1) estimators. (c) σ21 = 0.4 and θ21 = 0.4.
Table 3.
DOLS OLS 21 = 0.4 FMOLS DOLS OLS DOLS 21 = 0.8 FMOLS
Means Biases and Standard Deviations of t-statistics
OLS
21 = 0.8 FMOLS
21 = 0.8 T = 20
T = 40
T = 60
21 = 0.4 T = 20
7.247 (1.526) 10.047 (1.484) 12.250 (1.468)
5.594 (1.330) 8.435 (1.382) 10.749 (1.439)
0.047 (1.281) 0.004 (1.119) 0.004 (1.093)
4.650 (1.393) 6.503 (1.389) 7.937 (1.397)
4.823 (1.414) 6.833 (1.366) 8.429 (1.377)
0.086 (1.423) 0.069 (1.187) 0.084 (1.135)
1.758 (0.859) 2.491 (0.847) 3.030 (0.847)
7.927 (1.719) 11.584 (1.826) 14.402 (1.840)
1.049 (1.122) 1.386 (1.006) 1.633 (0.959)
T = 40
T = 60
21 = 0.0 T = 20
5.425 (1.340) 7.507 (1.302) 9.161 (1.287)
2.377 (1.042) 4.558 (1.071) 6.012 (1.109)
0.046 (1.132) 0.017 (1.023) 0.009 (1.009)
3.905 (1.334) 5.462 (1.325) 6.676 (1.329)
3.017 (1.282) 4.401 (1.205) 5.489 (1.197)
0.124 (1.402) 0.104 (1.168) 0.126 (1.118)
0.925 (0.867) 1.336 (0.856) 1.626 (0.859)
6.864 (1.642) 9.744 (1.665) 11.966 (1.644)
0.277 (1.203) 0.362 (1.054) 0.408 (0.999)
T = 40
T = 60
21 = 0.8 T = 20
3.927 (1.200) 5.453 (1.173) 6.674 (1.161)
0.145 (0.919) 0.796 (0.888) 1.294 (0.899)
0.054 (0.993) 0.001 (0.926) 0.147 (0.927)
2.944 (1.241) 4.134 (1.229) 5.070 (1.229)
1.006 (1.180) 1.684 (1.086) 2.198 (1.065)
0.096 (1.342) 0.168 (1.134) 0.199 (1.088)
0.277 (0.897) 0.334 (0.885) 0.405 (0.891)
5.198 (1.503) 7.086 (1.441) 8.556 (1.395)
0.439 (1.277) 0.547 (1.104) 0.663 (1.047)
T = 40
T = 60
2.067 (1.066) 2.898 (1.050) 3.574 (1.040)
3.694 (1.201) 5.509 (1.243) 7.130 (1.281)
0.635 (0.732) 0.948 (0.712) 1.236 (0.737)
1.229 (1.084) 1.758 (1.067) 2.188 (1.061)
2.893 (1.214) 4.041 (1.161) 4.983 (1.143)
0.530 (1.107) 0.741 (0.984) 0.913 (0.964)
4.495 (1.123) 6.255 (1.088) 7.630 (1.092)
0.542 (1.209) 1.349 (1.103) 1.975 (1.087)
0.013 (1.350) 0.002 (1.160) 0.003 (1.109)
Note: (a) N = T. (b) A lag length of 5 is used for the Bartlett window in the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator.
Panel Cointegration
Table 4.
Mean Biases and Standard Deviations of t-statistics for Different N and T

(N,T)       OLS            FMOLS(5)       FMOLS(2)       DOLS(4,2)      DOLS(2,1)
(1,20)      1.169 (1.497)  1.264 (2.326)  1.334 (2.031)  0.304 (3.224)  0.232 (2.109)
(1,40)      1.116 (1.380)  1.169 (1.805)  1.232 (1.738)  0.113 (2.086)  0.258 (1.689)
(1,60)      1.090 (1.357)  1.162 (1.692)  1.195 (1.676)  0.071 (1.778)  0.254 (1.554)
(1,120)     1.092 (1.333)  1.239 (1.165)  1.217 (1.652)  0.056 (1.531)  0.234 (1.448)
(20,20)     3.905 (1.334)  3.017 (1.281)  3.156 (1.230)  0.124 (1.402)  0.695 (1.184)
(20,40)     3.934 (1.307)  3.202 (1.206)  3.169 (1.200)  0.114 (1.186)  0.634 (1.099)
(20,60)     3.861 (1.306)  3.202 (1.150)  3.111 (1.191)  0.053 (1.122)  0.677 (1.079)
(20,120)    3.893 (1.312)  3.247 (1.149)  3.141 (1.209)  0.073 (1.078)  0.642 (1.061)
(40,20)     5.439 (1.347)  4.163 (1.269)  4.342 (1.226)  0.088 (1.358)  1.008 (1.169)
(40,40)     5.462 (1.325)  4.401 (1.205)  4.344 (1.197)  0.104 (1.168)  0.928 (1.092)
(40,60)     5.457 (1.328)  4.506 (1.199)  4.339 (1.192)  0.098 (1.121)  0.913 (1.081)
(40,120)    5.469 (1.296)  4.647 (1.190)  4.356 (1.176)  0.106 (1.050)  0.879 (1.033)
(60,20)     6.677 (1.329)  5.097 (1.258)  5.314 (1.208)  0.169 (1.361)  1.179 (1.162)
(60,40)     6.699 (1.323)  5.384 (1.204)  5.309 (1.192)  0.162 (1.169)  1.097 (1.094)
(60,60)     6.676 (1.329)  5.489 (1.197)  5.289 (1.191)  0.126 (1.118)  1.106 (1.074)
(60,120)    6.677 (1.311)  5.656 (1.196)  5.299 (1.182)  0.115 (1.056)  1.083 (1.041)
(120,20)    9.407 (1.350)  7.153 (1.262)  7.446 (1.215)  0.220 (1.348)  1.662 (1.163)
(120,40)    9.418 (1.313)  7.753 (1.171)  7.753 (1.171)  0.193 (1.157)  1.565 (1.085)
(120,60)    9.411 (1.310)  7.717 (1.182)  7.429 (1.174)  0.177 (1.093)  1.549 (1.053)
(120,120)   9.408 (1.315)  7.932 (1.195)  7.432 (1.181)  0.152 (1.057)  1.530 (1.040)

Note: (a) Lag lengths of 5 and 2 are used for the Bartlett window in the FMOLS(5) and FMOLS(2) estimators, respectively. (b) Four lags and two leads, and two lags and one lead, are used for the DOLS(4,2) and DOLS(2,1) estimators, respectively. (c) $\theta_{21} = 0.4$ and $\sigma_{21} = 0.4$.
Fig. 1. Distribution of biases of Estimators with N = 40, T = 20.
Fig. 2. Distribution of t-statistics with N = 40, T = 20.
Fig. 3. Distribution of biases of Estimators with N = 40, T = 40.
Fig. 4. Distribution of t-statistics with N = 40, T = 40.
Fig. 5. Distribution of biases of Estimators with N = 40, T = 60.
Fig. 6. Distribution of t-statistics with N = 40, T = 60.
Fig. 7. Distribution of biases of Estimators with N = 40, T = 120.
Fig. 8. Distribution of t-statistics with N = 40, T = 120.
While we expected the OLS estimator to be biased, we expected the FMOLS estimator to produce much better estimates. However, it is noticeable that the FMOLS estimator has a downward bias when $\theta_{21} \ge 0$ and an upward bias when $\theta_{21} < 0$. In general, the FMOLS estimator, $\hat\beta_{FM}$, has the same difficulty with bias as the OLS estimator, $\hat\beta_{OLS}$. For example, $\hat\beta_{FM}$ reduces the bias substantially and outperforms $\hat\beta_{OLS}$ when $\sigma_{21} > 0$ and $\theta_{21} < 0$, but the opposite is true when $\sigma_{21} > 0$ and $\theta_{21} > 0$. Likewise, $\hat\beta_{FM}$ is less biased than $\hat\beta_{OLS}$ for values of $\sigma_{21} = 0.8$. Yet, for values of $\sigma_{21} = 0.4$, the bias in $\hat\beta_{OLS}$ is less than the bias in $\hat\beta_{FM}$ when $\theta_{21} = 0.8$. There seems to be little to choose between $\hat\beta_{OLS}$ and $\hat\beta_{FM}$ when $\theta_{21} < 0$. This is probably due to the failure of the non-parametric correction procedure in the presence of negative serial correlation of the errors, i.e. a negative MA value, $\theta_{21} < 0$. Finally, for the cases where $\theta_{21} = 0.0$, $\hat\beta_{FM}$ outperforms $\hat\beta_{OLS}$ when $\sigma_{21} < 0$. On the other hand, $\hat\beta_{FM}$ is more biased than $\hat\beta_{OLS}$ when $\sigma_{21} > 0$. In contrast, the results in Table 1 show that the DOLS estimator, $\hat\beta_D$, is distinctly superior to the OLS and FMOLS estimators in terms of mean bias in all cases. The FMOLS leads to a significant bias, and the DOLS clearly outperforms both the OLS and FMOLS estimators. The FMOLS estimator is also complicated by the dependence of the correction in (11) and (12) on the preliminary estimator (here we use OLS), which may be biased in finite samples. The DOLS differs from the FMOLS estimator in that the DOLS requires no initial estimation and no non-parametric correction.

It is important to know how variations in the panel dimensions affect the results, since actual panel data come in a wide variety of cross-section and time-series dimensions. Table 2 considers 20 different combinations of N and T, with N ∈ {1, 20, 40, 60, 120} and T ∈ {20, 40, 60, 120}, for $\theta_{21} = 0.4$ and $\sigma_{21} = 0.4$.
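The point that DOLS needs neither a preliminary estimator nor a non-parametric correction can be made concrete. The Python function below is a minimal sketch: it is a hypothetical helper, not the authors' Gauss program. It computes a pooled DOLS estimate for a single regressor using the vector form from the proof of Theorem 3. For each unit, the intercept and the unit-specific coefficients on the leads and lags of $\Delta x_{it}$ are partialled out by the annihilator $Q_i$. Then $\beta$ is pooled as $\sum_i x_i'Q_i y_i / \sum_i x_i'Q_i x_i$. Two features are assumptions of this sketch: the contemporaneous term $\Delta x_{it}$ is included among the regressors, and k = 1.

```python
import numpy as np


def panel_dols(y, x, lags=4, leads=2):
    """Pooled panel DOLS for one regressor (hypothetical helper, a sketch).

    y, x : arrays of shape (N, T).
    For each unit i the intercept and unit-specific coefficients on
    Delta x_{i,t+j}, j = -lags, ..., leads, are partialled out (the
    contemporaneous j = 0 term is included by assumption), and beta is
    pooled as sum_i x_i' Q_i y_i / sum_i x_i' Q_i x_i.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = y.shape[1]
    dx = np.diff(x, axis=1)            # dx[:, s] is Delta x at period s + 1
    # Keep only periods t for which every lag and lead of Delta x exists.
    t_idx = np.arange(lags + 1, T - leads)
    num, den = 0.0, 0.0
    for i in range(y.shape[0]):
        Z = np.column_stack(
            [np.ones(t_idx.size)]
            + [dx[i, t_idx + j - 1] for j in range(-lags, leads + 1)]
        )
        xi, yi = x[i, t_idx], y[i, t_idx]
        # Q_i x_i and Q_i y_i are the least-squares residuals on Z_i.
        qx = xi - Z @ np.linalg.lstsq(Z, xi, rcond=None)[0]
        qy = yi - Z @ np.linalg.lstsq(Z, yi, rcond=None)[0]
        num += qx @ qy
        den += qx @ qx
    return num / den
```

The calls `panel_dols(y, x, lags=4, leads=2)` and `panel_dols(y, x, lags=2, leads=1)` mirror the DOLS(4, 2) and DOLS(2, 1) variants compared in Tables 2 and 4. Each unit loses `lags + leads + 1` periods to differencing and to the leads and lags.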
First, we notice that the cross-section dimension has a significant effect on the biases of $\hat\beta_{OLS}$, $\hat\beta_{FM}$, and $\hat\beta_D$ when N is increased from 1 to 20. However, when N is increased from 20 to 40 and beyond, it has little effect on these biases. It therefore seems that, in practice, the T dimension must exceed the N dimension to get a good approximation of the limiting distributions of the estimators, especially for the OLS and FMOLS estimators. For example, for each of the estimators in Table 2, the reported bias is substantially smaller for (T = 120, N = 40) than for either (T = 40, N = 40) or (T = 40, N = 120). The results in Table 2 again confirm the superiority of the DOLS. The largest bias of the DOLS with four lags and two leads, DOLS(4, 2), is at most 0.02 in all cases except N = 1 and T = 20. This compares with a simulation standard error (in parentheses) that is less than 0.007 when N ≥ 20 and T ≥ 60, which confirms the accuracy of the DOLS(4, 2). The DOLS with two lags and one lead, DOLS(2, 1), starts off slightly biased at N = 1 and T = 20 and converges to an almost unbiased coefficient estimate at N = 20 and T = 40. The biases of DOLS(2, 1) move in the opposite direction to those of DOLS(4, 2).

Figures 1, 3, 5 and 7 display estimated pdfs of the estimators for $\theta_{21} = 0.4$ and $\sigma_{21} = 0.4$ with N = 40 (T = 20 in Figure 1, T = 40 in Figure 3, T = 60 in Figure 5 and T = 120 in Figure 7). In Figure 1 (N = 40, T = 20), the DOLS is much better centered than the OLS and FMOLS. In Figures 3, 5 and 7, the biases of the OLS and FMOLS shrink as T increases, but the DOLS still dominates both.

Monte Carlo means and standard deviations of the t-statistic, $t_{\beta=\beta_0}$, are given in Table 3. Here, the OLS t-statistic is the conventional t-statistic as printed by standard statistical packages, and the FMOLS and DOLS t-statistics. For all values of $\theta_{21}$ and $\sigma_{21}$, the DOLS(4, 2) t-statistic is well approximated by the standard N(0, 1), as the asymptotic results suggest. The DOLS(4, 2) t-statistic is much closer to the standard normal density than either the OLS or the FMOLS t-statistic. When $\sigma_{21} > 0$ and $\theta_{21} < 0$, the OLS t-statistic is more heavily biased than the FMOLS t-statistic. When $\sigma_{21} > 0$ and $\theta_{21} > 0$, the opposite is again true. Even when $\theta_{21} = 0$, the FMOLS t-statistic is not well approximated by the standard N(0, 1). The OLS t-statistic performs better than the FMOLS t-statistic when $\sigma_{21} = 0.8$ and $\theta_{21} > 0$, and when $\theta_{21} \le 0.4$ and $\sigma_{21} = 0.8$, but not in other cases. In general, the FMOLS t-statistic does not perform better than the OLS t-statistic. Table 4 shows that both the OLS and the FMOLS t-statistics become more negatively biased as the cross-section dimension N increases. The heavily negative biases of the FMOLS t-statistic in Tables 3–4 again indicate the poor performance of the FMOLS estimator. For the DOLS(4, 2), the biases decrease rapidly and the standard deviations converge to 1.0 as T increases.
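Each cell of Tables 2–4 is a mean and a standard deviation across Monte Carlo replications. The sketch below shows that bookkeeping for the within (fixed-effects) OLS estimator and its conventional t-statistic, i.e. the statistic a standard package would print. It makes these assumptions:

- The errors follow a VMA(1) with $N(0, [[1, \sigma_{21}], [\sigma_{21}, 1]])$ innovations.
- $\alpha_i \sim U[0, 10]$ and $\beta = 2$.
- The default MA matrix uses the entries 0.3, 0.4 and 0.6 as they are legible in this chapter's DGP displays. Whether the 0.4 entry carries a minus sign in the original (31) is not verified here.

All function names are hypothetical.

```python
import numpy as np


def simulate_panel(N, T, theta21, sigma21, beta=2.0, ma=None, rng=None):
    """Simulate y_it = alpha_i + beta x_it + u_it with x_it = x_it-1 + eps_it.

    (u_it, eps_it)' = (u*_it, eps*_it)' + M (u*_it-1, eps*_it-1)', where
    (u*, eps*) ~ iid N(0, [[1, sigma21], [sigma21, 1]]).  The default M is
    an assumption of this sketch (entry signs not verified).
    """
    rng = np.random.default_rng() if rng is None else rng
    if ma is None:
        ma = np.array([[0.3, 0.4], [theta21, 0.6]])
    cov = np.array([[1.0, sigma21], [sigma21, 1.0]])
    # One extra period supplies the lagged innovation for the first t.
    star = rng.multivariate_normal(np.zeros(2), cov, size=(N, T + 1))
    errs = star[:, 1:, :] + star[:, :-1, :] @ ma.T
    u, eps = errs[..., 0], errs[..., 1]
    x = np.cumsum(eps, axis=1)                    # random walk started at 0
    alpha = rng.uniform(0.0, 10.0, size=(N, 1))   # alpha_i ~ U[0, 10]
    return alpha + beta * x + u, x


def within_ols_t(y, x, beta0):
    """Within-OLS slope and its conventional t-statistic for H0: beta = beta0."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    sxx = np.sum(xd * xd)
    b = np.sum(xd * yd) / sxx
    N, T = y.shape
    s2 = np.sum((yd - b * xd) ** 2) / (N * T - N - 1)   # LSDV degrees of freedom
    return b, (b - beta0) / np.sqrt(s2 / sxx)


def monte_carlo(N, T, theta21, sigma21, reps=500, beta=2.0, ma=None, seed=0):
    """Mean and standard deviation of the estimation error and the t-statistic."""
    rng = np.random.default_rng(seed)
    draws = np.array([
        within_ols_t(*simulate_panel(N, T, theta21, sigma21, beta, ma, rng), beta)
        for _ in range(reps)
    ])
    bias, tstat = draws[:, 0] - beta, draws[:, 1]
    return {"bias_mean": bias.mean(), "bias_sd": bias.std(ddof=1),
            "t_mean": tstat.mean(), "t_sd": tstat.std(ddof=1)}
```

A call such as `monte_carlo(40, 40, theta21=0.4, sigma21=0.4)` produces OLS entries of the same kind as one cell of Tables 3–4. This holds only under the assumptions above; the sketch makes no claim to reproduce the published numbers.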
As with Table 2, we observe from Table 4 that for the DOLS t-statistic the T dimension matters more than the N dimension in reducing the biases of the t-statistics. However, the improvement in the DOLS t-statistic as T increases is rather marginal. Figures 2, 4, 6 and 8 display estimated pdfs of the t-statistics for $\theta_{21} = 0.4$ and $\sigma_{21} = 0.4$ with N = 40 (T = 20 in Figure 2, T = 40 in Figure 4, T = 60 in Figure 6 and T = 120 in Figure 8). The figures clearly show that the DOLS t-statistic is well approximated by the standard N(0, 1), especially as T increases. From the results in Tables 2 and 4 and Figures 1–8, we note that the sequential limit theory approximates the limiting distributions of the DOLS and its t-statistic very well.

It is known that when the time series is short, the estimate $\hat\Omega$ in (15) may be sensitive to the length of the bandwidth. In Tables 2 and 4, we first investigate the sensitivity of the FMOLS estimator to the choice of bandwidth length. We extend the experiments by changing the lag length of the Bartlett window from 5 to 2. Overall, the results show that this change does not lead to substantial changes in the biases of the FMOLS estimator and its t-statistic. However, the biases of the DOLS estimator and its t-statistic are reduced substantially when the lags and leads are changed from (2, 1) to (4, 2), as predicted by Theorem 3. The results in Tables 2 and 4 show that the DOLS method gives different estimates of $\beta$ and of the t-statistic depending on the number of lags and leads chosen. This seems to be a drawback of the DOLS estimator. Further research is needed on how to choose the lags and leads for the DOLS estimator in the panel setting.

B. ARMA(1, 1) Error Terms

In this section, we look at simulations where the errors are generated by an ARMA(1, 1) process, as in (30), instead of by an MA(1) process, as in (31). One may argue that the MA(1) specification in (31) is unfair to the FMOLS estimator. One reason the DOLS performs much better than the FMOLS lies in the simulation design in (31), which assumes that the error terms are MA(1) processes. If $(u_{it}, \varepsilon_{it})$ is an MA(1) process, then $u_{it}$ can be written exactly with three terms, $\varepsilon_{it-1}$, $\varepsilon_{it}$ and $\varepsilon_{it+1}$, and no lag-truncation approximation is required for the DOLS. Tables 5 and 6 report the performance of the OLS, FMOLS, and DOLS estimators and their t-statistics when the errors are generated by an ARMA(1, 1) process. They show that the FMOLS estimator and its t-statistic are less biased than the OLS estimator in most cases, and that both are outperformed by the DOLS. Again, when $\theta_{21} \le 0.0$ and $\sigma_{21} = 0.8$, the FMOLS estimator and its t-statistic suffer from severe biases.
On the other hand, the improvement of the DOLS over the OLS and FMOLS is smaller than in Tables 1 and 3. In addition, the good performance of the DOLS may disappear for a high-order ARMA(p, q) error process.

C. Non-normal Errors

In this section, we conduct an experiment in which the error terms are non-normal. The DGP is similar to that of Gonzalo (1994):
Table 5.
D OLS 21 = 0.4 FM D OLS D 21 = 0.8 FM
OLS
21 = 0.8 FM
21 = 0.8 T = 20
T = 40
T = 60
21 = 0.4 T = 20
0.110 (0.042) 0.052 (0.015) 0.034 (0.008)
0.101 (0.038) 0.052 (0.014) 0.035 (0.008)
0.003 (0.037) 0.001 (0.012) 0.000 (0.007)
0.049 (0.029) 0.024 (0.010) 0.015 (0.006)
0.062 (0.020) 0.031 (0.011) 0.021 (0.006)
0.000 (0.030) 0.000 (0.010) 0.000 (0.005)
0.009 (0.011) 0.004 (0.004) 0.003 (0.002)
0.036 (0.012) 0.017 (0.004) 0.012 (0.002)
0.003 (0.009) 0.001 (0.003) 0.001 (0.002)
T = 40
T = 60
21 = 0.0 T = 20
0.073 (0.032) 0.034 (0.011) 0.022 (0.006)
0.039 (0.024) 0.020 (0.008) 0.013 (0.004)
0.001 (0.024) 0.000 (0.008) 0.000 (0.004)
0.045 (0.028) 0.021 (0.010) 0.013 (0.005)
0.038 (0.027) 0.019 (0.009) 0.012 (0.005)
0.000 (0.028) 0.000 (0.009) 0.000 (0.005)
0.006 (0.013) 0.002 (0.004) 0.002 (0.002)
0.037 (0.014) 0.017 (0.004) 0.012 (0.002)
0.001 (0.012) 0.001 (0.004) 0.000 (0.002)
T = 40
T = 60
21 = 0.8 T = 20
0.046 (0.025) 0.021 (0.009) 0.014 (0.005)
0.006 (0.015) 0.003 (0.005) 0.002 (0.003)
0.001 (0.015) 0.000 (0.005) 0.001 (0.003)
0.035 (0.025) 0.016 (0.008) 0.011 (0.005)
0.013 (0.022) 0.006 (0.007) 0.004 (0.004)
0.001 (0.023) 0.001 (0.008) 0.001 (0.004)
0.001 (0.016) 0.001 (0.006) 0.000 (0.003)
0.034 (0.016) 0.016 (0.005) 0.010 (0.003)
0.003 (0.015) 0.001 (0.005) 0.002 (0.003)
T = 40
T = 60
0.020 (0.016) 0.008 (0.005) 0.006 (0.003)
0.017 (0.009) 0.008 (0.003) 0.005 (0.001)
0.002 (0.007) 0.002 (0.002) 0.001 (0.001)
0.016 (0.017) 0.007 (0.006) 0.005 (0.003)
0.017 (0.013) 0.008 (0.004) 0.005 (0.002)
0.003 (0.012) 0.001 (0.004) 0.001 (0.002)
0.035 (0.024) 0.016 (0.009) 0.011 (0.005)
0.012 (0.024) 0.007 (0.009) 0.005 (0.005)
0.000 (0.031) 0.000 (0.009) 0.000 (0.005)
Note: (a) N = T. (b) A lag length of 5 is used for the Bartlett window in the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator. (d) The error terms are generated by an ARMA(1,1) process from equation (30).
Table 6.
DOLS OLS 21 = 0.4 FMOLS DOLS OLS DOLS 21 = 0.8 FMOLS
OLS
21 = 0.8 FMOLS
21 = 0.8 T = 20
T = 40
T = 60
21 = 0.4 T = 20
5.316 (1.929) 7.013 (1.903) 8.437 (1.899)
3.569 (1.323) 4.601 (1.219) 5.22 (1.195)
0.119 (1.290) 0.090 (1.119) 0.068 (1.077)
3.411 (1.924) 4.583 (1.949) 5.523 (1.969)
2.912 (1.390) 3.580 (1.216) 4.206 (1.178)
0.006 (1.417) 0.009 (1.166) 0.006 (1.111)
1.158 (1.426) 1.723 (1.445) 2.097 (1.435)
4.589 (1.420) 6.144 (1.343) 7.428 (1.294)
0.347 (1.139) 0.505 (1.011) 0.603 (0.978)
T = 40
T = 60
21 = 0.0 T = 20
4.152 (1.762) 5.424 (1.733) 6.521 (1.721)
1.857 (1.106) 2.576 (1.044) 3.179 (1.036)
0.056 (1.132) 0.045 (1.027) 0.034 (1.004)
3.064 (1.867) 4.069 (1.880) 4.899 (1.898)
1.877 (1.314) 2.346 (1.149) 2.779 (1.114)
0.025 (1.388) 0.011 (1.152) 0.027 (1.096)
0.705 (1.454) 1.099 (1.479) 1.343 (1.473)
3.858 (1.373) 5.034 (1.268) 6.016 (1.211)
0.068 (1.208) 0.134 (1.053) 0.144 (1.014)
T = 40
T = 60
21 = 0.8 T = 20
3.184 (1.644) 4.120 (1.616) 4.952 (1.599)
0.353 (0.952) 0.624 (0.897) 0.827 (0.904)
0.034 (0.956) 0.047 (0.909) 0.058 (0.913)
2.538 (1.769) 3.327 (1.771) 4.131 (1.746)
0.732 (1.226) 0.967 (1.085) 1.141 (1.021)
0.038 (1.313) 0.075 (1.116) 0.206 (1.118)
0.047 (1.498) 0.194 (1.528) 0.064 (1.498)
2.825 (1.327) 3.557 (1.194) 4.005 (1.096)
0.230 (1.276) 0.212 (1.095) 0.693 (1.094)
T = 40
T = 60
1.956 (1.529) 2.471 (1.507) 2.966 (1.484)
1.733 (0.933) 2.511 (0.871) 3.270 (0.897)
0.214 (0.663) 0.317 (0.664) 0.428 (0.694)
1.496 (1.589) 1.888 (1.578) 2.267 (1.571)
1.429 (1.015) 1.917 (1.010) 2.237 (0.999)
0.221 (1.052) 0.294 (0.956) 0.363 (0.941)
2.315 (1.577) 3.089 (1.644) 3.736 (1.676)
0.564 (1.195) 0.876 (1.088) 1.132 (1.062)
0.002 (1.551) 0.005 (1.239) 0.003 (1.155)
Note: (a) N = T. (b) A lag length of 5 is used for the Bartlett window in the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator. (d) The error terms are generated by an ARMA(1,1) process from equation (30).

$$\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} + \begin{pmatrix} 0.3 & 0.4 \\ \theta_{21} & 0.6 \end{pmatrix} \begin{pmatrix} u^*_{it-1} \\ \varepsilon^*_{it-1} \end{pmatrix} \qquad (32)$$

and

$$u^*_{it} = 0.5\,\varepsilon^*_{it} + \left(1 - 0.5^2\right)^{1/2} u^{**}_{it}, \qquad \varepsilon^*_{it} = \varepsilon^{**}_{it},$$

where $u^{**}_{it}$ and $\varepsilon^{**}_{it}$ are independent exponential random variables with parameter 1. The results in Tables 7–8 show that although the DOLS estimator performs better in terms of bias, the distribution of the DOLS t-statistic is far from the asymptotic N(0, 1). The standard deviations of the DOLS t-statistic are badly underestimated. To summarize the results so far, the DOLS estimator appears to be the best estimator overall. However, the standard error of the DOLS t-statistic shows a significant downward bias when the error terms are generated from non-normal distributions.

D. Heterogeneous Panel

In Sections A–C we compared the small-sample properties of the OLS, FMOLS, and DOLS estimators and concluded that the DOLS estimator and its t-statistic generally exhibit the least bias. One reason for the poor performance of the FMOLS estimator in the homogeneous panel is that the FMOLS estimator needs a kernel estimator of the asymptotic covariance matrix, while the DOLS does not. By contrast, in the heterogeneous panel both the DOLS in (20) and the OLS in (33) use kernel estimators. Consequently, one may expect that the much better performance of the DOLS estimator in Sections 5A–C is limited to very specialized cases, e.g. the homogeneous panel. To test this, we now compare the performance of the OLS, FMOLS, and DOLS estimators in a heterogeneous panel, using Monte Carlo experiments similar to those in Section 5A. The DGP is $y_{it} = \alpha_i + \beta x_{it} + u_{it}$ and $x_{it} = x_{it-1} + \varepsilon_{it}$ for $i = 1, \ldots, N$, $t = 1, \ldots, T$, where
Table 7.
D OLS = 0.5 FM D OLS D =1 FM
OLS
= 0.25 FM
21 = 0.8 T = 20
T = 40
T = 60
21 = 0.4 T = 20
0.005 (0.009) 0.001 (0.002) 0.001 (0.001)
0.011 (0.009) 0.003 (0.002) 0.001 (0.001)
0.000 (0.002) 0.000 (0.000) 0.000 (0.000)
0.002 (0.006) 0.001 (0.001) 0.000 (0.001)
0.007 (0.006) 0.002 (0.001) 0.001 (0.001)
0.000 (0.003) 0.028 (0.001) 0.000 (0.000)
0.001 (0.003) 0.000 (0.001) 0.000 (0.000)
0.004 (0.003) 0.001 (0.001) 0.001 (0.000)
0.000 (0.002) 0.000 (0.000) 0.000 (0.000)
T = 40
T = 60
21 = 0.0 T = 20
0.002 (0.009) 0.002 (0.004) 0.001 (0.002)
0.008 (0.009) 0.005 (0.004) 0.002 (0.002)
0.001 (0.005) 0.000 (0.001) 0.000 (0.001)
0.002 (0.009) 0.000 (0.002) 0.000 (0.001)
0.008 (0.009) 0.002 (0.002) 0.001 (0.001)
0.000 (0.005) 0.000 (0.001) 0.000 (0.001)
0.001 (0.004) 0.000 (0.001) 0.000 (0.000)
0.005 (0.004) 0.001 (0.001) 0.001 (0.000)
0.000 (0.002) 0.000 (0.001) 0.000 (0.000)
T = 40
T = 60
21 = 0.8 T = 20
0.012 (0.058) 0.003 (0.014) 0.001 (0.007)
0.010 (0.057) 0.002 (0.014) 0.001 (0.007)
0.001 (0.054) 0.000 (0.013) 0.000 (0.006)
0.005 (0.017) 0.001 (0.004) 0.001 (0.002)
0.007 (0.016) 0.002 (0.004) 0.001 (0.002)
0.001 (0.014) 0.000 (0.003) 0.000 (0.002)
0.001 (0.005) 0.000 (0.001) 0.000 (0.001)
0.005 (0.005) 0.001 (0.001) 0.001 (0.001)
0.000 (0.003) 0.000 (0.001) 0.000 (0.000)
T = 40
T = 60
0.011 (0.013) 0.003 (0.003) 0.001 (0.001)
0.022 (0.012) 0.006 (0.003) 0.003 (0.001)
0.000 (0.002) 0.000 (0.001) 0.000 (0.000)
0.034 (0.020) 0.009 (0.005) 0.004 (0.002)
0.049 (0.019) 0.014 (0.005) 0.007 (0.002)
0.001 (0.013) 0.000 (0.003) 0.000 (0.001)
0.039 (0.016) 0.012 (0.004) 0.005 (0.002)
0.008 (0.014) 0.003 (0.004) 0.002 (0.002)
0.000 (0.013) 0.000 (0.003) 0.000 (0.001)
Note: (a) N = T. (b) A lag length of 5 is used for the Bartlett window in the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator. (d) The error terms are non-normal.
Table 8.
DOLS OLS = 0.5 FMOLS DOLS OLS DOLS =1 FMOLS
OLS
= 0.25 FMOLS
T = 20
T = 40
T = 60
21 = 0.4 T = 20
0.699 (1.311) 0.717 (1.253) 0.741 (1.267)
1.248 (0.940) 0.892 (0.599) 0.738 (0.488)
0.006 (0.209) 0.002 (0.139) 0.002 (0.113)
0.472 (1.245) 0.484 (1.191) 0.506 (1.199)
1.055 (0.931) 0.752 (0.597) 0.623 (0.483)
0.039 (0.421) 0.003 (0.276) 0.028 (0.227)
0.406 (1.040) 0.424 (0.981) 0.445 (0.979)
1.265 (0.925) 0.918 (0.588) 0.764 (0.472)
0.118 (0.520) 0.096 (0.336) 0.088 (0.276)
T = 40
T = 60
21 = 0.0 T = 20
0.259 (1.243) 0.587 (1.250) 0.611 (1.264)
0.884 (0.932) 0.787 (0.599) 0.651 (0.488)
0.071 (0.561) 0.007 (0.230) 0.008 (0.188)
0.259 (1.243) 0.268 (1.189) 0.289 (1.197)
0.884 (0.932) 0.626 (0.599) 0.519 (0.485)
0.071 (0.561) 0.054 (0.363) 0.052 (0.299)
0.199 (1.040) 0.213 (0.981) 0.232 (0.978)
1.152 (0.927) 0.831 (0.589) 0.692 (0.474)
0.019 (0.567) 0.016 (0.368) 0.020 (0.304)
T = 40
T = 60
21 = 0.8 T = 20
0.275 (1.271) 0.282 (1.231) 0.264 (1.248)
0.164 (0.941) 0.106 (0.616) 0.093 (0.505)
0.014 (0.896) 0.013 (0.579) 0.002 (0.477)
0.340 (1.236) 0.347 (1.186) 0.332 (1.193)
0.398 (0.941) 0.268 (0.611) 0.226 (0.497)
0.031 (0.784) 0.025 (0.509) 0.013 (0.421)
0.145 (1.041) 0.141 (0.982) 0.125 (0.978)
0.961 (0.931) 0.685 (0.594) 0.570 (0.478)
0.066 (0.619) 0.053 (0.407) 0.039 (0.337)
T = 40
T = 60
1.104 (1.326) 1.134 (1.262) 1.163 (1.274)
1.714 (0.951) 1.249 (0.605) 1.036 (0.492)
0.000 (0.189) 0.001 (0.126) 0.001 (0.102)
2.286 (1.278) 2.368 (1.208) 2.416 (1.214)
2.528 (0.976) 1.947 (0.633) 1.637 (0.513)
0.035 (0.650) 0.035 (0.446) 0.033 (0.363)
2.749 (1.067) 2.946 (0.992) 3.011 (0.981)
0.539 (0.984) 0.598 (0.672) 0.538 (0.554)
0.026 (0.899) 0.008 (0.624) 0.002 (0.525)
Note: (a) N = T. (b) A lag length of 5 is used for the Bartlett window in the FMOLS estimator. (c) Four lags and two leads are used for the DOLS estimator. (d) The error terms are non-normal.
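$$\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} + \begin{pmatrix} 0.3 & 0.4 \\ \theta_{21} & 0.6 \end{pmatrix} \begin{pmatrix} u^*_{it-1} \\ \varepsilon^*_{it-1} \end{pmatrix}$$

with

$$\begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} \overset{\text{iid}}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{21} \\ \sigma_{21} & 1 \end{pmatrix} \right).$$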
As in Section A, we generated $\alpha_i$ from a uniform distribution, U[0, 10], and set $\beta = 2$. To generate the heterogeneous panel, we allowed $\theta_{21}$ and $\sigma_{21}$ to be random in this section: both are drawn from a uniform distribution, U[−0.8, 0.8]. We hold these values fixed across simulations. An estimate of $\Omega_i$, $\hat\Omega_i = \hat\Sigma_i + \hat\Gamma_i + \hat\Gamma_i'$, was obtained with COINT 2.0 using a Bartlett window. The lag truncation number was set at 5. The three estimators considered are the FMOLS, DOLS, and OLS, where the OLS is defined as

$$\hat\beta^*_{OLS} = \left[\sum_{i=1}^N \sum_{t=1}^T (x^{**}_{it} - \bar x^{**}_i)(x^{**}_{it} - \bar x^{**}_i)'\right]^{-1} \sum_{i=1}^N \sum_{t=1}^T (x^{**}_{it} - \bar x^{**}_i)\, y^{**}_{it}, \qquad (33)$$

with $x^{**}_{it} = w_i x_{it}$, $y^{**}_{it} = w_i y_{it}$, $\bar x^{**}_i = \frac{1}{T}\sum_{t=1}^T x^{**}_{it}$, and $w_i = [\hat\Omega_i]_{11}$. Two FMOLS estimators are considered: one uses a lag length of 5 (FMOLS(5)) and the other a lag length of 2 (FMOLS(2)). Two DOLS estimators are also considered: DOLS with four lags and two leads, DOLS(4, 2), and DOLS with two lags and one lead, DOLS(2, 1). Table 9 shows that the relatively good performance of the DOLS estimator in the homogeneous panel carries over to the heterogeneous panel. The biases of the OLS and FMOLS estimators are substantial, and again the DOLS outperforms both. Note from Table 9 that the FMOLS always has more bias than the OLS for all N and T except when N = 1. This poor performance of the FMOLS in heterogeneous panels indicates that the FMOLS in Section 4 is not recommended in practice. A possible reason is that the FMOLS has to go through two non-parametric corrections, as in (22) and (23). Therefore the failure of the non-parametric correction could be very severe for the FMOLS estimator in heterogeneous panels. Pedroni (1996) proposed several alternative versions of the FMOLS estimator, such as an FMOLS estimator based on a transformation of the estimated residuals and a group-mean based FMOLS estimator. It would be interesting to study estimation and inference in heterogeneous panels further, but this goes beyond the scope of this chapter.

Table 9.
Mean Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators for Different N and T in a Heterogeneous Panel

(N,T)       OLS            FM(5)          FM(2)          D(4,2)         D(2,1)
(1,20)      0.102 (0.163)  0.076 (0.319)  0.008 (0.212)  0.011 (0.405)  0.004 (0.264)
(1,40)      0.052 (0.079)  0.006 (0.116)  0.018 (0.084)  0.001 (0.121)  0.006 (0.099)
(1,60)      0.035 (0.052)  0.004 (0.066)  0.014 (0.050)  0.001 (0.071)  0.005 (0.061)
(1,120)     0.018 (0.026)  0.008 (0.027)  0.009 (0.023)  0.000 (0.030)  0.002 (0.029)
(20,20)     0.025 (0.032)  0.069 (0.054)  0.073 (0.034)  0.000 (0.054)  0.006 (0.040)
(20,40)     0.016 (0.014)  0.041 (0.019)  0.035 (0.014)  0.001 (0.020)  0.004 (0.017)
(20,60)     0.012 (0.009)  0.028 (0.011)  0.023 (0.009)  0.000 (0.012)  0.003 (0.011)
(20,120)    0.006 (0.004)  0.014 (0.005)  0.011 (0.004)  0.000 (0.005)  0.002 (0.005)
(40,20)     0.023 (0.024)  0.089 (0.038)  0.083 (0.024)  0.000 (0.038)  0.007 (0.028)
(40,40)     0.015 (0.009)  0.048 (0.013)  0.039 (0.009)  0.001 (0.014)  0.004 (0.012)
(40,60)     0.013 (0.006)  0.032 (0.008)  0.026 (0.006)  0.000 (0.009)  0.003 (0.008)
(40,120)    0.014 (0.004)  0.014 (0.004)  0.012 (0.003)  0.000 (0.003)  0.002 (0.004)
(60,20)     0.023 (0.019)  0.073 (0.031)  0.074 (0.019)  0.001 (0.031)  0.006 (0.023)
(60,40)     0.015 (0.008)  0.042 (0.011)  0.036 (0.008)  0.001 (0.011)  0.004 (0.009)
(60,60)     0.011 (0.005)  0.029 (0.006)  0.023 (0.005)  0.000 (0.007)  0.003 (0.006)
(60,120)    0.006 (0.002)  0.014 (0.003)  0.011 (0.002)  0.000 (0.003)  0.002 (0.003)
(120,20)    0.022 (0.014)  0.075 (0.003)  0.072 (0.022)  0.001 (0.022)  0.016 (0.011)
(120,40)    0.015 (0.006)  0.042 (0.008)  0.036 (0.006)  0.001 (0.008)  0.004 (0.007)
(120,60)    0.011 (0.004)  0.029 (0.004)  0.024 (0.004)  0.000 (0.005)  0.003 (0.004)
(120,120)   0.006 (0.002)  0.014 (0.002)  0.011 (0.002)  0.000 (0.002)  0.002 (0.002)

Note: (a) Lag lengths of 5 and 2 are used for the Bartlett window in the FMOLS(5) and FMOLS(2) estimators, respectively. (b) Four lags and two leads, and two lags and one lead, are used for the DOLS(4,2) and DOLS(2,1) estimators, respectively. (c) $\theta_{21} \sim U[-0.8, 0.8]$ and $\sigma_{21} \sim U[-0.8, 0.8]$.

From Table 10, we note that the DOLS t-statistics tend to have heavier tails than the asymptotic distribution theory predicts, although the bias of the DOLS t-statistic is much lower than those of the OLS and FMOLS t-statistics. The DOLS still appears to be the best estimator overall in a heterogeneous panel.
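The Section D design can be sketched as follows. It assumes the VMA(1) error form of this section's DGP, with unit-specific $\theta_{21}$ and $\sigma_{21}$. The heterogeneity parameters and $\alpha_i$ are drawn once and held fixed, and each replication redraws only the innovations. A per-unit long-run covariance $\hat\Omega_i = \hat\Sigma_i + \hat\Gamma_i + \hat\Gamma_i'$ of $(\hat u_{it}, \Delta x_{it})$ is then estimated with a Bartlett window and lag truncation 5.

Several choices here are assumptions:

- The Bartlett weights $1 - j/(\ell + 1)$ follow the common Newey–West convention. COINT 2.0's exact implementation is not reproduced.
- The sign of the 0.4 MA entry is not verified.
- The residuals $\hat u_{it}$ come from within OLS.

All function names are hypothetical.

```python
import numpy as np


def bartlett_lrv(w, lag=5):
    """Bartlett-kernel long-run covariance of a (T, k) series (hypothetical helper).

    Returns Sigma_hat + sum_{j=1}^{lag} (1 - j/(lag+1)) (Gamma_j + Gamma_j'),
    where Gamma_j = (1/T) sum_t w_t w_{t-j}' is computed on the demeaned series.
    """
    w = np.asarray(w, dtype=float)
    w = w[:, None] if w.ndim == 1 else w
    w = w - w.mean(axis=0)
    T = w.shape[0]
    omega = w.T @ w / T
    for j in range(1, lag + 1):
        gamma_j = w[j:].T @ w[:-j] / T
        omega = omega + (1.0 - j / (lag + 1.0)) * (gamma_j + gamma_j.T)
    return omega


def draw_heterogeneous_params(N, rng):
    """theta21_i, sigma21_i ~ U[-0.8, 0.8] and alpha_i ~ U[0, 10], drawn once."""
    return {
        "theta21": rng.uniform(-0.8, 0.8, size=N),
        "sigma21": rng.uniform(-0.8, 0.8, size=N),
        "alpha": rng.uniform(0.0, 10.0, size=N),
    }


def simulate_heterogeneous_panel(params, T, rng, beta=2.0, m12=0.4):
    """One replication with fixed unit parameters; m12's sign is an assumption."""
    N = params["theta21"].size
    y, x = np.empty((N, T)), np.empty((N, T))
    for i in range(N):
        s = params["sigma21"][i]
        ma = np.array([[0.3, m12], [params["theta21"][i], 0.6]])
        star = rng.multivariate_normal([0.0, 0.0], [[1.0, s], [s, 1.0]], size=T + 1)
        errs = star[1:] + star[:-1] @ ma.T     # columns: u_it, eps_it
        x[i] = np.cumsum(errs[:, 1])
        y[i] = params["alpha"][i] + beta * x[i] + errs[:, 0]
    return y, x


def unit_long_run_covariances(y, x, lag=5):
    """Omega_hat_i of (u_hat_it, Delta x_it) per unit, with u_hat from within OLS."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    b = np.sum(xd * yd) / np.sum(xd * xd)
    u_hat = yd - b * xd
    dx = np.diff(x, axis=1)
    return np.array([bartlett_lrv(np.column_stack([u_hat[i, 1:], dx[i]]), lag)
                     for i in range(y.shape[0])])
```

The example below shows why kernel-based corrections depend on the bandwidth. Take an MA(1) series $w_t = e_t + 0.5e_{t-1}$ with unit-variance $e_t$; its true long-run variance is 2.25. The Bartlett estimator, however, converges to $1.25 + (5/3)(0.5) \approx 2.083$ with $\ell = 5$ and to $1.25 + (4/3)(0.5) \approx 1.917$ with $\ell = 2$.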
VI. CONCLUSION
This chapter discusses limiting distributions for the OLS, FMOLS, and DOLS estimators in a cointegrated regression. We also investigate the finite sample properties of the OLS, FMOLS, and DOLS estimators. The results from the Monte Carlo simulations can be summarized as follows. First, for the homogeneous panel, the OLS is the most biased estimator when the serial correlation parameter, $\theta_{21}$, and the endogeneity parameter, $\sigma_{21}$, are both negative. The OLS is biased in almost all cases for the heterogeneous panel. Second, for the homogeneous panel, the FMOLS is more biased than the OLS when $\theta_{21} \ge 0$ and $\sigma_{21} > 0$. The FMOLS is severely biased for the heterogeneous panel in almost all trials. This indicates that the failure of the non-parametric correction is very serious, especially in the heterogeneous panel. Third, the DOLS performs very well in all cases for both the homogeneous and heterogeneous panels. Increasing the number of leads and lags reduces the bias of the DOLS substantially, as predicted by the asymptotic theory in Theorem 3. Fourth, the sequential limit theory approximates the limiting distributions of the DOLS and its t-statistic very well. All in all, our findings are as follows. (i) The OLS estimator has a non-negligible bias in finite samples. (ii) The FMOLS estimator does not, in general, improve on the OLS estimator. (iii) The FMOLS estimator is complicated by the dependence of the correction terms on the preliminary estimator (here we use OLS), which may be very biased in finite samples with panel data. More seriously, the failure of the non-parametric correction for the FMOLS in panel data could be severe. This indicates that the DOLS estimator may be more promising than the OLS or FMOLS estimators for estimating cointegrated panel regressions.
Table 10.
Mean Biases and Standard Deviations of t-statistics for Different N and T in a Heterogeneous Panel

(N,T)       OLS            FMOLS(5)       FMOLS(2)       DOLS(4,2)      DOLS(2,1)
(1,20)      0.893 (1.390)  0.588 (2.473)  0.058 (1.643)  0.093 (3.303)  0.029 (2.156)
(1,40)      0.861 (1.265)  0.101 (1.849)  0.280 (1.331)  0.009 (1.980)  0.106 (1.618)
(1,60)      0.844 (1.233)  0.095 (1.579)  0.347 (1.207)  0.016 (1.729)  0.119 (1.489)
(1,120)     0.845 (1.212)  0.372 (1.336)  0.459 (1.139)  0.016 (1.510)  0.101 (1.405)
(20,20)     1.221 (1.578)  2.411 (1.902)  2.530 (1.192)  0.010 (1.983)  0.219 (1.468)
(20,40)     1.629 (1.344)  2.899 (1.345)  2.518 (0.999)  0.059 (1.485)  0.271 (1.259)
(20,60)     1.774 (1.282)  3.031 (1.195)  2.508 (0.952)  0.004 (1.329)  0.347 (1.184)
(20,120)    1.957 (1.239)  3.095 (1.047)  2.466 (0.907)  0.046 (1.197)  0.393 (1.121)
(40,20)     1.612 (1.640)  4.381 (1.882)  4.079 (1.191)  0.039 (1.987)  0.365 (1.466)
(40,40)     2.194 (1.392)  4.807 (1.341)  3.969 (1.004)  0.068 (1.472)  0.432 (1.233)
(40,60)     2.417 (1.306)  4.905 (1.199)  3.932 (0.960)  0.007 (1.319)  0.515 (1.169)
(40,120)    2.832 (1.234)  4.886 (1.059)  3.839 (0.911)  0.099 (1.181)  0.608 (1.099)
(60,20)     1.946 (1.697)  4.408 (1.884)  4.474 (1.182)  0.041 (1.932)  0.408 (1.449)
(60,40)     2.715 (1.389)  5.171 (1.320)  4.407 (0.976)  0.110 (1.452)  0.472 (1.221)
(60,60)     3.045 (1.328)  5.361 (1.170)  4.380 (0.933)  0.027 (1.307)  0.572 (1.165)
(60,120)    3.346 (1.250)  5.420 (1.033)  4.281 (0.889)  0.105 (1.181)  0.697 (1.099)
(120,20)    2.675 (1.720)  6.382 (1.878)  6.383 (1.169)  0.073 (1.939)  0.580 (1.439)
(120,40)    3.802 (1.408)  7.399 (1.314)  6.272 (0.967)  0.145 (1.444)  0.683 (1.215)
(120,60)    4.269 (1.336)  7.633 (1.162)  6.209 (0.931)  0.047 (1.307)  0.803 (1.165)
(120,120)   4.715 (1.250)  7.723 (1.045)  6.084 (0.897)  0.136 (1.178)  0.977 (1.098)

Note: (a) Lag lengths of 5 and 2 are used for the Bartlett window in the FMOLS(5) and FMOLS(2) estimators, respectively. (b) Four lags and two leads, and two lags and one lead, are used for the DOLS(4,2) and DOLS(2,1) estimators, respectively. (c) $\theta_{21} \sim U[-0.8, 0.8]$ and $\sigma_{21} \sim U[-0.8, 0.8]$.
ACKNOWLEDGMENTS
We thank Suzanne McCoskey, Peter Pedroni, Andrew Levin and participants of the 1998 North American Winter Meetings of the Econometric Society for helpful comments, and Bangtian Chen for his research assistance on an earlier draft of this chapter. Thanks also go to Denise Paul for correcting my English and carefully checking the manuscript to enhance its readability. A Gauss program for this paper can be retrieved from http://web.syr.edu/~cdkao. Address correspondence to: Chihwa Kao, Center for Policy Research, 426 Eggers Hall, Syracuse University, Syracuse, NY 13244-1020; e-mail: cdkao@maxwell.syr.edu.
REFERENCES
Baltagi, B. (1995). Econometric Analysis of Panel Data. New York: John Wiley and Sons.
Baltagi, B., & Kao, C. (2000). Nonstationary Panels, Cointegration in Panels and Dynamic Panels: A Survey. Advances in Econometrics, 15, 7–51.
Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different Bargaining Levels Cointegrated? Applied Economics, 26, 353–361.
Chen, B., McCoskey, S., & Kao, C. (1999). Estimation and Inference of a Cointegrated Regression in Panel Data: A Monte Carlo Study. American Journal of Mathematical and Management Sciences, 19, 75–114.
Gonzalo, J. (1994). Five Alternative Methods of Estimating Long-Run Equilibrium Relationships. Journal of Econometrics, 60, 203–233.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Im, K., Pesaran, H., & Shin, Y. (1995). Testing for Unit Roots in Heterogeneous Panels. Manuscript, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44.
Kao, C., & Chen, B. (1995). On the Estimation and Inference for Cointegration in Panel Data When the Cross-Section and Time-Series Dimensions are Comparable. Manuscript, Center for Policy Research, Syracuse University.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: New Results. Discussion paper, Department of Economics, UC-San Diego.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and a New Simple Test: Evidence From Simulations and the Bootstrap. Oxford Bulletin of Economics and Statistics, 61, 631–652.
McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel Data. Econometric Reviews, 17, 57–84.
Pedroni, P. (1996). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of Purchasing Power Parity. Working paper, Department of Economics, No. 96-020, Indiana University.
Pedroni, P. (1997). Panel Cointegration: Asymptotics and Finite Sample Properties of Pooled Time Series Tests with an Application to the PPP Hypothesis. Working paper, Department of Economics, No. 95-013, Indiana University.
Pesaran, H., & Smith, R. (1995). Estimating Long-Run Relationships from Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79–113.
Phillips, P. C. B., & Durlauf, S. N. (1986). Multiple Time Series Regression with Integrated Processes. Review of Economic Studies, 53, 473–495.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., & Loretan, M. (1991). Estimating Long-Run Economic Equilibria. Review of Economic Studies, 58, 407–436.
Phillips, P. C. B., & Moon, H. (1999). Linear Regression Limit Theory for Non-stationary Panel Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. Annals of Statistics, 20, 971–1001.
Quah, D. (1994). Exploiting Cross Section Variation for Unit Root Inference in Dynamic Data. Economics Letters, 44, 9–19.
Saikkonen, P. (1991). Asymptotically Efficient Estimation of Cointegrating Regressions. Econometric Theory, 7, 1–21.
Stock, J., & Watson, M. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems. Econometrica, 61, 783–820.
Summers, R., & Heston, A. (1991). The Penn World Table: An Expanded Set of International Comparisons, 1950–1988. Quarterly Journal of Economics, 106, 327–368.
APPENDIX
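Proof of Theorem 3
First we write (19) in vector form:
$$y_i = e\alpha_i + x_i\beta + Z_{iq}C + v_i = x_i\beta + Z_iD + v_i \quad \text{(say)},$$
where $y_i$ is a $T \times 1$ vector of $y_{it}$; $e$ is a $T \times 1$ unit vector; $Z_{iq}$ is the $T \times 2q$ matrix of observations on the $2q$ regressors $\Delta x_{it-q}, \ldots, \Delta x_{it+q}$; $x_i$ is a $T \times k$ matrix of observations of $x_{it}$; $C$ is a $(2q) \times 1$ vector of $c_{ij}$; $v_i$ is a $T \times 1$ vector of $v_{it}$; $Z_i = (e, Z_{iq})$ is a $T \times (2q+1)$ matrix; and $D$ is a $(2q+1) \times 1$ vector of parameters. Let $Q_i = I - Z_i(Z_i'Z_i)^{-1}Z_i'$. It follows that
$$\hat{\beta}_D - \beta = \left[\sum_{i=1}^{N} x_i'Q_ix_i\right]^{-1}\left[\sum_{i=1}^{N} x_i'Q_iv_i\right].$$
We rescale $(\hat{\beta}_D - \beta)$ by $\sqrt{N}T$ to get
$$\sqrt{N}T(\hat{\beta}_D - \beta) = \left[\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}x_i'Q_ix_i\right]^{-1}\left[\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{1}{T}x_i'Q_iv_i\right] = \left[\frac{1}{N}\sum_{i=1}^{N}\zeta_{2iT}\right]^{-1}\left[\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\zeta_{1iT}\right] = [\zeta_{2NT}]^{-1}[\sqrt{N}\zeta_{1NT}],$$
where $\zeta_{1NT} = \frac{1}{N}\sum_{i=1}^{N}\zeta_{1iT}$, $\zeta_{1iT} = \frac{1}{T}x_i'Q_iv_i$, $\zeta_{2NT} = \frac{1}{N}\sum_{i=1}^{N}\zeta_{2iT}$, and $\zeta_{2iT} = \frac{1}{T^2}x_i'Q_ix_i$.
Observe that, from Saikkonen (1991),
$$\zeta_{2iT} = \frac{1}{T^2}x_i'Q_ix_i = \frac{1}{T^2}x_i'W_Tx_i + o_p(1) = \frac{1}{T^2}\sum_{t=q+1}^{T-q}(x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' + o_p(1) \Rightarrow \int \tilde{B}_{\varepsilon i}\tilde{B}_{\varepsilon i}'$$
and
$$\zeta_{1iT} = \frac{1}{T}x_i'Q_iv_i = \frac{1}{T}x_i'W_Tv_i + o_p(1) = \frac{1}{T}\sum_{t=q+1}^{T-q}(x_{it} - \bar{x}_i)v_{it} + o_p(1) \Rightarrow \int \tilde{B}_{\varepsilon i}\,dB_{u.\varepsilon i},$$
as $T \to \infty$ for all $i$, where $\tilde{B}_{\varepsilon i} = B_{\varepsilon i} - \int B_{\varepsilon i}$ and $W_T = I_T - \frac{1}{T}ee'$. Then, applying the multivariate Lindeberg-Lévy central limit theorem to $\int \tilde{B}_{\varepsilon i}\,dB_{u.\varepsilon i}$ and combining this with the limit of $\frac{1}{N}\sum_{i=1}^{N}\int \tilde{B}_{\varepsilon i}\tilde{B}_{\varepsilon i}'$ as in Theorem 2, we have
$$\frac{1}{N}\sum_{i=1}^{N}\int \tilde{B}_{\varepsilon i}\tilde{B}_{\varepsilon i}' \xrightarrow{p} \frac{1}{6}\Omega_{\varepsilon} \quad \text{and} \quad \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\int \tilde{B}_{\varepsilon i}\,dB_{u.\varepsilon i} \Rightarrow N\!\left(0, \frac{1}{6}\Omega_{u.\varepsilon}\Omega_{\varepsilon}\right)$$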
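as $N \to \infty$. It follows, using sequential limit theory, that
$$\sqrt{N}T(\hat{\beta}_D - \beta) \Rightarrow N\!\left(0, 6\Omega_{\varepsilon}^{-1}\Omega_{u.\varepsilon}\right),$$
as required.
Proof of Theorem 5
The proof is the same as that of Theorem 3. First, as in Theorem 3, we write (25) in vector form:
$$y_i^* = e\alpha_i + x_i^*\beta + Z_{iq}^*C + v_i^* = x_i^*\beta + Z_i^*D + v_i^* \quad \text{(say)},$$
and define $y_i^*$, $e$, $Z_{iq}^*$, $x_i^*$, $C$, $v_i^*$, $Z_i^*$, $D$, and $Q_i^*$ as in the proof of Theorem 3. Then we have
$$\sqrt{N}T(\hat{\beta}_D^* - \beta) = \left[\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}x_i^{*\prime}Q_i^*x_i^*\right]^{-1}\left[\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{1}{T}x_i^{*\prime}Q_i^*v_i^*\right] = \left[\frac{1}{N}\sum_{i=1}^{N}\xi_{2iT}\right]^{-1}\left[\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\xi_{1iT}\right] = [\xi_{2NT}]^{-1}[\sqrt{N}\xi_{1NT}],$$
where $\xi_{1NT} = \frac{1}{N}\sum_{i=1}^{N}\xi_{1iT}$, $\xi_{1iT} = \frac{1}{T}x_i^{*\prime}Q_i^*v_i^*$, $\xi_{2NT} = \frac{1}{N}\sum_{i=1}^{N}\xi_{2iT}$, and $\xi_{2iT} = \frac{1}{T^2}x_i^{*\prime}Q_i^*x_i^*$.
Observe that, from Assumption 8, we have
$$\xi_{2iT} = \frac{1}{T^2}x_i^{*\prime}Q_i^*x_i^* = \frac{1}{T^2}x_i^{*\prime}W_T^*x_i^* + o_p(1) = \frac{1}{T^2}\sum_{t=q_i+1}^{T-q_i}(x_{it}^* - \bar{x}_i^*)(x_{it}^* - \bar{x}_i^*)' + o_p(1) \Rightarrow \int \tilde{W}_i\tilde{W}_i'$$
and
$$\xi_{1iT} = \frac{1}{T}x_i^{*\prime}Q_i^*v_i^* = \frac{1}{T}x_i^{*\prime}W_T^*v_i^* + o_p(1) = \frac{1}{T}\sum_{t=q_i+1}^{T-q_i}(x_{it}^* - \bar{x}_i^*)v_{it}^* + o_p(1) \Rightarrow \int \tilde{W}_i\,dV_i,$$
as $T \to \infty$ for all $i$. The remainder of the proof follows that of Theorem 3.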
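The pooled estimator in the proof of Theorem 3, $\hat{\beta}_D = [\sum_i x_i'Q_ix_i]^{-1}\sum_i x_i'Q_iy_i$, can be computed without forming $Q_i$ explicitly: because $Q_i$ is symmetric and idempotent, $x_i'Q_iy_i$ is the inner product of the residuals from regressing $x_i$ and $y_i$ on $Z_i$. The plain-Python sketch below is illustrative only. It assumes a scalar regressor, a balanced panel, and a $Z_i$ made of a constant plus $\Delta x_{i,t+j}$ for $j = -q, \ldots, q$ (the contemporaneous difference included, an assumption of this sketch); the function names are hypothetical, and the first observation is lost to differencing.

```python
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(a)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(p):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[p:] for row in m]


def residualize(v, Z):
    """Q v = v - Z (Z'Z)^{-1} Z'v, i.e. the OLS residuals of v on the columns of Z."""
    p = len(Z[0])
    ztz = [[sum(r[a] * r[b] for r in Z) for b in range(p)] for a in range(p)]
    ztv = [sum(r[a] * x for r, x in zip(Z, v)) for a in range(p)]
    inv = invert(ztz)
    g = [sum(inv[a][b] * ztv[b] for b in range(p)) for a in range(p)]
    return [x - sum(ra * ga for ra, ga in zip(r, g)) for r, x in zip(Z, v)]


def panel_dols(ys, xs, q):
    """Pooled DOLS slope for a scalar regressor: sum_i x_i'Q_i y_i / sum_i x_i'Q_i x_i."""
    num = den = 0.0
    for y, x in zip(ys, xs):
        T = len(x)
        dx = [0.0] + [x[t] - x[t - 1] for t in range(1, T)]   # dx[0] is never used
        rows = range(q + 1, T - q)                              # dx[t-q], ..., dx[t+q] all exist
        Z = [[1.0] + [dx[t + j] for j in range(-q, q + 1)] for t in rows]
        qx = residualize([x[t] for t in rows], Z)               # Q_i x_i
        qy = residualize([y[t] for t in rows], Z)               # Q_i y_i
        num += sum(a * b for a, b in zip(qx, qy))
        den += sum(a * a for a in qx)
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    ys, xs = [], []
    for i in range(5):
        x, xp, yp = 0.0, [], []
        for _ in range(80):
            e = rng.gauss(0.0, 1.0)
            x += e                                              # I(1) regressor
            xp.append(x)
            yp.append(i + 2.0 * x + 0.8 * e + 0.5 * rng.gauss(0.0, 1.0))  # endogenous error
        xs.append(xp)
        ys.append(yp)
    print(round(panel_dols(ys, xs, 1), 4))
```

The leads and lags in $Z_i$ absorb the correlation between the regression error and $\Delta x_{it}$, which is why the simulated slope lands close to 2 even though the error is endogenous.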
TESTING FOR UNIT ROOTS IN PANELS IN THE PRESENCE OF STRUCTURAL CHANGE WITH AN APPLICATION TO OECD UNEMPLOYMENT
Christian J. Murray and David H. Papell
ABSTRACT
There has been extensive research on testing for unit roots in the presence of structural change and on testing for unit roots in panels. This chapter takes a small step towards combining the two research agendas. We propose a unit root test for non-trending data in the presence of a one-time change in the mean for a heterogeneous panel. The date of the break is determined endogenously. We perform simulations to investigate the power of the test, and apply the test to a data set of annual unemployment rates for 17 OECD countries from 1955 to 1990.
I. INTRODUCTION
The work of Perron (1989) has inspired extensive research on testing for unit roots in the presence of structural change. Banerjee, Lumsdaine & Stock (1992), Zivot & Andrews (1992), and Perron (1997), among many others, develop tests that allow the break date to be determined endogenously, and Lumsdaine & Papell (1997) extend these tests to allow for two breaks. Starting with Levin & Lin (1992), much work has also been done on testing for unit roots in panels, including papers by Im, Pesaran & Shin (1997), Maddala & Wu (1999), and Bowman (1999). This chapter takes a small step towards combining the two research agendas. We propose a unit root test for non-trending data in the presence of a one-time change in the mean for a heterogeneous panel. The date of the break, which is common across the countries of the panel, is determined endogenously and, in the additive outlier framework, is assumed to occur instantaneously. The speed of mean reversion is also common across countries. The intercepts, the coefficients on the break dummy variable, and the serial correlation structure, however, are country specific.
In the context of testing for a unit root in the presence of structural change, our test is most closely related to the work of Perron & Vogelsang (1992). They develop a test for a unit root in non-trending data in the presence of a one-time change in the mean of a single series, with the date of the change determined endogenously. In the panel unit root context, the most closely related work is Papell (1997), who uses a feasible generalized least squares (SUR) method that allows for both contemporaneous correlation and heterogeneous serial correlation. Levin & Lin (1992) and Bowman (1999) show that, in the absence of structural change, panel unit root tests have good power in moderately sized samples of 10 or more countries, even with fairly high persistence.
We conduct two power experiments, both involving panels of non-trending, stationary series with a one-time change in the mean. First, using conventional panel unit root tests, we find very low power to reject the unit root null. Second, using tests that incorporate structural change, we find that power is much improved. We apply the test to a data set of annual unemployment rates for 17 OECD countries from 1955 to 1990.
Using the panel tests in the presence of structural change, we find much stronger rejections of unit roots than can be found with univariate tests that do not incorporate structural change, panel tests that do not incorporate structural change, or univariate tests that do incorporate structural change.
II. PANEL UNIT ROOT TESTS IN THE PRESENCE OF STRUCTURAL CHANGE

In this section, we develop panel unit root tests in the presence of structural change. We first discuss conventional Augmented Dickey-Fuller (ADF) unit root tests, panel unit root tests that do not incorporate structural change, and single-equation unit root tests with structural change, and then describe how to combine elements from the latter two tests to construct a panel unit root test with structural change. While our tests are for non-trending data, an extension to trending data would be straightforward.
The most common tests for unit roots are Augmented Dickey-Fuller tests. ADF tests for non-trending data involve running the following regression:
$$u_t = \mu + \alpha u_{t-1} + \sum_{i=1}^{k} c_i\,\Delta u_{t-i} + \varepsilon_t, \tag{1}$$
where $u_t$ is the variable of interest. The null hypothesis of a unit root, $\alpha = 1$, is rejected if the t-statistic for $\alpha$ exceeds the appropriate critical value in absolute value. While the critical values are non-standard, they are readily available.¹
There is substantial evidence that the lag truncation parameter $k$ is best selected by data-dependent methods rather than fixed a priori. We follow the method suggested by Campbell & Perron (1991), Hall (1994), and Ng & Perron (1995). Start with an upper bound $k_{max}$ on $k$. If the t-statistic on the coefficient of the last lag is significant (using 1.645, the 10% value of the asymptotic normal distribution), then set $k = k_{max}$. If it is not significant, lower $k$ by one. Repeat this procedure until the last lag becomes significant. If no lag is significant, set $k = 0$.
Panel unit root tests in the ADF framework for non-trending data with heterogeneous intercepts, which are equivalent to including country-specific dummy variables, involve estimating the following regressions:
$$u_{jt} = \mu_j + \alpha u_{j,t-1} + \sum_{i=1}^{k_j} c_{ji}\,\Delta u_{j,t-i} + \varepsilon_{jt}. \tag{2}$$
The subscript $j = 1, \ldots, N$ indexes the elements of the panel which, for convenience of exposition, we call countries. While Levin & Lin (1992) show that imposing homogeneous intercepts results in substantial increases in power, there is rarely any support for such a restriction in practice. We estimate equation (2) by feasible generalized least squares (SUR), with the coefficient $\alpha$ equated across countries and the lag length $k_j$ set equal to the value chosen for the single-equation model in equation (1).² This method accounts for contemporaneous and serial correlation, both of which are often important in practice.³ Papell (1997) uses this method to investigate purchasing power parity.
The critical values for panel unit root tests computed by Levin & Lin (1992) do not incorporate serial correlation in the disturbances. If the number of observations is large enough, the panel ADF statistic converges to the asymptotic distribution of the panel Dickey-Fuller statistic with no serial correlation, but relying on that asymptotic distribution is a serious problem in samples of the size normally used, especially when the recursive t-statistic method is used to select the lag length. Using Monte Carlo methods, we therefore compute finite sample critical values for our test statistics that account for both serial correlation and cross correlation in the residuals. First, we generate unit root series for panels of 5, 10, 15, and 20 countries with 50, 100, and 200 observations. We then fit autoregressive (AR) models to the first differences of each series, using the Schwarz criterion to choose the optimal model, and treat the optimal estimated AR models as the true data generating process for the errors of each series. For each panel, we construct pseudo samples using the optimal AR models with iid $N(0, \sigma^2)$ errors, where $\sigma^2$ is the estimated innovation variance of the optimal AR model.⁴ We then integrate the AR models to get the data in levels. Our test statistic is the t-statistic on $\alpha$ in equation (2), with the lag length $k_j$ for each series chosen by the univariate methods described above. The critical values for the finite sample distributions, obtained from 10,000 replications, are reported in Table 1.
We now discuss univariate tests for a unit root in the presence of structural change for non-trending data, using the methods of Perron & Vogelsang (1992). Additive Outlier (AO) models, where the structural change occurs instantaneously, are estimated by the following two equations:⁵
$$u_t = \mu + \delta DU_t + \tilde{u}_t, \tag{3}$$
and
$$\tilde{u}_t = \sum_{i=0}^{k}\omega_i DTB_{t-i} + \alpha\tilde{u}_{t-1} + \sum_{i=1}^{k} c_i\,\Delta\tilde{u}_{t-i} + \varepsilon_t, \tag{4}$$
where $\tilde{u}_t$ is the estimated residual from equation (3).⁶ $TB$ is the break date, $DTB_t = 1$ if $t = TB + 1$ and 0 otherwise, and $DU_t = 1$ if $t > TB$ and 0 otherwise.⁷ Equations (3) and (4) are estimated sequentially for each break year $TB = k + 2, \ldots, T - 1$, where $T$ is the number of observations. The break date is chosen to minimize the t-statistic for $\alpha$, and data-dependent methods are used to select the lag length $k$. The null hypothesis of a unit root is rejected if the t-statistic on $\alpha$ is sufficiently large in absolute value. The finite sample critical values of Perron & Vogelsang (1992) can be used to assess the significance of the unit root statistic.
We proceed to construct a test for unit roots in panel data in the presence of structural change. With heterogeneous intercepts, the panel AO model is estimated by the following two equations:
$$u_{jt} = \mu_j + \delta_j DU_{jt} + \tilde{u}_{jt}, \tag{5}$$
and
$$\tilde{u}_{jt} = \sum_{i=0}^{k_j}\omega_{ji} DTB_{j,t-i} + \alpha\tilde{u}_{j,t-1} + \sum_{i=1}^{k_j} c_{ji}\,\Delta\tilde{u}_{j,t-i} + \varepsilon_{jt}, \tag{6}$$
where $\tilde{u}_{jt}$ are the residuals from (5), $DTB_{jt} = 1$ if $t = TB + 1$ and 0 otherwise, $DU_{jt} = 1$ if $t > TB$ and 0 otherwise, and $j = 1, \ldots, N$ indexes the countries. Using the Monte Carlo methods described above, with 2500 replications, we compute finite sample critical values for our test statistic, the t-statistic on $\alpha$ in equation (6).⁸
Table 1.
Finite Sample Critical Values for Panel Unit Root Tests without Structural Change
Size   N     T = 50    T = 100   T = 200
1%     5     -5.525    -5.272    -5.121
       10    -6.964    -6.604    -6.251
       15    -8.327    -7.675    -7.234
       20    -9.775    -8.683    -8.119
5%     5     -4.789    -4.641    -4.512
       10    -6.244    -5.923    -5.640
       15    -7.603    -6.964    -6.629
       20    -8.940    -7.955    -7.512
10%    5     -4.452    -4.314    -4.177
       10    -5.857    -5.594    -5.317
       15    -7.221    -6.621    -6.308
       20    -8.528    -7.587    -7.145
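A two-step FGLS (SUR) estimate of equation (2), with $\alpha$ common across countries and the intercepts and lag lengths $k_j$ country specific, can be sketched in plain Python as below. This is an illustration under stated assumptions, not the implementation behind Tables 1 or 6: the name `sur_panel_t` is hypothetical, the panel is balanced and every country uses the same effective sample, the first step is OLS on the stacked system, and the FGLS step is not iterated (compare note 3). The contemporaneous covariance matrix is estimated from the first-step residuals, which requires more time periods than countries. For equation (6), the country-specific $DTB$ dummies would enter as additional columns.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(a)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(p):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[p:] for row in m]


def cross(A, B):
    """A'B for two lists of rows with the same length."""
    p = len(A[0])
    return [[sum(ra[a] * rb[b] for ra, rb in zip(A, B)) for b in range(p)] for a in range(p)]


def crossv(A, y):
    """A'y for a list of rows A and a vector y."""
    return [sum(r[a] * v for r, v in zip(A, y)) for a in range(len(A[0]))]


def sur_panel_t(panel, lags):
    """Two-step SUR estimate of a common alpha in equation (2) and its t-ratio for alpha = 1."""
    N, T = len(panel), len(panel[0])
    start = max(lags) + 1                  # common effective sample for every country
    offsets, p = [], 1                     # column 0 holds the common alpha
    for k in lags:
        offsets.append(p)
        p += 1 + k                         # intercept mu_j plus k_j lagged differences
    Y, W = [], []
    for j, (u, k) in enumerate(zip(panel, lags)):
        Y.append([u[t] for t in range(start, T)])
        rows = []
        for t in range(start, T):
            row = [0.0] * p
            row[0] = u[t - 1]
            row[offsets[j]] = 1.0
            for i in range(1, k + 1):
                row[offsets[j] + i] = u[t - i] - u[t - i - 1]
            rows.append(row)
        W.append(rows)
    n = T - start

    def gls(S):
        """Solve (sum_jl S_jl W_j'W_l) theta = sum_jl S_jl W_j'y_l; return theta and the inverse."""
        A = [[0.0] * p for _ in range(p)]
        b = [0.0] * p
        for j in range(N):
            for l in range(N):
                if S[j][l] == 0.0:
                    continue
                G, g = cross(W[j], W[l]), crossv(W[j], Y[l])
                for a in range(p):
                    b[a] += S[j][l] * g[a]
                    for c in range(p):
                        A[a][c] += S[j][l] * G[a][c]
        Ainv = invert(A)
        return [sum(Ainv[a][c] * b[c] for c in range(p)) for a in range(p)], Ainv

    # Step 1: OLS on the stacked system (identity weighting) to obtain residuals.
    identity = [[1.0 if j == l else 0.0 for l in range(N)] for j in range(N)]
    theta0, _ = gls(identity)
    E = [[y - sum(w * th for w, th in zip(row, theta0)) for y, row in zip(Y[j], W[j])]
         for j in range(N)]
    # Step 2: contemporaneous covariance of the residuals, then GLS with its inverse.
    sigma = [[sum(a * b for a, b in zip(E[j], E[l])) / n for l in range(N)] for j in range(N)]
    theta, Ainv = gls(invert(sigma))
    return theta[0], (theta[0] - 1.0) / math.sqrt(Ainv[0][0])


if __name__ == "__main__":
    rng = random.Random(0)
    panel = [[0.0] for _ in range(4)]
    for _ in range(99):
        common = rng.gauss(0.0, 1.0)       # shared shock creates contemporaneous correlation
        for series in panel:
            series.append(0.8 * series[-1] + 0.6 * common + 0.8 * rng.gauss(0.0, 1.0))
    print(sur_panel_t(panel, [1, 0, 2, 1]))
```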
III. POWER OF PANEL UNIT ROOT TESTS

Finite sample critical values for panel unit root tests, which incorporate lag selection, are presented in Table 1. Critical values for panel unit root tests with structural change are presented in Table 2. As mentioned earlier, we allow for panels of 5, 10, 15, and 20 countries (N), with 50, 100, and 200 observations (T). In selecting the lag length, $k_{max}$ is set to 4, 8, and 12 for T = 50, 100, and 200, respectively. Tables 1 and 2 reveal three properties of panel unit root statistics. First, an increase in T decreases the absolute value of the critical value of the unit root statistic. Second, an increase in N increases its absolute value. Third, allowing for structural change increases the absolute value of the critical values.
Table 2.
Finite Sample Critical Values for Panel Unit Root Tests with Structural Change
Size   N     T = 50    T = 100   T = 200
1%     5     -7.329    -6.941    -6.915
       10    -9.056    -8.658    -8.415
       15    -10.940   -9.995    -9.571
       20    -12.667   -11.103   -10.672
5%     5     -6.613    -6.432    -6.334
       10    -8.484    -8.046    -7.852
       15    -10.279   -9.461    -9.105
       20    -12.011   -10.618   -10.225
10%    5     -6.344    -6.113    -6.051
       10    -8.203    -7.785    -7.553
       15    -10.025   -9.184    -8.815
       20    -11.705   -10.361   -9.958
We now focus on the power of the t-statistic on $\alpha$ in equations (3) and (4) and equations (5) and (6). The values of $\alpha$ (the sum of the AR coefficients) we consider are 0.95, 0.90, and 0.80. We consider mean shifts, $\delta$, of 0.5 and 1.0; in the empirical application below, these values correspond to increases of one-half and one full percentage point in the unemployment rate. We set the break date in the middle of the sample, i.e., $TB = T/2$.⁹ Tables 3 and 4 present the finite sample power of panel unit root tests without and with structural change, respectively. The AR length is again chosen by the Schwarz criterion. The number of replications used for Table 3 is 2500, while 1000 replications are used for Table 4. The upper bound on the standard error of rejection frequencies in Table 4 is 0.016.
Table 3 documents the generally poor power of panel unit root tests that fail to allow for a shift in mean that is indeed present. For the alternative closest to the null, $\alpha = 0.95$ and $\delta = 0.5$, power is essentially zero. Holding $\delta$ constant, power increases monotonically as $\alpha$ is lowered to 0.90 and 0.80, but only in the latter case do we begin to see decent power for a reasonable amount of data. Holding $\alpha$ constant, increasing $\delta$ monotonically reduces power. This is consistent with Perron's (1989) finding that, for a stationary time series, a larger mean shift increases the probability of spuriously finding a unit root. This is problematic in the context of our empirical example below. A value of $\delta = 1$ corresponds to a small (one percentage point), permanent change in the mean unemployment rate. Our results suggest that if $\alpha$ is close to but less than one, panel unit root tests will probably conclude, incorrectly, that unemployment is integrated rather than stationary around a one-time shift in mean.
Table 4 demonstrates that allowing for a mean shift greatly increases power relative to Table 3. For all values of $\alpha$ and $\delta$ considered, the power is at least 50%, and often 100%, for a panel of at least 10 countries with at least 100 observations. Indeed, for T = 100, there are only two instances in which the power is less than 50%, and both occur for the smallest panel considered, N = 5, and the most persistent value of $\alpha$, 0.95.
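The kind of data-generating process used in the power experiments (a stationary series around a one-time mean shift at $TB = T/2$) and the standard-error bound quoted for Table 4 can be sketched as below. As simplifying assumptions of this sketch, the deviations from the mean follow an AR(1) with coefficient $\alpha$ and unit-variance Gaussian innovations, the countries are generated independently, and 50 burn-in observations are discarded; the function names are hypothetical. The bound uses the fact that a rejection frequency from $R$ replications has standard error $\sqrt{p(1-p)/R} \le \sqrt{0.25/R}$.

```python
import math
import random


def mean_shift_series(T, alpha, delta, mu=0.0, burn=50, rng=None):
    """Stationary AR(1) around a one-time mean shift of size delta after TB = T // 2."""
    rng = rng or random.Random()
    tb = T // 2
    v, out = 0.0, []
    for s in range(burn + T):
        v = alpha * v + rng.gauss(0.0, 1.0)      # AR(1) deviation from the mean
        if s >= burn:
            t = s - burn + 1                      # 1-based time index
            out.append(mu + (delta if t > tb else 0.0) + v)
    return out


def mean_shift_panel(N, T, alpha, delta, rng=None):
    """N independent countries sharing alpha, delta and the break date."""
    rng = rng or random.Random()
    return [mean_shift_series(T, alpha, delta, rng=rng) for _ in range(N)]


def max_rejection_se(replications):
    """Largest standard error of a simulated rejection frequency (attained at p = 0.5)."""
    return math.sqrt(0.25 / replications)


if __name__ == "__main__":
    panel = mean_shift_panel(10, 100, alpha=0.9, delta=0.5, rng=random.Random(0))
    print(len(panel), len(panel[0]))             # 10 100
    print(round(max_rejection_se(1000), 3))      # 0.016
```

With 1000 replications the bound is about 0.0158, which rounds to the 0.016 reported for Table 4; with the 2500 replications used for Table 3 it is 0.01.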
Table 3.
Power of Panel Unit Root Tests without Structural Change
       α = 0.95, δ = 0.5              α = 0.95, δ = 1.0
N      T = 50   T = 100  T = 200      T = 50   T = 100  T = 200
5      0.0004   0.0008   0.0008       0.0000   0.0000   0.0000
10     0.0008   0.0004   0.0000       0.0000   0.0000   0.0000
15     0.0000   0.0000   0.0000       0.0000   0.0000   0.0000
20     0.0000   0.0000   0.0000       0.0000   0.0000   0.0000
       α = 0.90, δ = 0.5              α = 0.90, δ = 1.0
N      T = 50   T = 100  T = 200      T = 50   T = 100  T = 200
5      0.0180   0.0560   0.3780       0.0000   0.0000   0.0000
10     0.0116   0.1204   0.8312       0.0000   0.0000   0.0000
15     0.0120   0.2300   0.9608       0.0000   0.0000   0.0000
20     0.0084   0.3084   0.9924       0.0000   0.0000   0.0008
       α = 0.80, δ = 0.5              α = 0.80, δ = 1.0
N      T = 50   T = 100  T = 200      T = 50   T = 100  T = 200
5      0.3652   0.8400   0.9872       0.0036   0.0336   0.2052
10     0.6848   0.9908   1.0000       0.0052   0.1784   0.6876
15     0.8216   0.9992   1.0000       0.0052   0.4208   0.9124
20     0.8732   1.0000   1.0000       0.0044   0.6432   0.9872
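The pseudo-sample step of the Monte Carlo critical-value calculations in Section II can be sketched as below. First differences are drawn from an AR model that is assumed to have been fitted already by the Schwarz criterion (the fitting step is not shown), 50 extra observations are discarded as in note 4, and the differences are cumulated to levels. A critical value is then a lower-tail empirical quantile of the simulated test statistics. The function names are hypothetical, and because this sketch draws independent errors for each series, it does not reproduce the cross correlation in the residuals that the reported critical values account for.

```python
import math
import random


def pseudo_levels(phi, sigma, T, burn=50, rng=None):
    """Unit-root series whose first differences follow a given AR(p) model."""
    rng = rng or random.Random()
    p = len(phi)
    dx = [0.0] * p                          # pre-sample zeros start the AR recursion
    for _ in range(burn + T):
        ar_part = sum(phi[i] * dx[-1 - i] for i in range(p))
        dx.append(ar_part + rng.gauss(0.0, sigma))
    dx = dx[p + burn:]                      # drop pre-sample values and the burn-in
    levels, level = [], 0.0
    for d in dx:                            # integrate the differences to get levels
        level += d
        levels.append(level)
    return levels


def pseudo_panel(models, T, rng=None):
    """One pseudo sample per country; models is a list of (phi, sigma) pairs."""
    rng = rng or random.Random()
    return [pseudo_levels(phi, sigma, T, rng=rng) for phi, sigma in models]


def empirical_quantile(values, q):
    """Lower-tail empirical quantile, e.g. q = 0.05 for a 5% critical value."""
    s = sorted(values)
    return s[max(0, math.ceil(q * len(s)) - 1)]


if __name__ == "__main__":
    # Hypothetical fitted models for three countries: (AR coefficients, innovation s.d.).
    models = [([0.3], 1.0), ([], 0.8), ([0.2, 0.1], 1.2)]
    panel = pseudo_panel(models, 100, rng=random.Random(0))
    print([round(series[-1], 2) for series in panel])
```

In a full calculation, each replicated panel would be passed through the panel test statistic, and `empirical_quantile` applied to the 10,000 (or 2500) resulting statistics.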
IV. EMPIRICAL EXAMPLE: UNIT ROOTS IN UNEMPLOYMENT

We use annual series of unemployment for 17 OECD countries from 1955 to 1990, taken from Layard, Nickell & Jackman (1991). We do not update the data past 1990. Unemployment rates rose sharply, especially in Europe, during the early 1990s, and in Papell, Murray & Ghiblawi (2000) the single-equation methods of Bai & Perron (1998) detect considerable evidence of multiple structural changes with unemployment data extended through 1997. Testing for unit roots in panels with multiple structural changes, however, is well beyond the scope of this chapter. Our empirical results should therefore be interpreted as an illustration of the techniques rather than as an economic analysis of postwar unemployment.
Table 4.
Power of Panel Unit Root Tests with Structural Change
       α = 0.95, δ = 0.5              α = 0.95, δ = 1.0
N      T = 50   T = 100  T = 200      T = 50   T = 100  T = 200
5      0.0710   0.2320   0.8460       0.0220   0.4130   0.9980
10     0.0840   0.5160   0.9960       0.0160   0.7570   1.0000
15     0.0810   0.7250   1.0000       0.0060   0.8770   1.0000
20     0.0520   0.8730   1.0000       0.0020   0.9570   1.0000
       α = 0.90, δ = 0.5              α = 0.90, δ = 1.0
N      T = 50   T = 100  T = 200      T = 50   T = 100  T = 200
5      0.2750   0.7790   1.0000       0.2920   0.9430   1.0000
10     0.4730   0.9930   1.0000       0.5150   1.0000   1.0000
15     0.5730   1.0000   1.0000       0.5600   1.0000   1.0000
20     0.6600   1.0000   1.0000       0.5590   1.0000   1.0000
       α = 0.80, δ = 0.5              α = 0.80, δ = 1.0
N      T = 50   T = 100  T = 200      T = 50   T = 100  T = 200
5      0.8000   1.0000   1.0000       0.8000   1.0000   1.0000
10     0.9910   1.0000   1.0000       0.8520   1.0000   1.0000
15     0.9990   1.0000   1.0000       0.9960   1.0000   1.0000
20     0.9990   1.0000   1.0000       0.9990   1.0000   1.0000
The first step in our investigation is to test for unit roots using methods that do not account for structural change, which provides a benchmark for our later results. We run Augmented Dickey-Fuller (ADF) tests, as in equation (1), for each of the 17 countries in the sample, with $k_{max}$ set to 4. The results are reported in Table 5. Using critical values from MacKinnon (1991), we find that the null of a unit root cannot be rejected for any of the series at the 10% level.
Table 5.
Augmented Dickey-Fuller Tests
Country       μ               α                k
Australia     0.437 (1.60)    0.936 (-1.15)    0
Austria       0.188 (1.26)    0.915 (-1.28)    1
Belgium       0.337 (1.48)    0.953 (-1.40)    1
Canada        0.819 (1.61)    0.893 (-1.46)    0
Denmark       0.222 (0.82)    0.993 (-0.14)    4
Finland       0.359 (1.42)    0.912 (-1.26)    2
France        0.176 (1.38)    0.987 (-0.54)    1
Germany       0.239 (1.19)    0.929 (-1.32)    1
Ireland       0.470 (1.36)    0.952 (-1.28)    1
Italy         0.597 (2.04)    0.885 (-2.08)    3
Japan         0.210 (1.91)    0.883 (-2.04)    3
Netherlands   0.248 (1.21)    0.966 (-0.96)    2
Norway        0.435 (1.01)    0.835 (-0.84)    2
Spain         0.369 (1.85)    0.945 (-2.25)    3
Sweden        0.413 (1.82)    0.760 (-1.37)    2
U.K.          0.391 (1.38)    0.947 (-1.14)    2
U.S.A.        1.389 (2.14)    0.766 (-2.16)    0
Note: The critical values for the ADF test, calculated from MacKinnon (1991) with 36 observations, are -3.62 (1%), -2.94 (5%), and -2.61 (10%). Numbers in parentheses are t-statistics.
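The country-by-country ADF regressions summarized in Table 5 combine equation (1) with the general-to-specific lag rule of Section II ($k_{max} = 4$, threshold 1.645). A minimal plain-Python sketch follows; it is not the authors' code. The helper names are hypothetical, OLS is computed from the normal equations to avoid third-party packages, every candidate $k$ is compared on the sample that is feasible with $k_{max}$ lags (an assumption, since the text does not say), and the final regression for the chosen $k$ uses all available observations. The statistic is the t-ratio $(\hat{\alpha} - 1)/\mathrm{se}(\hat{\alpha})$, to be compared with the MacKinnon (1991) values in the note to Table 5.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(a)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(p):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[p:] for row in m]


def ols(y, X):
    """OLS of y on the rows of X; returns (coefficients, standard errors)."""
    n, p = len(X), len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * v for r, v in zip(X, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [v - sum(ra * ba for ra, ba in zip(r, beta)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    return beta, [math.sqrt(s2 * inv[a][a]) for a in range(p)]


def adf_regression(u, k, start):
    """OLS of u_t on [1, u_{t-1}, du_{t-1}, ..., du_{t-k}] for 0-based t = start, ..., T-1."""
    y, X = [], []
    for t in range(start, len(u)):
        diffs = [u[t - i] - u[t - i - 1] for i in range(1, k + 1)]
        y.append(u[t])
        X.append([1.0, u[t - 1]] + diffs)
    return ols(y, X)


def select_k(u, kmax, crit=1.645):
    """General-to-specific rule: drop the last lag until its t-statistic exceeds crit."""
    for k in range(kmax, 0, -1):
        beta, se = adf_regression(u, k, kmax + 1)   # same sample for every candidate k
        if abs(beta[-1] / se[-1]) > crit:
            return k
    return 0


def adf_test(u, kmax=4):
    """Return (t-ratio for alpha = 1 in equation (1), chosen lag length k)."""
    k = select_k(u, kmax)
    beta, se = adf_regression(u, k, k + 1)
    return (beta[1] - 1.0) / se[1], k


if __name__ == "__main__":
    rng = random.Random(0)
    walk, level = [], 0.0
    for _ in range(36):                             # 36 annual observations, as in Table 5
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    print(adf_test(walk, kmax=4))
```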
One possible reason for the failure of the ADF tests to reject the unit root hypothesis is the relatively short time span (36 years) of the data.¹⁰ We investigate this possibility by conducting panel unit root tests, described by equation (2), that exploit cross-section variability among the 17 unemployment rates. The results of the panel unit root tests are reported in Table 6.¹¹ The null hypothesis of a unit root cannot be rejected, even at the 10% level, either for the OECD countries as a whole or for smaller panels consisting of European (13), European Community (EC) (9), European Free Trade Area (EFTA) (4), Non-European (4), or Non-EC (EFTA plus Non-Europe) (8) countries.¹²
The results for the univariate AO model of equations (3) and (4) are reported in Table 7. The null hypothesis of a unit root is rejected for Finland, Ireland and Spain at the 1% level, for Belgium, France, Italy and Norway at the 5% level, and for Austria, Canada, Denmark, and the United Kingdom at the 10% level. The structural breaks are all positive, reflecting the general rise in unemployment among the OECD countries. The structural break occurs between 1974 and 1976 for nine of the eleven countries for which the unit root null can be rejected.
The results of the panel unit root tests from equations (5) and (6) that account for structural change, along with the associated critical values, are reported in Table 8.¹³ The unit root hypothesis is strongly (at the 1% level) rejected in favor of stationarity with a one-time break in 1975 for the OECD, European, and EC countries, and with a break in 1973 for the non-EC and EFTA countries. For the non-European countries, the unit root null cannot be rejected at the 10% level. This panel, however, consists of only four countries.
Table 6.
Panel Unit Root Tests
                                         Critical Values
Group        N    α       t        1%       5%       10%
OECD         17   0.924   -6.40    -10.16   -9.00    -8.48
EUROPE       13   0.936   -4.73    -8.52    -7.58    -7.16
EC           9    0.941   -3.96    -7.09    -6.28    -5.86
NON-EC       8    0.846   -4.82    -6.83    -5.99    -5.58
EFTA         4    0.868   -3.04    -5.45    -4.67    -4.27
NON-EUROPE   4    0.863   -3.52    -5.45    -4.67    -4.27
Table 7.
The Additive Outlier Model
Country       Break Year   μ               δ               α                 k
Australia     1973         2.053 (6.99)    4.536 (10.61)   0.609 (-3.99)     0
Austria       1979         1.704 (13.55)   1.460 (6.42)    0.623 (-4.33)c    1
Belgium       1975         2.771 (8.70)    6.908 (13.99)   0.404 (-4.96)b    4
Canada        1976         5.145 (17.95)   3.754 (8.17)    0.277 (-4.33)c    3
Denmark       1975         2.557 (8.29)    5.696 (11.93)   0.513 (-4.34)c    3
Finland       1974         1.915 (8.61)    2.885 (8.65)    0.227 (-6.64)a    1
France        1975         2.052 (6.35)    5.914 (11.81)   0.660 (-4.95)b    4
Germany       1972         1.417 (3.63)    3.317 (6.01)    0.732 (-3.63)     1
Ireland       1976         5.627 (10.14)   7.287 (8.19)    0.657 (-7.58)a    3
Italy         1976         4.650 (16.43)   1.907 (4.20)    0.702 (-4.75)b    3
Japan         1969         1.653 (12.19)   0.423 (2.38)    0.783 (-3.53)     3
Netherlands   1976         1.945 (4.94)    6.662 (10.55)   0.606 (-4.06)     2
Norway        1986         2.094 (16.96)   1.781 (4.91)    0.303 (-4.78)b    1
Spain         1974         2.400 (2.57)    11.463 (8.20)   0.685 (-7.61)a    4
Sweden        1964         1.470 (10.40)   0.334 (2.01)    0.536 (-3.87)     1
U.K.          1974         2.715 (6.41)    5.604 (8.82)    0.493 (-4.60)c    4
U.S.A.        1974         4.840 (19.21)   2.141 (5.67)    0.251 (-4.10)     3
Note: The critical values for the AO model, reported in Perron and Vogelsang (1992), are -5.20 (1%), -4.67 (5%), and -4.33 (10%). Numbers in parentheses are t-statistics. Superscripts a, b, and c denote rejection of the unit root null at the 1%, 5%, and 10% significance levels, respectively.
Table 8.
Panel Unit Root Tests with Structural Change
                                                      Critical Values
Group        N    Break Year   α       t         1%       5%       10%
OECD         17   1975         0.638   -21.91a   -12.38   -11.56   -11.16
EUROPE       13   1975         0.651   -18.92a   -10.89   -10.00   -9.63
EC           9    1975         0.670   -16.15a   -9.13    -8.35    -7.97
NON-EC       8    1973         0.550   -10.36a   -8.60    -8.01    -7.66
EFTA         4    1973         0.557   -8.45a    -7.18    -6.46    -6.11
NON-EUROPE   4    1975         0.629   -5.61     -7.18    -6.46    -6.11
Note: Superscripts a, b, and c denote rejection of the unit root null at the 1%, 5%, and 10% significance levels, respectively.
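The univariate AO procedure behind Table 7 (equations (3) and (4), with $TB$ searched over $k + 2, \ldots, T - 1$ and chosen to minimize the t-statistic on $\alpha$) can be sketched as follows. This is an illustration, not the authors' code: the names are hypothetical, $k$ is held fixed rather than chosen by the data-dependent rule, and, as an implementation choice of this sketch, any $DTB_{t-i}$ dummy that would fall after the last observation is dropped so that the regression stays estimable near the end of the grid. Equation (4) is estimated without an intercept because the residuals from (3) have mean zero.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(a)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(p):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[p:] for row in m]


def ols(y, X):
    """OLS of y on the rows of X; returns (coefficients, standard errors)."""
    n, p = len(X), len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * v for r, v in zip(X, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [v - sum(ra * ba for ra, ba in zip(r, beta)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    return beta, [math.sqrt(s2 * inv[a][a]) for a in range(p)]


def ao_t_stat(u, tb, k):
    """t-ratio for alpha = 1 from equations (3)-(4) for a given 1-based break date tb."""
    T = len(u)
    du = [1.0 if s + 1 > tb else 0.0 for s in range(T)]        # DU_t = 1 if t > TB
    b, _ = ols(u, [[1.0, d] for d in du])                      # equation (3)
    r = [ut - b[0] - b[1] * d for ut, d in zip(u, du)]         # residuals u~_t
    kept = [i for i in range(k + 1) if tb + i <= T - 1]        # DTB_{t-i} inside the sample
    y, X = [], []
    for s in range(k + 1, T):                                  # 0-based s is time t = s + 1
        dummies = [1.0 if s - i == tb else 0.0 for i in kept]  # DTB_{t-i} = 1 iff t - i = TB + 1
        diffs = [r[s - i] - r[s - i - 1] for i in range(1, k + 1)]
        y.append(r[s])
        X.append(dummies + [r[s - 1]] + diffs)
    beta, se = ols(y, X)
    a = len(kept)                                              # position of alpha in the row
    return (beta[a] - 1.0) / se[a]


def ao_break_search(u, k):
    """Return (minimum t-statistic, TB) over TB = k + 2, ..., T - 1."""
    return min(((ao_t_stat(u, tb, k), tb) for tb in range(k + 2, len(u))),
               key=lambda pair: pair[0])


if __name__ == "__main__":
    rng = random.Random(0)
    v, u = 0.0, []
    for t in range(1, 61):
        v = 0.5 * v + rng.gauss(0.0, 1.0)
        u.append((3.0 if t > 30 else 0.0) + v)
    print(ao_break_search(u, 1))
```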
V. CONCLUSIONS
The purpose of this chapter was to develop and implement panel unit root tests in the presence of structural change. To that end, we combine methods from two previously disjoint literatures: testing for a unit root in panels and testing for a unit root in the presence of structural change. The resulting test allows for both serial and contemporaneous correlation, both of which are often found to be important in the panel unit root context.
The motivation for the test comes from the hypothesis that conventional panel unit root tests, those that do not incorporate structural change, will have low power if the data are stationary with structural change. While this is well established in the univariate literature, it is only a conjecture in the panel context. We investigate this conjecture by conducting power experiments for panels of non-trending, stationary series with a one-time change in the mean, and find that conventional panel unit root tests generally have very low power. We then conduct the same experiments using methods that test for a unit root in the presence of structural change, and find that the power of the tests is much improved.
We apply our test to a data set of annual unemployment rates for 17 OECD countries from 1955 to 1990. For these countries, unit root tests that do not incorporate structural change, whether univariate or panel, provide no evidence against the unit root null. While univariate tests that incorporate structural change do provide some evidence against unit roots, the short span of the data suggests that their power may be limited. Using our panel test with a one-time structural change, we find very strong evidence of regime-wise stationarity, both for the full panel and for a number of smaller subpanels.
Our work could be extended in a number of directions. While the test incorporates a one-time break in non-trending data, extensions to multiple breaks and/or trending data would be straightforward. Once variety in the number of breaks, type of breaks, number of countries, and number of observations is allowed for, the number of possibilities increases rapidly. With the availability of programs for calculating critical values, we suspect that it will be more fruitful to develop tests on a case-by-case basis rather than to attempt generality.¹⁴
NOTES
1. MacKinnon (1991) shows how to calculate critical values for ADF tests for any sample size.
2. If the coefficient $\alpha$ is not equated across countries, as in Breuer, McNown & Wallace (2000), the gains in power over univariate methods are much smaller. Im, Pesaran & Shin (1997) report higher power without equating $\alpha$ across countries, but their alternative hypothesis is that at least one member of the panel, rather than all members, is stationary.
3. If there is no serial correlation ($k = 0$), or if the $k$'s and $c$'s are constrained to be equal across countries, as in O'Connell (1998), the FGLS estimator can be iterated to achieve maximum likelihood. These restrictions, however, rarely (if ever) hold in practice.
4. For all of the critical value calculations, we generate 50 more observations than are reported, and then discard the first 50 observations.
5. Innovational outlier (IO) models, where the structural change occurs gradually, can also be estimated.
6. As explained by Perron & Vogelsang (1992), the dummy variables $DTB_{t-i}$ are included to ensure that the t-statistic on $\alpha$ in equation (4) has the same asymptotic distribution as in the IO model and is invariant to the value of $k$.
7. The dummy variable $DTB_t$ is included to allow for a change in the mean under the null.
8. Abuaf & Jorion (1990) conduct panel unit root tests that allow for structural change, but the time of the break is assumed to be known a priori.
9. The results in Tables 3 and 4 are qualitatively unchanged for $TB = T/4$ or $3T/4$.
10. Froot & Rogoff (1995) show that, if a variable follows a stationary AR(1) process with a half-life of three years, it would take 72 years of annual data to reject the unit root null using the 5% Dickey-Fuller critical value.
11. The critical values, also reported in Table 6, are calculated for the exact number of countries and observations in each of the panels, using the Monte Carlo methods described above.
12. The members of the EC (included in our data) are Belgium, Denmark, France, Germany, Ireland, Italy, Netherlands, Spain, and the United Kingdom. The EFTA countries are Austria, Finland, Norway, and Sweden.
13. The critical values are calculated for the exact number of countries and observations in each of the panels, using the Monte Carlo methods described above.
14. An example is Papell (2000), who develops a panel unit root test in the presence of three breaks in the slope, but none in the intercept, of the trend function, with further restrictions imposed for consistency with purchasing power parity.
REFERENCES
Abuaf, N., & Jorion, P. (1990). Purchasing Power Parity in the Long Run. Journal of Finance, 45, 157–174.
Bai, J., & Perron, P. (1998). Estimating and Testing Linear Models with Multiple Structural Changes. Econometrica, 66, 47–78.
Banerjee, A., Lumsdaine, R. L., & Stock, J. H. (1992). Recursive and Sequential Tests of the Unit Root and Trend-Break Hypotheses: Theory and International Evidence. Journal of Business and Economic Statistics, 10, 271–288.
Bowman, D. (1999). Efficient Tests for Autoregressive Unit Roots in Panel Data. IFDP #646, Board of Governors of the Federal Reserve System.
Breuer, J., McNown, R., & Wallace, M. (2000). The Quest for Purchasing Power Parity With A Series-Specific Test using Panel Data. Working paper, Department of Economics, University of South Carolina.
Campbell, J. Y., & Perron, P. (1991). Pitfalls and Opportunities: What Macroeconomists Should Know About Unit Roots. In: O. J. Blanchard & S. Fischer (Eds), NBER Macroeconomics Annual (pp. 141–201). Cambridge: MIT Press.
Froot, K. A., & Rogoff, K. (1995). Perspectives on PPP and Long-Run Real Exchange Rates. In: G. Grossman & K. Rogoff (Eds), Handbook of International Economics, Vol. 3 (pp. 1647–1688). Amsterdam: North Holland.
Hall, A. R. (1994). Testing for a Unit Root in Time Series with Pretest Data-Based Model Selection. Journal of Business and Economic Statistics, 12, 461–470.
Im, S., Pesaran, H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. Working paper, Department of Economics, University of Cambridge.
Layard, R., Nickell, S., & Jackman, R. (1991). Unemployment: Macroeconomic Performance and The Labour Market. Oxford: Oxford University Press.
Levin, A., & Lin, C. F. (1992). Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties. Discussion paper 92-23, Department of Economics, University of California, San Diego.
Lumsdaine, R. L., & Papell, D. H. (1997). Multiple Trend Breaks and the Unit Root Hypothesis. Review of Economics and Statistics, 79, 212–218.
MacKinnon, J. G. (1991). Critical Values for Cointegration Tests. In: R. F. Engle & C. W. J. Granger (Eds), Long-Run Economic Relationships: Readings in Cointegration (pp. 267–276). Oxford: Oxford University Press.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and a New Simple Test. Oxford Bulletin of Economics and Statistics, 61, 631–652.
Ng, S., & Perron, P. (1995). Unit Root Tests in ARMA Models with Data Dependent Methods for the Selection of the Truncation Lag. Journal of the American Statistical Association, 90, 268–281.
O'Connell, P. G. J. (1998). The Overvaluation of Purchasing Power Parity. Journal of International Economics, 44, 1–20.
Papell, D. H. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float. Journal of International Economics, 43, 313–332.
Papell, D. H. (2000). The Great Appreciation, the Great Depreciation, and the Purchasing Power Parity Hypothesis. Working paper, Department of Economics, University of Houston.
Papell, D. H., Murray, C. J., & Ghiblawi, H. (2000). The Structure of Unemployment. Review of Economics and Statistics, 82, 309–315.
Perron, P. (1989). The Great Crash, the Oil Price Shock, and the Unit Root Hypothesis. Econometrica, 57, 1361–1401.
Perron, P. (1997). Further Evidence on Breaking Trend Functions in Macroeconomic Variables. Journal of Econometrics, 80, 355–385.
Perron, P., & Vogelsang, T. J. (1992). Non-stationarity and Level Shifts With An Application to Purchasing Power Parity. Journal of Business and Economic Statistics, 10, 301–320.
Zivot, E., & Andrews, D. W. K. (1992). Further Evidence on the Great Crash, the Oil-Price Shock, and The Unit Root Hypothesis. Journal of Business and Economic Statistics, 10, 251–270.
PANEL DATA LIMIT THEORY AND ASYMPTOTIC ANALYSIS OF A PANEL REGRESSION WITH NEAR INTEGRATED REGRESSORS
Heikki Kauppi
ABSTRACT
This chapter develops a new limit theory for panel data with large numbers of cross-section, n, and time series, T, observations. The results apply when n and T tend to infinity simultaneously and provide useful tools for establishing convergence in probability and in distribution in cases where the panel data may be cross-sectionally heterogeneous in a fairly general way. We demonstrate how the new theory can be applied to derive asymptotics for a panel regression where the regressors are generated by a local-to-unit-root process with heterogeneous localizing coefficients across the cross section.
I. INTRODUCTION
In the last few years much new research has emerged that develops econometric methods for panel data where both the numbers of cross section and time series observations are large. This research is motivated by the increasing availability of important panel data sets that cover large numbers of different countries, sectors, and individuals over long periods of time. Many of these data sets consist of macroeconomic variables that display characteristics resembling those generated by integrated processes. Accordingly, standard panel methods cannot be applied to these data, and an appropriate method has to take into account the possible strong persistence of the data. Therefore, particular techniques have been developed for testing for unit roots and cointegration in panel data and for statistical analysis of panel regressions with integrated regressors. Typical empirical applications of these methods involve estimating, and testing for the existence of, long-run relationships between international financial series such as relative prices and spot and future exchange rates.
The purpose of this chapter is to develop a new panel data limit theory that can be applied to derive asymptotics for a variety of interesting estimators and test statistics in the context of models for panel data with large cross-sectional dimension, n, and time series dimension, T. Our new theory assumes that n and T tend to infinity simultaneously and builds upon the concepts of joint convergence in probability and in distribution for double indexed processes developed by Phillips & Moon (1999a). The contribution of the chapter is to develop new versions of the law of large numbers and the central limit theorem that apply in panels where the data may be cross-sectionally heterogeneous in a fairly general way. We demonstrate the usefulness of the new theory in an application where we study asymptotic inference in a panel regression in which the regressors are generated by an autoregressive process with a root local to unity. In this framework, both the regression errors and the errors that drive the autoregressive regressors are specified by a general linear process.
The model then deviates from the previously analyzed panel cointegration regressions only in that the autoregressive parameters in the regressors are not necessarily exactly equal to one but may lie within a range of near alternatives to unity. This generalization of earlier models is motivated by the fact that in most empirical questions in macroeconomics and finance where the new panel cointegration methods are applied, an assumption of exact unit roots can be considerably uncertain. Given that near unit roots are known to result in severe inferential problems for the usual time series cointegration methods, it is important to examine related problems in the context of panel data analysis. Our application of the panel asymptotics reveals the following. First, due to biases caused by error serial correlation, the usual pooled panel OLS estimator is invalid for inference. Second, a corrected version of this estimator is shown to be $\sqrt{n}T$-consistent with an asymptotic normal distribution centered at the true regression parameter, irrespective of whether the regressors have near or exact unit roots. Unfortunately, this positive result only holds in the special case where the model does not exhibit any deterministic effects, such as individual intercepts. Third, we derive asymptotics for the pooled panel fully modified estimator of Phillips & Moon (1999a), who assumed exact unit roots. The asymptotic results show that this estimator is subject to severe bias effects if the regressors are nearly rather than exactly integrated. Our theoretical findings are illustrated by small sample simulations. Overall, the analysis indicates that near unit roots are in general likely to result in insuperable inferential problems even in the context of panel data analysis.
The organization of the chapter is as follows. The new limit theorems are given in Section II. Section III presents the applications of the panel asymptotics, while concluding remarks are given in Section IV. Proofs of the theorems are in the appendix.
II. THEORY
In panel data limit theory we consider a double indexed process $X_{n,T}$, in which both $n$ and $T$ tend to infinity. In general, the limit of $X_{n,T}$ depends on the treatment of the indices $n$ and $T$ and on the properties that link the two dimensions of the process. Phillips & Moon (1999a) discuss different approaches. One possibility is to allow $n$ and $T$ to pass to infinity along a diagonal path determined by a monotonically increasing functional relation of the type $T = T(n)$ as the index $n \to \infty$. This approach simplifies the asymptotic theory by replacing $X_{n,T}$ with a single indexed process $X_{n,T(n)}$. However, a drawback of this diagonal path limit theory is that the assumed expansion path $(n, T(n))$ may not provide an appropriate approximation for a given $(n, T)$ situation. Furthermore, the limit theory is likely to depend on the specific functional relation $T = T(n)$ that is used in the asymptotic development. Following Phillips & Moon (1999a), we therefore focus on an alternative approach where $n$ and $T$ are allowed to tend to infinity simultaneously without imposing a specific diagonal path for the divergence of the indices. Merely as an auxiliary tool, we also consider a special form of multi-index asymptotics, called sequential limit theory, which is also introduced by Phillips & Moon (1999a). The general idea of this approach is to derive limit results in two steps. The first step is to fix one index, say $n$, and allow the other, say $T$, to pass to infinity, giving an intermediate limit. The final limit result is then obtained by subsequently letting $n$ tend to infinity. While sequential limit theory can offer an easy route to a limit result, it may give asymptotic results that are misleading in cases where both indices tend to infinity simultaneously (see Phillips & Moon (1999b)). Nevertheless, this theory can often serve as a helpful tool for obtaining conjectures about limit results that hold under the more general joint limit theory.
In this section, we consider a general double indexed process of the form
$$X_{n,T} = \frac{1}{k_n}\sum_{i=1}^{n} Y_{i,T},$$
where the $Y_{i,T}$ are independent random vectors across $i$ and $k_n$ is either $n$ or $\sqrt{n}$. A typical $Y_{i,T}$ component is a standardized sum of the time series component of the panel data; examples are given in the following section. To fix ideas, suppose we are interested in the probability limit of $X_{n,T} = n^{-1}\sum_{i=1}^{n} Y_{i,T}$. Assume $Y_{i,T} \to_p Y_i$ as $T \to \infty$ for all $i$. Then, by the independence of $Y_{i,T}$ across $i$ for all $T$, it follows that $X_{n,T} \to_p X_n$ as $T \to \infty$ for all $n$, where $X_n = n^{-1}\sum_{i=1}^{n} Y_i$. Note that one has to assume that the $Y_i$ are defined on the same probability space for all $i$, so that the sum of the limit random variables, $n^{-1}\sum_{i=1}^{n} Y_i$, is well defined on that probability space. This can be justified as shown by Phillips & Moon (1999a, Appendix B). By letting $n \to \infty$ and applying an appropriate law of large numbers to $X_n = n^{-1}\sum_{i=1}^{n} Y_i$, we may then find the sequential limit of $X_{n,T}$. Let $X = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n} E(Y_i)$ exist and be finite. Then $X_n \to_p X$, so that as $T \to \infty$ followed by $n \to \infty$, $X_{n,T} \to_p X$. This is a sequential probability limit result in the sense defined by Phillips & Moon (1999a). In general, the sequential probability limit $X$ of $X_{n,T}$ is not the same as the probability limit of $X_{n,T}$ under joint convergence of the indices $n$ and $T$; the joint limit may not even exist, or it may require a different normalization. Examples are given in Phillips & Moon (1999b). An interesting question therefore arises: when does the sequential limit coincide with the joint limit? The following theorem is adapted from Phillips & Moon (1999a, Theorem 1) and gives sufficient conditions under which the joint probability limit and the sequential probability limit are identical. Hereafter, we denote by $(n, T \to \infty)$ the joint limit as $T \to \infty$ and $n \to \infty$ simultaneously. Also, $\Rightarrow$ denotes weak convergence of the associated probability measure, $\|A\| = (\mathrm{tr}(A'A))^{1/2}$ is the Euclidean norm of a matrix $A$, $1\{\cdot\}$ denotes an indicator function, and $\limsup_{n,T} x_{n,T}$ signifies the limit superior of a sequence $\{x_{n,T}\}$ when joint convergence is considered.
Theorem 1. Suppose the random $(k \times 1)$ vectors $Y_{i,T}$ are independent across $i$ for all $T$ and integrable. Assume that $Y_{i,T} \Rightarrow Y_i$ as $T \to \infty$ for all $i$. Let the following conditions hold:
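(i) $\limsup_{n,T\to\infty} \frac{1}{n}\sum_{i=1}^{n} E\|Y_{i,T}\| < \infty$,
(ii) $\limsup_{n,T\to\infty} \frac{1}{n}\sum_{i=1}^{n} \|E(Y_{i,T}) - E(Y_i)\| = 0$,
(iii) $\limsup_{n,T\to\infty} \frac{1}{n}\sum_{i=1}^{n} E\|Y_{i,T}\|1\{\|Y_{i,T}\| > n\epsilon\} = 0$ for all $\epsilon > 0$,
(iv) $\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E\|Y_i\|1\{\|Y_i\| > n\epsilon\} = 0$ for all $\epsilon > 0$.
If $X = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E(Y_i)$ exists and $X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i \to_p X$ as $n \to \infty$, then
$$X_{n,T} = \frac{1}{n}\sum_{i=1}^{n} Y_{i,T} \to_p X \quad \text{as } (T, n \to \infty).$$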
Theorem 1 gives fairly general conditions under which a joint probability limit can be established. However, in many cases it may be rather tedious to verify all of conditions (i) through (iv). As shown by Corollary 1 of Phillips & Moon (1999a), somewhat simpler conditions can be obtained in the special case where the $Y_{i,T}$ are scaled variates of an iid process. However, there are various interesting situations where the heterogeneity of the panel members arises from other sources, so that Corollary 1 of Phillips & Moon (1999a) cannot be applied. For dealing with heterogeneous panels of other types we have therefore designed the following theorem. The basic idea of Theorem 2 comes from Markov's law of large numbers, which applies to independent variates $Z_i$ satisfying Markov's condition, $E\|Z_i\|^{1+\delta} \le M < \infty$ for some $\delta > 0$ and for all $i$.
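Theorem 2. Suppose that the random $(k \times 1)$ vectors $Y_{i,T}$ are independent across $i$ for all $T$ and integrable. Assume that $Y_{i,T} \Rightarrow Y_i$ as $T \to \infty$ for all $i$. Let the following conditions hold:
(a) $\sup_i \|E(Y_{i,T}) - E(Y_i)\| \to 0$ as $T \to \infty$,
(b) $\sup_T E\|Y_{i,T}\|^{1+\delta} \le M < \infty$ for some $\delta > 0$ and for all $i$.
If $X = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E(Y_i)$ exists, then
$$\frac{1}{n}\sum_{i=1}^{n} Y_{i,T} \to_p X \quad \text{as } (T, n \to \infty).$$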
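To see Theorem 2 at work in the setting of Section III, one can take $Y_{i,T} = T^{-2}\sum_{t=1}^{T}x_{i,t}^2$, where each $x_{i,t}$ follows $x_{i,t} = e^{c_i/T}x_{i,t-1} + \varepsilon_{i,t}$ from a zero initial value with its own localizing coefficient $c_i$. The Python sketch below is only an illustration under assumptions of our own: iid $N(0,1)$ errors (so the long-run variance is one), $c_i$ drawn uniformly from $[-10, 0]$, and hypothetical helper names `kappa`, `y_iT` and `pooled_mean`. It compares the cross-sectional average with $n^{-1}\sum_i E(Y_i)$, using the standard closed form $E\int_0^1 J_c(r)^2dr = (e^{2c}-2c-1)/(4c^2)$ for $J_c(r) = \int_0^r e^{(r-s)c}dW(s)$. The moment bound in condition (b) is plausible here because the $c_i$ are bounded, but it is not verified formally.

```python
import math
import random

def kappa(c):
    """E int_0^1 J_c(r)^2 dr = (e^{2c} - 2c - 1) / (4c^2), with value 1/2 at c = 0."""
    if abs(c) < 1e-8:
        return 0.5
    return (math.expm1(2.0 * c) - 2.0 * c) / (4.0 * c * c)

def y_iT(c, T, rng):
    """Y_{i,T} = T^{-2} sum_t x_t^2, where x_t = exp(c/T) x_{t-1} + eps_t, x_0 = 0."""
    rho = math.exp(c / T)
    x, s = 0.0, 0.0
    for _ in range(T):
        x = rho * x + rng.gauss(0.0, 1.0)
        s += x * x
    return s / T**2

def pooled_mean(n, T, seed=0):
    """Return (n^{-1} sum_i Y_{i,T}, n^{-1} sum_i E(Y_i)) for heterogeneous c_i."""
    rng = random.Random(seed)
    cs = [rng.uniform(-10.0, 0.0) for _ in range(n)]   # bounded, heterogeneous c_i
    sample = sum(y_iT(c, T, rng) for c in cs) / n
    target = sum(kappa(c) for c in cs) / n
    return sample, target

# n and T grow together along an arbitrary path, chosen only for illustration
for n, T in [(20, 50), (200, 200), (800, 400)]:
    sample, target = pooled_mean(n, T)
    print(f"n={n:4d}  T={T:4d}  average={sample:.4f}  target={target:.4f}")
```

As $n$ and $T$ grow jointly, the gap between the two columns should shrink, in line with the joint probability limit; the exact figures depend on the seed.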
We now turn to conditions under which we can obtain convergence in distribution as $(n, T \to \infty)$. As in the case of the probability limit, we can often easily derive a sequential weak convergence result for $X_{n,T} = n^{-1/2}\sum_{i=1}^{n} Y_{i,T}$, say. (Examples are given in Phillips & Moon (1999a, b).) As to how to obtain convergence in distribution in joint limits as $(T, n \to \infty)$, Phillips & Moon (1999a) again give some general results. Their Theorem 2 provides a joint central limit theorem for $(T, n \to \infty)$ that employs a Lindeberg condition for double indexed processes. In addition, their Theorem 3 gives a version which applies to iid variates scaled differently across the cross section. To deal with other types of heterogeneity across the cross section, we have developed the following version of the joint central limit theorem.
Theorem 3. Suppose that the $Y_{i,T}$ are independent scalar variables across $i$ for all $T$ with $E(Y_{i,T}) = 0$ and $\mathrm{Var}(Y_{i,T}) = V_{i,T}$. Assume the following conditions hold:
(i) $\lim_{n,T\to\infty} \frac{1}{n}\sum_{i=1}^{n} V_{i,T} = V$ is finite and positive,
(ii) $\sup_T E|Y_{i,T}|^{2+\delta} \le M < \infty$ for some $\delta > 0$ and for all $i$.
Then,
$$X_{n,T} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Y_{i,T} \Rightarrow N(0, V) \quad \text{as } (T, n \to \infty).$$
The basic idea of Theorem 3 is to employ a Lyapunov condition to guarantee that the Lindeberg condition holds. The corresponding vector case can be handled by using Theorem 3 and the Cramer-Wold device.
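A companion sketch for Theorem 3, again purely illustrative: take $Y_{i,T} = T^{-1}\sum_{t=1}^{T}x_{i,t}u_{i,t}$, with $x_{i,t} = e^{c_i/T}x_{i,t-1} + \varepsilon_{i,t}$ started at zero and $u_{i,t}$ iid $N(0,1)$ independent of all $\varepsilon_{i,t}$. That independence is our assumption; it makes $E(Y_{i,T}) = 0$ exactly and gives $V_{i,T} = T^{-2}\sum_{t=1}^{T}(1-\rho_i^{2t})/(1-\rho_i^2)$ with $\rho_i = e^{c_i/T}$. The code compares the Monte Carlo variance of $X_{n,T} = n^{-1/2}\sum_i Y_{i,T}$ with $n^{-1}\sum_i V_{i,T}$; the function names `var_iT` and `draw_X` are hypothetical.

```python
import math
import random
import statistics

def var_iT(c, T):
    """Exact Var(Y_{i,T}) = T^{-2} sum_t (1 - rho^{2t}) / (1 - rho^2), rho = exp(c/T)."""
    if c == 0.0:
        return (T + 1) / (2.0 * T)             # E(x_t^2) = t when rho = 1
    r = math.expm1(2.0 * c / T)                # rho^2 - 1
    return sum(math.expm1(2.0 * c * t / T) / r for t in range(1, T + 1)) / T**2

def draw_X(cs, T, rng):
    """One draw of X_{n,T} = n^{-1/2} sum_i Y_{i,T}, Y_{i,T} = T^{-1} sum_t x_{i,t} u_{i,t}."""
    total = 0.0
    for c in cs:
        rho = math.exp(c / T)
        x, s = 0.0, 0.0
        for _ in range(T):
            x = rho * x + rng.gauss(0.0, 1.0)  # eps_{i,t}
            s += x * rng.gauss(0.0, 1.0)       # u_{i,t}, independent of all eps
        total += s / T
    return total / math.sqrt(len(cs))

rng = random.Random(1)
n, T, reps = 30, 80, 400
cs = [rng.uniform(-10.0, 0.0) for _ in range(n)]   # heterogeneous, bounded c_i
V = sum(var_iT(c, T) for c in cs) / n              # (1/n) sum_i V_{i,T}
draws = [draw_X(cs, T, rng) for _ in range(reps)]
print(f"V = {V:.4f}  MC variance = {statistics.variance(draws):.4f}  "
      f"MC mean = {statistics.fmean(draws):.4f}")
```

The Monte Carlo mean should be near zero and the Monte Carlo variance near $V$, up to simulation noise of a few percent.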
III. AN APPLICATION
Most of the recent applications of the new large $n, T$ panel data limit theory have involved studying and developing estimators and tests for panel cointegrating regressions where the regressors are integrated of order one. In this section we analyze problems that arise in these models when the regressors are nearly rather than exactly integrated of order one. We start by introducing the model and assumptions.
A. The Model
We focus on the simple two variable panel regression
$$y_{i,t} = \beta x_{i,t} + u_{i,t}, \qquad (1)$$
$$x_{i,t} = \rho_i x_{i,t-1} + \varepsilon_{i,t}, \qquad \rho_i = \exp(c_i/T) \approx 1 + \frac{c_i}{T}, \qquad (2)$$
($t = 1, \ldots, T$, $i = 1, \ldots, n$), where the initial values $z_{i,0} = (y_{i,0}, x_{i,0})'$ are iid, $E\|z_{i,0}\|^4 < \infty$, and the errors are specified below. Notice that if $\rho_i = 1$ (i.e. $c_i = 0$) in (2) for each $i$, then the $x_{i,t}$ are pure or exact unit root processes and the system given by equations (1) and (2) coincides with the homogeneous panel cointegration regression studied by Phillips & Moon (1999a) and many others (for a survey, see Phillips & Moon (1999b)). In these studies the regression coefficient $\beta$ in (1) is called a cointegrating parameter, and it represents a stationary relationship that holds between $y_{i,t}$ and $x_{i,t}$ for every $i$. Such a common long-run relationship is often predicted by economic theory, and it is then of central interest to estimate $\beta$ and test whether it satisfies theoretically sound restrictions. A typical example involves testing the purchasing power parity hypothesis in a panel of suitably similar countries. In contrast to the recent panel cointegration literature, we do not restrict attention to models where the regressors are generated by exact unit root processes. Indeed, although most macroeconomic variables analyzed in the recent panel cointegration studies display strong autocorrelation, there are seldom strong prior reasons why the autoregressive parameter should be exactly unity. The problem is aggravated by the fact that unit root tests cannot reliably detect small deviations from unity. Given this uncertainty about unit roots, it is of interest to study the problems that arise in statistical inference about the regression parameter $\beta$ in (1) when the autoregressive parameters $\rho_i$ in (2) are close to, rather than exactly equal to, one. From earlier literature we know that such problematic near alternatives are best modeled by the local to unit root parametrization $\rho_i = \exp(c_i/T) \approx 1 + c_i/T$ in (2) (see e.g. Elliott (1998) and Stock (1997)). By this device it is possible to obtain asymptotic results that provide reasonable approximations in cases where the regressors $x_{i,t}$ are stationary but revert to their means so slowly that the standard fixed-$\rho_i$ asymptotics fail to attain satisfactory accuracy. We close this section by imposing the following assumption.
Assumption 1. The errors $\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$ are linear processes satisfying the following conditions:
(a) $\xi_{i,t} = C(L)\eta_{i,t} = \sum_{j=0}^{\infty} C_j\eta_{i,t-j}$, where $\sum_{j=0}^{\infty} j^3\|C_j\| < \infty$,
(b) $\eta_{i,t} = (v_{i,t}, w_{i,t})'$, where $v_{i,t}$ and $w_{i,t}$ are mutually independent and iid across $i$ and over $t$ with $E(v_{i,t}) = E(w_{i,t}) = 0$, $E(v_{i,t}^2) = E(w_{i,t}^2) = 1$, and $E(v_{i,t}^4) = E(w_{i,t}^4) = \mu_4 < \infty$ for all $i$ and $t$.
Under Assumption 1 the error process in the system (1) and (2) satisfies the same conditions as the error process of the homogeneous panel cointegration regression of Phillips & Moon (1999a, Assumptions 8 and 9).
B. Preliminary Analysis
For preliminary insights, we derive sequential limits for the pooled panel OLS estimator,
$$\hat\beta = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}y_{i,t}}{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^2}. \qquad (3)$$
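Let $[Tr]$ denote the integer part of $Tr$. From Phillips & Solo (1992), we know that under Assumption 1 the partial sum process $T^{-1/2}\sum_{t=1}^{[Tr]}\xi_{i,t}$ converges weakly to a two-dimensional Brownian motion $B_i(r) = (B_{ui}(r), B_{\varepsilon i}(r))'$, $0 \le r \le 1$, with long-run covariance matrix $\Omega = \sum_{j=-\infty}^{\infty}E(\xi_{i,j}\xi_{i,0}')$, which we partition as $\Omega = [\Omega_{kl}]$, $k, l = u, \varepsilon$. Furthermore, by the well-known limit theory for near integrated processes (e.g. Phillips (1987, 1988)), as $T \to \infty$,
$$\frac{1}{T^2}\sum_{t=1}^{T} x_{i,t}^2 \Rightarrow \int_0^1 K_{c_i}(r)^2\,dr, \qquad (4)$$
$$\frac{1}{T}\sum_{t=1}^{T} x_{i,t}u_{i,t} \Rightarrow \int_0^1 K_{c_i}(r)\,dB_{ui}(r) + \Delta_{u\varepsilon}, \qquad (5)$$
where $\Delta_{u\varepsilon}$ is an off-diagonal element of the one-sided long-run covariance matrix $\Delta = \sum_{j=0}^{\infty}E(\xi_{i,j}\xi_{i,0}') = [\Delta_{kl}]$, $k, l = u, \varepsilon$, and
$$K_{c_i}(r) = \int_0^r e^{(r-s)c_i}\,dB_{\varepsilon i}(s), \qquad 0 \le r \le 1,$$
is an Ornstein-Uhlenbeck process. Given (4) and (5) we may deduce, for fixed $n$ as $T \to \infty$,
$$T(\hat\beta - \beta) \Rightarrow \left(\frac{1}{n}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)^2\,dr\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)\,dB_{ui}(r) + \Delta_{u\varepsilon}\right). \qquad (6)$$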
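This result provides the first step for obtaining sequential asymptotics for (3). The second step is to derive the limit of the right-hand side of (6) as $n \to \infty$. For simplicity assume $c_i = c$ for all $i$. Then notice that the $\int_0^1 K_{c_i}(r)\,dB_{ui}(r)$ are iid with mean zero and variance
$$E\left(\int_0^1 K_{c_i}(r)\,dB_{ui}(r)\right)^2 = \Omega_{uu}\Omega_{\varepsilon\varepsilon}\int_0^1\!\!\int_0^r e^{2(r-s)c}\,ds\,dr < \infty, \qquad (7)$$
where the equality follows from well-known results for stochastic integrals. Consequently, we may apply the strong law of large numbers to obtain
$$\frac{1}{n}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)\,dB_{ui}(r) \xrightarrow{a.s.} 0, \quad \text{as } n \to \infty, \qquad (8)$$
where $\xrightarrow{a.s.}$ denotes almost sure convergence. Furthermore, the $\int_0^1 K_{c_i}(r)^2\,dr$ are also iid, with
$$E\int_0^1 K_{c_i}(r)^2\,dr = \Omega_{\varepsilon\varepsilon}\int_0^1\!\!\int_0^r e^{2(r-s)c}\,ds\,dr > 0$$
and $E\left(\int_0^1 K_{c_i}(r)^2\,dr\right)^2 < \infty$. Thus, we may deduce that the denominator on the right-hand side of (6) converges almost surely to $\Omega_{\varepsilon\varepsilon}\int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr$ as $n \to \infty$. In view of these results, we may now conclude that as $T \to \infty$ followed by $n \to \infty$,
$$T(\hat\beta - \beta) \Rightarrow \frac{\Delta_{u\varepsilon}}{\Omega_{\varepsilon\varepsilon}\int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr}. \qquad (9)$$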
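For a common $c$, the double integral that appears in (7), (9) and (10) has a simple closed form. The short derivation below is added for convenience; it also shows why the limit in (9) reduces to $2\Delta_{u\varepsilon}/\Omega_{\varepsilon\varepsilon}$ when $\rho_i = 1$:

$$\int_0^1\!\!\int_0^r e^{2(r-s)c}\,ds\,dr = \int_0^1\frac{e^{2rc}-1}{2c}\,dr = \frac{1}{2c}\left(\frac{e^{2c}-1}{2c} - 1\right) = \frac{e^{2c}-2c-1}{4c^2}, \qquad c \neq 0,$$
$$e^{2c} = 1 + 2c + 2c^2 + O(c^3) \quad\text{so}\quad \frac{e^{2c}-2c-1}{4c^2} \to \frac{1}{2} = \int_0^1 r\,dr \quad (c \to 0),$$

so that at $c = 0$ the right-hand side of (9) equals $\Delta_{u\varepsilon}/(\Omega_{\varepsilon\varepsilon}/2) = 2\Delta_{u\varepsilon}/\Omega_{\varepsilon\varepsilon}$.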
This result indicates that although $\hat\beta$ is consistent, it is subject to a second order bias effect arising from temporal correlation between the system errors $u_{i,t}$ and $\varepsilon_{i,t}$. Note that if $\rho_i = 1$ in (2), the bias term in (9) still exists and actually becomes equal to $2\Delta_{u\varepsilon}/\Omega_{\varepsilon\varepsilon}$. In contrast, if $\Delta_{u\varepsilon} = 0$, there is no asymptotic bias in $\hat\beta$, irrespective of the values of the localizing parameters $c_i$ in (2). In fact, if $\Delta_{u\varepsilon} = 0$, we obtain for the estimation error of $\hat\beta$ the sequential weak convergence result
$$\sqrt{n}T(\hat\beta - \beta) \Rightarrow N(0, V), \qquad V = \Omega_{uu}\Omega_{\varepsilon\varepsilon}^{-1}\left(\int_0^1\!\!\int_0^r e^{2(r-s)c}\,ds\,dr\right)^{-1}. \qquad (10)$$
The latter limiting result essentially follows from the fact that $n^{-1/2}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)\,dB_{ui}(r)$ is asymptotically normally distributed with zero mean and variance given in (7).
C. Serial Correlation Corrected Estimation
In view of the above analysis we may conjecture that the asymptotics in (10) can be attained even when $\Delta_{u\varepsilon} \neq 0$, provided that we have a suitable estimator for $\Delta_{u\varepsilon}$. One alternative is to use the kernel estimation strategy of the pooled fully modified (PFM) estimator of Phillips & Moon (1999a). The PFM estimator will be introduced in a subsequent section; it employs the averaged kernel estimators $\hat\Omega = [\hat\Omega_{kl}]$ and $\hat\Delta = [\hat\Delta_{kl}]$, $k, l = u, \varepsilon$, of $\Omega$ and $\Delta$, respectively, defined by
$$\hat\Omega = \frac{1}{n}\sum_{i=1}^{n}\hat\Omega_i, \quad \hat\Omega_i = \sum_{j=-T+1}^{T-1}\varpi(j/K)\,\hat\Gamma_i(j), \qquad \hat\Delta = \frac{1}{n}\sum_{i=1}^{n}\hat\Delta_i, \quad \hat\Delta_i = \sum_{j=0}^{T-1}\varpi(j/K)\,\hat\Gamma_i(j). \qquad (11)$$
Here $\hat\Gamma_i(j) = T^{-1}\sum_t \xi_{i,t+j}\xi_{i,t}'$, where the summation is over $1 \le t, t+j \le T$, while $\varpi(j/K)$ is a lag kernel for which $\varpi(0) = 1$, $\varpi(x) = \varpi(-x)$, $\int\varpi(x)^2\,dx < \infty$, and with Parzen exponent $q \in (0, \infty)$ such that $k_q = \lim_{x\to 0}(1 - \varpi(x))/|x|^q < \infty$. As to applicable lag kernel functions and the choice of the bandwidth parameter $K$, we follow Phillips & Moon (1999a) and impose the following assumption.
Assumption 2. The lag kernel $\varpi(j/K)$ in (11) has Parzen exponent $q > 1/2$, and the bandwidth parameter $K$ tends to infinity with $K/T \to 0$ and $K^{2q}/T \to \tau > 0$ as $T \to \infty$.
Remark 1. Under Assumption 2 the normalized estimation errors $\sqrt{n}(\hat\Omega - \Omega)$ and $\sqrt{n}(\hat\Delta - \Delta)$ converge in probability to zero. This result was stated in Phillips & Moon (1999a, Proof of Theorem 9) and holds as $(T, n \to \infty)$ with $n/T \to 0$. It is employed in the proofs of the theorems given below.
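The averaged kernel estimators in (11) are straightforward to code. The pure-Python sketch below assumes the Parzen kernel (one admissible choice, with Parzen exponent $q = 2$) and represents each unit's errors as a list of $(u_{i,t}, \varepsilon_{i,t})$ pairs; `parzen`, `gamma_hat` and `long_run_cov` are illustrative names, and, as in the text, the true errors are used rather than residuals.

```python
import math
import random

def parzen(x):
    """Parzen lag kernel: varpi(0) = 1, varpi(x) = varpi(-x), zero for |x| >= 1 (q = 2)."""
    a = abs(x)
    if a <= 0.5:
        return 1.0 - 6.0 * a**2 + 6.0 * a**3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0

def gamma_hat(xi, j):
    """Gamma_i(j) = T^{-1} sum_t xi_{t+j} xi_t' (j >= 0) as a 2x2 nested list."""
    T = len(xi)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(T - j):
        a, b = xi[t + j], xi[t]
        for k in range(2):
            for l in range(2):
                g[k][l] += a[k] * b[l] / T
    return g

def long_run_cov(panel, K):
    """Averaged kernel estimators (11); returns (Omega_hat, Delta_hat), index 0 = u, 1 = eps."""
    n = len(panel)
    omega = [[0.0, 0.0], [0.0, 0.0]]
    delta = [[0.0, 0.0], [0.0, 0.0]]
    for xi in panel:
        for j in range(len(xi)):
            w = parzen(j / K)
            if w == 0.0:
                break                                   # all remaining weights are zero
            g = gamma_hat(xi, j)
            for k in range(2):
                for l in range(2):
                    delta[k][l] += w * g[k][l] / n      # one-sided sum over j >= 0
                    omega[k][l] += w * g[k][l] / n      # lag +j
                    if j > 0:
                        omega[k][l] += w * g[l][k] / n  # lag -j: Gamma(-j) = Gamma(j)'
    return omega, delta

# Example: iid errors xi_t = chol(C) eta_t with C = [[1, 0.5], [0.5, 1]], so Omega = Delta = C.
rng = random.Random(3)
omega_ue = 0.5
panel = []
for _ in range(40):
    xi = []
    for _ in range(200):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xi.append((e1, omega_ue * e1 + math.sqrt(1.0 - omega_ue**2) * e2))
    panel.append(xi)
omega, delta = long_run_cov(panel, K=4)
print("Omega_hat:", [[round(v, 3) for v in row] for row in omega])
print("Delta_hat:", [[round(v, 3) for v in row] for row in delta])
```

With iid errors both $\Omega$ and $\Delta$ equal the contemporaneous covariance matrix, so both estimates should be close to $C$. Under Assumption 2 the bandwidth $K$ would grow with $T$; the value $K = 4$ is an arbitrary illustration.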
Remark 2. Notice that the kernel estimators defined in (11) are not feasible, since they employ the unknown errors $\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$. A natural approach to estimating $u_{i,t}$ and $\varepsilon_{i,t}$ is to use the residuals $\hat u_{i,t} = y_{i,t} - \hat\beta x_{i,t}$ from a preliminary pooled panel OLS regression and the differences $\Delta x_{i,t}$, respectively. It is easy to show that the associated estimation errors for $u_{i,t}$ and $\varepsilon_{i,t}$ are of orders of magnitude $T^{-1}$ and $T^{-1/2}$, respectively. In view of this and Remark 1, we may then expect that under the assumptions of this chapter, and irrespective of whether the $x_{i,t}$ in (2) have exact or near unit roots, the use of $\hat u_{i,t}$ and $\Delta x_{i,t}$ in place of $u_{i,t}$ and $\varepsilon_{i,t}$, respectively, has no effect on the rate of consistency of the kernel estimators in (11). However, following Phillips & Moon (1999a), we proceed by working with the true errors $\xi_{i,t}$, since we want to avoid further technical complications that might arise in an asymptotic analysis where the kernel estimators in (11) use the estimates $\hat u_{i,t}$ and $\Delta x_{i,t}$ in place of $u_{i,t}$ and $\varepsilon_{i,t}$.
Now we are ready to define a robust estimator for $\beta$,
$$\hat\beta^* = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}y_{i,t} - nT\hat\Delta_{u\varepsilon}}{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^2}, \qquad (12)$$
where $\hat\Delta_{u\varepsilon}$ is given in (11). The estimator in (12) is called a serial correlation corrected pooled panel estimator. We now establish the joint asymptotics of the new estimator in (12). Let $J_{c_i}(r) = \int_0^r e^{(r-s)c_i}\,dW_i(s)$, where $W_i(r)$ is a standard Brownian motion. Hereafter, we assume that the values of $c_i$ are uniformly bounded and such that the arithmetic mean of the expected values of $\int_0^1 J_{c_i}(r)^2\,dr$ converges to a positive finite number, i.e.
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}E\int_0^1 J_{c_i}(r)^2\,dr = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\int_0^1\!\!\int_0^r e^{2(r-s)c_i}\,ds\,dr = \kappa_{xx}$$
exists and is finite by assumption. The latter condition is not restrictive; it basically means that we assume that the appropriately normalized sample second moment of the pooled regressors $x_{i,t}$, i.e. $(nT^2)^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2$, converges in probability.
Theorem 4. Suppose Assumptions 1 and 2 hold and that the data are generated by (1) and (2) with $c_i$ such that $\sup_i|c_i| \le \bar c < \infty$. Then under joint limits as $(T, n \to \infty)$ with $n/T \to 0$,
$$\sqrt{n}T(\hat\beta^* - \beta) \Rightarrow N(0, V^*), \qquad \text{where } V^* = \Omega_{uu}\Omega_{\varepsilon\varepsilon}^{-1}\kappa_{xx}^{-1}.$$
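A minimal sketch of the serial correlation corrected estimator (12), together with the studentized statistic $t^*$ used in Section E, assuming the simulation design of Section F ($\beta = 1$, $\rho_i = 1 + c_i/T$, $\xi_{i,t} = \mathrm{chol}(C)\eta_{i,t}$, zero initial values). With the Parzen kernel and $K = 1$, as in Section F, the kernel estimates in (11) reduce to lag-zero sample moments of the true errors, which is what the code uses; `simulate_panel` and `corrected_pooled_ols` are hypothetical names.

```python
import math
import random

def simulate_panel(n, T, cs, omega_ue, beta=1.0, seed=0):
    """Data from (1)-(2) as in Section F: rho_i = 1 + c_i/T, xi_t = chol(C) eta_t with
    C = [[1, omega_ue], [omega_ue, 1]], zero initial values.
    Each unit is a list of tuples (x_t, y_t, u_t, eps_t)."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - omega_ue**2)
    panel = []
    for c in cs:
        rho, x, unit = 1.0 + c / T, 0.0, []
        for _ in range(T):
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            u, eps = e1, omega_ue * e1 + s * e2     # rows of chol(C) applied to eta_t
            x = rho * x + eps
            unit.append((x, beta * x + u, u, eps))
        panel.append(unit)
    return panel

def corrected_pooled_ols(panel, beta0=1.0):
    """Return (beta_hat from (3), beta_star from (12), t_star of Section E).
    Kernel estimates use the Parzen kernel with K = 1, i.e. lag-0 moments."""
    n, T = len(panel), len(panel[0])
    obs = [o for unit in panel for o in unit]
    N = n * T
    sxy = sum(x * y for x, y, _, _ in obs)
    sxx = sum(x * x for x, _, _, _ in obs)
    d_ue = sum(u * e for _, _, u, e in obs) / N     # Delta_hat_{u eps}
    o_uu = sum(u * u for _, _, u, _ in obs) / N     # Omega_hat_{uu}
    beta_hat = sxy / sxx
    beta_star = (sxy - N * d_ue) / sxx
    t_star = math.sqrt(n) * T * (beta_star - beta0) * math.sqrt(sxx / (n * T**2) / o_uu)
    return beta_hat, beta_star, t_star

panel = simulate_panel(n=50, T=250, cs=[-5.0] * 50, omega_ue=0.6, seed=42)
beta_hat, beta_star, t_star = corrected_pooled_ols(panel)
print(f"OLS: {beta_hat:.4f}  corrected: {beta_star:.4f}  t*: {t_star:.2f}")
```

With $\Omega_{u\varepsilon} = \Delta_{u\varepsilon} = 0.6$ and $c = -5$, the uncorrected $\hat\beta$ should show a bias of roughly the size implied by (9), about $0.6/(250 \times 0.09) \approx 0.027$, while $\hat\beta^*$ should be close to one.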
As is apparent from Theorem 4, the serial correlation corrected pooled panel OLS estimator has very desirable properties. It is $\sqrt{n}T$-consistent, asymptotically normal and free of asymptotic biases, irrespective of whether the regressors $x_{i,t}$ in (2) have exact or near unit roots in their generating mechanisms. This is a remarkable improvement that can be gained if panel data are used, since none of the existing time series estimators for cointegrating parameters can achieve these features. Rather, as shown e.g. by Elliott (1998), time series cointegrating regression estimators tend to suffer from second order biases unless the regressors are generated by exact unit root processes, and these biases lead to severe size distortions in hypothesis testing. In contrast, we will show below that by using the serial correlation corrected pooled panel OLS estimator we can achieve robust inference in fairly general situations where individual regressors may have roots that vary heterogeneously within a range of values near one. Unfortunately, the situation is less hopeful if the panel regression in (1) includes individual intercepts or if the data exhibit linear or higher order time trends. While there is a natural way to modify the new serial correlation corrected pooled OLS estimator to take these effects into account, it turns out that in these cases near unit roots introduce nuisance parameters that produce biases in the asymptotics of the estimator. To see why this happens, suppose the regression in (1) includes an intercept that may vary across individuals. This suggests the use of demeaned data in the formula of the estimator. Accordingly, modify (12) to the form
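$$\tilde\beta^* = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}\tilde x_{i,t}\tilde y_{i,t} - nT\hat\Delta_{u\varepsilon}}{\sum_{i=1}^{n}\sum_{t=1}^{T}\tilde x_{i,t}^2}, \qquad (13)$$
where $\tilde y_{i,t} = y_{i,t} - \bar y_i$ and $\tilde x_{i,t} = x_{i,t} - \bar x_i$, with $\bar y_i = T^{-1}\sum_{t=1}^{T}y_{i,t}$ and $\bar x_i = T^{-1}\sum_{t=1}^{T}x_{i,t}$, respectively. The asymptotic properties of the estimator in (13) are easily found by employing the sequential limit theory. To reveal the most essential part of this exercise, note that we have
$$\frac{1}{T}\sum_{t=1}^{T}\tilde x_{i,t}\tilde u_{i,t} \Rightarrow \int_0^1\tilde K_{c_i}(r)\,dB_{ui}(r) + \Delta_{u\varepsilon}, \qquad (14)$$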
where $\tilde K_{c_i}(r)$ is a demeaned Ornstein-Uhlenbeck process defined by $\tilde K_{c_i}(r) = K_{c_i}(r) - \int_0^1 K_{c_i}(s)\,ds$. Now, while the temporal correlation correction in (13) can still remove the bias effects that arise from the presence of $\Delta_{u\varepsilon}$ on the right-hand side of (14), the remaining term, $\int_0^1\tilde K_{c_i}(r)\,dB_{ui}(r)$, no longer has zero mean, in contrast to the case in (5), where we had $K_{c_i}(r)$ in place of $\tilde K_{c_i}(r)$. In fact,
$$E\int_0^1\tilde K_{c_i}(r)\,dB_{ui}(r) = -\Omega_{u\varepsilon}\int_0^1\!\!\int_0^r e^{(r-s)c_i}\,ds\,dr,$$
and we thus obtain
$$\frac{1}{n}\sum_{i=1}^{n}\int_0^1\tilde K_{c_i}(r)\,dB_{ui}(r) \to_p -\Omega_{u\varepsilon}\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\int_0^1\!\!\int_0^r e^{(r-s)c_i}\,ds\,dr, \quad \text{as } n \to \infty,$$
provided that this limit exists. In view of this result it is easy to see that the estimator in (13) is subject to an asymptotic bias which depends on the nuisance parameters $c_i$. Unfortunately, no technique is currently available that would provide consistent estimates of the individual localizing coefficients $c_i$. Only in the special case where the localizing coefficients are the same across $i$ may we use the cross-sectional dimension of the panel to provide consistent estimates of the common localizing coefficient (see Moon & Phillips (1999)). This fact opens a possibility for correcting the bias effects. However, such a correction may be rather complicated and is restricted to cases where the common $c$ is well below zero (cf. Moon & Phillips (1999)). While it is beyond the scope of this study to consider this matter in more detail, in empirical applications the special case of a common $c$ is hardly realistic.
D. Fully Modified Estimation
We turn to the PFM estimator of Phillips & Moon (1999a). The idea of the PFM estimator is to modify the pooled OLS estimator in (3) by employing non-parametric corrections in the same way as in the fully modified OLS (FM-OLS) estimator of Phillips & Hansen (1990). The estimator is defined by
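$$\hat\beta^+ = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}\hat y^+_{i,t} - nT\hat\Delta^+_{u\varepsilon}}{\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2}, \qquad (15)$$
where
$$\hat y^+_{i,t} = y_{i,t} - \hat\Omega_{u\varepsilon}\hat\Omega_{\varepsilon\varepsilon}^{-1}\Delta x_{i,t} \qquad (16)$$
and
$$\hat\Delta^+_{u\varepsilon} = \hat\Delta_{u\varepsilon} - \hat\Omega_{u\varepsilon}\hat\Omega_{\varepsilon\varepsilon}^{-1}\hat\Delta_{\varepsilon\varepsilon}, \qquad (17)$$
in which $\hat\Omega$ and $\hat\Delta$ are the kernel estimators in (11). Equation (16) gives an endogeneity correction and is similar to that in the FM-OLS estimator of Phillips & Hansen (1990). Equation (17) gives the contemporaneous and serial correlation corrections that are needed to remove all the second order bias effects arising from temporal correlation between $u_{i,t}$ and $\varepsilon_{i,t}$. Under the assumption that the regressors $x_{i,t}$ in (2) have exact unit roots, the joint asymptotics of the PFM estimator are given by Theorem 9 of Phillips & Moon (1999a). The following theorem shows how this result changes when the regressors $x_{i,t}$ are generated by the more general class of near unit root processes. Here we make an additional (technical) assumption that the values of $c_i$ are such that the $c_i$-weighted average of the expected values of $\int_0^1 J_{c_i}(r)^2\,dr$ converges to a finite number, i.e.
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}c_i\,E\int_0^1 J_{c_i}(r)^2\,dr = \kappa_{cxx}$$
exists and is finite.
Theorem 5. Suppose the assumptions of Theorem 4 hold. Then under joint limits as $(T, n \to \infty)$ with $n/T \to 0$,
(a) $\sqrt{n}T(\hat\beta^+ - \beta) - \sqrt{n}B_{n,T} \Rightarrow N(0, V^+)$,
(b) $T(\hat\beta^+ - \beta) \to_p B$,
where
$$V^+ = \Omega_{u\cdot\varepsilon}\Omega_{\varepsilon\varepsilon}^{-1}\kappa_{xx}^{-1}, \qquad (18)$$
with $\Omega_{u\cdot\varepsilon} = \Omega_{uu} - \Omega_{u\varepsilon}^2\Omega_{\varepsilon\varepsilon}^{-1}$, and
$$B_{n,T} = -\frac{\Omega_{u\varepsilon}}{\Omega_{\varepsilon\varepsilon}}\,\frac{\sum_{i=1}^{n}T(e^{c_i/T} - 1)\sum_{t=1}^{T}x_{i,t}x_{i,t-1}}{\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2}, \qquad (19)$$
$$B = -\frac{\Omega_{u\varepsilon}\,\kappa_{cxx}}{\Omega_{\varepsilon\varepsilon}\,\kappa_{xx}}. \qquad (20)$$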
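The mechanism behind $B_{n,T}$ can be checked numerically. The sketch below implements (15) to (17) and the statistic $t^+$ of Section E under the same simplifying assumptions as Section F: Parzen kernel with $K = 1$, so that $\hat\Omega = \hat\Delta = \hat\Gamma(0)$ computed from the true errors, which makes $\hat\Delta^+_{u\varepsilon}$ in (17) vanish. The optional argument `rho_known` is a hypothetical device that replaces $\Delta x_{i,t}$ in (16) by the infeasible quasi-difference $x_{i,t} - \rho_i x_{i,t-1}$ discussed after Theorem 5; for the design $\rho = 1 + c/T$ this quasi-difference recovers $\varepsilon_{i,t}$ exactly. Function names are ours.

```python
import math
import random

def simulate_panel(n, T, c, omega_ue, beta=1.0, seed=0):
    """Section F design with a common c: rho = 1 + c/T, xi_t = chol(C) eta_t, x_0 = 0.
    Each unit is a list of tuples (x_{t-1}, x_t, y_t, u_t, eps_t)."""
    rng = random.Random(seed)
    rho, s = 1.0 + c / T, math.sqrt(1.0 - omega_ue**2)
    panel = []
    for _ in range(n):
        x, unit = 0.0, []
        for _ in range(T):
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            u, eps = e1, omega_ue * e1 + s * e2
            x_prev, x = x, rho * x + eps
            unit.append((x_prev, x, beta * x + u, u, eps))
        panel.append(unit)
    return panel

def pfm(panel, beta0=1.0, rho_known=None):
    """Pooled FM estimator (15)-(17) and t+, with Omega_hat = Delta_hat = Gamma_hat(0)
    (Parzen kernel, K = 1). If rho_known is given, the quasi-difference
    x_t - rho_known * x_{t-1} replaces Delta x_t in (16)."""
    n, T = len(panel), len(panel[0])
    obs = [o for unit in panel for o in unit]
    N = n * T
    o_uu = sum(u * u for _, _, _, u, _ in obs) / N
    o_ue = sum(u * e for _, _, _, u, e in obs) / N
    o_ee = sum(e * e for _, _, _, _, e in obs) / N
    r = o_ue / o_ee                                   # Omega_ue / Omega_ee
    d_plus = o_ue - r * o_ee                          # (17); zero here since Delta_hat = Omega_hat
    lam = 1.0 if rho_known is None else rho_known
    sxx = sum(x * x for _, x, _, _, _ in obs)
    sxy_plus = sum(x * (y - r * (x - lam * xp)) for xp, x, y, _, _ in obs)   # uses (16)
    beta_plus = (sxy_plus - N * d_plus) / sxx         # (15)
    o_u_dot_e = o_uu - r * o_ue                       # Omega_{u.eps}
    t_plus = math.sqrt(n) * T * (beta_plus - beta0) * math.sqrt(o_ee / (2.0 * o_u_dot_e))
    return beta_plus, t_plus

T, c = 250, -10.0
panel = simulate_panel(n=50, T=T, c=c, omega_ue=0.8, seed=7)
b_fm, t_fm = pfm(panel)                               # feasible PFM with Delta x
b_qd, t_qd = pfm(panel, rho_known=1.0 + c / T)        # infeasible quasi-difference
print(f"PFM: {b_fm:.4f} (t+ = {t_fm:.1f})   quasi-differenced: {b_qd:.4f} (t+ = {t_qd:.2f})")
```

For a common $c$, (20) reduces to $B = -\Omega_{u\varepsilon}c/\Omega_{\varepsilon\varepsilon}$, which equals 8 for $c = -10$ and $\Omega_{u\varepsilon} = 0.8$, i.e. a bias of roughly $8/T \approx 0.03$ for $T = 250$; the quasi-differenced version should instead be centered near one.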
The following corollary holds when the assumption of Phillips & Moon (1999a) about exact unit roots in the regressors $x_{i,t}$ is valid.
Corollary 6. Suppose Assumptions 1 and 2 hold and the data are generated by (1) and (2) with $c_i = 0$ for all $i$. Then under joint limits as $(T, n \to \infty)$ with $n/T \to 0$,
$$\sqrt{n}T(\hat\beta^+ - \beta) \Rightarrow N(0, 2\Omega_{u\cdot\varepsilon}\Omega_{\varepsilon\varepsilon}^{-1}).$$
It is easy to see that the result of Corollary 6 follows from Theorem 5, because if $c_i = 0$, then $B_{n,T} = B = 0$ and $E\int_0^1 J_{c_i}(r)^2\,dr = E\int_0^1 W_i(r)^2\,dr = 1/2$, giving $V^+ = 2\Omega_{u\cdot\varepsilon}\Omega_{\varepsilon\varepsilon}^{-1}$. The result of Corollary 6 coincides precisely with that of Theorem 9 of Phillips & Moon (1999a), and it is illustrative to compare it to Theorems 4 and 5 above. First, note from Corollary 6 the obvious fact that when the exact unit root assumption holds, $\hat\beta^+$ is $\sqrt{n}T$-consistent, asymptotically normal and unbiased. In addition, note that in this case $\hat\beta^+$ is generally more efficient than $\hat\beta^*$, because $\Omega_{u\cdot\varepsilon} = \Omega_{uu} - \Omega_{u\varepsilon}^2\Omega_{\varepsilon\varepsilon}^{-1} \le \Omega_{uu}$. This is the price that we have to pay if the autoregressive parameters in (2) happen to be exactly equal to one and we use the estimator $\hat\beta^*$ instead of $\hat\beta^+$. However, as Theorem 5 indicates, the behavior of the estimator $\hat\beta^+$ is radically different if the regressors $x_{i,t}$ are generated by processes with roots that are only local to one. First, $\hat\beta^+$ is no longer $\sqrt{n}T$-consistent. Rather, in order to obtain $\sqrt{n}T$-rate asymptotics, a bias term $B_{n,T}$ given in (19) has to be subtracted from the estimation error. In fact, in view of result (b) of Theorem 5, if the $x_{i,t}$ are near, rather than exact, unit root processes, the estimator $\hat\beta^+$ is only $T$-consistent and has an asymptotic bias given by $B$ in (20). If there is no simultaneity in the model, i.e. if $\Omega_{u\varepsilon} = 0$, then the biases disappear and the PFM estimator is $\sqrt{n}T$-consistent and has an asymptotic normal distribution with the same variance as that of the serial correlation corrected pooled OLS estimator.
To see why the biases arise, notice first that when an autoregressive parameter $\rho_i$ in (2) is only nearly one, with $c_i$ non-zero, then $\Delta x_{i,t} = \varepsilon_{i,t} + (e^{c_i/T} - 1)x_{i,t-1}$, where $(e^{c_i/T} - 1) \approx c_i/T$. It is then easy to see that the use of $\Delta x_{i,t}$ in the endogeneity correction term (16) gives rise to $B_{n,T}$ in (19), which has the limit given in (20). It is worth noticing that if the nuisance parameters $c_i$ were known, we could employ the quasi-difference $x_{i,t} - e^{c_i/T}x_{i,t-1}$ in place of the pure difference $\Delta x_{i,t}$ in (16), so that the bias term $B_{n,T}$ would vanish. However, as we already noted above, such a solution is generally infeasible because the localizing coefficients $c_i$ are unknown and cannot be consistently estimated from the individual time series $x_{i,t}$. We close this section by pointing out that the above bias problem also occurs in cases where the PFM estimator is modified to account for deterministic effects like individual intercepts in (1). This fact can be easily verified through sequential asymptotics (for details see Kauppi (1999, pp. 124-125)).
E. Hypothesis Testing
In this section we consider testing a simple hypothesis $H_0\colon \beta = \beta_0$ against $H_1\colon \beta \neq \beta_0$. First, in view of Theorem 4, we could use the serial correlation corrected pooled OLS estimator to obtain the t-test statistic
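$$t^* = \sqrt{n}T(\hat\beta^* - \beta_0)\sqrt{\frac{(nT^2)^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T}x_{i,t}^2}{\hat\Omega_{uu}}}.$$
In view of Theorem 4 and the result (36) given in its proof in the appendix, it is easy to deduce the following corollary.
Corollary 7. Suppose the assumptions of Theorem 4 hold. Then, under joint limits as $(T, n \to \infty)$ with $n/T \to 0$, $t^* \Rightarrow N(0, 1)$.
For comparison we will also consider assuming exact unit roots in $x_{i,t}$ and accordingly employing the PFM estimator based t-test
$$t^+ = \sqrt{n}T(\hat\beta^+ - \beta_0)\sqrt{\frac{\hat\Omega_{\varepsilon\varepsilon}}{2\hat\Omega_{u\cdot\varepsilon}}},$$
where $\hat\Omega_{\varepsilon\varepsilon}$ and $\hat\Omega_{u\cdot\varepsilon} = \hat\Omega_{uu} - \hat\Omega_{u\varepsilon}^2\hat\Omega_{\varepsilon\varepsilon}^{-1}$ are obtained from the kernel estimators given in (11) (cf. Phillips & Moon (1999a, Remark (c), p. 1086)).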
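Corollary 7, and the contrast with $t^+$ drawn in Corollary 8, can be previewed with a small Monte Carlo experiment in the spirit of Section F. The sketch below is a reduced-scale version of our own (few replications, pure Python), not the code behind the reported tables. It again uses the Parzen kernel with $K = 1$, so that the long-run covariance estimates are lag-zero sample moments of the true errors, and records how often $|t^*|$ and $|t^+|$ exceed 1.96 under $H_0$; `replication` and `rejection_rates` are hypothetical names.

```python
import math
import random

def replication(n, T, c, omega_ue, rng, beta=1.0):
    """One draw of (t_star, t_plus) under H0: beta = 1, Section F design,
    with long-run covariances estimated by lag-0 moments (Parzen kernel, K = 1)."""
    rho, s = 1.0 + c / T, math.sqrt(1.0 - omega_ue**2)
    sxx = sxy = sxdx = 0.0
    g_uu = g_ue = g_ee = 0.0
    for _ in range(n):
        x = 0.0
        for _ in range(T):
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            u, eps = e1, omega_ue * e1 + s * e2
            x_prev, x = x, rho * x + eps
            y = beta * x + u
            sxx += x * x
            sxy += x * y
            sxdx += x * (x - x_prev)                  # sum of x_t * Delta x_t
            g_uu += u * u
            g_ue += u * eps
            g_ee += eps * eps
    N = n * T
    o_uu, o_ue, o_ee = g_uu / N, g_ue / N, g_ee / N   # Omega_hat = Delta_hat
    scale = math.sqrt(n) * T
    beta_star = (sxy - N * o_ue) / sxx                                   # (12)
    t_star = scale * (beta_star - beta) * math.sqrt(sxx / (n * T**2) / o_uu)
    r = o_ue / o_ee
    beta_plus = (sxy - r * sxdx - N * (o_ue - r * o_ee)) / sxx           # (15)-(17)
    t_plus = scale * (beta_plus - beta) * math.sqrt(o_ee / (2.0 * (o_uu - r * o_ue)))
    return t_star, t_plus

def rejection_rates(n, T, c, omega_ue, reps, seed=0, crit=1.96):
    """Fractions of replications with |t*| > crit and |t+| > crit."""
    rng = random.Random(seed)
    hits_star = hits_plus = 0
    for _ in range(reps):
        t_star, t_plus = replication(n, T, c, omega_ue, rng)
        hits_star += abs(t_star) > crit
        hits_plus += abs(t_plus) > crit
    return hits_star / reps, hits_plus / reps

rate_star, rate_plus = rejection_rates(n=25, T=100, c=-10.0, omega_ue=0.0, reps=300)
print(f"rejection rate of t*: {rate_star:.3f}, of t+: {rate_plus:.3f}")
```

With $c = -10$ and $\Omega_{u\varepsilon} = 0$, the rejection frequency of $t^*$ should be near the nominal 5%, while that of $t^+$ should be far larger; the exact figures vary with the seed and the number of replications.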
Corollary 8. Suppose the assumptions of Theorem 5 hold. Then, under joint limits as $(T, n \to \infty)$ with $n/T \to 0$,
(a) $t^+$ diverges if $\Omega_{u\varepsilon} \neq 0$ and $B \neq 0$, where $B$ is given in (20);
(b) $t^+ \Rightarrow N(0, V_{t^+})$ if $\Omega_{u\varepsilon} = 0$, where $V_{t^+} = \tfrac{1}{2}\kappa_{xx}^{-1}$.
Part (a) of Corollary 8 states the obvious consequence of Theorem 5 that the t-test statistic $t^+$ diverges if the regressors are generated by local to unit root processes and $\Omega_{u\varepsilon}$ is non-zero. This means that hypothesis tests based on the PFM estimator are generally severely distorted. The result of part (b) of Corollary 8 shows that even when there is no simultaneity, i.e. $\Omega_{u\varepsilon} = 0$, the test does not have the desired standard normal distribution. To illustrate this latter effect, suppose that $c_i = c$ for all $i$. Then, if $\Omega_{u\varepsilon} = 0$, we have
$$V_{t^+} = \frac{2c^2}{e^{2c} - 2c - 1}, \qquad (21)$$
because $E\int_0^1 J_{c_i}(r)^2\,dr = \int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr = (e^{2c} - 2c - 1)/4c^2$ for all $i$. It is easy to see from (21) that for negative values of $c$, $V_{t^+}$ becomes larger than unity. For example, for $c = -5$ and $c = -10$, $V_{t^+}$ is approximately equal to 5.55 and 10.53, respectively. Notice that if the usual 5% critical value 1.96 is applied in the $t^+$-test, then the true asymptotic rejection rates that correspond to $c = -5$ and $c = -10$ are approximately equal to 40.6% and 54.6%, respectively.
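The figures quoted above follow directly from (21) and the normal distribution function. A short check using only the Python standard library (`math.erf`, `math.expm1`); `v_t_plus` and `rejection_rate` are illustrative names:

```python
import math

def v_t_plus(c):
    """Asymptotic variance of t+ from (21): 2c^2 / (e^{2c} - 2c - 1); equals 1 at c = 0."""
    if c == 0.0:
        return 1.0
    return 2.0 * c * c / (math.expm1(2.0 * c) - 2.0 * c)

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rejection_rate(c, crit=1.96):
    """P(|N(0, V_t+)| > crit): asymptotic rejection rate of the nominal 5% t+ test."""
    z = crit / math.sqrt(v_t_plus(c))
    return 2.0 * (1.0 - normal_cdf(z))

for c in (0.0, -5.0, -10.0):
    print(f"c = {c:6.1f}:  V_t+ = {v_t_plus(c):7.3f},  rejection rate = {rejection_rate(c):.3f}")
```

Using `math.expm1` avoids cancellation in $e^{2c} - 1$ for small $|c|$, where (21) approaches its exact unit root value of one.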
F. Simulations In this section, we illustrate the theoretical ndings obtained in the previous section by conducting some simple Monte Carlo experiments. We focus on investigating the size behavior of the PFM t-test statistic, t + , and that of the bias corrected t-test, t*. For the experiments we generate articial data by employing equations (1) and (2), where we impose = 1 in (1). The errors i, t = (ui, t, i, t) are generated simply by equation i, t = chol(C)i, t, where i, t ~ nid(0, I2) across i = 1, . . . , n, and over t = 1, . . . , T, and chol(C) is the Cholesky decomposition of the matrix C = [Cij] with C11 = C22 = 1, 2 C12 = C21 = u. Thus, we have E(ui, t) = E(i, t) = 0, E(u2 i, t) = E(i, t) = 1 = uu = and E(ui, ti, t) = u. The initial values yi, 0 and xi, 0 are set to zeros. Table 1 reports percentage rejection rates of the t-tests, t + and t*, respectively, when a 5% critical value 1.96 is applied, n = 50, T = 250, and the local to unit root coefcients are set equal to a common value c, i.e. we use i = = 1 + c/T for all i. In computing the long-run covariance estimates in t + and t*, respectively, we employed the Parzen kernel function and the bandwidth parameter value K = 1.[2] The columns under c = 0 report results when an exact unit root assumption holds. In accordance with the analytical
Table 1. Monte Carlo results with n = 50 and T = 250

               c = 0              c = -5              c = -10
ω_uν        t+      t*        t+       t*         t+       t*
0.0        5.20    4.70      42.10    5.00       52.30    4.20
0.2        5.30    4.40      89.80    4.30       99.60    5.40
0.4        6.60    6.80     100.0     4.90      100.0     4.90
0.6        4.30    4.50     100.0     4.00      100.0     5.60
0.8        4.30    4.50     100.0     5.80      100.0     4.50
Notes: The columns under $t^{+}$ and $t^{*}$ report Monte Carlo rejection rates (in percent) of the respective t-tests, computed by employing long-run covariance estimates obtained with a Parzen kernel function and a bandwidth parameter value $K = 1$. A nominal 5% asymptotic level was applied. In each replication, the data were obtained by using equations (1) and (2) with $\beta = 1$ and $\rho_i = \rho = 1 + c/T$ in (1) and (2), respectively, zero initial values, and errors $\xi_{i,t} = (u_{i,t}, \nu_{i,t})'$ generated by the equation $\xi_{i,t} = \mathrm{chol}(C)\varepsilon_{i,t}$, where $\varepsilon_{i,t} \sim \mathrm{nid}(0, I_2)$ across $i = 1, \ldots, n$ and over $t = 1, \ldots, T$, and $\mathrm{chol}(C)$ is the Cholesky decomposition of the matrix $C = [C_{ij}]$ with $C_{11} = C_{22} = 1$ and $C_{12} = C_{21} = \omega_{u\nu}$. Results are based on 1000 replications.
results of the previous section, in this case the size behavior of the two tests is good. The columns under $c = -5$ and $c = -10$ give rejection rates when the roots of the regressors are only nearly one. As predicted by Corollary 8, the $t^{+}$-test is now very sensitive to deviations from exact unit roots and suffers from severe size distortions across all values of $\omega_{u\nu}$. Notice that even when $\omega_{u\nu} = 0$ the $t^{+}$-test rejects far in excess of the desired 5% nominal level, as predicted by the considerations of the previous section. In contrast, as predicted by Corollary 7, the bias corrected t-test, $t^{*}$, maintains the desired size level well across the different values of $\omega_{u\nu}$. Table 2 reports test results computed in the same way as those of Table 1, except that $n$ and $T$ are now set to 25 and 100, respectively. As is apparent, the results do not change much from those of Table 1. This indicates that our asymptotic results can provide fairly accurate approximations with sample sizes that are typical in empirical applications. Table 3 examines the performance of the bias corrected t-test when the individual localizing coefficients in the generating mechanisms of the regressors vary across panel members. The heterogeneity across panel members was obtained by generating data as in Tables 1 and 2, except that the individual specific localizing coefficients $c_i$ were drawn from a uniform distribution on the interval $[c, 0]$. For example, the column denoted by $(n = 25, T = 100)$ and $c = -10$ reports simulation results
HEIKKI KAUPPI
Table 2. Monte Carlo results with n = 25 and T = 100

               c = 0              c = -5              c = -10
ω_uν        t+      t*        t+       t*         t+       t*
0.0        6.80    6.20      37.40    5.50       52.30    5.50
0.2        6.60    6.10      74.60    4.00       96.50    6.60
0.4        6.00    5.10      99.20    5.10      100.0     6.20
0.6        5.40    4.90     100.0     6.20      100.0     5.80
0.8        5.30    5.80     100.0     5.60      100.0     5.00
Notes: See the notes of Table 1.
based on an experiment where the autoregressive coefficients $\rho_i$ $(= 1 + c_i/T)$ vary uniformly across panel members within the range $[0.9, 1]$. A comparison of the results of Table 3 with those of Tables 1 and 2 clearly indicates that the bias corrected t-test behaves equally well whether the $x_{i,t}$ have homogeneous or heterogeneous localizing coefficients. In view of the simulation experiments reported above, we may conclude that near unit roots indeed result in severe size distortions of hypothesis tests based on the PFM estimator. On the other hand, the results are fairly promising with
Table 3. Monte Carlo results on the bias corrected test when localizing coefficients are heterogeneous

             (n = 50, T = 250)        (n = 25, T = 100)
ω_uν        c = -5     c = -10       c = -5     c = -10
0.0          4.82       5.10          5.00       5.18
0.2          5.80       5.06          6.20       5.00
0.4          4.96       4.62          5.12       5.34
0.6          5.42       5.06          5.98       5.46
0.8          5.18       5.18          5.44       5.92
Notes: The table reports Monte Carlo rejection rates of the $t^{*}$-test computed in the same way as in Tables 1 and 2. The data were generated as in Tables 1 and 2, except that in each replication the individual specific localizing coefficients $c_i$ $(i = 1, \ldots, n)$ were drawn from a uniform distribution on the interval $[c, 0]$. The applied values of $c$ are given at the top of each column. Results are based on 5000 replications.
regard to the new bias corrected test, which was able to maintain good size behavior throughout all the experiments performed. However, it should be pointed out that our simulation setup here is rather simple and it is likely that some problems arise in more complicated models. For example, if the data generating mechanism obeys more general short-run dynamics than those considered here, then the non-parametric corrections can be expected to be subject to somewhat larger (finite sample) estimation errors, which may weaken the performance of the bias corrected test. Furthermore, an additional source of estimation error arises when the non-parametric estimators use estimated values in place of the true values of the errors.
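The data generating process of these experiments can be sketched in plain Python as follows. This is an illustrative reconstruction, not the author's simulation code: the function names are hypothetical, $\xi_{i,t}$ is drawn through the explicit 2×2 Cholesky factor of $C$, and the sketch stops at the pooled OLS slope, since the kernel-based long-run covariance corrections entering $t^{+}$ and $t^{*}$ are not reproduced here.

```python
import math
import random


def simulate_panel(n, T, c, omega_uv, beta=1.0, heterogeneous=False, seed=None):
    """Generate y_it = beta * x_it + u_it and x_it = rho_i * x_i,t-1 + v_it,
    where (u_it, v_it)' = chol(C) eps_it, C = [[1, omega_uv], [omega_uv, 1]],
    and v stands for nu in the text.

    rho_i = 1 + c_i / T with c_i = c (Tables 1 and 2) or c_i ~ U[c, 0] (Table 3).
    Initial values are zero. Returns nested lists y[i][t], x[i][t], t = 1..T.
    """
    rng = random.Random(seed)
    # Lower Cholesky factor of C: [[1, 0], [omega_uv, sqrt(1 - omega_uv^2)]]
    l21, l22 = omega_uv, math.sqrt(1.0 - omega_uv ** 2)
    y, x = [], []
    for _ in range(n):
        c_i = rng.uniform(c, 0.0) if heterogeneous else c
        rho_i = 1.0 + c_i / T
        x_prev = 0.0
        y_i, x_i = [], []
        for _ in range(T):
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            u, v = e1, l21 * e1 + l22 * e2   # E(u^2) = E(v^2) = 1, E(uv) = omega_uv
            x_t = rho_i * x_prev + v
            x_i.append(x_t)
            y_i.append(beta * x_t + u)
            x_prev = x_t
        y.append(y_i)
        x.append(x_i)
    return y, x


def pooled_ols_slope(y, x):
    """Pooled OLS slope without intercept: sum of x*y over sum of x^2 (all i, t)."""
    sxy = sum(a * b for yi, xi in zip(y, x) for a, b in zip(xi, yi))
    sxx = sum(a * a for xi in x for a in xi)
    return sxy / sxx


y, x = simulate_panel(n=25, T=100, c=-10.0, omega_uv=0.4, seed=1)
print(round(pooled_ols_slope(y, x), 3))
```

Setting `heterogeneous=True` mimics the Table 3 design, with each $c_i$ drawn from $U[c, 0]$.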
IV. CONCLUDING REMARKS

This chapter developed new panel data limit theory that can be used to obtain convergence in probability and in distribution when there is heterogeneity across panel members and the cross-sectional and time series dimensions of the data tend to infinity simultaneously. The new theory was applied to study the asymptotics of a panel regression in which the regressors were generated by a local to unit root process with cross-sectionally heterogeneous localizing coefficients. The application demonstrated that a serial correlation corrected pooled panel OLS estimator yields $\sqrt{n}T$-consistent and asymptotically normal estimates that are centered on the true parameter value, irrespective of whether the regressors are nearly or exactly integrated. While this desirable result holds only in the special case without deterministic effects, our asymptotic analysis also indicated that the panel fully modified estimator is subject to asymptotic biases even in this simple case if the regressors are nearly rather than exactly integrated. Therefore, much care should be taken in interpreting results obtained by recent panel cointegration methods that assume exact unit roots when near unit roots are equally plausible.
NOTES
1. This is proved by Phillips & Moon (1999a, Theorem 8) when $c_i = 0$ for all $i$. Furthermore, a similar result can be proved in the case where the $c_i$ are nonzero by following the lines given in the proof of Theorem 5 of this chapter.
2. In empirical applications a bandwidth parameter value $K = 1$ is hardly realistic. However, in the present simulation setup the actual value of $K$ does not play an important role, because we use iid errors in the simulations. For example, in all of the reported cases, essentially similar results were obtained by using the bandwidth parameter value $K = 4$.
ACKNOWLEDGMENTS
I would like to thank the two referees for their useful comments and suggestions. This paper was completed while the author worked at the Research Department of the Bank of Finland, whose hospitality is gratefully acknowledged. This paper is part of the research program of the Research Unit on Economic Structures and Growth (RUESG) at the Department of Economics at the University of Helsinki. Financial support from the Yrjö Jahnsson Foundation is appreciated. The usual disclaimer applies.
REFERENCES
Billingsley, P. (1968). Convergence of Probability Measures. New York: John Wiley.
Davidson, J. (1994). Stochastic Limit Theory. Oxford: Oxford University Press.
Elliott, G. (1998). On the Robustness of Cointegration Methods When Regressors Almost Have Unit Roots. Econometrica, 66(1), 149–158.
Kauppi, H. (1999). Essays on Econometrics of Cointegration. Research Reports No. 84, Dissertationes Oeconomicae, Department of Economics, University of Helsinki.
Moon, H., & Phillips, P. C. B. (1999). Estimation of Autoregressive Roots Near Unity Using Panel Data. Cowles Foundation Discussion Paper No. 1224, Yale University (http://cowles.econ.yale.edu/).
Phillips, P. C. B. (1987). Towards a Unified Asymptotic Theory for Autoregression. Biometrika, 74(3), 535–547.
Phillips, P. C. B. (1988). Regression Theory for Near-integrated Time Series. Econometrica, 56(5), 1021–1043.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables Regression with I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Non-stationary Panel Data. Econometrica, 67(5), 1057–1111.
Phillips, P. C. B., & Moon, H. (1999b). Non-stationary Panel Data Analysis: An Overview of Some Recent Developments. Cowles Foundation Discussion Paper No. 1221, Yale University (http://cowles.econ.yale.edu/).
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. The Annals of Statistics, 20(2), 971–1001.
Stock, J. H. (1997). Cointegration, Long-run Comovements, and Long Horizon Forecasting. In: D. Kreps & K. F. Wallis (Eds), Advances in Econometrics: Proceedings of the Seventh World Congress of the Econometric Society. Cambridge: Cambridge University Press.
Stout, W. F. (1974). Almost Sure Convergence. New York: Academic Press.
White, H. (1984). Asymptotic Theory for Econometricians. San Diego, California: Academic Press.
APPENDIX
APPENDIX A: PROOF OF THEOREM 2
From the conditions of the theorem we know that $X_{n,T} = \frac{1}{n}\sum_{i=1}^n Y_{i,T} \Rightarrow X_n = \frac{1}{n}\sum_{i=1}^n Y_i$ as $T \to \infty$ for all fixed $n$. Since $\sup_T E\|Y_{i,T}\|^{1+\delta} \le M < \infty$ for all $i$, and because $Y_{i,T} \Rightarrow Y_i$ implies $\|Y_{i,T}\|^{1+\delta} \Rightarrow \|Y_i\|^{1+\delta}$ by the continuous mapping theorem, we also have $E\|Y_i\|^{1+\delta} \le M < \infty$ by Theorem 5.3 of Billingsley (1968) (see also the discussion on p. 33 of Billingsley (1968)). By arguments given in the proof of Theorem 1 of Phillips & Moon (1999a) we can justify that the $Y_i$ are independent across $i$, since the $Y_{i,T}$ are independent across $i$ for all $T$. Given this and the fact that $E\|Y_i\|^{1+\delta} \le M < \infty$, we may apply Markov's law of large numbers to deduce $X_n \xrightarrow{p} X$ as $n \to \infty$ (e.g. White (1984, p. 33)). Furthermore, if we establish conditions (i) through (iv) of Theorem 1, then $X_{n,T} \xrightarrow{p} X$ as $(T, n \to \infty)$. First, condition (i) holds, since

$$\frac{1}{n}\sum_{i=1}^n E\|Y_{i,T}\| \le \frac{1}{n}\sum_{i=1}^n \sup_T E\|Y_{i,T}\| \le M < \infty,$$

where the last two inequalities follow from condition (b) of the theorem. Also, condition (ii) holds, since

$$\frac{1}{n}\sum_{i=1}^n \|E(Y_{i,T}) - E(Y_i)\| \le \sup_i \|E(Y_{i,T}) - E(Y_i)\| \to 0, \quad \text{as } T \to \infty,$$

by condition (a). For condition (iii) we use the fact that

$$E\|Y_{i,T}\|\mathbf{1}\{\|Y_{i,T}\| > n\varepsilon\} \le \frac{1}{(n\varepsilon)^{\delta}}\sup_T E\|Y_{i,T}\|^{1+\delta} \le \frac{M}{(n\varepsilon)^{\delta}}$$

for all $i$, where the first inequality follows from arguments given by Billingsley (1968, p. 32) and the second inequality holds by condition (b). Now, for any $\varepsilon > 0$,

$$\frac{1}{n}\sum_{i=1}^n E\|Y_{i,T}\|\mathbf{1}\{\|Y_{i,T}\| > n\varepsilon\} \le \frac{M}{(n\varepsilon)^{\delta}},$$

and therefore condition (iii) follows. Condition (iv) holds by the same argument, as we notice that now $E\|Y_i\|\mathbf{1}\{\|Y_i\| > n\varepsilon\} \le (n\varepsilon)^{-\delta}E\|Y_i\|^{1+\delta} \le M/(n\varepsilon)^{\delta}$.
APPENDIX B: PROOF OF THEOREM 3

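Let $s_{n,T}^2 = \sum_{i=1}^n V_{i,T}$ and define $\xi_{i,n,T} = Y_{i,T}/s_{n,T}$. Then

$$\sum_{i=1}^n \xi_{i,n,T} \Rightarrow N(0, 1), \quad \text{as } (T, n \to \infty), \tag{22}$$

by Theorem 2 of Phillips & Moon (1999a), if the Lindeberg condition

$$\lim_{n,T\to\infty}\sum_{i=1}^n E\left[\xi_{i,n,T}^2\mathbf{1}\{|\xi_{i,n,T}| > \varepsilon\}\right] = 0, \quad \forall \varepsilon > 0,$$

holds. Given condition (i), (22) implies $\frac{1}{\sqrt{n}}\sum_{i=1}^n Y_{i,T} \Rightarrow N(0, V)$ as $(T, n \to \infty)$. It remains to verify the above Lindeberg condition. We have, for given $\varepsilon > 0$,

$$\begin{aligned}
\sum_{i=1}^n E\left[\xi_{i,n,T}^2\mathbf{1}\{\xi_{i,n,T}^2 > \varepsilon^2\}\right]
&= \sum_{i=1}^n E\left[\frac{Y_{i,T}^2}{s_{n,T}^2}\mathbf{1}\left\{\frac{Y_{i,T}^2}{s_{n,T}^2} > \varepsilon^2\right\}\right] \\
&= \frac{n}{s_{n,T}^2}\,\frac{1}{n}\sum_{i=1}^n E\left[Y_{i,T}^2\mathbf{1}\left\{Y_{i,T}^2 > \varepsilon^2\frac{s_{n,T}^2}{n}\,n\right\}\right] \\
&\le \frac{n}{s_{n,T}^2}\,\frac{1}{n}\sum_{i=1}^n \sup_T E\left[Y_{i,T}^2\mathbf{1}\left\{Y_{i,T}^2 > \varepsilon^2\frac{s_{n,T}^2}{n}\,n\right\}\right].
\end{aligned}\tag{23}$$

By condition (ii) we can always find $\delta > 0$ such that $\sup_T E|Y_{i,T}^2|^{1+\delta} \le N < \infty$ for all $i$. Given this we obtain

$$\sup_T E\left[Y_{i,T}^2\mathbf{1}\left\{Y_{i,T}^2 > \varepsilon^2\frac{s_{n,T}^2}{n}\,n\right\}\right] \le \frac{N}{\left(\varepsilon^2\frac{s_{n,T}^2}{n}\,n\right)^{\delta}} \tag{24}$$

for all $i$ (cf. Billingsley (1968, p. 32)). In view of (23) and (24), and given that condition (i) implies $\lim_{n,T} s_{n,T}^2/n = V < \infty$ $(V > 0)$ and hence $\lim_{n,T} n/s_{n,T}^2 = 1/V < \infty$, we may now conclude that

$$\lim_{n,T\to\infty}\sum_{i=1}^n E\left[\xi_{i,n,T}^2\mathbf{1}\{\xi_{i,n,T}^2 > \varepsilon^2\}\right] = 0,$$

so that the Lindeberg condition follows.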
APPENDIX C: PROOF OF THEOREM 4

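We start by giving some intermediate results that we will use repeatedly in the main part of the proof given below. First, just as in Phillips & Moon (1999a, Lemma 2), based on Phillips and Solo (1992), we decompose $\xi_{i,t}$ as

$$\xi_{i,t} = C\varepsilon_{i,t} + \tilde{\xi}_{i,t-1} - \tilde{\xi}_{i,t}, \tag{25}$$

where $C = C(1) = \sum_{k=0}^\infty C_k$ and $\tilde{\xi}_{i,t} = \sum_{j=0}^\infty \tilde{C}_j\varepsilon_{i,t-j}$ with $\tilde{C}_j = \sum_{k=j+1}^\infty C_k$. Under Assumption 1(a), $C$ is finite and $\sum_{j=0}^\infty \|\tilde{C}_j\|^2 = \sum_{j=0}^\infty \|\sum_{s=j+1}^\infty C_s\|^2 < \infty$ (see Phillips & Moon (1999a, p. 1083)). It follows that

$$E\|\tilde{\xi}_{i,t}\|^2 \le M < \infty. \tag{26}$$

We partition $C = [C_{ab}]$, $(a, b = \eta, w)$, conformably with $\varepsilon_{i,t} = (\eta_{i,t}, w_{i,t})'$, so that the long-run covariance matrix is

$$\Omega = CC' = \begin{bmatrix} C_{\eta\eta}^2 + C_{\eta w}^2 & C_{\eta\eta}C_{w\eta} + C_{\eta w}C_{ww} \\ C_{\eta\eta}C_{w\eta} + C_{\eta w}C_{ww} & C_{w\eta}^2 + C_{ww}^2 \end{bmatrix} = \begin{bmatrix} \omega_{uu} & \omega_{u\nu} \\ \omega_{u\nu} & \omega_{\nu\nu} \end{bmatrix}. \tag{27}$$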
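For subsequent reference, note that the components of $\xi_{i,t} = (u_{i,t}, \nu_{i,t})'$ in (25) may be written as

$$u_{i,t} = C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t} + \tilde{u}_{i,t-1} - \tilde{u}_{i,t}, \tag{28}$$

$$\nu_{i,t} = C_{w\eta}\eta_{i,t} + C_{ww}w_{i,t} + \tilde{\nu}_{i,t-1} - \tilde{\nu}_{i,t}, \tag{29}$$

where $\tilde{u}_{i,t}$ and $\tilde{\nu}_{i,t}$ are the two components of $\tilde{\xi}_{i,t}$. Next, by equation (2),

$$x_{i,t} = \sum_{s=1}^t e^{((t-s)/T)c_i}\nu_{i,s} + e^{(t/T)c_i}x_{i,0},$$

and using (29) we can write this as

$$x_{i,t} = C_{w\eta}f(\eta)_{i,t} + C_{ww}f(w)_{i,t} + R(x)_{i,t}, \tag{30}$$

where we have used the notation

$$f(a)_{i,t} = \sum_{s=1}^t e^{((t-s)/T)c_i}a_{i,s}, \quad a = \eta, w, \tag{31}$$

and

$$R(x)_{i,t} = e^{((t-1)/T)c_i}\tilde{\nu}_{i,0} + \left(1 - e^{c_i/T}\right)\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\tilde{\nu}_{i,s} - \tilde{\nu}_{i,t} + e^{(t/T)c_i}x_{i,0}. \tag{32}$$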
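The algebra leading from (29) to (30)–(32) can be checked numerically. The sketch below uses a hypothetical scalar example that is not taken from the chapter, $\nu_{i,t} = a_0\eta_{i,t} + a_1\eta_{i,t-1} + b_0 w_{i,t}$, for which $C_{w\eta} = a_0 + a_1$, $C_{ww} = b_0$ and $\tilde{\nu}_{i,t} = a_1\eta_{i,t}$. It builds $x_{i,t}$ by the recursion $x_{i,t} = e^{c_i/T}x_{i,t-1} + \nu_{i,t}$ (the appendix writes the root as $e^{c_i/T}$) and reports the largest deviation from $C_{w\eta}f(\eta)_{i,t} + C_{ww}f(w)_{i,t} + R(x)_{i,t}$; the function name and parameter values are illustrative.

```python
import math
import random


def decomposition_error(T=50, c=-5.0, a0=0.7, a1=0.4, b0=0.5, x0=0.3, seed=0):
    """Largest |x_t - (C_weta f(eta)_t + C_ww f(w)_t + R(x)_t)| over t = 1..T."""
    rng = random.Random(seed)
    rho = math.exp(c / T)                                # root e^{c/T}
    eta = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]    # eta[0] is a pre-sample draw
    w = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(T)]
    nu = [0.0] + [a0 * eta[t] + a1 * eta[t - 1] + b0 * w[t] for t in range(1, T + 1)]
    nu_tilde = [a1 * eta[t] for t in range(T + 1)]       # remainder term of nu
    c_weta, c_ww = a0 + a1, b0                           # long-run coefficients

    max_err, x_t, f_eta, f_w = 0.0, x0, 0.0, 0.0
    for t in range(1, T + 1):
        x_t = rho * x_t + nu[t]          # direct recursion for x
        f_eta = rho * f_eta + eta[t]     # f(eta)_t as in (31)
        f_w = rho * f_w + w[t]           # f(w)_t as in (31)
        r_t = (rho ** (t - 1) * nu_tilde[0]
               + (1.0 - rho) * sum(rho ** (t - 1 - s) * nu_tilde[s] for s in range(1, t))
               - nu_tilde[t]
               + rho ** t * x0)          # R(x)_t as in (32)
        max_err = max(max_err, abs(c_weta * f_eta + c_ww * f_w + r_t - x_t))
    return max_err


print(decomposition_error())
```

The printed deviation is at the level of floating-point rounding, which is what the identity predicts.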
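For later analysis it is useful to have the following two moment bounds. First,

$$\sup_i\sup_{1\le t\le T}E\left(\frac{f(a)_{i,t}}{\sqrt{T}}\right)^2 = \sup_i\sup_{1\le t\le T}\frac{1}{T}\sum_{s=1}^t e^{((t-s)/T)2c_i} \le \sup_{1\le t\le T}\frac{1}{T}\sum_{s=1}^t e^{((t-s)/T)2\sup_i|c_i|} \le M < \infty, \tag{33}$$

since $e^{((t-s)/T)2\sup_i|c_i|} \le M < \infty$ (recall that $\sup_i|c_i| \le \bar{c} < \infty$). Second, using the inequality $E|\sum_{i=1}^m X_i|^2 \le m\sum_{i=1}^m E|X_i|^2$ (e.g. Davidson (1994, p. 140)) and the fact that the $\tilde{\nu}_{i,t}$ are iid across $i$, we obtain

$$\begin{aligned}
\sup_i\sup_{1\le t\le T}E(R(x)_{i,t}^2) \le{}& 4\sup_{1\le t\le T}e^{((t-1)/T)2\sup_i|c_i|}E(\tilde{\nu}_{i,0}^2) + 4\sup_{1\le t\le T}E(\tilde{\nu}_{i,t}^2) \\
&+ 4\Big(\sup_{1\le t\le T}e^{(t/T)2\sup_i|c_i|}\Big)E(x_{i,0}^2) \\
&+ 4T^2\big(1 - e^{\sup_i|c_i|/T}\big)^2\sup_{1\le t\le T}\frac{1}{T^2}\sum_{k=1}^{t-1}\sum_{s=1}^{t-1}e^{((2t-2-k-s)/T)2\sup_i|c_i|}E|\tilde{\nu}_{i,s}\tilde{\nu}_{i,k}| \\
\le{}& M < \infty.
\end{aligned}\tag{34}$$

To see that (34) holds, note that $\sup_{1\le t\le T}e^{(t/T)2\sup_i|c_i|} \le e^{2\sup_i|c_i|}$, $E(\tilde{\nu}_{i,t}^2) \le M$ by (26), $E(x_{i,0}^2) \le M$ (by the initial value condition), $T^2(1 - e^{\sup_i|c_i|/T})^2 = O(1)$, and, by the Cauchy–Schwarz inequality, $E|\tilde{\nu}_{i,k}\tilde{\nu}_{i,s}| \le \sqrt{E(\tilde{\nu}_{i,k}^2)E(\tilde{\nu}_{i,s}^2)} \le M$, where the latter inequality follows again from (26).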
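We turn to give the completing steps of the proof of Theorem 4. Write

$$\sqrt{n}T(\hat{\beta}^{*} - \beta) = \left(\frac{1}{n}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T x_{i,t}^2\right)^{-1}\left[\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T(x_{i,t}u_{i,t} - \Delta_{\nu u}) - \sqrt{n}(\hat{\Delta}_{\nu u} - \Delta_{\nu u})\right],$$

where $\sqrt{n}(\hat{\Delta}_{\nu u} - \Delta_{\nu u}) = o_p(1)$ as $(n, T \to \infty)$ with $n/T \to 0$ (recall Remark 1). It suffices to show that

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T(x_{i,t}u_{i,t} - \Delta_{\nu u}) \Rightarrow N(0, \omega_{uu}\omega_{\nu\nu}\Omega_{xx}), \quad \text{as } (T, n \to \infty) \text{ with } n/T \to 0, \tag{35}$$

and

$$\frac{1}{nT^2}\sum_{i=1}^n\sum_{t=1}^T x_{i,t}^2 \xrightarrow{p} \omega_{\nu\nu}\Omega_{xx}, \quad \text{as } (T, n \to \infty). \tag{36}$$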
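To prove (36) use (30) to write

$$\begin{aligned}
\frac{1}{nT^2}\sum_{i=1}^n\sum_{t=1}^T x_{i,t}^2 ={}& C_{w\eta}^2\frac{1}{n}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T f(\eta)_{i,t}^2 + C_{ww}^2\frac{1}{n}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T f(w)_{i,t}^2 \\
&+ 2C_{w\eta}C_{ww}\frac{1}{n}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T f(\eta)_{i,t}f(w)_{i,t} + 2C_{w\eta}\frac{1}{n}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T f(\eta)_{i,t}R(x)_{i,t} \\
&+ 2C_{ww}\frac{1}{n}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T f(w)_{i,t}R(x)_{i,t} + \frac{1}{n}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T R(x)_{i,t}^2 \\
={}& C_{w\eta}^2 I_{b1} + C_{ww}^2 I_{b2} + 2C_{w\eta}C_{ww}II_{b1} + 2C_{w\eta}II_{b2} + 2C_{ww}II_{b3} + II_{b4}, \text{ say.}
\end{aligned}$$

We now show that $C_{w\eta}^2 I_{b1} + C_{ww}^2 I_{b2} \xrightarrow{p} \omega_{\nu\nu}\Omega_{xx}$ and $II_{b1}, II_{b2}, II_{b3}, II_{b4} \xrightarrow{p} 0$ as $(T, n \to \infty)$, so that (36) follows. Write

$$I_{b1} = \frac{1}{n}\sum_{i=1}^n Y_{i,T}, \tag{37}$$

where $Y_{i,T} = \frac{1}{T^2}\sum_{t=1}^T f(\eta)_{i,t}^2$. For an application of Theorem 2 observe that the $Y_{i,T}$ are independent across $i$ for all $T$ and, as $T \to \infty$, $Y_{i,T} \Rightarrow Y_i = \int_0^1 J_{c_i}(r)^2\,dr$. We know $E(Y_i) = \int_0^1\int_0^r e^{(r-s)2c_i}\,ds\,dr$, and by assumption

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\int_0^1\int_0^r e^{(r-s)2c_i}\,ds\,dr = \Omega_{xx}$$

exists. Therefore, if conditions (i) and (ii) of Theorem 2 hold,

$$\frac{1}{n}\sum_{i=1}^n Y_{i,T} \xrightarrow{p} \Omega_{xx} \quad \text{as } (T, n \to \infty).$$

For verifying condition (i) let $p = 1 + \delta$ and use the definition of $Y_{i,T}$ in (37) to obtain

$$\left(E|Y_{i,T}|^p\right)^{1/p} = \frac{1}{T^2}\left(E\left|\sum_{t=1}^T f(\eta)_{i,t}^2\right|^p\right)^{1/p} \le \frac{1}{T^2}\sum_{t=1}^T\left(E\left|\sum_{s=1}^t e^{((t-s)/T)c_i}\eta_{i,s}\right|^{2p}\right)^{1/p}, \tag{38}$$

where the inequality follows from Minkowski's inequality and the definition of $f(\eta)_{i,t}$ in (31). Now, the $e^{((t-s)/T)c_i}\eta_{i,s}$ $(1 \le s \le t \le T)$ are independent random variables with zero means and $E|e^{((t-s)/T)c_i}\eta_{i,s}|^{2p} \le (e^{\sup_i|c_i|})^{2+2\delta}E|\eta_{i,s}|^{2+2\delta} \le M$ for some $M < \infty$ and some $\delta > 0$. Therefore, we may apply Theorem 3.7.8 of Stout (1974, p. 213) to obtain

$$E\left|\sum_{s=1}^t e^{((t-s)/T)c_i}\eta_{i,s}\right|^{2p} \le Mt^p, \tag{39}$$

where $M$ is finite and independent of $i$. By inserting (39) into (38) and raising to the power $p = 1 + \delta$, it is easy to see that $E|Y_{i,T}|^{1+\delta} \le M$, so that condition (i) of Theorem 2 follows. For condition (ii) of Theorem 2 it suffices to note that the supremum over $i$ of the absolute difference between $E(Y_{i,T}) = \frac{1}{T^2}\sum_{t=1}^T\sum_{q=1}^t e^{((t-q)/T)2c_i}$ and $E(Y_i) = \int_0^1\int_0^r e^{(r-s)2c_i}\,ds\,dr$ tends to zero as $T \to \infty$ (this follows since $\sup_i|c_i| \le \bar{c} < \infty$; for details see Kauppi (1999, pp. 135–136)).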
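Condition (ii) rests on approximating the deterministic double sum $E(Y_{i,T})$ by the integral $E(Y_i) = (e^{2c_i} - 2c_i - 1)/4c_i^2$, the same closed form that appears in (21). The sketch below (hypothetical helper names; unit-variance innovations assumed, as in the expression for $E(Y_{i,T})$) evaluates both for a few values of $c_i$ and shows the gap shrinking as $T$ grows.

```python
import math


def e_int_j2(c: float) -> float:
    """E(Y_i) = int_0^1 int_0^r e^{2c(r-s)} ds dr = (e^{2c} - 2c - 1) / (4c^2)."""
    if c == 0.0:
        return 0.5
    return (math.expm1(2.0 * c) - 2.0 * c) / (4.0 * c * c)


def e_y_t(c: float, T: int) -> float:
    """E(Y_iT) = T^{-2} sum_{t=1}^T sum_{q=1}^t e^{2c(t-q)/T} for unit-variance eta."""
    if c == 0.0:
        return (T + 1) / (2.0 * T)
    a = math.exp(2.0 * c / T)
    # The inner sum over k = t - q = 0, ..., t-1 is geometric: (1 - a^t) / (1 - a)
    return sum((1.0 - a ** t) / (1.0 - a) for t in range(1, T + 1)) / T ** 2


for c in (0.0, -5.0, -10.0):
    gaps = [abs(e_y_t(c, T) - e_int_j2(c)) for T in (100, 1000, 10000)]
    print(c, [f"{g:.2e}" for g in gaps])
```

In this sketch the gap falls roughly in proportion to $1/T$ for each fixed $c$, consistent with the uniform convergence claimed when $\sup_i|c_i|$ is bounded.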
Obviously the above analysis remains the same if we replace $\eta_{i,t}$ in the definition of $Y_{i,T}$ in (37) with $w_{i,t}$, implying that $I_{b2}$ has the same limit as $I_{b1}$. Noticing from (27) that $C_{w\eta}^2 + C_{ww}^2 = \omega_{\nu\nu}$, we therefore see that $C_{w\eta}^2 I_{b1} + C_{ww}^2 I_{b2}$ converges in probability to $\omega_{\nu\nu}\Omega_{xx}$, as desired.

We turn to prove that $II_{b1}, II_{b2}, II_{b3}, II_{b4} \xrightarrow{p} 0$ as $(T, n \to \infty)$ by showing that $E(II_{b1})^2, E|II_{b2}|, E|II_{b3}|, E|II_{b4}| \to 0$ as $(T, n \to \infty)$. First, by the inequality $E|\sum_{i=1}^m X_i|^2 \le m\sum_{i=1}^m E|X_i|^2$ (e.g. Davidson, 1994, p. 140) and condition (b) of Assumption 1 we have

$$E(II_{b1})^2 \le \frac{1}{n^2}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T E\left(\frac{f(\eta)_{i,t}}{\sqrt{T}}\right)^2E\left(\frac{f(w)_{i,t}}{\sqrt{T}}\right)^2 = O\left(\frac{1}{n}\right),$$

where the latter equality follows from (33). Second, the use of the triangle and Cauchy–Schwarz inequalities shows that

$$E\left|\frac{1}{n}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T f(a)_{i,t}R(x)_{i,t}\right| \le \frac{1}{\sqrt{T}}\,\frac{1}{n}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\sqrt{E\left(\frac{f(a)_{i,t}}{\sqrt{T}}\right)^2}\sqrt{E|R(x)_{i,t}|^2} = O\left(\frac{1}{\sqrt{T}}\right),$$

where the equality follows from (33) and (34). Hence, $E|II_{b2}|, E|II_{b3}| \to 0$ as $(T, n \to \infty)$. It is also straightforward to do similar calculations with $II_{b4}$ that show $E|II_{b4}| \to 0$ as $(T, n \to \infty)$. This completes the proof of (36). We turn to prove the result in (35). First, use (28) through (30) to write
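$$\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T(x_{i,t}u_{i,t} - \Delta_{\nu u}) ={}& \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left[(C_{w\eta}f(\eta)_{i,t} + C_{ww}f(w)_{i,t})(C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t}) - \omega_{u\nu}\right] \\
&+ \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left[x_{i,t}(\tilde{u}_{i,t-1} - \tilde{u}_{i,t}) + R(x)_{i,t}(C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t}) + \omega_{u\nu} - \Delta_{\nu u}\right] \\
={}& I_a + II_a, \text{ say.}
\end{aligned}\tag{40}$$

Note that $f(a)_{i,1} = a_{i,1}$ and $f(a)_{i,t} = e^{c_i/T}f(a)_{i,t-1} + a_{i,t}$ $(a_{i,t} = \eta_{i,t}, w_{i,t})$, $t \ge 2$, so that we may write

$$\begin{aligned}
I_a ={}& \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=2}^T(C_{w\eta}f(\eta)_{i,t-1} + C_{ww}f(w)_{i,t-1})(C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t}) \\
&+ \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{e^{c_i/T} - 1}{T}\sum_{t=2}^T(C_{w\eta}f(\eta)_{i,t-1} + C_{ww}f(w)_{i,t-1})(C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t}) \\
&+ \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left[(C_{w\eta}\eta_{i,t} + C_{ww}w_{i,t})(C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t}) - \omega_{u\nu}\right] \\
={}& I_{a1} + I_{a2} + I_{a3}, \text{ say.}
\end{aligned}$$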
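To consider the asymptotic properties of $I_{a1}$, write $I_{a1} = \frac{1}{\sqrt{n}}\sum_{i=1}^n Y_{i,T}$, where

$$Y_{i,T} = \frac{1}{T}\sum_{t=2}^T\left[C_{w\eta}C_{\eta\eta}f(\eta)_{i,t-1}\eta_{i,t} + C_{ww}C_{\eta w}f(w)_{i,t-1}w_{i,t} + C_{w\eta}C_{\eta w}f(\eta)_{i,t-1}w_{i,t} + C_{ww}C_{\eta\eta}f(w)_{i,t-1}\eta_{i,t}\right]. \tag{41}$$

Since the summands in $Y_{i,T}$ are uncorrelated over $t$ and the four terms in the square brackets in (41) are mutually uncorrelated for all $t$, it follows that

$$\begin{aligned}
E(Y_{i,T}^2) ={}& \frac{1}{T^2}\sum_{t=2}^T\Big[C_{w\eta}^2C_{\eta\eta}^2E(f(\eta)_{i,t-1}\eta_{i,t})^2 + C_{ww}^2C_{\eta w}^2E(f(w)_{i,t-1}w_{i,t})^2 \\
&+ C_{w\eta}^2C_{\eta w}^2E(f(\eta)_{i,t-1}w_{i,t})^2 + C_{ww}^2C_{\eta\eta}^2E(f(w)_{i,t-1}\eta_{i,t})^2\Big] \\
={}& \omega_{uu}\omega_{\nu\nu}\frac{1}{T^2}\sum_{t=2}^T\sum_{s=1}^{t-1}e^{((t-1-s)/T)2c_i},
\end{aligned}\tag{42}$$

where the last equality uses (27) and the fact that $E(f(a)_{i,t-1}b_{i,t})^2 = \sum_{s=1}^{t-1}e^{((t-1-s)/T)2c_i}$ $(a, b = \eta, w)$.

Now, we apply Theorem 3. First, note that the $Y_{i,T}$ in (41) are independent across $i$ for all $T$, with mean zero and variance $V_{i,T} = E(Y_{i,T}^2)$ given in (42). Let $V_i = \omega_{uu}\omega_{\nu\nu}\int_0^1\int_0^r e^{(r-s)2c_i}\,ds\,dr$ and write

$$\frac{1}{n}\sum_{i=1}^n V_{i,T} = \frac{1}{n}\sum_{i=1}^n V_i + \frac{1}{n}\sum_{i=1}^n(V_{i,T} - V_i). \tag{43}$$

Using the fact that $\sup_i|c_i| \le \bar{c} < \infty$, it is straightforward to show that the second term on the right hand side of (43) tends to zero as $n, T \to \infty$ (see Kauppi (1999, pp. 135–136)). On the other hand, the first term in (43) has the positive and finite limit $\omega_{uu}\omega_{\nu\nu}\Omega_{xx}$. Thus, condition (i) of Theorem 3 holds with $V = \omega_{uu}\omega_{\nu\nu}\Omega_{xx}$. For establishing condition (ii) of Theorem 3, recall the definition of $Y_{i,T}$ from (41), let $p = 2 + \delta$ and apply the inequality $E|\sum_{i=1}^m X_i|^p \le m^{p-1}\sum_{i=1}^m E|X_i|^p$ (e.g. Davidson (1994, p. 140)) to obtain

$$\begin{aligned}
E|Y_{i,T}|^p \le{}& M_{\eta\eta}E\left|\frac{1}{T}\sum_{t=2}^T f(\eta)_{i,t-1}\eta_{i,t}\right|^p + M_{ww}E\left|\frac{1}{T}\sum_{t=2}^T f(w)_{i,t-1}w_{i,t}\right|^p \\
&+ M_{\eta w}E\left|\frac{1}{T}\sum_{t=2}^T f(\eta)_{i,t-1}w_{i,t}\right|^p + M_{w\eta}E\left|\frac{1}{T}\sum_{t=2}^T f(w)_{i,t-1}\eta_{i,t}\right|^p,
\end{aligned}\tag{44}$$

where $M_{ab} = 4^{p-1}|C_{wa}C_{\eta b}|^p \le M < \infty$ $(a, b = \eta, w)$. Furthermore, by the fact that the $\eta_{i,t}$ are iid we have

$$E\left|\frac{f(\eta)_{i,t-1}\eta_{i,t}}{\sqrt{T}}\right|^p = E\left|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\eta_{i,s}\eta_{i,t}\right|^p = E|\eta_{i,t}|^p\,E\left|\frac{1}{\sqrt{T}}\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\eta_{i,s}\right|^p \le M\left(\frac{t-1}{T}\right)^{p/2} \le M < \infty, \tag{45}$$

because $|e^{((t-1-s)/T)c_i}| \le e^{\sup_i|c_i|} \le M < \infty$, $E|\eta_{i,t}|^{2+\delta} \le M < \infty$, and $E|\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\eta_{i,s}|^{2+\delta} \le M(t-1)^{(2+\delta)/2}$ for some $M < \infty$ and some $\delta > 0$, where the result with regard to $E|\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\eta_{i,s}|^{2+\delta}$ follows from Theorem 3.7.8 of Stout (1974, p. 213) (note that an iid sequence is also a martingale difference sequence). Now, given (45) and the fact that the $f(\eta)_{i,t-1}\eta_{i,t}$ $(2 \le t \le T)$ are martingale difference sequences for all $i$, we may apply Theorem 3.7.8 of Stout (1974, p. 213) one more time, giving

$$E\left|\frac{1}{T}\sum_{t=2}^T f(\eta)_{i,t-1}\eta_{i,t}\right|^p \le M\left(\frac{T-1}{T}\right)^{p/2} \le M < \infty \quad \text{for all } i.$$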
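The same arguments show that the other three expectations in (44) are similarly bounded, and therefore $\sup_T E|Y_{i,T}|^p = \sup_T E|Y_{i,T}|^{2+\delta} \le M < \infty$ for some $\delta > 0$ and all $i$. Hence, the conditions of Theorem 3 hold and we have shown that $I_{a1}$ converges weakly to the distribution given in (35) as $(T, n \to \infty)$. Furthermore, since $\sup_i|e^{c_i/T} - 1| = O(T^{-1})$, it follows immediately that $I_{a2} \xrightarrow{p} 0$ as $(T, n \to \infty)$. For $I_{a3}$, recall from (27) that $\omega_{u\nu} = C_{\eta\eta}C_{w\eta} + C_{\eta w}C_{ww}$, so that

$$\begin{aligned}
I_{a3} ={}& \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left[(C_{w\eta}\eta_{i,t} + C_{ww}w_{i,t})(C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t}) - (C_{\eta\eta}C_{w\eta} + C_{\eta w}C_{ww})\right] \\
={}& \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left[C_{\eta\eta}C_{w\eta}(\eta_{i,t}^2 - 1) + C_{ww}C_{\eta w}(w_{i,t}^2 - 1) + (C_{w\eta}C_{\eta w} + C_{ww}C_{\eta\eta})\eta_{i,t}w_{i,t}\right] \xrightarrow{p} 0
\end{aligned}$$

as $(T, n \to \infty)$, where the probability limit follows because the summands in the square brackets are iid with zero mean and finite second order moments across both $i$ and $t$.

The remaining step in the proof of Theorem 4 is to show that $II_a$ in (40) is asymptotically negligible. First, in the same way as in the proof of Lemma 16 of Phillips & Moon (1999, p. 1105), we may decompose the one-sided long-run covariance matrix as

$$\Delta = \sum_{k=0}^\infty\sum_{s=0}^\infty C_sC_{s+k}' = CC' + \sum_{k=0}^\infty C_{k+1}\tilde{C}_k' - \tilde{C}_0C'.$$

Using this in conjunction with the partitions $C_j = [C_{ab,j}]$ and $\tilde{C}_j = [\tilde{C}_{ab,j}]$, $(a, b = \eta, w)$, we may write

$$\begin{aligned}
II_a ={}& \frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{T}\sum_{t=1}^T x_{i,t}(\tilde{u}_{i,t-1} - \tilde{u}_{i,t}) - \sum_{j=0}^\infty(C_{w\eta,j+1}\tilde{C}_{\eta\eta,j} + C_{ww,j+1}\tilde{C}_{\eta w,j})\right\} \\
&+ \frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{T}\sum_{t=1}^T R(x)_{i,t}(C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t}) + (\tilde{C}_{w\eta,0}C_{\eta\eta} + \tilde{C}_{ww,0}C_{\eta w})\right\} \\
={}& II_{a1} + II_{a2}, \text{ say.}
\end{aligned}$$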
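The decomposition of $\Delta$ used above can be verified numerically for a finite moving average $C(L) = \sum_{k=0}^{q} C_k L^k$. The sketch assumes $\Delta = \sum_{k\ge0}\sum_{s\ge0} C_sC_{s+k}'$ (that is, $\Delta = \sum_{k\ge0}E(\xi_{i,0}\xi_{i,k}')$ with $E(\varepsilon_{i,t}\varepsilon_{i,t}') = I_2$), draws random 2×2 coefficient matrices, and compares the direct double sum with $CC' + \sum_k C_{k+1}\tilde{C}_k' - \tilde{C}_0C'$. The helper names are hypothetical; plain nested lists keep it to the standard library.

```python
import random


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def mat_add(A, B, sign=1.0):
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def zeros(m):
    return [[0.0] * m for _ in range(m)]


def delta_direct(Cs):
    """Delta = sum_{k>=0} sum_{s>=0} C_s C_{s+k}' for C_0..C_q (C_j = 0 for j > q)."""
    q, D = len(Cs) - 1, zeros(len(Cs[0]))
    for k in range(q + 1):
        for s in range(q + 1 - k):
            D = mat_add(D, mat_mul(Cs[s], transpose(Cs[s + k])))
    return D


def delta_bn(Cs):
    """Delta = C C' + sum_k C_{k+1} Ctilde_k' - Ctilde_0 C', Ctilde_j = sum_{k>j} C_k."""
    q, m = len(Cs) - 1, len(Cs[0])
    C = zeros(m)
    for Ck in Cs:
        C = mat_add(C, Ck)
    Ct = []                               # Ctilde_0, ..., Ctilde_q (Ctilde_q = 0)
    for j in range(q + 1):
        acc = zeros(m)
        for k in range(j + 1, q + 1):
            acc = mat_add(acc, Cs[k])
        Ct.append(acc)
    D = mat_mul(C, transpose(C))
    for k in range(q):                    # C_{k+1} = 0 beyond k = q - 1
        D = mat_add(D, mat_mul(Cs[k + 1], transpose(Ct[k])))
    return mat_add(D, mat_mul(Ct[0], transpose(C)), sign=-1.0)


rng = random.Random(42)
Cs = [[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)] for _ in range(4)]
diff = max(abs(a - b) for ra, rb in zip(delta_direct(Cs), delta_bn(Cs)) for a, b in zip(ra, rb))
print(diff)
```

The identity follows from $\sum_{j\ge s}C_j = C_s + \tilde{C}_s = \tilde{C}_{s-1}$ and $C_0 = C - \tilde{C}_0$; in the scalar case $C_0 = 1$, $C_1 = 2$, $C_2 = 3$ both sides equal 25.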
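For $II_{a1}$ note that we can write

$$\begin{aligned}
\frac{1}{T}\sum_{t=1}^T x_{i,t}\tilde{u}_{i,t-1} &= \frac{1}{T}x_{i,1}\tilde{u}_{i,0} + \frac{1}{T}\sum_{t=2}^T x_{i,t}\tilde{u}_{i,t-1} = \frac{1}{T}x_{i,1}\tilde{u}_{i,0} + \frac{1}{T}\sum_{t=1}^{T-1}x_{i,t+1}\tilde{u}_{i,t} \\
&= \frac{1}{T}x_{i,1}\tilde{u}_{i,0} + e^{c_i/T}\frac{1}{T}\sum_{t=1}^{T-1}x_{i,t}\tilde{u}_{i,t} + \frac{1}{T}\sum_{t=1}^{T-1}\nu_{i,t+1}\tilde{u}_{i,t},
\end{aligned}$$

and, thus,

$$\frac{1}{T}\sum_{t=1}^T x_{i,t}(\tilde{u}_{i,t-1} - \tilde{u}_{i,t}) = \frac{1}{T}\sum_{t=1}^{T-1}\nu_{i,t+1}\tilde{u}_{i,t} + \frac{1}{T}x_{i,1}\tilde{u}_{i,0} - \frac{1}{T}x_{i,T}\tilde{u}_{i,T} + (e^{c_i/T} - 1)\frac{1}{T}\sum_{t=1}^{T-1}x_{i,t}\tilde{u}_{i,t}.$$

In view of this expression we get

$$\begin{aligned}
II_{a1} ={}& \frac{1}{\sqrt{n}}\sum_{i=1}^n\left\{\frac{1}{T}\sum_{t=1}^{T-1}\nu_{i,t+1}\tilde{u}_{i,t} - \sum_{j=0}^\infty(C_{w\eta,j+1}\tilde{C}_{\eta\eta,j} + C_{ww,j+1}\tilde{C}_{\eta w,j})\right\} + \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}x_{i,1}\tilde{u}_{i,0} \\
&- \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}x_{i,T}\tilde{u}_{i,T} + \frac{1}{\sqrt{n}}\sum_{i=1}^n(e^{c_i/T} - 1)\frac{1}{T}\sum_{t=1}^{T-1}x_{i,t}\tilde{u}_{i,t} \\
={}& II_{a1a} + II_{a1b} + II_{a1c} + II_{a1d} + O\left(\frac{\sqrt{n}}{T}\right), \text{ say.}
\end{aligned}$$

As a counterpart to the result $E\|\frac{1}{\sqrt{n}}\sum_{i=1}^n R_{1,i,T}\| = O(1/\sqrt{T})$ derived in the proof of Lemma 16 of Phillips & Moon (1999, p. 1107), we have

$$E\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^n\left(\frac{1}{T}\sum_{t=1}^{T-1}\tilde{\xi}_{i,t}\xi_{i,t+1}' - \sum_{k=0}^\infty\tilde{C}_kC_{k+1}'\right)\right\| = O\left(\frac{1}{\sqrt{T}}\right). \tag{46}$$

Since $II_{a1a}$ is the (1, 2) element of the matrix inside the norm on the left hand side of (46), we have $II_{a1a} \xrightarrow{p} 0$ as $(T, n \to \infty)$. Next, by the triangle and Cauchy–Schwarz inequalities,

$$E\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}x_{i,T}\tilde{u}_{i,T}\right| \le \frac{\sqrt{n}}{\sqrt{T}}\,\frac{1}{n}\sum_{i=1}^n\sqrt{E\left(\frac{x_{i,T}}{\sqrt{T}}\right)^2}\sqrt{E|\tilde{u}_{i,T}|^2} \le \sqrt{\frac{n}{T}}\sup_{1\le i\le n}\sqrt{E\left(\frac{x_{i,T}}{\sqrt{T}}\right)^2}\sqrt{E|\tilde{u}_{i,T}|^2} = O\left(\sqrt{\frac{n}{T}}\right),$$

where the equality is easily verified by using (26), (30), (33) and (34). Therefore, $II_{a1c} \xrightarrow{p} 0$ as $(T, n \to \infty)$ with $n/T \to 0$. Obviously, also, $II_{a1b} \xrightarrow{p} 0$ as $(T, n \to \infty)$ with $n/T \to 0$. Finally, for $II_{a1d}$, let $r_T = T|e^{\sup_i|c_i|/T} - 1|$ and note that

$$E|II_{a1d}| \le r_TE\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^{T-1}x_{i,t}\tilde{u}_{i,t}\right| \le r_T\frac{\sqrt{n}}{\sqrt{T}}\,\frac{1}{n}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^{T-1}\sqrt{E\left(\frac{x_{i,t}}{\sqrt{T}}\right)^2}\sqrt{E|\tilde{u}_{i,t}|^2} = O\left(\sqrt{\frac{n}{T}}\right),$$

by similar arguments to those used for $II_{a1c}$ and the fact that $r_T = O(1)$.

We turn to show that $II_{a2} \xrightarrow{p} 0$ as $(T, n \to \infty)$ with $n/T \to 0$. Using (32) write

$$\begin{aligned}
II_{a2} ={}& \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left[-\tilde{\nu}_{i,t}(C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t}) + (\tilde{C}_{w\eta,0}C_{\eta\eta} + \tilde{C}_{ww,0}C_{\eta w})\right] \\
&+ \frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\Big(e^{((t-1)/T)c_i}\tilde{\nu}_{i,0} + (1 - e^{c_i/T})\sum_{s=1}^{t-1}e^{((t-1-s)/T)c_i}\tilde{\nu}_{i,s} + e^{(t/T)c_i}x_{i,0}\Big)(C_{\eta\eta}\eta_{i,t} + C_{\eta w}w_{i,t}) \\
={}& II_{a2a} + II_{a2b}, \text{ say.}
\end{aligned}$$

Here $II_{a2a} \xrightarrow{p} 0$ as $(T, n \to \infty)$ with $n/T \to 0$, because $II_{a2a}$ is identical with the term $\frac{1}{\sqrt{n}}\sum_{i=1}^n R_{3,i,T}$ in the proof of Lemma 16 of Phillips & Moon (1999a, p. 1105). Finally, the result $II_{a2b} \xrightarrow{p} 0$ as $(T, n \to \infty)$ with $n/T \to 0$ follows from arguments similar to those used for $II_{a1}$. Details are straightforward and thus are omitted. This completes the proof of the theorem.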
APPENDIX D: PROOF OF THEOREM 5

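The proof follows from the same arguments as the proof of Theorem 4. To see the main lines, write

$$\sqrt{n}T(\hat{\beta}^{+} - \beta) - \sqrt{n}B_{n,T} = \frac{\dfrac{1}{\sqrt{n}}\displaystyle\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left[x_{i,t}\left(u_{i,t} - \hat{\omega}_{u\nu}\hat{\omega}_{\nu\nu}^{-1}\Delta x_{i,t}\right) - \hat{\Delta}_{\nu u}^{+} + \omega_{u\nu}\omega_{\nu\nu}^{-1}(e^{c_i/T} - 1)x_{i,t}x_{i,t-1}\right]}{\dfrac{1}{n}\displaystyle\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T x_{i,t}^2},$$

where the denominator has the limit given in (36). Next let $\Delta_{\nu u}^{+} = \Delta_{\nu u} - \Delta_{\nu\nu}\omega_{\nu\nu}^{-1}\omega_{u\nu}$ and note that the numerator in the above estimation error can be written as

$$\begin{aligned}
&\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left[x_{i,t}\left(u_{i,t} - \omega_{u\nu}\omega_{\nu\nu}^{-1}\nu_{i,t}\right) - \Delta_{\nu u}^{+}\right] - \sqrt{n}\left(\hat{\omega}_{u\nu}\hat{\omega}_{\nu\nu}^{-1} - \omega_{u\nu}\omega_{\nu\nu}^{-1}\right)\frac{1}{n}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left(x_{i,t}\Delta x_{i,t} - \Delta_{\nu\nu}\right) \\
&\qquad - \sqrt{n}\left(\hat{\Delta}_{\nu u} - \Delta_{\nu u}\right) + \sqrt{n}\left(\hat{\Delta}_{\nu\nu} - \Delta_{\nu\nu}\right)\hat{\omega}_{\nu\nu}^{-1}\hat{\omega}_{u\nu},
\end{aligned}$$

where the $\sqrt{n}$-normalized estimation errors of the kernel estimators are $o_p(1)$ as $(n, T \to \infty)$ with $n/T \to 0$ (recall Remark 1). Furthermore, using the fact that $\Delta x_{i,t} = (e^{c_i/T} - 1)x_{i,t-1} + \nu_{i,t}$ we can write

$$\frac{1}{n}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T(x_{i,t}\Delta x_{i,t} - \Delta_{\nu\nu}) = \frac{1}{n}\sum_{i=1}^n\frac{e^{c_i/T} - 1}{T}\sum_{t=1}^T x_{i,t}x_{i,t-1} + \frac{1}{n}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T(x_{i,t}\nu_{i,t} - \Delta_{\nu\nu}) = O_p(1),$$

where the last equality holds as $(n, T \to \infty)$ and can be proved by applying the arguments given in the proof of Theorem 4. Thus, for the result in part (a) of Theorem 5, it suffices that

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{1}{T}\sum_{t=1}^T\left[x_{i,t}\left(u_{i,t} - \omega_{u\nu}\omega_{\nu\nu}^{-1}\nu_{i,t}\right) - \Delta_{\nu u}^{+}\right] \Rightarrow N(0, \omega_{u\cdot\nu}\omega_{\nu\nu}\Omega_{xx}),$$

with $\omega_{u\cdot\nu} = \omega_{uu} - \omega_{u\nu}^2\omega_{\nu\nu}^{-1}$, as $(T, n \to \infty)$ with $n/T \to 0$. The details of the proof of this latter result are similar to those of the proof of (35) and are thus omitted. Finally, note that the limiting result in part (b) of the theorem follows from lines used in the proof of (36) and the fact that the arithmetic average of the quantities $c_iE(\int_0^1 J_{c_i}(r)^2\,dr)$ converges to a finite number.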
STATIONARITY TESTS IN HETEROGENEOUS PANELS

Yong Yin and Shaowen Wu
ABSTRACT
Several stationarity tests in heterogeneous panel data models are proposed in this chapter. Allowing for the maximum degree of heterogeneity in the panel, we use two different ways of pooling information from independent tests, the group mean and the Fisher tests, to develop the panel stationarity tests. We consider the case of serially correlated errors in the level and trend stationary models. The small sample performance of the tests is investigated via Monte Carlo simulations, and the simulation experiments reveal good small sample performance. In the presence of serial correlation, either the group mean or the Fisher tests based on individual KPSS tests with l2 and LMC tests with p = 1 are recommended for use in empirical work due to their good small sample performance.
I. INTRODUCTION
Dynamic panel data analysis has attracted increasing attention, partly owing to the recent availability of large panel data sets. These data sets usually cover different countries, industries, or regions over relatively long time spans. They offer new opportunities as well as challenges for the analysis of dynamic panel data models, especially heterogeneous panel data models, as researchers would usually anticipate great differences among the cross-section units in the data.
YONG YIN & SHAOWEN WU
Along with the development of univariate non-stationary time series analysis, researchers have also shown growing interest in analyzing non-stationary panel data. Various methods have been proposed to test for unit roots and cointegration, along with methods of estimating cointegrating systems, in the context of panel data; see Baltagi & Kao (2000) in this volume for an up-to-date survey. The biggest advantage of the panel data approach is the increased effective sample size, which can increase the power of statistical tests and the efficiency of estimation methods compared with their univariate counterparts. However, extending univariate methods for handling non-stationary data to panel data raises the question of heterogeneity as well. The early development of dynamic panel data analysis mainly dealt with homogeneous models. But the availability of panel data sets such as the Penn World Table raises the issue of the plausibility of the homogeneity assumption: the parameters as well as the dynamic structures of different cross-section units might differ. Hence, it is necessary to develop methods for investigating the non-stationary properties of heterogeneous panel data models. A heterogeneous panel data model refers to the situation in which both the error structures and the slopes can differ across units. This is quite different from the usual fixed-effects (random-effects) models. Several papers in the literature deal with tests for unit roots and cointegration in heterogeneous panels; see, for example, Im, Pesaran & Shin (1997) and Maddala & Wu (1999) for panel unit root tests, and Pedroni (1995, 1997), Kao (1999), McCoskey & Kao (1997, 1998), and Wu & Yin (1999) for panel cointegration tests. Baltagi & Kao (2000) also give a complete survey of this subject. As in the univariate case, it would be interesting to test for unit roots by using stationarity as the null.
Not only does this provide a complement to the conventional unit root tests that use nonstationarity as the null, but it also incorporates the moving average structure that seems to be a common empirical feature, especially of macroeconomic data.[1] Thus, it is quite natural to develop stationarity tests for heterogeneous panels. However, panel stationarity tests have not yet received serious attention in the literature. McCoskey & Kao (1998) develop stationarity tests for residuals, to be used as residual-based tests for the null of cointegration in panel data models. Hadri (1998) addresses panel stationarity testing directly; however, he only considers models with i.i.d. errors and only homogeneous deterministic trends under the null hypothesis. In this chapter, we develop some stationarity tests in heterogeneous panel data models. The models we consider allow both heterogeneous
Panel Stationarity Tests
deterministic trends under the null and different error structures. The tests should be able to handle serially correlated errors in the models. In the univariate case, starting from a Lagrange Multiplier (LM) test for the case of i.i.d. errors, there are two different extensions that handle serial correlation. Kwiatkowski, Phillips, Schmidt & Shin (1992) (KPSS hereafter) propose nonparametric estimation to handle it, while Leybourne & McCabe (1994) (LMC hereafter) propose augmenting the model with autoregressive components. We shall propose panel stationarity tests based on both. One type of test we propose is based on the group mean of the individual test statistics, which can be shown to have a normal distribution asymptotically after some adjustments are made to the group mean. The second test is in line with Maddala & Wu (1999); its idea can be traced back to Fisher (1932) and consists of pooling the p-values from the individual tests. We also design some Monte Carlo experiments to investigate the small sample performance of the proposed tests. The rest of the chapter is organized as follows. In Section II we set up the models for heterogeneous panels and discuss panel stationarity tests. Monte Carlo simulation designs and results on the small sample performance of the proposed tests can be found in Section III, and Section IV concludes.
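The two pooling principles described above can be sketched as follows. Fisher's statistic $P = -2\sum_{i=1}^N \ln p_i$ is $\chi^2(2N)$ under the joint null when the $N$ individual tests are independent, and for even degrees of freedom its upper tail has the closed form $e^{-P/2}\sum_{k=0}^{N-1}(P/2)^k/k!$. For the group mean, the sketch assumes, purely for illustration, standardization by the asymptotic moments of the level-stationarity limit $\int_0^1 V(r)^2dr$ (mean $1/6$, variance $1/45$); the adjustments actually proposed in this chapter are developed later and may differ. Function names are hypothetical.

```python
import math


def fisher_test(p_values):
    """Fisher (1932) combination: P = -2 * sum(ln p_i), chi^2(2N) under H0.

    Returns (P, p-value); the chi-square upper tail with 2N degrees of freedom
    is exp(-P/2) * sum_{k=0}^{N-1} (P/2)^k / k!.
    """
    N = len(p_values)
    P = -2.0 * sum(math.log(p) for p in p_values)
    half, term, tail = P / 2.0, 1.0, 1.0
    for k in range(1, N):
        term *= half / k          # (P/2)^k / k!, built up recursively
        tail += term
    return P, math.exp(-half) * tail


def group_mean_z(stats, mean=1.0 / 6.0, var=1.0 / 45.0):
    """Standardized group mean sqrt(N) * (average - mean) / sqrt(var).

    The default moments are those of int_0^1 V(r)^2 dr (level-stationary case);
    they are an illustrative assumption, not the chapter's adjustment.
    """
    N = len(stats)
    avg = sum(stats) / N
    return math.sqrt(N) * (avg - mean) / math.sqrt(var)


print(fisher_test([0.20, 0.45, 0.03, 0.60]))
print(group_mean_z([0.12, 0.25, 0.31, 0.09]))
```

Both statistics reject the joint null of stationarity for large values: a small Fisher p-value, or a group-mean $z$ above the upper normal critical value.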
II. TESTS FOR STATIONARITY IN THE HETEROGENEOUS PANELS

The basic model for testing for trend stationarity in a univariate time series is as follows:

$$y_t = r_t + \beta t + \varepsilon_t, \tag{1}$$

where $r_t$ is a random walk, $r_t = r_{t-1} + \eta_t$. It is assumed that $\varepsilon_t \sim \mathrm{iid}(0, \sigma_\varepsilon^2)$, $\eta_t \sim \mathrm{iid}(0, \sigma_\eta^2)$, and that $\varepsilon_t$ and $\eta_t$ are independent. The initial value $r_0$ is treated as fixed and plays the role of an intercept. The null of stationarity is simply $\sigma_\eta^2 = 0$. Under the null, $y_t$ is trend stationary because $\varepsilon_t$ is assumed to be stationary. Define $q = \sigma_\eta^2/\sigma_\varepsilon^2$; $q$ is the so-called signal-to-noise ratio in structural time series models, and the null can equivalently be specified as $H_0: q = 0$. If $\beta = 0$, the model reduces to

$$y_t = r_t + \varepsilon_t, \tag{2}$$

and under the null $y_t$ is level stationary instead of trend stationary.
The statistic considered in the literature is both the one-sided LM test statistic and the locally best invariant (LBI) test statistic under the stronger assumption that the $\varepsilon_t$'s are normal.² Let $\hat e_t$ be the residuals from the regression of $y_t$ on an intercept and a linear time trend. Define $\hat\sigma_\varepsilon^2 = T^{-1}\sum_{t=1}^{T}\hat e_t^2$ and the partial sum process of the residuals $S_t = \sum_{i=1}^{t}\hat e_i$, $t = 1, \dots, T$. Then the LM test statistic is

$$LM = \sum_{t=1}^{T} S_t^2 \Big/ \hat\sigma_\varepsilon^2.$$
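As a minimal sketch, assuming NumPy, the normalized statistic $T^{-2}LM$ can be computed from OLS residuals as below. The function name `lm_stat` is hypothetical; with `trend=False` the residuals come from an intercept-only regression, which gives the level-stationarity version. The usage lines check by simulation that the statistic's null mean is close to the means of the limiting functionals (1/6 for the level case, 1/15 for the trend case).

```python
import numpy as np


def lm_stat(y, trend=True):
    """Normalized LM statistic T^{-2} * sum_t S_t^2 / sigma_hat^2 (i.i.d. errors).

    trend=True : residuals from regressing y on an intercept and a linear trend.
    trend=False: residuals from regressing y on an intercept only (level case).
    """
    y = np.asarray(y, dtype=float)
    T = y.shape[0]
    if trend:
        X = np.column_stack([np.ones(T), np.arange(1, T + 1)])
    else:
        X = np.ones((T, 1))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef              # OLS residuals e_hat_t
    S = np.cumsum(e)              # partial sums S_t = e_1 + ... + e_t
    sigma2 = np.mean(e ** 2)      # sigma_hat^2 = sum_t e_t^2 / T
    return float(np.sum(S ** 2) / (T ** 2 * sigma2))


# Simulated null means (white noise, T = 100) should be near 1/6 and 1/15.
rng = np.random.default_rng(0)
level = [lm_stat(rng.standard_normal(100), trend=False) for _ in range(2000)]
trend = [lm_stat(rng.standard_normal(100), trend=True) for _ in range(2000)]
print(round(np.mean(level), 3), round(np.mean(trend), 3))
```

Under a random walk alternative the statistic grows with $T$, which is what makes the test consistent.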
To test the null hypothesis of level stationarity instead of trend stationarity, $\hat e_t$ is instead defined as the residual from the regression of $y_t$ on an intercept only. It has been shown that for the trend stationary model, under the null hypothesis,

$$T^{-2}LM \xrightarrow{d} \int_0^1 V_2(r)^2\,dr,$$

where $V_2(r)$ is the second-level Brownian bridge

$$V_2(r) = W(r) + (2r - 3r^2)W(1) + (-6r + 6r^2)\int_0^1 W(s)\,ds,$$

with $W(r)$ a Wiener process. For the level stationary model, under the null, $T^{-2}LM \xrightarrow{d} \int_0^1 V(r)^2\,dr$, where $V(r) = W(r) - rW(1)$ is a standard Brownian bridge.

There are two ways to incorporate serial correlation into the basic univariate models: one due to KPSS and the other due to LMC. In KPSS, the models are still (1) and (2), modified so that $\varepsilon_t$ may be serially correlated in general form. The usual specification is that $\varepsilon_t$ satisfies the strong mixing regularity conditions of Phillips & Perron (1988). Under such conditions, the normalized numerators of the LM test statistics converge to $\sigma^2$ times the corresponding functionals of Brownian bridges, where $\sigma^2$ is the long-run variance of $\varepsilon_t$. The problem therefore reduces to estimating $\sigma^2$ consistently. KPSS use the nonparametric Newey & West (1987) consistent estimator

$$s^2(l) = T^{-1}\sum_{t=1}^{T}\hat e_t^2 + 2T^{-1}\sum_{s=1}^{l} w(s,l)\sum_{t=s+1}^{T}\hat e_t\hat e_{t-s}.$$

This estimator depends on the choice of a spectral window $w(s,l)$ and of the truncation parameter $l$. KPSS use the Bartlett window, $w(s,l) = 1 - s/(l+1)$, and recommend choosing $l = o(T^{1/2})$. The resulting test statistics are labeled $\hat\eta_\mu$ for level stationary models and $\hat\eta_\tau$ for trend stationary models, with

$$\hat\eta_\mu\ (\hat\eta_\tau) = T^{-2}\sum_{t=1}^{T} S_t^2 \Big/ s^2(l),$$

where both $S_t$ and $s^2(l)$ depend on $\hat e_t$, the residual from the regression of $y_t$ on an intercept only (level stationary models) or on an intercept and a linear trend (trend stationary models). It has also been proved that
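The long-run variance estimator $s^2(l)$ and the KPSS statistics can be sketched as follows, assuming NumPy. The Bartlett weight is the one KPSS use; a Parzen option is included because the simulations in Section III use that window, with the weight evaluated at $x = s/(l+1)$, which is our assumed convention. All function names are hypothetical.

```python
import numpy as np


def lag_window(s, l, kind="bartlett"):
    """Spectral-window weight w(s, l) for lag s and truncation parameter l."""
    x = s / (l + 1.0)
    if kind == "bartlett":
        return 1.0 - x
    if kind == "parzen":  # Parzen kernel at x = s/(l+1) (assumed convention)
        return 1.0 - 6.0 * x**2 + 6.0 * x**3 if x <= 0.5 else 2.0 * (1.0 - x) ** 3
    raise ValueError(f"unknown window: {kind}")


def long_run_variance(e, l, kind="bartlett"):
    """s^2(l) = T^-1 sum e_t^2 + 2 T^-1 sum_{s=1}^{l} w(s,l) sum_{t=s+1}^{T} e_t e_{t-s}."""
    e = np.asarray(e, dtype=float)
    T = e.shape[0]
    s2 = np.dot(e, e) / T
    for s in range(1, l + 1):
        s2 += 2.0 * lag_window(s, l, kind) * np.dot(e[s:], e[:-s]) / T
    return s2


def kpss_stat(y, l, trend=False, kind="bartlett"):
    """KPSS statistic T^-2 sum S_t^2 / s^2(l): level (eta_mu) or trend (eta_tau) version."""
    y = np.asarray(y, dtype=float)
    T = y.shape[0]
    X = np.column_stack([np.ones(T), np.arange(1, T + 1)]) if trend else np.ones((T, 1))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef              # residuals e_hat_t
    S = np.cumsum(e)              # partial sums S_t
    return float(np.sum(S**2) / (T**2 * long_run_variance(e, l, kind)))
```

With `l = 0` the correction term vanishes and the statistic coincides with the normalized LM statistic for i.i.d. errors.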
both tests are consistent. See KPSS for details of the derivations and proofs, along with some simulation results. The KPSS tests handle serial correlation in a way similar to the Phillips-Perron tests for unit roots. LMC, on the other hand, use an augmented autoregression to handle serial correlation, much as the Augmented Dickey-Fuller tests do for unit roots. Since stationary structures can be approximated arbitrarily well by autoregressive structures, LMC work with transformed versions of (1) and (2):

$$\Phi(L)y_t = r_t + \beta t + \varepsilon_t$$

for trend stationary models and

$$\Phi(L)y_t = r_t + \varepsilon_t$$

for level stationary models, where $\Phi(L)$ is a polynomial of order $p$ in the lag operator $L$. To construct the test statistics, one first estimates an ARIMA($p$, 1, 1) model to remove the serial correlation, and then computes the LM test statistic from the whitened series as if there were no serial correlation. LMC label the test statistic $\hat s_\mu$ for the level stationary models and $\hat s_\tau$ for the trend stationary models; see their paper for detailed descriptions and discussion of the tests. They also show that under the null

$$\hat s_\mu \xrightarrow{d} \int_0^1 V(r)^2\,dr \quad\text{and}\quad \hat s_\tau \xrightarrow{d} \int_0^1 V_2(r)^2\,dr.$$
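For the level case, the mean of the limiting functional can be verified directly. A short worked derivation, using only $E[W(r)^2] = r$ and $E[W(r)W(1)] = r$ for a standard Wiener process and interchanging expectation and integration:

$$E[V(r)^2] = E[W(r)^2] - 2r\,E[W(r)W(1)] + r^2E[W(1)^2] = r - 2r^2 + r^2 = r(1-r),$$

$$E\left[\int_0^1 V(r)^2\,dr\right] = \int_0^1 r(1-r)\,dr = \frac{1}{2} - \frac{1}{3} = \frac{1}{6},$$

which agrees with the mean of 1/6 that Hadri (1998) reports for the level stationary model.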
LMC argue that their tests are superior to the KPSS tests because an augmented autoregression, rather than a nonparametric correction, is used to control for serial correlation. In theory, the LMC tests are more powerful than the KPSS tests because under the alternative the LMC test statistics are $O_p(T)$ while the KPSS test statistics are $O_p(T/l)$. This superiority is also shown through Monte Carlo simulation.³

The univariate model for testing for stationarity extends readily to panel data. Let $y_{it}$, $i = 1, \dots, N$, $t = 1, \dots, T$, be observations on $N$ cross-section units over $T$ periods, which we want to test for stationarity. Consider the following models:

Level stationarity: $$y_{it} = r_{it} + \varepsilon_{it} \qquad (3)$$

Trend stationarity: $$y_{it} = r_{it} + \beta_i t + \varepsilon_{it} \qquad (4)$$

where $r_{it} = r_{i,t-1} + \eta_{it}$, with the $r_{i0}$'s fixed constants such that $r_{i0}$ is not necessarily equal to $r_{j0}$ if $i \neq j$.⁴
Assumption
(i) $E(\eta_{it}) = 0$, and $E(\eta_{it}\eta_{js}) = \sigma_{\eta,i}^2$ if $i = j$ and $t = s$, and $0$ otherwise.
(ii) For each cross-section unit $i$, $\varepsilon_{it}$ either satisfies the strong mixing conditions required for the functional central limit theorem to hold, with long-run variance $\sigma_i^2$, or it can be expressed as a $p$-th order AR model.
(iii) $E(\eta_{it}\varepsilon_{js}) = 0$ for all $i, j, t, s$.

Note that assumption (i) adds heterogeneity to the error structure of $\eta$ by allowing heteroskedasticity across units. Assumption (ii) also allows heteroskedasticity in $\varepsilon$, while assumption (iii) rules out any correlation between $\eta$ and $\varepsilon$, contemporaneous or otherwise, across units as well as within each unit.

Define $q_i = \sigma_{\eta,i}^2/\sigma_{\varepsilon,i}^2$, where $\sigma_{\varepsilon,i}^2 = \mathrm{Var}(\varepsilon_{it})$; that is, the $q_i$'s are the signal-to-noise ratios of the individual cross-section units. The null hypothesis can be expressed as $H_0: q_i = 0$ for all $i$. For level stationary models, under $H_0$, each cross-section unit is stationary around a level $r_{i0}$, which need not be the same across units. For trend stationary models, under $H_0$, each cross-section unit is stationary around a linear trend $r_{i0} + \beta_i t$, which again need not be the same across units. The different levels and linear trends reflect possible heterogeneity across units. The alternative hypothesis is $H_1: q_i > 0$ for all $i$. Here we introduce heterogeneity by allowing different signal-to-noise ratios across units: under the alternative, the signal-to-noise ratios are only required to be positive, not to be equal.

Let $\hat\eta_{\mu i}$ and $\hat\eta_{\tau i}$ be the individual KPSS test statistics for the $i$-th unit, and define $\xi_1 = \int_0^1 V(r)^2\,dr$ and $\xi_2 = \int_0^1 V_2(r)^2\,dr$. We can construct the standardized group mean tests as

$$\Psi_{\eta\mu} = \frac{\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^{N}\hat\eta_{\mu i} - E(\xi_1)\right)}{\sqrt{\mathrm{Var}(\xi_1)}} \quad \text{for level stationary models}$$

and

$$\Psi_{\eta\tau} = \frac{\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^{N}\hat\eta_{\tau i} - E(\xi_2)\right)}{\sqrt{\mathrm{Var}(\xi_2)}} \quad \text{for trend stationary models.}$$

Similarly, let $\hat s_{\mu i}$ and $\hat s_{\tau i}$ be the individual LMC test statistics for the $i$-th unit. Define the standardized group mean tests as

$$\Psi_{s\mu} = \frac{\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^{N}\hat s_{\mu i} - E(\xi_1)\right)}{\sqrt{\mathrm{Var}(\xi_1)}} \quad \text{for level stationary models}$$

and

$$\Psi_{s\tau} = \frac{\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^{N}\hat s_{\tau i} - E(\xi_2)\right)}{\sqrt{\mathrm{Var}(\xi_2)}} \quad \text{for trend stationary models.}$$
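The standardization is the same for all four group mean tests; only the individual statistics and the moments change. A minimal sketch (the function and constant names are hypothetical), plugging in the asymptotic moments of the limiting functionals reported by Hadri (1998); simulated finite-$T$ moments can be passed instead:

```python
import math


def group_mean_stat(individual_stats, mean, var):
    """Standardized group mean: sqrt(N) * (average statistic - E(xi)) / sqrt(Var(xi))."""
    stats = [float(x) for x in individual_stats]
    N = len(stats)
    avg = sum(stats) / N
    return math.sqrt(N) * (avg - mean) / math.sqrt(var)


# Asymptotic (mean, variance) of xi_1 (level) and xi_2 (trend), after Hadri (1998).
LEVEL_MOMENTS = (1 / 6, 1 / 45)
TREND_MOMENTS = (1 / 15, 11 / 6300)

# Stationarity tests reject for large values: compare with the upper normal quantile.
z = group_mean_stat([0.21, 0.15, 0.30, 0.12], *LEVEL_MOMENTS)
print(round(z, 3), z > 1.645)
```

Because the individual tests reject for large values, the panel test is one-sided and rejects in the upper tail of the standard normal.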
Using sequential limit theory, it can be shown that under the null all four test statistics are asymptotically standard normal under the assumptions spelled out above. Note that the sequential limit requires $T \to \infty$ followed by $N \to \infty$, and the asymptotic normality can be established by an application of the Lindeberg-Lévy central limit theorem.⁵ The consistency of the tests follows from the consistency of the univariate tests established in the literature. The tests remain consistent under a mixed alternative hypothesis in which only part of the panel is nonstationary while the rest is stationary, as long as $\delta = \lim_{N\to\infty} N_1/N > 0$, where $N_1$ is the number of nonstationary series under the alternative.

Hadri (1998) used the characteristic function given by Anderson & Darling (1952) to compute the means and variances of $\xi_1$ and $\xi_2$. For the level stationary model, the mean is 1/6 and the variance is 1/45, while for the trend stationary model the mean is 1/15 and the variance is 11/6300. However, as suggested in Im, Pesaran & Shin (1997), one can use the means and variances of the small sample (finite $T$) distributions obtained via simulation to improve the finite sample performance of the group mean tests.⁶

The group mean test pools independent individual test statistics to find evidence on the composite null. Another way to pool information from individual tests to test the composite null is due to Fisher (1932). The idea has been applied to develop panel unit root tests in Maddala & Wu (1999) and panel cointegration tests in Wu & Yin (1999). Both the KPSS and the LMC tests can also be used to formulate Fisher tests for stationarity. Let $P_i$ be the p-value of the individual stationarity test for the $i$-th unit (using either the KPSS or the LMC test). Define the Fisher test statistic as $\lambda = -2\sum_{i=1}^{N}\log P_i$.⁷ Since the $P_i$'s are independent and uniformly distributed under the null, each $-2\log P_i$ is $\chi^2$ with 2 degrees of freedom, so $\lambda$ has a $\chi^2$ distribution with $2N$ degrees of freedom under the null hypothesis that $q_i = 0$ for all $i$. Note that the validity of the $\chi^2$ distribution depends on the accuracy of the distributions from which the $P_i$'s are derived; it therefore does not rely on $N \to \infty$, as the group mean test does. On the other hand, the small sample distributions of the individual tests are usually unknown, so they must be obtained via simulation to improve the small sample performance of the Fisher tests.⁸
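A minimal sketch of the Fisher combination, in pure Python. For even degrees of freedom the $\chi^2$ upper tail has the closed form $P(\chi^2_{2N} > x) = e^{-x/2}\sum_{k=0}^{N-1}(x/2)^k/k!$, used here to avoid a SciPy dependency. The helper `empirical_pvalue` shows one way to obtain the $P_i$'s from simulated finite-sample null distributions; its add-one rule (which keeps $P_i > 0$ so the logarithm is defined) is our choice, not the chapter's. All names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi2_df > x) for even df, via the closed-form Poisson sum."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    h = x / 2.0
    if h <= 0:
        return 1.0
    n = df // 2
    # Log of each term exp(-h) * h^k / k!, summed stably in log space.
    logs = [-h + k * math.log(h) - math.lgamma(k + 1) for k in range(n)]
    m = max(logs)
    return min(1.0, math.exp(m) * sum(math.exp(v - m) for v in logs))


def fisher_test(pvalues):
    """Fisher statistic lambda = -2 sum log P_i and its chi2(2N) upper-tail p-value."""
    lam = -2.0 * sum(math.log(p) for p in pvalues)
    return lam, chi2_sf_even_df(lam, 2 * len(pvalues))


def empirical_pvalue(stat, null_draws):
    """Upper-tail p-value of `stat` against simulated null draws (add-one rule)."""
    exceed = sum(1 for d in null_draws if d >= stat)
    return (exceed + 1) / (len(null_draws) + 1)
```

In the setting of the text, `null_draws` would hold statistics from simulated null series of the same length $T$, one draw per replication.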
III. MONTE CARLO SIMULATION RESULTS

In this section, we design Monte Carlo experiments to investigate the small sample properties of the panel stationarity tests proposed in the previous section. The object of the simulations is to shed light on the relative small sample performance of the various tests. As we have seen, either the KPSS or the LMC tests can be used to handle serial correlation, and for each univariate stationarity test, either the group mean test or the Fisher test can be used to formulate the panel version. As illustrated in Maddala & Wu (1999) and Wu & Yin (1999), the group mean and Fisher tests perform very similarly in many of the cases they considered, but this still needs to be investigated for stationarity tests. For the univariate KPSS and LMC tests, LMC established the small sample superiority of their tests. Whether this superiority carries over to panel tests based on the individual LMC tests remains an open question, which simulation experiments can answer.

The basic models for the simulations are models (3) and (4) with $r_{it} = r_{i,t-1} + \eta_{it}$, where $\eta_{it} \sim \text{iid}\,N(0, q_i\sigma_{\varepsilon,i}^2)$. The errors follow $\varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it}$, where $u_{it} \sim \text{iid}\,N(0, (1-\rho_i^2)\sigma_{\varepsilon,i}^2)$. Hence when $\rho_i = 0$, the $\varepsilon_{it}$'s are i.i.d. within each unit, while they are serially correlated within each unit when $\rho_i \neq 0$. These two models extend the standard univariate models for stationarity to panel data. Different $\beta_i$, $\sigma_{\varepsilon,i}^2$, $r_{i0}$ and $\rho_i$ are introduced to allow the largest degree of heterogeneity. For this purpose, we set the parameters as follows: $\beta_i \sim U[0, 1]$, $\sigma_{\varepsilon,i}^2 \sim U[0.5, 1.5]$, $r_{i0} \sim U[0, 5]$, $\rho_i = 0$ for the i.i.d. case and $\rho_i \sim U[0.1, 0.3]$ for the case of serial correlation, where $U$ denotes the uniform distribution. The null hypothesis is specified as $q_i = 0$ for all $i$. For the alternative hypothesis, following the tradition in the literature, we only consider the case where all $q_i$'s are positive. Note, however, that all our tests are consistent even when only some of the series are nonstationary under the alternative, as long as the proportion of nonstationary units does not vanish asymptotically. Furthermore, for simplicity we only consider the alternative $H_1: q_i = q = 0.001$.⁹

We consider time dimensions $T = 25$, 50 and 100 and cross-sectional dimensions $N = 15$, 25, 50 and 100. The normal variates are generated by the RNDN function in the matrix programming language GAUSS. We apply the group mean and Fisher tests based on the LM, KPSS and LMC tests to each panel. For each case, the number of iterations is 5,000. For the group mean test, the means and variances of the small sample distributions are derived from 100,000 simulations for the corresponding time span and test procedure. For the Fisher test, the small sample distributions are likewise simulated using 100,000 replications.

To carry out the experiments, two more parameters must be selected: the truncation parameter $l$ in the individual KPSS tests and the order of autoregression $p$ in the individual LMC tests. Following earlier simulation results for the univariate KPSS tests in the literature, we experiment with

$$l_1 = \mathrm{int}\left[4\left(\frac{T}{100}\right)^{1/4}\right], \quad l_2 = \mathrm{int}\left[8\left(\frac{T}{100}\right)^{1/4}\right], \quad l_3 = \mathrm{int}\left[12\left(\frac{T}{100}\right)^{1/4}\right],$$

where int[ ] returns the integer part of its argument.
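The truncation rule and the heterogeneous data generating process can be sketched in NumPy as follows. This is an illustrative reimplementation, not the authors' GAUSS code: NumPy's generator replaces GAUSS's RNDN, and starting each $\varepsilon_{i0}$ from its stationary distribution is our assumption (the chapter does not state the initial condition). The function names are hypothetical.

```python
import numpy as np


def truncation_lags(T):
    """(l1, l2, l3) = int[c * (T/100)^(1/4)] for c = 4, 8, 12."""
    return tuple(int(c * (T / 100.0) ** 0.25) for c in (4, 8, 12))


def simulate_panel(N, T, q=0.0, trend=False, serial=False, seed=None):
    """One N x T panel from models (3)/(4) with heterogeneous parameters.

    q = 0 gives the null (r_it = r_i0); q > 0 adds a random walk whose
    innovation variance is q * sigma2_i. serial=True draws rho_i ~ U[0.1, 0.3].
    """
    rng = np.random.default_rng(seed)
    beta = rng.uniform(0.0, 1.0, N) if trend else np.zeros(N)
    sigma2 = rng.uniform(0.5, 1.5, N)
    r0 = rng.uniform(0.0, 5.0, N)
    rho = rng.uniform(0.1, 0.3, N) if serial else np.zeros(N)

    # Random-walk component: r_it = r_i,t-1 + eta_it, eta_it ~ N(0, q * sigma2_i).
    eta = rng.standard_normal((N, T)) * np.sqrt(q * sigma2)[:, None]
    r = r0[:, None] + np.cumsum(eta, axis=1)

    # AR(1) errors: eps_it = rho_i eps_i,t-1 + u_it, u_it ~ N(0, (1 - rho_i^2) sigma2_i),
    # so Var(eps_it) = sigma2_i. The stationary start for eps_i0 is an assumption.
    u = rng.standard_normal((N, T)) * np.sqrt((1.0 - rho**2) * sigma2)[:, None]
    eps = np.empty((N, T))
    eps[:, 0] = rng.standard_normal(N) * np.sqrt(sigma2)
    for t in range(1, T):
        eps[:, t] = rho * eps[:, t - 1] + u[:, t]

    t_idx = np.arange(1, T + 1)
    return r + beta[:, None] * t_idx[None, :] + eps
```

For the sample sizes used here, the rule gives $(l_1, l_2, l_3) = (2, 5, 8)$ for $T = 25$, $(3, 6, 10)$ for $T = 50$ and $(4, 8, 12)$ for $T = 100$.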
Also following earlier simulation results in the literature, we use the Parzen window instead of the Bartlett window used by KPSS, as the former performs better than the latter. For the LMC test, we experiment with $p = 1$, 2 and 3, following the Monte Carlo experiments of LMC. Let us first look at the white noise case. In this case $\rho_i = 0$, and the tests based on the individual LM tests are the appropriate ones. Table 1 presents the sizes of the group mean and Fisher tests based on the LM, KPSS and LMC tests for the level stationary model. Note that choosing $l = 0$ in the KPSS test or $p = 0$ in the LMC test yields exactly the LM test statistic; that is why the results for the tests based on the LM test are listed in the column headed $p(l) = 0$. We also list the results for $N = 1$ as a benchmark; these simply replicate the univariate case. As the table shows, the size performance of the panel stationarity tests is quite satisfactory in this case, and it improves as $T$ gets larger. In most cases, the Fisher tests have better size than the group mean tests, especially for larger $T$ and smaller $N$. This is not surprising, as the Fisher test is an exact test while the group mean
Table 1.
Sizes of Panel Stationarity Tests: Level Stationary Model, White Noise

N 1 15 25 50 100 15 25 50 100 1 15 25 50 100 15 25 50 100 1 15 25 50 100 15 25 50 100 p(l) = 0 0.047 0.061 0.053 0.054 0.046 0.050 0.045 0.047 0.043 0.047 0.066 0.066 0.056 0.057 0.052 0.054 0.049 0.051 0.051 0.056 0.057 0.056 0.056 0.047 0.047 0.049 0.053 l1 0.049 KPSS l2 0.053 l3 0.055 p=1 0.047 0.063 0.057 0.054 0.046 0.046 0.048 0.046 0.041 0.047 0.051 0.067 0.059 0.056 0.049 0.054 0.049 0.048 0.048 0.060 0.061 0.062 0.059 0.045 0.050 0.051 0.053 LMC p=2 0.049 0.059 0.058 0.054 0.050 0.045 0.044 0.047 0.046 0.050 0.065 0.070 0.065 0.061 0.050 0.055 0.055 0.054 0.045 0.063 0.062 0.054 0.056 0.049 0.049 0.047 0.052 p=3 0.051 0.063 0.063 0.059 0.051 0.052 0.053 0.052 0.047 0.050 0.058 0.067 0.057 0.055 0.046 0.053 0.049 0.051 0.049 0.069 0.063 0.064 0.059 0.048 0.049 0.053 0.051
25
Group Mean Test 0.057 0.061 0.059 0.057 0.053 0.053 0.055 0.055 0.055 0.047 0.050 0.053 0.047 0.048 0.050 0.043 0.046 Fisher Test 0.051 0.048 0.052 0.047 0.047 0.056 0.051 0.053 0.052 0.051
50
100
Group Mean Test 0.058 0.057 0.059 0.057 0.058 0.061 0.057 0.058 0.054 0.058 0.056 0.055 0.046 0.049 0.051 0.053 Fisher Test 0.047 0.051 0.051 0.053 0.046 0.051 0.049 0.051
Note: 1. The data generating process is $y_{it} = r_{i0} + \varepsilon_{it}$, with $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_{\varepsilon,i}^2)$. 2. Please see the text for the choices of parameters. 3. $l_i$ is the truncation parameter used in the individual KPSS test, and $p$ is the order of autoregression in the ARIMA($p$,1,1) model used in the individual LMC test. $p(l) = 0$ indicates that the individual LM test is used.
test is an asymptotic test (in $N$). The tests based on the KPSS tests with different lag truncation parameters and on the LMC tests with different autoregression orders also have sizes quite close to the nominal 5% level. In general, size performance again improves with larger $T$, and the Fisher tests have better size in this case. Table 2 presents the powers of the panel stationarity tests for the level stationary models. To make the results comparable, all powers are adjusted for the true sizes. The powers of the LM-based tests clearly demonstrate the superiority of the panel stationarity tests over their univariate counterparts. When $T = 25$, the power of the univariate LM test is only 0.117, while it jumps to 0.392 when 15 cross-section units are used and is close to 1 (0.954 for the group mean test and 0.952 for the Fisher test) when $N = 100$. In fact, all the powers for $T = 100$ are 1, and they are close to 1 when $T = 50$. The powers of the group mean and Fisher tests are almost the same in most cases. It is documented in the literature that increasing the lag truncation parameter $l$ in the KPSS tests or the autoregression order $p$ in the LMC tests can reduce power; this is replicated in the $N = 1$ entries of Table 2. However, because the panel stationarity tests are so powerful, the power loss from overestimating $l$ or $p$ is not an issue in some cases, especially for larger $T$ and $N$, where the powers are 1 or close to 1. This is a distinctive feature of panel stationarity tests. The power reduction as $p$ increases in the LMC tests is smaller than that as $l$ increases in the KPSS tests. The size and power of the panel stationarity tests in the white noise case for the trend stationary models are reported in Tables 3 and 4, and the observations are similar.
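The chapter does not spell out how the powers are adjusted for size. One common approach, sketched here under that assumption, is to take the critical value from the empirical null distribution of the statistic (so every test has exactly the nominal size) and then count rejections under the alternative. The function name is hypothetical.

```python
import numpy as np


def size_adjusted_power(null_stats, alt_stats, alpha=0.05):
    """Rejection rate under the alternative using the empirical (1 - alpha) null quantile.

    Assumes the test rejects for large values, as the stationarity tests here do.
    """
    crit = np.quantile(np.asarray(null_stats, dtype=float), 1.0 - alpha)
    return float(np.mean(np.asarray(alt_stats, dtype=float) > crit))
```

Here `null_stats` would be the panel statistics from the null replications ($q_i = 0$) and `alt_stats` those from the replications with $q_i = 0.001$, for the same $N$ and $T$.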
We should point out, however, that the powers in this case are smaller than those for the level stationary models, especially for $T = 25$, where they are much smaller: even when $N = 100$, the powers are only 0.280 for the group mean test and 0.279 for the Fisher test, though these represent a nearly fourfold increase over the univariate case. Next, let us look at the results for the case of serial correlation. Table 5 gives the sizes of the panel stationarity tests in this case. Size distortions are expected for the tests based on the LM tests, as the $N = 1$ rows of the table show. But the distortions become much worse as $N$ increases; in fact, the actual sizes are close to 1 when $N = 100$. This is because size distortions are amplified by pooling the cross-sectional units, as Wu & Yin (1999) also point out for panel cointegration tests. The size distortions are still quite severe when $l_1$ is used in the KPSS tests, and they become moderate when $l_2$ and $l_3$ are used for $T = 50$ and
Table 2.
Size Adjusted Powers of Panel Stationarity Tests: Level Stationary Model, White Noise
25
50
100
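Note: 1. The data generating process is $y_{it} = r_{it} + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $N(0, q\sigma_{\varepsilon,i}^2)$, and $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_{\varepsilon,i}^2)$. 2. See Note 2 in Table 1. 3. See Note 3 in Table 1.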
Table 3.
Sizes of Panel Stationarity Test Based on Group Mean: Trend Stationary Model, White Noise
25
50
100
Note: 1. The data generating process is $y_{it} = r_{i0} + \beta_i t + \varepsilon_{it}$, with $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_{\varepsilon,i}^2)$. 2. See Note 2 in Table 1. 3. See Note 3 in Table 1.
Table 4.
Size Adjusted Powers of Panel Stationarity Test: Trend Stationary Model, White Noise
25
50
100
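Note: 1. The data generating process is $y_{it} = r_{it} + \beta_i t + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $N(0, q\sigma_{\varepsilon,i}^2)$, and $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_{\varepsilon,i}^2)$. 2. See Note 2 in Table 1. 3. See Note 3 in Table 1.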
Table 5.
Sizes of Panel Stationarity Tests: Level Stationary Model, Serial Correlation

KPSS l1 0.059 0.232 0.302 0.433 0.657 0.205 0.270 0.401 0.641 0.057 0.162 0.208 0.300 0.472 0.140 0.190 0.279 0.456 0.062 0.130 0.169 0.210 0.307 0.109 0.148 0.193 0.293 LMC p=1 0.051 0.130 0.144 0.181 0.212 0.104 0.123 0.156 0.212 0.055 0.099 0.102 0.117 0.145 0.077 0.082 0.091 0.116 0.052 0.077 0.082 0.081 0.087 0.057 0.064 0.066 0.070
N 1 15 25 50 100 15 25 50 100 1 15 25 50 100 15 25 50 100 1 15 25 50 100 15 25 50 100
p(l) = 0 0.079 0.532 0.694 0.904 0.993 0.490 0.669 0.897 0.993 0.080 0.551 0.747 0.945 0.999 0.517 0.729 0.944 0.999 0.094 0.563 0.783 0.944 0.998 0.532 0.773 0.943 0.998
l2 0.050
l3 0.047
p=2 0.054 0.150 0.182 0.230 0.314 0.129 0.160 0.210 0.314 0.060 0.126 0.144 0.182 0.250 0.096 0.113 0.155 0.213 0.058 0.086 0.094 0.096 0.106 0.066 0.075 0.072 0.083
p=3 0.058 0.152 0.172 0.221 0.328 0.129 0.150 0.206 0.328 0.059 0.138 0.161 0.209 0.286 0.109 0.137 0.178 0.264 0.057 0.099 0.114 0.124 0.145 0.074 0.083 0.092 0.112
25
Group Mean Test 0.074 0.032 0.076 0.025 0.079 0.017 0.087 0.012 Fisher Test 0.066 0.067 0.072 0.089 0.050 0.028 0.024 0.016 0.012 0.046
50
Group Mean Test 0.081 0.058 0.090 0.052 0.103 0.049 0.132 0.048 Fisher Test 0.070 0.077 0.095 0.128 0.053 0.050 0.050 0.047 0.047 0.052
100
Group Mean Test 0.080 0.065 0.083 0.063 0.091 0.065 0.104 0.066 Fisher Test 0.062 0.071 0.083 0.098 0.052 0.056 0.059 0.062
Note: 1. The data generating process is $y_{it} = r_{i0} + \varepsilon_{it}$, $\varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1-\rho_i^2)\sigma_{\varepsilon,i}^2)$. 2. See Note 2 in Table 1. 3. See Note 3 in Table 1.
100. For the LMC test, the size distortion is still considerably large at $T = 25$ even when the true order of autoregression ($p = 1$) is used; it becomes smaller and moderate when $T$ increases to 50 and 100. Interestingly, overestimating $p$ in this case increases the size distortions. The Fisher tests also generally have better size than the group mean tests. Table 6 reports the powers of the panel stationarity tests in the presence of serial correlation. The first thing to notice is that the powers are lower than in the white noise case for some combinations of $N$ and $T$. For the KPSS tests with $l_2$ and the LMC tests with $p = 1$, which have relatively moderate size distortions, the powers are only around 60% even when $N = 100$ and $T = 25$. For these two tests (in both their group mean and Fisher versions), the powers are close to 1 when $N$ is larger than 50 and $T = 50$. When $T = 100$, however, all the powers are still 1 or very close to 1, so smaller size distortion should be the primary criterion for choosing a test in practice. The powers of the KPSS tests with $l_2$ and the LMC tests with $p = 1$ are almost the same in most cases, though the $N = 1$ results indicate that the latter has an advantage in the univariate case, which agrees with the findings of LMC. There are almost no differences between the powers of the group mean and Fisher tests. The size distortions of the panel stationarity tests for the trend stationary models with serial correlation are presented in Table 7, with size adjusted powers in Table 8. For the size distortions, the observations are the same as for the level stationary models. Quite interestingly, the KPSS tests with $l_2$ have a slight edge over the LMC tests with $p = 1$ when $T = 50$, while the situation is reversed when $T = 100$. But we observe severe negative size distortions for the KPSS tests with $l_2$ when $T = 25$.
Apart from this case, the size distortions of these two tests are smaller than the corresponding ones for the level stationary models. The Fisher tests have relatively better size than the group mean tests, especially when the individual LMC tests are used. As for the adjusted powers, we only need to note where they are lower than for the level stationary models, since otherwise the results are much the same. For the KPSS tests with $l_2$ and the LMC tests with $p = 1$, the powers are only about 70% even when $N = 100$ for $T = 50$, compared with powers of 1 in the same situation for the level stationary models. The powers are close to 1 when $T = 100$ and there are more than 25 cross-section units in the panel. In summary, the Monte Carlo simulations show that the proposed tests have quite satisfactory small sample performance in most of the cases considered. In the absence of serial correlation, the tests based on the LM tests
Table 6.
Size Adjusted Powers of Panel Stationarity Tests: Level Stationary Model, Serial Correlation
N 1 15 25 50 100 15 25 50 100 1 15 25 50 100 15 25 50 100 1 15 25 50 100 15 25 50 100 p(l) = 0 0.153 0.249 0.338 0.489 0.754 0.247 0.337 0.484 0.750 0.316 0.886 0.862 0.998 1.000 0.885 0.962 1.000 1.000 0.530 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 KPSS l1 0.109 l2 0.095 l3 0.079 LMC p=1 0.100 0.207 0.250 0.394 0.588 0.212 0.248 0.394 0.584 0.235 0.775 0.912 0.993 1.000 0.774 0.917 1.000 1.000 0.524 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 p=2 0.095 0.174 0.207 0.329 0.532 0.171 0.207 0.331 0.534 0.198 0.712 0.854 0.981 1.000 0.723 0.858 1.000 1.000 0.490 0.999 1.000 1.000 1.000 0.999 1.000 1.000 1.000 p=3 0.089 0.157 0.204 0.302 0.479 0.161 0.200 0.304 0.490 0.183 0.643 0.813 0.967 0.999 0.651 0.812 1.000 1.000 0.471 0.998 1.000 1.000 1.000 0.999 1.000 1.000 1.000
25
50
100
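Note: 1. The data generating process is $y_{it} = r_{it} + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $N(0, q\sigma_{\varepsilon,i}^2)$, $\varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1-\rho_i^2)\sigma_{\varepsilon,i}^2)$. 2. See Note 2 in Table 1. 3. See Note 3 in Table 1.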
Table 7.
Sizes of Panel Stationarity Tests: Trend Stationary Model, Serial Correlation

25
50
100
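Note: 1. The data generating process is $y_{it} = r_{i0} + \beta_i t + \varepsilon_{it}$, $\varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1-\rho_i^2)\sigma_{\varepsilon,i}^2)$. 2. See Note 2 in Table 1. 3. See Note 3 in Table 1.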
Table 8.
Size Adjusted Powers of Panel Stationarity Tests: Trend Stationary Model, Serial Correlation
25
50
100
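Note: 1. The data generating process is $y_{it} = r_{it} + \beta_i t + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \eta_{it}$, $\eta_{it} \sim$ i.i.d. $N(0, q\sigma_{\varepsilon,i}^2)$, $\varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1-\rho_i^2)\sigma_{\varepsilon,i}^2)$. 2. See Note 2 in Table 1. 3. See Note 3 in Table 1.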
have sizes close to the nominal size and powers much higher than the univariate LM tests. Using the KPSS and LMC tests in this case would not cause much size distortion, but would cause power losses for some combinations of $N$ and $T$, while for other combinations the powers are already 1 or close to 1. In the presence of serial correlation, the tests based on the KPSS tests with $l_2$ and the LMC tests with $p = 1$ have relatively good size, though there are still moderate to severe size distortions when the time span is short ($T = 25$), especially for the trend stationary models. The powers of all tests are also lower than their counterparts in the white noise case. Overall, the Fisher tests have better size than the group mean tests, while their powers are almost the same.
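The amplification of size distortions through pooling, noted above for the LM-based tests in Table 5, can be illustrated with a rough heuristic calculation. Assume (our simplification, not the authors' analysis) that the standardized group mean statistic is approximately normal and that ignoring serial correlation shifts the null mean of each individual statistic by a fixed fraction `shift` of its standard deviation. The shift in the panel statistic is then $\sqrt{N}$ times larger, so the rejection rate under the null climbs toward 1 as $N$ grows. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def pooled_rejection_rate(N, shift, alpha=0.05):
    """Approximate null rejection rate of a one-sided group mean test when each unit's
    statistic has its mean shifted by `shift` standard deviations (heuristic)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)   # about 1.645 for alpha = 0.05
    return 1.0 - NormalDist().cdf(z_crit - math.sqrt(N) * shift)


for N in (1, 15, 25, 50, 100):
    print(N, round(pooled_rejection_rate(N, shift=0.2), 3))
```

A per-unit distortion that is barely visible at $N = 1$ therefore becomes a large one at $N = 100$, in line with the pattern the simulations show.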
IV. CONCLUSION
In this chapter, we developed several tests for stationarity in heterogeneous panels, analyzing both level stationary and trend stationary models. Allowing the maximum degree of heterogeneity in the panel, we considered two different ways to pool information about the null hypothesis from the individual cross-section units: the group mean test and the Fisher test. The group mean test pools the univariate test statistics, while the Fisher test combines the p-values of the individual tests. As univariate stationarity tests, we considered the KPSS and LMC tests to handle serial correlation. The group mean tests based on the KPSS and LMC tests are asymptotically normal, while the Fisher test statistics follow $\chi^2$ distributions. The small sample performance of the tests was investigated via Monte Carlo simulation experiments, which showed that the proposed tests have quite satisfactory size and power. In general, the Fisher type tests have better size than the group mean type tests, while their powers are similar. When there is serial correlation, the tests based on the KPSS tests with $l_2$ and the LMC tests with $p = 1$ perform very similarly in terms of size and power in most cases, except for the short time span ($T = 25$). The size of these two tests is quite good in the presence of serial correlation when $T = 50$ and 100. However, there are still moderate to severe size distortions when $T = 25$ in the presence of serial correlation. In such a case, bootstrap methods might be an effective way to obtain better size, which would be an interesting topic for future research. Based on our simulation results, we recommend using either the group mean tests or the Fisher tests, based on either the KPSS tests with $l_2$ or the LMC tests with $p = 1$, to test for stationarity in heterogeneous panel data models in empirical work.
ACKNOWLEDGMENTS
We would like to thank Badi Baltagi and three anonymous referees for their helpful comments. Of course, all remaining errors are ours.
NOTES
1. See, for example, Schwert (1987).
2. See KPSS for all relevant references and derivations of the tests.
3. Please see LMC for the details of this argument. Of course, this superiority depends on the correct specification of the LMC model, as pointed out by one anonymous referee.
4. This means that the intercepts in different cross-section units can be different, one aspect of the heterogeneous panel.
5. The moment restriction in applying the Lindeberg-Lévy CLT should not be a problem here because all tests are variants of the LM tests, which are bounded.
6. The small sample distributions of these tests can be derived by simulating series of given T under the null and applying the given test to the simulated series over a prespecified number of iterations.
7. In a recent paper, Choi (2000) proposes to standardize the Fisher test statistics as well. But this is unnecessary unless N is large enough.
8. Please see Maddala & Wu (1999) for a detailed comparison between the group mean and the Fisher tests.
9. By construction of the tests, the $q_i$'s can be different across the units.
REFERENCES
Anderson, T. W., & Darling, D. A. (1952). Asymptotic Theory of Certain "Goodness of Fit" Criteria Based on Stochastic Processes. Annals of Mathematical Statistics, 23, 193–212.
Baltagi, B., & Kao, C. (2000). Nonstationary Panels, Cointegration in Panels and Dynamic Panels: A Survey. Advances in Econometrics, 15, 7–51.
Choi, I. (1999). Unit Root Tests for Panel Data. Manuscript, Kookmin University.
Fisher, R. A. (1932). Statistical Methods for Research Workers (4th ed.). Edinburgh: Oliver and Boyd.
Hadri, K. (1998). Testing for Stationarity in Heterogeneous Panel Data. Working paper, School of Business and Economics, Exeter University.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. Discussion paper, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data. Journal of Econometrics, 90, 1–44.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root. Journal of Econometrics, 54, 159–178.
Leybourne, S. J., & McCabe, B. P. M. (1994). A Consistent Test for a Unit Root. Journal of Business and Economic Statistics, 12, 157–166.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and a New Simple Test. Oxford Bulletin of Economics and Statistics, forthcoming.
McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel Data. Econometric Reviews, 17, 57–84.
McCoskey, S., & Kao, C. (1997). A Monte Carlo Comparison of Tests for Cointegration in Panel Data. Working paper, Center for Policy Research and Department of Economics, Syracuse University.
Newey, W. K., & West, K. D. (1987). A Simple Positive Semi-Definite Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703–708.
Pedroni, P. (1995). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests With an Application to the PPP Hypothesis. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time Series Tests With an Application to the PPP Hypothesis, New Results. Working paper, Department of Economics, Indiana University.
Phillips, P. C. B., & Perron, P. (1988). Testing For a Unit Root in Time Series Regression. Biometrika, 75, 335–346.
Wu, S., & Yin, Y. (1999). Tests for Cointegration in Heterogeneous Panel: A Monte Carlo Comparison. Working paper, Department of Economics, State University of New York at Buffalo.
INSTRUMENTAL VARIABLE ESTIMATION OF SEMIPARAMETRIC DYNAMIC PANEL DATA MODELS: MONTE CARLO RESULTS ON SEVERAL NEW AND EXISTING ESTIMATORS
M. Douglas Berg, Qi Li and Aman Ullah
ABSTRACT
We consider the problem of instrumental variable estimation of semiparametric dynamic panel data models. We propose several new semiparametric instrumental variable estimators for estimating a dynamic panel data model. Monte Carlo experiments show that the new estimators perform much better than the estimators suggested by Li & Stengos (1996) and Li & Ullah (1998).
I. INTRODUCTION
Economic research has been enriched by the availability of panel data that measure individual cross-sectional behavior over time. For reviews of the literature on estimation and inference in parametric panel data models, see Baltagi (1995), Chamberlain (1984), Hsiao (1986) and Matyas & Sevestre (1996). Recently, semiparametric modeling and estimation have attracted much attention among statisticians and econometricians. One popular semiparametric model is the partially linear model. In this chapter we consider the problem of estimating a semiparametric dynamic panel data model which includes the following model as a special case:

$$ y_{it} = \beta y_{i,t-1} + \theta(z_{it}) + u_{it}, \tag{1.1} $$

where the functional form of $\theta(\cdot)$ is unspecified. Therefore (1.1) is a semiparametric dynamic panel data model. When $\theta(\cdot)$ has a known form, say $\theta(z_{it}) = z_{it}'\delta$, we obtain a parametric dynamic panel data model:

$$ y_{it} = \beta y_{i,t-1} + z_{it}'\delta + u_{it}. \tag{1.2} $$

When the error $u_{it}$ has a one-way error component structure, i.e. $u_{it} = \mu_i + \nu_{it}$, then $y_{i,t-1}$ and $u_{it}$ are correlated (both contain $\mu_i$), and instrumental variable methods are needed to obtain a consistent estimator of $\beta$. There is a rich literature on how to obtain consistent and efficient estimators for parametric dynamic models; see Ahn & Schmidt (1995), Anderson & Hsiao (1981), Arellano & Bover (1995), Baltagi & Griffin (1997), Pesaran & Smith (1995) and Kiviet (1995), among others. The consistency and efficiency of these estimators for the parametric dynamic panel data model (1.2) depend crucially on the correct specification of the model. If $\theta(z_{it}) \neq z_{it}'\delta$, parametric estimation methods based on the misspecified model (1.2) will in general lead to inconsistent estimation of $\beta$. Semiparametric partially linear models have the advantage of not specifying the functional form of $\theta(\cdot)$. Hence a consistent semiparametric estimator of $\beta$ based on (1.1) is robust to the functional form specification of $\theta(\cdot)$. There is a rich literature on estimating a partially linear model with independent data using various nonparametric techniques, e.g. Engle et al. (1986), Robinson (1988), Stock (1989), Donald & Newey (1994) and Li (1996). Also see Ullah & Roy (1998), Ullah & Mundra (1998) and Khanna et al. (1999) for the estimation and applications of static partially linear panel data models.

Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 297–315. © 2000 by Elsevier Science Inc. ISBN: 0-7623-0688-2
However, little attention has been paid to dynamic partially linear panel data models. Although Li & Stengos (1996) and Li & Ullah (1998) discussed how to estimate model (1.1) by semiparametric instrumental variable methods, no simulations are reported in those works, and hence the finite sample performance of the estimators proposed there is unknown.¹ Li & Stengos (1996) proposed a semiparametric OLS-type IV (IVOLS) estimator of $\beta$. When the error follows a one-way error component structure, the OLS-type estimator is not efficient because it ignores this error structure. Li & Ullah (1998) therefore proposed a semiparametric GLS-type IV (IVGLS) estimator. However, the IVGLS estimator in Li & Ullah (1998) does not make full use of the one-way error component structure. In fact, when the model is just identified, their semiparametric IVGLS estimator reduces to a semiparametric IVOLS estimator, and hence it is inefficient in the sense that the one-way error component structure is not used in constructing the estimator. In this chapter we propose a new semiparametric IVGLS estimator and a new semiparametric IV-Within estimator that are more efficient than the ones considered in Li & Stengos (1996) and Li & Ullah (1998). We then use Monte Carlo experiments to examine the finite sample performance of the new semiparametric estimators and of some existing estimators (e.g. Li & Stengos (1996) and Li & Ullah (1998)). Our simulation results show that the new estimators perform substantially better than the existing ones. The chapter is organized as follows. Section 2 first reviews the semiparametric estimators of Li & Stengos (1996) and Li & Ullah (1998), and then proposes some new estimators. Section 3 reports Monte Carlo simulations comparing the relative performance of the various estimators. Section 4 concludes.
II. THE MODEL

We consider a slightly more general semiparametric dynamic panel data model than model (1.1) of the introduction:

$$ y_{it} = x_{it}'\beta + \theta(z_{it}) + u_{it}, \qquad (i = 1, \ldots, N;\ t = 1, \ldots, T), \tag{2.1} $$

where $x_{it}$ is of dimension $p \times 1$, $\beta$ is a $p \times 1$ unknown parameter vector, $z_{it}$ is of dimension $d$, and $\theta(\cdot)$ is an unknown smooth function. We assume that the first element of $x_{it}$ is $y_{i,t-1}$, so that model (2.1) is a semiparametric dynamic panel data model. We are mainly interested in obtaining accurate estimates of $\beta$. We consider the case where the error $u_{it}$ follows a one-way error component specification,

$$ u_{it} = \mu_i + \nu_{it}, \tag{2.2} $$

where $\mu_i$ is i.i.d. $(0, \sigma_\mu^2)$, $\nu_{it}$ is i.i.d. $(0, \sigma_\nu^2)$, and $\mu_i$ and $\nu_{jt}$ are uncorrelated for all $i$, $j$ and $t$. In this chapter we propose a new semiparametric IVGLS estimator that fully uses the one-way error component structure. We also propose a semiparametric IV within-transformation estimator, which has the advantage of computational simplicity because it does not require one to estimate the variance components. We then employ Monte Carlo simulations to investigate the finite sample performance of our proposed semiparametric IV estimators and compare them with some existing estimators.

GLS-type estimators require knowledge of the error variance structure. In vector notation, the one-way error component model (2.2) has the form

$$ u = (I_N \otimes e_T)\mu + \nu, \tag{2.3} $$

where $e_T$ is a column of ones of dimension $T$, $\mu = (\mu_1, \mu_2, \ldots, \mu_N)'$ is of dimension $N \times 1$, and $u$ and $\nu$ are both of dimension $NT \times 1$ with $u = (u_{11}, \ldots, u_{1T}, \ldots, u_{N1}, \ldots, u_{NT})'$ and $\nu$ similarly defined.
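Then

$$ \Omega = E(uu') = \sigma_\mu^2 (I_N \otimes J_T) + \sigma_\nu^2 I_{NT} \tag{2.4} $$

and, equivalently,

$$ \Omega = I_N \otimes [\sigma_1^2 \bar J_T + \sigma_\nu^2 E_T] \equiv I_N \otimes \Omega_T, \tag{2.5} $$

where $J_T = e_T e_T'$ is a $T \times T$ matrix with all elements equal to one, $\bar J_T = J_T/T$, $E_T = I_T - \bar J_T$ and $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$. Noting that $\bar J_T E_T = 0$, $\bar J_T + E_T = I_T$, and that $\bar J_T$ and $E_T$ are both idempotent, it is easy to check that the inverse of $\Omega$ is given by²

$$ \Omega^{-1} = I_N \otimes [(1/\sigma_1^2)\bar J_T + (1/\sigma_\nu^2) E_T] \equiv I_N \otimes \Omega_T^{-1} \tag{2.6} $$

and

$$ \Omega^{-1/2} = I_N \otimes [(1/\sigma_1)\bar J_T + (1/\sigma_\nu) E_T] \equiv I_N \otimes \Omega_T^{-1/2}. \tag{2.7} $$

The above expressions for $\Omega^{-1}$ and $\Omega^{-1/2}$ will be used in the GLS estimation procedure discussed below.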
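As a quick numerical check of the spectral-decomposition formulas for the inverse and inverse square root of $\Omega$, the following minimal Python sketch (standard library only) builds one individual's $T \times T$ block and confirms that the closed forms behave as claimed. The function names and the test values $\sigma_\mu^2 = 8$, $\sigma_\nu^2 = 2$ are illustrative assumptions, not part of the original chapter.

```python
# Sketch: check the spectral-decomposition formulas for one T x T block of Omega.
# Assumptions (illustrative only): matrices are plain lists of lists.

def matmul(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def omega_block(t, sig_mu2, sig_nu2):
    """sigma_mu^2 * J_T + sigma_nu^2 * I_T, one individual's block of Omega."""
    return [[sig_mu2 + (sig_nu2 if i == j else 0.0) for j in range(t)]
            for i in range(t)]


def omega_block_power(t, sig_mu2, sig_nu2, power):
    """(sigma_1^2)^power * Jbar_T + (sigma_nu^2)^power * E_T.

    Valid because Jbar_T and E_T = I_T - Jbar_T are orthogonal idempotents
    summing to I_T; power = -1 and power = -0.5 give the inverse and the
    inverse square root.
    """
    sig_12 = t * sig_mu2 + sig_nu2          # sigma_1^2 = T sigma_mu^2 + sigma_nu^2
    a, b = sig_12 ** power, sig_nu2 ** power
    return [[a / t + b * ((1.0 if i == j else 0.0) - 1.0 / t) for j in range(t)]
            for i in range(t)]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    T = 6
    identity = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    om = omega_block(T, 8.0, 2.0)
    inv = omega_block_power(T, 8.0, 2.0, -1.0)
    half = omega_block_power(T, 8.0, 2.0, -0.5)
    print(max_abs_diff(matmul(om, inv), identity))   # close to 0
    print(max_abs_diff(matmul(half, half), inv))     # close to 0
```

Because only $\sigma_1^2$ and $\sigma_\nu^2$ enter, the same two scalars fully describe $\Omega^{-1}$ and $\Omega^{-1/2}$ for any $N$.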
A. Some Infeasible Estimators

Equation (2.1) contains an unknown function $\theta(\cdot)$. Following Robinson (1988), we first eliminate $\theta(\cdot)$. Taking the conditional expectation of (2.1) given $z_{it}$ and subtracting it from (2.1) (using $E(u_{it}|z_{it}) = 0$ for strictly exogenous $z_{it}$) leads to

$$ y_{it} - E(y_{it}|z_{it}) = (x_{it} - E(x_{it}|z_{it}))'\beta + u_{it} \overset{\text{def}}{=} v_{it}'\beta + u_{it}, \tag{2.8} $$

where $v_{it} = x_{it} - E(x_{it}|z_{it})$. In vector-matrix notation we have

$$ y - E(y|z) = v\beta + u, \tag{2.9} $$

where $y$, $E(y|z)$ and $u$ are all $NT \times 1$ vectors with typical elements $y_{it}$, $E(y_{it}|z_{it})$ and $u_{it}$, respectively, and $v$ is of dimension $NT \times p$ with typical row $v_{it}' = (x_{it} - E(x_{it}|z_{it}))'$.
Equation (2.9) no longer contains the unknown function $\theta(\cdot)$. Note that $v_{it}$ and $u_{it}$ are correlated because $v_{it}$ contains $y_{i,t-1}$ and $u_{it}$ contains the random individual effect $\mu_i$. Suppose there exists a $q \times 1$ ($q \ge p$) instrumental variable $\xi_{it}$ that is correlated with $x_{it}$ and uncorrelated with $u_{it}$; then we can use $w_{it} \overset{\text{def}}{=} \xi_{it} - E(\xi_{it}|z_{it})$ as IV for $v_{it}$. For example, consider a simple case where both $x_{it}$ and $z_{it}$ are scalars with $x_{it} = y_{i,t-1}$ and $z_{it}$ strictly exogenous; then one can choose $\xi_{it} = z_{i,t-1}$ as the instrument for $y_{i,t-1}$. In vector-matrix notation, an (infeasible) IVOLS estimator of $\beta$ based on (2.9) is (see White (1984, 1987) for a discussion of IV estimation)

$$ \hat\beta_{IVO} = (v'ww'v)^{-1} v'ww'(y - E(y|z)) = \beta + (v'ww'v)^{-1} v'ww'u. \tag{2.10} $$

When the model is just identified, i.e. $p = q$, and if we assume that $w'v$ is invertible, then $\hat\beta_{IVO}$ becomes

$$ \hat\beta_{IVO} = (w'v)^{-1}(v'w)^{-1} v'ww'(y - E(y|z)) = (w'v)^{-1} w'(y - E(y|z)). \tag{2.11} $$

The above IVOLS estimator is not efficient because it ignores the error component variance structure. Li and Ullah (1998) suggested estimating $\beta$ by

$$ \bar\beta = (v'w(w'\Omega^{-1}w)^{-1}w'v)^{-1} v'w(w'\Omega^{-1}w)^{-1} w'(y - E(y|z)). \tag{2.12} $$

However, when $q = p$ and if we assume that the square matrices $v'w$ and $w'\Omega^{-1}w$ are both invertible, then we have from (2.12)

$$ \bar\beta = (w'v)^{-1}(w'\Omega^{-1}w)(v'w)^{-1} v'w(w'\Omega^{-1}w)^{-1} w'(y - E(y|z)) = (w'v)^{-1} w'(y - E(y|z)) = \hat\beta_{IVO}; $$

that is, $\bar\beta$ reduces to the IVOLS estimator (2.11) when the model is just identified. Therefore, the IV estimator $\bar\beta$ also ignores the variance component structure when the model is just identified. A new IVGLS estimator that fully uses the one-way error component structure is given by

$$ \hat\beta_{IVG} = (v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}v)^{-1} v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}(y - E(y|z)) $$
$$ = \beta + (v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}v)^{-1} v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}u. \tag{2.13} $$

$\hat\beta_{IVG}$ of (2.13) is an optimal IV estimator as discussed in White (1984, 1987). When the model is just identified, i.e. $p = q$, and if we assume that both $w'\Omega^{-1}v$ and $w'\Omega^{-1}w$ are invertible, then $\hat\beta_{IVG}$ of (2.13) becomes

$$ \hat\beta_{IVG} = (w'\Omega^{-1}v)^{-1}(w'\Omega^{-1}w)(v'\Omega^{-1}w)^{-1} v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}(y - E(y|z)) = (w'\Omega^{-1}v)^{-1} w'\Omega^{-1}(y - E(y|z)), \tag{2.14} $$

which is different from $\hat\beta_{IVO}$ of (2.11).
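For the scalar case used later in the simulations ($p = 1$, with $x_{it} = y_{i,t-1}$), the IVOLS formula (2.10) collapses to a ratio of inner products, and with $q = 1$ it reduces further to the just-identified form (2.11). The sketch below assumes the partialled-out quantities $v$, $w$ and $y - E(y|z)$ are already available as plain Python lists, one list per instrument column; the function name is hypothetical.

```python
# Sketch of the IVOLS estimator (2.10) for a single regressor (p = 1) and
# q >= 1 instruments. With p = 1, (v'w w'v)^{-1} v'w w'(y - E(y|z)) equals
# (a . b) / (a . a), where a = w'v and b = w'(y - E(y|z)) are q-vectors.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def iv_ols_scalar(v, w_cols, y_dm):
    """v: regressor after partialling out z; w_cols: list of q instrument
    columns after partialling out z; y_dm: y - E(y|z). Returns beta_hat."""
    a = [dot(col, v) for col in w_cols]       # w'v
    b = [dot(col, y_dm) for col in w_cols]    # w'(y - E(y|z))
    return dot(a, b) / dot(a, a)


if __name__ == "__main__":
    v = [1.0, 2.0, 3.0, 4.0]
    y_dm = [2.0, 1.0, 3.0, 5.0]
    # Just identified (q = 1): equals (w'v)^{-1} w'(y - E(y|z)) as in (2.11).
    print(iv_ols_scalar(v, [[1.0, 0.0, 1.0, 0.0]], y_dm))   # 1.25
```

With more than one instrument column the same function gives the over-identified IVOLS estimate, since only the $q$-vectors $w'v$ and $w'(y - E(y|z))$ are needed.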
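Note that one can transform the model by premultiplying $y$, $v$ and $w$ by $\Omega^{-1/2}$. Denote $y^* = \Omega^{-1/2}y$, $v^* = \Omega^{-1/2}v$ and $w^* = \Omega^{-1/2}w$; then the IVGLS estimator of (2.13) is simply

$$ \hat\beta_{IVG} = (v^{*\prime}w^*(w^{*\prime}w^*)^{-1}w^{*\prime}v^*)^{-1} v^{*\prime}w^*(w^{*\prime}w^*)^{-1}w^{*\prime}(y^* - E(y^*|z)), \tag{2.15} $$

which is easier to compute since $\Omega^{-1/2}$ is available in closed form and one does not need to invert an $NT \times NT$ matrix. Let $n = NT$; then under the conditions (i) $w'u/n \xrightarrow{p} 0$ ($w$ is a legitimate IV), (ii) $v'\Omega^{-1}w/n \xrightarrow{p} A$, and (iii) $w'\Omega^{-1}w/n \xrightarrow{p} B$, a positive definite matrix, one can show that

$$ \sqrt{n}(\hat\beta_{IVG} - \beta) \xrightarrow{d} N\big(0, (AB^{-1}A')^{-1}\big). \tag{2.16} $$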
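Because $\Omega^{-1/2}$ is block diagonal with identical $T \times T$ blocks, premultiplying by it only requires each individual's time mean: the $\bar J_T$ part maps a block to its mean and the $E_T$ part to deviations from that mean. The sketch below applies this to stacked $NT$ vectors (ordered individual by individual, as for $u$) and then computes the just-identified IVGLS estimator (2.14) from the transformed data. It assumes $\sigma_1^2$ and $\sigma_\nu^2$ are known and $p = q = 1$; the function names are illustrative.

```python
import math

# Sketch: apply Omega^{-1/2} = I_N kron [(1/sigma_1) Jbar_T + (1/sigma_nu) E_T]
# without forming any NT x NT matrix, then compute the just-identified IVGLS
# estimator beta = (w*'v*)^{-1} w*'(y* - E(y*|z)) for p = q = 1.

def omega_inv_half(vec, n, t, sig_12, sig_nu2):
    """vec is stacked as (i=1: t=1..T, i=2: t=1..T, ...)."""
    s1, snu = math.sqrt(sig_12), math.sqrt(sig_nu2)
    out = []
    for i in range(n):
        block = vec[i * t:(i + 1) * t]
        mean = sum(block) / t                                    # Jbar_T part
        out.extend(mean / s1 + (x - mean) / snu for x in block)  # plus E_T part
    return out


def iv_gls_just_identified(v, w, y_dm, n, t, sig_12, sig_nu2):
    """v, w, y_dm: partialled-out regressor, instrument and y - E(y|z)."""
    v_s = omega_inv_half(v, n, t, sig_12, sig_nu2)
    w_s = omega_inv_half(w, n, t, sig_12, sig_nu2)
    y_s = omega_inv_half(y_dm, n, t, sig_12, sig_nu2)
    num = sum(a * b for a, b in zip(w_s, y_s))
    den = sum(a * b for a, b in zip(w_s, v_s))
    return num / den
```

Replacing `sig_12` and `sig_nu2` by consistent estimates gives a feasible version; the Appendix program follows the same pattern with GAUSS matrices.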
The proof of (2.16) is similar to the proof of Lemma 3 of Li and Ullah (1998) and is therefore omitted here. Next we propose a simple IV estimator based on the within transformation. A within-type estimator has the advantage of computational simplicity: it only requires a least squares regression on the within-transformed variables. Define $\phi_{it} = E(y_{it}|z_{it})$ and define the within-transformed variables $\tilde y_{it} = y_{it} - \bar y_i$, $\tilde\phi_{it} = \phi_{it} - \bar\phi_i$, $\tilde v_{it} = v_{it} - \bar v_i$ and $\tilde w_{it} = w_{it} - \bar w_i$, where $\bar y_i = \sum_{s=1}^T y_{is}/T$, and $\bar\phi_i$, $\bar v_i$ and $\bar w_i$ are similarly defined. The IV-Within estimator is given by

$$ \hat\beta_{IVW} = (\tilde v'\tilde w\tilde w'\tilde v)^{-1}\tilde v'\tilde w\tilde w'(\tilde y - \tilde\phi). \tag{2.17} $$

When the model is just identified, we have

$$ \hat\beta_{IVW} = (\tilde w'\tilde v)^{-1}(\tilde v'\tilde w)^{-1}\tilde v'\tilde w\tilde w'(\tilde y - \tilde\phi) = (\tilde w'\tilde v)^{-1}\tilde w'(\tilde y - \tilde\phi). \tag{2.18} $$

The within-type estimator is computationally simple because it does not require one to estimate the error variance matrix $\Omega$.

B. Feasible Estimators

The estimators $\hat\beta_{IVO}$, $\hat\beta_{IVG}$ and $\hat\beta_{IVW}$ discussed above are not feasible, because the conditional mean functions $E(y|z)$, $E(x|z)$ and $E(w|z)$, as well as $\sigma_1^2$ and $\sigma_\nu^2$, are unknown. Feasible estimators can be obtained by replacing the unknown conditional mean functions by nonparametric estimators, such as kernel estimators, and replacing $\sigma_1^2$ and $\sigma_\nu^2$ by consistent estimators. Following Robinson (1988), we use a kernel method to estimate the unknown conditional expectations. Specifically, we denote the kernel estimators of $f(z_{it})$, $E(y_{it}|z_{it})$, $E(x_{it}|z_{it})$ and $E(w_{it}|z_{it})$ by $\hat f_{it}$, $\hat y_{it}$, $\hat x_{it}$ and $\hat w_{it}$, respectively, where

$$ \hat f_{it} = \frac{1}{NTh^d}\sum_j\sum_s K_{it,js}, \tag{2.19} $$

$$ \hat y_{it} = \frac{1}{NTh^d}\sum_j\sum_s y_{js}K_{it,js}\big/\hat f_{it}, \tag{2.20} $$

$$ \hat x_{it} = \frac{1}{NTh^d}\sum_j\sum_s x_{js}K_{it,js}\big/\hat f_{it}, \tag{2.21} $$

and

$$ \hat w_{it} = \frac{1}{NTh^d}\sum_j\sum_s w_{js}K_{it,js}\big/\hat f_{it}, \tag{2.22} $$
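For scalar $z_{it}$ ($d = 1$), the kernel estimators (2.19)–(2.22) are Nadaraya–Watson averages over all $NT$ stacked observations. The sketch below uses a Gaussian kernel and the rule-of-thumb bandwidth $h = \mathrm{sd}(z)\,(NT)^{-1/5}$, which is what the Appendix program uses; both choices, and the function names, are assumptions of this illustration rather than requirements of the method. The $(NTh^d)^{-1}$ factors cancel in the ratio for the conditional means.

```python
import math

# Sketch of the kernel estimators (2.19)-(2.22) for scalar z (d = 1).
# All NT observations are stacked in one list; the sums run over every (j, s),
# including the observation itself, as in the formulas above.

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rule_of_thumb_bandwidth(z):
    """h = (sample standard deviation of z) * (NT)^(-1/5)."""
    m = len(z)
    mean = sum(z) / m
    sd = math.sqrt(sum((zi - mean) ** 2 for zi in z) / (m - 1))
    return sd * m ** (-0.2)


def kernel_fit(target, z, h):
    """Return (f_hat, m_hat): density estimates at each z_it and estimates of
    E(target_it | z_it) at every stacked observation."""
    m = len(z)
    f_hat, m_hat = [], []
    for zi in z:
        k = [gaussian_kernel((zi - zj) / h) for zj in z]   # K_{it,js}
        f = sum(k) / (m * h)                                # as in (2.19)
        f_hat.append(f)
        m_hat.append(sum(kj * tj for kj, tj in zip(k, target)) / (m * h) / f)
    return f_hat, m_hat
```

To obtain the estimate of $E(y_{i,t-1}|z_{it})$ needed when $x_{it} = y_{i,t-1}$, one would pass the stacked $y_{i,t-1}$ as `target` together with the stacked $z_{it}$; conditioning on $z_{i,t-1}$ instead gives a different quantity.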
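where $K_{it,js} = K((z_{it} - z_{js})/h)$, $K(\cdot)$ is the kernel function and $h$ is the smoothing parameter. Note that when $x_{it} = y_{i,t-1}$, we have

$$ \hat x_{it} = \hat E(y_{i,t-1}|z_{it}) = (NTh^d)^{-1}\sum_j\sum_s y_{j,s-1}K_{it,js}\big/\hat f_{it}, \tag{2.23} $$

which is different from $\hat y_{i,t-1} = \hat E(y_{i,t-1}|z_{i,t-1}) = (NTh^d)^{-1}\sum_j\sum_s y_{j,s-1}K_{i(t-1),j(s-1)}\big/\hat f_{i,t-1}$. We estimate $v_{it} \equiv x_{it} - E(x_{it}|z_{it})$ by $x_{it} - \hat x_{it}$, and we estimate $w_{it} \equiv \xi_{it} - E(\xi_{it}|z_{it})$ by $\xi_{it} - \hat\xi_{it}$, where

$$ \hat\xi_{it} = (NTh^d)^{-1}\sum_j\sum_s \xi_{js}K_{it,js}\big/\hat f_{it} \tag{2.24} $$

is the kernel estimator of $E(\xi_{it}|z_{it})$. In vector-matrix notation, the feasible IVOLS estimator of $\beta$ is obtained from (2.10) by replacing $E(y_{it}|z_{it})$, $v_{it} = x_{it} - E(x_{it}|z_{it})$ and $w_{it} = \xi_{it} - E(\xi_{it}|z_{it})$ by their kernel estimators $\hat y_{it}$, $x_{it} - \hat x_{it}$ and $\xi_{it} - \hat\xi_{it}$, respectively:

$$ \tilde\beta_{IVO} = [(x - \hat x)'(\xi - \hat\xi)(\xi - \hat\xi)'(x - \hat x)]^{-1}(x - \hat x)'(\xi - \hat\xi)(\xi - \hat\xi)'(y - \hat y). \tag{2.25} $$

Similarly, we have

$$ \tilde\beta_{IVG} = \{(x - \hat x)'\tilde\Omega^{-1}(\xi - \hat\xi)[(\xi - \hat\xi)'\tilde\Omega^{-1}(\xi - \hat\xi)]^{-1}(\xi - \hat\xi)'\tilde\Omega^{-1}(x - \hat x)\}^{-1} $$
$$ \times\,(x - \hat x)'\tilde\Omega^{-1}(\xi - \hat\xi)[(\xi - \hat\xi)'\tilde\Omega^{-1}(\xi - \hat\xi)]^{-1}(\xi - \hat\xi)'\tilde\Omega^{-1}(y - \hat y), \tag{2.26} $$

where $\tilde\Omega^{-1}$ is a consistent estimator of $\Omega^{-1}$ given by

$$ \tilde\Omega^{-1} = I_N \otimes \tilde\Omega_T^{-1}, \tag{2.27} $$

with

$$ \tilde\Omega_T^{-1} = (1/\tilde\sigma_1^2)\bar J_T + (1/\tilde\sigma_\nu^2)E_T, \tag{2.28} $$

$$ \tilde\sigma_\nu^2 = \tilde u'(I_N \otimes E_T)\tilde u/[N(T-1)], \tag{2.29} $$

$$ \tilde\sigma_1^2 = \tilde u'(I_N \otimes \bar J_T)\tilde u/N, \tag{2.30} $$

$$ \tilde\sigma_\mu^2 = (\tilde\sigma_1^2 - \tilde\sigma_\nu^2)/T, \tag{2.31} $$

and $\tilde u$ is of dimension $n \times 1$ with typical element

$$ \tilde u_{it} = y_{it} - \hat y_{it} - (x_{it} - \hat x_{it})'\tilde\beta_{IVO}. \tag{2.32} $$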
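The variance-component estimates that enter $\tilde\Omega^{-1}$ are just within and between sums of squares of the first-stage IVOLS residuals $\tilde u$. The sketch below computes them for stacked residuals ordered individual by individual; the last step backs out $\sigma_\mu^2$ from the definition $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$. The function name is illustrative.

```python
# Sketch: variance components from stacked residuals u (individual-major order).
#   sigma_nu^2 = u'(I_N kron E_T)u / [N(T-1)]    (within sum of squares)
#   sigma_1^2  = u'(I_N kron Jbar_T)u / N        (between sum of squares)
#   sigma_mu^2 = (sigma_1^2 - sigma_nu^2) / T    (from sigma_1^2 = T sigma_mu^2 + sigma_nu^2)

def variance_components(u, n, t):
    within_ss = 0.0
    between_ss = 0.0
    for i in range(n):
        block = u[i * t:(i + 1) * t]
        mean = sum(block) / t
        within_ss += sum((x - mean) ** 2 for x in block)   # ||E_T u_i||^2
        between_ss += t * mean ** 2                         # ||Jbar_T u_i||^2
    sig_nu2 = within_ss / (n * (t - 1))
    sig_12 = between_ss / n
    sig_mu2 = (sig_12 - sig_nu2) / t   # can be negative in small samples
    return sig_nu2, sig_12, sig_mu2


if __name__ == "__main__":
    print(variance_components([1.0, 2.0, 3.0, 4.0, 4.0, 4.0], n=2, t=3))
```

Note that $\tilde\Omega_T^{-1}$ needs only $\tilde\sigma_1^2$ and $\tilde\sigma_\nu^2$, so a negative $\tilde\sigma_\mu^2$ does not prevent computing the feasible GLS weights.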
For the feasible semiparametric IV-Within estimator, we use the same tilde notation to denote the feasible quantities, to avoid introducing too many new symbols. For example, we use $\tilde v_{it}$ to denote the kernel estimator of $v_{it} - \bar v_i$. Recall that $v_{it} = x_{it} - E(x_{it}|z_{it})$. Hence we have

$$ \tilde v_{it} = (x_{it} - \hat x_{it}) - \frac{1}{T}\sum_{s=1}^T (x_{is} - \hat x_{is}). \tag{2.33} $$

Similarly, recalling that $w_{it} = \xi_{it} - E(\xi_{it}|z_{it})$ and $\phi_{it} = E(y_{it}|z_{it})$, we have

$$ \tilde w_{it} = (\xi_{it} - \hat\xi_{it}) - \frac{1}{T}\sum_{s=1}^T (\xi_{is} - \hat\xi_{is}) \tag{2.34} $$

and

$$ \tilde\phi_{it} = \hat y_{it} - \frac{1}{T}\sum_{s=1}^T \hat y_{is}. \tag{2.35} $$

$\tilde y_{it}$ remains the same as before, $\tilde y_{it} = y_{it} - \bar y_i$. With the notation given in (2.33) to (2.35), we obtain the feasible semiparametric IV-Within estimator

$$ \tilde\beta_{IVW} = (\tilde v'\tilde w\tilde w'\tilde v)^{-1}\tilde v'\tilde w\tilde w'(\tilde y - \tilde\phi). \tag{2.36} $$

In the next section we use Monte Carlo simulations to compare the finite sample performance of the new estimators proposed in this chapter with those suggested by Li & Stengos (1996) and Li & Ullah (1998).
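The feasible IV-Within estimator (2.36) needs no variance components: after the kernel step, each series is simply demeaned within individuals. The sketch below (scalar case $p = q = 1$, illustrative names) takes $x - \hat x$, $\xi - \hat\xi$ and $y - \hat y$ as stacked lists; demeaning $y - \hat y$ within individuals yields the dependent variable used in (2.36). The usage example also shows that adding an arbitrary individual effect to $y$ leaves the estimate unchanged.

```python
# Sketch of the feasible IV-Within estimator (2.36) with p = q = 1.
# Inputs are stacked NT lists in individual-major order:
#   x_dm = x - x_hat, xi_dm = xi - xi_hat, y_dm = y - y_hat (kernel residuals).

def within(vec, n, t):
    """(I_N kron E_T) vec: subtract each individual's time mean."""
    out = []
    for i in range(n):
        block = vec[i * t:(i + 1) * t]
        mean = sum(block) / t
        out.extend(x - mean for x in block)
    return out


def iv_within_scalar(x_dm, xi_dm, y_dm, n, t):
    v_t = within(x_dm, n, t)     # as in (2.33)
    w_t = within(xi_dm, n, t)    # as in (2.34)
    y_t = within(y_dm, n, t)     # within-demeaned y - y_hat
    num = sum(a * b for a, b in zip(w_t, y_t))
    den = sum(a * b for a, b in zip(w_t, v_t))
    return num / den


if __name__ == "__main__":
    n, t = 2, 3
    x = [0.3, -1.2, 2.0, 0.7, 1.1, -0.4]
    xi = [1.0, 0.5, -0.2, 0.8, -1.0, 0.3]
    effects = [5.0, -3.0]                      # arbitrary individual effects
    y = [0.5 * x[k] + effects[k // t] for k in range(n * t)]
    print(iv_within_scalar(x, xi, y, n, t))    # the effects are wiped out
```

This invariance to $\mu_i$ is also why the within estimator remains usable when the individual effects are fixed rather than random.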
III. MONTE CARLO RESULTS

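We use the following data generating process (DGP):

$$ y_{it} = \beta y_{i,t-1} + z_{it} + \gamma z_{it}^2 + \mu_i + \nu_{it} = \beta y_{i,t-1} + \theta(z_{it}) + \mu_i + \nu_{it}, \tag{2.37} $$

where $z_{it}$ is i.i.d. uniformly distributed on the interval $[-\sqrt{3}, \sqrt{3}]$, and $\nu_{it}/\sigma_\nu$ and $\mu_i/\sigma_\mu$ are i.i.d. $N(0,1)$. We choose $\beta = 0.5$ and $\gamma = 0$, 0.5, 1. We fix the total variance $\sigma_\mu^2 + \sigma_\nu^2 = 10$ and vary $\rho = \sigma_\mu^2/(\sigma_\mu^2 + \sigma_\nu^2)$ over 0.2, 0.5 and 0.8. We choose $\xi_{it} = z_{i,t-1}$ as the IV for $y_{i,t-1}$. For comparison we also compute the following non-IV semiparametric estimators:

(I) A semiparametric OLS estimator given by

$$ \tilde\beta_{OLS} = [(x - \hat x)'(x - \hat x)]^{-1}(x - \hat x)'(y - \hat y). \tag{2.38} $$

(II) A semiparametric GLS estimator defined by

$$ \tilde\beta_{GLS} = [(x - \hat x)'\tilde\Omega^{-1}(x - \hat x)]^{-1}(x - \hat x)'\tilde\Omega^{-1}(y - \hat y). \tag{2.39} $$

(III) A semiparametric within estimator

$$ \tilde\beta_{W} = [\tilde v'\tilde v]^{-1}\tilde v'(\tilde y - \tilde\phi), \tag{2.40} $$

where $\tilde v_{it} = (x_{it} - \hat x_{it}) - (1/T)\sum_{s=1}^T (x_{is} - \hat x_{is})$ is the same as defined in (2.33), and $\tilde y_{it} - \tilde\phi_{it} = (y_{it} - \hat y_{it}) - (1/T)\sum_{s=1}^T (y_{is} - \hat y_{is})$.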
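The data generating process (2.37) can be reproduced with a few lines of standard-library Python. The sketch below mirrors the Appendix program: $y_{i0} = 0$, 30 burn-in periods are discarded, $z_{it}$ is drawn uniformly on $[-\sqrt{3}, \sqrt{3}]$, and $\sigma_\mu^2 = \rho \cdot 10$, $\sigma_\nu^2 = (1 - \rho) \cdot 10$. It returns $y_{it}$, $y_{i,t-1}$, $z_{it}$ and the instrument $z_{i,t-1}$. The function name and return layout are assumptions of this sketch, and Python's random-number generator will not reproduce the GAUSS draws.

```python
import math
import random

# Sketch of one draw from the DGP (2.37):
#   y_it = beta*y_{i,t-1} + z_it + gamma*z_it^2 + mu_i + nu_it.

def simulate_panel(n, t, beta=0.5, gamma=0.0, rho=0.2, total_var=10.0,
                   burn_in=30, seed=None):
    """Return (y, y_lag, z, z_lag), each a list of n rows of length t."""
    rng = random.Random(seed)
    sig_mu = math.sqrt(rho * total_var)           # sigma_mu^2 = rho * total
    sig_nu = math.sqrt((1.0 - rho) * total_var)   # sigma_nu^2 = (1 - rho) * total
    bound = math.sqrt(3.0)
    periods = burn_in + t + 1                     # index 0 holds y_i0 = 0
    y, y_lag, z, z_lag = [], [], [], []
    for _ in range(n):
        mu = sig_mu * rng.gauss(0.0, 1.0)
        zs = [rng.uniform(-bound, bound) for _ in range(periods)]
        ys = [0.0] * periods
        for s in range(1, periods):
            ys[s] = (beta * ys[s - 1] + zs[s] + gamma * zs[s] ** 2
                     + mu + sig_nu * rng.gauss(0.0, 1.0))
        # keep the last t periods; the lags are the same series shifted by one
        y.append(ys[burn_in + 1:])
        y_lag.append(ys[burn_in:burn_in + t])
        z.append(zs[burn_in + 1:])
        z_lag.append(zs[burn_in:burn_in + t])
    return y, y_lag, z, z_lag
```

Stacking the rows of each output (individual by individual) gives the $NT$ vectors used by the estimators above; for example, `simulate_panel(100, 6, gamma=1.0, rho=0.8, seed=1)` corresponds to one replication of the $N = 100$, $\rho = 0.8$, $\gamma = 1$ design.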
The estimators (I)–(III) do not use instrumental variables, and hence they are expected to have large bias because they ignore the fact that $y_{i,t-1}$ and $u_{it}$ are correlated. However, they are also expected to have smaller variances than the IV estimators. Therefore, in small and moderate samples their mean square errors (MSE) are not necessarily larger than those of the semiparametric IV estimators. Of course, when the sample size is sufficiently large we expect the semiparametric IV estimators to have smaller MSE because, after all, they are consistent, while the non-IV estimators are inconsistent: the bias of the non-IV estimators does not die out as the sample size increases. We report the estimated bias, standard deviation (Std) and root mean square error (Rmse) for all the estimators. These are computed via

$$ \text{Bias}(\tilde\beta) = M^{-1}\sum_{j=1}^M (\tilde\beta_j - \beta), $$

$$ \text{Std}(\tilde\beta) = \Big\{M^{-1}\sum_{j=1}^M \big(\tilde\beta_j - \text{Mean}(\tilde\beta)\big)^2\Big\}^{1/2} $$

and

$$ \text{Rmse}(\tilde\beta) = \Big\{M^{-1}\sum_{j=1}^M (\tilde\beta_j - \beta)^2\Big\}^{1/2}, $$

where $M$ is the number of replications and $\tilde\beta_j$ is the estimated value of $\beta$ at the $j$th replication. We use $M = 2000$ in all the simulations. We choose $T = 6$ and $N = 50$, 100, 200, 500.
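The three summary statistics are easy to compute from the $M$ replicated estimates. With the $M^{-1}$ normalization used for all three, they satisfy $\text{Rmse}^2 = \text{Bias}^2 + \text{Std}^2$ exactly, which gives a handy consistency check on the tables. The function name below is illustrative.

```python
import math

# Sketch: Monte Carlo Bias, Std and Rmse of M replicated estimates of beta,
# each using the 1/M normalisation of the formulas above.

def mc_summary(estimates, true_value):
    m = len(estimates)
    mean = sum(estimates) / m
    bias = mean - true_value
    std = math.sqrt(sum((b - mean) ** 2 for b in estimates) / m)
    rmse = math.sqrt(sum((b - true_value) ** 2 for b in estimates) / m)
    return bias, std, rmse


if __name__ == "__main__":
    bias, std, rmse = mc_summary([0.4, 0.6, 0.7, 0.5], true_value=0.5)
    print(bias, std, rmse)
    print(abs(rmse ** 2 - (bias ** 2 + std ** 2)) < 1e-12)   # True
```

For example, $\sqrt{0.103^2 + 0.056^2} \approx 0.117$, which matches the Rmse reported for the semiparametric GLS estimator with N = 50 and ρ = 0.2 in Table 1.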
The simulation results are given in Tables 1 and 2. The smallest Rmse for each case (for a given $N$ and $\rho$) is shown in boldface. The simulation results are qualitatively similar for $\gamma = 0$, $\gamma = 0.5$ and $\gamma = 1$; therefore, to save space, we only report the cases $\gamma = 0$ and $\gamma = 1$. Table 1 reports the results for $\gamma = 0$. From Table 1 we see that the non-IV estimators $\tilde\beta_{OLS}$, $\tilde\beta_{GLS}$ and $\tilde\beta_W$ have large bias, because they ignore the fact that $y_{i,t-1}$ is correlated with $u_{it}$. However, these non-IV estimators all have smaller standard deviations (or variances) than the semiparametric IV estimators. When $N$ is small ($N \le 100$) and $\rho$ is small to moderate ($\rho \le 0.5$), $\tilde\beta_{GLS}$ has the smallest Rmse among all the estimators. For $N \le 100$ with $\rho = 0.8$, $\tilde\beta_{GLS}$ is no longer the best: the strong individual effects make its bias so large that its Rmse is much larger than those of the IV estimators, and $\tilde\beta_{IVG}$ and $\tilde\beta_{IVW}$ have the smallest Rmse. For $N = 200$ and $N = 500$ with small $\rho = 0.2$, $\tilde\beta_{IVO}$ has the smallest Rmse, while for larger values of $\rho$ ($\rho = 0.5$, 0.8), $\tilde\beta_{IVG}$ and $\tilde\beta_{IVW}$ become the best in terms of the Rmse criterion. As $N$ increases, the bias of $\tilde\beta_{OLS}$, $\tilde\beta_{GLS}$ and $\tilde\beta_W$ remains of the same order, as expected. The variances of the IV estimators decrease as $N$ increases, and as a result the IV estimators dominate the non-IV estimators when $N \ge 200$. Among the IV estimators, the IVOLS estimator has the smallest Rmse for $\rho = 0.2$, whereas for $\rho = 0.5$ and $\rho = 0.8$ the IVGLS and IVWithin estimators have much smaller Rmse than the IVOLS estimator. The IVOLS estimator ignores the one-way error component structure; hence, when the individual effects are large, its performance is expected to be worse than that of the IVGLS estimator. We also observe, as expected, that the bias of the non-IV estimators increases as $\rho$ increases.
We also observe that the Rmse of the IVOLS estimator remains roughly the same for different values of $\rho$ (at least for $N \ge 200$), while for the IVGLS and IVWithin estimators the Rmse decreases as $\rho$ increases. Next, we observe that the results in Table 2 are very similar to those in Table 1; that is, the results are not sensitive to the functional form of $\theta(z_{it})$. This is as expected, because all the estimators are semiparametric and hence robust to the functional form specification of $\theta(\cdot)$. The DGP given in (2.37) is a just-identified model. We have also conducted some simulations for an over-identified model. In particular, we consider the following model:
Table 1. The case of γ = 0.

                ρ = 0.2                 ρ = 0.5                 ρ = 0.8
         Bias   Std    Rmse      Bias   Std    Rmse      Bias   Std    Rmse
N = 50
OLS      0.193  0.045  0.198     0.352  0.030  0.353     0.442  0.016  0.442
GLS      0.103  0.056  0.117     0.099  0.059  0.115     0.310  0.040  0.313
W        0.241  0.058  0.248     0.213  0.057  0.220     0.136  0.061  0.149
IVO      0.019  0.290  0.291     0.042  0.329  0.331     0.128  2.39   2.39
IVG      0.006  0.215  0.215     0.008  0.171  0.171     0.012  0.111  0.112
IVW      0.005  0.225  0.225     0.009  0.174  0.174     0.013  0.111  0.112
N = 100
OLS      n/a    n/a    0.199     0.354  0.021  0.355     0.443  0.011  0.443
GLS      n/a    n/a    0.111     0.100  0.040  0.108     0.312  0.027  0.313
W        n/a    n/a    0.246     0.220  0.040  0.223     0.154  0.042  0.160
IVO      n/a    n/a    0.139     0.023  0.158  0.159     0.049  0.528  0.530
IVG      n/a    n/a    0.146     0.007  0.117  0.117     0.009  0.077  0.077
IVW      n/a    n/a    0.151     0.008  0.118  0.118     0.010  0.076  0.077
N = 200
OLS      n/a    n/a    0.200     0.356  0.015  0.356     0.444  0.008  0.444
GLS      n/a    n/a    0.108     0.100  0.029  0.104     0.312  0.020  0.312
W        n/a    n/a    0.246     0.224  0.029  0.226     0.166  0.029  0.168
IVO      n/a    n/a    0.097     0.010  0.101  0.101     0.016  0.106  0.107
IVG      n/a    n/a    0.103     0.005  0.083  0.083     0.007  0.054  0.055
IVW      n/a    n/a    0.105     0.006  0.084  0.084     0.007  0.054  0.055
N = 500
OLS      n/a    n/a    0.200     0.357  0.009  0.357     0.444  0.005  0.444
GLS      n/a    n/a    0.106     0.100  0.018  0.101     0.311  0.013  0.311
W        n/a    n/a    0.245     0.227  0.018  0.228     0.176  0.018  0.177
IVO      n/a    n/a    0.058     0.003  0.057  0.057     0.004  0.057  0.058
IVG      n/a    n/a    0.065     0.006  0.052  0.053     0.006  0.034  0.034
IVW      n/a    n/a    0.067     0.006  0.053  0.053     0.006  0.034  0.034
Table 2. The case of γ = 1.

                ρ = 0.2                 ρ = 0.5                 ρ = 0.8
         Bias   Std    Rmse      Bias   Std    Rmse      Bias   Std    Rmse
N = 50
OLS      n/a    n/a    0.196     0.348  0.031  0.350     0.438  0.016  0.439
GLS      n/a    n/a    0.117     0.092  0.058  0.109     0.298  0.041  0.301
W        n/a    n/a    0.244     0.208  0.057  0.216     0.132  0.059  0.144
IVO      n/a    n/a    0.302     0.045  0.341  0.344     0.168  3.53   3.53
IVG      n/a    n/a    0.215     0.008  0.171  0.172     0.012  0.112  0.112
IVW      n/a    n/a    0.225     0.009  0.174  0.174     0.013  0.111  0.112
N = 100
OLS      0.194  0.031  0.196     0.351  0.021  0.352     0.440  0.012  0.440
GLS      0.104  0.039  0.111     0.094  0.040  0.102     0.299  0.028  0.301
W        0.238  0.041  0.242     0.214  0.040  0.218     0.148  0.041  0.153
IVO      0.008  0.139  0.139     0.023  0.156  0.158     0.042  0.243  0.246
IVG      0.006  0.146  0.146     0.007  0.117  0.118     0.009  0.077  0.077
IVW      0.006  0.150  0.150     0.008  0.118  0.119     0.010  0.077  0.077
N = 200
OLS      n/a    n/a    0.197     0.353  0.015  0.353     0.441  0.008  0.441
GLS      n/a    n/a    0.108     0.093  0.029  0.097     0.298  0.021  0.299
W        n/a    n/a    0.241     0.218  0.028  0.220     0.158  0.028  0.161
IVO      n/a    n/a    0.097     0.010  0.101  0.101     0.016  0.106  0.107
IVG      n/a    n/a    0.103     0.005  0.083  0.083     0.007  0.054  0.055
IVW      n/a    n/a    0.105     0.006  0.084  0.084     0.007  0.054  0.055
N = 500
OLS      n/a    n/a    0.197     0.353  0.009  0.353     0.441  0.005  0.441
GLS      n/a    n/a    0.106     0.092  0.018  0.094     0.297  0.013  0.298
W        n/a    n/a    0.241     0.221  0.018  0.222     0.167  0.018  0.168
IVO      n/a    n/a    0.058     0.003  0.057  0.057     0.004  0.057  0.058
IVG      n/a    n/a    0.065     0.006  0.052  0.053     0.006  0.034  0.035
IVW      n/a    n/a    0.067     0.006  0.053  0.053     0.006  0.034  0.035
$$ y_{it} = \beta y_{i,t-1} + z_{1,it} + \gamma_1 z_{1,it}^2 + z_{2,it} + \gamma_2 z_{2,it}^2 + \mu_i + \nu_{it} = \beta y_{i,t-1} + \theta(z_{1,it}, z_{2,it}) + \mu_i + \nu_{it}. \tag{2.41} $$

The simulation results for the above over-identified model lead to the same conclusions as for the just-identified model. Therefore, to save space, we do not report the results for the over-identified case; they are available from the authors upon request.
IV. CONCLUDING REMARKS

In this chapter we consider the problem of estimating a semiparametric partially linear panel data model with errors that have a one-way error component structure. We propose two new semiparametric IV estimators for the coefficient of the parametric component, and we argue that the new semiparametric estimators are more efficient than the ones suggested by Li & Stengos (1996) and Li & Ullah (1998) because the new estimators make full use of the one-way error component structure. The Monte Carlo simulation results confirm our theoretical analysis. Throughout the chapter we assume the existence of random individual effects. In practice one may want to test for the existence of random individual effects. For this purpose one can use the test statistic suggested by Li & Hsiao (1998) for testing the null of no random individual effects in a partially linear dynamic panel data model. Also, in this chapter we only consider the case where $\mu_i$ is a random effect. We now briefly discuss the case of fixed effects semiparametric partially linear models. The model is the same as given in (2.1) and (2.2), except that we now assume that the individual effect $\mu_i$ is a fixed effect rather than a random effect. The semiparametric IVOLS and IVGLS estimators, which either ignore the fixed effects or treat them as random effects, will not lead to consistent estimation, for the same reason as in the parametric regression model. However, the semiparametric within estimator, which wipes out the individual effects whether they are fixed or random, remains consistent in the fixed effects model. Our Monte Carlo results in Section 3 show that the semiparametric within estimator $\tilde\beta_{IVW}$ performs quite well relative to the other estimators. Therefore, we recommend its use in practice.
ACKNOWLEDGMENTS
We would like to thank a referee and Badi Baltagi for very useful comments that greatly improved the paper. Q. Li's research is supported by the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, Ontario Premier's Research Excellence Awards, and the Bush program in economics on public policy. A. Ullah thanks the Academic Senate of UCR for research support.
NOTES
1. Li & Ullah (1998) reported some Monte Carlo results on a static semiparametric panel data model. They also proposed two semiparametric instrumental variable estimators for a semiparametric dynamic panel data model, but they did not conduct any Monte Carlo simulations for the dynamic model.
2. The simple spectral decomposition method used to derive the inverse of $\Omega$ was proposed by Wansbeek & Kapteyn (1982, 1983).
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics, 68, 5–27.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models With Error Components. Journal of the American Statistical Association, 76, 598–606.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental Variable Estimation of Error Components Models. Journal of Econometrics, 68, 28–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. New York: Wiley.
Baltagi, B. H., & Griffin, J. M. (1997). Pooled Estimators vs. Their Heterogeneous Counterparts in the Context of Dynamic Demand for Gasoline. Journal of Econometrics, 77, 303–327.
Chamberlain, G. (1984). Panel Data. In: Z. Griliches & M. Intriligator (Eds), Handbook of Econometrics (Vol. II, pp. 1247–1318). Amsterdam: North Holland.
Donald, S. G., & Newey, W. K. (1994). Series Estimation of Semilinear Regression. Journal of Multivariate Analysis, 50, 30–40.
Engle, R. F., Granger, C. W. J., Rice, J., & Weiss, A. (1986). Semiparametric Estimates of the Relationship Between Weather and Electricity Sales. Journal of the American Statistical Association, 81, 310–320.
Hsiao, C. (1986). Analysis of Panel Data. Econometric Society Monograph No. 11. Cambridge: Cambridge University Press.
Khanna, M., Mundra, K., & Ullah, A. (1999). Parametric and Semiparametric Estimation of the Effect of Firm Attributes on Efficiency: The Electricity Generating Sector in India. Journal of International Trade and Economic Development, forthcoming.
Kiviet, J. F. (1995). On Bias, Inconsistency and Efficiency of Some Estimators in Dynamic Panel Data Models. Journal of Econometrics, 68, 53–78.
Li, Q. (1996). On the Root-n-Consistent Semiparametric Estimation of Partially Linear Models. Economics Letters, 51, 277–285.
Li, Q., & Hsiao, C. (1998). Testing Serial Correlation in Semiparametric Panel Data Models. Journal of Econometrics, 87, 207–237.
Li, Q., & Stengos, T. (1996). Semiparametric Estimation of Partially Linear Panel Data Models. Journal of Econometrics, 71, 389–397.
Li, Q., & Ullah, A. (1998). Estimating Partially Linear Models with One-Way Error Components. Econometric Reviews, 17, 145–166.
Matyas, L., & Sevestre, P. (1992). The Econometrics of Panel Data (2nd ed.). Dordrecht: Kluwer.
Pesaran, M. H., & Smith, R. (1995). Estimation of Long-Run Relationships From Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79–114.
Robinson, P. M. (1988). Root-N-Consistent Semiparametric Regression. Econometrica, 56, 931–954.
Stock, J. H. (1989). Nonparametric Policy Analysis. Journal of the American Statistical Association, 84, 567–575.
Ullah, A., & Roy, N. (1998). Nonparametric and Semiparametric Econometrics of Panel Data. In: A. Ullah & D. E. A. Giles (Eds), Handbook on Applied Economic Statistics (Ch. 17, pp. 579–604). Marcel Dekker.
Ullah, A., & Mundra, K. (1999). Semiparametric Panel Data Estimation: An Application to Immigrates Homelink Effect on U.S. Producer Trade Flows. Working paper 15, Department of Economics, University of California at Riverside.
Wansbeek, T. J., & Kapteyn, A. (1982). A Simple Way to Obtain the Spectral Decomposition of Variance Components Models for Balanced Data. Communications in Statistics, A11, 2105–2112.
Wansbeek, T. J., & Kapteyn, A. (1983). A Note on Spectral Decomposition and Maximum Likelihood Estimation of ANOVA Models With Balanced Data. Statistics and Probability Letters, 1, 213–215.
White, H. (1984). Asymptotic Theory for Econometricians. New York: Academic Press.
White, H. (1986). Instrumental Variables Analogs of Generalized Least Squares Estimator. In: R. S. Mariano (Ed.), Advances in Statistical Analysis and Statistical Computing (Vol. 1, pp. 173–277). New York: JAI Press.
APPENDIX
/** This is a GAUSS program using Monte Carlo simulation to examine the finite sample
    performance of some semiparametric instrumental variable estimators in a
    semiparametric dynamic panel data model, written by M. Douglas Berg **/

output file = c:\gauss\doug\work1.out reset;
format /rd 8,3;

n = 100; T = 6; T00 = 30;            @ T00 = number of burn-in periods @
T0 = T + T00 + 1; NT = N*T;
nr = 500;                            @ number of replications @
lamt = 0.5; b1 = 1; b2 = 0;          @ true values: y = lamt*y(-1) + b1*z + b2*z^2 + mu + nu @
sig2 = 10; rho = 0.8;
sigmu2 = rho*sig2; signu2 = (1-rho)*sig2;
sigmu = sqrt(sigmu2); signu = sqrt(signu2);
s1_5 = sqrt(t*sigmu2 + signu2); sv_5 = signu;   @ true sigma_1 and sigma_nu @

ycz = zeros(nt,1); y1cz = ycz; z1cz = ycz; fz = ycz;
kel = zeros(nt,1);
lam1 = zeros(nr,1); lam3 = lam1; lam1n = lam1; lam3n = lam1; lam4n = lam1; lam6n = lam1;
y0 = zeros(n,t0);
rndseed 7893450;

i1 = 1;
do while i1 <= nr;                   @ Monte Carlo simulation loop @
    z0 = 2*sqrt(3)*rndu(n,t0) - sqrt(3);     @ z ~ U[-sqrt(3), sqrt(3)] @
    u0 = rndn(n,t0);
    mu = rndn(n,1);

    i2 = 2;
    do while i2 <= t0;               @ generate y @
        y0[.,i2] = lamt*y0[.,i2-1] + b1*z0[.,i2] + b2*z0[.,i2]^2
                   + signu*u0[.,i2] + sigmu*mu;
        i2 = i2 + 1;
    endo;
    y = y0[.,T00+1:T00+T];   y1 = y0[.,T00:T00+T-1];
    z = z0[.,T00+1:T00+T];   z1 = z0[.,T00:T00+T-1];
    yv = reshape(y,nt,1);    y1v = reshape(y1,nt,1);
    zv = reshape(z,nt,1);    z1v = reshape(z1,nt,1);
    hz = stdc(zv)*(nt^(-1/5));  hz1 = stdc(z1v)*(nt^(-1/5));   @ rule-of-thumb bandwidths @
    zvh = zv/hz;  z1vh = z1v/hz1;

    i3 = 1;
    do while i3 <= nt;               @ nonparametric estimation loop @
        zd = zvh[i3,.] - zvh;
        z1d = z1vh[i3,.] - z1vh;
        kelz = prodc( (exp(-0.5*zd^2))' )/sqrt(2*pi);     @ Gaussian kernel weights K(it,js) @
        kelz1 = prodc( (exp(-0.5*z1d^2))' )/sqrt(2*pi);   @ computed but not used below @
        ycz[i3,.] = yv'*kelz/(nt*hz);
        y1cz[i3,.] = y1v'*kelz/(nt*hz);                   @ E(y(i,t-1)|z(it)) @
        z1cz[i3,.] = z1v'*kelz/(nt*hz);                   @ E(z(i,t-1)|z(it)) @
        fz[i3,.] = sumc(kelz)/(nt*hz);
        i3 = i3 + 1;
    endo;
    w1v = z1v - z1cz./fz;            @ instrument: xi - E(xi|z), xi = z(i,t-1) @
    xxv = y1v - y1cz./fz;            @ regressor:  x - E(x|z),   x = y(i,t-1) @
    yyv = yv - ycz./fz;              @ y - E(y|z) @

    @ Li-Ullah, Li-Stengos IV @
    lam1[i1,.] = inv(w1v'*xxv)*w1v'*yyv;     @ IV-OLS estimator @
    lam3[i1,.] = inv(xxv'*xxv)*xxv'*yyv;     @ Semi-OLS estimator @
    u01 = yyv - xxv*lam1[i1,.];
    u03 = yyv - xxv*lam3[i1,.];

    Jbt = ones(t,t)/t; Et = eye(t) - Jbt;

    @ variance components from the IV-OLS residuals @
    u11 = Et*(reshape(u01,n,t))';  u11 = reshape(u11,nt,1);
    sv2 = u11'*u11/(n*(t-1));
    u22 = Jbt*(reshape(u01,n,t))'; u22 = reshape(u22,nt,1);
    smu2 = (u22'*u22/n - sv2)/t;   @ u22'u22/n estimates sigma_1^2 = T*sigma_mu^2 + sigma_nu^2 @
    s12 = sv2 + t*smu2;
    sv_1 = sqrt(sv2); s1_1 = sqrt(s12);

    @ variance components from the semi-OLS residuals @
    u11 = Et*(reshape(u03,n,t))';  u11 = reshape(u11,nt,1);
    sv2 = u11'*u11/(n*(t-1));
    u22 = Jbt*(reshape(u03,n,t))'; u22 = reshape(u22,nt,1);
    smu2 = (u22'*u22/n - sv2)/t;
    s12 = sv2 + t*smu2;
    sv_3 = sqrt(sv2); s1_3 = sqrt(s12);

    @ T x T blocks of Omega^(-1/2) (up to scale) and of the within transformation @
    At_1 = Jbt/s1_1 + Et/sv_1;
    At_3 = Jbt/s1_3 + Et/sv_3;
    At_5 = Jbt/s1_5 + Et/sv_5;       @ true values, not used below @
    At_w = Et;

    yyn_1 = At_1*(reshape(yyv,n,t))';
    yyn_3 = At_3*(reshape(yyv,n,t))';
    yyn_6 = At_w*(reshape(yyv,n,t))';
    xxn_1 = At_1*(reshape(xxv,n,t))';
    xxn_3 = At_3*(reshape(xxv,n,t))';
    xxn_6 = At_w*(reshape(xxv,n,t))';
    w1n_w = At_w*(reshape(w1v,n,t))';
    w1n = At_1*(reshape(w1v,n,t))';

    @ stack back to nt x 1 (time-major order; inner products are unaffected) @
    yyv_1 = reshape(yyn_1,nt,1); yyv_3 = reshape(yyn_3,nt,1); yyv_6 = reshape(yyn_6,nt,1);
    xxv_1 = reshape(xxn_1,nt,1); xxv_3 = reshape(xxn_3,nt,1); xxv_6 = reshape(xxn_6,nt,1);
    w1v_w = reshape(w1n_w,nt,1);
    w1v = reshape(w1n,nt,1);         @ overwrites w1v with the GLS-transformed instrument @

    lam1n[i1,.] = inv(w1v'*xxv_1)*w1v'*yyv_1;         @ IV-GLS estimator @
    lam3n[i1,.] = inv(xxv_3'*xxv_3)*xxv_3'*yyv_3;     @ Semi-GLS estimator @
    lam4n[i1,.] = inv(w1v_w'*xxv_6)*w1v_w'*yyv_6;     @ IV-Within estimator @
    lam6n[i1,.] = inv(xxv_6'*xxv_6)*xxv_6'*yyv_6;     @ Semi-Within estimator @

    i1 = i1 + 1;
endo;

Bias1 = meanc(lam1 - lamt);                 @ Bias @
Bias3 = meanc(lam3 - lamt);
rmse1 = sqrt(meanc((lam1-lamt)^2));         @ Root-MSE @
rmse3 = sqrt(meanc((lam3-lamt)^2));
std1 = stdc(lam1);                          @ Standard deviation @
std3 = stdc(lam3);
Bias1n = meanc(lam1n - lamt); Bias3n = meanc(lam3n - lamt);
Bias4n = meanc(lam4n - lamt); Bias6n = meanc(lam6n - lamt);
rmse1n = sqrt(meanc((lam1n-lamt)^2)); rmse3n = sqrt(meanc((lam3n-lamt)^2));
rmse4n = sqrt(meanc((lam4n-lamt)^2)); rmse6n = sqrt(meanc((lam6n-lamt)^2));
std1n = stdc(lam1n); std3n = stdc(lam3n); std4n = stdc(lam4n); std6n = stdc(lam6n);

print "********************************************************";
print "IVO1, bias1, std1, rmse1 = " bias1 std1 rmse1;
print "OLS, bias3, std3, rmse3 = " bias3 std3 rmse3;
print "********************************************************";
print "IVG1, bias1n, std1n, rmse1n = " bias1n std1n rmse1n;
print "GLS, bias3n, std3n, rmse3n = " bias3n std3n rmse3n;
print "********************************************************";
print "With1, bias4n, std4n, rmse4n = " bias4n std4n rmse4n;
print "With, bias6n, std6n, rmse6n = " bias6n std6n rmse6n;
print "********************************************************";
end;
SMALL SAMPLE PERFORMANCE OF DYNAMIC PANEL DATA ESTIMATORS IN ESTIMATING THE GROWTH-CONVERGENCE EQUATION: A MONTE CARLO STUDY
Nazrul Islam
ABSTRACT
This chapter conducts a Monte Carlo investigation into the small sample properties of some of the dynamic panel data estimators that have been applied to estimate the growth-convergence equation using the Summers–Heston data set. The results show that OLS estimation of this equation is likely to yield seriously upward-biased estimates. However, indiscriminate use of panel estimators is also risky, because some of them display large bias and mean square error. Yet there are panel estimators that have much smaller bias and mean square error. Through a judicious choice of panel estimators it is therefore possible to obtain better estimates of the parameters of the growth-convergence equation. Growth researchers may make use of this potential.
I. INTRODUCTION
One of the issues around which the recent growth literature has evolved is that of convergence. This refers to the idea that, because of diminishing returns to capital, poorer economies should grow faster and catch up with the richer ones. Statistically, convergence is therefore interpreted as a negative correlation between the initial level of income and the subsequent growth rate. Accordingly, a popular method for testing the convergence hypothesis has been to run growth-initial level regressions, or growth-convergence regressions, in which subsequent growth rates are regressed on initial levels of income. For a long time, growth-convergence regressions were estimated using cross-section data. Recently, however, researchers have drawn attention to the fact that the growth-convergence equation actually represents a dynamic panel data model and that, by ignoring the individual effects, cross-section estimation courts omitted variable bias (OVB). Thus, Islam (1993, 1995) argues for using panel procedures to overcome this bias and, in particular, implements Chamberlain's (1982, 1983) Minimum Distance (MD) procedure to estimate the equation. Knight et al. (1993) make similar arguments and also use the Minimum Distance procedure, obtaining similar results. Islam, in addition, presents results from the Least Squares with Dummy Variables (LSDV) procedure. Since these initial works, panel estimation of the growth-convergence equation has spread considerably. For example, Lee, Pesaran & Smith (1997, 1998) consider maximum likelihood estimation of the growth-convergence equation using panel data. Caselli et al. (1996) emphasize the problem of endogeneity in this equation and use the Arellano-Bond GMM panel procedure to overcome it. Barro (1997) and Barro & Sala-i-Martin (1995) use pooled estimation on panel data sets. Lee et al. (1998) also present evidence on panel estimation of the growth-convergence equation. The panel estimates presented in these papers generally differ from the corresponding cross-section estimates. However, they also differ among themselves.
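The link between a growth-initial level regression and a dynamic panel model can be seen in one elementary step. Using the notation of equation (1) in Section III (γ for the coefficient on lagged log income, β for the exogenous variable, μ_i and η_t for the individual and time effects), subtracting y_{i,t-1} from both sides gives a regression of growth on initial income:

```latex
\begin{aligned}
y_{it} &= \gamma\, y_{i,t-1} + \beta\, x_{i,t-1} + \mu_i + \eta_t + v_{it},\\
y_{it} - y_{i,t-1} &= (\gamma - 1)\, y_{i,t-1} + \beta\, x_{i,t-1} + \mu_i + \eta_t + v_{it}.
\end{aligned}
```

Growth over the interval is therefore negatively related to initial income whenever γ < 1. A cross-section version of this regression leaves μ_i in the error term, and because μ_i is correlated with the regressors, this is where the omitted variable bias comes from.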
Nerlove (1999) highlights this by using a variety of panel estimators to estimate the growth-convergence equation and compiling the results. Similar findings were presented earlier in Islam (1993). This creates the problem of choosing among the various panel estimators. Unfortunately, the theoretical properties of dynamic panel data estimators are generally asymptotic and often equivalent, so Monte Carlo studies are needed to ascertain their small sample properties. Monte Carlo studies are, however, more useful when they are customized to the specification and the data set used in actual estimation. Although many researchers have recently presented Monte Carlo evidence on the small sample properties of dynamic panel estimators, studies focusing on the growth-convergence equation and using the Summers-Heston (1988, 1991) data set are rare. This chapter tries to help fill this gap. The study focuses on those estimators that have been used so far to estimate the growth-convergence equation. Accordingly, the estimators included are: least squares with dummy variables (LSDV); the two instrumental variable estimators of Anderson & Hsiao (1981, 1982), namely AH(l), based on level instruments, and AH(d), based on difference instruments; the minimum distance (MD) estimator suggested by Chamberlain (1982, 1983); and the one-step (ABGMM1) and two-step (ABGMM2) generalized method of moments estimators proposed by Arellano & Bond (1991). In addition, the exercise includes simultaneous equations (SE) estimators, namely two stage least squares (2SLS), three stage least squares (3SLS), and generalized three stage least squares (G3SLS). To complete the picture, the study also includes the (pooled) ordinary least squares (OLS) estimator, which ignores the individual effects.

The two main parameters of the model are γ, the dynamic adjustment parameter attached to the lagged dependent variable, and β, the parameter of the exogenous variable. The Monte Carlo results show that the OLS estimates of γ are, as expected, positively biased, with the bias averaging about seventeen percent of the true parameter value. For most of the panel estimators the bias is negative, with only the AH(d) estimator providing some exceptions. The bias is small for the AH(d), LSDV, and MD estimators, ranging between five and six percent. The bias of the 2SLS, 3SLS, and G3SLS estimates of γ ranges between eight and ten percent. The largest bias is observed for the ABGMM estimators, averaging twenty-two percent. The AH(l) estimator performs so poorly that we refrain from reporting its results. The results regarding root mean square error (RMSE) show a similar pattern. The average RMSE as a percentage of the true value of γ is seventeen percent for the OLS estimator. For the LSDV and MD estimators, this percentage ranges between six and seven.
For the AH(d), 2SLS, 3SLS, and G3SLS estimators, it ranges between ten and twenty. This percentage is highest for the ABGMM estimators, ranging between forty and forty-six percent. With regard to β, the bias of the OLS estimates is again positive, but now averages a much higher forty-eight percent of the parameter value. The direction of bias of the panel estimates of β is quite mixed. However, the panel estimates of β are on average quite close to the true parameter value. The magnitude of the algebraic average of the bias remains under one percent for the 2SLS, 3SLS, LSDV, and MD estimators. For AH(d) and G3SLS it ranges between one and two percent. For the ABGMM estimates this percentage is higher, but still within five to seven percent.
The RMSE results for β display a similar ranking of performance. However, the smallness of the bias in estimating β is largely nullified by the large variance of the estimates. As a result, the RMSE values for β are in general much higher than those for γ. For a good number of panel estimators, including AH(d), 2SLS, and 3SLS, the RMSE remains under thirty-five percent of the true value of β. For the LSDV and MD estimators, this percentage is under twenty-five. For G3SLS, however, it is fifty-six percent, and for the ABGMM estimators it is around two hundred percent. For OLS the ratio is about fifty-seven percent. The results indicate that OLS estimation of the growth-convergence equation is very likely to give considerably biased results. However, indiscriminate use of panel estimators is risky too. Yet there are panel estimators with much smaller bias and RMSE than OLS. Hence, a judicious choice of panel estimator has the potential to yield much better estimates of the parameters of the growth-convergence equation. Growth researchers may make use of this potential. In addition, several general points emerge from this study. First, the performances of the two AH estimators contrast sharply. The source of this contrast lies in the different degrees of correlation of the instruments with the instrumented variable, which highlights the importance of research into estimation with weak instruments. Second, comparing the ABGMM1 results with the ABGMM2 results, and the 2SLS results with either the 3SLS or the G3SLS results, shows that simpler estimators that do not require estimated weighting matrices may perform better than more sophisticated estimators that do. Estimated weighting matrices open an avenue for unwarranted noise to enter the estimation. Third, increasing the number of instruments does not necessarily improve the estimation results, as revealed by the poor performance of the ABGMM estimators compared with that of AH(d).
Fourth, theoretically inconsistent estimators can display good small sample performance. The performance of the LSDV estimator, which is inconsistent as N grows with T fixed, illustrates this. Finally, the results of this chapter are in general agreement with other recent Monte Carlo studies, which have also reported large bias of the ABGMM estimators and better performance of the LSDV estimator. The rest of the chapter is organized as follows. Section II reviews previous Monte Carlo studies of dynamic panel estimators and specifies the objectives of the current study. Section III presents the model and discusses the data generation processes. Section IV presents the results. Section V contains some concluding remarks.
II. PREVIOUS MONTE CARLO STUDIES

Much of the recent empirical research on growth has revolved around estimation of the growth-convergence equation. A close inspection of this equation shows that it is actually a dynamic panel data model.1 A cross-section estimation of the equation therefore suffers from omitted variable bias. This has led to panel estimation. Different panel estimators have, however, produced different results. Since the theoretical properties of many of these estimators are asymptotic and equivalent, Monte Carlo evidence is necessary to gauge which of these estimates are more acceptable. The issue of small sample properties of dynamic panel estimators is not new. The gas demand study by Balestra & Nerlove (1966) had already raised it, which led Nerlove to conduct several Monte Carlo studies. Nerlove (1967) considers a simple autoregressive model with no exogenous variable and compares the performance of the OLS, LSDV, MLE, and several variants of the GLS estimator in estimating the model. In Nerlove (1971), the dynamic panel model is extended to include an exogenous variable. This allows consideration of an instrumental variable (IV) estimator that uses lagged values of the exogenous variable as instruments, as well as another variant of the two-stage GLS. Overall, Nerlove's Monte Carlo results favor the GLS estimators over the other estimators. Since Nerlove's work, there have been significant developments in the field of dynamic panel data estimators.2 Among these is the introduction of the Anderson & Hsiao (1981, 1982) instrumental variable estimators, which use further lagged values of the dependent variable as instruments. Arellano & Bond (1991) carry this idea further and propose using all lagged variables that qualify as instruments within a GMM framework.
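To see how quickly an Arellano-Bond instrument set grows compared with the single instrument of an Anderson-Hsiao estimator, the sketch below counts the lagged levels of the dependent variable available to the first-differenced equation. It assumes levels y_i0, ..., y_iT are observed and that the differenced equation for period t (t = 2, ..., T) uses y_i0, ..., y_i,t-2 as instruments; instruments built from the exogenous variable, which ABGMM can also use, are ignored. The function names are hypothetical.

```python
def ab_lagged_y_instruments(T):
    """Map each differenced-equation period t = 2..T to its instruments
    y_i0, ..., y_i,t-2 (lagged levels of the dependent variable)."""
    return {t: [f"y_i{s}" for s in range(t - 1)] for t in range(2, T + 1)}


def ab_instrument_count(T):
    """Total number of lagged-y instruments, which equals T*(T-1)/2."""
    return sum(len(z) for z in ab_lagged_y_instruments(T).values())


for T in range(2, 7):
    print(f"T = {T}: {ab_instrument_count(T)} lagged-y instruments")
```

Under this indexing, an initial observation for 1960 plus five later cross-sections (T = 5) gives ten lagged-y instruments, and the count grows quadratically in T, whereas each AH estimator uses one instrument.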
Ahn & Schmidt (1995, 1997, 1999), Arellano & Bover (1995), Blundell & Bond (1998), Hahn (1999), Wansbeek & Knaap (1998), and Ziliak (1997) suggest various extensions and modifications of the Arellano-Bond GMM estimator (ABGMM). On the other hand, Kiviet (1995) and Wansbeek & Knaap (1998) propose modifications of the LSDV and LIML estimators, respectively. Many of the recent works offer Monte Carlo evidence too. Thus Arellano & Bond (1991) perform a Monte Carlo study comparing primarily the small sample properties of their GMM estimators with those of the Anderson-Hsiao estimators. According to their results, the GMM estimators perform better than the Anderson-Hsiao IV estimators, though less in terms of bias than in terms of dispersion. However, the simulation studies of Alonso-Borrego & Arellano (1999), Kiviet (1995), Harris & Matyas (1996), Judson & Owen (1997), Wansbeek & Knaap (1998), and Ziliak (1997) report significant bias of the ABGMM estimators. Kiviet (1995) reports good performance of his bias-corrected LSDV estimator, while Wansbeek & Knaap (1998) report better performance of a covariance-corrected instrumental variable estimator and of their LIML estimator. Baltagi & Kao (2000), in this volume, give an extensive survey of recent developments in dynamic panel data models. These studies have illuminated the small sample properties of various dynamic panel estimators. However, most of them do not focus on any particular model or data set. Ziliak's (1997) study is probably an exception: it focuses on a labor supply model and uses the PSID data. Yet Monte Carlo results are more useful when the exercise is customized to the model whose estimation is in question and when the simulations are based on the data set actually used to estimate the model. From this point of view there is a void regarding the growth-convergence equation: Monte Carlo evidence on the small sample performance of panel data estimators in estimating this equation is rare. This chapter tries to fill this gap to some extent. It focuses exclusively on the growth-convergence equation and bases the simulations on the Summers-Heston data set, which has been widely used in estimating this equation. This focus also guides the choice of estimators included in the study. The main feature of the growth-convergence equation is that the exogenous variable of the model is correlated with the individual (country) effects. This implies that panel estimators that rely on the assumption of uncorrelated random effects are not suitable for estimating this equation. On the other hand, estimators that accommodate this correlation, such as Chamberlain's Minimum Distance estimator, may play an important role in estimating it.
The study also considers several different generating mechanisms for the random error term, and it considers estimation of the equation in several different samples that have figured widely in the recent growth literature. Because of its customized nature, the results of this study should be directly useful to empirical growth researchers.
III. MODEL, PARAMETER VALUES, AND DATA GENERATION

A. The Model
The dynamic panel data model that arises in the convergence literature is as follows:

$$y_{it} = \gamma\, y_{i,t-1} + \beta\, x_{i,t-1} + \mu_i + \eta_t + v_{it}. \tag{1}$$
Here y_it represents the log of per capita GDP of country i at time t, y_{i,t-1} is the same variable lagged by one period, and x_{i,t-1} is the difference between the logs of the investment and population growth rate variables of country i at time t-1. Finally, μ_i and η_t are the individual and time effect terms, and v_it is the transitory error, which varies across both countries and time. In this set-up, (t-1) and t denote the initial and subsequent periods, respectively. The derivation of this equation proceeds from the Cobb-Douglas aggregate production function

$$Y_t = K_t^{\alpha}\,(A_t L_t)^{1-\alpha},$$

where Y, K, and L are output, capital, and labor, respectively, and A is the labor-augmenting technology, which grows exponentially at the exogenous rate g. The derivation yields the following correspondence between the coefficients of equation (1) and the structural parameters of the production function:

$$\gamma = e^{-\lambda\tau} \tag{2}$$

$$\beta = \left(1 - e^{-\lambda\tau}\right)\frac{\alpha}{1-\alpha} \tag{3}$$

$$\mu_i = \left(1 - e^{-\lambda\tau}\right)\ln A_{0i} \tag{4}$$

$$\eta_t = g\left(t_2 - e^{-\lambda\tau}\, t_1\right) \tag{5}$$

Here τ is the length of time between t_2 and t_1, where t_2 and t_1 correspond to t and (t-1) of equation (1), respectively. The parameter λ is known as the rate of convergence and is given by λ = (1 - α)(n + g + δ), where n is the exponential growth rate of L and δ is the rate of depreciation of capital.

An important issue regarding this model is the specification of the individual effect term μ_i. Equation (4) shows that μ_i basically stands for A_0i. Mankiw, Romer & Weil (1992, p. 6) define A_0i as follows: "The A_0i term reflects not just technology but resource endowments, climate, institutions, and so on; it may therefore differ across countries." From this definition it is obvious that A_0i is correlated with x_{i,t-1}, which represents savings and fertility behavior in an economy. Thus equation (1) represents a dynamic panel data model with correlated effects, which is why random-effects estimators are not appropriate for the growth-convergence equation. There are, however, different ways to specify the correlation between μ_i and x_{i,t-1}. Mundlak (1971) proposes a simple specification whereby μ_i is a function of x̄_i, the time mean of x_{i,t-1}. This is, however, restrictive and renders the random effects model equivalent to the fixed effects model, provided the transitory error term is serially uncorrelated. Hence, a more general specification is preferable. Following Chamberlain, we adopt the following specification of μ_i:

$$\mu_i = \pi_0 + \pi_1 x_{i0} + \pi_2 x_{i1} + \cdots + \pi_T x_{i,T-1} + \varepsilon_i, \tag{6}$$

where ε_i is distributed as N(0, σ_ε²). Viewed as a linear predictor, this does not involve any restriction. Viewed as a conditional expectation function, the only restriction is linearity.

Almost all researchers have used the Summers-Heston data set to estimate the growth-convergence equation. This data set has yearly data. However, it is generally believed that yearly data are not suitable for studying growth, because business fluctuations are likely to play a larger role in such data. Most of the panel studies have used five-year averages/panels to estimate the model. Accordingly, the value of τ in this study is set to five.3

B. Parameter Values
Considered in full, the model presented in equations (1) and (6) has three sets of parameters. The first consists of the autoregressive parameter γ and the slope parameter β. These are the main parameters of interest. The second set consists of π_0, π_1, ..., π_T, which arise from the specification of the individual effect term μ_i; this set also includes the time effect terms, the η_t's. The third set consists of the parameters that govern the error terms v_it and ε_i.

An important issue in data generation is the specification of the transitory error term v_it. With τ equal to five, successive v_it's are five years apart. However, some possibility of serial correlation in v_it still remains. Accordingly, we allow for the following three possibilities:
1. UC (serially uncorrelated) process: v_it ~ N(0, σ_v²).
2. MA(1) process: v_it = ζ_it + θ ζ_{i,t-1}, with ζ ~ N(0, σ_ζ²).
3. AR(1) process: v_it = ρ v_{i,t-1} + ω_it, with ω ~ N(0, σ_ω²).
There are two reasons for limiting the order of the MA and AR processes to one. First, given that the v_it's are five calendar years apart, orders greater than one are not very plausible theoretically. Second, even if such higher orders cannot be ruled out theoretically, the limited value of T does not make them very feasible. The data used in this chapter range from 1960 to 1985. With τ equal to five, this implies five cross-sections in the panel, i.e. T equals five.
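The three error processes listed above (UC, MA(1), AR(1)) can be simulated as in the sketch below. It is an illustration under stated assumptions, not the chapter's program: `sigma` is taken to be the standard deviation of the innovation (v_it itself under UC, ζ_it under MA(1) with v_it = ζ_it + θ ζ_{i,t-1}, ω_it under AR(1) with v_it = ρ v_{i,t-1} + ω_it), and the pre-sample MA innovation and the AR start-up value are drawn from their stationary distributions. The function name is hypothetical.

```python
import math
import random


def draw_errors(n, T, scheme, sigma, theta=0.0, rho=0.0, seed=None):
    """Return an n x T list of transitory errors v_it, t = 1..T.

    "UC":  v_it ~ N(0, sigma^2)
    "MA1": v_it = z_it + theta * z_i,t-1,  z ~ N(0, sigma^2)
    "AR1": v_it = rho * v_i,t-1 + w_it,   w ~ N(0, sigma^2)
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        if scheme == "UC":
            row = [rng.gauss(0.0, sigma) for _ in range(T)]
        elif scheme == "MA1":
            z = [rng.gauss(0.0, sigma) for _ in range(T + 1)]  # z[0] is pre-sample
            row = [z[t] + theta * z[t - 1] for t in range(1, T + 1)]
        elif scheme == "AR1":
            if abs(rho) >= 1.0:
                raise ValueError("AR(1) needs |rho| < 1 for a stationary start")
            # v_i0 drawn from the stationary distribution N(0, sigma^2 / (1 - rho^2))
            v = rng.gauss(0.0, sigma / math.sqrt(1.0 - rho * rho))
            row = []
            for _ in range(T):
                v = rho * v + rng.gauss(0.0, sigma)
                row.append(v)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        errors.append(row)
    return errors


# Illustrative call: 96 countries (the NONOIL sample size), T = 5, AR(1) errors.
v = draw_errors(96, 5, "AR1", sigma=0.12, rho=0.3, seed=1)
```

The values θ ≈ 0.2 and ρ ≈ 0.3 are roughly the largest the chapter reports; larger values would make the resulting serial correlation easier to see in short simulated panels.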
With regard to the parameter values for which to conduct the simulations, we again follow the principle of customization and let the data determine them. The following three-step procedure is employed for this purpose. In the first step, we obtain consistent estimates of γ and β by an instrumental variable (IV) regression based on the first-differenced model, using lagged x_it's as instruments. These consistent estimates of γ and β are used to compute the composite residuals (η_t + μ_i + v_it). In the second step, these residuals are regressed on the x_it's and year dummies to get estimates of the π's and η_t's. The residuals from this second-step regression give estimates of the (ε_i + v_it)'s, which we denote as u_it's. The third step consists of estimating the parameters of the MA(1) and AR(1) models from the estimated u_it's. We use Chamberlain's Minimum Distance estimation procedure to do this and obtain estimated values of θ and ρ and the corresponding values of σ_ζ and σ_ω.4

In growth-convergence studies, three different samples have been frequently used. Following Mankiw et al. (1992), these samples are often referred to as NONOIL, INTER, and OECD. Of these, OECD is the smallest and consists of 22 OECD countries. NONOIL is the largest and consists of most of the sizable countries of the world for which oil extraction is not the dominant economic activity; it comprises 96 countries. Finally, INTER is an intermediate sample comprising all the countries in the NONOIL sample except those whose data quality is not satisfactory; it consists of 74 countries. Table 1 gives the values of the parameters that belong to the first and second sets. These parameters remain the same under the different generating mechanisms of v_it. Certain aspects of these parameter values are worth noting. First, there seems to be some agreement across samples regarding the direction in which the x_it's of different years relate to the individual effect term μ_i. This is reflected in similar signs of the π_t's across samples. However, this agreement is not complete. Second, the way different time periods affect the growth process differs across samples.

Table 1. Common Parameter Values

Parameter     NONOIL     INTER      OECD
γ             0.7886    0.7925    0.6294
β             0.1641    0.1732    0.0954
π_0           1.3334    1.3588    2.8986
π_1           0.0028    0.1927    0.5863
π_2           0.1200    0.1098    0.6354
π_3           0.1243    0.1644    0.0702
π_4           0.0267    0.1286    0.6355
π_5           0.2277    0.1715    0.3484
η_70          0.0171    0.0093    0.0680
η_75          0.0156    0.0015    0.0827
η_80          0.0067    0.0218    0.1295
η_85          0.0669    0.0523    0.1238
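Since equations (2) and (3) tie γ and β to the convergence rate λ and the capital share α, the (γ, β) pairs in Table 1 can be mapped back into implied structural values. The sketch below does this arithmetic with τ = 5; it only illustrates the inversion of (2) and (3) and is not an estimate reported in the chapter. The function and variable names are hypothetical.

```python
import math

TAU = 5  # years between successive cross-sections (tau in the text)

# (gamma, beta) pairs taken from Table 1
TABLE1 = {
    "NONOIL": (0.7886, 0.1641),
    "INTER": (0.7925, 0.1732),
    "OECD": (0.6294, 0.0954),
}


def implied_structural(gamma, beta, tau=TAU):
    """Invert gamma = exp(-lambda * tau) and beta = (1 - gamma) * alpha / (1 - alpha)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1) for a positive convergence rate")
    lam = -math.log(gamma) / tau
    # beta * (1 - alpha) = (1 - gamma) * alpha  =>  alpha = beta / (beta + 1 - gamma)
    alpha = beta / (beta + 1.0 - gamma)
    return lam, alpha


for sample, (g, b) in TABLE1.items():
    lam, alpha = implied_structural(g, b)
    print(f"{sample:<7} lambda = {lam:.4f}  alpha = {alpha:.3f}")
```

For the NONOIL values this gives λ ≈ 0.047 per year and α ≈ 0.44; the OECD values imply a faster rate, λ ≈ 0.093.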
This is revealed by the signs of the η_t's in the different samples. There are some differences in this regard between the NONOIL and INTER samples. However, the difference between these two samples, on the one hand, and the OECD, on the other, proves to be more significant. Next we turn to the parameter values that differ across the three generating mechanisms of v_it. The estimated values of these parameters are compiled in Table 2. Several things may be noted from this table. First, the largest estimated values of θ and ρ are about 0.2 and 0.3, respectively. This indicates that any serial dependence that v_it may have in the actual data is of fairly low order.5 This in turn suggests that the relative performance of the estimators may not vary widely across different ways of modeling v_it. Second, the variance of the individual country effect term remains quite stable under alternative generating schemes of v_it in all samples. Third, the estimated variance of v_it also remains very similar across the generating schemes. Fourth, the relative values of σ_ε and σ_v suggest that variation in the individual effect term accounts for a significant part of the overall variation in the data.

Table 2. Parameter Values for Different Generating Mechanisms of v_it

DGM of v_it   Parameter    NONOIL     INTER      OECD
Uncorrelated  σ_v          0.1054    0.0872    0.0300
              σ_ε          0.1281    0.0139    0.0762
MA(1)         θ            0.2037    0.1787    0.1394
              σ_v          0.1179    0.0943    0.0319
              σ_ε          0.1225    0.0995    0.0742
              σ_ζ          0.1153    0.0927    0.0316
AR(1)         ρ            0.2994    0.1250    0.1125
              σ_v          0.1227    0.0990    0.0302
              σ_ε          0.1183    0.1010    0.0742
              σ_ω          0.1171    0.0980    0.0300

C. Data Generation
Once the parameter values are available, data generation can begin. It proceeds through the following steps. First, values of the x_it's are constructed from the Summers-Heston data set in the way described above.6 This data set also provides the initial values, y_0i. We assume that all disturbance terms are normally distributed.7 The second step differs for different models of v_it. For the uncorrelated model, random values of v_it and ε_i are generated from the distributions N(0, σ_v²) and N(0, σ_ε²), respectively. These values of v_it and ε_i are then combined with the given values of y_{i,t-1} and x_{i,t-1} and the parameter values in Table 1 to produce y_it. For the first period, the y_0i's serve as the y_{i,t-1}'s. For subsequent periods, the value of y_it serves as the lagged value of y for generating y_{i,t+1}. The process continues until the last (T-th) period is reached. For the MA(1) model, ε_i is again generated from the distribution N(0, σ_ε²). However, generating v_it now requires generating ζ_it from the distribution N(0, σ_ζ²). These values of ζ_it are then combined with the value of θ to produce the v_it's. Generation of the v_it's for the AR(1) model proceeds in an analogous manner. Once the data are generated, estimation can proceed. We now turn to the estimation results.
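The recursion just described, with the correlated individual effect of equation (6) and the dynamic equation (1), can be sketched as follows. In the chapter, x_it and y_0i come from the Summers-Heston data and the parameters from Tables 1 and 2; here they are plain arguments, the transitory errors v_it are passed in from any generator, and the function name is hypothetical. Setting the first time effect to zero when it is absorbed into π_0 is an assumption of this sketch.

```python
import random


def simulate_panel(x, y0, gamma, beta, pi, eta, sigma_eps, v, seed=None):
    """Generate y_it, t = 1..T, from equations (1) and (6).

    x:         N lists x_i0..x_i,T-1 (x[i][t-1] enters the equation for y_it)
    y0:        N initial values y_i0
    pi:        [pi_0, pi_1, ..., pi_T], coefficients of the projection (6)
    eta:       T time effects (eta[0] = 0.0 if absorbed into pi_0)
    sigma_eps: standard deviation of eps_i in (6)
    v:         N lists of transitory errors v_i1..v_iT
    """
    rng = random.Random(seed)
    T = len(eta)
    panel = []
    for i, xi in enumerate(x):
        # Correlated individual effect: linear projection on all x_it plus noise.
        mu = pi[0] + sum(pi[s + 1] * xi[s] for s in range(T)) + rng.gauss(0.0, sigma_eps)
        y_prev, path = y0[i], []
        for t in range(T):
            y_t = gamma * y_prev + beta * xi[t] + mu + eta[t] + v[i][t]
            path.append(y_t)
            y_prev = y_t  # this period's y is next period's lagged y
        panel.append(path)
    return panel


# Illustrative call with made-up inputs: 2 countries, T = 3, no time effects.
demo = simulate_panel(
    x=[[0.1, 0.2, 0.3], [0.0, 0.1, 0.2]], y0=[7.0, 8.0], gamma=0.8, beta=0.16,
    pi=[1.3, 0.1, 0.1, 0.1], eta=[0.0, 0.0, 0.0], sigma_eps=0.1,
    v=[[0.0] * 3, [0.0] * 3], seed=0,
)
```

Because μ_i is built from the same x_it's that enter equation (1), the simulated effects are correlated with the regressor by construction, which is the feature that rules out uncorrelated random-effects estimators.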
IV. SIMULATION RESULTS

Given the number of cross-sections available (i.e. given T), different panel data estimators can make use of different numbers of these cross-sections at the final stage of estimation. In a simulation, therefore, two approaches are possible. One is to keep the number of cross-sections actually used by the estimators the same, by generating varying numbers of cross-sections for different estimators. The other is to keep the number of cross-sections generated the same and let the number of cross-sections actually used in the final stage of estimation vary across estimators. It is the second situation that a researcher faces in practice, and, to conform to it, we adopt the second approach. In our particular case, five cross-sections are available, namely for 1965, 1970, 1975, 1980, and 1985, so T is five. We let the number of cross-sections actually used by individual estimators vary.8 Not all panel estimators are geared to estimating all the parameters of the model. Because of this, and in order not to clutter the presentation with too many numerical results, we focus here only on the results regarding γ and β. The simulation results presented in this chapter are based on one thousand replications. In most cases, the Monte Carlo distributions stabilized after only one hundred replications, so increasing the number of replications further was not necessary. The two criteria usually used to judge the performance of an estimator are bias and mean square error (MSE). To make assessment easy, we present tables showing bias and root mean square error (RMSE) in relative form, i.e. as a percentage of the true parameter value.9 Tables 3 and 4 give the relative magnitudes of bias, and Tables 5 and 6 the relative magnitudes of root mean square error, for the estimates of γ and β, respectively. These tables indicate that the relative performance of the estimators varies across samples and generating mechanisms of v_it (DGMs). To convey an overall picture, we therefore compute the (algebraic) average of the bias and RMSE for each estimator. These row averages are presented in the last column of the tables. We first describe the results in terms of these averages and then consider the inter-sample and inter-DGM variations.

Beginning with γ, we may first consider the results regarding bias. Table 3 shows that the OLS estimates of γ are, as expected, positively biased, and this bias averages seventeen percent. The panel estimates of γ, on the other hand and as expected, are negatively biased. The only exception is the AH(d) estimator, which displays a small positive bias when v_it is generated under the uncorrelated (UC) scheme. However, the average bias is negative for this estimator too. We refrain from reporting results for the AH(l) estimator because of its very poor performance. (We will come to this issue shortly.) Among the panel estimators, the bias is smaller for the AH(d), LSDV, and MD estimators, ranging between five and six percent. These are followed by the SE estimators, for which the bias ranges between eight and ten percent. The largest bias, about twenty-two percent, is associated with the ABGMM estimators. Table 5 shows that the RMSE in estimating γ has a similar pattern. The average RMSE for the OLS estimator stands at seventeen percent. For the LSDV and MD estimators, this ratio lies between six and seven percent. For the AH(d) estimator it averages eleven percent.
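The relative measures used in Tables 3 to 6 can be computed as in the sketch below. It assumes the Monte Carlo estimates for one estimator, sample, and DGM are held in a list, and it scales bias and RMSE by the absolute true value so that the sign of the bias is preserved. As a check of the row-average convention, the example averages the nine LSDV entries of Table 3, entered as negative numbers because the text reports downward bias for the panel estimates of γ. The function names are hypothetical.

```python
import math


def relative_bias_rmse(estimates, true_value):
    """Monte Carlo bias and RMSE, each as a percentage of |true_value|.

    The bias keeps its sign, so a negative number means underestimation.
    """
    n = len(estimates)
    bias = sum(e - true_value for e in estimates) / n
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    scale = 100.0 / abs(true_value)
    return bias * scale, rmse * scale


def row_average(cells):
    """Algebraic (signed) average over the sample x DGM cells of one estimator."""
    return sum(cells) / len(cells)


# LSDV bias for gamma in the nine sample x DGM cells of Table 3.
lsdv_bias_gamma = [-8.0, -8.2, -7.9, -8.4, -9.3, -8.0, -1.6, -1.7, -1.4]
print(round(row_average(lsdv_bias_gamma), 1))  # -6.1, the Table 3 row average
```

Because the row average is algebraic, positive and negative cells can offset each other; this is why the average bias of some panel estimators of β is close to zero even though individual cells are not.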
For the SE estimators, this ratio lies between thirteen and twenty percent, and for the ABGMM estimators it equals or exceeds forty percent. Looking at the bias results for β (Table 4), we see that the OLS estimates are again severely biased upwards, with the bias now averaging forty-eight percent. The direction of bias of the panel estimators is mixed, but the panel procedures yield estimates that are on average quite close to the true parameter values. The absolute value of the average bias for the panel estimators ranges from under one to seven percent. Within this range, the LSDV, MD, 2SLS, and 3SLS estimators perform better, with an average bias of less than one percent. Next come the AH(d) and G3SLS estimators, with a bias between one and two percent. The largest biases, between five and seven percent, are recorded for the ABGMM estimators. The smallness of the average biases of the panel estimates of β is, however, swamped by the large variances of the Monte Carlo distributions. This is reflected in the large relative RMSE values reported in Table 6.

Table 3. For γ in the model: y_it = γ y_{i,t-1} + β x_{i,t-1} + μ_i + η_t + v_it
Bias as Percentage of True Parameter Value

                  NONOIL                   INTER                   OECD                Row
Estimator      UC   MA(1)   AR(1)      UC   MA(1)   AR(1)      UC   MA(1)   AR(1)  Average
OLS          14.8    14.6    14.8    15.2    15.2    15.4    21.5    20.9    21.2     17.1
LSDV         -8.0    -8.2    -7.9    -8.4    -9.3    -8.0    -1.6    -1.7    -1.4     -6.1
AH(l)        n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.     n.r.
AH(d)         0.4   -14.5   -15.9     0.2    -9.5   -10.0     0.6    -1.2    -1.6     -5.7
ABGMM1      -10.7   -10.4   -10.6   -44.4   -49.5   -43.4    -9.5    -8.6    -8.3    -21.7
ABGMM2       -9.7   -10.1   -10.2   -47.3   -49.5   -44.4    -8.6    -8.2    -8.5    -21.8
2SLS         -9.3    -9.3    -8.6    -3.1    -3.1    -2.8   -18.8   -15.8   -17.1     -9.8
3SLS         -4.5    -3.3    -6.5    -5.8    -7.9    -5.3   -12.7   -12.2   -12.2     -7.8
G3SLS        -6.0    -5.4    -5.2    -8.3   -10.1    -8.8   -19.9   -16.5   -13.4    -10.4
MD           -6.7    -6.9    -6.4    -6.9    -7.9    -6.7    -1.3    -1.1    -1.2     -5.0

Notes: 1. The true values of γ differ across samples and are given in Table 1. 2. The row average is the algebraic average of the numbers in the row. 3. NONOIL, INTER, and OECD are different samples; UC, MA(1), and AR(1) refer to the uncorrelated, moving average, and autoregressive generating mechanisms of the transitory error v_it. 4. n.r. stands for Not Reported, because these numbers generally prove to be too large.

Table 4. For β in the model: y_it = γ y_{i,t-1} + β x_{i,t-1} + μ_i + η_t + v_it
Bias as Percentage of True Parameter Value

                  NONOIL                   INTER                   OECD                Row
Estimator      UC   MA(1)   AR(1)      UC   MA(1)   AR(1)      UC   MA(1)   AR(1)  Average
OLS          31.4    32.1    31.6    11.8    11.7    11.1   100.0    99.9   100.5     47.8
LSDV          1.0     0.7     0.3     0.5     1.4     1.1     1.5     0.7     2.1      0.1
AH(l)        n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.     n.r.
AH(d)         0.4     2.2     4.0     0.6     1.1     1.0     1.7     2.5     0.4      1.5
ABGMM1       13.7    14.4    14.5     7.5    16.3    26.4     7.3     5.4     2.6      6.9
ABGMM2        3.9     5.2    14.7     3.1    22.1    34.5    17.3    19.9     1.5      5.3
2SLS          2.3     2.7     1.9     2.7     2.0     2.5     0.8     3.9     3.9      0.9
3SLS          0.2     0.2     2.0     2.0     2.3     9.3     5.1     4.5     8.9      0.8
G3SLS         1.3     2.4     1.7     2.0     2.2     8.7    14.2     8.1     2.0      1.4
MD            0.2     0.7     0.8     1.0     0.5     0.0     0.6     0.6     1.3     0.03

Table 5. For γ in the model: y_it = γ y_{i,t-1} + β x_{i,t-1} + μ_i + η_t + v_it
Root MSE as Percentage of True Parameter Value

                  NONOIL                   INTER                   OECD                Row
Estimator      UC   MA(1)   AR(1)      UC   MA(1)   AR(1)      UC   MA(1)   AR(1)  Average
OLS          15.0    14.8    14.9    15.3    15.3    15.3    22.3    21.7    22.0     17.4
LSDV          8.5     8.7     8.5     8.9     9.9     8.7     3.5     3.6     3.6      7.1
AH(l)        n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.     n.r.
AH(d)         8.3    16.6    17.6     5.4    13.9    13.3     7.3     7.2     7.5     10.8
ABGMM1       27.7    27.5    26.7    64.8    70.4    65.9    24.3    21.9    23.7     39.2
ABGMM2       29.6    29.3    28.9    79.7    84.9    77.1    32.9    29.1    31.0     46.9
2SLS         12.1    12.6    12.0     5.1     5.4     5.0    24.3    21.3    23.0     13.4
3SLS          8.5    14.7    10.4     9.6    11.1     8.9    28.4    28.4    23.6     16.0
G3SLS        10.0    18.1     8.7    11.9    13.8    12.6    40.9    37.6    29.3     20.3
MD            7.4     7.8     7.4     7.6     8.7     7.6     3.0     3.1     3.2      6.2

Table 6. For β in the model: y_it = γ y_{i,t-1} + β x_{i,t-1} + μ_i + η_t + v_it
Root MSE as Percentage of True Parameter Value

                  NONOIL                   INTER                   OECD                Row
Estimator      UC   MA(1)   AR(1)      UC   MA(1)   AR(1)      UC   MA(1)   AR(1)  Average
OLS          34.6    35.2    34.7    18.8    18.1    17.7   117.4   116.6   116.0     56.6
LSDV         12.8    15.3    15.4    12.4    14.5    14.3    40.1    44.9    43.8     23.7
AH(l)        n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.    n.r.     n.r.
AH(d)        19.9    20.1    19.1    20.3    21.2    18.1    64.9    60.2    62.1     34.0
ABGMM1      147.0   151.9   145.3   148.0   153.8   143.6   243.5   237.9   226.5    177.5
ABGMM2      169.6   169.7   165.1   187.7   205.2   181.9   306.8   284.6   284.1    217.2
2SLS         17.1    18.4    17.5    19.5    20.8    19.3    58.3    54.7    57.9     31.5
3SLS         13.7    21.9    16.1    16.5    15.6    17.8    67.2    64.9    82.5     35.1
G3SLS        17.5    28.2    15.4    17.8    18.5    23.5   119.4   111.1   149.6     55.7
MD           13.3    15.4    15.8    12.6    15.1    14.4    40.5    44.6    45.6     24.1

Notes: 1. The true values of β differ across samples and are given in Table 1. 2. The row average is the algebraic average of the numbers in the row. 3. NONOIL, INTER, and OECD are different samples; UC, MA(1), and AR(1) refer to the uncorrelated, moving average, and autoregressive generating mechanisms of the transitory error v_it. 4. n.r. stands for Not Reported, because these numbers generally prove to be too large.

The ratio of RMSE to the true value of β for the OLS estimator stands at fifty-seven percent. For most of the panel estimators this ratio is much lower. For the LSDV and MD estimators, it is close to twenty-four percent. For the AH(d), 2SLS, and 3SLS estimators, it lies between thirty-two and thirty-five percent. The G3SLS estimator displays a higher ratio, fifty-six percent, which is close to that of the OLS estimator. For the ABGMM estimators, however, the ratio ranges from 178 to 217 percent, much higher than that for OLS. These results show that OLS estimation of the growth-convergence equation is very likely to produce significantly biased estimates. The performance of the panel estimators, on the other hand, varies. The LSDV and MD estimators perform well, and the SE estimators come next. The two AH estimators display very contrasting performance: the AH(l) estimator performs so poorly that we refrain from presenting its results, whereas the AH(d) estimator sometimes performs better than the SE estimators. The ABGMM estimators are found to display large bias and RMSE. These results agree with recent Monte Carlo evidence produced by other researchers in other contexts: several studies have reported bias of the ABGMM estimators, and others have reported good small sample performance of the LSDV estimator. These results imply that OLS estimation of the growth-convergence equation should be avoided. Indiscriminate use of panel estimators is also fraught with danger. However, a judicious choice of panel estimator can yield better estimates of the parameters of the growth-convergence equation. Empirical growth researchers can make use of this possibility. Beyond these results of immediate concern, the study brings out several general points. The first of these concerns the contrasting performance of the AH estimators.
Both AH estimators rely on the orthogonality of lagged values of $y_{it}$ to $v_{it}$. This assumption holds only when $v_{it}$ is serially uncorrelated. One would therefore expect both estimators to perform well when $v_{it}$ is serially uncorrelated and both to perform poorly when $v_{it}$ follows either the AR(1) or the MA(1) pattern. However, as the numbers in the tables show, the AH(d) estimator performs relatively well under all generation mechanisms of $v_{it}$ and for all samples, while the performance of the AH(l) estimator is unsatisfactory under all generation mechanisms and for all samples, particularly the NONOIL and the INTER samples. The explanation, as it turns out, lies in how strongly the instruments are correlated with the instrumented variable. The instrument used by the AH(d) estimator, $(y_{i,t-2} - y_{i,t-3})$, is strongly correlated with the explanatory variable $(y_{i,t-1} - y_{i,t-2})$, while the instrument used by the AH(l) estimator, $y_{i,t-2}$, is very poorly correlated with $(y_{i,t-1} - y_{i,t-2})$. This poor correlation is reflected in the astronomically large standard errors of the AH(l) estimates. These results reconfirm that instruments must be sufficiently correlated with the instrumented variable (in addition to being uncorrelated with the error), and they highlight the importance of research on estimation with weak instruments.¹⁰
A second point concerns the performance of the ABGMM estimators as well as the AH(d) estimator. The performance of these estimators does not vary much across the three generation mechanisms of $v_{it}$. This is particularly true with regard to the estimation of . This is somewhat surprising, because the validity of these estimators depends heavily on the orthogonality of lagged values of $y_{it}$ to $v_{it}$, and this orthogonality is violated when $v_{it}$ follows either an AR or an MA scheme. It is true that the order of serial correlation is low. However, one would expect some effect of the serial correlation, given that it invalidates so many instruments. The AH(d) estimator does, in fact, show some sensitivity to the generation scheme of $v_{it}$. Why the ABGMM estimators do not display similar sensitivity is an intriguing question.
The third point relates to the variation in the performance of the estimators across samples. The overall picture portrayed above is based on averages over samples and DGMs. Looking at inter-sample variation, however, it is difficult to establish a pattern. For example, going by the results on the bias of the estimated , the performance of the OLS estimator deteriorates for the OECD sample compared with either the NONOIL or the INTER sample. For the LSDV and the MD estimators, however, the opposite is true. The ABGMM and the SE estimators show yet another kind of contrast.
The performance of the ABGMM estimators deteriorates for the INTER sample compared with either the NONOIL or the OECD sample, whereas the opposite is true for the SE estimators. This contrast may not be entirely surprising, since the ABGMM estimators use lagged $y_{it}$s as instruments while the SE estimators rely entirely on the $x_{it}$s.
The fourth point concerns the relative performance of simple and sophisticated versions of generically similar estimators. The averaged RMSE values presented in Tables 5 and 6 show that the simpler 2SLS estimator outperforms the 3SLS and the G3SLS estimators. Similarly, in terms of these averaged values, the ABGMM1 estimator outperforms the ABGMM2 estimator.¹¹ Sophisticated estimators that require estimated weighting matrices thus need not perform better than their simpler counterparts that do not. Estimating these weighting matrices creates additional scope for noise to enter the estimation process, which may nullify the potential gain.
The final point concerns the performance of the LSDV estimator. As is well known, in a dynamic panel data model the LSDV estimator is inconsistent as $N \to \infty$ with $T$ fixed. It is true that the LSDV estimator is consistent as $T \to \infty$. However, $T$ in this study is too small to expect, a priori, any benefit from T-asymptotics. The results of this chapter regarding the LSDV estimates show that even theoretically inconsistent estimators can have good small sample properties. This reinforces the importance of Monte Carlo studies.
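The first and fifth points can be explored with a small simulation in the spirit of the chapter's Monte Carlo design. The Python sketch below is a hypothetical illustration, not the chapter's data generation mechanism: it drops the exogenous regressors and time effects, assumes serially uncorrelated $v_{it}$, and uses made-up values (autoregressive coefficient gamma = 0.5, sigma_mu = 5, sigma_v = 1, N = 3000, T = 10). All function names are our own. It computes the LSDV estimator, the two Anderson-Hsiao IV estimators on the first-differenced equation, and the sample correlation of each AH instrument with $(y_{i,t-1} - y_{i,t-2})$.

```python
import math
import random


def simulate_panel(n, t, gamma, sigma_mu, sigma_v, rng, burn_in=50):
    """Hypothetical DGP: y_it = gamma*y_{i,t-1} + mu_i + v_it, v_it iid N(0, sigma_v^2)."""
    panel = []
    for _ in range(n):
        mu = rng.gauss(0.0, sigma_mu)
        y = mu / (1.0 - gamma)              # start at the unit's long-run mean
        path = []
        for s in range(burn_in + t):
            y = gamma * y + mu + rng.gauss(0.0, sigma_v)
            if s >= burn_in:                # keep only post-burn-in periods
                path.append(y)
        panel.append(path)
    return panel


def lsdv(panel):
    """Within (LSDV) estimator: regress demeaned y_it on demeaned y_{i,t-1}."""
    num = den = 0.0
    for y in panel:
        cur, lag = y[1:], y[:-1]
        mc, ml = sum(cur) / len(cur), sum(lag) / len(lag)
        for a, b in zip(cur, lag):
            num += (b - ml) * (a - mc)
            den += (b - ml) ** 2
    return num / den


def ah_rows(panel, instrument):
    """(z, dy_{t-1}, dy_t) triples for dy_t = gamma*dy_{t-1} + dv_t.

    instrument='d' uses y_{t-2} - y_{t-3} (AH(d)); 'l' uses y_{t-2} (AH(l)).
    """
    rows = []
    for y in panel:
        for s in range(3, len(y)):
            z = (y[s - 2] - y[s - 3]) if instrument == "d" else y[s - 2]
            rows.append((z, y[s - 1] - y[s - 2], y[s] - y[s - 1]))
    return rows


def anderson_hsiao(panel, instrument):
    """Just-identified IV estimate: sum(z*dy_t) / sum(z*dy_{t-1})."""
    rows = ah_rows(panel, instrument)
    return sum(z * dy for z, _, dy in rows) / sum(z * x for z, x, _ in rows)


def instrument_corr(panel, instrument):
    """Pearson correlation between the instrument and dy_{t-1}."""
    rows = ah_rows(panel, instrument)
    n = len(rows)
    mz = sum(z for z, _, _ in rows) / n
    mx = sum(x for _, x, _ in rows) / n
    szx = sum((z - mz) * (x - mx) for z, x, _ in rows)
    szz = sum((z - mz) ** 2 for z, _, _ in rows)
    sxx = sum((x - mx) ** 2 for _, x, _ in rows)
    return szx / math.sqrt(szz * sxx)


if __name__ == "__main__":
    rng = random.Random(12345)
    panel = simulate_panel(n=3000, t=10, gamma=0.5, sigma_mu=5.0, sigma_v=1.0, rng=rng)
    print("LSDV :", round(lsdv(panel), 3))
    for kind in ("d", "l"):
        print(f"AH({kind}):", round(anderson_hsiao(panel, kind), 3),
              "instrument corr:", round(instrument_corr(panel, kind), 3))
```

In this stylized setting the LSDV estimate falls noticeably below 0.5 because T is short, while AH(d) lands close to the true value. The level instrument is only weakly correlated with the differenced regressor because its variation is dominated by the fixed effect; with sigma_mu = 0 the ranking of the two instruments reverses, so the sketch illustrates the mechanism rather than reproducing the chapter's results. How much the LSDV bias matters depends on the design: in the chapter's experiments the LSDV estimator performed well despite it.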
V. CONCLUDING REMARKS
The issue of small sample properties of dynamic panel estimators is important. Both substantive and methodological conclusions often depend on the attention given to this issue. For example, Caselli et al. (1996) reject the Solow model based on their estimates of the growth-convergence equation obtained with a variant of the ABGMM estimator. The small sample bias of this estimator, reported in this and other studies, may raise the question of whether such a rejection was premature. Moreover, the estimation results prompted those authors to abandon the strictly model-based specification in favor of an extended version that includes a variety of variables chosen on heuristic grounds. From a methodological point of view, this is a throwback to the earlier stage of cross-country growth research, when specifications were informal and the regression coefficients did not correspond exactly to the structural parameters of the production function. One of the great merits of Mankiw, Romer & Weil (1992) and Barro & Sala-i-Martin (1992) was to put an end to this stage. Methodologically, therefore, a return to informal specifications may not be the ideal course. A more satisfactory solution is perhaps to adopt a two-stage analysis, with the first stage adhering to the formal, model-based specification and yielding unbiased estimates of the parameters and of productivity. The second stage may then focus on the role of the heuristic variables in explaining productivity differences. This, however, requires attention to the small sample performance of the estimator used in the first stage.
NOTES
1. For a derivation of the growth-convergence equation, see Barro & Sala-i-Martin (1992, 1995), Mankiw, Romer & Weil (1992), and Mankiw (1995). For the conversion of the growth-convergence equation into a dynamic panel data model, see Islam (1993, 1995).
2. For discussions of many of these new estimators, see Baltagi (1995) and Hsiao (1986).
3. This is the value of that has been used in Islam (1993, 1995), Knight et al. (1993), Caselli et al. (1996) and several other papers.
4. For example, for the MA(1) model, the procedure starts by noting that $E(u_i u_i')$ has the following structure:
2 2 + (1 + 2)2 2 2 2 2 + 2 2 2 2 2 2 2 2 + + (1 + ) + 2 2 2 2 2 E(uiu 2 2 2 2 2 + + (1 + ) + i) = 2 2 2 2 2 2 2 2 2 + + (1 + ) + 2 2 2 2 2 2 2 + + (1 + )2
where $u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$ and $T = 5$. As expected, $E(u_i u_i')$ has three parameters, namely , , and . The sample analog of this covariance matrix is $\frac{1}{N}\sum_{i} \hat{u}_i \hat{u}_i'$, where $\hat{u}_i = (\hat{u}_{i1}, \ldots, \hat{u}_{iT})'$ and the $\hat{u}_{it}$ are obtained from the second step. There are $T(T+1)/2 = 15$ distinct elements in this sample covariance matrix, which are (nonlinear) functions of the three underlying parameters , , and . Estimates of , , and can be obtained from these 15 elements within the MD estimation framework; for details, see Chamberlain (1982, 1983). An analogous procedure is followed for the AR(1) model to obtain the estimates of , , and . Estimation of v and for the UC case is easier.
5. Perhaps also of interest is that the values of both and are largest in the NONOIL sample and smallest in the OECD sample, with the values for the INTER sample in between.
6. For further details on the construction of the $x_{it}$s, see Islam (1995).
7. In this study we have limited ourselves to parametric distributions of the disturbance term. In principle, it is possible to do away with parametric assumptions; we leave this as a future task.
8. To save space, we do not provide a detailed description of the estimators. Many of these are well known; for the rest, the interested reader can consult the cited references. An appendix describing the estimators is also available from the author upon request.
9. In this chapter we report only summary results. The detailed results are in a set of Appendix Tables, which are available upon request.
10. See, for example, Nelson & Startz (1990), Staiger & Stock (1997), and Wang & Zivot (1998).
11. To be sure, this ranking does not hold for every sample and every DGM. For example, in the NONOIL sample, regardless of the DGM, the results from the 3SLS and the G3SLS estimators seem to be better than those from the 2SLS estimator. For the INTER sample, however, the 2SLS estimator seems to perform better than either the 3SLS or the G3SLS estimator.
For the OECD sample, the situation is less clear-cut. In terms of the mean of the Monte Carlo distribution, the 3SLS and the G3SLS estimators fare better than the 2SLS estimator, though not in terms of dispersion. On the other hand, in the OECD sample the Monte Carlo distributions for the 2SLS estimator have very large standard deviations. One reason for the deterioration in the performance of the 3SLS and the G3SLS estimators in the INTER and the OECD samples, compared with the NONOIL sample, may lie in sample size: the former samples are smaller than the latter. Since the superiority of the 3SLS and the G3SLS estimators over the 2SLS estimator is an asymptotic result, a larger sample size may help this result to surface.
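Note 4 outlines how the three error-component parameters are recovered by minimum distance from the 15 distinct elements of the sample covariance matrix. The Python sketch below illustrates that idea for the MA(1) case under stated assumptions: the error is written as u_it = mu_i + v_it with v_it = eps_it + theta*eps_{i,t-1}; the names sigma_mu2, sigma_eps2 and theta, and all function names, are our own labels; and an identity weight matrix is used instead of the optimal weighting discussed by Chamberlain (1982, 1983). With identity weights, and when the averages are consistent with an MA(1) structure, the fitted moments are simply the averages of the elements at lag 0, lag 1 and lags of 2 or more, which can then be inverted for the parameters.

```python
import math


def ma1_ec_cov(sigma_mu2, sigma_eps2, theta, T=5):
    """E(u_i u_i') for u_it = mu_i + v_it, v_it = eps_it + theta * eps_{i,t-1}."""
    cov = [[sigma_mu2] * T for _ in range(T)]           # mu_i enters every element
    for t in range(T):
        cov[t][t] += (1.0 + theta ** 2) * sigma_eps2    # Var(v_it)
        if t + 1 < T:                                   # first-order autocovariance
            cov[t][t + 1] += theta * sigma_eps2
            cov[t + 1][t] += theta * sigma_eps2
    return cov


def sample_cov(residuals):
    """Sample analog (1/N) * sum_i u_i u_i' from N residual vectors of length T."""
    n, T = len(residuals), len(residuals[0])
    return [[sum(u[j] * u[k] for u in residuals) / n for k in range(T)]
            for j in range(T)]


def distinct_elements(cov):
    """The T(T+1)/2 distinct elements (lower triangle), each tagged with its lag."""
    T = len(cov)
    return [(j - k, cov[j][k]) for j in range(T) for k in range(j + 1)]


def md_equal_weight(cov):
    """Identity-weighted MD estimates (sigma_mu2, sigma_eps2, theta); needs T >= 3."""
    elems = distinct_elements(cov)

    def avg(keep):
        vals = [v for lag, v in elems if keep(lag)]
        return sum(vals) / len(vals)

    a = avg(lambda lag: lag == 0)    # sigma_mu2 + (1 + theta^2) * sigma_eps2
    b = avg(lambda lag: lag == 1)    # sigma_mu2 + theta * sigma_eps2
    c = avg(lambda lag: lag >= 2)    # sigma_mu2
    p, q = b - c, a - c              # theta * sigma_eps2, (1 + theta^2) * sigma_eps2
    if q <= 0 or abs(p) > 0.5 * q:
        raise ValueError("averaged moments are inconsistent with an MA(1) error")
    r = p / q                        # r = theta / (1 + theta^2), so |r| <= 1/2
    # invertible root (|theta| <= 1) of r*theta^2 - theta + r = 0
    theta = 0.0 if r == 0 else (1.0 - math.sqrt(1.0 - 4.0 * r * r)) / (2.0 * r)
    return c, q / (1.0 + theta ** 2), theta
```

Applied to an exact population matrix such as `ma1_ec_cov(0.5, 1.0, 0.3)`, the fit returns the generating values; applied to `sample_cov` of estimated residuals, it gives an equal-weight MD estimate that an optimally weighted version would refine.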
ACKNOWLEDGMENTS
I would like to thank Professor Chamberlain, Professor Jorgenson, and Professor Guido Imbens for their guidance to my work on this paper. Initial versions of this chapter were presented in seminars at Harvard University and Emory University. Comments of the participants of these seminars are greatly appreciated. I would like to extend my sincere thanks to the three referees and the editor, Professor Badi Baltagi, for their comments and suggestions that led to significant improvement of this chapter. All remaining errors are mine.
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics, 68, 5–27.
Ahn, S. C., & Schmidt, P. (1997). Efficient Estimation of Dynamic Panel Models: Alternative Assumptions and Simplified Estimation. Journal of Econometrics, 76, 309–321.
Ahn, S. C., & Schmidt, P. (1999). Estimation of Linear Panel Data Models Using GMM. In: L. Matyas (Ed.), Generalized Method of Moments Estimation. Cambridge: Cambridge University Press.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalized Instrumental-Variable Estimation Using Panel Data. Journal of Business and Economic Statistics, 17, 36–49.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models with Error Components. Journal of the American Statistical Association, 76, 598–606.
Anderson, T. W., & Hsiao, C. (1982). Formulation and Estimation of Dynamic Models Using Panel Data. Journal of Econometrics, 18, 47–82.
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. The Review of Economic Studies, 58, 277–297.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental Variable Estimation of Error Components Models. Journal of Econometrics, 68, 29–52.
Balestra, P., & Nerlove, M. (1966). Pooling Cross-section and Time Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas. Econometrica, 34, 585–612.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. New York: John Wiley and Sons.
Baltagi, B. H., & Kao, C. (2000). Nonstationary Panels, Cointegration in Panels and Dynamic Panels: A Survey. Advances in Econometrics, 15 (this volume).
Barro, R. (1997). Determinants of Economic Growth: A Cross-country Empirical Study. Cambridge: MIT Press.
Barro, R., & Sala-i-Martin, X. (1992). Convergence. Journal of Political Economy, 100(2), 223–251.
Barro, R., & Sala-i-Martin, X. (1995). Economic Growth. Boston: McGraw Hill.
Bekker, P. A. (1994). Alternative Approximations to the Distributions of Instrumental Variable Estimators. Econometrica, 62, 657–681.
Blundell, R., & Bond, S. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics, 87, 115–143.
Caselli, F., Esquivel, G., & Lefort, F. (1996). Reopening the Convergence Debate: A New Look at Cross-country Growth Empirics. Journal of Economic Growth, 1(3), 363–390.
Chamberlain, G. (1982). Multivariate Regression Models for Panel Data. Journal of Econometrics, 18, 5–46.
Chamberlain, G. (1983). Panel Data. In: Z. Griliches & M. Intriligator (Eds), Handbook of Econometrics (Vol. II, pp. 1247–1318). North-Holland.
Hahn, J. (1999). How Informative is the Initial Condition in the Dynamic Panel Model with Fixed Effects? Journal of Econometrics, 93, 309–326.
Harris, M. N., & Matyas, L. (1996). A Comparative Analysis of Different Estimators for Dynamic Panel Data Models. Working Paper 04/96, Department of Econometrics and Business Statistics, Monash University.
Harris, M., Longmire, R., & Matyas, L. (1996). Robustness of Estimators for Dynamic Panel Data Models to Misspecification. Working Paper 14/96, Department of Econometrics and Business Statistics, Monash University.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Islam, N. (1993). Estimation of Dynamic Models from Panel Data. Unpublished Ph.D. Dissertation, Department of Economics, Harvard University.
Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, CX, 1127–1170.
Judson, R. A., & Owen, A. L. (1997). Estimating Dynamic Panel Data Models: Practical Guide for Macroeconomists. Board of Governors of the Federal Reserve System, Finance and Economics Discussion Paper Series 1997/03.
Kiviet, J. (1995). On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models. Journal of Econometrics, 68, 53–78.
Knight, M., Loayza, N., & Villanueva, D. (1993). Testing the Neoclassical Theory of Economic Growth: A Panel Data Approach. IMF Staff Papers, 40(3), 512–541.
Lee, K., Pesaran, H., & Smith, R. (1997). Growth and Convergence in a Multi-Country Empirical Stochastic Growth Model. Journal of Applied Econometrics, 12, 357–392.
Lee, K., Pesaran, H., & Smith, R. (1998). Growth Empirics: A Panel Data Approach – A Comment. Quarterly Journal of Economics, CXIII, 319–323.
Lee, M., Longmire, R., Matyas, L., & Harris, M. (1998). Growth Convergence: Some Panel Evidence. Applied Economics, 30, 907–912.
Mankiw, N. G. (1995). The Growth of Nations. Brookings Papers on Economic Activity, 1, 275–310.
Mankiw, N. G., Romer, D., & Weil, D. (1992). A Contribution to the Empirics of Growth. Quarterly Journal of Economics, CVII, 407–437.
Matyas, L. (Ed.) (1999). Generalized Method of Moments Estimation. Cambridge: Cambridge University Press.
Mundlak, Y. (1971). On the Pooling of Time Series and Cross-section Data. Econometrica, XXXVI, 69–85.
Nelson, C. R., & Startz, R. (1990). Some Further Results on the Exact Small Sample Properties of the Instrumental Variables Estimator. Econometrica, 58, 967–976.
Nerlove, M. (1967). Experimental Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross-sections. Economic Studies Quarterly, 18, 42–74.
Nerlove, M. (1971). Further Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross-sections. Econometrica, 39, 383–396.
Nerlove, M. (1999). Properties of Alternative Estimators of Dynamic Panel Models: An Empirical Analysis of Cross-country Data for the Study of Economic Growth. In: C. Hsiao, K. Lahiri, L. Lee & M. Pesaran (Eds), Analysis of Panel and Limited Dependent Variable Models. Cambridge: Cambridge University Press.
Nickel, S. (1979). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1399–1416.
Staiger, D., & Stock, J. H. (1997). Instrumental Variable Regressions with Weak Instruments. Econometrica, 65, 557–586.
Summers, R., & Heston, A. (1988). A New Set of International Comparisons of Real Product and Price Levels Estimates for 130 Countries, 1950–85. Review of Income and Wealth, XXXIV, 1–26.
Summers, R., & Heston, A. (1991). The Penn World Table (Mark 5): An Expanded Set of International Comparisons, 1950–1988. Quarterly Journal of Economics, 106, 327–368.
Wang, J., & Zivot, E. (1998). Inference on Structural Parameters in Instrumental Variables Regression with Weak Instruments. Econometrica, 66(6), 1389–1404.
Wansbeek, T. J., & Knaap, T. (1998). Estimating a Dynamic Panel Data Model with Heterogeneous Trends. Working paper, Department of Economics, University of Groningen.
Ziliak, J. P. (1997). Efficient Estimation with Panel Data When Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators. Journal of Business and Economic Statistics, 15, 419–431.
