
Econometrics Journal (2004), volume 7, pp. 585–617.

A comparison of autoregressive distributed lag and dynamic OLS cointegration estimators in the case of a serially correlated cointegration error
EKATERINI PANOPOULOU AND NIKITAS PITTIS

University of Piraeus, Department of Banking and Financial Management, 80 M. Karaoli and A. Dimitriou str., 18534 Piraeus, Greece
E-mail: npittis@unipi.gr
Received: September 2004

Summary This paper deals with a family of parametric, single-equation cointegration
estimators that arise in the context of the autoregressive distributed lag (ADL) models. We
particularly focus on a subclass of the ADL models, those that do not involve lagged values
of the dependent variable, referred to as augmented static (AS) models. The general ADL
and the restricted AS models give rise to the ADL and dynamic OLS (DOLS) estimators,
respectively. The relative performance of these estimators is assessed by means of Monte
Carlo simulations in the context of a triangular data generation process (DGP) where the
cointegration error and the error that drives the regressor follow a VAR(1) process. The results
suggest that ADL fares consistently better than DOLS, both in terms of estimation precision
and reliability of statistical inferences. This is due to the fact that DOLS, as opposed to ADL,
does not fully correct for the second-order asymptotic bias effects of cointegration, since
a truncation bias always remains. As a result, the performance of DOLS approaches that
of ADL, as the number of lagged values of the first difference of the regressor in the AS
model increases. Another set of Monte Carlo simulations suggests that the commonly used
information criteria select the correct order of the ADL model quite frequently, thus making
the employment of ADL over DOLS quite appealing and feasible. Additional results suggest
that ADL re-emerges as the optimal estimator within a wider class of asymptotically efficient
estimators including, apart from DOLS, the semiparametric fully modified least squares
(FMLS) estimator of Phillips and Hansen (1990, Review of Economic Studies 57, 99–125), the
non-linear parametric estimator (PL) of Phillips and Loretan (1991, Review of Economic Studies
58, 407–36) and the system-based maximum likelihood estimator (JOH) of Johansen (1991,
Econometrica 59, 1551–80). All the aforementioned results are robust to alternative models
for the error term, such as vector autoregressions of higher order, or vector moving average
processes.

Key words: ADL, DOLS, FMLS, Cointegration estimators, Information criteria.

Corresponding author.

© Royal Economic Society 2004. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA, 02148, USA.


1. INTRODUCTION
Since the seminal paper of Engle and Granger (1987), the concept of cointegration has attracted a
great deal of attention in both the theoretical and the applied econometric literature. Efficient
cointegration estimators, either in a single or in a system-of-equations framework, are now
available with well-known asymptotic properties. An interesting aspect of cointegration is
that single-equation methods are immune to the classical problem of the endogeneity of the
regressor(s). That is, the OLS estimator converges at rate T, where T is the sample size,
regardless of the correlation structure between the cointegration error and the regressor (see Stock
1987). However, long-run correlation and/or endogeneity problems are still encountered when
statistical inference on the cointegration vector is conducted. In the presence of contemporaneous
and/or temporal correlation between the cointegration error and the regressor, the asymptotic
distribution for the OLS estimator does not belong to the local asymptotic mixtures of normal
(LAMN) family and depends on nuisance parameters (see Park and Phillips 1988; Phillips 1988;
Sims et al. 1990; Phillips and Loretan 1991).
Various single-equation estimation methods dealing with the second-order effects, either
parametrically or non-parametrically, have been suggested in the literature (see, e.g. Phillips
and Hansen 1990; Stock and Watson 1993). The parametric methods attempt to estimate the
long-run parameters in the context of a dynamic model, in which the regression error forms a
martingale difference sequence with respect to a selected information set. The resulting models
fall into the category of the Hendry-style autoregressive distributed lag (ADL) models, which
encompass the error correction models (ECM) as a special case (see Hendry et al. 1984; Banerjee
et al. 1993; Pesaran and Shin 1999). In empirical applications, however, the ADL class of models
is rarely employed. Instead, applied researchers seem to favour a subclass of the ADL family,
namely those models that do not involve lagged values of the dependent variable, say yt . These
models can be thought of as arising from the static equation of yt on xt augmented by current and
past values of the first difference of the regressor.1 We will refer to these models as the augmented
static (AS) models. Estimation of the cointegration vector in the context of the AS models by
means of least squares is asymptotically optimal and the resulting estimator is usually referred to
as the dynamic ordinary least squares (DOLS) estimator (see Stock and Watson 1993). In other
words, for optimal parametric inference we do not have to employ the full dynamic ADL model;
instead the AS model suffices. This is due to the fact that the AS model is based on the projection
of the cointegration error on the current, and past values of the error that drives the regressor (say,
set A), that is, it involves all the necessary parametric corrections for removing the second-order
effects.2 On the other hand, the ADL model is based on the projection of the cointegration error
on the full information set (say, set B) that is on set A, plus the past values of the cointegration
error. This in turn implies that the AS and ADL models differ in two respects. First, the error in the
AS model, as opposed to the error in the ADL model is, in general, serially correlated. This is not
a major problem, provided that the long-run variance of the error in the AS model is consistently

1 The discussion refers to the case that there are no feedbacks from the cointegration error to the error that drives the
regressor. In the case that the cointegration error Granger causes the regressor's error, the generating mechanism for the
latter is not fully estimated. In such a case, further augmentation of the ADL model by the leads of the regressor restores
strong exogeneity and removes the second-order asymptotic bias (see Phillips and Loretan 1991; Saikkonen 1991; Stock
and Watson 1993; Pesaran and Shin 1999).
2 This is true under the assumption that the cointegration error does not Granger cause the error that drives the regressor.
We relax this assumption in the third section of the paper.



estimated (see Kramer 1986; Park and Phillips 1988). Second, and more importantly, in the cases
that the cointegration error and the error that drives the regressor follow a vector autoregressive
process of order m (VAR(m)), the projection of the cointegration error on set B is summarized in
terms of a small number of variables. On the other hand, the projection of the cointegration error
on set A results in an infinite weighted sum of current and past values of the error that drives the
regressor. In practice, of course, this infinite sum is truncated at a specific lag, say p, so there is
always a truncation remainder, which represents the second-order effects that have not been taken
into account. Therefore, the ADL model, utilizing the exact projection of the cointegration error
on set B, offers a better framework for estimating the cointegration vector than the AS model that
utilizes an approximate projection of the cointegration error on set A.
The preceding discussion implies that the relative performance of ADL against DOLS is
likely to depend on the specific parametric model that generates the errors. For example, if the
error-generating mechanism is a vector moving average (VMA) process, then the performance
of the ADL estimator in finite samples is likely to be comparable to that of DOLS. A direct
implication of the VMA assumption is that the memory of the cointegration error is designed to
be extremely short. This, however, does not seem to be the case, when actual data are used. In most
macroeconomic applications, the equilibrium error seems to exhibit a rather long memory. In fact,
sometimes it is difficult to distinguish between such a highly persistent error and a nonstationary
one. In view of this, it is natural to compare ADL and DOLS within a framework that is capable
of reproducing the observed behaviour of the cointegration error. Stock and Watson (1993) (SW,
henceforth) specify a VAR(1) model for the errors, which does give rise to a highly persistent
cointegration error. Their designs, however, are such that the truncation bias of DOLS is zero,
thus favouring the DOLS estimator against its competitors.3
In this paper, we follow SW and employ a triangular data generation process (DGP) assuming
that the cointegration error and the error that drives the regressor follow a VAR(1) process
with normal innovations. The purpose of this paper is to compare the performance of the ADL
and DOLS estimators when the cointegration error exhibits various degrees of persistence. The
parameter that controls the persistence of the cointegration error also controls the truncation bias
of the DOLS estimator. The performance of the estimators under consideration is assessed via
Monte Carlo simulations. The results confirm the superiority of the ADL estimator over DOLS
for all possible scenarios on the persistence of the cointegration error and the Granger-causality
structure between the cointegration error and the error that drives the regressor. In fact, in most
cases, the limiting performance of DOLS, as the number of lagged values of the first difference of
the regressor in the AS model increases, seems to be that of ADL. These results strongly suggest
the employment of the ADL estimator, provided that the correct order of the model is selected.
In this respect, additional Monte Carlo simulations suggest that the commonly used information
criteria are capable of delivering the correct order of ADL at a satisfactory frequency. Another set
of simulations suggests that all the aforementioned results favouring the ADL estimator are robust
to alternative error processes, such as VAR(2) or even VMA(1) processes.
This paper is organized as follows. Section 2 introduces the DGP and derives the ADL and
AS models, as well as the conditions that render them equivalent. Section 3 reports the Monte
Carlo results. For completeness, we also report simulation evidence on the performance of some

3 SW consider parameter settings such that the truncation effect is zero (cases A and B in pp.79579). However, these
authors are not interested in comparing the DOLS estimator with the more general ADL estimator. Their concern lies in
examining the performance of the DOLS estimator against that of some other commonly used estimators.



other commonly used estimators, such as the semiparametric fully modified least squares (FMLS)
estimator of Phillips and Hansen (1990), the non-linear-in-parameters estimator of Phillips and
Loretan (1991), henceforth (PL), which utilizes the same dynamic structure as ADL,
and the system-based estimator of Johansen (1988, 1991), henceforth (JOH). Within this broader
set of alternative estimators, ADL re-emerges as the optimal estimator, closely followed by the
PL estimator. Section 4 concludes the paper by briefly summarizing our main results.

2. MODELS AND ESTIMATORS


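Let $z_t$ and $u_t$ be two bivariate processes, with $z_t = [y_t, x_t]'$ and $u_t = [u_{1t}, u_{2t}]'$. We further
assume that $u_t$ is a VAR(1) process, driven by $e_t = [e_{1t}, e_{2t}]'$, and the generating mechanism for
$y_t$ is given by the system

$$y_t = \theta x_t + u_{1t} \tag{1}$$

$$\Delta x_t = u_{2t} \tag{2}$$

$$\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} u_{1t-1} \\ u_{2t-1} \end{bmatrix} + \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}, \quad a_{21} = 0 \tag{3}$$

and

$$\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} \sim NIID\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} \right) \tag{4}$$

for t = 1, 2, . . . , T.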
Both eigenvalues of the matrix $A = [a_{ij}]$, i, j = 1, 2, are assumed to be less than 1 in modulus, in
order for $y_t$ and $x_t$ to be I(1) variables, and the cointegration error to be an I(0) process. The long-run
covariance matrix $\Omega$ and the one-sided covariance matrix $\Delta$, needed to define the asymptotic
nuisance parameters, are given by equations (5) and (6), respectively,

$$\Omega = (I - A)^{-1} \Sigma (I - A')^{-1} \tag{5}$$

$$\Delta = G (I - A')^{-1}, \tag{6}$$

where $\Sigma$ denotes the innovations covariance matrix of the VAR and G is the unconditional
covariance matrix of $u_t$ given by

$$\mathrm{vec}\, G = (I - A \otimes A)^{-1} \mathrm{vec}\, \Sigma. \tag{7}$$
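As a numerical illustration of (5)–(7), the following Python sketch computes the long-run covariance matrix (written Omega here), the one-sided covariance matrix (Delta) and the unconditional covariance matrix G for a given A and innovations covariance Sigma. It is a minimal sketch under stated assumptions, not the authors' code: all function names are hypothetical, and G is obtained by iterating the fixed point G = A G A' + Sigma rather than by inverting I − A⊗A as in (7).

```python
# Sketch: nuisance-parameter matrices for a bivariate VAR(1) error process.
# All helper names are hypothetical; matrices are 2x2 nested lists.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def inv2(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def nuisance_matrices(A, Sigma, n_iter=2000):
    """Return (Omega, Delta, G) for u_t = A u_{t-1} + e_t, Var(e_t) = Sigma."""
    I_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)]
                 for i in range(2)]
    B = inv2(I_minus_A)                               # (I - A)^{-1}
    Omega = matmul(matmul(B, Sigma), transpose(B))    # eq. (5)
    # Assumption of this sketch: solve G = A G A' + Sigma by fixed-point
    # iteration, which converges when A has eigenvalues inside the unit circle.
    G = [row[:] for row in Sigma]
    for _ in range(n_iter):
        AGA = matmul(matmul(A, G), transpose(A))
        G = [[AGA[i][j] + Sigma[i][j] for j in range(2)] for i in range(2)]
    Delta = matmul(G, transpose(B))                   # eq. (6), (I - A')^{-1} = B'
    return Omega, Delta, G
```

With the leading-case values used later in the paper (a12 = 0.5, a21 = a22 = 0, unit innovation variances, innovation covariance 0.7) and a11 = 0.6, both `Omega[0][1] / Omega[1][1]` and `Delta[1][0]` evaluate to about 3.0, which agrees with (0.5 + 0.7)/(1 − 0.6).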

An early result by Stock (1987) shows that the OLS estimator of $\theta$ obtained from (1) is super-consistent,
regardless of the presence of temporal and/or contemporaneous correlation between
the regression error, $u_{1t}$, and the error that drives the regressor, $u_{2t}$. On the other hand, in general,
the asymptotic distribution of the OLS estimator of $\theta$ falls outside the local asymptotic mixture
of normals (LAMN) family and contains nuisance parameters. The reason for the presence of
non-standard asymptotics is that in the presence of contemporaneous and temporal correlation
between the elements of $u_t$, two types of second-order asymptotic effects are present in the
limiting distribution of the OLS estimator (see Phillips and Loretan 1991). The first is the nuisance
parameter, $\omega_{12}/\omega_{22}$, that describes the long-run correlation effect, due to non-diagonality of
the long-run covariance matrix $\Omega = [\omega_{ij}]$, i, j = 1, 2. The second is the nuisance parameter



$\Delta_{21} = \sum_{k=0}^{\infty} E(u_{20} u_{1k})$ that describes the endogeneity effect. In the present case, where there
are no feedbacks from the cointegration error to the error that drives the regressor ($a_{21} = 0$), both
nuisance parameters have the same source, namely the contemporaneous correlation between $u_{1t}$
and $u_{2t}$ and the temporal correlation between $u_{2t-i}$, i = 1, 2, . . . and $u_{1t}$.
In order to remove the second-order effects parametrically, we must employ a new regression
model whose error term is orthogonal to $u_{2t}$ and $u_{2t-i}$, i = 1, 2, . . . This can be done by employing
the conditional expectation of $u_{1t}$ either on the current and past values of $u_{2t}$ (set A) or on the
current and past values of $u_{2t}$ plus the past values of $u_{1t}$ (set B). As mentioned in the Introduction,
the first and second conditioning information sets result in the AS and ADL models, respectively.
Next, we show how the AS and ADL models are actually derived, starting from the latter.
2.1. The ADL estimator based on the ADL model
The full system (1) and (2), with errors specified by (3) and (4), implies the following conditional
density of $y_t$, for the most general case with $a_{21} \neq 0$:

$$D(y_t \mid x_t, z^0_{t-1}; \lambda_1) = N(\beta_1 x_t + c_1 y_{t-1} + c_2 x_{t-1} + c_3 x_{t-2}, \sigma_v^2), \tag{8}$$

where $\lambda_1 \equiv (\beta_1, c_1, c_2, c_3, \sigma_v^2)$ and
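$$\beta_1 = \theta + \frac{\sigma_{12}}{\sigma_{22}}, \tag{9}$$

$$c_1 = a_{11} - a_{21} \frac{\sigma_{12}}{\sigma_{22}}, \tag{10}$$

$$c_2 = a_{12} - \frac{\sigma_{12}}{\sigma_{22}} (a_{22} + 1 - \theta a_{21}) - \theta a_{11}, \tag{11}$$

$$c_3 = a_{22} \frac{\sigma_{12}}{\sigma_{22}} - a_{12}, \tag{12}$$

$$\sigma_v^2 = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}. \tag{13}$$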

This conditional model can be written as the ADL(q,r) regression, with orders (q,r) = (1, 2):

$$y_t = \beta_1 x_t + c_1 y_{t-1} + c_2 x_{t-1} + c_3 x_{t-2} + v_t. \tag{14}$$

The new error term, $v_t$, is now orthogonal to $u_{2t}, u_{t-1}, u_{t-2}, \ldots$, and its variance is equal to

$$\sigma_v^2 = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}. \tag{15}$$

In the context of the ADL(1, 2) model the cointegration parameter $\theta$ is equal to the long-run
multiplier of $y_t$ with respect to $x_t$, that is

$$\theta = \frac{\beta_1 + c_2 + c_3}{1 - c_1}. \tag{16}$$
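The mapping from the DGP parameters to the ADL(1,2) coefficients in (9)–(13), and the long-run multiplier in (16), can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, `theta` stands for the cointegration parameter and `beta1` for the coefficient on the current regressor (labels chosen for this sketch), and `s11`, `s12`, `s22` denote the elements of the innovations covariance matrix.

```python
# Sketch: ADL(1,2) coefficients implied by the triangular DGP with VAR(1) errors.
# Names are hypothetical labels for the quantities in equations (9)-(13) and (16).

def adl_coefficients(theta, a11, a12, a21, a22, s11, s12, s22):
    r = s12 / s22                                   # regression coefficient of e1 on e2
    beta1 = theta + r                               # eq. (9)
    c1 = a11 - a21 * r                              # eq. (10)
    c2 = a12 - r * (a22 + 1.0 - theta * a21) - theta * a11   # eq. (11)
    c3 = a22 * r - a12                              # eq. (12)
    sigma_v2 = s11 - s12 ** 2 / s22                 # eq. (13)
    return beta1, c1, c2, c3, sigma_v2

def long_run_multiplier(beta1, c1, c2, c3):
    return (beta1 + c2 + c3) / (1.0 - c1)           # eq. (16)
```

For any admissible parameter values the long-run multiplier computed from the four ADL coefficients returns the `theta` that was fed in, which is the algebraic content of (16).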

This is a relationship between the parameter of interest and the parameters of the conditional
model alone, suggesting that it meets the first condition for $x_t$ to be weakly exogenous for $\theta$,


in the sense of Engle et al. (1983).4 This means that we can always estimate (14) by OLS and
then use (16) to obtain an efficient estimate of $\theta$. However, additional computations are required
to obtain the variance of this estimate (see Banerjee et al. 1993). A more convenient approach,
proposed by Bewley (1979), transforms the model (14) in such a way that a point estimate of $\theta$
and its variance can be obtained directly. After some algebraic manipulation, model (14) can be
equivalently written as

$$y_t = \phi_0 \Delta y_t + \theta x_t + \psi_0 \Delta x_t + \psi_1 \Delta x_{t-1} + \eta_t, \tag{22}$$

where

$$\phi_0 = -\frac{c_1}{1 - c_1}, \quad \psi_0 = -\frac{c_2 + c_3}{1 - c_1}, \quad \psi_1 = -\frac{c_3}{1 - c_1}, \quad \eta_t = \frac{1}{1 - c_1} v_t.$$

Estimates of the coefficients and their standard errors can be obtained by using the instrumental
variables (IV) estimator, with the original matrix of regressors being the instrumental variables
(see Wickens and Breusch 1988). This means that the ADL estimator of $\theta$ is very easy to apply
since it involves only IV estimation techniques.
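The algebra behind the Bewley form can be verified on arbitrary numbers. The sketch below is a hedged illustration: the names `phi0`, `psi0` and `psi1` are labels chosen here for the coefficients on the differenced terms, with the minus signs folded into the coefficients, which is an assumption of this sketch about the sign convention. It computes one value of the dependent variable from the ADL(1,2) equation (14) with a zero error, then checks that the transformed equation reproduces it.

```python
# Sketch: Bewley-type reparameterisation of an ADL(1,2) equation.
# Coefficient names and sign convention are assumptions of this illustration.

def bewley_coefficients(beta1, c1, c2, c3):
    d = 1.0 - c1
    theta = (beta1 + c2 + c3) / d     # long-run multiplier, eq. (16)
    phi0 = -c1 / d                    # coefficient on the first difference of y_t
    psi0 = -(c2 + c3) / d             # coefficient on the first difference of x_t
    psi1 = -c3 / d                    # coefficient on the lagged first difference of x
    return theta, phi0, psi0, psi1

def adl_value(beta1, c1, c2, c3, y_lag, x, x_lag1, x_lag2):
    """Right-hand side of the ADL(1,2) equation with the error set to zero."""
    return beta1 * x + c1 * y_lag + c2 * x_lag1 + c3 * x_lag2

def bewley_value(theta, phi0, psi0, psi1, y, y_lag, x, x_lag1, x_lag2):
    """Right-hand side of the transformed equation with the error set to zero."""
    return (phi0 * (y - y_lag) + theta * x
            + psi0 * (x - x_lag1) + psi1 * (x_lag1 - x_lag2))
```

Because the current value of the dependent variable appears on the right-hand side through its first difference, the transformed equation cannot be estimated consistently by OLS, which is why the text turns to instrumental variables.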
2.2. The DOLS estimator based on the AS model
The ADL model, derived above, may be thought of as arising from projecting $u_{1t}$ on the full
information set $B = (u_{2t}, u_{t-1}, u_{t-2}, \ldots)$, that is

$$E(u_{1t} \mid B) = \frac{\sigma_{12}}{\sigma_{22}} e_{2t} + a_{11} u_{1t-1} + a_{12} u_{2t-1}. \tag{23}$$

As already mentioned, the second-order effects can be dealt with by projecting $u_{1t}$ on a subset
of this set, namely $A = (u_{2t}, u_{2t-1}, u_{2t-2}, \ldots)$, $A \subset B$. The resulting conditional expectation
involves an infinite sum,

$$E(u_{1t} \mid A) = \sum_{i=0}^{\infty} \pi_i u_{2t-i}, \tag{24}$$

4 The second condition for weak exogeneity requires the parameters of the conditional model and those of the marginal
model to be variation-free (see Engle et al. 1983). In the present case, the marginal density of $x_t$ is given by

$$D(x_t \mid z^0_{t-1}; \lambda_2) = N(\delta_1 x_{t-1} + \delta_2 x_{t-2} + \delta_3 y_{t-1}, \sigma_{22}), \tag{17}$$

where $\lambda_2 \equiv (\delta_1, \delta_2, \delta_3, \sigma_{22})$ and

$$\delta_1 = 1 - \theta a_{21} + a_{22}, \tag{18}$$

$$\delta_2 = -a_{22}, \tag{19}$$

$$\delta_3 = a_{21}. \tag{20}$$

The variation-free condition between $\lambda_1$ and $\lambda_2$ is achieved in the case that $a_{21} = 0$. This is because, in general, $\lambda_1$ and
$\lambda_2$ are not variation free, due to the following cross restriction between the elements of $\lambda_1$ and $\lambda_2$,

$$(\beta_1 + c_2 + c_3)\, \delta_3 = (1 - c_1)(1 - \delta_2 - \delta_1). \tag{21}$$

On the other hand, if $a_{21} = 0$, variation freeness is restored, $x_t$ becomes weakly exogenous for $\theta$ and OLS on (14) will
give a (super) consistent and asymptotically mixed normal estimate of $\theta$.


where $\pi_i$ are functions of the parameters in (3) and (4).5 This conditional expectation does not
admit a parsimonious representation analogous to (23). On the other hand, it allows for direct
substitution of this expression into (1), thus yielding the AS model

$$y_t = \theta x_t + \sum_{i=0}^{\infty} \pi_i \Delta x_{t-i} + \zeta_t, \tag{25}$$

where $\zeta_t$ is, in general, a serially correlated error term. In particular, $\zeta_t$ follows the AR(1) model

$$\zeta_t = -\mu_2 \zeta_{t-1} + v_t, \tag{26}$$

where $\mu_2$ is the MA coefficient in the ARMA(2,1) representation of $u_{2t}$. Specifically, the univariate
representation for $u_{2t}$ with $a_{21} = 0$ is

$$u_{2t} - (a_{11} + a_{22}) u_{2t-1} + a_{11} a_{22} u_{2t-2} = \varepsilon_{2t} + \mu_2 \varepsilon_{2t-1}, \tag{27}$$

where $\mu_2$ solves

$$\mu_2^2\, a_{11} \sigma_{22} + \mu_2\, \sigma_{22} (1 + a_{11}^2) + a_{11} \sigma_{22} = 0. \tag{28}$$

The last three relationships suggest that the degree of serial correlation in the error of the AS
model is controlled by $a_{11}$. This is because in the case of $a_{11} = 0$, the coefficient $\mu_2$ in (26) is
zero, thus yielding a serially uncorrelated error in the AS model.6 The serial correlation of $\zeta_t$
does not raise any serious problems in the estimation of $\theta$, provided that a consistent estimator of
the long-run variance of $\zeta_t$ is employed, such as the one proposed by Newey and West (1987).
Alternatively, the application of generalized least squares (GLS) on (25) ensures valid asymptotic
inferences on $\theta$.7 In practice, however, the second term on the right-hand side of (25) has to
be replaced by an approximation in which the infinite sum is truncated at i = p. The resulting
AS(p) model accommodates a truncation remainder that is likely to increase the bias of the DOLS
estimator of $\theta$. This bias grows with the parameter $a_{11}$, which mainly controls the persistence
of the cointegration error. Increasing the truncation point reduces the DOLS bias, but increases
its variance. Moreover, estimating (25) by OLS is not feasible if p is too large compared to the
sample size. Saikkonen (1991) specifies an upper bound for the rate at which p is allowed to
increase with the sample size T, which is given by the condition $p^3/T \to 0$. Nevertheless, this
condition cannot be used to define the optimal value of p for any given sample size.
Finally, it is easy to show that when $a_{11} = 0$, the ADL model reduces to the AS model. In
this case, the ADL(q, r) and AS(p) models, implied by this specific DGP, are the ADL(0, 2) and
AS(1) models, respectively.
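To make the truncation argument concrete, the following Python sketch evaluates the projection weights and the sum of the weights that an AS(p) model leaves out. It is a rough illustration under explicit assumptions: it covers only the special case $a_{21} = a_{22} = 0$, where the weights quoted in footnote 5 reduce to a first weight of $\sigma_{12}/\sigma_{22}$ followed by a geometric sequence with ratio $a_{11}$; the function names are hypothetical; and the omitted-weight sum is a proxy for the size of the truncation remainder, not the estimator's bias itself.

```python
# Sketch: projection weights and truncation tail of the AS model,
# restricted to a21 = a22 = 0. All function names are hypothetical.

def as_weights(a11, a12, s12, s22, n):
    """Weights pi_0, ..., pi_n on the current and lagged regressor errors."""
    r = s12 / s22
    weights = [r]                 # pi_0
    w = a11 * r + a12             # pi_1
    for _ in range(n):
        weights.append(w)
        w *= a11                  # pi_{i+1} = a11 * pi_i for i >= 1
    return weights

def truncation_tail(a11, a12, s12, s22, p):
    """Sum of the omitted weights pi_{p+1} + pi_{p+2} + ... (geometric series)."""
    return (a11 * s12 / s22 + a12) * a11 ** p / (1.0 - a11)

def lags_needed(a11, a12, s12, s22, tol):
    """Smallest p for which the omitted-weight sum is at most tol in absolute value."""
    p = 0
    while abs(truncation_tail(a11, a12, s12, s22, p)) > tol:
        p += 1
    return p
```

Calling `lags_needed` with a fixed tolerance for increasing values of `a11` shows the required lag length rising with the persistence of the cointegration error, in line with the argument that the truncation remainder matters most when $a_{11}$ is large.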

3. SIMULATION RESULTS
In this section, we attempt to quantify the cost of employing the AS(p) instead of the ADL(q,r)
model for the estimation of $\theta$ by means of Monte Carlo simulations. The OLS and IV estimators

5 It is easy to show that $\pi_0 = \sigma_{12}/\sigma_{22}$, $\pi_1 = a_{11}\sigma_{12}/\sigma_{22} + a_{12}$, $\pi_2 = a_{11}^2 \sigma_{12}/\sigma_{22} + a_{11} a_{12} + a_{12} a_{22}$, . . . , when $a_{21} = 0$.
6 See also Stock and Watson (1993, p. 798), for a similar discussion on this issue, for the general case with $a_{21} \neq 0$.
7 Note that in the case of a linear regression which involves an I(1) strictly exogenous regressor, the OLS is asymptotically
equivalent to the GLS estimator (see Kramer 1986; Park and Phillips 1988).


applied to the AS(p) and ADL(1,2) models (22), respectively, are referred to as the DOLS(p)
and ADL(1,2) estimators. The serial correlation effect on the DOLS(p) estimator is taken into
account by means of the autocorrelation consistent covariance matrix estimator of Newey and
West (1987). The bandwidth parameter is estimated non-parametrically, according to Newey and
West (1994). Alternatively, we assume an AR(1) model for $\zeta_t$ and employ the feasible generalized
least squares estimator, referred to as the DGLS(p). The truncation parameter, p, takes values in
the interval [1, 20] by steps of 1. As mentioned in the introduction, the comparison is extended
to include some other commonly used estimators, such as the FMLS, the PL(s,l) and the JOH(z)
estimators.8 The mean bias, median bias and average root mean squared error (MSE) are used to
assess the estimators. The associated t-tests are assessed by comparing the 2.5% ($t_{0.025}$) and the
97.5% ($t_{0.975}$) points in the empirical distributions of the relevant t-statistics with those from the
standard N(0,1). Moreover, for nominal sizes of 5%, the empirical sizes of the t-tests for testing
the hypothesis $\theta = 1$ are computed. We generate 2000 series of length 150, starting with $u_{10} =
u_{20} = 0$, and then discard the initial 50 observations, thus generating a sample size of 100. Although
many other parameter settings were run, we only report the results for the leading case {$a_{12} = 0.5$,
$\sigma_{12} = 0.7$, $a_{21} = a_{22} = 0$, $\sigma_{11} = \sigma_{22} = 1$, $\theta = 1$ and $0 < a_{11} < 1$} referred to as DGP 1, because
this summarizes the main differences between the ADL(1,2) and DOLS(p)/DGLS(p) estimators.
In this case, the regressor $x_t$ is a random walk and weakly exogenous for $\theta$, in the context of
the conditional model (14). The asymptotic nuisance parameters, $\omega_{12}/\omega_{22}$ and $\Delta_{21}$, reduce to
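$$\frac{\omega_{12}}{\omega_{22}} = \Delta_{21} = \frac{a_{12} + \sigma_{12}}{1 - a_{11}}. \tag{29}$$

It is easy to show that when $a_{11} \to 1$, then

$$\frac{\omega_{12}}{\omega_{22}} = \Delta_{21} \to \begin{cases} +\infty & \text{if } a_{12} + \sigma_{12} > 0 \\ -\infty & \text{if } a_{12} + \sigma_{12} < 0. \end{cases} \tag{30}$$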

This means that the magnitude of the nuisance parameters increases with the persistence of the
cointegration error, thus amplifying the truncation effect on the DOLS(p) and DGLS(p) estimators.
The key parameter $a_{11}$ takes the values 0.3, 0.6 and 0.9. A near-to-unit root case is also examined by
setting $a_{11} = 0.95$.9 First, we focus solely on comparing ADL(1,2) with DOLS(p) and DGLS(p).
The results, concerning the mean bias, median bias and MSE of these estimators, are reported in
Figures 1A–D, 2A–D and 3A–D, respectively, and are summarized as follows:
(i) The mean (or median) bias for all the estimators, namely ADL(1,2), DOLS(p) and DGLS(p)
increases with the degree of persistence of the cointegration error.
(ii) DOLS(p) and DGLS(p) perform far worse than ADL(1,2) in bias and MSE for small values
of the truncation parameter, p. When p increases, the DOLS(p) and DGLS(p) bias converges

8 The FMLS estimator is based on consistent estimation of the matrices $\Omega$ and $\Delta$, which in turn requires the selection of
a kernel and the determination of the bandwidth. We employ the quadratic spectral kernel and determine the bandwidth
by means of the Andrews (1991) data-dependent procedure. Moreover, the prewhitened version of FMLS (PW-FMLS),
which filters the error vector $u_t$ prior to estimating $\Omega$ and $\Delta$, is also employed (see Christou and Pittis (2002) for a
discussion on the performance of the various versions of the FMLS estimator). Regarding the PL(s,l) estimator, the orders
s and l refer to the lags and leads of $\Delta x_t$, respectively. Finally, the order z in the JOH(z) estimator corresponds to the
lag-order of the vector autoregressive model on which this estimator is based.
9 Given the values of $a_{12}$, $a_{21}$ and $a_{22}$ in this design, a value of $a_{11}$ as large as 0.95 still satisfies the eigenvalue stability
condition for the VAR model of the errors.



[Figure 1. Small sample performance of ADL(1,2) versus DOLS(p)/DGLS(p). Mean bias. DGP 1 (Leading Case). Panels: A. $a_{11} = 0.3$; B. $a_{11} = 0.6$; C. $a_{11} = 0.9$; D. $a_{11} = 0.95$. Series plotted: DOLS(p), DGLS(p), ADL(1,2). Note: The lag length (p) of DOLS(p)/DGLS(p) is on the horizontal axis.]


[Figure 2. Small sample performance of ADL(1,2) versus DOLS(p)/DGLS(p). Median bias. DGP 1 (Leading Case). Panels: A. $a_{11} = 0.3$; B. $a_{11} = 0.6$; C. $a_{11} = 0.9$; D. $a_{11} = 0.95$. Series plotted: DOLS(p), DGLS(p), ADL(1,2). Note: The lag length (p) of DOLS(p)/DGLS(p) is on the horizontal axis.]


[Figure 3. Small sample performance of ADL(1,2) versus DOLS(p)/DGLS(p). MSE. DGP 1 (Leading Case). Panels: A. $a_{11} = 0.3$; B. $a_{11} = 0.6$; C. $a_{11} = 0.9$; D. $a_{11} = 0.95$. Series plotted: DOLS(p), DGLS(p), ADL(1,2). Note: The lag length (p) of DOLS(p)/DGLS(p) is on the horizontal axis.]



to that of ADL(1,2). However, the lag length, necessary to reduce the bias of DOLS(p) and
DGLS(p) towards the bias of ADL(1,2), increases with the persistence of the cointegration
error. For example, when a 11 is equal to 0.3, 0.6 and 0.9, the number of lags necessary to
bring the bias of DOLS(p) down to the level of ADL(1,2) is 4, 7 and 20, respectively. In the
near-to-unit root case, a 11 = 0.95, the performance of DOLS(20) and DGLS(20) in bias is
still much worse than that of ADL(1,2).
(iii) For small values of p, DOLS(p) fares much better than DGLS(p). When p becomes
sufficiently large, DOLS(p) and DGLS(p) become equivalent in bias and MSE.
(iv) When p increases, the rate of decrease of the bias of DOLS(p) and DGLS(p) is much higher
than the rate at which the standard deviation of these estimators increases, for all the values
of a 11 , except for a 11 = 0.3. This explains why the MSE is a decreasing function of p for
all the values of a 11 , except for a 11 = 0.3.
(v) When we increase the sample size to 300, the overall picture regarding the relative
performance of the ADL(1,2) and DOLS(p)/DGLS(p) estimators remains the same.
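The design behind these comparisons can be sketched in a few lines of Python. This is a simplified illustration, not a reproduction of the reported experiments, and it rests on several assumptions: it uses the leading-case parameter values, omits an intercept, obtains the ADL(1,2) estimate by OLS on the ADL regression followed by the long-run multiplier formula rather than by the Bewley/IV route, includes no leads in the DOLS regression, and uses far fewer replications than the 2000 reported. All function names are hypothetical.

```python
import random

def simulate_dgp1(a11, rng, T=100, burn=50, theta=1.0, a12=0.5, s12=0.7):
    """One sample (y, x) of length T with a21 = a22 = 0 and unit innovation variances."""
    u1 = u2 = x = 0.0                       # start from u_10 = u_20 = 0
    ys, xs = [], []
    for t in range(T + burn):
        e2 = rng.gauss(0.0, 1.0)
        # Build e1 so that Var(e1) = 1 and Cov(e1, e2) = s12.
        e1 = s12 * e2 + (1.0 - s12 ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        u1, u2 = a11 * u1 + a12 * u2 + e1, e2   # VAR(1) step; right side uses lagged values
        x += u2                             # the regressor is a random walk
        if t >= burn:                       # discard the first `burn` observations
            xs.append(x)
            ys.append(theta * x + u1)
    return ys, xs

def ols(rows, y):
    """OLS coefficients from the normal equations, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def adl12_theta(ys, xs):
    """Long-run multiplier from an ADL(1,2) regression without intercept."""
    rows = [[xs[t], ys[t - 1], xs[t - 1], xs[t - 2]] for t in range(2, len(ys))]
    b1, c1, c2, c3 = ols(rows, ys[2:])
    return (b1 + c2 + c3) / (1.0 - c1)

def dols_theta(ys, xs, p):
    """Coefficient on x_t in a regression on x_t and p+1 current/lagged differences."""
    dx = [0.0] + [xs[t] - xs[t - 1] for t in range(1, len(xs))]  # dx[0] is never used
    start = p + 1
    rows = [[xs[t]] + [dx[t - i] for i in range(p + 1)] for t in range(start, len(ys))]
    return ols(rows, ys[start:])[0]

def mean_bias(a11, p, reps=300, seed=1):
    """Average of (estimate - 1) over replications for ADL(1,2) and DOLS(p)."""
    rng = random.Random(seed)
    adl = dols = 0.0
    for _ in range(reps):
        ys, xs = simulate_dgp1(a11, rng)
        adl += adl12_theta(ys, xs) - 1.0
        dols += dols_theta(ys, xs, p) - 1.0
    return adl / reps, dols / reps
```

Calling `mean_bias(a11, p)` over a grid of `a11` and `p` gives a rough counterpart to the mean-bias comparison; the magnitudes will differ from those reported because of the simplifications listed above.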
Next, we compare the ADL(1,2) estimator, which so far has emerged as the best estimator,
with the rest of the estimators under scrutiny. For the DGP under study, the optimal orders s, l and
z for the PL(s,l) and JOH(z) estimators are 1, 0 and 2, respectively.10 The results are reported in
Table 1 and summarized below:
(i) As expected, the performance of the PL(1,0) estimator is comparable to that of ADL(1,2),
since both estimators utilize the same dynamic structure. The JOH(2) estimator also fares
well, especially for the most persistent cases of a 11 = 0.9 and a 11 = 0.95.
(ii) The standard FMLS and, to a lesser extent, the prewhitened FMLS estimators underperform
ADL(1,2), PL(1,0) and JOH(2) for all the values of a 11 . For example, for a 11 = 0.6 the bias
of the FMLS, the PW-FMLS and the ADL(1,2) estimators is equal to 0.066, 0.0202 and
0.0017, respectively.
(iii) Comparing DOLS(p) with the PW-FMLS estimator yields ambiguous results. For a 11 = 0.3
and a 11 = 0.6, DOLS(p) dominates PW-FMLS in terms of bias for all but very small values
of p. For $a_{11} = 0.95$, however, the opposite is true; the PW-FMLS estimator is less biased
than DOLS(p) for all the values of p that are less than or equal to 13.
We now turn to the problem of inference by examining the empirical distribution of the
estimators' t-statistics as well as the corresponding empirical sizes for testing the hypothesis
$\theta = 1$. Table 2 reports the 2.5% ($t_{0.025}$) and 97.5% ($t_{0.975}$) points of the empirical distribution of
the t-statistics for all the estimators under consideration and for the four values of a 11 . Again
we start the comparisons by focusing on the ADL(1,2), DOLS(p) and DGLS(p) estimators. The
results suggest that the DOLS(p) and DGLS(p) t-statistics are not, in general, well approximated
by a standard N(0,1), even when a sufficiently large value of p is employed. On the other hand,
the ADL(1,2) t-statistic is much better approximated by the standard N(0,1), especially when
the persistence of the cointegration error is not particularly high. Moreover, the value of p that
minimizes the bias of DOLS(p) and DGLS(p) does not always coincide with the value of p
that minimizes the distributional divergence of the corresponding t-statistics from the standard

10 For the prewhitened version (PW) of FMLS, a VAR(1) model is used as the filter for prewhitening residuals. That is,
the VAR-filter coincides with the true model for u t , thus creating the best case environment for the performance of the
PW-FMLS estimator.




Table 1. Small sample performance of alternative estimators. DGP 1 (Leading Case).

Panel A: $a_{11} = 0.3$
Estimator    Mean bias    Median bias    MSE       Size (%)
ADL(1,2)     0.0012       0.0006         0.0012     5.75
PL(1,0)      0.0012       0.0006         0.0012     5.75
JOH(2)       0.0012       0.0059         0.0009     0.75
FMLS         0.0269       0.0179         0.0032    14.75
PW-FMLS      0.0099       0.0066         0.0017     8.15

Panel B: $a_{11} = 0.6$
Estimator    Mean bias    Median bias    MSE       Size (%)
ADL(1,2)     0.0017       0.0016         0.0041     7.10
PL(1,0)      0.0018       0.0017         0.0041     7.05
JOH(2)       0.0077       0.0044         0.0062     1.55
FMLS         0.0660       0.0460         0.0140    26.25
PW-FMLS      0.0202       0.0142         0.0059    11.35

Panel C: $a_{11} = 0.9$
Estimator    Mean bias    Median bias    MSE       Size (%)
ADL(1,2)     0.0372       0.0399         0.0882    15.10
PL(1,0)      0.0397       0.0447         0.0890    15.35
JOH(2)       0.0336       0.0244         0.1724    14.10
FMLS         0.3202       0.2972         0.1981    63.20
PW-FMLS      0.1409       0.1188         0.1457    29.30

Panel D: $a_{11} = 0.95$
Estimator    Mean bias    Median bias    MSE       Size (%)
ADL(1,2)     0.1037       0.1574         0.3659    27.90
PL(1,0)      0.1132       0.1670         0.3580    27.80
JOH(2)       0.0965       0.1082         0.4100    30.05
FMLS         0.5170       0.5168         0.4711    74.60
PW-FMLS      0.3315       0.3199         0.7436    46.15


N(0,1). For example, for a moderately persistent cointegration error, that is for $a_{11} = 0.6$, the bias
of DOLS(p) reaches the level of the ADL(1,2) for p = 7. For this value of p the 2.5% and 97.5%
points of the corresponding t-statistic distribution are -2.9 and 2.9, respectively. The situation
deteriorates for higher values of $a_{11}$. For $a_{11} = 0.9$, the biases of both DOLS(p) and DGLS(p)
are minimized for p = 20, a value for which the $t_{0.025}$ and $t_{0.975}$ points are equal to -4.7 and 6.2,
respectively, for DOLS(p), and -2.3 and 3.8, respectively, for DGLS(p). More dramatic effects
occur when the cointegration error is nearly non-stationary, that is when $a_{11} = 0.95$. On the other

Table 2. Points of the empirical distribution (2.5% and 97.5%) of the estimators' t-statistics. DGP 1 (Leading Case).

                        t 0.025                              t 0.975
Estimator / a11   0.3     0.6     0.9     0.95      0.3     0.6     0.9     0.95
OLS              0.655   0.765   1.074   1.131     4.199   4.975   9.983  15.363
ADL(1,2)         2.020   1.926   1.676   1.484     2.084   2.196   3.826   6.048
PL(1,0)          2.020   1.966   1.738   4.186     2.084   2.299   3.491   5.513
JOH(2)           0.952   1.211   1.908   2.483     1.571   1.658   3.555   5.798
FMLS             1.541   1.629   2.108   2.006     3.162   4.369  15.099  38.781
PW-FMLS          1.855   1.919   2.313   2.792     2.499   2.757   5.398   8.732
DOLS 1           1.936   1.484   1.387   1.459     2.947   4.242   9.717  15.052
DOLS 2           2.271   1.882   1.613   1.617     2.577   3.815   9.712  14.800
DOLS 3           2.361   2.249   1.738   1.764     2.436   3.520   9.575  14.815
DOLS 4           2.357   2.500   2.040   2.098     2.420   3.322   9.459  14.626
DOLS 5           2.340   2.675   2.234   2.226     2.389   3.085   9.257  14.604
DOLS 6           2.454   2.784   2.449   2.420     2.429   2.999   9.015  14.441
DOLS 7           2.513   2.872   2.759   2.585     2.375   2.886   8.699  14.601
DOLS 8           2.533   2.934   2.867   2.723     2.430   2.904   8.524  14.093
DOLS 9           2.513   2.895   3.012   2.804     2.401   2.927   8.133  13.885
DOLS 10          2.514   2.871   3.068   2.992     2.318   2.851   7.762  13.723
DOLS 11          2.497   2.933   3.289   3.016     2.363   2.806   7.599  13.462
DOLS 12          2.527   2.961   3.465   3.117     2.384   2.799   7.389  13.037
DOLS 13          2.527   3.005   3.662   3.356     2.423   2.857   7.184  12.671
DOLS 14          2.574   3.042   3.745   3.565     2.461   2.894   6.936  12.367
DOLS 15          2.583   3.032   3.959   3.624     2.390   2.906   6.699  12.207
DOLS 16          2.563   3.076   4.124   3.793     2.341   2.824   6.547  12.148
DOLS 17          2.507   2.991   4.315   3.980     2.348   2.873   6.396  11.788
DOLS 18          2.533   3.047   4.535   4.147     2.388   2.855   6.304  11.601
DOLS 19          2.454   2.985   4.646   4.316     2.467   2.876   6.253  11.401
DOLS 20          2.469   2.962   4.724   4.422     2.493   2.908   6.236  11.117
DGLS 1           1.467   0.116   5.141   5.974     2.585   5.345  10.632  12.159
DGLS 2           1.899   0.984   3.659   4.602     2.230   3.382   8.989  10.715
DGLS 3           2.049   1.424   2.641   3.688     2.086   2.777   7.605   9.578
DGLS 4           2.083   1.758   1.662   2.868     2.088   2.548   6.740   8.730
DGLS 5           2.073   1.930   0.885   2.316     2.073   2.442   6.131   8.208
DGLS 6           2.098   2.053   0.288   1.777     2.071   2.384   5.722   7.902
DGLS 7           2.115   2.118   0.255   1.159     2.051   2.293   5.395   7.605
DGLS 8           2.166   2.184   0.567   0.715     2.031   2.256   5.155   7.415
DGLS 9           2.143   2.171   0.856   0.494     2.045   2.173   4.931   7.172
DGLS 10          2.145   2.193   1.023   0.077     1.997   2.152   4.816   7.181
DGLS 11          2.161   2.237   1.241   0.206     2.049   2.196   4.626   6.966
DGLS 12          2.156   2.258   1.366   0.394     2.074   2.228   4.537   6.933
DGLS 13          2.140   2.266   1.481   0.611     2.045   2.250   4.419   6.845
DGLS 14          2.158   2.307   1.656   0.839     2.082   2.250   4.344   6.812
DGLS 15          2.171   2.279   1.850   1.003     2.071   2.219   4.284   6.671
DGLS 16          2.166   2.286   1.909   1.201     2.052   2.207   4.124   6.416
DGLS 17          2.128   2.272   2.050   1.269     2.036   2.224   4.038   6.322
DGLS 18          2.186   2.324   2.167   1.287     2.092   2.211   3.969   6.347
DGLS 19          2.131   2.251   2.245   1.481     2.070   2.176   3.910   6.290
DGLS 20          2.121   2.266   2.324   1.690     2.070   2.194   3.844   6.121


hand, for $a_{11} = 0.9$, the $t_{0.025}$ and $t_{0.975}$ points for ADL(1,2) are -1.7 and 3.8, respectively, thus
ensuring much more reliable inferences on the cointegration coefficient $\theta$. These distributional characteristics of the t-statistics
are reflected in the empirical sizes of the t-tests for testing the hypothesis $\theta = 1$. The results,
reported in Figures 4A-D, reveal large size distortions for both DOLS(p) and DGLS(p) in the
following two cases: first, when $a_{11} = 0.6$ and the value of p is relatively small; second, when
the cointegration error is highly persistent, that is when $a_{11} = 0.9$ and, even worse, when $a_{11} =
0.95$. In the second case, the size distortions are present regardless of the value of p, and yield
totally unreliable inferences. For example, for $a_{11} = 0.9$ the empirical size of DOLS(p) ranges
from 72% for p = 1 to 43% for p = 20. At the same time, the empirical size of ADL(1,2) is at
the reasonable level of 15%. Increasing the sample size to 300 yields qualitatively similar results.
For example, for $a_{11} = 0.9$, the empirical size of DOLS(p) is 67% for p = 1 and falls to 36%
for p = 20, while the size of the ADL(1,2) is at the level of 7%.
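To make the link between the tail points of the t-statistic and the empirical size concrete, the following is a minimal sketch of how both quantities can be computed from a set of simulated t-statistics. It assumes a two-sided test at the 5% nominal level with the N(0,1) critical value 1.96; the function names and the artificial "centred" and "shifted" samples are illustrative assumptions, not the authors' simulation output.

```python
import random


def empirical_quantile(values, q):
    """Return the q-th empirical quantile (0 < q < 1) as an order statistic."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


def empirical_size(t_stats, critical=1.96):
    """Share of replications in which |t| exceeds the N(0,1) critical value."""
    return sum(abs(t) > critical for t in t_stats) / len(t_stats)


rng = random.Random(0)
# Hypothetical t-statistics: one sample close to N(0,1), one shifted to the right,
# mimicking the effect of an uncorrected second-order bias.
centred = [rng.gauss(0.0, 1.0) for _ in range(20000)]
shifted = [t + 1.5 for t in centred]

for name, ts in (("centred", centred), ("shifted", shifted)):
    print(name,
          round(empirical_quantile(ts, 0.025), 2),
          round(empirical_quantile(ts, 0.975), 2),
          round(100 * empirical_size(ts), 1))
```

A rightward shift of the distribution moves both tail points up and inflates the rejection frequency well above the nominal level, which is the pattern described for DOLS(p) and DGLS(p) under high persistence.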
We now examine the issue of statistical inference in the context of the PL(1,0), the JOH(2),
the FMLS and the PW-FMLS estimators. The $t_{0.025}$ and $t_{0.975}$ values for these estimators are also
reported in Table 2, whereas the corresponding empirical sizes are reported in the last column of
Table 1. The FMLS bias, reported above, is accompanied by size distortions that become more
severe as the value of $a_{11}$ increases. For example, for $a_{11} = 0.9$, the empirical size of FMLS
and PW-FMLS is 63% and 29%, respectively, whereas the corresponding size for the ADL(1,2)
is as low as 15%. These distortions are due to the large divergence of the FMLS t-statistic from
the standard normal, occurring when the persistence of the cointegration error is high. For example,
for $a_{11} = 0.95$, the value of $t_{0.975}$ is 38.78 and 8.73 for FMLS and PW-FMLS, respectively (see
Christou and Pittis (2002) for more discussion on this point). The empirical size of PL(1,0) is
almost identical to that of ADL(1,2) for all the values of $a_{11}$. This, however, does not seem to be
the case for the JOH-based t-test, which appears to be under-sized for low and moderate degrees of
persistence of the cointegration error. Specifically, for $a_{11} = 0.3$ and $a_{11} = 0.6$ the value of $t_{0.025}$
is -0.952 and -1.211, respectively, resulting in empirical sizes that are substantially smaller than
the nominal ones.
As far as alternative parametrizations are concerned, we run the following simulations: (i) The
second-order effects arise solely from the contemporaneous correlation between the innovations
of the error, that is $a_{12} = 0$, $\sigma_{12} = 0.7$, $a_{21} = a_{22} = 0$, $\sigma_{11} = \sigma_{22} = 1$, $\theta = 1$ and $0 < a_{11} < 1$. (ii) The
error that drives the regressor Granger causes the cointegration error, but the contemporaneous
correlation between the two errors is zero, that is $a_{12} = 0.5$, $\sigma_{12} = 0$, $a_{21} = a_{22} = 0$,
Figure 4. Small sample performance of ADL(1,2) versus DOLS(p)/DGLS(p). Empirical size. DGP1
(Leading Case). Panels: A. $a_{11} = 0.3$; B. $a_{11} = 0.6$; C. $a_{11} = 0.9$; D. $a_{11} = 0.95$. Series plotted in each panel: DOLS(p), ADL(1,2), DGLS(p).
Note: The lag length (p) of DOLS(p)/DGLS(p) is on the horizontal axis.


$\sigma_{11} = \sigma_{22} = 1$, $\theta = 1$ and $0 < a_{11} < 1$. In both of these cases, the results are qualitatively similar
to those of the leading case. DOLS(p) and DGLS(p) are generally beaten by ADL(1,2) in bias and
MSE for all values of p under consideration. When p reaches a sufficiently large value, say $p^{*}$, the
performance of DOLS($p^{*}$) and DGLS($p^{*}$) approaches that of ADL(1,2). As in the leading case,
$p^{*}$ increases with the degree of persistence of the cointegration error. The problems of statistical
inference on the cointegration coefficient in both cases are very similar to those reported for the leading case. Finally, we
briefly discuss the case where the key parameter $a_{11}$ is set equal to zero, that is $a_{12} = 0.5$, $\sigma_{12} =
0.7$, $a_{21} = a_{22} = 0$, $\sigma_{11} = \sigma_{22} = 1$, $\theta = 1$ and $a_{11} = 0$. This is a case where the AS model
utilizes an exact rather than an approximate projection of $u_{1t}$ on the current and past values of
$u_{2t}$, which in turn implies that the DOLS(1) estimator utilizes the correct model, whereas the
ADL(1,2) estimator is based on a slightly overspecified model.11 The simulation results seem
to confirm the theoretical predictions. The bias and standard deviation of DOLS(1) are slightly
smaller than those of ADL(1,2). Of course, the addition of more lags of $\Delta x_t$ in the AS model
increases the variability (and the MSE) of DOLS(p), but this is something that occurs in the case
of an overparametrized ADL(q,r) model as well.
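The parametrizations discussed above differ only in the values of $a_{11}$, $a_{12}$ and the innovation covariance, so a single simulator covers them. The sketch below is a minimal illustration, assuming the triangular DGP with $a_{21} = a_{22} = 0$, unit innovation variances, a cointegration coefficient of 1, T = 100 and far fewer replications than a real study would use. It compares static OLS with a DOLS(p) regression that uses the current value and p lags of the first difference of the regressor and no leads. The function names, burn-in length, seed and replication count are illustrative assumptions and not the authors' code; the ADL estimator is omitted to keep the example short.

```python
import random
import statistics


def simulate(T, a11, a12=0.5, s12=0.7, theta=1.0, rng=random, burn=50):
    """y_t = theta*x_t + u1_t, x_t = x_{t-1} + u2_t, with
    u1_t = a11*u1_{t-1} + a12*u2_{t-1} + e1_t and u2_t = e2_t (a21 = a22 = 0)."""
    u1 = u2_prev = x = 0.0
    xs, ys = [], []
    for t in range(T + burn):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e2 = z2
        e1 = s12 * z2 + (1.0 - s12 ** 2) ** 0.5 * z1  # unit variances, covariance s12
        u1 = a11 * u1 + a12 * u2_prev + e1
        u2_prev = e2
        if t >= burn:  # drop the burn-in so u1 starts near its stationary law
            x += e2
            xs.append(x)
            ys.append(theta * x + u1)
    return xs, ys


def ols(X, y):
    """Least squares via the normal equations and Gauss-Jordan elimination."""
    k = len(X[0])
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def static_ols(xs, ys):
    """Regression of y_t on x_t alone (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def dols(xs, ys, p):
    """Regress y_t on x_t, dx_t, ..., dx_{t-p}; return the coefficient on x_t."""
    dx = [0.0] + [xs[t] - xs[t - 1] for t in range(1, len(xs))]  # dx[0] is never used
    rows = [[xs[t]] + [dx[t - j] for j in range(p + 1)] for t in range(p + 1, len(xs))]
    return ols(rows, ys[p + 1:])[0]


def median_bias(estimator, a11, reps=150, T=100, seed=1):
    """Median of (estimate - 1) over replications; the true coefficient is 1."""
    rng = random.Random(seed)
    return statistics.median(estimator(*simulate(T, a11, rng=rng)) - 1.0
                             for _ in range(reps))


bias_ols = median_bias(static_ols, 0.6)
bias_dols = median_bias(lambda xs, ys: dols(xs, ys, 8), 0.6)
print(round(bias_ols, 4), round(bias_dols, 4))
```

Because both calls use the same seed, the two estimators are applied to the same simulated samples, so the comparison is paired. Changing `a11`, `a12` or `s12` in `simulate` reproduces the other parametrizations mentioned in the text.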
3.1. Information criteria
The analysis so far seems to favour the ADL(q,r) over the DOLS(p)/DGLS(p) estimation method
for conducting inferences on the cointegration coefficient $\theta$. In fact, this estimator dominates, in some or all aspects of
statistical inference, not only the DOLS(p)/DGLS(p) estimator but also the rest of the estimators
presently under study. Throughout the analysis, we assumed that the ADL(q,r) estimator utilizes
the correct dynamic model implied by the DGP (1)-(4), that is q = 1, r = 2. In such a case,
the performance of the ADL(1,2) estimator may be thought of as the limiting performance of
the DOLS(p) or DGLS(p) estimators. Does this clearly suggest that in empirical applications
researchers should always employ ADL(q,r) for estimating $\theta$? The answer seems to be in the
affirmative, conditional, however, on the ability of researchers to determine the correct dynamic
model for each particular case, that is to select the correct values for q and r. A more realistic
experiment for measuring the benefits from employing ADL(q,r) over DOLS(p)/DGLS(p) should
incorporate the issue of selecting the lag orders (q,r) and p in the corresponding estimators. To
address this issue, we design the following experiment in the context of DGP1. We consider
the family of ADL(q,r) estimators that arise from allowing q and r to take integer values in the
interval [0,4], thus obtaining 14 ADL(q,r) estimators. In this class, and for the specific DGP under
study, ADL(0,0), ADL(0,1), ADL(1,0) and ADL(1,1) are underspecified, ADL(1,2) is correctly
specified, and the rest are overspecified. We also consider 21 DOLS(p) estimators and another
21 DGLS(p) estimators, by allowing p to take integer values in the interval [0,20]. As far as the
PL(s,l) and JOH(z) estimators are concerned, we allow s and z to take integer values in the interval
[1,4].12 Since no leads of the regressor are required for this particular DGP, we set l equal to zero.
To select the orders (q,r), p, s and z, we use the three most commonly used information criteria
for model selection, namely the Akaike (1974), the Schwarz (1978) and the Hannan and Quinn
(1979) criteria, denoted by AIC, SIC and HQ, respectively. In each replication, we select the orders
11 In this case, the error term of the AS model is serially uncorrelated, which in turn implies that neither non-parametric nor GLS-type corrections
are necessary.
12 Obviously, the problem of selecting the correct lag order is not relevant for FMLS or PW-FMLS, due to their non-parametric nature. A comparable issue concerns the selection of the optimal bandwidth by means of optimality criteria,
such as the ones suggested by Andrews (1991) or Newey and West (1994). We do not attempt to deal with this issue in
detail (see Christou and Pittis 2002).


of ADL(q,r), DOLS(p)/DGLS(p), PL(s,0) and JOH(z) by each of the three criteria and calculate
the statistics defined in the previous section. The average values of the statistics concerning
estimation precision are reported in Table 3A, whereas those on hypothesis testing are reported
in Table 3B. We also report the frequencies at which each criterion selects the orders (q, r)
and p, in Figures 5A-D, 6A-D and 7A-D for the ADL(q,r), DOLS(p) and DGLS(p) estimators,
respectively. For brevity, we do not report the selection frequencies of the orders s and z in PL(s,0)
and JOH(z), respectively, but we briefly discuss them in the text. We consider sample sizes of 100
and 300, but report the results only for the former.
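The order-selection step can be sketched as follows. The text does not spell out the exact form of the criteria, so this example assumes the common regression versions, each equal to the log residual variance plus a penalty per parameter: 2/T for AIC, ln(T)/T for SIC and 2 ln(ln T)/T for HQ. The residual sums of squares and the parameter counts (q + r + 1 for an ADL(q,r) without intercept) are hypothetical numbers chosen for illustration.

```python
import math


def info_criteria(rss, n_obs, n_params):
    """Return (AIC, SIC, HQ) as log residual variance plus a penalty term."""
    log_var = math.log(rss / n_obs)
    aic = log_var + 2.0 * n_params / n_obs
    sic = log_var + n_params * math.log(n_obs) / n_obs
    hq = log_var + 2.0 * n_params * math.log(math.log(n_obs)) / n_obs
    return aic, sic, hq


def select_order(candidates, n_obs):
    """candidates maps an order label to (rss, n_params).
    Returns the label that minimizes each criterion."""
    names = ("AIC", "SIC", "HQ")
    scores = {label: info_criteria(rss, n_obs, k)
              for label, (rss, k) in candidates.items()}
    return {name: min(scores, key=lambda lab: scores[lab][i])
            for i, name in enumerate(names)}


# Hypothetical fits of four ADL(q,r) models on T = 100 observations.
fits = {
    (1, 1): (130.0, 3),
    (1, 2): (100.0, 4),
    (2, 2): (97.5, 5),
    (4, 4): (93.0, 9),
}
chosen = select_order(fits, n_obs=100)
print(chosen)
```

With these made-up numbers the lighter AIC penalty accepts the small improvement in fit offered by the ADL(2,2), while the heavier SIC and HQ penalties keep the ADL(1,2). In a simulation, repeating this step in every replication and tallying the selected labels gives selection frequencies of the kind reported in Figures 5 to 7.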
First, we confine our discussion to the comparison between the ADL(q,r) and the DOLS(p)/
DGLS(p) estimators. The main results may be summarized as follows:
(i) Irrespective of the value of $a_{11}$, the SIC and HQ criteria select the correct specification of
the dynamic model, i.e. the ADL(1,2), in 85% of the cases, whereas the respective figure
for the AIC is 60%. Increasing the sample size to 300 raises the frequency at which the
correct ADL model is selected to 65% for AIC and to 95% for SIC and HQ.
(ii) In the context of the DOLS(p)/DGLS(p) estimators, SIC and HQ fail to select a sufficiently
large p, especially for large values of $a_{11}$. In the context of DOLS(p), the best performing
criterion is by far AIC, which tends to point towards large values of p as the persistence
of the cointegration error increases. The performance of AIC, however, is greatly reduced
in the context of the DGLS(p) estimator, where AIC is still the best criterion but only by a
slight margin over SIC and HQ.
(iii) The behaviour of the information criteria has the following consequences: the mean bias, median
bias and MSE are much lower in the context of ADL(q,r) than DOLS(p)/DGLS(p), especially
as $a_{11}$ tends to unity. For example, when $a_{11} = 0.95$, the average bias of the AIC-based
ADL(q,r) is 4 and 13 times lower than the bias of the AIC-based DOLS(p) and DGLS(p),
respectively. The picture is similar as regards hypothesis testing. The distributions of the
ADL(q,r), DOLS(p) and DGLS(p) t-statistics shift to the right as the persistence of the
cointegration error increases. This is due to the fact that the nuisance parameters, namely the long-run
correlation parameter $\omega_{12}/\omega_{22}$ and the one-sided long-run covariance between the two errors, tend to $+\infty$ as $a_{11}$ approaches unity. This shift is profound in the case of the partly
corrected DOLS(p) and DGLS(p) estimators, thus yielding empirical sizes of 67% and
99%, respectively, for $a_{11} = 0.95$. For the same degree of persistence, the empirical size of
the ADL(q,r) procedure is around 23%, regardless of the information criterion employed.
(iv) Turning to the relative performance of the DOLS(p) versus DGLS(p) estimators, the
superiority of the DOLS(p) estimator is evident in both estimation and hypothesis testing.
This is due to the fact that all the criteria fail to select a sufficiently large p for the DGLS(p)
estimator. When $a_{11} = 0.9$, the mean biases of the AIC-based DOLS(p) and DGLS(p) are
0.08 and 0.44, respectively, whereas for $a_{11} = 0.95$ the corresponding biases climb to 0.27
and 0.85, respectively.
(v) There is a simple reason why AIC is the best performing criterion in the context of DOLS(p),
whereas it does worse than SIC and HQ in the context of the ADL(q,r) estimator. AIC is an
asymptotically efficient criterion, that is it selects the model that best fits the data without
assuming that the correct model belongs to the set of candidate models. This is obviously the
case for the class of the AS(p) models under consideration, since the correct model assumes
$p = \infty$. On the other hand, when the class of the ADL(q,r) models is considered, the correct
model, ADL(1,2), belongs to the set of candidate models. In such a case, consistent selection
criteria, such as SIC and HQ, work well in selecting the correct model for reasonable sample
sizes.


Table 3. Small sample performance of alternative estimators: model selected by information criteria. VAR(1)
errors ($a_{21} = 0$).
Panel A. Estimation
Mean bias
Criterion/a 11

0.3

0.6

0.9

Median bias
0.95

0.3

0.6

0.9

MSE
0.95

0.3

0.6

0.9

0.95

ADL
AIC

0.001

0.004

0.042

0.062 0.001 0.003 0.046

0.173 0.001 0.004 0.101 0.211

SIC
HQ

0.002
0.001

0.005
0.004

0.048
0.045

0.067 0.001 0.003 0.048


0.062 0.001 0.003 0.046

0.172 0.001 0.004 0.094 0.199


0.174 0.001 0.004 0.098 0.187

AIC
SIC
HQ

0.002
0.006
0.004

0.004
0.015
0.009

0.081
0.113
0.097

0.257 0.002 0.005 0.077 0.261


0.274 0.001 0.005 0.080 0.276
0.269 0.001 0.005 0.082 0.269

AIC

0.003

0.013

0.440

0.270 0.001 0.003 0.065


0.289 0.004 0.011 0.097
0.280 0.003 0.007 0.080
DGLS
0.852 0.001 0.009 0.389

SIC
HQ

0.008
0.006

0.038
0.026

0.915
0.726

0.970
0.816

1.115 0.002 0.008 0.910 1.237


1.083 0.001 0.006 0.672 1.141

AIC
SIC
HQ

0.001
0.001
0.001

0.003
0.003
0.003

0.038
0.037
0.038

0.039
0.041
0.041

0.147 0.001 0.005 0.077 0.193


0.153 0.001 0.005 0.077 0.192
0.151 0.001 0.005 0.077 0.193

AIC
SIC
HQ

0.010
0.010
0.010

0.009
0.009
0.009

0.040
0.040
0.040

1.098 0.006 0.026


1.039 0.004 0.018
PL
0.131 0.001 0.002
0.132 0.001 0.002
0.131 0.001 0.003
JOH
0.125 0.006 0.005
0.125 0.006 0.005
0.125 0.006 0.005

0.028
0.028
0.028

0.109 0.001 0.023 0.058 0.138


0.109 0.001 0.023 0.058 0.138
0.109 0.001 0.023 0.058 0.138

DOLS

0.967 0.002 0.005 0.369 0.882

Panel B. Hypothesis testing


t 0.025
Criterion/a 11

0.3

0.6

t 0.975
0.9

0.95

0.3

AIC
SIC
HQ

2.062 2.029 1.754 1.523 2.316


2.049 1.970 1.745 1.505 2.185
2.062 1.970 1.748 1.514 2.214

AIC
SIC
HQ

2.608 3.130 4.803 4.525 2.618


2.393 2.894 4.613 4.345 2.753
2.445 2.977 4.779 4.443 2.683

AIC
SIC
HQ

2.155 2.303 1.911 0.290 2.229


1.977 1.866 1.803 4.658 2.335
2.036 2.072 0.477 2.406 2.298



0.6

0.9

ADL
2.505 4.022
2.312 4.017
2.331 4.022
DOLS
3.252 8.388
3.470 8.773
3.371 8.606
DGLS
2.577 9.289
3.062 10.498
2.739 10.123

Size
0.95

0.3

0.6

0.9

0.95

6.150
6.106
6.087

7.05
6.65
6.95

8.00 14.35 23.40


7.30 14.05 23.10
7.45 14.05 23.25

14.059 13.90 22.15 49.85 66.15


14.412 11.85 21.10 53.20 66.70
14.382 12.15 21.05 52.60 67.00
11.510 7.90 11.35 62.70 88.50
12.072 7.85 14.35 96.95 99.75
11.927 7.05 12.20 86.05 98.05



Table 3. Continued
Panel B. Hypothesis testing
t 0.025

Criterion/a 11

0.3

0.6

t 0.975
0.9

0.95

0.3

0.6

0.9

Size
0.95

0.3

0.6

0.9

0.95

PL
AIC

2.133 2.053 1.922 2.256 2.299 2.538 3.713 5.109 7.60 8.55 14.15 22.65

SIC
HQ

2.143 2.077 1.945 2.393 2.287 2.538 3.698 5.208 7.70 8.55 14.15 22.90
2.130 2.036 1.916 2.337 2.273 2.504 3.713 5.208 7.65 8.40 14.00 22.80

AIC
SIC
HQ

0.956 1.196 1.563 1.657 1.553 1.652 3.276 5.233 1.05 2.10 10.35 21.80
0.956 1.196 1.563 1.657 1.553 1.652 3.276 5.233 1.05 2.10 10.35 21.80
0.956 1.196 1.563 1.657 1.553 1.652 3.276 5.233 1.05 2.10 10.35 21.80

JOH

Now we examine the performance of the PL(s,0), the JOH(z) and the PW-FMLS estimators.
The main results are summarized below:
(i) The frequencies at which the criteria select the correct PL(1,0) model are almost equal to
the corresponding ones for the ADL(1,2) model. As a result, the performance of PL(s,0)
is comparable to that of ADL(q,r) as far as estimation precision and reliability of statistical
inferences are concerned. Similar results are obtained for the JOH(z) estimator. Therefore, the
ADL(q,r), PL(s,0) and JOH(z) estimators may be thought of as forming a class of parametric
estimators, say Class A, with similar characteristics.
(ii) The PW-FMLS estimator, with the bandwidth parameter selected by the Andrews (1991)
data-dependent procedure, and the DOLS(p) estimator, with p selected by any of the three
criteria under study seem to form a second class of estimators, say Class B. Any estimator
of Class A seems to dominate any estimator of Class B in any aspect of statistical inference.
Finally, the standard FMLS and the DGLS(p) estimators seem to form a third class, say Class
C, consisting of the worst-performing estimators.
3.2. Further extensions
So far, regarding statistical inferences on the cointegration coefficient $\theta$, the Monte Carlo evidence strongly suggests the
use of an estimator from Class A (in particular, ADL(q,r)) over estimators from Class B or,
even more so, over estimators from Class C. Moreover, attention has focused on the case in which the
cointegration error and the first difference of the regressor are generated by a VAR(1) process, and
the cointegration error does not Granger cause the error that drives the regressor. The ADL(1,2)
model is the correct model implied by this specific DGP and its order is successfully selected by
the three most commonly used information criteria, especially the consistent ones, i.e. the SIC and
HQ criteria. Next, we investigate the extent to which the relative performance of the estimators
remains unchanged when alternative specifications of the error dynamics are considered. In
particular, we extend our simulations to include the following cases:
(i) the cointegration error Granger causes the error that drives the regressor, that is $a_{21} \neq 0$;
(ii) the cointegration error and the error that drives the regressor follow a VAR(2) process;
(iii) the cointegration error and the error that drives the regressor follow a first-order vector
moving average, VMA(1), process.

Figure 5. Selection frequencies of information criteria. ADL(q,r) estimator, VAR(1) errors, $a_{21} = 0$.

Figure 6. Selection frequencies of information criteria. DOLS(p) estimator, VAR(1) errors, $a_{21} = 0$.

Figure 7. Selection frequencies of information criteria. DGLS(p) estimator, VAR(1) errors, $a_{21} = 0$.

3.2.1. The cointegration error Granger causes the error that drives the regressor. In this set of
simulations, the error vector, $u_t$, is still a VAR(1) process, but the transition matrix A does not
contain any zero elements, except for $a_{22}$. In this case, as opposed to the ones analysed in the
previous section, there are feedbacks from the cointegration error to the error that drives the
regressor. This, in turn, implies that further augmentation of the ADL(q,r) and AS(p) models by
g leads of $\Delta x_t$ and t leads of $\Delta x_t$, respectively, is required for asymptotic optimality. The resulting
estimators, referred to as ADL(q,r,g) and DOLS(p,t)/DGLS(p,t), aim at removing the second-order asymptotic bias effects that arise from contemporaneous and temporal correlation between
the elements of $u_t$ (see Phillips and Loretan 1991; Saikkonen 1991; Stock and Watson 1993).13
In this respect, we consider the family of ADL(q,r,g) estimators that arise from allowing q, r
and g to take integer values in the interval [0,4], thus obtaining 14 ADL(q,r,g) estimators. We also
consider nine DOLS(p,t) estimators and another nine DGLS(p,t) estimators by allowing p and t to
take integer values in the interval [0,4]. Finally, we consider 14 PL(s,l) estimators by allowing s
and l to take integer values in the intervals [1,4] and [0,4], respectively. To select the orders of these
estimators, we use the three criteria mentioned above. The design of this set of simulations is the
same as that described in the previous section. The DGP under consideration is the following:
{$a_{12} = 0.5$, $\sigma_{12} = 0.7$, $a_{21} = 0.5$, $a_{22} = 0$, $\sigma_{11} = \sigma_{22} = 1$, $\theta = 1$ and $0 < a_{11} < 1$}. The parameter
$a_{11}$ is set equal to 0.3 and 0.7, in order for the eigenvalue stability condition to be satisfied. The
average values of the statistics concerning estimation precision and hypothesis testing are
tabulated in Tables 4A and B, respectively. We also report the frequencies at which each criterion
selects the orders of the ADL(q,r,g), DOLS(p,t) and DGLS(p,t) estimators in Figures 8A, 8B,
9A, 9B and 10A, 10B, respectively.
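The eigenvalue stability condition mentioned above can be checked directly for a bivariate VAR(1). The sketch below assumes the transition matrix has rows ($a_{11}$, $a_{12}$) and ($a_{21}$, $a_{22}$) and uses the closed-form eigenvalues of a 2x2 matrix; the function name is an illustrative choice. With $a_{12} = a_{21} = 0.5$ and $a_{22} = 0$ the largest eigenvalue is $(a_{11} + \sqrt{a_{11}^2 + 1})/2$, which is below one only for $a_{11} < 0.75$, consistent with the choice of 0.3 and 0.7.

```python
import math


def var1_spectral_radius(a11, a12, a21, a22):
    """Largest eigenvalue modulus of the 2x2 matrix [[a11, a12], [a21, a22]]."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace * trace - 4.0 * det
    if disc >= 0.0:  # two real eigenvalues
        root = math.sqrt(disc)
        return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))
    return math.sqrt(det)  # complex conjugate pair: modulus equals sqrt(det)


for a11 in (0.3, 0.7, 0.8):
    rho = var1_spectral_radius(a11, a12=0.5, a21=0.5, a22=0.0)
    print(a11, round(rho, 3), "stable" if rho < 1.0 else "not stable")
```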
First compare the ADL(q,r,g), DOLS(p,t) and DGLS(p,t) estimators. This set of simulations
provides further evidence on the dominance of the ADL class of estimators over the DOLS/DGLS
one, in terms of both estimation precision and reliability of statistical inference. However, the
difference in performance between ADL(q,r,g) and DOLS(p,t)/DGLS(p,t) is less prominent
in this case than it was in the leading case, DGP1. This is due to the fact that when $a_{21} \neq 0$, the
long-run correlation parameter $\omega_{12}/\omega_{22}$ converges to a well-defined limit as $a_{11} \to 1$. On the
other hand, when $a_{21} = 0$, the nuisance parameter $\omega_{12}/\omega_{22}$ tends to infinity as $a_{11} \to 1$. As a
consequence of the limiting behaviour of $\omega_{12}/\omega_{22}$, the average biases of the ADL(q,r,g) estimators
for $a_{11} = 0.7$ lie between 0.0008 and 0.0015 depending on the selection criterion, whereas the
corresponding biases of the DOLS(p,t) and DGLS(p,t) estimators lie between 0.0009 and 0.0019,
and 0.0014 and 0.0032, respectively. Similarly, statistical inferences on the cointegration coefficient are much more reliable
in the context of the ADL(q,r,g) estimator, as suggested by the ADL(q,r,g) empirical sizes, which
hardly exceed 10%, irrespective of the selection criterion used and the value of $a_{11}$. On the other
hand, the empirical sizes for the DOLS(p,t) and DGLS(p,t) t-tests range from 18.6 to 21.9% and
from 11.7 to 19.5%, respectively.
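The contrast between the two cases can be illustrated numerically. The sketch below assumes that the long-run covariance matrix of a stationary VAR(1) error with transition matrix A and innovation covariance $\Sigma$ is $\Omega = (I - A)^{-1} \Sigma (I - A')^{-1}$ and that the ratio in question is the element (1,2) of $\Omega$ divided by the element (2,2). The symbol names and the function name are assumptions made for this illustration. In the leading case ($a_{21} = a_{22} = 0$) the ratio simplifies to $(\sigma_{12} + a_{12}\sigma_{22}) / ((1 - a_{11})\sigma_{22})$, which grows without bound as $a_{11}$ approaches one.

```python
def long_run_ratio(a11, a12, a21, a22, s11=1.0, s12=0.7, s22=1.0):
    """omega12 / omega22 for a VAR(1), with Omega = C Sigma C' and C = (I - A)^{-1}."""
    det = (1.0 - a11) * (1.0 - a22) - a12 * a21
    c1 = ((1.0 - a22) / det, a12 / det)      # first row of C
    c2 = (a21 / det, (1.0 - a11) / det)      # second row of C

    def quad(r, s):
        """Bilinear form r Sigma s' for the 2x2 innovation covariance."""
        return r[0] * s[0] * s11 + (r[0] * s[1] + r[1] * s[0]) * s12 + r[1] * s[1] * s22

    return quad(c1, c2) / quad(c2, c2)


# Leading case (a21 = 0): the ratio explodes with the persistence of the error.
for a11 in (0.3, 0.6, 0.9, 0.95):
    print("a21 = 0.0", a11, round(long_run_ratio(a11, 0.5, 0.0, 0.0), 3))

# Feedback case (a21 = 0.5), evaluated at the two stable values used in the text.
for a11 in (0.3, 0.7):
    print("a21 = 0.5", a11, round(long_run_ratio(a11, 0.5, 0.5, 0.0), 3))
```

Only stationary parameter values are evaluated here; the sketch is meant to show the orders of magnitude involved rather than to reproduce the limiting argument itself.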
As far as the rest of the estimators are concerned, the discussion is confined solely to hypothesis
testing, for reasons of space. The PL(s,l) t-statistic is distributed approximately as N(0,1), thus
resulting in very reliable statistical inferences on the cointegration coefficient. On the other hand, the JOH(z)-based t-test
is under-sized, especially for small values of $a_{11}$, despite the fact that the information criteria select

13 In a similar vein, the order, l, in the PL(s,l) estimator is assumed to be greater than zero.


Table 4. Small sample performance of alternative estimators: model selected by information criteria. VAR(1)
errors ($a_{21} \neq 0$).
Panel A. Estimation
Mean bias
Criterion/a 11

0.3

Median bias
0.7

0.3

0.7

MSE
0.3

0.7

ADL
AIC

0.0003

0.0008

0.0003

0.0007

0.0001

0.0000

SIC
HQ

0.0001
0.0004

0.0015
0.0010

0.0000
0.0003

0.0012
0.0008

0.0001
0.0001

0.0000
0.0000

AIC
SIC
HQ

0.0000
0.0003
0.0001

0.0009
0.0019
0.0013

0.0008
0.0016
0.0011

0.0001
0.0001
0.0001

0.0000
0.0000
0.0000

AIC

0.0001

0.0014

0.0001
0.0002
0.0000
DGLS
0.0001

0.0011

0.0001

0.0000

SIC
HQ

0.0003
0.0001

0.0032
0.0021

0.0002
0.0000

0.0024
0.0017

0.0001
0.0001

0.0000
0.0000

AIC
SIC
HQ

0.0003
0.0003
0.0003

0.0009
0.0009
0.0009

0.0001
0.0001
0.0001

0.0000
0.0000
0.0000

AIC
SIC
HQ

0.0033
0.0033
0.0033

0.0008
0.0008
0.0008

0.0001
0.0001
0.0001

0.0000
0.0000
0.0000

0.0002

0.0010

0.0001

0.0002

DOLS

FMLS
PW-FMLS

0.0049

0.0027
0.0014

0.0015

PL
0.0002
0.0006
0.0002
0.0006
0.0002
0.0006
JOH
0.0020
0.0003
0.0020
0.0003
0.0020
0.0003
FMLS
0.0031
0.0021
0.0001

0.0010

Panel B. Hypothesis testing


t 0.025
Criterion/a 11

0.3

t 0.975
0.7

AIC
SIC
HQ

2.040
2.016
2.009

1.873
1.706
1.840

AIC
SIC
HQ

2.249
2.177
2.229

2.387
2.165
2.307



0.3
ADL
2.305
2.154
2.246
DOLS
2.300
2.232
2.278

Size
0.7

0.3

0.7

2.503
2.735
2.493

6.90
6.15
6.45

9.55
10.05
9.15

3.286
3.736
3.467

8.90
8.15
8.65

18.60
21.90
19.75



Table 4. Continued
Panel B. Hypothesis testing
t 0.025

Criterion/a 11

t 0.975

0.3

0.7

0.3

Size
0.7

0.3

0.7

DGLS
AIC

2.118

1.917

2.167

2.763

6.75

11.65

SIC
HQ

2.093
2.099

1.615
1.799

2.073
2.143

3.572
3.163

6.25
6.60

19.45
14.45

AIC
SIC
HQ

2.006
2.006
2.006

2.274
2.274
2.274

2.058
2.058
2.058

2.282
2.282
2.282

5.65
5.65
5.65

8.35
8.35
8.35

AIC

1.001

1.454

JOH
1.663

1.809

1.05

2.55

SIC
HQ

1.001
1.001

1.454
1.454

1.809
1.809

1.05
1.05

2.55
2.55

FMLS
PW-FMLS

1.918
1.873

7.411
2.998

6.766
2.218

10.9
6.00

49.55
10.60

PL

1.663
1.663
FMLS
2.739
2.188

the correct lag order, z = 2, at a frequency that ranges from 90 to 98%. Interestingly, the PW-FMLS
procedure allows for statistical inferences of reasonable accuracy. In particular, the empirical size
of the associated t-test is 6% and 10.6% for $a_{11} = 0.3$ and $a_{11} = 0.7$, respectively. This means
that the performance of the PW-FMLS estimator is comparable to that of DGLS(p) and, for small
values of $a_{11}$, even to that of ADL(q,r,g). However, the relatively good properties of PW-FMLS
are not shared by the standard FMLS, which remains the worst-performing estimator, producing
empirical sizes of 10.9% and 49.6% for $a_{11} = 0.3$ and $a_{11} = 0.7$, respectively.
3.2.2. VAR(2) errors. In this set of simulations we investigate the extent to which the ADL(q,r)
estimator outperforms the DOLS/DGLS(p) one in the case that the errors are generated by a
bivariate VAR(2) process. First, we obtain the ADL(q,r) model implied by this DGP, and second,
we derive the conditions under which the ADL(q,r) model reduces to the AS(p) model. Specifically,
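we assume that the errors are generated by the following process:

$$
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} u_{1t-1} \\ u_{2t-1} \end{pmatrix}
+ \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
\begin{pmatrix} u_{1t-2} \\ u_{2t-2} \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix},
\qquad
\begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\sim \mathrm{NIID}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right)
\tag{31}
$$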

with $a_{21} = 0$, $b_{21} = 0$, that is there is no Granger-causality running from the cointegration error
to the error that drives the regressor. The VAR(2) structure of $u_t$ implies that the conditional
expectation of $u_{1t}$ on the full information set can be summarized as follows:

$$
E(u_{1t} \mid u_{2t}, u_{t-1}, u_{t-2}, \ldots) = \frac{\sigma_{12}}{\sigma_{22}} e_{2t} + a_{11} u_{1t-1} + a_{12} u_{2t-1} + b_{11} u_{1t-2} + b_{12} u_{2t-2}. \tag{32}
$$

Figure 8. Selection frequencies of information criteria. ADL(q,r,g) estimator, VAR(1) errors, $a_{21} \neq 0$.

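This conditional expectation gives rise to the following ADL(2,3) model

$$
y_t = \left(\theta + \frac{\sigma_{12}}{\sigma_{22}}\right) x_t + d_1 y_{t-1} + d_2 y_{t-2} + d_3 x_{t-1} + d_4 x_{t-2} + d_5 x_{t-3} + \nu_t, \tag{33}
$$

where

$$
d_1 = a_{11} - a_{21} \frac{\sigma_{12}}{\sigma_{22}}, \tag{34}
$$

$$
d_2 = b_{11} - b_{21} \frac{\sigma_{12}}{\sigma_{22}}, \tag{35}
$$

$$
d_3 = a_{12} - \theta a_{11} + (\theta a_{21} - a_{22} - 1) \frac{\sigma_{12}}{\sigma_{22}}, \tag{36}
$$

$$
d_4 = b_{12} - \theta b_{11} - a_{12} + (\theta b_{21} + a_{22} - b_{22}) \frac{\sigma_{12}}{\sigma_{22}}, \tag{37}
$$

$$
d_5 = b_{22} \frac{\sigma_{12}}{\sigma_{22}} - b_{12}. \tag{38}
$$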

Figure 9. Selection frequencies of information criteria. DOLS(p,t) estimator, VAR(1) errors, $a_{21} \neq 0$.

It is easy to show that the ADL(2,3) model reduces to the AS(2) model when $a_{11} = b_{11} = 0$.
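The mapping from the VAR(2) error parameters to the ADL(2,3) coefficients, and the reduction to the AS(2) model, can be verified numerically. The sketch below assumes the triangular system $y_t = \theta x_t + u_{1t}$, $\Delta x_t = u_{2t}$, writes the innovation covariance and variance as `s12` and `s22`, and takes the coefficient on $x_t$ to be $\theta$ plus `s12 / s22`, which is what substituting the conditional expectation (32) into the system implies. That coefficient, the symbol names and the function names are assumptions made for this illustration. The long-run coefficient implied by the ADL form, the sum of the coefficients on the regressor divided by one minus the sum of the coefficients on the lagged dependent variable, should equal $\theta$.

```python
def adl23_coefficients(theta, A, B, s12, s22):
    """Map VAR(2) error parameters to ADL(2,3) coefficients.
    A = ((a11, a12), (a21, a22)) and B = ((b11, b12), (b21, b22)).
    Returns (c0, d1, d2, d3, d4, d5), where c0 multiplies x_t."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    rho = s12 / s22
    c0 = theta + rho  # assumed form of the x_t coefficient, see the lead-in
    d1 = a11 - a21 * rho
    d2 = b11 - b21 * rho
    d3 = a12 - theta * a11 + (theta * a21 - a22 - 1.0) * rho
    d4 = b12 - theta * b11 - a12 + (theta * b21 + a22 - b22) * rho
    d5 = b22 * rho - b12
    return c0, d1, d2, d3, d4, d5


def long_run_coefficient(c0, d1, d2, d3, d4, d5):
    """Long-run effect of x on y implied by the ADL(2,3) coefficients."""
    return (c0 + d3 + d4 + d5) / (1.0 - d1 - d2)


# Design with a11 = 0.6 and b11 = 0.3 (a12 = b12 = 0.5, other elements zero).
coefs = adl23_coefficients(1.0, ((0.6, 0.5), (0.0, 0.0)), ((0.3, 0.5), (0.0, 0.0)), 0.7, 1.0)
print(coefs, long_run_coefficient(*coefs))

# With a11 = b11 = 0 the coefficients on lagged y vanish: the AS(2) case.
reduced = adl23_coefficients(1.0, ((0.0, 0.5), (0.0, 0.0)), ((0.0, 0.5), (0.0, 0.0)), 0.7, 1.0)
print(reduced)
```

In the reduced case the remaining coefficients on $x_t$, $x_{t-1}$, $x_{t-2}$ and $x_{t-3}$ are exactly those obtained by expanding a regression of $y_t$ on $x_t$, $\Delta x_t$, $\Delta x_{t-1}$ and $\Delta x_{t-2}$.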
Similar to the previous experiments, we consider the family of ADL(q,r) estimators that arise
from allowing q and r to take integer values in the interval [0,4], thus obtaining 14 ADL(q,r)
estimators. Regarding the AS(p) model, we allow p to take integer values in the interval [0,20],
thus obtaining 21 DOLS(p)/ DGLS(p) estimators.14 To select the q, r and p orders, we employ
the information criteria employed in our previous simulations. In each replication, we determine
the orders of ADL(q,r) and DOLS(p)/DGLS(p) by each of the three criteria, and then we use the
resulting estimators to calculate the statistics, defined in the previous section. The DGP under
consideration is the following: {a 12 = b 12 = 0.5, 12 = 0.7, a 21 = a 22 = b 21 = b 22 = 0, 11 =
22 = 1, = 1 and 0 a 11 < 1, 0 b 11 < 1}. The parameter a 11 is set equal to 0, 0.3 and
0.6, while b 11 takes the values of 0 and 0.3, in order for the eigenvalue stability condition to be
satisfied.15 To conserve space, we do not report the results from these experiments, but we briefly
discuss them below.
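The design just described can be sketched in Python as follows. This is a minimal illustration, not the authors' simulation code: it assumes the triangular form y_t = theta * x_t + u1_t with x_t = x_{t-1} + u2_t, a burn-in period for the errors, and hypothetical function names (`is_stable`, `simulate_dgp`). The stability check uses the companion matrix of the VAR(2) rather than any shortcut condition.

```python
import numpy as np

def is_stable(A, B):
    """Eigenvalue stability check for a bivariate VAR(2) via its companion matrix."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    companion = np.block([[A, B], [np.eye(2), np.zeros((2, 2))]])
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1.0)

def simulate_dgp(a11, b11, T, theta=1.0, sigma12=0.7, burn=200, seed=0):
    """Simulate (y, x) from the assumed triangular system with VAR(2) errors."""
    rng = np.random.default_rng(seed)
    # a12 = b12 = 0.5 and a21 = a22 = b21 = b22 = 0, as in the stated design.
    A = np.array([[a11, 0.5], [0.0, 0.0]])
    B = np.array([[b11, 0.5], [0.0, 0.0]])
    if not is_stable(A, B):
        raise ValueError("VAR(2) parameters violate the stability condition")
    Sigma = np.array([[1.0, sigma12], [sigma12, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), Sigma, size=T + burn)
    u = np.zeros((T + burn, 2))
    for t in range(2, T + burn):
        u[t] = A @ u[t - 1] + B @ u[t - 2] + e[t]
    u = u[burn:]                     # discard the burn-in draws
    x = np.cumsum(u[:, 1])           # integrated regressor driven by u2
    y = theta * x + u[:, 0]          # cointegrating relation with error u1
    return y, x

for a11 in (0.0, 0.3, 0.6):
    for b11 in (0.0, 0.3):
        A = [[a11, 0.5], [0.0, 0.0]]
        B = [[b11, 0.5], [0.0, 0.0]]
        print(a11, b11, is_stable(A, B))
```

Every (a11, b11) pair in the grid keeps the sum a11 + b11 at or below 0.9, which is why the loop serves as a sanity check on the chosen parameter values.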

¹⁴ For the DGLS(p) estimator, we assume an AR(2) model for the regression error.

¹⁵ Given the values of $a_{12}$, $a_{21}$, $a_{22}$ and $b_{12}$, $b_{21}$, $b_{22}$ in this design, the eigenvalue stability condition reduces to
$a_{11} + b_{11} < 1$.



Figure 10. Selection frequencies of information criteria. DGLS(p,t) estimator, VAR(1) errors, 21 = 0.

All in all, the simulation results continue to provide strong evidence in favour of the ADL(q,r)
models, similar to our leading case, DGP1. The SIC and HQ criteria select the correct order
of the ADL(q,r) estimator in 90% of the cases, whereas the performance of AIC falls to the
level of 60%. On the other hand, the consistent criteria fail to select a sufficiently large p for
the DOLS(p)/DGLS(p) estimators. The only exception seems to be AIC, which in the case of a
highly persistent cointegration error, that is when $a_{11} + b_{11} = 0.9$, selects p = 20 in 40% of the
cases. As a result, the ADL(q,r) estimator has substantially lower mean and median biases than
the DOLS(p) or DGLS(p) estimators. For example, when $a_{11} = 0.6$ and $b_{11} = 0.3$, the bias of
the ADL(q,r) estimator ranges from 0.049 when HQ is used to 0.055 when AIC is used. For the
same values of $a_{11}$ and $b_{11}$, the bias of DOLS(p) ranges from 0.145 when AIC is used to 0.175
when SIC is used. DGLS(p) is by far the worst estimator in terms of both bias and MSE. For example, for
$a_{11} + b_{11} = 0.9$ the bias and MSE of the AIC-based DGLS(p) are 0.588 and 0.517, respectively.
The bias and MSE of DGLS(p) increase dramatically when SIC is used, reaching the values of
1.033 and 1.121, respectively. The differences in biases and MSEs between the
ADL(q,r) and the DOLS(p)/DGLS(p) estimators are also reflected in the size performance of the
corresponding test procedures. In particular, the DOLS(p)/DGLS(p) t-tests suffer from severe size
distortions, especially in the cases of a highly persistent cointegration error. For example, for
$a_{11} + b_{11} = 0.9$, the empirical sizes of the AIC-based DOLS(p) and DGLS(p) t-tests are 58.8% and 75.5%,
respectively. On the other hand, for the same degree of persistence and the same
information criterion, the empirical size of the ADL(q,r)-based t-test is only 15.6%.¹⁶
Turning to the rest of the estimators, the PL(s,l) t-statistic fares slightly better than the
ADL(q,r) one, producing an empirical size of approximately 14% in the case of a highly persistent
cointegration error. The empirical distribution of the JOH(z)-based t-statistic is skewed to the
right, producing empirical sizes considerably greater than those associated with the ADL(q,r) or
PL(s,l) procedures. Nevertheless, the size distortions of JOH(z) are significantly smaller than
those produced by DOLS(p) or DGLS(p). Finally, the behaviour of the semiparametric estimators
imitates that of the leading case, DGP1. In particular, the PW-FMLS, and especially the FMLS,
procedures fail to account for the second-order asymptotic bias effects, thus resulting in t-statistics
whose empirical distributions are located away from zero. The more persistent the cointegration
error is, the more pronounced these effects appear to be. For example, when $a_{11} + b_{11} = 0.9$,
the mean values of the FMLS and PW-FMLS t-statistics are 4.6 and 2.3, respectively, producing
empirical sizes of 69.7% and 50.6%, respectively.
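The order-selection step used in these experiments can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the criteria are written in their common regression forms (log residual variance plus a penalty), the grid simply enumerates every (q, r) pair with q and r in [0,4] (which need not coincide with the set of estimators used in the paper), no intercept is included, and the data are simulated from the assumed triangular form with cointegration coefficient equal to 1. The names `adl_design` and `select_adl_order` are hypothetical.

```python
import numpy as np

def adl_design(y, x, q, r, max_lag):
    """Regressors x_t, y_{t-1..t-q}, x_{t-1..t-r} on the common sample t >= max_lag."""
    T = len(y)
    cols = [x[max_lag:]]
    cols += [y[max_lag - i:T - i] for i in range(1, q + 1)]
    cols += [x[max_lag - j:T - j] for j in range(1, r + 1)]
    return np.column_stack(cols), y[max_lag:]

def select_adl_order(y, x, max_lag=4):
    """Return {criterion: (q, r, long-run estimate)} for the minimising model."""
    best = {}
    for q in range(max_lag + 1):
        for r in range(max_lag + 1):
            X, target = adl_design(y, x, q, r, max_lag)
            beta, *_ = np.linalg.lstsq(X, target, rcond=None)
            resid = target - X @ beta
            n, k = X.shape
            log_s2 = np.log(resid @ resid / n)
            crit = {
                "AIC": log_s2 + 2.0 * k / n,
                "SIC": log_s2 + k * np.log(n) / n,
                "HQ": log_s2 + 2.0 * k * np.log(np.log(n)) / n,
            }
            # Long-run coefficient: sum of x coefficients over 1 - sum of y-lag coefficients.
            long_run = (beta[0] + beta[1 + q:].sum()) / (1.0 - beta[1:1 + q].sum())
            for name, value in crit.items():
                if name not in best or value < best[name][0]:
                    best[name] = (value, q, r, long_run)
    return {name: v[1:] for name, v in best.items()}

# Simulated data: VAR(2) errors with a11 = b11 = 0.3, a12 = b12 = 0.5, sigma12 = 0.7.
rng = np.random.default_rng(1)
T, burn = 1000, 200
e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=T + burn)
u2 = e[:, 1]                      # a21 = a22 = b21 = b22 = 0, so u2 equals e2
u1 = np.zeros(T + burn)
for t in range(2, T + burn):
    u1[t] = 0.3 * u1[t - 1] + 0.5 * u2[t - 1] + 0.3 * u1[t - 2] + 0.5 * u2[t - 2] + e[t, 0]
x = np.cumsum(u2[burn:])
y = x + u1[burn:]
selected = select_adl_order(y, x)
print(selected)
```

All candidate models are fitted on the same sample (observations from `max_lag` onward) so that the criteria are comparable across orders.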
3.2.3. VMA(1) errors. In this set of simulations, we use a first-order bivariate moving average,
VMA(1), process to generate the errors, $u_{1t}$ and $u_{2t}$. The moving average assumption implies that
the memory of the cointegration error is designed to be extremely short. Such a case rarely occurs
in macroeconomic applications, where a highly persistent cointegration error is often detected.
Specifically,

 

  
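$$
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} e_{1t-1} \\ e_{2t-1} \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\tag{39}
$$

and

$$
\begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\sim \mathrm{NIID}\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right]
\tag{40}
$$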

for t = 1, 2, ..., T. This DGP does not produce a finite-order ADL(q,r) model, as the VMA(1)
process has a VAR($\infty$) representation. In this set of simulations, we consider the set of ADL(q,r)
and DOLS(p)/DGLS(p) estimators employed in our previous simulations, where the orders q,
r and p are selected by the AIC, SIC and HQ criteria. The parameter settings for the DGP under
consideration are the following: {$a_{12} = 0.5$, $\sigma_{12} = 0.7$, $a_{21} = a_{22} = 0$, $\sigma_{11} = \sigma_{22} = 1$, $\theta = 1$ and
$0 < a_{11} < 1$}. As in the case with the VAR(1) errors, the parameter $a_{11}$ takes the values 0.3, 0.6,
0.9 and 0.95. The results (not reported) may be summarized as follows:
(i) Irrespective of the value of $a_{11}$, the SIC and HQ criteria choose the DOLS(1)/DGLS(1)
estimator in more than 90% of the cases. The respective figure for AIC is only 58%.
(ii) The order of the ADL(q,r) estimator selected by the criteria is an increasing function of the
parameter $a_{11}$. For example, when $a_{11} = 0.3$, the SIC and HQ criteria select the ADL(1,2) model in
more than 50% of the cases, whereas for $a_{11} = 0.9$ or 0.95, they select the ADL(2,4) model
most frequently. On the other hand, when $a_{11} = 0.9$ or 0.95, the efficient AIC criterion
selects almost evenly among the ADL(2,4), ADL(3,4) and ADL(4,4) estimators.

¹⁶ The only case in which the performance of the DOLS(p)/DGLS(p) estimators is comparable to that of the ADL(q,r)
estimator is when $a_{11} = b_{11} = 0$. This is the case where the ADL(0,3) model reduces to the AS(2) model and the
DOLS(2)/DGLS(2) estimators do not suffer from a truncation bias. The consistent criteria identify the correct order in
the context of both specifications in more than 90% of the cases. As a result, the performance of the DOLS(p)/DGLS(p)
estimators is almost equal to that of the ADL(q,r) estimator.



(iii) When $a_{11} = 0.9$ or 0.95, the AIC-based DOLS(p) estimator is consistently the best, but only
by a slight margin over the AIC-based ADL(q,r).
(iv) The distribution of the t-statistic of both ADL(q,r) and DOLS(p)/DGLS(p) estimators is
properly centred around zero, while slightly negatively skewed and leptokurtic. On the
other hand, the ADL(q,r)-based t-tests marginally outperform the DOLS(p)-based ones in
minimizing size distortions. For a nominal size of 5%, the empirical size of the SIC-based
ADL(q,r) t-test is 9.7% when $a_{11} = 0.9$, whereas the respective figure for the SIC-based
DOLS(p) t-test is 10.2%. Moreover, the size of the DGLS-based tests is a decreasing
function of the parameter $a_{11}$, ranging from around 5% for $a_{11} = 0$ to 3% for $a_{11} = 0.95$.
(v) The behaviour of the JOH(z) and PL(s,l) estimators is almost identical to that of ADL(q,r)
in terms of bias, MSE and percent rejections of the null hypothesis. Interestingly, the
semiparametric methods fare reasonably well in this case. The PW-FMLS estimator, in
particular, seems to account fully for the second-order endogeneity effects, thus providing
a reasonable alternative to parametric procedures in conducting statistical inferences on $\theta$.
The overall picture suggests that when the errors are generated by an MA(1) process, the
DOLS(p)/DGLS(p) and PW-FMLS estimators fare no worse than the ADL(q,r), PL(s,l) and
JOH(z) estimators, in terms of estimation precision and reliability of statistical inferences.
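The statement that the VMA(1) process has only an infinite-order VAR representation can be illustrated numerically. Inverting the moving average polynomial of u_t = e_t + A e_{t-1} gives u_t = sum over j >= 1 of Pi_j u_{t-j} + e_t with Pi_j = -(-A)^j, which is a standard result for invertible moving averages. The Python sketch below computes these matrices; the function name `var_inf_coefficients` is hypothetical and the parameter values follow the design stated above.

```python
import numpy as np

def var_inf_coefficients(A, n_lags):
    """Autoregressive matrices Pi_1..Pi_n of the VAR(infinity) form of u_t = e_t + A e_{t-1}.

    Uses Pi_j = -(-A)^j, obtained by inverting (I + A L).
    """
    A = np.asarray(A, dtype=float)
    return [-np.linalg.matrix_power(-A, j) for j in range(1, n_lags + 1)]

# Compare how quickly the own-lag coefficient of u1 dies out for low and high a11.
for a11 in (0.3, 0.95):
    A = np.array([[a11, 0.5], [0.0, 0.0]])
    Pi = var_inf_coefficients(A, 20)
    print(a11, [abs(Pi[j - 1][0, 0]) for j in (1, 4, 20)])
```

Because the (1,1) entry of Pi_j has absolute value a11 raised to the power j, the coefficients never vanish exactly but decay faster for small a11, which is consistent with the selected ADL order increasing with a11.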

4. CONCLUSIONS
The simulation experiments reported in this paper highlight the potential pitfalls of employing the
DOLS(p)/DGLS(p) estimators or the class of FMLS estimators for the estimation of a cointegration
vector in a single-equation framework. These pitfalls are easily addressed by using the ADL(q,r)
or PL(s,l) estimators instead. The results of this paper are summarized as follows:
(i) In general, the performance of the ADL(q,r) (or PL(s,l)) estimators is superior to that of the
DOLS(p)/DGLS(p) estimators. This is due to the fact that the latter estimators, as opposed
to the former, suffer from truncation bias. A large value of p is usually required for the
DOLS(p)/DGLS(p) bias to approach the levels of the ADL(q,r) bias. However, the 2.5%
and 97.5% points of the empirical distribution of the DOLS(p)/DGLS(p) t-statistics do
not approach the corresponding points of the N(0, 1), even for large values of p. As a
consequence, the sizes of the tests based on the DOLS(p)/DGLS(p) estimators, as opposed
to those based on the ADL(q,r) estimators, are far off their nominal size of 5%.
(ii) The truncation bias of the DOLS(p)/DGLS(p) estimators depends on the asymptotic long-run
correlation and endogeneity nuisance parameters, both of which depend on the Granger
causality structure of the errors in the model and the persistence of the cointegration error.
As a result, the difference between the performances of ADL(q,r) and DOLS(p)/DGLS(p)
increases with the persistence of the cointegration error. This effect is milder in the presence
of Granger causality running from the cointegration error to the error that drives the regressor,
because in this case, the nuisance parameters do not explode as the persistence of the
cointegration error increases.
(iii) The benefits from employing the ADL(q,r) estimators, instead of the DOLS(p)/DGLS(p)
estimators, remain substantial when the orders (q,r) and p are selected via the usual
order-selection criteria. The use of the consistent SIC and HQ criteria in the context of the ADL(q,r)
model leads to the selection of the correct order in more than 90% of the cases. On the other
hand, these criteria are totally unable to move away from low orders in the context of the
DOLS(p)/DGLS(p) estimation method, thus producing a very large truncation bias. The
efficient AIC criterion is by far the best performing one in the context of the DOLS(p)
estimator, since it selects a sufficiently large p in the cases where the truncation bias is likely
to be large.
(iv) The JOH estimator performs quite well, especially in cases where the persistence of
cointegration error is particularly high. JOH(z) together with ADL(q,r) and PL(s,l) form
a class of (more) parametric estimators that outperform either FMLS or DOLS(p)/DGLS(p)
by wide margins.
(v) The simulation results provide strong evidence against the employment of the standard FMLS
estimator. In fact, this estimator is inferior even to DOLS(p)/DGLS(p) for most values of
p and for all the DGPs under study. If the applied researcher insists on using FMLS, then
at least he/she must utilize the prewhitened version of this estimator, in order to achieve
performance comparable to that of DOLS(p).
(vi) The above-mentioned results mainly refer to the cases of autoregressive errors. When the
errors follow a bivariate moving average process, where the persistence of the cointegration
error is low and the truncation bias of the DOLS(p)/DGLS(p) estimators is negligible, the
two methods under study are almost equivalent.

ACKNOWLEDGEMENTS
We acknowledge financial support from the Greek Ministry of Education and the European Union under
the Hrakleitos grant. We are grateful to Stephane Gregoir, an anonymous referee and participants in the
XXVIII Simposio de Analisis Economico, Universidad Pablo de Olavide, Sevilla, Spain, December
11–13, 2003 and the XXIX Conference on Stochastic Processes and their Applications, IMPA, Rio de
Janeiro, Brasil, August 3–9, 2003 for helpful suggestions and comments. The usual disclaimer applies.

REFERENCES
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic
Control AC-19, 667–73.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation.
Econometrica 59, 817–58.
Banerjee, A., J. J. Dolado, J. W. Galbraith and D. F. Hendry (1993). Cointegration, Error Correction and
the Econometric Analysis of Non-Stationary Data. Oxford: Oxford University Press.
Bewley, R. A. (1979). The direct estimation of the equilibrium response in a linear model. Economics Letters
3, 357–61.
Christou, C. and N. Pittis (2002). Kernel and bandwidth selection, prewhitening, and the performance of the
fully modified least squares estimation method. Econometric Theory 18, 948–61.
Engle, R. F. and C. W. J. Granger (1987). Cointegration and error correction: representation, estimation and
testing. Econometrica 55, 251–76.
Engle, R. F., D. F. Hendry and J. F. Richard (1983). Exogeneity. Econometrica 51, 277–304.
Hannan, E. J. and B. G. Quinn (1979). The determination of the order of an autoregression. Journal of the
Royal Statistical Society B41, 190–95.
Hendry, D. F., A. R. Pagan and J. D. Sargan (1984). Dynamic specification. In Z. Griliches and M. D.
Intriligator (eds.), Handbook of Econometrics, vol. II, ch. 18, pp. 1023–1100. Amsterdam: North Holland.


Johansen, S. (1988). Statistical analysis of cointegrating vectors. Journal of Economic Dynamics and Control
12, 231–54.
Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector
autoregressive models. Econometrica 59, 1551–80.
Kramer, W. (1986). Least-squares regression when the independent variable follows an ARIMA process.
Journal of the American Statistical Association 81, 150–54.
Newey, W. K. and K. D. West (1987). A simple positive, semi-definite, heteroskedasticity and autocorrelation
consistent covariance matrix. Econometrica 55, 703–8.
Newey, W. K. and K. D. West (1994). Automatic lag selection in covariance matrix estimation. Review of
Economic Studies 61, 631–53.
Park, J. Y. and P. C. B. Phillips (1988). Statistical inference in regressions with integrated processes: Part 1.
Econometric Theory 4, 468–98.
Pesaran, M. H. and Y. Shin (1999). An autoregressive distributed lag modelling approach to cointegration
analysis. In S. Strom (ed.), Econometrics and Economic Theory in the Twentieth Century: The Ragnar
Frisch Centennial Symposium. Cambridge, UK: Cambridge University Press.
Phillips, P. C. B. (1988). Reflections on econometric methodology. Economic Record 64, 344–59.
Phillips, P. C. B. and B. E. Hansen (1990). Statistical inference in instrumental regressions with I(1) processes.
Review of Economic Studies 57, 99–125.
Phillips, P. C. B. and M. Loretan (1991). Estimating long-run economic equilibria. Review of Economic
Studies 58, 407–36.
Saikkonen, P. (1991). Asymptotically efficient estimation of the cointegration regressions. Econometric
Theory 7, 1–27.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–64.
Sims, C. A., J. H. Stock and M. W. Watson (1990). Inference in linear time series with some unit roots.
Econometrica 58, 113–44.
Stock, J. H. (1987). Asymptotic properties of least squares estimators of cointegrating vectors. Econometrica
55, 1035–56.
Stock, J. H. and M. W. Watson (1993). A simple estimator of cointegrating vectors in higher-order integrated
systems. Econometrica 61, 783–820.
Wickens, M. R. and T. S. Breusch (1988). Dynamic specification, the long run and the estimation of
transformed regression models. Economic Journal 98 (Conference 1988), 189–205.

