
In particular, we'll see how they're used to implement the so-called "Bounds Tests", to see if long-run relationships are present when we have a group of time-series, some of which may be stationary, while others are not. A detailed worked example, using EViews, is included.
First, recall that the basic form of an ARDL regression model is:

$$y_t = \beta_0 + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \alpha_0 x_t + \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \cdots + \alpha_q x_{t-q} + \varepsilon_t \tag{1}$$

where $\varepsilon_t$ is a random "disturbance" term, which we'll assume is "well-behaved" in the usual sense. In
particular, it will be serially independent.
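To make equation (1) concrete, here is a minimal Python sketch of its simplest case, an ARDL(1,1) model with one lag of y and one lag of x. The function name `simulate_ardl11`, the argument names, and every coefficient value are illustrative assumptions, not numbers from the EViews example later in the post.

```python
import random


def simulate_ardl11(x, b0, b1, a0, a1, sigma=0.0, y0=0.0, seed=0):
    """Generate y_t = b0 + b1*y_{t-1} + a0*x_t + a1*x_{t-1} + eps_t.

    b1 is the coefficient on lagged y; a0 and a1 are the coefficients on
    the current and lagged x. eps_t is i.i.d. N(0, sigma^2), so it is
    serially independent by construction.
    """
    rng = random.Random(seed)
    y = [y0]
    for t in range(1, len(x)):
        eps = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        y.append(b0 + b1 * y[t - 1] + a0 * x[t] + a1 * x[t - 1] + eps)
    return y


# Hypothetical values: x held constant at 2.0 and no disturbances.
path = simulate_ardl11([2.0] * 200, b0=1.0, b1=0.5, a0=0.3, a1=0.2)
```

With x held constant and the disturbances switched off, y settles at (b0 + (a0 + a1)x) / (1 - b1) whenever |b1| < 1, which is the kind of long-run relationship that the rest of the post is about.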
We're going to modify this model somewhat for our purposes here. Specifically, we'll work with a
mixture of differences and levels of the data. The reasons for this will become apparent as we go
along.
Let's suppose that we have a set of time-series variables, and we want to model the relationship
between them, taking into account any unit roots and/or cointegration associated with the data. First,
note that there are three straightforward situations that we're going to put to one side, because they
can be dealt with in standard ways:
1. We know that all of the series are I(0), and hence stationary. In this case, we can simply model the data in their levels, using OLS estimation, for example.
2. We know that all of the series are integrated of the same order (e.g., I(1)), but they are not cointegrated. In this case, we can just (appropriately) difference each series, and estimate a standard regression model using OLS.
3. We know that all of the series are integrated of the same order, and they are cointegrated. In this case, we can estimate two types of models: (i) An OLS regression model using the levels of the data. This will provide the long-run equilibrating relationship between the variables. (ii) An error-correction model (ECM), estimated by OLS. This model will represent the short-run dynamics of the relationship between the variables.
Now, let's return to the more complicated situation mentioned above. Some of the variables in
question may be stationary, some may be I(1) or even fractionally integrated, and there is also
the possibility of cointegration among some of the I(1) variables. In other words, things just aren't as
"clear cut" as in the three situations noted above.
What do we do in such cases if we want to model the data appropriately and extract both long-run and
short-run relationships? This is where the ARDL model enters the picture.
The ARDL / Bounds Testing methodology of Pesaran and Shin (1999) and Pesaran et al. (2001) has a
number of features that many researchers feel give it some advantages over conventional cointegration
testing. For instance:

It can be used with a mixture of I(0) and I(1) data.

It involves just a single-equation set-up, making it simple to implement and interpret.

Different variables can be assigned different lag-lengths as they enter the model.

We need a road map to help us. Here are the basic steps that we're going to follow (with details to be
added below):
1. Make sure that none of the variables are I(2), as such data will invalidate the methodology.
2. Formulate an "unrestricted" error-correction model (ECM). This will be a particular type of ARDL model.
3. Determine the appropriate lag structure for the model in step 2.
4. Make sure that the errors of this model are serially independent.
5. Make sure that the model is "dynamically stable".
6. Perform a "Bounds Test" to see if there is evidence of a long-run relationship between the variables.
7. If the outcome at step 6 is positive, estimate a long-run "levels model", as well as a separate "restricted" ECM.
8. Use the results of the models estimated in step 7 to measure short-run dynamic effects, and the long-run equilibrating relationship between the variables.
We can see from the form of the generic ARDL model given in equation (1) above, that such models are
characterised by having lags of the dependent variable, as well as lags (and perhaps the current value)
of other variables, as the regressors. Let's suppose that there are three variables that we're interested
in modelling: a dependent variable, y, and two other explanatory variables, $x_1$ and $x_2$. More generally,
there will be (k + 1) variables - a dependent variable, and k other variables.
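Before we start, let's recall what a conventional ECM for cointegrated data looks like. It would be of
the form:

$$\Delta y_t = \beta_0 + \sum_i \beta_i \Delta y_{t-i} + \sum_j \gamma_j \Delta x_{1,t-j} + \sum_k \delta_k \Delta x_{2,t-k} + \varphi z_{t-1} + e_t \tag{2}$$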

Here, z, the "error-correction term", is the OLS residuals series from the long-run "cointegrating
regression",

$$y_t = \alpha_0 + \alpha_1 x_{1t} + \alpha_2 x_{2t} + v_t \tag{3}$$

The ranges of summation in (2) are from 1 to p, 0 to $q_1$, and 0 to $q_2$ respectively.


Now, back to our own analysis.
Step 1:
We can use the ADF and KPSS tests to check that none of the series we're working with are I(2).
Step 2:
Formulate the following model:
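$$\Delta y_t = \beta_0 + \sum_i \beta_i \Delta y_{t-i} + \sum_j \gamma_j \Delta x_{1,t-j} + \sum_k \delta_k \Delta x_{2,t-k} + \theta_0 y_{t-1} + \theta_1 x_{1,t-1} + \theta_2 x_{2,t-1} + e_t \tag{4}$$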

Notice that this is almost like a traditional ECM. The difference is that we've now replaced the error-correction term, $z_{t-1}$, with the terms $y_{t-1}$, $x_{1,t-1}$, and $x_{2,t-1}$. From (3), we can see that the lagged residuals
series would be $z_{t-1} = (y_{t-1} - a_0 - a_1 x_{1,t-1} - a_2 x_{2,t-1})$, where the a's are the OLS estimates of the $\alpha$'s. So, what
we're doing in equation (4) is including the same lagged levels as we do in a regular ECM, but we're not
restricting their coefficients.
This is why we might call equation (4) an "unrestricted ECM", or an "unconstrained ECM". Pesaran et al.
(2001) call this a "conditional ECM".
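As a rough illustration of what the regressors of the unrestricted ECM look like in practice, here is a small Python sketch that lines up the differenced terms and the lagged levels for each observation. It is simplified to a single explanatory variable x (as in the two-variable example later in the post), and the names `diff` and `unrestricted_ecm_rows` are hypothetical helpers, not EViews commands.

```python
def diff(series):
    """First differences: element i is series[i + 1] - series[i]."""
    return [b - a for a, b in zip(series, series[1:])]


def unrestricted_ecm_rows(y, x, p, q):
    """Build the dependent variable and regressor rows of an unrestricted ECM.

    Each row is [1, dy_{t-1}, ..., dy_{t-p}, dx_t, ..., dx_{t-q},
    y_{t-1}, x_{t-1}] and the matching target is dy_t, where d denotes
    a first difference.
    """
    dy, dx = diff(y), diff(x)          # dy[t - 1] holds y[t] - y[t - 1]
    start = max(p, q) + 1              # first t with every lag available
    targets, rows = [], []
    for t in range(start, len(y)):
        row = [1.0]
        row += [dy[t - 1 - i] for i in range(1, p + 1)]   # lagged dy terms
        row += [dx[t - 1 - j] for j in range(0, q + 1)]   # current and lagged dx
        row += [y[t - 1], x[t - 1]]                       # unrestricted lagged levels
        targets.append(dy[t - 1])
        rows.append(row)
    return targets, rows


# Made-up data, purely to show the layout of one row.
targets, rows = unrestricted_ecm_rows([1, 3, 6, 10, 15], [2, 2, 5, 5, 9], p=1, q=1)
```

The last two entries of each row are the lagged levels whose coefficients are left free, which is exactly what distinguishes this specification from a regular ECM with a single error-correction term.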
Step 3:
The ranges of summation in the various terms in (4) are from 1 to p, 0 to $q_1$, and 0 to
$q_2$ respectively. We need to select the appropriate values for the maximum lags, p, $q_1$, and $q_2$. Also,
note that the "zero lags" on $\Delta x_1$ and $\Delta x_2$ may not necessarily be needed. Usually, these maximum lags
are determined by using one or more of the "information criteria" - AIC, SC (BIC), HQ, etc. These
criteria are based on a high log-likelihood value, with a "penalty" for including more lags to achieve
this. The form of the penalty varies from one criterion to another. Each criterion starts with -2log(L),
and then penalizes, so the smaller the value of an information criterion the better the result.
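The "-2log(L) plus a penalty" structure can be written down in a few lines of Python. This is a sketch using the textbook forms of the three criteria; the function name `information_criteria` and the input values are assumptions for illustration. EViews reports these quantities divided by the sample size, which leaves the ranking of models unchanged when the sample is held fixed.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, SC (BIC) and HQ; smaller values are preferred.

    All three start from -2*log(L) and differ only in the penalty
    attached to the number of estimated parameters.
    """
    base = -2.0 * log_likelihood
    return {
        "AIC": base + 2.0 * n_params,
        "SC": base + n_params * math.log(n_obs),
        "HQ": base + 2.0 * n_params * math.log(math.log(n_obs)),
    }


# Hypothetical inputs: log-likelihood of -100 with 5 parameters and 100 observations.
criteria = information_criteria(log_likelihood=-100.0, n_params=5, n_obs=100)
```

Because log(n) exceeds 2 once n is 8 or more, SC penalizes extra lags more heavily than AIC in any realistic sample, which is why it tends to pick the shorter lag length.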
I generally use the Schwarz (Bayes) criterion (SC), as it's a consistent model-selector. Some care has to
be taken not to "over-select" the maximum lags, and I usually also pay some attention to the
(apparent) significance of the coefficients in the model.
Step 4:
A key assumption in the ARDL / Bounds Testing methodology of Pesaran et al. (2001) is that the errors
of equation (4) must be serially independent. As those authors note (p.308), this requirement may also
be influential in our final choice of the maximum lags for the variables in the model.
Once an apparently suitable version of (4) has been estimated, we should use the LM test to test the
null hypothesis that the errors are serially independent, against the alternative hypothesis that the
errors are (either) AR(m) or MA(m), for m = 1, 2, 3,...., etc.
Step 5:
We have a model with an autoregressive structure, so we have to be sure that the model is
"dynamically stable". For full details of what this means, see my recent post, When is an
Autoregressive Model Dynamically Stable? What we need to do is to check that all of the inverse roots
of the characteristic equation associated with our model lie strictly inside the unit circle. That recent
post of mine showed how to trick EViews into giving us the information we want in order to check that
this condition is satisfied. I won't repeat that here.
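For readers working outside EViews, the inverse-roots check can be sketched in plain Python. The example below is an assumption-laden illustration: it takes the autoregressive coefficients of a model written in levels as given, and it finds the roots with the Durand-Kerner iteration, which is simply a convenient root-finder chosen here and not the method EViews uses. The function names and coefficient values are hypothetical.

```python
def inverse_roots(phi, iterations=500):
    """Inverse roots of 1 - phi_1*L - ... - phi_p*L^p.

    These are the roots of z^p - phi_1*z^(p-1) - ... - phi_p = 0,
    located with the Durand-Kerner simultaneous iteration.
    """
    degree = len(phi)
    coeffs = [1.0] + [-c for c in phi]

    def poly(z):
        value = 0j
        for c in coeffs:               # Horner evaluation
            value = value * z + c
        return value

    roots = [(0.4 + 0.9j) ** k for k in range(degree)]   # standard starting points
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            denom = 1.0 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            updated.append(r - poly(r) / denom)
        roots = updated
    return roots


def is_dynamically_stable(phi):
    """True when every inverse root lies strictly inside the unit circle."""
    return all(abs(r) < 1.0 for r in inverse_roots(phi))


# Hypothetical AR(2) coefficients, not estimates from the post.
stable = is_dynamically_stable([0.5, 0.3])
```

A root very close to the unit circle deserves a second look in practice, since rounding in the estimated coefficients can push its modulus to either side of one.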
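Step 6:
Now we're ready to perform the "Bounds Testing"! Here's equation (4), again:

$$\Delta y_t = \beta_0 + \sum_i \beta_i \Delta y_{t-i} + \sum_j \gamma_j \Delta x_{1,t-j} + \sum_k \delta_k \Delta x_{2,t-k} + \theta_0 y_{t-1} + \theta_1 x_{1,t-1} + \theta_2 x_{2,t-1} + e_t \tag{4}$$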

All that we're going to do is perform an "F-test" of the hypothesis, $H_0: \theta_0 = \theta_1 = \theta_2 = 0$, against the
alternative that $H_0$ is not true. Simple enough - but why are we doing this?

As in conventional cointegration testing, we're testing for the absence of a long-run equilibrium
relationship between the variables. This absence coincides with zero coefficients for $y_{t-1}$, $x_{1,t-1}$ and $x_{2,t-1}$ in
equation (4). A rejection of $H_0$ implies that we have a long-run relationship.
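The statistic itself is the ordinary F-statistic for a set of exclusion restrictions: estimate equation (4) with and without the lagged levels and compare the residual sums of squares. Here is a minimal Python sketch of that arithmetic; the function name `f_statistic` and the numbers passed to it are made up for illustration.

```python
def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, n_obs, n_params):
    """F-statistic for exclusion restrictions in an OLS regression.

    rss_restricted:   residual sum of squares with the lagged levels dropped
    rss_unrestricted: residual sum of squares of the full equation
    n_params:         number of coefficients in the full equation
    """
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / (n_obs - n_params)
    return numerator / denominator


# Hypothetical values: 2 restrictions, 50 observations, 10 coefficients.
f_value = f_statistic(120.0, 100.0, n_restrictions=2, n_obs=50, n_params=10)
```

Only the computation is standard here. The value has to be compared with the bounds tabulated by Pesaran et al. (2001), not with the usual F tables.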
There is a practical difficulty that has to be addressed when we conduct the F-test. The distribution of
the test statistic is totally non-standard (and also depends on a "nuisance parameter", the cointegrating
rank of the system) even in the asymptotic case where we have an infinitely large sample size. (This is
somewhat akin to the situation with the Wald test when we test for Granger non-causality in the
presence of non-stationary data. In that case, the problem is resolved by using the Toda-Yamamoto
(1995) procedure, to ensure that the Wald test statistic is asymptotically chi-square, as
discussed here.)
Exact critical values for the F-test aren't available for an arbitrary mix of I(0) and I(1) variables.
However, Pesaran et al. (2001) supply bounds on the critical values for the asymptotic distribution of
the F-statistic. For various situations (e.g., different numbers of variables, (k + 1)), they give lower and
upper bounds on the critical values. In each case, the lower bound is based on the assumption that all
of the variables are I(0), and the upper bound is based on the assumption that all of the variables are
I(1). In fact, the truth may be somewhere in between these two polar extremes.
If the computed F-statistic falls below the lower bound we would conclude that the variables are I(0),
so no cointegration is possible, by definition. If the F-statistic exceeds the upper bound, we conclude
that we have cointegration. Finally, if the F-statistic falls between the bounds, the test is inconclusive.
Does this remind you of the old Durbin-Watson test for serial independence? It should!
As a cross-check, we should also perform a "Bounds t-test" of $H_0: \theta_0 = 0$, against $H_1: \theta_0 < 0$, where $\theta_0$ is the coefficient of $y_{t-1}$ in equation (4). If the t-statistic for $y_{t-1}$ is greater in absolute value than (that is, more negative than) the "I(1) bound" tabulated by Pesaran et al. (2001;
pp.303-304), this would support the conclusion that there is a long-run relationship between the
variables. If the t-statistic is smaller in absolute value than the "I(0) bound", we'd conclude that the data are all stationary.
Step 7:
Assuming that the bounds test leads to the conclusion of cointegration, we can meaningfully estimate
the long-run equilibrium relationship between the variables:

$$y_t = \alpha_0 + \alpha_1 x_{1t} + \alpha_2 x_{2t} + v_t \tag{5}$$

as well as the usual ECM:

$$\Delta y_t = \beta_0 + \sum_i \beta_i \Delta y_{t-i} + \sum_j \gamma_j \Delta x_{1,t-j} + \sum_k \delta_k \Delta x_{2,t-k} + \varphi z_{t-1} + e_t \tag{6}$$

where $z_{t-1} = (y_{t-1} - a_0 - a_1 x_{1,t-1} - a_2 x_{2,t-1})$, and the a's are the OLS estimates of the $\alpha$'s in (5).
Step 8:
We can "extract" long-run effects from the unrestricted ECM. Looking back at equation (4), and noting
that at a long-run equilibrium, $\Delta y_t = 0$, $\Delta x_{1t} = \Delta x_{2t} = 0$, we see that the long-run coefficients for $x_1$ and
$x_2$ are $-(\theta_1 / \theta_0)$ and $-(\theta_2 / \theta_0)$ respectively.
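The step from equation (4) to these long-run coefficients can be spelled out as a short derivation. This is a sketch that writes $\beta_0$ for the intercept of (4) and $\theta_0$, $\theta_1$, $\theta_2$ for the coefficients of the lagged levels of y, $x_1$ and $x_2$; the starred symbols for the equilibrium values are notation introduced here. Setting every differenced term to zero, and the disturbance to its mean of zero, leaves only the intercept and the levels:

```latex
\begin{aligned}
0 &= \beta_0 + \theta_0 y^{*} + \theta_1 x_1^{*} + \theta_2 x_2^{*} \\
y^{*} &= -\frac{\beta_0}{\theta_0} - \frac{\theta_1}{\theta_0}\, x_1^{*} - \frac{\theta_2}{\theta_0}\, x_2^{*}
\end{aligned}
```

The division requires $\theta_0 \neq 0$, which is part of what the bounds test is checking.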

An Example:
Now we're ready to look at a very simple empirical example. I'm going to use the data for U.S. and
European natural gas prices that I made available as a second example in my post, Testing for Granger
Causality. I didn't go through the details of testing for Granger causality with that set of data, but I
mentioned it near the end of the post, and the EViews file (which included a "read_me" object with
comments about the results) is there on the code page for this blog (dated 29 April, 2011).
If you look back at that earlier file, you'll find that I used the Toda-Yamamoto (1995) testing procedure
to determine that there is Granger causality running from the U.S. series to the European series, but
not vice versa.
A new EViews file that uses the same data for our ARDL modelling is available on the code page, under
the date for the current post. The data for the two time-series we'll be using are also available on
the data page for this blog. The data are monthly, from 1995(01) to 2011(03). In terms of the notation
that was introduced earlier, we have (k + 1) = 2 variables, so k = 1 when it comes to the bounds testing.
Here's a plot of the data we'll be using:

To complete Step 1, we need to check that neither of our time-series is I(2). Applying the ADF test to
the levels of EUR and US, the p-values are 0.53 and 0.10 respectively. Applying the test to the first-differences of the series, the p-values are both 0.00. (The lag-lengths for the ADF regressions were
chosen using the Schwarz criterion, SC.) Clearly, neither series is I(2).
Applying the KPSS test we reject the null of stationarity, even at the 1% significance level, for both
EUR and US, but cannot reject the null of I(1) against I(2). The p-value of 10% for the ADF test of
I(1) vs. I(0) for the EUR series may leave us wondering if that series is stationary, or not. You'll know
that apparent "conflicts" between the outcomes of tests such as these are very common in practice.

This is a great illustration of how the ARDL / Bounds Testing methodology can help us. In order for
standard cointegration testing (such as that of Engle and Granger, or Johansen) to make any sense, we
must be really sure that all of the series are integrated of the same order. In this instance, you might
not be feeling totally sure that this is the case.
Step 2 is straightforward. Given that the Granger causality testing associated with my earlier post
suggested that there is causality from US to EUR (but not vice versa), EUR is going to be the
dependent variable in my unrestricted ECM:

$$\Delta \mathrm{EUR}_t = \beta_0 + \sum_i \beta_i \Delta \mathrm{EUR}_{t-i} + \sum_j \gamma_j \Delta \mathrm{US}_{t-j} + \theta_0 \mathrm{EUR}_{t-1} + \theta_1 \mathrm{US}_{t-1} + e_t \tag{7}$$

That's Step 2 out of the way!

To implement the information criteria for selecting the lag-lengths in a time-efficient way, I "tricked"
EViews into providing lots of them at once by doing the following. I estimated a 1-equation VAR model
for $\Delta \mathrm{EUR}_t$ and I supplied the intercept, $\mathrm{EUR}_{t-1}$, $\mathrm{US}_{t-1}$, and a fixed number of lags of $\Delta \mathrm{US}_t$ as exogenous
regressors. For example, when the fixed number of lags on $\Delta \mathrm{US}_t$ was zero, here's how I specified the
VAR:

After estimating this model, I then chose VIEW, LAG STRUCTURE, LAG LENGTH CRITERIA:

I then repeated this by adding $\Delta \mathrm{US}_{t-1}$ to the list of exogenous variables, and got the following results:

I proceeded in this manner with additional lags of $\Delta \mathrm{US}_t$ in the "exogenous" list. I also considered cases
such as:

which resulted in the following information criteria values:

Looking at the SC values in these three tables of results, we see that a maximum lag of 4 is suggested
for $\Delta \mathrm{EUR}_t$. (The AIC values suggest that 8 lags of $\Delta \mathrm{EUR}_t$ may be appropriate, but some experimentation
with this was not fruitful.)
There is virtually no difference between the SC values for the case where the model includes just $\Delta \mathrm{US}_t$ as
a regressor (0.8714), and the case where just $\Delta \mathrm{US}_{t-1}$ is included (0.8718). To get some dynamics into the
model, I'm going to go with the latter case.
With Step 3 completed, and with this lag specification in mind, let's now look at the estimated
unrestricted ECM:

Step 4 involves checking that the errors of this model are serially independent. Selecting VIEW,
RESIDUAL DIAGNOSTICS, SERIAL CORRELATION LM TEST, I get the following results:
m    LM        p-value
1    0.079     0.779
2    2.878     0.237
3    5.380     0.146
4    11.753    0.019
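The p-values in this table come from comparing each LM statistic with a chi-square distribution with m degrees of freedom. As a check that needs nothing beyond the Python standard library, here is a sketch that uses the closed-form upper-tail probability for integer degrees of freedom. The function name `chi2_sf` is a hypothetical helper; the statistics are the ones reported in the table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return total


# LM statistics for m = 1, ..., 4 as reported in the table.
lm_stats = {1: 0.079, 2: 2.878, 3: 5.380, 4: 11.753}
p_values = {m: chi2_sf(lm, m) for m, lm in lm_stats.items()}
```

The computed values agree with the tabulated p-values to the three decimal places shown, and it is the m = 4 entry, with a p-value below 5%, that signals the problem.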

O.K., we have a problem with serial correlation! To deal with it, I experimented with one or two
additional lags of the dependent variable as regressors, and ended up with the following specification
for the unrestricted ECM:

The serial independence results now look much more satisfactory:


m    LM        p-value
1    0.013     0.911
2    3.337     0.189
3    5.183     0.159
4    7.989     0.092
5    8.473     0.132
6    11.023    0.088
7    12.270    0.092
8    12.334    0.137

Next, Step 5 involves checking the dynamic stability of this ARDL model. Here are the inverse roots of
the associated characteristic equation:

All seems to be well - these roots are all inside the unit circle.
Before proceeding to the Bounds Testing, let's take a look at the "fit" of our unrestricted ECM. The
"Actual / Fitted / Residuals" plot looks like this:

When we "unscramble" these results, and look at the fit of the model in terms of explaining the level of
EUR itself, rather than $\Delta \mathrm{EUR}$, things look pretty good:

We're now ready for Step 6 - the Bounds Test itself. We want to test if the coefficients of both EUR(-1)
and US(-1) are zero in our estimated model (repeated below):

The associated F-test is obtained as follows:

With the result:

The value of our F-statistic is 5.827, and we have (k + 1) = 2 variables (EUR and US) in our model. So,
when we go to the Bounds Test tables of critical values, we have k = 1.
Table CI (iii) on p.300 of Pesaran et al. (2001) is the relevant table for us to use here. We haven't
constrained the intercept of our model, and there is no linear trend term included in the ECM. The
lower and upper bounds for the F-test statistic at the 10%, 5%, and 1% significance levels are [4.04 ,
4.78], [4.94 , 5.73], and [6.84 , 7.84] respectively.
As the value of our F-statistic exceeds the upper bound at the 5% significance level, we can conclude
that there is evidence of a long-run relationship between the two time-series (at this level of
significance or greater).
In addition, the t-statistic on EUR(-1) is -2.926. When we look at Table CII (iii) on p.303 of Pesaran et
al. (2001), we find that the I(0) and I(1) bounds for the t-statistic at the 10%, 5%, and 1% significance
levels are [-2.57 , -2.91], [-2.86 , -3.22], and [-3.43 , -3.82] respectively. At least at the 10%
significance level, this result reinforces our conclusion that there is a long-run relationship between
EUR and US.
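The two decision rules can be written as a pair of small Python functions and applied to the statistics and bounds quoted above. The function names and the wording of the returned labels are assumptions made for this sketch; the numbers are the ones given in the text.

```python
F_BOUNDS = {"10%": (4.04, 4.78), "5%": (4.94, 5.73), "1%": (6.84, 7.84)}
T_BOUNDS = {"10%": (-2.57, -2.91), "5%": (-2.86, -3.22), "1%": (-3.43, -3.82)}


def f_bounds_decision(f_stat, lower, upper):
    """Classify an F-statistic against the I(0) (lower) and I(1) (upper) bounds."""
    if f_stat > upper:
        return "long-run relationship"
    if f_stat < lower:
        return "no long-run relationship"
    return "inconclusive"


def t_bounds_decision(t_stat, i0_bound, i1_bound):
    """Same idea for the t-test, where the bounds are negative numbers."""
    if t_stat < i1_bound:          # more negative than the I(1) bound
        return "long-run relationship"
    if t_stat > i0_bound:          # less negative than the I(0) bound
        return "no long-run relationship"
    return "inconclusive"


f_results = {level: f_bounds_decision(5.827, *b) for level, b in F_BOUNDS.items()}
t_results = {level: t_bounds_decision(-2.926, *b) for level, b in T_BOUNDS.items()}
```

Laid out this way, it is easy to see that the F-test supports a long-run relationship at the 10% and 5% levels but not at 1%, while the t-test does so only at the 10% level and is inconclusive at 5%.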
So, here we are at Step 7 and Step 8.
Recalling our preferred unrestricted ECM:

we see that the long-run multiplier between US and EUR is -(0.047134 / (-0.030804)) = 1.53. In the long
run, an increase of 1 unit in US will lead to an increase of 1.53 units in EUR.
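The same arithmetic in Python, for anyone who wants to reuse it with their own estimates. The function name `long_run_multiplier` is a hypothetical helper; the two coefficients are the ones quoted above for US(-1) and EUR(-1).

```python
def long_run_multiplier(coef_x_level, coef_y_level):
    """Long-run effect of x on y implied by an unrestricted ECM.

    coef_x_level: coefficient of the lagged level of the explanatory variable
    coef_y_level: coefficient of the lagged level of the dependent variable
    """
    if coef_y_level == 0:
        raise ValueError("no long-run relationship is defined when this coefficient is zero")
    return -coef_x_level / coef_y_level


# Coefficients of US(-1) and EUR(-1) from the unrestricted ECM in the post.
multiplier = long_run_multiplier(0.047134, -0.030804)
```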
If we estimate the levels model,

$$\mathrm{EUR}_t = \alpha_0 + \alpha_1 \mathrm{US}_t + v_t$$

by OLS, and construct the residuals series, $\{z_t\}$, we can fit a regular (restricted) ECM:

Notice that the coefficient of the error-correction term, $z_{t-1}$, is negative and very significant. This is
what we'd expect if there is cointegration between EUR and US. The magnitude of this coefficient
implies that nearly 3% of any disequilibrium between EUR and US is corrected within one period (one
month).
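One way to get a feel for an adjustment speed of this size is to convert it into a half-life. The sketch below assumes a coefficient of exactly -0.03 (the text only says "nearly 3%") and ignores the short-run difference terms in the ECM, so it is a back-of-the-envelope figure and not a property of the estimated model. The function name `half_life` is hypothetical.

```python
import math


def half_life(adjustment_coef):
    """Periods needed to close half of a disequilibrium gap.

    With an error-correction coefficient c between -1 and 0, a gap
    shrinks by the factor (1 + c) each period.
    """
    if not -1.0 < adjustment_coef < 0.0:
        raise ValueError("expected a coefficient strictly between -1 and 0")
    return math.log(0.5) / math.log(1.0 + adjustment_coef)


# Assumed coefficient of -0.03, i.e. about 3% correction per month.
months = half_life(-0.03)
```

Under these assumptions it takes a little under two years for half of a gap between EUR and US to disappear, so the correction is statistically significant but slow.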
This final ECM is dynamically stable:

As none of the roots lie on the X (real) axis, it's clear that we have three complex conjugate pairs of
roots. Accordingly, the short-run dynamics associated with the model are quite complicated. This can
be seen if we consider the impulse response function associated with a "shock" of one (sample)
standard deviation:

Finally, the within-sample fit (in terms of the levels of EUR) is exceptionally good:

In fact, the simple correlations between EUR and the "fitted" EUR series from the unrestricted and
regular ECMs are each 0.994, and the correlation between the two fitted series is 0.9999.
So, there we have it - bounds testing with an ARDL model.
[Note: For an important update of this post, relating to EViews 9, see my 2015 post, here.]

References
Pesaran, M. H. and Y. Shin, 1999. An autoregressive distributed lag modelling approach to cointegration
analysis. Chapter 11 in S. Strom (ed.), Econometrics and Economic Theory in the 20th Century: The
Ragnar Frisch Centennial Symposium. Cambridge University Press, Cambridge. (Discussion Paper
version.)
Pesaran, M. H., Y. Shin and R. J. Smith, 2001. Bounds testing approaches to the
analysis of level relationships. Journal of Applied Econometrics, 16, 289-326.
Pesaran, M. H. and R. P. Smith, 1998. Structural analysis of cointegrating VARs. Journal of Economic
Surveys, 12, 471-505.
Toda, H. Y. and T. Yamamoto, 1995. Statistical inferences in vector autoregressions with possibly
integrated processes. Journal of Econometrics, 66, 225-250.
