
Wednesday, March 6, 2013

ARDL Models - Part I


I've been promising, for far too long, to provide a post on ARDL models and bounds testing. Well, I've finally got around to it!
"ARDL" stands for "Autoregressive-Distributed Lag". Regression models of this type
have been in use for decades, but in more recent times they have been shown to
provide a very valuable vehicle for testing for the presence of long-run
relationships
between
economic
time-series.
I'm going to break my discussion of ARDL models into two parts. Here, I'm going to
describe, very briefly, what we mean by an ARDL model. This will then provide the
background for a second post that will discuss and illustrate how such models can
be used to test for cointegration, and estimate long-run and short-run dynamics,
even when the variables in question may include a mixture of stationary and non-stationary time-series.
In its basic form, an ARDL regression model looks like this:

$$y_t = \beta_0 + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \alpha_0 x_t + \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \cdots + \alpha_q x_{t-q} + \epsilon_t$$

where $\epsilon_t$ is a random "disturbance" term.

The model is "autoregressive", in the sense that $y_t$ is "explained" (in part) by lagged values of itself. It also has a "distributed lag" component, in the form of successive lags of the "x" explanatory variable. Sometimes, the current value of $x_t$ itself is excluded from the distributed lag part of the model's structure.
Let's describe the model above as being one that is ARDL(p,q), for obvious reasons.
Given the presence of lagged values of the dependent variable as regressors, OLS estimation of an ARDL model will yield biased coefficient estimates. If the disturbance term, $\epsilon_t$, is autocorrelated, then OLS will also be an inconsistent estimator, and in that case Instrumental Variables estimation was generally used in applications of this model.
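To make this concrete, here's a minimal Python sketch (the simulated data and lag orders are mine, not from the original post) of what OLS estimation of an ARDL(p,q) model amounts to: regressing $y_t$ on an intercept, its own lags, and the current and lagged values of x.

```python
import numpy as np

def ardl_design(y, x, p, q):
    """Build the ARDL(p,q) regressors: intercept, y lags 1..p, x lags 0..q."""
    n = len(y)
    start = max(p, q)                         # first usable observation
    cols = [np.ones(n - start)]               # intercept
    for i in range(1, p + 1):                 # lagged dependent variable
        cols.append(y[start - i:n - i])
    for j in range(0, q + 1):                 # current and lagged x
        cols.append(x[start - j:n - j])
    return y[start:], np.column_stack(cols)

# Simulated ARDL(1,1) data-generating process, purely illustrative
rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=500))           # a random-walk regressor
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + 0.3 * x[t] + 0.2 * x[t - 1] + rng.normal()

y_dep, X = ardl_design(y, x, p=1, q=1)
beta, *_ = np.linalg.lstsq(X, y_dep, rcond=None)  # OLS estimates
print(beta)  # roughly [0, 0.5, 0.3, 0.2]
# Note: with a lagged dependent variable these estimates are biased in small
# samples, and inconsistent if the errors are autocorrelated (as noted above).
```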
In the 1960's and 1970's we used distributed lag (DL(q), or ARDL(0,q)) models a lot.
To avoid the adverse effects of the multicollinearity associated with including
many lags of "x" as regressors, it was common to reduce the number of parameters
by imposing restrictions on the pattern (or "distribution") of values that the coefficients could take.

Perhaps the best-known set of restrictions was that associated with Koyck (1954), for the estimation of a DL(∞) model. These restrictions imposed a geometric rate of decay on the coefficients. This enabled the model to be manipulated into a new one that was autoregressive, but with an error term that followed a moving average process. Today, we'd call this an ARMAX model. Again, Instrumental Variables estimation was often used to obtain consistent estimates of the model's parameters.
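To see why, here's the standard Koyck manipulation written out (the notation is mine). Start from the DL(∞) model with geometrically declining coefficients, lag it once, multiply by $\lambda$, and subtract:

```latex
% DL(infinity) with geometric decay; |lambda| < 1
y_t = \alpha + \beta \sum_{i=0}^{\infty} \lambda^{i} x_{t-i} + \epsilon_t
% Subtracting lambda * y_{t-1} (the Koyck transformation) gives
y_t = \alpha(1 - \lambda) + \lambda y_{t-1} + \beta x_t + (\epsilon_t - \lambda \epsilon_{t-1})
```

The transformed model is autoregressive in $y_t$, but its disturbance follows an MA(1) process, which is exactly why OLS is inconsistent and Instrumental Variables estimation was needed.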
Franses and van Oest (2004) provide an interesting perspective on the Koyck model, and the associated "Koyck transformation", 50 years after its introduction into the literature.
Shirley Almon popularized another set of restrictions (Almon, 1965) for the
coefficients in a DL(q) model. Her approach was based on Weierstrass's
Approximation Theorem, which tells us that any continuous function can be
approximated, arbitrarily closely, by a polynomial of some order. The only question
is "what is the order", and this had to be chosen by the practitioner.
The Almon estimator could actually be re-written as a restricted least squares
estimator. For example, see Schmidt and Waud (1973), and Giles (1975).
Surprisingly, though, this isn't how this estimator was usually presented to students and practitioners.
Almon's approach allowed restrictions to be placed on the shape of the "decay path" of the lag coefficients, as well as on the values and slopes of this decay path at the end-points of the lag distribution (lags 0 and q). Almon's estimator is still included in a number of econometrics packages, including EViews. A Bayesian analysis of the Almon estimator, with an application to New Zealand imports data, can be found in Giles (1977), and Shiller (1973) provides a Bayesian analysis of a different type of distributed lag model.
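To make the restricted least squares interpretation concrete, here's the Almon scheme written out (again, the notation is mine): each lag coefficient is constrained to lie on a polynomial of degree r (< q) in the lag index, so the DL(q) model collapses to a regression with only r + 1 slope parameters.

```latex
% DL(q) model with an Almon polynomial restriction of degree r:
y_t = \alpha + \sum_{i=0}^{q} \beta_i x_{t-i} + \epsilon_t ,
\qquad \beta_i = \sum_{j=0}^{r} \gamma_j \, i^{j}
% Substituting, the model becomes a regression on r+1 constructed variables:
y_t = \alpha + \sum_{j=0}^{r} \gamma_j z_{jt} + \epsilon_t ,
\qquad z_{jt} = \sum_{i=0}^{q} i^{j} x_{t-i}
```

Estimating the $\gamma$'s by OLS and then recovering the $\beta$'s is numerically equivalent to restricted least squares on the original DL(q) model, which is the point made by Schmidt and Waud (1973).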
Dhrymes (1971) provides a thorough and very general discussion of DL models.
So, now we know what an ARDL model is, and where the term "Autoregressive-Distributed Lag" comes from. In the next post on this topic I'll discuss the modern application of such models in the context of non-stationary time-series data, with the emphasis on an illustrative application with real data.

[Note: For an important update to this post, relating to EViews 9, see my 2015
post, here.]

Saturday, December 27, 2014

The Demise of a "Great Ratio"


Once upon a time there was a rule of thumb that there were 20 sheep in New
Zealand for every person living there. Yep, I kid you not. The old adage used to be "3 million people; 60 million sheep".
I liked to think of this as another important "Great Ratio". You know - in the spirit
of the famous "Great Ratios" suggested by Klein and Kosobud (1961) in the context of economic growth, and subsequently analysed and augmented by a variety of authors. The latter include Simon (1990), Harvey et al. (2003), Attfield and Temple (2010), and others.
After all, it's said that (at least in the post-WWII era) the economies of both
Australia and New Zealand "rode on the sheep's back". If that's the case, then the
New Zealand Sheep Ratio (NZSR) may hold important clues for economic growth in that country.
My interest in this matter right now comes from reading an alarming press release
from Statistics New Zealand, a few days ago. The latest release of
the Agricultural Production Statistics for N.Z. revealed that the (provisional) figure for the number of sheep was (only!) 26.9 million at the end of June 2014, down 4% from 2013.
I was shocked, to say the least! Worse was to come. The 2014 figure puts the number of sheep in N.Z. at the lowest level since 1943!
I'm sure you can understand my concern. We'd better take a closer look at this,
and what it all means for the NZSR:
Let's begin with the raw numbers, courtesy of Statistics N.Z.'s free and user-friendly interface, "Infoshare".

I used the long-run historical data for the (human) population, which is why that
series ends in 2011. More recent data are available, of course, but I couldn't easily
match them to the historical series. This is really of no consequence for what I'm
going to do here.
The breaks in the sheep series are due to the fact that no agricultural survey was conducted in certain years. I don't know why not.
I'm going to do some unit root testing, so these gaps have to be dealt with. In line
with my work with Kevin Ryan (as discussed in an earlier post, here), I've filled in
the four gaps in the series with the previous actual value.
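(For anyone replicating this outside EViews, here's a minimal sketch of that gap-filling step in Python with pandas; the series values and years are hypothetical placeholders, not the actual N.Z. data.)

```python
import numpy as np
import pandas as pd

# Hypothetical annual sheep counts (millions); NaN = no survey that year
sheep = pd.Series(
    [30.5, np.nan, 31.2, 30.8, np.nan, 29.9],
    index=pd.Index([1940, 1941, 1942, 1943, 1944, 1945], name="year"),
)

# Fill each gap with the previous actual value, as described above
sheep_filled = sheep.ffill()
print(sheep_filled)
```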
With the human population rising, and the sheep population declining, you don't
have to be a rocket scientist to know what's been happening to the NZSR:

Arguably, the NZSR was relatively stable up until about the end of the 1980's, but
after that..........the end of an era.
Let's analyze the data more carefully. All of the data I've used are available on
the data page that accompanies this blog; and the EViews workfile is on the code
page.
The first thing that I've done is to "trick" EViews into giving me access to the
standard tests that we use for structural change in a regression model. I regressed
NZSR on just an intercept (and no other regressors), using OLS. The estimated
intercept is just the sample mean for NZSR, and the residuals for the regression
are just the NZSR data, expressed as deviations about this sample mean. Now I can
apply tests for structural change, effectively to the mean-adjusted NZSR series. (I
mentioned this trick in an earlier post.)
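If you want to see why the trick works, here's a small Python sketch (with simulated data, not the actual NZSR series): the intercept-only OLS regression returns the sample mean as its coefficient, and its residuals are exactly the demeaned data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
nzsr = rng.normal(loc=20.0, scale=2.0, size=80)    # stand-in for the NZSR series

ols = sm.OLS(nzsr, np.ones_like(nzsr)).fit()       # regression on an intercept only
print(ols.params[0], nzsr.mean())                  # identical: the sample mean
print(np.allclose(ols.resid, nzsr - nzsr.mean()))  # True: residuals = demeaned data
```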
Specifically, here are the results of applying the test suggested by Bai and Perron
(1998), using the "min. SIC" rule proposed by Liu et al. (1997):

You can see that we have evidence of structural breaks in 1958, 1990, and 2001.
The second of these dates lines up nicely with my "eye-ball" conclusion from the previous graph.
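(EViews implements the Bai-Perron procedure directly. If you want something comparable in Python, the third-party ruptures package performs an analogous least-squares segmentation for mean shifts; this is a rough stand-in, not the Bai-Perron test itself.)

```python
import numpy as np
import ruptures as rpt  # third-party package: pip install ruptures

rng = np.random.default_rng(1)
# Simulated mean-shift series standing in for the demeaned NZSR data
signal = np.concatenate([rng.normal(22, 1, 25),
                         rng.normal(18, 1, 30),
                         rng.normal(8, 1, 25)])

algo = rpt.Binseg(model="l2").fit(signal)  # least-squares binary segmentation
breakpoints = algo.predict(n_bkps=2)       # indices where the mean shifts
print(breakpoints)  # e.g. [25, 55, 80]; the last entry is the series end
```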

Now let's focus on the data for the period 1935 to 1990. I'm going to test to see if
there is evidence of unit roots in the logarithms of POP and SHEEP. As there is, I'm then going to test if the logarithm of NZSR is stationary. In other words, I'll be checking whether we have cointegration in the logarithms of the data.
At each step, I'll allow for the fact that there is evidence of a structural break in
1958. The incorporation of breakpoint unit root tests is one of the many nice new
features in the Beta release of EViews 9, as noted in my previous post, here.
Here is how I've done this for the log(POP) series:

This implements Perron's (1989) modified ADF tests, and here's the result:

Clearly, we can't reject the null of a unit root. In the case of the log(SHEEP) series, the corresponding ADF statistic is -0.9597 (p > 0.50), so we come to the same conclusion.
The log(NZSR) series is trendless over this period, so this is taken into account
when applying the modified ADF test:

The modified ADF statistic is -3.5716 (p < 0.10). The p-values for the modified ADF
tests are based on the asymptotic distributions for the test statistics, and we have
T = 56. None the less, we have reasonable evidence that log(POP) and log(SHEEP)
were cointegrated over the period 1935 to 1990. Their log-ratio was stationary.
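(The breakpoint-modified tests are an EViews 9 feature. As a rough Python analogue, one that ignores the 1958 break entirely, so a sketch rather than a substitute for Perron's modified test, a standard ADF test on a log-ratio looks like this.)

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
# Stand-in data: a stationary AR(1), playing the role of log(NZSR), 1935-1990
log_ratio = np.empty(56)
log_ratio[0] = 3.0
for t in range(1, 56):
    log_ratio[t] = 0.3 + 0.9 * log_ratio[t - 1] + rng.normal(scale=0.05)

stat, pvalue, *_ = adfuller(log_ratio, regression="c")  # intercept, no trend
print(f"ADF statistic = {stat:.4f}, p-value = {pvalue:.4f}")
```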
There's our "Great Ratio"!
(For the record, exactly the same conclusions are reached if we use the levels, rather than the logarithms, of the data.)
The second graph given earlier makes it pretty clear that the (Great) New Zealand
Sheep Ratio is no longer with us. By way of confirmation, I've tested for a unit root
in log(NZSR) over the full sample, 1935 to 2011. I used each of the breakpoint unit root tests available in EViews 9 that allow for innovation outlier breaks. There are 16 tests in total when you allow for the various drift/trend options.
By way of an example, here are the results for the Dickey-Fuller max-t test, allowing
for an innovation trend-break:

For all of the tests applied to the full series for log(NZSR), or for NZSR itself, very
large p-values are achieved and so we cannot reject a unit root in the data.
The New Zealand Sheep Ratio appears to be a thing of the past!
© 2014, David E. Giles

Friday, January 9, 2015

ARDL Modelling in EViews 9


My previous posts relating to ARDL models (here and here) have drawn a lot of
hits. So, it's great to see that EViews 9 (now in Beta release - see the
details here) incorporates an ARDL modelling option, together with the associated "bounds testing".
This is a great feature, and I just know that it's going to be a "winner" for EViews.
It certainly deserves a post, so here goes!

First, it's important to note that although there was previously an EViews "add-in"
for ARDL models (see here and here), this was quite limited in its capabilities.
What's now available is a full-blown ARDL estimation option, together with bounds
testing and an analysis of the long-run relationship between the variables being
modelled.
Here, I'll take you through another example of ARDL modelling - this one involves
the relationship between the retail price of gasoline, and the price of crude oil.
More specifically, the crude oil price is for Canadian Par at Edmonton; and the
gasoline price is that for the Canadian city of Vancouver. Although crude oil prices
are recorded daily, the gasoline prices are available only weekly. So, the price data that we'll use are weekly (end-of-week), for the period 4 January 2000 to 16 July 2013, inclusive.
The oil prices are measured in Canadian dollars per cubic metre. The gasoline prices
are in Canadian cents per litre, and they exclude taxes. Here's a plot of the raw
data:

The data are available on the data page for this blog. The EViews workfile is on
the code page.
I'm going to work with the logarithms of the data: LOG_CRUDE and LOG_GAS.
There's still a clear structural break in the data for both of these series.
Specifically, there's a structural break that occurs over the weeks ended 8 July
2008 to 30 December 2008 inclusive. I've constructed a dummy variable, BREAK,
that takes the value one for these observations, and zero everywhere else.
The break doesn't occur at just a single point in time. Instead, there's a change in
the level and trend of the data that evolves over several periods. We call this an
"innovational outlier", and in testing the two time series for unit roots, I've taken
this into account.
In a recent post I discussed the new "Breakpoint Unit Root Test" options that are
available in EViews 9. They're perfectly suited for our current situation. Here's how
I've implemented the appropriate test of a unit root in the case of the LOG_CRUDE
series:

The result is:

We wouldn't reject the hypothesis of a unit root at the 5% significance level, and
the result is marginal at the 10% level. The corresponding result for the LOG_GAS
series is:

In this case we'd reject the null hypothesis of a unit root at the 5% significance
level, but not at the 1% level. Overall, the results are somewhat inconclusive, and
this is precisely the situation that ARDL modelling and bounds testing is designed
for. Applying the unit root tests to the first-differences of each series leads to a
very clear rejection of the hypothesis that the data are I(2), which is important for
the legitimate application of the bounds test below.
Now, let's go ahead with the specification and estimation of a basic ARDL model
that explains the retail price of gasoline in terms of past values of that price, as
well as the current and past values of the price of crude oil. We can do this in the
same way that we'd estimate any equation in EViews, but we select the
"Estimation Method" to be "ARDL" (see below):

Notice that I've set the maximum number of lags for both the dependent variable
and the principal regressor to be 8. This means that 72 different model
specifications will be considered, allowing for the fact that the current value of
LOG_CRUDE can be considered as a regressor. Also, notice that I've included the
BREAK dummy variable, as well as an intercept and linear trend as (fixed)
regressors. (That is, they won't be lagged.)
Using the OPTIONS tab, let's select the Schwarz criterion (SC) as the basis for
determining the lag orders for the regressors:

The model which minimizes SC will be chosen. This results in a rather parsimonious model specification, as you can see:

I mentioned in an earlier post on Information Criteria that SC tends to select a simpler model specification than some other information criteria. So, instead of SC, I'm going to use Akaike's Information Criterion (AIC) for selecting the lag structure in the ARDL model. There's a risk of "over-fitting" the model, but I definitely don't want to under-fit it. Here's what we get:
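(For non-EViews users: statsmodels, from version 0.13, offers a comparable automatic lag-order search. Here's a hedged sketch with simulated stand-ins for the post's weekly series; the variable names and data are mine, and the exact function signature is worth checking against your installed version.)

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ardl import ardl_select_order

# Simulated stand-ins for LOG_CRUDE, LOG_GAS, and the BREAK dummy
rng = np.random.default_rng(3)
n = 700
log_crude = pd.Series(np.cumsum(rng.normal(0, 0.02, n)), name="LOG_CRUDE")
log_gas = (0.7 * log_crude + rng.normal(0, 0.01, n)).rename("LOG_GAS")
break_dummy = pd.DataFrame(
    {"BREAK": ((np.arange(n) >= 440) & (np.arange(n) <= 465)).astype(float)}
)

sel = ardl_select_order(
    log_gas, maxlag=8,           # up to 8 lags of the dependent variable
    exog=log_crude, maxorder=8,  # lags 0..8 of the principal regressor
    fixed=break_dummy,           # enters every candidate model, unlagged
    trend="ct",                  # intercept and linear trend
    ic="aic",                    # AIC, as in the post; use ic="bic" for SC
)
print(sel.model.ardl_order)      # the selected (p, q)
results = sel.model.fit()
```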

It's important that the errors of this model are serially independent - if not, the parameter estimates won't be consistent (because of the lagged values of the dependent variable that appear as regressors in the model). To that end, we can use the VIEW tab to choose RESIDUAL DIAGNOSTICS; CORRELOGRAM - Q-STATISTICS, and this gives us the following results:

The p-values are only approximate, but they strongly suggest that there is no
evidence of autocorrelation in the model's residuals. This is good news!
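(The Q-statistics in that correlogram are Ljung-Box statistics. The same residual check can be sketched in statsmodels, continuing from the `results` object in the previous snippet.)

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box Q-tests on the ARDL residuals at several lag horizons
lb = acorr_ljungbox(results.resid, lags=[4, 8, 12])
print(lb)  # small p-values would signal remaining autocorrelation
```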
Now, recall that, in total, 72 ARDL model specifications were considered. Although
an ARDL(4,2) was finally selected, we can also see how well some other
specifications performed in terms of minimizing AIC. Selecting the VIEW tab in the
regression output, and then choosing MODEL SELECTION SUMMARY; CRITERIA
GRAPH from the drop-down, we see the "Top Twenty" results:

(You can get the full summary of the AIC, SC, Hannan-Quinn, and adjusted R² statistics for all 72 model specifications if you select CRITERIA TABLE, rather
than CRITERIA GRAPH.)
One of the main purposes of estimating an ARDL model is to use it as the basis for
applying the "Bounds Test". This test is discussed in detail in one of my earlier
posts. The null hypothesis is that there is no long-run relationship between the
variables - in this case, LOG_CRUDE and LOG_GAS.
In the estimation results, if we select the VIEW tab, and then from the drop-down
menu choose COEFFICIENT DIAGNOSTICS; BOUNDS TEST, this is what we'll get:

We see that the F-statistic for the Bounds Test is 32.38, and this clearly exceeds
even the 1% critical value for the upper bound. Accordingly, we strongly reject the
hypothesis of "No Long-Run Relationship".
The output at this point also shows the modified ARDL model that was used to
obtain this result. The form that this model takes will be familiar if you've read
my earlier post on bounds testing.
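(For completeness: recent statsmodels versions expose the same bounds test through their UECM class. Here's a sketch, with the caveat that the API details, `UECM.from_ardl`, `bounds_test`, and the case numbering from Pesaran et al. (2001), should be checked against your installed version.)

```python
from statsmodels.tsa.ardl import UECM

# Re-cast the selected ARDL model as an unrestricted error-correction model
uecm_res = UECM.from_ardl(sel.model).fit()

# Pesaran-Shin-Smith bounds test; case 5 = unrestricted intercept and trend
bt = uecm_res.bounds_test(case=5)
print(bt)  # F-statistic with lower (I(0)) and upper (I(1)) critical bounds
```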

In the estimation results for our chosen ARDL model, if we select the VIEW tab,
and then from the drop-down menu choose COEFFICIENT DIAGNOSTICS;
COINTEGRATION AND LONG RUN FORM, this is what we'll see:

The error-correction coefficient is negative (-0.2028), as required, and is very significant. Importantly, the long-run coefficients from the cointegrating equation are reported, with their standard errors, t-statistics, and p-values:
So, what do we conclude from all of this?


First, not surprisingly, there's a long-run equilibrium relationship between the
price of crude oil, and the retail price of gasoline.
Second, there is a relatively quick adjustment in the price of gasoline when the price of crude oil changes: with an error-correction coefficient of -0.2028, roughly 20% of any deviation from the long-run equilibrium is eliminated each week, implying a half-life of about three weeks for such deviations. (Recall that the data are observed weekly.)
Third, a 10% change in the price of crude oil will result in a long-run change of 7%
in the price of retail gasoline.
Whether or not these responses are symmetric with respect to price increases and price decreases is the subject of some on-going work of mine.
