
Revisiting Distributed Lag Models

Through a Bayesian Perspective


Authors:
Romy R. Ravines
PhD student
e-mail: romy@dme.ufrj.br

Alexandra M. Schmidt
Associate Professor
e-mail: alex@im.ufrj.br

Helio S. Migon
Full Professor
e-mail: migon@im.ufrj.br

Corresponding author:
Universidade Federal do Rio de Janeiro.
Instituto de Matematica.
Caixa Postal 68530. CEP 21945-970. Rio de Janeiro, RJ. Brazil.
Tel/Fax:(021)5627374.


Abstract
This paper aims to show practitioners how flexible and straightforward the implementation of the Bayesian paradigm can be for distributed lag models within the Bayesian dynamic linear model framework. Distributed lag models are of importance when it is believed that a covariate at time t, say X_t, causes an impact on the mean value of the response variable, Y_t. Moreover, it is believed that the effect of X on Y persists for a period and decays to zero as time passes. There are in the literature many different models that deal with this kind of situation. This paper aims to review some of these proposals and show that, under some fairly simple reparametrization, they fall into a particular case of a class of Dynamic Linear Models (DLM), the transfer function models. Inference is performed following the Bayesian paradigm. Samples from the joint posterior distribution of the unknown quantities of interest are easily obtained through the use of Markov chain Monte Carlo (MCMC) methods. The computation is simplified by the use of the software WinBUGS.

As an example, a consumption function is analyzed using the Koyck transformation and the transfer function models. A comparison is then made with classical cointegration techniques. We make use of the U.S. quarterly price-deflated, seasonally adjusted data on personal disposable income and personal consumption expenditure described in [1].

KEY-WORDS: Dynamic Linear Models; Transfer Functions; Markov chain Monte Carlo; WinBUGS.

1 Introduction

In econometrics, it is common practice to make use of transfer functions to impose structure on distributed lag coefficients when modelling the behavior of a response variable Y under the impact of a covariate X. This is necessary when the dependent variable reacts to changes in one or more of the explanatory variables only after a lapse of time. This delayed reaction suggests the inclusion of lagged explanatory variables (distributing the impact of the explanatory variable over several periods) in the specification of the model, resulting in a dynamic model.

In this paper we aim to fit transfer function models using a fully Bayesian approach, by making use of Markov chain Monte Carlo (MCMC) techniques. Specifically, we compare different distributed lag models and show that, after some reparametrization, they can be represented as a particular case of a class of dynamic linear models (DLM), the transfer function forms.

Broadly speaking, a model is dynamic whenever the variables are indexed by time and appear with different time lags. For instance, Y_t = β_0 X_t + β_1 X_{t−1} + ε_t is a simple dynamic model, called a distributed lag model. Here the dynamic structure appears in the exogenous variable, X_t. It can also appear in the endogenous variable, e.g. Y_t = φY_{t−1} + ε_t, which is the well-known autoregressive model of order 1, AR(1). A model with lags in both exogenous and dependent variables is called an autoregressive distributed lag (ARDL) model. Note that a dynamic structure can also appear in the error process and, more generally, the parameters can also be modeled through a dynamic structure.
Distributed lag models have played a prominent role in numerous applications
in the economics and agricultural economics literature. Early examples include the
studies by [2] on the response of agricultural supply to price, the study on capital
appropriations and expenditures by [3] and the response of capital investment to
various aspects of the economic environment [4, 5]. More recently, [6] used them
to establish the dynamic link between sales and advertising whereas [7] estimated
energy demand relationships. Actually, distributed lag models have been used also
in epidemiological and environmental areas. Examples of applications in those areas
are [8] and [9]. Besides, many econometric textbooks include a chapter of distributed
lag models, examples are [10, Ch 6], [11, Ch 17], [12, Ch 17] and [13, Ch 6].
When using distributed lag models one often requires weights which decrease as time goes by. This scheme, called the Koyck model after [4], celebrated its 50th anniversary in 2004 and, as [6] asserts, this seemingly simple model is a little more complicated than we tend to think. First, the Koyck transformation entails a parameter restriction, which should not be overlooked for efficiency reasons. Second, the t-statistic for the parameter of the covariate effects has a non-standard distribution.
Moreover, distributed lag models are a particular case of ARDL models, and [14] showed strong evidence in favor of a rehabilitation of the traditional ARDL, as it remains valid when the underlying variables are non-stationary. We believe it is timely to review inference for distributed lag models and compare it with some cointegration techniques.
The paper is organized as follows. Section 2 introduces distributed lag models and reviews some of the main models available in the literature. Section 3 then presents the general definition of a DLM and the particular case of transfer function models; therein, two classes of transfer function models are discussed, the form-free and the functional-form transfer functions. Section 4 discusses the inference procedure for the dynamic linear and distributed lag models; in particular, attention is drawn to the development of stochastic simulation methods, which have allowed great advances in Bayesian inference over the last 15 years. Section 5 shows an example which uses the same data set analyzed by [1], using Koyck's distributed lag model and the transfer function forms. Lastly, Section 6 concludes the paper and describes natural extensions which are the subject of our future research.

2 Distributed-lag Models

The general form of a linear infinite distributed lag model is

Y_t = Σ_{i=0}^∞ β_i X_{t−i} + ε_t,  (1)

where any change in X_t will affect E[Y_t] in all later periods. The term β_i in (1) is the ith reaction coefficient, and it is usually assumed that lim_{i→∞} β_i = 0 and Σ_{i=0}^∞ β_i = β < ∞. Now assume that changes in X_t do not have a big influence after a few periods of time, say m; this means that all β_i after m vanish. In this case, the model reduces to a finite distributed lag, for which the upper limit of the summation sign in (1) is m.
One important aspect to be considered is the number of parameters involved in
these distributed lag models. In order to be parsimonious it is usually assumed that
the coefficients of lagged variables are not all independent but functionally related
[13]. There are different specifications for infinite or finite distributed lags. Some of them are based on economic theory; others are of a more inductive nature. In the following subsections the best-known distributed lag models are described.

2.1 Infinite Distributed Lags

[4] suggested a simplification of the model in (1). He assumes that the β_i's decrease exponentially over time, that is:

β_i = βλ^i,  ∀i, with 0 < λ < 1.  (2)

It then follows that

Y_t = β(X_t + λX_{t−1} + λ²X_{t−2} + ...) + ε_t.  (3)
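The geometric decay in (2) is easy to inspect numerically. The following sketch, with illustrative values of β and λ (not estimates from this paper), shows the weights and their sum, the long-run multiplier β/(1−λ):

```python
import numpy as np

# Koyck (geometric) lag weights beta_i = beta * lam**i from equation (2);
# beta and lam below are illustrative values, not estimates from the paper.
beta, lam = 0.5, 0.8
i = np.arange(200)                        # truncate the infinite sum at 200 lags
weights = beta * lam**i

# The weights decay towards zero and their sum, the long-run multiplier,
# is the geometric series beta / (1 - lam).
print(weights[:4])                        # first weights: 0.5, 0.4, 0.32, 0.256
print(weights.sum(), beta / (1 - lam))    # both approximately 2.5
```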

An important example of the Koyck distributed lag (or geometric lag) is the partial adjustment model. For example, let Y_t be the capital stock at time t and Y*_t be the desired capital stock at time t. According to the partial adjustment model, the change in capital stock is proportional to the gap between the current desired level of capital and the past actual level:

Y_t − Y_{t−1} = δ(Y*_t − Y_{t−1}) + ε_t,  with 0 < δ < 1,

where ε_t is a stochastic disturbance term. If it is further assumed that the desired capital stock is a multiple of an output X_t, i.e. Y*_t = αX_t, then Y_t = (1−δ)Y_{t−1} + δαX_t + ε_t, which is equivalent to a distributed lag model of the form

Y_t = Σ_{i=0}^∞ δα(1−δ)^i X_{t−i} + u_t,

where u_t = (1−δ)u_{t−1} + ε_t. This distributed lag model indicates the dependence of current capital stock on current and all past levels of output.
Another example of an infinite distributed lag model is the one proposed by [15]. Solow's assumption about the β_i is that they are determined by a Pascal (negative binomial) distribution, in other words,

β_i = β C(r+i−1, i) (1−λ)^r λ^i,  0 < λ < 1, ∀i, r > 0.  (4)

Therefore, if β = 1, we have

β_0 = (1−λ)^r  and  β_i = ((r+i−1)/i) λ β_{i−1}.

Solow's distributed lag model has a flexible weighting scheme for the coefficients β_i. In particular, if r = 1 we have Koyck's model. [15] suggested that in the lag context it is useful to locate the mode of the distribution. For this reason he proposed the use of the Pascal distribution, where the mode is always less than the mean, so that the distribution is skewed to the right. The larger λ and the smaller r, the greater the skewness.
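The Pascal weights and their recursive form can be checked directly; the values of β, λ and r below are illustrative choices only:

```python
import numpy as np
from math import comb

# Pascal (negative binomial) lag weights from equation (4):
# beta_i = beta * C(r+i-1, i) * (1-lam)^r * lam^i.
beta, lam = 1.0, 0.6

def pascal_weight(i, r):
    return beta * comb(r + i - 1, i) * (1 - lam)**r * lam**i

w = np.array([pascal_weight(i, r=3) for i in range(100)])

# Recursive form given in the text (for beta = 1):
# beta_0 = (1-lam)^r, beta_i = ((r+i-1)/i) * lam * beta_{i-1}.
w_rec = [(1 - lam)**3]
for i in range(1, 100):
    w_rec.append((3 + i - 1) / i * lam * w_rec[-1])
assert np.allclose(w, w_rec)

# With r = 1 the weights reduce to the geometric decay of Koyck's model
# (up to the (1-lam) rescaling of beta).
koyck = np.array([beta * (1 - lam) * lam**i for i in range(100)])
assert np.allclose([pascal_weight(i, r=1) for i in range(100)], koyck)
```

Since the weights are proportional to a negative binomial probability mass function, they sum to β, which the truncated sum reproduces to machine precision.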
A more general model, the Rational Distributed Lags, was suggested by [5]. He considers the model

Y_t = α + W(L)x_t + V_t,  (5)

where W(L) is an infinitely long polynomial in the lag operator L. Recently, [16] proposed the following model:

φ(L)Y_t = α + β(L)x_t + U_t,  (6)

Y_t = α/φ(L) + (β(L)/φ(L)) x_t + (1/φ(L)) U_t,

whose main feature is to retain the same infinite lag structure as in (5), W(L) = β(L)/φ(L), but with a more general process for V_t. [16] called (6) the ARAR distributed lag model (ARDLAR).

2.2 Finite Distributed Lags

[3] proposed an approximation, known as the interpolation distribution, for the parameters of a finite distributed lag model. Almon's assumption is that the coefficients β_i are well approximated in the lag i by polynomials of degree p < m, with p + 1 parameters, that is:

β_i = Σ_{k=0}^p α_k i^k.  (7)

This polynomial approximation provides a wide variety of shapes for β_i. Usually only polynomials of relatively low degree are employed.
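The parsimony of (7) comes from mapping m+1 lag weights onto only p+1 polynomial coefficients. A small sketch, with illustrative values of m, p and α:

```python
import numpy as np

# Almon polynomial lag weights from equation (7): beta_i = sum_k alpha_k * i**k.
# m, p and alpha below are illustrative choices, not values from the paper.
m, p = 8, 2                           # lag length m, polynomial degree p < m
alpha = np.array([0.1, 0.25, -0.03])  # alpha_0, alpha_1, alpha_2

i = np.arange(m + 1)
# Each weight is a degree-p polynomial evaluated at the lag index i.
beta = sum(alpha[k] * i**k for k in range(p + 1))
print(np.round(beta, 3))

# Equivalently, the m+1 weights are a linear map of the p+1 parameters,
# beta = H alpha with H[i, k] = i**k -- the source of the parsimony.
H = np.vander(i, p + 1, increasing=True)
assert np.allclose(beta, H @ alpha)
```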

In a more recent Bayesian context, [17] proposed the Flexible Distributed Lag models, which place less structure on the weights by expressing inequality restrictions on the coefficients through their prior distributions.

3 Dynamic Linear Models

3.1 Introduction

Dynamic Linear Models (DLM) [18] are a broad class of models with time-varying parameters, useful for modelling time series data and regression problems with variables varying over time. A general representation of a DLM is

Y_t = F′_t θ_t + ε_t,  ε_t ∼ N(0, V_t),  (8a)
θ_t = G_t θ_{t−1} + ω_t,  ω_t ∼ N(0, W_t),  (8b)

where, for all t, Y_t is an l × 1 vector, F_t is a known n × l design matrix, G_t is a known n × n matrix, θ_t is the state or system vector of dimension n, ε_t is the observational error and ω_t is the system or evolution error. Usually, ε_t and ω_t are internally and mutually independent. Equation (8a) is known as the observation equation, defining the sampling distribution of Y_t conditional on θ_t, and equation (8b) is the evolution, state or system equation, defining the evolution of the state vector. In this paper we are mainly interested in modelling univariate time series; in this case the general model described above reduces to the case l = 1.
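To fix ideas, the pair (8a)-(8b) can be simulated forward. Here is a minimal sketch for the simplest univariate case, a local-level model with n = 1, F_t = 1 and G_t = 1; the horizon and variances are illustrative:

```python
import numpy as np

# Forward simulation of the univariate DLM (8) with l = 1: a local-level
# example where n = 1, F_t = 1 and G_t = 1. T, V and W are illustrative.
rng = np.random.default_rng(0)
T, V, W = 100, 0.5, 0.1

theta = np.empty(T)
y = np.empty(T)
state = 0.0
for t in range(T):
    state = state + rng.normal(0.0, np.sqrt(W))    # system equation (8b)
    theta[t] = state
    y[t] = theta[t] + rng.normal(0.0, np.sqrt(V))  # observation equation (8a)
```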
The distributed lag models can be expressed in the form of (8). The effect of any regressor at time t on the mean response at future times can be expressed as a transfer function. In transfer function analysis, the goal is to show how the movement of an exogenous variable affects the time path of an endogenous variable. If the effect does not have a structural form over time, we say there is a form-free transfer function. However, if the effects, or coefficients, of the regression model are related in a known way, we have a functional-form transfer function. Both models are described in the next subsections.

3.2 Transfer function models

The form-free transfer function is a regression DLM on a fixed and finite number of lagged values. According to [18, p. 281], if the mean response at time t is defined as

E(Y_t | θ_t) = Σ_{i=0}^m β_i X_{t−i} = β_0 X_t + β_1 X_{t−1} + ... + β_m X_{t−m},  (9)

the matrix F_t in (8) is given by F′_t = (X_t, X_{t−1}, ..., X_{t−m}) and θ′_t = β′ = (β_0, β_1, ..., β_m). In (9), the effect of the current regressor value X_t = X on the mean response at future times t + i, conditional on X_{t+1} = ... = X_{t+i} = 0, defines the transfer response function of X:

β_i X,  i = 0, 1, ..., m;  0,  i > m.

In this case, the regression coefficients β_i are not related (to each other) and we have at least m unknown quantities. One simple way of adapting the regression structure to incorporate functional relationships amongst the β_i is to consider a regression, not on X directly, but on a constructed effect variable measuring the combined effect of current and past values of X. Let X_t be the value of an independent scalar variable X at time t. As described in [18], a general transfer function model for the effect of X on the response series Y is defined by

Y_t = F′ψ_t + ν_t,  ν_t ∼ N(0, σ²),  (10a)
ψ_t = Gψ_{t−1} + h_t X_t + ∂ψ_t,  (10b)
h_t = h_{t−1} + ∂h_t,  (10c)

with terms defined as in Subsection 3.1. In particular, ν_t and ∂ψ_t are observation and evolution noise terms, respectively, and h_t is an n-vector of parameters, evolving via the addition of a noise term ∂h_t, normally distributed with zero mean and assumed to be independent of ν_t and ∂ψ_t.

The state vector ψ_t carries the effect of current and past values of X through to Y_t in equation (10a); it is formed in (10b) as the sum of a linear function of past effects, Gψ_{t−1}, the current effect h_t X_t, and a noise term. The general model (10) can be rewritten in standard DLM form as described in [18]. Notice that the above parametrization is more flexible than that of equation (1), allowing different stochastic interpretations for the effect of X_t on Y_t.

3.3 Transfer functions and Distributed Lag Models

Following the descriptions above we now show that the distributed lag models described in Section 2 can be represented as a DLM, in the form of equation (10).
For instance, the Koyck distributed lag model, represented by equation (3), can be
rewritten as
Yt = Et + t

(11a)

Et = Et1 + Xt ,

(11b)

11

where Et = Xt + 1 Xt1 + 2 Xt2 + . . . and 0 < < 1. In this model, the


transfer response function of X is simply i X. Here, based on the parametrization
in (10), we have n = 1, t = Et , the effect variable, t = , the current effect
for all t, F = 1, G = and the noise term t is assumed to be zero. Note
that this model can be easily extended by assuming that varies smoothly with
time, that is, t = t1 + t as in (10c), or, more generally, a stochastic variation
can be considered for Et in (11b). For example, Et = t Et1 + t Xt + Et , with
Et normally distributed with zero mean and variance E2 . Notice that the DLM
representation results in a very general structure for the error term; for example,
assuming independence between the observational and system error terms, one could
substitute (10b) into (10a) and an ARMA process would result.
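The equivalence between the recursion (11b) and the explicit geometric lag sum in (3) is mechanical and easy to verify; β, λ and the X series below are illustrative:

```python
import numpy as np

# Check that the recursion E_t = lam * E_{t-1} + beta * X_t in (11b)
# reproduces the explicit geometric lag E_t = beta * sum_i lam**i * X_{t-i}.
rng = np.random.default_rng(1)
beta, lam = 0.4, 0.7
X = rng.normal(size=300)

E = np.empty_like(X)
e = 0.0                                  # effect assumed zero before t = 1
for t, x in enumerate(X):
    e = lam * e + beta * x               # transfer function recursion (11b)
    E[t] = e

# Explicit distributed lag sum at the last time point.
t = len(X) - 1
explicit = beta * sum(lam**i * X[t - i] for i in range(t + 1))
assert np.isclose(E[t], explicit)
```

The recursion needs only one state per time step, which is precisely why the DLM form is convenient for MCMC: the infinite lag sum never has to be evaluated.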
Solow's distributed lag model can also be expressed in the form of a DLM. In this case we have (1 − λL)^r E_t = β(1 − λ)^r X_t. An evolution equation can be assigned for λ and r. Also, Almon's model can be rewritten as the form-free transfer function (9), since it is a regression on a fixed and finite number of lagged variables. Using (7) to define the coefficients of the lagged variables, the transfer response function of X_{t−i} in the mean of Y_t is

Σ_{k=0}^p α_k i^k X_{t−i},  i = 0, 1, ..., m;  0,  i > m.

An econometric example is presented in [19], where a model is developed to predict the value of Brazilian exports as a function of a stochastic trend component and the real exchange rate. The dynamic behavior of the model was introduced through a general adaptive expectations hypothesis, so the resulting model was a first-order transfer response plus trend. The first-order transfer response was the best predictor amongst several models fitted to those data.

4 Inference Procedure

Following the Bayesian paradigm, the specification of a model is complete after specifying the prior distribution of all parameters of interest. By Bayes' theorem, the posterior distribution is proportional to the product of the prior and the likelihood. In the DLM setting, inference can be performed sequentially, as described by [18, Ch 4]. Note, however, that for the transfer function models described in Section 3, the evolution matrix G in (8b) may depend on an unknown quantity, as shown in equation (11); thus, in order to make inference about λ as well, other algorithms are needed.
Until the late 1980s, numerical integration techniques were employed to evaluate the normalizing constants of posterior distributions and the estimates of the parameters. In the case of distributed lag models, the posterior distributions of the parameters are difficult to obtain analytically. For instance, [1] used the Koyck transformation, which requires estimating the equation

Y_t = λY_{t−1} + k(1−λ)X_t + ε_t − λε_{t−1},  (12)

where λ, k and σ² are the parameters of interest. They used non-informative prior distributions for λ and k and computed the joint posterior distribution. Due to the complexity of the marginal densities, they needed to employ bivariate numerical integration techniques to obtain samples from the target distribution. However, since three quantities are unknown, they had to integrate one of them out to reduce the problem to the bivariate case.
Since the early 1990s, Markov chain Monte Carlo (MCMC) methods, like Gibbs sampling [20], have been extensively used in Bayesian inference. The software BUGS (Bayesian inference Using Gibbs Sampling) is a well-known tool that helps practitioners in this task. This package was developed by David Spiegelhalter and colleagues at the MRC Biostatistics Unit and is available free of charge from http://www.mrc-bsu.cam.ac.uk/bugs.

In this paper we show that the DLM, and in particular the distributed lag models, is amenable to a Bayesian analysis via BUGS. Its main strength lies in the ease with which changes in the model, such as different autoregressive structures, polynomial state transitions or the choice of different prior distributions for the parameters, are accomplished.

BUGS performs the estimation of the posterior distribution of DLMs reasonably well. However, an important issue to take into account is that the convergence of the chains can be very slow due to the high correlation among the elements of θ. If this is the case, a practical solution is to run a large number of iterations and to store samples only every k iterations. However, in terms of exploring new models, BUGS is helpful as it is quite general, but more specific code may be needed for more detailed and efficient applications [21].
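Although the paper's own estimation is done in WinBUGS, the sampling idea can be sketched in a few lines of standalone code. The following minimal random-walk Metropolis sampler for the Koyck transfer function model (11) is an illustration only: the data are simulated, σ is treated as known, the priors (uniform on (0,1) for λ, flat for β) and all tuning constants are arbitrary choices for the sketch.

```python
import numpy as np

# Minimal random-walk Metropolis for Y_t = E_t + eps_t,
# E_t = lam*E_{t-1} + beta*X_t, eps_t ~ N(0, sigma^2), sigma known.
rng = np.random.default_rng(42)

# Simulate data with known parameters so the sampler can recover them.
T, lam_true, beta_true, sigma = 200, 0.6, 0.8, 0.3
X = rng.normal(size=T)
E = np.empty(T); e = 0.0
for t in range(T):
    e = lam_true * e + beta_true * X[t]
    E[t] = e
Y = E + rng.normal(0.0, sigma, size=T)

def log_post(lam, beta):
    # Uniform(0,1) prior on lam, flat prior on beta (assumptions of the sketch).
    if not (0.0 < lam < 1.0):
        return -np.inf
    e, ll = 0.0, 0.0
    for t in range(T):
        e = lam * e + beta * X[t]        # transfer function recursion (11b)
        ll += -0.5 * ((Y[t] - e) / sigma) ** 2
    return ll

lam, beta = 0.5, 0.5
lp = log_post(lam, beta)
draws = []
for it in range(4000):
    lam_p, beta_p = lam + 0.05 * rng.normal(), beta + 0.05 * rng.normal()
    lp_p = log_post(lam_p, beta_p)
    if np.log(rng.uniform()) < lp_p - lp:    # Metropolis accept/reject
        lam, beta, lp = lam_p, beta_p, lp_p
    if it >= 2000 and it % 10 == 0:          # burn-in, then thinning
        draws.append((lam, beta))

draws = np.array(draws)
print(draws.mean(axis=0))                    # posterior means, near (0.6, 0.8)
```

The burn-in and thinning at the end implement the practical advice above: discard early iterations and store only every kth draw to mitigate autocorrelation in the chain.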


5 Application: A Consumption Function

5.1 Data and Models

Similarly to [1], we used the dataset from [22, pp. 499-500], corresponding to U.S. quarterly price-deflated, seasonally adjusted data on personal disposable income (X_t) and personal consumption expenditure (Y_t), 1947.I-1960.IV. Figure 1 shows both time series. Following their analysis we use the consumption function:

Y_t = kX*_t + ε_t,  (13)

where, for the tth period, t = 1, 2, ..., T, Y_t is measured real consumption, X*_t is normal real income, k is a parameter whose value is unknown, and ε_t is an error term or transitory consumption. Since X*_t is not observable, [1] assumed that normal income satisfies the adaptive expectation hypothesis:

X*_t − X*_{t−1} = (1−λ)(X_t − X*_{t−1}),  (14)

where the parameter λ is such that 0 < λ < 1. On combining (13) and (14), we obtain

Y_t = k(1−λ)(X_t + λX_{t−1} + λ²X_{t−2} + ... + λ^n X_{t−n} + ...) + ε_t,  (15)

or, equivalently,

Y_t = E_t + ε_t,  (16)
E_t = λE_{t−1} + βX_t,

where β = k(1−λ) and E_t = β(X_t + λX_{t−1} + λ²X_{t−2} + ...).
From here, we develop the analysis of (15) assuming that X_t is a predetermined variable, as defined in [23, p. 345]: x is a vector of predetermined variables which are dependent on previous elements of the error vector but not on their contemporaneous values. This assumption allows us to work with a single equation; however, in Subsection 5.6 we use an instrumental variable method to eliminate the correlation between regressor and errors.
Our aim is to use the Bayesian framework to make inference for the model in (15) under the following assumptions for the serial correlation properties of the error term ε_t:

Error I: ε_t ∼ind N(0, σ_I²);

Error II: ε_t = ρε_{t−1} + u_t, with u_t ∼ind N(0, σ_II²).

Error I asserts that the ε_t's are normally and independently distributed, each with zero mean and common variance σ_I². Error II allows for the possibility of autocorrelation among the ε_t's; in this case we assume that the error term ε_t satisfies a first-order autoregressive process with unknown parameter ρ and variance σ_II².

Under the Bayesian framework, the models above are completely specified after assigning the prior distributions of the parameters. For λ and k we assign beta prior distributions with parameters (1,1), i.e. λ ∼ Beta(1,1) and k ∼ Beta(1,1). Therefore, we assume a priori that λ and k are uniformly and independently distributed over (0,1). As we do not have any prior information about the variance of the error term, we adopt an inverse-gamma prior probability density function, i.e. σ² ∼ IG(0.0001, 0.0001). Other inverse-gamma prior distributions were tested and had no influence on the resulting posterior. For Model II, we add the prior for ρ, which is Normal(0, 100); that is, we use a relatively flat prior without the restriction that ρ must lie in the stationary region [−1, 1]. In other words, we allow the data to determine the region of its greatest posterior probability density. Also, we assume that at time t = 1, E_1 = y_1, as E_t is the expected value of y_t in (16). Actually this is just an approximation. More generally, we could have assigned a prior distribution for E_0, say p(E_0), the effect before any observation is made, and used the recursive equation to obtain the prior distribution for E_1, p(E_1 | E_0, λ, β).

5.2 Results

In order to obtain summaries of the posterior distributions of interest, we used the


software WinBUGS 1.4 [24]. Appendix A presents the code we used in WinBUGS for Koyck's model. For the models with error structures I and II, we generated two parallel chains starting at different values. We ran the chains for 10,000 iterations, discarded the first 5,000 and stored every 10th iteration. We used the Gelman-Rubin statistic, as modified by [25], to check convergence of the chains. We compared our results with those of [1], who used the Koyck transformation (KT) (12) and employed numerical integration methods to approximate the posterior distributions of the parameters for each model.
The posterior mean, mode and standard deviation of k and λ for the model with Error I are shown in the second and third entries of Table 1. From this, we note that the two approaches used in this paper produce results similar to those obtained by [1]. Figure 2 shows the posterior densities of k and λ obtained with the Koyck transformation (panels (a) and (b)) and with the transfer function representation (panels (c) and (d)), respectively. It is clear that both representations provide similar results. Note that we identified well the two modes of the posterior distributions of each parameter.
Table 2 provides posterior summaries for some of the parameters involved in Model (15) with Error II: k, λ and ρ. From this we conclude that the two representations, Koyck's transformation and the transfer function, provide similar values for the posterior mode of k, but the posterior distributions under the transfer function are more concentrated around the mean. For ρ the results are quite different. We note that [1] marginalized the posterior distributions over ρ, whereas we make inference about its value.

Samples from the posterior densities of k, λ and ρ obtained under the two parameterizations of Model (15) with Error II (KT, TF) are shown in Figure 3. It is clear that both representations provide similar conclusions.

5.3 Model selection

Since the estimates of λ are sensitive to what is assumed about the error term, it is necessary to select the best model among those fitted in the previous section. In this paper we used two different criteria. The first is the Deviance Information Criterion (DIC) [26]. It is given by DIC = D̄ + p_D = D(θ̄) + 2p_D, where D̄ is the posterior expectation of the deviance D(θ) = −2 log f(y|θ), and f(y|θ) is the likelihood conditional on the parameter values; D(θ̄) is obtained by substituting the posterior mean of θ into the deviance, so that D(θ̄) = −2 log f(y|θ̄); and p_D = D̄ − D(θ̄). WinBUGS version 1.4 computes the DIC automatically. The model with the smallest DIC is considered to be the best.
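Although WinBUGS reports DIC automatically, the quantities involved are simple to compute from MCMC output. A sketch under a Gaussian likelihood with known variance; the data and "posterior" draws below are synthetic stand-ins, not output from the models of this paper:

```python
import numpy as np

# DIC from MCMC output: D(theta) = -2 log f(y | theta), here for a
# Gaussian likelihood with known variance sigma^2 and mean theta * X.
rng = np.random.default_rng(3)
sigma, theta_true = 0.3, 1.2
X = rng.normal(size=50)
y = theta_true * X + rng.normal(0.0, sigma, size=50)
theta_draws = theta_true + 0.05 * rng.normal(size=1000)  # stand-in posterior sample

def deviance(theta):
    resid = y - theta * X
    return np.sum(resid**2 / sigma**2 + np.log(2 * np.pi * sigma**2))

D_bar = np.mean([deviance(th) for th in theta_draws])  # posterior mean deviance
D_hat = deviance(theta_draws.mean())                   # deviance at posterior mean
p_D = D_bar - D_hat                                    # effective number of parameters
DIC = D_bar + p_D
print(p_D, DIC)
```

Since the deviance here is convex in θ, Jensen's inequality guarantees p_D ≥ 0, matching its interpretation as an effective parameter count.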
The second criterion used for model comparison is the one proposed by [27]. The Expected Predictive Deviance (EPD) is obtained by minimizing the posterior loss of a given model. When a squared error loss is considered, the EPD can be computed explicitly. In this case,

EPD = Σ_{i=1}^n σ_i² + (w/(w+1)) Σ_{i=1}^n (μ_i − y_i)²,

where μ_i and σ_i² are the mean and the variance of the predictive distribution, respectively (μ_i = E[Y_{i,rep} | y_{i,obs}] and σ_i² = Var[Y_{i,rep} | y_{i,obs}]), and Y_{i,rep} represents replicates of the ith observed datum, y_{i,obs}. In particular, we fixed the weight w = 1 (see [27] for more details). The model which minimizes this criterion is selected.
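Given posterior predictive replicates from an MCMC run, the squared-error-loss EPD is a two-line computation. The observed data and replicate matrix below are synthetic placeholders:

```python
import numpy as np

# EPD under squared error loss, computed from posterior predictive replicates.
rng = np.random.default_rng(4)
n, S = 56, 500                          # observations, posterior draws
y_obs = rng.normal(size=n)
y_rep = y_obs + rng.normal(0.0, 0.5, size=(S, n))  # stand-in replicates Y_{i,rep}

mu = y_rep.mean(axis=0)                 # predictive means mu_i
var = y_rep.var(axis=0)                 # predictive variances sigma_i^2
w = 1.0                                 # weight fixed at 1, as in the paper
EPD = var.sum() + w / (w + 1.0) * ((mu - y_obs) ** 2).sum()
print(EPD)
```

The first term penalizes predictive spread and the second penalizes lack of fit, so overfitted and underfitted models are both discouraged.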
Table 3 shows the values of DIC and EPD, considering a quadratic loss, obtained for each model. We observe that the computed criteria do not differ much under the two representations. Notice that we obtain the smallest DIC and EPD for the model with Error II, suggesting that it is more appropriate to assume that the error terms ε_t in (13) are autocorrelated. More attention could be given to this model, by investigating, for example, its sensitivity to the choice of prior distribution or by testing the hypothesis that ρ = λ.

5.4 Comparison with a classical approach

In order to compare the results of the previous sections with a classical approach, we followed a standard strategy commonly used to deal with non-stationary time series. Specifically, we followed [14] and [7], who showed that the ARDL approach is perfectly valid even when (some of) the variables are non-stationary and integrated of order one, I(1), the only requirement being that the I(1) variables are cointegrated with a unique cointegrating relationship.
Using the Koyck transformation, (15) can be seen as an autoregressive distributed lag (ARDL(1,0)) model; then k and k(1−λ) can be interpreted as the long-run and short-run effects, or elasticities, respectively. In the Bayesian framework, we can make inference about functions of the parameters in a direct way, by simply obtaining a sample from the posterior density of the short-run effect from the MCMC output. The first line of Table 4 shows the long-run elasticities estimated using the transfer function model selected in Section 5.3 (with errors II). It also shows the long-run elasticities estimated using the transfer function model (with errors II) when a constant term is included in the model (second line), the static OLS procedure¹ and the Johansen cointegration method². We observe that the point estimates of the long-run parameter obtained with all models are very similar. Figure 5(a) shows the histogram of the sample from the posterior distribution of k together with the Normal distribution obtained for k with the OLS procedure. Figure 5(b) shows the empirical and asymptotic distributions obtained for k(1−λ). Although the point estimates for k under both procedures are similar, from Figure 5(a) we conclude that the asymptotic distribution and the posterior are quite different. Note that under the asymptotic one P(k̂ ≥ 0.905) ≈ 0.50, while under the posterior distribution P(k ≥ 0.905 | H) = 0.025, where H represents the whole dataset. Remember that the former is based on asymptotic properties while the latter is based solely on the observed sample.
¹ Suggested by [28].
² Suggested by [29].

5.5 Goodness of fit and Out-of-sample forecast

In order to examine the goodness of fit of model (16) with Error II and a constant term in the observation equation, we re-estimated the parameters with the first 49 observations and left out the last 6 to assess the forecast performance of the model. Figure 6(a) shows the 95% credible intervals estimated for the sample between 1947.I and 1959.II. It shows a good fit, as only two observations fall outside the intervals. Figure 6(b) shows the 95% credible intervals estimated for the six observations left out of the sample. It can be observed that all of the real values are included in the intervals.

5.6 Treating the Endogeneity

Up to this point, we treated income as a predetermined variable, just to illustrate our method and compare it with [1]; however, the interdependence between consumption and income in a macroeconometric model is well known. In this section we adopt the approach of [30, Example 7.38]. To form an instrument for X we fitted an AR(1) model to it; then we estimated model (16) with error II.

Table 5 shows the results obtained, including the estimate of the parameter of the AR(1) model, φ. Basically, we compare the posterior statistics for k and k(1−λ) with those in the second line of Table 4. We observe, as expected, that with the instrumental variable the means are smaller and the standard deviations are larger.
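The instrument construction can be sketched as follows, under our own illustrative assumptions: the X series is simulated, and the AR(1) coefficient is estimated by ordinary least squares rather than by the Bayesian machinery of the paper.

```python
import numpy as np

# Forming an instrument for X by fitting an AR(1) to it: the one-step
# fitted values phi_hat * X_{t-1} stand in for X_t, removing the part of X
# that is contemporaneously correlated with the error.
rng = np.random.default_rng(5)
T, phi_true = 200, 0.9
X = np.empty(T); X[0] = 0.0
for t in range(1, T):
    X[t] = phi_true * X[t - 1] + rng.normal()

# AR(1) coefficient by least squares:
# phi_hat = sum(X_t * X_{t-1}) / sum(X_{t-1}^2)
phi_hat = (X[1:] * X[:-1]).sum() / (X[:-1] ** 2).sum()
X_instr = phi_hat * X[:-1]              # instrument: predictable part of X_t
print(round(phi_hat, 2))
```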


6 Discussion and Future work

Although the first distributed lag models were proposed during the 1950s, they remain very useful and are still widely used. Indeed, this class of models involves many issues that deserve attention: the form of the relationship among coefficients, the structure of the error term, non-stationarity of the series, endogenous variables, panel data, etc.
In this paper we discussed the inference of distributed lag models under the
Bayesian framework. Specifically, we showed that these models can be represented
by a broader class of models: the dynamic linear models. This allows one to use all
the facilities offered by the Bayesian methodology, like the use of subjective information, and the description of the uncertainty, through the posterior distribution of
the parameters of interest.
Inference about distributed lag models can be performed using MCMC algorithms. In this paper we showed that this task is now simplified by the use of
WinBUGS, a freely available software package. We also showed that model comparison can be easily performed using DIC, which is implemented in WinBUGS, and EPD.
Moreover, the chains obtained through WinBUGS can be exported into the software
CODA [31], which is also freely available, and convergence diagnostics and statistical analysis (summary or graphics) of the samples from the posterior densities can
be assessed.
In fact, other specialized algorithms can be used, but the gain in efficiency is largely outweighed by the ease of implementation in WinBUGS. Its use allows one to explore different modelling possibilities, such as the effect of different prior distributions, the use of transformed variables or the inclusion of stochastic terms in the system equation, as was done here.


More complex models can be analyzed under the Bayesian framework. For instance, [32] illustrated a non-normal and non-linear case using Linear Bayes, and [33] proposed an MCMC scheme for this setting.
An important theoretical issue, currently under research, is how to choose the
form of the transfer function and how this choice affects the predictions and the
uncertainty associated with them. One possibility is the use of a second order
transfer function. We plan to investigate this through comparisons on artificial
data sets.
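For reference, a second-order transfer function would replace the single decay rate lambda with two: E_t = l1*E_{t-1} + l2*E_{t-2} + k*(1-l1-l2)*X_t. Under a sustained unit input, the level still converges to k. A quick numerical check with illustrative (hypothetical) values:

```python
# Sketch: step response of a second-order transfer function
# E_t = l1*E_{t-1} + l2*E_{t-2} + k*(1 - l1 - l2)*X_t.
k, l1, l2 = 0.9, 0.5, 0.3   # hypothetical values; here both positive with l1 + l2 < 1
E_prev2 = E_prev1 = 0.0
for t in range(500):         # sustained unit step in X
    E = l1 * E_prev1 + l2 * E_prev2 + k * (1 - l1 - l2) * 1.0
    E_prev2, E_prev1 = E_prev1, E
print(round(E_prev1, 6))  # converges to the long-run effect k = 0.9
```

At the steady state E* = l1*E* + l2*E* + k*(1-l1-l2), so E* = k: the long-run effect is preserved while the shape of the lag distribution becomes more flexible.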

References
[1] Zellner A, Geisel M. Analysis of distributed lag models with application to the
consumption function. Econometrica, 1970; 38:865-888.
[2] Nerlove M. Distributed lags and demand analysis. Agricultural Handbook No. 141,
1958.
[3] Almon S. The distributed lag between capital appropriations and expenditures.
Econometrica, 1965; 33:178-196.
[4] Koyck LM. Distributed Lags and Investment Analysis. North-Holland:
Amsterdam, 1954.
[5] Jorgenson D. Rational distributed lag functions. Econometrica, 1966;
34:135-149.


[6] Franses PH, van Oest R. On the econometrics of the Koyck model. Technical report,
Econometric Institute, Erasmus University Rotterdam, 2004.
[7] Bentzen J, Engsted T. A revival of the autoregressive distributed lag model in
estimating energy demand relationships. Energy, 2001; 26:45-55.
[8] Huang Y, Dominici F, Bell M. Bayesian hierarchical distributed lag models
for summer ozone exposure and cardio-respiratory mortality. Technical report, Johns Hopkins University, Dept. of Biostatistics, 2004. Available from
http://www.bepress.com/jhubiostat/paper56
[9] Welty L, Zeger S. Are the acute effects of PM10 on mortality in NMMAPS the
result of inadequate control for weather and season? A sensitivity analysis
using flexible distributed lag models. American Journal of Epidemiology,
2005; 162:80-88.
[10] Berndt E. The Practice of Econometrics: Classic and Contemporary. Addison
Wesley: Reading, 1991.
[11] Greene W. Econometric Analysis. (3rd edn). Prentice Hall: Madrid, 1999.
[12] Gujarati D. Basic Econometrics. (3rd edn). McGraw-Hill: New York, 2000.
[13] Zellner A. An Introduction to Bayesian Inference in Econometrics. John Wiley
& Sons: New York, 1971.
[14] Pesaran MH, Shin Y. An autoregressive distributed lag modeling approach
to cointegration analysis. In Econometrics and Economic Theory in the 20th
century, Strom S (ed). Cambridge University Press, 1999.

[15] Solow R. On a family of lag distributions. Econometrica, 1960; 28(2):393-406.


[16] Carter R, Zellner A. The ARAR error model for univariate time series and
distributed lag models. Studies in Nonlinear Dynamics & Econometrics, 2004;
8(1), Article 2. Available from http://www.bepress.com/snde/vol8/iss1/art2.
[17] Chotikapanich D, Griffiths W. Flexible distributed lags. Technical report,
Department of Econometrics, University of New England, 1999. Available from
http://www.une.edu.au/economics/publications/econometrics/EMETwp105.PDF.
[18] West M, Harrison J. Bayesian Forecasting and Dynamic Models. (2nd edn).
Springer-Verlag: New York, 1997.
[19] Migon H. The prediction of Brazilian exports using Bayesian forecasting.
Investigacion Operativa, 2000; 9(1,2,3):95-106.
[20] Gelfand A, Smith A. Sampling-based approaches to calculating marginal
densities. Journal of the American Statistical Association, 1990; 85(410):398-409.
[21] Meyer R, Yu J. BUGS for a Bayesian analysis of stochastic volatility models.
Econometrics Journal, 2000; 3:198-215.
[22] Griliches Z, Maddala G, Lucas R, Wallace N. Notes on estimated aggregate
quarterly consumption functions. Econometrica, 1962; 30(3):491-500.
[23] Lancaster T. An Introduction to Modern Bayesian Econometrics. Blackwell:
Oxford, 2004.


[24] Spiegelhalter D, Thomas A, Best NG. WinBUGS Version 1.4, 2003. Available
from http://www.mrc-bsu.cam.ac.uk/bugs.
[25] Brooks SP, Gelman A. Alternative methods for monitoring convergence of
iterative simulations. Journal of Computational and Graphical Statistics, 1998;
7:434-455.
[26] Spiegelhalter DJ, Best NG, Carlin BP, Van der Linde A. Bayesian measures
of model complexity and fit. Journal of the Royal Statistical Society B, 2002;
64:583-639.
[27] Gelfand AE, Ghosh SK. Model choice: a minimum posterior predictive loss
approach. Biometrika, 1998; 85:1-11.
[28] Engle R, Granger C. Cointegration and error correction: representation,
estimation and testing. Econometrica, 1987; 55:251-276.
[29] Johansen S. Estimation and hypothesis testing of cointegration vectors in
Gaussian vector autoregressive models. Econometrica, 1991; 59(6):1551-1580.
[30] Congdon P. Bayesian Statistical Modelling. Wiley & Sons: England, 2001.
[31] Best NG, Cowles MK, Vines SK. CODA: Convergence diagnosis and output
analysis software for Gibbs sampling output, Version 0.4, 1997. Available
from http://www.mrc-bsu.cam.ac.uk/bugs/classic/coda04/.
[32] Migon H, Harrison J. An application of non-linear Bayesian forecasting to
television advertising. In Bayesian Statistics 2, Bernardo JM, DeGroot MH,
Lindley DV, Smith AFM (eds). Elsevier Science Publishers B.V.: North-Holland,
1985; 681-696.
[33] Alves M. Funções de transferência em modelos dinâmicos generalizados. PhD
thesis, Instituto de Matemática, UFRJ, Brazil, 2005. (In Portuguese.)


Example of the code used in WinBUGS

This is the code for the Koyck model (12) with homoskedastic and non-autocorrelated
errors (the errors are independent and normally distributed with mean zero and
constant variance):

model{  # Y = consumption, X = income
    e[1] <- 0
    for(t in 2:T){
        mean.Y[t] <- lambda*Y[t-1] + k*(1-lambda)*X[t] - lambda*e[t-1]
        Y[t]     ~ dnorm(mean.Y[t], tau.y)
        Y.hat[t] ~ dnorm(mean.Y[t], tau.y)   # posterior predictive replicate
        e[t]     <- Y[t] - mean.Y[t]
    }
    # Priors
    lambda ~ dbeta(1,1)
    k      ~ dbeta(1,1)
    tau.y  ~ dgamma(0.0001,0.0001)
    var.y  <- 1/tau.y
    phi    <- k*(1-lambda)   # short-run income effect
}
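Readers without WinBUGS or the original data can exercise the same model on artificial data. The sketch below, with hypothetical values for k, lambda and the error standard deviation, simulates the Koyck-transformed observation equation used above and recovers the parameters by a crude conditional least-squares grid search; it is a check on the model's logic, not part of the paper's analysis:

```python
# Sketch: artificial-data check of the Koyck-transformed equation above.
# Hypothetical parameter values (k, lambda, sd are NOT estimates from the paper).
import random

random.seed(1)
k_true, lam_true, sd = 0.9, 0.6, 0.05
T = 3000
X = [1.0 + 0.5 * random.random() for _ in range(T)]  # artificial income series
Y, e = [0.0] * T, [0.0] * T
for t in range(1, T):
    # observation equation: Y_t = lam*Y_{t-1} + k*(1-lam)*X_t - lam*e_{t-1} + e_t
    m = lam_true * Y[t - 1] + k_true * (1 - lam_true) * X[t] - lam_true * e[t - 1]
    Y[t] = random.gauss(m, sd)
    e[t] = Y[t] - m

def css(lam, k):
    # conditional sum of squares, rebuilding the residual recursion for (lam, k)
    err, s = 0.0, 0.0
    for t in range(1, T):
        m = lam * Y[t - 1] + k * (1 - lam) * X[t] - lam * err
        err = Y[t] - m
        s += err * err
    return s

# crude grid search over plausible values (step 0.01)
best = min(((css(l / 100, kk / 100), l / 100, kk / 100)
            for l in range(40, 81) for kk in range(80, 101)), key=lambda z: z[0])
print(best[1], best[2])  # should be near the true (0.6, 0.9)
```

Rebuilding the residual recursion inside the loss function mirrors the e[t] bookkeeping in the BUGS code: the lagged residual enters the mean, so it must be reconstructed for each candidate parameter pair.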


Table 1: Posterior summaries associated with the parameters in Model (15) with Error I

  Parameter   mean    s.d.    2.5%    50%     97.5%   R-hat(b)
  Published in [1](a)
  k           0.948   0.020       0.940 - 1.000
  λ           0.508   0.254       0.380 - 0.900
  Using the Koyck transformation in WinBUGS
  k           0.942   0.016   0.930   0.936   0.995   1.037
  λ           0.419   0.219   0.074   0.385   0.895   1.019
  Using the transfer function parametrization in WinBUGS
  k           0.942   0.017   0.930   0.936   0.996   1.351
  λ           0.413   0.222   0.063   0.372   0.895   1.051

  (a) For the values published in [1], the 50% column corresponds to the mode.
  (b) R-hat is the potential scale reduction factor (at convergence, R-hat = 1).

Table 2: Posterior summaries associated with the parameters in Model (15) with Error II

  Parameter   mean    s.d.    2.5%    50%     97.5%   R-hat(b)
  Published in [1](a)
  k           0.878   0.201           0.940
  λ           0.597   0.184           0.610
  Using the Koyck transformation in WinBUGS
  k           0.958   0.014   0.938   0.954   0.992   1.002
  λ           0.754   0.087   0.562   0.764   0.893   1.001
  ρ           0.790   0.141   0.454   0.811   0.991   1.002
  Using the transfer function parametrization in WinBUGS
  k           0.959   0.018   0.933   0.956   0.998   1.001
  λ           0.737   0.125   0.462   0.762   0.899   1.001
  ρ           0.751   0.106   0.548   0.750   0.955   1.001

  (a) For the values published in [1], the 50% column corresponds to the mode.
  (b) R-hat is the potential scale reduction factor (at convergence, R-hat = 1).

Table 3: Model comparison criteria: Deviance Information Criterion (DIC) and
Expected Predictive Deviance (EPD)

                                  Using the Koyck   Using the transfer
                                  transformation    function parametrization
  Model (15) with Error I    DIC  312.49            311.40
                             EPD  1455.57           1453.29
  Model (15) with Error II   DIC  280.37            279.20
                             EPD  760.16            767.91

Table 4: Long-run and short-run income effects

                          Long-run          Short-run
  Model                   Effect   s.d.     Effect   s.d.
  Transfer Function(a)    0.959    0.018    0.251    0.116
  Transfer Function(b)    0.910    0.042    0.262    0.108
  OLS                     0.878    0.014    0.266    0.114
  Johansen (VECM)(c)      0.884    0.022

  (a) Model (16). (b) Model (16) including a constant term.
  (c) Considering a linear trend in the data and 0 lags in the VAR; this model
  was selected with the Schwarz criterion.

Table 5: Posterior summaries associated with the parameters in Model (16) with
Error II and an AR(1) model for Income

  Parameter                   mean    s.d.    2.5%    50%     97.5%   R-hat
  k                           0.902   0.102   0.606   0.922   0.995   1.001
  λ                           0.768   0.145   0.407   0.808   0.952   1.003
  k(1 - λ)                    0.211   0.135   0.029   0.172   0.548   1.003
  ρ                           0.753   0.108   0.549   0.748   0.973   1.002
  Income AR(1): intercept     9.819   2.011   6.587   9.602   14.33   1.001
  Income AR(1): coefficient   0.998   0.013   0.972   0.998   1.024   1.003

  R-hat is the potential scale reduction factor (at convergence, R-hat = 1).

Figure 1: U.S. quarterly price-deflated, seasonally adjusted data on personal
disposable income (Xt) and personal consumption expenditure (Yt), 1947.I-1960.IV.
Panels: (a) Income; (b) Consumption.

Figure 2: Posterior samples of the parameters k and λ under Model (15) with Error I.
The dashed line corresponds to the posterior mean. Panels: (a) k, KT; (b) λ, KT;
(c) k, TF; (d) λ, TF. (KT = using the Koyck transformation, TF = using the
transfer function parametrization.)

Figure 3: Posterior samples of the parameters k, λ and ρ in Model (15) with Error II.
The dashed line corresponds to the posterior mean. Panels: (a) k, KT; (b) λ, KT;
(c) ρ, KT; (d) k, TF; (e) λ, TF; (f) ρ, TF. (KT = using the Koyck transformation,
TF = using the transfer function parametrization.)

Figure 4: Posterior samples of the parameters k, λ and ρ in Model (15) with Error II
plus a constant term. The dashed line corresponds to the posterior mean.
Panels: (a) k; (b) λ; (c) ρ.

Figure 5: Long-run and short-run parameters estimated from Model (16) with Error II
and a constant term in the observational equation. Panels: (a) Long-run;
(b) Short-run. In both panels, the histogram corresponds to the posterior sample,
the vertical line to the posterior mean, and the dashed line to the asymptotic
normal distribution estimated via OLS.

Figure 6: Forecast for Consumption estimated from Model (16) with Error II and a
constant term. Panels: (a) estimation in-sample (1947.I-1959.II); (b) forecast
out-of-sample (1959.III-1960.IV). In both panels, the solid line corresponds to
the mean and the dashed lines to the first and third quartiles of the posterior
distribution. The crosses correspond to the observed values.
