A procedure for designing a multilayer perceptron for predicting time series with interventions
is proposed. It is based on the generation, according to a rule emerging from a previously
fitted ARIMA model with interventions, of a set of nonlinear forecasting models with
interventions. Each model is approximated through a three-layered perceptron, selecting the one
minimizing the Bayesian Information Criterion. The training of the multilayer perceptron
is performed by three alternative learning rules, incorporating multiple repetitions, and
the hidden layer size is computed by means of a grid search. A comparative analysis
using time series from the Active Population Survey in Andalusia, Spain, shows a better
performance of these neural network models over ARIMA models with interventions.
Key words: Backpropagation; backpropagation with momentum; Levenberg-Marquardt;
mean squared error; mean absolute deviation; MATLAB; SAS; Active Population Survey.
1. Introduction
© Statistics Sweden
Journal of Official Statistics
The classical seasonal multiplicative ARIMA(p, d, q)(P, D, Q)_s model, where we use the
standard Box and Jenkins (1976) notation, including the nonseasonal ARIMA(p, d, q)
model when P = D = Q = 0, is defined by the following components in (1):

- the lag operator B, where B x_t = x_{t-1}, and the seasonal lag operator B^s, where
  B^s x_t = x_{t-s};
- the difference operator ∇ = 1 - B, and the seasonal difference operator ∇_s = 1 - B^s,
  used d and D times, respectively, to achieve stationarity, and the constant v_0;
- the autoregressive polynomial terms φ(B) and Φ(B^s), of degrees p and Ps, with

      φ(B) = 1 - φ_1 B - ... - φ_p B^p,    Φ(B^s) = 1 - Φ_1 B^s - ... - Φ_P B^{Ps}

- the moving average polynomial terms θ(B) and Θ(B^s), of degrees q and Qs, defined
  analogously.

We can also consider an additive seasonal ARIMA model, where the product φ(B)Φ(B^s)
is replaced by the polynomial term 1 - φ_1 B - ... - φ_p B^p - Φ_1 B^s - ... - Φ_P B^{Ps},
and similarly, the product θ(B)Θ(B^s) is replaced by
1 - θ_1 B - ... - θ_q B^q - Θ_1 B^s - ... - Θ_Q B^{Qs}.
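As a concrete illustration of the operators just defined, the following sketch applies the difference operator (1 - B) d times and the seasonal difference operator (1 - B^s) D times to a series; the function name and the quarterly default s = 4 are our own choices, not part of the paper.

```python
def difference(x, d=1, D=0, s=4):
    """Apply the difference operator (1 - B) d times and the seasonal
    difference operator (1 - B^s) D times to the series x, as defined above.
    The default s = 4 matches quarterly data such as the survey series."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    for _ in range(D):
        x = [x[t] - x[t - s] for t in range(s, len(x))]
    return x
```

Each pass shortens the series, which is why d and D are kept as small as stationarity allows.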
In addition to the classical ARIMA(p, d, q)(P, D, Q)_s model, we see in (1) m terms, one
for each intervention. The m auxiliary variables d_t^1, d_t^2, ..., d_t^m are associated with
interventions occurring in time periods r_1, r_2, ..., r_m, respectively, and their definition
depends on the nature of the intervention. For a permanent intervention,

    d_t^i = 1 if t >= r_i,  0 if t < r_i                                    (2)
Cubiles-de-la-Vega et al.: A Neural Network Model for Predicting Time Series with Interventions

There exist diverse forms for U_i(B) and S_i(B), so it is possible to accommodate in
model (1) several types of responses to the interventions (see Mills 1990, for example),
but it is not an easy task to arrive at the best ARIMA model with interventions. Given
that in our comparative study we only had two possible interventions, we decided to
follow a straightforward procedure to identify an ARIMA model with interventions, as
described by Liu (1992). Next we summarize this procedure for a possible intervention in
time period r:
1) Try a sudden and transitory intervention:

       [U_0 (1 - B) / (1 - S_1 B)] d_t

   If S_1 is larger than or equal to 1, or near 1, we can exclude a transitory effect.

2) Try a gradual and permanent intervention:

       [U_0 / (1 - S_1 B)] d_t

   If S_1 is not significantly different from 0, we discard a gradual effect. If S_1 = 1,
   there exists a linear trend.

3) Finally, try a sudden and permanent effect:

       U_0 d_t

   If this model is not valid, we discard an intervention in time period r.

Thus we can define interventions of Types 1, 2, and 3, corresponding to each of the three
intervention models defined in the three steps of the last algorithm.
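To make the three response shapes concrete, the following sketch computes the effect path of a single intervention of each type, starting from the step variable of (2); the parameter values U0 and S1 and all names are illustrative, not estimates from the paper.

```python
def step_dummy(n, r):
    """Step variable d_t for a permanent intervention: 1 if t >= r, else 0."""
    return [1.0 if t >= r else 0.0 for t in range(n)]

def intervention_effect(kind, n, r, U0=1.0, S1=0.6):
    """Response path of one intervention for the three types above.

    Type 1 (sudden, transitory):  U0 (1 - B) / (1 - S1 B) applied to d_t
    Type 2 (gradual, permanent):  U0 / (1 - S1 B) applied to d_t
    Type 3 (sudden, permanent):   U0 d_t
    """
    d = step_dummy(n, r)
    if kind == 3:
        return [U0 * v for v in d]
    # (1 - B) d_t turns the step into a single pulse at period r
    x = d if kind == 2 else [d[t] - (d[t - 1] if t > 0 else 0.0) for t in range(n)]
    out = []
    for t in range(n):
        out.append(U0 * x[t] + (S1 * out[t - 1] if t > 0 else 0.0))
    return out
```

Type 1 decays geometrically back to zero, Type 2 rises toward the level U0 / (1 - S1), and Type 3 shifts the level immediately, which matches the exclusion rules in the three steps.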
vi) When p = q = P = Q = 0, we construct models similar to those of ii) and iv), but
based on d (instead of p) and D (instead of P).

For example, from the model ARIMA(0, 1, 1)(0, 1, 1)_4, we considered the following lag
sets: {1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5} (i), and also {1, 4, 5} (iii),
and {1, 4} (v).
The unknown function F of (4) must be approximated by an appropriate model. Given
the nonlinear nature of this task, ANNs provide a convenient framework for attacking the
problem, with the multilayer perceptron standing out.
4.1. Definition

A multilayer perceptron is a feedforward artificial neural network with three or more
layers. The layers between the first and the last one are the hidden layers. The first
layer, or input layer, is formed by k nodes, corresponding to an input vector
x = (x_1, x_2, ..., x_k). The last layer, or output layer, is formed by q nodes, so the
network output is a vector y = (y_1, y_2, ..., y_q). The multilayer perceptron was devised
as a mathematical model for approximating a function f: A ⊆ R^k -> R^q. The approximation
is based on the network training (or learning) starting from n training patterns
(x^(l), y^(l)), with y^(l) = f(x^(l)), l = 1, 2, ..., n.
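For the forecasting models of Section 3, such training patterns come from pairing each series value with its lagged values in the chosen lag set. A minimal sketch (function and variable names are our own):

```python
def make_patterns(series, lags):
    """Build training patterns (x^(l), y^(l)): each target series[t] is
    paired with the lagged values series[t - j] for j in the lag set."""
    k = max(lags)  # earliest usable target index
    X, y = [], []
    for t in range(k, len(series)):
        X.append([series[t - j] for j in lags])
        y.append(series[t])
    return X, y
```

For example, with lag set {1, 4} each input vector holds the previous quarter and the same quarter one year earlier.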
Let H be the hidden layer size, {v_ih ; i = 0, 1, 2, ..., k; h = 1, 2, ..., H} the synaptic
coefficients for the connections between the input and the hidden nodes, and let
{w_hj ; h = 0, 1, 2, ..., H; j = 1, 2, ..., q} be the synaptic coefficients for the connections
between the hidden and the output layers. The output of each hidden node, s_h,
h = 1, 2, ..., H, is computed by applying an activation (or transfer) function, g, to its
net input, m_h = v_0h + Σ_{i=1}^{k} v_ih x_i, so s_h = g(m_h). Similarly, each output node
produces a value y_j, j = 1, 2, ..., q, obtained by means of an activation function f,
y_j = f(t_j), t_j being the net input of output node j:

    y_j = f(t_j) = f( w_0j + Σ_{h=1}^{H} w_hj s_h )
                 = f( w_0j + Σ_{h=1}^{H} w_hj g( v_0h + Σ_{i=1}^{k} v_ih x_i ) )
5. A Comparative Study
was split into a training set and a test set: the test set is formed by the 8 cases
corresponding to the years 1996 and 1997, while the 76 cases corresponding to the period
1977–1995 form the training set, used for fitting the forecasting models.
Some definitions of the variables measured by the Active Population Survey were
changed in the second quarter of 1987. Though the series of the Survey were linked by
the Spanish National Statistical Institute, we could expect the presence of an intervention
in that time period. Furthermore, the time series plots revealed a clear intervention in the
first quarter of 1984 for several series. So we considered the analysis of the possible
presence of interventions in both time periods. Thus, for each of the 33 series, a
multiplicative or additive, and possibly seasonal, ARIMA model was fitted to the training
set. The Box-Jenkins methodology for identifying one or more ARIMA models
(Box and Jenkins 1976) was used, and we incorporated, for each of these models, the
procedure outlined in Section 2 in order to identify interventions. So for each series we
had one or more ARIMA models with interventions. A model was considered valid when
i) all the coefficients were significant at the five percent level, and ii) the residuals
were not found to be correlated by means of the Ljung-Box test. When several models were
considered valid, the minimum Bayesian Information Criterion was used to select the best
model.
This procedure was developed using the ETS module of SAS v7.5.2 (SAS Institute Inc.
1993), available on the Asterix node of the Andalusian Scientific Computing Centre (CICA),
and employing maximum-likelihood estimation. From it, we found one or two interventions in
25 of the 33 series.
For each series, and from the identified ARIMA model with interventions, we
constructed several lag sets as in Section 3. The nonlinear models so constructed were
approximated by means of a multilayer perceptron, constructed as described in
Section 4. The forecasting model with the minimum Bayesian Information Criterion
was selected.
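The selection step can be sketched as follows. We use the common Gaussian-likelihood form of the BIC; the exact variant computed by SAS/ETS may differ by an additive constant, which does not change which model attains the minimum.

```python
import math

def bic(n, sse, n_params):
    """BIC = n * ln(SSE / n) + n_params * ln(n), a standard form for
    least-squares fits; n is the number of training cases."""
    return n * math.log(sse / n) + n_params * math.log(n)

def select_min_bic(candidates):
    """candidates: iterable of (label, n, sse, n_params) tuples.
    Returns the label of the candidate with minimum BIC."""
    return min(candidates, key=lambda c: bic(c[1], c[2], c[3]))[0]
```

The penalty term n_params * ln(n) is what lets a smaller network beat a slightly better-fitting but larger one.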
5.2. Results
Following the procedure of Section 5.1, for each of the 25 series where we identified at
least one intervention, we analyzed and compared the results obtained with ARIMA
models with interventions and the multilayer perceptron.
Tables 1, 2 and 3 contain, for each of these 25 series, the fitted ARIMA model with
interventions, all of them with log transformation, where () denotes that the ARIMA
model is additive. The second column contains the lags that, with the corresponding
auxiliary variables, define the input layer of the selected multilayer perceptron. The
characters 1, 2 and 3 in the intervention columns denote the type of intervention, as
described in Section 2, with a 0 for no intervention in the associated time period.
Numbers surrounded by parentheses are the only nonzero coefficients in the corresponding term.
We observe that the unemployed series need more complex multilayer perceptrons,
whereas for the active series simpler multilayer perceptrons are sufficient, thus revealing
a dependence on a larger number of lagged values for the unemployed series.
Table 4 contains the numbers of series where each method has the minimum test fore-
casting error among the four methods, as measured by the mean squared error (MSE) and
the mean absolute deviation (MAD).
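The two test error criteria, MSE and MAD, are straightforward to compute; a minimal sketch:

```python
def mse(actual, predicted):
    """Mean squared error over the test cases."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mad(actual, predicted):
    """Mean absolute deviation over the test cases."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```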
Table 1. Lag sets and ARIMA models for the Active Population Series

Series   Lags             ARIMA model (log)       Interventions
                          p  d  q   P  D  Q       1984   1987
A1       1, 4             0  1  1   0  1  1       3      3
A2       1                0  1  0   0  0  0       3      0
A4       1, 2, 3          0  1  3   0  0  0       0      3
A5       1, 4             0  1  1   0  0  1       0      3
A6       1, 4             0  1  1   0  1  1       3      2
A10 ()   1, 4, 5          1  1  0   1  1  0       3      1
A11      1, 4             0  1  1   0  1  1       0      3
Table 2. Lag sets and ARIMA models for the Employment Series

Series   Lags             ARIMA model (log)        Interventions
                          p  d  q      P  D  Q     1984   1987
E1       1, 4             0  1  1      0  0  1     3      0
E2 ()    1                0  1  (2)    0  0  1     3      0
E3       1, 4             0  1  1      0  1  1     0      2
E5       1                0  1  (2 5)  0  0  0     3      0
E6 ()    1, 4             0  1  1      0  0  1     3      0
E7       1, 4             0  1  1      0  1  1     0      1
E8       1, 4             0  1  1      0  1  1     1      2
E10      1, 2, 3, 4, 5    0  1  (2)    0  1  1     2      0
E11      1, 2, 3, 4       0  1  0      0  1  1     2      0
Table 3. Lag sets and ARIMA models for the Unemployment Series

Series   Lags                   ARIMA model (log)         Interventions
                                p    d  q      P  D  Q    1984   1987
U1       1, 2, 3, 4, 5          0    1  (3 5)  0  0  0    3      0
U2       1, 2, 3                0    1  3      0  0  0    3      0
U4       1, 4                   0    1  1      0  0  1    0      3
U5       1, 2, 3, 4, 5          0    1  (5)    0  0  1    3      0
U6 ()    1, 2, 3, 4, 5          0    1  1      0  0  1    3      0
U7       1, 2, 3, 4, 5          0    1  1      0  1  1    2      3
U8       1, 4                   0    1  1      0  1  1    2      0
U9       1                      0    1  0      0  0  0    0      3
U10      1, 2, 3, 4, 5, 6, 7    (3)  1  0      1  0  0    3      3
6. Concluding Remarks
i) In a majority of the considered series, the presented procedure for designing fore-
casting ANN models, with treatment of interventions, provided better results than
those obtained with ARIMA models with treatment of interventions.
ii) The multilayer perceptron, trained with MATLAB default parameters for the
backpropagation and backpropagation with momentum learning rules, but intro-
ducing multiple repetitions and a grid search for the hidden layer size, offers
good performance. However, the Levenberg-Marquardt trained models exhibit
overfitting behaviour.
iii) The best compromise between the test and training errors, measured with
MSE and MAD criteria, is achieved by the multilayer perceptron trained with
backpropagation.
iv) Artificial neural networks trained by several learning rules, combined with the
previous study of the series with the Box-Jenkins methodology, and selecting the
model through the Bayesian Information Criterion, provide a valuable framework
for obtaining univariate predictions of the labour force time series, incorporating
the treatment of interventions in the forecasting model.
7. References
Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Oxford: Oxford
University Press.
Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. San
Francisco: Holden-Day.
Box, G.E.P. and Tiao, G.C. (1975). Intervention Analysis with Application to Economic