    y(t) = c^T s(t)                                            (3)

where s(t) is the output of the hidden units at time t and c is the output weight vector. A simple example which consists of three hidden units is illustrated in Fig. 2. This model is expressed as follows:

    s1(t) = tanh( Σ_{k=1}^{3} α_{1k} s_k(t−1) + β_1 u(t) + θ_1 )   (4)
    s2(t) = tanh( Σ_{k=1}^{3} α_{2k} s_k(t−1) + β_2 u(t) + θ_2 )   (5)
    s3(t) = tanh( Σ_{k=1}^{3} α_{3k} s_k(t−1) + β_3 u(t) + θ_3 )   (6)

    y(t) = Σ_{k=1}^{3} c_k s_k(t)

where

    s(t) = (s1(t), s2(t), s3(t))^T                             (7)

and

    c = (c1, c2, c3)^T

Parameters β_i, α_ij, θ_i and c_i are called the input weight, the recurrent weight, the bias and the output weight, respectively.

Fig. 2. Recurrent neural network model.

In contrast to the NARX model, the RNN does not have feedback connections from the output to the input. Feedback connections exist only amongst the neurons in the hidden layer. Owing to this structural difference, the NARX model and the RNN have basically been studied independently. Only a few papers have presented results concerning their similarity [14,15]. Olurotimi [15] recently presented a model equivalence result for a RNN and a feedforward neural network. He showed that every RNN can be transformed into a NARX model. Thus, he derived an algorithm for RNN training with feedforward complexity.

Consider a dynamic system

    x(t+1) = g(x(t), u(t+1))                                   (8)
    y(t+1) = c x(t+1)                                          (9)

where x(t) is the system state and u(t), y(t) are the system input and output. Due to the universal approximation property of a feedforward neural network [12,16–19], the nonlinear function g can be approximated by a feedforward neural network. Hence, the above system can be rewritten as follows:

    x(t+1) = Σ_{i=1}^{n} d_i tanh(a_i x(t) + b_i u(t+1) + e_i)     (10)
    y(t+1) = c x(t+1)                                          (11)

where {a_i, b_i, d_i, e_i}_{i=1}^{n} and c are the system parameters.
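The three-unit model of Eqs (3)-(7) is small enough to simulate directly. The following sketch (not from the paper; all weight values are arbitrary illustrations) iterates the hidden state and reads out y(t):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters for a 3-unit RNN:
# alpha: recurrent weights, beta: input weights, theta: biases, c: output weights.
alpha = 0.5 * rng.standard_normal((3, 3))
beta = rng.standard_normal(3)
theta = rng.standard_normal(3)
c = rng.standard_normal(3)

def rnn_step(s_prev, u):
    """Eqs (4)-(6): s_i(t) = tanh(sum_k alpha_ik s_k(t-1) + beta_i u(t) + theta_i)."""
    return np.tanh(alpha @ s_prev + beta * u + theta)

s = np.zeros(3)                  # hidden state s(t) = (s1, s2, s3)^T, Eq (7)
outputs = []
for u in [0.1, -0.3, 0.7, 0.0]:  # a short illustrative input sequence
    s = rnn_step(s, u)
    outputs.append(c @ s)        # y(t) = c^T s(t), Eq (3)

print(outputs)
```

Since tanh saturates, the hidden state stays inside (−1, 1) regardless of the input sequence.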
A Note on the Equivalence of NARX and RNN 35
Substituting x(t) = Σ_{k=1}^{n} d_k s_k(t) into the hidden-unit equations gives

    s_i(t+1) = tanh( Σ_{k=1}^{n} α_{ik} s_k(t) + β_i u(t+1) + θ_i )    (13)

with α_ik = a_i d_k, β_i = b_i and

    θ_i = e_i                                                  (18)

Let [α] be the matrix (α_ik)_{n×n}; the vector form is given by

    [α] = a d^T,   β = b,   θ = e                              (19)

This establishes a way to transform a NARX into a recurrent neural network. The inverse transformation of a RNN into a NARX model can be accomplished via Eqs (21)-(23), recovering a NARX model of the form

    y(t+1) = W1 tanh(W2 y(t) + W3 u(t+1) + W4)                 (24)

When the neuron transfer function is the piecewise linear function

    f(x) = −1 if x < −1;   x if −1 ≤ x ≤ 1;   1 if x > 1       (36)

the same equivalence holds, and for getting back the NARX model we can perform the inverse transformation defined in Eq. (31). Moreover, this result can be extended to higher order NARX models. Without loss of generality, we consider a second order NARX:

    y(t+1) = W1 f(W20 y(t) + W21 y(t−1) + W3 u(t+1) + W4)      (37)

Define

    z(t) = ( s(t) ; s(t−1) )

where s(t+1) = f(W20 y(t) + W21 y(t−1) + W3 u(t+1) + W4).
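The equivalence between the first order NARX model (24) and its transformed RNN can be checked numerically. In the sketch below (arbitrary illustrative weights; matrix names follow Eq. (24)), both recursions are run from a consistent initial condition y(0) = W1 s(0) and their outputs are compared:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, M, T = 2, 4, 1, 20   # illustrative output, hidden and input dimensions

W1 = rng.standard_normal((N, n))
W2 = rng.standard_normal((n, N))
W3 = rng.standard_normal((n, M))
W4 = rng.standard_normal((n, 1))
u = rng.standard_normal((T, M, 1))

# Shared initial condition: pick s(0) and set y(0) = W1 s(0).
s = rng.standard_normal((n, 1))
y = W1 @ s

err = 0.0
for t in range(T):
    # NARX step, Eq (24): y(t+1) = W1 tanh(W2 y(t) + W3 u(t+1) + W4)
    y = W1 @ np.tanh(W2 @ y + W3 @ u[t] + W4)
    # transformed RNN step: s(t+1) = tanh(W2 W1 s(t) + W3 u(t+1) + W4)
    s = np.tanh(W2 @ (W1 @ s) + W3 @ u[t] + W4)
    err = max(err, float(np.max(np.abs(y - W1 @ s))))

print(err)
```

Because tanh is applied to identical arguments at every step, the two output sequences agree to machine precision.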
36 J.P.F. Sum et al.
Since s = f(s) if s is bounded [11,21], the second order NARX model can be rewritten as the first order model

    z(t+1) = f(W̃2 z(t) + W̃3 u(t+1) + W̃4),   y(t+1) = W̃1 z(t+1)

where W̃1 = (W1  O_{N×n}) and

    W̃2 = ( W20 W1   W21 W1
            I_{n×n}   O_{n×n} )                                (41)

    W̃3 = ( W3
            O_{n×M} )                                          (42)

    W̃4 = ( W4
            O_{n×1} )                                          (43)

Note that W̃1 ∈ R^{N×2n}, W̃2 ∈ R^{2n×2n}, W̃3 ∈ R^{2n×M} and W̃4 ∈ R^{2n×1}. A controversy arises if the dimension of y is larger than the number of hidden units, that is n < N.

Suppose a NARX is being trained using the Forgetting Recursive Least Squares¹ (FRLS) method. Let Θ be the augmented vector including all the parameters {W1, W2, W3, W4}, x_t = (y^T(t−1), u^T(t))^T and Φ(t) = ∂ŷ(t)/∂Θ; the training can be accomplished via the following recursive equations:

    S(t) = Φ^T(t) P(t−1) Φ(t) + (1−λ) I_{N×N}                  (44)
    L(t) = P(t−1) Φ(t) S^{−1}(t)                               (45)
    P(t) = (I_{dimΘ×dimΘ} − L(t) Φ^T(x_t)) P(t−1)              (46)
    Θ(t) = Θ(t−1) + L(t)(y(t) − ŷ(x_t, Θ(t−1)))                (47)

with the initial conditions Θ(0) = 0 and P(0) = δ^{−1} I_{dimΘ×dimΘ}, where 0 < λ ≤ 1 and δ is a small positive number. Here ŷ(x_t, Θ(t−1)) is the output of the NARX model at the t-th step. The computational burden is on Eqs (45) and (46), which require O(dim³Θ) multiplications. Even though some decoupled implementations can reduce this burden, the complexity is still in the same order if N ≥ n.

Next, if the same dynamic system is identified by a RNN, the extended Kalman filter approach [13,22,23] is one fast method which can simultaneously estimate the state vector s(t) and identify the parametric vector φ(t) [24]:

    s(t|t−1) = g(s(t−1|t−1), u(t), φ(t−1))                     (48)
    P(t|t−1) = F(t−1) P(t−1|t−1) F^T(t−1)                      (49)
    e(t) = y(t) − ŷ(s(t|t−1), φ(t−1))

The initial P^{−1}(0|0) is set to be a zero matrix and φ(0) is a small random vector. We have

    g(s(t|t), u(t+1), φ(t)) = tanh(W2(t) s(t|t) + W3(t) u(t+1) + W4(t))    (55)

The computational burden is again on P(t|t), which requires O(dim³φ) multiplications.

Since Θ is the augmented vector including all the parameters {W1, W2, W3, W4} of the NARX model and φ is the augmented vector including all the parameters {W1, W2, W3, W4} of the RNN, the dimension of Θ is the total number of elements in W1, W2, W3 and W4. That is

    dim Θ = n(2N + M + 1)

Similarly, the dimension of φ is given by

    dim φ = n(n + M + N + 1)

By comparing their computational complexities on P(t) and P(t|t), respectively, it is observed that training the RNN may not be more time consuming than training a NARX model. So, we suggest the following indirect method for training NARX and RNN.

(a) If N ≥ n and a NARX has to be trained, we can first initialise a random NARX model, transform it into an equivalent RNN, train the RNN, and finally transform the trained RNN back into a NARX model.

¹ We pick FRLS for discussion simply because it is a fast training method for feedforward neural networks [2,20,21,34].
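To make the FRLS recursion concrete, here is a sketch of Eqs (44)-(47) applied to a model that is linear in its parameters, so that Φ(t) is simply the regressor x_t; for the NARX model itself, Φ(t) would be the gradient ∂ŷ/∂Θ. The values of λ, δ and the data are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 3
theta_true = rng.standard_normal((dim, 1))   # parameters to be recovered

lam, delta = 0.95, 1e-6            # forgetting factor and a small positive number
theta = np.zeros((dim, 1))          # Theta(0) = 0
P = np.eye(dim) / delta             # P(0) = delta^{-1} I

for _ in range(200):
    x = rng.standard_normal((dim, 1))       # regressor, playing the role of Phi(t)
    y = (theta_true.T @ x).item()           # noiseless target
    S = (x.T @ P @ x).item() + (1.0 - lam)            # Eq (44)
    L = P @ x / S                                      # Eq (45)
    P = (np.eye(dim) - L @ x.T) @ P                    # Eq (46)
    theta = theta + L * (y - (theta.T @ x).item())     # Eq (47)

print(np.max(np.abs(theta - theta_true)))
```

With noiseless, persistently exciting data the estimate settles onto the true parameter vector after a modest number of updates.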
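The reduction of the second order NARX model (37) to first order form via Eqs (41)-(43) can also be verified numerically. The sketch below uses arbitrary small weights; the output matrix W1_aug = (W1 0), implied by the dimension note W̃1 ∈ R^{N×2n}, is an assumption of this illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, M, T = 2, 3, 1, 8   # illustrative sizes

def f(x):
    """Piecewise linear transfer function, Eq (36)."""
    return np.clip(x, -1.0, 1.0)

W1  = 0.3 * rng.standard_normal((N, n))
W20 = 0.3 * rng.standard_normal((n, N))
W21 = 0.3 * rng.standard_normal((n, N))
W3  = 0.3 * rng.standard_normal((n, M))
W4  = 0.3 * rng.standard_normal((n, 1))

# Augmented first order weights, Eqs (41)-(43), with W1_aug = (W1 0).
W1a = np.hstack([W1, np.zeros((N, n))])
W2a = np.block([[W20 @ W1, W21 @ W1],
                [np.eye(n), np.zeros((n, n))]])
W3a = np.vstack([W3, np.zeros((n, M))])
W4a = np.vstack([W4, np.zeros((n, 1))])

u = rng.standard_normal((T, M, 1))

# Matching initial states: s(0), s(-1) bounded in [-1, 1]; y(t) = W1 s(t).
s0 = f(rng.standard_normal((n, 1)))
sm1 = f(rng.standard_normal((n, 1)))        # s(-1)
y_prev, y_curr = W1 @ sm1, W1 @ s0          # y(-1), y(0)
z = np.vstack([s0, sm1])                    # z(0) = (s(0); s(-1))

err = 0.0
for t in range(T):
    # second order NARX, Eq (37)
    y_next = W1 @ f(W20 @ y_curr + W21 @ y_prev + W3 @ u[t] + W4)
    # equivalent first order model
    z = f(W2a @ z + W3a @ u[t] + W4a)
    err = max(err, float(np.max(np.abs(y_next - W1a @ z))))
    y_prev, y_curr = y_curr, y_next

print(err)
```

The identity block of Eq (41) simply copies s(t) forward; since f(s) = s for bounded s, both models produce the same output sequence.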
3.2. On Pruning

One should also realise that this equivalence result sheds light on the design of a more effective RNN pruning algorithm. As indicated in some papers [3,10,25], pruning a RNN is basically more difficult than pruning a feedforward neural network. One reason is the evaluation of the second order derivative of the error function. Therefore, it is interesting to see whether RNN pruning can be reformulated in a way similar to feedforward network pruning.

The idea is simple. After the RNN has been trained, it is transformed into an equivalent NARX model. Then we can apply optimal brain damage [26,27] or other techniques [25,28,29] to prune the NARX model. Empirically, pruning a feedforward neural network is usually easier than pruning a recurrent neural network [20,22]. Once the pruning procedure is finished, we transform the model back into a RNN. Of course, this kind of indirect pruning procedure for RNN does not ensure that the number of weights will be reduced.

Note that not all pruning techniques for feedforward networks can be applied. Two examples are the Statistical Stepwise Method (SSM) [30] and RLS-based pruning [20], as these methods require information which can only be obtained during training. To use them, we have to transform the RNN into an equivalent NARX model at the very beginning: once the RNN is initialised, it is transformed into an equivalent NARX model. Once training of this equivalent NARX model is finished, we can apply methods such as the statistical stepwise method, RLS-based pruning and the nonconvergent method [31] to prune the NARX model. After pruning is finished, the pruned NARX model is transformed back into a RNN. For clarity, we summarise all these training and pruning ideas graphically in Fig. 3.

Fig. 3. Summary of the training and pruning ideas implied from the model equivalence.

3.3. On Stability Analysis of the NARX Model

Stability is one concern that researchers would like to consider once a dynamic system has been identified. For RNN, some results on this issue have recently been derived [9,12,18]. In accordance with the equivalence of NARX and RNN, we can readily use those theorems to analyse the system stability.

Theorem 1. A NARX model defined as in Eq. (24) is stable if the magnitudes of all the eigenvalues of W2W1 are smaller than one.

Proof. Using the equivalence relation, a NARX model

    y(t+1) = W1 tanh(W2 y(t) + W3 u(t+1) + W4)

can be transformed into

    s(t+1) = tanh(W2 W1 s(t) + W3 u(t+1) + W4)
    y(t+1) = W1 s(t+1)

When no input is fed into the system, the difference between s(t+1) and s(t) is given by

    s(t+1) − s(t) = tanh(W2W1 s(t) + W4) − tanh(W2W1 s(t−1) + W4)
                  ≈ W2W1 (s(t) − s(t−1))

Therefore, if all the eigenvalues of W2W1 are smaller than one in magnitude, lim_{t→∞} (s(t+1) − s(t)) = 0, which implies that s(t) will converge to a constant vector s0. Hence lim_{t→∞} y(t) = W1 s0, and the proof is complete.
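Theorem 1 can be illustrated numerically. The sketch below (arbitrary illustrative weights) rescales W2 so that the spectral norm of W2W1, and hence every eigenvalue magnitude, is below one, and then iterates the transformed system with no input:

```python
import numpy as np

rng = np.random.default_rng(4)
N, n = 2, 3

W1 = rng.standard_normal((N, n))
W2 = rng.standard_normal((n, N))
W4 = rng.standard_normal((n, 1))

# Rescale W2 so that the spectral norm of W2 W1 is 0.8; every eigenvalue of
# W2 W1 then has magnitude below one, as required by Theorem 1.
W2 *= 0.8 / np.linalg.norm(W2 @ W1, 2)
A = W2 @ W1
assert np.max(np.abs(np.linalg.eigvals(A))) < 1.0

# Iterate the transformed system with no input: s(t+1) = tanh(W2 W1 s(t) + W4).
s = rng.standard_normal((n, 1))
for _ in range(200):
    s = np.tanh(A @ s + W4)

# One further step barely moves the output: y(t) has converged to W1 s0.
gap = np.max(np.abs(W1 @ np.tanh(A @ s + W4) - W1 @ s))
print(gap)
```

Bounding the spectral norm is slightly stronger than the eigenvalue condition of the theorem, but since tanh is 1-Lipschitz it makes the iteration an explicit contraction, so convergence is guaranteed rather than merely asymptotic.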
One consequence of Theorem 1 is applicable to the design of a feedback controller for a dynamic system. Assume that an unknown system has already been identified by a NARX model with y0 being an equilibrium, and that the system is unstable. Due to disturbance, the output y shifts from y0 to y0 + Δy. To make the output of the system go back to y0, one can design an output feedback controller, as shown in Fig. 4(b),

    u(t+1) = W0 y(t)

with W0 satisfying the condition that all the eigenvalues of (W2 + W3W0)W1 are smaller than one in magnitude. Usually, researchers have proposed to use two neural networks, one for identification and the other for control [11,21,32,33]. To make the controller work, two-phase training is needed. In the first phase, a neural network identifier has to be trained to identify the dynamical behaviour of the system. Then, in the second phase, the weight values of the identifier are fixed and the controller network is trained. This will be time consuming and difficult to implement as an online control method.

Fig. 4. (a) Training of a NARX model to identify an unknown dynamic system; (b) once the system is identified, an output feedback controller, u(t+1) = W0 y(t), can be designed.

4. Conclusion

In this paper, we have presented several results on the equivalence of the NARX model and the RNN. First, we have shown that if the neuron transfer function is tanh, every first order NARX model can be transformed into a RNN, and vice versa. Second, we have also shown that if the neuron transfer function is a piecewise linear function, every NARX model (irrespective of its order) can also be transformed into a RNN, and vice versa. In accordance with this equivalence relationship, we are able to:

- speed up the training of a NARX or a RNN by an indirect method,
- simplify the pruning procedure of RNN,
- analyse the stability behaviour of a NARX,
- design an output feedback controller for the unknown dynamic system.

References

1. Chen S, Billings SA, Grant PM. Non-linear system identification using neural networks. Int J Control 1990; 51(6): 1191-1214
2. Chen S, Cowan C, Billings SA, Grant PM. Parallel recursive prediction error algorithm for training layered neural networks. Int J Control 1990; 51(6): 1215-1228
3. Lin T, Lee Giles C, Horne BG, Kung SY. A delay damage model selection algorithm for NARX neural networks. IEEE Trans Signal Processing, Special Issue on Neural Networks for Signal Processing 1997; 45(11): 2719
4. Narendra KS, Parthasarathy K. Neural networks and dynamical systems. Int J Approximate Reasoning 1992; 6: 109-131
5. Siegelmann HT, Horne BG, Lee Giles C. Computational capabilities of recurrent NARX neural networks. IEEE Trans Systems, Man and Cybernetics, Part B: Cybernetics 1997; 27(2): 208
6. Rumelhart DE et al. Learning internal representations by error propagation. In: Parallel Distributed Processing, Volume 1: Foundations, DE Rumelhart et al. (eds), MIT Press, 1986, pp 318-362
7. Jin L et al. Absolute stability conditions for discrete-time recurrent neural networks. IEEE Trans Neural Networks 1994; 5(6): 954-964
8. Jin L et al. Approximation of discrete-time state-space trajectories using dynamic recurrent neural networks. IEEE Trans Automatic Control 1995; 40(7): 1266-1270
9. Jin L, Gupta MM. Globally asymptotical stability of discrete-time analog neural networks. IEEE Trans Neural Networks 1996; 7(4): 1024-1031
10. With Pedersen M, Hansen LK. Recurrent networks: Second order properties and pruning. In: Advances in Neural Information Processing Systems 7, G Tesauro et al. (eds), MIT Press, 1995, pp 673-680
11. Puskorius GV, Feldkamp LA. Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks. IEEE Trans Neural Networks 1994; 5(2): 279-297
12. Sontag ED. Recurrent neural networks: Some systems-theoretic aspects. In: Dealing with Complexity: A Neural Network Approach, M Karny, K Warwick, V Kurkova (eds), Springer-Verlag, London, 1998, pp 1-11
13. Williams RJ. Training recurrent networks using the extended Kalman filter. Proc IJCNN'92, Baltimore, Vol IV, 1992, pp 241-246
14. Connor JT, Martin D, Atlas LE. Recurrent neural networks and robust time series prediction. IEEE Trans Neural Networks 1994; 5(2): 240-254
15. Olurotimi O. Recurrent neural network training with feedforward complexity. IEEE Trans Neural Networks 1994; 5(2): 185-197
16. Cybenko G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 1989; 2: 303-314
17. Funahashi K. On the approximate realization of continuous mappings by neural networks. Neural Networks 1989; 2(3): 183-192
18. Sontag ED. Neural networks for control. In: Essays on Control: Perspectives in the Theory and its Applications, HL Trentelman, JC Willems (eds), Birkhauser, Boston, 1993, pp 339-380
19. Sum J, Chan LW. On the approximation property of recurrent neural network. Proc World Multiconference on Systemics, Cybernetics and Informatics, Caracas, Venezuela, July 7-11, 1997
20. Leung CS et al. On-line training and pruning for RLS algorithms. Electronics Letters 1996; 7: 2152-2153
21. Puskorius GV, Feldkamp LA. Decoupled extended Kalman filter training of feedforward layered networks. Proc IJCNN'91, Vol I, 1991, pp 771-777
22. Sum JPF et al. Extended Kalman filter in recurrent neural network training and pruning. Technical Report CS-TR-96-05, Department of Computer Science and Engineering, CUHK, June 1996
23. Suykens J, De Moor B, Vandewalle J. Nonlinear system identification using neural state space models, applicable to robust control design. Int J Control 1995; 62(1): 129-152
24. Sum J et al. Extended Kalman filter-based pruning method for recurrent networks. Neural Computation 1998; 10(6): 1481-1505
25. With Pedersen M et al. Pruning with generalization based weight saliencies: OBD and OBS. In: Advances in Neural Information Processing Systems 8, DS Touretzky et al. (eds), MIT Press, 1996
26. LeCun Y et al. Optimal brain damage. In: Advances in Neural Information Processing Systems 2, DS Touretzky (ed), 1990, pp 396-404
27. Reed R. Pruning algorithms: A survey. IEEE Trans Neural Networks 1993; 4(5): 740-747
28. Hassibi B, Stork DG. Second order derivatives for network pruning: Optimal brain surgeon. In: Advances in Neural Information Processing Systems, Hanson et al. (eds), 1993, pp 164-171
29. Moody J. Prediction risk and architecture selection for neural networks. In: From Statistics to Neural Networks: Theory and Pattern Recognition Applications, V Cherkassky et al. (eds), Springer-Verlag, 1994
30. Cottrell M et al. Neural modeling for time series: A statistical stepwise method for weight elimination. IEEE Trans Neural Networks 1995; 6(6): 1355-1362
31. Finnoff W, Hergert F, Zimmermann HG. Improving model selection by nonconvergent methods. Neural Networks 1993; 6: 771-783
32. Ku C, Lee K. Diagonal recurrent neural networks for dynamic systems control. IEEE Trans Neural Networks 1995; 6(1): 144-156
33. Narendra KS, Parthasarathy K. Identification and control of dynamical systems using neural networks. IEEE Trans Neural Networks 1990; 1(1): 4-27
34. Shah S et al. Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Networks 1992; 5: 779-787