
Proceedings of the 6th World Congress on Intelligent Control

and Automation, June 21 - 23, 2006, Dalian, China

Application of Adaptive Least Square Support Vector Machines in Nonlinear System Identification
Xiaodong Wang, Weifeng Liang, Xiushan Cai, Ganyun Lv, Changjiang Zhang and Haoran Zhang
College of Information Science and Engineering
Zhejiang Normal University
Jinhua 321004, Zhejiang, China
{wxd 200420255 xiushan ganyun_lv zcj74922 & hylt}@zjnu.cn

Abstract - The training problem of the least squares support vector machine (LS-SVM) is solved by finding a solution to a set of linear equations. This makes an online adaptive implementation of the algorithm feasible. In this paper, an adaptive algorithm for nonlinear system identification is proposed. Using this training algorithm, a variant of the support vector machine, called the adaptive LS-SVM, has been developed. The adaptive LS-SVM is especially useful for online system identification. Several pertinent numerical simulations have shown the validity of the proposed method.

Index Terms - Nonlinear systems, Identification, Support vector machines

I. INTRODUCTION

It is well known that in the past three decades linear models have been widely used in system identification, for two major reasons. Firstly, the effects that different and combined input signals have on the output are easily determined. Secondly, linear systems are homogeneous. However, most control systems encountered in practice are nonlinear. In many cases, linear models are not suitable to represent these systems and nonlinear models have to be considered. Since there are nonlinear effects in practical systems, e.g. harmonic generation, intermodulation, desensitization, gain compression/expansion and chaos, neither of the above principles for linear models is valid for nonlinear systems. Therefore, nonlinear system identification is much more difficult than linear system identification [1].

In the field of nonlinear system identification, researchers are very enthusiastic about the potential of neural networks, especially the multilayer perceptron (MLP) [2-4]. However, their performance is not always satisfactory. Some inherent drawbacks, e.g. the multiple-local-minima problem, the choice of the number of hidden units and the danger of overfitting, make it difficult to put the MLP into practice.

Recently, the least squares support vector machine (LS-SVM) [5] has emerged as an alternative to the MLP for pattern recognition. This is due to the fact that the LS-SVM is established on the structural risk minimization principle rather than on the empirical error minimization commonly implemented in the MLP. The LS-SVM achieves higher generalization performance than the MLP in machine learning problems. Moreover, unlike MLP training, which requires nonlinear optimization with the danger of getting stuck in local minima, training an LS-SVM is equivalent to solving a set of linear equations. Consequently, the solution of the LS-SVM is always unique and globally optimal. In our previous work [6], the LS-SVM was successfully used for nonlinear system identification. However, the existing LS-SVM algorithm is trained offline in batch mode. Offline training is not suitable for practical applications such as online system identification and control, where the data arrive sequentially. To overcome this problem, this paper proposes an adaptive LS-SVM method for nonlinear system identification.

II. PROBLEM FORMULATION

A nonlinear dynamic system with an input u and an output y can be described in discrete time by the NARX (nonlinear autoregressive with exogenous input) input-output model

y(k+1) = f(x(k)),   (1)

where f(.) is some nonlinear function, y(k+1) denotes the output predicted at the future time instant k+1 and x(k) is the regressor vector, consisting of a finite number of past inputs and outputs:

x(k) = [y(k), ..., y(k - n_y + 1), u(k), ..., u(k - n_u + 1)]^T.   (2)

The dynamic order of the system is represented by the numbers of lags n_u and n_y.

The task of system identification is essentially to find suitable mappings that approximate the mappings implied in a nonlinear dynamic system. The function f(.) can be approximated by general function approximators such as neural networks, neuro-fuzzy systems, splines, interpolated look-up tables, etc. The aim of system identification is only to obtain an accurate predictor for y. In this work, we use the adaptive LS-SVM for nonlinear system identification.
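To make the regressor construction of (2) concrete, x(k) can be assembled directly from logged input and output samples. A minimal sketch follows (plain Python; the function name `build_regressor` and the zero-based indexing convention are our own illustration, not from the paper):

```python
def build_regressor(y, u, k, n_y, n_u):
    """Assemble the NARX regressor of Eq. (2):
    x(k) = [y(k), ..., y(k - n_y + 1), u(k), ..., u(k - n_u + 1)]^T.
    `y` and `u` are sequences indexed by discrete time; k must satisfy
    k >= max(n_y, n_u) - 1 so that all lagged samples exist."""
    past_outputs = [y[k - i] for i in range(n_y)]  # y(k), ..., y(k - n_y + 1)
    past_inputs = [u[k - i] for i in range(n_u)]   # u(k), ..., u(k - n_u + 1)
    return past_outputs + past_inputs

# Example with n_y = 2 past outputs and n_u = 1 past input:
y = [0.1, 0.2, 0.4, 0.7]
u = [1.0, 0.5, 0.25, 0.125]
x3 = build_regressor(y, u, k=3, n_y=2, n_u=1)  # [y(3), y(2), u(3)]
```

The predictor of (1) is then trained on pairs (x(k), y(k+1)).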

1-4244-0332-4/06/$20.00 ©2006 IEEE


III. LS-SVM REGRESSION

In this section, we briefly discuss LS-SVM regression. For further details on the LS-SVM we refer to Ref. [5].

Consider a given training set of N data points {x_i, y_i}_{i=1}^N with input data x_i ∈ R^n and output y_i ∈ R. In the feature space, LS-SVM models take the form

y(x) = w^T φ(x) + b,   (3)

where the nonlinear mapping φ(.) maps the input data into a higher-dimensional feature space. Note that the dimension of w is not specified (it can be infinite-dimensional). In LS-SVM function estimation the following optimization problem is formulated:

min_{w,b,e} (1/2) w^T w + (C/2) Σ_{i=1}^N e_i^2,   (4)

subject to the equality constraints

y_i = w^T φ(x_i) + b + e_i,   i = 1, ..., N.   (5)

Here the quadratic programming problem has equality constraints. The problem is again convex and can be solved using Lagrange multipliers α_i. Using the Karush-Kuhn-Tucker (KKT) conditions we obtain the linear equations

[ 0   E^T         ] [ b ]   [ 0 ]
[ E   Q + C^-1 I  ] [ α ] = [ y ],   (6)

where y = [y_1, ..., y_N]^T, E = [1, ..., 1]^T, α = [α_1, ..., α_N]^T and Q_ij = φ(x_i)^T φ(x_j) = K(x_i, x_j), i, j = 1, ..., N.

The formulation and solution are presented in Ref. [5]; because the constraints are all equality constraints, the solution is given by solving a set of N + 1 linear equations. Note that C is a regularization factor. If C is large, the solution approaches the standard minimum mean squared error estimate. If C is made small, the regularization term (1/2) w^T w becomes more important and outlier points are de-emphasized.

This finally results in the following LS-SVM model for function estimation:

y(x) = Σ_{i=1}^N α_i K(x, x_i) + b,   (7)

where α_i and b are the solutions to the linear system, and the kernel K(x, x_i) represents the inner product in the high-dimensional feature space into which the input space is nonlinearly mapped. The LS-SVM approximates the function using (7).

Several choices are possible for the kernel function K(x_i, x_j). In this work, the radial basis function (RBF) is used as the kernel function of the LS-SVM, because RBF kernels tend to give good performance under general smoothness assumptions. Consequently, they are especially useful if no additional knowledge of the data is available. For RBF kernels one has

K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2)),   (8)

where σ is a positive real constant.

Note that the LS-SVM solution is still found by working in the dual space, but unlike standard support vector machine solutions, the LS-SVM uses equality constraints. Equation (6) is solved by finding a solution to a set of linear equations, which makes an online adaptive implementation of the algorithm feasible.

IV. ADAPTIVE LS-SVM FOR SYSTEM IDENTIFICATION

Here, we formulate an adaptive solution for the LS-SVM regression based on (6). We use a sliding window of length L. The training data are described by {X(k), Y(k)}, with inputs X(k) = [x_k, x_{k+1}, ..., x_{k+L-1}] and targets Y(k) = [y_k, y_{k+1}, ..., y_{k+L-1}]^T. At the moment k, define Q_ij(k) = K(x_{i+k-1}, x_{j+k-1}), i, j = 1, ..., L, α(k) = [α_k, α_{k+1}, ..., α_{k+L-1}]^T and b(k) = b_k; then (7) can be rewritten as

y(x) = Σ_{i=k}^{k+L-1} α_i(k) K(x, x_i) + b(k).   (9)

Let U(k) = Q(k) + C^-1 I, where I is the unit matrix; we then obtain the matrix equation

[ 0   E^T   ] [ b(k) ]   [ 0    ]
[ E   U(k)  ] [ α(k) ] = [ Y(k) ].   (10)

Setting P(k) = U(k)^-1, we can then use (10) to compute

b(k) = E^T P(k) Y(k) / (E^T P(k) E),   (11)

α(k) = P(k) ( Y(k) - E E^T P(k) Y(k) / (E^T P(k) E) ) = P(k) ( Y(k) - E b(k) ),   (12)

where

P(k) = U(k)^-1 = [Q(k) + C^-1 I]^-1
     = [ h(k)   H(k)^T ]^-1
       [ H(k)   D(k)   ]
     = [ 0   0       ]
       [ 0   D(k)^-1 ] + s_h(k) s_h(k)^T c_h(k),   (13)

h(k) = K(x_k, x_k) + C^-1,
H(k) = [K(x_{k+1}, x_k), ..., K(x_{k+L-1}, x_k)]^T,
D(k) = [ K(x_{k+1}, x_{k+1}) + C^-1   ...   K(x_{k+L-1}, x_{k+1})          ]
       [ ...                          ...   ...                            ]
       [ K(x_{k+1}, x_{k+L-1})        ...   K(x_{k+L-1}, x_{k+L-1}) + C^-1 ],
s_h(k) = [1, -H(k)^T D(k)^-1]^T,
c_h(k) = 1 / (h(k) - H(k)^T D(k)^-1 H(k)).

At the moment k + 1, the new data pair (x_{k+L}, y_{k+L}) enters the training data and the old data pair (x_k, y_k) is removed from the training data. The kernel matrix changes to Q_ij(k+1) = K(x_{i+k}, x_{j+k}), i, j = 1, ..., L, and then

P(k+1) = U(k+1)^-1 = [Q(k+1) + C^-1 I]^-1.   (14)
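Given one window {X(k), Y(k)}, the coefficients follow from a single (L+1)-dimensional linear system. A minimal sketch (Python with NumPy; `lssvm_fit`, `lssvm_predict` and all parameter values are our own illustration, not from the paper) solves the bordered system of (10) directly with a dense solver, which is algebraically equivalent to going through P(k) as in (11)-(12):

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    # Eq. (8): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, C, sigma):
    """Solve the bordered linear system of Eq. (10) for one window,
    returning the coefficients alpha and the bias b."""
    L = len(y)
    Q = np.array([[rbf_kernel(X[i], X[j], sigma) for j in range(L)]
                  for i in range(L)])
    U = Q + np.eye(L) / C                 # U(k) = Q(k) + C^-1 I
    A = np.zeros((L + 1, L + 1))
    A[0, 1:] = 1.0                        # E^T
    A[1:, 0] = 1.0                        # E
    A[1:, 1:] = U
    rhs = np.concatenate(([0.0], y))      # [0; Y(k)]
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                # alpha, b

def lssvm_predict(X, alpha, b, x, sigma):
    # Eq. (9): y(x) = sum_i alpha_i K(x, x_i) + b
    return sum(a * rbf_kernel(xi, x, sigma) for a, xi in zip(alpha, X)) + b

# Usage: fit one window of samples of a smooth function.
X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X).ravel()
alpha, bias = lssvm_fit(X, y, C=1e4, sigma=0.3)
```

For small windows a dense solve like this is adequate; the partitioned inverse of (13) matters when L is large or when updates must be cheap.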

The adaptive algorithm of the LS-SVM for finding the threshold value b(k) and the coefficients α(k) can be summarized as follows:
1) Initialization: k = 1.
2) The new data pair (x_{k+L}, y_{k+L}) comes in and the old data pair (x_k, y_k) is discarded, in order to obtain the training data (X(k), Y(k)).
3) Compute the kernel matrix Q(k) and P(k).
4) Compute b(k) and α(k), and predict y(k).
5) k ← k + 1; go to 2).
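The five steps above can be run end-to-end on logistic-map data (the system later used in Example 1). In the sketch below (Python with NumPy; all names are our own illustration, not from the paper), each window is refit by a dense solve of the system in (10) instead of propagating P(k) through the partitioned inverse of (13), so it demonstrates the window bookkeeping rather than the cheap update:

```python
import numpy as np

def rbf(a, b, sigma=0.5):
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def solve_window(X, t, C=1e4, sigma=0.5):
    # Direct solve of the bordered system for one window: returns (alpha, b).
    L = len(t)
    U = np.array([[rbf(X[i], X[j], sigma) for j in range(L)]
                  for i in range(L)]) + np.eye(L) / C
    A = np.block([[np.zeros((1, 1)), np.ones((1, L))],
                  [np.ones((L, 1)), U]])
    sol = np.linalg.solve(A, np.concatenate(([0.0], t)))
    return sol[1:], sol[0]

# Data: logistic map y(k+1) = lam * y(k) * (1 - y(k)), Eq. (16), lam = 3.9.
series = [0.3]
for _ in range(60):
    series.append(3.9 * series[-1] * (1.0 - series[-1]))
series = np.array(series)

# Steps 1)-5): slide a window of length L_win, refit, predict one step ahead.
L_win, errors = 20, []
for k in range(len(series) - L_win - 1):
    X = series[k:k + L_win].reshape(-1, 1)   # regressors x(i) = y(i)
    t = series[k + 1:k + L_win + 1]          # targets y(i + 1)
    alpha, b = solve_window(X, t)
    x_new = np.array([series[k + L_win]])
    pred = sum(a * rbf(xi, x_new) for a, xi in zip(alpha, X)) + b
    errors.append(abs(float(pred) - series[k + L_win + 1]))
```

The one-step-ahead errors stay small because (16) depends on y(k) alone, so a first-order regressor suffices for this system.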
V. SIMULATION STUDIES

Extensive simulation studies were carried out with several examples of nonlinear dynamic systems in order to verify the proposed adaptive LS-SVM in nonlinear system identification.

The root mean squared error (RMSE) metric is used to evaluate the performance of the adaptive LS-SVM identification in the examples. The RMSE of the validation set is calculated as follows:

RMSE = sqrt( (1/N) Σ_{k=1}^N (y(k) - ỹ(k))^2 ),   (15)

where N denotes the total number of data points in the validation set, y represents the output of the original nonlinear system and ỹ represents the estimated output of the adaptive LS-SVM model. The identification error is error(k) = y(k) - ỹ(k).

Fig. 1 The identification results using the proposed adaptive LS-SVM: (a) the system (solid) and the identified (dashed) output using the adaptive LS-SVM, and (b) the identification error (Example 1).

Example 1.
It is well known that the familiar model of the nonlinear system given by the difference equation known as the logistic map exhibits chaotic behavior in a certain region of the parameter space. The logistic map is given by the difference equation

y(k+1) = λ y(k)(1 - y(k)).   (16)

For values of λ > 3.5699, the time series of y(k) shows chaotic behavior. We consider the logistic map with λ = 3.9 in this work. Fig. 1 depicts the identification results of applying the proposed adaptive LS-SVM to Example 1.
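For Example 1, the data generation of (16) and the error metric of (15) are easy to reproduce. A minimal sketch follows (plain Python; the function names and the naive persistence baseline are our own illustration, not from the paper):

```python
import math

def logistic_series(y0, lam=3.9, n=100):
    # Eq. (16): y(k+1) = lam * y(k) * (1 - y(k)); chaotic for lam > 3.5699.
    ys = [y0]
    for _ in range(n - 1):
        ys.append(lam * ys[-1] * (1.0 - ys[-1]))
    return ys

def rmse(y_true, y_est):
    # Eq. (15): sqrt((1/N) * sum_k (y(k) - y~(k))^2)
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_est)) / n)

series = logistic_series(0.3, n=100)
# A persistence predictor y~(k) = y(k-1), a weak baseline on chaotic data:
naive = [series[0]] + series[:-1]
```

Any identification scheme for this example should beat such a baseline by a wide margin, since consecutive iterates of a chaotic map differ substantially.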
Example 2.
In this example, we consider a system described by the second-order difference equation

y(k+1) = y(k) y(k-1) (y(k) + 2.5) / (1 + y^2(k) + y^2(k-1)) + u(k),   (17)

where the system input is defined as u(k) = sin(2πk/25). Fig. 2 depicts the identification results of applying the proposed adaptive LS-SVM to Example 2.

Fig. 2 The identification results using the proposed adaptive LS-SVM: (a) the system (solid) and the identified (dashed) output using the adaptive LS-SVM, and (b) the identification error (Example 2).

Example 3.
The plant of the nonlinear system is given by the following difference equation:

y(k) = (0.2 y(k-1) + 0.6 y(k-2)) / (1 + y^2(k-1)) + sin(u(k-1)).   (18)

The reference input is selected as

u(k) = sin(2πk/250),                                 k ≤ 1000,
u(k) = 0.2 sin(2πk/200 + π/3) + 0.5 sin(4πk/300),    k > 1000.   (19)

The simulation result is shown in Fig. 3.

Fig. 3 The identification results using the proposed adaptive LS-SVM: (a) the system (solid) and the identified (dashed) output using the adaptive LS-SVM, and (b) the identification error (Example 3).

As can be seen from Figs. 1(a), 2(a) and 3(a), the identified outputs and the system outputs are largely indistinguishable, so almost perfect identification is achieved. Figs. 1(b), 2(b) and 3(b) show the estimation error for Examples 1-3; the corresponding RMSEs are 0.0031, 0.0044 and 0.0121, respectively.

VI. CONCLUSIONS

The standard LS-SVM algorithm is trained offline in batch mode, which is not suitable for practical applications such as online system identification, where the data arrive sequentially. In this paper, an online nonlinear system identification scheme based on the adaptive LS-SVM has been presented. Several different models were used to evaluate the identification power of the adaptive LS-SVM. The results indicate that this approach is effective. The adaptive LS-SVM may be used for online signal processing applications due to its low computational requirements and satisfactory performance.

ACKNOWLEDGMENT

This project was supported by the Zhejiang Provincial Natural Science Foundation of China (Y105281).

REFERENCES
[1] G.P. Liu, V. Kadirkamanathan, and S.A. Billings, "On-line identification of nonlinear systems using Volterra polynomial basis function neural networks," Neural Networks, vol. 11, no. 9, pp. 1645-1657, Dec. 1998.
[2] S. Lu and T. Basar, "Robust nonlinear system identification using neural network models," IEEE Trans. Neural Networks, vol. 9, pp. 407-429, May 1998.
[3] K.S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Networks, vol. 1, pp. 4-27, Mar. 1990.
[4] R. Griñó, G. Cembrano, and C. Torras, "Nonlinear system identification using additive dynamic neural networks - two on-line approaches," IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, vol. 47, pp. 150-165, Feb. 2000.
[5] J.A.K. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Processing Letters, vol. 9, pp. 293-300, Jun. 1999.
[6] M.Y. Ye and X.D. Wang, "Chaotic time series prediction using least squares support vector machines," Chinese Physics, vol. 13, pp. 454-458, Apr. 2004.

