
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 39, NO. 6, DECEMBER 1992, 511

Process Control by On-Line Trained Neural Controllers


Julio Tanomaru, Student Member, IEEE, and Sigeru Omatu, Member, IEEE

Abstract - Although neural controllers based on multilayer neural networks have demonstrated high potential in the nonconventional branch of adaptive process control called neurocontrol, practical applications are severely limited by the long training time that they require. This paper addresses the question of how to perform on-line training of multilayer neural controllers efficiently in order to reduce the training time. First, structures for a plant emulator and a controller, both based on multilayer neural networks, are described. Only a little qualitative knowledge about the process to be controlled is required. The controller must learn the inverse dynamics of the plant from randomly chosen initial weights. Basic control configurations are briefly presented, and new on-line training methods, based on performing multiple updating operations during each sampling period, are proposed and described in algorithmic form. One method, the direct inverse control error approach, is effective for small adjustments of the neural controller when it is already reasonably trained; another, the predicted output error approach, directly minimizes the control error and greatly improves the convergence of the controller. Simulation and experimental results using a simple plant show the effectiveness of the proposed neuromorphic control structures and training methods.

I. INTRODUCTION

ARTIFICIAL neural networks (ANNs) are mathematical systems designed to deliberately employ principles on which biological nervous systems are believed to be based. By embodying such principles, ANN modelers expect to be able to emulate the information processing capabilities of biological neural systems to some extent [1], [2]. The recent efforts in applying ANNs to the control of dynamical processes resulted in the fledgling, but very promising, field of neurocontrol [3], [4], which can be thought of as a nonconventional (connectionist) branch of adaptive control theory. The appeal of neurocontrol for control engineers can be primarily explained by three reasons: 1) biological nervous systems are living examples of intelligent adaptive controllers, 2) ANNs are essentially adaptive systems able to learn how to perform complex tasks, and 3) neurocontrol techniques are believed to be able to overcome many of the difficulties that

Manuscript received April 17, 1992; revised July 31 and August 22, 1992. The authors are with the Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770 Japan. IEEE Log Number 9204064. 0278-0046/92$03.00

conventional adaptive control techniques [5] suffer when dealing with nonlinear plants or plants with unknown structure. Generally, the training of ANNs involved in a control system can be performed on- or off-line, depending on whether they execute useful work or not while learning is taking place. Although off-line training is usually straightforward, conditions for assuring good generalization of the ANNs through the control space are difficult to attain, making on-line training always necessary in control applications. In fact, the training should ideally occur exclusively on-line, with ANNs learning at high speed from any initial set of weights. Neural or neuromorphic controllers (NCs), i.e., controllers based on an ANN structure, have been proposed to learn the inverse dynamics of the controlled plant from observation of the plant input-output relationship through time [6], [7]. Although such NCs have proven to be effective in controlling complex systems, the usually very long time required for training makes controllers obtained via conventional techniques preferable in many practical applications. The slow convergence of NCs implies poor control performance and robustness, especially during the first stages of training. In order to make neurocontrol a viable alternative for industrial process control, there is a pressing need for efficient on-line training algorithms for NCs.

This paper first presents general neuromorphic control structures based on multilayer ANNs. Only a little qualitative knowledge about the plant is necessary. Basic control configurations are briefly presented; although they perform well after enough learning, a long training time is necessary until good control performance is achieved. In conventional adaptive systems, in which one updating operation takes place in each sampling period, increasing the sampling rate seems to be a natural step toward improving performance. However, in many practical cases the sampling rate cannot exceed some limit, due to either physical or practical (technical) constraints. Considering such a limitation, new on-line training methods, based on efficient use of plant input-output data and a distinction between sampling and learning frequencies, are proposed in order to reduce NC training time, and are described in algorithmic form. Simulation and experimental results attest to the effectiveness of the proposed neuromorphic control structures and training methods.
© 1992 IEEE



II. NEUROMORPHIC CONTROL STRUCTURES


A. Problem Statement

Consider the discrete-time single-input-single-output (SISO) process

y(k + 1) = f[y(k), y(k - 1), ..., y(k - P + 1), u(k), u(k - 1), ..., u(k - Q)]   (1)

where y denotes the output, u is the input, k is the discrete-time index, P and Q are nonnegative integers, and f(.) is some function. In many practical cases, the plant input is limited in amplitude, i.e., there exist u_m and u_M such that, for any k,

u_m ≤ u(k) ≤ u_M.   (2)

In this paper, the task is to learn how to control the plant described in (1) in order to follow a specified reference r(k), minimizing some norm of the error e(k) = r(k) - y(k) through time. It is assumed that the only available qualitative a priori knowledge about the plant is that p and q, which are, respectively, estimates for P and Q in (1), are known. In other words, it is assumed that a rough estimate of the order of the plant to be controlled is given.

B. Multilayer Neural Networks for Control

Although several ANN architectures have been applied to process control, most of the incipient neurocontrol literature concentrates on multilayer neural networks (MNNs). MNNs are particularly attractive to control engineers for the following basic reasons:

1) MNNs are essentially feedforward structures in which the information flows forward, from the inputs to the outputs, through hidden layers. This characteristic is very convenient for control engineers, who are used to working with systems represented by blocks with inputs and outputs clearly defined and separated, and it is not present in recurrent networks (e.g., Hopfield's model [8]), in which bidirectional nodes appear and there are no true inputs and outputs.

2) MNNs with as few as one hidden layer using arbitrary sigmoidal activation functions are able to perform any nonlinear mapping between two finite-dimensional spaces to any desired degree of accuracy, provided there is a sufficient number of hidden units (neurons) [9], [10]. In other words, MNNs are very versatile mappings of arbitrary precision. In control, many of the blocks involved in a control system can usually be viewed as mappings and, therefore, can be emulated by MNNs with inputs and outputs properly defined.

3) The basic algorithm for learning in MNNs, the backpropagation (BP) algorithm [11], [12], belongs to the broad class of gradient methods largely applied in optimal control, and is, therefore, familiar to control engineers.

Points 1), 2), and 3) indicate that MNNs can be thought of as blocks with mapping learning ability. Based on such mapping ability, two general neural control structures can be proposed: a plant emulator and a controller [13], [14].

C. Neural Plant Emulator (PE)

Given the estimates p and q, an MNN with m = p + q + 1 inputs and a single output can be used for emulating f(.) in (1). Denoting the mapping performed by the plant emulator by φ_E(.), and its output by y_E, we have

y_E = φ_E(x_E)   (3)

where x_E is an m-dimensional vector. For x_E(k) = [y(k), ..., y(k - p + 1), u(k), ..., u(k - q)]^T, the emulator is trained in order to minimize a norm of the emulation error y(k + 1) - y_E. The PE is illustrated in Fig. 1(a), where z^{-1} stands for the time-delay operator.

D. Neural Controller (NC)

Assume that the plant in (1) is invertible, i.e., there exists a function g(.) such that

u(k) = g[y(k + 1), y(k), ..., y(k - p + 1), u(k - 1), u(k - 2), ..., u(k - q)].   (4)

Consider again an MNN with m-dimensional input vector x_C, single output u_1, and an input-output relationship briefly represented by

u_1 = φ_C(x_C)   (5)

where φ_C(.) denotes the input-output mapping of the MNN. If the output of φ_C(.) approximates the output of g(.) for corresponding inputs, the MNN can be thought of as a controller in the feedforward control path. At instant k, the input to the plant can be obtained from (5) by setting

x_C(k) = [r(k + 1), y(k), ..., y(k - p + 1), u(k - 1), ..., u(k - q)]^T   (6)

where the reference r(k + 1) is used instead of the unknown y(k + 1). After enough training of the NC, if the output error e(k) is kept small, it is possible to use

x_C(k) = [r(k + 1), r(k), ..., r(k - p + 1), u(k - 1), ..., u(k - q)]^T   (7)

emphasizing the feedforward nature of the NC. The basic NC configuration is illustrated in Fig. 1(b), where the two alternatives for the input vector, as given by (6) and (7), are briefly indicated.

III. TRAINING CONFIGURATIONS

The controller's training signal in Fig. 1(b) provides the information necessary for the NC to learn the inverse dynamics of the plant in such a way that an error function J defined on the plant output error e(k) = r(k) - y(k) is minimized. In order to enable learning based on a gradient method, it is necessary to compute the derivative of the error function J with respect to the output of the NC,



Fig. 1. General neuromorphic control structures. (a) Plant emulator (PE). (b) Neural controller (NC).

i.e., δ = -∂J/∂u_1 [13], [14]. The knowledge of δ suffices for updating the weights of the NC via backpropagation (BP). Three controller training schemes are briefly described here [13]-[16].

A. Direct Inverse Control

This configuration, depicted in Fig. 2(a), can be employed for on- or off-line training. However, since this control scheme relies on the NC's generalization ability, it should not be used alone as the only training scheme. On the other hand, training data for supervised learning can be obtained in a straightforward way. At time k + 1, for x'_C(k) = [y(k + 1), ..., y(k + 1 - p), u(k - 1), ..., u(k - q)]^T, the NC can be updated in order to minimize an error function J defined on the difference u(k) - u_1(k), where u_1(k) = φ_C[x'_C(k)]; therefore, the term δ = -∂J/∂u_1 can be easily calculated, enabling BP. For example, defining

J(k) = 0.5[u(k) - u_1(k)]²   (8)

the expression for δ (with subindex k) becomes merely

δ_k = u(k) - u_1(k).   (9)

Fig. 2. Control configurations. (a) Direct inverse control. (b) Direct adaptive control. (c) Indirect adaptive control.

B. Direct Adaptive Control

This configuration is shown in Fig. 2(b). Learning is performed essentially on-line, and the error function J is defined on the plant output error, as error functions in control systems usually are. The problem is that the exact calculation of δ (the sensitivity of J with respect to the output of the NC) requires knowledge of the Jacobian of the plant. For example, a straightforward error function and the corresponding δ term can be given by

J(k) = 0.5 e(k)²   (10)

δ_k = ξ_k e(k)[∂y(k)/∂u(k)]   (11)

where e(k) = r(k) - y(k) and the binary factor ξ_k is introduced to account for the constraints on the input u(k). Defining ζ(k) = e(k)[∂y(k)/∂u(k)], for the system described in (1) and (2), at instant k the factor ξ_k is expressed as

ξ_k = 0, if ζ(k) > 0 and u(k - 1) = u_M, or ζ(k) < 0 and u(k - 1) = u_m;
ξ_k = 1, otherwise.   (12)

The role of ξ_k is to avoid mistraining the NC on references that cannot be tracked. The inclusion of ξ_k is equivalent to considering the existence of a limiter between the output of the NC and the input of the plant. When the output error results from the physical limitations expressed in (2), the limiter saturates and a zero derivative (ξ_k = 0) is included in (11), inhibiting learning. On the other hand, when the reference can be physically followed, learning is performed in the usual way.
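As a minimal illustration of the gating in (10)-(12), the δ term of the direct adaptive control configuration can be sketched as follows. The plant Jacobian ∂y/∂u is assumed to be supplied (or replaced by its sign), and all function and variable names are illustrative, not from the paper:

```python
# Sketch of the direct adaptive control delta term of (10)-(12).
# The Jacobian dy/du is assumed known or replaced by its signum.

def delta_direct_adaptive(r_k, y_k, dy_du, u_prev, u_min, u_max):
    """Return delta_k = xi_k * e(k) * dy/du, where the binary factor xi_k
    inhibits learning when the previous input already sits on a bound."""
    e_k = r_k - y_k                      # output error e(k) = r(k) - y(k)
    zeta = e_k * dy_du                   # unconstrained gradient signal zeta(k)
    # xi_k = 0 when the gradient pushes the input further into a saturated bound
    if (zeta > 0 and u_prev >= u_max) or (zeta < 0 and u_prev <= u_min):
        xi_k = 0.0                       # reference cannot be tracked: inhibit learning
    else:
        xi_k = 1.0
    return xi_k * zeta
```

When the input is saturated at a bound and the error would push it further into that bound, the returned δ is zero, so a backpropagation step driven by it leaves the weights unchanged for that sample.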



C. Indirect Adaptive Control

Although in many practical cases the derivative in (11) can be easily estimated or replaced by 1 or -1, that is, by the signum of the derivative (since the algebraic sign of δ is enough to specify the direction of the gradient of J), this is not the general case. In the indirect adaptive control scheme shown in Fig. 2(c), a PE is used to compute the sensitivity of the error function J with respect to the controller's output [7], [13]-[16]. Since the PE is an MNN, the desired sensitivity can be easily calculated by using BP. Furthermore, the configuration in Fig. 2(c) is particularly useful when the inverse of the plant is ill defined, i.e., when the function f(.) in (1) does not admit a true inverse. For updating the controller, the NC and PE are viewed as the variable and fixed parts of an m-input-single-output MNN, respectively. At the outset, the PE should be trained off-line with a data set sufficiently rich to allow plant identification; then both the NC and the PE are trained on-line. In a sense, the PE performs system identification and, therefore, for rapidly changing systems, it is reasonable to update the PE more often than the NC due to robustness concerns.

IV. EFFICIENT ON-LINE TRAINING

MNNs and the BP algorithm were originally developed for pattern classification problems, where the training patterns are static, the training procedure and the error function are straightforward, and real-time learning is not required. In control, the training patterns for the ANNs change with time, several training algorithms and error functions can be defined, and real-time learning is necessary. Slow convergence is the most severe drawback of MNNs and seriously restricts practical applications of neurocontrol. Several approaches have been proposed toward convergence speedup in neurocontrol [13]. Among them, some common ideas are:

1) Development of efficient BP algorithms.
2) Embodiment of plant structural knowledge in the structure of the MNNs [16].
3) Hybrid systems in which ANNs are associated with control structures derived from nonneural techniques.
4) Prelearning and efficient initialization procedures.

Based on the distinction between the sampling frequency and the frequency at which learning iterations are performed (the learning frequency), new on-line training algorithms to reduce the NC's training time are introduced here. In discrete-time control systems, the sampling period T_s is usually chosen via rules of thumb in order to have 2π/T_s much larger than the largest frequency involved in the continuous-time system. It is normally true that increasing the sampling frequency improves the system's performance, but the noticeable improvement rapidly reaches a plateau. In usual adaptive control systems, the adaptive elements are normally updated once in each sampling period, in such a way that the sampling frequency and the updating (or adaptation, or learning) frequency may be used interchangeably. Ignoring processing time constraints, it seems that the actual training time can be reduced by increasing the sampling frequency. However, in many practical applications the sampling frequency cannot or should not exceed some limit. For instance, in common industrial chemical plants, one is usually interested in processes involving large time constants. There is not much sense in using high sampling rates, since information redundancy would occur. In fact, very high sampling frequencies can modify the control system completely and increase its complexity. It may be necessary to consider fast subprocesses and transients that could be ignored otherwise. Another case in which the sampling frequency cannot be made arbitrarily high occurs in distributed control systems, in which information is sent to and received from a control unit at intervals that are out of the control of the unit.

Although the sampling period T_s sets the basic pace of the control system, in systems with iterative learning the frequency at which learning occurs can be thought of as a different time basis. For most practical cases, T_s is much larger than T_L, the time spent in one learning iteration (an update of all network weights), and the ratio T_s/T_L tends to higher values as faster implementations of MNNs become available in hardware or software. Therefore, if appropriate plant input-output data is available and the only concern is time, many learning iterations can be performed during a sampling period, and the normal and simplest approach of a single update per period implies a waste of processing time. The problems are how to select appropriate training data and how to use such data and the available time to perform meaningful training of the neuromorphic structures, that is, training that is likely to improve control performance. Novel training methods in which several learning iterations take place during each single sampling period are proposed as follows.

A. Emulator's Training

Consider that at instant k + 1 the current plant output y(k + 1), the p + t - 1 previous values of y, and the q + t previous values of u are available in memory. Then the t pairs (x_E(k - i), y(k + 1 - i)), for i = 0, 1, ..., t - 1, can be used as patterns for training the PE at time k + 1. For y_E(k + 1 - i) = φ_E[x_E(k - i)], one possibility is to minimize the error function

J_E(k) = 0.5 Σ_{i=0}^{t-1} λ_i [y(k + 1 - i) - y_E(k + 1 - i)]²   (13)

where {λ_i} (1 ≥ λ_0 ≥ λ_1 ≥ ... ≥ λ_{t-1} ≥ 0) is a nonincreasing positive sequence whose role is to implement some forgetting, emphasizing the most recent patterns.

Example 1: Assume that y(10) was just read (and, therefore, y(11) is not available), p = 3, q = 2, and t = 3. Also assume that y(9), y(8), ..., y(5) and u(9), u(8), ..., u(5) are available in memory. Then the PE input vectors can be arranged as follows:

x_E(9) = [y(9), y(8), y(7), u(9), u(8), u(7)]^T
x_E(8) = [y(8), y(7), y(6), u(8), u(7), u(6)]^T
x_E(7) = [y(7), y(6), y(5), u(7), u(6), u(5)]^T.



These input vectors and the values y(8), y(9), and y(10) constitute the training patterns for training at time k + 1 = 10. This procedure is illustrated in Fig. 3(a), where the notation PE_{k,i} indicates the PE's state during the kth sampling interval, after the ith learning iteration, i = 0, 1, ..., t - 1. Equivalently, φ_E^{k,i}(.) denotes the input-output mapping performed by PE_{k,i}. Clearly, φ_E^{k+1,0}(.) = φ_E^{k,t-1}(.).
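The bookkeeping of Section IV-A, assembling the t pairs (x_E(k - i), y(k + 1 - i)) from stored history and weighting them by the forgetting sequence of (13), can be sketched as below. The dictionaries standing in for the stored y and u samples, and the generic emulate callable, are assumptions for illustration:

```python
# Sketch of assembling the t training pairs of Section IV-A from stored
# input-output history (Example 1: p = 3, q = 2, t = 3, y(10) just read).

def emulator_patterns(y_hist, u_hist, k, p, q, t):
    """Return the t pairs (x_E(k - i), y(k + 1 - i)), i = 0..t-1.
    y_hist and u_hist map time indices to stored samples."""
    patterns = []
    for i in range(t):
        x_E = [y_hist[k - i - j] for j in range(p)] \
            + [u_hist[k - i - j] for j in range(q + 1)]
        patterns.append((x_E, y_hist[k + 1 - i]))
    return patterns

def weighted_error(patterns, emulate, lambdas):
    """Error function (13): 0.5 * sum_i lambda_i * (y - y_E)^2,
    where emulate plays the role of the PE mapping phi_E."""
    return 0.5 * sum(lam * (target - emulate(x)) ** 2
                     for lam, (x, target) in zip(lambdas, patterns))
```

Calling emulator_patterns with k = 9, p = 3, q = 2, t = 3 reproduces the three input vectors of Example 1 together with their targets y(10), y(9), and y(8).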

B. Controller's Training: The Direct Inverse Control Error Approach

The approach described above has an equivalent for the NC. Consider the direct inverse control configuration (Fig. 2(a)) and assume that at instant k + 1 the current output y(k + 1), the p + t - 1 previous values of y, and the q + t previous values of u are all stored in memory. Then the t pairs (x'_C(k - i), u(k - i)), i = 0, ..., t - 1, for x'_C(k) = [y(k + 1), ..., y(k - p + 1), u(k - 1), ..., u(k - q)]^T, can be used as patterns for training the NC at time k + 1. Writing u_1(k - i) = φ_C[x'_C(k - i)], for the error function

J_C(k) = 0.5 Σ_{i=0}^{t-1} λ_i [u(k - i) - u_1(k - i)]²   (14)

the corresponding δ term for the ith pattern becomes simply

δ_{k,i} = λ_i [u(k - i) - u_1(k - i)].   (15)

Notice that the error function in (14) is not directly based on the plant output error. Hence, training of the neural controller does not improve control performance directly, unless learning has been carried out in such a way that good generalization through the control space can be expected. In fact, controller training based exclusively on the direct inverse control error approach generally leads to bad results; in practice, it has been observed that the output of the NC tends to stick at some constant value, resulting in zero training error but obviously poor control performance. This drawback, common to training methods based on the minimization of the inverse control error, can be overcome by combining such training methods with others that directly minimize the plant output error, as illustrated in the following example.

Example 2: Assume that y(9) was just read, p = 2, and q = 3. Also assume that the values y(8), y(7), ..., y(5) and u(8), u(7), ..., u(3) are available in memory. Denoting the current mapping performed by the NC by φ_C^{9,0}(.) (indicating k + 1 = 9, no learning yet), the plant input u(9) can be calculated from the relation u(9) = φ_C^{9,0}[x_C(9)], where x_C(9) = [r(10), y(9), y(8), u(8), u(7), u(6)]^T. For learning based on the direct inverse control error approach, the following vectors are then available:

x'_C(6) = [y(7), y(6), y(5), u(5), u(4), u(3)]^T
x'_C(7) = [y(8), y(7), y(6), u(6), u(5), u(4)]^T
x'_C(8) = [y(9), y(8), y(7), u(7), u(6), u(5)]^T.

These vectors and the plant input values u(6), u(7), and u(8) constitute three training patterns (input vector and desired output) available for training the NC at time k + 1 = 9. However, since this kind of training does not minimize the control error directly, in practice it is necessary to combine this approach with one of the methods described in Sections III-B and III-C. Fig. 3(b) shows the situation in which multiple learning based on the direct inverse control error and simple learning based on the indirect adaptive control configuration are combined, resulting in four learning iterations per sampling period. The vector x_C(8) is given by [r(9), y(8), y(7), u(7), u(6), u(5)]^T, the notation NC_{k,i} denotes the NC's state during the kth sampling interval, after the ith learning iteration (the corresponding mapping is given by φ_C^{k,i}(.)), and the PE is considered perfectly trained for the sake of simplicity.

Fig. 3. Simple multiple training schemes. (a) PE (Example 1). (b) NC, combining the direct inverse control error and indirect adaptive control approaches (Example 2).

C. Controller's Training: The Predicted Output Error Approach

A more complex approach for multiple training of the NC can be derived from the indirect adaptive control configuration (Fig. 2(c)). Assume that t reference values r(k + 1 - i), i = 0, 1, ..., t - 1, are also available at instant k + 1, in addition to t + p values of y, including y(k + 1), and t + q previous values of u. From (6) and (7), this is equivalent to having t input vectors x_C(k - i) in memory. According to (5), at instant k - i the control input u(k - i) was generated by

u(k - i) = φ_C^{k-i,0}[x_C(k - i)].   (16)
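Under the same kind of illustrative assumptions, the weighted inverse-control-error terms of (14) and (15) can be sketched as follows; the controller is abstracted as a plain callable, and the actual backpropagation step on the network weights is omitted:

```python
# Sketch of the delta terms of (15): past controller input vectors
# x'_C(k - i) are replayed through the current NC, and the stored plant
# inputs u(k - i) serve as targets. Names are illustrative.

def inverse_control_deltas(stored_x, stored_u, controller, lambdas):
    """Return the terms delta_{k,i} = lambda_i * (u(k-i) - u_1(k-i)) of (15).
    stored_x[i] is x'_C(k - i); stored_u[i] is the matching plant input."""
    deltas = []
    for i, (x, u_target) in enumerate(zip(stored_x, stored_u)):
        u_1 = controller(x)          # current NC output for the old input vector
        deltas.append(lambdas[i] * (u_target - u_1))
    return deltas
```

In Example 2, stored_x would hold x'_C(8), x'_C(7), and x'_C(6), with stored_u holding u(8), u(7), and u(6); each δ term would then drive one BP iteration within the sampling period.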



However, the NC has been updated several times between the time the vector x_C(k - i) was stored and the present instant k + 1. Hence, at time k + 1, the stored input vector x_C(k - i) would yield the virtual plant input

u*(k - i) = φ_C^{k+1,0}[x_C(k - i)].   (17)

This means that, although the most recent data x_C(k), with corresponding y(k + 1) and r(k + 1), can be directly used for training by one of the adaptive control configurations, the same does not happen with the past values x_C(k - i), for i = 1, ..., t - 1. For those cases, however, the corresponding plant responses can be predicted from the emulator, as shown in Fig. 4(a), by

y*(k + 1 - i) = φ_E^{k+1}[x*_E(k - i)]   (18)

where the second superscript index of φ_E(.) has been omitted for simplicity, and the vector x*_E(k - i) is given by

x*_E(k - i) = [y(k - i), ..., y(k - p + 1 - i), u*(k - i), ..., u*(k - q - i)]^T.   (19)

Fig. 4. Predicted output error approach. (a) Training scheme using errors between the reference and the output of the PE. (b) Multiple training of the NC by the predicted output error approach (Example 3).

The NC training can be understood by considering the NC and the PE as a single MNN: at time k + 1, for each input vector x_C(k - i), i = 1, ..., t - 1, there corresponds a predicted error r(k + 1 - i) - y*(k + 1 - i), and training is performed in order to minimize some norm of the predicted output errors. A possible error function would be

J_C(k) = 0.5 Σ_{i=0}^{t-1} λ_i [r(k + 1 - i) - y*(k + 1 - i)]².   (20)

In the training configuration shown in Fig. 4(a), the NC is trained on the error between the reference and the output of the PE, not on the error between the reference and the plant output, as in the direct adaptive control configuration of Fig. 2(b). The previous values of the plant output y(k) are still necessary for the input vectors to the NC and PE.

Example 3: Assume that y(9) was just read, p = 2, and q = 3. Also assume that y(8), y(7), ..., y(5) and u(8), u(7), ..., u(3) are available in memory, as well as r(9), r(8), and r(7). Therefore, the NC input vectors used to compute the three most recent control input values are available as follows:

x_C(6) = [r(7), y(6), y(5), u(5), u(4), u(3)]^T
x_C(7) = [r(8), y(7), y(6), u(6), u(5), u(4)]^T
x_C(8) = [r(9), y(8), y(7), u(7), u(6), u(5)]^T.

At time k + 1 = 9, a possible training procedure is shown in Fig. 4(b), where three learning iterations take place during a sampling period. The predicted plant output values are computed following the scheme in Fig. 4(a), enabling the utilization of x_C(6) and x_C(7) at time k + 1 = 9, whereas the most recent vector x_C(8) is used in the conventional way.

V. TRAINING ALGORITHMS

In the following, the on-line training methods proposed in the previous section are written as algorithms in a sort of pseudocode, making real implementation somewhat straightforward. The BP algorithm is assumed to be known and is briefly represented by a routine call of the type

BP(φ, x, t, o)

where BP stands for the routine's name, φ denotes the MNN to be updated, x is the input vector, t is the corresponding target (desired) output, and o is the actual output.

A. Multilearning Emulator

Assume that at instant k the vectors x_{E,i} = [y(k - 1 - i), ..., y(k - p - i), u(k - 1 - i), ..., u(k - 1 - i - q)]^T, for i = 0, ..., t - 1, are stored in memory. Clearly, the most recent data corresponds to i = 0, whereas higher values of i correspond to older data. Starting from the instant k, the algorithm for multiple training of the emulator can be summarized by:

Step 1) READ y(k)
Step 2) { emulator's training }
    i ← t - 1
    REPEAT
        y_{E,i} ← φ_E(x_{E,i})
        BP(φ_E, x_{E,i}, λ_i y(k - i), λ_i y_{E,i})
        i ← i - 1
    UNTIL (i < 0)
Step 3) { control input generation }
    x_C ← [r(k + 1), y(k), ..., y(k + 1 - p), u(k - 1), ..., u(k - q)]^T or [r(k + 1), r(k), ..., r(k + 1 - p), u(k - 1), ..., u(k - q)]^T
    u(k) ← φ_C(x_C)
Step 4) APPLY u(k) to the plant and WAIT a T_s
Step 5) { data shifting }
    i ← t - 1
    REPEAT
        x_{E,i} ← x_{E,i-1}
        i ← i - 1
    UNTIL (i = 0)
Step 6) { most recent data vector }
    x_{E,0} ← [y(k), y(k - 1), ..., y(k + 1 - p), u(k), ..., u(k - q)]^T
Step 7) k ← k + 1
Step 8) GOTO Step 1

B. Multilearning Controller: The Direct Inverse Control Error Approach

Assume that at instant k the vectors x'_{C,i} = [y(k - i), ..., y(k - p - i), u(k - 2 - i), ..., u(k - 1 - q - i)]^T, for i = 1, ..., t - 1, are available in memory. Starting from the instant k, the algorithm for the multilearning controller becomes:

Step 1) READ y(k)
Step 2) { most recent data vector }
    x'_{C,0} ← [y(k), ..., y(k - p), u(k - 2), ..., u(k - q - 1)]^T
Step 3) { controller's training }
    i ← t - 1
    REPEAT
        u_{1,i} ← φ_C(x'_{C,i})
        BP(φ_C, x'_{C,i}, λ_i u(k - 1 - i), λ_i u_{1,i})
        i ← i - 1
    UNTIL (i < 0)
Step 4) { control input generation }
    x_C ← [r(k + 1), y(k), ..., y(k + 1 - p), u(k - 1), ..., u(k - q)]^T or [r(k + 1), r(k), ..., r(k + 1 - p), u(k - 1), ..., u(k - q)]^T
    u(k) ← φ_C(x_C)
Step 5) APPLY u(k) to the plant and WAIT a T_s
Step 6) { data shifting }
    i ← t - 1
    REPEAT
        x'_{C,i} ← x'_{C,i-1}
        i ← i - 1
    UNTIL (i = 0)
Step 7) k ← k + 1
Step 8) GOTO Step 1

C. Multilearning Controller: The Predicted Output Error Approach

Assume that the data corresponding to the t + q + 1 vectors x_{C,i} = [r(k - i), y(k - 1 - i), ..., y(k - p - i), u(k - 2 - i), ..., u(k - q - 1 - i)]^T, or alternatively x_{C,i} = [r(k - i), r(k - 1 - i), ..., r(k - p - i), u(k - 2 - i), ..., u(k - q - 1 - i)]^T, are available in memory at instant k. Denoting by (φ_C + φ_E) the MNN formed by the association between the NC and the PE, starting from instant k the algorithm for multiple learning of the controller becomes:

Step 1) READ y(k)
Step 2) { controller's training via the predicted error approach }
    i ← t - 1
    REPEAT
        j ← 0
        REPEAT
            u*_j ← φ_C(x_{C,i+j})
            j ← j + 1
        UNTIL (j > q)
        { virtual input vector for the emulator }
        x_E ← [y(k - 1 - i), ..., y(k - p - i), u*_0, u*_1, ..., u*_q]^T
        { virtual output }
        y* ← φ_E(x_E)
        BP(φ_C + φ_E, x_{C,i}, λ_i r(k - i), λ_i y*)
        i ← i - 1
    UNTIL (i = 0)
Step 3) { conventional training using the most recent data }
    BP(φ_C + φ_E, x_{C,0}, λ_0 r(k), λ_0 y(k))
Step 4) { data shifting }
    i ← t + q + 1
    REPEAT
        x_{C,i} ← x_{C,i-1}
        i ← i - 1
    UNTIL (i = 0)
Step 5) { control input generation }
    x_C ← [r(k + 1), y(k), ..., y(k + 1 - p), u(k - 1), ..., u(k - q)]^T or [r(k + 1), r(k), ..., r(k + 1 - p), u(k - 1), ..., u(k - q)]^T
    u(k) ← φ_C(x_C)
Step 6) APPLY u(k) to the plant and WAIT a T_s
Step 7) k ← k + 1
Step 8) GOTO Step 1

VI. EVALUATION

A. Simulation Study

In order to evaluate the performance of the proposed control system and training methods, results from the simulation of a simple plant model are presented. Computer simulation is especially appealing when dealing with on-line training of neural-network-based control systems, since very time-consuming experiments are often necessary. Consider the continuous-time temperature control system

dy(t)/dt = f(t)/C + [Y_0 - y(t)]/(RC)   (21)

where t denotes time, y(t) is the output temperature in °C, f(t) is the heat flowing into the system, Y_0 is the room temperature (constant, for simplicity), C denotes the system's thermal capacity, and R is the thermal resistance



between the system borders and surroundings. Assuming that R and C are essentially constant, obtaining the pulse transfer function for the system in (21) by the step response criterion results in the discrete-time system

y(k + 1) = a(T_s) y(k) + b(T_s) u(k)   (22)

where k is the discrete-time index, u(k) and y(k) denote the system input and output, respectively, and T_s is the sampling period. Denoting by α and β some constant values depending on R and C, the remaining parameters can be expressed by

a(T_s) = e^{-αT_s}   and   b(T_s) = (β/α)(1 - e^{-αT_s}).   (23)

Fig. 5. Performance of the simulated system after learning.
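A minimal sketch of the discrete-time model (22) and the parameters (23) may help; it uses the water-bath constants reported for (24), includes the room-temperature term [1 - a(T_s)]Y_0 of (24) so that the plant rests at Y_0 for zero input, and omits the saturating nonlinearity of (24). All function names are illustrative:

```python
import math

# Water-bath constants (alpha, beta, Y0) as reported in the paper.
ALPHA, BETA, Y0 = 1.00151e-4, 8.67973e-3, 25.0

def plant_params(Ts):
    """Parameters of (23): a(Ts) = e^(-alpha*Ts), b(Ts) = (beta/alpha)(1 - e^(-alpha*Ts))."""
    a = math.exp(-ALPHA * Ts)
    b = BETA * (1.0 - a) / ALPHA
    return a, b

def step(y_k, u_k, Ts):
    """One step of the linear model (22) plus the room-temperature term of (24):
    y(k+1) = a*y(k) + b*u(k) + (1 - a)*Y0 (saturating nonlinearity omitted)."""
    a, b = plant_params(Ts)
    return a * y_k + b * u_k + (1.0 - a) * Y0
```

For T_s = 30 s the model holds its temperature at Y_0 under zero input and heats monotonically for a constant positive input, matching the qualitative behavior described for the water bath below saturation.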

For the simulation results presented below, the system described in (21)-(23) was modified to include a saturating nonlinearity so that the output temperature cannot exceed some limit. The simulated control plant is described by

y(k + 1) = a(T_s)y(k) + b(T_s)u(k)/(1 + exp(0.5y(k) - γ)) + [1 - a(T_s)]y_0,   (24)

where a(T_s) and b(T_s) are given by (23). The parameters for simulation are α = 1.00151 × 10^-4, β = 8.67973 × 10^-3, γ = 40.0, and y_0 = 25.0 (°C), which were obtained from a real water bath plant. The plant input u(k) was limited between 0 and 5, and it is also assumed that the sampling period is limited by

T_s ≥ 10 s.   (25)

With the chosen parameters, the simulated system is equivalent to a SISO temperature control system of a water bath that exhibits linear behavior up to about 70 °C, then becomes nonlinear, and saturates at about 80 °C. Comparing (24) with (1), it is clear that P = 1 and Q = 0. Simple three-layer MNN's with 10 to 20 hidden sigmoidal neurons were chosen for the PE and NC, and convergence was obtained for several pairs (p, q), with p ranging from 1 to 3 and q from 0 to 2. The graphs presented in the following correspond to p = 1 and q = 0 (perfect matching). The networks were updated according to the general rule

Δw^n = w^(n+1) - w^n = -η ∂J/∂w^n + α Δw^(n-1),

where the superscript indices denote learning iterations, w is a generic weight, J is the error function to be minimized, η > 0 is the learning rate, and α ≥ 0 is the momentum term. Before the outset, the PE is roughly trained off-line, whereas the weights of the NC are randomly initialized. From the initial condition y(0) = y_0, the target is to follow a control reference set to 35.0 °C for 0 ≤ t ≤ 30 min, 55.0 °C for 30 min < t ≤ 60 min, and 75.0 °C for 60 min < t ≤ 90 min. Each simulation cycle, for t ranging from 0 to 90 min, is called a trial. After a trial, the weights are conserved and a new trial starts. Fig. 5 shows the reference, and the plant input and output, after good convergence was achieved for T_s = 30 s.

The performance of NC's carrying out a single learning iteration per sampling period is compared with the multiple-learning-iteration case in Figs. 6-8. The graphs show the total squared error per trial as a function of the number of trials. In Fig. 6 the NC is initialized so as to yield a small error from the beginning. In the upper graph, training is performed once per period using the indirect adaptive control configuration; in the lower graph, the NC is updated 10 additional times per period using the direct inverse control error approach. In both cases the sampling period is T_s = 30 s. The proposed method improved performance as expected, since the NC is assumed to be relatively well trained from the outset, so that generalization is reliable. This suggests that the method can be used for small adjustments of the NC near a good operating point, i.e., for fine tuning. The same is not true when the NC is in a naive state; in fact, experience indicates that when the weights of the NC are such that the plant output error is large, training based exclusively on the inverse control error approach often leads to situations in which the output of the NC sticks at some value, resulting in poor performance.

The performance of a randomly initialized NC carrying out one learning iteration per period via the indirect adaptive control configuration is shown in Figs. 7(a) and 8(a) for T_s = 30 s. Since each simulation trial is equivalent to 90 min of real time, 50 trials would require 75 h, or more than three days; this is one of the reasons why most of the results presented here involve simulation rather than real experiments. In the lower graphs of Fig. 7 the sampling period was changed to 15 s (Fig. 7(b)) and 10 s (Fig. 7(c)), and the expected performance improvement was observed. In Fig. 8, on the other hand, the sampling period was fixed at 30 s, and 5 or 10 additional learning iterations per period were included, based on the proposed predicted output error approach (Figs. 8(b) and 8(c), respectively). The inclusion of only a few learning

Fig. 7. Neural controller randomly initialized, performing one learning iteration per sampling period. (a) T_s = 30 s. (b) T_s = 15 s. (c) T_s = 10 s.


Fig. 8. Neural controller randomly initialized for T_s = 30 s. (a) Only conventional on-line training. (b) Conventional training plus 5 learning iterations per period based on the predicted output error approach. (c) Conventional training plus 10 learning iterations per period.

iterations per period resulted in sharp convergence speedup, greatly reducing the total error. The time spent updating the neuromorphic structures is roughly proportional to the number of learning iterations, and depends basically on the structure and number of weights of the MNN being considered. In the simulation results presented here, 11 learning iterations of a three-layer NC with 80 weights took about 11 × T_L = 390 ms on a personal computer, far less than the sampling period in many practical control applications.

Experiments were carried out using the real water bath plant on which the simulation model was based. The plant consists of a laboratory 7-L water bath, as depicted in Fig. 10(a). A personal computer reads the temperature of the water bath through a link consisting of a diode-based temperature sensor module (SM) and an 8-bit A/D converter. The plant input produced by the computer is limited between 0 and 5, and controls the duty cycle of a 1.3-kW heater via a pulse-width-modulation (PWM) scheme. For plant order estimates given by p = 1 and q = 0, the PE and NC were designed as four-layer MNN's with m = 2 inputs normalized between -1 and +1, 6 sigmoidal neurons in each of the hidden layers, and linear output units. Before starting the real control operation, a train of pulses was applied to the plant, and the corresponding input-output pairs were recorded. The PE was then roughly trained with 10 sets of data chosen so as to span a considerable region of the control space. The NC was randomly initialized with small weights in order to avoid saturation of the sigmoidal neurons. The sampling period T_s was fixed at 30 s, and the control reference for each trial was set the same as in the simulation study. During each sampling period, the PE was updated 15 times, whereas the NC performed 11 learning iterations (10 iterations based on one of the proposed multiple-learning approaches, and 1 iteration corresponding to the indirect adaptive control scheme). During the first 10 trials, only the predicted output error approach was used for the NC's multiple learning. For the subsequent trials, we used a combination of 5 iterations based on the direct inverse control error approach, followed by 5 iterations based on the predicted output error approach. The learning parameters were chosen as η = 0.1 and α = 0.2 at the outset, and were reduced heuristically as the total squared error per trial decreased, in order to improve convergence.
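The per-period training schedule just described can be sketched as follows. This is a minimal Python sketch of the scheduling logic only; the three update callables are hypothetical placeholders for the backpropagation steps of the PE and NC, not the authors' implementation.

```python
def run_sampling_period(update_pe, update_nc_multiple, update_nc_indirect,
                        n_pe=15, n_nc_multiple=10):
    """One sampling period of the experimental training schedule:
    the PE is updated 15 times, then the NC performs 10 iterations of a
    multiple-learning approach plus 1 conventional indirect-adaptive
    iteration (11 NC learning iterations per period in total)."""
    for _ in range(n_pe):
        update_pe()            # plant-emulator learning iterations
    for _ in range(n_nc_multiple):
        update_nc_multiple()   # e.g., predicted output error approach
    update_nc_indirect()       # indirect adaptive control iteration
```

During the first 10 trials all 10 multiple-learning iterations would use the predicted output error approach; for subsequent trials, 5 direct inverse control error iterations followed by 5 predicted output error iterations, as described above.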
The resulting performance after 34 trials is shown in Fig. 10(b). Good results were also obtained for several different initial conditions and references, indicating good generalization of the NC and PE over the control space. In Fig. 10(c), the initial temperature was set to y_0 = 20.0 °C, and the control reference to 40.0 °C for 0 ≤ k ≤ 60, [40.0 + 0.5(k - 60)] °C for 60 < k ≤ 120, and 70.0 °C for 120 < k ≤ 180. Artificial disturbances corresponding to measurement errors in the plant output were added at k = 50 (+5.0 °C) and k = 150 (-5.0 °C), and fast adaptation was observed.
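The piecewise reference profile and the spike disturbances of this experiment can be written directly; the sketch below uses the values quoted in the text, with illustrative function names.

```python
def reference(k):
    """Reference profile of the disturbance experiment (Fig. 10(c)):
    40.0 C for 0 <= k <= 60, a 0.5 C/step ramp for 60 < k <= 120,
    and 70.0 C for 120 < k <= 180."""
    if k <= 60:
        return 40.0
    if k <= 120:
        return 40.0 + 0.5 * (k - 60)
    return 70.0

def measured_output(y_true, k):
    """Plant output as seen by the controller, including the artificial
    measurement-error spikes injected at k = 50 and k = 150."""
    if k == 50:
        return y_true + 5.0
    if k == 150:
        return y_true - 5.0
    return y_true
```

Note that the ramp meets the final setpoint exactly: at k = 120 the reference reaches 40.0 + 0.5 × 60 = 70.0 °C.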
Fig. 9. Effects of mismatching between the estimates and the optimal order of the plant model. (a) Perfect matching for p = P = 1, q = Q = 0. (b) Mismatching with convergence for p = 3, q = 0. (c) Mismatching without convergence for p = q = 2.

Fig. 10. Experimental results. (a) Diagram of the water bath control system. (b) System performance after enough learning. (c) System subjected to spikelike measurement errors.

VII. CONCLUSION

Multilayer neural controllers are able to implement effective nonlinear adaptive control. However, the usually long training time they require discourages practical industrial applications. In this paper, in order to reduce the training time significantly, new on-line training methods for multilayer NC's were proposed and described in algorithmic form. The basic idea is to perform several training iterations during a single sampling period, in such a way as to improve control performance effectively. The direct inverse control error approach relies on generalization ability, and thus performs well when the NC has already been reasonably trained; such an approach is therefore useful for fine tuning. The predicted output error approach, on the other hand, directly minimizes the control error and results in great convergence speedup even when the NC is randomly initialized. The proposed training methods provide an effective way to reduce the number of sampling periods needed until good convergence is achieved. Hybrid training methods combining the proposed approaches can be promptly devised, as well as extensions to multiple-input-multiple-output systems.

Significant performance improvement can be achieved with only a few additional training iterations per period. In the reported simulation example, the mere inclusion of 10 learning iterations per sampling period resulted in more than fivefold convergence speedup. The faster the MNN implementation, the larger the number of possible training iterations in a sampling interval; and the larger the number of learning iterations and training data pairs, the better the local generalization and the expected performance of the neurocontrol system.

During the design of the control system, it was assumed that good estimates for the order of the plant were available. Simulation results seem to indicate that the proposed NC and PE are relatively robust to small mismatching between the estimates p and q and the optimal values. Although theoretical results are still to be developed, the possibility of having neural structures robust to order mismatching is encouraging, since traditional adaptive control techniques are severely affected by model mismatching. When devising the NC, the plant was assumed to be essentially invertible. Such a strong hypothesis is greatly relaxed by the use of the neuromorphic PE, which allows simple computation of the sensitivity of the error function with respect to each weight of the NC. Many of the open problems are common to virtually all practical applications of MNN's, and include the nonexistence of systematic methods for quasi-optimal specification of the internal structure of the MNN's and of the learning rates, as well as of suitable initialization procedures.

REFERENCES

[1] R. P. Lippmann, "An introduction to computing with neural nets," IEEE ASSP Mag., vol. 4, no. 2, pp. 4-22, 1987.
[2] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vols. 1 and 2. Cambridge, MA: MIT Press, 1986.
[3] P. J. Werbos, "Backpropagation and neurocontrol: A review and prospectus," in Proc. IJCNN'89, Washington, DC, 1989, pp. I-209-I-216.
[4] A. G. Barto, "Connectionist learning for control," in W. T. Miller III, R. S. Sutton, and P. J. Werbos, Eds., Neural Networks for Control. Cambridge, MA: MIT Press, 1990, pp. 5-58.
[5] K. J. Åström and B. Wittenmark, Adaptive Control. Reading, MA: Addison-Wesley, 1989.
[6] D. Psaltis, A. Sideris, and A. A. Yamamura, "A multilayered neural network controller," IEEE Control Syst. Mag., vol. 8, no. 2, pp. 17-21, 1988.
[7] M. I. Jordan, "Generic constraints on underspecified target trajectories," in Proc. IJCNN'89, Washington, DC, 1989, pp. I-217-I-225.
[8] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Acad. Sci., vol. 79, pp. 2554-2558, 1982.
[9] K. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Networks, vol. 2, pp. 183-192, 1989.
[10] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359-366, 1989.
[11] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in D. E. Rumelhart and J. L. McClelland, Eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. Cambridge, MA: MIT Press, 1986, pp. 318-362.
[12] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Proc. IJCNN'89, Washington, DC, 1989, pp. I-593-I-605.
[13] J. Tanomaru and S. Omatu, "Towards effective neuromorphic controllers," in Proc. IECON'91, Kobe, Japan, 1991, pp. 1395-1400.
[14] J. Tanomaru and S. Omatu, "Efficient on-line training of multilayer neural controllers," to appear in J. SICE, 1992.
[15] K. Narendra, "Adaptive control using neural networks," in W. T. Miller III, R. S. Sutton, and P. J. Werbos, Eds., Neural Networks for Control. Cambridge, MA: MIT Press, 1990, pp. 115-142.
[16] K. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Networks, vol. 1, no. 1, pp. 4-27, 1990.

Julio Tanomaru (S'90) was born in 1964. He received the B.E. degree in electronics engineering in 1986 from the Instituto Tecnológico de Aeronáutica, Brazil, and the M.E. degree in information science from the University of Tokushima, Japan, in 1992. From 1987 to 1989 he worked on the design of hardware and software for digital communication systems. From 1989 to 1992 he was sponsored by a scholarship from the Japanese government. He is currently a doctoral candidate and a Research Associate in the Department of Information Science and Intelligent Systems at the University of Tokushima. His research interests include neural networks and their applications to adaptive control and optimization problems, cognitive science, genetic algorithms, and parallel computing.

Sigeru Omatu (M'76) was born in Ehime, Japan, on December 16, 1946. He received the B.E. degree in electrical engineering from the University of Ehime, Japan, in 1969, and the M.E. and Ph.D. degrees in electronics engineering from the University of Osaka Prefecture, Japan, in 1971 and 1974, respectively. From 1974 to 1975 he was a Research Associate, from 1975 to 1980 a Lecturer, and from 1980 to 1988 an Associate Professor at the University of Tokushima, Japan, where he has been a Professor since 1988. From November 1980 to February 1981 and from June to September 1986, he was a Visiting Associate in Chemical Engineering at the California Institute of Technology, Pasadena. From November 1984 to October 1985 he was a Visiting Researcher at the International Institute for Applied Systems Analysis, Austria. His current interests center on neural networks and distributed parameter system theory. Dr. Omatu received the Excellent Young Researcher Award from the Society of Instrument and Control Engineers of Japan in 1972 and a Best Paper Award from the Institute of Electrical Engineers of Japan in 1991. He is an Associate Editor of the International Journal of Modeling and Simulation (U.S.) and of the IMA Journal of Mathematical Control and Information (U.K.). He is a coauthor of Distributed Parameter Systems: Theory and Applications (Oxford University Press).
