$$E = \sum_i A_i - \sum_i O_i \qquad (4)$$
where the A nodes produce their outputs in proportion to their contribution to predicting the stress, while the O nodes inhibit the output E when necessary.
As shown in Figure 1, the Amygdala and the Orbitofrontal cortex receive the same input signals, except for the Thalamic signal, which goes directly to the Amygdala. The main difference between the two is their learning rules.
The connection weights $V_i$ are adjusted in proportion to the difference between the reinforcement signal and the activation of the A nodes. The term $\alpha$ is a constant used to adjust the learning speed:
$$\Delta V_i = \alpha \max\left(0,\; S_i\left(\text{stress} - \sum_j A_j\right)\right) \qquad (5)$$
As mentioned before, the task of the Amygdala is to learn the associations between the sensory and the emotional inputs in order to generate an output. Equation (5) differs from similar associative learning rules in that the weight adaptation is monotonic, i.e. the weights V cannot decrease. At first this may seem a drawback of the learning rule, but it has a biological motivation: once an emotional reaction has been learned in the Amygdala, it should become permanent. It is the task of the Orbitofrontal cortex to inhibit inappropriate reactions of the Amygdala.
The Orbitofrontal cortex learning rule is very similar to the Amygdala rule:
$$\Delta W_i = \beta S_i \left(E - \text{stress}\right) \qquad (6)$$
The reinforcement signal for the O nodes is defined as the difference between the model output E and the stress signal. In other words, the O nodes compare the expected and the received reinforcement signals, and inhibit the output of the model if there is a mismatch.
Learning based
emotional
controller
339
The main difference between the adaptation rules of the Orbitofrontal cortex and the Amygdala is that the Orbitofrontal connection weights can be both increased and decreased as needed to track the required inhibition of the Amygdala. The parameter $\beta$ is another learning rate constant.
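The two learning rules can be tried out in a few lines of Python. The following is a minimal sketch rather than the authors' implementation: it assumes that the node activations are $A_i = V_i S_i$ and $O_i = W_i S_i$, it leaves out the Thalamic connection, and the class name `Belbic` and the default learning rates are illustrative choices.

```python
import numpy as np


class Belbic:
    """Sketch of equations (4) to (6); node activations A = V*S, O = W*S are assumed."""

    def __init__(self, n_inputs, alpha=0.1, beta=0.1):
        self.V = np.zeros(n_inputs)  # Amygdala weights, never decreased
        self.W = np.zeros(n_inputs)  # Orbitofrontal weights, may grow or shrink
        self.alpha = alpha           # Amygdala learning rate
        self.beta = beta             # Orbitofrontal learning rate

    def output(self, S):
        """Equation (4): E = sum(A) - sum(O)."""
        S = np.asarray(S, dtype=float)
        A = self.V * S
        O = self.W * S
        return float(A.sum() - O.sum())

    def learn(self, S, stress):
        """Apply one step of equations (5) and (6); return the output before the update."""
        S = np.asarray(S, dtype=float)
        A = self.V * S
        E = self.output(S)
        # Equation (5): the max(0, .) makes the Amygdala update monotonic.
        self.V += self.alpha * np.maximum(0.0, S * (stress - A.sum()))
        # Equation (6): the Orbitofrontal update follows the sign of (E - stress).
        self.W += self.beta * S * (E - stress)
        return E
```

With a constant sensory input and a constant stress, repeated calls to `learn` drive the output toward the stress value while the entries of `V` never decrease, which is the behavior the text attributes to the two rules.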
As discussed, BELBIC learns from its emotional signal and produces its output based on the sensory inputs and connection weights. In Shahmirzadi and Langari (2005), the stability of BELBIC is demonstrated using the cell-to-cell mapping method.
3. Model free controller design
In order to accelerate the learning process and avoid making the system unstable, we propose a new approach. Figure 2 shows a flow diagram of this approach for training model free controllers for systems with an unstable equilibrium. The approach consists of two parts, imitative learning and performance enhancement. In the imitative learning phase, a simple stabilizing controller is used as the main control system and BELBIC learns to imitate the behavior of this controller. In other words, this controller acts as a mentor, and BELBIC tries to produce the same control force as the mentor for the observed states. In the second phase, after BELBIC has imitatively learned to stabilize the system from the initial controller, the controller is replaced with BELBIC. In this phase the control objectives change, and BELBIC tries to improve the performance of the control system instead of imitating the initial controller. Owing to its learning capability, BELBIC quickly learns to enhance the system's performance.
3.1 First benchmark: inverted pendulum system
As mentioned before, the first test bed for evaluating the controller performance is an inverted pendulum, which is a well-known SIMO system. Controlling the inverted pendulum is a challenging and interesting task: the controller must track the reference signal while stabilizing the pendulum. The system used to evaluate the controller performance is a nonlinear model of a laboratory inverted pendulum system provided by Feedback Ltd. This pendulum system consists of a cart, a rope, and a load. The load is regarded as a material particle of mass m. The rope is considered an inflexible rod of length l whose mass is negligible in comparison with the load mass. The cart, with mass M, moves on a straight rail. A schematic of the pendulum system is shown in Figure 3. The state equations of the system are (Feedback Instrument Ltd, 2002):
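$$(M + m)\,\frac{d^2}{dt^2}\left(x - l\sin\theta\right) = F - T_c$$
$$(M + m)\,\frac{d^2}{dt^2}\left(l\cos\theta\right) = V - (M + m)\,g$$
$$J\ddot{\theta} = (F - T_c)\,l\cos\theta + V l\sin\theta - D_p \qquad (7)$$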
where $T_c$ is the friction of the moving cart and $D_p$ is the friction moment of the angular movement of the pendulum load. The reaction force of the rail is denoted by V, and the moment of inertia of the cart and pendulum is denoted by J.
3.2 Second benchmark: Chua's circuit
The second test bed for evaluating the emotional controller performance is a chaotic system (Chua's circuit). Chua's circuit was the first electronic dynamical system
IJICC
3,2
340
Figure 2. Flowchart for model free controller design. Imitative learning phase: use a model-based controller as mentor; observe the control signal of the mentor and the sensory input; generate the stress signal for imitative learning; update the parameters of BELBIC according to the emotional stress; test whether the conditions for the imitative learning part are satisfied (No: continue imitative learning; Yes: switch the controller from mentor to BELBIC). Performance enhancement phase: observe the sensory input and generate the stress signal according to the performance measures; update the parameters of BELBIC according to the emotional stress.
capable of generating chaotic phenomena in the laboratory. Chua's circuit is shown in Figure 4, and the state equation of the system is (Jiang et al., 2002):
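$$\dot{X} = AX + Bu + g(X) + Dw, \qquad Y = CX$$
$$X = \begin{bmatrix} v_1 \\ v_2 \\ i \end{bmatrix},\quad A = \begin{bmatrix} -ac & a & 0 \\ 1 & -1 & 1 \\ 0 & -b & 0 \end{bmatrix},\quad B = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix},\quad g(X) = \begin{bmatrix} -a v_1^3 \\ 0 \\ 0 \end{bmatrix} \qquad (8)$$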
This system has three equilibrium points, and the equilibrium point $X = \begin{bmatrix} \sqrt{-c} & 0 & -\sqrt{-c} \end{bmatrix}^T$ is unstable.
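The equilibrium claim can be checked numerically. The sketch below is illustrative only: it uses the parameter values a = 10, b = 16, c = -0.143 and B = [0 0 1]^T that the paper lists for its simulations, drops the disturbance term Dw, and the function names `chua_rhs`, `equilibria` and `jacobian` are hypothetical helpers.

```python
import numpy as np

# Parameter values from the simulation setup of the paper (equation (11)).
a, b, c = 10.0, 16.0, -0.143

A = np.array([[-a * c, a, 0.0],
              [1.0, -1.0, 1.0],
              [0.0, -b, 0.0]])
B = np.array([0.0, 0.0, 1.0])


def g(X):
    """Cubic nonlinearity of equation (8); it acts on v1 only."""
    return np.array([-a * X[0] ** 3, 0.0, 0.0])


def chua_rhs(X, u=0.0):
    """Right-hand side of equation (8) without the disturbance term Dw."""
    return A @ X + B * u + g(X)


def equilibria():
    """The three equilibrium points of the uncontrolled system (requires c < 0)."""
    r = np.sqrt(-c)
    return [np.zeros(3), np.array([r, 0.0, -r]), np.array([-r, 0.0, r])]


def jacobian(X):
    """Jacobian of the uncontrolled dynamics at X."""
    J = A.copy()
    J[0, 0] -= 3.0 * a * X[0] ** 2
    return J


X_s = equilibria()[1]
print("residual at X_s:", chua_rhs(X_s))
print("eigenvalues at X_s:", np.linalg.eigvals(jacobian(X_s)))
```

An eigenvalue with positive real part at `X_s` confirms that the linearized system is unstable there, so a controller is needed to hold the circuit at this point.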
3.3 Proposed controllers for the inverted pendulum system
Owing to the nonlinearity of the system's state equations and the nonlinear properties of the driving motor and friction, designing a model-based controller is a hard task. We used BELBIC
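Figure 3. Schematic of pendulum system (trolley of mass M driven by force F, load of weight mg, cart position x and pendulum angle with their first and second time derivatives).
Figure 4. Chua's circuit (resistors R1 and R2, capacitors C1 and C2, inductor L, Chua's diode, voltages V1 and V2, current i).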
as a model free controller to control the inverted pendulum. The main challenge in using BELBIC as a model free controller for unstable systems, or for stable systems with an unstable equilibrium point such as our test bed, is the learning phase at the beginning. As BELBIC has no information about the system's dynamics, the performance of the controlled system may be poor at the beginning of the learning process, and the pendulum falls down. BELBIC learns fast, and in theory it should learn the proper control action from its sensory inputs and emotional stress within a short time. In this task, however, the pendulum angle is very sensitive to the control signal, so any wrong change in the control signal makes the system unstable or oscillatory. First, we used BELBIC as the only controller of the system. It is possible for BELBIC to learn the proper control strategy this way, but in our simulations we found that the process takes too long and is probably infeasible in real applications.
According to the idea of a hierarchical controller structure, to design a controller that satisfies various objectives, it is first assumed that the objectives can be decoupled, and a separate controller is then designed to satisfy each objective. After that, the outputs of these controllers must be fused together. Figure 5 shows the proposed BELBIC structure. As there are two major objectives, position tracking and pendulum angle regulation, two BELBICs are employed. The cart position error and its first derivative are defined as the sensory signals for one BELBIC, and the pendulum angle and its first derivative for the other. Most previously reported BELBIC structures have only one neuron, because the sensory input signal was one dimensional. In our structure, each BELBIC has two sensory inputs, so it must have more than one neuron; in this task two neurons seem to be adequate.
The emotional stress signal, which is described below, couples the two separate controllers. With this kind of stress signal there is also no need for a complex fusion block to combine the outputs of the controllers; a summation operator is adequate (Shahmirzadi et al., 2003). Meanwhile, the computational cost of output fusion is reduced to the cost of fusing some main and auxiliary objectives in
Figure 5. Diagram of the proposed BELBIC controller (two BELBIC blocks whose outputs are summed to drive the pendulum system, the model-based controller acting as mentor, stress generators 1 to 4, of which generators 3 and 4 are FISs, and FIS switching blocks).
the stress generator block. Moreover, changing the control objectives, or switching from imitative learning to normal learning, does not require any change in the controller structure; changing the emotional stress signal is enough.
Stress generation. As stated before, BELBIC can show various behaviors when different stress signals are applied to it. Therefore, in order to satisfy different control objectives, a proper stress signal must be defined for each objective. More objectives can be achieved by defining different stress signals.
Control objectives change at each learning phase:
(1) Imitative learning phase. Producing a similar control force and stabilizing the system.
(2) Performance enhancement. Reducing the position and angle errors of the pendulum and holding it at its equilibrium point.
As a result, it is necessary to define appropriate stress signals to satisfy each objective based on its importance.
Stress generation for imitative learning phase. In imitative learning, the objective is that BELBIC produces a control signal similar to that of the initial controller. Thus, reducing the difference between these two control signals is the main goal in this part, and there are no other control objectives. Consequently, the emotional stress signal is built from two signals, the error of the control signal $e_u$ and the first derivative of this error $\dot{e}_u$:
$$\text{Stress} = w_1 \left|e_u\right| + w_2 \left|\dot{e}_u\right| + w_3\, e_u \dot{e}_u \qquad (9)$$
The reason for adding $\dot{e}_u$ to the above sum is that $e_u$ may sometimes become zero while the two control signals behave completely differently. Also, by adding the last term, the designer can expect the errors to converge to zero faster. Based on the importance of each of these measures ($|e_u|$, $|\dot{e}_u|$, $e_u \dot{e}_u$), the designer can tune the weights. Some care must be taken, however. For example, if $w_3$ is relatively large, the BELBIC output may oscillate undesirably, or in some situations the stress becomes negative, which weakens the learning phase. These weights can be tuned using learning algorithms such as reinforcement learning (Sutton and Barto, 1998) if there is another source of feedback which evaluates how close the behavior of the imitator is to that of the mentor controller.
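Equation (9) is simple enough to evaluate directly. The snippet below is a small sketch with hypothetical weight values; it also illustrates the caveat that a large $w_3$ can make the stress negative when the error and its derivative have opposite signs.

```python
def imitative_stress(e_u, de_u, w1=1.0, w2=0.1, w3=0.05):
    """Equation (9): stress built from the control-signal error and its derivative.

    e_u  : mentor output minus BELBIC output
    de_u : time derivative of e_u
    The default weights are illustrative and are not taken from the paper.
    """
    return w1 * abs(e_u) + w2 * abs(de_u) + w3 * e_u * de_u


# Zero error but non-zero derivative still produces stress.
print(imitative_stress(0.0, 2.0, w1=1.0, w2=0.5, w3=0.1))

# A relatively large w3 can push the stress below zero.
print(imitative_stress(1.0, -1.0, w1=0.1, w2=0.1, w3=1.0))
```

In a simulation loop, `de_u` would typically be approximated by a finite difference of successive `e_u` samples.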
Stress generation for performance enhancement phase. After BELBIC has imitatively learned the control action from the initial controller, it replaces the initial controller and becomes the main controller of the system. At this point the control objectives change: reducing the position tracking error and the angle error become the new control objectives. Therefore, the stress signal must be modified.
To satisfy more than two objectives, a more complicated combination of the stress signals associated with each objective is necessary. To generate the proper stress signal for all objectives, the more important objective must at any time receive more attention than the others. To generate this coded attention mechanism, we used two approaches. The first is a nonlinear combination of stresses and the second uses linguistic rules.
Nonlinear combination of signals for stress generation. This stress signal is similar to the one used in our previous work on the control of an overhead crane (Arami et al., 2008), although the present work differs somewhat. In order to enhance the behavior of the system, the major objectives are defined. Moreover, some extra objectives should be attended to
improve the performance. The tracking error of the cart position and the error of the pendulum angle are the main concerns, and both need to be decreased as much as possible. To achieve these objectives (reducing the position and angle errors), the weighted square of the first and the absolute value of the second are summed to generate the first part of the stress signal.
One of the extra objectives is to avoid collision with the edges of the rail, which would interrupt the operation. To impose this behavior on the cart, closeness to the edges of the rail must be punished via the stress signal. Therefore, to generate the second part of the stress, a dead-zone function and a squaring function are employed, which generate extreme stress if the cart gets close to the edges. The dead-zone deactivates this part of the stress when the cart is far from the edges.
Another important index, which must be considered in every control task, is the energy of the control force and its variations. The amplitude of the control force and its derivative are squared, and their weighted sum is used to generate a further part of the stress signal. This part is then multiplied by a monotonically decreasing function of the sum of the two previously mentioned parts. As a result, when the stresses of the previous parts are small, BELBIC tries to decrease the control force; when these stresses are significant, the limit on the control force is relaxed to allow fast responses. The stress generator diagram is shown in Figure 6.
The first input is the error of the pendulum angle, the second is the error of the cart position, and the third is the position of the cart. After the third input passes through a dead-zone block, the signal represents closeness to the edges of the rail, and the stress of contact with the ends of the rail increases when this happens. The fourth input is the control signal, which is passed through a square function; its energy is treated as a stress that is multiplied by a monotonically decreasing function of the stress from the first three inputs. This means that when the sum of the first part of the stresses (inputs 1, 2, and 3) is high, the fourth one is suppressed. This nonlinear fusion of
Figure 6. Internal stress generator mechanism (block diagram with four inputs, absolute-value and squaring blocks, two dead zones, gains, a product block and summations producing one stress output). Note: Nonlinear combination of objectives.
signals to generate the emotional stress can be considered a coded attention to the most important part of the stresses with respect to the operating conditions and environmental effects.
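The structure of this nonlinear stress generator can be sketched as follows. The paper does not list the weights, the dead-zone width or the exact decreasing function, so every numeric value below and the choice of 1/(1 + s) as the monotonically decreasing function are assumptions made only for illustration.

```python
def dead_zone(x, half_width):
    """Zero inside [-half_width, half_width], shifted linear outside."""
    if x > half_width:
        return x - half_width
    if x < -half_width:
        return x + half_width
    return 0.0


def nonlinear_stress(angle_err, pos_err, cart_pos, u, du,
                     w_pos=1.0, w_ang=1.0, w_edge=10.0,
                     w_u=0.1, w_du=0.1, safe_half_width=0.3):
    """Nonlinear combination of objectives; all weights are hypothetical."""
    # Part 1: weighted square of the position error plus absolute angle error.
    tracking = w_pos * pos_err ** 2 + w_ang * abs(angle_err)
    # Part 2: squared dead-zone output, active only near the rail edges.
    edge = w_edge * dead_zone(cart_pos, safe_half_width) ** 2
    primary = tracking + edge
    # Part 3: energy of the control force and of its variation.
    effort = w_u * u ** 2 + w_du * du ** 2
    # Attention factor: decreases as the primary stress grows (assumed form).
    attention = 1.0 / (1.0 + primary)
    return primary + attention * effort
```

When the tracking and edge stresses are zero the effort term enters with full weight, and as they grow its share of the total shrinks, which mirrors the coded attention described above.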
Fuzzy stress generation. As mentioned before, there are several control objectives which must be attended to according to their importance and degree of satisfaction. Linguistic rules can implement this attention mechanism, especially when linguistic knowledge is available about the behavior of the system and the precedence of the objectives. Thus, the major and extra objectives of the control task are defined first.
The objectives are:
(1) Major objectives. Tracking the desired position of the cart and fixing the pendulum vertically.
(2) Extra objectives. Avoiding the edges of the rail, and minimizing the energy of the control force and its variations.
The concerns are similar to the previous part, i.e. satisfying the major objectives is more important than satisfying the extra objectives. Among the major objectives, holding the pendulum at its equilibrium is more important than reducing the tracking error. Moreover, once BELBIC has learned to track the desired position while holding the pendulum at its equilibrium point, it must try to reduce the control force and its variations. The variations of the control force, and especially their frequency, must be limited for the sake of the actuator.
To generate the stress signal and code the attention mechanism, linguistic rules are used and then imported into a Sugeno FIS (Takagi and Sugeno, 1983). Generating the stress signal in this way enables BELBIC to attend to the important parts of the stresses at any time and in any situation.
As mentioned before, four variables are employed to generate the stress signal. The FIS is designed with the following parameters:
(1) Inputs: error of the cart position, error of the pendulum angle, control force, first derivative of the control force.
(2) Number of rules: 16.
(3) FIS type: Sugeno FIS.
(4) Output: emotional stress for the BELBICs.
Figure 7 shows the resulting fuzzy surfaces for stress generation. It can be seen that holding the pendulum at its equilibrium is slightly more important than reducing the tracking error (Figure 7(a)), and that reducing the control force variations is less important than decreasing the tracking error (Figure 7(c)).
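To make the mechanism concrete, the sketch below implements a zero-order Sugeno inference with four inputs and 16 rules. The paper does not publish its rule base or membership functions, so this is a hypothetical stand-in: it assumes two labels ("low" and "high") per input, which gives 2^4 = 16 rules, ramp-shaped memberships with made-up scales, and constant consequents built from made-up priorities in which the angle error ranks slightly above the position error.

```python
from itertools import product


def high(x, scale):
    """Membership in 'high': a ramp on |x| / scale that saturates at 1."""
    return min(abs(x) / scale, 1.0)


def sugeno_stress(pos_err, ang_err, u, du,
                  scales=(0.1, 0.01, 1.0, 1.0),
                  priorities=(0.3, 0.4, 0.2, 0.1)):
    """Zero-order Sugeno FIS with 16 rules; scales and priorities are hypothetical."""
    mu_high = [high(x, s) for x, s in zip((pos_err, ang_err, u, du), scales)]
    num = 0.0
    den = 0.0
    for labels in product((0, 1), repeat=4):  # one rule per low/high combination
        # Firing strength: product of the antecedent memberships.
        w = 1.0
        for is_high, m in zip(labels, mu_high):
            w *= m if is_high else 1.0 - m
        # Constant consequent: sum of priorities of the inputs labelled 'high'.
        z = sum(p for is_high, p in zip(labels, priorities) if is_high)
        num += w * z
        den += w
    # Weighted average of the rule outputs (the firing strengths sum to 1 here).
    return num / den
```

With these made-up priorities a saturated angle error yields more stress than a saturated position error, reproducing the ordering of objectives described in the text; a real design would replace the consequents with values derived from the linguistic rules.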
3.4 Proposed controllers for Chua's circuit
Like the inverted pendulum system, Chua's circuit has an unstable equilibrium point. The goal is to regulate the variables around this equilibrium point. The BELBIC structure used in this task is similar to the one used for controlling the pendulum system (Figure 5). As there are two major objectives, regulating the capacitor voltage ($v_1$) and the inductor current ($i$), two BELBICs are employed. The voltage error and its first derivative are defined as the sensory signals for one BELBIC, and the current and its first derivative for the other. As in the previous part, each BELBIC has two neurons.
For this control task, the control objectives at each learning phase are:
(1) Imitative learning phase. Producing a similar control force and stabilizing the system.
(2) Performance enhancement. Regulating the voltages and current and holding them at the equilibrium point.
The stress generation for the imitative learning phase is the same as the method used for controlling the inverted pendulum system, which was defined by equation (9).
For the second phase of learning, the stress signal is generated using fuzzy rules. First, the major and extra objectives of the control task are defined as follows:
(1) Major objectives. Regulating the capacitor voltage and the current and fixing them at the equilibrium point.
(2) Extra objective. Minimizing the energy of the control force and its variations.
Figure 7. Fuzzy surfaces for stress generation. Each panel plots the stress against two inputs: (a) position error and angle error; (b) position error and control force; (c) position error and derivative of control force; (d) angle error and control force; (e) angle error and derivative of control force; (f) control force and derivative of control force.
The concerns are similar to the previous part, i.e. satisfying the major objectives is more important than satisfying the extra objectives. Moreover, once BELBIC has learned to hold these values at the equilibrium point, it must try to reduce the control force and its variations. To generate the stress signal and code the attention mechanism, linguistic rules are used and then imported into a Sugeno FIS. The fuzzy rules are the same as those used to control the inverted pendulum system. The only difference is the inputs of the FIS, which are: error of the capacitor voltage ($v_1$), error of the current ($i$), control force, and first derivative of the control force.
3.5 Switching between stress signals at each learning phase and controllers
As we observed in the experimental results, hard switching between controllers and stress signals makes the BELBICs unstable. Thus, instead of hard switching, a soft switching scheme must be employed: the BELBIC control system must gradually replace the initial controller, and at the same time its emotional stress must be changed gradually. To do this, we employed a FIS, which is a common solution for soft switching. Human linguistic rules can be imported into a Sugeno FIS (Takagi and Sugeno, 1983) easily. We used 11 fuzzy rules for soft switching. Figure 8 shows the fuzzy surfaces for this task. The inputs of this fuzzy system are the two stresses mentioned above (for imitative learning and for improving performance) and the error in the imitative learning phase (the difference between the initial controller output and the BELBIC output). This fuzzy switch is used for switching both the controllers and the stresses.
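The idea of soft switching can be illustrated with a much simpler stand-in for the 11-rule FIS. The sketch below replaces the fuzzy switch by a linear ramp in time and ignores the imitation error input, so it is not the authors' mechanism; the 30 s to 50 s window matches the interval over which the results section reports both controllers acting together, and the function names are hypothetical.

```python
def switch_weight(t, t_start=30.0, t_end=50.0):
    """Share of authority given to BELBIC: 0 before t_start, 1 after t_end."""
    if t <= t_start:
        return 0.0
    if t >= t_end:
        return 1.0
    return (t - t_start) / (t_end - t_start)


def blend(t, u_mentor, u_belbic, stress_imitative, stress_performance):
    """Blend control signals and stress signals with the same weight."""
    lam = switch_weight(t)
    u = (1.0 - lam) * u_mentor + lam * u_belbic
    stress = (1.0 - lam) * stress_imitative + lam * stress_performance
    return u, stress


# Before, during and after the hand-over.
for t in (20.0, 40.0, 60.0):
    print(t, blend(t, u_mentor=1.0, u_belbic=3.0,
                   stress_imitative=0.5, stress_performance=0.1))
```

Using one weight for both the control signal and the stress keeps the learning objective in step with the controller that is actually driving the plant; the fuzzy switch in the paper additionally conditions this transition on the imitation error.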
4. Results
In this part, the results of using the emotional controller and the model-based controller on the two test beds are presented.
Figure 8. Fuzzy surfaces for stress and controller switching. Three panels plot the stress against time and, respectively, the imitative stress, the performance stress, and the error.
4.1 First benchmark, inverted pendulum
To validate the proposed controller, its results are compared with those of the originally supplied controller, which consists of two proportional integral derivative (PID) controllers and a nonlinear compensator (Feedback Instrument Ltd, 2002). This original controller is also the initial controller (the mentor) in the imitative learning phase. As seen from Figure 9, without imitative learning BELBIC did not learn the proper control signal in more than 150 seconds of training, and the pendulum fell down many times. It should also be noted that comparing the proposed model free controller with complicated model-based controllers, designed from an exact mathematical model of the system, would not be a fair comparison. Therefore, the proposed controllers are compared only with the originally supplied controller, which plays the role of mentor in the imitative phase of learning, to show the effect of the enhancement phase of learning. In addition, to assess the effect of the different stress generation mechanisms, which are different coded attention mechanisms to the objectives, the nonlinear and fuzzy stress generations are compared.
Figure 10 shows the responses of the originally supplied PID controller. Figure 11 shows the BELBIC controller with the nonlinear stress function, and Figure 12 shows BELBIC with fuzzy stress. As can be seen, BELBIC completely imitates the behavior of the original controller within about ten seconds of the start. After that, based on the fuzzy switch structure, both controllers control the system from 30 to 50 seconds, and afterwards BELBIC controls the system alone. It is clear that after imitative learning, the performance of BELBIC in reducing the tracking and angle errors is better, regardless of the method of stress generation.
To evaluate the ability of the controller to reject disturbances, a random voltage drawn from a Gaussian distribution with zero mean and variance 0.1 is applied to the motor at certain instants. The time at which this voltage is applied is a random variable drawn from a uniform distribution. The disturbance is applied eight times between the 55th second and the end of the simulation. The results of the original
Figure 9. Employing BELBIC without imitative learning (pendulum angle over 0 to 150 on the time axis).
Figure 10. Results of originally supplied controller (cart position, desired and actual; pendulum angle; control force). Note: Double PID.
Figure 11. Results of BELBIC with nonlinear stress generation unit (cart position, desired and actual; pendulum angle; control force).
PID controller and BELBIC with the two stress generation functions are shown in Figures 13-15.
As can be seen, BELBIC clearly shows superior performance in tracking and disturbance rejection, which is a result of its learning capability.
To compare these controllers meaningfully, four performance measures are defined as follows and calculated for all the mentioned control systems: the originally supplied controller and BELBIC with the two methods of stress generation. As the disturbance is applied at randomly selected times, the experiments were carried out 20 times, and the statistical moments (mean and standard deviation) of the following parameters are calculated:
(1) Integral absolute error (IAE) (for cart position and pendulum angle).
(2) Integral of absolute values of control force (IACF).
(3) Integral of absolute values of the derivative of control force (IADCF) (shows the fluctuations of the control force).
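For sampled simulation data, these measures can be approximated numerically. The following sketch assumes that the signals are available as arrays on a common time grid; the function names, the trapezoidal integration and the finite-difference derivative are implementation choices made here, not details given in the paper.

```python
import numpy as np


def _integral(y, t):
    """Trapezoidal rule on a sampled signal."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def performance_measures(t, pos_err, ang_err, u):
    """Return the four measures: IAE (position), IAE (angle), IACF and IADCF."""
    t = np.asarray(t, dtype=float)
    u = np.asarray(u, dtype=float)
    du = np.gradient(u, t)  # finite-difference derivative of the control force
    return {
        "IAE_position": _integral(np.abs(pos_err), t),
        "IAE_angle": _integral(np.abs(ang_err), t),
        "IACF": _integral(np.abs(u), t),
        "IADCF": _integral(np.abs(du), t),
    }


# Example on synthetic signals: constant errors and a ramp control force.
t = np.linspace(0.0, 1.0, 101)
print(performance_measures(t, np.ones_like(t), -0.5 * np.ones_like(t), 2.0 * t))
```

Repeating the simulation 20 times and applying `np.mean` and `np.std` to each measure gives mean and standard-deviation columns of the kind reported for the disturbed case.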
These performance measures are calculated for the mentioned controllers in normal operation, without applying disturbance, and the results are given in Table I.
From Table I, it is seen that BELBIC shows a fast learning ability for tracking. Also, the control force, which is penalized by the stress signal, is lower than the control force of the other controllers and oscillates less. Moreover, employing FISs for stress generation leads to better results than the nonlinear stress generation function in BELBIC.
Figure 12. Results of BELBIC with fuzzy stress generation unit (cart position, desired and actual; pendulum angle; control force).
In the presence of disturbance, the above-mentioned measures are calculated for the controllers and the results are presented in Table II. As can be seen, in the presence of disturbance BELBIC (regardless of the method of stress generation) again shows far better performance than the model-based controller, although its performance
Figure 13. Results of originally supplied controller (double PID) in presence of disturbance (cart position, desired and actual; pendulum angle).
Figure 14. Results of BELBIC with nonlinear stress generation function in presence of disturbance (cart position, desired and actual; pendulum angle).
decreases slightly in comparison with normal operation. It also shows better disturbance rejection and robustness. The fuzzy stress resulted in more robust results and better performance.
4.2 Second benchmark, Chua's circuit
In this part, BELBIC is used as a controller to stabilize a chaotic system (Chua's circuit) at its unstable equilibrium point. Chua's circuit controlled by the state PI feedback controller is given by Jiang et al. (2002):
$$\dot{X} = AX + g(X) + B\left[KX + k\int_0^t \left(CX - CX_s\right)dt\right] \qquad (10)$$
where $X_s$ is the equilibrium point.
Figure 15. Results of BELBIC with fuzzy stress generation unit in presence of disturbance (cart position, desired and actual; pendulum angle).
Table I. Performance measures for various controllers without disturbance

Controller structure       IAE (position)   IAE (angle)   IACF     IADCF
BELBIC fuzzy stress        3.301            0.367         9.432    9.262
BELBIC nonlinear stress    3.480            0.398         11.916   12.231
Double PID                 3.925            0.457         12.219   14.353
Table II. Performance measures for various controllers in presence of disturbance (E: mean, STD: standard deviation)

                           IAE (position)    IAE (angle)      IACF              MADCF
Controller structure       E        STD      E       STD      E        STD      E        STD
BELBIC fuzzy stress        5.699    0.915    0.476   0.174    73.247   3.152    151.26   6.245
BELBIC nonlinear stress    6.577    1.139    0.676   0.183    76.834   4.678    157.53   5.569
Double PID                 11.725   2.141    1.692   0.371    82.705   3.247    160.58   7.831
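The system with the following parameters is used in the simulations:
$$a = 10,\quad b = 16,\quad c = -0.143,\quad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},\quad D = \begin{bmatrix} d \\ 0 \\ 0 \end{bmatrix},\quad C = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$$
$$K = \begin{bmatrix} -1.7714 & -0.2296 & -4.640 \end{bmatrix},\quad k = -6.9785 \qquad (11)$$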
The same strategy is applied here for training BELBIC, and there are two learning phases: imitative learning from the mentioned PI controller, and performance enhancement. An external disturbance, a step change at a randomly selected time, is applied to the equation. The magnitude of the disturbance (d) is drawn from a Gaussian distribution with zero mean and variance 0.1. Figure 16 shows the voltages and current in the presence of disturbance. As the disturbance is applied at randomly selected times, the experiments were carried out 20 times, and the statistical moments of the performance measures (mean and standard deviation) are calculated and presented in Table III. The results show that the proposed model free controller can hold the system at its equilibrium point with less energy and reject the disturbance more quickly, with lower control force.
Figure 16. Regulation of Chua's circuit in presence of disturbance. Panels plot, against time (0 to 100), (a) v1, (b) v2, (c) i and (d) the control force, each comparing the state PI regulator with BELBIC with fuzzy stress generation.
5. Conclusions
In this paper, a new approach in stress generation for emotional controllers was
presented. Meanwhile, a novel approach for employing model free controllers with
learning ability for controlling systems with unstable equilibriums was introduced.
This approach was based on imitative learning, in which the emotional controller rst
imitated froma simple stabilizing controller. Although BELBIC has rapid and powerful
learning capability, it could not be simply used to control unstable systems or systems
with unstable equilibriums. The experimental results showed that by employing
imitative learning, BELBIC could rapidly learn to produce appropriate control signals
for controlling a system with unstable equilibrium point. After it learned imitatively
from a simple classically designed controller, due to its learning ability, it could reduce
the tracking and angel errors more effectively. Moreover, it showed more robustness
facing disturbances. Another advantage of the proposed controller with fuzzy
combination of objectives as the stress generator parts is that; by considering extra
situated objectives, it produces smoother control force with lower energy.
The stress of BELBIC was generated by fuzzy rules, which it made BELBIC more
capable to attend each objective properly. The results showed that this kind of stress
generation led to superior performance in terms of tracking and angle errors than
alternative method for stress generation. Another interesting result was that BELBIC
with any stress signal had better performance in presence of disturbance than the
originally supplied controller and the PI feedback controller in case of Chuas circuit,
which were model-based controllers that well-tuned especially for the cases. This was
the effect of learning capability of BELBIC, which could produce more appropriate
control force at various working conditions, and the fuzzy combination of different
objectives which result in stress signals that delicately guide the BELBICs to learn.
Owing to fast changes in some of the BELBIC parameters, it is clear that BELBIC does
not learn the whole control policy for the system; rather, it learns the control policy
temporally, which decreases computational costs. The learned control policy seems
to depend on the operating condition, that is, the cart velocity and position, the pendulum
angle and angular velocity, and the satisfaction level of each objective.
Using a more complex fusion operator to combine the objectives for generating the
stress can be the next step of this work. Such fusion mechanisms can model the
attention paid to the objectives based on the states and the satisfaction degree of each
objective. Furthermore, based on a defined expected level of satisfaction for each
objective, the model of attending to each of the objectives (the combination of objectives) can
be learned using neural networks or other parametric structures. Learning how to
combine the objectives can be a big step toward automating decision making in
unknown environments.
Table III. Performance measures for various controllers in presence of disturbance

Controller structure            IAE(v1)          IAE(v2)          IAE(i)           IACF             MADCF
(in presence of disturbance)    E       STD      E       STD      E       STD      E       STD      E       STD
BELBIC fuzzy stress             1.315   0.224    0.461   0.103    3.174   0.637    26.472  2.789    76.311  4.784
State PI feedback               1.482   0.249    0.372   0.0821   3.795   0.982    35.853  3.471    89.179  5.135
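For readers who want to reproduce measures of the kind reported in Table III, the following sketch computes an integral of absolute error (IAE) from a sampled error trace and then summarises several runs by mean and standard deviation. It assumes that the E and STD columns denote the mean and standard deviation over repeated runs and that the sample standard deviation is intended; the error traces here are synthetic, and the functions iae and summarize are hypothetical helpers rather than the authors' code.

```python
import numpy as np


def iae(t, e):
    """Integral of absolute error, approximated by the trapezoidal rule."""
    t = np.asarray(t, dtype=float)
    abs_e = np.abs(np.asarray(e, dtype=float))
    return float(np.sum(0.5 * (abs_e[1:] + abs_e[:-1]) * np.diff(t)))


def summarize(values):
    """Mean and sample standard deviation over repeated runs."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1))


# Synthetic decaying error traces standing in for three disturbed runs
t = np.linspace(0.0, 5.0, 5001)
runs = [iae(t, amp * np.exp(-t)) for amp in (0.9, 1.0, 1.1)]
mean_iae, std_iae = summarize(runs)
print(mean_iae, std_iae)
```

The same pattern applies to any scalar measure computed per run, provided its definition is taken from the body of the paper.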
Note
1. The digital pendulum control system, crane system, manufactured by Feedback Instruments
Limited, England.
References
Alexander, C. (1979), The Timeless Way of Building, Oxford University Press, Oxford.
Arami, A., Javan Roshtkhari, M. and Lucas, C. (2008), A fast model free intelligent controller
based on fused emotions: a practical case implementation, Proceeding of the 16th
Mediterranean Conference on Control and Automation, Ajaccio, France, pp. 596-602.
Balkenius, C. and Moren, J. (1998), A computational model of emotional conditioning in the
brain, Proceedings of the Workshop on Grounding Emotions in Adaptive Systems, Zurich,
Switzerland.
Balkenius, C. and Moren, J. (2000), A computational model of emotional learning in the
Amygdala: from animals to animals, Proceedings of 6th International Conference on the
Simulation of Adaptive Behavior, MIT Press, Cambridge, MA, pp. 383-91.
Balkenius, C. and Moren, J. (2001), Emotional learning: a computational model of the
Amygdala, Cybernetics and Systems, Vol. 32, pp. 611-36.
Behenke, S. and Bennewitz, M. (2005), Learning to play soccer using imitative reinforcement,
Proceedings of International Conference on Robotics and Automation (ICRA), Workshop
on Social Aspects of Robot Programming through Demonstration, Barcelona, Spain,
pp. 18-22.
Burl, J.B. (1999), Linear Optimal Control: H2 and H∞ Methods, Addison-Wesley, Boston, MA.
Chella, A., Dindo, H. and Infantino, I. (2006), A cognitive framework for imitation learning,
Robotics and Autonomous Systems, Vol. 54, pp. 403-8.
Chella, A., Dindo, H. and Infantino, I. (2007), Imitation learning and anchoring through
conceptual spaces, Applied Artificial Intelligence, Vol. 21, pp. 343-59.
Farina, M., Deb, K. and Amato, P. (2004), Dynamic multiobjective optimization problems: test
cases, approximations, and applications, IEEE Trans. on Evolutionary Computation,
Vol. 8, pp. 425-42.
Feedback Instrument Ltd (2002), Digital Pendulum Control Experiments Manual,
33-935/936-1V60, Feedback Instrument Ltd, Crowborough.
Gholipour, A., Lucas, C. and Shahmirzadi, D. (2004), Purposeful prediction of space weather
phenomena by simulated emotional learning, IASTED International Journal of Modelling
and Simulation, Vol. 24, pp. 65-72.
Hsu, C.S. (1987), Cell to Cell Mapping: A Method of Global Analysis for Nonlinear Systems,
Springer, New York, NY.
Hsu, C.S. and Guttalu, R.S. (1980), An unraveling algorithm for global analysis of dynamical
systems: an application of cell-to-cell mapping, ASME Journal of Applied Mechanic,
Vol. 47, pp. 940-8.
Jafarzadeh, S., Jahed Motlagh, M.R., Barkhordari, M. and Mirheidari, R. (2008), A new Lyapunov
based algorithm for tuning BELBIC controllers for a group of linear systems, Proceedings
of the 16th Mediterranean Conference on Control and Automation, Ajaccio, France,
pp. 593-5.
Jamali, M.R., Arami, A., Dehyadegari, M. and Lucas, C. (2009), Emotion on FPGA: model driven
approach, Expert Systems with Applications, Vol. 36, pp. 7369-78.
Jamali, M.R., Pedram, A., Milasi, M.R. and Lucas, C. (2006), Design and implementation of
BELBIC pattern, Proceedings of 14th Iranian Conference on Electrical Engineering,
Tehran, Iran, pp. 436-41.
Jamali, M.R., Arami, A., Hosseini, B., Moshiri, B. and Lucas, C. (2008), Real time emotional
control for anti-swing and positioning control of SIMO overhead travelling crane, Int.
Journal of Innovative Computing, Information and Control, Vol. 4, pp. 2333-44.
Jiang, G.P., Chen, G. and Tang, W.K. (2002), Stabilizing unstable equilibrium points of a class of
chaotic systems using a state PI regulator, IEEE Trans. on Circuits and Systems I:
Fundamental Theory and Application, Vol. 49, pp. 1820-6.
Jin, Y. and Sendhoff, B. (2008), Pareto-based multiobjective machine learning: an overview and
case studies, IEEE Trans. on Systems Man, and Cybernetics Part C: Applications and
Reviews, Vol. 38, pp. 397-415.
Kuniyoshi, M.I. and Inoue, I. (1994), Learning by watching: extracting reusable task knowledge
from visual observation of human performance, IEEE Trans. on Robotics and
Automation, Vol. 10, pp. 799-822.
Latzke, T., Behenke, S. and Bennewitz, M. (2006), Imitative reinforcement learning for soccer
playing robots, Proceedings of the 10th RoboCup International Symposium, Bremen,
Germany, pp. 47-58.
Lopes, M. and Santos, V.J. (2005), Visual learning by imitation with motor representations,
IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 35, pp. 438-49.
Lucas, C., Shahmirzadi, D. and Sheikholeslami, N. (2004), Introducing BELBIC: brain emotional
learning based intelligent controller, International Journal of Intelligent Automation and
Soft Computing, Vol. 10, pp. 11-21.
Milasi, R.M., Jamali, M.R. and Lucas, C. (2007), Intelligent washing machine: a bioinspired and
multiobjective approach, International Journal of Control, Automation, and Systems,
Vol. 5, pp. 436-43.
Milasi, R.M., Lucas, C. and Araabi, B.N. (2004), Speed control of an interior permanent magnet
synchronous motor using BELBIC (brain emotional learning based intelligent controller),
Proceedings of 5th International Symposium on Intelligent Automation and Control, World
Automation Congress, Sevilla, Spain, Vol. 16, pp. 280-6.
Milasi, R.M., Lucas, C. and Araabi, B.N. (2006a), Intelligent modeling and control of washing
machine using locally linear neuro-fuzzy (LLNF) modeling and modified brain emotional
learning based intelligent controller (BELBIC), Asian Journal of Control, Vol. 8,
pp. 393-400.
Milasi, R.M., Lucas, C., Araabi, B.N., Radwan, T.S. and Rahman, M.A. (2006b), Implementation
of emotional controller for interior permanent magnet synchronous motor drive,
Proceedings of IEEE/IAS 41st Annual Meeting: Industry Applications, Tampa, FL, USA,
Vol. 4, pp. 1767-74.
Mobahi, H., Nili Ahmadabadi, M. and Nadjar Araabi, B. (2007), A biologically inspired method
for conceptual imitation using reinforcement learning, Journal of Applied Artificial
Intelligence, Vol. 21, pp. 155-83.
Montesano, L., Lopes, M., Bernardino, A. and Santos-Victor, J. (2008), Learning object
affordances: from sensory-motor coordination to imitation, IEEE Transactions on
Robotics, Vol. 24, pp. 15-26.
Moren, J. (2002), Emotion and learning: a computational model of the amygdala, PhD thesis,
Lund University, Lund.
Ogata, K. (1997), Modern Control Engineering, 3rd ed., Pearson Education, Harlow.
Shahmirzadi, D. (2005), Computational modeling of the brain limbic system and its application
in control engineering, MSc thesis, Texas A&M University, College Station, TX.
Shahmirzadi, D. and Langari, R. (2005), Stability of Amygdala learning system using cell-to-cell
mapping algorithm, Journal of Intelligent System and Control, Vol. 4, pp. 97-119.
Shahmirzadi, D., Lucas, C. and Langari, R. (2003), Intelligent signal fusion algorithm using
BEL brain emotional learning, Proceedings of 7th Joint Conference on Information
Sciences, JCIS03, 1st Symposium on Brain-Like Computer Architecture, Cary, NC, USA.
Sharba, M.A., Lucas, C., Mohammadinejad, A. and Yaghobi, M. (2006), Designing a football
team of robots from beginning to end, International Journal of Information Technology,
Vol. 3 No. 2, pp. 101-8.
Sheikholeslami, N., Shahmirzadi, D., Semsar, E., Lucas, C. and Yazdanpanah, M.J. (2006),
Applying brain emotional learning algorithm for multivariable control of HVAC
systems, International Journal of Intelligent and Fuzzy Systems, Vol. 17, pp. 35-46.
Sutton, R.S. and Barto, A.G. (1998), Reinforcement Learning: An Introduction, MIT Press,
Cambridge, MA.
Takagi, T. and Sugeno, M. (1983), Derivation of fuzzy control rules from human operator's
control actions, Proceedings of IFAC Symposium on Fuzzy Information, Knowledge
Representation and Decision Analysis, Marseille, France, pp. 55-60.
Tong, S. and Li, Y. (2007), Direct adaptive fuzzy backstepping control for a class of nonlinear
systems, International Journal of Innovative Computing, Information and Control, Vol. 3,
pp. 877-96.
Further reading
Merabian, A.R. and Lucas, C. (2007), Intelligent adaptive control of non-linear systems based on
emotional learning approach, International Journal on Artificial Intelligence Tools, Vol. 16,
pp. 69-85.
About the authors
Mehrsan Javan Roshtkhari was born in 1984 in Mashhad, Iran. He received his
BSc in Electrical Engineering from the University of Tehran (2006). He is
currently an MSc student in Control Engineering in the Electrical and Computer
Engineering Department, University of Tehran. He is also a Student Member of the
Control and Intelligent Processing Center of Excellence. His research interests
include pattern recognition, signal processing, emotional learning methods,
and model free control. Mehrsan Javan Roshtkhari is the corresponding author
and can be contacted at: m.javan@ece.ut.ac.ir
Arash Arami was born in 1983 in Tehran. He received his BSc in Electrical
Engineering from the University of Tabriz (2006). He is currently an MSc student
in Control Engineering in the Electrical and Computer Engineering Department,
University of Tehran. He is also a Student Member of the Control and Intelligent
Processing Center of Excellence. His research interests include attention
control, reinforcement learning and emotional learning, model free control, fuzzy
clustering, swarm intelligence, and signal processing.
Caro Lucas received the MS degree from the University of Tehran, Iran, in 1973,
and the PhD degree from the University of California, Berkeley, in 1976. He is a
Professor at the Department of Electrical and Computer Engineering, University
of Tehran, Iran, as well as a Researcher at the School of Cognitive Science, Institute
for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran. He has
served as the Director of Research Faculty of Intelligent Systems (RFIS), IPM
(1993-1997), Chairman of the ECE Department at the University of Tehran
(1986-1988), Managing Editor of the Memories of the Engineering Faculty, University of Tehran
(1979-1991), Associate Editor of Journal of Intelligent and Fuzzy Systems (1992-1999), and Chairman
of the IEEE, Iran Section (1990-1992). His research interests include biological computing,
computational intelligence, uncertain systems, intelligent control, neural networks, multi-agent
systems, data mining, business intelligence, financial modeling, and knowledge management.
Professor Lucas has served as the Chairman of several International Conferences. He was the
Founder of the RFIS, Center of Excellence on Control and Intelligent Processing, and has assisted in
founding several new research organizations and engineering disciplines in Iran.