Imitative Learning Based Emotional Controller For Unknown Systems With Unstable Equilibrium

Mehrsan Javan Roshtkhari, Arash Arami and Caro Lucas
Control and Intelligent Processing Center of Excellence,
Faculty of Engineering, School of ECE,
University of Tehran, Tehran, Iran
Abstract
Purpose: Applying intelligent, learning-based control directly to unidentified systems with unstable
equilibria is not always a proper control strategy and results in inferior performance in many cases.
Because the early period of learning proceeds by trial and error, the exploration needed to find the
appropriate control signals can itself lead to instability. Although the recently proposed emotional
controllers are capable of learning swiftly, using them alone is not an efficient solution to this
instability problem. A way to avoid instability in the preliminary phase of learning is therefore
needed. The purpose of this paper is to propose a novel approach for controlling unstable systems,
or systems with unstable equilibrium, by model free controllers.
Design/methodology/approach: An existing model-based controller with limited performance is used
as a mentor for the emotional learning controller in the first step. This learning phase prepares the
controller to control the plant as well as the mentor does, while preventing any instability. When the
emotional controller can properly imitate the behavior of the model-based one, control is gently
switched from the model-based controller to the emotional controller using a fuzzy inference system (FIS).
The emotional stress is likewise softly switched from the mentor-imitator output difference to a
combination of the objectives. In this paper, the emotional stresses are generated in two ways: once by
a nonlinear combination of objectives, and once by feeding different stresses to a FIS which attentionally
modulates the stresses and makes a subset of these objectives salient with regard to the current situation.
Findings: The proposed model free controller is employed to control an inverted pendulum system
and an oscillator with unstable equilibrium. Notably, the proposed controller is model free and does
not use any knowledge about the plant. The experimental results on the two benchmarks show the
superiority of the proposed imitative emotional controller with the fuzzy stress generation mechanism
over both the originally supplied model-based controllers and the emotional controller with the
nonlinear stress generation unit, in control of the pendulum system in all operating conditions.
Practical implications: Two test beds for evaluating the performance of the proposed model free
controller are discussed in this paper: a laboratory inverted pendulum system, which is a
well-known system with unstable equilibrium, and Chua's circuit, which is an oscillator with two
stable and one unstable equilibrium point. The results show that the proposed controller with the
described strategy can control both systems with satisfactory performance.
Originality/value: In this paper, a novel approach for controlling unstable systems, or systems with
unstable equilibrium, by model free controllers is proposed. This approach is based on imitative learning
in the preliminary phase of learning and soft switching to an interactive emotional learning. Moreover,
FISs are used to model the linguistic knowledge of the ascendancy and situated importance of the
objectives. These FISs are used to attentionally modulate the stress signals for the emotional controller.
The results of the proposed strategy on two benchmarks reveal the efficacy of this strategy of model free control.
Keywords Learning, Fuzzy logic, Systems and control theory, Controllers
Paper type Research paper
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/1756-378X.htm
IJICC
3,2
334
Received 16 November 2008
Revised 11 August 2009
Accepted 23 August 2009
International Journal of Intelligent
Computing and Cybernetics
Vol. 3 No. 2, 2010
pp. 334-359
© Emerald Group Publishing Limited
1756-378X
DOI 10.1108/17563781011049232
1. Introduction
There are several ways of sharing knowledge in multi-agent systems. One of the most prominent
mechanisms of knowledge sharing is imitation. Particularly for intelligent agents, imitative
learning is an approach to transfer knowledge from an expert agent to another agent without any
brain-to-brain transfer (Chellaa et al., 2006). Imitation speeds up learning and improves the
performance of the learner. In imitative learning, the agent tries to obtain the same result that the
mentor has achieved. It differs from mimicking because the imitator does not act exactly like the
mentor; it only performs an action which leads to the same result in the environment.
Imitative learning is employed in many tasks in which a novice agent learns the proper actions for
the task from an expert agent. It is widely used in robotics and human-machine interfaces
(Kuniyoshi and Inoue, 1994; Chellaa et al., 2007; Montesano et al., 2008; Lopes and Santos, 2005).
For example, an imitation framework based on concept learning is presented by Mobahi et al. (2007).
Moreover, the application of imitative learning to soccer playing robots is discussed
in Behenke and Bennewitz (2005) and Latzke et al. (2006).
Nowadays, the development of new biologically inspired algorithms is an area of interest.
One of the most important aspects of any intelligent system is its capability to learn and
adapt. Although the learning process can be carried out in various ways, the main aim is the
adaptation of parameters to improve the performance of the system and to overcome the
difficulties caused by changes in the environment (Shahmirzadi, 2005). As the emotional
behavior of humans and other animals is an important part of their intelligence, modeling this
process leads to an intelligent system with fast learning ability (Balkenius and Moren, 2001).
Although evolutionary mechanisms encode emotional reactions in animals, mammals can also learn
them very fast. In biological systems, emotional reactions are utilized for fast decision making in
complex environments or emergency situations. The main part of the mammalian brain which is
responsible for emotional processes is called the limbic system. Several attempts have
been made to model the limbic system (Balkenius and Moren, 1998; Moren, 2002). The
computational models of the Amygdala and the Orbitofrontal cortex, which are the main parts
of the limbic system in the brain, were first introduced in Balkenius and Moren (2000).
Subsequently, based on the work of Balkenius and Moren (2000), the brain emotional learning
based intelligent controller (BELBIC) was introduced in Lucas et al. (2004). The fast learning
ability of BELBIC makes it a powerful model free controller for many tasks. A description of BELBIC
in pattern format was first given in Jamali et al. (2006). A pattern describes a problem which occurs
over and over again in our environment, and then describes the core of the solution to that problem
(Alexander, 1979). Reusability, extendibility and implementation concerns on different platforms
were described in this pattern (Lucas et al., 2004). BELBIC has been applied in several
applications, such as control of intelligent washing machines (Milasi et al., 2006a, 2007) and
speed control of an interior permanent magnet synchronous motor (IPMSM) (Milasi
et al., 2004; Sheikholeslami et al., 2006); a modified version of BELBIC has also been employed for
controlling heating, ventilating and air conditioning systems. Moreover, BELBIC is
used in time series prediction (Gholipour et al., 2004) and sensor-data fusion
(Shahmirzadi et al., 2003). The real-time implementation of BELBIC for IPMSM
drives was first introduced in Milasi et al. (2006b). This implementation takes into account
the industrial concerns of implementing a controller. The controller was
Learning based
emotional
controller
335
successfully implemented in real time on a digital signal processor board for a
laboratory 1-hp IPMSM, and the results show fast response, simple implementation,
robustness with respect to uncertainties such as manufacturing imperfections, and
good disturbance rejection. Another real-time implementation of BELBIC, for position
tracking and swing damping of a laboratory overhead crane under computer control via
MATLAB external mode, is described in Jamali et al. (2008). In addition, BELBIC has
been used in many robots (Sharba et al., 2006), where it shows high performance and
is simple to implement in real time. Moreover, a flexible hardware
implementation of BELBIC has been realized on a field programmable gate array (FPGA) board
(Jamali et al., 2009). Benefits of this embedded BELBIC on FPGA include reusability,
scalability, interpretability, flexibility, robustness, and computational stability. The
sampling frequency of this controller on the FPGA is 1 kHz, which is appropriate for
many control engineering applications. In addition, nonlinear combinations of objectives
have been used to design emotional stresses for BELBIC so as to include more objectives, and were
used to control an overhead crane under uncertainties and disturbances (Arami et al., 2008).
The stability of the brain emotional learning (BEL) system, which is used in control as
BELBIC, was discussed in Shahmirzadi and Langari (2005). They analyzed the stability
of the BEL system by the cell-to-cell mapping method, which was initially developed as an
efficient numerical technique for global analysis of nonlinear systems (Hsu, 1987; Hsu
and Guttalu, 1980). To ensure stability in this sense, a general
idea for choosing control parameters is described in Shahmirzadi and Langari (2005).
A more recent stability analysis of BELBIC is given in Jafarzadeh et al. (2008),
which guarantees stability for first and second order linear systems based
on Lyapunov theory. This analysis provides some constraints on the learning rate, the
structure of the input signals for BELBIC, and the state equations as well. A more
complex Lyapunov function is needed to prove the stability of BELBIC in control of higher
order linear or nonlinear systems, and such functions could impose more constraints on the state
equations of the systems. These constraints imply that stability based on Lyapunov
theory cannot be achieved in all systems.
In most applications, BELBIC is employed as a model free controller. Model
free control is an approach to controlling systems with complex dynamics and various
uncertainties while avoiding the expense of system identification. To eliminate
system identification or to reduce its cost, designing controllers for systems which are
not completely identified has become an area of interest. In other words, the essence of
model free control is the ability to design a controller for a system by assuming
that only a simple description of the system dynamics is available.
Fuzzy control can be a solution, but it cannot deal properly with unrecognized
dynamics of the system (Tong and Li, 2007). To improve the performance of the system,
it is necessary to add compensators with learning ability which adapt themselves to
system variations during control tasks. Another approach is to design an intelligent
controller capable of learning the control signal. Such learning approaches should be
based on reinforcement learning methods in which a controller optimizes a fitness
function. It must be noted that in most control applications the designer has to
satisfy more than one objective to achieve the desired behavior of the system. To
deal with these multi-objective problems, different approaches are used, such as finding
the Pareto optimal points (Jin and Sendhoff, 2008; Farina et al., 2004) or using a fixed
linear combination of objectives as the cost or fitness function (Burl, 1999; Ogata, 1997).
In the general case, to improve the fitness value with respect to the different objectives
in the designer's mind, and to reflect the varying degree of importance of each of them
according to the states and the degrees of satisfaction, a nonlinear fusion of the
objectives is necessary to generate appropriate rewards and punishments for the learning
process (Arami et al., 2008). When the number of objectives increases (for example, more
than four objectives), some objectives are practically neglected in a linear combination
of objectives with fixed weights, owing to their weights.
The main drawback of model free controllers which learn without any
prior knowledge of the system's dynamics, such as reinforcement learning based
controllers and BELBIC, is that in the early stages of the learning process they may
perform poorly because they produce wrong control signals. This preliminary phase of
learning can result in instability in some cases. After this first period of learning, if no
instability occurs, the controller can learn the proper control signals and improve
performance gradually. Although BELBIC learns fast, it has the same
problem, only over a shorter period of time. If the system is inherently unstable, applying
these controllers may destabilize it, and the process must be
stopped in order to prevent damage. Thus, BELBIC cannot be applied to such systems
on its own. To solve this problem, another approach is introduced.
The aim of this paper is to implement an appropriate model free controller for
systems with unstable equilibrium, such as an inverted pendulum[1], and to introduce
a new mechanism for the stress generation of BELBIC which modulates the attention of the controller.
Because the pendulum angle is sensitive to the control signal, any wrong change in the
control signal makes the system oscillate and the pendulum fall down. Therefore, if
BELBIC is used as a totally model free controller which learns from scratch, the learning
will be useless, and the pendulum will fall down many times. To solve this problem
and accelerate the learning phase, a new approach is used. First, BELBIC imitatively
learns from a simple classical controller which is designed based on a model of the system.
The classical controller can be a simple one that only stabilizes the system, regardless
of performance and robustness. Then the output of BELBIC is gradually applied
to the system and replaces the initial controller. The important part of switching
between controllers is changing the emotional signal of BELBIC to reflect the change in
objective. While BELBIC learns to imitate the behavior of the initial controller, the objective
is to reduce the error between the controllers' outputs; once BELBIC replaces the initial
controller, the objective is to reduce the tracking and angle errors. To generate the
emotional stresses in the second phase of learning, it is necessary to attend to the more
important objectives at any time. Two methods are applied to generate this coded attention
mechanism. First, a nonlinear combination of objectives is used to produce
proper stresses for BELBIC. Second, a set of linguistic rules, implemented in a fuzzy
inference system (FIS), is used to generate the stresses.
This paper is organized as follows: Section 2 briefly introduces BELBIC; Section 3
covers the crucial aspects of the proposed controller and describes the control
strategy and the structure of the controllers; Section 4 discusses the simulation results; and
finally Section 5 concludes the paper.
2. BELBIC
The BELBIC structure is a simple computational model of the most important parts of the
limbic system of the brain, the Amygdala and the Orbitofrontal cortex. Figure 1 shows the
schematic diagram of the BELBIC structure; each part of it is described briefly below
(Balkenius and Moren, 2000).
As shown in Figure 1, the system consists of four main parts. Sensory input signals
first enter the Thalamus, a simple model of the real thalamus in the brain, in which
some simple pre-processing of the sensory input signals is done.
After pre-processing in the Thalamus, the signal is sent to the Amygdala and the Sensory
cortex. The Sensory cortex is responsible for subdivision and discrimination of the coarse
output from the Thalamus, and sends its output to the Amygdala and the Orbitofrontal cortex.
The Amygdala is a small structure in the medial temporal lobe of the brain which is thought to
be responsible for the emotional evaluation of stimuli. This evaluation is in turn used
as a basis for emotional states and emotional reactions, and is used to signal attention and
to lay down long-term memories. The last part, the Orbitofrontal cortex, is supposed
to inhibit inappropriate responses from the Amygdala, based on the context given by
the hippocampus (Balkenius and Moren, 2000). In this section, the functionality of these
parts and the learning algorithm follow Balkenius and Moren (2000).
Figure 1. Structure of BELBIC. [Diagram: the sensory input (s) enters the Thalamus and the Sensory cortex; Amygdala nodes A and A_th carry weights V_1 to V_3 and V_th; Orbitofrontal cortex nodes O carry weights W_1 to W_3; a primary reward (rew) input; outputs E and E'; connection types: inhibitory, excitatory, plastic, learning.] Source: Balkenius and Moren (2000)
As the Thalamus must provide a fast response to stimuli, in this model the
maximum signal over all sensory inputs S is sent directly to the Amygdala as another
input (equation (1)). Unlike the other inputs to the Amygdala, the Thalamic input is not
projected into the Orbitofrontal cortex, so it cannot itself be inhibited:
$$S_{th} = \max_i \left(S_i\right) \qquad (1)$$
In the Amygdala, each A node has a plastic connection weight V. The sensory input is
multiplied by the weight to form the output of the node:
$$A_i = S_i V_i \qquad (2)$$
In the Orbitofrontal cortex, each O node is similar to the A nodes, and its output is calculated by
applying the connection weight W to the input signal:
$$O_i = S_i W_i \qquad (3)$$
The model output can be computed as follows:
$$E = \sum_i A_i - \sum_i O_i \qquad (4)$$
where the A nodes produce their outputs in proportion to their contribution in
predicting the stress, while the O nodes inhibit the output E if necessary.
As shown in Figure 1, except for the Thalamic signal going directly to the
Amygdala, the Amygdala and the Orbitofrontal cortex receive the same input signals.
The main difference between them is the learning rule.
The connection weights $V_i$ are adjusted in proportion to the difference between the
reinforcement signal and the activation of the A nodes. The constant $\alpha$ is used to
adjust the learning speed:
$$\Delta V_i = \alpha \max\left(0,\; S_i \left(\text{stress} - \sum_j A_j\right)\right) \qquad (5)$$
As mentioned before, the task of the Amygdala is to learn the associations between
the sensory and the emotional inputs in order to generate an output. Equation (5)
differs from similar associative learning rules in that this weight
adaptation rule is monotonic, i.e. the weights V cannot decrease. At first this may
seem a drawback of the learning rule, but the adaptation rule has a biological motivation:
according to what occurs in the Amygdala, once an emotional reaction is learned, it
should become permanent. It is the Orbitofrontal cortex that inhibits inappropriate
reactions of the Amygdala.
The Orbitofrontal cortex learning rule is very similar to the Amygdala rule:
$$\Delta W_i = \beta S_i \left(E - \text{stress}\right) \qquad (6)$$
The reinforcement signal for the O nodes is defined as the difference between the model
output E and the stress signal. In other words, the O nodes compare the expected and the
received reinforcement signals, and inhibit the output of the model if there is a mismatch.
The main difference between the adaptation rules of the Orbitofrontal cortex and the Amygdala
is that the Orbitofrontal connection weights can both increase and decrease as needed
to track the required inhibition of the Amygdala. The parameter $\beta$ is another learning rate
constant.
As discussed, BELBIC learns from its emotional signal and produces its output
based on the sensory inputs and connection weights. In Shahmirzadi and Langari (2005),
the stability of BELBIC is demonstrated using the cell-to-cell mapping method.
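As an illustration of how equations (1) to (6) fit together in one update step, a minimal Python sketch is given below. It is not the authors' implementation: the class name `Belbic`, the zero initial weights, the default learning rates, and the choice to include the thalamic node in the Amygdala sum are assumptions made for the example.

```python
class Belbic:
    """Minimal sketch of the BEL model of equations (1)-(6) (illustrative only)."""

    def __init__(self, n_inputs, alpha=0.1, beta=0.1):
        self.alpha = alpha                 # Amygdala learning rate (assumed value)
        self.beta = beta                   # Orbitofrontal learning rate (assumed value)
        self.v = [0.0] * (n_inputs + 1)    # Amygdala weights V_i, last entry is V_th
        self.w = [0.0] * n_inputs          # Orbitofrontal weights W_i

    def step(self, s, stress):
        """Compute the output E for sensory inputs s, then adapt the weights."""
        s_th = max(s)                                       # eq. (1): thalamic input
        s_amy = list(s) + [s_th]                            # Amygdala also sees S_th
        a = [si * vi for si, vi in zip(s_amy, self.v)]      # eq. (2)
        o = [si * wi for si, wi in zip(s, self.w)]          # eq. (3)
        sum_a = sum(a)
        e = sum_a - sum(o)                                  # eq. (4)
        # eq. (5): monotonic rule, the Amygdala weights can never decrease
        for i, si in enumerate(s_amy):
            self.v[i] += self.alpha * max(0.0, si * (stress - sum_a))
        # eq. (6): the Orbitofrontal weights may increase or decrease
        for i, si in enumerate(s):
            self.w[i] += self.beta * si * (e - stress)
        return e
```

Calling `step` once per sampling instant with the current sensory inputs and stress returns the controller output E computed with the weights from before the update.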
3. Model free controller design
To accelerate the learning process and avoid destabilizing the system, we
propose a new approach. Figure 2 presents a flow diagram of this approach for training model
free controllers for systems with unstable equilibrium. The
approach consists of two parts, imitative learning and performance enhancement. In the
imitative learning phase, a simple stabilizing controller is used as the main control system
and BELBIC learns to imitate the behavior of this controller. In other words, this controller
acts as a mentor and BELBIC tries to produce the same control force as the mentor
according to the observed states. In the second phase, after BELBIC has imitatively learned
to stabilize the system from the initial controller, that controller is replaced with BELBIC.
In this phase, the control objectives change and BELBIC tries to improve the performance
of the control system instead of imitating the initial controller. Owing to its learning
capability, BELBIC quickly learns to enhance the system's performance.
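The hand-over between the two phases can be pictured as a blend of the mentor and BELBIC outputs whose weight grows once imitation is accurate enough. The Python sketch below is only an illustration under stated assumptions: the paper performs the switch with a FIS, whereas the linear ramp used here, the function names `update_gain` and `blend_control`, the error threshold, and the ramp rate are all hypothetical.

```python
def update_gain(gain, imitation_error, threshold=0.05, rate=0.01):
    """Raise the BELBIC share of the control signal while imitation is accurate.

    Stand-in for the FIS-based soft switch: a plain ramp clipped to [0, 1].
    The threshold and rate are assumed values, not taken from the paper.
    """
    if abs(imitation_error) < threshold:
        gain += rate
    return min(1.0, max(0.0, gain))


def blend_control(u_mentor, u_belbic, gain):
    """Convex combination: gain = 0 is pure mentor, gain = 1 is pure BELBIC."""
    return (1.0 - gain) * u_mentor + gain * u_belbic
```

With this choice the mentor keeps full authority for as long as the imitation error stays above the threshold, which mirrors the requirement that BELBIC should not act on the plant before it can stabilize it.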
3.1 First benchmark: inverted pendulum system
As mentioned before, the first test bed for evaluating the controller performance is an
inverted pendulum, which is a well-known single-input multi-output (SIMO) system. Controlling
the inverted pendulum is a challenging and interesting task. The control task is to track the
reference signal while stabilizing the pendulum. The system used to evaluate
the controller performance is a nonlinear model of a laboratory inverted pendulum
system provided by Feedback Ltd. This pendulum system consists of a cart, a rope, and
a load. The load is regarded as a material particle with mass m. The rope is
considered as an inflexible rod of length l whose mass is negligible in
comparison with the load mass. The cart, with mass M, moves on a straight rail.
A schematic of the pendulum system is shown in Figure 3. The state equations of the utilized
system are (Feedback Instrument Ltd, 2002):
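$$\begin{aligned}
(M+m)\left(x - l\sin\theta\right)'' &= F - T_c \\
(M+m)\left(l\cos\theta\right)'' &= V - (M+m)\,g \\
J\ddot{\theta} &= (F - T_c)\,l\cos\theta + V l \sin\theta - D_p
\end{aligned} \qquad (7)$$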
where $T_c$ is the friction of the moving cart and $D_p$ is the friction moment of the angular
movement of the pendulum load. The reaction force of the rail is denoted by V, and the
moment of inertia of the cart and pendulum is denoted by J.
Figure 2. Flowchart for model free controller design. [Flowchart. Imitative learning phase: use a model-based controller as mentor; observe the control signal of the mentor and the sensory input; generate the stress signal for imitative learning; update the parameters of BELBIC according to the emotional stress; if the conditions for the imitative learning part are not satisfied, repeat, otherwise switch the controller from mentor to BELBIC. Performance enhancement phase: observe the sensory input and generate the stress signal according to the performance measures; update the parameters of BELBIC according to the emotional stress.]
3.2 Second benchmark: Chua's circuit
The second test bed for evaluating the emotional controller performance is a chaotic
system, Chua's circuit. Chua's circuit was the first electronic dynamical system
capable of generating chaotic phenomena in the laboratory. The circuit is
shown in Figure 4, and the state equation of the system is (Jiang et al., 2002):
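$$\dot{X} = AX + Bu + g(X) + Dw, \qquad Y = CX$$
$$X = \begin{bmatrix} v_1 \\ v_2 \\ i \end{bmatrix}, \quad
A = \begin{bmatrix} -\alpha c & \alpha & 0 \\ 1 & -1 & 1 \\ 0 & -\beta & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}, \quad
g(X) = \begin{bmatrix} -\alpha v_1^3 \\ 0 \\ 0 \end{bmatrix} \qquad (8)$$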
This system has three equilibrium points, and the equilibrium point
$X = \begin{bmatrix} \sqrt{-c} & 0 & -\sqrt{-c} \end{bmatrix}^T$ is unstable.
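A short numerical check of this equilibrium is sketched below in Python. It reads equation (8) as a cubic Chua model with $A$ and $g(X)$ as given there, omits the disturbance term $Dw$, and uses parameter values ($\alpha = 10$, $\beta = 16$, $c = -0.143$) that are assumptions chosen only for illustration; the function names are likewise hypothetical.

```python
import math


def chua_rhs(x, alpha, beta, c, u=0.0, b=(0.0, 0.0, 0.0)):
    """Right-hand side of X' = A X + B u + g(X); the term D w is omitted."""
    v1, v2, i = x
    dv1 = -alpha * c * v1 + alpha * v2 - alpha * v1 ** 3 + b[0] * u
    dv2 = v1 - v2 + i + b[1] * u
    di = -beta * v2 + b[2] * u
    return (dv1, dv2, di)


def nonzero_equilibrium(c):
    """Equilibrium [sqrt(-c), 0, -sqrt(-c)] of the unforced system (needs c < 0)."""
    r = math.sqrt(-c)
    return (r, 0.0, -r)


# Assumed parameter values, used only to exercise the functions.
ALPHA, BETA, C = 10.0, 16.0, -0.143
residual = chua_rhs(nonzero_equilibrium(C), ALPHA, BETA, C)
```

With $u = 0$ the residual vanishes up to rounding, because $-\alpha c\sqrt{-c} - \alpha(\sqrt{-c})^3 = 0$; this confirms that the point is an equilibrium of the unforced model but says nothing about its stability.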
3.3 Proposed controllers for the inverted pendulum system
Owing to the nonlinearity of the system's state equations and the nonlinear properties of the
driving motor and friction, designing a model-based controller is a hard task. We used BELBIC
as a model free controller to control the inverted pendulum. The main challenge in
using BELBIC as a model free controller for unstable systems, or stable systems with an
unstable equilibrium point such as our test bed, is the learning phase at the beginning.
As BELBIC has no information about the system's dynamics, the performance of the
controlled system may be very poor at the beginning of the learning process, and the
pendulum falls down. BELBIC has fast learning ability, and theoretically it should learn
the proper control action from its sensory inputs and emotional stress within a short time.
In this task, however, the pendulum angle is very sensitive to the control
signal, so any wrong change in the control signal makes the system unstable or oscillatory.
We first used BELBIC as the only controller of the system. It is possible for BELBIC to
learn the proper control strategy in this way, but in our simulations we found that the process
takes too long and is probably infeasible in real applications.
Figure 3. Schematic of pendulum system. [Diagram labels: force F, trolley (M), load, mg; cart position x with dx/dt and d²x/dt²; pendulum angle with its first and second time derivatives.]
Figure 4. Chua's circuit. [Circuit labels: R1, R2, C1, C2, L, Chua's diode, current i, voltages V1 and V2.]
According to the idea of a hierarchical controller structure, to design a controller that
satisfies various objectives, it is first assumed that the objectives can be decoupled,
and a separate controller is then designed to satisfy each objective. After that, the outputs
of these controllers must be fused together. Figure 5 shows the proposed BELBIC
structure. As there are two major objectives, position tracking and pendulum angle
regulation, two BELBICs are employed. The cart position error and its first derivative
are defined as the sensory signals for one of the BELBICs, and the pendulum angle and its first
derivative for the other. Most previously reported BELBIC structures
have only one neuron, because the sensory input signal was one dimensional.
In our structure, as each BELBIC has two sensory inputs, each must have more than
one neuron, and in this task two neurons seem to be adequate.
The emotional stress signal, described below, couples the two separate
controllers. With this kind of stress signal there is no need for a
complex fusion block to combine the outputs of the controllers; a summation
operator is adequate (Shahmirzadi et al., 2003). Meanwhile, the computational cost of
output fusion is reduced to the cost of fusing some main and auxiliary objectives in
the stress generator block. Also, to change the control objectives and switch from
imitative learning to normal learning, there is no need to change the controller
structure; changing the emotional stress signal is enough.
Figure 5. Diagram of the proposed BELBIC controller. [Block diagram; blocks: Stress generator 1, Stress generator 2, Stress generator 3 (FIS), Stress generator 4 (FIS), three FIS blocks, two BELBIC blocks, Model-based controller (mentor), a summing junction, Pendulum system.]
Stress generation. As stated before, BELBIC can show various behaviors when
different stress signals are applied to it. Therefore, to satisfy different control
objectives, a proper stress signal must be defined for each objective. More
objectives can be achieved by defining further stress signals.
The control objectives change with the learning phase:
(1) Imitative learning phase. Producing a similar control force and stabilizing the
system.
(2) Performance enhancement. Reducing the position and angle errors of the
pendulum and holding it at its equilibrium point.
As a result, it is necessary to define appropriate stress signals that satisfy each objective
based on its importance.
Stress generation for imitative learning phase. In imitative learning, the objective is
for BELBIC to produce a control signal similar to that of the initial controller. Reducing
the difference between these two control signals is therefore the main goal in this part, and
there are no other control objectives. Consequently, the emotional stress signal consists of
two signals, the error of the control signal $e_u$ and the first derivative of this error $\dot{e}_u$:
$$\text{Stress} = w_1 \left|e_u\right| + w_2 \left|\dot{e}_u\right| + w_3\, e_u \dot{e}_u \qquad (9)$$
The reason for adding $\dot{e}_u$ to the above sum is that $e_u$ may sometimes become zero
while the two control signals have completely different behaviors. Also, by adding
the last term, the designer can expect the errors to converge to zero faster.
Based on the degree of importance of each of these measures ($|e_u|$, $|\dot{e}_u|$, $e_u \dot{e}_u$),
a designer can tune the weights carefully. Some considerations must be taken into
account, however. For example, if $w_3$ is relatively large, the BELBIC output may oscillate
undesirably, or in some situations the Stress becomes negative, which weakens the learning
phase. These weights can be tuned using learning algorithms such as reinforcement
learning (Sutton and Barto, 1998), provided there is another source of feedback which
evaluates the closeness of the behavior of the imitator and mentor controllers.
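Equation (9) can be evaluated directly from the mentor and BELBIC control signals. The following Python sketch is illustrative only: the default weights, the backward-difference approximation of $\dot{e}_u$, and the function names are assumptions, not values or code from the paper.

```python
def finite_difference(current, previous, dt):
    """Backward-difference estimate of a derivative (an assumed approximation)."""
    return (current - previous) / dt


def imitation_stress(e_u, de_u, w1=1.0, w2=0.1, w3=0.05):
    """Stress of eq. (9): w1*|e_u| + w2*|de_u| + w3*e_u*de_u.

    e_u is the mentor-minus-BELBIC control error and de_u its time derivative.
    The default weights are placeholders and would need tuning in practice.
    """
    return w1 * abs(e_u) + w2 * abs(de_u) + w3 * e_u * de_u
```

For instance, `imitation_stress(1.0, -1.0, w3=5.0)` is negative, which illustrates the remark above that a relatively large $w_3$ can drive the Stress below zero.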
Stress generation for performance enhancement phase. After BELBIC has imitatively
learned the control action from the initial controller, it replaces the initial controller
and becomes the main controller of the system. At this point the control objectives
change: reducing the position tracking error and the angle error become the new
control objectives. Therefore, the stress signal must be modified.
To satisfy more than two objectives, a more complicated combination of the stress
signals associated with each objective is necessary. To generate the proper
stress signal for all objectives, the more important objective must at any time be
attended to more than the others. We used two approaches to generate this coded
attention mechanism. The first is a nonlinear combination of stresses and the second
uses linguistic rules.
Nonlinear combination of signals for stress generation. This stress signal is similar
to the one used in our previous work on control of an overhead crane (Arami et al.,
2008), although the present work differs somewhat. To enhance the behavior of the system,
the major objectives are defined first. Moreover, some extra objectives should be attended to
in order to improve the performance. The tracking error of the cart position and the error of
the pendulum angle are the main concerns, and need to be decreased as much as possible.
To achieve these objectives (reducing the position and angle errors), the weighted square
of the first and the absolute value of the second are summed to generate the first part of
the stress signal.
One of the extra objectives is to avoid collision with the edges of the rail, which would
interrupt the operation. To impose this behavior on the cart, closeness to the edges of the
rail must be punished via the stress signal. Therefore, to generate the second part of the stress,
a dead-zone function and a square function are employed, which generate extreme
stress if the cart gets close to the edges. The dead zone deactivates this part of the stress when
the cart is far from the edges.
Another important index which must be considered in every control task is the energy
of the control force and its variations. The amplitude of the control force and its derivative are
squared, and their weighted sum is employed to generate a further part of the stress
signal. This part is then multiplied by a monotonically decreasing function of the
sum of the two previously mentioned parts of the stress. As a result of this multiplication,
when the stresses of the previous parts are small, BELBIC tries to decrease
the control forces, and when these stresses are significant, the limit on the control force
is relaxed to allow fast responses. The stress generator diagram is
shown in Figure 6.
Figure 6. Internal stress generator mechanism. Note: Nonlinear combination of objectives.

The first input is the error of the pendulum angle, the second is the error of the cart
position, and the third is the position of the cart. After the third input passes through a
dead-zone block, this signal represents the closeness to the edges of the rail, and the stress
of contact with the ends of the rail increases when contact occurs. The fourth input is the
control signal, which is passed through a square function; its energy is
considered as a stress that is multiplied by a monotonically decreasing function of the
stress from the first three inputs. This means that when the sum of the first part of the stresses
(inputs 1, 2, and 3) is high, the fourth one is suppressed by them. This nonlinear fusion of
signals to generate emotional stress can be considered as a coded attention to the most
important part of the stresses with respect to the operational conditions and environmental
effects.
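The nonlinear combination described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the weights, the dead-zone half-width, and the particular decreasing function 1/(1 + s) are hypothetical choices, not values taken from the paper, and the derivative of the control force is included as an extra argument following the text on control-force variations.

```python
import numpy as np


def dead_zone(x, half_width):
    """Return 0 inside [-half_width, half_width] and the excess outside it."""
    return np.sign(x) * np.maximum(np.abs(x) - half_width, 0.0)


def nonlinear_stress(angle_err, pos_err, cart_pos, u, du,
                     w_pos=1.0, w_ang=1.0, w_edge=10.0,
                     w_u=0.1, w_du=0.1, safe_half_width=0.3):
    """Hypothetical nonlinear fusion of objectives into one stress value.

    All weights and safe_half_width are illustrative assumptions.
    """
    # Part 1: weighted square of position error plus weighted |angle error|.
    s_track = w_pos * pos_err ** 2 + w_ang * abs(angle_err)
    # Part 2: edge stress, inactive (zero) while the cart is far from the edges.
    s_edge = w_edge * dead_zone(cart_pos, safe_half_width) ** 2
    s_main = s_track + s_edge
    # Part 3: energy of the control force and of its variation.
    s_energy = w_u * u ** 2 + w_du * du ** 2
    # Monotonically decreasing gate (assumed form): large main stress
    # suppresses the penalty on control effort.
    gate = 1.0 / (1.0 + s_main)
    return float(s_main + gate * s_energy)
```

With zero tracking and edge stress the control-energy term is passed through unchanged, while a large tracking error shrinks its contribution, which mirrors the behavior described in the text.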
Fuzzy stress generation. As mentioned before, there are some control objectives
which must be attended to according to their importance and degree of satisfaction.
Linguistic rules can implement this attention mechanism, especially when there is
linguistic knowledge about the behavior of the system and the priority of the objectives.
Thus, first the major and extra objectives in the control task are defined.
The objectives are:
(1) Major objectives. Tracking the desired position of the cart and fixing the
pendulum vertically.
(2) Extra objectives. Avoiding the edges of the rail, and minimizing the energy of the control
force and its variations.
The concerns are similar to the previous part, i.e. satisfying the major objectives is more
important than satisfying the extra objectives. Among the major objectives, holding the pendulum at its
equilibrium is more important than reducing the tracking error. Moreover, once
BELBIC has learned to track the desired position while holding the pendulum at its
equilibrium point, it must try to reduce the control force and its variations. The
variations, and especially the frequency of variations, of the control force must be limited
for actuator considerations.
To generate the stress signal and code the attention mechanism, linguistic rules are
used and then imported into a Sugeno FIS (Takagi and Sugeno, 1983). Using this
method to generate the stress signal makes BELBIC capable of attending to the important parts of
the stresses at any time and in any situation.
As mentioned before, four effective variables are employed to generate the
stress signal. The FIS is designed with the following parameters:
(1) Inputs: error of the cart position, error of the pendulum angle, control force, first
derivative of the control force.
(2) Number of rules: 16.
(3) FIS type: Sugeno FIS.
(4) Output: emotional stress for BELBICs.
Figure 7 shows the resulting fuzzy surfaces for stress generation. It can be seen that
holding the pendulum at its equilibrium is slightly more important than reducing the
tracking error (Figure 7(a)), and that reducing control force variations is less important
than decreasing the tracking error (Figure 7(c)).
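To make the structure of such a stress generator concrete, the following is a minimal sketch of a zero-order Sugeno FIS with four inputs. The paper does not list its membership functions or rule consequents, so everything numeric here is an assumption: each input gets two hypothetical fuzzy sets ("small" and "large"), which happens to give 2^4 = 16 rules, and the constants in RANGES and WEIGHTS are illustrative only (chosen so that the angle error weighs slightly more than the position error, and the control-force terms weigh less).

```python
import itertools

import numpy as np

# Hypothetical normalisation ranges for the four inputs (assumptions).
RANGES = {"pos_err": 0.1, "ang_err": 0.01, "force": 1.0, "dforce": 1.0}
# Hypothetical importance of each objective in the rule consequents.
WEIGHTS = {"pos_err": 0.3, "ang_err": 0.4, "force": 0.2, "dforce": 0.1}


def memberships(value, scale):
    """Membership degrees in ('small', 'large') for |value| on [0, scale]."""
    large = float(np.clip(abs(value) / scale, 0.0, 1.0))
    return (1.0 - large, large)


def sugeno_stress(pos_err, ang_err, force, dforce):
    """Zero-order Sugeno inference: firing-strength-weighted mean of constants."""
    inputs = {"pos_err": pos_err, "ang_err": ang_err,
              "force": force, "dforce": dforce}
    names = list(inputs)
    mu = {n: memberships(inputs[n], RANGES[n]) for n in names}
    num = 0.0
    den = 0.0
    # One rule per small/large combination of the four inputs: 16 rules.
    for labels in itertools.product((0, 1), repeat=len(names)):
        # Product t-norm for the rule's firing strength.
        firing = float(np.prod([mu[n][lab] for n, lab in zip(names, labels)]))
        # Constant consequent: sum of the weights of the inputs labelled 'large'.
        consequent = sum(WEIGHTS[n] * lab for n, lab in zip(names, labels))
        num += firing * consequent
        den += firing
    return num / den
```

Because the memberships of each input sum to one, the firing strengths always sum to one as well, so the division is safe; a real design would tune the consequents rule by rule rather than derive them from per-input weights.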
3.4 Proposed controllers for Chua's circuit
Like the inverted pendulum system, Chua's circuit has an unstable equilibrium
point. The goal is to regulate the variables around the equilibrium point. The BELBIC
structure used in this task is similar to the one used for controlling the pendulum
system (Figure 5). As there are two major objectives, regulating the capacitor voltage (v1)
and the inductor current (i), two BELBICs are employed. The voltage error and its
first derivative are defined as sensory signals for one of the BELBICs, and the current and
its first derivative for the other. As in the previous part, each BELBIC has two neurons.
For this control task, the control objectives at each learning phase are:
(1) Imitative learning phase. Producing a similar control force and stabilizing the system.
(2) Performance enhancement. Regulating the voltages and current and holding them at the
equilibrium point.
The stress generation for the imitative learning phase is the same as the method used for
controlling the inverted pendulum system, which was defined by equation (9).
For the second phase of learning, the stress signal is generated by using fuzzy rules.
First, the major and extra objectives in the control task are defined as follows:
(1) Major objectives. Regulating the capacitor voltage and the current and fixing them at the
equilibrium point.
(2) Extra objective. Minimizing the energy of the control force and its variations.
Figure 7. Fuzzy surfaces for stress generation. Each panel plots stress against two inputs: (a) position error and angle error; (b) position error and control force; (c) position error and derivative of control force; (d) angle error and control force; (e) angle error and derivative of control force; (f) control force and derivative of control force.
The concerns are similar to the previous part, i.e. satisfying the major objectives is more
important than satisfying the extra objectives. Moreover, once BELBIC has learned to hold these values
at the equilibrium point, it must try to reduce the control force and its variations. To
generate the stress signal and code the attention mechanism, linguistic rules are used
and then imported into a Sugeno FIS. The fuzzy rules are the same as those used
to control the inverted pendulum system. The only difference is the inputs of the FIS,
which are: error of the capacitor voltage (v1), error of the current (i), control force, and first
derivative of the control force.
3.5 Switching between stress signals at each learning phase and controllers
As we observed in the experimental results, hard switching between controllers and
abrupt changes of the stress signal make the BELBICs unstable. Thus, instead of hard
switching, a soft switching scheme must be employed: the BELBIC control system
must gradually replace the initial controller and, at the same time, its emotional stress
must be gradually changed. To do this, we employed a FIS, which is a common solution
for soft switching. Human linguistic rules can be
imported into a Sugeno FIS (Takagi and Sugeno, 1983) easily. We used 11 fuzzy rules for
soft switching. Figure 8 shows the fuzzy surfaces for this task. The inputs of this fuzzy
system are the two stresses mentioned above (for imitative learning and for improving
performance) and the error in the imitative learning phase (the difference between the initial controller
output and the BELBIC output). This fuzzy switch is used for switching
both the controllers and the stresses.
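The idea of soft switching can be illustrated with a simple blending sketch. It does not reproduce the 11-rule FIS of the paper; instead it assumes a hypothetical blending weight built from a linear time ramp and an imitation-error "trust" factor. The function names, the 30 s and 50 s defaults for the transition window, and the error tolerance are all assumptions made for illustration.

```python
def blend_weight(t, imitation_error, t_start=30.0, t_end=50.0, err_tol=0.05):
    """Share of authority given to BELBIC, in [0, 1] (hypothetical rule)."""
    if t >= t_end:
        # After the transition window BELBIC acts alone.
        return 1.0
    # Linear ramp from 0 at t_start to 1 at t_end.
    ramp = min(max((t - t_start) / (t_end - t_start), 0.0), 1.0)
    # Trust is 1 for perfect imitation and falls to 0 at err_tol.
    trust = min(max(1.0 - abs(imitation_error) / err_tol, 0.0), 1.0)
    return ramp * trust


def soft_switch(t, u_mentor, u_belbic, stress_imitative, stress_performance):
    """Blend both the control signals and the stress signals with one weight."""
    alpha = blend_weight(t, u_mentor - u_belbic)
    u = (1.0 - alpha) * u_mentor + alpha * u_belbic
    stress = (1.0 - alpha) * stress_imitative + alpha * stress_performance
    return u, stress
```

Using the same weight for the control signal and the stress keeps the learning target consistent with whichever controller currently dominates, which is the property the fuzzy switch is meant to provide.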
4. Results
In this part, the results of using the emotional controller and the model-based controller on the
two test beds mentioned above are presented.
Figure 8. Fuzzy surfaces for stress and controller switching. The panels plot stress against time and, respectively, imitative stress, performance stress, and imitation error.
4.1 First benchmark, inverted pendulum
To validate the proposed controller, its results are compared with those of the originally
supplied controller, which consists of two proportional integral derivative (PID)
controllers and a nonlinear compensator (Feedback Instrument Ltd, 2002). The initial
controller for the imitative learning phase (the mentor) is this original controller.
As seen from Figure 9, without imitative learning BELBIC did not learn
the proper control signal in more than 150 seconds of training. It can also be seen that
the pendulum falls down many times. In addition, it must be noted that
comparing the proposed model-free controller with complicated model-based ones,
which are designed using an exact mathematical model of the system, is not a fair
comparison. Therefore, the proposed controllers are only compared with the originally
supplied controller, which plays the role of mentor in the imitative phase of learning, to
show the effect of the enhancement phase of learning. Also, in order to assess the effect of
different stress generation mechanisms, which are different coded attention
mechanisms to the objectives, the nonlinear and fuzzy stress generations are compared.
Figure 10 shows the responses of the originally supplied PID controller. Figure 11 shows
the BELBIC controller with the nonlinear stress function, and Figure 12 shows BELBIC
with fuzzy stress. As can be seen, BELBIC can completely imitate the behavior of the original controller
within about ten seconds of the start. After that, based on the fuzzy switch
structure, from 30 to 50 seconds both controllers control the system, and afterwards
BELBIC controls the system alone. It is clear that after imitative learning,
BELBIC performs better in reducing the tracking and angle errors, regardless of the
method for stress generation.
Figure 9. Employing BELBIC without imitative learning (pendulum angle versus time, 0 to 150 s).

Figure 10. Results of originally supplied controller. Note: Double PID. Panels: cart position (desired and actual), pendulum angle, and control force versus time.

Figure 11. Results of BELBIC with nonlinear stress generation unit. Panels: cart position, pendulum angle, and control force versus time.

In order to evaluate the ability of the controller to reject disturbances, a random voltage
drawn from a Gaussian distribution with zero mean and a variance of 0.1 is applied to
the motor at some instants. The time at which this voltage is applied is a random variable
drawn from a uniform distribution. This disturbance is applied
eight times from the 55th second until the end of the simulation. The results of the original
PID controller and BELBIC with the two stress generation functions mentioned above are
shown in Figures 13-15.
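The disturbance protocol above can be sketched as follows. This is a minimal sketch, assuming a simulation end time of 150 s (the exact horizon is not stated in this passage) and using a hypothetical helper name; only the number of events, the start time of 55 s, the zero mean, and the variance of 0.1 come from the text.

```python
import numpy as np


def disturbance_schedule(t_end=150.0, t_first=55.0, n_events=8,
                         variance=0.1, seed=0):
    """Draw random application times and Gaussian voltage amplitudes.

    t_end and seed are assumptions made for this illustration.
    """
    rng = np.random.default_rng(seed)
    # Application times: uniform between t_first and the end of the run.
    times = np.sort(rng.uniform(t_first, t_end, size=n_events))
    # Amplitudes: zero-mean Gaussian; NumPy expects the standard deviation.
    amplitudes = rng.normal(loc=0.0, scale=np.sqrt(variance), size=n_events)
    return times, amplitudes
```

Changing the seed between runs gives a different disturbance realization each time, which is what makes repeated experiments and mean/standard-deviation reporting meaningful.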
As can be seen, BELBIC clearly shows superior performance in tracking and
disturbance rejection, which is the result of its learning capability.
To have a meaningful comparison of these controllers, four performance measures are
defined as follows and calculated for all the mentioned control systems: the originally
supplied controller and BELBIC with the two methods of stress generation. As the
disturbance is applied at randomly selected times, the experiments were carried out 20 times
and the statistical moments of the following parameters (mean and standard deviation)
are calculated:
(1) Integral absolute error (IAE) (for cart position and pendulum angle).
(2) Integral of absolute values of control force (IACF).
(3) Integral of absolute values of the derivative of control force (IADCF) (shows the
fluctuations of the control force).
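For sampled signals, the measures listed above can be approximated numerically. The sketch below is one possible discretization, assuming uniformly sampled signals with step dt, a rectangle-rule integral, and a first-difference estimate of the derivative; the function names are hypothetical.

```python
import numpy as np


def integral_abs(signal, dt):
    """Rectangle-rule approximation of the time integral of |signal|."""
    return float(np.sum(np.abs(np.asarray(signal, dtype=float))) * dt)


def performance_measures(pos_err, ang_err, force, dt):
    """Return IAE (position, angle), IACF and IADCF for one experiment."""
    force = np.asarray(force, dtype=float)
    # First-difference estimate of the derivative of the control force.
    dforce = np.diff(force) / dt
    return {
        "IAE_position": integral_abs(pos_err, dt),
        "IAE_angle": integral_abs(ang_err, dt),
        "IACF": integral_abs(force, dt),
        "IADCF": integral_abs(dforce, dt),
    }
```

Applying `performance_measures` to each of the 20 runs and then taking `np.mean` and `np.std` of each entry would give statistics of the kind reported in the tables.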
These performance measures are calculated for the mentioned controllers in normal
operation, without applying disturbance, and the results are presented in
Table I.
From Table I, it is seen that BELBIC shows a fast learning ability for tracking.
Also, the control force signal, which is penalized by the stress signal, is lower than the control
force of the other controllers and has less oscillation. Moreover, employing a FIS for stress
generation leads to better results than the nonlinear stress generation function in
BELBIC.
Figure 12. Results of BELBIC with fuzzy stress generation unit. Panels: cart position, pendulum angle, and control force versus time.
Figure 13. Results of originally supplied controller (double PID) in presence of disturbance. Panels: cart position (desired and actual) and pendulum angle versus time.

Figure 14. Results of BELBIC with nonlinear stress generation function in presence of disturbance. Panels: cart position (desired and actual) and pendulum angle versus time.

In the presence of disturbance, the above-mentioned measures are calculated for the
controllers and the results are presented in Table II. As can be seen, in the presence of
disturbance BELBIC (regardless of the method for stress generation) again shows far
better performance than the model-based controller, although its performance
decreases slightly in comparison with normal operation. It also shows better
disturbance rejection and robustness. It can be seen that the fuzzy stress resulted in
more robust results and better performance.
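The size of the improvement under disturbance can be quantified directly from the mean values in Table II. The short sketch below copies the mean IAE values from that table and expresses each BELBIC variant's error as a percentage reduction relative to the double PID controller; the dictionary layout and function name are illustrative choices.

```python
# Mean values copied from Table II (in presence of disturbance).
table_ii_mean = {
    "BELBIC fuzzy stress":     {"IAE_position": 5.699,  "IAE_angle": 0.476},
    "BELBIC nonlinear stress": {"IAE_position": 6.577,  "IAE_angle": 0.676},
    "Double PID":              {"IAE_position": 11.725, "IAE_angle": 1.692},
}


def reduction_vs_baseline(controller, measure, baseline="Double PID"):
    """Percentage by which `controller` lowers `measure` relative to baseline."""
    base = table_ii_mean[baseline][measure]
    return 100.0 * (base - table_ii_mean[controller][measure]) / base


for name in ("BELBIC fuzzy stress", "BELBIC nonlinear stress"):
    for measure in ("IAE_position", "IAE_angle"):
        pct = reduction_vs_baseline(name, measure)
        print(f"{name:24s} {measure:13s} {pct:5.1f} % lower than Double PID")
```

For the fuzzy-stress variant this works out to roughly a 51% lower position IAE and a 72% lower angle IAE than the double PID controller.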
4.2 Second benchmark, Chua's circuit
In this part, BELBIC is used as a controller to stabilize a chaotic system (Chua's circuit)
at its unstable equilibrium point. Chua's circuit controlled by the state PI feedback
controller is given by Jiang et al. (2002):

\[ \dot{X} = AX + g(X) + B\left( KX + k \int_0^t \left( CX - CX_s \right) dt \right) \tag{10} \]

where X_s is the equilibrium point.
Figure 15. Results of BELBIC with fuzzy stress generation unit in presence of disturbance. Panels: cart position (desired and actual) and pendulum angle versus time.
Table I. Performance measures for various controllers without disturbance

Controller structure (no disturbance)   IAE (position)   IAE (angle)   IACF     IADCF
BELBIC fuzzy stress                     3.301            0.367         9.432    9.262
BELBIC nonlinear stress                 3.480            0.398         11.916   12.231
Double PID                              3.925            0.457         12.219   14.353
Table II. Performance measures for various controllers in presence of disturbance (E: mean; STD: standard deviation)

Controller structure            IAE (position)    IAE (angle)     IACF             MADCF
(in presence of disturbance)    E       STD       E      STD      E       STD      E       STD
BELBIC fuzzy stress             5.699   0.915     0.476  0.174    73.247  3.152    151.26  6.245
BELBIC nonlinear stress         6.577   1.139     0.676  0.183    76.834  4.678    157.53  5.569
Double PID                      11.725  2.141     1.692  0.371    82.705  3.247    160.58  7.831
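The system with the following parameters is used in the simulations:

\[ a = 10,\quad b = 16,\quad c = -0.143,\quad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},\quad D = \begin{bmatrix} d \\ 0 \\ 0 \end{bmatrix},\quad C = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix},\quad K = \begin{bmatrix} -1.7714 & -0.2296 & -4.640 \end{bmatrix},\quad k = -6.9785 \tag{11} \]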
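As an illustration of the mentor used for this benchmark, the state PI feedback term of equation (10) can be sketched in discrete time. This is a sketch under explicit assumptions: the numeric values are read from equation (11) with the leading "2" of the extracted text interpreted as a minus sign, the integral is approximated by a rectangle rule with a hypothetical step dt, and the class name is invented for this example. The plant dynamics (A and g) are not reproduced here.

```python
import numpy as np

# Parameter values as read from equation (11); the signs are an assumption.
B = np.array([0.0, 0.0, 1.0])
C = np.array([0.0, 1.0, 0.0])
K = np.array([-1.7714, -0.2296, -4.640])
k = -6.9785


class StatePIRegulator:
    """Discrete-time sketch of u = B (K X + k * integral(C X - C X_s) dt)."""

    def __init__(self, x_eq, dt):
        self.x_eq = np.asarray(x_eq, dtype=float)  # equilibrium point X_s
        self.dt = dt
        self.integral = 0.0  # running integral of the output error

    def control(self, x):
        x = np.asarray(x, dtype=float)
        # Rectangle-rule update of the integral of C X - C X_s.
        self.integral += float(C @ (x - self.x_eq)) * self.dt
        # Proportional state feedback plus integral action, injected through B.
        return B * (float(K @ x) + k * self.integral)
```

A typical use would call `control(x)` once per simulation step and add the returned vector to the right-hand side of the circuit equations.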
The same strategy is applied here for training BELBIC, and there are two learning phases:
imitative learning from the mentioned PI controller and performance enhancement.
An external disturbance is applied to the equation, which is a step change at a randomly
selected time. The magnitude of the disturbance (d) is drawn from a Gaussian
distribution with zero mean and a variance of 0.1. In Figure 16, the voltages and current are
shown in the presence of disturbance. As the disturbance is applied at randomly selected times,
the experiments were carried out 20 times and the statistical moments of the performance
measures (mean and standard deviation) are calculated and presented in Table III. The
results show that the proposed model-free controller can hold the system at its equilibrium
point with less energy and reject the disturbance more quickly, with lower control force.
Figure 16. Regulation of Chua's circuit in presence of disturbance. Panels (a)-(d): v1, v2, i, and control force versus time (0 to 100), each comparing the state PI regulator with BELBIC with fuzzy stress generation.
5. Conclusions
In this paper, a new approach to stress generation for emotional controllers was
presented. In addition, a novel approach for employing model-free controllers with
learning ability to control systems with unstable equilibriums was introduced.
This approach was based on imitative learning, in which the emotional controller first
imitated a simple stabilizing controller. Although BELBIC has a rapid and powerful
learning capability, it could not simply be used to control unstable systems or systems
with unstable equilibriums. The experimental results showed that, by employing
imitative learning, BELBIC could rapidly learn to produce appropriate control signals
for controlling a system with an unstable equilibrium point. After it learned imitatively
from a simple classically designed controller, its learning ability allowed it to reduce
the tracking and angle errors more effectively. Moreover, it showed more robustness
when facing disturbances. Another advantage of the proposed controller with a fuzzy
combination of objectives as the stress generator is that, by considering extra
situated objectives, it produces a smoother control force with lower energy.
The stress of BELBIC was generated by fuzzy rules, which made BELBIC more
capable of attending to each objective properly. The results showed that this kind of stress
generation led to superior performance in terms of tracking and angle errors compared with the
alternative method for stress generation. Another interesting result was that BELBIC
with any stress signal performed better in the presence of disturbance than the
originally supplied controller and, in the case of Chua's circuit, the PI feedback controller,
which were model-based controllers that were well tuned for these specific cases. This was
the effect of the learning capability of BELBIC, which could produce a more appropriate
control force under various working conditions, and of the fuzzy combination of different
objectives, which results in stress signals that delicately guide the BELBICs to learn.
Owing to fast changes in some of the BELBIC parameters, it is clear that BELBIC does
not learn the whole control policy for the system; rather, it learns the control policy
temporally, which decreases computational costs. The learned control policy seems
to depend on the operating condition (cart velocity and position, pendulum
angle and angular velocity) and on the satisfaction level of each objective.
Using a more complex fusion operator to combine the objectives for generating the
stress can be the next step of this work. These fusion mechanisms can model the
attention to the objectives based on the states and the satisfaction degree of each of
the objectives. Furthermore, based on a defined expected level of satisfaction of each
objective, the model of attending to each of the objectives (combination of objectives) can
be learned using neural networks or other parametric structures. Learning how to
combine the objectives can be a big step toward automating decision making in
unknown environments.
Table III. Performance measures for various controllers in presence of disturbance (E: mean; STD: standard deviation)

Controller structure            IAE (v1)        IAE (v2)         IAE (i)         IACF             MADCF
(in presence of disturbance)    E      STD      E      STD       E      STD      E       STD      E       STD
BELBIC fuzzy stress             1.315  0.224    0.461  0.103     3.174  0.637    26.472  2.789    76.311  4.784
State PI feedback               1.482  0.249    0.372  0.0821    3.795  0.982    35.853  3.471    89.179  5.135
Note
1. The digital pendulum control system, crane system, manufactured by Feedback Instruments
Limited, England.
References
Alexander, C. (1979), The Timeless Way of Building, Oxford University Press, Oxford.
Arami, A., Javan Roshtkhari, M. and Lucas, C. (2008), A fast model free intelligent controller
based on fused emotions: a practical case implementation, Proceeding of the 16th
Mediterranean Conference on Control and Automation, Ajaccio, France, pp. 596-602.
Balkenius, C. and Moren, J. (1998), A computational model of emotional conditioning in the
brain, Proceedings of the Workshop on Grounding Emotions in Adaptive Systems, Zurich,
Switzerland.
Balkenius, C. and Moren, J. (2000), A computational model of emotional learning in the
Amygdala: from animals to animals, Proceedings of 6th International Conference on the
Simulation of Adaptive Behavior, MIT Press, Cambridge, MA, pp. 383-91.
Balkenius, C. and Moren, J. (2001), Emotional learning: a computational model of the
Amygdala, Cybernetics and Systems, Vol. 32, pp. 611-36.
Behenke, S. and Bennewitz, M. (2005), Learning to play soccer using imitative reinforcement,
Proceedings of International Conference on Robotics and Automation (ICRA), Workshop
on Social Aspects of Robot Programming through Demonstration, Barcelona, Spain,
pp. 18-22.
Burl, J.B. (1999), Linear Optimal Control: H2 and H∞ Methods, Addison-Wesley, Boston, MA.
Chellaa, A., Dindoa, H. and Infantinob, I. (2006), A cognitive framework for imitation learning,
Robotics and Autonomous Systems, Vol. 54, pp. 403-8.
Chellaa, A., Dindoa, H. and Infantinob, I. (2007), Imitation learning and anchoring through
conceptual spaces, Applied Artificial Intelligence, Vol. 21, pp. 343-59.
Farina, M., Deb, K. and Amato, P. (2004), Dynamic multiobjective optimization problems: test
cases, approximations, and applications, IEEE Trans. on Evolutionary Computation,
Vol. 8, pp. 425-42.
Feedback Instrument Ltd (2002), Digital Pendulum Control Experiments Manual,
33-935/936-1V60, Feedback Instrument Ltd, Crowborough.
Gholipour, A., Lucas, C. and Shahmirzadi, D. (2004), Purposeful prediction of space weather
phenomena by simulated emotional learning, IASTED International Journal of Modelling
and Simulation, Vol. 24, pp. 65-72.
Hsu, C.S. (1987), Cell to Cell Mapping: A Method of Global Analysis for Nonlinear Systems,
Springer, New York, NY.
Hsu, C.S. and Guttalu, R.S. (1980), An unraveling algorithm for global analysis of dynamical
systems: an application of cell-to-cell mapping, ASME Journal of Applied Mechanic,
Vol. 47, pp. 940-8.
Jafarzadeh, S., Jahed Motlagh, M.R., Barkhordari, M. and Mirheidari, R. (2008), A new Lyapunov
based algorithm for tuning BELBIC controllers for a group of linear systems, Proceedings
of the 16th Mediterranean Conference on Control and Automation, Ajaccio, France,
pp. 593-5.
Jamali, M.R., Arami, A., Dehyadegari, M. and Lucas, C. (2009), Emotion on FPGA: model driven
approach, Expert Systems with Applications, Vol. 36, pp. 7369-78.
Jamali, M.R., Pedram, A., Milasi, M.R. and Lucas, C. (2006), Design and implementation of
BELBIC pattern, Proceedings of 14th Iranian Conference on Electrical Engineering,
Tehran, Iran, pp. 436-41.
Jamali, M.R., Arami, A., Hosseini, B., Moshiri, B. and Lucas, C. (2008), Real time emotional
control for anti-swing and positioning control of SIMO overhead travelling crane, Int.
Journal of Innovative Computing, Information and Control, Vol. 4, pp. 2333-44.
Jiang, G.P., Chen, G. and Tang, W.K. (2002), Stabilizing unstable equilibrium points of a class of
chaotic systems using a state PI regulator, IEEE Trans. on Circuits and Systems I:
Fundamental Theory and Application, Vol. 49, pp. 1820-6.
Jin, Y. and Sendhoff, B. (2008), Pareto-based multiobjective machine learning: an overview and
case studies, IEEE Trans. on Systems Man, and Cybernetics Part C: Applications and
Reviews, Vol. 38, pp. 397-415.
Kuniyoshi, M.I. and Inoue, I. (1994), Learning by watching: extracting reusable task knowledge
from visual observation of human performance, IEEE Trans. on Robotics and
Automation, Vol. 10, pp. 799-822.
Latzke, T., Behenke, S. and Bennewitz, M. (2006), Imitative reinforcement learning for soccer
playing robots, Proceedings of the 10th RoboCup International Symposium, Bremen,
Germany, pp. 47-58.
Lopes, M. and Santos, V.J. (2005), Visual learning by imitation with motor representations,
IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 35, pp. 438-49.
Lucas, C., Shahmirzadi, D. and Sheikholeslami, N. (2004), Introducing BELBIC: brain emotional
learning based intelligent controller, International Journal of Intelligent Automation and
Soft Computing, Vol. 10, pp. 11-21.
Milasi, R.M., Jamali, M.R. and Lucas, C. (2007), Intelligent washing machine: a bioinspired and
multiobjective approach, International Journal of Control, Automation, and Systems,
Vol. 5, pp. 436-43.
Milasi, R.M., Lucas, C. and Araabi, B.N. (2004), Speed control of an interior permanent magnet
synchronous motor using BELBIC (brain emotional learning based intelligent controller),
Proceedings of 5th International Symposium on Intelligent Automation and Control, World
Automation Congress, Sevilla, Spain, Vol. 16, pp. 280-6.
Milasi, R.M., Lucas, C. and Araabi, B.N. (2006a), Intelligent modeling and control of washing
machine using locally linear neuro-fuzzy (LLNF) modeling and modified brain emotional
learning based intelligent controller (BELBIC), Asian Journal of Control, Vol. 8,
pp. 393-400.
Milasi, R.M., Lucas, C., Araabi, B.N., Radwan, T.S. and Rahman, M.A. (2006b), Implementation
of emotional controller for interior permanent magnet synchronous motor drive,
Proceedings of IEEE/IAS 41st Annual Meeting: Industry Applications, Tampa, FL, USA,
Vol. 4, pp. 1767-74.
Mobahi, H., Nili Ahmadabadi, M. and Nadjar Araabi, B. (2007), A biologically inspired method
for conceptual imitation using reinforcement learning, Journal of Applied Artificial
Intelligence, Vol. 21, pp. 155-83.
Montesano, L., Lopes, M., Bernardino, A. and Santos-Victor, J. (2008), Learning object
affordances: from sensory-motor coordination to imitation, IEEE Transactions on
Robotics, Vol. 24, pp. 15-26.
Moren, J. (2002), Emotion and learning: a computational model of the amygdala, PhD thesis,
Lund University, Lund.
Ogata, K. (1997), Modern Control Engineering, 3rd ed., Pearson Education, Harlow.
Shahmirzadi, D. (2005), Computational modeling of the brain limbic system and its application
in control engineering, MSc thesis, Texas A&M University, College Station, TX.
Shahmirzadi, D. and Langari, R. (2005), Stability of Amygdala learning system using cell-to-cell
mapping algorithm, Journal of Intelligent System and Control, Vol. 4, pp. 97-119.
Shahmirzadi, D., Lucas, C. and Langari, R. (2003), Intelligent signal fusion algorithm using
BEL brain emotional learning, Proceedings of 7th Joint Conference on Information
Sciences, JCIS03, 1st Symposium on Brain-Like Computer Architecture, Cary, NC, USA.
Sharba, M.A., Lucas, C., Mohammadinejad, A. and Yaghobi, M. (2006), Designing a football
team of robots from beginning to end, International Journal of Information Technology,
Vol. 3 No. 2, pp. 101-8.
Sheikholeslami, N., Shahmirzadi, D., Semsar, E., Lucas, C. and Yazdanpanah, M.J. (2006),
Applying brain emotional learning algorithm for multivariable control of HVAC
systems, International Journal of Intelligent and Fuzzy Systems, Vol. 17, pp. 35-46.
Sutton, R.S. and Barto, A.G. (1998), Reinforcement Learning: An Introduction, MIT Press,
Cambridge, MA.
Takagi, T. and Sugeno, M. (1983), Derivation of fuzzy control rules from human operator's
control actions, Proceedings of IFAC Symposium on Fuzzy Information, Knowledge
Representation and Decision Analysis, Marseille, France, pp. 55-60.
Tong, S. and Li, Y. (2007), Direct adaptive fuzzy backstepping control for a class of nonlinear
systems, International Journal of Innovative Computing, Information and Control, Vol. 3,
pp. 877-96.
Further reading
Merabian, A.R. and Lucas, C. (2007), Intelligent adaptive control of non-linear systems based on
emotional learning approach, International Journal on Artificial Intelligence Tools, Vol. 16,
pp. 69-85.
About the authors
Mehrsan Javan Roshtkhari was born in 1984 in Mashhad, Iran. He received his
BSc in Electrical Engineering from the University of Tehran (2006). He is
currently a MSc student in Control Engineering in Electrical and Computer
Engineering Department, University of Tehran. He is also a Student Member of
Control and Intelligent Processing Center of Excellence. His research interests
include pattern recognition, signal processing, emotional learning methods,
and model free control. Mehrsan Javan Roshtkhari is the corresponding author
and can be contacted at: m.javan@ece.ut.ac.ir
Arash Arami was born in 1983 in Tehran. He received his BSc in Electrical
Engineering from the University of Tabriz (2006). He is currently a MSc student
in Control Engineering in Electrical and Computer Engineering Department,
University of Tehran. He is also a Student Member of Control and Intelligent
Processing Center of Excellence. His research interests include attention
control, reinforcement learning and emotional learning, model free control, fuzzy
clustering, swarm intelligence, and signal processing.
Caro Lucas received the MS degree from the University of Tehran, Iran, in 1973,
and the PhD degree from the University of California, Berkeley, in 1976. He is a
Professor at the Department of Electrical and Computer Engineering, University
of Tehran, Iran, as well as a Researcher at the School of Cognitive Science, Institute
for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran. He has
served as the Director of Research Faculty of Intelligent Systems (RFIS), IPM
(1993-1997), Chairman of the ECE Department at the University of Tehran
(1986-1988), Managing Editor of the Memories of the Engineering Faculty, University of Tehran
(1979-1991), Associate Editor of Journal of Intelligent and Fuzzy Systems (1992-1999), and Chairman
of the IEEE, Iran Section (1990-1992). His research interests include biological computing,
computational intelligence, uncertain systems, intelligent control, neural networks, multi-agent
systems, data mining, business intelligence, financial modeling, and knowledge management.
Professor Lucas has served as the Chairman of several International Conferences. He was the
Founder of the RFIS, Center of Excellence on Control and Intelligent Processing, and has assisted in
founding several new research organizations and engineering disciplines in Iran.