WeB10.2
I. INTRODUCTION
Ant Colony Optimization (ACO) is inspired by ants
and their behavior of finding shortest paths from their
nest to sources of food. Without any leader that could
guide the ants to optimal trajectories, the ants manage to
find these optimal trajectories over time in a distributed
fashion. In an ACO algorithm, the metaphorical ants are
agents programmed to find an optimal combination of elements of a given set that maximizes some utility function.
The key ingredient in ACO and its biological counterpart
are the pheromones. With real ants, these are chemicals
deposited by the ants and their concentration encodes a
map of trajectories, where stronger concentrations represent
better trajectories. ACO represents the class of metaheuristic
optimization methods that use the concepts of distributed
optimization and pheromone maps in solving Combinatorial
Optimization Problems [1].
This paper introduces an ACO-based algorithm for the
automated design of optimal control policies for continuous-state dynamic systems. The algorithm combines the concepts
of multi-agent optimization and fuzzy approximation of the
state space in a novel approach.
This paper is structured as follows. Section II describes
the relation of the subjects covered in this paper to the
state of the art. In Section III, the ACO heuristic is briefly
reviewed. Section IV presents some preliminaries on the
control problem and the fuzzy partitioning of the state space.
In Section V the Fuzzy ACO algorithm is introduced and
described in detail. Section VI demonstrates the functioning
of the Fuzzy ACO algorithm on a simple control problem
and Section VII concludes this paper.
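$$\frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{l \in N_i} \tau_{il}^{\alpha}\,\eta_{il}^{\beta}}, \quad \forall j \in N_i, \tag{1}$$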
$$\sum_{k=0}^{K-1} e^T(k+1)\,Q\,e(k+1) + u^T(k)\,R\,u(k), \tag{4}$$
states and an input to the system may drive its state to one
bin or another. In [14], a variation of the AS algorithm
is introduced that is capable of solving an optimal control
problem in this manner. However, the number of bins needed
to accurately capture the dynamics of the original system
may become very large even for simple systems with only
two state variables. Moreover, the time complexity of the
ACO algorithm grows exponentially with the number of bins,
making the algorithm infeasible for realistic systems.
A much better alternative is not to quantize the state space
at all, but to approximate it by a smooth parameterized
function approximator. In that case, there is still a finite
number of parameters, but this number is typically much
smaller compared to using crisp quantization. The universal
function approximator that is used in this paper is the fuzzy
approximator.
C. Fuzzy Approximation
With fuzzy approximation, the domain of each state variable is partitioned using membership functions. We define the
membership functions for the state variables to be triangular-shaped, such that the membership degrees for any value of
the state on the domain always sum up to one. Only the
centers of the membership functions have to be stored. Let $A_i$
denote the membership functions for $x_1$, with $a_i$ their centers
for $i = 1, \ldots, N_A$, with $N_A$ the number of membership
functions for $x_1$. Similarly for $x_2$, denote the membership
functions by $B_i$, with $b_i$ their centers for $i = 1, \ldots, N_B$, with
$N_B$ the number of membership functions for $x_2$. Similarly,
the membership functions can be defined for the other state
variables in x, but for the sake of notation, the discussion
in this paper limits the number to two, without loss of
generality. Note that in the example in Section VI, the order
of the system is four.
The membership degrees of $A_i$ and $B_i$ are respectively
denoted by $\mu_{A_i}(x_1(k))$ and $\mu_{B_i}(x_2(k))$ for a specific value
of the state at time $k$. The degree of fulfillment is computed
by multiplying the two membership degrees:
$$\beta_{ij}(x(k)) = \mu_{A_i}(x_1(k)) \cdot \mu_{B_j}(x_2(k)).$$
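Let the vector of all degrees of fulfillment for a certain
state at time $k$ be denoted by:
$$\beta(x(k)) = [\beta_{11}(x(k)) \;\; \beta_{12}(x(k)) \;\; \ldots \;\; \beta_{1N_B}(x(k)) \;\; \beta_{21}(x(k)) \;\; \beta_{22}(x(k)) \;\; \ldots \;\; \beta_{2N_B}(x(k)) \;\; \ldots \;\; \beta_{N_A N_B}(x(k))]^T, \tag{5}$$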
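The construction of the vector of degrees of fulfillment can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `triangular_memberships` and `fulfillment_vector` are hypothetical, the centers in the usage example are made up, and values outside the outermost centers are assumed to saturate at the outer membership function.

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership degrees of scalar x for triangular functions with sorted centers.

    Neighboring triangles overlap so that the degrees always sum to one.
    Assumption: x outside the range of centers is clipped to the nearest center.
    """
    centers = np.asarray(centers, dtype=float)
    mu = np.zeros(len(centers))
    x = float(np.clip(x, centers[0], centers[-1]))
    j = int(np.searchsorted(centers, x, side="right")) - 1  # left neighbor index
    if j >= len(centers) - 1:          # x sits exactly on the last center
        mu[-1] = 1.0
        return mu
    w = (x - centers[j]) / (centers[j + 1] - centers[j])
    mu[j] = 1.0 - w                    # weight of the left neighbor
    mu[j + 1] = w                      # weight of the right neighbor
    return mu

def fulfillment_vector(x1, x2, a, b):
    """Products of membership degrees, ordered 11, 12, ..., 1N_B, 21, ..., N_A N_B."""
    mu_a = triangular_memberships(x1, a)
    mu_b = triangular_memberships(x2, b)
    return np.outer(mu_a, mu_b).ravel()

beta = fulfillment_vector(0.25, 1.0, a=[-1.0, 0.0, 1.0], b=[0.0, 2.0])
```

Because each state variable activates at most two neighboring membership functions, at most four entries of the vector are nonzero in the two-variable case, and the entries sum to one.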
number of bins, called the quantized states. All combinations of these quantized states for the different state
variables corresponded to the nodes in the graph and the
arcs corresponded to transitions from one quantized state to
another. Because of the quantization, the resulting system was
transformed into a stochastic decision problem. However,
the pheromones were associated with these arcs as usual. In
the fuzzy case, the state space is partitioned by membership
functions, as described in Section IV-C, and each combination
of the indices of these membership functions for the different
state variables corresponds to a node in the construction
graph. With the fuzzy interpolation, the system remains a
deterministic decision problem, but the transition from node
to node now does not directly correspond to a state transition.
The pheromones are associated with the arcs as usual, but the
updating needs to take into account the degree of fulfillment
of the associated membership functions. This updating will
be described in Sections V-E and V-F.
In [14], the vertex to vertex transitions of an ant are not
deterministic. In Fuzzy ACO, an ant is not assigned to a
certain vertex at a certain time, but to all vertices according
to some degree of fulfillment at the same time and a transition
from vertex to vertex is not trivial either. Because of this, a
solution component $c_{ij}$ does not contain pairs of vertex-next
vertex, but of state-action. For this reason, a pheromone $\tau_{ij}$
is now denoted as $\tau_{iu}$ with $i$ the index of the vertex (i.e. the
corresponding element of $\beta$) and $u$ the action. For the sake
of notation, no distinction will be made between the actual
input u(k) and the index of the input (the action) u.
Similar to the definition of the vector of all degrees of
fulfillment in (5), the vector of all pheromones for a certain
action u at time k is denoted as:
$$\tau_u(k) = [\tau_{1u}(k) \;\; \tau_{2u}(k) \;\; \ldots \;\; \tau_{N_A N_B u}(k)]^T. \tag{6}$$
B. Outline of a Trial
In the following sections, the fuzzy action selection and
the local and global pheromone update are explained in
more detail. Two more elements in this algorithm need
special attention, namely the initialization of the ants and
the determination whether an ant has reached the goal.
When using a Global Best pheromone update rule in an
optimal control problem, all ants have to be initialized to
the same state, as starting from states that require less time
and less effort to reach the goal would always result in a
better Global Best solution. Ultimately, initializing an ant
exactly in the goal state would be the best possible solution
and no other solution, starting from more interesting states
would get the opportunity to update the pheromones in the
global pheromone update phase. In order to find a control
policy from any initial state to the goal state, the Global
Best update rule cannot be used. When the solutions of all
ants are simply used in the updating, as in the original AS algorithm,
the resulting algorithm does allow for random initialization
of the ants over the state space; this rule is therefore used in the
Fuzzy ACO algorithm.
Regarding the terminal condition for the ants, with the
fuzzy implementation, none of the vertices can be pointed
$$p_c\{u \mid \beta_c(k)\}(k) = \beta_c^T(k)\, \frac{\tau_u^{\alpha}(k)}{\sum_{l \in U} \tau_l^{\alpha}(k)}, \tag{7}$$
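One way to read the action selection probability is as a blend of per-vertex selection rules, weighted by the degrees of fulfillment. The sketch below assumes that the power and the quotient act element-wise on the pheromone vectors and that the exponent is a scalar `alpha`. The array layout (vertices in rows, actions in columns) and the function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def action_probabilities(beta, tau, alpha=3.0):
    """Probability of each action for one ant.

    beta: shape (n_vertices,), degrees of fulfillment, summing to one.
    tau:  shape (n_vertices, n_actions), positive pheromone levels.
    """
    weights = tau ** alpha                                    # element-wise power
    per_vertex = weights / weights.sum(axis=1, keepdims=True)  # normalize over actions
    return beta @ per_vertex                                  # blend with degrees of fulfillment

def select_action(beta, tau, alpha=3.0, rng=None):
    """Draw an action index at random according to the blended probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    p = action_probabilities(beta, tau, alpha)
    return int(rng.choice(len(p), p=p))

beta = np.array([0.5, 0.5])
tau = np.array([[1.0, 1.0],
                [1.0, 3.0]])
p = action_probabilities(beta, tau, alpha=1.0)
u = select_action(beta, tau, alpha=1.0, rng=np.random.default_rng(0))
```

Under this reading the probabilities sum to one whenever the degrees of fulfillment do, since each row of `per_vertex` is itself a probability distribution over the actions.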
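$$\tau_{ij} \leftarrow (1-\gamma)\tau_{ij} + \gamma\tau_0 = (1-\gamma)(\tau_{ij} - \tau_0) + \tau_0.$$
After $n$ updates, the pheromone level is reduced to:
$$\tau_{ij}^{(n)} \leftarrow (1-\gamma)^n(\tau_{ij} - \tau_0) + \tau_0, \tag{8}$$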
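The closed form after n updates can be checked numerically by applying the single-step local update repeatedly. In this sketch `gamma` stands for the local pheromone decay factor and `tau0` for the initial pheromone level. These names and the numeric values are assumptions chosen for illustration only.

```python
def local_update(tau, tau0, gamma):
    """One local pheromone update: decay the level toward tau0."""
    return (1.0 - gamma) * tau + gamma * tau0

def after_n_updates(tau, tau0, gamma, n):
    """Closed form of n successive local updates."""
    return (1.0 - gamma) ** n * (tau - tau0) + tau0

tau_start, tau0, gamma, n = 2.0, 0.1, 0.01, 50

tau_iter = tau_start
for _ in range(n):
    tau_iter = local_update(tau_iter, tau0, gamma)

tau_closed = after_n_updates(tau_start, tau0, gamma, n)
```

The distance to `tau0` shrinks geometrically by a factor of `1 - gamma` per update, so the level approaches `tau0` but never drops below it when it starts above it.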
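$$\dot{x} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -b(c_1, c_2) & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -b(c_1, c_2) \end{bmatrix} x + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} u,$$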
where the damping b(c1 , c2 ) in the experiments is modeled
by an affine sum of two Gaussian functions, with means
(0, 2.3) and (4.7, 1) and standard deviations (2.5, 1.5) and
(1.5, 2) respectively. The damping profile can be seen in
Fig. 1(b), where darker shading means more damping.
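A minimal simulation sketch of these dynamics is given below. It assumes the state ordering (c1, v1, c2, v2), a negative sign on the damping terms, and a forward Euler discretization. The weights of the two Gaussians are not given here, so the damping profile is passed in as a hypothetical callable `damping(c1, c2)` and the usage example uses a constant value.

```python
import numpy as np

def state_derivative(x, u, damping):
    """Continuous-time derivative of the state (c1, v1, c2, v2) for input u."""
    c1, _, c2, _ = x
    b = damping(c1, c2)                      # position-dependent damping
    A = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0,  -b, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0,  -b]])
    B = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 0.0],
                  [0.0, 1.0]])
    return A @ np.asarray(x, dtype=float) + B @ np.asarray(u, dtype=float)

def euler_step(x, u, damping, ts=0.2):
    """One forward Euler step of length ts (an assumed discretization)."""
    return np.asarray(x, dtype=float) + ts * state_derivative(x, u, damping)

constant_damping = lambda c1, c2: 2.0        # placeholder profile for illustration
dx = state_derivative([0.0, 1.0, 0.0, -1.0], [0.5, 0.0], constant_damping)
x_next = euler_step([0.0, 1.0, 0.0, -1.0], [0.5, 0.0], constant_damping)
```

A more accurate integrator could be substituted for `euler_step` without changing the rest of the sketch.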
B. Fuzzy ACO Setup and Parameters
The cores of the membership functions of the positions
$c_1$, $c_2$ are chosen to be {-5, -3.5, -2, -0.5, 0, 0.5, 2, 3.5, 5}
and those for the velocities $v_1$, $v_2$ are {-2, -0.5, 0, 0.5, 2}.
The action set consists of 25 actions, namely the cross-product of the sets {-1, -0.5, 0, 0.5, 1} for both dimensions.
The local and global pheromone decay factors are respectively $\gamma = 0.01$ and $\rho = 0.1$. Furthermore, $\alpha = 3$ and the
number of ants is 2000. The sampling time is Ts = 0.2
and the ants are randomly initialized over the complete state
space at the start of each trial. An ant terminates its trial
when its position and velocity in both dimensions are within
a bound of 0.25 and 0.05 from the goal respectively.
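The setup can be made concrete with a short sketch that builds the action set and tests the termination condition. It assumes that the input levels are symmetric about zero and that the goal is the origin of the state space with state ordering (c1, v1, c2, v2). The names `LEVELS`, `ACTIONS`, and `has_reached_goal` are illustrative.

```python
import itertools
import numpy as np

# Assumed symmetric input levels per dimension; their cross-product gives 25 actions.
LEVELS = (-1.0, -0.5, 0.0, 0.5, 1.0)
ACTIONS = np.array(list(itertools.product(LEVELS, LEVELS)))  # shape (25, 2)

# 9 position cores and 5 velocity cores per dimension give 9 * 5 * 9 * 5 vertices.
N_VERTICES = 9 * 5 * 9 * 5
N_PHEROMONES = N_VERTICES * len(ACTIONS)   # one pheromone per vertex-action pair

def has_reached_goal(x, goal=(0.0, 0.0, 0.0, 0.0), pos_bound=0.25, vel_bound=0.05):
    """True when positions and velocities are within their bounds of the goal."""
    c1, v1, c2, v2 = np.asarray(x, dtype=float) - np.asarray(goal, dtype=float)
    return bool(abs(c1) <= pos_bound and abs(c2) <= pos_bound
                and abs(v1) <= vel_bound and abs(v2) <= vel_bound)
```

With these counts there are 2025 vertices and 50625 pheromone variables, which stays manageable compared with a crisp quantization of a four-dimensional state space at similar resolution.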
C. Simulation Results
The convergence of the Fuzzy ACO algorithm is depicted
in Fig. 1(a). It shows that the relative variation of the policy
is already very low after about 20 trials. A slice of the resulting
policy for zero velocity is depicted together with the damping
profile in Fig. 1(b). The policy shows the mapping of the
positions in both dimensions to the input on a fine grid.
Fig. 1(c) presents the trajectories of the vehicle for various
initial positions and zero initial velocity. It shows that the
vehicles manage to drive quickly to the goal, while avoiding
the regions of stronger damping to a certain extent. However,
the trajectories are only close to optimal. Especially for the
[Fig. 1. Results. (a) Policy variation versus trial. (b), (c) Panels with axes c1 [m] and c2 [m].]
[8] Y. Hsiao, C. Chuang, and C. Chien, "Computer network load-balancing and routing by ant colony optimization," in Proceedings of the IEEE International Conference on Networks (ICON 2004), Singapore, November 2004, pp. 313–318.
[9] K. Socha and C. Blum, "An ant colony optimization algorithm for continuous optimization: application to feed-forward neural network training," Neural Computing & Applications, vol. 16, no. 3, pp. 235–247, May 2007.
[10] K. Socha and M. Dorigo, "Ant colony optimization for continuous domains," European Journal of Operational Research, vol. 185, no. 3, pp. 1155–1173, 2008.
[11] G. Bilchev and I. C. Parmee, "The ant colony metaphor for searching continuous design spaces," in Selected Papers from AISB Workshop on Evolutionary Computing, ser. Lecture Notes in Computer Science, T. Fogarty, Ed., vol. 993. London, UK: Springer-Verlag, April 1995, pp. 25–39.
[12] S. Tsutsui, M. Pelikan, and A. Ghosh, "Performance of aggregation pheromone system on unimodal and multimodal problems," in Proceedings of the 2005 Congress on Evolutionary Computation (CEC 2005), September 2005, pp. 880–887.
[13] P. Korosec, J. Silc, K. Oblak, and F. Kosel, "The differential ant-stigmergy algorithm: an experimental evaluation and a real-world application," in Proceedings of the 2007 Congress on Evolutionary Computation (CEC 2007), September 2007, pp. 157–164.
[14] J. M. van Ast, R. Babuska, and B. De Schutter, "Ant colony optimization for optimal control," in Proceedings of the 2008 Congress on Evolutionary Computation (CEC 2008), Hong Kong, China, June 2008, pp. 2040–2046.
[15] J. Casillas, O. Cordon, and F. Herrera, "Learning fuzzy rule-based systems using ant colony optimization algorithms," in Proceedings of the ANTS2000. From Ant Colonies to Artificial Ants: Second International Workshop on Ant Algorithms. Brussels (Belgium), September 2000, pp. 13–21.
[16] B. Zhao and S. Li, "Design of a fuzzy logic controller by ant colony algorithm with application to an inverted pendulum system," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2006, pp. 3790–3794.
[17] W. Zhu, J. Chen, and B. Zhu, "Optimal design of fuzzy controller based on ant colony algorithms," in Proceedings of the IEEE International Conference on Mechatronics and Automation, 2006, pp. 1603–1607.
[18] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[19] L. Busoniu, D. Ernst, B. De Schutter, and R. Babuska, "Continuous-state reinforcement learning with fuzzy approximation," IEEE Transactions on Systems, Man and Cybernetics, Part C, vol. 38, pp. 156–172, 2008.