Rituparna Datta
Kalyanmoy Deb
Editors

Evolutionary Constrained Optimization
Editors
Rituparna Datta
Department of Electrical Engineering
Korea Advanced Institute of Science
and Technology
Daejeon
Republic of Korea
Kalyanmoy Deb
Electrical and Computer Engineering
Michigan State University
East Lansing, MI
USA
ISSN 2363-6149
ISSN 2363-6157 (electronic)
Infosys Science Foundation Series
ISSN 2363-4995
ISSN 2363-5002 (electronic)
Applied Sciences and Engineering
ISBN 978-81-322-2183-8
ISBN 978-81-322-2184-5 (eBook)
DOI 10.1007/978-81-322-2184-5
Library of Congress Control Number: 2014957133
Springer New Delhi Heidelberg New York Dordrecht London
© Springer India 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained
herein or for any errors or omissions that may have been made.
Printed on acid-free paper
Springer (India) Pvt. Ltd. is part of Springer Science+Business Media (www.springer.com)
Preface
optimization. The book can also serve as a textbook for advanced courses and as a
guide to the future direction of research in the area. Many constraint handling
techniques that exist in bits and pieces are assembled together in the present
monograph. Hybrid optimization, which is gaining a lot of popularity today due to its
capability of bridging the gap between evolutionary and classical optimization, is
broadly covered here. These areas will be helpful for researchers, novices, and experts
alike.
The book consists of ten chapters covering diverse topics of constrained
optimization using EAs.
In the first chapter, Helio J.C. Barbosa, Afonso C.C. Lemonge, and Heder S. Bernardino
review adaptive penalty techniques for handling constraints within EAs. The penalty
function approach is one of the most popular constraint handling methodologies due to
its simple working principle and its ease of integration with any unconstrained
technique. The study also indicates the need to implement different adaptive penalty
methods in a single search engine, which would give the decision maker better
information for choosing a particular technique.
A theoretical understanding of constrained optimization is one of the keys to
selecting the best constraint handling mechanism for a given problem.
To address this issue, Shayan Poursoltan and Frank Neumann study the
influence of the fitness landscape in Chap. 2. The study introduces different methods to
quantify the ruggedness of a given constrained optimization problem.
In Chap. 3, Rommel G. Regis proposes a constraint handling method to solve
computationally expensive constrained black-box optimization problems using
surrogate-assisted evolutionary programming (EP). The proposed algorithm creates
surrogate models for the black-box objective function and inequality constraint
functions in every generation of the EP. Furthermore, at the end of each generation,
a trust-region-like approach is used to refine the best solution. Hard and soft
constraints are common in constrained optimization problems.
In Chap. 4, Richard Allmendinger and Joshua Knowles point out a new type of
constraint known as ephemeral resource constraints (ERCs). The authors explain
the presence of ERCs in real-world optimization problems.
A combination of a multi-membered evolution strategy and an incremental
approximation strategy-assisted constraint handling method is proposed by Sanghoun
Oh and Yaochu Jin in Chap. 5 to deal with highly constrained problems whose feasible
regions are tiny and separated in the search space. The proposed approach generates an
approximate model for each constraint function with increasing order of accuracy:
it starts with a linear model and successively increases the complexity until it
approaches that of the original constraint function.
Chapter 6, by Tetsuyuki Takahama and Setsuko Sakai, describes a method
combining the ε-constrained method and estimated comparison. In this method,
rough approximation is utilized to approximate both the objective function and the
constraint violation. The methodology is integrated with differential evolution
(DE), owing to DE's simple working principle and robustness.
Jeremy Porter and Dirk V. Arnold carry out a detailed analysis of the behavior of a
multi-recombinative evolution strategy that combines cumulative step size
adaptation with a simple constraint handling technique in Chap. 7. In order to obtain
the optimal solution at the cone's apex, a linear optimization problem is considered for
analysis, with a feasible region defined by a right circular cone that is symmetric
about the gradient direction.
A niching technique is explored in conjunction with multimodal optimization by
Mohammad Reza Bonyadi and Zbigniew Michalewicz in Chap. 8 to locate feasible
regions, instead of searching for different local optima. Since, in continuous
constrained optimization, the feasible search space is likely to consist of many
disjoint regions, the global optimal solution might be located within any one
of them. Particle swarm optimization is used as the search engine.
In Chap. 9, Rammohan Mallipeddi, Swagatam Das, and Ponnuthurai Nagaratnam
Suganthan present an ensemble of constraint handling techniques (ECHT). Since no
universal constraint handling method exists, an ensemble method can be a suitable
alternative. ECHT is combined with an improved differential evolution (DE) algorithm,
and the proposed technique is known as EPSDE.
Rituparna Datta and Kalyanmoy Deb propose an adaptive penalty function
method using genetic algorithms (GAs) in the concluding chapter (Chap. 10) of this
book. The proposed method amalgamates a bi-objective evolutionary approach
with the penalty function methodology in order to overcome their individual weaknesses.
The bi-objective approach is responsible for approximating the appropriate
penalty parameter and the starting solution for the unconstrained penalized function,
which is then solved by a classical method responsible for exact convergence.
We would like to thank the team at Springer. In particular we acknowledge the
contributions of our Editor, Swati Meherishi, and the editorial assistants, Kamya
Khatter and Aparajita Singh, who helped bring this manuscript to fruition.
Rituparna Datta would like to thank his wife Anima and daughter Riddhi for their
love and affection.
Daejeon, Korea, September 2014
East Lansing, MI, USA
Rituparna Datta
Kalyanmoy Deb
Acknowledgments to Reviewers
With deep gratitude we convey our heartfelt greetings and congratulations to the
following colleagues and key researchers who spared no pains for reviewing this
book to make it a signal success.
Richard Allmendinger, University College London, UK
Dirk Arnold, Dalhousie University, Canada
Helio J.C. Barbosa, Universidade Federal de Juiz de Fora, Brazil
Heder S. Bernardino, Laboratório Nacional de Computação Científica, Brazil
Hans-Georg Beyer, FH Vorarlberg, University of Applied Sciences, Austria
Fernanda Costa, University of Minho, Portugal
Dilip Datta, Tezpur University, India
Oliver Kramer, University of Oldenburg, Germany
Afonso Celso de Castro Lemonge, Federal University of Juiz de Fora, Brazil
Xiaodong Li, RMIT University, Australia
Rammohan Mallipeddi, Kyungpook National University, South Korea
Tomasz Oliwa, Toyota Technological Institute at Chicago, USA
Khaled Rasheed, University of Georgia, USA
Rommel G. Regis, Saint Joseph's University, USA
Chapter 1
1.1 Introduction
Constrained optimization problems are common in the sciences, engineering, and
economics. Due to the growing complexity of the problems tackled, nature-inspired
metaheuristics in general, and evolutionary algorithms in particular, are becoming
H.J.C. Barbosa (B)
National Laboratory for Scientific Computing (LNCC), Petrópolis, Rio de Janeiro, RJ, Brazil
e-mail: hcbm@lncc.br
A.C.C. Lemonge
Department of Applied and Computational Mechanics, Federal University of Juiz de Fora,
Juiz de Fora, MG, Brazil
e-mail: afonso.lemonge@ufjf.edu.br
H.S. Bernardino · H.J.C. Barbosa
Department of Computer Science, Federal University of Juiz de Fora, Juiz de Fora, MG, Brazil
e-mail: heder@ice.ufjf.br
© Springer India 2015
R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,
Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_1
increasingly popular. That is due to the fact that, in contrast to classical mathematical
programming techniques, they can be readily applied to situations where the objective
function(s) and/or constraints are not known as explicit functions of the decision
variables. This happens when potentially expensive computer models (generated by
means of the finite element method (Hughes 1987), for example) must be run in order
to compute the objective function and/or check the constraints every time a candidate
solution needs to be evaluated. For instance, in the design of truss structures, one
possible definition of the problem is to find the cross-section areas of the bars that
minimize the structure's weight subject to limitations on the nodal displacements and
the stress of each bar (Krempser et al. 2012). Notice that although the structure's
weight can be easily calculated from the design variables, the values of the nodal
displacements and of the stress in each bar are determined by solving the equilibrium
equations defined by the finite element model.
As move operators (recombination and mutation) are usually blind to the
constraints (i.e., when operating upon feasible individual(s) they do not necessarily
generate feasible offspring) most metaheuristics must be equipped with a constraint
handling technique. In simpler situations, repair techniques (Salcedo-Sanz 2009),
special move operators (Schoenauer and Michalewicz 1996), or special decoders
(Koziel and Michalewicz 1998) can be designed to ensure that all candidate solutions are feasible.
We do not attempt to survey the current literature on constraint handling in this
chapter, and the reader is referred to survey papers of, e.g., Michalewicz (1995),
Michalewicz and Schoenauer (1996), Coello (2002), and Mezura-Montes and Coello
(2011) as well as to the other chapters in this book. Instead we consider the oldest, and
perhaps most general class of constraint handling methods: the penalty techniques,
where infeasible candidate solutions have their fitness value reduced and are allowed
to coexist and evolve with the feasible ones.
Although conceptually simple, penalty techniques usually require user-defined
problem-dependent parameters, which often significantly impact the performance of
a metaheuristic.
The main focus of this chapter is on adaptive penalty techniques, which automatically set the values of all parameters involved using feedback from the search
process without user intervention. This chapter presents a survey of the most relevant adaptive penalty techniques from the literature as well as a critical assessment of
their assumptions, rationale for the design choices made, and reported performance
on test-problems.
The chapter is structured as follows. Section 1.2 summarizes the penalty method,
Sect. 1.3 introduces the main taxonomy for strategy parameter control, and Sect. 1.4
reviews some representative proposals for adapting penalty parameters. Section 1.5
presents a discussion of the main findings and the chapter ends with some conclusions,
including suggestions for further work in order to increase the understanding of such
adaptive techniques.
(1.1)

v_j(x) =
  max{0, g_j(x)}, for an inequality constraint
  |h_j(x)|, for an equality constraint
    (1.2)

However, the equality constraints h_j(x) = 0 are often replaced by the inequalities
|h_j(x)| − ε ≤ 0, for some small positive ε, and one would have

v_j(x) =
  max{0, g_j(x)}, for an inequality constraint
  max{0, |h_j(x)| − ε}, for an equality constraint
    (1.3)
For computational efficiency, the violations v_j(x) are used to compute a substitute for
d(x, F) in the design of penalty functions that grow with the vector of violations
v(x) ∈ R^m, where m = p + q is the number of constraints to be penalized. At this
point it is easy to see that interior penalty techniques, in contrast to exterior ones,
require feasible solutions (which are often hard to find), thus explaining the high
popularity of the latter.
The most popular penalty function is perhaps (Luenberger and Ye 2008)

P(x) = \sum_{j=1}^{m} (v_j(x))^2    (1.4)
Fig. 1.1 Illustration of situations in which x_1 is closer to the optimum x* than x_2 even when:
(a) f(x_1) = f(x_2) and v(x_1) > v(x_2); or (b) f(x_1) > f(x_2) and v(x_1) = v(x_2)
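To make the notation concrete, the minimal Python sketch below computes the violation vector of Eq. (1.3) and the quadratic penalty of Eq. (1.4); the two constraints used in the example at the bottom are invented purely for illustration.

```python
import numpy as np

def violations(x, ineq_constraints, eq_constraints, eps=1e-4):
    """Vector v(x) of Eq. (1.3): one entry per constraint."""
    v_ineq = [max(0.0, g(x)) for g in ineq_constraints]           # g_j(x) <= 0
    v_eq = [max(0.0, abs(h(x)) - eps) for h in eq_constraints]    # |h_j(x)| <= eps
    return np.array(v_ineq + v_eq)

def quadratic_penalty(x, ineq_constraints, eq_constraints, eps=1e-4):
    """P(x) = sum_j (v_j(x))^2, as in Eq. (1.4)."""
    v = violations(x, ineq_constraints, eq_constraints, eps)
    return float(np.sum(v ** 2))

if __name__ == "__main__":
    # Illustrative constraints: g(x) = x0 + x1 - 1 <= 0 and h(x) = x0 - x1 = 0.
    g = [lambda x: x[0] + x[1] - 1.0]
    h = [lambda x: x[0] - x[1]]
    print(quadratic_penalty(np.array([0.8, 0.5]), g, h))
```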
1.3 A Taxonomy
In order to organize the large amount of penalty techniques available in the literature
Coello (2002) proposed the following taxonomy: (a) static penalty, (b) dynamic
penalty, (c) annealing penalty, (d) adaptive penalty, (e) co-evolutionary penalty, and
(f) death penalty. We think however that the general definitions proposed by Eiben and
Smith (2003) with respect to the way strategy parameters are set within metaheuristics
in general and evolutionary algorithms in particular can be naturally adopted here.
Beyond the simplest static case where strategy parameters are defined by the user
and remain fixed during the run, dynamic schemes have been also used where an
exogenous schedule is proposed in order to define the strategy parameters at any
given point in the search process. It is easy to see that if setting fixed parameters is
not trivial, defining the way they should vary during the run seems to be even harder.
It is also felt that such strategy parameters should not be defined before the run but
rather vary according to what is actually happening in the search process. This gives
rise to the so-called adaptive techniques, where feedback from the search process is
used to define the current strategy parameters.
From the reasoning above, the death penalty can be included as a particular case
of static penalty, and the annealing penalty can be seen as a dynamic penalty scheme.
Co-evolutionary penalty techniques are considered in Sect. 1.5.2.
It should be noted here that the design of the adaptive mechanisms mentioned
above often involve meta-parameters or, at least, implicit design choices. The rationale here is that such meta-parameters should be easier to set appropriately; preferably fixed by the designer, with no posterior user intervention required. However,
the parameter setting in some adaptive techniques can be as hard as in the case of the
static ones (Coello 2002), contradicting the main objective of the adaptive penalty
methods.
Finally, an even more ambitious proposal can be found in the literature: the self-adaptive schemes. In this case, strategy parameters are coded together with the
candidate solution, and conditions are created so that the evolutionary algorithm
not only evolves increasingly better solutions but also better adapted strategy parameters. With this increasing sophistication in the design of the algorithms one not
only seeks to improve performance but also to relieve the user from the task of
strategy parameter setting and control.
However, as will be shown in the next section, another possibility, which has not
been contemplated in the taxonomy considered above, can be found in the literature
for the task of automatically setting strategy parameters. The idea is to maintain an
additional population with the task of co-evolving such strategy parameters (here
penalty coefficients) along with the standard population evolving the solutions to the
constrained optimization problem at hand.
λ(t + 1) =
  λ(t)/β_1, if b_i ∈ F for all t − g + 1 ≤ i ≤ t
  β_2 · λ(t), if b_i ∉ F for all t − g + 1 ≤ i ≤ t
  λ(t), otherwise
The method proposed by Coit et al. (1996) uses the fitness function F(x) written as

F(x) = f(x) + (F_feas − F_all) \sum_{j=1}^{m} \left( \frac{d_j(x, F)}{NFT_j} \right)^{K_j}

where f(x) is the unpenalized objective function for the solution x, F_all corresponds to
the best solution already found, F_feas corresponds to the best feasible solution already
found, and d_j(x, F) returns the distance between x and the feasible region (dependent
on the problem). K_j and NFT_j, the near-feasible threshold of the jth constraint, are
user-defined parameters.
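A minimal sketch of this fitness assignment is given below; the distance measures d_j(x, F), the thresholds NFT_j, and the exponents K_j are assumed to be supplied by the user, and all names are illustrative.

```python
def coit_fitness(x, f, distances, F_feas, F_all, NFT, K):
    """F(x) = f(x) + (F_feas - F_all) * sum_j (d_j(x, F) / NFT_j)^K_j."""
    penalty = sum((d / nft) ** k for d, nft, k in zip(distances(x), NFT, K))
    return f(x) + (F_feas - F_all) * penalty
```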
Rasheed (1998) proposed an adaptive penalty approach for handling constraints
within a GA. The strategy requires the user to set a relatively small penalty parameter,
which is then increased or decreased on demand as the optimization progresses.
The method was tested in a realistic continuous-variable conceptual design of a
supersonic transport aircraft and the design of supersonic missile inlets, as well as
in benchmark engineering problems. The fitness of each individual was based on the
sum of an adequate measure of merit computed by a simulator (such as the take-off
mass of an aircraft) and a penalty term. If the fitness value is between V and 10V, where
V is a power of 10, the penalty coefficient starts with the value V/100. The proposed
algorithm keeps track of two points: (i) the individual that has the least sum of
constraint violations and (ii) the individual that has the best fitness value. The penalty
coefficient is considered adequate if both individuals are the same; otherwise the
penalty coefficient is increased to make the two solutions have equal fitness values.
The author concluded that the idea of starting with a relatively small initial penalty
coefficient and increasing or decreasing it on demand proved to be very good in the
computational experiments conducted.
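One possible reading of this increase-on-demand rule is sketched below; `merit` and `total_violation` stand in for the simulator-based measure of merit and the sum of constraint violations, and the equalization step is only one way of realizing "make the two solutions have equal fitness values".

```python
def adjust_penalty(coeff, population, merit, total_violation):
    """Keep the coefficient if the least-violating and the best-penalized
    individuals coincide; otherwise raise it so that both obtain (roughly)
    equal penalized values (minimization assumed)."""
    penalized = lambda x: merit(x) + coeff * total_violation(x)
    a = min(population, key=total_violation)   # least sum of violations
    b = min(population, key=penalized)         # best penalized value
    if a is b:
        return coeff                           # current coefficient is adequate
    gap = total_violation(b) - total_violation(a)
    if gap <= 0:                               # degenerate case: simply grow it
        return coeff * 2.0
    return max(coeff, (merit(a) - merit(b)) / gap)
```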
Hamida and Schoenauer (2000) proposed an adaptive scheme named Adaptive
Segregational Constraint Handling Evolutionary Algorithm (ASCHEA), employing:
(i) a function of the proportion of feasible individuals in the population; (ii) a seduction/selection strategy to mate feasible and infeasible individuals applying a specific
feasibility-oriented selection operator, and (iii) a selection scheme to give advantage
for a given number of feasible individuals. The ASCHEA algorithm was improved
(Hamida and Schoenauer 2002) by considering a niching technique with adaptive
radius to handle multimodal functions and also (i) a segregational selection that distinguishes between feasible and infeasible individuals, (ii) a constraint-driven recombination, where in some cases feasible individuals can only mate with infeasible ones,
and (iii) a population-based adaptive penalty method that uses global information
on the population to adjust the penalty coefficients. Hamida and Schoenauer (2002)
proposed the following penalty function:
P(x) =
m
j=1
j vj (x)
(1.5)
where j is adapted as
j (t + 1) = j (t)/fact
j (t + 1) = j (t) fact
(1.6)
where fact > 1 and target are to be defined by the user (although the authors suggest
target = 0.5), and t (j) is the proportion of individuals which do not violate the
jth constraint. The idea is to have feasible and infeasible individuals on both sides
of the corresponding boundary. The adapted parameter j , with initial value j (0),
are computed using the first population, trying to balance objective function and
constraint violations:
j (0) = 1,
if
vj (xi ) = 0
i
(1.7)
j (0) =
ni |f (xi )| 100, otherwise
|v (x )|
i
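The per-constraint update of Eq. (1.6) amounts to the small routine below (a sketch; the default factor 1.1 is an arbitrary illustrative value for fact).

```python
def update_alpha(alpha, feasible_fraction, fact=1.1, target=0.5):
    """ASCHEA-style update (Eq. 1.6): weaken the penalty on a constraint that
    'enough' individuals already satisfy, strengthen it otherwise."""
    if feasible_fraction > target:
        return alpha / fact
    return alpha * fact
```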
The early proposals reviewed here were not able in general to adequately deal with
the problem, suggesting that more information from the search process, at the price
of added complexity, was required.
(1.8)
where f_avg represents the average fitness value of all feasible individuals in the current
generation and λ(t) depends on f_avg. Thus, the fitness function is defined as

F(x) = f(x) − λ(t) E(x)    (1.9)

where

E(x) = \sum_{i=1}^{m} v_i(x)    (1.10)
(1.11)
The function λ(t) is defined according to a user-defined parameter φ. If φ ≥ 1 then

λ(t) =    (1.12)

where C is a user-defined parameter which is the maximum scaled fitness value
assigned to the best feasible member. The scaled fitness values are used only in the
selection procedure and will not be described here.
Otherwise (if φ < 1), λ(t) is defined by an iterative process which is initialized with
λ(t) = 1 and is repeated until the value of λ(t) stops changing. The steps of the
procedure are
(i) to calculate λ(t) by means of Eq. (1.11);
(ii) to evaluate the candidate solutions according to Eq. (1.9);
(iii) to obtain x_min and x̃, where F_min = F(x_min) is the minimum value of F and
x̃ is the candidate solution that leads to

λ(t) = \frac{F(x̃) − λ(t) f_avg}{E(x̃)}    (1.13)

λ(t) = \frac{(φ − 1) E(x_min) F(x̃) + E(x̃) F(x_min) + f_avg F(x_min)}{f_avg [E(x̃) + (φ − 1) E(x_min)]}    (1.14)
Beaser et al. (2011) updates the adaptive penalty function theory proposed by
Nanakorn and Meesomklin (2001), expanding its validity beyond maximization
problems to minimization as well. The expanded technique, using a hybrid genetic
algorithm, was applied to a problem in chemistry.
The first modification was introduced in Eq. (1.8):
/F
F(x) (t)favg for x
(1.15)
(1.16)
(1.17)
(1.18)
Besides, two properties are established: (1) the fitness assignment maps the
two-dimensional vector into the real number space; in this way, it is possible to
compare the solutions in the Pareto optimal set, selecting which one is preferable;
and (2) the penalty coefficient C varies with the feasibility proportion of the current
population and, if there are no feasible solutions in the population, this parameter
receives a relatively large value in order to guide the population toward the
feasible space.
The common need for user-defined parameters together with the difficulty of
finding adequate parameter values for each new application pointed the way to the
challenge of designing penalty techniques which do not require such parameters.
F(x) =
  f(x), if x is feasible
  f(x) + \sum_{j=1}^{m} k_j v_j(x), otherwise
    (1.19)

and

k_j = |⟨f(x)⟩| \frac{⟨v_j(x)⟩}{\sum_{l=1}^{m} [⟨v_l(x)⟩]^2}    (1.20)
where ⟨f(x)⟩ is the average of the objective function values in the current population
and ⟨v_l(x)⟩ is the violation of the lth constraint averaged over the current population.
The idea is that the values of the penalty coefficients should be distributed in such a
way that constraints that are more difficult to satisfy have a relatively higher penalty
coefficient.
With the proposed definition one can prove the following property: an individual
whose jth violation equals the average of the jth violation in the current population
for all j, has a penalty equal to the absolute value of the average fitness function of
the population.
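A compact sketch of Eqs. (1.19) and (1.20) for one population follows; the arrays of objective values and violations are assumed to be available (for instance, as produced by a violation routine like the one sketched earlier).

```python
import numpy as np

def apm_fitness(objs, viols):
    """Adaptive penalty fitness (Eqs. 1.19-1.20).

    objs  : shape (pop,)    -- objective values f(x^i)
    viols : shape (pop, m)  -- violations v_j(x^i), zero when satisfied
    """
    objs = np.asarray(objs, dtype=float)
    viols = np.asarray(viols, dtype=float)
    mean_f = objs.mean()
    mean_v = viols.mean(axis=0)                 # <v_j(x)> per constraint
    denom = np.sum(mean_v ** 2)
    k = np.zeros_like(mean_v) if denom == 0 else abs(mean_f) * mean_v / denom
    feasible = viols.sum(axis=1) == 0
    return np.where(feasible, objs, objs + viols @ k)
```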
The performance of the APM was examined using test problems from the evolutionary computation literature as well as structural engineering constrained optimization problems but the algorithm presented difficulties in solving some benchmark
problems, for example, the functions G2 , G6 , G7 and G10 proposed by Michalewicz
and Schoenauer (1996). That was improved in the conference paper (Barbosa and
Lemonge 2002), where f (x) in the definition of the objective function of the infeasible
individuals in Eq. (1.19) was changed to
\bar{f}(x) =
  f(x), if f(x) > ⟨f(x)⟩
  ⟨f(x)⟩, otherwise
    (1.21)

where ⟨f(x)⟩ is the average of the objective function values in the current population.
The new version was tested (Lemonge and Barbosa 2004) on benchmark engineering
optimization problems and on the G-suite (Michalewicz and Schoenauer 1996), with
a more robust performance.
The procedure proposed by Barbosa and Lemonge (2002), originally conceived
for a generational GA, was extended to the case of a steady-state GA (Barbosa and
Lemonge 2003a), where, in each generation, usually only one or two new individuals are introduced in the population. Substantial modifications were necessary to
obtain good results in a standard test-problem suite (Barbosa and Lemonge 2003a).
The fitness function for an infeasible individual is now computed according to the
equation

F(x) = H + \sum_{j=1}^{m} k_j v_j(x)    (1.22)

where H is defined as

H =
  f(x_worst), if there is no feasible element in the population
  f(x_bestFeasible), otherwise
    (1.23)

and

k_j = |H| \frac{⟨v_j(x)⟩}{\sum_{l=1}^{m} [⟨v_l(x)⟩]^2}    (1.24)
Also, every time a better feasible element is found (or the number of new elements
inserted into the population reaches a certain level), H is redefined and all fitness
values are recomputed. The updating of each penalty coefficient is performed in
such a way that no reduction in its value is allowed. The fitness function value is
then computed using Eqs. (1.22)–(1.24). It is clear from the definition of H in (1.23)
that if no feasible element is present in the population, one is actually minimizing
a measure of the distance of the individuals to the feasible set, since the actual
value of the objective function is not taken into account. However, when a feasible
element is found then it immediately enters the population as, after updating all
fitness values using (1.19), (1.23), and (1.24), it becomes the element with the best
fitness value.
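Under the reading of Eqs. (1.22)-(1.24) given above (H taken as the worst objective value when no feasible element exists, and as the best feasible value otherwise), the computation can be sketched as follows; the monotonicity rule that forbids reducing the coefficients is omitted for brevity.

```python
import numpy as np

def steady_state_apm_fitness(objs, viols):
    """Sketch of the steady-state variant (Eqs. 1.22-1.24)."""
    objs = np.asarray(objs, dtype=float)
    viols = np.asarray(viols, dtype=float)
    feasible = viols.sum(axis=1) == 0
    H = objs[feasible].min() if feasible.any() else objs.max()
    mean_v = viols.mean(axis=0)
    denom = np.sum(mean_v ** 2)
    k = np.zeros_like(mean_v) if denom == 0 else abs(H) * mean_v / denom
    return np.where(feasible, objs, H + viols @ k)
```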
Later, APM variants were introduced with respect to the definition of the penalty
parameter kj (Barbosa and Lemonge 2008). The APM, as originally proposed,
computes the constraint violations in the initial population, and updates all penalty
coefficients, for each constraint, after a given number of offspring is inserted in
the population. A second variant, called sporadic APM with constraint violation
accumulation, accumulates the constraint violations during a given number of insertions of new offspring in the population, updates the penalty coefficients, and keeps
the penalty coefficients for the next generations. The APM with monotonic penalty
coefficients is the third variant, where the penalty coefficients are calculated as in
the original method, but no penalty coefficient is allowed to have its value reduced
along the evolutionary process. Finally, the penalty coefficients are defined by using
a weighted average between the previous value of a coefficient and the new value
predicted by the method. This variant is called the APM with damping. Besides that,
these variants of the APM were extended to the steady-state GA and presented in
Lemonge et al. (2012).
Rocha and Fernandes (2009) proposed alternative expressions for the APM
penalty coefficients:

k_j = \left| \sum_{i=1}^{pop} f(x^i) \right| \frac{\sum_{i=1}^{pop} v_j(x^i)}{\sum_{k=1}^{m} \sum_{i=1}^{pop} v_k(x^i)}

and also

k_j = \left| \sum_{i=1}^{pop} f(x^i) \right| \left[ \exp\left( \frac{\sum_{i=1}^{pop} v_j(x^i)}{\sum_{k=1}^{m} \sum_{i=1}^{pop} v_k(x^i)} \right) - 1 \right]
In another proposal, the overall degree of infeasibility of a candidate solution is
measured as

u(x) = \frac{1}{m} \sum_{j=1}^{m} \frac{v_j(x)}{v_j^{max}}    (1.25)
where m is the total number of inequality and equality constraints, and v_j^{max}
is the maximum value of the jth violation in the current population. The x_worst of the
infeasible solutions is selected by comparing all infeasible individuals against the
best individual x_best. Two population distributions are possible in this respect:
(i) one or more of the infeasible solutions have an objective function value that
is lower than f(x_best); in this case x_worst is taken as the infeasible solution
having the highest infeasibility value among those whose objective function value is
lower than f(x_best), and if more than one individual has the same highest degree of
infeasibility, x_worst is the one with the lower objective function value;
(ii) all of the infeasible solutions have an objective function value greater than
f(x_best); in this case x_worst is the solution with the highest degree of
infeasibility, and if more than one individual has the same highest infeasibility
value, x_worst is the one with the higher objective function value. The highest
objective function value in the current population, used to penalize the infeasible
individuals, is denoted by f(x_max). The method is applied in two stages. The first
stage considers the case where one or more infeasible solutions have a lower, and
potentially better, objective function value (minimization problem) than the x_best
solution, i.e., ∃ x such that f(x) < f(x_max) and u(x) > 0.0. A linear relationship
between the degrees of infeasibility of x_best and x_worst is considered as

\hat{u}(x) = \frac{u(x) - u(x_worst)}{u(x_best) - u(x_worst)}    (1.26)

Thus, the fitness function F_1st(x) in the first stage is written as

F_1st(x) = f(x) + \hat{u}(x) (f(x_max) - f(x_worst))    (1.27)
The second stage increases the objective function value such that the penalized
objective function of the worst infeasible individual, F_2nd(x_worst), is equal to that
of the worst objective individual (Eqs. (1.28) and (1.29)):

F_2nd(x) = F_1st(x) + γ |F_1st(x)| \frac{\exp(2.0 u(x)) - 1}{\exp(2.0) - 1}    (1.28)

and

γ =
  \frac{f(x_max) - f(x_best)}{f(x_best)}, if f(x_worst) ≤ f(x_best)
  0, if f(x_worst) = f(x_max)
  \frac{f(x_max) - f(x_worst)}{f(x_worst)}, if f(x_worst) > f(x_best)
    (1.29)
The scaling factor γ is introduced to ensure that the penalized value of the worst
infeasible solution is equivalent to the highest objective function value in the current
population. γ = 0 (second case in Eq. (1.29)) is used when the worst infeasible
individual has an objective function value equal to the highest in the population. In this
case, no penalty is applied, since the infeasible solutions would naturally have a low
fitness and should not be penalized further. The use of absolute values of the fitness
function in Eq. (1.29) is considered since minimization of objective functions may
have negative values.
A self-organizing adaptive penalty strategy (SOAPS) is presented in Lin and Wu
(2004) featuring the following aspects: (1) The values of penalty parameters are
automatically determined according to the population distribution; (2) The penalty
parameter for each constraint is independently determined; (3) The objective and
constraint functions are automatically normalized; (4) No parameters need to be
defined by the user; (5) Solutions are maintained evenly distributed in both feasible
and infeasible parts of each constraint. The pseudo objective function defined by
the proposed algorithm is given as
F(x) = f(x) + P(x)    (1.30)

with

P(x) = \frac{100}{p + 2q} \sum_{j=1}^{m} r_j^t v_j(x)    (1.31)

where t is the generation, r_j^t is the penalty parameter for the jth constraint at
generation t, and p and q are the numbers of inequality and equality constraints,
respectively.
The penalty parameter r_j^t for the jth constraint at the tth generation is set as

r_j^t = r_j^{t-1} \left( 1 - \frac{\tau^{t-1}(j) - 0.5}{5} \right)    (1.32)

where \tau^t(j) is the percentage of feasible solutions with respect to the jth constraint
at the tth generation. This parameter will be adapted during the evolutionary process
and its initial value is set as
r_j^0 = \frac{QR^1_{obj}}{QR^1_{con_j}}    (1.33)

where QR^1_{obj} and QR^1_{con_j} are the interquartile ranges of the objective function
values and of the jth constraint function values, respectively, in the initial population.
Although the proposed algorithm performed satisfactorily on constrained optimization problems with inequality constraints, it had difficulties in solving problems
with equality constraints. The authors presented in the same paper (Wu and Lin
2004) a modification (with added complexity) of the first version of the algorithm.
They detected that the initial penalty parameter for a constraint may become undesirably large due to the poor initial population distribution. A sensitivity analysis of
the parameter rj0 was done by the authors and they concluded that enlarged penalties
undesirably occur because solutions with these unexpected large constraint violations
are not evenly sampled in the initial population. The value for F(x) in the second
generation of SOAPS is written as
F(x) =
  f(x), if x ∈ F
  f(x)(1 - r_{GEN}) + F_{BASE} · r_{GEN} + P(x), otherwise
    (1.34)
where F_{BASE} denotes the minimum value over all feasible solutions or, in their
absence, over the infeasible solutions with the smallest amount of constraint violation.
The value of rGEN is given by the number of function evaluations performed so far
divided by the total number of function evaluations. The expression for P(x) is
P(x) = \sum_{j} r_j v_j(x)    (1.35)

and

r_j^0 =
  \frac{med^1_{obj,feas_j} - med^1_{obj,infeas_j}}{med^1_{con_j}}, if med^1_{obj,feas_j} ≥ med^1_{obj,infeas_j}
  0.5 \, \frac{med^1_{obj,infeas_j} - med^1_{obj,feas_j}}{med^1_{con_j}}, otherwise
    (1.36)
where med^1_{obj,feas_j} is the median of the objective function values of the feasible
solutions, and med^1_{obj,infeas_j} is the median of the objective function values of all
infeasible solutions, with respect to the jth constraint, in the initial population. The
value med^1_{con_j} represents the median of all constraint violations of the jth
constraint in the initial population. The value of med_{obj,feas}, used in Eq. (1.36), is
written as

med_{obj,feas} = med_{F,feas} = med_{F,infeas} = med_{obj,infeas} + r · med_{con}    (1.37)

where med_{F,feas} is the median of the pseudo-objective function values of feasible
designs, and med_{F,infeas} is the median of the pseudo-objective function values of
infeasible designs. The latter consists of med_{obj,infeas}, the median of the objective
function values of all infeasible designs, and med_{con}, the median of the constraint
violations of all infeasible designs. The second generation of SOAPS was
tested in two numerical illustrative problems and one engineering problem.
Tessema and Yen (2006) proposed an adaptive penalty function for solving constrained optimization problems using a GA. A new fitness value, called distance
value, in the normalized fitness-constraint violation space, and two penalty values
are applied to infeasible individuals so that the algorithm would be able to identify the best infeasible individuals in the current population. The performance of the
algorithm was tested on the G1 to G13 test-problems and the algorithm was considered
able to find competitive results when compared with others from the literature.
In (Tessema and Yen 2009) an algorithm that aims to exploit infeasible individuals
with low objective value and low constraint violation was proposed. The fraction
of feasible individuals in the population is used to guide the search process either
toward finding more feasible individuals or searching for the optimum solution. The
objective function of all individuals in the current population will be evaluated first,
and the smallest and the largest values will be identified as fmin and fmax , respectively.
The fitness function of each individual is normalized as
\tilde{f}(x) = \frac{f(x) - f_{min}}{f_{max} - f_{min}}    (1.38)
f (x),
u(x),
if
there is no feasible ind.
F(x) =
\hat{f}(x_i) =
  \tilde{f}(x_i), if x_i ∈ K1
  \max\{ r_f \tilde{f}(x_best) + (1 - r_f) \tilde{f}(x_worst), \tilde{f}(x_i) \}, if x_i ∈ K2
    (1.39)
\tilde{G}(x_i) =
  0, if x_i ∈ K1
  \frac{G(x_i) - \min_{x ∈ K2} G(x)}{\max_{x ∈ K2} G(x) - \min_{x ∈ K2} G(x)}, if x_i ∈ K2
    (1.40)

If only one infeasible solution appears in the population, its normalized constraint
violation \tilde{G} would always be equal to 0. To avoid this, the normalized constraint
violation \tilde{G} of such an individual is set to a value uniformly chosen between
0 and 1. The fitness function is defined by adding the normalized objective function
values and constraint violations:

F(x_i) = \hat{f}(x_i) + \tilde{G}(x_i)    (1.41)
3. Feasible situation: in this case, the comparisons of individuals are based only on
the objective function f (x).
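The normalizations of Eqs. (1.38), (1.40), and (1.41) can be sketched as below for the case in which both feasible and infeasible individuals are present; the special handling of the feasible-only and single-infeasible situations described above is omitted.

```python
import numpy as np

def normalized_fitness(objs, total_viols):
    """Normalized objective plus normalized violation (Eqs. 1.38, 1.40, 1.41)."""
    f = np.asarray(objs, dtype=float)
    G = np.asarray(total_viols, dtype=float)
    f_tilde = (f - f.min()) / max(f.max() - f.min(), 1e-12)
    infeas = G > 0
    G_tilde = np.zeros_like(G)
    if infeas.any():
        g_min, g_max = G[infeas].min(), G[infeas].max()
        G_tilde[infeas] = (G[infeas] - g_min) / max(g_max - g_min, 1e-12)
    return f_tilde + G_tilde            # Eq. (1.41)
```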
Costa et al. (2013) proposed an adaptive constraint handling technique where the
fitness function of an infeasible individual is defined as
F(x) = f_{max} + \sum_{j=1}^{m} v_j(x)    (1.42)
where v_j(x) is defined as in Eq. (1.3). An adaptive tolerance was introduced in order
to handle equality constraints. An initial tolerance ε_0 is defined and it is adaptively
updated along the evolutionary process, with a given periodicity (in generations),
according to the expression

ε_{k+1} = α ε_k + (1 − α) ‖C_{best}‖_2    (1.43)

where α is a smoothing factor, C_{best} is the vector of equality constraint values for
the best point in the population, and ‖·‖_2 denotes the Euclidean norm.
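The tolerance update of Eq. (1.43) is straightforward; in the sketch below the smoothing factor 0.9 is only an illustrative default.

```python
import numpy as np

def update_tolerance(eps_k, eq_constraint_values_best, alpha=0.9):
    """eps_{k+1} = alpha * eps_k + (1 - alpha) * ||C_best||_2  (Eq. 1.43)."""
    norm_best = float(np.linalg.norm(eq_constraint_values_best))
    return alpha * eps_k + (1.0 - alpha) * norm_best
```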
A parameterless adaptive penalty technique used within a GA has been proposed
in Vincenti et al. (2010) and Montemurro et al. (2013), where the basic idea is that some
good infeasible individuals (in the sense of having good objective function values)
can be useful to attract the exploration toward the boundary of the feasible domain,
as the optimum usually has some active constraints. The penalty coefficients c_i and q_j
(for inequality and equality constraints, respectively) are computed at each generation
t as
c_i(t) = \frac{f^{F}_{best} - f^{NF}_{best}}{(g_i)^{NF}_{best}}, \quad i = 1, \ldots, q,
\qquad
q_j(t) = \frac{f^{F}_{best} - f^{NF}_{best}}{(h_j)^{NF}_{best}}, \quad j = 1, \ldots, p
    (1.44)
where the superscripts F and NF stand for feasible and non-feasible, respectively.
f^{F}_{best} and f^{NF}_{best} are the values of the objective function for the best
individuals within the feasible and the infeasible sides of the domain, respectively,
while (g_i)^{NF}_{best} and (h_j)^{NF}_{best} represent the violation of the inequality
and equality constraints, respectively, for the best infeasible solution.
Individuals that are infeasible with respect to the kth constraint are grouped and
ranked with respect to their objective function values; the objective function of the
best individual of this group is f^{NF}_{best}. The individuals that are feasible with
respect to the kth constraint are likewise grouped and ranked with respect to their
objective function values; the objective function of the best individual of this group
is f^{F}_{best}. When no feasible individuals are available in the population with
respect to the kth constraint, the population is sorted into two groups: the individuals
having the smaller values of the kth constraint violation (10 % of the population) are
grouped as virtually feasible, while the rest are grouped as infeasible and ranked in
terms of their objective function values; the objective function of the best individual
of the latter group is f^{NF}_{best}.
It is worth noting that the definition in Eq. (1.44) forces the penalized objective
function value of the best infeasible individual to be equal to that of the best feasible
individual. In the next section, further (perhaps less popular) ways of implementing
penalty techniques are briefly described.
to the best of our knowledge, no strict self-adaptive technique has been applied so
far to constrained optimization problems in Rn .
F(x) = f(x) + w_1 · sum_viol(x) + w_2 · num_viol(x)    (1.45)

where w_1 and w_2 are two (integer) penalty coefficients, and sum_viol(x) and
num_viol(x) are, respectively, the sum of the violations and the number of constraints
which are violated by the candidate solution x. The second of these populations, P2,
encodes the set of weight combinations (w_1 and w_2) that will be used to compute
the fitness value of the candidate solutions in P1; that is, P2 contains the penalty
coefficients that will be used in the fitness function evaluation. Benchmark problems
from the literature, especially from mechanical engineering optimization, are used in
the numerical tests, but only inequality constraints were considered in the experiments.
The co-evolutionary idea was also analyzed in He and Wang (2007) and He et al.
(2008). In these works, the penalty factors are adapted by a co-evolutionary particle
swarm optimization approach (CPSO). Two kinds of swarms are used in He and
Wang (2007) and He et al. (2008): one population of multiple swarms is used to
solve the search problem and other one is responsible to adapt the penalty factors.
Each particle j in the second population represents the penalty coefficients for a set
of particles in the first one. The two populations evolve by a given G1 and G2 number
of generations. The adopted fitness function is the one proposed by Richardson et al.
(1989), where not only the amount of violation contributes to the quality of a given
candidate solution but also the number of violated constraints. According to He
and Wang (2007) and He et al. (2008),

F_j(x) = f(x) + sum_viol(x) · w_{j,1} + num_viol(x) · w_{j,2},

where f(x) is the objective function value, and w_{j,1} and w_{j,2} are the penalty
coefficients from particle j in the second swarm population. The penalty factors w_{j,1}
and w_{j,2} are evolved according to the following fitness:
G(j) =
  \frac{sum\_feas}{num\_feas} - num\_feas, if there is at least one feasible solution in the subset
  \max(G_{valid}) + \frac{\sum_{i=1}^{pop} sum\_viol(x^i)}{\sum_{i=1}^{pop} num\_viol(x^i)} + \sum_{i=1}^{pop} num\_viol(x^i), otherwise
where sum_feas denotes the sum of objective function values of feasible solutions,
num_feas is the number of feasible individuals, and max(Gvalid ) denotes the maximum
G over all valid particles; the valid particles are those ones which operate over a subset
of particles where there is at least one feasible solution.
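The penalized fitness used for the solution swarm, with the coefficients carried by particle j of the penalty swarm, can be sketched as follows (all names are illustrative).

```python
def penalized_fitness(f_value, sum_viol, num_viol, w_j):
    """F_j(x) = f(x) + sum_viol(x) * w_{j,1} + num_viol(x) * w_{j,2}."""
    w1, w2 = w_j
    return f_value + sum_viol * w1 + num_viol * w2
```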
F(x) = f(x) + r G(x)    (1.46)

where G(x) is the amount of constraint violation from inequality and equality
constraints, and r is the penalty coefficient.
f and G are taken as fuzzy variables with corresponding linguistic values such
as very large, large, small, very small, etc. The ranges for f and G are defined by
D_f = [f_{min}, f_{max}] and D_G = [G_{min}, G_{max}]. Those ranges must then be
partitioned (a problem-dependent, non-trivial task), and linguistic values are associated
with each part. The sets A and B are introduced as fuzzy sets for f and G, respectively,
and r^k, k = 1, \ldots, l, is defined as a fuzzy singleton for r, which is inferred from
appropriate membership functions and finally used in (1.46).
In their numerical experiments, three partitions were used for both f and G with
triangle membership functions, and five points were used for the output. The rule
base contained 9 rules of the form: If f is A_i and G is B_j, then r = r^k.
Lin (2013) proposed perhaps the first constraint-handling approach which applies
the information granulation of rough set theory to address the indiscernibility relation
among penalty coefficients in constrained optimization. Adaptive penalty coefficients
for each constraint, w_j^t, j = 1, \ldots, m, were defined in such a way that a high
penalty is assigned to the coefficient of the most difficult constraint. In addition, the
coefficients also depend on the current generation number t. Using the standard
definition for the violation of the jth constraint, v_j(x), the fitness function reads as

F(x) = f(x) + \sum_{j=1}^{m} w_j^t v_j(x)
where w_j^t = (C t)^{λ(j,t)} and C is a severity factor. The exponent λ(j, t), initialized
as λ(j, 0) = 2 for all j, is defined as

λ(j, t) =
  λ(j, t − 1) ν_j, if δ_j = 1
  λ(j, t − 1), if δ_j = 0

according to the discernible mask δ and the representative attribute value ν_j of the
superior class X_{good} (see the paper for details). If the jth constraint is discernible
(i.e., δ_j = 1), the exponent λ(j, t) is adjusted by the representative attribute value
ν_j; otherwise, the exponent retains the same value as in the previous generation.
1.6 Discussion
1.6.1 User-Defined Parameters
Some of the proposals considered do not require from the user the definition of penalty
parameters, and can as such be considered parameterless. This is very useful for
the practitioner. However, it should be noted that essentially all proposals do embody
some fixed values that are hidden from the user and, as a result, cannot be changed.
Furthermore, all proposals involve design decisions which were made, with varying
levels of justification, and incorporated into the definition of the technique. It
seems natural to assume that some of those could possibly be changed (a research
opportunity), leading to improved results.
Another major issue that makes it impossible to rigorously assess the relative
performance of the adaptive penalty techniques (APTs) reviewed is that the final
results depend not only on the penalty technique considered but also on the search
engine (SE) adopted. The competing results often derive from incomparable arrangements such as APT-1 embedded in SE-1 (a genetic algorithm, for instance) versus
APT-2 applied to SE-2 (an evolution strategy, for instance). The results using stochastic ranking (SR) within an evolution strategy (ES) (Runarsson and Yao 2000) were
shown to outperform APM embedded in a binary-coded genetic algorithm (GA)
(Lemonge and Barbosa 2004) when applied to a standard set of benchmark constrained optimization problems in R^n. This seems to be due, at least in part, to the
fact that the ES adopted performs better in this continuous domain than a standard GA.
A proper empirical assessment of the constraint handling techniques considered
(SR versus APM) should be performed by considering settings such as (SR+GA versus APM+GA) and (SR+ES versus APM+ES). An attempt to clarify this particular
question is presented by Barbosa et al. (2010b). It is clear that there is a need for
more studies of this type in order to better assess the relative merits of the proposals
reviewed here.
The standard way of assessing the relative performance of a set A of n_a
algorithms a_i, i ∈ {1, \ldots, n_a}, is to define a set P of n_p representative problems
p_j, j ∈ {1, \ldots, n_p}, and then test all algorithms against all problems, measuring the
performance t_{p,a} of algorithm a ∈ A when applied to problem p ∈ P.
In order to evaluate tp,a one can alternatively (i) define a meaningful goal
(say, level of objective function value) and then measure the amount of resources
(say, number of function evaluations) required by the algorithm to achieve that goal,
or (ii) fix a given amount of resources to be allocated to each algorithm and then
measure the goal attainment.
Considering that t_{p,a} is the CPU time spent by algorithm a to reach the stated goal
in problem p, a performance ratio can be defined as

r_{p,a} = \frac{t_{p,a}}{\min\{t_{p,a} : a ∈ A\}}    (1.47)
Although each tp,a or rp,a is worth considering by itself, one would like to be able
to assess the performance of the algorithms in A on a large set of problems P in a
user-friendly graphical form. This has been achieved by Dolan and Moré (2002), who
introduced the so-called performance profiles, an analytical tool for the visualization
and interpretation of the results of benchmark experiments. For more details and an
application in the constrained optimization case, see Barbosa et al. (2010a).
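The performance ratios of Eq. (1.47) and the resulting profile curves are easy to compute; the sketch below uses the standard Dolan-Moré definition of the profile as the fraction of problems on which an algorithm's ratio does not exceed a threshold τ.

```python
import numpy as np

def performance_ratios(times):
    """times[p][a]: cost of algorithm a on problem p. Returns r_{p,a} (Eq. 1.47)."""
    t = np.asarray(times, dtype=float)
    return t / t.min(axis=1, keepdims=True)

def performance_profile(ratios, algorithm, taus):
    """Fraction of problems with r_{p,a} <= tau, for each tau."""
    r = np.asarray(ratios, dtype=float)[:, algorithm]
    return [float(np.mean(r <= tau)) for tau in taus]

if __name__ == "__main__":
    # Two algorithms on three problems (illustrative numbers).
    r = performance_ratios([[10.0, 12.0], [5.0, 4.0], [7.0, 21.0]])
    print(performance_profile(r, algorithm=0, taus=[1.0, 2.0, 4.0]))
```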
One has also to consider that it is not an easy task to define a set P which is
representative of the domain of interest, as one would like P (i) to span the target
problem-space and, at the same time, (ii) to be as small as possible, in order to
alleviate the computational burden of the experiments. Furthermore, it would also
be interesting to assess the relative performance of the test-problems themselves
with respect to the solvers. Are all test-problems relevant to the final result? Are
some test-problems too easy (or too difficult) so that they do not have the ability to
discriminate the solvers? Efforts in this direction, exploring the performance profile
concept, were attempted in Barbosa et al. (2013).
1.6.4 Extensions
It seems natural to expect that most of, if not all, the proposals reviewed here can
be easily extended to the practically important case of constrained multi-objective
optimization. Although papers presenting such extension have not been reviewed
here, it seems that there is room, and indeed a need, to explore this case.
The same can perhaps be said of the relevant case of mixed (discrete and
continuous) decision variables, as well as the more complex problem of constrained
multi-level optimization.
1.7 Conclusion
This chapter presented a review of the main adaptive penalty techniques available
for handling constraints within nature inspired metaheuristics in general and evolutionary techniques in particular. The main types of evidence taken from the search
process in order to inform the decision-making process of continuously adapting the
relevant parameters of the penalty technique have been identified.
As the different adaptive techniques have not been implemented on a single
given search engine, the existing comparative studies, which are usually based on
the final performance on a set of benchmark problems, are not very informative of the
relative performance of each penalty technique, as the results are also affected by the
different search engines adopted in each proposal. The need for better comparative
studies, investigating the relative performance of the different adaptive techniques
when applied within a single search engine to larger and more representative sets of
benchmark problems, is also identified.
Acknowledgments The authors thank the reviewers for their comments, which helped improve
the quality of the final version, and acknowledge the support from CNPq (grants 308317/2009-2,
310778/2013-1, 300192/2012-6 and 306815/2011-7) and FAPEMIG (grant TEC 528/11).
References
Barbosa HJC, Lemonge ACC (2002) An adaptive penalty scheme in genetic algorithms for constrained optimization problems. In: Langdon WB, Cantú-Paz E, Mathias KE, Roy R, Davis D,
Poli R, Balakrishnan K, Honavar V, Rudolph G, Wegener J, Bull L, Potter MA, Schultz AC,
Miller JF, Burke EK (eds) Proceedings of the genetic and evolutionary computation conference
(GECCO). Morgan Kaufmann, San Francisco
Barbosa HJC, Lemonge ACC (2003a) An adaptive penalty scheme for steady-state genetic algorithms. In: Cantú-Paz E, Foster JA, Deb K, Davis LD, Roy R, O'Reilly U-M, Beyer H-G, Standish
R, Kendall G, Wilson S, Harman M, Wegener J, Dasgupta D, Potter MA, Schultz AC, Dowsland
KA, Jonoska N, Miller J (eds) Genetic and evolutionary computation (GECCO). Lecture Notes
in Computer Science. Springer, Berlin, pp 718–729
Barbosa HJC, Lemonge ACC (2003b) A new adaptive penalty scheme for genetic algorithms. Inf
Sci 156:215–251
Barbosa HJC, Lemonge ACC (2008) An adaptive penalty method for genetic algorithms in constrained optimization problems. Front Evol Robot 34:934
Barbosa HJC, Bernardino HS, Barreto AMS (2010a) Using performance profiles to analyze the
results of the 2006 CEC constrained optimization competition. In: 2010 IEEE congress on evolutionary computation (CEC), pp 1–8
Barbosa HJC, Lemonge ACC, Fonseca LG, Bernardino HS (2010b) Comparing two constraint
handling techniques in a binary-coded genetic algorithm for optimization problems. In: Deb K,
Bhattacharya A, Chakraborti N, Chakroborty P, Das S, Dutta J, Gupta SK, Jain A, Aggarwal V,
Branke J, Louis SJ, Tan KC (eds) Simulated evolution and learning. Lecture Notes in Computer
Science. Springer, Berlin, pp 125–134
Barbosa HJC, Bernardino HS, Barreto AMS (2013) Using performance profiles for the analysis
and design of benchmark experiments. In: Di Gaspero L, Schaerf A, Stutzle T (eds) Advances in
metaheuristics. Operations Research/computer Science Interfaces Series, vol 53. Springer, New
York, pp 21–36
Bean J, Alouane A (1992) A Dual Genetic Algorithm For Bounded Integer Programs. Technical Report Tr 92-53, Department of Industrial and Operations Engineering, The University of
Michigan
Beaser E, Schwartz JK, Bell CB, Solomon EI (2011) Hybrid genetic algorithm with an adaptive
penalty function for fitting multimodal experimental data: application to exchange-coupled non-Kramers binuclear iron active sites. J Chem Inf Model 51(9):2164–2173
Coello CAC (2000) Use of a self-adaptive penalty approach for engineering optimization problems.
Comput Ind 41(2):113–127
Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191
(11–12):1245–1287
Coit DW, Smith AE, Tate DM (1996) Adaptive penalty methods for genetic optimization of constrained combinatorial problems. INFORMS J Comput 8(2):173–182
Costa L, Santo IE, Oliveira P (2013) An adaptive constraint handling technique for evolutionary
algorithms. Optimization 62(2):241–253
Courant R (1943) Variational methods for the solution of problems of equilibrium and vibrations.
Bull Am Math Soc 49:1–23
Dolan E, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math
Program 91(2):201–213
Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Springer, New York
Eiben AE, Jansen B, Michalewicz Z, Paechter B (2000) Solving CSPs using self-adaptive constraint
weights: how to prevent EAs from cheating. In: Whitley, LD (ed) Proceedings of the genetic and
evolutionary computation conference (GECCO). Morgan Kaufmann, San Francisco, pp 128–134
Farmani R, Wright J (2003) Self-adaptive fitness formulation for constrained optimization. IEEE
Trans Evol Comput 7(5):445–455
Gan M, Peng H, Peng X, Chen X, Inoussa G (2010) An adaptive decision maker for constrained
evolutionary optimization. Appl Math Comput 215(12):4172–4184
Gen M, Cheng R (1996) Optimal design of system reliability using interval programming and
genetic algorithms. Comput Ind Eng, (In: Proceedings of the 19th international conference on
computers and industrial engineering), vol 31(1–2), pp 237–240
Hamida H, Schoenauer M (2000) Adaptive techniques for evolutionary topological optimum design.
In: Parmee I (ed) Proceedings of the international conference on adaptive computing in design
and manufacture (ACDM). Springer, Devon, pp 123–136
Hamida S, Schoenauer M (2002) ASCHEA: new results using adaptive segregational constraint
handling. In: Proceedings of the IEEE service center congress on evolutionary computation
(CEC), vol 1. Piscataway, New Jersey, pp 884–889
Harrell LJ, Ranjithan SR (1999) Evaluation of alternative penalty function implementations in a
watershed management design problem. In: Proceedings of the genetic and evolutionary computation conference (GECCO), vol 2. Morgan Kaufmann, pp 1551–1558
He Q, Wang L (2007) An effective co-evolutionary particle swarm optimization for constrained
engineering design problems. Eng Appl Artif Intell 20(1):89–99
He Q, Wang L, Huang F-z (2008) Nonlinear constrained optimization by enhanced co-evolutionary PSO. In: IEEE congress on evolutionary computation, CEC 2008 (IEEE World
Congress on Computational Intelligence), pp 83–89
Hughes T (1987) The finite element method: linear static and dynamic finite element analysis.
Prentice Hall Inc, New Jersey
Koziel S, Michalewicz Z (1998) A decoder-based evolutionary algorithm for constrained parameter
optimization problems. In: Eiben A, Bäck T, Schoenauer M, Schwefel H-P (eds) Parallel problem
solving from nature (PPSN). LNCS, vol 1498. Springer, Berlin, pp 231–240
Krempser E, Bernardino H, Barbosa H, Lemonge A (2012) Differential evolution assisted by surrogate models for structural optimization problems. In: Proceedings of the international conference
on computational structures technology (CST). Civil-Comp Press, p 49
Lemonge ACC, Barbosa HJC (2004) An adaptive penalty scheme for genetic algorithms in structural
optimization. Int J Numer Methods Eng 59(5):703–736
Lemonge ACC, Barbosa HJC, Bernardino HS (2012) A family of adaptive penalty schemes for
steady-state genetic algorithms. In: 2012 IEEE congress on evolutionary computation (CEC).
IEEE, pp 1–8
Liang J, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan P, Coello CC, Deb K (2006) Problem
definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter
optimization. Technical report, Nanyang Technological University, Singapore
Lin C-H (2013) A rough penalty genetic algorithm for constrained optimization. Inf Sci 241:119–137
Lin C-Y, Wu W-H (2004) Self-organizing adaptive penalty strategy in constrained genetic search.
Struct Multidiscip Optim 26(6):417–428
Luenberger DG, Ye Y (2008) Linear and nonlinear programming. Springer, New York
Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010
competition on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Mezura-Montes E, Coello CAC (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173–194
Chapter 2
2.1 Introduction
Constrained optimization problems (COPs), especially nonlinear ones, are important
and widespread in many real-world applications such as chemical engineering, VLSI
chip design, and structural design (Floudas and Pardalos 1990). Various algorithmic
approaches have been introduced to tackle constrained optimization problems. The
major component of these optimization algorithms is devoted to the handling of the
involved constraints.
A classical approach measures the autocorrelation of the fitness values of search points that are visited by a random walk on the landscape (Weinberger 1990). Furthermore, there have been many studies that extend the basic autocorrelation approach to provide additional insights into fitness landscapes (Box et al. 2013; Hordijk 1996). One drawback of such autocorrelation-based statistical analyses is that the calculated value is a rather vague quantity that does not clearly reflect the landscape ruggedness. Thus, Vassilev proposed a new technique based on the assumption that each landscape is an ensemble of different objects (the nodes seen by a random walk on the fitness landscape), which can be grouped by their form, size, and distribution (Vassilev et al. 2000). Vassilev's approach was applicable to discrete problems. For real-parameter landscapes, Malan and Engelbrecht (2009) used Vassilev's information-theoretic analysis to measure the fitness landscape ruggedness in the continuous domain. So far, these landscape analysis techniques have been applied only to unconstrained or discrete problems. Measuring the landscape ruggedness of constrained continuous problems imposes additional challenges, and we will propose how to tackle them in this chapter.
We propose an approach to measure the fitness landscape ruggedness of constrained continuous optimization problems. The quantification of ruggedness, combined with other analytical problem characteristics, can help to build an algorithm selection model based on the relation between different algorithms and problem properties. This chapter includes a methodology for quantifying the fitness landscape ruggedness of constrained continuous problems. In order to do this, we extend Malan's approach to constrained continuous problems. The information obtained by simple random walks on a constrained problem's landscape is not useful enough, since it is mostly related to infeasible areas that are unlikely to be seen by the solver. To cope with constraints in nearly infeasible problems, our approach replaces Malan's random walk with a biased one. The obtained samples are used to quantify the ruggedness of landscapes using the approach of Vassilev et al. (2000). We evaluate our approach on well-known benchmarks taken from the recent CEC competitions (Mallipeddi and Suganthan 2010) and discuss the benefits and drawbacks of our new approach.
The remainder of this chapter is organized as follows: In Sect. 2.2, we introduce
constrained continuous optimization and discuss approaches that have been used to
analyze the ruggedness of unconstrained fitness landscapes. We present our approach
for quantifying ruggedness of constrained continuous fitness landscapes in Sect. 2.3
and the results of our experimental investigations in Sect. 2.4. Finally, we end our
research with some concluding remarks.
2.2 Preliminaries
In this section, we introduce basic notations and summarize the previous works on
measuring the ruggedness of fitness landscapes.
A constrained continuous optimization problem can be stated as: minimize

f(x),  x = (x_1, ..., x_n) ∈ R^n    (2.1)

such that x ∈ F ⊆ S. The search space S is defined by the bounds

l_i ≤ x_i ≤ u_i,  1 ≤ i ≤ n    (2.2)

where l_i and u_i are lower and upper bounds on the variable x_i, 1 ≤ i ≤ n. The feasible region F ⊆ S is given by additional constraints defined by the functions

g_i(x) ≤ 0,  1 ≤ i ≤ q,
h_i(x) = 0,  q + 1 ≤ i ≤ p.

In order to work with iterative optimization algorithms for these problems, it is common to relax the equality constraints

h_i(x) = 0,  q + 1 ≤ i ≤ p

to

|h_i(x)| ≤ ε,  q + 1 ≤ i ≤ p    (2.3)

where ε is a very small positive value that determines how much the original constraints can be violated. In our experimental study, we work with ε = 0.0001, which is the same setting as used in Mallipeddi and Suganthan (2010).
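As an illustration, the following Python sketch shows how a constraint violation measure can be computed under this ε-relaxation. The helper name and the summation of the individual violations into one value are illustrative assumptions, not part of the chapter's formulation.

import numpy as np

EPS = 1e-4  # the epsilon used to relax the equality constraints, as in the text

def constraint_violation(x, ineq_constraints, eq_constraints, eps=EPS):
    """Sum of violations of g_i(x) <= 0 and |h_i(x)| <= eps at point x.

    ineq_constraints and eq_constraints are lists of callables; a return
    value of 0 means x is feasible under the eps-relaxation.
    """
    viol = 0.0
    for g in ineq_constraints:
        viol += max(0.0, g(x))              # only positive parts violate g(x) <= 0
    for h in eq_constraints:
        viol += max(0.0, abs(h(x)) - eps)   # equality treated as |h(x)| <= eps
    return viol

# Example: one inequality and one relaxed equality constraint
g = [lambda x: x[0] + x[1] - 1.0]           # x1 + x2 <= 1
h = [lambda x: x[0] - x[1]]                 # x1 = x2 (relaxed)
print(constraint_violation(np.array([0.4, 0.4]), g, h))  # 0.0 -> feasible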
Given the sequence of fitness values f_1, ..., f_n obtained along a walk, each element of the string S(ε) = s_1 s_2 ... s_n over the alphabet {-1, 0, 1} is obtained as

s_i = Ψ_{f_t}(i, ε) =  -1  if f_i - f_{i-1} < -ε
                        0  if |f_i - f_{i-1}| ≤ ε
                        1  if f_i - f_{i-1} > ε        (2.4)
where the parameter ε is a real positive number that represents the accuracy of the calculation of the string S(ε). According to this function, if ε = 0 then the function is sensitive to every difference between adjacent points. It can be observed that increasing the value of ε reduces the sensitivity of the function. Therefore, if the value of ε equals the difference between the highest and lowest points in the walk, then the fitness sequence will consist only of zeros.
To measure the ruggedness, the entropy of the string S(ε) is calculated as follows:

H(S(ε)) = - Σ_{p≠q} P_[pq] log_6 P_[pq]    (2.5)
The entropy H(S(ε)) characterizes the landscape ruggedness with respect to the flat areas where neutrality is present. P_[pq] refers to the relative frequency of the sub-blocks pq of S(ε) in which p and q have different values (p ≠ q):

P_[pq] = n_[pq] / n    (2.6)

where n_[pq] is the number of occurrences of the sub-block pq and n is the total number of sub-blocks.
The values of ε at which the entropy is evaluated are chosen as

ε_k = ε* / 2^k,  k = 1, ..., 8    (2.7)

in which ε* is the smallest value of ε that generates all sub-blocks as zeros, so that the landscape becomes flat; the values k = 1, ..., 8 yield successively smaller values of ε. Note that the parameter ε* can be calculated as the difference between the highest and lowest fitness found in the random walk.
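A small Python sketch of this entropy computation follows. It assumes a list of fitness values from a walk and the ε schedule above, and is only an illustrative reading of Eqs. (2.4)-(2.7), not the authors' code; the final aggregation over the ε schedule is marked as an assumption.

import math

def ruggedness_entropy(fitness, eps):
    """Entropy H(S(eps)) of the rugged sub-blocks of a fitness sequence (Eqs. 2.4-2.6)."""
    # Eq. (2.4): encode consecutive fitness differences as -1, 0, 1
    s = []
    for f_prev, f_cur in zip(fitness, fitness[1:]):
        d = f_cur - f_prev
        s.append(-1 if d < -eps else (1 if d > eps else 0))
    # Eq. (2.6): relative frequency of sub-blocks pq with p != q
    n = len(s) - 1
    counts = {}
    for p, q in zip(s, s[1:]):
        if p != q:
            counts[(p, q)] = counts.get((p, q), 0) + 1
    # Eq. (2.5): entropy with base-6 logarithm (there are six rugged sub-block types)
    return -sum((c / n) * math.log(c / n, 6) for c in counts.values()) if n > 0 else 0.0

fitness_walk = [0.0, 0.2, 0.1, 0.4, 0.4, 0.9]
eps_star = max(fitness_walk) - min(fitness_walk)
# One common aggregation (an assumption here): take the maximum over the eps schedule of Eq. (2.7)
ruggedness = max(ruggedness_entropy(fitness_walk, eps_star / 2**k) for k in range(1, 9))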
An entropic measure H(S(ε)) requires a sequence of search points S(ε). In order to generate such a time series, a simple random walk on a landscape path can be used (see Algorithm 1).
The above method was used for measuring the ruggedness of discrete problems. The major issue in using this approach for continuous problems is that (unlike in the discrete case) it is not possible to generate or access all possible neighbors of the visited individual.
Table 2.1 Various sub-blocks in S(ε) considered as rugged objects: 01, 0 -1, 10, 1 -1, -1 0, -1 1 (the two adjacent symbols differ)

Table 2.2 Various sub-blocks in S(ε) considered as flat objects: 00, 11, -1 -1 (the two adjacent symbols are equal)
[Algorithm 2 (random increasing walk), partial listing; only the final steps survived extraction:]
10. If steps(counter) > boundaries of the problem domain
11.     steps(counter) = steps(counter-1) - (range of the problem domain)
12. Endif
13. Endfor
14. Until (counter < MaxStepNumber)
Thus, Malan and Engelbrecht (2009) modified the approach for use on unconstrained continuous problems. The proposed approach adopts a random increasing walk that increases the step size over time. Furthermore, the step size is decreased if the algorithm produces a solution that is not within the boundaries given by the constraints. The algorithm for the random increasing walk proposed in Malan and Engelbrecht (2009) is given in Algorithm 2. Here, we assume that the variable range is the same for all dimensions, which implies that the maximum step size is the same for all dimensions. The algorithm can easily be adjusted to problems with different variable ranges by using a separate maximum step size for each variable.
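The following Python sketch illustrates the idea of such a random increasing walk on a box-bounded domain. The growth schedule of the step size and the way boundary violations are pulled back are assumptions made for illustration; they follow the verbal description above rather than the exact listing in Algorithm 2.

import numpy as np

def random_increasing_walk(lower, upper, n_steps, max_step_frac=0.1, rng=None):
    """Random walk whose step size grows over time, pulled back into [lower, upper]."""
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    span = upper - lower
    x = lower + rng.random(lower.size) * span        # random starting point
    walk = [x.copy()]
    for t in range(1, n_steps):
        # step size increases with t, up to a fraction of the variable range (assumed schedule)
        step = (t / n_steps) * max_step_frac * span
        x = x + rng.uniform(-1.0, 1.0, size=x.size) * step
        # if a coordinate leaves the domain, pull it back by the range of the domain
        x = np.where(x > upper, x - span, x)
        x = np.where(x < lower, x + span, x)
        walk.append(x.copy())
    return np.array(walk)

walk = random_increasing_walk([-5.12, -5.12], [5.12, 5.12], n_steps=1000)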
In constrained problems, dealing with the constraints is often the most difficult and challenging part. Often in these problems, the infeasibility rate is high and it might even be very hard to find a single feasible solution. This implies that random walk methods are usually not very helpful, as they would produce infeasible solutions most of the time. Most constraint handling methods direct the search process to feasible regions of the search space and therefore mostly optimize within the feasible region, which might be a very small proportion of the overall space.
σ_ij(t+1) = σ_ij(t) · exp(τ' N(0,1) + τ N_j(0,1))

where the learning rates are set in the usual way (τ' proportional to 1/√(2n) and τ proportional to 1/√(2√n)), N(0,1) is a normally distributed random variable, and N_j(0,1) denotes that a new value is drawn for each component j of σ.
By calculating the next-generation strategy parameters as above, each parent produces λ new individuals, each component being perturbed by Gaussian noise with the corresponding standard deviation:

x_ij(t+1) = x_ij(t) + σ_ij(t+1) · N_j(0,1)
When integrating constraint violations into the objective function, the main problem is to choose an appropriate penalty coefficient that determines how strongly the
constraint violation influences the objective value. There are also penalty methods
that use the constraint violation and objective functions separately. In this case, they
optimize the constraint violation and objective function in lexicographic order so
that the main goal is to obtain a feasible solution.
As discussed earlier, to deal with nearly infeasible problems, there is a need to use
a walk with the ability to distinguish between feasible and infeasible individuals. We
choose the stochastic ranking method proposed by Runarsson and Yao (2000) as our
constraint handling mechanism to sample and collect individuals for the time series
S(ε). It has been observed that there should be a balance between accepting infeasible individuals and preserving feasible ones. Hence, neither over- nor under-penalizing infeasible solutions is a proper choice for a constraint handling method (Gen and Cheng 2000). It is worth noting that all penalty methods try to adjust the balance between the
objective and the penalty function. The proposed stochastic ranking method adjusts
this balance in a direct way. By using this method, the walk is directed toward feasible
areas of the search space.
The stochastic ranking method is used to rank offspring in the evolutionary strategy discussed earlier (see Algorithm 4). Ranking is achieved by comparing adjacent individuals in at least λ sweeps. Ranking is terminated once no change occurs during a whole sweep. To determine the balance of offspring selection, the probability P_f is introduced in Runarsson and Yao (2000). In other words, P_f is the probability of comparing two adjacent individuals based on their objective function. It is obvious that if two compared individuals are feasible, the comparison is always based on the objective function.
 1. Initialize the probability P_f
 2. I_j = j for j ∈ {1, ..., λ}
 3. For i = 1 to N
 4.   For j = 1 to λ - 1
 5.     Generate a random number U uniformly in (0, 1)
 6.     If (φ(I_j) = φ(I_{j+1}) = 0) or (U < P_f)
 7.       If f(I_j) > f(I_{j+1}) then swap(I_j, I_{j+1})
 8.     Else
 9.       If φ(I_j) > φ(I_{j+1}) then swap(I_j, I_{j+1})
10.     End if
11.   End for
12.   Break if no changes occurred within a complete sweep
13. End for

Algorithm 4: Stochastic ranking for dealing with infeasible areas. N is the number of sweeps needed for the whole population, λ is the number of individuals, which are ranked using at least λ sweeps, and φ is a real-valued function that imposes the penalty
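For concreteness, a minimal Python sketch of this stochastic ranking (a bubble-sort-like pass driven by P_f) is given below. It assumes each individual carries a precomputed objective value and penalty (constraint violation) value; it is an illustration of Runarsson and Yao's procedure, not their reference implementation.

import random

def stochastic_ranking(objectives, penalties, p_f=0.45, n_sweeps=None):
    """Return indices ranked by stochastic ranking (Runarsson and Yao 2000)."""
    lam = len(objectives)
    n_sweeps = n_sweeps if n_sweeps is not None else lam
    idx = list(range(lam))
    for _ in range(n_sweeps):
        swapped = False
        for j in range(lam - 1):
            a, b = idx[j], idx[j + 1]
            u = random.random()
            # compare by objective if both are feasible, or with probability p_f
            if (penalties[a] == penalties[b] == 0) or (u < p_f):
                if objectives[a] > objectives[b]:
                    idx[j], idx[j + 1] = b, a
                    swapped = True
            elif penalties[a] > penalties[b]:        # otherwise compare by penalty
                idx[j], idx[j + 1] = b, a
                swapped = True
        if not swapped:                              # stop once a whole sweep causes no change
            break
    return idx

ranking = stochastic_ranking([3.0, 1.0, 2.0], [0.0, 0.5, 0.0], p_f=0.4)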
The idea is to use a biased walk that moves through good regions of the fitness landscape. It is necessary to have feasible solutions within the walk steps in order to obtain an effective ruggedness measure. Therefore, our approach biases the walk by means of a constraint handling method, which makes it possible to have feasible individuals along the path. In the algorithm, the individuals produced by the simple evolutionary strategy are ranked by the stochastic ranking method, and the highest-ranked individual is then selected as the next step of the walk. The pseudo-code of our methodology to quantify the ruggedness of constrained continuous fitness landscapes is given in Algorithm 5.
The probability that an individual wins a comparison under stochastic ranking can be expressed as

P_w = P_f P_fw + (1 - P_f) P_φw    (2.8)

where P_fw is the probability that individual x wins when x and y are compared according to their objective function values, and P_φw is the probability that x wins when they are compared according to the penalty function.
As discussed in Sect. 2.3.1, the walking algorithm should consider both feasible and infeasible areas. Thus, P_f determines whether the comparison is based on the objective or on the penalty function. Of course, the impact of this parameter setting depends on the fitness landscape under investigation. By adjusting the parameter P_f, we can control the number of feasible or infeasible individuals in the walk and, consequently, whether the calculated ruggedness measure is based more on the feasible or on the infeasible regions.
We first consider a constrained sphere function:

Minimize f(x) = Σ_{i=1}^{n} x_i²,   -5.12 ≤ x_i ≤ 5.12

subject to g(x) ≤ 0

where g(x) imposes the constraints of the two-dimensional sphere function. We construct three different problems that differ from each other by using each of the following constraints:

g_1(x) = 10 (Σ_{i=1}^{n} |cos³(x_i - 40)|) - 4,
g_2(x) = 10 (Σ_{i=1}^{n} |cos³(x_i - 40)|) - 8,
g_3(x) = 10 (Σ_{i=1}^{n} |cos³(x_i - 40)|) - 12.
In this experiment the different optimization problems (Sphere_g1, Sphere_g2, Sphere_g3) have low, medium, and high feasibility rates. We consider the two-dimensional sphere function (n = 2) in order to analyze the results more accurately. Figures 2.1 and 2.2 show the feasible areas of these three functions.
We apply and compare the random increasing walk (see Algorithm 2) with our methodology on these problems with different feasibility rates. In this experiment, we use a (1,7)-ES and P_f = 0.4, which means the ES has a tendency to focus on feasible solutions. We performed 20 independent runs consisting of 1,000 steps each, and for each problem the percentage of feasible solutions is reported in Table 2.3.
Due to the stochastic nature of evolutionary optimization, the above test is repeated 20 times and a two-tailed t-test is performed. In all tests, the significance level is set to 0.05. The p-values for each function are reported in Table 2.4. The results show that the differences in means are significant (all p-values are below 0.05). Clearly, our methodology is less influenced by an increasing infeasibility rate of the problem. Also, comparing both walks shows that our biased walk is more likely to obtain feasible individuals (steps) in the walk (see Table 2.3).
Fig. 2.2 Two-dimensional space of the constrained sphere functions with infeasible areas marked
white: a sphereg1 , b sphereg2 , c sphereg3 having low, medium, and high infeasibility rate
Table 2.3 Percentage of feasible individuals in the walks

                           Sphere_g1   Sphere_g2   Sphere_g3
  Random increasing walk     71.3        55.8        28.7
  Biased walk                75.8        68.1        48.7
Table 2.4 p-values for the significance of the difference between the two means when running the random increasing and the biased walk over the three functions

             Sphere_g1   Sphere_g2    Sphere_g3
  p-value    0.0043      7.0834E-06   9.4817E-06
The standard deviations of the number of feasible individuals in both walks are shown in Fig. 2.3. It is clear that the standard deviation of feasible individuals is higher for the random walk. Thus, the obtained ruggedness measure is related to the feasible parts, which are more likely to be seen by the solver.
Fig. 2.3 Standard deviation for average percentage of feasible individuals in walks using random
increasing and biased walks
Fig. 2.4 Percentage of feasible individuals in walks for nearly infeasible CEC benchmark problems
Table 2.5 Ruggedness results for functions in CEC 2010 benchmarks (10D). The numeric columns give the entropy H(S(ε)) for the ε values used (the first column corresponds to ε*, the columns labelled 2-256 to the divisors of Eq. (2.7)), followed by the resulting ruggedness value

  Function (10D)  ε*   2      4      8      16     32     64     128    256    Ruggedness
  C01             0    0.001  0.005  0.013  0.024  0.035  0.060  0.102  0.153  0.153
  C02             0    0.001  0.003  0.004  0.006  0.010  0.015  0.023  0.035  0.035
  C03             0    0.000  0.001  0.004  0.009  0.011  0.014  0.014  0.013  0.014
  C06             0    0.006  0.010  0.012  0.014  0.018  0.023  0.035  0.027  0.027
  C07             0    0.001  0.004  0.006  0.007  0.009  0.012  0.013  0.015  0.015
  C09             0    0.001  0.002  0.003  0.005  0.006  0.009  0.012  0.014  0.014
  C10             0    0.002  0.002  0.003  0.004  0.006  0.007  0.010  0.012  0.012
  C17             0    0.002  0.003  0.005  0.008  0.013  0.015  0.011  0.019  0.019
  C18             0    0.001  0.002  0.003  0.004  0.007  0.009  0.012  0.017  0.017

  Standard deviations (STD) of the entropy values:

  Function (10D)  ε*   2      4      8      16     32     64     128    256
  C01             0    0.002  0.005  0.006  0.009  0.016  0.028  0.044  0.058
  C02             0    0.002  0.003  0.003  0.005  0.008  0.014  0.022  0.035
  C03             0    0.000  0.000  0.000  0.001  0.002  0.003  0.004  0.009
  C06             0    0.013  0.016  0.016  0.017  0.019  0.024  0.035  0.028
  C07             0    0.001  0.002  0.003  0.004  0.006  0.007  0.009  0.009
  C09             0    0.001  0.001  0.002  0.002  0.004  0.006  0.010  0.011
  C10             0    0.001  0.001  0.002  0.003  0.004  0.005  0.007  0.009
  C17             0    0.002  0.002  0.005  0.011  0.022  0.041  0.008  0.009
  C18             0    0.001  0.001  0.002  0.002  0.004  0.004  0.006  0.010
2.5 Conclusions
In this chapter, we have reviewed the literature on measuring the ruggedness of fitness landscapes and discussed the drawbacks of the current methods when dealing with constrained problems. In order to address constrained continuous optimization problems, we have presented a new technique to quantify the ruggedness of constrained continuous problem landscapes. The modification is based on replacing the random sampling by a biased walk using a (1,λ)-evolution strategy, which can distinguish between feasible and infeasible individuals. We evaluated our approach on different benchmark functions and showed that it produces more feasible solutions during its run. Furthermore, we evaluated our method on the CEC 2010 benchmark problems and discussed the results.
Appendix
The benchmark functions used in the experiments, as described in Mallipeddi and Suganthan (2010), are summarized here. In the experiments, ε is set to 0.0001.
C01
Minimize

f(x) = -| (Σ_{i=1}^{D} cos⁴(z_i) - 2 Π_{i=1}^{D} cos²(z_i)) / √(Σ_{i=1}^{D} i z_i²) |,   z = x - o

subject to

g_1(x) = 0.75 - Π_{i=1}^{D} z_i ≤ 0
g_2(x) = Σ_{i=1}^{D} z_i - 7.5D ≤ 0
x ∈ [0, 10]^D
C02
Minimize

f(x) = max(z),   z = x - o,   y = z - 0.5

subject to

g_1(x) = 10 - (1/D) Σ_{i=1}^{D} [z_i² - 10 cos(2π z_i) + 10] ≤ 0
g_2(x) = (1/D) Σ_{i=1}^{D} [z_i² - 10 cos(2π z_i) + 10] - 15 ≤ 0
h(x) = (1/D) Σ_{i=1}^{D} [y_i² - 10 cos(2π y_i) + 10] - 20 = 0
x ∈ [-5.12, 5.12]^D
C03
Minimize

f(x) = Σ_{i=1}^{D-1} [100 (z_i² - z_{i+1})² + (z_i - 1)²],   z = x - o

subject to

h(x) = Σ_{i=1}^{D-1} (z_i - z_{i+1})² = 0
x ∈ [-1,000, 1,000]^D
C06
Minimize

f(x) = max(z),   z = x - o,
y = (x + 483.6106156535 - o) M - 483.6106156535

subject to

h_1(x) = (1/D) Σ_{i=1}^{D} y_i sin(√|y_i|) = 0
h_2(x) = (1/D) Σ_{i=1}^{D} y_i cos(0.5 √|y_i|) = 0
x ∈ [-600, 600]^D
C07
Minimize

f(x) = Σ_{i=1}^{D-1} [100 (z_i² - z_{i+1})² + (z_i - 1)²],   z = x + 1 - o,   y = x - o

subject to

g(x) = 0.5 - exp(-0.1 √((1/D) Σ_{i=1}^{D} y_i²)) - 3 exp((1/D) Σ_{i=1}^{D} cos(0.1 y_i)) + exp(1) ≤ 0
x ∈ [-140, 140]^D
C09
Minimize

f(x) = Σ_{i=1}^{D-1} [100 (z_i² - z_{i+1})² + (z_i - 1)²],   z = x + 1 - o,   y = x - o

subject to

h_1(x) = Σ_{i=1}^{D} y_i sin(√|y_i|) = 0
x ∈ [-500, 500]^D
C10
Minimize

f(x) = Σ_{i=1}^{D-1} [100 (z_i² - z_{i+1})² + (z_i - 1)²],   z = x + 1 - o,   y = (x - o) M

subject to

h_1(x) = Σ_{i=1}^{D} y_i sin(√|y_i|) = 0
x ∈ [-500, 500]^D
C17
Minimize

f(x) = Σ_{i=1}^{D-1} (z_i - z_{i+1})²,   z = x - o

subject to

g_1(x) = Π_{i=1}^{D} z_i ≤ 0
g_2(x) = Σ_{i=1}^{D} z_i ≤ 0
h(x) = Σ_{i=1}^{D} z_i sin(4 √|z_i|) = 0
x ∈ [-10, 10]^D
C18
Minimize

f(x) = Σ_{i=1}^{D-1} (z_i - z_{i+1})²,   z = x - o

subject to

g(x) = -Σ_{i=1}^{D} z_i sin(√|z_i|) ≤ 0
h(x) = Σ_{i=1}^{D} z_i sin(√|z_i|) = 0
x ∈ [-50, 50]^D
References
Box GE, Jenkins GM, Reinsel GC (2013) Time series analysis: forecasting and control. Wiley
Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: Proceedings of the sixth international symposium on micro machine and human science, MHS'95. IEEE, pp 39–43
Floudas CA, Pardalos PM (1990) A collection of test problems for constrained global optimization algorithms, vol 455. Springer, Berlin
Gen M, Cheng R (2000) Genetic algorithms and engineering optimization, vol 7. Wiley, New York
Hordijk W (1996) A measure of landscapes. Evol Comput 4(4):335–360
Liang J, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan P, Coello CC, Deb K (2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Lipsitch M (1991) Adaptation on rugged landscapes generated by local interactions of neighboring genes. In: Proceedings of the fourth international conference on genetic algorithms. San Mateo
Malan KM, Engelbrecht AP (2009) Quantifying ruggedness of continuous landscapes using entropy. In: IEEE congress on evolutionary computation, CEC'09, pp 1440–1447
Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Manderick B, de Weger M, Spiessens P (1991) The genetic algorithm and the structure of the fitness landscape. In: Proceedings of the fourth international conference on genetic algorithms. Morgan Kaufmann, San Mateo, pp 143–150
Mattfeld DC, Bierwirth C, Kopfer H (1999) A search space analysis of the job shop scheduling problem. Ann Oper Res 86:441–453
Mersmann O, Bischl B, Trautmann H, Preuss M, Weihs C, Rudolph G (2011) Exploratory landscape analysis. In: Proceedings of the 13th annual conference on genetic and evolutionary computation. ACM, pp 829–836
Mezura-Montes E, Coello Coello CA (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173–194
Naudts B, Kallel L (2000) A comparison of predictive measures of problem difficulty in evolutionary algorithms. IEEE Trans Evol Comput 4(1):1–15
Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294
Schwefel H-P (1993) Evolution and optimum seeking: the sixth generation. Wiley, New York
Smith T, Husbands P, Layzell P, O'Shea M (2002) Fitness landscapes and evolvability. Evol Comput 10(1):1–34
Stadler PF et al (1995) Towards a theory of landscapes. In: Complex systems and binary networks. Springer, Heidelberg, pp 78–163
Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
Vassilev VK, Fogarty TC, Miller JF (2000) Information characteristics and the structure of landscapes. Evol Comput 8(1):31–60
Vassilev VK, Fogarty TC, Miller JF (2003) Smoothness, ruggedness and neutrality of fitness landscapes: from theory to application. In: Advances in evolutionary computing. Springer, pp 3–44
Weinberger E (1990) Correlated and uncorrelated fitness landscapes and how to tell the difference. Biol Cybern 63(5):325–336
Chapter 3
3.1 Introduction
In many real-world engineering optimization problems, the values of the objective
and constraint functions are outputs of computationally expensive simulations. These
types of optimization problems are found in the automotive and aerospace industries
(e.g., Jones 2008; Ong et al. 2003) and in various parameter estimation problems
(e.g., Mugunthan et al. 2005; Tolson and Shoemaker 2007). A reasonable strategy
for solving these problems is to use surrogate-based or surrogate-assisted optimization methods, including surrogate-assisted evolutionary algorithms (EAs) (Jin 2011),
where the algorithm uses surrogate models to approximate the black-box objective and constraint functions. For instance, Regis (2014b) successfully developed
a surrogate-assisted Evolutionary Programming (EP) algorithm and applied it to a
large-scale automotive optimization application with 124 decision variables and 68
black-box inequality constraints given a severely limited computational budget of
only 1,000 simulations, where one simulation yields the objective function value
and each of the constraint function values at a given input vector. The purpose of
this paper is two-fold: (1) To develop a new surrogate-assisted EP for constrained
black-box optimization that improves on the algorithm by Regis (2014b) on a set of
benchmark problems, including the above-mentioned large-scale automotive application and (2) To compare the new approach with alternative methods, including a
mathematically rigorous penalty derivative-free algorithm, on the same problems.
This chapter focuses on constrained black-box optimization problems of the
following form:
min f (x)
s.t.
(3.1)
x Rd
gi (x) 0, i = 1, 2, . . . , m
axb
Here, f is the black-box objective function and g1 , . . . , gm are black-box inequality
constraint functions and a, b Rd define the bound constraints of the problem.
Throughout this paper, assume that for any input x [a, b] Rd , the values of
f (x), g1 (x), . . . , gm (x) are obtained by running a time-consuming simulator (a computer code) at the input x. Moreover, assume that f , g1 , . . . , gm are all deterministic
and that their gradients are not available. Furthermore, for simplicity, assume that
[a, b] Rd is a hypercube since any hyper-rectangle can be easily transformed to
the unit hypercube [0, 1]d . Problems with equality constraints or noisy functions will
be treated in future work.
Problem (3.1) is difficult when the dimension d and the number of black-box
constraints m are large, and it is even more difficult when the computational budget
is relatively limited. Although much progress has been made in the development of
constraint handling techniques for EAs (Mezura-Montes and Coello Coello 2011),
most of these approaches require a large number of simulations even on problems of
moderate size, and hence they are not appropriate in the computationally expensive setting considered here.
This paper is organized as follows. Section 3.2 provides a review of the relevant
literature. Section 3.3 describes the proposed TRICEPS algorithm and the RBF surrogate model used. Sections 3.4 and 3.5 discuss the numerical experiments and results.
Finally, Sect. 3.6 provides some conclusions.
When the constraints are aggregated into a single penalty term, information about the individual violations is lost. In fact, some numerical evidence to support this idea can be found in Regis (2014b). Instead, Powell (1994) suggests treating the constraints individually by building individual surrogates, one for each constraint.
According to Mezura-Montes and Coello Coello (2011), surrogates are still
seldom used to approximate constraints in nature-inspired algorithms. One example
is a GA combined with Feasible Sequential Quadratic Programming (FSQP) developed by Ong et al. (2003), where local RBF surrogates are used to model the objective
and constraint functions. Other examples are given by Araujo et al. (2009) and Wanner et al. (2005), where quadratic models are used to approximate the objective and
constraint functions in GAs. Moreover, Isaacs et al. (2007, 2009) used RBF networks to model objective and constraint functions in evolutionary multi-objective
optimization. In addition, Emmerich et al. (2006) proposed using local Gaussian
Random Field Metamodels for modeling constraint functions in single- and multiobjective evolutionary optimization. More recently, Gieseke and Kramer (2013) used
SVMs to estimate nonlinear constraints in CMA-ES for expensive optimization.
While there are relatively few algorithms that use surrogates to approximate
black-box constraints, there are even fewer algorithms that have been used on
high-dimensional (more than a hundred decision variables) and highly constrained
problems. In Ong et al. (2003), the GA coupled with FSQP that uses local RBF surrogates was tested only on problems with at most 20 decision variables and at most 4
inequality constraints. The metamodel-based CiMPS method (Kazemi et al. 2011)
was only tested on problems with at most 13 decision variables and 9 inequality
constraints. On the other hand, ConstrLMSRBF (Regis 2011), CEP-RBF (Regis
2014b) and COBRA (Regis 2014a) all use global RBF surrogates and were all successful compared to alternatives on well-known benchmark problems and on the
MOPTA08 automotive application with 124 decision variables and 68 black-box
inequality constraints (Jones 2008). One of the goals of this paper is to develop a
new surrogate-assisted EP that improves upon the surrogate-assisted EP in Regis
(2014b) on benchmark test problems and on the MOPTA08 automotive problem.
Moreover, as in Regis (2014b), each parent generates multiple trial offspring in every
generation and then the surrogates for the objective and constraint functions are used
to rank these trial offspring according to rules that favor offspring with the best
predicted objective function values among those with the minimum number of predicted constraint violations. The computationally expensive simulations (evaluations
of the objective and constraint functions) are then carried out only on the most
promising offspring of each parent.
TRICEPS differs from the surrogate-assisted EP by Regis (2014b) in that it incorporates a trust-region-like approach to refine the best solution at the end of each
generation. That is, after performing simulations at the offspring of the current
generation, TRICEPS solves a trust-region-like subproblem where it finds a minimizer of the surrogate of the objective function within a trust region centered at
the current best solution and subject to surrogate inequality constraints with a small
margin and with a distance requirement from previously evaluated points. The idea
of refining the best solution at the end of each generation has been implemented in
surrogate-assisted particle swarm algorithms for bound constrained problems (e.g., Parno et al. 2012; Regis 2014c). However, these previous approaches did not use
trust regions that can be expanded or reduced. In TRICEPS, the adjustment of the
trust region depends on whether the subproblem solution turned out to be feasible,
whether the ratio of the actual improvement to the improvement predicted by the
surrogate exceeds or falls below certain thresholds, and also whether the number of
consecutive successful local refinements or the number of consecutive unsuccessful
local refinements have reached certain thresholds. Also, the idea of using a margin
on the surrogate inequality constraints was first proposed by Regis (2014a) and its
purpose is to increase the chances of obtaining feasible points.
When the optimization problem has a large number of decision variables and has
many black-box inequality constraints, Regis (2011, 2014b) implemented a Block
Coordinate Search (BCS) strategy where new trial solutions (or offspring) are generated by perturbing only a small fraction of the coordinates of the current solution
under consideration (i.e., a particular parent solution, including possibly the current
best feasible solution). The BCS strategy resulted in a dramatic improvement for
the ConstrLMSRBF (Regis 2011) and CEP-RBF (Regis 2014b) when applied to
the MOPTA08 benchmark problem from the auto industry proposed by Jones (2008)
involving 124 decision variables and 68 black-box inequality constraints. When only
a small number of coordinates of a parent solution are perturbed, fewer constraint
violations are likely to be introduced in the trial offspring and the trial offspring will
tend to be closer to the parent solution. If this parent solution is feasible, many of the
trial offspring will tend to be feasible thereby making it more likely to find a feasible
solution with an improved objective function value. Hence, the BCS strategy is also
implemented in TRICEPS when it is used for high-dimensional problems with many
black-box inequality constraints.
Figure 3.1 presents a flowchart that shows the main steps of the TRICEPS algorithm. The algorithm begins by initializing the parent population and the algorithm parameters and then calculating the objective and constraint functions at the initial population.
Fig. 3.1 Flowchart of the main steps of TRICEPS: initialize the parent population and algorithm parameters; then, until the computational budget is reached, update the surrogates of the objective and constraint functions, evaluate the surrogates at the trial offspring, evaluate the objective and constraint functions at the best offspring for each parent, solve the trust-region subproblem, and update the trust region
Then TRICEPS goes through a main loop that terminates only when the
computational budget (i.e., maximum number of function evaluations) is reached.
In the first part of the loop, TRICEPS performs the same steps as in CEP-RBF
(Regis 2014b). That is, TRICEPS fits the surrogates for the objective and constraint
functions, generates a large number of trial offspring for each parent, and then uses
the surrogates to select only the most promising trial offspring and this is where
the function evaluations are performed. In the second part of the loop, TRICEPS
performs a trust-region-like refinement of the best parent solution. That is, the surrogates are updated using information from recently evaluated points, the trust-region
subproblem is solved, then function evaluations are performed on the solution to the
trust-region subproblem, and finally, the algorithm parameters and the trust region
are updated. Note that the surrogates are updated twice in a single iteration, once
before the trial offspring are generated and once before the trust-region step. Hence,
surrogate modeling is integrated into the optimization process in two ways by using
it: (1) to select the most promising among multiple trial offspring for each parent
solution and (2) to identify a local refinement point for the current best solution
during the trust-region step.
Each individual is a pair of d-dimensional vectors (x_i(t), σ_i(t)), where t is the generation number, i is the index of the individual in the current population, x_i(t) is the vector of values of the decision variables, and σ_i(t) is the vector of standard deviations for the Gaussian mutations.
(μ + μ)-TRICEPS for Constrained Black-Box Optimization
(1) Set generation counter t = 0 and set the initial population P(0) = {(x_1(0), σ_1(0)), ..., (x_μ(0), σ_μ(0))}, where σ_i(0) = σ_init for i = 1, ..., μ and x_1(0) = x_0 (a feasible starting point).
min s_t(x)
s.t. x ∈ R^d,  a ≤ x ≤ b
     ||x - v_n|| ≤ Δ_t
     s_t^(i)(x) + ε_t ≤ 0,  i = 1, 2, ..., m
     ||x - v_j|| ≥ ξ,  j = 1, ..., n        (3.2)
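A rough Python illustration of solving such a subproblem with an off-the-shelf solver follows. The surrogate callables, the trust-region radius delta, the margin eps_margin and the distance threshold xi are placeholders for the quantities in (3.2), and SciPy's trust-constr method is simply one convenient choice here, not the solver prescribed by the chapter.

import numpy as np
from scipy.optimize import NonlinearConstraint, minimize

def solve_trust_region_subproblem(s_obj, s_cons, x_best, evaluated, a, b,
                                  delta, eps_margin, xi):
    """Minimize the objective surrogate within a trust region around x_best,
    subject to margin-shifted constraint surrogates and a distance requirement
    from previously evaluated points (a sketch of subproblem (3.2))."""
    cons = [
        # trust region: ||x - x_best|| <= delta
        NonlinearConstraint(lambda x: np.linalg.norm(x - x_best), 0.0, delta),
        # surrogate inequality constraints with a small margin: s_i(x) + eps <= 0
        NonlinearConstraint(lambda x: np.array([s(x) + eps_margin for s in s_cons]),
                            -np.inf, 0.0),
        # stay at least xi away from every previously evaluated point
        NonlinearConstraint(lambda x: np.array([np.linalg.norm(x - v) for v in evaluated]),
                            xi, np.inf),
    ]
    res = minimize(s_obj, x_best, method="trust-constr",
                   bounds=list(zip(a, b)), constraints=cons)
    return res.x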
at each trial offspring (Step 5.2(b)) and the most promising of the trial offspring
from each parent is chosen (Step 5.2(c)). Next, the simulator is run to determine the
objective and constraint function values at the selected offspring (Step 5.2(d)). Then,
the algorithm selects the parent population for the next generation (Step 5.3). As
before, the new parent population is reordered so that the first one is the best point.
The next several steps attempt to refine the current best solution, which is the best
parent in the next generation x1 (t + 1). In Step 5.4, the surrogates for the objective
and constraints are updated using the newly obtained function values at the offspring
of the current generation. In Step 5.5, a trust-region subproblem (3.2) is solved. For
convenience, all points in the search space where the simulator has been run are
relabeled as v1 , . . . , vn and let vn be the best feasible point found so far. Because of
previous relabeling, vn = x1 (t +1). In this step, the algorithm finds a local minimizer
of the surrogate of the objective within the trust region of radius t centered at the
current best point and subject to the surrogate inequality constraints with a small
margin t and subject to a distance requirement from previously evaluated points.
Then, in Step 5.6, x̃(t) is either a solution to the trust-region subproblem (3.2) or it is the best infeasible solution to (3.2) from a set of randomly generated points within the trust region. Here, x̃(t) is referred to as the local refinement point. In Step 5.7, the simulator is run to determine the objective and constraint function values at the local refinement point x̃(t). Then, in Step 5.8, the local refinement point replaces the best parent in the next generation (which is also the current best solution) if the former is a better point than the latter. Moreover, the trust-region radius is either expanded or reduced depending on whether the local refinement point x̃(t) is feasible, whether the ratio of the actual improvement to the improvement originally predicted by the surrogate for x̃(t) exceeds η_1 or falls below η_0, and also whether the counters
Csuccess or Cfail have reached the thresholds Tsuccess or Tfail . In addition, in Step 5.9,
the margin for the surrogate inequality constraints is reduced if the counter Cinfeas
reached the threshold Tinfeas . Then, Step 5.10 increments the generation counter and
the algorithm goes back into the loop until a stopping criterion is satisfied. Finally,
the best solution found is returned in Step 6. As with the surrogate-assisted EP in
Regis (2014b), the stopping criterion is a fixed number of simulations.
As in Regis (2014b), each parent generates trial offspring, only one of which
becomes an actual offspring for the current generation. The value of the parameter is
chosen to be large so that the expensive simulations are only run on trial offspring that
are very promising as predicted by the surrogates. Moreover, TRICEPS allows for the
possibility of using the BCS strategy from Regis (2011, 2014b) for high-dimensional
or highly constrained problems. In BCS, the mutations are more conservative in that
only a fraction of the components of the parent vector is perturbed when generating
the trial solutions so the probability of perturbing any component pmut < 1. (When
pmut = 1, the algorithm does not use the BCS strategy.) As explained in Regis
(2011, 2014b), the BCS strategy is helpful for high-dimensional problems or highly
constrained problems because perturbing too many components of a parent vector
that is already good is either likely to make the objective function value worse or it
is likely to result in more constraint violations.
More precisely, in Step 5.2(a), each parent (x_i(t), σ_i(t)) in generation t creates exactly ν trial offspring (x_ij(t), σ_ij(t)) for j = 1, ..., ν as follows: For k = 1, ..., d,
(1) Generate a random number ω from the uniform distribution on [0, 1].
(2) If ω ≤ p_mut, then
      x_ij(t)^(k) = x_i(t)^(k) + σ_i(t)^(k) · N_k(0, 1),
      σ_ij(t)^(k) = σ_i(t)^(k) · exp(τ' N(0, 1) + τ N_k(0, 1)).
    Else
      x_ij(t)^(k) = x_i(t)^(k),
      σ_ij(t)^(k) = σ_i(t)^(k).
    End.
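A compact Python sketch of this per-coordinate mutation with the Block Coordinate Search behaviour follows. The variable names, the learning-rate choices and the clipping used to repair bound violations are illustrative assumptions, not part of the original algorithm statement.

import numpy as np

def bcs_mutate(x, sigma, p_mut, lower, upper, rng=None):
    """Mutate a parent (x, sigma); each coordinate is perturbed only with probability p_mut."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    tau, tau_prime = 1.0 / np.sqrt(2.0 * np.sqrt(d)), 1.0 / np.sqrt(2.0 * d)
    child_x, child_sigma = x.copy(), sigma.copy()
    global_draw = rng.normal()                      # shared N(0,1) for the sigma update
    for k in range(d):
        if rng.random() <= p_mut:                   # with BCS, p_mut < 1, so few coordinates move
            child_x[k] = x[k] + sigma[k] * rng.normal()
            child_sigma[k] = sigma[k] * np.exp(tau_prime * global_draw + tau * rng.normal())
    # clip to the bound constraints (one simple, assumed way to repair violations)
    np.clip(child_x, lower, upper, out=child_x)
    return child_x, child_sigma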
In Step 5.2(c), the trial offspring solutions are ranked in the same manner as in
Regis (2014b):
(1) Between two solutions that are predicted to be feasible, the one with the better
predicted objective value wins.
(2) Between a solution that is predicted to be feasible and a solution that is predicted to be infeasible, the former wins.
(3) Between two solutions that are predicted to be infeasible, the one with the fewer
number of predicted constraint violations wins.
(4) Between two solutions that are predicted to be infeasible with the same number
of predicted constraint violations, the one with the better predicted objective
value wins.
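These four rules can be expressed as a single sort key, as in the following Python sketch. The surrogate predictions are assumed to be available as a predicted objective value and a count of predicted constraint violations per trial offspring; all names here are illustrative.

def rank_trial_offspring(trials):
    """Sort trial offspring by the rules above.

    Each trial is a dict with 'pred_obj' (predicted objective, lower is better) and
    'pred_violations' (number of surrogate constraints predicted to be violated).
    Predicted-feasible solutions come first, then fewer predicted violations,
    with ties broken by the predicted objective value.
    """
    return sorted(trials, key=lambda t: (t["pred_violations"] > 0,   # feasible first
                                         t["pred_violations"],       # then fewer violations
                                         t["pred_obj"]))             # then better objective

best = rank_trial_offspring([
    {"pred_obj": 3.2, "pred_violations": 0},
    {"pred_obj": 1.5, "pred_violations": 2},
    {"pred_obj": 2.8, "pred_violations": 0},
])[0]   # -> the predicted-feasible trial with predicted objective 2.8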
In implementing TRICEPS, a continuously differentiable surrogate whose gradient is easy to compute is highly recommended so that efficient gradient-based
techniques can be used to solve the trust-region subproblem (3.2). One such example
of a surrogate is provided in the next section. Note that the gradients of the trust-region constraint and the distance constraints are easy to calculate. In particular, for the trust-region constraint T_t(x) = ||x - v_n|| - Δ_t ≤ 0 and the distance constraints D_{t,j}(x) = ξ - ||x - v_j|| ≤ 0 for j = 1, ..., n, the gradients are given by:

∇T_t(x) = (x - v_n) / ||x - v_n||   and   ∇D_{t,j}(x) = -(x - v_j) / ||x - v_j||.
Any surrogate model can be used provided that its function values and gradients are easy to calculate. One popular choice is kriging or Gaussian process modeling, but this method is computationally intensive and requires an enormous amount of memory in high dimensions. This study uses the simpler radial basis function (RBF) model in Powell (1992) that has been successfully used to develop various RBF methods (e.g., Björkman and Holmström 2000; Gutmann 2001; Regis 2011; Regis and Shoemaker 2007; Wild et al. 2008). Fitting this model differs from the training method typically used for RBF networks. It involves solving a linear system that possesses good theoretical properties that can be taken advantage of to solve the system in a stable and efficient manner.
Given n distinct points x_1, ..., x_n ∈ R^d and the function values u(x_1), ..., u(x_n), where u(x) could be the objective function or one of the constraint functions, TRICEPS is implemented below using an interpolant of the form

s(x) = Σ_{i=1}^{n} λ_i φ(||x - x_i||) + p(x),   x ∈ R^d,

where φ(r) = r³ is the cubic radial basis function and p(x) is a linear polynomial tail. The coefficients λ = (λ_1, ..., λ_n)^T and the coefficients c of p(x) are obtained by solving the linear system

| Φ    P |  | λ |     | U       |
| P^T  0 |  | c |  =  | 0_{d+1} |        (3.3)

where Φ_{ij} = φ(||x_i - x_j||), P is the n × (d+1) matrix whose ith row is (x_i^T, 1), 0 is the (d+1) × (d+1) zero matrix, and U = (u(x_1), ..., u(x_n))^T.
The gradient of this interpolant is

∇s(x) = Σ_{i=1}^{n} λ_i φ'(||x - v_i||) (x - v_i) / ||x - v_i|| + ∇p(x),   x ∈ R^d,  x ≠ v_i for all i.
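A small NumPy sketch of fitting and evaluating such a cubic RBF interpolant with a linear tail is given below. It solves the augmented linear system directly with a dense solver, which is only one simple way to do it and ignores the numerically safer factorizations alluded to above.

import numpy as np

def fit_cubic_rbf(X, u):
    """Fit s(x) = sum_i lam_i * ||x - x_i||^3 + c^T x + c0 to data (X: n x d, u: n)."""
    n, d = X.shape
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
    Phi = r ** 3                                                # cubic kernel matrix
    P = np.hstack([X, np.ones((n, 1))])                         # linear polynomial tail
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([u, np.zeros(d + 1)])
    coef = np.linalg.solve(A, rhs)                              # assumes distinct, unisolvent points
    return coef[:n], coef[n:]                                   # (lam, polynomial coefficients)

def eval_cubic_rbf(lam, poly, X, x):
    r = np.linalg.norm(X - x, axis=1)
    return float(lam @ r ** 3 + poly[:-1] @ x + poly[-1])

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
u = np.array([0.0, 1.0, 1.0, 2.0])
lam, poly = fit_cubic_rbf(X, u)
print(eval_cubic_rbf(lam, poly, X, np.array([0.5, 0.5])))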
The MOPTA08 automotive benchmark problem (Jones 2008) used in this study is much larger than the problems typically used in surrogate-based or surrogate-assisted optimization (e.g., Basudhar et al. 2012; Egea et al. 2009; Viana et al. 2010). The goal of this
problem is to determine the values of the decision variables (e.g., shape variables)
that minimize the mass of the vehicle subject to performance constraints (e.g., crashworthiness, durability). The MOPTA08 problem is a relatively inexpensive model
of an actual automotive design problem. It is based on kriging response surfaces to
a real automotive problem. Each simulation of this problem takes about 0.32 s on
an Intel(R) Core(TM) i7 CPU 860 2.8 Ghz desktop machine while each simulation
of the real version could take 13 days (Jones 2008). However, as in Regis (2011,
2014b) the different algorithms are compared by assuming that the simulations are
expensive.
Table 3.1 Parameter settings for TRICEPS-RBF

  Parameter    Value
  μ            2 or 5
  ν            min(10³ d, 10⁴)
  σ_init       0.05 ℓ([a, b])
  p_mut        0.1 (with BCS) or 1 (without BCS)
  Δ_0          0.05 ℓ([a, b])
  Δ_min        0.0125 ℓ([a, b])
  Δ_max        0.1 ℓ([a, b])
  η_0          0
  η_1          0.5
  γ_0          0.5
  γ_1          2
  T_fail       min(max(p_mut · d, 5), 30)
  T_success    2
  ε_init       0.0005 ℓ([a, b])
  T_infeas     max(3, d)
  ξ            0.0005 ℓ([a, b])
The number of parents in each generation for the EP methods (including the RBF-assisted ones) is μ = 2 or 5 and the initial standard deviation of the Gaussian mutations is σ_init = 0.2 ℓ([a, b]), where ℓ([a, b]) is the side length of the hypercube [a, b] in (3.1). For the RBF-assisted EPs (TRICEPS-RBF, CEP-RBF and PenCEP-RBF), the number of trial offspring for each parent is ν = min(10³ d, 10⁴). Moreover, when applying the BCS strategy, the probability of perturbing a coordinate is p_mut = 0.1 as in Regis (2014b). The other parameters for (μ + μ)-TRICEPS-RBF are summarized in Table 3.1.
All algorithms are run on Matlab 7.12 using an Intel(R) Core(TM) i7 CPU
860 2.8 Ghz desktop machine. In particular, a Matlab version of SDPEN, called
SDPENm, is used on the test problems. Each algorithm is run for 10 trials on the
MOPTA08 problem and 30 trials on each of the other test problems. Moreover, each
trial of each algorithm is run for 1,000 simulations on the MOPTA08 problem, 300
simulations on the 30-dimensional test problems, and 200 simulations on the remaining (mostly lower dimensional) problems. Each trial begins with a feasible point that
is the same for all algorithms. For the MOPTA08 problem, only one feasible starting
point is given in Jones (2008) so all trials use this point. This feasible point has an
objective function value of 251.0706, and according to Jones (2008), any algorithm
that can achieve a feasible objective function value of 228 or lower within a relatively
limited number of simulations (say a few thousand simulations) is a good algorithm
for this problem. Moreover, each trial of an EP (with or without RBF surrogates)
begins with the feasible initial point together with a randomly generated Latin hypercube design (LHD) consisting of d + 1 affinely independent points, none of which
67
are guaranteed to be feasible. The case where no feasible point is available at the
beginning will be dealt in future work. In addition, all EP algorithms (with or without
RBF surrogates) use the same LHD in a given trial and their initial parent populations
consist of the best points from d + 2 points: the d + 1 LHD points and the feasible
starting point.
The settings for the alternative methods are the same as those used in Regis
(2014b). For example, for SRES (Runarsson and Yao 2000), μ = 8 and λ = 50 for the regular test problems and μ = 20 and λ = 140 for the MOPTA08 problem.
The initial population consists of the best points from the same initial points used
by the EP algorithms and the default values are used for the other parameters. For
the eSS code (Egea et al. 2007), the default parameters are modified to reduce the
time spent on the initialization phase. For example, the number of solutions generated by the diversificator is set to 2d, whereas the default is 10d. In addition,
ConstrLMSRBF is initialized by the LHDs used by the RBF-assisted EPs so it is
labeled as ConstrLMSRBF-LHD. Finally, SDPEN has no user-specified parameters
but it requires an initial point, which is the best point among the LHD points and the
feasible starting point.
For each problem p ∈ P and solver s ∈ S, the performance ratio is defined as

r_{p,s} = t_{p,s} / min{ t_{p,s} : s ∈ S },
where t_{p,s} is the number of simulations required to satisfy the convergence test defined below. Here, one simulation means one evaluation of the objective and each of the inequality constraint functions. Clearly, r_{p,s} ≥ 1 for any p ∈ P and s ∈ S, and the best solver for a given problem attains r_{p,s} = 1. By convention, r_{p,s} = ∞ whenever solver s fails to yield a solution that satisfies the convergence test.
Now, for any solver s ∈ S and for any α ≥ 1, the performance profile of s with respect to α is the fraction of problems where the performance ratio is at most α, i.e.,

ρ_s(α) = (1/|P|) |{ p ∈ P : r_{p,s} ≤ α }|.

For any solver s ∈ S, the performance profile curve of s is the graph of the performance profiles of s for a range of values of α.
In derivative-free, constrained expensive black-box optimization, algorithms are compared given a fixed and relatively limited number of simulations. Hence, the convergence test by Moré and Wild (2009) uses a tolerance τ > 0 and the minimum feasible objective function value f_L obtained by any of the solvers on a particular problem within a given number of simulations, and it checks whether a feasible point x obtained by a solver satisfies

f(x^(0)) - f(x) ≥ (1 - τ)(f(x^(0)) - f_L),

where x^(0) is a feasible starting point corresponding to the given problem. That is, x is required to achieve a reduction that is at least 1 - τ times the best possible reduction f(x^(0)) - f_L. Here, feasibility is determined according to some constraint tolerance, which is set to 10⁻⁶ ℓ([a, b]) in this study. Moreover, the parameter τ is set to 0.05 in the numerical experiments.
Next, given a solver s ∈ S and α > 0, the data profile of s with respect to α (Moré and Wild 2009) is given by

d_s(α) = (1/|P|) |{ p ∈ P : t_{p,s} / (n_p + 1) ≤ α }|,

where t_{p,s} is the number of simulations required by solver s to satisfy the convergence test on problem p and n_p is the number of decision variables in problem p. For any solver s ∈ S, the data profile curve of s is the graph of the data profiles of s for a range of values of α. For a given solver s and any α > 0, d_s(α) is the fraction of problems "solved" (i.e., problems where the solver generated a feasible point satisfying the convergence test) by s within α(n_p + 1) simulations (equivalent to α simplex gradient estimates (Moré and Wild 2009)).
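The following Python sketch computes data profiles from a matrix of "simulations needed to converge" values. It is a direct reading of the definition above, with np.inf marking runs that never satisfied the convergence test; all names are illustrative.

import numpy as np

def data_profile(t, n_vars, alphas):
    """Data profiles d_s(alpha) for each solver.

    t[p, s] = number of simulations solver s needed to satisfy the convergence
    test on problem p (np.inf if it never did); n_vars[p] = number of decision
    variables of problem p. Returns an array of shape (len(alphas), n_solvers).
    """
    t = np.asarray(t, dtype=float)
    budgets = t / (np.asarray(n_vars, dtype=float)[:, None] + 1.0)  # simplex-gradient units
    return np.array([(budgets <= a).mean(axis=0) for a in alphas])

# two solvers on three problems; solver 1 never converges on the last problem
t = [[40, 60], [120, 90], [300, np.inf]]
profiles = data_profile(t, n_vars=[10, 20, 30], alphas=[5, 10, 20])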
Moré and Wild (2009) point out that data profiles are more suitable for comparing optimization algorithms when function evaluations are computationally expensive. This is because performance profiles can only compare algorithms at a fixed
computational budget (say after 200 simulations) while data profiles can compare
algorithms at different computational budgets and this is more valuable to users in
the computationally expensive setting where the short-term behavior of algorithms is
more important than long-term behavior. Moreover, since the number of simulations
needed to satisfy the above convergence test typically grows with the problem size,
data profiles take into account the number of decision variables in the problems.
On the other hand, performance profiles ignore problem size. Hence, in some cases
below, only the data profiles are shown to avoid clutter in the presentation of results.
Fig. 3.2 Performance and data profiles for (μ + μ)-TRICEPS-RBF, (μ + μ)-CEP-RBF and (μ + μ)-CEP on all test problems (performance profiles after 200 simulations, constraint tolerance = 10⁻⁶)
71
(2+2)TRICEPSRBF
(5+5)TRICEPSRBF
(2+2)CEPRBF
(5+5)CEPRBF
(2+2)CEP
(5+5)CEP
() 0.5
s
0.4
0.3
0.2
0.1
0
1.5
2.5
3.5
Performance Factor
6
0.9
0.8
0.7
0.6
d () 0.5
s
0.4
0.3
0.2
0.1
0
10
Fig. 3.3 Performance and data profiles for (+)-TRICEPS-RBF, (+)-CEP-RBF and (+)CEP on the 30-dimensional test problems
Fig. 3.4 Performance and data profiles for (μ + μ)-TRICEPS-RBF, (μ + μ)-CEP-RBF and (μ + μ)-CEP on test problems with at least 20 decision variables or with at least 5 inequality constraints (performance profiles after 200 simulations; data profiles up to 50 simplex gradients; constraint tolerance = 10⁻⁶)
Fig. 3.5 Performance and data profiles for (μ + μ)-TRICEPS-RBF and alternative methods ((2+2)-CEP-RBF, (2+2)-PenCEP-RBF, ConstrLMSRBF, Scatter Search, Stochastic Ranking ES, SDPENm) on all test problems (data profiles up to 50 simplex gradients, constraint tolerance = 10⁻⁶)
Table 3.2 Statistics on the best feasible objective function value after 1,000 simulations of the MOPTA08 problem (10 trials)

  Algorithm                   Best     Median   Worst    Mean     Std Error
  (2+2)-TRICEPS-RBF           227.27   228.18   228.76   228.20   0.14
  (2+2)-TRICEPS-RBF-BCS       225.48   226.19   227.42   226.43   0.22
  (2+2)-CEP-RBF               231.18   238.62   251.07   240.13   2.10
  (2+2)-CEP-RBF-BCS           226.76   228.51   228.92   228.16   0.23
  (2+2)-PenCEP-RBF            251.07   251.07   251.07   251.07   0.00
  (2+2)-PenCEP-RBF-BCS        246.96   247.84   248.99   247.84   0.22
  (2+2)-CEP                   251.07   251.07   251.07   251.07   0.00
  Stochastic Ranking ES       251.07   251.07   251.07   251.07   0.00
  Scatter Search (eSS)        251.07   251.07   251.07   251.07   0.00
  ConstrLMSRBF-LHD-BCS        225.75   227.30   228.64   227.27   0.26
  SDPEN                       231.77   231.77   231.77   231.77   0.00
Fig. 3.6 Data profiles for (2 + 2)-TRICEPS-RBF with different values of ν on all test problems
Fig. 3.7 Data profiles for (2 + 2)-TRICEPS-RBF with different values of σ_init on all test problems
Fig. 3.8 Data profiles for (2 + 2)-TRICEPS-RBF with different values of σ_init (0.05, 0.1 and 0.2 times ℓ([a, b])) on problems with at least 5 inequality constraints
Fig. 3.9 Data profiles for (2 + 2)-TRICEPS-RBF with different values of Δ_init on all test problems
3.6 Conclusions
This paper developed the TRICEPS algorithm, which is a surrogate-assisted Evolutionary Programming (EP) algorithm for computationally expensive constrained
optimization problems having only black-box inequality constraints and bound constraints. It is meant to be an improvement over CEP-RBF (Regis 2014b) in that the
algorithm performs a trust-region-like local refinement step at the end of every generation where it finds a minimizer of the surrogate model of the objective within a trust
region subject to surrogate inequality constraints with a small margin and subject to
some distance requirement from previously evaluated points. Moreover, TRICEPS
is implemented using a cubic RBF with a linear polynomial tail and a gradient-based
algorithm is used to solve the trust-region-like subproblem. TRICEPS-RBF and CEPRBF are among the few surrogate-assisted EAs that use surrogates to approximate
the constraints and that have been successfully applied to a problem that is considered
large-scale in surrogate-based or surrogate-assisted optimization. TRICEPS-RBF is
compared with alternatives, including CEP-RBF and the mathematically rigorous
sequential penalty derivative-free algorithm SDPEN (Liuzzi et al. 2010), on 18 well-known benchmark problems and on the MOPTA08 automotive application with 124
decision variables and 68 black-box inequality constraints, which is much larger than
the typical problem used in this area.
TRICEPS-RBF and the alternatives are compared on the 18 test problems using
performance and data profiles (Moré and Wild 2009) instead of average progress
curves such as the ones used in Regis (2014b). Moreover, the algorithms are compared in terms of the best feasible objective function value obtained after only 1,000
simulations on the MOPTA08 problem. The profile curves show that TRICEPS-RBF
is an improvement over CEP-RBF on problems that are either high-dimensional or
highly constrained. Moreover, the results confirm the previous findings in Regis
(2014b) that using an RBF surrogate can dramatically improve the performance of a
constrained EP. Furthermore, the (2 + 2)-TRICEPS-RBF algorithm is substantially
79
and consistently much better than the SDPEN algorithm, an RBF-assisted penaltybased EP, Stochastic Ranking Evolution Strategy (SRES) and Scatter Search (eSS)
on the problems in this study when the algorithms are given a very limited computational budget. In addition, TRICEPS-RBF is also better than the ConstrLMSRBFLHD heuristic (Regis 2011). Finally, sensitivity analyses of TRICEPS-RBF to some
of the user-specified parameters on the test problems suggest that it is somewhat
sensitive to the choice of the initial standard deviation of the Gaussian mutations and
the initial trust-region radius, but not so much on the number of trial offspring for
each parent solution.
On the MOPTA08 problem, (2 + 2)-TRICEPS-RBF-BCS is better than both
(2+2)-CEP-RBF-BCS (Regis 2014b) and ConstrLMSRBF-LHD-BCS (Regis 2011)
while requiring much less computational overhead than ConstrLMSRBF-LHD-BCS.
Moreover, both (2 + 2)-TRICEPS-RBF-BCS and (2 + 2)-CEP-RBF-BCS are much
better than the other alternatives, including SDPEN, on the MOPTA08 problem.
In addition, the results also confirm the previous finding in Regis (2014b) that the
BCS strategy (Regis 2011, 2014b) is very promising for high-dimensional problems
and highly constrained problems. Overall, TRICEPS-RBF is very promising for
computationally expensive constrained black-box optimization and it helps push the
frontier of surrogate-assisted constrained evolutionary optimization.
Acknowledgments Special thanks to Don Jones from General Motors Product Development for
proposing the MOPTA08 benchmark problem and for making a Fortran simulation code for this
problem publicly available. I would also like to thank Prof. Thomas Philip Runarsson for the Matlab
code for the Stochastic Ranking Evolution Strategy, Dr. Julio Banga's research group for the Matlab
code for Scatter Search, and Drs. Mallipeddi and Suganthan for the codes that implement the
benchmark problems from the CEC 2010 competition.
Appendix
A. Test Problems
There are four engineering design test problems: Welded Beam Design Problem
(WB4) (Coello Coello and Mezura-Montes 2002; Hedar 2004), Pressure Vessel
Design Problem (PVD4) (Coello Coello and Mezura-Montes 2002; Hedar 2004),
Gas Transmission Compressor Design Problem (GTCD) (Beightler and Phillips
1976), and Speed Reducer Design for small aircraft engine (SR7) (Floudas and
Pardalos 1990). Nine of the test problems are from the well-known constrained optimization test problems in Michalewicz and Schoenauer (1996). These are labeled
G2, G3MOD, G4, G5MOD, G6, G7, G8, G9, and G10. The G3MOD and G5MOD
problems are obtained from G3 and G5 by replacing all equality constraints with
inequality constraints. The Hesse problem is from Hesse (1973). Finally, four of the
test problems are the 30-dimensional versions of the problems C07, C08, C14 and
C15 from Mallipeddi and Suganthan (2010).
Some of the objective and constraint functions below use the plog transformation

plog(x) =   log(1 + x)   if x ≥ 0
           -log(1 - x)   if x < 0

where log is the natural logarithm. The mathematical properties of this transformation are discussed in Regis and Shoemaker (2013a). In particular, it is strictly increasing, symmetric with respect to the origin, and it tones down extremely high or extremely negative function values without changing the location of the local minima and maxima.
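A one-line Python version of this transformation (for reference; the function name plog simply follows the text):

import numpy as np

def plog(x):
    """Strictly increasing, origin-symmetric damping of large magnitudes."""
    return np.where(np.asarray(x, dtype=float) >= 0, np.log1p(np.abs(x)), -np.log1p(np.abs(x)))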
Welded Beam (WB4) (Coello Coello and Mezura-Montes 2002; Hedar 2004):

f(x) = 1.10471 x_1² x_2 + 0.04811 x_3 x_4 (14.0 + x_2)

s.t.
P = 6,000, L = 14, E = 30 × 10⁶, G = 12 × 10⁶
t_max = 13,600, s_max = 30,000, x_max = 10, d_max = 0.25
P_c = (4.013 E x_3 x_4³ / (6 L²)) (1 - 0.25 x_3 √(E/G) / L)
t = √(t_1² + t_1 t_2 x_2 / R + t_2²)
s = 6 P L / (x_4 x_3²)
d = 4 P L³ / (E x_4 x_3³)
g_1(x) = (t - t_max)/t_max ≤ 0
g_2(x) = (s - s_max)/s_max ≤ 0
g_3(x) = (x_1 - x_4)/x_max ≤ 0
g_4(x) = (0.10471 x_1² + 0.04811 x_3 x_4 (14.0 + x_2) - 5.0)/5.0 ≤ 0
g_5(x) = (d - d_max)/d_max ≤ 0
g_6(x) = (P - P_c)/P ≤ 0
0.125 ≤ x_1 ≤ 10,  0.1 ≤ x_i ≤ 10 for i = 2, 3, 4
Pressure Vessel Design (PVD4) (Coello Coello and Mezura-Montes 2002; Hedar 2004):

f(x) = 0.6224 x_1 x_3 x_4 + 1.7781 x_2 x_3² + 3.1661 x_1² x_4 + 19.84 x_1² x_3

s.t.
g_1(x) = -x_1 + 0.0193 x_3 ≤ 0
g_2(x) = -x_2 + 0.00954 x_3 ≤ 0
g_3(x) = plog(-π x_3² x_4 - (4/3) π x_3³ + 1,296,000) ≤ 0
0 ≤ x_1, x_2 ≤ 1,  0 ≤ x_3 ≤ 50,  0 ≤ x_4 ≤ 240
Speed Reducer (SR7) (Floudas and Pardalos 1990):

f(x) = 0.7854 x_1 x_2² A - 1.508 x_1 B + 7.477 C + 0.7854 D

where
A = 3.3333 x_3² + 14.9334 x_3 - 43.0934
B = x_6² + x_7²
C = x_6³ + x_7³
D = x_4 x_6² + x_5 x_7²

s.t.
g_1(x) = (27 - x_1 x_2² x_3)/27 ≤ 0
g_2(x) = (397.5 - x_1 x_2² x_3²)/397.5 ≤ 0
g_3(x) = (1.93 - (x_2 x_6⁴ x_3)/x_4³)/1.93 ≤ 0
g_4(x) = (1.93 - (x_2 x_7⁴ x_3)/x_5³)/1.93 ≤ 0
A_1 = [(745 x_4/(x_2 x_3))² + 16.91 × 10⁶]^0.5
B_1 = 0.1 x_6³
g_5(x) = ((A_1/B_1) - 1100)/1100 ≤ 0
A_2 = [(745 x_5/(x_2 x_3))² + 157.5 × 10⁶]^0.5
B_2 = 0.1 x_7³
g_6(x) = ((A_2/B_2) - 850)/850 ≤ 0
g_7(x) = (x_2 x_3 - 40)/40 ≤ 0
g_8(x) = (5 - (x_1/x_2))/5 ≤ 0
g_9(x) = ((x_1/x_2) - 12)/12 ≤ 0
g_10(x) = (1.9 + 1.5 x_6 - x_4)/1.9 ≤ 0
g_11(x) = (1.9 + 1.1 x_7 - x_5)/1.9 ≤ 0
2.6 ≤ x_1 ≤ 3.6,  0.7 ≤ x_2 ≤ 0.8,  17 ≤ x_3 ≤ 28
7.3 ≤ x_4, x_5 ≤ 8.3,  2.9 ≤ x_6 ≤ 3.9,  5.0 ≤ x_7 ≤ 5.5
G2 (Michalewicz and Schoenauer 1996) (d = 20):

f(x) = -| (Σ_{i=1}^{d} cos⁴(x_i) - 2 Π_{i=1}^{d} cos²(x_i)) / √(Σ_{i=1}^{d} i x_i²) |

s.t.
g_1(x) = plog(-Π_{i=1}^{d} x_i + 0.75) / plog(10^d) ≤ 0
g_2(x) = (Σ_{i=1}^{d} x_i - 7.5d)/(2.5d) ≤ 0
0 ≤ x_i ≤ 10 for i = 1, 2, ..., d
G3MOD (Michalewicz and Schoenauer 1996) (d = 20):

f(x) = plog(-(√d)^d Π_{i=1}^{d} x_i)

s.t.
g_1(x) = Σ_{i=1}^{d} x_i² - 1 ≤ 0
0 ≤ x_i ≤ 1 for i = 1, 2, ..., d
G4 (Michalewicz and Schoenauer 1996):

f(x) = 5.3578547 x_3² + 0.8356891 x_1 x_5 + 37.293239 x_1 - 40,792.141

s.t.
u = 85.334407 + 0.0056858 x_2 x_5 + 0.0006262 x_1 x_4 - 0.0022053 x_3 x_5
g_1(x) = -u ≤ 0
g_2(x) = u - 92 ≤ 0
s.t.
g_1(x) = x_1² - x_2 + 1 ≤ 0
g_2(x) = 1 - x_1 + (x_2 - 4)² ≤ 0
0 ≤ x_1, x_2 ≤ 10
G9 (Michalewicz and Schoenauer 1996):
f(x) = (x1 − 10)² + 5(x2 − 12)² + x3⁴ + 3(x4 − 11)² + 10x5⁶ + 7x6² + x7⁴ − 4x6x7 − 10x6 − 8x7
s.t.
g1(x) = (2x1² + 3x2⁴ + x3 + 4x4² + 5x5 − 127)/127 ≤ 0
g2(x) = (7x1 + 3x2 + 10x3² + x4 − x5 − 282)/282 ≤ 0
g3(x) = (23x1 + x2² + 6x6² − 8x7 − 196)/196 ≤ 0
g4(x) = 4x1² + x2² − 3x1x2 + 2x3² + 5x6 − 11x7 ≤ 0
−10 ≤ xi ≤ 10 for i = 1, . . . , 7
G10 (Michalewicz and Schoenauer 1996):
f(x) = x1 + x2 + x3
s.t.
g1(x) = −1 + 0.0025(x4 + x6) ≤ 0
g2(x) = −1 + 0.0025(−x4 + x5 + x7) ≤ 0
g3(x) = −1 + 0.01(−x5 + x8) ≤ 0
g4(x) = plog(100x1 − x1x6 + 833.33252x4 − 83333.333) ≤ 0
g5(x) = plog(x2x4 − x2x7 − 1,250x4 + 1,250x5) ≤ 0
g6(x) = plog(x3x5 − x3x8 − 2,500x5 + 1,250,000) ≤ 0
10² ≤ x1 ≤ 10⁴, 10³ ≤ x2, x3 ≤ 10⁴,
10 ≤ xi ≤ 10³ for i = 4, 5, . . . , 8
Hesse (1973):
f(x) = −25(x1 − 2)² − (x2 − 2)² − (x3 − 1)² − (x4 − 4)² − (x5 − 1)² − (x6 − 4)²
s.t.
g1(x) = (2 − x1 − x2)/2 ≤ 0
g2(x) = (x1 + x2 − 6)/6 ≤ 0
g3(x) = (−x1 + x2 − 2)/2 ≤ 0
g4(x) = (x1 − 3x2 − 2)/2 ≤ 0
g5(x) = (4 − (x3 − 3)² − x4)/4 ≤ 0
g6(x) = (4 − (x5 − 3)² − x6)/4 ≤ 0
0 ≤ x1 ≤ 5, 0 ≤ x2 ≤ 4, 1 ≤ x3 ≤ 5
0 ≤ x4 ≤ 6, 1 ≤ x5 ≤ 5, 0 ≤ x6 ≤ 10
C07 (Mallipeddi and Suganthan 2010):
f(x) = ∑_{i=1}^{d−1} [100(zi² − zi+1)² + (zi − 1)²]
s.t.
g1(x) = 0.5 − exp( −0.1 √( (1/d) ∑_{i=1}^d yi² ) ) − 3 exp( (1/d) ∑_{i=1}^d cos(0.1yi) ) + exp(1) ≤ 0
−140 ≤ xi ≤ 140, i = 1, . . . , d
C08 (Mallipeddi and Suganthan 2010):
f(x) = ∑_{i=1}^{d−1} [100(zi² − zi+1)² + (zi − 1)²]
s.t.
g1(x) = 0.5 − exp( −0.1 √( (1/d) ∑_{i=1}^d yi² ) ) − 3 exp( (1/d) ∑_{i=1}^d cos(0.1yi) ) + exp(1) ≤ 0
−140 ≤ xi ≤ 140, i = 1, . . . , d
C14 (Mallipeddi and Suganthan 2010):
f(x) = ∑_{i=1}^{d−1} [100(zi² − zi+1)² + (zi − 1)²]
s.t.
g2(x) = ∑_{i=1}^d ( yi cos(√|yi|) ) − d ≤ 0
g3(x) = ∑_{i=1}^d ( yi sin(√|yi|) ) − 10d ≤ 0
−1,000 ≤ xi ≤ 1,000, i = 1, . . . , d
C15 (Mallipeddi and Suganthan 2010):
f(x) = ∑_{i=1}^{d−1} [100(zi² − zi+1)² + (zi − 1)²]
s.t.
g2(x) = ∑_{i=1}^d ( yi cos(√|yi|) ) − d ≤ 0
g3(x) = ∑_{i=1}^d ( yi sin(√|yi|) ) − 10d ≤ 0
−1,000 ≤ xi ≤ 1,000, i = 1, . . . , d
Fig. 3.10 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the G3MOD problem
Fig. 3.11 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the C07 problem
Fig. 3.12 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the Hesse problem
Fig. 3.13 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the G8 problem
Fig. 3.14 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the Speed Reducer (SR7) problem
Fig. 3.15 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the C08 problem
Fig. 3.16 Data profiles for (μ + λ)-TRICEPS-RBF and alternative methods on the G9 problem
91
6
1
0.9
0.8
0.7
0.6
ds()
0.5
(2+2)TRICEPSRBF
(2+2)CEPRBF
(2+2)PenCEPRBF
ConstrLMSRBF
Scatter Search
Stochastic Ranking ES
SDPENm
0.4
0.3
0.2
0.1
0
10
15
20
25
30
35
40
45
50
Fig. 3.17 Data profiles for ( + )-TRICEPS-RBF and alternative methods on the Pressure Vessel
Design (PVD4) problem
References
Araujo MC, Wanner EF, Guimarães FG, Takahashi RHC (2009) Constrained optimization based on quadratic approximations in genetic algorithms. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Studies in Computational Intelligence, vol 198, Chapter 9. Springer, Berlin, pp 193–217
Arnold DV, Hansen N (2012) A (1 + 1)-CMA-ES for constrained optimisation. In: 2012 genetic and evolutionary computation conference (GECCO 2012), Philadelphia, July 2012. ACM Press, pp 297–304
Basudhar A, Dribusch C, Lacaze S, Missoum S (2012) Constrained efficient global optimization with support vector machines. Struct Multidiscip Optim 46(2):201–221
Beightler CS, Phillips DT (1976) Applied geometric programming. Wiley, New York
Björkman M, Holmström K (2000) Global optimization of costly nonconvex functions using radial basis functions. Optim Eng 1(4):373–397
Coello Coello CA (2012) Constraint-handling techniques used with evolutionary algorithms. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2012) companion, pp 849–872
Coello Coello CA, Mezura-Montes E (2002) Constraint-handling in genetic algorithms through the use of dominance-based tournament selection. Adv Eng Inform 16(3):193–203
Coello Coello CA, Landa-Becerra R (2004) Efficient evolutionary optimization through the use of a cultural algorithm. Eng Optim 36(2):219–236
Datta R, Deb K (2013) Individual penalty based constraint handling using a hybrid bi-objective and penalty function approach. In: 2013 IEEE congress on evolutionary computation (CEC 2013), Cancún, México, June 2013. IEEE Press, pp 2720–2727
Deb K, Datta R (2013) A bi-objective constrained optimization algorithm using a hybrid evolutionary and penalty function approach. Eng Optim 45(5):503–527
Egea JA, Rodriguez-Fernandez M, Banga JR, Martí R (2007) Scatter search for chemical and bioprocess optimization. J Glob Optim 37(3):481–503
Egea JA, Vazquez E, Banga JR, Martí R (2009) Improved scatter search for the global optimization of computationally expensive dynamic models. J Glob Optim 43(2–3):175–190
Emmerich MTM, Giannakoglou K, Naujoks B (2006) Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. IEEE Trans Evol Comput 10(4):421–439
Emmerich M, Giotis A, Özdemir MM, Bäck T, Giannakoglou K (2002) Metamodel-assisted evolution strategies. In: Parallel problem solving from nature VII, pp 362–370
Floudas CA, Pardalos PM (1990) A collection of test problems for constrained global optimization algorithms. Springer, Berlin
Gieseke F, Kramer O (2013) Towards non-linear constraint estimation for expensive optimization. In: Esparcia-Alcázar AI (ed) EvoApplications. Lecture Notes in Computer Science, vol 7835. Springer, Berlin, pp 459–468
Gutmann H-M (2001) A radial basis function method for global optimization. J Glob Optim 19(3):201–227
Hedar A (2004) Studies on metaheuristics for continuous global optimization problems. PhD thesis, Kyoto University, Japan
Hesse R (1973) A heuristic search procedure for estimating a global solution of nonconvex programming problems. Oper Res 21:1267–1280
Isaacs A, Ray T, Smith W (2007) An evolutionary algorithm with spatially distributed surrogates for multiobjective optimization. In: Randall M et al (eds) Proceedings of the 3rd Australian conference on progress in artificial life (ACAL 2007). Lecture Notes in Computer Science, vol 4828. Springer, pp 257–268
Isaacs A, Ray T, Smith W (2009) Multiobjective design optimization using multiple adaptive spatially distributed surrogates. Int J Prod Dev 9(1–3):188–217
Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol Comput 1(2):61–70
Jin Y, Olhofer M, Sendhoff B (2002) A framework for evolutionary optimization with approximate fitness functions. IEEE Trans Evol Comput 6(5):481–494
Jones DR (2008) Large-scale multi-disciplinary mass optimization in the auto industry. In: MOPTA 2008: modeling and optimization: theory and applications conference, Ontario, Canada, August 2008
Kazemi M, Wang GG, Rahnamayan S, Gupta K (2011) Metamodel-based optimization for problems with expensive objective and constraint functions. ASME J Mech Des 133(1):014505
Kramer O, Barthelmes A, Rudolph G (2009) Surrogate constraint functions for CMA evolution strategies. In: Mertsching B, Hund M, Aziz MZ (eds) KI. Lecture Notes in Computer Science, vol 5803. Springer, pp 169–176
Liuzzi G, Lucidi S, Sciandrone M (2010) Sequential penalty derivative-free methods for nonlinear constrained optimization. SIAM J Optim 20(5):2614–2635
Loshchilov I, Schoenauer M, Sebag M (2012) Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2012), pp 321–328
Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Mezura-Montes E, Coello Coello CA (2005) A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Trans Evol Comput 9(1):1–17
Mezura-Montes E, Coello Coello CA (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173–194
Tolson BA, Shoemaker CA (2007) Dynamically dimensioned search algorithm for computationally efficient watershed model calibration. Water Resour Res 43:W01413
Viana FAC, Haftka RT, Watson LT (2010) Why not run the efficient global optimization algorithm with multiple surrogates? In: 51st AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics, and materials conference, Orlando
Wang Y, Cai Z (2012) Combining multiobjective optimization with differential evolution to solve constrained optimization problems. IEEE Trans Evol Comput 16(1):117–134
Wanner EF, Guimarães FG, Takahashi RH, Saldanha RR, Fleming PJ (2005) Constraint quadratic approximation operator for treating equality constraints with genetic algorithms. In: 2005 IEEE congress on evolutionary computation (CEC 2005), vol 3. IEEE Press, Edinburgh, pp 2255–2262
Wild SM, Shoemaker CA (2011) Global convergence of radial basis function trust region derivative-free algorithms. SIAM J Optim 21(3):761–781
Wild SM, Regis RG, Shoemaker CA (2008) ORBIT: optimization by radial basis function interpolation in trust-regions. SIAM J Sci Comput 30(6):3197–3219
Zhou Z, Ong YS, Nair PB, Keane AJ, Lum KY (2007) Combining global and local surrogate models to accelerate evolutionary optimization. IEEE Trans Syst, Man, Cybern Part C: Appl Rev 37(1):66–76
Chapter 4
R. Allmendinger (B)
Department of Biochemical Engineering, University College London,
Torrington Place, London WC1E 7JE, UK
e-mail: r.allmendinger@ucl.ac.uk
URL: http://www.ucl.ac.uk/ucberal
J. Knowles
University of Manchester, School of Computer Science, Oxford Road,
Manchester M13 9PL, UK
e-mail: j.knowles@manchester.ac.uk
URL: http://www.cs.man.ac.uk/jknowles
4.1 Introduction
In this chapter, we discuss a new and broad class of constraint that departs quite
strongly from those considered usually in optimization. While typical or standard
constraints place limits on the feasible region (hard constraints), or suggest strong
preferences on solutions (soft constraints), the constraints we describe here instead
pose limits on which solutions in a search space are evaluable. That is to say, when
a solution violates one or more of these constraints, it is not possible to evaluate
that solution on the objective function, even though it may later turn out to be a
good solution to the problem, and one that is feasible in the normal sense. The type
of constraint we discuss here is called an ephemeral resource constraint (or ERC),
and we have introduced it in a number of recent papers (Allmendinger and Knowles
2010, 2011, 2013).
As the name suggests, ERCs arise only temporarily or dynamically during optimization (i.e., are ephemeral) and come about due to limitations on the resources
needed to evaluate (or construct) a solution. As we will explain in detail below, the
motivation for these constraints comes about from considering (mainly though not
exclusively) problems sometimes referred to as closed-loop optimization problems.1
In a closed-loop problem, candidate solutions are evaluated experimentally, and may
need to be realized physically, chemically, or in some other tangible way, thus requiring the use or availability of resources. From this reliance on resources, which may be limited, it follows that candidate solutions cannot be guaranteed to be evaluable
(realizable) at all times during optimization. Thus, both evaluable and non-evaluable
solutions can coexist in the search space, and the boundaries between them can be
described as dynamic (or ephemeral) constraints.
These constraints, and the non-evaluability of solutions, are not rare in practical
applications; for example, Finkel and Kelley (2009) lists eight references where
solutions were non-evaluable, and more examples are given in Knowles (2009),
Allmendinger (2012), as well as later in this chapter. We are also aware from
personal communication that such resourcing issues have been faced by Schwefel
(in his famous jet nozzle optimization experiments from the 70s) (Schwefel 1968;
Klockgether and Schwefel 1970) and others, even if not always reported in the literature. Since closed-loop problems are quite various (see, e.g.,Schwefel (1968),
Klockgether and Schwefel (1970), Judson and Rabitz (1992), Shir (2008), Caschera
et al. (2010), Small et al. (2011), Vaidyanathan et al. (2003), OHagan et al. (2005,
2007), Thompson (1996), Herdy (1997), Knowles (2009) and the tutorials Shir and
Bck (2009), Bck et al. (2010)) and are growing in importance in a number of
domains (e.g., high-throughput automated science, as in Bedau (2010)), it seems
timely to consider the effects these resourcing issues (ERCs) can have on optimization performance, and this has been our objective in recent work.
In this chapter, our aims are threefold. First, we wish to summarize the terminology
and framework for describing ERCs reported in earlier papers (Sects. 4.2 and 4.3).
1 When an EA is used, closed-loop optimization may also be referred to as evolutionary experimentation (Rechenberg 2000) or experimental evolution.
Secondly, we wish to augment this earlier work with a theoretical study that considers
the fundamental effects of ERCs on simple evolutionary algorithms (Sect. 4.4). Third,
we evaluate some of the methods we have proposed for handling ERCs and consider
how these can be developed further (Sects. 4.5–4.8).
Fig. 4.1 Schematic of closed-loop optimization. The genotype of a candidate solution x is generated on the computer but its phenotype is experimentally prototyped (e.g., drugs are mixed, an instrument is adjusted, or a simulation is run). The quality or fitness f(x) of a solution may be obtained experimentally too and thus may be subject to measurement errors (noise)
The main job in defining ERCOPs, and simulating them so that they can be studied,
is to specify what happens when a candidate solution cannot be evaluated. In a real
situation, when a candidate solution proposed by the optimization algorithm is found
to be non-evaluable, an operator or scientist within the loop (if there is such a person)
may notice, and can choose to ignore this solution, to miss it out. This may seem
to be an adequate solution, but there are several issues here. We need to consider
at what time it is known that a solution cannot be evaluated, for how long it can
remain non-evaluable, whether new resources can be requested in order to fulfill the
optimizer's request to evaluate that solution, whether the optimizer is informed that
the solution could not be evaluated, and so on.
If we are able to specify these things, then we can also imagine a range of possible
(automated) remedial actions that the optimizer can take when it is informed about
non-evaluable solutions. It could automatically order more resources, it could wait
(stopping all solution evaluations until the non-evaluable one is again evaluable), it
could carry on and assign the non-evaluable solution a dummy value (or no value
at all), or it could place the non-evaluable solution in a queue to be evaluated later on.
All these types of responses need to be possible within the framework that we use
to describe ERCOPs. To keep things as general and flexible as possible, our ERCOP
framework consists of just two essentials: (1) ERCs are functions of a number of
(visible or hidden) variables which determine when they are switched on and (2) the
optimizer has access to a number of additional functions that allow it to operate in
a well-defined manner when a solution is non-evaluable. To achieve this, and to be
able to talk meaningfully about the performance of optimizers, we also embed the
optimization process in a global clock, so that every action is synchronized and its
time cost can be accounted for. In the following, we put these essentials in a more
mathematical form.
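A minimal way to make these two essentials concrete in code is sketched below. The class and function names are illustrative only (they are not the authors' implementation), and the periodic-ERC semantics follow the description given later in the chapter: within the constraint time frame, the first k time steps of every period of length P only admit solutions that match the constraint schema H.

```python
class PeriodicERC:
    # Illustrative sketch of perERC(t_start, t_end, k, P, H); H is a tuple in
    # which None marks a "don't care" position and 0/1 are order-defining bits.
    def __init__(self, t_start, t_end, k, P, H):
        self.t_start, self.t_end, self.k, self.P, self.H = t_start, t_end, k, P, H

    def active(self, t):
        # The ERC is switched on during the first k steps of each period
        # inside the constraint time frame [t_start, t_end).
        return self.t_start <= t < self.t_end and (t - self.t_start) % self.P < self.k

    def evaluable(self, t, x):
        # When active, only solutions matching the schema H can be evaluated.
        return (not self.active(t)) or all(h is None or h == xi
                                           for h, xi in zip(self.H, x))


def evaluate(x, t, ercs, objective, handle_non_evaluable):
    # Every call consumes one tick of the global clock, whether or not the
    # solution could be evaluated; the optimizer supplies a handler (wait,
    # repair, penalize, queue, ...) for the non-evaluable case.
    if all(erc.evaluable(t, x) for erc in ercs):
        return objective(x), t + 1
    return handle_non_evaluable(x, t, ercs), t + 1
```

Commitment relaxation and commitment composite ERCs would plug into the same two hooks (active and evaluable), differing only in the hidden state they maintain.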
Fig. 4.2 An illustration of how the available optimization time T can be divided into the preparation period 0 ≤ t < t_ctf^start, the constraint time frame t_ctf^start ≤ t < t_ctf^end, and the recovery period t_ctf^end ≤ t ≤ T
In the above example, the constraint schema H represents the parameter combination that corresponds to instrument configuration b. The length of an activation
period is bounded by 0 ≤ k(j) ≤ 9. For instance, imagine we select instrument
configuration b in the middle of the day, say at 1pm, as indicated by epoch j = 1 in
the figure. This will activate the ERC for a period of k(1) = 4 (= 5 pm − 1 pm) hours
(indicated by the dashed part). Activating the ERC later, earlier, or not at all during
a working day changes k(j) accordingly.
We denote commitment relaxation ERCs by commRelaxERC(t_ctf^start, t_ctf^end, V, H).
An extension to this simple commitment relaxation ERC is to maintain not only
Fig. 4.3 An illustration of how a commitment relaxation ERC may partition the optimization time into epochs of length V, and how it may be potentially activated. The activation period k(j) during the jth epoch is represented by the dashed part
one but several commitment relaxation ERCs with different constraint schemata Hi .
In this case, we need to consider three aspects: (i) a solution is non-evaluable if it
violates at least one ERC, (ii) a repaired solution has to satisfy all activated ERCs
and not only the ones that were violated, and (iii) it needs to be checked whether a
repaired solution activates an ERC that was not activated before. This extension will
be considered later in Sect. 4.6.
In the above example, the activation period is k = 1 (assuming a time step is a day), the period length is P = 7 (i.e., a week), and the constraint schema H represents the parameter combination that corresponds to the instruments (or their settings) operated by engineer eng_i. We denote periodic ERCs by perERC(t_ctf^start, t_ctf^end, k, P, H).
In this example a composite is a tyre and the composite-defining bits are the
variables defining a tyre. Ordering tyres is associated with a time lag of TL = 3
(assuming a time step is one day), and tyres have a reuse number of RN = 5 and
a shelf life of SL = 30 (assuming one month consists of 30 days). The number of
storage cells is #SC = 10, and the costs associated with a composite order and time
step are corder = 500 and ctime_step = 3,000, respectively.
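The bookkeeping this implies (ordering ahead of time, limited storage, expiry, and consumption by evaluation) can be sketched roughly as follows. This is a simplification of the mechanism illustrated in Fig. 4.5, with illustrative names and without the cost accounting for c_order and c_time_step.

```python
from dataclasses import dataclass

@dataclass
class Composite:
    setting: tuple   # the composite-defining bits, e.g., a tyre specification
    shelf_life: int  # SL: remaining time steps before the composite expires
    reuses: int      # RN: remaining evaluations the composite can serve

def advance_time_step(cells, arriving, num_cells=10):
    # One clock tick: age everything, drop expired or used-up composites,
    # and shelve orders that were placed TL time steps ago and arrive now.
    for c in cells:
        c.shelf_life -= 1
    cells = [c for c in cells if c.shelf_life > 0 and c.reuses > 0]
    return (cells + list(arriving))[:num_cells]   # at most #SC storage cells

def try_evaluate(x_bits, cells):
    # A solution is evaluable only if a matching, usable composite is stored;
    # evaluating it consumes one reuse of that composite.
    for c in cells:
        if c.setting == x_bits and c.reuses > 0:
            c.reuses -= 1
            return True
    return False
```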
Fig. 4.5 A visual example of the commitment composite ERC commCompERC(H# = {###}, #SC = 4, TL = 1, RN = 10, SL = 20); each composite order and time step costs c_order and c_time_step units, respectively. The evaluation step at time step t reduces the reuse number of the composite in cell 2. At the same time step, the shelf life of the composite in cell 4 expires, and two new composites are ordered. One time step later, at t + 1, the ordered composites arrive and are put into cells determined by the EA
simulation results are analyzed and summarized. The Markov chain model presented
here is based on an analysis we carried out in Allmendinger (2012).
ut = u0 P^t,    (4.1)
where u0 is the (μ + 1)-dimensional probability vector that represents the initial distribution over the set of states.
When an EA is modeled by a Markov chain it is easy to see that the population is
the natural choice for describing a state. The transition probabilities then express the
likelihoods that an EA changes from a current population to any other possible population after applying the stochastic effects of selection, crossover, and/or mutation.
It is also possible to consider other effects such as noisy fitness functions (Nakama
2008), niching (Horn 1993) and elitism (He and Yao 2002). Once the transition
matrix is calculated it can be used to calculate a variety of measurements, such as
the first hitting time of a particular state or the probability of hitting a state at all. An
overview of tools of Markov chain analysis can be found in any general textbook on
stochastic processes, such as Norris (1998), Doob (1953).
The drawback of modeling EAs with Markov chains is that the size of the required
transition matrix grows exponentially in both the population size and string length. To
keep Markov chain models manageable it is therefore common to use small population sizes and string lengths (Goldberg and Segrest 1987; Horn 1993). Other options,
which allow the modeling of more realistic EAs, are to make simplifying assumptions about the state space (Mahfoud 1991) or to use matrix notation only (Vose and
Liepins 1991; Nix and Vose 1992; Davis and Principe 1993).
Under fitness proportionate selection, the probability of selecting a type A parent from a population that contains m type A individuals is
Pm(A) = m f(A) / ( m f(A) + (μ − m) f(B) ).    (4.2)
As there are only two individual types in total, the probability of choosing a type B individual is Pm(B) = 1 − Pm(A). From the above equation it is apparent that once a uniform population is reached, i.e., m = 0 or m = μ, there is no chance of selecting individuals from the other type. Thus, the two corresponding states S0 and Sμ are absorbing states.
Under tournament selection we first randomly select a number of individuals from
the population (with replacement) and then perform a tournament among them with
the fittest one serving subsequently as a parent. It is common to use a tournament
size of two, which will also be used here; this selection strategy is known as binary
tournament selection (BTS). The result of a tournament is clear: the individual with
the higher fitness wins the tournament; there is a draw if an individual meets another
individual with the same fitness in which case the winner is randomly determined; and
an individual will be the winner of a tournament with itself. We distinguish two cases
regarding the fitness of the individual types: (i) f (A) = f (B) and (ii) f (A) > f (B).
The following selection probabilities are obtained for each of the cases:
f(A) = f(B):  Pm(A) = m²/μ² + m(μ − m)/μ²
f(A) > f(B):  Pm(A) = m²/μ² + 2m(μ − m)/μ²    (4.3)
With GGA, the whole population is replaced in each time step by μ offspring selected independently from the current population, which gives the following transition probabilities:
For m = 0
pmm = 1, pmr = 0 for r = 1, . . . , μ.
For 0 < m < μ
pmr = C(μ, r) Pm(A)^r (1 − Pm(A))^(μ−r) for r = 0, . . . , μ.    (4.4)
For m = μ
pmr = 0 for r = 0, . . . , μ − 1, pmm = 1.
With steady state reproduction, the population is updated after each selection
step. Usually, an offspring individual replaces the worst individual in the population. This replacement strategy, however, is elitist and ensures that the number of
the less fit individual type in the population does not increase. Thus, to allow for a
fair comparison with GGA, an offspring does not replace the worst individual in the
population but a randomly chosen one regardless of its fitness; we denote this reproduction scheme by SSGA (rri), where rri refers to replacing a random individual. It
has been shown elsewhere (Syswerda 1991) that GGA and SSGA (rri) yield similar
performance. Bearing in mind that one time step corresponds to one selection step
with SSGA (rri), we obtain the following transition probabilities:
For m = 0
pmm = 1, pmr = 0 for r = 1, . . . , μ.    (4.5)
For 0 < m < μ
pmr = 0 for r = 0, . . . , m − 2
pm,m−1 = (1 − Pm(A)) · m/μ
pmm = Pm(A) · m/μ + (1 − Pm(A)) · (μ − m)/μ
pm,m+1 = Pm(A) · (μ − m)/μ
pmr = 0 for r = m + 2, . . . , μ.
For m = μ
pmr = 0 for r = 0, . . . , μ − 1, pmm = 1.
The transition probabilities of either GGA or SSGA (rri) will be the entries of the
transition matrix P.
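A hedged sketch of how these entries could be assembled in code is given below; mu stands in for the population size symbol used in the text, the BTS selection probabilities follow Eq. (4.3), and the GGA matrix follows Eq. (4.4). It is an illustration of the model, not the authors' code.

```python
import numpy as np
from math import comb

def p_select_A(m, mu, fA, fB, selection="BTS"):
    # Probability of selecting a type A parent from a population with m type A
    # and mu - m type B individuals (Eqs. (4.2) and (4.3); fA >= fB assumed).
    if selection == "FPS":
        return m * fA / (m * fA + (mu - m) * fB)
    if fA == fB:
        return (m ** 2 + m * (mu - m)) / mu ** 2
    return (m ** 2 + 2 * m * (mu - m)) / mu ** 2

def gga_transition_matrix(mu, fA, fB, selection="BTS"):
    # Eq. (4.4): the mu offspring are independent draws, so the number of
    # type A individuals in the next generation is binomially distributed.
    P = np.zeros((mu + 1, mu + 1))
    P[0, 0] = P[mu, mu] = 1.0   # uniform populations are absorbing states
    for m in range(1, mu):
        pA = p_select_A(m, mu, fA, fB, selection)
        for r in range(mu + 1):
            P[m, r] = comb(mu, r) * pA ** r * (1 - pA) ** (mu - r)
    return P
```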
If we are in state S0 and the ERC is activated, then S0 is not an absorbing state
anymore and we move directly to state Sk .
As a population contains at least k type A individuals after lifting the constraint,
we are not able to move to a state Sr with r < k during the constrained generation
(time step).
The ERC reduces the number of freely selected offspring down to μ_new = μ − k. Moving to a state Sr with r > k is already achieved by selecting r_new = r − k (instead of r) type A individuals from the current population.
Considering these points, we derive for the time step for which the ERC is activated
the following constrained transition probabilities for GGA:
For m = 0
pmk = 1, pmr = 0 for r = 0, . . . , k − 1, k + 1, . . . , μ.    (4.6)
For 0 < m < μ
pmr = 0 for r = 0, . . . , k − 1
pmr = C(μ_new, r_new) Pm(A)^(r_new) (1 − Pm(A))^(μ_new − r_new) for r = k, . . . , μ.
For m = μ
pmr = 0 for r = 0, . . . , μ − 1, pmm = 1.
The above periodic ERC is set such that the activation period of k selection steps is upper bounded by the population size μ, and, in the case of GGA, starts and ends within a single time step (generation). This need not necessarily be the case. In fact, a periodic ERC can feature an activation period k that is so long that it
case. In fact, a periodic ERC can feature an activation period k that is so long that it
constrains selection steps within two or more successive generations, or so short that
several activation periods may start during a single generation. In such scenarios,
one needs to constrain all generations that are subject to constrained selection steps.
The number of constrained selection steps within a generation, referred to as k in
Eq. (4.6), is then simply the sum of all selection steps that happen to be constrained
during any particular generation. That is, depending on the ERC, the number of
constrained selection steps may change between generations.
With SSGA (rri), the population is updated after each selection step, which, remember, is a single time step with this scheme. This means that we need to determine for
each selection step (time step) separately whether it lies within the activation period
and thus is constrained or not. During the activation period, the periodic ERC of
above prevents us from moving from a current state Sm to a state Sm−1, which can
only be reached if a type B individual replaces a type A individual. As above, if the
constraint is active, then the state S0 is not an absorbing state anymore, and we move
directly to state S1 . We obtain the following new transition probabilities for each of
the k constrained time steps:
For m = 0
pm1 = 1, pmr = 0 for r = 0, 2, 3, . . . , μ.    (4.7)
For 0 < m < μ
pmr = 0 for r = 0, . . . , m − 1
pmm = m/μ
pm,m+1 = (μ − m)/μ
pmr = 0 for r = m + 2, . . . , μ.
For m = μ
pmr = 0 for r = 0, . . . , μ − 1, pmm = 1.
We will denote the transition matrix with the constrained transition probabilities
by Pc .
With the ERC active for g generations starting at generation i, the state distribution of GGA at time step t is
ut = u0 P^t,                    0 ≤ t < i,
ut = u0 P^i Pc^(t−i),           i ≤ t < g + i,
ut = u0 P^i Pc^g P^(t−g−i),     g + i ≤ t,
where the entries of the transition matrices P and Pc are calculated using Eqs. (4.4)
and (4.6), respectively. The probability vector u0 of the initial state distribution has
a value of 1 at the ith entry and a value of 0 in the others, if we want to start with a
population of exactly i type A individuals.
One time step with GGA corresponds to μ time steps with SSGA (rri). To compute the probability vector u for SSGA (rri) we thus need to look at the state distributions at time step μt:
uμt = u0 P^(μt),                               0 ≤ t < i,
uμt = u0 P^(μi) (Pc^k P^(μ−k))^(t−i),          i ≤ t < g + i,
uμt = u0 P^(μi) (Pc^k P^(μ−k))^g P^(μ(t−g−i)),  g + i ≤ t,
g + i t,
where the transition matrices P and Pc are calculated according to Eqs. (4.5) and
(4.7), respectively.
Having obtained the probabilities of ending up in all the different states, we can calculate the expected proportions ct(A) and ct(B) of type A and B individuals in a population at time step t (or μt in the case of SSGA (rri)) as follows:
ct(A) = (1/μ) ∑_{i=0}^{μ} i · ut(i),   ct(B) = 1 − ct(A),
where ut(i) denotes the ith entry of ut.
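Continuing the sketch started above, the constrained GGA matrix of Eq. (4.6) and the piecewise products can be written out as follows. This is again a hedged illustration of the model, reusing gga_transition_matrix and p_select_A from the earlier snippet; pA_of_m is a hypothetical callable such as lambda m: p_select_A(m, mu, fA, fB).

```python
import numpy as np
from math import comb

def gga_constrained_matrix(mu, k, pA_of_m):
    # Eq. (4.6): k offspring per generation are forced into the schema (type A),
    # only mu - k offspring are selected freely, so states below k are unreachable.
    Pc = np.zeros((mu + 1, mu + 1))
    Pc[0, k] = 1.0
    Pc[mu, mu] = 1.0
    for m in range(1, mu):
        pA = pA_of_m(m)
        for r in range(k, mu + 1):
            Pc[m, r] = comb(mu - k, r - k) * pA ** (r - k) * (1 - pA) ** (mu - r)
    return Pc

def expected_B_proportion(u0, P, Pc, i, g, t, mu):
    # Piecewise products for GGA: unconstrained for i generations, constrained
    # for g generations, unconstrained afterwards; then ct(B) = 1 - ct(A).
    mp = np.linalg.matrix_power
    if t < i:
        u = u0 @ mp(P, t)
    elif t < g + i:
        u = u0 @ mp(P, i) @ mp(Pc, t - i)
    else:
        u = u0 @ mp(P, i) @ mp(Pc, g) @ mp(P, t - g - i)
    cA = sum(m * u[m] for m in range(mu + 1)) / mu
    return 1.0 - cA
```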
Fig. 4.6 A plot showing the proportion of type B individuals ct(B) for GGA and SSGA (rri) as a function of the number of selection steps for the ERC perERC(400, 450, 20, 50, H = (A)). Both individual types have equal fitness and the constraint settings used are given above the plot. The terms real and expected refer to proportions obtained by actually running the EA, respectively, by running the Markov chain. The EA results are averaged across 500 independent runs
types in the population. However, because of genetic drift this is impossible and an EA eventually converges to a uniform population (i.e., one of the states S0 or Sμ). As
the probability of ending up in one of the two states is proportional to the initial
state, the expected individual type proportion is identical to the initial proportion,
which is specified by u0 . Thus, for a random initialization, the expected proportion
is 0.5.
From Fig. 4.6 we can see that an expected proportion of 0.5 is achieved until
selection step 400 at which we activate the periodic ERC, perERC(400, 450, 20, 50,
H = (A)), which has a unique activation period of k = 20 selection steps.4 This
ERC forces us to evaluate k = 20 type A individuals and subsequently, reduces
(increases) the proportion of type B (A) individuals in the population. After the ERC
is lifted at selection step 420, the expected individual type proportion does not get
back to the initial proportion. Although this effect can be put down to the specifics
of the model (no selection pressure toward either individual type), we will see in the
following theoretical and experimental studies several results which display a similar
pattern. That is, a constraint can have a permanent or long-lived effect on search
performance even if it was active for a short time only.
From the figure we can also see that the proportion is affected more severely for
GGA than for SSGA (rri). The reason that SSGA (rri) is more robust is that with this
reproduction scheme there is a chance that an offspring of type A replaces another type
A individual that is currently in the population. Of course, if an offspring replaces
a solution of the same type, then this will not affect the proportion. By contrast,
with GGA, all offspring are carried over to the population of the next generation.
4 Note, in an EA performing optimization of a function, the number of performed selection steps
displayed on the x-axes of Fig. 4.6 would be equivalent to the number of performed function evaluations.
Fig. 4.7 Proportion of type B individuals ct(B) for GGA and SSGA (rri) as a function of the activation period k
Fig. 4.8 Plots showing the proportion of type B individuals ct(B) for FPS (top) and BTS (bottom) as a function of the number of selection steps for the ERC perERC(50, 400, 20, 50, H = (A)). The term unconstrained refers to the proportions obtained in an ERC-free environment
schemes: GGA with FPS and SSGA (rri) with FPS (top plot), and GGA with BTS
and SSGA (rri) with BTS (bottom plot).5
We want to point out that during activation periods, SSGA (rri) with BTS and
FPS perform identically, since independently of selection type, an A offspring will
replace an individual selected at random. But during the inactive periods, the stronger
selection pressure of BTS recovers more of the B-to-A replacements, so that overall
BTS maintains a higher proportion of Bs. This behavior can be seen in the zigzag
shape, where there is the same steep falloff of fitness in both methods, but a steeper
recovery for BTS. Overall, the same is true for GGA, (BTS is better for the same
reason) but it is not possible to see this so clearly in the plots.
5 We get the zigzag-shaped line for SSGA (rri) during the constraint time frame because ct(B) is plotted after each time step, consisting here of one selection step. For GGA the change in ct(B) is smooth because a time step consists of μ selection steps.
Fig. 4.9 Plots showing the proportion of type B individuals ct(B) at selection step 1,500 as a function of the start of the constraint time frame t_ctf^start (left) and the activation period k (right) for the ERCs perERC(t_ctf^start, t_ctf^start + 350, 20, 50, H = (A)) and perERC(50, 400, k, 50, H = (A)), respectively
Fig. 4.10 Plots showing the proportion of type B individuals ct(B) at selection step 1,500 as a function of the fitness ratio f(A)/f(B) for the ERCs perERC(50, 400, 20, 50, H = (A)) (left) and perERC(50, 550, 25, 50, H = (A)) (right)
Figures 4.9 and 4.10 indicate how the proportion of type B individuals is affected when altering the constraint parameters. We can observe that:
• Longer activation periods degrade the performance of all EAs (see right plot of Fig. 4.9).
• Fixing the constraint time frame duration, but translating it (see left plot of Fig. 4.9), yields a non-monotonic effect on performance (of all EAs, but most apparently with FPS): more preparation time gives more time to fill the population with fit individuals, whereas little recovery time is detrimental to the final fitness. These two effects trade off against each other.
• Changing the fitness ratio (see Fig. 4.10) has only a switching effect on BTS (when the fitter individual changes), but for FPS the ratio smoothly affects the final proportion up to a saturation point.
Overall, comparing GGA with SSGA we see that SSGA achieves the higher proportion of fit individuals during the constraint time frame, and it recovers more
rapidly after the constraint is lifted, but its rate of recovery does not reach the rate
achieved by GGA, and ultimately GGA reaches a higher proportion (see Figs. 4.7
and 4.8). This can be explained by the replacement strategy of SSGA (rri): offspring may replace individuals in the population of the same type. During the
activation period, this is beneficial as the number of poor type A individuals in the
population does not increase linearly with the activation period. However, during
the unconstrained selection steps, this may be disruptive in the sense that fit type
B offspring may replace other type B individuals of the current population, which
slows down the convergence.
Fig. 4.11 A depiction of the current population Pop (filled circles and squares) and an offspring individual xt, which is feasible but not evaluable (because it is in X but not in E(t)). Solutions indicated by the filled squares coexist in both the actual EA population Pop and the population SP maintained by the subpopulation strategy. The three solutions xt,repaired indicate repaired solutions that might have resulted after applying one of the three repairing strategies to xt: while forcing simply flips incorrectly set bits of xt and thus creates a repaired solution that is as close as possible to xt but not necessarily fit, regenerating creates a new solution in E(t) using the genetic material available in Pop. Similarly, the subpopulation strategy also creates a new solution but uses the genetic material available in the subpopulation SP (empty and filled squares), which contains only solutions from E(t)
form to other ERCs). The strategies are static in the sense that they always deal with a non-evaluable solution in the same pre-specified way, as opposed to learning-based strategies that switch between different static strategies during search (see
Sect. 4.6). Some of the static strategies are based on constraint-handling strategies
developed for standard constraints, and this will be pointed out where applicable.
Figure 4.11 depicts how the three repairing strategies, forcing, regenerating, and the
subpopulation strategy, may handle a non-evaluable solution. Below we describe
each static strategy in detail.
1. Forcing. Upon encountering a non-evaluable solution, this strategy forces it into
the constraint schemata Hi of all activated ERCs ERCi, i = 1, . . . , r, by flipping
all solution bits that are different from the order-defining bit values of Hi . Similar
repairing strategies have been proposed, e.g., in Liepins and Potter (1991).
2. Regenerating. This strategy, which is similar to the death penalty method
(Schwefel 1975), avoids the evaluation of a non-evaluable solution by iteratively
creating new solutions, based on the current parent population, until an evaluable
one has been created or until L regeneration trials have passed without success. In
the latter case, we pick the solution created within the L trials that has the smallest
sum of Hamming distances to the schemata Hi of all activated ERCs and apply forcing to it. The goal of this strategy is to avoid the potential drawback of forcing, namely destroying good genotypes by enforcing changes in decision variable values. On the
118
EA parameter settings: 50; 50; 1/l; 0.7
Fig. 4.12 Plots showing the average best solution fitness found (across 500 EA runs) and its standard error on OneMax as a function of the order of the constraint schema o(H) (top left), the epoch duration V (top right), the optimization time T (bottom left), and the start of the constraint time frame t_ctf^start (bottom right). Note, while the optimization time in the top plots is fixed to T = 700 evaluations, the parameter T varies in the bottom plots. For each setting shown on the abscissa, a Friedman test (significance level of 5 %) has been carried out. In the top left plot, waiting performs best in the range 2 < o(H) < 6, while, in the top right plot, it performs best in the range 2 < V < 12 with the subpopulation strategy being best in the range V > 12. In the bottom left plot, the subpopulation strategy performs best for T = 750, while in the bottom right plot, waiting performs best in the range 0 < t_ctf^start < 300. There is no clear winner for the other settings
Shifting the start time of the constraint time frame further to the end of the optimization decreases the probability of activating a commitment relaxation ERC
that is associated with a poor constraint schema and thus has a beneficial impact
on the performance of all strategies (see bottom right plot).
Figure 4.13 analyzes the performance impact of ERCs with constraint schemata that represent both good and poor genetic material, i.e., both 0 and 1-bits are present in H. It is obvious from the figure that the performance is affected most significantly for low-order schemata regardless of the quality of the genetic material they represent, and for schemata of higher order given that they represent good genetic material (i.e., schemata along or near the diagonal). Other schemata setups have little or no performance impact as they do not lie on an optimizer's search path, reducing the probability of activating the associated ERC.
Fig. 4.13 Plots showing the average best solution fitness obtained (across 500 EA runs) by forcing (left) and waiting (right) on OneMax (with l = 30 bits) as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC commRelaxERC(0, 700, 15, H). The straight line represents the expected performance when picking a schema (i.e., the order-defining bits and their values) with a particular order at random. The performance obtained in an unconstrained environment is represented by the square at o(H) = #1s = 0
From Fig. 4.14 we can see that the performance of the strategies is affected differently when the activation period is set deterministically as done by periodic ERCs.
From the left plot we can clearly see that waiting performs worst for all ERC settings. This is due to the high probability of encountering a non-evaluable solution
during the activation period and subsequently freezing the optimization regardless of
Fig. 4.14 The left plot shows the average best solution fitness found and its standard error (across 500 EA runs) on OneMax (with l = 30 bits) as a function of the order of the constraint schema o(H). For each setting shown on the abscissa, a Friedman test (significance level of 5 %) has been carried out revealing that the subpopulation strategy performs best for o(H) = 2; there are no clear winners for the other settings. The right plot shows the average best solution fitness obtained by the subpopulation strategy as a function of both o(H) and the number of order-defining bits in H with value 1 for the ERC perERC(0, 700, 20, 50, H). The straight line represents the expected performance when picking a schema (i.e., the order-defining bits and their values) with a particular order at random
122
the order and genetic material represented by a constraint schema. The performance
of the other strategies decreases more smoothly as a function of the order and the
quality of the genetic material represented, as can be seen from the right plot for the
subpopulation strategy.
dynamic multi-armed bandit (D-MAB) algorithm (Hartland et al. 2006, 2007; Costa
et al. 2008). The goal of the algorithm is to maximize the sum of rewards received
over a number of actions (or arms played) taken. D-MAB is dynamic in the sense
that it monitors the sequence of rewards obtained using statistical testing, and then
restarts the MAB on detecting a significant deviation in the sequence.6
Unlike the RL agent, a MAB algorithm requires that the play of an arm is followed
by a subsequent reward. We provide a reward immediately after the play of an arm,
and it is the raw fitness of the resulting solution, which is a common credit assignment
scheme.
Note, some alternative common credit assignment schemes are not directly
applicable in the presence of ERCs, such as ones that assign a credit based on the
fitness improvement of an offspring compared to its parent after applying a variation
operator to it. With ERCs, the parent would be the individual that is to be repaired and
the offspring the repaired individual after applying a constraint-handling strategy to
the parent. As we do not know the fitness of the parent because it is non-evaluable,
we cannot quantify by how much its fitness differs from the one of the repaired
individual.
Fig. 4.15 A plot showing the greedy actions a learnt by the RL agent for each state s. Training was done across 5,000 different NK landscapes with N = 30 and K = 2. (For unvisited states, a default strategy would need to be selected)
It is unknown whether the schemata associated with the two ERCs represent good
or poor instrument setups. As in O'Hagan et al. (2005, 2007) we assume that the fitness landscape to be optimized is subject to epistasis. Please refer to O'Hagan
et al. (2005, 2007), Allmendinger and Knowles (2011), Allmendinger (2012) for a
detailed description of the closed-loop problem and the ERCs.
We use NK landscapes (Kauffman 1989) to investigate the impact of the two ERCs
as a function of different levels of epistasis. Prior to applying RL-EA online we train
the RL agent offline on 5,000 different NK landscapes with N = 30 and K = 2,
which represent problems with low epistasis. Figure 4.15 shows the greedy actions
(optimal static strategies) a learnt by the agent for each state s during the training
phase. Clear patterns can be observed from the plot: the agent learned to use mainly
waiting at the beginning of the optimization process (to avoid introducing a search
bias early on), penalizing in the middle part of the optimization, and, depending on
the population average fitness, either forcing, waiting, or the subpopulation strategy,
in the final part of the optimization. Other policies, such as using only a repairing
strategy at the beginning of the optimization, were not learnt by the agent as they are
associated with the risk of converging to a homogeneous population state from which it is difficult to escape if needed (e.g., if schemata represent poor genetic material).
Figure 4.16 compares how the policy learned by the RL agent fares against the
online-learning approach, D-MAB, and the static strategies themselves for NK landscapes with N = 30 and K = {3, 4}; using different problems for training and testing
allows us to assess the robustness of the policy learned. We can see from the plots
that although RL-EA performs poorly at the beginning of the search, at time step
t ≈ 800 the performance kicks up due to a change in the static strategy employed,
allowing RL-EA to be the best performing strategy at the end of the search. D-MAB
is not able to perform as well as RL-EA because it selects the currently most useful
static strategy (which is typically a repairing strategy) without accounting for future
consequences this might have. On the other hand, RL-EA is tuned here to optimize
Fig. 4.16 Plots showing the population average fitness (we do not show the standard error as it was negligible) obtained by the different constraint-handling strategies on NK landscapes with N = 30 and K = 3 (left) and K = 4 (right) as a function of the time counter t; results are averaged over 100 independent runs using a different randomly generated NK problem instance for each run. All instances were subject to the commitment relaxation ERCs commRelaxERC(0, 2000, 20, H = (10101***...)) and commRelaxERC(0, 2000, 20, H = (*...**101)). The results of Unconstrained EA were obtained by running the EA on the same problem instances but without the ERCs. According to the Kruskal-Wallis test (significance level of 5 %), the final population average fitness obtained by RL-EA is significantly better than the one obtained with the second best strategy, waiting, for both problems
the final performance only, allowing it to adjust to the problem at hand. For instance, if one were to shorten the optimization time T, then the RL agent would learn a different policy, while D-MAB would behave the same.
Overall, the strong performance of the RL-EA is encouraging, but we want to
mention that in order to achieve that performance, some tuning of the agent may be
required. For a more in-depth discussion on this topic and an experimental analysis
of alternative agent settings please refer to Allmendinger and Knowles (2011).
The instance considered is a uniform random 3-SAT problem and can be downloaded online at
http://people.cs.ubc.ca/~hoos/SATLIB/benchm.html; the name of the instance is uf50-218/uf50-01.cnf. The instance consists of 218 clauses and is satisfiable. We treat this 3-SAT instance as a
MAX-SAT optimization problem, with fitness calculated as the proportion of satisfied clauses.
Fig. 4.17 Plots showing the probability of SW (left) and JIT (right) of achieving the population average fitness of our base algorithm obtained in an ERC-free environment given a budget and time limit of C = T = 1,500. For SW this probability is shown as a function of #SC and RN for the ERC commCompERC(o(H#) = 30, #SC, TL = 10, RN, SL = RN), and for JIT it is shown as a function of TL and RN for the ERC commCompERC(o(H#) = 10, #SC = 10, TL, RN, SL = RN); cost parameters were set to corder = 0, ctime_step = 1, and C = 1,500
SW, more storage cells means that the probability of having a required composite
available increases, which in turn reduces the number of repairs. On the other hand, a
smaller reuse number (or shorter shelf life SL) shortens the time gap between asking
for a composite, i.e., adding it to the sliding window, and having it available in a
storage cell.
The performance of a just-in-time strategy, such as JIT and JITR, depends largely
on the time it takes for a resource to arrive once ordered. Consequently, we observe
from the right plot of Fig. 4.17 that the performance of JIT (also for JITR) improves
with shorter time lags TL. An increase in the reuse number RN (or shelf life SL)
yields a slight performance improvement too. The reason for this is that composites
can be kept for longer in the storage cells and thus allow for a more efficient usage of
old composites. A similar effect can be achieved by increasing the number of storage
cells SC (results not shown here).
While JIT and JITR perform similarly for large budgets, there are differences
for scenarios where budget is a limiting factor as can be seen from the right plot of
Fig. 4.18. For small budgets, in the range 0 < c ≤ 600, 0 ≤ ctime_step ≤ 0.5, JITR is
able to outperform JIT as repairing allows the evaluation of more solutions while JIT
would have to wait for suitable composites to arrive. The weak performance of JIT
for small budgets is also apparent when comparing it to SW (left plot of Fig. 4.18).
For large budgets c > 1,200, JIT is able to match and sometimes even outperform
JITR and SW as it does not introduce any search bias coming from repairing.
In the previous experiment, the number of storage cells was relatively low, which
is beneficial for SW. An increase in #SC means that more composites are regularly
ordered to fill all the storage cells. This approach is expensive and dampens the
performance of SW when compared to JIT (and JITR) as can be observed from
Fig. 4.19.
Fig. 4.18 Plots showing the ratio P(f(x) > fJIT)/P(f(x) > fSW) (left) and P(f(x) > fJITR)/P(f(x) > fJIT) (right) as a function of c and ctime_step for the ERC commCompERC(o(H#) = 10, #SC = 5, TL = 5, RN = 30, SL = 30) and corder = 1. Here, x is a random variable that represents solutions drawn uniformly at random from the search space and fπ the population average fitness obtained with policy π. If P(f(x) > fπ)/P(f(x) > fπ′) > 1, then strategy π′ is able to achieve a higher average best solution fitness than strategy π and a greater advantage of π′ is indicated by a darker shading in the heat maps; similarly, if P(f(x) > fπ)/P(f(x) > fπ′) < 1, then π is better than π′ and a lighter shading indicates a greater advantage of π
4.8 Conclusion
In this chapter we have considered a new type of (dynamic or temporary) constraint
that differs in several aspects from the traditional hard and soft constraints. Hard
constraints define the feasible region in the search space, and soft constraints express
objectives or preferences on solutions, while the constraints we discussed here specify
the set of solutions in the search space that can be evaluated at any moment in time.
That is, a solution that violates one of these constraints cannot be evaluated at the
moment although it may be a feasible solution to the problem. This constraint type
is called ephemeral resource constraint (or ERC) and is commonly encountered
in closed-loop optimization problems, where it models limitations on the resources
needed to construct and/or evaluate solutions.
We pursued three goals in this chapter. First, we have summarized the framework
and terminology for describing ERC problems, and defined three ERC types that arise
commonly in practical applications including (i) absence of resources at regular time
intervals (periodic ERCs), (ii) temporary commitment to a certain resource triggered
on using that resource (commitment relaxation ERCs), and (iii) an ERC where costly
resources need to be purchased in advance, kept in capacity-limited storage, and
used up within a certain number of experiments or a fixed time frame (commitment
composite ERCs).
Secondly, we have extended our previous work with a theoretical study focused on
understanding the fundamental effects of ERCs on simple evolutionary algorithms
(EAs). Using the concept of Markov chains, the study concluded that (i) an order
relation-based selection operator, such as tournament selection, is more robust to
simple ERCs than a fitness proportionate-based selection operator, and (ii) while an
EA with a non-elitist generational reproduction scheme converges more quickly to
some optimal population state than with a non-elitist steady state scheme when the
ERC is active, the opposite is the case when the ERC is inactive. This result implies
that ERCs should be accounted for when tuning EAs for ERCOPs.
Third, we have summarized and evaluated empirically several of the constraint-handling methods we have proposed for handling ERCs including static and learning-based strategies (Sects. 4.5 and 4.6), as well as resource-purchasing strategies for
dealing with commitment composite ERCs (Sect. 4.7). Generally, the empirical study
revealed that ERCs affect the performance of an optimizer and that different strategies should be favored as a function of the ERC and its parameters. Moreover, we
have demonstrated here and in more detail in our previous work (Knowles 2009;
Allmendinger and Knowles 2010, 2011, 2013) that the effect of a particular ERC
is similar across different problem types, meaning that knowing about the ERC is
sufficient to select a constraint-handling strategy. Overall, we can therefore say that
if the ERCs are known in advance, then a promising strategy is one that learns offline
how to deal best with the ERCs during the optimization. As an example, in this
chapter we have seen that good results can be achieved with a reinforcement learning approach that learns offline when to switch between different static strategies
during the optimization.
(featuring also real or mixed integer variables) than the ones we considered so far. Of
course, it would be ideal to validate the search strategies on real-world closed-loop
problems featuring real resource constraints. However, this approach is generally not
realistic due to time and/or budgetary requirements. The next best thing we can do is
to simulate a fitness landscape based on data obtained from real-world experiments.
This is the approach we have taken in Allmendinger and Knowles (2011), and more
studies of this kind are needed.
Further theoretical analysis of resourcing issues. In Sect. 4.4 we have used Markov
chains to analyze theoretically the effect of a particular ERC type on simple EAs.
Although our analysis used a simplified optimization environment (two solution types
only), valuable observations were made with respect to the applicability of different
selection and reproduction schemes. We also gained some understanding about the
impact of ERCs on evolutionary search, which ultimately, may help us in the design
of effective and efficient search strategies for closed-loop optimization. However,
our theoretical results were limited in the sense that we did not derive mathematical equations relating, for instance, ERC configurations to optimal EA parameter
settings. It remains to be seen whether it is possible to derive such expressions, and
how applicable they would be in practice. A number of recent advances in EA theory
might present the possibility of understanding ERCs more deeply, including drift
analysis (Auger and Doerr 2011) and the fitness level method (Chen et al. 2009;
Lehre 2011).
Understanding the effects of non-homogeneous experimental costs in closed-loop optimization. So far, we have made the assumption that all solution evaluations
take equal time or resources. This need not be the case. For instance, when dealing
with commitment composite ERCs, it is a very realistic scenario that the composites
to be ordered vary in their prices and delivery periods. Under a limited budget, this
scenario might cause an optimizer not only to follow fitness gradients but also to
account for variable experimental costs. Hence, further work should investigate how
to trade-off these two aspects effectively. For inspiration, we may look at strategies
employed in the Robot Scientist study (King et al. 2004), where this scenario has
been encountered within an inference problem rather than an optimization problem.
Broadening the application of machine learning and surrogate modeling techniques in closed-loop optimization. We have shown (in Sect. 4.6) that evolutionary
search augmented with machine learning techniques, such as reinforcement learning
(RL), can be a powerful optimization tool to cope with ERCs. To increase the applicability of learning-based optimizers to different types of optimization problems, one
could also try combining offline learning with online learning. For instance, RL
can be used to learn offline a policy until some distant point in time, and this policy can then be refined or slightly modified online using the anticipation approach
of (Bosman 2005). Another avenue worth pursuing is to extend an optimizer with
surrogate modeling techniques (Jin 2011) in order to help cope with ERCs. In the
simplest case, surrogate modeling would be used to approximate the objective values
of solution that cannot be evaluated due to a lack of resources. More sophisticated
approaches might use surrogate modeling to scan the search space for promising
regions from which solutions are then created. If the active ERCs are known, or can
be well predicted, then scanning can be used to avoid the non-evaluable parts of the
search space, while still concentrating the search on the most promising areas in
terms of fitness.
References
Allmendinger R (2012) Tuning evolutionary search for closed-loop optimization. PhD thesis,
Department of Computer Science, University of Manchester, UK
Allmendinger R, Knowles J (2010) On-line purchasing strategies for an evolutionary algorithm
performing resource-constrained optimization. In: Proceedings of parallel problem solving from
nature, pp 161–170
Allmendinger R, Knowles J (2011) Policy learning in resource-constrained optimization. In: Proceedings of the genetic and evolutionary computation conference, pp 1971–1978
Allmendinger R, Knowles J (2013) On handling ephemeral resource constraints in evolutionary
search. Evol Comput 21(3):497–531
Auger A, Doerr B (2011) Theory of randomized search heuristics. World Scientific, Singapore
Bäck T, Knowles J, Shir OM (2010) Experimental optimization by evolutionary algorithms.
In: Proceedings of the genetic and evolutionary computation conference (companion),
pp 2897–2916
Bedau MA (2010) Coping with complexity: machine learning optimization of highly synergistic
biological and biochemical systems. In: Keynote talk at the international conference on genetic
and evolutionary computation
Borodin A, El-Yaniv R (1998) Online computation and competitive analysis. Cambridge University
Press, Cambridge
Bosman PAN (2005) Learning, anticipation and time-deception in evolutionary online dynamic
optimization. In: Proceedings of genetic and evolutionary computation conference, pp 39–47
Bosman PAN, Poutré HL (2007) Learning and anticipation in online dynamic optimization with
evolutionary algorithms: the stochastic case. In: Proceedings of genetic and evolutionary computation conference, pp 1165–1172
Branke J (2001) Evolutionary optimization in dynamic environments. Kluwer Academic Publishers,
Dordrecht
Caschera F, Gazzola G, Bedau MA, Moreno CB, Buchanan A, Cawse J, Packard N, Hanczyc MM
(2010) Automated discovery of novel drug formulations using predictive iterated high throughput
experimentation. PLoS ONE 5(1):e8546
Chen T, He J, Sun G, Chen G, Yao X (2009) A new approach for analyzing average time complexity
of population-based evolutionary algorithms on unimodal problems. IEEE Trans Syst Man Cybern
B 39(5):1092–1106
Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng
191(11–12):1245–1287
Costa LD, Fialho A, Schoenauer M, Sebag M (2008) Adaptive operator selection with dynamic
multi-armed bandits. In: Proceedings of genetic and evolutionary computation conference,
pp 913–920
Davis TE, Principe JC (1993) A Markov chain framework for the simple genetic algorithm. Evol
Comput 1(3):269–288
Doob JL (1953) Stochastic processes. Wiley, New York
Finkel DE, Kelley CT (2009) Convergence analysis of sampling methods for perturbed Lipschitz
functions. Pac J Optim 5:339–350
Goldberg DE, Segrest P (1987) Finite Markov chain analysis of genetic algorithms. In: Proceedings
of the international conference on genetic algorithms, pp 1–8
Hartland C, Gelly S, Baskiotis N, Teytaud O, Sebag M (2006) Multi-armed bandits, dynamic
environments and meta-bandits. In: NIPS workshop online trading of exploration and exploitation
Hartland C, Baskiotis N, Gelly S, Sebag M, Teytaud O (2007) Change point detection and meta-bandits for online learning in dynamic environments. In: CAp, pp 237–250
He J, Yao X (2002) From an individual to a population: an analysis of the first hitting time of
population-based evolutionary algorithms. IEEE Trans Evol Comput 6(5):495–511
Herdy M (1997) Evolutionary optimization based on subjective selection—evolving blends of coffee.
In: European congress on intelligent techniques and soft computing, pp 640–644
Holland JH (1975) Adaptation in natural and artificial systems. MIT Press, Boston
Horn J (1993) Finite Markov chain analysis of genetic algorithms with niching. In: Proceedings of
the international conference on genetic algorithms, pp 110–117
Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges.
Swarm Evol Comput 1(2):61–70
Judson RS, Rabitz H (1992) Teaching lasers to control molecules. Phys Rev Lett 68(10):1500–1503
Kauffman S (1989) Adaptation on rugged fitness landscapes. In: Lecture notes in the sciences of
complexity, pp 527–618
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley,
New York
King RD, Whelan KE, Jones FM, Reiser PGK, Bryant CH, Muggleton SH, Kell DB, Oliver SG
(2004) Functional genomic hypothesis generation and experimentation by a robot scientist. Nature
427:247–252
Klockgether J, Schwefel H-P (1970) Two-phase nozzle and hollow core jet experiments. In: Engineering aspects of magnetohydrodynamics, pp 141–148
Knowles J (2009) Closed-loop evolutionary multiobjective optimization. IEEE Comput Intell Mag
4(3):77–91
Lehre PK (2011) Fitness-levels for non-elitist populations. In: Proceedings of the conference on
genetic and evolutionary computation, pp 2075–2082
Liepins GE, Potter WD (1991) A genetic algorithm approach to multiple-fault diagnosis. In: Handbook of genetic algorithms, pp 237–250
Mahfoud SW (1991) Finite Markov chain models of an alternative selection strategy for the genetic
algorithm. Complex Syst 7:155–170
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32
Nakama T (2008) Theoretical analysis of genetic algorithms in noisy environments based on
a Markov model. In: Proceedings of the genetic and evolutionary computation conference,
pp 1001–1008
Nguyen TT (2010) Continuous dynamic optimisation using evolutionary algorithms. PhD thesis,
University of Birmingham
Nix A, Vose MD (1992) Modeling genetic algorithms with Markov chains. Ann Math Artif Intell
5:79–88
Nocedal J, Wright SJ (1999) Numerical optimization. Springer, New York
Norris JR (1998) Markov chains (Cambridge Series in Statistical and Probabilistic Mathematics).
Cambridge University Press, Cambridge
O'Hagan S, Dunn WB, Brown M, Knowles J, Kell DB (2005) Closed-loop, multiobjective optimization of analytical instrumentation: gas chromatography/time-of-flight mass spectrometry of
the metabolomes of human serum and of yeast fermentations. Anal Chem 77(1):290–303
O'Hagan S, Dunn WB, Knowles J, Broadhurst D, Williams R, Ashworth JJ, Cameron M, Kell DB
(2007) Closed-loop, multiobjective optimization of two-dimensional gas chromatography/mass
spectrometry for serum metabolomics. Anal Chem 79(2):464–476
Pettinger JE, Everson RM (2003) Controlling genetic algorithms with reinforcement learning. Technical report, The University of Exeter
Rechenberg I (2000) Case studies in evolutionary experimentation and computation. Comput Methods Appl Mech Eng 24(186):125–140
Reeves CR, Rowe JE (2003) Genetic algorithms—principles and perspectives: a guide to GA theory.
Kluwer Academic Publishers, Boston
Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Technical report
CUED/F-INFENG/TR 166, Cambridge University Engineering Department
Schwefel H-P (1968) Experimentelle Optimierung einer Zweiphasendüse, Teil 1. AEG Research
Institute Project MHD-Staustrahlrohr 11.034/68, Technical report 35, Berlin
Schwefel H-P (1975) Evolutionsstrategie und numerische Optimierung. PhD thesis, Technical University of Berlin
Shir O, Bäck T (2009) Experimental optimization by evolutionary algorithms. In: Tutorial at the
international conference on genetic and evolutionary computation
Shir OM (2008) Niching in derandomized evolution strategies and its applications in quantum
control: a journey from organic diversity to conceptual quantum designs. PhD thesis, University
of Leiden
Small BG, McColl BW, Allmendinger R, Pahle J, López-Castejón G, Rothwell NJ, Knowles J,
Mendes P, Brough D, Kell DB (2011) Efficient discovery of anti-inflammatory small molecule
combinations using evolutionary computing. Nat Chem Biol (to appear)
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Syswerda G (1989) Uniform crossover in genetic algorithms. In: Proceedings of the international
conference on genetic algorithms, pp 2–9
Syswerda G (1991) A study of reproduction in generational and steady state genetic algorithms.
In: Foundations of genetic algorithms, pp 94–101
Thompson A (1996) Hardware evolution: automatic design of electronic circuits in reconfigurable
hardware by artificial evolution. PhD thesis, University of Sussex
Vaidyanathan S, Broadhurst DI, Kell DB, Goodacre R (2003) Explanatory optimization of protein
mass spectrometry via genetic search. Anal Chem 75(23):6679–6686
Vose MD, Liepins GE (1991) Punctuated equilibria in genetic search. Complex Syst 5:31–44
Zhang W (2001) Phase transitions and backbones of 3-SAT and maximum 3-SAT. In: Proceedings
of the international conference on principles and practice of constraint programming, pp 153–167
Chapter 5
S. Oh (B)
School of Information and Communications,
Gwangju Institute of Science and Technology, Gwangju 500-712, Korea
e-mail: oosshoun@gist.ac.kr
Y. Jin
Department of Computing, University of Surrey,
Guildford, Surrey GU2 7XH, UK
e-mail: yaochu.jin@surrey.ac.uk
5.1 Introduction
Evolutionary algorithms (EAs) have been widely employed to solve constrained optimization problems (COPs), which
are commonly seen in real-world optimization problems (Jin et al. 2010; Oh
et al. 2011). Without loss of generality, a COP can be formulated as a minimization
problem subject to one or more (in)equality constraints as follows:
minimize f(x),  x = (x_1, ..., x_n) ∈ R^n      (5.1)
subject to h_i(x) = 0,  i = {1, 2, ..., r}      (5.2)
           g_j(x) ≤ 0,  j = {r + 1, ..., m},      (5.3)

Ψ(x) = f(x) + Σ_{j=1}^{r} λ_j H_j + Σ_{j=r+1}^{m} λ_j G_j      (5.4)

     = f(x) + Σ_{j=1}^{m} λ_j G̃_j,      (5.5)

where H_j = |h_j(x)|^α and G_j = max{0, g_j(x)}^β are functions of the constraints
h_j and g_j, and α and β are constants which are usually set to 1 or 2, respectively. By
virtue of introducing a small tolerance value ε, equality constraints can be
converted into inequality constraints, i.e., |H_j| − ε ≤ 0 (Coello 2002). Thus,
given that α = β = 1, the original formula (5.4) can be reformulated as (5.5),
where G̃_j indicates the inequality constraints G̃_j ∈ {|H_j| − ε, G_j}. The penalty
function-based approaches may work well for some COPs; however, it is not
straightforward to determine an optimal value for the penalty factor. In particular,
a too small value of λ may mislead the EA because of an insufficient penalty. By
contrast, a too large penalty factor may prevent the EA from finding the optimal
solution. To handle the penalty factor, four types of penalty methods,
namely death penalties, static penalties, dynamic penalties, and adaptive penalties,
have been proposed (Coello 2002); a short sketch of such a penalty evaluation is
given after this list.
2. Another constraint-handling approach is the separate consideration of the
objective and the constraints during optimization. It is typically categorized into
three major techniques. The first approach was the stochastic ranking evolution strategy (SRES) proposed by Runarsson
and Yao (Runarsson and Yao 2000). The aim of SRES was to balance the influence of the objective function and the constraints in selection by using the dominance comparison between the fitness and constraint violations through the user-defined
parameter P_f. Coello and Montes suggested a method (Coello and Montes
2002) inspired by a well-known constraint technique in the niched-Pareto genetic
algorithm. It designed a new dominance-based selection scheme to integrate constraints into the fitness function used for global optimization. Montes and Coello
introduced another method based on a simple diversity mechanism (Montes and
Coello 2005).
3. A few ad hoc constraint-handling techniques, viz., special representations and
operators, have also been suggested (Coello 2002). The fundamental idea is to
simplify the shape of the feasible search space and to preserve feasible solutions
found during the evolutionary process. Several examples are Davis's work (Davis
and Mitchell 1991), random keys (Bean 1994), GENOCOP (Michalewicz 1996),
constraint consistent GAs (Kowalczyk 1997), locating the boundary of the feasible
region (Glover and Kochenberger 1996), and a homomorphous mapping (HM)
to transform a COP into an unconstrained one using a high-dimensional cube and
a feasible search space (Koziel and Michalewicz 1999).
4. Finally, hybrid techniques have also been proposed. They combine evolutionary search with a mathematical or heuristic approach such as Lagrangian multipliers (Adeli and Cheng
1994), fuzzy logic (Le 1995), immune systems (Smith et al. 1993), cultural algorithms (Reynolds 1994), differential evolution (Das and Suganthan 2011), and
ant colony optimization (Dorigo and Gambardella 1997).
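As a rough illustration of the penalty formulation in (5.4)–(5.5), the following sketch in Python (not code from this chapter; the toy constraint and helper name are hypothetical) evaluates a penalized fitness with a single fixed penalty factor:

import numpy as np

def penalized_fitness(f, ineq, eq, x, lam=1.0, eps=1e-4):
    # Static-penalty evaluation in the spirit of Eqs. (5.4)-(5.5):
    # equality constraints are relaxed to |h_j(x)| - eps <= 0 and all
    # violations are added to f(x) with a fixed penalty factor lam.
    g_viol = [max(0.0, g(x)) for g in ineq]           # G_j = max{0, g_j(x)}
    h_viol = [max(0.0, abs(h(x)) - eps) for h in eq]  # |H_j| - eps <= 0
    return f(x) + lam * (sum(g_viol) + sum(h_viol))

# toy usage: minimize x0^2 + x1^2 subject to 1 - x0 - x1 <= 0
f = lambda x: x[0] ** 2 + x[1] ** 2
g1 = lambda x: 1.0 - x[0] - x[1]
print(penalized_fitness(f, [g1], [], np.array([0.2, 0.3]), lam=10.0))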
This chapter is concerned with constrained optimization problems that have
highly constrained feasible regions, i.e., small and separated feasible regions.
To systematically alleviate the low degree of feasibility, we propose an incremental
approximation model-assisted constraint-handling approach. The model starts with
a rough approximation of the constraints using a linear model. As the evolution
proceeds, the accuracy of the approximate constraint functions increases
gradually, so that at the end of the search process the approximate constraint
functions closely match the original ones. In this approach, an originally stationary optimization problem is converted into a dynamic optimization problem (Paenke et al. 2006;
Nguyen et al. 2012; Jin et al. 2013) to make the problem easier to solve. Here, the
approximate model, also known as a surrogate (Jin 2011), plays a key role.
In this study, we adopt two representative methods, i.e., neural networks and
genetic programming (GP), for constructing the approximate models. The proposed algorithms have been
compared with a few state-of-the-art algorithms on 13 benchmark problems and a
tension/compression spring design optimization problem.
Fig. 5.1 Illustrations of feasible regions and the feasibility proportion ρ in two benchmark problems.
a Benchmark problem g06 (ρ = 0.0066 %). b Benchmark problem g08 (ρ = 0.8560 %), with constraint g1(x) = x1² − x2 + 1 ≤ 0
Fig. 5.2 Synthetic change of the feasible regions by incremental approximation models of two
constrained functions g1(x) and g2(x). a The design space has small feasible regions with two nonlinear constraint
functions. b With a linear approximation of both constraints, the approximated feasible regions
become larger. c The approximate nonlinear constraint functions become more accurate to the original
constraints
of the nonlinear constraints becomes more accurate, as described in Fig. 5.2c. Note
however that the system should switch back to the original constraints at the end
of the evolutionary optimization so that the obtained optimal solutions are always
feasible.
is the dimension of the given problem. In the initialization, both vectors are generated
by a uniform distribution within the lower and upper bounds of each variable x_j, and

σ_{h,j} = (σ_{i,j} + σ_{k,j}) / 2,      (5.6)

x_{h,j}^{(g+1)} = x_{h,j}^{(g)} + σ_{h,j}^{(g+1)} N_j(0, 1).      (5.8)
where G(x) denotes the sum of all constraint violations and the constant λ is set
to 1. Our defined constraints are called the synthesized constraints² with g̃_j(x) ∈
{g_j(x), ĝ_j(x)}.
Given the pair of objective value and constraint violation (f(x_j), G(x_j)), where x_j
denotes the solution of the jth offspring individual, j = {1, ..., λ}, they will be

² They are assembled by comparing the degree of feasibility between the original constraint g_j(x)
and the incrementally approximated constraint ĝ_j(x).
ranked according to the stochastic ranking algorithm. The details of the stochastic
ranking algorithm can be found in (Runarsson and Yao 2000).
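The following is a minimal sketch of that stochastic ranking procedure, assuming the usual setting in which P_f is the probability of comparing by objective value; it is an illustration, not the authors' implementation:

import random

def stochastic_ranking(f_vals, phi_vals, p_f=0.45):
    # Stochastic ranking of Runarsson and Yao (2000): adjacent individuals
    # are ordered by objective value with probability p_f (or whenever both
    # are feasible), and by constraint violation phi otherwise.
    n = len(f_vals)
    idx = list(range(n))
    for _ in range(n):
        swapped = False
        for j in range(n - 1):
            a, b = idx[j], idx[j + 1]
            if (phi_vals[a] == 0 and phi_vals[b] == 0) or random.random() < p_f:
                better_first = f_vals[a] <= f_vals[b]
            else:
                better_first = phi_vals[a] <= phi_vals[b]
            if not better_first:
                idx[j], idx[j + 1] = b, a
                swapped = True
        if not swapped:
            break
    return idx  # indices sorted from best to worst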
In our algorithm, all equality constraints are converted into inequalities by introducing a tolerance ε, i.e., |h_j(x)| − ε ≤ 0, where the constant λ is set to 1.
The parameter ε is updated according to the generation number, as formulated below
(Hamida and Schoenauer 2002):

ε(t + 1) = ε(t) / β_ε.      (5.10)

Here, the initial value of the tolerance ε₀ and the reduction factor β_ε are
set to 3 and 1.0168, respectively, as recommended in (Hamida and Schoenauer
2002). This approach is analogous to our proposed approximation of constraints due
to the concept of the dynamic setting of the tolerance. In other words, the accuracy
of the altered constraints increases gradually over the generations. Thanks to this
property, we need not apply our approximation mechanism to equality constraints.
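A tiny sketch of this tolerance schedule (a hypothetical helper, not code from the chapter) is:

def shrink_tolerance(eps, factor=1.0168):
    # One step of the dynamic tolerance update of Eq. (5.10):
    # eps(t+1) = eps(t) / factor, so the relaxed equality constraints
    # |h_j(x)| - eps <= 0 are tightened as generations proceed.
    return eps / factor

eps = 3.0  # initial tolerance, as used in the chapter
for t in range(5):
    eps = shrink_tolerance(eps)
print(round(eps, 4))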
Fig. 5.4 Synthesized constraints via a competition between original and approximate constraints,
where N_F is the number of feasible solutions and N_oc is the number of original constraints
However, the condition t_{k_max} ≤ t_max should be satisfied, where t_max is the allowed
maximum number of generations. During the remaining t_max − t_{k_max} generations,
only the original constraint functions are considered, to guarantee that the obtained
optimal solution is feasible and to avoid the under-fitting problem. Also, we should specify
how many samples are used for training our approximation model of the
constraint functions. In this work, we heuristically designate the number of
samples as N_k = n_j · k², where n_j is the number of design variables involved in the
jth constraint function and k is the sampling index, k = {1, 2, ..., k_max}.
For instance, in the initial generation (k = 1) of the approximate constraint functions
of g08, each pair of training data (2 · 1² = 2 samples) is drawn individually, because both
constraints, g_1(x) = x_1² − x_2 + 1 ≤ 0 and g_2(x) = 1 − x_1 + (x_2 − 4)² ≤ 0,
involve only the two variables x_1 and x_2. Based on the two sampled data points, we obtain
two approximate models derived by GP, a representative symbolic regression
method, for the two constraints of g08, i.e., ĝ_1(x) = 3x_1 − x_2 + 1 ≤ 0 and
ĝ_2(x) = x_1 − x_2 + 11 ≤ 0, as shown in Fig. 5.2b. Later, we compare the number of
feasible solutions with regard to each approximate constraint ĝ_j and the original
constraint g_j, j = {1, 2}. Based on the comparisons, we create a set of synthesized
constraints, i.e., g̃_j(x) = {ĝ_1(x), ĝ_2(x)}, since all approximate constraints result in
more feasible solutions than the original ones.
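A possible sketch of this competition, assuming each constraint is given as a callable returning its left-hand side (so g(x) ≤ 0 means the constraint is satisfied), is:

def synthesize_constraints(population, originals, approximations):
    # For each constraint, keep the version (original g_j or approximate
    # g_hat_j) that is satisfied by more individuals of the current
    # population, mirroring the competition of Fig. 5.4.
    synthesized = []
    for g, g_hat in zip(originals, approximations):
        n_feas_orig = sum(1 for x in population if g(x) <= 0)
        n_feas_appr = sum(1 for x in population if g_hat(x) <= 0)
        synthesized.append(g_hat if n_feas_appr >= n_feas_orig else g)
    return synthesized

In the chapter this competition is applied only to inequality constraints; equality constraints are handled by the dynamic tolerance of (5.10).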
Our assumption is that the initial approximate models start from a simple model
such as a linear approximation of the nonlinear constraints. Then we increase the
number of samples as evolution proceeds, so that the approximate models become more and more
accurate. In particular, at the sixth sampling time (k = 6) of g08, the
approximate models are updated at generation 550, t_6 = t_0 + 10 Σ_{i=1}^{6} (i − 1)², and
72 samples are generated following the defined rule N_6 = 2 · 6². Based on the sampled
data, we approximate both constraints as ĝ_1(x) = x_1² − x_2 + cos(sin(x_2)) ≤ 0 and
ĝ_2(x) = 1 − x_1 + (x_2 − 4)² ≤ 0 by GP (see Fig. 5.2c). At this time, we compose the synthesized
constraints g̃_j(x) = {ĝ_1(x), g_2(x)} by comparing the approximate constraints with the
original ones according to their feasibility degrees.
The location of the samples is determined by Latin hypercube sampling (LHS), which generates samples in an arbitrary
number of dimensions, whereby each sample is the only one in each axis-aligned
hyperplane containing it (Jin and Branke 2005).
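The sampling schedule and the LHS step described above can be sketched as follows; scipy's qmc module is assumed here purely for illustration (the chapter does not prescribe a library), and the clipping of N_k to [2, 200] follows the rule used in the experiments:

from scipy.stats import qmc  # assumed available; any LHS routine would do

def update_generation(k, t0=0):
    # Generation at which the k-th model update happens:
    # t_k = t0 + 10 * sum_{i=1}^{k} (i-1)^2 (e.g. t_6 = 550).
    return t0 + 10 * sum((i - 1) ** 2 for i in range(1, k + 1))

def sample_size(k, n_j):
    # Number of training samples N_k = n_j * k^2, clipped to [2, 200].
    return min(max(n_j * k ** 2, 2), 200)

def draw_samples(k, lower, upper, seed=None):
    # Latin hypercube sample of size N_k inside the variable bounds.
    n_j = len(lower)
    sampler = qmc.LatinHypercube(d=n_j, seed=seed)
    unit = sampler.random(n=sample_size(k, n_j))
    return qmc.scale(unit, lower, upper)

print(update_generation(6), sample_size(6, 2))  # -> 550 72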
Two incremental approximation models are adopted in this
study: a neural network-assisted approximation model and a GP-assisted approximation model.
Neural network-assisted approximation model for ES: NNA-ES
In this work, we adopt a multilayer perceptron (MLP) network with one hidden layer (Reed and Marks
1998) (refer to Fig. 5.5) for approximating the nonlinear constraints. Both the hidden
neurons and the output neuron use a tan-sigmoid transfer function. The number of
input nodes equals the number of parameters in the constraint function plus one
(a constant input as bias), the number of hidden nodes is set to three times that of the
input nodes, and the number of output nodes is one.
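As an illustration of the described network, the following sketch uses scikit-learn's MLPRegressor as a stand-in (an assumption; the chapter does not name a library, and MLPRegressor uses a linear output unit rather than the tan-sigmoid output described above):

import numpy as np
from sklearn.neural_network import MLPRegressor  # assumed stand-in for the chapter's MLP

def fit_constraint_surrogate(X, g_values):
    # Rough sketch of the NNA-ES surrogate: one hidden layer with three
    # times as many tanh units as inputs, trained for 150 iterations with
    # a learning rate of 0.1.
    n_inputs = X.shape[1]
    model = MLPRegressor(hidden_layer_sizes=(3 * n_inputs,),
                         activation="tanh", solver="sgd",
                         learning_rate_init=0.1, max_iter=150)
    model.fit(X, g_values)
    return model

# toy usage on g08's first constraint g1(x) = x1^2 - x2 + 1
X = np.random.uniform(low=[0, 0], high=[10, 10], size=(72, 2))
g1 = X[:, 0] ** 2 - X[:, 1] + 1
surrogate = fit_constraint_surrogate(X, g1)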
[Fig. 5.5: structure of the MLP network with inputs X_1, ..., X_n and connection weights w_{i,j}]
[Figs. 5.7 and 5.8: examples of GP crossover and mutation applied to expression-tree chromosomes (parent and offspring trees)]
chromosome and the output of a constraint function in accordance with the given
inputs. On the basis of the fitness value of each individual, our GP applies pairwise tournament selection without replacement to improve the average quality of the
population by passing high-quality chromosomes to the next generation. To explore the search
space, the variation operators (i.e., crossover and mutation), which are illustrated in
Figs. 5.7 and 5.8, respectively, are applied to the selected chromosome(s). The GP
iterates the two procedures of evaluation and genetic variation until a stopping
criterion is satisfied. At the end, the GP is able to obtain a robust approximation
of the original nonlinear constraint function. Based on the discovered approximate
constraints, we assemble the synthesized constraints, which are created and used in the
SR selection.
t_k = t_0 + 10 Σ_{i=1}^{k} (i − 1)² = {0, 10, 50, 140, 300, 550, 910}, where k is the update time
index, k = {1, 2, 3, 4, 5, 6, 7}, and t_0 is the initial generation, which is set to 0. During the remaining generations, we only use the original constraints to guarantee that
the obtained solutions are feasible. At that time, we require the sampled training
Table 5.1 Summary of 13 benchmark functions
fcn | n  | Type of f  | |F|/|S| (%) | LI | NI | LE | NE | a
g01 | 13 | Quadratic  | 0.0111  | 9 | 0 | 0 | 0 | 6
g02 | 20 | Nonlinear  | 99.9971 | 0 | 2 | 0 | 0 | 1
g03 | 10 | Polynomial | 0.0000  | 0 | 0 | 0 | 1 | 1
g04 | 5  | Quadratic  | 52.1230 | 0 | 6 | 0 | 0 | 2
g05 | 4  | Cubic      | 0.0000  | 2 | 0 | 0 | 3 | 3
g06 | 2  | Cubic      | 0.0066  | 0 | 2 | 0 | 0 | 2
g07 | 10 | Quadratic  | 0.0003  | 3 | 5 | 0 | 0 | 6
g08 | 2  | Nonlinear  | 0.8560  | 0 | 2 | 0 | 0 | 0
g09 | 7  | Polynomial | 0.5121  | 0 | 4 | 0 | 0 | 2
g10 | 8  | Linear     | 0.0010  | 3 | 3 | 0 | 0 | 6
g11 | 2  | Quadratic  | 0.0000  | 0 | 0 | 0 | 1 | 1
g12 | 3  | Quadratic  | 4.7713  | 0 | 1 | 0 | 0 | 0
g13 | 5  | Nonlinear  | 0.0000  | 0 | 0 | 0 | 3 | 3
data for updating our approximations, which are drawn according to the predefined rule
N_k = n_j · k² = {n_j, 4n_j, 9n_j, 16n_j, 25n_j, 36n_j, 49n_j}, where n_j is the number of design variables involved in the jth constraint function. Note that, if N_k = 1,
the minimum number of samples is set to 2, and if N_k ≥ 200, the maximum number of
samples is set to 200.
In our NNA-ES, the MLP is trained for 150 iterations every time the
MLP network models need to be updated, where the learning rate is set to 0.1. Also,
the system parameters of GPA-ES are set as follows: the depth of the tree is 4, the
population size is equal to the number of sampled data, the maximum number of generations
is three times the population size, and the probabilities of crossover and mutation
are set to 1.0 and 0.5, respectively.
To evaluate the performance of our methods, we compare them with state-of-the-art EAs, which are briefly
described below:
1. Self-adaptive fitness formulation (SAFF) employed the penalty function method
for solving the COPs, where infeasible solutions that have a high fitness value are
also favored in selection (Farmani and Wright 2005). In the SAFF, the constraint
violations of infeasible solutions were handled by the designed two-stage penalties.
2. Homomorphous mapping (HM) designed a special operator (i.e., decoders) to
discover the optimal solution in COPs. Thanks to these decoders, all solutions
were mapped into n-dimensional cube for maintaining feasible states (Koziel and
Michalewicz 1999).
3. Stochastic ranking evolutionary strategy (SRES) considered the separation between objective and constraints (Runarsson and Yao 2000). This algorithm utilized
the SR selection mechanism to balance objective and constraint violations directly
and explicitly in the optimization with the probabilistic factor to include infeasible
solutions.
4. Simple multi-membered evolution strategy (SMES) was also based on the
separation of objective and constraint violations (Montes and Coello 2005). Its main
feature was to devise three mechanisms: a diversity mechanism, combined recombination, and a reduction of the initial step sizes of the ES. All designed
techniques were operated on the basis of the number of infeasible solutions in the
population.
5. Adaptive tradeoff model-based evolutionary strategy (ATMES) was proposed
for facilitating a more explicit tradeoff between objective and constraints (Wang
et al. 2008). It developed three different search techniques which were classified
by the feasibility ratio in the current population.
Table 5.2 presents the parameter setups of each compared algorithm. It shows the
size of population, the number of generations, and the number of fitness evaluations.
Table 5.2 Parameter setups of the compared algorithms, where (μ, λ) is the set of parent and
offspring population sizes
Algorithm | Population size | Generations | Fitness evaluations
SAFF (Farmani and Wright 2005) | 70 | 20,000 | 1,400,000
HM (Koziel and Michalewicz 1999) | 70 | 20,000 | 1,400,000
SRES (Runarsson and Yao 2000) | (30, 200) | 1,200 | 240,000
SMES (Montes and Coello 2005) | (100, 300) | 800 | 240,000
ATMES (Wang et al. 2008) | (50, 300) | 800 | 240,000
NNA-ES | (30, 200) | 1,200 | 240,000
GPA-ES | (30, 200) | 1,200 | 240,000
(g01, g03, g08 and g11). Our first algorithm found a better best result on g10 than
the SAFF; on the other hand, GPA-ES obtained a worse best result. In addition,
our algorithms reached better or similar mean results on most of the
problems, except for g04 and g06 in the case of GPA-ES and NNA-ES, respectively. No
comparisons were made for two functions, g12 and g13, since the results from
SAFF are not available.
in the best result. Also, each of our algorithms, NNA-ES and GPA-ES, found
competitive mean results on ten problems. Meanwhile, the SMES
obtained slightly better mean results on four functions, g04, g06, g09, and g10.
In particular, the mean value of SMES on g09 was much smaller than that of both of
our algorithms.
Table 5.3 Comparison of the best results obtained by the proposed NNA-ES and GPA-ES as well as five references on 13 benchmark functions, where N.A. = Not Available
fcn | Optimal | SAFF (Farmani and Wright 2005) | HM (Koziel and Michalewicz 1999) | SRES (Runarsson and Yao 2000) | SMES (Montes and Coello 2005) | ATMES (Wang et al. 2008) | NNA-ES | GPA-ES
g01 | 15.000 | 15.000 | 14.786 | 15.000 | 15.000 | 15.000 | 15.000 | 15.000
g02 | 0.803619 | 0.802970 | 0.799530 | 0.803481 | 0.803601 | 0.803388 | 0.803185 | 0.803532
g03 | 1.0000 | 1.0000 | 0.9997 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
g04 | 30665.539 | 30665.500 | 30664.500 | 30665.539 | 30665.539 | 30665.539 | 30665.539 | 30665.539
g05 | 5126.498 | 5126.989 | N.A | 5126.498 | 5126.599 | 5126.498 | 5126.505 | 5126.498
g06 | 6961.814 | 6961.800 | 6952.100 | 6961.814 | 6961.814 | 6961.814 | 6961.807 | 6961.814
g07 | 24.306 | 24.480 | 24.620 | 24.314 | 24.327 | 24.306 | 24.309 | 24.306
g08 | 0.095825 | 0.095825 | 0.095825 | 0.095825 | 0.095825 | 0.095825 | 0.095825 | 0.095825
g09 | 680.630 | 680.640 | 680.910 | 680.633 | 680.632 | 680.630 | 680.630 | 680.630
g10 | 7049.331 | 7061.340 | 7147.900 | 7053.064 | 7051.903 | 7052.253 | 7056.710 | 7081.948
g11 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75
g12 | 1.000 | N.A | N.A | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
g13 | 0.053950 | N.A | N.A | 0.054008 | 0.053986 | 0.053950 | 0.053950 | 0.053950
15.000
0.790148
1.0000
30665.539
15.000
0.803619
1.0000
30665.539
g01
g02
g03
g04
1.60.E 14
1.30.E 02
5.90.E 05
7.40.E 12
15.000
0.00.E + 00
0.790100 1.20.E 02
0.9999
7.50.E 05
30665.200 4.85.E 01
5432.08
3.89.E + 03
6961.800 0.00.E + 00
26.580
1.14.E + 00
0.095825 0.00.E + 00
680.720
5.92.E 02
7627.890
3.73.E + 02
0.75
0.00.E + 00
N.A
N.A
N.A
N.A
ATMES (Wang et al. 2008)
Mean
St. dev
15.000
0.803619
1.0000
30665.539
5126.498
6961.814
24.306
0.095825
680.630
7049.331
0.75
1.000
0.053950
Optimal
g01
g02
g03
g04
g05
g06
g07
g08
g09
g10
g11
g12
g13
fcn
15.000
0.794128
1.0000
30665.539
14.708
0.796710
0.9989
30655.300
N.A
6342.600
24.826
0.0891568
681.160
8163.600
0.75
N.A
N.A
NNA-ES
Mean
0.00.E + 00
8.04.E 03
1.90.E 04
2.05.E 04
St. dev
N.A
N.A
N.A
N.A
N.A
N.A
N.A
N.A
N.A
N.A
N.A
N.A
N.A
15.000
0.791084
1.0000
30648.853
15.000
0.775346
1.0000
30665.525
5132.882
6875.442
24.364
0.095825
680.658
7472.902
0.75
1.000
0.083290
GPA-ES
Mean
6.29.E 07
8.03.E 03
1.35.E 05
4.98.E + 01
St. dev
0.00.E + 00
2.35.E 02
2.90.E 04
6.32.E 02
8.61.E + 00
1.53.E + 02
5.59.E 02
2.82.E 17
4.20.E 02
4.20.E 02
4.20.E 02
0.00.E + 00
9.70.E 02
Table 5.4 Comparison of the mean results obtained by the proposed NNA-ES and GPA-ES as well as five references on 13 benchmark functions, where N.A. = Not Available
fcn Optimum
SAFF (Farmani and Wright 2005) HM (Koziel and Michalewicz 1999) SRES (Runarsson and Yao
2000)
Mean
St. dev
Mean
St. dev
Mean
St. dev
15.000
0.785238
1.0000
30665.539
5174.492
6961.284
24.475
0.095825
680.643
7253.047
0.75
1.000
0.166385
(continued)
0.00.E + 00
1.67.E 02
2.09.E 04
0.00.E + 00
5.01.E + 01
1.85.E + 00
1.32.E 01
0.00.E + 00
1.55.E 02
1.36.E + 02
1.52.E 04
0.00.E + 00
1.77.E 01
g05
g06
g07
g08
g09
g10
g11
g12
g13
5126.498
6961.814
24.306
0.095825
680.630
7049.331
0.75
1.000
0.053950
5127.648
6961.814
24.316
0.095825
680.639
7250.437
0.75
1.000
0.053959
1.80.E 14
4.60.E 12
1.10.E 02
2.80.E 17
1.00.E 02
1.20.E + 02
3.40.E 04
1.00.E 03
1.30.E 05
NNA-ES
Mean
9.05.E + 00
1.62.E + 02
2.01.E 02
2.82.E 17
2.74.E 02
4.38.E + 02
9.80.E 04
0.00.E + 00
9.95.E 02
St. dev
5152.634
6961.814
24.315
0.095825
680.648
7342.196
0.75
1.000
0.054024
GPA-ES
Mean
St. dev
4.14.E + 01
4.63.E 12
1.83.E 02
2.82.E 17
2.23.E 02
2.25.E + 02
1.25.E 03
4.10.E 05
1.40.E 04
minimize f(x) = (x_3 + 2) x_2 x_1²      (5.11)

subject to
g_1(x) = 1 − x_2³ x_3 / (71785 x_1⁴) ≤ 0
g_2(x) = (4x_2² − x_1 x_2) / (12566 (x_1³ x_2 − x_1⁴)) + 1 / (5108 x_1²) − 1 ≤ 0
g_3(x) = 1 − 140.45 x_1 / (x_2² x_3) ≤ 0
g_4(x) = (x_1 + x_2) / 1.5 − 1 ≤ 0.      (5.12)
Table 5.5 lists the statistical results, i.e., the best, mean, worst, and standard
deviation outcomes, of all algorithms. It can be seen in Table 5.5 that the performance of GPA-ES is even better than that of the
compared algorithms, and our worst solution is smaller than the best values of the compared ones.
Summing up the experimental results and comparisons of the above engineering
optimization problem, we can verify the superiority of the proposed incremental
approximation-assisted algorithms.
Table 5.5 The comparison of the statistics on the tension/compression spring optimization problem
Method | Best | Mean | Worst | St. dev
GA1 (Coello 2000) | 0.0127048 | 0.0127690 | 0.0128220 | 3.94E−05
GA2 (Coello and Montes 2002) | 0.0126810 | 0.0127420 | 0.0129730 | 5.90E−05
HE-PSO (Hu et al. 2003) | 0.0126661 | 0.0127190 | N.A | 6.45E−05
CPSO (He and Wang 2007a) | 0.0126747 | 0.0127300 | 0.0129240 | 5.20E−04
HPSO (He and Wang 2007b) | 0.0126652 | 0.0127072 | 0.0127190 | 1.58E−05
NM-PSO (Zahara and Kao 2009) | 0.0126302 | 0.0126314 | 0.0126330 | 8.74E−07
NNA-ES | 0.0098725 | 0.0098741 | 0.0098930 | 4.69E−06
GPA-ES | 0.0098725 | 0.0098725 | 0.0098725 | 9.87E−03
5.4 Conclusion
This chapter has presented a new evolutionary algorithm for solving COPs. We
particularly targeted problems that are highly constrained and whose feasible
regions are therefore small and separated. To methodically address the problems caused by an
extremely low degree of feasibility, we suggested incremental approximation
models. Thanks to a manipulated, gradually increasing feasible region managed by
the approximate constraints, we could handle highly constrained problems more
effectively. We have empirically compared our approach with a few state-of-the-art
algorithms for handling COPs on 13 benchmark problems and one engineering optimization problem. On the whole, the proposed methods have shown to be promising, as
they produced better or comparable results on most test problems.
Acknowledgments The authors would like to thank Chang Wook Ahn for useful discussions.
References
Adeli H, Cheng N-T (1994) Augmented Lagrangian genetic algorithm for structural optimization.
J Aerosp Eng 7:104–118
Bean J (1994) Genetic algorithms and random keys for sequencing and optimization. ORSA J
Comput 6:154–160
Coello CAC (2000) Use of a self-adaptive penalty approach for engineering optimization problems.
Comput Ind 41(2):113–127
Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng
191(11–12):1245–1287
Coello CAC, Montes EM (2002) Constraint-handling in genetic algorithms through the use of
dominance-based tournament selection. Adv Eng Inform 16(3):193–203
Das S, Suganthan P (2011) Differential evolution: a survey of the state-of-the-art. IEEE Trans Evol
Comput 15(1):4–31
Davis LD, Mitchell M (eds) (1991) Handbook of genetic algorithms. Van Nostrand Reinhold, New
York
Dorigo M, Gambardella LM (1997) Ant colony system: a cooperative learning approach to the
traveling salesman problem. IEEE Trans Evol Comput 1:53–66
Farmani R, Wright J (2005) Self-adaptive fitness formulation for constrained optimization. IEEE
Trans Evol Comput 7(5):445–455
Glover F, Kochenberger G (1996) Critical event tabu search for multidimensional knapsack problems. Meta heuristics: theory and applications. Kluwer Academic Publishers, Dordrecht
Goh C, Lim D, Ma L, Ong Y, Dutta P (2011) A surrogate-assisted memetic co-evolutionary algorithm
for expensive constrained optimization problems. In: IEEE congress on evolutionary computation,
pp 744–749
Hamida S, Schoenauer M (2002) ASCHEA: new results using adaptive segregational constraint
handling. In: Proceedings of IEEE conference on evolutionary computation 2002. Honolulu,
Hawaii, pp 82–87
He Q, Wang L (2007a) An effective co-evolutionary particle swarm optimization for constrained
engineering design problems. Eng Appl Artif Intell 20(1):89–99
He Q, Wang L (2007b) A hybrid particle swarm optimization with a feasibility-based rule for
constrained optimization. Appl Math Comput 186(2):1407–1422
Hu X, Eberhart R, Shi Y (2003) Engineering optimization with particle swarm. In: Proceedings of
the IEEE swarm intelligence symposium 2003 (SIS 2003). Indianapolis, Indiana, pp 53–57
Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges.
Swarm Evol Comput 1(2):61–70
Jin Y, Branke J (2005) Evolutionary optimization in uncertain environments—a survey. IEEE Trans
Evol Comput 9:303–317
Jin Y, Oh S, Jeon M (2010) Incremental approximation of nonlinear constraint functions for evolutionary constrained optimization. In: Proceedings of IEEE conference on evolutionary computation 2010 (CEC 2010), Barcelona, Spain, pp 1–8
Jin Y, Tang K, Yu X, Sendhoff B, Yao X (2013) A framework for finding robust optimal solutions
over time. Memet Comput 5(1):3–18
Kowalczyk R (1997) Constraint consistent genetic algorithms. In: Proceedings of IEEE international
conference on evolutionary computation. Indianapolis, pp 343–348
Koziel S, Michalewicz Z (1999) Evolutionary algorithms, homomorphous mappings, and constrained parameter optimization. Evol Comput 7(1):19–44
Le TV (1995) A fuzzy evolutionary approach to constrained optimization problems. In: Proceedings
of parallel problem solving from nature. Perth, pp 274–278
Liang JJ, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan PN, Coello CAC, Deb K (2006)
Problem definitions and evaluation criteria for the CEC 2006 special session on constrained
real-parameter optimization. Technical report, Nanyang Technological University, Singapore
Michalewicz Z (1996) Genetic algorithms + data structures = evolution programs. Springer, New
York
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4:1–32
Montes EM, Coello CAC (2005) A simple multimembered evolution strategy to solve constrained
optimization problems. IEEE Trans Evol Comput 9(1):1–17
Nguyen T, Yang S, Branke J (2012) Evolutionary dynamic optimization: a survey of the state of the
art. Swarm Evol Comput 6:1–24
Oh S, Lee S, Jeon M (2009) Evolutionary optimization programming with probabilistic models. In:
International conference on bio-inspired computing. Beijing, P.R. China, pp 1–6
Oh S, Jin Y, Jeon M (2011) Approximate models for constraint functions in evolutionary constrained
optimization. Int J Innov Comput, Inf Control 7(11):6585–6603
Paenke I, Branke J, Jin Y (2006) Efficient search for robust solutions by means of evolutionary
algorithms and fitness approximation. IEEE Trans Evol Comput 10(4):405–420
Reed RD, Marks RJ (1998) Neural smithing: supervised learning in feedforward artificial neural
networks. MIT Press, Cambridge
Chapter 6
6.1 Introduction
Constrained optimization problems, especially nonlinear optimization problems,
where objective functions are minimized under given constraints, are important
and frequently appear in the real world. There exist several studies on solving
T. Takahama (B)
Hiroshima City University, 3-4-1 Ozuka-higashi, Asaminami-ku,
Hiroshima 731-3194, Japan
e-mail: takahama@info.hiroshima-cu.ac.jp
S. Sakai
Hiroshima Shudo University, 1-1-1 Ozuka-higashi, Asaminami-ku,
Hiroshima 731-3195, Japan
e-mail: setuko@shudo-u.ac.jp
the estimated comparison, the evaluation of the true function is sometimes omitted
and the number of function evaluations can be reduced.
In this chapter, the estimated comparison is applied to constrained optimization, and εDEpm, which is a combination of the ε constrained method and the estimated comparison (Takahama and Sakai 2013) using a potential model, is defined and
improved by approximating not only the objective function but also the constraint
violation. The potential model without a learning process is adopted as a rough approximation model (Takahama and Sakai 2008b). εDEpm is an efficient constrained optimization algorithm that can find near-optimal solutions in a small number of function
evaluations. The effectiveness of εDEpm is shown by solving the well-known 13 constrained problems mentioned in Coello (2002) and comparing the results of εDEpm
with those of representative methods. It is shown that εDEpm can solve the problems
with a much smaller number of function evaluations, about half, compared with the
representative methods.
In Sect. 6.2, constrained optimization methods and approximation methods are
reviewed. The ε constrained method and the estimated comparison using the potential
model are explained in Sects. 6.3 and 6.4, respectively. The εDEpm is described in
Sect. 6.5. In Sect. 6.6, experimental results on the 13 constrained problems are shown and
the results of εDEpm are compared with those of other methods. Finally, conclusions
are described in Sect. 6.7.
(6.1)
4. Every constraint and objective function are used separately. In this category,
constrained optimization problems are solved as multi-objective optimization
problems in which the objective function and the constraint functions are objectives to be optimized (Aguirre et al. 2004; Camponogara and Talukdar 1997;
Coello 2000a; Ray et al. 2002; Runarsson and Yao 2003; Surry and Radcliffe
1997; Wang et al. 2007). However, in many cases solving a constrained problem
as a multi-objective optimization problem is a more difficult and expensive task
than solving the constrained problem as essentially a single-objective optimization
problem as in categories 1, 2, and 3.
5. Hybridization methods. In this category, constrained problems are solved by combining some of the above-mentioned methods. Mallipeddi and Suganthan (2010)
proposed a hybridization of the methods in categories 2, 3, and 4.
3. All individuals have true values. Some methods in this type are called surrogate
approaches. In surrogate approaches, an estimated optimum is searched using an
approximation model called a surrogate model, which is usually a local model.
The estimated optimum is evaluated, the true value is obtained, and the true value
is also used to improve the approximation model (Bche et al. 2005; Guimares
et al. 2006; Ong et al. 2006). If the true value is good, the value is included
as an individual. In the approaches, rough approximation models might be used
because approximation values are compared with other approximation values.
These approaches are less affected by the quality of the approximation model than
the evolution control approaches. However, they have the process of optimization
using the approximation model only. If the process is repeated many times, they
are much affected by the quality of the approximation model.
The estimated comparison method is classified in the last category because all
individuals have true values. However, the method is different from the surrogate
approaches. It uses a global approximation model of current individuals using the
potential model. It does not search for the estimated optimum, but judges whether
a new individual is worth evaluating its true value or not. Also, it can specify the
margin of approximation error when comparison is carried out. Thus, it is not much
affected by the quality of the approximation model.
Φ(x) = max { max_{1≤j≤q} max{0, g_j(x)},  max_{q+1≤j≤m} |h_j(x)| }      (6.2)

Φ(x) = Σ_{j=1}^{q} ||max{0, g_j(x)}||^p + Σ_{j=q+1}^{m} ||h_j(x)||^p      (6.3)
(f_1, Φ_1) <_ε (f_2, Φ_2)  ⟺  f_1 < f_2, if Φ_1, Φ_2 ≤ ε
                              f_1 < f_2, if Φ_1 = Φ_2
                              Φ_1 < Φ_2, otherwise      (6.4)

(f_1, Φ_1) ≤_ε (f_2, Φ_2)  ⟺  f_1 ≤ f_2, if Φ_1, Φ_2 ≤ ε
                              f_1 ≤ f_2, if Φ_1 = Φ_2
                              Φ_1 < Φ_2, otherwise      (6.5)

In the case of ε = ∞, the ε-level comparisons <_ε and ≤_ε are equivalent to the ordinary
comparisons < and ≤ between function values. Also, in the case of ε = 0, <_0 and
≤_0 are equivalent to the lexicographic orders in which the constraint violation Φ(x)
precedes the function value f(x).
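A minimal sketch of the ε-level comparison of (6.4) is given below (illustrative code, not from the chapter):

def eps_less(f1, phi1, f2, phi2, eps):
    # epsilon-level comparison of Eq. (6.4): compare by the objective when
    # both constraint violations are within eps (or are equal), otherwise
    # compare by the constraint violation.
    if (phi1 <= eps and phi2 <= eps) or phi1 == phi2:
        return f1 < f2
    return phi1 < phi2

# eps = inf reduces to the ordinary comparison, eps = 0 to the lexicographic order
print(eps_less(1.0, 0.3, 2.0, 0.0, eps=0.5))  # True: both within eps, 1.0 < 2.0
print(eps_less(1.0, 0.3, 2.0, 0.0, eps=0.0))  # False: violations decide, 0.3 > 0.0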
(P_ε):  minimize f(x)  subject to  Φ(x) ≤ ε      (6.6)

(P_≤ε): minimize f(x) under the ε-level comparison ≤_ε      (6.7)
where (P_0) is equivalent to (P), because a feasible solution satisfies Φ(x) = 0.
For the three types of problems, (P_ε), (P_≤ε) and (P), the following theorems are
given based on the ε constrained method (Takahama and Sakai 2005b).
Theorem 1 If an optimal solution of (P_0) exists, any optimal solution of (P_≤ε) is an
optimal solution of (P_ε).
Theorem 2 If an optimal solution of (P) exists, any optimal solution of (P_≤0) is an
optimal solution of (P).
Theorem 3 Let {ε_n} be a strictly decreasing nonnegative sequence converging
to 0. Let f(x) and Φ(x) be continuous functions of x. Assume that an optimal solution
x*_0 of (P_0) exists and an optimal solution x̃_n of (P_≤ε_n) exists for every ε_n. Then, any
accumulation point of the sequence {x̃_n} is an optimal solution of (P_0).
Theorems 1 and 2 show that a constrained optimization problem can be converted
into an equivalent unconstrained optimization problem by using the ε-level comparison. So, if the ε-level comparison is incorporated into an existing unconstrained
optimization method, constrained optimization problems can be solved. Theorem 3
shows that, in the ε constrained method, an optimal solution of (P_0) can be obtained by
converging ε to 0, just as it is obtained by increasing the penalty coefficient to infinity in the
penalty method.
F_g = G m m′ / r²      (6.8)

The potentials U_f and U_c are defined by analogy with the gravitational potential in (6.9) and (6.10), and are used below to estimate values at unevaluated points.
f̂(y) = Û_f(y) / Û_c(y)      (6.11)

Û_f(y) = Σ_{x_i ∈ P} f(x_i) / d(x_i, y)^{p_d}      (6.12)

Û_c(y) = Σ_{x_i ∈ P} 1 / d(x_i, y)^{p_d}      (6.13)
When the true function values (f(x_i), Φ(x_i)) of all points in P = {x_i, i =
1, 2, ..., N} are known and a new child point x′_i is generated from a parent point x_i,
the approximation values at the point x′_i are given as follows:

f̂(x′_i) = Û_f(x′_i) / Û_c(x′_i)      (6.14)

Û_f(x′_i) = Σ_{j ≠ i} f(x_j) / d(x_j, x′_i)^{p_d}      (6.15)

Û_c(x′_i) = Σ_{j ≠ i} 1 / d(x_j, x′_i)^{p_d}      (6.16)
Û_Φ(x′_i) = Σ_{j ≠ i} Φ(x_j) / d(x_j, x′_i)^{p_d}      (6.17)

Φ̂(x′_i) = Û_Φ(x′_i) / Û_c(x′_i)      (6.18)
It should be noted that the parent point x_i (j = i) is omitted in the equations. If the
parent point were not omitted, the approximation value at the parent point would be
almost the true value. As a result, the difference between the precision of the approximation
at the parent point and that at the child point becomes large, and it is difficult to compare
the approximation values.
When the search points are far from the feasible region, the ε-level comparison is dominated by the constraint violations. In this case, the constraint violation values are approximated. When the search points are near the feasible region, the ε-level comparison is dominated by the objective values. In this case, the objective values are approximated. The
far case and the near case are distinguished by the number of feasible solutions. In this study,
the near case is identified when the ratio of feasible solutions in the population is
greater than or equal to 0.8. The estimated comparison for constrained optimization
using the ε constrained method can be defined as follows:
EstimatedBetter(x′_i, x_i, ε) {
  if(the number of feasible solutions ≥ 0.8N) {
    // approximation of the objective function
    if( f̂(x′_i) < f(x_i) + δ σ_e ) {
      Evaluate x′_i;
      if(( f(x′_i), Φ(x′_i)) <_ε ( f(x_i), Φ(x_i)))
        return yes;
    }
  }
  else {
    // approximation of the constraint violation
    if( Φ̂(x′_i) < Φ(x_i) + δ 2|Φ(x_i) − Φ̂(x_i)| ) {
      Evaluate x′_i;
      if(( f(x′_i), Φ(x′_i)) <_ε ( f(x_i), Φ(x_i)))
        return yes;
    }
  }
  return no;
}
where the true values at the parent point (f(x_i), Φ(x_i)) are known. In this study, the error
margin for the objective value is defined based on the error level of the population.
In contrast, the error margin for the constraint violation is defined based on the error
level of each individual, because it is thought that feasible solutions and infeasible
solutions have different error levels. The error margin parameter δ ≥ 0 controls the
margin for the approximation error. When δ is 0, the estimated comparison
can reject many children and omit a large number of function evaluations. However,
the possibility of rejecting a good child becomes high and a true optimum sometimes
might be skipped. When δ is large, the possibility of rejecting a good child becomes
low. However, the estimated comparison can reject fewer children and omit only a small
number of function evaluations. Thus, δ should have a proper value.
The estimation error can be given as the standard deviation of the errors between
approximation values and true values:

σ_e = sqrt( (1/N) Σ_i (e_i − ē)² )      (6.19)

e_i = f̂(x_i) − f(x_i),   ē = (1/N) Σ_i e_i      (6.20)
In the potential model, the current population P is used as the set of solutions whose
objective values are known. As the search process progresses, the area where individuals exist may become elliptical. In order to handle such a case, the normalized
distance is introduced, in which the distance is normalized by the width of each
dimension in the current population P:

d(x, y) = sqrt( Σ_j [ (x_j − y_j) / (max_{x_i ∈ P} x_{ij} − min_{x_i ∈ P} x_{ij}) ]² )      (6.21)
168
indicates the method of selecting a parent that will form the base vector. For example, DE/rand selects the parent for the base vector at random from the population.
DE/best selects the best individual in the population. In DE/rand/1, for each individual xi , three individuals x p1 , x p2 and x p3 are chosen from the population without
overlapping xi and each other. A new vector, or a mutant vector xm is generated by
the base vector x p1 and the difference vector x p2 x p3 as follows, where F is a
scaling factor.
(6.22)
xm = x p1 + F(x p2 x p3 )
num indicates the number of difference vectors used to perturb the base vector. cr oss indicates the crossover operation used to create a child. For example,
bin shows that the crossover is controlled by binomial crossover using constant
crossover rate, and exp shows that the crossover is controlled by a kind of twopoint crossover using exponentially decreasing the crossover rate. A new child xi
is generated from the parent xi and the mutant vector xm , where CR is a crossover
rate.
x′_{i,j} = x_{m,j}  if rand_j ≤ CR or j = j_rand;   x′_{i,j} = x_{i,j} otherwise      (6.23)

ε(t) = ε(0) (1 − t/T_c)^{cp},  0 < t < T_c
ε(t) = 0,                      t ≥ T_c      (6.24)
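A compact sketch of DE/rand/1/bin child generation, as described by (6.22) and (6.23), is given below (illustrative code with assumed parameter values F = 0.7 and CR = 0.9):

import numpy as np

def de_rand_1_bin(pop, i, F=0.7, CR=0.9, rng=None):
    # DE/rand/1/bin: mutate a random base vector with one scaled difference
    # (Eq. (6.22)), then apply binomial crossover with the parent pop[i]
    # (Eq. (6.23)).
    rng = np.random.default_rng() if rng is None else rng
    n_pop, n_dim = pop.shape
    candidates = [k for k in range(n_pop) if k != i]
    p1, p2, p3 = rng.choice(candidates, size=3, replace=False)
    mutant = pop[p1] + F * (pop[p2] - pop[p3])
    child = pop[i].copy()
    j_rand = rng.integers(n_dim)       # ensure at least one mutant component
    for j in range(n_dim):
        if rng.random() < CR or j == j_rand:
            child[j] = mutant[j]
    return child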
Fig. 6.2 The algorithm of the ε constrained differential evolution with estimated comparison using
the potential model, where ε(t) is the ε-level control function
Table 6.1 Summary of the 13 benchmark problems
fcn | n  | Form of f  | LI | NI | LE | NE | Active
g01 | 13 | Quadratic  | 9 | 0  | 0 | 0 | 6
g02 | 20 | Nonlinear  | 1 | 1  | 0 | 0 | 1
g03 | 10 | Polynomial | 0 | 0  | 0 | 1 | 1
g04 | 5  | Quadratic  | 0 | 6  | 0 | 0 | 2
g05 | 4  | Cubic      | 2 | 0  | 0 | 3 | 3
g06 | 2  | Cubic      | 0 | 2  | 0 | 0 | 2
g07 | 10 | Quadratic  | 3 | 5  | 0 | 0 | 6
g08 | 2  | Nonlinear  | 0 | 2  | 0 | 0 | 0
g09 | 7  | Polynomial | 0 | 4  | 0 | 0 | 2
g10 | 8  | Linear     | 3 | 3  | 0 | 0 | 6
g11 | 2  | Quadratic  | 0 | 0  | 0 | 1 | 1
g12 | 3  | Quadratic  | 0 | 9³ | 0 | 0 | 0
g13 | 5  | Nonlinear  | 0 | 0  | 0 | 3 | 3
Table 6.2 Experimental results on 13 benchmark problems using standard settings; 30 independent
runs were performed
fcn (Optimal) | Best | Median | Mean | Worst | St. dev
g01 (15.000) | 15.000000 | 15.000000 | 15.000000 | 15.000000 | 4.193e−12
g02 (0.803619) | 0.803547 | 0.803056 | 0.802406 | 0.790861 | 2.255e−03
g03 (1.000) | 1.000500 | 1.000500 | 1.000500 | 1.000499 | 1.134e−07
g05 (5126.498) | 5126.496714 | 5126.496714 | 5126.496714 | 5126.496714 |
g06 (6961.814) | 6961.813876 | 6961.813876 | 6961.813876 | 6961.813876 | 2.803e−12
g07 (24.306) | 24.306209 | 24.306209 | 24.306210 | 24.306214 | 1.215e−06
g08 (0.095825) | 0.095825 | 0.095825 | 0.095825 | 0.095825 | 0.000e+00
g09 (680.630) | 680.630057 | 680.630057 | 680.630057 | 680.630057 | 0.000e+00
g10 (7049.248) | 7049.248021 | 7049.248021 | 7049.248021 | 7049.248026 | 1.028e−06
g11 (0.750) | 0.749900 | 0.749900 | 0.749900 | 0.749900 | 0.000e+00
g12 (1.000000) | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000e+00
g13 (0.053950) | 0.0539415 | 0.0539415 | 0.0539415 | 0.0539415 | 0.000e+00
p_d = 2 and the margin parameter δ = 0.1. In this chapter, 30 independent runs are
performed.
εDEpm without the approximation of the constraint violation, or εDEpm−, where x′_i
is always evaluated when the number of feasible solutions is small.
The number of evaluations of the objective function and the constraints needed to reach a
near-optimal solution, where the difference between the objective value of the near-optimal solution and that of the optimal solution is within 10⁻⁴, is shown in Table 6.3. The
average numbers of evaluations of the objective function and the constraints over 30
runs are shown in the columns labeled #func and #const, respectively. The standard
deviations of the numbers of evaluations of the objective function and the constraints
are shown in parentheses. Also, the ratios of FEs of εDEpm and εDEpm− compared
with the FEs of DE, and the statistical significance, are shown under the standard deviations. Statistical differences between εDEpm and εDEpm− and between εDEpm and
DE using Welch's t-test are shown by ++/−−, +/−, and no mark, denoting significantly different (smaller/greater) with p-value p < 0.01, significantly different (smaller/greater)
with p < 0.05, and not significant, respectively.
Apparently, εDEpm attained the best results, followed by εDEpm−. εDEpm is
statistically faster than DE on 12 problems and faster than εDEpm− on 9 problems.
εDEpm can reduce the number of constraint evaluations by about 5–50 % compared with
DE. εDEpm− can reduce the constraint evaluations by 0 to about 45 %.
Also, εDEpm can reduce the evaluations of the objective function by about 15–50 %
compared with DE. εDEpm− can reduce the evaluations of the objective function
by about 0–45 %.
These results show that the potential model is effective not only for the objective
function but also for the constraint violation. Thus, it is thought that the potential model
is a general-purpose rough approximation model.
In the ε constrained method, the objective function and the constraints are treated
separately. So, when the order relation of the search points can be decided only by
the constraint violation, the objective function is not evaluated, or
the evaluation of the objective function can often be omitted. Thus, the number of
evaluations of the objective function is less than the number of evaluations of the
constraints. This nature of the ε constrained method contributes to the efficiency of
the algorithm, especially when the objective function is computationally demanding.
The number of evaluations of the constraint violations needed to find the near-optimal solution ranged from about 500 to 120,000. The number of evaluations of the objective
function ranged between about 200 and 50,000. For these problems, εDEpm can omit
the evaluations of the objective function by about 15–90 %. Therefore, εDEpm can
find optimal solutions very efficiently, especially from the viewpoint of the number
of evaluations of the objective function.
Table 6.3 Comparison of the number of FEs needed to attain an error within 10⁻⁴ of the optimal value
f
DEpm
DEpm -
DE
#const
#func
#const
#func
#const
#func
g01
g02
g03
g04
g05
g06
g07
g08
g09
g10
g11
g12
g13
44099.2
(1250.4)
0.76,++,++
123382.6
(11190.3)
0.83,,++
39489.8
(9040.0)
0.97,,
13556.8
(671.1)
0.56,,++
25007.6
(1435.7)
0.65,++,++
3344.7
(251.8)
0.53,++,++
54781.8
(4487.7)
0.76,,++
462.4
(85.9)
0.49,++,++
14700.6
(873.3)
0.69,++,++
45332.1
(2872.1)
0.72,++,++
10302.3
(3335.6)
0.60,++,++
2127.7
(419.1)
0.53,+,++
22304.5
(1049.0)
0.66,++,++
13626.1
(344.9)
0.82,,++
51697.8
(4062.7)
0.87,,++
11827.3
(483.2)
0.86,++,++
5087.9
(240.9)
0.54,,++
10173.6
(537.5)
0.74,++,++
1468.5
(176.4)
0.48,,++
15278.5
(1194.8)
0.77,,++
206.2
(67.8)
0.52,,++
7047.1
(398.2)
0.71,,++
7975.0
(463.2)
0.76,,++
8681.2
(2684.1)
0.70,++,++
207.4
(60.4)
0.56,,++
7618.8
(1211.1)
0.65,++,++
45899.6
(1411.9)
0.79
123382.6
(11190.3)
0.83
38707.7
(2530.4)
0.95
13589.1
(494.9)
0.56
38502.9
(409.4)
1.00
4110.0
(249.0)
0.65
56584.8
(3509.1)
0.79
713.3
(82.6)
0.75
15662.9
(946.7)
0.74
48126.4
(3182.2)
0.77
17105.3
(5476.2)
1.00
2447.7
(532.9)
0.61
33869.8
(691.6)
1.00
13782.8
(375.8)
0.83
51697.8
(4062.7)
0.87
13587.7
(287.3)
0.98
5061.7
(169.8)
0.54
13663.1
(225.8)
1.00
1418.2
(118.6)
0.46
15443.9
(878.9)
0.78
212.1
(54.8)
0.53
7225.8
(409.5)
0.73
8095.5
(577.7)
0.77
12380.3
(4027.3)
1.00
218.7
(55.6)
0.59
11662.2
(1133.7)
1.00
58135.3
(1306.0)
1.00
148677.6
(13972.9)
1.00
40566.8
(3575.5)
1.00
24063.7
(1124.7)
1.00
38502.9
(409.4)
1.00
6336.6
(366.5)
1.00
71619.5
(4163.2)
1.00
946.0
(142.5)
1.00
21177.6
(959.0)
1.00
62695.3
(3647.7)
1.00
17105.3
(5476.2)
1.00
4041.9
(1122.6)
1.00
33869.8
(691.6)
1.00
16667.1
(293.6)
1.00
59273.8
(5224.9)
1.00
13818.7
(341.5)
1.00
9410.9
(326.1)
1.00
13663.1
(225.8)
1.00
3058.8
(201.8)
1.00
19851.5
(1051.2)
1.00
397.8
(108.5)
1.00
9947.2
(439.3)
1.00
10466.0
(578.9)
1.00
12380.3
(4027.3)
1.00
370.0
(105.8)
1.00
11662.2
(1133.7)
1.00
Table 6.4 Comparison of statistical results among the DEpm, the DE, SMES, ATMES, HCOEA, ECHT-EP2, and A-DDE
[Table: best, median, mean, worst, and standard deviation of the objective values obtained for problems g01–g13 by each algorithm, together with the known optimal value of each problem; the maximum number of FEs is 100,000 for DEpm, 200,000 for DE, 180,000 for A-DDE, and 240,000 for SMES, ATMES, HCOEA, and ECHT-EP2]
6.7 Conclusions
In order to utilize a rough approximation model in constrained optimization, a new
scheme combining the ε constrained method and the estimated comparison using the
potential model is proposed. The potential model is used to approximate not only the
objective function but also the constraint violation. This idea is introduced into differential evolution, which is known as a simple, efficient, and robust search algorithm
for unconstrained optimization problems, and the DEpm is proposed.
It is shown that DEpm could solve 13 benchmark problems most efficiently
compared with many other methods. Also, it is shown that the potential model is
a general-purpose rough approximation model and the approximation of both the
objective function and the constraint violation can improve the efficiency of DE.
In the future, we will apply DEpm to various real-world problems that have
expensive objective functions.
Acknowledgments This research is supported in part by Grant-in-Aid for Scientific Research
(C) (No. 24500177, 26350443) of the Japan Society for the Promotion of Science and the Hiroshima City
University Grant for Special Academic Research (General Studies).
References
Aguirre AH, Rionda SB, Coello CAC, Lizárraga GL, Montes EM (2004) Handling constraints using
multiobjective optimization concepts. Int J Numer Methods Eng 59(15):1989–2017
Büche D, Schraudolph NN, Koumoutsakos P (2005) Accelerating evolutionary algorithms with
Gaussian process fitness function models. IEEE Trans Syst, Man, Cybern, Part C: Appl Rev
35(2):183–194
Camponogara E, Talukdar SN (1997) A genetic algorithm for constrained and multiobjective optimization. In: Alander JT (ed) 3rd Nordic workshop on genetic algorithms and their applications
(3NWGA), University of Vaasa, Vaasa, pp 49–62
Coello CAC (2000a) Constraint-handling using an evolutionary multiobjective optimization technique. Civ Eng Environ Syst 17:319–346
Coello CAC (2000b) Use of a self-adaptive penalty approach for engineering optimization problems.
Comput Ind 41(2):113–127
Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287
Deb K (2000) An efficient constraint handling method for genetic algorithms. Comput Methods
Appl Mech Eng 186(2/4):311–338
Farmani R, Wright JA (2003) Self-adaptive fitness formulation for constrained optimization. IEEE
Trans Evol Comput 7(5):445–455
Guimarães FG, Wanner EF, Campelo F, Takahashi RH, Igarashi H, Lowther DA, Ramírez JA (2006)
Local learning and search in memetic algorithms. In: Proceedings of the 2006 IEEE congress on
evolutionary computation, Vancouver, pp 9841–9848
Homaifar A, Lai SHY, Qi X (1994) Constrained optimization via genetic algorithms. Simulation
62(4):242–254
Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft
Comput 9:3–12
Jin Y, Olhofer M, Sendhoff B (2000) On evolutionary optimization with approximate fitness functions. In: Proceedings of the genetic and evolutionary computation conference. Morgan Kaufmann, pp 786–792
Jin Y, Olhofer M, Sendhoff B (2002) A framework for evolutionary optimization with approximate
fitness functions. IEEE Trans Evol Comput 6(5):481–494
Jin Y, Sendhoff B (2004) Reducing fitness evaluations using clustering techniques and neural
networks ensembles. In: Genetic and evolutionary computation conference. LNCS, vol 3102,
Springer, pp 688–699
Joines J, Houck C (1994) On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with GAs. In: Fogel D (ed) Proceedings of the first IEEE conference on evolutionary computation. IEEE Press, Orlando, pp 579–584
Mallipeddi R, Suganthan PN (2010) Ensemble of constraint handling techniques. IEEE Trans Evol
Comput 14(4):561–579
Mezura-Montes E, Coello CAC (2005) A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Trans Evol Comput 9(1):1–17
Mezura-Montes E, Coello CAC (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1:173–194
Mezura-Montes E, Palomeque-Ortiz AG (2009) Parameter control in differential evolution for constrained optimization. In: Proceedings of the 2009 IEEE congress on evolutionary computation,
pp 1375–1382
Michalewicz Z (1995) A survey of constraint handling techniques in evolutionary computation
methods. In: Proceedings of the 4th annual conference on evolutionary programming. The MIT
Press, Cambridge, pp 135–155
Michalewicz Z, Attia N (1994) Evolutionary optimization of constrained problems. In: Sebald A,
Fogel L (eds) Proceedings of the 3rd annual conference on evolutionary programming. World
Scientific Publishing, River Edge, pp 98–108
Ong YS, Zhou Z, Lim D (2006) Curse and blessing of uncertainty in evolutionary algorithm using
approximation. In: Proceedings of the 2006 IEEE congress on evolutionary computation, Vancouver, pp 9833–9840
Ray T, Liew KM, Saini P (2002) An intelligent information sharing strategy within a swarm for
unconstrained and constrained optimization problems. Soft Comput – Fusion Found, Methodol
Appl 6(1):38–44
Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE
Trans Evol Comput 4(3):284–294
Runarsson TP, Yao X (2003) Evolutionary search and constraint violations. In: Proceedings of the
2003 congress on evolutionary computation, vol 2. IEEE Service Center, Piscataway, New Jersey,
pp 1414–1419
Sakai S, Takahama T (2010) A parametric study on estimated comparison in differential evolution
with rough approximation model. In: Kitahara M, Morioka K (eds) Social systems solution by
legal informatics, economic sciences and computer sciences. Kyushu University Press, Fukuoka,
pp 112–134
Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11:341–359
Surry PD, Radcliffe NJ (1997) The COMOGA method: constrained optimisation by multiobjective
genetic algorithms. Control Cybern 26(3):391–412
Takahama T, Sakai S (2000) Tuning fuzzy control rules by the α constrained method which solves
constrained nonlinear optimization problems. Electron Commun Japan, Part 3: Fundam Electron
Sci 83(9):1–12
Takahama T, Sakai S (2005a) Constrained optimization by applying the α constrained method to
the nonlinear simplex method with mutations. IEEE Trans Evol Comput 9(5):437–451
Takahama T, Sakai S (2005b) Constrained optimization by ε constrained particle swarm optimizer
with ε-level control. In: Proceedings of the 4th IEEE international workshop on soft computing
as transdisciplinary science and technology (WSTST'05), pp 1019–1029
Takahama T, Sakai S (2006) Constrained optimization by the ε constrained differential evolution
with gradient-based mutation and feasible elites. In: Proceedings of the 2006 IEEE congress on
evolutionary computation, pp 308–315
Takahama T, Sakai S (2008a) Efficient optimization by differential evolution using rough approximation model with adaptive control of error margin. In: Proceedings of the joint 4th international conference on soft computing and intelligent systems and 9th international symposium on
advanced intelligent systems, pp 1412–1417
Takahama T, Sakai S (2008b) Reducing function evaluations in differential evolution using rough
approximation-based comparison. In: Proceedings of the 2008 IEEE congress on evolutionary
computation, pp 2307–2314
Takahama T, Sakai S (2009a) A comparative study on kernel smoothers in differential evolution
with estimated comparison method for reducing function evaluations. In: Proceedings of the 2009
IEEE congress on evolutionary computation, pp 1367–1374
Takahama T, Sakai S (2009b) Fast and stable constrained optimization by the ε constrained differential evolution. Pac J Optim 5(2):261–282
Takahama T, Sakai S (2010a) Constrained optimization by the ε constrained differential evolution
with an archive and gradient-based mutation. In: Proceedings of the 2010 IEEE congress on
evolutionary computation, pp 1680–1688
Takahama T, Sakai S (2010b) Efficient constrained optimization by the ε constrained adaptive
differential evolution. In: Proceedings of the 2010 IEEE congress on evolutionary computation,
pp 2052–2059
Takahama T, Sakai S (2010c) Reducing function evaluations using adaptively controlled differential evolution with rough approximation model. In: Tenne Y, Goh C-K (eds) Computational
intelligence in expensive optimization problems. Adaptation learning and optimization, vol 2.
Springer, Berlin, pp 111–129
Takahama T, Sakai S (2013) Efficient constrained optimization by the ε constrained differential
evolution with rough approximation using kernel regression. In: Proceedings of the 2013 IEEE
congress on evolutionary computation, pp 62–69
Takahama T, Sakai S, Iwane N (2006) Solving nonlinear constrained optimization problems by the
ε constrained differential evolution. In: Proceedings of the 2006 IEEE international conference
on systems, man and cybernetics, pp 2322–2327
Tessema B, Yen G (2006) A self adaptive penalty function based algorithm for constrained optimization. In: Yen GG, Lucas SM, Fogel G, Kendall G, Salomon R, Zhang B-T, Coello CAC,
Runarsson TP (eds) Proceedings of the 2006 IEEE congress on evolutionary computation. IEEE
Press, Vancouver, pp 246–253
Venkatraman S, Yen GG (2005) A generic framework for constrained optimization using genetic
algorithms. IEEE Trans Evol Comput 9(4):424–435
Wang Y, Cai Z, Guo G, Zhou Y (2007) Multiobjective optimization and hybrid evolutionary
algorithm to solve constrained optimization problems. IEEE Trans Syst, Man Cybern, Part B
37(3):560–575
Wang Y, Cai Z, Zhou Y, Zeng W (2008) An adaptive tradeoff model for constrained evolutionary
computation. IEEE Trans Evol Comput 12(1):80–92
Chapter 7
Abstract Many step size adaptation techniques for evolution strategies have been
developed with unconstrained optimization problems in mind. In constrained settings, the interplay between step size adaptation and constraint handling is both of
crucial importance and often not well understood. We consider a linear optimization
problem with a feasible region defined by a right circular cone symmetric about the
gradient direction, such that the optimal solution is located at the cone's apex. We
provide a detailed analysis of the behaviour of a multi-recombinative evolution strategy that employs cumulative step size adaptation and a simple constraint handling
technique. The results allow studying the influence of parameters of both the problem
class at hand, such as the angle at the cone's apex, and of the strategy considered,
including its population size parameters. The impact of assuming different models
for the cost of objective and constraint function evaluations is discussed.
Keywords Evolution strategy · Constraint handling · Cumulative step size
adaptation · Conically constrained problem
7.1 Introduction
While numerous constraint handling techniques used in connection with evolution
strategies exist and are in common use (compare Mezura-Montes and Coello Coello
(2011)), the understanding of their properties lags behind that of strategy variants for
unconstrained problems. Of particular significance for the success of the strategies is
the interaction between step size adaptation and constraint handling technique. Generally, convergence to non-stationary points is more easily avoided in unconstrained
settings than in constrained ones.
See Beyer and Schwefel (2002) for an overview of evolution strategy terminology.
the expected behaviour of a single step of the iterative algorithm. Section 7.4 expands
on these results to model the strategy as a Markov process and describes its steady
state for scale invariant step size. Section 7.5 expands that analysis to consider cumulative step size adaptation, and derives update rules for related quantities. Finally,
in Sect. 7.6 we provide a summary of our results and discuss their implications. An
Appendix contains details of computations related to Sect. 7.3.
7.2.1 Algorithm
The (μ/μ, λ)-ES with cumulative step size adaptation (CSA) is an iterative algorithm for solving N-dimensional, real-valued optimization problems. The variant
considered throughout this paper resamples infeasible offspring candidate solutions
until they are feasible (compare Oyman et al. (1999)). Its state is described by the
population centroid x ∈ R^N, the step size σ ∈ R, and the search path s ∈ R^N. A
single iteration is described in detail in Algorithm 1.
Algorithm 1 Single iteration of the (μ/μ, λ)-ES with CSA
Input: f : R^N → R
1: for k = 1, …, λ do
2:   repeat
3:     z^(k) = N(0, I)
4:     x^(k) = x + σz^(k)
5:   until IsFeasible(x^(k))
6: end for
7: sort [z^(1), …, z^(λ)] according to [f(x^(1)), …, f(x^(λ))]
8: z̄ = (1/μ) ∑_{k=1}^{μ} z^(k;λ)
9: x = x + σz̄
10: s = (1 − c)s + √(μc(2 − c)) z̄            {update s}
11: σ = σ exp((‖s‖² − N)/(2DN))               {update σ}
a feasible offspring candidate solution has been generated (Lines 1–6). Parameter σ
determines the variance and thereby the step size of the strategy; vectors z^(k) are
referred to as mutation vectors. For the purpose of selection, the objective function
of the problem at hand is then used to evaluate the quality of the offspring candidate
solutions. Recombination averages the μ best offspring candidate solutions to form
the next population centroid and is implemented by averaging the mutation vectors
corresponding to the selected offspring (Lines 7–9).
The cumulative step-size adaptation approach introduced by Ostermeier et al.
(1994) modifies the step size parameter of the strategy based on past averaged
mutations. It employs an exponentially fading record of recent steps referred to as
the search path (Line 10), where c ∈ (0, 1) is a constant that controls the rate of
exponential fading. The factor √(μc(2 − c)) in the update rule normalizes the nonunit variances of the steps, and ensures that if successive steps are uncorrelated, the
expected squared length of the search path is N. The step size of the strategy is then increased if
recent steps of the strategy are positively correlated (as indicated by search paths whose
squared length exceeds the dimension of the problem), and it is decreased if correlations
between recent steps are negative (if search paths are short). The factor D in the
update rule (Line 11) is a damping constant and controls how rapidly the step size
can be adapted. The search path and step size are initialized as s = 0 and σ = 1,
respectively.
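The iteration just described can be sketched in a few lines of code. The sketch below follows the reconstructed form of Algorithm 1; the parameter choices c = 1/√N and D = √N, as well as the toy linear objective and half-space constraint used in the demonstration, are assumptions for illustration rather than settings prescribed by this chapter.

```python
import math
import random


def csa_es_iteration(x, sigma, s, f, is_feasible, mu, lam, c, D):
    """One iteration of a (mu/mu, lambda)-ES with cumulative step size
    adaptation, resampling infeasible offspring until they are feasible."""
    N = len(x)
    offspring = []
    for _ in range(lam):
        while True:
            z = [random.gauss(0.0, 1.0) for _ in range(N)]
            y = [xi + sigma * zi for xi, zi in zip(x, z)]
            if is_feasible(y):
                break
        offspring.append((f(y), z))
    offspring.sort(key=lambda t: t[0])                  # best (smallest) first
    z_avg = [sum(z[i] for _, z in offspring[:mu]) / mu for i in range(N)]
    x = [xi + sigma * zi for xi, zi in zip(x, z_avg)]   # recombination step
    s = [(1.0 - c) * si + math.sqrt(mu * c * (2.0 - c)) * zi
         for si, zi in zip(s, z_avg)]                   # search path update
    norm2 = sum(si * si for si in s)
    sigma *= math.exp((norm2 - N) / (2.0 * D * N))      # step size update
    return x, sigma, s


if __name__ == "__main__":
    N = 10
    f = lambda x: x[0]                          # linear objective, minimized
    feasible = lambda x: x[0] >= 0.0            # toy half-space constraint
    x, sigma, s = [1.0] * N, 1.0, [0.0] * N
    for _ in range(50):
        x, sigma, s = csa_es_iteration(x, sigma, s, f, feasible, mu=3,
                                       lam=10, c=1.0 / math.sqrt(N),
                                       D=math.sqrt(N))
    print(x[0], sigma)
```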
The objective to be minimized is
f(x) = x₁    (7.1)
subject to the conical constraints
x₁² − ξ² ∑_{i=2}^{N} x_i² ≥ 0    (7.2)
and
x₁ ≥ 0.    (7.3)
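A feasibility test for such a cone is easy to write down. The sketch below is only one plausible instantiation: the exact algebraic placement of the constraint parameter ξ was not fully recoverable from the source, so the form x₁² ≥ ξ² ∑ x_i² with x₁ ≥ 0, as well as the sampling experiment around an assumed feasible point, should be read as assumptions for illustration.

```python
import random


def is_in_cone(x, xi):
    """Feasibility test for a right circular cone about the x1 axis.

    Assumed form (not taken verbatim from the chapter): a point is feasible
    iff x1 >= 0 and x1**2 >= xi**2 * sum(x_i**2 for i >= 2), so larger xi
    corresponds to a narrower cone.
    """
    r2 = sum(v * v for v in x[1:])
    return x[0] >= 0.0 and x[0] * x[0] >= (xi * xi) * r2


if __name__ == "__main__":
    random.seed(1)
    # Fraction of isotropic Gaussian samples around a feasible point that
    # remain feasible, for a broad and for a narrow cone.
    centre = [1.0] + [0.03] * 9
    for xi in (0.1, 10.0):
        hits = sum(
            is_in_cone([c + 0.1 * random.gauss(0.0, 1.0) for c in centre], xi)
            for _ in range(2000)
        )
        print(xi, hits / 2000.0)   # the narrow cone rejects most samples
```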
to denote the distance from x to the axis of the cone within the (N − 1)-dimensional
hyperplane determined by x₁, then z_δ can be written as
z_δ = (1/R) ∑_{i=2}^{N} x_i z_i .
In each generation, all of the offspring must be feasible before recombination can
occur. In other words, for any offspring both
x₁ + σz₁ ≥ 0    (7.5)
and
(x₁ + σz₁)² − ξ² ∑_{i=2}^{N} (x_i + σz_i)² ≥ 0    (7.6)
must hold. The normalized step size and the normalized slack are defined as
σ∗ = σN/R    (7.7)
and
δ∗ = δN/R,    (7.8)
respectively, so that the first coordinate of the population centroid can be written as
x₁ = R(ξ + δ∗/N).    (7.9)
Substituting this into Eq. (7.5) and using Eq. (7.7) gives us the equivalent statement
+ +
z1 0
N
N
using normalized quantities. Assuming that both and tend to finite limit values
as N increases (and it will be confirmed
below that they do), then taking the limit
+ 2 z 1
N
2 2
+ z +
z i2 0.
z1
N
N
i=2
+ 2 z 1 2
z
2
(7.10)
for a mutation vector to result in a feasible offspring candidate solution. Since both
the z_i and z_δ are standard normally distributed, the probability of the offspring
candidate solution x + σz being feasible can thus be expressed using the conditional
probability of z₁ as
Pfeas =
1
2
+2 x 2
2
ex
2 /2
ey
2 /2
dy dx
1
+ 2 x 2
2
dx
=
ex /2
2
2
2
=
2 + 2
(7.11)
where Φ(·) denotes the cumulative distribution function of the standard normal distribution. Equality between the second and third lines is established by use of an
identity from Arnold (2002, p. 117).
+ 2 x 2
2
otherwise.
1
2
2
e(x +y )/2
p1, (x, y) = 2 Pfeas
if y
(7.12)
p1, (x, y) dy
1
2 Pfeas
x 2 /2
+ 2 x 2
2
(7.13)
(7.14)
for the slack is directly implied by Eq. (7.6), where superscripts indicate iteration
number. To derive the update rule for the normalized slack δ∗, this can be combined
with Eq. (7.8) to write
(t+1) =
R (t)
R (t+1)
(t)
(t)
(t) + 2 (t) z (t)
+ z
1
N
2
(t) (t) 2
(t) 2
(t) 2
z 1
+
z
+ z
N
where z̄₁, z̄_δ, and ‖z̄‖ refer to the respective component lengths averaged over the
μ best offspring. The update rule for the distance R is derived from Eq. (7.4) to be
R (t+1) =
N
2
(xi + z i (t) )
i=2
=R
(t) 2
(7.15)
Using Eq. (7.14), combining this with Eq. (7.15), and taking the limit as N ,
the update rule becomes
2
.
(t+1) = (t) + 2 z 1 2 z
(7.16)
(7.17)
The expected values E[z 1 ] and E[z ] are functions of , and expressions for both
can be found in the Appendix.
Figure 7.1 plots the average normalized slack for the (μ/μ, λ)-ES with λ = 10
and μ ∈ {1, 3}. The curves were computed by numerically solving Eq. (7.17) with
Eqs. (7.27) and (7.30) using Eqs. (7.28) and (7.31). The data points were found by
artificially restricting the normalized step size of Algorithm 1 to a fixed value of σ∗
and initializing runs with a point on the boundary of the constrained region. For
each run, the first 40N iterations were discarded to allow for initialization conditions
to subside, then the average normalized slack over the next 20,000 iterations was
recorded. An upper limit for resampling was set at 1,000, so that a run for generating
a data point would be aborted if any offspring remained infeasible after 1,000 resampling operations. In this event, all subsequent data points were also omitted from
the graph. As observed for the μ = 1 case in Arnold (2013a), the normalized slack
increases with increasing σ∗ and increasing ξ. The same holds true for μ > 1. The
[Plot: average normalized slack δ∗ (logarithmic scale) against normalized step size σ∗, with curves for ξ = 0.1, 1.0, and 10.0]
Fig. 7.1 Average normalized slack δ∗ plotted against the normalized step size σ∗. Solid lines
represent results for μ = 1, while dashed lines represent results for μ = 3. In both cases, λ = 10.
Marked points represent experimental data from runs of the strategy with scale invariant step size
and dimension N = 40
case of μ = 3 shows larger overall values of normalized slack than for μ = 1. This
can be explained by noting that by averaging across multiple offspring, selection
pressure for remaining close to the constraint boundary is reduced, and candidate
solutions will tend to drift farther away. The data points appear to match very closely
to the predicted curves throughout, which suggests that using the Dirac delta model
is suitable for the range of parameters considered in the plot.
Δ^(t) = −N ln( x₁^(t+1) / x₁^(t) ) = −N ln( 1 + σ∗ z̄₁ / (N(ξ + δ∗/N)) ).    (7.18)
Dropping quadratic and higher order terms from the Taylor series expansion of the
logarithm and taking expected values, as N → ∞ this becomes
Δ∗ = −σ∗ E[z̄₁] / ξ.    (7.19)
That is, convergence rates are affected by the normalized step size of the strategy as
well as by the population size parameters μ and λ that are implicit in E[z̄₁].
Higher convergence rates can be achieved by using larger values of μ and λ.
However, increasing the population size parameters also increases the computational costs of a single iteration of the algorithm. We consider two cost models for
comparing different parameter settings. In the first model, we assume that objective
function evaluations have a uniform cost that dominates the cost of all other operations involved in Algorithm 1. In particular, the cost of constraint function evaluations
is assumed to be negligible in this model. In the second cost model, we assume that
the cost of constraint function evaluations dominates all other costs. Optimal performance under the first cost model requires optimizing Δ∗_obj = Δ∗/λ, as the number
of objective function evaluations per iteration equals λ. Optimal performance under
the second cost model involves optimizing Δ∗_feas = Δ∗Pfeas/λ, as λ/Pfeas is the
expected number of constraint function evaluations per iteration.
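The bookkeeping behind the two cost models is simple enough to verify with a short sketch. The function below is a generic stand-in: the per-iteration progress value, population size, and feasibility probability passed to it are illustrative numbers, not quantities produced by the chapter's analysis.

```python
def per_evaluation_rates(delta_star, lam, p_feas):
    """Convert a per-iteration convergence rate into per-evaluation rates.

    Each iteration costs lam objective evaluations, and on average
    lam / p_feas constraint evaluations because infeasible offspring are
    resampled until they are feasible.
    """
    rate_obj = delta_star / lam               # objective-cost model
    rate_feas = delta_star * p_feas / lam     # constraint-cost model
    return rate_obj, rate_feas


if __name__ == "__main__":
    print(per_evaluation_rates(delta_star=0.8, lam=10, p_feas=0.4))
```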
In Fig. 7.2, the probability Pfeas of generating feasible offspring is shown for the
(μ/μ, λ)-ES with scale invariant step size for λ = 10 and μ ∈ {1, 3}. The lines have
been obtained from Eq. (7.11), with the normalized slack computed using the Dirac
delta model as above. The data points were calculated from averages over the same
runs of 20,000 iterations used to generate Fig. 7.1. As observed for the μ = 1 case
in Arnold (2013a), the probability Pfeas decreases with increasing σ∗, going below
[Plot: probability Pfeas against normalized step size σ∗, with curves for ξ = 0.1, 1.0, and 10.0]
Fig. 7.2 Probability Pfeas of a random offspring candidate solution being feasible plotted against
the normalized step size σ∗. Solid lines represent results for μ = 1, while dashed lines represent
results for μ = 3. In both cases, λ = 10. Marked points represent experimental data from runs of
the strategy with scale invariant step size and dimension N = 40
[Plot: convergence rate Δ∗ against normalized step size σ∗, with curves for ξ = 0.1, 1.0, and 10.0]
Fig. 7.3 Convergence rate Δ∗ plotted against the normalized step size σ∗. Solid lines represent
results for μ = 1, while dashed lines represent results for μ = 3. In both cases, λ = 10. Marked
points represent experimental data from runs of the strategy with scale invariant step size and
dimension N = 40
one half and appearing to approach zero for large σ∗. For equal normalized step
size, Pfeas is larger for μ = 3 than for μ = 1, which is unsurprising as it has been
observed in Fig. 7.1 that μ = 3 results in larger normalized slack values.
Figure 7.3 shows the convergence rate Δ∗ of the (μ/μ, λ)-ES with scale invariant
step size for λ = 10 and μ ∈ {1, 3}. The data points were calculated from averages
computed over the same runs used to generate Figs. 7.1 and 7.2, and the curves were
computed using Eq. (7.19) after solving Eq. (7.17) numerically for the normalized
slack. As observed for the μ = 1 case in Arnold (2013a), each curve first increases
with increasing step size before it starts decreasing and eventually turns negative
(indicating divergence of the strategy). This overall pattern introduces the notion
of an optimal normalized step size that maximizes the rate of convergence Δ∗.
Larger values of ξ, which correspond to more narrow cones delimiting the feasible
region, appear to admit higher maximal convergence rates. In terms of the strategy's
behaviour, this suggests that narrower regions of feasibility funnel the candidate
solutions toward the optimum solution by inherently limiting the choice of offspring
in perpendicular directions.
Figure 7.4 shows the behaviour of various quantities when the normalized step
size is fixed at the optimum values σ∗_obj and σ∗_feas which maximize Δ∗_obj and Δ∗_feas,
respectively. The resulting probability of generating feasible offspring, convergence
rates relative to the number of objective and constraint function evaluations, and the
optimal step size itself are all plotted for the (μ/μ, λ)-ES with λ = 10 and μ ∈ {1, 3}.
The data for the curves was generated by numerically computing the optimal values
σ∗_obj and σ∗_feas using Eqs. (7.11) and (7.19) with the Dirac delta model.
For σ∗_obj (shown with solid lines), a cost model is assumed where objective function
evaluations dominate overall computational costs. The case of μ = 1 corresponds to
the observations made in Arnold (2013a). The probability Pfeas is higher for μ = 3
than for μ = 1, and the same is true for σ∗_obj for sufficiently large ξ. For all choices
[Four panels: optimal step size, Pfeas, Δ∗_obj, and Δ∗_feas, each with curves for μ = 1 and μ = 3 against the constraint parameter ξ]
Fig. 7.4 Optimal normalized step size σ∗, probability Pfeas of generating feasible offspring, convergence rate Δ∗_obj relative to the number of objective function evaluations, and convergence rate Δ∗_feas
relative to the number of constraint function evaluations are plotted against constraint parameter ξ
for λ = 10 and μ ∈ {1, 3}. All figures use solid lines to indicate the optimal normalized step
size σ∗_obj, and dotted lines to indicate the optimal normalized step size σ∗_feas
[Four panels with curves for ξ = 0.01, 0.1, 1.0, and 10.0 against the truncation ratio μ/λ]
Fig. 7.5 Optimal normalized step size σ∗_obj, probability Pfeas of generating feasible offspring,
normalized convergence rate Δ∗_obj relative to the number of objective function evaluations, and
normalized convergence rate Δ∗_feas relative to the number of constraint function evaluations are
plotted against truncation ratio μ/λ for λ = 10. All figures use the optimal normalized step size σ∗_obj.
The data points are joined by lines for ease of visibility
normalized convergence rate relative to the two cost models show optimal behaviour
for intermediate values of μ, except for very small values of ξ where μ = 1 is
optimal. For both models, the optimal value of μ appears to increase monotonically
with respect to ξ.
In Fig. 7.6, the optimal normalized step size σ∗_feas, corresponding probability Pfeas
of generating feasible offspring, and convergence rates relative to both cost models
are shown for λ = 10 and varying μ. All points were generated by computing the
optimal normalized step size σ∗_feas using the same method as in Fig. 7.5, adjusted for
the different cost model. Throughout, the values seem more tightly clustered than
in Fig. 7.5. The optimal value of μ for both cost models still appears to increase
monotonically with respect to ξ.
[Four panels with curves for ξ = 0.01, 0.1, 1.0, and 10.0 against the truncation ratio μ/λ]
Fig. 7.6 Optimal normalized step size σ∗_feas, probability Pfeas of generating feasible offspring,
normalized convergence rate Δ∗_obj relative to the number of objective function evaluations, and
normalized convergence rate Δ∗_feas relative to the number of constraint function evaluations are
plotted against truncation ratio μ/λ for λ = 10. All figures use the optimal normalized step size σ∗_feas.
The data points are joined by lines for ease of visibility, and the scales are kept identical to Fig. 7.5
for straightforward comparison
s_δ = (1/R) ∑_{i=2}^{N} s_i x_i    (7.20)
refers to the magnitude of the component of vector s which points in the direction
from the axis of the cone to candidate solution x. Together with the component s₁,
normalized slack δ∗, normalized step size σ∗, and deviation ‖s‖² − N, this describes
the state of the strategy. This gives a five-dimensional parameter space for modeling the Markov process, compared to the one-dimensional parameter space used in
Sect. 7.4. Using the consequence given in Eq. (7.17) of the existing update rule for
δ∗, and known expected values E[z̄₁], E[z̄_δ] as computed in the Appendix, then by
following a similar approach to that of Arnold (2013a) and Arnold and Beyer (2010)
we will derive update rules and model the stationary distributions for s₁, s_δ, and ‖s‖²
in order to completely describe the expected behaviour of the system when using
CSA.
An immediate consequence of the update of the search path in Line 10 of Algorithm 1 is the update equation
s₁^(t+1) = (1 − c) s₁^(t) + √(μc(2 − c)) z̄₁^(t)
for the component of s contained in the subspace spanned by the x₁ axis, where
superscripts indicate iteration number. Employing the Dirac delta model in the dynamical
systems approach and requiring that E[s₁^(t+1)] = s₁^(t) results in
s₁ = √(μ(2 − c)/c) E[z̄₁].    (7.21)
(7.21)
#
R (t)
(t) (t)
(t)
(t)
s
= (t+1) (1 c) s +
z
N 2...N 2...N
R
(t) (t) 2
(t)
z2...N
+ c(2 c) z +
.
N
Then applying Eqs. (7.14) and (7.15) while omitting terms that disappear in the limit
N yields
(t+1)
s
(t)
(1 c)s
c(2 c) z +
(t+1)
s =
.
(t)
] = s , we have
(2 c)
E[z ] +
c
(7.22)
s(t+1) =
N
(t)
(t) 2
(1 c)si + c(2 c)z i
i=1
(t) (t)
= (1 c)2 s(t) 2 + 2(1 c) c(2 c)(z 1 s1
(t) (t)
+ z s ) + c(2 c)z(t) 2 .
Taking expected values, imposing the condition E[s(t+1) 2 ] = s(t) 2 , and recalling
that E[z2 ]/N = 1/ for large N , this becomes
s2 = (12c +c2 )s2 +2(1c) c(2 c)(E z 1 s1 +E z s )+c(2c)N .
Using Eqs. (7.21) and (7.22) gives
2(1 c)
E z 1
s N =
c
2
+ E z
+ E z
(7.23)
as an approximation for the average deviation of the squared length of the search
path from the expected value in the case of uncorrelated steps.
Finally, considering the normalized step size, using Eqs. (7.7) and (7.15) with the
update rule in Line 11 of Algorithm 1 results in
(t+1)
s(t+1) 2 N
R (t) (t)
= (t+1)
exp
.
2D N
R
1
=$
2
(t)
(t)
1 + 2 z /N +
/(N )
(t)
s(t+1) 2 N
exp
2D N
Using the Taylor expansions for 1/ 1 + x and exp(x) and dropping all terms of
quadratic and higher order we arrive at
#
(t+1)
(t)
1
1
N
(t)
(t) z
(t)
+
2
s(t+1) 2 N
+
2D N
.
s2 N
2
=
.
2
2D
Applying Eq. (7.23) to the right hand side while again taking expected values, this
yields
2(1 c)
2
2
2
=
E[z ] .
E[z 1 ] + E[z ] +
E[z ] +
2
2cD
constant D may be set to 1/c = √N. Re-arranging the terms above while simplifying
and omitting those that vanish as N gives
2 = 22 E[z 1 ]2 + E[z ]2
(7.24)
[Four panels: normalized step size σ∗, probability Pfeas, Δ∗_obj, and Δ∗_feas under CSA, plotted against the constraint parameter ξ for μ = 1 and μ = 3]
Fig. 7.7 Normalized step size σ∗, probability Pfeas of generating feasible offspring, convergence
rate Δ∗_obj relative to the number of objective function evaluations, and convergence rate Δ∗_feas relative
to the number of constraint function evaluations plotted against constraint parameter ξ. All plots
represent runs using CSA to control the step size. Values for μ = 1 and μ = 3 are compared for
λ = 10. In all figures, the marked points represent experimental data from runs of the strategy using
dimension N = 40 (+) and dimension N = 400 (×). The extra black dotted lines are provided for
reference, and indicate the curves for normalized step size optimized for Δ∗_obj as shown in Fig. 7.4
as an approximation to the average normalized step size that CSA will generate in
the stationary state of the strategy.
In Fig. 7.7, the average normalized step size, the probability Pfeas of generating
feasible offspring, and the normalized convergence rates relative to the two cost
models are plotted when using CSA to control the value of σ. The curves were
generated by numerically solving Eqs. (7.17) and (7.24) with Eqs. (7.27) and (7.30).
The data points were determined by averaging results from runs of 20,000 iterations
of the (μ/μ, λ)-ES using CSA. As before, the first 40N iterations were discarded to
avoid initialization biases, and resampling offspring over 1,000 times resulted in no
further data points included from that run. Step sizes generated using CSA with μ = 3
are larger than those generated with μ = 1, and in both cases the values generated
are close to the optimal ones for the Δ∗_obj cost model (shown with dotted lines)
except where ξ is large and CSA results in significantly smaller than optimal values.
Considering Pfeas, the probability of generating feasible offspring decreases with
increasing constraint parameter, though not as rapidly as in Fig. 7.4 when optimized
for Δ∗_obj. Values of the convergence rate Δ∗_obj relative to the number of objective
function evaluations are close to optimal throughout, provided that N is large enough
for the approximations to be sufficiently accurate. Values of the convergence rate
Δ∗_feas relative to the number of constraint function evaluations decrease and lose
[Four panels: Pfeas (top) and convergence rate Δ∗ (bottom) against dimension N, with curves for ξ = 0.1, 1.0, and 10.0; μ = 1 on the left and μ = 3 on the right]
Fig. 7.8 Probability Pfeas and convergence rate Δ∗ plotted against search space dimension N. The
left hand graphs represent results for μ = 1 and those on the right for μ = 3. In both cases,
λ = 10. The horizontal lines represent results obtained using the dynamical systems approach
assuming N → ∞. The marked points represent results measured in runs of the (μ/μ, λ)-ES with
cumulative step size adaptation
accuracy with increasing constraint parameter, mirroring the behaviour of Pfeas. The
relatively inaccurate predictions of the convergence rates for μ = 3 and N = 40
can be explained from the large observed values of the normalized slack causing
significant error when dropping the term δ∗/N compared to ξ in the calculation going
from Eqs. (7.18) to (7.19). Measurements for N = 400 are noticeably more accurate
in this case.
Finally, Fig. 7.8 illustrates the accuracy of the predictions made using the dynamical systems approach in the limit N → ∞ by comparing the estimates for the
probability Pfeas of generating feasible offspring and the convergence rate Δ∗ with
measurements made in runs of the (μ/μ, λ)-ES with cumulative step size adaptation
as described above. It can be seen that the error in the predictions decreases with
increasing search space dimensionality, though not necessarily monotonically. Predictions for small values of ξ are more accurate than those for larger values of the
constraint parameter, and the error in the predictions of Δ∗ is generally larger for
μ = 3 than it is for μ = 1. While in the latter case the error is below 15 % for N as
small as 20, μ = 3 requires N an order of magnitude larger in order to achieve that
level of accuracy for larger values of ξ.
7.6 Conclusion
We have analyzed the behaviour of the (μ/μ, λ)-ES with cumulative step size adaptation applied to a conically constrained problem where the gradient direction coincides with the cone's axis, and the optimal solution lies in the cone's apex, on the
boundary of the feasible region. Under the assumption of scale invariant step size,
we used a Markov process model to estimate the evolving slack of candidate solutions and the overall operation of the strategy probabilistically. More narrow conic
regions of feasibility were found to result in higher convergence rates, for appropriately chosen normalized step size. If choosing the step size to maximize the rate of
convergence, the strategy performed better with larger choices of μ when the feasible
region was narrow, while μ = 1 was a better choice for feasible regions approaching
the half-space.
An offsetting factor for the high convergence rates in narrow regions of feasibility was that these regions also resulted in a lower probability of feasible offspring,
requiring more resampling in each generation on average. Selecting more offspring
for recombination with larger μ could improve the probability of offspring being
feasible in these narrow regions, but would not improve the rate of convergence in
more broad regions of feasibility. As the region approaches the half-space, choosing
μ > 1 would eventually reduce the convergence rate. The balance between the probability of generating feasible offspring and the rate of convergence was considered
using two cost models: one that assumes that objective function evaluations dominate
computational costs, and one that assumes that constraint function evaluations play
that role.
Using cumulative step size adaptation was found to lead to convergence, usually
at a rate close to the optimal one, at least for sufficiently large N . However, the
predicted convergence rates were notably inaccurate when both μ and ξ were large
and the feasible region was narrow. In these cases, the strategy moves farther from
the constraint boundary, developing a large average value of normalized slack. With
dimension N = 40, the error term then dominates the predicted convergence rate.
With larger dimensional problems, the observed values once again approached the
predicted rate.
Acknowledgments This research was supported by the Natural Sciences and Engineering Research
Council of Canada (NSERC).
7.7 Appendix
The derivation of expressions for E[z̄₁] and E[z̄_δ] closely follows similar calculations
by Arnold (2013b), with differences due to the task here being minimization rather
than maximization and the underlying probability distributions differing from those
that hold for the linearly constrained problem.
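Before working through the densities, it can be useful to recall what quantities such as E[z̄₁] represent: the expected average of the μ best out of λ samples under the relevant distribution. The sketch below is a generic sanity check only; it ignores the feasibility truncation and uses plain standard normal samples, so the numerical value it produces (the negative of the usual progress coefficient) is an illustration rather than a quantity derived in this chapter.

```python
import random


def mean_of_mu_best(mu, lam, trials=100000, seed=0):
    """Monte Carlo estimate of the expected average of the mu smallest out of
    lam independent standard normal samples (minimization convention)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(lam))
        total += sum(sample[:mu]) / mu
    return total / trials


if __name__ == "__main__":
    # For (3/3, 10)-selection the estimate is roughly -1.07, i.e. the
    # negative of the progress coefficient c_{3/3,10}.
    print(mean_of_mu_best(3, 10))
```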
p₁^(k;λ)(x) = (λ! / ((λ − k)!(k − 1)!)) p₁(x) [1 − P₁(x)]^(λ−k) [P₁(x)]^(k−1).    (7.25)
Since the value of z̄₁ is the average of the μ best individuals, its expected value can
be expressed as
1
E z 1 =
!
=
(k;)
x p1
(x) dx
k=1
[1 P1 (x)]k [P1 (x)]k1
dx.
x p1 (x)
( k)!(k 1)!
k=1
1
Q k [1 Q]k1
=
z 1 (1 z)1 dz (7.26)
( k)!(k 1)!
( 1)!( 1)!
Q
k=1
E z 1
= ( )
x p1 (x)
1P
1 (x)
z 1 (1 z)1 dz dx.
E z 1
= ( )
x p1 (x) p1 (y) [1 P1 (y)]1 [P1 (y)]1 dy dx
(7.27)
where
y
I1 (y) =
x p1 (x) dx.
We introduce abbreviations
Ax =
and
+ 2 x 2
2
2
B=
2 + 2
u = (A x )
2
u = eA x /2 / 2
yielding
I1 (y) =
y
1
2 Pfeas
xex
2 /2
(A x ) dx
1
2 Pfeas
= p1 (y) +
ey 2 /2 A y + 1
2
1
2 Pfeas
y
e(x
2 +A2 )/2
x
y
ex
2 /2
eA x /2 dx
2
dx.
The remaining integral can be solved by quadratic completion of the argument to the
exponential function and subsequent change of variable, resulting in
1
1
2
1 + Ay B .
eB /2
I1 (y) = p1 (y) +
2 Pfeas 1 +
(7.28)
Together with Eq. (7.27), the expression in Eq. (7.28) allows numerically computing
the expected value of z 1 .
p (y | z 1 = x) =
p1, (x, y)
,
p1 (x)
where the densities on the right hand side are given in Eqs. (7.12) and (7.13). The
corresponding conditional expected value is therefore
E z | z1 = x =
p1, (x, y)
dy
p1 (x)
1
2
2
ex /2 eA x /2 .
2 p1 (x)Pfeas
(7.29)
We use Eqs. (7.25) and (7.26) to express the expected value of this component for
the average of the best individuals, and write analogously to the calculations for
E[z 1 ]
E z
1
=
(k;)
E z | z 1 = x p1 (x) dx
k=1
= ( )
p1 (y) [1 P1 (y)]1 [P1 (y)]1 I2 (y) dy (7.30)
where
y
I2 (y) =
E z | z 1 = x p1 (x) dx.
y
e(x
2 +A2 )/2
x
dx.
Again using quadratic completion for the argument to the exponential function and
performing a change of variable results in
I2 (y) =
2
eB /2
1 + Ay B .
2 Pfeas 1 +
1
(7.31)
Together with Eq. (7.30), the expression in Eq. (7.31) allows numerically computing
the expected value of z .
References
Arnold DV (2002) Noisy optimization with evolution strategies. Kluwer Academic Publishers,
Dordrecht
Arnold DV (2011a) Analysis of a repair mechanism for the (1, λ)-ES applied to a simple constrained
problem. In: Genetic and evolutionary computation conference – GECCO 2011. ACM Press, pp
853–860
Arnold DV (2011b) On the behaviour of the (1, λ)-ES for a simple constrained problem. In: Beyer
H-G, Langdon WB (eds) Foundations of genetic algorithms – FOGA 2011. ACM Press, New
York, pp 15–24
Arnold DV (2013a) On the behaviour of the (1, λ)-ES for a conically constrained problem. In:
Genetic and evolutionary computation conference – GECCO 2013. ACM Press, pp 423–430
Arnold DV (2013b) Resampling versus repair in evolution strategies applied to a constrained linear
problem. Evol Comput 21(3):389–411
Arnold DV, Beyer H-G (2010) On the behaviour of evolution strategies optimising Cigar functions.
Evol Comput 18(4):661–682
Arnold DV, Brauer D (2008) On the behaviour of the (1 + 1)-ES for a simple constrained problem.
In: Rudolph G et al (eds) Parallel problem solving from nature – PPSN X. Springer, Berlin, pp
1–10
Auger A, Hansen N (2006) Reconsidering the progress rate theory for evolution strategies in finite
dimensions. In: Genetic and evolutionary computation conference – GECCO 2006. ACM Press,
pp 445–452
Balakrishnan N, Rao CR (1998) Order statistics: an introduction. In: Balakrishnan N et al (eds)
Handbook of statistics, vol 16. Elsevier, New York, pp 3–24
Beyer H-G (1989) Ein Evolutionsverfahren zur mathematischen Modellierung stationärer Zustände
in dynamischen Systemen. PhD thesis, Hochschule für Architektur und Bauwesen, Weimar
Beyer H-G (2001) The theory of evolution strategies. Springer, Heidelberg
Beyer H-G, Schwefel H-P (2002) Evolution strategies – a comprehensive introduction. Nat Comput
1(1):3–52
Meyer-Nieberg S, Beyer H-G (2012) The dynamical systems approach – progress measures and
convergence properties. In: Rozenberg G et al (eds) Handbook of natural computing. Springer,
Berlin, pp 741–814
Mezura-Montes E, Coello Coello CA (2011) Constraint-handling in nature-inspired numerical optimization: past, present, and future. Swarm Evol Comput 1(4):173–194
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32
Ostermeier A, Gawelczyk A, Hansen N (1994) Step-size adaptation based on non-local use of
selection information. In: Davidor Y et al (eds) Parallel problem solving from nature – PPSN III.
Springer, Berlin, pp 189–198
Oyman AI, Deb K, Beyer H-G (1999) An alternative constraint handling method for evolution
strategies. In: Proceedings of the 1999 IEEE congress on evolutionary computation. IEEE Press,
pp 612–619
Rechenberg I (1973) Evolutionsstrategie – Optimierung technischer Systeme nach Prinzipien der
biologischen Evolution. Friedrich Frommann Verlag, Stuttgart
Schwefel H-P (1981) Numerical optimization of computer models. Wiley, Chichester
Chapter 8
with nonoverlapping topology with small swarm size in each sub-swarm performs
better in terms of locating different feasible regions in comparison to other topologies,
such as the global best topology and the ring topology.
Keywords Constrained optimization · Feasible regions · Disjoint feasible regions ·
Particle swarm optimization
8.1 Introduction
A constrained optimization problem (COP) is formulated as follows:
find z ∈ S ⊆ R^D such that
(a) ∀ y ∈ S : f(z) ≤ f(y)
(b) g_i(z) ≤ 0, for i = 1 to q    (8.1)
h_j(z) = 0, for j = 1 to m    (8.2)
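To make the notation concrete, the sketch below shows one way of representing the objective, the inequality constraints g_i, and the equality constraints h_j of Eqs. (8.1)–(8.2), together with a feasibility check. The toy problem used in the example is an assumption for illustration, not one of the benchmark problems discussed later.

```python
def is_feasible(z, ineq, eq, tol=1e-9):
    """Check g_i(z) <= 0 for all i and h_j(z) = 0 (within tol) for all j."""
    return all(g(z) <= 0.0 for g in ineq) and all(abs(h(z)) <= tol for h in eq)


if __name__ == "__main__":
    # Toy COP: minimize f(z) = z1 + z2 subject to one inequality and one
    # equality constraint.
    f = lambda z: z[0] + z[1]
    ineq = [lambda z: z[0] ** 2 + z[1] ** 2 - 1.0]   # g(z) <= 0
    eq = [lambda z: z[0] - z[1]]                     # h(z) = 0
    print(is_feasible([0.5, 0.5], ineq, eq))         # True
    print(is_feasible([1.0, 1.0], ineq, eq))         # False (violates g)
```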
[Fig. 8.1 A search space containing example solutions a, b, c, and d, where d is the optimal solution]
1998), among others. The task of the optimization algorithm is to generate new solutions at every iteration. In each optimization algorithm, an operator is needed to
compare candidate solutions, thus enabling the optimizer to select one (or more) of
the solutions.¹ This comparison operator plays a key role in the performance of the
algorithm in finding better solutions. In unconstrained problems, this comparison
operator is simple and, for a minimization problem, it is implemented as
x ∈ S is better than y ∈ S iff f(x) < f(y)    (8.4)
where f(·) : R^D → R is the objective function and x and y are two samples from
the search space. However, in COPs, in addition to the objective function, there are
constraints that need to be considered in the comparison procedure. There are three
cases for comparing two solutions x and y in a COP:
1. x ∈ F and y ∈ F, i.e. both are feasible
2. x ∉ F and y ∉ F, i.e. both are infeasible
3. x ∉ F and y ∈ F, i.e. one is feasible and the other is infeasible.
If the solutions follow the case (1) then the comparison is easy because it is
made in the same way as in Eq. 8.4 (both solutions are feasible). In cases (2) and
(3) however, this comparison is more complicated. Figure 8.1 provides examples to
show the reason behind the complications within cases (2) and (3).
In Fig. 8.1, both solutions a and b are infeasible. Also, assume that all constraint
values for solution a are smaller than the constraint values for solution b (i.e. gj (a) <
gj (b) for all j). However, solution b is much closer to the optimal solution than
solution a (d is the optimal solution). Thus, if solution b is selected, there is a greater
chance for the algorithm to improve the solution thereby reaching the optimal solution
in the next steps. Clearly, choosing one of a or b is not an easy task because solution
a is better than b in terms of one aspect (the value of constraints), while solution
b is better than a in terms of another aspect (closeness to the optimal solution).
¹ Note that this selection can be performed by a direct decision (the better solution is selected) or
by some analysis to find out the potential of the solutions. However, in either approach, the concept
of being better needs to be defined.
Also, choosing one of the solutions in case (3) is complicated. As an example, let
us concentrate on solutions b (an infeasible solution) and c (a feasible solution) in
Fig. 8.1. If solution c is selected, it is harder for the optimization algorithm to move
the solutions in the next steps toward the optimal solution, i.e., d. However, if solution
b is selected, although it is infeasible it is easier for the optimization algorithm to
move the solutions in the next steps toward the optimal solution. Clearly, the easiest
case is case (1) as the standard comparison between solutions can be used. However,
there are complications in regard to cases (2) and (3).
The aim of a CHT is to compare two solutions and decide which solution is the
better. Note that such a comparison needs to consider all the three aforementioned
cases. There are several categories of techniques for handling constraints that can
be incorporated in an optimization algorithm (Michalewicz and Schoenauer 1996);
these categories include penalty functions, special operators, repairs, decoders, and
hybrid techniques. In the category of penalty functions, the objective function is
combined with constraints in such a way that the problem is turned into an unconstrained problem. Thus, all solutions are feasible and, hence, comparisons follow
case (1) thereby making the comparison easy. In the category of special operators,
an operator is designed that always maps a feasible solution to a feasible solution.
Note that to use a technique in this category, the initial solutions need to be feasible. Because the solutions are always feasible all comparisons follow case (1), and
hence, comparison is done easily. In the category of repair, each infeasible solution
is repaired and a feasible solution is generated. In this case, two possibilities can be
considered: the original solution is kept in the population and is known as Baldwinian
evolution (Whitley et al. 1994), or it is replaced by the repaired solution known as
Lamarckian evolution (Whitley et al. 1994). In this category, because the solutions
are always feasible (repaired), again all comparisons follow case (1), thereby making the comparisons easier. In the category of decoder-based techniques, mapping
from genotype to phenotype is established such that any genotype is mapped into
a feasible phenotype. In this category, as with the previous categories, all solutions
are feasible, thus making it unnecessary to consider cases (2) and (3). Finally, the
last category, hybrid, includes all possible combinations of CHTs. It seems that all
CHTs try to apply some modification to the solutions (e.g., via repairing, applying
penalty) to get rid of the complications in comparison within cases (2) and (3).
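As a concrete illustration of the penalty-function category mentioned above, the sketch below folds the constraints of a COP into the objective with a fixed penalty coefficient. The coefficient value and the quadratic penalty form are assumptions chosen for the example, not a recommendation from this chapter.

```python
def penalized_objective(f, ineq, eq, r=1e3):
    """Turn a constrained problem into an unconstrained one by adding a
    quadratic penalty for every violated constraint (static penalty)."""
    def phi(z):
        viol = sum(max(0.0, g(z)) ** 2 for g in ineq)
        viol += sum(h(z) ** 2 for h in eq)
        return f(z) + r * viol
    return phi


if __name__ == "__main__":
    f = lambda z: z[0] ** 2
    ineq = [lambda z: 1.0 - z[0]]          # requires z[0] >= 1
    phi = penalized_objective(f, ineq, [])
    print(phi([2.0]), phi([0.0]))          # 4.0 (feasible) vs 1000.0 (penalized)
```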
There have been some attempts to design methods to explore the search space
of COPs to find a feasible solution: these methods are called constraint satisfaction methods (Tsang 1993). The acceptance criterion for a constraint satisfaction
method is at least one feasible solution. Normally, this feasible solution, found by
the constraint satisfaction method, is fed into an optimization method as an initial
solution, and the method improves the quality of this solution in terms of objective
value while maintaining feasibility. As feasible regions in COPs might have irregular shapes (e.g., disjoint, with holes, connected with narrow passages, non-convex),
the quality of the final solution, namely the improved solution by the optimization
method, is highly dependent on the location of the initial feasible solution. Figure 8.2
shows some examples of irregular shapes of feasible regions.
[Fig. 8.2 Examples of irregularly shaped feasible regions in a search space, including regions connected by a narrow feasible passage]
The term potentially disjoint feasible regions refers to disjoint feasible regions and the regions
that are connected with narrow passages. Also, note that without information about the topology of
the search space, it is not possible to claim that the found solutions are in disjoint feasible regions.
issues. Then, the MLPSO is extended in such a way that it can locate feasible regions
in a COP. To confirm that the proposed method performs effectively in locating
feasible regions, the performance of the method is tested through some test cases
where the locations of their feasible regions are known.
The rest of the chapter is organized as follows: some background on COPs and
CHTs is provided in Sect. 8.2. An overview of the PSO algorithm, including variants,
issues, topologies, and niching abilities, is given in Sect. 8.3. The proposed method
for locating feasible regions is presented and discussed in Sect. 8.4 and tested
in Sect. 8.5. Finally, we conclude the chapter and provide suggestions for
future research directions in Sect. 8.6.
The constraint violation G(x) of a solution x is defined as
G(x) = ∑_{i=1}^{q} max{0, g_i(x)}^k + ∑_{i=1}^{m} |h_i(x)|^k    (8.5)
where k is a constant (in all of the experiments represented in this paper, k = 2).
Each solution x is represented by the pair (f, G) where f is the objective value at x
and G is its constraint violation value. If f1 and f2 are the objective values and G1 and
G2 are the constraint violation values of the solution points x1 and x2, then the ε-level
comparison operator is defined as follows:
x1 ≺ε x2  ⟺  { f1 < f2,   if G1, G2 ≤ ε or G1 = G2
               G1 < G2,   otherwise    (8.6)
In other words, the ε-level comparison compares two solutions by constraint violation
value first. If both solutions have a violation value under a small threshold ε, or they
have the same level of violation, the two solutions are then ranked by the objective
function value only. Otherwise, the constraint violation value is taken into account.
There are some techniques to control the value of ε (Takahama and Sakai 2005).
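A minimal sketch of the violation measure of Eq. (8.5) and the ε-level comparison of Eq. (8.6) is given below. The exponent k = 2 follows the text, while the example constraint and the ε values used in the demonstration are hypothetical stand-ins.

```python
def violation(x, ineq, eq, k=2):
    """Constraint violation G(x) in the spirit of Eq. (8.5) with exponent k."""
    return (sum(max(0.0, g(x)) ** k for g in ineq)
            + sum(abs(h(x)) ** k for h in eq))


def eps_less(sol1, sol2, eps):
    """Epsilon-level comparison of Eq. (8.6) for (f, G) pairs.

    Returns True if sol1 is preferred over sol2."""
    f1, g1 = sol1
    f2, g2 = sol2
    if (g1 <= eps and g2 <= eps) or g1 == g2:
        return f1 < f2           # compare by objective value only
    return g1 < g2               # otherwise compare by violation


if __name__ == "__main__":
    ineq = [lambda x: x[0] - 1.0]            # requires x[0] <= 1
    a = (5.0, violation([1.2], ineq, []))    # slightly infeasible, good f
    b = (9.0, violation([0.5], ineq, []))    # feasible, worse f
    print(eps_less(a, b, eps=0.1), eps_less(a, b, eps=0.0))  # True False
```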
v^i_{t+1} = Φ(x^i_t, v^i_t, N^i_t), for i = 1, . . . , n    (8.7)
x^i_{t+1} = Ψ(x^i_t, v^i_{t+1}), for i = 1, . . . , n    (8.8)
p^i_{t+1} = { p^i_t,    if f(p^i_t) ≤ f(x^i_{t+1})
              x^i_{t+1}, otherwise            , for i = 1, . . . , n    (8.9)
In Eq. 8.7, N^i_t (known as the neighbor set of the particle i) is a subset of personal
best positions of some particles which contribute to the velocity updating rule of
that particle at iteration t, i.e. N^i_t = {p^k_t | k ∈ T^i_t ⊆ {1, 2, . . . , n}} where T^i_t is a
set of indices of particles which contribute to the velocity updating for particle i
at iteration t. Clearly, the strategy of determining T^i_t might be different for various
types of PSO algorithms and it is usually referred to as the topology of the swarm.
Many different topologies have been defined so far (Kennedy and Mendes 2002),
e.g., the global best topology (gbest), the ring topology, the nonoverlapping, and
the pyramid, that are discussed later in this paper. The function Φ(·) calculates
the new velocity vector for the particle i according to its current position, current
velocity v^i_t, and neighborhood set N^i_t. In Eq. 8.8, Ψ(·) is a function that calculates
³ In general, personal best can be a set of best positions, but all PSO types listed in this paper use
a single personal best.
(8.10)
In this equation, φ_1 and φ_2 are two real numbers called acceleration coefficients,4
and p_t^i and g_t are the personal best (of particle i) and the global best vector, respectively, at iteration t. Also, the role of the vectors PI = p_t^i − x_t^i (Personal Influence)
and SI = g_t − x_t^i (Social Influence) is to attract the particles to move toward known
quality solutions, i.e., the personal and global best. Moreover, R1_t and R2_t are two d × d
diagonal matrices,5 whose elements are random numbers distributed uniformly
(U(0, 1)) in [0, 1]. Note that the matrices R1_t and R2_t are generated at each iteration
for each particle separately.
In 1998, Shi and Eberhart (1998) introduced a new coefficient ω, known as the inertia
weight, to control the influence of the last velocity value on the updated velocity.
Indeed, Eq. 8.10 was rewritten as

v_{t+1}^i = ω v_t^i + φ_1 R1_t (p_t^i − x_t^i) + φ_2 R2_t (g_t − x_t^i)        (8.11)

The coefficient ω controls the influence of the previous velocity on the movement. The
iterative application of Eq. 8.11 (plus position updating) causes the particles to oscillate around the personal and global best vectors (Clerc and Kennedy 2002). This oscillation is controlled by the three parameters ω, φ_1, and φ_2 so that the larger ω is, with
respect to φ_1 and φ_2, the more explorative the particles are, and vice versa. In this
chapter, this variant is known as the standard PSO. In the standard PSO, if the random matrices are replaced by random values, the new variant is called the linear PSO
(LPSO).
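A minimal NumPy sketch of one iteration of the velocity and position updates (Eqs. 8.8 and 8.11) may help; the parameter values and array names below are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(0)

def standard_pso_step(x, v, pbest, gbest, omega=0.7, phi1=1.5, phi2=1.5):
    # Eq. 8.11 with diagonal random matrices R1, R2 (elementwise products).
    d = x.shape[0]
    r1 = rng.uniform(0.0, 1.0, d)   # diagonal of R1_t
    r2 = rng.uniform(0.0, 1.0, d)   # diagonal of R2_t
    v_new = omega * v + phi1 * r1 * (pbest - x) + phi2 * r2 * (gbest - x)
    return x + v_new, v_new         # position update of Eq. 8.8

def linear_pso_step(x, v, pbest, gbest, omega=0.7, phi1=1.5, phi2=1.5):
    # LPSO: the random matrices are replaced by random scalar values.
    r1, r2 = rng.uniform(0.0, 1.0, 2)
    v_new = omega * v + phi1 * r1 * (pbest - x) + phi2 * r2 * (gbest - x)
    return x + v_new, v_new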
There are several well-studied issues in the standard PSO, such as stagnation
(Bergh and Engelbrecht 2002, 2010), line search (Spears et al. 2010; Wilke et al.
2007a), swarm size (Bergh and Engelbrecht 2002, 2010), local convergence (Bergh
4
These two coefficients control the effect of the personal and global best vectors on the movement
of particles and they play an important role in the convergence of the algorithm. They are usually
determined by a practitioner or by the dynamics of particle movement.
5 Alternatively, these two random matrices are often considered as two random vectors. In this case,
the multiplication of these random vectors by PI and SI is element-wise.
and Engelbrecht 2010), and rotation variance (Spears et al. 2010; Wilke et al. 2007b).
Apart from these issues within PSO, there have been some attempts to extend the
algorithm to work with COPs (Liang et al. 2010; Paquet and Engelbrecht 2007;
Takahama and Sakai 2005), to support niching6 (Brits et al. 2002, 2007; Engelbrecht
et al. 2005; Li 2010), to work effectively with large-scale problems (Helwig and
Wanka 2007), and to work in nonstationary environments (Wang and Yang 2010).
v_{t+1}^i = χ ( v_t^i + c_1 R1_t (p_t^i − x_t^i) + c_2 R2_t (g_t − x_t^i) )        (8.12)

χ = 2k / | 2 − c − sqrt(c^2 − 4c) |        (8.13)
χ is called the constriction factor and it is proposed to set its value by Eq. 8.13.
Also, c = c_1 + c_2 > 4. Note that this notation is algebraically equivalent to that in
Eq. 8.11. The authors proved that if these conditions hold for the constriction factor,
particles converge to a stable point and the velocity vector does not grow to infinity.
The values of c_1 and c_2 are often set to 2.05 and the value of k is in the interval
[0, 1] (usually set to 1). Note that with these settings, the value of χ is in the interval
[0, 1]. This analysis was also done from other perspectives by Trelea (2003) and Bergh
and Engelbrecht (2006).
6 Niching is the ability of the algorithm to locate different optima rather than only one local
optimum. The niching concept is usually used in multi-modal optimization.
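The constriction factor of Eq. 8.13 is straightforward to compute; the short sketch below uses the usual settings c1 = c2 = 2.05 and k = 1 mentioned in the text (the function name is illustrative).

import math

def constriction_factor(c1=2.05, c2=2.05, k=1.0):
    # Eq. 8.13: chi = 2k / |2 - c - sqrt(c^2 - 4c)| with c = c1 + c2 > 4.
    c = c1 + c2
    assert c > 4.0, "the derivation requires c1 + c2 > 4"
    return 2.0 * k / abs(2.0 - c - math.sqrt(c * c - 4.0 * c))

print(constriction_factor())  # approximately 0.7298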
Although the constriction coefficient guarantees that the particles converge to a
point (a convergent sequence), there is no guarantee that this final point is a quality
point in the search space (Bergh and Engelbrecht 2006). In Bergh and Engelbrecht
(2010), it has been proven that for any c_1 and c_2 that satisfy the convergence conditions,
all particles collapse to the global best g_t, i.e., lim_{t→∞} x_t^i = p_t^i = g_t for all particles.
Also, if g_t = p_t^i = x_t^i for all particles, the velocity vector shrinks very fast. In this
situation, i.e., g_t = p_t^i = x_t^i for all particles and at the same time v_t^i = 0, all particles
stop moving and no improvement can take place, as all components for moving the
particles are zero. This issue is known as stagnation, and was first introduced as a
defect in the standard PSO (Bergh and Engelbrecht 2002) and further investigated
by Bergh and Engelbrecht (2010). This issue exists in both LPSO and CCPSO. A
variant of PSO was proposed (called Guaranteed Convergence PSO, GCPSO) which
addressed the stagnation issue. The only difference between GCPSO and CCPSO
was in updating the velocity of the global best particle (the particle whose personal
best is the current global best of the swarm).
v_{t+1}^i = −x_t^i + g_t + χ v_t^i + ρ_t (1 − 2 r_t),  if i = τ_t
            χ ( v_t^i + c_1 R1_t (p_t^i − x_t^i) + c_2 R2_t (g_t − x_t^i) ),  otherwise        (8.14)
where τ_t is the index of the particle whose personal best is the global best of the
swarm, i.e., g_t = p_t^{τ_t}, and ρ_t is generated randomly through an adaptive approach
(Bergh and Engelbrecht 2010). Note that, according to this formulation, stagnation
might still happen for all particles except for the global best particle. Hence, if the
global best particle is improved, g_t is improved, which causes the other particles to
get out of the stagnation situation. See Bonyadi and Michalewicz (2014) for more
information.
Another issue that is exclusive to LPSO is called line search (Wilke et al. 2007a);
if p_t^i − x_t^i ∥ g_t − x_t^i and v_t^i ∥ p_t^i − x_t^i, the particle i starts oscillating between its
personal best and the global best (line search) forever. In this case, only the solutions
that are on this line are sampled by the particle i and other locations in the search space
are not examined anymore. Wilke showed that this is not the case in the standard
PSO (Wilke et al. 2007a); however, there are some situations where the particles
in the standard PSO start oscillating along one of the dimensions while there is no
chance for them to get out of this situation (Bonyadi 2014; Spears et al. 2010; Bergh
and Engelbrecht 2010). Note that GCPSO does not have this issue.
Stagnation happens with a higher probability when the swarm size is small (Bergh
and Engelbrecht 2002); this is called the swarm size issue throughout this chapter.
In Bergh and Engelbrecht (2002), the authors argued that PSO is not effective when
its swarm size is small (2, for example), and particles stop moving in the early
stages of the optimization process. To address this issue, a new velocity updating rule
was proposed that was only applied to the global best particle to prevent its velocity from
becoming zero. Consequently, the global best particle never stops moving, which
solves the stagnation issue and, as a result, the swarm size issue is addressed as well.
Experiments confirmed that, especially on unimodal optimization problems, the
new algorithm is significantly better than the standard version when the swarm size
is small (with 2 particles). Note that, in LPSO, apart from the stagnation issue, the
line search issue is the reason why the algorithm becomes ineffective when the swarm
size is small.
v_{t+1}^i = v_t^i + c_1 R1_t (p_t^i − x_t^i) + c_2 R2_t (lb_t^i − x_t^i)        (8.15)
where lb_t^i is the best solution ever found by the particles i − 1, i, and i + 1, i.e.,
lb_t^i = p_t^{τ_t^i} where τ_t^i = argmin_{l ∈ T_t^i} F(p_t^l). It has been shown that if the algorithm
uses the ring topology, it explores for more iterations in comparison to the
gbest topology, thereby resulting in better explorative behavior.
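A small sketch of how lb_t^i can be obtained under the ring topology (Eq. 8.15): each particle looks only at the personal bests of itself and its two neighbours. The helper below is an assumption about data layout, not code from the chapter.

import numpy as np

def ring_local_best(pbest, pbest_fitness):
    # pbest: (n, d) personal best positions; pbest_fitness: (n,) values of F.
    # lb[i] is the best personal best among particles i-1, i, i+1 (indices wrap).
    n = pbest.shape[0]
    lb = np.empty_like(pbest)
    for i in range(n):
        neighbours = [(i - 1) % n, i, (i + 1) % n]
        best = min(neighbours, key=lambda j: pbest_fitness[j])
        lb[i] = pbest[best]
    return lb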
Another topology that is used in this chapter is called the nonoverlapping topology.
In this topology, the particles in the swarm are divided into several sets (called
sub-swarms) that are independent of each other. In fact, if we define the set
s_t^i = {i} ∪ T_t^i, then in any nonoverlapping topology, for every particle i and for all j
in {1, 2, . . . , n} \ s_t^i, the intersection of s_t^i and s_t^j is empty, i.e.,
∀i ∈ {1, 2, . . . , n}, ∀j ∈ {1, 2, . . . , n} \ s_t^i : s_t^i ∩ s_t^j = ∅. Note that, in this case,
the gbest topology is a special case of the nonoverlapping topology because, for all i, the
set {1, 2, . . . , n} \ s_t^i is empty and, consequently, the condition
s_t^i ∩ s_t^j = ∅ holds trivially for any j ∈ {1, 2, . . . , n} \ s_t^i. If the size of T_t^i is the same for all i,
7
A particle i is connected to particle j if it is aware of the personal best location of the particle j.
we show the topology by the notation nv_l, where l is the size of each sub-swarm.
Thus, the gbest topology can be indicated by nv_n.
There are other topologies (e.g., pyramid) and it is hard to review all of them.
Our review has been limited to the topologies that are used in the rest of the chapter.
For further information about topologies, the readers are referred to Kennedy and
Mendes (2002).
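The nonoverlapping topology nv_l can be represented simply by partitioning the particle indices into disjoint blocks of size l; this is a sketch under that interpretation, not code from the chapter.

def nonoverlapping_subswarms(n, l):
    # Partition particles 0..n-1 into disjoint sub-swarms of size l (nv_l).
    # With l = n this degenerates to the gbest topology (nv_n).
    assert n % l == 0, "swarm size should be divisible by the sub-swarm size"
    return [list(range(start, start + l)) for start in range(0, n, l)]

print(nonoverlapping_subswarms(12, 4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]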
(8.16)
where d is a vector that connects the center of the coordinates to the point d, m is the
mutation operator, and c and ε are two constants. Obviously, for every vector d, there
are two elements that the operator m should mutate: direction and magnitude. One
can consider two different ideas to design m: (1) it rotates d by a random rotation
matrix to perturb its direction and multiplies it by a random number to perturb its
magnitude, and (2) it adds a normal distribution to the vector, which mutates both the
length and direction. In the first design (rotating and then mutating the magnitude),
we can write
d' = m(d) = λ Θ d        (8.17)
where Θ is a rotation matrix and λ is a random scalar value. There are several ways
to design Θ, such as a Euclidean rotation equation (Ricardo and Prez-aguila 2004)
or an exponential map (Wilke et al. 2007b). However, both methods are of O(D^2)
time complexity (see also Bonyadi (2014)).
The second design of the operator m can be written as

d' = m(d) = d + N(0, σ)        (8.18)
Note that the GCPSO is another variant of PSO (introduced in Sect. 8.3) that does not have the
swarm size issue. However, it is not a good choice for niching using the nonoverlapping topology.
The reason is that, in GCPSO, the only particle which is able to move after stagnation is the global
best particle. All other particles stay unchanged until this particle is improved. As the global best
particle is only in one of the sub-swarms (the sub-swarms do not overlap with each other), this
particle cannot share its information (personal best) with particles in the other sub-swarms. Thus,
all other sub-swarms stay in the stagnation situation and only one of the sub-swarms may continue
searching. This leads to ineffective niching behavior, as only one of the sub-swarms converges to a
local optimum.
σ = c ||N(0, ε⃗)||,  if 0 ≤ ||d|| < ε
    c ||d||,          otherwise        (8.19)
where ||.|| is the norm operator, c is a constant, ε is a small real number, ε⃗ is a
vector in which the value of all dimensions is equal to ε, and N is the normal distribution.
If the length of the vector d is small, a random vector (N(0, ε⃗)) is generated and
used for the calculations instead. The mutation operator that uses Eqs. 8.18 and 8.19
is denoted by m(d, c, ε).
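The second mutation design (Eqs. 8.18 and 8.19) can be sketched as follows; the parameter names c and eps follow the text, while everything else (including the use of NumPy arrays) is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(1)

def mutate_vector(d, c, eps):
    # Eq. 8.19: the spread sigma is proportional to ||d||, except when d is
    # (almost) zero, in which case a random vector of scale eps is used instead.
    norm_d = np.linalg.norm(d)
    if norm_d < eps:
        sigma = c * np.linalg.norm(rng.normal(0.0, eps, d.shape[0]))
    else:
        sigma = c * norm_d
    # Eq. 8.18: add normally distributed noise, perturbing both length and direction.
    return d + rng.normal(0.0, sigma, d.shape[0])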
(8.20)
The parameters χ, c_1, and c_2 are exactly the same as the ones in CCPSO, while r1_t
and r2_t are two random values rather than random matrices. Note that in this variant
of LPSO, we have used the CCPSO model (defined in Eq. 8.12); however, any other type
of PSO can be used instead. If the values of ρ_t^i and ε_t^i are guaranteed to be nonzero,
v_{t+1}^i is always nonzero (these parameters are investigated later in this subsection).
Thus, the stagnation issue is addressed, i.e., there is no stagnation anymore. Also, as
the mutation m changes the direction of v_{t+1}^i, the condition v_t^i ∥ p_t^i − x_t^i is violated,
which implies that the line search issue does not exist in this variant of LPSO. We
propose an adaptive approach to set the value of ρ_t^i, which has been inspired by Bergh
and Engelbrecht (2002, 2010) with some modifications. In this adaptive approach,
the value of ρ_t^i for a particle i at time t is calculated by:
ρ_{t+1}^i = 2 ρ_t^i,    if s_t^i > s̄
            0.5 ρ_t^i,  if f_min < f_t^i ≤ f_max
            2 ρ_t^i,    if f_t^i > f_max and f_t^i mod q = 0
            ρ_t^i,      otherwise        (8.21)
where s_t^i (f_t^i) is the number of successive iterations up to the current iteration t in which the
personal best of the particle i has been (has not been) improved by at least imp_min
percent; this value was set to 10^-5 in all experiments. At each iteration, if the personal
best of the particle i was improved, s_t^i is increased by one and f_t^i is set to 0, and if
it was not improved, f_t^i is increased by one and s_t^i is set to 0. If s_t^i is larger than
the constant s̄ (set to 10 in all experiments), the value of ρ_t^i is multiplied by 2. This
multiplication, which grows the value of ρ_t^i, takes place to give the algorithm the
opportunity to sample further locations and improve faster. Also, if f_t^i is larger
than f_min and smaller than f_max, the value of ρ_t^i is reduced to enable the algorithm
to conduct local search around the current solutions and improve them. However, if the
value of f_t^i is even larger than f_max, the strategy of controlling ρ_t^i is reversed and ρ_t^i
starts to grow. The idea behind this is that if the current solution has not been improved for
a large number of successive iterations, the exploitation has been done and no better
solutions can be found in the current region. Thus, it is better to start jumping out
from the current local optimum to improve the probability of finding better solutions.
According to Eq. 8.21, the value of ρ_t^i is increased at a low rate (every q iterations)
in this situation (when f_t^i is very large) to prevent the algorithm from jumping with
big steps. The values of ρ_max and ρ_min are set to 1 and 1e-10, respectively. Also, the
values of f_min and s̄ are set to 10 as proposed in Bergh and Engelbrecht (2010),
f_max and q are set to 200 and 50, and ρ_0^i is set to 1 for all particles. We propose to
set the value of ε_t^i to 1/D^z, where z is a constant real value. Our experiments show that
z = 1.5 has acceptable performance in a wide range of optimization problems. Thus,
we use ε_t^i = 1/D^1.5 in all experiments.
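Since Eq. 8.21 is only partially legible in this copy, the following sketch simply encodes the verbal description above (success/failure counters and the thresholds s̄, f_min, f_max, q); it should be read as an approximation, with all names chosen for illustration.

def update_rho(rho, s_count, f_count, s_bar=10, f_min=10, f_max=200, q=50,
               rho_min=1e-10, rho_max=1.0):
    # s_count / f_count: successive iterations with / without improvement of the
    # personal best (by at least imp_min); see the description around Eq. 8.21.
    if s_count > s_bar:
        rho *= 2.0                       # improving steadily: sample further away
    elif f_min < f_count <= f_max:
        rho *= 0.5                       # temporarily stuck: refine locally
    elif f_count > f_max and f_count % q == 0:
        rho *= 2.0                       # stuck for long: slowly grow to jump out
    return min(max(rho, rho_min), rho_max)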
As was mentioned earlier, stagnation and line search are the main reasons behind
the swarm size issue in PSO. As the stagnation and line search issues have been
solved in MLPSO, it is very likely that the swarm size issue has been addressed as well. To
test whether the swarm size issue has been solved, we apply MLPSO, LPSO, and CCPSO to
some standard benchmark functions (taken from CEC2005 (Suganthan et al. 2005))
when all algorithms use 2 particles (n = 2). Table 8.1 shows the results.
Each algorithm was run 20 times for 1000D function evaluations (FE) for D = 10
and D = 30. The results have been compared based on the averages over 20 runs and
the Wilcoxon test (Wilcoxon 1945) (with a significance level of p = 0.05), which
is used to measure the significance of the differences. It is obvious from the table
that the proposed MLPSO has a significantly better performance in 8 out of all
10 cases in comparison with LPSO and CCPSO when the swarm size is small (n = 2) for
the 10-dimensional cases. Also, it is worse than CCPSO in only 2 cases, and this
worse performance is not significant based on the Wilcoxon test. Also, MLPSO was
significantly better than LPSO in all cases when D = 10. When D = 30, MLPSO is
Table 8.1 Comparison results between MLPSO and LPSO with small swarm size (n = 2)

D = 10
Function   MLPSO          LPSO          CCPSO
F1         450LC          30240.78      12259.54
F2         450LC          39143.76      14065.74
F3         362588.8LC     1.54E+09      1.52E+08
F4         59091.76L      47408.93      22805.53
F5         6682.806LC     26006.44      17362.1
F6         1492.037LC     2.91E+10      6.7E+09
F7         172.326LC      1525.607      369.1611
F8         119.746LC      119.301       119.553
F9         244.852LC      186.876       233.761
F10        167.377L       109.014       193.809

D = 30
Function   MLPSO          LPSO          CCPSO
F1         450LC          136525        87020.26
F2         445.696LC      717133.8      139933.9
F3         4347140LC      4.2E+09       1.86E+09
F4         622474.1LC     682395.7      205955.2
F5         21284.27LC     59100.9       39995.19
F6         2453.11LC      1.55E+11      1.01E+11
F7         179.919LC      5673.608      4008.997
F8         119.756LC      118.796       118.979
F9         9.17035LC      249.8911      134.7559
F10        442.103LC      720.9524      465.4687
significantly better than CCPSO and LPSO in all cases. These results confirm that
the proposed method works better than LPSO and CCPSO when the swarm size is
small.
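The statistical comparison reported in Table 8.1 can be reproduced in spirit with SciPy's rank-sum test (the Wilcoxon 1945 test for independent samples); the arrays below are placeholder data, not the chapter's results.

import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(2)
errors_a = rng.normal(10.0, 2.0, 20)   # e.g., final errors of one algorithm over 20 runs
errors_b = rng.normal(12.0, 2.0, 20)   # 20 runs of a second algorithm

stat, p_value = ranksums(errors_a, errors_b)
significant = p_value < 0.05           # significance level used in the chapter
print(stat, p_value, significant)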
(8.22)

where C_1 = sum_{i=1}^{D} (x_i − 1.5)^2 − 1, C_2 = sum_{i=1}^{D} (x_i + 1)^2 − 0.25,
C_3 = sum_{i=1}^{D} (x_i + 3)^2 − 0.0625, C_4 = sum_{i=1}^{D} (x_i + 2)^2 + 10^-5,
C_5 = sum_{i=1}^{D} (x_i − 3.5)^2 + 10^-5, and C_6 = sum_{i=1}^{D} (2x_i)^2 + 10^-5.
The objective function (f(x) versus x) is shown in Fig. 8.3 for the one- and two-dimensional cases.
It is clear that the function has six optima (at x = −3, x = −2, x = −1, x = 0,
x = 1.5, and x = 3.5). We apply MLPSO and CCPSO to the six circles function
with two different topologies: nv2 and nv4. In this test we set the maximum number
of FEs to 3000D and D ∈ {2, 5, 10, 15, 20, 25, 30, 40, 50}. After each run,
Fig. 8.3 The six circles function in a one-dimensional, b two-dimensional spaces
we evaluated the personal bests of all particles to find how close they are to the
different local optima of the objective function. We consider that the personal best of a
particle i (p_t^i) has located a local optimum if the mean squared error of p_t^i from that
local optimum, over all dimensions, is less than 0.05. We set n = 20 for this
test. Figure 8.4 shows the average results over 20 runs.
The performance of MLPSO is inferior to CCPSO in both topologies when the
number of dimensions is small (two-dimensional problems). The reason is that when
MLPSO is used, most of the sub-swarms converge to the global optimum of the six
circles function (x = 1.5 in all dimensions) and, hence, the number of located local
optima drops. However, when the number of dimensions grows, MLPSO with both
topologies outperforms CCPSO in terms of the number of local optima found. Also,
the nv2 topology performs more effectively (in terms of locating local optima) than
the nv4 topology in MLPSO. The reason behind this phenomenon is that we have
Fig. 8.4 Comparison results of applying MLPSO and CCPSO to the six circles function with nv2 and
nv4 topologies. The x axis is the number of dimensions and the y axis is the average number of found
local optima
used 20 particles in all cases. Thus, the number of sub-swarms in nv2 is greater
than the number of sub-swarms in nv4. Hence, the number of located local optima
is smaller when nv4 is used. In addition, the performance of MLPSO does not drop
when the number of dimensions grows.
Results presented in Fig. 8.4 confirm that MLPSO performs better than CCPSO
in locating different local optima. Note that this result was expected: as MLPSO
outperforms CCPSO with a small swarm size, MLPSO with small sub-swarms
should outperform CCPSO with small sub-swarms. Also, the performance of MLPSO
does not drop when the number of dimensions grows.
8.4.2.1 EMLPSO
In ELCH, the equality and inequality constraints were combined and a function called
the constraint violation function appeared. Also, a level of desired constraint violation
(called ε) was considered as the level of feasibility. The value of ε was reduced
linearly to zero during the optimization process. ELCH is modified by considering
the fact that equalities can be replaced by inequalities (Eq. 8.2). Hence, in ELCH,
we can modify the constraint violation function as follows:

G(x) = sum_{i=1}^{m} max{0, g_i(x)}^k        (8.23)

where g_i(x) for i = 1, . . . , q is the same as in Eq. 8.1, while g_i(x) is defined as
g_i(x) = |h_i(x)| for i = q + 1, . . . , m. Note that in this case, x is a feasible
solution if G(x) = 0. The ELCH technique that uses Eq. 8.23 is called MELCH throughout
this chapter. We incorporate the MELCH technique into the MLPSO algorithm (the result is
called EMLPSO) to enable the algorithm to deal with constraints. Also, as MELCH
combines all constraints into one function, locating different local optima of this
function corresponds to locating disjoint feasible regions. Note that G(x) = 0 is
essential to count x as a local optimum, as G(x) > 0 does not correspond to a feasible
solution, which is not desirable. We test the ability of EMLPSO with different
topologies to locate disjoint feasible regions in the next subsection.
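A sketch of the MELCH violation function of Eq. 8.23, with the equality constraints already rewritten as inequalities, might look like this; how a tolerance on the equalities is handled (Eq. 8.2) is deliberately left out, and the helper names are illustrative.

def melch_violation(x, ineq, eq, k=2.0):
    # Eq. 8.23: equality constraints are converted to inequalities, here simply
    # as g_i(x) = |h_i(x)|; any tolerance handling is omitted in this sketch.
    gs = [g(x) for g in ineq] + [abs(h(x)) for h in eq]
    return sum(max(0.0, g) ** k for g in gs)

def is_feasible(x, ineq, eq):
    # x is feasible exactly when G(x) = 0.
    return melch_violation(x, ineq, eq) == 0.0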
(8.24)
where the definition of C_1 to C_6 is the same as that given for Eq. 8.22. It is clear
that the function has three disjoint feasible regions (around x = 1.5, x = −1, and x = −3) in
which g(x) ≤ 0 (feasible regions). However, there are three trap regions (around x = −2,
x = 0, and x = 3.5) where the value of g(x) reduces rapidly to 10^-5. Because the value
of g(x) at these points is larger than 0, these solutions are not feasible (see Fig. 8.5).
We test the ability of EMLPSO with different topologies (gbest, ring, and nonoverlapping) to deal with this function. For the nonoverlapping topology, we test the algorithm with nv6, nv4, nv3, and nv2, i.e., 6, 4, 3, and 2 particles in each sub-swarm. In
this test we set the maximum number of function evaluations (FE) to 3000D/n and
use D = 10 and D = 30. Also, we set n = 12 to ensure that the swarm size is divisible
by 2, 3, 4, and 6. Table 8.2 shows the average of the results over 100 runs. The row
satisfaction is the percentage of the runs in which a feasible solution was found (e.g.,
EMLPSO with the ring topology found a feasible solution in 76 % of all runs). The
row No. of feasible regions (Avr) is the average number of feasible regions that were
located by the personal bests of the particles in the swarm, averaged over all runs
(e.g., EMLPSO with the ring topology found 1.18 of the three existing feasible regions
on average). The row locating optimal region (%) indicates the percentage of the
runs in which the algorithm found a feasible solution in the optimal region (in this
example, the region around x = 1.5). Comparing the results, it is clear that EMLPSO
with the nonoverlapping topology with 2 particles in each sub-swarm (nv2) has the best
performance in satisfying the constraints (100 %), locating different feasible regions
Fig. 8.5 The contour of the function introduced in Eq. 8.24: a the objective values, and b the objective
values in the feasible space
Table 8.2 Comparison of different topologies in EMLPSO for solving the COP defined in Eq. 8.24
for D = 10 and D = 30

D = 10                             Gbest   Ring   nv6    nv4    nv3    nv2
Satisfaction (%)                   58      76     78     95     96     100
No. of feasible regions (Avr)      1       1.17   1.27   1.4    1.65   2.06
Locating optimal region (%)        23      26     28     41     53     58

D = 30                             Gbest   Ring   nv6    nv4    nv3    nv2
Satisfaction (%)                   61      77     77     88     98     100
No. of feasible regions (Avr)      1       1.18   1.26   1.48   1.6    2.14
Locating optimal region (%)        24      27     31     42     50     73
(2.06 feasible regions on average out of all 3 existing regions), and finding the optimal
region (58 % of runs). Note that the last two measures (the average number of located
feasible regions and the percentage of runs locating the optimal region) are interrelated, since the ability of
the methods to find feasible regions improves the probability of finding the optimal
region. It is also clear that the results in the 30-dimensional space confirm the results
of the 10-dimensional space. Thus, there is a better performance in locating different
feasible regions when there are several small sub-swarms and a better performance
in improving the final solutions when there are a few large sub-swarms.
Table 8.3 The test functions used for the next experiments

Functions    Equation                                                                     Boundaries
Branin1      g1(x) = (x2 − 5.1x1^2/(4π^2) + 5x1/π − 6)^2 + 10(1 − 1/(8π))cos(x1) + 9     −5 ≤ x1 ≤ 10, 0 ≤ x2 ≤ 15
             g2(x) = x2 + x1^1.2 − 12
Rastrigin1   g1(x) = x1^2 + x2^2 + 20 − 20(cos(2πx1) + cos(2πx2))                        −5 ≤ x1 ≤ 5, −5 ≤ x2 ≤ 5
             g2(x) = x2 − x1^3
Schwefel1    g1(x) = x1 sin(√|x1|) + x2 sin(√|x2|) + 125                                 −150 ≤ x1 ≤ 150, −150 ≤ x2 ≤ 150
             g2(x) = x2 − x1^2/16 + 150
Table 8.4 Results of applying EMLPSO, CCPSO, and CC to three 2-dimensional COPs to locate
their feasible regions

              Branin1    Rastrigin1    Schwefel1
EMLPSO        3/50       20.4/90       5.5/50
CCPSO+MELCH   2.4/99     17.1/192      2.9/110
CC            3/50       16/50         3/50

The table reports the averages of the number of found disjoint feasible regions/needed FE over 20 runs
the constraints) to these problems. The PSO methods used the nv2 topology with 50
particles, because the CC method uses 50 initial solutions. The maximum number of FE
was also set to 3000D. Table 8.4 shows the average results over 20 runs of each
method.
Figure 8.6 shows the feasible regions of all three functions and the personal bests
of the particles after finding the feasible regions.
Clearly, the Branin1 function (Fig. 8.6a) contains 3 similarly sized disjoint feasible
regions fairly scattered over the search space. This makes the problem relatively
easy to solve for stochastic methods (such as EMLPSO). Also, the reported results
Fig. 8.6 A particular run of EMLPSO to locate disjoint feasible regions of a Branin1, b Rastrigin1,
and c Schwefel1. The red areas are feasible regions, the gray areas are infeasible regions, and the white
dots are the personal bests of the particles
in Table 8.4 show that the proposed EMLPSO located all feasible regions of the
Branin1 function.
Rastrigin1 (Fig. 8.6b) contains 36 disjoint feasible regions of many different
sizes. Some of these regions are very small, which makes it harder to locate them. In
this test problem, the proposed EMLPSO located 20.4 feasible regions (on average)
out of all 36. Compared to the other listed methods, EMLPSO located
more regions on average.
The Schwefel1 function (Fig. 8.6c) contains 6 disjoint feasible regions of different
sizes. Two of these regions are hard to locate as they are surrounded by two larger
feasible regions. In fact, the methods tend to move the solutions toward these larger
regions rather than the smaller ones in between. However, the proposed EMLPSO
could locate 5.5 of all 6 regions (on average) while the other methods, CC
and CCPSO+MELCH, located 3 and 2.9 feasible regions on average.
References
Bonyadi MR, Michalewicz Z (2014) A locally convergent rotationally invariant particle swarm
optimization algorithm. Swarm Intell 8(3):159198
Bonyadi MR, Li X, Michalewicz Z (2013) A hybrid particle swarm with velocity mutation for
constraint optimization problems. In: Genetic and evolutionary computation conference. ACM,
pp 18
Bonyadi MR, Michalewicz Z, Li X (2014) An analysis of the velocity updating rule of the particle
swarm optimization algorithm. J Heuristics 20(4):417452
Brits R, Engelbrecht AP, Van den Bergh F (2002) A niching particle swarm optimizer. In: 4th Asia-Pacific conference on simulated evolution and learning, vol 2. Orchid Country Club, Singapore,
pp 692696
Brits R, Engelbrecht AP, Van den Bergh F (2007) Locating multiple optima using particle swarm
optimization. Appl Math Comput 189(2):18591883
Clerc M, Kennedy J (2002) The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Trans Evol Comput 6(1):58-73
Dantzig G (1998) Linear programming and extensions. Princeton University Press, Princeton
Engelbrecht AP, Masiye BS, Pampard G (2005) Niching ability of basic particle swarm optimization
algorithms. In: Swarm intelligence symposium. IEEE, pp 397400
Gilbert JC, Nocedal J (1992) Global convergence properties of conjugate gradient methods for
optimization. SIAM J Optim 2(1):2142
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Publishing Company, Reading
Hansen N (2006) The CMA evolution strategy: a comparing review. In: Towards a new evolutionary
computation. Springer, Berlin, pp 75102
Helwig S, Wanka R (2007) Particle swarm optimization in high-dimensional bounded search spaces.
In: Swarm intelligence symposium. IEEE, pp 198205
Jabr RA (2012) Solution to economic dispatching with disjoint feasible regions via semidefinite
programming. IEEE Trans Power Syst 27(1):572573
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: International conference on neural
networks, vol 4. IEEE, pp 19421948
Kennedy J, Mendes R (2002) Population structure and particle swarm performance. In: Congress
on evolutionary computation, vol 2. IEEE, pp 16711676
Lasdon L, Plummer JC (2008) Multistart algorithms for seeking feasibility. Comput Oper Res
35(5):13791393
Li XD (2010) Niching without niching parameters: particle swarm optimization using a ring topology. IEEE Trans Evol Comput 14(4):150169
Liang JJ, Zhigang S, Zhihui L (2010) Coevolutionary comprehensive learning particle swarm optimizer. In: Congress on evolutionary computation. IEEE, pp 18
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):132
Paquet U, Engelbrecht AP (2007) Particle swarms for linearly constrained optimisation. Fundam
Inf 76(1):147170
Ricardo A, Prez-aguila R (2004) General n-dimensional rotations
Shi Y, Eberhart R (1998) A modified particle swarm optimizer. In: World congress on computational
intelligence. IEEE, pp 6973
Smith L, Chinneck J, Aitken V (2013) Constraint consensus concentration for identifying disjoint
feasible regions in nonlinear programmes. Optim Methods Softw 28(2):339363
Spears WM, Green DT, Spears DF (2010) Biases in particle swarm optimization. Int J Swarm Intell
Res 1(2):3457
Suganthan PN, Hansen N, Liang JJ, Deb K, Chen YP, Auger A, Tiwari S (2005) Problem definitions
and evaluation criteria for the CEC 2005 special session on real-parameter optimization. KanGAL
Report
Chapter 9
being a general concept, can be realized with any EA framework. In this chapter,
ECHT is combined with an improved differential evolution (DE) algorithm referred
to as EPSDE. EPSDE is an improved variant of DE based on an ensemble framework.
The performance of the proposed architecture is compared with state-of-the-art
algorithms.
Keywords Constraint handling
problems
9.1 Introduction
Optimization is an intrinsic part of life and of human activity. For example,
manufacturers seek maximum efficiency in the design of their production processes,
investors aim at creating portfolios that avoid high risk while yielding a good return,
traffic planners need to decide on the level and ways of routing traffic to minimize
congestion, etc.
Classical optimization techniques make use of differential calculus, where it is
assumed that the function is differentiable twice with respect to the design variables,
and that the derivatives are continuous in locating the optimum solution. Thus, classical methods have limited scope in practical real-world applications as objective
functions are characterized by chaotic disturbances, randomness, and complex nonlinear dynamics and may not always be continuous and/or differentiable. Recently,
population-based stochastic algorithms such as evolutionary algorithms (EAs) have become
well known for their ability to handle nonlinear and complex optimization problems. The primary advantage of EAs over other numerical methods is that they just
require the objective function values, while properties such as differentiability and
continuity are not necessary (Anile et al. 2005).
Many optimization problems in science and engineering involve constraints.
The presence of constraints reduces the feasible region and complicates the search
process. In addition, when solving constrained optimization problems, solution candidates that satisfy all the constraints are feasible individuals while individuals that
fail to satisfy any of the constraints are infeasible individuals. To solve constrained
optimization problems, EAs require additional mechanisms referred to as constraint
handling techniques. One of the major issues in constraint optimization using EAs
is how to deal with infeasible individuals throughout the search process. One way
to handle them is to completely disregard infeasible individuals and continue the search
process with feasible individuals only. This approach may be ineffective as EAs are
probabilistic search methods and potential information present in infeasible individuals can be wasted. If the search space is discontinuous, then the EA can also
be trapped in one of the local minima. Therefore, different techniques have been
developed to exploit the information in infeasible individuals. In the literature, several constraint handling techniques are proposed to be used with the EAs (Coello
Coello 2002). Michalewicz and Schoenauer (1996) grouped the methods for handling
constraints within EAs into four categories: preserving feasibility of solutions (Koziel
and Michalewicz 1999), penalty functions, make a separation between feasible and
infeasible solutions, and hybrid methods. A constrained optimization problem can
also be formulated as a multi-objective (Wang et al. 2007) problem, but it is computationally intensive due to non-domination sorting.
According to the No Free Lunch theorem (Wolpert and Macready 1997), no single
state-of-the-art constraint handling technique can outperform all others on every
problem. Hence, solving a particular constrained problem requires numerous trial-and-error runs to choose a suitable constraint handling technique and to fine-tune the
associated parameters. This approach clearly suffers from unrealistic computational
requirements in particular if the objective function is computationally expensive (Jin
2005) or solutions are required in real-time. Moreover, depending on several factors
such as the ratio between feasible search space and the whole search space, multimodality of the problem, the chosen EA and global exploration/local exploitation
stages of the search process, different constraint handling methods can be effective
during different stages of the search process.
In pattern recognition and machine learning (Rokach 2009; Zhang 2000), ensemble methodology has been successfully employed. Ensemble integrates different
methods available to perform the same task into a single method so that the reliability can be improved. For example, in classification, an ensemble model formed by
integrating multiple classifiers reduces the variance, or instability caused by single
methods and improves the classification efficiency or prediction accuracy.
In this chapter, an ensemble of constraint handling techniques (ECHT) with four
constraint handling techniques (Coello Coello 2002; Huang et al. 2006; Runarsson
and Yao 2000; Tessema and Yen 2006) is presented as an efficient alternative to
the trial-and-error-based search for the best constraint handling technique with its
best parameters for a given problem. In ECHT, each constraint handling technique
has its own population and each function call is efficiently utilized by each of these
populations. Ensemble being a general concept can be realized with any EA framework. In this chapter, we integrate ECHT with an improved version of DE algorithm
referred to as EPSDE. EPSDE is a version of the DE algorithm based on the concept of an ensemble (Mallipeddi et al. 2011). In EPSDE, a pool of distinct mutation and
crossover strategies, along with a pool of control parameters associated with DE, coexists throughout the evolution process and competes to produce offspring.
Experimental results show that the performance of ECHT-EPSDE is better than each
single constraint handling method used to form the ensemble and competitive to the
state-of-the-art algorithms.
subject to:

g_i(X) ≤ 0,  i = 1, . . . , p
h_j(X) = 0,  j = p + 1, . . . , m        (9.1)
Here f need not be continuous but must be bounded. S is the search space. p
and (m − p) are the number of inequality and equality constraints, respectively. The
inequality constraints that satisfy gi (X ) = 0 at the global optimum solution are
called active constraints. All equality constraints are active constraints. The equality
constraints can be transformed into inequality form and can be combined with other
inequality constraints as
G_i(X) = max{g_i(X), 0},           i = 1, . . . , p
         max{|h_i(X)| − δ, 0},     i = p + 1, . . . , m        (9.2)
where δ is a tolerance parameter for the equality constraints. An adaptive setting of the
tolerance parameter, originally proposed in Hamida and Schoenauer (2002)
and used in Mezura-Montes and Coello Coello (2003), Mezura-Montes and Coello
Coello (2005), and Wang et al. (2008), is adopted in our work with some modifications.
Therefore, the objective is to minimize the fitness function f(X) such that the optimal
solution obtained satisfies all the inequality constraints G_i(X). The overall constraint
violation of an infeasible individual is a weighted mean of all the constraints, which
is expressed as
υ(X) = ( sum_{i=1}^{m} w_i G_i(X) ) / ( sum_{i=1}^{m} w_i )        (9.3)
where w_i (= 1/G_max_i) is a weight parameter and G_max_i is the maximum violation of
constraint G_i(X) obtained so far. Here, w_i is set to 1/G_max_i, which varies during
the evolution in order to balance the contribution of every constraint in the problem
irrespective of their differing numerical ranges.
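Equations 9.2-9.3 can be sketched as below; G_max is assumed to be tracked across the run, and delta (the equality tolerance) plus all names are illustrative rather than taken from the chapter.

import numpy as np

def constraint_values(x, ineq, eq, delta=1e-4):
    # Eq. 9.2: inequalities g_i(X) <= 0 and equalities |h_j(X)| - delta <= 0.
    g = [max(gi(x), 0.0) for gi in ineq]
    h = [max(abs(hj(x)) - delta, 0.0) for hj in eq]
    return np.array(g + h)

def overall_violation(G, G_max):
    # Eq. 9.3: weighted mean of the constraint values with w_i = 1 / G_max_i,
    # where G_max_i is the largest violation of constraint i seen so far.
    w = 1.0 / np.maximum(G_max, 1e-30)   # guard against division by zero
    return float(np.sum(w * G) / np.sum(w))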
The search process for finding the feasible global optimum in a constrained problem can be divided into three phases (Wang et al. 2008), depending on the number
of feasible solutions present in the combined parent population and its offspring
population: (a) Phase 1: no feasible solution, (b) Phase 2: at least one feasible
solution, and (c) Phase 3: the combined offspring-parent population has more feasible
solutions than the size of the next-generation parent population. Different constraint
handling techniques perform differently during each of these three phases.
feasible individuals, then infeasible individuals with high fitness values will have
small penalties added to their fitness values. These two penalties allow the algorithm
to switch between finding more feasible solutions and searching for the optimum
solution at any time during the search process. This algorithm requires no parameter
tuning. The final fitness value based on which the population members are ranked is
given as F(X ) = d(X ) + p(X ), where d(X ) is the distance value and p(X ) is the
penalty value. The distance value is computed as follows:
d(X) = υ(X),                          if r_f = 0
       sqrt( f̃(X)^2 + υ(X)^2 ),       otherwise        (9.4)
where r_f is the ratio of the number of feasible individuals to the size of the current
combined population, and f̃(X) is the objective value of X normalized using the maximum and
minimum values of the objective function f(X) in the current combined population.
The penalty value is defined as

p(X) = (1 − r_f) M(X) + r_f N(X)        (9.5)

where

M(X) = 0,       if r_f = 0
       υ(X),    otherwise        (9.6)

N(X) = 0,       if X is a feasible individual
       f̃(X),    if X is an infeasible individual        (9.7)
Therefore, in Farmani and Wright (2003) and Tessema and Yen (2006), the selection
of individuals in all three phases is based on a value determined by the overall
constraint violation and the objective values. Thus, there is a chance for an individual
with a lower overall constraint violation and higher fitness to be selected over a feasible
individual with lower fitness, even in Phase 3, where there is a sufficient number of
feasible solutions to form the parent population using only feasible solutions.
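The self-adaptive penalty of Eqs. 9.4-9.7 can be sketched as follows; f_norm stands for the normalized objective value, and the pairing of M and N with the violation and objective terms follows the method as commonly described, so treat the sketch as an approximation with illustrative names.

import math

def sp_fitness(f_norm, viol, feasible, r_f):
    # d(X) of Eq. 9.4
    d = viol if r_f == 0.0 else math.sqrt(f_norm ** 2 + viol ** 2)
    # M(X) of Eq. 9.6 and N(X) of Eq. 9.7
    M = 0.0 if r_f == 0.0 else viol
    N = 0.0 if feasible else f_norm
    # p(X) of Eq. 9.5 and the final ranking value F(X) = d(X) + p(X)
    p = (1.0 - r_f) * M + r_f * N
    return d + p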
ε(0) = υ(X_θ)        (9.8)

ε(G) = ε(0) (1 − G/T_c)^cp,   0 < G < T_c
       0,                      G ≥ T_c        (9.9)
where X_θ is the top θ-th individual and θ = 0.05 · NP. The recommended
parameter ranges are (Takahama and Sakai 2006): T_c ∈ [0.1 T_max, 0.8 T_max] and cp
∈ [2, 10].
The selection of individuals in the three phases of evolution by the
ε-constraint technique is similar to that of SF, but in EC, a solution is regarded
as feasible if its overall constraint violation is lower than ε(G).
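The ε schedule of Eqs. 9.8-9.9 is simple to code; eps0 comes from the θ-th best individual of the initial population, and the function and argument names below are illustrative.

def epsilon_level(eps0, G, Tc, cp):
    # Eq. 9.9: epsilon shrinks from eps(0) to 0 over the first Tc generations.
    if G <= 0:
        return eps0
    if G < Tc:
        return eps0 * (1.0 - G / Tc) ** cp
    return 0.0

# Example with the recommended ranges Tc in [0.1*Tmax, 0.8*Tmax] and cp in [2, 10]:
print(epsilon_level(eps0=2.5, G=100, Tc=400, cp=5))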
lose diversity and the search ability of ECHT may deteriorate. Thus, the performance
of ECHT can be improved by selecting constraint handling methods with diverse and
competitive natures. The general framework of the ensemble algorithm is illustrated
in the flowchart shown in Fig. 9.1.
As ECHT employs different constraint handling methods, each having its own
population, it can be compared with hybrid methods like memetic algorithms
(Ishibuchi et al. 2003; Ong and Keane 2004; Ong et al. 2006). Some methods like
island models (Skolicki and De Jong 2007), sometimes called the migration model or
coarse-grained model, also employ subpopulations in their approach. The main
difference between ECHT and the island model is that in the island model, subpopulations in different islands evolve separately with occasional communication
between them to maintain diversity, while in ECHT the communication between different populations is through the sharing of all offspring, thus facilitating efficient usage
of each function call.
9.3.1 ECHT-EPSDE
In this section, an ECHT with EPSDE as the basic search algorithm (ECHT-EPSDE)
is demonstrated. ECHT-EPSDE uses the four constraint handling techniques discussed in Sects. 9.2.1-9.2.4. Each constraint handling technique has its own population and parameters. Each population corresponding to a constraint handling method
produces its offspring using the associated strategies and parameters of EPSDE.
The offspring produced are then evaluated. In ECHT-EPSDE, the parent population corresponding to a particular constraint handling method competes not only with its own
offspring population but also with the offspring populations of the other three constraint
handling methods. In DE, since mutation and crossover are employed to produce
an offspring, DE's one-to-one selection is employed between the parent and offspring populations of the same constraint handling technique. But when the parents of one
constraint handling method compete with the offspring population of another constraint handling method, a parent is randomly
selected for competition corresponding to every offspring. Hence, in ECHT-EPSDE every function call is utilized by
every population associated with each constraint handling technique in the ensemble. Due to this, an offspring produced by a particular constraint handling method
may be rejected by its own population, but could be accepted by the populations of
other constraint handling methods. Therefore, the ensemble transforms the burden of
choosing a particular constraint handling technique and tuning the associated parameter values for a particular problem into an advantage.
The ECHT-EPSDE can be summarized as follows:
STEP 1: Each of the four constraint handling techniques (SF, SP, EC and SR
in Sects. 9.2.1-9.2.4) has its own population of NP individuals, each with dimension D (POP_k, k = 1, . . . , 4), and parameter/strategy pools (PS_k, k = 1, . . . , 4)
Fig. 9.1 Flowchart of ECHT (CH: constraint handling method, POP: population, PAR: parameters,
OFF: offspring, Max_FEs: maximum number of function evaluations)
initialized according to the EPSDE rules and the corresponding constraint handling
method (CH_k, k = 1, . . . , 4). Set the generation counter G = 0.
STEP 2: Evaluate the objective/constraint function values and the overall constraint
violation of each individual X_i^k, i ∈ {1, . . . , NP}, of every population (POP_k, k =
1, . . . , 4) using Eqs. (9.2)-(9.3).
STEP 3: The parameter values of the constraint handling methods are updated according
to Sect. 9.2.
STEP 4: Each parent population (POP_k, k = 1, . . . , 4) produces an offspring population (OFFS_k, k = 1, . . . , 4) by mutation and crossover (Takahama and Sakai
2006).
STEP 5: Compute the objective/constraint function values and the overall constraint
violation of each offspring X'_i^k, i ∈ {1, . . . , NP}. Each offspring retains the objective and constraint function values separately, i.e., each offspring is evaluated only
once.
STEP 6: Each parent population POP_k, k = 1, . . . , 4, is combined with the offspring
produced by it and the offspring produced by all other populations corresponding to
the different constraint handling techniques, as in STEP 6 of Fig. 9.1. The four different
groups are:
Group 1: (POP_1, OFFS_k, k = 1, . . . , 4), Group 2: (POP_2, OFFS_k, k = 1, . . . , 4),
Group 3: (POP_3, OFFS_k, k = 1, . . . , 4) and Group 4: (POP_4, OFFS_k, k =
1, . . . , 4).
STEP 7: In the selection step, the parent populations POP_k, k = 1, . . . , 4, for the next
generation are selected from Groups 1, 2, 3, and 4, respectively. In a group (say
Group 1), since OFF_1 is produced by POP_1 by mutation and crossover, DE's selection
based on competition between a parent and its offspring is employed when POP_1
competes with OFF_1. But when POP_1 competes with OFF_2, OFF_3, or OFF_4,
produced by the other populations, each member in POP_1 competes with a randomly
selected offspring from OFF_2, OFF_3, or OFF_4.
STEP 8: Stop if the termination criterion is met. Else, set G = G + 1 and go to STEP 3.
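STEPs 6-7 are the heart of the ensemble: each population is merged with every offspring population, and selection against "foreign" offspring happens against randomly chosen parents. A rough Python skeleton follows; better() is an assumed comparator supplied by each constraint handling technique, and the data layout is illustrative.

import random

def echt_selection(populations, offspring, better):
    # populations, offspring: lists of 4 lists of individuals (one per CHT).
    # better(k, a, b) -> True if a should replace b under CHT k.
    new_populations = []
    for k, pop in enumerate(populations):
        new_pop = list(pop)
        for j, offs in enumerate(offspring):
            for idx, child in enumerate(offs):
                if j == k:
                    # own offspring: DE's one-to-one parent/child competition
                    if better(k, child, new_pop[idx]):
                        new_pop[idx] = child
                else:
                    # foreign offspring: compete with a randomly chosen parent
                    i = random.randrange(len(new_pop))
                    if better(k, child, new_pop[i]):
                        new_pop[i] = child
        new_populations.append(new_pop)
    return new_populations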
Table 9.1 Function values achieved for FES = 2 × 10^5 for the 10D problems

        C01         C02         C03         C04         C05         C06
Best    246.8502    580.7301    0.0034      420.9687    0           2.4983E+01
Median  246.7401    602.0537    0.0034      420.9687    0           7.7043E+01
Worst   240.4916    608.4520    0.0034      420.9687    0           9.2743E+04
c       0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0
ν       0           0           0           0           0           0
Mean    245.7474    600.5612    0.0034      420.9687    0           9.7245E+03
Std     2.2307      7.2523      8.5413E-18  4.6711E-07  0           2.9188E+04

        C07         C08         C09         C10         C11         C12
Best    1.000E-05   20.0780     68.4294     2.2777      2.2800E+02  0
Median  1.000E-05   19.9875     68.4294     2.2777      9.9040E+02  0
Worst   1.000E-05   18.9875     61.6487     2.2612      1.5013E+03  0
c       0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0
ν       0           0           0           0           0           0
Mean    1.000E-05   19.3492     67.4211     2.2761      1.0356E+03  0
Std     2.9292E-05  0.3452      1.8913      5.2000E-03  1.0344E+03  0

        C13         C14         C15         C16         C17         C18
Best    0.0036      0.7473      1417.2374   325.4888    2960.9139   0
Median  0.0036      0.7473      1417.2374   0.1992      2960.9139   0
Worst   0.0036      0.7406      1417.2374   0.1992      2960.9139   0
c       0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0
ν       0           0           0           0           0           0
Mean    0.0036      0.7470      1417.2374   75.2591     2960.9139   0
Std     7.7800E-09  0.0014      0           122.3254    0           0
The initial ε(0) is selected as the median of the equality constraint violations over the
entire initial population. The control parameter is selected in such a way that ε reaches
a value of 1E-04 at around 600 generations, after which the value of ε is fixed
at 1E-04.
The experimental results (best, median, mean, worst, and standard deviation
values) are presented in Tables 9.1 and 9.2. c is the number of violated constraints
at the median solution: the sequence of three numbers indicates the number of violations (including inequalities and equalities) by more than 1.0, more than 0.01, and
more than 0.0001, respectively. ν is the mean value of the violations of all constraints
at the median solution. The ranking of the algorithm in comparison with the state-of-the-art algorithms is shown in Tables 9.3 and 9.4. The overall and average rankings
for each of the algorithms are presented in Table 9.5.
From the results it can be observed that the best three algorithms are εDEg,
ECHT-EPSDE and ECHT-DE, with average ranks of 3.08, 3.58, and 4.67, respectively. In other
words, the performance of ECHT-EPSDE is better than that of the ECHT-DE variant.
Table 9.2 Function values achieved for FES = 6 × 10^5 for the 30D problems

        C01         C02         C03         C04         C05         C06
Best    500         1962.5740   0.0005      420.9832    28.6735     2.4983E+01
Median  500         2040.3251   0.0001      439.1865    29.6333     2.49832E+01
Worst   501         2051.3521   0.0022      500         87.3162     2.49832E+01
c       0, 1, 1     0, 0, 0     0, 0, 1     1, 1, 1     0, 0, 0     0, 0, 0
ν       1.3250E-02  0           3.1000E-03  2.9637E+03  0           0
Mean    485.3521    2021.2371   0.0007      450.6785    37.2923     2.49832E+01
Std     76.4931     24.5128     0.025       28.4321     15.1524     3.5147E-06

        C07         C08         C09         C10         C11         C12
Best    6.2793E-04  20.2688     67.4137     1.2574      4.3051E+03  6514.7354
Median  7.2345E-04  19.8770     64.4208     2.3390      4.3051E+03  12470.9657
Worst   8.3291E-04  11.1774     62.6694     4.1011      4.3053E+03  10670.6636
c       0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 1
ν       0           0           0           0           0           1.7311E-04
Mean    7.8321E-04  18.5035     64.3612     2.4532      4.3051E+03  12229.2897
Std     9.5612E-05  2.7152      1.2845      0.9931      6.7521E-07  2178.3588

        C13         C14         C15         C16         C17         C18
Best    0.0039      0.8217      2344.6224   0.1993      3.1120      4.2090E-09
Median  0.0039      0.8012      2933.9001   0.1993      9320.5713   2.400E-07
Worst   0.0039      0.7557      3310.3263   11096.2789  21577.5875  4.1800E-05
c       0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 0     0, 0, 1     0, 0, 0
ν       0           0           0           0           7.6318E-04  0
Mean    0.0039      0.7994      2887.4795   79.5125     12705.5579  2.1100E-06
Std     1.1166E-05  0.0179      556.8420    255.1325    6455.6924   8.3000E-06
7
11
9
10
1
12
8
5
1
13
6
1
4
C10
5
6
7
10
2
4
8
12
13
9
11
3
1
13
7
9
8
6
5
4
10
12
11
1
3
1
C11
3
8
9
12
1
6
11
7
13
10
2
5
4
C03
C04
C05
C06
C07
C08
C09
9
11
13
12
1
1
6
7
10
8
5
1
1
C12
5
10
11
5
1
9
5
2
13
12
5
5
2
1
9
8
11
5
10
7
1
13
12
6
1
1
C13
4
6
7
8
2
12
9
2
13
11
5
10
1
10
6
11
8
1
1
1
12
13
7
9
5
1
C14
4
5
7
12
1
2
3
11
10
13
6
8
9
4
10
11
8
1
1
1
12
13
7
9
6
5
C15
8
5
7
11
2
1
4
13
9
12
6
10
3
1
10
11
12
1
8
9
1
1
13
5
6
7
C16
9
1
10
7
8
6
2
12
13
11
3
5
4
2
11
5
12
9
10
1
8
7
13
6
3
3
C17
10
7
8
9
5
2
6
12
13
11
1
4
3
4
6
7
9
1
5
8
12
13
10
11
3
2
C18
10
1
8
9
1
6
7
12
13
11
1
1
1
5
12
7
8
2
11
10
1
3
13
4
6
9
C10
2
7
8
11
3
1
9
13
12
10
6
5
4
9
7
8
10
3
2
1
11
12
13
6
5
4
C11
3
8
7
10
2
6
11
1
13
9
12
5
3
C03
C04
C05
C06
C07
C08
C09
3
8
12
11
2
1
10
6
9
7
13
4
5
C12
1
10
7
8
11
2
3
9
13
4
12
5
6
4
8
7
10
5
9
6
3
13
11
12
1
1
C13
1
10
9
6
4
8
11
3
13
12
2
7
5
9
5
6
7
1
10
2
12
13
8
11
3
4
C14
4
5
10
8
1
3
5
11
13
12
2
9
7
3
8
9
7
1
10
4
11
13
6
12
5
2
C15
6
4
7
10
2
1
3
12
11
13
5
9
8
1
7
12
13
4
6
9
1
11
8
5
10
3
C16
8
7
10
9
1
6
1
11
13
12
5
1
1
7
10
13
9
2
3
8
1
11
12
4
5
6
C17
10
6
9
8
7
5
4
13
11
12
1
2
3
2
9
8
10
3
13
7
1
12
11
6
5
4
C18
10
5
7
9
8
4
6
12
11
13
1
1
1
Table 9.5 Overall ranking of the algorithms

Algorithm                                            10D    30D    Overall   Average
jDEsoco (Brest et al. 2010)                          109    88     197       5.47
DE-VPS (Tasgetiren et al. 2010)                      130    136    266       7.39
RGA (Saha et al. 2010)                               158    156    314       8.72
E-ABC (Mezura-Montes and Velez-Koeppel 2010)         173    164    337       9.36
εDEg (Takahama and Sakai 2010)                       49     62     111       3.08
DCDE (Zhihui et al. 2010)                            101    101    202       5.61
Co-CLPSO (Liang et al. 2010)                         100    110    210       5.83
CDEb6e6r (Tvrdik and Polakova 2010)                  151    132    283       7.86
sp-MODE (Reynoso-Meza et al. 2010)                   193    207    400       11.11
MTS (Lin-Yu and Chun 2010)                           194    186    380       10.56
IEMA (Singh et al. 2010)                             98     119    217       6.03
ECHT-DE (Mallipeddi and Suganthan 2010a)             80     88     168       4.67
ECHT-EPSDE                                           53     76     129       3.58
9.4 Conclusions
In this chapter, a novel constraint handling procedure called ECHT was presented,
with four different constraint handling methods, each having its own population. In ECHT, every function call is effectively used
by all four populations, and the offspring population produced by the best-suited constraint handling technique dominates the others at a particular stage of the optimization process. Furthermore, an offspring produced by a particular constraint handling
method may be rejected by its own population, but could be accepted by the populations associated with other constraint handling methods. The No Free Lunch (NFL)
theorem implies that, irrespective of the exhaustiveness of parameter tuning, no single constraint handling method can be the best for every constrained optimization
problem. Hence, according to the NFL theorem, ECHT has the potential to perform well
over diverse problems compared to any single constraint handling method. In this chapter, we
evaluated the performance of ECHT using the EPSDE algorithm. Experimental results
showed that ECHT-EPSDE outperforms the state-of-the-art methods on the CEC
2010 problems.
References
Anile AM, Cutello V, Nicosia G, Rascuna R, Spinella S (2005) Comparison among evolutionary
algorithms and classical optimization methods for circuit design problems. Paper presented at
the IEEE conference on evolutionary computation, Vancouver, Canada
Brest J, Boskovic B, Zumer V (2010) An improved self-adaptive differential evolution algorithm in
single objective constrained real-parameter optimization. Paper presented at the IEEE congress
on evolutionary computation
Coello Coello CA (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11
-12):1245-1287
Deb K (2000) An efficient constraint handling method for genetic algorithms. Comput Methods
Appl Mech Eng 186(24):311338
Farmani R, Wright JA (2003) Self-adaptive fitness formulation for constrained optimization. IEEE
Trans Evol Comput 7(5):445455
Hamida SB, Schoenauer M (2002) ASCHEA: New results using adaptive segregational constraint
handling. Paper presented at the proceedings of congress evolutionary computation
Huang VL, Qin AK, Suganthan PN (2006) Self-adaptive differential evolution algorithm for constrained real-parameter optimization. Paper presented at the IEEE congress on evolutionary computation, Vancouver, Canada
Ishibuchi H, Yoshida T, Murata T (2003) Balance between genetic search and local search in
memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Trans Evol Comput 7(2):204223
Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft
Comput 9(1):312
Koziel S, Michalewicz Z (1999) Evolutionary algorithms, homomorphous mappings, and constrained parameter optimization. Evol Comput 7(1):1944
Liang JJ, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan PN, Coello Coello CA, Deb K
(2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained
real-parameter optimization: Technical Report, Nanyang Technological University, Singapore
Available from http://www3.ntu.edu.sg/home/EPNSugan/
Liang JJ, Shang Z, Li Z (2010) Coevolutionary comprehensive learning particle swarm optimizer.
Paper presented at the IEEE congress on evolutionary computation
Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010
competition on constrained real-parameter optimization, Nanyang Technological University, Singapore
Lin-Yu T, Chun C (2010) Multiple trajectory search for single objective constrained real-parameter
optimization problems. Paper presented at the IEEE congress on evolutionary computation
Mallipeddi R, Suganthan PN (2010a) Differential evolution with ensemble of constraint handling
techniques for solving CEC 2010 benchmark problems. Paper presented at the IEEE congress on
evolutionary computation
Mallipeddi R, Suganthan PN (2010b) Ensemble of constraint handling techniques. IEEE Trans Evol
Comput 14(4):561579
Mallipeddi R, Suganthan PN, Pan QK, Tasgetiren MF (2011) Differential evolution algorithm with
ensemble of parameters and mutation strategies. Appl Soft Comput 11(2):1679-1696. doi:10.1016/j.asoc.2010.04.024
Mezura-Montes E, Coello Coello CA (2003) Adding diversity mechanism to a simple evolution strategy to solve constrained optimization problems. Paper presented at the proceedings of congress
on evolutionary computation
Mezura-Montes E, Coello Coello CA (2005) A simple multimembered evolution strategy to solve
constrained optimization problems. IEEE Trans Evol Comput 9(1):117
Mezura-Montes E, Velez-Koeppel RE (2010) Elitist artificial bee colony for constrained realparameter optimization. Paper presented at the IEEE congress on evolutionary computation
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):132
Ong YS, Keane AJ (2004) Meta-Lamarckian learning in memetic algorithms. IEEE Trans Evol
Comput 8(2):99110
Ong YS, Lim M-H, Zhu N, Wong K-W (2006) Classification of adaptive memetic algorithms: a
comparative study. IEEE Trans Syst, Man, Cybern 36(1):141152
Powell D, Skolnick M (1993) Using genetic algorithms in engineering design optimization with
non-linear constraints. Paper presented at the proceedings of fifth international conference on
genetic algorithms, San Mateo, California
Qin AK, Huang VL, Suganthan PN (2009) Differential evolution algorithm with strategy adaptation
for global numerical optimization. IEEE Trans Evol Comput 13(2):398417
Reynoso-Meza G, Blasco X, Sanchis J, Martinez M (2010) Multiobjective optimization algorithm
for solving constrained single objective problems. Paper presented at the IEEE congress on
evolutionary computation
Rokach L (2009) Taxonomy for characterizing ensemble methods in classification tasks: a review
and annotated bibliography. Comput Stat Data Anal 53:40464072
Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE
Trans Evol Comput 4(3):284294
Runarsson TP, Yao X (2005) Search biases in constrained evolutionary optimization. IEEE Trans
Syst, Man, Cybern 35(2):233243
Saha A, Datta R, Deb K (2010) Hybrid gradient projection based genetic algorithms for constrained
optimization. Paper presented at the IEEE congress on evolutionary computation
Singh HK, Ray T, Smith W (2010) Performance of infeasibility empowered memetic algorithm
for CEC 2010 constrained optimization problems. Paper presented at the IEEE congress on
evolutionary computation
Skolicki Z, De Jong K (2007) The importance of a two-level perspective for Island model design.
Paper presented at the IEEE congress on evolutionary computation
Takahama T, Sakai S (2006) Constrained Optimization by the constrained differential evolution with
gradient-based mutation and feasible elites. Paper presented at the IEEE congress on evolutionary
computation, Sheraton Vancouver wall centre hotel, Vancouver, BC, Canada
Takahama T, Sakai S (2010) Constrained optimization by the -constrained differential evolution
with an archive and gradient-based mutation. Paper presented at the IEEE congress on evolutionary computation
Tasgetiren MF, Suganthan PN, Quan-ke P, Mallipeddi R, Sarman S (2010) An ensemble of differential evolution algorithms for constrained function optimization. Paper presented at the IEEE
congress on evolutionary computation
Tessema B, Yen GG (2006) A Self adaptive penalty function based algorithm for constrained
optimization. Paper presented at the IEEE congress on evolutionary computation
Tvrdik J, Polakova, R (2010) Competitive differential evolution for constrained problems. Paper
presented at the IEEE congress on evolutionary computation
Wang Y, Cai Z, Guo G, Zhou Y (2007) Multiobjective optimization and hybrid evolutionary algorithm to solve constrained optimization problems. IEEE Trans Syst, Man, Cybern 37(3):560575
Wang Y, Cai Z, Zhou Y, Zeng W (2008) An adaptive tradeoff model for constrained evolutionary
optimization. IEEE Trans Evol Comput 12(1):8092
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol
Comput 1(1):6782
Zhang GP (2000) Neural networks for classification: a survey. IEEE Trans Syst, Man, Cybern Part
CAppl Rev 30(4):451462
Zhihui L, Liang JJ, Xi H, Zhigang S (2010) Differential evolution with dynamic constraint-handling
mechanism. Paper presented at the IEEE congress on evolutionary computation
Chapter 10
10.1 Introduction
Most real-world optimization problems involve constraints mainly due to physical
limitations or functional requirements. A constraint can be of equality type or of
inequality type, but all constraints must be satisfied for a solution to be called feasible. Most often in practice, constraints are of inequality type and in some cases
an equality constraint can be suitably approximated as an inequality constraint. In
some situations, a transformation of an equality constraint to a suitable inequality
constraint does not change the optimal solution. Thus, in most constrained optimization studies, researchers are interested in devising efficient algorithms for handling
inequality constraints.
Traditionally, constrained problems with inequality constraints are solved by using
a penalty function approach in which a penalty term proportional to the extent of
constraint violation is added to the objective function to define a penalized function.
Since constraint violation is included, the penalized function can be treated as an
unconstrained objective, which then can be optimized using an unconstrained optimization technique. A nice aspect of this approach is that it does not care about
the structure of the constraint functions (linear or nonlinear, convex or nonconvex).
However, as it turns out, the proportionality term (which is more commonly known
as the penalty parameter) plays a significant role in the working of the penalty
function approach. In essence, the penalty parameter acts as a balancer between
objective value and the overall constraint violation value. If too small a penalty parameter is chosen, constraints are not emphasized enough, thereby leading the algorithm to an infeasible solution. On the other hand, if too large a penalty
parameter is chosen, the objective function is not emphasized enough and the problem behaves like a constraint satisfaction problem, thereby leading the algorithm to
find an arbitrary feasible solution. Classical optimization researchers tried to strike a good balance between the two tasks (constraint satisfaction and convergence to the optimum), at first by trial-and-error means and finally by resorting to a sequential penalty function approach. In the sequential approach, a small penalty parameter is chosen at first and the corresponding penalized function is optimized.
Since the solution is likely to be an infeasible solution, a larger penalty parameter is
chosen next and the corresponding penalized function is optimized starting from the
obtained solution of the previous iteration. This process is continued till no further
improvement in the solution is obtained. Although this method, in principle, seems to eliminate the difficulty of choosing an appropriate penalty parameter by trial-and-error or other ad hoc schemes, the sequential penalty function method is found not to work well on problems having (i) a large number of constraints, (ii) a number of local or global optima, and (iii) different scaling of the constraint functions. Moreover,
the performance of the algorithm depends on the choice of initial penalty parameter
value and how the penalty parameter values are increased from one iteration to the
next.
$$
\begin{aligned}
\text{Minimize} \quad & f(\mathbf{x}), \\
\text{subject to} \quad & g_j(\mathbf{x}) \ge 0, \quad j = 1, \ldots, J, \\
& h_k(\mathbf{x}) = 0, \quad k = 1, \ldots, K, \\
& x_i^l \le x_i \le x_i^u, \quad i = 1, \ldots, n.
\end{aligned}
\tag{10.1}
$$
In the above nonlinear programming (NLP) problem, there are $n$ variables, $J$ greater-than-or-equal-to type constraints, and $K$ equality constraints. The function $f(\mathbf{x})$ is the objective function, $g_j(\mathbf{x})$ is the $j$th inequality constraint, and $h_k(\mathbf{x})$ is the $k$th equality constraint. The $i$th variable varies in the range $[x_i^l, x_i^u]$. The conventional way to deal with an equality constraint is to convert it into an appropriate inequality constraint: $g_{J+k}(\mathbf{x}) = \epsilon_k - |h_k(\mathbf{x})| \ge 0$, with a small given value of $\epsilon_k$.
The penalty function approach is popular with classical and early evolutionary methods. In this approach, an amount proportional to the constraint
violation of a solution is added to the objective function value to form the penalized
function value, as follows:
$$
P(\mathbf{x}, \mathbf{R}) = f(\mathbf{x}) + \sum_{j=1}^{J} R_j \left\langle g_j(\mathbf{x}) \right\rangle + \sum_{k=1}^{K} R_k \left| h_k(\mathbf{x}) \right|.
\tag{10.2}
$$
The term $\langle g_j(\mathbf{x})\rangle$ is zero if $g_j(\mathbf{x}) \ge 0$ and is $-g_j(\mathbf{x})$ otherwise. The parameter $R_j$ is the penalty parameter associated with inequality constraints and $R_k$ is the penalty parameter associated with equality constraints. The penalty function approach has the following features (a small computational sketch of Eq. (10.2) follows the list):
1. The optimum value of the penalized function $P(\cdot)$ largely depends on the penalty parameters $R_j$ and $R_k$. Users generally experiment with different values of $R_j$ and $R_k$ to find which values push the search toward the feasible region. This requires extensive experimentation to find a reasonable approximation of the solution of the problem given in Eq. (10.1).
2. The addition of the penalty term distorts the penalized function relative to the given objective function. For small values of the penalty parameter, the distortion is small, but the optimal solution of $P(\cdot)$ may happen to lie in the infeasible region. By contrast, if large $R_j$ and $R_k$ are used, any infeasible solution has a large penalty, thereby causing any feasible solution to appear exceedingly better than any infeasible solution. The difference between two feasible solutions gets overshadowed by the difference between a feasible and an infeasible solution. This often leads the algorithm to converge to an arbitrary feasible solution. Moreover, the distortion may be so severe that in
the presence of two or more constraints, $P(\cdot)$ may have artificial locally optimal solutions.
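To make Eq. (10.2) concrete, the following is a minimal Python sketch of the penalized-function computation; the problem functions, penalty values, and the test point used here are hypothetical and are not taken from this chapter.

```python
import numpy as np

def bracket(value):
    """Bracket operator <g>: returns -g when the constraint g(x) >= 0 is violated, else 0."""
    return -value if value < 0.0 else 0.0

def penalized(x, f, ineq, eq, R_ineq, R_eq):
    """Penalized function of Eq. (10.2): f(x) + sum_j R_j <g_j(x)> + sum_k R_k |h_k(x)|."""
    total = f(x)
    total += sum(R * bracket(g(x)) for R, g in zip(R_ineq, ineq))
    total += sum(R * abs(h(x)) for R, h in zip(R_eq, eq))
    return total

# Hypothetical example: minimize x1^2 + x2^2 subject to g(x) = x1 + x2 - 1 >= 0.
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: x[0] + x[1] - 1.0
x = np.array([0.2, 0.3])          # infeasible point (g = -0.5)
print(penalized(x, f, [g], [], R_ineq=[10.0], R_eq=[]))   # 0.13 + 10*0.5 = 5.13
```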
To overcome these difficulties, the classical penalty function approach solves a sequence of penalized functions, where in every step the penalty parameters are increased and the current optimization begins from the optimized solution found in the previous step. However, the sequential penalty function approach has shown weaknesses in (i) handling multimodal objective functions having a number of local optima, (ii) handling a large number of constraints, particularly due to the increased chance of artificial local optima in which the procedure can get stuck, and (iii) using numerical gradient-based approaches, due to the inherent numerical error caused by taking one feasible and one infeasible solution in the numerical gradient computation.
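The sequential procedure just described can be sketched as follows, assuming a generic off-the-shelf unconstrained optimizer (SciPy's Nelder–Mead is used here purely for illustration) and a hypothetical two-variable problem; the quadratic penalty on the violation is one common choice and is not the specific formulation studied later in this chapter.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem: minimize f(x) = (x1 - 3)^2 + (x2 - 3)^2 subject to g(x) = 1 - x1 - x2 >= 0.
f = lambda x: (x[0] - 3.0) ** 2 + (x[1] - 3.0) ** 2
g = lambda x: 1.0 - x[0] - x[1]

def penalized(x, R):
    violation = max(0.0, -g(x))          # bracket operator <g(x)>
    return f(x) + R * violation ** 2     # quadratic penalty keeps the function smooth

x, R = np.array([0.0, 0.0]), 0.1
for _ in range(8):
    # Each iteration restarts from the previous optimum with a larger penalty parameter.
    x = minimize(penalized, x, args=(R,), method="Nelder-Mead").x
    if max(0.0, -g(x)) < 1e-4:           # stop once the solution is (nearly) feasible
        break
    R *= 10.0

print(x, f(x))   # approaches the constrained optimum near (0.5, 0.5)
```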
Let us consider a single-variable constrained problem to illustrate some of these difficulties:
$$
\begin{aligned}
\text{minimize} \quad & f(x), \\
\text{subject to} \quad & g_1(x) \equiv 1 - x \ge 0, \\
& g_2(x) \equiv x \ge 0.
\end{aligned}
\tag{10.3}
$$
Figure 10.1 shows the objective function $f(x)$ in $x \in [0, 6.5]$, in which all solutions satisfying $x > 1$ are infeasible.
The constrained minimum is the point H with x = 1. Due to multimodalities
associated with the objective function, the first iteration of the sequential penalty
function method (with R = 0) may find the global minimum (A) of the associated
penalized function P(x, 0). In the next sequence, if R is increased to one and the
resulting P(x, 1) is minimized starting from A, a solution close to A will be achieved,
Fig. 10.1 Penalized function for different values of R for the problem given in Eq. (10.3)
genetic algorithms. Coit et al. (1996) proposed a general adaptive penalty technique
which uses a feedback obtained during the search along with a dynamic distance
metric. Another study proposed adaptation of the penalty parameter using co-evolution (Coello 2000). A stochastic approach is proposed by Runarsson and Yao
(2000) to balance the objective and penalty functions. Nanakorn and Meesomklin
(2001) proposed an adaptive penalty function that gets adjusted by itself during
the evolution in such a way that the desired degree of penalty is always obtained.
Kuri-Morales and Gutiérrez-García (2002) proposed a statistical analysis based on the penalty function method using genetic algorithms with five different penalty
function strategies. For each of these, they have considered three particular GAs. The
behavior of each strategy and the associated GAs is then established by extensively
sampling the function suite and finding the worst-case best values.
Zhou et al. (2003) did not suggest any new penalty term, but performed a time
complexity analysis of EAs for solving constrained optimization using the penalty
function approach. It is shown that when the penalty coefficient is chosen properly,
direct comparison between pairs of solutions using penalty fitness function is equivalent to that using the criteria superiority of feasible point or superiority of objective
function value. They also analyzed the role of penalty coefficients in EAs in terms of
time complexity. In some cases, EAs benefit greatly from higher penalty parameter
values, while in other examples, EAs benefit from lower penalty parameter values.
However, the analysis procedure still cannot make any theoretical prediction on the choice of a suitable penalty parameter for an arbitrary problem.
Wang and Ma (2006) proposed an EA-based constraint-handling scheme with a continuous penalty function in which only one control parameter is used. Lin and Chuang (2007) proposed an adjustment of the penalty parameter with
generations by using the rough set theory. Matthew et al. (2009) suggested an adaptive
GA that incorporates population-level statistics to dynamically update penalty functions, a process analogous to strategic oscillation used in the tabu search literature.
The method of delineating feasible from infeasible solutions was proposed by Powell and Skolnick (1993). The method was modified in devising a penalty-parameter-less approach (Deb 2000). From the objective function and constraint function values, a fitness function is derived so that (i) every feasible solution is better than any infeasible solution, (ii) between two feasible solutions, the one with the better objective function value is better, and (iii) between two infeasible solutions, the one with the smaller overall constraint violation is better (a small sketch of this comparison rule appears after the list below). Angantyr et al. (2003) is another effort in this direction:
1. If no feasible individual exists in the current population, the search should be directed toward the feasible region.
2. If the majority of the individuals in the current population are feasible, the search should be directed toward the unconstrained optimum.
3. A feasible individual closer to the optimum is always better than a feasible individual farther away from it.
4. An infeasible individual might be better than a feasible individual if the number of feasible individuals is high.
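A minimal sketch of the three comparison rules of the penalty-parameter-less approach described above; the solution representation (a dict with objective 'f' and overall violation 'cv') and the function name are illustrative only.

```python
def better(sol_a, sol_b):
    """Pairwise comparison following the three feasibility rules described above.

    Each solution is a dict with keys 'f' (objective value, minimized) and
    'cv' (overall constraint violation, zero when feasible). The names and the
    dict layout are illustrative, not taken from the chapter.
    """
    a_feasible, b_feasible = sol_a["cv"] == 0.0, sol_b["cv"] == 0.0
    if a_feasible and not b_feasible:      # rule (i): feasible beats infeasible
        return True
    if b_feasible and not a_feasible:
        return False
    if a_feasible and b_feasible:          # rule (ii): compare objective values
        return sol_a["f"] < sol_b["f"]
    return sol_a["cv"] < sol_b["cv"]       # rule (iii): compare violations

print(better({"f": 5.0, "cv": 0.0}, {"f": 1.0, "cv": 0.3}))   # True: feasible wins
```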
Fig. 10.2 Two-objective plot of a set of solutions for the problem given in Eq. (10.3)
offspring are selected based on the highest Pareto strength and a lower degree of constraint violation.
Venkatraman and Yen (2005) proposed a two-phase framework. In the first phase, the objective function is neglected and the problem is treated as a constraint satisfaction problem to find at least one feasible solution. The population is ranked based on the sum of constraint violations. As soon as at least one feasible solution is found, both the objective function and the constraint violation are taken into account, with the two objectives being the original objective and the sum of normalized constraint violation values.
Cai and Wang (2005) proposed a novel EA for constrained optimization. In the
process of population evolution, the algorithm is based on multiobjective optimization, i.e., an individual in the parent population may be replaced if it is dominated
by a nondominated individual in the offspring population. In addition, three models
of a population-based algorithm generator and an infeasible solution archiving and
replacement mechanism are introduced. Furthermore, the simplex crossover is used
as a recombination operator to enrich the exploration and exploitation abilities of the proposed approach.
Deb et al. (2007) proposed a hybrid reference-point-based evolutionary multiobjective optimization (EMO) algorithm, where the summation of normalized constraint violation is used as the second objective. Wang et al. (2008) proposed a
multi-objective way of constraint handling with three main issues: (i) the evaluation of infeasible solutions when the population contains only infeasible individuals;
(ii) balancing feasible and infeasible solutions when the population consists of a
combination of feasible and infeasible individuals; and (iii) the selection of feasible
solutions when the population is composed of feasible individuals only.
Deb et al. (2007) proposed a hybrid reference-point-based evolutionary multiobjective optimization (EMO) algorithm coupled with the classical SQP procedure
for solving constrained single-objective optimization problems. The reference-point-based
boundaries, while the SQP methodology acted as a local search to improve the
solutions. Deep et al. (2008) proposed a constraint-handling method based on the
features of genetic algorithm and self-organizing migrating algorithm.
Araujo et al. (2009) proposed a novel methodology to be coupled with a genetic
algorithm to solve optimization problems with inequality constraints. This methodology can be seen as a local search operator that uses quadratic and linear approximations for both objective function and constraints. In the local search phase, these
approximations define an associated problem with a quadratic objective function and
quadratic and/or linear constraints that are solved using a linear matrix inequality
(LMI) formulation. The solution of this associated problem is then reintroduced into the GA population.
Bernardino et al. (2009) proposed a hybridized genetic algorithm (GA) with an
artificial immune system (AIS) as an alternative to tackle constrained optimization
problems in engineering. The AIS is inspired by the clonal selection principle and is
embedded into a standard GA search engine in order to help move the population into
the feasible region. The resulting GA-AIS hybrid is tested in a suite of constrained
optimization problems with continuous variables, as well as structural and mixed
integer reliability engineering optimization problems. In order to improve the diversity of the population, a variant of the algorithm is developed with the inclusion of a
clearing procedure. The performance of the GA-AIS hybrids is compared with other
alternative techniques, such as the adaptive penalty method, and the stochastic ranking technique, which represent two different types of constraint handling techniques
that have been shown to provide good results in the literature.
Yuan and Qian (2010) proposed a new hybrid genetic algorithm (HGA) combined with local search to solve twice continuously differentiable nonlinear programming (NLP) problems. The local
search eliminates the necessity of a penalization of infeasible solutions or any special
crossover and mutation operators.
Recently, Mezura-Montes (2009) edited a book on constraint handling in evolutionary optimization. The most recent survey of constraint handling in nature-inspired optimization can be found in Mezura-Montes and Coello (2011). The following methodologies are briefly described in their paper:
• Feasibility rules
• Stochastic ranking
• ε-constraint method
• Novel penalty functions
• Novel special operators
• Multiobjective concepts
• Ensemble of constraint-handling techniques
The authors also pointed out good future directions for researchers in constraint-handling areas (a small sketch of one listed technique, stochastic ranking, is given below). These areas will be helpful for researchers, novices, and experts alike.
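As an illustration of one of the listed techniques, the stochastic ranking comparison of Runarsson and Yao (2000) can be sketched as follows; the population layout and the comparison-probability value used here are illustrative choices, not prescriptions from this chapter.

```python
import random

def stochastic_ranking(pop, p_f=0.45, sweeps=None):
    """Rank a population with stochastic ranking (Runarsson and Yao 2000 style).

    pop is a list of (objective, violation) tuples; smaller is better for both.
    With probability p_f (or whenever both solutions are feasible) adjacent
    solutions are compared by objective value, otherwise by constraint violation.
    """
    pop = list(pop)
    n = len(pop)
    sweeps = sweeps if sweeps is not None else n
    for _ in range(sweeps):                      # bubble-sort-like sweeps
        swapped = False
        for i in range(n - 1):
            v1, v2 = pop[i][1], pop[i + 1][1]
            use_objective = (v1 == 0 and v2 == 0) or random.random() < p_f
            key = 0 if use_objective else 1
            if pop[i][key] > pop[i + 1][key]:
                pop[i], pop[i + 1] = pop[i + 1], pop[i]
                swapped = True
        if not swapped:
            break
    return pop

print(stochastic_ranking([(1.0, 0.5), (2.0, 0.0), (0.5, 2.0)]))
```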
Fig. 10.3 Papers published in evolutionary constrained optimization per year (1961–2013, September 26) (taken from Coello (2013))
The aforesaid literature clearly indicates that different techniques have been proposed using EAs for efficient constraint handling. However, it is difficult to cover the whole literature on constraint handling. Coello (2013) maintains a constraint-handling repository which holds a broad spectrum of constraint-handling techniques. Figure 10.3 shows a histogram of the number of papers published in evolutionary constrained optimization per year. From Fig. 10.3 it is clear that researchers keep coming up with new constraint-handling mechanisms using EAs, with the number of published papers growing steadily over time. For the year 2013, we have data only until September 26.
classical approach to speed up the convergence. The main motivation of the hybridization is to take advantage of one method to overcome the difficulties of the other and, in the process, develop an algorithm that may outperform each method individually and, preferably, most reported high-performing algorithms as well.
Consider the following two-variable constrained problem:
$$
\begin{aligned}
\text{minimize} \quad & f(\mathbf{x}) = 1 + \sqrt{x_1^2 + x_2^2}, \\
\text{subject to} \quad & g(\mathbf{x}) \equiv 1 - (x_1 - 1.5)^2 - (x_2 - 1.5)^2 \ge 0.
\end{aligned}
\tag{10.4}
$$
The feasible region is the area inside a circle of radius one and center at $(1.5, 1.5)^T$. Since the objective function is one more than the distance of any point from the origin, the constrained minimum lies on the circle at $x_1 = x_2 = 1.5 - 1/\sqrt{2} = 0.793$. The corresponding function value is $f^* = 2.121$. Thus, in this problem, the minimum point makes the constraint $g(\cdot)$ active. This problem was also considered elsewhere (Deb 2001).
Let us now convert this problem into the following two-objective problem:
$$
\begin{aligned}
\text{minimize} \quad & f_1(\mathbf{x}) = \mathrm{CV}(\mathbf{x}) = \langle g(\mathbf{x})\rangle, \\
\text{minimize} \quad & f_2(\mathbf{x}) = f(\mathbf{x}) = 1 + \sqrt{x_1^2 + x_2^2},
\end{aligned}
\tag{10.5}
$$
where $\mathrm{CV}(\mathbf{x})$ is the constraint violation. For multiple inequality and equality constraints, the constraint violation function is defined in terms of normalized constraint functions, as follows:
$$
\mathrm{CV}(\mathbf{x}) = \sum_{j=1}^{J} \langle g_j(\mathbf{x})\rangle + \sum_{k=1}^{K} \left| h_k(\mathbf{x}) \right|.
\tag{10.6}
$$
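For the example problem above, the bi-objective mapping of Eq. (10.5) can be sketched in a few lines of Python; the algebraic forms of f and g below follow the description given for Eq. (10.4) (objective equal to one plus the distance from the origin, feasible region inside the unit circle centered at (1.5, 1.5)).

```python
import numpy as np

def f(x):
    """Objective of the example: one more than the distance from the origin."""
    return 1.0 + np.sqrt(x[0] ** 2 + x[1] ** 2)

def g(x):
    """Feasible region: inside the unit circle centered at (1.5, 1.5); g(x) >= 0 when feasible."""
    return 1.0 - (x[0] - 1.5) ** 2 - (x[1] - 1.5) ** 2

def bi_objective(x):
    """Return (f1, f2) = (CV(x), f(x)) as in Eq. (10.5)."""
    cv = max(0.0, -g(x))        # bracket operator on the single constraint
    return cv, f(x)

print(bi_objective(np.array([0.793, 0.793])))   # about (0.0, 2.121): constrained minimum A
print(bi_objective(np.array([0.0, 0.0])))       # about (3.5, 1.0): unconstrained minimum of f
```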
Fig. 10.4 Objective space (CV(x) versus f(x)) of the bi-objective problem, showing the feasible solutions of Eq. (10.4), the Pareto-optimal front, the constrained minimum A, the unconstrained minimum, and a tangent line corresponding to R0
For the above problem, the first objective ($f_1(\cdot)$) is always nonnegative. If for any solution the first objective value is exactly equal to zero, it is a feasible solution to the original problem given in Eq. (10.4). Figure 10.4 shows the objective space of
the above bi-objective optimization problem. Since all feasible solutions lie on the
CV = 0 axis, the minimum of all feasible solutions corresponds to the minimum
point of the original problem. This minimum solution is shown in the figure as
solution A.
The corresponding Pareto-optimal front for the two-objective optimization problem (given in Eq. (10.5)) is marked. Interestingly, the constrained minimum solution A lies on one end of the Pareto-optimal front. Such bi-objective problems are usually solved using a lexicographic method (Miettinen 1999), in which, after finding the minimum-CV solution (corresponding to CV = 0 here), the second-level optimization task would minimize $f(\mathbf{x})$ subject to $\mathrm{CV}(\mathbf{x}) \le 0$. But this problem is identical to the original problem (Eq. (10.4)). Thus, the lexicographic method of solving the bi-objective problem is not computationally or algorithmically advantageous in solving the original constrained optimization problem. However, an EMO with a modification in its search process can be used to solve the bi-objective problem. Since we are interested in the extreme solution A, there is no need for us to find the entire Pareto-optimal front. Fortunately, a number of preference-based EMO procedures exist which can find only a part of the entire Pareto-optimal front (Branke 2008; Branke and Deb 2004). In solving constrained minimization problems, we may then employ such a technique to find the Pareto-optimal region close to the extreme left of the Pareto-optimal front (as in Fig. 10.4).
In summary, we claim here that since an EMO procedure (even a preference-based EMO approach) emphasizes multiple trade-off solutions by its niching (crowding or clustering) mechanism, an EMO population will maintain a more diverse set of solutions than a single-objective EA would. This feature of EMO should help solve complex constrained problems better. Moreover, the use of bi-objective optimization
avoids the need for any additional penalty parameter, which would be required in a standard penalty-function-based EA approach.
(10.7)
$$
P(\mathbf{x}, R) = f(\mathbf{x}) + R \sum_{j=1}^{J} \langle g_j(\mathbf{x}) \rangle.
\tag{10.8}
$$
Here, the purpose of the penalty parameter is to balance the overall constraint violation against the objective function value. If an appropriate R is not chosen, the optimum solution of the above penalized function $P(\cdot)$ will not be close to the true constrained minimum solution. This fact has an intimate connection with our bi-objective problem given in Eq. (10.5), which we discuss next.
$$
P(\mathbf{x}, R) = f(\mathbf{x}) + R \cdot \mathrm{CV}(\mathbf{x})
\tag{10.9}
$$
$$
\qquad\qquad\; = R\, f_1(\mathbf{x}) + f_2(\mathbf{x}),
\tag{10.10}
$$
where $f_1(\cdot)$ and $f_2(\cdot)$ are described in Eq. (10.5). It is well known that one way to solve a two-objective minimization problem (minimize $\{f_1(\mathbf{x}), f_2(\mathbf{x})\}$) is to convert the problem into a weighted-sum minimization problem (Chankong and Haimes 1983):
$$
\text{minimize } F_{w_1, w_2}(\mathbf{x}) = w_1 f_1(\mathbf{x}) + w_2 f_2(\mathbf{x}).
\tag{10.11}
$$
In the above formulation, $w_1$ and $w_2$ are two nonnegative numbers (not both zero). It is proven that the solution to the above problem is always a Pareto-optimal
point of the two-objective optimization problem (Miettinen 1999). Moreover, the
optimal point of problem (10.11) is a particular point on the Pareto-optimal front
which minimizes $F_{w_1,w_2}$. For a convex Pareto-optimal front, the optimal point for the weighted-sum approach is usually the point at which the linear contour line of the weighted-sum function is tangent to the Pareto-optimal front, as depicted in Fig. 10.5. The contour line has a slope of $m = -w_1/w_2$.
Against this background, let us now compare Eqs. (10.11) and (10.10). We observe
that solving the penalized function P() given in Eq. (10.10) is equivalent to solving
the bi-objective optimization problem given in Eq. (10.5) with w1 = R and w2 = 1.
This implies that for a chosen value of penalty parameter (R), the corresponding
optimal solution will be a Pareto-optimal solution to the bi-objective problem given
in Eq. (10.5), but need not be the optimal solution for the original single-objective
optimization problem (or solution A). This is the reason why the penalty function
Fig. 10.5 Weighted-sum contour with weights w1 and w2 tangent to the Pareto-optimal front at point A (horizontal axis: f1)
$$
\begin{aligned}
\text{minimize} \quad & f(\mathbf{x}), \\
\text{minimize} \quad & \mathrm{CV}(\mathbf{x}), \\
\text{subject to} \quad & \mathrm{CV}(\mathbf{x}) \le c, \\
& \mathbf{x}^{(L)} \le \mathbf{x} \le \mathbf{x}^{(U)}.
\end{aligned}
\tag{10.12}
$$
The constraint is added to find the nondominated solutions close to the minimum-CV(x) solution. Since CV(x) is the normalized constraint violation (Eq. (10.6)), it is suggested that $c = 0.2J$ be chosen for problems having no equality constraints and $c = 0.4(J + K)$ in the presence of equality constraints. To have an adequate number of feasible solutions in the population to estimate the critical penalty parameter $R_0$, we count the number of feasible solutions (checked with $\mathrm{CV} \le 10^{-6}$). If there are more than three bi-objective feasible solutions (with $\mathrm{CV} \le c$) in the population, we proceed to Step 2; otherwise, we increment the generation counter $t$ and repeat Step 1.
Step 2: If $t > 0$ and $t$ is an exact multiple of the local search frequency, compute $R_{\text{new}}$ from the current nondominated front as follows (a small computational sketch is given after Step 3). First, a cubic-polynomial curve is fitted to the nondominated points ($f = a + b\,\mathrm{CV} + c\,\mathrm{CV}^2 + d\,\mathrm{CV}^3$) and then the penalty parameter is estimated by finding the slope at $\mathrm{CV} = 0$, that is, $R = -b$. Since this is a lower bound on $R$, we use $R = -rb$, where $r$ is a weighting parameter greater than or equal to one. So as not to have abrupt changes in the values of $R$ between two consecutive local searches, we set $R_{\text{new}} = (1 - w)R_{\text{prev}} + wR$, where $w$ is a weighting factor. In the very first local search, we use $R_{\text{new}} = R$.
Step 3: Thereafter, the following penalized function is optimized with $R_{\text{new}}$ computed from above and starting with the current minimum-CV solution:
$$
\begin{aligned}
\text{minimize} \quad & P(\mathbf{x}) = f(\mathbf{x}) + R_{\text{new}} \begin{cases} \displaystyle\sum_{j=1}^{J} \langle g_j(\mathbf{x})\rangle, & \text{if } K = 0, \\[2mm] \displaystyle\sum_{j=1}^{J} \langle g_j(\mathbf{x})\rangle^2 + \sum_{k=1}^{K} h_k(\mathbf{x})^2, & \text{otherwise}, \end{cases} \\
\text{subject to} \quad & \mathbf{x}^{(L)} \le \mathbf{x} \le \mathbf{x}^{(U)}.
\end{aligned}
\tag{10.13}
$$
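A minimal sketch of the penalty-parameter estimation of Step 2, assuming the nondominated (CV, f) points are already available as arrays; the function names, the synthetic front data, and the parameter values below are illustrative and not part of the algorithm's specification.

```python
import numpy as np

def estimate_R(cv, f, r=2.0):
    """Fit f = a + b*CV + c*CV^2 + d*CV^3 to the nondominated points and
    return r times the (sign-corrected) slope at CV = 0, i.e. R = -r*b."""
    d, c, b, a = np.polyfit(cv, f, deg=3)   # np.polyfit returns highest degree first
    return -r * b

def blend(R_prev, R, w=0.5):
    """Avoid abrupt changes between consecutive local searches: R_new = (1-w)R_prev + wR."""
    return (1.0 - w) * R_prev + w * R if R_prev is not None else R

# Illustrative synthetic nondominated front (f decreases as CV increases).
cv = np.array([0.0, 0.2, 0.4, 0.6, 0.8])
f = 0.628 - 1.739 * cv + 1.643 * cv**2 - 0.686 * cv**3
R = estimate_R(cv, f, r=2.0)
print(R, blend(None, R))   # about 3.478 (= 2 * 1.739) on this synthetic data
```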
Step 4:
(10.14)
For this problem, only constraint g1 is active at the minimum point. To demonstrate
the working of our proposed hybrid strategy, we use different optimization techniques
to solve the same problem.
$$
\begin{aligned}
\text{minimize} \quad & f(\mathbf{x}), \\
\text{subject to} \quad & g_1(\mathbf{x}) \ge -\epsilon, \\
& 0 \le x_1 \le 6, \quad 0 \le x_2 \le 6.
\end{aligned}
\tag{10.15}
$$
Fig. 10.6 Objective value f(x) versus constraint violation CV(x) for the problem in Eq. (10.14), with the slope R0 marked
We use different values of ε and, for each case, find the optimum solution by solving the mathematical KKT conditions exactly. The resulting $f(\mathbf{x})$ and $\mathrm{CV}(\mathbf{x})$ values are shown in Fig. 10.6 with diamonds.
The optimum solution of the problem given in Eq. (10.14) is obtained for ε = 0 and is $\mathbf{x}^* = (2.219, 2.132)^T$ with a function value of 0.627. The corresponding Lagrange multiplier is $u_1 = 1.74$. Later, we shall use this theoretical result to verify the working of our hybrid procedure.
When we fit a cubic polynomial passing through the obtained (f–CV) points from the above theoretical analysis, we obtain the following fitted function of the Pareto-optimal front:
$$
f = 0.628 - 1.739\,\mathrm{CV} + 1.643\,\mathrm{CV}^2 - 0.686\,\mathrm{CV}^3.
\tag{10.16}
$$
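As a quick numerical check, differentiating the fitted cubic of Eq. (10.16) at CV = 0 gives the slope whose magnitude serves as the critical penalty-parameter estimate; the short script below (using NumPy's polynomial helpers) is only a verification aid.

```python
import numpy as np

# Coefficients of Eq. (10.16), highest degree first: -0.686 CV^3 + 1.643 CV^2 - 1.739 CV + 0.628
front = np.poly1d([-0.686, 1.643, -1.739, 0.628])
slope_at_zero = front.deriv()(0.0)
print(-slope_at_zero)   # 1.739, essentially the Lagrange multiplier u1 = 1.74 reported above
```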
Table 10.1 Effect of penalty parameter values for the problem given in Eq. (10.14)

Penalty parameter R                              x1       x2       f        CV
0.01                                             2.9939   2.0010   0.0085   0.8421
0.1                                              2.9406   2.0096   0.0812   0.7761
1                                                2.4949   2.0856   0.5330   0.2507
1.5                                              2.3021   2.1183   0.6181   0.0780
1.75                                             2.2189   2.1303   0.6274   0.0001
10                                               2.2187   2.1302   0.6274   0
15                                               2.2191   2.1326   0.6274   0
50                                               2.2215   2.1469   0.6277   0
Theoretical optimum (using Eq. (10.18)): 1.74    2.219    2.132    0.627
Fig. 10.7 Feasible and infeasible regions and solutions of the penalized function for different penalty parameter values (R = 0.01 to 15,000)
problem. Since constraint satisfaction becomes the main aim, the algorithm converges to an arbitrary feasible solution. This example clearly shows the importance of
setting an appropriate value of R. Too small or too large values may produce infeasible
or an arbitrary feasible solution, respectively.
$$
\begin{aligned}
\nabla f(\mathbf{x}^*) - u_1 \nabla g_1(\mathbf{x}^*) &= 0, \\
g_1(\mathbf{x}^*) &\ge 0, \\
u_1 g_1(\mathbf{x}^*) &= 0, \\
u_1 &\ge 0.
\end{aligned}
$$
Here, any variable bound that is active at the optimum must also be considered as an inequality constraint. Next, we consider the penalized function given in Eq. (10.2). The solution $\mathbf{x}^p$ of the penalized function (given in Eq. (10.8)) at an $R_{cr} \ge R_0$ can be obtained by setting the first derivative of $P(\cdot)$ to zero:
$$
\nabla f(\mathbf{x}^p) + R_{cr} \, \frac{d\langle g_1(\mathbf{x}^p)\rangle}{dg_1} \, \nabla g_1(\mathbf{x}^p) = 0.
\tag{10.17}
$$
The derivative of the bracket operator at $g_1 = 0$ does not exist, since at a point for which $g_1 = 0^+$ the derivative is zero, and at a point for which $g_1 = 0^-$ the derivative is $-1$. But considering that an algorithm usually approaches the optimum from the infeasible region, the optimum is usually found with an arbitrarily small tolerance on constraint violation. In such a case, the derivative at a point $\mathbf{x}^p$ for which $g_1 = 0^-$ is $-1$. A comparison of both conditions states that $R_{cr} = u_1$. Since $\mathbf{x}^p$ is arbitrarily close to the optimum, the second and third KKT conditions above are also satisfied at this point within the tolerance. Since $u_1 = R_{cr}$ and the penalty parameter $R$ is chosen to be positive, $u_1 > 0$. Thus, for a solution of the penalized function formed with a single active constraint, we have an interesting and important result:
$$
R_{cr} = u_1.
\tag{10.18}
$$
For the example problem of this section, we notice that the $u_1 = 1.74$ obtained from the KKT conditions is identical to the critical lower bound $R_{cr}$.
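The comparison behind Eq. (10.18) can be restated compactly as follows; the display simply places the penalized-function stationarity condition (with the bracket-operator derivative equal to $-1$ just inside the infeasible region) next to the KKT stationarity condition.

$$
\underbrace{\nabla f(\mathbf{x}^p) - R_{cr}\,\nabla g_1(\mathbf{x}^p) = 0}_{\text{Eq. (10.17) with } d\langle g_1\rangle/dg_1 = -1}
\qquad\text{versus}\qquad
\underbrace{\nabla f(\mathbf{x}^*) - u_1\,\nabla g_1(\mathbf{x}^*) = 0}_{\text{KKT stationarity}}
\;\;\Longrightarrow\;\; R_{cr} = u_1 .
$$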
Having found the bi-objective Pareto-optimal front through a generating method verified by KKT optimality theory, and having verified the derived critical penalty parameter against the theoretical Lagrange multiplier obtained through the KKT optimality theory, we are now certain about two aspects:
1. The obtained bi-objective front is optimal.
2. The critical penalty parameter obtained from the front is adequate to obtain the
constrained minimum.
Table 10.2 Function evaluations, FE (NSGA-II and local search), needed by the hybrid algorithm in 25 runs (best, median, and worst FE and f values)
index 10, and mutation index 100 (Deb 2001). Here, we use a local search frequency of 5 generations, r = 2, and w = 0.5. The hybrid algorithm is terminated when two consecutive local searches produce feasible solutions with a difference of $10^{-4}$ or less in the objective values.
The obtained front is shown in Fig. 10.6 with small circles, which seems to match the theoretical front obtained by applying the KKT optimality conditions to several ε-constraint versions (shown with diamonds) of the bi-objective problem.
At best, our hybrid approach finds the optimum solution in only 677 function evaluations (600 needed by the EMO and 77 by the fmincon() procedure). The corresponding solution is $\mathbf{x} = (2.219, 2.132)^T$ with an objective value of 0.627380. Table 10.2 shows the best, median, and worst performance of the hybrid algorithm in 25 different runs.
Figure 10.8 shows the variation of population-best objective value with generation
number for the median performing run (with 999 function evaluations). The figure
shows that the objective value reduces with generation number. The algorithm could
not find any feasible solution in the first two generations, but from generation 3,
the best population member is always feasible. At generation 5, the local search method is called for the first time. The penalty parameter obtained from the NSGA-II front is R = 1.896 at generation 5, and a solution very close to the true optimum is
Fig. 10.8 Objective value reduces with generation for the problem in Eq. (10.14) (annotations: R = 1.896, R = 1.722, infeasible points)