Rituparna Datta

Kalyanmoy Deb

Editors

Evolutionary Constrained Optimization

Applied Sciences and Engineering

Editors

Rituparna Datta

Department of Electrical Engineering

Korea Advanced Institute of Science

and Technology

Daejeon

Republic of Korea

Kalyanmoy Deb

Electrical and Computer Engineering

Michigan State University

East Lansing, MI

USA

ISSN 2363-6149

ISSN 2363-6157 (electronic)

Infosys Science Foundation Series

ISSN 2363-4995

ISSN 2363-5002 (electronic)

Applied Sciences and Engineering

ISBN 978-81-322-2183-8

ISBN 978-81-322-2184-5 (eBook)

DOI 10.1007/978-81-322-2184-5

Library of Congress Control Number: 2014957133

Springer New Delhi Heidelberg New York Dordrecht London

© Springer India 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part

of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,

recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission

or information storage and retrieval, electronic adaptation, computer software, or by similar or

dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this

publication does not imply, even in the absence of a specific statement, that such names are exempt

from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this

book are believed to be true and accurate at the date of publication. Neither the publisher nor the

authors or the editors give a warranty, express or implied, with respect to the material contained

herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer (India) Pvt. Ltd. is part of Springer Science+Business Media (www.springer.com)

and Khela Datta (mother).

Rituparna Datta

To Sadhan Chandra Deb (Baro Mama,

Eldest Uncle) whose inspiration has always

shown me the way.

Kalyanmoy Deb

Preface

problems.

The critical challenge in optimization lies in iteratively finding the best

combination of variables which minimize or maximize one or more objective

functions by satisfying the variable requirements and restrictions which are largely

known as constraints. Most optimization problems involve one or many constraints

due to the limitation in the availability of resources, physical viability, or other

functional requirements. The existence of constraints in problems in science and

engineering is continuously motivating researchers to develop newer and more

efficient methods of constraint handling in optimization.

Evolutionary optimization algorithms are population-based metaheuristic

techniques to deal with optimization problems. These algorithms have been

successfully applied to a wide range of optimization problems due to their ability to

deal with nonlinear, nonconvex, and discontinuous objective and constraint

functions. Originally, evolutionary algorithms (EAs) were developed to solve

unconstrained problems. However, as demands for solving practical problems

arose, evolutionary algorithm researchers have been regularly devising new

and efficient constraint handling techniques. Out of these constraint handling

techniques, some are borrowed from the classical literature, while others use different strategies like preference of feasible solutions over infeasible ones, choice of

less constraint-violated solutions, separation of objective and constraint functions,

special operators, and hybrid classical evolutionary methods, to name a few.

In most top evolutionary computation conferences, a good number of papers are

regularly published to discuss various ways of handling constraints using different

EAs. Almost all books and journals on evolutionary computation contain one or more

topics on constrained optimization. In 2009, Springer Studies in Computational

Intelligence came up with a full monograph on EA-based constrained optimization

(Constraint-Handling in Evolutionary Optimization by Mezura-Montes; ISBN: 978-3-642-00618-0). This book takes the same direction as that monograph, and presents

a more updated view of the subject matter. Moreover, this book aims to serve as a

self-contained collection of the current research addressing general constrained

optimization. The book can also serve as a textbook for advanced courses and as a

guide to the future direction of research in the area. Many constraint handling

techniques that exist in bits and pieces are assembled together in the present

monograph. Hybrid optimization, which is gaining a lot of popularity today due to its

capability of bridging the gap between evolutionary and classical optimization, is

broadly covered here. These areas will be helpful for researchers, novices and experts

alike.

The book consists of ten chapters covering diverse topics of constrained

optimization using EAs.

Helio J.C. Barbosa, Afonso C.C. Lemonge, and Heder S. Bernardino review the

adaptive penalty techniques in the first chapter that mainly deals with constraints using

EAs. The penalty function approach is one of the most popular constraint handling

methodologies due to its simple working principle and its ease of integration with

any unconstrained technique. The study also indicates the need for implementation of

different adaptive penalty methods in a single search engine. It will facilitate better

information for the decision maker to choose a particular technique.

The theoretical understanding of constrained optimization is one of the key

features to select the best constraint handling mechanism for any problem.

To tackle this issue, Shayan Poursoltan and Frank Neumann have studied the

influence of fitness landscape in Chap. 2. The study introduces different methods to

quantify the ruggedness of a given constrained optimization problem.

Rommel G. Regis proposes a constraint handling method to solve computationally expensive constrained black-box optimization using surrogate-assisted

evolutionary programming (EP) in Chap. 3. The proposed algorithm creates

surrogate models for the black-box objective function and inequality constraint

functions in every generation of the EP. Furthermore, at the end of each generation

a trust-region-like approach is used to refine the best solution. Hard and soft

constraints are common in constrained optimization problems.

In Chap. 4, Richard Allmendinger and Joshua Knowles point out a new type of

constraint known as ephemeral resource constraints (ERCs). The authors have

explained the presence of ERCs in real-world optimization problems.

A combination of multi-membered evolution strategy and an incremental

approximation strategy-assisted constraint handling method is proposed by Sanghoun Oh and Yaochu Jin in Chap. 5 to deal with highly constrained, tiny and

separated feasible regions in the search space. The proposed approach generates an

approximate model for each constraint function with increasing order of accuracy.

It starts with a linear model and successively approaches a complexity similar to

the original constraint function.

Chapter 6, by Tetsuyuki Takahama and Setsuko Sakai, describes a method

combining the ε-constrained method and the estimated comparison. In this method,

rough approximation is utilized to approximate both the objective function as well

as constraint violation. The methodology is integrated with differential evolution

(DE) for its simple working principle and robustness.

Jeremy Porter and Dirk V. Arnold carry out a detailed analysis of the behavior of a

multi-recombinative evolution strategy that highlights both cumulative step size

adaptation and a simple constraint handling technique in Chap. 7. In order to obtain

the optimal solution at the cone's apex, a linear optimization problem is considered for

analysis with a feasible region defined by a right circular cone, which is symmetric

about the gradient direction.

A niching technique is explored in conjunction with multimodal optimization by

Mohammad Reza Bonyadi and Zbigniew Michalewicz in Chap. 8 to locate feasible

regions, instead of searching for different local optima. Since in continuous

constrained optimization, feasible search space is more likely to appear with many

disjoint regions, the global optimal solution might be located within any one

of them. A particle swarm optimization is used as search engine.

In Chap. 9, Rammohan Mallipeddi, Swagatam Das, and Ponnuthurai Nagaratnam

Suganthan present an ensemble of constraint handling techniques (ECHT). Due to

the nonexistence of a universal constraint handling method, an ensemble method can

be a suitable alternative. ECHT is combined with an improved DE algorithm and the

proposed technique is known as EPSDE.

Rituparna Datta and Kalyanmoy Deb propose an adaptive penalty function

method using genetic algorithms (GA) in the concluding chapter (Chap. 10) of this

book. The proposed method amalgamates a bi-objective evolutionary approach

with the penalty function methodology in order to overcome individual weakness.

The bi-objective approach is responsible for the approximation of appropriate

penalty parameter and the starting solution for the unconstrained penalized function

by a classical method, which is responsible for exact convergence.

We would like to thank the team at Springer. In particular we acknowledge the

contributions of our Editor, Swati Meherishi, and the editorial assistants, Kamya

Khatter and Aparajita Singh, who helped bring this manuscript to fruition.

Rituparna Datta would like to thank his wife Anima and daughter Riddhi for their

love and affection.

Daejeon, Korea, September 2014

East Lansing, MI, USA

Rituparna Datta

Kalyanmoy Deb

Acknowledgments to Reviewers

With deep gratitude we convey our heartfelt greetings and congratulations to the

following colleagues and key researchers who spared no pains for reviewing this

book to make it a signal success.

Richard Allmendinger, University College London, UK

Dirk Arnold, Dalhousie University, Canada

Helio J.C. Barbosa, Universidade Federal de Juiz de Fora, Brazil

Heder S. Bernardino, Laboratório Nacional de Computação Científica, Brazil

Hans-Georg Beyer, FH Vorarlberg, University of Applied Sciences, Austria

Fernanda Costa, University of Minho, Portugal

Dilip Datta, Tezpur University, India

Oliver Kramer, University of Oldenburg, Germany

Afonso Celso de Castro Lemonge, Federal University of Juiz de Fora, Brazil

Xiaodong Li, RMIT University, Australia

Rammohan Mallipeddi, Kyungpook National University, South Korea

Tomasz Oliwa, Toyota Technological Institute at Chicago, USA

Khaled Rasheed, University of Georgia, USA

Rommel G. Regis, Saint Joseph's University, USA


Contents

… in Evolutionary Computation
Helio J.C. Barbosa, Afonso C.C. Lemonge and Heder S. Bernardino

Ruggedness Quantifying for Constrained Continuous Fitness Landscapes
Shayan Poursoltan and Frank Neumann

Trust Regions in Surrogate-Assisted Evolutionary Programming for Constrained Expensive Black-Box Optimization
Rommel G. Regis

…
Richard Allmendinger and Joshua Knowles

… Evolutionary Optimization
Sanghoun Oh and Yaochu Jin

… Differential Evolution with Rough Approximation
Tetsuyuki Takahama and Setsuko Sakai

… Strategies Applied to a Conically Constrained Problem
Jeremy Porter and Dirk V. Arnold

… of a Search Space with a Particle Swarm Optimizer
Mohammad Reza Bonyadi and Zbigniew Michalewicz

Ensemble of Constraint Handling Techniques for Single Objective Constrained Optimization
Rammohan Mallipeddi, Swagatam Das and Ponnuthurai Nagaratnam Suganthan

…
Rituparna Datta and Kalyanmoy Deb

Index

Technology (RIT) Laboratory at the Korea Advanced Institute of Science and

Technology (KAIST). He earned his Ph.D. in Mechanical Engineering at Indian

Institute of Technology (IIT) Kanpur and thereafter worked as a Project Scientist in

the Smart Materials, Structures, and Systems Lab at IIT Kanpur. His current research

work involves investigation of Evolutionary Algorithms-based approaches to constrained optimization, applying multiobjective optimization in engineering design

problems, memetic algorithms, derivative-free optimization, and robotics. He is a

member of ACM, IEEE, and IEEE Computational Intelligence Society. He has been

invited to deliver lectures in several institutes and universities across the globe,

including Trinity College Dublin (TCD), Delft University of Technology

(TUDELFT), University of Western Australia (UWA), University of Minho, Portugal, University of Nova de Lisboa, Portugal, University of Coimbra, Portugal, and

IIT Kanpur, India. He is a regular reviewer of IEEE Transactions on Evolutionary

Computation, Journal of Applied Soft Computing, Journal of Engineering Optimization, Journal of The Franklin Institute, and International Journal of Computer

Systems in Science and Engineering, and was in the program committee of Genetic

and Evolutionary Computation Conference (GECCO 2014), iNaCoMM2013,

GECCO 2013, GECCO 2012, GECCO 2011, eighth international conference on

Simulated Evolution And Learning (SEAL 2010), international conference on molecules to materials (ICMM-06), and some Indian conferences. He has also chaired

sessions in ACODS 2014 and UKIERI Workshop on Structural Health Monitoring

2012, GECCO 2011, IICAI 2011 to name a few. He was awarded an international

travel grant (Young Scientist) from Department of Science and Technology,

Government of India, in July 2011 and June 2012 and travel grants from Queensland

University, Australia, June 2012, GECCO Student Travel Grant, ACM, New York.

Electrical and Computer Engineering in Michigan State University (MSU), East

Lansing, USA. He also holds a professor position at the Department of Computer

Science and Engineering, and at the Department of Mechanical Engineering in

MSU. Prof. Deb's main research interests are in genetic and evolutionary optimization algorithms and their application in optimization, modeling, and machine

learning. He is largely known for his seminal research in developing and applying

Evolutionary Multi-objective Optimization. Prior to coming to MSU, he was

holding an endowed chair professor position at Indian Institute of Technology

Kanpur, India, where he established KanGAL (http://www.iitk.ac.in/kangal) to

promote research in genetic algorithms and multi-criterion optimization since 1997.

His Computational Optimization and Innovation (COIN) Laboratory (http://www.egr.msu.edu/kdeb)

at Michigan State University continues to act in the same spirit.

He has consulted with various industries and software companies in the past.

Prof. Deb was awarded the prestigious Infosys Prize in 2012, TWAS Prize in

Engineering Sciences in 2012, CajAstur Mamdani Prize in 2011, JC Bose

National Fellowship in 2011, Distinguished Alumni Award from IIT Kharagpur

in 2011, Edgeworth-Pareto award in 2008, Shanti Swarup Bhatnagar Prize in

Engineering Sciences in 2005, and the Thomson Citation Laureate Award from Thomson Reuters. Recently, he has been awarded an Honorary Doctorate from the University

of Jyväskylä, Finland. His 2002 IEEE TEC NSGA-II paper is judged as the Most

Highly Cited paper and a Current Classic by Thomson Reuters, having more than

4,200 citations. He is a fellow of IEEE, ASME, Indian National Science Academy

(INSA), Indian National Academy of Engineering (INAE), Indian Academy of

Sciences (IASc), and International Society of Genetic and Evolutionary Computation (ISGEC). He has written two text books on optimization and more than 375

international journal and conference research papers with Google Scholar citations

of 65,000+ with h-index of 85. He is on the editorial board of 20 major international

journals. More information about his research can be found at http://www.egr.msu.edu/kdeb.

Chapter 1

A Critical Review of Adaptive Penalty Techniques in Evolutionary Computation

Helio J.C. Barbosa, Afonso C.C. Lemonge and Heder S. Bernardino

Abstract Constrained optimization problems are common in the sciences,

engineering, and economics. Due to the growing complexity of the problems tackled,

nature-inspired metaheuristics in general, and evolutionary algorithms in particular,

are becoming increasingly popular. As move operators (recombination and mutation)

are usually blind to the constraints, most metaheuristics must be equipped with a

constraint handling technique. Although conceptually simple, penalty techniques

usually require user-defined problem-dependent parameters, which often significantly impact the performance of a metaheuristic. A penalty technique is said to

be adaptive when it automatically sets the values of all parameters involved using

feedback from the search process without user intervention. This chapter presents a

survey of the most relevant adaptive penalty techniques from the literature, identifies

the main concepts used in the adaptation process, as well as observed shortcomings,

and suggests further work in order to increase the understanding of such techniques.

Keywords Adaptive techniques · Penalty techniques · Evolutionary computation

H.J.C. Barbosa
National Laboratory for Scientific Computing (LNCC), Petropolis, Rio de Janeiro, RJ, Brazil
e-mail: hcbm@lncc.br

A.C.C. Lemonge
Department of Applied and Computational Mechanics, Federal University of Juiz de Fora, Juiz de Fora, MG, Brazil
e-mail: afonso.lemonge@ufjf.edu.br

H.S. Bernardino · H.J.C. Barbosa
Department of Computer Science, Federal University of Juiz de Fora, Juiz de Fora, MG, Brazil
e-mail: heder@ice.ufjf.br

© Springer India 2015
R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization, Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_1

1.1 Introduction

Constrained optimization problems are common in the sciences, engineering, and

economics. Due to the growing complexity of the problems tackled, nature-inspired

metaheuristics in general, and evolutionary algorithms in particular, are becoming

increasingly popular. That is due to the fact that, in contrast to classical mathematical

programming techniques, they can be readily applied to situations where the objective

function(s) and/or constraints are not known as explicit functions of the decision

variables. This happens when potentially expensive computer models (generated by

means of the finite element method (Hughes 1987), for example) must be run in order

to compute the objective function and/or check the constraints every time a candidate

solution needs to be evaluated. For instance, in the design of truss structures, one

possible definition of the problem is to find the cross-section areas of the bars that

minimize the structure's weight subject to limitations in the nodal displacements and

in the stress of each bar (Krempser et al. 2012). Notice that although the structure's

weight can be easily calculated from the design variables, the values of the nodal

displacements and of the stress in each bar are determined by solving the equilibrium

equations defined by the finite element model.

As move operators (recombination and mutation) are usually blind to the

constraints (i.e., when operating upon feasible individual(s) they do not necessarily

generate feasible offspring) most metaheuristics must be equipped with a constraint

handling technique. In simpler situations, repair techniques (Salcedo-Sanz 2009),

special move operators (Schoenauer and Michalewicz 1996), or special decoders

(Koziel and Michalewicz 1998) can be designed to ensure that all candidate solutions are feasible.

We do not attempt to survey the current literature on constraint handling in this

chapter, and the reader is referred to survey papers of, e.g., Michalewicz (1995),

Michalewicz and Schoenauer (1996), Coello (2002), and Mezura-Montes and Coello

(2011) as well as to the other chapters in this book. Instead we consider the oldest, and

perhaps most general class of constraint handling methods: the penalty techniques,

where infeasible candidate solutions have their fitness value reduced and are allowed

to coexist and evolve with the feasible ones.

Although conceptually simple, penalty techniques usually require user-defined

problem-dependent parameters, which often significantly impact the performance of

a metaheuristic.

The main focus of this chapter is on adaptive penalty techniques, which automatically set the values of all parameters involved using feedback from the search

process without user intervention. This chapter presents a survey of the most relevant adaptive penalty techniques from the literature as well as a critical assessment of

their assumptions, rationale for the design choices made, and reported performance

on test-problems.

The chapter is structured as follows. Section 1.2 summarizes the penalty method,

Sect. 1.3 introduces the main taxonomy for strategy parameter control, and Sect. 1.4

reviews some representative proposals for adapting penalty parameters. Section 1.5

presents a discussion of the main findings and the chapter ends with some conclusions,

including suggestions for further work in order to increase the understanding of such

adaptive techniques.

1.2 The Penalty Method

We consider in this chapter the constrained optimization problem consisting in the minimization of a given objective function $f(x)$, where $x \in \mathbb{R}^n$ is the vector of decision/design variables, which are subject to inequality constraints $g_p(x) \le 0$, $p = 1, 2, \ldots, \bar{p}$, as well as equality constraints $h_q(x) = 0$, $q = 1, 2, \ldots, \bar{q}$. In many applications the variables are also subject to bounds $x_i^L \le x_i \le x_i^U$. However, this type of constraint is usually trivially enforced in an EA and is not considered here. The set of all feasible solutions is denoted by $\mathcal{F}$, while $d(x, \mathcal{F})$ is a distance measure of the element $x$ to the set $\mathcal{F}$. The definition of $d(x, \mathcal{F})$ depends on the particular constraint-handling strategy adopted and is specified for each strategy independently.

The penalty method, which transforms a constrained optimization problem into an unconstrained one, can be traced back at least to the paper by Courant (1943), and the evolutionary computation community adopted it early on.

In this chapter, penalty techniques used within evolutionary computation methods are classified as multiplicative or additive. In the multiplicative case, a positive penalty factor $p(v(x), T)$ is introduced, where $v(x)$ denotes a measure of how constraints are violated by the candidate solution $x$ and $T$ denotes a temperature. The idea is to amplify the value of the fitness function of an infeasible individual (in a minimization problem):

$$F(x) = p(v(x), T)\, f(x).$$

One would have $p(v(x), T) = 1$ for any feasible candidate solution $x$, and $p(v(x), T) > 1$ otherwise. Also, $p(v(x), T)$ increases with the temperature $T$ and with the magnitude of the constraint violation $v(x)$. An initial value for the temperature is required together with the definition of a schedule for $T$ such that $T$ grows as

the evolution advances. This type of penalty has however received much less attention

in the EC community than the additive type. The most recent work seems to be by

Puzzi and Carpinteri (2008), where the technique introduced by Yokota et al. (1995)

and later modified in Gen and Cheng (1996), is also presented. Harrell and Ranjithan

(1999) compare additive and multiplicative penalty techniques for an instance of the

watershed management problem.
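As a minimal sketch of the multiplicative form, the snippet below uses a hypothetical penalty factor $p(v, T) = 1 + T \sum_j v_j$. This particular factor is an assumption chosen only because it equals 1 for feasible points and grows with both the violation and the temperature; it is not taken from any of the cited papers. The sketch also assumes positive objective values, since multiplying a negative $f(x)$ by a factor above 1 would improve rather than worsen a minimization fitness.

```python
def multiplicative_fitness(f_x, violations, temperature):
    """Return F(x) = p(v(x), T) * f(x) for a minimization problem.

    The factor p = 1 + T * sum(v) is a hypothetical choice for illustration.
    `violations` holds non-negative constraint violations v_j(x).
    """
    total_violation = sum(violations)
    factor = 1.0 + temperature * total_violation  # equals 1 when x is feasible
    return factor * f_x


# Same objective value, three situations.
feasible = multiplicative_fitness(10.0, [0.0, 0.0], temperature=2.0)
infeasible = multiplicative_fitness(10.0, [0.5, 0.0], temperature=2.0)
hotter = multiplicative_fitness(10.0, [0.5, 0.0], temperature=4.0)

print(feasible, infeasible, hotter)
```

The feasible point keeps its objective value, while the infeasible one is amplified, and more strongly so at the higher temperature.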

In the additive case, a penalty functional is added to the objective function in order to define the fitness value of an infeasible element. Additive techniques can be further divided into: (a) interior techniques, when a barrier functional $B(x)$, which grows rapidly as $x \in \mathcal{F}$ approaches the boundary of the feasible domain, is added to the objective function

$$F_k(x) = f(x) + \frac{1}{k} B(x)$$

and (b) exterior techniques, when a penalty functional $P$ of the distance $d(x, \mathcal{F})$ is added to the objective function

$$F_k(x) = f(x) + k P(d(x, \mathcal{F})) \qquad (1.1)$$

In both cases (a) and (b), under reasonable conditions, as $k \to \infty$ any limit point of the sequence $\{x_k\}$ of solutions of the unconstrained problem of minimizing $F_k(x)$ is a solution of the original constrained problem (Luenberger and Ye 2008).
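The limit behaviour of the exterior form can be seen on a toy problem. The sketch below assumes a one-dimensional example that is not from the chapter: minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$, with a squared violation as the penalty. The unconstrained minimizer of $F_k$ is $k/(1+k)$, which is infeasible for every finite $k$ and approaches the constrained optimum $x = 1$ from outside the feasible set. The helper `minimize_1d` is a plain ternary search, used here only because $F_k$ is convex.

```python
def penalized(x, k):
    """Exterior penalty F_k(x) = f(x) + k * P for f(x) = x^2, g(x) = 1 - x <= 0."""
    violation = max(0.0, 1.0 - x)  # zero whenever x is feasible
    return x * x + k * violation ** 2


def minimize_1d(func, lo, hi, iters=200):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if func(m1) < func(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Minimizers x_k for a growing sequence of penalty coefficients.
minimizers = {
    k: minimize_1d(lambda x, k=k: penalized(x, k), -2.0, 3.0)
    for k in (1, 10, 100, 1000)
}

for k, x_k in minimizers.items():
    print(k, round(x_k, 6))
```

Each $x_k$ stays slightly infeasible, which is why a fixed finite coefficient only approximates the constrained solution.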

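In order to define $d(x, \mathcal{F})$ it is useful to define a measure of the violation of the $j$th constraint by a given candidate solution $x \in \mathbb{R}^n$. One possibility is to take

$$v_j(x) = \begin{cases} |h_j(x)|, & \text{for an equality constraint} \\ \max\{0, g_j(x)\}, & \text{otherwise} \end{cases} \qquad (1.2)$$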

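However, the equality constraints $h_j(x) = 0$ are often replaced by the inequalities $|h_j(x)| - \epsilon \le 0$, for some small positive $\epsilon$, and one would have

$$v_j(x) = \begin{cases} \max\{0, |h_j(x)| - \epsilon\}, & \text{for an equality constraint} \\ \max\{0, g_j(x)\}, & \text{otherwise} \end{cases} \qquad (1.3)$$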
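The violation measure with a tolerance on equality constraints can be sketched as follows. The function name `violations`, the default tolerance `eps=1e-4`, and the three example constraints are assumptions made for illustration only.

```python
def violations(x, inequalities, equalities, eps=1e-4):
    """Return the list of constraint violations v_j(x).

    Inequalities are callables g with g(x) <= 0 when satisfied.
    Equalities are callables h with h(x) = 0 when satisfied; they are
    relaxed to |h(x)| - eps <= 0 (eps is an illustrative default).
    """
    v = [max(0.0, g(x)) for g in inequalities]
    v += [max(0.0, abs(h(x)) - eps) for h in equalities]
    return v


# Hypothetical constraints on a two-variable candidate solution.
inequalities = [
    lambda x: x[0] + x[1] - 2.0,  # x0 + x1 <= 2
    lambda x: -x[0],              # x0 >= 0
]
equalities = [
    lambda x: x[0] - x[1],        # x0 = x1
]

v = violations((1.0, 2.0), inequalities, equalities)
print(v)
```

A satisfied constraint contributes exactly zero, so the vector is all zeros if and only if the candidate is feasible up to the tolerance.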

For computational efficiency the violations $v_j(x)$ are used to compute a substitute for $d(x, \mathcal{F})$ in the design of penalty functions that grow with the vector of violations $v(x) \in \mathbb{R}^m$, where $m = \bar{p} + \bar{q}$ is the number of constraints to be penalized. At this point it is easy to see that interior penalty techniques, in contrast to exterior ones, require feasible solutions (which are often hard to find), thus explaining the high popularity of the latter.

The most popular penalty function is perhaps (Luenberger and Ye 2008)

$$P(x) = \sum_{j=1}^{m} (v_j(x))^2 \qquad (1.4)$$

Although it is conceptually easy to obtain the unconstrained problem, the definition of good penalty parameter(s) is usually a time-consuming, problem-dependent,

trial-and-error process.
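To make the sensitivity to the penalty parameter concrete, the sketch below combines the quadratic penalty (1.4) with the exterior form (1.1). The objective values, violation vector, and the two coefficients are made-up numbers chosen only to show how the ranking of a feasible and an infeasible candidate can flip with $k$.

```python
def quadratic_penalty(v):
    """P = sum of squared violations, as in (1.4)."""
    return sum(vj ** 2 for vj in v)


def penalized_fitness(f_x, v, k):
    """Exterior additive fitness F_k = f + k * P, as in (1.1)."""
    return f_x + k * quadratic_penalty(v)


# Illustrative candidates (values are assumptions, not from the chapter):
# a feasible one with a worse objective, an infeasible one with a better one.
feasible_fitness = penalized_fitness(4.0, [0.0, 0.0, 0.0], k=10.0)
infeasible_small_k = penalized_fitness(3.0, [0.5, 0.0, 2.0], k=0.1)
infeasible_large_k = penalized_fitness(3.0, [0.5, 0.0, 2.0], k=10.0)

print(feasible_fitness, infeasible_small_k, infeasible_large_k)
```

With the small coefficient the infeasible candidate looks better than the feasible one, while the large coefficient reverses the order, which is the kind of problem-dependent tuning the text refers to.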

One must also note that even if both the objective function $f(x)$ and distance to the feasible set $\mathcal{F}$ (usually based on the constraint violations $v_j(x)$) are defined for all $x$, it is not possible to know in general which of two given infeasible solutions is closer to the optimum or should be operated upon or kept in the population. One can have $f(x_1) > f(x_2)$ and $v(x_1) = v(x_2)$, or $f(x_1) = f(x_2)$ and $v(x_1) > v(x_2)$, and still have $x_1$ closer to the optimum. Figure 1.1 illustrates these situations.

Fig. 1.1 Illustration of situations in which $x_1$ is closer to the optimum ($x^*$) than $x_2$ even when: (a) $f(x_1) = f(x_2)$ and $v(x_1) > v(x_2)$; or (b) $f(x_1) > f(x_2)$ and $v(x_1) = v(x_2)$

1.3 A Taxonomy

In order to organize the large number of penalty techniques available in the literature,

Coello (2002) proposed the following taxonomy: (a) static penalty, (b) dynamic

penalty, (c) annealing penalty, (d) adaptive penalty, (e) co-evolutionary penalty, and

(f) death penalty. We think however that the general definitions proposed by Eiben and

Smith (2003) with respect to the way strategy parameters are set within metaheuristics

in general and evolutionary algorithms in particular can be naturally adopted here.

Beyond the simplest static case where strategy parameters are defined by the user

and remain fixed during the run, dynamic schemes have been also used where an

exogenous schedule is proposed in order to define the strategy parameters at any

given point in the search process. It is easy to see that if setting fixed parameters is

not trivial, defining the way they should vary during the run seems to be even harder.

It is also felt that such strategy parameters should not be defined before the run but

rather vary according to what is actually happening in the search process. This gives

rise to the so-called adaptive techniques, where feedback from the search process is

used to define the current strategy parameters.
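The three ways of setting a strategy parameter can be contrasted in a few lines. The sketch below is only an illustration of the categories: the geometric schedule and the feedback rule based on the fraction of feasible individuals are hypothetical choices, and the names `growth`, `target`, and `step` are assumptions rather than parameters from the surveyed papers.

```python
def static_coefficient(k0, generation):
    """Static: the user-defined value stays fixed during the whole run."""
    return k0


def dynamic_coefficient(k0, generation, growth=1.1):
    """Dynamic: an exogenous schedule, here a hypothetical geometric growth."""
    return k0 * growth ** generation


def adaptive_coefficient(k_current, feasible_fraction, target=0.5, step=1.5):
    """Adaptive: feedback from the search (fraction of feasible individuals)
    drives the update. The rule itself is a made-up example."""
    if feasible_fraction < target:
        return k_current * step   # too few feasible individuals: penalize more
    if feasible_fraction > target:
        return k_current / step   # plenty of feasible individuals: relax
    return k_current


print(static_coefficient(10.0, generation=5))
print(dynamic_coefficient(10.0, generation=2, growth=2.0))
print(adaptive_coefficient(10.0, feasible_fraction=0.2, step=2.0))
```

Only the adaptive variant takes information about the current population as input; the other two depend on nothing but the initial value and the generation counter.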

From the reasoning above, the death penalty can be included as a particular case

of static penalty, and the annealing penalty can be seen as a dynamic penalty scheme.

Co-evolutionary penalty techniques are considered in Sect. 1.5.2.

It should be noted here that the design of the adaptive mechanisms mentioned

above often involves meta-parameters or, at least, implicit design choices. The rationale here is that such meta-parameters should be easier to set appropriately, preferably fixed by the designer, with no posterior user intervention required. However,

the parameter setting in some adaptive techniques can be as hard as in the case of the

static ones (Coello 2002), contradicting the main objective of the adaptive penalty

methods.

Finally, an even more ambitious proposal can be found in the literature: the self-adaptive schemes. In this case, strategy parameters are coded together with the

candidate solution, and conditions are created so that the evolutionary algorithm

not only evolves increasingly better solutions but also better adapted strategy parameters. With this increasing sophistication in the design of the algorithms one not

only seeks to improve performance but also to relieve the user from the task of

strategy parameter setting and control.

However, as will be shown in the next section, another possibility, which has not

been contemplated in the taxonomy considered above, can be found in the literature

for the task of automatically setting strategy parameters. The idea is to maintain an

additional population with the task of co-evolving such strategy parameters (here

penalty coefficients) along with the standard population evolving the solutions to the

constrained optimization problem at hand.

In this section some selected papers from the literature are reviewed in order to

provide an overview of the diversity of techniques proposed for automatically setting

parameters involved in the various penalty schemes for constrained optimization.

Such techniques not only intend to relieve the user from the task of parameter setting

for each new application but also to improve the final performance in the case at hand

by adapting the values of those parameters along the search process in a principled

way. Table 1.1 presents a summary of the adaptive penalty techniques cited in this

section. Some references are not included in the table as their work extends a previous

one but do not require any additional information.

The main lines of reasoning have been identified and a few representative

proposals of each line have been grouped together in the following subsections.

A procedure where the penalty parameters change according to information gathered during the evolution process was proposed by Bean and Alouane (1992). The fitness function is again given by (1.1) but with the penalty parameter $k = \lambda(t)$ adapted at each generation by the following rules:

$$\lambda(t+1) = \begin{cases} \frac{1}{\beta_1} \lambda(t), & \text{if } b^i \in \mathcal{F} \text{ for all } t - g + 1 \le i \le t \\ \beta_2 \lambda(t), & \text{if } b^i \notin \mathcal{F} \text{ for all } t - g + 1 \le i \le t \\ \lambda(t), & \text{otherwise} \end{cases}$$

where $b^i$ is the best element at generation $i$ and $\beta_1, \beta_2 > 1$. In this method the penalty parameter of the next generation, $\lambda(t+1)$, decreases when all best elements in the last $g$ generations were feasible, increases if all best elements were infeasible, and otherwise remains unchanged.
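The update rule of Bean and Alouane (1992) described above can be sketched as a small function. The name `update_penalty`, the default factors `beta1=2.0` and `beta2=3.0`, and the choice to leave the parameter unchanged until `g` generations of history exist are assumptions made for this illustration.

```python
def update_penalty(lam, best_was_feasible, g, beta1=2.0, beta2=3.0):
    """Return the penalty parameter for the next generation.

    `best_was_feasible` is a list of booleans, one per generation so far,
    telling whether the best individual of that generation was feasible.
    beta1, beta2 > 1 are illustrative defaults.
    """
    recent = best_was_feasible[-g:]
    if len(recent) < g:
        return lam            # assumption: not enough history yet, keep value
    if all(recent):
        return lam / beta1    # best was feasible for g generations: relax
    if not any(recent):
        return lam * beta2    # best was infeasible for g generations: tighten
    return lam                # mixed history: no change


print(update_penalty(10.0, [True, True, True], g=3))
print(update_penalty(10.0, [False, False, False], g=3))
print(update_penalty(10.0, [True, False, True], g=3))
```

Using different factors for the decrease and the increase keeps the parameter from returning to exactly the same values after one relaxation followed by one tightening.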

Table 1.1 Summary of the adaptive penalty techniques described here

| Reference | Used information |
|---|---|
| Bean and Alouane (1992) | Feasibility of the best individual in recent generations |
| Coit et al. (1996) | Degree of infeasibility; difference between the fitnesses of the best and best feasible individuals |
| Hamida and Schoenauer (2000) | Percentage of feasible individuals; ratio between the sum of the objective function values and constraint violations |
| Nanakorn and Meesomklin (2001) | Mean of the objective function values of the feasible solutions |
| Beaser et al. (2011) | Average of the objective function values; degree of infeasibility |
| Barbosa and Lemonge (2003b); Lemonge and Barbosa (2004); Rocha and Fernandes (2009) | Average of the objective function values; average of the violation values of each constraint |
| Farmani and Wright (2003) | Normalized violation values; objective function value of the worst solution |
| Lin and Wu (2004) | Percentage of feasible solutions with respect to each constraint; rate between the objective function value and a given constraint violation; fitness of the best solution; number of objective function evaluations; difference between the medians of the objective function values of feasible and infeasible solutions; ratio of the previous value and the median of the constraint violations |
| Tessema and Yen (2006, 2009) | Percentage of feasible solutions; average of the normalized constraint violation values; normalized objective function value |
| Wang et al. (2009) | Degree of infeasibility; percentage of feasible solutions |
| Gan et al. (2010) | Percentage of feasible solutions |
| Costa et al. (2013) | Degree of infeasibility; objective function value of the worst solution; constraint violation of the equality constraints for the best solution |
| Vincenti et al. (2010); Montemurro et al. (2013) | Objective function value of the best feasible solution; objective function value of the best infeasible solution; difference between the two previous values; ratio between the previous difference and the violation value of each constraint |

The method proposed by Coit et al. (1996) uses the fitness function $F(x)$ written as

$$
F(x) = f(x) + (F_{feas} - F_{all}) \sum_{j=1}^{m} \left( \frac{d_j(x, \mathcal{F})}{NFT_j} \right)^{K_j}
$$

where $f(x)$ is the unpenalized objective function for the solution $x$, $F_{all}$ corresponds to the best solution already found, $F_{feas}$ corresponds to the best feasible solution already found, and $d_j(x, \mathcal{F})$ returns the distance between $x$ and the feasible region (which is problem dependent). $K_j$ and $NFT_j$, the near-feasible threshold of the $j$th constraint, are user-defined parameters.
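The penalized fitness of Coit et al. (1996) can be sketched as follows, assuming the per-constraint distances to the feasible region have already been computed by a problem-specific routine. The function and argument names are illustrative assumptions.

```python
def coit_fitness(f_x, distances, nft, k, f_feas, f_all):
    """Penalized fitness in the style of Coit et al. (1996).

    f_x       -- unpenalized objective value of the solution
    distances -- distance to feasibility for each constraint (0 if satisfied)
    nft       -- near-feasible threshold for each constraint
    k         -- exponent for each constraint
    f_feas    -- value of the best feasible solution found so far
    f_all     -- value of the best solution found so far
    """
    penalty = sum((d / n) ** kj for d, n, kj in zip(distances, nft, k))
    return f_x + (f_feas - f_all) * penalty
```

The penalty vanishes for a feasible solution, and it also vanishes for every solution while the best solution found so far is itself feasible, since the two reference values then coincide.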

Rasheed (1998) proposed an adaptive penalty approach for handling constraints

within a GA. The strategy required the user to set a relatively small penalty parameter

and then it would increase or decrease it on demand as the optimization progresses.

The method was tested in a realistic continuous-variable conceptual design of a

supersonic transport aircraft, and the design of supersonic missile inlets, as well as

in benchmark engineering problems. The fitness of each individual was based on the

sum of an adequate measure of merit computed by a simulator (such as the take-off

mass of an aircraft). If the fitness value is between $V$ and $10V$, where $V$ is a power of 10, the penalty coefficient starts with the value $V/100$. The proposed algorithm featured two points: (i) the individual that has the least sum of constraint violations and (ii) the individual that has the best fitness value. The penalty coefficient is considered

adequate if both individuals are the same and otherwise the penalty coefficient is

increased to make the two solutions have equal fitness values. The author concluded

that the idea of starting with a relatively small initial penalty coefficient and increasing

it or decreasing it on demand proved to be very good in the computational experiments

conducted.

Hamida and Schoenauer (2000) proposed an adaptive scheme named Adaptive

Segregational Constraint Handling Evolutionary Algorithm (ASCHEA) employing:

(i) a function of the proportion of feasible individuals in the population; (ii) a seduction/selection strategy to mate feasible and infeasible individuals applying a specific

feasibility-oriented selection operator, and (iii) a selection scheme to give advantage

for a given number of feasible individuals. The ASCHEA algorithm was improved

(Hamida and Schoenauer 2002) by considering a niching technique with adaptive

radius to handle multimodal functions and also (i) a segregational selection that distinguishes between feasible and infeasible individuals, (ii) a constraint-driven recombination, where in some cases feasible individuals can only mate with infeasible ones,

and (iii) a population-based adaptive penalty method that uses global information

on the population to adjust the penalty coefficients. Hamida and Schoenauer (2002)

proposed the following penalty function:

$$
P(x) = \sum_{j=1}^{m} \alpha_j v_j(x) \qquad (1.5)
$$

where $\alpha_j$ is adapted as

$$
\alpha_j(t+1) =
\begin{cases}
\alpha_j(t) / fact, & \text{if } \tau_t(j) > \tau_{target} \\
\alpha_j(t) \times fact, & \text{otherwise}
\end{cases}
\qquad (1.6)
$$

where $fact > 1$ and $\tau_{target}$ are to be defined by the user (although the authors suggest $\tau_{target} = 0.5$), and $\tau_t(j)$ is the proportion of individuals which do not violate the $j$th constraint. The idea is to have feasible and infeasible individuals on both sides of the corresponding boundary. The initial values $\alpha_j(0)$ of the adapted parameters are computed using the first population, trying to balance objective function and constraint violations:

$$
\alpha_j(0) =
\begin{cases}
1, & \text{if } \sum_{i=1}^{n} v_j(x_i) = 0 \\
\dfrac{\sum_{i=1}^{n} |f(x_i)|}{\sum_{i=1}^{n} |v_j(x_i)|} \times 100, & \text{otherwise}
\end{cases}
\qquad (1.7)
$$
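A minimal sketch of the ASCHEA coefficient handling described above, for a single constraint. The function names and the default value of `fact` are illustrative assumptions; only the target proportion 0.5 is the value suggested by the authors.

```python
def initial_coefficient(f_values, v_values):
    """Initial penalty coefficient of one constraint from the first population.

    f_values -- objective values of all individuals
    v_values -- violations of this constraint for all individuals
    """
    total_violation = sum(abs(v) for v in v_values)
    if total_violation == 0:
        return 1.0
    return 100.0 * sum(abs(f) for f in f_values) / total_violation


def update_coefficient(coeff, feasible_fraction, target=0.5, fact=1.1):
    """Decrease the coefficient when more than `target` of the population
    satisfies the constraint, increase it otherwise (fact must exceed 1)."""
    if feasible_fraction > target:
        return coeff / fact
    return coeff * fact
```

Each constraint carries its own coefficient, so in practice both functions would be called once per constraint at every generation.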

The early proposals reviewed here were not able in general to adequately deal with

the problem, suggesting that more information from the search process, at the price

of added complexity, was required.

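Nanakorn and Meesomklin (2001) proposed an adaptive penalty function for a GA that is able to adjust itself during the evolutionary process. According to that method, the penalty is such that

$$
F(x) \le \varphi(t)\, f_{avg} \quad \text{for } \forall x \notin \mathcal{F} \qquad (1.8)
$$

where $f_{avg}$ represents the average fitness value of all feasible individuals in the current generation and $\varphi(t)$ depends on $f_{avg}$. Thus, the fitness function is defined as

$$
F(x) = f(x) - \lambda(t) E(x) \qquad (1.9)
$$

where

$$
E(x) = \sum_{i=1}^{m} v_i(x) \qquad (1.10)
$$

and the adaptive parameter $\lambda(t)$ is given by

$$
\lambda(t) = \max\left\{ 0,\ \max_{x \notin \mathcal{F}} \left[ \frac{f(x) - \varphi(t) f_{avg}}{E(x)} \right] \right\}. \qquad (1.11)
$$

The function $\varphi(t)$ is defined according to the user defined parameter $\phi$. If $\phi \ge 1$ then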

(t) =

(C 1) favg

(1.12)

where C is a user defined parameter which is the maximum scaled fitness value

assigned to the best feasible member. The scaled fitness values are used only in the

selection procedure and will not be described here.

Otherwise (if $\phi < 1$), $\varphi(t)$ is defined by an iterative process which is initialized with $\varphi(t) = 1$ and is repeated until the value of $\varphi(t)$ no longer changes. The steps of the procedure are

(i) to calculate $\lambda(t)$ by means of Eq. (1.11)
(ii) to evaluate the candidate solutions according to Eq. (1.9)
(iii) to obtain $x_{min}$ and $x_\lambda$, where $F_{min} = F(x_{min})$ is the minimum value of $F$ and $x_\lambda$ is the candidate solution that leads to

$$
\lambda(t) = \frac{F(x_\lambda) - \varphi(t) f_{avg}}{E(x_\lambda)} \qquad (1.13)
$$

(iv) to update $\varphi(t)$ as

$$
\varphi(t) = \frac{(\phi - 1)\, E(x_{min})\, F(x_\lambda) + E(x_\lambda) \left[ F(x_{min}) + \phi f_{avg} - \phi F(x_{min}) \right]}{f_{avg} \left[ E(x_\lambda) + (\phi - 1)\, E(x_{min}) \right]} \qquad (1.14)
$$
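The penalty factor of Eq. (1.11) can be sketched as below for a maximization problem, assuming the bound multiplier (called `phi_t` here) has already been fixed. All names are illustrative assumptions; the sketch covers only the computation of the factor, not the iterative procedure.

```python
def penalty_factor(f_infeasible, e_infeasible, phi_t, f_avg):
    """Smallest non-negative factor such that every penalized infeasible
    fitness f - factor * E stays at or below phi_t * f_avg.

    f_infeasible -- objective values of the infeasible individuals
    e_infeasible -- their (strictly positive) total constraint violations
    phi_t        -- bound multiplier for the current generation
    f_avg        -- average fitness of the feasible individuals
    """
    if not f_infeasible:
        return 0.0
    ratios = [(f - phi_t * f_avg) / e
              for f, e in zip(f_infeasible, e_infeasible)]
    return max(0.0, max(ratios))
```

For example, with infeasible objective values 12 and 8, violations 2 and 4, `phi_t = 1` and `f_avg = 10`, the factor is 1, which brings the first individual exactly to the bound of 10.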

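Beaser et al. (2011) updated the adaptive penalty function theory proposed by Nanakorn and Meesomklin (2001), expanding its validity beyond maximization problems to minimization as well. The expanded technique, using a hybrid genetic algorithm, was applied to a problem in chemistry.

The first modification was introduced in Eq. (1.8):

$$
F(x) \ge \varphi(t)\, f_{avg} \quad \text{for } \forall x \notin \mathcal{F} \qquad (1.15)
$$

and, correspondingly, the adaptive parameter of Eq. (1.11) becomes

$$
\lambda(t) = \min\left\{ 0,\ \min_{x \notin \mathcal{F}} \left[ \frac{f(x) - \varphi(t) f_{avg}}{E(x)} \right] \right\} \qquad (1.16)
$$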

in the form of an adaptive penalty function. The method decides which individual

is maintained in a Pareto optimal set and decides which individuals are going to be

replaced. The fitness function in this strategy is written as usual:

F(x) = f (x) + C G(x).

(1.17)


population and C is designed as a function of rf , i.e., C(rf ), and two basic rules

need to be satisfied: (1) It should be a decreasing function, because the coefficient C

decreases as rf increases and, (2) When rf varies from 0 to 1, C decreases sharply

from a large number at the early stage, and decreases slowly to a small number at

the late stage. The reason is that, with rf increasing (it means that there are more

and more feasible solutions in the population), the search emphasis should shift from

low constraint violations to good objective function values quickly. The proposed

function that satisfies these two rules is expressed as $C(r_f) = 10^{\alpha (1 - r_f)}$, where $\alpha$ is a positive constant coefficient to be adjusted, and the fitness function is rewritten as

$$
F(x) = f(x) + 10^{\alpha (1 - r_f)}\, G(x) \qquad (1.18)
$$

Besides, two properties are established: (1) the fitness assignment maps the two-dimensional vector into the real number space: in this way, it is possible to

compare the solutions in the Pareto optimal set, selecting which one is preferable

and (2) the penalty coefficient C varies with the feasibility proportion of the current

population and, if there are no feasible solutions in the population, this parameter

will receive a relatively large value in order to guide the population in the direction

of the feasible space.
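A minimal sketch of this feasibility-driven coefficient, assuming the feasibility proportion of the population and the total constraint violation of the individual are already available. The names `rf`, `alpha`, and both function names are illustrative assumptions.

```python
def penalty_coefficient(rf, alpha):
    """Coefficient 10 ** (alpha * (1 - rf)): large when few individuals
    are feasible (rf near 0), equal to 1 when all are feasible (rf = 1)."""
    return 10.0 ** (alpha * (1.0 - rf))


def penalized_fitness(f_x, g_x, rf, alpha):
    """Objective value plus the weighted total constraint violation g_x."""
    return f_x + penalty_coefficient(rf, alpha) * g_x
```

With `alpha = 2`, the coefficient falls from 100 for a fully infeasible population to 1 for a fully feasible one, so the emphasis moves from constraint violation to the objective function as feasible solutions accumulate.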

The common need for user-defined parameters together with the difficulty of

finding adequate parameter values for each new application pointed the way to the

challenge of designing penalty techniques which do not require such parameters.

A parameterless adaptive penalty scheme for GAs was proposed by Barbosa and

Lemonge (2003b), which does not require the knowledge of the explicit form of the

constraints as a function of the design variables and is free of parameters to be set

by the user. In contrast with other approaches where a single penalty parameter is

used for all constraints, an adaptive scheme automatically sizes the penalty parameter

corresponding to each constraint along the evolutionary process. The fitness function

proposed is written as

$$
F(x) =
\begin{cases}
f(x), & \text{if } x \text{ is feasible,} \\
f(x) + \sum_{j=1}^{m} k_j v_j(x), & \text{otherwise}
\end{cases}
\qquad (1.19)
$$

The penalty parameter is defined at each generation by

$$
k_j = |\langle f(x) \rangle| \, \frac{\langle v_j(x) \rangle}{\sum_{l=1}^{m} [\langle v_l(x) \rangle]^2} \qquad (1.20)
$$

where $\langle f(x) \rangle$ is the average of the objective function values in the current population and $\langle v_l(x) \rangle$ is the violation of the $l$th constraint averaged over the current population. The idea is that the values of the penalty coefficients should be distributed in a way that those constraints that are more difficult to be satisfied should have a relatively higher penalty coefficient.

With the proposed definition one can prove the following property: an individual whose $j$th violation equals the average of the $j$th violation in the current population, for all $j$, has a penalty equal to the absolute value of the average of the objective function values in the population.
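The coefficients of Eq. (1.20) and the property just stated can be sketched as follows, assuming violations are stored as one list per individual with one entry per constraint. The function names and data layout are illustrative assumptions.

```python
def apm_coefficients(objective_values, violations):
    """Penalty coefficient for each constraint from population averages.

    objective_values -- objective value of each individual
    violations       -- violations[i][j] is the violation of constraint j
                        by individual i (0 when satisfied)
    """
    n = len(objective_values)
    m = len(violations[0])
    f_mean = sum(objective_values) / n
    v_mean = [sum(ind[j] for ind in violations) / n for j in range(m)]
    denom = sum(v * v for v in v_mean)
    if denom == 0:
        return [0.0] * m        # nobody violates anything: no penalty needed
    return [abs(f_mean) * v / denom for v in v_mean]


def apm_fitness(f_x, v_x, k):
    """Objective value for feasible individuals, penalized value otherwise."""
    if all(v == 0 for v in v_x):
        return f_x
    return f_x + sum(kj * vj for kj, vj in zip(k, v_x))
```

For a population with objective values 2 and 4 and violations (1, 0) and (3, 2), the average violations are (2, 1) and the coefficients are (1.2, 0.6). An individual with exactly the average violations then receives a penalty of 3, the absolute value of the average objective.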

The performance of the APM was examined using test problems from the evolutionary computation literature as well as structural engineering constrained optimization problems, but the algorithm presented difficulties in solving some benchmark problems, for example, the functions G2, G6, G7 and G10 proposed by Michalewicz and Schoenauer (1996). That was improved in the conference paper (Barbosa and Lemonge 2002), where $f(x)$ in the definition of the objective function of the infeasible individuals in Eq. (1.19) was changed to

$$
\bar{f}(x) =
\begin{cases}
f(x), & \text{if } f(x) > \langle f(x) \rangle \\
\langle f(x) \rangle, & \text{otherwise}
\end{cases}
\qquad (1.21)
$$

and $\langle f(x) \rangle$ is the average of the objective function values in the current population. The new version was tested (Lemonge and Barbosa 2004) in benchmark engineering optimization problems and in the G-Suite (Michalewicz and Schoenauer 1996) with a more robust performance.

The procedure proposed by Barbosa and Lemonge (2002), originally conceived

for a generational GA, was extended to the case of a steady-state GA (Barbosa and

Lemonge 2003a), where, in each generation, usually only one or two new individuals are introduced in the population. Substantial modifications were necessary to

obtain good results in a standard test-problem suite (Barbosa and Lemonge 2003a).

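The fitness function for an infeasible individual is now computed according to the equation:

$$
F(x) = H + \sum_{j=1}^{m} k_j v_j(x) \qquad (1.22)
$$

where $H$ is defined as

$$
H =
\begin{cases}
f(x_{worst}), & \text{if there is no feasible element in the population} \\
f(x_{bestFeasible}), & \text{otherwise}
\end{cases}
\qquad (1.23)
$$

and the penalty coefficients are redefined as

$$
k_j = H \, \frac{\langle v_j(x) \rangle}{\sum_{l=1}^{m} [\langle v_l(x) \rangle]^2} \qquad (1.24)
$$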

Also, every time a better feasible element is found (or the number of new elements

inserted into the population reaches a certain level), H is redefined and all fitness

values are recomputed. The updating of each penalty coefficient is performed in

such a way that no reduction in its value is allowed. The fitness function value is

then computed using Eqs. (1.22)–(1.24). It is clear from the definition of H in (1.23)

that if no feasible element is present in the population, one is actually minimizing

a measure of the distance of the individuals to the feasible set, since the actual

value of the objective function is not taken into account. However, when a feasible

element is found then it immediately enters the population as, after updating all

fitness values using (1.19), (1.23), and (1.24), it becomes the element with the best

fitness value.

Later, APM variants were introduced with respect to the definition of the penalty

parameter kj (Barbosa and Lemonge 2008). The APM, as originally proposed,

computes the constraint violations in the initial population, and updates all penalty

coefficients, for each constraint, after a given number of offspring is inserted in

the population. A second variant, called sporadic APM with constraint violation

accumulation, accumulates the constraint violations during a given number of insertions of new offspring in the population, updates the penalty coefficients, and keeps

the penalty coefficients for the next generations. The APM with monotonic penalty

coefficients is the third variant, where the penalty coefficients are calculated as in

the original method, but no penalty coefficient is allowed to have its value reduced

along the evolutionary process. Finally, the penalty coefficients are defined by using

a weighted average between the previous value of a coefficient and the new value

predicted by the method. This variant is called the APM with damping. Besides that,

these variants of the APM were extended to the steady-state GA and presented in

Lemonge et al. (2012).
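Two of these variants differ from the original method only in how a freshly computed coefficient vector is combined with the previous one, which can be sketched as below. The function names and the weight `theta` are illustrative assumptions; the source does not give the weight used for damping.

```python
def monotonic_update(k_old, k_new):
    """Monotonic variant: a coefficient is never allowed to decrease."""
    return [max(old, new) for old, new in zip(k_old, k_new)]


def damped_update(k_old, k_new, theta=0.5):
    """Damping variant: weighted average of the previous value and the
    newly predicted value (theta is the weight given to the new value)."""
    return [theta * new + (1.0 - theta) * old
            for old, new in zip(k_old, k_new)]
```

In both cases `k_new` would be the vector produced by the original rule for the current population.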

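Rocha and Fernandes (2009) proposed alternative expressions for the APM penalty coefficients

$$
k_j = \left| \sum_{i=1}^{pop} f(x^i) \right| \frac{\sum_{i=1}^{pop} v_j(x^i)}{\sum_{k=1}^{m} \sum_{i=1}^{pop} v_k(x^i)}
$$

and also

$$
k_j = \left| \sum_{i=1}^{pop} f(x^i) \right| \left[ \exp\left( \frac{\sum_{i=1}^{pop} v_j(x^i)^{l}}{\sum_{k=1}^{m} \sum_{i=1}^{pop} v_k(x^i)} \right) - 1 \right]
$$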

Farmani and Wright (2003) introduced a parameterless adaptive technique that uses information about the degree of infeasibility of solutions, written as

$$
u(x) = \frac{1}{m} \sum_{j=1}^{m} \frac{v_j(x)}{v_j^{max}} \qquad (1.25)
$$

where $m$ is the total number of inequality and equality constraints, and $v_j^{max}$ is the maximum value of the $j$th violation in the current population. The worst infeasible solution $x_{worst}$ is selected by comparing all infeasible individuals against the best individual $x_{best}$. Two potential population distributions exist in relation to this: (i) if one or more of the infeasible solutions have an objective function value lower than $f(x_{best})$, then $x_{worst}$ is taken as the infeasible solution having the highest infeasibility value and an objective function value lower than $f(x_{best})$; if more than one individual exists with the same highest degree of infeasibility, then $x_{worst}$ is the one with the lower objective function value; and (ii) if all of the infeasible solutions have an objective function value greater than $f(x_{best})$, then $x_{worst}$ is the solution with the highest degree of infeasibility; if more than one individual has the same highest infeasibility value, then $x_{worst}$ is the one with the higher objective function value. The highest objective function value in the current population, used to penalize the infeasible individuals, is denoted $f_{max} = f(x_{max})$. The method is applied in two stages, where the first stage considers the case where one or more infeasible solutions have a lower, and potentially better, objective function value (minimization problem) than the $x_{best}$ solution and $\exists x \mid f(x) < f(x_{max}) \wedge (u(x) > 0.0)$. A linear relationship between the degrees of infeasibility of $x_{best}$ and $x_{worst}$ is considered as

$$
\tilde{u}(x) = \frac{u(x) - u(x_{worst})}{u(x_{best}) - u(x_{worst})} \qquad (1.26)
$$

Thus, the fitness function $F_{1st}(x)$, in the first stage, is written as

$$
F_{1st}(x) = f(x) + \tilde{u}(x)\,\bigl(f(x_{max}) - f(x_{worst})\bigr) \qquad (1.27)
$$

The second stage increases the objective function such that the penalized objective function of the worst infeasible individual, $F_{2nd}(x)$, is equal to the worst objective function value in the population (Eqs. (1.28) and (1.29)):

$$
F_{2nd}(x) = F_{1st}(x) + \gamma\, |F_{1st}(x)|\, \frac{\exp(2.0\, \tilde{u}(x)) - 1}{\exp(2.0) - 1} \qquad (1.28)
$$

and

$$
\gamma =
\begin{cases}
\dfrac{f(x_{max}) - f(x_{best})}{f(x_{best})}, & \text{if } f(x_{worst}) \le f(x_{best}) \\
0, & \text{if } f(x_{worst}) = f(x_{max}) \\
\dfrac{f(x_{max}) - f(x_{worst})}{f(x_{worst})}, & \text{if } f(x_{worst}) > f(x_{best})
\end{cases}
\qquad (1.29)
$$

The scaling factor $\gamma$ is introduced to ensure that the penalized value of the worst infeasible solution is equivalent to the highest objective function value in the current population. $\gamma = 0$ (second case in Eq. (1.29)) is used when the worst infeasible individual has an objective function value equal to the highest in the population. In this case, no penalty is applied since the infeasible solutions would naturally have a low fitness and should not be penalized further. The absolute value of the fitness function is used because, in minimization problems, the objective function may take negative values.
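Two building blocks of this technique can be sketched as follows: the degree of infeasibility of Eq. (1.25) and the exponential second-stage scaling of Eq. (1.28). The scaling factor is passed in as `gamma` and the normalized degree of infeasibility as `u_norm`; these names, and the treatment of a constraint that no individual violates, are assumptions made for the illustration.

```python
import math


def degree_of_infeasibility(violations, max_violations):
    """Average of the violations, each scaled by the largest violation of
    that constraint in the current population."""
    m = len(violations)
    total = 0.0
    for v, v_max in zip(violations, max_violations):
        if v_max > 0:           # a constraint nobody violates adds nothing
            total += v / v_max
    return total / m


def second_stage(f_first, u_norm, gamma):
    """Add an exponentially scaled penalty to the first-stage fitness.

    The scale is 0 for u_norm = 0 and 1 for u_norm = 1."""
    scale = (math.exp(2.0 * u_norm) - 1.0) / (math.exp(2.0) - 1.0)
    return f_first + gamma * abs(f_first) * scale
```

Because the penalty is proportional to the absolute value of the first-stage fitness, it stays positive even when that fitness is negative.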

A self-organizing adaptive penalty strategy (SOAPS) is presented in Lin and Wu

(2004) featuring the following aspects: (1) The values of penalty parameters are

automatically determined according to the population distribution; (2) The penalty

parameter for each constraint is independently determined; (3) The objective and

constraint functions are automatically normalized; (4) No parameters need to be

defined by the user; (5) Solutions are maintained evenly distributed in both feasible

and infeasible parts of each constraint. The pseudo objective function defined by

the proposed algorithm is given as

$$
F(x) = f(x) + P(x) \qquad (1.30)
$$

where

$$
P(x) = \frac{100 + t}{100} \times \frac{1}{p + 2q} \times \sum_{j=1}^{m} r_j^{t} \times v_j(x) \qquad (1.31)
$$

where $t$ is the generation, $r_j^t$ is the penalty parameter for the $j$th constraint at generation $t$, and $p$ and $q$ are the number of inequality and equality constraints, respectively. The penalty parameter $r_j^t$ for the $j$th constraint at the $t$th generation is set as

$$
r_j^{t} = r_j^{t-1} \times \left( 1 - \frac{\tau_t(j) - 0.5}{5} \right), \quad t \ge 1 \qquad (1.32)
$$

where $\tau_t(j)$ is the percentage of feasible solutions with respect to the $j$th constraint at the $t$th generation. This parameter will be adapted during the evolutionary process and its initial value is set as

$$
r_j^{0} = \frac{QR^{1}_{obj}}{QR^{1}_{con_j}} \qquad (1.33)
$$

where $QR^{1}_{obj}$ and $QR^{1}_{con_j}$ are the interquartile ranges of the objective function and the $j$th constraint function values, respectively, in the initial population.
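A minimal sketch of the SOAPS penalty of Eqs. (1.31) and (1.32), read here as a multiplicative update that leaves the penalty parameter unchanged when exactly half of the population satisfies the constraint. The function names are illustrative assumptions, and the feasible proportion is taken as a fraction in [0, 1].

```python
def update_penalty_parameter(r_prev, feasible_fraction):
    """Shrink the parameter when more than half of the population satisfies
    the constraint, enlarge it when less than half does."""
    return r_prev * (1.0 - (feasible_fraction - 0.5) / 5.0)


def soaps_penalty(violations, r, t, p, q):
    """Penalty term at generation t.

    violations -- violation of each constraint by the individual
    r          -- current penalty parameter of each constraint
    p, q       -- number of inequality and equality constraints
    """
    weighted = sum(rj * vj for rj, vj in zip(r, violations))
    return (100.0 + t) / 100.0 * weighted / (p + 2 * q)
```

The update pushes the population toward an even split between feasible and infeasible solutions for each constraint, while the factor depending on the generation slowly raises the overall pressure toward feasibility.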

Although the proposed algorithm performed satisfactorily on constrained optimization problems with inequality constraints, it had difficulties in solving problems

with equality constraints. The authors presented in the same paper (Wu and Lin

2004) a modification (with added complexity) of the first version of the algorithm.

They detected that the initial penalty parameter for a constraint may become undesirably large due to the poor initial population distribution. A sensitivity analysis of


the parameter rj0 was done by the authors and they concluded that enlarged penalties

undesirably occur because solutions with these unexpected large constraint violations

are not evenly sampled in the initial population. The value for F(x) in the second

generation of SOAPS is written as

$$
F(x) =
\begin{cases}
f(x), & \text{if } x \in \mathcal{F} \\
f(x) \times (1 - r_{GEN}) + F_{BASE} \times r_{GEN} + P(x), & \text{otherwise}
\end{cases}
\qquad (1.34)
$$

where $F_{BASE}$ means the minimum value of all feasible solutions or, in the absence of them, the value of the infeasible solution with the smallest amount of constraint violation. The value of $r_{GEN}$ is given by the number of function evaluations performed so far divided by the total number of function evaluations. The expression for $P(x)$ is

$$
P(x) = \sum_{j} r_j \times v_j(x) \qquad (1.35)
$$

with the initial penalty parameter given by

$$
r_j^{0} =
\begin{cases}
\dfrac{med^{1}_{obj,feas_j} - med^{1}_{obj,infeas_j}}{med^{1}_{con_j}}, & \text{if } med^{1}_{obj,feas_j} \ge med^{1}_{obj,infeas_j} \\
0.5 \times \dfrac{med^{1}_{obj,infeas_j} - med^{1}_{obj,feas_j}}{med^{1}_{con_j}}, & \text{otherwise}
\end{cases}
\qquad (1.36)
$$

where $med^{1}_{obj,feas_j}$ is the median of the objective function value of the feasible solutions, and $med^{1}_{obj,infeas_j}$ is the median of all infeasible solutions with respect to the $j$th constraint, in the initial population. The value $med^{1}_{con_j}$ represents the median of all constraint violations of the $j$th constraint in the initial population. The value of $med_{obj,feas}$, used in Eq. (1.36), is written as

$$
med_{obj,feas} = med_{\Phi,feas} = med_{\Phi,infeas} = med_{obj,infeas} + r \times med_{con} \qquad (1.37)
$$

where $med_{\Phi,feas}$ is the median of the pseudo-objective function values of feasible designs, and $med_{\Phi,infeas}$ is the median of the pseudo-objective function values of infeasible designs. The latter, $med_{\Phi,infeas}$, consists of $med_{obj,infeas}$, the median of objective function values of all infeasible designs, and $med_{con}$, the median of constraint violations of all infeasible designs. The second generation of SOAPS was

tested in two numerical illustrative problems and one engineering problem.

Tessema and Yen (2006) proposed an adaptive penalty function for solving constrained optimization problems using a GA. A new fitness value, called distance

value, in the normalized fitness-constraint violation space, and two penalty values

are applied to infeasible individuals so that the algorithm would be able to identify the best infeasible individuals in the current population. The performance of the

algorithm was tested on the G1 to G13 test-problems and the algorithm was considered

able to find competitive results when compared with others from the literature.


In Tessema and Yen (2009), an algorithm that aims to exploit infeasible individuals

with low objective value and low constraint violation was proposed. The fraction

of feasible individuals in the population is used to guide the search process either

toward finding more feasible individuals or searching for the optimum solution. The

objective function of all individuals in the current population will be evaluated first,

and the smallest and the largest values will be identified as fmin and fmax , respectively.

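The fitness function of each individual is normalized as

$$
\tilde{f}(x) = \frac{f(x) - f_{min}}{f_{max} - f_{min}} \qquad (1.38)
$$

The normalized constraint violation $u(x)$ is computed as in Eq. (1.25) and the modified fitness function is then written as

$$
F(x) =
\begin{cases}
\tilde{f}(x), & \text{if } x \text{ is feasible} \\
u(x), & \text{if there is no feasible ind.} \\
\sqrt{\tilde{f}(x)^2 + u(x)^2} + \left[ (1 - r_f)\, u(x) + r_f\, \tilde{f}(x) \right], & \text{otherwise}
\end{cases}
$$

where $r_f \in [0, 1]$ is the fraction of feasible individuals in the population, and $u(x)$ is the average of the normalized violations ($v_j(x)$).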

A hybrid evolutionary algorithm and an adaptive constraint-handling technique are presented by Wang et al. (2009). The hybrid evolutionary algorithm simultaneously uses simplex crossover and two mutation operators to generate the offspring population. The proposed method distinguishes three situations of the population: (1) a population that contains only infeasible solutions (infeasible situation), (2) a population that contains feasible and infeasible solutions (semi-feasible situation), and (3) a population that contains only feasible solutions (feasible situation). Denoting $G(x) = \sum_{j=1}^{m} G_j(x)$ as the degree of constraint violation of the individual $x$, one has

1. Infeasible situation: the constrained optimization problem is treated as a constraint satisfaction problem. Thus, finding feasible solutions is the most important objective in this situation. To achieve this, only the constraint violations $G(x)$ of the individuals in the population are considered, and the objective function $f(x)$ is disregarded completely. First, the individuals in the parent population are ranked based on their constraint violations in ascending order, and then the individuals with the least constraint violations are selected and form the offspring population.

2. Semi-feasible situation: the population is divided into the feasible group K1 and

the infeasible group K2 . After that, the best feasible xbest and the worst feasible

solutions xworst are identified from the feasible group K1 . Then, the objective

function $f(x)$ of a candidate solution is written as

$$
f'(x_i) =
\begin{cases}
f(x_i), & \text{if } x_i \in K_1 \\
\max\{\varphi f(x_{best}) + (1 - \varphi) f(x_{worst}),\ f(x_i)\}, & \text{if } x_i \in K_2
\end{cases}
\qquad (1.39)
$$

where $\varphi$ is the proportion of feasible solutions in the population. The normalized objective function $\tilde{f}$ is obtained using Eq. (1.38). Also, the normalized constraints are written as

$$
\tilde{G}(x_i) =
\begin{cases}
0, & \text{if } x_i \in K_1 \\
\dfrac{G(x_i) - \min_{x \in K_2} G(x)}{\max_{x \in K_2} G(x) - \min_{x \in K_2} G(x)}, & \text{if } x_i \in K_2
\end{cases}
\qquad (1.40)
$$

If only one infeasible solution appears in the population, the normalized constraint violation $\tilde{G}$ of such an individual will always be equal to 0. To avoid this, the normalized constraint violation $\tilde{G}$ of such an individual is set to a value uniformly chosen between 0 and 1. The fitness function is defined by adding the normalized objective function values and constraint violations:

$$
F(x_i) = \tilde{f}(x_i) + \tilde{G}(x_i) \qquad (1.41)
$$

3. Feasible situation: in this case, the comparisons of individuals are based only on

the objective function f (x).
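The semi-feasible situation can be sketched as follows for a minimization problem. The sketch assumes that the weight mixing the best and worst feasible objective values is the proportion of feasible individuals, and that min-max normalization is applied to the converted objective values; the function name and data layout are illustrative assumptions.

```python
import random


def semi_feasible_fitness(f, G):
    """Fitness of every individual in a population that mixes feasible
    (G == 0) and infeasible (G > 0) individuals.

    f -- objective value of each individual
    G -- total constraint violation of each individual
    """
    n = len(f)
    feasible = [i for i in range(n) if G[i] == 0]
    infeasible = [i for i in range(n) if G[i] > 0]
    if not feasible or not infeasible:
        raise ValueError("needs both feasible and infeasible individuals")

    # Convert the objective of infeasible individuals so that it is never
    # better than a mix of the best and worst feasible objective values.
    phi = len(feasible) / n
    f_best = min(f[i] for i in feasible)
    f_worst = max(f[i] for i in feasible)
    bound = phi * f_best + (1.0 - phi) * f_worst
    f_conv = [f[i] if G[i] == 0 else max(bound, f[i]) for i in range(n)]

    # Min-max normalization of the converted objective values.
    f_min, f_max = min(f_conv), max(f_conv)
    f_norm = [(v - f_min) / (f_max - f_min) if f_max > f_min else 0.0
              for v in f_conv]

    # Min-max normalization of the violations over the infeasible group.
    g_min = min(G[i] for i in infeasible)
    g_max = max(G[i] for i in infeasible)
    g_norm = []
    for i in range(n):
        if G[i] == 0:
            g_norm.append(0.0)
        elif g_max > g_min:
            g_norm.append((G[i] - g_min) / (g_max - g_min))
        else:
            # Degenerate range, e.g. a single infeasible individual:
            # draw the normalized violation uniformly at random.
            g_norm.append(random.random())

    return [a + b for a, b in zip(f_norm, g_norm)]
```

For objective values (1, 3, 0.5, 5) and violations (0, 0, 2, 4), the third individual's attractive objective value 0.5 is first raised to the bound 2, so it no longer dominates the best feasible individual after normalization.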

Costa et al. (2013) proposed an adaptive constraint handling technique where the

fitness function of an infeasible individual is defined as

$$
F(x) = f_{max} + \sum_{j=1}^{m} v_j(x) \qquad (1.42)
$$

and $v_j(x)$ is defined as in Eq. (1.3). An adaptive tolerance was introduced in order to handle equality constraints. An initial tolerance $\epsilon_0$ is defined and it is adaptively updated along the evolutionary process, with a given periodicity in generations, according to the expression:

$$
\epsilon_{k+1} = \alpha\, \epsilon_k + (1 - \alpha)\, \| C_{best} \|_2 \qquad (1.43)
$$

where $\alpha$ is a smoothing factor, $C_{best}$ is the vector of equality constraints for the best point in the population, and $\| \cdot \|_2$ is the Euclidean norm.
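The tolerance update can be sketched as an exponential smoothing between the current tolerance and the Euclidean norm of the equality constraint values at the best point. The function name and the default smoothing factor are illustrative assumptions.

```python
import math


def update_tolerance(eps, c_best, alpha=0.5):
    """Blend the current tolerance with the norm of the equality
    constraint values of the best individual.

    eps    -- current tolerance
    c_best -- equality constraint values at the best point
    alpha  -- smoothing factor in [0, 1] (illustrative default)
    """
    norm = math.sqrt(sum(c * c for c in c_best))
    return alpha * eps + (1.0 - alpha) * norm
```

As the best point approaches satisfaction of the equality constraints, the norm shrinks and the tolerance follows it, at a rate controlled by the smoothing factor.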

A parameterless adaptive penalty technique used within a GA has been proposed

in Vincenti et al. (2010), Montemurro et al. (2013) where the basic idea is that some

good infeasible individuals (in the sense of having good objective function values)

can be useful to attract the exploration toward the boundary of the feasible domain,

as the optimum usually has some active constraints. The penalty coefficients $c_i$ and $q_j$ (for inequality and equality constraints, respectively) are computed at each generation $t$ as

$$
c_i(t) = \frac{\left| f^{F}_{best} - f^{NF}_{best} \right|}{(g_i)^{NF}_{best}}, \quad i = 1, \ldots, q \quad \text{and} \quad q_j(t) = \frac{\left| f^{F}_{best} - f^{NF}_{best} \right|}{\left| (h_j)^{NF}_{best} \right|}, \quad j = 1, \ldots, p \qquad (1.44)
$$

where the superscripts $F$ and $NF$ stand for feasible and non-feasible, respectively. $f^{F}_{best}$ and $f^{NF}_{best}$ are the values of the objective function for the best individuals within the feasible and the infeasible sides of the domain, respectively, while $(g_i)^{NF}_{best}$ and $(h_j)^{NF}_{best}$ represent the violation of inequality and equality constraints, respectively, for the best infeasible solution.

Individuals that are infeasible with respect to the $k$th constraint are grouped and ranked with respect to their objective function values: the objective function of the best individual of such a group is $f^{NF}_{best}$, while the individuals that are feasible with respect to the $k$th constraint are grouped and ranked with respect to their objective function values: the objective function of the best individual of this group is $f^{F}_{best}$. When no feasible individuals are available in the population with respect to the $k$th constraint, the population is then sorted into two groups: individuals having smaller values of the $k$th constraint violation (10 % of the population) are grouped as virtually feasible while the rest are grouped as infeasible and ranked in terms of their objective function values: the objective function of the best individual of such a group is $f^{NF}_{best}$.

It is worth noting that the definition in Eq. (1.44) forces the value of the objective

function of the best infeasible individual to be equal to that of the best feasible

individual. In the next section, further (perhaps less popular) ways of implementing

penalty techniques are briefly described.
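The equalizing property of Eq. (1.44) noted above can be checked with a small sketch for a single constraint in a minimization problem. The function name and the use of an absolute difference in the numerator are assumptions of this illustration.

```python
def penalty_coefficient(f_best_feasible, f_best_infeasible, violation):
    """Coefficient that scales the violation of the best infeasible
    individual to the objective gap between the two best individuals."""
    if violation <= 0:
        raise ValueError("the best infeasible individual must violate the constraint")
    return abs(f_best_feasible - f_best_infeasible) / violation
```

With a best feasible value of 10, a best infeasible value of 7 and a violation of 1.5, the coefficient is 2, and the penalized value of the best infeasible individual is 7 + 2 × 1.5 = 10, the same as the best feasible one.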

1.5.1 Self-adapting the Parameters

The direct implementation of a standard self-adaptive penalty technique (following

Eiben and Smith (2003)) would entail the encoding of one (or more) penalty

coefficients in the same chromosome where the candidate solution is encoded. They

are then subject to the evolutionary process, undergoing recombination and mutation

just as the problem variables in the chromosome. However, evolution would discover that the best strategy is to drive down all penalty coefficients of an individual

to zero, thus eliminating any reduction in the fitness of the corresponding candidate solution, and actually finding the solution of the unconstrained problem (Eiben and

Smith 2003).

Eiben et al. (2000) proposed a scheme to prevent EAs from cheating when

solving constraint satisfaction problems (CSPs). When solving CSPs by means of

EAs, weights are associated with each constraint to add a penalty to the individual

if that constraint is not satisfied. Changes in the weights along the run will cause

the EA to put more pressure into the satisfaction of the corresponding constraint.

Eiben et al. introduced a tournament selection that uses the maximum of each of

the weights, across all competitors, as a way to eliminate cheating in the CSP case,

without resorting to any feedback mechanism from the search process. Unfortunately, to the best of our knowledge, no strict self-adaptive technique has been applied so far to constrained optimization problems in $\mathbb{R}^n$.

Coello (2000) introduced a co-evolutionary algorithm to adapt the penalty coefficients of a fitness function in a GA with two populations P1 (size M1 ) and P2

(size M2 ). The fitness function is written as

$$
F(x) = f(x) - \left( sum\_viol(x) \times w_1 + num\_viol(x) \times w_2 \right) \qquad (1.45)
$$

where $w_1$ and $w_2$ are two (integer) penalty coefficients, and sum_viol(x) and num_viol(x) are, respectively, the sum of the violations and the number of constraints which are violated by the candidate solution $x$. The second of these populations, $P_2$, encodes the set of weight combinations ($w_1$ and $w_2$) that will be used to compute the fitness value of the candidate solutions in $P_1$; that is, $P_1$ holds the candidate solutions whereas $P_2$ contains the penalty coefficients that will be used in the fitness function evaluation. Benchmark problems

from the literature, especially mechanical engineering optimization, are used in the

numerical tests but only inequality constraints were considered in the experiments.
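A minimal sketch of a fitness of this kind, combining the total violation and the number of violated constraints with two weights. It assumes a maximization setting in which the penalty is subtracted from the objective value; the function name and argument layout are illustrative assumptions.

```python
def two_weight_fitness(f_x, violations, w1, w2):
    """Objective value minus a penalty built from the sum of the
    violations and the count of violated constraints.

    violations -- violation of each constraint (0 when satisfied)
    w1, w2     -- penalty weights taken from the second population
    """
    sum_viol = sum(v for v in violations if v > 0)
    num_viol = sum(1 for v in violations if v > 0)
    return f_x - (sum_viol * w1 + num_viol * w2)
```

In a co-evolutionary setting, each pair of weights in the second population would be scored by running the first population with this fitness for a number of generations.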

The co-evolutionary idea was also analyzed in He and Wang (2007) and He et al.

(2008). In these works, the penalty factors are adapted by a co-evolutionary particle

swarm optimization approach (CPSO). Two kinds of swarms are used in He and

Wang (2007) and He et al. (2008): one population of multiple swarms is used to

solve the search problem and other one is responsible to adapt the penalty factors.

Each particle j in the second population represents the penalty coefficients for a set

of particles in the first one. The two populations evolve by a given G1 and G2 number

of generations. The adopted fitness function is the one proposed by Richardson et al.

(1989), where not only the amount of violation contributes to the quality of a given

candidate solution but also the number of of violated constraints. According to He

and Wang (2007) and He et al. (2008),

$$F_j(x) = f(x) + \text{sum\_viol}(x)\, w_{j,1} + \text{num\_viol}(x)\, w_{j,2},$$

where f(x) is the objective function value, and $w_{j,1}$ and $w_{j,2}$ are the penalty coefficients from particle j in the second swarm population. The penalty factors $w_{j,1}$
and $w_{j,2}$ are evolved according to the following fitness:

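$$G(j) = \begin{cases} \dfrac{\text{sum\_feas}}{\text{num\_feas}} - \text{num\_feas}, & \text{if there is at least one feasible solution in the subset,} \\[2ex] \max(G_{\text{valid}}) + \dfrac{\sum_{i=1}^{pop} \text{sum\_viol}(x^i)}{\sum_{i=1}^{pop} \text{num\_viol}(x^i)} - \sum_{i=1}^{pop} \text{num\_viol}(x^i), & \text{otherwise,} \end{cases}$$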

where sum_feas denotes the sum of objective function values of feasible solutions,

num_feas is the number of feasible individuals, and max(Gvalid ) denotes the maximum

G over all valid particles; the valid particles are those ones which operate over a subset

of particles where there is at least one feasible solution.
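The two fitness functions above can be sketched in a few lines of Python. This is only an illustrative reading, not the implementation of He and Wang (2007): it assumes inequality constraints written as g(x) <= 0, it assumes that the lost operators in G(j) are the subtractions shown in the comments, and all function names as well as the toy problem are hypothetical.

```python
def sum_viol(x, constraints):
    """Sum of the violations of inequality constraints written as g(x) <= 0."""
    return sum(max(0.0, g(x)) for g in constraints)


def num_viol(x, constraints):
    """Number of constraints violated by x."""
    return sum(1 for g in constraints if g(x) > 0.0)


def penalized_fitness(x, f, constraints, w1, w2):
    """F_j(x) = f(x) + sum_viol(x) * w_{j,1} + num_viol(x) * w_{j,2}."""
    return f(x) + sum_viol(x, constraints) * w1 + num_viol(x, constraints) * w2


def coefficient_fitness(subset, f, constraints, max_g_valid):
    """G(j) for the particle whose coefficients were applied to `subset`."""
    feasible = [x for x in subset if num_viol(x, constraints) == 0]
    if feasible:
        # Assumed reading: sum_feas / num_feas - num_feas
        num_feas = len(feasible)
        sum_feas = sum(f(x) for x in feasible)
        return sum_feas / num_feas - num_feas
    # No feasible solution: every x violates at least one constraint,
    # so the denominator below is strictly positive.
    total_sum_viol = sum(sum_viol(x, constraints) for x in subset)
    total_num_viol = sum(num_viol(x, constraints) for x in subset)
    return max_g_valid + total_sum_viol / total_num_viol - total_num_viol


# Hypothetical toy problem: minimize x0^2 + x1^2 subject to x0 >= 1 and x1 <= 2.
objective = lambda x: x[0] ** 2 + x[1] ** 2
toy_constraints = [lambda x: 1.0 - x[0], lambda x: x[1] - 2.0]
```

Under this reading, a subset with many feasible solutions of low objective value receives a small G(j), so the coefficients that produced it are favoured in a minimization setting.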

It is interesting to note that, despite all the effort that has been devoted to the research

of penalty techniques in the context of nature inspired metaheuristics in the last 20

years or so, the subject still draws the attention of the researchers, and new tools are

being constantly introduced to this arena. Fuzzy logic and rough set theory are just

two recent examples that will be mentioned in the following.

Wu et al. (2001) proposed a fuzzy penalty function strategy using information

contained in individuals. The fitness function of an infeasible individual is

F(x) = f (x) + rG(x)

(1.46)

where G(x) is the amount of constraint violation from inequality and equality constraints, and r is the penalty coefficient.

f and G are taken as fuzzy variables with corresponding linguistic values such
as very large, large, small, very small, etc. The ranges for f and G are defined by
$D_f = [f_{\min}, f_{\max}]$ and $D_G = [G_{\min}, G_{\max}]$. Those ranges must then be partitioned
(which is a problem-dependent, non-trivial task) and linguistic values are associated
with each part. The sets A and B are introduced as fuzzy sets for f and G, respectively,
and $r^k$, $k = 1, \ldots, l$, is defined as a fuzzy singleton for r which is inferred from
appropriate membership functions and finally used in (1.46).

In their numerical experiments, three partitions were used for both f and G with

triangle membership functions, and five points were used for the output. The rule

base contained 9 rules in the form

If f is $A_i$ and G is $B_j$ then $r = r^k$.
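To make the rule base more concrete, the following Python sketch infers a penalty coefficient r from three triangular partitions of each range and a 3 x 3 rule table pointing to five output singletons. Everything specific here is an assumption made for illustration and is not taken from Wu et al. (2001): the evenly spaced partitions, the values in `R_POINTS`, the rule table `RULES`, the use of `min` as the rule strength, and the weighted-average defuzzification.

```python
def triangular(x, left, peak, right):
    """Triangular membership with support [left, right] and maximum at peak."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def memberships(value, lo, hi):
    """Membership degrees in three partitions (small, medium, large) of [lo, hi]."""
    value = min(max(value, lo), hi)  # clamp to the declared range
    mid = 0.5 * (lo + hi)
    width = hi - mid
    return [
        triangular(value, lo - width, lo, mid),
        triangular(value, lo, mid, hi),
        triangular(value, mid, hi, hi + width),
    ]


# Hypothetical output singletons r^k (five points) and rule table:
# RULES[i][j] is the index k used by "if f is A_i and G is B_j then r = r^k".
R_POINTS = [1.0, 10.0, 100.0, 1000.0, 10000.0]
RULES = [[0, 1, 2],
         [1, 2, 3],
         [2, 3, 4]]


def infer_penalty(f_val, g_val, f_range, g_range):
    """Weighted average of the singletons fired by the nine rules."""
    mu_f = memberships(f_val, *f_range)
    mu_g = memberships(g_val, *g_range)
    num = den = 0.0
    for i in range(3):
        for j in range(3):
            strength = min(mu_f[i], mu_g[j])
            num += strength * R_POINTS[RULES[i][j]]
            den += strength
    return num / den


def fuzzy_penalized_fitness(f_val, g_val, f_range, g_range):
    """F(x) = f(x) + r G(x), with r inferred from the rule base."""
    return f_val + infer_penalty(f_val, g_val, f_range, g_range) * g_val
```

Because the three triangles overlap so that the memberships of any in-range value sum to one, at least one rule always fires and the division is well defined.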

Lin (2013) proposed perhaps the first constraint-handling approach which applies

the information granulation of rough set theory to address the indiscernibility relation

among penalty coefficients in constrained optimization. Adaptive penalty coefficients

for each constraint, $w_k^t$, $k = 1, \ldots, m$, were defined in such a way that a high penalty is
assigned to the coefficient of the most difficult constraint. In addition, the coefficients
also depend on the current generation number t. Using the standard definition
for the violation of the jth constraint ($v_j(x)$), the fitness function reads as

$$F(x) = f(x) + \sum_{j=1}^{m} w_j^t\, v_j^2(x)$$

where $w_k^t = (C \times t)^{\pi(k,t)}$ and C is a severity factor. The exponent $\pi(k,t)$, initialized
as $\pi(k,0) = 2$ for all k, is defined as

$$\pi(k,t) = \begin{cases} \pi(k,t-1) \times \gamma_k, & \text{if } \mu_k = 1 \\ \pi(k,t-1), & \text{if } \mu_k = 0 \end{cases}$$

according to the discernible mask $\mu$ and the representative attribute value $\gamma_k$ of the
superior class $X_{good}$ (see the paper for details). If the kth constraint is discernible
(i.e., $\mu_k = 1$), the exponent $\pi(k,t)$ is adjusted by the representative attribute value
($\gamma_k$); otherwise, the exponent retains the same value as in the previous generation.
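The update of the exponents and of the resulting penalty coefficients can be sketched as follows. This is a minimal illustration and not the code of Lin (2013): it assumes that "adjusted" means multiplying the previous exponent by the representative attribute value, that the coefficient is the product of the severity factor and the generation number raised to the exponent, and the function and argument names are hypothetical. How the discernible mask and the attribute values are obtained from the rough set analysis is not covered.

```python
def update_exponents(exponents, discernible, attr_values):
    """Exponent of constraint k is rescaled only when the constraint is discernible."""
    return [
        e * a if d == 1 else e
        for e, d, a in zip(exponents, discernible, attr_values)
    ]


def penalty_coefficients(exponents, severity, generation):
    """Coefficient of constraint k: (C * t) raised to the current exponent."""
    return [(severity * generation) ** e for e in exponents]


# Hypothetical example with m = 3 constraints, all exponents initialized to 2.
exponents = update_exponents([2, 2, 2], [1, 0, 1], [1.5, 3.0, 0.5])
coefficients = penalty_coefficients(exponents, severity=0.5, generation=4)
```

In this toy run the first constraint becomes more heavily penalized, the second keeps its previous exponent, and the third is relaxed.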

1.6 Discussion

1.6.1 User-Defined Parameters

Some of the proposals considered do not require from the user the definition of penalty

parameters, and can as such be considered parameterless. This is very useful for

the practitioner. However, it should be noted that essentially all proposals do embody

some fixed values that are hidden from the user and, as a result, cannot be changed.

Furthermore, all proposals involve design decisions which were made, with variable
levels of justification, and incorporated into the definition of the technique. It
seems natural to assume that some of those could be changed (a research
opportunity), leading to improved results.

In order to test the performance of a constraint handling technique, several test-problems have been used over the years. The most popular suite of continuous

constrained optimization problems is that containing the 24 problems used for the

competition held during the 2006 IEEE Congress on Evolutionary Computation

which are described in Liang et al. (2006). Later, larger problems were considered

in another competition, held during the 2010 edition of the same conference. The

details can be found in Mallipeddi and Suganthan (2010).

It can be noticed that the claims concerning the performance of each proposal in

the papers reviewed have been deliberately omitted. This is due to several factors. One

of them is that a statistical study in order to assure a possible statistically significant

superiority of the proposed technique over others from the literature is often missing.

Another criticism is that often the claimed superiority of the proposed technique can

only be observed after the fourth or fifth significant digit of the final results, with

no consideration for the facts (i) that the original model itself may not have such

accuracy, and (ii) that the compared solutions may be indistinguishable from the

practical point of view.

Another major issue that makes it impossible to rigorously assess the relative

performance of the adaptive penalty techniques (APTs) reviewed is that the final

results depend not only on the penalty technique considered but also on the search

engine (SE) adopted. The competing results often derive from incomparable arrangements such as APT-1 embedded in SE-1 (a genetic algorithm, for instance) versus

APT-2 applied to SE-2 (an evolution strategy, for instance). The results using stochastic ranking (SR) within an evolution strategy (ES) (Runarsson and Yao 2000) were

shown to outperform APM embedded in a binary-coded genetic algorithm (GA)

(Lemonge and Barbosa 2004) when applied to a standard set of benchmark constrained optimization problems in $\mathbb{R}^n$. This seems to be due, at least in part, to the

fact that the ES adopted performs better in this continuous domain than a standard GA.

A proper empirical assessment of the constraint handling techniques considered

(SR versus APM) should be performed by considering settings such as (SR+GA versus APM+GA) and (SR+ES versus APM+ES). An attempt to clarify this particular

question is presented by Barbosa et al. (2010b). It is clear that there is a need for

more studies of this type in order to better assess the relative merits of the proposals

reviewed here.

The standard way of assessing the relative performance of a set A of $n_a$
algorithms $a_i$, $i \in \{1, \ldots, n_a\}$, is to define a set P of $n_p$ representative problems $p_j$,
$j \in \{1, \ldots, n_p\}$, and then test all algorithms against all problems, measuring the
performance $t_{p,a}$ of algorithm $a \in A$ when applied to problem $p \in P$.

In order to evaluate tp,a one can alternatively (i) define a meaningful goal

(say, level of objective function value) and then measure the amount of resources

(say, number of function evaluations) required by the algorithm to achieve that goal,

or (ii) fix a given amount of resources to be allocated to each algorithm and then

measure the goal attainment.

Considering that $t_{p,a}$ is the CPU time spent by algorithm a to reach the stated goal
in problem p, a performance ratio can be defined as

$$r_{p,a} = \frac{t_{p,a}}{\min\{t_{p,a} : a \in A\}}. \tag{1.47}$$

Although each tp,a or rp,a is worth considering by itself, one would like to be able

to assess the performance of the algorithms in A on a large set of problems P in a

user-friendly graphical form. This has been achieved by Dolan and Moré (2002) who

introduced the so-called performance profiles, an analytical tool for the visualization

and interpretation of the results of benchmark experiments. For more details and an

application in the constrained optimization case, see Barbosa et al. (2010a).
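As a small illustration, the performance ratio (1.47) and a performance profile in the sense of Dolan and Moré (2002), i.e., the fraction of problems on which an algorithm has a ratio not larger than a given threshold, can be computed as sketched below. The nested-dictionary layout, the function names, and the timing values are hypothetical and serve only as an example.

```python
def performance_ratios(times):
    """times[p][a] holds t_{p,a}; returns r_{p,a} = t_{p,a} / min_a t_{p,a}."""
    ratios = {}
    for problem, row in times.items():
        best = min(row.values())
        ratios[problem] = {alg: t / best for alg, t in row.items()}
    return ratios


def performance_profile(ratios, algorithm, tau):
    """Fraction of problems on which `algorithm` has a ratio of at most tau."""
    hits = sum(1 for row in ratios.values() if row[algorithm] <= tau)
    return hits / len(ratios)


# Hypothetical CPU times of two algorithms on three problems.
times = {
    "p1": {"A": 2.0, "B": 4.0},
    "p2": {"A": 9.0, "B": 3.0},
    "p3": {"A": 5.0, "B": 5.0},
}
ratios = performance_ratios(times)
```

Evaluating `performance_profile` on a grid of thresholds and plotting the resulting curve for each algorithm gives the graphical summary described above; the value at a threshold of 1 is the fraction of problems on which the algorithm was the fastest (ties included).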

One has also to consider that it is not an easy task to define a set P which is

representative of the domain of interest, as one would like P (i) to span the target

problem-space and, at the same time, (ii) to be as small as possible, in order to

alleviate the computational burden of the experiments. Furthermore, it would also

be interesting to assess the relative performance of the test-problems themselves

with respect to the solvers. Are all test-problems relevant to the final result? Are

some test-problems too easy (or too difficult) so that they do not have the ability to

discriminate the solvers? Efforts in this direction, exploring the performance profile

concept, were attempted in Barbosa et al. (2013).

Although not always considered in the papers reviewed, the simplicity of the technique (both conceptually and from the implementation point of view) is relevant. It

seems quite desirable that the proposed technique could be easily implemented as

an additional module to any existing metaheuristic for unconstrained optimization

with a minimum interference with the current code. In this respect, techniques resorting to coevolution would typically require another population, an additional set of

parameters, and would lead to more interference and modifications to the original

code.

1.6.4 Extensions

It seems natural to expect that most of, if not all, the proposals reviewed here can

be easily extended to the practically important case of constrained multi-objective

optimization. Although papers presenting such extension have not been reviewed

here, it seems that there is room, and indeed a need, to explore this case.

The same can perhaps be said of the relevant case of mixed (discrete and

continuous) decision variables, as well as the more complex problem of constrained

multi-level optimization.

1.7 Conclusion

This chapter presented a review of the main adaptive penalty techniques available

for handling constraints within nature inspired metaheuristics in general and evolutionary techniques in particular. The main types of evidence taken from the search

process in order to inform the decision-making process of continuously adapting the

relevant parameters of the penalty technique have been identified.

As the different adaptive techniques have not been implemented on a single

given search engine, the existing comparative studies, which are usually based on

the final performance on a set of benchmark problems, are not very informative of the

relative performance of each penalty technique, as the results are also affected by the

different search engines adopted in each proposal. The need for better comparative

studies investigating the relative performance of the different adaptive techniques

when applied within a single search engine in larger and more representative sets of

benchmark problems is also identified.

Acknowledgments The authors thank the reviewers for their comments, which helped improve

the quality of the final version, and acknowledge the support from CNPq (grants 308317/2009-2,

310778/2013-1, 300192/2012-6 and 306815/2011-7) and FAPEMIG (grant TEC 528/11).

References

Barbosa HJC, Lemonge ACC (2002) An adaptive penalty scheme in genetic algorithms for constrained optimization problems. In: Langdon WB, Cantú-Paz E, Mathias KE, Roy R, Davis D, Poli R, Balakrishnan K, Honavar V, Rudolph G, Wegener J, Bull L, Potter MA, Schultz AC, Miller JF, Burke EK (eds) Proceedings of the genetic and evolutionary computation conference (GECCO). Morgan Kaufmann, San Francisco

Barbosa HJC, Lemonge ACC (2003a) An adaptive penalty scheme for steady-state genetic algorithms. In: Cantú-Paz E, Foster JA, Deb K, Davis LD, Roy R, O'Reilly U-M, Beyer H-G, Standish R, Kendall G, Wilson S, Harman M, Wegener J, Dasgupta D, Potter MA, Schultz AC, Dowsland KA, Jonoska N, Miller J (eds) Genetic and evolutionary computation (GECCO). Lecture Notes in Computer Science. Springer, Berlin, pp 718–729

Barbosa HJC, Lemonge ACC (2003b) A new adaptive penalty scheme for genetic algorithms. Inf Sci 156:215–251

Barbosa HJC, Lemonge ACC (2008) An adaptive penalty method for genetic algorithms in constrained optimization problems. Front Evol Robot 34:9–34

Barbosa HJC, Bernardino HS, Barreto AMS (2010a) Using performance profiles to analyze the results of the 2006 CEC constrained optimization competition. In: 2010 IEEE congress on evolutionary computation (CEC), pp 1–8

Barbosa HJC, Lemonge ACC, Fonseca LG, Bernardino HS (2010b) Comparing two constraint handling techniques in a binary-coded genetic algorithm for optimization problems. In: Deb K, Bhattacharya A, Chakraborti N, Chakroborty P, Das S, Dutta J, Gupta SK, Jain A, Aggarwal V, Branke J, Louis SJ, Tan KC (eds) Simulated evolution and learning. Lecture Notes in Computer Science. Springer, Berlin, pp 125–134

Barbosa HJC, Bernardino HS, Barreto AMS (2013) Using performance profiles for the analysis and design of benchmark experiments. In: Di Gaspero L, Schaerf A, Stützle T (eds) Advances in metaheuristics. Operations Research/Computer Science Interfaces Series, vol 53. Springer, New York, pp 21–36

Bean J, Alouane A (1992) A dual genetic algorithm for bounded integer programs. Technical Report TR 92-53, Department of Industrial and Operations Engineering, The University of Michigan

Beaser E, Schwartz JK, Bell CB, Solomon EI (2011) Hybrid genetic algorithm with an adaptive penalty function for fitting multimodal experimental data: application to exchange-coupled non-Kramers binuclear iron active sites. J Chem Inf Model 51(9):2164–2173

Coello CAC (2000) Use of a self-adaptive penalty approach for engineering optimization problems. Comput Ind 41(2):113–127

Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287

Coit DW, Smith AE, Tate DM (1996) Adaptive penalty methods for genetic optimization of constrained combinatorial problems. INFORMS J Comput 8(2):173–182

Costa L, Santo IE, Oliveira P (2013) An adaptive constraint handling technique for evolutionary algorithms. Optimization 62(2):241–253

Courant R (1943) Variational methods for the solution of problems of equilibrium and vibrations. Bull Am Math Soc 49:1–23

Dolan E, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213

Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Springer, New York

Eiben AE, Jansen B, Michalewicz Z, Paechter B (2000) Solving CSPs using self-adaptive constraint weights: how to prevent EAs from cheating. In: Whitley LD (ed) Proceedings of the genetic and evolutionary computation conference (GECCO). Morgan Kaufmann, San Francisco, pp 128–134

Farmani R, Wright J (2003) Self-adaptive fitness formulation for constrained optimization. IEEE Trans Evol Comput 7(5):445–455

Gan M, Peng H, Peng X, Chen X, Inoussa G (2010) An adaptive decision maker for constrained evolutionary optimization. Appl Math Comput 215(12):4172–4184

Gen M, Cheng R (1996) Optimal design of system reliability using interval programming and genetic algorithms. Comput Ind Eng (In: Proceedings of the 19th international conference on computers and industrial engineering), vol 31(1–2), pp 237–240

Hamida H, Schoenauer M (2000) Adaptive techniques for evolutionary topological optimum design. In: Parmee I (ed) Proceedings of the international conference on adaptive computing in design and manufacture (ACDM). Springer, Devon, pp 123–136

Hamida S, Schoenauer M (2002) ASCHEA: new results using adaptive segregational constraint handling. In: Proceedings of the IEEE service center congress on evolutionary computation (CEC), vol 1. Piscataway, New Jersey, pp 884–889

Harrell LJ, Ranjithan SR (1999) Evaluation of alternative penalty function implementations in a watershed management design problem. In: Proceedings of the genetic and evolutionary computation conference (GECCO), vol 2. Morgan Kaufmann, pp 1551–1558

He Q, Wang L (2007) An effective co-evolutionary particle swarm optimization for constrained engineering design problems. Eng Appl Artif Intell 20(1):89–99

He Q, Wang L, zhuo Huang F (2008) Nonlinear constrained optimization by enhanced coevolutionary PSO. In: IEEE congress on evolutionary computation, CEC 2008. (IEEE World Congress on Computational Intelligence), pp 83–89

Hughes T (1987) The finite element method: linear static and dynamic finite element analysis. Prentice Hall Inc, New Jersey

Koziel S, Michalewicz Z (1998) A decoder-based evolutionary algorithm for constrained parameter optimization problems. In: Eiben A, Bäck T, Schoenauer M, Schwefel H-P (eds) Parallel problem solving from nature (PPSN). LNCS, vol 1498. Springer, Berlin, pp 231–240

Krempser E, Bernardino H, Barbosa H, Lemonge A (2012) Differential evolution assisted by surrogate models for structural optimization problems. In: Proceedings of the international conference on computational structures technology (CST). Civil-Comp Press, p 49

Lemonge ACC, Barbosa HJC (2004) An adaptive penalty scheme for genetic algorithms in structural optimization. Int J Numer Methods Eng 59(5):703–736

Lemonge ACC, Barbosa HJC, Bernardino HS (2012) A family of adaptive penalty schemes for steady-state genetic algorithms. In: 2012 IEEE congress on evolutionary computation (CEC). IEEE, pp 1–8

Liang J, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan P, Coello CC, Deb K (2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore

Lin C-H (2013) A rough penalty genetic algorithm for constrained optimization. Inf Sci 241:119–137

Lin C-Y, Wu W-H (2004) Self-organizing adaptive penalty strategy in constrained genetic search. Struct Multidiscip Optim 26(6):417–428

Luenberger DG, Ye Y (2008) Linear and nonlinear programming. Springer, New York

Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore

Mezura-Montes E, Coello CAC (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173–194

methods. In: Proceedings of the 4th annual conference on evolutionary programming. MIT Press, pp 135–155

Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32

Montemurro M, Vincenti A, Vannucci P (2013) The automatic dynamic penalisation method (ADP) for handling constraints with genetic algorithms. Comput Methods Appl Mech Eng 256:70–87

Nanakorn P, Meesomklin K (2001) An adaptive penalty function in genetic algorithms for structural design optimization. Comput Struct 79(29–30):2527–2539

Puzzi S, Carpinteri A (2008) A double-multiplicative dynamic penalty approach for constrained evolutionary optimization. Struct Multidiscip Optim 35(5):431–445

Rasheed K (1998) An adaptive penalty approach for constrained genetic-algorithm optimization. In: Koza J, Banzhaf W, Chellapilla K, Deb K, Dorigo M, Fogel D, Garzon M, Goldberg D, Iba H, Riolo R (eds) Proceedings of the third annual genetic programming conference. Morgan Kaufmann, San Francisco, pp 584–590

Richardson JT, Palmer MR, Liepins GE, Hilliard M (1989) Some guidelines for genetic algorithms with penalty functions. In: Proceedings of the international conference on genetic algorithms. Morgan Kaufmann, San Francisco, pp 191–197

Rocha AMAC, Fernandes EMDGP (2009) Self-adaptive penalties in the electromagnetism-like algorithm for constrained global optimization problems. In: Proceedings of the 8th world congress on structural and multidisciplinary optimization, Lisbon, Portugal

Runarsson T, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294

Salcedo-Sanz S (2009) A survey of repair methods used as constraint handling techniques in evolutionary algorithms. Comput Sci Rev 3(3):175–192

Schoenauer M, Michalewicz Z (1996) Evolutionary computation at the edge of feasibility. In: Proceedings of parallel problem solving from nature (PPSN). LNCS, Springer, pp 245–254

Tessema B, Yen GG (2006) A self adaptive penalty function based algorithm for constrained optimization. In: IEEE congress on evolutionary computation, CEC 2006. IEEE, pp 246–253

Tessema B, Yen G (2009) An adaptive penalty formulation for constrained evolutionary optimization. IEEE Trans Syst, Man Cybern, Part A: Syst Hum 39(3):565–578

Vincenti A, Ahmadian MR, Vannucci P (2010) BIANCA: a genetic algorithm to solve hard combinatorial optimisation problems in engineering. J Glob Optim 48(3):399–421

Wang Y, Cai Z, Zhou Y, Fan Z (2009) Constrained optimization based on hybrid evolutionary algorithm and adaptive constraint-handling technique. Struct Multidiscip Optim 37(4):395–413

Wu B, Yu X, Liu L (2001) Fuzzy penalty function approach for constrained function optimization with evolutionary algorithms. In: Proceedings of the 8th international conference on neural information processing. Citeseer, pp 299–304

Wu W-H, Lin C-Y (2004) The second generation of self-organizing adaptive penalty strategy for constrained genetic search. Adv Eng Softw 35(12):815–825

Yokota T, Gen M, Ida K, Taguchi T (1995) Optimal design of system reliability by an improved genetic algorithm. Trans Inst Electron Inf Comput Eng J78-A(6):702–709 (in Japanese)

Chapter 2

Continuous Fitness Landscapes

Shayan Poursoltan and Frank Neumann

real-world applications. In this chapter, we study algorithms for constrained

optimization problems from a theoretical perspective. Our goal is to understand
how the fitness landscape influences the success of certain types of algorithms.
One important feature for analyzing and classifying a fitness landscape is its ruggedness. It is generally assumed that rugged landscapes make the optimization process by
bio-inspired computing methods much harder than smooth landscapes, which give
clear hints toward an optimal solution. We introduce different methods for quantifying the ruggedness of a given constrained optimization problem. In particular, they
take into account how to deal with infeasible regions in the underlying search space.

Keywords Constrained optimization · Continuous optimization · Fitness landscapes · Ruggedness

2.1 Introduction

Constrained optimization problems (COPs), especially nonlinear ones, are important

and widespread in many real-world applications such as chemical engineering, VLSI

chip design, and structural design (Floudas and Pardalos 1990). Various algorithmic

approaches have been introduced to tackle constrained optimization problems. The

major component of these optimization algorithms is devoted to the handling of the

involved constraints.

Optimisation and Logistics, School of Computer Science, University of Adelaide,

Adelaide, SA 5005, Australia

e-mail: shayan.poursoltan@adelaide.edu.au

F. Neumann

e-mail: frank.neumann@adelaide.edu.au

Springer India 2015

R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,

Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_2

(Schwefel 1993), differential evolution (Storn and Price 1997), and particle swarm

optimization (PSO) (Eberhart and Kennedy 1995) have been applied to constrained

continuous optimization problems. Constraint handling mechanisms that are

frequently used include penalty functions, decoder-based methods, and special operators that separate the treatment of the objective function and the constraints. We

refer the reader to Mezura-Montes and Coello Coello (2011) for an overview of the

different types of methods. Among the various types of optimization algorithms,

penalty methods are well known as one of the most successful and popular approaches

for dealing with constraints. They penalize the violation of constraints by adding

penalty values to the fitness value of a given solution. Effectively, this transforms the

constrained problem into an unconstrained one. Turning constrained optimization

problems into unconstrained ones by using penalty functions makes the problem

easily accessible to a wide range of methods for unconstrained optimization and can

be regarded as one of the major reasons for the popularity of penalty functions.

There is a wide range of optimization algorithms for constrained continuous optimization problems and their performance is usually evaluated based on the results

of popular benchmark problems (Liang et al. 2006; Mallipeddi and Suganthan 2010).

These benchmark problems are designed to impose different types of difficulties for

optimization algorithms. As evolutionary algorithms make heavy use of random decisions, it is hard to understand the behavior of these algorithms from an analytical

perspective. More importantly, it is hard to predict which algorithm would perform the

best for a newly given real-world optimization problem. Mersmann et al. (2011) have

proposed the following steps to select the best possible algorithm from a given suite

of algorithms. First, one has to extract important problem properties from the class of

problems under investigation. Secondly, it is necessary to analyze the performance

of different algorithms based on the problem properties and build a prediction model

that allows one to select the best possible algorithm based on problem characteristics.

There are various problem properties associated with the fitness landscape. In

other words, analyzing the fitness landscape helps us to classify problems by the
characteristics that make them easy or hard to solve by certain types of algorithms. In recent years, fitness landscape analysis has become very popular to describe

the characteristics of optimization problems. Important attributes that are associated

with fitness landscapes and that impact the optimization process of evolutionary

algorithms include the smoothness, multi-modality, feasibility rate, and variable

separability of the landscape and the considered problem (Naudts and Kallel 2000).

Among several characteristics associated with fitness landscapes, the notion of

fitness landscape ruggedness plays a vital role in determining the problem difficulty.

If the objective function is unsteady and goes up and down frequently, choosing

the right direction to continue becomes difficult for many solvers. Since ruggedness

and problem difficulty are closely related to each other, many studies have been

conducted to analyze this feature. For discrete landscapes, one important approach

is to consider autocorrelations by calculating the correlation of fitness values of

search points that are visited by a random walk on the landscape (Weinberger 1990).

Furthermore, there have been many studies that extend the basic autocorrelation

approach to provide additional insights into fitness landscapes (Box et al. 2013;

Hordijk 1996). One of the drawbacks of using autocorrelation by these statistical

analysis techniques is that the calculated value is a vague notion that does not clearly

reflect the landscape ruggedness. Thus, Vassilev proposed a new technique based on

the assumption that each landscape is an ensemble of different objects (the nodes

seen by a random walk on the fitness landscape), which can be grouped by their form,

size, and distribution (Vassilev et al. 2000). Vassilev's approach was applicable to
discrete problems. For real parameter landscapes, Malan and Engelbrecht (2009) used
Vassilev's information theoretic analysis to measure the fitness landscape ruggedness

in the continuous domain. So far, these landscape analysis techniques have been

conducted only for unconstrained or discrete problems. Measuring the landscape

ruggedness for constrained continuous problems imposes additional challenges and

we will propose how to tackle them in this chapter.
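For reference, the autocorrelation of a sequence of fitness values recorded along a random walk can be estimated as in the following sketch. The estimator shown (covariance at a given lag divided by the variance of the whole sequence, with the covariance averaged over the available pairs) is one common choice and is an assumption here; the function name is hypothetical.

```python
def autocorrelation(values, lag=1):
    """Estimate the lag-`lag` autocorrelation of a fitness sequence."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0.0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    covariance = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    ) / (n - lag)
    return covariance / variance


# A steadily increasing walk is strongly correlated; an alternating one is not.
smooth_walk = [float(t) for t in range(10)]
rugged_walk = [1.0, -1.0] * 4
```

Values close to 1 indicate that neighboring points of the walk have similar fitness (a smooth landscape), whereas values near zero or negative values point to a rugged one.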

We propose an approach to measure the fitness landscape ruggedness of constrained continuous optimization problems. The quantification of ruggedness

combined with other analytical problem characteristics can help to build an algorithm

selection model based on the relation of different algorithms and problem properties.

This chapter includes a methodology for quantifying fitness landscape ruggedness

of constrained continuous problems. In order to do this, we extend Malan's approach
to quantify the fitness landscape ruggedness of constrained continuous problems.
The information obtained by using simple random walks on the landscape of a constrained
problem is not useful enough since it is mostly related to infeasible areas that are
unlikely to be seen by the solver. To cope with constraints in nearly infeasible problems, our approach replaces Malan's random walk with a biased one. The obtained

samples are used to quantify the ruggedness of landscapes using the approach of

Vassilev et al. (2000). We evaluate our approach on well-known benchmarks taken

from the recent CEC competitions (Mallipeddi and Suganthan 2010) and discuss

the benefits and drawbacks of our new approach.

The remainder of this chapter is organized as follows: In Sect. 2.2, we introduce

constrained continuous optimization and discuss approaches that have been used to

analyze the ruggedness of unconstrained fitness landscapes. We present our approach

for quantifying ruggedness of constrained continuous fitness landscapes in Sect. 2.3

and the results of our experimental investigations in Sect. 2.4. Finally, we end
with some concluding remarks.

2.2 Preliminaries

In this section, we introduce basic notations and summarize the previous works on

measuring the ruggedness of fitness landscapes.

Constrained continuous optimization problems are optimization problems where a

function on real-valued variables should be optimized with respect to a given set of

constraints. Constraints are usually given by a set of inequalities and/or equalities.

Without loss of generality, we present our approach for minimization problems.

Formally, we consider single-objective functions $f : S \to \mathbb{R}$, with $S \subseteq \mathbb{R}^n$. The
constraints impose a feasible subset $F \subseteq S$ of the search space S and the goal is to
find an element $x \in S \cap F$ that minimizes f.

We consider problems of the following form:

$$\text{Minimize } f(x), \quad x = (x_1, \ldots, x_n) \in \mathbb{R}^n \tag{2.1}$$

such that $x \in S \cap F$.

The feasible region $F \subseteq S$ of the search space S is defined as

$$l_i \le x_i \le u_i, \quad 1 \le i \le n \tag{2.2}$$

where $l_i$ and $u_i$ are lower and upper bounds on the variable $x_i$, $1 \le i \le n$. Additional
constraints are given by the functions

$$g_i(x) \le 0, \quad 1 \le i \le q,$$

$$h_i(x) = 0, \quad q + 1 \le i \le p.$$

In order to work with iterative optimization algorithms for these problems, it is
common to relax the equality constraints

$$h_i(x) = 0, \quad q + 1 \le i \le p$$

to

$$|h_i(x)| \le \varepsilon, \quad q + 1 \le i \le p \tag{2.3}$$

where $\varepsilon$ is a very small positive value that determines how much the original
constraints can be violated. In our experimental study, we work with $\varepsilon = 0.0001$,
which is the same setting as used in Mallipeddi and Suganthan (2010).
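A feasibility check following this formulation might look as follows. It is a minimal sketch under stated assumptions: constraints are passed as Python callables, the tolerance argument plays the role of the relaxation value in (2.3), and the function names and the toy problem are hypothetical.

```python
def violations(x, inequalities, equalities, eps=1e-4):
    """Violation of each constraint: g_i(x) <= 0 and the relaxed |h_i(x)| <= eps."""
    g_viol = [max(0.0, g(x)) for g in inequalities]
    h_viol = [max(0.0, abs(h(x)) - eps) for h in equalities]
    return g_viol + h_viol


def is_feasible(x, bounds, inequalities, equalities, eps=1e-4):
    """True if x respects the bounds l_i <= x_i <= u_i and violates no constraint."""
    in_bounds = all(lo <= xi <= hi for xi, (lo, hi) in zip(x, bounds))
    return in_bounds and all(
        v == 0.0 for v in violations(x, inequalities, equalities, eps)
    )


# Hypothetical toy problem on [0, 1]^2 with one inequality and one equality.
bounds = [(0.0, 1.0), (0.0, 1.0)]
inequalities = [lambda x: x[0] + x[1] - 1.5]  # x0 + x1 <= 1.5
equalities = [lambda x: x[0] - x[1]]          # x0 = x1
```

With the default tolerance, a point such as (0.5, 0.50005) counts as feasible even though the equality constraint is not satisfied exactly.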

Using the Entropy Measure

A fitness landscape (see Stadler et al. 1995) is given by a search space S, a fitness
function $f : S \to \mathbb{R}$ which assigns a value f(s) to each search point $s \in S$, and a

points. The elements in (s) are called the neighbors of s.

Various techniques have been used for the statistical analysis of fitness landscapes. Popular techniques measure the correlation of the search points visited by a random walk algorithm (Lipsitch 1991; Manderick et al. 1991; Weinberger 1990). However, it has been shown that this information is very basic and of limited use for reflecting problem difficulty (Mattfeld et al. 1999). Vassilev et al. (2000) proposed an information-theoretic approach to quantify fitness landscape ruggedness. The difference between Vassilev's approach and the previous ones is that his technique focuses on the relation between ruggedness and neutrality of the problem landscape. Vassilev's method performs a random walk on a fitness landscape to generate a sequence of fitness values $\{f_t\}_{t=0}^{n}$. This random walk starts from a random position on a discrete landscape and moves to a neighbor using bit flips. The aim of this method is to extract an ensemble of objects from a sequence of fitness values. These objects can be classified into three categories:

Flat objects: The fitness value of each point is similar to its two visited neighbors

(predecessor and successor).

Isolated objects: Each point has higher or lower fitness value compared to its two

neighbors.

Points that do not belong to the former two groups.

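The aim of the approach is to extract the ensemble of objects mentioned above from the values in a sequence of fitness values. The following function represents the time series as a set of objects. The ensemble is defined as a string $S(\varepsilon) = (s_1 s_2 s_3 \ldots s_n)$ with $s_i \in \{\bar{1}, 0, 1\}$ given by

$$ s_i = \Psi_{f_t}(i, \varepsilon) = \begin{cases} \bar{1}, & \text{if } f_i - f_{i-1} < -\varepsilon \\ 0, & \text{if } |f_i - f_{i-1}| \le \varepsilon \\ 1, & \text{if } f_i - f_{i-1} > \varepsilon \end{cases} \qquad (2.4) $$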

where the parameter $\varepsilon$ is a positive real number that determines the accuracy of the calculation of the string $S(\varepsilon)$. According to the function, if $\varepsilon = 0$ then the function is sensitive to every difference between adjacent points. Increasing the value of $\varepsilon$ reduces the sensitivity of the function. Therefore, if $\varepsilon$ equals the difference between the highest and lowest points in the walk, then the string consists only of zeros.
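The mapping in Eq. 2.4 can be sketched in a few lines. This is only an illustration: the name `symbol_string` is hypothetical, and the symbol $\bar{1}$ is encoded as the integer -1.

```python
from typing import List, Sequence


def symbol_string(fitness: Sequence[float], eps: float) -> List[int]:
    """Encode consecutive fitness differences as -1, 0 or 1 (Eq. 2.4)."""
    symbols = []
    for previous, current in zip(fitness, fitness[1:]):
        diff = current - previous
        if diff < -eps:
            symbols.append(-1)   # fitness dropped by more than eps
        elif diff > eps:
            symbols.append(1)    # fitness rose by more than eps
        else:
            symbols.append(0)    # change within eps counts as neutral
    return symbols
```

For the sequence `[0.0, 1.0, 1.0, 0.5, 2.0]`, `eps = 0` keeps every change visible, while `eps = 2.0` (the range of the sequence) yields only zeros, as stated in the text.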

To measure the ruggedness, the entropy of the string $S(\varepsilon)$ is calculated as follows:

$$ H(S(\varepsilon)) = -\sum_{p \neq q} P_{[pq]} \log_6 P_{[pq]} \qquad (2.5) $$

$H(S(\varepsilon))$ is the information content, which is an estimate of the variety of different shapes within the string $S(\varepsilon)$. This measurement is used to characterize the landscape ruggedness with respect to the flat areas where neutrality is present. $P_{[pq]}$ refers to the frequency of the blocks where $p$ and $q$ have different values ($p \neq q$):

$$ P_{[pq]} = \frac{n_{[pq]}}{n} \qquad (2.6) $$

Here, $n_{[pq]}$ is the number of sub-blocks $pq$ in the string $S(\varepsilon)$. It is only necessary to include the rugged blocks in our estimation ($p \neq q$). Thus, sub-blocks with two identical elements are excluded from this function (case $p = q$). The formula calculates the frequencies of sub-blocks with different symbols. Since there are six different possibilities of rugged sub-blocks in the string (according to Table 2.1), the logarithm base is set to 6. The different possibilities of rugged objects are considered as isolated areas where each point has different values. Tables 2.1 and 2.2 show the different possibilities of rugged and flat sub-blocks $pq$ in the string $S(\varepsilon)$.
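Eqs. 2.5 and 2.6 can be sketched as follows. The function name `information_content` is hypothetical, symbols are the integers -1, 0 and 1, and the normalising constant $n$ is assumed here to be the number of consecutive sub-blocks in the string; the chapter does not spell this detail out.

```python
import math
from collections import Counter
from typing import Sequence


def information_content(symbols: Sequence[int]) -> float:
    """Entropy of rugged sub-blocks pq (p != q), using logarithm base 6."""
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)  # assumption: n counts all consecutive sub-blocks
    # Count only the rugged sub-blocks; flat ones (p == q) are excluded.
    rugged = Counter(pair for pair in pairs if pair[0] != pair[1])
    entropy = 0.0
    for count in rugged.values():
        p = count / n
        entropy -= p * math.log(p, 6)
    return entropy
```

A string in which each of the six rugged sub-blocks occurs equally often gives the maximum value 1, and a string with only flat sub-blocks gives 0.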

As discussed earlier, the parameter $\varepsilon$ controls the sensitivity of the function (see Eq. 2.4). Greater values of $\varepsilon$ lead to more neutrality in the measurement. It is suggested that using smaller values of $\varepsilon$ makes the behaviour of $H(S(\varepsilon))$ significant for characterising the ruggedness with respect to the landscape neutrality (Vassilev et al. 2003). Therefore, for comparing various problems with different fitness ranges, the smaller values of $\varepsilon$ are used for $H(S(\varepsilon))$. The values of $\varepsilon$ used in Malan and Engelbrecht (2009) are:

$$ \varepsilon = \frac{\varepsilon^*}{2^k} \quad (k = 1, 2, \ldots, 8) \qquad (2.7) $$

in which $\varepsilon^*$ is the smallest value that generates all sub-blocks as zeros, so that the landscape becomes flat. The index $k$ ranges over 1–8 to calculate smaller values of $\varepsilon$. Note that the parameter $\varepsilon^*$ can be calculated as the difference between the highest and lowest fitness found in the random walk.
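The schedule of Eq. 2.7 is simple to compute from a recorded walk. The sketch below assumes the reading $\varepsilon = \varepsilon^*/2^k$ of that equation; the name `epsilon_schedule` is introduced only for this example.

```python
from typing import List, Sequence


def epsilon_schedule(fitness: Sequence[float], k_max: int = 8) -> List[float]:
    """Return eps* / 2**k for k = 1..k_max, with eps* the fitness range."""
    # eps* is the difference between the highest and lowest fitness
    # found in the walk.
    eps_star = max(fitness) - min(fitness)
    return [eps_star / 2 ** k for k in range(1, k_max + 1)]
```

For a walk with fitness values between 1.0 and 5.0, the schedule starts at 2.0 and ends at 4/256 = 0.015625.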

An entropic measure $H(S(\varepsilon))$ requires a sequence of search points $S(\varepsilon)$. In order to generate a set of time series, a simple random walk on a landscape path can be used (see Algorithm 1).

The above method was used for measuring the ruggedness of discrete problems. The major issue of using this approach for continuous problems is that (unlike the discrete problems) it is not possible to generate or access all possible neighbors of the visited individual. Thus, Malan and Engelbrecht (2009) modified the approach to use it for unconstrained continuous problems. The proposed approach adopts a random increasing walk which increases the step size over time. Furthermore, the step size is decreased if the algorithm produces a solution that is not within the boundaries given by the constraints. The algorithm for the random increasing walk proposed in Malan and Engelbrecht (2009) is given in Algorithm 2. Here, we assume that the variable range is the same for all dimensions, which implies that the maximum step size is the same for all dimensions. The algorithm can be easily adjusted to problems with different variable ranges by using a maximum step size for each variable.

Table 2.1 Various sub-blocks in $S_i$ considered as rugged objects

Sub-block       Object type
$01$            Rugged
$0\bar{1}$      Rugged
$10$            Rugged
$1\bar{1}$      Rugged
$\bar{1}0$      Rugged
$\bar{1}1$      Rugged

Table 2.2 Various sub-blocks in $S_i$ considered as flat objects

Sub-block           Object type
$00$                Flat
$\bar{1}\bar{1}$    Flat
$11$                Flat

Algorithm 1: Simple random walk on a discrete landscape
1. Choose a starting point at random
2. Generate all the neighbors of the chosen point using permutation
3. Choose one neighbor randomly and save its value
4. Go back to step 2

Algorithm 2: Random increasing walk (Malan and Engelbrecht 2009)
1. Input: problem domain (domain), number of dimensions (dimension) and number of steps (MaxStepNumber) for the walk
2. Calculate the maximum step size: MaxStepSize = (Range of the problem domain) / 100
3. Set counter = 0 and create an array steps to save the steps in the walk
4. Assign a random position to steps[0] within the boundaries of the problem
5. Repeat
6.   counter = counter + 1
7.   For every dimension i of the problem
8.     currentStep = random(0, MaxStepSize)
9.     steps[counter] = steps[counter-1] + currentStep
10.    If steps[counter] > boundaries
11.      steps[counter] = steps[counter-1] - (Range of the problem domain)
12.    Endif
13.  Endfor
14. Until (counter ≥ MaxStepNumber)
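The random increasing walk can be sketched as below. This is an interpretation rather than the reference implementation: the name `random_increasing_walk` and its signature are assumptions, all dimensions share one range, and a coordinate that leaves the domain is wrapped around by subtracting the range from the new position, which is one possible reading of the boundary step in Algorithm 2.

```python
import random
from typing import List, Optional


def random_increasing_walk(
    lower: float,
    upper: float,
    dimension: int,
    max_steps: int,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return max_steps positions of a walk that only takes non-negative steps."""
    rng = rng or random.Random()
    domain_range = upper - lower
    max_step_size = domain_range / 100.0
    position = [rng.uniform(lower, upper) for _ in range(dimension)]
    walk = [list(position)]
    for _ in range(max_steps - 1):
        next_position = []
        for coordinate in position:
            moved = coordinate + rng.uniform(0.0, max_step_size)
            if moved > upper:
                # Assumed boundary handling: wrap to the lower end.
                moved -= domain_range
            next_position.append(moved)
        position = next_position
        walk.append(list(position))
    return walk
```

Evaluating the fitness function at each returned position gives the sequence $\{f_t\}$ needed for Eq. 2.4.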

Continuous Optimization

In this section, we present a new approach for quantifying the ruggedness of a fitness landscape of a constrained continuous optimization problem. Since we are working on constrained optimization problems, dealing with infeasible areas is the important and challenging part. Often in these problems, the infeasibility rate is high and it might even be very hard to find a single feasible solution. This implies that random walk methods are usually not very helpful, as they would produce infeasible solutions most of the time. Most constraint handling methods direct the search process to feasible regions of the search space and therefore often make it possible to optimize within the feasible region, which might be a very small proportion of the overall space.

In the following, we discuss the drawbacks of applying the previous approaches

for ruggedness quantification when dealing with constrained continuous optimization problems. Later, we explain the solution to these issues by following our new

approach. As mentioned in the previous section, random walk algorithms have been

used to measure the ruggedness of fitness landscapes. However, random walk algorithms are often not useful when it comes to constrained optimization problems. We

discuss the different reasons below.

A random walk algorithm is not accurate enough to reflect the fitness landscape as

a whole, which is already true for unconstrained optimization, but becomes even more

evident when dealing with constrained problems. Random walk algorithms cannot

discriminate accurately between two different search spaces (feasible and infeasible

space) since they do not make decisions based on the fitness values. Experiments show

that the statistics obtained by random walks on landscapes are biased to areas with

low fitness (Smith et al. 2002). Hence, various landscapes with different high fitness

value areas and the same low areas generate similar data for walks and, consequently,

the obtained ruggedness measures are within the same range when using previous

methodologies. To address this issue, we introduce methods that take into account

the individual fitness values in the sampling process. Using this method forces the

algorithm to explore higher fitness values in landscape, which is more interesting for

optimization algorithms. Therefore, the calculated fitness landscape ruggedness is

more interesting as it reflects the landscape structure in regions of the search space

that are crucial for optimization.

The chance of finding even a few feasible individuals when using random walk algorithms is likely to be very low for highly infeasible landscapes. Since the majority of constrained optimization problems are nearly infeasible, a random walk used to explore the landscape is likely to visit mostly infeasible individuals. Optimization algorithms prefer to move and search in feasible regions. To solve this problem, the sampling method for exploring fitness landscapes of constrained optimization problems needs to move toward feasible areas in the search space. Our remedy for this issue is to introduce methods that can distinguish between feasible and infeasible individuals when choosing the next step in the walk. Our method is flexible and can be tuned such that the walk contains more or fewer feasible individuals.

We use a biased walk in our approach to quantify the ruggedness of a constrained problem fitness landscape. Considering the fitness values of individuals in the sampling process improves the reliability of the calculated measure. Our biased walk uses a simple evolution strategy (Schwefel 1993). Since the adjacent steps in the walk should be different, we use a $(\mu,\lambda)$-ES. This means that the selection is performed among the offspring and their parents are excluded from the new generation.

In the $(\mu,\lambda)$-ES, each individual (both parents and offspring) is a vector $(x_i, \sigma_i)$ consisting of the coordinates of the search point and the step sizes for the different coordinates. The initial population is generated by choosing $\mu$ solutions uniformly at random from the search space and the initial step size of variable $j$ in individual $i$ is given as

$$ \sigma_{i,j}^{(0)} = \frac{\Delta x_{i,j}}{\sqrt{n}} $$

in which $i,j$ refers to the $j$th component of vector $i$ and $\Delta x_{i,j}$ is the difference between the upper and lower bounds on $x_{i,j}$ (Schwefel 1993). It is noteworthy that the strategy parameters calculated for each generation are used in the next generation. The step sizes for each generation are as follows:

$$ \sigma_{ij}(t+1) = \sigma_{ij}(t)\, e^{\tau' N(0,1) + \tau N_j(0,1)} $$

where $\tau' = \frac{1}{\sqrt{2n}}$ and $\tau = \frac{1}{\sqrt{2\sqrt{n}}}$, $N(0,1)$ is a normally distributed random variable and $N_j(0,1)$ denotes that there is a new value for each component of $\sigma$.

By calculating the next generation strategy parameters (as above), each parent produces new individuals as

$$ x_{hj}(t+1) = x_{ij}(t) + \sigma_{ij}(t+1)\, N_j(0,1) $$

where $h \in \{1, \ldots, \lambda\}$ and $i \in \{1, \ldots, \mu\}$. The pseudo-code for the $(\mu,\lambda)$-ES is shown in Algorithm 3. In this chapter, we use $\mu = 1$, i.e., a $(1,\lambda)$-ES. This implies that each search point in the sequence we are generating is an offspring of the previous point in this sequence.
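The self-adaptive mutation described above can be sketched as follows. The sketch assumes the standard log-normal step-size rule with learning rates $1/\sqrt{2n}$ (shared draw) and $1/\sqrt{2\sqrt{n}}$ (per-coordinate draw); the names `initial_step_sizes` and `mutate` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def initial_step_sizes(lower: Sequence[float], upper: Sequence[float]) -> List[float]:
    """Initial step size per coordinate: variable range divided by sqrt(n)."""
    n = len(lower)
    return [(u - l) / math.sqrt(n) for l, u in zip(lower, upper)]


def mutate(
    x: Sequence[float], sigma: Sequence[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Create one offspring (x', sigma') from a parent (x, sigma)."""
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)         # scales the shared draw
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))    # scales per-coordinate draws
    shared = rng.gauss(0.0, 1.0)                 # one N(0,1) per offspring
    new_sigma = [
        s * math.exp(tau_prime * shared + tau * rng.gauss(0.0, 1.0))
        for s in sigma
    ]
    # The search point is perturbed with the freshly mutated step sizes.
    new_x = [xi + s * rng.gauss(0.0, 1.0) for xi, s in zip(x, new_sigma)]
    return new_x, new_sigma
```

In a $(1,\lambda)$-ES, `mutate` would be called $\lambda$ times on the current point, and one of the offspring then replaces it.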

Among all categories of constraint handling methods, it has been shown that penalty

methods in general have good performance (Mallipeddi and Suganthan 2010). Some

methods calculate the constraint violation as a sum of violation of all constraints and

integrate them into the objective function.

Algorithm 3: $(\mu,\lambda)$-ES
1. Initialize the strategy parameters, set generationCounter = 0
2. Initialize and create the population of solutions $x$ using a uniform $n$-dimensional probability distribution on the problem search space ($\mu$ individuals)
3. Evaluate the fitness of the population
4. Repeat
5.   Generate $\lambda$ offspring using the step-size and search-point update equations (mutation)
6.   Evaluate the fitness of the offspring
7.   Apply the selection process to select $\mu$ from the $\lambda$ offspring individuals for the next generation (selection)
8.   generationCounter = generationCounter + 1
9. Until stopping condition is true

When integrating constraint violations into the objective function, the main problem is to choose an appropriate penalty coefficient that determines how strongly the

constraint violation influences the objective value. There are also penalty methods

that use the constraint violation and objective functions separately. In this case, they

optimize the constraint violation and objective function in lexicographic order so

that the main goal is to obtain a feasible solution.

As discussed earlier, to deal with nearly infeasible problems, there is a need to use a walk with the ability to distinguish between feasible and infeasible individuals. We choose the stochastic ranking method proposed by Runarsson and Yao (2000) as our constraint handling mechanism to sample and collect individuals for the time series $S(\varepsilon)$. It has been observed that there should be a balance between accepting infeasible individuals and preserving feasible ones. Hence, neither over- nor under-penalizing infeasible solutions is a proper choice for a constraint handling method (Gen and Cheng 2000). It is worth noting that all penalty methods try to adjust the balance between the objective and the penalty function. The stochastic ranking method adjusts this balance in a direct way. By using this method, the walk is directed toward feasible areas of the search space.

The stochastic ranking method is used to rank offspring in the evolution strategy discussed earlier (see Algorithm 4). Ranking is achieved by comparing adjacent individuals in at least $\lambda$ sweeps. Ranking is terminated once no change occurs during a whole sweep. To determine the balance of offspring selection, the probability $P_f$ is introduced in Runarsson and Yao (2000). In other words, $P_f$ is the probability of comparing two adjacent individuals based on their objective function. If both individuals being compared are feasible, then $P_f$ is 1.
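The ranking procedure can be sketched as a bubble-sort-like pass over index positions. The function name `stochastic_rank` and the representation of individuals as parallel lists of objective and penalty values are assumptions made for this example; a penalty of 0 is taken to mean "feasible".

```python
import random
from typing import List, Optional, Sequence


def stochastic_rank(
    objective: Sequence[float],
    penalty: Sequence[float],
    p_f: float = 0.4,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return indices of individuals ordered from best to worst."""
    rng = rng or random.Random()
    count = len(objective)
    order = list(range(count))
    for _ in range(count):                 # at most `count` sweeps
        swapped = False
        for j in range(count - 1):
            a, b = order[j], order[j + 1]
            u = rng.random()
            both_feasible = penalty[a] == 0 and penalty[b] == 0
            if both_feasible or u < p_f:
                # Compare by objective value (minimization).
                out_of_order = objective[a] > objective[b]
            else:
                # Compare by constraint violation.
                out_of_order = penalty[a] > penalty[b]
            if out_of_order:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:                    # stop after a sweep without changes
            break
    return order
```

Setting `p_f` close to 0 ranks almost purely by constraint violation, whereas `p_f` close to 1 ranks almost purely by objective value.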

Handling Biased Walk

We already explained how we use a biased walk that can distinguish between feasible and infeasible individuals. In order to obtain more interesting individuals, we need to use a biased walk that moves through good regions of the fitness landscape. It is necessary to have feasible solutions within the walk steps in order to obtain an effective ruggedness measure. Therefore, our approach biases the walk using constraint handling methods, which makes it possible to have feasible individuals in the path. In the algorithm, the individuals that are found by the simple evolution strategy are ranked by the stochastic ranking method. The highest-ranked individual is then selected as the next step of the walk. The pseudo-code of our methodology to quantify the ruggedness of constrained continuous fitness landscapes is given in Algorithm 5.

1. Initialize probability $P_f$
2. $I_j = \{1, \ldots, \lambda\}$
3. For $i$ from 1 to $N$
4.   For $j$ from 1 to $\lambda - 1$
5.     Generate a random number $U$ in the range (0,1)
6.     If ($\phi(I_j) = \phi(I_{j+1}) = 0$) or ($U < P_f$)
7.       If $f(I_j) > f(I_{j+1})$
8.         swap($I_j$, $I_{j+1}$)
9.       End if
10.    else
11.      If $\phi(I_j) > \phi(I_{j+1})$
12.        swap($I_j$, $I_{j+1}$)
13.      End if
14.    End if
15.  End for
16.  Break if no changes occurred within a complete sweep
17. End for
Algorithm 4: Stochastic ranking for dealing with infeasible areas. $N$ is the number of sweeps needed for the whole population, $\lambda$ is the number of individuals that are ranked by at least $\lambda$ sweeps and $\phi$ is a real-valued function that imposes penalty

1. Set counter = 0 and create an array steps to save the steps in the walk
2. Repeat
3.   Produce new individuals using the evolution strategy (ES) in Algorithm 3
4.   Rank generated offspring by employing the stochastic ranking method in Algorithm 4
5.   Save the fitness of the highest-ranking individual (infeasible/feasible) in steps[counter]
6.   counter = counter + 1
7. Until (counter ≥ MaxStepNumber)
8. Set $\varepsilon^*$ = max(steps[]) − min(steps[])
9. Generate the ensemble of objects (Eq. 2.4)
10. Calculate the entropic measure $H(S(\varepsilon))$ (Eq. 2.5)
Algorithm 5: Ruggedness quantification of constrained continuous fitness landscapes using the biased walk
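The sampling part of the biased walk can be sketched in one self-contained function that combines a $(1,\lambda)$-ES with stochastic ranking. Everything below is an illustrative assumption rather than the authors' implementation: the name `biased_walk`, the clamping of offspring to the bounds, the fixed number of ranking sweeps, and the treatment of a violation of 0 as feasible.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def biased_walk(
    objective: Callable[[Vector], float],
    violation: Callable[[Vector], float],
    lower: Vector,
    upper: Vector,
    steps: int,
    lam: int = 7,
    p_f: float = 0.4,
    seed: int = 0,
) -> List[float]:
    """Return the fitness sequence recorded along a (1, lam)-ES walk."""
    rng = random.Random(seed)
    n = len(lower)
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    x = [rng.uniform(l, u) for l, u in zip(lower, upper)]
    sigma = [(u - l) / math.sqrt(n) for l, u in zip(lower, upper)]
    fitness_sequence = []
    for _ in range(steps):
        # Mutation: create lam offspring of the current point.
        offspring = []
        for _child in range(lam):
            shared = rng.gauss(0.0, 1.0)
            child_sigma = [
                s * math.exp(tau_prime * shared + tau * rng.gauss(0.0, 1.0))
                for s in sigma
            ]
            # Assumption: offspring are clamped to the bound constraints.
            child_x = [
                min(max(xi + s * rng.gauss(0.0, 1.0), lo), hi)
                for xi, s, lo, hi in zip(x, child_sigma, lower, upper)
            ]
            offspring.append(
                (child_x, child_sigma, objective(child_x), violation(child_x))
            )
        # Stochastic ranking of the offspring (bubble-sort-like sweeps).
        order = list(range(lam))
        for _sweep in range(lam):
            swapped = False
            for j in range(lam - 1):
                a, b = offspring[order[j]], offspring[order[j + 1]]
                draw = rng.random()
                if (a[3] == 0 and b[3] == 0) or draw < p_f:
                    worse = a[2] > b[2]      # compare objective values
                else:
                    worse = a[3] > b[3]      # compare violations
                if worse:
                    order[j], order[j + 1] = order[j + 1], order[j]
                    swapped = True
            if not swapped:
                break
        # Selection: the top-ranked offspring becomes the next step.
        x, sigma, best_fitness, _best_violation = offspring[order[0]]
        fitness_sequence.append(best_fitness)
    return fitness_sequence
```

The returned sequence plays the role of `steps[]` in Algorithm 5: its range gives $\varepsilon^*$, and Eqs. 2.4 and 2.5 are then applied to it.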

$P_f$ is the probability of comparing two adjacent individuals $x$ and $y$ based on their objective function. According to Runarsson and Yao (2000), the probability of winning for $x$ is given by

$$ P_w = P_{fw} P_f + P_{\phi w} (1 - P_f) \qquad (2.8) $$

where $P_{fw}$ is the probability that individual $x$ wins when $x$ and $y$ are compared according to their objective function value, and $P_{\phi w}$ is the probability that $x$ wins when they are compared according to the penalty function.

As discussed in Sect. 2.3.1, the walking algorithm should consider both feasible and infeasible areas. Thus, $P_f$ determines whether the comparison is based on the objective or the penalty function. Of course, the impact of this parameter setting depends on the fitness landscape under investigation. By adjusting the parameter $P_f$, we can control the number of feasible or infeasible individuals in the walk and, consequently, whether the calculated ruggedness measure is more likely based on the feasible or the infeasible regions.
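A short numerical illustration of Eq. 2.8 may help to see the role of $P_f$. The helper name `win_probability` and the example values are assumptions chosen only for illustration.

```python
def win_probability(p_fw: float, p_phiw: float, p_f: float) -> float:
    """Probability that x wins a comparison against y (Eq. 2.8).

    p_fw   -- probability that x wins on the objective function
    p_phiw -- probability that x wins on the penalty function
    p_f    -- probability that the objective function is used
    """
    return p_fw * p_f + p_phiw * (1.0 - p_f)
```

For instance, if $x$ wins half of the objective comparisons but always wins on the penalty, then $P_f = 0.4$ gives a winning probability of about 0.8, so the individual with the smaller violation is favoured.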

In this part, we describe experimental studies to evaluate our approach for measuring

the ruggedness of a constrained continuous fitness landscape. We carry out experimental investigations on two different types of problems. The first consists of a

constrained version of the classical Sphere function. Imposing constraints that lead

to different infeasible areas, we examine our approach with respect to the number

of feasible solutions obtained during the run of the algorithm and compare it to the

other approaches outlined in Sect. 2.2.2. Then, we examine our approach on different

benchmark functions taken from the special session on single objective constrained

real parameter optimization (Mallipeddi and Suganthan 2010) at CEC 2010.

To investigate the proposed method, we first consider the following constrained version of the two-dimensional classical Sphere function:

$$ \min \; \mathrm{Sphere}(x) = \sum_{i=1}^{n} x_i^2, \qquad -5.12 \le x_i \le 5.12 $$

subject to $g(x) \le 0$

where $g(x)$ imposes the constraints of the two-dimensional sphere function. We construct three different problems that differ from each other by using each of the following constraints:

$$ g_1(x) = 10\left(\sum_{i=1}^{n} |\cos^3(x_i - 40)|\right) - 4, $$
$$ g_2(x) = 10\left(\sum_{i=1}^{n} |\cos^3(x_i - 40)|\right) - 8, $$
$$ g_3(x) = 10\left(\sum_{i=1}^{n} |\cos^3(x_i - 40)|\right) - 12 $$

In this experiment, the optimization problems ($\mathrm{Sphere}_{g_1}$, $\mathrm{Sphere}_{g_2}$, $\mathrm{Sphere}_{g_3}$) have low, medium, and high infeasibility rates, respectively. We consider the two-dimensional sphere function so that the results can be analyzed more accurately. Figures 2.1 and 2.2 show the feasible areas in these three functions ($n = 2$).

We apply and compare the random increasing walk (see Algorithm 2) with our methodology on these problems with different feasibility rates. In this experiment, we use a (1,7)-ES and $P_f = 0.4$, which means the ES has a tendency to focus on feasible solutions. We performed 20 independent runs consisting of 1,000 steps each; for each problem, the percentage of feasible solutions is given in Table 2.3.

Due to the stochastic nature of evolutionary optimization, the above test is repeated 20 times and a two-tailed t-test is performed. In all tests, the significance level is set to 0.05. The p-values for each function are given in Table 2.4. The results show that the differences in means are significant, with all p-values below 0.05. Clearly, our methodology is less influenced by increasing the infeasibility rate of the problem. Also, comparing both walks shows that our biased walk is more likely to obtain feasible individuals (steps) in the walk (see Table 2.3). The standard deviations of feasible individuals in both walks are shown in Fig. 2.3. It is clear that the standard deviation of feasible individuals is higher for random walks. Thus, the obtained ruggedness measure is related to the feasible parts, which are more likely to be seen by the solver.

Fig. 2.2 Two-dimensional space of the constrained sphere functions with infeasible areas marked white: (a) $\mathrm{Sphere}_{g_1}$, (b) $\mathrm{Sphere}_{g_2}$, (c) $\mathrm{Sphere}_{g_3}$, having low, medium, and high infeasibility rate

Table 2.3 Percentage of feasible individuals in the walks

Problem                  Random increasing walk   Biased walk
$\mathrm{Sphere}_{g_1}$  71.3                     75.8
$\mathrm{Sphere}_{g_2}$  55.8                     68.1
$\mathrm{Sphere}_{g_3}$  28.7                     48.7

Table 2.4 p-values for significance of a difference between two means for running random increasing and biased walk over three functions

Problem                  p-value
$\mathrm{Sphere}_{g_1}$  0.0043
$\mathrm{Sphere}_{g_2}$  7.0834E−06
$\mathrm{Sphere}_{g_3}$  9.4817E−06

Fig. 2.3 Standard deviation for average percentage of feasible individuals in walks using random increasing and biased walks

We also investigate our new method on benchmark problems from the CEC 2010 competition (Mallipeddi and Suganthan 2010). First, we compare our method with the random increasing walk in terms of the number of feasible individuals (steps) in the walk. To do this, we use a (1,7)-ES and set $P_f$ to 0.4, which forces the walk toward feasible areas (see Eq. 2.8). We calculate the number of feasible steps (individuals) taken by the walking algorithm within 5,000 steps for nearly infeasible problems. Figure 2.4 shows the results of 30 independent runs on CEC problems. It can be observed that for nearly infeasible problems (Mallipeddi and Suganthan 2010), our method includes more feasible individuals in the steps (see Fig. 2.4).

Also, to test the ability of our methodology to quantify ruggedness, we used different CEC benchmark problems with $D = 10$. To quantify the ruggedness, we calculate the entropic measure $H(S(\varepsilon))$ for different values of $\varepsilon$ (Eq. 2.7). Table 2.5 shows our experimental results. The results indicate the mean value of $H(S(\varepsilon))$ for different values of $\varepsilon$ over 30 runs. Based on Malan and Engelbrecht (2009), the ruggedness feature of a problem is taken as the maximum value of $H(S(\varepsilon))$ over all values of $\varepsilon$. These numbers describe the ruggedness of each problem fitness landscape with respect to neutrality. Also, the standard deviation for different values of $\varepsilon$ is shown in Table 2.6.
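The final aggregation step, taking the maximum of the mean entropic measure over all tested values of $\varepsilon$, is a one-liner. In the sketch below the helper name `ruggedness` is hypothetical, and the example list is transcribed from the C01 row of Table 2.5.

```python
from typing import Sequence


def ruggedness(mean_information_content: Sequence[float]) -> float:
    """Ruggedness feature: the largest mean H(S(eps)) over all eps values."""
    return max(mean_information_content)


# Mean H(S(eps)) values for C01, one per tested eps (from Table 2.5).
c01 = [0.0, 0.001, 0.005, 0.013, 0.024, 0.035, 0.060, 0.102, 0.153]
c01_ruggedness = ruggedness(c01)
```

For this row the maximum is reached at the smallest tested $\varepsilon$, which matches the ruggedness value reported for C01.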

Fig. 2.4 Percentage of feasible individuals in walks for nearly infeasible CEC benchmark problems

Table 2.5 Ruggedness results for functions in CEC 2010 benchmarks (10D)

Function (10D)  ε*     ε*/2   ε*/4   ε*/8   ε*/16  ε*/32  ε*/64  ε*/128  ε*/256  Ruggedness
C01             0      0.001  0.005  0.013  0.024  0.035  0.060  0.102   0.153   0.153
C02             0      0.001  0.003  0.004  0.006  0.010  0.015  0.023   0.035   0.035
C03             0      0.000  0.001  0.004  0.009  0.011  0.014  0.014   0.013   0.014
C06             0      0.006  0.010  0.012  0.014  0.018  0.023  0.035   0.027   0.027
C07             0      0.001  0.004  0.006  0.007  0.009  0.012  0.013   0.015   0.015
C09             0      0.001  0.002  0.003  0.005  0.006  0.009  0.012   0.014   0.014
C10             0      0.002  0.002  0.003  0.004  0.006  0.007  0.01    0.012   0.012
C17             0      0.002  0.003  0.005  0.008  0.013  0.015  0.011   0.019   0.019
C18             0      0.001  0.002  0.003  0.004  0.007  0.009  0.012   0.017   0.017

Table 2.6 Standard deviation (STD) of the ruggedness results for functions in CEC 2010 benchmarks (10D)

Function (10D)  ε*     ε*/2   ε*/4   ε*/8   ε*/16  ε*/32   ε*/64   ε*/128  ε*/256
                STD    STD    STD    STD    STD    STD     STD     STD     STD
C01             0      0.002  0.005  0.006  0.009  0.0160  0.028   0.044   0.058
C02             0      0.002  0.003  0.003  0.005  0.008   0.0140  0.022   0.035
C03             0      0.000  0.000  0.000  0.001  0.002   0.003   0.004   0.009
C06             0      0.013  0.016  0.016  0.017  0.019   0.024   0.035   0.028
C07             0      0.001  0.002  0.003  0.004  0.006   0.007   0.009   0.009
C09             0      0.001  0.001  0.002  0.002  0.004   0.006   0.010   0.011
C10             0      0.001  0.001  0.002  0.003  0.004   0.005   0.007   0.009
C17             0      0.002  0.002  0.005  0.011  0.022   0.041   0.008   0.009
C18             0      0.001  0.001  0.002  0.002  0.004   0.004   0.006   0.010

objective functions. Problems C17 and C18 are similar according to their objective

functions and present close values for their ruggedness. For problems C03, C07,

C09, and C10 (with the same objective function), the ruggedness measure is in the

same range. C02 and C06 with the same objective function have different ruggedness

measures compared to C01, which has the largest value in ruggedness. Therefore, it

can be concluded that it is more likely that similar problems have similar ruggedness

measures. Based on the table, we can conclude that C01 is more rugged than other

categories.

2.5 Conclusions

In this chapter, we have reviewed the literature on measuring ruggedness of fitness landscapes and discussed the drawbacks of the current methods when dealing with constrained problems. In order to address constrained continuous optimization problems, we have presented a new technique to quantify the ruggedness of constrained continuous problem landscapes. The modification is based on replacing the random sampling data by a biased walk using a $(1,\lambda)$-evolution strategy, which can distinguish between feasible and infeasible individuals. We evaluated our approach on different benchmark functions and showed that it produces more feasible solutions during its run. Furthermore, we evaluated our method on CEC 2010 benchmark problems and discussed the results.

Appendix

The benchmark functions used in the experiments, described in Mallipeddi and Suganthan (2010), are summarised here. In this experiment, $\varepsilon$ is set to 0.0001.

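C01
Minimize
$$ f(x) = -\left| \frac{\sum_{i=1}^{D} \cos^4(z_i) - 2 \prod_{i=1}^{D} \cos^2(z_i)}{\sqrt{\sum_{i=1}^{D} i z_i^2}} \right|, \quad z = x - o $$
subject to
$$ g_1(x) = 0.75 - \prod_{i=1}^{D} z_i \le 0 $$
$$ g_2(x) = \sum_{i=1}^{D} z_i - 0.75D \le 0 $$
$$ x \in [0, 10]^D $$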

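C02
Minimize
$$ f(x) = \max(z), \quad z = x - o, \quad y = z - 0.5 $$
subject to
$$ g_1(x) = 10 - \frac{1}{D} \sum_{i=1}^{D} \left[ z_i^2 - 10 \cos(2\pi z_i) + 10 \right] \le 0 $$
$$ g_2(x) = \frac{1}{D} \sum_{i=1}^{D} \left[ z_i^2 - 10 \cos(2\pi z_i) + 10 \right] - 15 \le 0 $$
$$ h(x) = \frac{1}{D} \sum_{i=1}^{D} \left[ y_i^2 - 10 \cos(2\pi y_i) + 10 \right] - 20 \le 0 $$
$$ x \in [-5.12, 5.12]^D $$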

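C03
Minimize
$$ f(x) = \sum_{i=1}^{D-1} \left( 100 (z_i^2 - z_{i+1})^2 + (z_i - 1)^2 \right), \quad z = x - o $$
subject to
$$ h(x) = \sum_{i=1}^{D-1} (z_i - z_{i+1})^2 = 0 $$
$$ x \in [-1000, 1000]^D $$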

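C06
Minimize
$$ f(x) = \max(z), \quad z = x - o, \quad y = (x + 483.6106156535 - o)M - 483.6106156535 $$
subject to
$$ h_1(x) = \frac{1}{D} \sum_{i=1}^{D} \left( -y_i \sin\left(\sqrt{|y_i|}\right) \right) = 0 $$
$$ h_2(x) = \frac{1}{D} \sum_{i=1}^{D} \left( -y_i \cos\left(0.5 \sqrt{|y_i|}\right) \right) = 0 $$
$$ x \in [-600, 600]^D $$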

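C07
Minimize
$$ f(x) = \sum_{i=1}^{D-1} \left( 100 (z_i^2 - z_{i+1})^2 + (z_i - 1)^2 \right), \quad z = x + 1 - o, \quad y = x - o $$
subject to
$$ g(x) = 0.5 - \exp\left( -0.1 \sqrt{\frac{1}{D} \sum_{i=1}^{D} y_i^2} \right) - 3 \exp\left( \frac{1}{D} \sum_{i=1}^{D} \cos(0.1 y_i) \right) + \exp(1) \le 0 $$
$$ x \in [-140, 140]^D $$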

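C09
Minimize
$$ f(x) = \sum_{i=1}^{D-1} \left( 100 (z_i^2 - z_{i+1})^2 + (z_i - 1)^2 \right), \quad z = x + 1 - o, \quad y = x - o $$
subject to
$$ h_1(x) = \sum_{i=1}^{D} \left( y_i \sin\left(\sqrt{|y_i|}\right) \right) = 0 $$
$$ x \in [-500, 500]^D $$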

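C10
Minimize
$$ f(x) = \sum_{i=1}^{D-1} \left( 100 (z_i^2 - z_{i+1})^2 + (z_i - 1)^2 \right), \quad z = x + 1 - o, \quad y = (x - o)M $$
subject to
$$ h_1(x) = \sum_{i=1}^{D} \left( y_i \sin\left(\sqrt{|y_i|}\right) \right) = 0 $$
$$ x \in [-500, 500]^D $$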

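C17
Minimize
$$ f(x) = \sum_{i=1}^{D} (z_i - z_{i+1})^2, \quad z = x - o $$
subject to
$$ g_1(x) = \prod_{i=1}^{D} z_i \le 0 $$
$$ g_2(x) = \sum_{i=1}^{D} z_i \le 0 $$
$$ h(x) = \sum_{i=1}^{D} z_i \sin\left(4 \sqrt{|z_i|}\right) = 0 $$
$$ x \in [-10, 10]^D $$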

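C18
Minimize
$$ f(x) = \sum_{i=1}^{D} (z_i - z_{i+1})^2, \quad z = x - o $$
subject to
$$ g(x) = \sum_{i=1}^{D} \left( -z_i \sin\left(\sqrt{|z_i|}\right) \right) \le 0 $$
$$ h(x) = \sum_{i=1}^{D} z_i \sin\left(\sqrt{|z_i|}\right) = 0 $$
$$ x \in [-50, 50]^D $$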

References

Box GE, Jenkins GM, Reinsel GC (2013) Time series analysis: forecasting and control. Wiley
Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: Proceedings of the sixth international symposium on micro machine and human science, MHS'95. IEEE, pp 39–43
Floudas CA, Pardalos PM (1990) A collection of test problems for constrained global optimization algorithms, vol 455. Springer, Berlin
Gen M, Cheng R (2000) Genetic algorithms and engineering optimization, vol 7. Wiley, New York
Hordijk W (1996) A measure of landscapes. Evol Comput 4(4):335–360
Liang J, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan P, Coello CC, Deb K (2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. J Appl Mech 41
Lipsitch M (1991) Adaptation on rugged landscapes generated by local interactions of neighboring genes. In: Proceedings of the fourth international conference on genetic algorithms. San Mateo
Malan KM, Engelbrecht AP (2009) Quantifying ruggedness of continuous landscapes using entropy. In: IEEE congress on evolutionary computation, CEC'09, pp 1440–1447
Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization. Nanyang Technological University, Singapore
Manderick B, de Weger M, Spiessens P (1991) The genetic algorithm and the structure of the fitness landscape. In: Proceedings of the fourth international conference on genetic algorithms. Morgan Kaufmann, San Mateo, pp 143–150
Mattfeld DC, Bierwirth C, Kopfer H (1999) A search space analysis of the job shop scheduling problem. Ann Oper Res 86:441–453
Mersmann O, Bischl B, Trautmann H, Preuss M, Weihs C, Rudolph G (2011) Exploratory landscape analysis. In: Proceedings of the 13th annual conference on genetic and evolutionary computation. ACM, pp 829–836
Mezura-Montes E, Coello Coello CA (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173–194
Naudts B, Kallel L (2000) A comparison of predictive measures of problem difficulty in evolutionary algorithms. IEEE Trans Evol Comput 4(1):1–15
Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294
Schwefel HPP (1993) Evolution and optimum seeking: the sixth generation. Wiley, New York
Smith T, Husbands P, Layzell P, O'Shea M (2002) Fitness landscapes and evolvability. Evol Comput 10(1):1–34
Stadler PF et al (1995) Towards a theory of landscapes. In: Complex systems and binary networks. Springer, Heidelberg, pp 78–163
Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
Vassilev VK, Fogarty TC, Miller JF (2000) Information characteristics and the structure of landscapes. Evol Comput 8(1):31–60
Vassilev VK, Fogarty TC, Miller JF (2003) Smoothness, ruggedness and neutrality of fitness landscapes: from theory to application. In: Advances in evolutionary computing. Springer, pp 3–44
Weinberger E (1990) Correlated and uncorrelated fitness landscapes and how to tell the difference. Biol Cybern 63(5):325–336

Chapter 3

Evolutionary Programming for Constrained

Expensive Black-Box Optimization

Rommel G. Regis

(EP) algorithm for computationally expensive constrained black-box optimization.

The proposed algorithm, TRICEPS (Trust Regions In Constrained Evolutionary

Programming using Surrogates) builds surrogates for the black-box objective function

and inequality constraint functions in every generation of the EP and uses a trust-region-like approach to refine the best solution at the end of each generation. Each

parent produces a large number of trial offspring in each generation, and then

the surrogates are used to identify promising trial offspring, which become the

actual offspring where the objective and constraint functions are evaluated. After the

function evaluations at these offspring, TRICEPS finds a minimizer of the surrogate

of the objective function within a trust region centered at the current best solution and

subject to surrogate inequality constraints with a small margin and with a distance

requirement from previously evaluated points. The trust region is either expanded

or reduced depending on whether the subproblem solution turned out to be feasible and whether the ratio of the actual improvement to the predicted improvement

exceeds or falls below certain thresholds. TRICEPS is implemented using a cubic

radial basis function (RBF) model with a linear polynomial tail and is compared to

an RBF-assisted EP called CEP-RBF (Regis 2014b) and to other alternatives on 18

benchmark problems and on an automotive application with 124 decision variables

and 68 black-box inequality constraints. Performance and data profiles show that

TRICEPS is a substantial improvement over CEP-RBF and it is much better than the

other alternatives on the test problems used.

Keywords Constrained optimization · Evolutionary programming · Surrogate-assisted evolutionary algorithm · Radial basis function · Trust region · Large-scale optimization

Department of Mathematics, Saint Joseph's University, Philadelphia, PA 19131, USA

e-mail: rregis@sju.edu

Springer India 2015

R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,

Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_3

R.G. Regis

3.1 Introduction

In many real-world engineering optimization problems, the values of the objective

and constraint functions are outputs of computationally expensive simulations. These

types of optimization problems are found in the automotive and aerospace industries

(e.g., Jones 2008; Ong et al. 2003) and in various parameter estimation problems

(e.g., Mugunthan et al. 2005; Tolson and Shoemaker 2007). A reasonable strategy

for solving these problems is to use surrogate-based or surrogate-assisted optimization methods, including surrogate-assisted evolutionary algorithms (EAs) (Jin 2011),

where the algorithm uses surrogate models to approximate the black-box objective and constraint functions. For instance, Regis (2014b) successfully developed

a surrogate-assisted Evolutionary Programming (EP) algorithm and applied it to a

large-scale automotive optimization application with 124 decision variables and 68

black-box inequality constraints given a severely limited computational budget of

only 1,000 simulations, where one simulation yields the objective function value

and each of the constraint function values at a given input vector. The purpose of

this paper is two-fold: (1) To develop a new surrogate-assisted EP for constrained

black-box optimization that improves on the algorithm by Regis (2014b) on a set of

benchmark problems, including the above-mentioned large-scale automotive application and (2) To compare the new approach with alternative methods, including a

mathematically rigorous penalty derivative-free algorithm, on the same problems.

This chapter focuses on constrained black-box optimization problems of the

following form:

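\[
\begin{aligned}
\min\ & f(x) \\
\text{s.t.}\ & x \in \mathbb{R}^d, \\
& g_i(x) \le 0, \quad i = 1, 2, \ldots, m, \\
& a \le x \le b.
\end{aligned}
\tag{3.1}
\]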

Here, $f$ is the black-box objective function, $g_1, \ldots, g_m$ are black-box inequality constraint functions, and $a, b \in \mathbb{R}^d$ define the bound constraints of the problem. Throughout this paper, assume that for any input $x \in [a, b] \subseteq \mathbb{R}^d$, the values of $f(x), g_1(x), \ldots, g_m(x)$ are obtained by running a time-consuming simulator (a computer code) at the input $x$. Moreover, assume that $f, g_1, \ldots, g_m$ are all deterministic and that their gradients are not available. Furthermore, for simplicity, assume that $[a, b] \subseteq \mathbb{R}^d$ is a hypercube since any hyper-rectangle can be easily transformed to the unit hypercube $[0, 1]^d$. Problems with equality constraints or noisy functions will be treated in future work.
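The rescaling to the unit hypercube mentioned above can be sketched in a few lines. This is only an illustration: the helper names (`to_unit`, `from_unit`, `wrap_on_unit_cube`, `is_feasible`) and the convention that the simulator returns a pair (objective value, list of constraint values) are assumptions made here, not part of the chapter.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed simulator signature: x -> (f(x), [g_1(x), ..., g_m(x)])
Simulator = Callable[[Sequence[float]], Tuple[float, List[float]]]


def to_unit(x: Sequence[float], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Map a point of the hyper-rectangle [a, b] to the unit hypercube [0, 1]^d."""
    return [(xi - ai) / (bi - ai) for xi, ai, bi in zip(x, a, b)]


def from_unit(z: Sequence[float], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Map a point of the unit hypercube back to the hyper-rectangle [a, b]."""
    return [ai + zi * (bi - ai) for zi, ai, bi in zip(z, a, b)]


def wrap_on_unit_cube(simulator: Simulator, a: Sequence[float], b: Sequence[float]) -> Simulator:
    """Return a simulator that takes inputs in [0, 1]^d instead of [a, b]."""
    def scaled(z: Sequence[float]) -> Tuple[float, List[float]]:
        return simulator(from_unit(z, a, b))
    return scaled


def is_feasible(g_values: Sequence[float]) -> bool:
    """A point is feasible for (3.1) when every constraint value is <= 0."""
    return all(g <= 0.0 for g in g_values)
```

With such a wrapper, an optimizer can work entirely on $[0, 1]^d$ and only the wrapper needs to know the original bounds.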

Problem (3.1) is difficult when the dimension d and the number of black-box

constraints m are large, and it is even more difficult when the computational budget

is relatively limited. Although much progress has been made in the development of

constraint handling techniques for EAs (Mezura-Montes and Coello Coello 2011),

most of these approaches require a large number of simulations even on problems of

moderate size, and hence they are not appropriate in the computationally expensive

setting. One way to reduce the number of simulations is to approximate the expensive

black-box functions by constructing and updating dynamic surrogate models, one

for the objective and one for each of the constraint functions as was done in Regis

(2014b). These surrogates are then used to identify promising offspring or other

promising points in the search space and the expensive simulations are performed

only on these points.

This paper develops a new surrogate-assisted EP for computationally expensive

constrained black-box optimization. The proposed algorithm, called TRICEPS (Trust

Regions In Constrained Evolutionary Programming using Surrogates), does not use

a penalty to handle constraints but builds surrogates for the black-box objective and

constraint functions in every generation of the EP. Moreover, it incorporates a trust-region-like approach to refine the best parent solution at the end of each generation.

As in the surrogate-assisted EP by Regis (2014b), each parent in TRICEPS produces a large number of trial offspring in each generation, and then the surrogates

are used to identify promising trial offspring, which become the actual offspring

where the objective and constraint functions are evaluated. After performing simulations at the offspring of the current generation, TRICEPS solves a trust-region-like

subproblem where it finds a minimizer of the surrogate of the objective function

within a trust region centered at the current best solution and subject to surrogate

inequality constraints with a small margin and with a distance requirement from

previously evaluated points. The margin on the surrogate constraints is meant to

increase the chances of obtaining feasible points. In addition, the trust region is

either expanded or reduced depending on whether the subproblem solution turned

out to be feasible and whether the ratio of the actual improvement to the predicted

improvement exceeds or falls below certain thresholds. In the numerical experiments,

TRICEPS is implemented using a cubic radial basis function (RBF) model with a

linear polynomial tail and is compared to the previously developed RBF-assisted

EP called CEP-RBF (Regis 2014b) and to other alternatives, including the mathematically rigorous penalty derivative-free algorithm SDPEN (Liuzzi et al. 2010),

on 18 benchmark problems and on an automotive application with 124 decision

variables and 68 black-box inequality constraints proposed by Jones (2008) during

the MOPTA08 conference. Performance and data profiles show that TRICEPS with

RBF surrogates is a substantial improvement over CEP-RBF and it is much better

than the other alternatives, including SDPEN, on the test problems used when the

computational budget is relatively limited.

Although this paper is about a surrogate-assisted EP, it is also possible to develop

other surrogate-assisted EAs, including surrogate-assisted evolution strategies (ES),

for constrained black-box optimization using the ideas presented here. However,

when the problem is highly constrained and the computational budget is severely

limited, Regis (2014b) suggests using EAs that mainly use conservative mutation

operators since recombination might have a tendency to produce offspring that violate

one of the many constraints. For example, the (1+1)-CMA-ES by Arnold and Hansen

(2012) would be another good candidate to combine with a surrogate since it only

uses mutation operators.

This paper is organized as follows. Section 3.2 provides a review of the relevant

literature. Section 3.3 describes the proposed TRICEPS algorithm and the RBF surrogate model used. Sections 3.4 and 3.5 discuss the numerical experiments and results.

Finally, Sect. 3.6 provides some conclusions.

Various constraint handling techniques have been used with EAs. Some of the

most common techniques use penalty functions (e.g., Mezura-Montes et al. 2003;

Runarsson and Yao 2000; Tessema and Yen 2006), multi-objective optimization

(Wang and Cai 2012), a combination of a bi-objective optimization and a penalty

approach (Datta and Deb 2013; Deb and Datta 2013), the epsilon constrained method

(Takahama and Sakai 2012), cultural algorithms (Coello Coello and Landa-Becerra

2004), and those that distinguish between feasible and infeasible solutions (Mezura-Montes and Coello Coello 2005). A recent survey on constraint-handling techniques

in evolutionary and swarm algorithms is given by Mezura-Montes and Coello Coello

(2011) and a tutorial is given by Coello Coello (2012).

As mentioned above, surrogates or metamodels of the objective and constraint

functions have been used to assist EAs for computationally expensive black-box

optimization. In particular, surrogates for the objective function have been used to

approximate objective function values (e.g., Regis and Shoemaker 2004), while

surrogates for the constraint functions have been used by Kramer et al. (2009) to

check feasibility, repair infeasible solutions, and rotate the mutation ellipsoid in

CMA-ES. Examples of surrogate models that have been used with EAs include

multivariate quadratic polynomials (Araujo et al. 2009; Regis and Shoemaker 2004;

Shi and Rasheed 2008; Wanner et al. 2005), multilayer perceptron neural networks

(Jin et al. 2002), kriging and Gaussian process models (Emmerich et al. 2002; Zhou

et al. 2007), radial basis functions (Isaacs et al. 2007, 2009; Ong et al. 2003; Regis

2014b; Regis and Shoemaker 2004; Zhou et al. 2007), support vector machines

(SVM) (Gieseke and Kramer 2013; Loshchilov et al. 2012; Shi and Rasheed 2008)

and nearest neighbors regression (Runarsson 2004). Moreover, multiple surrogates

may be used to balance exploration and exploitation in an evolutionary algorithm

(e.g., Montaño et al. 2012). A recent survey on surrogate-assisted EAs is provided

by Jin (2011).

Penalty functions are also commonly used to handle constraints in surrogate-assisted EAs. For example, Shi and Rasheed (2008) use a stochastic penalty function

and an adaptive mechanism for switching from lower complexity polynomial models

to higher complexity SVM models while Runarsson (2004) uses a penalty-based

Stochastic Ranking ES combined with a nearest neighbor regression model. However,

Powell (1994) notes that the use of a penalty might not be the most effective way to

handle expensive black-box constraints since information about individual constraint

violations is lost. In fact, some numerical evidence to support this idea can be found in

Regis (2014b). Instead, Powell (1994) suggests treating the constraints individually

by building individual surrogates, one for each constraint.

According to Mezura-Montes and Coello Coello (2011), surrogates are still

seldom used to approximate constraints in nature-inspired algorithms. One example

is a GA combined with Feasible Sequential Quadratic Programming (FSQP) developed by Ong et al. (2003), where local RBF surrogates are used to model the objective

and constraint functions. Other examples are given by Araujo et al. (2009) and Wanner et al. (2005), where quadratic models are used to approximate the objective and

constraint functions in GAs. Moreover, Isaacs et al. (2007, 2009) used RBF networks to model objective and constraint functions in evolutionary multi-objective

optimization. In addition, Emmerich et al. (2006) proposed using local Gaussian

Random Field Metamodels for modeling constraint functions in single- and multi-objective evolutionary optimization. More recently, Gieseke and Kramer (2013) used

SVMs to estimate nonlinear constraints in CMA-ES for expensive optimization.

While there are relatively few algorithms that use surrogates to approximate

black-box constraints, there are even fewer algorithms that have been used on

high-dimensional (more than a hundred decision variables) and highly constrained

problems. In Ong et al. (2003), the GA coupled with FSQP that uses local RBF surrogates was tested only on problems with at most 20 decision variables and at most 4

inequality constraints. The metamodel-based CiMPS method (Kazemi et al. 2011)

was only tested on problems with at most 13 decision variables and 9 inequality

constraints. On the other hand, ConstrLMSRBF (Regis 2011), CEP-RBF (Regis

2014b) and COBRA (Regis 2014a) all use global RBF surrogates and were all successful compared to alternatives on well-known benchmark problems and on the

MOPTA08 automotive application with 124 decision variables and 68 black-box

inequality constraints (Jones 2008). One of the goals of this paper is to develop a

new surrogate-assisted EP that improves upon the surrogate-assisted EP in Regis

(2014b) on benchmark test problems and on the MOPTA08 automotive problem.

3.3 Trust Regions in Constrained Evolutionary Programming Using Surrogates

3.3.1 Overview

This section describes a pseudo-code for the proposed TRICEPS algorithm, which

is a new surrogate-assisted EP for optimization problems with black-box inequality constraints. A detailed description is given in the next subsection. Unlike many

constrained EAs in the literature, TRICEPS does not use a penalty function. Instead,

it is similar to the surrogate-assisted EP by Regis (2014b) in that it treats each inequality constraint separately and builds and updates a surrogate model for each constraint

function using all previously evaluated points (both feasible and infeasible points).

Moreover, as in Regis (2014b), each parent generates multiple trial offspring in every

generation and then the surrogates for the objective and constraint functions are used

to rank these trial offspring according to rules that favor offspring with the best

predicted objective function values among those with the minimum number of predicted constraint violations. The computationally expensive simulations (evaluations

of the objective and constraint functions) are then carried out only on the most

promising offspring of each parent.

TRICEPS differs from the surrogate-assisted EP by Regis (2014b) in that it incorporates a trust-region-like approach to refine the best solution at the end of each

generation. That is, after performing simulations at the offspring of the current

generation, TRICEPS solves a trust-region-like subproblem where it finds a minimizer of the surrogate of the objective function within a trust region centered at

the current best solution and subject to surrogate inequality constraints with a small

margin and with a distance requirement from previously evaluated points. The idea

of refining the best solution at the end of each generation has been implemented in

surrogate-assisted particle swarm algorithms for bound constrained problems (e.g.,

Parno et al. 2012; Regis 2014c). However, these previous approaches did not use

trust regions that can be expanded or reduced. In TRICEPS, the adjustment of the

trust region depends on whether the subproblem solution turned out to be feasible,

whether the ratio of the actual improvement to the improvement predicted by the

surrogate exceeds or falls below certain thresholds, and also whether the number of

consecutive successful local refinements or the number of consecutive unsuccessful

local refinements has reached certain thresholds. Also, the idea of using a margin

on the surrogate inequality constraints was first proposed by Regis (2014a) and its

purpose is to increase the chances of obtaining feasible points.

When the optimization problem has a large number of decision variables and has

many black-box inequality constraints, Regis (2011, 2014b) implemented a Block

Coordinate Search (BCS) strategy where new trial solutions (or offspring) are generated by perturbing only a small fraction of the coordinates of the current solution

under consideration (i.e., a particular parent solution, including possibly the current

best feasible solution). The BCS strategy resulted in a dramatic improvement for

ConstrLMSRBF (Regis 2011) and CEP-RBF (Regis 2014b) when applied to

the MOPTA08 benchmark problem from the auto industry proposed by Jones (2008)

involving 124 decision variables and 68 black-box inequality constraints. When only

a small number of coordinates of a parent solution are perturbed, fewer constraint

violations are likely to be introduced in the trial offspring and the trial offspring will

tend to be closer to the parent solution. If this parent solution is feasible, many of the

trial offspring will tend to be feasible thereby making it more likely to find a feasible

solution with an improved objective function value. Hence, the BCS strategy is also

implemented in TRICEPS when it is used for high-dimensional problems with many

black-box inequality constraints.

[Fig. 3.1 Flowchart of the TRICEPS algorithm: initialize parent population and algorithm parameters; evaluate functions at initial population; check whether the computational budget is reached (stop if yes); update surrogates of the objective and constraint functions; evaluate surrogates at trial offspring for each parent; evaluate objective and constraint functions at best offspring for each parent; update surrogates of the objective and constraints; solve trust-region subproblem; evaluate functions at trust-region point; update trust region]

Figure 3.1 presents a flowchart that shows the main steps of the TRICEPS algorithm. The algorithm begins by initializing the parent population and algorithm parameters and then calculating the objective and constraint functions at the initial population. Then TRICEPS goes through a main loop that terminates only when the

computational budget (i.e., maximum number of function evaluations) is reached.

In the first part of the loop, TRICEPS performs the same steps as in CEP-RBF

(Regis 2014b). That is, TRICEPS fits the surrogates for the objective and constraint

functions, generates a large number of trial offspring for each parent, and then uses

the surrogates to select only the most promising trial offspring and this is where

the function evaluations are performed. In the second part of the loop, TRICEPS

performs a trust-region-like refinement of the best parent solution. That is, the surrogates are updated using information from recently evaluated points, the trust-region

subproblem is solved, then function evaluations are performed on the solution to the

trust-region subproblem, and finally, the algorithm parameters and the trust region

are updated. Note that the surrogates are updated twice in a single iteration, once

before the trial offspring are generated and once before the trust-region step. Hence,

surrogate modeling is integrated into the optimization process in two ways by using

it: (1) to select the most promising among multiple trial offspring for each parent

solution and (2) to identify a local refinement point for the current best solution

during the trust-region step.
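As a rough bookkeeping aid, the loop structure just described implies a simple count of simulations. This is an inference from the description, not a formula stated in the chapter. Let $\mu$ denote the number of parents. The initial population costs $\mu$ simulations, and each generation costs $\mu$ offspring evaluations plus one evaluation at the trust-region point. The number of simulations $N(t)$ used after $t$ full generations, and the number of full generations $t_{\max}$ that fit in a budget of $N_{\max}$ simulations, are then

\[
N(t) = \mu + t(\mu + 1), \qquad t_{\max} = \left\lfloor \frac{N_{\max} - \mu}{\mu + 1} \right\rfloor .
\]

For instance, with a hypothetical $\mu = 5$ and a budget of $N_{\max} = 1{,}000$ simulations, this gives $t_{\max} = \lfloor 995/6 \rfloor = 165$ generations.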

The main input to TRICEPS is an optimization problem of the form (3.1) together

with a simulator (a computer code) that yields the values of $f(x), g_1(x), \ldots, g_m(x)$

for any input $x \in [a, b] \subseteq \mathbb{R}^d$. Moreover, assume that a feasible starting point $x_0$ is

provided. This assumption is not unreasonable since for some real-world engineering

optimization problems, an initial feasible solution to the problem is provided and the

goal is simply to find a better feasible solution. If a feasible solution is not initially

available, then one can develop an extension of TRICEPS that can handle infeasible starting points by using an approach that is similar to the two-phase approach

described in Regis (2014a). The first phase finds a feasible point while the second

phase improves on this feasible point. This two-phase approach will be included in

future work.

Below is a detailed description of the TRICEPS algorithm. It has several user-specified parameters:

(1) $\mu$ (the number of parent solutions in each generation)

(2) $\nu$ (the number of trial offspring generated for each parent)

(3) $\sigma_{init}$ (the initial standard deviation of the Gaussian mutations)

(4) $p_{mut}$ (the probability of perturbing a coordinate of a parent solution when generating a trial offspring)

(5) $\Delta_{init}, \Delta_{min}, \Delta_{max}$ (the initial, minimum, and maximum trust-region radii, respectively)

(6) $\eta_0, \eta_1$ (the ratio thresholds indicating whether the trust-region iterations were successful or not)

(7) $0 < \gamma_0 < 1 < \gamma_1$ (contraction and expansion factors for the trust region, respectively)

(8) $T_{fail}$ (tolerance for the number of consecutive unsuccessful trust-region iterations before the trust region is reduced)

(9) $T_{success}$ (threshold for the number of consecutive successful trust-region iterations before the trust region is expanded)

(10) $\epsilon_{init} > 0$ (initial margin for the surrogate inequality constraints)

(11) $T_{infeas}$ (threshold for the number of consecutive generations where a feasible solution to the trust-region subproblem was not found)

(12) $\xi$ (distance requirement from previous sample points)

Each individual is a pair of $d$-dimensional vectors $(x_i(t), \sigma_i(t))$, where $t$ is the generation number, $i$ is the index of the individual in the current population, $x_i(t)$ is the vector of values of the decision variables, and $\sigma_i(t)$ is the vector of standard deviations for the Gaussian mutations.

$(\mu + \mu)$-TRICEPS for Constrained Black-Box Optimization

(1) Set generation counter $t = 0$ and set initial population $P(0) = \{(x_1(0), \sigma_1(0)), \ldots, (x_\mu(0), \sigma_\mu(0))\}$, where $\sigma_i(0) = \sigma_{init}$ for $i = 1, \ldots, \mu$ and $x_1(0) = x_0$ (feasible starting point).

(2) Initialize the trust-region radius $\Delta_0 = \Delta_{init}$ and the margin for the surrogate inequality constraints $\epsilon_0 = \epsilon_{init}$.

(3) Initialize the counters for the number of consecutive successful local refinements $C_{success} = 0$ and the number of consecutive unsuccessful local refinements $C_{fail} = 0$. Also, initialize the counter for the number of consecutive generations where a feasible solution for the trust-region subproblem was not found: $C_{infeas} = 0$.

(4) Evaluate the objective and constraint functions at the points in $P(0)$: For each $i = 1, \ldots, \mu$, run the simulator to determine $f(x_i(0)), g_1(x_i(0)), \ldots, g_m(x_i(0))$. Relabel the subscripts of the individuals in $P(0)$ so that $x_1(0)$ is the best point in $P(0)$.

(5) While the termination criteria are not satisfied do

    (5.1) Fit or update surrogates $s_t^{(0)}, s_t^{(1)}, \ldots, s_t^{(m)}$ for the objective and constraint functions $f, g_1, \ldots, g_m$, respectively, using all available function values from previous simulations (see Sect. 3.3.3).

    (5.2) For $i = 1, \ldots, \mu$

        (5.2(a)) For $j = 1, \ldots, \nu$, generate $(x'_{ij}(t), \sigma'_{ij}(t)) = \mathrm{Mutate}((x_i(t), \sigma_i(t)), p_{mut})$.

        (5.2(b)) Evaluate the surrogates of the objective and constraint functions at the points in $P'_i(t) = \{(x'_{i1}(t), \sigma'_{i1}(t)), \ldots, (x'_{i\nu}(t), \sigma'_{i\nu}(t))\}$: For $j = 1, \ldots, \nu$, calculate $s_t^{(0)}(x'_{ij}(t)), s_t^{(1)}(x'_{ij}(t)), \ldots, s_t^{(m)}(x'_{ij}(t))$.

        (5.2(c)) $(x'_i(t), \sigma'_i(t)) = \mathrm{Select}(P'_i(t))$.

        (5.2(d)) Evaluate the objective and constraint functions at the selected point: Run the simulator to determine $f(x'_i(t)), g_1(x'_i(t)), \ldots, g_m(x'_i(t))$.

    End

    (5.3) $P(t + 1) = \mathrm{Select}(P(t) \cup P'(t))$ where $P'(t) = \{(x'_1(t), \sigma'_1(t)), \ldots, (x'_\mu(t), \sigma'_\mu(t))\}$. Relabel the subscripts of the individuals in $P(t + 1)$ so that $x_1(t + 1)$ is the best point in $P(t + 1)$.

    (5.4) Update surrogates $s_t^{(0)}, s_t^{(1)}, \ldots, s_t^{(m)}$ for the objective and constraint functions $f, g_1, \ldots, g_m$, respectively, using newly obtained function values from simulations on the current offspring.

    (5.5) Relabel all previously evaluated points by $v_1, \ldots, v_n$ and let $v_n$ be the best feasible point so far. Solve the subproblem below and let $\tilde{x}(t)$ be the solution obtained:

\[
\begin{aligned}
\min\ & s_t^{(0)}(x) \\
\text{s.t.}\ & x \in \mathbb{R}^d, \quad a \le x \le b, \\
& \|x - v_n\| \le \Delta_t, \\
& s_t^{(i)}(x) + \epsilon_t \le 0, \quad i = 1, 2, \ldots, m, \\
& \|x - v_j\| \ge \xi, \quad j = 1, \ldots, n.
\end{aligned}
\tag{3.2}
\]

    (5.6) If a feasible solution to (3.2) was found, then let $\tilde{x}(t)$ be the solution obtained and reset $C_{infeas} = 0$. Otherwise, let $\tilde{x}(t)$ be the best solution (infeasible for (3.2)) among a set of randomly generated points within the trust region $\{x \in [a, b] : \|x - v_n\| \le \Delta_t\}$ and reset $C_{infeas} = C_{infeas} + 1$.

    (5.7) Evaluate the objective and constraint functions at $\tilde{x}(t)$: Run the simulator to determine $f(\tilde{x}(t)), g_1(\tilde{x}(t)), \ldots, g_m(\tilde{x}(t))$.

    (5.8) If $\tilde{x}(t)$ is feasible, then do

        (5.8(a)) Calculate predicted and actual improvements: $\mathrm{pred} = s_t^{(0)}(x_1(t + 1)) - s_t^{(0)}(\tilde{x}(t))$ and $\mathrm{actual} = f(x_1(t + 1)) - f(\tilde{x}(t))$.

        (5.8(b)) If $f(\tilde{x}(t)) < f(x_1(t + 1))$, then replace $x_1(t + 1)$ (the best parent in the next generation) by $\tilde{x}(t)$ and reset $C_{success} = C_{success} + 1$ and $C_{fail} = 0$. Otherwise, reset $C_{success} = 0$ and $C_{fail} = C_{fail} + 1$.

        (5.8(c)) If $\mathrm{pred} > 0$, then do

            (i) If $\rho_t = \mathrm{actual}/\mathrm{pred} \ge \eta_1$ and $C_{success} \ge T_{success}$, then $\Delta_{t+1} = \min(\gamma_1 \Delta_t, \Delta_{max})$ and reset $C_{success} = 0$.

            (ii) Else if $\rho_t = \mathrm{actual}/\mathrm{pred} < \eta_0$ and $C_{fail} \ge T_{fail}$, then $\Delta_{t+1} = \max(\gamma_0 \Delta_t, \Delta_{min})$ and reset $C_{fail} = 0$.

        Else

            (iii) If $C_{fail} \ge T_{fail}$, then $\Delta_{t+1} = \max(\gamma_0 \Delta_t, \Delta_{min})$ and reset $C_{fail} = 0$.

        End.

    Else

        (5.8(d)) Set $C_{success} = 0$ and $C_{fail} = C_{fail} + 1$.

        (5.8(e)) If $C_{fail} \ge T_{fail}$, then $\Delta_{t+1} = \max(\gamma_0 \Delta_t, \Delta_{min})$ and reset $C_{fail} = 0$.

    End.

    (5.9) If $C_{infeas} \ge T_{infeas}$, then reduce the margins $\epsilon_{t+1} = \epsilon_t / 2$ and reset $C_{infeas} = 0$. Otherwise, $\epsilon_{t+1} = \epsilon_t$.

    (5.10) Increment generation counter: $t \leftarrow t + 1$.

End While

(6) Return best solution found.

As in the surrogate-assisted Constrained EP in Regis (2014b), Step 1 of TRICEPS

generates the initial parent population and initializes the standard deviations of the

mutations. Step 2 initializes the trust-region radius and the margin for the surrogate inequality constraints while Step 3 initializes the counters that keep track of

the number of consecutive successful local refinements, the number of consecutive

unsuccessful local refinements and the number of consecutive generations where the

trust-region subproblem did not yield any feasible points. Then, in Step 4, the simulator is run $\mu$ times to determine the objective and constraint function values of the

initial parent solutions. For convenience, the initial parent population is reordered

so that the first one is the best point. Since a feasible starting point is provided, the

best parent solution must be feasible.

Next, in Step 5, TRICEPS loops through the generations. At the beginning of each generation, surrogates for the objective and constraint functions are built using all available function values from previous simulations (Step 5.1). In particular, $s_t^{(0)}$ is the surrogate for $f$ while $s_t^{(1)}, \ldots, s_t^{(m)}$ are the surrogates for $g_1, \ldots, g_m$, respectively. Next, for each of the parent solutions, trial offspring are generated by mutation (Step 5.2(a)). Then, the surrogates for the objective and constraints are evaluated at each trial offspring (Step 5.2(b)) and the most promising of the trial offspring from each parent is chosen (Step 5.2(c)). Next, the simulator is run to determine the objective and constraint function values at the selected offspring (Step 5.2(d)). Then, the algorithm selects the parent population for the next generation (Step 5.3). As before, the new parent population is reordered so that the first one is the best point.

The next several steps attempt to refine the current best solution, which is the best parent in the next generation $x_1(t + 1)$. In Step 5.4, the surrogates for the objective and constraints are updated using the newly obtained function values at the offspring of the current generation. In Step 5.5, a trust-region subproblem (3.2) is solved. For convenience, all points in the search space where the simulator has been run are relabeled as $v_1, \ldots, v_n$ and let $v_n$ be the best feasible point found so far. Because of previous relabeling, $v_n = x_1(t + 1)$. In this step, the algorithm finds a local minimizer of the surrogate of the objective within the trust region of radius $\Delta_t$ centered at the current best point and subject to the surrogate inequality constraints with a small margin $\epsilon_t$ and subject to a distance requirement from previously evaluated points. Then, in Step 5.6, $\tilde{x}(t)$ is either a solution to the trust-region subproblem (3.2) or it is the best infeasible solution to (3.2) from a set of randomly generated points within the trust region. Here, $\tilde{x}(t)$ is referred to as the local refinement point. In Step 5.7, the simulator is run to determine the objective and constraint function values at the local refinement point $\tilde{x}(t)$. Then, in Step 5.8, the local refinement point replaces the best parent in the next generation (which is also the current best solution) if the former is a better point than the latter. Moreover, the trust-region radius is either expanded or reduced depending on whether the local refinement point $\tilde{x}(t)$ is feasible, whether the ratio of the actual improvement to the improvement originally predicted by the surrogate for $\tilde{x}(t)$ exceeds $\eta_1$ or falls below $\eta_0$, and also whether the counters $C_{success}$ or $C_{fail}$ have reached the thresholds $T_{success}$ or $T_{fail}$. In addition, in Step 5.9, the margin for the surrogate inequality constraints is reduced if the counter $C_{infeas}$ reached the threshold $T_{infeas}$. Then, Step 5.10 increments the generation counter and the algorithm goes back into the loop until a stopping criterion is satisfied. Finally, the best solution found is returned in Step 6. As with the surrogate-assisted EP in Regis (2014b), the stopping criterion is a fixed number of simulations.
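The radius-update logic of Step 5.8 can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the author's implementation: the class and function names and all default parameter values are hypothetical, and the expansion rule (multiply by the expansion factor, cap at the maximum radius, reset the success counter) is assumed by symmetry with the contraction rule, which the text states explicitly.

```python
from dataclasses import dataclass


@dataclass
class TrustRegionState:
    radius: float        # current trust-region radius
    c_success: int = 0   # consecutive successful local refinements
    c_fail: int = 0      # consecutive unsuccessful local refinements


def update_trust_region(state, feasible, f_best, f_new, pred,
                        eta0=0.0, eta1=0.5, gamma0=0.5, gamma1=2.0,
                        r_min=1e-3, r_max=1.0, t_success=1, t_fail=3):
    """Update radius and counters after evaluating the local refinement point.

    feasible: whether the refinement point satisfied all true constraints
    f_best:   objective value of the current best point
    f_new:    objective value of the refinement point
    pred:     improvement predicted by the objective surrogate
    All default parameter values are hypothetical placeholders.
    """
    def shrink():
        state.radius = max(gamma0 * state.radius, r_min)
        state.c_fail = 0

    if not feasible:
        # Infeasible refinement point: counts as an unsuccessful refinement.
        state.c_success = 0
        state.c_fail += 1
        if state.c_fail >= t_fail:
            shrink()
        return state

    # Feasible refinement point: success means a strict improvement.
    if f_new < f_best:
        state.c_success += 1
        state.c_fail = 0
    else:
        state.c_success = 0
        state.c_fail += 1

    if pred > 0:
        rho = (f_best - f_new) / pred  # actual over predicted improvement
        if rho >= eta1 and state.c_success >= t_success:
            # Assumed expansion rule (see lead-in).
            state.radius = min(gamma1 * state.radius, r_max)
            state.c_success = 0
        elif rho < eta0 and state.c_fail >= t_fail:
            shrink()
    elif state.c_fail >= t_fail:
        shrink()
    return state
```

Requiring the counters to reach their thresholds before the radius changes means that a single lucky or unlucky refinement does not immediately resize the trust region.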

As in Regis (2014b), each parent generates $\nu$ trial offspring, only one of which becomes an actual offspring for the current generation. The value of the parameter $\nu$ is chosen to be large so that the expensive simulations are only run on trial offspring that are very promising as predicted by the surrogates. Moreover, TRICEPS allows for the possibility of using the BCS strategy from Regis (2011, 2014b) for high-dimensional or highly constrained problems. In BCS, the mutations are more conservative in that only a fraction of the components of the parent vector is perturbed when generating the trial solutions, so the probability of perturbing any component satisfies $p_{mut} < 1$. (When $p_{mut} = 1$, the algorithm does not use the BCS strategy.) As explained in Regis (2011, 2014b), the BCS strategy is helpful for high-dimensional problems or highly constrained problems because perturbing too many components of a parent vector that is already good is either likely to make the objective function value worse or it is likely to result in more constraint violations.

More precisely, in Step 5.2(a), each parent $(x_i(t), \sigma_i(t))$ in generation $t$ creates exactly $\nu$ trial offspring $(x'_{ij}(t), \sigma'_{ij}(t))$ for $j = 1, \ldots, \nu$ as follows: For $k = 1, \ldots, d$,

(1) Generate a random number $\omega$ from the uniform distribution on $[0, 1]$.

(2) If $\omega \le p_{mut}$, then

\[
x'_{ij}(t)^{(k)} = x_i(t)^{(k)} + \sigma_i(t)^{(k)} N_k(0, 1), \qquad
\sigma'_{ij}(t)^{(k)} = \sigma_i(t)^{(k)} \exp\bigl(\tau' N(0, 1) + \tau N_k(0, 1)\bigr).
\]

Else

\[
x'_{ij}(t)^{(k)} = x_i(t)^{(k)}, \qquad
\sigma'_{ij}(t)^{(k)} = \sigma_i(t)^{(k)}.
\]

End.
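The coordinate-wise mutation with the BCS strategy can be sketched as follows. Several choices here are assumptions rather than details given in the text: the function name `mutate`, the use of the parent's standard deviation in the position update (as in classical EP), drawing separate standard normal deviates for the position and step-size updates, and the default values $\tau = 1/\sqrt{2\sqrt{d}}$ and $\tau' = 1/\sqrt{2d}$, which are the conventional EP self-adaptation settings. The sketch also does not repair offspring that leave the bounds $[a, b]$.

```python
import math
import random


def mutate(x, sigma, p_mut, rng, tau=None, tau_prime=None):
    """Create one trial offspring (new_x, new_sigma) from a parent (x, sigma).

    Each coordinate is perturbed with probability p_mut; p_mut < 1 gives the
    Block Coordinate Search behaviour, p_mut = 1 perturbs every coordinate.
    """
    d = len(x)
    if tau is None:
        tau = 1.0 / math.sqrt(2.0 * math.sqrt(d))   # assumed conventional value
    if tau_prime is None:
        tau_prime = 1.0 / math.sqrt(2.0 * d)        # assumed conventional value

    common = rng.gauss(0.0, 1.0)  # N(0, 1): drawn once per offspring
    new_x = list(x)
    new_sigma = list(sigma)
    for k in range(d):
        if rng.random() <= p_mut:
            # N_k(0, 1): drawn anew for each coordinate (and each use, by assumption)
            new_x[k] = x[k] + sigma[k] * rng.gauss(0.0, 1.0)
            new_sigma[k] = sigma[k] * math.exp(
                tau_prime * common + tau * rng.gauss(0.0, 1.0))
        # otherwise the coordinate and its step size are copied from the parent
    return new_x, new_sigma
```

With $d$ coordinates, the expected number of perturbed coordinates per trial offspring is $d \cdot p_{mut}$, which is how a small $p_{mut}$ keeps offspring close to the parent.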

In Step 5.2(c), the trial offspring solutions are ranked in the same manner as in

Regis (2014b):

(1) Between two solutions that are predicted to be feasible, the one with the better

predicted objective value wins.

(2) Between a solution that is predicted to be feasible and a solution that is predicted

to be infeasible, the former wins.

(3) Between two solutions that are predicted to be infeasible, the one with fewer

predicted constraint violations wins.

(4) Between two solutions that are predicted to be infeasible with the same number

of predicted constraint violations, the one with the better predicted objective

value wins.
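The four ranking rules above amount to a lexicographic comparison: first by the number of predicted constraint violations (zero for a predicted-feasible point), then by predicted objective value. A minimal sketch follows. The function names are hypothetical, and counting a violation whenever a surrogate constraint value is strictly positive, with no margin, is an assumption made here.

```python
def rank_key(pred_objective, pred_constraints):
    """Sort key for a trial offspring: (number of predicted violations, predicted objective)."""
    violations = sum(1 for g in pred_constraints if g > 0.0)
    return (violations, pred_objective)


def select_best_trial(trials, surrogate_objective, surrogate_constraints):
    """Return the most promising trial offspring according to the surrogates.

    trials:                 candidate points
    surrogate_objective:    callable approximating f
    surrogate_constraints:  list of callables approximating g_1, ..., g_m
    """
    return min(
        trials,
        key=lambda x: rank_key(
            surrogate_objective(x),
            [s(x) for s in surrogate_constraints],
        ),
    )
```

For example, a predicted-feasible point with a poor predicted objective still ranks ahead of any point predicted to violate a constraint, because tuples are compared element by element.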

In implementing TRICEPS, a continuously differentiable surrogate whose gradient is easy to compute is highly recommended so that efficient gradient-based

techniques can be used to solve the trust-region subproblem (3.2). One such example

of a surrogate is provided in the next section. Note that the gradients of the trust-region constraints and the distance constraints are easy to calculate. In particular, for the trust-region constraint $T_t(x) = \|x - v_n\| - \Delta_t \le 0$ and the distance constraints $D_{t,j}(x) = \xi - \|x - v_j\| \le 0$ for $j = 1, \ldots, n$, the gradients are given by:

\[
\nabla T_t(x) = \frac{x - v_n}{\|x - v_n\|}
\quad \text{and} \quad
\nabla D_{t,j}(x) = -\frac{x - v_j}{\|x - v_j\|}.
\]
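These closed-form gradients are easy to check numerically. The sketch below implements both constraint functions and their gradients and compares them with central finite differences. The function names and the symbols `delta` (trust-region radius) and `xi` (distance requirement) are illustrative choices. Note that both gradients are undefined when $x$ coincides with the reference point.

```python
import math


def _diff(x, v):
    return [xi - vi for xi, vi in zip(x, v)]


def _norm(w):
    return math.sqrt(sum(c * c for c in w))


def trust_region_constraint(x, v_n, delta):
    """T(x) = ||x - v_n|| - delta, satisfied when T(x) <= 0."""
    return _norm(_diff(x, v_n)) - delta


def trust_region_gradient(x, v_n):
    """Gradient of T: (x - v_n) / ||x - v_n|| (undefined at x = v_n)."""
    w = _diff(x, v_n)
    r = _norm(w)
    return [c / r for c in w]


def distance_constraint(x, v_j, xi):
    """D(x) = xi - ||x - v_j||, satisfied when D(x) <= 0."""
    return xi - _norm(_diff(x, v_j))


def distance_gradient(x, v_j):
    """Gradient of D: -(x - v_j) / ||x - v_j|| (undefined at x = v_j)."""
    w = _diff(x, v_j)
    r = _norm(w)
    return [-c / r for c in w]


def finite_difference(fun, x, h=1e-6):
    """Central finite-difference approximation of the gradient of fun at x."""
    grad = []
    for k in range(len(x)):
        up = list(x)
        down = list(x)
        up[k] += h
        down[k] -= h
        grad.append((fun(up) - fun(down)) / (2.0 * h))
    return grad
```

Supplying such analytic gradients to a gradient-based solver avoids spending extra surrogate evaluations on numerical differentiation of the constraints.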

TRICEPS can be implemented using any type of surrogate but, as pointed out above,

it is recommended to use one that is continuously differentiable and whose gradients

are easy to calculate. One popular choice is kriging or Gaussian process modeling,

but this method is computationally intensive and requires an enormous amount of

memory in high dimensions. This study uses the simpler radial basis function (RBF)

model in Powell (1992) that has been successfully used to develop various RBF

methods (e.g., Björkman and Holmström 2000; Gutmann 2001; Regis 2011; Regis

and Shoemaker 2007; Wild et al. 2008). Fitting this model differs from the training

method typically used for RBF networks. It involves solving a linear system that

possesses good theoretical properties that can be taken advantage of to solve the

system in a stable and efficient manner.

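Given $n$ distinct points $x_1, \ldots, x_n \in \mathbb{R}^d$ and the function values $u(x_1), \ldots, u(x_n)$, where $u(x)$ could be the objective function or one of the constraint functions, TRICEPS is implemented below using an interpolant of the form

\[
s(x) = \sum_{i=1}^{n} \lambda_i \phi(\|x - x_i\|) + p(x), \quad x \in \mathbb{R}^d,
\]

where $\|\cdot\|$ is the Euclidean norm, $\lambda_i \in \mathbb{R}$ for $i = 1, \ldots, n$, $p(x)$ is a linear polynomial in $d$ variables, and $\phi$ can take one of several forms, including $\phi(r) = r^3$ (cubic), $\phi(r) = r^2 \log r$ (thin plate spline), $\phi(r) = \sqrt{r^2 + \gamma^2}$ (multiquadric) and $\phi(r) = \exp(-\gamma r^2)$ (Gaussian). Here, $\gamma$ is a parameter to be determined.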

In the numerical experiments, a cubic RBF model is used because it has been successfully used in various surrogate-based and surrogate-assisted optimization algorithms (e.g., Björkman and Holmström 2000; Gutmann 2001; Regis and Shoemaker 2004; Wild et al. 2008), including those that performed relatively well on the 124-dimensional MOPTA08 problem (Regis 2011, 2014a, b) and on problems with 200 decision variables (Regis and Shoemaker 2013b). One advantage of this cubic RBF model over the Gaussian RBF model is that it does not require a parameter. The parameter $\gamma$ in the Gaussian RBF is typically found using leave-one-out cross-validation, and this adds to the computation time for fitting the model. Moreover, recent work by Wild and Shoemaker (2011) suggests that cubic RBFs might be more suitable than Gaussian RBFs for surrogate-based optimization. Finally, in preliminary numerical experiments, some settings of the parameter $\gamma$ result in Gaussian RBF models that have many more local minima than the black-box functions that they are trying to approximate. In contrast, this did not seem to be a problem for the cubic RBF model.

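To fit the above cubic RBF model, define the matrix $\Phi \in \mathbb{R}^{n \times n}$ by $\Phi_{ij} := \phi(\|x_i - x_j\|)$, $i, j = 1, \ldots, n$. Also, define the matrix $P \in \mathbb{R}^{n \times (d+1)}$ so that its $i$th row is $[1, x_i^T]$. Now, the cubic RBF model that interpolates the points $(x_1, u(x_1)), \ldots, (x_n, u(x_n))$ is obtained by solving the system

\[
\begin{pmatrix} \Phi & P \\ P^T & 0_{(d+1) \times (d+1)} \end{pmatrix}
\begin{pmatrix} \lambda \\ c \end{pmatrix}
=
\begin{pmatrix} U \\ 0_{d+1} \end{pmatrix},
\tag{3.3}
\]

where $0_{(d+1) \times (d+1)}$ is a matrix of zeros, $U = (u(x_1), \ldots, u(x_n))^T$, $0_{d+1} \in \mathbb{R}^{d+1}$ is a vector of zeros, $\lambda = (\lambda_1, \ldots, \lambda_n)^T \in \mathbb{R}^n$ and $c = (c_1, \ldots, c_{d+1})^T \in \mathbb{R}^{d+1}$ consists of the coefficients for the linear polynomial $p(x)$. The coefficient matrix in (3.3) is invertible if and only if $\mathrm{rank}(P) = d + 1$ (Powell 1992). This condition is equivalent to having a subset of $d + 1$ affinely independent points among the points $\{x_1, \ldots, x_n\}$.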
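The fitting step described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the implementation used in this study (which was run in Matlab): the function names `fit_cubic_rbf` and `eval_cubic_rbf`, the random sample points and the test function `u` are hypothetical, and a dense call to `numpy.linalg.solve` is used for simplicity.

```python
import numpy as np

def fit_cubic_rbf(X, u):
    """Solve system (3.3) for a cubic RBF with a linear polynomial tail.

    X is an (n, d) array of distinct points, u the n function values.
    Returns (lam, c): the RBF weights and the d + 1 tail coefficients.
    """
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    n, d = X.shape
    # Phi_ij = phi(||x_i - x_j||) with the cubic form phi(r) = r^3
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = dists ** 3
    # The i-th row of P is [1, x_i^T]
    P = np.hstack([np.ones((n, 1)), X])
    if np.linalg.matrix_rank(P) < d + 1:
        raise ValueError("need d + 1 affinely independent points")
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([u, np.zeros(d + 1)])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]

def eval_cubic_rbf(x, X, lam, c):
    """Evaluate s(x) = sum_i lam_i ||x - x_i||^3 + c_0 + c_{1:}^T x."""
    x = np.asarray(x, dtype=float)
    r = np.linalg.norm(X - x, axis=1)
    return float(lam @ r ** 3 + c[0] + c[1:] @ x)

# Hypothetical data: 8 random points in [0, 1]^2 and a made-up smooth function.
rng = np.random.default_rng(0)
X = rng.random((8, 2))
u = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
lam, c = fit_cubic_rbf(X, u)
fitted = np.array([eval_cubic_rbf(x, X, lam, c) for x in X])
max_interp_error = float(np.max(np.abs(fitted - u)))
```

Since the coefficient matrix depends only on the sample points, the same matrix can be reused with a different right-hand side for each of the objective and constraint functions.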

The above RBF model is used to construct surrogates for the objective function $f(x)$ and each of the constraint functions $g_1(x), \ldots, g_m(x)$ in every generation. For a given set of data points where the objective and constraint function values are known, the same interpolation matrix is used, so fitting multiple RBF models can be done relatively efficiently even when $m$ is large by means of standard matrix factorizations.

For the local refinement step, the gradients of the RBF surrogates for the objective and constraint functions are used to solve the trust-region subproblem. The gradient of the above RBF model, with the interpolation points taken to be the previously evaluated points $v_1, \ldots, v_n$, is given by

\[
\nabla s(x) = \sum_{i=1}^{n} \lambda_i \, \phi'(\|x - v_i\|) \, \frac{x - v_i}{\|x - v_i\|} + \nabla p(x), \quad x \in \mathbb{R}^d, \ x \neq v_i \text{ for all } i.
\]
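For the cubic form, $\phi'(r) = 3r^2$, so each term of the sum reduces to $3 \lambda_i \|x - v_i\| (x - v_i)$ and no division by $\|x - v_i\|$ is needed. The following Python sketch evaluates this gradient and compares it with a central finite-difference estimate. The coefficients `lam` and `c`, the centers and the evaluation point are hypothetical values chosen only to exercise the formula, and the function names are not from the original study.

```python
import numpy as np

def cubic_rbf_value(x, centers, lam, c):
    """s(x) = sum_i lam_i ||x - v_i||^3 + c_0 + c_{1:}^T x."""
    r = np.linalg.norm(x - centers, axis=1)
    return float(lam @ r ** 3 + c[0] + c[1:] @ x)

def cubic_rbf_gradient(x, centers, lam, c):
    """grad s(x) = sum_i 3 lam_i ||x - v_i|| (x - v_i) + grad p(x)."""
    diff = x - centers                     # rows are x - v_i
    r = np.linalg.norm(diff, axis=1)
    # phi'(r_i) / r_i = 3 r_i for the cubic form
    return (3.0 * lam * r) @ diff + c[1:]

# Hypothetical model: 4 centers in R^2 with arbitrary coefficients.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
lam = np.array([0.5, -1.0, 0.25, 0.25])
c = np.array([2.0, -1.0, 3.0])
x = np.array([0.3, 0.6])

grad = cubic_rbf_gradient(x, centers, lam, c)

# Central finite differences as an independent check of the formula.
h = 1e-6
fd = np.array([
    (cubic_rbf_value(x + h * e, centers, lam, c)
     - cubic_rbf_value(x - h * e, centers, lam, c)) / (2 * h)
    for e in np.eye(2)
])
```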

3.4.1 Benchmark Constrained Optimization Problems

The proposed TRICEPS-RBF algorithm is tested on 18 well-known benchmark test problems, mostly from Mallipeddi and Suganthan (2010) and Michalewicz and Schoenauer (1996), and on a large-scale black-box optimization problem from the auto industry proposed by Don Jones (2008) at the MOPTA (Modeling and Optimization: Theory and Applications) 2008 conference. The test problems have 2 to 30 decision variables and 1 to 11 inequality constraints, and they are given in Appendix A and also in Regis (2014b). They include four 30-dimensional problems from Mallipeddi and Suganthan (2010) and many of the problems from Michalewicz and Schoenauer (1996) that only have inequality constraints or bound constraints. As explained in Regis (2014b), the constraint functions of some of these test problems are rescaled by either dividing by some positive constant or by applying a logarithmic transformation without changing the feasible region.

The automotive optimization problem from Jones (2008) is called MOPTA08 and it is available as a Fortran code at http://anjos.mgi.polymtl.ca/MOPTA2008Benchmark.html. The MOPTA08 problem has a single black-box objective function to be minimized, 124 decision variables normalized to [0, 1], and 68 black-box inequality constraints that are well normalized (Jones 2008). It is much larger and more complex than the problems typically used in surrogate-based or surrogate-assisted optimization (e.g., Basudhar et al. 2012; Egea et al. 2009; Viana et al. 2010). The goal of this problem is to determine the values of the decision variables (e.g., shape variables) that minimize the mass of the vehicle subject to performance constraints (e.g., crashworthiness, durability). The MOPTA08 problem is a relatively inexpensive model of an actual automotive design problem. It is based on kriging response surfaces fitted to a real automotive problem. Each simulation of this problem takes about 0.32 s on an Intel(R) Core(TM) i7 CPU 860 2.8 GHz desktop machine, while each simulation of the real version could take 1 to 3 days (Jones 2008). However, as in Regis (2011, 2014b), the different algorithms are compared by assuming that the simulations are expensive.

The effectiveness of the proposed TRICEPS-RBF algorithm is evaluated by

comparing it with a previously developed surrogate-assisted EP called CEP-RBF

(Regis 2014b) and also with a standard EP for constrained problems described in

Regis (2014b). Moreover, TRICEPS-RBF is compared with Stochastic Ranking Evolution Strategy (SRES) (Runarsson and Yao 2000), Scatter Search (eSS) (Egea et

al. 2007), and with an RBF-assisted EP for bound constrained problems that has

been modified to handle the inequality constraints via a penalty approach (Regis

2014b). In addition, the proposed method is compared with the ConstrLMSRBF

(Regis 2011) heuristic and with a sequential penalty derivative-free algorithm called

SDPEN (Liuzzi et al. 2010) that has a mathematically rigorous convergence guarantee. Although there are other surrogate-assisted evolutionary algorithms for constrained optimization in the literature (e.g., kriging-assisted scatter search (Egea et al.

2009) and surrogate-assisted SRES (Runarsson 2004)), the codes for these methods

are not yet publicly available.

In the results below, the TRICEPS-RBF algorithm is labeled as $(\mu + \mu)$-TRICEPS-RBF while the previously developed RBF-assisted EP from Regis (2014b) is labeled as $(\mu + \mu)$-CEP-RBF. Moreover, this paper uses the algorithm labels from Regis (2014b) such as $(\mu + \mu)$-CEP for the standard constrained EP and $(\mu + \mu)$-PenCEP-RBF for the RBF-assisted penalty-based constrained EP. In addition, an algorithm label is given a BCS suffix if the algorithm uses the BCS strategy that is meant for high-dimensional problems. As in Regis (2014b), the BCS strategy is applied only to the 124-dimensional highly constrained MOPTA08 problem.

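Table 3.1 Parameter settings for TRICEPS-RBF

Parameter           Value
$\mu$               2 or 5
$\nu$               $\min(10^3 d, 10^4)$
$\sigma_{init}$     $0.05\,\ell([a, b])$
$p_{mut}$           0.1 (with BCS) or 1 (without BCS)
$\Delta_0$          $0.05\,\ell([a, b])$
$\Delta_{\min}$     $0.0125\,\ell([a, b])$
$\Delta_{\max}$     $0.1\,\ell([a, b])$
$\eta_0$            0
$\eta_1$            0.5
$\gamma_0$          0.5
$\gamma_1$          2
$T_{fail}$          $\min(\max(p_{mut} d, 5), 30)$
$T_{success}$       2
$\epsilon_{init}$   $0.0005\,\ell([a, b])$
$T_{infeas}$        $\max(3, d)$
$\xi$               $0.0005\,\ell([a, b])$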

The number of parents in each generation for the EP methods (including the RBF-assisted ones) is $\mu = 2$ or $5$ and the initial standard deviation of the Gaussian mutations is $\sigma_{init} = 0.2\,\ell([a, b])$, where $\ell([a, b])$ is the side length of the hypercube $[a, b]$ in (3.1). For the RBF-assisted EPs (TRICEPS-RBF, CEP-RBF and PenCEP-RBF), the number of trial offspring for each parent is $\nu = \min(10^3 d, 10^4)$. Moreover, when applying the BCS strategy, the probability of perturbing a coordinate is $p_{mut} = 0.1$ as in Regis (2014b). The other parameters for the $(\mu + \mu)$-TRICEPS-RBF are summarized in Table 3.1.

All algorithms are run on Matlab 7.12 using an Intel(R) Core(TM) i7 CPU 860 2.8 GHz desktop machine. In particular, a Matlab version of SDPEN, called SDPENm, is used on the test problems. Each algorithm is run for 10 trials on the MOPTA08 problem and 30 trials on each of the other test problems. Moreover, each trial of each algorithm is run for 1,000 simulations on the MOPTA08 problem, 300 simulations on the 30-dimensional test problems, and 200 simulations on the remaining (mostly lower dimensional) problems. Each trial begins with a feasible point that is the same for all algorithms. For the MOPTA08 problem, only one feasible starting point is given in Jones (2008), so all trials use this point. This feasible point has an objective function value of 251.0706, and according to Jones (2008), any algorithm that can achieve a feasible objective function value of 228 or lower within a relatively limited number of simulations (say a few thousand simulations) is a good algorithm for this problem. Moreover, each trial of an EP (with or without RBF surrogates) begins with the feasible initial point together with a randomly generated Latin hypercube design (LHD) consisting of $d + 1$ affinely independent points, none of which are guaranteed to be feasible. The case where no feasible point is available at the beginning will be dealt with in future work. In addition, all EP algorithms (with or without RBF surrogates) use the same LHD in a given trial and their initial parent populations consist of the best points from $d + 2$ points: the $d + 1$ LHD points and the feasible starting point.
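One way to generate such an initial design can be sketched as follows. The text does not say how affine independence is enforced, so the resampling loop below is an assumption, as are the function names, the unit hypercube and the choice of `d`; the rank test on the matrix with rows $[1, x_i^T]$ is the same condition that makes the interpolation system (3.3) invertible.

```python
import numpy as np

def latin_hypercube(n_points, d, rng):
    """Random LHD in [0, 1]^d: one point per bin in every coordinate."""
    # Independent random permutation of the bin indices for each coordinate
    bins = np.column_stack([rng.permutation(n_points) for _ in range(d)])
    # Place each point uniformly at random inside its bin
    return (bins + rng.random((n_points, d))) / n_points

def affinely_independent_lhd(d, rng, max_tries=100):
    """Resample a (d + 1)-point LHD until the points are affinely independent."""
    for _ in range(max_tries):
        X = latin_hypercube(d + 1, d, rng)
        P = np.hstack([np.ones((d + 1, 1)), X])   # rows are [1, x_i^T]
        if np.linalg.matrix_rank(P) == d + 1:
            return X
    raise RuntimeError("no affinely independent design found")

rng = np.random.default_rng(1)
d = 5
lhd = affinely_independent_lhd(d, rng)
```

For a problem with bounds $[a, b]$ other than the unit hypercube, the design would be rescaled coordinate-wise before evaluation.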

The settings for the alternative methods are the same as those used in Regis (2014b). For example, for SRES (Runarsson and Yao 2000), $\mu = 8$ and $\lambda = 50$ for the regular test problems and $\mu = 20$ and $\lambda = 140$ for the MOPTA08 problem. The initial population consists of the best points from the same initial points used by the EP algorithms and the default values are used for the other parameters. For the eSS code (Egea et al. 2007), the default parameters are modified to reduce the time spent on the initialization phase. For example, the number of solutions generated by the diversificator is set to $2d$, whereas the default is $10d$. In addition, ConstrLMSRBF is initialized by the LHDs used by the RBF-assisted EPs, so it is labeled as ConstrLMSRBF-LHD. Finally, SDPEN has no user-specified parameters but it requires an initial point, which is the best point among the LHD points and the feasible starting point.

3.5.1 Performance and Data Profiles

TRICEPS-RBF is compared to other methods using performance and data profiles

(Moré and Wild 2009) instead of the average progress curves used in Regis (2011,

2014b). An average progress curve is a plot of the mean of the best feasible objective

function value obtained by an algorithm versus the number of simulations. It has the

disadvantage of providing a somewhat inaccurate picture of the comparisons when

the distributions of the best feasible objective function values are strongly skewed,

thereby making the mean inaccurate as a measure of the center of a distribution.

Performance and data profiles do not have this difficulty and they greatly simplify

the comparisons in that the analysis can be done for an entire collection of test

problems instead of doing separate analysis for each test problem.

Let $\mathcal{P}$ be the set of problems, where a given problem $p$ corresponds to a particular test problem and a particular feasible starting point. Since there are 18 test problems and 30 feasible starting points (corresponding to the 30 trials), there are $18 \times 30 = 540$ problems for the profiles. Moreover, let $\mathcal{S}$ be the set of solvers (e.g., (2+2)-TRICEPS-RBF, (2+2)-CEP-RBF, (2+2)-PenCEP-RBF, (2+2)-CEP, ConstrLMSRBF, Scatter Search, Stochastic Ranking ES, and SDPEN). For any pair $(p, s)$ of a problem $p$ and a solver $s$, the performance ratio is

\[
r_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s'} : s' \in \mathcal{S}\}},
\]

where $t_{p,s}$ is the number of simulations required to satisfy the convergence test defined below. Here, one simulation means one evaluation of the objective and each of the inequality constraint functions. Clearly, $r_{p,s} \ge 1$ for any $p \in \mathcal{P}$ and $s \in \mathcal{S}$, and the best solver for a given problem attains $r_{p,s} = 1$. By convention, $r_{p,s} = \infty$ whenever solver $s$ fails to yield a solution that satisfies the convergence test.

Now, for any solver $s \in \mathcal{S}$ and for any $\alpha \ge 1$, the performance profile of $s$ with respect to $\alpha$ is the fraction of problems where the performance ratio is at most $\alpha$, i.e.,

\[
\rho_s(\alpha) = \frac{1}{|\mathcal{P}|} \left| \{ p \in \mathcal{P} : r_{p,s} \le \alpha \} \right|.
\]

For any solver $s \in \mathcal{S}$, the performance profile curve of $s$ is the graph of the performance profiles of $s$ for a range of values of $\alpha$.
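The definitions above translate directly into a short computation. The sketch below is a minimal illustration: the function name `performance_profile` and the small table `T` of simulation counts are hypothetical, with `np.inf` marking a solver that never satisfies the convergence test on a problem.

```python
import numpy as np

def performance_profile(T, alphas):
    """Fraction of problems with performance ratio at most alpha.

    T[p, s] is the number of simulations solver s needs on problem p
    (np.inf if the convergence test is never satisfied).
    Returns rho with rho[s, k] corresponding to alphas[k].
    """
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)        # best solver on each problem
    solved = np.isfinite(best[:, 0])           # problems solved by some solver
    ratios = np.full(T.shape, np.inf)
    ratios[solved] = T[solved] / best[solved]  # inf stays inf for failed solvers
    return np.array([[np.mean(ratios[:, s] <= a) for a in alphas]
                     for s in range(T.shape[1])])

# Hypothetical counts for 4 problems (rows) and 2 solvers (columns).
T = np.array([[10.0, 20.0],
              [30.0, 15.0],
              [np.inf, 40.0],
              [25.0, 25.0]])
rho = performance_profile(T, alphas=[1.0, 2.0])
# rho[0] is [0.5, 0.75] and rho[1] is [0.75, 1.0]
```

At $\alpha = 1$ each entry is the fraction of problems on which that solver is (tied for) the fastest.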

In derivative-free, constrained expensive black-box optimization, algorithms are compared given a fixed and relatively limited number of simulations. Hence, the convergence test by Moré and Wild (2009) uses a tolerance $\tau > 0$ and the minimum feasible objective function value $f_L$ obtained by any of the solvers on a particular problem within a given number of simulations, and it checks if a feasible point $x$ obtained by a solver satisfies

\[
f(x^{(0)}) - f(x) \ge (1 - \tau)\,(f(x^{(0)}) - f_L),
\]

where $x^{(0)}$ is a feasible starting point corresponding to the given problem. That is, $x$ is required to achieve a reduction that is $1 - \tau$ times the best possible reduction $f(x^{(0)}) - f_L$. Here, feasibility is determined according to some constraint tolerance, which is set to $10^{-6}\,\ell([a, b])$ in this study. Moreover, the parameter $\tau$ is set to 0.05 in the numerical experiments.
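As an illustration of how $t_{p,s}$ could be extracted from a run, the following sketch applies the convergence test to a history of best feasible objective values. The function names and all numerical values are hypothetical; only the inequality and the default $\tau = 0.05$ come from the text.

```python
def passes_convergence_test(f_x, f_start, f_best, tau=0.05):
    """Check f(x0) - f(x) >= (1 - tau) * (f(x0) - f_L) for a feasible x."""
    return f_start - f_x >= (1.0 - tau) * (f_start - f_best)

def simulations_to_converge(best_feasible_history, f_start, f_best, tau=0.05):
    """First simulation count (1-based) at which the test holds, else inf.

    best_feasible_history[k - 1] is the best feasible objective value
    found by the solver after k simulations.
    """
    for k, f_x in enumerate(best_feasible_history, start=1):
        if passes_convergence_test(f_x, f_start, f_best, tau):
            return k
    return float("inf")

# Hypothetical run: start at 100, best value found by any solver is 20,
# so a reduction of at least 0.95 * 80 = 76 is required.
history = [100.0, 60.0, 30.0, 24.5, 23.0, 20.0]
t = simulations_to_converge(history, f_start=100.0, f_best=20.0)
t_failed = simulations_to_converge([100.0, 90.0], f_start=100.0, f_best=20.0)
```

Here `t` is 5, since 23.0 is the first value with a reduction of at least 76, and `t_failed` is infinite, which matches the convention $r_{p,s} = \infty$ for a solver that never passes the test.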

Next, given a solver $s \in \mathcal{S}$ and $\kappa > 0$, the data profile of $s$ with respect to $\kappa$ (Moré and Wild 2009) is given by

\[
d_s(\kappa) = \frac{1}{|\mathcal{P}|} \left| \left\{ p \in \mathcal{P} : \frac{t_{p,s}}{n_p + 1} \le \kappa \right\} \right|,
\]

where $t_{p,s}$ is the number of simulations required by solver $s$ to satisfy the convergence test on problem $p$ and $n_p$ is the number of decision variables in problem $p$. For any solver $s \in \mathcal{S}$, the data profile curve of $s$ is the graph of the data profiles of $s$ for a range of values of $\kappa$. For a given solver $s$ and any $\kappa > 0$, $d_s(\kappa)$ is the fraction of problems solved (i.e., problems where the solver generated a feasible point satisfying the convergence test) by $s$ within $\kappa(n_p + 1)$ simulations (equivalent to $\kappa$ simplex gradient estimates (Moré and Wild 2009)).
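A data profile can be computed in the same way as a performance profile, except that each simulation count is scaled by $n_p + 1$ instead of by the best solver's count. The sketch below uses a hypothetical function name and made-up counts and dimensions.

```python
import numpy as np

def data_profile(T, n_vars, kappas):
    """Fraction of problems solved within kappa * (n_p + 1) simulations.

    T[p, s] is the number of simulations solver s needs on problem p
    (np.inf if never solved); n_vars[p] is the dimension of problem p.
    Returns d with d[s, k] corresponding to kappas[k].
    """
    T = np.asarray(T, dtype=float)
    # Express each count in units of simplex gradient estimates
    scaled = T / (np.asarray(n_vars, dtype=float)[:, None] + 1.0)
    return np.array([[np.mean(scaled[:, s] <= k) for k in kappas]
                     for s in range(T.shape[1])])

# Hypothetical counts for 4 problems (rows) and 2 solvers (columns).
T = np.array([[10.0, 20.0],
              [30.0, 15.0],
              [np.inf, 40.0],
              [25.0, 25.0]])
n_vars = [4, 4, 9, 4]
d_prof = data_profile(T, n_vars, kappas=[3.0, 5.0])
# d_prof[0] is [0.25, 0.5] and d_prof[1] is [0.25, 1.0]
```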

Moré and Wild (2009) point out that data profiles are more suitable for comparing optimization algorithms when function evaluations are computationally expensive. This is because performance profiles can only compare algorithms at a fixed computational budget (say after 200 simulations) while data profiles can compare algorithms at different computational budgets, and this is more valuable to users in the computationally expensive setting where the short-term behavior of algorithms is more important than long-term behavior. Moreover, since the number of simulations needed to satisfy the above convergence test typically grows with the problem size, data profiles take into account the number of decision variables in the problems. On the other hand, performance profiles ignore problem size. Hence, in some cases below, only the data profiles are shown to avoid clutter in the presentation of results.

on the Benchmark Test Problems

First, TRICEPS-RBF is compared with CEP-RBF (Regis 2014b), which is a recently developed RBF-assisted EP, and also with a standard constrained EP described in Regis (2014b). Figure 3.2 shows the performance and data profile curves of (2 + 2)-TRICEPS-RBF, (5 + 5)-TRICEPS-RBF, (2 + 2)-CEP-RBF, (5 + 5)-CEP-RBF, (2 + 2)-CEP and (5 + 5)-CEP after 200 simulations on the 18 test problems. It is clear from both profiles that the RBF-assisted EPs (TRICEPS-RBF and CEP-RBF) are dramatically better than the corresponding standard EPs. Moreover, (2 + 2)-TRICEPS-RBF is better than (2 + 2)-CEP-RBF, but (5 + 5)-TRICEPS-RBF does not seem to have any advantage over (5 + 5)-CEP-RBF. However, when the set of problems is restricted to the 30-dimensional test problems from Mallipeddi and Suganthan (2010) (30 trials with different feasible starting points on C07, C08, C14, and C15), the resulting performance and data profiles after 300 simulations in Fig. 3.3 show that the two TRICEPS-RBF algorithms are now both better than the corresponding CEP-RBF algorithms. Also, the advantage of (2 + 2)-TRICEPS-RBF over (2 + 2)-CEP-RBF is more pronounced. Moreover, a similar result is obtained when the set of problems is restricted to test problems that have at least 5 inequality constraints or at least 20 decision variables (30 trials on Speed Reducer, Welded Beam, G3MOD, G7, G10, Hesse, C07, C08, C14, and C15), as can be seen from the resulting performance and data profiles in Fig. 3.4. A possible explanation for this is that, on more difficult problems (those that are high-dimensional or have many constraints), the more thorough local refinement step that uses the gradients of the RBF models of the objective and constraint functions is able to yield a more promising point than the one provided by the simpler sampling procedure in CEP-RBF. These results provide evidence that the trust-region-like local refinement step in TRICEPS-RBF yields better results than the previously developed CEP-RBF on the higher dimensional or more highly constrained problems.

[Figure: two panels comparing (2+2)- and (5+5)-TRICEPS-RBF, (2+2)- and (5+5)-CEP-RBF, and (2+2)- and (5+5)-CEP. First panel: performance profiles after 200 simulations (constraint tolerance = $10^{-6}$), $\rho_s(\alpha)$ versus the performance factor $\alpha$. Second panel: data profiles $d_s(\kappa)$.]

Fig. 3.2 Performance and data profiles for $(\mu + \mu)$-TRICEPS-RBF, $(\mu + \mu)$-CEP-RBF and $(\mu + \mu)$-CEP on all test problems

[Figure: two panels comparing (2+2)- and (5+5)-TRICEPS-RBF, (2+2)- and (5+5)-CEP-RBF, and (2+2)- and (5+5)-CEP. First panel: performance profiles, $\rho_s(\alpha)$ versus the performance factor $\alpha$. Second panel: data profiles $d_s(\kappa)$.]

Fig. 3.3 Performance and data profiles for $(\mu + \mu)$-TRICEPS-RBF, $(\mu + \mu)$-CEP-RBF and $(\mu + \mu)$-CEP on the 30-dimensional test problems

[Figure: two panels comparing (2+2)- and (5+5)-TRICEPS-RBF, (2+2)- and (5+5)-CEP-RBF, and (2+2)- and (5+5)-CEP. First panel: performance profiles after 200 simulations (constraint tolerance = $10^{-6}$), $\rho_s(\alpha)$ versus the performance factor $\alpha$. Second panel: data profiles up to 50 simplex gradients (constraint tolerance = $10^{-6}$), $d_s(\kappa)$.]

Fig. 3.4 Performance and data profiles for $(\mu + \mu)$-TRICEPS-RBF, $(\mu + \mu)$-CEP-RBF and $(\mu + \mu)$-CEP on test problems with at least 20 decision variables or with at least 5 inequality constraints


Methods on the Benchmark Test Problems

The (2 + 2)-TRICEPS-RBF is also compared with alternative methods including

(2 + 2)-PenCEP-RBF, ConstrLMSRBF, Scatter Search, Stochastic Ranking ES, and

SDPEN. The performance profiles on the test problems after 200 simulations and

the data profiles up to a maximum number of simulations equivalent to 50 simplex

gradients are shown in Fig. 3.5. It is clear from the performance and data profiles

that the (2 + 2)-TRICEPS-RBF is generally much better than other alternatives,

including the mathematically rigorous sequential penalty derivative-free algorithm

SDPEN that is published in a prestigious optimization journal. However, to be fair,

Scatter Search, Stochastic Ranking ES and SDPEN do not use surrogates and it would

be interesting to see how their performance would change if they are also combined

with surrogates.

To get some idea of how the different algorithms compare on individual test

problems, figures in Appendix B show the data profiles on some of the test problems.

For example, Figs. 3.10, 3.11, 3.12, 3.13 and 3.14 show the data profiles on some

problems where the (2 + 2)-TRICEPS-RBF performed very well in comparison with

the alternatives. However, although the (2 + 2)-TRICEPS-RBF is generally much

better than the alternatives on the test problems, Figs. 3.15, 3.16 and 3.17 show some

test problems where its performance is not as good as some of the alternatives.

on the MOPTA08 Automotive Application Problem

Table 3.2 provides the statistics on the best feasible objective function value (over

10 trials) obtained by TRICEPS-RBF and the alternative methods after 1,000 simulations of the MOPTA08 problem. Some of these results are taken from Regis

(2014a, b). It is clear from this table that the (2 + 2)-TRICEPS-RBF-BCS is the best

among the different algorithms used on the MOPTA08 problem. In particular, the

(2 + 2)-TRICEPS-RBF-BCS is an improvement over the (2 + 2)-CEP-RBF-BCS

and it is better than ConstrLMSRBF-LHD-BCS (Regis 2011) on the MOPTA08

problem. Moreover, (2 + 2)-TRICEPS-RBF (without the BCS strategy) is a substantial improvement over (2 + 2)-CEP-RBF (without BCS). This suggests that the

trust-region-like local refinement step in TRICEPS-RBF is also helpful for the larger

and more complex MOPTA08 problem. As before, it is of interest to note that the

(2 + 2)-TRICEPS-RBF-BCS, (2 + 2)-TRICEPS-RBF, and (2 + 2)-CEP-RBF-BCS

performed much better than SDPEN, which is a sequential penalty derivative-free

algorithm with a mathematically rigorous convergence guarantee.

[Figure: two panels comparing (2+2)-TRICEPS-RBF, (2+2)-CEP-RBF, (2+2)-PenCEP-RBF, ConstrLMSRBF, Scatter Search, Stochastic Ranking ES and SDPENm. First panel: performance profiles, $\rho_s(\alpha)$ versus the performance factor $\alpha$. Second panel: data profiles up to 50 simplex gradients (constraint tolerance = $10^{-6}$), $d_s(\kappa)$.]

Fig. 3.5 Performance and data profiles for $(\mu + \mu)$-TRICEPS-RBF and alternative methods on all test problems

Table 3.2 Statistics on best feasible objective function value after 1,000 simulations of the MOPTA08 problem (10 trials)

Algorithm                 Best     Median   Worst    Mean     Std Error
(2+2)-TRICEPS-RBF         227.27   228.18   228.76   228.20   0.14
(2+2)-TRICEPS-RBF-BCS     225.48   226.19   227.42   226.43   0.22
(2+2)-CEP-RBF             231.18   238.62   251.07   240.13   2.10
(2+2)-CEP-RBF-BCS         226.76   228.51   228.92   228.16   0.23
(2+2)-PenCEP-RBF          251.07   251.07   251.07   251.07   0.00
(2+2)-PenCEP-RBF-BCS      246.96   247.84   248.99   247.84   0.22
(2+2)-CEP                 251.07   251.07   251.07   251.07   0.00
Stochastic Ranking ES     251.07   251.07   251.07   251.07   0.00
Scatter Search (eSS)      251.07   251.07   251.07   251.07   0.00
ConstrLMSRBF-LHD-BCS      225.75   227.30   228.64   227.27   0.26
SDPEN                     231.77   231.77   231.77   231.77   0

As can be seen from Sect. 3.3.2, TRICEPS depends on many user-specified parameters. This section analyzes how sensitive TRICEPS-RBF is to some of these parameters. In particular, the (2 + 2)-TRICEPS-RBF is run on the same test problems by varying the values of the parameters $\nu$ (the number of trial offspring generated for each parent), $\sigma_{init}$ (the initial standard deviation of the Gaussian mutations), and $\Delta_{init}$ (the initial trust-region radius). As before, the (2 + 2)-TRICEPS-RBF using a given set of parameters is run for 30 trials for each test problem. The sensitivity analysis is performed on only three of the parameters because a full analysis of all parameters is computationally prohibitive, given that the use of surrogates in TRICEPS-RBF incurs substantial computing cost.

Figure 3.6 shows the data profiles of (2 + 2)-TRICEPS-RBF with $\nu = \min(1{,}000d, 10^4)$ (default), $\nu = \min(500d, 10^4)$, and $\nu = \min(100d, 10^4)$. Note that there does not seem to be much difference in performance between the default and $\nu = \min(500d, 10^4)$, but there was some deterioration in performance for the much smaller value $\nu = \min(100d, 10^4)$. This indicates that (2 + 2)-TRICEPS-RBF is not very sensitive to $\nu$ when it is reasonably large. This is somewhat expected because when the value of $\nu$ is large enough to generate trial offspring that adequately sample the neighborhood of a parent solution, adding more trial offspring is not expected to improve performance. However, a much smaller value of $\nu$ could result in a less thorough search for promising offspring for each parent solution, thereby resulting in diminished performance.

Figure 3.7 shows the data profiles of (2 + 2)-TRICEPS-RBF with $\sigma_{init} = 0.05\,\ell([a, b])$ (default), $\sigma_{init} = 0.1\,\ell([a, b])$, and $\sigma_{init} = 0.2\,\ell([a, b])$ on all test problems. (Recall that for all test problems, $[a, b] = [0, 1]^d$ so $\ell([a, b]) = 1$.) Moreover, Fig. 3.8 shows the data profiles of the same algorithms on the problems with at least 5 inequality constraints. These profiles suggest that (2 + 2)-TRICEPS-RBF is somewhat sensitive to the choice of the initial standard deviation of the Gaussian mutations $\sigma_{init}$, with the default setting (the smallest $\sigma_{init}$) being the best choice among the three settings for the test problems used. A possible explanation for this is that, in a constrained problem, it makes sense to be conservative with the mutations when starting from a feasible point. Larger values of $\sigma_{init}$ are more likely to generate points that violate one of the constraints, especially when there are many constraints.

Fig. 3.6 Data profiles for (2 + 2)-TRICEPS-RBF with different values of $\nu$ on all test problems

Fig. 3.7 Data profiles for (2 + 2)-TRICEPS-RBF with different values of $\sigma_{init}$ (0.05, 0.1 and 0.2) on all test problems

Fig. 3.8 Data profiles for (2 + 2)-TRICEPS-RBF with different values of $\sigma_{init}$ (0.05, 0.1 and 0.2) on problems with at least 5 inequality constraints

Fig. 3.9 Data profiles for (2 + 2)-TRICEPS-RBF with different values of $\Delta_{init}$ (0.05, 0.1 and 0.2) on all test problems

Finally, Fig. 3.9 shows the data profiles of (2 + 2)-TRICEPS-RBF with $\Delta_{init} = 0.05\,\ell([a, b])$ (default), $\Delta_{init} = 0.1\,\ell([a, b])$, and $\Delta_{init} = 0.2\,\ell([a, b])$. Note that the (2 + 2)-TRICEPS-RBF appears to be somewhat sensitive to the choice of the initial trust-region radius $\Delta_{init}$. In particular, on the test problems used, a larger initial trust-region radius than the default value seems to result in better performance, possibly because it allows for larger steps.

3.6 Conclusions

This paper developed the TRICEPS algorithm, which is a surrogate-assisted Evolutionary Programming (EP) algorithm for computationally expensive constrained optimization problems having only black-box inequality constraints and bound constraints. It is meant to be an improvement over CEP-RBF (Regis 2014b) in that the algorithm performs a trust-region-like local refinement step at the end of every generation, where it finds a minimizer of the surrogate model of the objective within a trust region subject to surrogate inequality constraints with a small margin and subject to some distance requirement from previously evaluated points. Moreover, TRICEPS is implemented using a cubic RBF with a linear polynomial tail and a gradient-based algorithm is used to solve the trust-region-like subproblem. TRICEPS-RBF and CEP-RBF are among the few surrogate-assisted EAs that use surrogates to approximate the constraints and that have been successfully applied to a problem that is considered large-scale in surrogate-based or surrogate-assisted optimization. TRICEPS-RBF is compared with alternatives, including CEP-RBF and the mathematically rigorous sequential penalty derivative-free algorithm SDPEN (Liuzzi et al. 2010), on 18 well-known benchmark problems and on the MOPTA08 automotive application with 124 decision variables and 68 black-box inequality constraints, which is much larger than the typical problem used in this area.

TRICEPS-RBF and the alternatives are compared on the 18 test problems using performance and data profiles (Moré and Wild 2009) instead of average progress curves such as the ones used in Regis (2014b). Moreover, the algorithms are compared in terms of the best feasible objective function value obtained after only 1,000 simulations on the MOPTA08 problem. The profile curves show that TRICEPS-RBF is an improvement over CEP-RBF on problems that are either high-dimensional or highly constrained. Moreover, the results confirm the previous findings in Regis (2014b) that using an RBF surrogate can dramatically improve the performance of a constrained EP. Furthermore, the (2 + 2)-TRICEPS-RBF algorithm is substantially and consistently better than the SDPEN algorithm, an RBF-assisted penalty-based EP, Stochastic Ranking Evolution Strategy (SRES) and Scatter Search (eSS) on the problems in this study when the algorithms are given a very limited computational budget. In addition, TRICEPS-RBF is also better than the ConstrLMSRBF-LHD heuristic (Regis 2011). Finally, sensitivity analyses of TRICEPS-RBF to some of the user-specified parameters on the test problems suggest that it is somewhat sensitive to the choice of the initial standard deviation of the Gaussian mutations and the initial trust-region radius, but much less so to the number of trial offspring for each parent solution.

On the MOPTA08 problem, (2 + 2)-TRICEPS-RBF-BCS is better than both

(2+2)-CEP-RBF-BCS (Regis 2014b) and ConstrLMSRBF-LHD-BCS (Regis 2011)

while requiring much less computational overhead than ConstrLMSRBF-LHD-BCS.

Moreover, both (2 + 2)-TRICEPS-RBF-BCS and (2 + 2)-CEP-RBF-BCS are much

better than the other alternatives, including SDPEN, on the MOPTA08 problem.

In addition, the results also confirm the previous finding in Regis (2014b) that the

BCS strategy (Regis 2011, 2014b) is very promising for high-dimensional problems

and highly constrained problems. Overall, TRICEPS-RBF is very promising for

computationally expensive constrained black-box optimization and it helps push the

frontier of surrogate-assisted constrained evolutionary optimization.

Acknowledgments Special thanks to Don Jones from General Motors Product Development for

proposing the MOPTA08 benchmark problem and for making a Fortran simulation code for this

problem publicly available. I would also like to thank Prof. Thomas Philip Runarsson for the Matlab

code for Stochastic Ranking Evolution Strategy, Dr. Julio Banga's research group for the Matlab

code for Scatter Search, and Drs. Mallipeddi and Suganthan for the codes that implement the

benchmark problems from the CEC 2010 competition.

Appendix

A. Test Problems

There are four engineering design test problems: Welded Beam Design Problem

(WB4) (Coello Coello and Mezura-Montes 2002; Hedar 2004), Pressure Vessel

Design Problem (PVD4) (Coello Coello and Mezura-Montes 2002; Hedar 2004),

Gas Transmission Compressor Design Problem (GTCD) (Beightler and Phillips

1976), and Speed Reducer Design for small aircraft engine (SR7) (Floudas and

Pardalos 1990). Nine of the test problems are from the well-known constrained optimization test problems in Michalewicz and Schoenauer (1996). These are labeled

G2, G3MOD, G4, G5MOD, G6, G7, G8, G9, and G10. The G3MOD and G5MOD

problems are obtained from G3 and G5 by replacing all equality constraints with

inequality constraints. The Hesse problem is from Hesse (1973). Finally, four of the

test problems are the 30-dimensional versions of the problems C07, C08, C14 and

C15 from Mallipeddi and Suganthan (2010).

80

R.G. Regis

dividing by a positive constant or by applying a logarithmic transformation without

changing the feasible region. A similar modification of the constraint functions was

performed by Jones (2008) on the MOPTA08 problem so that the constraints are well normalized. The plog transformation used in some of the constraints was introduced

in Regis and Shoemaker (2013a) and it is defined by

\[
\operatorname{plog}(x) =
\begin{cases}
\log(1 + x) & \text{if } x \ge 0 \\
-\log(1 - x) & \text{if } x < 0
\end{cases}
\]

where log is the natural logarithm. The mathematical properties of this transformation

are discussed in Regis and Shoemaker (2013a). In particular, it is strictly increasing,

symmetric with respect to the origin, and it tones down extremely high or extremely

negative function values without changing the location of the local minima and

maxima.
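The transformation is easy to state in code. The following is a minimal sketch, not taken from Regis and Shoemaker (2013a); the function name `plog` mirrors the text, and the use of `math.log1p` is an implementation choice made here for numerical accuracy near zero.

```python
import math


def plog(x: float) -> float:
    """Signed logarithmic transform: log(1 + x) for x >= 0, -log(1 - x) for x < 0."""
    if x >= 0:
        return math.log1p(x)      # log(1 + x)
    return -math.log1p(-x)        # -log(1 - x), mirror image of the branch above


if __name__ == "__main__":
    # Large magnitudes are compressed while sign and ordering are preserved.
    for value in (-1e6, -1.0, 0.0, 1.0, 1e6):
        print(value, plog(value))
```

Because the function is strictly increasing and maps 0 to 0, a constraint of the form g(x) <= 0 and its transformed version plog(g(x)) <= 0 describe the same feasible region.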

Welded Beam (WB4) (Coello Coello and Mezura-Montes 2002; Hedar 2004):

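\[
f(x) = 1.10471\,x_1^2 x_2 + 0.04811\,x_3 x_4 (14.0 + x_2)
\]

s.t.

\[
\begin{aligned}
& P = 6{,}000,\quad L = 14,\quad E = 30 \times 10^6,\quad G = 12 \times 10^6 \\
& t_{\max} = 13{,}600,\quad s_{\max} = 30{,}000,\quad x_{\max} = 10,\quad d_{\max} = 0.25 \\
& M = P\,(L + x_2/2),\quad R = \sqrt{0.25\,\bigl(x_2^2 + (x_1 + x_3)^2\bigr)} \\
& J = 2\sqrt{2}\,x_1 x_2 \bigl(x_2^2/12 + 0.25\,(x_1 + x_3)^2\bigr) \\
& P_c = \frac{4.013\,E}{6L^2}\,x_3 x_4^3 \left(1 - \frac{0.25\,x_3 \sqrt{E/G}}{L}\right) \\
& t_1 = P/(\sqrt{2}\,x_1 x_2),\quad t_2 = MR/J \\
& t = \sqrt{t_1^2 + t_1 t_2 x_2/R + t_2^2} \\
& s = 6PL/(x_4 x_3^2) \\
& d = 4PL^3/(E x_4 x_3^3) \\
& g_1(x) = (t - t_{\max})/t_{\max} \le 0 \\
& g_2(x) = (s - s_{\max})/s_{\max} \le 0 \\
& g_3(x) = (x_1 - x_4)/x_{\max} \le 0 \\
& g_4(x) = \bigl(0.10471\,x_1^2 + 0.04811\,x_3 x_4 (14.0 + x_2) - 5.0\bigr)/5.0 \le 0 \\
& g_5(x) = (d - d_{\max})/d_{\max} \le 0 \\
& g_6(x) = (P - P_c)/P \le 0 \\
& 0.125 \le x_1 \le 10,\quad 0.1 \le x_i \le 10 \ \text{for } i = 2, 3, 4
\end{aligned}
\]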


Pressure Vessel Design (PVD4) (Coello Coello and Mezura-Montes 2002; Hedar

2004):

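\[
f(x) = 0.6224\,x_1 x_3 x_4 + 1.7781\,x_2 x_3^2 + 3.1661\,x_1^2 x_4 + 19.84\,x_1^2 x_3
\]

s.t.

\[
\begin{aligned}
& g_1(x) = -x_1 + 0.0193\,x_3 \le 0 \\
& g_2(x) = -x_2 + 0.00954\,x_3 \le 0 \\
& g_3(x) = \operatorname{plog}\bigl(-\pi x_3^2 x_4 - \tfrac{4}{3}\pi x_3^3 + 1{,}296{,}000\bigr) \le 0 \\
& 0 \le x_1, x_2 \le 1,\quad 0 \le x_3 \le 50,\quad 0 \le x_4 \le 240
\end{aligned}
\]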
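As an illustration of how such a test problem can be evaluated, the sketch below codes PVD4 in Python. It assumes the usual form of the pressure vessel problem, in which g1 and g2 require the two thicknesses to be at least 0.0193 x3 and 0.00954 x3, and g3 requires the volume term pi x3^2 x4 + (4/3) pi x3^3 to be at least 1,296,000. The names `pvd4` and `is_feasible` and the default tolerance of 1e-6 are choices made here, not part of the original code used in the chapter.

```python
import math
from typing import List, Sequence, Tuple


def plog(x: float) -> float:
    """Signed logarithmic transform used to tone down large constraint values."""
    return math.log1p(x) if x >= 0 else -math.log1p(-x)


def pvd4(x: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (objective, constraint values) for PVD4; feasible when all g_i <= 0."""
    x1, x2, x3, x4 = x
    f = (0.6224 * x1 * x3 * x4 + 1.7781 * x2 * x3 ** 2
         + 3.1661 * x1 ** 2 * x4 + 19.84 * x1 ** 2 * x3)
    g = [
        -x1 + 0.0193 * x3,
        -x2 + 0.00954 * x3,
        plog(-math.pi * x3 ** 2 * x4 - (4.0 / 3.0) * math.pi * x3 ** 3 + 1296000.0),
    ]
    return f, g


def is_feasible(g: Sequence[float], tol: float = 1e-6) -> bool:
    """Feasibility check with a small tolerance (an assumed value)."""
    return all(gi <= tol for gi in g)


if __name__ == "__main__":
    f_val, g_val = pvd4((1.0, 1.0, 50.0, 100.0))
    print(f_val, g_val, is_feasible(g_val))
```

A surrogate-assisted method would call a function like `pvd4` once per expensive evaluation and fit its surrogates to the returned objective and constraint values.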

Speed Reducer (SR7) (Floudas and Pardalos 1990):

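\[
f(x) = 0.7854\,x_1 x_2^2 A - 1.508\,x_1 B + 7.477\,C + 0.7854\,D
\]

where

\[
\begin{aligned}
& A = 3.3333\,x_3^2 + 14.9334\,x_3 - 43.0934 \\
& B = x_6^2 + x_7^2 \\
& C = x_6^3 + x_7^3 \\
& D = x_4 x_6^2 + x_5 x_7^2
\end{aligned}
\]

s.t.

\[
\begin{aligned}
& g_1(x) = (27 - x_1 x_2^2 x_3)/27 \le 0 \\
& g_2(x) = (397.5 - x_1 x_2^2 x_3^2)/397.5 \le 0 \\
& g_3(x) = \bigl(1.93 - (x_2 x_6^4 x_3)/x_4^3\bigr)/1.93 \le 0 \\
& g_4(x) = \bigl(1.93 - (x_2 x_7^4 x_3)/x_5^3\bigr)/1.93 \le 0 \\
& A_1 = \bigl[(745\,x_4/(x_2 x_3))^2 + 16.91 \times 10^6\bigr]^{0.5} \\
& B_1 = 0.1\,x_6^3 \\
& g_5(x) = \bigl((A_1/B_1) - 1100\bigr)/1100 \le 0 \\
& A_2 = \bigl[(745\,x_5/(x_2 x_3))^2 + 157.5 \times 10^6\bigr]^{0.5} \\
& B_2 = 0.1\,x_7^3 \\
& g_6(x) = \bigl((A_2/B_2) - 850\bigr)/850 \le 0 \\
& g_7(x) = (x_2 x_3 - 40)/40 \le 0 \\
& g_8(x) = \bigl(5 - (x_1/x_2)\bigr)/5 \le 0 \\
& g_9(x) = \bigl((x_1/x_2) - 12\bigr)/12 \le 0 \\
& g_{10}(x) = (1.9 + 1.5\,x_6 - x_4)/1.9 \le 0 \\
& g_{11}(x) = (1.9 + 1.1\,x_7 - x_5)/1.9 \le 0 \\
& 2.6 \le x_1 \le 3.6,\quad 0.7 \le x_2 \le 0.8,\quad 17 \le x_3 \le 28 \\
& 7.3 \le x_4, x_5 \le 8.3,\quad 2.9 \le x_6 \le 3.9,\quad 5.0 \le x_7 \le 5.5
\end{aligned}
\]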

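Gas Transmission Compressor Design (GTCD) (Beightler and Phillips 1976):

\[
f(x) = (8.61 \times 10^5)\,x_1^{1/2} x_2 x_3^{-2/3} x_4^{-1/2} + (3.69 \times 10^4)\,x_3 + (7.72 \times 10^8)\,x_1^{-1} x_2^{0.219} - (765.43 \times 10^6)\,x_1^{-1}
\]

s.t.

\[
20 \le x_1 \le 50,\quad 1 \le x_2 \le 10,\quad 20 \le x_3 \le 50,\quad 0.1 \le x_4 \le 60
\]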

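G2 (Michalewicz and Schoenauer 1996):

\[
f(x) = -\left| \frac{\sum_{i=1}^{d} \cos^4(x_i) - 2 \prod_{i=1}^{d} \cos^2(x_i)}{\sqrt{\sum_{i=1}^{d} i\,x_i^2}} \right|
\]

s.t.

\[
\begin{aligned}
& g_1(x) = \operatorname{plog}\Bigl(-\prod_{i=1}^{d} x_i + 0.75\Bigr) \Big/ \operatorname{plog}(10^d) \le 0 \\
& g_2(x) = \Bigl(\sum_{i=1}^{d} x_i - 7.5d\Bigr) \Big/ (2.5d) \le 0 \\
& 0 \le x_i \le 10 \ \text{for } i = 1, 2, \ldots, d
\end{aligned}
\]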

G3MOD (Michalewicz and Schoenauer 1996) (d = 20):

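\[
f(x) = -\operatorname{plog}\Bigl((\sqrt{d})^{d} \prod_{i=1}^{d} x_i\Bigr)
\]

s.t.

\[
\begin{aligned}
& g_1(x) = \sum_{i=1}^{d} x_i^2 - 1 \le 0 \\
& 0 \le x_i \le 1 \ \text{for } i = 1, 2, \ldots, d
\end{aligned}
\]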

G4 (Michalewicz and Schoenauer 1996):

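\[
f(x) = 5.3578547\,x_3^2 + 0.8356891\,x_1 x_5 + 37.293239\,x_1 - 40792.141
\]

s.t.

\[
\begin{aligned}
& u = 85.334407 + 0.0056858\,x_2 x_5 + 0.0006262\,x_1 x_4 - 0.0022053\,x_3 x_5 \\
& g_1(x) = -u \le 0 \\
& g_2(x) = u - 92 \le 0 \\
& v = 80.51249 + 0.0071317\,x_2 x_5 + 0.0029955\,x_1 x_2 + 0.0021813\,x_3^2 \\
& g_3(x) = -v + 90 \le 0 \\
& g_4(x) = v - 110 \le 0 \\
& w = 9.300961 + 0.0047026\,x_3 x_5 + 0.0012547\,x_1 x_3 + 0.0019085\,x_3 x_4 \\
& g_5(x) = -w + 20 \le 0 \\
& g_6(x) = w - 25 \le 0 \\
& 78 \le x_1 \le 102,\quad 33 \le x_2 \le 45,\quad 27 \le x_i \le 45 \ \text{for } i = 3, 4, 5
\end{aligned}
\]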

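G5MOD (Michalewicz and Schoenauer 1996):

\[
f(x) = 3x_1 + 10^{-6} x_1^3 + 2x_2 + (2 \times 10^{-6}/3)\,x_2^3
\]

s.t.

\[
\begin{aligned}
& g_1(x) = x_3 - x_4 - 0.55 \le 0 \\
& g_2(x) = x_4 - x_3 - 0.55 \le 0 \\
& g_3(x) = 1{,}000 \sin(-x_3 - 0.25) + 1{,}000 \sin(-x_4 - 0.25) + 894.8 - x_1 \le 0 \\
& g_4(x) = 1{,}000 \sin(x_3 - 0.25) + 1{,}000 \sin(x_3 - x_4 - 0.25) + 894.8 - x_2 \le 0 \\
& g_5(x) = 1{,}000 \sin(x_4 - 0.25) + 1{,}000 \sin(x_4 - x_3 - 0.25) + 1294.8 \le 0 \\
& 0 \le x_1, x_2 \le 1{,}200,\quad -0.55 \le x_3, x_4 \le 0.55
\end{aligned}
\]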

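G6 (Michalewicz and Schoenauer 1996):

\[
f(x) = (x_1 - 10)^3 + (x_2 - 20)^3
\]

s.t.

\[
\begin{aligned}
& g_1(x) = \bigl(-(x_1 - 5)^2 - (x_2 - 5)^2 + 100\bigr)/100 \le 0 \\
& g_2(x) = \bigl((x_1 - 6)^2 + (x_2 - 5)^2 - 82.81\bigr)/82.81 \le 0 \\
& 13 \le x_1 \le 100,\quad 0 \le x_2 \le 100
\end{aligned}
\]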

G7 (Michalewicz and Schoenauer 1996):

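\[
\begin{aligned}
f(x) = {} & x_1^2 + x_2^2 + x_1 x_2 - 14x_1 - 16x_2 + (x_3 - 10)^2 + 4(x_4 - 5)^2 \\
& + (x_5 - 3)^2 + 2(x_6 - 1)^2 + 5x_7^2 + 7(x_8 - 11)^2 \\
& + 2(x_9 - 10)^2 + (x_{10} - 7)^2 + 45
\end{aligned}
\]

s.t.

\[
\begin{aligned}
& g_1(x) = (4x_1 + 5x_2 - 3x_7 + 9x_8 - 105)/105 \le 0 \\
& g_2(x) = (10x_1 - 8x_2 - 17x_7 + 2x_8)/370 \le 0 \\
& g_3(x) = (-8x_1 + 2x_2 + 5x_9 - 2x_{10} - 12)/158 \le 0 \\
& g_4(x) = \bigl(3(x_1 - 2)^2 + 4(x_2 - 3)^2 + 2x_3^2 - 7x_4 - 120\bigr)/1258 \le 0 \\
& g_6(x) = \bigl(0.5(x_1 - 8)^2 + 2(x_2 - 4)^2 + 3x_5^2 - x_6 - 30\bigr)/834 \le 0 \\
& g_7(x) = \bigl(x_1^2 + 2(x_2 - 2)^2 - 2x_1 x_2 + 14x_5 - 6x_6\bigr)/788 \le 0 \\
& g_8(x) = \bigl(-3x_1 + 6x_2 + 12(x_9 - 8)^2 - 7x_{10}\bigr)/4048 \le 0 \\
& -10 \le x_i \le 10 \ \text{for } i = 1, 2, \ldots, 10
\end{aligned}
\]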

G8 (Michalewicz and Schoenauer 1996):

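\[
f(x) = -\frac{\sin^3(2\pi x_1)\,\sin(2\pi x_2)}{x_1^3 (x_1 + x_2)}
\]

s.t.

\[
\begin{aligned}
& g_1(x) = x_1^2 - x_2 + 1 \le 0 \\
& g_2(x) = 1 - x_1 + (x_2 - 4)^2 \le 0 \\
& 0 \le x_1, x_2 \le 10
\end{aligned}
\]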

G9 (Michalewicz and Schoenauer 1996):

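\[
\begin{aligned}
f(x) = {} & (x_1 - 10)^2 + 5(x_2 - 12)^2 + x_3^4 + 3(x_4 - 11)^2 \\
& + 10x_5^6 + 7x_6^2 + x_7^4 - 4x_6 x_7 - 10x_6 - 8x_7
\end{aligned}
\]

s.t.

\[
\begin{aligned}
& g_1(x) = (2x_1^2 + 3x_2^4 + x_3 + 4x_4^2 + 5x_5 - 127)/127 \le 0 \\
& g_2(x) = (7x_1 + 3x_2 + 10x_3^2 + x_4 - x_5 - 282)/282 \le 0 \\
& g_3(x) = (23x_1 + x_2^2 + 6x_6^2 - 8x_7 - 196)/196 \le 0 \\
& g_4(x) = 4x_1^2 + x_2^2 - 3x_1 x_2 + 2x_3^2 + 5x_6 - 11x_7 \le 0 \\
& -10 \le x_i \le 10 \ \text{for } i = 1, \ldots, 7
\end{aligned}
\]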

G10 (Michalewicz and Schoenauer 1996):

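\[
f(x) = x_1 + x_2 + x_3
\]

s.t.

\[
\begin{aligned}
& g_1(x) = -1 + 0.0025(x_4 + x_6) \le 0 \\
& g_2(x) = -1 + 0.0025(-x_4 + x_5 + x_7) \le 0 \\
& g_3(x) = -1 + 0.01(-x_5 + x_8) \le 0 \\
& g_4(x) = \operatorname{plog}(100x_1 - x_1 x_6 + 833.33252\,x_4 - 83333.333) \le 0 \\
& g_5(x) = \operatorname{plog}(x_2 x_4 - x_2 x_7 - 1{,}250\,x_4 + 1{,}250\,x_5) \le 0 \\
& g_6(x) = \operatorname{plog}(x_3 x_5 - x_3 x_8 - 2{,}500\,x_5 + 1{,}250{,}000) \le 0 \\
& 10^2 \le x_1 \le 10^4,\quad 10^3 \le x_2, x_3 \le 10^4, \\
& 10 \le x_i \le 10^3 \ \text{for } i = 4, 5, \ldots, 8
\end{aligned}
\]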

Hesse (1973):

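\[
f(x) = -25(x_1 - 2)^2 - (x_2 - 2)^2 - (x_3 - 1)^2 - (x_4 - 4)^2 - (x_5 - 1)^2 - (x_6 - 4)^2
\]

s.t.

\[
\begin{aligned}
& g_1(x) = (2 - x_1 - x_2)/2 \le 0 \\
& g_2(x) = (x_1 + x_2 - 6)/6 \le 0 \\
& g_3(x) = (-x_1 + x_2 - 2)/2 \le 0 \\
& g_4(x) = (x_1 - 3x_2 - 2)/2 \le 0 \\
& g_5(x) = \bigl(4 - (x_3 - 3)^2 - x_4\bigr)/4 \le 0 \\
& g_6(x) = \bigl(4 - (x_5 - 3)^2 - x_6\bigr)/4 \le 0 \\
& 0 \le x_1 \le 5,\quad 0 \le x_2 \le 4,\quad 1 \le x_3 \le 5 \\
& 0 \le x_4 \le 6,\quad 1 \le x_5 \le 5,\quad 0 \le x_6 \le 10
\end{aligned}
\]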

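C07 (Mallipeddi and Suganthan 2010):

\[
f(x) = \sum_{i=1}^{d-1} \bigl[100(z_i^2 - z_{i+1})^2 + (z_i - 1)^2\bigr]
\]

where z and y are obtained from x as given in the code by Mallipeddi and Suganthan (2010)

s.t.

\[
\begin{aligned}
& g_1(x) = 0.5 - \exp\Bigl(-0.1\sqrt{\tfrac{1}{d}\sum_{i=1}^{d} y_i^2}\Bigr) - 3\exp\Bigl(\tfrac{1}{d}\sum_{i=1}^{d} \cos(0.1\,y_i)\Bigr) + \exp(1) \le 0 \\
& -140 \le x_i \le 140,\quad i = 1, \ldots, d
\end{aligned}
\]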

C08 (Mallipeddi and Suganthan 2010):

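\[
f(x) = \sum_{i=1}^{d-1} \bigl[100(z_i^2 - z_{i+1})^2 + (z_i - 1)^2\bigr]
\]

where z and y are obtained from x as given in the code by Mallipeddi and Suganthan (2010)

s.t.

\[
\begin{aligned}
& g_1(x) = 0.5 - \exp\Bigl(-0.1\sqrt{\tfrac{1}{d}\sum_{i=1}^{d} y_i^2}\Bigr) - 3\exp\Bigl(\tfrac{1}{d}\sum_{i=1}^{d} \cos(0.1\,y_i)\Bigr) + \exp(1) \le 0 \\
& -140 \le x_i \le 140,\quad i = 1, \ldots, d
\end{aligned}
\]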

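C14 (Mallipeddi and Suganthan 2010):

\[
f(x) = \sum_{i=1}^{d-1} \bigl[100(z_i^2 - z_{i+1})^2 + (z_i - 1)^2\bigr]
\]

where z and y are obtained from x as given in the code by Mallipeddi and Suganthan (2010)

s.t.

\[
\begin{aligned}
& g_1(x) = \sum_{i=1}^{d} \bigl(-y_i \cos(\sqrt{|y_i|})\bigr) - d \le 0 \\
& g_2(x) = \sum_{i=1}^{d} \bigl(y_i \cos(\sqrt{|y_i|})\bigr) - d \le 0 \\
& g_3(x) = \sum_{i=1}^{d} \bigl(y_i \sin(\sqrt{|y_i|})\bigr) - 10d \le 0 \\
& -1{,}000 \le x_i \le 1{,}000,\quad i = 1, \ldots, d
\end{aligned}
\]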

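C15 (Mallipeddi and Suganthan 2010):

\[
f(x) = \sum_{i=1}^{d-1} \bigl[100(z_i^2 - z_{i+1})^2 + (z_i - 1)^2\bigr]
\]

where z and y are obtained from x as given in the code by Mallipeddi and Suganthan (2010)

s.t.

\[
\begin{aligned}
& g_1(x) = \sum_{i=1}^{d} \bigl(-y_i \cos(\sqrt{|y_i|})\bigr) - d \le 0 \\
& g_2(x) = \sum_{i=1}^{d} \bigl(y_i \cos(\sqrt{|y_i|})\bigr) - d \le 0 \\
& g_3(x) = \sum_{i=1}^{d} \bigl(y_i \sin(\sqrt{|y_i|})\bigr) - 10d \le 0 \\
& -1{,}000 \le x_i \le 1{,}000,\quad i = 1, \ldots, d
\end{aligned}
\]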

Figures 3.10, 3.11, 3.12, 3.13, 3.14, 3.15, 3.16 and 3.17.

[Each of Figs. 3.10 to 3.17 is a data profile plot (constraint tolerance = 10^{-6}): the vertical axis shows the data profile d_s on a scale from 0 to 1 and the horizontal axis shows the number of simplex gradients. The seven curves in each plot are (2+2)-TRICEPS-RBF, (2+2)-CEP-RBF, (2+2)-PenCEP-RBF, ConstrLMSRBF, Scatter Search, Stochastic Ranking ES, and SDPENm.]

Fig. 3.10 Data profiles for (μ + μ)-TRICEPS-RBF and alternative methods on the G3MOD problem

Fig. 3.11 Data profiles for (μ + μ)-TRICEPS-RBF and alternative methods on the C07 problem

Fig. 3.12 Data profiles for (μ + μ)-TRICEPS-RBF and alternative methods on the Hesse problem

Fig. 3.13 Data profiles for (μ + μ)-TRICEPS-RBF and alternative methods on the G8 problem

Fig. 3.14 Data profiles for (μ + μ)-TRICEPS-RBF and alternative methods on the Speed Reducer (SR7) problem

Fig. 3.15 Data profiles for (μ + μ)-TRICEPS-RBF and alternative methods on the C08 problem

Fig. 3.16 Data profiles for (μ + μ)-TRICEPS-RBF and alternative methods on the G9 problem

Fig. 3.17 Data profiles for (μ + μ)-TRICEPS-RBF and alternative methods on the Pressure Vessel Design (PVD4) problem

References

Araujo MC, Wanner EF, Guimarães FG, Takahashi RHC (2009) Constrained optimization based on quadratic approximations in genetic algorithms. In: Mezura-Montes E (ed) Constraint-handling in evolutionary computation. Studies in Computational Intelligence, vol 198, Chapter 9. Springer, Berlin, pp 193–217

Arnold DV, Hansen NA (2012) A (1 + 1)-CMA-ES for constrained optimisation. In: 2012 genetic and evolutionary computation conference (GECCO 2012), Philadelphia, July 2012. ACM Press, pp 297–304

Basudhar A, Dribusch C, Lacaze S, Missoum S (2012) Constrained efficient global optimization with support vector machines. Struct Multidiscip Optim 46(2):201–221

Beightler CS, Phillips DT (1976) Applied geometric programming. Wiley, New York

Björkman M, Holmström K (2000) Global optimization of costly nonconvex functions using radial basis functions. Optim Eng 1(4):373–397

Coello Coello CA (2012) Constraint-handling techniques used with evolutionary algorithms. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2012) companion, pp 849–872

Coello Coello CA, Mezura-Montes E (2002) Constraint-handling in genetic algorithms through the use of dominance-based tournament selection. Adv Eng Inform 16(3):193–203

Coello Coello CA, Landa-Becerra R (2004) Efficient evolutionary optimization through the use of a cultural algorithm. Eng Optim 36(2):219–236

Datta R, Deb K (2013) Individual penalty based constraint handling using a hybrid bi-objective and penalty function approach. In: 2013 IEEE congress on evolutionary computation (CEC 2013), Cancún, México, June 2013. IEEE Press, pp 2720–2727

Deb K, Datta R (2013) A bi-objective constrained optimization algorithm using a hybrid evolutionary and penalty function approach. Eng Optim 45(5):503–527

Egea JA, Rodriguez-Fernandez M, Banga JR, Martí R (2007) Scatter search for chemical and bioprocess optimization. J Glob Optim 37(3):481–503

Egea JA, Vazquez E, Banga JR, Martí R (2009) Improved scatter search for the global optimization of computationally expensive dynamic models. J Glob Optim 43(2–3):175–190

Emmerich MTM, Giannakoglou K, Naujoks B (2006) Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. IEEE Trans Evol Comput 10(4):421–439

Emmerich M, Giotis A, Özdemir MM, Bäck T, Giannakoglou K (2002) Metamodel-assisted evolution strategies. In: Parallel problem solving from nature VII, pp 362–370

Floudas CA, Pardalos PM (1990) A collection of test problems for constrained global optimization algorithms. Springer, Berlin

Gieseke F, Kramer O (2013) Towards non-linear constraint estimation for expensive optimization. In: Esparcia-Alcázar AI, Isabel A (eds) Evoapplications. Lecture Notes in Computer Science, vol 7835. Springer, Berlin, pp 459–468

Gutmann H-M (2001) A radial basis function method for global optimization. J Glob Optim 19(3):201–227

Hedar A (2004) Studies on metaheuristics for continuous global optimization problems. PhD thesis, Kyoto University, Japan

Hesse R (1973) A heuristic search procedure for estimating a global solution of nonconvex programming problems. Oper Res 21:1267–1280

Isaacs A, Ray T, Smith W (2007) An evolutionary algorithm with spatially distributed surrogates for multiobjective optimization. In: Randall M et al (eds) Proceedings of the 3rd Australian conference on progress in artificial life (ACAL 2007). Lecture Notes in Computer Science, vol 4828. Springer, pp 257–268

Isaacs A, Ray T, Smith W (2009) Multiobjective design optimization using multiple adaptive spatially distributed surrogates. Int J Prod Dev 9(1–3):188–217

Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol Comput 1(2):61–70

Jin Y, Olhofer M, Sendhoff B (2002) A framework for evolutionary optimization with approximate fitness functions. IEEE Trans Evol Comput 6(5):481–494

Jones DR (2008) Large-scale multi-disciplinary mass optimization in the auto industry. In: MOPTA 2008, modeling and optimization: theory and applications conference, Ontario, Canada, August 2008

Kazemi M, Wang GG, Rahnamayan S, Gupta K (2011) Metamodel-based optimization for problems with expensive objective and constraint functions. ASME J Mech Des 133(1):014505

Kramer O, Barthelmes A, Rudolph G (2009) Surrogate constraint functions for CMA evolution strategies. In: Mertsching B, Hund M, Aziz MZ (eds) KI, Lecture Notes in Computer Science, vol 5803. Springer, pp 169–176

Liuzzi G, Lucidi S, Sciandrone M (2010) Sequential penalty derivative-free methods for nonlinear constrained optimization. SIAM J Optim 20(5):2614–2635

Loshchilov I, Schoenauer M, Sebag M (2012) Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2012), pp 321–328

Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010 competition on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore

Mezura-Montes E, Coello Coello CA (2005) A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Trans Evol Comput 9(1):1–17

Mezura-Montes E, Coello Coello CA (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1(4):173–194

simple evolutionary algorithm. In: Proceedings of the 15th IEEE international conference on tools with artificial intelligence, November 2003, pp 149–156

Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32

Montaño AA, Coello Coello CA, Mezura-Montes E (2012) Multi-objective airfoil shape optimization using a multiple-surrogate approach. In: Proceedings of the IEEE congress on evolutionary computation 2012. IEEE Press, pp 1188–1195

Moré J, Wild S (2009) Benchmarking derivative-free optimization algorithms. SIAM J Optim 20(1):172–191

Mugunthan P, Shoemaker CA, Regis RG (2005) Comparison of function approximation, heuristic and derivative-based methods for automatic calibration of computationally expensive groundwater bioremediation models. Water Resour Res 41:W11427

Ong YS, Nair PB, Keane AJ (2003) Evolutionary optimization of computationally expensive problems via surrogate modeling. AIAA J 41(4):687–696

Parno MD, Hemker T, Fowler KR (2012) Applicability of surrogates to improve efficiency of particle swarm optimization for simulation-based problems. Eng Optim 44(5):521–535

Powell MJD (1992) The theory of radial basis function approximation in 1990. In: Light W (ed) Advances in numerical analysis, volume 2: wavelets, subdivision algorithms and radial basis functions. Oxford University Press, Oxford, pp 105–210

Powell MJD (1994) A direct search optimization method that models the objective and constraint functions by linear interpolation. In: Gomez S, Hennart JP (eds) Advances in optimization and numerical analysis. Kluwer, Dordrecht, pp 51–67

Regis RG (2011) Stochastic radial basis function algorithms for large-scale optimization involving expensive black-box objective and constraint functions. Comput Oper Res 38(5):837–853

Regis RG (2014a) Constrained optimization by radial basis function interpolation for high-dimensional expensive black-box problems with infeasible initial points. Eng Optim 46(2):218–243

Regis RG (2014b) Evolutionary programming for high-dimensional constrained expensive black-box optimization using radial basis functions. IEEE Trans Evol Comput 18(3):326–347

Regis RG (2014c) Particle swarm with radial basis function surrogates for expensive black-box optimization. J Comput Sci 5(1):12–23

Regis RG, Shoemaker CA (2004) Local function approximation in evolutionary algorithms for costly black box optimization. IEEE Trans Evol Comput 8(5):490–505

Regis RG, Shoemaker CA (2007) A stochastic radial basis function method for the global optimization of expensive functions. INFORMS J Comput 19(4):497–509

Regis RG, Shoemaker CA (2013a) A quasi-multistart framework for global optimization of expensive functions using response surface models. J Glob Optim 56(4):1719–1753

Regis RG, Shoemaker CA (2013b) Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization. Eng Optim 45(5):529–555

Runarsson TP (2004) Constrained evolutionary optimization by approximate ranking and surrogate models. In: Parallel problem solving from nature VII (PPSN-2004), Lecture Notes in Computer Science, vol 3242. Springer, pp 401–410

Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294

Shi L, Rasheed K (2008) ASAGA: an adaptive surrogate-assisted genetic algorithm. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2008), pp 1049–1056

Takahama T, Sakai S (2012) Efficient constrained optimization by the epsilon constrained rank-based differential evolution. In: Proceedings of 2012 IEEE congress on evolutionary computation (CEC2012), Brisbane, pp 62–69

Tessema B, Yen GG (2006) A self adaptive penalty function based algorithm for constrained optimization. In: IEEE congress on evolutionary computation (CEC 2006), pp 246–253

Tolson BA, Shoemaker CA (2007) Dynamically dimensioned search algorithm for computationally efficient watershed model calibration. Water Resour Res 43:W01413

Viana FAC, Haftka RT, Watson LT (2010) Why not run the efficient global optimization algorithm with multiple surrogates? In: 51st AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics, and materials conference. Orlando

Wang Y, Cai Z (2012) Combining multiobjective optimization with differential evolution to solve constrained optimization problems. IEEE Trans Evol Comput 16(1):117–134

Wanner EF, Guimarães FG, Takahashi RH, Saldanha RR, Fleming PJ (2005) Constraint quadratic approximation operator for treating equality constraints with genetic algorithms. In: 2005 IEEE congress on evolutionary computation (CEC 2005), vol 3. IEEE Press, Edinburgh, pp 2255–2262

Wild SM, Shoemaker CA (2011) Global convergence of radial basis function trust region derivative-free algorithms. SIAM J Optim 21(3):761–781

Wild SM, Regis RG, Shoemaker CA (2008) ORBIT: optimization by radial basis function interpolation in trust-regions. SIAM J Sci Comput 30(6):3197–3219

Zhou Z, Ong YS, Nair PB, Keane AJ, Lum KY (2007) Combining global and local surrogate models to accelerate evolutionary optimization. IEEE Trans Syst, Man, Cybern Part C: Appl Rev 37(1):66–76

Chapter 4

Ephemeral Resource Constraints in Optimization

Richard Allmendinger and Joshua Knowles

Abstract Constraints in optimization traditionally come in two types familiar to most readers: hard and soft. Hard constraints delineate absolutely between feasible and infeasible solutions, whereas soft constraints essentially specify additional objectives. In this chapter, we describe a third type of constraint, much less familiar and only investigated recently, which we call ephemeral resource constraints (ERCs). ERCs differ from the other constraints in three major ways. (i) The constraints are dynamic or temporary (i.e., may be active or not active), and occur only during optimization; they do not affect the feasibility of final solutions. (ii) Solutions violating the constraints cannot be evaluated on the objective function; in fact, that is their main defining property. (iii) The constraints that are active are usually a function of previous solutions evaluated, bringing in a time-linkage aspect to the optimization. We explain with examples how these constraints arise in real-world optimization problems, especially when solution evaluation depends on experimental processes (i.e., in closed-loop optimization). Using a theoretical model based on Markov chains, we describe the effects of these constraints on evolutionary search, e.g., drift effects on the search direction. Next, a number of strategies for coping with ERCs are summarized, and evidence for their robustness is provided. In the final section, we look to the future and consider the many open questions in this new area.

Keywords Closed-loop optimization · Constrained optimization · Dynamic optimization · Evolutionary computation · Instrument setup optimization · Optimization

R. Allmendinger (B)

Department of Biochemical Engineering, University College London,

Torrington Place, London WC1E 7JE, UK

e-mail: r.allmendinger@ucl.ac.uk

URL: http://www.ucl.ac.uk/ucberal

J. Knowles

University of Manchester, School of Computer Science, Oxford Road,

Manchester M13 9PL, UK

e-mail: j.knowles@manchester.ac.uk

URL: http://www.cs.man.ac.uk/jknowles

Springer India 2015

R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,

Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_4


4.1 Introduction

In this chapter, we discuss a new and broad class of constraint that departs quite

strongly from those considered usually in optimization. While typical or standard

constraints place limits on the feasible region (hard constraints), or suggest strong

preferences on solutions (soft constraints), the constraints we describe here instead

pose limits on which solutions in a search space are evaluable. That is to say, when

a solution violates one or more of these constraints, it is not possible to evaluate

that solution on the objective function, even though it may later turn out to be a

good solution to the problem, and one that is feasible in the normal sense. The type

of constraint we discuss here is called an ephemeral resource constraint (or ERC),

and we have introduced it in a number of recent papers (Allmendinger and Knowles

2010, 2011, 2013).

As the name suggests, ERCs arise only temporarily or dynamically during optimization (i.e., are ephemeral) and come about due to limitations on the resources

needed to evaluate (or construct) a solution. As we will explain in detail below, the

motivation for these constraints comes about from considering (mainly though not

exclusively) problems sometimes referred to as closed-loop optimization problems.1

In a closed-loop problem, candidate solutions are evaluated experimentally, and may

need to be realized physically, chemically, or in some other tangible way, thus requiring the use or availability of resources. From this reliance on resources (which may

be limited) it follows that candidate solutions cannot be guaranteed to be evaluable

(realizable) at all times during optimization. Thus, both evaluable and non-evaluable

solutions can coexist in the search space, and the boundaries between them can be

described as dynamic (or ephemeral) constraints.

These constraints, and the non-evaluability of solutions, are not rare in practical applications; for example, Finkel and Kelley (2009) list eight references where solutions were non-evaluable, and more examples are given in Knowles (2009), Allmendinger (2012), as well as later in this chapter. We are also aware from personal communication that such resourcing issues have been faced by Schwefel (in his famous jet nozzle optimization experiments from the 70s) (Schwefel 1968; Klockgether and Schwefel 1970) and others, even if not always reported in the literature. Since closed-loop problems are quite varied (see, e.g., Schwefel (1968), Klockgether and Schwefel (1970), Judson and Rabitz (1992), Shir (2008), Caschera et al. (2010), Small et al. (2011), Vaidyanathan et al. (2003), O'Hagan et al. (2005, 2007), Thompson (1996), Herdy (1997), Knowles (2009) and the tutorials Shir and Bäck (2009), Bäck et al. (2010)) and are growing in importance in a number of domains (e.g., high-throughput automated science, as in Bedau (2010)), it seems timely to consider the effects these resourcing issues (ERCs) can have on optimization performance, and this has been our objective in recent work.

In this chapter, our aims are threefold. First, we wish to summarize the terminology and framework for describing ERCs reported in earlier papers (Sects. 4.2 and 4.3). Second, we wish to augment this earlier work with a theoretical study that considers the fundamental effects of ERCs on simple evolutionary algorithms (Sect. 4.4). Third, we evaluate some of the methods we have proposed for handling ERCs and consider how these can be developed further (Sects. 4.5–4.8).

1 When an EA is used, closed-loop optimization may also be referred to as evolutionary experimentation (Rechenberg 2000) or experimental evolution.

4.2 Ephemeral Resource-Constrained Optimization Problems (ERCOPs) in Overview

Ephemeral Resource-Constrained Optimization Problems (ERCOPs) are best seen as

standard constrained or unconstrained optimization problems2 augmented with one

or more resource constraints, which cause some candidate solutions to be temporarily

non-evaluable. Figure 4.1 shows the loop of an optimization process in which candidate solutions are designed or specified on a computer, but realized and/or evaluated

ex-silico. This is the main type of setup in which ERCs arise, although they can arise

even when computer simulations are used for evaluation too. The resources required

to evaluate solutions (such as equipment, operators, consumables) might run out,

break down or be unavailable, e.g., as a function of time, or previous actions taken

(or both).

[Figure 4.1: a loop between two blocks. The optimization algorithm (running on a computer) performs ranking, selection and variation, prepares performance statistics, and sends the decision variables (genotype) of a solution x to the second block. The second block, physical experimentation or expensive computer simulations, prototypes x (e.g., mix drugs, adjust instrument, run simulation), assesses the phenotype of x (e.g., try drug combination in vivo, run sample through instrument, aggregate simulated data), and returns noisy measurements of quality f(x) to the optimization algorithm.]

Fig. 4.1 Schematic of closed-loop optimization. The genotype of a candidate solution x is generated on the computer but its phenotype is experimentally prototyped. The quality or fitness f(x) of a solution may be obtained experimentally too and thus may be subject to measurement errors (noise)

The main job in defining ERCOPs, and simulating them so that they can be studied,

is to specify what happens when a candidate solution cannot be evaluated. In a real

situation, when a candidate solution proposed by the optimization algorithm is found

to be non-evaluable, an operator or scientist within the loop (if there is such a person)

may notice, and can choose to ignore this solution (to miss it out). This may seem

to be an adequate solution, but there are several issues here. We need to consider

at what time it is known that a solution cannot be evaluated, for how long it can

remain non-evaluable, whether new resources can be requested in order to fulfill the

optimizer's request to evaluate that solution, whether the optimizer is informed that

the solution could not be evaluated, and so on.

If we are able to specify these things, then we can also imagine a range of possible

(automated) remedial actions that the optimizer can take when it is informed about

non-evaluable solutions. It could automatically order more resources, it could wait

(stopping all solution evaluations until the non-evaluable one is again evaluable), it

could carry on and assign the non-evaluable solution a dummy value (or no value

at all), or it could place the non-evaluable solution in a queue to be evaluated later on.
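To make the difference between these remedial actions concrete, the following is a minimal sketch of an evaluation loop that applies one of four policies when a solution is non-evaluable. Everything in it is hypothetical and chosen for illustration: the function name `evaluate_with_policy`, the policy labels ("wait", "queue", "dummy", "skip"), the assumption that each attempt costs exactly one clock tick, and the toy evaluability rule in the usage example. Ordering more resources is not modeled.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple


def evaluate_with_policy(
    candidates: Iterable[Hashable],
    f: Callable[[Hashable], float],
    is_evaluable: Callable[[Hashable, int], bool],
    policy: str = "queue",
    dummy: float = float("-inf"),
    max_steps: int = 100,
) -> Tuple[Dict[Hashable, Optional[float]], int]:
    """Evaluate candidates one per time step; return (results, elapsed time steps)."""
    pending = deque(candidates)
    results: Dict[Hashable, Optional[float]] = {}
    t = 0
    while pending and t < max_steps:
        x = pending.popleft()
        if is_evaluable(x, t):
            results[x] = f(x)
        elif policy == "wait":
            pending.appendleft(x)   # block everything until x is evaluable again
        elif policy == "queue":
            pending.append(x)       # carry on, retry x after the other candidates
        elif policy == "dummy":
            results[x] = dummy      # carry on with a penalizing placeholder value
        elif policy == "skip":
            results[x] = None       # carry on with no value at all
        else:
            raise ValueError(f"unknown policy: {policy}")
        t += 1                      # every attempt costs one unit of global time
    return results, t


if __name__ == "__main__":
    # Toy rule (an assumption): candidate 1 cannot be evaluated before time step 3.
    def rule(x, t):
        return x != 1 or t >= 3

    def square(x):
        return x * x

    for name in ("wait", "queue", "dummy", "skip"):
        print(name, evaluate_with_policy([0, 1, 2], square, rule, policy=name))
```

In this toy run, waiting costs more clock ticks than queuing, while the dummy and skip policies finish soonest but never obtain the true value of the blocked candidate, which is the kind of trade-off the framework needs to be able to express.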

All these types of responses need to be possible within the framework that we use

to describe ERCOPs. To keep things as general and flexible as possible, our ERCOP

framework consists of just two essentials: (1) ERCs are functions of a number of

(visible or hidden) variables which determine when they are switched on and (2) the

optimizer has access to a number of additional functions that allow it to operate in

a well-defined manner when a solution is non-evaluable. To achieve this, and to be

able to talk meaningfully about the performance of optimizers, we also embed the

optimization process in a global clock, so that every action is synchronized and its

time cost can be accounted for. In the following, we put these essentials in a more

mathematical form.

ERCOPs can be defined generically, as follows:

\[
\text{maximize } y = f(x) \quad \text{subject to } x \in X
\]

where $x = (x_1, \ldots, x_l)$ is a solution vector, $X$ the feasible search space, $f$ the objective function, and with the additional side-condition (only relevant during optimization) that for optimization timesteps $t = 1, 2, \ldots$,

\[
y_t =
\begin{cases}
f(x_t) & \text{if } x_t \in E(\sigma_t) \subseteq X \\
\text{null} & \text{otherwise,}
\end{cases}
\]

where $E(\sigma_t)$ represents a set of evaluable solutions (or evaluable region) at time step $t$. The set $E(\sigma_t)$ changes over time as a function of a set of problem-specific parameters $\sigma_t$. In an ERCOP, information about how the resource constraints should evolve over time, and depend on resource levels, random events, and so on, is encoded in $\sigma_t$ and $E(\sigma_t)$.

The purpose of an ERCOP, as defined mathematically above, is to simulate real

experimental optimization scenarios, in particular the way that non-evaluable solutions arise (i.e., as a function of parameters such as time, search history, or costs),

and how they are to be handled.
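The side-condition in this definition can be simulated in a few lines. The sketch below is illustrative only: the OneMax objective, the rule that solutions whose first bit is 0 are non-evaluable during time steps 2 to 4, and all function names are assumptions made here, not the ERCs studied in the chapter. The point is that the objective itself never changes; only the set of solutions that may be evaluated at a given time step does.

```python
from typing import Callable, Optional, Tuple

Bits = Tuple[int, ...]


def onemax(x: Bits) -> int:
    """Static objective f: number of ones in the bit string."""
    return sum(x)


def make_evaluable_region(t_start: int, t_end: int) -> Callable[[Bits, int], bool]:
    """Hypothetical evaluable region: while t_start <= t < t_end,
    only solutions with x[0] == 1 can be evaluated."""
    def in_region(x: Bits, t: int) -> bool:
        constraint_active = t_start <= t < t_end
        return (not constraint_active) or x[0] == 1
    return in_region


def ercop_value(x: Bits, t: int, f: Callable[[Bits], int],
                in_region: Callable[[Bits, int], bool]) -> Optional[int]:
    """y_t = f(x_t) if x_t is evaluable at time t, and None (null) otherwise."""
    return f(x) if in_region(x, t) else None


if __name__ == "__main__":
    region = make_evaluable_region(2, 5)
    proposals = [(0, 1, 1), (1, 0, 0), (0, 1, 1), (0, 0, 1), (1, 1, 1), (0, 1, 1)]
    for t, x in enumerate(proposals, start=1):
        print(t, x, ercop_value(x, t, onemax, region))
```

Note that the same solution, (0, 1, 1), receives a fitness at some time steps and null at others, which is exactly what distinguishes non-evaluability from infeasibility.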

From the definition, we can now see there are three major differences between

ERCOPs and other constrained and dynamic optimization problems:

• While the objective function f (thus also the global optimum) is static and does not change over time in a standard ERCOP, ERCs are dynamic or temporary (i.e., may be active or not active), and occur only during optimization; they do not affect the feasibility of final solutions. This feature makes ERCOPs materially different from traditional dynamic optimization problems (Branke 2001) because the objective space in ERCOPs does not change over time and thus the optimal solution does not need to be tracked.

• Compared to standard soft and hard constraints (Michalewicz and Schoenauer 1996; Nocedal and Wright 1999; Coello 2002) as well as dynamic constraints (Nguyen 2010), the meaning of ERCs is different: a solution x that violates an ERC at time t is not infeasible but non-evaluable at time step t. That is, the experiment that is associated with x cannot be conducted, thus causing the fitness of solution x at time t to be undefined (or null).

• The constraints that are active are usually a function of previous solutions evaluated, bringing in a time-linkage aspect to the optimization (see, e.g., commitment relaxation ERCs in Sect. 4.3.1). Moreover, time in an ERCOP can be seen as the simulated time defined by the real closed-loop experimental problem that is to be simulated. Hence, time may refer not only to function evaluations of single solutions, as is the case in standard optimization problems, but also, e.g., to real time units (e.g., seconds) or cost units (e.g., pounds). Although we find an interesting parallel with some work on online (dynamic) optimization problems (Borodin and El-Yaniv 1998; Bosman and Poutré 2007), which exhibits time-linkage too, there are clear and important differences to our problem: most importantly, the aim in online (dynamic) optimization is to improve a cumulative score over some period of time, whereas ours is to find a single optimal (and ultimate) solution.

Despite these core differences between ERCs and ERCOPs and other related areas, as explained above, we believe that techniques for coping with ERCs, or inspiration for their design, can carry over from these areas into our work. For a more formal problem definition of ERCOPs please refer to Allmendinger (2012) and Allmendinger and Knowles (2013).


Ephemeral resource constraints arise in practical optimization problems for a number of different reasons: periodic availabilities of equipment or people; consumable resources that may run out; commitments to particular configurations due

to the cost of changing a configuration; and random breakdowns or other random

events. Considering these distinct reasons, we have in earlier work (Allmendinger

2012; Allmendinger and Knowles 2013) defined a number of fundamental classes

of ERCs, which we now describe. Technically, the constraints differ in how they are

triggered (switched on and off), and how they relate to the search space and other

basic properties. Before we summarize these details for three different ERC types,

we first set out some defining terms common to all ERC types: the constraint time

frame, the activation period, and the constraint schema.

Constraint time frame
The constraint time frame, ctf, of $ERC_i$ is the set of time steps $\{t \mid t_{ctf}^{start}(ERC_i) \le t < t_{ctf}^{end}(ERC_i)\}$, where t represents some counter unit (e.g., function evaluations of solutions). The constraint $ERC_i$ may be active only during the ctf, i.e., $E(t) \subseteq X$, $t \in ctf$, and not outside of the ctf, i.e., $E(t) = X$, $t \notin ctf$. The periods of time $0 \le t < t_{ctf}^{start}$ and $t_{ctf}^{end} \le t \le T$ (T is the total optimization time) are the preparation period and recovery period, respectively (see Fig. 4.2).

Activation period
The activation period $k(ERC_i)$ of $ERC_i$, $k \in \mathbb{Z}^+$, is the number of counter units for which that ERC remains active once it is switched on.

Constraint schema
For convenience we define the evaluable search region $E(t)$ by a set of constraint schemata $H(ERC_i)$ into which solutions have to fall in order to be evaluable. For instance, if we are dealing with a binary search space, or $X \subseteq \{0, 1\}^l$, and an ERC is associated with a schema $H = (*1**0)$, then a solution is deemed evaluable only if it has a 1-bit and a 0-bit at positions 2 and 5, respectively; the wildcard symbol $*$ gives a bit position the freedom to take on any possible value, i.e., 0 and 1 in the binary case. In non-discrete spaces, H might restrict solution parameters to lie within or out of certain parameter value ranges rather than to take specific parameter values. Two general properties of a schema are its order $o(H)$ and length $l(H)$, representing the number of defined bit positions and the distance between the first and last defined bit position, respectively (Reeves and Rowe 2003); for the above example we have $o(H) = 2$ and $l(H) = 3$.
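The schema notions above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a schema is written as a string over 0, 1 and a wildcard character `*`; the function names are hypothetical and not part of the original formulation. The example schema places a 1 at position 2 and a 0 at position 5, as in the text.

```python
WILDCARD = "*"  # assumed wildcard character for this sketch


def matches(schema: str, solution: str) -> bool:
    """True if the bit string falls into the schema (wildcards match anything)."""
    if len(schema) != len(solution):
        raise ValueError("schema and solution must have equal length")
    return all(s == WILDCARD or s == x for s, x in zip(schema, solution))


def order(schema: str) -> int:
    """Number of defined (non-wildcard) positions, o(H)."""
    return sum(1 for s in schema if s != WILDCARD)


def defining_length(schema: str) -> int:
    """Distance between the first and last defined position, l(H)."""
    defined = [i for i, s in enumerate(schema) if s != WILDCARD]
    return defined[-1] - defined[0] if defined else 0


H = "*1**0"
print(matches(H, "01100"), matches(H, "01101"))  # True False
print(order(H), defining_length(H))              # 2 3
```

A solution that fails `matches` while the corresponding ERC is active would be non-evaluable at that time step rather than infeasible.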

Fig. 4.2 An illustration of how the available optimization time T can be divided into the preparation period $0 \le t < t_{ctf}^{start}$, the constraint time frame $t_{ctf}^{start} \le t < t_{ctf}^{end}$, and the recovery period $t_{ctf}^{end} \le t \le T$

A commitment relaxation ERC commits (forces) an optimizer to a specific variable

value combination (i.e., constraint schema) for some (variable) period of time whenever it uses this particular combination. Forcing a variable or linked combination

of variables to be fixed for some time models real-world problems involving (large)

change-over costs, such as a cleaning step or a component replacement. We refer to

the period of time during which some variable(s) setting (or schema) H is forbidden

from changing as an epoch, and denote its duration by V . We define the activation

period $k(j)$, $0 \le k(j) \le V$, to be the duration of the period of time we have to commit

to a particular setting H during the jth epoch. Figure 4.3 illustrates the partition of

the optimization time into epochs, and a possible distribution of activation periods.

Imagine the six epochs illustrated by the figure to represent six working days, each

consisting of V = 9 h (assuming working hours to be from 8 am to 5 pm). The

limitation that causes the commitment relaxation ERC to arise can be:

In an optimization problem involving the selection of instrument settings, the configuration,

b, once set, cannot be changed during the remainder of the working day.

In the above example, the constraint schema H represents the parameter combination that corresponds to instrument configuration b. The length of an activation

period is bounded by $0 \le k(j) \le 9$. For instance, imagine we select instrument
configuration b in the middle of the day, say at 1 pm, as indicated by epoch j = 1 in
the figure. This will activate the ERC for a period of k(1) = 4 (= 5 pm − 1 pm) hours

(indicated by the dashed part). Activating the ERC later, earlier, or not at all during

a working day changes k(j) accordingly.

We denote commitment relaxation ERCs by commRelaxERC($t_{ctf}^{start}$, $t_{ctf}^{end}$, V, H).

Fig. 4.3 An illustration of how a commitment relaxation ERC may partition the optimization time into epochs of length V, and how it may be potentially activated. The activation period k(j) during the jth epoch is represented by the dashed part

An extension to this simple commitment relaxation ERC is to maintain not only one but several commitment relaxation ERCs with different constraint schemata $H_i$.

In this case, we need to consider three aspects: (i) a solution is non-evaluable if it

violates at least one ERC, (ii) a repaired solution has to satisfy all activated ERCs

and not only the ones that were violated, and (iii) it needs to be checked whether a

repaired solution activates an ERC that was not activated before. This extension will

be considered later in Sect. 4.6.

A periodic ERC models the availability of a specific resource, represented by

a constraint schema H, at regular time intervals. That is, the ERC is activated

every P time steps (period length) for an activation period of exactly k time steps

(see Fig. 4.4). As the ERC models the availability of resources, an individual has

to be a member of H during the activation period. An example of a periodic

ERC is:

In an optimization problem requiring skilled engineers to operate instruments, on Mondays,

only engineer $eng_i$ is available.

In the above example, the activation period is k = 1 (assuming a time step is a day),

the period length is P = 7 (i.e., a week), and the constraint schema H represents the

parameter combination that corresponds to the instruments (or their settings) operated

by engineer $eng_i$. We denote periodic ERCs by perERC($t_{ctf}^{start}$, $t_{ctf}^{end}$, k, P, H).
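As a rough illustration of when a periodic ERC constrains the search, the following Python sketch checks whether a given time step lies inside an activation period. It assumes that the first activation begins at the start of the constraint time frame and that each activation covers the first k steps of every period; the function and parameter names are hypothetical.

```python
def periodic_erc_active(t: int, t_start: int, t_end: int, k: int, period: int) -> bool:
    """True if a periodic ERC constrains time step t.

    Assumption of this sketch: activations begin at t_start and then every
    `period` steps, each lasting k steps, within the frame [t_start, t_end).
    """
    if not (t_start <= t < t_end):
        return False  # preparation or recovery period: never constrained
    return (t - t_start) % period < k


# Weekly example from the text: k = 1 day, P = 7 days, here over four weeks.
print([t for t in range(30) if periodic_erc_active(t, 0, 28, 1, 7)])  # [0, 7, 14, 21]
```

While the function returns True, only solutions that are members of the schema H would be evaluable.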

The last type of ERCs we cover here are commitment composite ERCs. This ERC type

is slightly more complex than the other two types because it combines several real-world limitations. A commitment composite ERC occurs when some variables of a

candidate solution define a composite that requires resources to be locally available

(e.g., in a cache) in order for the solution as a whole to be realized and/or evaluated.

Fig. 4.4 An illustration of a periodic ERC perERC($t_{ctf}^{start}$, $t_{ctf}^{end}$, k, P, H), which is activated every P time steps for an activation period of always k time steps

We use the notion of schemata to describe the resource-requiring composite part of a solution. For instance, a schema $H_\#$ that carries the symbol # at positions 3, 4, 5, 11, and 12 indicates that
bit positions 3, 4, 5, 11, and 12 define a composite; we refer to the bit positions

denoted by # as the composite-defining bits, and the order o(H# ) to be the number

of composite-defining bits in the schema (we refer to H# as the high-level constraint

schema). Here, the composite-defining bits are static, and form a part of the ERC

problem definition.

When a solution is to be evaluated, we must look at the composite-defining bits

of its genotype and compare them to a local cache of composites. Each composite in

the cache is indexed by a bit-string of the same length as the order of the high-level

constraint schema. If there is a match, the solution can be evaluated. Otherwise, the

solution may not be evaluated at the current time step.

We define the cache to be made up of a number of storage cells, #SC. Typically,

the number of storage cells is smaller than the space of possible composites, which is

$2^{o(H_\#)}$ in a binary search space. A composite available in a storage cell may be used
in the evaluation of more than one solution: each composite may be used up to RN
(reuse number) times and has a shelf life of SL time steps, and we assume $SL \ge RN$.

Finally, the composites available in the cache at time t are a function of previous

purchase orders made, and a fixed time lag TL between a purchase being made

and its arriving. When composites arrive at a particular time, they are immediately

put in a storage cell (and any existing composite in that cell is discarded); which

storage cell is selected is defined either at the time of purchase or at the time of

arrival.

To make the constraint more realistic we associate costs of $c_{order}$ and $c_{time\_step}$

units with each submitted composite order and time step, respectively. The available

budget, which cannot be exceeded, is denoted by C. Any composite can be purchased

as often as desired, as long as we are within the budget. Figure 4.5 gives a visual

example of the ERC.

An example of a commitment composite ERC is:

In an optimization problem involving the selection/design of vehicle parts least harmful to

pedestrians in case of a crash, we wish among others to identify the most suitable configuration for the tyres of the vehicle. A tyre is made of several parameters, such as size, thickness,

and rubber material. Upon defining these parameters, we order the tyres, which is associated

with a fixed cost of 500 and a delivery period of 3 days. To allow for a valid assessment of

tyres, a set of tyres can be involved in at most five crash test trials, and can be kept in storage

for not more than 1 month. The storage itself is limited in size to 10 sets of tyres. Every day

of crash testing involves a fixed charge of 3,000 including things like labor, rent of venue,

and electricity.

In this example a composite is a tyre and the composite-defining bits are the

variables defining a tyre. Ordering tyres is associated with a time lag of TL = 3

(assuming a time step is one day), and tyres have a reuse number of RN = 5 and

a shelf life of SL = 30 (assuming one month consists of 30 days). The number of

storage cells is #SC = 10, and the costs associated with a composite order and time

step are $c_{order}$ = 500 and $c_{time\_step}$ = 3,000, respectively.
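The cache mechanics described above (storage cells, reuse number, shelf life) can be sketched in Python as follows. This is an illustrative sketch only: the class and method names are hypothetical, purchase orders, the time lag TL and the budget C are left out, and the rule that a composite is dropped once its shelf life or reuse number reaches zero is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Composite:
    key: str          # values of the composite-defining bits, e.g. "101"
    reuses_left: int  # remaining reuse number (starts at RN)
    shelf_life: int   # remaining shelf life in time steps (starts at SL)


class CompositeCache:
    def __init__(self, n_cells: int):
        self.cells: List[Optional[Composite]] = [None] * n_cells  # #SC storage cells

    def store(self, cell: int, key: str, rn: int, sl: int) -> None:
        """Put an arrived composite into a cell, discarding what was there."""
        self.cells[cell] = Composite(key, rn, sl)

    def try_use(self, key: str) -> bool:
        """Use a matching composite once; False means non-evaluable right now."""
        for i, c in enumerate(self.cells):
            if c is not None and c.key == key:
                c.reuses_left -= 1
                if c.reuses_left == 0:  # assumption: used-up composites are dropped
                    self.cells[i] = None
                return True
        return False

    def tick(self) -> None:
        """Advance one time step: age all composites and drop expired ones."""
        for i, c in enumerate(self.cells):
            if c is not None:
                c.shelf_life -= 1
                if c.shelf_life <= 0:  # assumption: expiry at shelf life zero
                    self.cells[i] = None


def composite_key(solution: str, defining_positions: List[int]) -> str:
    """Extract the composite-defining bits (1-based positions) from a genotype."""
    return "".join(solution[p - 1] for p in defining_positions)


cache = CompositeCache(4)
cache.store(1, "101", 2, 3)
key = composite_key("0010101", [3, 4, 5])
print(key, cache.try_use(key), cache.try_use(key), cache.try_use(key))
```

In the tyre example, a cache of this kind would hold 10 cells, with each stored tyre set starting at a reuse number of 5 and a shelf life of 30.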

Fig. 4.5 A visual example of the commitment composite ERC commCompERC($H_\#$ = {###}, #SC = 4, TL = 1, RN = 10, SL = 20); each composite order and time step costs $c_{order}$ and $c_{time\_step}$ units, respectively. The evaluation step at time step t reduces the reuse number of the composite in cell 2. At the same time step, the shelf life of the composite in cell 4 expires, and two new composites are ordered. One time step later, t + 1, the ordered composites arrive and are put into cells determined by the EA

We denote commitment composite ERCs by commCompERC($H_\#$, #SC, TL, RN, SL).3 For a more formal description of this ERC please refer to Allmendinger and Knowles (2010).

Having defined ERCOPs and several ERCs, we conduct in this section an initial

theoretical analysis on the impact of ERCs on evolutionary search. The analysis uses

the concept of Markov chains to investigate the impact of periodic ERCs on two

selection and reproduction schemes commonly used within EAs. After giving a brief

introduction to Markov chains and their application to EAs, the Markov model (transition probabilities) that accounts for periodic ERCs is derived, and subsequently the simulation results are analyzed and summarized. The Markov chain model presented here is based on an analysis we carried out in Allmendinger (2012).

3 We leave out the variables $t_{ctf}^{start}$, $t_{ctf}^{end}$, $c_{order}$, $c_{time\_step}$, and C from commCompERC(. . .) for ease of presentation. They will be specified where appropriate.

A Markov process is a random process that has no memory of where it has been in

the past such that only the current state of the process can influence the next state.

If the process can assume only a finite or countable set of states, then it is usual to

refer to it as a Markov chain (Norris 1998).

One can think of a Markov chain as a sequence $X_0, X_1, X_2, \ldots$ of random events occurring in time (Reeves and Rowe 2003). Suppose $S_0, \ldots, S_\mu$ are the $\mu + 1$ possible values that each of the random variables $X_t$ can take. Then, a chain moves from a state $S_m$ at time t, to a state $S_r$ at time t + 1 with a probability of $p_{mr} = P(X_{t+1} = S_r \mid X_t = S_m)$. The probabilities $p_{mr}$ ($m, r = 0, \ldots, \mu$) are called transition probabilities and form the $(\mu + 1) \times (\mu + 1)$ matrix P, the transition matrix. Thus, the probability that the chain is in state $S_r$ at time t is the rth entry in the probability vector

$$u_t = u_0 P^t, \qquad (4.1)$$

where $u_0$ is the $(\mu + 1)$-dimensional probability vector that represents the initial distribution over the set of states.
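To make Eq. (4.1) concrete, here is a small pure-Python sketch that propagates an initial distribution through a transition matrix by repeated vector-matrix multiplication. The three-state matrix and its entries are illustrative numbers chosen for this sketch, not values from the text.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def step(u: Vector, P: Matrix) -> Vector:
    """One step of the chain: the row vector u times the transition matrix P."""
    n = len(P)
    return [sum(u[m] * P[m][r] for m in range(n)) for r in range(n)]


def distribution(u0: Vector, P: Matrix, t: int) -> Vector:
    """State distribution after t steps, computed as t successive multiplications."""
    u = list(u0)
    for _ in range(t):
        u = step(u, P)
    return u


# Toy three-state chain (illustrative numbers) with absorbing end states.
P = [[1.0, 0.0, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.0, 1.0]]
u0 = [0.0, 1.0, 0.0]  # start in the middle state with certainty
print(distribution(u0, P, 2))  # [0.375, 0.25, 0.375]
```

Multiplying step by step avoids forming the matrix power explicitly, which is convenient later when the matrix changes between time steps.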

When an EA is modeled by a Markov chain it is easy to see that the population is

the natural choice for describing a state. The transition probabilities then express the

likelihoods that an EA changes from a current population to any other possible population after applying the stochastic effects of selection, crossover, and/or mutation.

It is also possible to consider other effects such as noisy fitness functions (Nakama

2008), niching (Horn 1993) and elitism (He and Yao 2002). Once the transition

matrix is calculated it can be used to calculate a variety of measurements, such as

the first hitting time of a particular state or the probability of hitting a state at all. An

overview of tools of Markov chain analysis can be found in any general textbook on

stochastic processes, such as Norris (1998), Doob (1953).

The drawback of modeling EAs with Markov chains is that the size of the required

transition matrix grows exponentially in both the population size and string length. To

keep Markov chain models manageable it is therefore common to use small population sizes and string lengths (Goldberg and Segrest 1987; Horn 1993). Other options,

which allow the modeling of more realistic EAs, are to make simplifying assumptions about the state space (Mahfoud 1991) or to use matrix notation only (Vose and

Liepins 1991; Nix and Vose 1992; Davis and Principe 1993).

In this section we derive the transition probabilities for EAs optimizing in the presence

of periodic ERCs. Our Markov chain model is based on the model of Goldberg and Segrest (1987), which considers a population whose individuals are of only two types: type A always has a fixed objective value (or fitness) of f(A), while type B has a fitness of f(B). This limitation allows for an intuitive definition of states. For a fixed population size of $\mu$, there are $\mu + 1$ possible states, where state $S_m$ represents a population with m type A individuals and $\mu - m$ type B individuals. Furthermore, in this simple EA model we do not apply mutation and crossover, so that an offspring is simply a copy of the selected parent.

Goldberg and Segrest (1987) used this model to investigate the effect of drift for a

simple EA that used a generational reproduction scheme combined with fitness proportionate selection. They also extended the model to include mutation. Horn (1993)

extended it further to include niching. We extend it to include periodic ERCs and use

the resulting model to analyze the impact of the ERC on two selection strategies, fitness proportionate and binary tournament selection, and two reproduction schemes,

generational and steady-state reproduction, both without elitism.

Readers not interested in the technical details of this Markov chain model can

safely skip to Sect. 4.4.3 where the results of simulations are presented.

Under fitness proportionate selection (FPS) we choose an individual of the current

population to serve as a parent (in our environment, to be in the next population) with

a probability that is proportional to its (relative) fitness. In our simple environment,

the probability of choosing a type A individual for the next population while being

in a state Sm is simply

$$P_m(A) = \frac{m f(A)}{m f(A) + (\mu - m) f(B)}. \qquad (4.2)$$

As there are only two individual types in total, the probability of choosing a type B individual is $P_m(B) = 1 - P_m(A)$. From the above equation it is apparent that once a uniform population is reached, i.e., m = 0 or $m = \mu$, there is no chance of selecting individuals from the other type. Thus, the two corresponding states $S_0$ and $S_\mu$ are absorbing states.
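The selection probability of Eq. (4.2) is short enough to state directly in Python. The sketch below assumes positive fitness values and uses hypothetical names (`mu` for the population size, `f_a` and `f_b` for the two fitness values).

```python
def fps_prob_a(m: int, mu: int, f_a: float, f_b: float) -> float:
    """Probability of selecting a type A individual in state S_m under FPS.

    m   : number of type A individuals in the current population
    mu  : population size
    f_a : fitness of type A (assumed positive), f_b : fitness of type B
    """
    return m * f_a / (m * f_a + (mu - m) * f_b)


print(fps_prob_a(25, 50, 1.0, 1.0))  # 0.5: equal fitness, equal shares
print(fps_prob_a(0, 50, 1.0, 1.3))   # 0.0: S_0 is absorbing
print(fps_prob_a(50, 50, 1.0, 1.3))  # 1.0: the uniform type A state is absorbing
```

The two boundary calls reproduce the absorbing behaviour noted in the text: once one type has vanished, it can no longer be selected.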

Under tournament selection we first randomly select a number of individuals from

the population (with replacement) and then perform a tournament among them with

the fittest one serving subsequently as a parent. It is common to use a tournament

size of two, which will also be used here; this selection strategy is known as binary

tournament selection (BTS). The result of a tournament is clear: the individual with

the higher fitness wins the tournament; there is a draw if an individual meets another

individual with the same fitness in which case the winner is randomly determined; and

an individual will be the winner of a tournament with itself. We distinguish two cases

regarding the fitness of the individual types: (i) f (A) = f (B) and (ii) f (A) > f (B).

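The following selection probabilities are obtained for each of the cases:

$$
\begin{aligned}
f(A) = f(B):&\quad P_m(A) = \frac{m^2}{\mu^2} + \frac{m(\mu - m)}{\mu^2},\\
f(A) > f(B):&\quad P_m(A) = \frac{m^2}{\mu^2} + 2\,\frac{m(\mu - m)}{\mu^2}.
\end{aligned}
\qquad (4.3)
$$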
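The two cases of binary tournament selection can be checked with a short Python sketch. The first two branches follow the cases given in the text; the third branch, for f(A) < f(B), is not spelled out there and is added here as an assumption that follows from the same argument (type A then wins only tournaments between two type A individuals). Names are hypothetical.

```python
def bts_prob_a(m: int, mu: int, f_a: float, f_b: float) -> float:
    """Probability that a binary tournament in state S_m returns a type A parent."""
    p_both_a = (m / mu) ** 2               # both contestants are type A
    p_mixed = 2 * m * (mu - m) / mu ** 2   # one contestant of each type
    if f_a == f_b:
        return p_both_a + 0.5 * p_mixed    # a draw is resolved at random
    if f_a > f_b:
        return p_both_a + p_mixed          # type A wins every mixed tournament
    return p_both_a                        # assumption: type A loses every mixed tournament


print(bts_prob_a(25, 50, 1.0, 1.0))  # 0.5
print(bts_prob_a(25, 50, 2.0, 1.0))  # 0.75
print(bts_prob_a(25, 50, 1.0, 1.3))  # 0.25
```

With equal fitness the expression reduces to m divided by the population size, which is why the two selection strategies coincide when there is no selection pressure.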

In our environment, the transition probabilities depend on the selected reproduction

scheme, which in turn depends on the selected selection strategy. We first consider a

generational reproduction scheme as already used in the original genetic algorithm

of Holland (1975); we denote this scheme by GGA. With GGA, the entire current

population is replaced by the offspring population. That is, $\mu$ selection steps are carried out per time step (with replacement). Using the selection probability $P_m(A)$ either for FPS or BTS, the transition probabilities $p_{mr} = P(X_{t+1} = S_r \mid X_t = S_m)$ for GGA of moving at time t from a state $S_m$ with m type A individuals, to a state $S_r$ with r type A individuals at time t + 1, are defined as follows:

For $m = 0$:
$$p_{mm} = 1, \qquad p_{mr} = 0, \quad r = 1, \ldots, \mu.$$

For $0 < m < \mu$:
$$p_{mr} = \binom{\mu}{r} P_m(A)^r \bigl(1 - P_m(A)\bigr)^{\mu - r}. \qquad (4.4)$$

For $m = \mu$:
$$p_{mr} = 0, \quad r = 0, \ldots, \mu - 1, \qquad p_{mm} = 1.$$
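The generational transition matrix can be assembled with a few lines of Python. The sketch assumes that the selection probability is supplied as a function `prob_a(m)` with `prob_a(0) == 0` and `prob_a(mu) == 1`, in which case the binomial expression also produces the two absorbing rows without special cases. Function and parameter names are hypothetical.

```python
from math import comb
from typing import Callable, List


def gga_matrix(mu: int, prob_a: Callable[[int], float]) -> List[List[float]]:
    """Transition matrix for generational reproduction.

    Row m, column r: probability that exactly r of the mu independently
    selected offspring are of type A, given m type A parents.
    """
    P = []
    for m in range(mu + 1):
        p = prob_a(m)
        P.append([comb(mu, r) * p ** r * (1 - p) ** (mu - r) for r in range(mu + 1)])
    return P


# Neutral selection (equal fitness) with a tiny population of four.
P = gga_matrix(4, lambda m: m / 4)
print(P[0])     # [1.0, 0.0, 0.0, 0.0, 0.0]
print(P[2][2])  # 0.375
```

Each row is a binomial distribution over the number of type A offspring, so every row sums to one.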

With steady state reproduction, the population is updated after each selection

step. Usually, an offspring individual replaces the worst individual in the population. This replacement strategy, however, is elitist and ensures that the number of

the less fit individual type in the population does not increase. Thus, to allow for a

fair comparison with GGA, an offspring does not replace the worst individual in the

population but a randomly chosen one regardless of its fitness; we denote this reproduction scheme by SSGA (rri), where rri refers to replacing a random individual. It

has been shown elsewhere (Syswerda 1991) that GGA and SSGA (rri) yield similar

performance. Bearing in mind that one time step corresponds to one selection step

with SSGA (rri), we obtain the following transition probabilities:

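For $m = 0$:
$$p_{mm} = 1, \qquad p_{mr} = 0, \quad r = 1, \ldots, \mu.$$

For $0 < m < \mu$:
$$
\begin{aligned}
p_{mr} &= 0, \quad r = 0, \ldots, m - 2,\\
p_{m\,m-1} &= \bigl(1 - P_m(A)\bigr)\frac{m}{\mu},\\
p_{mm} &= P_m(A)\frac{m}{\mu} + \bigl(1 - P_m(A)\bigr)\frac{\mu - m}{\mu},\\
p_{m\,m+1} &= P_m(A)\frac{\mu - m}{\mu},\\
p_{mr} &= 0, \quad r = m + 2, \ldots, \mu.
\end{aligned}
\qquad (4.5)
$$

For $m = \mu$:
$$p_{mr} = 0, \quad r = 0, \ldots, \mu - 1, \qquad p_{mm} = 1.$$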

The transition probabilities of either GGA or SSGA (rri) will be the entries of the

transition matrix P.
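For the steady-state scheme with random replacement, one selection step can change the number of type A individuals by at most one. The following Python sketch builds the corresponding tridiagonal matrix under the reasoning that the offspring is of type A with the selection probability and replaces a uniformly chosen individual; the names are hypothetical.

```python
from typing import Callable, List


def ssga_rri_matrix(mu: int, prob_a: Callable[[int], float]) -> List[List[float]]:
    """Transition matrix for one selection step of SSGA (rri)."""
    P = [[0.0] * (mu + 1) for _ in range(mu + 1)]
    for m in range(mu + 1):
        p = prob_a(m)      # offspring is type A with probability p
        frac_a = m / mu    # the replaced individual is type A with this probability
        if m > 0:
            P[m][m - 1] = (1 - p) * frac_a            # B offspring replaces an A
        P[m][m] = p * frac_a + (1 - p) * (1 - frac_a)  # like replaces like
        if m < mu:
            P[m][m + 1] = p * (1 - frac_a)            # A offspring replaces a B
    return P


P = ssga_rri_matrix(4, lambda m: m / 4)
print(P[2])  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

With `prob_a(0) == 0` and `prob_a(mu) == 1` the first and last rows again come out as absorbing states.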

We have mentioned in the previous section that GGA performs $\mu$ selection steps per time step, while SSGA (rri) performs one selection step per time step. To be able to compare the effect of an ERC on the two reproduction schemes, we thus express ERCs in this section in terms of selection steps rather than time steps.

Let us now derive the transition probabilities in the presence of a periodic ERC. For this, consider the general periodic ERC, perERC($i\mu$, $(i + 1)\mu$, k, $\mu$, H = (A)) ($i \in \mathbb{N}$, $0 \le k \le \mu$), which is activated at selection step $i\mu$ for a period of $\mu$ selection steps, i.e., one time step (or generation) for GGA and $\mu$ time steps for SSGA (rri). During the activation period of k selection steps, we can only

select (and evaluate) type A individuals. Let us assume that if we select a type B

individual during this period, this individual is repaired by simply forcing it into the

right schema; i.e., it is converted into a type A individual. This repairing procedure

is a simple constraint-handling strategy for dealing with non-evaluable solutions;

alternative constraint-handling strategies will be introduced in the following sections.

Before we derive the constrained transition probabilities for GGA we want to point

out a few aspects:

- If we are in state $S_0$ and the ERC is activated, then $S_0$ is not an absorbing state anymore and we move directly to state $S_k$.
- As a population contains at least k type A individuals after lifting the constraint, we are not able to move to a state $S_r$ with r < k during the constrained generation (time step).
- The ERC reduces the number of freely selected offspring down to $\mu_{new} = \mu - k$. Moving to a state $S_r$ with r > k is already achieved by selecting $r_{new} = r - k$ (instead of r) type A individuals from the current population.

Considering these points, we derive for the time step for which the ERC is activated

the following constrained transition probabilities for GGA:

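For $m = 0$:
$$p_{mr} = 0, \quad r = 0, \ldots, k - 1, k + 1, \ldots, \mu, \qquad p_{mk} = 1.$$

For $0 < m < \mu$ and $r < k$:
$$p_{mr} = 0.$$

For $0 < m < \mu$ and $k \le r \le \mu$:
$$p_{mr} = \binom{\mu_{new}}{r_{new}} P_m(A)^{r_{new}} \bigl(1 - P_m(A)\bigr)^{\mu_{new} - r_{new}}. \qquad (4.6)$$

For $m = \mu$:
$$p_{mr} = 0, \quad r = 0, \ldots, \mu - 1, \qquad p_{mm} = 1.$$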
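The constrained generational matrix can be sketched in the same style as the unconstrained one: k offspring are forced to be of type A, and only the remaining ones are selected freely. The Python sketch below assumes `prob_a(0) == 0` and `prob_a(mu) == 1`, so that the boundary rows follow from the general expression; all names are hypothetical.

```python
from math import comb
from typing import Callable, List


def gga_constrained_matrix(mu: int, k: int, prob_a: Callable[[int], float]) -> List[List[float]]:
    """Transition matrix for a generation in which k selection steps are constrained."""
    mu_new = mu - k  # number of freely selected offspring
    Pc = [[0.0] * (mu + 1) for _ in range(mu + 1)]
    for m in range(mu + 1):
        p = prob_a(m)
        for r in range(k, mu + 1):  # states with fewer than k type A are unreachable
            r_new = r - k           # type A offspring among the free selections
            Pc[m][r] = comb(mu_new, r_new) * p ** r_new * (1 - p) ** (mu_new - r_new)
    return Pc


Pc = gga_constrained_matrix(4, 2, lambda m: m / 4)
print(Pc[0])  # [0.0, 0.0, 1.0, 0.0, 0.0]
print(Pc[2])  # [0.0, 0.0, 0.25, 0.5, 0.25]
```

The first printed row shows that the all-B state is no longer absorbing under the constraint: the chain jumps to the state with k type A individuals.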

The above periodic ERC is set such that the activation period of k selection steps

is upper bounded by the population size $\mu$, and, in the case of GGA, starts and

ends within a single time step (generation). This does not need to be necessarily the

case. In fact, a periodic ERC can feature an activation period k that is so long that it

constrains selection steps within two or more successive generations, or so short that

several activation periods may start during a single generation. In such scenarios,

one needs to constrain all generations that are subject to constrained selection steps.

The number of constrained selection steps within a generation, referred to as k in

Eq. (4.6), is then simply the sum of all selection steps that happen to be constrained

during any particular generation. That is, depending on the ERC, the number of

constrained selection steps may change between generations.

With SSGA (rri), the population is updated after each selection step, which, as noted above, is a single time step with this scheme. This means that we need to determine for

each selection step (time step) separately whether it lies within the activation period

and thus is constrained or not. During the activation period, the periodic ERC of

above prevents us from moving from a current state $S_m$ to a state $S_{m-1}$, which can only be reached if a type B individual replaces a type A individual. As above, if the constraint is active, then the state $S_0$ is not an absorbing state anymore, and we move directly to state $S_1$. We obtain the following new transition probabilities for each of the k constrained time steps:

For $m = 0$:
$$p_{mr} = 0, \quad r = 0, 2, 3, \ldots, \mu, \qquad p_{m1} = 1.$$

For $0 < m < \mu$:
$$
\begin{aligned}
p_{mr} &= 0, \quad r = 0, \ldots, m - 1,\\
p_{mm} &= \frac{m}{\mu},\\
p_{m\,m+1} &= \frac{\mu - m}{\mu},\\
p_{mr} &= 0, \quad r = m + 2, \ldots, \mu.
\end{aligned}
\qquad (4.7)
$$

For $m = \mu$:
$$p_{mr} = 0, \quad r = 0, \ldots, \mu - 1, \qquad p_{mm} = 1.$$

We will denote the transition matrix with the constrained transition probabilities by $P_c$.
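During a constrained selection step of SSGA (rri) every offspring is of type A, so the outcome depends only on which individual is replaced. A minimal Python sketch of the resulting matrix, with hypothetical names, is:

```python
from typing import List


def ssga_rri_constrained_matrix(mu: int) -> List[List[float]]:
    """Transition matrix for one constrained selection step of SSGA (rri).

    The (repaired) offspring is always type A and replaces a uniformly chosen
    individual, so the number of type A individuals never decreases.
    """
    Pc = [[0.0] * (mu + 1) for _ in range(mu + 1)]
    for m in range(mu + 1):
        Pc[m][m] = m / mu                 # a type A individual is replaced
        if m < mu:
            Pc[m][m + 1] = (mu - m) / mu  # a type B individual is replaced
    return Pc


Pc = ssga_rri_constrained_matrix(4)
print(Pc[0])  # [0.0, 1.0, 0.0, 0.0, 0.0]
print(Pc[2])  # [0.0, 0.0, 0.5, 0.5, 0.0]
```

Note that the selection probability does not appear at all, which matches the later observation that FPS and BTS behave identically during activation periods.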

One way to analyze the impact of an ERC on different selection and reproduction

schemes is to monitor the proportion of the two individual types in a population. To

do so one needs to first calculate the probability of ending up in any of the possible

states $S_i$, $i = 0, \ldots, \mu$, after t time steps. In an unconstrained environment, this can be done according to Eq. (4.1) (see Sect. 4.4.1) using the transition matrix P; in this equation, the $\mu + 1$ state probabilities at time t are represented in form of the probability vector $u_t$. In a constrained environment we cannot use the transition matrix P across all t time steps but have to swap it with the constrained transition matrix $P_c$ for time steps that consist of constrained selection steps; this dependence of the transition matrix on time makes it a non-homogeneous Markov chain (Norris 1998). Let us consider the same periodic ERC as in the previous section but this time with a constraint time frame spanning over $g \in \mathbb{N}$ periods (as opposed to exactly one), i.e., perERC($i\mu$, $(i + g)\mu$, k, $\mu$, H = (A)). For this ERC we can calculate the probability vector $u_t$ for GGA as follows:

$$
u_t =
\begin{cases}
u_0 P^t, & 0 \le t < i,\\
u_0 P^i P_c^{\,t-i}, & i \le t < g + i,\\
u_0 P^i P_c^{\,g} P^{\,t-g-i}, & g + i \le t,
\end{cases}
$$

where the entries of the transition matrices P and $P_c$ are calculated using Eqs. (4.4) and (4.6), respectively. The probability vector $u_0$ of the initial state distribution has a value of 1 at the ith entry and a value of 0 in the others, if we want to start with a population of exactly i type A individuals.
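The piecewise use of the two matrices for GGA can be sketched as a loop that picks the constrained matrix for generations inside the constraint time frame and the unconstrained one otherwise. The sketch is generic: it takes any pair of matrices, the names are hypothetical, and the two-state matrices in the usage example are illustrative toys rather than the matrices of the model.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def step(u: Vector, M: Matrix) -> Vector:
    """Row vector times matrix."""
    n = len(M)
    return [sum(u[m] * M[m][r] for m in range(n)) for r in range(n)]


def gga_distribution(u0: Vector, P: Matrix, Pc: Matrix, i: int, g: int, t: int) -> Vector:
    """State distribution after t generations.

    Generations 0..i-1 use P, generations i..i+g-1 use Pc (every generation in
    the constraint time frame is constrained), and later generations use P again.
    """
    u = list(u0)
    for gen in range(t):
        constrained = i <= gen < i + g
        u = step(u, Pc if constrained else P)
    return u


# Toy example: P leaves the state unchanged, Pc forces the chain into state 1.
P = [[1.0, 0.0], [0.0, 1.0]]
Pc = [[0.0, 1.0], [0.0, 1.0]]
print(gga_distribution([1.0, 0.0], P, Pc, 2, 1, 2))  # [1.0, 0.0]
print(gga_distribution([1.0, 0.0], P, Pc, 2, 1, 5))  # [0.0, 1.0]
```

The toy example also hints at the lasting effect discussed later: the shift caused during the constrained generation persists after the constraint is lifted.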

One time step with GGA corresponds to $\mu$ time steps with SSGA (rri). To compute the probability vector u for SSGA (rri) we thus need to look at the state distributions at time step $t\mu$:

$$
u_{t\mu} =
\begin{cases}
u_0 P^{t\mu}, & 0 \le t < i,\\
u_0 P^{i\mu} \bigl(P_c^{\,k} P^{\,\mu-k}\bigr)^{(t-i)}, & i \le t < g + i,\\
u_0 P^{i\mu} \bigl(P_c^{\,k} P^{\,\mu-k}\bigr)^{g} P^{(t-g-i)\mu}, & g + i \le t,
\end{cases}
$$

where the transition matrices P and $P_c$ are calculated according to Eqs. (4.5) and (4.7), respectively.

Having obtained the probabilities of ending up in all the different states, we can

calculate the expected proportions ct (A) and ct (B) of type A and B individuals in a

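population at time step t (or $t\mu$ in the case of SSGA (rri)) as follows:

$$c_t(A) = \frac{1}{\mu} \sum_{i=0}^{\mu} i\, u_t^i, \qquad c_t(B) = 1 - c_t(A),$$

where $u_t^i$ is the ith entry of the probability vector $u_t$.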
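The expected proportions follow from the state distribution by a weighted sum over the states. A minimal Python sketch, with hypothetical names and an illustrative three-state distribution, is:

```python
from typing import List, Tuple


def expected_proportions(u: List[float], mu: int) -> Tuple[float, float]:
    """Expected proportions of type A and type B individuals.

    u[i] is the probability of being in the state with i type A individuals,
    and mu is the population size, so len(u) == mu + 1.
    """
    c_a = sum(i * u_i for i, u_i in enumerate(u)) / mu
    return c_a, 1.0 - c_a


print(expected_proportions([0.375, 0.25, 0.375], 2))  # (0.5, 0.5)
print(expected_proportions([0.0, 0.0, 1.0], 2))       # (1.0, 0.0)
```

A symmetric distribution over the states, as in the first call, gives equal expected shares of the two types.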

This section uses the measure of the expected individual type proportion to analyze the impact of periodic ERCs on two selection strategies, FPS and BTS, and two

reproduction schemes, GGA and SSGA (rri). We consider first the case where both

individual types have equal fitness values, and then the case where they are different.

If not otherwise stated, the population size is set to $\mu = 50$.

4.4.3.1 Identical Fitness Values: f (A) = f (B)

In this case there is no selection pressure and thus both selection strategies behave identically. Ideally, an EA maintains an equal proportion of the two individual types in the population. However, because of genetic drift this is impossible and an EA eventually converges to a uniform population (i.e., state $S_0$ or $S_\mu$). As the probability of ending up in one of the two states is proportional to the initial state, the expected individual type proportion is identical to the initial proportion, which is specified by $u_0$. Thus, for a random initialization, the expected proportion is 0.5.

Fig. 4.6 A plot showing the proportion of type B individuals $c_t(B)$ for GGA and SSGA (rri) as a function of the number of selection steps for the ERC perERC(400, 450, 20, 50, H = (A)). Both individual types have equal fitness and the constraint settings used are given above the plot. The terms real and expected refer to proportions obtained by actually running the EA and by running the Markov chain, respectively. The EA results are averaged across 500 independent runs

From Fig. 4.6 we can see that an expected proportion of 0.5 is achieved until

selection step 400 at which we activate the periodic ERC, perERC(400, 450, 20, 50,

H = (A)), which has a unique activation period of k = 20 selection steps.4 This

ERC forces us to evaluate k = 20 type A individuals and subsequently, reduces

(increases) the proportion of type B (A) individuals in the population. After the ERC

is lifted at selection step 420, the expected individual type proportion does not get

back to the initial proportion. Although this effect can be put down to the specifics

of the model (no selection pressure toward either individual type), we will see in the

following theoretical and experimental studies several results which display a similar

pattern. That is, a constraint can have a permanent or long-lived effect on search

performance even if it was active for a short time only.

4 Note, in an EA performing optimization of a function, the number of performed selection steps displayed on the x-axis of Fig. 4.6 would be equivalent to the number of performed function evaluations.

Fig. 4.7 A plot showing the proportion of type B individuals $c_t(B)$ for GGA and SSGA (rri) at selection step 200 as a function of the activation period k for the ERC perERC(50, 200, k, 150, H = (A)). Both individual types have equal fitness

From the figure we can also see that the proportion is affected more severely for GGA than for SSGA (rri). The reason that SSGA (rri) is more robust is that with this reproduction scheme there is a chance that an offspring of type A replaces another type A individual that is currently in the population. Of course, if an offspring replaces a solution of the same type, then this will not affect the proportion. By contrast, with GGA, all offspring are carried over to the population of the next generation. The proportion of type B individuals under GGA therefore decreases as a linear function of the activation period. This effect is also apparent from Fig. 4.7, where the

performance of both reproduction schemes is shown as a function of the activation

period k. From the figure one can see that SSGA (rri) is able to maintain a proportion

of around ct (B) = 0.2 after an activation period of k = 50, which is equal to the

population size. On the other hand, GGA cannot maintain a single type B individual

in the population because of its linear dependence on k. Note, in the case where

k > 50, the constraint is activated for more than one time step when using GGA. For

example, for k = 70 the constraint restricts all 50 selection steps within one time

step and 20 selection steps within the subsequent one.

As the Markov chain results are exact we omit the experimentally obtained proportions in the following plots.

4.4.3.2 Different Fitness Values: f (A) ≠ f (B)

When both individual types have different fitness values, the aim of an EA is to

converge as quickly as possible to a population state consisting only of the fitter

individual type. We focus our investigations mainly on the more interesting case

where an ERC has a negative effect on the convergence behavior. Hence, the fitness

of the individual type that we have to select during the activation period, in our case

type A, needs to be lower than the fitness of type B individuals. If not otherwise

stated, the fitness values are set to f (A) = 1.0 and f (B) = 1.3.

As the basis for our analysis we use the periodic ERC perERC(50, 400, 20, 50,

H = (A)). This ERC is activated after the initialization (i.e., at selection or evaluation

step 50) for seven periods, each consisting of P = 50 selection steps whereby k = 20

of them are constrained. Figure 4.8 shows the impact of the periodic ERC on the

expected proportion $c_t(B)$ for all combinations of the selection and reproduction schemes: GGA with FPS and SSGA (rri) with FPS (top plot), and GGA with BTS and SSGA (rri) with BTS (bottom plot).5

Fig. 4.8 Plots showing the proportion of type B individuals $c_t(B)$ for FPS (top) and BTS (bottom) as a function of the number of selection steps for the ERC perERC(50, 400, 20, 50, H = (A)). The term unconstrained refers to the proportions obtained in an ERC-free environment

We want to point out that during activation periods, SSGA (rri) with BTS and

FPS perform identically, since independently of selection type, an A offspring will

replace an individual selected at random. But during the inactive periods, the stronger

selection pressure of BTS recovers more of the B-to-A replacements, so that overall

BTS maintains a higher proportion of Bs. This behavior can be seen in the zigzag

shape, where there is the same steep falloff of fitness in both methods, but a steeper

recovery for BTS. Overall, the same is true for GGA (BTS is better for the same reason), but this is not as clearly visible in the plots.

5 We get the zigzag-shaped line for SSGA (rri) during the constraint time frame because ct (B) is plotted after each time step, which here consists of one selection step. For GGA the change in ct (B) is smooth because a time step consists of several selection steps (50 in this setting).
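The expected proportions for SSGA (rri) can be reproduced in spirit with a small Markov chain over the number of type B individuals. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the constraint is taken to be active during the first k selection steps of each period, the chain starts at selection step 50 with half of the population being of type B, and all function names are hypothetical.

```python
def erc_active(t, t_start, t_end, k, period):
    """True if selection step t falls into an activation period (assumed layout)."""
    return t_start <= t < t_end and (t - t_start) % period < k


def p_select_b(i, n, f_a, f_b, scheme):
    """Probability that one selection step picks a type B parent, given i Bs."""
    q = i / n
    if scheme == "FPS":  # fitness proportionate selection
        return f_b * i / (f_a * (n - i) + f_b * i)
    if scheme == "BTS":  # two picks with replacement, fitter type wins
        if f_b > f_a:
            return 1.0 - (1.0 - q) ** 2
        if f_b < f_a:
            return q ** 2
        return q
    raise ValueError(f"unknown scheme: {scheme}")


def ssga_rri_curve(n, t_first, t_last, f_a, f_b, scheme, erc=None, c0=0.5):
    """Expected proportion of B after each selection step of SSGA (rri)."""
    dist = [0.0] * (n + 1)  # dist[i] = probability of having i Bs
    dist[round(c0 * n)] = 1.0
    curve = []
    for t in range(t_first, t_last):
        forced_a = erc is not None and erc_active(t, *erc)
        nxt = [0.0] * (n + 1)
        for i, p in enumerate(dist):
            if p == 0.0:
                continue
            pb = 0.0 if forced_a else p_select_b(i, n, f_a, f_b, scheme)
            up = pb * (n - i) / n            # B offspring replaces an A
            down = (1.0 - pb) * i / n        # A offspring replaces a B
            if i < n:
                nxt[i + 1] += p * up
            if i > 0:
                nxt[i - 1] += p * down
            nxt[i] += p * (1.0 - up - down)  # offspring replaces its own type
        dist = nxt
        curve.append(sum(i * p for i, p in enumerate(dist)) / n)
    return curve


if __name__ == "__main__":
    erc = (50, 400, 20, 50)  # perERC(50, 400, 20, 50, H = (A))
    for scheme in ("FPS", "BTS"):
        curve = ssga_rri_curve(50, 50, 1500, 1.0, 1.3, scheme, erc)
        print(scheme, round(curve[-1], 3))
```

During an activation period the transition probabilities do not depend on the selection scheme, which is why FPS and BTS coincide there in this model and differ only in how quickly they recover afterwards.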

Fig. 4.9 Plots showing the proportion of type B individuals ct (B) at selection step 1,500 as a function of the start of the constraint time frame t_ctf^start (left) and the activation period k (right) for the ERCs perERC(t_ctf^start, t_ctf^start + 350, 20, 50, H = (A)) and perERC(50, 400, k, 50, H = (A)), respectively

Fig. 4.10 Plots showing the proportion of type B individuals ct (B) at selection step 1,500 as a function of the fitness ratio f (A)/f (B) for the ERCs perERC(50, 400, 20, 50, H = (A)) (left) and perERC(50, 550, 25, 50, H = (A)) (right)

Figures 4.9 and 4.10 indicate how the proportion of type B individuals is affected

when altering the constraint parameters. We can observe that:

• Longer activation periods degrade the performance of all EAs (see right plot of Fig. 4.9).

• Fixing the constraint time frame duration, but translating it (see left plot of Fig. 4.9), yields a non-monotonic effect on performance (of all EAs, but most apparently with FPS): more preparation time gives more time to fill the population with fit individuals, whereas little recovery time is detrimental to final fitness. These two effects trade off against each other.

• Changing the fitness ratio (see Fig. 4.10) has only a switching effect on BTS (when the fitter individual changes), but for FPS the ratio smoothly affects the final proportion up to a saturation point.

Overall, comparing GGA with SSGA we see that SSGA achieves the higher proportion of fit individuals during the constraint time frame, and it recovers more

rapidly after the constraint is lifted, but its rate of recovery does not reach the rate

achieved by GGA, and ultimately GGA reaches a higher proportion (see Figs. 4.7

and 4.8). This can be explained by the replacement strategy of SSGA (rri): offspring may replace individuals in the population that are of the same type. During the

activation period, this is beneficial as the number of poor type A individuals in the

population does not increase linearly with the activation period. However, during

the unconstrained selection steps, this may be disruptive in the sense that fit type

B offspring may replace other type B individuals of the current population, which

slows down the convergence.

We used Markov chains to analyze the impact of periodic ERCs for a simple environment and EA model. The environment was composed of only two individual types

and the EA model applied only a selection operator. In the EA model we considered two selection strategies, FPS and BTS, and two reproduction schemes, GGA and

SSGA (rri). We observed that for one and the same reproduction scheme, BTS is more robust than FPS due to its independence of the fitness values of the individual types.

However, FPS was able to match and even outperform the performance of BTS if the

ratio of the individual type fitnesses was high, i.e., if a larger selection pressure than

for BTS was obtained. The crucial difference between the two reproduction schemes

we considered is that GGA carries out many selection steps before the population is

updated, while SSGA (rri), or steady-state reproduction in general, carries out only

a single one. This enables SSGA (rri) during the activation periods to replace less fit

individuals with other less fit individuals of the current population, but also prevents

SSGA (rri) in the long run from a quicker convergence in the remaining periods. By

contrast, the performance of GGA depends linearly on the activation period but there are no drawbacks if the ERC is not activated. This crucial difference between the

reproduction schemes means that SSGA (rri) is able to outperform GGA during the

activation period and in situations where the advantage over GGA gained in the activation period(s) can be maintained until the next activation period or until the end of the

optimization. In terms of the constraint parameters, this occurs when there is a long

activation period, a short recovery period, and the constraint time frame is set late.

In this section we summarize five static constraint-handling strategies (three repairing and two non-repairing strategies) and showcase their robustness for commitment relaxation ERCs and periodic ERCs (though the strategies are applicable in similar form to other ERCs). The strategies are static in the sense that they always deal with a non-evaluable solution in the same pre-specified way, as opposed to learning-based strategies that switch between different static strategies during search (see Sect. 4.6). Some of the static strategies are based on constraint-handling strategies developed for standard constraints, and this will be pointed out where applicable. Figure 4.11 depicts how the three repairing strategies, forcing, regenerating, and the subpopulation strategy, may handle a non-evaluable solution. Below we describe each static strategy in detail.

Fig. 4.11 A depiction of the current population Pop (filled circles and squares) and an offspring individual xt , which is feasible but not evaluable (because it is in X but not in E(t )). Solutions indicated by the filled squares coexist in both the actual EA population Pop and the population SP maintained by the subpopulation strategy. The three solutions xt,repaired indicate repaired solutions that might have resulted after applying one of the three repairing strategies to xt : while forcing simply flips incorrectly set bits of xt and thus creates a repaired solution that is as close as possible to xt but not necessarily fit, regenerating creates a new solution in E(t ) using the genetic material available in Pop. Similarly, the subpopulation strategy also creates a new solution but uses the genetic material available in the subpopulation SP (empty and filled squares), which contains only solutions from E(t )

1. Forcing. Upon encountering a non-evaluable solution, this strategy forces it into the constraint schemata Hi of all activated ERCs ERC_i, i = 1, ..., r, by flipping all solution bits that differ from the order-defining bit values of Hi. Similar repairing strategies have been proposed, e.g., in Liepins and Potter (1991).

2. Regenerating. This strategy, which is similar to the death penalty method

(Schwefel 1975), avoids the evaluation of a non-evaluable solution by iteratively

creating new solutions, based on the current parent population, until an evaluable

one has been created or until L regeneration trials have passed without success. In

the latter case, we pick the solution created within the L trials that has the smallest

sum of Hamming distances to the schemata Hi of all activated ERCs and apply forcing to it. The goal of this strategy is to avoid the potential drawback of forcing, namely that good genotypes are destroyed by enforcing changes in decision variable values. On the other hand, regenerating can be expensive for large L, while for small L it may often reduce to the forcing strategy.

3. Subpopulation strategy. Assuming the presence of a single ERC, i.e., r = 1, this

strategy keeps record of the fittest J solutions from H1 evaluated so far, and stores

them in a subpopulation (which is maintained alongside the actual population). Upon

encountering a non-evaluable solution, a new solution is created by applying one

selection and variation step to the subpopulation. In case the new solution is non-evaluable, which may happen due to mutation, forcing is applied to it. If multiple ERCs are present, then (i) the number of subpopulations maintained is upper-bounded by 2^r (r is the number of ERCs), the size of the power set of the set of ERCs, and (ii) a solution is created using the subpopulation defined by the (set of) schemata Hi of activated ERCs.

4. Waiting. This strategy avoids repairing a non-evaluable solution by freezing the

optimization (i.e., incrementing the time counter without evaluating a solution) until

the activation periods of all ERCs violated by the solution have passed. It is easy to

see that waiting prevents drift-like effects in the search direction caused by ERCs, but

this might be associated with a smaller number of solutions being evaluated, which

can be a drawback if optimization time is limited.

5. Penalizing. Similar to waiting, this strategy avoids repairing but, instead of freezing the optimization, a non-evaluable solution is penalized by assigning a poor objective value c to it. The effect is that non-evaluated solutions are allowed to enter the population but are unlikely to survive for many generations or to be selected as parents due to their poor quality. This strategy can be regarded as a static penalty function method (Coello 2002).
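The two simplest repairing strategies, forcing and regenerating, can be sketched as follows. This is a minimal illustration and not the authors' code: solutions are bitstrings, a schema is a string over 0, 1 and * (where * marks a free position), the active schemata are assumed not to contradict each other, and the offspring generator make_offspring is a hypothetical callable standing in for selection and variation on the parent population.

```python
def violated_positions(x, schema):
    """Indices where x disagrees with the order-defining bits of schema."""
    return [i for i, (b, h) in enumerate(zip(x, schema)) if h != "*" and b != h]


def hamming_to_schemata(x, schemata):
    """Sum of Hamming distances from x to all active schemata."""
    return sum(len(violated_positions(x, s)) for s in schemata)


def force(x, schemata):
    """Forcing: flip every bit that conflicts with an active schema."""
    bits = list(x)
    for schema in schemata:
        for i in violated_positions(bits, schema):
            bits[i] = schema[i]
    return "".join(bits)


def regenerate(make_offspring, schemata, max_trials):
    """Regenerating: retry up to max_trials (>= 1) times, then fall back to forcing.

    The fallback forces the candidate that was closest to the schemata.
    """
    best = None
    for _ in range(max_trials):
        candidate = make_offspring()
        distance = hamming_to_schemata(candidate, schemata)
        if distance == 0:
            return candidate  # evaluable as is
        if best is None or distance < best[0]:
            best = (distance, candidate)
    return force(best[1], schemata)


if __name__ == "__main__":
    print(force("1111", ["00**"]))
    trials = iter(["1111", "1011", "0111"])
    print(regenerate(lambda: next(trials), ["00**"], 3))
```

With a small trial budget the fallback branch is taken often, which is the sense in which regenerating reduces to forcing.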

The advantage of penalizing over waiting is that the optimization does not freeze

upon encountering a non-evaluable solution; i.e., the solution generation process

continues and thus solutions might actually be evaluated (without needing to penalize

them) during an activation period. However, since evaluated solutions will have to

fall into the schemata Hi of all currently activated ERCs, penalizing might be subject

to drift-like effects, thus potentially losing the advantage of waiting.
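The difference between the two non-repairing strategies can be made concrete with a small sketch. It assumes that each active ERC is given as a pair of a schema string (over 0, 1 and *) and the first time step at which its activation period is over, and that no further ERC becomes active while waiting. These representations and the function names are illustrative assumptions.

```python
def in_schema(x, schema):
    """True if bitstring x is a member of the schema."""
    return all(h == "*" or b == h for b, h in zip(x, schema))


def waiting(x, t, active_ercs):
    """Waiting: return the time step at which x can be evaluated.

    The time counter is advanced, without evaluating anything, past the
    activation periods of all ERCs that x violates.
    """
    ends = [end for schema, end in active_ercs if not in_schema(x, schema)]
    return max(ends, default=t)


def penalizing(x, active_ercs, fitness, c=0.0):
    """Penalizing: evaluate x if possible, otherwise assign the poor value c."""
    if all(in_schema(x, schema) for schema, _ in active_ercs):
        return fitness(x)
    return c


if __name__ == "__main__":
    ercs = [("00**", 25), ("**00", 40)]
    onemax = lambda x: x.count("1") / len(x)
    print(waiting("1100", 10, ercs))        # violates only the first ERC
    print(penalizing("1100", ercs, onemax))  # non-evaluable, receives c
    print(penalizing("0000", ercs, onemax))  # evaluable, receives its fitness
```

Waiting leaves the search distribution untouched but consumes time steps, whereas penalizing keeps generating solutions and only distorts the fitness values seen by selection.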

Experimental setup. To evaluate the different strategies for commitment relaxation and periodic ERCs we add them to a standard EA that uses a (μ + λ)-ES reproduction scheme for environmental selection, binary tournament selection (with replacement) for parental selection, which has been shown to be a robust operator in the theoretical study, uniform crossover (Syswerda 1989) and bit flip mutation. The parameter settings of the EA are given in Table 4.1. Regarding the constraint-handling strategies, regenerating uses L = 10,000 regeneration trials (before applying forcing), and penalizing uses a fitness value of c = 0 for non-evaluable solutions; these settings have been found to yield generally robust and good results.

Table 4.1 EA parameter settings as used in the study of static constraint-handling strategies

Parameter                        Setting
Parent population size μ         50
Offspring population size λ      50
Per-bit mutation probability     1/l
Crossover probability            0.7
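For orientation, an unconstrained version of such an EA on OneMax might look as follows. This is a sketch under assumptions, not the setup used in the study: the string length l = 30 and the budget of 700 evaluations are borrowed from the figure captions, fitness is normalized to [0, 1], each crossover produces a single child, the initial population counts towards the budget, and all names are hypothetical.

```python
import random


def onemax(x):
    """Normalized OneMax fitness in [0, 1]."""
    return sum(x) / len(x)


def tournament(pop, fits, rng):
    """Binary tournament selection with replacement."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[i] if fits[i] >= fits[j] else pop[j]


def uniform_crossover(a, b, rng):
    """Take each bit from either parent with equal probability."""
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]


def mutate(x, rate, rng):
    """Flip each bit independently with probability rate."""
    return [1 - b if rng.random() < rate else b for b in x]


def run_ea(l=30, mu=50, lam=50, p_c=0.7, budget=700, seed=0):
    """(mu + lambda) EA; returns the best fitness after each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(l)] for _ in range(mu)]
    fits = [onemax(x) for x in pop]
    evals = mu
    history = [max(fits)]
    while evals + lam <= budget:
        offspring = []
        for _ in range(lam):
            p1, p2 = tournament(pop, fits, rng), tournament(pop, fits, rng)
            child = uniform_crossover(p1, p2, rng) if rng.random() < p_c else list(p1)
            offspring.append(mutate(child, 1.0 / l, rng))
        off_fits = [onemax(x) for x in offspring]
        evals += lam
        # Plus-selection: keep the mu best of parents and offspring.
        merged = sorted(zip(fits + off_fits, pop + offspring),
                        key=lambda pair: pair[0], reverse=True)[:mu]
        fits = [f for f, _ in merged]
        pop = [x for _, x in merged]
        history.append(fits[0])
    return history


if __name__ == "__main__":
    print(run_ea()[-1])
```

A constraint-handling strategy would be hooked in at the point where offspring are evaluated, replacing the direct call to the fitness function.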

With regard to test functions, it might be considered ideal to use a set of real

experimental problems featuring real resource constraints. However, this approach

is generally not realistic due to the time and/or budgetary burden associated with

physical experimentation. Hence, our studies presented in this and the subsequent

sections will use a range of more familiar artificial test problems. In this section we

show results obtained on the OneMax problem, augmented with ERCs. However,

the impact of the same ERC type on performance tends to be similar for different

problem types, and the interested reader is referred to Allmendinger and Knowles

(2013) for additional results obtained for TwoMax, MAX-SAT, and NK landscapes,

as well as a study involving data and ERCs from a real closed-loop problem.

Experimental results. Figure 4.12 shows how different configurations of a commitment relaxation ERC impact the performance of the static constraint-handling

strategies on the OneMax problem; in this experiment the order-defining bits of a

constraint schema H represented poor genetic material, i.e., 0-bits on the OneMax

problem. From the figure it is apparent that ERCs impact search performance negatively, and clear patterns emerge relating ERC parameters to performance effects:

• Altering the order of the constraint schema o(H) controls the trade-off between the probability of activating an ERC (probability decreases exponentially with o(H)) and the probability that an activation causes a performance impact (probability is greater for low orders o(H)). This causes the performance to degrade up to an order of o(H) ≈ 4 for strategies that apply repairing, and up to lower orders for waiting and penalizing, and then to improve again for higher orders (see top left plot).

• The performance of waiting only depends on the probability of activating an ERC. As this probability is largest at o(H) = 1, the performance is poorest at o(H) = 1 and improves exponentially thereafter.

• The epoch duration V is correlated positively with the length of an activation period, causing the performance of a strategy to decrease with increasing V (see top right plot). Longer activation periods cause waiting to freeze the optimization for longer and thus result in a poorer performance. The performance of the other strategies reduces until a certain level beyond which further increases in V have no effect.

• Increasing the recovery time improves the performance of all strategies, with recovery speed being a function of the effort needed to escape from a (semi-)homogeneous population state (see bottom left plot).

• Shifting the start time of the constraint time frame further to the end of the optimization decreases the probability of activating a commitment relaxation ERC that is associated with a poor constraint schema and thus has a beneficial impact on the performance of all strategies (see bottom right plot).

Fig. 4.12 Plots showing the average best solution fitness found (across 500 EA runs) and its standard error on OneMax as a function of the order of the constraint schema o(H) (top left), the epoch duration V (top right), the optimization time T (bottom left), and the start of the constraint time frame t_ctf^start (bottom right). Note, while the optimization time in the top plots is fixed to T = 700 evaluations, the parameter T varies in the bottom plots. For each setting shown on the abscissa, a Friedman test (significance level of 5 %) has been carried out. In the top left plot, waiting performs best in the range 2 < o(H) < 6, while, in the top right plot, it performs best in the range 2 < V < 12 with the subpopulation strategy being best in the range V > 12. In the bottom left plot, the subpopulation strategy performs best for T = 750, while in the bottom right plot, waiting performs best in the range 0 < t_ctf^start < 300. There is no clear winner for the other settings
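The exponential dependence on the schema order noted in the first observation can be illustrated for the simplest case of a uniformly random bitstring, which lies in a schema of order o(H) with probability 2^(-o(H)). The sketch below checks this by enumeration. It is only a baseline: in an actual EA run the population is not uniformly distributed, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def match_probability(order):
    """Probability that a uniformly random bitstring lies in a schema of this order."""
    return Fraction(1, 2 ** order)


def matches(x, schema):
    """True if bitstring x is a member of the schema (* marks a free position)."""
    return all(h == "*" or b == h for b, h in zip(x, schema))


def enumerated_probability(schema):
    """Fraction of all bitstrings of the schema's length that lie in the schema."""
    strings = ["".join(bits) for bits in product("01", repeat=len(schema))]
    return Fraction(sum(matches(x, schema) for x in strings), len(strings))


if __name__ == "__main__":
    for schema in ("0****", "00***", "000**", "0000*"):
        order = sum(ch != "*" for ch in schema)
        print(schema, order, enumerated_probability(schema), match_probability(order))
```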

Figure 4.13 analyzes the performance impact of ERCs with constraint schemata that

represent both good and poor genetic material, i.e., 0 and 1-bits are present in H. It is

obvious from the figure that the performance is affected most significantly for low-order schemata regardless of the quality of the genetic material they represent, and for schemata of higher order given they represent good genetic material (i.e., schemata along or near the diagonal). Other schemata setups have little or no performance impact as they do not lie on an optimizer's search path, reducing the probability of activating the associated ERC.

Fig. 4.13 Plots showing the average best solution fitness obtained (across 500 EA runs) by forcing (left) and waiting (right) on OneMax (with l = 30 bits) as a function of the order of the constraint schema o(H), and the number of order-defining bits in H with value 1 for the ERC commRelaxERC(0, 700, 15, H). The straight line represents the expected performance when picking a schema (i.e., the order-defining bits and their values) with a particular order at random. The performance obtained in an unconstrained environment is represented by the square at o(H) = #1s = 0

From Fig. 4.14 we can see that the performance of the strategies is affected differently when the activation period is set deterministically as done by periodic ERCs. From the left plot we can clearly see that waiting performs worst for all ERC settings. This is due to the high probability of encountering a non-evaluable solution during the activation period and subsequently freezing the optimization regardless of the order and genetic material represented by a constraint schema. The performance of the other strategies decreases more smoothly as a function of the order and the quality of the genetic material represented, as can be seen from the right plot for the subpopulation strategy.

Fig. 4.14 The left plot shows the average best solution fitness found and its standard error (across 500 EA runs) on OneMax (with l = 30 bits) as a function of the order of the constraint schema o(H), for the ERC perERC(0, 700, 20, 50, H = (0^o(H) ***...)). For each setting shown on the abscissa, a Friedman test (significance level of 5 %) has been carried out revealing that the subpopulation strategy performs best for o(H) = 2; there are no clear winners for the other settings. The right plot shows the average best solution fitness obtained by the subpopulation strategy as a function of both o(H) and the number of order-defining bits in H with value 1 for the ERC perERC(0, 700, 20, 50, H). The straight line represents the expected performance when picking a schema (i.e., the order-defining bits and their values) with a particular order at random

The previous section provided evidence that it is possible to select a suitable (static)

constraint-handling strategy for an ERCOP offline if the ERCs are known in advance.

Inspired by this observation, this section outlines two strategies that learn either

offline (using a reinforcement learning approach) or online (using a multi-armed

bandit algorithm) when to switch between the static constraint-handling strategies

during the optimization process. Finally, the strategies are investigated for commitment relaxation ERCs.

Offline learning-based strategy. To learn offline when to switch between static

constraint-handling strategies during an optimization run, we use the tabular reinforcement learning (RL) algorithm Sarsa(λ) (Rummery and Niranjan 1994; Sutton and Barto 1998). The general goal of an RL algorithm is to learn some optimal policy π, a mapping from an environmental state s ∈ S to an action a ∈ A(s), so as to maximize some reward R. Sarsa(λ) achieves this goal by estimating a so-called action-value function Q(s, a), which represents the expected reward received after taking action a in state s and following some policy π thereafter.
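A tabular Sarsa(λ) learner with replacing eligibility traces can be written in a few lines. The sketch below is the textbook form of the algorithm rather than the authors' agent. The default values follow the settings reported for RL-EA in the footnote, read with the conventional symbol names (learning rate alpha, discount factor gamma, trace decay lam, exploration rate epsilon), and the optimistic initial value q_init = 1.0 is an assumption.

```python
import random
from collections import defaultdict


class SarsaLambda:
    """Tabular Sarsa(lambda) with replacing traces and epsilon-greedy actions."""

    def __init__(self, n_actions, alpha=0.1, gamma=1.0, lam=1.0,
                 epsilon=0.1, q_init=1.0, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [q_init] * n_actions)  # action values
        self.e = defaultdict(lambda: [0.0] * n_actions)     # eligibility traces

    def start_episode(self):
        """Reset all traces to zero at the start of a run."""
        self.e.clear()

    def act(self, state):
        """Pick a random action with probability epsilon, else a greedy one."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def update(self, s, a, reward, s_next=None, a_next=None):
        """One Sarsa(lambda) backup; s_next=None marks a terminal transition."""
        target = reward
        if s_next is not None:
            target += self.gamma * self.q[s_next][a_next]
        delta = target - self.q[s][a]
        self.e[s][a] = 1.0  # replacing trace
        for state, traces in self.e.items():
            for action, trace in enumerate(traces):
                if trace:
                    self.q[state][action] += self.alpha * delta * trace
                    traces[action] = self.gamma * self.lam * trace


if __name__ == "__main__":
    agent = SarsaLambda(n_actions=5)
    agent.start_episode()
    action = agent.act(state=0)
    agent.update(0, action, reward=0.8)  # terminal reward at the end of a run
    print(agent.q[0])
```

In the setting described here a reward would only be available at the end of an EA run, so with a trace decay of 1 the final reward is propagated back to all state-action pairs visited during that run.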

To employ an RL algorithm we need to define a state s, the possible actions a,

and the reward R. Here, we characterize a state by the current population average

fitness and the current time step; we assume that fitness values lie in the interval

[0, 1], and that the optimization time is limited by T . To keep the number of total

states manageable, we bin both variables into 5 equally-sized intervals, resulting

in 25 states in total. In each state, we provide the agent with 5 actions, which are

the static constraint-handling strategies. The reward shall be the average fitness of

the final population to reflect our aim of performing well at the end of the search.

Alternatively, the reward may be the best solution fitness found.
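The state encoding described above amounts to two binning operations. A minimal sketch follows, assuming that values on a bin boundary fall into the upper bin and that the maximum value is placed in the last bin. The function names and the order of the action labels are illustrative.

```python
ACTIONS = ("forcing", "regenerating", "subpopulation", "waiting", "penalizing")


def bin_index(value, low, high, n_bins=5):
    """Index of the equally sized bin of [low, high] that contains value."""
    if high <= low:
        raise ValueError("high must exceed low")
    fraction = (value - low) / (high - low)
    return min(n_bins - 1, max(0, int(fraction * n_bins)))


def state_id(avg_fitness, t, t_max, n_bins=5):
    """Map (population average fitness, time step) to one of n_bins**2 states."""
    fitness_bin = bin_index(avg_fitness, 0.0, 1.0, n_bins)
    time_bin = bin_index(t, 0, t_max, n_bins)
    return fitness_bin * n_bins + time_bin


if __name__ == "__main__":
    print(state_id(0.5, 1000, 2000), len(ACTIONS))
```

With 5 bins per variable this yields the 25 states mentioned in the text, each paired with the 5 static strategies as actions.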

We want to point out that some aspects need further consideration when

applying RL to dynamic constraints, such as ERCs. First, the number and set of

states visited during the optimization depend on how often and when non-evaluable

solutions are encountered during the search, and thus may vary with each optimization run. Secondly, if a non-evaluable solution is encountered, then the first action (i.e., constraint-handling strategy) selected in a particular state is applied to all non-evaluable solutions encountered in that state.

Online learning-based strategy. To learn online when to switch between static

constraint-handling strategies, we consider the learning problem as a multi-armed

bandit (MAB) problem with the static strategies serving as independent arms. To

tackle the problem we employ an adaptive operator selection method known as the dynamic multi-armed bandit (D-MAB) algorithm (Hartland et al. 2006, 2007; Costa et al. 2008). The goal of the algorithm is to maximize the sum of rewards received over a number of actions (or arms played) taken. D-MAB is dynamic in the sense that it monitors the sequence of rewards obtained using statistical testing, and then restarts the MAB on detecting a significant deviation in the sequence.6

Unlike the RL agent, a MAB algorithm requires that the play of an arm is followed

by a subsequent reward. We provide a reward immediately after the play of an arm,

and it is the raw fitness of the resulting solution, which is a common credit assignment

scheme.
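D-MAB is commonly described as a UCB-style bandit combined with a Page-Hinkley change test. The sketch below follows that description in a textbook form and may differ in detail from the variant used in the cited papers: it runs a two-sided Page-Hinkley test per arm and restarts all statistics when any test fires. The default values correspond to the threshold, tolerance, and scaling factor reported in the footnote, and the class and method names are hypothetical.

```python
import math


class PageHinkley:
    """Two-sided Page-Hinkley test on a stream of rewards."""

    def __init__(self, delta, threshold):
        self.delta, self.threshold = delta, threshold
        self.reset()

    def reset(self):
        self.n, self.mean = 0, 0.0
        self.m_dec = self.max_dec = 0.0  # detects a drop in the mean
        self.m_inc = self.min_inc = 0.0  # detects a rise in the mean

    def update(self, reward):
        """Feed one reward; return True if a change is detected."""
        self.n += 1
        self.mean += (reward - self.mean) / self.n
        deviation = reward - self.mean
        self.m_dec += deviation + self.delta
        self.max_dec = max(self.max_dec, self.m_dec)
        self.m_inc += deviation - self.delta
        self.min_inc = min(self.min_inc, self.m_inc)
        return (self.max_dec - self.m_dec > self.threshold
                or self.m_inc - self.min_inc > self.threshold)


class DMAB:
    """UCB1 arm selection with a restart whenever a change is detected."""

    def __init__(self, n_arms, c=1.0, delta=0.01, threshold=0.1):
        self.n_arms, self.c = n_arms, c
        self.tests = [PageHinkley(delta, threshold) for _ in range(n_arms)]
        self.restart()

    def restart(self):
        self.counts = [0] * self.n_arms
        self.means = [0.0] * self.n_arms
        for test in self.tests:
            test.reset()

    def select(self):
        """Play each arm once, then maximize mean plus exploration bonus."""
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        return max(range(self.n_arms),
                   key=lambda a: self.means[a]
                   + self.c * math.sqrt(2.0 * math.log(total) / self.counts[a]))

    def update(self, arm, reward):
        """Record a reward; return True if the bandit was restarted."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
        if self.tests[arm].update(reward):
            self.restart()
            return True
        return False
```

In the ERC setting each arm would correspond to one static constraint-handling strategy, and the reward passed to update would be the raw fitness of the solution that the chosen strategy produced.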

Note, some alternative common credit assignment schemes are not directly

applicable in the presence of ERCs, such as ones that assign a credit based on the

fitness improvement of an offspring compared to its parent after applying a variation

operator to it. With ERCs, the parent would be the individual that is to be repaired and

the offspring the repaired individual after applying a constraint-handling strategy to

the parent. As we do not know the fitness of the parent because it is non-evaluable,

we cannot quantify by how much its fitness differs from the one of the repaired

individual.

Experimental setup. To evaluate the learning-based strategies for commitment

relaxation ERCs we use the same experimental setup as used in the previous section

(see Table 4.1) with the difference that the EA is equipped with an elitist reproduction

scheme, i.e., = 1. The reason for using a modified setup is that we specifically

tuned the EA to perform well on the test problems considered in this section.

For the RL-based strategy, denoted here by RL-EA, we use a training and testing

scheme (similar to Pettinger and Everson (2003)). In the training phase (consisting

of 5,000 EA runs), the RL agent estimates the action-value function Q(s, a), while

in the testing phase (consisting of 100 EA runs), the Q-function is frozen and the

greedy actions a are always selected in each state.7

Experimental results. Suppose we are faced with a closed-loop scenario that

is subject to the following two, a priori known, commitment relaxation ERCs:

commRelaxERC(0, 2000, 20, H = (10101***...)) and commRelaxERC(0, 2000, 20, H = (*...**101)). That is, one ERC constrains the first 5 solution bits, while the other constrains the last 3 bits. These two ERCs are inspired by change-over restrictions of instrument parameters encountered in the closed-loop work by O'Hagan et al. (2005, 2007).

6 For D-MAB we set the threshold parameter to λ_PH = 0.1, the tolerance parameter to δ = 0.01, and the scaling factor to C = 1.

7 RL-EA also employed the ε-greedy action selection method (ε = 0.1), optimistic initial values for the action-value estimates, and replacing eligibility traces with the eligibility trace being set to 0 at the beginning of each EA run. The decay factor was set to λ = 1, the discount factor to γ = 1, and the learning rate to α = 0.1.

Fig. 4.15 A plot showing the greedy actions a learnt by the RL agent for each state s. Training was done across 5,000 different NK landscapes with N = 30 and K = 2. (For unvisited states, a default strategy would need to be selected)

It is unknown whether the schemata associated with the two ERCs represent good

or poor instrument setups. As in O'Hagan et al. (2005, 2007) we assume that the fitness landscape to be optimized is subject to epistasis. Please refer to O'Hagan

et al. (2005, 2007), Allmendinger and Knowles (2011), Allmendinger (2012) for a

detailed description of the closed-loop problem and the ERCs.

We use NK landscapes (Kauffman 1989) to investigate the impact of the two ERCs

as a function of different levels of epistasis. Prior to applying RL-EA online we train

the RL agent offline on 5,000 different NK landscapes with N = 30 and K = 2,

which represent problems with low epistasis. Figure 4.15 shows the greedy actions

(optimal static strategies) a learnt by the agent for each state s during the training

phase. Clear patterns can be observed from the plot: the agent learned to use mainly

waiting at the beginning of the optimization process (to avoid introducing a search

bias early on), penalizing in the middle part of the optimization, and, depending on

the population average fitness, either forcing, waiting, or the subpopulation strategy,

in the final part of the optimization. Other policies, such as using only a repairing

strategy at the beginning of the optimization, were not learnt by the agent as they are

associated with the risk of converging to a homogeneous population state from which it is difficult to escape if needed (e.g., if schemata represent poor genetic material).

Figure 4.16 compares how the policy learned by the RL agent fares against the online-learning approach, D-MAB, and the static strategies themselves for NK landscapes with N = 30 and K = {3, 4}; using different problems for training and testing allows us to assess the robustness of the policy learned. We can see from the plots that although RL-EA performs poorly at the beginning of the search, at time step t ≈ 800 the performance picks up due to a change in the static strategy employed, allowing RL-EA to be the best performing strategy at the end of the search. D-MAB is not able to perform as well as RL-EA because it selects the currently most useful static strategy (which is typically a repairing strategy) without accounting for future consequences this might have. On the other hand, RL-EA is tuned here to optimize the final performance only, allowing it to adjust to the problem at hand. For instance, if we were to shorten the optimization time T, then the RL agent would learn a different policy, while D-MAB would behave the same.

Fig. 4.16 Plots showing the population average fitness (we do not show the standard error as it was negligible) obtained by the different constraint-handling strategies on NK landscapes with N = 30 and K = 3 (left) and K = 4 (right) as a function of the time counter t; results are averaged over 100 independent runs using a different randomly generated NK problem instance for each run. All instances were subject to the commitment relaxation ERCs commRelaxERC(0, 2000, 20, H = (10101***...)) and commRelaxERC(0, 2000, 20, H = (*...**101)). The results of Unconstrained EA were obtained by running the EA on the same problem instances but without the ERCs. According to the Kruskal-Wallis test (significance level of 5 %), the final population average fitness obtained by RL-EA is significantly better than the one obtained with the second best strategy, waiting, for both problems

Overall, the strong performance of the RL-EA is encouraging, but we want to

mention that in order to achieve that performance, some tuning of the agent may be

required. For a more in-depth discussion on this topic and an experimental analysis

of alternative agent settings please refer to Allmendinger and Knowles (2011).

In this section, our focus shifts to online resource-purchasing strategies to cope with

commitment composite ERCs (see Sect. 4.3.3 for a description of the ERC). We give a

brief description of the strategies only, and refer the interested reader to Allmendinger

and Knowles (2010) for details.

To deal with this ERC a strategy needs to address three aspects:

1. Decide when and which composite (defined by a high-level constraint schema

H# ) is ordered thereby accounting for a lag of TL time steps for the composite to

arrive, and a budget of C limiting the usage of the composites.

2. Determine the storage cell into which a composite is stored once it arrives. As the number of composites that can be maintained simultaneously is limited by the number of storage cells #SC, this may also mean deciding which of the storage cells is to be emptied, i.e., which composite is removed, to make space for a new one. A composite must also be removed from a storage cell after a shelf life of SL time steps and/or after it has been reused RN times.

3. Deal with non-evaluable solutions, e.g., by selecting an alternative composite

from the storage.

We summarize and evaluate three resource-purchasing strategies (for use in a generational EA) that address the above-mentioned aspects in different ways: a just-in-time

strategy, a just-in-time strategy with repairing, and a sliding window strategy.

Just-in-time (JIT) strategy. This strategy avoids repairing by first scheduling the evaluation of solutions intelligently and then making purchase orders so that composites arrive just in time for the scheduled experiment time. Scheduling involves arranging the solutions of a population into contiguous groups based on the composites they require so as to maximize the availability of resources. For example, if a, b, c, and d represent different composites required by solutions, then a potential grouping would be bbbaddcc . . . . If composites are available in the storage cells because we have ordered them previously (we call such composites old composites), then the scheduler aims at using these up first so as to reduce the number of purchase orders made. For example, suppose the composites aadcac are required, and composite c is available in one of the storage cells and has 3 uses and 5 time steps of its shelf life remaining. Then, by evaluating the solutions requiring c first, the evaluation schedule ccdaaa will save us a purchase order since only two c composites are needed. At any given time, JIT (and JIT with repairing) ensures that only non-identical composites are kept in storage.
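The grouping idea can be illustrated with a small Python sketch. It is a simplification under stated assumptions: shelf life and time lag are ignored, the function names `schedule` and `purchase_orders` are hypothetical, and the order among groups that are not in storage is arbitrary (here, order of first appearance), so the result may differ from the schedule ccdaaa in the text while being equally valid.

```python
from collections import Counter
from typing import Dict, List


def schedule(required: List[str], stored_uses: Dict[str, int]) -> List[str]:
    """Group requests by composite; composites already in storage go first."""
    counts = Counter(required)
    # Index of the first request for each composite (later entries are overwritten
    # by earlier ones because we iterate in reverse).
    first_seen = {c: i for i, c in reversed(list(enumerate(required)))}
    order = sorted(counts, key=lambda c: (c not in stored_uses, first_seen[c]))
    return [c for c in order for _ in range(counts[c])]


def purchase_orders(required: List[str], stored_uses: Dict[str, int], rn: int) -> Dict[str, int]:
    """Number of new composites to order per type, given RN uses per composite."""
    orders: Dict[str, int] = {}
    for comp, need in Counter(required).items():
        uncovered = max(0, need - stored_uses.get(comp, 0))
        orders[comp] = -(-uncovered // rn)  # ceiling division
    return orders
```

For the example in the text (requests aadcac, composite c in storage with 3 uses left), `schedule` places both c requests first and `purchase_orders` reports that no new c composite has to be ordered.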

Once an ordered composite arrives, it is stored in an empty storage cell or, if no

cell is empty, replaces an old composite that can be used in the smallest number of

evaluations within the subsequent generation. That is, in the latter case we account for the remaining reuses and shelf lives of old composites.

Just-in-time strategy with repairing (JITR). Avoiding repairing as done by JIT may result in a waste of composite reuses as well as optimization time spent waiting for composite orders to arrive. For example, if each solution of a population requires a different composite, then up to (RN − 1) reuses might be wasted. The JIT with repairing (JITR) strategy aims at reducing wastage by repairing solutions such that they use a composite that is close to the one required (while maintaining the remaining mechanisms of JIT). Solutions to be repaired are identified by first clustering their composites using k-medoids (Kaufman and Rousseeuw 1990), and then trying to find an assignment of solutions to clusters that minimizes the total Hamming distance of all repairs. The medoid composite of a cluster is the composite that would be used to repair (using the static constraint-handling strategy, forcing) all solutions in that cluster that require a different composite. To control the number of repairs, we perform several rounds of clustering and solution-to-cluster assignments for different values of k. The cluster configuration with the smallest weighted sum score of the total Hamming distance of all repairs and the number of clusters k is the one according to which we repair. Annealing the weights of this sum allows us to keep the number of repairs low at the beginning of the search (i.e., strive for cluster configurations with many clusters and small total Hamming distances) and increase it toward the end (i.e., strive for cluster configurations with few clusters and large total Hamming distances), which is a good strategy as we have seen in the previous section.
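The selection of a cluster configuration by a weighted sum score can be sketched as follows. This is a rough illustration, not the published procedure: an exhaustive search over medoid sets stands in for k-medoids (feasible only for a handful of composites), each composite is simply assigned to its nearest medoid, and the single parameter `weight` as well as all function names are assumptions made for this example.

```python
from itertools import combinations
from typing import Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def repair_cost(composites: Sequence[str], medoids: Sequence[str]) -> int:
    """Total Hamming distance if every composite is forced to its nearest medoid."""
    return sum(min(hamming(c, m) for m in medoids) for c in composites)


def best_medoids(composites: Sequence[str], k: int) -> Tuple[int, Tuple[str, ...]]:
    """Exhaustive search over medoid sets of size k (a stand-in for k-medoids)."""
    unique = sorted(set(composites))
    return min((repair_cost(composites, m), m) for m in combinations(unique, k))


def choose_configuration(composites: Sequence[str], weight: float):
    """Pick k minimising weight * total repair distance + (1 - weight) * k."""
    best = None
    for k in range(1, len(set(composites)) + 1):
        cost, medoids = best_medoids(composites, k)
        score = weight * cost + (1.0 - weight) * k
        if best is None or score < best[0]:
            best = (score, k, medoids)
    return best
```

With a large `weight` on the repair distance the sketch prefers many clusters (few repairs), and with a small `weight` it prefers few clusters (many repairs), so lowering `weight` over the course of a run mimics the annealing described above.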

Sliding window (SW) strategy. Unlike JIT and JITR, the sliding window (SW) strategy submits solutions for evaluation in the order they are generated by the EA, and

non-evaluable solutions are always repaired. To facilitate this process, the strategy

aims to maintain the most useful composites in storage by (i) ordering composites

pre-emptively every min(RN, SL) time steps so as to avoid empty storage cells and

(ii) ensuring that storage cells are filled with composites that were recently requested

by the optimizer.

To achieve the second aspect we maintain a sliding window defined here as a set

(t) containing composites that were requested most recently but were unavailable

at the time of the request. Consequently, whenever new composites are needed we

order the ones from (t) that have been added to this set most recently. To avoid

ordering the same composites, which results in a loss of population diversity, we apply mutation to the composites from (t) before ordering them (for simplicity, we use a fixed per-bit mutation rate of 0.05).

We replace all composites in the storage cells upon the arrival of new composites.

In case a non-evaluable solution is encountered, we repair it by forcing it to use

a composite from the storage cell that has the smallest Hamming distance to the

actually required composite.
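Two ingredients of the sliding window strategy, repairing toward the nearest stored composite and mutating window entries before ordering, can be sketched as below. The sketch assumes composites are bit strings and the window is a list ordered from oldest to newest; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_stored(required: str, storage: Sequence[str]) -> str:
    """Composite in storage with the smallest Hamming distance to the required one."""
    return min(storage, key=lambda s: hamming(required, s))


def mutate(bits: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b for b in bits
    )


def composites_to_order(window: List[str], n: int, rate: float, rng: random.Random) -> List[str]:
    """Take the n most recently added window entries and mutate them before ordering."""
    return [mutate(c, rate, rng) for c in window[-n:]]
```

A call such as `composites_to_order(window, n, 0.05, random.Random(1))` would correspond to the fixed per-bit mutation rate mentioned above; the seed is arbitrary.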

Experimental setup. We augment the same elitist generational EA as used in Sect. 4.5 with the three online resource-purchasing strategies. As the test problem we consider a MAX-SAT (Zhang 2001) problem instance with l = 50 binary variables (see footnote 8). We choose the order-defining bits of the high-level constraint schema H# at random at each run but, of course, use the same schemata across the strategies analyzed.
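As footnote 8 states, fitness on this test problem is the proportion of satisfied clauses. A minimal sketch of that calculation is shown here, assuming clauses are given as lists of signed integers in the DIMACS convention (a positive literal k means variable k, a negative literal means its negation); the function name is hypothetical and no file parsing is attempted.

```python
from typing import List, Sequence


def maxsat_fitness(assignment: Sequence[bool], clauses: List[List[int]]) -> float:
    """Proportion of clauses satisfied by a truth assignment.

    assignment[k - 1] holds the value of variable k.
    """
    satisfied = sum(
        any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
        for clause in clauses
    )
    return satisfied / len(clauses)
```

For the instance described in the footnote, `assignment` would have 50 entries and `clauses` 218 entries of three literals each.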

Experimental results. First, we want to investigate how the key parameters of a commitment composite ERC affect the performance of the three conceptually different

online resource-purchasing strategies. With SW the performance depends crucially

on the number of storage cells #SC and the reuse number RN, as can also be observed

from the left plot of Fig. 4.17; SW performs better as the number of storage cells

increases and/or the reuse number decreases. The reason for this pattern is that, with

Footnote 8: The instance considered is a uniform random 3-SAT problem and can be downloaded online at http://people.cs.ubc.ca/~hoos/SATLIB/benchm.html; the name of the instance is uf50-218/uf50-01.cnf. The instance consists of 218 clauses and is satisfiable. We treat this 3-SAT instance as a MAX-SAT optimization problem, with fitness calculated as the proportion of satisfied clauses.

[Figure 4.17: two heat maps of the probability of achieving the population average fitness of an ERC-free search. Left panel (sliding window): number of storage cells #SC versus reuse number RN. Right panel: time lag TL versus reuse number RN.]

Fig. 4.17 Plots showing the probability of SW (left) and JIT (right) achieving the population average fitness of our base algorithm obtained in an ERC-free environment, given a budget and time limit of C = T = 1,500. For SW this probability is shown as a function of #SC and RN for the ERC commCompERC(o(H#) = 30, #SC, TL = 10, RN, SL = RN), and for JIT it is shown as a function of TL and RN for the ERC commCompERC(o(H#) = 10, #SC = 10, TL, RN, SL = RN); cost parameters were set to c_order = 0, c_time_step = 1, and C = 1,500

SW, more storage cells means that the probability of having a required composite

available increases, which in turn reduces the number of repairs. On the other hand, a

smaller reuse number (or shorter shelf life SL) shortens the time gap between asking

for a composite, i.e., adding it to the sliding window, and having it available in a

storage cell.

The performance of a just-in-time strategy, such as JIT and JITR, depends largely

on the time it takes for a resource to arrive once ordered. Consequently, we observe

from the right plot of Fig. 4.17 that the performance of JIT (also for JITR) improves

with shorter time lags TL. An increase in the reuse number RN (or shelf life SL)

yields a slight performance improvement too. The reason for this is that composites

can be kept for longer in the storage cells and thus allow for a more efficient usage of

old composites. A similar effect can be achieved by increasing the number of storage cells #SC (results not shown here).

While JIT and JITR perform similarly for large budgets, there are differences for scenarios where the budget is a limiting factor, as can be seen from the right plot of Fig. 4.18. For small budgets, in the range 0 < c ≤ 600, 0 ≤ c_time_step ≤ 0.5, JITR is able to outperform JIT as repairing allows the evaluation of more solutions while JIT would have to wait for suitable composites to arrive. The weak performance of JIT for small budgets is also apparent when comparing it to SW (left plot of Fig. 4.18). For large budgets, c > 1,200, JIT is able to match and sometimes even outperform JITR and SW as it does not introduce any search bias coming from repairing.

In the previous experiment, the number of storage cells was relatively low, which

is beneficial for SW. An increase in #SC means that more composites are regularly

ordered to fill all the storage cells. This approach is expensive and dampens the

performance of SW when compared to JIT (and JITR) as can be observed from

Fig. 4.19.

[Figure 4.18: two heat maps. Horizontal axis: cost counter c. Vertical axis: cost per time step c_time_step (logarithmic scale).]

Fig. 4.18 Plots showing the ratio $P(f(x) > f_{JIT})/P(f(x) > f_{SW})$ (left) and $P(f(x) > f_{JITR})/P(f(x) > f_{JIT})$ (right) as a function of c and c_time_step for the ERC commCompERC(o(H#) = 10, #SC = 5, TL = 5, RN = 30, SL = 30) and c_order = 1. Here, x is a random variable that represents solutions drawn uniformly at random from the search space and $f_{\pi}$ the population average fitness obtained with policy $\pi$. If $P(f(x) > f_{\pi_1})/P(f(x) > f_{\pi_2}) > 1$, then strategy $\pi_2$ is able to achieve a higher average best solution fitness than strategy $\pi_1$ and a greater advantage of $\pi_2$ is indicated by a darker shading in the heat maps; similarly, if $P(f(x) > f_{\pi_1})/P(f(x) > f_{\pi_2}) < 1$, then $\pi_1$ is better than $\pi_2$ and a lighter shading indicates a greater advantage of $\pi_1$
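The performance metric used in this caption, the probability that a uniformly random solution beats the fitness level reached by a strategy, can be estimated by simple Monte Carlo sampling. The sketch below is only an illustration under assumptions: solutions are bit vectors, fitness is maximized, and the names `exceedance_probability` and `performance_ratio` are hypothetical.

```python
import random
from typing import Callable, List


def exceedance_probability(
    fitness: Callable[[List[bool]], float],
    n_bits: int,
    threshold: float,
    samples: int,
    rng: random.Random,
) -> float:
    """Estimate P(f(x) > threshold) for x drawn uniformly from {0, 1}^n_bits."""
    hits = 0
    for _ in range(samples):
        x = [rng.random() < 0.5 for _ in range(n_bits)]
        if fitness(x) > threshold:
            hits += 1
    return hits / samples


def performance_ratio(p_first: float, p_second: float) -> float:
    """Ratio of two exceedance probabilities (p_second must be non-zero).

    A value above 1 means random solutions beat the first strategy's fitness
    level more often, i.e., the second strategy reached the better level.
    """
    return p_first / p_second
```

The smaller the estimated probability for a strategy, the harder its fitness level is to beat by chance, which is why the ratio reads in favor of the strategy in the denominator when it exceeds 1.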

4.8 Conclusion

In this chapter we have considered a new type of (dynamic or temporary) constraint

that differs in several aspects from the traditional hard and soft constraints. Hard constraints define the feasible region in the search space, and soft constraints express objectives or preferences on solutions, while the constraints we discussed here specify

the set of solutions in the search space that can be evaluated at any moment in time.

That is, a solution that violates one of these constraints cannot be evaluated at the

moment although it may be a feasible solution to the problem. This constraint type

is called ephemeral resource constraint (or ERC) and is commonly encountered

in closed-loop optimization problems, where it models limitations on the resources

needed to construct and/or evaluate solutions.

Fig. 4.19 Plot showing the ratio P(f(x) > f_JIT)/P(f(x) > f_SW) as a function of the number of storage cells #SC and the order of the high-level constraint schema H#, o(H#), for the ERC commCompERC(o(H#), #SC, TL = 25, RN = 25, SL = 25), c_order = c_time_step = 1, C = 1,500. Please refer to the caption of Fig. 4.18 for an explanation of the performance metric

We pursued three goals in this chapter. First, we have summarized the framework

and terminology for describing ERC problems, and defined three ERC types that arise

commonly in practical applications including (i) absence of resources at regular time

intervals (periodic ERCs), (ii) temporary commitment to a certain resource triggered

on using that resource (commitment relaxation ERCs), and (iii) an ERC where costly

resources need to be purchased in advance, kept in capacity-limited storage, and

used up within a certain number of experiments or a fixed time frame (commitment

composite ERCs).

Secondly, we have extended our previous work with a theoretical study focused on

understanding the fundamental effects of ERCs on simple evolutionary algorithms

(EAs). Using the concept of Markov chains, the study concluded that (i) an order

relation-based selection operator, such as tournament selection, is more robust to

simple ERCs than a fitness proportionate-based selection operator, and (ii) while an

EA with a non-elitist generational reproduction scheme converges more quickly to

some optimal population state than with a non-elitist steady state scheme when the

ERC is active, the opposite is the case when the ERC is inactive. This result implies

that ERCs should be accounted for when tuning EAs for ERCOPs.

Third, we have summarized and evaluated empirically several of the constraint-handling methods we have proposed for handling ERCs including static and learning-based strategies (Sects. 4.5 and 4.6), as well as resource-purchasing strategies for

dealing with commitment composite ERCs (Sect. 4.7). Generally, the empirical study

revealed that ERCs affect the performance of an optimizer and that different strategies should be favored as a function of the ERC and its parameters. Moreover, we

have demonstrated here and in more detail in our previous work (Knowles 2009;

Allmendinger and Knowles 2010, 2011, 2013) that the effect of a particular ERC

is similar across different problem types, meaning that knowing about the ERC is

sufficient to select a constraint-handling strategy. Overall, we can therefore say that

if the ERCs are known in advance, then a promising strategy is one that learns offline

how to deal best with the ERCs during the optimization. As an example, in this

chapter we have seen that good results can be achieved with a reinforcement learning approach that learns offline when to switch between different static strategies

during the optimization.

Although we have established some of the building blocks for dealing with ERCs,

there remains much else to learn about the effects of ERCs on search and how to

handle them. We now discuss several directions for future research toward achieving

this goal.

Gaining a more robust understanding of the search strategies developed. To

gain a more robust understanding of the behavior of the search strategies developed, it

would be beneficial to consider further and perhaps more realistic fitness landscapes


(featuring also real or mixed integer variables) than the ones we considered so far. Of

course, it would be ideal to validate the search strategies on real-world closed-loop

problems featuring real resource constraints. However, this approach is generally not

realistic due to time and/or budgetary requirements. The next best thing we can do is

to simulate a fitness landscape based on data obtained from real-world experiments.

This is the approach we have taken in Allmendinger and Knowles (2011), and more

studies of this kind are needed.

Further theoretical analysis of resourcing issues. In Sect. 4.4 we have used Markov

chains to analyze theoretically the effect of a particular ERC type on simple EAs.

Although our analysis used a simplified optimization environment (two solution types

only), valuable observations were made with respect to the applicability of different

selection and reproduction schemes. We also gained some understanding about the

impact of ERCs on evolutionary search, which, ultimately, may help us in the design

of effective and efficient search strategies for closed-loop optimization. However,

our theoretical results were limited in the sense that we did not derive mathematical equations relating, for instance, ERC configurations to optimal EA parameter

settings. It remains to be seen whether it is possible to derive such expressions, and

how applicable they would be in practice. A number of recent advances in EA theory

might present the possibility of understanding ERCs more deeply, including drift

analysis (Auger and Doerr 2011) and the fitness level method (Chen et al. 2009;

Lehre 2011).

Understanding the effects of non-homogeneous experimental costs in closed-loop optimization. So far, we have made the assumption that all solution evaluations

take equal time or resources. This need not be the case. For instance, when dealing

with commitment composite ERCs, it is a very realistic scenario that the composites

to be ordered vary in their prices and delivery periods. Under a limited budget, this

scenario might cause an optimizer not only to follow fitness gradients but also to

account for variable experimental costs. Hence, further work should investigate how

to trade off these two aspects effectively. For inspiration, we may look at strategies

employed in the Robot Scientist study (King et al. 2004), where this scenario has

been encountered within an inference problem rather than an optimization problem.

Broadening the application of machine learning and surrogate modeling techniques in closed-loop optimization. We have shown (in Sect. 4.6) that evolutionary

search augmented with machine learning techniques, such as reinforcement learning

(RL), can be a powerful optimization tool to cope with ERCs. To increase the applicability of learning-based optimizers to different types of optimization problems, one

could also try combining offline learning with online learning. For instance, RL

can be used to learn offline a policy until some distant point in time, and this policy can then be refined or slightly modified online using the anticipation approach of Bosman (2005). Another avenue worth pursuing is to extend an optimizer with surrogate modeling techniques (Jin 2011) in order to help cope with ERCs. In the simplest case, surrogate modeling would be used to approximate the objective values of solutions that cannot be evaluated due to a lack of resources. More sophisticated

approaches might use surrogate modeling to scan the search space for promising

regions from which solutions are then created. If the active ERCs are known, or can

be well predicted, then scanning can be used to avoid the non-evaluable parts of the

search space, while still concentrating the search on the most promising areas in

terms of fitness.

References

Allmendinger R (2012) Tuning evolutionary search for closed-loop optimization. PhD thesis, Department of Computer Science, University of Manchester, UK

Allmendinger R, Knowles J (2010) On-line purchasing strategies for an evolutionary algorithm performing resource-constrained optimization. In: Proceedings of parallel problem solving from nature, pp 161–170

Allmendinger R, Knowles J (2011) Policy learning in resource-constrained optimization. In: Proceedings of the genetic and evolutionary computation conference, pp 1971–1978

Allmendinger R, Knowles J (2013) On handling ephemeral resource constraints in evolutionary search. Evol Comput 21(3):497–531

Auger A, Doerr B (2011) Theory of randomized search heuristics. World Scientific, Singapore

Bäck T, Knowles J, Shir OM (2010) Experimental optimization by evolutionary algorithms. In: Proceedings of the genetic and evolutionary computation conference (companion), pp 2897–2916

Bedau MA (2010) Coping with complexity: machine learning optimization of highly synergistic biological and biochemical systems. In: Keynote talk at the international conference on genetic and evolutionary computation

Borodin A, El-Yaniv R (1998) Online computation and competitive analysis. Cambridge University Press, Cambridge

Bosman PAN (2005) Learning, anticipation and time-deception in evolutionary online dynamic optimization. In: Proceedings of genetic and evolutionary computation conference, pp 39–47

Bosman PAN, Poutré HL (2007) Learning and anticipation in online dynamic optimization with evolutionary algorithms: the stochastic case. In: Proceedings of genetic and evolutionary computation conference, pp 1165–1172

Branke J (2001) Evolutionary optimization in dynamic environments. Kluwer Academic Publishers, Dordrecht

Caschera F, Gazzola G, Bedau MA, Moreno CB, Buchanan A, Cawse J, Packard N, Hanczyc MM (2010) Automated discovery of novel drug formulations using predictive iterated high throughput experimentation. PLoS ONE 5(1):e8546

Chen T, He J, Sun G, Chen G, Yao X (2009) A new approach for analyzing average time complexity of population-based evolutionary algorithms on unimodal problems. IEEE Trans Syst Man Cybern B 39(5):1092–1106

Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287

Costa LD, Fialho A, Schoenauer M, Sebag M (2008) Adaptive operator selection with dynamic multi-armed bandits. In: Proceedings of genetic and evolutionary computation conference, pp 913–920

Davis TE, Principe JC (1993) A Markov chain framework for the simple genetic algorithm. Evol Comput 1(3):269–288

Doob JL (1953) Stochastic processes. Wiley, New York

Finkel DE, Kelley CT (2009) Convergence analysis of sampling methods for perturbed Lipschitz functions. Pac J Optim 5:339–350

Goldberg DE, Segrest P (1987) Finite Markov chain analysis of genetic algorithms. In: Proceedings of the international conference on genetic algorithms, pp 1–8

Hartland C, Gelly S, Baskiotis N, Teytaud O, Sebag M (2006) Multi-armed bandits, dynamic environments and meta-bandits. In: NIPS workshop online trading of exploration and exploitation

Hartland C, Baskiotis N, Gelly S, Sebag M, Teytaud O (2007) Change point detection and meta-bandits for online learning in dynamic environments. In: CAp, pp 237–250

He J, Yao X (2002) From an individual to a population: an analysis of the first hitting time of population-based evolutionary algorithms. IEEE Trans Evol Comput 6(5):495–511

Herdy M (1997) Evolutionary optimization based on subjective selection-evolving blends of coffee. In: European congress on intelligent techniques and soft computing, pp 640–644

Holland JH (1975) Adaptation in natural and artificial systems. MIT Press, Boston

Horn J (1993) Finite Markov chain analysis of genetic algorithms with niching. In: Proceedings of the international conference on genetic algorithms, pp 110–117

Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol Comput 1(2):61–70

Judson RS, Rabitz H (1992) Teaching lasers to control molecules. Phys Rev Lett 68(10):1500–1503

Kauffman S (1989) Adaptation on rugged fitness landscapes. In: Lecture notes in the sciences of complexity, pp 527–618

Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York

King RD, Whelan KE, Jones FM, Reiser PGK, Bryant CH, Muggleton SH, Kell DB, Oliver SG (2004) Functional genomic hypothesis generation and experimentation by a robot scientist. Nature 427:247–252

Klockgether J, Schwefel H-P (1970) Two-phase nozzle and hollow core jet experiments. In: Engineering aspects of magnetohydrodynamics, pp 141–148

Knowles J (2009) Closed-loop evolutionary multiobjective optimization. IEEE Comput Intell Mag 4(3):77–91

Lehre PK (2011) Fitness-levels for non-elitist populations. In: Proceedings of the conference on genetic and evolutionary computation, pp 2075–2082

Liepins GE, Potter WD (1991) A genetic algorithm approach to multiple-fault diagnosis. In: Handbook of genetic algorithms, pp 237–250

Mahfoud SW (1991) Finite Markov chain models of an alternative selection strategy for the genetic algorithm. Complex Syst 7:155–170

Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32

Nakama T (2008) Theoretical analysis of genetic algorithms in noisy environments based on a Markov model. In: Proceedings of the genetic and evolutionary computation conference, pp 1001–1008

Nguyen TT (2010) Continuous dynamic optimisation using evolutionary algorithms. PhD thesis, University of Birmingham

Nix A, Vose MD (1992) Modeling genetic algorithms with Markov chains. Ann Math Artif Intell 5:79–88

Nocedal J, Wright SJ (1999) Numerical optimization. Springer, New York

Norris JR (1998) Markov chains (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge University Press, Cambridge

O'Hagan S, Dunn WB, Brown M, Knowles J, Kell DB (2005) Closed-loop, multiobjective optimization of analytical instrumentation: gas chromatography/time-of-flight mass spectrometry of the metabolomes of human serum and of yeast fermentations. Anal Chem 77(1):290–303

O'Hagan S, Dunn WB, Knowles J, Broadhurst D, Williams R, Ashworth JJ, Cameron M, Kell DB (2007) Closed-loop, multiobjective optimization of two-dimensional gas chromatography/mass spectrometry for serum metabolomics. Anal Chem 79(2):464–476

Pettinger JE, Everson RM (2003) Controlling genetic algorithms with reinforcement learning. Technical report, The University of Exeter

Rechenberg I (2000) Case studies in evolutionary experimentation and computation. Comput Methods Appl Mech Eng 2–4(186):125–140

Reeves CR, Rowe JE (2003) Genetic algorithms – principles and perspectives: a guide to GA theory. Kluwer Academic Publishers, Boston

Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Technical report CUED/F-INFENG/TR 166, Cambridge University Engineering Department

Schwefel H-P (1968) Experimentelle Optimierung einer Zweiphasendüse, Teil 1. AEG Research Institute Project MHD-Staustrahlrohr 11.034/68, Technical report 35, Berlin

Schwefel H-P (1975) Evolutionsstrategie und numerische Optimierung. PhD thesis, Technical University of Berlin

Shir O, Bäck T (2009) Experimental optimization by evolutionary algorithms. In: Tutorial at the international conference on genetic and evolutionary computation

Shir OM (2008) Niching in derandomized evolution strategies and its applications in quantum control: a journey from organic diversity to conceptual quantum designs. PhD thesis, University of Leiden

Small BG, McColl BW, Allmendinger R, Pahle J, López-Castejón G, Rothwell NJ, Knowles J, Mendes P, Brough D, Kell DB (2011) Efficient discovery of anti-inflammatory small molecule combinations using evolutionary computing. Nat Chem Biol (to appear)

Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

Syswerda G (1989) Uniform crossover in genetic algorithms. In: Proceedings of the international conference on genetic algorithms, pp 2–9

Syswerda G (1991) A study of reproduction in generational and steady state genetic algorithms. In: Foundations of genetic algorithms, pp 94–101

Thompson A (1996) Hardware evolution: automatic design of electronic circuits in reconfigurable hardware by artificial evolution. PhD thesis, University of Sussex

Vaidyanathan S, Broadhurst DI, Kell DB, Goodacre R (2003) Explanatory optimization of protein mass spectrometry via genetic search. Anal Chem 75(23):6679–6686

Vose MD, Liepins GE (1991) Punctuated equilibria in genetic search. Complex Syst 5:31–44

Zhang W (2001) Phase transitions and backbones of 3-SAT and maximum 3-SAT. In: Proceedings of the international conference on principles and practice of constraint programming, pp 153–167

Chapter 5

for Constrained Evolutionary Optimization

Sanghoun Oh and Yaochu Jin

optimization problems (COPs). To solve these problems, a variety of evolutionary

algorithms have been proposed by incorporating different constraint-handling techniques. However, many of them have difficulties in achieving the global

optimum due to the presence of highly constrained feasible regions in the search

space. To effectively address the low degree of feasibility, this chapter presents an

incremental approximation strategy-assisted constraint-handling method in combination with a multi-membered evolution strategy. In the proposed approach, we

generate an approximate model for each constrained function with increasing accuracy, from a linear-type approximation to a model that has a complexity similar to

the original constraint functions, thereby manipulating the complexity of the feasible

region. Thanks to this property, our constrained evolutionary optimization algorithm is better able to reach the optimal solution. Simulations are carried out to compare

the proposed algorithm with well-known references on 13 benchmark problems and

three engineering optimization problems. Our computational results demonstrate that

the proposed algorithm is comparable or superior to the state of the art on most of

the test problems used in this study and a spring design optimization problem.

Keywords Constrained optimization · Evolutionary algorithms · Approximation · Surrogate

S. Oh (B)

School of Information and Communications,

Gwangju Institute of Science and Technology, Gwangju 500-712, Korea

e-mail: oosshoun@gist.ac.kr

Y. Jin

Department of Computing, University of Surrey,

Guildford, Surrey GU2 7XH, UK

e-mail: yaochu.jin@surrey.ac.uk

Springer India 2015

R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,

Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_5


5.1 Introduction

Evolutionary algorithms (EAs) have been widely employed to solve constrained optimization problems (COPs), which are commonly encountered in real-world optimization (Jin et al. 2010; Oh et al. 2011). Without loss of generality, COPs can be formulated as a minimization problem subject to one or more (in)equality constraints as follows:

$$\text{minimize } f(x), \quad x = (x_1, \dots, x_n) \in \mathbb{R}^n \tag{5.1}$$

$$\text{subject to } h_i(x) = 0, \quad i \in \{1, 2, \dots, r\} \tag{5.2}$$

$$g_j(x) \le 0, \quad j \in \{r + 1, \dots, m\}, \tag{5.3}$$

where $x$ is subject to the parametric constraints $\underline{x}_i \le x_i \le \overline{x}_i$, $i \in \{1, \dots, n\}$, $f(x)$ is the objective function, and $h_i(x)$ and $g_j(x)$ are the $r$ equality constraints and $m - r$ inequality constraints, respectively.
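To make the formulation (5.1)-(5.3) concrete, the following Python sketch checks whether a candidate solution is feasible. It is a minimal illustration: the function name `is_feasible`, the numerical tolerance `tol` on equality constraints, and the toy problem in the usage note are assumptions made for this example and do not come from the chapter.

```python
from typing import Callable, Sequence

Constraint = Callable[[Sequence[float]], float]


def is_feasible(
    x: Sequence[float],
    equalities: Sequence[Constraint],
    inequalities: Sequence[Constraint],
    lower: Sequence[float],
    upper: Sequence[float],
    tol: float = 1e-9,
) -> bool:
    """Check bounds, h_i(x) = 0 (up to tol) and g_j(x) <= 0 for a candidate x."""
    in_bounds = all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))
    eq_ok = all(abs(h(x)) <= tol for h in equalities)
    ineq_ok = all(g(x) <= 0.0 for g in inequalities)
    return in_bounds and eq_ok and ineq_ok
```

As a usage example, a hypothetical two-variable problem with h(x) = x1 + x2 − 1 and g(x) = x1 − 0.8 on the box [0, 1]² accepts x = (0.5, 0.5) and rejects x = (0.9, 0.1), which violates the inequality constraint.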

In COPs, conventional evolutionary approaches generally struggle with highly constrained search spaces, particularly those with separated, small feasible regions. To cope with this limitation, a considerable number of evolutionary optimization algorithms have been suggested that incorporate various constraint-handling techniques: penalty functions, separation of objective and constraints, special representations and operators, and hybrid techniques (Coello 2002; Michalewicz and Schoenauer 1996).

1. Penalty functions reduce a COP to an unconstrained optimization problem by penalizing the objective function with constraint violations weighted by a penalty factor $\lambda_j$. The penalized objective function can be defined as follows:

$$F(x, \lambda) = f(x) + \sum_{j=1}^{r} \lambda_j H_j^{\alpha} + \sum_{j=r+1}^{m} \lambda_j G_j^{\beta} \tag{5.4}$$

$$= f(x) + \sum_{j=1}^{m} \lambda_j G'_j, \tag{5.5}$$

where $H_j = |h_j(x)|$ and $G_j = \max\{0, g_j(x)\}$ are functions of the constraints $h_j$ and $g_j$, and $\alpha$ and $\beta$ are constants which are set to 1 or 2. By introducing a small tolerance value $\delta$, equality constraints can be converted into inequality constraints, i.e., $|H_j| - \delta \le 0$ (Coello 2002). Thus, given that $\alpha = \beta = 1$, the original formula (5.4) can be reformulated as (5.5), where $G'_j$ indicates inequality constraints $G'_j \in \{|H_j| - \delta, G_j\}$. The penalty function-based approaches may work well for some COPs; however, it is not straightforward to determine an optimal value for the penalty factor. In particular, too small a value of $\lambda$ may mislead the EA because of insufficient penalty. By contrast, too large a penalty factor may prevent the EA from finding the optimal solution. To determine the penalty factor, four types of penalty-handling methods, namely death penalties, static penalties, dynamic penalties, and adaptive penalties, have been proposed (Coello 2002).

2. Another constraint-handling approach is to consider the objective and the constraints separately during optimization. Three major techniques fall into this category. The first is the stochastic ranking evolution strategy (SRES) proposed by Runarsson and Yao (2000). The aim of SRES is to balance the influence of the objective function and the constraints in selection by using a dominance comparison between the fitness and constraint violations, controlled by the user-defined parameter P_f. Coello and Montes suggested a method (Coello and Montes 2002) inspired by a well-known constraint technique in the niched-Pareto genetic algorithm. It uses a new dominance-based selection scheme to integrate constraints into the fitness function used for global optimization. Montes and Coello introduced another method based on a simple diversity mechanism (Montes and Coello 2005).

3. A few ad hoc constraint-handling techniques, viz., special representations and operators, have also been suggested (Coello 2002). The fundamental idea is to simplify the shape of the feasible search space and to preserve feasible solutions found during the evolutionary process. Examples include Davis's work (Davis and Mitchell 1991), random keys (Bean 1994), GENOCOP (Michalewicz 1996), constraint consistent GAs (Kowalczyk 1997), locating the boundary of the feasible region (Glover and Kochenberger 1996), and a homomorphous mapping (HM) that transforms a COP into an unconstrained one using a high-dimensional cube and a feasible search space (Koziel and Michalewicz 1999).

4. Finally, hybrid techniques have also been proposed. They combine an EA with either a mathematical or a heuristic approach such as Lagrangian multipliers (Adeli and Cheng 1994), fuzzy logic (Le 1995), immune systems (Smith et al. 1993), cultural algorithms (Reynolds 1994), differential evolution (Das and Suganthan 2011), and ant colony optimization (Dorigo and Gambardella 1997).
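The penalty formulation in item 1 can be illustrated with a minimal sketch. It assumes $\alpha = \beta = 1$ by default and, as a simplification not taken from the chapter, a single shared penalty factor instead of one factor per constraint; the function names and the toy problem are hypothetical.

```python
def penalized_objective(f, eq_constraints, ineq_constraints, x,
                        penalty=1.0, alpha=1, beta=1):
    """Penalized objective in the spirit of Eq. (5.4).

    Assumption: one shared penalty factor is used for all constraints.
    """
    # H_j = |h_j(x)| for equality constraints
    eq_term = sum(abs(h(x)) ** alpha for h in eq_constraints)
    # G_j = max(0, g_j(x)) for inequality constraints g_j(x) <= 0
    ineq_term = sum(max(0.0, g(x)) ** beta for g in ineq_constraints)
    return f(x) + penalty * (eq_term + ineq_term)


# Hypothetical toy problem: minimize x0^2 + x1^2
# subject to 1 - x0 - x1 <= 0 and x0 - x1 = 0.
def toy_f(x):
    return x[0] ** 2 + x[1] ** 2


def toy_g(x):
    return 1.0 - x[0] - x[1]


def toy_h(x):
    return x[0] - x[1]


infeasible_value = penalized_objective(toy_f, [toy_h], [toy_g], (0.0, 0.0), penalty=10.0)
feasible_value = penalized_objective(toy_f, [toy_h], [toy_g], (1.0, 1.0), penalty=10.0)
```

With a penalty factor of 10, the infeasible origin is ranked worse than the feasible point (1, 1) even though its raw objective is lower. A much smaller factor would reverse that ordering, which is the sensitivity described above.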

This chapter is concerned with constrained optimization problems that have highly constrained feasible regions, i.e., separated and small feasible regions. To systematically alleviate the low degree of feasibility, we propose an incremental approximation model-assisted constraint-handling approach. The model starts with a rough approximation of the constraints using a linear model. As the evolution proceeds, the accuracy of the approximate constraint functions increases gradually, so that accurate approximate constraint functions are available by the end of the search process. In this approach, an originally stationary optimization problem is converted into a dynamic optimization problem (Paenke et al. 2006; Nguyen et al. 2012; Jin et al. 2013) to make the problem easier to solve. Here, the approximate model, also known as a surrogate (Jin 2011), plays a key role.

In this study, we adopt two representative methods, i.e., neural networks and genetic programming (GP), for constructing the approximate models. The proposed algorithms are compared with a few state-of-the-art algorithms on 13 benchmark problems and a tension/compression spring design optimization problem.


S. Oh and Y. Jin

problems has been reported. For example, quadratic approximation models have

been used to estimate both the objective function and constraints (Wanner et al.

2005), which has been shown to enhance the convergence performance. In addition,

surrogate models have also been used to approximate computationally expensive

constraint functions (Goh et al. 2011; Regis 2014). However, none of the above studies intentionally controls the complexity of the approximate model to manipulate the size of the feasible region.

The rest of this chapter is organized as follows. Section 5.2.1 discusses our hypothesis and the basic idea of the work, Sect. 5.2.2 provides a brief description of the evolutionary algorithm used in this work, and Sect. 5.2.3 presents the details of our approach for COPs. Empirical studies on the test functions and a spring design optimization problem are presented in Sect. 5.3. The chapter is concluded with a brief summary in Sect. 5.4.

Algorithm

5.2.1 Incremental Approximation of the Constraint Functions

The highly constrained feasible regions in COPs, as illustrated in Fig. 5.1, prevent evolutionary search algorithms from reaching the global optimum (Jin et al. 2010). Here, $\rho$ (see footnote 1) is the feasibility proportion, i.e., the size of the feasible region relative to the whole search space. To cope with this problem, we synthetically enlarge the feasible regions by approximating the constraint functions.
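The feasibility proportion can be estimated by uniform random sampling. The sketch below does so for benchmark g08. It assumes the standard g08 constraint signs, $g_1(x) = x_1^2 - x_2 + 1 \le 0$ and $g_2(x) = 1 - x_1 + (x_2 - 4)^2 \le 0$, and the commonly used bounds $0 \le x_1, x_2 \le 10$. The bounds are not stated in this section, and the function names are hypothetical.

```python
import random


def g08_constraints(x1, x2):
    """Constraint values of g08 (feasible when both are <= 0)."""
    g1 = x1 ** 2 - x2 + 1.0
    g2 = 1.0 - x1 + (x2 - 4.0) ** 2
    return g1, g2


def feasibility_proportion(constraints, bounds, samples=200_000, seed=0):
    """Monte Carlo estimate of |F|/|S| under uniform sampling."""
    rng = random.Random(seed)
    feasible = 0
    for _ in range(samples):
        point = [rng.uniform(lo, hi) for lo, hi in bounds]
        if all(g <= 0.0 for g in constraints(*point)):
            feasible += 1
    return feasible / samples


# Assumed search space of g08: 0 <= x1, x2 <= 10.
rho = feasibility_proportion(g08_constraints, [(0.0, 10.0), (0.0, 10.0)])
print(f"estimated feasibility proportion: {100 * rho:.4f}%")
```

The estimate is a fraction of a percent, which illustrates how rarely a randomly initialized individual is feasible on such problems.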

In the first stage of the evolutionary search, the proposed model approximates the original constraint functions roughly by using a small number of sampled data for training. Step by step, we increase the accuracy of the approximate constraints by increasing the number of samples. In this manner, we are able to secure a large feasible region in the beginning and return to the original feasible region at the end of the evolutionary search.

We adopt the incremental approximation technique to obtain good approximate models of the constraints since it satisfies our assumption well; that is, the accuracy increases with the number of training data. Figure 5.2 shows the procedure of our incremental approximation of nonlinear constraints. In the beginning, a small number of training data are sampled from the constraint functions to obtain a rough approximation of the constraints, as shown in Fig. 5.2b. As the number of sampled data points increases, our approximation of the nonlinear constraints becomes more accurate, as described in Fig. 5.2c. Note, however, that the system should switch back to the original constraints at the end of the evolutionary optimization so that the obtained optimal solutions are always feasible.

1 The feasibility proportion is defined as $\rho = |F|/|S|$, where $|S|$ is the number of randomly generated solutions ($|S| = 1{,}000{,}000$) and $|F|$ is the number of feasible solutions found among them (Michalewicz and Schoenauer 1996).

Fig. 5.1 Illustrations of feasible regions and feasibility proportion in two benchmark problems. a Benchmark problem g06, with $g_2(x) = (x_1 - 6)^2 + (x_2 - 5)^2 - 82.81 \le 0$ and feasibility proportion $\rho = 0.0066\,\%$. b Benchmark problem g08, with $g_1(x) = x_1^2 - x_2 + 1 \le 0$ and feasibility proportion $\rho = 0.8560\,\%$

Fig. 5.2 Synthetical change of the feasible regions by incremental approximation models of two constraint functions $g_1(x)$ and $g_2(x)$. a The design space has small original feasible regions with two nonlinear constraint functions. b With a linear approximation of both constraints, the approximate feasible regions become larger. c The approximate nonlinear constraint functions become closer to the original constraints

To successfully achieve the global optimum, we adopt a multi-membered evolution strategy, the $(\mu, \lambda)$-ES, based on the stochastic ranking (SR) selection. In our EOA, each individual is composed of two real-valued vectors $(x, \sigma) = \{(x_1, \ldots, x_n), (\sigma_1, \ldots, \sigma_n)\}$, where $x$ is the vector of design variables, $\sigma$ is the vector of step sizes, and $n$ is the dimension of the given problem. In the initialization, both vectors are generated from a uniform distribution between the lower bound $\underline{x}_j$ and the upper bound $\overline{x}_j$ of the design variables.

To produce high quality offspring ($\lambda$) from the parents ($\mu$), genetic operators such as global intermediate recombination and Gaussian mutation are applied. The former operator generates a new step size as the arithmetic average of two individuals, which are stochastically selected from the parent population. This operator is formulated as follows:

$$\sigma_{h,j}^{(g)} = \frac{\sigma_{i,j} + \sigma_{k,j}}{2}, \tag{5.6}$$

where $i$ and $k$ index the two stochastically selected parents. This recombination operator is iterated until $\lambda$ offspring are generated.

After the recombination, the mean step sizes are updated by the log-normal rule (5.7) for the mutation operator:

$$\sigma_{h,j}^{(g+1)} = \sigma_{h,j}^{(g)} \exp\left(\tau' N(0, 1) + \tau N_j(0, 1)\right), \tag{5.7}$$

where $\tau'$ and $\tau$ are learning rates scaled by $\varphi$, an expected rate of convergence which is set to 1, and $N(0, 1)$ is the normal distribution with zero expectation and unit variance. Then, each design variable is mutated in the following manner:

$$x_{h,j}^{(g+1)} = x_{h,j}^{(g)} + \sigma_{h,j}^{(g+1)} N_j(0, 1). \tag{5.8}$$
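The variation step of Eqs. (5.6) to (5.8) can be sketched as follows. The learning rates follow the usual choice in stochastic ranking ES, $\tau' = \varphi/\sqrt{2n}$ and $\tau = \varphi/\sqrt{2\sqrt{n}}$, which is an assumption here since the section does not spell them out. The cyclic choice of the first parent and the function name are likewise hypothetical.

```python
import math
import random


def make_offspring(parents, n_offspring, rng, phi=1.0):
    """Create offspring from parents given as (x, sigma) pairs of equal length.

    Assumed learning rates: tau' = phi / sqrt(2n), tau = phi / sqrt(2 sqrt(n)).
    """
    n = len(parents[0][0])
    tau_prime = phi / math.sqrt(2.0 * n)
    tau = phi / math.sqrt(2.0 * math.sqrt(n))
    offspring = []
    for h in range(n_offspring):
        # Assumption: the first parent is taken cyclically from the parent list.
        x_parent, sigma_i = parents[h % len(parents)]
        shared_draw = rng.gauss(0.0, 1.0)  # N(0, 1), shared by all coordinates
        new_x, new_sigma = [], []
        for j in range(n):
            sigma_k = rng.choice(parents)[1]  # second parent, drawn per coordinate
            step = 0.5 * (sigma_i[j] + sigma_k[j])  # Eq. (5.6)
            step *= math.exp(tau_prime * shared_draw
                             + tau * rng.gauss(0.0, 1.0))  # Eq. (5.7)
            new_sigma.append(step)
            new_x.append(x_parent[j] + step * rng.gauss(0.0, 1.0))  # Eq. (5.8)
        offspring.append((new_x, new_sigma))
    return offspring


demo_rng = random.Random(1)
demo_parents = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [0.5, 0.5])]
demo_offspring = make_offspring(demo_parents, 6, demo_rng)
```

Because the step sizes are multiplied by an exponential, they stay strictly positive. Bound handling for the mutated design variables is omitted from this sketch.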

We use the SR selection scheme to balance the objective and the constraint violations. In this selection, a probability ($P_f$) should be set for using only the objective function for comparisons when ranking infeasible solutions (Runarsson and Yao 2000). Note that in our work, we utilize our designated constraints for calculating the constraint violations:

$$G(x) = \sum_{j=1}^{m} \max\{0, g'_j(x)\}^{\beta}, \tag{5.9}$$

where $G(x)$ denotes the sum of all constraint violations and the constant $\beta$ is set to 1. Our defined constraints are called the synthesized constraints (see footnote 2), $g'_j(x) \in \{g_j(x), \hat{g}_j(x)\}$.

Given the pairs of objective and constraint violations $(f(x_j), G(x_j))$, where $x_j$ denotes the solution of the $j$th offspring individual, $j = \{1, \ldots, \lambda\}$, they are ranked according to the stochastic ranking algorithm. The details of the stochastic ranking algorithm can be found in (Runarsson and Yao 2000).

2 They are assembled by comparing the degree of feasibility between the original constraint $g_j(x)$ and the incremental approximate constraint $\hat{g}_j(x)$.
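For readers without the reference at hand, the stochastic ranking procedure can be sketched as a bubble-sort-like sweep over adjacent individuals. This is an illustrative reading of Runarsson and Yao (2000), not code from the chapter. The default $P_f = 0.45$ is the value commonly used in that line of work and is an assumption here.

```python
import random


def stochastic_ranking(objective, violation, pf=0.45, rng=None):
    """Return indices ordered from best to worst (minimization).

    objective[i] is f(x_i); violation[i] is the summed constraint violation.
    """
    rng = rng or random.Random()
    n = len(objective)
    order = list(range(n))
    for _ in range(n):  # at most n sweeps
        swapped = False
        for j in range(n - 1):
            a, b = order[j], order[j + 1]
            both_feasible = violation[a] == 0 and violation[b] == 0
            if both_feasible or rng.random() < pf:
                worse = objective[a] > objective[b]  # compare by objective
            else:
                worse = violation[a] > violation[b]  # compare by violation
            if worse:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:
            break  # ranking is stable, stop early
    return order


ranking_all_feasible = stochastic_ranking([3.0, 1.0, 2.0], [0.0, 0.0, 0.0], pf=0.0)
ranking_by_violation = stochastic_ranking([1.0, 2.0, 3.0], [2.0, 0.0, 1.0], pf=0.0)
```

With `pf=0.0` the ordering is deterministic: feasible individuals are sorted by objective, and any pair involving an infeasible individual is ordered by violation. Larger values of `pf` let infeasible individuals with good objective values move up the ranking.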

In our algorithm, all equality constraints are converted into inequalities by introducing a tolerance ($\delta$), i.e., $|h_j(x)|^{\alpha} - \delta \le 0$, where the constant $\alpha$ is set to 1. The tolerance is updated according to the generation number, as formulated below (Hamida and Schoenauer 2002):

$$\delta(t + 1) = \frac{\delta(t)}{1.0168}. \tag{5.10}$$

Here, the initial value of the tolerance, $\delta(0)$, is set to 3 and the divisor is set to 1.0168, as recommended in (Hamida and Schoenauer 2002). This approach is analogous to our proposed approximation of constraints because the tolerance is set dynamically. In other words, the accuracy of the altered constraints increases gradually over the generations. Thanks to this property, we need not apply our approximation mechanism to equality constraints.
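The dynamic tolerance can be written in closed form as the initial value divided by the divisor raised to the generation number. The short sketch below uses the values 3 and 1.0168 quoted above; the function names are hypothetical.

```python
def tolerance(t, initial=3.0, divisor=1.0168):
    """Tolerance after t generations: initial / divisor**t."""
    return initial / divisor ** t


def equality_violation(h_value, t):
    """Violation of |h(x)| - tolerance(t) <= 0 at generation t."""
    return max(0.0, abs(h_value) - tolerance(t))


early = equality_violation(0.5, 0)      # wide band: treated as feasible
late = equality_violation(0.5, 1200)    # narrow band: almost the full |h|
```

Early in the run a point with $|h(x)| = 0.5$ counts as feasible, while after 1,200 generations the tolerance has shrunk far below that value, so the relaxed equality has effectively returned to the original one.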

Algorithm

We propose the incremental approximation approach to handle highly constrained feasible regions by synthetically enlarging the feasible regions. The proposed constraint-handling technique is embedded in our evolution strategy using the SR selection. The main components of the proposed evolutionary algorithm are depicted in Fig. 5.3.

The major feature of our algorithm is that a set of synthesized constraints is created and used in the SR selection. Figure 5.4 describes the procedure for creating these constraints. In the initial step, we derive approximate models of the original constraint functions by the incremental approximation technique. Based on this handling method, we are able to attain a synthesized search space larger than the original one. However, the approximate constraints can occasionally lead to fewer feasible solutions than the original ones. We therefore create the synthesized constraints through a competition between the approximate constraints and the given constraints on the basis of the number of feasible solutions in the population. Thanks to this manipulation of both kinds of constraints, we are able to guide the evolutionary algorithm to the global optimum. In particular, for the $j$th constraint, if the original constraint function $g_j(x)$ attains more feasible solutions than the approximate constraint $\hat{g}_j(x)$, the original constraint function is included in the synthesized constraints, $g'_j(x) = g_j(x)$. Otherwise, the approximate constraint function is included, $g'_j(x) = \hat{g}_j(x)$. In the case of an equality constraint, we regard the original constraint as the synthesized constraint without comparing it with the approximate model for the sake of simplicity, partly because the dynamically set tolerance works in a sense similar to approximate constraints.

Fig. 5.3 The proposed constrained evolutionary optimization algorithm

Fig. 5.4 Synthesized constraints via a competition between original and approximate constraints, where $NF$ is the number of feasible solutions and $N_{oc}$ is the number of original constraints. Flowchart: for $j = 1, \ldots, N_{oc}$, if $g_j$ is an inequality, the approximate constraint $\hat{g}_j$ is re-trained and the feasible-solution counts of $g_j$ and $\hat{g}_j$ are compared to decide which one is added to the synthesized constraints; otherwise $g_j$ itself is added
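The competition described above can be sketched in a few lines. The sketch assumes that every constraint is a callable returning a value that is feasible when it is at most zero, and that ties go to the approximate constraint, following the "otherwise" branch in the text. The function names and the toy constraints are hypothetical.

```python
def count_feasible(constraint, population):
    """Number of individuals satisfying constraint(x) <= 0."""
    return sum(1 for x in population if constraint(x) <= 0.0)


def synthesize_constraints(originals, approximations, is_equality, population):
    """Pick, per constraint, the original or the approximate version."""
    synthesized = []
    for g, g_hat, equality in zip(originals, approximations, is_equality):
        if equality:
            synthesized.append(g)  # equalities always keep the original
        elif count_feasible(g, population) > count_feasible(g_hat, population):
            synthesized.append(g)
        else:
            synthesized.append(g_hat)
    return synthesized


# Hypothetical toy constraints on 2-D points.
def g_a(x): return x[0] - 0.5      # original, 1 feasible point below
def g_a_hat(x): return x[0] - 1.5  # approximation, 2 feasible points
def g_b(x): return x[1] - 5.0      # original, 3 feasible points
def g_b_hat(x): return x[1] + 1.0  # approximation, 0 feasible points
def h_c(x): return x[0] - x[1]     # equality constraint
def h_c_hat(x): return 0.0         # unused approximation

toy_population = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
chosen = synthesize_constraints([g_a, g_b, h_c], [g_a_hat, g_b_hat, h_c_hat],
                                [False, False, True], toy_population)
```

In this toy case the first constraint is replaced by its approximation, while the second and the equality constraint keep their original form.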

To properly update the approximate models as evolution proceeds, we specify the update generations as $t_k = t_{k-1} + 10(k-1)^2 = t_0 + 10 \sum_{i=1}^{k} (i-1)^2$, where $t_k$ is the generation number in which the incremental approximation model is re-trained, the initial generation $t_0$ is set to 0, and $k$ is the sampling time, $k = \{1, 2, \ldots, k_{max}\}$. However, the condition $t_{k_{max}} \le t_{max}$ should be satisfied, where $t_{max}$ is the allowed maximum number of generations. During the remaining $t_{max} - t_{k_{max}}$ generations, only the original constraint functions are considered, which guarantees the feasibility of the obtained optimal solution and avoids the under-fitting problem. We should also specify how many samples are used for training the approximation models of the constraint functions. In this work, we heuristically set the number of samples to $N_k = n_j k^2$, where $n_j$ is the number of design variables involved in the $j$th constraint function and $k$ is the sampling time, $k = \{1, 2, \ldots, k_{max}\}$.

For instance, at the first sampling time ($k = 1$) of the approximate constraint functions on g08, a pair of training data ($2 \cdot 1^2$) is sampled for each constraint, because both constraints, $g_1(x) = x_1^2 - x_2 + 1 \le 0$ and $g_2(x) = 1 - x_1 + (x_2 - 4)^2 \le 0$, involve only the two variables $x_1$ and $x_2$. Based on the two sampled data, we obtain two approximate models derived by GP, one of the representative symbolic regression models, for the two constraints of g08, i.e., $\hat{g}_1(x) = 3x_1 - x_2 + 1 \le 0$ and $\hat{g}_2(x) = -x_1 - x_2 + 11 \le 0$, as shown in Fig. 5.2b. We then compare the number of feasible solutions under each approximate constraint $\hat{g}_j$ and the original constraint $g_j$, $j = \{1, 2\}$. Based on the comparisons, we create the set of synthesized constraints, i.e., $g'_j(x) = \{\hat{g}_1(x), \hat{g}_2(x)\}$, since all approximate constraints result in more feasible solutions than the original ones.

Our assumption is that the initial approximate models start from a simple model such as a linear approximation of the nonlinear constraints. We then increase the number of samples as evolution proceeds and thereby obtain more accurate approximate models. In particular, at the sixth sampling time ($k = 6$) on g08, our approximate models are updated in generation 550, $t_6 = t_0 + 10 \sum_{i=1}^{6} (i-1)^2 = 550$, and 72 samples are generated following the defined rule, $N_6 = 2 \cdot 6^2$. Based on the sampled data, GP approximates both constraints as $\hat{g}_1 = x_1^2 - x_2 + \cos(\sin(x_2)) \le 0$ and $\hat{g}_2 = 1 - x_1 + (x_2 - 4)^2 \le 0$ (see Fig. 5.2c). At this time, we compose the synthesized constraints $g'_j(x) = \{\hat{g}_1(x), g_2(x)\}$ by comparing the approximate constraints with the original ones according to their degrees of feasibility.

The location of the samples is determined by Latin hypercube sampling (LHS), which generates samples in an arbitrary number of dimensions such that each sample is the only one in each axis-aligned hyperplane containing it (Jin and Branke 2005).
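A minimal Latin hypercube sampler is sketched below using only the standard library. It illustrates the stratification property described above and is not the authors' implementation; the function name and the demo bounds are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, rng=None):
    """Draw n_samples points so that, per dimension, each of the
    n_samples equal-width strata contains exactly one point."""
    rng = rng or random.Random()
    columns = []
    for lo, hi in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent permutation per dimension
        width = (hi - lo) / n_samples
        columns.append([lo + (s + rng.random()) * width for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


lhs_points = latin_hypercube(8, [(0.0, 8.0), (0.0, 8.0)], random.Random(7))
```

Each dimension is cut into as many strata as there are samples, and an independent random permutation decides which stratum each sample occupies, so even a handful of training points covers the full range of every variable.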

Two incremental approximation models are adopted in this study: the neural network-assisted approximation model and the genetic programming-guided approximation model.

Neural network-assisted approximation model for ES: NNA-ES

In this work, we adopt a multilayer perceptron (MLP) network with one hidden layer (Reed and Marks 1998) (refer to Fig. 5.5) for approximating the nonlinear constraints. Both the hidden neurons and the output neuron use a tan-sigmoid transfer function. The number of input nodes equals the number of parameters in the constraint function plus one (a constant input as bias), the number of hidden nodes is set to three times the number of input nodes, and the number of output nodes is one.
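The network layout described above can be sketched as follows. The tan-sigmoid is taken to be the hyperbolic tangent. The weight initialization range is an arbitrary assumption, and training is omitted; the function names are hypothetical.

```python
import math
import random


def init_mlp(n_params, rng):
    """Weights for an MLP with n_params + 1 inputs (last one is the bias),
    3 * (n_params + 1) hidden nodes, and one output node."""
    n_in = n_params + 1
    n_hidden = 3 * n_in
    # Assumption: small uniform random initial weights.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_hidden, w_out


def mlp_forward(x, w_hidden, w_out):
    """Forward pass with tanh in both the hidden and the output layer."""
    inputs = list(x) + [1.0]  # append the constant bias input
    hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs)))
              for row in w_hidden]
    return math.tanh(sum(w * h for w, h in zip(w_out, hidden)))


demo_hidden, demo_out = init_mlp(2, random.Random(0))
demo_prediction = mlp_forward((1.5, 4.0), demo_hidden, demo_out)
```

For a constraint with two design variables this gives three inputs and nine hidden nodes. Because the output passes through tanh, it lies in [-1, 1], so constraint values would need to be scaled into that range before training; that scaling step is an inference and is not described in this section.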

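[Fig. 5.5: MLP network with inputs $X_1, X_2, \ldots, X_n$ and connection weights $w_{1,1}, w_{1,2}, \ldots, w_{n,2}$]

[Fig. 5.6: Example of an expanded parse tree (EPT) with the nodes L, sin, exp, the variables $x_1$, $x_2$, $x_3$, and the constants 0.5 and 0.2]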

To obtain an adjustable approximation of the constraint functions, we adopt a new type of GP in place of the conventional GP, whose chromosomes have a nonlinear (i.e., variable-length) representation, which makes it difficult to apply the crossover operator (Oh et al. 2009). To tackle this problem, each chromosome of our GP, which is a candidate solution (i.e., an approximate model of the constraint), is represented as a linear string by adding introns and selectors. This representation is termed an expanded parse tree (EPT) and is shown in Fig. 5.6, where solid lines and dashed lines denote internal nodes and external nodes, respectively, and the gray nodes indicate introns (Oh et al. 2009). The initial population is generated from a uniform distribution over two predefined sets, i.e., a functional set and a terminal set. The former consists of unary and binary functions $F = \{+, -, \times, \div, \sin, \cos, L, R\}$, where $\div$ is a protected division operator which returns the value 1 when dividing by 0, and $L$ and $R$ are selector operators, $L(x_1, x_2) = x_1$ and $R(x_1, x_2) = x_2$. The latter is composed of the design variables of the given COP, $\{x_1, \ldots, x_n\}$, and a random value ($R$) within the range [0, 1]. Next, we evaluate the fitness of each chromosome as the difference between its output and the target constraint function for the given inputs. On the basis of the fitness value of each individual, our GP applies pairwise tournament selection without replacement to improve the average quality of the population by passing the high quality chromosomes to the next generation. To explore the search space, the variation operators (i.e., crossover and mutation), which are described in Figs. 5.7 and 5.8, respectively, are applied to the selected chromosome(s). The GP iterates the two procedures, evaluation and genetic operators, until a stopping criterion is satisfied. At the end, the GP is able to obtain a robust approximation of the original nonlinear constraint function. Based on the discovered approximate constraints, we assemble the synthesized constraints, which are created and used in the SR selection.

[Fig. 5.7: Crossover of two parent EPTs producing two offspring]

[Fig. 5.8: Mutation of a parent EPT at the mutation point producing an offspring]
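The functional set can be illustrated with a small evaluator. For brevity the sketch stores an expression as nested tuples rather than as the linear EPT string of the chapter, so it shows only the semantics of protected division and the selectors; all names are hypothetical.

```python
import math


def protected_div(a, b):
    """Division that returns 1 when the divisor is 0."""
    return 1.0 if b == 0 else a / b


# name -> (arity, implementation)
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, protected_div),
    "sin": (1, math.sin),
    "cos": (1, math.cos),
    "L": (2, lambda a, b: a),  # selector: keep the left argument
    "R": (2, lambda a, b: b),  # selector: keep the right argument
}


def evaluate(tree, variables):
    """Evaluate a nested-tuple expression such as ("+", "x1", 0.5)."""
    if isinstance(tree, tuple):
        name, *children = tree
        arity, fn = FUNCTIONS[name]
        return fn(*(evaluate(child, variables) for child in children[:arity]))
    if isinstance(tree, str):
        return variables[tree]  # design variable
    return float(tree)          # numeric constant


# Approximate constraint quoted for g08 at k = 6: x1^2 - x2 + cos(sin(x2)).
g1_hat_tree = ("+", ("-", ("*", "x1", "x1"), "x2"), ("cos", ("sin", "x2")))
g1_hat_value = evaluate(g1_hat_tree, {"x1": 1.0, "x2": 2.0})
```

Under a selector such as `L`, the ignored subtree plays the role of an intron: it is carried along in the chromosome and can be reactivated by later crossover or mutation, but it does not affect the current output.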


In this section, we compare the algorithms guided by the proposed incremental approximation approach, i.e., the NN-assisted approximate approach for evolution strategy (NNA-ES) and the GP-guided approximate method for evolution strategy (GPA-ES), with a few state-of-the-art evolutionary algorithms for constraint handling on 13 benchmark functions in Sect. 5.3.1. We also compare our approach with six recently reported evolutionary methods on a spring design optimization problem in Sect. 5.3.2.

We carry out a statistical analysis of the results on 13 benchmark functions widely used in the literature. Table 5.1 describes the attributes of each benchmark problem, where n is the number of design variables, |F|/|S| is the proportion of the feasible region in the entire search space, LI, NI, LE, and NE are the numbers of linear inequality, nonlinear inequality, linear equality, and nonlinear equality constraints, respectively, and a is the number of active constraints at the optimum solution (Liang et al. 2006).

In the proposed algorithm, we update the approximate models of the constraints at heuristically predefined generations, $t_k = t_0 + 10 \sum_{i=1}^{k} (i-1)^2 = \{0, 10, 50, 140, 300, 550, 910\}$, where $k = \{1, 2, 3, 4, 5, 6, 7\}$ is the update time and $t_0$ is the initial generation, which is set to 0. During the remaining generations, we use only the original constraints to guarantee that the obtained solutions are feasible. At each update, the training data for our approximations are sampled according to the prefixed rule $N_k = n_j k^2 = \{n_j, 4n_j, 9n_j, 16n_j, 25n_j, 36n_j, 49n_j\}$, where $n_j$ is the number of design variables involved in the $j$th constraint function. Note that if $N_k = 1$, the number of samples is raised to the minimum of 2, and if $N_k \ge 200$, it is capped at the maximum of 200.

Table 5.1 Summary of 13 benchmark functions

fcn   n    Type of f    |F|/|S| (%)   LI   NI   LE   NE   a
g01   13   Quadratic      0.0111       9    0    0    0   6
g02   20   Nonlinear     99.9971       0    2    0    0   1
g03   10   Polynomial     0.0000       0    0    0    1   1
g04    5   Quadratic     52.1230       0    6    0    0   2
g05    4   Cubic          0.0000       2    0    0    3   3
g06    2   Cubic          0.0066       0    2    0    0   2
g07   10   Quadratic      0.0003       3    5    0    0   6
g08    2   Nonlinear      0.8560       0    2    0    0   0
g09    7   Polynomial     0.5121       0    4    0    0   2
g10    8   Linear         0.0010       3    3    0    0   6
g11    2   Quadratic      0.0000       0    0    0    1   1
g12    3   Quadratic      4.7713       0    1    0    0   0
g13    5   Nonlinear      0.0000       0    0    0    3   3
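The update schedule and the sample-size rule stated above are easy to check numerically. The sketch below reproduces them with the clamp to the range [2, 200]; the function names are hypothetical.

```python
def update_generations(k_max, t0=0):
    """Generations t_k = t_{k-1} + 10 (k - 1)^2 for k = 1..k_max."""
    generations, t = [], t0
    for k in range(1, k_max + 1):
        t += 10 * (k - 1) ** 2
        generations.append(t)
    return generations


def sample_count(n_vars, k, minimum=2, maximum=200):
    """Number of training samples N_k = n_j k^2, clamped to [2, 200]."""
    return min(max(n_vars * k * k, minimum), maximum)


schedule = update_generations(7)
g08_samples = [sample_count(2, k) for k in range(1, 8)]
```

For g08, whose constraints involve two variables, the sample counts grow from 2 at the first update to 98 at the seventh. A constraint over all 13 variables of g01 would hit the cap of 200 from the fourth update onward.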

In NNA-ES, the MLP is trained for 150 iterations each time the MLP network models need to be updated, with the learning rate set to 0.1. In GPA-ES, the depth of the tree is set to 4, the population size equals the number of sampled data, the maximum number of generations is three times the population size, and the probabilities of crossover and mutation are set to 1.0 and 0.5, respectively.

To assess the performance of our algorithms, we compare them with the following state-of-the-art EAs, which are briefly described below:

1. Self-adaptive fitness formulation (SAFF) employed the penalty function method

for solving the COPs, where infeasible solutions that have a high fitness value are

also favored in selection (Farmani and Wright 2005). In the SAFF, the infeasible

constraint violations were handled by the designed two-stage penalties.

2. Homomorphous mapping (HM) designed a special operator (i.e., decoders) to

discover the optimal solution in COPs. Thanks to these decoders, all solutions

were mapped into n-dimensional cube for maintaining feasible states (Koziel and

Michalewicz 1999).

3. Stochastic ranking evolutionary strategy (SRES) considered the separation between objective and constraints (Runarsson and Yao 2000). This algorithm utilized

the SR selection mechanism to balance objective and constraint violations directly

and explicitly in the optimization with the probabilistic factor to include infeasible

solutions.

4. Simple multi-membered evolutionary strategy (SMES) was also based on the

separated objective and constraint violations (Montes and Coello 2005). Its main

feature was to devise three diversity mechanisms: diversity mechanism, combined recombination, and reduction of the initial step sizes of ES. All designed

techniques were operated on the basis of the number of infeasible solutions in the

population.

5. Adaptive tradeoff model-based evolutionary strategy (ATMES) was proposed

for facilitating a more explicit tradeoff between objective and constraints (Wang

et al. 2008). It developed three different search techniques which were classified

by the feasibility ratio in the current population.

Table 5.2 presents the parameter setups of each compared algorithm. It shows the

size of population, the number of generations, and the number of fitness evaluations.

Table 5.2 Parameter setups of the compared algorithms, where $(\mu, \lambda)$ denotes the numbers of parents and offspring

Algorithm                           Population size   Generations   Fitness evaluations
SAFF (Farmani and Wright 2005)      70                20,000        1,400,000
HM (Koziel and Michalewicz 1999)    70                20,000        1,400,000
SRES (Runarsson and Yao 2000)       (30,200)          1,200         240,000
SMES (Montes and Coello 2005)       (100,300)         800           240,000
ATMES (Wang et al. 2008)            (50,300)          800           240,000
NNA-ES                              (30,200)          1,200         240,000
GPA-ES                              (30,200)          1,200         240,000

Compared with SAFF, the proposed NNA-ES and GPA-ES discovered a better best result in six problems (g02, g04, g05, g06, g07 and g09) and a similar best result in four problems (g01, g03, g08 and g11). NNA-ES found a better best result in g10 than SAFF, whereas GPA-ES obtained a worse one. In addition, our algorithms reached better or similar mean results in most of the problems, except for g04 in the case of GPA-ES and g06 in the case of NNA-ES. No comparisons were made on two functions, g12 and g13, since the results from SAFF are not available.

Compared with HM, both of our algorithms obtained better best results on all problems. The proposed algorithms also obtained superior or comparable mean results, whereas HM found better mean results on two problems (g02 and g04). However, we were not able to make the comparison on three problems (g05, g12, and g13) as no results on these problems are available from HM.

Compared with SRES, GPA-ES achieved better or similar best results on all problems. In addition, it found a better mean result on six problems (g02, g06, g07, g09, g10, and g13) and a similar result on five problems (g01, g03, g08, g11, and g12). SRES obtained better mean results only on g04 and g05. NNA-ES obtained best results superior or comparable to those of SRES in all cases except four (i.e., g02, g05, g06, and g10). Besides, it discovered a better or similar mean result on ten problems.

Compared with SMES, both proposed algorithms discovered four superior best results on g05, g07, g09, and g13, and seven comparable ones on g01, g03, g04, g06, g08, g11, and g12. Also, NNA-ES and GPA-ES each found a competitive mean result on ten problems. Meanwhile, SMES discovered slightly better mean results on four functions, g04, g06, g09, and g10. In particular, the mean value of SMES on g09 was much smaller than that of both proposed algorithms.

Compared with ATMES, the proposed NNA-ES found the same best solution in g09 of the 13 test functions, and a better best solution on test function g10. NNA-ES also achieved better mean and worst solutions than ATMES on test function g02.

GPA-ES achieved a best result similar to that of ATMES on eleven functions (g01, g03, g04, g05, g06, g07, g08, g09, g11, g12, and g13). ATMES found a better best result on g10, whereas GPA-ES achieved a better best result on g02. GPA-ES also achieved a better mean solution than ATMES on g02.

Compared with NNA-ES, GPA-ES discovered four better best solutions on g02, g05, g06, and g07 and eight similar ones on g01, g03, g04, g08, g09, g11, g12, and g13. Only one worse solution was found, on g10. In terms of the mean result, GPA-ES obtained four better solutions and six similar solutions. On three problems (g02, g04, and g05), NNA-ES achieved better solutions.

The best results as well as the mean results of the proposed and the compared algorithms on the above 13 benchmark problems are summarized in Tables 5.3 and 5.4, respectively. These results verify the performance of the proposed approach. However, our algorithms could not find better solutions than the other compared approaches on two test functions, g04 and g10.

Table 5.3 Comparison of the best results obtained by the proposed NNA-ES and GPA-ES as well as five references on 13 benchmark functions, where N.A. = Not Available

fcn   Optimum     SAFF        HM          SRES        SMES        ATMES       NNA-ES      GPA-ES
g01   15.000      15.000      14.786      15.000      15.000      15.000      15.000      15.000
g02   0.803619    0.802970    0.799530    0.803481    0.803601    0.803388    0.803185    0.803532
g03   1.0000      1.0000      0.9997      1.0000      1.0000      1.0000      1.0000      1.0000
g04   30665.539   30665.500   30664.500   30665.539   30665.539   30665.539   30665.539   30665.539
g05   5126.498    5126.989    N.A         5126.498    5126.599    5126.498    5126.505    5126.498
g06   6961.814    6961.800    6952.100    6961.814    6961.814    6961.814    6961.807    6961.814
g07   24.306      24.480      24.620      24.314      24.327      24.306      24.309      24.306
g08   0.095825    0.095825    0.095825    0.095825    0.095825    0.095825    0.095825    0.095825
g09   680.630     680.640     680.910     680.633     680.632     680.630     680.630     680.630
g10   7049.331    7061.340    7147.900    7053.064    7051.903    7052.253    7056.710    7081.948
g11   0.75        0.75        0.75        0.75        0.75        0.75        0.75        0.75
g12   1.000       N.A         N.A         1.000       1.000       1.000       1.000       1.000
g13   0.053950    N.A         N.A         0.054008    0.053986    0.053950    0.053950    0.053950

Column sources: SAFF (Farmani and Wright 2005), HM (Koziel and Michalewicz 1999), SRES (Runarsson and Yao 2000), SMES (Montes and Coello 2005), ATMES (Wang et al. 2008).

Table 5.4 Comparison of the mean results obtained by the proposed NNA-ES and GPA-ES as well as five references on 13 benchmark functions, where N.A. = Not Available

                  SAFF (Farmani and Wright 2005)   HM (Koziel and Michalewicz 1999)   SRES (Runarsson and Yao 2000)
fcn   Optimal     Mean        St. dev              Mean        St. dev                Mean        St. dev
g01   15.000      15.000      0.00E+00             14.708      N.A                    15.000      0.00E+00
g02   0.803619    0.790100    1.20E-02             0.796710    N.A                    0.775346    2.35E-02
g03   1.0000      0.9999      7.50E-05             0.9989      N.A                    1.0000      2.90E-04
g04   30665.539   30665.200   4.85E-01             30655.300   N.A                    30665.525   6.32E-02
g05   5126.498    5432.08     3.89E+03             N.A         N.A                    5132.882    8.61E+00
g06   6961.814    6961.800    0.00E+00             6342.600    N.A                    6875.442    1.53E+02
g07   24.306      26.580      1.14E+00             24.826      N.A                    24.364      5.59E-02
g08   0.095825    0.095825    0.00E+00             0.0891568   N.A                    0.095825    2.82E-17
g09   680.630     680.720     5.92E-02             681.160     N.A                    680.658     4.20E-02
g10   7049.331    7627.890    3.73E+02             8163.600    N.A                    7472.902    4.20E-02
g11   0.75        0.75        0.00E+00             0.75        N.A                    0.75        4.20E-02
g12   1.000       N.A         N.A                  N.A         N.A                    1.000       0.00E+00
g13   0.053950    N.A         N.A                  N.A         N.A                    0.083290    9.70E-02

Table 5.4 (continued)

                  SMES (Montes and Coello 2005)   ATMES (Wang et al. 2008)   NNA-ES                  GPA-ES
fcn   Optimal     Mean        St. dev             Mean        St. dev        Mean        St. dev     Mean        St. dev
g01   15.000      15.000      0.00E+00            15.000      1.60E-14       15.000      0.00E+00    15.000      6.29E-07
g02   0.803619    0.785238    1.67E-02            0.790148    1.30E-02       0.794128    8.04E-03    0.791084    8.03E-03
g03   1.0000      1.0000      2.09E-04            1.0000      5.90E-05       1.0000      1.90E-04    1.0000      1.35E-05
g04   30665.539   30665.539   0.00E+00            30665.539   7.40E-12       30665.539   2.05E-04    30648.853   4.98E+01
g05   5126.498    5174.492    5.01E+01            5127.648    1.80E-14       5133.481    9.05E+00    5152.634    4.14E+01
g06   6961.814    6961.284    1.85E+00            6961.814    4.60E-12       6758.018    1.62E+02    6961.814    4.63E-12
g07   24.306      24.475      1.32E-01            24.316      1.10E-02       24.327      2.01E-02    24.315      1.83E-02
g08   0.095825    0.095825    0.00E+00            0.095825    2.80E-17       0.095825    2.82E-17    0.095825    2.82E-17
g09   680.630     680.643     1.55E-02            680.639     1.00E-02       680.648     2.74E-02    680.648     2.23E-02
g10   7049.331    7253.047    1.36E+02            7250.437    1.20E+02       7409.876    4.38E+02    7342.196    2.25E+02
g11   0.75        0.75        1.52E-04            0.75        3.40E-04       0.75        9.80E-04    0.75        1.25E-03
g12   1.000       1.000       0.00E+00            1.000       1.00E-03       1.000       0.00E+00    1.000       4.10E-05
g13   0.053950    0.166385    1.77E-01            0.053959    1.30E-05       0.091730    9.95E-02    0.054024    1.40E-04

In addition to the test problems, we compare the two proposed algorithms, NNA-ES and GPA-ES, with six novel heuristic approaches utilizing various constraint-handling techniques on a spring design optimization problem. The reference algorithms are described below:

1. ... of a fitness function combined with a GA to find the optimal solution (Coello 2000).

2. GA2 proposed the separate consideration of objective and constraint violations using the pair-wise tournament selection mechanism (Coello and Montes 2002).

3. HE-PSO suggested a new particle swarm optimization (PSO) for solving COPs by adopting the death penalty mechanism, which discards all infeasible solutions during the whole procedure (Hu et al. 2003).

4. CPSO proposed a co-evolution based PSO algorithm to provide a framework for handling decision solutions and constraints (He and Wang 2007a). The aim of this algorithm was to search for the optimal solutions and penalty factors.

5. HPSO utilized the feasibility-based rule to manage constraints without additional parameters and to guide the particles into the feasible region quickly (He and Wang 2007b). In addition, simulated annealing (SA) was applied to the best solution to avoid premature convergence.

6. NM-PSO integrated the Nelder-Mead (NM) simplex search method with the PSO algorithm (Zahara and Kao 2009). This algorithm employed special operators, i.e., the gradient repair method and the constraint fitness priority-based ranking, to convert infeasible solutions into feasible ones.

The problem, taken from Arora, is to minimize the weight of a tension/compression spring subject to constraints on minimum deflection, shear stress, surge frequency, and limits on the outside diameter and on the design variables. The design variables are the wire diameter 0.05 ≤ x_1 ≤ 2.0, the mean coil diameter 0.25 ≤ x_2 ≤ 1.3, and the number of active coils 2.0 ≤ x_3 ≤ 15.0.

\[
\text{minimize} \quad f(x) = (x_3 + 2)\, x_1^2\, x_2 \tag{5.11}
\]

\[
\begin{aligned}
\text{subject to} \quad
g_1(x) &= 1 - \frac{x_2^3 x_3}{71785\, x_1^4} \le 0, \\
g_2(x) &= \frac{4 x_2^2 - x_1 x_2}{12566\,(x_1^3 x_2 - x_1^4)} + \frac{1}{5108\, x_1^2} - 1 \le 0, \\
g_3(x) &= 1 - \frac{140.45\, x_1}{x_2^2 x_3} \le 0, \\
g_4(x) &= \frac{x_1 + x_2}{1.5} - 1 \le 0.
\end{aligned} \tag{5.12}
\]
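As a minimal sketch, the objective (5.11) and the constraints (5.12) can be written directly in Python. The function names and the design point used below are assumptions made for illustration only; the point is not a result reported in this chapter.

```python
def spring_weight(x1, x2, x3):
    """Objective (5.11): weight of the tension/compression spring."""
    return (x3 + 2.0) * x2 * x1 ** 2


def spring_constraints(x1, x2, x3):
    """Constraints (5.12); a design is feasible when every value is <= 0."""
    g1 = 1.0 - (x2 ** 3 * x3) / (71785.0 * x1 ** 4)
    g2 = ((4.0 * x2 ** 2 - x1 * x2) / (12566.0 * (x2 * x1 ** 3 - x1 ** 4))
          + 1.0 / (5108.0 * x1 ** 2) - 1.0)
    g3 = 1.0 - 140.45 * x1 / (x2 ** 2 * x3)
    g4 = (x1 + x2) / 1.5 - 1.0
    return [g1, g2, g3, g4]


# Illustrative design point (an assumption, not taken from this chapter).
x = (0.051689, 0.356718, 11.288966)
weight = spring_weight(*x)
violation = sum(max(0.0, g) for g in spring_constraints(*x))
print(f"f(x) = {weight:.7f}, summed violation = {violation:.2e}")
```

At this point g_1 and g_2 are close to zero while g_3 and g_4 are clearly negative, which shows how narrow the feasible margin of such a design can be.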

Table 5.5 lists the best, mean, worst, and standard deviation of the results obtained by all algorithms. It can be seen in Table 5.5 that GPA-ES performs better than the compared algorithms: even our worst solution is smaller than the best values of the compared ones.

To sum up the experimental results and comparisons on the above three engineering optimization problems, we could verify the superiority of the proposed incremental approximation-assisted algorithms.


Table 5.5 Comparison of the statistics on the tension/compression spring optimization problem

Method                          Best        Mean        Worst       St. dev
GA1 (Coello 2000)               0.0127048   0.0127690   0.0128220   3.94E-05
GA2 (Coello and Montes 2002)    0.0126810   0.0127420   0.0129730   5.90E-05
HE-PSO (Hu et al. 2003)         0.0126661   0.0127190   N.A         6.45E-05
CPSO (He and Wang 2007a)        0.0126747   0.0127300   0.0129240   5.20E-04
HPSO (He and Wang 2007b)        0.0126652   0.0127072   0.0127190   1.58E-05
NM-PSO (Zahara and Kao 2009)    0.0126302   0.0126314   0.0126330   8.74E-07
NNA-ES                          0.0098725   0.0098741   0.0098930   4.69E-06
GPA-ES                          0.0098725   0.0098725   0.0098725   9.87E-03

5.4 Conclusion

This chapter has presented a new evolutionary algorithm for solving COPs. We particularly targeted problems that are highly constrained, so that the feasible regions are small and separated. To methodically solve problems caused by an extremely low degree of feasibility, we suggested incremental approximation models. Thanks to a manipulated, gradually increasing feasible region managed by the approximate constraints, we could handle highly constrained problems more effectively. We have empirically compared our approach with a few state-of-the-art algorithms for handling COPs on 13 benchmark problems and one engineering optimization problem. As a whole, the proposed methods have been shown to be promising, as they produced better or comparable results on most test problems.

Acknowledgments The authors would like to thank Chang Wook Ahn for useful discussions.

References

Adeli H, Cheng N-T (1994) Augmented Lagrangian genetic algorithm for structural optimization. J Aerosp Eng 7:104-118

Bean J (1994) Genetic algorithms and random keys for sequencing and optimization. ORSA J Comput 6:154-160

Coello CAC (2000) Use of a self-adaptive penalty approach for engineering optimization problems. Comput Ind 41(2):113-127

Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11-12):1245-1287

Coello CAC, Montes EM (2002) Constraint-handling in genetic algorithms through the use of dominance-based tournament selection. Adv Eng Inform 16(3):193-203

Das S, Suganthan P (2011) Differential evolution: a survey of the state-of-the-art. IEEE Trans Evol Comput 15(1):4-31

Davis LD, Mitchell M (eds) (1991) Handbook of genetic algorithms. Van Nostrand Reinhold, New York

Dorigo M, Gambardella LM (1997) Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans Evol Comput 1:53-66

Farmani R, Wright J (2005) Self-adaptive fitness formulation for constrained optimization. IEEE Trans Evol Comput 7(5):445-455

Glover F, Kochenberger G (1996) Critical event tabu search for multidimensional knapsack problems. Meta heuristics: theory and applications. Kluwer Academic Publishers, Dordrecht

Goh C, Lim D, Ma L, Ong Y, Dutta P (2011) A surrogate-assisted memetic co-evolutionary algorithm for expensive constrained optimization problems. In: IEEE congress on evolutionary computation, pp 744-749

Hamida S, Schoenauer M (2002) ASCHEA: new results using adaptive segregational constraint handling. In: Proceedings of IEEE conference on evolutionary computation 2002. Honolulu, Hawaii, pp 82-87

He Q, Wang L (2007a) An effective co-evolutionary particle swarm optimization for constrained engineering design problems. Eng Appl Artif Intell 20(1):89-99

He Q, Wang L (2007b) A hybrid particle swarm optimization with a feasibility-based rule for constrained optimization. Appl Math Comput 186(2):1407-1422

Hu X, Eberhart R, Shi Y (2003) Engineering optimization with particle swarm. In: Proceedings of the IEEE swarm intelligence symposium 2003 (SIS 2003). Indianapolis, Indiana, pp 53-57

Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol Comput 1(2):61-70

Jin Y, Branke J (2005) Evolutionary optimization in uncertain environments-a survey. IEEE Trans Evol Comput 9:303-317

Jin Y, Oh S, Jeon M (2010) Incremental approximation of nonlinear constraints functions for evolutionary constrained optimization. In: Proceedings of IEEE conference on evolutionary computation 2010 (CEC 2010), Barcelona, Spain, pp 1-8

Jin Y, Tang K, Yu X, Sendhoff B, Yao X (2013) A framework for finding robust optimal solutions over time. Memet Comput 5(1):3-18

Kowalczyk R (1997) Constraint consistent genetic algorithms. In: Proceedings of IEEE international conference on evolutionary computation. Indianapolis, pp 343-348

Koziel S, Michalewicz Z (1999) Evolutionary algorithms, homomorphous mappings, and constrained parameter optimization. Evol Comput 7(1):19-44

Le TV (1995) A fuzzy evolutionary approach to constrained optimization problems. In: Proceedings of parallel problem solving from nature, vol 274-278. Perth

Liang JJ, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan PN, Coello CAC, Deb K (2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. Technical report, Nanyang Technological University, Singapore

Michalewicz Z (1996) Genetic algorithms + data structures = evolution programs. Springer, New York

Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4:1-32

Montes EM, Coello CAC (2005) A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Trans Evol Comput 9(1):1-17

Nguyen T, Yang S, Branke J (2012) Evolutionary dynamic optimization: a survey of the state of the art. Swarm Evol Comput 6:1-24

Oh S, Lee S, Jeon M (2009) Evolutionary optimization programming with probabilistic models. In: International conference on bio-inspired computing. Beijing, P.R. China, pp 1-6

Oh S, Jin Y, Jeon M (2011) Approximate models for constraint functions in evolutionary constrained optimization. Int J Innov Comput, Inf Control 7(11):6585-6603

Paenke I, Branke J, Jin Y (2006) Efficient search for robust solutions by means of evolutionary algorithms and fitness approximation. IEEE Trans Evol Comput 10(4):405-420

Reed RD, Marks RJ (1998) Neural smithing: supervised learning in feedforward artificial neural networks. MIT Press, Cambridge

optimization using radial basis functions. IEEE Trans Evol Comput 18(3):326-347

Reynolds RG (1994) An introduction to cultural algorithms. In: Proceedings of third annual conference on evolutionary programming. World Scientific, River Edge, pp 131-139

Runarsson T, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284-294

Smith RE, Forrest S, Perelson AS (1993) Searching for diverse, cooperative populations with genetic algorithms. Evol Comput 1:127-149

Wang Y, Cai Z, Zhou Y, Zeng W (2008) An adaptive tradeoff model for constrained evolutionary optimization. IEEE Trans Evol Comput 12(1):80-92

Wanner E, Guimaraes F, Takahashi RSR, Fleming P (2005) Constraint quadratic approximation operator for treating equality constraints with genetic algorithms. In: IEEE congress on evolutionary computation, pp 2255-2262

Zahara E, Kao Y-T (2009) Hybrid Nelder-Mead simplex search and particle swarm optimization for constrained engineering design problems. Expert Syst Appl 36(2):3880-3886

Chapter 6

by the ε Constrained Differential Evolution

with Rough Approximation

Tetsuyuki Takahama and Setsuko Sakai

approximation model with low accuracy and without a learning process, to reduce the number of function evaluations in unconstrained optimization. Although the approximation errors between the true function values and the values estimated by the rough approximation model are not small, the rough model can estimate the order relation of two points with fair accuracy. The estimated comparison, which omits function evaluations when the result of the comparison can be judged from the approximation values, has been proposed to exploit this property of the rough model. In this chapter, a constrained optimization method is proposed by combining the ε constrained method and the estimated comparison, where rough approximation is used not only for the objective function but also for the constraint violation. The proposed method is an efficient constrained optimization algorithm that can find near-optimal solutions in a small number of function evaluations. The advantage of the method is shown by solving well-known nonlinear constrained problems.

Keywords Rough approximation model · Constrained optimization · ε constrained method · Estimated comparison · Differential evolution

6.1 Introduction

Constrained optimization problems, especially nonlinear optimization problems,

where objective functions are minimized under given constraints, are important

and frequently appear in the real world. There exist several studies on solving

T. Takahama (B)

Hiroshima City University, 3-4-1 Ozuka-higashi, Asaminami-ku,

Hiroshima 731-3194, Japan

e-mail: takahama@info.hiroshima-cu.ac.jp

S. Sakai

Hiroshima Shudo University, 1-1-1 Ozuka-higashi, Asaminami-ku,

Hiroshima 731-3195, Japan

e-mail: setuko@shudo-u.ac.jp

Springer India 2015

R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,

Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_6


2002; Mezura-Montes and Coello 2011; Michalewicz 1995; Takahama and Sakai

2005a). EAs basically lack a mechanism to incorporate the constraints of a given problem into the fitness value of individuals. Thus, numerous studies have been dedicated to handling constraints in EAs. In most successful constraint-handling techniques, the objective function value and the sum of constraint violations, or the constraint violation, are handled separately, and an optimal solution is searched for by balancing the optimization of the function value and the optimization of the constraint violation.

The ε constrained differential evolution (εDE) has been proposed, which adopts one such technique, called the ε constrained method, and uses differential evolution (DE) as the optimization engine. The εDE can solve constrained problems successfully and stably (Takahama and Sakai 2006, 2009b, 2010a, b), including engineering design problems (Takahama and Sakai 2006). The ε constrained method (Takahama and Sakai 2009b) is an algorithm transformation method that converts algorithms for unconstrained problems into algorithms for constrained problems using the ε-level comparison, which compares search points or individuals based on the pair of their objective value and constraint violation. It has been shown that the ε constrained method has general-purpose properties.

Generally, a disadvantage of EAs is that they need a large number of function evaluations before an acceptable solution can be found. An effective method for reducing function evaluations is to build an approximation model for the objective function and to solve the problem using the approximation values (Jin 2005). If an approximation model with high accuracy can be built, the number of function evaluations can be reduced considerably. However, building a high-quality approximation model is difficult and time-consuming: the model must be learned from many pairs of known solutions and their function values. Also, a proper approximation model depends on the problem to be optimized, so it is difficult to design a general-purpose approximation model with high accuracy.

It has been proposed to utilize an approximation model with low accuracy and without a learning process to reduce the number of function evaluations effectively. In the following, this approximation model is called a rough approximation model. Although the approximation errors between the true function values and the approximation values estimated by the rough approximation model are not small, the model can estimate with fair accuracy whether the function value of one point is smaller than that of another. For example, Fig. 6.1 shows a correct order relation even when the errors between the true values and the approximation values are large. To exploit this property of the rough approximation model, the estimated comparison (Sakai and Takahama 2010; Takahama and Sakai 2008a, b, 2009a, 2010c) has been proposed for unconstrained optimization.

In the estimated comparison, the approximation values are compared first. When one value is worse than the other, the estimated comparison returns an estimated result without evaluating the true function. When it is difficult to judge the result from the approximation values, true values are obtained by evaluating the true function, and the estimated comparison returns a true result based on the true values. Using the estimated comparison, the evaluation of the true function is sometimes omitted and the number of function evaluations can be reduced.

[Fig. 6.1: image not recovered; surviving labels: "large error", "correct order relation"]

In this chapter, the estimated comparison is applied to constrained optimization, and εDEpm, a combination of the ε constrained method and the estimated comparison using a potential model (Takahama and Sakai 2013), is defined and improved by approximating not only the objective function but also the constraint violation. The potential model, which needs no learning process, is adopted as the rough approximation model (Takahama and Sakai 2008b). εDEpm is an efficient constrained optimization algorithm that can find near-optimal solutions in a small number of function evaluations. The effectiveness of εDEpm is shown by solving 13 well-known constrained problems mentioned in Coello (2002) and comparing the results of εDEpm with those of representative methods. It is shown that εDEpm can solve the problems with about half the number of function evaluations required by the representative methods.

In Sect. 6.2, constrained optimization methods and approximation methods are reviewed. The ε constrained method and the estimated comparison using the potential model are explained in Sects. 6.3 and 6.4, respectively. εDEpm is described in Sect. 6.5. In Sect. 6.6, experimental results on 13 constrained problems are shown and the results of εDEpm are compared with those of other methods. Finally, conclusions are described in Sect. 6.7.

6.2.1 Constrained Optimization Problems

In this study, the following optimization problem (P) with inequality constraints, equality constraints, upper bound constraints, and lower bound constraints is discussed.

\[
\begin{aligned}
(\mathrm{P}) \quad \text{minimize} \quad & f(x) \\
\text{subject to} \quad & g_j(x) \le 0, \quad j = 1, \ldots, q \\
& h_j(x) = 0, \quad j = q + 1, \ldots, m \\
& l_i \le x_i \le u_i, \quad i = 1, \ldots, n,
\end{aligned} \tag{6.1}
\]

Here, g_j(x) ≤ 0 and h_j(x) = 0 are the q inequality constraints and the m − q equality constraints, respectively. Functions f, g_j, and h_j are linear or nonlinear real-valued functions. Values u_i and l_i are the upper bound and lower bound of x_i, respectively. Also, let the feasible space in which every point satisfies all constraints be denoted by F, and the search space in which every point satisfies the upper and lower bound constraints be denoted by S (⊃ F).

EAs for constrained optimization can be classified into several categories according

to the way the constraints are treated as follows (Takahama and Sakai 2005a):

1. Constraints are only used to see whether a search point is feasible or not.

Approaches in this category are usually called death penalty methods. In this category, generating initial feasible points is difficult and computationally demanding

when the feasible region is very small.

2. The constraint violation, which is the sum of the violations of all constraint functions, is combined with the objective function. The penalty function method belongs to this category (Coello 2000b; Homaifar et al. 1994; Joines and Houck 1994; Michalewicz and Attia 1994). The main difficulty of the method is the selection of an appropriate value for the penalty coefficient that adjusts the strength of the penalty. To overcome this difficulty, some methods in which the penalty coefficient is adaptively controlled (Tessema and Yen 2006; Wang et al. 2008) have been proposed.

3. The constraint violation and the objective function are used separately. In this category, both the constraint violation and the objective function are optimized in a lexicographic order in which the constraint violation precedes the objective function. Deb (2000) proposed a method that adopts an extended objective function and realizes lexicographic ordering. Takahama and Sakai proposed the α constrained method (Takahama and Sakai 2000) and the ε constrained method (Takahama and Sakai 2005b), which adopt a lexicographic ordering with relaxation of the constraints. Runarsson and Yao (2000) proposed the stochastic ranking method, which adopts a stochastic lexicographic order that ignores the constraint violation with some probability. Mezura-Montes and Coello (2005) proposed a comparison mechanism that is equivalent to lexicographic ordering. Venkatraman and Yen (2005) proposed a two-step optimization method, which first optimizes the constraint violation and then the objective function. These methods have been successfully applied to various problems.

4. Every constraint and the objective function are used separately. In this category, constrained optimization problems are solved as multi-objective optimization problems in which the objective function and the constraint functions are objectives to be optimized (Aguirre et al. 2004; Camponogara and Talukdar 1997; Coello 2000a; Ray et al. 2002; Runarsson and Yao 2003; Surry and Radcliffe 1997; Wang et al. 2007). However, in many cases solving a constrained problem as a multi-objective optimization problem is a more difficult and expensive task than solving it as essentially a single-objective optimization problem, as in categories 1, 2, and 3.

5. Hybridization methods. In this category, constrained problems are solved by combining some of the above-mentioned methods. Mallipeddi and Suganthan (2010)

proposed a hybridization of the methods in categories 2, 3, and 4.
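The difference between categories 2 and 3 above can be illustrated with a small sketch. The function names, the penalty coefficients, and the two sample points are hypothetical and chosen only to show that a penalty-based ranking depends on the coefficient, whereas the lexicographic order does not.

```python
def total_violation(g_values, h_values):
    """Sum of violations of inequality (g <= 0) and equality (h = 0) constraints."""
    return sum(max(0.0, g) for g in g_values) + sum(abs(h) for h in h_values)


def penalized(f, violation, coefficient):
    """Category 2: fold the violation into the objective with a penalty coefficient."""
    return f + coefficient * violation


def lexicographic_better(a, b):
    """Category 3: a and b are (f, violation) pairs; the violation is compared first."""
    f_a, v_a = a
    f_b, v_b = b
    if v_a != v_b:
        return v_a < v_b
    return f_a < f_b


a = (1.0, 0.5)   # good objective value, but infeasible
b = (5.0, 0.0)   # worse objective value, feasible
for coefficient in (1.0, 100.0):
    a_wins = penalized(*a, coefficient) < penalized(*b, coefficient)
    print(f"coefficient {coefficient}: infeasible point preferred = {a_wins}")
print("lexicographic order prefers feasible point:", lexicographic_better(b, a))
```

With a small coefficient the infeasible point wins under the penalty ranking, which is the tuning difficulty described for category 2.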

In this section, EAs using approximation models are briefly reviewed.

Various approximation models are utilized to approximate the objective function. In most approximation models, the model parameters are learned by the least squares method, the gradient method, the maximum likelihood method, and so on. In general, learning model parameters is a time-consuming process, especially when models with higher accuracy or models of high-dimensional functions are to be obtained.

EAs with approximation models can be classified as follows:

1. All individuals have only approximation values. A high-quality approximation model is built and the objective function is optimized using approximation values only. It is possible to reduce function evaluations greatly. However, these methods can be applied only to well-understood objective functions and cannot be applied to general problems.

2. Some individuals have approximation values and others have true values. Methods of this type are called evolution control approaches and can be classified into individual-based and generation-based control. Individual-based control means that good individuals (or randomly selected individuals) use true values and the others use approximation values in each generation (Jin et al. 2000; Jin and Sendhoff 2004). Generation-based control means that all individuals use true values once in a fixed number of generations and use approximation values in the other generations (Jin et al. 2000, 2002). In these approaches, the approximation model should be accurate because the approximation values are compared with the true values. Also, it is known that even approximation models with high accuracy sometimes generate a false optimum or hide a true optimum. Individuals may converge to the false optimum while they are optimized using the approximation models in some generations. Thus, these approaches are strongly affected by the quality of the approximation models, and it is difficult to utilize rough approximation models.

3. All individuals have true values. Some methods of this type are called surrogate approaches. In surrogate approaches, an estimated optimum is searched for using an approximation model called a surrogate model, which is usually a local model. The estimated optimum is evaluated, the true value is obtained, and the true value is also used to improve the approximation model (Büche et al. 2005; Guimarães et al. 2006; Ong et al. 2006). If the true value is good, the point is included as an individual. In these approaches, rough approximation models might be used because approximation values are compared with other approximation values. These approaches are less affected by the quality of the approximation model than the evolution control approaches. However, they include a process of optimization that uses the approximation model only. If this process is repeated many times, they are strongly affected by the quality of the approximation model.

The estimated comparison method is classified into the last category because all individuals have true values. However, the method is different from the surrogate approaches. It uses a global approximation model of the current individuals based on the potential model. It does not search for the estimated optimum, but judges whether the true value of a new individual is worth evaluating. Also, it can specify the margin of approximation error when the comparison is carried out. Thus, it is not much affected by the quality of the approximation model.

6.3.1 Constraint Violation and Level Comparisons

In the ε constrained method, the constraint violation φ(x) is defined. The constraint violation can be given by the maximum of all constraints or the sum of all constraints:

\[
\phi(x) = \max\Bigl\{ \max_{1 \le j \le q} \{0, g_j(x)\},\ \max_{q+1 \le j \le m} |h_j(x)| \Bigr\} \tag{6.2}
\]

\[
\phi(x) = \sum_{j=1}^{q} \|\max\{0, g_j(x)\}\|^p + \sum_{j=q+1}^{m} \|h_j(x)\|^p \tag{6.3}
\]

The ε-level comparison is defined as an order relation on a pair of objective function value and constraint violation (f(x), φ(x)). If the constraint violation of a point is greater than 0, the point is not feasible and its worth is low. The ε-level comparisons are defined basically as a lexicographic order in which φ(x) precedes f(x), because the feasibility of x is more important than the minimization of f(x). This precedence can be adjusted by the parameter ε.

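Let f_1 (f_2) and φ_1 (φ_2) be the function value and the constraint violation at a point x_1 (x_2), respectively. Then, for any ε satisfying ε ≥ 0, the ε-level comparisons <_ε and ≤_ε between (f_1, φ_1) and (f_2, φ_2) are defined as follows:

\[
(f_1, \phi_1) <_{\varepsilon} (f_2, \phi_2) \iff
\begin{cases}
f_1 < f_2, & \text{if } \phi_1, \phi_2 \le \varepsilon \\
f_1 < f_2, & \text{if } \phi_1 = \phi_2 \\
\phi_1 < \phi_2, & \text{otherwise}
\end{cases} \tag{6.4}
\]

\[
(f_1, \phi_1) \le_{\varepsilon} (f_2, \phi_2) \iff
\begin{cases}
f_1 \le f_2, & \text{if } \phi_1, \phi_2 \le \varepsilon \\
f_1 \le f_2, & \text{if } \phi_1 = \phi_2 \\
\phi_1 < \phi_2, & \text{otherwise}
\end{cases} \tag{6.5}
\]

In the case of ε = ∞, the ε-level comparisons <_∞ and ≤_∞ are equivalent to the ordinary comparisons < and ≤ between function values. Also, in the case of ε = 0, <_0 and ≤_0 are equivalent to the lexicographic orders in which the constraint violation φ(x) precedes the function value f(x).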
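The constraint violations (6.2) and (6.3) and the strict ε-level comparison (6.4) can be sketched as follows. The function names are hypothetical, and the default exponent p = 1 in the summed violation is an assumption of this sketch rather than a value stated in the text.

```python
import math


def violation_max(g_values, h_values):
    """Constraint violation (6.2): the largest single violation."""
    terms = [max(0.0, g) for g in g_values] + [abs(h) for h in h_values]
    return max(terms, default=0.0)


def violation_sum(g_values, h_values, p=1.0):
    """Constraint violation (6.3): sum of violations raised to the power p (p = 1 assumed)."""
    return (sum(max(0.0, g) ** p for g in g_values)
            + sum(abs(h) ** p for h in h_values))


def eps_less(a, b, eps):
    """True when a <_eps b for pairs a = (f1, phi1) and b = (f2, phi2), as in (6.4)."""
    f1, phi1 = a
    f2, phi2 = b
    if (phi1 <= eps and phi2 <= eps) or phi1 == phi2:
        return f1 < f2
    return phi1 < phi2


a = (1.0, 0.3)   # better objective value, larger violation
b = (2.0, 0.1)   # worse objective value, smaller violation
print(eps_less(a, b, eps=0.5))        # both violations within the eps level: f decides
print(eps_less(a, b, eps=0.0))        # lexicographic order: phi decides
print(eps_less(a, b, eps=math.inf))   # ordinary comparison of function values
```

The same pair of points is ranked differently for ε = 0 and ε = ∞, which is the adjustable precedence described above.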

The ε constrained method converts a constrained optimization problem into an unconstrained one by replacing the order relation in direct search methods with the ε-level comparison. An optimization problem solved by the ε constrained method, that is, a problem (P_≤ε) in which the ordinary comparison is replaced with the ε-level comparison, is defined as follows:

\[
(\mathrm{P}_{\le \varepsilon}) \quad \text{minimize}_{\le \varepsilon} \ f(x), \tag{6.6}
\]

where minimize_≤ε denotes minimization based on the ε-level comparison ≤_ε. Also, a problem (P^ε) is defined such that the constraint of (P), that is, φ(x) = 0, is relaxed and replaced with φ(x) ≤ ε:

\[
(\mathrm{P}^{\varepsilon}) \quad \text{minimize} \ f(x) \quad \text{subject to} \ \phi(x) \le \varepsilon \tag{6.7}
\]

Here, (P^0) is equivalent to (P) because every feasible solution satisfies φ(x) = 0.

For the three types of problems, (P^ε), (P_≤ε), and (P), the following theorems are given based on the ε constrained method (Takahama and Sakai 2005b).

Theorem 1 If an optimal solution of (P^0) exists, any optimal solution of (P_≤ε) is an optimal solution of (P^ε).

Theorem 2 If an optimal solution of (P) exists, any optimal solution of (P_≤0) is an optimal solution of (P).

Theorem 3 Let {ε_n} be a strictly decreasing nonnegative sequence that converges to 0. Let f(x) and φ(x) be continuous functions of x. Assume that an optimal solution x* of (P^0) exists and an optimal solution x̂_n of (P_≤ε_n) exists for any ε_n. Then, any accumulation point of the sequence {x̂_n} is an optimal solution of (P^0).

Theorems 1 and 2 show that a constrained optimization problem can be converted into an equivalent unconstrained optimization problem by using the ε-level comparison. So, if the ε-level comparison is incorporated into an existing unconstrained optimization method, constrained optimization problems can be solved. Theorem 3 shows that, in the ε constrained method, an optimal solution of (P^0) can be obtained by letting ε converge to 0, just as it can be obtained by increasing the penalty coefficient to infinity in the penalty method.

Optimization

The potential model is explained as a rough approximation model and the estimated

comparison method is described (Sakai and Takahama 2010; Takahama and Sakai

2008a, b, 2009a, 2010c).

Potential energy is stored energy that depends on the relative position of the various parts of a system. Gravitational potential energy is an example. If there is an object of mass m, gravitational potential energy E_g exists around the object. If there is another object of mass m' at a distance r from the object, an attractive force F_g acts between the two objects.

\[
E_g = -G \frac{m}{r}, \qquad F_g = G \frac{m m'}{r^2} \tag{6.8}
\]

It is supposed that when a solution x exists, there is a potential for the objective U_f and a potential for congestion U_c at a distance r from the solution as follows:

\[
U_f = \frac{f(x)}{r^{p_d}} \tag{6.9}
\]

\[
U_c = \frac{1}{r^{p_d}} \tag{6.10}
\]

for simplicity.

When a set of solutions X = {x_1, x_2, ..., x_N} is given and the objective values f(x_i), i = 1, 2, ..., N are known, two potential functions at a point y can be defined as follows:

\[
U_f(y) = \sum_{i=1}^{N} \frac{f(x_i)}{d(x_i, y)^{p_d}} \tag{6.11}
\]

\[
U_c(y) = \sum_{i=1}^{N} \frac{1}{d(x_i, y)^{p_d}} \tag{6.12}
\]

where d(x_i, y) is the distance between x_i and y. U_f gives a measure of the function value at y, and U_c gives the congestion around the point y. If U_f is large, the function value tends to be large. If U_c is large, there are many points near y.

The approximation value f̂(y) at the point y can be defined as follows:

\[
\hat{f}(y) = U_f(y) / U_c(y) \tag{6.13}
\]
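The potential model of (6.11) to (6.13) amounts to an inverse-distance weighted average of the known objective values. The sketch below assumes a plain Euclidean distance and p_d = 1; the function names, the sample data, and the convention of returning the stored value when y coincides with a known point are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance (an assumed choice of d for this sketch)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def potential_estimate(y, points, values, p_d=1.0, distance=euclidean):
    """Rough estimate f_hat(y) = U_f(y) / U_c(y), following (6.11)-(6.13)."""
    u_f = 0.0
    u_c = 0.0
    for x, fx in zip(points, values):
        d = distance(x, y)
        if d == 0.0:
            return fx  # y coincides with a known point (assumed convention)
        weight = 1.0 / d ** p_d
        u_f += fx * weight
        u_c += weight
    return u_f / u_c


# Known samples of the hypothetical objective f(x) = x^2 at x = 0 and x = 2.
points = [(0.0,), (2.0,)]
values = [0.0, 4.0]
for y in ((0.5,), (1.0,)):
    print(y, "estimate:", potential_estimate(y, points, values), "true:", y[0] ** 2)
```

In this example the estimates differ considerably from the true values, yet the point with the smaller true value also receives the smaller estimate, which is the property the estimated comparison relies on.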

The estimated comparison is used to compare a new point with an old point. If the new point is better than the old one according to the approximation values, the new point is evaluated and the comparison result using true values is returned. Otherwise, the comparison returns no and the evaluation of the new point is omitted. This flow can be described as follows:

EstimatedBetter(new, old) {
    if (MaybeBetter(approximated new, approximated old)) {
        Evaluate new;
        if (Better(true new, true old)) return yes;
    }
    return no;
}
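The flow above can be made concrete with a short Python sketch for the unconstrained case. The function signature, the counter, the test objective, and the crude model `rough` are all hypothetical and serve only to show when a true evaluation is skipped.

```python
def estimated_better(child, parent, parent_value, approx, evaluate):
    """Return (is_better, child_value); child_value is None if evaluation was skipped."""
    if approx(child) < approx(parent):            # MaybeBetter on rough estimates
        child_value = evaluate(child)             # pay for a true evaluation only now
        return child_value < parent_value, child_value   # Better on true values
    return False, None


counter = {"calls": 0}


def sphere(x):
    """Hypothetical expensive objective; counts how often it is evaluated."""
    counter["calls"] += 1
    return x * x


rough = abs  # deliberately crude model that still preserves the order for this objective

parent = 1.0
parent_value = sphere(parent)
results = [estimated_better(c, parent, parent_value, rough, sphere)
           for c in (2.0, 0.5, -3.0)]
print(results, "true evaluations:", counter["calls"])
```

Only the child that looks promising according to the rough model triggers a true evaluation; the other two are rejected without evaluating the objective.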

When the true function values (f(x_i), φ(x_i)) of all points in P = {x_i, i = 1, 2, ..., N} are known and a new child point x'_i is generated from a parent point x_i, the approximation values at the points y ∈ {x'_i, x_i} are given as follows:

\[
\hat{f}(y) = U_f(y) / U_c(y) \tag{6.14}
\]

\[
U_f(y) = \sum_{j \ne i} \frac{f(x_j)}{d(x_j, y)} \tag{6.15}
\]

\[
U_c(y) = \sum_{j \ne i} \frac{1}{d(x_j, y)} \tag{6.16}
\]

Also, the approximation values of the constraint violation at the points x'_i and x_i are given as follows:

\[
U_\phi(y) = \sum_{j \ne i} \frac{\phi(x_j)}{d(x_j, y)} \tag{6.17}
\]

\[
\hat{\phi}(y) = U_\phi(y) / U_c(y) \tag{6.18}
\]

It should be noted that the parent point x_i (j = i) is omitted from the sums. If the parent point were not omitted, the approximation value at the parent point would be almost the true value. As a result, the difference between the precision of approximation at the parent point and that at the child point would become large, and it would be difficult to compare the approximation values.

When search points are far from the feasible region, the ε-level comparison gives precedence to the constraint violations. In this case, the constraint violation values are approximated. When search points are near the feasible region, the ε-level comparison gives precedence to the objective values. In this case, the objective values are approximated. The far case and the near case are distinguished by the number of feasible solutions. In this study, the near case is identified when the ratio of feasible solutions in the population is greater than or equal to 0.8. The estimated comparison for constrained optimization using the ε constrained method can be defined as follows:

EstimatedBetter_ε(x'_i, x_i, σ) {
    if (the number of feasible solutions ≥ 0.8N) {
        // approximation of objective function
        if (f̂(x'_i) < f̂(x_i) + δσ) {
            Evaluate x'_i;
            if ((f(x'_i), φ(x'_i)) <_ε (f(x_i), φ(x_i)))
                return yes;
        }
    }
    else {
        // approximation of constraint violation
        if (φ̂(x'_i) < φ̂(x_i) + 2δ|φ(x_i) − φ̂(x_i)|) {
            Evaluate x'_i;
            if ((f(x'_i), φ(x'_i)) <_ε (f(x_i), φ(x_i)))
                return yes;
        }
    }
    return no;
}

where the true value at the parent point (f(x_i), φ(x_i)) is known. In this study, the error margin for the objective value is defined based on the error level of the population. In contrast, the error margin for the constraint violation is defined based on the error level of each individual because it is thought that feasible solutions and infeasible solutions have different error levels. The error margin parameter δ ≥ 0 controls the margin value for the approximation error. When δ is 0, the estimated comparison can reject many children and omit a large number of function evaluations. However, the possibility of rejecting a good child becomes high and a true optimum might sometimes be skipped. When δ is large, the possibility of rejecting a good child becomes low. However, the estimated comparison then rejects fewer children and omits only a small number of function evaluations. Thus, δ should have a proper value.

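The estimation error $\sigma$ can be given as the standard deviation of the errors between the approximation values and the true values:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(e_i - \bar{e}\right)^2} \tag{6.19}$$

$$e_i = \hat{f}(x_i) - f(x_i), \qquad \bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i \tag{6.20}$$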
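As a small worked sketch of Eqs. (6.19) and (6.20), the estimation error can be computed from paired approximate and true values. The function name `error_std` is hypothetical, and the population form of the standard deviation (division by N) is assumed.

```python
import math
from typing import Sequence


def error_std(approx: Sequence[float], true: Sequence[float]) -> float:
    """Standard deviation of the errors e_i = approx_i - true_i (divisor N)."""
    n = len(approx)
    errors = [a - t for a, t in zip(approx, true)]
    mean_error = sum(errors) / n
    return math.sqrt(sum((e - mean_error) ** 2 for e in errors) / n)
```

Note that a constant bias in the approximation gives an estimation error of zero, because only the spread of the errors around their mean is measured. This suits a comparison-based use of the model, where a uniform offset does not change the ordering of two points.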

In the potential model, the current population $P$ is used as the set of solutions that have known objective values. As the search progresses, the area where individuals exist may become elliptical. In order to handle such a case, a normalized distance is introduced, in which the distance is normalized by the width of each dimension in the current population $P$:

$$d(x, y) = \sqrt{\sum_{j}\left(\frac{x_j - y_j}{\max_{x_i \in P} x_{ij} - \min_{x_i \in P} x_{ij}}\right)^2} \tag{6.21}$$
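The normalized distance can be sketched as follows. The function name `normalized_distance` is hypothetical, the Euclidean (square-root) form is assumed, and skipping dimensions whose width is zero is an added safeguard that the text does not discuss.

```python
import math
from typing import Sequence


def normalized_distance(x: Sequence[float], y: Sequence[float],
                        population: Sequence[Sequence[float]]) -> float:
    """Distance between x and y, scaled per dimension by the population width."""
    total = 0.0
    for j in range(len(x)):
        column = [individual[j] for individual in population]
        width = max(column) - min(column)
        if width == 0.0:
            continue  # assumption: ignore dimensions in which P has collapsed
        total += ((x[j] - y[j]) / width) ** 2
    return math.sqrt(total)
```

For a population stretched ten times further along the first axis than along the second, a step of 10 along the first axis and a step of 1 along the second contribute equally to the distance.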

In this section, DE is described first, and then the ε constrained DE with estimated comparison using the potential model (εDEpm) is defined.

Differential evolution (DE) was proposed by Storn and Price (1997). DE is a stochastic direct search method that uses a population of multiple search points. DE has been successfully applied to optimization problems including nonlinear, non-differentiable, non-convex, and multi-modal functions, and it has been shown that DE is fast and robust on these functions.

Several variants of DE have been proposed, and they are classified using the notation DE/base/num/cross, such as DE/rand/1/exp. "base" indicates the method of selecting the parent that will form the base vector. For example, DE/rand selects the parent for the base vector at random from the population, whereas DE/best selects the best individual in the population. In DE/rand/1, for each individual $x_i$, three individuals $x_{p1}$, $x_{p2}$, and $x_{p3}$ are chosen from the population so that they differ from $x_i$ and from each other. A new vector, or mutant vector, $x_m$ is generated from the base vector $x_{p1}$ and the difference vector $x_{p2} - x_{p3}$ as follows, where $F$ is a scaling factor:

$$x_m = x_{p1} + F\,(x_{p2} - x_{p3}) \tag{6.22}$$

"num" indicates the number of difference vectors used to perturb the base vector. "cross" indicates the crossover operation used to create a child. For example, "bin" indicates binomial crossover with a constant crossover rate, and "exp" indicates a kind of two-point crossover in which the crossover rate decreases exponentially. A new child $x'_i$ is generated from the parent $x_i$ and the mutant vector $x_m$, where $CR$ is the crossover rate.
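A DE/rand/1/exp step can be sketched as below. This follows the common textbook form of exponential crossover (copy consecutive mutant components, starting at a random position and wrapping around, while a uniform random number stays below CR); the function name `de_rand_1_exp` and the use of `random.Random` are illustrative assumptions rather than the chapter's own code.

```python
import random
from typing import List, Sequence


def de_rand_1_exp(pop: Sequence[Sequence[float]], i: int, F: float, CR: float,
                  rng: random.Random) -> List[float]:
    """Create one child for parent pop[i] by DE/rand/1/exp."""
    n = len(pop[i])
    # Three mutually distinct individuals, all different from the parent.
    p1, p2, p3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    # Mutation: base vector plus one scaled difference vector, Eq. (6.22).
    mutant = [pop[p1][j] + F * (pop[p2][j] - pop[p3][j]) for j in range(n)]
    # Exponential crossover: at least one component always comes from the mutant.
    child = list(pop[i])
    j = rng.randrange(n)
    for _ in range(n):
        child[j] = mutant[j]
        j = (j + 1) % n
        if rng.random() >= CR:
            break
    return child
```

With CR close to 1, long runs of mutant components are copied, so the child is close to the mutant; with CR close to 0, the child differs from the parent in a single component.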

εDEpm is the DE that adopts the ε constrained method and the estimated comparison using the potential model.

The algorithm of εDEpm is as follows:

1. Initialization of the individuals. Initial $N$ individuals $\{x_i, i = 1, 2, \ldots, N\}$ are randomly generated in the search space $S$ and form an initial population. All individuals are evaluated and their true values are obtained.

2. Initialization of the ε level. An initial ε level is given by the ε level control function $\varepsilon(0)$.

3. Termination condition. If the number of function evaluations exceeds the maximum number of evaluations $FE_{max}$, the algorithm is terminated.

4. DE operation. Each individual $x_i$ is selected as a parent. A trial vector, or child, $x'_i$ is generated by the DE/rand/1/exp operation with a scaling factor $F$ and a crossover rate $CR$.

5. Survivor selection. The estimated comparison is used for comparing the trial vector and the parent. The child $x'_i$ is accepted for the next generation if the estimated comparison judges it to be better than the parent $x_i$. Until all individuals have been selected, go back to 4 in order to select the next individual as a parent.

6. Control of the ε level. The ε level is updated by the ε level control function $\varepsilon(t)$.

7. Go back to 3.

The ε level is controlled according to Eqs. (6.23) and (6.24). The initial ε level $\varepsilon(0)$ is the constraint violation of the top $\theta$-th individual, $x_\theta$, in the initial search points. The ε level is updated until the number of iterations $t$ reaches the control generation $T_c$. After the number of iterations exceeds $T_c$, the ε level is set to 0 to obtain solutions with the minimum constraint violation.

$$\varepsilon(0) = \phi(x_\theta) \tag{6.23}$$

$$\varepsilon(t) = \begin{cases} \varepsilon(0)\left(1 - \dfrac{t}{T_c}\right)^{cp}, & 0 < t < T_c, \\ 0, & t \ge T_c \end{cases} \tag{6.24}$$

A small $\theta$ and a large $cp$ make convergence to the feasible region fast, although fast convergence may cause the search to be trapped in a local optimum. $\theta = 0.2N$ and $cp = 5$ are standard parameter values adopted in many studies (Takahama and Sakai 2006, 2010a; Takahama et al. 2006). This control is effective for solving problems with equality constraints.
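The ε level control of Eqs. (6.23) and (6.24) can be sketched as two small functions. The names `initial_eps` and `eps_level` are hypothetical, and "top θ-th individual" is read here as the θ-th smallest constraint violation in the initial population.

```python
from typing import Sequence


def initial_eps(violations: Sequence[float], theta: int) -> float:
    """Eq. (6.23): violation of the theta-th best (least violating) individual."""
    return sorted(violations)[theta - 1]


def eps_level(t: int, eps0: float, Tc: int, cp: float) -> float:
    """Eq. (6.24): shrink the epsilon level polynomially until generation Tc."""
    if t >= Tc:
        return 0.0
    return eps0 * (1.0 - t / Tc) ** cp
```

With the standard settings N = 40 and θ = 0.2N, the initial level is the eighth smallest violation. With cp = 5, the level has already dropped to 1/32 of its initial value halfway through the control period, which illustrates how a large cp tightens the constraints quickly.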

Figure 6.2 shows the algorithm of εDEpm.

εDEpm/rand/1/exp()
{
    // Initialize the individuals
    P = N individuals {$x_i$} randomly generated in S and evaluated;
    // Initialize the ε level
    $\varepsilon = \varepsilon(0)$;
    for (t = 1; termination condition is false; t++) {
        $\sigma$ = estimation of approximation error in P;
        for (i = 1; i $\le$ N; i++) {
            $x'_i$ = generated by DE/rand/1/exp operation;
            // estimated comparison
            if (EstimatedBetter$_\varepsilon$($x'_i$, $x_i$, $\sigma$)) $x_i = x'_i$;
        }
        // Control the ε level
        $\varepsilon = \varepsilon(t)$;
    }
}

Fig. 6.2 The algorithm of the ε constrained differential evolution with estimated comparison using potential model, where $\varepsilon(t)$ is the ε level control function

Thirteen benchmark problems that are mentioned in some studies (Mezura-Montes and Coello 2005; Runarsson and Yao 2000; Takahama and Sakai 2005a) are optimized, and the results obtained by εDEpm are compared with the results reported in those studies.

In the 13 benchmark problems, problems g03, g05, g11, and g13 contain equality constraints. In problems with equality constraints, the equality constraints are relaxed and converted into inequality constraints according to $|h_j(x)| \le 10^{-4}$, which is adopted in many methods. Problem g12 has disjoint feasible regions. Table 6.1 shows the outline of the 13 problems (Farmani and Wright 2003; Mezura-Montes and Coello 2005). The table contains the number of variables $n$, the form of the objective function, the number of linear inequality constraints (LI), nonlinear inequality constraints (NI), linear equality constraints (LE), and nonlinear equality constraints (NE), and the number of constraints active at the optimal solution.
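The effect of relaxing an equality constraint to |h_j(x)| ≤ 10⁻⁴ can be illustrated with g11. The sketch below assumes the commonly used definition of g11 (minimize x1² + (x2 − 1)² subject to x2 − x1² = 0), which is not restated in this section, and a constraint violation defined as a simple sum; the function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Constraint = Callable[[Sequence[float]], float]


def constraint_violation(x: Sequence[float],
                         inequalities: Sequence[Constraint] = (),
                         equalities: Sequence[Constraint] = (),
                         tol: float = 1e-4, p: int = 1) -> float:
    """Sum of violations; an equality h(x) = 0 is relaxed to |h(x)| <= tol."""
    total = 0.0
    for g in inequalities:                       # g(x) <= 0 means satisfied
        total += max(0.0, g(x)) ** p
    for h in equalities:                         # |h(x)| - tol <= 0 means satisfied
        total += max(0.0, abs(h(x)) - tol) ** p
    return total


def g11_objective(x: Sequence[float]) -> float:
    return x[0] ** 2 + (x[1] - 1.0) ** 2


def g11_equality(x: Sequence[float]) -> float:
    return x[1] - x[0] ** 2


# A point on the boundary of the relaxed region: x2 - x1^2 = 1e-4.
x_relaxed = (math.sqrt(0.5 - 1e-4), 0.5)
```

At `x_relaxed` the objective is 0.7499 while the relaxed constraint is still satisfied up to rounding. This is why results for g11 under the relaxation are reported as 0.749900 rather than as the exact optimum 0.750 of the unrelaxed problem.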

The parameters for εDEpm are as follows (Takahama and Sakai 2006, 2009b, 2010a): the number of search points $N = 40$, the maximum number of evaluations $FE_{max}$ = 100,000, the scaling factor $F = 0.7$, and the crossover rate $CR = 0.9$. The parameters for the ε constrained method are as follows: every constraint violation $\phi$ is defined as a simple sum of constraints, i.e., $p = 1$ in Eq. (6.3). The ε level is controlled using Eqs. (6.23) and (6.24) for problems with equality constraints, and $\varepsilon$ is 0 for the other problems. The control generation $T_c$ = 1,000, the control parameter $cp = 5$, and $\theta = 0.2N$. For the estimated comparison, the parameter for the potential model $p_d = 2$ and the margin parameter $\delta = 0.1$. In this paper, 30 independent runs are performed.

Table 6.1 Summary of test problems

f     n    Form of f     LI   NI     LE   NE   Active
g01   13   Quadratic     9    0      0    0    6
g02   20   Nonlinear     1    1      0    0    1
g03   10   Polynomial    0    0      0    1    1
g04   5    Quadratic     0    6      0    0    2
g05   4    Cubic         2    0      0    3    3
g06   2    Cubic         0    2      0    0    2
g07   10   Quadratic     3    5      0    0    6
g08   2    Nonlinear     0    2      0    0    0
g09   7    Polynomial    0    4      0    0    2
g10   8    Linear        3    3      0    0    6
g11   2    Quadratic     0    0      0    1    1
g12   3    Quadratic     0    $9^3$  0    0    0
g13   5    Nonlinear     0    0      1    2    3

Table 6.2 Experimental results on 13 benchmark problems using standard settings; 30 independent runs were performed

f     Optimal       Best            Median          Mean            Worst           st. dev.
g01   -15.000       -15.000000      -15.000000      -15.000000      -15.000000      4.193e-12
g02   -0.803619     -0.803547       -0.803056       -0.802406       -0.790861       2.255e-03
g03   -1.000        -1.000500       -1.000500       -1.000500       -1.000499       1.134e-07
g04   -30665.539    -30665.538672   -30665.538672   -30665.538672   -30665.538672   0.000e+00
g05   5126.498      5126.496714     5126.496714     5126.496714     5126.496714     0.000e+00
g06   -6961.814     -6961.813876    -6961.813876    -6961.813876    -6961.813876    2.803e-12
g07   24.306        24.306209       24.306209       24.306210       24.306214       1.215e-06
g08   -0.095825     -0.095825       -0.095825       -0.095825       -0.095825       0.000e+00
g09   680.630       680.630057      680.630057      680.630057      680.630057      0.000e+00
g10   7049.248      7049.248021     7049.248021     7049.248021     7049.248026     1.028e-06
g11   0.750         0.749900        0.749900        0.749900        0.749900        0.000e+00
g12   -1.000000     -1.000000       -1.000000       -1.000000       -1.000000       0.000e+00
g13   0.053950      0.0539415       0.0539415       0.0539415       0.0539415       0.000e+00

Table 6.2 summarizes the experimental results. The table shows the known optimal solution for each problem and the statistics from the 30 independent runs. These include the best, median, mean, and worst values and the standard deviation of the objective values found.

For problems g01, g04, g05, g06, g08, g09, g11, g12, and g13, optimal solutions are found consistently in all 30 runs. For problems g03, g07, and g10, optimal or near-optimal solutions are found in all 30 runs. These results show that εDEpm is an efficient and stable algorithm. As for g02, it is a multi-modal problem that has many local optima with peaks near the global optimum within the feasible region. Many other methods cannot consistently obtain near-optimal solutions, but εDEpm attained about -0.802 on average within 100,000 FEs. Thus, εDEpm is thought to have a high ability to solve multi-modal problems.

In order to show the effectiveness of εDEpm, the number of function evaluations (FEs) that εDEpm needs to find a near-optimal solution is compared with that of the original εDE, which does not use function approximation. Also, εDEpm is compared with εDEpm without the approximation of the constraint violation, or εDEpm-φ, in which $x'_i$ is always evaluated when the number of feasible solutions is small.

The number of evaluations of the objective function and of the constraints needed to reach a near-optimal solution, where the difference between the objective value of the near-optimal solution and that of the optimal solution is within $10^{-4}$, is shown in Table 6.3. The average numbers of evaluations of the objective function and of the constraints over 30 runs are shown in the columns labeled #func and #const, respectively. The standard deviations of the numbers of evaluations are shown in parentheses. Also, the ratios of the FEs of εDEpm and εDEpm-φ to the FEs of εDE, together with the statistical significance, are shown under the standard deviations. Statistical differences between εDEpm and εDEpm-φ and between εDEpm and εDE according to Welch's t-test are shown by ++/--, +/-, and ≈, which denote significantly different (smaller/greater) with p-value $p < 0.01$, significantly different (smaller/greater) with $p < 0.05$, and otherwise, respectively.
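As an illustration of the test, Welch's t statistic and its approximate degrees of freedom can be computed directly from summary statistics such as those reported in Table 6.3. The function name `welch_t` is hypothetical, and the reported deviations are assumed to be sample standard deviations over 30 runs.

```python
import math
from typing import Tuple


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1          # squared standard error of the first mean
    v2 = sd2 ** 2 / n2          # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# g01, #const: the first method needs 44099.2 (1250.4) evaluations,
# the baseline 58135.3 (1306.0), each averaged over 30 runs.
t_g01, df_g01 = welch_t(44099.2, 1250.4, 30, 58135.3, 1306.0, 30)
```

For this pair the statistic is strongly negative, with roughly 58 degrees of freedom, which is consistent with a difference that is significant at the 1 % level. Obtaining the p-value itself requires the Student t distribution, which is left to a statistics library.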

Apparently, εDEpm attained the best results, followed by εDEpm-φ. εDEpm is statistically faster than εDE in 12 problems and faster than εDEpm-φ in 9 problems. εDEpm can reduce the evaluation of the constraints by about 5–50 % compared with εDE, whereas εDEpm-φ can reduce it by 0 to about 45 %. Also, εDEpm can reduce the evaluation of the objective function by about 15–50 % compared with εDE, whereas εDEpm-φ can reduce it by about 0–45 %.

These results show that the potential model is effective not only for the objective function but also for the constraint violation. Thus, the potential model is thought to be a general-purpose rough approximation model.

In the ε constrained method, the objective function and the constraints are treated separately. Therefore, when the order relation of the search points can be decided by the constraint violation alone, the objective function need not be evaluated, so its evaluation can often be omitted. Thus, the number of evaluations of the objective function is less than the number of evaluations of the constraints. This nature of the ε constrained method contributes to the efficiency of the algorithm, especially when the objective function is computationally demanding. The number of evaluations of the constraint violations needed to find the near-optimal solution ranged from about 500 to 120,000, while the number of evaluations of the objective function ranged from about 200 to 50,000. For these problems, εDEpm can omit about 15–90 % of the evaluations of the objective function. Therefore, εDEpm can find optimal solutions very efficiently, especially from the viewpoint of the number of evaluations of the objective function.

Table 6.3 Comparison of the number of FEs to attain within $10^{-4}$ error from the optimal value

f     εDEpm                       εDEpm-φ                     εDE
      #const        #func         #const        #func         #const        #func
g01   44099.2       13626.1       45899.6       13782.8       58135.3       16667.1
      (1250.4)      (344.9)       (1411.9)      (375.8)       (1306.0)      (293.6)
      0.76,++,++    0.82,≈,++     0.79          0.83          1.00          1.00
g02   123382.6      51697.8       123382.6      51697.8       148677.6      59273.8
      (11190.3)     (4062.7)      (11190.3)     (4062.7)      (13972.9)     (5224.9)
      0.83,≈,++     0.87,≈,++     0.83          0.87          1.00          1.00
g03   39489.8       11827.3       38707.7       13587.7       40566.8       13818.7
      (9040.0)      (483.2)       (2530.4)      (287.3)       (3575.5)      (341.5)
      0.97,≈,≈      0.86,++,++    0.95          0.98          1.00          1.00
g04   13556.8       5087.9        13589.1       5061.7        24063.7       9410.9
      (671.1)       (240.9)       (494.9)       (169.8)       (1124.7)      (326.1)
      0.56,≈,++     0.54,≈,++     0.56          0.54          1.00          1.00
g05   25007.6       10173.6       38502.9       13663.1       38502.9       13663.1
      (1435.7)      (537.5)       (409.4)       (225.8)       (409.4)       (225.8)
      0.65,++,++    0.74,++,++    1.00          1.00          1.00          1.00
g06   3344.7        1468.5        4110.0        1418.2        6336.6        3058.8
      (251.8)       (176.4)       (249.0)       (118.6)       (366.5)       (201.8)
      0.53,++,++    0.48,≈,++     0.65          0.46          1.00          1.00
g07   54781.8       15278.5       56584.8       15443.9       71619.5       19851.5
      (4487.7)      (1194.8)      (3509.1)      (878.9)       (4163.2)      (1051.2)
      0.76,≈,++     0.77,≈,++     0.79          0.78          1.00          1.00
g08   462.4         206.2         713.3         212.1         946.0         397.8
      (85.9)        (67.8)        (82.6)        (54.8)        (142.5)       (108.5)
      0.49,++,++    0.52,≈,++     0.75          0.53          1.00          1.00
g09   14700.6       7047.1        15662.9       7225.8        21177.6       9947.2
      (873.3)       (398.2)       (946.7)       (409.5)       (959.0)       (439.3)
      0.69,++,++    0.71,≈,++     0.74          0.73          1.00          1.00
g10   45332.1       7975.0        48126.4       8095.5        62695.3       10466.0
      (2872.1)      (463.2)       (3182.2)      (577.7)       (3647.7)      (578.9)
      0.72,++,++    0.76,≈,++     0.77          0.77          1.00          1.00
g11   10302.3       8681.2        17105.3       12380.3       17105.3       12380.3
      (3335.6)      (2684.1)      (5476.2)      (4027.3)      (5476.2)      (4027.3)
      0.60,++,++    0.70,++,++    1.00          1.00          1.00          1.00
g12   2127.7        207.4         2447.7        218.7         4041.9        370.0
      (419.1)       (60.4)        (532.9)       (55.6)        (1122.6)      (105.8)
      0.53,+,++     0.56,≈,++     0.61          0.59          1.00          1.00
g13   22304.5       7618.8        33869.8       11662.2       33869.8       11662.2
      (1049.0)      (1211.1)      (691.6)       (1133.7)      (691.6)       (1133.7)
      0.66,++,++    0.65,++,++    1.00          1.00          1.00          1.00

There are some methods that have solved the same 13 problems. Among them, for comparative studies we chose the simple multi-membered evolution strategy (SMES) proposed by Mezura-Montes and Coello (2005), the adaptive trade-off model (ATMES) proposed by Wang et al. (2008), the multi-objective method (HCOEA) proposed by Wang et al. (2007), ECHT-EP2 proposed by Mallipeddi and Suganthan (2010), and the εDE proposed by Takahama and Sakai (2009b), because the results of these methods are better than the results of the other methods and they report good-quality statistical information. Also, A-DDE proposed by Mezura-Montes and Palomeque-Ortiz (2009), which adopts adaptive parameter control, is included in the comparison.

Table 6.4 Comparison of statistical results among the εDEpm, the εDE, SMES, ATMES, HCOEA, ECHT-EP2, and A-DDE

f & optimal       Stat.     εDEpm          εDE            SMES           ATMES          HCOEA          ECHT-EP2       A-DDE
                  FE_max    100,000        200,000        240,000        240,000        240,000        240,000        180,000
g01 -15.000       Best      -15.000000     -15.000000     -15.000        -15.000        -15.000000     -15.0000       -15.000
                  Median    -15.000000     -15.000000     -15.000        -15.000        -15.000000     -15.0000       -15.000
                  Mean      -15.000000     -15.000000     -15.000        -15.000        -15.000000     -15.0000       -15.000
                  Worst     -15.000000     -15.000000     -15.000        -15.000        -14.999998     -15.0000       -15.000
                  St. dev.  4.19e-12       0.00e+00       0.00e+00       1.6e-14        4.297e-07      0.00e+00       7.00e-06
g02 -0.803619     Best      -0.803547      -0.803618      -0.803601      -0.803388      -0.803241      -0.8036191     -0.803605
                  Median    -0.803056      -0.803614      -0.792549      -0.792420      -0.802556      -0.8033239     -0.777368
                  Mean      -0.802406      -0.803613      -0.785238      -0.790148      -0.801258      -0.7998220     -0.771090
                  Worst     -0.790861      -0.803588      -0.751322      -0.756986      -0.792363      -0.7851820     -0.609853
                  St. dev.  2.26e-03       5.59e-06       1.67e-02       1.3e-02        3.832e-03      6.29e-03       3.66e-02
g03 -1.0005       Best      -1.000500      -1.000500      -1.000         -1.000         -1.000000      -1.0005        -1.000
                  Median    -1.000500      -1.000500      -1.000         -1.000         -1.000000      -1.0005        -1.000
                  Mean      -1.000500      -1.000500      -1.000         -1.000         -1.000000      -1.0005        -1.000
                  Worst     -1.000499      -1.000500      -1.000         -1.000         -1.000000      -1.0005        -1.000
                  St. dev.  1.13e-07       6.46e-09       2.09e-04       5.9e-05        1.304e-12      0.0e+00        9.30e-12
g04 -30665.5387   Best      -30665.538672  -30665.538670  -30665.539     -30665.539     -30665.539     -30665.5387    -30665.539
                  Median    -30665.538672  -30665.538670  -30665.539     -30665.539     -30665.539     -30665.5387    -30665.539
                  Mean      -30665.538672  -30665.538670  -30665.539     -30665.539     -30665.539     -30665.5387    -30665.539
                  Worst     -30665.538672  -30665.538670  -30665.539     -30665.539     -30665.539     -30665.5387    -30665.539
                  St. dev.  0.00e+00       0.00e+00       0.00e+00       7.4e-12        5.404e-07      0.0e+00        3.20e-13
g05 5126.4967     Best      5126.496714    5126.496714    5126.599       5126.498       5126.4981      5126.4967      5126.497
                  Median    5126.496714    5126.496714    5160.198       5126.776       5126.4981      5126.4967      5126.497
                  Mean      5126.496714    5126.496714    5174.492       5127.648       5126.4981      5126.4967      5126.497
                  Worst     5126.496714    5126.496714    5304.167       5135.256       5126.4984      5126.4967      5126.497
                  St. dev.  0.00e+00       1.82e-12       5.006e+01      1.8e+00        1.727e-07      0.0e+00        2.10e-11
g06 -6961.8139    Best      -6961.813876   -6961.813876   -6961.814      -6961.814      -6961.81388    -6961.8139     -6961.814
                  Median    -6961.813876   -6961.813876   -6961.814      -6961.814      -6961.81388    -6961.8139     -6961.814
                  Mean      -6961.813876   -6961.813876   -6961.284      -6961.814      -6961.81388    -6961.8139     -6961.814
                  Worst     -6961.813876   -6961.813876   -6952.482      -6961.814      -6961.81388    -6961.8139     -6961.814
                  St. dev.  2.80e-12       0.00e+00       1.85e+00       4.6e-12        8.507e-12      0.00e+00       2.11e-12
g07 24.3062       Best      24.306209      24.306209      24.327         24.306         24.3064582     24.3062        24.306
                  Median    24.306209      24.306209      24.426         24.313         24.3073055     24.3063        24.306
                  Mean      24.306210      24.306209      24.475         24.316         24.3073989     24.3063        24.306
                  Worst     24.306214      24.306209      24.843         24.359         24.3092401     24.3063        24.306
                  St. dev.  1.22e-06       4.27e-09       1.32e-01       1.1e-02        7.118e-04      3.19e-05       4.20e-05
g08 -0.095825     Best      -0.095825      -0.095825      -0.095825      -0.095825      -0.095825      -0.09582504    -0.095825
                  Median    -0.095825      -0.095825      -0.095825      -0.095825      -0.095825      -0.09582504    -0.095825
                  Mean      -0.095825      -0.095825      -0.095825      -0.095825      -0.095825      -0.09582504    -0.095825
                  Worst     -0.095825      -0.095825      -0.095825      -0.095825      -0.095825      -0.09582504    -0.095825
                  St. dev.  0.00e+00       0.00e+00       0.00e+00       2.8e-17        2.417e-17      0.0e+00        9.10e-10
g09 680.630057    Best      680.630057     680.630057     680.632        680.630        680.6300574    680.630057     680.63
                  Median    680.630057     680.630057     680.642        680.633        680.6300574    680.630057     680.63
                  Mean      680.630057     680.630057     680.643        680.639        680.6300574    680.630057     680.63
                  Worst     680.630057     680.630057     680.719        680.673        680.6300578    680.630057     680.63
                  St. dev.  0.00e+00       0.00e+00       1.55e-02       1.0e-02        9.411e-08      2.61e-08       1.15e-10
g10 7049.248      Best      7049.248021    7049.248021    7051.903       7052.253       7049.286598    7049.2483      7049.248
                  Median    7049.248021    7049.248021    7253.603       7215.357       7049.486145    7049.2488      7049.248
                  Mean      7049.248021    7049.248021    7253.047       7250.437       7049.525438    7049.2490      7049.248
                  Worst     7049.248026    7049.248021    7638.366       7560.224       7049.984208    7049.2501      7049.248
                  St. dev.  1.03e-06       0.00e+00       1.36e+02       1.2e+02        1.502e-01      6.60e-04       3.23e-4
g11 0.749900      Best      0.749900       0.749900       0.75           0.75           0.750000       0.7499         0.75
                  Median    0.749900       0.749900       0.75           0.75           0.750000       0.7499         0.75
                  Mean      0.749900       0.749900       0.75           0.75           0.750000       0.7499         0.75
                  Worst     0.749900       0.749900       0.75           0.75           0.750000       0.7499         0.75
                  St. dev.  0.00e+00       0.00e+00       1.52e-04       3.4e-04        1.546e-12      0.0e+00        5.35e-15
g12 -1.000        Best      -1.000000      -1.000000      -1.0000        -1.000         -1.000000      -1.0000        -1.000
                  Median    -1.000000      -1.000000      -1.0000        -1.000         -1.000000      -1.0000        -1.000
                  Mean      -1.000000      -1.000000      -1.0000        -1.000         -1.000000      -1.0000        -1.000
                  Worst     -1.000000      -1.000000      -1.0000        -0.994         -1.000000      -1.0000        -1.000
                  St. dev.  0.00e+00       0.00e+00       0.00e+00       1.0e-03        0.00e+00       0.0e+00        4.10e-11
g13 0.0539415     Best      0.0539415      0.053942       0.053986       0.053950       0.0539498      0.0539415      0.053942
                  Median    0.0539415      0.053942       0.061873       0.053952       0.0539498      0.0539415      0.053942
                  Mean      0.0539415      0.053942       0.166385       0.053959       0.0539498      0.0539415      0.079627
                  Worst     0.0539415      0.053942       0.468294       0.053999       0.0539499      0.0539415      0.438803
                  St. dev.  0.00e+00       0.00e+00       1.77e-01       1.3e-05        8.678e-08      1.00e-12       9.60e-02

Table 6.4 shows the best, median, mean, and worst values and the standard deviation for the seven methods. The maximum number of FEs for each method is also shown, in the row labeled $FE_{max}$.

All methods found optimal solutions in all 30 runs for g01, g03, g04, g08, g11, and g12. In the other problems, from the viewpoint of the quality of solutions, εDE is thought to be the best method, followed by ECHT-EP2 and εDEpm, where the difference between ECHT-EP2 and εDEpm is very small. However, the number of function evaluations in εDEpm is very small, only about half of that in εDE and ECHT-EP2. Thus, εDEpm is thought to be better than εDE and ECHT-EP2 from the viewpoint of efficiency.

6.7 Conclusions

In order to utilize a rough approximation model in constrained optimization, a new scheme that combines the ε constrained method and the estimated comparison using the potential model is proposed. The potential model is used to approximate not only the objective function but also the constraint violation. This idea is introduced into differential evolution, which is known as a simple, efficient, and robust search algorithm that can solve unconstrained optimization problems, and εDEpm is proposed.

It is shown that εDEpm could solve 13 benchmark problems most efficiently compared with many other methods. Also, it is shown that the potential model is a general-purpose rough approximation model and that the approximation of both the objective function and the constraint violation can improve the efficiency of εDE.

In the future, we will apply εDEpm to various real-world problems that have expensive objective functions.

Acknowledgments This research is supported in part by Grant-in-Aid for Scientific Research (C) (No. 24500177, 26350443) of the Japan Society for the Promotion of Science and by the Hiroshima City University Grant for Special Academic Research (General Studies).

References

Aguirre AH, Rionda SB, Coello CAC, Lizárraga GL, Montes EM (2004) Handling constraints using multiobjective optimization concepts. Int J Numer Methods Eng 59(15):1989–2017

Büche D, Schraudolph NN, Koumoutsakos P (2005) Accelerating evolutionary algorithms with Gaussian process fitness function models. IEEE Trans Syst, Man, Cybern, Part C: Appl Rev 35(2):183–194

Camponogara E, Talukdar SN (1997) A genetic algorithm for constrained and multiobjective optimization. In: Alander JT (ed) 3rd Nordic workshop on genetic algorithms and their applications (3NWGA), University of Vaasa, Vaasa, pp 49–62

Coello CAC (2000a) Constraint-handling using an evolutionary multiobjective optimization technique. Civ Eng Environ Syst 17:319–346

Coello CAC (2000b) Use of a self-adaptive penalty approach for engineering optimization problems. Comput Ind 41(2):113–127

Coello CAC (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287

Deb K (2000) An efficient constraint handling method for genetic algorithms. Comput Methods Appl Mech Eng 186(2/4):311–338

Farmani R, Wright JA (2003) Self-adaptive fitness formulation for constrained optimization. IEEE Trans Evol Comput 7(5):445–455

Guimarães FG, Wanner EF, Campelo F, Takahashi RH, Igarashi H, Lowther DA, Ramírez JA (2006) Local learning and search in memetic algorithms. In: Proceedings of the 2006 IEEE congress on evolutionary computation, Vancouver, pp 9841–9848

Homaifar A, Lai SHY, Qi X (1994) Constrained optimization via genetic algorithms. Simulation 62(4):242–254

Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput 9:3–12

Jin Y, Olhofer M, Sendhoff B (2000) On evolutionary optimization with approximate fitness functions. In: Proceedings of the genetic and evolutionary computation conference. Morgan Kaufmann, pp 786–792

Jin Y, Olhofer M, Sendhoff B (2002) A framework for evolutionary optimization with approximate fitness functions. IEEE Trans Evol Comput 6(5):481–494

Jin Y, Sendhoff B (2004) Reducing fitness evaluations using clustering techniques and neural networks ensembles. In: Genetic and evolutionary computation conference. LNCS, vol 3102, Springer, pp 688–699

Joines J, Houck C (1994) On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with GAs. In: Fogel D (ed) Proceedings of the first IEEE conference on evolutionary computation. IEEE Press, Orlando, pp 579–584

Mallipeddi R, Suganthan PN (2010) Ensemble of constraint handling techniques. IEEE Trans Evol Comput 14(4):561–579

Mezura-Montes E, Coello CAC (2005) A simple multimembered evolution strategy to solve constrained optimization problems. IEEE Trans Evol Comput 9(1):1–17

Mezura-Montes E, Coello CAC (2011) Constraint-handling in nature-inspired numerical optimization: past, present and future. Swarm Evol Comput 1:173–194

Mezura-Montes E, Palomeque-Ortiz AG (2009) Parameter control in differential evolution for constrained optimization. In: Proceedings of the 2009 IEEE congress on evolutionary computation, pp 1375–1382

Michalewicz Z (1995) A survey of constraint handling techniques in evolutionary computation methods. In: Proceedings of the 4th annual conference on evolutionary programming. The MIT Press, Cambridge, pp 135–155

Michalewicz Z, Attia N (1994) Evolutionary optimization of constrained problems. In: Sebald A, Fogel L (eds) Proceedings of the 3rd annual conference on evolutionary programming. World Scientific Publishing, River Edge, pp 98–108

Ong YS, Zhou Z, Lim D (2006) Curse and blessing of uncertainty in evolutionary algorithm using approximation. In: Proceedings of the 2006 IEEE congress on evolutionary computation, Vancouver, pp 9833–9840

Ray T, Liew KM, Saini P (2002) An intelligent information sharing strategy within a swarm for unconstrained and constrained optimization problems. Soft Comput - Fusion Found, Methodol Appl 6(1):38–44

Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Trans Evol Comput 4(3):284–294

Runarsson TP, Yao X (2003) Evolutionary search and constraint violations. In: Proceedings of the 2003 congress on evolutionary computation, vol 2. IEEE Service Center, Piscataway, New Jersey, pp 1414–1419

Sakai S, Takahama T (2010) A parametric study on estimated comparison in differential evolution with rough approximation model. In: Kitahara M, Morioka K (eds) Social systems solution by legal informatics. Economic sciences and computer sciences, Kyushu University Press, Fukuoka, pp 112–134

Storn R, Price K (1997) Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11:341–359

Surry PD, Radcliffe NJ (1997) The COMOGA method: constrained optimisation by multiobjective genetic algorithms. Control Cybern 26(3):391–412

Takahama T, Sakai S (2000) Tuning fuzzy control rules by the α constrained method which solves constrained nonlinear optimization problems. Electron Commun Japan, Part 3: Fundam Electron Sci 83(9):1–12

Takahama T, Sakai S (2005a) Constrained optimization by applying the α constrained method to the nonlinear simplex method with mutations. IEEE Trans Evol Comput 9(5):437–451

Takahama T, Sakai S (2005b) Constrained optimization by ε constrained particle swarm optimizer with ε-level control. In: Proceedings of the 4th IEEE international workshop on soft computing as transdisciplinary science and technology (WSTST'05), pp 1019–1029

Takahama T, Sakai S (2006) Constrained optimization by the ε constrained differential evolution with gradient-based mutation and feasible elites. In: Proceedings of the 2006 IEEE congress on evolutionary computation, pp 308–315

Takahama T, Sakai S (2008a) Efficient optimization by differential evolution using rough approximation model with adaptive control of error margin. In: Proceedings of the joint 4th international conference on soft computing and intelligent systems and 9th international symposium on advanced intelligent systems, pp 1412–1417

Takahama T, Sakai S (2008b) Reducing function evaluations in differential evolution using rough approximation-based comparison. In: Proceedings of the 2008 IEEE congress on evolutionary computation, pp 2307–2314

Takahama T, Sakai S (2009a) A comparative study on kernel smoothers in differential evolution with estimated comparison method for reducing function evaluations. In: Proceedings of the 2009 IEEE congress on evolutionary computation, pp 1367–1374

Takahama T, Sakai S (2009b) Fast and stable constrained optimization by the ε constrained differential evolution. Pac J Optim 5(2):261–282

Takahama T, Sakai S (2010a) Constrained optimization by the ε constrained differential evolution with an archive and gradient-based mutation. In: Proceedings of the 2010 IEEE congress on evolutionary computation, pp 1680–1688

Takahama T, Sakai S (2010b) Efficient constrained optimization by the ε constrained adaptive differential evolution. In: Proceedings of the 2010 IEEE congress on evolutionary computation, pp 2052–2059

Takahama T, Sakai S (2010c) Reducing function evaluations using adaptively controlled differential evolution with rough approximation model. In: Tenne Y, Goh C-K (eds) Computational intelligence in expensive optimization problems. Adaptation learning and optimization, vol 2. Springer, Berlin, pp 111–129

Takahama T, Sakai S (2013) Efficient constrained optimization by the ε constrained differential evolution with rough approximation using kernel regression. In: Proceedings of the 2013 IEEE congress on evolutionary computation, pp 62–69

Takahama T, Sakai S, Iwane N (2006) Solving nonlinear constrained optimization problems by the ε constrained differential evolution. In: Proceedings of the 2006 IEEE adaptation learning and optimization, pp 2322–2327

Tessema B, Yen G (2006) A self adaptive penalty function based algorithm for constrained optimization. In: Yen GG, Lucas SM, Fogel G, Kendall G, Salomon R, Zhang B-T, Coello CAC, Runarsson TP (eds) Proceedings of the 2006 IEEE congress on evolutionary computation. IEEE Press, Vancouver, pp 246–253

Venkatraman S, Yen GG (2005) A generic framework for constrained optimization using genetic algorithms. IEEE Trans Evol Comput 9(4):424–435

Wang Y, Cai Z, Guo G, Zhou Y (2007) Multiobjective optimization and hybrid evolutionary algorithm to solve constrained optimization problems. IEEE Trans Syst, Man Cybern, Part B 37(3):560–575

Wang Y, Cai Z, Zhou Y, Zeng W (2008) An adaptive tradeoff model for constrained evolutionary computation. IEEE Trans Evol Comput 12(1):80–92

Chapter 7

of Multi-recombinative Evolution Strategies

Applied to a Conically Constrained Problem

Jeremy Porter and Dirk V. Arnold

Abstract Many step size adaptation techniques for evolution strategies have been developed with unconstrained optimization problems in mind. In constrained settings, the interplay between step size adaptation and constraint handling is both of crucial importance and often not well understood. We consider a linear optimization problem with a feasible region defined by a right circular cone symmetric about the gradient direction, such that the optimal solution is located at the cone's apex. We provide a detailed analysis of the behaviour of a multi-recombinative evolution strategy that employs cumulative step size adaptation and a simple constraint handling technique. The results allow studying the influence of parameters of both the problem class at hand, such as the angle at the cone's apex, and of the strategy considered, including its population size parameters. The impact of assuming different models for the cost of objective and constraint function evaluations is discussed.

Keywords Evolution strategy · Constraint handling · Cumulative step size adaptation · Conically constrained problem

7.1 Introduction

While numerous constraint handling techniques used in connection with evolution

strategies exist and are in common use (compare Mezura-Montes and Coello Coello

(2011)), the understanding of their properties lags behind that of strategy variants for

unconstrained problems. Of particular significance for the success of the strategies is

the interaction between step size adaptation and constraint handling technique. Generally, convergence to non-stationary points is more easily avoided in unconstrained

settings than in constrained ones.

J. Porter (✉) · D.V. Arnold
Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada
e-mail: jporter@cs.dal.ca

D.V. Arnold
e-mail: dirk@cs.dal.ca

© Springer India 2015
R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization, Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_7


One tool for analyzing the behaviour of stochastic optimization algorithms, including evolution strategies, is the dynamical systems approach comprehensively

described by Meyer-Nieberg and Beyer (2012). In essence, the approach considers

test functions that pose interesting challenges to optimization strategies while being

simple enough to yield interpretable results. Test functions considered usually exhibit

strong symmetries, making it possible to describe the behaviour of adaptive optimization strategies applied to them in terms of dynamical systems with low-dimensional

state spaces. By choosing the state variables appropriately, the evolution equations

generate a time-invariant Markov process with a stationary limit distribution. That

limit distribution is expanded in terms of its moments, moments after an iteration are

computed as functions of those before, and stationarity is enforced by equating one

set to the other. The result is a system of as many equations as there are moments

considered in the expansion of the distribution. Solving the system for the unknowns

yields interpretable results that can be studied numerically.

As in unconstrained environments, the complexity of both the settings and the algorithms considered has been increasing gradually. Early work, such as that of Rechenberg (1973), Schwefel (1981), and Beyer (1989), analyzes $(1 + 1)$ and $(1, \lambda)$ evolution strategies¹ in connection with simple constrained problems where the normal vectors of the constraint planes are perpendicular to the gradient of the objective function. In more recent work, Arnold and Brauer (2008) and Arnold (2011b) consider the same strategies for a linear problem with a single linear constraint of general orientation. One of the simplest methods for constraint handling is used, which is to resample infeasible offspring until they are feasible. In the same environment, constraint handling through the projection of infeasible candidate solutions onto the feasible region is analyzed by Arnold (2011a), who finds fundamental differences between the two constraint handling approaches when used in connection with cumulative step size adaptation. Arnold (2013b) extends the analyses by considering the $(\mu/\mu, \lambda)$-ES, which selects more than a single candidate solution per iteration and employs multi-recombination.

Commonly used test problems in benchmarking studies of evolutionary algorithms, such as those considered by Michalewicz and Schoenauer (1996), have optimal solutions located on the boundary of the feasible region. Often, there is more than a single linear constraint active at the location of the optimum. In an attempt to model such situations, Arnold (2013a) considers the behaviour of the $(1, \lambda)$-ES for a problem where the feasible region is bounded by a right circular cone and the optimum is located at the cone's apex. In this work, we expand this to the more general case of the $(\mu/\mu, \lambda)$-ES with $\mu \geq 1$, yielding new insights with regard to the use of non-singleton populations and multi-recombination. Moreover, we consider different models for the cost of objective and constraint function evaluations, and their impact on optimal strategy behaviour.

The remainder of this chapter is organized as follows. In Sect. 7.2, we give an overview of the $(\mu/\mu, \lambda)$-ES algorithm with cumulative step size adaptation, as well as a description of the optimization problem we will consider. In Sect. 7.3, we describe the expected behaviour of a single step of the iterative algorithm. Section 7.4 expands on these results to model the strategy as a Markov process and describes its steady state for scale invariant step size. Section 7.5 expands that analysis to consider cumulative step size adaptation, and derives update rules for related quantities. Finally, in Sect. 7.6 we provide a summary of our results and discuss their implications. An Appendix contains details of computations related to Sect. 7.3.

¹ See Beyer and Schwefel (2002) for an overview of evolution strategy terminology.

In this section, we first give a brief description of the $(\mu/\mu, \lambda)$-ES with cumulative step size adaptation. We then define the constrained optimization problem considered in the remainder of the chapter.

7.2.1 Algorithm

The $(\mu/\mu, \lambda)$-ES with cumulative step size adaptation (CSA) is an iterative algorithm for solving $N$-dimensional, real-valued optimization problems. The variant considered throughout this chapter resamples infeasible offspring candidate solutions until they are feasible (compare Oyman et al. (1999)). Its state is described by the population centroid $\mathbf{x} \in \mathbb{R}^N$, the step size $\sigma \in \mathbb{R}$, and the search path $\mathbf{s} \in \mathbb{R}^N$. A single iteration is described in detail in Algorithm 1.

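Algorithm 1 Single iteration of $(\mu/\mu, \lambda)$-ES with CSA
Input: $f : \mathbb{R}^N \to \mathbb{R}$
1: for $k = 1, \ldots, \lambda$ do
2:   repeat (resample until feasible)
3:     $\mathbf{z}^{(k)} = \mathcal{N}(\mathbf{0}, \mathbf{I})$
4:     $\mathbf{x}^{(k)} = \mathbf{x} + \sigma \mathbf{z}^{(k)}$
5:   until IsFeasible($\mathbf{x}^{(k)}$)
6: endfor
7: sort $[\mathbf{z}^{(1)}, \ldots, \mathbf{z}^{(\lambda)}]$, $[f(\mathbf{x}^{(1)}), \ldots, f(\mathbf{x}^{(\lambda)})]$ (sort $\mathbf{z}^{(k)}$ by values of $f(\mathbf{x}^{(k)})$)
8: $\langle \mathbf{z} \rangle = \frac{1}{\mu} \sum_{k=1}^{\mu} \mathbf{z}^{(k)}$
9: $\mathbf{x} = \mathbf{x} + \sigma \langle \mathbf{z} \rangle$
10: $\mathbf{s} = (1 - c)\,\mathbf{s} + \sqrt{\mu c (2 - c)}\, \langle \mathbf{z} \rangle$ (update $\mathbf{s}$)
11: $\sigma = \sigma \exp\left( \frac{\|\mathbf{s}\|^2 - N}{2 D N} \right)$ (update $\sigma$)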

In each iteration, $\lambda$ offspring candidate solutions are generated by sampling normally distributed random vectors in the neighbourhood of the population centroid $\mathbf{x} \in \mathbb{R}^N$. If a candidate solution generated is infeasible, it is resampled until a feasible offspring candidate solution has been generated (Lines 1–6). Parameter $\sigma$ determines the variance and thereby the step size of the strategy; vectors $\mathbf{z}^{(k)}$ are referred to as mutation vectors. For the purpose of selection, the objective function of the problem at hand is then used to evaluate the quality of the offspring candidate solutions. Recombination averages the $\mu$ best offspring candidate solutions to form the next population centroid and is implemented by averaging the mutation vectors corresponding to the selected offspring (Lines 7–9).
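The sampling, selection, and recombination steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_and_recombine`, the `max_resample` guard, and the example cone with constraint parameter 1 are assumptions made here for demonstration.

```python
import numpy as np

def sample_and_recombine(x, sigma, f, is_feasible, lam, mu, rng, max_resample=1000):
    """Lines 1-9 of Algorithm 1: sample lam feasible offspring, average the mu best."""
    n = x.size
    z = np.empty((lam, n))
    for k in range(lam):
        for _ in range(max_resample):
            z[k] = rng.standard_normal(n)       # Line 3: draw a mutation vector
            if is_feasible(x + sigma * z[k]):   # Lines 4-5: keep only feasible offspring
                break
        else:
            raise RuntimeError("resampling limit reached")
    fitness = np.array([f(x + sigma * zk) for zk in z])
    best = np.argsort(fitness)[:mu]             # Line 7: minimization, smallest f first
    z_avg = z[best].mean(axis=0)                # Line 8: average the selected mutations
    return x + sigma * z_avg, z_avg             # Line 9: new population centroid

# Hypothetical usage on a three-dimensional cone (assumed constraint parameter 1).
rng = np.random.default_rng(0)
x0 = np.array([2.0, 0.5, 0.5])
x1, z_avg = sample_and_recombine(
    x0, 0.1,
    f=lambda x: x[0],
    is_feasible=lambda x: x[0] >= 0 and x[0] ** 2 >= np.sum(x[1:] ** 2),
    lam=10, mu=3, rng=rng)
```

Because the cone is convex, the new centroid is an average of feasible points and is therefore feasible itself, so no separate feasibility check on the centroid is needed in this sketch.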

The cumulative step size adaptation approach introduced by Ostermeier et al. (1994) modifies the step size parameter $\sigma$ of the strategy based on past averaged mutations. It employs an exponentially fading record of recent steps referred to as the search path (Line 10), where $c \in (0, 1)$ is a constant that controls the rate of exponential fading. The factor $\sqrt{\mu c (2 - c)}$ in the update rule normalizes the non-unit variances of the steps, and ensures that if successive steps are uncorrelated, the search path is of expected squared length $N$. The step size of the strategy is then increased if recent steps of the strategy are positively correlated (as indicated by search paths with squared length exceeding the dimension of the problem), and it is decreased if correlations between recent steps are negative (if search paths are short). The factor $D$ in the update rule (Line 11) is a damping constant and controls how rapidly the step size can be adapted. The search path and step size are initialized as $\mathbf{s} = \mathbf{0}$ and $\sigma = 1$, respectively.
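The normalization property of the search path can be checked numerically. The sketch below implements Lines 10 and 11 of Algorithm 1 and feeds them uncorrelated steps (averages of $\mu$ independent standard normal vectors, i.e. random selection); the function name `csa_update` and all parameter values are illustrative assumptions. Under these conditions the average squared length of the search path should be close to $N$.

```python
import numpy as np

def csa_update(s, sigma, z_avg, mu, c, D):
    """Lines 10-11 of Algorithm 1: update search path, then step size."""
    n = s.size
    s_new = (1.0 - c) * s + np.sqrt(mu * c * (2.0 - c)) * z_avg
    sigma_new = sigma * np.exp((s_new @ s_new - n) / (2.0 * D * n))
    return s_new, sigma_new

rng = np.random.default_rng(1)
n, mu = 50, 3
c = 1.0 / np.sqrt(n)          # assumed cumulation parameter
D = 1.0 / c                   # assumed damping constant
s, sigma = np.zeros(n), 1.0
sq_lengths = []
for t in range(6000):
    # Random selection: uncorrelated steps with per-component variance 1/mu
    z_avg = rng.standard_normal((mu, n)).mean(axis=0)
    s, sigma = csa_update(s, sigma, z_avg, mu, c, D)
    if t >= 1000:             # discard the transient after s = 0
        sq_lengths.append(s @ s)
mean_sq_length = float(np.mean(sq_lengths))   # expected to be close to n
```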

We would like to analyze the behaviour of this strategy in a constrained optimization

setting where the optimal solution is located on the boundary of the feasible region.

As a model for this scenario, consider minimizing the objective function

$$f(\mathbf{x}) = x_1 \tag{7.1}$$

subject to the constraints

$$\delta = x_1^2 - \xi \sum_{i=2}^{N} x_i^2 \geq 0 \tag{7.2}$$

$$x_1 \geq 0, \tag{7.3}$$

where we refer to $\delta$ as the slack for the first constraint. The feasible region given by this pair of inequalities defines a conic region with its apex at the origin and its axis coinciding with the positive $x_1$ axis. The shape of the feasible region is controlled by parameter $\xi > 0$, with smaller values resulting in a wider cone. As $\xi$ tends to zero, the feasible region approaches the half-space with non-negative $x_1$ coordinates, and as $\xi$ approaches infinity the feasible region is restricted to the $x_1$ axis itself.
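As a small illustration of how the constraint parameter shapes the feasible region, the two constraints can be coded directly. The helper names `slack` and `is_feasible` and the symbol `xi` for the constraint parameter are assumptions of this sketch.

```python
import numpy as np

def slack(x, xi):
    """Slack of the first constraint: x_1^2 - xi * sum_{i>=2} x_i^2."""
    return x[0] ** 2 - xi * np.sum(x[1:] ** 2)

def is_feasible(x, xi):
    """A point is feasible if x_1 >= 0 and the slack is non-negative."""
    return bool(x[0] >= 0.0 and slack(x, xi) >= 0.0)

x = np.array([1.0, 0.6, 0.6])
narrow = is_feasible(x, xi=4.0)   # large xi: narrow cone
wide = is_feasible(x, xi=0.5)     # small xi: wide cone
```

The same point lies outside the narrow cone and inside the wide one, matching the description that smaller values of the constraint parameter widen the cone.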

In what follows, we analyze the behaviour of the strategy described in Sect. 7.2.1 applied to the conically constrained problem thus defined, while assuming that the dimension $N$ of the search space is high. Formally, we will omit terms that disappear in the limit $N \to \infty$ in order to arrive at simpler equations. While not exact, the equations will approximate results for large but finite $N$, and computer experiments will be used to verify their accuracy.

Notice that the behaviour of the evolution strategy considered is invariant with respect to translations and rotations of the coordinate system. The analysis below thus applies to the more general case where the feasible region of the problem forms a right circular cone and the cone's axis coincides with the gradient direction of the objective function. The particular choice of coordinate system employed here has the advantage of resulting in relatively simple equations.

Although the mutation vectors are drawn from a standard normal distribution, the act of enforcing the feasibility constraints through resampling affects the distribution of feasible offspring. Averaging the $\mu$ best mutation vectors as in Line 8 of Algorithm 1 further affects the distribution of $\langle \mathbf{z} \rangle$, which we now describe in the context of a single iteration of the strategy.

We first observe that any vector $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ may be written as $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_{2 \ldots N}$, where $\mathbf{x}_1 = (x_1, 0, \ldots, 0)$ and $\mathbf{x}_{2 \ldots N} = (0, x_2, \ldots, x_N)$. In the context of a particular parental centroid $\mathbf{x}$, any mutation vector $\mathbf{z}$ may be decomposed into three mutually orthogonal components: the vector $\mathbf{z}_1$ that is its projection onto $\mathbf{x}_1 / \|\mathbf{x}_1\|$, the vector $\mathbf{z}_\odot$ that is its projection onto $\mathbf{x}_{2 \ldots N} / \|\mathbf{x}_{2 \ldots N}\|$, and the vector $\mathbf{z}_\ominus$ that lies in the $(N - 2)$-dimensional hyperplane orthogonal to both $\mathbf{x}_1$ and $\mathbf{x}_{2 \ldots N}$. The sum of these components gives the original vector, as $\mathbf{z} = \mathbf{z}_1 + \mathbf{z}_\odot + \mathbf{z}_\ominus$, and we will write $z_1$, $z_\odot$, and $\|\mathbf{z}_\ominus\|$ to refer to their respective magnitudes. Note that $z_\odot$ is the magnitude of the component of $\mathbf{z}$ that points to $\mathbf{x}$ from the axis of the cone defining the feasible region, and that the cone's axis coincides with the $x_1$ axis for the current problem. If we write

$$R = \sqrt{\sum_{i=2}^{N} x_i^2} = \|\mathbf{x}_{2 \ldots N}\| \tag{7.4}$$

to denote the distance from $\mathbf{x}$ to the axis of the cone within the $(N - 1)$-dimensional hyperplane determined by $x_1$, then $z_\odot$ can be written as

$$z_\odot = \frac{1}{R} \sum_{i=2}^{N} x_i z_i .$$

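In each generation, all of the offspring must be feasible before recombination can occur. In other words, for any offspring both

$$x_1 + \sigma z_1 \geq 0 \tag{7.5}$$

and

$$(x_1 + \sigma z_1)^2 - \xi \sum_{i=2}^{N} (x_i + \sigma z_i)^2 \geq 0 \tag{7.6}$$

must hold. We introduce the normalized step size

$$\sigma^* = \frac{\sigma N}{R} \tag{7.7}$$

and the normalized slack

$$\delta^* = \frac{N \delta}{R^2} = \frac{N \left( x_1^2 - \xi R^2 \right)}{R^2}, \tag{7.8}$$

so that

$$x_1 = R \sqrt{\xi + \frac{\delta^*}{N}} . \tag{7.9}$$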
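The relation between the raw state and the normalized quantities can be illustrated with a short numerical check. The sketch assumes the normalization $\sigma^* = \sigma N / R$ and $\delta^* = N \delta / R^2$, with $\delta$ the slack of the first constraint; the function name `normalized_state` and the example values are hypothetical.

```python
import numpy as np

def normalized_state(x, sigma, xi):
    """Return R, sigma* and delta* for centroid x, step size sigma, cone parameter xi."""
    n = x.size
    R = np.linalg.norm(x[1:])              # distance from the cone's axis, Eq. (7.4)
    delta = x[0] ** 2 - xi * R ** 2        # slack of the first constraint
    sigma_star = sigma * n / R             # normalized step size (assumed form)
    delta_star = n * delta / R ** 2        # normalized slack (assumed form)
    return R, sigma_star, delta_star

x = np.array([3.0, 1.0, 2.0, 2.0])
xi, sigma = 0.5, 0.3
R, sigma_star, delta_star = normalized_state(x, sigma, xi)
# Recover x_1 from the normalized slack, as in Eq. (7.9)
x1_reconstructed = R * np.sqrt(xi + delta_star / x.size)
```

For this example $R = 3$, $\sigma^* = 0.4$, $\delta^* = 2$, and the reconstructed first coordinate equals the original $x_1 = 3$.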

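Substituting this into Eq. (7.5) and using Eq. (7.7) gives us the equivalent statement

$$\sqrt{\xi + \frac{\delta^*}{N}} + \frac{\sigma^*}{N} z_1 \geq 0$$

using normalized quantities. Assuming that both $\sigma^*$ and $\delta^*$ tend to finite limit values as $N$ increases (and it will be confirmed below that they do), then taking the limit $N \to \infty$ shows that this condition will be satisfied with overwhelming probability. Similarly, by using Eqs. (7.7), (7.8), and (7.9) the inequality of Eq. (7.6) becomes

$$\delta^* + 2 \sigma^* z_1 \sqrt{\xi + \frac{\delta^*}{N}} + \frac{\sigma^{*2}}{N} z_1^2 - 2 \xi \sigma^* z_\odot - \frac{\xi \sigma^{*2}}{N} \sum_{i=2}^{N} z_i^2 \geq 0 .$$

Since the $z_i$ are all standard normally distributed, $z_\odot$ is standard normally distributed as well, and the term $\frac{1}{N} \sum_{i=2}^{N} z_i^2$ converges almost surely to $\mathrm{E}[z_i^2] = 1$ by the strong law of large numbers. Omitting other terms that disappear in the limit $N \to \infty$ and solving for $z_\odot$ gives condition

$$z_\odot \leq \frac{\delta^* + 2 \sigma^* \sqrt{\xi}\, z_1 - \xi \sigma^{*2}}{2 \xi \sigma^*} \tag{7.10}$$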

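for a mutation vector to result in a feasible offspring candidate solution. Since both the $z_i$ and $z_\odot$ are standard normally distributed, the probability of the offspring candidate solution $\mathbf{x} + \sigma \mathbf{z}$ being feasible can thus be expressed using the conditional probability of $z_1$ as

$$\begin{aligned}
P_{\mathrm{feas}} &= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-x^2/2} \int_{-\infty}^{\frac{\delta^* + 2 \sigma^* \sqrt{\xi}\, x - \xi \sigma^{*2}}{2 \xi \sigma^*}} e^{-y^2/2} \, dy \, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2} \, \Phi\!\left( \frac{\delta^* + 2 \sigma^* \sqrt{\xi}\, x - \xi \sigma^{*2}}{2 \xi \sigma^*} \right) dx \\
&= \Phi\!\left( \frac{\delta^* - \xi \sigma^{*2}}{2 \sigma^* \sqrt{\xi + \xi^2}} \right)
\end{aligned} \tag{7.11}$$

where $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal distribution. Equality between the second and third lines is established by use of an identity from Arnold (2002, p. 117).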
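The closed form for the feasibility probability can be compared against a direct Monte Carlo estimate of the large-$N$ feasibility condition. The sketch below assumes the condition $z_\odot \leq (\delta^* + 2\sigma^*\sqrt{\xi}\, z_1 - \xi\sigma^{*2}) / (2\xi\sigma^*)$ with independent standard normal $z_1$ and $z_\odot$; function names and the parameter values are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt

def std_normal_cdf(u):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))

def p_feas_closed_form(delta_star, sigma_star, xi):
    """Third line of Eq. (7.11)."""
    return std_normal_cdf((delta_star - xi * sigma_star ** 2)
                          / (2.0 * sigma_star * sqrt(xi + xi ** 2)))

def p_feas_monte_carlo(delta_star, sigma_star, xi, samples, rng):
    """Fraction of sampled (z_1, z_odot) pairs satisfying the limit condition."""
    z1 = rng.standard_normal(samples)
    z_odot = rng.standard_normal(samples)
    bound = ((delta_star + 2.0 * sigma_star * np.sqrt(xi) * z1 - xi * sigma_star ** 2)
             / (2.0 * xi * sigma_star))
    return float(np.mean(z_odot <= bound))

rng = np.random.default_rng(2)
exact = p_feas_closed_form(3.0, 1.5, 2.0)
estimate = p_feas_monte_carlo(3.0, 1.5, 2.0, 200_000, rng)
```

The two values should agree up to sampling error, which reflects that $z_\odot - z_1/\sqrt{\xi}$ is normally distributed with variance $1 + 1/\xi$.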

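Having computed the probability $P_{\mathrm{feas}}$ of generating feasible offspring, we can now describe the expected behaviour of an individual step of the $(\mu/\mu, \lambda)$-ES. Where before we considered individual offspring before selection and recombination, we now refer to the results $\langle z_1 \rangle$, $\langle z_\odot \rangle$, and $\langle \mathbf{z}_\ominus \rangle$ of averaging across the $\mu$ best feasible offspring in a generation of $\lambda$ individuals. Using Eq. (7.10), the joint probability density for the $z_1$ and $z_\odot$ components of a feasible offspring is

$$p_{1,\odot}(x, y) = \begin{cases} \dfrac{1}{2\pi P_{\mathrm{feas}}} \, e^{-(x^2 + y^2)/2} & \text{if } y \leq \dfrac{\delta^* + 2 \sigma^* \sqrt{\xi}\, x - \xi \sigma^{*2}}{2 \xi \sigma^*} \\ 0 & \text{otherwise.} \end{cases} \tag{7.12}$$

The corresponding marginal density of the $z_1$ component is

$$p_1(x) = \int_{-\infty}^{\infty} p_{1,\odot}(x, y) \, dy = \frac{1}{\sqrt{2\pi}\, P_{\mathrm{feas}}} \, e^{-x^2/2} \, \Phi\!\left( \frac{\delta^* + 2 \sigma^* \sqrt{\xi}\, x - \xi \sigma^{*2}}{2 \xi \sigma^*} \right) . \tag{7.13}$$

The expected values of the $z_1$ and $z_\odot$ components of the average selected mutation vectors are computed in Eqs. (7.27) and (7.30) of the Appendix. Since the coefficient of variation for the $\chi^2$ distribution decreases with increasing $N$, the components of the $\mathbf{z}_\ominus$ vector for feasible offspring are independently standard normally distributed in the limit. Averaging $\mu$ such vectors results in a vector of expected squared length

$$\mathrm{E}\left[ \|\langle \mathbf{z}_\ominus \rangle\|^2 \right] = \frac{N}{\mu} . \tag{7.14}$$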
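The effect of averaging on the squared length can be verified with a few lines of simulation. The sketch below averages $\mu$ independent standard normal vectors of dimension $N$ and estimates the expected squared length of the result; the variable names and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu, trials = 200, 4, 2000
# Each trial averages mu independent standard normal vectors of dimension n
z_avg = rng.standard_normal((trials, mu, n)).mean(axis=1)
mean_sq_length = float(np.mean(np.sum(z_avg ** 2, axis=1)))   # close to n / mu
```

Each component of the average has variance $1/\mu$, so the squared length concentrates around $N/\mu$ (here 50), which is the genetic repair effect of multi-recombination on the components that selection does not act on.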

To analyze the steady state behaviour of the algorithm applied to the conically constrained problem, for now we assume that the normalized step size $\sigma^*$ is constant. The step size is then said to be scale invariant. As a result, only $\delta^*$ remains as a parameter describing the state of the strategy. The case of dynamically varying step size under the control of CSA will be considered in Sect. 7.5.

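The update rule

$$\delta^{(t+1)} = \left( x_1^{(t)} + \sigma^{(t)} \langle z_1^{(t)} \rangle \right)^2 - \xi \sum_{i=2}^{N} \left( x_i^{(t)} + \sigma^{(t)} \langle z_i^{(t)} \rangle \right)^2$$

for the slack is directly implied by Eq. (7.6), where superscripts indicate iteration number. To derive the update rule for the normalized slack $\delta^*$, this can be combined with Eq. (7.8) to write

$$\delta^{*(t+1)} = \frac{R^{(t)2}}{R^{(t+1)2}} \left[ \delta^{*(t)} + 2 \sigma^{*(t)} \left( \langle z_1^{(t)} \rangle \sqrt{\xi + \frac{\delta^{*(t)}}{N}} - \xi \langle z_\odot^{(t)} \rangle \right) + \frac{\sigma^{*(t)2}}{N} \left( \langle z_1^{(t)} \rangle^2 - \xi \left( \langle z_\odot^{(t)} \rangle^2 + \|\langle \mathbf{z}_\ominus^{(t)} \rangle\|^2 \right) \right) \right]$$

where $\langle z_1 \rangle$, $\langle z_\odot \rangle$, and $\|\langle \mathbf{z}_\ominus \rangle\|$ refer to the respective component lengths averaged from the $\mu$ best offspring. The update rule for distance $R$ is derived from Eq. (7.4) to be

$$R^{(t+1)2} = \sum_{i=2}^{N} \left( x_i^{(t)} + \sigma^{(t)} \langle z_i^{(t)} \rangle \right)^2 = R^{(t)2} \left[ 1 + \frac{2 \sigma^{*(t)}}{N} \langle z_\odot^{(t)} \rangle + \frac{\sigma^{*(t)2}}{N^2} \left( \langle z_\odot^{(t)} \rangle^2 + \|\langle \mathbf{z}_\ominus^{(t)} \rangle\|^2 \right) \right] . \tag{7.15}$$

Using Eq. (7.14), combining this with Eq. (7.15), and taking the limit as $N \to \infty$, the update rule becomes

$$\delta^{*(t+1)} = \delta^{*(t)} + 2 \sigma^* \sqrt{\xi}\, \langle z_1 \rangle - 2 \xi \sigma^* \langle z_\odot \rangle - \frac{\xi \sigma^{*2}}{\mu} . \tag{7.16}$$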
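The large-$N$ update rule can be compared with the exact change of the normalized slack for a single step in a high-dimensional space. The sketch below assumes the normalization $\delta^* = N\delta/R^2$ and $\sigma^* = \sigma N/R$ and uses an average of $\mu$ unselected standard normal vectors as the step, which is enough to test the algebra; all names and values are illustrative assumptions.

```python
import numpy as np

def normalized_slack(x, xi):
    """delta* = N (x_1^2 - xi R^2) / R^2 (assumed normalization)."""
    R2 = np.sum(x[1:] ** 2)
    return x.size * (x[0] ** 2 - xi * R2) / R2

rng = np.random.default_rng(4)
n, mu, xi, sigma_star, delta_star = 200_000, 3, 2.0, 1.5, 4.0
x = np.zeros(n)
x[1] = 1.0                                   # R = 1, offset from the axis along x_2
x[0] = np.sqrt(xi + delta_star / n)          # first coordinate from Eq. (7.9)
sigma = sigma_star / n                       # step size for R = 1
z_avg = rng.standard_normal((mu, n)).mean(axis=0)
z1, z_odot = z_avg[0], z_avg[1]              # components along x_1 and away from the axis
exact = normalized_slack(x + sigma * z_avg, xi)
limit = (delta_star + 2.0 * sigma_star * np.sqrt(xi) * z1
         - 2.0 * xi * sigma_star * z_odot - xi * sigma_star ** 2 / mu)
```

For this dimension the exact and limiting values differ only by terms of order $1/\sqrt{N}$, which come mainly from fluctuations of the squared length of the averaged step around $N/\mu$.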

Equation (7.16) describes a Markov process with the single state variable $\delta^*$. At each iteration, this state variable is influenced by the component $\langle z_1 \rangle$ of the step made along the gradient direction, and the component $\langle z_\odot \rangle$ in the direction from the axis of the cone to the current population centroid $\mathbf{x}$.

Iterating Eq. (7.16) yields a sequence of normalized slack values. After initialization effects have faded, those values are drawn from a stationary limit distribution. In order to study this, we apply the dynamical systems approach using a shifted Dirac delta function as a model for the stationary distribution of $\delta^*$, resulting in stationarity condition

$$\mathrm{E}\left[ \delta^{*(t+1)} \right] = \delta^{*(t)} .$$

Using Eq. (7.16) yields

$$\frac{\mathrm{E}[\langle z_1 \rangle]}{\sqrt{\xi}} - \mathrm{E}[\langle z_\odot \rangle] = \frac{\sigma^*}{2 \mu} . \tag{7.17}$$

The expected values $\mathrm{E}[\langle z_1 \rangle]$ and $\mathrm{E}[\langle z_\odot \rangle]$ are functions of $\delta^*$, and expressions for both can be found in the Appendix.

Figure 7.1 plots the average normalized slack for the $(\mu/\mu, \lambda)$-ES with $\lambda = 10$ and $\mu \in \{1, 3\}$. The curves were computed by numerically solving Eq. (7.17) with Eqs. (7.27) and (7.30) using Eqs. (7.28) and (7.31). The data points were found by artificially restricting the normalized step size of Algorithm 1 to a fixed value of $\sigma^*$ and initializing runs with a point on the boundary of the constrained region. For each run, the first $40N$ iterations were discarded to allow for initialization conditions to subside, then the average normalized slack over the next 20,000 iterations was recorded. An upper limit for resampling was set at 1,000, so that a run for generating a data point would be aborted if any offspring remained infeasible after 1,000 resampling operations. In this event, all subsequent data points were also omitted from the graph. As observed for the $\mu = 1$ case in Arnold (2013a), the normalized slack increases with increasing $\sigma^*$ and increasing $\xi$. The same holds true for $\mu > 1$. The case of $\mu = 3$ shows larger overall values of normalized slack than for $\mu = 1$. This can be explained by noting that by averaging across multiple offspring, selection pressure for remaining close to the constraint boundary is reduced, and candidate solutions will tend to drift farther away. The data points match the predicted curves very closely throughout, which suggests that using the Dirac delta model is suitable for the range of parameters considered in the plot.

[Figure 7.1: normalized slack $\delta^*$ against normalized step size $\sigma^*$ on logarithmic axes, with curves for $\xi = 0.1$, $\xi = 1.0$, and $\xi = 10.0$]

Fig. 7.1 Average normalized slack $\delta^*$ plotted against the normalized step size $\sigma^*$. Solid lines represent results for $\mu = 1$, while dashed lines represent results for $\mu = 3$. In both cases, $\lambda = 10$. Marked points represent experimental data from runs of the strategy with scale invariant step size and dimension $N = 40$
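The experiment behind the data points can be sketched as a short simulation. This is an illustrative reconstruction under stated assumptions, not the authors' code: it assumes the normalization $\sigma^* = \sigma N/R$ and $\delta^* = N\delta/R^2$, the function name `run_scale_invariant` is hypothetical, and the centroid is rescaled to $R = 1$ after every step, which exploits the scale invariance of the setting to avoid numerical underflow over long runs.

```python
import numpy as np

def run_scale_invariant(n, lam, mu, xi, sigma_star, iters, burn_in, rng, max_resample=1000):
    """Average normalized slack of a (mu/mu, lam)-ES with fixed sigma* on the cone."""
    x = np.zeros(n)
    x[0], x[1] = np.sqrt(xi), 1.0            # start on the boundary, with R = 1
    total = 0.0
    for t in range(burn_in + iters):
        sigma = sigma_star / n               # sigma = sigma* R / N with R = 1
        z = np.empty((lam, n))
        for k in range(lam):
            for _ in range(max_resample):
                z[k] = rng.standard_normal(n)
                y = x + sigma * z[k]
                if y[0] >= 0.0 and y[0] ** 2 >= xi * np.sum(y[1:] ** 2):
                    break                    # feasible offspring found
            else:
                raise RuntimeError("resampling limit reached")
        best = np.argsort(z[:, 0])[:mu]      # f = x_1, so rank offspring by z_1
        x = x + sigma * z[best].mean(axis=0)
        x /= np.linalg.norm(x[1:])           # rescale so that R = 1 again
        if t >= burn_in:
            total += n * (x[0] ** 2 - xi)    # normalized slack for R = 1
    return total / iters

# Small hypothetical run; the chapter's experiments use N = 40 and 20,000 iterations.
avg_slack = run_scale_invariant(n=20, lam=10, mu=3, xi=1.0, sigma_star=1.0,
                                iters=300, burn_in=200, rng=np.random.default_rng(5))
```

Since every centroid is an average of feasible points in a convex cone, the recorded slack values are non-negative, and their average estimates one data point of the kind shown in the figure.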

Assuming scale invariant step size, the $(\mu/\mu, \lambda)$-ES will either converge linearly to the optimal solution at the cone's apex or linearly diverge. That is, when plotting the logarithm of the objective function value of the population centroid over the iteration number, one will observe a noisy, linear decrease (or increase). Following Auger and Hansen (2006), the convergence rate is defined as

$$\varphi^* = -N \, \mathrm{E}\left[ \log \frac{f(\mathbf{x} + \sigma \langle \mathbf{z} \rangle)}{f(\mathbf{x})} \right]$$

and is the negative of the slope of the line observed in the graph of logarithmic objective function values scaled with $N$. Positive convergence rates indicate convergence while negative values signify divergence of the strategy. Using Eqs. (7.7) and (7.9) this may be rewritten in terms of normalized quantities as

$$\varphi^* = -N \, \mathrm{E}\left[ \log \left( 1 + \frac{\sigma^* \langle z_1 \rangle}{N \sqrt{\xi + \delta^*/N}} \right) \right] . \tag{7.18}$$

Dropping quadratic and higher order terms from the Taylor series expansion of the logarithm and taking expected values, as $N \to \infty$ this becomes

$$\varphi^* = -\frac{\sigma^*}{\sqrt{\xi}} \, \mathrm{E}[\langle z_1 \rangle] . \tag{7.19}$$

That is, convergence rates are affected by the normalized step size of the strategy as well as by the population size parameters $\mu$ and $\lambda$ that are implicit in $\mathrm{E}[\langle z_1 \rangle]$.

Higher convergence rates can be achieved by using larger values of $\mu$ and $\lambda$. However, increasing the population size parameters also increases the computational costs of a single iteration of the algorithm. We consider two cost models for comparing different parameter settings. In the first model, we assume that objective function evaluations have a uniform cost that dominates the cost of all other operations involved in Algorithm 1. In particular, the cost of constraint function evaluations is assumed to be negligible in this model. In the second cost model, we assume that the cost of constraint function evaluations dominates all other costs. Optimal performance under the first cost model requires optimizing $\varphi^*_{\mathrm{obj}} = \varphi^*/\lambda$, as the number of objective function evaluations per iteration equals $\lambda$. Optimal performance under the second cost model involves optimizing $\varphi^*_{\mathrm{feas}} = \varphi^* P_{\mathrm{feas}}/\lambda$, as $\lambda/P_{\mathrm{feas}}$ is the expected number of constraint function evaluations per iteration.
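The two cost models amount to dividing the per-iteration convergence rate by the expected number of evaluations of the dominant kind. A minimal sketch, in which the function name `cost_normalized_rates` and the example numbers are hypothetical and not taken from the chapter:

```python
def cost_normalized_rates(conv_rate, p_feas, lam):
    """Convergence rate per objective evaluation and per constraint evaluation."""
    rate_obj = conv_rate / lam               # lam objective evaluations per iteration
    rate_feas = conv_rate * p_feas / lam     # lam / p_feas constraint evaluations expected
    return rate_obj, rate_feas

# Hypothetical values: rate 1.2 per iteration, 40 % feasible samples, 10 offspring
rate_obj, rate_feas = cost_normalized_rates(conv_rate=1.2, p_feas=0.4, lam=10)
```

Because the feasibility probability is at most one, the rate per constraint evaluation never exceeds the rate per objective evaluation, and the gap grows as resampling becomes more frequent.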

In Fig. 7.2, the probability $P_{\mathrm{feas}}$ of generating feasible offspring is shown for the $(\mu/\mu, \lambda)$-ES with scale invariant step size for $\lambda = 10$ and $\mu \in \{1, 3\}$. The lines have been obtained from Eq. (7.11), with the normalized slack computed using the Dirac delta model as above. The data points were calculated from averages over the same runs of 20,000 iterations used to generate Fig. 7.1. As observed for the $\mu = 1$ case in Arnold (2013a), the probability $P_{\mathrm{feas}}$ decreases with increasing $\sigma^*$, going below one half and appearing to approach zero for large $\sigma^*$. For equal normalized step size, $P_{\mathrm{feas}}$ is larger for $\mu = 3$ than for $\mu = 1$, which is unsurprising as it has been observed in Fig. 7.1 that $\mu = 3$ results in larger normalized slack values.

[Figure 7.2: probability $P_{\mathrm{feas}}$ against normalized step size $\sigma^*$ (logarithmic horizontal axis), with curves for $\xi = 0.1$, $\xi = 1.0$, and $\xi = 10.0$]

Fig. 7.2 Probability $P_{\mathrm{feas}}$ of a random offspring candidate solution being feasible plotted against the normalized step size $\sigma^*$. Solid lines represent results for $\mu = 1$, while dashed lines represent results for $\mu = 3$. In both cases, $\lambda = 10$. Marked points represent experimental data from runs of the strategy with scale invariant step size and dimension $N = 40$

[Figure 7.3: convergence rate $\varphi^*$ against normalized step size $\sigma^*$ (logarithmic horizontal axis), with curves for $\xi = 0.1$, $\xi = 1.0$, and $\xi = 10.0$]

Fig. 7.3 Convergence rate $\varphi^*$ plotted against the normalized step size $\sigma^*$. Solid lines represent results for $\mu = 1$, while dashed lines represent results for $\mu = 3$. In both cases, $\lambda = 10$. Marked points represent experimental data from runs of the strategy with scale invariant step size and dimension $N = 40$

Figure 7.3 shows the convergence rate of the $(\mu/\mu, \lambda)$-ES with scale invariant step size for $\lambda = 10$ and $\mu \in \{1, 3\}$. The data points were calculated from averages computed over the same runs used to generate Figs. 7.1 and 7.2, and the curves were computed using Eq. (7.19) after solving Eq. (7.17) numerically for the normalized slack. As observed for the $\mu = 1$ case in Arnold (2013a), each curve first increases with increasing step size before it starts decreasing and eventually turns negative (indicating divergence of the strategy). This overall pattern introduces the notion of an optimal normalized step size that maximizes the rate of convergence $\varphi^*$. Larger values of $\xi$, which correspond to narrower cones delimiting the feasible region, appear to admit higher maximal convergence rates. In terms of the strategy's behaviour, this suggests that narrower regions of feasibility funnel the candidate solutions toward the optimum solution by inherently limiting the choice of offspring in perpendicular directions.

Figure 7.4 shows the behaviour of various quantities when the normalized step size is fixed at the optimum values $\sigma^*_{\mathrm{obj}}$ and $\sigma^*_{\mathrm{feas}}$ which maximize $\varphi^*_{\mathrm{obj}}$ and $\varphi^*_{\mathrm{feas}}$, respectively. The resulting probability of generating feasible offspring, convergence rates relative to the number of objective and constraint function evaluations, and the optimal step size itself are all plotted for the $(\mu/\mu, \lambda)$-ES with $\lambda = 10$ and $\mu \in \{1, 3\}$. The data for the curves was generated by numerically computing the optimal values $\sigma^*_{\mathrm{obj}}$ and $\sigma^*_{\mathrm{feas}}$ using Eqs. (7.11) and (7.19) with the Dirac delta model.

For $\sigma^*_{\mathrm{obj}}$ (shown with solid lines), a cost model is assumed where objective function evaluations dominate overall computational costs. The case of $\mu = 1$ corresponds to the observations made in Arnold (2013a). The probability $P_{\mathrm{feas}}$ is higher for $\mu = 3$ than for $\mu = 1$, and the same is true for $\sigma^*_{\mathrm{obj}}$ for sufficiently large $\xi$. For all choices of normalized convergence rate relative to the assumed computational costs, the strategy with $\mu = 1$ outperforms that with $\mu = 3$ for small values of $\xi$ while the situation is reversed for larger values of the constraint parameter. Additionally, larger values of $\mu$ appear to correspond with larger optimal values $\sigma^*_{\mathrm{obj}}$, for sufficiently large $\xi$. This agrees with the observations of Fig. 7.3, and suggests that the choice of larger $\mu$ encourages larger step size when the region is narrower, subsequently improving the expected rate of convergence.

[Figure 7.4: four panels plotted against constraint parameter $\xi$ (logarithmic horizontal axis): optimal normalized step size $\sigma^*$, probability $P_{\mathrm{feas}}$, convergence rate $\varphi^*_{\mathrm{obj}}$, and convergence rate $\varphi^*_{\mathrm{feas}}$, each with curves for $\mu = 1$ and $\mu = 3$]

Fig. 7.4 Optimal normalized step size $\sigma^*$, probability $P_{\mathrm{feas}}$ of generating feasible offspring, convergence rate $\varphi^*_{\mathrm{obj}}$ relative to the number of objective function evaluations, and convergence rate $\varphi^*_{\mathrm{feas}}$ relative to the number of constraint function evaluations are plotted against constraint parameter $\xi$ for $\lambda = 10$ and $\mu \in \{1, 3\}$. All figures use solid lines to indicate the optimal normalized step size $\sigma^*_{\mathrm{obj}}$, and dotted lines to indicate the optimal normalized step size $\sigma^*_{\mathrm{feas}}$

For $\sigma^*_{\mathrm{feas}}$ (shown with dotted lines), a cost model is assumed where constraint function evaluations dominate overall computational costs. The behaviour differs from that of $\sigma^*_{\mathrm{obj}}$ for larger values of $\xi$, yet appears almost identical for smaller values. For these narrow regions of feasibility, the optimal step size is relatively smaller, while the probability $P_{\mathrm{feas}}$ remains at or above 0.5. Over approximately the same interval of $\xi$, the convergence rate $\varphi^*_{\mathrm{obj}}$ is smaller and the convergence rate $\varphi^*_{\mathrm{feas}}$ is larger than when optimizing $\varphi^*_{\mathrm{obj}}$. Taken together, these results suggest that the second cost model is able to improve its expected rate of convergence by encouraging smaller step size when dealing with narrower regions of feasibility.

In Fig. 7.5, the optimal normalized step size $\sigma^*_{\mathrm{obj}}$, corresponding probability $P_{\mathrm{feas}}$ of generating feasible offspring, and convergence rates relative to both cost models are shown for $\lambda = 10$ and varying $\mu$. All points were generated by computing the optimal normalized step size $\sigma^*_{\mathrm{obj}}$ using the same method as in Fig. 7.4. The values for $P_{\mathrm{feas}}$ increase monotonically with increasing truncation ratio $\mu/\lambda$. The curves for the normalized convergence rate relative to the two cost models show optimal behaviour for intermediate values of $\mu$, except for very small values of $\xi$ where $\mu = 1$ is optimal. For both models, the optimal value of $\mu$ appears to increase monotonically with respect to $\xi$.

[Figure 7.5: four panels plotted against truncation ratio $\mu/\lambda$: optimal normalized step size $\sigma^*_{\mathrm{obj}}$, probability $P_{\mathrm{feas}}$, convergence rate $\varphi^*_{\mathrm{obj}}$, and convergence rate $\varphi^*_{\mathrm{feas}}$, each with curves for $\xi = 0.01$, $\xi = 0.1$, $\xi = 1.0$, and $\xi = 10.0$]

Fig. 7.5 Optimal normalized step size $\sigma^*_{\mathrm{obj}}$, probability $P_{\mathrm{feas}}$ of generating feasible offspring, normalized convergence rate $\varphi^*_{\mathrm{obj}}$ relative to the number of objective function evaluations, and normalized convergence rate $\varphi^*_{\mathrm{feas}}$ relative to the number of constraint function evaluations are plotted against truncation ratio $\mu/\lambda$ for $\lambda = 10$. All figures use the optimal normalized step size $\sigma^*_{\mathrm{obj}}$. The data points are joined by lines for ease of visibility

In Fig. 7.6, the optimal normalized step size $\sigma^*_{\mathrm{feas}}$, corresponding probability $P_{\mathrm{feas}}$ of generating feasible offspring, and convergence rates relative to both cost models are shown for $\lambda = 10$ and varying $\mu$. All points were generated by computing the optimal normalized step size $\sigma^*_{\mathrm{feas}}$ using the same method as in Fig. 7.5, adjusted for the different cost model. Throughout, the values seem more tightly clustered than in Fig. 7.5. The optimal value of $\mu$ for both cost models still appears to increase monotonically with respect to $\xi$.

[Figure 7.6: four panels plotted against truncation ratio $\mu/\lambda$: optimal normalized step size $\sigma^*_{\mathrm{feas}}$, probability $P_{\mathrm{feas}}$, convergence rate $\varphi^*_{\mathrm{obj}}$, and convergence rate $\varphi^*_{\mathrm{feas}}$, each with curves for $\xi = 0.01$, $\xi = 0.1$, $\xi = 1.0$, and $\xi = 10.0$]

Fig. 7.6 Optimal normalized step size $\sigma^*_{\mathrm{feas}}$, probability $P_{\mathrm{feas}}$ of generating feasible offspring, normalized convergence rate $\varphi^*_{\mathrm{obj}}$ relative to the number of objective function evaluations, and normalized convergence rate $\varphi^*_{\mathrm{feas}}$ relative to the number of constraint function evaluations are plotted against truncation ratio $\mu/\lambda$ for $\lambda = 10$. All figures use the optimal normalized step size $\sigma^*_{\mathrm{feas}}$. The data points are joined by lines for ease of visibility, and the scales are kept identical to Fig. 7.5 for straightforward comparison

While we have assumed constant $\sigma^*$ in the analysis up to now, that assumption is of course unrealistic as the distance to the cone's axis is unknown to the algorithm. Practically, the step size needs to be adapted using one of a number of control schemes. In this section, we consider the case that the step size of the algorithm is controlled by CSA as described in Sect. 7.2.1. As before, the notation

$$s_\odot = \frac{1}{R} \sum_{i=2}^{N} s_i x_i \tag{7.20}$$

refers to the magnitude of the component of vector $\mathbf{s}$ which points in the direction from the axis of the cone to candidate solution $\mathbf{x}$. Together with the component $s_1$, normalized slack $\delta^*$, normalized step size $\sigma^*$, and deviation $\|\mathbf{s}\|^2 - N$, this describes the state of the strategy. This gives a five-dimensional parameter space for modeling the Markov process, compared to the one-dimensional parameter space used in Sect. 7.4. Using the consequence given in Eq. (7.17) of the existing update rule for $\delta^*$, and known expected values $\mathrm{E}[\langle z_1 \rangle]$, $\mathrm{E}[\langle z_\odot \rangle]$ as computed in the Appendix, then by following a similar approach to that of Arnold (2013a) and Arnold and Beyer (2010) we will derive update rules and model the stationary distributions for $s_1$, $s_\odot$, and $\|\mathbf{s}\|^2$ in order to completely describe the expected behaviour of the system when using CSA.

An immediate consequence of the update of the search path in Line 10 of Algorithm 1 is the update equation

$$s_1^{(t+1)} = (1 - c)\, s_1^{(t)} + \sqrt{\mu c (2 - c)}\, \langle z_1^{(t)} \rangle$$

where superscripts indicate iteration number, for the component of $\mathbf{s}$ contained in the subspace spanned by the $x_1$ axis. Employing the Dirac delta model in the dynamical systems approach and requiring that $\mathrm{E}[s_1^{(t+1)}] = s_1^{(t)}$ results in

$$s_1 = \sqrt{\frac{\mu (2 - c)}{c}} \, \mathrm{E}[\langle z_1 \rangle] \tag{7.21}$$

for the average value of $s_1$ when the strategy operates in a stationary state.
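The step from the update equation to the stationary value of $s_1$ is short and can be spelled out as follows, assuming the search path update $\mathbf{s} \leftarrow (1 - c)\,\mathbf{s} + \sqrt{\mu c (2 - c)}\, \langle \mathbf{z} \rangle$ of Algorithm 1. Taking expected values and setting $\mathrm{E}[s_1^{(t+1)}] = s_1^{(t)} = s_1$ gives

$$s_1 = (1 - c)\, s_1 + \sqrt{\mu c (2 - c)} \, \mathrm{E}[\langle z_1 \rangle] \quad \Longrightarrow \quad c\, s_1 = \sqrt{\mu c (2 - c)} \, \mathrm{E}[\langle z_1 \rangle] \quad \Longrightarrow \quad s_1 = \sqrt{\frac{\mu (2 - c)}{c}} \, \mathrm{E}[\langle z_1 \rangle] .$$

For small $c$ the stationary component is therefore roughly $\sqrt{2\mu/c}$ times the expected averaged step, which shows how slower fading of the search path amplifies a consistent bias in the steps.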

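For the component $s_\odot$, using Eq. (7.20) with the search path update equation in Line 10 of Algorithm 1 gives

$$s_\odot^{(t+1)} = \frac{R^{(t)}}{R^{(t+1)}} \left[ (1 - c) \left( s_\odot^{(t)} + \frac{\sigma^{*(t)}}{N} \, \mathbf{s}_{2 \ldots N}^{(t)} \cdot \langle \mathbf{z}_{2 \ldots N}^{(t)} \rangle \right) + \sqrt{\mu c (2 - c)} \left( \langle z_\odot^{(t)} \rangle + \frac{\sigma^{*(t)}}{N} \, \|\langle \mathbf{z}_{2 \ldots N}^{(t)} \rangle\|^2 \right) \right] .$$

Then applying Eqs. (7.14) and (7.15) while omitting terms that disappear in the limit $N \to \infty$ yields

$$s_\odot^{(t+1)} = (1 - c)\, s_\odot^{(t)} + \sqrt{\mu c (2 - c)} \left( \langle z_\odot^{(t)} \rangle + \frac{\sigma^*}{\mu} \right) .$$

Requiring that $\mathrm{E}[s_\odot^{(t+1)}] = s_\odot^{(t)}$, we have

$$s_\odot = \sqrt{\frac{\mu (2 - c)}{c}} \left( \mathrm{E}[\langle z_\odot \rangle] + \frac{\sigma^*}{\mu} \right) \tag{7.22}$$

for the average value of $s_\odot$ when the strategy operates in a stationary state.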

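Considering the squared length $\|\mathbf{s}\|^2$ of the search path, the corresponding update rule is

$$\begin{aligned}
\|\mathbf{s}^{(t+1)}\|^2 &= \sum_{i=1}^{N} \left( (1 - c)\, s_i^{(t)} + \sqrt{\mu c (2 - c)}\, \langle z_i^{(t)} \rangle \right)^2 \\
&= (1 - c)^2 \|\mathbf{s}^{(t)}\|^2 + 2 (1 - c) \sqrt{\mu c (2 - c)} \left( \langle z_1^{(t)} \rangle s_1^{(t)} + \langle z_\odot^{(t)} \rangle s_\odot^{(t)} \right) + \mu c (2 - c) \, \|\langle \mathbf{z}^{(t)} \rangle\|^2 ,
\end{aligned}$$

where the contribution of the $\ominus$ components to the cross term has been omitted as it vanishes in expectation. Taking expected values, imposing the condition $\mathrm{E}[\|\mathbf{s}^{(t+1)}\|^2] = \|\mathbf{s}^{(t)}\|^2$, and recalling that $\mathrm{E}[\|\langle \mathbf{z} \rangle\|^2]/N = 1/\mu$ for large $N$, this becomes

$$\|\mathbf{s}\|^2 = (1 - 2c + c^2) \|\mathbf{s}\|^2 + 2 (1 - c) \sqrt{\mu c (2 - c)} \left( \mathrm{E}[\langle z_1 \rangle]\, s_1 + \mathrm{E}[\langle z_\odot \rangle]\, s_\odot \right) + c (2 - c) N .$$

Using Eqs. (7.21) and (7.22) gives

$$\|\mathbf{s}\|^2 - N = \frac{2 (1 - c)}{c} \, \mu \left( \mathrm{E}[\langle z_1 \rangle]^2 + \mathrm{E}[\langle z_\odot \rangle]^2 + \frac{\sigma^*}{\mu} \, \mathrm{E}[\langle z_\odot \rangle] \right) \tag{7.23}$$

as an approximation for the average deviation of the squared length of the search path from the expected value in the case of uncorrelated steps.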

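Finally, considering the normalized step size, using Eqs. (7.7), (7.14), and (7.15) with the update rule in Line 11 of Algorithm 1 results in

$$\begin{aligned}
\sigma^{*(t+1)} &= \sigma^{*(t)} \, \frac{R^{(t)}}{R^{(t+1)}} \exp\left( \frac{\|\mathbf{s}^{(t+1)}\|^2 - N}{2 D N} \right) \\
&= \frac{\sigma^{*(t)}}{\sqrt{1 + 2 \sigma^{*(t)} \langle z_\odot^{(t)} \rangle / N + \sigma^{*(t)2} / (\mu N)}} \exp\left( \frac{\|\mathbf{s}^{(t+1)}\|^2 - N}{2 D N} \right) .
\end{aligned}$$

Using the Taylor expansions for $1/\sqrt{1 + x}$ and $\exp(x)$ and dropping all terms of quadratic and higher order we arrive at

$$\sigma^{*(t+1)} = \sigma^{*(t)} \left[ 1 - \frac{1}{N} \left( \sigma^{*(t)} \langle z_\odot^{(t)} \rangle + \frac{\sigma^{*(t)2}}{2 \mu} \right) + \frac{\|\mathbf{s}^{(t+1)}\|^2 - N}{2 D N} \right] .$$

Taking expected values and requiring stationarity of $\sigma^*$ yields

$$\sigma^* \left( \mathrm{E}[\langle z_\odot \rangle] + \frac{\sigma^*}{2 \mu} \right) = \frac{\|\mathbf{s}\|^2 - N}{2 D} .$$

Applying Eq. (7.23) to the right hand side while again taking expected values, this yields

$$\sigma^* \left( \mathrm{E}[\langle z_\odot \rangle] + \frac{\sigma^*}{2 \mu} \right) = \frac{2 (1 - c) \mu}{2 c D} \left( \mathrm{E}[\langle z_1 \rangle]^2 + \mathrm{E}[\langle z_\odot \rangle]^2 + \frac{\sigma^*}{\mu} \, \mathrm{E}[\langle z_\odot \rangle] \right) .$$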

c may be set to 1/ N , and the damping

constant D may be set to 1/c = N . Re-arranging the terms above while simplifying

and omitting those that vanish as N gives

2 = 22 E[z 1 ]2 + E[z ]2

(7.24)

as an approximation to the average normalized step size that CSA will generate in the stationary state of the strategy.

Fig. 7.7 Normalized step size $\sigma^*$, probability $P_{\mathrm{feas}}$ of generating feasible offspring, convergence rate relative to the number of objective function evaluations (subscript obj), and convergence rate relative to the number of constraint function evaluations (subscript feas), plotted against the constraint parameter $\xi$. All plots represent runs using CSA to control step size. Values for $\mu = 1$ and $\mu = 3$ are compared for $\lambda = 10$. In all figures, the marked points represent experimental data from runs of the strategy using dimension $N = 40$ (+) and dimension $N = 400$. The extra black dotted lines are provided for reference, and indicate the curves for normalized step size optimized for the obj convergence rate as shown in Fig. 7.4
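The passage to the large-$N$ limit in Eq. (7.24) can be illustrated numerically. The sketch below is a worked check under stated assumptions, not part of the chapter: it treats $E[\langle z_1 \rangle]$ and $E[\langle z_\odot \rangle]$ as fixed numbers (in the chapter they depend on $\sigma^*$ through Eqs. (7.27) and (7.30)), sets $c = 1/\sqrt{N}$ and $D = \sqrt{N}$ so that $cD = 1$, and solves the stationarity condition preceding Eq. (7.24), which then reduces to $\sigma^{*2}/(2\mu) + c\,\sigma^* E[\langle z_\odot \rangle] - \mu(1-c)\left(E[\langle z_1 \rangle]^2 + E[\langle z_\odot \rangle]^2\right) = 0$. The function names are hypothetical.

```python
import math


def stationary_sigma_star(e1, e_odot, mu, n):
    """Positive root of the finite-N stationarity condition.

    Assumes c = 1/sqrt(n) and D = sqrt(n), and treats the expected values
    e1 = E[<z_1>] and e_odot = E[<z_odot>] as given constants.
    """
    c = 1.0 / math.sqrt(n)
    q = e1 * e1 + e_odot * e_odot
    disc = (c * e_odot) ** 2 + 2.0 * (1.0 - c) * q
    return mu * (-c * e_odot + math.sqrt(disc))


def limit_sigma_star(e1, e_odot, mu):
    """Large-N limit, Eq. (7.24): sigma*^2 = 2 mu^2 (e1^2 + e_odot^2)."""
    return mu * math.sqrt(2.0 * (e1 * e1 + e_odot * e_odot))


if __name__ == "__main__":
    for n in (40, 400, 4000, 10 ** 8):
        print(n, stationary_sigma_star(-0.5, 0.2, 3, n),
              limit_sigma_star(-0.5, 0.2, 3))
```

The gap between the two values shrinks roughly like $1/\sqrt{N}$, which is consistent with the chapter's observation that the predictions become more accurate as the search space dimension grows.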

In Fig. 7.7, the average normalized step size, the probability $P_{\mathrm{feas}}$ of generating feasible offspring, and the normalized convergence rates relative to the two cost models are plotted when using CSA to control the value of $\sigma$. The curves were generated by numerically solving Eqs. (7.17) and (7.24) with Eqs. (7.27) and (7.30). The data points were determined by averaging results from runs of 20,000 iterations of the $(\mu/\mu, \lambda)$-ES using CSA. As before, the first $40N$ iterations were discarded to avoid initialization biases, and a run contributed no further data points once offspring had to be resampled more than 1,000 times. Step sizes generated using CSA with $\mu = 3$ are larger than those generated with $\mu = 1$, and in both cases the values generated are close to the optimal ones for the obj cost model (shown with dotted lines), except where $\xi$ is large and CSA results in significantly smaller than optimal values. Considering $P_{\mathrm{feas}}$, the probability of generating feasible offspring decreases with increasing constraint parameter, though not as rapidly as in Fig. 7.4, where the step size is optimized for the obj convergence rate. Values of the convergence rate relative to the number of objective function evaluations are close to optimal throughout, provided that $N$ is large enough for the approximations to be sufficiently accurate. Values of the convergence rate relative to the number of constraint function evaluations decrease and lose accuracy with increasing constraint parameter, mirroring the behaviour of $P_{\mathrm{feas}}$. The relatively inaccurate predictions of the convergence rates for $\mu = 3$ and $N = 40$ can be explained by the large observed values of the normalized slack, which cause significant error when a term of order $1/N$ is dropped in the calculation going from Eq. (7.18) to Eq. (7.19). Measurements for $N = 400$ are noticeably more accurate in this case.

Fig. 7.8 Probability $P_{\mathrm{feas}}$ and convergence rate plotted against search space dimension $N$, with curves for $\xi = 0.1$, $\xi = 1.0$, and $\xi = 10.0$. The left hand graphs represent results for $\mu = 1$ and those on the right for $\mu = 3$. In both cases, $\lambda = 10$. The horizontal lines represent results obtained using the dynamical systems approach assuming $N \to \infty$. The marked points represent results measured in runs of the $(\mu/\mu, \lambda)$-ES with cumulative step size adaptation

Finally, Fig. 7.8 illustrates the accuracy of the predictions made using the dynamical systems approach in the limit $N \to \infty$ by comparing the estimates for the probability $P_{\mathrm{feas}}$ of generating feasible offspring and the convergence rate with measurements made in runs of the $(\mu/\mu, \lambda)$-ES with cumulative step size adaptation as described above. It can be seen that the error in the predictions decreases with increasing search space dimensionality, though not necessarily monotonically. Predictions for small values of $\xi$ are more accurate than those for larger values of the constraint parameter, and the error in the predictions of the convergence rate is generally larger for $\mu = 3$ than it is for $\mu = 1$. While in the latter case the error is below 15 % for $N$ as small as 20, $\mu = 3$ requires $N$ an order of magnitude larger in order to achieve that level of accuracy for larger values of $\xi$.


7.6 Conclusion

We have analyzed the behaviour of the $(\mu/\mu, \lambda)$-ES with cumulative step size adaptation applied to a conically constrained problem where the gradient direction coincides with the cone's axis, and the optimal solution lies in the cone's apex, on the boundary of the feasible region. Under the assumption of scale invariant step size, we used a Markov process model to estimate the evolving slack of candidate solutions and the overall operation of the strategy probabilistically. Narrower conic regions of feasibility were found to result in higher convergence rates for appropriately chosen normalized step size. When the step size was chosen to maximize the rate of convergence, the strategy performed better with larger choices of $\mu$ when the feasible region was narrow, while $\mu = 1$ was a better choice for feasible regions approaching the half-space.

An offsetting factor for the high convergence rates in narrow regions of feasibility was that these regions also resulted in a lower probability of feasible offspring, requiring more resampling in each generation on average. Selecting more offspring for recombination with larger $\mu$ could improve the probability of offspring being feasible in these narrow regions, but would not improve the rate of convergence in broader regions of feasibility. As the region approaches the half-space, choosing $\mu > 1$ would eventually reduce the convergence rate. The balance between the probability of generating feasible offspring and the rate of convergence was considered using two cost models: one that assumes that objective function evaluations dominate computational costs, and one that assumes that constraint function evaluations play that role.

Using cumulative step size adaptation was found to lead to convergence, usually at a rate close to the optimal one, at least for sufficiently large $N$. However, the predicted convergence rates were notably inaccurate when both $\mu$ and $\xi$ were large and the feasible region was narrow. In these cases, the strategy moves farther from the constraint boundary, developing a large average value of normalized slack. With dimension $N = 40$, the error term then dominates the predicted convergence rate. For problems of larger dimension, the observed values once again approached the predicted rate.

Acknowledgments This research was supported by the Natural Sciences and Engineering Research

Council of Canada (NSERC).

7.7 Appendix

The derivation of expressions for $E[\langle z_1 \rangle]$ and $E[\langle z_\odot \rangle]$ closely follows similar calculations by Arnold (2013b), with differences due to the task here being minimization rather than maximization and the underlying probability distributions differing from those that hold for the linearly constrained problem.

The $(\mu/\mu, \lambda)$-ES averages the mutation vectors corresponding to the $\mu$ selected offspring. Since the objective is minimization of $f(\mathbf{x}) = x_1$, the vectors that are selected are those with the smallest $z_1$ components. If the vectors are sorted so that $\mathbf{z}_{(k;\lambda)}$ refers to the vector with the $k$th smallest $z_1$ component, then by using elementary results from the field of order statistics (see Balakrishnan and Rao (1998)), the probability density function of the $z_1$ component for the mutation vector with the $k$th smallest objective function value may be written as

$$ p_1^{(k;\lambda)}(x) = \frac{\lambda!}{(\lambda-k)!\,(k-1)!}\; p_1(x)\, [1 - P_1(x)]^{\lambda-k}\, [P_1(x)]^{k-1} . \tag{7.25} $$

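Since the value of $\langle z_1 \rangle$ is the average over the $\mu$ best individuals, its expected value can be expressed as

$$ E[\langle z_1 \rangle] = \frac{1}{\mu} \sum_{k=1}^{\mu} \int_{-\infty}^{\infty} x\, p_1^{(k;\lambda)}(x)\, dx = \frac{\lambda!}{\mu} \int_{-\infty}^{\infty} x\, p_1(x) \sum_{k=1}^{\mu} \frac{[1 - P_1(x)]^{\lambda-k}\, [P_1(x)]^{k-1}}{(\lambda-k)!\,(k-1)!}\, dx . $$

Using the identity

$$ \sum_{k=1}^{\mu} \frac{Q^{\lambda-k}\, [1-Q]^{k-1}}{(\lambda-k)!\,(k-1)!} = \frac{1}{(\lambda-\mu-1)!\,(\mu-1)!} \int_0^{Q} z^{\lambda-\mu-1} (1-z)^{\mu-1}\, dz \tag{7.26} $$

it follows that

$$ E[\langle z_1 \rangle] = (\lambda-\mu) \binom{\lambda}{\mu} \int_{-\infty}^{\infty} x\, p_1(x) \int_0^{1-P_1(x)} z^{\lambda-\mu-1} (1-z)^{\mu-1}\, dz\, dx . $$

Substituting $z = 1 - P_1(y)$ and changing the order of integration, this becomes

$$ E[\langle z_1 \rangle] = (\lambda-\mu) \binom{\lambda}{\mu} \int_{-\infty}^{\infty} x\, p_1(x) \int_x^{\infty} p_1(y)\, [1 - P_1(y)]^{\lambda-\mu-1}\, [P_1(y)]^{\mu-1}\, dy\, dx = (\lambda-\mu) \binom{\lambda}{\mu} \int_{-\infty}^{\infty} p_1(y)\, [1 - P_1(y)]^{\lambda-\mu-1}\, [P_1(y)]^{\mu-1}\, I_1(y)\, dy \tag{7.27} $$

where

$$ I_1(y) = \int_{-\infty}^{y} x\, p_1(x)\, dx . $$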
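The identity in Eq. (7.26), which turns the sum over the $\mu$ selected ranks into a single integral, can be checked numerically. The following is a small sketch for that purpose only; the function names and the midpoint-rule integration are illustrative choices, and the check assumes $\lambda > \mu \ge 1$.

```python
from math import factorial, isclose


def rank_sum(q, lam, mu):
    """Left hand side of Eq. (7.26): sum over the mu selected ranks."""
    return sum(q ** (lam - k) * (1.0 - q) ** (k - 1)
               / (factorial(lam - k) * factorial(k - 1))
               for k in range(1, mu + 1))


def beta_integral(q, lam, mu, steps=20000):
    """Right hand side of Eq. (7.26), integrated with the midpoint rule."""
    h = q / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        total += z ** (lam - mu - 1) * (1.0 - z) ** (mu - 1)
    return total * h / (factorial(lam - mu - 1) * factorial(mu - 1))


if __name__ == "__main__":
    for q in (0.1, 0.5, 0.9):
        print(q, rank_sum(q, 10, 3), beta_integral(q, 10, 3))
```

For $\lambda = 3$ and $\mu = 2$ both sides reduce to $Q - Q^2/2$, which gives a quick hand check of the two functions.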

We introduce abbreviations

Ax =

and

+ 2 x 2

2

2

B=

2 + 2

v = ex /2

2

v = xex /2

2

u = (A x )

2

u = eA x /2 / 2

yielding

I1 (y) =

y

1

2 Pfeas

xex

2 /2

(A x ) dx

1

2 Pfeas

= p1 (y) +

ey 2 /2 A y + 1

2

1

2 Pfeas

y

e(x

2 +A2 )/2

x

y

ex

2 /2

eA x /2 dx

2

dx.

The remaining integral can be solved by quadratic completion of the argument to the

exponential function and subsequent change of variable, resulting in

1

1

2

1 + Ay B .

eB /2

I1 (y) = p1 (y) +

2 Pfeas 1 +

(7.28)

Together with Eq. (7.27), the expression in Eq. (7.28) allows numerically computing the expected value of $\langle z_1 \rangle$.

Due to the resampling of infeasible candidate solutions, the $z_\odot$ components of mutation vectors resulting in feasible offspring are not independent of the respective $z_1$ components. Their conditional probability density is

$$ p_\odot(y \mid z_1 = x) = \frac{p_{1,\odot}(x, y)}{p_1(x)} , $$

where the densities on the right hand side are given in Eqs. (7.12) and (7.13). The

corresponding conditional expected value is therefore

E z | z1 = x =

p1, (x, y)

dy

p1 (x)

1

2

2

ex /2 eA x /2 .

2 p1 (x)Pfeas

(7.29)

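We use Eqs. (7.25) and (7.26) to express the expected value of this component for the average of the $\mu$ best individuals, and write, analogously to the calculations for $E[\langle z_1 \rangle]$,

$$ E[\langle z_\odot \rangle] = \frac{1}{\mu} \sum_{k=1}^{\mu} \int_{-\infty}^{\infty} E[z_\odot \mid z_1 = x]\; p_1^{(k;\lambda)}(x)\, dx = (\lambda-\mu) \binom{\lambda}{\mu} \int_{-\infty}^{\infty} p_1(y)\, [1 - P_1(y)]^{\lambda-\mu-1}\, [P_1(y)]^{\mu-1}\, I_2(y)\, dy \tag{7.30} $$

where

$$ I_2(y) = \int_{-\infty}^{y} E[z_\odot \mid z_1 = x]\; p_1(x)\, dx . $$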

1

I2 (y) =

2 Pfeas

y

e(x

2 +A2 )/2

x

dx.

Again using quadratic completion for the argument to the exponential function and

performing a change of variable results in

I2 (y) =

2

eB /2

1 + Ay B .

2 Pfeas 1 +

1

(7.31)

Together with Eq. (7.30), the expression in Eq. (7.31) allows numerically computing the expected value of $\langle z_\odot \rangle$.

References

Arnold DV (2002) Noisy optimization with evolution strategies. Kluwer Academic Publishers, Dordrecht

Arnold DV (2011a) Analysis of a repair mechanism for the (1, λ)-ES applied to a simple constrained problem. In: Genetic and evolutionary computation conference, GECCO 2011. ACM Press, pp 853–860

Arnold DV (2011b) On the behaviour of the (1, λ)-ES for a simple constrained problem. In: Beyer H-G, Langdon WB (eds) Foundations of genetic algorithms, FOGA 2011. ACM Press, New York, pp 15–24

Arnold DV (2013a) On the behaviour of the (1, λ)-ES for a conically constrained problem. In: Genetic and evolutionary computation conference, GECCO 2013. ACM Press, pp 423–430

Arnold DV (2013b) Resampling versus repair in evolution strategies applied to a constrained linear problem. Evol Comput 21(3):389–411

Arnold DV, Beyer H-G (2010) On the behaviour of evolution strategies optimising Cigar functions. Evol Comput 18(4):661–682

Arnold DV, Brauer D (2008) On the behaviour of the (1 + 1)-ES for a simple constrained problem. In: Rudolph G et al (eds) Parallel problem solving from nature, PPSN X. Springer, Berlin, pp 1–10

Auger A, Hansen N (2006) Reconsidering the progress rate theory for evolution strategies in finite dimensions. In: Genetic and evolutionary computation conference, GECCO 2006. ACM Press, pp 445–452

Balakrishnan N, Rao CR (1998) Order statistics: an introduction. In: Balakrishnan N et al (eds) Handbook of statistics, vol 16. Elsevier, New York, pp 3–24

Beyer H-G (1989) Ein Evolutionsverfahren zur mathematischen Modellierung stationärer Zustände in dynamischen Systemen. PhD thesis, Hochschule für Architektur und Bauwesen, Weimar

Beyer H-G (2001) The theory of evolution strategies. Springer, Heidelberg

Beyer H-G, Schwefel H-P (2002) Evolution strategies: a comprehensive introduction. Nat Comput 1(1):3–52

Meyer-Nieberg S, Beyer H-G (2012) The dynamical systems approach: progress measures and convergence properties. In: Rozenberg G et al (eds) Handbook of natural computing. Springer, Berlin, pp 741–814

Mezura-Montes E, Coello Coello CA (2011) Constraint-handling in nature-inspired numerical optimization: past, present, and future. Swarm Evol Comput 1(4):173–194

Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32

Ostermeier A, Gawelczyk A, Hansen N (1994) Step-size adaptation based on non-local use of selection information. In: Davidor Y et al (eds) Parallel problem solving from nature, PPSN III. Springer, Berlin, pp 189–198

Oyman AI, Deb K, Beyer H-G (1999) An alternative constraint handling method for evolution strategies. In: Proceedings of the 1999 IEEE congress on evolutionary computation. IEEE Press, pp 612–619

Rechenberg I (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Friedrich Frommann Verlag, Stuttgart

Schwefel H-P (1981) Numerical optimization of computer models. Wiley, Chichester


Chapter 8

Locating Potentially Disjoint Feasible Regions of a Search Space with a Particle Swarm Optimizer

Mohammad Reza Bonyadi and Zbigniew Michalewicz

M.R. Bonyadi (corresponding author) · Z. Michalewicz
Optimization and Logistics, School of Computer Science, University of Adelaide, Adelaide, SA 5005, Australia
e-mail: mrbonyadi@cs.adelaide.edu.au; vardiar@gmail.com

Z. Michalewicz
Institute of Computer Science, Polish Academy of Sciences, ul. Ordona 21, 01-237 Warsaw, Poland
e-mail: zbyszek@cs.adelaide.edu.au

Z. Michalewicz
Polish-Japanese Institute of Information Technology, ul. Koszykowa 86, 02-008 Warsaw, Poland

Z. Michalewicz
Chief of Science at Complexica (www.complexica.com), Adelaide, Australia

© Springer India 2015
R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization, Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_8

Abstract The feasible part of a search space may consist of many disjoint regions, and the global optimal solution might be within any of them. Thus, locating these feasible regions (as many as possible, ideally all of them) is of great importance. In this chapter, we introduce niching techniques that have been studied in connection with multimodal optimization for locating feasible regions, rather than for finding different local optima. One of the successful niching techniques was based on the particle swarm optimizer (PSO) with a specific topology, called nonoverlapping topology, where the swarm was divided into several nonoverlapping sub-swarms. Earlier studies have shown that PSO with such nonoverlapping topology, with a small number of particles in each sub-swarm, is quite effective in locating different local optima if the number of dimensions is small (up to 8). However, its performance drops rapidly when the number of dimensions grows. First, a new PSO, called mutation linear PSO (MLPSO), is proposed. This algorithm is effective in locating different local optima when the number of dimensions grows. MLPSO is applied to optimization problems with up to 50 dimensions, and its results in locating different local optima are compared with earlier algorithms. Second, we incorporate a constraint handling technique into MLPSO; this variant is called EMLPSO. We test different topologies of EMLPSO and evaluate them in terms of locating feasible regions when they are applied to constrained optimization problems with up to 30 dimensions. The results of this test show that the new method with nonoverlapping topology and a small swarm size in each sub-swarm performs better in terms of locating different feasible regions in comparison to other topologies, such as the global best topology and the ring topology.

Keywords Constrained optimization · Feasible regions · Disjoint feasible regions · Particle swarm optimization

8.1 Introduction

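A constrained optimization problem (COP) is formulated as follows:

$$ \text{find } z \in S \subseteq \mathbb{R}^D \text{ such that } \begin{cases} \forall y \in S:\ f(z) \le f(y) & \text{(a)} \\ g_i(z) \le 0, \text{ for } i = 1 \text{ to } q & \text{(b)} \\ h_i(z) = 0, \text{ for } i = 1 \text{ to } m & \text{(c)} \end{cases} \tag{8.1} $$

where $f$, $g_i$, and $h_i$ are real-valued functions defined on the search space $S$ (i.e. $S \to \mathbb{R}$), $q$ is the number of inequalities, and $m$ is the number of equalities. The search space $S$ is defined as a $D$ dimensional rectangle in $\mathbb{R}^D$ such that $l_j \le z_j \le u_j$, $j = 1, \ldots, D$ ($l_j$ and $u_j$ are the lower and upper bounds of the $j$th variable). The set of all feasible points, which satisfy constraints (b) and (c), is denoted by $\mathcal{F}$ (Michalewicz and Schoenauer 1996). We consider a single objective case in this chapter.

Usually in a COP, the equalities are replaced by the following inequalities (Takahama and Sakai 2010):

$$ |h_j(x)| \le \delta, \text{ for } j = 1 \text{ to } m \tag{8.2} $$

where $\delta$ is a small positive value, set here to $10^{-4}$, the same as in other studies (Liang et al. 2010; Takahama and Sakai 2010). Accordingly, Eq. 8.1 is rewritten as

$$ \text{find } z \in S \subseteq \mathbb{R}^D \text{ such that } \begin{cases} \forall y \in S:\ f(z) \le f(y) & \text{(a)} \\ g_i(z) \le 0, \text{ for } i = 1 \text{ to } m + q & \text{(b)} \end{cases} \tag{8.3} $$

where $g_{q+j}(x) = |h_j(x)| - \delta$ for $1 \le j \le m$. In this chapter, we refer to Eq. 8.3 whenever we use the term COP.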
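The rewriting of equalities as relaxed inequalities can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: constraints are represented as plain Python callables, the helper names are hypothetical, and the tolerance argument `delta` stands for the small positive value of Eq. 8.2.

```python
from typing import Callable, List, Sequence

Constraint = Callable[[Sequence[float]], float]


def relax_equalities(inequalities: List[Constraint],
                     equalities: List[Constraint],
                     delta: float = 1e-4) -> List[Constraint]:
    """Return q + m inequality constraints of the form g(x) <= 0.

    Each equality h(x) = 0 becomes |h(x)| - delta <= 0.
    """
    relaxed = [(lambda x, h=h: abs(h(x)) - delta) for h in equalities]
    return list(inequalities) + relaxed


def is_feasible(x: Sequence[float], constraints: List[Constraint]) -> bool:
    """A point is feasible when every constraint value is non-positive."""
    return all(g(x) <= 0.0 for g in constraints)


if __name__ == "__main__":
    g1 = lambda x: x[0] - 1.0          # x0 <= 1
    h1 = lambda x: x[0] + x[1] - 1.0   # x0 + x1 = 1
    cons = relax_equalities([g1], [h1])
    print(is_feasible([0.5, 0.5], cons), is_feasible([0.5, 0.6], cons))
```

The default argument `h=h` in the lambda binds each equality at definition time, which avoids the usual late-binding pitfall when closures are created in a loop or comprehension.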

Fig. 8.1 An example of a search space containing the marked points a, b, c, and d. The gray regions are the feasible regions. The point d is the global optimum solution

Each optimization method which deals with COPs generally consists of two main parts: an optimization algorithm and a constraint handling technique (CHT). The optimization algorithm can be any optimization algorithm, such as particle swarm optimization (PSO) (Kennedy and Eberhart 1995), the genetic algorithm (GA) (Goldberg 1989), the covariance matrix adaptation evolution strategy (CMA-ES) (Hansen 2006), gradient descent algorithms (Gilbert and Nocedal 1992), conjugate gradient algorithms (Gilbert and Nocedal 1992), or linear programming (Dantzig 1998), among others. The task of the optimization algorithm is to generate new solutions at every iteration. In each optimization algorithm, an operator is needed to compare candidate solutions, thus enabling the optimizer to select one (or more) of the solutions (see footnote 1). This comparison operator plays a key role in the performance of the algorithm in finding better solutions. In unconstrained problems, this comparison operator is simple, and, for a minimization problem, it is implemented as

$$ x \in S \text{ is better than } y \in S \text{ iff } f(x) < f(y) \tag{8.4} $$

where $f(\cdot): \mathbb{R}^D \to \mathbb{R}$ is the objective function and $x$ and $y$ are two samples from the search space. However, in COPs, in addition to the objective function, there are constraints that need to be considered in the comparison procedure. There are three cases for comparing two solutions $x$ and $y$ in a COP:

1. $x \in \mathcal{F}$ and $y \in \mathcal{F}$, i.e. both are feasible
2. $x \notin \mathcal{F}$ and $y \notin \mathcal{F}$, i.e. both are infeasible
3. $x \notin \mathcal{F}$ and $y \in \mathcal{F}$, i.e. one is feasible and the other is infeasible.

If the solutions fall under case (1), the comparison is easy because it is made in the same way as in Eq. 8.4 (both solutions are feasible). In cases (2) and (3), however, this comparison is more complicated. Figure 8.1 provides examples that show the reason behind the complications within cases (2) and (3).

In Fig. 8.1, both solutions a and b are infeasible. Also, assume that all constraint

values for solution a are smaller than the constraint values for solution b (i.e. gj (a) <

gj (b) for all j). However, solution b is much closer to the optimal solution than

solution a (d is the optimal solution). Thus, if solution b is selected, there is a greater

chance for the algorithm to improve the solution thereby reaching the optimal solution

in the next steps. Clearly, choosing one of a or b is not an easy task because solution

a is better than b in terms of one aspect (the value of constraints), while solution

b is better than a in terms of another aspect (closeness to the optimal solution).

1 Note that this selection can be performed by a direct decision (the better solution is selected) or by some analysis to find out the potential of the solutions. However, in either approach, the concept of being better needs to be defined.


Also, choosing one of the solutions in case (3) is complicated. As an example, let

us concentrate on solutions b (an infeasible solution) and c (a feasible solution) in

Fig. 8.1. If solution c is selected, it is harder for the optimization algorithm to move

the solutions in the next steps toward the optimal solution, i.e., d. However, if solution

b is selected, although it is infeasible it is easier for the optimization algorithm to

move the solutions in the next steps toward the optimal solution. Clearly, the easiest

case is case (1) as the standard comparison between solutions can be used. However,

there are complications in regard to cases (2) and (3).

The aim of a CHT is to compare two solutions and decide which of them is better. Note that such a comparison needs to consider all three aforementioned

cases. There are several categories of techniques for handling constraints that can

be incorporated in an optimization algorithm (Michalewicz and Schoenauer 1996);

these categories include penalty functions, special operators, repairs, decoders, and

hybrid techniques. In the category of penalty functions, the objective function is

combined with constraints in such a way that the problem is turned into an unconstrained problem. Thus, all solutions are feasible and, hence, comparisons follow

case (1) thereby making the comparison easy. In the category of special operators,

an operator is designed that always maps a feasible solution to a feasible solution.

Note that to use a technique in this category, the initial solutions need to be feasible. Because the solutions are always feasible all comparisons follow case (1), and

hence, comparison is done easily. In the category of repair, each infeasible solution is repaired so that a feasible solution is generated. Two possibilities can then be considered: either the original solution is kept in the population, which is known as Baldwinian evolution (Whitley et al. 1994), or it is replaced by the repaired solution, which is known as Lamarckian evolution (Whitley et al. 1994). In this category, because the solutions are always feasible (repaired), again all comparisons follow case (1), thereby making the comparisons easier. In the category of decoder-based techniques, a mapping

from genotype to phenotype is established such that any genotype is mapped into

a feasible phenotype. In this category, as with the previous categories, all solutions

are feasible, thus making it unnecessary to consider cases (2) and (3). Finally, the

last category, hybrid, includes all possible combinations of CHTs. It seems that all

CHTs try to apply some modification to the solutions (e.g., via repairing, applying

penalty) to get rid of the complications in comparison within cases (2) and (3).

There have been some attempts to design methods to explore the search space

of COPs to find a feasible solution: these methods are called constraint satisfaction methods (Tsang 1993). The acceptance criterion for a constraint satisfaction

method is at least one feasible solution. Normally, this feasible solution, found by

the constraint satisfaction method, is fed into an optimization method as an initial

solution, and the method improves the quality of this solution in terms of objective

value while maintaining feasibility. As feasible regions in COPs might have irregular shapes (e.g., disjoint, with holes, connected with narrow passages, non-convex),

the quality of the final solution, namely the improved solution by the optimization

method, is highly dependent on the location of the initial feasible solution. Figure 8.2

shows some examples of irregular shapes of feasible regions.

Fig. 8.2 A search space with several irregularly shaped feasible regions. The dark and light grey regions are the feasible regions and the search space, respectively. A narrow feasible passage is also labelled

If the initial feasible solution is located in region A, it might be difficult for the method to explore the solutions in the feasible region

B. The reason is that regions A and B are disjoint and usually, infeasible solutions

between A and B are considered to be of lower quality than the solutions within

A or B. Hence, as optimization methods normally tend to move solutions closer

to good known solutions, i.e., they are attracted by higher quality solutions, it is

very unlikely that they are successful in moving a solution in region A to region B.

Also, note that A and B might be far from each other and B can be a very small

region, which makes it harder to move a solution in region A to region B. In addition,

even though the regions C and D are connected, if the initial feasible solution is

located in region C, it is hard for the optimization method to move that solution to

region D. The reason for this is that the feasible passage between regions C and D

is very narrow. Hence, it is hard for the optimization method to find that passage to

move the solutions through it toward D. Thus, rather than locating only one feasible

solution, it is better to generate different feasible solutions that are in potentially

disjoint feasible regions.2 From now on, we use the term feasible regions rather than

potentially disjoint feasible regions. In this case, there is an increased probability of

locating feasible regions which contain high quality solutions, as well as of locating

the feasible region that contains the optimal solution, i.e., optimal region. However,

there have not been many attempts to design algorithms capable of locating feasible

regions.

Clearly, there are similarities between locating feasible regions in a COP and the

concept of niching in multi-modal optimization, i.e., locating different, ideally all,

local optima of an objective function (Brits et al. 2002). We use these similarities

to propose a method that is able to locate feasible regions in the search space. The

particle swarm optimization algorithm (PSO) is used in this chapter for optimization

purposes. Some issues related to the niching abilities of the PSO are investigated and

a new PSO (called mutation linear PSO, MLPSO) is proposed, which addresses these issues. Then, the MLPSO is extended in such a way that it can locate feasible regions in a COP. To confirm that the proposed method performs effectively in locating feasible regions, its performance is tested on test cases for which the locations of the feasible regions are known.

2 The term potentially disjoint feasible regions refers to disjoint feasible regions and the regions that are connected with narrow passages. Also, note that without information about the topology of the search space, it is not possible to claim that the found solutions are in disjoint feasible regions.

The rest of the chapter is organized as follows: some background on COPs and CHTs is provided in Sect. 8.2. An overview of the PSO algorithm, including variants, issues, topologies, and niching abilities, is given in Sect. 8.3. The proposed method for locating feasible regions is presented and discussed in Sect. 8.4 and tested in Sect. 8.5. Finally, we conclude the chapter and provide suggestions for future research directions in Sect. 8.6.

In this section, a brief background is provided on COPs, including the CHT used in

this chapter, and locating feasible regions.

In this subsection, a CHT which has been used in our proposed method is described.

It is called epsilon-level constraint handling (ELCH) (Takahama and Sakai 2010)

which belongs to the penalty functions category. In this technique, the constraint

violation value for solution x is defined as follows:

$$ G(x) = \sum_{i=1}^{q} \max\{0, g_i(x)\} + \sum_{i=1}^{m} |h_i(x)|^k \tag{8.5} $$

where $k$ is a constant (in all of the experiments presented in this chapter, $k = 2$). Each solution $x$ is represented by the pair $(f, G)$, where $f$ is the objective value at $x$ and $G$ is its constraint violation value. If $f_1$ and $f_2$ are the objective values and $G_1$ and $G_2$ are the constraint violation values of the solution points $x_1$ and $x_2$, then the $\varepsilon$-level comparison operator is defined as follows:

$$ x_1 \le_{\varepsilon} x_2 \iff \begin{cases} f_1 \le f_2 & \text{if } G_1, G_2 \le \varepsilon \text{ or } G_1 = G_2 \\ G_1 \le G_2 & \text{otherwise} \end{cases} \tag{8.6} $$

In other words, the $\varepsilon$-level comparison compares two solutions by constraint violation value first. If both solutions have a violation value no larger than a small threshold $\varepsilon$, or they have the same level of violation, the two solutions are then ranked by the objective function value only. Otherwise, they are ranked by the constraint violation value. There are some techniques to control the value of $\varepsilon$ (Takahama and Sakai 2005).
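The comparison rule of Eqs. 8.5 and 8.6 can be written down directly. The sketch below is an illustration, not the authors' implementation: the function names are hypothetical, constraints are assumed to be Python callables, and the exponent $k$ is applied to the equality terms as Eq. 8.5 is printed here.

```python
from typing import Callable, Sequence

Constraint = Callable[[Sequence[float]], float]


def constraint_violation(x: Sequence[float],
                         inequalities: Sequence[Constraint],
                         equalities: Sequence[Constraint],
                         k: float = 2.0) -> float:
    """Constraint violation value G(x) following Eq. 8.5."""
    g_part = sum(max(0.0, g(x)) for g in inequalities)
    h_part = sum(abs(h(x)) ** k for h in equalities)
    return g_part + h_part


def eps_leq(f1: float, g1: float, f2: float, g2: float, eps: float) -> bool:
    """True if solution 1 is at least as good as solution 2 under Eq. 8.6."""
    if (g1 <= eps and g2 <= eps) or g1 == g2:
        return f1 <= f2   # rank by objective value only
    return g1 <= g2       # rank by constraint violation


if __name__ == "__main__":
    # A nearly feasible point with a worse objective still wins against a
    # clearly infeasible point with a better objective.
    print(eps_leq(f1=5.0, g1=0.0, f2=1.0, g2=0.5, eps=0.1))
```

With $\varepsilon = 0$ the rule reduces to a lexicographic ordering in which violation comes first, while a very large $\varepsilon$ ranks by objective value alone, so $\varepsilon$ controls how much infeasibility is tolerated during the search.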


There have not been many attempts so far to design algorithms that locate feasible

regions. However, designing algorithms for locating feasible regions (ideally all of

them) in COPs is valuable as it reduces the probability of locating feasible regions

with poor quality solutions, in terms of objective value. Several multi-start methods

(e.g. Bonyadi et al. 2013; Jabr 2012; Lasdon and Plummer 2008; Smith et al. 2013)

have been proposed to locate feasible regions in COPs. Normally, these methods start

with a set of random points and improve them to find a feasible point. As an example, a multi-start nonlinear programming (MSNLP) method was proposed in Lasdon and Plummer (2008). In this method, a set of random points is generated within the search space. The points are then passed through two filters, a merit filter and a distance filter. The merit filter ensures that the quality of the points, in terms of constraint and objective values, is higher than a predefined threshold; any point that does not meet this level of quality is discarded. The distance filter ensures that the generated points are sufficiently diverse: a hyper-sphere neighborhood of each point is examined to determine whether two points are close to each other, and some of the points are discarded accordingly.

An algorithm based on Constraint Consensus (CC) was proposed to identify areas

that may contain a feasible region (Smith et al. 2013). In this method, a certain number

of points are generated randomly in the search space. Then for each point, by using

the gradient of the violated constraints, a vector is generated which moves that point

to a new location. It is expected that the new location is closer to one of the feasible

regions. After moving all points, a clustering method is used to group the points

based on their distances from each other. At the end, the best point in each cluster, in

terms of its objective value if the point is feasible or in terms of its constraint violation

value if the point is not feasible, is selected as the representative of a feasible region.
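The movement step of such a consensus scheme can be sketched as follows. The text above does not spell out how the move vector is built from the gradients, so this example assumes the basic averaged form commonly used in constraint-consensus methods: each violated constraint $g(x) \le 0$ contributes the feasibility vector $-g(x)\,\nabla g(x)/\|\nabla g(x)\|^2$, and the point moves by the average of these vectors. The names and the constraint representation are hypothetical, and the clustering stage is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Each constraint is a pair (g, grad_g) for a constraint of the form g(x) <= 0.
Constraint = Tuple[Callable[[Sequence[float]], float],
                   Callable[[Sequence[float]], Vector]]


def consensus_step(x: Sequence[float],
                   constraints: Sequence[Constraint]) -> Vector:
    """Move x by the average feasibility vector of its violated constraints."""
    total = [0.0] * len(x)
    violated = 0
    for g, grad_g in constraints:
        value = g(x)
        if value <= 0.0:
            continue  # satisfied constraints contribute nothing
        grad = grad_g(x)
        norm_sq = sum(c * c for c in grad)
        if norm_sq == 0.0:
            continue  # no usable direction for this constraint
        # Shortest step that satisfies the linearized constraint.
        scale = -value / norm_sq
        total = [t + scale * c for t, c in zip(total, grad)]
        violated += 1
    if violated == 0:
        return list(x)  # already feasible: no movement
    return [xi + t / violated for xi, t in zip(x, total)]


if __name__ == "__main__":
    # x0 + x1 <= 1, violated at (2, 2)
    line = (lambda x: x[0] + x[1] - 1.0, lambda x: [1.0, 1.0])
    print(consensus_step([2.0, 2.0], [line]))
```

For a single linear constraint one step lands exactly on the constraint boundary; with several nonlinear constraints the step is repeated until the violation is small enough or an iteration limit is reached.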

A multi-start genetic algorithm with a local search was proposed to locate feasible regions in the search space (Jabr 2012). In this method, a GA was run to generate solutions whose constraint violation value, defined as the weighted sum of the values of all constraints for each solution, is within a predefined threshold. The results from the GA were then improved by a local search method in terms of objective value. With the aim of generating different feasible solutions, the GA was run several times, each time with a new seed, crossover rate, and mutation rate.

A multi-start PSO was proposed by the authors of this chapter (Bonyadi et al. 2013). In that paper, a PSO was proposed that used ELCH to handle the constraints. Also, a method based on the covariance matrix adaptation evolution strategy (CMA-ES) was proposed, which used the same technique to handle the constraints. Experiments showed that PSO performs better in finding feasible solutions, while CMA-ES performs better in optimizing the objective value. Thus, a hybrid method was proposed which runs PSO to find the first feasible solution, which is then improved by CMA-ES. To prevent PSO from finding a poor-quality feasible region, a multi-start strategy was proposed in which several instances of PSO were run to generate different feasible solutions. The best among those solutions was then fed into CMA-ES for further improvement.


In this section, some background on PSO including variants, known issues, different

topologies, niching abilities, and abilities in dealing with COPs is given.

The Particle Swarm Optimization (PSO) (Kennedy and Eberhart 1995) algorithm is

a population-based algorithm, referred to as swarm, of n > 1 particles; each particle

is defined by three D-dimensional vectors:

Position ($x_t^i$) is the position of the ith particle in the tth iteration. This is used to evaluate the particle's quality.

Velocity ($v_t^i$) is the direction and length of movement of the ith particle in the tth iteration.

Personal best ($p_t^i$) is the best position³ that the ith particle has visited in its lifetime (up to the tth iteration). This vector serves as a memory for keeping knowledge of quality solutions (Kennedy and Eberhart 1995).

All of these vectors are updated at every iteration t for each particle (i):

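$$v_{t+1}^i = \mu\left(x_t^i, v_t^i, N_t^i\right), \quad \text{for } i = 1, \ldots, n \tag{8.7}$$

$$x_{t+1}^i = \xi\left(x_t^i, v_{t+1}^i\right), \quad \text{for } i = 1, \ldots, n \tag{8.8}$$

$$p_{t+1}^i = \begin{cases} p_t^i & \text{if } f\left(p_t^i\right) \le f\left(x_{t+1}^i\right) \\ x_{t+1}^i & \text{otherwise} \end{cases}, \quad \text{for } i = 1, \ldots, n \tag{8.9}$$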

In Eq. 8.7, $N_t^i$ (known as the neighbor set of the particle i) is a subset of personal best positions of some particles which contribute to the velocity updating rule of that particle at iteration t, i.e., $N_t^i = \{p_t^k \mid k \in T_t^i \subseteq \{1, 2, \ldots, n\}\}$, where $T_t^i$ is a set of indices of particles which contribute to the velocity updating for particle i at iteration t. Clearly, the strategy of determining $T_t^i$ might be different for various types of PSO algorithms and it is usually referred to as the topology of the swarm. Many different topologies have been defined so far (Kennedy and Mendes 2002), e.g., the global best topology (gbest), the ring topology, the nonoverlapping, and the pyramid, that are discussed later in this chapter. The function $\mu(\cdot)$ calculates the new velocity vector for the particle i according to its current position, current velocity $v_t^i$, and neighborhood set $N_t^i$. In Eq. 8.8, $\xi(\cdot)$ is a function that calculates the new position of the particle i according to its previous position and its new velocity. Usually $\xi\left(x_t^i, v_{t+1}^i\right) = x_t^i + v_{t+1}^i$ is accepted for updating the position of particle i. In Eq. 8.9, the new personal best position for the ith particle is updated according to the objective values of its previous personal best position and the current position. In the rest of this chapter, these usual forms for the position updating rule (Eq. 8.8) and for updating the personal best (Eq. 8.9) are assumed. In PSO, three updating rules (Eqs. 8.7, 8.8, and 8.9) are applied to all particles iteratively until a predefined termination criterion, e.g., the maximum number of iterations, is met.

3 In general, personal best can be a set of best positions, but all PSO types listed in this chapter use a single personal best.
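The iterative scheme of Eqs. 8.7, 8.8, and 8.9 can be sketched in a few lines of Python. This is only an illustrative skeleton under stated assumptions: the objective `sphere`, the placeholder rule `toward_best_rule`, the search bounds, and all function names are hypothetical and not taken from the chapter. The gbest topology is assumed, and the usual position update (new position = old position + new velocity) is used.

```python
import random

def sphere(x):
    """Toy objective used only for this illustration: minimum 0 at the origin."""
    return sum(xj * xj for xj in x)

def toward_best_rule(x, v, p, g, rng):
    """Placeholder velocity rule (not from the chapter): damp the old velocity
    and take a random step toward the best neighbouring personal best g."""
    return [0.5 * vj + rng.random() * (gj - xj) for xj, vj, gj in zip(x, v, g)]

def pso(f, dim, n, iters, velocity_rule, lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    p = [xi[:] for xi in x]          # personal best positions
    fp = [f(xi) for xi in x]         # objective values of the personal bests
    for _ in range(iters):
        # gbest topology: the neighbour set holds every personal best,
        # so each particle is guided by the best of them.
        best = min(range(n), key=lambda k: fp[k])
        g = p[best][:]
        for i in range(n):
            v[i] = velocity_rule(x[i], v[i], p[i], g, rng)    # Eq. 8.7
            x[i] = [xj + vj for xj, vj in zip(x[i], v[i])]    # Eq. 8.8
            fx = f(x[i])
            if fx < fp[i]:                                    # Eq. 8.9
                p[i], fp[i] = x[i][:], fx
    best = min(range(n), key=lambda k: fp[k])
    return p[best], fp[best]

position, value = pso(sphere, dim=3, n=10, iters=50, velocity_rule=toward_best_rule)
```

Passing the velocity rule as a function keeps the loop independent of the PSO variant, so a different variant only needs a different `velocity_rule`.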

In the original version of PSO (Kennedy and Eberhart 1995), the function $\mu(\cdot)$ in Eq. 8.7 is defined as

$$v_{t+1}^i = v_t^i + \varphi_1 R_{1t}^i \left(p_t^i - x_t^i\right) + \varphi_2 R_{2t}^i \left(g_t - x_t^i\right) \tag{8.10}$$

In this equation, $\varphi_1$ and $\varphi_2$ are two real numbers called acceleration coefficients,⁴ and $p_t^i$ and $g_t$ are the personal best (of particle i) and the global best vector, respectively, at iteration t. Also, the role of the vectors $PI = p_t^i - x_t^i$ (Personal Influence) and $SI = g_t - x_t^i$ (Social Influence) is to attract the particles to move toward known quality solutions, i.e., personal and global best. Moreover, $R_{1t}^i$ and $R_{2t}^i$ are two $D \times D$ diagonal matrices,⁵ where their elements are random numbers distributed uniformly in [0, 1], i.e., $U(0, 1)$. Note that the matrices $R_{1t}^i$ and $R_{2t}^i$ are generated at each iteration for each particle separately.

Shi and Eberhart (1998) introduced a new coefficient $\omega$, known as inertia weight, to control the influence of the last velocity value on the updated velocity. Eq. 8.10 was rewritten as

$$v_{t+1}^i = \omega v_t^i + \varphi_1 R_{1t}^i \left(p_t^i - x_t^i\right) + \varphi_2 R_{2t}^i \left(g_t - x_t^i\right) \tag{8.11}$$

The coefficient $\omega$ controls the influence of the previous velocity on movement. The iterative application of Eq. 8.11 (plus position updating) causes the particles to oscillate around personal and global best vectors (Clerc and Kennedy 2002). This oscillation is controlled by three parameters $\omega$, $\varphi_1$, and $\varphi_2$ so that the larger $\omega$ is, with respect to $\varphi_1$ and $\varphi_2$, the more explorative the particles are, and vice versa. In this chapter, this variant is known as the standard PSO. In the standard PSO, if the random matrices are replaced by random values, the new variant is called the linear PSO (LPSO).

There are several well-studied issues in the standard PSO, such as stagnation (Bergh and Engelbrecht 2002, 2010), line search (Spears et al. 2010; Wilke et al. 2007a), swarm size (Bergh and Engelbrecht 2002, 2010), local convergence (Bergh and Engelbrecht 2010), and rotation variance (Spears et al. 2010; Wilke et al. 2007b). Apart from these issues within PSO, there have been some attempts to extend the algorithm to work with COPs (Liang et al. 2010; Paquet and Engelbrecht 2007; Takahama and Sakai 2005), to support niching⁶ (Brits et al. 2002, 2007; Engelbrecht et al. 2005; Li 2010), to work effectively with large-scale problems (Helwig and Wanka 2007), and to work in nonstationary environments (Wang and Yang 2010).

4 These two coefficients control the effect of personal and global best vectors on the movement of particles and they play an important role in the convergence of the algorithm. They are usually determined by a practitioner or by the dynamics of the particles' movement.

5 Alternatively, these two random matrices are often considered as two random vectors. In this case, the multiplication of these random vectors by PI and SI is element-wise.

One of the issues in the standard PSO is as follows: if the acceleration coefficients and inertia weight in the algorithm are set to inappropriate values, the velocity vector might grow to infinity; in other words, there might be a swarm explosion. A swarm explosion results in moving particles to infinity, which is not desirable (Clerc and Kennedy 2002). One of the early solutions for this issue was to restrict the value of each dimension of the velocity to a particular interval $[-V_{max}, V_{max}]$, where $V_{max}$ can be considered as the maximum value of the lower bound and upper bound of the search space (Helwig and Wanka 2007); this is known as the nearest strategy. There are also other strategies to restrict the velocity in such a way that the swarm explosion is prevented, e.g., the nearest with turbulence and the random strategies. However, none of these strategies is comprehensive enough to prevent the swarm explosion effectively in all situations (see Helwig and Wanka (2007) for details). Thus, many researchers theoretically analyzed the behavior of the particles to find the reasons behind the swarm explosion from different points of view (Clerc and Kennedy 2002; Trelea 2003; Bergh and Engelbrecht 2006). The aim of these analyses was to define criteria for the acceleration coefficients such that particles converge to a point in the search space. One of the earliest attempts of this sort was made in Clerc and Kennedy (2002) where a constriction coefficient PSO (CCPSO) was proposed. The authors revised the velocity updating rule to:

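$$v_{t+1}^i = \chi \left( v_t^i + c_1 R_{1t}^i \left(p_t^i - x_t^i\right) + c_2 R_{2t}^i \left(g_t - x_t^i\right) \right) \tag{8.12}$$

$$\chi = \frac{2k}{\left| 2 - c - \sqrt{c^2 - 4c} \right|} \tag{8.13}$$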

$\chi$ is called the constriction factor and it is proposed to set its value by Eq. 8.13. Also, $c = c_1 + c_2 > 4$. Note that this notation is algebraically equivalent to that in Eq. 8.11. The authors proved that if these conditions hold for the constriction factor, particles converge to a stable point and the velocity vector does not grow to infinity. The values of $c_1$ and $c_2$ are often set to 2.05 and the value of k is in the interval [0, 1] (usually set to 1). Note that with these settings, the value of $\chi$ is in the interval [0, 1]. This analysis was also done from other perspectives by Trelea (2003) and Bergh and Engelbrecht (2006).

6 Niching is the ability of the algorithm to locate different optima rather than only one local optimum. The niching concept is usually used in multi-modal optimization.
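As a small worked example of the constriction factor, the sketch below evaluates Eq. 8.13 for the commonly used setting described above (both acceleration coefficients equal to 2.05 and k = 1). The function name `constriction_factor` is a hypothetical choice for illustration. The last two lines compute the inertia weight and acceleration coefficient that an equivalent inertia-weight formulation (Eq. 8.11) would use, obtained by multiplying the constriction factor through the bracket of Eq. 8.12.

```python
import math

def constriction_factor(c1=2.05, c2=2.05, k=1.0):
    """Constriction factor of Eq. 8.13; requires c = c1 + c2 > 4."""
    c = c1 + c2
    if c <= 4.0:
        raise ValueError("Eq. 8.13 requires c = c1 + c2 > 4")
    return 2.0 * k / abs(2.0 - c - math.sqrt(c * c - 4.0 * c))

chi = constriction_factor()           # roughly 0.7298 for c1 = c2 = 2.05, k = 1
equivalent_inertia = chi              # multiplies the previous velocity
equivalent_acceleration = chi * 2.05  # multiplies each attraction term
```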

Although the constriction coefficient guarantees that the particles converge to a point (a convergent sequence), there is no guarantee that this final point is a quality point in the search space (Bergh and Engelbrecht 2006). In Bergh and Engelbrecht (2010), it has been proven that for any $c_1$ and $c_2$ that satisfy the convergence conditions, all particles collapse to the global best $g_t$, i.e., $\lim_{t \to \infty} x_t^i = p_t^i = g_t$ for all particles. Also, if $g_t = p_t^i = x_t^i$ for all particles, the velocity vector shrinks very fast. In this situation, i.e., $g_t = p_t^i = x_t^i$ for all particles and at the same time $v_t^i = 0$, all particles stop moving and no improvement can take place as all components for moving the particles are zero. This issue is known as stagnation, and was first introduced as a defect in the standard PSO (Bergh and Engelbrecht 2002) and further investigated by Bergh and Engelbrecht (2010). This issue exists in both LPSO and CCPSO. A variant of PSO was proposed (called Guaranteed Converging PSO, GCPSO) which addressed the stagnation issue. The only difference between GCPSO and CCPSO was in updating the velocity of the global best particle (the particle whose personal best is the current global best of the swarm):

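$$v_{t+1}^i = \begin{cases} -x_t^i + g_t + \chi v_t^i + \rho & \text{if } i = \tau_t \\ \chi \left( v_t^i + c_1 R_{1t}^i \left(p_t^i - x_t^i\right) + c_2 R_{2t}^i \left(g_t - x_t^i\right) \right) & \text{otherwise} \end{cases} \tag{8.14}$$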

where $\tau_t$ is the index of the particle whose personal best is the global best of the swarm, i.e., $g_t = p_t^{\tau_t}$, and $\rho$ is a random value generated through an adaptive approach (Bergh and Engelbrecht 2010). Note that, according to this formulation, stagnation might still happen for all particles except for the global best particle. However, if the global best particle is improved, $g_t$ is improved, which causes the other particles to get out of the stagnation situation. See Bonyadi and Michalewicz (2014) for more information.

Another issue that is exclusive to LPSO is called line search (Wilke et al. 2007a); if $\left(p_t^i - x_t^i\right) \parallel \left(g_t - x_t^i\right)$ and $v_t^i \parallel \left(p_t^i - x_t^i\right)$, the particle i starts oscillating between its personal best and the global best (line search) forever. In this case, only the solutions that are on this line are sampled by the particle i and other locations in the search space are not examined anymore. Wilke showed that this is not the case in the standard PSO (Wilke et al. 2007a); however, there are some situations where the particles in the standard PSO start oscillating along one of the dimensions while there is no chance for them to get out of this situation (Bonyadi 2014; Spears et al. 2010; Bergh and Engelbrecht 2010). Note that GCPSO does not have this issue.
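The line search issue can be reproduced numerically. The following sketch is an illustration under stated assumptions rather than an experiment from the chapter: the two-dimensional setup, the coefficient values (0.7298 and 1.4962), and all function names are hypothetical. The personal best, the global best, and the initial velocity all lie on one line through the start position. With scalar random numbers (LPSO) every sampled position stays on that line, whereas per-dimension random numbers (standard PSO) move the particle off it.

```python
import random

def velocity(x, v, p, g, w, phi1, phi2, rng, scalar_random):
    """Inertia-weight velocity update.

    scalar_random=True  -> LPSO: one random number per attraction term.
    scalar_random=False -> standard PSO: one random number per dimension
                           (the diagonal of a random matrix).
    """
    dim = len(x)
    if scalar_random:
        r1 = [rng.random()] * dim
        r2 = [rng.random()] * dim
    else:
        r1 = [rng.random() for _ in range(dim)]
        r2 = [rng.random() for _ in range(dim)]
    return [w * v[j] + phi1 * r1[j] * (p[j] - x[j]) + phi2 * r2[j] * (g[j] - x[j])
            for j in range(dim)]

def off_line(point, direction):
    """Absolute 2-D cross product: zero when `point` lies on the line
    through the origin along `direction`."""
    return abs(point[0] * direction[1] - point[1] * direction[0])

def max_deviation(scalar_random, steps=20, seed=1):
    rng = random.Random(seed)
    direction = [1.0, 2.0]
    x, v = [0.0, 0.0], [0.5, 1.0]      # velocity parallel to `direction`
    p, g = [1.0, 2.0], [2.0, 4.0]      # personal and global best on the same line
    worst = 0.0
    for _ in range(steps):
        v = velocity(x, v, p, g, 0.7298, 1.4962, 1.4962, rng, scalar_random)
        x = [x[0] + v[0], x[1] + v[1]]
        worst = max(worst, off_line(x, direction))
    return worst

lpso_dev = max_deviation(scalar_random=True)       # stays on the line
standard_dev = max_deviation(scalar_random=False)  # leaves the line
```

The personal and global bests are kept fixed here on purpose, so that only the effect of the random factors on the search direction is visible.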

Stagnation happens with a higher probability when the swarm size is small (Bergh

and Engelbrecht 2002); this is called the swarm size issue throughout this chapter.

In Bergh and Engelbrecht (2002), the authors argued that PSO is not effective when

its swarm size is small (2 for example), and particles stop moving in the earlier

stage of the optimization process. To address this issue, a new velocity updating rule

was proposed that was only applied to the global best particle to prevent it from

becoming zero. Consequently, the global best particle never stops moving which


solves the stagnation issue and, as a result, the swarm size issue is addressed as well.

Experiments confirmed that, especially in unimodal optimization problems, the

new algorithm is significantly better than the standard version when the swarm size

is small (with 2 particles). Note that, in LPSO, apart from the stagnation issue, the

line search issue is another reason why the algorithm becomes ineffective when the swarm

size is small.

There are many different topologies that have been introduced so far for PSO (Kennedy and Mendes 2002). One of the well-known topologies is called the gbest topology. In this topology, the set $T_t^i$ contains all particles in the swarm, i.e., $T_t^i = \{1, 2, \ldots, n\}$. As an example, the standard PSO uses this topology, as in each iteration $g_t$ is used for the velocity updating rule and $g_t = p_t^{\tau_t}$, where $\tau_t = \operatorname{argmin}_{l \in T_t^i} F\left(p_t^l\right)$. It has been shown that when this topology is used, the algorithm converges rapidly to a point (Kennedy and Mendes 2002). The reason behind this rapid convergence is that all particles are connected⁷ to each other, and hence, they all tend to converge to the best ever found solution.

Another well-known topology is called the ring topology, where the set $T_t^i$ contains $\{i, i - 1, i + 1\}$ (it is assumed that the particles are in a fixed order during the run). In fact, each particle is connected to two other particles that are the previous and the next particles. Also, if $i + 1$ is larger than n (swarm size), it is replaced by 1, and if $i - 1 < 1$, it is replaced by n. The velocity updating rule for this topology is written as

$$v_{t+1}^i = \chi \left( v_t^i + c_1 R_{1t}^i \left(p_t^i - x_t^i\right) + c_2 R_{2t}^i \left(lb_t^i - x_t^i\right) \right) \tag{8.15}$$

where $lb_t^i$ is the best ever found solution by the particles i, i − 1, and i + 1, i.e., $lb_t^i = p_t^{\tau_t^i}$ where $\tau_t^i = \operatorname{argmin}_{l \in T_t^i} F\left(p_t^l\right)$. It has been shown that if the algorithm uses the ring topology, it spends more iterations on exploration than with the gbest topology, thereby resulting in better explorative behavior.
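The neighbor set of the ring topology and the resulting local best can be sketched as follows. The function names are hypothetical, and the sketch uses 0-based indices (Python convention) where the text uses 1-based indices, so the wrap-around is handled with the modulo operator. Minimization is assumed.

```python
def ring_neighbours(i, n):
    """Indices of the previous particle, the particle itself, and the next
    particle in a ring of n particles (0-based, with wrap-around)."""
    return [(i - 1) % n, i, (i + 1) % n]

def local_best_index(i, personal_best_values):
    """Index of the neighbour of particle i with the smallest personal best
    objective value (minimization)."""
    n = len(personal_best_values)
    return min(ring_neighbours(i, n), key=lambda l: personal_best_values[l])

values = [3.0, 2.0, 0.1, 5.0, 1.0]
first = ring_neighbours(0, 5)            # wraps around to the last particle
guide_of_0 = local_best_index(0, values) # best among particles 4, 0, 1
```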

Another topology that is used in this chapter is called nonoverlapping topology. In this topology, the particles in the swarm are divided into several sets (called sub-swarms) that are independent of each other. In fact, if we define the set $s_t^i = \{i\} \cup T_t^i$, in any nonoverlapping topology, there exists at least one particle i such that for all j as a member of $\{1, 2, \ldots, n\} \setminus s_t^i$, the intersection of $s_t^i$ and $s_t^j$ is empty, i.e., $\exists i \in \{1, 2, \ldots, n\} \; \forall j \in \{1, 2, \ldots, n\} \setminus s_t^i : s_t^i \cap s_t^j = \emptyset$. Note that, in this case, the gbest topology is a special case of nonoverlapping topology because for all i, the set $\{1, 2, \ldots, n\} \setminus s_t^i$ is empty and, consequently, $s_t^j$ is also empty. This means that $s_t^i \cap s_t^j = \emptyset$ for any $j \in \{1, 2, \ldots, n\} \setminus s_t^i$. If the size of $T_t^i$ is the same for all i, we show the topology by the notation nvl where l is the size of each sub-swarm. Thus, the gbest topology can be indicated by nvn.

7 A particle i is connected to particle j if it is aware of the personal best location of the particle j.
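A nonoverlapping topology with equally sized sub-swarms can be sketched as a partition of the particle indices. The grouping of consecutive indices is an assumption made for this illustration (the text does not prescribe how particles are assigned to sub-swarms), the function names are hypothetical, and indices are 0-based.

```python
def nonoverlapping_subswarms(n, l):
    """Split particle indices 0..n-1 into disjoint sub-swarms of size l."""
    if l <= 0 or n % l != 0:
        raise ValueError("the swarm size n must be a positive multiple of l")
    return [list(range(start, start + l)) for start in range(0, n, l)]

def neighbour_indices(i, l):
    """Indices of the sub-swarm that particle i belongs to."""
    start = (i // l) * l
    return list(range(start, start + l))

pairs = nonoverlapping_subswarms(6, 2)     # three sub-swarms of two particles
whole = nonoverlapping_subswarms(4, 4)     # a single sub-swarm: the gbest case
```

With l equal to n the partition has a single block, which matches the remark that the gbest topology is a special case.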

There are other topologies (e.g., pyramid) and it is hard to review all of them.

Our review has been limited to the topologies that are used in the rest of the chapter.

For further information about topologies, the readers are referred to Kennedy and

Mendes (2002).


Niching is a concept that has been introduced in multi-modal optimization. Niching in multi-modal optimization refers to locating several (ideally all) optima (including local and global optima) of a function. An optimization algorithm is said to support niching if it is able to locate different optima in the search space rather than finding only one (Li 2010). There have been many attempts to adapt PSO to support niching (Brits et al. 2007; Engelbrecht et al. 2005; Li 2010). As an example, in Engelbrecht et al. (2005), the authors analyzed the performance of PSO when the gbest or ring topology is taken into account. In the gbest topology, results showed that only one optimum is located at each run of the algorithm. This was expected as all particles converge to $g_t$ (the convergence sequence, see Sect. 8.3.2), which is not desirable for niching. In addition, the capabilities of the ring topology were investigated experimentally so as to understand whether it can satisfy niching aims. Experiments with some standard functions led the authors to conclude that the ring topology is not an appropriate candidate for niching either.

A multi-swarm approach called NichePSO (Brits et al. 2007) was proposed in which multiple sub-swarms were run to locate different local optima. Sub-swarms could merge or exchange particles with one another. Also, in NichePSO, whenever the improvement in a particle's fitness over some number of iterations (a parameter) was small, a sub-swarm was created within that particle's neighborhood to assist that particle in improving the solution.


niching (Li 2010). The author found that a CCPSO algorithm which uses the ring

topology can operate as a niching algorithm because of the particles' personal bests.

In fact, the personal bests of the particles form a stable network retaining the best

positions found so far, while the particles explore the search space more broadly

by changing their position. Also, it was concluded that by using a reasonably large

population, CCPSO algorithm which uses the ring topology is able to locate dominant

niches (optima) across the search space. This means that particles locate niches that

are fairly similar in terms of their objective value. However, if the aim of the algorithm

is to locate the local optima that are less dominant, a nonoverlapping topology is a

good candidate. Results showed that a nonoverlapping topology with 2 or 3 particles

(i.e., nv2 or nv3) in each sub-swarm is significantly better than other topologies when

the number of dimensions is small (up to 8 dimensions). Although the performance

of these topologies is good with a small number of dimensions, their performance

was impaired much faster than other topologies in locating optima as the number of

dimensions grew. In fact, based on experiments, nv2 and nv3 were the worst among

other tested methods when the number of dimensions was larger than 8.

In this chapter, a PSO method is proposed, which is able to locate feasible regions

in COPs. The niching concept in multimodal optimization is adopted for locating

feasible regions in COPs. The proposed approach has two main parts:

1. The issues of PSO with nonoverlapping topology in niching are investigated in

detail. A new PSO (called mutation linear PSO, MLPSO) is proposed, which

addresses the issues of the nonoverlapping topology in niching (see Sect. 8.3).

2. A new PSO based on MLPSO (called EMLPSO) is proposed, which can locate

feasible regions.

As discussed earlier, CCPSO with nonoverlapping topology with a small number

of particles in each sub-swarm is highly effective for niching purposes (locating

different optima in the search space), when the number of dimensions is small.

However, it rapidly becomes ineffective as the number of dimensions grows. On the

other hand, it has been shown that most PSO algorithms, including CCPSO, with

small population size are not effective for optimization, because of stagnation and

line search issues (recall that this issue was known as swarm size issue, see Sect. 8.3).

Thus, it is natural to claim that if the swarm size issue is addressed, the nonoverlapping

topology with small sub-swarms becomes effective for niching purposes even if the number of dimensions is large. To this end, a mutation operator is proposed that is applied

to the velocity updating rule of LPSO (the new algorithm is called MLPSO) and

can address stagnation and line search issues. As these two issues are the reasons

behind the swarm size issue, we expect that MLPSO does not suffer from the swarm

size issue. The ability of MLPSO with small swarm size is examined through some

experiments. These experiments confirm that MLPSO is more effective than other

types of PSO when the swarm size is small. Then, in order to confirm that MLPSO

is effective in niching using nonoverlapping topology in higher dimensions, we test

the algorithm with this topology and compare its results with CCPSO with the same

topology defined in Li (2010).⁸

Consider an arbitrary vector $\vec{d}$ that connects the center of the coordinates to the point d in the D-dimensional space. The proposed mutation operator is as follows:

$$\vec{d}' = m\left(\vec{d}, c, \gamma\right) \tag{8.16}$$

where $\vec{d}'$ is a vector that connects the center of the coordinates to the point $d'$, m is the mutation operator, and c and $\gamma$ are two constants. Obviously, for every vector $\vec{d}$, there are two elements that the operator m should mutate: direction and magnitude. One can consider two different ideas to design m: (1) it rotates $\vec{d}$ by a random rotation matrix to perturb its direction and multiplies the result by a random number to perturb its magnitude, and (2) it adds a normally distributed random vector to $\vec{d}$, which mutates both the length and direction. In the first design (rotating and then mutating the magnitude), we can write

$$\vec{d}' = m\left(\vec{d}\right) = \alpha \Theta \vec{d} \tag{8.17}$$

where $\Theta$ is a rotation matrix and $\alpha$ is a random scalar value. There are several ways to design $\Theta$, such as a Euclidean rotation equation (Ricardo and Pérez-Aguila 2004) or an exponential map (Wilke et al. 2007b). However, both methods are in $O(D^2)$ in terms of time complexity (see also Bonyadi (2014)).

The second design of the operator m can be written as

$$\vec{d}' = m\left(\vec{d}\right) = \vec{d} + N(0, \sigma) \tag{8.18}$$

The larger $\sigma$ is, the more probable it is that $\vec{d}'$ is generated farther from $\vec{d}$ (see also Bonyadi and Michalewicz (2014)). As this calculation only needs the addition of two D-dimensional vectors, it is done in O(D) time. It is clear that the second approach needs considerably less calculation. Thus, we use this design (Eq. 8.18) for the mutation operator m.

8 Note that the GCPSO is another variant of PSO (introduced in Sect. 8.3) that does not have the swarm size issue. However, it is not a good choice for niching using the nonoverlapping topology. The reason is that, in GCPSO, the only particle which is able to move after stagnation is the global best particle. All other particles stay unchanged until this particle is improved. As the global best particle is only in one of the sub-swarms (the sub-swarms do not overlap with each other), this particle cannot share its information (personal best) with particles in the other sub-swarms. Thus, all other sub-swarms stay in the stagnation situation and only one of the sub-swarms may continue searching. This leads to ineffective niching behavior, as only one of the sub-swarms converges to a local optimum.

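In this chapter, the value of $\sigma$ is calculated using the following equation:

$$\text{for all } j \in \{1, \ldots, D\}: \quad \sigma_j = \begin{cases} c \cdot \lVert N(0, \boldsymbol{\gamma}) \rVert & \text{if } 0 \le \lVert \vec{d} \rVert < \gamma \\ c \cdot \lVert \vec{d} \rVert & \text{otherwise} \end{cases} \tag{8.19}$$

where $\lVert \cdot \rVert$ is the norm operator, c is a constant, $\gamma$ is a small real number, $\boldsymbol{\gamma}$ is a vector in which the value of all dimensions is equal to $\gamma$, and N is the normal distribution. If the length of the vector $\vec{d}$ is small, a random vector ($N(0, \boldsymbol{\gamma})$) is generated and used for the calculations instead. The mutation operator that uses Eqs. 8.18 and 8.19 is shown by $m\left(\vec{d}, c, \gamma\right)$.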
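A minimal sketch of the additive mutation of Eqs. 8.18 and 8.19 is given below. It rests on assumptions that the text does not settle: the second argument of the normal distribution is treated as a standard deviation, the Euclidean norm is used, and the names `mutate`, `norm`, and `gamma` (for the small threshold) are chosen only for this illustration.

```python
import math
import random

def norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in vec))

def mutate(d, c, gamma, rng):
    """Add zero-mean Gaussian noise to d; the noise scale is c times the
    length of d, or c times the length of a small random vector when d is
    shorter than the threshold gamma."""
    length = norm(d)
    if length < gamma:
        # d is (almost) zero: use a random vector so the noise does not vanish
        length = norm([rng.gauss(0.0, gamma) for _ in d])
    sigma = c * length
    return [dj + rng.gauss(0.0, sigma) for dj in d]

rng = random.Random(2)
moved = mutate([0.0, 0.0, 0.0], 1.0, 1e-3, rng)   # a zero vector still moves
```

Only D Gaussian samples and D additions are needed per call (plus D more samples when the vector is short), which matches the linear cost argued for this design.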

In this subsection, we propose a new variant of the linear PSO, which addresses

stagnation and line search issues. Also, we experimentally show that the proposed

algorithm addresses the swarm size issue as well.

As discussed earlier, the appropriate setting of constriction coefficients guarantees

convergence of the particles to a solution in the search space, but not necessarily to

a quality solution. This results in stagnation in the algorithm, i.e., all particles stop

moving while the quality of the found solution is not satisfactory. In this chapter, it

is proposed to use the introduced vector mutation to guarantee that particles do not

stop moving (this variant is called the mutation linear PSO, MLPSO). In fact, the

velocity updating rule of LPSO is revised as follows:

$$v_{t+1}^i = m\left( \chi \left( v_t^i + c_1 r_{1t}^i \left(p_t^i - x_t^i\right) + c_2 r_{2t}^i \left(g_t - x_t^i\right) \right), c_t^i, \gamma_t^i \right) \tag{8.20}$$

The parameters $\chi$, $c_1$, and $c_2$ are exactly the same as the ones in CCPSO, while $r_{1t}^i$ and $r_{2t}^i$ are two random values rather than random matrices. Note that in this variant of LPSO, we have used the CCPSO model (defined in Eq. 8.12); however, any other type of PSO can be used instead. If the values of $c_t^i$ and $\gamma_t^i$ are guaranteed to be nonzero, $v_{t+1}^i$ is always nonzero (these parameters are investigated later in this subsection). Thus, the stagnation issue is addressed, i.e., there is no stagnation anymore. Also, as the mutation m changes the direction of $v_{t+1}^i$, the condition $v_t^i \parallel \left(p_t^i - x_t^i\right)$ is violated, which implies that the line search issue does not exist in this variant of LPSO. We propose an adaptive approach to set the value of $c_t^i$, which has been inspired by Bergh and Engelbrecht (2002, 2010) with some modifications. In this adaptive approach, the value of $c_t^i$ for a particle i at the time t is calculated by:

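$$c_{t+1}^i = \begin{cases} 2 c_t^i & \text{if } s_t^i > s \\ 0.5 c_t^i & \text{if } f_{min} < f_t^i < f_{max} \text{ and } c_t^i > c_{min} \\ 2 c_t^i & \text{if } f_t^i > f_{max} \text{ and } c_t^i < c_{max} \text{ and } \operatorname{mod}(t, q) = 0 \\ c_t^i & \text{otherwise} \end{cases} \tag{8.21}$$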

where $s_t^i$ ($f_t^i$) is the number of successive iterations at the current iteration t that the personal best of the particle i has been (has not been) improved by at least $imp_{min}$ percent; this value was set to $10^{-5}$ in all experiments. At each iteration, if the personal best of the particle i was improved, $s_t^i$ is increased by one and $f_t^i$ is set to 0, and if it was not improved, $f_t^i$ is increased by one and $s_t^i$ is set to 0. If $s_t^i$ is larger than the constant s (set to 10 in all experiments), the value of $c_t^i$ is multiplied by 2. This multiplication, which grows the value of $c_t^i$, takes place to give the algorithm the opportunity to sample further locations and improve faster. Also, if $f_t^i$ is larger than $f_{min}$ and smaller than $f_{max}$, the value of $c_t^i$ is reduced to enable the algorithm to conduct local search around current solutions and improve them. However, if the value of $f_t^i$ is even larger than $f_{max}$, the strategy of controlling $c_t^i$ is reversed and $c_t^i$ starts to grow. The idea behind this is that if the current solution is not improved for a large number of successive iterations, the exploitation has been done and no better solutions can be found in the current region. Thus, it is better to start jumping out from the current local optima to improve the probability of finding better solutions. According to Eq. 8.21, the value of $c_t^i$ is increased at a low rate (every q iterations) in this situation (when $f_t^i$ is very large) to prevent the algorithm from jumping with big steps. The values of $c_{max}$ and $c_{min}$ are set to 1 and $10^{-10}$, respectively. Also, the values of $f_{min}$ and s are set to 10 as it was proposed in Bergh and Engelbrecht (2010), $f_{max}$ and q are set to 200 and 50, and $c_0^i$ is set to 1 for all particles. We propose to set the value of $\gamma_t^i$ to $1/D^z$ where z is a constant real value. Our experiments show that z = 1.5 has acceptable performance in a wide range of optimization problems. Thus, we use $\gamma_t^i = 1/D^{1.5}$ in all experiments.
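The adaptive rule described in the prose above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the adapted mutation parameter is called `scale` here (a hypothetical name), the default constants follow the values quoted in the text (10, 10, 200, 50, and bounds of 1e-10 and 1), and clamping the result to the bounds is an assumption about how those bounds are applied.

```python
def update_counters(improved, successes, failures):
    """Return the new (successes, failures) counters of consecutive
    improving / non-improving iterations of a particle's personal best."""
    return (successes + 1, 0) if improved else (0, failures + 1)

def update_scale(scale, successes, failures, t,
                 s=10, f_min=10, f_max=200, q=50,
                 scale_min=1e-10, scale_max=1.0):
    """One step of the adaptive rule for the mutation parameter."""
    if successes > s:
        new = 2.0 * scale          # steady progress: sample farther away
    elif f_min < failures < f_max:
        new = 0.5 * scale          # mild stagnation: search more locally
    elif failures > f_max and t % q == 0:
        new = 2.0 * scale          # long stagnation: grow, but only every q iterations
    else:
        new = scale
    # Assumption: the value is kept inside [scale_min, scale_max].
    return min(max(new, scale_min), scale_max)

grown = update_scale(0.5, successes=11, failures=0, t=3)
shrunk = update_scale(0.5, successes=0, failures=50, t=3)
```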

As was mentioned earlier, stagnation and line search are the main reasons behind the swarm size issue in PSO. As the stagnation and line search issues have been solved in MLPSO, it is very likely that the swarm size issue has been addressed. To test if the swarm size issue has been solved, we apply MLPSO, LPSO, and CCPSO to some standard benchmark functions (taken from CEC2005 (Suganthan et al. 2005)) when all algorithms use 2 particles (n = 2). Table 8.1 shows the results.

Each algorithm was run 20 times for 1000D function evaluations (FE) for D = 10 and D = 30. The results have been compared based on the averages over 20 runs and the Wilcoxon test (Wilcoxon 1945) (with a significance level of p = 0.05), which is used to measure the significance of the differences. The table shows that, for the 10-dimensional cases with a small swarm size (n = 2), the proposed MLPSO performs significantly better than both LPSO and CCPSO in 8 cases out of 10. It is worse than CCPSO in only 2 cases, although the difference in these cases is not significant based on the Wilcoxon test. Also, MLPSO was significantly better than LPSO in all cases when D = 10. When D = 30, MLPSO is significantly better than CCPSO and LPSO in all cases. These results confirm that the proposed method works better than LPSO and CCPSO when the swarm size is small.

Table 8.1 Comparison results between MLPSO, LPSO, and CCPSO with small swarm size (n = 2)

            D = 10                                     D = 30
Function    MLPSO            LPSO        CCPSO         MLPSO            LPSO        CCPSO
F1          450 (LC)         30240.78    12259.54      450 (LC)         136525      87020.26
F2          450 (LC)         39143.76    14065.74      445.696 (LC)     717133.8    139933.9
F3          362588.8 (LC)    1.54E+09    1.52E+08      4347140 (LC)     4.2E+09     1.86E+09
F4          59091.76 (L)     47408.93    22805.53      622474.1 (LC)    682395.7    205955.2
F5          6682.806 (LC)    26006.44    17362.1       21284.27 (LC)    59100.9     39995.19
F6          1492.037 (LC)    2.91E+10    6.7E+09       2453.11 (LC)     1.55E+11    1.01E+11
F7          172.326 (LC)     1525.607    369.1611      179.919 (LC)     5673.608    4008.997
F8          119.746 (LC)     119.301     119.553       119.756 (LC)     118.796     118.979
F9          244.852 (LC)     186.876     233.761       9.17035 (LC)     249.8911    134.7559
F10         167.377 (L)      109.014     193.809       442.103 (LC)     720.9524    465.4687

L and C mark results where MLPSO is significantly better than LPSO and CCPSO, respectively (Wilcoxon test, p = 0.05).

It has been shown that the nonoverlapping topology (in CCPSO) with 2 or 3 particles

in each sub-swarm shows good potential to locate different local optima (Li 2010).

However, it becomes very ineffective when the number of dimensions grows above 8.

We claim that the issue actually stemmed from the swarm size issue. As we have

addressed the swarm size issue in MLPSO, we expect to see that the algorithm with

the nonoverlapping topology with small number of particles in each sub-swarm is

more effective in locating different local optima. In the following experiment, we test

the ability of MLPSO to locate different local optima when it uses the nonoverlapping

topology with a small number of particles in each sub-swarm. We designed a test

function for this purpose (called six circles) as follows:

$$f(x) = \min\left(C_1, C_2, C_3, C_4, C_5, C_6\right) \tag{8.22}$$

where $C_1 = \sum_{i=1}^{D} (x_i - 1.5)^2 - 1$, $C_2 = \sum_{i=1}^{D} (x_i + 1)^2 - 0.25$, $C_3 = \sum_{i=1}^{D} (x_i + 3)^2 - 0.0625$, $C_4 = \sum_{i=1}^{D} (x_i + 2)^2 + 10^{-5}$, $C_5 = \sum_{i=1}^{D} (x_i - 3.5)^2 + 10^{-5}$, and $C_6 = \sum_{i=1}^{D} (2x_i)^2 + 10^{-5}$. The objective function (f(x) versus x) is shown in Fig. 8.3 for the one- and two-dimensional cases.
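The six circles function can be written compactly in Python. The sketch assumes that the constants read as follows once the signs lost in typesetting are restored: offsets of −1, −0.25, and −0.0625 for the first three terms and +10⁻⁵ for the last three, with centers at 1.5, −1, −3, −2, 3.5, and 0. The helper name `shifted_sphere` is hypothetical.

```python
def six_circles(x):
    """Minimum of six shifted quadratic bowls (Eq. 8.22), for a list x."""
    def shifted_sphere(center, offset, scale=1.0):
        return sum((scale * xi - center) ** 2 for xi in x) + offset
    return min(
        shifted_sphere(1.5, -1.0),            # C1: global optimum at 1.5
        shifted_sphere(-1.0, -0.25),          # C2
        shifted_sphere(-3.0, -0.0625),        # C3
        shifted_sphere(-2.0, 1e-5),           # C4
        shifted_sphere(3.5, 1e-5),            # C5
        shifted_sphere(0.0, 1e-5, scale=2.0), # C6: sum of (2 * x_i)^2
    )

global_value = six_circles([1.5, 1.5, 1.5])   # value at the global optimum
```

Evaluating the function at the six centers (the same coordinate repeated in every dimension) gives −1, −0.25, −0.0625, and three times 10⁻⁵, which is one way to check the reading of the constants.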

It is clear that the function has six optima (at x = 3, x = 2, x = 1, x = 0,

x = 1.5, and x = 3.5). We apply MLPSO and CCPSO to the six circles function

with two different topologies: nv2 and nv4. In this test we set the maximum number

of FEs to 3000D and D = {2, 5, 10, 15, 20, 25, 30, 40, and 50}. After each run,

Fig. 8.3 The six circles function in (a) one-dimensional and (b) two-dimensional spaces

we evaluated the personal bests of all particles to find how close they are to the

different local optima of the objective function. We consider that the personal best of a

particle i ($p_i^t$) has located a local optimum if the mean squared error over all

dimensions between $p_i^t$ and that local optimum is less than 0.05. We set n = 20 for this

test. Figure 8.4 shows the average results over 20 runs.
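The counting rule described above can be sketched as follows. This is an illustration only: it assumes that each local optimum has the same coordinate in every dimension (x = -3, -2, -1, 0, 1.5, or 3.5, as implied by the six circles function), and the names `OPTIMA_1D` and `located_optima` are hypothetical.

```python
# Assumed per-dimension coordinates of the six optima of the six circles function.
OPTIMA_1D = (-3.0, -2.0, -1.0, 0.0, 1.5, 3.5)


def located_optima(personal_bests, threshold=0.05):
    """Return the set of optima located by at least one personal best.

    A personal best locates an optimum when the mean squared error over all
    dimensions between the two points is below `threshold`.
    """
    found = set()
    for p in personal_bests:
        for centre in OPTIMA_1D:
            mse = sum((pi - centre) ** 2 for pi in p) / len(p)
            if mse < threshold:
                found.add(centre)
    return found


if __name__ == "__main__":
    bests = [[1.5, 1.6], [-1.0, -0.9], [1.4, 1.5], [0.5, 0.5]]
    print(len(located_optima(bests)))  # number of distinct optima located
```

Counting distinct optima rather than particles matters here, because several personal bests may sit in the same basin.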

The performance of MLPSO is inferior to CCPSO in both topologies when the

number of dimensions is small (two-dimensional problems). The reason is that when

MLPSO is used, most of the sub-swarms converge to the global optimum of the six

circles function (x = 1.5 in all dimensions) and, hence, the number of located local

optima drops. However, when the number of dimensions grows, MLPSO with both

topologies outperforms CCPSO in terms of the found number of local optima. Also,

the nv2 topology performs more effectively (in terms of locating local optima) than

the nv4 topology in MLPSO. The reason behind this phenomenon is that we have

Fig. 8.4 Comparison of MLPSO and CCPSO on the six circles function with the nv2 and nv4 topologies (curves: MLPSO (nv2), CCPSO (nv2), MLPSO (nv4), CCPSO (nv4)). The x axis is the number of dimensions and the y axis is the average number of local optima found

used 20 particles in all cases. Thus, the number of sub-swarms in the nv2 is greater

than the number of sub-swarms in the nv4. Hence, the number of located local optima

is less when the nv4 is used. In addition, the performance of MLPSO does not drop

when the number of dimensions grows.
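The sub-swarm counts behind this argument can be illustrated with a short sketch. It assumes that a nonoverlapping topology simply partitions the particle indices into consecutive groups of equal size; the chapter does not state how particles are assigned, so the grouping rule and the name `nonoverlapping_subswarms` are hypothetical.

```python
def nonoverlapping_subswarms(n_particles, subswarm_size):
    """Partition particle indices 0..n_particles-1 into disjoint sub-swarms.

    Assumption: consecutive indices form one sub-swarm; the swarm size must be
    divisible by the sub-swarm size.
    """
    if n_particles % subswarm_size != 0:
        raise ValueError("swarm size must be divisible by the sub-swarm size")
    return [
        list(range(start, start + subswarm_size))
        for start in range(0, n_particles, subswarm_size)
    ]


if __name__ == "__main__":
    print(len(nonoverlapping_subswarms(20, 2)))  # sub-swarms under nv2
    print(len(nonoverlapping_subswarms(20, 4)))  # sub-swarms under nv4
```

With 20 particles, nv2 yields 10 independent sub-swarms and nv4 only 5, so nv2 has twice as many groups that can settle in different basins.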

Results presented in Fig. 8.4 confirm that MLPSO performs better than CCPSO

in locating different local optima. Note that this result was expected as MLPSO

outperforms CCPSO with small swarm size, hence, MLPSO with small sub-swarms

should outperform CCPSO with small sub-swarms. Also, the performance of MLPSO

does not drop when the number of dimensions grows.

In this section we extend MLPSO to locate disjoint feasible regions. We incorporate

a modified version of ELCH (called MELCH) technique into MLPSO to enable

the method to handle constraints (this method is called EMLPSO). This method

(EMLPSO) is used to locate feasible regions in the search space. Also, the effect of

topology in this variant for locating feasible regions is tested through some experiments.

8.4.2.1 EMLPSO

In ELCH, the equality and inequality constraints were combined into a single function called the

constraint violation function. Also, a level of desired constraint violation

(called ε) was considered as the level of feasibility. The value of ε was reduced

linearly to zero during the optimization process. ELCH is modified by using

the fact that equalities can be replaced by inequalities (Eq. 8.2). Hence, in ELCH,

we can modify the constraint violation function as follows:

\[ G(x) = \sum_{i=1}^{m} \max\{0, g_i(x)\}^k \qquad (8.23) \]

where $g_i(x)$ for $i = 1, \ldots, q$ is the same as in Eq. 8.1, while $g_i(x)$ is defined as

$g_i(x) = |h_i(x)|$ for $i = q + 1, \ldots, m$. Note that in this case, x is a feasible

solution if G(x) = 0. The ELCH technique that uses Eq. 8.23 is called MELCH throughout this chapter. We incorporate the MELCH technique into the MLPSO algorithm (this is

called EMLPSO) to enable the algorithm to deal with constraints. Also, as MELCH

combines all constraints into one function, locating different local optima of this

function corresponds to locating disjoint feasible regions. Note that G(x) = 0 is

essential to count x as a local optimum, as G(x) > 0 does not correspond to a feasible solution, which is not desirable. We test the ability of EMLPSO with different

topologies to locate disjoint feasible solutions in the next subsection.
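A minimal sketch of the combined violation function of Eq. 8.23 and of a linearly decreasing feasibility level is given below. It assumes that constraints are supplied as Python callables, that equalities enter through their absolute value, and that the exponent k defaults to 1 (the chapter does not fix its value here). The names `melch_violation` and `linear_epsilon` are hypothetical.

```python
def melch_violation(x, inequalities, equalities, k=1):
    """Combined constraint violation G(x) = sum_i max(0, g_i(x))^k.

    `inequalities` holds callables g_i with g_i(x) <= 0 when satisfied;
    `equalities` holds callables h_i that enter as |h_i(x)|.
    """
    terms = [g(x) for g in inequalities] + [abs(h(x)) for h in equalities]
    return sum(max(0.0, t) ** k for t in terms)


def linear_epsilon(eps0, iteration, max_iterations):
    """Feasibility level reduced linearly from eps0 to zero (assumed schedule)."""
    return eps0 * max(0.0, 1.0 - iteration / max_iterations)


if __name__ == "__main__":
    g = lambda x: x[0] + x[1] - 1.0   # illustrative inequality x0 + x1 <= 1
    h = lambda x: x[0] - x[1]         # illustrative equality x0 = x1
    print(melch_violation((1.0, 1.0), [g], [h]))
    print(linear_epsilon(2.0, 5, 10))
```

A point is treated as feasible only when `melch_violation` returns exactly zero, which matches the requirement G(x) = 0 stated above.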


In order to test the ability of EMLPSO with different topologies to locate feasible

regions in the search space, we designed a test function as follows:

f (x) =

D

i=1

(8.24)

where the definition of C1 to C6 is the same as that mentioned in Eq. 8.22. It is clear

that the function has three disjoint feasible regions (x = 1, x = 1, and x = 3) in

which g(x) ≤ 0 (feasible regions). However, there are three trap regions (x = 2,

x = 0, and x = 2) where values of g(x) reduce rapidly to $10^{-5}$. Because the value

of g (x) at these points is larger than 0, these solutions are not feasible (see Fig. 8.5).

We test the ability of EMLPSO with different topologies (gbest, ring, and nonoverlapping) to deal with this function. For the nonoverlapping topology, we test the algorithm with nv6, nv4, nv3, and nv2, i.e., 6, 4, 3, and 2 particles in each sub-swarm. In

this test we set the maximum number of function evaluations (FE) to 3000D/n and

D = 10 and D = 30. Also, we set n = 12 to ensure that the swarm size is divisible

by 2, 3, 4, and 6. Table 8.2 shows the average of the results over 100 runs. The row

satisfaction is the percentage of the runs where a feasible solution was found (e.g.,

EMLPSO with ring topology has found a feasible solution in 76 % of all runs). The

row No. of feasible regions (Avr) is the average number of feasible regions

located by the personal bests of the particles in the swarm over all runs

(e.g., EMLPSO with ring topology found 1.18 of the three existing feasible regions

on average). The row locating optimal region (%) indicates the percentage of the

runs where the algorithm has found a feasible solution in the optimal region (in this

example, the region around x = 1.5). Comparing the results, it is clear that EMLPSO

with nonoverlapping topology with 2 particles in each sub-swarm (nv2) has the best

performance in satisfying the constraints (100 %), locating different feasible regions

Fig. 8.5 Contours of the function introduced in Eq. 8.24: (a) the objective values and (b) the objective values in the feasible space

Table 8.2 Comparison of different topologies in EMLPSO for solving the COP defined in Eq. 8.24, where D = 10 and D = 30

                                                    Nonoverlapping
D    Measure                         Gbest   Ring   nv6    nv4    nv3    nv2

10   Satisfaction (%)                58      76     78     95     96     100
10   No. of feasible regions (Avr)   1       1.17   1.27   1.4    1.65   2.06
10   Locating optimal region (%)     23      26     28     41     53     58

30   Satisfaction (%)                61      77     77     88     98     100
30   No. of feasible regions (Avr)   1       1.18   1.26   1.48   1.6    2.14
30   Locating optimal region (%)     24      27     31     42     50     73

(2.06 feasible regions on average out of the 3 existing regions), and finding the optimal

region (58 % of runs). Note that the last two measures (average number of feasible regions located and percentage of runs locating the optimal region) are interrelated, since the ability of

the methods to find feasible regions improves the probability of finding the optimal

region. It is also clear that the results in the 30-dimensional space confirm the results

in the 10-dimensional space. Thus, there is a better performance in locating different

feasible regions when there are several small sub-swarms and a better performance

in improving the final solutions when there are few large sub-swarms.

We compare EMLPSO, CCPSO, and CC methods in locating disjoint feasible

regions. The test problems that were introduced in Smith et al. (2013) are used

for this comparison. The specifications (i.e., equation, boundaries, and number of

disjoint feasible regions) of these problems are reported in Table 8.3. EMLPSO, CC,

and CCPSO were applied (CCPSO was combined with MELCH to be able to handle

Table 8.3 The test functions used for the next experiments

Functions Equation

Boundaries

2

5.1x 2

g1 (x) = x2 4 21 + 5x1 6 +

10

10 8

cos (x1 ) + 9

12

g2 (x) = x2 + x11.2

2

2

Rastrigin1 g1 (x) = x1 + x2 + 20

20 (cos (2 x1 ) + cos (2 x2 ))

g2 (x) = x2 x13

Schwefel1 g1 (x) = x1 sin |x1 | +

x2 sin |x2 | + 125

1 2

g2 (x) = x2 16

x1 + 150

Branin1

5 x1 10,

regions

3

0 x2 15

5 x1 5,

5 x2 5

150 x1 150,

150 x2 150

36

Table 8.4 Results of applying EMLPSO, CCPSO, and CC to three 2-dimensional COPs to locate their feasible regions

Method        Branin1   Rastrigin1   Schwefel1

EMLPSO        3/50      20.4/90      5.5/50
CCPSO+MELCH   2.4/99    17.1/192     2.9/110
CC            3/50      16/50        3/50

The table reports the average number of disjoint feasible regions found/FEs needed over 20 runs

the constraints) to these problems. The PSO methods used nv2 topology with 50

particles, because CC method uses 50 initial solutions. The maximum number of FE

was also set to 3000*D. Table 8.4 shows the average results over 20 runs of each

method.

Figure 8.6 shows the feasible regions of all three functions and the personal bests

of the particles after finding the feasible regions.

Clearly, Branin1 function (Fig. 8.6a) contains 3 similar size disjoint feasible

regions fairly scattered over the search space. This makes the problem relatively

easier to solve for the stochastic methods (such as EMLPSO). Also, reported results

Fig. 8.6 A particular run of EMLPSO to locate disjoint feasible regions of (a) Branin1, (b) Rastrigin1, and (c) Schwefel1. The red areas are feasible regions, the gray areas are infeasible regions, and the white dots are the personal bests of the particles

in Table 8.4 show that the proposed EMLPSO located all feasible regions for the

Branin1 function.

Rastrigin1 (Fig. 8.6b) contains 36 disjoint feasible regions of many different sizes. Some of these regions are very small, which makes them harder to locate. In this test problem, the proposed EMLPSO located 20.4 of the 36 feasible regions on average. Compared to the other listed methods, EMLPSO located more regions on average.

The Schwefel1 function (Fig. 8.6c) contains 6 disjoint feasible regions of different sizes. Two of these regions are hard to locate as they are surrounded by two larger feasible regions. In fact, the methods tend to move the solutions toward these larger regions rather than the smaller ones in between. However, the proposed EMLPSO could locate 5.5 of the 6 regions on average, while the other methods, CC and CCPSO+MELCH, located 3 and 2.9 feasible regions on average, respectively.

Feasible regions in a constrained optimization problem (COP) might have an irregular

shape, e.g., many disjointed regions or regions connected with narrow passages. The

quality of the solutions in each feasible region might be different and the optimal

solution might be in any of these regions. Hence, locating feasible regions, and as

many of these as possible, is of great value. In this chapter, we used the idea of

niching (locating different local optima) in a multi-modal optimization to locate

feasible regions in the COPs. One of the successful algorithms for niching is PSO

with a special type of topology called a nonoverlapping topology. However, existing

studies have shown that PSO with this topology is effective in locating local optima

when the number of dimensions is small (up to 8). We proposed a new PSO (called

mutation linear PSO, MLPSO) which is effective in locating local optima (niching) in

functions with a higher number of dimensions. The abilities of MLPSO in locating

local optima with up to 50 dimensions were tested through some experiments. In

order to locate feasible regions, a constraint handling technique was incorporated

into MLPSO and the new method was called epsilon MLPSO, EMLPSO. EMLPSO

was applied to some COPs and several different topologies of the method were

compared in terms of locating feasible regions. Results showed that EMLPSO with

the nonoverlapping topology with a small number of particles in each sub-swarm

is effective in locating feasible regions. As future work, it would be worthwhile to apply

EMLPSO to more benchmark constrained optimization functions and analyze its

performance in dealing with different COPs.

Acknowledgments This work was partially funded by the ARC Discovery Grants DP0985723,

DP1096053, and DP130104395, as well as by the grant N N519 5788038 from the Polish Ministry

of Science and Higher Education (MNiSW).


References

Bonyadi MR, Michalewicz Z (2014) A locally convergent rotationally invariant particle swarm optimization algorithm. Swarm Intell 8(3):159–198

Bonyadi MR, Li X, Michalewicz Z (2013) A hybrid particle swarm with velocity mutation for constraint optimization problems. In: Genetic and evolutionary computation conference. ACM, pp 1–8

Bonyadi MR, Michalewicz Z, Li X (2014) An analysis of the velocity updating rule of the particle swarm optimization algorithm. J Heuristics 20(4):417–452

Brits R, Engelbrecht AP, Van den Bergh F (2002) A niching particle swarm optimizer. In: 4th Asia-Pacific conference on simulated evolution and learning, vol 2. Orchid Country Club, Singapore, pp 692–696

Brits R, Engelbrecht AP, Van den Bergh F (2007) Locating multiple optima using particle swarm optimization. Appl Math Comput 189(2):1859–1883

Clerc M, Kennedy J (2002) The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Trans Evol Comput 6(1):58–73

Dantzig G (1998) Linear programming and extensions. Princeton University Press, Princeton

Engelbrecht AP, Masiye BS, Pampard G (2005) Niching ability of basic particle swarm optimization algorithms. In: Swarm intelligence symposium. IEEE, pp 397–400

Gilbert JC, Nocedal J (1992) Global convergence properties of conjugate gradient methods for optimization. SIAM J Optim 2(1):21–42

Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Publishing Company, Reading

Hansen N (2006) The CMA evolution strategy: a comparing review. In: Towards a new evolutionary computation. Springer, Berlin, pp 75–102

Helwig S, Wanka R (2007) Particle swarm optimization in high-dimensional bounded search spaces. In: Swarm intelligence symposium. IEEE, pp 198–205

Jabr RA (2012) Solution to economic dispatching with disjoint feasible regions via semidefinite programming. IEEE Trans Power Syst 27(1):572–573

Kennedy J, Eberhart R (1995) Particle swarm optimization. In: International conference on neural networks, vol 4. IEEE, pp 1942–1948

Kennedy J, Mendes R (2002) Population structure and particle swarm performance. In: Congress on evolutionary computation, vol 2. IEEE, pp 1671–1676

Lasdon L, Plummer JC (2008) Multistart algorithms for seeking feasibility. Comput Oper Res 35(5):1379–1393

Li XD (2010) Niching without niching parameters: particle swarm optimization using a ring topology. IEEE Trans Evol Comput 14(4):150–169

Liang JJ, Zhigang S, Zhihui L (2010) Coevolutionary comprehensive learning particle swarm optimizer. In: Congress on evolutionary computation. IEEE, pp 1–8

Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32

Paquet U, Engelbrecht AP (2007) Particle swarms for linearly constrained optimisation. Fundam Inf 76(1):147–170

Ricardo A, Pérez-Aguila R (2004) General n-dimensional rotations

Shi Y, Eberhart R (1998) A modified particle swarm optimizer. In: World congress on computational intelligence. IEEE, pp 69–73

Smith L, Chinneck J, Aitken V (2013) Constraint consensus concentration for identifying disjoint feasible regions in nonlinear programmes. Optim Methods Softw 28(2):339–363

Spears WM, Green DT, Spears DF (2010) Biases in particle swarm optimization. Int J Swarm Intell Res 1(2):34–57

Suganthan PN, Hansen N, Liang JJ, Deb K, Chen YP, Auger A, Tiwari S (2005) Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization. KanGAL Report

with ε-level control. Soft Comput Transdiscipl Sci Tech 1019–1029

Takahama T, Sakai S (2010) Constrained optimization by the constrained differential evolution with an archive and gradient-based mutation. In: Congress on evolutionary computation (CEC). IEEE, pp 1–9

Trelea IC (2003) The particle swarm optimization algorithm: convergence analysis and parameter selection. Inf Process Lett 85(6):317–325

Tsang E (1993) Foundations of constraint satisfaction, vol 289. Academic Press, London

Van den Bergh F, Engelbrecht AP (2002) A new locally convergent particle swarm optimiser. In: Systems, man and cybernetics, vol 3. IEEE, pp 96–101

Van den Bergh F, Engelbrecht AP (2006) A study of particle swarm optimization particle trajectories. Inf Sci 176(8):937–971

Van den Bergh F, Engelbrecht AP (2010) A convergence proof for the particle swarm optimiser. Fund Inf 105(4):341–374

Wang H, Yang S, Ip WH, Wang D (2010) A particle swarm optimization based memetic algorithm for dynamic optimization problems. Nat Comput 9(3):703–725

Whitley D, Gordon VS, Mathias K (1994) Lamarckian evolution, the Baldwin effect and function optimization. Springer, Heidelberg, pp 5–15

Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83

Wilke DN, Kok S, Groenwold AA (2007a) Comparison of linear and classical velocity update rules in particle swarm optimization: notes on diversity. Int J Numer Methods Eng 70(8):962–984

Wilke DN, Kok S, Groenwold AA (2007b) Comparison of linear and classical velocity update rules in particle swarm optimization: notes on scale and frame invariance. Int J Numer Methods Eng 70(8):985–1008

Chapter 9

Techniques for Single Objective

Constrained Optimization

Rammohan Mallipeddi, Swagatam Das and Ponnuthurai

Nagaratnam Suganthan

Abstract Many optimization problems in science and engineering involve

constraints due to which the feasible region reduces and the search process gets

complicated. In addition, when evolutionary algorithms (EAs) are employed to solve

constrained optimization problems additional mechanisms referred to as constraint

handling techniques are required as EAs generally perform unconstrained search.

Generally, the performance of a constraint handling technique depends on its effectiveness in utilizing the information present in the infeasible individuals generated

during the evolution process. In the literature, a variety of techniques are developed

to exploit the information present in infeasible individuals. However, according to

the No Free Lunch (NFL) theorem, no single state-of-the-art constraint handling

technique can outperform all others on every problem. In other words, depending on

several factors, such as the ratio between feasible search space and the whole search

space, multi-modality of the problem, the chosen EA and global exploration/local

exploitation stages of the search process, different constraint handling methods can

be effective on different problems and during different stages of the search process.

Hence, solving a particular constrained problem requires numerous trial-and-error

runs to choose a suitable constraint handling technique and to fine-tune the associated parameters. The trial-and-error approach may be unrealistic in applications

where the objective function is computationally expensive or solutions are required

in real-time. In this chapter, we present an ensemble of constraint handling techniques

(ECHT) as an efficient alternative to the trial-and-error-based search for the best constraint handling technique with its best parameters for a given problem. Ensemble

R. Mallipeddi (B)

Kyungpook National University, 1370 Sangkyuk-Dong, 702 701 Puk-gu,

Daegu, South Korea

e-mail: mallipeddi.ram@gmail.com

S. Das

Electronics and Communication Sciences Unit Indian Statistical Institute,

203 B T Road, 700108 Kolkata, India

e-mail: swagatam.das@isical.ac.in

P.N. Suganthan

EEE, SS2-B2a-21, 639798 Ntu, Singapore

e-mail: epnsugan@ntu.edu.sg

Springer India 2015

R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,

Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_9


R. Mallipeddi et al.

being a general concept can be realized with any EA framework. In this chapter,

ECHT is combined with an improved differential evolution (DE) algorithm referred

to as EPSDE. EPSDE is an improved version of DE based on an ensemble framework.

The performance of the proposed architecture is compared with the state-of-the-art

algorithms.

Keywords Constraint handling

problems

9.1 Introduction

Optimization is an intrinsic part of life and of human activity. For example,

manufacturers seek maximum efficiency in the design of their production processes,

investors aim at creating portfolios that avoid high risk while yielding a good return,

traffic planners need to decide on the level and ways of routing traffic to minimize

congestion, etc.

Classical optimization techniques make use of differential calculus, where it is assumed that the function is twice differentiable with respect to the design variables and that the derivatives are continuous, in locating the optimum solution. Thus, classical methods have limited scope in practical real-world applications, as objective functions are often characterized by chaotic disturbances, randomness, and complex nonlinear dynamics and may not always be continuous and/or differentiable. Population-based stochastic algorithms such as evolutionary algorithms (EAs) are well known for their ability to handle nonlinear and complex optimization problems. The primary advantage of EAs over other numerical methods is that they require only the objective function values, while properties such as differentiability and continuity are not necessary (Anile et al. 2005).

Many optimization problems in science and engineering involve constraints. The presence of constraints reduces the feasible region and complicates the search process. When solving constrained optimization problems, solution candidates that satisfy all the constraints are feasible individuals, while individuals that violate at least one of the constraints are infeasible individuals. To solve constrained optimization problems, EAs require additional mechanisms referred to as constraint handling techniques. One of the major issues in constrained optimization using EAs is how to deal with infeasible individuals throughout the search process. One way to handle them is to completely disregard infeasible individuals and continue the search process with feasible individuals only. This approach may be ineffective, as EAs are probabilistic search methods and potentially useful information present in infeasible individuals can be wasted. If the search space is discontinuous, then the EA can also be trapped in one of the local minima. Therefore, different techniques have been developed to exploit the information in infeasible individuals. In the literature, several constraint handling techniques have been proposed for use with EAs (Coello Coello 2002). Michalewicz and Schoenauer (1996) grouped the methods for handling constraints within EAs into four categories: preserving feasibility of solutions (Koziel and Michalewicz 1999), penalty functions, separating feasible and infeasible solutions, and hybrid methods. A constrained optimization problem can also be formulated as a multi-objective problem (Wang et al. 2007), but this is computationally intensive due to non-domination sorting.

According to the No Free Lunch theorem (Wolpert and Macready 1997), no single

state-of-the-art constraint handling technique can outperform all others on every

problem. Hence, solving a particular constrained problem requires numerous trial-and-error runs to choose a suitable constraint handling technique and to fine-tune the

associated parameters. This approach clearly suffers from unrealistic computational

requirements in particular if the objective function is computationally expensive (Jin

2005) or solutions are required in real-time. Moreover, depending on several factors

such as the ratio between feasible search space and the whole search space, multimodality of the problem, the chosen EA and global exploration/local exploitation

stages of the search process, different constraint handling methods can be effective

during different stages of the search process.

In pattern recognition and machine learning (Rokach 2009; Zhang 2000), ensemble methodology has been successfully employed. Ensemble integrates different

methods available to perform the same task into a single method so that the reliability can be improved. For example, in classification, an ensemble model formed by

integrating multiple classifiers reduces the variance, or instability caused by single

methods and improves the classification efficiency or prediction accuracy.

In this chapter, an ensemble of constraint handling techniques (ECHT) with four

constraint handling techniques (Coello Coello 2002; Huang et al. 2006; Runarsson

and Yao 2000; Tessema and Yen 2006) is presented as an efficient alternative to

the trial-and-error-based search for the best constraint handling technique with its

best parameters for a given problem. In ECHT, each constraint handling technique

has its own population and each function call is efficiently utilized by each of these

populations. Ensemble, being a general concept, can be realized with any EA framework. In this chapter, we integrate ECHT with an improved version of the DE algorithm referred to as EPSDE. EPSDE is a version of the DE algorithm based on the concept of ensemble (Mallipeddi et al. 2011). In EPSDE, a pool of distinct mutation and crossover strategies, along with a pool of control parameters associated with the DE algorithm, coexist throughout the evolution process and compete to produce offspring. Experimental results show that the performance of ECHT-EPSDE is better than that of each single constraint handling method used to form the ensemble and competitive with the state-of-the-art algorithms.


A constrained optimization problem with D parameters to be optimized is usually written as a nonlinear programming problem of the following form (Qin et al. 2009):

\[
\begin{aligned}
\text{Minimize: } & f(X), \quad X = (x_1, x_2, \ldots, x_D) \text{ and } X \in S \\
\text{subject to: } & g_i(X) \le 0, \quad i = 1, \ldots, p \\
& h_j(X) = 0, \quad j = p + 1, \ldots, m
\end{aligned}
\qquad (9.1)
\]

Here f need not be continuous but must be bounded. S is the search space. p and (m − p) are the numbers of inequality and equality constraints, respectively. The inequality constraints that satisfy $g_i(X) = 0$ at the global optimum solution are called active constraints. All equality constraints are active constraints. The equality constraints can be transformed into inequality form and can be combined with the other inequality constraints as

\[
G_i(X) =
\begin{cases}
\max\{g_i(X), 0\}, & i = 1, \ldots, p \\
\max\{|h_i(X)| - \delta, 0\}, & i = p + 1, \ldots, m
\end{cases}
\qquad (9.2)
\]

where $\delta$ is a tolerance parameter for the equality constraints. An adaptive setting of the tolerance parameter, which was originally proposed in Hamida and Schoenauer (2002) and used in Mezura-Montes and Coello Coello (2003), Mezura-Montes and Coello Coello (2005), and Wang et al. (2008), is adopted in our work with some modifications. Therefore, the objective is to minimize the fitness function f(X) such that the optimal solution obtained satisfies all the inequality constraints $G_i(X)$. The overall constraint violation for an infeasible individual is a weighted mean of all the constraints, which is expressed as

\[
\upsilon(X) = \frac{\sum_{i=1}^{m} w_i \, G_i(X)}{\sum_{i=1}^{m} w_i}
\qquad (9.3)
\]

where $w_i$ ($= 1/G_{max_i}$) is a weight parameter and $G_{max_i}$ is the maximum violation of constraint $G_i(X)$ obtained so far. Here, $w_i$ is set as $1/G_{max_i}$, which varies during the evolution in order to balance the contribution of every constraint in the problem irrespective of their differing numerical ranges.
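The two steps above (per-constraint violations as in Eq. 9.2, then the weighted mean of Eq. 9.3) can be sketched as follows. The sketch makes several assumptions that are not in the chapter: constraints are passed as Python callables, the tolerance defaults to 1e-4, and a constraint whose maximum violation so far is zero receives zero weight to avoid division by zero. The function names are hypothetical.

```python
def constraint_violations(x, inequalities, equalities, delta=1e-4):
    """Per-constraint violations G_i(X) as in Eq. (9.2).

    `delta` is the equality tolerance; its default value here is an assumption.
    """
    g_part = [max(g(x), 0.0) for g in inequalities]
    h_part = [max(abs(h(x)) - delta, 0.0) for h in equalities]
    return g_part + h_part


def overall_violation(violations, max_violations):
    """Weighted mean of the violations, Eq. (9.3), with w_i = 1 / Gmax_i.

    Assumption: constraints with Gmax_i == 0 (never violated so far) get weight 0.
    """
    weights = [1.0 / gm if gm > 0 else 0.0 for gm in max_violations]
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0
    return sum(w * v for w, v in zip(weights, violations)) / total_weight


if __name__ == "__main__":
    g = lambda x: x[0] - 1.0      # illustrative inequality x0 <= 1
    h = lambda x: x[0] - x[1]     # illustrative equality x0 = x1
    v = constraint_violations((3.0, 3.00005), [g], [h])
    print(v, overall_violation(v, [4.0, 1.0]))
```

Dividing each violation by the largest value seen so far puts constraints with very different numerical ranges on a comparable scale before they are averaged.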

The search process for finding the feasible global optimum in a constrained problem can be divided into three phases (Wang et al. 2008) depending on the number

of feasible solutions present in the combined parent population and its offspring

population as (a) Phase 1: No feasible solution, (b) Phase 2: At least one feasible

solution, and (c) Phase 3: Combined offspring-parent population has more feasible

solutions than the size of next generation parent population. Different constraint

handling techniques perform differently during each of these three phases.
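As a small illustration, the phase can be determined from two counts. The function name `search_phase` and its arguments are hypothetical; the rule itself follows the three definitions given above.

```python
def search_phase(num_feasible, next_parent_size):
    """Return 1, 2, or 3 for the search phase of the combined population.

    num_feasible: feasible solutions in the combined parent and offspring population.
    next_parent_size: size of the next-generation parent population.
    """
    if num_feasible == 0:
        return 1  # Phase 1: no feasible solution
    if num_feasible > next_parent_size:
        return 3  # Phase 3: more feasible solutions than parent slots
    return 2      # Phase 2: at least one feasible solution


if __name__ == "__main__":
    for count in (0, 7, 60):
        print(count, search_phase(count, next_parent_size=50))
```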


Powell and Skolnick 1993)

In SF, when two solutions $X_i$ and $X_j$ are compared, $X_i$ is regarded as superior to $X_j$

under the following conditions:

• $X_i$ is feasible and $X_j$ is not.

• $X_i$ and $X_j$ are both feasible and $X_i$ has a smaller objective value (in a minimization problem) than $X_j$.

• $X_i$ and $X_j$ are both infeasible, but $X_i$ has a smaller overall constraint violation $\upsilon(X_i)$ as computed by using Eq. (9.3).

Therefore, in SF, feasible ones are always considered better than infeasible ones.

Two infeasible solutions are compared based on their overall constraint violations

only, while two feasible solutions are compared based on their objective function

values only. Comparison of infeasible solutions based on the overall constraint violation aims to push infeasible solutions to the feasible region, while comparison of

two feasible solutions on the objective value improves the overall solution. Therefore,

in Phase 1, infeasible solutions with low overall constraint violation are selected. In

Phase 2, first all the feasible ones are selected and then infeasible ones with low

overall constraint violation are selected. In Phase 3, only feasible ones with best

objective values are selected.
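The SF comparison and the resulting selection can be sketched as follows. The sketch assumes that each solution is represented as a tuple (objective value, overall constraint violation), that a violation of exactly zero means feasible, and that the problem is a minimization problem. The names `sf_better` and `sf_select` are hypothetical.

```python
def sf_better(a, b):
    """Return True if solution a is superior to solution b under SF.

    Each solution is a tuple (objective, violation); violation == 0 means feasible.
    """
    obj_a, viol_a = a
    obj_b, viol_b = b
    a_feasible, b_feasible = viol_a == 0, viol_b == 0
    if a_feasible and b_feasible:
        return obj_a < obj_b       # both feasible: compare objective values
    if a_feasible != b_feasible:
        return a_feasible          # the feasible solution wins
    return viol_a < viol_b         # both infeasible: compare violations


def sf_select(population, n):
    """Select n solutions: feasible ones first (by objective), then infeasible (by violation)."""
    def key(solution):
        objective, violation = solution
        return (violation > 0, violation if violation > 0 else objective)
    return sorted(population, key=key)[:n]


if __name__ == "__main__":
    pop = [(5.0, 0.0), (1.0, 2.0), (3.0, 0.0), (0.5, 0.1)]
    print(sf_select(pop, 3))
```

The single sort key reproduces all three phases: with no feasible solutions only violations matter, with a few feasible solutions they fill the first slots, and with enough feasible solutions the infeasible ones are never reached.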

The simplest and the earliest method of involving infeasible individuals in the search

process, even after a sufficient number of feasible solutions are obtained, is the static

penalty method. In this method, a penalty value is added to the fitness value of each

infeasible individual so that it will be penalized for violating the constraints. Static

penalty functions are popular due to their simplicity but they usually require different

parameters to be defined by the user to control the amount of penalty added when

multiple constraints are violated. The parameters are usually problem-dependent. To

overcome this difficulty, adaptive penalty functions (Farmani and Wright 2003) are

suggested where information gathered from the search process is used to control the

amount of penalty added to infeasible individuals. Adaptive penalty functions are

easy to implement and they do not require users to define parameters.

In Tessema and Yen (2006), a self-adaptive penalty function method is proposed

to solve constrained optimization problems. Two types of penalties are added to

each infeasible individual to identify the best infeasible individuals in the current

population. The amount of the added penalties is controlled by the number of feasible individuals currently present in the combined population. If there are a few

feasible individuals, a higher amount of penalty is added to infeasible individuals

with a higher amount of constraint violation. On the other hand, if there are several


feasible individuals, then infeasible individuals with high fitness values will have

small penalties added to their fitness values. These two penalties allow the algorithm

to switch between finding more feasible solutions and searching for the optimum

solution at any time during the search process. This algorithm requires no parameter

tuning. The final fitness value on which the ranking of the population members is based is
given as $F(X) = d(X) + p(X)$, where $d(X)$ is the distance value and $p(X)$ is the
penalty value. The distance value is computed as follows:

\[ d(X) = \begin{cases} \upsilon(X), & \text{if } r_f = 0 \\ \sqrt{f''(X)^2 + \upsilon(X)^2}, & \text{otherwise} \end{cases} \tag{9.4} \]

where $r_f = \frac{\text{Number of feasible individuals}}{\text{Population size}}$, $\upsilon(X)$ is the overall constraint violation as
defined in Eq. (9.3), and $f''(X) = \frac{f(X) - f_{\min}}{f_{\max} - f_{\min}}$. $f_{\max}$ and $f_{\min}$ are the maximum and
minimum values of the objective function $f(X)$ in the current combined population.
The penalty value is defined as

\[ p(X) = (1 - r_f)\, M(X) + r_f\, N(X) \tag{9.5} \]

where

\[ M(X) = \begin{cases} 0, & \text{if } r_f = 0 \\ \upsilon(X), & \text{otherwise} \end{cases} \tag{9.6} \]

\[ N(X) = \begin{cases} 0, & \text{if } X \text{ is a feasible individual} \\ f''(X), & \text{if } X \text{ is an infeasible individual} \end{cases} \tag{9.7} \]
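The self-adaptive penalty fitness of Eqs. (9.4)–(9.7) can be illustrated with a minimal Python sketch. The function name `self_adaptive_fitness`, the list-based interface, and the guard against a zero objective range are assumptions made for illustration; the violations are assumed to be already computed as in Eq. (9.3), and minimization is assumed.

```python
import math

def self_adaptive_fitness(f_vals, viol):
    """Return F(X) = d(X) + p(X) for every individual (minimization).

    f_vals: objective values f(X) of the combined population.
    viol:   overall constraint violations (0 means feasible).
    """
    n = len(f_vals)
    r_f = sum(1 for v in viol if v == 0) / n      # feasibility ratio
    f_min, f_max = min(f_vals), max(f_vals)
    span = (f_max - f_min) or 1.0                 # assumption: avoid 0/0 when all f are equal
    fitness = []
    for f, v in zip(f_vals, viol):
        f_norm = (f - f_min) / span               # normalized objective f''(X)
        d = v if r_f == 0 else math.hypot(f_norm, v)   # Eq. (9.4)
        m = 0.0 if r_f == 0 else v                     # Eq. (9.6)
        n_x = 0.0 if v == 0 else f_norm                # Eq. (9.7)
        p = (1 - r_f) * m + r_f * n_x                  # Eq. (9.5)
        fitness.append(d + p)
    return fitness
```

When no feasible individual exists (`r_f == 0`), the sketch ranks purely by constraint violation; as the feasibility ratio grows, the objective term gains weight in the penalty.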

Therefore, in Farmani and Wright (2003) and Tessema and Yen (2006), the selection
of individuals in all three phases is based on a value determined by the overall
constraint violation and the objective values. Thus, there is a chance for an individual
with lower overall constraint violation and higher fitness to get selected over a feasible
individual with lower fitness even in Phase 3, where there are enough
feasible solutions to form the parent population using only feasible solutions.

In the ε-constraint handling method, the relaxation of the constraints is controlled by
the ε parameter. As solving a constrained optimization problem becomes
tedious when active constraints are present, proper control of the ε parameter is
essential (Takahama and Sakai 2006) to obtain high quality solutions for problems
with equality constraints. The ε level is updated until the generation counter G reaches
the control generation $T_c$. After the generation counter exceeds $T_c$, the ε level is set
to zero to obtain solutions with no constraint violation.

\[ \varepsilon(0) = \upsilon(X_\theta) \tag{9.8} \]

\[ \varepsilon(G) = \begin{cases} \varepsilon(0)\left(1 - \dfrac{G}{T_c}\right)^{cp}, & 0 < G < T_c \\ 0, & G \ge T_c \end{cases} \tag{9.9} \]

where $X_\theta$ is the top θ-th individual and θ = (0.05 × NP). The recommended
parameter ranges are (Takahama and Sakai 2006): $T_c \in [0.1\,T_{\max}, 0.8\,T_{\max}]$ and $cp \in [2, 10]$.

The selection of individuals in the three phases of evolution by using the
ε-constraint technique is similar to the SF, but in the EC, a solution is regarded
as feasible if its overall constraint violation is lower than ε(G).
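The ε-level schedule of Eqs. (9.8) and (9.9) can be sketched as follows. The function names, the ascending sort by violation used to find the top θ-th individual, and the rounding of θ are assumptions for illustration and are not taken from the source.

```python
def initial_epsilon(violations, frac=0.05):
    """Eq. (9.8): violation of the top theta-th individual.

    Assumption: individuals are ranked by ascending constraint violation
    and theta = round(frac * NP), with a minimum of 1.
    """
    ranked = sorted(violations)
    theta = max(1, round(frac * len(ranked)))
    return ranked[theta - 1]

def epsilon_level(G, eps0, Tc, cp):
    """Eq. (9.9): relaxed level while G < Tc, then exactly zero."""
    if G >= Tc:
        return 0.0
    return eps0 * (1.0 - G / Tc) ** cp

# Example with Tc = 0.2 * Tmax and cp = 5 (the settings reported for the experiments).
Tmax = 1000
levels = [epsilon_level(G, eps0=2.0, Tc=0.2 * Tmax, cp=5) for G in (0, 100, 200)]
```

The level shrinks polynomially and is switched off at the control generation, so after that point only solutions without constraint violation count as feasible.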

Runarsson and Yao (2000) introduced the stochastic ranking (SR) method to achieve
a balance between the objective and the overall constraint violation stochastically. A
probability factor $p_f$ is used to determine whether the objective function value or
the constraint violation value determines the rank of each individual. The basic form of
the SR (Runarsson and Yao 2000) can be presented as:

If (no constraint violation or rand < p f )

Rank based on the objective value only

else

Rank based on the constraint violation only

End

In Runarsson and Yao (2005), an improved version of the SR (ISR) was proposed

using evolution strategies and differential variation. In SR, comparison between two

individuals may be based on objective value alone or overall constraint violation

alone as randomly determined. Thus, infeasible solutions with better objective value

have a chance to be selected in all three phases of evolution. In our work, a modified
version of the SR presented in Runarsson and Yao (2000) is used. Here, the value of
$p_f$ is not held constant; instead, it is decreased linearly from $p_f = 0.475$ in the
initial generation to $p_f = 0.025$ in the final generation.
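A minimal sketch of stochastic ranking with the linearly decreasing $p_f$ described above is given below. It uses the bubble-sort-like sweep commonly associated with SR and compares adjacent pairs by objective when both are feasible or with probability $p_f$; the function names and the `rng` parameter are illustrative assumptions, and minimization is assumed.

```python
import random

def pf_schedule(G, G_max, pf_start=0.475, pf_end=0.025):
    """Linear decrease of p_f from the first to the final generation."""
    return pf_start + (pf_end - pf_start) * G / G_max

def stochastic_rank(f_vals, viol, pf, rng=random):
    """Return indices ordered from best to worst."""
    n = len(f_vals)
    idx = list(range(n))
    for _ in range(n):
        swapped = False
        for j in range(n - 1):
            a, b = idx[j], idx[j + 1]
            both_feasible = viol[a] == 0 and viol[b] == 0
            if both_feasible or rng.random() < pf:
                out_of_order = f_vals[a] > f_vals[b]   # compare objectives only
            else:
                out_of_order = viol[a] > viol[b]       # compare violations only
            if out_of_order:
                idx[j], idx[j + 1] = b, a
                swapped = True
        if not swapped:      # no swap in a full sweep: ranking has settled
            break
    return idx
```

With a larger $p_f$ early in the run, infeasible individuals with good objective values are ranked highly more often; the decreasing schedule gradually shifts the emphasis towards feasibility.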

From the above discussions, we can observe that each of the constraint handling

methods used in ECHT differs in at least one of the three phases. In addition, it should

be noted that the ECHT approach is general and can be formulated with any search

method and constraint handling techniques.


Each constrained optimization problem would be unique in terms of the ratio between

feasible search space and the whole search space, multi-modality and the nature of

constraint functions. As evolutionary algorithms are stochastic in nature, the evolution paths can be different in every run even when the same problem is solved

using the same algorithm. In other words, the search process passes through different phases at different points during the search process. Therefore, depending on

several factors such as the ratio between feasible search space and the whole search

space, multi-modality of the problem, nature of equality/inequality constraints, the

chosen EA and global exploration/local exploitation stages of the search algorithm,

different constraint handling methods can be effective during different stages of the

search process. Due to the strong interactions between these diverse factors and the

stochastic nature of the evolutionary algorithms, it is not straightforward to determine which constraint handling method is the best during a particular stage of the

evolution to solve a given problem using a given EA. Motivated by these observations, we develop the ECHT to implicitly benefit from the match between constraint

handling methods, characteristics of the problem being solved, chosen EA, and the

exploration-exploitation stages of the search process.

A real-world problem can take several minutes to several hours to compute the

objective function value (Jin 2005). Therefore, finding a better constraint handling

method for such problem by trial-and-error may become difficult. The computation

time wasted in searching for a better constraint handling method can be saved by

using the proposed ECHT.

In this section, we present ECHT with the four constraint handling techniques
discussed in the previous section. Each constraint handling technique has its own population and parameters. Each population corresponding to a constraint handling method

produces its offspring and evaluates them. The parent population corresponding to

a particular constraint handling method not only competes with its own offspring

population but also with offspring population of the other three constraint handling

methods. Due to this, an offspring produced by a particular constraint handling

method may be rejected by its own population, but could be accepted by the populations of other constraint handling methods. Hence, in ECHT every function call is

utilized effectively. If the evaluation of objective/constraint functions is computationally expensive, more constraint handling methods can be included in the ensemble

to benefit more from each function call. And if a particular constraint handling technique is best suited for the search method and the problem during a point in the

search process, the offspring population produced by the population of that constraint handling method will dominate the other and enter other populations too. In

the subsequent generations, these superior offspring will become parents in other

populations too. Therefore, ECHT transforms the burden of choosing the best constraint handling technique and tuning the associated parameter values for a particular

problem into an advantage. If the constraint handling methods selected to form an

ensemble are similar in nature then the populations associated with each of them may


lose diversity and the search ability of ECHT may deteriorate. Thus, the performance

of ECHT can be improved by selecting constraint handling methods with diverse and

competitive nature. The general framework of the ensemble algorithm is illustrated

in the flowchart shown in Fig. 9.1.

As ECHT employs different constraint handling methods each having its own

population, it can be compared with hybrid methods like memetic algorithms

(Ishibuchi et al. 2003; Ong and Keane 2004; Ong et al. 2006). Some methods, like
island models (Skolicki and De Jong 2007), sometimes called the migration model or
coarse-grained model, also employ subpopulations in their approach. The main
difference between the ECHT and the island model is that in the island model, subpopulations in different islands evolve separately with occasional communication
between them to maintain diversity, while in ECHT the populations communicate by sharing all offspring, which facilitates efficient usage
of each function call.

9.3.1 ECHT-EPSDE

In this section, an ECHT with EPSDE as the basic search algorithm (ECHT-EPSDE)
is demonstrated. ECHT-EPSDE uses the four constraint handling techniques discussed in Sects. 9.2.1–9.2.4. Each constraint handling technique has its own population and parameters. Each population corresponding to a constraint handling method
produces its offspring using the associated strategies and parameters of the EPSDE.
The offspring produced are evaluated. In ECHT-EPSDE, the parent population corresponding to a particular constraint handling method competes not only with its own
offspring population but also with the offspring populations of the other three constraint
handling methods. In DE, since mutation and crossover are employed to produce
an offspring, DE's one-to-one selection is employed between the parent and offspring populations of the same constraint handling technique. But when the parents of one
constraint handling method compete with the offspring population of another constraint handling method, a parent is randomly selected for competition with every offspring. Hence, in ECHT-EPSDE every function call is utilized by

every population associated with each constraint handling technique in the ensemble. Due to this, an offspring produced by a particular constraint handling method

may be rejected by its own population, but could be accepted by the populations of

other constraint handling methods. Therefore, the ensemble transforms the burden of

choosing a particular constraint handling technique and tuning the associated parameter values for a particular problem into an advantage.

Fig. 9.1 Flowchart of ECHT (CH: constraint handling method, POP: population, PAR: parameters,
OFF: offspring, Max_FEs: maximum number of function evaluations)

The ECHT-EPSDE can be summarized as follows:
STEP 1: Each of the four constraint handling techniques (SF, SP, EC and SR
in Sects. 9.2.1–9.2.4) has its own population of NP individuals, each with dimension D (POP_k, k = 1, ..., 4), and parameter/strategy pools (PS_k, k = 1, ..., 4)
initialized according to the EPSDE rules and the corresponding constraint handling
method (CH_k, k = 1, ..., 4). Set the generation counter G = 0.

STEP 2: Evaluate the objective/constraint function values and the overall constraint
violation for each individual $X_i^k$, $i \in \{1, \ldots, NP\}$, of every population (POP_k, k =
1, ..., 4) using Eqs. (9.2)–(9.3).
STEP 3: The parameter values of the constraint handling methods are updated according
to Sect. 9.2.
STEP 4: Each parent population (POP_k, k = 1, ..., 4) produces an offspring population (OFFS_k, k = 1, ..., 4) by mutation and crossover (Takahama and Sakai
2006).
STEP 5: Compute the objective/constraint function values and the overall constraint
violation of each offspring $X_i^k$, $i \in \{1, \ldots, NP\}$. Each offspring retains the objective and constraint function values separately, i.e., each offspring is evaluated only
once.
STEP 6: Each parent population POP_k, k = 1, ..., 4, is combined with the offspring
produced by it and the offspring produced by all other populations corresponding to
different constraint handling techniques, as in STEP 6 in Fig. 9.1. The four different
groups are:
Group 1: (POP_1, OFFS_k, k = 1, ..., 4), Group 2: (POP_2, OFFS_k, k = 1, ..., 4),
Group 3: (POP_3, OFFS_k, k = 1, ..., 4) and Group 4: (POP_4, OFFS_k, k =
1, ..., 4).
STEP 7: In the selection step, the parent populations POP_k, k = 1, ..., 4, for the next
generation are selected from Groups 1, 2, 3, and 4, respectively. In a group (say
Group 1), since OFFS_1 is produced by POP_1 by mutation and crossover, DE's selection
based on competition between a parent and its offspring is employed when POP_1
competes with OFFS_1. But when POP_1 competes with OFFS_2, OFFS_3 or OFFS_4,
produced by other populations, each member in POP_1 competes with a randomly
selected offspring from OFFS_2, OFFS_3 or OFFS_4.
STEP 8: Stop if the termination criterion is met. Else, G = G + 1 and go to STEP 3.
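The selection in STEPs 6 and 7 can be sketched as follows. This is a simplified illustration and not the authors' implementation: `echt_select`, the `better` comparator list, and the order in which own and foreign offspring are processed are assumptions. Each comparator `better[k](a, b)` is assumed to return True when `a` is preferred over `b` under constraint handling method k.

```python
import random

def echt_select(pops, offs, better, rng=random):
    """pops[k] and offs[k] hold the NP parents and offspring of method k."""
    new_pops = []
    for k, pop in enumerate(pops):
        survivors = list(pop)
        # Own offspring: DE-style one-to-one competition, parent i vs. child i.
        for i, child in enumerate(offs[k]):
            if better[k](child, survivors[i]):
                survivors[i] = child
        # Offspring of the other populations: each member competes with a
        # randomly selected offspring, judged by this population's own rule.
        for j, other in enumerate(offs):
            if j == k:
                continue
            for i in range(len(survivors)):
                candidate = rng.choice(other)
                if better[k](candidate, survivors[i]):
                    survivors[i] = candidate
        new_pops.append(survivors)
    return new_pops

# Individuals as (objective, violation) pairs; two toy comparison rules.
by_objective = lambda a, b: a[0] < b[0]
by_violation = lambda a, b: a[1] < b[1]
result = echt_select(
    pops=[[(5, 0)], [(5, 0)]],
    offs=[[(4, 1)], [(3, 2)]],
    better=[by_objective, by_violation],
)
```

In the toy call, the offspring (3, 2) is rejected by the population that produced it (it has a larger violation than its parent) but is accepted by the other population, which mirrors the behaviour described in the text. No function is re-evaluated, since offspring carry their stored objective and violation values.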

In Mallipeddi and Suganthan (2010b), the performance of ECHT-DE and of DE combined
with each of the four constraint handling methods used in ECHT (SF-DE, SP-DE, SR-DE,
and EC-DE) was evaluated and compared. In addition, the performance of ECHT-DE
was compared with some of the state-of-the-art methods on a set of 24 well-defined
problems of CEC 2006 (Liang et al. 2006).
In this chapter, we evaluate the performance of ECHT-EPSDE using the 10D and
30D versions of the CEC 2010 problems. The performance of the algorithm is compared with the
state-of-the-art algorithms that participated in the CEC 2010 competition. The details
regarding the problems and the evaluation criteria are presented in Mallipeddi and
Suganthan (2010).


method is set to 50. The details regarding the selection of the parameter and strategy pools of the EPSDE algorithm are discussed in Mallipeddi et al. (2011). On each
problem of the problem set, every algorithm is run 25 times independently. The
maximum number of function evaluations used is $2 \times 10^5$ and $6 \times 10^5$ for 10D and
30D, respectively. The parameters corresponding to the constraint handling methods
are set to: $T_c = 0.2\,T_{\max}$, $cp = 5$, and $p_f$ is linearly decreased from an initial value
of 0.475 to 0.025 in the final generation. However, the performance of the ECHT can
be improved by tuning the parameters of individual constraint handling methods.

The tolerance parameter δ for the equality constraints is adapted using the following
expression:

\[ \delta(G + 1) = \frac{\delta(G)}{\beta} \tag{9.10} \]

Table 9.1 Function values achieved for FES = $2 \times 10^5$ for 10D problems

C01

C02

C03

C04

C05

246.8502

246.7401

240.4916

0, 0, 0

0

245.7474

2.2307

C07

Best

1.000E05

Median 1.000E05

Worst 1.000E05

c

0, 0, 0

0

Mean

1.000E05

Std

2.9292E05

C13

Best

0.0036

Median 0.0036

Worst 0.0036

c

0, 0, 0

0

Mean

0.0036

Std

7.7800E09

Best

Median

Worst

c

Mean

Std

580.7301

602.0537

608.4520

0, 0, 0

0

600.5612

7.2523

C08

20.0780

19.9875

18.9875

0, 0, 0

0

19.3492

0.3452

C14

0.7473

0.7473

0.7406

0, 0, 0

0

0.7470

0.0014

0.0034

0.0034

0.0034

0, 0, 0

0

0.0034

8.5413E18

C09

68.4294

68.4294

61.6487

0, 0, 0

0

67.4211

1.8913

C15

1417.2374

1417.2374

1417.2374

0, 0, 0

0

1417.2374

0

420.9687

420.9687

420.9687

0, 0, 0

0

420.9687

4.6711E07

C10

2.2777

2.2777

2.2612

0, 0, 0

0

2.2761

5.2000E03

C16

325.4888

0.1992

0.1992

0, 0, 0

0

75.2591

122.3254

0

0

0

0, 0, 0

0

0

0

C11

2.2800E+02

9.9040E+02

1.5013E+03

0, 0, 0

0

1.0356E+03

1.0344E+03

C17

2960.9139

2960.9139

2960.9139

0, 0, 0

0

2960.9139

0

C06

2.4983E+01

7.7043E+01

9.2743E+04

0, 0, 0

0

9.7245E+03

2.9188E+04

C12

0

0

0

0, 0, 0

0

0

0

C18

0

0

0

0, 0, 0

0

0

0


The initial δ(0) is selected as the median of equality constraint violations over the
entire initial population. The value of β is selected in such a way that it causes δ to
reach a value of 1E-04 at around 600 generations, after which the value of δ is fixed
at 1E-04.
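As a worked illustration of this tolerance schedule, the sketch below assumes that the tolerance is divided by a constant factor in every generation (a geometric decay) and solves for the factor that reaches 1E-04 after 600 generations. The names `decay_factor` and `tolerance`, the symbol names, and the geometric form itself are assumptions for illustration.

```python
def decay_factor(delta0, target=1e-4, generations=600):
    """Constant divisor that takes delta0 down to target in the given generations."""
    return (delta0 / target) ** (1.0 / generations)

def tolerance(G, delta0, beta, floor=1e-4):
    """Tolerance after G generations, held at the floor once it is reached."""
    return max(delta0 / beta ** G, floor)

# Example: an initial tolerance of 1.0 (hypothetical median violation).
beta = decay_factor(1.0)
schedule = [tolerance(G, 1.0, beta) for G in (0, 300, 600, 1000)]
```

For an initial tolerance of 1.0, the factor is slightly above 1, so the tolerance falls by a constant small fraction per generation, reaches about 1E-02 at generation 300, and stays at 1E-04 from generation 600 onwards.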

The experimental results (best, median, mean, worst, and standard deviation
values) are presented in Tables 9.1 and 9.2. c is the number of violated constraints
at the median solution: the sequence of three numbers indicates the number of violations (including inequalities and equalities) by more than 1.0, more than 0.01, and
more than 0.0001, respectively. $\bar{v}$ is the mean value of the violations of all constraints
at the median solution. The ranking of the algorithm in comparison with the state-of-the-art algorithms is shown in Tables 9.3 and 9.4. The overall and average ranking
for each of the algorithms is presented in Table 9.5.
From the results it can be observed that the best three algorithms are εDEg,
ECHT-EPSDE and ECHT-DE, with average ranks of 3.08, 3.58, and 4.67, respectively. In other
words, the performance of ECHT-EPSDE is better than that of the ECHT-DE variant.

Table 9.2 Function values achieved for FES = $6 \times 10^5$ for 30D problems

C01

C02

C03

C04

C05

Best

Median

Worst

c

Mean

Std

500

500

501

0, 1, 1

1.3250E02

485.3521

76.4931

C07

Best

6.2793E04

Median 7.2345E04

Worst 8.3291E04

c

0, 0, 0

0

Mean

7.8321E04

Std

9.5612E05

C13

Best

0.0039

Median 0.0039

Worst 0.0039

c

0, 0, 0

0

Mean

0.0039

Std

1.1166E05

1962.5740

2040.3251

2051.3521

0, 0, 0

0

2021.2371

24.5128

C08

20.2688

19.8770

11.1774

0, 0, 0

0

18.5035

2.7152

C14

0.8217

0.8012

0.7557

0, 0, 0

0

0.7994

0.0179

0.0005

0.0001

0.0022

0, 0, 1

3.1000E03

0.0007

0.025

C09

67.4137

64.4208

62.6694

0, 0, 0

0

64.3612

1.2845

C15

2344.6224

2933.9001

3310.3263

0, 0, 0

0

2887.4795

556.8420

420.9832

439.1865

500

1, 1, 1

2.9637E+03

450.6785

28.4321

C10

1.2574

2.3390

4.1011

0, 0, 0

0

2.4532

0.9931

C16

0.1993

0.1993

11096.2789

0, 0, 0

0

79.5125

255.1325

28.6735

29.6333

87.3162

0, 0, 0

0

37.2923

15.1524

C11

4.3051E+03

4.3051E+03

4.3053E+03

0, 0, 0

0

4.3051E+03

6.7521E07

C17

3.1120

9320.5713

21577.5875

0, 0, 1

7.6318E04

12705.5579

6455.6924

C06

2.4983E+01

2.49832E+01

2.49832E+01

0, 0, 0

0

2.49832E+01

3.5147E06

C12

6514.7354

12470.9657

10670.6636

0, 0, 1

1.7311E04

12229.2897

2178.3588

C18

4.2090E09

2.400E07

4.1800E05

0, 0, 0

0

2.1100E06

8.3000E06


Table 9.3 Ranking for 10D problems

Algorithm/Problem  C01  C02  C03  C04  C05  C06  C07  C08  C09
jDEsoco              7   13    9    1   10    4    1    2    4
DE-VPS              11    7   11    9    6   10   10   11    6
RGA                  9    9   13    8   11   11   11    5    7
E-ABC               10    8   12   11    8    8   12   12    9
εDEg                 1    6    1    5    1    1    1    9    1
DCDE                12    5    1   10    1    1    8   10    5
Co-CLPSO             8    4    6    7    1    1    9    1    8
CDEb6e6r             5   10    7    1   12   12    1    8   12
sp-MODE              1   12   10   13   13   13    1    7   13
MTS                 13   11    8   12    7    7   13   13   10
IEMA                 6    1    5    6    9    9    5    6   11
ECHT-DE              1    3    1    1    5    6    6    3    3
ECHT-EPSDE           4    1    1    1    1    5    7    3    2

Algorithm/Problem  C10  C11  C12  C13  C14  C15  C16  C17  C18
jDEsoco              5    3    5    4    4    8    9   10   10
DE-VPS               6    8   10    6    5    5    1    7    1
RGA                  7    9   11    7    7    7   10    8    8
E-ABC               10   12    5    8   12   11    7    9    9
εDEg                 2    1    1    2    1    2    8    5    1
DCDE                 4    6    9   12    2    1    6    2    6
Co-CLPSO             8   11    5    9    3    4    2    6    7
CDEb6e6r            12    7    2    2   11   13   12   12   12
sp-MODE             13   13   13   13   10    9   13   13   13
MTS                  9   10   12   11   13   12   11   11   11
IEMA                11    2    5    5    6    6    3    1    1
ECHT-DE              3    5    5   10    8   10    5    4    1
ECHT-EPSDE           1    4    2    1    9    3    4    3    1

Table 9.4 Ranking for 30D problems

Algorithm/Problem  C01  C02  C03  C04  C05  C06  C07  C08  C09
jDEsoco              5    9    3    4    9    3    1    7    2
DE-VPS              12    7    8    8    5    8    7   10    9
RGA                  7    8   12    7    6    9   12   13    8
E-ABC                8   10   11   10    7    7   13    9   10
εDEg                 2    3    2    5    1    1    4    2    3
DCDE                11    2    1    9   10   10    6    3   13
Co-CLPSO            10    1   10    6    2    4    9    8    7
CDEb6e6r             1   11    6    3   12   11    1    1    1
sp-MODE              3   12    9   13   13   13   11   11   12
MTS                 13   13    7   11    8    6    8   12   11
IEMA                 4    6   13   12   11   12    5    4    6
ECHT-DE              6    5    4    1    3    5   10    5    5
ECHT-EPSDE           9    4    5    1    4    2    3    6    4

Algorithm/Problem  C10  C11  C12  C13  C14  C15  C16  C17  C18
jDEsoco              2    3    1    1    4    6    8   10   10
DE-VPS               7    8   10   10    5    4    7    6    5
RGA                  8    7    7    9   10    7   10    9    7
E-ABC               11   10    8    6    8   10    9    8    9
εDEg                 3    2   11    4    1    2    1    7    8
DCDE                 1    6    2    8    3    1    6    5    4
Co-CLPSO             9   11    3   11    5    3    1    4    6
CDEb6e6r            13    1    9    3   11   12   11   13   12
sp-MODE             12   13   13   13   13   11   13   11   11
MTS                 10    9    4   12   12   13   12   12   13
IEMA                 6   12   12    2    2    5    5    1    1
ECHT-DE              5    5    5    7    9    9    1    2    1
ECHT-EPSDE           4    3    6    5    7    8    1    3    1


Table 9.5 Overall ranking of the algorithms

Algorithm                                      Ranking
                                               10D    30D    Overall  Average
jDEsoco (Brest et al. 2010)                    109     88    197       5.47
DE-VPS (Tasgetiren et al. 2010)                130    136    266       7.39
RGA (Saha et al. 2010)                         158    156    314       8.72
E-ABC (Mezura-Montes and Velez-Koeppel 2010)   173    164    337       9.36
εDEg (Takahama and Sakai 2010)                  49     62    111       3.08
DCDE (Zhihui et al. 2010)                      101    101    202       5.61
Co-CLPSO (Liang et al. 2010)                   100    110    210       5.83
CDEb6e6r (Tvrdik and Polakova 2010)            151    132    283       7.86
sp-MODE (Reynoso-Meza et al. 2010)             193    207    400      11.11
MTS (Lin-Yu and Chun 2010)                     194    186    380      10.56
IEMA (Singh et al. 2010)                        98    119    217       6.03
ECHT-DE (Mallipeddi and Suganthan 2010a)        80     88    168       4.67
ECHT-EPSDE                                      53     76    129       3.58
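The arithmetic behind the overall and average columns of Table 9.5 can be checked with a short sketch: the overall rank is the sum of the 10D and 30D rank totals, and the average divides it by the 36 problem instances (18 problems in two dimensions). The dictionary layout and the function name `summarize` are illustrative assumptions; the totals are taken from Table 9.5 for the three best-ranked algorithms.

```python
# (10D rank total, 30D rank total) as listed in Table 9.5
rank_totals = {
    "eDEg": (49, 62),          # ASCII label used here for the ε-constrained DE entry
    "ECHT-EPSDE": (53, 76),
    "ECHT-DE": (80, 88),
}

def summarize(totals, n_instances=36):
    """Return {algorithm: (overall rank, average rank rounded to 2 decimals)}."""
    return {
        name: (r10 + r30, round((r10 + r30) / n_instances, 2))
        for name, (r10, r30) in totals.items()
    }

summary = summarize(rank_totals)
best_to_worst = sorted(summary, key=lambda name: summary[name][0])
```

Sorting by the overall rank reproduces the order of the three best algorithms stated in the text.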

9.4 Conclusions

In this chapter, a novel constraint handling procedure called ECHT was presented
with four different constraint handling methods, where each constraint handling
method has its own population. In ECHT, every function call is effectively used
by all four populations, and the offspring population produced by the best-suited constraint handling technique dominates the others at a particular stage of the optimization process. Furthermore, an offspring produced by a particular constraint handling
method may be rejected by its own population but accepted by the populations associated with other constraint handling methods. The No Free Lunch (NFL)
theorem implies that, irrespective of the exhaustiveness of parameter tuning, no single constraint handling method can be the best for every constrained optimization
problem. Hence, according to the NFL theorem, the ECHT has the potential to perform better
over diverse problems than any single constraint handling method. In this chapter, we
evaluated the performance of ECHT using the EPSDE algorithm. Experimental results
showed that ECHT-EPSDE is competitive with the state-of-the-art methods on the CEC
2010 problems, ranking second overall with an average rank of 3.58.


References

Anile AM, Cutello V, Nicosia G, Rascuna R, Spinella S (2005) Comparison among evolutionary

algorithms and classical optimization methods for circuit design problems. Paper presented at

the IEEE conference on evolutionary computation, Vancouver, Canada

Brest J, Boskovic B, Zumer V (2010) An improved self-adaptive differential evolution algorithm in

single objective constrained real-parameter optimization. Paper presented at the IEEE congress

on evolutionary computation

Coello Coello CA (2002) Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput Methods Appl Mech Eng 191(11–12):1245–1287
Deb K (2000) An efficient constraint handling method for genetic algorithms. Comput Methods
Appl Mech Eng 186(2–4):311–338
Farmani R, Wright JA (2003) Self-adaptive fitness formulation for constrained optimization. IEEE
Trans Evol Comput 7(5):445–455

Hamida SB, Schoenauer M (2002) ASCHEA: New results using adaptive segregational constraint

handling. Paper presented at the proceedings of congress evolutionary computation

Huang VL, Qin AK, Suganthan PN (2006) Self-adaptive differential evolution algorithm for constrained real-parameter optimization. Paper presented at the IEEE congress on evolutionary computation, Vancouver, Canada

Ishibuchi H, Yoshida T, Murata T (2003) Balance between genetic search and local search in
memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Trans Evol Comput 7(2):204–223
Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft
Comput 9(1):3–12
Koziel S, Michalewicz Z (1999) Evolutionary algorithms, homomorphous mappings, and constrained parameter optimization. Evol Comput 7(1):19–44

Liang JJ, Runarsson TP, Mezura-Montes E, Clerc M, Suganthan PN, Coello Coello CA, Deb K

(2006) Problem definitions and evaluation criteria for the CEC 2006 special session on constrained

real-parameter optimization: Technical Report, Nanyang Technological University, Singapore

Available from http://www3.ntu.edu.sg/home/EPNSugan/

Liang JJ, Shang Z, Li Z (2010) Coevolutionary comprehensive learning particle swarm optimizer.

Paper presented at the IEEE congress on evolutionary computation

Mallipeddi R, Suganthan PN (2010) Problem definitions and evaluation criteria for the CEC 2010

competition on constrained real-parameter optimization, Nanyang Technological University, Singapore

Lin-Yu T, Chun C (2010) Multiple trajectory search for single objective constrained real-parameter

optimization problems. Paper presented at the IEEE congress on evolutionary computation

Mallipeddi R, Suganthan PN (2010a) Differential evolution with ensemble of constraint handling

techniques for solving CEC 2010 benchmark problems. Paper presented at the IEEE congress on

evolutionary computation

Mallipeddi R, Suganthan PN (2010b) Ensemble of constraint handling techniques. IEEE Trans Evol
Comput 14(4):561–579
Mallipeddi R, Suganthan PN, Pan QK, Tasgetiren MF (2011) Differential evolution algorithm with
ensemble of parameters and mutation strategies. Appl Soft Comput 11(2):1679–1696. doi: http://dx.doi.org/10.1016/j.asoc.2010.04.024
Mezura-Montes E, Coello Coello CA (2003) Adding diversity mechanism to a simple evolution strategy to solve constrained optimization problems. Paper presented at the proceedings of congress
on evolutionary computation
Mezura-Montes E, Coello Coello CA (2005) A simple multimembered evolution strategy to solve
constrained optimization problems. IEEE Trans Evol Comput 9(1):1–17
Mezura-Montes E, Velez-Koeppel RE (2010) Elitist artificial bee colony for constrained real-parameter optimization. Paper presented at the IEEE congress on evolutionary computation


Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4(1):1–32
Ong YS, Keane AJ (2004) Meta-Lamarckian learning in memetic algorithms. IEEE Trans Evol
Comput 8(2):99–110
Ong YS, Lim M-H, Zhu N, Wong K-W (2006) Classification of adaptive memetic algorithms: a
comparative study. IEEE Trans Syst, Man, Cybern 36(1):141–152

Powell D, Skolnick M (1993) Using genetic algorithms in engineering design optimization with

non-linear constraints. Paper presented at the proceedings of fifth international conference on

genetic algorithms, San Mateo,California

Qin AK, Huang VL, Suganthan PN (2009) Differential evolution algorithm with strategy adaptation
for global numerical optimization. IEEE Trans Evol Comput 13(2):398–417

Reynoso-Meza G, Blasco X, Sanchis J, Martinez M (2010) Multiobjective optimization algorithm

for solving constrained single objective problems. Paper presented at the IEEE congress on

evolutionary computation

Rokach L (2009) Taxonomy for characterizing ensemble methods in classification tasks: a review
and annotated bibliography. Comput Stat Data Anal 53:4046–4072
Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE
Trans Evol Comput 4(3):284–294
Runarsson TP, Yao X (2005) Search biases in constrained evolutionary optimization. IEEE Trans
Syst, Man, Cybern 35(2):233–243

Saha A, Datta R, Deb K (2010) Hybrid gradient projection based genetic algorithms for constrained

optimization. Paper presented at the IEEE congress on evolutionary computation

Singh HK, Ray T, Smith W (2010) Performance of infeasibility empowered memetic algorithm

for CEC 2010 constrained optimization problems. Paper presented at the IEEE congress on

evolutionary computation

Skolicki Z, De Jong K (2007) The importance of a two-level perspective for Island model design.

Paper presented at the IEEE congress on evolutionary computation

Takahama T, Sakai S (2006) Constrained Optimization by the constrained differential evolution with

gradient-based mutation and feasible elites. Paper presented at the IEEE congress on evolutionary

computation, Sheraton Vancouver wall centre hotel, Vancouver, BC, Canada

Takahama T, Sakai S (2010) Constrained optimization by the ε-constrained differential evolution
with an archive and gradient-based mutation. Paper presented at the IEEE congress on evolutionary computation

Tasgetiren MF, Suganthan PN, Quan-ke P, Mallipeddi R, Sarman S (2010) An ensemble of differential evolution algorithms for constrained function optimization. Paper presented at the IEEE

congress on evolutionary computation

Tessema B, Yen GG (2006) A Self adaptive penalty function based algorithm for constrained

optimization. Paper presented at the IEEE congress on evolutionary computation

Tvrdik J, Polakova R (2010) Competitive differential evolution for constrained problems. Paper
presented at the IEEE congress on evolutionary computation
Wang Y, Cai Z, Guo G, Zhou Y (2007) Multiobjective optimization and hybrid evolutionary algorithm to solve constrained optimization problems. IEEE Trans Syst, Man, Cybern 37(3):560–575
Wang Y, Cai Z, Zhou Y, Zeng W (2008) An adaptive tradeoff model for constrained evolutionary
optimization. IEEE Trans Evol Comput 12(1):80–92
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol
Comput 1(1):67–82
Zhang GP (2000) Neural networks for classification: a survey. IEEE Trans Syst, Man, Cybern Part
C: Appl Rev 30(4):451–462

Zhihui L, Liang JJ, Xi H, Zhigang S (2010) Differential evolution with dynamic constraint-handling

mechanism. Paper presented at the IEEE congress on evolutionary computation

Chapter 10

A Hybrid Approach

Rituparna Datta and Kalyanmoy Deb

efficient, scale invariant, and generic constraint-handling procedure in single- and

multi-objective constrained optimization problems. Constrained optimization is a

computationally difficult task, particularly if the constraint functions are nonlinear

and nonconvex. As a generic classical approach, the penalty function approach is a

popular methodology that degrades the objective function value by adding a penalty

proportional to the constraint violation. However, the penalty function approach

has been criticized for its sensitivity to the associated penalty parameters. Since their
inception, evolutionary algorithms (EAs) have been modified in various ways to solve

constrained optimization problems. Of them, the recent use of a bi-objective evolutionary algorithm in which the minimization of the constraint violation is included

as an additional objective, has received significant attention. In this chapter, we propose a combination of a bi-objective evolutionary approach with the penalty function

methodology in a manner complementary to each other. The bi-objective approach

provides an appropriate estimate of the penalty parameter, while the solution of the

unconstrained penalized function by a classical method induces a convergence property to the overall hybrid algorithm. We demonstrate the working of the procedure

on a number of standard numerical test problems. In most cases, our proposed hybrid
methodology is observed to take one or more orders of magnitude fewer
function evaluations to find the constrained minimum solution accurately than some
of the best-reported existing methodologies.

Keywords Constrained optimization · Penalty function · Inequality and equality
constraints · Bi-objective evolutionary algorithms · Hybrid methodology

R. Datta (B)

Department of Electrical Engineering, Korea Advanced Institute of Science and Technology,

291 Daehak-ro, Yuseong-gu, Daejeon 305-701,

Republic of Korea

e-mail: rdatta@rit.kaist.ac.kr

K. Deb

Department of Electrical and Computer Engineering, Department of Computer Science

and Engineering and Department of Mechanical Engineering, Michigan State University,

428 S. Shaw Lane, 2120 EB, East Lansing, MI 48824, USA

e-mail: kdeb@egr.msu.edu

Springer India 2015

R. Datta and K. Deb (eds.), Evolutionary Constrained Optimization,

Infosys Science Foundation Series, DOI 10.1007/978-81-322-2184-5_10


10.1 Introduction

Most real-world optimization problems involve constraints mainly due to physical

limitations or functional requirements. A constraint can be of equality type or of

inequality type, but all constraints must be satisfied for a solution to be called feasible. Most often in practice, constraints are of inequality type and in some cases

an equality constraint can be suitably approximated as an inequality constraint. In

some situations, a transformation of an equality constraint to a suitable inequality

constraint does not change the optimal solution. Thus, in most constrained optimization studies, researchers are interested in devising efficient algorithms for handling

inequality constraints.

Traditionally, constrained problems with inequality constraints are solved by using

a penalty function approach in which a penalty term proportional to the extent of

constraint violation is added to the objective function to define a penalized function.

Since constraint violation is included, the penalized function can be treated as an

unconstrained objective, which then can be optimized using an unconstrained optimization technique. A nice aspect of this approach is that it does not care about

the structure of the constraint functions (linear or nonlinear, convex or nonconvex).

However, as it turns out, the proportionality term (which is more commonly known

as the penalty parameter) plays a significant role in the working of the penalty

function approach. In essence, the penalty parameter acts as a balancer between

objective value and the overall constraint violation value. If too small a penalty

parameter is chosen, constraints are not emphasized enough, thereby causing the

algorithm to lead to an infeasible solution. On the other hand, if too large a penalty

parameter is chosen, the objective function is not emphasized enough and the problem behaves like a constraint satisfaction problem, thereby leading the algorithm to

find an arbitrary feasible solution. Classical optimization researchers tried to make

a good balance between the two tasks (constraint satisfaction and convergence to

the optimum) by trial-and-error means in the beginning and finally resorting to a

sequential penalty function approach. In the sequential approach, a small penalty

parameter is chosen at first and the corresponding penalized function is optimized.

Since the solution is likely to be an infeasible solution, a larger penalty parameter is

chosen next and the corresponding penalized function is optimized starting from the

obtained solution of the previous iteration. This process is continued till no further

improvement in the solution is obtained. Although this method, in principle, seems
to eliminate the difficulties of choosing an appropriate penalty parameter by trial-and-error or other ad hoc schemes, the sequential penalty function method is found
not to work well on problems having (i) a large number of constraints, (ii) a number
of local or global optima, and (iii) different scaling of constraint functions. Moreover,

the performance of the algorithm depends on the choice of initial penalty parameter

value and how the penalty parameter values are increased from one iteration to the

next.


approach in different ways. In the initial studies (Deb 1991), a fixed penalty parameter was chosen and a fitness function was derived from the corresponding penalized

function. As expected, these methods required trial-and-error simulation runs to

arrive at a suitable penalty parameter value to find a reasonable solution to a problem. Later studies (Michalewicz and Janikow 1991; Homaifar et al. 1994; Dadios

and Ashraf 2006) have used dynamically changing penalty parameter values (with

generations) and also self-adaptive penalty parameters (Coello and Carlos 2000;

Tessema and Yen 2006) based on current objective and constraint values. Although

the chronology of improvements of the penalty function approach with an EA seemed
to have improved EAs' performance from early trial-and-error approaches, radically different methodologies came into existence to suit EAs' population-based, flexible structure. These methodologies made remarkable improvements, and
the traditional penalty function approach has somehow remained in oblivion in the recent
past.

Among these recent EA methodologies, the penalty-parameter-less approach (Deb

2000) and its extensions (Angantyr et al. 2003) eliminated the need for any penalty

parameter due to the availability of a population of solutions at any iteration. By

comparing constraint and objective values within population members, these methodologies have redesigned selection operators that carefully emphasized feasible over

infeasible solutions and better feasible and infeasible solutions.

Another approach gaining a lot of popularity is a bi-objective EA approach (Deb

et al. 2007; Ray et al. 2009), in which in addition to the given objective function,

an estimate of the overall constraint violation is included as a second objective.

The development of multi-objective evolutionary algorithms (EMO) (Deb 2001;

Coello et al. 2007) allowed solution of such bi-objective optimization problems

effectively. Although at first this may seem to make the constraint-handling task more complex, the use of two apparently conflicting objectives

(of minimizing given objective and minimizing constraint violation) brings in more

flexibility in the search space, which has the potential to overcome multimodality

and under or over-specification of the penalty parameter.

From the above-mentioned ideas, we combine the original penalty function
approach with a specific bi-objective EA, the elitist nondominated sorting genetic
algorithm (NSGA-II) (Deb et al. 2002), to form a hybrid evolutionary-cum-classical
constraint-handling procedure in which the two complement each other. The
difficulties of choosing a suitable penalty parameter are overcome by finding the
Pareto-optimal front of the bi-objective problem and deriving an appropriate penalty
parameter from it theoretically. On the other hand, the difficulty an EA has in converging to the true optimum is overcome by solving the derived penalized function
problem using a classical optimization algorithm.


Function Approach

A constrained optimization problem is formulated as follows:

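\[
\begin{aligned}
\text{Minimize}\quad & f(x),\\
\text{subject to}\quad & g_j(x) \ge 0, \quad j = 1, \ldots, J,\\
& h_k(x) = 0, \quad k = 1, \ldots, K,\\
& x_i^{l} \le x_i \le x_i^{u}, \quad i = 1, \ldots, n.
\end{aligned}
\tag{10.1}
\]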

In the above nonlinear programming (NLP) problem, there are $n$ variables, $J$ greater-than-or-equal-to type constraints, and $K$ equality constraints. The function $f(x)$ is the
objective function, $g_j(x)$ is the jth inequality constraint, and $h_k(x)$ is the kth equality
constraint. The ith variable varies in the range $[x_i^{l}, x_i^{u}]$. The conventional way to deal
with an equality constraint is to convert it into an appropriate inequality constraint:
$g_{J+k}(x) \equiv \epsilon_k - |h_k(x)| \ge 0$, with a small given value of $\epsilon_k$.

The penalty function approach is a popular approach used with classical and early

evolutionary approaches. In this approach, an amount proportional to the constraint

violation of a solution is added to the objective function value to form the penalized

function value, as follows:

\[
P(x, R) = f(x) + \sum_{j=1}^{J} R_j \langle g_j(x) \rangle + \sum_{k=1}^{K} R_k |h_k(x)|.
\tag{10.2}
\]

The term $\langle g_j(x) \rangle$ is zero if $g_j(x) \ge 0$ and is $-g_j(x)$ otherwise. The parameter $R_j$
is the penalty parameter associated with the jth inequality constraint and $R_k$ is the penalty
parameter associated with the kth equality constraint. The penalty function approach has
the following features:

1. The optimum value of the penalized function P() largely depends on the penalty
parameters $R_j$ and $R_k$. Users generally try different values of $R_j$ and $R_k$
to find what value would push the search toward the feasible region. This requires
extensive experimentation to find a reasonable approximation for the solution of
the problem given in Eq. (10.1).

2. The addition of the penalty term distorts the penalized function
relative to the given objective function. For small values of the penalty parameter,
the distortion is small, but the optimal solution of P() may happen to lie in
the infeasible region. By contrast, if large values of $R_j$ and $R_k$ are used, any infeasible
solution has a large penalty, thereby causing any feasible solution to be projected
as an exceedingly better solution than any infeasible solution. The difference
between two feasible solutions gets overshadowed by the difference between a
feasible and an infeasible solution. This often leads the algorithm to converge to
an arbitrary feasible solution. Moreover, the distortion may be so severe that in
the presence of two or more constraints, P() may have artificial locally optimal
solutions.
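The two features listed above can be checked numerically with a minimal Python sketch of Eq. (10.2). The two-variable test problem, the helper names `bracket` and `penalized`, and the penalty values 0.1 and 100 are illustrative assumptions and are not taken from the chapter.

```python
def bracket(value):
    """Bracket operator of Eq. (10.2): zero if value >= 0, else -value."""
    return -value if value < 0 else 0.0


def penalized(x, f, ineqs, eqs, r_ineq, r_eq):
    """Penalized function P(x, R) for constraints g_j(x) >= 0 and h_k(x) = 0."""
    penalty = sum(r * bracket(g(x)) for r, g in zip(r_ineq, ineqs))
    penalty += sum(r * abs(h(x)) for r, h in zip(r_eq, eqs))
    return f(x) + penalty


# Hypothetical problem, assumed for illustration only.
f = lambda x: x[0] ** 2 + x[1] ** 2
ineqs = [lambda x: x[0] + x[1] - 1.0]   # g_1(x) = x1 + x2 - 1 >= 0
eqs = [lambda x: x[0] - x[1]]           # h_1(x) = x1 - x2 = 0

feasible = (0.5, 0.5)     # satisfies both constraints, f = 0.5
infeasible = (0.0, 0.0)   # violates g_1 by one unit, f = 0.0

for r in (0.1, 100.0):
    p_feas = penalized(feasible, f, ineqs, eqs, [r], [r])
    p_infeas = penalized(infeasible, f, ineqs, eqs, [r], [r])
    print(f"R = {r}: P(feasible) = {p_feas}, P(infeasible) = {p_infeas}")
```

With the small penalty the infeasible point has the lower penalized value, so a minimizer of P() would prefer it. With the large penalty the infeasible point is rejected, but the penalty term then dwarfs any difference in objective value between feasible points.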

To overcome these difficulties, the classical penalty function approach works in a
sequence of solving a number of penalized functions, where in every sequence the
penalty parameters are increased in steps and the current sequence of optimization
begins from the optimized solution found in the previous sequence. However, the
sequential penalty function approach has shown its weakness in (i) handling multimodal objective functions having a number of local optima, (ii) handling a large
number of constraints, particularly due to the increased chance of having artificial
local optima at which the procedure can get stuck, and (iii) using
numerical gradient-based approaches, due to the inherent numerical error caused by
taking one feasible and one infeasible solution in the numerical gradient computation.
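The sequential procedure described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-variable toy problem, the step-halving local search `local_minimize`, the starting penalty `r0`, and the growth factor are all hypothetical choices, not the settings of any classical implementation.

```python
def bracket(value):
    """Violation of a constraint g(x) >= 0 (zero when it is satisfied)."""
    return -value if value < 0 else 0.0


def local_minimize(func, x0, step=0.5, tol=1e-8):
    """Derivative-free 1-D local search: move while improving, else halve the step."""
    x, fx = x0, func(x0)
    while step > tol:
        for candidate in (x - step, x + step):
            fc = func(candidate)
            if fc < fx:
                x, fx = candidate, fc
                break
        else:
            step *= 0.5
    return x


def sequential_penalty(f, g, x0, r0=0.5, growth=2.0, tol=1e-6, max_iter=50):
    """Minimize f(x) + R*<g(x)> for increasing R, warm-starting each run."""
    x, r = x0, r0
    history = []
    for _ in range(max_iter):
        x_new = local_minimize(lambda z: f(z) + r * bracket(g(z)), x)
        history.append((r, x_new))
        if abs(x_new - x) < tol:      # no further change in the solution
            break
        x, r = x_new, r * growth
    return x_new, history


# Hypothetical toy problem: minimize x^2 subject to x - 1 >= 0 (optimum at x = 1).
x_star, history = sequential_penalty(lambda x: x * x, lambda x: x - 1.0, x0=0.0)
for r, x in history:
    print(f"R = {r:g}: minimizer of P(x, R) is x = {x:.4f}")
```

On this convex toy problem the iterates move from the infeasible side toward the constraint boundary as R grows. The sketch uses a local search on purpose: on a multimodal objective the same loop can stall near an infeasible local minimum, which is weakness (i) above.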

Let us consider a single-variable constrained problem to illustrate some of these

difficulties:

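\[
\begin{aligned}
\text{Minimize}\quad & f(x),\\
\text{subject to}\quad & g_1(x) \equiv 1 - x \ge 0,\\
& g_2(x) \equiv x \ge 0.
\end{aligned}
\tag{10.3}
\]

Figure 10.1 shows the objective function $f(x)$ for $x \in [0, 6.5]$, in which all solutions
satisfying $x > 1$ are infeasible.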

The constrained minimum is the point H with x = 1. Due to multimodalities

associated with the objective function, the first iteration of the sequential penalty

function method (with R = 0) may find the global minimum (A) of the associated

penalized function P(x, 0). In the next sequence, if R is increased to one and the

resulting P(x, 1) is minimized starting from A, a solution close to A will be achieved,

Fig. 10.1 Penalized function P(x, R) for different values of R for the problem given in Eq. (10.3)

R is increased continuously, the resulting minimum solution will not change much.

Due to an insignificant change in the resulting optimal solution, the algorithm may

eventually terminate after a few iterations and a solution close to A will be declared

as the final optimized solution. Clearly, such a solution is infeasible and is far from

the true constrained minimum solution (H). The difficulty with the single-objective

optimization task is that even if a solution close to H is encountered, it will not be

judged to be better than solution A in such a problem. We shall get back to this

problem later and illustrate how a bi-objective formulation of the same problem can

allow solutions such as F or G to be present in the population and help find the

constrained minimum in such problems.

There is another point we would like to make from this example. When a global optimizer
is used to solve the penalized function with R = 20 or more,

there is some probability that the algorithm can get out of local optimum (A ) and

converge to the global minimum (H) of the corresponding penalized function, thereby

correctly solving the problem. This motivates the use of a global optimizer,
such as an EA, rather than a classical gradient-based local search approach, with the penalty
function approach.

Due to the importance of solving constrained problems in practice, evolutionary algorithm researchers have been regularly devising newer constraint-handling
techniques. A standard EA is modified with a number of different principles for
this purpose. Comprehensive surveys of evolutionary constraint-handling methods can be found in Michalewicz and Schoenauer (1996), Coello and
Carlos (2002), and Mezura-Montes and Coello (2011).

Michalewicz and Janikow (1991) classified different constraint-handling
schemes within EAs into six different classes. Among them, a majority of the EA
approaches used two methodologies: (i) penalizing infeasible solutions and (ii) carefully delineating feasible and infeasible solutions. We mention the studies related to
each of these two methods in the following subsections.

The initial constrained EA studies used static, dynamic, and self-adaptive penalty

function methods, which handled constraints by adding a penalty term proportional to

the constraint violation to the objective function (Dadios and Ashraf 2006; Homaifar

et al. 1994; Michalewicz and Janikow 1991). Richardson et al. (1989) proposed a set

of guidelines for genetic algorithms using the penalty function approach. Gen and Cheng
(1996) presented a tutorial survey of studies till 1996 on penalty techniques used in
genetic algorithms. Coit et al. (1996) proposed a general adaptive penalty technique
which uses feedback obtained during the search along with a dynamic distance

metric. Another study proposed adaptation of penalty parameter using co-evolution

(Coello and Carlos 2000). A stochastic approach is proposed by Runarsson and Yao

(2000) to balance the objective and penalty functions. Nanakorn and Meesomklin

(2001) proposed an adaptive penalty function that gets adjusted by itself during

the evolution in such a way that the desired degree of penalty is always obtained.

Kuri-Morales and Gutiérrez-García (2002) proposed a statistical analysis based on

the penalty function method using genetic algorithms with five different penalty

function strategies. For each of these, they have considered three particular GAs. The

behavior of each strategy and the associated GAs is then established by extensively

sampling the function suite and finding the worst-case best values.

Zhou et al. (2003) did not suggest any new penalty term, but performed a time

complexity analysis of EAs for solving constrained optimization using the penalty

function approach. It is shown that when the penalty coefficient is chosen properly,

direct comparison between pairs of solutions using the penalty fitness function is equivalent to that using the criterion "superiority of feasible point" or "superiority of objective
function value." They also analyzed the role of penalty coefficients in EAs in terms of
time complexity. In some cases, EAs benefit greatly from higher penalty parameter
values, while in other examples, EAs benefit from lower penalty parameter values.
However, the analysis procedure still cannot make any theoretical prediction on the
choice of a suitable penalty parameter for an arbitrary problem.

Wang and Ma (2006) proposed an EA-based constraint-handling scheme with a continuous penalty function that uses only one control parameter.
Lin and Chuang (2007) proposed an adjustment of the penalty parameter with

generations by using the rough set theory. Matthew et al. (2009) suggested an adaptive

GA that incorporates population-level statistics to dynamically update penalty functions, a process analogous to strategic oscillation used in the tabu search literature.

The method of delineating feasible from infeasible solutions was proposed by

Powell and Skolnick (1993). The method was modified in devising a penalty-parameter-less approach (Deb 2000). From the objective function and constraint

function values, a fitness function is derived so that (i) every feasible solution is

better than any infeasible solution, (ii) between two feasible solutions, the one with

better objective function value is better, and (iii) between two infeasible solutions,

the one with a smaller overall constraint violation is better. Angantyr et al. (2003) is

another effort in this direction:

1. If no feasible individual exists in the current population, the search should be

directed toward the feasible region.

2. If the majority of the individuals in the current populations are feasible, the search

should be directed toward the unconstrained optimum.

3. A feasible individual closer to the optimum is always better than the feasible

individual away from the optimum.

4. An infeasible individual might be a better individual than the feasible individual

if the number of feasible individuals is high.
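The three comparison rules (i) to (iii) of the penalty-parameter-less approach (Deb 2000), stated before the list above, amount to a pairwise comparison that needs no penalty parameter. A minimal Python sketch follows; representing a solution as an `(objective, violation)` tuple and the names `is_better` and `tournament` are assumptions made for illustration.

```python
import random


def is_better(a, b):
    """Return True if solution a beats b; each is (objective, violation), minimized."""
    f_a, cv_a = a
    f_b, cv_b = b
    a_feasible, b_feasible = cv_a == 0, cv_b == 0
    if a_feasible and b_feasible:      # rule (ii): compare objective values
        return f_a < f_b
    if a_feasible != b_feasible:       # rule (i): feasible beats infeasible
        return a_feasible
    return cv_a < cv_b                 # rule (iii): smaller violation wins


def tournament(population, rng=random):
    """Binary tournament: pick two members at random and keep the better one."""
    a, b = rng.sample(population, 2)
    return a if is_better(a, b) else b
```

Because objective values are compared only between feasible solutions and violations only between infeasible ones, the two quantities never have to be put on a common scale, which is why no penalty parameter appears.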


system (AIS), where the role of AIS was to help in pushing the population towards

feasible region (Bernardino et al. 2007). A recent study combined a genetic algorithm
with the complex search algorithm (Sha and Xu 2011) to improve convergence
and applied it to constrained trajectory optimization. The optimal solution of the genetic algorithm was used as the initial parameter for the complex search method. Another recent
methodology proposed a hybrid genetic algorithm with a flexible allowance technique (GAFAT) for solving constrained engineering design optimization problems by
fusing center-based differential crossover (CBDX), Levenberg-Marquardt mutation
(LMM), and nonuniform mutation (NUM) (Zhao et al. 2011).

A recent methodology described a framework based on both genetic algorithm

and differential evolution, which consists of collective search operators in every

generation and adaptively mixes them to solve constrained optimization problems

(Elsayed et al. 2011).

More recent studies convert the original problem into a bi-objective optimization

problem in which a measure of an overall constraint violation is used as an additional

objective (Surry et al. 1995; Zhou et al. 2003). Another study suggested the use of

violation of each constraint as a different objective, thereby making the approach a

truly multi-objective one (Coello 2000).

Let us return to the example problem in Fig. 10.1. If we consider a set of solutions

(A to H) and treat them for two objectives (minimization of f (x) and constraint

violation $\mathrm{CV}(x) = \langle 1 - x \rangle + \langle x \rangle$), we obtain the plot in Fig. 10.2. It is clear that
due to the consideration of the constraint violation as an objective, solutions
B to H are now nondominated with respect to solution A. Since a bi-objective EA will maintain a

diverse population of such solutions, any solution (albeit having a worse objective

value f (x)) close to the critical constraint boundary will also be emphasized and

there is a greater chance of finding the true constrained optimum by the bi-objective

optimization procedure quickly. With a single objective of minimizing f (x) (as done

in the penalty function approach), such a flexibility will be lost.
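The domination argument in this paragraph can be illustrated with a short Python sketch. The `(f, CV)` values below are hypothetical and are not read from Fig. 10.2; the labels only mimic the roles of the points discussed above, and `Z` is an extra dominated point added for contrast.

```python
def dominates(p, q):
    """True if p dominates q; p and q are (f, CV) pairs, both minimized."""
    no_worse = p[0] <= q[0] and p[1] <= q[1]
    strictly_better = p[0] < q[0] or p[1] < q[1]
    return no_worse and strictly_better


def nondominated(points):
    """Return the labels of points not dominated by any other point."""
    return [name for name, p in points.items()
            if not any(dominates(q, p)
                       for other, q in points.items() if other != name)]


# Hypothetical (f, CV) values, assumed for illustration only.
points = {
    "A": (-20.0, 4.0),   # best objective value, largest violation
    "G": (2.0, 0.5),     # worse objective, nearly feasible
    "H": (9.0, 0.0),     # feasible constrained minimum
    "Z": (10.0, 1.0),    # worse than H in both objectives
}
print(nondominated(points))
```

In a single-objective comparison on f alone, A would beat both G and H. Once CV is a second objective, none of the three dominates another, so a bi-objective EA has a reason to retain solutions near the constraint boundary.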

Surry et al. (1995) proposed a multi-objective-based constraint-handling strategy (Constrained Optimization by Multiobjective Optimization Genetic algorithms

(COMOGA)), where the population was first ranked based on the constraint violation

followed by the objective function value. Camponogara and Talukdar (1997) proposed solving a bi-objective problem in which EA generates the Pareto-optimal front.

Based on domination, two solutions are chosen and a search direction is generated.

Zhou et al. (2003) proposed constraint handling based on a bi-objective formulation,

where solutions are ranked based on the SPEA (Zitzler and Thiele 1999)-style Pareto

strength. In the formulation, one objective is the given objective function itself and

degree of constraint violation forms the second objective. In each generation, two
offspring are selected based on the highest Pareto strength and with lower degree of
constraint violation.

Fig. 10.2 Two-objective plot of a set of solutions for the problem given in Eq. (10.3)

Venkatraman and Yen (2005) proposed a two-phase framework. In the first phase,

the objective function is neglected and the problem is treated as a constraint satisfaction problem to find at least one feasible solution. Population is ranked based on

the sum of constraint violations. As and when at least a single feasible solution is

found, both the objective function and the constraint violation are taken into account

where two objectives are original objective and summation of normalized constraint

violation values.

Cai and Wang (2005) proposed a novel EA for constrained optimization. In the

process of population evolution, the algorithm is based on multiobjective optimization, i.e., an individual in the parent population may be replaced if it is dominated

by a nondominated individual in the offspring population. In addition, three models

of a population-based algorithm generator and an infeasible solution archiving and

replacement mechanism are introduced. Furthermore, the simplex crossover is used

as a recombination operator to enrich the exploration and exploitation abilities of the

approach proposed.

Deb et al. (2007) proposed a hybrid reference-point-based evolutionary multiobjective optimization (EMO) algorithm, where the summation of normalized constraint violation is used as the second objective. Wang et al. (2008) proposed a

multi-objective way of constraint handling with three main issues: (i) the evaluation of infeasible solutions when the population contains only infeasible individuals;

(ii) balancing feasible and infeasible solutions when the population consists of a

combination of feasible and infeasible individuals; and (iii) the selection of feasible

solutions when the population is composed of feasible individuals only.


based on the philosophy of lexicographic goal programming for solving constrained
optimization problems. In the first phase, the objective function is completely disregarded and the entire search effort is directed toward finding a single feasible solution.
In the second phase, the problem is treated as a bi-objective optimization problem,
turning the constrained optimization into a two-objective optimization problem. Ray

et al. (2009) proposed an infeasibility-driven bi-objective method that maintains a

small percentage of infeasible solutions close to the constraint boundary.

Although many other ideas are suggested, researchers realized that the task of finding

the constrained optimum by an EA can be made more efficient and accurate, if it

is hybridized with a classical local search procedure. Some such studies are Myung

and Kim (1998), Fatourechi et al. (2005). A combination of a genetic algorithm and

a local search method can speed-up the search to locate the exact global optimum.

Applying a local search to the solutions that are guided by a genetic algorithm can

help in convergence to the global optimum.

Burke and Smith (2000) proposed a hybrid EA-local search for the thermal generator maintenance scheduling problem. A heuristic is used to initialize the solutions.
Fatourechi et al. (2005) proposed a hybrid genetic algorithm for user customization
of the energy normalization parameters in brain-computer interface systems. The GA
is hybridized with local search. Victoire and Jeyakumar (2005) proposed a sequential
quadratic programming (SQP) method for the dynamic economic dispatch problem
of generating units considering the valve-point effects. The developed method is a
two-phase optimizer. In the first phase, the candidates of evolutionary programming (EP) explore the solution
space freely. In the second phase, the SQP is invoked when there is an improvement
of solution (a feasible solution) during the EP run. Thus, the SQP guides EP for better

performance in the complex solution space.

Wang et al. (2006) proposed an effective hybrid genetic algorithm (HGA) for a

flow shop scheduling problem with limited buffers. In the HGA, not only multiple

genetic operators are used simultaneously in a hybrid sense, but also a neighborhood

structure based on graph theoretical approach is employed to enhance the local search,

so that the exploration and exploitation abilities can be well balanced. Moreover, a

decision probability is used to control the utilization of genetic mutation operation

and local search based on problem-specific information so as to prevent the premature convergence and concentrate the computing effort on promising neighboring

solutions.

El-Mihoub et al. (2006) proposed different forms of integration between genetic

algorithms and other search and optimization techniques and also examined several

issues that needed to be taken into consideration when designing an HGA that used

another search method as a local search tool.


Deb et al. (2007) proposed a hybrid reference-point-based evolutionary multiobjective optimization (EMO) algorithm coupled with the classical SQP procedure

for solving constrained single-objective optimization problems. The reference-point-based EMO procedure allowed the procedure to focus its search near the constraint

boundaries, while the SQP methodology acted as a local search to improve the

solutions. Deep et al. (2008) proposed a constraint-handling method based on the

features of genetic algorithm and self-organizing migrating algorithm.

Araujo et al. (2009) proposed a novel methodology to be coupled with a genetic

algorithm to solve optimization problems with inequality constraints. This methodology can be seen as a local search operator that uses quadratic and linear approximations for both objective function and constraints. In the local search phase, these

approximations define an associated problem with a quadratic objective function and

quadratic and/or linear constraints that are solved using a linear matrix inequality

(LMI) formulation. The solution of this associated problem is then reintroduced in

the GA population.

Bernardino et al. (2009) proposed a hybridized genetic algorithm (GA) with an

artificial immune system (AIS) as an alternative to tackle constrained optimization

problems in engineering. The AIS is inspired by the clonal selection principle and is

embedded into a standard GA search engine in order to help move the population into

the feasible region. The resulting GA-AIS hybrid is tested in a suite of constrained

optimization problems with continuous variables, as well as structural and mixed

integer reliability engineering optimization problems. In order to improve the diversity of the population, a variant of the algorithm is developed with the inclusion of a

clearing procedure. The performance of the GA-AIS hybrids is compared with other

alternative techniques, such as the adaptive penalty method, and the stochastic ranking technique, which represent two different types of constraint handling techniques

that have been shown to provide good results in the literature.

Yuan and Qian (2010) proposed a new HGA combined with local search to solve

twice continuously differentiable nonlinear programming (NLP) problems. The local

search eliminates the necessity of a penalization of infeasible solutions or any special

crossover and mutation operators.

Recently, Mezura-Montes (2009) edited a book on constraint handling in evolutionary optimization. The most recent survey of constraint handling in
nature-inspired optimization can be found in Mezura-Montes and Coello (2011). The
following methodologies are briefly described in their paper:

Feasibility rules

Stochastic ranking

ε-constraint method

Novel penalty functions

Novel special operators

Multiobjective concepts

Ensemble of constraint-handling techniques

The authors also indicated promising future directions for researchers in constraint handling. These areas will be helpful for researchers, novices, and experts alike.

Constraint approximation

Dynamic constraints

Hyper-heuristics

Theory

Fig. 10.3 Papers published in evolutionary constrained optimization per year (1961-2013, up to September 26) (taken from Coello (2013))

The aforesaid literature clearly indicates that different techniques are proposed

using EAs for efficient constraint handling. However, it is difficult to cover the

whole literature on constraint handling. Coello (2013) maintains a constraint-handling repository which holds a broad spectrum of constraint-handling techniques.
Figure 10.3 shows a histogram of the number of papers published per year in
evolutionary constrained optimization. From Fig. 10.3 it is clear that researchers keep
coming up with new constraint-handling mechanisms using EAs, and that the number of published papers has grown over the years. For the year 2013, we have
data until September 26.

It is clear from the above growing list of literature that EAs are increasingly being used

for constrained optimization problems. This popularity is due to their flexibility in

working with any form of constraint violation information and ability to get integrated

with any other algorithm. In this section, we demonstrate this flexibility of EAs
by using a bi-objective EA and integrating it with a penalty-function-based
classical approach to speed up the convergence. The main motivation of the hybridization is to take advantage of one method to overcome difficulties of the other method
and, in the process, develop an algorithm that may outperform each method individually and preferably most reported high-performing algorithms.

Evolutionary multi-objective optimization (EMO) algorithms have amply demonstrated
their ability to find multiple trade-off solutions for handling two, three, and
four conflicting objectives. The principle of EMO has also been applied to problems
other than multi-objective optimization problems, a process now largely known as
multiobjectivization (Knowles et al. 2008). Although we are interested
in solving a single-objective constrained optimization problem, we have mentioned
earlier that the concept of multi-objective optimization was found to be useful and
convenient in handling single-objective constrained optimization problems. A bi-objective optimization problem has been formulated to handle single-objective constrained
problems in the past (Coello 2000; Deb et al. 2007; Surry et al. 1995). Let us consider
the following single-objective, two-variable minimization problem:

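\[
\begin{aligned}
\text{minimize}\quad & f(x) = 1 + \sqrt{x_1^2 + x_2^2},\\
\text{subject to}\quad & g(x) \equiv 1 - (x_1 - 1.5)^2 - (x_2 - 1.5)^2 \ge 0.
\end{aligned}
\tag{10.4}
\]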

The feasible region is the area inside a circle of radius one centered at $(1.5, 1.5)^T$.
Since the objective function is one more than the distance of any point from the origin,
the constrained minimum lies on the circle at $x_1^* = x_2^* = 1.5 - 1/\sqrt{2} = 0.793$.
The corresponding function value is $f^* = 2.121$. Thus, in this problem, the minimum

point makes the constraint g() active. This problem was also considered elsewhere

(Deb 2001).
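The values quoted above follow from a short geometric argument, written out here as a worked check. The symbols $c$ for the circle center and $d^*$ for the distance of the minimum from the origin are introduced only for this derivation.

```latex
\begin{aligned}
\|c\| &= \sqrt{1.5^2 + 1.5^2} = 1.5\sqrt{2} \approx 2.1213
  && \text{(distance from the origin to the center } c = (1.5, 1.5)^T\text{)},\\
d^* &= \|c\| - 1 \approx 1.1213
  && \text{(subtract the radius to reach the nearest boundary point)},\\
x_1^* = x_2^* &= \frac{d^*}{\sqrt{2}} = 1.5 - \frac{1}{\sqrt{2}} \approx 0.793,\\
f^* &= 1 + d^* = 1.5\sqrt{2} \approx 2.121,\\
g(x^*) &= 1 - 2\left(\tfrac{1}{\sqrt{2}}\right)^2 = 0
  && \text{(the constraint is active at the minimum)}.
\end{aligned}
```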

Let us now convert this problem into the following two-objective problem:

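\[
\begin{aligned}
\text{minimize}\quad & f_1(x) = \mathrm{CV}(x) = \langle g(x) \rangle,\\
\text{minimize}\quad & f_2(x) = f(x) = 1 + \sqrt{x_1^2 + x_2^2},
\end{aligned}
\tag{10.5}
\]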

where CV(x) is the constraint violation. For multiple inequality and equality constraints, the constraint violation function is defined in terms of normalized constraint

functions, as follows:

\[
\mathrm{CV}(x) = \sum_{j=1}^{J} \langle \hat{g}_j(x) \rangle + \sum_{k=1}^{K} |\hat{h}_k(x)|.
\tag{10.6}
\]

Fig. 10.4 Objective space (f(x) versus CV(x)) showing the constrained minimum A, the unconstrained minimum, the feasible solutions of the original single-objective optimization problem, and the Pareto-optimal set of the bi-objective problem given in Eq. (10.5)

For the above problem, the first objective ($f_1()$) is always nonnegative. Any solution whose first objective value is exactly zero is a feasible solution to the original problem given in Eq. (10.4). Figure 10.4 shows the objective space of the above bi-objective optimization problem. Since all feasible solutions lie on the CV = 0 axis, the minimum of all feasible solutions corresponds to the minimum point of the original problem. This minimum solution is shown in the figure as solution A.

The corresponding Pareto-optimal front for the two-objective optimization problem (given in Eq. (10.5)) is marked. Interestingly, the constrained minimum solution A lies at one end of the Pareto-optimal front. Such bi-objective problems are usually solved using a lexicographic method (Miettinen 1999), in which, after finding the minimum-CV solution (corresponding to CV = 0 here), the second-level optimization task would minimize $f(\mathbf{x})$ subject to $\mathrm{CV}(\mathbf{x}) \le 0$. But this problem is identical to the original problem (Eq. (10.4)). Thus, the lexicographic method of solving the bi-objective problem offers no computational or algorithmic advantage in solving the original constrained optimization problem. However, an EMO with a modification in its search process can be used to solve the bi-objective problem. Since we are interested in the extreme solution A, there is no need for us to find the entire Pareto-optimal front. Fortunately, a number of preference-based EMO procedures exist which can find only a part of the entire Pareto-optimal front (Branke 2008; Branke and Deb 2004). In solving constrained minimization problems, we may then employ such a technique to find the Pareto-optimal region close to the extreme left of the Pareto-optimal front (as in Fig. 10.4).

In summary, we claim here that since an EMO procedure (even a preference-based EMO approach) emphasizes multiple trade-off solutions by its niching (crowding or clustering) mechanism, an EMO population will maintain a more diverse set of solutions than a single-objective EA would. This feature of EMO should help solve complex constrained problems better. Moreover, the use of bi-objective optimization avoids the need for any additional penalty parameter, which is required in a standard penalty function-based EA approach.

EAs and EMOs do not use gradients or any mathematical optimality principle to terminate their runs. Thus, the nearness of a final solution found with an EMO to the true optimum solution is always open to question. For this reason, EA and EMO methodologies have recently been hybridized with a classical optimization method as a local search operator. Since the termination of a local search procedure is usually checked based on mathematical optimality conditions (such as the Karush-Kuhn-Tucker (KKT) error norm being close to zero, as used in standard optimization software (Byrd et al. 2006; Moler 2004)), and the solution of the local search method is introduced in the EA population, the final EA solution also carries the optimality property. Usually, such local search methods are sensitive to the initial point used to start the algorithm, and the use of an EA is then justified as a supplier of a good initial solution to the local search method. Some such implementations can be found in Hedar and Fukushima (2003) for single-objective optimization problems and in Sharma et al. (2007), Kumar et al. (2007), and Sindhya et al. (2008) for multiobjective optimization problems.

In this study, we are interested in using a classical penalty function approach with our proposed bi-objective approach, mainly due to the simplicity and popularity of penalty function approaches for handling constraints. Instead of using a number of penalty parameters, one for each constraint as proposed in Eq. (10.2), a normalization of each constraint may help us use only one penalty parameter. Most resource or limitation-based constraints usually appear with a left-side term ($g_j(\mathbf{x})$) restricted to have a least value $b_j$, such that $g_j(\mathbf{x}) \ge b_j$. For such constraints, we suggest the following normalization process:

\[
\hat{g}_j(\mathbf{x}) = g_j(\mathbf{x})/b_j - 1 \ge 0.
\tag{10.7}
\]

With the constraints normalized, we form the following unconstrained penalty term, requiring only one penalty parameter R:

\[
P(\mathbf{x}, R) = f(\mathbf{x}) + R \sum_{j=1}^{J} \langle \hat{g}_j(\mathbf{x}) \rangle.
\tag{10.8}
\]

Here, the purpose of the penalty parameter is to balance the overall constraint violation against the objective function value. If an appropriate R is not chosen, the optimum solution of the above penalized function P() will not be close to the true constrained minimum solution. This fact is intimately connected with our bi-objective problem given in Eq. (10.5), which we discuss next.

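The overall constraint violation arising from all inequality constraints can be written as $\mathrm{CV}(\mathbf{x}) = \sum_{j=1}^{J} \langle \hat{g}_j(\mathbf{x}) \rangle$. Thus, the penalized term given in Eq. (10.8) can be written as follows:

\begin{align}
P(\mathbf{x}, R) &= f(\mathbf{x}) + R\,\mathrm{CV}(\mathbf{x}), \tag{10.9} \\
&= f_2(\mathbf{x}) + R f_1(\mathbf{x}), \tag{10.10}
\end{align}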

where $f_1()$ and $f_2()$ are described in Eq. (10.5). It is well known that one way to solve a two-objective minimization problem (minimize $\{f_1(\mathbf{x}), f_2(\mathbf{x})\}$) is to convert it into a weighted-sum minimization problem (Chankong and Haimes 1983):

\[
\text{minimize} \quad F_{w_1,w_2}(\mathbf{x}) = w_1 f_1(\mathbf{x}) + w_2 f_2(\mathbf{x}).
\tag{10.11}
\]

In the above formulation, $w_1$ and $w_2$ are two nonnegative numbers (not both zero). It is proven that the solution to the above problem is always a Pareto-optimal point of the two-objective optimization problem (Miettinen 1999). Moreover, the optimal point of problem (10.11) is the particular point on the Pareto-optimal front which minimizes $F_{w_1,w_2}$. For a convex Pareto-optimal front, the optimal point for the weighted-sum approach is usually the point at which the linear contour line of the weighted-sum function is tangent to the Pareto-optimal front, as depicted in Fig. 10.5. The contour line has a slope of $m = -w_1/w_2$.

Against this background, let us now compare Eqs. (10.11) and (10.10). We observe that solving the penalized function P() given in Eq. (10.10) is equivalent to solving the bi-objective optimization problem given in Eq. (10.5) with $w_1 = R$ and $w_2 = 1$. This implies that for a chosen value of the penalty parameter (R), the corresponding optimal solution will be a Pareto-optimal solution to the bi-objective problem given in Eq. (10.5), but need not be the optimal solution of the original single-objective optimization problem (solution A). This is the reason why different penalty parameter values in the penalty function approach produce different optimized solutions.

Fig. 10.5 Weights in the weighted-sum approach for a generic bi-objective optimization problem (axes: f1 and f2; the contour line defined by w1 and w2, solution A, and the Pareto-optimal front are marked)

This connection makes one aspect clear. Let us say that at CV = 0, the slope of the Pareto-optimal front of the bi-objective problem is $-R_0$, or $m = -R_0$, as illustrated in Fig. 10.4. Thus, for $R \ge R_0$, the optimal solution of the corresponding penalized function (Eq. (10.10)) is nothing but the constrained optimum solution. This reveals that for any problem there exists a critical lower bound of R which will theoretically cause the penalty function approach to find the constrained minimum. This critical value ($R_0$) is nothing but the magnitude of the slope of the Pareto-optimal curve at the zero-constraint-violation solution. However, this critical R is not known a priori, and here we propose our hybrid bi-objective-cum-penalty-function approach to compute $R_0$ for this purpose.
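The existence of a critical penalty parameter can be illustrated on the small problem of Eq. (10.4). The sketch below is not from the original study: it assumes the unnormalized violation CV = ⟨g⟩ of Eq. (10.5), restricts the search to the diagonal $x_1 = x_2 = s$ (justified here by the symmetry and convexity of this particular penalized function), and uses a plain golden-section search as a stand-in for any one-dimensional minimizer. Under these assumptions, setting the derivative of the penalized function to zero on the infeasible side gives $s = 1.5 - \sqrt{2}/(4R)$, which reaches the constraint boundary at R = 0.5, so $R_0 = 0.5$ for this example.

```python
import math

def penalized(s, R):
    """P = f + R * CV for Eq. (10.4), restricted to the diagonal x1 = x2 = s."""
    f = 1.0 + math.sqrt(2.0) * abs(s)
    g = 1.0 - 2.0 * (s - 1.5) ** 2
    cv = -g if g < 0.0 else 0.0      # bracket operator applied to g
    return f + R * cv

def golden_section_min(func, lo, hi, tol=1e-9):
    """Minimize a unimodal one-dimensional function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if func(c) < func(d):
            b = d                    # minimum lies in [a, d]
        else:
            a = c                    # minimum lies in [c, b]
    return 0.5 * (a + b)

def penalized_minimizer(R):
    """Minimizer of the penalized function on the diagonal for a given R."""
    return golden_section_min(lambda s: penalized(s, R), 0.0, 1.5)

# Below R0 the minimizer is infeasible; at or above R0 it stays at 0.793.
for R in (0.25, 0.4, 0.5, 1.0, 10.0):
    print(R, round(penalized_minimizer(R), 4))
```

For R below 0.5 the printed minimizer lies short of the constrained minimum (in the infeasible region), and for larger R it stops moving, which mirrors the statement that any $R \ge R_0$ recovers solution A.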

The key issue is then to identify the critical R for a particular problem, as it involves knowing the optimal solution A beforehand. However, there is another fact that we can consider here to avoid computing $R_0$. If R is larger than $R_0$, the corresponding minimum of the penalized function P() will also be the constrained minimum of the original problem. Extending the idea, we could then use an arbitrarily large R (say $10^6$ or more) for every problem and be done with it. Theoretically, solving the penalized function with such a large R should work, but there is a practical problem that does not allow us to use such a large value of R. With an unnecessarily large R, the objective function f() has almost no effect on P(). The problem becomes more of a constraint satisfaction problem than a constrained optimization problem. In such a case, the search is directed toward the feasible region and not specifically toward the constrained minimum solution. If the feasible solution found in this way is not close to the optimum solution, it then becomes difficult to converge to the constrained minimum solution. With a large penalty parameter, there is also a scaling problem, which is critical for classical gradient-based methods. When solutions come close to the constraint boundary, any numerical gradient computation will involve evaluating some solutions from the feasible region and some from the infeasible region to apply the finite-difference idea. Since infeasible solutions are heavily penalized, there will be a large difference in the function values, thereby causing instability in the numerical derivative computations. This is the reason that the classical penalty function approach (Reklaitis et al. 1983) applies the penalty function method successively with a carefully chosen sequence of R.

In the following subsection, we present our hybrid methodology, which finds an appropriate R adaptively through a bi-objective optimization.

Based on the bi-objective principles of handling a constrained optimization problem and the use of a penalty function approach mentioned above, we now propose the following hybrid algorithm. First, we set the generation counter t = 0.

Step 1: An EMO procedure (NSGA-II in this study) is applied to the following bi-objective optimization problem to find the nondominated front:

\[
\begin{aligned}
\text{minimize} \quad & f(\mathbf{x}), \\
\text{minimize} \quad & \mathrm{CV}(\mathbf{x}), \\
\text{subject to} \quad & \mathrm{CV}(\mathbf{x}) \le c, \\
& \mathbf{x}^{(L)} \le \mathbf{x} \le \mathbf{x}^{(U)}.
\end{aligned}
\tag{10.12}
\]

The constraint is added to find the nondominated solutions close to the minimum-CV(x) solution. Since CV(x) is the normalized constraint violation (Eq. (10.6)), it is suggested that c = 0.2J be chosen for problems having no equality constraints and c = 0.4(J + K) in the presence of equality constraints. To have an adequate number of feasible solutions in the population to estimate the critical penalty parameter $R_0$, we count the number of feasible solutions (checked with $\mathrm{CV} \le 10^{-6}$). If there are more than three bi-objective feasible solutions (with $\mathrm{CV} \le c$) in the population, we proceed to Step 2; otherwise, we increment the generation counter t and repeat Step 1.

Step 2: If t > 0 and ((t mod τ) = 0), compute $R_{\text{new}}$ from the current nondominated front as follows. First, a cubic-polynomial curve is fitted to the nondominated points ($f = a + b(\mathrm{CV}) + c(\mathrm{CV})^2 + d(\mathrm{CV})^3$), and then the penalty parameter is estimated from the slope at CV = 0, that is, R = −b. Since this is a lower bound on R, we use R = −rb, where r is a weighting parameter greater than or equal to one. So as not to have abrupt changes in the values of R between two consecutive local searches, we set $R_{\text{new}} = (1 - w)R_{\text{prev}} + wR$, where w is a weighting factor. In the very first local search, we use $R_{\text{new}} = R$.

Step 3: Thereafter, the following penalized function is optimized with $R_{\text{new}}$ computed from above, starting with the current minimum-CV solution:

\[
\begin{aligned}
\text{minimize} \quad & P(\mathbf{x}) = f(\mathbf{x}) + R_{\text{new}}
\begin{cases}
\sum_{j=1}^{J} \langle \hat{g}_j(\mathbf{x}) \rangle, & \text{if } K = 0, \\
\sum_{j=1}^{J} \langle \hat{g}_j(\mathbf{x}) \rangle^2 + \sum_{k=1}^{K} \hat{h}_k(\mathbf{x})^2, & \text{otherwise},
\end{cases} \\
\text{subject to} \quad & \mathbf{x}^{(L)} \le \mathbf{x} \le \mathbf{x}^{(U)}.
\end{aligned}
\tag{10.13}
\]

Step 4: If the local search solution $\bar{\mathbf{x}}$ is feasible and the difference between $f(\bar{\mathbf{x}})$ and the objective value of the previous local searched solution (or a given target objective value) is smaller than a small number $\delta_f$ ($10^{-4}$ is used here), the algorithm is terminated and $\bar{\mathbf{x}}$ is declared as the optimized solution. Else, we increment t by one, set $R_{\text{prev}} = R$, and proceed to Step 1.
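The penalty-parameter update of Step 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the cubic is fitted by ordinary least squares through the normal equations (the original text does not say how the fit is computed), and the helper names `solve_linear`, `fit_cubic`, and `update_penalty` are hypothetical. The defaults r = 2 and w = 0.5 are the values reported later for the example problem.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * solution[k] for k in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution

def fit_cubic(cv_values, f_values):
    """Least-squares fit of f = a + b*CV + c*CV^2 + d*CV^3; returns [a, b, c, d]."""
    powers = [[cv ** p for p in range(4)] for cv in cv_values]
    normal = [[sum(row[i] * row[j] for row in powers) for j in range(4)]
              for i in range(4)]
    rhs = [sum(row[i] * f for row, f in zip(powers, f_values)) for i in range(4)]
    return solve_linear(normal, rhs)

def update_penalty(cv_values, f_values, r=2.0, w=0.5, r_prev=None):
    """Step 2: R = -r*b from the fitted slope at CV = 0, then smoothing."""
    _, b, _, _ = fit_cubic(cv_values, f_values)
    r_estimate = -r * b
    if r_prev is None:               # very first local search: no smoothing
        return r_estimate
    return (1.0 - w) * r_prev + w * r_estimate

# Usage: points sampled from a known cubic with slope -1.739 at CV = 0.
cvs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
fs = [0.628 - 1.739 * c + 1.643 * c ** 2 - 0.686 * c ** 3 for c in cvs]
print(update_penalty(cvs, fs, r=1.0))                      # slope magnitude
print(update_penalty(cvs, fs, r=2.0, w=0.5, r_prev=2.0))   # smoothed update
```

At least four distinct CV values are needed for the cubic fit to be determined, which is consistent with Step 1 requiring more than three bi-objective feasible solutions before Step 2 is attempted.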

In the proposed hybrid algorithm, the penalty parameter R is not a user-defined parameter and gets adjusted from the obtained nondominated front. However, we have introduced three new parameters τ, r, and w instead. Our extensive parametric study (described in Sect. 10.7) on a number of problems shows that two of these parameters (w and r) do not affect the outcome of our proposed method as much as a penalty parameter affects the performance of a penalty-based algorithm. Moreover, the parameter $\tau \in [1, 5]$ works well on all problems studied here. By contrast, the choice of a penalty parameter in a penalty function approach is crucial, and we attempt to overcome this aspect by making an educated guess of this parameter through a bi-objective study.

In all our studies, we use Matlab's fmincon() procedure to solve the penalized function (the local search problem of Step 3) with standard parameter settings. Function evaluations needed by the fmincon() procedure are added to those needed by the bi-objective NSGA-II procedure to count the overall function evaluations. Other local search solvers (such as Knitro (Byrd et al. 2006)) may also be used instead.

To illustrate the working of our proposed hybrid approach, we consider a two-variable

problem first (Problem P1):

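\[
\begin{aligned}
\text{minimize} \quad & f(\mathbf{x}) = (x_1 - 3)^2 + (x_2 - 2)^2, \\
\text{subject to} \quad & g_1(\mathbf{x}) \equiv 4.84 - (x_1 - 0.05)^2 - (x_2 - 2.5)^2 \ge 0, \\
& g_2(\mathbf{x}) \equiv x_1^2 + (x_2 - 2.5)^2 - 4.84 \ge 0, \\
& 0 \le x_1 \le 6, \quad 0 \le x_2 \le 6.
\end{aligned}
\tag{10.14}
\]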

For this problem, only constraint g1 is active at the minimum point. To demonstrate

the working of our proposed hybrid strategy, we use different optimization techniques

to solve the same problem.

First, we find the Pareto-optimal front for the two objectives (minimization of $f(\mathbf{x})$ and minimization of the constraint violation $\mathrm{CV}(\mathbf{x})$) near the minimum-CV solution, by solving the following ε-constraint problem and generating the Pareto-optimal front theoretically (Chankong and Haimes 1983):

\[
\begin{aligned}
\text{minimize} \quad & f(\mathbf{x}), \\
\text{subject to} \quad & g_1(\mathbf{x}) \ge -\epsilon, \\
& 0 \le x_1 \le 6, \quad 0 \le x_2 \le 6.
\end{aligned}
\tag{10.15}
\]

Fig. 10.6 Pareto-optimal front from KKT theory and by proposed hybrid procedure (axes: CV(x) and f(x); the slope R0 at CV = 0 is marked)

We use different values of ε and for each case find the optimum solution by solving the mathematical KKT conditions exactly. The resulting $f(\mathbf{x})$ and $\mathrm{CV}(\mathbf{x})$ values are shown in Fig. 10.6 with diamonds.

The optimum solution of the problem given in Eq. (10.14) is obtained for ε = 0 and is $\mathbf{x}^* = (2.219, 2.132)^T$ with a function value of 0.627. The corresponding Lagrange multiplier is $u_1^* = 1.74$. Later, we shall use this theoretical result to verify the working of our hybrid procedure.
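The quoted optimum and Lagrange multiplier can be reproduced with a short calculation. The sketch below is an illustration, not the authors' procedure: it assumes that only $g_1$ is active (as stated above), so the optimum is the radial projection of the unconstrained minimum (3, 2) onto the circle $g_1 = 0$, and it assumes the multiplier refers to the normalized constraint $\hat{g}_1 = g_1/4.84$. The names `constrained_optimum`, `objective`, and `lagrange_multiplier` are hypothetical.

```python
import math

P = (3.0, 2.0)        # unconstrained minimum of f in Eq. (10.14)
C = (0.05, 2.5)       # center of the circle defining g1
RADIUS_SQ = 4.84      # g1 = RADIUS_SQ - |x - C|^2

def constrained_optimum():
    """Radial projection of P onto the circle g1 = 0."""
    dx, dy = P[0] - C[0], P[1] - C[1]
    scale = math.sqrt(RADIUS_SQ) / math.hypot(dx, dy)
    return C[0] + scale * dx, C[1] + scale * dy

def objective(x1, x2):
    """Objective function of Eq. (10.14)."""
    return (x1 - 3.0) ** 2 + (x2 - 2.0) ** 2

def lagrange_multiplier(x1, x2):
    """u1 from grad f = u1 * grad g1_hat, using the x1 component only.

    Assumes the normalized constraint g1_hat = g1 / 4.84.
    """
    grad_f_x1 = 2.0 * (x1 - P[0])
    grad_g_x1 = -2.0 * (x1 - C[0]) / RADIUS_SQ
    return grad_f_x1 / grad_g_x1

x_star = constrained_optimum()
print("x* =", tuple(round(v, 3) for v in x_star))
print("f* =", round(objective(*x_star), 6))
print("u1 =", round(lagrange_multiplier(*x_star), 3))
```

Under these assumptions the computed multiplier agrees with the quoted 1.74 to two decimals; without the normalization by 4.84 the multiplier would come out smaller by that factor.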

When we fit a cubic polynomial through the (f, CV) points obtained from the above theoretical analysis, we obtain the following fitted function of the Pareto-optimal front:

\[
f^* = 0.628 - 1.739(\mathrm{CV}) + 1.643(\mathrm{CV})^2 - 0.686(\mathrm{CV})^3.
\tag{10.16}
\]

The slope of this front at CV = 0 is m = −1.739, so $R_0 = -m = 1.739$ is the critical lower bound of R. This result indicates that if we use any penalty parameter greater than or equal to $R_0$, we can expect to find the constrained optimum solution using the penalty function method.

To investigate, we consider a number of R values and find the optimal solution

of the resulting penalty function (with g1 () alone) using KKT optimality conditions.

The solutions are tabulated in Table 10.1.

The unconstrained minimum is at $(3, 2)^T$ with a function value equal to zero. When a small R is used, the optimal solution of the penalized function is close to this unconstrained solution, as shown in the table and in Fig. 10.7. As R is increased, the optimized solution gets closer to the constrained minimum solution, and the function value reaches 0.6274 at around R = 1.74. The solution remains more or less at this value for a large range of R. For a large value of R (R > 50), the optimized solutions move away from the constrained minimum and converge to an arbitrary feasible solution. With a large R, the problem becomes a constraint satisfaction problem. Since constraint satisfaction becomes the main aim, the algorithm converges to an arbitrary feasible solution. This example clearly shows the importance of setting an appropriate value of R. Too small or too large a value may produce an infeasible or an arbitrary feasible solution, respectively.

Table 10.1 Effect of penalty parameter values for the problem given in Eq. (10.14)

Penalty parameter    x1        x2        F         CV
0.01                 2.9939    2.0010    0.0085    0.8421
0.1                  2.9406    2.0096    0.0812    0.7761
1                    2.4949    2.0856    0.5330    0.2507
1.5                  2.3021    2.1183    0.6181    0.0780
1.75                 2.2189    2.1303    0.6274    0.0001
10                   2.2187    2.1302    0.6274    0
15                   2.2191    2.1326    0.6274    0
50                   2.2215    2.1469    0.6277    0
Theoretical optimum (using Eq. (10.18))
1.74                 2.219     2.132     0.627

Fig. 10.7 The feasible search region is within the two circular arcs for the problem given in Eq. (10.14). Results for different penalty parameter values are shown
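The small-R rows of Table 10.1 can be reproduced in closed form. The sketch below is a hedged illustration rather than the procedure used in the study: it assumes the penalized function $f + R\langle \hat{g}_1 \rangle$ with $\hat{g}_1 = g_1/4.84$ and $g_1$ alone, and that the variable bounds stay inactive. On the infeasible side this function is smooth, and setting its gradient to zero gives a point on the segment between the circle center and (3, 2); once that point would become feasible, the minimum sits on the boundary $g_1 = 0$. The function name `penalized_minimum` is hypothetical.

```python
import math

P = (3.0, 2.0)        # unconstrained minimum of f in Eq. (10.14)
C = (0.05, 2.5)       # center of the circle defining g1
RADIUS_SQ = 4.84      # g1 = RADIUS_SQ - |x - C|^2

def penalized_minimum(R):
    """Exact minimizer of f + R*<g1/4.84>; returns (x1, x2, F, CV)."""
    k = R / RADIUS_SQ
    # Stationary point of the smooth infeasible-side penalized function.
    x1 = (P[0] + k * C[0]) / (1.0 + k)
    x2 = (P[1] + k * C[1]) / (1.0 + k)
    dist_sq = (x1 - C[0]) ** 2 + (x2 - C[1]) ** 2
    if dist_sq <= RADIUS_SQ:
        # The stationary point is feasible, so the minimum lies on g1 = 0.
        dx, dy = P[0] - C[0], P[1] - C[1]
        scale = math.sqrt(RADIUS_SQ) / math.hypot(dx, dy)
        x1, x2 = C[0] + scale * dx, C[1] + scale * dy
        dist_sq = RADIUS_SQ
    cv = max(0.0, dist_sq / RADIUS_SQ - 1.0)
    f = (x1 - P[0]) ** 2 + (x2 - P[1]) ** 2
    return x1, x2, f + R * cv, cv

for R in (0.01, 0.1, 1.0, 1.5, 1.75, 10.0):
    print(R, [round(v, 4) for v in penalized_minimum(R)])
```

Under these assumptions the sketch matches the tabulated $x_1$, $x_2$, and F for R up to 1.5. For R = 1 it gives CV ≈ 0.2705, the value consistent with F = f + R·CV in that row. Because it returns the exact minimizer, it keeps returning the constrained optimum for large R and therefore does not show the numerical drift reported for R > 50.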

and Lagrange Multiplier

For a single active constraint ($g_1(\mathbf{x}) \ge 0$) at the optimum $\mathbf{x}^*$, there is an important result we would like to discuss here. The KKT equilibrium conditions for the problem given in Eq. (10.1) without equality constraints are as follows:

\begin{align*}
\nabla f(\mathbf{x}^*) - u_1^* \nabla g_1(\mathbf{x}^*) &= 0, \\
g_1(\mathbf{x}^*) &\ge 0, \\
u_1^* g_1(\mathbf{x}^*) &= 0, \\
u_1^* &\ge 0.
\end{align*}

Here, any variable bound that is active at the optimum must also be considered as an inequality constraint. Next, we consider the penalized function given in Eq. (10.2). The solution ($\mathbf{x}_p$) of the penalized function (given in Eq. (10.8)) at an $R_{cr} \ge R_0$ can be obtained by setting the first derivative of P() to zero:

\[
\nabla f(\mathbf{x}_p) + R_{cr} \frac{d\langle g_1(\mathbf{x}_p) \rangle}{dg_1} \nabla g_1(\mathbf{x}_p) = 0.
\tag{10.17}
\]

The derivative of the bracket operator at $g_1 = 0$ does not exist: at a point for which $g_1 = 0^+$, the derivative is zero, and at a point for which $g_1 = 0^-$, the derivative is −1. But considering that an algorithm usually approaches the optimum from the infeasible region, the optimum is usually found with an arbitrarily small tolerance on constraint violation. In such a case, the derivative at a point $\mathbf{x}_p$ for which $g_1 = 0^-$ is −1. Comparing Eq. (10.17) with the first KKT condition then gives $R_{cr} = u_1^*$. Since $\mathbf{x}_p$ is arbitrarily close to the optimum, the second and third KKT conditions above are also satisfied at this point within the tolerance. Since $u_1^* = R_{cr}$ and the penalty parameter R is chosen to be positive, $u_1^* > 0$. Thus, for a solution of the penalized function formed with a single active constraint, we have an interesting and important result:

\[
R_{cr} = u_1^*.
\tag{10.18}
\]

For the example problem of this section, we notice that $u_1^* = 1.74$ obtained from the KKT condition is identical to the critical lower bound on $R_{cr}$.

Having found the bi-objective Pareto-optimal front through a generating method verified by KKT optimality theory, and having verified the derived critical penalty parameter against the theoretical Lagrange multiplier obtained through the KKT optimality theory, we are now certain about two aspects:

1. The obtained bi-objective front is optimal.

2. The critical penalty parameter obtained from the front is adequate to obtain the

constrained minimum.

We now apply our proposed hybrid strategy to solve the same problem.

In Step 1, we apply NSGA-II (Deb et al. 2002) to solve the bi-objective optimization problem (minimize $\{f(\mathbf{x}), \mathrm{CV}(\mathbf{x})\}$). The following parameter values are used: population size 60, crossover probability 0.9, mutation probability 0.5, crossover index 10, and mutation index 100 (Deb 2001). Here, we use τ = 5, r = 2, and w = 0.5. The hybrid algorithm is terminated when two consecutive local searches produce feasible solutions with a difference of $10^{-4}$ or less in the objective values. The obtained front is shown in Fig. 10.6 with small circles, which seems to match the theoretical front obtained by applying KKT optimality conditions to several ε-constraint versions (in diamonds) of the bi-objective problem.

At best, our hybrid approach finds the optimum solution in only 677 function evaluations (600 needed by EMO and 77 by the fmincon() procedure). The corresponding solution is $\mathbf{x} = (2.219, 2.132)^T$ with an objective value of 0.627380. Table 10.2 shows the best, median, and worst performance of the hybrid algorithm in 25 different runs.

Table 10.2 Function evaluations, FE (NSGA-II and local search), needed by the hybrid algorithm in 25 runs

        Best        Median      Worst
FE      677         999
f       0.627380    0.627379    0.627379

Figure 10.8 shows the variation of the population-best objective value with generation number for the median-performing run (with 999 function evaluations). The figure shows that the objective value reduces with generation number. The algorithm could not find any feasible solution in the first two generations, but from generation 3 onward, the best population member is always feasible. At generation 5, the local search method is called for the first time. The penalty parameter obtained from the NSGA-II front is R = 1.896 at generation 5, and a solution very close to the true optimum is found by the local search. Since our algorithm terminates only when two consecutive local searches yield solutions within a function difference of $10^{-4}$ or smaller, the algorithm continues for another round of local search at generation 15 before termination. At this generation, the penalty parameter value is found to be R = 1.722, which is close to the critical R for this problem, as shown in Table 10.1.

Fig. 10.8 Objective value reduces with generation for the problem in Eq. (10.14) (infeasible points and the penalty parameter values R = 1.896 and R = 1.722 are marked)

Thus, it is observed that the bi-objective algorithm