
Feasible Direction Methods for

Constrained Nonlinear Optimization


Suggestions for Improvements

Maria Mitradjieva–Daneva

Linköping 2007
Linköping Studies in Science and Technology. Dissertations, No. 1095

Feasible Direction Methods for


Constrained Nonlinear Optimization
Suggestions for Improvements

Maria Mitradjieva-Daneva

Division of Optimization, Department of Mathematics, Linköping University,


SE-581 83 Linköping, Sweden

Copyright © 2007 Maria Mitradjieva-Daneva, unless otherwise noted.


All rights reserved.

ISBN: 978-91-85715-11-4 ISSN 0345-7524

Typeset in LaTeX 2ε

Printed by LiU-Tryck, Linköping University, SE-581 83 Linköping, Sweden, 2007
To the men of my life,
Stefan, Petter, Martin and Danyo.
Acknowledgments

There are many people who have made this work possible.

First of all, my sincere thanks go to Professor Maud Göthe-Lundgren for all her
encouragement and advice, and my deepest thanks for her support during my
times of difficulty.

Many thanks go to Clas Rydergren for all the help and support he has given me
over the years. It has been a great pleasure to work with him.

Special thanks to Torbjörn Larsson for his guidance in optimization theory,


research and writing methodology. The interesting discussions with him and
his profound remarks were very helpful.

I would like to thank Professor Per Olov Lindberg for giving me the opportunity
to work within the Optimization group in Linköping. I also have to thank him
for his important role in my work, for his enthusiasm and for his endless
finickiness.

I also gratefully acknowledge the financial support from KFB (the Swedish
Transportation & Communications Research Board) and later Vinnova under the
project "Mathematical Models for Complex Problems within the Road and
Traffic Area".

Special acknowledgments to Leonid Engelson from Inregia, Stockholm, for
introducing me to an interesting research topic.

Many thanks go to all my colleagues at the Division of Optimization. There
are many who, from behind the scenes, have encouraged me and made my
work pleasant and easier. I am especially grateful to Helene, Andreas and
Oleg for all their support and discussions. Sometimes only a few words can
mean a lot!

Many heartfelt thanks to the girls in LiTH Doqtor, especially to Linnéa.

There is one man in my life who urged me on by way of his unbelievable
generosity and love. To Danyo, I send all my love.

Last, but absolutely not least, I would like to express my deepest gratitude
to my parents, my sister Rumi and my friends, simply for being there.

Thank you to my lovely sons, Petter, Stefan and Martin, who made hard
times seem brighter with their cheerful laughter.

To all of you, I send my deepest gratitude!

Linköping, May 2007


Maria Mitradjieva-Daneva

Sammanfattning

This thesis concerns the development of new efficient optimization methods.
Optimization based on mathematical models is used in a wide range of
applications, such as traffic planning, telecommunications, scheduling,
production planning, finance, and the pulp and paper industry. The thesis
studies solution methods for nonlinear optimization problems.

The optimization methods developed in the thesis are applicable to a large
number of problem types. Among other problems, the thesis studies the traffic
equilibrium problem, which is central to the analysis and planning of traffic
systems. This type of model can be used to simulate route choice for commuting
trips in urban areas. We have studied several types of traffic equilibria, for
example ones that take the travelers' valuation of time into account when
computing road tolls based on social marginal costs.

The thesis describes new concepts for faster and more accurate solution
methods. Speed and accuracy are especially important for optimization problems
with a large number of decision variables. The methods developed have in
common that they are based on feasible search directions. The method
development proposed in the thesis builds on improvements in the computation
of these feasible directions.

The Swedish title of the thesis is: Tillåtnariktningsmetoder för begränsad
olinjär optimering - några förslag till förbättringar (Feasible direction
methods for constrained nonlinear optimization - some suggestions for
improvements).

Abstract

This thesis concerns the development of novel feasible direction type
algorithms for constrained nonlinear optimization. The new algorithms are
based upon enhancements of the search direction determination and the line
search steps.

The Frank–Wolfe method is popular for solving certain structured linearly


constrained nonlinear problems, although its rate of convergence is often
poor. We develop improved Frank–Wolfe type algorithms based on conjugate
directions. In the conjugate direction Frank–Wolfe method a line search is
performed along a direction which is conjugate to the previous one with
respect to the Hessian matrix of the objective. A further refinement of
this method is derived by applying conjugation with respect to the last two
directions, instead of only the last one.

The new methods are applied to the single-class user traffic equilibrium
problem, the multi-class user traffic equilibrium problem under social
marginal cost pricing, and the stochastic transportation problem. In a limited
set of computational tests the algorithms turn out to be quite efficient.
Additionally, a feasible direction method with multi-dimensional search for
the stochastic transportation problem is developed.

We also derive a novel sequential linear programming algorithm for general


constrained nonlinear optimization problems, with the intention of being
able to attack problems with large numbers of variables and constraints.
The algorithm is based on inner approximations of both the primal and
the dual spaces, which yields a method combining column and constraint
generation in the primal space.

Contents

Sammanfattning

Abstract

Contents

PART I: INTRODUCTION AND OVERVIEW

1 Introduction
2 Selected topics in nonlinear optimization
   2.1 Prerequisites
      2.1.1 Descent directions
      2.1.2 Line search
   2.2 Linearly constrained optimization
      2.2.1 The Frank–Wolfe method
      2.2.2 Simplicial decomposition
   2.3 General constrained optimization
      2.3.1 Sequential linear programming
      2.3.2 Sequential quadratic programming
   2.4 The Lagrangian dual problem
   2.5 Convergence
3 Outline of the thesis and contribution
4 Chronology and publication status
Bibliography

PART II: APPENDED PAPERS

PAPER I: The Stiff is Moving — Conjugate Direction Frank–Wolfe Methods with Applications to Traffic Assignment
1 Introduction
2 The Frank–Wolfe method and modifications
   2.1 The Frank–Wolfe method
3 Conjugate direction Frank–Wolfe methods
   3.1 The conjugate Frank–Wolfe method, CFW
   3.2 Outline of the CFW algorithm
   3.3 The bi-conjugate Frank–Wolfe method, BFW
   3.4 Convergence of CFW method
4 Applications to traffic assignment problems
   4.1 The fixed demand traffic assignment problem
   4.2 Computational experiments
   4.3 A comparison with origin-based and DSD methods
A Derivation of the coefficients βki in BFW
B Closedness of the mapping A(Dk, Nk)
C Closedness of the conjugation map DCFW
Bibliography

PAPER II: Multi-Class User Equilibria under Social Marginal Cost Pricing
1 Overview
2 Multi-class user equilibria
3 Equilibria under social marginal cost pricing
4 A two-link example
5 A Frank–Wolfe algorithm for the SMC equilibrium
6 Some experimental results
   6.1 The two link network
   6.2 Sioux Falls network
Bibliography

PAPER III: A Conjugate Direction Frank–Wolfe Method for Nonconvex Problems
1 Introduction
2 Conjugate directions
3 The extended conjugate Frank–Wolfe method
   3.1 Outline of the ECFW algorithm
4 Applications to marginal cost congestion tolls
   4.1 Multi-class traffic equilibria under SMC pricing
   4.2 Computational experiments
Bibliography

PAPER IV: A Comparison of Feasible Direction Methods for the Stochastic Transportation Problem
1 Introduction
2 The stochastic transportation problem
3 Feasible direction methods for STP
   3.1 The Frank–Wolfe, FW
   3.2 The diagonalized Newton method, DN
   3.3 The conjugate Frank–Wolfe method, CFW
   3.4 Frank–Wolfe with multi-dimensional search, MdFW
   3.5 The heuristic Frank–Wolfe method, FWh
4 Test problems
5 Numerical results
6 Conclusions
Bibliography

PAPER V: A Sequential Linear Programming Algorithm with Multi-dimensional Search — Derivation and Convergence
1 Introduction
2 Related methods
   2.1 Sequential linear programming algorithms
   2.2 Simplicial decomposition
3 Preliminaries
4 SLP algorithm with multi-dimensional search
   4.1 Derivation of the multi-dimensional SLP algorithm
   4.2 Convergence to KKT points
5 MdSLP in the convex case
6 An illustrational example
7 Numerical experiments
   7.1 Termination criteria
   7.2 Numerical results
8 Conclusions
Bibliography
A Computation of the first-order optimality conditions
B Extension to the case of SQP
PART I

Introduction and Overview



1 Introduction

The field of nonlinear programming has a very broad range of applications
and it has experienced major developments in the last few decades. Nonlinear
models arise in various fields of real life and there is a wide variety of
approaches for solving the resulting nonlinear optimization programs.
Nonlinear optimization problems appear, for example, in routing problems in
traffic [9, 48, 52, 63, 65] and telecommunications [29], in the oil [35] and
chemical industries [47], in design optimization of large-scale structures
[43, 44], variational inequalities [62], applications in structural
optimization [66], economics [42], marketing [54, 58] and business
applications [36], in solving systems of equations [77], and in scientific
applications such as biology, chemistry, physics and mechanics, protein
structure prediction [59], etc.

The traffic assignment problem is a nonlinear model, which describes how
each traveler minimizes his/her own travel cost for reaching the desired
destination. Modeling of the travel times, congestion and differences in the
travelers' value of time leads to nonlinearities. In the management of
investment portfolios, the goal might be to determine a mix of investments so
as to maximize return while minimizing risk. The nonlinearity in the model
comes from taking risk into account. Although there is a variety of portfolio
selection models, a widely used one is the quadratic optimization problem
that minimizes the risk. In physics, for example, minimizing the potential
energy function to determine a stable configuration of a system of atoms, or
determining the configuration of largest terminal or kinetic energy, is also a
nonlinear programming model. A related problem in chemistry is to determine
the molecular structure that minimizes the Gibbs free energy, known also as
the chemical equilibrium problem [17]. In recent years much research has been
devoted to the development of nonlinear optimization for atomic and molecular
physics, to solve difficult molecular configuration problems like cluster
problems, protein folding problems, etc. [25, 41, 59].

Some important recent developments in nonlinear optimization are in least
squares [60], neural networks [4, 73] and interior point methods for linear
and nonlinear programs [4, 26]. Karmarkar [40] introduced a polynomial-time
linear programming method and this work started the revolution of
interior-point methods. These algorithms are especially efficient for convex
optimization problems [1]. One can show that the number of iterations that
an interior point algorithm needs in order to achieve a specified accuracy
is bounded by a polynomial function of the size of the problem. For more
details on interior-point methods, see [1, 4, 26]. Other important recent
developments are the increased emphasis on large-scale problems [6, 13, 33,
48, 67, 79], and algorithms that take advantage of problem structures as well
as parallel computation [10, 56, 76].

When modeling real-world problems, different types of optimization problems
occur. They can be linear or nonlinear, with or without constraints,
continuous, integer or mixed-integer. The functions in an optimization problem
can be differentiable or non-differentiable, convex or non-convex. Sometimes
we consider optimization under uncertainty, known as stochastic optimization,
where the functions are only given probabilistically. Nice references on
fundamental theory, methods, algorithm analysis and advice on how to obtain
good implementations in nonlinear optimization are, among others,
[1, 3, 4, 51, 60, 77].

The focus of this thesis is on algorithms that solve nonlinear constrained
optimization problems. Our concern is with algorithms which iteratively
generate a sequence of points {xk}, k = 1, 2, . . ., which either terminates
at or converges to a solution of the problem under consideration. Only in very
special cases, such as linear programming, LP, and convex quadratic
programming, QP [77, Chapter 1.6], does finite termination at an optimal point
occur.

We consider a general constrained nonlinear optimization problem

    min_{x ∈ X} f(x),                                    (1)

where the objective function f : X → R is differentiable and the feasible
set X ⊂ Rn is non-empty and compact. A generic form of a primal feasible
descent algorithm for (1) can be written as:

Algorithm 1 A Generic Primal Feasible Descent Algorithm


Step 0 (Initialization): Choose an initial point x0 ∈ X and let k = 0.
Step 1 (Termination check): If a termination criterion is satisfied, then
stop, else k = k + 1.
Step 2 (Direction determination): Determine a feasible descent search
direction dk .
Step 3 (Step length determination): Determine a step length tk > 0 such
that f (xk + tk dk ) < f (xk ) and xk + tk dk ∈ X.
Step 4 (Update): Update xk+1 = xk + tk dk and go to Step 1.
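As an illustration, the sketch below casts Algorithm 1 in Python; the routines
find_direction and find_step are placeholders to be supplied by a concrete
method, and the termination test shown here is a simple directional-derivative
check chosen only for brevity.

```python
import numpy as np

def primal_feasible_descent(f, grad_f, x0, find_direction, find_step,
                            tol=1e-6, max_iter=1000):
    """Generic sketch of Algorithm 1; the caller supplies the direction and
    step length rules (hypothetical placeholders, not from the thesis)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        d = find_direction(x)               # Step 2: feasible descent direction
        if grad_f(x) @ d > -tol:            # Step 1: crude termination check
            break
        t = find_step(x, d)                 # Step 3: step length giving descent
        x = x + t * d                       # Step 4: update the iterate
    return x
```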

The development of better performing algorithms can be made through
modifications in both the direction determination and the step length
determination steps of Algorithm 1.

2 Selected topics in nonlinear optimization

We here present some fundamental concepts from nonlinear optimization
problems and methods.

2.1 Prerequisites

We consider the nonlinear minimization problem

    min_x f(x),                                          (2)

where the objective f : Rn → R is a differentiable function. Below, we
describe basic methods for solving unconstrained optimization problems. An
interesting aspect of these approaches is whether they converge globally. By a
globally convergent algorithm we mean that the method generates a sequence
that converges to a stationary point x∗, i.e. ‖∇f(x∗)‖ = 0, for any starting
point.

2.1.1 Descent directions

How descent directions are generated depends on the particular optimization
problem. A sufficient condition for dk to be a descent direction with respect
to f at xk is given by ∇f(xk)T dk < 0. In unconstrained optimization a
search direction often has the form

    dk = −Bk⁻¹ ∇f(xk),                                   (3)

where Bk is a symmetric and nonsingular matrix. If Bk additionally is
positive definite, dk becomes a descent direction. In the steepest descent
method, Bk is simply the identity matrix, thus dk = −∇f(xk), which is a
descent direction for f at xk. Global convergence of the steepest descent
method is shown under convexity requirements of the problem [60, Chapter
3]. The steepest descent method is important from a theoretical point of
view, but it is quite slow in practice.

The Newton method, which performs a second-order approximation of the
objective function and enjoys a better convergence rate [51, Ch. 7], is
obtained from (3) when Bk is the Hessian matrix ∇2f(xk) of the objective
function. The Newton method converges rapidly when started close enough to a
local optimum. A drawback is that it may not converge when started at an
arbitrary point. The Newton method acts as a descent method at the iterate xk
if the Hessian matrix ∇2f(xk) is positive definite, and as an ascent method if
it is negative definite. The lack of positive definiteness of the Hessian may
be cured by adding to ∇2f(xk) a diagonal matrix Dk, such that ∇2f(xk) + Dk
becomes positive definite.
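To illustrate, a small numpy sketch of the two direction choices is given
below; the diagonal correction is realized here by adding a multiple of the
identity until a Cholesky factorization succeeds, which is one common, but by
no means the only, way to obtain a positive definite ∇2f(xk) + Dk.

```python
import numpy as np

def steepest_descent_direction(grad):
    """d_k = -grad f(x_k)."""
    return -grad

def modified_newton_direction(grad, hess, tau0=1e-3):
    """Newton direction with a diagonal correction D_k = tau*I added until
    the matrix becomes positive definite (one possible choice of D_k)."""
    n = hess.shape[0]
    tau = 0.0
    while True:
        try:
            L = np.linalg.cholesky(hess + tau * np.eye(n))
            break                              # factorization succeeded
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, tau0)         # increase the shift and retry
    # Solve (H + tau*I) d = -grad using the Cholesky factor L (L L^T = H + tau*I)
    y = np.linalg.solve(L, -grad)
    return np.linalg.solve(L.T, y)
```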

Accelerating the steepest descent method while avoiding the evaluation,
storage and inversion of the Hessian matrix motivates the existence of
quasi-Newton methods as well as conjugate direction methods. In conjugate
direction methods (see e.g. [51, Ch. 8]) for unconstrained convex quadratic
optimization, one performs line searches consecutively in a set of directions,
d1, · · · , dn, mutually conjugate with respect to the Hessian ∇2f(x) of the
objective (i.e. fulfilling diT ∇2f(x) dj = 0 for i ≠ j). In Rn the optimum is
then identified after n line searches [51, p. 241, Expanding Subspace Theorem].

In conjugate gradient methods, one obtains conjugate directions by
"conjugating" the gradient direction with respect to the previous search
direction, that is dk = −∇f(xk) + βk dk−1, with βk chosen so that dk is
conjugate to dk−1, which is accomplished by the choice

    βk = ∇f(xk)T ∇f(xk) / (∇f(xk−1)T ∇f(xk−1)).

In the quadratic case, dk then in fact becomes conjugate to all previous
directions d1, · · · , dk−1 (e.g. [51, p. 245, Conjugate Gradient Theorem]).

In 1964, Fletcher and Reeves [23] introduced the nonlinear conjugate gradient
method, known as the Fletcher-Reeves, FR, method. This method is shown to be
globally convergent when all the search directions are descent directions. The
method can produce a poor search direction in the sense that the search
direction dk is almost orthogonal to −∇f(xk), which results in a small
improvement in the objective value. Therefore, whenever this happens, using a
steepest descent direction is advisable. The search direction may fail to be a
descent direction unless the step length satisfies the strong Wolfe conditions
[60]. The FR method may take small steps and thus have bad numerical
performance (see [10]).

The Polak–Ribière method has proved to be more efficient in practice. The
two methods differ by the formula for calculating βk:

    βk = ∇f(xk)T (∇f(xk) − ∇f(xk−1)) / (∇f(xk−1)T ∇f(xk−1)).
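The two update rules can be written compactly as in the sketch below; restarts
and the safeguards used in practice (e.g. resetting β to zero) are omitted.

```python
import numpy as np

def cg_direction(grad_new, grad_old, d_old, variant="PR"):
    """One nonlinear conjugate gradient direction update.
    variant="FR" gives Fletcher-Reeves, "PR" gives Polak-Ribiere."""
    if variant == "FR":
        beta = (grad_new @ grad_new) / (grad_old @ grad_old)
    else:  # Polak-Ribiere
        beta = grad_new @ (grad_new - grad_old) / (grad_old @ grad_old)
    return -grad_new + beta * d_old
```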

Quasi–Newton methods are based on approximations of the inverse of the


Hessian. In these methods, the search direction is chosen to be dk =
−Dk ∇f (xk ), where Dk is an approximation of the inverse Hessian. The
quasi-Newton methods use information gathered from the iterates, xk and
xk+1 , and the gradients, ∇f (xk ) and ∇f (xk+1 ). The well-known Davidon–
Fletcher–Powell method, see e.g. [1, 51, 60], has the property that in the
quadratic case, it generates the same direction as the conjugate direction
method, while constructing the inverse of the Hessian. The method starts
with a symmetric and positive definite matrix D0 and iteratively updates
the approximate inverse Hessian by the formula

    Dk+1 = Dk + pk(pk)T / ((pk)T qk) − (Dk qk)(qk)T Dk / ((qk)T Dk qk),

where

    qk = ∇f(xk+1) − ∇f(xk),
    pk = tk dk,

with dk = −Dk ∇f(xk) and tk = arg min_t f(xk + t dk).
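A direct transcription of the update is sketched below, with p and q formed
from two consecutive iterates and gradients as above.

```python
import numpy as np

def dfp_update(D, x_old, x_new, g_old, g_new):
    """Davidon-Fletcher-Powell update of the inverse Hessian approximation D."""
    p = x_new - x_old                  # p^k = t_k d^k
    q = g_new - g_old                  # q^k = grad f(x^{k+1}) - grad f(x^k)
    Dq = D @ q
    return D + np.outer(p, p) / (p @ q) - np.outer(Dq, Dq) / (q @ Dq)
```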

2.1.2 Line search

To ensure global convergence of a descent method, a line search can be
performed. A step length that gives a substantial reduction of the objective
value is obtained, usually by the minimization of a one-dimensional function.
The one-dimensional optimization problem can be formulated as

    min_{t>0} ϕ(t) = f(xk + t dk),

where t is the step length to be determined. In practice, an exact line search
is not recommended since it is often too time consuming. As a matter of fact,
an effective step tk need not be near a minimizer of ϕ(t). A typical line
search procedure requires an initial estimate tk0 and generates a sequence
{tki} that terminates when the step length satisfies certain conditions. An
obvious condition on tk is a reduction of the objective value, i.e.
f(xk + tk dk) < f(xk). To get convergence, the line search needs to obtain
sufficient decrease, i.e. to satisfy a condition like the strong Wolfe
conditions, which are

    f(xk + tk dk) ≤ f(xk) + η1 tk ∇f(xk)T dk,            (7a)
    η2 |∇f(xk)T dk| ≥ |∇f(xk + tk dk)T dk|,              (7b)

where 0 < η1 < η2 < 1. In practice η1 is chosen to be quite small, usually
η1 = 10⁻⁴. A typical value for η2 is 0.9 if dk is obtained by a Newton or
quasi-Newton method and 0.1 if dk is obtained by a nonlinear conjugate
gradient method (see e.g. [60]). The condition (7a) is also known as the
Armijo condition. The strong condition (7b) does not allow the directional
derivative ∇f(xk + tk dk)T dk to be too positive. A nice discussion on
practical line searches can be found in [60].
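As an illustration, the sketch below tests conditions (7a)-(7b) for a given
trial step and shows a plain backtracking loop on the Armijo condition (7a); a
full strong Wolfe line search, such as the one described in [60], additionally
enforces (7b) by bracketing and interpolation, which is omitted here.

```python
import numpy as np

def strong_wolfe_ok(f, grad_f, x, d, t, eta1=1e-4, eta2=0.9):
    """Test the strong Wolfe conditions (7a)-(7b) for a trial step length t."""
    slope0 = grad_f(x) @ d
    armijo = f(x + t * d) <= f(x) + eta1 * t * slope0            # (7a)
    curvature = abs(grad_f(x + t * d) @ d) <= eta2 * abs(slope0)  # (7b)
    return armijo and curvature

def backtracking(f, grad_f, x, d, t0=1.0, eta1=1e-4, shrink=0.5):
    """Backtracking on the Armijo condition (7a) only."""
    t, slope0 = t0, grad_f(x) @ d
    while f(x + t * d) > f(x) + eta1 * t * slope0:
        t *= shrink
    return t
```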

Another way to affect the convergence properties of an optimization algorithm
is to use a trust region. The trust region methods avoid line searches by
bounding the length of the search direction, d. In the context of a Newton
type method, the second-order approximation

    f(xk) + ∇f(xk)T d + (1/2) dT ∇2f(xk) d

is trusted only in a neighborhood of xk, i.e. if ‖d‖2 ≤ ∆k, for some positive
∆k. The need for trust regions is apparent when the Hessian ∇2f(xk) is not
positive semidefinite. The idea is that when ∇2f(xk) is badly conditioned, ∆k
should be kept low, and thereby the algorithm turns into a steepest
descent-like method. Even if ∇f(xk) = 0, progress can be made if the Hessian
∇2f(xk) is not positive definite, i.e. the trust region algorithms move away
from stationary points if they are saddle points or local maxima.

Line search methods and trust region methods differ in the order in which
they choose the direction and the step length of the move to the next iterate.
Trust region methods first choose the maximum distance and then determine
the new direction. The choice of the trust region size is crucial and it is based
on the ratio between the actual and the predicted reduction of the objective
value. The robustness and strong convergence characteristics have made
trust regions popular, especially for non-convex optimization [1, 4, 11, 57, 60].
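To make the ratio test concrete, the sketch below performs one trust region
iteration on the quadratic model using the Cauchy (steepest descent) point;
the acceptance thresholds 0.25 and 0.75 are conventional illustrative choices,
not values taken from this thesis.

```python
import numpy as np

def trust_region_step(f, g, B, x, delta):
    """One trust-region iteration: g and B are the gradient and Hessian at x,
    f is the objective function, delta is the current radius."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return x, delta                                  # model has no slope
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(gnorm ** 3 / (delta * gBg), 1.0)
    d = -(tau * delta / gnorm) * g                       # Cauchy step, ||d|| <= delta
    predicted = -(g @ d + 0.5 * d @ B @ d)               # predicted model reduction
    rho = (f(x) - f(x + d)) / predicted                  # actual vs. predicted
    if rho < 0.25:
        delta *= 0.25                                    # poor agreement: shrink region
    elif rho > 0.75 and np.isclose(np.linalg.norm(d), delta):
        delta *= 2.0                                     # good agreement, step at bound
    return (x + d if rho > 0 else x), delta              # accept only if f decreased
```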

2.2 Linearly constrained optimization

The direction determination step of Algorithm 1 produces a feasible descent
direction. A direction dk is feasible if there is a scalar α > 0 such that
xk + t dk ∈ X for all nonnegative t ≤ α. A steepest descent direction or
a Newton direction does not guarantee that feasibility is maintained.

We briefly discuss methods that solve linearly constrained nonlinear
optimization problems, that is,

    f∗ = min_{x ∈ X} f(x),                               (LCP)

where f : Rn → R is continuously differentiable and the feasible set X =
{x : Ax ≤ b} is a nonempty polytope. The Frank-Wolfe algorithm is one of
the most popular methods for solving some instances of such problems.

2.2.1 The Frank–Wolfe method

The Frank–Wolfe, FW, method [28] was originally suggested for quadratic
programming problems, but in the original paper it was noted that the
method could be applied also to linearly constrained convex programs.

The FW method approximates the objective f of (LCP ) by a first-order


Taylor expansion (linearization) at the current iterate, xk , giving an affine
minorant fk to f , i.e.
fk (x) = f (xk ) + ∇f (xk )T (x − xk ).

Then, the FW method determines a feasible descent direction by minimizing
fk over X:

    fk∗ = min_{x ∈ X} fk(x).                             (FWSUB)

We denote by yk the solution of this linear program, which is called the FW
subproblem. The Frank–Wolfe direction is dk = yk − xk. Note that if f is
convex, fk∗ is a lower bound to f∗, a fact that may be used for terminating
the method.

The next step of the method is to perform a line search in the FW direction,
i.e. a one-dimensional minimization of f , along the line segment between
the current iterate xk and the point y k . The point where this minimum is
attained (at least approximately) is chosen as the next iterate, xk+1 . Note
that f (xk+1 ) is an upper bound to f ∗ .
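A compact sketch of the method for X = {x : Ax ≤ b} is given below, with
scipy's LP solver used for (FWSUB) and a bounded scalar minimization used for
the line search; the stopping test uses the linearization gap
∇f(xk)T(xk − yk), which bounds f(xk) − f∗ when f is convex, and the tolerance
is an illustrative choice.

```python
import numpy as np
from scipy.optimize import linprog, minimize_scalar

def frank_wolfe(f, grad_f, A, b, x0, tol=1e-6, max_iter=500):
    """Frank-Wolfe sketch for min f(x) s.t. A x <= b (x0 assumed feasible)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        # (FWSUB): minimize the linearization of f over X
        y = linprog(c=g, A_ub=A, b_ub=b, bounds=(None, None)).x
        d = y - x                                   # Frank-Wolfe direction
        gap = -g @ d                                # = grad f(x)^T (x - y)
        if gap <= tol:                              # bounds f(x) - f* for convex f
            break
        # line search on the segment between x and y
        t = minimize_scalar(lambda t: f(x + t * d),
                            bounds=(0.0, 1.0), method="bounded").x
        x = x + t * d
    return x
```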

The algorithm generally makes good progress towards an optimum during


the first few iterations, but convergence often slows down substantially when
close to an optimum. The reason for this is that the search directions of the
FW method, in late iterations, tend to become orthogonal to the gradient
of the objective function, leading to extreme zigzagging (e.g. [63, p. 102]).
For this reason the algorithm is perhaps best used to find an approximate
solution. It can be shown that the worst case convergence rate is sublinear
[4]. In order to improve the performance of the algorithm, there are many
suggestions for modifications of the direction finding [30, 49, 53] and the
line search steps [64, 72]. There are also other more complex extensions
of the FW method, such as simplicial decomposition, introduced by von
Hohenbalken [70].

2.2.2 Simplicial decomposition

The idea of simplicial decomposition is to build up an inner approximation


of the feasible set X, founded on Caratheodory’s theorem (e.g. [3]), which
states that any point in the convex hull of a set X ⊂ Rn can be expressed
as a convex combination of at most 1 + dim X points of the set X. Thus,
any feasible solution of (LCP ) can be represented as a convex combination
of the extreme points of the set X. The simplicial decomposition algorithm
alternates between a master problem, which minimizes the objective f over
the convex hull of a number of extreme points of X, and a subproblem that
generates a new extreme point of the feasible set X and, if f is convex on
X, also provides a lower bound on the optimal value.

Given the current iterate xk and the extreme points yi, i = 1, . . . , k + 1,
generated by the subproblem, the next iterate is obtained from the master
problem

    min   f( xk + Σ_{i=0}^{k+1} λi (yi − xk) )
    s.t.  Σ_{i=0}^{k+1} λi ≤ 1,                          (10)
          λi ≥ 0,  i = 0, . . . , k + 1,

where y0 = x0. This problem is typically of lower dimension than the problem
(LCP).
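As an illustration, the master problem (10) can be solved over the retained
points with a general-purpose solver, as sketched below; real implementations
use specialized master solvers and column dropping, which are not shown here.

```python
import numpy as np
from scipy.optimize import minimize

def sd_master(f, x_k, Y):
    """Solve master problem (10): min f(x_k + sum_i lam_i (y^i - x_k))
    over lam >= 0, sum_i lam_i <= 1, where the rows of Y are the retained
    points y^0 = x^0, y^1, ..., y^{k+1}."""
    P = Y - x_k                                       # rows: y^i - x_k
    obj = lambda lam: f(x_k + P.T @ lam)
    m = Y.shape[0]
    cons = [{"type": "ineq", "fun": lambda lam: 1.0 - lam.sum()}]
    res = minimize(obj, np.zeros(m), method="SLSQP",
                   bounds=[(0.0, None)] * m, constraints=cons)
    lam = res.x
    return x_k + P.T @ lam, lam                       # next iterate and weights
```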

The advantage of using an inner representation of X is that it is much
easier to deal with the linear constraints. The disadvantage is that the
number of extreme points is very large for a large-scale problem. The
algorithm may also need a large number of them in order to span an optimal
solution to (LCP). In [70] von Hohenbalken shows finite convergence of the
simplicial decomposition algorithm, in the number of master problems, even
if extreme points with zero weights are removed from one master to the next
[71]. This result allows for the use of column dropping, which is essential to
gain computational efficiency in large-scale applications.

When the algorithm throws away every point that was previously generated, we
are back to the Frank–Wolfe algorithm. The number of stored extreme points is
crucial for the convergence properties, since if it is too small the behavior
can be as bad as that of the Frank–Wolfe algorithm. We refer to [1] for
further information about column dropping and simplicial decomposition.

Hearn et al. [37] extend the simplicial decomposition concept to the
restricted simplicial decomposition algorithm [38, 69], in which the number of
stored extreme points is bounded by a parameter r. Convergence to an optimal
solution is obtained provided that r is greater than the dimension of the
optimal face of the feasible set. Another extension of the simplicial
decomposition strategy, known as disaggregate simplicial decomposition, is
made by Larsson and Patriksson [45], who take advantage of Cartesian product
structures. The simplicial decomposition strategy has been applied mainly
to certain classes of structured linearly constrained convex programs, where
it has been shown to be successful.

2.3 General constrained optimization

We here consider the constrained nonlinear optimization problem

    min_x { f(x) | g(x) ≤ 0 },                           (NLP)

where f : Rn → R and g : Rn → Rm are continuously differentiable
functions. There are plenty of methods that attempt to solve optimization
programs with general constraints (see e.g. [34, 60]). A frequently employed
solution principle is to alternate between the solution of an approximate
problem and a line search with respect to a merit function. The merit func-
tion measures the degree of non-optimality of any tentative solution. The
sequential linear programming (SLP) and the sequential quadratic program-
ming (SQP) approaches are methods that are based on this principle.

2.3.1 Sequential linear programming

The sequential linear programming methods have become popular because
of their simplicity and robustness for large-scale problems. They are based
on the application of first-order Taylor series expansions. The idea is to
linearize all nonlinear parts (objective and/or constraints) and, thereafter,
to solve the resulting linear programming problem. The solution to this
LP problem is used as a new iterate. The scheme is continued until some
stopping criterion is met.

The SLP approach originates from Griffith and Stewart [35]. Their method
is called the Method of Approximation Programming, and utilizes an LP
approximation of the type

    min   ∇f(xk)T (x − xk)                               (SLPSUB)
    s.t.  g(xk) + ∇g(xk)(x − xk) ≤ 0,
          ‖x − xk‖2 ≤ ∆k,
where ∆k is some positive scalar. The linearity of the subproblem makes
the choice of the step size crucial. It is necessary to impose trust regions
on the steps taken in order to ensure convergence and numerical efficiency
of an SLP algorithm. The trust regions must be neither too large nor too
small. If they are too small, the procedure will terminate prematurely or
move slowly towards an optimum and if they are too large infeasibility or
oscillation may occur. The SLP methods are most successful when curvature

effects are negligible. For problems which are highly nonlinear, SLP methods
may converge slowly and become unreliable. A variety of numerical methods
has been proposed [7, 11, 12, 21, 24, 57, 61, 78] to improve the convergence
properties of SLP algorithms.
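A sketch of one SLP subproblem is given below; note that the 2-norm trust
region of (SLPSUB) is replaced by a box (∞-norm) region so that the
subproblem remains a linear program, which is a common practical choice rather
than something prescribed by this thesis.

```python
import numpy as np
from scipy.optimize import linprog

def slp_subproblem(grad_f, g, jac_g, x_k, delta):
    """One SLP subproblem: linearize f and g at x_k and solve the LP in the
    step d = x - x_k, with a box trust region |d_i| <= delta."""
    c = grad_f(x_k)
    A_ub = jac_g(x_k)                 # rows: gradients of the constraints
    b_ub = -g(x_k)                    # g(x_k) + J d <= 0  <=>  J d <= -g(x_k)
    res = linprog(c=c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(-delta, delta)] * x_k.size)
    return x_k + res.x                # new trial iterate
```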

One of the milestones in the development of the SLP concept is the work
of Fletcher and Sainz de la Maza [24]. They describe an algorithm that
solves a linear program to identify an active set of constraints, followed
by the solution of an equality constrained quadratic problem (EQP). This
sequential linear programming - EQP (SLP-EQP) method is motivated by
the fact that solving quadratic subproblems with inequality constraints can
be expensive. The cost of solving one linear program followed by an equality
constrained quadratic problem would be much lower.

2.3.2 Sequential quadratic programming

The method of sequential quadratic programming, suggested by Wilson [74]
in 1963 for the special case of convex optimization, has been of great
interest for solving large-scale constrained optimization problems with
nonlinear objective and constraints. An SQP method obtains search directions
from a sequence of QP subproblems. Each QP subproblem minimizes a quadratic
approximation of the Lagrangian function subject to linear constraints. At
the primal-dual point (xk, uk) the SQP subproblem can be written as

    min   ∇f(xk)T (x − xk) + (1/2)(x − xk)T ∇2xx L(xk, uk)(x − xk)
    s.t.  g(xk) + ∇g(xk)(x − xk) ≤ 0,                    (SQPSUB)

where ∇2xx L(xk , uk ) denotes the Hessian of the Lagrangian. The SQP algo-
rithm in this form is a local algorithm. If the algorithm starts at a point in a
vicinity of a local minimum, the algorithm has a quadratic local convergence.
A line search or a trust region method is used to achieve global convergence
from a distant starting point. In the line search case the new iterate is
obtained by searching along the direction generated by solving (SQPSUB),
until a certain merit function is sufficiently decreased. A variety of merit
functions are described in e.g. [60, Chapter 15]. Another way to find the
next iterate is to use trust regions. SQP methods have proved to be efficient
in practice. They typically require fewer function evaluations than some of
the other methods. For an overview of SQP methods, see [5].
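For illustration, the sketch below solves (SQPSUB) in the step d = x − xk,
using scipy's SLSQP routine as a stand-in QP solver and assuming the exact
Hessian of the Lagrangian is available; practical SQP codes instead use
quasi-Newton approximations together with a merit function or a filter.

```python
import numpy as np
from scipy.optimize import minimize

def sqp_subproblem(grad_f, g, jac_g, hess_L, x_k, u_k):
    """Solve the QP (SQPSUB) in d = x - x_k at the primal-dual point (x_k, u_k)."""
    c, H = grad_f(x_k), hess_L(x_k, u_k)
    g0, J = g(x_k), jac_g(x_k)
    qp_obj = lambda d: c @ d + 0.5 * d @ H @ d
    # linearized constraints g(x_k) + J d <= 0, written as fun(d) >= 0 for SLSQP
    cons = [{"type": "ineq", "fun": lambda d: -(g0 + J @ d)}]
    res = minimize(qp_obj, np.zeros_like(x_k), method="SLSQP", constraints=cons)
    return res.x                       # search direction d^k
```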

One of the important recent developments in SLP and SQP methods is the
introduction of the filter concept by Fletcher and Leyffer [20]. The main
advantage of using the filter concept is to avoid using a merit function. The
filter allows a trial step to be accepted if it reduces either the objective
function or a constraint violation function. The filter is used in trust region
type algorithms as a criterion for accepting or rejecting a trial step. Global
convergence of an SLP-filter algorithm is shown in [12, 21] and the global
convergence properties of an SQP-filter algorithm are discussed in [19, 22,
68].

2.4 The Lagrangian dual problem

Suppose that (NLP) has a set of optimal solutions which is non-empty and
compact. Let u ≥ 0, u ∈ Rm, be a vector of Lagrangian multipliers associated
with the constraints g(x) ≤ 0, and consider the Lagrangian function

    L(x, u) = f(x) + uT g(x).

Under a suitable constraint qualification the problem (NLP) can be restated
as the saddle point problem (e.g. [3])

    max_{u ≥ 0} min_x L(x, u) = f(x) + uT g(x).          (SPP)

If a point (x∗, u∗) solves (SPP), then, according to the saddle point theorem
([4, p. 427]), x∗ is a local minimum of (NLP). Furthermore, if the problem
(NLP) is convex then x∗ is a globally optimal solution to the problem (NLP)
(see [3]).

The saddle point theorem gives sufficient conditions for optimality. By
introducing the Lagrangian function for the (NLP) problem with slack variables
in the constraints, gi(x) + si² = 0, i = 1, . . . , m, necessary conditions
for a local optimum of a general constrained optimization problem can be
established. A point (x∗, u∗, s∗) is a stationary point of (SPP) if it
satisfies ∇L(x∗, u∗, s∗) = 0, and the Hessian with respect to x and s is
positive semidefinite. These requirements can be written as

    ∇f(x∗) + ∇g(x∗)T u∗ = 0,                             (14a)
    u∗T g(x∗) = 0,                                       (14b)
    g(x∗) ≤ 0,                                           (14c)
    u∗ ≥ 0.                                              (14d)

The conditions (14a)-(14d) are known as the Karush-Kuhn-Tucker (KKT)
conditions and a point that satisfies them is known as a KKT point. The
condition (14a) means that there is no descent direction, with respect to x,
for L(x, u) from x∗. Additionally it is required that the complementarity
condition u∗T g(x∗) = 0 is fulfilled. The equation (14b) says that ui∗ can
be strictly positive only when the corresponding constraint gi(x∗) is active,
that is, when gi(x∗) = 0 holds. The KKT conditions are first-order necessary
conditions, and they may be satisfied by local maxima, local minima and other
vectors. The second-order condition

    dT ∇2xx L(x∗, u∗) d ≥ 0,  for all d ≠ 0 with ∇g(x∗) d = 0,

is used to guarantee that a given point x∗ is a local minimum.
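Numerically, the conditions (14a)-(14d) can be checked for a candidate
primal-dual pair by measuring their largest violation, as in the sketch below.

```python
import numpy as np

def kkt_residual(grad_f, g, jac_g, x, u):
    """Largest violation of the KKT conditions (14a)-(14d) at (x, u)."""
    stationarity = grad_f(x) + jac_g(x).T @ u           # (14a)
    gx = g(x)
    return max(np.linalg.norm(stationarity, np.inf),    # (14a) stationarity
               abs(u @ gx),                              # (14b) complementarity
               max(gx.max(), 0.0),                       # (14c) primal feasibility
               max((-u).max(), 0.0))                     # (14d) dual feasibility
```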

The methods that solve (N LP ) problems can be divided into methods that
work in primal, dual and primal-dual spaces. The primal algorithms work
with feasible solutions and improve the value of the objective function. Com-
putational difficulties may arise from the necessity to remain within the
feasible region, particularly for problems with nonlinear constraints. For
problems with linear constraints they enjoy fast convergence.

The dual methods attempt to solve the dual problem. In this case a direction
determination step should find an ascent direction for the dual objective
function, which is always concave even when the primal problem may be
non-convex. This means that a local optimum of (SP P ) is also a global one.
The main difficulty of the dual problem is that it may be non-differentiable
and is not explicitly available.

Primal-dual methods [27, 31, 32, 50] are methods that simultaneously work
in the primal and dual spaces. This principle is widespread in the field of
interior point methods. A nice book that covers the theoretical properties,
practical and computational aspects of primal-dual interior-point methods
is written by Stephen J. Wright [75].

2.5 Convergence

An important subject when considering methods in nonlinear optimization is


their local and global convergence properties. Local convergence properties
measure the ultimate speed of convergence, and can be used to determine
the relative advantage of one algorithm to another. If, for arbitrary starting
points, an algorithm generates a sequence of points converging to a solution,

then the algorithm is said to be globally convergent. Many algorithms for


solving nonlinear programming problems are not globally convergent, but it
is often possible to modify such algorithms so as to achieve global conver-
gence.

The subject of global convergence is treated by Zangwill [77]. We here think


of an algorithm as a mapping, that is, the algorithm is represented as a
point-to-set map A, that maps the iteration point xk to a set A(xk ) to
which xk+1 will belong, i.e. xk+1 ∈ A(xk ).

Definition  A point-to-set map A is closed at x if for all sequences
{xk} → x and {yk} → y with yk ∈ A(xk), we have y ∈ A(x).

The Convergence Theorem [77, p. 91] establishes global convergence of closed


algorithmic point-to-set maps.

Convergence theorem: Let A be an algorithm on X, and suppose that,
given x1, the sequence {xk} is generated satisfying xk+1 ∈ A(xk). Let a
solution set Γ ⊂ X be given and suppose that

i) all points xk are contained in a compact set S ⊂ X;

ii) there is a continuous function Z on X such that

    a) if x ∉ Γ, then Z(y) < Z(x) for all points y ∈ A(x),

    b) if x ∈ Γ, then Z(y) ≤ Z(x) for all points y ∈ A(x);

iii) the mapping A(x) is closed at points outside Γ.

Then the limit of any convergent subsequence of {xk} is a solution.

Requirement ii) amounts to the existence of a merit function, which
can be used to measure the progress of an algorithm.

3 Outline of the thesis and contribution

The focus in this thesis is on the development of feasible descent direction
algorithms. The thesis consists of five papers.

The first paper, "The Stiff is Moving - Conjugate Direction Frank–Wolfe
Methods with Applications to Traffic Assignment", treats the traffic
assignment problem [63]. In this problem, travelers between different
origin-destination pairs in a congested urban transportation network want to
travel along their shortest routes (in time). However, the travel times depend
on the congestion levels, which, in turn, depend on the route choices. The
problem is to find the equilibrium traffic flows, where each traveler indeed
travels along his shortest route. It is well known that this equilibrium
problem can be stated as a linearly constrained convex minimization problem of
the form (LCP), see e.g. [63, Ch. 2].

The conventional Frank–Wolfe, FW, method is frequently used for solving


structured linearly constrained optimization problems. We improve the per-
formance of the Frank–Wolfe method by choosing better search directions,
based on conjugate directions. In conjugate gradient methods, one obtains
search directions by conjugating the gradient direction with respect to the
previous search direction. The same trick can be applied to the FW direc-
tion.

In the conjugate direction FW method, CFW, we choose the search direction
d̃k as

    d̃k = dk + βk d̃k−1,

where dk is the FW direction found by solving the (FWSUB) problem
and βk is chosen to make d̃k conjugate to d̃k−1 with respect to the Hessian
∇2f(xk).
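The conjugacy requirement d̃kT ∇2f(xk) d̃k−1 = 0 determines βk, as in the
sketch below; this shows the bare conjugation step only and omits the
safeguards used in Paper I to keep d̃k a feasible descent direction.

```python
import numpy as np

def cfw_direction(d_fw, d_prev, hess_vec):
    """Conjugate the Frank-Wolfe direction d_fw against the previous search
    direction d_prev with respect to the Hessian (hess_vec(v) returns H v).
    Bare conjugation step only; feasibility and descent safeguards omitted."""
    Hd_prev = hess_vec(d_prev)
    denom = d_prev @ Hd_prev
    beta = 0.0 if abs(denom) < 1e-12 else -(d_fw @ Hd_prev) / denom
    return d_fw + beta * d_prev
```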

Global convergence of the CFW method using an inexact line search is


proved. Further refinement of the conjugate direction Frank–Wolfe method
is derived by applying conjugation with respect to the last two directions
instead of only the last one. The computations in the Bi-Conjugate Frank–
Wolfe Method, BFW, are slightly more complicated. This modification
outperforms CFW, at least for high iteration counts. The CFW and BFW
algorithms were first implemented in the Matlab [55] environment. The
promising results spurred us to implement the two algorithms, as well as
FW, in the programming language C, to be able to make more detailed
investigations on larger networks.

In a limited set of computational tests the new algorithms, applied to the


single-class traffic equilibrium problem, turned out to be quite efficient. Our
results indicate that CFW and BFW algorithms outperform, for accuracy
requirements suggested by Boyce et al. [8], the pure and “PARTANized”
Frank–Wolfe, disaggregate simplicial decomposition [45] and origin-based
algorithms [2].

We extend the conjugate Frank–Wolfe method to non-convex optimization


problems with linear constraints and apply this extension to the multi-class
traffic equilibrium problem under social marginal cost pricing (SMC). In the
second paper ”Multi-Class User Equilibria under Social Marginal Cost Pricing”
we study the model in which the cost of a link may differ between the
different classes of users in the same transportation network [15]. Under
SMC pricing, the users have to pay a toll for the delays they incur to other
users. We show that, depending on the formulation, the multi-class SMC
pricing equilibrium problem (with different time values) can be stated either
as an asymmetric or as a symmetric equilibrium problem. In the latter case,
the corresponding optimization problem is in general non-convex. For this
non-convex problem, we devise descent methods of Frank–Wolfe type. We
apply these methods to a synthetic case based on the Sioux Falls network.

The third paper, "A Conjugate Direction Frank–Wolfe Method for Non-convex
Problems", generalizes the conjugate Frank–Wolfe method, examines some of its
properties for non-convex problems, and shows through limited testing that it
seems to be more efficient than Frank–Wolfe, at least for high iteration
counts.

Further, we exploit the conjugate Frank–Wolfe algorithm for solving the


stochastic transportation problem, for which Frank–Wolfe type methods
have been claimed to be efficient [14, 49, 39]. The stochastic transporta-
tion problem, first described by Elmaghraby [18] in 1960, can be considered
as the problem of determining the shipping volumes from supply points to
demand points with uncertain demands, that yields the minimal expected
total cost. In the fourth paper ”A Comparison of Feasible Direction Meth-
ods for the Stochastic Transportation Problem” we compare several feasible
direction methods for solving this problem.

Besides the conjugate Frank–Wolfe algorithm, we also apply the diagonalized


Newton, DN, approach [46]. In this method the direction generation sub-
problem of the Frank–Wolfe method is replaced by a diagonalized Newton
subproblem, based on a second-order approximation of the objective func-
tion. The CFW and DN methods do not introduce any further parameters in
the solution algorithm, they have a better practical rate of convergence than
the Frank–Wolfe algorithm, and they take full advantage of the structure of
the problem.

Additionally, an algorithm of FW type but with multi-dimensional search


is described in this paper. In the previously discussed approaches for the

stochastic transportation problem the direction finding subproblem is mod-


ified in order to improve upon the FW algorithm. Numerical results for
the proposed methods, applied to two types of test problems presented in
Cooper and LeBlanc [14] and LeBlanc et al. [49], show a performance that
is superior to that of the Frank–Wolfe method, and to the heuristic variation
of the Frank–Wolfe algorithm used in LeBlanc et al. [49], whenever solutions
of moderate or high accuracy are sought.

In paper five, "A Sequential Linear Programming Algorithm with
Multi-dimensional Search — Derivation and Convergence", we utilize ideas from
simplicial decomposition (see Section 2.2.2), sequential linear programming
(see Section 2.3.1) and duality (see Section 2.4). This results in a novel SLP
algorithm for solving problems with large numbers of variables and
constraints. In particular, the line search step is replaced by a
multi-dimensional search. The algorithm is based on inner approximations of
both the primal and the dual spaces, and it yields both column and constraint
generation in the primal space, and its linear programming subproblem differs
from the one obtained in traditional SLP methods.

A linear approximation of (SPP) (see Section 2.4) at the current primal and
dual points gives a column generation problem which reduces and separates
into a primal and a dual column generation problem. These are used to find
better approximations of the inner primal and dual spaces. The line search
problem of a traditional SLP algorithm is replaced by a minimization problem
of the same type as the original one, but with typically fewer variables and
fewer constraints. Because of the smaller number of variables and constraints,
it should be computationally less demanding than the original problem.

The theoretical results presented in this paper show the convergence of the
new method to a point that satisfies the KKT conditions, and thus to a global
optimal solution for a convex problem. In the presented algorithm it is not
necessary to introduce rules to control the move limits ∆k , and we may
abandon the merit function as well, while still guaranteeing convergence.
In the paper, the suggested idea of using multi-dimensional search is also
outlined for the case of sequential quadratic programming algorithms.

We apply the new method to a selection of the Hock-Schittkowski nonlinear
test problems and report preliminary computational results in a Matlab
environment.

My contribution to the papers presented in this thesis includes a major
involvement in the development of the solution methods, in the writing process
and in the analysis of the results. I have also contributed to the
implementation and testing of the solution algorithms that are described in
the papers.

4 Chronology and publication status

The papers that have contributed to the contents of the thesis arose in the
following order.

”A Conjugate Direction Frank–Wolfe Method with Applications to the Traffic


Assignment Problem”, co-authored with Per Olov Lindberg.

Published in Operations Research Proceedings 2002, pp. 133-138, Springer,


2003. The paper is also presented in my licentiate thesis [16].

”Improved Frank–Wolfe Directions through Conjugation with Applications to


the Traffic Assignment Problem”, co-authored with Per Olov Lindberg.

Published as Technical Report LiTH-MAT-R-2003-6, Department of Math-


ematics, Linköping University. The paper is part of my licentiate thesis
[16].

”Multi-Class User Equilibria under Social Marginal Cost Pricing”, co-authored


with Leonid Engelson and Per Olov Lindberg.

Published in Operations Research Proceedings 2002, pp. 174-179, Springer,


2003. This paper is presented as paper II in the thesis and is also presented
in my licentiate thesis [16].

”A Conjugate Direction Frank–Wolfe Method for Non-convex Problems”, co-


authored with Per Olov Lindberg.

Published as Technical Report LiTH-MAT-R-2003-09, Department of Mathematics,
Linköping University. The paper is presented as paper III in this thesis and
is also included in my licentiate thesis [16].

”The Stiff is Moving - Conjugate Direction Frank–Wolfe Methods with Appli-


cations to Traffic Assignment”, co-authored with Per Olov Lindberg.

The paper is under review for publication in the journal Transportation


Science. This paper is presented as paper I in this thesis and is an extension
of the first two papers above.

”A Sequential Linear Programming Algorithm with Multi-dimensional Search


— Derivation and Convergence”, co-authored with Maud Göthe-Lundgren,
Torbjörn Larsson, Michael Patriksson and Clas Rydergren.

The paper is submitted for publication and is presented as paper V in this


thesis.

”A Comparison of Feasible Direction Methods for the Stochastic Transportation


Problem”, co-authored with Torbjörn Larsson, Michael Patriksson and Clas
Rydergren.

The paper is submitted for publication and is presented as paper IV in this


thesis.
Bibliography

[1] N. Andréasson, A. Evgrafov, and M. Patriksson. An Introduction to


Continuous Optimization: Foundations and fundamental algorithms.
Studentlitteratur, 2005.

[2] H. Bar-Gera. Origin-based algorithms for the traffic assignment prob-


lem. Transportation Sci., 36(4):398–417, 2002.

[3] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Program-


ming: Theory and Algorithms. John Wiley & Sons, New York, NY,
second edition, 1993.

[4] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont,


MA, second edition, 1999.

[5] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta


Numerica, pages 1–51, 1995.

[6] P. T. Boggs, J. W. Tolle, and A. J. Kearsley. A truncated SQP al-


gorithm for large scale nonlinear programming problems. In Advances
in optimization and numerical analysis (Oaxaca, 1992), volume 275 of
Math. Appl., pages 69–77. Kluwer Acad. Publ., Dordrecht, 1994.

[7] J. F. Bonnans, J. Ch. Gilbert, C. Lemaréchal, and C. Sagastizábal. Nu-


merical Optimization – Theoretical and Practical Aspects. Universitext.
Springer Verlag, Berlin, second edition, 2006.

[8] D. Boyce, B. Ralevic-Dekic, and H. Bar-Gera. Convergence of traf-


fic assignments: How much is enough? In 16th Annual International
EMME/2 Users’ Group Conference, Albuquerque, NM, 2002.

[9] M. Bruynooghe, A. Gibert, and M. Sakarovitch. Une méthode


d’affectation du trafic. In Proceedings of the 4th International Sympo-


sium on the Theory of Road Traffic Flow, pages 198–204. Bundesmin-


isterium für Verkehr, Bonn, Karlsruhe, 1969.

[10] Y. Censor and S. A. Zenios. Parallel optimization. Numerical Mathe-


matics and Scientific Computation. Oxford University Press, New York,
1997.

[11] T. Y. Chen. Calculation of the move limits for the sequential linear pro-
gramming method. Internat. J. Numer. Methods Engrg., 36(15):2661–
2679, 1993.

[12] C. M. Chin and R. Fletcher. On the global convergence of an SLP-filter


algorithm that takes EQP steps. Math. Programming, 96(1):161–177,
2003.

[13] A. R. Conn, N. I. M. Gould, and Ph. L. Toint. LANCELOT: a Fortran

package for large-scale nonlinear optimization (Release A). In Springer
Series in Computational Mathematics, volume 17, 1992.

[14] L. Cooper and L. J. LeBlanc. Stochastic transportation problems and


other network related convex problems. Naval. Res. Logist. Quart.,
24(2):327–337, 1977.

[15] S. Dafermos. Toll patterns for multiclass-user transportation networks.


Transportation Sci., 7:211–223, 1973.

[16] M. Daneva. Improved Frank-Wolfe directions with applications to the


traffic assignment problem. Linköping Studies in Science and Technol-
ogy. Theses No. 1023. Department of Mathematics, Linköping Univer-
sity, 2003.

[17] G. B. Dantzig. Linear programming and extensions. Princeton Univer-


sity Press, Princeton, N.J., 1963.

[18] S. E. Elmaghraby. Allocation under uncertainty when the demand has


continuous d.f. Management Sci., 6:270–294, 1960.

[19] R. Fletcher, N. I. M. Gould, S. Leyffer, Ph. L. Toint, and A. Wächter.


Global convergence of a trust-region SQP-filter algorithm for general
nonlinear programming. SIAM J. Optim., 13(3):635–659, 2002.

[20] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty


function. Technical Report 171, Department of Mathematics, University
of Dundee, Scotland, 1996.

[21] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of


an SLP-filter algorithm. Technical Report 183, Department of Mathe-
matics, University of Dundee, Scotland, 1998.

[22] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of


a filter-SQP algorithm. SIAM J. Optim., 13(1):44–59, 2002.

[23] R. Fletcher and C. M. Reeves. Function minimization by conjugate


gradients. Comput. J., 7:149–154, 1964.

[24] R. Fletcher and E. Sáinz de la Maza. Nonlinear programming and


nonsmooth optimization by successive linear programming. Math. Pro-
gramming, 43(3):235–256, 1989.

[25] C. A. Floudas. A global optimization approach for Lennard-Jones mi-


croclusters. Journal of Chemical Physics, 97:7667 – 7678, 1992.

[26] A. Forsgren and Ph. E. Gill. Interior methods for nonlinear optimiza-
tion. SIAM Rev., 44:525–597, 2002.

[27] A. Forsgren, Ph. E. Gill, and M. H. Wright. Primal-dual interior meth-


ods for nonconvex nonlinear programming. SIAM J. Optim., 8:1132 –
1152, 1998.

[28] M. Frank and Ph. Wolfe. An algorithm for quadratic programming.


Naval Res. Logist. Quart., 3:95–110, 1956.

[29] L. Fratta, M. Gerla, and L. Kleinrock. The flow deviation method:


An approach to store-and-forward communication network design. Net-
works, 3:97–133, 1973.

[30] M. Fukushima. A modified Frank-Wolfe algorithm for solving the traffic


assignment problem. Transportation Res. Part B, 18(2):169–177, 1984.

[31] E. M. Gertz and Ph. E. Gill. A primal-dual trust region algorithm for
nonlinear optimization. Math. Program., 100(1):49–94, 2004.

[32] Ph. E. Gill, W. Murray, D. B. Ponceleón, and M.A. Saunders. Primal-


dual methods for linear programming. Math. Programming, 70(3, Ser.
A):251–277, 1995.

[33] Ph. E. Gill, W. Murray, and M. A. Saunders. SNOPT: an SQP algo-


rithm for large-scale constrained optimization. SIAM Rev., 47(1):99–
131, 2005.

[34] N. Gould, D. Orban, and Ph. Toint. Numerical methods for large-scale
nonlinear optimization. Acta Numerica, pages 299–361, 2005.

[35] R. E. Griffith and R. A. Stewart. A nonlinear programming technique


for the optimization of continuous processing systems. Management
Sci., 7:379–392, 1960/1961.
[36] R. Haugen. In Modern Investment Theory, pages 92–130. 1997.

[37] D. W. Hearn, S. Lawphongpanich, and J. A. Ventura. Finiteness in re-


stricted simplicial decomposition. Oper. Res. Lett., 4(3):125–130, 1985.

[38] D. W. Hearn, S. Lawphongpanich, and J. A. Ventura. Restricted simpli-


cial decomposition: computation and extensions. Math. Programming
Study, 31:99–118, 1987.

[39] K. Holmberg. Efficient decomposition and linearization methods for the


stochastic transportation problem. Comput. Optim. Appl., 4(4):293–
316, 1995.
[40] N. Karmarkar. A new polynomial-time algorithm for linear program-
ming. Combinatorica, 4(4):373–395, 1984.
[41] V. G. Kartavenko, K. A. Gridnev, and W. Greiner. Nonlinear effects in
nuclear cluster problem. Int. J. Mod. Phys., E7:287 – 299, 1998.

[42] D. M. Kreps. Course in Microeconomic Theory. Princeton University


Press, New Jersey, 1990.

[43] L. Lamberti and C. Pappalettere. Move limits definition in structural


optimization with sequential linear programming. I. Optimization algo-
rithm. Comput. & Structures, 81(4):197–213, 2003.

[44] L. Lamberti and C. Pappalettere. Move limits definition in structural


optimization with sequential linear programming. II. Numerical exam-
ples. Comput. & Structures, 81(4):215–238, 2003.

[45] T. Larsson and M. Patriksson. Simplicial decomposition with disaggre-


gated representation for the traffic assignment problem. Transportation
Sci., 26:4–17, 1992.
[46] T. Larsson, M. Patriksson, and C. Rydergren. An efficient solution
method for the stochastic transportation problem. Linköping Studies in
Science and Technology. Theses No. 702. Department of Mathematics,
Linköping University, 1998.

[47] L. S. Lasdon and A. D. Waren. Large scale nonlinear programming.


Computers and Chemical Engineering, 7(5):595–613, 1983.
[48] L. J. Leblanc. Mathematical programming algorithms for large scale
network equilibrium and network design problems. PhD thesis, IE/MS
Dept, Northwestern University, Evanston IL, 1973.
[49] L. J. LeBlanc, R. V. Helgason, and D. E. Boyce. Improved efficiency of
the Frank-Wolfe algorithm for convex network programs. Transporta-
tion Sci., 19(4):445–462, 1985.
[50] X. Liu and J. Sun. A robust primal-dual interior-point algorithm for
nonlinear programs. SIAM J. Optim., 14(4):1163–1186, 2004.
[51] D. G. Luenberger. Linear and Nonlinear Programming. Addison-
Wesley, Reading, MA, 1984.
[52] J. T. Lundgren. Optimization approaches to travel demand mod-
elling. PhD thesis, Department of Mathematics, Linköpings university,
Linköping, Sweden, 1989.
[53] M. Lupi. Convergence of the Frank-Wolfe algorithm in transportation
network. Civil Engineering Systems, 3:7–15, 1986.
[54] R. Markland and J. Sweigart. Quantitative Methods: Applications to
Managerial Decision Making. John Wiley & Sons, New York, 1987.
[55] The MathWorks, Inc., Natick, MA. Matlab User’s Guide, 1996.
[56] A. Migdalas, G. Toraldo, and V. Kumar. Nonlinear optimization and
parallel computing. Parallel Comput., 29(4):375–391, 2003.
[57] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM
J. Sci. Statist. Comput., 4(3):553–572, 1983.
[58] R. M. Nauss and R. E. Markland. Optimization of bank transit check
clearing operations. Management Sci., 31(9):1072–1083, 1985.
[59] A. Neumaier. Molecular modeling of proteins and mathematical pre-
diction of protein structure. SIAM Rev., 39(3):407–460, 1997.
[60] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag,
New York, 1999. Springer series in operations research.
[61] J. Nocedal and Y. Yuan. Combining trust region and line search
techniques. Advances in Nonlinear Programming, pages 153–175, 1998.

[62] M. Patriksson. A unified framework of descent algorithms for nonlin-


ear programs and variational inequalities. PhD thesis, Department of
Mathematics, Linköpings university, Linköping, Sweden, 1993.

[63] M. Patriksson. The Traffic Assignment Problem - Models and Methods.


VSP, Utrecht, 1994.

[64] W. B. Powell and Y. Sheffi. The convergence of equilibrium algorithms


with predetermined step sizes. Transportation Sci., 16(1):45–55, 1982.

[65] C. Rydergren. Decision support for strategic traffic management : an


optimization-based methodology. PhD thesis, Department of Mathemat-
ics, Linköpings university, Linköping, Sweden, 2001.

[66] M. Rönnqvist. Applications of Lagrangean dual schemes to structural


optimization. PhD thesis, Department of Mathematics, Linköpings uni-
versity, Linköping, Sweden, 1993.

[67] K. Schittkowski and C. Zillober. Nonlinear programming: algorithms,


software, and applications. From small to very large scale optimization.
In System modeling and optimization, volume 166 of IFIP Int. Fed. Inf.
Process., pages 73–107. Kluwer Acad. Publ., Boston, MA, 2005.

[68] S. Ulbrich. On the superlinear local convergence of a filter-SQP method.


Math. Programming, 100(1, Ser. B):217–245, 2004.

[69] J. A. Ventura and D. W. Hearn. Restricted simplicial decomposition for


convex constrained problems. Math. Programming, 59(1):71–85, 1993.

[70] B. von Hohenbalken. A finite algorithm to maximize certain pseudo-


concave functions on polytopes. Math. Programming, 9:189–206, 1975.

[71] B. von Hohenbalken. Simplicial decomposition in nonlinear program-


ming algorithms. Math. Programming, 13:49–68, 1977.

[72] A. Weintraub, C. Ortiz, and J. González. Accelerating convergence of


the Frank-Wolfe algorithm. Transportation Res. Part B, 19(2):113–122,
1985.

[73] Y. Wen, M. A. Moreno-Armendariz, and E. Gomez-Ramirez. Modelling


of gasoline blending via discrete-time neural networks. In Proceedings.
2004 IEEE International Joint Conference on Neural Networks, vol-
ume 2, pages 1291 – 1296. 2004.

[74] R. B. Wilson. A simplicial method for concave programming. PhD

thesis, Harvard University, Cambridge, Mass., 1963.

[75] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM, 1997.

[76] G. L. Xue, R. S. Maier, and J. B. Rosen. Minimizing the Lennard-Jones


potential function on a massively parallel computer. In ICS ’92: Pro-
ceedings of the 6th international conference on Supercomputing, pages
409–416, New York, NY, USA, 1992. ACM Press.

[77] W. I. Zangwill. Nonlinear programming: a unified approach. Prentice-


Hall Inc., Englewood Cliffs, N.J., 1969.

[78] J. Z. Zhang, N-H. Kim, and L. Lasdon. An improved successive linear


programming algorithm. Management Sci., 31(10):1312–1331, 1985.

[79] Ch. Zillober, K. Schittkowski, and K. Moritzen. Very large scale opti-
mization by sequential convex programming. Optim. Methods Softw.,
19(1):103–120, 2004.
