SIAM's Classics in Applied Mathematics series consists of books that were previously
allowed to go out of print. These books are republished by SIAM as a professional
service because they continue to be important resources for mathematical scientists.
Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington
Editorial Board
Richard A. Brualdi, University of Wisconsin-Madison
Herbert B. Keller, California Institute of Technology
Andrzej Z. Manitius, George Mason University
Ingram Olkin, Stanford University
Stanley Richardson, University of Edinburgh
Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht
Classics in Applied Mathematics
C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the
Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie
Algebras with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential
Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential
Equations
Nonlinear Programming

Olvi L. Mangasarian
University of Wisconsin
Madison, Wisconsin

Society for Industrial and Applied Mathematics
Philadelphia

SIAM is a registered trademark.
Contents

Preface to the Classics Edition
Preface
To the Reader

Chapter 1. The Nonlinear Programming Problem, Preliminary Concepts, and Notation
 1. The nonlinear programming problem
 2. Sets and symbols
 3. Vectors
 4. Matrices
 5. Mappings and functions
 6. Notation

Chapter 2. Linear Inequalities and Theorems of the Alternative
 1. Introduction
 2. The optimality criteria of linear programming: An application of Farkas' theorem
 3. Existence theorems for linear systems
 4. Theorems of the alternative

Chapter 3. Convex Sets in Rn
Chapter 4. Convex and Concave Functions
Chapter 5. Saddlepoint Optimality Criteria of Nonlinear Programming Without Differentiability
Chapter 6. Differentiable Convex and Concave Functions
Chapter 7. Optimality Criteria in Nonlinear Programming With Differentiability
Chapter 8. Duality in Nonlinear Programming
Chapter 9. Generalizations of Convex Functions: Quasiconvex, Strictly Quasiconvex, and Pseudoconvex Functions

Vectors
Matrices
Appendix C
Bibliography
Name Index
Subject Index
Chapter One
The Nonlinear Programming Problem, Preliminary Concepts, and Notation
1. The nonlinear programming problem†

The nonlinear programming problem that will concern us has three fundamental ingredients: a finite number of real variables, a finite number of constraints which the variables must satisfy, and a function of the variables which must be minimized (or maximized). Mathematically speaking we can state the problem as follows: Find specific values (x̄1, . . . , x̄n), if they exist, of the variables (x1, . . . , xn) that will satisfy the inequality constraints
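The displayed constraints and objective 1.1.1 to 1.1.3 are lost in this copy; judging from the more general restatement 1.6.9 later in this chapter (inequality constraints g, equality constraints h, objective θ), they presumably read

% Presumed reconstruction of 1.1.1--1.1.3 (displays lost in this copy)
\[
g_i(x_1,\dots,x_n) \leqq 0, \quad i = 1,\dots,m, \tag{1.1.1}
\]
\[
h_j(x_1,\dots,x_n) = 0, \quad j = 1,\dots,k, \tag{1.1.2}
\]
and minimize
\[
\theta(x_1,\dots,x_n). \tag{1.1.3}
\]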
of Δ. The empty set is the set which contains no elements and is denoted by ∅. We denote a set sometimes by {x, y, z} if the set is formed by the elements x, y, z. Sometimes a set is characterized by a property that its elements must have, in which case we write

{x | x satisfies property P}

For example the set of all nonnegative real numbers can be written as {x | x ∈ R, x ≧ 0}.

The set of elements belonging to either of two sets Γ or Δ is called the union of the sets Γ and Δ and is denoted by Γ ∪ Δ. We have then

Γ ∪ Δ = {x | x ∈ Γ or x ∈ Δ}

The set of elements belonging to at least one of the sets of the (finite or infinite) family of sets (Γi)i∈I is called the union of the family and is denoted by ∪i∈I Γi. Then

∪i∈I Γi = {x | x ∈ Γi for some i ∈ I}

The set of elements belonging to all the sets of the (finite or infinite) family of sets (Γi)i∈I is called the intersection of the family and is denoted by ∩i∈I Γi. Then

∩i∈I Γi = {x | x ∈ Γi for all i ∈ I}

Two sets Γ and Δ are disjoint if they do not intersect, that is, if Γ ∩ Δ = ∅.
3. Vectors

n-vector

otherwise they are linearly dependent. (Here and elsewhere 0 denotes the real number zero or a vector each element of which is zero.)

Linear combination
The vector x ∈ Rn is a linear combination of x1, . . . , xm ∈ Rn if

x = λ1x1 + · · · + λmxm

for some λ1, . . . , λm ∈ R.
Cauchy-Schwarz inequality
Let x, y ∈ Rn. Then

|xy| ≦ ||x|| ||y||

where |xy| is the absolute value of the real number xy.

PROOF
Problem (triangle inequality)
Let x, y ∈ Rn. Show that ||x + y|| ≦ ||x|| + ||y||. (Hint: Use the Cauchy-Schwarz inequality to establish the triangle inequality.)
Angle between two vectors
Let x and y be two nonzero vectors in Rn. The angle ψ between x and y is defined by the formula

cos ψ = xy / (||x|| ||y||),   0 ≦ ψ ≦ π

This definition of angle agrees for n = 2, 3 with the one in analytic geometry. The nonzero vectors x and y are orthogonal if xy = 0 (ψ = π/2); form an acute angle with each other if xy ≧ 0 (0 ≦ ψ ≦ π/2), a strict acute angle if xy > 0 (0 ≦ ψ < π/2), an obtuse angle if xy ≦ 0 (π/2 ≦ ψ ≦ π), and a strict obtuse angle if xy < 0 (π/2 < ψ ≦ π).
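A small numerical illustration of the classification (added here; not part of the original text):

% Illustrative example only
\[
x=(1,0),\ y=(1,1):\ xy=1>0,\ \cos\psi=\tfrac{1}{\sqrt{2}},\ \psi=\tfrac{\pi}{4}\ \text{(strict acute)};\qquad
x=(1,0),\ y=(0,1):\ xy=0,\ \psi=\tfrac{\pi}{2}\ \text{(orthogonal)}.
\]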
4. Matrices

Although our main concern is nonlinear problems, linear systems of the following type will be encountered very frequently:

A11x1 + · · · + A1nxn = b1
 · · · · · · · · ·
Am1x1 + · · · + Amnxn = bm          (1)

where Aij and bi, i = 1, . . . , m, j = 1, . . . , n, are given real numbers. We can abbreviate the above system by using the concepts of the previous section. If we let Ai denote an n-vector whose n components are Aij, j = 1, . . . , n, and if we let x ∈ Rn, then the above system is equivalent to

Aix = bi,   i = 1, . . . , m          (2)

In 2 we interpret Aix as the scalar product 1.3.6 of Ai and x. If we further let Ax denote an m-vector whose m components are Aix, i = 1, . . . , m, and b an m-vector whose m components are bi, then the equivalent systems 1 and 2 can be further simplified to

Ax = b
The ith row of the matrix A will be denoted by Ai and will be an n-vector. Hence

The jth column of the matrix A will be denoted by A.j and will be an m-vector. Hence

Obviously the ith row of A is equal to the ith column of A′, and the jth column of A is equal to the jth row of A′. Hence

The last equalities of 8 and 9 are to be taken as the definitions of A′.i and A′j respectively. Since Aij is the real number in the ith row of the jth column of A, then if we define A′ji as the real number in the jth row of the ith column of A′, we have
Here A.j and b are vectors in Rm and the xj are real numbers. The representation 2 can be interpreted as a problem in Rn whereas 11 can be interpreted as a problem in Rm. In 2 we are required to find an x ∈ Rn that makes the appropriate scalar products bi (or angles, see 1.3.11) with the n-vectors Ai, i = 1, . . . , m. In 11, we are given the n + 1 vectors in Rm, A.j, j = 1, . . . , n, and b, and we are required to find n weights x1, . . . , xn such that b is a linear combination of the vectors A.j. These two dual representations of the same linear system will be used in interpreting some of the important theorems of the alternative of the next chapter.
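The displayed representation 11 is lost here; from the description above, the two dual representations of the same system presumably are

% Row form (a problem in R^n) and column form (a problem in R^m)
\[
A_i x = b_i,\ i = 1,\dots,m
\qquad\Longleftrightarrow\qquad
\sum_{j=1}^{n} x_j A_{\cdot j} = b .
\]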
The m × n matrix A of 4 can generate another linear system yA, defined as follows

then we define the following submatrices of A (which are matrices with rows and columns extracted respectively from the rows and columns of A)

jth column of

ith row of

It follows then that

and

EXAMPLE

Then
Function
A function f is a single-valued mapping from a set X into a set Y. That is, for each x ∈ X, the image set f(x) consists of a single element of Y. The domain of f is X, and we say that f is defined on X. The range of f is f(X) = ∪x∈X f(x). (For convenience we will write the image of a function not as a set but as the unique element of that set.)
Numerical function
A numerical function θ is a function from a set X into R. In other words a numerical function is a correspondence which associates a real number with each element x of X.

EXAMPLES If X = R, then θ is the familiar real single-valued function of a real variable, such as θ(x) = sin x. If X is the set of positive integers, then θ assigns a real number to each positive integer, for example θ(x) = 1/x!. If X = Rn, then θ is the real single-valued function of n variables.
Vector function
An m-dimensional vector function f is a function from a set X into Rm. In other words a vector function is a correspondence which associates a vector from Rm with each element x of X. The m components of the vector f(x) are denoted by f1(x), . . . , fm(x). Each fi is a numerical function on X. A vector function f has a certain property (for example continuity) whenever each of its components fi has that property.

EXAMPLE If X = Rn, then f is an m-dimensional vector function of Rn. The m components fi, i = 1, . . . , m, of f are numerical functions on Rn.
Linear vector functions on Rn
An m-dimensional vector function f defined on Rn is said to be linear if

f(x) = Ax + b

where A is some fixed m × n matrix and b is some fixed vector in Rm. It follows that if f is a linear function on Rn then

(Conversely, the last two relations could be used to define a linear vector function on Rn, from which it could be shown that f(x) = Ax + b [Berge 63, p. 159].)

If m = 1 in the above, then we have a numerical linear function θ on Rn and

θ(x) = cx + γ

where c is a fixed vector in Rn and γ is a fixed real number.
6.
Notation
Vectors and real numbers
Sets
Sets will always be denoted by capital Greek or Latin letters such as Γ, Δ, Ω, R, I, X, Y. Capital letters with subscripts, such as Γ1, Γ2, and capital letters with elevated symbols, such as Γ*, X̄, will also denote sets. (See also Sec. 1.2.)
Ordering relations
The following convention for equalities and inequalities will be used. If x, y ∈ Rn, then

If x ≧ 0, x is said to be nonnegative, if x ≥ 0 then x is said to be semipositive, and if x > 0 then x is said to be positive. The relations =, ≧, ≥, > defined above are called ordering relations (in Rn).
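The displayed definitions are lost in this copy; given the nonnegative/semipositive/positive terminology above, the conventions presumably are

% Presumed ordering conventions
\[
x \geqq y \iff x_i \geqq y_i\ (i=1,\dots,n);\qquad
x \geq y \iff x \geqq y,\ x \neq y;\qquad
x > y \iff x_i > y_i\ (i=1,\dots,n).
\]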
The nonlinear programming problem
By using the notation introduced above, the nonlinear programming problem 1.1.1 to 1.1.3 can be rewritten in a slightly more general form as follows. Let X ⊂ Rn, and let g, h, and θ be respectively an m-dimensional vector function, a k-dimensional vector function, and a numerical function, all defined on X. Then the problem becomes this: Find an x̄, if such exists, such that

The set X is called the feasible region, x̄ the minimum solution, and θ(x̄) the minimum. All points x in the feasible region X are referred to as feasible points or simply as feasible.

Another way of writing the same problem, which is quite common in the literature, is the following:

subject to
We favor the more precise and brief designation 9 of the problem instead
of 10 to 12. Notice that if we let X = Rn in the above problem, then
we obtain the nonlinear programming problem 1.1.1 to 1.1.3.
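The displays 9 to 12 are lost here; from the surrounding description (feasible region, minimum solution x̄, minimum θ(x̄)), problem 9 and the alternative statement 10 to 12 presumably read

% Presumed reconstruction of 9 and of the "min ... subject to ..." form 10-12
\[
\theta(\bar{x}) = \min\{\theta(x) \mid x \in X,\ g(x) \leqq 0,\ h(x) = 0\},
\]
\[
\min_{x}\ \theta(x) \quad\text{subject to}\quad g(x) \leqq 0,\ h(x) = 0,\ x \in X .
\]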
If X = Rn and θ, g, and h are all linear functions on Rn, then problem 9 becomes a linear programming problem: Find an x̄, if such exists, such that

where b, c, and d are given fixed vectors in Rn, Rm, and Rk respectively, and A and B are given fixed m × n and k × n matrices respectively. There exists a vast literature on the subject of linear programming [Dantzig 63, Gass 64, Hadley 62, Simmonard 66]. It should be remarked that problem 13 is equivalent to finding an x̄ such that

When B and d are absent from this formulation, 14 becomes the standard dual form of the linear programming problem [Simmonard 66, p. 95].
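Display 13 is lost; given the stated dimensions of b, c, d, A, and B, the linear programming problem presumably takes a form like

% Presumed reconstruction of problem 13 (display lost in this copy)
\[
b\bar{x} = \min\{\, bx \mid Ax \geqq c,\ Bx = d \,\}.
\]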
Chapter Two
Linear Inequalities and Theorems of the Alternative
1. Introduction
It was mentioned in Chap. 1 that
the presence of inequality constraints in a minimization problem
constitutes the distinguishing feature between the minimization
problems of the classical calculus
and those of nonlinear programming. Although our main interest
lies in nonlinear problems, and
hence in nonlinear inequalities,
linearization (that is, approximating nonlinear constraints by linear
ones) will be frequently resorted to.
This will lead us to linear inequalities. It is the purpose of this chapter to establish some fundamental
theorems for linear inequalities
which will be used throughout this
work. (Needless to say, these
fundamental theorems also play a
crucial role in linear programming.
See for example [Gale 60, chap. 2].)
The type of theorem that
will concern us in this chapter will
involve two systems of linear
inequalities and/or equalities, say
systems I and II. A typical
theorem of the alternative asserts
that either system I has a solution,
or that system II has a solution,
but never both. The most famous
theorem of this type is perhaps
Farkas' theorem
[Farkas 02,
Tucker 56, Gale 60].
Farkas' theorem [Farkas 02, Tucker 56, Gale 60]
For each fixed p × n matrix A and each fixed vector b in Rn, either

I′ has a solution

or

II′ has a solution

but never both, where A′.j denotes the jth column of A′ and Aj the jth row of A (see Sec. 1.4). System II′ requires that the vector b be a nonnegative linear combination of the vectors A1 to Ap. System I′ requires that we find a vector x ∈ Rn that makes an obtuse angle (≧ π/2) with the vectors A1 to Ap and a strictly acute angle (< π/2) with b (see 1.3.11). Figure 2.1.1
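The displayed systems I′ and II′ are lost here; consistent with the geometric description above and with Table 2.4.1 (row 6) later in the chapter, they presumably read

% Presumed reconstruction of Farkas' systems
\[
\text{I}':\ Ax \leqq 0,\ \ bx > 0 \ \text{ has a solution } x \in R^n;
\qquad
\text{II}':\ A'y = b,\ \ y \geqq 0 \ \text{ has a solution } y \in R^p .
\]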
Then
† These are standard terms of linear programming [Dantzig 63], which differ from the terminology of nonlinear programming (see Chap. 8). The complementarity condition usually refers to an equivalent form of 6: ū(Ax̄ − c) = 0.
it follows that

for some real number δ > 0, where e is an m-vector of ones. Then for each x ∈ Rn we can find a real number α > 0 such that

for each x ∈ Rn

and the second inequality of 12 holds for some α > 0 because d < 0. But relations 10 to 12 imply that x̄ + αx ∈ X̄ and b(x̄ + αx) < bx̄, which contradicts the assumption that x̄ is a solution of 1. Hence 9 has no solution x ∈ Rn, and by Farkas' theorem 2.1.1 the system

† To see this, take for a fixed i ∈ M = {1, 2, . . . , n}, xi = bi and xj = 0 for j ≠ i; then bx ≦ 0 implies that (bi)² ≦ 0. Hence bi = 0. Repeating this process for each i ∈ M, we get b = 0.
where b, a, c, and d are given vectors in Rn, Rl, Rm, and Rk respectively, and A, D, B, and E are given m × n, m × l, k × n, and k × l matrices respectively. Show that a necessary and sufficient condition for (x̄, ȳ) to solve the above problem is that (x̄, ȳ) and some ū ∈ Rm and v̄ ∈ Rk satisfy the following conditions:

(Hint: Replace the equality constraint by

Bx + Ey ≧ d

and

−Bx − Ey ≧ −d

and use Theorem 2.)
Tucker's lemma
For any given p × n matrix A, the systems

I    Ax ≧ 0

and

II   A′y = 0,  y ≧ 0

we take

Then

where
So
or
Bx = 0
Let
then
and
where the last inequality follows from 7, the equality before from 6, the equality before from 10, the equality before from 11, and the equality before from 4. Finally from 4, 11, 10, 6, and 7 we have

A1w + u1 = (B1 + λ1Ap+1)w + u1 = B1w + u1
Define
and for i = 1, . . . , p,
14
or
Ax + y > 0
A geometric interpretation of the above theorem can be given in the space of the rows Ai ∈ Rn, i = 1, . . . , p. The theorem states that one of the following alternatives must occur for any given p × n matrix A:

(a) There exists a vector x which makes a strict acute angle (< π/2) with all the rows Ai, Fig. 2.3.1a, or
(b) There exists a vector x which makes an acute angle (≦ π/2) with all the rows Ai, and the origin 0 can be expressed as a nonnegative linear combination of the rows Ai with positive weights assigned to the rows Ai that are orthogonal to x, Fig. 2.3.1b, or
(c) The origin can be expressed as a positive linear combination of the rows Ai, Fig. 2.3.1c.
By letting the matrix A of the previous theorem have a special structure, a second existence theorem can be easily established.

Second existence theorem [Tucker 56]
Let A and B be given p1 × n and p2 × n matrices, with A nonvacuous. Then the systems

and
and

Bx + z1 > 0
−Bx + z2 > 0

Define now y2 = z1 − z2. We have then that x̄, y1, y2 satisfy

Ax̄ ≧ 0
Bx̄ = 0
A′y1 + B′y2 = 0
y1 ≧ 0
Ax̄ + y1 > 0
Corollary
Let A, B, C, and D be given p1 × n, p2 × n, p3 × n, and p4 × n matrices, with A, B, or C nonvacuous. Then the systems

and

II

possess solutions satisfying

Ax̄ + y1 > 0
Bx̄ + y2 > 0

and

Cx̄ + y3 > 0
I ⇔ ∼II

or equivalently

∼I ⇔ II

TYPICAL PROOF

II ⇒ ∼I

and

∼I ⇒ II

The proof that II ⇒ ∼I is usually quite elementary, but the proof that ∼I ⇒ II utilizes the existence theorems of the previous section.
In the theorems to follow, certain obvious consistency conditions
will not be stated explicitly for the sake of brevity. For example, it will
be understood that certain matrices must have the same number of rows,
that the dimensionality of certain vectors must be the same as the number of columns in certain matrices, etc.
We begin now by establishing a fairly general theorem of the
alternative due to Slater [Slater 51].
Slater's theorem of the alternative [Slater 51]
Let A, B, C, and D be given matrices, with A and B being nonvacuous.
Then either
I

Ax > 0
Bx ≥ 0
Cx ≧ 0
Dx = 0

has a solution x
or
f Occasionally we shall also refer to the systems of inequalities and equalities themselves
as systems I and II.
has a solution
i=*n.
(I => ID
=>II.
We remark that in the above proof, the requirement that both A and B be nonvacuous was used essentially in establishing the fact that ∼I ⇒ II. Corollary 18, which was used to prove that ∼I ⇒ II, can handle systems in which merely A or B is nonvacuous. By slightly modifying the above proof, the cases B vacuous and A vacuous lead respectively to Motzkin's theorem of the alternative (or transposition theorem, as Motzkin called it) [Motzkin 36] and Tucker's theorem of the alternative [Tucker 56].
Motzkin's theorem of the alternative [Motzkin 36]
Let A, C, and D be given matrices, with A being nonvacuous. Then
either

Ax > 0
Cx ≧ 0
Dx = 0

has a solution x

or

has a solution
(T=>H)
I =* (Ax ^ 0, Cx ^ 0, Dx = 0> => {Ax ^ 0
(by Corollary 2.3.18)
=*!!. |
Then

Cx ≧ 0
Dx = 0

has a solution x

or

has a solution

or

has a solution

but never both.
PROOF (I => !I)
7/3, 2/4 satisfying
i=>n.
(n=>i)
by Corollary 18
=>!
We remark that if either A or B is vacuous, then we revert to
Tucker's theorem 3 or Motzkin's theorem 2.
We remark further that in all of the above theorems of the alternative the systems I are all homogeneous. Hence, by defining z = x, the
system I of, say, Slater's theorem 1 can be replaced by
I'
1.4
or
II
Ax ^ 0
or
II
but not both. Since η ∈ R, η ≥ 0 means η > 0. Dividing through by η and letting y = y3/η, we have that II′ is equivalent to
Stiemke's theorem [Stiemke 15]
For each given matrix B, either
I
or
II
or
has a solution
but never both.
PROOF
has a solution
By Motzkin's theorem 2, then, either I′ holds or
Now we have either 171 > 0 or 171 = 0 (and hence fi > 0). B
defining y y^lt\\ in the first case and y = y3 in the second, we have that
II' is equivalent to
Ax = c has a solution x G Rn
or
II
has a solution
but not both. Since y\ G R, y\> 0. By defining y = y*/y\, II' is
equivalent to II. |
Gale's theorem for linear inequalities (≦) [Gale 60]
For a given p × n matrix A and given vector c ∈ Rp, either

I

Ax ≦ c has a solution x ∈ Rn

or

II

has a solution
Table 2.4.1†

2   I: Ax > 0, Cx ≧ 0, Dx = 0   (A nonvacuous)    (Motzkin)
3   I: Bx ≥ 0, Cx ≧ 0, Dx = 0   (B nonvacuous)    (Tucker)
5   I: Ax > 0                    (Gordan)          II: A′y = 0, y ≥ 0
6   I: bx > 0, Ax ≦ 0            (Farkas)          II: A′y = b, y ≧ 0
7   I: Bx ≥ 0                    (Stiemke)         II: B′y = 0, y > 0
8   I: bx > 0, Ax ≦ c            (Nonhomogeneous Farkas)
                                                   II: A′y = b, cy ≦ 0, y ≧ 0  or  A′y = 0, cy < 0, y ≧ 0
9   I: Ax = c                    (Gale)            II: A′y = 0, cy = 1
10  I: Ax ≦ c                    (Gale)            II: A′y = 0, cy = −1, y ≧ 0
11  I: Ax < c                                      II: A′y = 0, cy = −1, y ≧ 0  or  A′y = 0, cy ≦ 0, y > 0

† No "or" appearing in the above table and in Problems 2.4.12 to 2.4.17 is an exclusive "or."
has a solution
but not both. By defining y = y2/y1, II follows from II′.
or
has a solution
but never both.
PROOF
but not both. If for the case when y1 > 0, y2 ≧ 0, we set y = y2/y1, and for the case when y1 ≧ 0, y2 > 0, we set y = y2, then II is equivalent to II′.

In Table 2.4.1 above we give a convenient summary of all the above theorems of the alternative.
Problems
By using any of the above theorems 1 to 11, establish the validity of the following theorems of the alternative (12 to 17): Either I holds, or II holds, but never both, where I and II are given below.
has a solution
has a solution
has a solution
has a solution
has a solution
has a solution
has a solution
has a solution
has a solution
has a solution
has a solution
has a solution
Mnemonic hint
In all the theorems of the alternative 1 to 17 above, which involve homogeneous inequalities and/or homogeneous equalities, the following correspondence between the ordering relations >, ≥, ≧, = occurs:

Orderings appearing in I        Orderings appearing in II
has a solution
pseudoconcave
and use Farkas' theorem. e is a vector of ones in the above.)
Chapter Three
Convex Sets in Rn
{x | x = x1 + λ(x2 − x1), λ ∈ R}

and consider the case when x ∈ R2, it becomes obvious that the vector equation x = x1 + λ(x2 − x1) is the parametric equation of elementary analytic geometry of the line through x1 and x2, Fig. 3.1.1.
Line segments
Let x1, x2 ∈ Rn. We define the following line segments joining x1 and x2:

(i) Closed line segment [x1,x2] = {x | x = (1 − λ)x1 + λx2, 0 ≦ λ ≦ 1}
(ii) Open line segment (x1,x2) = {x | x = (1 − λ)x1 + λx2, 0 < λ < 1}
(iii) Closed-open line segment [x1,x2) = {x | x = (1 − λ)x1 + λx2, 0 ≦ λ < 1}
(iv) Open-closed line segment (x1,x2] = {x | x = (1 − λ)x1 + λx2, 0 < λ ≦ 1}

Obviously [x1,x2] is the portion of the straight line through x1 and x2 which lies between and includes the points x1 and x2, Fig. 3.1.1. (x1,x2) does not include x1 or x2, [x1,x2) does not include x2, and (x1,x2] does not include x1.
Convex set
A set Γ ⊂ Rn is a convex set if the closed line segment† joining every two points of Γ is in Γ. Equivalently we have that a set Γ ⊂ Rn is convex if

x1, x2 ∈ Γ  and  0 ≦ λ ≦ 1   ⇒   (1 − λ)x1 + λx2 ∈ Γ

Figure 3.1.2 depicts some convex sets in R2, and Fig. 3.1.3 some nonconvex sets in R2. It follows from 3 that Rn itself is convex, that the empty set is convex, and that all sets consisting each of one point are convex.

The subsets of Rn defined below in 4, 5, and 6 are all convex sets in Rn. This can be easily established by a direct verification of the definition 3 of a convex set.

† It is obvious that the definition of a convex set would be unchanged if any of the other line segments defined in 2 were used here instead of the closed line segment.
Halfspace
Let c ∈ Rn, c ≠ 0, and α ∈ R. Then the set {x | x ∈ Rn, cx < α} is an open halfspace in Rn, and the set {x | x ∈ Rn, cx ≦ α} is a closed halfspace in Rn. (Both halfspaces are convex sets.)

Plane
Let c ∈ Rn, c ≠ 0, and α ∈ R. Then the set {x | x ∈ Rn, cx = α} is called a plane in Rn. (Each plane in Rn is a convex set.)

Subspace
A set Γ ⊂ Rn is a subspace if

x1, x2 ∈ Γ  and  λ1, λ2 ∈ R   ⇒   λ1x1 + λ2x2 ∈ Γ

Each subspace of Rn contains the origin and is a convex set. The subspaces of R3 consist of ∅, R3, the origin, and all straight lines and planes passing through the origin.
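A direct verification of definition 3 for the closed halfspace (an illustration added here; not part of the original text):

% The closed halfspace {x | cx <= alpha} is convex
\[
cx^1 \leqq \alpha,\ cx^2 \leqq \alpha,\ 0 \leqq \lambda \leqq 1
\;\Rightarrow\;
c\bigl((1-\lambda)x^1 + \lambda x^2\bigr) = (1-\lambda)cx^1 + \lambda cx^2 \leqq (1-\lambda)\alpha + \lambda\alpha = \alpha .
\]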
Problem
(i) Show that each open or closed ball

Bδ(x̄) = {x | x ∈ Rn, ||x − x̄|| < δ}      B̄δ(x̄) = {x | x ∈ Rn, ||x − x̄|| ≦ δ}

is a convex set.
^ 0
+ Pm+1 = 1
Let

Define

qi = pi − αri    for i = 1, . . . , m

where α is some positive number chosen such that qi ≧ 0 for all i, and at least one qi, say qk, is equal to 0. In particular we choose α such that
Then
and
If x\x* G A, then
Hence for 0 ≦ λ ≦ 1
and
A ~ r.
Theorem
The sum Γ + Δ of two convex sets Γ and Δ in Rn is a convex set.

PROOF Let z1, z2 ∈ Γ + Δ; then z1 = x1 + y1 and z2 = x2 + y2, where x1, x2 ∈ Γ and y1, y2 ∈ Δ. For 0 ≦ λ ≦ 1

(1 − λ)z1 + λz2 = [(1 − λ)x1 + λx2] + [(1 − λ)y1 + λy2] ∈ Γ + Δ

Hence Γ + Δ is convex.

Theorem
The product ρΓ of a convex set Γ in Rn and the real number ρ is a convex set.
For

Corollary
If Γ and Δ are two convex sets in Rn, then Γ − Δ is a convex set.

Separating plane
The plane {x | x ∈ Rn, cx = α}, c ≠ 0, is said to separate (strictly separate) two nonempty sets Γ and Δ in Rn if

cx ≧ α for each x ∈ Γ and cx ≦ α for each x ∈ Δ   (cx > α for each x ∈ Γ and cx < α for each x ∈ Δ)

If such a plane exists, the sets Γ and Δ are said to be separable (strictly separable).
Figure 3.2.1 gives a simple illustration in R2 of two sets which are separable, but which are neither disjoint nor convex. It should be remarked that in general separability does not imply that the sets are disjoint (Fig. 3.2.1), nor is it true in general that two disjoint sets are separable (Fig. 3.2.2). However, if the sets are nonempty, convex, and
disjoint, then they are separable, and in fact this is a separation theorem we intend to prove.

Lemma
Let Ω be a nonempty convex set in Rn, not containing the origin 0. Then there exists a plane {x | x ∈ Rn, cx = 0}, c ≠ 0, separating Ω and 0, that is,

PROOF
Let x1, . . . , xm be any finite set of points in Ω. It follows from the convexity of Ω, Theorem 3.1.13, and from the fact that 0 ∉ Ω, that
or equivalently

and hence

The sets (Ax)x∈Ω are closed sets relative to the compact set {y | y ∈ Rn, yy = 1} [see B.1.8 and B.3.2(i)], hence by the finite intersection theorem B.3.2(iii) we have that ∩x∈Ω Ax ≠ ∅. Let c be any point in this intersection. Then cc = 1 and cx ≧ 0 for all x ∈ Ω. Hence {x | x ∈ Rn, cx = 0} is the required separating plane. ∎
It should be remarked that in the above lemma we did not impose any conditions on Ω other than convexity. The following example shows that the above lemma cannot be strengthened to x ∈ Ω ⇒ cx > 0 without some extra assumptions. The set

is convex and does not contain the origin, but there exists no plane {x | x ∈ R2, cx = 0} such that x ∈ Ω ⇒ cx > 0 (Fig. 3.2.3).

If on the other hand we do assume that Ω is closed (or even if we

Fig. 3.2.3
assume less, namely that the origin is not a point of closure of Ω), then we can establish a stronger result, that is, there exists a plane which strictly separates the origin from Ω (see Corollary 4 and Lemma 5 below). However, before doing this, we need to establish the following fundamental separation theorem.

Separation theorem
Let Γ and Δ be two nonempty disjoint convex sets in Rn. Then there exists a plane {x | x ∈ Rn, cx = α}, c ≠ 0, which separates them, that is,

PROOF
The set

is convex by Corollary 3.1.22, and it does not contain the origin 0 because Γ ∩ Δ = ∅. By Lemma 2 above there exists a plane {x | x ∈ Rn, cx = 0}, c ≠ 0, such that

or

Hence

Define

Then
PROOF (⇐) Assume that there exist c ≠ 0, α > 0 such that cx > α for all x ∈ Ω. If 0 ∈ Ω̄, then (see B.1.3 and B.1.6) there exists an x ∈ Ω such that ||x|| < α/2||c||, and hence

Since Bδ(0) is an open ball, it must contain the nonzero vector δc for some positive δ. Hence γ ≧ δcc > 0. Let α = ½δcc > 0. Then
Lemma
Let Ω be a nonempty closed convex set in Rn. If Ω does not contain the origin, then there exists a plane {x | x ∈ Rn, cx = α}, c ≠ 0, α > 0, strictly separating Ω and 0, and conversely. In other words

Hence

Define

Then
The above separation theorems will be used to derive some fundamental theorems for convex functions in the next chapter, which in turn will be used in obtaining the fundamental Kuhn-Tucker saddlepoint optimality criteria of convex nonlinear programming in Chap. 5 and also the minimum principle necessary optimality condition of Chap. 11.

We remark here that a theorem of the alternative, the Gordan theorem 2.4.5, was fundamental in deriving the above separation theorems. We can reverse the process and use the above separation theorems to derive theorems of the alternative. Thus to derive Gordan's theorem 2.4.5, namely that either A′y = 0, y ≥ 0 has a solution y ∈ Rm or Ax > 0
Chapter Four
Convex and Concave Functions
Concave function
A numerical function θ defined on a set Γ ⊂ Rn is said to be concave at x̄ ∈ Γ (with respect to Γ) if
6(x) = pf(x)
p 0
PROOF
Then
Problem
Let θ be a numerical function defined on a convex set Γ ⊂ Rn. Show that θ is respectively convex, concave, strictly convex, or strictly concave on Γ if and only if for each x1, x2 ∈ Γ, the numerical function ψ defined on the line segment [0,1] by

ψ(λ) = θ((1 − λ)x1 + λx2)

is respectively convex, concave, strictly convex, or strictly concave on [0,1].
Theorem
For a numerical function θ defined on a convex set Γ ⊂ Rn to be convex on Γ it is necessary and sufficient that its epigraph

be a convex set in Rn+1.

PROOF
(Sufficiency) Assume that Gθ is convex. Let x1, x2 ∈ Γ; then [x1, θ(x1)] ∈ Gθ and [x2, θ(x2)] ∈ Gθ. By the convexity of Gθ we have that

or

Hence

and Gθ is a convex set in Rn+1.
Corollary
For a numerical function θ defined on a convex set Γ ⊂ Rn to be concave on Γ it is necessary and sufficient that its hypograph

be a convex set in Rn+1.

Figure 4.1.3a depicts a convex function on Γ and its convex epigraph Gθ. Figure 4.1.3b depicts a concave function on Γ and its convex hypograph Hθ.
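The displayed definitions of the epigraph and hypograph are lost in this copy; the standard definitions, presumably the ones intended here, are

% Presumed epigraph and hypograph of theta on Gamma
\[
G_\theta = \{(x,\zeta) \mid x \in \Gamma,\ \zeta \in R,\ \zeta \geqq \theta(x)\},
\qquad
H_\theta = \{(x,\zeta) \mid x \in \Gamma,\ \zeta \in R,\ \zeta \leqq \theta(x)\}.
\]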
Theorem
Let θ be a numerical function defined on a convex set Γ ⊂ Rn. A necessary but not sufficient condition for θ to be convex on Γ is that the set

Λα = {x | x ∈ Γ, θ(x) ≦ α}

be convex for each real number α.

PROOF

Problem
Let θ be a numerical function defined on the convex set Γ ⊂ Rn. Show that a necessary and sufficient condition for θ to be convex on Γ is that for each integer m ≧ 1

Fig. 4.1.4 The convex sets Λα and Ωα of 10 and 11 associated with a function θ.
Theorem
If (θi)i∈I is a family (finite or infinite) of numerical functions which are convex and bounded from above on a convex set Γ ⊂ Rn, then the numerical function

θ(x) = sup_{i∈I} θi(x)

is a convex function on Γ.

PROOF

is also a convex set in Rn+1 by Theorem 3.1.9. But this convex intersection is the epigraph of θ. Hence θ is a convex function on Γ by Theorem 8. ∎
Corollary
If (θi)i∈I is a family (finite or infinite) of numerical functions which are concave and bounded from below on a convex set Γ ⊂ Rn, then the numerical function

is a concave function on Γ.

We end this section by remarking that a function θ which is convex on a convex set Γ ⊂ Rn is not necessarily a continuous function. For example on the halfline Γ = {x | x ∈ R, x ≧ 1}, the numerical function
Theorem
Let Γ be an open convex set in Rn. If θ is a convex numerical function on Γ, then θ is continuous on Γ.

PROOF [Fleming 65]† Let x̄ ∈ Γ, and let α be the distance (see 1.3.9) from x̄ to the closest point in Rn not in Γ (α = +∞ if Γ = Rn). Let C be an n-cube with center x̄ and side length 2δ, that is

By letting (n)^{1/2}δ < α, we have that C ⊂ Γ. Let V be the set of vertices of C. Let

β = max_{x∈V} θ(x)

Fig. 4.1.5

Since θ is convex on Γ

Thus for any given ε > 0 it follows that |θ(x) − θ(x̄)| < ε for all x satisfying [β − θ(x̄)] ||x − x̄|| < εδ, and hence θ(x) is continuous at x̄. ∎

Since the interior of each set Γ ⊂ Rn is open, it follows that if θ is a convex function on a convex set Γ ⊂ Rn, it is continuous on its interior.
2.
and
Also, A is convex,
and
or
Now, if
which is a contradiction to the fact that pf(x) + qh(x) ^ epe for all
x G r. Hence
and
We give now a generalization of Gordan's theorem of the alternative 2.4.5 to convex functions over an arbitrary convex set in Rn.

Generalized Gordan theorem [Fan-Glicksberg-Hoffman 57]
Let f be an m-dimensional convex vector function on the convex set Γ ⊂ Rn. Then either

I

or

II
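The displays of systems I and II are lost here; the Fan-Glicksberg-Hoffman alternative presumably reads (a reconstruction, not a verbatim quotation):

% Presumed statement of the generalized Gordan theorem
\[
\text{I}:\ f(x) < 0 \ \text{has a solution } x \in \Gamma;
\qquad
\text{II}:\ p\,f(x) \geqq 0 \ \text{for all } x \in \Gamma \ \text{for some } p \geq 0,\ p \in R^m;
\]
but never both.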
then for some finite subfamily (f_{i1}, . . . , f_{im}) of (fi)i∈M and some finite subfamily (h_{j1}, . . . , h_{jk}) of (hi)i∈K there exist p ∈ Rm and q ∈ Rk such that

If K is empty, that is if all equalities hi(x) = 0 are deleted, then the last inequality above (≧ 0) becomes a strict inequality (> 0).
PROOF
Chapter Five
Saddlepoint Optimality Criteria of Nonlinear Programming Without Differentiability
requirements. However to establish the necessity of the above saddlepoint condition, we need not only convexity but also some sort of a regularity condition, a constraint qualification. This confirms earlier statements made to the effect that necessary optimality conditions are more
complex and harder to establish.
We shall develop the optimality criteria of this chapter without
any differentiability assumptions on the functions involved. Subsequent
chapters, Chaps. 7 and 11, will establish optimality criteria that involve
differentiable functions.
The set X is called the feasible region or the constraint set, x the
minimum solution or solution, and 6(x) the minimum. All points x in
the feasible region X are called feasible points.
If X̄ is a convex set, and if θ is convex on X̄, the minimization problem MP is often called a convex programming problem or convex program.

(We observe that the above minimization problem is a special case of the general minimization problem 1.6.9, where the additional k-dimensional vector equality constraint h(x) = 0 was also present. The reason for this is that in the absence of differentiability there are no significant optimality criteria for problems with nonlinear equality constraints. Some results for linear equality constraints will be obtained however. See 5.3.2, 5.4.2, and 5.4.8.)
The local minimization problem (LMP)
Find an x̄ in X̄, if it exists, such that for some open ball Bδ(x̄) around x̄ with radius δ > 0
Remark
If (x̄, r̄0, r̄) is a solution of FJSP and r̄0 > 0, then (x̄, r̄/r̄0) is a solution of KTSP. Conversely, if (x̄, ū) is a solution of KTSP, then (x̄, 1, ū) is a solution of FJSP.
Remark
The numerical functions φ(x, r0, r) and ψ(x, u) defined above are often called Lagrangian functions or simply Lagrangians, and the m-dimensional vectors r̄ and ū Lagrange multipliers or dual variables. These multipliers play a role in linear and nonlinear programming which is very similar to the role played by the Lagrange multipliers of the classical calculus, where a function of several variables is to be minimized subject to equality constraints (see for example [Fleming 65]). Here, because we have inequality constraints, the Lagrange multipliers turn out to be nonnegative. When we consider equality constraints in 5.3.2, 5.4.2, and 5.4.8, the multipliers associated with these equalities will not be required to be nonnegative.
Remark
The right inequality of both saddlepoint problems, FJSP 3 and KTSP 4,

and

can be interpreted as a minimum principle, akin to Pontryagin's maximum principle† [Pontryagin et al. 62]. Pontryagin's principle in its original form is a necessary optimality condition for the optimal control
original form is a necessary optimality condition for the optimal control
of systems described by ordinary differential equations. As such, it is a
necessary optimality condition for a programming problem, not in Rn,
but in some other space. More recently [Halkin 66, Canon et al. 66,
Mangasarian-Fromovitz 67] a minimum principle has also been established for optimal control problems described by ordinary difference
equations. This is a programming problem in Rn, which unfortunately
is not convex in general, and hence the results of this chapter do not apply.
However the optimality conditions of Chaps. 7 and 11, which are based
mainly on linearization and not on convexity, do apply to optimal control
problems described by nonlinear difference equations.
2.
We establish now some basic results concerning the set of solutions of the
minimization problem and relate the solutions of the minimization and
local minimization problems to each other.
Theorem
Let X̄ be a convex set, and let θ be a convex function on X̄. The set of solutions of MP 5.1.1 is convex.

PROOF

That is,

Hence (1 − λ)x̄1 + λx̄2 is also a solution of MP, and the set of solutions is convex.
t Pontryagin gets a maximum principle instead of a minimum principle because his
Lagrangian is the negative of the Lagrangian of nonlinear programming.
Uniqueness theorem
Let X̄ be convex and x̄ be a solution of MP 5.1.1. If θ is strictly convex at x̄, then x̄ is the unique solution of MP.

PROOF Let x̂ ≠ x̄ be another solution of MP, that is, x̂ ∈ X̄ and θ(x̂) = θ(x̄). Since X̄ is convex, (1 − λ)x̄ + λx̂ ∈ X̄ whenever 0 < λ < 1, and by the strict convexity of θ at x̄

This contradicts the assumption that θ(x̄) is a minimum, and hence x̂ cannot be another solution.
Theorem
Let X̄ be convex, and let θ be a nonconstant concave function on X̄. Then no interior point of X̄ is a solution of MP 5.1.1, or equivalently any solution x̄ of MP, if it exists, must be a boundary point of X̄.

PROOF If MP 5.1.1 has no solution the theorem is trivially true. Let x̄ be a solution of MP. Since θ is not constant on X̄, there exists a point x ∈ X̄ such that θ(x) > θ(x̄). If x̄ is an interior point of X̄, there exists a point y ∈ X̄ such that for some λ, 0 ≦ λ < 1

See Fig. 5.2.1.

Hence

Fig. 5.2.1
PROOF If x̄ solves MP, then x̄ solves LMP for any δ > 0. To prove the converse now, assume that x̄ solves LMP for some δ > 0, and let X̄ be convex and θ be convex at x̄. Let y be any point in X̄ distinct from x̄. Since X̄ is convex, (1 − λ)x̄ + λy ∈ X̄ for 0 < λ ≦ 1. By choosing λ small enough, that is, 0 < λ < δ/||y − x̄|| and λ ≦ 1, we have that

Hence
and all x in X̄

It follows then that gj(x̄) ≦ 0. Repeating this for all j, we get that g(x̄) ≦ 0, and hence x̄ is a feasible point, that is, x̄ ∈ X̄.

Now since ū ≧ 0 and g(x̄) ≦ 0, we have that ūg(x̄) ≦ 0. But again from the first inequality of the saddlepoint problem we have, by setting u = 0, that ūg(x̄) ≧ 0. Hence ūg(x̄) = 0.

Let x be any point in X̄; then from the second inequality of the saddlepoint problem we get
Necessary criteria
Sufficient criteria
No convexity needed
Because x solves MP
some r̄0 ∈ R, r̄ ∈ Rm, (r̄0, r̄) ≥ 0, solve FJSP 5.1.3 and r̄g(x̄) = 0. If r̄0 > 0, then by Remark 5.1.5 we are done. If r̄0 = 0, then r̄ ≥ 0, and from the second inequality of FJSP 5.1.3

which contradicts Karlin's constraint qualification 4. Hence r̄0 > 0.

We summarize in Fig. 5.4.1 the relationships between the solutions of the various problems of this chapter.

We end this section by deriving a Kuhn-Tucker saddlepoint necessary optimality criterion in the presence of linear equality constraints. In order to do this, we have to let the set X of MP 5.1.1 be the entire space Rn.
Kuhn-Tucker saddlepoint necessary optimality theorem in the presence of linear equality constraints [Uzawa 58]
Let θ, g be respectively a numerical function and an m-dimensional vector function which are both convex on Rn. Let h be a k-dimensional linear vector function on Rn, that is, h(x) = Bx − d, where B is a k × n matrix and d is a k-vector. Let x̄ be a solution of the minimization problem
(ii)
(Generalized Karlin 4)
such that
(iii)
PROOF
0 ≦ r̄g(x) + s̄(Bx − d)    for all x ∈ Rn
Chapter Six
Differentiable Convex and Concave Functions
or
Since
PROOF
The con-
Hence
and by Theorem 2 above, θ is convex on Γ.

If f is an n-dimensional function on Γ ⊂ Rn, and [f(x2) − f(x1)]
reader. The proof of Theorem 1 is not only different from the proof of
Theorem 6.1.1 but also makes use of Theorem 6.1.1. We give this proof
below.
PROOF OF THEOREM 1 Let θ be strictly convex at x̄. Then
or
PROOF
Let y ∈ Rn.

Hence

Taking the limit as λ approaches zero, and recalling that lim_{λ→0} α(x̄, λy) = 0, we get that

The concave case is established in a similar way.
Theorem
Let θ be a numerical twice-differentiable function on an open convex set Γ ⊂ Rn. θ is convex on Γ if and only if ∇²θ(x) is positive semidefinite on Γ, that is, for each x ∈ Γ

y∇²θ(x)y ≧ 0   for all y ∈ Rn

θ is concave on Γ if and only if ∇²θ(x) is negative semidefinite on Γ, that is, for each x ∈ Γ

y∇²θ(x)y ≦ 0   for all y ∈ Rn
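A small worked check of the criterion (an illustration added here; not part of the original): for θ(x) = x1² + x2² on R²,

% Hessian test on a simple quadratic
\[
\nabla^2\theta(x) = \begin{pmatrix} 2 & 0\\ 0 & 2\end{pmatrix},
\qquad
y\,\nabla^2\theta(x)\,y = 2(y_1^2 + y_2^2) \geqq 0 \ \text{for all } y \in R^2 ,
\]
so θ is convex on R².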
PROOF
(Necessity) Since θ is convex (concave) at each x ∈ Γ, this part of the proof follows from Theorem 1 above.
(Sufficiency) By Taylor's theorem D.2.2 we have that for any x1, x2 ∈ Γ

for some δ, 0 < δ < 1. But the right-hand side of the above equality is nonnegative (nonpositive), because ∇²θ(x) is positive (negative) semidefinite on Γ, and x1 + δ(x2 − x1) ∈ Γ. Hence the left-hand side is nonnegative (nonpositive), and by Theorem 6.1.2 θ is convex (concave) on Γ.
Theorem
Let θ be a numerical twice-differentiable function on an open convex set Γ ⊂ Rn. A nonnecessary but sufficient condition that θ be strictly convex on Γ is that ∇²θ(x) be positive definite on Γ; that is, for each x ∈ Γ

y∇²θ(x)y > 0   for all nonzero y ∈ Rn

A nonnecessary but sufficient condition that θ be strictly concave on Γ is that ∇²θ(x) be negative definite on Γ; that is, for each x ∈ Γ

y∇²θ(x)y < 0   for all nonzero y ∈ Rn

PROOF The nonnecessity follows from Theorem 1 above. The sufficiency proof is essentially identical to the sufficiency proof of Theorem 6.3.8.
Chapter Seven
Optimality Criteria in Nonlinear Programming With Differentiability
(It is implicit in the above statement that θ and g are differentiable at x̄.)

The Kuhn-Tucker stationary-point problem (KTP)
Find x̄ ∈ X, ū ∈ Rm, if they exist, such that
2.
Hence

Since g(x̄) ≦ 0, x̄ is in X̄, and hence

It should be remarked here that because of the convexity requirements on g, nonlinear equality constraints of the type h(x) = 0 cannot be handled by the above theorem by replacing them by two inequalities h(x) ≦ 0 and −h(x) ≦ 0. However, linear equality constraints can be handled by this procedure.
Problem
Let x̄ ∈ X, let X be open, let θ and g be differentiable and convex at x̄, let B be a given k × n matrix, and let d be a given k-dimensional vector. Show that if (x̄, ū, v̄), x̄ ∈ X, ū ∈ Rm, v̄ ∈ Rk, is a solution of the following Kuhn-Tucker problem

then
Let

I = {i | gi(x̄) = 0}
J = {i | gi(x̄) < 0}

ūigi(x̄) = 0    for i = 1, . . . , m

and hence

ūi = 0    for i ∈ J

has a solution

Consequently

has no solution

for if it did have a solution x ∈ X̄, then x ≠ x̄, and

0 > θ(x) − θ(x̄) ≧ ∇θ(x̄)(x − x̄)

† Sometimes the constraints gI(x) ≦ 0 are said to be active constraints at x̄, and gJ(x) ≦ 0 are said to be inactive constraints at x̄.
and let

Let

Hence

Let x̄ be a solution of LMP 7.1.2 with δ = δ̄. We shall show that if z satisfies ∇θ(x̄)z < 0, ∇gW(x̄)z < 0, and ∇gV(x̄)z ≦ 0, then a contradiction ensues. Let z satisfy these inequalities. Then, since X is open, there exists a δ̂ > 0 such that

Since θ and g are differentiable at x̄ (see D.1.3), we have that for 0 < δ < 1

where

(i) If δ is small enough (say 0 < δ < δ0), then

and hence

then
(ii)

and hence

that

(iii)

Hence, for 0 < δ < δ̃, we have that x̄ + δz ∈ Bδ̄(x̄) ∩ X̄ and θ(x̄ + δz) < θ(x̄), which contradicts the assumption that x̄ is a solution of LMP 7.1.2 with δ = δ̄. Hence there exists no z in Rn satisfying ∇θ(x̄)z < 0, ∇gW(x̄)z < 0, and ∇gV(x̄)z ≦ 0.
We are now ready to derive a series of necessary optimality criteria based on the above linearization lemma. We begin by a generalization of a result of Abadie which, for the case of a finite number of constraints, includes the classical result of Fritz John.

Fritz John stationary-point necessary optimality theorem [John 48, Abadie 67]
Let x̄ be a solution of LMP 7.1.2 or of MP 7.1.1, let X be open, and let θ and g be differentiable at x̄. Then there exists an r̄0 ∈ R and an r̄ ∈ Rm such that (x̄, r̄0, r̄) solves FJP 7.1.3, and
where

PROOF
Let

and

Hence by Motzkin's theorem 2.4.2 there exist r̄0, r̄W, r̄V such that

and
where
where
and
(i)
(ii)
(iii)
(ii)
Define

Let y be any vector in Rn satisfying

Define

for some

Obviously conditions (a) and (c) of the Kuhn-Tucker constraint qualification 3 are satisfied. We will now show that condition (b) is also satisfied. Since X is open, and since gI is concave and differentiable at x̄, we have that

and

for 0 ≦ τ ≦ 1 and 0 ≦ λ < λ̄. Hence for 0 ≦ λ < λ̄ and 0 ≦ τ ≦ 1, we have that

and for i ∈ J, we have that

where the last inequality holds because gJ(x̄) < 0 and lim_{λ→0} αi(x̄, λτy) = 0. Hence gJ[e(τ)] < 0 for 0 ≦ τ ≦ 1. Since gI[e(τ)] ≦ 0 for 0 ≦ τ ≦ 1, we have that
Since g

where

Hence by taking z = x − x̄, we have that ∇gV(x̄)z > 0, and the Arrow-Hurwicz-Uzawa constraint qualification 4 is satisfied at x̄.
The results of the above lemma are summarized in Fig. 7.3.2.
We are now ready to derive the fundamental necessary optimality
criterion of nonlinear programming, the Kuhn-Tucker necessary optimality criterion. We shall establish the result under all the constraint qualifications introduced. In view of Lemma 6 above, we need only establish
the result under the Kuhn-Tucker constraint qualification 3 and the
Arrow-Hurwicz-Uzawa constraint qualification 4. (A somewhat involved proof [Abadie 67] shows that a special case of the Arrow-Hurwicz-Uzawa constraint qualification, where gV are the linear active constraints and gW are the nonlinear active constraints, implies the Kuhn-Tucker constraint qualification. We take here a different and somewhat simpler approach and show that either the Arrow-Hurwicz-Uzawa or the Kuhn-Tucker constraint qualification is adequate for establishing the necessary optimality conditions we are after.)
Kuhn-Tucker stationary-point necessary optimality theorem
[Kuhn-Tucker 51]
Let X be an open subset of Rn, let θ and g be defined on X, let x̄ solve LMP 7.1.2 or MP 7.1.1, let θ and g be differentiable at x̄, and let g satisfy

(i) the Kuhn-Tucker constraint qualification 3 at x̄, or
(ii) the Arrow-Hurwicz-Uzawa constraint qualification 4 at x̄, or
(iii) the reverse convex constraint qualification 5 at x̄, or
(iv) Slater's constraint qualification 5.4.3 on X̄, or
(v) Karlin's constraint qualification 5.4.4 on X̄, or
(vi) the strict constraint qualification 5.4.5 on X̄.

Then there exists a ū ∈ Rm such that (x̄, ū) solves KTP 7.1.4.

PROOF In view of Lemma 6 above we need only establish the theorem under assumptions (i) or (ii).
(i) Let x̄ solve LMP 7.1.2 with δ = δ̄. Let

We have to consider two cases: the case when I is empty and the case when I is nonempty.

(I = ∅) Let y be any vector in Rn such that yy = 1. Then

Since gi(x̄) < 0 and lim_{δ→0} αi(x̄, δy) = 0, it follows that for small enough δ, say 0 < δ < δ̂ < δ̄, gi(x̄ + δy) < 0 and x̄ + δy ∈ X̄. But since x̄ solves LMP 7.1.2, we have that

Since lim_{δ→0} α(x̄, δy) = 0, we have on taking the limit of the above expression as δ approaches zero that
where lim_{τ→0} Γi(0, τ) = 0. Hence by taking τ small enough, say 0 < τ < τ̄ < 1, we have that e(τ) ∈ Bδ̄(x̄). Since x̄ solves LMP 7.1.2, we have that

Hence by the chain rule D.1.6 and the differentiability of θ at x̄ and of e at 0, we have for 0 < τ < τ̄ that

Hence
we have that

Hence (x̄, ū) solves KTP 7.1.4.

(ii) Let x̄ solve MP 7.1.1 or LMP 7.1.2. Then by the Fritz John theorem 2 there exists an r̄0 ∈ R and an r̄ ∈ Rm such that (x̄, r̄0, r̄) solves FJP 7.1.3, and

where

Define

and

By Remark 7.1.5 we only have to show that r̄0 > 0. Since (r̄0, r̄W) ≥ 0, we have that r̄0 > 0 if W is empty. Assume now that W is nonempty. We will now show by contradiction that r̄0 > 0. Suppose that r̄0 = 0; then since r̄J = 0, we have that
Fig. 7.3.5 Relationships between the solutions of the local minimization problem (LMP) 7.1.2, the minimization problem (MP) 7.1.1, the Fritz John problem (FJP) 7.1.3, the Kuhn-Tucker problem (KTP) 7.1.4, the Fritz John saddlepoint problem (FJSP) 5.1.3, and the Kuhn-Tucker saddlepoint problem (KTSP) 5.1.4.
(i)
(ii)
then
Under what conditions is the converse true? Give a geometric interpretation of the above conditions for the case when X = R.
Problem
Let X be an open set in Rn, let 8 be defined on X, let x X,
and let 6 be differentiable at x. Show that if
then
Under what conditions is the converse true? Give a geometric interpretation of the above conditions for the case when X = R.
Problem (method of feasible directions [Zoutendijk 60])
Let one of the assumptions (ii) to (vi) of Theorem 7 hold, and let θ and g be differentiable on X. Suppose that we have a point x ∈ X̄ at which one of the last five constraint qualifications of 7 holds but at which the Kuhn-Tucker conditions 7.1.4 are not satisfied. Show that there exists a feasible direction z in Rn such that x + δz ∈ X̄ and θ(x + δz) < θ(x) for some δ > 0.
State

(Hint: Redefine X̄ as

X̄ = {(x, y) | (x, y) ∈ X, g(x, y) ≦ 0, h(x, y) ≦ 0, −h(x, y) ≦ 0, −y ≦ 0}

and then use Theorem 7.)
Chapter Eight
Duality in Nonlinear Programming
on X.
Then
PROOF
Hence
(x̄, ū) ∈ Y = {(x, u) |
Then
Hence

Since

where α is some nonnegative number. The dual to this problem is: Find x ∈ R2 and u ∈ R2 such that

When α = 1, the primal problem has one feasible (and hence minimal) point, x1 = x2 = 0. At this point none of the constraint qualifications are satisfied. However it can be verified (after a little calculation) that x = 0, u = 0 also solves the dual problem with a maximum of zero.
Another important duality theorem is the converse of Theorem 4 above. In order to obtain such a theorem, we have to modify the hypotheses of Theorem 4. There are a number of such converse theorems.
and

Because (x̄, ū) ∈ Y, we have that ∇xψ(x̄, ū) = 0. Hence by the strict convexity of ψ(x, ū) at x̄ and 6.2.1 we have that

It follows then that

or that

But from Theorem 7.3.7 we have that ūg(x̄) = 0, hence

† We have used the adjective strict to distinguish the above converse duality theorem from other theorems, such as Dorn's converse duality theorem [Dorn 60] (see 8.2.6 below) and theorem 5.6 of [Mangasarian-Ponstein 65], in which the solution x̄ of the primal problem need not equal x̂, where (x̂, û) is the solution of the dual problem.
and since

x̄ = e(ū)

The last four relations and Theorem 7.2.1 imply that x̄ is a solution of MP 1. We also have that

We give now a theorem which shows when the dual problem has an objective function unbounded from above on the set of dual feasible points Y. This is a generalization of a theorem of Wolfe [Wolfe 61, theorem 3].
Unbounded dual theorem
Let X be an open set in Rn, and let θ and g be differentiable on X. If there exists a dual feasible point (x1, u1) such that

then the dual problem has an objective function unbounded from above on the set of dual feasible points Y.
Consider now the point (x1, u1 + τu2) for any τ > 0. We have that

and hence (x1, u1 + τu2) is dual feasible for any τ > 0. Since u2g(x1) = 1, we have that the dual objective function

tends to +∞ as τ tends to +∞.

REMARK The above theorem does not apply if g is convex at x1 and the primal problem has a feasible point, x*, say. For if we define z = x* − x1, then we have that

and hence g(x1) + ∇g(x1)z ≦ 0 has a solution z = x* − x1.
above, and let a = y. Obviously then the primal problem has no
feasible point. Consider now the dual feasible point x11 = x21 = 0, u11 = u21 = 1. We have then that

has no solution z ∈ Rn, and hence by Theorem 7 above the dual objective function is unbounded from above on the dual feasible region Y. This can be easily seen if we let x1 = x2 = 0 and let u1 = u2 tend to +∞. Then ψ(x, u) tends to +∞.
PROOF
We claim that

has no solution

For if it did have a solution z ∈ Rn, then x* = x1 + z would satisfy

where the next-to-last inequality follows from 6.1.1. Hence

which contradicts the assumption that X̄ is empty. So,

has no solution z in Rn, and by Theorem 7 the dual problem has an unbounded objective function (from above) on the set Y of dual feasible points.

The case when g is linear in the above corollary is theorem 3 of [Wolfe 61].
We finally give a theorem which tells us when the primal problem has no local or global minimum.

Theorem (no primal minimum)
Let X be an open set in Rn, let θ and g be differentiable and concave† on X, and let X̄ ≠ ∅. If the dual problem DP 2 has no feasible point, then neither the primal problem MP 1 nor the local minimization problem LMP 7.1.2 has a solution.

PROOF
has a solution
Let x (E X.
State and prove the duality relations that hold for these dual problems and the conditions under which they hold.

Using this equation in the objective function, the dual problem becomes the following.

† If C is not symmetric, we replace C by (C + C′)/2 in QMP 1 because xCx = x[(C + C′)/2]x.
Then
PROOF
Then
Problem
Let b, a, c, and d be given vectors in Rn, Rl, Rm, and Rk, respectively, and let A, D, B, E, C, and F be given m × n, m × l, k × n, k × l, n × n (symmetric), and l × l (symmetric) matrices, respectively. Show that the following are dual quadratic problems.
8.3
Note that the dual problem contains only the variable u. The
nonlinear and quadratic dual problems 8.1.2 and 8.2.2 contain both the
variables x and u. The dual linear program has the unique feature that
it does not contain the primal variable x.
We combine now all the fundamental duality results of linear
programming in the following theorem.
Duality theorem of linear programming [Gale et al. 51]

(i)
(ii)
(iii)
(iv)

Let X̄ ≠ ∅. Then

REMARK Both of the sets X̄ and Y may be empty, in which case none of the above results hold. For example take A = 0, c = 1, b = −1.

PROOF
(i) Follows from 8.2.3.
(ii) Follows from 8.2.4 and 8.2.6.
(iii) The forward implication follows from 8.2.7. The backward implication is equivalent logically to
(⇐)

(A′v = 0, v ≧ 0) ⇒ cv ≦ 0

Similarly, since Y ≠ ∅, A′u = b, u ≧ 0 has a solution u, which implies that Ay ≦ 0, by > 0 has no solution y (for if it did have a solution y, then 0 ≧ uAy = by, which contradicts by > 0). Hence

Ay ≦ 0 ⇒ by ≦ 0

has a solution
has a solution
has a solution
has a solution
has a solution
has a solution
Problem
Consider the primal linear minimization problem LMP 1. Show that if X̄ ≠ ∅, and if, for all x in X̄, bx ≧ α for some real number α, then LMP 1 has a solution x̄. (Hint: Use the facts that X̄ ≠ ∅ and that bx < α has no solution x in X̄ to show that there exists a dual feasible point, that is, a u ∈ Y; then use Theorem 3, part (v).)
Remark
It is not true in general that if a linear function is bounded from below on a closed convex set, then it achieves its infimum on that set. For example, x2 ≧ 0 on the closed convex set {x | x ∈ R2, (2)^(−x1) − x2 ≦ 0}; however x2 does not attain its infimum of zero on the set.† Problem 7 above shows that if a linear function is bounded from below on a polytope, then it achieves its infimum on the polytope.
Problem
Show that the following are dual linear programs:

where b, a, c, and d are given vectors in Rn, Rl, Rm, and Rk, respectively, and A, D, B, and E are given m × n, m × l, k × n, and k × l matrices, respectively.

† I am indebted to my colleague James W. Daniel for this example.
Chapter Nine
Generalizations of Convex Functions: Quasiconvex, Strictly Quasiconvex, and Pseudoconvex Functions
Quasiconcave function
A numerical function θ defined on a set Γ ⊂ Rn is said to be quasiconcave at x̄ ∈ Γ (with respect to Γ) if, for each x ∈ Γ such that θ(x) ≧ θ(x̄), the function θ assumes a value not less than θ(x̄) at each point on the intersection of the line segment [x̄, x] and Γ, or equivalently
on R.
Theorem
Let θ be a numerical function defined on a convex set Γ ⊂ Rn, let

Λα = {x | x ∈ Γ, θ(x) ≦ α}    and let    Ωα = {x | x ∈ Γ, θ(x) ≧ α}

Then

(θ quasiconvex on Γ) ⇔ (Λα is convex for each α ∈ R)

and

(θ quasiconcave on Γ) ⇔ (Ωα is convex for each α ∈ R)

PROOF We shall establish the theorem for the quasiconvex case. The quasiconcave case follows from it.
PROOF We shall prove the quasiconvex case only. The other case is similar.
(⇒) If x1 = x2, the implication is trivial. Assume then that x1 ≠ x2. Since Γ is open, there exists an open ball Bδ(x1) around x1 which is contained in Γ. Then, for some μ such that 0 < μ < 1 and μ < δ/
we have that

Then

and let

We establish now the quasiconvexity of θ on Γ by showing that Ω is empty. We assume that there is an x ∈ Ω and show that a contradiction ensues. Since θ(x2) ≦ θ(x1) < θ(x) for x ∈ Ω, we have from the hypothesis of the theorem that

and
Since θ(x1) < θ(x), and since θ is continuous on Γ, the set Ω is open relative to (x1, x2), it contains x, and there exists an x3 = (1 − μ)x + μx1, 0 < μ ≦ 1, such that x3 is a point of closure of Ω, and such that θ(x3) = θ(x1) (see C.1.1 and B.1.3). By the mean-value theorem D.2.1 we have that

for some x̂ ∈ Ω

and since

then

Since x̂ ∈ Ω, the last relation above contradicts the equality established earlier, ∇θ(x)(x2 − x1) = 0 for all x ∈ Ω.
We introduce now a type of function which is essentially a restriction of quasiconvex (quasiconcave) functions to functions with the property that a local minimum (maximum) is also a global minimum (maximum). Such functions have been independently introduced in [Hanson
64, Martos 65, Karamardian 67]. (These functions were called functionally convex by Hanson and explicitly quasiconvex by Martos.)
Strictly quasiconvex function
A numerical function 6 defined on a set F C Rn is said to be strictly
quasiconvex at x G F (with respect to F) if for each x G F such that
6(x) < B(x) the function 8 assumes a lower value than 6(x) on each point
in the intersection of the open line segment (x,x) and F, or equivalently
Hence if θ(x²) < θ(x¹), we are done by 9.1.1. Assume now that θ(x²) = θ(x¹). We will show (by contradiction) that there exists no x̄ ∈ (x¹,x²) such that θ(x¹) < θ(x̄). This will then establish the quasiconvexity of θ by 9.1.1. Let x̄ ∈ (x¹,x²) such that θ(x¹) < θ(x̄). Then
Since θ is lower semicontinuous on Γ, Ω is open relative to (x¹,x²) by C.1.2(iv). Hence, there exists an x̂ ∈ (x¹,x̄) ∩ Ω. By the strict quasiconvexity of θ and 1 we have that (since x̂ ∈ Ω and x̄ ∈ Ω)
and
which is a contradiction. Hence no such x̄ exists and θ is quasiconvex on Γ.
That the converse is not true follows from the example given at the beginning of this section, which is quasiconvex on R but not strictly quasiconvex on R. (For one can take x¹, x², and 0 < λ < 1 there with θ(x²) < θ(x¹) but θ[(1 − λ)x¹ + λx²] = θ(x¹), which contradicts 1.)
Theorem
Let θ be a numerical function defined on the convex set Γ in Rⁿ, and let x̄ ∈ Γ be a local minimum (maximum). If θ is strictly quasiconvex (strictly quasiconcave) at x̄, then θ(x̄) is a global minimum (maximum) of θ on Γ.
PROOF We prove the strictly quasiconvex case. If x̄ is a local minimum, then there exists a ball B_δ(x̄) such that
Assume now that there exists an x̃ in Γ but not in B_δ(x̄) such that θ(x̃) < θ(x̄). By the strict quasiconvexity of θ at x̄ and the convexity of Γ, we have that
and hence
Pseudoconvex function
A numerical function θ defined on an open set Γ ⊂ Rⁿ is said to be pseudoconvex at x̄ ∈ Γ (with respect to Γ) if it is differentiable at x̄ and
∇θ(x̄)(x − x̄) ≧ 0 ⇒ θ(x) ≧ θ(x̄)   for each x ∈ Γ
Fig. 9.3.1 Pseudoconvex and pseudoconcave functions. (a) Pseudoconvex function θ on Rⁿ = R; (b) pseudoconcave function θ on Rⁿ = R.
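A numerical spot check of this definition is easy to set up. The sketch below assumes the reconstructed form of the definition, ∇θ(x̄)(x − x̄) ≧ 0 ⇒ θ(x) ≧ θ(x̄), and uses the arbitrary example θ(x) = x + x³, whose derivative is everywhere positive although θ is not convex:

```python
# Sampling check of pseudoconvexity for theta(x) = x + x**3 on R.
# theta'(x) = 1 + 3x**2 > 0, so the implication below should never fail,
# even though theta''(x) < 0 for x < 0 (theta is not convex).
import numpy as np

theta = lambda x: x + x**3
grad  = lambda x: 1 + 3 * x**2

rng = np.random.default_rng(1)
violations = 0
for _ in range(10_000):
    xbar, x = rng.uniform(-5, 5, size=2)
    if grad(xbar) * (x - xbar) >= 0 and theta(x) < theta(xbar) - 1e-12:
        violations += 1
print("pseudoconvexity violations found:", violations)   # expected: 0
```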
(i)
(ii)
PROOF
(i)
at x.
Then
Then
Since Γ is convex
where
Hence
By taking the limit as λ approaches zero, we get
The second implication of (i) follows similarly.
(ii) For any x in Γ we have that ∇θ(x̄)(x − x̄) ≧ 0; hence by 1, θ(x) ≧ θ(x̄) and θ(x̄) = min over x ∈ Γ of θ(x). The second implication follows from 2.
Corollary
Let θ be a numerical function defined on the open convex set Γ in Rⁿ. Let x̄ ∈ Γ, and let θ be differentiable at x̄. Then
(ii)
Then
and
for some x such that
Hence there exists an x̄ ∈ (x¹,x²) such that
where
By Theorem 8(i) above we have then that
and
Since
then
and
Hence
and
and hence
Let θ be convex at x̄.
Hence
However, θ is pseudo-
Theorem
Let Γ be a convex set in Rⁿ, and let θ be a numerical function defined on some open set containing Γ. If θ is pseudoconvex (pseudoconcave) on Γ, then each local minimum (maximum) of θ on Γ is also a global minimum (maximum) of θ on Γ.
PROOF By Theorem 5, θ is strictly quasiconvex (strictly quasiconcave) on Γ. By Theorem 9.2.4 each local minimum (maximum) of θ on Γ is also a global minimum (maximum) on Γ.
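The theorem can also be illustrated computationally. In the sketch below (the function and set are illustrative choices, not taken from the text), projected gradient descent on the pseudoconvex function θ(x) = x + x³ over the convex set [−2, 3] reaches the same global minimizer from every starting point, so no spurious local minima appear:

```python
# Local descent on a pseudoconvex (nonconvex) function over a convex set.
import numpy as np

theta   = lambda x: x + x**3
grad    = lambda x: 1 + 3 * x**2
project = lambda x: np.clip(x, -2.0, 3.0)     # projection onto Gamma = [-2, 3]

for x0 in [-2.0, -0.5, 1.0, 3.0]:
    x = x0
    for _ in range(2000):
        x = project(x - 0.01 * grad(x))       # projected gradient step
    print(f"start {x0:5.1f} -> minimizer {x:.4f}, theta = {theta(x):.4f}")
# every run ends at x = -2, the global minimizer of theta on [-2, 3]
```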
or
Fig. 9.4.1 Properties and relations between differentiable quasiconvex, strictly quasiconvex, pseudoconvex, convex and strictly convex functions defined on an open convex set Γ ⊂ Rⁿ.
Fig. 9.4.2 Properties and relations between differentiable quasiconcave, strictly quasiconcave, pseudoconcave, concave and strictly concave functions defined on an open convex set Γ ⊂ Rⁿ.
5. Warning
6. Problems
Nonlinear fractional functions
Let θ and ψ be numerical functions defined on a set Γ ⊂ Rⁿ, and let ψ ≠ 0 on Γ.
(i)
and
(ii)
and
is both pseudoconvex and pseudoconcave (and hence also both quasiconvex and quasiconcave) on each convex set Γ ⊂ Rⁿ on which bx + β ≠ 0.
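For the linear fractional case the claim is easy to test numerically. The sketch below assumes the function in question is φ(x) = (ax + α)/(bx + β), as reconstructed, and samples both the pseudoconvexity and the pseudoconcavity implications on a region where bx + β > 0; the coefficients are arbitrary:

```python
# Spot check: a linear fractional function is both pseudoconvex and
# pseudoconcave on a convex set where its denominator keeps one sign.
import numpy as np

a, alpha, b, beta = 2.0, 1.0, 1.0, 3.0                     # illustrative choices
phi  = lambda x: (a * x + alpha) / (b * x + beta)
dphi = lambda x: (a * beta - alpha * b) / (b * x + beta) ** 2   # quotient rule

rng = np.random.default_rng(2)
bad = 0
for _ in range(10_000):
    xbar, x = rng.uniform(0, 50, size=2)                   # here b*x + beta > 0
    g = dphi(xbar) * (x - xbar)
    if g >= 0 and phi(x) < phi(xbar) - 1e-12:              # pseudoconvexity
        bad += 1
    if g <= 0 and phi(x) > phi(xbar) + 1e-12:              # pseudoconcavity
        bad += 1
print("violations:", bad)                                  # expected: 0
```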
Bi-nonlinear functions
Let θ and σ be numerical functions defined on a convex set Γ ⊂ Rⁿ, and let σ ≠ 0 on Γ.
(i)
Chapter Ten
Optimality and Duality for Generalized Convex and Concave Functions
1. Sufficient optimality criteria
We establish in this section two sufficient optimality criteria which are slight generalizations of the sufficiency result of [Mangasarian 65]. The present results also subsume the sufficiency results of [Arrow-Enthoven 61] and Theorem 7.2.1 of this book.
Sufficient optimality theorem
Let X be an open set in Rⁿ, and let θ and g be respectively a numerical function and an m-dimensional vector function, both defined on X. Let x̄ ∈ X, let
PROOF
Let
and hence
Since
it follows by the quasiconvexity of g_I at x̄ and Theorem 9.1.4 that
and hence
But since ū_J = 0, we also have that
Addition of the last two relations gives
But since [∇θ(x̄) + ū∇g(x̄)](x − x̄) ≧ 0 for all x ∈ X, the last inequality gives
The following theorem, which is a generalization of the Kuhn-Tucker sufficient optimality criterion 7.2.1, follows trivially from the above theorem by observing that ∇θ(x̄) + ū∇g(x̄) = 0 implies that [∇θ(x̄) + ū∇g(x̄)](x − x̄) = 0 for all x ∈ X.
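The use of these sufficient conditions can be illustrated on a toy problem; the problem below is an arbitrary convex example, not one from the text. One verifies stationarity, feasibility, ū ≧ 0, and complementarity at a candidate point, and then spot-checks global optimality over sampled feasible points:

```python
# Checking the Kuhn-Tucker sufficient conditions at a candidate point for
# minimize theta(x) = x1^2 + x2^2  subject to  g(x) = 1 - x1 - x2 <= 0.
# theta is convex (hence pseudoconvex) and g is linear (hence quasiconvex).
import numpy as np

theta      = lambda x: x[0] ** 2 + x[1] ** 2
grad_theta = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])
g          = lambda x: 1.0 - x[0] - x[1]
grad_g     = lambda x: np.array([-1.0, -1.0])

xbar, ubar = np.array([0.5, 0.5]), 1.0

print("grad theta + ubar*grad g =", grad_theta(xbar) + ubar * grad_g(xbar))  # ~ 0
print("g(xbar) =", g(xbar), " ubar >= 0:", ubar >= 0, " ubar*g(xbar) =", ubar * g(xbar))

rng = np.random.default_rng(3)
pts = rng.uniform(-3.0, 3.0, size=(10_000, 2))
feasible = [p for p in pts if g(p) <= 0]
print("min sampled feasible theta:", min(theta(p) for p in feasible),
      ">= theta(xbar) =", theta(xbar))
```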
Kuhn-Tucker sufficient optimality theorem
Theorem 1 above holds with the condition
replacing the condition
P = {i | gᵢ(x̄) = 0 and gᵢ is pseudoconcave at x̄}
and let
Q = {i | gᵢ(x̄) = 0 and gᵢ is not pseudoconcave at x̄}
modified as follows:
(iii′) For i ∈ P, we have by 9.3.2, since gᵢ is pseudoconcave at x̄ and ∇g_P(x̄)z ≧ 0, that
We generalize now the Fritz John stationary-point necessary optimality theorem 7.3.2.
Fritz John stationary-point necessary optimality theorem
Let x̄ be a solution of LMP 7.1.2 or of MP 7.1.1, let X be open, and let θ and g be differentiable at x̄. Then there exist an r̄₀ ∈ R and an r̄ ∈ Rᵐ such that (x̄,r̄₀,r̄) solves FJP 7.1.3 and
where
PROOF The proof follows from Lemma 1 above in exactly the same way as Theorem 7.3.2 follows from Lemma 7.3.1.
Again, as in Chap. 7, there is no guarantee that r̄₀ > 0 in the above theorem. To ensure this, we impose constraint qualifications. We give below less restrictive versions of constraint qualifications that were introduced earlier.
The weak Arrow-Hurwicz-Uzawa constraint qualification (see 7.3.4)
Let X be an open set in Rⁿ, let g be an m-dimensional vector function defined on X, and let
g is said to satisfy the weak Arrow-Hurwicz-Uzawa constraint qualification at x̄ ∈ X if g is differentiable at x̄ and
has a solution z ∈ Rⁿ
where
and gᵢ is pseudoconcave at x̄}
and
(iii)
Then there exists a ū ∈ Rᵐ such that (x̄,ū) solves KTP 7.1.4.
PROOF In view of Lemma 6 we need only establish the theorem under assumptions (i) and (ii) above.
(i) This is the same as 7.3.7(i).
(ii) This proof is identical with the proof of Theorem 7.3.7(ii) with the following modifications: V and W are replaced by P and Q, and Theorem 2 is used in the proof instead of Theorem 7.3.2.
3. Duality
In this section we will extend the Hanson-Huard strict converse duality theorem 8.1.6 in two directions. In the first direction we will show that (x̄,ū) need only be a local solution of the dual problem rather than a global solution. In the second direction we will relax the convexity of the objective function θ to pseudoconvexity, and the convexity of the constraint function g to quasiconvexity. We will also show that neither the weak duality theorem 8.1.3 nor Wolfe's duality theorem 8.1.4 holds for a pseudoconvex θ.
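A small numerical illustration may help here. Assuming the dual problem is the Wolfe dual, maximize ψ(x,u) = θ(x) + ug(x) subject to ∇θ(x) + u∇g(x) = 0 and u ≧ 0, the sketch below evaluates ψ at several dual feasible points of a convex toy problem and compares it with the primal optimal value (the problem is an arbitrary illustration):

```python
# Weak duality check for the Wolfe dual of: minimize x**2 subject to 1 - x <= 0.
theta = lambda x: x ** 2                 # primal objective on R
g     = lambda x: 1 - x                  # constraint g(x) <= 0, i.e. x >= 1
psi   = lambda x, u: theta(x) + u * g(x)

# dual feasibility: 2x - u = 0 and u >= 0, i.e. u = 2x with x >= 0
for x_hat in [0.0, 0.3, 1.0, 2.0]:
    u_hat = 2 * x_hat
    print(f"dual feasible (x,u)=({x_hat:.1f},{u_hat:.1f})  psi = {psi(x_hat, u_hat):.3f}")

print("theta at the primal optimum x = 1:", theta(1.0))
# psi never exceeds 1.0, and equality holds at the optimal pair (x,u) = (1,2)
```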
Strict converse duality theorem [Mangasarian 65]
Let X be an open set in Rⁿ, and let θ and g be differentiable on X. Let (x̄,ū) be a local solution of DP 8.1.2, that is,
and
If x̄ solves MP 8.1.1, it does not follow that x̄ and some ū solve DP 8.1.2 (even if g is linear), nor does it follow that θ(x̄) ≧ ψ(x,u) for any dual feasible point (x,u), that is, (x,u) ∈ Y.
PROOF The proof of the first part of the theorem is essentially identical
to the proof of Theorem 8.1.6, except that we invoke the sufficient optimality theorem 10.1.2 here, whereas we invoked the sufficient optimality
theorem 7.2.1 in the proof of 8.1.6. (The assumption that (x,u) need
only be a local solution of DP 8.1.2 does not change the proof of 8.1.6
and in fact could have been made in Theorem 8.1.6 also. It would have
added nothing to that theorem, however, since under the assumptions of
Theorem 8.1.6 each local maximum is also a global maximum. This,
however, is not the case under the present assumptions, as can be seen
from the example following this proof, in which a local maximum to the
dual problem is not a global maximum.)
We establish now the second part of the theorem by means of the
following counterexample:
and note that the quadratic equation 1 − 2x + 2x² = 0 has only complex roots.) We also have dual feasible points (x,u), such as x = 10, u = 20e⁻¹⁰⁰, such that θ(x̄) < ψ(x,u).
We give now an example where the first part of the above theorem
applies and where the dual problem has no global maximum but a local
maximum which is also a solution of the primal problem. Consider the
primal minimization problem
Setting v = 1 − u gives
Hence ∇φ(x) = 0 implies that x = −1, and since ∇²φ(x) < 0 for x < 0, φ is strictly concave on {x | x ∈ R, x < 0}, and x̄ = −1 is a local maximum of φ. However, x̄ = −1 is not a global maximum of φ on {x | x ∈ R, x > 0 or x < 0}, since φ approaches ∞ as x approaches ∞. However, (x̄,ū) = (−1,½) is a local solution of the dual problem and ψ(x̄,ū) = 2.
Chapter Eleven
Optimality and Duality in the Presence of Nonlinear Equality Constraints
1. Sufficient optimality criteria
The sufficient optimality criteria given here follow directly from Theorem 10.1.1 and Theorem 10.1.2 by observing that the equality constraint h(x) = 0 can be written as h(x) ≦ 0 and h(x) ≧ 0, and that the negative of a quasiconcave function is quasiconvex.
Sufficient optimality theorem
Let X be an open set in Rⁿ, and let θ, g, and h be respectively a numerical function, an m-dimensional vector function, and a k-dimensional vector function, all defined on X. Let x̄ ∈ X, let
let θ be pseudoconvex at x̄, let g be differentiable and quasiconvex at x̄, and let h be differentiable, quasiconvex, and quasiconcave at x̄. If there exist ū ∈ Rᵐ and v̄ ∈ Rᵏ such that (x̄,ū,v̄) satisfies the following conditions
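The sketch below illustrates how conditions of this kind can be verified numerically at a candidate point; the particular problem (a convex objective with one linear equality constraint, so h is both quasiconvex and quasiconcave) is an arbitrary illustration, not one from the text:

```python
# Verify the stationarity and feasibility conditions at a candidate (xbar, vbar)
# for: minimize theta(x) = x1^2 + x2^2  subject to  h(x) = x1 + x2 - 1 = 0.
import numpy as np

theta      = lambda x: x[0] ** 2 + x[1] ** 2
grad_theta = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])
h          = lambda x: x[0] + x[1] - 1.0
grad_h     = lambda x: np.array([1.0, 1.0])

xbar, vbar = np.array([0.5, 0.5]), -1.0
print("h(xbar) =", h(xbar))                                        # feasibility
print("grad theta + vbar*grad h =", grad_theta(xbar) + vbar * grad_h(xbar))

# spot check global optimality along the feasible line x1 + x2 = 1
t = np.linspace(-5, 5, 1001)
print("min theta on feasible line:", (t**2 + (1 - t)**2).min(), ">=", theta(xbar))
```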
2.
Then
and
∇_{x_K}h(x̄) is nonsingular
Since h and e are differentiable, and since h[x_I, e(x_I)] = 0 for all x_I ∈ B_δ(x̄_I), we have, by the chain rule D.1.6, that
Postmultiplication by (x_I − x̄_I) gives
But by assumption ∇h(x̄)(x − x̄) = 0, or equivalently
Hence the last two distinct equalities and the nonsingularity of ∇_{x_K}h(x̄) imply that
Since b tends to zero as δ tends to zero, and since ∇f(x̄)(x − x̄) < 0, there exists a δ > 0 such that
x ∈ Λ.
We derive now another lemma from the above one by using the fundamental theorem for convex functions, Theorem 4.2.1 (which is a consequence of the separation theorem for convex sets, Theorem 3.2.8). This will be the key lemma in deriving the minimum-principle necessary optimality criterion.
Lemma
Let X be a convex set in Rⁿ with a nonempty interior, int (X), and let Λ be an open set in Rⁿ. Let f be an ℓ-dimensional vector function, and let h be a k-dimensional vector function, both defined on some open set containing X. Let
Let f be differentiable at x̄. Then
has a solution
has a solution
Since the interior, int (X), of a convex set is convex (see 3.1.7), it follows by Theorem 4.2.1 that there exist p ∈ Rˡ, q ∈ Rᵏ, p ≧ 0, (p,q) ≠ 0 such that
and since the above expression is continuous in x, and in fact linear, the above inequality holds also on the closure X̄ of X.
We are now ready to derive the fundamental necessary optimality criterion of this chapter.
Minimum-principle necessary optimality theorem
Let X be a convex set in Rⁿ with a nonempty interior int (X). Let θ be a numerical function, let g be an m-dimensional vector function, and let h be a k-dimensional vector function, all defined on some open set containing X. Let x̄ be a solution of
Let θ and g be differentiable at x̄, and let h have continuous first partial derivatives at x̄. Then there exist r̄₀ ∈ R, r̄ ∈ Rᵐ, s̄ ∈ Rᵏ such that the following conditions are satisfied
Let
Let m_I and m_J denote the number of elements in the sets I and J respectively, so m_I + m_J = m.
Since g is defined on some open set containing X, and since g is differentiable at x̄, there exists a δ > 0 such that for i ∈ J and ‖x − x̄‖ < δ
for if it did have a solution, then x̄ would not be a solution of the minimization problem. But we also have that
Hence by Lemma 2 above, there exist r̄₀
such that
convex sets with empty interiors can, in effect, be handled by the above results. The convexity requirement on X is of course a restriction which cannot be dispensed with easily. (See, however, [Halkin 66, Canon et al. 66].) If we replace the convexity requirement on X by the requirement that X be open, a stronger necessary optimality condition than the above one can be obtained. In effect this will be an extension of the Fritz John stationary-point necessary optimality theorem 7.3.2 to the case of nonlinear equalities. We shall give this result in the next section of this chapter.
where B_δ(x̄) is an open ball around x̄ with radius δ. Let θ and g be differentiable at x̄, and let h have continuous first partial derivatives at x̄. Then there exist r̄₀ ∈ R, r̄ ∈ Rᵐ, s̄ ∈ Rᵏ such that the following conditions are satisfied
PROOF
In either case (since X is open) there exists an open ball B_ρ(x̄) around x̄ with radius ρ such that B_ρ(x̄) ⊂ B_δ(x̄) ⊂ X, and
is differentiable at
where
has a solution
where
is pseudoconcave
pseudoconcave
has a solution
where
where B_δ(x̄) is some open ball around x̄ with radius δ. Let θ, g, and h be differentiable at x̄, and let g and h satisfy
(i)
(ii)
(iii)
(iv)
Fritz John
s̄/r̄₀ satisfy
r̄₀ = 0 and
active constraints. Hence by
and
and
If x̄ solves the minimization problem 3, it does not follow that x̄ and some ū and v̄ solve the dual problem 2 (unless X is convex, θ and g are convex on X, and h is linear on Rⁿ), nor does it follow that θ(x̄) ≧ ψ(x,u,v) for any dual feasible point (x,u,v), that is, (x,u,v) ∈ Y.
Corollary
Let all the assumptions of Theorem 1 above hold, except that θ need not be pseudoconvex at x̄, nor g quasiconvex at x̄, nor h both quasiconvex and quasiconcave at x̄. If (x̄,ū,v̄) is a local solution of the dual problem 2, then (x̄,ū,v̄) is a Kuhn-Tucker point, that is,
Appendix A
Vectors and Matrices
1. Vectors
Fundamental theorem of vector spaces
Let each of the vectors y, y¹, . . . , yᵐ in Rⁿ be a linear combination of the vectors x¹, x², . . . , xᵐ in Rⁿ; then y, y¹, . . . , yᵐ are linearly dependent.
Basis theorem
The linearly independent vectors x¹, . . . , xʳ are a basis for a subset S of Rⁿ if and only if every vector y in S is a linear combination of the xⁱ.
PROOF [Gale 60] Suppose every y in S is a linear combination of the xⁱ. Then S contains no larger set of linearly independent vectors, for any set of more than r vectors must be dependent since they are combinations of the xⁱ. Therefore r is the maximum number of linearly independent vectors which can be chosen from S, and hence the xⁱ are a basis for S.
Conversely, suppose that the xⁱ are a basis for S. Then by definition, r is the number of vectors in the largest set of linearly independent vectors that can be found in S. So if y is in S, then x¹, . . . , xʳ, y are linearly dependent, that is,
Hence
2. Matrices
from the basis theorem A.1.6 that for all k, A.ₖ = Σᵢ pᵢₖ A.ᵢ for some numbers pᵢₖ. Hence
and
for all k
which is equivalent to
This shows that the columns A.₁, . . . , A.ₜ of A are linearly dependent, which contradicts the assumption that they were a basis. This contradiction shows that r ≧ s. The same argument applied to the transpose of A shows that s ≧ r. Hence r = s.
Rank of a matrix
The rank of a matrix is the column or row rank of the matrix
(which are equal by the above theorem).
Corollary
Let A be an r × n matrix with rank r (r ≦ n). Then for any b in Rʳ, the system Ax = b has a solution x in Rⁿ.
PROOF By Theorem 2 above, the column rank of A is r, and thus if A.₁, . . . , A.ᵣ, say, are a column basis, we have r linearly independent vectors in Rʳ, and the vector b ∈ Rʳ can be expressed as a linear combination of them, that is, b = x̄₁A.₁ + ⋯ + x̄ᵣA.ᵣ for some real numbers x̄₁, . . . , x̄ᵣ.
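A short numerical illustration of the corollary (the matrix below is an arbitrary full-row-rank example, not taken from the text):

```python
# An r x n matrix of rank r (r <= n) yields a solvable system Ax = b for
# every b in R^r; np.linalg.lstsq returns one (minimum-norm) solution.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))            # generically of rank 3
b = rng.standard_normal(3)

assert np.linalg.matrix_rank(A) == 3       # full row rank
x, *_ = np.linalg.lstsq(A, b, rcond=None)  # a particular solution
print("residual ||Ax - b|| =", np.linalg.norm(A @ x - b))   # ~ 0
```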
Nonsingular matrix
An n X n matrix is said to be nonsingular if it has rank n.
Semidefinite matrix
An n × n matrix A is said to be positive semidefinite if xAx ≧ 0 for all x in Rⁿ and negative semidefinite if xAx ≦ 0 for all x in Rⁿ.
Definite matrix
An n × n matrix A is said to be positive definite if xAx > 0 for all x in Rⁿ, x ≠ 0, and negative definite if xAx < 0 for all x in Rⁿ, x ≠ 0.
Obviously the negative of a positive semidefinite (definite) matrix
is a negative semidefinite (definite) matrix and conversely. Also, each
positive (negative) definite matrix is also positive (negative) semidefinite.
Proposition
Each positive or negative definite matrix is nonsingular.
PROOF Let A be an n × n positive or negative definite matrix. If A is singular, then its rank is less than n, and there exists an x ∈ Rⁿ, x ≠ 0, such that Ax = 0. Hence xAx = 0 for some x ≠ 0, which contradicts the assumption that A is positive or negative definite. Hence A is nonsingular.
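A quick numerical illustration of the proposition; the sketch uses the eigenvalue test for a symmetric positive definite matrix, which is not stated in the text and is assumed here:

```python
# A symmetric positive definite matrix has only positive eigenvalues,
# nonzero determinant, and therefore an inverse.
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)                 # positive definite by construction

print("eigenvalues:", np.linalg.eigvalsh(A))     # all strictly positive
print("det(A) =", np.linalg.det(A))              # nonzero, so A is nonsingular
print("A @ inv(A) ~ I:", np.allclose(A @ np.linalg.inv(A), np.eye(4)))
```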
Appendix B
Résumé of Some Topological Properties of Rⁿ
1.
Open set
A set Γ ⊂ Rⁿ such that every point of Γ is an interior point is said to be open.
Closed set
A set Γ ⊂ Rⁿ such that every point of closure of Γ is in Γ is said to be closed.
Closure of a set
The closure Γ̄ of a set Γ ⊂ Rⁿ is the set of points of closure of Γ. Obviously Γ ⊂ Γ̄, and for a closed set Γ = Γ̄.
Interior of a set
The interior int (Γ) of a set Γ ⊂ Rⁿ is the set of interior points of Γ. Obviously int (Γ) ⊂ Γ, and for an open set Γ = int (Γ).
Relatively open (closed) sets
Let Γ and Δ be two sets such that Γ ⊂ Δ ⊂ Rⁿ. Γ is said to be open (closed) relative to Δ if Γ = Δ ∩ Ω, where Ω is some open (closed) set in Rⁿ.
Obviously an open (closed) set Γ in Rⁿ is open (closed) relative to Rⁿ. If Γ ⊂ Δ, and if Γ is open (closed), then Γ is open (closed) relative to Δ, for Γ = Δ ∩ Γ.
Problem
Show that:
(i) Every open ball B_λ(a) in Rⁿ is an open set.
(ii) The closure B̄_λ(a) of an open ball B_λ(a) in Rⁿ is {x | x ∈ Rⁿ, ‖x − a‖ ≦ λ} and is closed. (The closure of an open ball is called a closed ball and is denoted by B̄_λ(a).)
(iii) The interior of a closed ball B̄_λ(a) is the open ball B_λ(a).
Theorem
The family of open sets in Rⁿ has the following properties:
(i) Every union of open sets is open.
(ii) Every finite intersection of open sets is open.
(iii) The empty set ∅ and Rⁿ are open.
B_ε(x) ⊂ Γᵢ ⊂ Γ
Hence x is an interior point of Γ, and Γ is open.
(ii) Let (Γᵢ)ᵢ∈I be a finite family of open sets in Rⁿ. If x ∈ Γ = ∩ᵢ∈I Γᵢ, then x ∈ Γᵢ for each i ∈ I. Because each Γᵢ is open, there exist εᵢ > 0 such that
B_εᵢ(x) ⊂ Γᵢ   for i ∈ I
Take ε = min over i ∈ I of εᵢ > 0 (this is where the finiteness of I is used); then
B_ε(x) ⊂ Γ
and Γ is open.
(iii) Since the empty set ∅ contains no points, we need not find an open ball surrounding any point, and hence ∅ is open. The set Rⁿ is open because for each x ∈ Rⁿ, B_ε(x) ⊂ Rⁿ for all ε > 0.
The family of open sets in Rⁿ, as defined in 4, is called a topology in Rⁿ. (In fact any family of sets, called open, which has properties (i), (ii), (iii) above is also called a topology in Rⁿ. For example the sets ∅, Rⁿ also form a topology in Rⁿ. We shall, however, be concerned here only with open sets as defined by 4.)
Theorem
Let Γ ⊂ Rⁿ.
PROOF
Then
Let x ∈ Rⁿ.
Then
Hence
Corollary
The complement (relative to Rn) of an open set in Rn is closed, and
vice versa.
PROOF
Γ is closed ⇔ Γ = Γ̄
(i)
(ii)
(iii)
Show that the set {x | x ∈ Rⁿ, Ax ≦ b}, and hence also the set {x | x ∈ Rⁿ, Ax = b}, are closed sets in Rⁿ.
Show that the set {x | x ∈ Rⁿ, Ax < b} is an open set in Rⁿ.
Show that the set {x | x ∈ Rⁿ, Ax < b} is neither a closed nor an open set in Rⁿ.
Limit point
Let x¹, x², . . . , be a sequence of points in Rⁿ. A point x̄ ∈ Rⁿ is said to be a limit point of the sequence if
Remark
Then
Then
We shall take the following axiom as one of the axioms of the real
number system [Birkhoff-Maclane 53].
Axiom
Any nonempty set Γ of real numbers which has a lower (upper) bound has a greatest (least) lower (upper) bound.
If the set Γ has no infimum (or equivalently by the above axiom if it has no lower bound), we say that Γ is unbounded from below and we write inf Γ = −∞. Similarly if Γ has no supremum (or equivalently by the above axiom if it has no upper bound), we say Γ is unbounded from above and we write sup Γ = +∞. Hence by augmenting the Euclidean line R by the two points +∞ and −∞, any nonempty set Γ will have an infimum, which may be −∞, and a supremum, which may be +∞. We shall follow the convention of writing inf ∅ = +∞ and sup ∅ = −∞.
We observe that neither inf Γ nor sup Γ need be in Γ. For example inf {1, ½, ⅓, . . .} is 0, but 0 is not in the set {1, ½, ⅓, . . .}.
Theorem
Every bounded nondecreasing (nonincreasing) sequence of real numbers has a limit.
PROOF Let x¹, x², . . . , be a bounded nondecreasing sequence of real numbers. By the above Axiom 9, the sequence has a least upper bound x̄. Hence
(by 8)
(because the sequence is nondecreasing)
(because x̄ ≧ xⁿ for all n)
The proof for a bounded nonincreasing sequence is similar.
Cauchy convergence criterion
A sequence x¹, x², . . . , in Rⁿ converges to a limit x⁰ if and only if it is a Cauchy sequence, that is, for each ε > 0 there exists an n* such that ‖xᵐ − xⁿ‖ < ε for each m, n ≧ n*.
For a proof see [Buck 65, Fleming 65, Rudin 64].
3. Compact sets in Rⁿ
Bounded set
A set Γ ⊂ Rⁿ is bounded if there exists a real number α such that for each x ∈ Γ, ‖x‖ ≦ α.
or equivalently
(iv)
We shall not give here a proof of the equivalence of the above four
conditions; such a proof can be found in any of the references given at the
beginning of this Appendix. An especially lucid proof is also given in
chap. 2 of [Berge-Ghouila Houri 65].
Corollary
Let Γ and Δ be sets in Rⁿ which are respectively compact and closed. Then the sum Ω = Γ + Δ is closed.
PROOF Let x̄ belong to the closure of Ω. Then there exists a sequence x¹, x², . . . , in Ω which converges to x̄ (see B.2.6). Then we can find
for n = 1, 2, . . .
Hence
and Ω is closed.
Appendix C
Continuous and Semicontinuous Functions, Minima and Infima
(ii)
no
and
(iv)
and
(v)
(ii)
where lim inf θ(xⁿ) as n → ∞ denotes the infimum of the limit points of the sequence θ(x¹), θ(x²), . . .
(iii)
The set
(iv)
The epigraph of θ
is closed relative to Γ × R.
(ii)
where lim sup θ(xⁿ) as n → ∞ denotes the supremum of the limit points of the sequence θ(x¹), θ(x²), . . .
(iii)
The set
(iv)
(v)
if θ is upper semicontinuous at x̄ ∈ Γ (with respect to Γ). θ is continuous at x̄ ∈ Γ (with respect to Γ) if and only if it is both lower semicontinuous and upper semicontinuous at x̄ ∈ Γ (with respect to Γ).
Examples
is lower semicontinuous on R; see Fig. C.1.1.
is upper semicontinuous on R; see Fig. C.1.2.
Theorem
Let (θᵢ)ᵢ∈I be a (finite or infinite) family of lower semicontinuous functions on Γ ⊂ Rⁿ. Its least upper bound
θ(x) = sup over i ∈ I of θᵢ(x)
is closed (relative to Γ). The second part follows from 8(iii) and Theorem B.1.3(ii) if we observe that for any real λ the set
2.
We recall that in Appendix B (B.2.8) we defined the infimum and supremum of a set of real numbers Γ as follows
and
and
If there is a
and
Examples
Remarks similar to (i), (ii), and (iii) above, which applied to the
minimum of a function, also apply here to the maximum of a function.
PROOF
the set
Let γ > α.
Then
Hence by the
θ is lower semicontinuous on Γ,
Γ is closed, and
Γ is bounded.
We give examples below where the infimum is not attained whenever any
one of the above conditions is violated.
(i)
inf
(ii)
inf θ(x) = 0 over 0 < x < 1, but no minimum exists of this continuous function.
(iii)
inf θ(x) = 0, but no minimum exists of this continuous function on the closed (but unbounded) set on which it is defined.
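The two continuous cases can be illustrated numerically; the concrete functions and sets below are stand-ins chosen for the sketch, since the displayed examples were not reproduced here:

```python
# Infimum not attained when closedness or boundedness of the set fails.
import numpy as np

x = np.linspace(1e-6, 1 - 1e-6, 10_000)        # samples of the non-closed set (0, 1)
print("inf of x on (0, 1):", x.min(), " (never equal to 0)")

y = np.linspace(0, 1000, 10_000)               # samples of the unbounded set [0, inf)
print("inf of exp(-x) on [0, inf):", np.exp(-y).min(),
      " (approaches 0 but 0 is not attained)")
```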
Appendix D
Differentiable Functions, Mean-value and Implicit Function Theorems
tends to a finite limit when δ tends to zero. This limit is called the partial derivative of θ with respect to xᵢ at x̄ and is denoted by ∂θ(x̄)/∂xᵢ. The n-dimensional vector of the partial derivatives of θ with respect to x₁, . . . , xₙ at x̄ is called the gradient of θ at x̄ and is denoted by ∇θ(x̄), that is,
∇θ(x̄) = (∂θ(x̄)/∂x₁, . . . , ∂θ(x̄)/∂xₙ)
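As a computational aside (not part of the text), the gradient just defined can be approximated by the very difference quotients used in the definition; the function below is an arbitrary illustration:

```python
# Assemble an approximate gradient from one-sided difference quotients.
import numpy as np

def numerical_gradient(theta, x, delta=1e-6):
    """Approximate (d theta/d x_1, ..., d theta/d x_n) at x."""
    x = np.asarray(x, dtype=float)
    grad = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta
        grad[i] = (theta(x + e) - theta(x)) / delta
    return grad

theta = lambda x: x[0] ** 2 + 3 * x[0] * x[1]     # gradient = (2x1 + 3x2, 3x1)
print(numerical_gradient(theta, [1.0, 2.0]))       # ~ [8., 3.]
```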
Theorem
Let θ be a numerical function defined on an open set Γ in Rⁿ, and let x̄ be in Γ.
(i) If θ is differentiable at x̄, then θ is continuous at x̄, and ∇θ(x̄) exists (but not conversely), and
has partial derivatives
at x̄ and if φ is differentiable
∇θ(x̄) = ∇φ(f(x̄))∇f(x̄)
Twice-differentiable numerical function and its Hessian
Let θ be a numerical function defined on an open set Γ in Rⁿ, and let x̄ be in Γ. θ is said to be twice differentiable at x̄ if for all x ∈ Rⁿ such that x̄ + x ∈ Γ we have that
(i)
(ii)
Remark
The numbers ∂θ(x̄)/∂xᵢ, i = 1, . . . , n, are also called the first partial derivatives of θ at x̄, and ∂²θ(x̄)/∂xᵢ∂xⱼ, i, j = 1, . . . , n, are also called the second partial derivatives of θ at x̄. In an analogous way we can define kth partial derivatives of θ at x̄.
Remark
Let θ be a numerical function defined on an open set Γ ⊂ Rⁿ × Rᵏ which is differentiable at (x̄,ȳ) ∈ Γ. We define then
and
and
References
Abadie, J.: On the Kuhn-Tucker Theorem, in J. Abadie (ed.), "Nonlinear Programming," pp. 21-36, North Holland Publishing Company, Amsterdam, 1967.
Anderson, K. W., and D. W. Hall: "Sets, Sequences and Mappings, the Basic
Concepts of Analysis," John Wiley & Sons, Inc., New York, 1963.
Arrow, K. J., and A. C. Enthoven: Quasiconcave Programming, Econometrica
29:779-800 (1961).
Arrow, K. J., L. Hurwicz, and H. Uzawa (eds.): "Studies in Linear and Nonlinear Programming," Stanford University Press, Stanford, Calif., 1958.
Arrow, K. J., L. Hurwicz, and H. Uzawa: Constraint Qualifications in Maximization Problems, Naval Research Logistics Quarterly 8: 175-191 (1961).
Bartle, R. G.: "The Elements of Real Analysis," John Wiley & Sons, Inc., New
York, 1964.
Berge, C.: "Topological Spaces," The MacMillan Company, New York, 1963.
Berge, C., and A. Ghouila Houri: "Programming, Games and Transportation
Networks," John Wiley & Sons, Inc., New York, 1965.
Birkhoff, G. and S. Maclane: "A Survey of Modern Algebra," The MacMillan
Company, New York, 1953.
Bohnenblust, H. F., S. Karlin, and L. S. Shapley: "Solutions of Discrete, Two-person Games," Contributions to the Theory of Games, vol. I, Annals of Mathematics Studies Number 24, Princeton, 1950, 51-72.
Bracken, J., and G. P. McCormick: "Selected Applications of Nonlinear Programming," John Wiley & Sons, Inc., New York, 1968.
Brondsted, A.: Conjugate Convex Functions in Topological Vector Spaces, Matematisk-fysiske Meddelelser udgivet af Det Kongelige Danske Videnskabernes Selskab, 34(2): 1-27 (1964).
Browder, F. E.: On the Unification of the Calculus of Variations and the Theory of Monotone Nonlinear Operators in Banach Spaces, Proc. Nat. Acad. Sci. U.S., 56: 419-425 (1966).
Buck, R. C.: "Advanced Calculus," McGraw-Hill Book Company, New York,
1965.
Canon, M., C. Cullum, and E. Polak: Constrained Minimization Problems in
Finite-dimensional Spaces, Society for Industrial and Applied Mathematics Journal
on Control, 4: 528-547 (1966).
Carathéodory, C.: Über den Variabilitätsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen, Mathematische Annalen 64: 95-115 (1907).
Charnes, A., and W. W. Cooper: "Management Models and Industrial Applications of Linear Programming," vols. I, II, John Wiley & Sons, Inc., New York,
1961.
Cottle, R. W.: Symmetric Dual Quadratic Programs, Quarterly of Applied Mathematics 21: 237-243 (1963).
Courant, R.: "Differential and Integral Calculus," vol. II, 2d ed., rev., Interscience Publishers, New York, 1947.
Courant, R., and D. Hilbert: "Methods of Mathematical Physics," pp. 231-242,
Interscience Publishers, New York, 1953.
Dantzig, G. B.: "Linear Programming and Extensions," Princeton University
Press, Princeton, N.J., 1963.
Dantzig, G. B., E. Eisenberg, and R. W. Cottle: Symmetric Dual Nonlinear
Programs, Pacific Journal of Mathematics, 15: 809-812 (1965).
Dennis, J. B.: "Mathematical Programming and Electrical Networks," John
Wiley & Sons, Inc., New York, 1959.
Dieter, U.: Dualität bei konvexen Optimierungs- (Programmierungs-) Aufgaben, Unternehmensforschung 9: 91-111 (1965a).
Dieter, U.: Dual Extremal Problems in Locally Convex Linear Spaces, Proceedings of the Colloquium on Convexity, Copenhagen, 52-57 (1965b).
Dieter, U.: Optimierungsaufgaben in topologischen Vektorräumen I: Dualitätstheorie, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 5: 89-117 (1966).
Dorn, W. S.: Duality in Quadratic Programming, Quarterly of Applied Mathematics, 18: 155-162 (1960).
Dorn, W. S.: Self-dual Quadratic Programs, Society for Industrial and Applied
Mathematics Journal on Applied Mathematics, 9: 51-54 (1961).
Duffin, R. J.: Infinite Programs, in [Kuhn-Tucker 56], pp. 157-170.
Fan, K., I. Glicksberg, and A. J. Hoffman: Systems of Inequalities Involving Convex Functions, American Mathematical Society Proceedings, 8: 617-622 (1957).
Farkas, J.: Über die Theorie der einfachen Ungleichungen, Journal für die Reine und Angewandte Mathematik, 124: 1-24 (1902).
Fenchel, W.: "Convex Cones, Sets and Functions," Lecture notes, Princeton
University, 1953, Armed Services Technical Information Agency, AD Number
22695.
Fiacco, A. V.: Second Order Sufficient Conditions for Weak and Strict Constrained Minima, Society for Industrial and Applied Mathematics Journal on
Applied Mathematics, 16: 105-108 (1968).
Tyndall, W. F.: An Extended Duality Theorem for Continuous Linear Programming Problems, Notices of the American Mathematical Society, 14: 152-153 (1967).
Uzawa, H.: The Kuhn-Tucker Theorem in Concave Programming, in [Arrow
et al. 58], pp. 32-37.
Vajda, S.: "Mathematical Programming," Addison-Wesley Publishing Company,
Inc., Reading, Mass., 1961.
Valentine, F. A.: "Convex Sets," McGraw-Hill Book Company, New York, 1964.
Van Slyke, R. M., and R. J. B. Wets: "A Duality Theory for Abstract Mathematical Programs with Applications to Optimal Control Theory," Mathematical
Note Number 538, Mathematics Research Laboratory, Boeing Scientific Research
Laboratories, October, 1967.
Varaiya, P. P.: Nonlinear Programming in Banach Space, Society for Industrial
and Applied Mathematics Journal on Applied Mathematics, 16: 284-293 (1967).
von Neumann, J.: Zur Theorie der Gesellschaftsspiele, Mathematische Annalen, 100:
295-320(1928).
Whinston, A.: Conjugate Functions and Dual Programs, Naval Research
Logistics Quarterly, 12: 315-322 (1965).
Whinston, A.: Some Applications of the Conjugate Function Theory to Duality,
in [Abadie 67], pp. 75-96.
Wolfe, P.: A Duality Theorem for Nonlinear Programming, Quarterly of Applied
Mathematics, 19: 239-244 (1961).
Zangwill, W. I.: "Nonlinear Programming: A Unified Approach," Prentice-Hall,
Inc., Englewood Cliffs, N.J., 1969.
Zarantonello, E. H.: "Solving Functional Equations by Contractive Averaging,"
Mathematics Research Center, University of Wisconsin, Technical Summary
Report Number 160, 1960.
Zoutendijk, G.: "Methods of Feasible Directions," Elsevier Publishing Company,
Amsterdam, 1960.
Zukhovitskiy, S. I., and L. I. Avdeyeva: "Linear and Convex Programming,"
W. B. Saunders Company, Philadelphia, 1966.
Indexes
Name Index
Abadie, J., 97, 99, 100, 205
Almgren, F. J., 62
Anderson, K. W., 3, 205
Arrow, K. J., 102, 103, 151, 205
Avdeyeva, L. I., 212
Subject Index
Alternative:
table of theorems of, 34
theorems of, 27-37
Angle between two vectors, 8
Axiom of real number system, 188
Ball:
closed, 183
open, 182
Basis, 178
Bi-nonlinear function, 149
Bolzano-Weierstrass theorem, 189
Bounded function, 196
Bounded set, 188
Bounds:
greatest lower and least upper,
187
lower and upper, 187
Caratheodory's theorem, 43
Cauchy convergence, 188
Cauchy-Schwarz inequality, 7
Closed set, 183
Closure of a set, 183
Compact set, 189
Concave function, 56
differentiable, 83-91
minimum of, 73
strictly, 57
strictly concave: and differentiable,
87-88
and twice differentiable, 90-91
twice differentiable, 88-90
Concave functions, infimum of, 61
Constraint qualification:
Arrow-Hurwicz-Uzawa, 102
modified, 172
weak, 154
with convexity, 78-81
with differentiability, 102-105,
154-156, 171-172
Karlin's, 78
Kuhn-Tucker, 102, 171
Constraint qualification:
reverse convex, 103
weak, 155, 172
Slater's, 78
weak, 155
strict, 79
Constraint qualifications, relationships
between, 79, 103, 155
Continuous function, 191-192
Convex:
and concave functions, 54-68
generalized, 131-150
and pseudoconvex functions, relation
between, 144, 146
Convex combinations, 41
Convex function, 55
continuity of, 62
differentiable, 83-91
at a point, 83
on a set, 84
strictly, 56
strictly convex: and differentiable, 87-88
Gordan's theorem:
generalized to convex functions, 65
Gradient of a function, 201
Halfspace, 40
Heine-Borel theorem, 189
Hessian matrix, 202
Implicit function theorem, 204
Inequalities, linear, 16-37
Infimum, 187, 195
of a numerical function, 196
Interior of a set, 183
Interior point, 182
Jacobian matrix, 202
Kuhn-Tucker saddlepoint necessary
optimality theorem, 79
in the presence of equality constraints, 80
Kuhn-Tucker saddlepoint problem
(KTSP), 71
Kuhn-Tucker stationary-point
necessary optimality criteria, 105,
111, 112, 156, 173
Kuhn-Tucker stationary-point
problem (KTP), 94
Kuhn-Tucker sufficient optimality
theorem, 94, 153, 162
Lagrange multipliers, 71
Lagrangian function, 71
Limit, 186
Limit point, 186
Line, 38
segments, 39
Linear combination, 6
nonnegative, 7
Linear dependence, 6
Sets:
intersection of, 4
product of, 4-5
sum of, 45
union of, 4
Simplex, 42
Slater's theorem, 27
Stiemke's theorem, 32
Subspace, 40
Sum of two convex sets, 45
Supremum, 187, 195
of a numerical function, 196
Symbols, 5
Taylor's theorem, 204
Triangle inequality, 8, 41
Tucker's theorem, 29
Twice differentiable numerical
function, 202
Uniqueness of minimum solution, 73
Vector, 6
addition, 6
multiplication by a scalar, 6
norm of, 7
Vector function, 12
Vector space, fundamental theorem of,
177
Vectors:
angle between two, 8
scalar product of, 7
Vertex, 41