Continuous Optimization
Song Wang
School of Mathematics & Statistics
The University of Western Australia
swang@maths.uwa.edu.au
Chapter 1
For any industrial optimization study, the first task is to assess all the factors involved and their relationships, the requirements, and the objectives of the problem.
Then, mathematical representations are constructed. This is called model construction, and it is the first stage in formulating an optimization problem.
Model construction is one of the most important and interesting exercises in optimization studies. If an industrial problem is too complicated, we need
to make as many simplifications as possible in the construction of the model, as
long as the answers remain realistic and can be used for the purposes for which
they are intended. There are three types of simplifications: (i) assumptions;
(ii) approximations; and (iii) estimations. To some extent, these three types of
simplifications can overlap. Some further details are as follows:
m(i, j) = kd(i, j)
The feasible region is the set consisting of all those x = [x1, x2, ..., xn]^T ∈ R^n such that the above constraints are satisfied.
Some examples are given below.
Example 1.2.1.

    minimize f(x1, x2)
    subject to x1 − 2x2 = 0,
               x1 ≥ 1,
               x2 ≤ 1.

The feasible region is:

    D = { [x1, x2]^T ∈ R² : x2 = x1/2, 1 ≤ x1 ≤ 2 }.
Example 1.2.2.

    min f(x1, x2, x3)
    subject to x1 ≥ 0,
               x2 ≥ 0,
               x3 ≥ 0,
               1 − (x1 + x2 + x3) ≥ 0.

The feasible region is the tetrahedron with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0),
(0, 0, 1).
f (x) is a function
Chapter 2
Consider the one-dimensional optimization problem

    min f(x)                     (2.0.1)
    subject to a ≤ x ≤ b,        (2.0.2)

where the objective function f is to be minimized subject to the constraint
(2.0.2). If a = −∞ and b = ∞, then we have an unconstrained one-dimensional
optimization problem, which can be re-stated as:

    min f(x)                     (2.0.3)
    subject to x ∈ R,            (2.0.4)

where R denotes the real line.

Some Examples

Let us, first of all, consider the following 6 examples.

Example 2.1.1. Consider a situation where a company manufactures some
commodity (Q). Let q be the units of the commodity manufactured, let p =
p(q) be the sale price (which depends on q), and let c = c(q) be the total cost of
production (which also depends on q). Furthermore, let T = T(q) be the total
profit made by the company through the sale of the q units of the commodity
(Q). Clearly, the total profit = total revenue (pq) − total cost. Thus, we have

    T = pq − c.

Suppose

    p = 160 − 0.01q,

where q is the number of units produced each week and the price and cost are
measured in dollars. Then, we have

    Total revenue = pq = 160q − 0.01q².

Example 2.1.3. A farmer has hired a farm worker to put up a fence of 100
meters so as to enclose a rectangular region along a river. What is the largest
area that can be enclosed? Clearly,

    A = xy.

Thus, the optimization problem is:

    max A = xy
    subject to 2x + y = 100.

Substituting y = 100 − 2x,

    max A = x(100 − 2x) = 100x − 2x².

Note that

    A = −2(x² − 50x + 25² − 25²)
      = −2(x − 25)² + 2·25².

Hence, at x = 25 we have A = 1250, i.e.,

    A_max = 1250.
Example 2.1.4. A tracking device located at (−1, 0) is used to track a missile
which is descending along the path y² = x, (y ≥ 0). What angles of elevation θ
must the device be capable of? Since

    tan θ = y/(1 + x),

the problem is

    max θ = tan⁻¹( y/(1 + x) )
    subject to x − y² = 0,
               y ≥ 0.

Equivalently,

    max θ = tan⁻¹( y/(1 + y²) )
    subject to y ≥ 0.
Example 2.1.5. (The optimal number of trees.) N.B. This problem should be
formulated as a discrete optimization problem, as x only takes the values
1, 2, 3, . . .

Example 2.1.6. Tom is out of petrol at the location A in the figure. Tom
thinks about how to reach the location B in the shortest possible time. He is
capable of hiking 3 mph through the woods and jogging 5 mph on the road.
With C the point where he leaves the woods for the road, the total time is

    T = AC/Rw + CB/Rr = (1/3)(1 + x²)^(1/2) + (1/5)(3 − x),

where Rw = 3 and Rr = 5 are the hiking and jogging speeds. Thus, the problem is

    minimize T = (1/3)(1 + x²)^(1/2) + (1/5)(3 − x)
    subject to 0 ≤ x ≤ 3.

The optimal solution is x_opt = 3/4, for which

    T_opt = (1/3)(5/4) + (1/5)(9/4) hours = 25 + 27 minutes = 52 minutes.

Convexity (geometric view): a function f on [a, b] is convex when the straight
line connecting any two points of its graph lies above the graph. Notes: (i) if the
inequality in the definition of convexity is strict for all λ ∈ (0, 1), then f is
strictly convex; (ii) if the equality holds everywhere, then the graph must be a
straight line, and a straight line is convex.
Stationary point: any point at which

    df(x)/dx = 0.

A function f defined on [a, b] is said to attain a strict local minimum value
at x0 ∈ [a, b] if there exists an ε > 0 such that

    f(x0) < f(x)   ∀ x ∈ Nε(x0) \ {x0},

where Nε(x0) is called the ε-neighbourhood of x0, and is defined by

    Nε(x0) = {x ∈ [a, b] : |x − x0| < ε}.

A function f defined on [a, b] is said to attain a strict global minimum
value at x0 ∈ [a, b] if

    f(x0) < f(x)   ∀ x ∈ [a, b] \ {x0}.

A function f defined on [a, b] is said to attain a local minimum value at
x0 ∈ [a, b] if there exists an ε > 0 such that

    f(x0) ≤ f(x)   ∀ x ∈ Nε(x0).

A function f defined on [a, b] is said to attain a global minimum value at
x0 ∈ [a, b] if

    f(x0) ≤ f(x)   ∀ x ∈ [a, b].

Analytic Methods

Theorem 2.1 Let f be a function defined on (a, b) such that f′ is also continuous
on (a, b). A necessary condition for x* ∈ (a, b) to be a local optimum is that

    df(x*)/dx = 0   (i.e. x* is a stationary point).

Critical Point

A stationary point x* is classified by the first non-vanishing derivative there,
of order n say: if n is even, x* is a local minimum when

    f^(n)(x*) > 0

and a local maximum when f^(n)(x*) < 0.

Example 2.3.1

    f(x) = 3x³ − 36x,
    f′(x) = 9x² − 36 = 0  ⟹  x = ±2.

Example 2.3.2 Continuing Example 2.3.1: f″(x) = 18x, so f″(2) = 36 > 0 and
x = 2 is a local minimum, while f″(−2) = −36 < 0 and x = −2 is a local maximum.
Example 2.3.3

    f(x) = x⁴ − 4x³ + 6x² − 4x + 1 = (x − 1)⁴,

    df/dx   = 4(x − 1)³,
    d²f/dx² = 12(x − 1)²,
    d³f/dx³ = 24(x − 1),
    d⁴f/dx⁴ = 24.

At x = 1 the first non-vanishing derivative is d⁴f/dx⁴ = 24 > 0, so x = 1 is a
minimum, since n = 4 is even.

Example 2.3.4

    f(x) = (x − 1)³.

The first non-vanishing derivative at x = 1 is

    d³f/dx³ = 6,

and n = 3 is odd ⟹ x = 1 is not a local minimum.

Example 2.3.5. f(x) = x², −1 ≤ x ≤ 1. f is convex;

    df/dx = 2x,   d²f/dx² = 2 > 0,

and n = 2, which is even ⟹ x = 0 is a local minimum. But f is convex, so x = 0 is
also a global minimum.

Theorem 2.4 The global maximum of a convex function f(x) over a closed
interval a ≤ x ≤ b is either at x = a or x = b or both.

Note that similar results apply to concave functions.

Example 2.3.6. f(x) = x², −1 ≤ x ≤ 2: maximum at x = 2.

Example 2.3.7. f(x) = x², −2 ≤ x ≤ 1: maximum at x = −2.

Example 2.3.8. f(x) = x², −1 ≤ x ≤ 1: maximum at x = −1 and x = 1.
The choice of method depends on what is known about the function:

4.1 Function values given only at a set of n points (nothing else known).
Method: compare function values; the point which gives rise to the smallest
value is the estimate.

4.2 Function values can be computed at any chosen point: e.g. evaluate f at the
points a + m(b − a)/n, m = 0, 1, . . . , n, and compare as in 4.1.

4.3 Function is unimodal and piecewise continuous on a finite interval.
(We make use of the methods to be discussed below even if we do not know whether the
function is unimodal, provided we are happy with a local minimum.)

Definition 2.2 A function f of a single variable x on an interval [a, b] is said
to be convex if, for any x, y ∈ [a, b] and any λ ∈ [0, 1],

    f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).

4.4 (I) Equal Interval Search (dichotomous search)

This is shown in the following figure. On the normalized interval [0, 1], place two
points δ apart about the midpoint and evaluate f(1/2 − δ/2) and f(1/2 + δ/2).
If f(1/2 + δ/2) < f(1/2 − δ/2), reject [0, 1/2 − δ/2]: the interval of uncertainty is
reduced to 1/2 + δ/2.
Repeat this procedure by placing two points δ apart at the midpoint of the remaining
interval; the interval of uncertainty is reduced to

    (1/2)(1/2 + δ/2) + δ/2 = 1/4 + (3/4)δ,

then to 1/8 + (7/8)δ, and after n such stages to

    2⁻ⁿ + (1 − 2⁻ⁿ)δ.
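The stages above can be sketched in a few lines of code. This is a minimal Python illustration (the notes themselves suggest C or Matlab for programming exercises); the function name `dichotomous_search`, the test function (x − 1)², and the parameter values are illustrative choices, not from the notes.

```python
# Dichotomous (equal interval) search for a unimodal f on [a, b].
# After n stages the interval of uncertainty is about 2**(-n) + delta*(1 - 2**(-n))
# times the original length, as derived above.

def dichotomous_search(f, a, b, delta=1e-4, n=20):
    for _ in range(n):
        mid = 0.5 * (a + b)
        x1, x2 = mid - delta / 2, mid + delta / 2
        if f(x1) < f(x2):
            b = x2          # minimum lies in [a, x2]
        else:
            a = x1          # minimum lies in [x1, b]
    return a, b

a, b = dichotomous_search(lambda x: (x - 1.0) ** 2, 0.0, 3.0)
```

Note that two function evaluations are spent per stage, which is what the Fibonacci and golden section searches below improve upon.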
(II) Fibonacci Search

Let L1 = b − a and let F_k denote the Fibonacci numbers (F_0 = F_1 = 1,
F_k = F_{k-1} + F_{k-2}).

Stage 1: place

    x1 = b − τ2 L1,   x2 = a + τ2 L1,

where

    τ2 = F_{n-2}/F_n = F_{n-2}/(F_{n-1} + F_{n-2}) < 1/2.

Stage 2: compare. If f(x2) < f(x1), then a ≤ x_min ≤ x1; if f(x2) > f(x1), then
x2 ≤ x_min ≤ b. In either case we define

    L2 = L1 − τ2 L1 = L1(1 − τ2) = L1 F_{n-1}/F_n,

where L1 = b − a. The point retained inside the new interval sits at distance

    x1 − x2 = L1 − 2τ2 L1 = L1(1 − 2F_{n-2}/F_n)
            = L1 (F_{n-1} + F_{n-2} − 2F_{n-2})/F_n
            = L1 (F_{n-1} − F_{n-2})/F_n
            = L1 F_{n-3}/F_n
            = τ3 L2,   where τ3 = F_{n-3}/F_{n-1},

from the nearer end, so only one new evaluation, at the symmetric point, is needed.

General step:

    L_{k+1} = L_k − τ_{k+1} L_k = L_k (1 − τ_{k+1})
            = L_k (1 − F_{n-(k+1)}/F_{n-k+1})
            = L_k (F_{n-k+1} − F_{n-k-1})/F_{n-k+1}
            = L_k F_{n-k}/F_{n-k+1},

and hence, by induction,

    L_k = L_{k-1} F_{n-k+1}/F_{n-k+2}
        = L1 (F_{n-1}/F_n)(F_{n-2}/F_{n-1}) · · · (F_{n-k+1}/F_{n-k+2}),

i.e.

    L_k/L1 = F_{n-k+1}/F_n.

Take k = n:

    L_n = L1 F_1/F_n = L1/F_n.
Ratio of reduction:

    L_{n+1}/L_n = F_n/F_{n+1}.

Since F_{n+1} = F_n + F_{n-1},

    F_{n+1}/F_n = 1 + F_{n-1}/F_n.

Let n → ∞ and write F_{n+1}/F_n → t; then t = 1 + 1/t, i.e.

    t² = t + 1   or   t² − t − 1 = 0,

so

    t = (1 + √5)/2 ≈ 1.618   (positive root),
    F_n/F_{n+1} = 1/t ≈ 0.618.

For instance, with n = 5,

    τ3 = F_{5-3}/F_{5-1} = F_2/F_4 = 2/5.
Example

    minimize f(x) = x² − 6x + 2
    subject to 0 ≤ x ≤ 10.

Suppose we wish to reduce the interval of uncertainty to 15% of the original
interval (10 units), i.e. the final interval of uncertainty is 15% of 10. We
require

    L_n/L1 = 1/F_n < 15/100,

so F_n > 100/15 ≈ 6.7: take F_5 = 8, i.e. n = 5, with L1 = 10.

Stage 1:

    τ2 = F_{n-2}/F_n = F_3/F_5 = 3/8,
    x1 = 10 − (3/8)·10 = 6.25,   x2 = 0 + (3/8)·10 = 3.75,
    f(3.75) = −6.44,   f(6.25) = 3.56.

Stage 2: f(3.75) < f(6.25), so x_min ∈ [0, 6.25]. The retained point is 3.75;
the new point is its mirror image 2.5, and f(2.5) = −6.75 < f(3.75), so
x_min ∈ [0, 3.75].

Stage 3: the new point is 1.25, and f(1.25) = −3.94 > f(2.5), so
x_min ∈ [1.25, 3.75].

Stage 4: the two points now coincide at the midpoint 2.5; evaluating at 2.5 + δ
and comparing with f(2.5) gives the final interval [2.5, 3.75], of length
1.25 = L1/F_5 < 1.5 as required.
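The stages of this example can be reproduced by a short program. The following Python sketch (function names mine; the notes suggest C or Matlab for such exercises) performs the n − 2 comparison stages after the initial placement, reusing the symmetric point at each stage; the final midpoint-splitting evaluation with the small offset δ is omitted.

```python
# Fibonacci search sketch, reproducing the example above:
# f(x) = x**2 - 6*x + 2 on [0, 10] with n = 5 (F_5 = 8).

def fib(n):
    # F_0 = F_1 = 1, F_2 = 2, ... as in the notes
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fibonacci_search(f, a, b, n):
    x1 = a + fib(n - 2) / fib(n) * (b - a)   # lower interior point
    x2 = a + fib(n - 1) / fib(n) * (b - a)   # upper interior point
    f1, f2 = f(x1), f(x2)
    for _ in range(n - 2):
        if f1 < f2:              # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + (b - x2)    # mirror image of the retained point
            f1 = f(x1)
        else:                    # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = b - (x1 - a)
            f2 = f(x2)
    return a, b

a, b = fibonacci_search(lambda x: x * x - 6 * x + 2, 0.0, 10.0, 5)
```

Running it gives the interval [1.25, 3.75] from Stage 3, containing the true minimizer x = 3.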
(III) Golden Section Search

Here we require the reduction ratio to be the same at every step:

    L_{k+2}/L_{k+1} = L_{k+1}/L_k = τ = ratio of interval reduction per step.

From the figure,

    L_{k+1} = x3 − x1 = x4 − x2 = τ L_k,
    x4 − x3 = L_k − L_{k+1} = L_k − τL_k = (1 − τ)L_k,
    x3 − x2 = (x3 − x1) − (x2 − x1) = τL_k − (1 − τ)L_k = (2τ − 1)L_k.

Also, x3 − x2 must be the (1 − τ) fraction of L_{k+1} = τL_k, so

    2τ − 1 = τ(1 − τ),
    τ² + τ − 1 = 0,
    τ = (−1 + √5)/2 ≈ 0.6180,

the ratio of the Golden section.

Note: the new interval of uncertainty is equal to 0.618 of the previous one.
COMPARISON

Fibonacci search: at the kth step the interval of uncertainty is L1 F_{n-k+1}/F_n;
after n function evaluations (n steps, i.e., k = n) the total reduction is to L1/F_n.

Golden section search: each step reduces the interval by the fixed factor 0.618;
since F_{m-1}/F_m → 0.618 as m → ∞, for large n the two methods are nearly
identical.

Example. Apply the golden section search to f(x) = x² on x ∈ [−5, 15] (L1 = 20),
with endpoints x1 = −5 and x4 = 15:

    x2 = x1 + 0.618·20 = −5 + 12.36 = 7.36,
    x3 = x4 − 0.618·20 = 15 − 12.36 = 2.64.

Since f(2.64) < f(7.36), x_min ∈ [−5, 7.36]. Continuing:

    x5 = x2 − 0.618·12.36 = 7.36 − 7.638 = −0.28,   f(−0.28) < f(2.64)  ⟹ x_min ∈ [−5, 2.64];
    x6 = x3 − 0.618·7.638 = 2.64 − 4.72 = −2.08,    f(−0.28) < f(−2.08) ⟹ x_min ∈ [−2.08, 2.64];
    x7 = x6 + 0.618·4.72 = −2.08 + 2.917 = 0.84,    f(−0.28) < f(0.84)  ⟹ x_min ∈ [−2.08, 0.84];
    x8 = x7 − 0.618·2.917 = 0.84 − 1.803 = −0.96,   f(−0.28) < f(−0.96) ⟹ x_min ∈ [−0.96, 0.84];
    x9 = x8 + 0.618·1.803 = −0.96 + 1.114 = 0.15,   f(x9) = (0.15)² ≈ 0.023.

Since f(x9) < f(x5), x_min ∈ [−0.28, 0.84].
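The example above can be automated directly. A minimal Python sketch of the golden section search follows (again Python in place of the C/Matlab the notes mention; function and parameter names are mine), reusing one interior point per step exactly as in the hand computation:

```python
import math

# Golden section search, as in the example above: f(x) = x**2 on [-5, 15].
# tau = (sqrt(5) - 1)/2 = 0.618...; each step keeps 0.618 of the interval.

def golden_section(f, a, b, n_evals=20):
    tau = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = b - tau * (b - a)   # lower interior point
    x2 = a + tau * (b - a)   # upper interior point
    f1, f2 = f(x1), f(x2)
    for _ in range(n_evals - 2):
        if f1 < f2:          # minimum in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - tau * (b - a)
            f1 = f(x1)
        else:                # minimum in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a)
            f2 = f(x2)
    return a, b

a, b = golden_section(lambda x: x * x, -5.0, 15.0)
```

The point reuse works because 1 − τ = τ², so the retained interior point already sits at the golden ratio position of the reduced interval.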
5

5.1 Newton's Method

Suppose f(x) is smooth, so that f′ and f″ exist. At any xk ∈ [a, b], f(x) can
be approximated locally by the following truncated Taylor expansion at xk:

    q(x) = f(xk) + f′(xk)(x − xk) + (1/2) f″(xk)(x − xk)².

Setting

    q′(x) = f′(xk) + (1/2) f″(xk) · 2(x − xk) = 0

gives the next iterate

    x_{k+1} := x = xk − f′(xk)/f″(xk).

Equivalently, writing g := f′ (so that we are solving g(x) = f′(x) = 0),

    x_{k+1} = xk − g(xk)/g′(xk),   k = 0, 1, . . .

Convergence: near a solution x*,

    x_{k+1} − x* = xk − x* − (g(xk) − g(x*))/g′(xk),

and a Taylor expansion of the right-hand side shows |x_{k+1} − x*| ≤ C|xk − x*|².
Hence if |xk − x*| < γ/C with γ < 1, we have

    |x_{k+1} − x*| < γ|xk − x*|   (0 < γ < 1):

the mapping xk → x_{k+1} is a contraction, so xk − x* → 0, i.e. xk → x* (and the
convergence is in fact quadratic).

Newton's method uses information at only xk and thus needs f(xk), f′(xk) and
f″(xk).

Problem. Write a small C or Matlab program to implement Newton's
method for an arbitrary f(x), and test your code with

    f(x) = x⁴ − 4x + 1,   x ∈ [0, 1.5].    (2.5.5)
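One possible solution to the programming problem is sketched below in Python rather than C or Matlab (the function name `newton_min` and the starting point 1.5 are my choices); it applies the iteration x_{k+1} = xk − f′(xk)/f″(xk) to the test function (2.5.5).

```python
# Newton's method for minimizing f by solving f'(x) = 0,
# tested on f(x) = x**4 - 4*x + 1, for which f'(x) = 4x^3 - 4, f''(x) = 12x^2.

def newton_min(df, d2f, x, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

x_star = newton_min(lambda x: 4 * x**3 - 4,   # f'
                    lambda x: 12 * x**2,      # f''
                    x=1.5)
```

Starting from x = 1.5 the iterates converge rapidly to the minimizer x = 1.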
5.2 The Secant Method

If we use more points, then less information is required at each point. We now
replace f″(xk) in Newton's formula by the divided difference

    (f′(x_{k-1}) − f′(xk)) / (x_{k-1} − xk),

i.e. we use the local quadratic model

    q(x) = f(xk) + f′(xk)(x − xk) + (1/2) · (f′(x_{k-1}) − f′(xk))/(x_{k-1} − xk) · (x − xk)²,

which requires only x_{k-1}, f′(x_{k-1}) and xk, f′(xk). Setting q′(x) = 0 and
writing g := f′ gives

    x_{k+1} = xk − g(xk)(xk − x_{k-1}) / (g(xk) − g(x_{k-1})).

Theorem.  |x_{k+1} − x*| ~ |xk − x*|^{1.618}.

PROOF. Similar to that for Newton's method. Let g[x, y] denote the divided
difference defined by

    g[x, y] = (g(y) − g(x))/(y − x)   (g[x, x] = g′(x)).

Then

    x_{k+1} − x* = xk − x* − g(xk)/g[x_{k-1}, xk]
                = (xk − x*)(1 − g[xk, x*]/g[x_{k-1}, xk])
                = (xk − x*)(g[x_{k-1}, xk] − g[xk, x*]) / g[x_{k-1}, xk].

Using

    g[x_{k-1}, xk] = g′(x*) + O(ε_{k-1} + εk),   εk = xk − x*,

and similarly

    g[x_{k-1}, xk] − g[xk, x*] = (1/2) g″(x*)(x_{k-1} − x*) + O(ε²_{k-1}),

we obtain

    x_{k+1} − x* = (g″(x*)/(2g′(x*)))(xk − x*)(x_{k-1} − x*) + O(εk²) + O(ε²_{k-1}),

i.e.

    |x_{k+1} − x*| ≈ |M (xk − x*)(x_{k-1} − x*)|,   where M = g″(x*)/(2g′(x*)).

Let δk = M(xk − x*). We have

    δ_{k+1} = M(x_{k+1} − x*) = M²(xk − x*)(x_{k-1} − x*) = δk δ_{k-1}.

Taking logarithms, yk = −log δk satisfies y_{k+1} = yk + y_{k-1},
which is the Fibonacci difference equation. From the previous discussion we know
that

    y_{k+1}/yk → 1/0.618 ≈ 1.618,

hence log δ_{k+1} ≈ 1.618 log δk, i.e. δ_{k+1} ≈ δk^{1.618}, and so

    |x_{k+1} − x*| ~ |xk − x*|^{1.618}.  ∎
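The secant update can be sketched just as compactly as Newton's method; only f′ is needed. The Python below (names and starting points are my illustrative choices) again uses the test function f(x) = x⁴ − 4x + 1:

```python
# Secant method for f'(x) = 0: f''(x_k) in Newton's formula is replaced by
# the divided difference of f' at the two most recent points.

def secant_min(df, x0, x1, tol=1e-12, max_iter=100):
    g0, g1 = df(x0), df(x1)
    for _ in range(max_iter):
        if abs(g1 - g0) < 1e-300:
            break                      # derivative values coincide; stop
        x2 = x1 - g1 * (x1 - x0) / (g1 - g0)
        x0, g0 = x1, g1
        x1, g1 = x2, df(x2)
        if abs(x1 - x0) < tol:
            break
    return x1

x_star = secant_min(lambda x: 4 * x**3 - 4, 0.0, 1.5)
```

The observed convergence is superlinear of order about 1.618, as the theorem above predicts.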
6

6.1 Quadratic Interpolation (Powell's Method)

Given three points x0, x1, x2 with x_min ∈ (x0, x2) and function values f0, f1, f2,
fit a quadratic

    y = ax² + bx + c:

    a x0² + b x0 + c = f(x0) = f0,    (2.6.6)
    a x1² + b x1 + c = f(x1) = f1,    (2.6.7)
    a x2² + b x2 + c = f(x2) = f2.    (2.6.8)

Estimate the minimum of f(x) as the minimum of the quadratic. That is,

    dy/dx = 2ax + b = 0  ⟹  x = −b/(2a).

Solve Eqs. (2.6.6), (2.6.7) and (2.6.8): subtracting (2.6.6) from (2.6.7) and
(2.6.8),

    a(x1² − x0²) + b(x1 − x0) = f1 − f0,    (2.6.13)
    a(x2² − x0²) + b(x2 − x0) = f2 − f0.    (2.6.14)

Eliminating between (2.6.13)·(x2 − x0) and (2.6.14)·(x1 − x0) yields a and b, and
hence

    xm = −b/(2a)
       = (1/2) [f0(x1² − x2²) + f1(x2² − x0²) + f2(x0² − x1²)]
            / [f0(x1 − x2) + f1(x2 − x0) + f2(x0 − x1)].    (2.6.9)

We need to test whether the new point is a minimum or not. That is,

    d²y/dx² |_{x=xm} = 2a > 0.

The iteration uses xm as follows (let x̄ denote whichever of x1, xm has the
smaller function value):

4. If xm is within a small prescribed distance ε of x1, take

    min{f(xm), f(x1)}

as the minimum and stop; otherwise continue with steps 6–8 below.
6. If x̄ = x1 and x1 ∈ [xm, x2]: discard x0 and replace it by xm. Return to
Step 2.

7. If x̄ = xm and xm ∈ [x0, x1]: discard x2 and re-name the remaining 3 points
as: x0 := x0, x1 := xm, x2 := x1. Return to Step 2.

8. If x̄ = xm and xm ∈ [x1, x2]: discard x0 and re-name the remaining 3 points
as: x0 := x1, x1 := xm, x2 := x2. Return to Step 2.

6.2 Davidon's Cubic Interpolation Method

It is generally better than Powell's method if the derivatives of f(x) are easy
to evaluate. Consider the problem:

    min f(x) along x = x0 + α,

where x0 is the current point. Let f0 = f(x0) and fα = f(x0 + α), where α is a
given value of the step.
Suppose we know

    G0 = (df/dα)|_{α=0} = f′0,  with G0 < 0,   and   Gα = (df/dα)|_α.

[Note: To cover the case where G0 > 0, i.e. the minimum is to the left, use
fα = f(x0 − α).]

The method has three phases:
(a) the order of magnitude of αm, the minimizing value of α, is established;
(b) upper and lower bounds are found for αm;
(c) cubic interpolation is used for more precise bounds.

(a) Initial approximation to αm, viz.

    α̂ = min{ K, 2(f0 − fe)/(−G0) },

where K is some representative magnitude for the problem (usually K = 2), and
fe is a preliminary estimate (low rather than high) of f(x0 + αm).

Example (quadratic interpolation).

    f(x) = x⁴ − 4x + 1.

Choose x0 = 0, h = 0.5, ε = 0.02, and tabulate:

    x    :  0      0.5       1      1.5
    f(x) :  1    −0.9375    −2    0.0625

The values at 0.5, 1, 1.5 bracket the minimum. Taking x0 = 0.5, x1 = 1,
x2 = 1.5 in (2.6.9):

    xm = (1/2) [−0.9375(1² − 1.5²) + (−2)(1.5² − 0.5²) + 0.0625(0.5² − 1²)]
             / [−0.9375(1 − 1.5) + (−2)(1.5 − 0.5) + 0.0625(0.5 − 1)]
       = (1/2)(−2.875)/(−1.5625)
       = 0.92,

    f(xm) = f(0.92) = −1.9636071.

Checking the conditions for a minimum: reject x = 0.5 (even though f(0.5) is a
bracket minimum), so that

    x0 = 0.92,   x1 = 1,   x2 = 1.5,
    f0 = −1.9636071,   f1 = −2,   f2 = 0.0625.

Repeating,

    xm = (1/2) [−1.9636071(1² − 1.5²) + (−2)(1.5² − 0.92²) + 0.0625(0.92² − 1²)]
             / [−1.9636071(1 − 1.5) + (−2)(1.5 − 0.92) + 0.0625(0.92 − 1)]
       = 0.98880519,   f(xm) = −1.99925.
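The interpolation formula (2.6.9) is easy to check numerically. The Python fragment below (function name mine) reproduces the first step of the worked example:

```python
# One step of quadratic interpolation, formula (2.6.9), on the worked example:
# (x0, x1, x2) = (0.5, 1, 1.5) for f(x) = x**4 - 4*x + 1 gives xm = 0.92.

def quad_min(x0, x1, x2, f0, f1, f2):
    num = f0 * (x1**2 - x2**2) + f1 * (x2**2 - x0**2) + f2 * (x0**2 - x1**2)
    den = f0 * (x1 - x2) + f1 * (x2 - x0) + f2 * (x0 - x1)
    return 0.5 * num / den

f = lambda x: x**4 - 4 * x + 1
xm = quad_min(0.5, 1.0, 1.5, f(0.5), f(1.0), f(1.5))   # 0.92
```

In a full implementation one would also verify 2a > 0 before accepting xm, as required above.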
(*) Validity of the initial estimate: if the function to be minimized is a
quadratic f(x) = ax² + bx + c, then with fe the minimum value c − b²/(4a) and
G0 = 2ax0 + b,

    2(f0 − fe)/(−G0) = 2(ax0² + bx0 + c − c + b²/(4a)) / (−(2ax0 + b))
                     = (4a²x0² + 4abx0 + b²) / (−2a(2ax0 + b))
                     = (2ax0 + b)² / (−2a(2ax0 + b))
                     = −b/(2a) − x0 = αm,

the exact step to the minimizer; so the estimate 2(f0 − fe)/(−G0) is valid.

Interpolation Formula. Assume

    y(α) = f0 + G0 α + y2 α² + y3 α³,    (2.6.19)
    y′(α) = G0 + 2y2 α + 3y3 α²,         (2.6.20)

and impose the four conditions

    y(0) = f0,   (2.6.15)      y′(0) = G0,   (2.6.16)
    y(α̂) = fα,  (2.6.17)      y′(α̂) = Gα.  (2.6.18)

Clearly, (2.6.19) and (2.6.20) satisfy (2.6.15) and (2.6.16) respectively. Setting
α = α̂ and using (2.6.17) and (2.6.18) we have

    y2 α² + y3 α³ = fα − f0 − G0 α,
    2y2 α + 3y3 α² = Gα − G0,

whose solution is

    y2 α = −(G0 + Z),
    y3 α² = (1/3)(G0 + Gα + 2Z),

where

    Z = (3/α)(f0 − fα) + G0 + Gα.

(2.6.20) is then

    y′(θ) = G0 − 2(G0 + Z)(θ/α) + (G0 + Gα + 2Z)(θ/α)².

To find αm, set y′(αm) = 0. Solving the quadratic in αm/α,

    αm/α = (G0 + Z ± W)/(G0 + Gα + 2Z),

where

    W = (Z² − G0 Gα)^{1/2},

since (G0 + Z)² − G0(G0 + Gα + 2Z) = Z² − G0 Gα. Which sign? Consider

    y″(θ) = −2(G0 + Z)/α + 2(G0 + Gα + 2Z)θ/α²,

so

    y″(αm) = (2/α)[−(G0 + Z) + (G0 + Gα + 2Z)(αm/α)] = ±2W/α:

the + sign makes y″(αm) = 2W/α > 0, so αm is a minimum and

    αm/α = (G0 + Z + W)/(G0 + Gα + 2Z).

This is usually rewritten as

    αm/α = 1 − (Gα + W − Z)/(Gα − G0 + 2W),    (2.6.21)

since

    1 − (Gα + W − Z)/(Gα − G0 + 2W) = (W − G0 + Z)/(Gα − G0 + 2W),

and the identity (W + Z − G0)(G0 + Gα + 2Z) = (G0 + Z + W)(Gα − G0 + 2W),
which follows from 2W² = 2(Z² − G0 Gα), shows the two expressions agree. The
form (2.6.21) is the one used in practice.
Algorithm

1. Choose K (normally use K = 2) and a preliminary estimate fe.
2. Set α = min{K, 2(f0 − fe)/(−G0)}.
3. Evaluate fα = f(x0 + α) and Gα = f′(x0 + α). If Gα > 0 or if fα > f0, go
   to step 5; otherwise go to step 4.
4. Replace α by 2α, evaluate fα and Gα, and return to step 3.
5. Interpolate: compute
   (a) Z = (3/α)(f0 − fα) + G0 + Gα,
   (b) W = (Z² − G0 Gα)^{1/2},
   (c) αm = α[1 − (Gα + W − Z)/(Gα − G0 + 2W)].
6. Evaluate f(x0 + αm) and f′(x0 + αm).
7. If the required accuracy ε is not yet attained, return to step 5 on the
   subinterval that brackets the minimum.

Example

    f(x) = x⁴ − 4x + 1,   for ε = 0.001.

Solution

    f0 = f(0) = 1,   G0 = f′(0) = −4 < 0.

2. With K = 2, take α = 2:

    fα = f2 = f(2) = (2)⁴ − 4·2 + 1 = 9,
    Gα = G2 = f′(2) = 4·(2)³ − 4 = 28 > 0,

so we interpolate (step 5):

    Z = (3/2)(f0 − f2) + G0 + G2 = (3/2)(1 − 9) + (−4) + 28 = −12 − 4 + 28 = 12,
    W = (Z² − G0 G2)^{1/2} = (144 − (−4)(28))^{1/2} = (256)^{1/2} = 16,

    αm/α = 1 − (G2 + W − Z)/(G2 − G0 + 2W)
         = 1 − (28 + 16 − 12)/(28 + 4 + 32)
         = 1 − 32/64 = 1/2,

so αm = 1. Then

    f(0 + 1) = −2   and   G1 = f′(0 + 1) = 0.

Repeating step 5 on the interval [0, 1], with α = 1:

    Z = (3/1)(f0 − f1) + G0 + G1 = 3(1 − (−2)) + (−4) + 0 = 5,
    W = (Z² − G0 G1)^{1/2} = (25 − (−4)(0))^{1/2} = 5,
    αm/1 = 1 − (G1 + W − Z)/(G1 − G0 + 2W)
         = 1 − (0 + 5 − 5)/(0 − (−4) + 10) = 1 − 0/14 = 1,

so αm = 1, as before ⟹ stop.

Minimum at x = 1, f(1) = −2.

Check analytically:

    df/dx = 4x³ − 4 = 0  ⟹  x = 1,
    d²f/dx² |_{x=1} = 12x² |_{x=1} = 12 > 0,

so x = 1 is a minimum.
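Step 5 of the algorithm is a single closed-form computation; the Python fragment below (function name mine) reproduces the first interpolation of the worked example:

```python
import math

# Davidon's cubic interpolation step (2.6.21), on the example above:
# f(x) = x**4 - 4*x + 1 from x0 = 0 with trial step alpha = 2 gives alpha_m = 1.

def cubic_step(alpha, f0, g0, fa, ga):
    Z = 3.0 * (f0 - fa) / alpha + g0 + ga
    W = math.sqrt(Z * Z - g0 * ga)
    return alpha * (1.0 - (ga + W - Z) / (ga - g0 + 2.0 * W))

f = lambda x: x**4 - 4 * x + 1
df = lambda x: 4 * x**3 - 4
am = cubic_step(2.0, f(0.0), df(0.0), f(2.0), df(2.0))   # 1.0
```

Here Z = 12 and W = 16, matching the hand computation.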
Chapter 3
Introduction
    min_{x∈Rn} f(x)
    subject to
    (i)  gi(x) ≤ 0,   i = 1, 2, . . . , m,
    (ii) gi(x) = 0,   i = m + 1, m + 2, . . . , r,

where x = (x1, x2, . . . , xn)^T. The constraints (i) and (ii) form a feasible region
for the optimization problem. We denote it by Ω.

Global minimum: x* is said to be a global minimum if

    f(x*) ≤ f(x)   ∀ x ∈ Ω,

and a strict global minimum if f(x*) < f(x) for all x ∈ Ω \ {x*}.
Convex Set. A set C is said to be convex if for any two points in C the line
segment joining the two points is also in C. Mathematically, for any x, y ∈ C,

    λx + (1 − λ)y ∈ C,   ∀ λ ∈ [0, 1].

Convexity. A function f(x) is said to be convex in a convex set X ⊆ Rn if

    f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)   ∀ x, y ∈ X, λ ∈ [0, 1].

3. If f1, . . . , fm are convex on X and a1, . . . , am ≥ 0, then

    Σ_{i=1}^{m} ai fi(x)

is also convex on X.

4. If f(x) is convex on Rn, then the set

    Ω := {x ∈ Rn : f(x) ≤ b},   b ∈ R,

is a convex set.

PROOF. Let x, y ∈ Ω, i.e.,

    f(x) ≤ b   and   f(y) ≤ b.

Consider

    z = λx + (1 − λ)y,   λ ∈ [0, 1].

Then

    f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) ≤ λb + (1 − λ)b = b,

so z ∈ Ω. ∎

Taylor's Theorem

If f(x) is continuous and has continuous first partial derivatives over an open
convex set X in Rn, where x = (x1, . . . , xn)^T, then for any two points x and
y = x + h in X there exists a λ, 0 ≤ λ ≤ 1, such that

    f(y) = f(x) + g(λx + (1 − λ)y)^T h,

where

    g(z) = ∇f(z),  z ∈ X,   and   h = (h1, h2, . . . , hn)^T.

If f also has continuous second partial derivatives, then

    f(y) = f(x) + (g(x))^T h + (1/2) h^T G(λx + (1 − λ)y) h

for a λ ∈ (0, 1), where G(z) denotes the Hessian of the function f, e.g. in two
variables

    G = [ ∂²f/∂x²    ∂²f/∂x∂y
          ∂²f/∂x∂y   ∂²f/∂y²  ].

A necessary condition for x0 to be a local minimum is

    g(x0) = ∇f(x0) = 0.    (3.1.1)

This is equivalent to

    ∂f(x0)/∂xj = 0,   j = 1, 2, . . . , n.    (3.1.2)

PROOF (of the second-order condition). At x0 we have from (3.1.1)

    f(x0 + h) − f(x0) = (1/2) h^T G(λx0 + (1 − λ)(x0 + h)) h    (3.1.3)

because g(x0) = 0. Assume that the second partial derivatives of f are continuous.
Then, there exists an ε > 0 such that both

    ∂²f(x0)/∂xi∂xj   and   ∂²f(λx0 + (1 − λ)(x0 + h))/∂xi∂xj

differ by arbitrarily little for ‖h‖ < ε; hence if G(x0) is positive definite,
the right-hand side of (3.1.3) is positive for all small h ≠ 0, and x0 is a strict
local minimum.
Quadratic forms. Consider

    V(x) = x^T P x,

where P = (Pij) is an n × n matrix. Since V(x) is a scalar,

    x^T P x = (x^T P x)^T = x^T P^T x,

so

    V(x) = x^T ((P + P^T)/2) x.

Clearly, (1/2)(P + P^T) is symmetric. Thus, we may just as well assume that P is
symmetric:

    P = [ P11  P12  · · ·  P1n
          P12  P22  · · ·  P2n
           ·    ·           ·
          P1n  P2n  · · ·  Pnn ].

A symmetric P is positive definite if and only if all its leading principal
minors are positive:

    P11 > 0,   det[ P11 P12; P12 P22 ] > 0,   . . . ,   det P > 0.

Example

    P = [ 10   1  −2
           1   4   1
          −2   1   1 ].

The leading principal minors are 10 > 0, 39 > 0 and det P = 9 > 0, so P is
positive definite.

Example. Find the minimum of

    f(x) = 2x1² + 3x2² + 4x3² − 8x1 − 12x2 − 24x3 + 10.

    ∂f/∂x1 = 4x1 − 8 = 0  ⟹  x1 = 2,
    ∂f/∂x2 = 6x2 − 12 = 0 ⟹  x2 = 2,
    ∂f/∂x3 = 8x3 − 24 = 0 ⟹  x3 = 3,

so

    x0 = (2, 2, 3)^T.

The second derivatives are

    ∂²f/∂x1² = 4,   ∂²f/∂x2² = 6,   ∂²f/∂x3² = 8,   ∂²f/∂xi∂xj = 0 for i ≠ j,

so

    G = [ 4 0 0
          0 6 0
          0 0 8 ],

which is positive definite; hence x0 is the minimum.
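The leading-minor test and the stationary-point computation above are both mechanical; here is a small self-contained Python check (helper names `det` and `is_positive_definite` are mine, and the matrix `P` is the example matrix as reconstructed above):

```python
# Sylvester's criterion: a symmetric matrix is positive definite iff all of its
# leading principal minors are positive.

def det(M):
    # determinant by cofactor expansion along the first row (small matrices only)
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def is_positive_definite(P):
    return all(det([row[:k] for row in P[:k]]) > 0 for k in range(1, len(P) + 1))

P = [[10, 1, -2], [1, 4, 1], [-2, 1, 1]]   # minors: 10, 39, 9 -> positive definite
G = [[4, 0, 0], [0, 6, 0], [0, 0, 8]]      # Hessian of the example f
x_star = (8 / 4, 12 / 6, 24 / 8)           # solves grad f = 0: (2, 2, 3)
```

For large matrices one would use a Cholesky factorization instead of minors, but the minor test matches the criterion stated in the notes.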
Note: f(x) is concave on X iff (if and only if) −f is convex on X. All the
results here can be extended to concave functions.

Let us consider the problem: find the minimum of f(x), x ∈ Ω, where Ω is the
feasible region and x = (x1, . . . , xn)^T.

Theorem 3.2 Let f(x) be a convex function over a closed set X in Rn. Then,
any local minimum of f(x) in X is also the global minimum of f(x) over X.

PROOF. Let x0 be a local minimum. Assume x0 is not a global minimum.
Then, we can find y such that f(y) < f(x0). Now, consider, for λ ∈ [0, 1],

    f(λx0 + (1 − λ)y) ≤ λf(x0) + (1 − λ)f(y)
                      < λf(x0) + (1 − λ)f(x0)
                      = f(x0).

Note λ is arbitrary. We may choose λ sufficiently close to 1 such that
λx0 + (1 − λ)y ∈ Nε(x0). This violates the assumption that x0 is a local
minimum. ∎

The choice of method depends again on what is known about the function f(x).

Two Types of Methods

(1) Methods requiring only function values.
(2) Methods using gradient information — much more efficient.

2.1 Exhaustive Search

    min f(x),   x = (x1, x2)^T
    subject to 0 ≤ x1 ≤ b1,  0 ≤ x2 ≤ b2.

Set h1 = b1/m1, h2 = b2/m2, and evaluate f over the resulting grid. This can
take a long time — it is essential to use the most efficient line search method,
such as Fibonacci, along each direction. But even then, it is very inefficient.
2.2 Univariate Method

    min f(x),   x = (x1, x2, x3, . . .)^T.

Minimize along one coordinate direction at a time, holding the other variables
fixed, and cycle through the directions.

Example

    max f(x1, x2) = 10 − 2(1 − x1)² − (1 − x2)².

Use the univariate method, starting point x(0) = (0, 0)^T.

Solution

First, search along the x1 direction; the objective function becomes

    f(x1(0) + λ1, x2(0)) = f(λ1, 0) = 10 − 2(1 − λ1)² − 1 = 9 − 2(1 − λ1)².

Clearly, λ1 = 1 maximizes f(λ1, 0), so

    x(1) = (1, 0)^T.

Next, search along the x2 direction:

    f(x1(1), x2(1) + λ2) = f(1, λ2) = 10 − (1 − λ2)².

λ2 = 1 maximizes f(1, λ2), so

    x(2) = (1, 1)^T.   (1 iteration)

Now, come back to search along the x1 direction (i.e. we start the second
iteration):

    f(x1(2) + λ1, x2(2)) = f(1 + λ1, 1) = 10 − 2(1 − 1 − λ1)² − (1 − 1)² = 10 − 2λ1²,

and λ1 = 0 maximizes f(1 + λ1, 1), so x(3) = (1, 1)^T. Similarly,

    f(1, 1 + λ2) = 10 − 2(1 − 1)² − (1 − 1 − λ2)² = 10 − λ2²,

and λ2 = 0 maximizes f(1, 1 + λ2), so

    x(4) = (1, 1)^T.   (2nd iteration)

Since x(4) = x(2), no further progress can be made: the maximum is at (1, 1).

2.3 Simplex Method

1. Start with any point x(0) = [x1(0), x2(0)]^T.
2. Form an equilateral triangle of side h on x(0), e.g. with the other two
   vertices

    x(2) = (x1(0) + h, x2(0))^T   and   x(3) = (x1(0) + h/2, x2(0) + (√3/2)h)^T.

3. Evaluate f at the three vertices of the triangle.
4. Find the point with the largest function value (say, x(1)).
5. Reflect x(1) in the midpoint of the opposite side to obtain a new vertex, and
   replace x(1) by it.
6. Evaluate f at the new vertex.
7. Go back to step 4.

Difficulties

(a) Oscillation — occurs if the newly generated vertex itself has the largest
function value, so that

    f(x(k+3)) > f at the other vertices   and   x(k+3) = x(k+2)'s reflection back,

and the simplex flips back and forth between two positions.
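The univariate method above can be sketched generically, pairing it with any 1-D line search. In the Python sketch below (function names, the search bracket [−10, 10], and the iteration counts are my illustrative choices), each coordinate maximization is done with a crude golden-section search:

```python
# Univariate (coordinate) search on the example above:
# max f(x1, x2) = 10 - 2*(1 - x1)**2 - (1 - x2)**2 from (0, 0).

def line_max(phi, a=-10.0, b=10.0, n=60):
    # golden-section MAXIMIZATION of a unimodal phi on [a, b]
    tau = 0.6180339887498949
    x1, x2 = b - tau * (b - a), a + tau * (b - a)
    p1, p2 = phi(x1), phi(x2)
    for _ in range(n):
        if p1 > p2:                      # maximum in [a, x2]
            b, x2, p2 = x2, x1, p1
            x1 = b - tau * (b - a); p1 = phi(x1)
        else:                            # maximum in [x1, b]
            a, x1, p1 = x1, x2, p2
            x2 = a + tau * (b - a); p2 = phi(x2)
    return 0.5 * (a + b)

def univariate_max(f, x, n_cycles=2):
    x = list(x)
    for _ in range(n_cycles):
        for i in range(len(x)):          # maximize along coordinate i
            x[i] = line_max(lambda t: f(*(x[:i] + [t] + x[i + 1:])))
    return x

f = lambda x1, x2: 10 - 2 * (1 - x1)**2 - (1 - x2)**2
sol = univariate_max(f, [0.0, 0.0])
```

As in the hand computation, the first cycle already reaches (1, 1) up to the line-search tolerance.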
2.4 Pattern Search

    min f(x)
    subject to x ∈ Rn.

Starting from a base point B(0), make exploratory moves of step D(j) along the
jth coordinate direction, j = 1, . . . , n. With T0(0) = B(0),

    T1(0) = B(0) + D(1),  if f(B(0) + D(1)) < min{f(B(0)), f(B(0) − D(1))};
            B(0) − D(1),  if f(B(0) − D(1)) < min{f(B(0)), f(B(0) + D(1))};
            B(0),         if f(B(0)) < min{f(B(0) + D(1)), f(B(0) − D(1))};

and in general Tj(0) is obtained from Tj−1(0) in the same way:

    Tj(0) = Tj−1(0) ± D(j) or Tj−1(0),
            the last case when f(Tj−1(0)) < min{f(Tj−1(0) − D(j)), f(Tj−1(0) + D(j))}.

The last temporary position is designated the second base point B(1), i.e.,

    B(1) = Tn(0).

All these exploratory moves, which determine the movement from B(0) to B(1),
establish a pattern move. Now, instead of exploring around B(1) in a similar
fashion, we assume that the pattern may persist, and start the next temporary
search not at B(1) but at the point

    T0(1) = 2B(1) − B(0),

i.e. at distance 2(B(1) − B(0)) from B(0). A local exploration is now carried
out around T0(1), and the equations for determining Tj(1) for j = 1, . . . , n are
the same as the equations for Tj(0) with the superscript 1 replacing 0. Then, if
the final temporary position satisfies

    f(Tn(1)) < f(B(1)),

the new base point is

    B(2) = Tn(1);

otherwise, if f(T0(1)) ≥ f(B(1)) and no exploratory move improves on B(1), we
return to the old base point.

To enable the step size to be automatically adjusted, the step sizes Δxi are
halved when no improvement can be made around some T0(k), and the whole
procedure is repeated until the required accuracy is obtained.

Example

Choose Δx1 = Δx2 = 1, and let [0, 0]^T be the initial base point:

    B(0) = T0(0) = [0, 0]^T.

Evaluating the base point and exploring leads to the base point B(1) = [1, 1]^T,
and then to the new base point

    B(2) = [2, 3]^T,   with f(2, 3) = −4 (accept).

The pattern move gives the temporary position

    T0(2) = 2B(2) − B(1) = [4, 6]^T − [1, 1]^T = [3, 5]^T,   with f(3, 5) = 21.

Explore around T0(2) = [3, 5]^T: no improvement on B(2) is found, and the new
base point is B(3) = [2, 3]^T = B(2), as f(2, 4) > f(2, 3). Explore around the
base point B(3) = T0(3) = [2, 3]^T:

    min{f(3, 3), f(1, 3)} = min{5, 5} = 5 > f(2, 3) — reject.

The new base point is B(4) = [2, 3]^T = B(3) = B(2). Since f(T2(4)) > f(B(4)),
we are returned to the old base as B(5) = B(4). We would now search around B(5),
with the step size halved, and repeat the procedure until a desired accuracy is
achieved.
Gradient-type Methods

    min_{x∈Rn} f(x).

Let

    g(k) = ∇f(x(k)),   G(k) = ∇²f(x(k)).

Consider the function f(x) along the line x(λ) = x(k) + λs(k). Then, f(x(λ))
may be regarded as a function of λ alone, with slope

    df(x(λ))/dλ = (s(k))^T ∇f(x(k) + λs(k)) = (s(k))^T g(x(λ)),

so that

    df(x(λ))/dλ |_{λ=0} = (s(k))^T g(x(k)).

If (s(k))^T g(k) < 0 — i.e. the slope of f(x(k) + λs(k)) is negative at λ = 0 —
then s(k) is called a descent direction, as the function value can always be
reduced in a line search for some λ > 0.

Descent Algorithms

Given an initial estimate x(0), the kth iteration is:

(i) generate a search direction s(k);
(ii) test for convergence, stopping if the test is satisfied;
(iii) find λ(k) to minimize f(x(k) + λs(k)) with respect to λ (line search);
(iv) set x(k+1) = x(k) + λ(k)s(k).

Remarks

(1) This is a general structure within which most of the good methods lie.
(2) Different methods arise from different ways of generating the search
direction s(k).
(3) The line search is idealized, in that an exact line search is impossible in
practice.

A necessary condition for λ(k) to minimize f(x(k) + λs(k)) is that

    df(x(k) + λs(k))/dλ = 0   at λ = λ(k),

but

    df(x(k) + λs(k))/dλ = (∇f(x(k) + λs(k)))^T s(k),

so a necessary condition for an exact line search is:

    (∇f(x(k) + λ(k)s(k)))^T s(k) = (g(k+1))^T s(k) = 0.

3.1 The Method of Steepest Descent

Choose

    s(k) = −g(k)/‖g(k)‖.

Remarks:

(1) simple — needs only values of the function and its gradient;
(2) global convergence to a stationary point;
(3) the convergence can be very slow.

The direction s = −g(k)/‖g(k)‖ solves

    min s^T g(k)   subject to ‖s‖ = 1.

PROOF. s^T g(k) = ‖s‖ ‖g(k)‖ cos θ ≥ −‖g(k)‖, with equality exactly when
θ = π, i.e. s = −g(k)/‖g(k)‖. ∎
Example

    min f(x) = 10 + 2(1 − x1)² + (1 − x2)²,

starting point x(0) = (0, 0)^T.

Solution

    g(x) = ∇f(x) = [−4(1 − x1), −2(1 − x2)]^T,   ‖g(k)‖ = ((g(k))^T g(k))^{1/2}.

At x(0) = (0, 0)^T:

    g(0) = (−4, −2)^T,
    x(0) − λg(0) = (4λ, 2λ)^T,
    f(x(0) − λg(0)) = 10 + 2(1 − 4λ)² + (1 − 2λ)²,
    df(x(0) − λg(0))/dλ = 72λ − 20 = 0  ⟹  λ(0) = 5/18,
    d²f/dλ² = 72 > 0,

so λ(0) = 5/18 minimizes f(x(0) − λg(0)), and

    x(1) = (4·5/18, 2·5/18)^T = (10/9, 5/9)^T.

Next,

    g(1) = [−4(1 − x1), −2(1 − x2)]^T |_{x=x(1)} = (4/9, −8/9)^T,
    x(1) − λg(1) = (10/9 − 4λ/9, 5/9 + 8λ/9)^T,
    f(x(1) − λg(1)) = 10 + 2(−1/9 + 4λ/9)² + (4/9 − 8λ/9)²
                    = 10 + (96λ² − 80λ + 18)/81,
    df/dλ = (192λ − 80)/81 = 0  ⟹  λ(1) = 5/12,
    d²f/dλ² = 192/81 > 0,

which minimizes f(x(1) − λg(1)), and

    x(2) = (10/9 − (4/9)(5/12), 5/9 + (8/9)(5/12))^T = (25/27, 25/27)^T,
    f(x(2)) = 10 + 2(2/27)² + (2/27)² = 10 + 12/729 ≈ 10.0165.

Again,

    g(2) = (−8/27, −4/27)^T,
    x(2) − λg(2) = (25/27 + 8λ/27, 25/27 + 4λ/27)^T,
    df(x(2) − λg(2))/dλ = (288λ − 80)/729 = 0  ⟹  λ(2) = 80/288 = 5/18,
    d²f/dλ² = 288/729 > 0,

which minimizes f(x(2) − λg(2)), and

    x(3) = (25/27 + (8/27)(5/18), 25/27 + (4/27)(5/18))^T ≈ (1.0082, 0.9671)^T,
    f(x(3)) ≈ 10.0012,   etc.

The iterates zig-zag slowly towards the minimum x* = (1, 1)^T.
27 9 27
=
0 (2) =
E(x)
1
(x x )T Q(x x )
2
1 T
x Qx 2xT Qx + (x )T Qx
=
2
1 T
1
=
x Qx xT b + (x )T Qx
2
2
1
= f (x) + (x )T Qx
2
=
E(x) and f (x) have the same minimum point x , since (x )T Qx is a constant.
We now apply the steepest descent method to E(x) or f (x)
f x(3) = 9.9990 etc.
g(x) = Qx b
i = 0, 1, . . .
f (xk+1)
i = 0, 1, . . .
gk
Rate of Convergence
We take the following quadratic form for example,
f (x) =
1 T
x Qx bx
2
xk+1 = xk
gkT gk
gkT Qgk
gkT gk
gkT Qgk
gk
k = 0, 1, 2,
0 < a = 1 2 n = A.
The unique minimum point x can be found from
(
E(xk+1 ) =
f (x ) = 0 Qx b = 0 or x = Q1 b
34
gkT gk
1 T
gk Qgk gkT Q1 gk
)
E(xk )
(3.3.4)
Definition 3.4 (condition number) For any matrix Q, the condition number of Q is defined as
gkT gk
T
(gk Qgk )(gkT Q1 gk )
r+1
4aA
(x x)
(xT Qx)(xT Q1 x)
(a + A)2
(3.3.5)
where Q is a positive definite n n matrix and a and A are the smallest and
largest eigenvalues of Q respectively.
PROOF. Omitted.
A largest eigenvalue
a smallest eigenvalue
r = A/a where
kxk+1 x k
r1
r+1
2
kxk x k
1
f (x(k) + )
= q (k) () = f (k) + T g (k) + T G(k) ,
2
1
(x x )T Q(x x ).
2
where g (k) () is the quadratic approximation at the point x(k) and = x x(k)
is the step correction.
Choose (k) as the minimizer of q (k) (), i.e., as the solution to
5q (k) () = 0,
giving
1
Remarks:

(1) Requires f(k), g(k) and G(k), i.e. function values, first and second
derivatives.
(2) The step δ(k) is only appropriate and well-defined if the quadratic model
has a minimum, i.e. G(k) is positive definite.
(3) Basic Newton's method does not involve a line search, as a step of δ(k)
(i.e. λ(k) = 1) goes to the minimum of the quadratic. A practical stopping
test: if ‖x(k+1) − x(k)‖ < ε, then STOP; otherwise, let k = k + 1 and repeat.
(4) Newton's method is not a general purpose method, as G(k) may not always be
positive definite when x(k) is remote from x*, where x* is a local minimum.

Local convergence. Define

    A(x) = x − (G(x))⁻¹ g(x).

Suppose x* is a point such that

    g(x*) = 0   and   G(x*) is non-singular,

so that A(x*) = x* and A′(x*) = 0. Then

    ‖x_{k+1} − x*‖ = ‖A(xk) − A(x*)‖
                   ≤ ‖A′(x*)(xk − x*)‖ + (1/2)‖A″(ξ)‖ ‖xk − x*‖²
                   = (1/2)‖A″(ξ)‖ ‖xk − x*‖²
                   ≤ C ‖xk − x*‖².    (3.3.7)

From (3.3.7) we see that if, at one stage, C‖xk − x*‖ > 1, then the method may
not converge. Let us consider a damped scheme corresponding to (3.3.6):

    x_{k+1} = xk − λk Gk⁻¹ gk,   0 < λk ≤ 1.

We may use λk to control the step length. Now, consider the general form

    x_{k+1} = xk − λk Mk gk,    (3.3.8)

where Mk is an n × n matrix. For descent we need gk^T Mk gk > 0, which holds
if Mk is positive definite. Two special cases are:

    Steepest Descent:  Mk = I — positive definite.
    Newton:            Mk = Gk⁻¹ — positive definite when xk is close to x*;
                       may not be p.d. if xk is away from x*.

When Gk is indefinite, two standard modifications are:

(a) Assume that Gk = G(xk) has eigenvalues λk1 < λk2 < · · · < λkn. Choose νk
such that

    νk + λk1 = δ > 0,

a small positive number,
so that νkI + Gk is positive definite, and set

    x_{k+1} = xk − λk (νkI + Gk)⁻¹ gk.

(b) Factorize Gk = Lk Dk Lk^T, with Lk a lower triangular matrix and Dk a
diagonal matrix. Modify Dk to a positive diagonal Ek and use

    G̃k = Lk Ek Lk^T,
    x_{k+1} = xk − λk G̃k⁻¹ gk,   k = 0, 1, . . .

N.B. G̃k⁻¹ = (Lk Ek Lk^T)⁻¹ is easy to evaluate.

Example

    f(x) = x1² − 4x1 + 5x2² + 30x2 + 50,
    g(x) = (2x1 − 4, 10x2 + 30)^T,
    G(x) = [ 2 0; 0 10 ],   G(x)⁻¹ = [ 1/2 0; 0 1/10 ].

Starting from x(1) = (1, −2)^T:

    g(1) = (−2, 10)^T,
    δ(1) = −G⁻¹ g(1) = −[ 1/2 0; 0 1/10 ](−2, 10)^T = (1, −1)^T,
    x(2) = x(1) + δ(1) = (1, −2)^T + (1, −1)^T = (2, −3)^T,
    g(2) = (0, 0)^T  ⟹  x(2) = x*.

Note: Newton's method converged in 1 iteration. This is true for any positive
definite quadratic function, as Newton's method is based on a quadratic model
of the function.

By contrast, the method of steepest descent applied to the same function must
solve a line search h(λ) = f(x + λs), h′(λ) = 0, at every step, and its iterates
x(2), x(3), . . . only creep towards x* = (2, −3)^T — very slow!
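The one-step convergence on this quadratic is immediate to verify. A minimal Python check follows (the starting point (1, −2) is the one reconstructed above; the function name is mine):

```python
# One Newton step on f(x) = x1**2 - 4*x1 + 5*x2**2 + 30*x2 + 50.
# g = (2*x1 - 4, 10*x2 + 30), G = diag(2, 10); G is diagonal, so
# G^{-1} g is computed componentwise.

def newton_step(x):
    g = (2.0 * x[0] - 4.0, 10.0 * x[1] + 30.0)
    return (x[0] - g[0] / 2.0, x[1] - g[1] / 10.0)

x1 = (1.0, -2.0)
x2 = newton_step(x1)      # (2.0, -3.0) = x*, in a single step
```

Any other starting point also lands on (2, −3) in one step, since the quadratic model is exact here.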
Choice of step length: any small enough λ > 0 gives descent, so we could replace
the exact minimizer of f(x(k) + λs(k)) by such a λ and still have a descent
method. But if the λ(k)'s are chosen too small, we may not get to the minimum of
f: we need at least a linear decrease in the function value to guarantee
convergence. If λ is chosen too big, s may no longer be a descent direction at
the new point. Convergence here means that the iterates converge, or that
g(k) → 0 as k → ∞.

Note: as with any method using only first derivative information, such a method
can only guarantee convergence to a stationary point (i.e. x* such that
g* = g(x*) = 0) of a general function f.

Summary so far.

Method of Steepest Descent:

    g(k) = ∇f(x(k)),   G(k) = ∇²f(x(k)),
    x(k+1) = x(k) − λ(k) ∇f(x(k)).

Global convergence; linear convergence rate — local behaviour not good; only
needs knowledge of f and ∇f plus some line search.

Newton's Method:

    x(k+1) = x(k) − λ(k) [∇²f(x(k))]⁻¹ ∇f(x(k)).

If ∇²f(x(k)), and hence [∇²f(x(k))]⁻¹, is positive definite, then the search
direction

    s(k) = −[∇²f(x(k))]⁻¹ ∇f(x(k))

is a descent direction, because

    (s(k))^T g(k) = −(g(k))^T [G(k)]⁻¹ g(k) < 0.

3.3 Quasi-Newton Methods

Consider the optimization problem

    min_{x∈Rn} f(x).

Recall that G(k) = ∇²f(x(k)) may not always be positive definite when x(k) is
far away from the local minimum. Thus, Newton's method is not a general purpose
method. However, it has good local properties. Quasi-Newton methods are based on
the idea of approximating (G(k))⁻¹ at each iteration by a symmetric positive
definite matrix H(k), which we update at each iteration.

Algorithm

Given x(1), H(1). Set k = 1.

1. Evaluate f(k) = f(x(k)), g(k) = ∇f(x(k)).
2. Set s(k) = −H(k) g(k).
3. If the convergence test is satisfied, stop.
4. Set x(k+1) = x(k) + λ(k) s(k), where λ(k) is chosen by a line search.
5. Update H(k) to H(k+1).
6. Set k = k + 1, go to step 1.

Usually H(1) = I. This implies that s(1) = −g(1).

    Newton's Method      — second derivatives;      G(k) may be indefinite.
    Quasi-Newton Methods — only first derivatives;  H(k) always positive
                           definite ⟹ s(k) is a descent direction.

The update. Define

    δ(k) = x(k+1) − x(k),
    γ(k) = g(k+1) − g(k),

where gk = ∇f(xk). By Taylor expansion, γ(k) ≈ G(k) δ(k). This expansion is
exact if f(x) is quadratic. The above equality shows that G(k) depends on δ(k),
and in the case that G(i) = G is constant for all i = 0, 1, 2, . . . , k (i.e.
f(x) is quadratic), we have

    γ(i) = G δ(i),   i = 0, 1, . . . , n.    (3.3.9)

So, it is natural to construct successive approximations H(k+1) to G⁻¹ based on
data obtained from the first k steps of a descent process, in such a way that if
G is constant, then the approximation would be consistent with (3.3.9) for these
steps. More specifically, H(k+1) would satisfy

    H(k+1) γ(i) = δ(i),   0 ≤ i ≤ k.    (3.3.10)

Rank one correction. Set

    H(k+1) = H(k) + a u u^T   (a symmetric rank one matrix).

Requiring H(k+1) γ(k) = δ(k) forces u ∝ δ(k) − H(k)γ(k) and a u^T γ(k) = 1, i.e.

    H(k+1) = H(k) + (δ(k) − H(k)γ(k))(δ(k) − H(k)γ(k))^T / [(δ(k) − H(k)γ(k))^T γ(k)].

Theorem 3.9 Suppose G is well-defined and positive definite, and δ(1), . . . , δ(n)
are linearly independent. Then the rank one method terminates on a quadratic
function in at most n + 1 searches, with H(n+1) = G⁻¹.

One shows by induction that H(i+1) γ(j) = δ(j) for all j ≤ i, using

    (δ(i))^T γ(j) = (δ(i))^T G δ(j) = 0,   j < i.

Therefore H(n+1) γ(i) = δ(i) = G⁻¹ γ(i) for the n linearly independent vectors
γ(i), so

    H(n+1) G = I,   H(n+1) = G⁻¹ — termination.
The DFP (rank two) correction. Set

    H(k+1) = H(k) + a u u^T + b v v^T.

Unlike before, u and v are not determined uniquely. Let u = δ(k) and
v = H(k) γ(k). Then the condition H(k+1) γ(k) = δ(k) gives

    a u^T γ(k) = 1   ⟹   a = 1/((δ(k))^T γ(k)),
    b v^T γ(k) = −1  ⟹   b = −1/((γ(k))^T H(k) γ(k)).

Thus,

    H(k+1) = H(k) + δ(k)(δ(k))^T/((δ(k))^T γ(k))
                  − H(k)γ(k)(γ(k))^T H(k)/((γ(k))^T H(k) γ(k)).

With an exact line search, (δ(k))^T γ(k) > 0:

    (δ(k))^T γ(k) = (δ(k))^T g(k+1) − (δ(k))^T g(k) = −(δ(k))^T g(k),    (3.3.11)

since (δ(k))^T g(k+1) = 0, because x(k+1) is the minimum point of f(x) along
δ(k). Thus, by the definition of δ(k) and (3.3.11),

    (δ(k))^T γ(k) = −(δ(k))^T g(k) = λk (g(k))^T H(k) g(k) > 0.

Positive definiteness. If H(k) is positive definite, then so is H(k+1). For
x ≠ 0,

    x^T H(k+1) x = x^T H(k) x + (x^T δ(k))²/((δ(k))^T γ(k))
                 − (x^T H(k) γ(k))²/((γ(k))^T H(k) γ(k)).

Let

    a = (H(k))^{1/2} x,   b = (H(k))^{1/2} γ(k)

(note that (H(k))^{1/2} is the symmetric positive definite square root of the
matrix H(k), not a real-number square root). We have

    x^T H(k+1) x = a^T a − (a^T b)²/(b^T b) + (x^T δ(k))²/((δ(k))^T γ(k))
                 = [(a^T a)(b^T b) − (a^T b)²]/(b^T b) + (x^T δ(k))²/((δ(k))^T γ(k))
                 ≥ 0,

by the Cauchy–Schwarz inequality; and both terms vanish simultaneously only if
x = 0 (equality in Cauchy–Schwarz forces x parallel to γ(k), in which case the
second term equals (δ(k))^T γ(k) > 0). Hence x^T H(k+1) x > 0.
More generally, if the line search only guarantees s^(k)T g^(k+1) = σ s^(k)T g^(k) with σ < 1, then

    δ^(k)T γ^(k) = α_k (σ − 1) s^(k)T g^(k) = α_k (1 − σ) g^(k)T H^(k) g^(k) > 0,

so the curvature condition needed above still holds, and the update preserves positive definiteness.
Comparison of methods:

steepest descent: s^(k) = −g^(k); uses only 1st derivatives; can converge arbitrarily slowly.
quasi-Newton: s^(k) = −H^(k) g^(k); uses only 1st derivatives; quadratic termination, i.e. converges in at most n iterations on a quadratic.
Newton: s^(k) = −(G^(k))^(−1) g^(k); uses 2nd derivatives; one step on a quadratic, but G^(k) may be indefinite.
Example. Apply the DFP method to f(x) = 10x1² + x2², starting from x^(1) = (1/10, 1)^T with H^(1) = I.

Here

    g(x) = (20x1, 2x2)^T,   G(x) = [ 20  0 ; 0  2 ].

Iteration 1. g^(1) = (2, 2)^T and s^(1) = −H^(1) g^(1) = −g^(1) (implying H^(1) = I). The exact line search, minimizing 10(1/10 − 2α)² + (1 − 2α)², gives α^(1) = 1/11, so

    x^(2) = x^(1) + α^(1) s^(1) = (−9/110, 9/11)^T,   g^(2) = (−18/11, 18/11)^T,
    δ^(1) = x^(2) − x^(1) = −(2/11)(1, 1)^T,   γ^(1) = g^(2) − g^(1) = −(4/11)(10, 1)^T.

Then δ^(1)T γ^(1) = 8/11 and γ^(1)T γ^(1) = 16·101/121, so the DFP update (with H^(1) = I) gives

    H^(2) = I + δ^(1)δ^(1)T/(δ^(1)Tγ^(1)) − γ^(1)γ^(1)T/(γ^(1)Tγ^(1))
          = (1/2222) [ 123  −119 ; −119  2301 ].

Iteration 2.

    s^(2) = −H^(2) g^(2) = (18/101)(1, −10)^T.

The exact line search gives α^(2) = 101/220, so

    x^(3) = x^(2) + α^(2) s^(2) = (0, 0)^T = x*,   g^(3) = (0, 0)^T,

and the method terminates at the minimizer in n = 2 iterations, as expected for a quadratic.
9
The way we get this is by approximating G instead of G1 by some matrix
1
1
(2) = (2) s(2) =
=
(k)
B
= (H (k) )1 . Then, the quasi-Newton formula changes to
220 101 10
110 10
18
1
(2) = g (3) g (2) =
(k) = B (k+1) (k)
11 1
T
9 18
9 18
(2) (2) =
11 =
Update B (k) by rank two correction as in DF P
110 11
110
18
1
T
T
H (2) (2) = H (2) g (2) = s(2) =
(k) (k)
B (k) (k) (k) B (k)
(k+1)
101 10
BBF GS = B (k) + (k)T (k)
BBF GS HBF GS = I
10
11 101 1
101
1
81
110
123 119
1
10
The properties that the DFP method had are also evident in BFGS. What is
H (3) =
+
9
g
(k)
1
1230 + 101 220
1190 1010 + 2200
=
T
T
2222 1190 1010 + 2200 23010 + 10100 22000
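The two-step termination seen in the worked example can be checked numerically. The sketch below (an illustration, not part of the notes) runs DFP with exact line searches on f(x) = 10x1² + x2² and confirms both termination and H^(3) = G^(−1).

```python
import numpy as np

G = np.diag([20.0, 2.0])          # Hessian of f(x) = 10 x1^2 + x2^2
grad = lambda x: G @ x            # gradient g(x) = (20 x1, 2 x2)

x = np.array([0.1, 1.0])
H = np.eye(2)
for _ in range(2):                # a quadratic in R^2 needs at most 2 steps
    g = grad(x)
    s = -H @ g
    alpha = -(g @ s) / (s @ G @ s)      # exact line search for a quadratic
    x_new = x + alpha * s
    delta, gamma = x_new - x, grad(x_new) - g
    Hg = H @ gamma
    H = H + np.outer(delta, delta)/(delta @ gamma) - np.outer(Hg, Hg)/(gamma @ Hg)
    x = x_new

assert np.allclose(x, 0.0)              # terminates at the minimizer x* = 0
assert np.allclose(H, np.linalg.inv(G)) # and H^(3) = G^{-1}
```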
Notes: minimization restricted to a subspace. Expanding f about x_k,

    f(z_k) ≈ f(x_k) + g_k^T (z_k − x_k) + ½ (z_k − x_k)^T G_k (z_k − x_k).   (3.4.12)

4.1 For any given n × m matrix B of rank m,

    N := {Bu : u ∈ R^m}

is a linear subspace of R^n. Let z_k − x_k = B u_k. The expression (3.4.12) becomes

    f(z_k) = f(x_k) + g_k^T B u_k + ½ u_k^T B^T G_k B u_k.   (3.4.13)

So our problem becomes: find u_k such that the RHS of (3.4.13) is minimized. Differentiating gives

    B^T G_k B u_k + B^T g_k = 0.

From this we have

    u_k = −(B^T G_k B)^(−1) B^T g_k,

and so

    z_k = x_k − B (B^T G_k B)^(−1) B^T g_k.

Obviously B (B^T G_k B)^(−1) B^T acts as the inverse of G_k restricted to N. We expect it to be easy to evaluate.

Example. Suppose B = (I 0)^T, where I is the m × m identity matrix. Let

    G = [ G11  G12 ; G21  G22 ],

where G11 and G22 are respectively m × m and (n − m) × (n − m) matrices. Then

    (B^T G_k B)^(−1) = G11^(−1),

and

    B (B^T G_k B)^(−1) B^T = [ G11^(−1)  0 ; 0  0 ].

Convergence. Define the error function

    E(x) = ½ (x − x*)^T Q (x − x*).

For any given n × m matrix B of rank m, the sequence {x_k} produced by applying the above algorithm to E(x) satisfies

    E(x_{k+1}) ≤ (1 − γ) E(x_k),   k = 0, 1, …,

where γ ∈ [0, 1] is the minimum of (p^T p)² / ((p^T Q p)(p^T Q^(−1) p)) over all vectors p in the null space of B^T. The proof is omitted here.
wRk+1
(3.5.15)
B T gkT ,
Set dK = B(B Gk B)
where gk and G are respectively the
gradient and Hessian of f , and B is a given n m (m < n) matrix.
F (xk + Pk w) =
44
1
(xk + Pk w)T G(xk + Pk w) bT (xk + Pk w).
2
Theorem 3.13 Let {s(i) }n1 be a set of nonzero Q orthogonal vectors. For any
x(1) Rn , the sequence {x(i) } generated by
So,
F (xk + Pk w) =
or
PkT G(xk
+ Pk w)
PkT b
= 0,
x(k+1)
(k)
(3.5.16)
for some set {(i) }. Multiplying by Q and taking the inner product with respect
to s(k) ,
s(k)T Q(x x ) = (k) s(k)T Qs(k) ,
so that
s(k)T Q(x x(1) )
.
(3.5.20)
(k) =
s(k)T Qs(k)
From (3.5.18), we get
i 6= j.
with k =
gkT pk
.
pTk Gpk
x(k) x(1) =
Definition 3.5 Given a symmetric matrix Q, two vectors d(1) and d(2) are said
to be Q-orthogonal, or conjugate with respect to Q, if d(1)T Qd(2) = 0.
k1
X
(i) s(i)
(k) =
+ + k d
Theorem 3.12 If Q is positive definite and the set of nonzero vectors d(i) , i =
1, 2, ..., k are Q-conjugate, then they are linearly independent.
(k)
i=1
(1)
(k) =
=0
s(k)T g (k)
.
s(k)T Qs(k)
Let B (k) be the subspace of Rn spanned by {s(i) }k1 . Then we have the following theorem.
(3.5.19)
(3.5.17)
g s
s(k)T Qs(k)
where = gk T Pk and ek is the kth column of the identity matrix. Furthermore, since p, p1 , ..., pk are arbitrary, we can choose the such that
pTi Gpj = 0,
and g (k) = Qx(k) b converges to the unique solution x after n iterations, that
is x(n+1) = x .
Consider min
satisfies Qx = b.
(3.5.18)
(k) (k)
Theorem 3.14 Assume G is positive definite and let {s(i) }n1 be a sequence of
nonzero G-orthogonal vectors in Rn . Then, for any x(1) Rn , the sequence
{x(k) } generated by
i = 0
x(k+1)
(3.5.21)
(k) (k)
(k)
45
g s
s(k)T Gs(k)
(3.5.22)
5.2
1 T
x Gx bT x
2
on the line x = x(k) +s(k) , for all (, ) as well as on the linear variety
x(1) + B (k) .
PROOF. Since f is convex (actually strictly convex), a local minimum is a global minimum. This implies that we need only show g^(k+1) ⊥ B^(k). (B^(k) contains the line x^(k) + αs^(k).) We use mathematical induction to prove this.

Trivially true for B^(0) = ∅. Assume it holds for k − 1, that is g^(k) ⊥ B^(k−1). We have

    g^(k+1) = g^(k) + α^(k) G s^(k),

because g^(k+1) − g^(k) = G(x^(k+1) − x^(k)) = α^(k) G s^(k). For i < k,

    s^(i)T g^(k+1) = s^(i)T g^(k) + α^(k) s^(i)T G s^(k).

But the first term on the RHS vanishes by induction, and the second term vanishes by conjugacy. For i = k, s^(k)T g^(k+1) = 0 by the choice of α^(k) (exact line search). Therefore

    s^(i)T g^(k+1) = 0   for i ≤ k,

that is g^(k+1) ⊥ B^(k).

5.2 The conjugate gradient method

The conjugate gradient (CG) method generates the conjugate directions as it proceeds:

    s^(k+1) = −g^(k+1) + β^(k) s^(k),   s^(1) = −g^(1).

On a quadratic function with G positive definite, the choice of β^(k) ensures that the s^(k) are conjugate. Requiring s^(k+1)T G s^(k) = 0 and using g^(k+1) − g^(k) = α^(k) G s^(k), we obtain

    β^(k) = g^(k+1)T (g^(k+1) − g^(k)) / ( (−g^(k) + β^(k−1) s^(k−1))^T (g^(k+1) − g^(k)) ).

But the previous theorem showed that

    g^(j)T s^(i) = 0   for i < j,   (3.5.24)

and since s^(i) = −g^(i) + β^(i−1) s^(i−1), it follows that g^(j)T g^(i) = 0 for i < j. Hence

    β^(k) = g^(k+1)T g^(k+1) / (g^(k)T g^(k)),   (3.5.23)

which is the Fletcher–Reeves (FR) formula. When k = n, set x_{n+1} = x^(1) (or continue as it is) and restart.

Other conjugate gradient methods arise from different choices of β^(k). For example, the Polak–Ribière (1971) method uses

    β^(k) = g^(k+1)T (g^(k+1) − g^(k)) / (g^(k)T g^(k)).

In the quadratic case, the above two expressions for β^(k) are identical, as g^(k+1)T g^(k) = 0. For general functions, the two methods behave differently, but on both numerical and theoretical grounds the Polak–Ribière method is preferable.

For the quadratic problem min φ(x) = ½ x^T G x − b^T x, with residual r_k = G x_k − b, the linear CG algorithm reads:

    p_k     = −r_k + β_{k−1} p_{k−1};
    α_k     = ||r_k||²₂ / (p_k^T G p_k);
    x_{k+1} = x_k + α_k p_k;
    r_{k+1} = r_k + α_k G p_k;
    β_k     = ||r_{k+1}||²₂ / ||r_k||²₂.

Example. Find the optimal point of f(x) = 10x1² + x2² using the Fletcher–Reeves method.

Solution. We start with β^(0) = 0. The gradient of f is g(x) = ∇f = (20x1, 2x2)^T. We choose x^(1) = (1/10, 1)^T, so g^(1) = (2, 2)^T and s^(1) = −g^(1). The exact line search gives α^(1) = 1/11 and

    x^(2) = x^(1) + α^(1) s^(1) = (−9/110, 9/11)^T,   g^(2) = (−18/11, 18/11)^T.

Then

    β^(1) = g^(2)T g^(2) / (g^(1)T g^(1)) = 9²/11²,

    s^(2) = −g^(2) + β^(1) s^(1) = (18/11 − 2·81/121, −18/11 − 2·81/121)^T = (36/121)(1, −10)^T,

and the exact line search along s^(2) gives

    x^(3) = x^(2) + α^(2) s^(2) = (0, 0)^T = x*,

so the method terminates in n = 2 iterations.
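The linear CG recursion above can be transcribed almost line for line. The sketch below (an illustration, not part of the notes) runs it on the same quadratic f(x) = 10x1² + x2².

```python
import numpy as np

def linear_cg(G, b, x0, tol=1e-12):
    """Linear conjugate gradient for min 0.5 x^T G x - b^T x (G SPD)."""
    x = x0.astype(float)
    r = G @ x - b                     # residual r_k = G x_k - b = gradient
    p = -r
    for _ in range(len(b)):
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (p @ G @ p)
        x = x + alpha * p
        r_new = r + alpha * (G @ p)
        beta = (r_new @ r_new) / (r @ r)
        r = r_new
        p = -r + beta * p
    return x

G = np.diag([20.0, 2.0])
b = np.zeros(2)
x = linear_cg(G, b, np.array([0.1, 1.0]))
assert np.allclose(x, 0.0)            # converges in at most n = 2 steps
```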
6.1 Rate of Convergence

Define ||x||_G := (x^T G x)^(1/2). The convergence of the linear CG algorithm is governed by

    ||x_k − x*||_G ≤ C ( (√κ − 1) / (√κ + 1) )^k ||x_0 − x*||_G,

where κ = λ_max / λ_min is the condition number of G and x* is the minimum point of φ(x). From the above estimate we have

    lim_{k→∞} ||x_k − x*||_G / ||x_0 − x*||_G ≤ lim_{k→∞} C ( (√κ − 1)/(√κ + 1) )^k = 0.

Normally κ >> 1, so (√κ − 1)/(√κ + 1) ≈ 1, and thus the convergence of the linear CG method is very slow when used for solving large-scale, ill-conditioned linear systems. This motivates preconditioning: for a symmetric positive definite, easily invertible W ≈ G, solve the transformed system

    W^(−1/2) G W^(−1/2) (W^(1/2) x) = W^(−1/2) b,

or in short G̃ x̃ = b̃, whose matrix has a smaller condition number. Obviously, if we choose W = G, then W^(−1) G = I and we solve the problem in one iteration; this choice is not practical!
6.2 Preconditioned CG methods

Algorithm (Preconditioned CG method): with a symmetric positive definite preconditioner W, set r_0 = G x_0 − b, β_{−1} = 0, p_{−1} = 0, and iterate

    p_k     = −W^(−1) r_k + β_{k−1} p_{k−1};
    α_k     = r_k^T W^(−1) r_k / (p_k^T G p_k);
    x_{k+1} = x_k + α_k p_k;
    r_{k+1} = r_k + α_k G p_k;
    β_k     = r_{k+1}^T W^(−1) r_{k+1} / (r_k^T W^(−1) r_k).

The preconditioner W can be chosen in various ways. For example, we can use

    W = L L^T   (Cholesky),

where L is a lower triangular matrix. In the case that G = (G_ij)_{n×n} is a large-scale, sparse matrix, we can use the following algorithm to find the Incomplete Cholesky Factorization or Decomposition (ICF or ICD) W = L L^T with L = (l_ij)_{n×n}:

Algorithm (ICF):

    l_11 = G_11^(1/2)
    for i = 2 to n
        for j = 1 to i − 1
            if G_ij ≠ 0 then
                l_ij = ( G_ij − Σ_{k=1}^{j−1} l_ik l_jk ) / l_jj
            end if
        end for
        l_ii = ( G_ii − Σ_{k=1}^{i−1} l_ik² )^(1/2)
    end for
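The ICF loop above translates directly into code. The sketch below (an illustration, not part of the notes) keeps l_ij = 0 wherever G_ij = 0; on a matrix whose fill-in entries would be zero anyway, it reproduces the exact Cholesky factor, which the check exploits.

```python
import numpy as np

def icf(G):
    """Incomplete Cholesky: L lower triangular, with l_ij = 0 where G_ij = 0."""
    n = G.shape[0]
    L = np.zeros_like(G, dtype=float)
    L[0, 0] = np.sqrt(G[0, 0])
    for i in range(1, n):
        for j in range(i):
            if G[i, j] != 0.0:                 # keep the sparsity pattern of G
                L[i, j] = (G[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
        L[i, i] = np.sqrt(G[i, i] - L[i, :i] @ L[i, :i])
    return L

G = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
L = icf(G)
assert np.allclose(np.tril(L), L)   # L is lower triangular
assert np.allclose(L @ L.T, G)      # here ICF coincides with exact Cholesky
```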
Chapter 4

Introduction. As a model constrained problem, consider choosing the radius r of a sphere and the side x of a cube so as to minimize the total volume for a fixed total surface area:

Problem (2).

    min V = (4/3)πr³ + x³
    subject to 4πr² + 6x² = A = constant,   r ≥ 0, x ≥ 0.

2.1 Direct method

From the constraint,

    x² = (A − 4πr²)/6,

so the problem reduces to the one-dimensional problem

    min V(r) = (4/3)πr³ + { (A − 4πr²)/6 }^(3/2).

Setting the derivative to zero:

    dV/dr = 4πr² + (3/2){(A − 4πr²)/6}^(1/2) · ( −(8π/6) r )
          = 4πr² − 2πr {(A − 4πr²)/6}^(1/2)
          = 2πr { 2r − ((A − 4πr²)/6)^(1/2) } = 0,

implying either

    r = 0   (4.2.1)

or

    2r − ((A − 4πr²)/6)^(1/2) = 0.   (4.2.2)

In the second case,

    4r² = (A − 4πr²)/6   ⟹   4(6 + π) r² = A   ⟹   r = (1/2) (A/(6 + π))^(1/2),   (4.2.3)

and then

    x² = (A − 4πr²)/6 = (A − πA/(6 + π))/6 = A/(6 + π),   i.e.   x = (A/(6 + π))^(1/2) = 2r.

We now check the two candidates with the second derivative,

    d²V/dr² = 8πr − 2π{(A − 4πr²)/6}^(1/2) + (4π²r²/3) {(A − 4πr²)/6}^(−1/2).

(i) At r = 0:

    d²V/dr² |_{r=0} = −2π (A/6)^(1/2) < 0   ⟹   local maximum.
(ii) At r = (1/2)(A/(6 + π))^(1/2), we have {(A − 4πr²)/6}^(1/2) = x = 2r, so

    d²V/dr² = 8πr − 4πr + (4π²r²)/(3 · 2r) = 4πr + (2π²/3) r > 0   ⟹   minimum.

So r = (1/2)(A/(6 + π))^(1/2) and x = (A/(6 + π))^(1/2) solve the minimization problem, while at the stationary point r = 0 we have

    x² = (A − 4πr²)/6 |_{r=0} = A/6,   x = (A/6)^(1/2),

a local maximum of V.
2.2 Lagrange multipliers

Consider

    min f(x),   x = (x1, …, xn),

subject to

    g_i(x) = 0,   i = 1, …, m.   (4.2.6)

Suppose x* is a local minimum. In principle the m constraints can be used to express x1, …, xm in terms of the remaining variables x̂ = (x_{m+1}, …, xn), so that near x* the objective reduces to a function h(x̂), whose total differential

    dh(x̂*) = Σ_{j=m+1}^{n} (∂h(x̂*)/∂xj) dxj = 0   (4.2.4)

must vanish if x̂* is a local minimum. However, df = dh. We thus have from the above two equalities

    df = Σ_{j=1}^{n} (∂f(x*)/∂xj) dxj = 0,   (4.2.5)

where x* = (x1*, …, xm*, x̂*). Let us now consider the total differential of the constraints at x*. It is

    dg_i(x*) = Σ_{j=1}^{n} (∂g_i(x*)/∂xj) dxj = 0,   i = 1, …, m.   (4.2.7)

Multiplying each of the equations in (4.2.7) by an associated Lagrange multiplier λ_i and subtracting from (4.2.5), we obtain

    df(x*) − Σ_{i=1}^{m} λ_i dg_i(x*) = Σ_{j=1}^{n} { ∂f(x*)/∂xj − Σ_{i=1}^{m} λ_i ∂g_i(x*)/∂xj } dxj = 0.   (4.2.8)–(4.2.9)

Choose the multipliers λ1, …, λm so that the coefficients of the dependent differentials dx1, …, dxm vanish:

    ∂f(x*)/∂xj − Σ_{i=1}^{m} λ_i ∂g_i(x*)/∂xj = 0,   j = 1, …, m.   (4.2.10)

The remaining sum

    Σ_{j=m+1}^{n} { ∂f(x*)/∂xj − Σ_{i=1}^{m} λ_i ∂g_i(x*)/∂xj } dxj = 0   (4.2.11)

involves only the independent differentials dxj, so each coefficient must vanish as well. Together with the constraints we obtain the necessary conditions

    g_i(x*) = 0,   i = 1, …, m,   (4.2.13)

    ∂f(x*)/∂xj − Σ_{i=1}^{m} λ_i ∂g_i(x*)/∂xj = 0,   j = 1, …, n.   (4.2.14)

Remark. These necessary conditions can also be obtained easily as follows: define

    F(x, λ) = f(x) − Σ_{i=1}^{m} λ_i g_i(x).

Then

    ∂F/∂xj = ∂f(x)/∂xj − Σ_i λ_i ∂g_i(x)/∂xj = 0,   j = 1, …, n,

and

    ∂F/∂λ_i = −g_i(x) = 0,   i = 1, …, m.

These coincide with (4.2.13) and (4.2.14). But we must note that the approach given in this Remark is not a proof.

Examples. The same as the ones given before:

1. min V = (4/3)πr³ + x³ subject to 4πr² + 6x² = A = constant, r ≥ 0, x ≥ 0.
2. max V = (4/3)πr³ + x³ subject to 4πr² + 6x² = A = constant, r ≥ 0, x ≥ 0.

Solution. The region is defined by 4πr² + 6x² = A, r ≥ 0, x ≥ 0. Form the Lagrangian L = (4/3)πr³ + x³ − λ(4πr² + 6x² − A):

    ∂L/∂r = 4πr² − 8πλr = 0,
    ∂L/∂x = 3x² − 12λx = 0,
    ∂L/∂λ = −(4πr² + 6x² − A) = 0.

(i) If r = 0, then x = (A/6)^(1/2) and V = A^(3/2)/6^(3/2).
(ii) If x = 0, then r = (A/(4π))^(1/2) and V = (4/3)π (A/(4π))^(3/2).
(iii) If r ≠ 0 and x ≠ 0, then λ = r/2 and λ = x/4, so x = 2r; substituting into the constraint,

    4πr² + 6(2r)² = A   ⟹   r = (1/2)(A/(π + 6))^(1/2),   x = (A/(π + 6))^(1/2),

with

    V = A^(3/2) / ( 6 (π + 6)^(1/2) ).

Combining the above cases: the minimum (Problem 1) is attained in case (iii), and comparing the values of V shows the maximum (Problem 2) is attained in case (ii), the case (i) value being only a local maximum.

Example (a budget-constrained problem, fragmentary in the source). Note that we have assumed we wish to spend all of the $10 per unit that is available. It might, in certain cases, turn out that it is not optimal to spend it all; hence it might be preferable to specify the constraint as

    x1 + x2 ≤ 10.

We shall hope the solution satisfies these constraints for this problem. We form the Lagrangian function F with multiplier λ and set

    ∂F/∂x1 = 40 + 8x2 − 20x1 − λ = 0,
    ∂F/∂x2 = 55 + 8x1 − 12x2 − λ = 0,
    ∂F/∂λ = 10 − x1 − x2 = 0,

which can be solved together for x1, x2 and λ.
2.3 Various examples

Example (cost-minimizing inputs). We wish to find the cost-minimizing input levels for a given output level q if the price (rental) of capital is r and the usage (wage) rate is w. The total cost

    C = w1 x1 + w2 x2

is to be minimized subject to the production constraint.

Example (most profitable production). Consider the problem of finding the most profitable level of production of an entrepreneur who produces two goods or outputs Q1 and Q2 with a single input X, and let

    x = h(q1, q2),   0 < q_i ≤ B, i = 1, 2.

Notice again that we assume that we buy some beans and get bean curd and fertilizer as products. Suppose the demand functions are

    p1 = 40 − 2q1   and   p2 = 20 − q2.

Example (minimum-weight structural design). The weight to be minimized is a quadratic function of the member sizes a1, …, am,

    min W = ⋯ + (1/3) b1 ( 3a1² + a2² + a3² + a2 a3 ) + ⋯ + (1/3) bm ( 3am² ⋯ a_{m−1} a_{m−2} ),

subject, in a typical instance, to constraints such as

    x1 x2 x3 x4 x5 x6 − 1080 = 0,
    x1 x4 (x1 + x2 + x3) x5² + x2 x3 (x1 + 1.57x2 + x4) x6² − 28 ≥ 0,
    { x1 x4 (x1 + x2 + x3) x5² + x2 x3 (x1 + 1.57x2 + x4) x6² }
      / { 2x1 (2x1 + 4x2 + 2x3 + 3x4) + 4x2 (1.57x2 + 1.57x3 + x4) + 2x3 x4 } − 0.16 ≥ 0,
    x1, x2, x3, x4, x5, x6 ≥ 0.

There are also certain behavioral constraints relating to stresses which must be taken into account. Let us call these behavioral variables y(x); the components of y(x) are stresses. The constraints on them are given by

    L ≤ y(x) ≤ U,

where L = 0 and U = [σ0, σ0, …, σ0]^T, σ0 = critical stress. The nonlinear programming problem which is to be solved to determine the minimum weight is to find an x minimizing W subject to

    l ≤ x ≤ u,   L ≤ y(x) ≤ U.

The derivation of the functional relationships used in this example is given by Schinzinger in Lavi, A., and T. P. Vogl (eds.): Recent Advances in Optimization Techniques, Wiley, New York, 1966.

Example (portfolio selection). The objective trades off the expected rate of return against its variance through a risk-aversion parameter. If the parameter is 0, an investor has no risk aversion, i.e. we simply maximize the expected rate of return; in the opposite limit the problem becomes one of minimizing the variance of the rate of return. Intermediate values provide a balance between these extreme attitudes towards risk. This general approach to portfolio selection was first suggested by Markowitz in (*).

(*) Markowitz, H. M., Portfolio Selection, Wiley, New York, 1959.

Example (optimal production capacities) (**). All parameters except p and P in the objective function are either known constants or operating conditions which are chosen, subject to P ≤ Dav and p ≥ p0.

(**) Jen, F. C., C. C. Pegels, and T. M. Dupuis, Optimal Capacities of Production Facilities, Management Science, 4: B-573–580, 1968.

Example (minimum-area tank). It is desired to find the dimensions of a closed cylindrical tank with a fixed (given) volume Vc which has the minimum surface area A. The area A of this tank is given by

    A = 2πr² + 2πrl.

Example 16 (An animal breeder's problem)
The d_ij, the dollars per pound of nutrient i in feed material j, are collected in the matrix

    D = [  5 10  0  1  0  3
          10  3  0  2  0  0
           0  0  1  2  0  1
           2  2  6  3  0  1
           0  0  0  3 10  1
           3  0  1  0  1 20 ].

In order to find the minimum cost diet for the animals we need to solve the problem

    min z = c′x + x′Dx
    subject to Ax = b,   x ≥ 0.

Table 1. Content of nutrient per 100 lb of feed material

    Feed material | Vitamins | Protein | Minerals
    1             |    1     |    3    |   10
    2             |   21     |  500    |    5
    3             |   40     |   60    |    3
    4             |   65     |   20    |    0
    5             |   10     |   40    |   15
    6             |    0     |  700    |    5

where c, D are given above. From Table 1 and the previous problem statement, we see that

    b = [10, 80, 2]′

and

    A = [  1  21  40  65  10   0
           3 500  60  20  40 700
          10   5   3   0  15   5 ].
Example 17 (An inventory problem for a small business)

Consider a retail store which stocks and sells three different models of lawnmowers. The store owner cannot afford to have an inventory on hand worth more than $15,000 at any time. We shall assume that the past several years have given him enough experience so that he can predict his demand sufficiently accurately to treat this problem deterministically, i.e., without attempting to introduce probabilistic complications.

The lawnmowers are ordered in lots. If the models are numbered 1, 2, 3, then Qj is the order quantity for model j. The store owner calculates his carrying charge on each item, I, to be 0.25. This means that the rate at which inventory costs accumulate is proportional to the investment in inventory at that time; I is the constant of proportionality, with units of dollars per year per dollar of investment in inventory. We define further

    Aj = fixed cost of ordering a lot of model j,
    cj = cost of one unit of model j,
    λj = demand rate (units per year) for model j.

The data the store owner has are shown in Table 1. We noted above that there is a limitation of $15,000 on the total value of the inventory. The store owner has an additional constraint: he has a maximum effective storage space of 6000 ft² in which to store his inventory. Each lawnmower occupies 25 ft².

We can now state the problem of determining what values of Q1, Q2 and Q3 will minimize the average annual cost of ordering and storage subject to the constraints of limited capital and storage space. This problem is

    min Z = λ1 A1 / Q1 + ½ I c1 Q1 + λ2 A2 / Q2 + ½ I c2 Q2 + λ3 A3 / Q3 + ½ I c3 Q3

subject to

    c1 Q1 + c2 Q2 + c3 Q3 ≤ M,
    s1 Q1 + s2 Q2 + s3 Q3 ≤ S.

Table 1. Lawnmower store's costs

    Model                1     2     3
    Ordering cost Aj    60    80   100
    Unit cost cj        30   110    60
    Demand rate λj     800   500  1500

In terms of our given data, the optimization problem becomes

    min Z = 48,000/Q1 + 3.75 Q1 + 40,000/Q2 + 13.75 Q2 + 150,000/Q3 + 7.50 Q3

subject to

    30 Q1 + 110 Q2 + 60 Q3 ≤ 15,000,
    25 Q1 + 25 Q2 + 25 Q3 ≤ 6000,
    Q1, Q2, Q3 ≥ 0.
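Without the two side constraints, each Qj would be given by the classical economic-order-quantity formula Qj = √(2 λj Aj / (I cj)). A quick check (an illustration, not part of the notes) shows those unconstrained minimizers violate both the capital and the storage constraints, so the constraints are genuinely active in this problem:

```python
import math

I = 0.25
A = [60.0, 80.0, 100.0]        # ordering costs
c = [30.0, 110.0, 60.0]        # unit costs
lam = [800.0, 500.0, 1500.0]   # demand rates

# Unconstrained minimizer of lam*A/Q + (I*c/2)*Q is the EOQ Q* = sqrt(2*lam*A/(I*c)).
Q = [math.sqrt(2 * l * a / (I * ci)) for l, a, ci in zip(lam, A, c)]

capital = sum(ci * q for ci, q in zip(c, Q))
storage = 25 * sum(Q)
assert capital > 15_000        # capital limit violated (about $17,800)
assert storage > 6_000         # storage limit violated (about 7,700 ft^2)
```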
Example 18

A chemical manufacturing company sells three products and has found that its revenue function is

    f = 10x + 4.4y² + 2z,

where x, y and z are the monthly production rates of each chemical. It is found from break-even charts that it is necessary to impose the following limits on the production rates:

    x > 2,   y > 1/2,   2z + y > 3.

In addition, only a limited amount of raw material is available; hence the following restrictions must be imposed upon the production schedule:

    x + 4y + 5z < 32,
    x + 3y + 2z < 29.

Determine the best production schedule for this company and find the best value of the revenue function.
Inequality constraints

For problems with equality constraints h_i(x) = 0, i = 1, …, m, and inequality constraints g_j(x) ≤ 0, j = 1, …, r, the Kuhn–Tucker conditions at a candidate x* are

    ∇f(x*) + λ^T ∇h(x*) + μ^T ∇g(x*) = 0,
    h(x*) = 0,   g(x*) ≤ 0,
    μ ≥ 0,   μ^T g(x*) = 0,

where

    λ^T ∇h(x*) = Σ_{i=1}^{m} λ_i ∇h_i(x*),   μ^T ∇g(x*) = Σ_{i=1}^{r} μ_i ∇g_i(x*).

NOTE: the Kuhn–Tucker conditions can only be used to generate the optimal solution by hand in low-dimensional problems.

Example. Use the Kuhn–Tucker conditions to solve

    min f(x1, x2)
    subject to x1² + x2² ≤ 5,
               3x1 + x2 ≤ 6.

The conditions consist of the stationarity equation

    ∇f(x) + μ1 ∇(x1² + x2² − 5) + μ2 ∇(3x1 + x2 − 6) = 0,

together with

    μ1 (x1² + x2² − 5) = 0,
    μ2 (3x1 + x2 − 6) = 0,
    μ1, μ2 ≥ 0.

3.1 Penalty function methods

For an inequality constraint h(x) ≤ 0, introduce the penalty function

    P(x) = (max{0, h(x)})².

Its partial derivatives are

    ∂P(x)/∂xj = 0                      if h(x) < 0,
    ∂P(x)/∂xj = 2 h(x) ∂h(x)/∂xj       if h(x) > 0.

But if ∂h(x)/∂xj is continuous at h(x) = 0, then 2h(x) ∂h(x)/∂xj → 0 as h(x) → 0. This, in turn, implies that ∂P(x)/∂xj = 0 at h(x) = 0, so P is continuously differentiable.

We then consider the objective function

    F(x) = f(x) + M P(x),   M > 0,

with no constraint. In the limit as M → ∞,

    M P(x) = 0   for h(x) ≤ 0,
    M P(x) → ∞   if h(x) > 0.

Hence, in the limit as M → ∞: the minimum of F(x) is inside the feasible region if the minimum of f(x) is inside, and the minimum of F(x) is on the boundary if the minimum of f(x) is outside the feasible region.

We now consider a simple optimization problem subject to an equality constraint:

    min f(x)   subject to   g(x) = 0.

Introduce the penalty function

    P̄(x) = (g(x))².

Then we consider the objective function F(x) = f(x) + M̄ P̄(x). Clearly, in the limit as M̄ → ∞, the minimum of F(x) lies on g(x) = 0.

We now return to the general problem

    min f(x)
    subject to h_i(x) ≤ 0, i = 1, …, r,
               g_i(x) = 0, i = 1, …, m.

We then replace the problem by a sequence of unconstrained optimization problems, depending on M_α, α = 1, …, r, and M̄_β, β = 1, …, m:

    min F(x) = f(x) + Σ_{α=1}^{r} M_α P_α(x) + Σ_{β=1}^{m} M̄_β P̄_β(x).

NOTE: For large M's and M̄'s, F has steep valleys, but by then we have a good estimate of the minimum, so search methods will work well.

WARNING: These methods find a local minimum. So check for others: use different starting points and re-solve.
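As an illustration (not from the notes), consider min f(x) = −x² + 3x − 4 subject to 0 ≤ x ≤ 4, treated with quadratic penalties. The sketch minimizes the penalized objective for growing M (a grid search stands in for an unconstrained minimizer) and watches the minimizer approach the constrained solution x* = 4:

```python
# Quadratic-penalty treatment of  min f(x) = -x^2 + 3x - 4  s.t.  0 <= x <= 4.

def F(x, M):
    penalty = max(0.0, -x) ** 2 + max(0.0, x - 4.0) ** 2
    return -x * x + 3 * x - 4 + M * penalty

xs = [-2 + i * 1e-4 for i in range(80001)]       # grid on [-2, 6]
minimizers = []
for M in (10.0, 100.0, 1000.0):
    minimizers.append(min(xs, key=lambda x: F(x, M)))

# As M grows, the minimizer of F approaches the constrained solution x* = 4
# (where f(4) = -8), always from outside the feasible region.
assert abs(minimizers[-1] - 4.0) < 0.01
assert minimizers[0] > minimizers[1] > minimizers[2] > 4.0
```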
Example. min f(x) = −x² + 3x − 4 subject to

    0 ≤ x ≤ 4,   i.e.   −x ≤ 0   and   x − 4 ≤ 0.

Then

    F(x) = −x² + 3x − 4 + M1 (max{0, −x})² + M2 (max{0, x − 4})².

Step 1. Unconstrained problem (M1 = M2 = 0):

    F(x) = −x² + 3x − 4 = −(x − 3/2)² + 9/4 − 4 = −(x − 3/2)² − 7/4,

always negative, with F → −∞ as |x| → ∞, so there is no finite unconstrained minimum and the constraints must be active.

Step 2. Set

    ∂F/∂x = −2x + 3 − 2M1 max{0, −x} + 2M2 max{0, x − 4} = 0.

Case 1: −x < 0, x − 4 < 0. Then −2x + 3 = 0, so x = 3/2 and f(3/2) = −7/4 (an interior stationary point, in fact a maximum of f).

Case 2: −x ≥ 0, x − 4 < 0. Then −2x + 3 + 2M1 x = 0, so (2M1 − 2) x = −3 and

    x = −3/(2M1 − 2) → 0   as M1 → ∞,   with f(0) = −4.

Case 3: −x < 0, x − 4 ≥ 0. Then −2x + 3 + 2M2 (x − 4) = 0, so (2M2 − 2) x = 8M2 − 3 and

    x = (8M2 − 3)/(2M2 − 2) → 4   as M2 → ∞ (all constraints satisfied),   with f(4) = −8.

Case 4: −x ≥ 0 and x − 4 ≥ 0 is impossible.

CONCLUSION. The minimum is at x = 4, with f(4) = −8.
Example.

    min f(x) = −x1² − 2x2² + x1x2
    subject to 2x1 + x2 = 0,
               x1 − 2 ≤ 0,
               x2 − 1 ≤ 0.

Then

    F(x) = −x1² − 2x2² + x1x2 + M̄1 (2x1 + x2)² + M1 (max{0, x1 − 2})² + M2 (max{0, x2 − 1})²,

and the stationarity conditions are

    ∂F/∂x1 = −2x1 + x2 + 4M̄1 (2x1 + x2) + 2M1 max{0, x1 − 2} = 0,   (A)
    ∂F/∂x2 = −4x2 + x1 + 2M̄1 (2x1 + x2) + 2M2 max{0, x2 − 1} = 0.   (B)

Case 1: x1 − 2 < 0, x2 − 1 < 0. Then

    (A) ⟹ −2x1 + x2 + 4M̄1 (2x1 + x2) = 0,
    (B) ⟹ −4x2 + x1 + 2M̄1 (2x1 + x2) = 0.

In the limit as M̄1 → ∞ we must have 2x1 + x2 = 0, and then

    −2x1 + x2 = 0,   x1 − 4x2 = 0   ⟹   x1 = x2 = 0,   with f(0, 0) = 0.

Case 2: x1 − 2 ≥ 0, x2 − 1 < 0. Then

    (A) ⟹ −2x1 + x2 + 4M̄1 (2x1 + x2) + 2M1 (x1 − 2) = 0,
    (B) ⟹ x1 − 4x2 + 2M̄1 (2x1 + x2) = 0.

Solving the pair and letting M̄1 → ∞ and M1 → ∞ gives x1 → 2 and, from 2x1 + x2 = 0, x2 → −4 (all constraints satisfied), with

    f(2, −4) = −4 − 32 − 8 = −44.

Case 3: x1 − 2 < 0, x2 − 1 ≥ 0. The limits now give x2 → 1 and 2x1 + x2 = 0, i.e. the point (−1/2, 1), with f(−1/2, 1) = −11/4.

Case 4: x1 − 2 ≥ 0 and x2 − 1 ≥ 0 is inconsistent: we cannot simultaneously have both inequality constraints active, i.e. x1 = 2 and x2 = 1, and also satisfy 2x1 + x2 = 0.

CONCLUSION. The minimum is at (2, −4), with f(2, −4) = −44.
Quadratic Programming

(QP)   min q(x) = ½ x^T G x + d^T x
       subject to a_i^T x = b_i, i ∈ E,
                  a_i^T x ≥ b_i, i ∈ I.

Equality Constraints (I = ∅):

(QPE)  min q(x) = ½ x^T G x + d^T x
       subject to A^T x = b,

where A has full rank m < n.

Elimination Method. Since A has rank m, one way to solve (QPE) would be to solve the equality constraints to obtain x1 ∈ R^m in terms of x2 ∈ R^{n−m} (by Gaussian elimination, say). Partitioning A^T x = b,

    A1^T x1 + A2^T x2 = b   ⟹   x1 = A1^{−T} (b − A2^T x2).   (1)

(QPE) then becomes

    min ψ(x2),

a quadratic with no constraints. If ∇²ψ is positive definite, then the minimum is given by the unique point x2* satisfying ∇ψ(x2*) = 0, and x1* is found from (1).

Generalized Elimination. Let S and Z be n × m and n × (n − m) matrices such that

    A^T S = I,   A^T Z = 0,

with [S ⋮ Z] nonsingular; S is a generalized inverse, and the columns of Z act as basis vectors for the null space of A^T, namely {η : A^T η = 0}. A solution of A^T x = b is then given by

    x = Sb + Zy,   y ∈ R^{n−m}.   (2)

Substituting (2) into q gives

    ψ(y) = ½ y^T (Z^T G Z) y + (d + G S b)^T Z y + q(Sb),

where Z^T (d + G S b) is the reduced gradient and Z^T G Z the reduced Hessian.
LAGRANGIAN METHODS FOR EQUALITY CONSTRAINED PROBLEMS

Quadratic Programming Problem

    min Q(x) = ½ x^T G x + d^T x + Q0
    subject to A^T x = b.

Lagrangian function:

    L(x, λ) = Q(x) − λ^T (A^T x − b).

Equality constraints: the necessary conditions are that

    ∇_x L(x*, λ*) = 0 = G x* + d − A λ*,
    ∇_λ L(x*, λ*) = 0 = A^T x* − b.

In matrix form,

    [ G   −A ] [ x* ]   [ −d ]
    [ A^T  0 ] [ λ* ] = [  b ].

Thus x*, λ* are found by solving this system of linear equations. If G is positive definite and A has full rank, then the coefficient matrix is nonsingular, and its inverse can be written in block form

    [ G   −A ]^{−1}    [  H    T^T ]
    [ A^T  0 ]       = [ −T    −V  ],

where, with S and Z as in the generalized elimination,

    H = Z (Z^T G Z)^{−1} Z^T,
    T = S^T − S^T G Z (Z^T G Z)^{−1} Z^T,
    V = S^T G Z (Z^T G Z)^{−1} Z^T G S − S^T G S.

When G is positive definite these can also be written as

    H = G^{−1} − G^{−1} A (A^T G^{−1} A)^{−1} A^T G^{−1},
    T = (A^T G^{−1} A)^{−1} A^T G^{−1},
    V = −(A^T G^{−1} A)^{−1}.

Thus the solution to the problem can be written as

    x* = −H d + T^T b,   λ* = T d − V b.
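The block KKT system above can be assembled and solved directly. The sketch below (an illustration, not from the notes) checks on random data that the solution of the block system matches the closed forms x* = −Hd + Tᵀb and λ* = Td − Vb.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2
M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)          # positive definite Hessian
A = rng.standard_normal((n, m))      # full-rank constraint normals
d = rng.standard_normal(n)
b = rng.standard_normal(m)

# Block KKT system  [G  -A; A^T  0] [x; lam] = [-d; b]
K = np.block([[G, -A], [A.T, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-d, b]))
x, lam = sol[:n], sol[n:]

Gi = np.linalg.inv(G)
W = np.linalg.inv(A.T @ Gi @ A)
H = Gi - Gi @ A @ W @ A.T @ Gi
T = W @ A.T @ Gi
V = -W
assert np.allclose(x, -H @ d + T.T @ b)
assert np.allclose(lam, T @ d - V @ b)
assert np.allclose(A.T @ x, b)       # feasibility
```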
General constraints. Consider

(QP)   min q(x) = ½ x^T G x + d^T x
       subject to c_i(x) = a_i^T x + β_i = 0, i ∈ E,
                  c_i(x) = a_i^T x + β_i ≥ 0, i ∈ I.

The objective function is quadratic, i.e. the Hessian ∇²f = G is constant; the constraints are linear, i.e. each gradient ∇c_i is constant. If G is positive semi-definite, then this is a convex programming problem and the first order necessary conditions

    G x + d = Σ_{i∈A} λ_i ∇c_i = A λ,   with A = [∇c_i, i ∈ A],
    c_i = 0, i ∈ E,   c_i ≥ 0, i ∈ I   (feasibility),
    λ_i ≥ 0, i ∈ I,
    λ_i c_i = 0   for all i ∈ E ∪ I,

are both necessary and sufficient.

In fact, if x^(k) satisfies the constraints A^T x^(k) = b, then, setting g^(k) = ∇Q(x^(k)) = G x^(k) + d, it can also be shown that the optimal solution of the equality constrained problem can be written as

    x* = x^(k) − H g^(k),   λ* = T g^(k).

Indeed,

    x* = x^(k) − H g^(k) = x^(k) − H (G x^(k) + d)
       = x^(k) − { G^{−1} − G^{−1} A (A^T G^{−1} A)^{−1} A^T G^{−1} } G x^(k) − H d
       = G^{−1} A (A^T G^{−1} A)^{−1} A^T x^(k) − H d
       = G^{−1} A (A^T G^{−1} A)^{−1} b − H d = −H d + T^T b,

and

    λ* = T g^(k) = T (G x^(k) + d) = T G x^(k) + T d
       = (A^T G^{−1} A)^{−1} A^T x^(k) + T d
       = (A^T G^{−1} A)^{−1} b + T d = T d − V b.

The proof is complete.

Dantzig–Wolfe (1959, 1963). Consider

    min Q(x) = ½ x^T G x + d^T x
    subject to A^T x = b,   x ≥ 0.

G positive definite ⟹ the objective function is convex; linear constraints ⟹ the feasible region is convex; hence this is a convex programming problem, and the first order necessary conditions are both necessary and sufficient. With the Lagrangian

    L(x, λ, μ) = ½ x^T G x + d^T x − λ^T (A^T x − b) − μ^T x,

the necessary conditions ∇_x L = 0 give

    G x + d − A λ − μ = 0,
    A^T x = b,   x ≥ 0,   μ ≥ 0,
    x_i μ_i = 0   for all i.

This is very similar to the linear programming optimality system:
- a method for quadratic programming based on solving the above problem by LP-like techniques exists;
- it turns out to be equivalent to the following active set method.

Differences with linear programming:
- a starting feasible point x^(1) can be obtained by methods analogous to artificial variables in linear programming;
- a good starting point is a vertex of the feasible region.
- The multipliers are only tested when x^(k) is a solution of the corresponding equality constrained problem (i.e. when we know they exist, assuming linear independence of ∇c_i, i ∈ A^(k); this automatically holds if x^(k) is a vertex).
- The search direction s^(k) can be obtained directly, as the solution of

    min_s  ½ s^T G s + s^T g^(k)
    subject to s^T ∇c_i = 0, i ∈ A^(k)   (i.e. s^T A^(k) = 0),

since for the quadratic objective

    q(x^(k) + s) = q(x^(k)) + g^(k)T s + ½ s^T G s,
    c_i(x^(k) + s) = c_i(x^(k)) + s^T ∇c_i.

- The multipliers satisfy g^(k) = A^(k) λ^(k) (i.e. G x^(k) + d = A^(k) λ^(k)), very similar in structure to linear programming.

Active set method. Starting point: x^(1), a feasible point of (QP); let x^(1) be a solution of the corresponding EQP (if not, start at Step 2). kth iteration:

1. Evaluate the Lagrange multipliers λ_i^(k), i ∈ A^(k).
   (i) If λ_i^(k) ≥ 0 for all i ∈ A^(k) ∩ I, stop.
   (ii) Otherwise, let j be the index of the most negative λ_i^(k), i ∈ A^(k) ∩ I. Set A^(k+1) = A^(k) − {j}, x^(k+1) = x^(k), and k = k + 1.

2. Let x̄ be the solution of the EQP corresponding to A^(k). Let s^(k) = x̄ − x^(k).

3. Choose a steplength which maintains feasibility (c_i ≥ 0, i ∈ I):

    α^(k) = min { 1,  min_{i ∉ A^(k), ∇c_i^T s^(k) < 0}  c_i(x^(k)) / (−∇c_i^T s^(k)) }.

   The value 1 corresponds to the minimum being due to the curvature of the quadratic; a smaller value to a new constraint becoming active. If the minimum is due to a new constraint becoming active, let l be the index of this constraint.

4. Set x^(k+1) = x^(k) + α^(k) s^(k), and if α^(k) < 1, A^(k+1) = A^(k) + {l}.
   (a) If either x^(k+1) is a vertex (A^(k+1) has n elements) or α^(k) = 1 (so x^(k+1) is a solution of the equality constrained problem), set k = k + 1 and go to 1.
   (b) Otherwise set k = k + 1 and go to 2.
Example.

    min_{x∈R²} x1² + x2² − 4x1 − 5x2 + 2
    subject to c1(x) = −2x1 − x2 + 2 ≥ 0,
               c2(x) = x1 ≥ 0,
               c3(x) = x2 ≥ 0.

Here

    g(x) = (2x1 − 4, 2x2 − 5)^T,   G = [ 2 0 ; 0 2 ],
    ∇c1 = (−2, −1)^T,   ∇c2 = (1, 0)^T,   ∇c3 = (0, 1)^T.

Start at x^(1) = (0, 0)^T: feasible, and a vertex, with A^(1) = {2, 3}, A^(1) = [∇c2, ∇c3] = I.

Iteration 1. g^(1) = (−4, −5)^T, and g^(1) = A^(1) λ^(1) gives λ2^(1) = −4, λ3^(1) = −5. The most negative is λ3^(1), so j = 3 (drop c3), and A = {2}. Solve the EQP

    min ½ s^T G s + s^T g^(1)   subject to s^T ∇c2 = 0.

The constraint forces s = (0, β)^T, so we minimize β² − 5β, giving β = 5/2 and s^(1) = (0, 5/2)^T. Steplength:

    α^(1) = min{ 1, c1(x^(1)) / (−∇c1^T s^(1)) } = min{ 1, 2/(5/2) } = 4/5,

so c1 becomes active (l = 1) and

    x^(2) = x^(1) + α^(1) s^(1) = (0, 2)^T,   A^(2) = {2, 1}:   a vertex.

Iteration 2. g^(2) = (−4, −1)^T; solving

    [∇c2  ∇c1] λ = g^(2),   i.e.   [ 1 −2 ; 0 −1 ] (λ2, λ1)^T = (−4, −1)^T,

gives λ1^(2) = 1, λ2^(2) = −2, so drop j = 2 (i.e. c2). The EQP with c1 active requires s^T ∇c1 = 0, i.e. −2s1 − s2 = 0, so s = β(1, −2)^T; minimizing 5β² − 2β gives β = 1/5, hence

    s^(2) = (1/5)(1, −2)^T,   α^(2) = min{1, …} = 1,
    x^(3) = x^(2) + s^(2) = (1/5, 8/5)^T.

Iteration 3. g^(3) = (−18/5, −9/5)^T = λ1 ∇c1 with λ1^(3) = 9/5 ≥ 0, so we stop.

Optimal solution:

    x* = (1/5, 8/5)^T.

Householder matrices. In factorizing the active constraint matrices A^(k), nice numerical properties are obtained by building orthogonal factors from Householder matrices

    H = I − (1/π) h h^T,   π = ½ ||h||²,

where h is an n × 1 vector. Let

    a = (a1, a2, a3)^T,   an n × 1 vector,

where a1 is r × 1, a2 is a scalar, and a3 is (n − r − 1) × 1. Then it can be shown that

    H a = (a1, −z, 0)^T,

where z = (a2² + ||a3||²)^(1/2), 0 is an (n − r − 1) × 1 vector of zeros,

    h = (0, a2 + z, a3)^T,   and   π = ½||h||² = z² + a2 z.

Example. Take a1 = 2, a2 = 6, a3 = (3, 5)^T. Then

    z = (36 + 34)^(1/2) = √70,
    h = (0, 6 + √70, 3, 5)^T,
    π = ½||h||² = 70 + 6√70,

and H a = (2, −√70, 0, 0)^T.
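The optimum reported by the active-set iteration can be confirmed numerically. The sketch below (an illustration, not from the notes) checks the KKT conditions at x* = (1/5, 8/5) and compares against a brute-force search over a feasible grid.

```python
import numpy as np

# min x1^2 + x2^2 - 4 x1 - 5 x2 + 2  s.t.  2 - 2 x1 - x2 >= 0, x1 >= 0, x2 >= 0
f = lambda x: x[0]**2 + x[1]**2 - 4*x[0] - 5*x[1] + 2
g = lambda x: np.array([2*x[0] - 4, 2*x[1] - 5])

x_star = np.array([0.2, 1.6])
assert abs(2 - 2*x_star[0] - x_star[1]) < 1e-12     # c1 active at x*

# Multiplier for the active constraint c1: g(x*) = lambda1 * grad(c1)
grad_c1 = np.array([-2.0, -1.0])
lam1 = g(x_star)[0] / grad_c1[0]
assert np.allclose(g(x_star), lam1 * grad_c1)       # stationarity
assert lam1 > 0                                     # KKT sign condition (lam1 = 9/5)

# Brute-force check on a feasible grid that x* attains the minimum value.
best = min(((x1, x2) for x1 in np.linspace(0, 1, 201)
                     for x2 in np.linspace(0, 2, 401)
                     if 2 - 2*x1 - x2 >= 0), key=f)
assert f(x_star) <= f(np.array(best)) + 1e-9
```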
Example. Indefinite QP.

    min f(x) = x1² − 4x2²
    subject to c1(x) = x1 − 4x2 ≥ 0,
               c2(x) = 1 − x1 ≥ 0,
               c3(x) = 4 − x1 + 8x2 ≥ 0.

Here

    g(x) = (2x1, −8x2)^T,   G(x) = [ 2 0 ; 0 −8 ],
    ∇c1 = (1, −4)^T,   ∇c2 = (−1, 0)^T,   ∇c3 = (−1, 8)^T.

The Hessian G is not positive definite; in fact it is indefinite, with one positive eigenvalue of 2 and one negative eigenvalue of −8.

Starting point: x^(1) = (1, 1/4)^T, at which c1 = c2 = 0, so A^(1) = {1, 2} and f^(1) = 3/4.

Iteration 1. g^(1) = (2, −2)^T; solving g^(1) = λ1 ∇c1 + λ2 ∇c2 gives λ1^(1) = 1/2, λ2^(1) = −3/2. The most negative is λ2^(1), so drop c2:

    A^(2) = {1},   x^(2) = (1, 1/4)^T,   A^(2) = ∇c1 = (1, −4)^T.

Factor A^(2) with a Householder matrix:

    Q^(2) = I − (1/π) h h^T,   h = (1 + √17, −4)^T,   π = ½||h||² = 17 + √17,

so that

    Q^(2) = (1/√17) [ −1  4 ; 4  1 ],   Q^(2) A^(2) = (−√17, 0)^T = (R, 0)^T.

The second column of Q^(2)T spans the null space of A^(2)T:

    Z = Q2^(2) = (1/√17) (4, 1)^T.

Reduced gradient:

    g_R = Z^T g^(2) = (1/√17) [4, 1] (2, −2)^T = 6/√17.

Reduced Hessian:

    G_R = Z^T G Z = (1/17) [4, 1] [ 2 0 ; 0 −8 ] (4, 1)^T = 24/17,

which is positive definite, so the EQP step is

    s_R = −G_R^{−1} g_R = −(17/24)(6/√17) = −√17/4,
    s^(2) = Z s_R = (−1, −1/4)^T.

Steplength: only c3 decreases along s^(2) (∇c3^T s^(2) = −1, c3(x^(2)) = 5), so

    α^(2) = min{1, 5} = 1,   x^(3) = x^(2) + s^(2) = (0, 0)^T.

At x^(3) = (0, 0), g^(3) = (0, 0)^T, so λ^(3) = 0. Although λ^(3) = 0 ≥ 0, the indefiniteness of G may mean x^(3) is not optimal. Try to remove constraint 1: A^(4) = ∅.

G is indefinite, so we try to find a descent direction of negative curvature moving into the interior of the feasible region from x^(3); if no such vector exists, then x^(3) is optimal. With

    x^(4) = (0, 0)^T,   g^(4) = (0, 0)^T,   c^(4) = (0, 1, 4)^T,

we want s such that s^T g^(4) ≤ 0 (o.k., as g^(4) = 0), s^T G s < 0, and ∇c1^T s > 0:

    2s1² − 8s2² < 0,   s1 − 4s2 > 0.

Try s^(4) = (0, −1)^T. Line search:

    α^(4) = min { c_i / (−∇c_i^T s^(4)) : ∇c_i^T s^(4) < 0 } = c3 / 8 = 1/2,

so

    x^(5) = x^(4) + α^(4) s^(4) = (0, −1/2)^T,   A^(5) = {3}.
Iteration 5. With A^(5) = ∇c3 = (−1, 8)^T, the Householder factorization uses

    h = (1 + √65, −8)^T,   π = ½||h||² = ½ { (1 + √65)² + 64 } = 65 + √65,

    Q^(5) = I − (1/π) h h^T = (1/√65) [ −1  8 ; 8  1 ],   Q^(5) A^(5) = (√65, 0)^T,

and Q^(5) Q^(5)T = I. The null-space basis is

    Z = Q2^(5) = (1/√65) (8, 1)^T.

Reduced gradient: g^(5) = g(x^(5)) = (0, 4)^T, so

    g_R = Z^T g^(5) = (1/√65) [8, 1] (0, 4)^T = 4/√65.

Reduced Hessian:

    G_R = Z^T G Z = (1/65) [8, 1] [ 2 0 ; 0 −8 ] (8, 1)^T = 120/65 = 24/13 > 0,

so the EQP step is

    s_R = −G_R^{−1} g_R = −(13/24)(4/√65) = −13/(6√65),

and the search direction is

    s^(5) = Z s_R = −(13/390) (8, 1)^T = −(1/30) (8, 1)^T.

No inactive constraint blocks the step (the only blocking ratio, from c1, is 15 > 1), so α^(5) = min{1, 15} = 1 and

    x^(6) = x^(5) + s^(5) = (−4/15, −8/15)^T,   A^(6) = {3}.

At x^(6), g^(6) = (−8/15, 64/15)^T = λ3 ∇c3 with λ3^(6) = 8/15 > 0, and the reduced Hessian Z^T G Z = 24/13 is positive definite, so

    x^(6) = (−4/15, −8/15)^T is optimal.
The Sequential Quadratic Programming Algorithm

Sequential quadratic programming methods for nonlinear constrained optimization were developed mainly by Han [S. P. Han, Superlinearly convergent variable metric algorithms for general nonlinear programming problems, Mathematical Programming 11 (1976) 263; S. P. Han, A globally convergent method for nonlinear programming, J. of Optimization Theory and Applications, 22 (1977) 297] and Powell [M. J. D. Powell, A fast algorithm for nonlinearly constrained optimization calculations, in: Numerical Analysis, ed. G. A. Watson, Lecture Notes in Mathematics, Vol. 630 (Springer-Verlag, Berlin-Heidelberg-New York, 1978); M. J. D. Powell, The convergence of variable metric methods for nonlinearly constrained optimization calculations, in: Nonlinear Programming 3, ed. O. L. Mangasarian, R. R. Meyer and S. M. Robinson (Academic Press, New York, 1978)], based on the initial work of Wilson [R. B. Wilson, A simplicial algorithm for concave programming, Ph.D. Thesis, Graduate School of Business Administration, Harvard University, Boston (1963)].

Consider the constrained nonlinear optimization problem

    min f(x)   (1)
    subject to g_i(x) = 0, i = 1, …, m,   (2)
               h_i(x) ≤ 0, i = 1, …, r,   (3)
               l_i ≤ x_i ≤ u_i, i = 1, …, n.   (4)

The bounds can be treated as further inequality constraints, h_i = l − x ≤ 0 for i = r + 1, …, r + n and h_i = x − u ≤ 0 for i = r + n + 1, …, r + 2n.

Let x^(k) be a current iterate, (λ^(k), μ^(k)) an approximation of the optimal Lagrange multipliers, and B^(k) a positive definite approximation of the Hessian matrix of the Lagrangian function

    L(x, λ, μ) = f(x) − Σ_{i=1}^{m} λ_i g_i(x) − Σ_{i=1}^{r+2n} μ_i h_i(x).   (7)

Necessary conditions:

    A^T x* ≥ b,   Z^T ∇f(x*) = 0,   λ_i* ≥ 0, i ∈ A,   Z^T ∇²L(x*) Z positive semidefinite.

Sufficient conditions:

    A^T x* ≥ b,   Z^T ∇f(x*) = 0,   λ_i* > 0, i ∈ A,   Z^T ∇²L(x*) Z positive definite.

Linearizing the nonlinear constraints (2) and (3), and minimizing a quadratic approximation of the Lagrangian function (7), we obtain a sub-problem of the form

    min_d ½ d^T B^(k) d + ∇f(x^(k))^T d   (8a)
    subject to ∇g_i(x^(k))^T d + g_i(x^(k)) = 0, i = 1, …, m,   (8b)
               ∇h_i(x^(k))^T d + h_i(x^(k)) ≤ 0, i = 1, …, r,   (8c)
               l_i − x_i^(k) ≤ d_i ≤ u_i − x_i^(k), i = 1, …, n.   (8d)

Let d^(k) be the solution of (8). Introduce the corresponding Lagrangian function L^(k):

    L^(k)(d, λ, μ) = ½ d^T B^(k) d + ∇f(x^(k))^T d
                   − Σ_{i=1}^{m} λ_i { ∇g_i(x^(k))^T d + g_i(x^(k)) }
                   − Σ_{i=1}^{r} μ_i { ∇h_i(x^(k))^T d + h_i(x^(k)) }
                   − Σ_{i=1}^{n} μ_{r+i} { l_i − x_i^(k) − d_i }
                   − Σ_{i=1}^{n} μ_{r+n+i} { d_i − u_i + x_i^(k) }.   (9)

The first order condition for (8) then reads

    B^(k) d^(k) + ∇f(x^(k)) − Σ_{i=1}^{m} λ_i^(k) ∇g_i(x^(k)) − Σ_{i=1}^{r} μ_i^(k) ∇h_i(x^(k))
        + Σ_{i=1}^{n} μ_{r+i}^(k) e_i − Σ_{i=1}^{n} μ_{r+n+i}^(k) e_i = 0.   (10)
u(k)
(k)
..
(k)
m
=
(k)
1
..
.
(k)
r+2n
(11)
(12)
(13)
v(k)
u(k) v(k)
Since the line search may depend on the approximation v(k) of the optimal
lagrange multipliers of (7), we update v(k) simultaneneously by
v(k+1) = v(k) + k u(k) v(k)
(14)
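A single iteration of the scheme (8), (12), (14) can be sketched as follows; the objective, constraint and starting point below are made-up illustration data (not from these notes), only an equality constraint is kept, and B^(k) = I stands in for a proper Hessian approximation:

```python
import numpy as np

# Illustrative single SQP step for min f(x) s.t. g(x) = 0.
# f, g and the starting data are hypothetical demonstration choices.
def f_grad(x):            # gradient of f(x) = x1^2 + x2^2
    return 2.0 * x

def g(x):                 # one equality constraint g(x) = x1 + x2 - 1
    return np.array([x[0] + x[1] - 1.0])

def g_jac(x):             # its Jacobian (one row)
    return np.array([[1.0, 1.0]])

x = np.array([1.0, 0.0])  # current iterate x^(k)
v = np.array([0.0])       # multiplier approximation v^(k)
B = np.eye(2)             # positive definite stand-in for B^(k)

# Subproblem (8) with the equality constraint only: solve its KKT system
#   B d - A^T u = -grad f(x^(k)),  A d = -g(x^(k)).
A = g_jac(x)
K = np.block([[B, -A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([-f_grad(x), -g(x)])
sol = np.linalg.solve(K, rhs)
d, u = sol[:2], sol[2:]   # search direction d^(k), QP multiplier u^(k)

alpha = 1.0               # steplength from the merit-function line search
x_new = x + alpha * d                 # update (12)
v_new = v + alpha * (u - v)           # update (14)
```

With these data the step keeps the linearized constraint satisfied: A d = −g(x^(k)) = 0 here, so x^(k+1) remains feasible for g.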
For the line search, each inequality constraint h_i contributes to the merit function φ_r(x, v) a term of the form

ψ_i(x, v) = v_i h_i(x) − (r_i/2) h_i(x)^2,  if h_i(x) ≤ v_i/r_i,
ψ_i(x, v) = (1/2) v_i^2/r_i,                if h_i(x) > v_i/r_i,    (15)

proposed by Schittkowski. The penalty parameter r^(k) is updated by a suitable rule to guarantee a descent direction d^(k) with respect to the chosen merit function.

However, we cannot always implement the quadratic programming subproblem (8) as it stands. It is possible that the feasible region of (8) will be empty although the original problem (1)-(4) is solvable. The second drawback is the recalculation of the gradients of all constraints at each iteration, although some of them might be inactive at an optimal solution, i.e., locally redundant.

To avoid both disadvantages, an additional variable δ and an active set strategy are introduced, leading to the modified subproblem

min ½ d^T B^(k) d + ∇f(x^(k))^T d + ½ ρ_k δ^2    (16a)
subject to
∇g_i(x^(k))^T d + (1 − δ) g_i(x^(k)) = 0, i = 1, ..., m,    (16b)
∇h_i(x^(k))^T d + (1 − δ) h_i(x^(k)) ≥ 0, i ∈ J_k, i > m,
∇h_i(x^(k(j)))^T d + h_i(x^(k)) ≥ 0, i ∈ K_k,    (16c)
l − x^(k) ≤ d ≤ u − x^(k),    (16d)
d ∈ R^n,    (16e)
δ ∈ [0, 1],    (16f)

where

J_k = {1, ..., m} ∪ {i : h_i(x^(k)) < ε or v_i^(k) > 0}

and

K_k = {i = 1, ..., r} \ {i : h_i(x^(k)) < ε or v_i^(k) > 0}.

Here, we have v^(k) = (v_1^(k), ..., v_{r+2n}^(k))^T, and ε is a user-provided tolerance. The index k(j) indicates gradients which have been calculated in previous iterations. The term ρ_k is an additional penalty parameter designed to reduce the influence of δ on the solution of (16). It is easy to see that the point

d^(0) = 0, δ_0 = 1

satisfies the constraints of (16) and can also be used as a feasible starting point for a quadratic programming algorithm.
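A minimal sketch of the piecewise term in (15), with illustrative values for v_i and r_i:

```python
def penalty_term(h, v, r):
    """Contribution of one inequality constraint to the merit
    function (15): quadratic near the constraint boundary, constant
    once the constraint is sufficiently inactive (h > v/r).
    h = h_i(x), v = multiplier v_i, r = penalty parameter r_i > 0."""
    if h <= v / r:
        return v * h - 0.5 * r * h * h
    return 0.5 * v * v / r
```

Both branches meet at h_i = v_i/r_i with the common value v_i^2/(2 r_i), and the derivative v_i − r_i h_i of the first branch vanishes there, so the merit function is continuously differentiable across the switch.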