
Numerical Methods for

Unconstrained Optimization
Cheng-Liang Chen
PSE
LABORATORY
Department of Chemical Engineering
National Taiwan University
Chen CL 1
Analytical vs. Numerical ?
In Analytical Methods, we write the necessary
conditions and solve them (analytically or numerically ?)
for candidate local minimum designs
Some Difficulties:
The number of design variables and constraints can be large
Functions for the design problem can be highly nonlinear
In many applications, cost and/or constraint functions can be
implicit in terms of the design variables
Numerical Methods: estimate an initial design and
improve it until the optimality conditions are satisfied
Chen CL 2
Unconstrained Optimization
Chen CL 3
General Concepts Related to
Numerical Algorithms

A General Algorithm
Current estimate: x^(k), k = 0, 1, ...
Subproblem 1: d^(k) : feasible search direction
Subproblem 2: α_k : (positive scalar) step size
New estimate: x^(k+1) = x^(k) + α_k d^(k) = x^(k) + Δx^(k)
Chen CL 4
Chen CL 5
Descent Step Idea

f(x^(k))  (current estimate)  >  f(x^(k+1))  (new estimate)

f(x^(k+1)) = f(x^(k) + α_k d^(k))
           ≅ f(x^(k)) + ∇f^T(x^(k)) [(x^(k) + α_k d^(k)) − x^(k)]
           = f(x^(k)) + α_k c^(k) · d^(k)      (second term < 0)

∇f^T(x^(k)) d^(k) = c^(k) · d^(k) < 0 : descent condition

Angle between c^(k) and d^(k) must be between 90° and 270°
Chen CL 6
Example: check the descent condition

f(x) = x₁² − x₁x₂ + 2x₂² − 2x₁ + e^(x₁+x₂)

Verify whether d₁ = (1, 2) and d₂ = (1, 0) are descent directions at (0, 0).

c = ∇f = [ 2x₁ − x₂ − 2 + e^(x₁+x₂) ,  −x₁ + 4x₂ + e^(x₁+x₂) ]^T at (0, 0) = [ −1 , 1 ]^T

c · d₁ = [−1  1] [1  2]^T = −1 + 2 = 1 > 0     (not a descent direction)
c · d₂ = [−1  1] [1  0]^T = −1 + 0 = −1 < 0    (a descent direction)
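Such a check is easy to script. The sketch below is an illustration added here (not part of the original slides); the helper name grad is an arbitrary choice.

```python
# Check the descent condition c . d < 0 for the example above.
import numpy as np

def grad(x):
    # Gradient of f(x) = x1^2 - x1*x2 + 2*x2^2 - 2*x1 + exp(x1 + x2)
    e = np.exp(x[0] + x[1])
    return np.array([2*x[0] - x[1] - 2 + e, -x[0] + 4*x[1] + e])

c = grad(np.array([0.0, 0.0]))                      # c = (-1, 1)
for d in (np.array([1.0, 2.0]), np.array([1.0, 0.0])):
    print(d, c @ d, "descent" if c @ d < 0 else "not descent")
```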
Chen CL 7
One-Dimensional Minimization:
Reduction to a Function of a Single Variable

Assume: a descent direction d^(k) has been found

f(x) = f(x^(k) + α d^(k))
     ≅ f(x^(k)) + α ∇f^T(x^(k)) d^(k)      (= α c · d < 0)
     = f̄(α)

f̄(α) < f̄(0) = f(x^(k))  ⇒  a small move reduces f
f̄′(0) = c^(k) · d^(k) < 0  ⇒  d should be a descent direction
Chen CL 8
Analytical Method to Compute Step Size

d^(k) is a descent direction  ⇒  α > 0

df(α_k)/dα = 0,    d²f(α_k)/dα² > 0

0 = df(x^(k+1))/dα = [df(x^(k+1))/dx] · [dx^(k+1)/dα] = ∇f^T(x^(k+1)) d^(k) = c^(k+1) · d^(k)

The gradient of the cost function at the NEW point, c^(k+1),
is orthogonal to the current search direction, d^(k)
Chen CL 9
Example: analytical step size determination

f(x) = 3x₁² + 2x₁x₂ + 2x₂² + 7
d^(k) = (−1, −1) at x^(k) = (1, 2)

c^(k) = ∇f(x^(k)) = [ 6x₁ + 2x₂ , 2x₁ + 4x₂ ]^T at x^(k) = [ 10 , 10 ]^T

c^(k) · d^(k) = [10  10] [−1  −1]^T = −20 < 0    (descent direction)

x^(k+1) = [1  2]^T + α [−1  −1]^T = [1 − α  2 − α]^T

f(x^(k+1)) = 3(1 − α)² + 2(1 − α)(2 − α) + 2(2 − α)² + 7
           = 7α² − 20α + 22  ≡  f(α)
Chen CL 10
NC:  df/dα = 14α_k − 20 = 0   ⇒   α_k = 10/7,    d²f/dα² = 14 > 0

x^(k+1) = [1  2]^T + (10/7) [−1  −1]^T = [−3/7  4/7]^T

f(x^(k+1)) = 54/7 < 22 = f(x^(k))
Chen CL 11
Numerical Methods to Compute Step Size

Most one-dimensional search methods work only for unimodal functions
(they work on the interval α_ℓ = 0 ≤ α ≤ α_u)
(α_u − α_ℓ : interval of uncertainty)
Chen CL 12
Unimodal Function

Unimodal function: f(x) is a unimodal function if
x₁ < x₂ < x* implies f(x₁) > f(x₂), and
x* < x₃ < x₄ implies f(x₃) < f(x₄)
Chen CL 13
Unimodal Function
Outcome of two experiments

x* ∈ [0, 1],   0 < x₁ < x₂ < 1

f₁ < f₂  ⇒  x* ∈ [0, x₂]
f₁ > f₂  ⇒  x* ∈ [x₁, 1]
f₁ = f₂  ⇒  x* ∈ [x₁, x₂]
Chen CL 14
Equal Interval Search

To reduce successively the interval of uncertainty, I,
to a small acceptable value

I = α_u − α_ℓ,   (α_ℓ = 0)

Evaluate the function at α = δ, 2δ, 3δ, ...
If f((q + 1)δ) < f(qδ), then continue
If f((q + 1)δ) > f(qδ), then α_ℓ = (q − 1)δ, α_u = (q + 1)δ  ⇒  α* ∈ [α_ℓ, α_u]
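A minimal sketch of this bracketing phase is given below (an illustration, not part of the slides); the names f and delta are arbitrary, since the slides do not fix an interface.

```python
# Equal-interval bracketing: march in steps of delta until f first increases,
# then return the bracketing interval (alpha_l, alpha_u).
import math

def equal_interval_bracket(f, delta, alpha0=0.0, max_steps=10000):
    a_curr = alpha0 + delta
    f_prev = f(alpha0)
    for _ in range(max_steps):
        f_curr = f(a_curr)
        if f_curr > f_prev:                      # first increase detected
            return a_curr - 2*delta, a_curr      # alpha_l = (q-1)delta, alpha_u = (q+1)delta
        f_prev = f_curr
        a_curr += delta
    raise RuntimeError("no bracket found")

# Example used later on the slides: f(a) = 2 - 4a + e^a, delta = 0.5
print(equal_interval_bracket(lambda a: 2 - 4*a + math.exp(a), 0.5))   # -> (1.0, 2.0)
```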
Chen CL 15
Chen CL 16
Equal Interval Search: Example

f(x) = x(x − 1.5),  x ∈ [0, 1],  δ = 0.1   ⇒   x* ∈ [x₇, x₈] = [0.7, 0.8]

 i       1     2     3     4     5     6     7     8     9
 xᵢ     0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
 f(xᵢ) −.14  −.26  −.36  −.44  −.50  −.54  −.56  −.56  −.54
Chen CL 17
Equal Interval Search: Example

f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001

No.  Trial step α   Function value   Notes
 1     0.000000       3.000000       δ = 0.5
 2     0.500000       1.648721
 3     1.000000       0.718282       α_ℓ
 4     1.500000       0.481689
 5     2.000000       1.389056       α_u
 6     1.050000       0.657651       restart from α_ℓ = 1.0, δ = 0.05
 7     1.100000       0.604166
 8     1.150000       0.558193
 9     1.200000       0.520117
10     1.250000       0.490343
11     1.300000       0.469297
12     1.350000       0.457426       α_ℓ
13     1.400000       0.455200
14     1.450000       0.463115       α_u
15     1.355000       0.456761       restart from α_ℓ = 1.35, δ = 0.005
16     1.360000       0.456193
17     1.365000       0.455723
18     1.370000       0.455351
19     1.375000       0.455077
20     1.380000       0.454902       α_ℓ
21     1.385000       0.454826
22     1.390000       0.454850       α_u
23     1.380500       0.454890       restart from α_ℓ = 1.38, δ = 0.0005
24     1.381000       0.454879
25     1.381500       0.454868
26     1.382000       0.454859
27     1.382500       0.454851
28     1.383000       0.454844
29     1.383500       0.454838
30     1.384000       0.454833
31     1.384500       0.454829
32     1.385000       0.454826
33     1.385500       0.454824
34     1.386000       0.454823       α_ℓ
35     1.386500       0.454823
36     1.387000       0.454824       α_u
37     1.386500       0.454823
Chen CL 18
Equal Interval Search: 3 Interior Points

x* ∈ [a, b]: three interior test points x₁, x₀, x₂  ⇒  three possibilities
Chen CL 19
Equal Interval Search: 2 Interior Points

α_a = α_ℓ + (1/3)(α_u − α_ℓ) = (1/3)(α_u + 2α_ℓ)
α_b = α_ℓ + (2/3)(α_u − α_ℓ) = (1/3)(2α_u + α_ℓ)

Case 1: f(α_a) < f(α_b)  ⇒  α_ℓ < α* < α_b
Case 2: f(α_a) > f(α_b)  ⇒  α_a < α* < α_u

I′ = (2/3) I : reduced interval of uncertainty
Chen CL 20
Golden Section Search

Problem with Equal Interval Search (n = 2):
the known midpoint is not used in the next iteration
Solution: Golden Section Search

Fibonacci Sequence:
F₀ = 1;  F₁ = 1;  Fₙ = Fₙ₋₁ + Fₙ₋₂,  n = 2, 3, ...
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
Fₙ / Fₙ₋₁ → 1.618,   Fₙ₋₁ / Fₙ → 0.618   as n → ∞
Chen CL 21
Golden Section Search
Initial Bracketing of Minimum

Starting at α = 0, evaluate

α_q = δ Σ_{j=0}^{q} (1.618)^j = α_{q−1} + δ(1.618)^q,   q = 0, 1, 2, ...

q = 0:  α₀ = δ
q = 1:  α₁ = δ + 1.618δ = 2.618δ          (increment 1.618(α₀ − 0))
q = 2:  α₂ = 2.618δ + 1.618²δ = 5.236δ    (increment 1.618(α₁ − α₀))
q = 3:  α₃ = 5.236δ + 1.618³δ = 9.472δ    (increment 1.618(α₂ − α₁))
...
Chen CL 22
Golden Section Search
Initial Bracketing of Minimum

If f(α_{q−2}) > f(α_{q−1}) and f(α_{q−1}) < f(α_q),
then α_{q−2} < α* < α_q

α_u = α_q = δ Σ_{j=0}^{q} (1.618)^j
α_ℓ = α_{q−2} = δ Σ_{j=0}^{q−2} (1.618)^j

I = α_u − α_ℓ = δ(1.618)^q + δ(1.618)^{q−1}
  = (α_q − α_{q−1}) + (α_{q−1} − α_{q−2})
  = 2.618 (1.618)^{q−1} δ
Chen CL 23
Golden Section Search
Reduction of Interval of Uncertainty

Given α_u, α_ℓ,  I = α_u − α_ℓ
Select α_a, α_b such that
α_u − α_a = τI,   α_a − α_ℓ = (1 − τ)I
α_b − α_ℓ = τI,   α_u − α_b = (1 − τ)I

Suppose f(α_b) > f(α_a)  ⇒  α* ∉ [α_b, α_u]; delete [α_b, α_u]
α′_b = α_a,   α′_u = α_b,   I′ = α′_u − α′_ℓ = τI,   α′_b − α′_ℓ = (1 − τ)I
Chen CL 24
Golden Section Search
Reduction of Interval of Uncertainty

For the retained point to keep the same proportions in the new interval:
I′ = τI  and  (1 − τ)I = τI′ = τ(τI)
⇒  τ² + τ − 1 = 0   ⇒   τ = (−1 + √5)/2 = 0.618 = 1/1.618

α_{q−1} − α_{q−2} = 0.382 I
α_q − α_{q−1} = 0.618 I = 1.618 × 0.382 I = 1.618 (α_{q−1} − α_{q−2})

(α_q − α_{q−1}) / (α_{q−1} − α_{q−2}) = 0.618I / 0.382I = 1.618
⇒ the ratio of increase of the trial step size is 1.618
Chen CL 25
Golden Section Search
Algorithm

Step 1: choose q (from initial bracketing); α_ℓ = α_{q−2}, α_u = α_q, I = α_u − α_ℓ
Step 2: α_a = α_ℓ + 0.382I, α_b = α_ℓ + 0.618I; compute f(α_a), f(α_b)
Step 3: compare f(α_a) and f(α_b); go to Step 4, 5, or 6
Step 4: if f(α_a) < f(α_b), then α_ℓ ≤ α* ≤ α_b:
        α′_ℓ = α_ℓ, α′_u = α_b, α′_b = α_a, α′_a = α′_ℓ + 0.382(α′_u − α′_ℓ); go to Step 7
Step 5: if f(α_a) > f(α_b), then α_a ≤ α* ≤ α_u:
        α′_ℓ = α_a, α′_u = α_u, α′_a = α_b, α′_b = α′_ℓ + 0.618(α′_u − α′_ℓ); go to Step 7
Step 6: if f(α_a) = f(α_b), then α_a ≤ α* ≤ α_b:
        α′_ℓ = α_a, α′_u = α_b; return to Step 2
Step 7: if I′ = α′_u − α′_ℓ < ε, then α* ≅ (α′_u + α′_ℓ)/2 and stop;
        otherwise delete the primes and return to Step 3
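The sketch below implements Steps 2-7 for a given initial interval (the 1.618-based bracketing phase is assumed already done); the equal-values case of Step 6 is folded into the else branch. It is an illustration added here; names and tolerances are arbitrary choices.

```python
import math

def golden_section(f, a_l, a_u, eps=1e-3):
    tau = 0.381966                               # 1 - 0.618...
    a_a = a_l + tau * (a_u - a_l)
    a_b = a_u - tau * (a_u - a_l)
    f_a, f_b = f(a_a), f(a_b)
    while (a_u - a_l) > eps:
        if f_a < f_b:                            # minimum lies in [a_l, a_b]
            a_u, a_b, f_b = a_b, a_a, f_a        # reuse old a_a as new a_b
            a_a = a_l + tau * (a_u - a_l); f_a = f(a_a)
        else:                                    # minimum lies in [a_a, a_u]
            a_l, a_a, f_a = a_a, a_b, f_b        # reuse old a_b as new a_a
            a_b = a_u - tau * (a_u - a_l); f_b = f(a_b)
    return 0.5 * (a_l + a_u)

# Slide example: minimum of 2 - 4a + e^a is at a = ln 4 = 1.386...
print(golden_section(lambda a: 2 - 4*a + math.exp(a), 0.0, 2.618034))
```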
Chen CL 26
Golden Section Search
Example
f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001
Chen CL 27
Golden Section Search: Example

f(x) = x(x − 1.5)

Initial bracketing of the minimum:

No.  Trial x     Function value
 1   0.000000     0.000000
 2   0.250000    −0.312500
 3   0.500000    −0.500000    x_ℓ = 0.5
 4   0.750000    −0.562500    f_min
 5   1.000000    −0.500000    x_u = 1.0

Reducing the interval of uncertainty (second line of each row gives the function values):

No.     x_ℓ            x_a            x_b            x_u            I
 1   0.5000000      0.6910000      0.8090000      1.0000000      0.50000000
    −0.5000000     −0.5590190     −0.5590190     −0.5000000
 2   0.6910000      0.7360760      0.7639240      0.8090000      0.11800000
    −0.5590190     −0.5623061     −0.5623061     −0.5590190
 3   0.7360760      0.7469139      0.7532861      0.7639240      0.02784800
    −0.5623061     −0.5624892     −0.5624892     −0.5623061
 4   0.74691393     0.74922448     0.75077551     0.75328606     0.00657212
    −0.562489202   −0.562499399   −0.562499399   −0.562489202
 5   0.7492244890   0.7498169790   0.7501830210   0.7507755110   0.001551022
    −0.562499399   −0.562499967   −0.562499967   −0.562499399
 6   0.7498467900   0.7499566900   0.7500431210   0.7501830210   0.000366231
    −0.562499966   −0.562499998   −0.562499998   −0.562499967
Chen CL 28
Polynomial Interpolation
Quadratic Curve Fitting

q(α) = a₀ + a₁α + a₂α²    (approximating quadratic function)

f(α_ℓ) = q(α_ℓ) = a₀ + a₁α_ℓ + a₂α_ℓ²
f(α_i) = q(α_i) = a₀ + a₁α_i + a₂α_i²
f(α_u) = q(α_u) = a₀ + a₁α_u + a₂α_u²

a₂ = 1/(α_u − α_i) [ (f(α_u) − f(α_ℓ))/(α_u − α_ℓ) − (f(α_i) − f(α_ℓ))/(α_i − α_ℓ) ]
a₁ = (f(α_i) − f(α_ℓ))/(α_i − α_ℓ) − a₂(α_i + α_ℓ)
a₀ = f(α_ℓ) − a₁α_ℓ − a₂α_ℓ²

dq(α)/dα at ᾱ:  a₁ + 2a₂ᾱ = 0   ⇒   ᾱ = −a₁/(2a₂)   if d²q/dα² = 2a₂ > 0
Chen CL 29
Computational Algorithm:

Step 1: locate the initial interval of uncertainty (α_ℓ, α_u)
Step 2: select α_ℓ < α_i < α_u and compute f(α_i)
Step 3: compute a₀, a₁, a₂, ᾱ, and f(ᾱ)
Step 4: form the new three-point pattern (α′_ℓ, α′_i, α′_u):
        if ᾱ > α_i and f(α_i) < f(ᾱ):  α* ∈ [α_ℓ, ᾱ],   new points (α_ℓ, α_i, ᾱ)
        if ᾱ > α_i and f(α_i) > f(ᾱ):  α* ∈ [α_i, α_u],  new points (α_i, ᾱ, α_u)
        if ᾱ < α_i and f(α_i) < f(ᾱ):  α* ∈ [ᾱ, α_u],   new points (ᾱ, α_i, α_u)
        if ᾱ < α_i and f(α_i) > f(ᾱ):  α* ∈ [α_ℓ, α_i],  new points (α_ℓ, ᾱ, α_i)
Step 5: stop if two successive estimates of the minimum point of
        f(α) are sufficiently close; otherwise delete the primes on
        α′_ℓ, α′_i, α′_u and return to Step 2
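One fitting step (Steps 2-3) can be written compactly as below; the helper name quadratic_fit_step is an arbitrary choice, and the printed numbers refer to the worked example on the following slide.

```python
import math

def quadratic_fit_step(f, a_l, a_i, a_u):
    """Fit q(a) = a0 + a1*a + a2*a^2 through three points and return its minimizer."""
    f_l, f_i, f_u = f(a_l), f(a_i), f(a_u)
    a2 = ((f_u - f_l)/(a_u - a_l) - (f_i - f_l)/(a_i - a_l)) / (a_u - a_i)
    a1 = (f_i - f_l)/(a_i - a_l) - a2*(a_i + a_l)
    a0 = f_l - a1*a_l - a2*a_l**2
    alpha_bar = -a1 / (2.0*a2)          # valid only if a2 > 0
    return a0, a1, a2, alpha_bar

print(quadratic_fit_step(lambda a: 2 - 4*a + math.exp(a), 0.5, 1.309017, 2.618034))
# -> roughly (3.957, -5.821, 2.410, 1.2077)
```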
Chen CL 30
Example:   f(α) = 2 − 4α + e^α,   δ = 0.5

α_ℓ = 0.5,   α_i = 1.309017,   α_u = 2.618034
f(α_ℓ) = 1.648721,   f(α_i) = 0.466464,   f(α_u) = 5.236610

a₂ = (1/1.30902) [ 3.5879/2.1180 − (−1.1823)/0.80902 ] = 2.410
a₁ = −1.1823/0.80902 − (2.41)(1.80902) = −5.821
a₀ = 1.648721 − (−5.821)(0.50) − 2.41(0.25) = 3.957

ᾱ = −a₁/(2a₂) = 1.2077 < α_i;    f(ᾱ) = 0.5149 > f(α_i)
⇒  α′_ℓ = ᾱ = 1.2077,   α′_u = α_u = 2.618034,   α′_i = α_i = 1.309017

Second iteration:
α_ℓ = 1.2077,   α_i = 1.309017,   α_u = 2.618034
f(α_ℓ) = 0.5149,   f(α_i) = 0.466464,   f(α_u) = 5.236610
a₂ = 5.3807,   a₁ = −7.30547,   a₀ = 2.713
ᾱ = 1.3464,    f(ᾱ) = 0.4579
Chen CL 31
Multi-Dimensional Minimization:
Powell's Conjugate Directions Method

Conjugate Directions
Let A be an n × n symmetric matrix.
A set of n vectors (directions) {Sᵢ} is said to be
A-conjugate if
Sᵢ^T A Sⱼ = 0   for i, j = 1, ..., n;  i ≠ j
Note: orthogonal directions are a special case of
conjugate directions (A = I)
Chen CL 32
Multi-Dimensional Minimization:
Powell's Conjugate Directions Method

Quadratically Convergent Method
If a minimization method, using exact arithmetic, can
find the minimum point in n steps while minimizing a
quadratic function in n variables, the method is called
a quadratically convergent method

Theorem: Given a quadratic function of n variables
and two parallel hyperplanes 1 and 2 of dimension
k < n. Let the constrained stationary points of the
quadratic function in the hyperplanes be X₁ and X₂,
respectively. Then the line joining X₁ and X₂ is
conjugate to any line parallel to the hyperplanes.
Chen CL 33
Multi-Dimensional Minimization:
Powell's Conjugate Directions Method

Proof:
Q(X) = ½ X^T A X + B^T X + C
∇Q(X) = AX + B    (n × 1)

search from X_a along S  ⇒  X₁ (stationary point)
search from X_b along S  ⇒  X₂
S is orthogonal to ∇Q(X₁) and ∇Q(X₂):
S^T ∇Q(X₁) = S^T A X₁ + S^T B = 0
S^T ∇Q(X₂) = S^T A X₂ + S^T B = 0
⇒  S^T [∇Q(X₁) − ∇Q(X₂)] = S^T A (X₁ − X₂) = 0
Chen CL 34
Multi-Dimensional Minimization:
Powell's Conjugate Directions Method

Meaning: If X₁ and X₂ are the minima of Q obtained
by searching along the direction S from two different
starting points X_a and X_b, respectively,
then the line (X₁ − X₂) will be conjugate to S
Chen CL 35
Multi-Dimensional Minimization:
Powell's Conjugate Directions Method

Theorem:
If a quadratic function
Q(X) = ½ X^T A X + B^T X + C
is minimized sequentially, once along each direction
of a set of n mutually conjugate directions, the
minimum of the function Q will be found at or before
the nth step irrespective of the starting point

Proof:
∇Q(X*) = B + A X* = 0
Let X* = X₁ + Σ_{j=1}^{n} λ*_j S_j,    S_j : directions conjugate w.r.t. A

0 = B + A X₁ + A ( Σ_{j=1}^{n} λ*_j S_j )
0 = S_i^T (B + A X₁) + S_i^T A ( Σ_{j=1}^{n} λ*_j S_j ) = (B + A X₁)^T S_i + λ*_i S_i^T A S_i

⇒  λ*_i = − (B + A X₁)^T S_i / (S_i^T A S_i)
Chen CL 36
Multi-Dimensional Minimization:
Powell's Conjugate Directions Method

Note: X_{i+1} = X_i + λ_i S_i,  i = 1, ..., n
λ_i is found by minimizing Q(X_i + λ_i S_i), so that
0 = S_i^T ∇Q(X_{i+1})

∇Q(X_{i+1}) = B + A X_{i+1} = B + A (X_i + λ_i S_i)
0 = S_i^T ∇Q(X_{i+1}) = S_i^T { B + A (X_i + λ_i S_i) } = (B + A X_i)^T S_i + λ_i S_i^T A S_i
⇒  λ_i = − (B + A X_i)^T S_i / (S_i^T A S_i)

X_i = X₁ + Σ_{j=1}^{i−1} λ_j S_j
X_i^T A S_i = X₁^T A S_i + Σ_{j=1}^{i−1} λ_j S_j^T A S_i = X₁^T A S_i

⇒  λ_i = − (B + A X_i)^T S_i / (S_i^T A S_i) = − (B + A X₁)^T S_i / (S_i^T A S_i) = λ*_i
Chen CL 37
Powell's Conjugate Directions: Example

f(x₁, x₂) = 6x₁² + 2x₂² − 6x₁x₂ − x₁ − 2x₂
          = [−1  −2] [x₁  x₂]^T + ½ [x₁  x₂] [12  −6; −6  4] [x₁  x₂]^T

If S₁ = [1  2]^T and X₁ = [0  0]^T:

S₁^T A S₂ = [1  2] [12  −6; −6  4] [s₁  s₂]^T = [0  2] [s₁  s₂]^T = 0   ⇒   S₂ = [1  0]^T

λ₁* = − [−1  −2][1  2]^T / ( [1  2] [12  −6; −6  4] [1  2]^T ) = 5/4

X₂ = X₁ + λ₁* S₁ = [0  0]^T + (5/4) [1  2]^T = [5/4  5/2]^T
Chen CL 38
λ₂* = − [−1  −2][1  0]^T / ( [1  0] [12  −6; −6  4] [1  0]^T ) = 1/12

X₃ = X₂ + λ₂* S₂ = [5/4  5/2]^T + (1/12) [1  0]^T = [4/3  5/2]^T = X*  (?)
Chen CL 39
Powell's Algorithm
Chen CL 40
Progress of Powell's Method

u_n;   u₁, u₂, ..., u_{n−1}, u_n;
S^(1); u₂, ..., u_{n−1}, u_n, S^(1);
S^(2); u₃, ..., u_n, S^(1), S^(2);
...
S^(n−1); u_n, S^(1), S^(2), ..., S^(n−1)

(u_n, S^(1)), (S^(1), S^(2)), ...
u_n, S^(1), S^(2), ... are A-conjugate
Chen CL 41
Powell's Conjugate Directions: Example

Min: f(x₁, x₂) = x₁ − x₂ + 2x₁² + 2x₁x₂ + x₂²,    X₁ = [0  0]^T
Chen CL 42
Cycle 1: Univariate Search

along u₂:  f(X₁ + λu₂) = f(0, λ) = λ² − λ
  df/dλ = 0  ⇒  λ* = 1/2  ⇒  X₂ = X₁ + λ* u₂ = [0  0.5]^T

along u₁:  f(X₂ − λu₁) = f(−λ, 0.5) = 2λ² − 2λ − 0.25
  df/dλ = 0  ⇒  λ* = 1/2  ⇒  X₃ = X₂ − λ* u₁ = [−0.5  0.5]^T

along u₂:  f(X₃ + λu₂) = f(−0.5, 0.5 + λ) = λ² − λ − 0.75
  df/dλ = 0  ⇒  λ* = 1/2  ⇒  X₄ = X₃ + λ* u₂ = [−0.5  1]^T
Chen CL 43
Cycle 2: Pattern Search

S^(1) = X₄ − X₂ = [−0.5  1]^T − [0  0.5]^T = [−0.5  0.5]^T

f(X₄ + λS^(1)) = f(−0.5 − 0.5λ, 1 + 0.5λ) = 0.25λ² − 0.5λ − 1
df/dλ = 0  ⇒  λ* = 1.0

X₅ = X₄ + λ* S^(1) = [−1.0  1.5]^T
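A generic Powell-style cycle (univariate searches, a pattern move, then replacement of the oldest direction) can be sketched as follows. This is an illustration of the idea for the example above, not a transcription of the slides' exact bookkeeping; it reuses SciPy's scalar minimizer for each line search.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):
    return x[0] - x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2

def line_search(x, d):
    lam = minimize_scalar(lambda a: f(x + a*d)).x
    return x + lam*d

x = np.zeros(2)
dirs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for cycle in range(3):
    x_start = x
    for d in dirs:                      # univariate searches along current directions
        x = line_search(x, d)
    pattern = x - x_start               # new (conjugate) pattern direction
    if np.linalg.norm(pattern) > 1e-12:
        x = line_search(x, pattern)
        dirs = [dirs[1], pattern]       # drop the oldest direction, keep the pattern
print(x)                                # approaches the minimum (-1.0, 1.5)
```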
Chen CL 44
Simplex Method (figure slides)
Properties of Gradient Vector

∇f = [ ∂f/∂x₁, ..., ∂f/∂xₙ ]^T = c

c^(k) = c(x^(k)) = ∇f(x^(k)) = [ ∂f(x^(k))/∂xᵢ ]
Chen CL 49
Property 1: The gradient vector c of a function
f(x₁, ..., xₙ) at the point x* = (x*₁, ..., x*ₙ) is orthogonal
(normal) to the tangent plane of the surface
f(x₁, ..., xₙ) = constant.

C is any curve on the surface through x*
T is a vector tangent to curve C at x*
⇒  c · T = 0
Chen CL 50
Proof:
s : any parameter along C

T = [ ∂x₁/∂s, ..., ∂xₙ/∂s ]^T at x = x*    (a unit tangent vector along C at x*)

f(x) = constant  ⇒  df/ds = 0

0 = df/ds = (∂f/∂x₁)(∂x₁/∂s) + ... + (∂f/∂xₙ)(∂xₙ/∂s) = c^T T = c · T
Chen CL 51
Property 2: The gradient represents a direction of
maximum rate of increase for f(x) at x*

Proof:
u : a unit vector in any direction not tangent to C
t : a parameter along u

df/dt = lim_{ε→0} [ f(x + εu) − f(x) ] / ε

f(x + εu) = f(x) + ε ( u₁ ∂f/∂x₁ + ... + uₙ ∂f/∂xₙ ) + O(ε²)
f(x + εu) − f(x) = ε Σᵢ uᵢ ∂f/∂xᵢ + O(ε²)       (× 1/ε)

df/dt = lim_{ε→0} [ f(x + εu) − f(x) ] / ε = Σᵢ uᵢ ∂f/∂xᵢ = c · u = c^T u
      = ||c|| ||u|| cos θ      (maximum rate of increase when θ = 0)
Chen CL 52
Property 3: The maximum rate of change of f(x) at
any point x* is the magnitude of the gradient vector

( max_u |df/dt| = ||c|| )

u is in the direction of the gradient vector for θ = 0
Chen CL 53
Verify Properties of Gradient Vector

f(x) = 25x₁² + x₂²,    x^(0) = (0.6, 4),    f(x^(0)) = 25

c = ∇f(0.6, 4) = [ ∂f/∂x₁ , ∂f/∂x₂ ]^T = [ 50x₁ , 2x₂ ]^T = [ 30 , 8 ]^T

C = c / ||c|| = [30, 8]^T / √(30² + 8²) = [ 0.966235 , 0.257663 ]^T

t = vector tangent to the curve 25x₁² + x₂² = 25 at x^(0) :  t = [ −4 , 15 ]^T

T = t / ||t|| = [−4, 15]^T / √((−4)² + 15²) = [ −0.257663 , 0.966235 ]^T
Chen CL 54
Property 1: C · T = 0

Slope of the tangent:  m₁ = dx₂/dx₁ = −5x₁/√(1 − x₁²) = −3.75
Slope of the gradient: m₂ = c₂/c₁ = 2x₂/(50x₁) = 8/30 = 1/3.75
m₁ m₂ = −1  ⇒  C and T are orthogonal

Property 2: choose an arbitrary direction
D = (0.501034, 0.865430), step size α = 0.1

x^(1)_C = x^(0) + αC = [0.6, 4.0]^T + 0.1 [0.966235, 0.257663]^T = [0.6966235, 4.0257663]^T
x^(1)_D = x^(0) + αD = [0.6, 4.0]^T + 0.1 [0.501034, 0.865430]^T = [0.6501034, 4.0865430]^T

f(x^(1)_C) = 28.3389
f(x^(1)_D) = 27.2566 < f(x^(1)_C)    ⇒ the gradient direction gives the largest increase

Property 3: C · C = 1.00 > C · D = 0.7071059
Chen CL 55
Steepest Descent Algorithm

Steepest Descent Direction
Let f(x) be a differentiable function w.r.t. x. The
direction of steepest descent for f(x) at any point
is d = −c

Steepest Descent Algorithm:
Step 1: a starting design x^(0); k = 0; ε
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: d^(k) = −c^(k)
Step 4: calculate α_k to minimize f(x^(k) + αd^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k); k = k + 1; go to Step 2
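A minimal sketch of Steps 1-5 is shown below (an illustration added here); SciPy's scalar minimizer is used for the line search of Step 4, since the slides leave the line-search method open.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, eps=1e-5, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:      # Step 2: convergence test
            break
        d = -c                           # Step 3: steepest-descent direction
        alpha = minimize_scalar(lambda a: f(x + a*d)).x   # Step 4: line search
        x = x + alpha*d                  # Step 5: update
    return x

# Example from the slides: f = x1^2 + x2^2 - 2*x1*x2, started at (1, 0)
f = lambda x: x[0]**2 + x[1]**2 - 2*x[0]*x[1]
g = lambda x: np.array([2*x[0] - 2*x[1], 2*x[1] - 2*x[0]])
print(steepest_descent(f, g, [1.0, 0.0]))    # about (0.5, 0.5)
```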
Chen CL 56
Notes:
d = −c  ⇒  c · d = −||c||² < 0

The successive directions of steepest descent are normal to each other:
d^(k) · d^(k+1) = c^(k) · c^(k+1) = 0

Proof:  with x^(k+1) = x^(k) + αd^(k),
0 = df(x^(k+1))/dα = [∂f(x^(k+1))/∂x]^T ∂x^(k+1)/∂α = c^(k+1) · d^(k) = −c^(k+1) · c^(k) = −d^(k+1) · d^(k)
Chen CL 57
Steepest Descent: Example

f(x₁, x₂) = x₁² + x₂² − 2x₁x₂,    x^(0) = (1, 0)

Step 1: x^(0) = (1, 0), k = 0, ε
Step 2: c^(0) = ∇f(x^(0)) = (2x₁ − 2x₂, 2x₂ − 2x₁) = (2, −2);  ||c^(0)|| = 2√2 ≠ 0
Step 3: d^(0) = −c^(0) = (−2, 2)
Step 4: minimize f(x^(0) + αd^(0)) = f(1 − 2α, 2α):
        f(1 − 2α, 2α) = (1 − 2α)² + (2α)² − 2(1 − 2α)(2α) = 16α² − 8α + 1 = f(α)
        df(α)/dα = 32α − 8 = 0  ⇒  α₀ = 0.25;   d²f(α)/dα² = 32 > 0
Step 5: x^(1) = x^(0) + α₀d^(0) = (1 − 0.25(2), 0 + 0.25(2)) = (0.5, 0.5)
        c^(1) = (0, 0)  ⇒  stop
Chen CL 58
Steepest Descent: Example

f(x₁, x₂, x₃) = x₁² + 2x₂² + 2x₃² + 2x₁x₂ + 2x₂x₃
x^(0) = (2, 4, 10),    x* = (0, 0, 0)

Step 1: k = 0, ε = 0.005  (δ = 0.05, ε = 0.0001 for the golden section line search)
Step 2: c^(0) = ∇f(x^(0)) = (2x₁ + 2x₂, 4x₂ + 2x₁ + 2x₃, 4x₃ + 2x₂)
              = (12, 40, 48);   ||c^(0)|| = √4048 = 63.6 > ε
Step 3: d^(0) = −c^(0) = (−12, −40, −48)
Step 4: minimize f(x^(0) + αd^(0)) by golden section  ⇒  α₀ = 0.158718
Step 5: x^(1) = x^(0) + α₀d^(0) = (0.0954, −2.348, 2.381)
        c^(1) = (−4.5, −4.438, 4.828);   ||c^(1)|| = 7.952 > ε

Note: c^(1) · d^(0) = 0  (perfect line search)
Chen CL 59
Chen CL 60
Steepest Descent: Disadvantages

Slow to converge, especially when approaching the optimum
⇒ a large number of iterations
Information calculated at previous iterations is not used;
each iteration is started independently of the others
Chen CL 61
Scaling of Design Variables

The steepest descent method converges in only one iteration for a
positive definite quadratic function with a unit condition number
of the Hessian matrix
To accelerate the rate of convergence,
scale the design variables such that the
condition number of the new Hessian matrix is unity
Chen CL 62
Example:
Min: f(x₁, x₂) = 25x₁² + x₂²,    x^(0) = (1, 1)

H = [50  0; 0  2]

let x = Dy,   D = [1/√50  0; 0  1/√2]

⇒  Min: f(y₁, y₂) = ½ (y₁² + y₂²),    y^(0) = (√50, √2)
Chen CL 63
Chen CL 64
Example:
Min: f(x₁, x₂) = 6x₁² − 6x₁x₂ + 2x₂² − 5x₁ + 4x₂ + 2

H = [12  −6; −6  4],    λ₁,₂ = 0.7889, 15.211  (eigenvalues)
v₁,₂ = (0.4718, 0.8817), (−0.8817, 0.4718)  (eigenvectors)

let x = Qy,   Q = [v₁  v₂] = [0.4718  −0.8817; 0.8817  0.4718]

Min: f(y₁, y₂) = 0.5 (0.7889y₁² + 15.211y₂²) + 1.1678y₁ + 6.2957y₂ + 2

let y = Dz,   D = [1/√0.7889  0; 0  1/√15.211]

Min: f(z₁, z₂) = 0.5 (z₁² + z₂²) + 1.3148z₁ + 1.6142z₂ + 2

x^(0) = (1, 2);   z* = (−1.3148, −1.6142);   x* = QDz* = (−1/3, −3/2)
Chen CL 65
Conjugate Gradient Method
Fletcher and Reeves (1964)

Steepest Descent: orthogonal at consecutive steps
⇒ converges, but slowly

Conjugate Gradient Method:
modify the current steepest descent direction by adding a
scaled previous direction
⇒ cut diagonally through the orthogonal steepest descent directions

Conjugate Gradient Directions: d^(i), d^(j) are
orthogonal w.r.t. a symmetric and positive definite
matrix A:
d^(i)T A d^(j) = 0
Chen CL 66
Conjugate Gradient Method: algorithm

Step 1: k = 0; x^(0); d^(0) = −c^(0) = −∇f(x^(0));
        stop if ||c^(0)|| < ε, otherwise go to Step 4
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: d^(k) = −c^(k) + β_k d^(k−1),   β_k = [ ||c^(k)|| / ||c^(k−1)|| ]²
Step 4: compute α_k = α to minimize f(x^(k) + αd^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k); k = k + 1; go to Step 2

Note:
Finds the minimum in n iterations for positive definite quadratic
forms having n design variables
With inexact line searches or non-quadratic forms,
restart every n + 1 iterations for computational stability
(x^(0) = x^(n+1))
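A minimal Fletcher-Reeves sketch of the algorithm follows (an illustration added here); the restart rule mentioned in the note is omitted for brevity, and the names are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, eps=1e-6, max_iter=100):
    x = np.asarray(x0, dtype=float)
    c = grad(x)
    d = -c
    for _ in range(max_iter):
        if np.linalg.norm(c) < eps:
            break
        alpha = minimize_scalar(lambda a: f(x + a*d)).x   # line search
        x = x + alpha*d
        c_new = grad(x)
        beta = (np.linalg.norm(c_new) / np.linalg.norm(c))**2   # Fletcher-Reeves
        d = -c_new + beta*d
        c = c_new
    return x

# Example from the slides: f = x1^2 + 2*x2^2 + 2*x3^2 + 2*x1*x2 + 2*x2*x3
f = lambda x: x[0]**2 + 2*x[1]**2 + 2*x[2]**2 + 2*x[0]*x[1] + 2*x[1]*x[2]
g = lambda x: np.array([2*x[0] + 2*x[1],
                        4*x[1] + 2*x[0] + 2*x[2],
                        4*x[2] + 2*x[1]])
print(conjugate_gradient(f, g, [2.0, 4.0, 10.0]))   # approaches (0, 0, 0)
```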
Chen CL 67
Example:
Min: f(x) = x₁² + 2x₂² + 2x₃² + 2x₁x₂ + 2x₂x₃,    x^(0) = (2, 4, 10)

c^(0) = (12, 40, 48);   ||c^(0)|| = 63.6;   f(x^(0)) = 332.0

x^(1) = (0.0956, −2.348, 2.381)
c^(1) = (−4.5, −4.438, 4.828);   ||c^(1)|| = 7.952;   f(x^(1)) = 10.75

β₁ = [ ||c^(1)|| / ||c^(0)|| ]² = [7.952/63.6]² = 0.015633

d^(1) = −c^(1) + β₁ d^(0)
      = [ 4.500, 4.438, −4.828 ]^T + (0.015633) [ −12, −40, −48 ]^T
      = [ 4.31241, 3.81268, −5.57838 ]^T
Chen CL 68
x^(2) = x^(1) + αd^(1) = [ 0.0956, −2.348, 2.381 ]^T + α [ 4.31241, 3.81268, −5.57838 ]^T

Min f(x^(1) + αd^(1))  ⇒  α = 0.3156

x^(2) = (1.4566, −1.1447, 0.6205)
c^(2) = (0.6238, −0.4246, 0.1926),   ||c^(2)|| = 0.7788

Note: c^(2) · d^(1) = 0
Chen CL 69
Newton Method
A Second-order Method

x : current estimate of x*;   x + Δx : desired improvement

f(x + Δx) ≅ f(x) + c^T Δx + ½ Δx^T H Δx

NC:  ∂f/∂(Δx) = c + HΔx = 0   ⇒   Δx = −H⁻¹c

Δx = −αH⁻¹c   (modified Newton)
Chen CL 70
Steps (modified Newton):

Step 1: k = 0; x^(0); ε
Step 2: c_i^(k) = ∂f(x^(k))/∂x_i, i = 1, ..., n; stop if ||c^(k)|| < ε
Step 3: H(x^(k)) = [ ∂²f/∂x_i∂x_j ]
Step 4: d^(k) = −H⁻¹c^(k), or solve Hd^(k) = −c^(k)
Note: for computational efficiency, a system of linear simultaneous equations
is solved instead of evaluating the inverse of the Hessian
Step 5: compute α_k = α to minimize f(x^(k) + αd^(k))
Step 6: x^(k+1) = x^(k) + α_k d^(k); k = k + 1; go to Step 2

Note: unless H is positive definite,
d^(k) will not be a descent direction for f:
H > 0  ⇒  c^(k)T d^(k) = −c^(k)T H⁻¹ c^(k) < 0   (since c^(k)T H⁻¹ c^(k) > 0 for positive definite H)
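A minimal sketch of the modified Newton iteration, solving H d = −c as recommended in Step 4 instead of inverting H (an illustration added here; names are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def modified_newton(f, grad, hess, x0, eps=1e-6, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            break
        d = np.linalg.solve(hess(x), -c)      # Step 4: solve H d = -c
        alpha = minimize_scalar(lambda a: f(x + a*d)).x   # Step 5: line search
        x = x + alpha*d                       # Step 6: update
    return x

# Quadratic example from the slides: f = 3*x1^2 + 2*x1*x2 + 2*x2^2 + 7
f = lambda x: 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7
g = lambda x: np.array([6*x[0] + 2*x[1], 2*x[0] + 4*x[1]])
H = lambda x: np.array([[6.0, 2.0], [2.0, 4.0]])
print(modified_newton(f, g, H, [5.0, 10.0]))  # reaches (0, 0) in one iteration
```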
Chen CL 71
Example:
f(x) = 3x₁² + 2x₁x₂ + 2x₂² + 7,    x^(0) = (5, 10);   ε = 0.0001

c^(0) = (6x₁ + 2x₂, 2x₁ + 4x₂) = (50, 50);   ||c^(0)|| = 50√2

H^(0) = [6  2; 2  4],    H^(0)⁻¹ = (1/20) [4  −2; −2  6]

d^(0) = −H⁻¹c^(0) = −(1/20) [4  −2; −2  6] [50  50]^T = [−5  −10]^T

x^(1) = x^(0) + αd^(0) = [5  10]^T + α [−5  −10]^T = [5 − 5α  10 − 10α]^T

df/dα = 0, or ∇f(x^(1)) · d^(0) = 0:

∇f(x^(1)) = [ 6(5 − 5α) + 2(10 − 10α) , 2(5 − 5α) + 4(10 − 10α) ]^T = [ 50 − 50α , 50 − 50α ]^T

∇f(x^(1)) · d^(0) = [50 − 50α  50 − 50α] [−5  −10]^T
                  = −5(50 − 50α) − 10(50 − 50α) = 0   ⇒   α = 1
Chen CL 72
x^(1) = [5 − 5  10 − 10]^T = [0  0]^T

c^(1) = [50 − 50  50 − 50]^T = [0  0]^T    ⇒  optimum reached in one iteration
Chen CL 73
Example:
f(x) = 10x₁⁴ − 20x₁²x₂ + 10x₂² + x₁² − 2x₁ + 5,    x^(0) = (−1, 3)

c = ∇f(x) = ( 40x₁³ − 40x₁x₂ + 2x₁ − 2 , −20x₁² + 20x₂ )

H = ∇²f(x) = [ 120x₁² − 40x₂ + 2    −40x₁ ]
             [ −40x₁                  20  ]
Chen CL 74
Chen CL 75
Comparison of Steepest Descent, Newton,
Conjugate Gradient Methods

f(x) = 50(x₂ − x₁²)² + (2 − x₁)²,    x^(0) = (5, 5),    x* = (2, 4)
Chen CL 76
Chen CL 77
Chen CL 78
Newton Method

Advantage: quadratic convergence rate
Disadvantages:
Calculation of second-order derivatives at each iteration
A system of simultaneous linear equations needs to be solved
The Hessian of the function may be singular at some iterations
Memoryless method: each iteration is started afresh
Not convergent unless the Hessian remains positive definite and a
step size determination scheme is used
Chen CL 79
Marquardt Modification (1963)

d^(k) = −(H + λI)⁻¹ c^(k)

Far away from the solution point  ⇒  behaves like Steepest Descent
Near the solution point           ⇒  behaves like the Newton Method

Step 1: k = 0; x^(0); ε; λ₀ (= 10000, large)
Step 2: c_i^(k) = ∂f(x^(k))/∂x_i, i = 1, ..., n; stop if ||c^(k)|| < ε
Step 3: H(x^(k)) = [ ∂²f/∂x_i∂x_j ]
Step 4: d^(k) = −(H + λ_k I)⁻¹ c^(k)
Step 5: if f(x^(k) + d^(k)) < f(x^(k)), go to Step 6;
        otherwise let λ_k = 2λ_k and go to Step 4
Step 6: set λ_{k+1} = 0.5λ_k, k = k + 1 and go to Step 2
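A minimal sketch of the Marquardt loop follows (an illustration added here); the starting λ = 10000 and the doubling/halving of λ follow the slide, while the remaining details are arbitrary choices.

```python
import numpy as np

def marquardt(f, grad, hess, x0, eps=1e-6, lam=10000.0, max_iter=200):
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            break
        while True:
            d = np.linalg.solve(hess(x) + lam*np.eye(n), -c)   # Step 4
            if f(x + d) < f(x):          # Step 5: accept only if f decreases
                break
            lam *= 2.0                   # otherwise increase lambda and retry
        x = x + d
        lam *= 0.5                       # Step 6: reduce lambda after success
    return x

# Reusing the quadratic example from the Newton slides:
f = lambda x: 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7
g = lambda x: np.array([6*x[0] + 2*x[1], 2*x[0] + 4*x[1]])
H = lambda x: np.array([[6.0, 2.0], [2.0, 4.0]])
print(marquardt(f, g, H, [5.0, 10.0]))   # about (0, 0)
```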
Chen CL 80
Quasi-Newton Methods

Steepest Descent:
Uses only first-order information  ⇒  poor rate of convergence
Each iteration is started with new design variables without using
any information from previous iterations

Newton Method:
Uses second-order derivatives  ⇒  quadratic convergence rate
Requires calculation of n(n+1)/2 second-order derivatives!
Difficulties if the Hessian is singular
Not a learning process
Chen CL 81
Quasi-Newton Methods

Quasi-Newton Methods, Update Methods:
Use first-order derivatives to generate approximations for the Hessian
⇒ combine desirable features of both the steepest descent and Newton methods
Use information from previous iterations to speed up convergence
(learning processes)
Several ways to approximate the (updated) Hessian or its inverse
Preserve the properties of symmetry and positive definiteness
Chen CL 82
Davidon-Fletcher-Powell (DFP) Method
Davidon (1959), Fletcher and Powell (1963)

Approximate the Hessian inverse using only first
derivatives:
Δx = −αH⁻¹c ≈ −αAc
A : find A by using only 1st-order information
Chen CL 83
DFP Procedures: A → H⁻¹

Step 1: k = 0; x^(0), ε; A^(0) (= I, an estimate of H⁻¹)
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: d^(k) = −A^(k) c^(k)
Step 4: compute α_k = α to minimize f(x^(k) + αd^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k)
Step 6: update A^(k):
        A^(k+1) = A^(k) + B^(k) + C^(k)
        B^(k) = s^(k) s^(k)T / (s^(k) · y^(k)),     C^(k) = −z^(k) z^(k)T / (y^(k) · z^(k))
        s^(k) = α_k d^(k)   (change in design)
        y^(k) = c^(k+1) − c^(k)   (change in gradient)
        c^(k+1) = ∇f(x^(k+1)),    z^(k) = A^(k) y^(k)
Step 7: set k = k + 1 and go to Step 2
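The rank-two update of Step 6 is compact enough to sketch directly (an illustration added here); the numbers in the usage example are taken from the DFP example on the following slides.

```python
import numpy as np

def dfp_update(A, s, y):
    """Return A + s s^T/(s.y) - (A y)(A y)^T/(y . A y)  (Step 6 above)."""
    z = A @ y
    B = np.outer(s, s) / (s @ y)
    C = -np.outer(z, z) / (y @ z)
    return A + B + C

# First iteration of the example (f = 5x1^2 + 2x1x2 + x2^2 + 7):
A0 = np.eye(2)
s0 = np.array([-1.386, -0.593])          # alpha_0 * d^(0)
y0 = np.array([-15.046, -3.958])         # c^(1) - c^(0)
print(dfp_update(A0, s0, y0))            # close to [[0.148, -0.211], [-0.211, 0.950]]
```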
Chen CL 84
DFP Properties:

The matrix A^(k) is always positive definite
⇒ the method always converges to a local minimum if α > 0, since
d/dα f(x^(k) + αd^(k)) at α = 0  =  −c^(k)T A^(k) c^(k) < 0

When applied to a positive definite quadratic form,
A^(k) converges to the inverse of the Hessian of the quadratic form
Chen CL 85
DFP Example:
f(x) = 5x₁² + 2x₁x₂ + x₂² + 7,    x^(0) = (1, 2)

1-1. x^(0) = (1, 2); A^(0) = I; k = 0, ε = 0.001
     c^(0) = (10x₁ + 2x₂, 2x₁ + 2x₂) = (14, 6)
1-2. ||c^(0)|| = √(14² + 6²) = 15.232 > ε
1-3. d^(0) = −c^(0) = (−14, −6)
1-4. x^(1) = x^(0) + αd^(0) = (1 − 14α, 2 − 6α)
     f(x^(1)) = f(α) = 5(1 − 14α)² + 2(1 − 14α)(2 − 6α) + (2 − 6α)² + 7
     df/dα = 5(2)(−14)(1 − 14α) + 2(−14)(2 − 6α) + 2(−6)(1 − 14α) + 2(−6)(2 − 6α) = 0
     ⇒ α₀ = 0.0988,   d²f/dα² = 2348 > 0
1-5. x^(1) = x^(0) + α₀d^(0) = (1 − 14α₀, 2 − 6α₀) = (−0.386, 1.407)
Chen CL 86
1-6. s^(0) = α₀d^(0) = (−1.386, −0.593),    c^(1) = (−1.046, 2.042)
     y^(0) = c^(1) − c^(0) = (−15.046, −3.958),    z^(0) = A^(0)y^(0) = y^(0)
     s^(0) · y^(0) = 23.20,    y^(0) · z^(0) = 242.05

     s^(0)s^(0)T = [ 1.921  0.822; 0.822  0.352 ]
     z^(0)z^(0)T = [ 226.40  59.55; 59.55  15.67 ]

     B^(0) = [ 0.0828  0.0354; 0.0354  0.0152 ]
     C^(0) = [ −0.935  −0.246; −0.246  −0.065 ]

     A^(1) = A^(0) + B^(0) + C^(0) = [ 0.148  −0.211; −0.211  0.950 ]
Chen CL 87
2-2. ||c^(1)|| = √(1.046² + 2.042²) = 2.29 > ε
2-3. d^(1) = −A^(1)c^(1) = (0.586, −1.719)
2-4. x^(2) = x^(1) + αd^(1);   α₁ = 0.776  (minimizes f(x^(1) + αd^(1)))
2-5. x^(2) = x^(1) + α₁d^(1) = (−0.386, 1.407) + (0.455, −1.334) = (0.069, 0.073)
Chen CL 88
2-6. s^(1) = α₁d^(1) = (0.455, −1.334),    c^(2) = (0.836, 0.284)
     y^(1) = c^(2) − c^(1) = (1.882, −1.758)
     z^(1) = A^(1)y^(1) = (0.649, −2.067)
     s^(1) · y^(1) = 3.201,    y^(1) · z^(1) = 4.855

     s^(1)s^(1)T = [ 0.207  −0.607; −0.607  1.780 ]
     z^(1)z^(1)T = [ 0.421  −1.341; −1.341  4.272 ]

     B^(1) = [ 0.0647  −0.19; −0.19  0.556 ]
     C^(1) = [ −0.0867  0.276; 0.276  −0.880 ]

     A^(2) = A^(1) + B^(1) + C^(1) = [ 0.126  −0.125; −0.125  0.626 ]
Chen CL 89
Broyden-Fletcher-Goldfarb-Shanno (BFGS)
Method

Directly update the Hessian using only first derivatives:
Δx = −H⁻¹c   ⇒   HΔx = −c   ⇒   AΔx ≈ −c
A : find A by using only 1st-order information
Chen CL 90
BFGS Procedures:

Step 1: k = 0; x^(0), ε; H^(0) (= I, an estimate of H)
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: solve H^(k)d^(k) = −c^(k) to obtain d^(k)
Step 4: compute α_k = α to minimize f(x^(k) + αd^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k)
Step 6: update H^(k):
        H^(k+1) = H^(k) + D^(k) + E^(k)
        D^(k) = y^(k) y^(k)T / (y^(k) · s^(k)),     E^(k) = c^(k) c^(k)T / (c^(k) · d^(k))
        s^(k) = α_k d^(k)   (change in design)
        y^(k) = c^(k+1) − c^(k)   (change in gradient)
        c^(k+1) = ∇f(x^(k+1))
Step 7: set k = k + 1 and go to Step 2
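The Step 6 update can be sketched the same way (an illustration added here); the printed matrix is what the update formula gives for the first iteration of the example on the following slides.

```python
import numpy as np

def bfgs_update(H, s, y, c, d):
    """H + y y^T/(y.s) + c c^T/(c.d); since c.d < 0, the second term is subtracted."""
    D = np.outer(y, y) / (y @ s)
    E = np.outer(c, c) / (c @ d)
    return H + D + E

# First iteration of the example (f = 5x1^2 + 2x1x2 + x2^2 + 7):
H0 = np.eye(2)
s0 = np.array([-1.386, -0.593])           # alpha_0 * d^(0)
y0 = np.array([-15.046, -3.958])          # c^(1) - c^(0)
c0 = np.array([14.0, 6.0])                # c^(0)
d0 = -c0                                  # d^(0)
print(bfgs_update(H0, s0, y0, c0, d0))
# -> [[9.915, 2.205], [2.205, 1.520]] from the update formula
```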
Chen CL 91
BFGS Example:
f(x) = 5x₁² + 2x₁x₂ + x₂² + 7,    x^(0) = (1, 2)

1-1. x^(0) = (1, 2); H^(0) = I; k = 0, ε = 0.001
     c^(0) = (10x₁ + 2x₂, 2x₁ + 2x₂) = (14, 6)
1-2. ||c^(0)|| = √(14² + 6²) = 15.232 > ε
1-3. d^(0) = −c^(0) = (−14, −6)
1-4. x^(1) = x^(0) + αd^(0) = (1 − 14α, 2 − 6α)
     f(x^(1)) = f(α) = 5(1 − 14α)² + 2(1 − 14α)(2 − 6α) + (2 − 6α)² + 7
     df/dα = 5(2)(−14)(1 − 14α) + 2(−14)(2 − 6α) + 2(−6)(1 − 14α) + 2(−6)(2 − 6α) = 0
     ⇒ α₀ = 0.0988,   d²f/dα² = 2348 > 0
1-5. x^(1) = x^(0) + α₀d^(0) = (1 − 14α₀, 2 − 6α₀) = (−0.386, 1.407)
Chen CL 92
1-6. s^(0) = α₀d^(0) = (−1.386, −0.593),    c^(1) = (−1.046, 2.042)
     y^(0) = c^(1) − c^(0) = (−15.046, −3.958)
     y^(0) · s^(0) = 23.20,    c^(0) · d^(0) = −232.0

     y^(0)y^(0)T = [ 226.40  59.55; 59.55  15.67 ]
     c^(0)c^(0)T = [ 196  84; 84  36 ]

     D^(0) = [ 9.760  2.567; 2.567  0.675 ]
     E^(0) = [ −0.845  −0.362; −0.362  −0.155 ]

     H^(1) = H^(0) + D^(0) + E^(0) = [ 9.915  2.205; 2.205  0.520 ]
Chen CL 93
2-2. ||c^(1)|| = √(1.046² + 2.042²) = 2.29 > ε
2-3. solve H^(1)d^(1) = −c^(1)  ⇒  d^(1) = (17.20, −76.77)
2-4. x^(2) = x^(1) + αd^(1);   α₁ = 0.018455  (minimizes f(x^(1) + αd^(1)))
2-5. x^(2) = x^(1) + α₁d^(1) = (−0.0686, −0.0098)
Chen CL 94
2-6. s^(1) = α₁d^(1) = (0.317, −1.417),    c^(2) = (−0.706, −0.157)
     y^(1) = c^(2) − c^(1) = (0.340, −2.199)
     y^(1) · s^(1) = 3.224,    c^(1) · d^(1) = −174.76

     y^(1)y^(1)T = [ 0.1156  −0.748; −0.748  4.836 ]
     c^(1)c^(1)T = [ 1.094  −2.136; −2.136  4.170 ]

     D^(1) = [ 0.036  −0.232; −0.232  1.500 ]
     E^(1) = [ −0.0063  0.0122; 0.0122  −0.0239 ]

     H^(2) = H^(1) + D^(1) + E^(1) = [ 9.945  1.985; 1.985  1.996 ]
