Chapter 2
2.1 Motivation
Most practical optimization problems involve many variables, so the study of single variable mini-
mization may seem academic. However, the optimization of multivariable functions can be broken
into two parts: 1) finding a suitable search direction and 2) minimizing along that direction. The
second part of this strategy, the line search, is our motivation for studying single variable mini-
mization.
Consider a scalar function $f : \mathbb{R} \rightarrow \mathbb{R}$ that depends on a single independent variable, $x \in \mathbb{R}$. Suppose we want to find the value of $x$ where $f(x)$ is a minimum. Furthermore, we want to do this with low computational cost (few iterations and low cost per iteration), among other requirements.
Often the computational effort is dominated by the computation of $f$ and its derivatives, so some of these requirements can be translated into: evaluate $f(x)$ and $\mathrm{d}f/\mathrm{d}x$ as few times as possible.
3. Global minimum
As we will see, Taylor's theorem is useful for identifying local minima. Taylor's theorem states that if $f(x)$ is $n$ times differentiable, then there exists $\theta \in (0, 1)$ such that
$$
f(x + h) = f(x) + h f'(x) + \frac{1}{2} h^2 f''(x) + \cdots + \frac{1}{(n-1)!} h^{n-1} f^{(n-1)}(x) + \underbrace{\frac{1}{n!} h^n f^{(n)}(x + \theta h)}_{\mathcal{O}(h^n)} \qquad (2.1)
$$
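As a quick numerical illustration (not part of the original notes), the following Python sketch checks the $\mathcal{O}(h^n)$ behavior of the truncated Taylor series for $f(x) = \sin x$; the function name `taylor_sin` and the chosen values of $x$, $h$, and $n$ are arbitrary.

```python
import math

def taylor_sin(x, h, n):
    """Sum the first n terms of the Taylor series of sin about x, evaluated at x + h."""
    derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]  # derivatives of sin repeat with period 4
    return sum(derivs[k % 4] * h**k / math.factorial(k) for k in range(n))

x, n = 0.5, 4
for h in (1e-1, 1e-2, 1e-3):
    err = abs(math.sin(x + h) - taylor_sin(x, h, n))
    # The ratio err / h**n stays roughly constant, confirming the O(h^n) remainder.
    print(f"h = {h:.0e}  error = {err:.3e}  error / h^n = {err / h**n:.3f}")
```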
Does it converge?
The rate of convergence is a measure of how fast an iterative method converges to the numerical
solution. An iterative method is said to converge with order r when r is the largest positive number
such that
$$
0 \le \lim_{k \to \infty} \frac{\| x_{k+1} - x^* \|}{\| x_k - x^* \|^r} < \infty. \qquad (2.8)
$$
For a sequence with convergence rate $r$, the asymptotic error constant $\gamma$ is the limit above, i.e.,
$$
\gamma = \lim_{k \to \infty} \frac{\| x_{k+1} - x^* \|}{\| x_k - x^* \|^r}. \qquad (2.9)
$$
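In practice, the order $r$ can be estimated from a sequence of iterates by taking ratios of logarithms of successive errors. The sketch below (a hypothetical helper, not from the notes) does this for Newton's method applied to $f(x) = x^2 - 2$, whose quadratic convergence should give estimates approaching $r = 2$.

```python
import math

def estimate_order(errors):
    """Estimate the convergence order r from errors e_k = ||x_k - x*||,
    using r ~ log(e_{k+1} / e_k) / log(e_k / e_{k-1})."""
    return [math.log(e2 / e1) / math.log(e1 / e0)
            for e0, e1, e2 in zip(errors, errors[1:], errors[2:])]

# Newton's method for the root of f(x) = x**2 - 2 (i.e., computing sqrt(2)).
x, x_star = 3.0, math.sqrt(2.0)
errors = []
for _ in range(6):
    errors.append(abs(x - x_star))
    x = 0.5 * (x + 2.0 / x)          # Newton update x - f(x)/f'(x)
print(estimate_order(errors))        # the estimates should approach r = 2
```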
To understand this better, let's assume that we have ideal convergence behavior, so that condition (2.9) holds at every iteration $k$ and we do not have to take the limit. Then we can write
$$
\| x_{k+1} - x^* \| = \gamma \| x_k - x^* \|^r. \qquad (2.10)
$$
To monitor the progress of an iterative method toward the solution $x^*$, we can use the relative error
$$
\frac{\| x_k - x^* \|}{\| x_k \|}. \qquad (2.11)
$$
One potential problem with this definition is that $x_k$ might be zero. To fix this, we write
$$
\frac{\| x_k - x^* \|}{1 + \| x_k \|}. \qquad (2.12)
$$
Another potential problem arises from situations where the gradients are large, i.e., even if $\| x_k - x^* \|$ is small, $| f(x_k) - f(x^*) |$ is large. Thus, we should use a combined quantity, such as
$$
\frac{\| x_k - x^* \|}{1 + \| x_k \|} + \frac{| f(x_k) - f(x^*) |}{1 + | f(x_k) |}. \qquad (2.13)
$$
A final issue is that when we perform numerical optimization, $x^*$ is usually not known! However, you can monitor the progress of your algorithm using the difference between successive iterates, e.g.,
$$
\frac{\| x_{k+1} - x_k \|}{1 + \| x_k \|} + \frac{| f(x_{k+1}) - f(x_k) |}{1 + | f(x_k) |}.
$$
Sometimes, you might just use the second fraction in the above expression, or the norm of the gradient. You should plot these quantities on a logarithmic axis versus $k$.
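A minimal sketch of how these convergence measures might be computed, assuming NumPy arrays for the iterates; the function names `combined_error` and `successive_error` are placeholders, not part of the notes.

```python
import numpy as np

def combined_error(x_k, f_k, x_star, f_star):
    """Combined convergence measure of Eq. (2.13), usable when x* is known."""
    return (np.linalg.norm(x_k - x_star) / (1.0 + np.linalg.norm(x_k))
            + abs(f_k - f_star) / (1.0 + abs(f_k)))

def successive_error(x_k, f_k, x_next, f_next):
    """The same measure built from successive iterates, for when x* is unknown."""
    return (np.linalg.norm(x_next - x_k) / (1.0 + np.linalg.norm(x_k))
            + abs(f_next - f_k) / (1.0 + abs(f_k)))

# Typical usage: store the measure at every iteration and plot it on a log axis, e.g.
#   import matplotlib.pyplot as plt
#   plt.semilogy(history); plt.xlabel("k"); plt.ylabel("convergence measure"); plt.show()
```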
Ignoring the terms of second order and higher, and assuming the next iterate to be the root (i.e., $f(x_{k+1}) = 0$), we obtain
$$
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \qquad (2.18)
$$
Figure 2.2: Cases where Newton's method fails (figures reproduced from Numerical Recipes, Section 9.4, "Newton-Raphson Method Using Derivative"). In the first case, Newton's method extrapolates the local derivative to find the next estimate of the root; it works well and converges quadratically. In the second case, the method encounters a local extremum and shoots off to outer space; here bracketing bounds would save the day. In the third case, the method enters a nonconvergent cycle, a behavior often encountered when the function $f$ is obtained, in whole or in part, by table interpolation; with a better initial guess, the method would have succeeded.
Figure 2.3: Newton's method with several different initial guesses. The $x_k$ are Newton iterates, $x_N$ is the converged solution, and $f^{(1)}(x)$ denotes the first derivative of $f(x)$.
Approximating the derivative in Newton's method by a finite difference based on the two most recent iterates gives
$$
x_{k+1} = x_k - f(x_k) \, \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})},
$$
which is the secant method (also known as the poor man's Newton method). Under favorable conditions, this method has superlinear convergence ($1 < r < 2$), with $r \approx 1.6180$.
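A corresponding sketch of the secant iteration (again with arbitrary tolerances and test function); note that it requires only function values, not derivatives.

```python
def secant_root(f, x0, x1, tol=1e-10, max_iter=100):
    """Find a root of f with the secant method; no derivative is needed."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            return x1
        denom = f1 - f0
        if denom == 0.0:
            break  # stagnation: both iterates give the same function value
        x2 = x1 - f1 * (x1 - x0) / denom
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    return x1

# Same example as for Newton's method; convergence is superlinear (r ~ 1.618).
print(secant_root(lambda x: x**3 - 2*x - 5, 2.0, 3.0))
```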
Solving this example using the golden section method gives the results shown in Figure 2.4. Note the convergence to different optima, depending on the starting interval, and the fact that the method might not converge to the best optimum within the starting interval.
Figure 2.4: The golden section method with initial intervals [0, 3], [4, 7], and [1, 7]. The horizontal
lines represent the sequences of the intervals of uncertainty.
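A minimal sketch of the golden section search, assuming $f$ is unimodal on the given interval; the constant `invphi` is $1/\varphi \approx 0.618$, and the test function is illustrative, not the one used to produce Figure 2.4.

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize f on [a, b] by golden-section search (one new evaluation per iteration)."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0     # 1/phi ~ 0.6180
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            b, d, fd = d, c, fc               # minimum lies in [a, d]; reuse c as the new d
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd               # minimum lies in [c, b]; reuse d as the new c
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Example (not the function from the notes): a simple quadratic with minimum at x = 2.
print(golden_section(lambda x: (x - 2.0) ** 2, 0.0, 3.0))
```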
Given the function value and its first two derivatives at the current iterate $x_k$, we can write a quadratic approximation of the function value at $x$ as the first three terms of a Taylor series,
$$
f(x) \approx f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2} f''(x_k)(x - x_k)^2. \qquad (2.25)
$$
If $f''(x_k)$ is not zero, minimizing this quadratic by setting its derivative with respect to $x$ to zero, and calling the resulting point $x_{k+1}$, yields
$$
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}. \qquad (2.26)
$$
This is Newton's method used to find a zero of the first derivative.
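A minimal sketch of this minimization version of Newton's method, assuming the first and second derivatives are available as callables; note that, as written, it only finds a stationary point of $f$, and the sign of $f''$ must be checked to confirm a minimum.

```python
def newton_minimize(df, d2f, x0, tol=1e-10, max_iter=50):
    """Newton's method for minimization, Eq. (2.26): find a zero of f'."""
    x = x0
    for _ in range(max_iter):
        g, h = df(x), d2f(x)
        if abs(g) < tol:
            return x
        if h == 0.0:
            raise ZeroDivisionError("Zero second derivative; Newton step undefined.")
        x = x - g / h
    return x

# Example: minimize f(x) = x**4 - 3*x, so f'(x) = 4x**3 - 3 and f''(x) = 12x**2.
print(newton_minimize(lambda x: 4*x**3 - 3, lambda x: 12*x**2, x0=1.0))  # ~ (3/4)**(1/3)
```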
$$
x_{k+1} = x_k + \alpha_k p_k \qquad (2.27)
$$
where $0 < c_1 < c_2 < 1$. The only difference when comparing with the plain Wolfe conditions is that we do not allow points where the derivative has a positive value that is too large, and we therefore exclude points that are far from the stationary points.
$$
\phi(\alpha) = f(x_k + \alpha p_k). \qquad (2.32)
$$
Then, at the starting point of the line search, $\phi(0) = f(x_k)$. The slope of the univariate function, $\phi'(\alpha)$, is the projection of the $n$-dimensional gradient onto the search direction $p_k$, i.e.,
$$
\phi'(\alpha) = \nabla f(x_k + \alpha p_k)^T p_k. \qquad (2.33)
$$
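For reference, here is a small sketch of how the standard sufficient decrease and strong curvature tests can be checked for a candidate step length; the constants $c_1$ and $c_2$ and the function name are illustrative choices, not taken from the notes.

```python
def satisfies_strong_wolfe(phi, dphi, alpha, c1=1e-4, c2=0.9):
    """Check the standard strong Wolfe conditions for a candidate step length alpha.

    phi(a) = f(x_k + a * p_k) as in Eq. (2.32), and dphi(a) is its slope along p_k.
    Assumes p_k is a descent direction, so dphi(0) < 0."""
    sufficient_decrease = phi(alpha) <= phi(0.0) + c1 * alpha * dphi(0.0)
    strong_curvature = abs(dphi(alpha)) <= c2 * abs(dphi(0.0))
    return sufficient_decrease and strong_curvature
```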
The line search algorithm proceeds in two stages:
1. It begins with a trial step length $\alpha_1$ and keeps increasing it until it finds either an acceptable step length or an interval that brackets the desired step lengths.
2. In the latter case, a second stage (the zoom algorithm below) is performed that decreases the size of the interval until an acceptable step length is found.
Figure 2.6: Line search algorithm satisfying the strong Wolfe conditions.
Figure 2.7: Zoom function in the line search algorithm satisfying the strong Wolfe conditions.
1. the interval between $\alpha_{\text{low}}$ and $\alpha_{\text{high}}$ contains step lengths that satisfy the strong Wolfe conditions.
The interpolation that determines the trial step length $\alpha_j$ should be safeguarded, to ensure that it is not too close to the endpoints of the interval.
Using an algorithm based on the strong Wolfe conditions (as opposed to the plain Wolfe conditions) has the advantage that, by decreasing $c_2$, we can force the accepted step length $\alpha$ to be arbitrarily close to a local minimizer of $\phi$.
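To make the two-stage procedure concrete, here is a simplified sketch of a line search that enforces the strong Wolfe conditions. It is not the exact algorithm from the notes: the zoom stage uses plain bisection instead of safeguarded interpolation, the step-length expansion factor and the constants $c_1$ and $c_2$ are arbitrary choices, and function evaluations are not cached.

```python
import numpy as np

def line_search_strong_wolfe(f, grad, x, p, c1=1e-4, c2=0.9,
                             alpha_init=1.0, alpha_max=10.0, max_iter=20):
    """Simplified two-stage line search for the strong Wolfe conditions.

    Stage 1 increases the trial step until it finds an acceptable step or a
    bracketing interval; stage 2 ('zoom') shrinks the bracket by bisection."""
    phi = lambda a: f(x + a * p)                 # univariate function, Eq. (2.32)
    dphi = lambda a: np.dot(grad(x + a * p), p)  # its slope along p, Eq. (2.33)
    phi0, dphi0 = phi(0.0), dphi(0.0)            # assumes dphi0 < 0 (descent direction)

    def zoom(lo, hi):
        for _ in range(max_iter):
            a = 0.5 * (lo + hi)                  # bisection in place of safeguarded interpolation
            if phi(a) > phi0 + c1 * a * dphi0 or phi(a) >= phi(lo):
                hi = a                           # sufficient decrease fails: shrink toward lo
            else:
                da = dphi(a)
                if abs(da) <= -c2 * dphi0:
                    return a                     # strong Wolfe conditions satisfied
                if da * (hi - lo) >= 0.0:
                    hi = lo                      # derivative sign disagrees with interval trend
                lo = a
        return a

    a_prev, a = 0.0, alpha_init
    for i in range(max_iter):
        if phi(a) > phi0 + c1 * a * dphi0 or (i > 0 and phi(a) >= phi(a_prev)):
            return zoom(a_prev, a)               # bracket between previous and current point
        da = dphi(a)
        if abs(da) <= -c2 * dphi0:
            return a                             # current point is acceptable
        if da >= 0.0:
            return zoom(a, a_prev)               # bracket found, with the roles reversed
        a_prev, a = a, min(2.0 * a, alpha_max)   # keep increasing the trial step
    return a

# Example: one line-search step on the Rosenbrock function along the steepest-descent direction.
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
x0 = np.array([-1.2, 1.0])
p = -grad(x0)
alpha = line_search_strong_wolfe(f, grad, x0, p)
print(alpha, f(x0 + alpha * p) < f(x0))
```

In this sketch, decreasing $c_2$ tightens the curvature test and drives the accepted $\alpha$ toward a local minimizer of $\phi$, which is the behavior described above.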
Figure 2.8: The line search algorithm iterations. The first stage is marked with square labels and
the zoom stage is marked with circles.