
Steepest Descent Method

Kefu Liu

Properties of Gradient Vector

The gradient vector of a scalar function $f(x_1, x_2, \ldots, x_n)$ is defined as a column vector

$$\nabla f = \left[ \frac{\partial f}{\partial x_1} \;\; \frac{\partial f}{\partial x_2} \;\; \cdots \;\; \frac{\partial f}{\partial x_n} \right]^T = \mathbf{c}$$

For example, for

$$f(x_1, x_2) = 25 x_1^2 + x_2^2$$

at the point $x_1^* = 0.6$, $x_2^* = 4$,

$$\mathbf{c} = \nabla f = \begin{bmatrix} 2(25) x_1^* \\ 2 x_2^* \end{bmatrix} = \begin{bmatrix} 2(25)(0.6) \\ 2(4) \end{bmatrix} = \begin{bmatrix} 30 \\ 8 \end{bmatrix}$$

The normalized gradient vector is

$$\hat{\mathbf{c}} = \frac{\mathbf{c}}{\sqrt{\mathbf{c}^T \mathbf{c}}}$$

For example, at the point $x_1^* = 0.6$, $x_2^* = 4$,

$$\hat{\mathbf{c}} = \frac{1}{\sqrt{30^2 + 8^2}} \begin{bmatrix} 30 \\ 8 \end{bmatrix} = \begin{bmatrix} 0.96625 \\ 0.2577 \end{bmatrix}$$
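These two computations can be checked numerically with a minimal Python sketch (the function and the point are the ones from the example above):

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x1, x2) = 25*x1**2 + x2**2."""
    return np.array([2 * 25 * x[0], 2 * x[1]])

x_star = np.array([0.6, 4.0])
c = grad_f(x_star)            # [30. 8.]
c_hat = c / np.sqrt(c @ c)    # [0.96625 0.2577], the normalized gradient
print(c, c_hat)
```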

Property 1. The gradient vector represents a direction of maximum rate of increase for
the function f (x) at x* . For example,

$$f(0.6, 4) = 25(0.6)^2 + 4^2 = 25$$

If we increase $\mathbf{x}$ in the direction $\hat{\mathbf{c}}$ by a step size of $\alpha = 0.5$,

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \alpha \hat{\mathbf{c}} = \begin{bmatrix} 0.6 \\ 4 \end{bmatrix} + 0.5 \begin{bmatrix} 0.96625 \\ 0.2577 \end{bmatrix} = \begin{bmatrix} 1.083125 \\ 4.12885 \end{bmatrix}$$

The function value becomes

$$f(\mathbf{x}^{(1)}) = 25(1.083125)^2 + (4.12885)^2 = 46.376$$

If we move in the direction $\mathbf{d} = [1 \;\; 0]^T$,

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \alpha \mathbf{d} = \begin{bmatrix} 0.6 \\ 4 \end{bmatrix} + 0.5 \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1.1 \\ 4 \end{bmatrix}$$

The function value becomes

$$f(\mathbf{x}^{(1)}) = 25(1.1)^2 + 4^2 = 46.25$$

If we move in the direction $\mathbf{d} = [0 \;\; 1]^T$,

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \alpha \mathbf{d} = \begin{bmatrix} 0.6 \\ 4 \end{bmatrix} + 0.5 \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.6 \\ 4.5 \end{bmatrix}$$

The function value becomes

$$f(\mathbf{x}^{(1)}) = 25(0.6)^2 + (4.5)^2 = 29.25$$

We can see that moving along the gradient direction results in the maximum increase in
the function.
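The comparison above can be reproduced in a few lines (a sketch; the step size and the three directions are those from the text):

```python
import numpy as np

f = lambda x: 25 * x[0]**2 + x[1]**2
x0 = np.array([0.6, 4.0])
alpha = 0.5

c_hat = np.array([0.96625, 0.2577])  # normalized gradient at x0
for d in (c_hat, np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(f(x0 + alpha * d))         # the gradient direction gives the largest increase
```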

Property 2. The gradient vector $\mathbf{c}$ of $f(x_1, x_2, \ldots, x_n)$ at the point $\mathbf{x}^*$ is orthogonal (normal) to the tangent plane of the surface $f(\mathbf{x}) = \text{constant}$. For example, for $f(x_1, x_2) = 25 x_1^2 + x_2^2 = 25$, the slope at $x_1^* = 0.6$, $x_2^* = 4$ can be found from

$$2(25) x_1 \, dx_1 + 2 x_2 \, dx_2 = 0$$

$$\text{slope} = \frac{dx_2}{dx_1} = -\frac{25 x_1}{x_2} = -\frac{25(0.6)}{4} = -3.75$$

The direction of the tangent line is given by

$$\mathbf{t} = \begin{bmatrix} 1 \\ -3.75 \end{bmatrix}$$

$\mathbf{c}$ and $\mathbf{t}$ are normal to each other, as

$$\mathbf{c}^T \mathbf{t} = \begin{bmatrix} 30 & 8 \end{bmatrix} \begin{bmatrix} 1 \\ -3.75 \end{bmatrix} = 30 - 8(3.75) = 0$$
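A quick numerical confirmation of the orthogonality (same numbers as above):

```python
import numpy as np

c = np.array([30.0, 8.0])    # gradient of f at (0.6, 4)
t = np.array([1.0, -3.75])   # tangent direction of the level curve f = 25
print(c @ t)                 # 0.0, i.e., c is normal to t
```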
Property 3. The maximum rate of change of $f(\mathbf{x})$ at any point $\mathbf{x}^*$ is the magnitude of the gradient vector, given by

$$\|\mathbf{c}\| = \sqrt{\mathbf{c}^T \mathbf{c}}$$

Steepest descent direction. Let $f(\mathbf{x})$ be a differentiable function with respect to $\mathbf{x}$. The direction of steepest descent for $f(\mathbf{x})$ at any point is

$$\mathbf{d} = -\mathbf{c} \quad \text{or} \quad \mathbf{d} = -\hat{\mathbf{c}}$$

Example. Use the steepest descent direction to search for the minimum of $f(x_1, x_2) = 25 x_1^2 + x_2^2$ starting at $\mathbf{x}^{(0)} = [1 \;\; 3]^T$ with a step size of $\alpha = 0.5$. The function value at the starting point is

$$f(\mathbf{x}^{(0)}) = 25(1)^2 + 3^2 = 34$$

An analytical solution reveals that the minimum point is at $\mathbf{x}^* = [0 \;\; 0]^T$ and $f(\mathbf{x}^*) = 0$. Let us start the process of iterations.

$$\mathbf{c}^{(0)} = \begin{bmatrix} 2(25)(1) \\ 2(3) \end{bmatrix} = \begin{bmatrix} 50 \\ 6 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(0)} = \frac{1}{\sqrt{50^2 + 6^2}} \begin{bmatrix} 50 \\ 6 \end{bmatrix} = \begin{bmatrix} 0.9929 \\ 0.1191 \end{bmatrix}$$

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - 0.5\, \hat{\mathbf{c}}^{(0)} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} - 0.5 \begin{bmatrix} 0.9929 \\ 0.1191 \end{bmatrix} = \begin{bmatrix} 0.5035 \\ 2.9404 \end{bmatrix}$$

$$f(\mathbf{x}^{(1)}) = 25(0.5035)^2 + (2.9404)^2 = 14.984$$

$$\mathbf{c}^{(1)} = \begin{bmatrix} 2(25)(0.5035) \\ 2(2.9404) \end{bmatrix} = \begin{bmatrix} 25.175 \\ 5.8808 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(1)} = \begin{bmatrix} 0.9738 \\ 0.2275 \end{bmatrix}$$

$$\mathbf{x}^{(2)} = \mathbf{x}^{(1)} - 0.5\, \hat{\mathbf{c}}^{(1)} = \begin{bmatrix} 0.5035 \\ 2.9404 \end{bmatrix} - 0.5 \begin{bmatrix} 0.9738 \\ 0.2275 \end{bmatrix} = \begin{bmatrix} 0.0166 \\ 2.8267 \end{bmatrix}$$

$$f(\mathbf{x}^{(2)}) = 25(0.0166)^2 + (2.8267)^2 = 7.997$$

$$\mathbf{c}^{(2)} = \begin{bmatrix} 0.83 \\ 5.6534 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(2)} = \begin{bmatrix} 0.1453 \\ 0.9894 \end{bmatrix}$$

$$\mathbf{x}^{(3)} = \mathbf{x}^{(2)} - 0.5\, \hat{\mathbf{c}}^{(2)} = \begin{bmatrix} 0.0166 \\ 2.8267 \end{bmatrix} - 0.5 \begin{bmatrix} 0.1453 \\ 0.9894 \end{bmatrix} = \begin{bmatrix} -0.0561 \\ 2.332 \end{bmatrix}$$

$$f(\mathbf{x}^{(3)}) = 25(-0.0561)^2 + (2.332)^2 = 5.5169$$

$$\mathbf{c}^{(3)} = \begin{bmatrix} -2.805 \\ 4.664 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(3)} = \begin{bmatrix} -0.5154 \\ 0.8570 \end{bmatrix}$$

$$\mathbf{x}^{(4)} = \mathbf{x}^{(3)} - 0.5\, \hat{\mathbf{c}}^{(3)} = \begin{bmatrix} -0.0561 \\ 2.332 \end{bmatrix} - 0.5 \begin{bmatrix} -0.5154 \\ 0.857 \end{bmatrix} = \begin{bmatrix} 0.2016 \\ 1.9035 \end{bmatrix}$$

$$f(\mathbf{x}^{(4)}) = 25(0.2016)^2 + (1.9035)^2 = 4.6394$$

$$\mathbf{c}^{(4)} = \begin{bmatrix} 10.08 \\ 3.807 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(4)} = \begin{bmatrix} 0.9355 \\ 0.3533 \end{bmatrix}$$

$$\mathbf{x}^{(5)} = \mathbf{x}^{(4)} - 0.5\, \hat{\mathbf{c}}^{(4)} = \begin{bmatrix} 0.2016 \\ 1.9035 \end{bmatrix} - 0.5 \begin{bmatrix} 0.9355 \\ 0.3533 \end{bmatrix} = \begin{bmatrix} -0.2662 \\ 1.7269 \end{bmatrix}$$

$$f(\mathbf{x}^{(5)}) = 25(-0.2662)^2 + (1.7269)^2 = 4.7537$$
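The five iterations above can be reproduced with a short loop (a Python sketch using the same fixed step size and starting point):

```python
import numpy as np

f = lambda x: 25 * x[0]**2 + x[1]**2
grad = lambda x: np.array([50 * x[0], 2 * x[1]])

x = np.array([1.0, 3.0])
alpha = 0.5                                # fixed step size
for k in range(5):
    c = grad(x)
    x = x - alpha * c / np.linalg.norm(c)  # step along the normalized steepest descent direction
    print(k + 1, x, f(x))                  # f(x) rises again at iteration 5
```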

Note that the function values start to oscillate, i.e., they no longer decrease monotonically. This is caused by the constant step size. When the search is near the minimum, a smaller step size should be used; otherwise, an overshoot will occur, meaning that we move along the steepest descent direction farther than needed. In fact, we should find the best step size at each iteration by conducting a one-dimensional optimization in the steepest descent direction. For example, the new point can be expressed as a function of the step size $\alpha$, i.e.,

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha\, \hat{\mathbf{c}}^{(0)} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} - \alpha \begin{bmatrix} 0.9929 \\ 0.1191 \end{bmatrix} = \begin{bmatrix} 1 - 0.9929\alpha \\ 3 - 0.1191\alpha \end{bmatrix}$$

$$f(\mathbf{x}^{(1)}) = 25(1 - 0.9929\alpha)^2 + (3 - 0.1191\alpha)^2$$

$f(\mathbf{x}^{(1)})$ is a function of $\alpha$. Using the analytical approach, we get

$$\frac{df(\alpha)}{d\alpha} = 2(25)(1 - 0.9929\alpha^{(0)})(-0.9929) + 2(3 - 0.1191\alpha^{(0)})(-0.1191) = 0$$

$$\alpha^{(0)} = \frac{25(0.9929) + 3(0.1191)}{25(0.9929^2) + 0.1191^2} = 1.0211$$

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha^{(0)} \hat{\mathbf{c}}^{(0)} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} - 1.0211 \begin{bmatrix} 0.9929 \\ 0.1191 \end{bmatrix} = \begin{bmatrix} -0.0139 \\ 2.8784 \end{bmatrix}$$

$$f(\mathbf{x}^{(1)}) = 25(-0.0139)^2 + 2.8784^2 = 8.29$$

f = 8.29 is the minimum value we can find at this iteration.
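For this quadratic the optimal step size has the closed form derived above; a sketch that reproduces the numbers (the formula is specific to $f(x_1, x_2) = 25x_1^2 + x_2^2$):

```python
import numpy as np

f = lambda x: 25 * x[0]**2 + x[1]**2
x0 = np.array([1.0, 3.0])
c_hat = np.array([0.9929, 0.1191])  # normalized gradient at x0

# Setting df/d(alpha) = 0 for f(x0 - alpha * c_hat) gives:
alpha = (25 * x0[0] * c_hat[0] + x0[1] * c_hat[1]) / (25 * c_hat[0]**2 + c_hat[1]**2)
x1 = x0 - alpha * c_hat
print(alpha, x1, f(x1))             # ~1.0211, [-0.0139, 2.8784], ~8.29
```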

Steepest descent algorithm

Step 1. Estimate a starting design $\mathbf{x}^{(0)}$ and set the iteration counter $k = 0$. Select a convergence parameter $\varepsilon > 0$.
Step 2. Calculate the gradient of $f(\mathbf{x})$ at the point $\mathbf{x}^{(k)}$ as $\mathbf{c}^{(k)} = \nabla f(\mathbf{x}^{(k)})$. Calculate $\|\mathbf{c}\| = \sqrt{\mathbf{c}^T \mathbf{c}}$. If $\|\mathbf{c}\| < \varepsilon$, stop the iteration process, as $\mathbf{x}^* = \mathbf{x}^{(k)}$ is a minimum point. Otherwise, go to Step 3.
Step 3. Let the search direction at the current point $\mathbf{x}^{(k)}$ be $\mathbf{d}^{(k)} = -\mathbf{c}^{(k)}$.
Step 4. Calculate a step size $\alpha^{(k)}$ to minimize $f(\mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{d}^{(k)})$. A one-dimensional search is used to determine $\alpha^{(k)}$.
Step 5. Update the design as $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{d}^{(k)}$. Set $k = k + 1$ and go to Step 2.
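A compact implementation of these five steps (a Python sketch; golden-section search is one possible choice for the one-dimensional search in Step 4, and the step-size bracket [0, 10] is an assumption that happens to cover this example):

```python
import numpy as np

def golden_section(phi, lo=0.0, hi=10.0, tol=1e-6):
    """Minimize a one-variable function phi on [lo, hi] (Step 4)."""
    r = (np.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c1, c2 = b - r * (b - a), a + r * (b - a)
        if phi(c1) < phi(c2):
            b = c2
        else:
            a = c1
    return (a + b) / 2

def steepest_descent(f, grad, x0, eps=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)                     # Step 1
    for _ in range(max_iter):
        c = grad(x)                                     # Step 2
        if np.sqrt(c @ c) < eps:                        # converged: ||c|| < eps
            break
        d = -c                                          # Step 3
        alpha = golden_section(lambda a: f(x + a * d))  # Step 4
        x = x + alpha * d                               # Step 5
    return x

f = lambda x: 25 * x[0]**2 + x[1]**2
grad = lambda x: np.array([50 * x[0], 2 * x[1]])
print(steepest_descent(f, grad, [1.0, 3.0]))            # approaches [0, 0]
```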
