
Steepest Descent Method

Kefu Liu

Properties of Gradient Vector

The gradient vector of a scalar function $f(x_1, x_2, \ldots, x_n)$ is defined as a column vector

$$\nabla f = \left[ \frac{\partial f}{\partial x_1} \;\; \frac{\partial f}{\partial x_2} \;\; \cdots \;\; \frac{\partial f}{\partial x_n} \right]^T = \mathbf{c}$$

For example, for

$$f(x_1, x_2) = 25 x_1^2 + x_2^2$$

at the point $x_1^* = 0.6$, $x_2^* = 4$,

$$\mathbf{c} = \nabla f = \begin{bmatrix} 2(25) x_1^* \\ 2 x_2^* \end{bmatrix} = \begin{bmatrix} 2(25)(0.6) \\ 2(4) \end{bmatrix} = \begin{bmatrix} 30 \\ 8 \end{bmatrix}$$

The normalized gradient vector is

$$\hat{\mathbf{c}} = \frac{\mathbf{c}}{\sqrt{\mathbf{c}^T \mathbf{c}}}$$

For example, at the point $x_1^* = 0.6$, $x_2^* = 4$,

$$\hat{\mathbf{c}} = \frac{1}{\sqrt{30^2 + 8^2}} \begin{bmatrix} 30 \\ 8 \end{bmatrix} = \begin{bmatrix} 0.96625 \\ 0.2577 \end{bmatrix}$$
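These two computations can be checked numerically with a minimal Python sketch (the function and the point are the ones from the example above):

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x1, x2) = 25*x1**2 + x2**2."""
    return np.array([2 * 25 * x[0], 2 * x[1]])

x_star = np.array([0.6, 4.0])
c = grad_f(x_star)            # [30. 8.]
c_hat = c / np.sqrt(c @ c)    # [0.96625 0.2577], the normalized gradient
print(c, c_hat)
```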

Property 1. The gradient vector represents a direction of maximum rate of increase for
the function f (x) at x* . For example,

$$f(0.6, 4) = 25(0.6)^2 + 4^2 = 25$$

If we increase $\mathbf{x}$ in the direction $\hat{\mathbf{c}}$ by a step size of $\alpha = 0.5$,

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \alpha \hat{\mathbf{c}} = \begin{bmatrix} 0.6 \\ 4 \end{bmatrix} + 0.5 \begin{bmatrix} 0.96625 \\ 0.2577 \end{bmatrix} = \begin{bmatrix} 1.083125 \\ 4.12885 \end{bmatrix}$$

The function value becomes

$$f(\mathbf{x}^{(1)}) = 25(1.083125)^2 + (4.12885)^2 = 46.376$$

If we move in the direction $\mathbf{d} = [1 \;\; 0]^T$,

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \alpha \mathbf{d} = \begin{bmatrix} 0.6 \\ 4 \end{bmatrix} + 0.5 \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1.1 \\ 4 \end{bmatrix}$$

The function value becomes

$$f(\mathbf{x}^{(1)}) = 25(1.1)^2 + 4^2 = 46.25$$

If we move in the direction $\mathbf{d} = [0 \;\; 1]^T$,

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} + \alpha \mathbf{d} = \begin{bmatrix} 0.6 \\ 4 \end{bmatrix} + 0.5 \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.6 \\ 4.5 \end{bmatrix}$$

The function value becomes

$$f(\mathbf{x}^{(1)}) = 25(0.6)^2 + (4.5)^2 = 29.25$$

We can see that moving along the gradient direction results in the maximum increase in
the function.
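The comparison above can be reproduced in a few lines (a sketch; the step size and the three directions are those from the text):

```python
import numpy as np

f = lambda x: 25 * x[0]**2 + x[1]**2
x0 = np.array([0.6, 4.0])
alpha = 0.5

c_hat = np.array([0.96625, 0.2577])  # normalized gradient at x0
for d in (c_hat, np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(f(x0 + alpha * d))         # the gradient direction gives the largest increase
```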

Property 2. The gradient vector $\mathbf{c}$ of $f(x_1, x_2, \ldots, x_n)$ at the point $\mathbf{x}^*$ is orthogonal (normal) to the tangent plane of the surface $f(\mathbf{x}) = \text{constant}$. For example, for $f(x_1, x_2) = 25 x_1^2 + x_2^2 = 25$, the slope at $x_1^* = 0.6$, $x_2^* = 4$ can be found from

$$2(25) x_1 \, dx_1 + 2 x_2 \, dx_2 = 0$$

$$\text{slope} = \frac{dx_2}{dx_1} = -\frac{25 x_1}{x_2} = -\frac{25(0.6)}{4} = -3.75$$

The direction of the tangent line is given by

$$\mathbf{t} = \begin{bmatrix} 1 \\ -3.75 \end{bmatrix}$$

$\mathbf{c}$ and $\mathbf{t}$ are normal to each other, as

$$\mathbf{c}^T \mathbf{t} = \begin{bmatrix} 30 & 8 \end{bmatrix} \begin{bmatrix} 1 \\ -3.75 \end{bmatrix} = 30 - 8(3.75) = 0$$
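A quick numerical confirmation of the orthogonality (same numbers as above):

```python
import numpy as np

c = np.array([30.0, 8.0])    # gradient of f at (0.6, 4)
t = np.array([1.0, -3.75])   # tangent direction of the level curve f = 25
print(c @ t)                 # 0.0, i.e., c is normal to t
```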
Property 3. The maximum rate of change of $f(\mathbf{x})$ at any point $\mathbf{x}^*$ is the magnitude of the gradient vector, given by

$$\|\mathbf{c}\| = \sqrt{\mathbf{c}^T \mathbf{c}}$$

Steepest descent direction. Let $f(\mathbf{x})$ be a differentiable function with respect to $\mathbf{x}$. The direction of steepest descent for $f(\mathbf{x})$ at any point is

$$\mathbf{d} = -\mathbf{c} \quad \text{or} \quad \mathbf{d} = -\hat{\mathbf{c}}$$

Example. Use the steepest descent direction to search for the minimum of $f(x_1, x_2) = 25 x_1^2 + x_2^2$ starting at $\mathbf{x}^{(0)} = [1 \;\; 3]^T$ with a step size of $\alpha = 0.5$. The function value at the starting point is

$$f(\mathbf{x}^{(0)}) = 25(1)^2 + 3^2 = 34$$

An analytical solution reveals that the minimum point is at $\mathbf{x}^* = [0 \;\; 0]^T$ and $f(\mathbf{x}^*) = 0$. Let us start the process of iterations.

$$\mathbf{c}^{(0)} = \begin{bmatrix} 2(25)(1) \\ 2(3) \end{bmatrix} = \begin{bmatrix} 50 \\ 6 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(0)} = \frac{1}{\sqrt{50^2 + 6^2}} \begin{bmatrix} 50 \\ 6 \end{bmatrix} = \begin{bmatrix} 0.9929 \\ 0.1191 \end{bmatrix}$$

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - 0.5\, \hat{\mathbf{c}}^{(0)} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} - 0.5 \begin{bmatrix} 0.9929 \\ 0.1191 \end{bmatrix} = \begin{bmatrix} 0.5035 \\ 2.9404 \end{bmatrix}$$

$$f(\mathbf{x}^{(1)}) = 25(0.5035)^2 + (2.9404)^2 = 14.984$$

$$\mathbf{c}^{(1)} = \begin{bmatrix} 2(25)(0.5035) \\ 2(2.9404) \end{bmatrix} = \begin{bmatrix} 25.175 \\ 5.8808 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(1)} = \begin{bmatrix} 0.9738 \\ 0.2275 \end{bmatrix}$$

$$\mathbf{x}^{(2)} = \mathbf{x}^{(1)} - 0.5\, \hat{\mathbf{c}}^{(1)} = \begin{bmatrix} 0.5035 \\ 2.9404 \end{bmatrix} - 0.5 \begin{bmatrix} 0.9738 \\ 0.2275 \end{bmatrix} = \begin{bmatrix} 0.0166 \\ 2.8267 \end{bmatrix}$$

$$f(\mathbf{x}^{(2)}) = 25(0.0166)^2 + (2.8267)^2 = 7.997$$

$$\mathbf{c}^{(2)} = \begin{bmatrix} 0.83 \\ 5.6534 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(2)} = \begin{bmatrix} 0.1453 \\ 0.9894 \end{bmatrix}$$

$$\mathbf{x}^{(3)} = \mathbf{x}^{(2)} - 0.5\, \hat{\mathbf{c}}^{(2)} = \begin{bmatrix} 0.0166 \\ 2.8267 \end{bmatrix} - 0.5 \begin{bmatrix} 0.1453 \\ 0.9894 \end{bmatrix} = \begin{bmatrix} -0.0561 \\ 2.332 \end{bmatrix}$$

$$f(\mathbf{x}^{(3)}) = 25(-0.0561)^2 + (2.332)^2 = 5.5169$$

$$\mathbf{c}^{(3)} = \begin{bmatrix} -2.805 \\ 4.664 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(3)} = \begin{bmatrix} -0.5154 \\ 0.8570 \end{bmatrix}$$

$$\mathbf{x}^{(4)} = \mathbf{x}^{(3)} - 0.5\, \hat{\mathbf{c}}^{(3)} = \begin{bmatrix} -0.0561 \\ 2.332 \end{bmatrix} - 0.5 \begin{bmatrix} -0.5154 \\ 0.857 \end{bmatrix} = \begin{bmatrix} 0.2016 \\ 1.9035 \end{bmatrix}$$

$$f(\mathbf{x}^{(4)}) = 25(0.2016)^2 + (1.9035)^2 = 4.6394$$

$$\mathbf{c}^{(4)} = \begin{bmatrix} 10.08 \\ 3.807 \end{bmatrix}, \qquad \hat{\mathbf{c}}^{(4)} = \begin{bmatrix} 0.9355 \\ 0.3533 \end{bmatrix}$$

$$\mathbf{x}^{(5)} = \mathbf{x}^{(4)} - 0.5\, \hat{\mathbf{c}}^{(4)} = \begin{bmatrix} 0.2016 \\ 1.9035 \end{bmatrix} - 0.5 \begin{bmatrix} 0.9355 \\ 0.3533 \end{bmatrix} = \begin{bmatrix} -0.2662 \\ 1.7269 \end{bmatrix}$$

$$f(\mathbf{x}^{(5)}) = 25(-0.2662)^2 + (1.7269)^2 = 4.7537$$
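The five iterations above can be reproduced with a short loop (a Python sketch using the same fixed step size and starting point):

```python
import numpy as np

f = lambda x: 25 * x[0]**2 + x[1]**2
grad = lambda x: np.array([50 * x[0], 2 * x[1]])

x = np.array([1.0, 3.0])
alpha = 0.5                                # fixed step size
for k in range(5):
    c = grad(x)
    x = x - alpha * c / np.linalg.norm(c)  # step along the normalized steepest descent direction
    print(k + 1, x, f(x))                  # f(x) rises again at iteration 5
```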

Note that the function values start to oscillate, i.e., they no longer decrease monotonically. This is caused by the constant step size. When the search is near the minimum, a smaller step size should be used; otherwise, an overshoot will occur, meaning that we move along the steepest descent direction farther than needed. In fact, we should find the best step size at each iteration by conducting a one-dimensional optimization in the steepest descent direction. For example, the new point can be expressed as a function of the step size $\alpha$, i.e.,

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha\, \hat{\mathbf{c}}^{(0)} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} - \alpha \begin{bmatrix} 0.9929 \\ 0.1191 \end{bmatrix} = \begin{bmatrix} 1 - 0.9929\alpha \\ 3 - 0.1191\alpha \end{bmatrix}$$

$$f(\mathbf{x}^{(1)}) = 25(1 - 0.9929\alpha)^2 + (3 - 0.1191\alpha)^2$$

$f(\mathbf{x}^{(1)})$ is a function of $\alpha$. Using the analytical approach, we get

$$\frac{df(\alpha)}{d\alpha} = 2(25)(1 - 0.9929\alpha^{(0)})(-0.9929) + 2(3 - 0.1191\alpha^{(0)})(-0.1191) = 0$$

$$\alpha^{(0)} = \frac{25(0.9929) + 3(0.1191)}{25(0.9929^2) + 0.1191^2} = 1.0211$$

$$\mathbf{x}^{(1)} = \mathbf{x}^{(0)} - \alpha^{(0)} \hat{\mathbf{c}}^{(0)} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} - 1.0211 \begin{bmatrix} 0.9929 \\ 0.1191 \end{bmatrix} = \begin{bmatrix} -0.0139 \\ 2.8784 \end{bmatrix}$$

$$f(\mathbf{x}^{(1)}) = 25(-0.0139)^2 + 2.8784^2 = 8.29$$

f = 8.29 is the minimum value we can find at this iteration.
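For this quadratic the optimal step size has the closed form derived above; a sketch that reproduces the numbers (the formula is specific to $f(x_1, x_2) = 25x_1^2 + x_2^2$):

```python
import numpy as np

f = lambda x: 25 * x[0]**2 + x[1]**2
x0 = np.array([1.0, 3.0])
c_hat = np.array([0.9929, 0.1191])  # normalized gradient at x0

# Setting df/d(alpha) = 0 for f(x0 - alpha * c_hat) gives:
alpha = (25 * x0[0] * c_hat[0] + x0[1] * c_hat[1]) / (25 * c_hat[0]**2 + c_hat[1]**2)
x1 = x0 - alpha * c_hat
print(alpha, x1, f(x1))             # ~1.0211, [-0.0139, 2.8784], ~8.29
```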

Steepest descent algorithm

Step 1. Estimate a starting design $\mathbf{x}^{(0)}$ and set the iteration counter $k = 0$. Select a convergence parameter $\varepsilon > 0$.
Step 2. Calculate the gradient of $f(\mathbf{x})$ at the point $\mathbf{x}^{(k)}$ as $\mathbf{c}^{(k)} = \nabla f(\mathbf{x}^{(k)})$. Calculate $\|\mathbf{c}\| = \sqrt{\mathbf{c}^T \mathbf{c}}$. If $\|\mathbf{c}\| < \varepsilon$, stop the iteration process, as $\mathbf{x}^* = \mathbf{x}^{(k)}$ is a minimum point. Otherwise, go to Step 3.
Step 3. Let the search direction at the current point $\mathbf{x}^{(k)}$ be $\mathbf{d}^{(k)} = -\mathbf{c}^{(k)}$.
Step 4. Calculate a step size $\alpha^{(k)}$ to minimize $f(\mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{d}^{(k)})$. A one-dimensional search is used to determine $\alpha^{(k)}$.
Step 5. Update the design as $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{d}^{(k)}$. Set $k = k + 1$ and go to Step 2.
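A compact implementation of these five steps (a Python sketch; golden-section search is one possible choice for the one-dimensional search in Step 4, and the step-size bracket [0, 10] is an assumption that happens to cover this example):

```python
import numpy as np

def golden_section(phi, lo=0.0, hi=10.0, tol=1e-6):
    """Minimize a one-variable function phi on [lo, hi] (Step 4)."""
    r = (np.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c1, c2 = b - r * (b - a), a + r * (b - a)
        if phi(c1) < phi(c2):
            b = c2
        else:
            a = c1
    return (a + b) / 2

def steepest_descent(f, grad, x0, eps=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)                     # Step 1
    for _ in range(max_iter):
        c = grad(x)                                     # Step 2
        if np.sqrt(c @ c) < eps:                        # converged: ||c|| < eps
            break
        d = -c                                          # Step 3
        alpha = golden_section(lambda a: f(x + a * d))  # Step 4
        x = x + alpha * d                               # Step 5
    return x

f = lambda x: 25 * x[0]**2 + x[1]**2
grad = lambda x: np.array([50 * x[0], 2 * x[1]])
print(steepest_descent(f, grad, [1.0, 3.0]))            # approaches [0, 0]
```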
