
Gradient Methods

May 2005
Preview
Background
Steepest Descent
Conjugate Gradient
Background
Motivation
The gradient notion
The Wolfe Theorems
Motivation
The min(max) problem:

  $\min_x f(x)$

But we learned in calculus how to solve that kind of question!
Motivation
Not exactly.
Functions: $f: \mathbb{R}^n \to \mathbb{R}$
High-order polynomials:

  $x - \tfrac{1}{6}x^3 + \tfrac{1}{120}x^5 - \tfrac{1}{5040}x^7$

What about functions that don't have an analytic presentation: a "black box".
Motivation: real-world problem
Connectivity shapes (Isenburg, Gumhold, Gotsman)

  $mesh = \{C = (V, E),\; geometry\}$

What do we get only from C, without the geometry?
Motivation: real-world problem
First we introduce error functionals and then try to minimize them:

  $E_s(x) = \sum_{(i,j) \in E} \left( \|x_i - x_j\| - 1 \right)^2, \qquad x \in \mathbb{R}^{3n}$

  $L(x_i) = x_i - \frac{1}{d_i} \sum_{(i,j) \in E} x_j$

  $E_r(x) = \sum_{i=1}^{n} \|L(x_i)\|^2$
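As a rough illustration (not code from the paper), these two energies could be evaluated as below, assuming the vertex positions are stored in an (n, 3) array `x` and the edges in an (m, 2) integer array `edges`; the function names are mine:

```python
import numpy as np

def spring_energy(x, edges):
    """E_s(x): sum over edges (i, j) of (||x_i - x_j|| - 1)^2."""
    diffs = x[edges[:, 0]] - x[edges[:, 1]]
    return np.sum((np.linalg.norm(diffs, axis=1) - 1.0) ** 2)

def roundness_energy(x, edges):
    """E_r(x): sum over vertices i of ||L(x_i)||^2,
    where L(x_i) = x_i - (1/d_i) * sum of the neighbours of vertex i."""
    n = x.shape[0]
    nbr_sum = np.zeros_like(x)
    deg = np.zeros(n)
    for i, j in edges:                        # undirected edges, listed once
        nbr_sum[i] += x[j]; nbr_sum[j] += x[i]
        deg[i] += 1;        deg[j] += 1
    L = x - nbr_sum / deg[:, None]            # assumes every vertex has degree >= 1
    return np.sum(L ** 2)
```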
Motivation: real-world problem
Then we minimize:

  $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \left[ (1 - \lambda)\, E_s(x) + \lambda\, E_r(x) \right]$

This is a high-dimensional non-linear problem.
The authors use the conjugate gradient method, which is maybe the most popular optimization technique, based on what we'll see here.
Motivation: real-world problem
Changing the parameter $\lambda$:

  $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \left[ (1 - \lambda)\, E_s(x) + \lambda\, E_r(x) \right]$
Motivation
General problem: find global min(max)
This lecture will concentrate on finding a local minimum.
Background
Motivation
The gradient notion
The Wolfe Theorems
An example function:

  $f(x, y) := \cos\!\left(\tfrac{1}{2}x\right) \cos\!\left(\tfrac{1}{2}y\right) x$
Directional Derivatives: Along the Axes
First, the one-dimensional derivatives:

  $\frac{\partial f(x, y)}{\partial x}, \qquad \frac{\partial f(x, y)}{\partial y}$

Directional Derivatives: In a general direction

  $\frac{\partial f(x, y)}{\partial v}, \qquad v \in \mathbb{R}^2, \quad \|v\| = 1$

Directional Derivatives in the plane $\mathbb{R}^2$

  $f: \mathbb{R}^2 \to \mathbb{R}$
  $\frac{\partial f(x, y)}{\partial x}, \qquad \frac{\partial f(x, y)}{\partial y}$

The Gradient: Definition in $\mathbb{R}^2$

  $\nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$

The Gradient: Definition in $\mathbb{R}^n$

  $f: \mathbb{R}^n \to \mathbb{R}$
  $\nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$
The Gradient Properties
The gradient defines the (hyper)plane approximating the function infinitesimally:

  $\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y$
The Gradient properties
By the chain rule (important for later use):

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f|_p,\, v \rangle, \qquad \|v\| = 1$
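A quick numerical sanity check of this identity (my own sketch, using the cosine example from before as the test function and an illustrative finite-difference step):

```python
import numpy as np

def f(p):                                   # smooth test function R^2 -> R
    x, y = p
    return np.cos(0.5 * x) * np.cos(0.5 * y) * x

def grad_f(p):                              # its analytic gradient
    x, y = p
    return np.array([
        np.cos(0.5 * y) * (np.cos(0.5 * x) - 0.5 * x * np.sin(0.5 * x)),
        -0.5 * x * np.cos(0.5 * x) * np.sin(0.5 * y),
    ])

p = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
v /= np.linalg.norm(v)                      # unit direction

h = 1e-6
finite_diff = (f(p + h * v) - f(p)) / h     # directional derivative, numerically
chain_rule = grad_f(p) @ v                  # <grad f|_p, v>
print(finite_diff, chain_rule)              # the two values agree to ~1e-6
```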
The Gradient properties
Proposition 1:
$\frac{\partial f}{\partial v}(p)$ is maximal when choosing $v = \frac{\nabla f|_p}{\|\nabla f|_p\|}$
and minimal when choosing $v = -\frac{\nabla f|_p}{\|\nabla f|_p\|}$.

(Intuitive: the gradient points in the direction of greatest change.)
The Gradient properties
Proof (only for the minimum case):
Assign $v = -\frac{\nabla f|_p}{\|\nabla f|_p\|}$; by the chain rule:

  $\frac{\partial f}{\partial v}(p) = \left\langle \nabla f|_p,\; -\frac{\nabla f|_p}{\|\nabla f|_p\|} \right\rangle = -\frac{1}{\|\nabla f|_p\|}\,\langle \nabla f|_p, \nabla f|_p \rangle = -\|\nabla f|_p\|$
The Gradient properties
On the other hand, for a general $v$ with $\|v\| = 1$, the Cauchy-Schwarz inequality gives:

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f|_p, v \rangle \;\ge\; -\|\nabla f|_p\|\,\|v\| = -\|\nabla f|_p\|$
The Gradient Properties
Proposition 2: let $f: \mathbb{R}^n \to \mathbb{R}$ be a $C^1$ smooth function around $p$.
If $f$ has a local minimum (maximum) at $p$, then

  $\nabla f|_p = 0$

(Intuitive: a necessary condition for a local min(max).)
The Gradient Properties
Proof:
Intuitive:


The Gradient Properties
Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get:

  $0 = \frac{d\, f(p + tv)}{dt}\bigg|_{t=0} = \langle \nabla f|_p, v \rangle \;\;\Rightarrow\;\; \nabla f|_p = 0$
The Gradient Properties
We found the best infinitesimal direction at each point.
Looking for a minimum: a "blind man" procedure.
How can we derive the way to the minimum using this knowledge?
Background
Motivation
The gradient notion
The Wolfe Theorems
The Wolfe Theorem
This is the link from the previous gradient
properties to the constructive algorithm.
The problem:

  $\min_x f(x)$
The Wolfe Theorem
We introduce a model algorithm:
Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$ stop; else, compute a search direction $h_i \in \mathbb{R}^n$
Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
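A minimal sketch of this model algorithm, assuming the caller supplies the search-direction rule; the exact one-dimensional minimization of Step 2 is replaced here by a crude sampled line search, and all names are illustrative:

```python
import numpy as np

def sampled_line_search(f, x, h, lam_max=1.0, samples=200):
    """Crude stand-in for argmin_{lam >= 0} f(x + lam * h): sample and pick the best."""
    lams = np.linspace(0.0, lam_max, samples)
    return lams[int(np.argmin([f(x + lam * h) for lam in lams]))]

def model_algorithm(f, grad, direction, x0, tol=1e-8, max_iter=1000):
    """Generic descent model: x_{i+1} = x_i + lambda_i * h_i."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # Step 1: stop when grad f(x_i) ~ 0
            break
        h = direction(x, g)                  # Step 1: search direction h_i
        lam = sampled_line_search(f, x, h)   # Step 2: step size lambda_i
        x = x + lam * h                      # Step 3
    return x

# toy usage: minimize ||x||^2 with the steepest-descent choice h_i = -grad f(x_i)
x_min = model_algorithm(lambda x: x @ x, lambda x: 2 * x,
                        lambda x, g: -g, np.array([3.0, -2.0]))
print(np.round(x_min, 3))                    # ~ [0, 0]
```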
The Wolfe Theorem
The Theorem: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function

  $k: \mathbb{R}^n \to [0, 1]$

with

  $\nabla f(x) \neq 0 \;\Rightarrow\; k(x) > 0$,

and the search vectors constructed by the model algorithm satisfy:

  $\langle h_i, \nabla f(x_i) \rangle \;\le\; -k(x_i)\, \|\nabla f(x_i)\|\, \|h_i\|$
The Wolfe Theorem
And

  $h_i = 0 \;\Leftrightarrow\; \nabla f(x_i) = 0.$

Then if $\{x_i\}_{i=0}^{\infty}$ is the sequence constructed by the algorithm model,
any accumulation point $y$ of this sequence satisfies:

  $\nabla f(y) = 0$
The Wolfe Theorem
The theorem has a very intuitive interpretation:
always go in a descent direction, i.e. keep the angle between $h_i$ and $-\nabla f(x_i)$ acute.
Preview
Background
Steepest Descent
Conjugate Gradient
Steepest Descent
What does it mean?
We now use what we have learned to implement the most basic minimization technique.
First we introduce the algorithm, which is a version of the model algorithm.
The problem:

  $\min_x f(x)$
Steepest Descent
Steepest descent algorithm:
Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$ stop; else, compute the search direction $h_i = -\nabla f(x_i)$
Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
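As an illustration (not part of the slides), here is steepest descent specialized to a quadratic $f(x) = \tfrac{1}{2}x^T A x - b^T x$ with $A$ symmetric positive definite, where the exact step size of Step 2 has the closed form $\lambda_i = \frac{g^T g}{g^T A g}$ for $g = \nabla f(x_i)$:

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent for f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = A @ x - b                        # grad f(x_i)
        if np.linalg.norm(g) < tol:          # Step 1: stop at a critical point
            break
        h = -g                               # steepest-descent direction
        lam = (g @ g) / (g @ (A @ g))        # exact line search along h
        x = x + lam * h                      # Step 3
    return x

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = steepest_descent_quadratic(A, b, np.zeros(2))
print(np.allclose(A @ x_star, b))            # True: the minimizer solves Ax = b
```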
Steepest Descent
Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies:

  $\nabla f(y) = 0$

Proof: from the Wolfe theorem.

Remark: the Wolfe theorem gives us numerical stability if the derivatives aren't given (are calculated numerically).
Steepest Descent
From the chain rule:

  $\frac{d}{d\lambda} f(x_i + \lambda h_i) = \langle \nabla f(x_i + \lambda h_i),\, h_i \rangle = 0$

Therefore the method of steepest descent looks like this:
Steepest Descent
Steepest Descent
The steepest descent method finds critical points and local minima.
Implicit step-size rule:
actually we reduced the problem to finding the minimum of a one-dimensional function $f: \mathbb{R} \to \mathbb{R}$ (the restriction of $f$ to the search line).
There are extensions that give the step-size rule in a discrete sense (Armijo).
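A sketch of the Armijo (backtracking) rule mentioned above; the constants are common defaults, not values from the slides, and $h$ is assumed to be a descent direction:

```python
import numpy as np

def armijo_step(f, grad, x, h, lam0=1.0, beta=0.5, sigma=1e-4):
    """Backtracking step size: shrink lam until the sufficient-decrease condition
    f(x + lam*h) <= f(x) + sigma * lam * <grad f(x), h> holds."""
    fx = f(x)
    slope = grad(x) @ h                      # must be negative for a descent direction
    lam = lam0
    while f(x + lam * h) > fx + sigma * lam * slope:
        lam *= beta                          # reject the step and shrink it
    return lam
```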
Steepest Descent
Back to our connectivity shapes: the authors solve the one-dimensional problem

  $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$

analytically. They change the spring energy to

  $E_s(x) = \sum_{(i,j) \in E} \left( \|x_i - x_j\|^2 - 1 \right)^2, \qquad x \in \mathbb{R}^{3n}$

and get a quartic polynomial in $\lambda$.
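Minimizing a quartic in $\lambda$ can indeed be done analytically, since its derivative is a cubic whose real roots are the only candidate minimizers; a small sketch of that idea (coefficients ordered from the highest degree down, as numpy expects):

```python
import numpy as np

def minimize_quartic(coeffs):
    """Minimize p(lam) = c4*lam^4 + ... + c0 over lam >= 0, given coeffs = [c4, c3, c2, c1, c0]."""
    p = np.poly1d(coeffs)
    crit = np.roots(np.polyder(p))                      # roots of the cubic p'(lam)
    candidates = [0.0] + [r.real for r in crit
                          if abs(r.imag) < 1e-12 and r.real >= 0.0]
    return min(candidates, key=p)                       # best non-negative critical point

# toy check: p(lam) = (lam - 1)^2 (lam - 3)^2 has minima at lam = 1 and lam = 3
print(minimize_quartic([1, -8, 22, -24, 9]))            # 1.0 or 3.0 (both give p = 0)
```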
Preview
Background
Steepest Descent
Conjugate Gradient
Conjugate Gradient
From now on we assume we want to minimize the quadratic function:

  $f(x) = \tfrac{1}{2} x^T A x - b^T x + c$

This is equivalent to solving the linear problem:

  $0 = \nabla f(x) = Ax - b$

There are generalizations to general functions.
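A quick numerical confirmation of this equivalence (with an arbitrary symmetric positive definite $A$, taking $c = 0$):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

f = lambda x: 0.5 * x @ A @ x - b @ x        # the quadratic (with c = 0)
grad = lambda x: A @ x - b                   # its gradient

x_star = np.linalg.solve(A, b)               # solve the linear system A x = b
print(np.allclose(grad(x_star), 0.0))        # True: x_star is a critical point of f
print(f(x_star) < f(x_star + 0.1))           # True: nearby points have larger f
```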
Conjugate Gradient
What is the problem with steepest descent?
We can repeat the same directions over and over.
Conjugate gradient takes at most n steps.
Conjugate Gradient
Notation: $\tilde{x}$ is the solution of $A\tilde{x} = b$, and $e_i = x_i - \tilde{x}$ is the error at step $i$, so

  $\nabla f(x) = Ax - b = A(x - \tilde{x})$, and in particular $\nabla f(x_i) = A e_i$.

The iteration is $x_{i+1} = x_i + \alpha_i d_i$, and the search directions $d_0, d_1, \ldots, d_j, \ldots$ should span $\mathbb{R}^n$.
Conjugate Gradient
Given $d_i$, how do we calculate $\alpha_i$? (As before, by an exact line search:)

  $d_i^T \nabla f(x_{i+1}) = 0$
  $\Rightarrow\; d_i^T A e_{i+1} = 0$
  $\Rightarrow\; d_i^T A (e_i + \alpha_i d_i) = 0$
  $\Rightarrow\; \alpha_i = -\dfrac{d_i^T A e_i}{d_i^T A d_i} = -\dfrac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$
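A small numerical check of this formula with arbitrary data (my own sketch): after stepping with this $\alpha_i$, the new gradient is orthogonal to the search direction, exactly as the first line above demands.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = A @ A.T + 4 * np.eye(4)                  # symmetric positive definite
b = rng.normal(size=4)

x = rng.normal(size=4)                       # current iterate x_i
d = rng.normal(size=4)                       # some search direction d_i

g = A @ x - b                                # grad f(x_i)
alpha = -(d @ g) / (d @ (A @ d))             # alpha_i = -d_i^T grad f(x_i) / d_i^T A d_i
x_next = x + alpha * d                       # x_{i+1}

print(np.isclose(d @ (A @ x_next - b), 0.0)) # True: d_i^T grad f(x_{i+1}) = 0
```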
Conjugate Gradient
How do we find the $d_j$?
We want that after $n$ steps the error will be 0. Expand the initial error in the basis of search directions:

  $e_0 = \sum_{i=0}^{n-1} \delta_i d_i$

  $e_1 = e_0 + \alpha_0 d_0, \quad e_2 = e_0 + \alpha_0 d_0 + \alpha_1 d_1, \quad \ldots, \quad e_j = e_0 + \sum_{i=0}^{j-1} \alpha_i d_i$

  $e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i$
Conjugate Gradient
Here is an idea: if $\alpha_j = -\delta_j$ then:

  $e_j = \sum_{i=0}^{n-1} \delta_i d_i - \sum_{i=0}^{j-1} \delta_i d_i = \sum_{i=j}^{n-1} \delta_i d_i$

So if $j = n$, then $e_n = 0$.
Conjugate Gradient
So we look for directions $d_j$ such that $\alpha_j = -\delta_j$.
A simple calculation shows that this holds if we take the directions to be A-conjugate (A-orthogonal):

  $d_j^T A d_i = 0 \quad \text{for } i \neq j$
Conjugate Gradient
We have to find an A-conjugate basis $d_j,\; j = 0, \ldots, n-1$.
We can do a Gram-Schmidt process, but we should be careful, since it is an $O(n^3)$ process:

  $d_i = u_i + \sum_{k=0}^{i-1} \beta_{i,k}\, d_k$

where $u_1, u_2, \ldots, u_n$ is some series of vectors.
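For illustration, a direct implementation of this conjugation (keeping every previous direction, which is exactly what makes it expensive); the helper name is mine:

```python
import numpy as np

def a_conjugate_basis(U, A):
    """Gram-Schmidt conjugation: turn u_0, ..., u_{n-1} into directions d_0, ..., d_{n-1}
    with d_i^T A d_j = 0 for i != j, using beta_{i,k} = -(u_i^T A d_k) / (d_k^T A d_k)."""
    D = []
    for u in U:
        d = u.astype(float)
        for dk in D:                                     # subtract A-projections on old d_k
            d = d - (u @ (A @ dk)) / (dk @ (A @ dk)) * dk
        D.append(d)
    return D

# illustration: conjugate the standard basis with respect to a random SPD matrix
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
A = A @ A.T + 3 * np.eye(3)
D = a_conjugate_basis(list(np.eye(3)), A)
print(round(D[0] @ (A @ D[1]), 10), round(D[0] @ (A @ D[2]), 10))   # both 0.0
```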
Conjugate Gradient
So for an arbitrary choice of $u_i$ we gain nothing.
Luckily, we can choose $u_i$ so that the conjugate direction calculation is $O(m)$, where $m$ is the number of non-zero entries in $A$.
The correct choice of $u_i$ is:

  $u_i = -\nabla f(x_i)$
Conjugate Gradient
So the conjugate gradient algorithm for minimizing f:
Data: $x_0 \in \mathbb{R}^n$
Step 0: $d_0 = r_0 := -\nabla f(x_0)$
Step 1: $\alpha_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$
Step 2: $x_{i+1} = x_i + \alpha_i d_i$
Step 3: $r_{i+1} := -\nabla f(x_{i+1}), \qquad \beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$
Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$, and repeat $n$ times.
