Week 6 Day 2

CS545: Gradient Descent
Chuck Anderson
Department of Computer Science
Colorado State University
Fall, 2009

Outline
Gradient Descent
Parabola
Examples in R

Finding Minimum of Parabola

Find the $x$ that minimizes $f(x) = 1.2(x-2)^2 + 3.2$ or, said another way, find $\operatorname{argmin}_x f(x)$. How?

Yep. Take the derivative, set it equal to zero, and try to solve for $x$.

$$f(x) = 1.2(x-2)^2 + 3.2$$
$$\frac{df(x)}{dx} = 1.2(2)(x-2) = 2.4(x-2)$$
$$\frac{df(x)}{dx} = 0 = 2.4(x-2)$$
$$x = 2$$

[Figure: plot of $1.2(x-2)^2 + 3.2$ for $x$ from 0 to 4, with the closed-form solution marked at $x = 2$.]

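As a quick numerical cross-check of the closed-form answer, the sketch below uses R's built-in `optimize()` on the same parabola. The search interval `c(0, 4)` is an assumption taken from the plotted range, and the variable name `check` is only illustrative.

```r
# The parabola from the slide.
f <- function(x) {
  1.2 * (x - 2)^2 + 3.2
}

# optimize() searches a bounded interval for a minimum of a 1-D function.
# The interval 0..4 is assumed here because it matches the plotted x range.
check <- optimize(f, interval = c(0, 4))

print(check$minimum)    # location of the minimum, close to 2
print(check$objective)  # value of f there, close to 3.2
```

The result agrees with $x = 2$ only up to the tolerance of the numerical search, which is why the closed-form solution is preferable when it exists.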
Gradient Descent

But if $\frac{df(x)}{dx} = 0$ cannot be solved directly for $x$, what can we do?

Start at some $x$ value, use the derivative at that value to tell us which way to move, and repeat. Gradient descent.

$\eta$ is a factor on the derivative that controls how far to go.

$$\frac{df(x)}{dx} = 2.4(x-2)$$
$$x^{(0)} = 0 \quad \text{(for example)}$$
$$x^{(n)} = x^{(n-1)} - \eta \, 2.4 \, (x^{(n-1)} - 2)$$

[Figure: plot of $1.2(x-2)^2 + 3.2$ for $x$ from 0 to 4, showing the closed-form solution and the gradient descent steps, labeled "Gradient Descent".]

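Why the step factor matters can be seen by subtracting the minimizer $x = 2$ from both sides of the update. Writing the step factor as $\eta$ (the symbol is an assumption here), the distance to the minimum is multiplied by the same constant at every step:

```latex
\begin{align*}
x^{(n)} &= x^{(n-1)} - \eta \, 2.4 \, (x^{(n-1)} - 2) \\
x^{(n)} - 2 &= (1 - 2.4\,\eta)\,(x^{(n-1)} - 2) \\
x^{(n)} - 2 &= (1 - 2.4\,\eta)^{n}\,(x^{(0)} - 2)
\end{align*}
```

So for this parabola the iteration converges when $|1 - 2.4\,\eta| < 1$, that is $0 < \eta < 1/1.2 \approx 0.83$, and it lands on $x = 2$ in a single step when $\eta = 1/2.4$.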
For a parabola, we can get there much faster if we also know the second derivative, which is what?

$$\frac{df(x)}{dx} = f' = 2.4(x-2)$$
$$\frac{d^2 f(x)}{dx^2} = f'' = 2.4$$

and use Newton's method (see the Wikipedia entry for Newton's method)

$$x^{(n)} = x^{(n-1)} - \frac{f'}{f''}$$
$$x^{(n)} = x^{(n-1)} - \frac{2.4(x^{(n-1)} - 2)}{2.4}$$
$$x^{(n)} = x^{(n-1)} - (x^{(n-1)} - 2)$$

[Figure: plot of $1.2(x-2)^2 + 3.2$ for $x$ from 0 to 4, showing the Newton's method steps, labeled "Newton's Gradient Descent".]

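The last line of the derivation simplifies to $x^{(n)} = 2$ whatever the previous value was, so on this parabola a single Newton step should reach the minimum. A minimal sketch to confirm this numerically (the starting points are chosen arbitrarily):

```r
grad <- function(x) 2.4 * (x - 2)   # first derivative of the parabola
secondGrad <- function(x) 2.4       # second derivative (constant)

# One Newton step from several starting points (chosen arbitrarily).
starts <- c(-3, 0.1, 5)
afterOneStep <- starts - grad(starts) / secondGrad(starts)
print(afterOneStep)  # each entry is 2, up to floating-point rounding
```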
If the function is not a parabola, what can we do?

Cannot solve directly for $x$. Can still do gradient descent. Can we always use Newton's method?

No. Reason 1: If $x$ has 1000 components, the second derivative (Hessian) is a $1000 \times 1000$ matrix. May be too big.

Reason 2: If the function is not a parabola, the second derivative information may lead you very far away. When?

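One possible answer to "When?" is sketched below: where the function is much flatter than a parabola, $f''$ is close to zero and the Newton step $f'/f''$ becomes huge. The function $f(x) = \sqrt{1 + x^2}$ is an illustrative assumption, not taken from the slides, and `newtonSteps` is a hypothetical helper name.

```r
# Illustrative non-parabolic function (an assumption, not from the slides):
#   f(x) = sqrt(1 + x^2), with its minimum at x = 0.
grad <- function(x) x / sqrt(1 + x^2)
secondGrad <- function(x) (1 + x^2)^(-3/2)

# Apply nSteps Newton updates and return every x visited.
newtonSteps <- function(x, nSteps) {
  xtrace <- x
  for (step in 1:nSteps) {
    x <- x - grad(x) / secondGrad(x)  # algebraically equal to -x^3
    xtrace <- c(xtrace, x)
  }
  xtrace
}

print(newtonSteps(0.5, 4))  # magnitudes shrink toward 0
print(newtonSteps(2, 4))    # magnitudes grow: each step overshoots further
```

For this function the Newton update reduces to $x \leftarrow -x^3$, so it converges only when the starting point satisfies $|x| < 1$.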
Approximating the Second Derivative

Say we have picked a direction, $p$, to go. Rather than compute the second derivative in that direction, we can approximate it using two first derivative values.

$$f''(x)\,p \approx \frac{f'(x + \sigma p) - f'(x)}{\sigma} \quad \text{for } 0 < \sigma \ll 1$$

In practice, Moller found he had to modify this by adding $\lambda p$, where $\lambda$ is set to a value for which the resulting approximated second derivative is well behaved.

$$f''(x)\,p \approx \frac{f'(x + \sigma p) - f'(x)}{\sigma} + \lambda p, \quad \text{for } 0 < \sigma \ll 1$$

This gives us a way to scale the step size.

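The approximation can be tried on the parabola, where the exact answer is known because $f''(x) = 2.4$ everywhere. This is a minimal sketch: the names `sigma` and `lambda` for the small step and the added scale, the default values, and the helper name `approxSecondDerivTimesP` are all assumptions made for illustration.

```r
grad <- function(x) 2.4 * (x - 2)  # first derivative of the parabola

# Approximate f''(x) * p from two first-derivative values.
# sigma and lambda are assumed names for the small step and the added scale.
approxSecondDerivTimesP <- function(grad, x, p, sigma = 1e-6, lambda = 0) {
  (grad(x + sigma * p) - grad(x)) / sigma + lambda * p
}

x <- 0.1
p <- -grad(x)  # direction: downhill along the negative gradient
approxValue <- approxSecondDerivTimesP(grad, x, p)
exactValue <- 2.4 * p  # for the parabola, f''(x) = 2.4 everywhere
print(c(approx = approxValue, exact = exactValue))
```

With `lambda = 0` the two numbers agree up to rounding. A positive `lambda` adds `lambda * p` to the estimate, which makes the estimated curvature along `p` larger.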
Picking a Good Direction

Now, how about that direction? How do we decide that?

Moller uses conjugate gradients. (See the Wikipedia entry for conjugate gradient.)

The conjugate gradient direction is based on the previous direction and the current gradient.

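How a direction can be built from the previous direction and the current gradient is sketched below with one common rule, the Fletcher-Reeves formula. This is an assumption for illustration and may differ from the variant Moller uses. The 2-by-2 quadratic (`A`, `b`) and the exact line search are also illustrative choices.

```r
# Minimize 0.5 * x'Ax - b'x for an assumed symmetric positive definite A.
A <- matrix(c(3, 1, 1, 2), nrow = 2)
b <- c(1, 1)
gradQ <- function(x) as.vector(A %*% x - b)

x <- c(0, 0)
g <- gradQ(x)
p <- -g  # first direction: plain steepest descent

for (k in 1:2) {
  Ap <- as.vector(A %*% p)
  alpha <- -sum(g * p) / sum(p * Ap)   # exact line search along p (quadratic only)
  x <- x + alpha * p
  gNew <- gradQ(x)
  beta <- sum(gNew * gNew) / sum(g * g)  # Fletcher-Reeves coefficient
  p <- -gNew + beta * p                  # new direction: current gradient + previous direction
  g <- gNew
}

print(x)            # conjugate gradient result after 2 steps
print(solve(A, b))  # direct solution for comparison
```

On a quadratic in two variables, two conjugate gradient steps with exact line search are enough to reach the minimum, which is what the comparison with `solve(A, b)` checks.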
Parabola Example

f <- function(x) {
  1.2 * (x - 2)^2 + 3.2
}
grad <- function(x) {
  1.2 * 2 * (x - 2)
}
secondGrad <- function(x) {
  2.4
}

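Before using a hand-written gradient in an optimizer, it can be worth comparing it with a finite-difference estimate. A minimal sketch, where `numericGrad`, the step `h`, and the test points are assumptions made for illustration:

```r
f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 1.2 * 2 * (x - 2)

# Central-difference estimate of df/dx; h is an assumed small step.
numericGrad <- function(f, x, h = 1e-5) {
  (f(x + h) - f(x - h)) / (2 * h)
}

xs <- c(0, 0.1, 2, 3.5)  # arbitrary test points
maxError <- max(abs(numericGrad(f, xs) - grad(xs)))
print(maxError)  # should be tiny compared with the gradient values
```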
Steepest Descent

xs <- seq(0, 4, len = 20)
plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2 * (x - 2)^2 + 3.2))
### df/dx = 2.4(x-2)
### df/dx = 0 -> 0 = 2.4x - 4.8 -> x = 2
lines(c(2, 2), c(3, 8), col = "red", lty = 2)
text(2.1, 7, "Closed-form solution", col = "red", pos = 4)
### gradient descent
x <- 0.1
xtrace <- x
ftrace <- f(x)
stepFactor <- 0.6 ### try larger and smaller values (0.8 and 0.01)
for (step in 1:100) {
  x <- x - stepFactor * grad(x)
  xtrace <- c(xtrace, x)
  ftrace <- c(ftrace, f(x))
}
lines(xtrace, ftrace, type = "b", col = "blue")
text(0.5, 6, "Gradient Descent", col = "blue", pos = 4)

[Figure: plot of $1.2(x-2)^2 + 3.2$ for $x$ from 0 to 4, showing the closed-form solution and the gradient descent trace, labeled "Gradient Descent".]

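The suggestion to try larger and smaller step factors can be run without plotting. The sketch below repeats the same update for several values and reports where each run ends. `runDescent` is a hypothetical helper, and the value 0.9 is an extra assumption added here to show a run that diverges.

```r
grad <- function(x) 1.2 * 2 * (x - 2)

# Run plain gradient descent and return the final x.
runDescent <- function(stepFactor, x = 0.1, nSteps = 100) {
  for (step in 1:nSteps) {
    x <- x - stepFactor * grad(x)
  }
  x
}

# 0.01, 0.6 and 0.8 are the values suggested in the code comment;
# 0.9 is an extra value added here to show divergence.
stepFactors <- c(0.01, 0.6, 0.8, 0.9)
finalX <- sapply(stepFactors, runDescent)
print(data.frame(stepFactor = stepFactors, finalX = finalX))
```

After 100 steps, 0.01 is still creeping toward 2, 0.6 and 0.8 have converged (0.8 more slowly, with larger oscillations around the minimum), and 0.9 has moved far away from it.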
Steepest Descent with gradientDescents.R

source("gradientDescents.R")
x <- 0.1
result <- steepest(x, f, grad, stepsize = 0.6, nIterations = 100, xtracep = TRUE, ftracep = TRUE)
lines(result$xtrace, result$ftrace, type = "b", col = "blue")
text(0.5, 6, "Gradient Descent with steepest()", col = "blue", pos = 4)

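The contents of gradientDescents.R are not shown here. For readers without that file, the following is a hypothetical stand-in with the same argument names as the call above. The function name `steepestSketch`, the default values, and the shape of the returned list are assumptions inferred only from how `result$xtrace` and `result$ftrace` are used.

```r
# Hypothetical stand-in for steepest(); NOT the code in gradientDescents.R.
steepestSketch <- function(x, f, grad, stepsize = 0.1, nIterations = 100,
                           xtracep = FALSE, ftracep = FALSE) {
  # Record the starting point only if a trace was requested.
  xtrace <- if (xtracep) x else NULL
  ftrace <- if (ftracep) f(x) else NULL
  for (step in seq_len(nIterations)) {
    x <- x - stepsize * grad(x)  # fixed-step move against the gradient
    if (xtracep) xtrace <- c(xtrace, x)
    if (ftracep) ftrace <- c(ftrace, f(x))
  }
  list(x = x, f = f(x), xtrace = xtrace, ftrace = ftrace)
}

f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 1.2 * 2 * (x - 2)
result <- steepestSketch(0.1, f, grad, stepsize = 0.6, nIterations = 100,
                         xtracep = TRUE, ftracep = TRUE)
print(result$x)
```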
Steepest Descent scaled with Newton's Method

x <- 0.1
xtrace <- x
ftrace <- f(x)
for (step in 1:100) {
  x <- x - grad(x) / secondGrad(x)
  xtrace <- c(xtrace, x)
  ftrace <- c(ftrace, f(x))
}
lines(xtrace, ftrace, type = "b", col = "blue")
text(0.5, 6, "Newton's Gradient Descent", col = "blue", pos = 4)

With Scaled Conjugate Gradient from gradientDescents.R

source("gradientDescents.R")
x <- 0.1
result <- scg(x, f, grad, nIterations = 100, xtracep = TRUE, ftracep = TRUE)
lines(result$xtrace, result$ftrace, type = "b", col = "blue")
text(0.5, 6, "Gradient Descent with scg()", col = "blue", pos = 4)

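If gradientDescents.R is not at hand, base R's `optim()` offers a conjugate gradient method that can serve as a rough comparison. It is a different, non-scaled algorithm and is not the `scg()` function used above. The variable name `fit` is only illustrative.

```r
f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 1.2 * 2 * (x - 2)

# optim() is base R; method = "CG" is a (non-scaled) conjugate gradient method.
# This is a stand-in for comparison only and is not the scg() function.
fit <- optim(par = 0.1, fn = f, gr = grad, method = "CG")
print(fit$par)    # estimated minimizer
print(fit$value)  # function value at that point
```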
Results

[Figure: four plots of $1.2(x-2)^2 + 3.2$ against $x$ from 0 to 4, labeled "Gradient Descent", "Gradient Descent with steepest()", "Newton's Gradient Descent", and "Gradient Descent with scg()", with the closed-form solution marked.]
