CS545: Gradient Descent

Chuck Anderson
Department of Computer Science
Colorado State University
Fall, 2009

Outline

Gradient Descent
Parabola
Examples in R

Finding the Minimum of a Parabola

Find the x that minimizes $f(x) = 1.2(x-2)^2 + 3.2$, or, said another way, find $\arg\min_x f(x)$. How?

Yep. Take the derivative, set it equal to zero, and try to solve for x.

$$f(x) = 1.2(x-2)^2 + 3.2$$
$$\frac{df(x)}{dx} = 1.2(2)(x-2) = 2.4(x-2)$$
$$\frac{df(x)}{dx} = 0 = 2.4(x-2) \;\Rightarrow\; x = 2$$

[Figure: $1.2(x-2)^2 + 3.2$ plotted against x for $0 \le x \le 4$, with the closed-form solution $x = 2$ marked.]

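As a quick check of the closed-form result, R can differentiate f symbolically with `D()` and then locate the zero of the derivative numerically with `uniroot()`. This is a sketch added for illustration, not part of the original slides; `uniroot()` is a numerical root finder standing in for the algebra, and the bracketing interval [0, 4] is an assumption that happens to contain the minimum.

```r
# Let R differentiate f symbolically, then find where the derivative is zero.
fExpr  <- expression(1.2 * (x - 2)^2 + 3.2)
dfExpr <- D(fExpr, "x")              # symbolic derivative of f with respect to x
print(dfExpr)

# Turn the derivative into an ordinary R function of x.
df <- function(x) eval(dfExpr, list(x = x))

# df(0) < 0 and df(4) > 0, so [0, 4] brackets the root of df/dx.
root <- uniroot(df, interval = c(0, 4))$root
print(root)
```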
Gradient Descent

But if $\frac{df(x)}{dx} = 0$ cannot be solved directly for x, what can we do?

Start at some x value, use the derivative at that value to tell us which way to move, and repeat. This is gradient descent.

Here $\epsilon$ is a factor multiplying the derivative that controls how far to go:

$$\frac{df(x)}{dx} = 2.4(x-2)$$
$$x^{(0)} = 0 \quad \text{(for example)}$$
$$x^{(n)} = x^{(n-1)} - \epsilon \, 2.4\left(x^{(n-1)} - 2\right)$$

[Figure: $1.2(x-2)^2 + 3.2$ for $0 \le x \le 4$, showing the closed-form solution and the gradient descent trajectory.]

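Subtracting the minimizer x = 2 from both sides of the update shows when this iteration converges and why the choice of $\epsilon$ matters. This short derivation is added for illustration and applies only to this parabola:

```latex
\begin{aligned}
x^{(n)} - 2 &= \left(x^{(n-1)} - 2\right) - 2.4\,\epsilon\left(x^{(n-1)} - 2\right) \\
            &= (1 - 2.4\,\epsilon)\left(x^{(n-1)} - 2\right) \\
            &= (1 - 2.4\,\epsilon)^{n}\left(x^{(0)} - 2\right)
\end{aligned}
```

So the error shrinks geometrically exactly when $|1 - 2.4\epsilon| < 1$, that is, $0 < \epsilon < 1/1.2 \approx 0.833$. With $\epsilon = 1/2.4$ the factor is 0 and a single step lands on x = 2; for $1/2.4 < \epsilon < 1/1.2$ the iterates overshoot and oscillate around 2 while still converging; for very small $\epsilon$ they creep toward 2 slowly.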
For a parabola, we can get there much faster if we also know the second derivative, which is what?

$$\frac{df(x)}{dx} = f' = 2.4(x-2)$$
$$\frac{d^2 f(x)}{dx^2} = f'' = 2.4$$

and use Newton's method (see the Wikipedia entry for Newton's method):

$$x^{(n)} = x^{(n-1)} - \frac{f'}{f''}$$
$$x^{(n)} = x^{(n-1)} - \frac{2.4\left(x^{(n-1)} - 2\right)}{2.4}$$
$$x^{(n)} = x^{(n-1)} - \left(x^{(n-1)} - 2\right) = 2$$

So from any starting point, Newton's method reaches the minimum x = 2 of this parabola in a single step.

[Figure: $1.2(x-2)^2 + 3.2$ for $0 \le x \le 4$, showing the trajectory of Newton's method ("Newton's Gradient Descent").]

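The one-step result can be checked directly in R. This sketch re-declares the slides' gradient and second derivative for the parabola; `newtonStep` and the list of starting points are hypothetical names and values chosen for illustration.

```r
# Derivatives of the parabola f(x) = 1.2 * (x - 2)^2 + 3.2
grad       <- function(x) 2.4 * (x - 2)
secondGrad <- function(x) 2.4

# Hypothetical helper: one Newton update, x - f'(x) / f''(x)
newtonStep <- function(x) x - grad(x) / secondGrad(x)

starts <- c(-10, 0, 0.1, 3.7, 50)
print(newtonStep(starts))   # each start should land on the minimum at 2
```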
Gradient Descent

If the function is not a parabola, what can we do?

We cannot solve directly for x, but we can still do gradient descent. Can we always use Newton's method?

No. Reason 1: If x has 1000 components, the second derivative (the Hessian) is a 1000 × 1000 matrix, which may be too big to work with.

Reason 2: If the function is not a parabola, the second-derivative information may send you very far away. When?

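To make Reason 2 concrete, here is a sketch with an assumed example function that is not from the original slides: f(x) = log(cosh(x)), whose minimum is at x = 0, with f'(x) = tanh(x) and f''(x) = 1/cosh(x)^2. The second derivative shrinks toward zero as |x| grows, so the Newton step f'/f'' becomes huge there, while a fixed-step gradient descent stays well behaved. (For scale on Reason 1: a 1000 × 1000 Hessian already has 10^6 entries, about 8 MB as doubles, before it is ever inverted.) The step factor 0.5 is an arbitrary choice.

```r
# Assumed example: f(x) = log(cosh(x)), minimum at x = 0
fGrad   <- function(x) tanh(x)          # f'(x)
fSecond <- function(x) 1 / cosh(x)^2    # f''(x), close to 0 for large |x|

# One Newton step from x = 2: the small curvature produces a huge jump
# past the minimum (to 2 - sinh(4)/2, roughly -11.6).
x0 <- 2
xNewton <- x0 - fGrad(x0) / fSecond(x0)
print(xNewton)

# Plain gradient descent with a fixed step factor moves steadily toward 0.
x <- x0
for (step in 1:100) {
  x <- x - 0.5 * fGrad(x)
}
print(x)
```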
Approximating the Second Derivative

Say we have picked a direction, p, to go. Rather than compute the second derivative in that direction, we can approximate it using two first-derivative values:

$$\frac{f'(x + \sigma p) - f'(x)}{\sigma} \approx f''(x)\,p, \qquad 0 < \sigma \ll 1$$

In practice, Moller found he had to modify this by adding $\lambda p$, where $\lambda$ is set to a value for which the resulting approximated second derivative is well behaved:

$$\frac{f'(x + \sigma p) - f'(x)}{\sigma} + \lambda p \approx f''(x)\,p + \lambda p, \qquad 0 < \sigma \ll 1$$

This gives us a way to scale the step size.

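A minimal sketch of how this curvature estimate can scale a step along p, assuming the slides' 1-D parabola and the steepest-descent direction p = -f'(x). The helper name `curvatureScaledStep` and the defaults `sigma = 1e-4`, `lambda = 0` are illustrative assumptions; this is not Moller's full scaled conjugate gradient, which also adapts λ and chooses p by conjugate gradients. The step length α = -(p · f'(x)) / (p · s) is a Newton step along p that uses the estimate s in place of the true second derivative.

```r
# Gradient of the parabola f(x) = 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 2.4 * (x - 2)

# Hypothetical helper: one step along direction p, with the step length set
# by a finite-difference estimate s of f''(x) p, plus lambda * p.
curvatureScaledStep <- function(x, grad, p, sigma = 1e-4, lambda = 0) {
  s <- (grad(x + sigma * p) - grad(x)) / sigma + lambda * p  # ~ f''(x) p + lambda p
  delta <- sum(p * s)          # estimated curvature along p
  if (delta <= 0) stop("curvature estimate not positive; increase lambda")
  alpha <- -sum(p * grad(x)) / delta
  x + alpha * p
}

x <- 0.1
p <- -grad(x)                                        # steepest-descent direction
print(curvatureScaledStep(x, grad, p))               # parabola: lands at (about) 2
print(curvatureScaledStep(x, grad, p, lambda = 5))   # larger lambda: shorter step
```

Adding λp raises the curvature estimate along p by λ|p|^2, so a large enough λ makes it positive and shortens the step, which is the "well behaved" role the slide describes.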
Picking a Good Direction

Now, how about that direction? How do we decide that?

Moller uses conjugate gradients (see the Wikipedia entry for conjugate gradient).

The conjugate gradient direction is based on the previous direction and the current gradient.

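A minimal sketch of how a conjugate gradient direction combines the previous direction with the current gradient, p_new = -g_new + β p_old. It assumes a small made-up 2-D quadratic (the matrix `A` and vector `b` below) so that the line search can be done exactly, and uses the Polak-Ribiere formula for β as one common choice. Moller's method computes its own β and step size, so this is not his algorithm. On an n-dimensional quadratic with exact line searches, conjugate gradients reaches the minimum in at most n steps, here 2.

```r
# Assumed example: f(x) = 0.5 * x' A x - b' x, with A symmetric positive definite
A <- matrix(c(3, 1,
              1, 2), nrow = 2, byrow = TRUE)
b <- c(1, 1)
gradQ <- function(x) as.vector(A %*% x) - b

x <- c(0, 0)
g <- gradQ(x)
p <- -g                                        # first direction: steepest descent
for (k in 1:2) {
  Ap <- as.vector(A %*% p)
  alpha <- -sum(g * p) / sum(p * Ap)           # exact line search (valid for a quadratic)
  x <- x + alpha * p
  gNew <- gradQ(x)
  beta <- sum(gNew * (gNew - g)) / sum(g * g)  # Polak-Ribiere choice of beta
  p <- -gNew + beta * p                        # new direction: current gradient + previous direction
  g <- gNew
}
print(x)            # conjugate gradient result after 2 steps
print(solve(A, b))  # exact minimizer, for comparison
```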
Parabola Example

f <- function(x) {
  1.2 * (x - 2)^2 + 3.2
}

grad <- function(x) {
  1.2 * 2 * (x - 2)
}

secondGrad <- function(x) {
  2.4
}

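Before using `grad()` in a descent loop, it can be worth checking it against a numerical derivative of `f`. This sketch uses a central difference; the helper `numGrad` and the step `h = 1e-5` are assumptions for illustration. Because f is quadratic, the central difference is exact apart from rounding, so the two should agree closely.

```r
f    <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 1.2 * 2 * (x - 2)

# Hypothetical helper: central-difference estimate of df/dx
numGrad <- function(f, x, h = 1e-5) (f(x + h) - f(x - h)) / (2 * h)

xs <- seq(0, 4, len = 20)
maxErr <- max(abs(grad(xs) - numGrad(f, xs)))
print(maxErr)   # should be tiny (rounding error only)
```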
Steepest Descent

xs <- seq(0, 4, len = 20)
plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2 * (x - 2)^2 + 3.2))
### df/dx = 2.4(x - 2)
### df/dx = 0  ->  0 = 2.4x - 4.8  ->  x = 2
lines(c(2, 2), c(3, 8), col = "red", lty = 2)
text(2.1, 7, "Closed-form solution", col = "red", pos = 4)
### gradient descent
x <- 0.1
xtrace <- x
ftrace <- f(x)
stepFactor <- 0.6  ### try larger and smaller values (0.8 and 0.01)
for (step in 1:100) {
  x <- x - stepFactor * grad(x)
  xtrace <- c(xtrace, x)
  ftrace <- c(ftrace, f(x))
}
lines(xtrace, ftrace, type = "b", col = "blue")
text(0.5, 6, "Gradient Descent", col = "blue", pos = 4)

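The comment in the loop suggests trying other step factors. This sketch runs the same loop for 0.6, 0.8, and 0.01, plus 0.9 (an extra value added here), and reports only the final x. For this parabola the update multiplies the error x - 2 by (1 - 2.4 · stepFactor) each step, so it converges only when stepFactor < 1/1.2 ≈ 0.833. The wrapper name `runGradientDescent` is hypothetical.

```r
f    <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 2.4 * (x - 2)

# Hypothetical wrapper: the slide's loop, returning only the final x
runGradientDescent <- function(stepFactor, x = 0.1, nSteps = 100) {
  for (step in 1:nSteps) x <- x - stepFactor * grad(x)
  x
}

stepFactors <- c(0.6, 0.8, 0.01, 0.9)
finalX <- sapply(stepFactors, runGradientDescent)
print(cbind(stepFactors, finalX))
```

With 0.6 the iterates settle on 2, with 0.8 they oscillate around 2 and converge slowly, with 0.01 they are still short of 2 after 100 steps, and with 0.9 they diverge.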
[Figure: output of the steepest descent code: $1.2(x-2)^2 + 3.2$ for $0 \le x \le 4$, with the closed-form solution and the gradient descent trajectory.]

Steepest Descent with gradientDescents.R

source("gradientDescents.R")
x <- 0.1
result <- steepest(x, f, grad, stepsize = 0.6, nIterations = 100, xtracep = TRUE, ftracep = TRUE)
plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2 * (x - 2)^2 + 3.2))
lines(result$xtrace, result$ftrace, type = "b", col = "blue")
text(0.5, 6, "Gradient Descent with steepest()", col = "blue", pos = 4)

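The file gradientDescents.R is not reproduced in these slides. Purely to show the shape of the result the plotting code relies on (a list with `xtrace` and `ftrace` components), here is a hypothetical stand-in, `steepestSketch`, with a similar argument list. It is an assumption-based sketch, not the course's `steepest()` implementation.

```r
# Hypothetical stand-in, NOT the course's gradientDescents.R: fixed-step
# steepest descent that optionally records the trajectory.
steepestSketch <- function(x, f, grad, stepsize = 0.1, nIterations = 100,
                           xtracep = FALSE, ftracep = FALSE) {
  xtrace <- x
  ftrace <- f(x)
  for (i in 1:nIterations) {
    x <- x - stepsize * grad(x)
    if (xtracep) xtrace <- c(xtrace, x)
    if (ftracep) ftrace <- c(ftrace, f(x))
  }
  list(x = x, f = f(x),
       xtrace = if (xtracep) xtrace else NULL,
       ftrace = if (ftracep) ftrace else NULL)
}

f    <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 2.4 * (x - 2)
result <- steepestSketch(0.1, f, grad, stepsize = 0.6, nIterations = 100,
                         xtracep = TRUE, ftracep = TRUE)
print(result$x)
print(length(result$xtrace))   # starting point plus one entry per iteration
```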
Steepest Descent Scaled with Newton's Method

plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2 * (x - 2)^2 + 3.2))
x <- 0.1
xtrace <- x
ftrace <- f(x)
for (step in 1:100) {
  x <- x - grad(x) / secondGrad(x)
  xtrace <- c(xtrace, x)
  ftrace <- c(ftrace, f(x))
}
lines(xtrace, ftrace, type = "b", col = "blue")
text(0.5, 6, "Newton's Gradient Descent", col = "blue", pos = 4)

With Scaled Conjugate Gradient from gradientDescents.R

source("gradientDescents.R")
x <- 0.1
result <- scg(x, f, grad, nIterations = 100, xtracep = TRUE, ftracep = TRUE)
plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2 * (x - 2)^2 + 3.2))
lines(result$xtrace, result$ftrace, type = "b", col = "blue")
text(0.5, 6, "Gradient Descent with scg()", col = "blue", pos = 4)

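For comparison without gradientDescents.R, base R's `optim()` offers a conjugate gradient method. Note that `method = "CG"` in `optim()` is a line-search conjugate gradient (Fletcher-Reeves by default), not Moller's scaled conjugate gradient, so this is an alternative sketch rather than a reproduction of `scg()`.

```r
f    <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 2.4 * (x - 2)

# Base R optimizer; "CG" is conjugate gradient with a line search.
fit <- optim(par = 0.1, fn = f, gr = grad, method = "CG")
print(fit$par)          # should be close to the minimum at x = 2
print(fit$convergence)  # 0 means optim reports successful convergence
```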
Results

[Figure: four panels, each plotting $1.2(x-2)^2 + 3.2$ against x for $0 \le x \le 4$: "Gradient Descent" (also marking the closed-form solution), "Gradient Descent with steepest()", "Newton's Gradient Descent", and "Gradient Descent with scg()".]

