Lecture 4
Outline
‣ Understanding the optimization view of learning
- large margin linear classification
- regularization, generalization
‣ Optimization algorithms
- preface: gradient descent optimization
- stochastic gradient descent
- quadratic program
Recall: learning as optimization
‣ Machine learning problems are often cast as optimization problems
[Figure: a linearly separable dataset with decision boundary θ·x + θ₀ = 0, positive margin boundary θ·x + θ₀ = 1, and negative margin boundary θ·x + θ₀ = −1]
$$J(\theta, \theta_0) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Loss}_h\!\left(y^{(i)}(\theta \cdot x^{(i)} + \theta_0)\right) + \frac{\lambda}{2}\,\|\theta\|^2$$
[Figure: decision boundary θ·x + θ₀ = 0 and margin boundaries θ·x + θ₀ = ±1 plotted for regularization values λ = 0.01, 0.1, 1, 100, 1000 (equivalently C = 1/λ = 100, 10, 1, 0.01, 0.001); larger λ yields smaller ‖θ‖ and thus a wider margin]
Regularization, generalization
$$J(\theta, \theta_0) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Loss}_h\!\left(y^{(i)}(\theta \cdot x^{(i)} + \theta_0)\right) + \frac{\lambda}{2}\,\|\theta\|^2$$
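To make the objective concrete, here is a minimal numpy sketch of J (an illustration, not code from the lecture); the names X, y, theta, theta0, and lam are assumptions of this example:

```python
import numpy as np

def hinge_loss(z):
    # Loss_h(z) = max(0, 1 - z): zero once the signed agreement z >= 1
    return np.maximum(0.0, 1.0 - z)

def objective(theta, theta0, X, y, lam):
    # X: (n, d) feature matrix, y: (n,) labels in {-1, +1}
    agreements = y * (X @ theta + theta0)        # y^(i) (theta . x^(i) + theta0)
    avg_loss = hinge_loss(agreements).mean()     # (1/n) sum of hinge losses
    return avg_loss + 0.5 * lam * (theta @ theta)  # plus (lambda/2) ||theta||^2
```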
Outline
‣ Understanding the optimization view of learning
- large margin linear classification
- regularization, generalization
‣ Optimization algorithms
- preface: gradient descent optimization
- stochastic gradient descent
- quadratic program
Preface: Gradient descent
[Figure: a one-dimensional objective J(θ) plotted against θ; successive gradient descent steps move downhill along the curve toward a minimum]
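A minimal sketch of the update θ ← θ − η ∇J(θ) (an illustration, not code from the lecture); the toy objective, its gradient, and the step size eta are assumptions chosen for this example:

```python
import numpy as np

def gradient_descent(grad, theta, eta=0.1, steps=100):
    # Repeatedly step against the gradient: theta <- theta - eta * grad(theta)
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta

# Toy objective J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta_star = gradient_descent(lambda th: 2.0 * (th - 3.0), theta=np.array(0.0))
print(theta_star)  # approaches 3.0, the minimizer
```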
Stochastic gradient descent

$$J(\theta, \theta_0) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Loss}_h\!\left(y^{(i)}(\theta \cdot x^{(i)} + \theta_0)\right) + \frac{\lambda}{2}\,\|\theta\|^2
= \frac{1}{n}\sum_{i=1}^{n} \left[ \mathrm{Loss}_h\!\left(y^{(i)}(\theta \cdot x^{(i)} + \theta_0)\right) + \frac{\lambda}{2}\,\|\theta\|^2 \right]$$
Stochastic gradient descent

$$J(\theta) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Loss}_h\!\left(y^{(i)}\,\theta \cdot x^{(i)}\right) + \frac{\lambda}{2}\,\|\theta\|^2$$
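A minimal SGD sketch for this objective, following the formula above without the offset θ₀ (an illustration, not code from the lecture; X, y, lam, and eta are assumed names). Each update samples one example and steps along the gradient of that example's term, using a subgradient for the hinge loss:

```python
import numpy as np

def sgd_hinge(X, y, lam, eta=0.01, epochs=10, seed=0):
    # X: (n, d) features, y: (n,) labels in {-1, +1}
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            agreement = y[i] * (theta @ X[i])
            # Subgradient of Loss_h(z) = max(0, 1 - z) at z = agreement
            grad_loss = -y[i] * X[i] if agreement < 1 else np.zeros(d)
            # Per-example gradient also includes the (lambda/2)||theta||^2 term
            theta -= eta * (grad_loss + lam * theta)
    return theta
```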
Find θ, θ₀ that minimize $\frac{1}{2}\|\theta\|^2$ subject to

$$y^{(i)}(\theta \cdot x^{(i)} + \theta_0) \ge 1, \quad i = 1, \dots, n$$
The distance from the decision boundary to the margin boundary is 1/‖θ‖: a point x lies at distance |θ·x + θ₀|/‖θ‖ from the hyperplane θ·x + θ₀ = 0, and on the margin boundary |θ·x + θ₀| = 1.
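Since this is a quadratic program, a generic QP solver can find the maximum margin separator directly. Here is a minimal sketch using cvxpy (the solver choice is an assumption of this example, not part of the lecture), with X and y assumed to form a linearly separable dataset:

```python
import cvxpy as cp
import numpy as np

def max_margin(X, y):
    # X: (n, d) features, y: (n,) labels in {-1, +1}; data must be separable
    n, d = X.shape
    theta = cp.Variable(d)
    theta0 = cp.Variable()
    # Minimize (1/2)||theta||^2 subject to y^(i)(theta . x^(i) + theta0) >= 1
    constraints = [cp.multiply(y, X @ theta + theta0) >= 1]
    problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(theta)), constraints)
    problem.solve()
    return theta.value, theta0.value
```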