Support Vector Machine (SVM)

Probably the most popular/influential classification algorithm

Backed by solid theoretical groundings (Cortes and Vapnik, 1995)

A hyperplane-based classifier (like the Perceptron)

Additionally uses the Maximum Margin Principle: finds the hyperplane with the maximum separation margin on the training data
Support Vector Machine

A hyperplane-based linear classifier defined by w and b

Prediction rule: y = sign(w^T x + b)

Given: training data {(x_1, y_1), ..., (x_N, y_N)}

Goal: learn w and b that achieve the maximum margin

For now, assume the entire training data is correctly classified by (w, b)

Zero loss on the training examples (the non-zero loss case comes later)

Assume w and b are scaled such that

w^T x_n + b ≥ 1 for y_n = +1
w^T x_n + b ≤ −1 for y_n = −1

Equivalently, y_n (w^T x_n + b) ≥ 1 ⇒ min_{1≤n≤N} |w^T x_n + b| = 1
The hyperplane's margin is its distance to the closest training example (the distance of a point x from the hyperplane w^T x + b = 0 is |w^T x + b| / ||w||):

γ = min_{1≤n≤N} |w^T x_n + b| / ||w|| = 1 / ||w||
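To make the prediction rule and this margin computation concrete, here is a minimal NumPy sketch; the toy data and the particular (w, b) below are hypothetical, chosen only for illustration.

```python
import numpy as np

def predict(w, b, X):
    # Hyperplane classifier: y = sign(w^T x + b) for each row x of X
    return np.sign(X @ w + b)

def margin(w, b, X, y):
    # Geometric margin of (w, b) on (X, y): min_n |w^T x_n + b| / ||w||,
    # defined only when every example is correctly classified
    scores = X @ w + b
    assert np.all(y * scores > 0), "not a separating hyperplane"
    return np.min(np.abs(scores)) / np.linalg.norm(w)

# Hypothetical toy data: two separable points in 2D
X = np.array([[2.0, 2.0], [-2.0, -2.0]])
y = np.array([+1.0, -1.0])
w, b = np.array([0.5, 0.5]), 0.0

print(predict(w, b, X))    # [ 1. -1.]
print(margin(w, b, X, y))  # 2 / ||w|| = 2 / sqrt(0.5) ≈ 2.83
```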

Support Vector Machine: The Optimization Problem

We want to maximize the margin γ = 1 / ||w||

Maximizing the margin γ is equivalent to minimizing ||w|| (the norm), or equivalently ||w||^2 / 2, which is analytically more convenient

Our optimization problem is therefore:

Minimize f(w, b) = ||w||^2 / 2
subject to y_n (w^T x_n + b) ≥ 1, n = 1, ..., N

This is a Quadratic Program (QP) with N linear inequality constraints
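As an illustration of how one might solve this QP off the shelf, here is a minimal sketch using cvxpy, a generic convex-optimization library; the toy dataset below is hypothetical. Dedicated SVM solvers are far more efficient; solving the problem properly is the topic of the next class.

```python
import numpy as np
import cvxpy as cp

# Hypothetical linearly separable toy data
X = np.array([[2.0, 2.0], [2.0, 0.0], [-2.0, -2.0], [0.0, -2.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])

N, D = X.shape
w = cp.Variable(D)
b = cp.Variable()

# minimize ||w||^2 / 2  subject to  y_n (w^T x_n + b) >= 1 for all n
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("margin = 1/||w|| =", 1.0 / np.linalg.norm(w.value))
```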


Large Margin = Good Generalization

Large margins intuitively mean good generalization

We can give a slightly more formal justification for this:

Recall: margin γ = 1 / ||w||

Large margin ⇒ small ||w||

Small ||w|| ⇒ regularized/simple solutions (the individual weights w_i don't become too large)

Simple solutions ⇒ good generalization on test data
Next class...

Solving the SVM optimization problem

Introduction to kernel methods (nonlinear SVMs)
