Support Vector Machine (SVM)

Probably the most popular/influential classification algorithm

Backed by solid theoretical groundings (Cortes and Vapnik, 1995)

A hyperplane-based classifier (like the Perceptron)

Additionally uses the Maximum Margin Principle: finds the hyperplane with the maximum separation margin on the training data
Support Vector Machine

A hyperplane-based linear classifier defined by w and b

Prediction rule: y = sign(w^T x + b)

Given: training data {(x_1, y_1), ..., (x_N, y_N)}

Goal: learn w and b that achieve the maximum margin

For now, assume the entire training data is correctly classified by (w, b)

Zero loss on the training examples (the non-zero loss case comes later)

Assume w and b are scaled such that

w^T x_n + b ≥ 1 for y_n = +1
w^T x_n + b ≤ −1 for y_n = −1

Equivalently, y_n (w^T x_n + b) ≥ 1 ⇒ min_{1≤n≤N} |w^T x_n + b| = 1
The hyperplane's margin is its distance to the closest training example (the distance of a point x from the hyperplane w^T x + b = 0 is |w^T x + b| / ||w||):

γ = min_{1≤n≤N} |w^T x_n + b| / ||w|| = 1 / ||w||
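To make the prediction rule and this margin computation concrete, here is a minimal NumPy sketch; the toy data and the particular (w, b) below are hypothetical, chosen only for illustration.

```python
import numpy as np

def predict(w, b, X):
    # Hyperplane classifier: y = sign(w^T x + b) for each row x of X
    return np.sign(X @ w + b)

def margin(w, b, X, y):
    # Geometric margin of (w, b) on (X, y): min_n |w^T x_n + b| / ||w||,
    # defined only when every example is correctly classified
    scores = X @ w + b
    assert np.all(y * scores > 0), "not a separating hyperplane"
    return np.min(np.abs(scores)) / np.linalg.norm(w)

# Hypothetical toy data: two separable points in 2D
X = np.array([[2.0, 2.0], [-2.0, -2.0]])
y = np.array([+1.0, -1.0])
w, b = np.array([0.5, 0.5]), 0.0

print(predict(w, b, X))    # [ 1. -1.]
print(margin(w, b, X, y))  # 2 / ||w|| = 2 / sqrt(0.5) ≈ 2.83
```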

Support Vector Machine: The Optimization Problem

We want to maximize the margin γ = 1 / ||w||

Maximizing the margin γ is equivalent to minimizing ||w|| (the norm), or equivalently ||w||^2 / 2, which is analytically more convenient

Our optimization problem is therefore:

Minimize f(w, b) = ||w||^2 / 2
subject to y_n (w^T x_n + b) ≥ 1, n = 1, ..., N

This is a Quadratic Program (QP) with N linear inequality constraints
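As an illustration of how one might solve this QP off the shelf, here is a minimal sketch using cvxpy, a generic convex-optimization library; the toy dataset below is hypothetical. Dedicated SVM solvers are far more efficient; solving the problem properly is the topic of the next class.

```python
import numpy as np
import cvxpy as cp

# Hypothetical linearly separable toy data
X = np.array([[2.0, 2.0], [2.0, 0.0], [-2.0, -2.0], [0.0, -2.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])

N, D = X.shape
w = cp.Variable(D)
b = cp.Variable()

# minimize ||w||^2 / 2  subject to  y_n (w^T x_n + b) >= 1 for all n
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("margin = 1/||w|| =", 1.0 / np.linalg.norm(w.value))
```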


Large Margin = Good Generalization

Large margins intuitively mean good generalization

We can give a slightly more formal justification for this:

Recall: margin γ = 1 / ||w||

Large margin ⇒ small ||w||

Small ||w|| ⇒ regularized/simple solutions (the individual weights w_i don't become too large)

Simple solutions ⇒ good generalization on test data
Next class...

Solving the SVM optimization problem

Introduction to kernel methods (nonlinear SVMs)
