
SUPPORT VECTOR MACHINE

The SVM approach is as follows (the SVM algorithm):

1. Starting with the assumption of linearly separable data, explain what a hyperplane is
and how it relates to the classification problem.

2. Introduce the concept of the maximum margin hyperplane and the intuition behind why
it is the optimal classifier.

3. Using simple math (algebra + analytical geometry), develop the mathematical
formulation of the Hard-Margin SVM.

4. Introduce noisy non-linearly separable data and the intuition behind using slack
variables.

5. Extend the mathematical formulation from (3) to include the slack variables.

6. Conclude with the intuition behind the kernel trick.

SVM is a method that allows us to find the MAXIMUM MARGIN HYPERPLANE.

SVM is suitable for both linearly and non-linearly separable data and patterns.

Let us focus on the first case (Linear)

How can you use your data to predict whether some person will or will not buy an iPad, given two
measurements?

1) Draw a line such that all people who bought an iPad are above this line, and people who didn't
are below it.

2) Now look at the measurements of the person whose decision you are trying to predict. If the point
representing this person is above the line, you predict yes; if not, no.

You have probably already noticed a problem with this: there is an infinite number of possible lines. Which one
is the best?

Intuitively, the best line is the one that is farthest away from both the positive and the negative examples (has the
largest margin).

This line is called the maximum margin line (or, since we are usually working with multidimensional
data, the maximum margin hyperplane, MMH).

SVM is a method that allows us to find this hyperplane.

Now, from middle school geometry classes you probably remember that a line equation can be written as

w0 + w1x1 + w2x2 = 0

So all of the points above this line will satisfy

w0 + w1x1 + w2x2 > 0

Similarly, all of the points below this line will satisfy

w0 + w1x1 + w2x2 < 0
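For concreteness, here is the sign rule as a tiny Python sketch; the weights below are made-up illustrative numbers, not the output of any training:

```python
# Minimal sketch of the sign rule; the weights are hypothetical
# illustrative values, not the result of training anything.
w0, w1, w2 = -3.0, 0.5, 1.0   # made-up hyperplane parameters

def side_of_line(x1, x2):
    """Return +1 if the point is above the line, -1 if below."""
    value = w0 + w1 * x1 + w2 * x2
    return 1 if value > 0 else -1

print(side_of_line(4.0, 2.0))   # 0.5*4 + 1*2 - 3 =  1.0 > 0  ->  1
print(side_of_line(1.0, 1.0))   # 0.5*1 + 1*1 - 3 = -1.5 < 0  -> -1
```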
But remember we want to have the largest margin as well.

So we need to define the sides of the margin (the dashed lines flanking the separating line):

w0 + w1x1 + w2x2 >= 1 for every positive example (yi = +1)

w0 + w1x1 + w2x2 <= -1 for every negative example (yi = -1)

(For all who have been wondering where the 1 and -1 came from: we can in fact write them
as such because w0, w1, w2 can be re-scaled; I am not going to go into the details as to how.)

We can combine these two inequalities into one, getting:

yi (w0 + w1x1 + w2x2) >= 1

Data points that lie exactly on the margins (so we have an equality instead of an inequality)
are called support vectors. Intuitively, they are the hardest examples in the data set
to classify.

How can we find these vectors and the MMH using this inequality?

You can rewrite this inequality as a constrained quadratic optimization problem, and there are
standard methods to solve it (again, I am not including how to do that; this is the beginner
answer).
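As a hedged sketch of what those "standard methods" look like in practice, scikit-learn's SVC wraps such a quadratic-programming solver; the toy data below is invented for illustration, and a very large C approximates the hard-margin case:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 2-D toy data: two linearly separable groups
# (x1, x2 stand in for the two measurements from the iPad example).
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.0],    # class -1 ("no")
              [4.0, 4.0], [4.5, 3.5], [5.0, 4.0]])   # class +1 ("yes")
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("w1, w2 =", clf.coef_[0])     # the weights in the line equation
print("w0     =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)
```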

Based on the Lagrangian formulation (the one you have been using to re-write your inequality
as an optimization problem), your decision boundary (line) can be re-written as:

d(x) = Σ (i = 1..l) yi αi (xi · x) + b

where the xi are the support vectors.

The alphas and b are parameters that are connected to w in the original formulation; you determine
them by solving the optimization problem.

l is the number of support vectors; they too are determined during training.

To predict whether a certain person will buy an iPad using this boundary, you just plug his/her
measurements into the equation and check the sign of the result (you are checking whether they are
above or below the class-separating line).

So what's great about support vectors?

You can calculate the answer using only the support vectors, meaning you can discard your
training set once training is done: there is no need to keep all of the data to make the prediction.
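A minimal sketch of that point, again on invented toy data: a fitted scikit-learn SVC keeps only the support vectors and their coefficients, and the Lagrangian decision function above can be evaluated by hand from those alone:

```python
import numpy as np
from sklearn.svm import SVC

# Invented toy data, as before.
X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

x_new = np.array([3.0, 2.0])  # the person whose decision we predict

# dual_coef_ stores alpha_i * y_i for each support vector, so this is
# exactly  sum_i  y_i alpha_i (x_i . x) + b  from the Lagrangian form.
score = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_new) + clf.intercept_[0]
print(np.sign(score))                  # +1 -> "buys", -1 -> "doesn't"
print(clf.decision_function([x_new]))  # same number from the library
```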

But what if my data looks like the non-linear case?

Then your data is not linearly separable in two dimensions, meaning you cannot find a line that
would separate your examples perfectly. You will have to send your data into a higher-dimensional
space where it is separable, and then you are back to the linear case in more than two
dimensions.
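A minimal made-up illustration of such a lift: points inside a circle versus points outside it cannot be split by a line in two dimensions, but adding the squared distance from the origin as a third coordinate makes a separating plane obvious:

```python
import numpy as np

# Made-up example: one class inside a circle, the other outside.
# No straight line in 2-D separates them.
inner = np.array([[0.5, 0.0], [0.0, 0.5], [-0.5, 0.0], [0.0, -0.5]])
outer = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]])

def lift(points):
    """Map (x1, x2) -> (x1, x2, x1^2 + x2^2): add the squared
    distance from the origin as a third coordinate."""
    return np.column_stack([points, (points ** 2).sum(axis=1)])

# In 3-D the third coordinate alone separates the classes:
print(lift(inner)[:, 2])  # [0.25 0.25 0.25 0.25] -- all below 1
print(lift(outer)[:, 2])  # [4. 4. 4. 4.]         -- all above 1
```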

You have to choose a function that will transform your two-dimensional data points into higher-
dimensional ones. You don't have to do this explicitly: you can formulate your training and
testing objectives in terms of kernels (the kernel trick). A kernel is a function representing the dot
product of two transformed inputs.
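For example (my own illustrative numbers, not from the original answer), the quadratic kernel K(a, b) = (a · b)^2 is exactly the dot product of an explicit degree-2 feature map, computed without ever building the transformed vectors:

```python
import numpy as np

def phi(x):
    """Explicit feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def quad_kernel(a, b):
    """Quadratic kernel: the dot product in the transformed space,
    computed without ever forming phi(a) or phi(b)."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 1.0])

print(np.dot(phi(a), phi(b)))  # 25.0 -- explicit transformation
print(quad_kernel(a, b))       # 25.0 -- kernel trick, same value
```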

Picking a kernel function, and checking whether a certain function is a valid kernel, are beyond
the scope of this answer.
