Linear Classification: Maximizing the Margin

Yufei Tao
Department of Computer Science and Engineering
Chinese University of Hong Kong


Consider that we are given a linearly separable dataset P in R^d. In other words, P has points of two colors, red and blue, and we can find a plane c1 x1 + c2 x2 + ... + cd xd = 0 such that, given a point p(x1, ..., xd) in P:

if p is red, then c1 x1 + c2 x2 + ... + cd xd > 0;
if p is blue, then c1 x1 + c2 x2 + ... + cd xd < 0.

There can be many such separation planes. So far we have been satisfied with finding any of them. In this lecture, we will introduce a metric to measure the quality of a plane, and then discuss how to find the best plane according to the metric.
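To make the definition concrete, here is a minimal sketch in Python with numpy (the function name and the data layout are illustrative, not from the lecture) that checks whether a candidate plane separates a labeled point set:

```python
import numpy as np

def separates(c, points, labels):
    """Return True if the plane c1*x1 + ... + cd*xd = 0 separates the points.

    c      : shape (d,) array holding the coefficients c1, ..., cd
    points : shape (n, d) array, one point per row
    labels : shape (n,) array, +1 for red points and -1 for blue points
    """
    side = points @ c   # value of c1*x1 + ... + cd*xd for every point
    # red points must lie strictly on the positive side, blue strictly on the negative side
    return bool(np.all(labels * side > 0))
```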


Given a separation plane with respect to P, we define its margin to be its smallest distance to the points in P.
Example 1. [Figure: two candidate separation planes for the same dataset, each shown with its margin.]

Which plane would you choose?


Definition 2.
Let P be a linearly separable dataset in R^d. The goal of the large margin separation problem is to find a separation plane with the maximum margin.

[Figure: a separation plane with its margin.]

Any algorithm solving this problem is called a support vector machine.
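To accompany Definition 2, the sketch below (again Python with numpy, illustrative names) computes the margin of a given separation plane as defined on the previous slide: the smallest distance from the plane to any point of P.

```python
import numpy as np

def margin_of(c, points):
    """Margin of the plane c·x = 0 with respect to the dataset.

    The distance from a point p to the plane is |c·p| / |c|; the margin is the
    minimum of this distance over all points in P.
    """
    return float(np.min(np.abs(points @ c)) / np.linalg.norm(c))
```

For example, margin_of(np.array([3.0, 4.0]), points) returns the smallest |3·x1 + 4·x2| / 5 over the dataset; a support vector machine looks for the coefficients that make this value as large as possible.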


Next, we will discuss two methods to approach the problem. The first one gives the optimal solution, but is quite complicated and (often) computationally expensive. The second method, on the other hand, is much simpler and (often) much faster, but gives an approximate solution that is nearly optimal.


Finding the Optimal Plane


We will model the problem as a quadratic programming problem. Towards that purpose, let us do some analysis. Consider an arbitrary separation plane c1 x1 + c2 x2 + ... + cd xd = 0. Imagine that there are two copies of the plane. One person moves a copy up, while another person moves the other copy down, both at the same speed. They stop as soon as one of the two copies hits a point in P.

[Figure: a separation plane, its two moving copies, and the margin.]


Now, focus on the two copies of the plane in their final positions. If one copy has equation c1 x1 + c2 x2 + ... + cd xd = c_{d+1}, then the other copy must have equation c1 x1 + c2 x2 + ... + cd xd = -c_{d+1}. Here c_{d+1} is a strictly positive value. Let p(x1, ..., xd) be a point in P. We must have (think: why?):

if p is red, then c1 x1 + c2 x2 + ... + cd xd ≥ c_{d+1};
if p is blue, then c1 x1 + c2 x2 + ... + cd xd ≤ -c_{d+1}.

By dividing both sides of each inequality by c_{d+1}, we have:

if p is red, then w1 x1 + w2 x2 + ... + wd xd ≥ 1;
if p is blue, then w1 x1 + w2 x2 + ... + wd xd ≤ -1,

where wi = ci / c_{d+1}.
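The rescaling just described is mechanical; a small sketch is shown below (Python with numpy, illustrative names; c is assumed to be any separating plane). The stopping value c_{d+1} of the two moving copies is the smallest |c·p| over the dataset, and dividing by it yields the normalized coefficients w.

```python
import numpy as np

def normalize_plane(c, points):
    """Rescale a separating plane so that w·p >= 1 for red points and
    w·p <= -1 for blue points, as derived above."""
    c_d1 = np.min(np.abs(points @ c))   # strictly positive since c separates P
    return c / c_d1                     # wi = ci / c_{d+1}
```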

We will refer to the following plane as π1:

w1 x1 + w2 x2 + ... + wd xd = 1

and to the following plane as π2:

w1 x1 + w2 x2 + ... + wd xd = -1.

The margin of the original separation plane is exactly half of the distance between π1 and π2.

[Figure: the original separation plane together with π1 and π2; the margin is half of their distance.]


Lemma 3.
The distance between π1 and π2 is 2 / √(w1² + w2² + ... + wd²).

To prove the lemma, we will introduce the vector w = [w1, w2, ..., wd]. Also, given a point p(x1, x2, ..., xd), we define the vector p = [x1, x2, ..., xd]. Thus, the equation of π1 is w · p = 1, and that of π2 is w · p = -1. Note that |w| = √(w1² + w2² + ... + wd²).


Proof of Lemma 3.
Take an arbitrary point p1 on π1, and an arbitrary point p2 on π2. Hence, w · p1 = 1 and w · p2 = -1. It follows that w · (p1 - p2) = 2.

[Figure: the planes π1 and π2, the normal vector w, and the points p1 and p2.]

The distance between the two planes is precisely (w / |w|) · (p1 - p2) = 2 / |w|.

In summary of the above, the large margin separation problem boils down to the following: we want to find w1, ..., wd to minimize w1² + w2² + ... + wd² (i.e., to maximize 2 / √(w1² + ... + wd²)) subject to the following constraints:

For each red point p(x1, ..., xd) in P:
w1 x1 + w2 x2 + ... + wd xd ≥ 1.

For each blue point p(x1, ..., xd) in P:
w1 x1 + w2 x2 + ... + wd xd ≤ -1.

This is an instance of quadratic programming.


We will not go into the details of quadratic programming, except to mention:

The instance of quadratic programming on the previous slide is polynomial-time solvable. The algorithm, however, is exceedingly complex and falls outside the scope of this course.

There are standard packages online that will allow you to solve the instance within a reasonable amount of time, provided that the input set P is not too large.
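As one concrete option (a minimal sketch, not part of the lecture), the instance can be handed to an off-the-shelf convex solver such as cvxpy; the names and the data layout below are illustrative.

```python
import cvxpy as cp
import numpy as np

def optimal_plane(points, labels):
    """Solve the quadratic program of the previous slide: minimize the sum of
    wi^2 subject to w·p >= 1 for red points (+1) and w·p <= -1 for blue (-1)."""
    d = points.shape[1]
    w = cp.Variable(d)
    # labels * (points @ w) >= 1 encodes both families of constraints at once
    constraints = [cp.multiply(labels, points @ w) >= 1]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    return w.value
```

By Lemma 3, the plane found this way has margin 1/|w|, which is exactly what minimizing the sum of squares maximizes.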


Finding an Approximate Plane


Since quadratic programming is complex and expensive, next we trade precision for simplicity (and usually, also efficiency). Let γ_opt be the margin of the optimal solution to the large margin separation problem. We say that a separation plane is c-approximate if its margin is at least γ_opt / c.

We will give a simple algorithm to find a 4-approximate separation plane. Here, the constant 4 is chosen to simplify our presentation. In fact, our algorithm can be easily extended to obtain a (1 + ε)-approximate separation plane for any arbitrarily small ε > 0.


Let us first assume that we know a value γ satisfying γ ≤ γ_opt (we will clarify how to find such a γ later). Recall that a separation plane has an equation of the form c1 x1 + c2 x2 + ... + cd xd = 0. Define the vector c = [c1, c2, ..., cd], and refer to the plane as the plane determined by c. The goal is to find a good c.

Our weapon is once again Perceptron. The difference from before is that we will now correct our c not only when a point falls on the wrong side of the plane determined by c, but also when the point is too close to the plane. Specifically, we say that a point p causes a violation in any of the following situations:

its distance to the plane determined by c is less than or equal to γ/2, regardless of the color;
p is red but c · p < 0;
p is blue but c · p > 0.


Margin Perceptron
The algorithm starts with c = [0, 0, ..., 0], and then runs in iterations. In each iteration, it simply checks whether any point p ∈ P causes a violation. If so, the algorithm adjusts c as follows:

If p is red, then c ← c + p.
If p is blue, then c ← c - p.

As soon as c has been adjusted, the current iteration finishes, and a new iteration starts. The algorithm finishes if no point causes any violation in the current iteration.
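Putting the violation test of the previous slide and the update rule together, a minimal sketch of Margin Perceptron could look as follows (Python with numpy; the names, and the optional iteration cap that the incremental algorithm later uses to stop the procedure manually, are illustrative additions).

```python
import numpy as np

def margin_perceptron(points, labels, gamma, max_iters=None):
    """Margin Perceptron sketch.

    points : (n, d) array; labels: +1 for red, -1 for blue.
    gamma  : the parameter, assumed to satisfy gamma <= gamma_opt.
    Returns the vector c of the final plane, or None if max_iters iterations
    elapse without natural termination (a "manual" stop).
    """
    d = points.shape[1]
    c = np.zeros(d)
    iterations = 0
    while max_iters is None or iterations < max_iters:
        violator = None
        for p, y in zip(points, labels):
            side = float(np.dot(c, p))
            norm = float(np.linalg.norm(c))
            dist = abs(side) / norm if norm > 0 else 0.0
            # violation: the point is within distance gamma/2 of the plane,
            # or it lies on the wrong side
            if dist <= gamma / 2 or y * side < 0:
                violator, sign = p, y
                break
        if violator is None:
            return c                    # no violation in this iteration: done
        c = c + sign * violator         # c <- c + p (red) or c <- c - p (blue)
        iterations += 1
    return None
```

With gamma ≤ γ_opt, Theorem 4 below guarantees that the loop exits on its own within 1 + 8R²/γ_opt² iterations and that the returned plane has margin at least gamma/2.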


Define:

R = max{ |p| : p ∈ P }.

Namely, R is the maximum distance from the origin to the points in P.

Theorem 4.
Margin Perceptron always terminates in at most 1 + 8R²/γ_opt² iterations. At termination, it returns a separation plane with margin at least γ/2.

The proof can be found in the appendix.


Margin Perceptron requires a parameter γ ≤ γ_opt. Theorem 4 tells us that the larger γ is, the better the quality guarantee is for the returned plane. Ideally, we would like to set γ = γ_opt, which would give us a 2-approximate plane. Unfortunately, we do not know the value of γ_opt.

Next, we present a strategy that allows us to obtain an estimate of γ_opt that is off by at most a factor of 4. The strategy is to start with a low value of γ, and then gradually increase it until we are sure that it is already greater than γ_opt.


An Incremental Algorithm
1. Compute an arbitrary separation plane (obtained by, e.g., running the old Perceptron algorithm or linear programming). Let γ0 be the margin of this plane.
2. R ← the maximum distance from the origin to the points in P.
3. γ1 ← 4·γ0; i ← 1.
4. Run Margin Perceptron with parameter γi. If the algorithm does not terminate after 1 + 8R²/γi² iterations, stop it manually. In this case, we return the current plane as our final answer.
5. Update the current plane to the plane returned by the algorithm. Let γ′ be its margin.
6. γ_{i+1} ← 4·γ′.
7. i ← i + 1; go to Line 4.
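A sketch of the incremental algorithm, assuming the margin_perceptron function from the earlier sketch and an initial separating plane c0 (e.g., produced by the ordinary Perceptron algorithm); all names are illustrative.

```python
import numpy as np

def incremental_margin_perceptron(points, labels, c0):
    """Incremental algorithm sketch: returns a plane whose margin is at least
    a quarter of the optimal margin (Theorem 7). Relies on margin_perceptron
    from the earlier sketch, which returns None on a manual stop."""
    def margin_of(c):
        return float(np.min(np.abs(points @ c)) / np.linalg.norm(c))

    current = c0                                          # Line 1
    R = float(np.max(np.linalg.norm(points, axis=1)))     # Line 2
    gamma = 4 * margin_of(current)                        # Line 3: gamma_1 = 4 * gamma_0
    while True:
        cap = int(np.ceil(1 + 8 * R**2 / gamma**2))       # Line 4: manual-stop threshold
        c = margin_perceptron(points, labels, gamma, max_iters=cap)
        if c is None:
            return current        # manual termination: return the current plane
        current = c                                       # Line 5
        gamma = 4 * margin_of(current)                    # Line 6
        # Line 7: repeat
```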

Lemma 5.
Consider the i-th call to Margin Perceptron. If manual termination occurs at Line 4, then γi > γ_opt. Otherwise, γ_{i+1} ≥ 2·γi.

Proof.
First consider the case of manual termination. If γi ≤ γ_opt, then by Theorem 4, we know that Margin Perceptron should have terminated in at most 1 + 8R²/γ_opt² ≤ 1 + 8R²/γi² iterations. Contradiction.

Now consider that Margin Perceptron terminated normally. Then, by Theorem 4, γ′ (the quality of the plane returned) must be at least γi/2. Hence, γ_{i+1} = 4·γ′ ≥ 2·γi.


Corollary 6.
The algorithm terminates after at most 1 + log2(γ_opt / γ0) calls to Margin Perceptron.

Proof.
From the previous lemma, we know that γi ≥ 2^i · γ0. Suppose that the algorithm makes k calls to Margin Perceptron. Then, γ_opt ≥ γ_{k-1} ≥ 2^{k-1} · γ0. Solving for k in the inequality gives the corollary.


Theorem 7.
Our incremental algorithm returns a separation plane with margin at least γ_opt / 4.

Proof.
Suppose that the algorithm terminates after k calls to Margin Perceptron. Hence, by Lemma 5:

γk > γ_opt.

The plane we return eventually has quality γk / 4, which is at least γ_opt / 4.

Appendix


Proof of Theorem 4.
Let u · x = 0 be the optimal separation plane with margin γ_opt. Without loss of generality, suppose that |u| = 1. Hence:

γ_opt = min{ |p · u| : p ∈ P }.

Recall that the perceptron algorithm adjusts c in each iteration. Let ci (i ≥ 1) be the c after the i-th iteration. Also, let c0 = [0, ..., 0] be the initial c before the first iteration. Finally, let k be the total number of adjustments.


Proof (cont.).
We claim that, for any i ≥ 0, c_{i+1} · u ≥ ci · u + γ_opt. Due to symmetry, we prove this only for the case where c_{i+1} was adjusted from ci because of the violation of a red point p.

In this case, c_{i+1} = ci + p and, hence, c_{i+1} · u = ci · u + p · u. From the definition of γ_opt, we know that p · u ≥ γ_opt. Therefore, c_{i+1} · u ≥ ci · u + γ_opt.

It follows that

|ck| ≥ ck · u ≥ k · γ_opt.    (1)

Proof (cont.).
We also claim that, for any i ≥ 0, |c_{i+1}| ≤ |ci| + R²/(2|ci|) + γ_opt/2. Due to symmetry, we will prove this only for the case where c_{i+1} was adjusted from ci due to the violation of a red point p.

[Figure: the vector ci drawn from the origin O, and the decomposition of p into components p1 and p2.]

As shown above, p = p1 + p2, where p1 is perpendicular to ci, and p2 is parallel to ci (and hence, perpendicular to the plane determined by ci). Therefore, c_{i+1} = ci + p = ci + p1 + p2. The claim is true because:

By the definition of violation, p2 either points in the direction opposite to ci or has norm |p2| ≤ γ/2 ≤ γ_opt/2.

Notice that |p1| ≤ |p| ≤ R. Hence, |ci + p1|² = |ci|² + |p1|² ≤ |ci|² + R² ≤ (|ci| + R²/(2|ci|))². It thus follows that |ci + p1| ≤ |ci| + R²/(2|ci|).

Proof (cont.).
The claim on the previous slide implies that, when |ci| ≥ 2R²/γ_opt, we have |c_{i+1}| ≤ |ci| + (3/4)·γ_opt. Therefore:

|ck| ≤ 2R²/γ_opt + (3k/4)·γ_opt.

Combining the above with (1) gives:

k · γ_opt ≤ 2R²/γ_opt + (3k/4)·γ_opt,

which solves to k ≤ 8R²/γ_opt². Since the final iteration makes no adjustment, the total number of iterations is at most 1 + 8R²/γ_opt², as claimed in Theorem 4.