Support vector machines for pattern recognition
based on a paper by J. C. Burges
Anirban Sanyal
1 Introduction
The main topics of my discussion can be broadly classified into the following points:
1. a brief history of SVM
2. separating hyperplane theory
3. applications of SVM
3 Objective
Since this is a supervised classification problem, we have one training set and one test set of observations. We have to fit a boundary on the training set, and this boundary is then applied to the test set to classify the test observations.
Here we have the training set $(x_n, y_n)$, where $y_n \in \{-1, 1\}$ and $x_n \in \mathbb{R}^p$. We call the observations with $y_n = 1$ the positive observations and those with $y_n = -1$ the negative observations.
Let us now consider the following three situations:
1. separable classes
2. non-separable classes
3. non-linear boundary
We will develop the SVM algorithm for situation 1 and then modify it for the subsequent cases.
4 completely separable classes
as we can see from the picture that the two clases are completely sep-
arable andf hence there can be a number of different boundaries(or
hyperplane) that can separate these two classes,as we can see from
the picture(2)
2
5 Derivation of the optimal hyperplane
For a support vector $x$, the perpendicular distance from the hyperplane is given by $|w' x + a| / \|w\|$, i.e. $1/\|w\|$, since $|w' x + a| = 1$ for a support vector. Since for the optimal hyperplane this margin (the width of the band between the classes) should be maximal, we have to minimize $\|w\|$.
Thus our objective is to minimize $\|w\|$ subject to the condition that $y_n (w' x_n + a) \ge 1$ for all $n$.
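Written out in full (and noting that minimizing $\|w\|$ is equivalent to minimizing $\tfrac{1}{2}\|w\|^2$, which is the form used in the Lagrangian below), the primal problem is

\[
\min_{w,\,a}\ \tfrac{1}{2}\,\|w\|^{2}
\quad \text{subject to} \quad
y_n\,(w' x_n + a) \ge 1, \qquad n = 1, \dots, N.
\]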
6 Optimisation problem
We form the Lagrangian as
\[
L = \tfrac{1}{2}\,\|w\|^2 - \sum_i \lambda_i \left( y_i \,(w' x_i + a) - 1 \right),
\]
which we have to minimize over $w$ and $a$. Setting the derivative with respect to $w$ to zero gives
\[
w = \sum_i \lambda_i \, y_i \, x_i,
\]
where the sum is taken over all the support vectors. We put this value of $w$ back into $L$, and the dual of the problem reduces to
\[
L = \sum_i \lambda_i - \tfrac{1}{2} \sum_{i,j} \lambda_i \lambda_j \, y_i y_j \, x_i' x_j,
\]
to be maximised over $\lambda_i \ge 0$.
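To make the dual representation concrete, here is a minimal sketch (not part of the original discussion; the toy data set, the very large value of C used to approximate the hard-margin case, and the use of scikit-learn's SVC as the solver are all assumptions) that checks $w = \sum_i \lambda_i y_i x_i$ against the fitted hyperplane:

import numpy as np
from sklearn.svm import SVC

# two linearly separable clusters in R^2 (hypothetical toy data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# a very large C approximates the completely separable (hard-margin) case
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds lambda_i * y_i for the support vectors only,
# so w = sum_i (lambda_i * y_i) * x_i taken over the support vectors
w = clf.dual_coef_ @ clf.support_vectors_
print("w from dual multipliers:", w)
print("w from sklearn's coef_ :", clf.coef_)   # should match
print("number of support vectors:", len(clf.support_))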
7 Overlapping classes
Now let us extend this concept to overlapping classes. Clearly we cannot carry the previous construction over to overlapping classes directly, so we make a small modification: we introduce slack variables $\eta_i$ with $\eta_i \ge 0$, so that
\[
x_i' w + a \ge 1 - \eta_i \ \text{ for } y_i = +1, \qquad (1)
\]
\[
x_i' w + a \le \eta_i - 1 \ \text{ for } y_i = -1. \qquad (2)
\]
Thus for a misclassified point $\eta_i \ge 1$, and hence $\sum_i \eta_i$ gives an upper bound on the number of training errors.
8 Optimisation problem
The objective now is to minimize $\tfrac{1}{2}\|w\|^2 + C \sum_i \eta_i$, where $C$ is a user-chosen penalty for margin violations. Here the Lagrangian would be formed as before, with extra multipliers for the constraints $\eta_i \ge 0$; the dual comes out the same as in the separable case, except that the multipliers are now bounded, $0 \le \lambda_i \le C$.
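As an illustration (a minimal sketch; the overlapping Gaussian toy data and the use of scikit-learn's SVC are assumptions, not part of the text), the penalty $C$ trades margin width against slack: a small $C$ tolerates more margin violations, a large $C$ penalizes them heavily.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1.0, (50, 2)), rng.normal(1, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)   # overlapping classes

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # any point with slack (eta_i > 0) must have its multiplier at the
    # upper bound lambda_i = C, so small C typically means many support vectors
    print(f"C={C:>6}: {len(clf.support_)} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")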
9 Non-linear boundaries
But there can still be situations with boundaries that cannot be approximated by linear boundaries. In those cases we transform the data into a new data set in which the boundary is linear or approximately linear. So we transform the data $x_n$ to $z_n$ such that $z_n = \varphi(x_n)$, where $\varphi$ is the required transformation, that is, $\varphi : \mathbb{R}^p \to H$, where $p \le \dim(H)$. Here $H$ may be a Hilbert space or a Euclidean space; for the sake of simplicity, let us consider $H$ to be a Euclidean space.
So here $w = \sum_i \lambda_i y_i \varphi(x_i)$ and the optimal hyperplane is
\[
\sum_i \lambda_i \, y_i \, \varphi(x_i)' \varphi(x) + a = 0. \qquad (3)
\]
The difficulty lies in finding the actual transformation. For this instance, let us suppose that there exists a kernel that can express the inner product between two transformed vectors; then we can easily use that kernel to express the optimal hyperplane in (3). Let us call this kernel $K(\cdot, \cdot)$. Then the optimal hyperplane is given by
\[
\sum_i \lambda_i \, y_i \, K(x_i, x) + a = 0.
\]
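To see this dual kernel form at work, here is a minimal sketch (assumptions: toy data with a circular boundary, scikit-learn's SVC with a degree-2 polynomial kernel) that rebuilds the decision function $\sum_i \lambda_i y_i K(x_i, x) + a$ by hand from the support vectors and checks it against the library's output:

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(2)
X = rng.normal(0, 1, (60, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)   # circular boundary

clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0).fit(X, y)

# rebuild the decision function from the dual representation:
# sum_i lambda_i * y_i * K(x_i, x) + a, summed over support vectors only
x_new = rng.normal(0, 1, (5, 2))
K = polynomial_kernel(clf.support_vectors_, x_new,
                      degree=2, gamma=1.0, coef0=1.0)
by_hand = (clf.dual_coef_ @ K + clf.intercept_).ravel()
print(np.allclose(by_hand, clf.decision_function(x_new)))   # True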
10 Existence of the kernel
10.1 Mercer's condition
There exist a kernel $K$ and a mapping $\varphi$ such that $K(x, y) = \varphi(x)' \varphi(y)$ if and only if, for any $g(x)$ such that
\[
\int g^2(x)\, dx < \infty,
\]
we have
\[
\int K(x, y)\, g(x)\, g(y)\, dx\, dy \ge 0.
\]
Such kernels $K(\cdot, \cdot)$ can be expressed as
\[
K(x, y) = \sum_p c_p \, (x \cdot y)^p,
\]
where the $c_p$ are positive constants and the series is uniformly convergent.
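As a concrete check (a standard textbook example, not worked out above): for $p = 2$ the homogeneous quadratic kernel admits an explicit map,
\[
K(x, y) = (x \cdot y)^2, \qquad
\varphi(x) = \left( x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2 \right),
\]
\[
(x_1 y_1 + x_2 y_2)^2
= x_1^2 y_1^2 + 2\, x_1 x_2 \, y_1 y_2 + x_2^2 y_2^2
= \varphi(x) \cdot \varphi(y).
\]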
useful.
3. Even if the boundary is non-linear, we do not have to think about the appropriate transformation, because once we have the kernel we can easily compute the optimal hyperplane.
13 Disadvantages of SVM
1. If the dimension of the data is too large, then the computation of the SVM is quite expensive.
2. Also, in some situations the appropriate kernel cannot be found easily.
14 References
1. C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2), 1998.
2. C. Cortes and V. Vapnik, Support-Vector Networks, Machine Learning, 20(3), 1995.