Stéphane Canu
stephane.canu@litislab.eu
The non-separable case: a bi-criteria optimization problem
Modeling potential errors: introducing slack variables $\xi_i$
For each data point $(x_i, y_i)$:
no error: $y_i(w^\top x_i + b) \ge 1 \Rightarrow \xi_i = 0$
error: $\xi_i = 1 - y_i(w^\top x_i + b) > 0$
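The two cases above collapse into a single expression, $\xi_i = \max(0,\, 1 - y_i(w^\top x_i + b))$. A minimal sketch of that computation follows; the helper name `slack` and the toy values of `w`, `b` and the points are hypothetical, chosen only for illustration.

```python
def slack(w, b, x, y):
    """Slack xi_i = max(0, 1 - y_i (w^T x_i + b)) for one labelled point."""
    margin = y * (sum(wk * xk for wk, xk in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

# Hypothetical 2-D example: w, b and the points are made up for illustration.
w, b = [1.0, 0.0], 0.0
points = [
    ([2.0, 1.0], +1),  # y f(x) = 2    -> no error
    ([0.5, 0.0], +1),  # y f(x) = 0.5  -> inside the margin
    ([1.0, 3.0], -1),  # y f(x) = -1   -> misclassified
]
slacks = [slack(w, b, x, y) for x, y in points]
print(slacks)  # [0.0, 0.5, 2.0]
```

A slack between 0 and 1 means the point is on the correct side but inside the margin; a slack above 1 means it is misclassified.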
$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2$$
$$\min_{w,b,\xi} \; \frac{C}{p}\sum_{i=1}^{n} \xi_i^p$$
with $y_i(w^\top x_i + b) \ge 1 - \xi_i$
and $\xi_i \ge 0$, $i = 1, \dots, n$
[Figure: the admissible set plotted in the plane of the two criteria, $P(w) = \|w\|^2$ and $L(w) = \frac{1}{p}\sum_{i=1}^{n} \xi_i^p$, showing Pareto's front, the point $w = 0$, an admissible solution, and dominance ($w_1$ dominates $w_2$).]
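Dominance between candidate solutions can be sketched in a few lines. The code assumes the usual Pareto definition (no worse on both criteria, strictly better on at least one), which the slide only shows graphically; the function names and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if the criteria pair a = (P, L) Pareto-dominates b (both minimized)."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical (P(w), L(w)) values for four candidate solutions.
candidates = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(candidates)
print(front)  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

Here (3.0, 3.0) is dropped because (2.0, 2.0) is better on both criteria; the three remaining points trade one criterion against the other.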
$$\min_{w} \; \frac{1}{p}\sum_{i=1}^{n} \xi_i^p \quad \text{with } \|w\|^2 \le k$$
or
$$\min_{w} \; \|w\|^2 \quad \text{with } \frac{1}{p}\sum_{i=1}^{n} \xi_i^p \le k'$$
It works for CONVEX criteria!
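Looking for the saddle point of the Lagrangian with the Lagrange multipliers $\alpha_i \ge 0$ and $\beta_i \ge 0$:
$$L(w, b, \alpha, \beta) = \frac{1}{2}\|w\|^2 + \frac{C}{p}\sum_{i=1}^{n} \xi_i^p - \sum_{i=1}^{n} \alpha_i \left( y_i(w^\top x_i + b) - 1 + \xi_i \right) - \sum_{i=1}^{n} \beta_i \xi_i$$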
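The KKT conditions (p = 1)
$$L(w, b, \alpha, \beta) = \frac{1}{2}\|w\|^2 + \frac{C}{p}\sum_{i=1}^{n} \xi_i^p - \sum_{i=1}^{n} \alpha_i \left( y_i(w^\top x_i + b) - 1 + \xi_i \right) - \sum_{i=1}^{n} \beta_i \xi_i$$
stationarity: $w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$
$C - \alpha_i - \beta_i = 0$, $i = 1, \dots, n$
$(C - \alpha_i)\,\xi_i = 0$, $i = 1, \dots, n$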
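| sets | $I_0$ | $I_\alpha$ | $I_C$ |
|---|---|---|---|
| $\alpha_i$ | $0$ | $0 < \alpha_i < C$ | $C$ |
| $\beta_i$ | $C$ | $C - \alpha_i$ | $0$ |
| $\xi_i$ | $0$ | $0$ | $1 - y_i(w^\top x_i + b)$ |
| constraint | $y_i(w^\top x_i + b) > 1$ | $y_i(w^\top x_i + b) = 1$ | $y_i(w^\top x_i + b) < 1$ |
| | useless | useful (support vector) | suspicious |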
The importance of being support
| data point | $\alpha$ value | constraint | set |
|---|---|---|---|
| $x_i$ useless | $\alpha_i = 0$ | $y_i(w^\top x_i + b) > 1$ | $I_0$ |
| $x_i$ support | $0 < \alpha_i < C$ | $y_i(w^\top x_i + b) = 1$ | $I_\alpha$ |
| $x_i$ suspicious | $\alpha_i = C$ | $y_i(w^\top x_i + b) < 1$ | $I_C$ |

Table: When a data point is "support" it lies exactly on the margin.
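Given the multipliers $\alpha_i$ returned by a solver, each point can be assigned to its set by comparing $\alpha_i$ with 0 and $C$. The sketch below assumes a small numerical tolerance `tol`, since solvers rarely return exact bounds; the function name and the sample values are hypothetical.

```python
def kkt_set(alpha_i, C, tol=1e-8):
    """Assign a point to I_0, I_alpha or I_C from its multiplier alpha_i."""
    if alpha_i <= tol:
        return "I_0"        # useless: strictly outside the margin
    if alpha_i >= C - tol:
        return "I_C"        # suspicious: inside the margin or misclassified
    return "I_alpha"        # support: exactly on the margin

# Hypothetical multipliers for three points, with C = 1.
C = 1.0
alphas = [0.0, 0.3, 1.0]
sets = [kkt_set(a, C) for a in alphas]
print(sets)  # ['I_0', 'I_alpha', 'I_C']
```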
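Computing the gradients:
$$\nabla_w L(w, b, \alpha) = w - \sum_{i=1}^{n} \alpha_i y_i x_i$$
$$\frac{\partial L(w, b, \alpha)}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i$$
$$\nabla_{\xi_i} L(w, b, \alpha) = C - \alpha_i - \beta_i$$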
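Setting these three gradients to zero and substituting back into the Lagrangian (with p = 1) leads to the dual problem. The steps below are a sketch of that derivation; the definition $G_{ij} = y_i y_j x_i^\top x_j$ is the usual one for this Gram matrix and is assumed here, since the slides do not spell it out.

```latex
\begin{align*}
\nabla_w L = 0 &\;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad
\nabla_{\xi_i} L = 0 \;\Rightarrow\; \beta_i = C - \alpha_i \\
L &= \tfrac{1}{2}\|w\|^2 + \sum_{i=1}^{n} (C - \alpha_i - \beta_i)\,\xi_i
     - w^\top \sum_{i=1}^{n} \alpha_i y_i x_i - b \sum_{i=1}^{n} \alpha_i y_i + \sum_{i=1}^{n} \alpha_i \\
  &= -\tfrac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^\top x_j + \sum_{i=1}^{n} \alpha_i
   \;=\; -\tfrac{1}{2}\,\alpha^\top G \alpha + e^\top \alpha
\end{align*}
```

Maximizing this expression over $\alpha$ is the same as minimizing $\frac{1}{2}\alpha^\top G \alpha - e^\top \alpha$, and $\beta_i \ge 0$ together with $\beta_i = C - \alpha_i$ gives the box constraint $0 \le \alpha_i \le C$.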
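Primal
$$\min_{w,b,\xi \in \mathbb{R}^n} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i$$
with $y_i(w^\top x_i + b) \ge 1 - \xi_i$
and $\xi_i \ge 0$, $i = 1, \dots, n$
◮ d + n + 1 unknowns
◮ 2n constraints
◮ classical QP
◮ to be used when n is too large to build G

Dual
$$\min_{\alpha \in \mathbb{R}^n} \; \frac{1}{2}\alpha^\top G \alpha - e^\top \alpha$$
with $y^\top \alpha = 0$
and $0 \le \alpha_i \le C$, $i = 1, \dots, n$
◮ n unknowns
◮ G Gram matrix (pairwise influence matrix)
◮ 2n box constraints
◮ easy to solve
◮ to be used when n is not too large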
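To make the dual concrete, the sketch below builds G for a tiny data set, evaluates the dual objective, and checks the constraints. It assumes $G_{ij} = y_i y_j x_i^\top x_j$ (the standard definition, not stated on the slide); the function names and the two-point data set are hypothetical.

```python
def gram(X, y):
    """Assumed definition: G[i][j] = y_i * y_j * <x_i, x_j>."""
    n = len(X)
    return [[y[i] * y[j] * sum(a * b for a, b in zip(X[i], X[j]))
             for j in range(n)] for i in range(n)]

def dual_objective(G, alpha):
    """Value of 1/2 alpha^T G alpha - e^T alpha."""
    n = len(alpha)
    quad = sum(alpha[i] * G[i][j] * alpha[j]
               for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)

def is_feasible(alpha, y, C, tol=1e-9):
    """Check y^T alpha = 0 and the box 0 <= alpha_i <= C."""
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alpha)
    return balanced and boxed

# Hypothetical data: one point per class, symmetric about the origin.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
G = gram(X, y)
alpha = [0.5, 0.5]
print(G)                               # [[1.0, 1.0], [1.0, 1.0]]
print(dual_objective(G, alpha))        # -0.5
print(is_feasible(alpha, y, C=1.0))    # True
```

For this toy problem, alpha = [0.5, 0.5] gives w = [1, 0], which puts both points exactly on the margin.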
The smallest C
$$-1 \le f_j = C\sum_{i=1}^{n} y_i\,(x_i^\top x_j) + b \le 1$$
$$f_M = \max(f), \qquad f_m = \min(f)$$
$$C_{\max} = \frac{2}{f_M - f_m}$$
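The bound can be evaluated directly from the data. The sketch below rests on an interpretation that the slide does not state explicitly: $f_M$ and $f_m$ are taken as the extremes of $s_j = \sum_i y_i\, x_i^\top x_j$, that is, f evaluated with C = 1 and b = 0, so that some offset b keeps every $C s_j + b$ inside $[-1, 1]$ exactly when $C (f_M - f_m) \le 2$. The function name and the data are hypothetical.

```python
def smallest_c_bound(X, y):
    """C_max = 2 / (f_M - f_m), with f_j = sum_i y_i <x_i, x_j> (assumed C = 1, b = 0)."""
    n = len(X)
    f = [sum(y[i] * sum(a * b for a, b in zip(X[i], X[j])) for i in range(n))
         for j in range(n)]
    # Raises ZeroDivisionError if all f_j are equal (degenerate data).
    return 2.0 / (max(f) - min(f))

# Hypothetical 2-D data lying on the first axis.
X = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]]
y = [1, 1, -1, -1]
c_max = smallest_c_bound(X, y)
print(c_max)
```

On this data f takes the values 6, 12, -6 and -12, so the bound is 2 / 24.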
Road map
1 Linear SVM
◮ The non-separable case
◮ The C (L1) SVM
◮ The L2 SVM and other “variations on a theme”
◮ The hinge loss
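L2 SVM: optimality conditions (p = 2)
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{n} \xi_i^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(w^\top x_i + b) - 1 + \xi_i \right)$$
Computing the gradients:
$$\nabla_w L(w, b, \alpha) = w - \sum_{i=1}^{n} \alpha_i y_i x_i$$
$$\frac{\partial L(w, b, \alpha)}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i$$
$$\nabla_{\xi_i} L(w, b, \alpha) = C\xi_i - \alpha_i$$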
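Primal
$$\min_{w,b,\xi \in \mathbb{R}^n} \; \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{n} \xi_i^2$$
with $y_i(w^\top x_i + b) \ge 1 - \xi_i$
◮ d + n + 1 unknowns
◮ n constraints
◮ classical QP
◮ to be used when n is too large to build G

Dual
$$\min_{\alpha \in \mathbb{R}^n} \; \frac{1}{2}\alpha^\top \left(G + \frac{1}{C} I\right) \alpha - e^\top \alpha$$
with $y^\top \alpha = 0$
and $0 \le \alpha_i$, $i = 1, \dots, n$
◮ n unknowns
◮ G Gram matrix is regularized
◮ n box constraints
◮ easy to solve
◮ to be used when n is not too large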
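The regularization of the Gram matrix amounts to adding 1/C to its diagonal. A minimal sketch follows; the function name and the 2x2 matrix are hypothetical, and the example uses a singular G to show that the regularized matrix becomes invertible.

```python
def regularized_gram(G, C):
    """Return G + (1/C) I for a square matrix G given as a list of rows."""
    n = len(G)
    return [[G[i][j] + (1.0 / C if i == j else 0.0) for j in range(n)]
            for i in range(n)]

# Hypothetical singular Gram matrix (determinant 0).
G = [[1.0, 1.0], [1.0, 1.0]]
G_reg = regularized_gram(G, C=2.0)
det = G_reg[0][0] * G_reg[1][1] - G_reg[0][1] * G_reg[1][0]
print(G_reg)  # [[1.5, 1.0], [1.0, 1.5]]
print(det)    # 1.25
```

Since G is positive semidefinite, adding a positive multiple of the identity makes the quadratic term strictly convex, which is one reason this dual is easier to handle numerically.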
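One more variant: the ν SVM
$$\max_{v,a} \; m \quad \text{with } \min_{i=1,\dots,n} |v^\top x_i + a| \ge m, \quad \|v\|^2 = k$$
$$\min_{v,a} \; \frac{1}{2}\|v\|^2 - \nu\, m + \sum_{i=1}^{n} \xi_i$$
with $y_i(v^\top x_i + a) \ge m - \xi_i$
and $\xi_i \ge 0$, $m \ge 0$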
$$w^\top x = \frac{2}{\|u - v\|}\left(u^\top x - v^\top x\right) \quad \text{and} \quad b = \frac{\|u\| - \|v\|}{\|u - v\|}$$
SVM with non-symmetric costs
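Eliminating the slack but not the possible mistakes
$$\min_{w,b,\xi \in \mathbb{R}^n} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i$$
with $y_i(w^\top x_i + b) \ge 1 - \xi_i$
and $\xi_i \ge 0$, $i = 1, \dots, n$
$$\min_{w,b} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \max\left(0,\, 1 - y_i(w^\top x_i + b)\right)$$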
Definition (Subdifferential)
$\partial J(f)$, the subdifferential of J at f, is the set of all subgradients of J at f.
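As a concrete instance, the hinge function $h(z) = \max(0, 1 - z)$ is differentiable everywhere except at $z = 1$, where its subdifferential is a whole interval. The sketch below returns the subdifferential as a closed interval `(lo, hi)`; the function name and this representation are illustrative choices, not notation from the slides.

```python
def hinge_subdifferential(z):
    """Subdifferential of h(z) = max(0, 1 - z), as a closed interval (lo, hi)."""
    if z < 1:
        return (-1.0, -1.0)   # differentiable, slope -1
    if z > 1:
        return (0.0, 0.0)     # differentiable, slope 0
    return (-1.0, 0.0)        # kink at z = 1: every slope in [-1, 0]

print(hinge_subdifferential(0.0))  # (-1.0, -1.0)
print(hinge_subdifferential(1.0))  # (-1.0, 0.0)
print(hinge_subdifferential(2.0))  # (0.0, 0.0)
```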
$$\alpha_n = \alpha_o + (\lambda_o - \lambda_n)\, d, \qquad d = G^{-1} y$$
Solving SVM in the primal
$$\min_{w,b} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \max\left(0,\, 1 - y_i(w^\top x_i + b)\right)$$
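One simple way to attack this unconstrained problem is subgradient descent on the hinge objective. The sketch below is an illustrative assumption, not necessarily the solver the slides have in mind; the function names, the fixed step size `lr`, the number of steps, and the toy data are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_subgradient_descent(X, y, C=1.0, steps=200, lr=0.01):
    """Minimize 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        # Gradient of the quadratic term is w; b is not penalized.
        grad_w, grad_b = list(w), 0.0
        for xi, yi in zip(X, y):
            # Points with margin below 1 contribute -C * y_i * x_i.
            # At the kink (margin exactly 1) the subgradient 0 is chosen.
            if yi * (dot(w, xi) + b) < 1:
                grad_w = [g - C * yi * xk for g, xk in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical linearly separable toy data.
X = [[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]]
y = [1, 1, -1, -1]
w, b = primal_subgradient_descent(X, y)
predictions = [1 if dot(w, xi) + b > 0 else -1 for xi in X]
print(predictions)
```

With a fixed step size the iterates oscillate around the minimizer rather than converging exactly; a decreasing step size is the usual remedy.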
[Figure: classification losses, including the logistic, exponential and sigmoid losses.]
A tentative classification:
◮ convex / non-convex
◮ differentiable / non-differentiable
Universality = nonlinearity
◮ kernels