
Vũ Hữu Tiệp

Convex Optimization (Tối ưu Lồi)
Based on the blog: http://machinelearningcoban.com

Last update: July 4, 2017
Contents

0 Preface
16 Convex sets and convex functions
17 Convex Optimization Problems
18 Duality
51 Linear algebra review
Index
0

Preface

Hello everyone,

This document is part of my blog Machine Learning cơ bản, which I have been writing since early 2017. At the time of preparing this document, the blog has 28 posts and many short notes on Machine Learning/Artificial Intelligence and Optimization. My approach is to introduce each Machine Learning algorithm by constructing a special function, called the loss function or objective function, together with the methods used to optimize it. I have always believed that understanding how a loss function is built and how it is optimized is essential for understanding Machine Learning algorithms in depth. For that reason, I have also spent time on several posts about Optimization.

Within Optimization, Convex Optimization matters most because of its special properties. I chose three posts on Convex Optimization for this document because I know that many readers of the blog prefer to read math-heavy material on paper rather than on a screen. Printing directly from the website does not work well, since that conversion is automatic.

Beyond Machine Learning, science, engineering, finance and economics also rely heavily on optimization. I hope this document will be useful to many Vietnamese people studying and doing research at home and abroad.

The language of this document is close to that of the blog and is tightly linked to other posts, to which I provide links. However, readers who have not read the blog can still follow it, because this is an overview of Convex Optimization.

The content is based on the book Convex Optimization by the well-known author Stephen P. Boyd. I have tried to leave out the parts that go too deep into the mathematics and to add examples that are closer to practice and to the Vietnamese mathematics curriculum. All figures in this document were redrawn from scratch.

This document was compiled in one day, and its content is nearly identical to the blog. However, because converting from the web format to LaTeX is fairly complicated, it certainly contains mistakes. If you find anything that needs correcting, please let me know at vuhuutiep@gmail.com. I will reply and fix it as soon as I can.

Copyright:

All content in this document, the source code, and the illustrations (except quoted material) are copyrighted by me, Vũ Hữu Tiệp.

I very much want the knowledge written in this blog to reach many people. However, I do not support any form of copying without attribution. Any source that quotes or reposts an article must clearly state the blog name (Machine Learning cơ bản), the author name (Vũ Hữu Tiệp), and include a link to the original post. Quoting more than 25% of the full text of any post in this blog is not permitted, except with the author's consent.

For any matter related to copying, reposting, or using the articles, as well as discussion and collaboration, please contact me at: vuhuutiep@gmail.com.

The content of this blog is completely free. I also do not use any advertising service, because I do not want to disturb you while you read. However, if you find the blog and this document useful and want to support them, you can buy me a coffee by clicking the Buy me a coffee button at the top of the blog's left column, for whichever kind of coffee you like to drink.

Thank you very much!

Best regards,

Vũ Hữu Tiệp

www.machinelearningcoban.com

United States, June 25, 2017.

Machine Learning cơ bản www.machinelearningcoban.com


16

Convex sets and convex functions

16.1 Introduction

If you have read the earlier posts on the blog Machine Learning cơ bản, you have already met many optimization problems. Learning Machine Learning means learning optimization, and for me the best way to understand optimization better is to study Machine Learning algorithms. So far, the optimization problems in the blog have all been unconstrained optimization problems, that is, the loss function is optimized without any constraints on the solution.

Beyond Machine Learning, practical optimization problems usually come with many different constraints. For example:

I want to rent a house no more than 5 km from the center of Hà Nội at the lowest possible price. In this problem, the rent is the loss function (some people also use the term cost function for the function to be optimized), and the condition that the distance be no more than 5 km is the constraint.

Return to the house-price prediction problem with Linear Regression, where the price is a linear function of the area, the number of bedrooms, and the distance to the center. Clearly, we expect the price to increase with the area and the number of bedrooms and to decrease with the distance. A solution is therefore somewhat more reasonable if the coefficients for the area and the number of bedrooms are positive and the coefficient for the distance is negative. To avoid unwanted outlying solutions, we should add these constraints when solving the optimization problem.

In optimization, a constrained problem is usually written in the form:

$$
\begin{aligned}
x^* = \arg\min_{x}\ & f_0(x) \\
\text{subject to: } & f_i(x) \leq 0, \quad i = 1, 2, \dots, m \\
& h_j(x) = 0, \quad j = 1, 2, \dots, p
\end{aligned}
$$

Here the vector $x = [x_1, x_2, \dots, x_n]^T$ is called the optimization variable. The function $f_0: \mathbb{R}^n \to \mathbb{R}$ is called the objective function (in Machine Learning, objective functions are usually called loss functions). The functions $f_i, h_j: \mathbb{R}^n \to \mathbb{R}$, $i = 1, 2, \dots, m$; $j = 1, 2, \dots, p$ are called constraint functions (or simply constraints). The set of points $x$ satisfying the constraints is called the feasible set. Each point in the feasible set is a feasible point; points outside the feasible set are infeasible points.

Notes:

If the problem asks for a maximum rather than a minimum, we only need to flip the sign of $f_0(x)$.

If a constraint is of the greater-than-or-equal form, i.e. $f_i(x) \geq b_i$, flipping its sign gives the less-than-or-equal form $-f_i(x) \leq -b_i$.

Constraints may also be strict inequalities (greater than or less than).

An equality constraint $h_j(x) = 0$ can be written as two inequalities, $h_j(x) \leq 0$ and $-h_j(x) \leq 0$. For this reason, some texts omit the equality constraints $h_j(x) = 0$.

In this post, $x, y$ are mainly used to denote variables, not data as in the earlier posts. The optimization variable is the one written under arg min. When stating an optimization problem, we need to say clearly which variables are optimized and which are fixed.

Optimization problems in general have no universal solution method, and some remain unsolved. Most solution methods cannot prove whether the solution they find is a global optimum, i.e. the point at which the function attains its smallest (or largest) value. Instead, the solutions found are usually local optima, i.e. local extrema. In many cases, local optima also give good results.

To start learning optimization, we need to study a very important part of it called convex optimization, in which the objective function is a convex function and the feasible set is a convex set. The special properties relating local and global optima of a convex function make convex optimization extremely important. In this post, I introduce the basic definitions and properties of convex sets and convex functions. Convex optimization problems are covered in the next post.

Figure 16.1: Examples of convex sets.

16.2 Convex sets

16.2.1 Definition

The notion of convex sets is probably familiar to Vietnamese students, who have all heard of convex polygons. Convex simply means bulging or protruding outward. In mathematics, flat is also regarded as convex.

Definition 1: A set is called a convex set if the line segment joining any two points of the set lies entirely within the set.

Some examples of convex sets are given in Figure 16.1.

Shapes with a black border include their boundary; a white border means the boundary does not belong to the set. A line or a line segment is also a convex set by the definition above.

Some real-life examples:

Suppose a room has a convex shape. If we put a sufficiently bright lamp at any position in the room, every point in the room is lit.

If a country has a convex map, the straight flight path between any two of its cities lies entirely within its airspace. (Unlike Việt Nam, where a straight flight from Hà Nội to Hồ Chí Minh City has to cross Cambodian airspace.)

Figure 16.2 shows some examples of nonconvex sets, i.e. sets that are not convex:

Figure 16.2: Examples of nonconvex sets.

The first three shapes are not convex because the dashed segments contain many points that lie outside the sets. The fourth shape, a square without its bottom edge, is not convex because the segment joining two points at the bottom may contain a middle part that does not belong to the set. (With no boundary at all, the square would still be a convex set, but a partially included boundary as in this example needs care.) A curve is not a convex set either, because the line segment joining two of its points generally does not lie on the curve.

To describe a convex set mathematically, we use:

Definition 2: A set $C$ is called convex if, for any two points $x_1, x_2 \in C$, the point $x_\theta = \theta x_1 + (1 - \theta) x_2$ also lies in $C$ for any $0 \leq \theta \leq 1$.

The set of points of the form $\theta x_1 + (1 - \theta) x_2$ with $0 \leq \theta \leq 1$ is exactly the line segment joining $x_1$ and $x_2$.

With these definitions, the whole space is a convex set, since every line segment lies in it. The empty set can also be regarded as a special case of a convex set.
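Definition 2 can be probed numerically by sampling points along one segment. The sketch below is only an illustration: the helper name `segment_stays_inside` and the two example sets (a unit disk and an annulus) are assumptions chosen for the demonstration, and passing the check on one segment does not prove that a set is convex.

```python
def segment_stays_inside(contains, x1, x2, steps=100):
    """Sample theta in [0, 1] and test whether theta*x1 + (1-theta)*x2 stays in the set.

    `contains` is a membership test for the set (hypothetical helper for this sketch).
    """
    for k in range(steps + 1):
        theta = k / steps
        point = tuple(theta * a + (1 - theta) * b for a, b in zip(x1, x2))
        if not contains(point):
            return False
    return True


def in_disk(p):
    # Unit disk: a convex set.
    return p[0] ** 2 + p[1] ** 2 <= 1.0


def in_annulus(p):
    # Ring between radius 0.5 and radius 1: not convex.
    return 0.25 <= p[0] ** 2 + p[1] ** 2 <= 1.0


disk_ok = segment_stays_inside(in_disk, (-1.0, 0.0), (1.0, 0.0))
annulus_ok = segment_stays_inside(in_annulus, (-1.0, 0.0), (1.0, 0.0))
print(disk_ok, annulus_ok)
```

The segment between (-1, 0) and (1, 0) stays in the disk but passes through the hole of the annulus, so a single failing segment is enough to show that the annulus is not convex.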

Below are some common examples of convex sets.

16.2.2 Examples

Hyperplanes and halfspaces

A hyperplane in n-dimensional space is the set of points satisfying the equation:

$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = a^T x = b \quad (16.1)$$

where $b, a_i, i = 1, 2, \dots, n$ are real numbers.

Hyperplanes are convex sets. This follows easily from Definition 1, and it is just as easy to see with Definition 2. If

$$a^T x_1 = a^T x_2 = b$$

then for any $0 \leq \theta \leq 1$:

$$a^T x_\theta = a^T (\theta x_1 + (1 - \theta) x_2) = \theta b + (1 - \theta) b = b$$

A halfspace in n-dimensional space is the set of points satisfying the inequality:

$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = a^T x \leq b$$

where $b, a_i, i = 1, 2, \dots, n$ are real numbers.

Halfspaces are also convex sets, as the reader can easily see from Definition 1 or prove with Definition 2.

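Norm balls

A Euclidean ball (a disk in the plane, a ball in three-dimensional space) is a set of points of the form:

$$B(x_c, r) = \{x \mid \|x - x_c\|_2 \leq r\} = \{x_c + r u \mid \|u\|_2 \leq 1\}$$

By Definition 1, we can see that Euclidean balls are convex sets. To prove it, we use Definition 2 and the properties of norms. For any $x_1, x_2 \in B(x_c, r)$ and any $0 \leq \theta \leq 1$:

$$
\begin{aligned}
\|x_\theta - x_c\|_2 &= \|\theta (x_1 - x_c) + (1 - \theta)(x_2 - x_c)\|_2 \\
&\leq \theta \|x_1 - x_c\|_2 + (1 - \theta) \|x_2 - x_c\|_2 \\
&\leq \theta r + (1 - \theta) r = r
\end{aligned}
$$

Therefore $x_\theta \in B(x_c, r)$.

A Euclidean ball uses the 2-norm as the distance. If any other norm is used as the distance, we still obtain a convex set.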

With the p-norm:

$$\|x\|_p = (|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{1/p}$$

where $p$ is any real number not less than 1, we also obtain convex sets.

Figure 16.3 shows the sets of points with coordinates $(x, y)$ in two-dimensional space satisfying:

$$(|x|^p + |y|^p)^{1/p} \leq 1 \quad (16.2)$$

Figure 16.3: Shapes of the sets $C_p = \{(x, y) \mid (|x|^p + |y|^p)^{1/p} \leq 1\}$ bounded by pseudo-norms (top row: $p = 1/8, 1/4, 1/2, 2/3, 4/5$; $p < 1$, nonconvex sets) and by norms (bottom row: $p = 1, 4/3, 2, 4, \infty$; $p \geq 1$, convex sets).

The top row shows the sets with $0 < p < 1$ (not norms), and the bottom row shows the sets with $p \geq 1$.

We can see that when $p$ is small and close to 0, the set of points satisfying inequality (16.2) lies almost entirely on the coordinate axes, within the interval $[-1, 1]$. This observation will help when working with the (pseudo) 0-norm later. As $p \to \infty$, the sets converge to a square.

This is also one of the reasons why the condition $p \geq 1$ is needed in the definition of a norm.
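The role of the condition $p \geq 1$ can be checked with two points of the set and their midpoint. In the sketch below, the helper names `p_value` and `midpoint_inside_unit_set` are hypothetical, and the points (1, 0) and (0, 1) are just one convenient choice: both satisfy the inequality for every $p > 0$, so a midpoint outside the set shows nonconvexity.

```python
def p_value(point, p):
    """Return (|x|^p + |y|^p)^(1/p) for a 2-D point (a true norm only when p >= 1)."""
    x, y = point
    return (abs(x) ** p + abs(y) ** p) ** (1.0 / p)


def midpoint_inside_unit_set(p):
    """Test whether the midpoint of (1, 0) and (0, 1) satisfies the inequality for this p."""
    midpoint = (0.5, 0.5)
    return p_value(midpoint, p) <= 1.0 + 1e-12


for p in (0.5, 1.0, 2.0):
    print(p, p_value((0.5, 0.5), p), midpoint_inside_unit_set(p))
```

For $p = 0.5$ the midpoint has value $(2\sqrt{0.5})^2 = 2 > 1$, so it falls outside the set, while for $p = 1$ and $p = 2$ it stays inside.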

Ellipsoids

Ellipsoids (ellipses in higher-dimensional spaces) are also convex sets. In fact, ellipsoids are closely related to the Mahalanobis distance. This distance is itself a norm, so the convexity of ellipsoids can be proved with Definition 2.

The Mahalanobis norm of a vector $x \in \mathbb{R}^n$ is defined as:

$$\|x\|_A = \sqrt{x^T A^{-1} x}$$

where $A^{-1}$ is a matrix satisfying:

$$x^T A^{-1} x \geq 0, \quad \forall x \in \mathbb{R}^n \quad (16.3)$$

Figure 16.4: Left: the intersection of convex sets is a convex set. Right: the intersection of hyperplanes and halfspaces is a convex set, called a polyhedron (plural: polyhedra).

When the matrix $A^{-1}$ satisfies condition (16.3) with strict inequality for every $x \neq 0$, we say it is positive definite. A symmetric matrix is positive definite if and only if all of its eigenvalues are positive.

Incidentally, a symmetric matrix $B$ is called positive semidefinite if its eigenvalues are nonnegative. In that case $x^T B x \geq 0, \forall x$. If equality holds if and only if $x = 0$, the matrix is positive definite. In expression (16.3), since $A$ is invertible, all of its eigenvalues must be nonzero. Therefore $A$ is a positive definite matrix.
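The eigenvalue test for definiteness is easy to try on small matrices. The following is a minimal sketch for symmetric 2x2 matrices only, using the closed-form eigenvalues of $[[a, b], [b, c]]$; the function names and the tolerance are assumptions made for this illustration, and larger matrices would need a numerical eigenvalue routine instead.

```python
import math


def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]] in closed form."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius


def classify(a, b, c, tol=1e-12):
    """Classify [[a, b], [b, c]] by the sign of its smallest eigenvalue."""
    smallest, _ = eigenvalues_sym_2x2(a, b, c)
    if smallest > tol:
        return "positive definite"
    if smallest >= -tol:
        return "positive semidefinite"
    return "not positive semidefinite"


print(classify(2.0, 1.0, 2.0))  # eigenvalues 1 and 3
print(classify(1.0, 1.0, 1.0))  # eigenvalues 0 and 2
print(classify(1.0, 2.0, 1.0))  # eigenvalues -1 and 3
```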

A positive definite or positive semidefinite matrix $A$ is denoted, respectively, by:

$$A \succ 0, \quad A \succeq 0.$$

Also incidentally, the Mahalanobis distance is related to the distance from a point to a probability distribution.

16.2.3 The intersection of convex sets is a convex set

This is easy to see in Figure 16.4 (left). The intersection of any two of the three convex sets, or of all three, is a convex set.

Proving this with Definition 2 is not hard either. If $x_1, x_2$ belong to the intersection of the convex sets, i.e. to every one of the given sets, then $\theta x_1 + (1 - \theta) x_2$ also belongs to every one of them, i.e. to their intersection.

It follows that the intersection of halfspaces and hyperplanes is also a convex set. In two-dimensional space, such a set is a convex polygon; in three-dimensional space, it is called a convex polyhedron.

In higher-dimensional spaces, the intersection of halfspaces and hyperplanes is called a polyhedron (plural: polyhedra).

Figure 16.5: Convex hulls (grey regions) of the blue sets, and a separating hyperplane.

Suppose there are $m$ halfspaces and $p$ hyperplanes. Each halfspace, as presented above, can be written as $a_i^T x \leq b_i, i = 1, 2, \dots, m$. Each hyperplane can be written as $c_i^T x = d_i, i = 1, 2, \dots, p$.

So with $A = [a_1, a_2, \dots, a_m]$, $b = [b_1, b_2, \dots, b_m]^T$, $C = [c_1, c_2, \dots, c_p]$ and $d = [d_1, d_2, \dots, d_p]^T$, a polyhedron can be written as the set of points $x$ satisfying:

$$A^T x \preceq b, \quad C^T x = d$$

where $\preceq$ is element-wise, i.e. each entry on the left-hand side is less than or equal to the corresponding entry on the right-hand side.
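The description $A^T x \preceq b$, $C^T x = d$ translates directly into a membership test. The sketch below stores each column $a_i$ with its $b_i$ (and each $c_i$ with its $d_i$) as a pair; the function name `in_polyhedron`, the tolerance, and the small example polyhedron are assumptions made for illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def in_polyhedron(x, halfspaces, hyperplanes, tol=1e-9):
    """Check a_i^T x <= b_i for every halfspace and c_i^T x = d_i for every hyperplane.

    halfspaces: list of (a_i, b_i) pairs; hyperplanes: list of (c_i, d_i) pairs.
    """
    inequalities_ok = all(dot(a, x) <= b + tol for a, b in halfspaces)
    equalities_ok = all(abs(dot(c, x) - d) <= tol for c, d in hyperplanes)
    return inequalities_ok and equalities_ok


# Example: the triangle x >= 0, y >= 0, x + y <= 1 intersected with the line x = y.
halfspaces = [((-1.0, 0.0), 0.0), ((0.0, -1.0), 0.0), ((1.0, 1.0), 1.0)]
hyperplanes = [((1.0, -1.0), 0.0)]

print(in_polyhedron((0.25, 0.25), halfspaces, hyperplanes))
print(in_polyhedron((0.75, 0.75), halfspaces, hyperplanes))
print(in_polyhedron((0.2, 0.3), halfspaces, hyperplanes))
```

Only the first point belongs to the polyhedron: the second violates $x + y \leq 1$ and the third violates the equality $x = y$.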

16.2.4 Convex combinations and convex hulls

A point is called a convex combination of the points $x_1, x_2, \dots, x_k$ if it can be written as:

$$x = \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k, \quad \text{with } \theta_1 + \theta_2 + \dots + \theta_k = 1 \text{ and } \theta_i \geq 0, i = 1, 2, \dots, k$$

The convex hull of any set is the set of all convex combinations of its points. A convex hull is a convex set. The convex hull of a convex set is the set itself. An easy way to remember it: the convex hull of a set is the smallest convex set containing that set. The notion of smallest is hard to define precisely, but it is an intuitive way to remember.

Two sets are called linearly separable if their convex hulls have no points in common.

In Figure 16.5, the convex hull of the blue points is the grey region bounded by convex polygons. In Figure 16.5 (right), the grey region lies underneath the blue region.

Separating hyperplane theorem: This theorem states that if two nonempty convex sets $C, D$ are disjoint, then there exist a nonzero vector $a$ and a number $b$ such that:

$$a^T x \leq b, \forall x \in C, \quad a^T x \geq b, \forall x \in D$$

The set of all points $x$ satisfying $a^T x = b$ is a hyperplane. This hyperplane is called a separating hyperplane.

There are many more interesting properties of convex sets and of operations that preserve convexity. Readers are encouraged to read Chapter 2 of the book Convex Optimization listed in the references.

16.3 Convex functions

You have probably heard of this notion while preparing for the university entrance exam in mathematics. Convex functions are related to the second derivative and to Jensen's inequality (if you have not heard of these, no problem, you will learn them now).

16.3.1 Definition

For intuition, first consider functions of one variable, whose graph is a curve in a plane. A function is called convex if its domain is a convex set and if the line segment joining any two points on its graph lies above or on the graph (see Figure 16.6).

The domain of a function $f(\cdot)$ is usually denoted by $\text{dom} f$.

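The mathematical definition:

Definition (convex function): A function $f: \mathbb{R}^n \to \mathbb{R}$ is called a convex function if $\text{dom} f$ is a convex set and

$$f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)$$

for all $x, y \in \text{dom} f$, $0 \leq \theta \leq 1$.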
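The defining inequality can be tested on sample points to look for counterexamples. The sketch below is an assumption-laden illustration: the function name `violates_convexity`, the grid on $[-2, 2]$ and the tolerance are choices made here, and finding no violation on a grid suggests convexity but does not prove it.

```python
import math


def violates_convexity(f, xs, thetas, tol=1e-9):
    """Return a triple (x, y, theta) that violates the convexity inequality, or None."""
    for x in xs:
        for y in xs:
            for theta in thetas:
                lhs = f(theta * x + (1 - theta) * y)
                rhs = theta * f(x) + (1 - theta) * f(y)
                if lhs > rhs + tol:
                    return (x, y, theta)
    return None


xs = [i / 4.0 for i in range(-8, 9)]     # sample points in [-2, 2]
thetas = [j / 10.0 for j in range(11)]   # theta values in [0, 1]

print(violates_convexity(math.exp, xs, thetas))          # no violation found
print(violates_convexity(lambda x: x * x, xs, thetas))   # no violation found
print(violates_convexity(lambda x: x ** 3, xs, thetas))  # a violating triple
```

The cubic fails because it is concave on the negative half of the interval, so a chord between two negative points lies below the graph.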

Figure 16.6: Definition of a convex function. In words, a function is convex if the line segment joining any two points on its graph does not lie below the graph: $\theta f(x) + (1 - \theta) f(y) \geq f(\theta x + (1 - \theta) y)$.

The condition that $\text{dom} f$ be a convex set is very important, because without it $f(\theta x + (1 - \theta) y)$ might not be defined.

A function $f$ is called concave (you may translate this as lõm in Vietnamese if you like; I do not like that translation) if $-f$ is convex. A function may be neither convex nor concave. Linear functions are both convex and concave.

Definition (strictly convex function): (some Vietnamese texts call this hàm lồi mạnh or hàm lồi chặt) A function $f: \mathbb{R}^n \to \mathbb{R}$ is called strictly convex if $\text{dom} f$ is a convex set and

$$f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y)$$

for all $x, y \in \text{dom} f$, $x \neq y$, $0 < \theta < 1$.

Strictly concave functions are defined analogously.

An important point: if a function is strictly convex and has an extremum, then that extremum is unique and is also the global minimum.

16.3.2 Basic properties

If $f(x)$ is convex, then $a f(x)$ is convex if $a > 0$ and concave if $a < 0$. This follows directly from the definition.

The sum of two convex functions is a convex function whose domain is the intersection of the two domains (recall that the intersection of two convex sets is a convex set).

Pointwise maximum and supremum: If the functions $f_1, f_2, \dots, f_m$ are convex, then

$$f(x) = \max\{f_1(x), f_2(x), \dots, f_m(x)\}$$

is also convex on the domain given by the intersection of all their domains. The max above can also be replaced by sup. This property can be proved from the definition. You can also see it in Figure 16.7: every segment joining two points on the blue curve does not lie below the blue curve.

Figure 16.7: Example of a pointwise maximum, $f(x) = \max\{f_1(x), f_2(x)\}$. The maximum of convex functions is a convex function.

Figure 16.8: Examples of convex functions of one variable: $y = ax + b$, $y = |x|$, $y = x^2$, $y = e^x$, $y = x^{-1}, x > 0$.

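16.3.3 Examples

Functions of one variable

Examples of convex functions of one variable:

The function $y = ax + b$ is convex because the segment joining any two points lies on the graph itself.

The function $y = e^{ax}$ for any $a \in \mathbb{R}$.

The function $y = x^a$ on the positive real numbers with $a \geq 1$ or $a \leq 0$.

The negative entropy function $y = x \log x$ on the positive real numbers.

Figure 16.8 shows the graphs of some convex functions.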

Examples of concave functions of one variable:

The function $y = ax + b$ is concave because $-y$ is convex.

The function $y = x^a$ on the positive real numbers with $0 \leq a \leq 1$.

The logarithm $y = \log(x)$ on the positive real numbers.

Figure 16.9 shows the graphs of some concave functions.

Figure 16.9: Examples of concave functions of one variable: $y = ax + b$, $y = 2\sqrt{x}$, $y = 2\log(x)$, and the piecewise function $y = 3 + x$ if $x < 0$, $y = 3 - x^2$ if $x \geq 0$.

Affine functions

Functions of the form $f(x) = a^T x + b$ are both convex and concave.

When the variable is a matrix $X$, affine functions are defined as:

$$f(X) = \text{trace}(A^T X) + b$$

where trace is the sum of the diagonal entries of a square matrix, and $A$ is a matrix with the same dimensions as $X$ (so that the matrix product is defined and the result is a square matrix).

Quadratic forms

A quadratic function of one variable, $f(x) = ax^2 + bx + c$, is convex if $a > 0$ and concave if $a < 0$.

With a vector variable $x = [x_1, x_2, \dots, x_n]$, a quadratic form is a function of the form:

$$f(x) = x^T A x + b^T x + c$$

where $A$ is usually a symmetric matrix, i.e. $a_{ij} = a_{ji}, \forall i, j$, whose number of rows equals the number of elements of $x$, $b$ is any vector of the same dimension as $x$, and $c$ is any constant.

If $A$ is a positive (semi)definite matrix, then $f(x)$ is a convex function.

If $A$ is a negative (semi)definite matrix, i.e. $x^T A x \leq 0, \forall x$, then $f(x)$ is a concave function.

You can read about positive definite matrices and their properties in any linear algebra book. If you struggle with this part, review your linear algebra; it is very important in optimization and Machine Learning.

The loss function in Linear Regression has the form:

$$
\begin{aligned}
L(w) &= \frac{1}{2} \|y - Xw\|_2^2 = \frac{1}{2} (y - Xw)^T (y - Xw) \\
&= \frac{1}{2} w^T X^T X w - y^T X w + \frac{1}{2} y^T y
\end{aligned}
$$

Because $X^T X$ is a positive semidefinite matrix (for every $v$, $v^T X^T X v = \|Xv\|_2^2 \geq 0$), the loss function of Linear Regression is a convex function.

Norms

Yes, norms again. Any function that satisfies the three conditions of a norm is a convex function. The reader can prove this from the definition.

Figure 16.10 shows two examples, the 1-norm (left) and the 2-norm (right), in two dimensions (the third dimension in the figure is the value of the function).

Note that each of these surfaces has a single bottom at the origin (this is the first condition of a norm). Strictly convex functions that attain a minimum have a similar shape, with a single bottom. This suggests that if we drop a marble anywhere on such a surface, it will eventually roll to the bottom. In terms of the Gradient Descent algorithm, applying it to unconstrained problems with a strictly convex objective function (assumed differentiable) gives very good results if the learning rate is not too large. This is one of the reasons why convex functions are important, and why I devote this post to convexity. (Readers are encouraged to read the two posts on Gradient Descent in the blog.)

While we are here, let me give two examples of functions that are not convex (and not concave either). The first, $f(x, y) = x^2 - y^2$, is a hyperbolic surface; the second is $f(x, y) = \frac{1}{10}(x^2 + 2y^2 - 2\sin(xy))$.

Contours - level sets

For more complicated functions, surfaces drawn in three-dimensional space are harder to picture, i.e. it is harder to see whether the function is convex. A commonly used method is to draw contours, or level sets. I also mentioned this notion in the Gradient Descent post, in the part on level curves.

Contours describe surfaces in three-dimensional space by projecting them onto two-dimensional space. In two-dimensional space, points on the same curve correspond to points where the function takes the same value. Each such curve is called a level set.

Figure 16.10: Examples of surfaces of two-variable norms: (a) 1-norm, (b) 2-norm.

Figure 16.11: Examples of nonconvex functions of two variables.

In Figure 16.10 and Figure 16.11, the curves projected from the surfaces onto the Oxy plane are level sets. Put another way, each level set is the cut obtained by slicing the surface with a plane parallel to the Oxy plane.

Figure 16.12: Examples of contours. Top row: $f(x, y) = |x| + |y|$, $f(x, y) = x^2 + y^2$, $f(x, y) = \max(2x^2 + y^2 - xy, |x| + 2|y|)$. Bottom row: $f(x, y) = x + y$, $f(x, y) = x\log(x) + y\log(y)$, $f(x, y) = x^2 - y^2$ (nonconvex). Deeper blue curves correspond to smaller values, deeper red curves to larger values.

When visualizing a function of two variables to check its convexity or to find its extrema, people usually draw contours rather than surfaces in three-dimensional space. Figure 16.12 shows some examples of contours.

In the top row, the level sets are closed curves. When these closed curves shrink toward a point, that point is an extremum. For convex functions such as these three examples, there is only one extremum, and it is also the point at which the function attains its smallest value (global optimum). If you look closely, you will see that these closed curves enclose convex regions!

In the bottom row, the level sets are not closed. The left plot corresponds to the linear function $f(x, y) = x + y$, which is a convex function. The middle plot is also a convex function (you can prove this by computing the second derivative, discussed below), but its level sets are not closed curves. This function contains log, so its domain is the first quadrant, with positive coordinates (note that the set of points with positive coordinates is also a convex set). These open curves, combined with the axes Ox, Oy, form the boundaries of convex sets. The last plot shows the contours of a hyperbolic function, which is not a convex function.

16.3.4 α-sublevel sets

Definition: The α-sublevel set of a function $f: \mathbb{R}^n \to \mathbb{R}$ is defined as:

$$C_\alpha = \{x \in \text{dom} f \mid f(x) \leq \alpha\}$$

i.e. the set of points in the domain of $f$ at which the function takes a value less than or equal to $\alpha$.

Back to Figure 16.12: in the top row, the α-sublevel sets are the regions enclosed by the level sets.

In the bottom row, left, the α-sublevel sets are the half-planes below the level-set lines. In the middle plot, the α-sublevel sets are the regions bounded by the coordinate axes and the level sets.

In the bottom row, right, the α-sublevel sets are a bit harder to picture. For $\alpha > 0$, the level sets are the yellow or red curves. The corresponding α-sublevel sets are the regions squeezed inward, bounded by the curves of the same color. These regions are clearly not convex.

Theorem: If a function is convex, then all of its α-sublevel sets are convex. The converse is not necessarily true: if all α-sublevel sets of a function are convex, the function need not be convex.

This shows that if there exists a value $\alpha$ for which an α-sublevel set of a function is not convex, then the function is not convex (note that not convex does not mean concave). Therefore the hyperbolic function is not convex.

The examples in Figure 16.12, except for the last one, all correspond to convex functions.

An example of a function that is not convex although all of its α-sublevel sets are convex is $f(x, y) = -e^{x+y}$. All of its α-sublevel sets are half-planes, which are convex, but the function is not convex (in this case it is concave).

Below is another example of a function whose α-sublevel sets are all convex but which is not a convex function.

Figure 16.13: All α-sublevel sets are convex sets but the function is nonconvex.

Every α-sublevel set of this function is a disk, which is convex, yet the function is not convex, because we can find two points on the surface such that the segment joining them lies entirely below the surface (for example a point on the brim and a point at the bottom).

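Functions whose domain is a convex set and whose α-sublevel sets are all convex are called quasiconvex. Every convex function is quasiconvex, but the converse is not true. The formal definition of a quasiconvex function is stated as follows:

Quasiconvex function: A function $f: C \to \mathbb{R}$, with $C$ a convex subset of $\mathbb{R}^n$, is called quasiconvex if for all $x, y \in C$ and all $\theta \in [0, 1]$ we have:

$$f(\theta x + (1 - \theta) y) \leq \max\{f(x), f(y)\}$$

This definition differs slightly from the definition of a convex function: the right-hand side is the larger of $f(x)$ and $f(y)$ rather than their weighted average.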
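The difference between the two definitions can be seen on the function $f(x, y) = -e^{x+y}$ discussed above. The sketch below evaluates both inequalities at one pair of points; the points $(0, 0)$, $(2, 0)$ and $\theta = 0.5$ are arbitrary choices made for this illustration.

```python
import math


def f(x, y):
    # The example function f(x, y) = -exp(x + y).
    return -math.exp(x + y)


p, q, theta = (0.0, 0.0), (2.0, 0.0), 0.5
mid = (theta * p[0] + (1 - theta) * q[0], theta * p[1] + (1 - theta) * q[1])

f_mid = f(*mid)
convex_bound = theta * f(*p) + (1 - theta) * f(*q)   # weighted average of the two values
quasiconvex_bound = max(f(*p), f(*q))                # larger of the two values

convex_inequality_holds = f_mid <= convex_bound
quasiconvex_inequality_holds = f_mid <= quasiconvex_bound
print(f_mid, convex_bound, quasiconvex_bound)
print(convex_inequality_holds, quasiconvex_inequality_holds)
```

Here $f$ at the midpoint is $-e \approx -2.72$, which is above the weighted average $-(1 + e^2)/2 \approx -4.19$ (so the convexity inequality fails) but below the maximum $-1$ (so the quasiconvexity inequality holds at these points).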

16.3.5 Checking convexity using derivatives

There is a way to tell whether a differentiable function is convex by using its first or second derivatives.

First-order condition

First, we define the equation of the tangent line (plane) of a differentiable function $f$ at a point $(x_0, f(x_0))$ on its graph (surface). For a function of one variable, the familiar form is:

$$y = f'(x_0)(x - x_0) + f(x_0)$$

Figure 16.14: Checking convexity using the first derivative. For $f$ differentiable with convex domain, $f$ is convex iff $f(x) \geq f(x_0) + \nabla f(x_0)^T (x - x_0), \forall x, x_0 \in \text{dom} f$. Left: a convex function, whose tangent at every point lies below the graph; right: a nonconvex function.

For a function of several variables, let $\nabla f(x_0)$ be the gradient of $f$ at the point $x_0$. The tangent plane is given by:

$$y = \nabla f(x_0)^T (x - x_0) + f(x_0)$$

The first-order condition states: suppose the function $f$ has a convex domain and is differentiable at every point of that domain. Then $f$ is convex if and only if, for all $x, x_0$ in its domain:

$$f(x) \geq f(x_0) + \nabla f(x_0)^T (x - x_0) \quad (16.4)$$

Similarly, a function is strictly convex if equality in (16.4) holds if and only if $x = x_0$.

More intuitively, a function is convex if the tangent line (plane) at any point on its graph (surface) lies below that graph (surface).

(Do not forget the condition that the domain is convex.)

Figure 16.14 gives an example of a convex function and a nonconvex function.

The function on the left is convex. The function on the right is not convex because its graph lies both above and below the tangent.

(iff is short for if and only if)

Example: If a symmetric matrix $A$ is positive definite, then the function $f(x) = x^T A x$ is convex.

Proof: The first derivative of this function is:

$$\nabla f(x) = 2Ax$$

So the first-order condition can be written as (note that $A$ is a symmetric matrix):

$$
\begin{aligned}
& x^T A x \geq 2 (A x_0)^T (x - x_0) + x_0^T A x_0 \\
\Leftrightarrow\ & x^T A x \geq 2 x_0^T A x - x_0^T A x_0 \\
\Leftrightarrow\ & (x - x_0)^T A (x - x_0) \geq 0
\end{aligned}
$$

The last inequality holds by the definition of a positive definite matrix. Therefore $f(x) = x^T A x$ is a convex function.
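The proof can be mirrored numerically: for $f(x) = x^T A x$, the gap between $f(x)$ and its tangent plane at $x_0$ equals $(x - x_0)^T A (x - x_0)$. The sketch below evaluates that gap on a small grid; the matrix $A$, the grid, and the helper names are assumptions chosen for this illustration.

```python
A = [[2.0, 1.0], [1.0, 2.0]]  # symmetric example matrix with eigenvalues 1 and 3


def mat_vec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def f(x):
    return dot(x, mat_vec(A, x))


def grad_f(x):
    # Gradient of x^T A x for symmetric A is 2 A x.
    return [2.0 * g for g in mat_vec(A, x)]


def first_order_gap(x, x0):
    """f(x) - [f(x0) + grad_f(x0)^T (x - x0)]; nonnegative when f is convex."""
    diff = [a - b for a, b in zip(x, x0)]
    return f(x) - f(x0) - dot(grad_f(x0), diff)


grid = [(-2.0 + i, -2.0 + j) for i in range(5) for j in range(5)]
gaps = [first_order_gap(x, x0) for x in grid for x0 in grid]
print(min(gaps))
```

The smallest gap on the grid is attained when $x = x_0$, where it is zero, which matches the strict-convexity remark about equality in (16.4).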

The first-order condition is rarely used to establish the convexity of a function. Instead, people usually use the second-order condition for functions that have second derivatives.

Second-order condition

For a function of several variables, i.e. whose variable is a vector of dimension $d$, the first derivative is also a vector of dimension $d$. The second derivative is a square matrix of size $d \times d$. The second derivative of a function $f(x)$ is denoted by $\nabla^2 f(x)$. It is also called the Hessian.

Second-order condition: A twice-differentiable function is convex if $\text{dom} f$ is convex and its Hessian is a positive semidefinite matrix for every $x$ in the domain:

$$\nabla^2 f(x) \succeq 0.$$

If the Hessian is a positive definite matrix, the function is strictly convex. Similarly, if the Hessian is a negative definite matrix, the function is strictly concave.

For a function of one variable $f(x)$, this condition is equivalent to $f''(x) \geq 0$ for all $x$ in the domain (and the domain is convex).

Examples:

The negative entropy function $f(x) = x \log(x)$ is strictly convex because its domain $x > 0$ is a convex set and $f''(x) = 1/x$ is positive for every $x$ in the domain.

The function $f(x) = x^2 + 5\sin(x)$ is not convex because its second derivative $f''(x) = 2 - 5\sin(x)$ can take negative values.

The cross entropy function is strictly convex. Consider the simple case with only two probabilities $x$ and $1 - x$, where $a$ is a constant in $[0, 1]$ and $0 < x < 1$: $f(x) = -(a \log(x) + (1 - a) \log(1 - x))$ has second derivative $\frac{a}{x^2} + \frac{1 - a}{(1 - x)^2}$, which is positive.

If $A$ is a positive definite matrix, then $f(x) = \frac{1}{2} x^T A x$ is convex because its Hessian is $A$, a positive definite matrix.

Consider the negative entropy function of two variables: $f(x, y) = x \log(x) + y \log(y)$ on the set of positive values of $x$ and $y$. Its first derivative is $[\log(x) + 1, \log(y) + 1]^T$ and its Hessian is

$$\begin{bmatrix} 1/x & 0 \\ 0 & 1/y \end{bmatrix},$$

a diagonal matrix with positive diagonal entries, hence a positive definite matrix. So negative entropy is a strictly convex function. (Recall that a matrix is positive definite if all of its eigenvalues are positive. For a diagonal matrix, the eigenvalues are the diagonal entries.)
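The one-variable examples above can be cross-checked with a finite-difference estimate of the second derivative. This is only a rough numerical sketch: the step size `h`, the helper name `second_derivative`, and the name `wavy` for $x^2 + 5\sin(x)$ are assumptions made here, and the evaluation points are chosen for convenience.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def neg_entropy(x):
    return x * math.log(x)


def wavy(x):
    return x * x + 5.0 * math.sin(x)


# Analytic values from the text: 1/x at x = 2, and 2 - 5 sin(x) at x = pi/2.
d2_entropy = second_derivative(neg_entropy, 2.0)
d2_wavy = second_derivative(wavy, math.pi / 2.0)
print(d2_entropy, d2_wavy)
```

The first estimate is close to $0.5 > 0$, consistent with convexity at that point, while the second is close to $-3 < 0$, which is enough to rule out convexity of $x^2 + 5\sin(x)$.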

There are many more interesting properties of convex functions. Readers are encouraged to read Chapter 3 of the book Convex Optimization listed in the references.

16.4 Summary

Machine Learning and Optimization are closely related. Within Optimization, Convex Optimization is the most important part. A problem is a convex optimization problem if the objective function is convex and the set of points satisfying the constraints is a convex set.

In a convex set, every line segment joining two points of the set lies entirely within the set. The intersection of convex sets is a convex set.

A function is convex if the line segment joining any two points on its graph does not lie below the graph.

A differentiable function is convex if its domain is convex and its tangent line (plane) never lies above its graph (surface).

Norms are convex functions and are widely used in optimization.

16.5 References

[1] Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, 2004.



17

Convex Optimization Problems

The content of this post is mainly translated from Chapter 4 of the book Convex Optimization listed in the References.

17.1 Introduction

I begin this post with three fairly practical problems:

17.1.1 The publisher problem

Problem

A publisher receives orders for 600 copies of the book "Machine Learning cơ bản" from Thái Bình and 400 copies from Hải Phòng. The publisher has 800 copies in its Nam Định warehouse and 700 copies in its Hải Dương warehouse. Shipping one copy from Nam Định to Thái Bình costs 50,000 VND (50k), and to Hải Phòng 100k. Shipping one copy from Hải Dương to Thái Bình costs 150k, while to Hải Phòng it costs only 40k. To minimize the shipping cost, how many copies should the company ship from each warehouse to each destination?

Analysis

For simplicity, we build the following table of shipment quantities from source to destination:

Source      Destination   Unit price (10k VND)   Quantity
Nam Định    Thái Bình     5                      x
Nam Định    Hải Phòng     10                     y
Hải Dương   Thái Bình     15                     z
Hải Dương   Hải Phòng     4                      t

The total cost (objective function) is $f(x, y, z, t) = 5x + 10y + 15z + 4t$. The constraints, written as mathematical expressions, are:

Ship 600 copies to Thái Bình: $x + z = 600$.

Ship 400 copies to Hải Phòng: $y + t = 400$.

Take no more than 800 copies from the Nam Định warehouse: $x + y \leq 800$.

Take no more than 700 copies from the Hải Dương warehouse: $z + t \leq 700$.

$x, y, z, t$ are natural numbers. The requirement that the variables be natural numbers makes the problem very hard to solve when the number of variables is large. For this problem, we assume that $x, y, z, t$ are nonnegative real numbers. Once a solution is found, if its values are not natural numbers, we take the nearest natural numbers.

So we need to solve the following optimization problem:

The publisher problem:

$$
\begin{aligned}
(x, y, z, t) = \arg\min_{x, y, z, t}\ & 5x + 10y + 15z + 4t & (17.1) \\
\text{subject to: } & x + z = 600 & (17.2) \\
& y + t = 400 & (17.3) \\
& x + y \leq 800 & (17.4) \\
& z + t \leq 700 & (17.5) \\
& x, y, z, t \geq 0 & (17.6)
\end{aligned}
$$

Note that the objective function is a linear function of the variables $x, y, z, t$. The constraints all have the form of hyperplanes or halfspaces, i.e. they are linear constraints. An optimization problem in which both the objective function and the constraints are linear is called Linear Programming (LP). The general form and the way to solve such a problem programmatically are given in a later part of this post.

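The solution to this problem can be seen immediately: $x = 600, y = 0, z = 0, t = 400$, since each destination is then served entirely from its cheaper warehouse and neither warehouse capacity is exceeded. With more constraints and more variables, we need a solution that can be computed by programming.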
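For a problem this small, the claimed solution can be confirmed by exhaustive search over integer shipments. The sketch below is not an LP solver and is not how larger problems should be solved; it only uses the equalities (17.2) and (17.3) to eliminate $z$ and $t$, then tries every integer pair $(x, y)$. The function name `shipping_cost` is a choice made for this illustration.

```python
def shipping_cost(x, y, z, t):
    """Total cost in units of 10k VND, as in objective (17.1)."""
    return 5 * x + 10 * y + 15 * z + 4 * t


best = None
for x in range(0, 601):           # copies Nam Dinh -> Thai Binh
    z = 600 - x                   # (17.2): x + z = 600
    for y in range(0, 401):       # copies Nam Dinh -> Hai Phong
        t = 400 - y               # (17.3): y + t = 400
        if x + y > 800 or z + t > 700:    # capacities (17.4) and (17.5)
            continue
        cost = shipping_cost(x, y, z, t)
        if best is None or cost < best[0]:
            best = (cost, x, y, z, t)

print(best)  # → (4600, 600, 0, 0, 400)
```

The minimum cost is 4600 units of 10k VND, i.e. 46 million VND, reached at $x = 600, y = 0, z = 0, t = 400$.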

17.1.2 The farming problem

Problem

A farmer has a total of 10 ha (10 hectares) of farmland. He plans to grow coffee and pepper on this land, with a total planting cost of no more than 16 million VND. Planting coffee costs 2 million VND per ha, and planting pepper costs 1 million VND per ha. Planting coffee takes 1 day per ha and pepper takes 4 days per ha, while he has only 32 days in total. After subtracting all costs (including the planting cost), each ha of coffee brings a profit of 5 million VND and each ha of pepper brings a profit of 3 million VND. How should he plant to maximize his profit? (The numbers may be unrealistic, since they were chosen to give the problem a clean solution.)

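Analysis

Let $x$ and $y$ be the number of hectares of coffee and pepper, respectively, that the farmer should plant. His profit is $f(x, y) = 5x + 3y$ (million VND).

The constraints in this problem are:

The total planted area does not exceed 10 ha: $x + y \leq 10$.

The total planting cost does not exceed 16 million VND: $2x + y \leq 16$.

The total planting time does not exceed 32 days: $x + 4y \leq 32$.

The areas of coffee and pepper are nonnegative: $x, y \geq 0$.

So we have the following optimization problem:

The farming problem:

$$
\begin{aligned}
(x, y) = \arg\max_{x, y} \quad & 5x + 3y && (17.7) \\
\text{subject to:} \quad & x + y \leq 10 && (17.8) \\
& 2x + y \leq 16 && (17.9) \\
& x + 4y \leq 32 && (17.10) \\
& x, y \geq 0 && (17.11)
\end{aligned}
$$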

This problem differs slightly in that we need to maximize the objective function instead of minimizing it. Converting it to a minimization problem is done simply by flipping the sign of the objective function. The objective is then still linear and the constraints are still linear constraints, so we have another Linear Programming (LP) problem.

[Figure 17.1: Illustration of the solution of the farming problem. The gray pentagon, bounded by the lines $x + 4y = 32$, $x + y = 10$, $2x + y = 16$ and the two axes, is the feasible set $\mathcal{X}$ of points satisfying the constraints. The dashed lines are level sets $5x + 3y = b$ ($b$ a constant) of the objective function; the redder the line, the higher the value. The solution is the blue point, where the gray pentagon meets the level set with the highest value.]

You can also read the solution off Figure 17.1.

The gray region is a polyhedron (in this case a polygon): it is the set of points satisfying constraints (17.8) to (17.11). The colored dashed lines are level sets of the objective function $5x + 3y$; each line corresponds to a different value, and redder lines correspond to higher values. Intuitively, the solution can be found by sliding the blue dashed line to the right (the direction in which the objective value increases) until just before it loses all contact with the gray polygon.

The solution is the blue point, which is the intersection of the two lines $x + y = 10$ and $2x + y = 16$. Solving this system gives $x^* = 6$ and $y^* = 4$. That is, the farmer should plant 6 ha of coffee and 4 ha of pepper. The resulting profit is $5x^* + 3y^* = 42$ million VND, and the work takes only $6 + 4 \times 4 = 22$ days. (A little calculation makes all the difference: less work, more reward.)
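The graphical argument amounts to checking the corners of the feasible polygon. The sketch below does that numerically: it intersects every pair of constraint boundaries, keeps the feasible intersection points, and picks the one with the largest profit. This is an illustration for two variables only; the helper names (`intersection`, `is_feasible`) are hypothetical and exact arithmetic with `fractions.Fraction` is a choice made here to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations

# Constraints written as a*x + b*y <= c, taken from (17.8)-(17.11)
constraints = [
    (1, 1, 10),    # x + y <= 10
    (2, 1, 16),    # 2x + y <= 16
    (1, 4, 32),    # x + 4y <= 32
    (-1, 0, 0),    # x >= 0
    (0, -1, 0),    # y >= 0
]

def intersection(c1, c2):
    """Intersection of two boundary lines (Cramer's rule), or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return Fraction(r1 * b2 - r2 * b1, det), Fraction(a1 * r2 - a2 * r1, det)

def is_feasible(point):
    x, y = point
    return all(a * x + b * y <= c for a, b, c in constraints)

# Vertices of the feasible polygon are the feasible pairwise intersections
vertices = []
for c1, c2 in combinations(constraints, 2):
    p = intersection(c1, c2)
    if p is not None and is_feasible(p):
        vertices.append(p)

best = max(vertices, key=lambda p: 5 * p[0] + 3 * p[1])
profit = 5 * best[0] + 3 * best[1]
days = best[0] + 4 * best[1]
print(best, profit, days)
```

Enumerating vertices works here because an LP with a bounded feasible set attains its optimum at a vertex, but the number of candidate intersections grows quickly with the number of constraints and variables.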

With more variables and more constraints, can we still draw a picture like this to spot the solution? My answer is that we should find a tool that returns the solution almost immediately, whatever the number of variables and the kinds of constraints.

17.1.3 The box-design problem

Problem

A company must move 400 m^3 of sand to a construction site across a river by renting a barge. Besides the barge's transport cost of 100k VND per round trip, the company must design a rectangular box placed on the barge to hold the sand. The box needs no lid; the side faces cost 1 million VND per m^2 and the bottom costs 2 million VND per m^2. What dimensions should the box have so that the total transport cost is smallest? For simplicity, assume the sand is filled level with or below the top of the box walls, with no heap. Assume further that the barge is infinitely wide and can carry unlimited weight; these assumptions make the problem easier to solve.

Analysis

Suppose the box has length $x$ (m), width $y$ and height $z$. The volume of the box is $xyz$ (in m^3). There are two kinds of cost:

Barge rental: the number of barge trips needed is $\frac{400}{xyz}$ (let us assume for now that this is a natural number; rounding would not change the result much because the cost of one trip is small compared with the cost of building the box). The amount paid for the barge is $0.1 \times \frac{400}{xyz} = \frac{40}{xyz}$ (million VND).

Box construction: the area of the side faces is $2(x + y)z$ and the area of the bottom is $xy$. So the total construction cost is $2(x + y)z + 2xy = 2(xy + yz + zx)$.

The total cost is $f(x, y, z) = 40x^{-1}y^{-1}z^{-1} + 2(xy + yz + zx)$. The only constraint is that the box dimensions must be positive. So we have the following optimization problem:

The box-design problem:

$$
\begin{aligned}
(x, y, z) = \arg\min_{x, y, z} \quad & 40x^{-1}y^{-1}z^{-1} + 2(xy + yz + zx) \\
\text{subject to:} \quad & x, y, z > 0 && (17.12)
\end{aligned}
$$

This problem belongs to the class of Geometric Programming (GP) problems. The definition of GP and how to use an optimization tool for it are presented later in this chapter.

This particular problem can be solved entirely with the AM-GM inequality (known as the Cauchy inequality in Vietnamese textbooks), but I still want a solution method for the general problem that can be programmed.

(Solution:

$$
f(x, y, z) = \frac{20}{xyz} + \frac{20}{xyz} + 2xy + 2yz + 2zx \geq 5\sqrt[5]{3200}
$$

with equality if and only if $x = y = z = \sqrt[5]{10}$. This problem would suit an exam because the data are so clean. Personally, I prefer problems posed this way to requests for the minimum of some dull expression; many students feel they do not know what inequalities are for!)
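The AM-GM bound above can be checked numerically. The sketch below evaluates the cost at the claimed optimum and compares it with the bound $5\sqrt[5]{3200}$, then samples random boxes to see that none does better. The function name `total_cost` and the sampling range are assumptions made for this illustration; random sampling is evidence, not a proof.

```python
import random

def total_cost(x, y, z):
    """Barge rental plus box construction cost, in million VND."""
    return 40.0 / (x * y * z) + 2.0 * (x * y + y * z + z * x)

side = 10 ** (1 / 5)                 # claimed optimal edge length, 10^(1/5)
lower_bound = 5 * 3200 ** (1 / 5)    # AM-GM lower bound on the cost

print(side, total_cost(side, side, side), lower_bound)

# Random positive boxes should never beat the bound
random.seed(0)
samples = [tuple(random.uniform(0.1, 5.0) for _ in range(3)) for _ in range(10000)]
never_below = all(total_cost(*s) >= lower_bound - 1e-9 for s in samples)
print(never_below)
```

The two printed cost values agree (about 25.12 million VND), which is the equality case of AM-GM.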

If there were constraints on the size of the box and on the weight the barge can carry, could we still find a solution this simple?

The problems above are all optimization problems. More precisely, they are all convex optimization problems (the last one after a change of variables), as you will see later. Finding their solutions is not very hard; even solving by hand gives the answer. However, the goal of this chapter is not to show how to solve such problems by hand, but how to recognize them and bring them to forms that existing toolboxes can handle. In practice, the amount of data and the number of variables to optimize are far larger, and we cannot solve such problems by hand.

First, we need to understand the concepts behind convex optimization problems and why convexity matters. (You can skip to Section 17.4 if you do not want the mathematical concepts and theorems in Sections 17.2 and 17.3.)

17.2 Optimization problems revisited

17.2.1 Basic concepts

Recall the general form of an optimization problem:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & f_0(x) \\
\text{subject to:} \quad & f_i(x) \leq 0, \quad i = 1, 2, \dots, m && (17.13) \\
& h_j(x) = 0, \quad j = 1, 2, \dots, p
\end{aligned}
$$

In words: find the value of the variable $x$ that minimizes $f_0(x)$ among all values of $x$ satisfying the constraints. The English and Vietnamese names of the terms are listed in Table 17.1.

In addition:

When $m = p = 0$, problem (17.13) is called an unconstrained optimization problem.

$\mathcal{D}$ denotes the domain, i.e. the intersection of the domains of all functions appearing in the problem. The set of points satisfying all constraints, which is in general a subset of $\mathcal{D}$, is called the feasible set or constraint set. When the feasible set is empty, we say the optimization problem (17.13) is infeasible. If a point lies in the feasible set, we call it feasible.

Symbol                                                          English                          Vietnamese
$x \in \mathbb{R}^n$                                            optimization variable            biến tối ưu
$f_0 : \mathbb{R}^n \to \mathbb{R}$                             objective/loss/cost function     hàm mục tiêu
$f_i(x) \leq 0$                                                 inequality constraint            bất đẳng thức ràng buộc
$f_i : \mathbb{R}^n \to \mathbb{R}$                             inequality constraint function   hàm bất đẳng thức ràng buộc
$h_j(x) = 0$                                                    equality constraint              đẳng thức ràng buộc
$h_j : \mathbb{R}^n \to \mathbb{R}$                             equality constraint function     hàm đẳng thức ràng buộc
$\mathcal{D} = \bigcap_{i=0}^m \text{dom} f_i \cap \bigcap_{j=1}^p \text{dom} h_j$   domain     tập xác định

Table 17.1: Terminology used in optimization problems.

The optimal value of the optimization problem (17.13) is defined as:

$$
p^* = \inf \{ f_0(x) \mid f_i(x) \leq 0, \ i = 1, \dots, m; \ h_j(x) = 0, \ j = 1, \dots, p \}
$$

where $\inf$ is short for infimum. $p^*$ may take the values $\pm\infty$. If the problem is infeasible, we set $p^* = +\infty$. If the objective function is unbounded below on the feasible set, we set $p^* = -\infty$.

17.2.2 Optimal and locally optimal points

A point $x^*$ is called an optimal point, or a solution of problem (17.13), if $x^*$ is feasible and $f_0(x^*) = p^*$. The set of all optimal points is called the optimal set.

If the optimal set is nonempty, we say problem (17.13) is solvable. Conversely, if the optimal set is empty, we say the optimal value is not attained (not achieved).

Example: consider the objective function $f(x) = 1/x$ with the constraint $x > 0$. The optimal value of this problem is $p^* = 0$, but the optimal set is empty because no value of $x$ makes the objective equal to 0. We then say the optimal value is not attained.

For a function of one variable, a point is a local minimum if the function attains its smallest value there within some neighborhood (and this neighborhood lies inside the domain of the function). In one dimension, a neighborhood of a point consists of the points whose absolute difference from it is smaller than some value.

In optimization (usually in many dimensions), we call a point $x$ locally optimal if there exists a value $R > 0$ (usually called the radius) such that:

$$
f_0(x) = \inf \{ f_0(z) \mid f_i(z) \leq 0, \ i = 1, \dots, m, \ h_j(z) = 0, \ j = 1, \dots, p, \ \|z - x\|_2 \leq R \} \quad (17.14)
$$

If a feasible point $x$ satisfies $f_i(x) = 0$, we say the $i$-th inequality constraint $f_i(x) \leq 0$ is active at $x$. If $f_i(x) < 0$, we say this constraint is inactive at $x$.

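17.2.3 A few remarks

Although the optimization problem (17.13) is defined as minimizing an objective function under constraints of the form "less than or equal to 0", problems that maximize the objective or have other kinds of constraints can all be brought to this form:

$\max f_0(x) \Leftrightarrow \min -f_0(x)$.

$f_i(x) \leq g(x) \Leftrightarrow f_i(x) - g(x) \leq 0$.

$f_i(x) \geq 0 \Leftrightarrow -f_i(x) \leq 0$.

$a \leq f_i(x) \leq b \Leftrightarrow f_i(x) - b \leq 0$ and $a - f_i(x) \leq 0$.

$f_i(x) \leq 0 \Leftrightarrow f_i(x) + s_i = 0$ and $s_i \geq 0$. Here $s_i$ is called a slack variable. This simple transformation is effective in many cases because the inequality $s_i \geq 0$ is usually easier to handle than $f_i(x) \leq 0$.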

17.3 Convex optimization problems

In optimization, we are particularly interested in problems whose objective function is convex and whose feasible set is also convex.

17.3.1 Definition

A convex optimization problem is an optimization problem of the form:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & f_0(x) \\
\text{subject to:} \quad & f_i(x) \leq 0, \quad i = 1, 2, \dots, m && (17.15) \\
& a_j^T x - b_j = 0, \quad j = 1, \dots, p
\end{aligned}
$$

where $f_0, f_1, \dots, f_m$ are convex functions.

Compared with the general problem (17.13), the convex problem (17.15) has three additional requirements:

The objective function is convex.

The inequality constraint functions $f_i$ are convex.

The equality constraint functions $h_j$ are affine (a linear function plus a constant is called affine).

A few observations:

The set of points satisfying $h_j(x) = 0$ is convex because it is a hyperplane.

When $f_i$ is convex, the set of points satisfying $f_i(x) \leq 0$ is the 0-sublevel set of $f_i$ and is convex.

Hence the set of points satisfying all constraints is an intersection of convex sets, so it is convex.

So, in a convex optimization problem, we minimize a convex objective function over a convex set.

17.3.2 A local minimum of a convex problem is a global optimum

The most important property of convex optimization problems is that any locally optimal point is a (globally) optimal point.

This property can be proved by contradiction. Let $x_0$ be a locally optimal point, i.e.:

$$
f_0(x_0) = \inf \{ f_0(x) \mid x \text{ is feasible}, \ \|x - x_0\|_2 \leq R \}
$$

for some $R > 0$. Suppose $x_0$ is not globally optimal, i.e. there exists a feasible point $y$ such that $f_0(y) < f_0(x_0)$ (clearly $y$ does not lie in the neighborhood under consideration). We can choose $\theta \in (0, 1)$ small enough that $z = (1 - \theta)x_0 + \theta y$ lies in the neighborhood of $x_0$, i.e. $\|z - x_0\|_2 < R$. Note that $z$ is also feasible because the feasible set is convex. Moreover, since the objective function $f_0$ is convex, we have:

$$
\begin{aligned}
f_0(z) &= f_0((1 - \theta)x_0 + \theta y) && (17.16) \\
&\leq (1 - \theta) f_0(x_0) + \theta f_0(y) && (17.17) \\
&< (1 - \theta) f_0(x_0) + \theta f_0(x_0) && (17.18) \\
&= f_0(x_0) && (17.19)
\end{aligned}
$$

[Figure 17.2: Geometric interpretation of the optimality condition for a differentiable objective function. The colored dashed lines are level sets. The figure shows the feasible set $\mathcal{X}$, the optimal point $x_0$ with the vectors $\nabla f_0(x_0)$ and $-\nabla f_0(x_0)$, another feasible point $x$, and the supporting hyperplane at $x_0$.]

This contradicts the assumption that $x_0$ is a local minimum. So the supposition is false, i.e. $x_0$ is a globally optimal point, which is what we wanted to prove.

The proof in words: suppose a local minimum is not a point where the function attains its smallest value. Because the feasible set and the objective function are convex, we can always find another point in the neighborhood of the local minimum at which the objective value is smaller than at the local minimum. This contradiction shows that, for a convex optimization problem, a local minimum must be a point where the function attains its smallest value.

17.3.3 Optimality condition for a differentiable objective

If the objective function $f_0$ is differentiable, then by the first-order condition for convexity, for all $x, x_0 \in \text{dom} f_0$ we have:

$$
f_0(x) \geq f_0(x_0) + \nabla f_0(x_0)^T (x - x_0) \quad (17.20)
$$

Let $\mathcal{X}$ be the feasible set. A necessary and sufficient condition for a point $x_0 \in \mathcal{X}$ to be optimal is:

$$
\nabla f_0(x_0)^T (x - x_0) \geq 0, \quad \forall x \in \mathcal{X} \quad (17.21)
$$

I skip the proof of this necessary and sufficient condition; you can find it on pages 139-140 of the book Convex Optimization listed in the References.

Geometrically, the condition says: if $x_0$ is optimal, then for every $x \in \mathcal{X}$, the vector from $x_0$ to $x$ makes an angle of at least 90 degrees with the vector $-\nabla f_0(x_0)$ (see Figure 17.2). In other words, if we draw the tangent plane of the level set of the objective at $x_0$, every feasible point lies on one side of this plane. Moreover, the feasible set lies on the side where the objective takes values at least $f_0(x_0)$. This tangent plane is a supporting hyperplane of the feasible set at $x_0$. Recall that when drawing level sets, I usually use blue for small values and red for large values of the function.

(A plane that passes through a point on the boundary of a set, such that every point of the set lies on one side of it (or on it), is called a supporting hyperplane. If a set is convex, a supporting hyperplane exists at every point on its boundary.)

If there exists a point $x_0$ in the feasible set such that $\nabla f_0(x_0) = 0$, it is an optimal point. This is easy to see because it is a point where the gradient vanishes, i.e. a minimum of the (convex) objective function. If $\nabla f_0(x_0) \neq 0$, the vector $-\nabla f_0(x_0)$ is a normal vector of the supporting hyperplane at $x_0$.
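Condition (17.21) can be tested numerically on a small instance. The sketch below assumes, purely for illustration, the objective $f_0(x, y) = (x - 10)^2 + (y - 10)^2$ over the feasible set of the farming problem, and checks the inequality on a grid of feasible points for two candidates. The helper names and the grid step are hypothetical; a grid check is a spot test, not a proof.

```python
def grad_f0(x, y):
    """Gradient of the assumed objective f0(x, y) = (x - 10)^2 + (y - 10)^2."""
    return 2 * (x - 10), 2 * (y - 10)

def is_feasible(x, y):
    """Feasible set of the farming problem, constraints (17.8)-(17.11)."""
    return (x + y <= 10 and 2 * x + y <= 16 and x + 4 * y <= 32
            and x >= 0 and y >= 0)

def satisfies_condition(x0, y0, step=0.25):
    """Check grad_f0(x0)^T (x - x0) >= 0 on a grid of feasible points x."""
    gx, gy = grad_f0(x0, y0)
    n = int(10 / step)
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = i * step, j * step
            if is_feasible(x, y) and gx * (x - x0) + gy * (y - y0) < -1e-12:
                return False
    return True

print(satisfies_condition(5.0, 5.0))   # candidate optimum of the assumed objective
print(satisfies_condition(6.0, 4.0))   # feasible, but not optimal for this objective
```

At $(5, 5)$ the gradient is $(-10, -10)$, so the condition reduces to $x + y \leq 10$, which every feasible point satisfies. At $(6, 4)$ the feasible point $(5, 5)$ violates the inequality.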

17.3.4 Introducing the CVXOPT library

CVXOPT is a free Python library that solves many of the problems in the book Convex Optimization listed in the References. The second author of that book, Lieven Vandenberghe, is a co-author of the library. Installation instructions, documentation and examples are all available on the CVXOPT website.

In the rest of this chapter, I introduce three basic problem classes in Convex Optimization: Linear Programming, Quadratic Programming and Geometric Programming. We will also write code that solves the examples from the beginning of the chapter with CVXOPT.

17.4 Linear Programming

We start with the simplest class of problems in Convex Optimization: Linear Programming (LP, also called a Linear Program in some references), in which the objective function $f_0$ and the inequality constraint functions $f_i, i = 1, \dots, m$ are all linear functions plus a constant (i.e. affine functions).

17.4.1 General form of an LP

A general LP:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & c^T x + d \\
\text{subject to:} \quad & Gx \preceq h && (17.22) \\
& Ax = b
\end{aligned}
$$

where $G \in \mathbb{R}^{m \times n}$, $h \in \mathbb{R}^m$, $A \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^p$, $c, x \in \mathbb{R}^n$ and $d$ is a scalar (this scalar can be dropped because it does not affect the solution of the problem; it only changes the value of the objective function). Recall that the symbol $\preceq$ means that every entry of the vector (matrix) on the left-hand side is less than or equal to the corresponding entry on the right-hand side.

Note that several inequalities of the form $g_i x \leq h_i$, with $g_i$ row vectors, can be stacked into $Gx \preceq h$, where each row of $G$ is one $g_i$ and each entry of $h$ is the corresponding $h_i$.

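17.4.2 Standard form of an LP

In a standard form LP, the only inequality constraints are that the components of the solution are nonnegative:

A standard form LP:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & c^T x \\
\text{subject to:} \quad & Ax = b && (17.23) \\
& x \succeq 0
\end{aligned}
$$

Problem (17.22) can be brought to the form (17.23) by first introducing a slack variable $s$:

$$
\begin{aligned}
x^* = \arg\min_{x, s} \quad & c^T x \\
\text{subject to:} \quad & Ax = b && (17.24) \\
& Gx + s = h \\
& s \succeq 0
\end{aligned}
$$

Next, we write $x$ as the difference of two vectors whose components are all nonnegative: $x = x^+ - x^-$ with $x^+, x^- \succeq 0$. We can then rewrite (17.24) as:

$$
\begin{aligned}
x^* = \arg\min_{x^+, x^-, s} \quad & c^T x^+ - c^T x^- \\
\text{subject to:} \quad & Ax^+ - Ax^- = b && (17.25) \\
& Gx^+ - Gx^- + s = h \\
& x^+ \succeq 0, \ x^- \succeq 0, \ s \succeq 0
\end{aligned}
$$

At this point, (17.25) can be written compactly in the form (17.23) by stacking $x^+$, $x^-$ and $s$ into a single nonnegative variable.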
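To make the stacking step concrete, here is a minimal sketch in plain Python that builds the standard-form data for a problem with inequality constraints only (no $Ax = b$ block, for brevity). The function names `to_standard_form` and `to_standard_point` are hypothetical, and the example data are those of the farming problem written as a minimization.

```python
def to_standard_form(c, G, h):
    """Rewrite  min c^T x  s.t. Gx <= h  as  min c_std^T w  s.t. A_std w = h, w >= 0,
    where w = [x_plus, x_minus, s] stacks the three nonnegative blocks."""
    m, n = len(G), len(c)
    c_std = list(c) + [-ci for ci in c] + [0.0] * m
    A_std = []
    for i in range(m):
        slack = [1.0 if k == i else 0.0 for k in range(m)]   # i-th row of identity
        A_std.append(list(G[i]) + [-g for g in G[i]] + slack)
    return c_std, A_std, list(h)

def to_standard_point(x, G, h):
    """Map a feasible x of the original problem to w = [x_plus, x_minus, s]."""
    x_plus = [max(xi, 0.0) for xi in x]
    x_minus = [max(-xi, 0.0) for xi in x]
    s = [hi - sum(g * xi for g, xi in zip(row, x)) for row, hi in zip(G, h)]
    return x_plus + x_minus + s

# Farming problem as a minimization: min -5x - 3y subject to Gx <= h
c = [-5.0, -3.0]
G = [[1.0, 1.0], [2.0, 1.0], [1.0, 4.0], [-1.0, 0.0], [0.0, -1.0]]
h = [10.0, 16.0, 32.0, 0.0, 0.0]

c_std, A_std, b_std = to_standard_form(c, G, h)
w = to_standard_point([6.0, 4.0], G, h)

# The mapped point must satisfy A_std w = b_std, w >= 0, with the same objective
residual = [sum(a * wi for a, wi in zip(row, w)) - bi for row, bi in zip(A_std, b_std)]
objective = sum(ci * wi for ci, wi in zip(c_std, w))
print(w, residual, objective)
```

The standard form has $2n + m$ variables instead of $n$, which is the price paid for having only equality and nonnegativity constraints.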

The publisher problem and the farming problem at the beginning of this chapter are both LPs.

[Figure 17.3: Geometric illustration of Linear Programming. The figure shows the feasible set $\mathcal{X}$ (a polyhedron), the direction $-c$, the point $x_0$ where the objective is smallest and the point $x_1$ where it is largest.]

17.4.3 Geometric illustration of LP

LP problems can be illustrated as in Figure 17.3.

The point $x_0$ is where the objective function attains its smallest value, and $x_1$ is where it attains its largest value. For an LP, the solution, if it exists, is usually a vertex of the polyhedron, or a whole face of the polyhedron (when the level sets of the objective are parallel to that face and the objective attains its optimal value on it).

There is a great deal of material on LP in both Vietnamese (Quy hoạch tuyến tính) and English. Many practical problems can be brought to LP form. A commonly used method for solving these problems is the simplex method. I do not cover such methods here; instead, I show how to use the CVXOPT library to solve problems of this form.

17.4.4 Solving LPs with CVXOPT

I use the CVXOPT library to solve the farming problem above. Recall the problem:

The farming problem:

$$
\begin{aligned}
(x, y) = \arg\max_{x, y} \quad & 5x + 3y \\
\text{subject to:} \quad & x + y \leq 10 && (17.26) \\
& 2x + y \leq 16 \\
& x + 4y \leq 32 \\
& x, y \geq 0
\end{aligned}
$$

The constraints can be rewritten in the form $Gx \preceq h$, where:

$$
G = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 1 & 4 \\ -1 & 0 \\ 0 & -1 \end{bmatrix}, \quad
h = \begin{bmatrix} 10 \\ 16 \\ 32 \\ 0 \\ 0 \end{bmatrix}
$$

The solution of this problem using CVXOPT is:

```python
from cvxopt import matrix, solvers

# Minimize -5x - 3y, which is the same as maximizing 5x + 3y
c = matrix([-5., -3.])
# Each inner list is one COLUMN of G
G = matrix([[1., 2., 1., -1., 0.], [1., 1., 4., 0., -1.]])
h = matrix([10., 16., 32., 0., 0.])

solvers.options['show_progress'] = False
sol = solvers.lp(c, G, h)

print('Solution:')
print(sol['x'])
```

Output:

```
Solution:
[ 6.00e+00]
[ 4.00e+00]
```

This is exactly the solution found at the beginning of the chapter.

A few remarks:

The function solvers.lp of cvxopt solves problem (17.24).

In our problem we want the largest value, so we must change the objective to $-5x - 3y$. That is why c = matrix([-5., -3.]).

The function matrix takes a (Python) list as input, and this list represents a column vector. To represent a matrix, the input of matrix is a list of lists, where each inner list represents one column of the matrix.

The constants in the problem must be real numbers. If they are integers, add a "." after each number to mark it as a float. (This seems redundant to me, but without the "." the program reports an error.)

For the equality constraint $Ax = b$, solvers.lp uses None as the default value of A and b, i.e. if they are not given, there are no equality constraints.

For other options, see the CVXOPT documentation.

Solving the publisher problem with CVXOPT is left to the reader as a simple exercise.

[Figure 17.4: Geometric illustration of Quadratic Programming. The figure shows the feasible set $\mathcal{X}$ (a polyhedron), the optimal point $x_0$ and the vector $-\nabla f_0(x_0)$.]

17.5 Quadratic Programming

17.5.1 Definition of Quadratic Programming

A class of convex optimization problems that you will meet often in later chapters is Quadratic Programming (QP, or Quadratic Program). The only difference between a QP and an LP is that the objective function is a quadratic form:

Quadratic Programming:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & \frac{1}{2} x^T P x + q^T x + r \\
\text{subject to:} \quad & Gx \preceq h && (17.27) \\
& Ax = b
\end{aligned}
$$

where $P \in \mathbb{S}^n_+$ (the set of $n \times n$ symmetric positive semidefinite matrices), $G \in \mathbb{R}^{m \times n}$ and $A \in \mathbb{R}^{p \times n}$. The condition that $P$ is positive semidefinite guarantees that the objective function is convex.

LP is the special case of QP with $P = 0$.

In words: in a QP, we minimize a convex quadratic function over a polyhedron (see Figure 17.4).

17.5.2 An example of QP

A puzzle: an island has the shape of a convex polygon. In which direction should a boat out at sea head to reach the island fastest, assuming there are no waves and no wind?

The problem of the distance from a point to a polyhedron is stated as follows: given a polyhedron described by $Gx \preceq h$ and a point $u$, find the point $x$ in the polyhedron such that the Euclidean distance between $x$ and $u$ is smallest.

This can be written as:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & \frac{1}{2} \|x - u\|_2^2 \\
\text{subject to:} \quad & Gx \preceq h
\end{aligned}
$$

The objective attains its smallest value, 0, if $u$ lies inside the polyhedron, and the optimal point is then $x = u$. When $u$ does not lie inside the polyhedron, we write:

$$
\frac{1}{2} \|x - u\|_2^2 = \frac{1}{2} (x - u)^T (x - u) = \frac{1}{2} x^T x - u^T x + \frac{1}{2} u^T u
$$

This expression has the form of the objective in (17.27) with $P = I$, $q = -u$, $r = \frac{1}{2} u^T u$, where $I$ is the identity matrix.
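The identification $P = I$, $q = -u$, $r = \frac{1}{2} u^T u$ can be spot-checked numerically. The sketch below compares the two expressions at random points; the function names and the choice $u = (10, 10)$ are assumptions made for this illustration.

```python
import random

def half_sq_dist(x, u):
    """(1/2) * ||x - u||_2^2."""
    return 0.5 * sum((xi - ui) ** 2 for xi, ui in zip(x, u))

def qp_objective(x, u):
    """(1/2) x^T P x + q^T x + r with P = I, q = -u, r = (1/2) u^T u."""
    quad = 0.5 * sum(xi * xi for xi in x)            # (1/2) x^T I x
    lin = sum(-ui * xi for ui, xi in zip(u, x))      # q^T x
    r = 0.5 * sum(ui * ui for ui in u)               # constant term
    return quad + lin + r

random.seed(1)
u = [10.0, 10.0]
max_gap = 0.0
for _ in range(1000):
    x = [random.uniform(-20, 20) for _ in u]
    max_gap = max(max_gap, abs(half_sq_dist(x, u) - qp_objective(x, u)))

print(max_gap)   # only floating-point rounding error remains
```

Since $r$ is constant, it can be dropped when passing the problem to a solver: the minimizer does not change, only the reported objective value.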

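17.5.3 Solving a QP with CVXOPT

Consider the following problem (see Figure 17.5):

$$
\begin{aligned}
(x, y) = \arg\min_{x, y} \quad & (x - 10)^2 + (y - 10)^2 \\
\text{subject to:} \quad &
\begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 1 & 4 \\ -1 & 0 \\ 0 & -1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} \preceq
\begin{bmatrix} 10 \\ 16 \\ 32 \\ 0 \\ 0 \end{bmatrix}
\end{aligned}
$$

The feasible set of this problem is taken directly from the farming problem, and $u = [10, 10]^T$. The problem can be solved with CVXOPT as follows:

```python
from cvxopt import matrix, solvers

# Objective (1/2) x^T P x + q^T x with P = I, q = -u; this has the same
# minimizer as (x - 10)^2 + (y - 10)^2
P = matrix([[1., 0.], [0., 1.]])
q = matrix([-10., -10.])
# Each inner list is one COLUMN of G
G = matrix([[1., 2., 1., -1., 0.], [1., 1., 4., 0., -1.]])
h = matrix([10., 16., 32., 0., 0.])

solvers.options['show_progress'] = False
sol = solvers.qp(P, q, G, h)

print('Solution:')
print(sol['x'])
```

Output:

```
Solution:
[ 5.00e+00]
[ 5.00e+00]
```

[Figure 17.5: Example of the distance between a point and a polyhedron. The point $(10, 10)$ lies outside the feasible polygon bounded by $x + 4y = 32$, $x + y = 10$, $2x + y = 16$ and the two axes.]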

In Machine Learning algorithms, you will often meet the problem of finding the projection of a point onto a convex set. I discuss how to handle these problems where they arise.
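For a convex polygon in the plane, the projection can also be computed directly, which gives an independent check of the QP result for $u = (10, 10)$. The sketch below projects the point onto every edge and keeps the nearest candidate. It is specific to 2D polygons; the function names and the hard-coded vertex list of the farming polygon are assumptions made for this illustration.

```python
def project_on_segment(p, a, b):
    """Closest point to p on the segment from a to b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = min(1.0, max(0.0, t))            # clamp to stay on the segment
    return ax + t * dx, ay + t * dy

def inside(p):
    """Membership test for the farming polygon."""
    x, y = p
    return (x + y <= 10 and 2 * x + y <= 16 and x + 4 * y <= 32
            and x >= 0 and y >= 0)

def project_on_polygon(p, vertices):
    """Euclidean projection of p onto a convex polygon given by its vertices."""
    if inside(p):
        return p
    edges = zip(vertices, vertices[1:] + vertices[:1])
    candidates = [project_on_segment(p, a, b) for a, b in edges]
    return min(candidates, key=lambda c: (c[0] - p[0]) ** 2 + (c[1] - p[1]) ** 2)

# Vertices of the farming polygon, listed counterclockwise
vertices = [(0.0, 0.0), (8.0, 0.0), (6.0, 4.0), (8.0 / 3, 22.0 / 3), (0.0, 8.0)]
proj = project_on_polygon((10.0, 10.0), vertices)
print(proj)
```

The nearest point lies on the edge $x + y = 10$, in agreement with the solution $(5, 5)$ returned by the QP solver.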

17.6 Geometric Programming

In this section, we look at a class of problems that are not convex when judged by their objective and constraint functions, but that can be transformed into convex form with a few techniques.

First, we need a few definitions.

17.6.1 Monomials and posynomials

A function $f : \mathbb{R}^n \to \mathbb{R}$ with domain $\text{dom} f = \mathbb{R}^n_{++}$ (all components are positive) of the form:

$$
f(x) = c x_1^{a_1} x_2^{a_2} \dots x_n^{a_n} \quad (17.28)
$$

where $c > 0$ and $a_i \in \mathbb{R}$, is called a monomial function (this notion is close to the "đơn thức" I learned in grade 8, except that the textbook then allowed any $c$ and required the $a_i$ to be natural numbers).

A sum of monomials:

$$
f(x) = \sum_{k=1}^{K} c_k x_1^{a_{1k}} x_2^{a_{2k}} \dots x_n^{a_{nk}}
$$

where all $c_k > 0$, is called a posynomial function, or simply a posynomial.

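17.6.2 Geometric Programming

An optimization problem of the form:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & f_0(x) \\
\text{subject to:} \quad & f_i(x) \leq 1, \quad i = 1, 2, \dots, m && (17.29) \\
& h_j(x) = 1, \quad j = 1, 2, \dots, p
\end{aligned}
$$

where $f_0, f_1, \dots, f_m$ are posynomials and $h_1, \dots, h_p$ are monomials, is called a Geometric Programming (GP) problem. The condition $x \succ 0$ is implicit.

Note that if $f$ is a posynomial and $h$ is a monomial, then $f/h$ is a posynomial.

Example:

$$
\begin{aligned}
(x, y, z) = \arg\min_{x, y, z} \quad & x/y \\
\text{subject to:} \quad & 1 \leq x \leq 2 && (17.30) \\
& x^3 + 2y/z \leq \sqrt{y} \\
& x/y = z
\end{aligned}
$$

can be rewritten as a GP:

$$
\begin{aligned}
(x, y, z) = \arg\min_{x, y, z} \quad & x y^{-1} \\
\text{subject to:} \quad & x^{-1} \leq 1 && (17.31) \\
& (1/2) x \leq 1 \\
& x^3 y^{-1/2} + 2 y^{1/2} z^{-1} \leq 1 \\
& x y^{-1} z^{-1} = 1
\end{aligned}
$$

This problem is clearly nonconvex because neither the objective function nor the constraints are convex.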

17.6.3 Transforming a GP into convex form

A GP can be transformed into convex form as follows.

Let $y_i = \log(x_i)$, i.e. $x_i = \exp(y_i)$. If $f$ is a monomial function of $x$, then:

$$
f(x) = c (\exp(y_1))^{a_1} \dots (\exp(y_n))^{a_n} = \exp(a^T y + b)
$$

with $b = \log(c)$. The function $g(y) = \exp(a^T y + b)$ is then convex in $y$. (It is the composition of the convex function $\exp$ with an affine function of $y$, and composing a convex function with an affine map preserves convexity.)

Similarly, the posynomial defined in Section 17.6.1 can be written as:

$$
f(x) = \sum_{k=1}^{K} \exp(a_k^T y + b_k)
$$

where $a_k = [a_{1k}, \dots, a_{nk}]^T$ and $b_k = \log(c_k)$. The posynomial is now a sum of exponentials of affine functions (and is therefore convex; recall that a sum of convex functions is convex).
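The change of variables $y_i = \log(x_i)$ can be verified on the box-design objective. The sketch below evaluates the posynomial directly in $x$ and again as a sum of exponentials of affine functions of $y$; the function names and the test point are hypothetical choices made for this illustration.

```python
import math

def posynomial(coeffs, exponents, x):
    """Sum of monomials c_k * prod_i x_i^(a_ik), evaluated directly in x."""
    total = 0.0
    for c, a in zip(coeffs, exponents):
        term = c
        for ai, xi in zip(a, x):
            term *= xi ** ai
        total += term
    return total

def posynomial_log_form(coeffs, exponents, y):
    """Same function as a sum of exp(a_k^T y + b_k), with b_k = log(c_k)."""
    return sum(
        math.exp(sum(ai * yi for ai, yi in zip(a, y)) + math.log(c))
        for c, a in zip(coeffs, exponents)
    )

# Box-design objective: 40/(xyz) + 2xy + 2yz + 2zx
coeffs = [40.0, 2.0, 2.0, 2.0]
exponents = [[-1.0, -1.0, -1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]

x = [1.5, 2.0, 0.7]
y = [math.log(xi) for xi in x]
direct = posynomial(coeffs, exponents, x)
via_logs = posynomial_log_form(coeffs, exponents, y)
print(direct, via_logs)
```

Both evaluations give the same number up to rounding, which is all the transformation claims: the function is unchanged, only its parametrization differs.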

The GP (17.29) is rewritten as:

$$
\begin{aligned}
y^* = \arg\min_{y} \quad & \sum_{k=1}^{K_0} \exp(a_{0k}^T y + b_{0k}) \\
\text{subject to:} \quad & \sum_{k=1}^{K_i} \exp(a_{ik}^T y + b_{ik}) \leq 1, \quad i = 1, \dots, m && (17.32) \\
& \exp(g_j^T y + h_j) = 1, \quad j = 1, \dots, p
\end{aligned}
$$

with $a_{ik} \in \mathbb{R}^n, i = 0, \dots, m$ and $g_j \in \mathbb{R}^n, j = 1, \dots, p$.

Using the fact that the function $\log \sum_{i=1}^{m} \exp(g_i(x))$ is convex when the $g_i$ are convex (I skip the proof), we can rewrite problem (17.32) in convex form by taking the logarithm of each function:

GP in convex form:

$$
\begin{aligned}
\text{minimize}_y \quad & \tilde{f}_0(y) = \log\left( \sum_{k=1}^{K_0} \exp(a_{0k}^T y + b_{0k}) \right) \\
\text{subject to:} \quad & \tilde{f}_i(y) = \log\left( \sum_{k=1}^{K_i} \exp(a_{ik}^T y + b_{ik}) \right) \leq 0, \quad i = 1, \dots, m && (17.33) \\
& \tilde{h}_j(y) = g_j^T y + h_j = 0, \quad j = 1, \dots, p
\end{aligned}
$$

We can now say that a GP is equivalent to a convex optimization problem, because the objective function and the inequality constraint functions in (17.33) are all convex, and the equality constraints are affine. This form is usually called a geometric program in convex form (to distinguish it from the definition of a GP).

17.6.4 Solving a GP with CVXOPT

We return to the unconstrained box-design problem, whose objective function $f(x, y, z) = 40x^{-1}y^{-1}z^{-1} + 2xy + 2yz + 2zx$ is a posynomial. So this is a GP.

Code for finding the optimal point of this problem with CVXOPT:

```python
from math import log

import numpy as np
from cvxopt import matrix, solvers

# One objective posynomial with 4 monomial terms, no constraints
K = [4]
# Each inner list is one COLUMN of F: exponents of x, y, z in the 4 terms
F = matrix([[-1., 1., 1., 0.],
            [-1., 1., 0., 1.],
            [-1., 0., 1., 1.]])
# Logarithms of the coefficients 40, 2, 2, 2
g = matrix([log(40.), log(2.), log(2.), log(2.)])

solvers.options['show_progress'] = False
sol = solvers.gp(K, F, g)

print('Solution:')
# solvers.gp works with y = log(x), so exponentiate to recover x, y, z
print(np.exp(np.array(sol['x'])))

print('\nchecking sol^5')
print(np.exp(np.array(sol['x']))**5)
```

Output:

```
Solution:
[[ 1.58489319]
 [ 1.58489319]
 [ 1.58489319]]

checking sol^5
[[ 9.9999998]
 [ 9.9999998]
 [ 9.9999998]]
```

The solution obtained is $x = y = z = \sqrt[5]{10}$. You are encouraged to read the documentation of solvers.gp to understand how the problem is set up.
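The solver's answer can be cross-checked without CVXOPT by testing the optimality condition of the convex form: at an unconstrained minimum, the gradient of the log-sum-exp objective must vanish. The sketch below uses the same exponent rows and log-coefficients as the code above, written as plain Python lists; the function names are hypothetical.

```python
import math

# Rows of F: exponents of (x, y, z) in the terms 40/(xyz), 2xy, 2xz, 2yz
F = [[-1.0, -1.0, -1.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
g = [math.log(40.0), math.log(2.0), math.log(2.0), math.log(2.0)]

def terms(y):
    """Values exp(F_k^T y + g_k) of the four monomial terms."""
    return [math.exp(sum(f * yi for f, yi in zip(row, y)) + gk)
            for row, gk in zip(F, g)]

def objective(y):
    """Convex-form objective log(sum_k exp(F_k^T y + g_k))."""
    return math.log(sum(terms(y)))

def gradient(y):
    """Gradient of the objective: F^T softmax(F y + g)."""
    t = terms(y)
    total = sum(t)
    return [sum(tk * row[j] for tk, row in zip(t, F)) / total
            for j in range(len(y))]

y_star = [math.log(10.0) / 5] * 3      # y = log(10^(1/5)) for each dimension
grad = gradient(y_star)
cost = math.exp(objective(y_star))     # back to the original cost scale
print(grad, cost)
```

The gradient is zero up to rounding and the recovered cost equals $5\sqrt[5]{3200}$, matching the hand solution from Section 17.1.3.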

17.7 Summary

Optimization problems appear very often in practice, and convex optimization plays an important role among them. In a convex optimization problem, any local minimum found is an optimal point of the problem (a solution of the problem).

Many optimization problems are not written in convex form but can be transformed into convex form, for example Geometric Programming.

Linear Programming and Quadratic Programming play an important role in optimization and are widely used in Machine Learning algorithms.

The CVXOPT library solves many convex optimization problems; it is easy to use and reasonably fast.

17.8 References

[1] Boyd and Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

[2] CVXOPT.


18

Duality

In this chapter, we assume that all derivatives exist.

This chapter is largely a translation of Chapter 5 of the book Convex Optimization listed in the references.

18.1 Introduction

In Chapter 16, we became familiar with convex sets and convex functions. In Chapter 17, I presented convex optimization problems, how to recognize them and how to use a library to solve basic convex problems. In this chapter, we go deeper: conditions on the solutions of optimization problems, both convex and nonconvex; the dual problem; and the KKT conditions.

We again start with a simple technique for basic problems. You may have heard of it: the method of Lagrange multipliers. This method helps find the extrema of the objective function over the feasible set of the problem.

Recall that the largest and smallest values (if they exist) of a differentiable function $f_0(x)$ (whose domain is an open set) are attained at one of its local extrema. A necessary condition for a point to be a local extremum is that the derivative vanishes there: $f_0'(x) = 0$. A point satisfying $f_0'(x) = 0$ is called a stationary point. Every local extremum is a stationary point, but not every stationary point is a local extremum. For example, the function $f(x) = x^3$ has a stationary point at 0 that is not a local extremum.

The same observation applies to functions of several variables: we look for solutions of the equations that set the derivative with respect to every variable to 0. However, that works for unconstrained optimization problems. What about constrained problems like those in Chapter 17?

First, consider a problem whose only constraint is a single equation:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & f_0(x) \\
\text{subject to:} \quad & f_1(x) = 0 && (18.1)
\end{aligned}
$$

This is a general problem, not necessarily convex. That is, the objective function and the constraint function need not be convex.

18.2 The method of Lagrange multipliers

If we can turn this problem into an unconstrained one, we can find the solution by setting the derivative with respect to each component to 0 and solving the resulting system (assuming that solving the system is feasible).

This motivated the mathematician Lagrange to use the function $\mathcal{L}(x, \lambda) = f_0(x) + \lambda f_1(x)$. Note that this function has one extra variable, $\lambda$, called the Lagrange multiplier. The function $\mathcal{L}(x, \lambda)$ is called the auxiliary function, or the Lagrangian. It can be shown that an optimal point of problem (18.1) satisfies the condition $\nabla_{x, \lambda} \mathcal{L}(x, \lambda) = 0$ (I skip the proof). This is equivalent to:

$$
\begin{aligned}
\nabla_x f_0(x) + \lambda \nabla_x f_1(x) &= 0 && (18.2) \\
f_1(x) &= 0 && (18.3)
\end{aligned}
$$

Note that the second condition is $\nabla_\lambda \mathcal{L}(x, \lambda) = 0$, which is also the constraint of problem (18.1).

Solving the system (18.2)-(18.3) is, in many cases, simpler than directly searching for the optimal value of problem (18.1).

Consider the following simple examples.

18.2.1 Examples

Example 1: Find the largest and smallest values of the function $f_0(x, y) = x + y$ subject to $f_1(x, y) = x^2 + y^2 - 2 = 0$. This is not a convex optimization problem because the feasible set $x^2 + y^2 = 2$ is not a convex set (it is only a circle).

Solution:

The Lagrangian of this problem is $\mathcal{L}(x, y, \lambda) = x + y + \lambda(x^2 + y^2 - 2)$. The stationary points of the Lagrangian must satisfy:

$$
\nabla_{x, y, \lambda} \mathcal{L}(x, y, \lambda) = 0 \Leftrightarrow
\begin{cases}
1 + 2\lambda x = 0 \\
1 + 2\lambda y = 0 \\
x^2 + y^2 = 2
\end{cases}
\quad (18.4)
$$

From the first two equations of (18.4) we get $x = y = -\frac{1}{2\lambda}$. Substituting into the last equation gives $\lambda^2 = \frac{1}{4} \Rightarrow \lambda = \pm\frac{1}{2}$. So we have two solutions $(x, y) \in \{(1, 1), (-1, -1)\}$. Substituting these into the objective function gives the largest value $2$ at $(1, 1)$ and the smallest value $-2$ at $(-1, -1)$.
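The two stationary points of Example 1 can be checked by plugging them into the gradient of the Lagrangian, and the extreme values can be compared with a direct scan of the circle. The sketch below is only a numerical confirmation; the function name `lagrangian_gradient` and the scan resolution are assumptions made here.

```python
import math

def lagrangian_gradient(x, y, lam):
    """Partial derivatives of L = x + y + lam * (x^2 + y^2 - 2) w.r.t. x, y, lam."""
    return (1 + 2 * lam * x, 1 + 2 * lam * y, x * x + y * y - 2)

# (x, y, lambda) triples obtained from system (18.4)
candidates = [(1.0, 1.0, -0.5), (-1.0, -1.0, 0.5)]
gradients = [lagrangian_gradient(x, y, lam) for x, y, lam in candidates]
print(gradients)

# Scan the circle x^2 + y^2 = 2 and record the extreme values of x + y
radius = math.sqrt(2)
values = [radius * (math.cos(2 * math.pi * k / 3600) + math.sin(2 * math.pi * k / 3600))
          for k in range(3600)]
print(max(values), min(values))
```

Both gradients vanish, and the scanned maximum and minimum agree with the values $2$ and $-2$ found analytically.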

Example 2: Cross-entropy. In Chapter 10 and Chapter 13, we met the cross-entropy loss function. We also saw that cross-entropy is used to measure the similarity of two probability distributions: the smaller its value, the closer the two distributions. We stated that the smallest value of the cross-entropy is attained when the two distributions coincide. I now restate and prove this claim.

Let $p = [p_1, p_2, \dots, p_n]^T$ be a probability distribution with $p_i \in [0, 1]$ and $\sum_{i=1}^n p_i = 1$. For any probability distribution $q = [q_1, q_2, \dots, q_n]$, assuming $q_i \neq 0, \forall i$, the cross-entropy function is defined as:

$$
f_0(q) = -\sum_{i=1}^n p_i \log(q_i)
$$

Find $q$ such that the cross-entropy is smallest.

In this problem, the constraint is $\sum_{i=1}^n q_i = 1$. The Lagrangian of the problem is:

$$
\mathcal{L}(q_1, q_2, \dots, q_n, \lambda) = -\sum_{i=1}^n p_i \log(q_i) + \lambda \left( \sum_{i=1}^n q_i - 1 \right)
$$

We need to solve the system:

$$
\nabla_{q_1, \dots, q_n, \lambda} \mathcal{L}(q_1, \dots, q_n, \lambda) = 0 \Leftrightarrow
\begin{cases}
-\frac{p_i}{q_i} + \lambda = 0, \quad i = 1, \dots, n \\
q_1 + q_2 + \dots + q_n = 1
\end{cases}
$$

From the first equation we have $p_i = \lambda q_i$. Therefore $1 = \sum_{i=1}^n p_i = \lambda \sum_{i=1}^n q_i = \lambda \Rightarrow \lambda = 1 \Rightarrow q_i = p_i, \forall i$.

This explains why the cross-entropy function is used to force two probability distributions to be close.
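The conclusion $q = p$ can be illustrated numerically: for a fixed $p$, no randomly drawn distribution $q$ should give a smaller cross-entropy than $q = p$. The sketch below assumes an example distribution $p = (0.5, 0.3, 0.2)$ and hypothetical helper names; random sampling supports the result but does not replace the proof above.

```python
import math
import random

def cross_entropy(p, q):
    """-sum_i p_i * log(q_i), skipping terms with p_i = 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def random_distribution(n):
    """A random probability vector with strictly positive entries."""
    w = [random.random() + 1e-6 for _ in range(n)]
    total = sum(w)
    return [wi / total for wi in w]

random.seed(0)
p = [0.5, 0.3, 0.2]
baseline = cross_entropy(p, p)       # value at q = p (the entropy of p)

# Smallest observed excess of cross_entropy(p, q) over the baseline
smallest_gap = min(cross_entropy(p, random_distribution(3)) - baseline
                   for _ in range(10000))
print(baseline, smallest_gap)
```

The gap is never negative; it is the Kullback-Leibler divergence between $p$ and $q$, which is zero only when $q = p$.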

18.3 The Lagrange dual function

18.3.1 Lagrangian

Consider the general optimization problem:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & f_0(x) \\
\text{subject to:} \quad & f_i(x) \leq 0, \quad i = 1, 2, \dots, m && (18.5) \\
& h_j(x) = 0, \quad j = 1, 2, \dots, p
\end{aligned}
$$

with domain $\mathcal{D} = (\bigcap_{i=0}^m \text{dom} f_i) \cap (\bigcap_{j=1}^p \text{dom} h_j)$. Note that we make no assumption here about convexity of the objective function or the constraint functions. The only assumption is $\mathcal{D} \neq \emptyset$ (the empty set).

The Lagrangian is built as before, with one Lagrange multiplier for each (in)equality constraint:

$$
\mathcal{L}(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{j=1}^p \nu_j h_j(x)
$$

where $\lambda = [\lambda_1, \lambda_2, \dots, \lambda_m]$ and $\nu = [\nu_1, \nu_2, \dots, \nu_p]$ are vectors called dual variables or Lagrange multiplier vectors. If the primal variable is $x \in \mathbb{R}^n$, the total number of variables of this function is $n + m + p$.

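18.3.2 The Lagrange dual function

The Lagrange dual function (or simply the dual function) of the optimization problem (18.5) is a function of the dual variables, defined as the infimum of the Lagrangian over $x$:

$$
\begin{aligned}
g(\lambda, \nu) &= \inf_{x \in \mathcal{D}} \mathcal{L}(x, \lambda, \nu) && (18.6) \\
&= \inf_{x \in \mathcal{D}} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{j=1}^p \nu_j h_j(x) \right) && (18.7)
\end{aligned}
$$

If the Lagrangian is unbounded below in $x$, the dual function at $(\lambda, \nu)$ takes the value $-\infty$.

Especially important:

The $\inf$ is taken over $x \in \mathcal{D}$, i.e. the domain of the problem (the intersection of the domains of all functions in the problem). This domain is different from the feasible set. In general, the feasible set is a subset of the domain $\mathcal{D}$.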

For each $x$, the Lagrangian is an affine function of $(\lambda, \nu)$, hence a concave function. So the dual function is the pointwise infimum of (possibly infinitely many) concave functions, and is therefore concave. Thus the dual function of any optimization problem is concave, regardless of whether the original problem is convex. Recall that the pointwise supremum of convex functions is convex, and that a function is concave if its negative is convex.

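18.3.3 A lower bound on the optimal value

If $p^*$ is the optimal value of problem (18.5), then for dual variables $\lambda_i \geq 0, \forall i$ and any $\nu$, we have:

$$
g(\lambda, \nu) \leq p^* \quad (18.8)
$$

This property is easy to prove. Let $x_0$ be any feasible point of problem (18.5), i.e. one satisfying the constraints $f_i(x_0) \leq 0, \forall i = 1, \dots, m$ and $h_j(x_0) = 0, \forall j = 1, \dots, p$. Then:

$$
\sum_{i=1}^m \lambda_i f_i(x_0) + \sum_{j=1}^p \nu_j h_j(x_0) \leq 0 \Rightarrow \mathcal{L}(x_0, \lambda, \nu) \leq f_0(x_0)
$$

Since this holds for every feasible $x_0$, we obtain the following important property:

$$
g(\lambda, \nu) = \inf_{x \in \mathcal{D}} \mathcal{L}(x, \lambda, \nu) \leq \mathcal{L}(x_0, \lambda, \nu) \leq f_0(x_0).
$$

Taking $x_0 = x^*$ (or, if the optimum is not attained, the infimum over all feasible $x_0$) gives inequality (18.8).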

18.3.4 Examples

Example 1

Consider the following optimization problem:

$$
\begin{aligned}
x^* = \arg\min_{x} \quad & x^2 + 10\sin(x) + 10 \\
\text{subject to:} \quad & (x - 2)^2 \leq 4 && (18.9)
\end{aligned}
$$

Note: in this problem, the domain is $\mathcal{D} = \mathbb{R}$ but the feasible set is $0 \leq x \leq 4$.

The objective function is the bold blue curve in Figure 18.1. The constraint is really just $0 \leq x \leq 4$, but I write it in this form to make the problem more interesting. The constraint function $f_1(x) = (x - 2)^2 - 4$ is the green dashed curve. The optimal value of this problem can be seen to occur at the point on the graph with abscissa 0. Note that the objective function here is not convex, so this optimization problem is not convex either, even though the inequality constraint function $f_1(x)$ is convex.

[Figure 18.1: Example of a dual function. Left: the bold blue curve is the objective function $f_0(x)$; the green dashed curve is the constraint function $f_1(x)$; the red dashed curves are the Lagrangian $f_0(x) + \lambda f_1(x)$ for different values of $\lambda$. Right: the dashed line is the optimal value $p^*$ of the problem; the red curve is the dual function $g(\lambda)$. For every $\lambda$, the value of the dual function is less than or equal to the optimal value of the original problem.]

The Lagrangian of this problem is:

\[
\mathcal{L}(x, \lambda) = x^2 + 10\sin(x) + 10 + \lambda\left((x - 2)^2 - 4\right)
\]

The dotted red curves in Figure 18.1 are the Lagrangians for different values of $\lambda$. The region between the two vertical black lines is the feasible set of the optimization problem.

For each $\lambda$, the dual function is defined as:

\[
g(\lambda) = \inf_x \left( x^2 + 10\sin(x) + 10 + \lambda\left((x - 2)^2 - 4\right) \right), \quad \lambda \geq 0.
\]

From the left panel of Figure 18.1, we can see right away that for different $\lambda$, $g(\lambda)$ is attained either at the point with abscissa 0 or at a point lower than the optimal point of the problem. The graph of $g(\lambda)$ is the solid red curve in the right panel of Figure 18.1. The dashed blue line shows the optimal value of the original problem. Two things are immediately apparent:

• The solid red curve always lies below (or touches) the dashed blue line.

• $g(\lambda)$ has the shape of a concave function: flipping its graph upside down gives the graph of a convex function. (Even though the original optimization problem is not convex.)

(To draw the right panel, I used Gradient Descent to find the minimum value for each $\lambda$.)
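The two observations above can be checked numerically. The following is a minimal sketch, not the script used to draw Figure 18.1: it assumes a plain grid search over $x \in [-10, 10]$ in place of Gradient Descent, and the grid bounds, the range of $\lambda$, and the names `dual_function`, `x_grid` and `lambdas` are illustrative choices.

```python
import numpy as np

def f0(x):
    # Objective of problem (18.9)
    return x**2 + 10 * np.sin(x) + 10

def f1(x):
    # Inequality constraint function of problem (18.9): f1(x) <= 0
    return (x - 2)**2 - 4

def dual_function(lam, x_grid):
    # Approximates g(lam) = inf_x [f0(x) + lam * f1(x)] by a grid search
    return np.min(f0(x_grid) + lam * f1(x_grid))

x_grid = np.linspace(-10.0, 10.0, 20001)      # assumed search interval
x_feasible = x_grid[f1(x_grid) <= 0]          # grid points with 0 <= x <= 4
p_star = np.min(f0(x_feasible))               # primal optimal value on the grid

lambdas = np.linspace(0.0, 8.0, 81)           # dual variable must be >= 0
g_values = np.array([dual_function(lam, x_grid) for lam in lambdas])

print("p* (grid approximation):", p_star)
print("largest g(lambda) on the grid:", g_values.max())
print("g(lambda) <= p* for all sampled lambda:", bool(np.all(g_values <= p_star)))
```

Because every feasible grid point is also a candidate in the grid search for $g(\lambda)$, the inequality $g(\lambda) \leq p^*$ holds for the approximated values as well, which mirrors the argument of Section 18.3.3.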

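Example 2

Consider a Linear Programming problem:

\[
\begin{aligned}
\mathbf{x}^* = \arg\min_{\mathbf{x}} \quad & \mathbf{c}^T \mathbf{x} \\
\text{s.t.:} \quad & \mathbf{A}\mathbf{x} = \mathbf{b} \\
& \mathbf{x} \succeq 0
\end{aligned} \tag{18.10}
\]

The last constraint can be rewritten as $f_i(\mathbf{x}) = -x_i \leq 0, i = 1, \dots, n$. The Lagrangian of this problem is:

\[
\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) = \mathbf{c}^T\mathbf{x} - \sum_{i=1}^n \lambda_i x_i + \boldsymbol{\nu}^T(\mathbf{A}\mathbf{x} - \mathbf{b}) = -\mathbf{b}^T\boldsymbol{\nu} + (\mathbf{c} + \mathbf{A}^T\boldsymbol{\nu} - \boldsymbol{\lambda})^T\mathbf{x}
\]

(Don't forget the condition $\boldsymbol{\lambda} \succeq 0$.)

The dual function is:

\begin{align}
g(\boldsymbol{\lambda}, \boldsymbol{\nu}) &= \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu}) \tag{18.11} \\
&= -\mathbf{b}^T\boldsymbol{\nu} + \inf_{\mathbf{x}} (\mathbf{c} + \mathbf{A}^T\boldsymbol{\nu} - \boldsymbol{\lambda})^T\mathbf{x} \tag{18.12}
\end{align}

Observe that a linear function $\mathbf{d}^T\mathbf{x}$ of $\mathbf{x}$ is bounded below if and only if $\mathbf{d} = 0$. Indeed, if some entry $d_i$ of $\mathbf{d}$ is nonzero, we only need to choose $x_i$ with very large magnitude and sign opposite to $d_i$ to obtain an arbitrarily small value.

In other words, $g(\boldsymbol{\lambda}, \boldsymbol{\nu}) = -\infty$ unless $\mathbf{c} + \mathbf{A}^T\boldsymbol{\nu} - \boldsymbol{\lambda} = 0$. In summary:

\[
g(\boldsymbol{\lambda}, \boldsymbol{\nu}) =
\begin{cases}
-\mathbf{b}^T\boldsymbol{\nu} & \text{if } \mathbf{c} + \mathbf{A}^T\boldsymbol{\nu} - \boldsymbol{\lambda} = 0 \\
-\infty & \text{otherwise}
\end{cases} \tag{18.13}
\]

The second case, $g(\boldsymbol{\lambda}, \boldsymbol{\nu}) = -\infty$, will come up very often later on. This case is not very interesting because obviously $g(\boldsymbol{\lambda}, \boldsymbol{\nu}) \leq p^*$. Since the main goal is to find a lower bound on $p^*$, we only care about values of $\boldsymbol{\lambda}$ and $\boldsymbol{\nu}$ for which $g(\boldsymbol{\lambda}, \boldsymbol{\nu})$ is as large as possible. In this problem, we care about the $\boldsymbol{\lambda}$ and $\boldsymbol{\nu}$ such that $\mathbf{c} + \mathbf{A}^T\boldsymbol{\nu} - \boldsymbol{\lambda} = 0$.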
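Formula (18.13) can be tried on a tiny instance. The sketch below is only an illustration under stated assumptions: the LP data `c`, `A`, `b` are made up for this example (minimize $x_1 + 2x_2$ subject to $x_1 + x_2 = 1$, $\mathbf{x} \succeq 0$, whose optimal value is $p^* = 1$ at $\mathbf{x} = (1, 0)$ by inspection), and the helper name `lp_dual_function` and its tolerance are hypothetical.

```python
import numpy as np

def lp_dual_function(c, A, b, lam, nu, tol=1e-9):
    # Dual function (18.13) of the LP: min c^T x  s.t.  Ax = b, x >= 0.
    # Returns -b^T nu when c + A^T nu - lam = 0, and -inf otherwise.
    residual = c + A.T @ nu - lam
    if np.all(np.abs(residual) <= tol):
        return float(-b @ nu)
    return -np.inf

# Made-up LP data for illustration
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
p_star = 1.0  # optimal value, attained at x = (1, 0)

# Choose nu, then set lam = c + A^T nu so that the linear term vanishes
nu_a = np.array([-1.0])
lam_a = c + A.T @ nu_a            # [0, 1], which satisfies lam >= 0
g_a = lp_dual_function(c, A, b, lam_a, nu_a)

nu_b = np.array([0.0])
lam_b = c + A.T @ nu_b            # [1, 2]
g_b = lp_dual_function(c, A, b, lam_b, nu_b)

# A pair that violates c + A^T nu - lam = 0 gives the trivial bound -inf
g_c = lp_dual_function(c, A, b, np.zeros(2), np.zeros(1))

print(g_a, g_b, g_c)
```

Each finite value is a lower bound on $p^*$, and in this instance the choice $\nu = -1$ makes the bound tight.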

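18.4 The Lagrange dual problem

For each pair $(\boldsymbol{\lambda}, \boldsymbol{\nu})$, the Lagrange dual function gives us a lower bound on the optimal value $p^*$ of the primal problem (18.5). The question is: which pair $(\boldsymbol{\lambda}, \boldsymbol{\nu})$ gives the best lower bound on $p^*$? In other words, we need to solve:

\[
\begin{aligned}
\boldsymbol{\lambda}^*, \boldsymbol{\nu}^* = \arg\max_{\boldsymbol{\lambda}, \boldsymbol{\nu}} \quad & g(\boldsymbol{\lambda}, \boldsymbol{\nu}) \\
\text{subject to:} \quad & \boldsymbol{\lambda} \succeq 0
\end{aligned} \tag{18.14}
\]

An important point: $g(\boldsymbol{\lambda}, \boldsymbol{\nu})$ is concave and the constraint functions $f_i(\boldsymbol{\lambda}) = -\lambda_i$ are convex, so problem (18.14) is a convex problem. In many cases its solution is therefore easier to find than that of the primal problem. Note that the dual problem (18.14) is convex regardless of whether the primal problem (18.5) is convex.

This problem is called the Lagrange dual problem associated with problem (18.5). Problem (18.5) is also called the primal problem. There is one more notion, dual feasible, which refers to the feasible set of the dual problem: it consists of the condition $\boldsymbol{\lambda} \succeq 0$ and the implicit condition $g(\boldsymbol{\lambda}, \boldsymbol{\nu}) > -\infty$ (since we are maximizing, $g(\boldsymbol{\lambda}, \boldsymbol{\nu}) = -\infty$ is clearly of no interest).

The solution of problem (18.14), denoted $(\boldsymbol{\lambda}^*, \boldsymbol{\nu}^*)$, is called dual optimal or the optimal Lagrange multipliers.

Note that the implicit condition $g(\boldsymbol{\lambda}, \boldsymbol{\nu}) > -\infty$ can, in many cases, be written explicitly. Going back to the example above, the implicit condition can be written as $\mathbf{c} + \mathbf{A}^T\boldsymbol{\nu} - \boldsymbol{\lambda} = 0$. The left-hand side is an affine function of $(\boldsymbol{\lambda}, \boldsymbol{\nu})$, so with this additional constraint we still have a convex problem.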

18.4.1 Weak duality

Denote the optimal value of the dual problem (18.14) by $d^*$. From (18.8), we know that:

\[
d^* \leq p^*
\]

even when the primal problem is not convex.

This simple property is called weak duality. Simple as it is, it is extremely important.

Two observations follow:

• If the primal problem is unbounded below, i.e. $p^* = -\infty$, we must have $d^* = -\infty$, i.e. the Lagrange dual problem is infeasible (no value satisfies the constraints).

• If the dual problem is unbounded above, i.e. $d^* = +\infty$, we must have $p^* = +\infty$, i.e. the primal problem is infeasible.

The value $p^* - d^*$ is called the optimal duality gap. This gap is always nonnegative.

Sometimes there are problems (convex or not) that are very hard to solve, but if we can find $d^*$, we at least know a lower bound on the primal problem. Finding $d^*$ is usually tractable because the dual problem is always convex.


18.4.2 Strong duality and Slater's constraint qualification

If the equality $p^* = d^*$ holds, i.e. the optimal duality gap is zero, we say that strong duality holds. In that case, solving the dual problem gives us the exact optimal value of the primal problem.

Unfortunately, strong duality does not hold for optimization problems in general. However, if the primal problem is convex, i.e. it has the form:

\[
\begin{aligned}
\mathbf{x}^* = \arg\min_{\mathbf{x}} \quad & f_0(\mathbf{x}) \\
\text{subject to:} \quad & f_i(\mathbf{x}) \leq 0, \; i = 1, 2, \dots, m \\
& \mathbf{A}\mathbf{x} = \mathbf{b}
\end{aligned} \tag{18.15}
\]

where $f_0, f_1, \dots, f_m$ are convex functions, we usually (but not always) have strong duality. A great deal of research establishes conditions, beyond convexity, under which strong duality holds. Such conditions are usually called constraint qualifications.

One of the simplest constraint qualifications is Slater's condition.

Definition: A feasible point of problem (18.15) is called strictly feasible if:

\[
f_i(\mathbf{x}) < 0, \; i = 1, 2, \dots, m, \quad \mathbf{A}\mathbf{x} = \mathbf{b}
\]

Slater's theorem: If there exists a strictly feasible point (and the primal problem is convex), then strong duality holds.

This fairly simple condition will be useful for many optimization problems later on.

Notes:

• Strong duality does not hold in general. For convex problems, it holds more often. There exist convex problems for which strong duality does not hold.

• There are nonconvex problems for which strong duality still holds, for example the problem in Figure 18.1 above.


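18.5 Optimality conditions

18.5.1 Complementary slackness

Suppose that strong duality holds. Let $\mathbf{x}^*$ be an optimal point of the primal problem and $(\boldsymbol{\lambda}^*, \boldsymbol{\nu}^*)$ be an optimal point of the dual problem. We have:

\begin{align}
f_0(\mathbf{x}^*) &= g(\boldsymbol{\lambda}^*, \boldsymbol{\nu}^*) \tag{18.16} \\
&= \inf_{\mathbf{x}} \left( f_0(\mathbf{x}) + \sum_{i=1}^m \lambda_i^* f_i(\mathbf{x}) + \sum_{j=1}^p \nu_j^* h_j(\mathbf{x}) \right) \tag{18.17} \\
&\leq f_0(\mathbf{x}^*) + \sum_{i=1}^m \lambda_i^* f_i(\mathbf{x}^*) + \sum_{j=1}^p \nu_j^* h_j(\mathbf{x}^*) \tag{18.18} \\
&\leq f_0(\mathbf{x}^*) \tag{18.19}
\end{align}

• The first line is strong duality itself.

• The second line follows from the definition of the dual function.

• The third line is immediate because the infimum of a function is no larger than its value at any particular point.

• The fourth line follows from the constraints $f_i(\mathbf{x}^*) \leq 0$, $\lambda_i^* \geq 0$, $i = 1, 2, \dots, m$ and $h_j(\mathbf{x}^*) = 0$.

From this we can see that equality must hold in both the third and the fourth lines. This gives two more interesting observations:

• $\mathbf{x}^*$ minimizes the Lagrangian $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}^*, \boldsymbol{\nu}^*)$ over $\mathbf{x}$, i.e. it attains the infimum that defines $g(\boldsymbol{\lambda}^*, \boldsymbol{\nu}^*)$.

• More interestingly:

\[
\sum_{i=1}^m \lambda_i^* f_i(\mathbf{x}^*) = 0
\]

Since every term in this sum is nonpositive (because $\lambda_i^* \geq 0$ and $f_i(\mathbf{x}^*) \leq 0$), we conclude that:

\[
\lambda_i^* f_i(\mathbf{x}^*) = 0, \; i = 1, 2, \dots, m
\]

This last condition is called complementary slackness. It implies:

\begin{align}
\lambda_i^* > 0 &\Rightarrow f_i(\mathbf{x}^*) = 0 \tag{18.20} \\
f_i(\mathbf{x}^*) < 0 &\Rightarrow \lambda_i^* = 0 \tag{18.21}
\end{align}

That is, at least one of these two values is always 0.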
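As a small worked illustration of (18.20) and (18.21), consider a toy problem that is not part of the original text: minimize $(x - 2)^2$ subject to a single inequality constraint. The problem is convex and has a strictly feasible point, so strong duality can be assumed to hold by Slater's theorem.

\begin{align*}
&\text{Case 1: } f_1(x) = x - 1 \leq 0. \\
&\quad \mathcal{L}(x, \lambda) = (x - 2)^2 + \lambda (x - 1), \qquad \frac{\partial \mathcal{L}}{\partial x} = 2(x - 2) + \lambda = 0. \\
&\quad \text{The unconstrained minimizer } x = 2 \text{ is infeasible, so } x^* = 1,\; f_1(x^*) = 0,\; \lambda^* = 2 > 0. \\[4pt]
&\text{Case 2: } f_1(x) = x - 3 \leq 0. \\
&\quad \text{The unconstrained minimizer } x = 2 \text{ is feasible, so } x^* = 2,\; f_1(x^*) = -1 < 0,\; \lambda^* = 0.
\end{align*}

In both cases $\lambda^* f_1(x^*) = 0$: the multiplier is positive only when the constraint is active.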


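18.5.2 KKT optimality conditions

We continue to assume that the functions under consideration are differentiable and that the optimization problem is not necessarily convex.

KKT conditions for nonconvex problems

Suppose that strong duality holds. Let $\mathbf{x}^*$ and $(\boldsymbol{\lambda}^*, \boldsymbol{\nu}^*)$ be any primal and dual optimal points. Since $\mathbf{x}^*$ minimizes the differentiable function $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}^*, \boldsymbol{\nu}^*)$, the gradient of the Lagrangian at $\mathbf{x}^*$ must be zero.

The Karush-Kuhn-Tucker (KKT) conditions state that $\mathbf{x}^*, \boldsymbol{\lambda}^*, \boldsymbol{\nu}^*$ must satisfy:

\begin{align}
f_i(\mathbf{x}^*) &\leq 0, \; i = 1, 2, \dots, m \tag{18.22} \\
h_j(\mathbf{x}^*) &= 0, \; j = 1, 2, \dots, p \tag{18.23} \\
\lambda_i^* &\geq 0, \; i = 1, 2, \dots, m \tag{18.24} \\
\lambda_i^* f_i(\mathbf{x}^*) &= 0, \; i = 1, 2, \dots, m \tag{18.25} \\
\nabla f_0(\mathbf{x}^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(\mathbf{x}^*) + \sum_{j=1}^p \nu_j^* \nabla h_j(\mathbf{x}^*) &= 0 \tag{18.26}
\end{align}

These are necessary conditions for $\mathbf{x}^*, \boldsymbol{\lambda}^*, \boldsymbol{\nu}^*$ to be solutions of the two problems.

KKT conditions for convex problems

For convex problems in which strong duality holds, the KKT conditions above are also sufficient. So for convex problems whose objective and constraint functions are differentiable, any points satisfying the KKT conditions are primal and dual optimal for the primal and the dual problem.

From this we can see that: for a convex problem in which Slater's condition holds (which implies strong duality), the KKT conditions are necessary and sufficient for optimality.

The KKT conditions are very important in optimization. In a few special cases (we will see one in the upcoming Support Vector Machine chapter), solving the system of KKT (in)equalities is feasible. Many optimization algorithms are built on solving the KKT system.

Example: Equality constrained convex quadratic minimization. Consider the problem:

\[
\begin{aligned}
\mathbf{x}^* = \arg\min_{\mathbf{x}} \quad & \frac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} + \mathbf{q}^T\mathbf{x} + r \\
\text{subject to:} \quad & \mathbf{A}\mathbf{x} = \mathbf{b}
\end{aligned} \tag{18.27}
\]

where $\mathbf{P} \in \mathbb{S}^n_+$ (the set of symmetric positive semidefinite matrices).

Lagrangian:

\[
\mathcal{L}(\mathbf{x}, \boldsymbol{\nu}) = \frac{1}{2}\mathbf{x}^T\mathbf{P}\mathbf{x} + \mathbf{q}^T\mathbf{x} + r + \boldsymbol{\nu}^T(\mathbf{A}\mathbf{x} - \mathbf{b})
\]

The KKT conditions for this problem are:

\begin{align}
\mathbf{A}\mathbf{x}^* &= \mathbf{b} \tag{18.28} \\
\mathbf{P}\mathbf{x}^* + \mathbf{q} + \mathbf{A}^T\boldsymbol{\nu}^* &= 0 \tag{18.29}
\end{align}

The second equation states that the gradient of the Lagrangian at $\mathbf{x}^*$ is zero.

This system of equations can be written compactly as:

\[
\begin{bmatrix} \mathbf{P} & \mathbf{A}^T \\ \mathbf{A} & \mathbf{0} \end{bmatrix}
\begin{bmatrix} \mathbf{x}^* \\ \boldsymbol{\nu}^* \end{bmatrix}
=
\begin{bmatrix} -\mathbf{q} \\ \mathbf{b} \end{bmatrix}
\]

This is simply a system of linear equations!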
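The linear system above can be assembled and solved directly. The following is a minimal sketch, assuming the KKT matrix is nonsingular; the function name `solve_equality_qp` and the small problem data are made up for illustration.

```python
import numpy as np

def solve_equality_qp(P, q, A, b):
    # Solves the KKT system of problem (18.27):
    #   [P  A^T] [x ]   [-q]
    #   [A   0 ] [nu] = [ b]
    # Assumes the KKT matrix is nonsingular.
    n = P.shape[0]
    m = A.shape[0]
    kkt_matrix = np.block([[P, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-q, b])
    solution = np.linalg.solve(kkt_matrix, rhs)
    return solution[:n], solution[n:]

# Illustrative data: minimize (x1 - 1)^2 + (x2 - 2)^2 - 5 subject to x1 + x2 = 1
P = 2.0 * np.eye(2)
q = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x_star, nu_star = solve_equality_qp(P, q, A, b)
print("x* =", x_star)
print("nu* =", nu_star)
print("primal residual A x* - b =", A @ x_star - b)
print("stationarity residual P x* + q + A^T nu* =", P @ x_star + q + A.T @ nu_star)
```

For this instance the solution is the projection of the point $(1, 2)$ onto the line $x_1 + x_2 = 1$, and both residuals printed at the end correspond to conditions (18.28) and (18.29).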

18.6 Summary

Assuming that all functions are differentiable:

• Optimization problems with only equality constraints can be solved with the method of Lagrange multipliers. We also have the definition of the Lagrangian. A necessary condition for a point to be a solution of the optimization problem is that it makes the gradient of the Lagrangian zero.

• For optimization problems with additional inequality constraints (not necessarily convex), we have the generalized Lagrangian and the Lagrange variables $\boldsymbol{\lambda}, \boldsymbol{\nu}$. For fixed values of $(\boldsymbol{\lambda}, \boldsymbol{\nu})$, we define the Lagrange dual function $g(\boldsymbol{\lambda}, \boldsymbol{\nu})$ as the infimum of the Lagrangian as $\mathbf{x}$ varies over the domain of the problem.

• The domain and the feasible set are usually different. The feasible set is a subset of the domain.

• For every $(\boldsymbol{\lambda}, \boldsymbol{\nu})$ with $\boldsymbol{\lambda} \succeq 0$, $g(\boldsymbol{\lambda}, \boldsymbol{\nu}) \leq p^*$.

• The function $g(\boldsymbol{\lambda}, \boldsymbol{\nu})$ is concave regardless of whether the optimization problem is convex. This function is called the Lagrange dual function.

• The problem of maximizing the Lagrange dual function subject to $\boldsymbol{\lambda} \succeq 0$ is called the dual problem. This problem is convex regardless of whether the primal problem is convex.

• Let $d^*$ be the optimal value of the dual problem; then $d^* \leq p^*$. This is called weak duality.

• Strong duality holds when $d^* = p^*$. In general strong duality does not hold, but for convex problems it usually (not always) does.

• If the problem is convex and Slater's condition holds, then strong duality holds.

• If the problem is convex and strong duality holds, then the solutions of the problem are characterized by the KKT conditions (necessary and sufficient conditions).

• Many optimization problems are solved through the KKT conditions.

18.7 Conclusion

In chapters 16, 17 and 18, I gave a brief introduction to convex sets, convex functions, convex problems, and the optimality conditions built through duality. My original intention was to avoid this material because it is rather math-heavy. However, while preparing the Support Vector Machine chapter, I realized that I needed to explain the Lagrangian, a technique used very widely in Optimization. Moreover, to explain the Lagrangian, I needed to talk about convex problems. That is why I felt responsible for writing these three chapters.

In the next series of posts, we will return to Machine Learning algorithms with plenty of examples, figures and sample code. If you feel a bit worn out after these three optimization chapters, don't worry, everything will be fine.

18.8 References

[1] Boyd and Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

[2] Lagrange Multipliers - Wikipedia.



51

Linear Algebra Review

51.1 Notes on notation

In my posts, scalars are denoted by non-bold letters, possibly uppercase, e.g. $x_1, N, y, k$. Vectors are denoted by bold lowercase letters, e.g. $\mathbf{y}, \mathbf{x}_1$. Unless stated otherwise, vectors are understood to be column vectors. Matrices are denoted by bold uppercase letters, e.g. $\mathbf{X}, \mathbf{Y}, \mathbf{W}$.

For vectors, $\mathbf{x} = [x_1, x_2, \dots, x_n]$ is understood as a row vector, while $\mathbf{x} = [x_1; x_2; \dots; x_n]$ is understood as a column vector. Note the difference between the comma (,) and the semicolon (;). This is the notation used by Matlab.

Similarly, for matrices, $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n]$ means that the column vectors $\mathbf{x}_j$ are placed side by side from left to right to form the matrix $\mathbf{X}$, while $\mathbf{X} = [\mathbf{x}_1; \mathbf{x}_2; \dots; \mathbf{x}_m]$ means that the vectors $\mathbf{x}_i$ are stacked from top to bottom to form the matrix $\mathbf{X}$. The vectors are implicitly assumed to have compatible sizes so that they can be placed side by side or stacked.

Given a matrix $\mathbf{W}$, unless stated otherwise, $\mathbf{w}_i$ is the $i$-th column vector of that matrix. Note the correspondence between the uppercase and lowercase letters.

51.2 Norms

In one-dimensional space, measuring the distance between two points is very familiar: take the absolute value of the difference between the two values. In two-dimensional space, i.e. the plane, we usually use the Euclidean distance to measure the distance between two points. This distance is what we commonly call the distance "as the crow flies". Sometimes, to get from one point to another, we cannot travel as the crow flies, and the distance also depends on the shape of the path connecting the two points.

Measuring the distance between two multi-dimensional data points, i.e. two vectors, is essential in Machine Learning. We need to decide which point is closest to a given point, we need to evaluate the accuracy of an estimate, and there are many other examples.

That is the reason why the notion of a norm was introduced. There are several different norms, as you will see below.

To determine the distance between two vectors $\mathbf{y}$ and $\mathbf{z}$, one usually applies a function to the difference vector $\mathbf{x} = \mathbf{y} - \mathbf{z}$. A function used to measure vectors needs to have a few special properties.

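51.2.1 Definition

A function $f(\cdot)$ mapping a point $\mathbf{x}$ in $n$-dimensional space to the real numbers is called a norm if it satisfies the following three conditions:

1. $f(\mathbf{x}) \geq 0$. Equality holds $\Leftrightarrow \mathbf{x} = 0$.

2. $f(\alpha\mathbf{x}) = |\alpha| f(\mathbf{x}), \; \forall \alpha \in \mathbb{R}$

3. $f(\mathbf{x}_1) + f(\mathbf{x}_2) \geq f(\mathbf{x}_1 + \mathbf{x}_2), \; \forall \mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^n$

The first condition is easy to understand because a distance cannot be negative. Moreover, the distance between two points $\mathbf{y}$ and $\mathbf{z}$ is 0 if and only if the two points coincide, i.e. $\mathbf{x} = \mathbf{y} - \mathbf{z} = 0$.

The second condition can be explained as follows. If three points $\mathbf{y}$, $\mathbf{v}$ and $\mathbf{z}$ are collinear, and moreover $\mathbf{v} - \mathbf{y} = \alpha(\mathbf{v} - \mathbf{z})$, then the distance between $\mathbf{v}$ and $\mathbf{y}$ is $|\alpha|$ times the distance between $\mathbf{v}$ and $\mathbf{z}$.

The third condition is exactly the triangle inequality if we set $\mathbf{x}_1 = \mathbf{w} - \mathbf{y}$, $\mathbf{x}_2 = \mathbf{z} - \mathbf{w}$ with $\mathbf{w}$ any point in the same space.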

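51.2.2 Some common norms

Suppose the vectors $\mathbf{x} = [x_1; x_2; \dots; x_n]$, $\mathbf{y} = [y_1; y_2; \dots; y_n]$.

Observe that the Euclidean distance is a norm. This norm is usually called norm 2:

\[
\|\mathbf{x}\|_2 = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2} \tag{51.1}
\]

For any number $p$ not smaller than 1, the following function:

\[
\|\mathbf{x}\|_p = \left( |x_1|^p + |x_2|^p + \dots + |x_n|^p \right)^{\frac{1}{p}} \tag{51.2}
\]

can be shown to satisfy the three conditions above, and is called norm $p$.

Observe that as $p \rightarrow 0$, the sum $|x_1|^p + \dots + |x_n|^p$ inside (51.2) tends to the number of nonzero entries of $\mathbf{x}$. The function (51.2) with $p = 0$ is called the pseudo-norm 0. It is not a norm because it does not satisfy condition 2: scaling $\mathbf{x}$ by any $\alpha \neq 0$ leaves the number of nonzero entries unchanged. This pseudo-norm, usually denoted $\|\mathbf{x}\|_0$, is quite important in Machine Learning because in many problems we need a sparsity constraint, i.e. the number of active entries of $\mathbf{x}$ is small.

A few values of $p$ are commonly used:

1. When $p = 2$ we have norm 2 as above.

2. When $p = 1$ we have:

\[
\|\mathbf{x}\|_1 = |x_1| + |x_2| + \dots + |x_n| \tag{51.3}
\]

which is the sum of the absolute values of the entries of $\mathbf{x}$. Norm 1 is often used as an approximation of norm 0 in problems with a "sparse" constraint. Below is an example comparing norm 1 and norm 2 in two-dimensional space:

[Figure 51.1: two points $\mathbf{x}$ and $\mathbf{y}$ in the plane, with $\|\mathbf{x} - \mathbf{y}\|_2$ drawn as the straight segment between them and $\|\mathbf{x} - \mathbf{y}\|_1 = |x_1 - y_1| + |x_2 - y_2|$ as the path along the two axis-parallel segments.]

Figure 51.1: Illustration of norm 1 and norm 2

Norm 2 (green) is the straight "as the crow flies" line connecting the two vectors $\mathbf{x}$ and $\mathbf{y}$. The norm 1 distance between these two points (red) can be interpreted as the path from $\mathbf{x}$ to $\mathbf{y}$ in a city whose streets form a chessboard grid. We can only travel along the edges of the board and cannot cut straight across.

3. When $p \rightarrow \infty$, norm $p$ becomes the largest absolute value among the entries of the vector:

\[
\|\mathbf{x}\|_\infty = \max_{i = 1, 2, \dots, n} |x_i| \tag{51.4}
\]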
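The norms above can be computed with NumPy. This is a small sketch with a made-up vector; `p_norm` is a hypothetical helper that implements (51.2) directly, while `np.linalg.norm` and `np.count_nonzero` are standard NumPy functions.

```python
import numpy as np

def p_norm(x, p):
    # Direct implementation of (51.2), valid for p >= 1
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 0.0])

norm_1 = np.linalg.norm(x, ord=1)         # sum of absolute values, (51.3)
norm_2 = np.linalg.norm(x, ord=2)         # Euclidean norm, (51.1)
norm_inf = np.linalg.norm(x, ord=np.inf)  # largest absolute value, (51.4)
norm_0 = np.count_nonzero(x)              # pseudo-norm 0: number of nonzero entries

print(norm_1, norm_2, norm_inf, norm_0)

# Norm p approaches the max absolute value as p grows
print(p_norm(x, 50))

# The pseudo-norm 0 violates condition 2: scaling x does not change the count
print(np.count_nonzero(2 * x) == norm_0)
```

The last line illustrates why the count of nonzero entries is not a norm: multiplying the vector by 2 does not double the value.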


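51.2.3 Matrix norms

For a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, the most commonly used norm is the Frobenius norm, denoted $\|\mathbf{A}\|_F$, which is the square root of the sum of squares of all entries of the matrix:

\[
\|\mathbf{A}\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2}
\]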

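51.3 Derivatives of multivariable functions

In this section, we assume that all derivatives exist. We consider two cases: i) functions that take a matrix (vector) as input and return a real scalar; and ii) functions that take a scalar or a vector as input and return a vector.

51.3.1 Scalar-valued functions

The first-order derivative (gradient) of a function $f(\mathbf{x}): \mathbb{R}^n \rightarrow \mathbb{R}$ with respect to the vector $\mathbf{x}$ is defined as:

\[
\nabla_{\mathbf{x}} f(\mathbf{x}) \triangleq
\begin{bmatrix}
\frac{\partial f(\mathbf{x})}{\partial x_1} \\
\frac{\partial f(\mathbf{x})}{\partial x_2} \\
\vdots \\
\frac{\partial f(\mathbf{x})}{\partial x_n}
\end{bmatrix} \in \mathbb{R}^n \tag{51.5}
\]

where $\frac{\partial f(\mathbf{x})}{\partial x_i}$ is the derivative of the function with respect to the $i$-th component of the vector $\mathbf{x}$. This derivative is taken with all the other variables held constant.

If the function has no other variables, $\nabla_{\mathbf{x}} f(\mathbf{x})$ is usually abbreviated as $\nabla f(\mathbf{x})$.

An important point to remember: the derivative of such a function is a vector with the same dimensions as the vector being differentiated. That is, if the vector is written as a column, the derivative must also be written as a column.

The second-order derivative (second-order gradient) of the function above is also called the Hessian and is defined as:

\[
\nabla^2 f(\mathbf{x}) \triangleq
\begin{bmatrix}
\frac{\partial^2 f(\mathbf{x})}{\partial x_1^2} & \frac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_1} & \frac{\partial^2 f(\mathbf{x})}{\partial x_2^2} & \dots & \frac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f(\mathbf{x})}{\partial x_n \partial x_1} & \frac{\partial^2 f(\mathbf{x})}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f(\mathbf{x})}{\partial x_n^2}
\end{bmatrix} \in \mathbb{S}^n \tag{51.6}
\]

with $\mathbb{S}^n \subset \mathbb{R}^{n \times n}$ the set of symmetric square matrices with $n$ columns.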

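The derivative of a function $f(\mathbf{X}): \mathbb{R}^{n \times m} \rightarrow \mathbb{R}$ with respect to the matrix $\mathbf{X}$ is defined as:

\[
\nabla f(\mathbf{X}) =
\begin{bmatrix}
\frac{\partial f(\mathbf{X})}{\partial x_{11}} & \frac{\partial f(\mathbf{X})}{\partial x_{12}} & \dots & \frac{\partial f(\mathbf{X})}{\partial x_{1m}} \\
\frac{\partial f(\mathbf{X})}{\partial x_{21}} & \frac{\partial f(\mathbf{X})}{\partial x_{22}} & \dots & \frac{\partial f(\mathbf{X})}{\partial x_{2m}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f(\mathbf{X})}{\partial x_{n1}} & \frac{\partial f(\mathbf{X})}{\partial x_{n2}} & \dots & \frac{\partial f(\mathbf{X})}{\partial x_{nm}}
\end{bmatrix} \in \mathbb{R}^{n \times m} \tag{51.7}
\]

Once again, the derivative of a function with respect to a matrix is a matrix with the same dimensions as that matrix.

Put simply, the derivative of a function (whose output is a scalar) with respect to a matrix is computed as follows. First, compute the derivative of the function with respect to each entry of the matrix while treating all other entries as constants. Then, assemble the computed partial derivatives into a matrix in the same order as in the original matrix. Note that a vector is a special case of a matrix.

Example: Consider the function $f: \mathbb{R}^2 \rightarrow \mathbb{R}$, $f(\mathbf{x}) = x_1^2 + 2x_1x_2 + \sin(x_1) + 2$.

The first-order derivative of this function with respect to $\mathbf{x}$ is:

\[
\nabla f(\mathbf{x}) =
\begin{bmatrix} \frac{\partial f(\mathbf{x})}{\partial x_1} \\ \frac{\partial f(\mathbf{x})}{\partial x_2} \end{bmatrix}
=
\begin{bmatrix} 2x_1 + 2x_2 + \cos(x_1) \\ 2x_1 \end{bmatrix}
\]

The second-order derivative with respect to $\mathbf{x}$, or Hessian, is:

\[
\nabla^2 f(\mathbf{x}) =
\begin{bmatrix}
\frac{\partial^2 f(\mathbf{x})}{\partial x_1^2} & \frac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_2} \\
\frac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_1} & \frac{\partial^2 f(\mathbf{x})}{\partial x_2^2}
\end{bmatrix}
=
\begin{bmatrix} 2 - \sin(x_1) & 2 \\ 2 & 0 \end{bmatrix}
\]

Note that the Hessian is a symmetric matrix (whenever the second-order partial derivatives are continuous).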
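The gradient and Hessian in this example can be sanity-checked with finite differences. The sketch below is an illustration only: the helpers `numerical_gradient` and `numerical_hessian`, the step size `eps` and the test point `x0` are arbitrary choices, and central differences give approximations, not exact derivatives.

```python
import numpy as np

def f(x):
    return x[0]**2 + 2 * x[0] * x[1] + np.sin(x[0]) + 2

def grad_f(x):
    # Analytic gradient from the example
    return np.array([2 * x[0] + 2 * x[1] + np.cos(x[0]), 2 * x[0]])

def hess_f(x):
    # Analytic Hessian from the example
    return np.array([[2 - np.sin(x[0]), 2.0], [2.0, 0.0]])

def numerical_gradient(func, x, eps=1e-6):
    # Central-difference approximation of the gradient, one coordinate at a time
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

def numerical_hessian(grad_func, x, eps=1e-6):
    # Column j is the central difference of the gradient along coordinate j
    n = x.size
    hess = np.zeros((n, n))
    for j in range(n):
        step = np.zeros_like(x)
        step[j] = eps
        hess[:, j] = (grad_func(x + step) - grad_func(x - step)) / (2 * eps)
    return hess

x0 = np.array([0.5, -1.2])  # arbitrary test point
grad_error = np.max(np.abs(numerical_gradient(f, x0) - grad_f(x0)))
hess_error = np.max(np.abs(numerical_hessian(grad_f, x0) - hess_f(x0)))
print("max gradient error:", grad_error)
print("max Hessian error:", hess_error)
```

Small errors at a few random points do not prove a formula, but a large error is a reliable sign of a mistake in a hand-derived gradient.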

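51.3.2 Vector-valued functions

Functions that return a vector are called vector-valued functions.

Consider a function with a real scalar input, $\mathbf{v}(x): \mathbb{R} \rightarrow \mathbb{R}^n$:

\[
\mathbf{v}(x) =
\begin{bmatrix} v_1(x) \\ v_2(x) \\ \vdots \\ v_n(x) \end{bmatrix} \tag{51.8}
\]

Its derivative is a row vector:

\[
\nabla \mathbf{v}(x) \triangleq
\begin{bmatrix} \frac{\partial v_1(x)}{\partial x} & \frac{\partial v_2(x)}{\partial x} & \dots & \frac{\partial v_n(x)}{\partial x} \end{bmatrix} \tag{51.9}
\]

The second-order derivative of this function has the form:

\[
\nabla^2 \mathbf{v}(x) \triangleq
\begin{bmatrix} \frac{\partial^2 v_1(x)}{\partial x^2} & \frac{\partial^2 v_2(x)}{\partial x^2} & \dots & \frac{\partial^2 v_n(x)}{\partial x^2} \end{bmatrix} \tag{51.10}
\]

Example: Given a vector $\mathbf{a} \in \mathbb{R}^n$ and the vector-valued function $\mathbf{v}(x) = x\mathbf{a}$, we have:

\[
\nabla \mathbf{v}(x) = \mathbf{a}^T, \quad \nabla^2 \mathbf{v}(x) = \mathbf{0} \in \mathbb{R}^{1 \times n} \tag{51.11}
\]

with $\mathbf{0}$ the row vector whose entries are all 0.

Consider a vector-valued function with a vector input, $\mathbf{h}(\mathbf{x}): \mathbb{R}^k \rightarrow \mathbb{R}^n$. Its first-order derivative is:

\begin{align}
\nabla \mathbf{h}(\mathbf{x}) &\triangleq
\begin{bmatrix}
\frac{\partial h_1(\mathbf{x})}{\partial x_1} & \frac{\partial h_2(\mathbf{x})}{\partial x_1} & \dots & \frac{\partial h_n(\mathbf{x})}{\partial x_1} \\
\frac{\partial h_1(\mathbf{x})}{\partial x_2} & \frac{\partial h_2(\mathbf{x})}{\partial x_2} & \dots & \frac{\partial h_n(\mathbf{x})}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial h_1(\mathbf{x})}{\partial x_k} & \frac{\partial h_2(\mathbf{x})}{\partial x_k} & \dots & \frac{\partial h_n(\mathbf{x})}{\partial x_k}
\end{bmatrix} \tag{51.12} \\
&= \begin{bmatrix} \nabla h_1(\mathbf{x}) & \nabla h_2(\mathbf{x}) & \dots & \nabla h_n(\mathbf{x}) \end{bmatrix} \in \mathbb{R}^{k \times n} \tag{51.13}
\end{align}

A rule that is easy to remember here: if a function $g: \mathbb{R}^m \rightarrow \mathbb{R}^n$, then its derivative is a matrix in $\mathbb{R}^{m \times n}$.

The second-order derivative of the function above is a three-dimensional array, which I will not cover here.

I will also not cover matrix-valued functions that take a matrix as input. However, in the section below, when computing derivatives of real-valued functions, we may still use this concept.

Before moving on to the derivatives of commonly used functions, we need to know two important properties that are quite similar to those of the derivative of a single-variable function learned in high school.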


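51.3.3 Two important properties

Product rules

For generality, assume that the input variable is a matrix (vectors and scalars are special cases of matrices). Assume that the functions have compatible dimensions so that the multiplications are valid. We have:

\[
\nabla\left( f(\mathbf{X})^T g(\mathbf{X}) \right) = \left( \nabla f(\mathbf{X}) \right) g(\mathbf{X}) + \left( \nabla g(\mathbf{X}) \right) f(\mathbf{X}) \tag{51.14}
\]

This expression is similar to the one we are all familiar with:

\[
\left( f(x) g(x) \right)' = f'(x) g(x) + g'(x) f(x)
\]

Note that with vectors and matrices, we cannot use commutativity.

Chain rules

For composite functions:

\[
\nabla_{\mathbf{X}} g(f(\mathbf{X})) = \nabla_{\mathbf{X}} f^T \, \nabla_{f} g \tag{51.15}
\]

This rule is also similar to the rule for single-variable functions:

\[
\left( g(f(x)) \right)' = f'(x) g'(f)
\]

Recall that when computing with matrices, we need to pay attention to the dimensions of the matrices, and that matrix multiplication is not commutative.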

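51.3.4 Derivatives of common functions

$f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}$

Suppose $\mathbf{a}, \mathbf{x} \in \mathbb{R}^n$. We rewrite:

\[
f(\mathbf{x}) = \mathbf{a}^T\mathbf{x} = a_1x_1 + a_2x_2 + \dots + a_nx_n
\]

We can see that:

\[
\frac{\partial f(\mathbf{x})}{\partial x_i} = a_i, \; \forall i = 1, 2, \dots, n
\]

Therefore:

\[
\nabla f(\mathbf{x}) = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \mathbf{a} \tag{51.16}
\]

Moreover, since $\mathbf{a}^T\mathbf{x} = \mathbf{x}^T\mathbf{a}$:

\[
\nabla(\mathbf{x}^T\mathbf{a}) = \mathbf{a}
\]

$f(\mathbf{x}) = \mathbf{A}\mathbf{x}$

This is a vector-valued function $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ with $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{A} \in \mathbb{R}^{m \times n}$. Suppose that $\mathbf{a}_i$ is the $i$-th row of the matrix $\mathbf{A}$. We have:

\[
\mathbf{A}\mathbf{x} = \begin{bmatrix} \mathbf{a}_1\mathbf{x} \\ \mathbf{a}_2\mathbf{x} \\ \vdots \\ \mathbf{a}_m\mathbf{x} \end{bmatrix}
\]

By definition (51.13) and formula (51.16), we can deduce:

\[
\nabla_{\mathbf{x}}(\mathbf{A}\mathbf{x}) = \begin{bmatrix} \mathbf{a}_1^T & \mathbf{a}_2^T & \dots & \mathbf{a}_m^T \end{bmatrix} = \mathbf{A}^T \tag{51.17}
\]

From this we can deduce that the derivative of the function $f(\mathbf{x}) = \mathbf{x} = \mathbf{I}\mathbf{x}$, with $\mathbf{I}$ the identity matrix of compatible size, is:

\[
\nabla \mathbf{x} = \mathbf{I}
\]

$f(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x}$

with $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{A} \in \mathbb{R}^{n \times n}$. Applying the product rule (51.14), we have:

\begin{align}
\nabla f(\mathbf{x}) &= \nabla\left( \mathbf{x}^T (\mathbf{A}\mathbf{x}) \right) \notag \\
&= \left( \nabla(\mathbf{x}) \right) \mathbf{A}\mathbf{x} + \left( \nabla(\mathbf{A}\mathbf{x}) \right) \mathbf{x} \notag \\
&= \mathbf{I}\mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x} \notag \\
&= (\mathbf{A} + \mathbf{A}^T)\mathbf{x} \tag{51.18}
\end{align}

From (51.18) and (51.17), we can deduce: $\nabla^2 \mathbf{x}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T + \mathbf{A}$

If $\mathbf{A}$ is a symmetric matrix, we have:

\[
\nabla \mathbf{x}^T\mathbf{A}\mathbf{x} = 2\mathbf{A}\mathbf{x}, \quad \nabla^2 \mathbf{x}^T\mathbf{A}\mathbf{x} = 2\mathbf{A} \tag{51.19}
\]

If $\mathbf{A}$ is the identity matrix, i.e. $f(\mathbf{x}) = \mathbf{x}^T\mathbf{I}\mathbf{x} = \mathbf{x}^T\mathbf{x} = \|\mathbf{x}\|_2^2$, we have:

\[
\nabla \|\mathbf{x}\|_2^2 = 2\mathbf{x}, \quad \nabla^2 \|\mathbf{x}\|_2^2 = 2\mathbf{I} \tag{51.20}
\]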


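$f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2$

There are two ways to compute the derivative of this function:

Method 1: First, expand:

\begin{align*}
f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2 &= (\mathbf{A}\mathbf{x} - \mathbf{b})^T(\mathbf{A}\mathbf{x} - \mathbf{b}) = (\mathbf{x}^T\mathbf{A}^T - \mathbf{b}^T)(\mathbf{A}\mathbf{x} - \mathbf{b}) \\
&= \mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x} - 2\mathbf{b}^T\mathbf{A}\mathbf{x} + \mathbf{b}^T\mathbf{b}
\end{align*}

Taking the derivative of each term and adding them up gives:

\[
\nabla \|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2 = 2\mathbf{A}^T\mathbf{A}\mathbf{x} - 2\mathbf{A}^T\mathbf{b} = 2\mathbf{A}^T(\mathbf{A}\mathbf{x} - \mathbf{b})
\]

Method 2: Use the chain rule. Using $\nabla(\mathbf{A}\mathbf{x} - \mathbf{b}) = \mathbf{A}^T$, $\nabla \|\mathbf{x}\|_2^2 = 2\mathbf{x}$ and the chain rule (51.15), we obtain the same result.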
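The steps of Method 2 can be written out as follows. This is a sketch of the intermediate computation, where the auxiliary names $\mathbf{u}$ and $g$ are introduced here only for illustration:

\begin{align*}
\mathbf{u}(\mathbf{x}) &= \mathbf{A}\mathbf{x} - \mathbf{b} \in \mathbb{R}^m, \qquad g(\mathbf{u}) = \|\mathbf{u}\|_2^2, \\
\nabla_{\mathbf{x}} \mathbf{u} &= \mathbf{A}^T \in \mathbb{R}^{n \times m} \quad \text{(by (51.17), since } \mathbf{b} \text{ is constant)}, \\
\nabla_{\mathbf{u}} g &= 2\mathbf{u} \in \mathbb{R}^m \quad \text{(by (51.20))}, \\
\nabla_{\mathbf{x}} g(\mathbf{u}(\mathbf{x})) &= \left( \nabla_{\mathbf{x}} \mathbf{u} \right) \left( \nabla_{\mathbf{u}} g \right) = \mathbf{A}^T \cdot 2(\mathbf{A}\mathbf{x} - \mathbf{b}) = 2\mathbf{A}^T(\mathbf{A}\mathbf{x} - \mathbf{b}).
\end{align*}

The dimensions are consistent: an $n \times m$ matrix times a vector in $\mathbb{R}^m$ gives a vector in $\mathbb{R}^n$, the same size as $\mathbf{x}$.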

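$f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}\mathbf{x}^T\mathbf{b}$

By rewriting $f(\mathbf{x}) = (\mathbf{a}^T\mathbf{x})(\mathbf{x}^T\mathbf{b})$, we can use the product rule (51.14) and obtain:

\[
\nabla(\mathbf{a}^T\mathbf{x}\mathbf{x}^T\mathbf{b}) = \mathbf{a}\mathbf{x}^T\mathbf{b} + \mathbf{b}\mathbf{a}^T\mathbf{x} = \mathbf{a}\mathbf{b}^T\mathbf{x} + \mathbf{b}\mathbf{a}^T\mathbf{x} = (\mathbf{a}\mathbf{b}^T + \mathbf{b}\mathbf{a}^T)\mathbf{x}
\]

where I used the property $\mathbf{y}^T\mathbf{z} = \mathbf{z}^T\mathbf{y}$ and the fact that the product of a scalar and a vector equals the product of that vector and the scalar.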

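51.3.5 Table of common derivatives

For vectors

$f(\mathbf{x})$                                      $\nabla f(\mathbf{x})$
$\mathbf{a}^T\mathbf{x}$                             $\mathbf{a}$
$\mathbf{x}^T\mathbf{A}\mathbf{x}$                   $(\mathbf{A} + \mathbf{A}^T)\mathbf{x}$
$\mathbf{x}^T\mathbf{x} = \|\mathbf{x}\|_2^2$        $2\mathbf{x}$
$\|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2$          $2\mathbf{A}^T(\mathbf{A}\mathbf{x} - \mathbf{b})$
$\mathbf{a}^T\mathbf{x}^T\mathbf{x}\mathbf{b}$       $2\mathbf{a}^T\mathbf{b}\mathbf{x}$
$\mathbf{a}^T\mathbf{x}\mathbf{x}^T\mathbf{b}$       $(\mathbf{a}\mathbf{b}^T + \mathbf{b}\mathbf{a}^T)\mathbf{x}$

Table 51.1: Derivatives with respect to a vector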
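Entries of Table 51.1 can be spot-checked numerically. The sketch below compares three of the formulas against central finite differences on random data; the helper `numerical_gradient`, the random seed, and the sizes `n` and `m` are arbitrary illustrative choices.

```python
import numpy as np

def numerical_gradient(func, x, eps=1e-6):
    # Central-difference approximation of the gradient of a scalar function
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n, m = 4, 3
A_square = rng.standard_normal((n, n))   # for x^T A x
A = rng.standard_normal((m, n))          # for ||Ax - b||^2
b = rng.standard_normal(m)
a_vec = rng.standard_normal(n)           # for a^T x x^T b
b_vec = rng.standard_normal(n)
x = rng.standard_normal(n)

# Each entry: (function of x, gradient formula from Table 51.1 evaluated at x)
checks = {
    "x^T A x": (lambda v: v @ A_square @ v,
                (A_square + A_square.T) @ x),
    "||Ax - b||^2": (lambda v: np.sum((A @ v - b) ** 2),
                     2 * A.T @ (A @ x - b)),
    "a^T x x^T b": (lambda v: (a_vec @ v) * (v @ b_vec),
                    (np.outer(a_vec, b_vec) + np.outer(b_vec, a_vec)) @ x),
}

errors = {name: np.max(np.abs(numerical_gradient(func, x) - formula))
          for name, (func, formula) in checks.items()}
for name, err in errors.items():
    print(f"{name}: max abs error = {err:.2e}")
```

The same pattern extends to the remaining rows of the table by adding entries to `checks`.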


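For matrices

(See Table 51.2)

$f(\mathbf{X})$                                      $\nabla f(\mathbf{X})$
$\|\mathbf{X}\|_F^2$                                 $2\mathbf{X}$
$\operatorname{trace}(\mathbf{A}\mathbf{X})$         $\mathbf{A}^T$
$\|\mathbf{A}\mathbf{X} - \mathbf{B}\|_F^2$          $2\mathbf{A}^T(\mathbf{A}\mathbf{X} - \mathbf{B})$
$\|\mathbf{X}\mathbf{A} - \mathbf{B}\|_F^2$          $2(\mathbf{X}\mathbf{A} - \mathbf{B})\mathbf{A}^T$
$\mathbf{a}^T\mathbf{X}^T\mathbf{X}\mathbf{b}$       $\mathbf{X}(\mathbf{a}\mathbf{b}^T + \mathbf{b}\mathbf{a}^T)$
$\mathbf{a}^T\mathbf{X}\mathbf{X}^T\mathbf{b}$       $(\mathbf{a}\mathbf{b}^T + \mathbf{b}\mathbf{a}^T)\mathbf{X}$
$\mathbf{a}^T\mathbf{Y}\mathbf{X}^T\mathbf{b}$       $\mathbf{b}\mathbf{a}^T\mathbf{Y}$
$\mathbf{a}^T\mathbf{Y}^T\mathbf{X}\mathbf{b}$       $\mathbf{Y}\mathbf{a}\mathbf{b}^T$
$\mathbf{a}^T\mathbf{X}\mathbf{Y}^T\mathbf{b}$       $\mathbf{a}\mathbf{b}^T\mathbf{Y}$
$\mathbf{a}^T\mathbf{X}^T\mathbf{Y}\mathbf{b}$       $\mathbf{Y}\mathbf{b}\mathbf{a}^T$

Table 51.2: Derivatives with respect to a matrix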
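The matrix formulas can be spot-checked in the same way, by perturbing one entry of $\mathbf{X}$ at a time. This is a minimal sketch on random data; the helper `numerical_gradient_matrix`, the seed and the matrix sizes are arbitrary illustrative choices.

```python
import numpy as np

def numerical_gradient_matrix(func, X, eps=1e-6):
    # Central-difference approximation, perturbing one entry of X at a time
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            step = np.zeros_like(X)
            step[i, j] = eps
            grad[i, j] = (func(X + step) - func(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 2))
A_left = rng.standard_normal((3, 4))    # for ||AX - B||_F^2
B_left = rng.standard_normal((3, 2))
A_right = rng.standard_normal((2, 5))   # for ||XA - B||_F^2
B_right = rng.standard_normal((4, 5))
a = rng.standard_normal(2)              # for a^T X^T X b
b = rng.standard_normal(2)

# Each entry: (function of X, gradient formula from Table 51.2 evaluated at X)
checks = {
    "||AX - B||_F^2": (lambda M: np.sum((A_left @ M - B_left) ** 2),
                       2 * A_left.T @ (A_left @ X - B_left)),
    "||XA - B||_F^2": (lambda M: np.sum((M @ A_right - B_right) ** 2),
                       2 * (X @ A_right - B_right) @ A_right.T),
    "a^T X^T X b": (lambda M: (M @ a) @ (M @ b),
                    X @ (np.outer(a, b) + np.outer(b, a))),
}

errors = {name: np.max(np.abs(numerical_gradient_matrix(func, X) - formula))
          for name, (func, formula) in checks.items()}
for name, err in errors.items():
    print(f"{name}: max abs error = {err:.2e}")
```

In each case the gradient has the same shape as $\mathbf{X}$, as stated in Section 51.3.1.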

51.3.6 References

[1] Matrix calculus



Index

α-sublevel sets, 17
affine functions, 14
complementary slackness, 53
constraints, 4
contours, 15
convex, 3
    combination, 10
    functions, 11
        first-order condition, 19
        Second-order condition, 21
    hull, 10
    optimization problems, 30
    sets, 4
    strictly convex functions, 12
CVXOPT, 33
duality, 44
ellipsoids, 8
feasible points, 4
feasible sets, 4
Geometric Programming, 39
    convex form, 41
halfspace, 7
hyperplane, 6
infeasible sets, 4
KKT conditions, 54
Lagrange/Lagrangian
    auxiliary function, 45
    dual function, 47
    dual functions, 47
    dual problems, 50
    multiplier method, 45
level sets, 15
Linear Programming, 33
    General form, 33
    Standard form, 34
Mahalanobis norm, 8
monomials, 39
norm balls, 7
posynomials, 39
quadratic forms, 14
Quadratic Programming, 36
quasiconvex, 19
Separating hyperplane theorem, 11
Slater's constraint qualification, 52
strong duality, 52
weak duality, 51
