
Convex Optimization: Part 1 of Chapter 7

Discussion

Presenter: Brian Quanz


A KTEC Center of Excellence

About today's discussion

Chapter 7 has no separate discussion of convex optimization; it is discussed together with SVM problems
Instead:
Today: discuss convex optimization
Next week: discuss some specific convex optimization problems (from the text), e.g. SVMs

About today's discussion

Mostly follows an alternate text: Convex Optimization, by Stephen Boyd and Lieven Vandenberghe
Material, including some figures and equations shown here, is borrowed from the book and related course notes
Available online: http://www.stanford.edu/~boyd/cvxbook/
Nice course lecture videos from Stephen Boyd are available online: http://www.stanford.edu/class/ee364a/
Corresponding convex optimization tool (discussed later), CVX: http://www.stanford.edu/~boyd/cvx/

Overview

Why convex? What is convex?

Key examples of linear and quadratic programming

Key mathematical ideas to discuss:
-> Lagrange duality
-> KKT conditions
Brief concept of interior point methods
CVX: convex optimization made easy

Mathematical Optimization
All learning is some optimization problem -> stick to the canonical form

$x = (x_1, x_2, \ldots, x_p)$: optimization variables; $x^*$: optimal solution

$f_0 : \mathbb{R}^p \to \mathbb{R}$: objective function

$f_i : \mathbb{R}^p \to \mathbb{R}$, $i = 1, \ldots, m$: constraint functions
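The canonical form itself, written out as a sketch that assumes the notation of Boyd and Vandenberghe (the bounds $b_i$ are an assumption, not taken from the slide):

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le b_i, \qquad i = 1, \ldots, m
\end{aligned}
```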

Optimization Example
We are already familiar with regularized regression:
Least squares

Add some constraints (or penalties): ridge, lasso
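For concreteness, these three problems could be written as follows. The symbols are assumptions chosen for illustration: data matrix $X$, targets $y$, weights $w$, penalty weight $\lambda \ge 0$.

```latex
\begin{aligned}
\text{least squares:}\quad & \min_w \; \|Xw - y\|_2^2 \\
\text{ridge:}\quad         & \min_w \; \|Xw - y\|_2^2 + \lambda \|w\|_2^2 \\
\text{lasso:}\quad         & \min_w \; \|Xw - y\|_2^2 + \lambda \|w\|_1
\end{aligned}
```

The penalized forms correspond to constrained forms with $\|w\|_2^2 \le s$ or $\|w\|_1 \le s$, which fit the canonical form with a single constraint function.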

Why convex optimization?


Can't solve most OPs
E.g. NP-hard; even high-order polynomial time is too slow

Convex OPs
(Generally) no analytic solution
Efficient algorithms to find the (global) solution
Interior point methods (basically iterated Newton) can be used: cost ~ [10-100] * $\max\{p^3, p^2 m, F\}$, where $F$ is the cost of evaluating the objective and constraint functions
At worst, solve with general IP methods (CVX); specialized methods are faster

What is Convex Optimization?

OP with convex objective and constraint functions

$f_0, \ldots, f_m$ are convex => a convex OP, which has an efficient solution!

Convex Function
Definition: the weighted mean of the function values at any two points is greater than or equal to the function evaluated at the weighted mean of the two points
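In symbols (a standard statement of the definition; $\theta$ is the weight, and the domain of $f$ is assumed to be a convex set):

```latex
f(\theta x + (1 - \theta) y) \;\le\; \theta f(x) + (1 - \theta) f(y)
\qquad \text{for all } x, y \in \operatorname{dom} f,\; 0 \le \theta \le 1
```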

Convex Function
What does the definition mean?
Pick any two points x, y and evaluate the function at them: f(x), f(y)

Draw the line segment joining the two points (x, f(x)) and (y, f(y))
Convex if the function, evaluated at any point between x and y, lies on or below that segment
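The chord test can be tried numerically. The sketch below samples random pairs of points and weights and reports the first violation it finds; the helper name `find_violation` and the test functions are illustrative assumptions. Passing the test does not prove convexity, but a violation does disprove it.

```python
import math
import random


def find_violation(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Search [lo, hi] for a point where f rises above its chord.

    Returns a tuple (x, y, theta) that violates the convexity inequality,
    or None if no violation was found among the sampled points.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        on_curve = f(theta * x + (1 - theta) * y)
        on_chord = theta * f(x) + (1 - theta) * f(y)
        if on_curve > on_chord + tol:  # function lies above the chord
            return x, y, theta
    return None


if __name__ == "__main__":
    print(find_violation(lambda x: x * x, -5.0, 5.0))  # convex: no violation expected
    print(find_violation(math.sin, 0.0, 6.0))          # sin is not convex on [0, 6]
```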

Convex Function
[Figure: example function labeled "Convex!"]
[Figure: example function labeled "Not Convex!!!"]

Convex Function
Easy to see why convexity allows for an efficient solution
Just slide down the objective function as far as possible and you will reach a minimum

Local Optimum is Global (simple proof)
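A sketch of the standard argument, following Boyd and Vandenberghe (not necessarily the exact proof shown on the slide):

```latex
Suppose $x$ is locally optimal: $x$ is feasible and $f_0(x) \le f_0(z)$ for every
feasible $z$ with $\|z - x\|_2 \le R$, for some $R > 0$.
Suppose, for contradiction, that some feasible $y$ has $f_0(y) < f_0(x)$.
Then $\|y - x\|_2 > R$. Define
\[
  z = \theta y + (1 - \theta) x, \qquad \theta = \frac{R}{2 \|y - x\|_2} \in (0, 1/2).
\]
The feasible set is convex, so $z$ is feasible, and $\|z - x\|_2 = R/2 < R$.
By convexity of $f_0$,
\[
  f_0(z) \le \theta f_0(y) + (1 - \theta) f_0(x) < f_0(x),
\]
which contradicts local optimality. Hence $x$ is globally optimal.
```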

Convex vs. Non-convex Ex.

Affine: border case of convexity

Convex: minimum easy to find

Convex vs. Non-convex Ex.

Non-convex: easy to get stuck in a local minimum

Can't rely on local search techniques alone
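A small sketch of the difference, using plain gradient descent from two starting points. The two test functions, the step size and the starting points are assumptions picked for illustration.

```python
def gradient_descent(grad, x0, step=0.01, iters=5000):
    """Plain fixed-step gradient descent in one dimension."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x


# Convex example: f(x) = (x - 1)^2, single minimum at x = 1.
def convex_grad(x):
    return 2.0 * (x - 1.0)


# Non-convex example: f(x) = x^4 - 3x^2 + x, two separate local minima.
def nonconvex_f(x):
    return x**4 - 3.0 * x**2 + x


def nonconvex_grad(x):
    return 4.0 * x**3 - 6.0 * x + 1.0


starts = [-2.0, 2.0]
convex_ends = [gradient_descent(convex_grad, s) for s in starts]
nonconvex_ends = [gradient_descent(nonconvex_grad, s) for s in starts]

if __name__ == "__main__":
    print("convex:    ", convex_ends)     # both starts reach the same point
    print("non-convex:", nonconvex_ends)  # the two starts end in different minima
```

In the non-convex case the start at 2 settles in the shallower of the two minima, so a purely local method returns a worse answer than the start at -2.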

Non-convex
Some non-convex problems are highly multi-modal, or NP-hard
Could be forced to search all solutions, or hope stochastic search is successful
Cannot guarantee the best solution; inefficient

Harder to make performance guarantees with approximate solutions

Determine/Prove Convexity
Can use the definition directly (prove that it holds)
If the function restricted to any line is convex, the function is convex

If twice differentiable, show the Hessian is positive semidefinite: $\nabla^2 f(x) \succeq 0$

Often easier to:
Convert to a known convex OP
E.g. QP, LP, SOCP, SDP, often of a more general form
Combine known convex functions (building blocks) using operations that preserve convexity
Similar idea to building kernels
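For a function of two variables the Hessian test reduces to three inequalities. The sketch below checks them for two assumed example functions whose Hessians are constant; for a general function the test has to hold at every point of the domain.

```python
def is_psd_2x2(h11, h12, h22, tol=1e-12):
    """Return True if the symmetric matrix [[h11, h12], [h12, h22]] is PSD.

    A symmetric 2x2 matrix is positive semidefinite exactly when both
    diagonal entries and the determinant are nonnegative.
    """
    return h11 >= -tol and h22 >= -tol and h11 * h22 - h12 * h12 >= -tol


# f(x, y) = x^2 + x*y + y^2 has constant Hessian [[2, 1], [1, 2]].
convex_case = is_psd_2x2(2.0, 1.0, 2.0)

# g(x, y) = x^2 - y^2 (a saddle) has constant Hessian [[2, 0], [0, -2]].
saddle_case = is_psd_2x2(2.0, 0.0, -2.0)

if __name__ == "__main__":
    print(convex_case, saddle_case)
```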

Some common convex OPs


Of particular interest for this book and chapter: linear programming (LP) and quadratic programming (QP)

LP: affine objective function, affine constraints

- e.g. LP SVM, portfolio management

LP Visualization
Note: the constraints form the feasible set; for an LP, a polyhedron

Quadratic Program
QP: quadratic objective, affine constraints

LP is a special case
Many SVM problems result in a QP, as does regression

If the constraint functions are also quadratic, then it is a Quadratically Constrained Quadratic Program (QCQP)
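Written out, with symbol names that are assumptions following Boyd and Vandenberghe ($P \succeq 0$ is needed for the QP to be convex):

```latex
\begin{aligned}
\text{LP:}\quad & \text{minimize } c^T x + d
                 && \text{subject to } Gx \preceq h,\; Ax = b \\
\text{QP:}\quad & \text{minimize } \tfrac{1}{2} x^T P x + q^T x + r
                 && \text{subject to } Gx \preceq h,\; Ax = b
\end{aligned}
```

Setting $P = 0$ in the QP recovers the LP, which is the sense in which LP is a special case.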

QP Visualization


Second Order Cone Program

$A_i = 0$: results in an LP
$c_i = 0$: results in a QCQP

The constraint requires the affine functions to lie in the second-order cone
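The symbols $A_i$ and $c_i$ refer to the standard SOCP form, reproduced here as an assumption about what the slide displayed:

```latex
\begin{aligned}
\text{minimize}\quad   & f^T x \\
\text{subject to}\quad & \|A_i x + b_i\|_2 \le c_i^T x + d_i, \qquad i = 1, \ldots, m \\
                       & F x = g
\end{aligned}
```

With $A_i = 0$ each constraint becomes the affine inequality $\|b_i\|_2 \le c_i^T x + d_i$; with $c_i = 0$ (and $d_i \ge 0$), squaring both sides gives the quadratic constraint $\|A_i x + b_i\|_2^2 \le d_i^2$.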

Second Order Cone (Boundary) in $\mathbb{R}^3$

Semidefinite Programming

Linear matrix inequality (LMI) constraints

Many problems can be expressed using LMIs, e.g. LP and SOCP
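A sketch of the standard SDP form with an LMI constraint. The symbols are assumptions following Boyd and Vandenberghe: $F_1, \ldots, F_p$ and $G$ are symmetric matrices, and $\preceq 0$ means negative semidefinite.

```latex
\begin{aligned}
\text{minimize}\quad   & c^T x \\
\text{subject to}\quad & x_1 F_1 + x_2 F_2 + \cdots + x_p F_p + G \preceq 0 \\
                       & A x = b
\end{aligned}
```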


Building Convex Functions


From simple convex functions to complex: some operations that preserve convexity
Nonnegative weighted sum
Composition with affine function
Pointwise maximum and supremum
Composition
Minimization
Perspective ( g(x,t) = t f(x/t) )
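As an illustration of these rules, consider a soft-margin SVM objective. This is an assumed example, with data $(x_i, y_i)$, weights $w$, offset $b$ and constant $C > 0$:

```latex
\[
  F(w, b) = \tfrac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \max\{0,\; 1 - y_i (w^T x_i + b)\}
\]
```

Each term $1 - y_i (w^T x_i + b)$ is affine in $(w, b)$; the pointwise maximum of two affine functions (zero and this term) is convex; $\|w\|_2^2$ is convex; and a nonnegative weighted sum of convex functions is convex. So $F$ is convex.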

Verifying Convexity Remarks


For more detail and expansion, consult the referenced text, Convex Optimization
Geometric programs are also convex (after a change of variables) and can be handled with a series of SDPs (details skipped here)

CVX converts the problem to either an SOCP or an SDP (or a series of them) and uses an efficient solver

Lagrangian
Standard form:

Lagrangian L:

$\lambda$, $\nu$: Lagrange multipliers (dual variables)
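The standard form and the Lagrangian, written out in the notation of Boyd and Vandenberghe (assumed to match the slide; $q$ for the number of equality constraints is a choice made here because $p$ already denotes the number of variables):

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 0, \qquad i = 1, \ldots, m \\
                       & h_i(x) = 0, \qquad i = 1, \ldots, q
\end{aligned}
\qquad\qquad
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{q} \nu_i h_i(x)
```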

Lagrange Dual Function

The Lagrange dual function is found by minimizing L with respect to the primal variables: $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$

Often can take the gradient of L w.r.t. the primal variables and set it to 0 (as for the SVM)

Lagrange Dual Function


Note: the Lagrange dual function is the pointwise infimum of a family of affine functions of $(\lambda, \nu)$
Thus, g is concave, even if the problem is not convex

Lagrange Dual Function


The Lagrange dual function provides a lower bound on the objective value at the solution
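Why the bound holds, as a short sketch in the standard notation (dual function $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$, inequality constraints $f_i$, equality constraints $h_i$, optimal value $p^*$; these names are assumed from Boyd and Vandenberghe):

```latex
For any feasible $\tilde{x}$ and any $\lambda \succeq 0$:
\[
  g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)
  \le L(\tilde{x}, \lambda, \nu)
  = f_0(\tilde{x}) + \sum_i \lambda_i f_i(\tilde{x}) + \sum_i \nu_i h_i(\tilde{x})
  \le f_0(\tilde{x}),
\]
since $\lambda_i f_i(\tilde{x}) \le 0$ and $h_i(\tilde{x}) = 0$.
Minimizing the right-hand side over all feasible $\tilde{x}$ gives $g(\lambda, \nu) \le p^*$.
```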

Lagrangian as Linear Approximation, Lower Bound
Simple interpretation of the Lagrangian
Can incorporate the constraints into the objective as indicator functions: infinity if violated, 0 otherwise

In the Lagrangian we use a soft linear approximation to the indicator functions; it is an under-estimator, since $\lambda_i u \le I_-(u)$ for all $u$ when $\lambda_i \ge 0$, where $I_-(u)$ is 0 for $u \le 0$ and infinity otherwise

Lagrange Dual Problem


Why not make the lower bound the best possible?
Dual problem: maximize $g(\lambda, \nu)$ subject to $\lambda \succeq 0$

Always a convex optimization problem (even when the primal is non-convex)
Weak duality: d* <= p* (have already seen this)
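A one-variable example makes the bound and the dual problem concrete. The problem below (minimize $x^2$ subject to $1 - x \le 0$) is an assumed toy example, not from the slides. Its Lagrangian is $x^2 + \lambda (1 - x)$, which is minimized over $x$ at $x = \lambda / 2$, giving $g(\lambda) = \lambda - \lambda^2 / 4$.

```python
def dual_function(lam):
    """g(lam) = min over x of x^2 + lam * (1 - x); the minimizer is x = lam / 2."""
    x = lam / 2.0
    return x * x + lam * (1.0 - x)


# Primal optimum: the constraint x >= 1 is active, so x* = 1 and p* = 1.
p_star = 1.0

# Scan lam >= 0 on a grid; every value of g is a lower bound on p*.
lams = [i / 100.0 for i in range(0, 401)]
values = [dual_function(lam) for lam in lams]

d_star = max(values)                   # best lower bound found on the grid
best_lam = lams[values.index(d_star)]  # grid point where it is attained

if __name__ == "__main__":
    print("weak duality holds on grid:", all(v <= p_star + 1e-12 for v in values))
    print("d* =", d_star, "at lambda =", best_lam)
```

On this example the best lower bound equals the primal optimum, so there is no duality gap.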

Strong Duality
If d* = p*, strong duality holds
Does not hold in general

Slater's theorem: if the problem is convex and a strictly feasible point exists, then strong duality holds! (proof too involved, refer to text)

=> For convex problems, can use the dual problem to find the solution

Complementary Slackness
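When strong duality holds, with $x^*$ primal optimal and $(\lambda^*, \nu^*)$ dual optimal ($f_i$: inequality constraint functions, $h_i$: equality constraint functions):

$$
\begin{aligned}
f_0(x^*) &= g(\lambda^*, \nu^*) \\
&= \inf_x \Big( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \Big) && \text{(definition)} \\
&\le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*) \\
&\le f_0(x^*) && \text{(since constraints satisfied at } x^* \text{)}
\end{aligned}
$$

Sandwiched between $f_0(x^*)$ on both ends, the last two inequalities are equalities. Simple!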

Complementary Slackness
Which means: $\sum_i \lambda_i^* f_i(x^*) = 0$
Since each term is non-positive, we have complementary slackness: $\lambda_i^* f_i(x^*) = 0$ for every $i$
Whenever a constraint is non-active, the corresponding multiplier is zero

Complementary Slackness
This can also be described by: $\lambda_i^* > 0 \Rightarrow f_i(x^*) = 0$, and $f_i(x^*) < 0 \Rightarrow \lambda_i^* = 0$

Since usually only a few constraints are active at the solution (see geometry), the dual variable $\lambda$ is often sparse
Note: in general there is no guarantee of this

Complementary Slackness
As we will see, this is why support vector machines yield a solution that depends only on a few key support vectors
These come from the dual problem: constraints correspond to points, and complementary slackness ensures only the active points are kept

Complementary Slackness
However, avoid common misconceptions when it comes to SVMs and complementary slackness!
E.g. if a Lagrange multiplier is 0, the constraint could still be active! (not a bijection!)

This means: $\lambda_i^* = 0$ does not imply $f_i(x^*) < 0$

KKT Conditions
The KKT conditions are then just what we call the set of conditions required at the solution (basically a list of what we know)
KKT conditions play an important role
Can sometimes be used to find the solution analytically
Otherwise, can think of many methods as ways of solving the KKT conditions

KKT Conditions
Again given strong duality, and assuming differentiable functions: since $x^*$ minimizes $L(x, \lambda^*, \nu^*)$, the gradient of $L$ with respect to $x$ must be 0 at $x^*$

Thus, putting it all together, for non-convex problems we have

KKT Conditions: non-convex

Necessary conditions
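The conditions themselves, in the order used by Boyd and Vandenberghe. This ordering is assumed to match the numbering 1 to 5 that the sufficiency argument for convex problems refers to:

```latex
\begin{aligned}
1.\quad & f_i(x^*) \le 0, && i = 1, \ldots, m \\
2.\quad & h_i(x^*) = 0, && i = 1, \ldots, q \\
3.\quad & \lambda_i^* \ge 0, && i = 1, \ldots, m \\
4.\quad & \lambda_i^* f_i(x^*) = 0, && i = 1, \ldots, m \\
5.\quad & \nabla f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla f_i(x^*)
          + \sum_{i=1}^{q} \nu_i^* \nabla h_i(x^*) = 0
\end{aligned}
```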

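KKT Conditions: convex

Also sufficient conditions. If $\tilde{x}, \tilde{\lambda}, \tilde{\nu}$ satisfy the KKT conditions for a convex problem:
1+2 (primal constraints) -> $\tilde{x}$ is feasible
3 ($\tilde{\lambda} \succeq 0$) -> $L(x, \tilde{\lambda}, \tilde{\nu})$ is convex in $x$
5 (zero gradient) -> $\tilde{x}$ minimizes $L(x, \tilde{\lambda}, \tilde{\nu})$, so $g(\tilde{\lambda}, \tilde{\nu}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$
2+4 (equality constraints, complementary slackness) -> $L(\tilde{x}, \tilde{\lambda}, \tilde{\nu}) = f_0(\tilde{x})$, so the duality gap is zero and the points are optimal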
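A sketch of checking the KKT conditions for a candidate point. The problem (minimize $x_1^2 + x_2^2$ subject to $1 - x_1 - x_2 \le 0$) and the function name `kkt_report` are assumptions made for illustration; the problem has no equality constraints, so only four checks appear.

```python
def kkt_report(x, lam, tol=1e-9):
    """Check the KKT conditions for: minimize x1^2 + x2^2 s.t. 1 - x1 - x2 <= 0."""
    x1, x2 = x
    f1 = 1.0 - x1 - x2                               # inequality constraint value
    stationarity = (2.0 * x1 - lam, 2.0 * x2 - lam)  # gradient of the Lagrangian in x
    return {
        "primal_feasible": f1 <= tol,
        "dual_feasible": lam >= -tol,
        "complementary_slackness": abs(lam * f1) <= tol,
        "stationary": all(abs(g) <= tol for g in stationarity),
    }


# Candidate solution: constraint active, multiplier positive.
good = kkt_report((0.5, 0.5), 1.0)

# Feasible but not optimal: slackness holds with lam = 0, stationarity fails.
bad = kkt_report((1.0, 1.0), 0.0)

if __name__ == "__main__":
    print(good)
    print(bad)
```

Because this problem is convex, the point that passes all checks is optimal.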

Brief description of interior point method


Solve a series of equality constrained problems with Newton's method
Approximate the constraints with a log-barrier (an approximation of the indicator function)

Brief description of interior point method

As t gets larger, the approximation becomes better
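A one-dimensional sketch of the barrier idea, on an assumed toy problem (minimize $x$ subject to $1 - x \le 0$, optimum $x^* = 1$). For each $t$ the barrier problem minimizes $t x - \log(x - 1)$ with a damped Newton iteration; its minimizer is $x(t) = 1 + 1/t$, so the iterates move toward $x^*$ as $t$ grows. The function names and the schedule for $t$ are illustrative choices.

```python
import math


def barrier_objective(t, x):
    """t * f0(x) plus the log barrier for the constraint 1 - x <= 0, with f0(x) = x."""
    return t * x - math.log(x - 1.0)


def centering_step(t, x, iters=50, tol=1e-10):
    """Minimize the barrier objective for fixed t by damped Newton's method."""
    for _ in range(iters):
        s = x - 1.0                   # slack, must stay positive
        grad = t - 1.0 / s
        hess = 1.0 / (s * s)
        if grad * grad / hess < tol:  # squared Newton decrement is small: done
            break
        step = -grad / hess
        alpha = 1.0
        # Backtrack until the point is strictly feasible and decreases enough.
        while (x + alpha * step <= 1.0 or
               barrier_objective(t, x + alpha * step)
               > barrier_objective(t, x) + 0.25 * alpha * grad * step):
            alpha *= 0.5
        x += alpha * step
    return x


x = 2.0  # strictly feasible starting point
path = []
for t in [1.0, 10.0, 100.0, 1000.0, 10000.0]:
    x = centering_step(t, x)  # warm start from the previous solution
    path.append((t, x))

if __name__ == "__main__":
    for t, xt in path:
        print(f"t = {t:8.0f}   x(t) = {xt:.6f}")
```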

Central Path Idea


CVX: Convex Optimization Made Easy


CVX is a Matlab toolbox
Allows you to flexibly express convex optimization problems
Translates these to a general form and uses an efficient solver (SOCP, SDP, or a series of these)
http://www.stanford.edu/~boyd/cvx/

All you have to do is design the convex optimization problem
Plug it into CVX, and a first version of the algorithm is implemented
A more specialized solver may be necessary for some applications

CVX - Examples
Quadratic program: given H, f, A, and b
cvx_begin
    variable x(n)
    % convex only if H is symmetric positive semidefinite
    minimize( x'*H*x + f'*x )
    subject to
        A*x >= b
cvx_end

CVX - Examples
SVM-type formulation with L1 norm
cvx_begin
    variable w(p)
    variable b(1)
    variable e(n)
    expression by(n)
    by = train_label.*b;
    minimize( w'*(L + I)*w + C*sum(e) + l1_lambda*norm(w,1) )
    subject to
        X*w + by >= a - e;
        e >= ec;
cvx_end

CVX - Examples

More complicated terms built with expressions

cvx_begin
    variable w(p+1+n);
    expression q(ec);
    % ct, the running index into q, is assumed to be set to 1 beforehand
    for i = 1:p
        for j = i:p
            if (A(i,j) == 1)
                q(ct) = max(abs(w(i))/d(i), abs(w(j))/d(j));
                ct = ct + 1;
            end
        end
    end
    minimize( f'*w + lambda*sum(q) )
    subject to
        X*w >= a;
cvx_end

Questions
Questions, Comments?


Extra proof
