
IØ8400 - Mathematical Programming:

Mixed-Integer Nonlinear Optimization

Frederik Schulze Spüntrup

Norwegian University of Science and Technology


Department of Engineering Cybernetics

May 2018
Outline

1 Problem, Notation, and Definitions

2 Basic Building Blocks of MINLP Methods

3 MINLP Modeling Practices

4 Nonlinear Branch-and-Bound

5 Multitree Methods for MINLP

6 Single-Tree Methods for MINLP


Mixed-Integer Nonlinear Optimization
Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)
subject to  c(x) ≤ 0
            x ∈ X
            xi ∈ Z for all i ∈ I

f : Rn → R, c : Rn → Rm smooth (often convex) functions


X ⊂ Rn bounded, polyhedral set, e.g. X = {x : l ≤ Aᵀx ≤ u}
I ⊂ {1, . . . , n} subset of integer variables
xi ∈ Z for all i ∈ I ... combinatorial problem
Combines challenges of handling nonlinearities
with combinatorial explosion of integer variables
More general constraints possible, e.g. l ≤ c(x) ≤ u etc.
Complexity of MINLP
Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)
subject to  c(x) ≤ 0
            x ∈ X
            xi ∈ Z for all i ∈ I

Complexity of MINLP
MINLP is NP-hard: it includes MILP, which is NP-hard
[Kannan and Monma, 1978]
Worse: MINLP is undecidable [Jeroslow, 1973]: there exists a
quadratically constrained IP for which no computing device
can compute the optimum for all problems in this class
... but we’re OK if X is compact!
Notation

Some notation used throughout the course ...


f (k) = f (x (k) ) evaluated at x = x (k)
∇f (k) = ∇f (x (k) ) gradient
∇²L^(k) Hessian of Lagrangian L(x, λ) = f(x) − ∑_i λi ci(x)
... assumes X polyhedral
Subscripts denote components, e.g. xi is component i of x
If J ⊂ {1, . . . , n} then xJ are components of x corres. to J
xI are the integer and xC the continuous variables, p = |I|
Floor and ceiling operators ⌊xi⌋ and ⌈xi⌉:
⌊xi⌋ largest integer smaller than or equal to xi
⌈xi⌉ smallest integer larger than or equal to xi
Convexity of Nonlinear Functions

MINLP techniques distinguish convex and nonconvex MINLPs.


For our purposes, we define convexity as ...
Definition
A function f : Rn → R is convex, iff ∀x (0) , x (1) ∈ Rn we have:

f (x (1) ) ≥ f (x (0) ) + (x (1) − x (0) )T ∇f (0)
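
As a quick numerical illustration (not part of the original slides), a minimal Python sketch that checks this first-order inequality for the hypothetical convex function f(x) = ‖x‖²:

import numpy as np

# Check f(x1) >= f(x0) + (x1 - x0)^T grad_f(x0) at random point pairs
# for the hypothetical convex function f(x) = ||x||^2.
f = lambda x: x @ x
grad_f = lambda x: 2 * x

rng = np.random.default_rng(1)
for _ in range(100):
    x0, x1 = rng.standard_normal(3), rng.standard_normal(3)
    assert f(x1) >= f(x0) + (x1 - x0) @ grad_f(x0) - 1e-12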

In a slight abuse of notation, we say that ...


Definition
MINLP is convex if the problem functions f (x) and c(x) are
convex functions. If either f (x) or any ci (x) is a nonconvex
function, then MINLP is nonconvex.
Convexity (cont.)
We also define the convex hull of a set S as ...
Definition
For a set S, the convex hull of S is conv(S):

conv(S) := { x : x = λx^(1) + (1 − λ)x^(0), ∀ 0 ≤ λ ≤ 1, ∀ x^(0), x^(1) ∈ S }.

If X = {x ∈ Zp : l ≤ x ≤ u} and l ∈ Zp , u ∈ Zp ,
then conv(X ) = [l, u]p
Finding convex hull is hard, even for polyhedral X .
Convex hull important for MILP ...

Theorem
MILP can be solved as LP over the convex hull of feasible set.
MILP ≠ MINLP
Important difference between MINLP and MILP
minimize_x  ∑_{i=1}^n (xi − ½)²,  subject to  xi ∈ {0, 1}

... solution is not extreme point (lies in interior)


Remedy: Introduce objective η and a constraint η ≥ f(x)

minimize_{η,x}  η,
subject to      f(x) ≤ η,
                c(x) ≤ 0,
                x ∈ X,
                xi ∈ Z, ∀i ∈ I.

[Figure: relaxation solution (x̂1, x̂2) in the (x1, x2)-plane]

Assume wlog that MINLP objective is linear
Outline

1 Problem, Notation, and Definitions

2 Basic Building Blocks of MINLP Methods

3 MINLP Modeling Practices

4 Nonlinear Branch-and-Bound

5 Multitree Methods for MINLP

6 Single-Tree Methods for MINLP


Relaxation and Constraint Enforcement

Relaxation
Used to compute a lower bound on the optimum
Obtained by enlarging feasible set; e.g. ignore constraints
Typically much easier to solve than MINLP

Constraint Enforcement
Exclude solutions from relaxations not feasible in MINLP
Refine or tighten the relaxation; e.g. add valid inequalities

Upper Bounds
Obtained from any feasible point; e.g. solve NLP for fixed xI
Relaxations of Integrality

Definition (Relaxation)
Optimization problem min{f˘(x) : x ∈ R} is a relaxation of
min{f (x) : x ∈ F}, iff R ⊃ F and f˘(x) ≤ f (x) for all x ∈ F.

Goal: relaxation easy to solve globally, e.g. MILP or NLP

Relaxing Integrality
Relax Integrality xi ∈ Z to xi ∈ R for all i ∈ I
Gives nonlinear relaxation of MINLP, an NLP:

minimize_x  f(x),
subject to  c(x) ≤ 0,
            x ∈ X  (continuous)

Used in branch-and-bound algorithms


Relaxations of Nonlinear Convex Constraints
Relaxing Convex Constraints
Convex 0 ≥ c(x) and η ≥ f(x) relaxed by supporting hyperplanes

η ≥ f^(k) + ∇f^(k)ᵀ (x − x^(k))
0 ≥ c^(k) + ∇c^(k)ᵀ (x − x^(k))

for a set of points x^(k), k = 1, . . . , K.
Obtain polyhedral relaxation of convex constraints.
Used in the outer approximation methods.
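
To make the construction concrete, here is a minimal Python sketch (not from the original slides) that builds such supporting-hyperplane cuts; the constraint c and its gradient are hypothetical examples:

import numpy as np

def oa_cut(c, grad_c, x_k):
    """Return (a, b) with a @ x + b <= 0 encoding the cut
    0 >= c(x_k) + grad_c(x_k)^T (x - x_k)."""
    g = grad_c(x_k)
    return g, c(x_k) - g @ x_k

# Hypothetical convex constraint c(x) = x1^2 + x2^2 - 4 <= 0
c = lambda x: x[0]**2 + x[1]**2 - 4.0
grad_c = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

for x_k in (np.array([2.0, 0.0]), np.array([1.0, 1.0])):
    a, b = oa_cut(c, grad_c, x_k)
    print(f"cut: {a} @ x + {b:.2f} <= 0")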
Relaxations of Nonconvex Constraints
Relaxing Nonconvex Constraints
Construct convex underestimators f˘(x) and c̆(x) for the
nonconvex functions f(x) and c(x):

f˘(x) ≤ f(x) and c̆(x) ≤ c(x), ∀x ∈ conv(X).

Relax constraints z ≥ f (x) and 0 ≥ c(x) as

z ≥ f˘(x) and 0 ≥ c̆(x).

Used in spatial branch-and-bound.


Relaxations Summary

Nonlinear and polyhedral relaxation


Relaxations

Relaxations can be combined to produce better algorithms


Relax convex underestimators via supporting hyperplanes.
Relax integrality of polyhedral relaxation to obtain an LP.

Relaxations are useful because we have following result:


Theorem
If the solution of the relaxation of the η-MINLP is feasible in the
η-MINLP, then it solves the MINLP.

... but if solution of relaxation is not feasible, then need ...


Constraint Enforcement

Goal: Given solution of relaxation, x̂, not feasible in MINLP,


exclude it from further consideration to ensure convergence

Three constraint enforcement strategies


1 Relaxation refinement: tighten the relaxation
2 Branching: disjunction to exclude set of non-integer points
3 Spatial branching: divide region into sub-regions

Strategies can be combined ...


Constraint Enforcement: Refinement
Tighten the relaxation to remove current solution x̂ of relaxation
Add a valid inequality to relaxation, i.e. an inequality that is
satisfied by all feasible solutions of MINLP
Valid inequality is called a cut if it excludes x̂
Example: c(x) ≤ 0 convex, and ∃i : ci(x̂) > 0, then

0 ≥ ĉi + ∇ĉiᵀ (x − x̂)

cuts off x̂.
Used in Benders decomposition and outer approximation.
MILP: cuts are basis for branch-and-cut techniques.

Constraint Enforcement: Branching
Eliminate current solution x̂ by branching on integer variables:
1 Select fractional x̂i for some i ∈ I
2 Create two new relaxations by adding

xi ≤ ⌊x̂i⌋ and xi ≥ ⌈x̂i⌉, respectively

... solution to MINLP lies in one of the new relaxations.

... creates branch-and-bound tree
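
A minimal sketch of this branching step in Python (not from the original slides; bounds stored as lists, data hypothetical):

import math

def branch(x_hat, i, l, u):
    """Split node (l, u) on fractional x_hat[i]: the down child adds
    x_i <= floor(x_hat[i]), the up child adds x_i >= ceil(x_hat[i])."""
    l_down, u_down = l.copy(), u.copy()
    l_up, u_up = l.copy(), u.copy()
    u_down[i] = math.floor(x_hat[i])
    l_up[i] = math.ceil(x_hat[i])
    return (l_down, u_down), (l_up, u_up)

down, up = branch([0.0, 2.4], 1, [0, 0], [10, 10])
print(down, up)  # ([0, 0], [10, 2]) ([0, 3], [10, 10])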


Branch-and-Bound Trees can be Huge

Tree after 360 s CPU time has more than 10,000 nodes
Constraint Enforcement: Spatial Branching
Enforcement for relaxed nonconvex constraints
Combine branching and relaxation refinement
Branch on continuous variable and split domain in two parts.
Create new relaxation over (reduced) sub-domains.
Generates tree similar to integer branching.
Mix with interval techniques to eliminate sub-domains.
Nonconvex MINLPs combine all 3 enforcement techniques.
Outline

1 Problem, Notation, and Definitions

2 Basic Building Blocks of MINLP Methods

3 MINLP Modeling Practices

4 Nonlinear Branch-and-Bound

5 Multitree Methods for MINLP

6 Single-Tree Methods for MINLP


MINLP Modeling Practices

Modeling plays a fundamental role in MILP, see [Williams, 1999]


... even more important in MINLP
MINLP combines integer and nonlinear formulations
Reformulations of nonlinear relationships can be convex
Interactions of nonlinear functions and binary variables
Sometimes we can linearize expressions

MINLP Modeling Preference


We prefer linear over convex over nonconvex formulations.
Convexification of Binary Quadratic Programs

Consider pure binary quadratic function

q(x) = xᵀQx + gᵀx, where x ∈ {0, 1}^p

Let λ be smallest eigenvalue of Q

If λ ≥ 0 then q(x) is convex

Convexification of Binary Quadratics


Let W := Q − λI and c := g + λe, where e = (1, . . . , 1);
then q(x) = xᵀWx + cᵀx on {0, 1}^p (since xi² = xi), and W ⪰ 0, so q is convex.
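
As an illustration (not from the original slides), a minimal Python sketch of this eigenvalue shift; Q and g are hypothetical data:

import numpy as np

def convexify_binary_quadratic(Q, g):
    """Eigenvalue shift: W = Q - lam*I is positive semidefinite and,
    since x_i^2 = x_i on {0,1}^p, x^T W x + (g + lam*e)^T x agrees
    with x^T Q x + g^T x at every binary point."""
    lam = np.linalg.eigvalsh(Q).min()
    W = Q - lam * np.eye(Q.shape[0])
    c = g + lam * np.ones(Q.shape[0])
    return W, c

Q = np.array([[1.0, 3.0], [3.0, 1.0]])   # hypothetical indefinite Q
g = np.array([1.0, -1.0])
W, c = convexify_binary_quadratic(Q, g)
x = np.array([1.0, 0.0])                 # binary point: both forms agree
assert np.isclose(x @ Q @ x + g @ x, x @ W @ x + c @ x)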
Exploiting Low-Rank Hessians

Consider (convex) quadratic function

q(x) = xᵀWx + gᵀx,

where x is a mixture of variables, and W is dense with structure:


W = Zᵀ R⁻¹ Z low rank, e.g. estimation problems
R ∈ R^{m×m} nonsingular (covariance matrix)
Z ∈ R^{m×n}, where m ≪ n and Z is sparse.
Then introduce variables z and constraints

z = Zx, and write xᵀWx = zᵀR⁻¹z

... QP/NLP solvers can exploit sparsity of Z .
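
A quick numerical check of this identity (not from the original slides; all data hypothetical):

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8                              # m << n
Z = rng.standard_normal((m, n))          # sparse in practice
R = np.eye(m) + 0.1 * np.ones((m, m))    # nonsingular "covariance"
x = rng.standard_normal(n)

W = Z.T @ np.linalg.inv(R) @ Z           # dense n-by-n Hessian
z = Z @ x                                # new variables z = Z x
assert np.isclose(x @ W @ x, z @ np.linalg.solve(R, z))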


Linearization of Constraints

Assume x2 ≠ 0. A simple transformation (a is a constant parameter):

x1/x2 = a  ⇔  x1 = a·x2

Linearization of bilinear terms x1 x2 with:


Binary variable x2 ∈ {0, 1}
Variable upper bound: 0 ≤ x1 ≤ Ux2
... introduce new variable x12 to replace x1 x2 and add constraints

0 ≤ x12 ≤ U·x2 and −U(1 − x2) ≤ x1 − x12 ≤ U(1 − x2).
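
A minimal sanity check of these constraints in Python (not from the original slides; U and the test points are hypothetical):

U = 10.0

def satisfies(x1, x2, x12):
    """Linear constraints replacing x12 = x1 * x2 for binary x2."""
    return (0 <= x12 <= U * x2
            and -U * (1 - x2) <= x1 - x12 <= U * (1 - x2))

# x12 = x1 * x2 is always feasible; any other value is cut off:
assert satisfies(3.7, 1, 3.7) and not satisfies(3.7, 1, 0.0)
assert satisfies(0.0, 0, 0.0) and not satisfies(0.0, 0, 1.0)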


Never Multiply a Nonlinear Function by a Binary

Previous example generalizes to nonlinear functions


Often binary variables “switch” constraints on/off

Warning
Never model on/off constraints by multiplying by a binary variable.

Three alternative approaches


Disjunctive programming, see [Grossmann and Lee, 2003]
Perspective formulations (not always), see
[Günlük and Linderoth, 2012]
Big-M formulation (weak relaxations)
Avoiding Undefined Nonlinear Expressions
MINLP solvers can fail because the NLP solver hits an IEEE exception, e.g.

c(x1) = − ln(sin(x1)) ≤ 0,

cannot be evaluated when sin(x1) ≤ 0

Reformulate equivalently as

c̃(x2 ) = − ln(x2 ) ≤ 0, x2 = sin(x1 ), and x2 ≥ 0.

IPM solvers never evaluate at x2 ≤ 0


Active-set methods can also safeguard against x2 ≤ 0
x2 ≥ 0 is a simple bound, which can be enforced exactly
If x2 = 0 gives an IEEE exception ⇒ trap & reduce trust-region
As x2 → 0, the constraint violation c(x2 ) → ∞
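
A small Python demonstration of the failure and the reformulated evaluation (not from the original slides; the bound eps is a hypothetical tolerance):

import math

eps = 1e-8
x1 = 3.5                          # sin(3.5) < 0

try:
    -math.log(math.sin(x1))      # original form: domain error
except ValueError:
    print("original form fails: log of a negative number")

x2 = max(math.sin(x1), eps)       # simple bound x2 >= eps enforced exactly
print("reformulated value:", -math.log(x2))   # large but well defined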
Variable Transformations
Design of multiproduct batch plant includes nonconvex terms
∑_{j∈M} αj Nj Vj^{βj};   Ci Nj ≥ τij;   ∑_{i∈N} ψi Ci / Bi ≤ γ

where variables are upper case, parameters are Greek letters.

Introduce log-transform variables

vj = ln(Vj ), nj = ln(Nj ), bi = ln(Bi ), ci = ln(Ci ).

Transformed expressions are convex:


∑_{j∈M} αj e^{nj + βj vj},   ci + nj ≥ ln(τij),   ∑_{i∈N} ψi e^{ci − bi} ≤ γ
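
A minimal numeric check that the log-transform preserves each term (not from the original slides; parameters are hypothetical):

import math

alpha, beta = 2.0, 0.6            # hypothetical parameters
N, V = 3.0, 5.0                   # a feasible point

n, v = math.log(N), math.log(V)   # log-transformed variables
assert math.isclose(alpha * N * V**beta,             # original term
                    alpha * math.exp(n + beta * v))  # convex exp form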
Design of Water Distribution Networks

Model of water, gas, air networks


Goal: design minimum cost network from discrete pipe diameters

N nodes in network
S source nodes
A: arcs in the network
Design of Water Distribution Networks

Goal: design minimum cost network from discrete pipe diameters


N nodes, S source nodes, A: arcs in the network
Variables:
qij : flow in pipe (i, j) ∈ A
dij : diameter of pipe (i, j) ∈ A, where dij ∈ {P1 , . . . , Pr }
hi : hydraulic head at node i ∈ N
zij : binary variables model flow direction (i, j) ∈ A
aij : area of cross section (i, j) ∈ A
yijk : SOS-1 variables to model diameter
NB: aij = π dij²/4 is redundant ... but useful!
Design of Water Distribution Networks

N nodes, S source nodes, A: arcs in the network


Equations for qij, the flow in pipe (i, j) ∈ A
Conservation of flow at every node:

∑_{(i,j)∈A} qij − ∑_{(j,i)∈A} qji = Di,  ∀i ∈ N − S.

Flow bounds are linear in aij ... nonlinear in dij:

−Vmax aij ≤ qij ≤ Vmax aij,  ∀(i, j) ∈ A.


Design of Water Distribution Networks

Modeling Trick: SOS & Nonlinear Expressions


Modeling discrete dij ∈ {P1 , . . . , Pr } and nonlinear aij = π dij²/4:
1 Introduce SOS-1 variables yijk ∈ {0, 1} for k = 1, . . . , r
2 Model discrete choice as
∑_{k=1}^r yijk = 1,  and  ∑_{k=1}^r Pk yijk = dij,  ∀(i, j) ∈ A,

3 Model nonlinear relationship as

∑_{k=1}^r (π Pk²/4) yijk = aij,  ∀(i, j) ∈ A.

⇒ no longer need aij = π dij²/4!
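
A minimal Python check of this trick (not from the original slides; diameters P are hypothetical data):

import math

P = [0.1, 0.15, 0.2]              # hypothetical pipe diameters

for k in range(len(P)):
    y = [int(j == k) for j in range(len(P))]       # SOS-1 selection
    d = sum(Pk * yk for Pk, yk in zip(P, y))       # sum Pk*yk = d
    a = sum(math.pi * Pk**2 / 4 * yk for Pk, yk in zip(P, y))
    assert math.isclose(a, math.pi * d**2 / 4)     # area recovered linearly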


Design of Water Distribution Networks
Nonsmooth pressure loss model along arc (i, j) ∈ A

hi − hj = sgn(qij) |qij|^{c1} · c2 Lij Kij^{−c1} / dij^{c3}

... introduce binary variables to model nonsmooth term |qij|^{c1}


1 Add binary variables zij ∈ {0, 1}
2 Split the flow into nonnegative parts:

0 ≤ qij⁺ ≤ Qmax zij,  0 ≤ qij⁻ ≤ Qmax (1 − zij),  qij = qij⁺ − qij⁻.

3 Pressure drop becomes

hi − hj = [ (qij⁺)^{c1} − (qij⁻)^{c1} ] · c2 Lij Kij^{−c1} / dij^{c3},  ∀(i, j) ∈ A.

... can again linearize the dij^{c3} expression with SOS


... alternative uses complementarity
Other MINLP Applications
MINLP
minimize_x  f(x)
subject to  c(x) ≤ 0
            x ∈ X
            xi ∈ Z for all i ∈ I

Applications:
reactor core reload operation
power grid operation & design
buildings co-generation
optimal oil-spill response
gas transmission networks
Application: Distillation Column Design

Mixed Integer Nonlinear Program (MINLP)

minimize_x  f(x)  subject to  c(x) ≤ 0,  x ∈ X,  xi ∈ Z ∀ i ∈ I

Small process design example:


synthesis of distillation column
nonlinear physics: phase equilibrium,
component material balance
integers model number of trays in columns
xI ∈ {0, 1}p models position of feeds

Process network design for fossil power plants ...


Collections of MINLP Test Problems

AMPL Collections of MINLP Test Problems


1 MacMINLP www.mcs.anl.gov/~leyffer/macminlp/
2 IBM/CMU collection egon.cheme.cmu.edu/ibm/page.htm

GAMS Collections of MINLP Test Problems


1 GAMS MINLP-world www.gamsworld.org/minlp/
2 MINLP CyberInfrastructure www.minlp.org/index.php

Solve MINLPs online on the NEOS server,


www.neos-server.org/neos/
Mixed-Integer Nonlinear Optimization
Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)
subject to  c(x) ≤ 0
            x ∈ X
            xi ∈ Z for all i ∈ I

Assumptions
A1 X is a bounded polyhedral set.
A2 f and c are twice continuously differentiable convex
functions.
A3 MINLP satisfies a constraint qualification.

A2 (convexity) most restrictive (relaxed next week);


A3 is technical (MFCQ would have been sufficient);
Overview of Basic Methods
Two broad classes of methods
1 Single-tree methods; e.g.

Nonlinear branch-and-bound
LP/NLP-based branch-and-bound
Nonlinear branch-and-cut
... build and search a single tree
2 Multi-tree methods; e.g.
Outer approximation
Benders decomposition
Extended cutting plane method
... alternate between NLP and MILP solves
Multi-tree methods only evaluate functions at integer points

Concentrate on methods for convex problems today.

Can mix different methods & techniques.


Nonlinear Branch-and-Bound
Solve NLP relaxation (xI continuous, not integer)

minimize_x  f(x)  subject to  c(x) ≤ 0,  x ∈ X

If xi ∈ Z ∀ i ∈ I, then we have solved the MINLP


If relaxation is infeasible, then MINLP infeasible
... otherwise search tree whose nodes are NLPs:

(NLP(l, u))   minimize_x   f(x),
              subject to   c(x) ≤ 0,
                           x ∈ X,
                           li ≤ xi ≤ ui, ∀i ∈ I.

NLP relaxation is NLP(−∞, ∞)


Nonlinear Branch-and-Bound

Branching: solution x′ of (NLP(l, u)) feasible but not integral:

Find a nonintegral variable, say x′i, i ∈ I.
Introduce two child nodes with bounds
(l⁻, u⁻) = (l⁺, u⁺) = (l, u) and setting:

ui⁻ := ⌊x′i⌋,  and  li⁺ := ⌈x′i⌉

Two new NLPs: NLP(l⁻, u⁻) / NLP(l⁺, u⁺)
... corresponding to down/up branch
In practice, store problems on a heap H

... pruning rules limit the tree ⇒ no complete enumeration


Nonlinear Branch-and-Bound

Pruning Rules: Let U upper bound on solution


Infeasible: (NLP(l, u)) infeasible
⇒ any NLP in subtree is also infeasible.
Integer feasible: solution x^(l,u) of (NLP(l, u)) integral
If f(x^(l,u)) < U, then new x* = x^(l,u) and U = f(x^(l,u)).
Otherwise, prune node: no better solution in subtree
Dominated by U: optimal value of (NLP(l, u)), f(x^(l,u)) ≥ U
⇒ prune node: no better integer solution in subtree
Nonlinear Branch-and-Bound
Solve relaxed NLP (0 ≤ y ≤ 1 continuous relaxation)
... solution value provides lower bound

Branch on yi non-integral
Solve NLPs & branch until:
1 Node infeasible
2 Node integer feasible ⇒ get upper bound (U)
3 Lower bound ≥ U
Search until no unexplored nodes
Software:
GAMS-SBB, MINLPBB [L]
BARON [Sahinidis] global
Couenne [Belotti] global
Nonlinear Branch-and-Bound

Branch-and-bound for MINLP


Choose tol ε > 0, set U = ∞, add (NLP(−∞, ∞)) to heap H.
while H ≠ ∅ do
  Remove (NLP(l, u)) from heap: H = H − {NLP(l, u)}.
  Solve (NLP(l, u)) ⇒ solution x^(l,u)
  if (NLP(l, u)) is infeasible then
    Prune node: infeasible
  else if f(x^(l,u)) > U then
    Prune node: dominated by bound U
  else if x_I^(l,u) integral then
    Update incumbent: U = f(x^(l,u)), x* = x^(l,u).
  else
    BranchOnVariable(x_i^(l,u), l, u, H)
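
A minimal Python skeleton of this loop (not from the original slides), assuming a hypothetical black-box solve_nlp(l, u) that returns (feasible, x, f):

import math

def branch_and_bound(solve_nlp, I, l0, u0, eps=1e-6):
    """Depth-first nonlinear BnB skeleton. solve_nlp(l, u) ->
    (feasible, x, f) is a hypothetical black-box solver for NLP(l, u)."""
    U, x_star = math.inf, None
    stack = [(l0, u0)]
    while stack:
        l, u = stack.pop()                    # deepest node first
        feasible, x, f = solve_nlp(l, u)
        if not feasible:
            continue                          # prune: infeasible
        if f >= U:
            continue                          # prune: dominated by bound U
        frac = [i for i in I if abs(x[i] - round(x[i])) > eps]
        if not frac:
            U, x_star = f, x                  # update incumbent
            continue
        i = frac[0]                           # branching variable
        u_down, l_up = u.copy(), l.copy()
        u_down[i] = math.floor(x[i])          # down branch: x_i <= floor
        l_up[i] = math.ceil(x[i])             # up branch:   x_i >= ceil
        stack.append((l, u_down))
        stack.append((l_up, u))
    return U, x_star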
Nonlinear Branch-and-Bound

BnB is finite, provided X is a bounded polyhedron:

Theorem (Finiteness of Nonlinear BnB)


Solve MINLP by nonlinear branch-and-bound, and assume that
A1-A3 hold. Then BnB terminates at optimal solution (or
indication of infeasibility) after a finite number of nodes.

Proof.
(A1-A3) ⇒ every NLP solved globally
Boundedness of X ⇒ tree is finite
⇒ convergence, see e.g. Theorem 24.1 of [Schrijver, 1986]. 
Nonlinear Branch-and-Bound

BnB trees can get pretty large ...

Synthesis MINLP B&B Tree: 10000+ nodes after 360s

... be smart about solving NLPs & searching tree!


Outline

1 Problem Definition and Assumptions

2 Nonlinear Branch-and-Bound

3 Advanced Nonlinear Branch-and-Bound

4 Multi-Tree Methods

5 Summary and Exercises


Advanced Nonlinear BnB

Basic BnB will work, but needs improvements:


Selection of branching variables
Node selection strategies
Inexact NLP solves & hot-starts
Cutting planes & branch-and-cut
Software design & modern solvers, e.g. MINOTAUR
... critical for efficient implementation
Advanced Nonlinear BnB: Variable Selection

Ideally choose branching sequence to minimize tree size


... impossible in practice; sequence not known a priori
⇒ choose variable that maximizes increase in lower bound

Let Ic ⊂ I set of fractional integer variables


... in practice choose subset of important variables (priorities)

Maximum Fractional Branching


Branch on variable i0 with largest integer violation:

i0 = argmax_{i∈Ic} min( xi − ⌊xi⌋, ⌈xi⌉ − xi ),

... as bad as random branching [Achterberg et al., 2004]


Advanced Nonlinear BnB: Variable Selection
Successful rules estimate change in lower bound after branching
Increasing lower bound improves pruning
For xi , i ∈ I , define degradation estimates Di+ and Di−
for increase in lower bound
Goal: make both Di+ and Di− large!
Combine Di+ and Di− into single score:

si := µ min(Di+ , Di− ) + (1 − µ) max(Di+ , Di− ),

where parameter µ ∈ [0, 1] close to 1.

Degradation-Based Branching
Branch on variable i0 with the largest score:

i0 = argmax_{i∈Ic} si

... methods differ by how Di+ and Di− computed


Advanced Nonlinear BnB: Variable Selection

The first approach for computing degradations is ...


Strong Branching
Solve 2 × |Ic | NLPs for every potential child node:
Solution at current (parent) node (NLP(l, u)) is fp := f (l,u)
∀ xi , i ∈ Ic create two temporary NLPs:
NLPi (l − , u − ) and NLPi (l + , u + )
Solve both NLPs ...
... if both infeasible, then prune (NLP(l, u))
... if one infeasible, then fix integer in parent (NLP(l, u))
... otherwise, let solutions be fi + and fi − and compute

Di+ = fi + − fp , and Di− = fi − − fp .


Advanced Nonlinear BnB: Variable Selection
Advantage/Disadvantage of strong branching:
Good: Reduce the number of nodes in tree
Bad: Slow overall, because too many NLPs solved
Solving NLPs approximately does not help

Fact: MILP ≠ MINLP


LPs hot-start efficiently (re-use basis factors),
but NLPs cannot be warm-started (neither IPM nor SQP)!

Reason (NLPs are, well ... nonlinear):


NLP methods are iterative: generate sequence {x (k) }
At solution, x (l) , have factors from x (l−1) ... out-of-date
Advanced Nonlinear BnB: Variable Selection
Pseudocost Branching
Keep history of past branching to estimate degradations
ni+ , ni− number of times up/down node solved for variable i
pi⁺, pi⁻ pseudocosts updated when child solved:

pi⁺ ← pi⁺ + (fi⁺ − fp)/(⌈xi⌉ − xi),  ni⁺ ← ni⁺ + 1   (similarly for pi⁻, ni⁻)

Compute estimates of Di⁺ and Di⁻ for branching:

Di⁺ = (⌈xi⌉ − xi)·pi⁺/ni⁺   and   Di⁻ = (xi − ⌊xi⌋)·pi⁻/ni⁻.

Initialize pseudocosts with strong branching


Good estimates for MILP, [Linderoth and Savelsbergh, 1999]
Not clear how to update if NLP infeasible ... ℓ1 penalty?
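
A minimal sketch of this bookkeeping in Python (not from the original slides; the container and its method names are hypothetical):

import math

class Pseudocost:
    """Per-variable pseudocost bookkeeping; update is called only
    for fractional x_i, so the gap is nonzero."""
    def __init__(self):
        self.p = {'+': 0.0, '-': 0.0}
        self.n = {'+': 0, '-': 0}

    def _gap(self, d, x_i):
        return math.ceil(x_i) - x_i if d == '+' else x_i - math.floor(x_i)

    def update(self, d, f_child, f_parent, x_i):
        self.p[d] += (f_child - f_parent) / self._gap(d, x_i)
        self.n[d] += 1

    def degradation(self, d, x_i):
        return self._gap(d, x_i) * self.p[d] / max(self.n[d], 1)

def score(D_up, D_down, mu=0.9):
    """Combine the two degradation estimates into one branching score."""
    return mu * min(D_up, D_down) + (1 - mu) * max(D_up, D_down)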
Advanced Nonlinear BnB: Variable Selection
Following approach combines strong branching and pseudocosts
Reliability Branching
Strong branching early, then pseudocost branching
While ni+ or ni− ≤ τ (= 5) do strong branching on xi
Once ni+ or ni− > τ switch to pseudocost

Important alternatives to variables branching:


SOS branching, see [Beale and Tomlin, 1970]
Branching on split disjunctions

(aᵀ xI ≤ b) ∨ (aᵀ xI ≥ b + 1)

where a ∈ Z^p and b ∈ Z ... conceptually like conjugate directions
Advanced Nonlinear BnB: Node Selection

Strategic decision on which node to solve next.

Goals of node selection


Find good feasible solution quickly to reduce upper bound, U
Prove optimality of incumbent x ∗ by increasing lower bound

Popular strategies:
1 Depth-first search
2 Best-bound search
3 Hybrid schemes
Advanced Nonlinear BnB: Depth-First Search

Depth-First Search
Select deepest node in tree (or last node added to heap H)

Advantages:
Easy to implement (Sven likes that ;-)
Keeps list of open nodes, H, as small as possible
Minimizes the change to next NLP (NLP(l, u)):
... only single bound changes ⇒ better hot-starts

Disadvantages:
poor performance if no upper bound is found:
⇒ explores nodes with a lower bound larger than solution
Advanced Nonlinear BnB: Best-Bound Search

Best-Bound Search
Select node with best lower bound

Advantages:
Minimizes number of nodes for fixed sequence of branching
decisions, because all explored nodes would have been
explored independent of upper bound

Disadvantages:
Requires more memory to store open problems
Less opportunity for warm-starts of NLPs
Tends to find integer solutions at the end
Advanced Nonlinear BnB: Best-Bound Search

1 Best Expected Bound: node with best bound after branching:

bp⁺ = fp + (⌈xi⌉ − xi)·pi⁺/ni⁺   and   bp⁻ = fp + (xi − ⌊xi⌋)·pi⁻/ni⁻.

Next node is max_p { min(bp⁺, bp⁻) }.

2 Best Estimate: node with best expected solution in subtree:

ep = fp + ∑_{i: xi fractional} min( (⌈xi⌉ − xi)·pi⁺/ni⁺, (xi − ⌊xi⌋)·pi⁻/ni⁻ ),

Next node is max_p { ep }.

... good search strategies combine depth-first and best-bound


Advanced Nonlinear BnB: Inexact NLP Solves

Role for inexact solves in MINLP


Provide approximate values for strong branching
Solve NLPs inexactly during tree-search:
[Borchers and Mitchell, 1994] consider single SQP iteration
... perform early branching if limit seems non-integral
... augmented Lagrangian dual for bounds
[Leyffer, 2001] considers single SQP iteration
... use outer approximation instead of dual
... numerical results disappointing
... reduce solve time by factor 2-3 at best
New idea: search QP tree & exploit hot-starts for QPs
... QP-diving discussed next ...
MINLP Trees are Huge

Synthesis MINLP B&B Tree: 10000+ nodes after 360s

⇒ use MILP solvers to search tree?


Multi-Tree Methods
MILP solvers much better developed than MINLP
LPs are easy to hot-start
Decades of investment into software
MILPs much easier; e.g. no need for constraint qualifications
⇒ developed methods that exploit this technology

Multi-Tree Methods
Outer approximation [Duran and Grossmann, 1986]
Benders decomposition [Geoffrion, 1972]
Extended cutting plane method
[Westerlund and Pettersson, 1995]
... solve a sequence of MILP (and NLP) problems

Multi-tree methods evaluate functions “only” at integer points!


Multi-Tree Methods

Recall the η-MINLP formulation

minimize_{η,x}  η,
subject to      f(x) ≤ η,
                c(x) ≤ 0,
                x ∈ X,
                xi ∈ Z, ∀i ∈ I.

where we have “linearized” the objective: η ≥ f (x)

Use η-MINLP in this section


Outer Approximation
Outer Approximation

Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)  subject to  c(x) ≤ 0,  x ∈ X,  xi ∈ Z ∀ i ∈ I

NLP subproblem for fixed integers x_I^(j):

(NLP(x_I^(j)))   minimize_x   f(x)
                 subject to   c(x) ≤ 0
                              x ∈ X and x_I = x_I^(j),

with solution x^(j).

If (NLP(x_I^(j))) is infeasible, then solve feasibility problem ...
Outer Approximation
Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)  subject to  c(x) ≤ 0,  x ∈ X,  xi ∈ Z ∀ i ∈ I

NLP feasibility problem for fixed integers x_I^(j):

(F(x_I^(j)))   minimize_x   ∑_{i∈J⊥} wi ci⁺(x)
               subject to   ci(x) ≤ 0, i ∈ J
                            x ∈ X and x_I = x_I^(j),

where wi > 0 are weights and solution is x^(j).

(F(x_I^(j))) generalizes the minimum norm solution
... provides certificate that (NLP(x_I^(j))) is infeasible
Outer Approximation

Convexity of f and c implies that

Lemma (Supporting Hyperplane)
Linearization about solution x^(j) of (NLP(x_I^(j))) or (F(x_I^(j))),

(OA)   η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j))   and   0 ≥ c^(j) + ∇c^(j)ᵀ(x − x^(j)),

are outer approximations of the feasible set of η-MINLP.

Lemma (Feasibility Cuts)
If (NLP(x_I^(j))) is infeasible, then (OA) cuts off x_I = x_I^(j).
Outer Approximation
Mixed-Integer Nonlinear Program (η-MINLP)

minimize_{η,x}  η  s.t.  η ≥ f(x),  c(x) ≤ 0,  x ∈ X,  xi ∈ Z ∀ i ∈ I

Define index set of all possible feasible integers, X:

X := { x^(j) ∈ X : x^(j) solves (NLP(x_I^(j))) or (F(x_I^(j))) }.

... boundedness of X implies |X| < ∞


Construct equivalent OA-MILP (outer approximation MILP):

minimize_{η,x}  η,
subject to      η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j)),  ∀x^(j) ∈ X
                0 ≥ c^(j) + ∇c^(j)ᵀ(x − x^(j)),  ∀x^(j) ∈ X
                x ∈ X,
                xi ∈ Z, ∀i ∈ I.

Outer Approximation

Theorem (Equivalence of OA-MILP and MINLP)


Let assumptions A1-A3 hold
If x ∗ solves MINLP, then it also solves OA-MILP
If (η ∗ , x ∗ ) solves OA-MILP, then η ∗ is optimal value of
MINLP, and xI∗ is an optimal integer.

MILP and MINLP are not quite equivalent


Example where OA-MILP is not equivalent to MINLP:

minimize_x  x3  subject to  (x1 − ½)² + x2² + x3³ ≤ 1,  x1 ∈ Z ∩ [−1, 2].

... OA-MILP has no coefficients for x2 ... undefined


Outer Approximation Algorithm
Solving OA-MILP clearly not sensible; define upper bound as

U^k := min_{j≤k} { f^(j) : (NLP(x_I^(j))) is feasible }.

Define relaxation of OA-MILP, using X^k ⊂ X, with X^0 = {0}:

(M(X^k))   minimize_{η,x}  η,
           subject to      η ≤ U^k − ε
                           η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j)),  ∀x^(j) ∈ X^k
                           0 ≥ c^(j) + ∇c^(j)ᵀ(x − x^(j)),  ∀x^(j) ∈ X^k
                           x ∈ X,
                           xi ∈ Z, ∀i ∈ I.

... build up better OA X^k iteratively for k = 0, 1, . . .

Outer Approximation Algorithm

Outer approximation
Given x^(0), choose tol ε > 0, set U^(−1) = ∞, set k = 0, and X^(−1) = ∅.
repeat
  Solve (NLP(x_I^(j))) or (F(x_I^(j))); solution x^(j).
  if (NLP(x_I^(j))) feasible & f^(j) < U^(k−1) then
    Update best point: x* = x^(j) and U^k = f^(j).
  else
    Set U^k = U^(k−1).
  Linearize f and c about x^(j) and set X^k = X^(k−1) ∪ {j}.
  Solve (M(X^k)), let solution be x^(k+1) & set k = k + 1.
until MILP (M(X^k)) is infeasible
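
A minimal Python sketch of this alternation (not from the original slides), assuming hypothetical black-box solvers solve_nlp(x_I) -> (feasible, x, f) for the NLP subproblem (or feasibility problem) and solve_milp(cuts, bound) -> (feasible, x) for the master (M(X^k)):

import math

def outer_approximation(solve_nlp, solve_milp, x0_I, eps=1e-6):
    """Outer approximation loop sketch over hypothetical solvers."""
    U, x_star, cuts = math.inf, None, []
    x_I = x0_I
    while True:
        feasible, x, f = solve_nlp(x_I)
        if feasible and f < U:
            U, x_star = f, x          # new incumbent (upper bound)
        cuts.append(x)                # linearize f, c about x^(j)
        ok, x_master = solve_milp(cuts, U - eps)
        if not ok:                    # master infeasible: optimum found
            return U, x_star
        x_I = x_master                # integer part drives next NLP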

Outer Approximation Algorithm
Alternate between solving NLP(x_I^(j)) subproblems and MILP relaxations (M(X^k))

MILP ⇒ lower bound; NLP ⇒ upper bound


... convergence follows from convexity & finiteness
Outer Approximation Algorithm

Theorem (Convergence of Outer Approximation)


Let Assumptions A1-A3 hold, then outer approximation terminates
finitely at optimal solution of MINLP or indicates it is infeasible.

Outline of Proof.
Optimality of x^(j) in (NLP(x_I^(j)))
⇒ η ≥ f^(j) for feasible points of (M(X^k))
... ensures finiteness, since X compact
Convexity ⇒ linearizations are supporting hyperplanes
... ensures optimality
Worst Case Example of Outer Approximation
[Hijazi et al., 2010] construct infeasible MINLP:

minimize_y  0
subject to  ∑_{i=1}^n (yi − ½)² ≤ (n − 1)/4
            y ∈ {0, 1}^n

Intersection of ball of radius √(n−1)/2 with unit hypercube.

Lemma
OA cannot cut more than one vertex of the hypercube
MILP master problem feasible for any k < 2^n OA cuts

Theorem
OA visits all 2^n vertices
Benders Decomposition
Benders Decomposition
Can derive Benders cut from outer approximation:
Take optimal multipliers λ^(j) of (NLP(x_I^(j)))
Sum the outer approximations η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j)) and
0 ≥ c^(j) + ∇c^(j)ᵀ(x − x^(j)), the latter weighted by λ^(j) ≥ 0:

η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j)) + λ^(j)ᵀ( c^(j) + ∇c^(j)ᵀ(x − x^(j)) )

⇒ η ≥ f^(j) + ∇_I L^(j)ᵀ(x_I − x_I^(j))

Using KKT conditions wrt continuous variables x_C:

0 = ∇_C L^(j) = ∇_C f^(j) + ∇_C c^(j) λ^(j)   &   λ^(j)ᵀ c^(j) = 0

... eliminates continuous variables x_C
Benders cut only involves integer variables x_I.
Can write cut as η ≥ f^(j) + µ^(j)ᵀ(x_I − x_I^(j)),
where µ^(j) is the multiplier of x_I = x_I^(j) in (NLP(x_I^(j)))
Benders Decomposition

For MINLPs with convex problem functions f, c, we can show:


1 Benders cuts are weaker than outer approximation

Benders cuts are linear combinations of OA cuts

2 Outer Approximation & Benders converge finitely
Functions f, c convex ⇒ OA cuts are outer approximations
OA cut derived at optimal solution to NLP subproblem
⇒ no feasible descent directions exist
... every OA cut corresponds to a first-order condition
Cannot visit same integer x_I^(j) more than once
⇒ terminate finitely at optimal solution

Readily extended to situations where (NLP(x_I^(j))) is not feasible.
Extended Cutting Plane (ECP) Method

ECP is variation of OA
Does not solve any NLPs
Linearize f , c around solution of MILP, x (k) :
If x (k) feasible in linearization, then solved MINLP
Otherwise, pick linearization violated by x (k) and add to MILP

Properties of ECP
Convergence follows from OA & Kelley’s cutting plane method
NLP convergence rate is linear
Can visit same integer more than once ...
... single-tree methods use ECP cuts to speed up convergence
Summary of Multi-Tree Methods
Three Classes of Multi-Tree Methods
1 Outer approximation based on first-order expansion
2 Benders decomposition linear combination of OA cuts
3 Extended cutting plane method: avoids NLP solves

Common Properties of Multi-Tree Methods


Only need to solve final MILP to optimality
... can terminate MILP early ... adding more NLPs
Can add cuts from incomplete NLP solves
Worst-case example for OA also applies for Benders and ECP
No warm-starts for MILP ... expensive tree-search

... motivates single-tree methods next ...


Single-Tree Methods

Goal: perform only a single MILP tree-search per MINLP


Branch-and-Bound is a single-tree method
... but can be too expensive per node
Avoid re-solving MILP master for OA, Benders, and ECP
... instead update master (MILP) data
Can be interpreted as branch-and-cut approach
... but cuts are very simple
Solve MILP with full set of linearizations X and apply delayed
constraint generation to the "formulation constraints" X^k ⊂ X.
At integer points, separate cuts by solving an NLP
... basis for state-of-the-art convex MINLP solvers
LP/NLP-Based Branch-and-Bound
Aim: avoid solving expensive MILPs

Form MILP outer approximation
Take initial MILP tree
Interrupt MILP, when new integral x_I^(j) found
⇒ solve NLP(x_I^(j)) to get x^(j)
Linearize f, c about x^(j)
⇒ add linearization to tree
Continue MILP tree-search

... until lower bound ≥ upper bound
Software:
FilMINT: FilterSQP + MINTO [L & Linderoth]
BONMIN: IPOPT + CBC [IBM/CMU] also BB, OA
LP/NLP-Based Branch-and-Bound
Algorithmic refinements, e.g. [Abhishek et al., 2010]
Advanced MILP search and cut management techniques
... remove “old” OA cuts from LP relaxation ⇒ faster LP
Generate cuts at non-integer points: ECP cuts are cheap
... generate cuts early (near root) of tree
Strong branching, adaptive node selection & cut management
Fewer nodes, if we add more cuts (e.g. ECP cuts)
More cuts make LP harder to solve
⇒ remove outdated/inactive cuts from LP relaxation
... balance OA accuracy with LP solvability
Compressing OA cuts into Benders cuts can be OK

Interpret as hybrid algorithm, [Bonami et al., 2008]

Benders and ECP versions are also possible.


Thanks for your attention!

Next part: Cutting Planes for Convex MINLPs


by Henrik

Slides adapted, modified and extended based on the material from the Graduate School in Systems, Optimization, Control and Networks at Université
catholique de Louvain, February 2013
