
IØ8400 - Mathematical Programming:

Mixed-Integer Nonlinear Optimization

Frederik Schulze Spüntrup

Norwegian University of Science and Technology


Department of Engineering Cybernetics

May 2018
Outline

1 Problem, Notation, and Definitions

2 Basic Building Blocks of MINLP Methods

3 MINLP Modeling Practices

4 Nonlinear Branch-and-Bound

5 Multitree Methods for MINLP

6 Single-Tree Methods for MINLP


Mixed-Integer Nonlinear Optimization
Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)
subject to  c(x) ≤ 0
            x ∈ X
            xi ∈ Z for all i ∈ I

f : Rn → R, c : Rn → Rm smooth (often convex) functions


X ⊂ Rn bounded, polyhedral set, e.g. X = {x : l ≤ Aᵀx ≤ u}
I ⊂ {1, . . . , n} subset of integer variables
xi ∈ Z for all i ∈ I ... combinatorial problem
Combines challenges of handling nonlinearities
with combinatorial explosion of integer variables
More general constraints possible, e.g. l ≤ c(x) ≤ u etc.
Complexity of MINLP
Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)
subject to  c(x) ≤ 0
            x ∈ X
            xi ∈ Z for all i ∈ I

Complexity of MINLP
MINLP is NP-hard: it includes MILP, which is NP-hard
[Kannan and Monma, 1978]
Worse: MINLP is undecidable [Jeroslow, 1973]: there exists a
quadratically constrained IP for which no computing device
can compute the optimum for all problems in this class
... but we’re OK if X is compact!
Notation

Some notation used throughout the course ...


f (k) = f (x (k) ) evaluated at x = x (k)
∇f (k) = ∇f (x (k) ) gradient
∇²L^(k) Hessian of Lagrangian L(x, λ) = f(x) − ∑_i λi ci(x)
... assumes X polyhedral
Subscripts denote components, e.g. xi is component i of x
If J ⊂ {1, . . . , n} then xJ are components of x corres. to J
xI are the integer and xC the continuous variables, p = |I|
Floor and ceiling operators ⌊xi⌋ and ⌈xi⌉:
⌊xi⌋ largest integer smaller than or equal to xi
⌈xi⌉ smallest integer larger than or equal to xi
Convexity of Nonlinear Functions

MINLP techniques distinguish convex and nonconvex MINLPs.


For our purposes, we define convexity as ...
Definition
A function f : Rn → R is convex, iff ∀x (0) , x (1) ∈ Rn we have:

f (x (1) ) ≥ f (x (0) ) + (x (1) − x (0) )T ∇f (0)
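
As a quick numerical illustration (not part of the original slides), a minimal Python sketch that checks this first-order inequality for the hypothetical convex function f(x) = ‖x‖²:

import numpy as np

# Check f(x1) >= f(x0) + (x1 - x0)^T grad_f(x0) at random point pairs
# for the hypothetical convex function f(x) = ||x||^2.
f = lambda x: x @ x
grad_f = lambda x: 2 * x

rng = np.random.default_rng(1)
for _ in range(100):
    x0, x1 = rng.standard_normal(3), rng.standard_normal(3)
    assert f(x1) >= f(x0) + (x1 - x0) @ grad_f(x0) - 1e-12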

In a slight abuse of notation, we say that ...


Definition
MINLP is convex if the problem functions f (x) and c(x) are
convex functions. If either f (x) or any ci (x) is a nonconvex
function, then MINLP is nonconvex.
Convexity (cont.)
We also define the convex hull of a set S as ...
Definition
For a set S, the convex hull of S is conv(S):

conv(S) := { x : x = λx^(1) + (1 − λ)x^(0), ∀ 0 ≤ λ ≤ 1, ∀ x^(0), x^(1) ∈ S }.

If X = {x ∈ Zp : l ≤ x ≤ u} and l ∈ Zp , u ∈ Zp ,
then conv(X ) = [l, u]p
Finding convex hull is hard, even for polyhedral X .
Convex hull important for MILP ...

Theorem
MILP can be solved as LP over the convex hull of feasible set.
MILP ≠ MINLP
Important difference between MINLP and MILP
minimize_x  ∑_{i=1}^n (xi − ½)²,  subject to  xi ∈ {0, 1}

... solution is not extreme point (lies in interior)


Remedy: Introduce objective η and a constraint η ≥ f(x)

minimize_{η,x}  η,
subject to      f(x) ≤ η,
                c(x) ≤ 0,
                x ∈ X,
                xi ∈ Z, ∀i ∈ I.

[Figure: relaxation solution (x̂1, x̂2) in the (x1, x2)-plane]

Assume wlog that MINLP objective is linear
Outline

1 Problem, Notation, and Definitions

2 Basic Building Blocks of MINLP Methods

3 MINLP Modeling Practices

4 Nonlinear Branch-and-Bound

5 Multitree Methods for MINLP

6 Single-Tree Methods for MINLP


Relaxation and Constraint Enforcement

Relaxation
Used to compute a lower bound on the optimum
Obtained by enlarging feasible set; e.g. ignore constraints
Typically much easier to solve than MINLP

Constraint Enforcement
Exclude solutions from relaxations not feasible in MINLP
Refine or tighten the relaxation; e.g. add valid inequalities

Upper Bounds
Obtained from any feasible point; e.g. solve NLP for fixed xI
Relaxations of Integrality

Definition (Relaxation)
Optimization problem min{f˘(x) : x ∈ R} is a relaxation of
min{f (x) : x ∈ F}, iff R ⊃ F and f˘(x) ≤ f (x) for all x ∈ F.

Goal: relaxation easy to solve globally, e.g. MILP or NLP

Relaxing Integrality
Relax Integrality xi ∈ Z to xi ∈ R for all i ∈ I
Gives nonlinear relaxation of MINLP, an NLP:

minimize_x  f(x),
subject to  c(x) ≤ 0,
            x ∈ X  (continuous)

Used in branch-and-bound algorithms


Relaxations of Nonlinear Convex Constraints
Relaxing Convex Constraints
Convex 0 ≥ c(x) and η ≥ f(x) relaxed by supporting hyperplanes

η ≥ f^(k) + ∇f^(k)ᵀ (x − x^(k))
0 ≥ c^(k) + ∇c^(k)ᵀ (x − x^(k))

for a set of points x^(k), k = 1, . . . , K.
Obtain polyhedral relaxation of convex constraints.
Used in the outer approximation methods.
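
To make the construction concrete, here is a minimal Python sketch (not from the original slides) that builds such supporting-hyperplane cuts; the constraint c and its gradient are hypothetical examples:

import numpy as np

def oa_cut(c, grad_c, x_k):
    """Return (a, b) with a @ x + b <= 0 encoding the cut
    0 >= c(x_k) + grad_c(x_k)^T (x - x_k)."""
    g = grad_c(x_k)
    return g, c(x_k) - g @ x_k

# Hypothetical convex constraint c(x) = x1^2 + x2^2 - 4 <= 0
c = lambda x: x[0]**2 + x[1]**2 - 4.0
grad_c = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

for x_k in (np.array([2.0, 0.0]), np.array([1.0, 1.0])):
    a, b = oa_cut(c, grad_c, x_k)
    print(f"cut: {a} @ x + {b:.2f} <= 0")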
Relaxations of Nonconvex Constraints
Relaxing Nonconvex Constraints
Construct convex underestimators f˘(x) and c̆(x) for the
nonconvex functions f(x) and c(x):

f˘(x) ≤ f(x) and c̆(x) ≤ c(x), ∀x ∈ conv(X).

Relax constraints z ≥ f (x) and 0 ≥ c(x) as

z ≥ f˘(x) and 0 ≥ c̆(x).

Used in spatial branch-and-bound.


Relaxations Summary

Nonlinear and polyhedral relaxation


Relaxations

Relaxations can be combined to produce better algorithms


Relax convex underestimators via supporting hyperplanes.
Relax integrality of polyhedral relaxation to obtain an LP.

Relaxations are useful because we have following result:


Theorem
If the solution of the relaxation of the η-MINLP is feasible in the
η-MINLP, then it solves the MINLP.

... but if solution of relaxation is not feasible, then need ...


Constraint Enforcement

Goal: Given solution of relaxation, x̂, not feasible in MINLP,


exclude it from further consideration to ensure convergence

Three constraint enforcement strategies


1 Relaxation refinement: tighten the relaxation
2 Branching: disjunction to exclude set of non-integer points
3 Spatial branching: divide region into sub-regions

Strategies can be combined ...


Constraint Enforcement: Refinement
Tighten the relaxation to remove current solution x̂ of relaxation
Add a valid inequality to relaxation, i.e. an inequality that is
satisfied by all feasible solutions of MINLP
Valid inequality is called a cut if it excludes x̂
Example: c(x) ≤ 0 convex, and ∃i : ci(x̂) > 0, then

0 ≥ ĉi + ∇ĉiᵀ (x − x̂)

cuts off x̂.
Used in Benders decomposition and outer approximation.
MILP: cuts are basis for branch-and-cut techniques.

Constraint Enforcement: Branching
Eliminate current solution x̂ by branching on integer variables:
1 Select fractional x̂i for some i ∈ I
2 Create two new relaxations by adding

xi ≤ ⌊x̂i⌋ and xi ≥ ⌈x̂i⌉, respectively

... solution to MINLP lies in one of the new relaxations.

... creates branch-and-bound tree
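
A minimal sketch of this branching step in Python (not from the original slides; bounds stored as lists, data hypothetical):

import math

def branch(x_hat, i, l, u):
    """Split node (l, u) on fractional x_hat[i]: the down child adds
    x_i <= floor(x_hat[i]), the up child adds x_i >= ceil(x_hat[i])."""
    l_down, u_down = l.copy(), u.copy()
    l_up, u_up = l.copy(), u.copy()
    u_down[i] = math.floor(x_hat[i])
    l_up[i] = math.ceil(x_hat[i])
    return (l_down, u_down), (l_up, u_up)

down, up = branch([0.0, 2.4], 1, [0, 0], [10, 10])
print(down, up)  # ([0, 0], [10, 2]) ([0, 3], [10, 10])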


Branch-and-Bound Trees can be Huge

Tree after 360 s CPU time has more than 10,000 nodes
Constraint Enforcement: Spatial Branching
Enforcement for relaxed nonconvex constraints
Combine branching and relaxation refinement
Branch on continuous variable and split domain in two parts.
Create new relaxation over (reduced) sub-domains.
Generates tree similar to integer branching.
Mix with interval techniques to eliminate sub-domains.
Nonconvex MINLPs combine all 3 enforcement techniques.
Outline

1 Problem, Notation, and Definitions

2 Basic Building Blocks of MINLP Methods

3 MINLP Modeling Practices

4 Nonlinear Branch-and-Bound

5 Multitree Methods for MINLP

6 Single-Tree Methods for MINLP


MINLP Modeling Practices

Modeling plays a fundamental role in MILP, see [Williams, 1999]


... even more important in MINLP
MINLP combines integer and nonlinear formulations
Reformulations of nonlinear relationships can be convex
Interactions of nonlinear functions and binary variables
Sometimes we can linearize expressions

MINLP Modeling Preference


We prefer linear over convex over nonconvex formulations.
Convexification of Binary Quadratic Programs

Consider pure binary quadratic function

q(x) = xᵀQx + gᵀx, where x ∈ {0, 1}^p

Let λ be smallest eigenvalue of Q

If λ ≥ 0 then q(x) is convex

Convexification of Binary Quadratics


Let W := Q − λI and c := g + λe, where e = (1, . . . , 1);
then q(x) = xᵀWx + cᵀx on {0, 1}^p (since xi² = xi), and W ⪰ 0, so q is convex.
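
As an illustration (not from the original slides), a minimal Python sketch of this eigenvalue shift; Q and g are hypothetical data:

import numpy as np

def convexify_binary_quadratic(Q, g):
    """Eigenvalue shift: W = Q - lam*I is positive semidefinite and,
    since x_i^2 = x_i on {0,1}^p, x^T W x + (g + lam*e)^T x agrees
    with x^T Q x + g^T x at every binary point."""
    lam = np.linalg.eigvalsh(Q).min()
    W = Q - lam * np.eye(Q.shape[0])
    c = g + lam * np.ones(Q.shape[0])
    return W, c

Q = np.array([[1.0, 3.0], [3.0, 1.0]])   # hypothetical indefinite Q
g = np.array([1.0, -1.0])
W, c = convexify_binary_quadratic(Q, g)
x = np.array([1.0, 0.0])                 # binary point: both forms agree
assert np.isclose(x @ Q @ x + g @ x, x @ W @ x + c @ x)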
Exploiting Low-Rank Hessians

Consider (convex) quadratic function

q(x) = xᵀWx + gᵀx,

where x is a mixture of variables, and W is dense with structure:


W = Zᵀ R⁻¹ Z low rank, e.g. estimation problems
R ∈ R^{m×m} nonsingular (covariance matrix)
Z ∈ R^{m×n}, where m ≪ n and Z is sparse.
Then introduce variables z and constraints

z = Zx, and write xᵀWx = zᵀR⁻¹z

... QP/NLP solvers can exploit sparsity of Z .
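
A quick numerical check of this identity (not from the original slides; all data hypothetical):

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8                              # m << n
Z = rng.standard_normal((m, n))          # sparse in practice
R = np.eye(m) + 0.1 * np.ones((m, m))    # nonsingular "covariance"
x = rng.standard_normal(n)

W = Z.T @ np.linalg.inv(R) @ Z           # dense n-by-n Hessian
z = Z @ x                                # new variables z = Z x
assert np.isclose(x @ W @ x, z @ np.linalg.solve(R, z))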


Linearization of Constraints

Assume x2 ≠ 0. A simple transformation (a is a constant parameter):

x1/x2 = a  ⇔  x1 = a·x2

Linearization of bilinear terms x1 x2 with:


Binary variable x2 ∈ {0, 1}
Variable upper bound: 0 ≤ x1 ≤ Ux2
... introduce new variable x12 to replace x1 x2 and add constraints

0 ≤ x12 ≤ U·x2 and −U(1 − x2) ≤ x1 − x12 ≤ U(1 − x2).
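
A minimal sanity check of these constraints in Python (not from the original slides; U and the test points are hypothetical):

U = 10.0

def satisfies(x1, x2, x12):
    """Linear constraints replacing x12 = x1 * x2 for binary x2."""
    return (0 <= x12 <= U * x2
            and -U * (1 - x2) <= x1 - x12 <= U * (1 - x2))

# x12 = x1 * x2 is always feasible; any other value is cut off:
assert satisfies(3.7, 1, 3.7) and not satisfies(3.7, 1, 0.0)
assert satisfies(0.0, 0, 0.0) and not satisfies(0.0, 0, 1.0)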


Never Multiply a Nonlinear Function by a Binary

Previous example generalizes to nonlinear functions


Often binary variables “switch” constraints on/off

Warning
Never model on/off constraints by multiplying by a binary variable.

Three alternative approaches


Disjunctive programming, see [Grossmann and Lee, 2003]
Perspective formulations (not always), see
[Günlük and Linderoth, 2012]
Big-M formulation (weak relaxations)
Avoiding Undefined Nonlinear Expressions
MINLP solvers can fail because the NLP solver hits an IEEE exception, e.g.

c(x1) = − ln(sin(x1)) ≤ 0,

cannot be evaluated when sin(x1) ≤ 0

Reformulate equivalently as

c̃(x2 ) = − ln(x2 ) ≤ 0, x2 = sin(x1 ), and x2 ≥ 0.

IPM solvers never evaluate at x2 ≤ 0


Active-set methods can also safeguard against x2 ≤ 0
x2 ≥ 0 is a simple bound, which can be enforced exactly
If x2 = 0 gives an IEEE exception ⇒ trap & reduce trust-region
As x2 → 0, the constraint violation c(x2 ) → ∞
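
A small Python demonstration of the failure and the reformulated evaluation (not from the original slides; the bound eps is a hypothetical tolerance):

import math

eps = 1e-8
x1 = 3.5                          # sin(3.5) < 0

try:
    -math.log(math.sin(x1))      # original form: domain error
except ValueError:
    print("original form fails: log of a negative number")

x2 = max(math.sin(x1), eps)       # simple bound x2 >= eps enforced exactly
print("reformulated value:", -math.log(x2))   # large but well defined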
Variable Transformations
Design of multiproduct batch plant includes nonconvex terms
∑_{j∈M} αj Nj Vj^{βj};   Ci Nj ≥ τij;   ∑_{i∈N} ψi Ci / Bi ≤ γ

where variables are upper case, parameters are Greek letters.

Introduce log-transform variables

vj = ln(Vj ), nj = ln(Nj ), bi = ln(Bi ), ci = ln(Ci ).

Transformed expressions are convex:


∑_{j∈M} αj e^{nj + βj vj},   ci + nj ≥ ln(τij),   ∑_{i∈N} ψi e^{ci − bi} ≤ γ
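
A minimal numeric check that the log-transform preserves each term (not from the original slides; parameters are hypothetical):

import math

alpha, beta = 2.0, 0.6            # hypothetical parameters
N, V = 3.0, 5.0                   # a feasible point

n, v = math.log(N), math.log(V)   # log-transformed variables
assert math.isclose(alpha * N * V**beta,             # original term
                    alpha * math.exp(n + beta * v))  # convex exp form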
Design of Water Distribution Networks

Model of water, gas, air networks


Goal: design minimum cost network from discrete pipe diameters

N nodes in network
S source nodes
A: arcs in the network
Design of Water Distribution Networks

Goal: design minimum cost network from discrete pipe diameters


N nodes, S source nodes, A: arcs in the network
Variables:
qij : flow in pipe (i, j) ∈ A
dij : diameter of pipe (i, j) ∈ A, where dij ∈ {P1 , . . . , Pr }
hi : hydraulic head at node i ∈ N
zij : binary variables model flow direction (i, j) ∈ A
aij : area of cross section (i, j) ∈ A
yijk : SOS-1 variables to model diameter
NB: aij = π dij²/4 is redundant ... but useful!
Design of Water Distribution Networks

N nodes, S source nodes, A: arcs in the network


Equations for qij, the flow in pipe (i, j) ∈ A
Conservation of flow at every node:

∑_{(i,j)∈A} qij − ∑_{(j,i)∈A} qji = Di,  ∀i ∈ N − S.

Flow bounds are linear in aij ... nonlinear in dij:

−Vmax aij ≤ qij ≤ Vmax aij,  ∀(i, j) ∈ A.


Design of Water Distribution Networks

Modeling Trick: SOS & Nonlinear Expressions


Modeling discrete dij ∈ {P1 , . . . , Pr } and nonlinear aij = π dij²/4:
1 Introduce SOS-1 variables yijk ∈ {0, 1} for k = 1, . . . , r
2 Model discrete choice as
∑_{k=1}^r yijk = 1,  and  ∑_{k=1}^r Pk yijk = dij,  ∀(i, j) ∈ A,

3 Model nonlinear relationship as

∑_{k=1}^r (π Pk²/4) yijk = aij,  ∀(i, j) ∈ A.

⇒ no longer need aij = π dij²/4!
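
A minimal Python check of this trick (not from the original slides; diameters P are hypothetical data):

import math

P = [0.1, 0.15, 0.2]              # hypothetical pipe diameters

for k in range(len(P)):
    y = [int(j == k) for j in range(len(P))]       # SOS-1 selection
    d = sum(Pk * yk for Pk, yk in zip(P, y))       # sum Pk*yk = d
    a = sum(math.pi * Pk**2 / 4 * yk for Pk, yk in zip(P, y))
    assert math.isclose(a, math.pi * d**2 / 4)     # area recovered linearly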


Design of Water Distribution Networks
Nonsmooth pressure loss model along arc (i, j) ∈ A

hi − hj = sgn(qij) |qij|^{c1} · c2 Lij Kij^{−c1} / dij^{c3}

... introduce binary variables to model nonsmooth term |qij|^{c1}


1 Add binary variables zij ∈ {0, 1}
2 Split the flow into nonnegative parts:

0 ≤ qij⁺ ≤ Qmax zij,  0 ≤ qij⁻ ≤ Qmax (1 − zij),  qij = qij⁺ − qij⁻.

3 Pressure drop becomes

hi − hj = [ (qij⁺)^{c1} − (qij⁻)^{c1} ] · c2 Lij Kij^{−c1} / dij^{c3},  ∀(i, j) ∈ A.

... can again linearize the dij^{c3} expression with SOS


... alternative uses complementarity
Other MINLP Applications
MINLP
minimize_x  f(x)
subject to  c(x) ≤ 0
            x ∈ X
            xi ∈ Z for all i ∈ I

Applications:
reactor core reload operation
power grid operation & design
buildings co-generation
optimal oil-spill response
gas transmission networks
Application: Distillation Column Design

Mixed Integer Nonlinear Program (MINLP)

minimize_x  f(x)  subject to  c(x) ≤ 0,  x ∈ X,  xi ∈ Z ∀ i ∈ I

Small process design example:


synthesis of distillation column
nonlinear physics: phase equilibrium,
component material balance
integers model number of trays in columns
xI ∈ {0, 1}p models position of feeds

Process network design for fossil power plants ...


Collections of MINLP Test Problems

AMPL Collections of MINLP Test Problems


1 MacMINLP www.mcs.anl.gov/~leyffer/macminlp/
2 IBM/CMU collection egon.cheme.cmu.edu/ibm/page.htm

GAMS Collections of MINLP Test Problems


1 GAMS MINLP-world www.gamsworld.org/minlp/
2 MINLP CyberInfrastructure www.minlp.org/index.php

Solve MINLPs online on the NEOS server,


www.neos-server.org/neos/
Mixed-Integer Nonlinear Optimization
Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)
subject to  c(x) ≤ 0
            x ∈ X
            xi ∈ Z for all i ∈ I

Assumptions
A1 X is a bounded polyhedral set.
A2 f and c are twice continuously differentiable convex
functions.
A3 MINLP satisfies a constraint qualification.

A2 (convexity) most restrictive (relaxed next week);


A3 is technical (MFCQ would have been sufficient);
Overview of Basic Methods
Two broad classes of methods
1 Single-tree methods; e.g.

Nonlinear branch-and-bound
LP/NLP-based branch-and-bound
Nonlinear branch-and-cut
... build and search a single tree
2 Multi-tree methods; e.g.
Outer approximation
Benders decomposition
Extended cutting plane method
... alternate between NLP and MILP solves
Multi-tree methods only evaluate functions at integer points

Concentrate on methods for convex problems today.

Can mix different methods & techniques.


Nonlinear Branch-and-Bound
Solve NLP relaxation (xI continuous, not integer)

minimize_x  f(x)  subject to  c(x) ≤ 0,  x ∈ X

If xi ∈ Z ∀ i ∈ I, then we have solved the MINLP


If relaxation is infeasible, then MINLP infeasible
... otherwise search tree whose nodes are NLPs:

(NLP(l, u))   minimize_x   f(x),
              subject to   c(x) ≤ 0,
                           x ∈ X,
                           li ≤ xi ≤ ui, ∀i ∈ I.

NLP relaxation is NLP(−∞, ∞)


Nonlinear Branch-and-Bound

Branching: solution x′ of (NLP(l, u)) feasible but not integral:

Find a nonintegral variable, say x′i, i ∈ I.
Introduce two child nodes with bounds
(l⁻, u⁻) = (l⁺, u⁺) = (l, u) and setting:

ui⁻ := ⌊x′i⌋,  and  li⁺ := ⌈x′i⌉

Two new NLPs: NLP(l⁻, u⁻) / NLP(l⁺, u⁺)
... corresponding to down/up branch
In practice, store problems on a heap H

... pruning rules limit the tree ⇒ no complete enumeration


Nonlinear Branch-and-Bound

Pruning Rules: Let U upper bound on solution


Infeasible: (NLP(l, u)) infeasible
⇒ any NLP in subtree is also infeasible.
Integer feasible: solution x^(l,u) of (NLP(l, u)) integral
If f(x^(l,u)) < U, then new x* = x^(l,u) and U = f(x^(l,u)).
Otherwise, prune node: no better solution in subtree
Dominated by U: optimal value of (NLP(l, u)), f(x^(l,u)) ≥ U
⇒ prune node: no better integer solution in subtree
Nonlinear Branch-and-Bound
Solve relaxed NLP (0 ≤ y ≤ 1 continuous relaxation)
... solution value provides lower bound

Branch on yi non-integral
Solve NLPs & branch until:
1 Node infeasible
2 Node integer feasible ⇒ get upper bound (U)
3 Lower bound ≥ U
Search until no unexplored nodes
Software:
GAMS-SBB, MINLPBB [L]
BARON [Sahinidis] global
Couenne [Belotti] global
Nonlinear Branch-and-Bound

Branch-and-bound for MINLP


Choose tol ε > 0, set U = ∞, add (NLP(−∞, ∞)) to heap H.
while H ≠ ∅ do
  Remove (NLP(l, u)) from heap: H = H − {NLP(l, u)}.
  Solve (NLP(l, u)) ⇒ solution x^(l,u)
  if (NLP(l, u)) is infeasible then
    Prune node: infeasible
  else if f(x^(l,u)) > U then
    Prune node: dominated by bound U
  else if x_I^(l,u) integral then
    Update incumbent: U = f(x^(l,u)), x* = x^(l,u).
  else
    BranchOnVariable(x_i^(l,u), l, u, H)
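
A minimal Python skeleton of this loop (not from the original slides), assuming a hypothetical black-box solve_nlp(l, u) that returns (feasible, x, f):

import math

def branch_and_bound(solve_nlp, I, l0, u0, eps=1e-6):
    """Depth-first nonlinear BnB skeleton. solve_nlp(l, u) ->
    (feasible, x, f) is a hypothetical black-box solver for NLP(l, u)."""
    U, x_star = math.inf, None
    stack = [(l0, u0)]
    while stack:
        l, u = stack.pop()                    # deepest node first
        feasible, x, f = solve_nlp(l, u)
        if not feasible:
            continue                          # prune: infeasible
        if f >= U:
            continue                          # prune: dominated by bound U
        frac = [i for i in I if abs(x[i] - round(x[i])) > eps]
        if not frac:
            U, x_star = f, x                  # update incumbent
            continue
        i = frac[0]                           # branching variable
        u_down, l_up = u.copy(), l.copy()
        u_down[i] = math.floor(x[i])          # down branch: x_i <= floor
        l_up[i] = math.ceil(x[i])             # up branch:   x_i >= ceil
        stack.append((l, u_down))
        stack.append((l_up, u))
    return U, x_star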
Nonlinear Branch-and-Bound

BnB is finite, provided X is a bounded polyhedron:

Theorem (Finiteness of Nonlinear BnB)


Solve MINLP by nonlinear branch-and-bound, and assume that
A1-A3 hold. Then BnB terminates at optimal solution (or
indication of infeasibility) after a finite number of nodes.

Proof.
(A1-A3) ⇒ every NLP solved globally
Boundedness of X ⇒ tree is finite
⇒ convergence, see e.g. Theorem 24.1 of [Schrijver, 1986]. 
Nonlinear Branch-and-Bound

BnB trees can get pretty large ...

Synthesis MINLP B&B Tree: 10000+ nodes after 360s

... be smart about solving NLPs & searching tree!


Outline

1 Problem Definition and Assumptions

2 Nonlinear Branch-and-Bound

3 Advanced Nonlinear Branch-and-Bound

4 Multi-Tree Methods

5 Summary and Exercises


Advanced Nonlinear BnB

Basic BnB will work, but needs improvements:


Selection of branching variables
Node selection strategies
Inexact NLP solves & hot-starts
Cutting planes & branch-and-cut
Software design & modern solvers, e.g. MINOTAUR
... critical for efficient implementation
Advanced Nonlinear BnB: Variable Selection

Ideally choose branching sequence to minimize tree size


... impossible in practice; sequence not known a priori
⇒ choose variable that maximizes increase in lower bound

Let Ic ⊂ I set of fractional integer variables


... in practice choose subset of important variables (priorities)

Maximum Fractional Branching


Branch on variable i0 with largest integer violation:

i0 = argmax_{i∈Ic} min( xi − ⌊xi⌋, ⌈xi⌉ − xi ),

... as bad as random branching [Achterberg et al., 2004]


Advanced Nonlinear BnB: Variable Selection
Successful rules estimate change in lower bound after branching
Increasing lower bound improves pruning
For xi , i ∈ I , define degradation estimates Di+ and Di−
for increase in lower bound
Goal: make both Di+ and Di− large!
Combine Di+ and Di− into single score:

si := µ min(Di+ , Di− ) + (1 − µ) max(Di+ , Di− ),

where parameter µ ∈ [0, 1] close to 1.

Degradation-Based Branching
Branch on variable i0 with the largest score:

i0 = argmax_{i∈Ic} si

... methods differ by how Di+ and Di− computed


Advanced Nonlinear BnB: Variable Selection

The first approach for computing degradations is ...


Strong Branching
Solve 2 × |Ic | NLPs for every potential child node:
Solution at current (parent) node (NLP(l, u)) is fp := f (l,u)
∀ xi , i ∈ Ic create two temporary NLPs:
NLPi (l − , u − ) and NLPi (l + , u + )
Solve both NLPs ...
... if both infeasible, then prune (NLP(l, u))
... if one infeasible, then fix integer in parent (NLP(l, u))
... otherwise, let solutions be fi + and fi − and compute

Di+ = fi + − fp , and Di− = fi − − fp .


Advanced Nonlinear BnB: Variable Selection
Advantage/Disadvantage of strong branching:
Good: Reduce the number of nodes in tree
Bad: Slow overall, because too many NLPs solved
Solving NLPs approximately does not help

Fact: MILP ≠ MINLP


LPs hot-start efficiently (re-use basis factors),
but NLPs cannot be warm-started (neither IPM nor SQP)!

Reason (NLPs are, well ... nonlinear):


NLP methods are iterative: generate sequence {x (k) }
At solution, x (l) , have factors from x (l−1) ... out-of-date
Advanced Nonlinear BnB: Variable Selection
Pseudocost Branching
Keep history of past branching to estimate degradations
ni+ , ni− number of times up/down node solved for variable i
pi⁺, pi⁻ pseudocosts updated when child solved:

pi⁺ ← pi⁺ + (fi⁺ − fp)/(⌈xi⌉ − xi),  ni⁺ ← ni⁺ + 1   (similarly for pi⁻, ni⁻)

Compute estimates of Di⁺ and Di⁻ for branching:

Di⁺ = (⌈xi⌉ − xi)·pi⁺/ni⁺   and   Di⁻ = (xi − ⌊xi⌋)·pi⁻/ni⁻.

Initialize pseudocosts with strong branching


Good estimates for MILP, [Linderoth and Savelsbergh, 1999]
Not clear how to update if NLP infeasible ... ℓ1 penalty?
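
A minimal sketch of this bookkeeping in Python (not from the original slides; the container and its method names are hypothetical):

import math

class Pseudocost:
    """Per-variable pseudocost bookkeeping; update is called only
    for fractional x_i, so the gap is nonzero."""
    def __init__(self):
        self.p = {'+': 0.0, '-': 0.0}
        self.n = {'+': 0, '-': 0}

    def _gap(self, d, x_i):
        return math.ceil(x_i) - x_i if d == '+' else x_i - math.floor(x_i)

    def update(self, d, f_child, f_parent, x_i):
        self.p[d] += (f_child - f_parent) / self._gap(d, x_i)
        self.n[d] += 1

    def degradation(self, d, x_i):
        return self._gap(d, x_i) * self.p[d] / max(self.n[d], 1)

def score(D_up, D_down, mu=0.9):
    """Combine the two degradation estimates into one branching score."""
    return mu * min(D_up, D_down) + (1 - mu) * max(D_up, D_down)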
Advanced Nonlinear BnB: Variable Selection
Following approach combines strong branching and pseudocosts
Reliability Branching
Strong branching early, then pseudocost branching
While ni+ or ni− ≤ τ (= 5) do strong branching on xi
Once ni+ or ni− > τ switch to pseudocost

Important alternatives to variables branching:


SOS branching, see [Beale and Tomlin, 1970]
Branching on split disjunctions

(aᵀ xI ≤ b) ∨ (aᵀ xI ≥ b + 1)

where a ∈ Z^p and b ∈ Z ... conceptually like conjugate directions
Advanced Nonlinear BnB: Node Selection

Strategic decision on which node to solve next.

Goals of node selection


Find good feasible solution quickly to reduce upper bound, U
Prove optimality of incumbent x ∗ by increasing lower bound

Popular strategies:
1 Depth-first search
2 Best-bound search
3 Hybrid schemes
Advanced Nonlinear BnB: Depth-First Search

Depth-First Search
Select deepest node in tree (or last node added to heap H)

Advantages:
Easy to implement (Sven likes that ;-)
Keeps list of open nodes, H, as small as possible
Minimizes the change to next NLP (NLP(l, u)):
... only single bound changes ⇒ better hot-starts

Disadvantages:
poor performance if no upper bound is found:
⇒ explores nodes with a lower bound larger than solution
Advanced Nonlinear BnB: Best-Bound Search

Best-Bound Search
Select node with best lower bound

Advantages:
Minimizes number of nodes for fixed sequence of branching
decisions, because all explored nodes would have been
explored independent of upper bound

Disadvantages:
Requires more memory to store open problems
Less opportunity for warm-starts of NLPs
Tends to find integer solutions at the end
Advanced Nonlinear BnB: Best-Bound Search

1 Best Expected Bound: node with best bound after branching:

bp⁺ = fp + (⌈xi⌉ − xi)·pi⁺/ni⁺   and   bp⁻ = fp + (xi − ⌊xi⌋)·pi⁻/ni⁻.

Next node is max_p { min(bp⁺, bp⁻) }.

2 Best Estimate: node with best expected solution in subtree:

ep = fp + ∑_{i: xi fractional} min( (⌈xi⌉ − xi)·pi⁺/ni⁺, (xi − ⌊xi⌋)·pi⁻/ni⁻ ),

Next node is max_p { ep }.

... good search strategies combine depth-first and best-bound


Advanced Nonlinear BnB: Inexact NLP Solves

Role for inexact solves in MINLP


Provide approximate values for strong branching
Solve NLPs inexactly during tree-search:
[Borchers and Mitchell, 1994] consider single SQP iteration
... perform early branching if limit seems non-integral
... augmented Lagrangian dual for bounds
[Leyffer, 2001] considers single SQP iteration
... use outer approximation instead of dual
... numerical results disappointing
... reduce solve time by factor 2-3 at best
New idea: search QP tree & exploit hot-starts for QPs
... QP-diving discussed next ...
MINLP Trees are Huge

Synthesis MINLP B&B Tree: 10000+ nodes after 360s

⇒ use MILP solvers to search tree?


Multi-Tree Methods
MILP solvers much better developed than MINLP
LPs are easy to hot-start
Decades of investment into software
MILPs much easier; e.g. no need for constraint qualifications
⇒ developed methods that exploit this technology

Multi-Tree Methods
Outer approximation [Duran and Grossmann, 1986]
Benders decomposition [Geoffrion, 1972]
Extended cutting plane method
[Westerlund and Pettersson, 1995]
... solve a sequence of MILP (and NLP) problems

Multi-tree methods evaluate functions “only” at integer points!


Multi-Tree Methods

Recall the η-MINLP formulation

minimize_{η,x}  η,
subject to      f(x) ≤ η,
                c(x) ≤ 0,
                x ∈ X,
                xi ∈ Z, ∀i ∈ I.

where we have “linearized” the objective: η ≥ f (x)

Use η-MINLP in this section


Outer Approximation
Outer Approximation

Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)  subject to  c(x) ≤ 0,  x ∈ X,  xi ∈ Z ∀ i ∈ I

NLP subproblem for fixed integers x_I^(j):

(NLP(x_I^(j)))   minimize_x   f(x)
                 subject to   c(x) ≤ 0
                              x ∈ X and x_I = x_I^(j),

with solution x^(j).

If (NLP(x_I^(j))) is infeasible, then solve feasibility problem ...
Outer Approximation
Mixed-Integer Nonlinear Program (MINLP)

minimize_x  f(x)  subject to  c(x) ≤ 0,  x ∈ X,  xi ∈ Z ∀ i ∈ I

NLP feasibility problem for fixed integers x_I^(j):

(F(x_I^(j)))   minimize_x   ∑_{i∈J⊥} wi ci⁺(x)
               subject to   ci(x) ≤ 0, i ∈ J
                            x ∈ X and x_I = x_I^(j),

where wi > 0 are weights and solution is x^(j).

(F(x_I^(j))) generalizes the minimum norm solution
... provides certificate that (NLP(x_I^(j))) is infeasible
Outer Approximation

Convexity of f and c implies that

Lemma (Supporting Hyperplane)
Linearization about solution x^(j) of (NLP(x_I^(j))) or (F(x_I^(j))),

(OA)   η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j))   and   0 ≥ c^(j) + ∇c^(j)ᵀ(x − x^(j)),

are outer approximations of the feasible set of η-MINLP.

Lemma (Feasibility Cuts)
If (NLP(x_I^(j))) is infeasible, then (OA) cuts off x_I = x_I^(j).
Outer Approximation
Mixed-Integer Nonlinear Program (η-MINLP)

minimize_{η,x}  η  s.t.  η ≥ f(x),  c(x) ≤ 0,  x ∈ X,  xi ∈ Z ∀ i ∈ I

Define index set of all possible feasible integers, X:

X := { x^(j) ∈ X : x^(j) solves (NLP(x_I^(j))) or (F(x_I^(j))) }.

... boundedness of X implies |X| < ∞


Construct equivalent OA-MILP (outer approximation MILP):

minimize_{η,x}  η,
subject to      η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j)),  ∀x^(j) ∈ X
                0 ≥ c^(j) + ∇c^(j)ᵀ(x − x^(j)),  ∀x^(j) ∈ X
                x ∈ X,
                xi ∈ Z, ∀i ∈ I.

Outer Approximation

Theorem (Equivalence of OA-MILP and MINLP)


Let assumptions A1-A3 hold
If x ∗ solves MINLP, then it also solves OA-MILP
If (η ∗ , x ∗ ) solves OA-MILP, then η ∗ is optimal value of
MINLP, and xI∗ is an optimal integer.

MILP and MINLP are not quite equivalent


Example where OA-MILP is not equivalent to MINLP:

minimize_x  x3  subject to  (x1 − ½)² + x2² + x3³ ≤ 1,  x1 ∈ Z ∩ [−1, 2].

... OA-MILP has no coefficients for x2 ... undefined


Outer Approximation Algorithm
Solving OA-MILP clearly not sensible; define upper bound as

U^k := min_{j≤k} { f^(j) : (NLP(x_I^(j))) is feasible }.

Define relaxation of OA-MILP, using X^k ⊂ X, with X^0 = {0}:

(M(X^k))   minimize_{η,x}  η,
           subject to      η ≤ U^k − ε
                           η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j)),  ∀x^(j) ∈ X^k
                           0 ≥ c^(j) + ∇c^(j)ᵀ(x − x^(j)),  ∀x^(j) ∈ X^k
                           x ∈ X,
                           xi ∈ Z, ∀i ∈ I.

... build up better OA X^k iteratively for k = 0, 1, . . .

Outer Approximation Algorithm

Outer approximation
Given x^(0), choose tol ε > 0, set U^(−1) = ∞, set k = 0, and X^(−1) = ∅.
repeat
  Solve (NLP(x_I^(j))) or (F(x_I^(j))); solution x^(j).
  if (NLP(x_I^(j))) feasible & f^(j) < U^(k−1) then
    Update best point: x* = x^(j) and U^k = f^(j).
  else
    Set U^k = U^(k−1).
  Linearize f and c about x^(j) and set X^k = X^(k−1) ∪ {j}.
  Solve (M(X^k)), let solution be x^(k+1) & set k = k + 1.
until MILP (M(X^k)) is infeasible
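
A minimal Python sketch of this alternation (not from the original slides), assuming hypothetical black-box solvers solve_nlp(x_I) -> (feasible, x, f) for the NLP subproblem (or feasibility problem) and solve_milp(cuts, bound) -> (feasible, x) for the master (M(X^k)):

import math

def outer_approximation(solve_nlp, solve_milp, x0_I, eps=1e-6):
    """Outer approximation loop sketch over hypothetical solvers."""
    U, x_star, cuts = math.inf, None, []
    x_I = x0_I
    while True:
        feasible, x, f = solve_nlp(x_I)
        if feasible and f < U:
            U, x_star = f, x          # new incumbent (upper bound)
        cuts.append(x)                # linearize f, c about x^(j)
        ok, x_master = solve_milp(cuts, U - eps)
        if not ok:                    # master infeasible: optimum found
            return U, x_star
        x_I = x_master                # integer part drives next NLP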

Outer Approximation Algorithm
Alternate between solving NLP(x_I^(j)) subproblems and MILP relaxations (M(X^k))

MILP ⇒ lower bound; NLP ⇒ upper bound


... convergence follows from convexity & finiteness
Outer Approximation Algorithm

Theorem (Convergence of Outer Approximation)


Let Assumptions A1-A3 hold, then outer approximation terminates
finitely at optimal solution of MINLP or indicates it is infeasible.

Outline of Proof.
Optimality of x^(j) in (NLP(x_I^(j)))
⇒ η ≥ f^(j) for feasible points of (M(X^k))
... ensures finiteness, since X compact
Convexity ⇒ linearizations are supporting hyperplanes
... ensures optimality
Worst Case Example of Outer Approximation
[Hijazi et al., 2010] construct infeasible MINLP:

minimize_y  0
subject to  ∑_{i=1}^n (yi − ½)² ≤ (n − 1)/4
            y ∈ {0, 1}^n

Intersection of ball of radius √(n−1)/2 with unit hypercube.

Lemma
OA cannot cut more than one vertex of the hypercube
MILP master problem feasible for any k < 2^n OA cuts

Theorem
OA visits all 2^n vertices
Benders Decomposition
Benders Decomposition
Can derive Benders cut from outer approximation:
Take optimal multipliers λ^(j) of (NLP(x_I^(j)))
Sum the outer approximations η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j)) and
0 ≥ c^(j) + ∇c^(j)ᵀ(x − x^(j)), the latter weighted by λ^(j) ≥ 0:

η ≥ f^(j) + ∇f^(j)ᵀ(x − x^(j)) + λ^(j)ᵀ( c^(j) + ∇c^(j)ᵀ(x − x^(j)) )

⇒ η ≥ f^(j) + ∇_I L^(j)ᵀ(x_I − x_I^(j))

Using KKT conditions wrt continuous variables x_C:

0 = ∇_C L^(j) = ∇_C f^(j) + ∇_C c^(j) λ^(j)   &   λ^(j)ᵀ c^(j) = 0

... eliminates continuous variables x_C
Benders cut only involves integer variables x_I.
Can write cut as η ≥ f^(j) + µ^(j)ᵀ(x_I − x_I^(j)),
where µ^(j) is the multiplier of x_I = x_I^(j) in (NLP(x_I^(j)))
Benders Decomposition

For MINLPs with convex problem functions f, c, we can show:


1 Benders cuts are weaker than outer approximation

Benders cuts are linear combinations of OA cuts

2 Outer Approximation & Benders converge finitely
Functions f, c convex ⇒ OA cuts are outer approximations
OA cut derived at optimal solution to NLP subproblem
⇒ no feasible descent directions exist
... every OA cut corresponds to a first-order condition
Cannot visit same integer x_I^(j) more than once
⇒ terminate finitely at optimal solution

Readily extended to situations where (NLP(x_I^(j))) is not feasible.
Extended Cutting Plane (ECP) Method

ECP is variation of OA
Does not solve any NLPs
Linearize f , c around solution of MILP, x (k) :
If x (k) feasible in linearization, then solved MINLP
Otherwise, pick linearization violated by x (k) and add to MILP

Properties of ECP
Convergence follows from OA & Kelley’s cutting plane method
NLP convergence rate is linear
Can visit same integer more than once ...
... single-tree methods use ECP cuts to speed up convergence
Summary of Multi-Tree Methods
Three Classes of Multi-Tree Methods
1 Outer approximation based on first-order expansion
2 Benders decomposition linear combination of OA cuts
3 Extended cutting plane method: avoids NLP solves

Common Properties of Multi-Tree Methods


Only need to solve final MILP to optimality
... can terminate MILP early ... adding more NLPs
Can add cuts from incomplete NLP solves
Worst-case example for OA also applies for Benders and ECP
No warm-starts for MILP ... expensive tree-search

... motivates single-tree methods next ...


Single-Tree Methods

Goal: perform only a single MILP tree-search per MINLP


Branch-and-Bound is a single-tree method
... but can be too expensive per node
Avoid re-solving MILP master for OA, Benders, and ECP
... instead update master (MILP) data
Can be interpreted as branch-and-cut approach
... but cuts are very simple
Solve MILP with full set of linearizations X and apply delayed
constraint generation to the "formulation constraints" X^k ⊂ X.
At integer points, separate cuts by solving an NLP
... basis for state-of-the-art convex MINLP solvers
LP/NLP-Based Branch-and-Bound
Aim: avoid solving expensive MILPs

Form MILP outer approximation
Take initial MILP tree
Interrupt MILP, when new integral x_I^(j) found
⇒ solve NLP(x_I^(j)) to get x^(j)
Linearize f, c about x^(j)
⇒ add linearization to tree
Continue MILP tree-search

... until lower bound ≥ upper bound
Software:
FilMINT: FilterSQP + MINTO [L & Linderoth]
BONMIN: IPOPT + CBC [IBM/CMU] also BB, OA
LP/NLP-Based Branch-and-Bound
Algorithmic refinements, e.g. [Abhishek et al., 2010]
Advanced MILP search and cut management techniques
... remove “old” OA cuts from LP relaxation ⇒ faster LP
Generate cuts at non-integer points: ECP cuts are cheap
... generate cuts early (near root) of tree
Strong branching, adaptive node selection & cut management
Fewer nodes, if we add more cuts (e.g. ECP cuts)
More cuts make LP harder to solve
⇒ remove outdated/inactive cuts from LP relaxation
... balance OA accuracy with LP solvability
Compressing OA cuts into Benders cuts can be OK

Interpret as hybrid algorithm, [Bonami et al., 2008]

Benders and ECP versions are also possible.


Thanks for your attention!

Next part: Cutting Planes for Convex MINLPs


by Henrik

Slides adapted, modified and extended based on the material from the Graduate School in Systems, Optimization, Control and Networks at Université
catholique de Louvain, February 2013
