
Discrete Optimization

MA3233 Course Notes


William J. Martin III
Mathematical Sciences
Worcester Polytechnic Institute
November 30, 2012
© 2010 William J. Martin III
all rights reserved
Contents
Contents i
1 Basic Graph Theory 2
1.1 Start at the beginning . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Coloring and Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Factors in graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 The Menagerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Trees and the Greedy Algorithm 17
2.1 The greedy algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Prim's Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 The Menagerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3 Basic Search Trees 26
3.1 Generic Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Breadth-first and depth-first search . . . . . . . . . . . . . . . . . . . 27
3.3 The Menagerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 Shortest Path Problems 33
4.1 The Landscape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Dijkstra's algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Proof of correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4 Other algorithms for shortest paths . . . . . . . . . . . . . . . . . . . 38
4.5 The Menagerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5 Linear Programming 45
5.1 LP problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 Shortest path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3 LP algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.4 LP duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.5 The Menagerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6 NP-coNP Predicates 55
6.1 Polynomial time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Non-deterministic polynomial time . . . . . . . . . . . . . . . . . . . 58
6.3 The big conjectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.5 NP-Complete and NP-hard problems . . . . . . . . . . . . . . . . . . 63
6.6 Landau notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.7 The Menagerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7 Network Flows 69
7.1 Statement of the problem . . . . . . . . . . . . . . . . . . . . . . . . 69
7.2 The Ford-Fulkerson algorithm . . . . . . . . . . . . . . . . . . . . . . 70
7.3 The Max-Flow Min-Cut Theorem . . . . . . . . . . . . . . . . . . . . 73
7.4 The Menagerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
8 Dinic's Algorithm for Network Flows 78
8.1 The Dinic algorithm for maximum flow . . . . . . . . . . . . . . . . . 78
8.2 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8.3 Analysis of the Dinic algorithm . . . . . . . . . . . . . . . . . . . . . 83
8.4 The Menagerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
9 The Minimum Cost Flow Problem 86
9.1 Finding minimum cost flows . . . . . . . . . . . . . . . . . . . . . . . 86
9.2 Linear programming and the Magic Number Theorem . . . . . . . . . 89
9.3 The Menagerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Bibliography 91
Preface
These notes grew out of my teaching of the course MA3233, Discrete Optimization
at Worcester Polytechnic Institute in Fall 2008 and Fall 2010. I am indebted to
the students for helpful comments and corrections on the material included here. In
particular, the 2008 class produced scribe notes (mostly handwritten) on the lectures
in that first delivery of the course.
The notes here are influenced by several sources. In our course, we used the book
of Papadimitriou and Steiglitz as a guide; as a result, the notation used here mostly
follows that book. But our audience is different: rather than graduate students, we
are addressing these notes to second- and third-year undergraduates in mathematics
and related disciplines. And our focus here is only on discrete optimization; linear
programming, non-linear optimization, and basic graph theory are taught in other
courses at WPI and so these subjects are brought into purview only on an as-needed
basis. Finally, an undergraduate course at WPI consists of 28 lectures packed into
seven weeks, with the net effect that homeworks and exams are less conceptual and
more skill-oriented than at comparable universities.
I have benefited over the years from several teachers. In particular, I routinely
consult my personal course notes from C&O 650, taught by Jack Edmonds at the
University of Waterloo in the fall of 1987. I also recycle ideas picked up from Bill
Pulleyblank's offering of C&O 652 in Winter 1988, Rama Murty's lectures on the
matching lattice and discussions of computational complexity theory with various
colleagues, including James Currie, Dan Dougherty, Stan Selkow, and Madhu Sudan.
The notes are typeset using the LaTeX memoir document class. I am grateful
to Bill Farr at Worcester Polytechnic Institute, not only for teaching me about this
class, but also for teaching me about teaching, and for that extra inspiration that
gets a writing project moving.
One
Basic Graph Theory
Oct. 25, 2012
In this course, we consider optimization problems over discrete (usually finite)
spaces. By "space" here, I informally mean a set with some specified structure on
the set, such as an assortment of binary relations and functions on that set and those
relations. A unifying concept for such objects is that of a graph. In this lecture, we
define graphs, directed graphs, and some of the common substructures that we work
with in these graphs in our study of optimization.
1.1 Start at the beginning
An undirected graph is a very intuitive, simple mathematical structure. Since we
shall be dealing with these quite a lot, let's begin by defining them.
Definition 1.1.1. A graph is an ordered pair G = (V, E) where V is a finite set and
E is a finite collection (perhaps with repetition) of unordered pairs from V. The
members of V are called vertices (or nodes) and the members of E are called edges.
Technically, what we have defined here is a finite, undirected graph since we assume
V to be a finite set and the edges are unordered pairs of vertices. While we will have
little use for infinite graphs in this course, we will study directed graphs, which will be
defined below. If the vertex set or edge set of a graph G has not been pre-specified,
it will be convenient to use V (G) and E(G) to denote these sets, respectively.
In Figure 1.1, we consider three small examples of graphs.
Figure 1.1: Three graphs.

The graph on the left in Figure 1.1 has vertex set V(G) = {A, B, C, D} and edge set E(G) = {e_1 = [A, B], e_2 = [A, B], e_3 = [A, C], e_4 = [A, D], e_5 = [B, B], e_6 = [C, D], e_7 = [C, D]}. The center graph, H, has vertex set V(H) = {x_1, x_2, x_3} and edge set E(H) = {[x_1, x_2], [x_2, x_2]}. Graph K, on the right, has vertex set V(K) = {u, v, w, x, y} and edge set E(K) = {[u, w], [v, w], [w, x], [x, y]}.
As seen in these examples, a small graph is often best described by a drawing.
Each vertex is represented by a dot or circle in the plane and each edge is represented
by a continuous path joining the two vertices in it, i.e., the ends or endpoints of edge
e = [u, v] are the vertices u and v. It is important to note that the drawing is intended
to convey no more information than the combinatorial structure of the graph itself:
which vertices are the ends of each edge. The shape of the edge, or the fact that
two edges may cross somewhere other than an endpoint, is irrelevant to the structure
being defined or pictorially described. (Try drawing a graph with five vertices and
ten edges, one for each pair of distinct vertices. Can you do this without making two
edges cross in the middle?) In spite of this potential for confusion, we frequently use
these graph drawings to convey information about graphs and algorithms on them.
Let G = (V, E) be a graph. An edge of the form e = [u, u] is called a loop in G
and a loopless graph is a graph with no loops, of course. If e, f ∈ E have the same
exact ends (say e = [u, v] and f = [u, v], for example), then we say G has multiple
edges. A simple graph is an undirected graph with no loops or multiple edges. For
example, in Figure 1.1, Graph K is simple, but graphs G and H are not simple.
A more precise definition of a graph can be given which avoids the set-theoretic
ambiguity of multiple edges. Formally, a graph is a triple G = (V, E, I) where V
and E are sets and I ⊆ V × E is an incidence relation with the property that each
e ∈ E appears either once or twice as the second coordinate of some ordered pair
in I. (Edge e is incident with exactly one or exactly two elements of V .) In our
exploration, we will not need this level of formality.
A vertex v and an edge e in a graph G are said to be incident if v is an end of edge e. Two vertices u and v are said to be adjacent if [u, v] is an edge. (Let us write u ∼ v to denote the adjacency relation.) The degree of a vertex in a graph G is defined to be the number of edges incident to that vertex, with the rule that loops count twice. The degree of vertex v in a graph G is denoted deg(v), or deg_G(v) if v is a vertex of several graphs in a given discussion. For example, in graph H above, deg(x_1) = 1, deg(x_2) = 3, deg(x_3) = 0.
A walk in graph G = (V, E) is a sequence w = (v_0, e_1, v_1, ..., e_k, v_k) which alternates between vertices and edges in such a way that only incident objects occur in sequence; i.e., for each i (1 ≤ i ≤ k), e_i = [v_{i−1}, v_i]. The walk w has length k (the number of edges in the sequence), origin v_0 and terminus v_k. We sometimes say that w is a walk from v_0 to v_k or simply a (v_0, v_k)-walk. A (v_0, v_0)-walk is called a closed walk: it returns to its origin. A walk which repeats no vertex is a path. If w = (v_0, e_1, v_1, ..., e_k, v_k) is a path in G, we say w is a path from v_0 to v_k or a (v_0, v_k)-path. A walk of positive length which repeats no vertex or edge, with the exception that v_0 = v_k, is a cycle. While a cycle, described as a sequence (v_0, e_1, ..., e_k, v_0), has a natural origin and terminus, there are contexts in which a cycle is best viewed as a subgraph where every vertex has degree two.
So what, then, is a subgraph? Let G = (V, E) and H = (V′, E′) be graphs. We say H is a subgraph of G if
V′ ⊆ V
E′ ⊆ E
if e = [u, v] belongs to E′, then u and v must belong to V′.
A spanning subgraph is one in which all vertices are included: V′ = V.
Since we've given a bunch of definitions, let us pause to remind ourselves of the less intuitive examples of them.
If G is a graph, then both G itself and the empty graph H = (∅, ∅) are subgraphs of G. If v is a vertex of graph G, then w = (v) is a walk of length zero in G. This w is also a path of length zero, but it is not considered a cycle. However, if e = [u, u] is a loop in G, then w = (u, e, u) is a cycle of length one and, if e_1 = [u, v] and e_2 = [u, v] are multiple edges in G, then w = (u, e_1, v, e_2, u) is a cycle of length two in G, but the closed walk w′ = (u, e_1, v, e_1, u) is not a cycle since it repeats an edge.
Let G = (V, E) be a graph. For u, v in V, we say v is reachable from u, and write u ≈ v, provided there exists a (u, v)-path in G.
Exercise 1.1.1. For any graph G = (V, E), the binary relation ≈ is an equivalence relation: it is
reflexive: for all u ∈ V, u ≈ u;
symmetric: for all u, v ∈ V, if u ≈ v then v ≈ u;
transitive: for all u, v, w ∈ V, if u ≈ v and v ≈ w, then u ≈ w.
By the Fundamental Theorem on Equivalence Relations, we then know that the relation ≈ determines a partition of the vertex set V into equivalence classes. These equivalence classes are called the components of graph G and have a very natural interpretation. We say G is a connected graph if every vertex is reachable from every other (i.e., ≈ is just V(G) × V(G)). Otherwise, we say G is disconnected. If G is
disconnected, then some subgraphs of G are connected while others are disconnected.
The components of G are easily seen to be the maximal connected subgraphs of G:
a subgraph H of G is a component of G if and only if (i) H is a connected subgraph
and (ii) for any subgraph K of G which contains H as a subgraph, if K is connected,
then H = K.
In various network optimization problems, we are concerned with the prevention
of certain events which threaten to disconnect our graph. Obviously, this is much
easier to achieve if the failure (or loss) of any edge or vertex leaves behind a connected
graph. A vertex is called a cut vertex in graph G if its deletion (together with the
deletion of all edges incident to that vertex) leaves behind a disconnected graph. An
edge is said to be a bridge (or cut edge) if its deletion leaves behind a disconnected
graph. A bridgeless graph (or 2-edge-connected graph) is a connected graph which
has no bridge.
A directed graph (or digraph, for short) is an ordered pair G = (V, A) where V is
a set of vertices or nodes and A is a collection of ordered pairs e = (u, v) of elements
from V , called arcs. If e = (u, v) is an arc in digraph G, we say that v is the head
of e and u is the tail of e; in a drawing, e is represented by an arrow from node u
to node v. Notationally, we write h(e) = v and t(e) = u for e = (u, v). Aside from
this, we apply much the same terminology to digraphs as we do to graphs, with a few
important modifications. Most importantly, in a walk, path or cycle
w = (u_0, e_1, u_1, ..., e_k, u_k)

we have that e_i = (u_{i−1}, u_i), i.e., arc e_i has u_{i−1} as its tail and u_i as its head. In a digraph G, the out-degree (resp., in-degree) of a node u is defined to be the number of arcs e having t(e) = u (resp., h(e) = u).
Figure 1.2: A digraph with a path from a to e but no path from e to a.
One common task for the graph theorist is to turn an undirected graph into a
directed graph in such a way as to meet certain objectives. For example, we may
want to make all edges in a connected graph into directed edges (or make all streets
one-way in some imaginary city) in such a way as to preserve the existence of a path
from any node to any other. An assignment of direction to each edge in an undirected
graph G (replacing each edge e = [u, v] of G by either (u, v) or (v, u)) is called
an orientation of G. A strong orientation of G is one in which every v ∈ V(G) is
reachable from every u ∈ V(G). There is a nice theorem on strong orientations: an
undirected graph G has a strong orientation if and only if G is bridgeless. (Maybe
you can prove this for yourself, if you think quietly for a while with a pen and paper.)
Figure 1.3: A graph G and a strong orientation of G.
While our digraphs G are not symmetric, we can still define a symmetrized reach-
ability relation on the vertices. If we write u ≈ v provided G contains a directed
path from each to the other, then this is an equivalence relation on vertices. The
equivalence classes are called the strong components of G.
Figure 1.4: A graph with a bridge admits no strong orientation.
Figure 1.5: A digraph with three strong components.
1.2 Colorings and nowhere-zero flows
The Four-Color Conjecture, stated in 1852 and solved in 1976, has captured the
imagination of many students of mathematics. It states that every subdivision of the
plane into regions by piecewise linear boundaries has its regions colorable by at most
four colors in such a way that any two regions with a common boundary of positive
length are colored with different colors. While the Four Color Theorem (or 4CT)
has no natural practical application, the century-long search for a solution to this
problem generated perhaps the bulk of the theory of graphs, and this has turned out
to have great value in the solution of many other problems.
In spite of the esoteric nature of the 4CT, more general graph coloring prob-
lems have many practical applications and scientists continue to search for efficient
algorithms to color graphs. In this lecture, we will content ourselves with a brief
description of the problems and one application.
A vertex coloring is a coloring of the vertices of a graph in which adjacent vertices
always get different colors. Let G = (V, E) be a finite undirected simple graph and let C be a set of objects which we will call colors. (While "fire engine red", "hunter green", "charcoal" and "chartreuse" would be more imaginative, we typically use C = {1, 2, ..., k} when k colors are in play.) A proper vertex coloring (or, simply, a coloring when no confusion is risked) of G with colors in C is a function

c : V → C

satisfying c(u) ≠ c(v) whenever [u, v] is an edge of G.
Figure 1.6: A bipartite graph is one whose vertices can be colored with two colors.
A graph G is bipartite if its vertices can be colored with two colors. An example
is given in Figure 1.6. It is not hard to prove that a graph G is bipartite if and only
if G has no cycles with an odd number of edges. Bipartite graphs arise frequently in
discrete optimization, such as the problem of optimal assignment of workers to tasks
or the transshipment problem.
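Deciding whether a given graph is bipartite is itself a simple computational task: try to spread two colors outward from each starting vertex and watch for a conflict, which can only arise along an odd cycle. The sketch below is our own illustration (not code from the notes); it assumes the graph is handed to us as an adjacency dictionary and uses the queue-based search idea developed properly in Chapter 3.

```python
from collections import deque

def two_color(adj):
    """Try to properly 2-color a simple undirected graph given as {vertex: neighbors}.
    Returns a dict vertex -> 0/1 if the graph is bipartite, or None if an odd cycle blocks it."""
    color = {}
    for start in adj:                      # handle each component separately
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:         # give v the opposite color and keep spreading
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]: # an edge inside a color class: odd cycle found
                    return None
    return color

# A 4-cycle is bipartite; a triangle is not.
print(two_color({1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}))  # a valid 2-coloring
print(two_color({1: [2, 3], 2: [1, 3], 3: [1, 2]}))             # None
```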
In the classic map-coloring problem, the graph to color is not the configuration of
boundaries, but rather an abstract construct in which the regions become the vertices
and adjoining regions are connected by an edge.
But the most prevalent application of graph coloring today is in scheduling prob-
lems. In the simplest form, we have a graph where each vertex is an event which must
be scheduled. Two events which cannot be scheduled at the same time are joined by
an edge. The colors in this scenario are the possible time slots for the events.
For example, suppose we have a university at which final examinations must be
scheduled. (At many universities, these 3-hour exams are scheduled over a ten-day
period, separated from the end of term by a one-week study period.) Two courses
which have a student in common cannot be scheduled at the same time (in the ideal
scenario) and so the students give us the edges in a graph defined on the set of courses
as vertices.
A more complicated problem (and quite a challenging one in practice) is course
scheduling for a university (or project scheduling at a factory). A full solution to
this problem assigns to each section of each course not only a time slot, but a set of
students, a professor, and a room. Various constraints, such as room size, audio-visual capabilities and handicapped accessibility, instructor expertise and preference, and student schedules, add a complex system of edges to this graph. Rarely is a proper coloring available and the optimization problem becomes one in which the number of conflicts is to be minimized. Different universities handle this in different
ways.
Figure 1.7: A three-edge-colorable graph.
An edge coloring (or proper edge coloring, to be precise) of an undirected multi-
graph G is likewise an assignment of colors to the edges of G in such a way that
two edges with a common endpoint receive different colors. A nowhere zero k-flow in a graph G is an orientation of G along with an edge weighting using integers 1, 2, ..., k − 1 which satisfies conservation of flow at every vertex: the total weight of the arcs going into node u matches exactly the total weight of the arcs going out of node u. For example, the graph of the 3-cube admits a nowhere zero 3-flow, but the famous Petersen graph (with ten vertices, all of degree three, and no cycles of length less than five) does not. (It does not even admit a nowhere zero 4-flow.)
Figure 1.8: The Petersen graph is not three-edge-colorable; also, the graph admits
no nowhere zero 4-flow.
1.3 Factors in graphs
Let's finish this section with a survey of substructures in graphs. A matching in a
graph is a collection of edges no two of which share a common vertex. For example, in
a bipartite graph G = (V, E) where the vertices are partitioned into two color classes
V = W ∪ T (workers and tasks) and every edge joins one element of W to one
element of T, a matching represents an assignment of some subset of the workers to
some subset of the tasks in such a way that each worker is assigned to at most one
task and each task is matched to at most one worker. (For obvious reasons, problems
of this sort are sometimes called "marriage problems", but I won't conjecture which
gender more resembles the set of tasks here.) Let G = (V, E) be a graph and let
M ⊆ E be a matching. We say M saturates u ∈ V if u is the end of some edge
belonging to M; an unsaturated vertex is one which is incident to no edge of the
matching. A perfect matching in a (not necessarily) bipartite graph G is a matching
which saturates all vertices. The task of finding a perfect matching in a given graph
G, or a maximum weight matching in a weighted graph (G, w), is a challenging
computational task that we will address later in the course.
In a graph G = (V, E) with n vertices, a Hamilton cycle¹ is a cycle which visits
every vertex; i.e., a cycle of length n in G. We view such a cycle as a subset C
of the edge set E. Note that if C is a Hamilton cycle, then every vertex in the
subgraph H = (V, C) has degree two but, except when G is small, G typically contains
many other 2-regular spanning subgraphs. Among these, the Hamilton cycle is
distinguished by the fact that it alone is connected. As an example, consider the
Petersen graph. With a bit of work, one is easily convinced that this graph does not
have a Hamilton cycle; but if we delete any vertex whatsoever, the resulting graph on
nine vertices does admit such a cycle. So the Petersen graph is not Hamiltonian, but
any subgraph of it having 9 vertices and 12 edges is Hamiltonian: all such subgraphs
contain a Hamilton cycle.
Let (G, w) be a weighted undirected graph with edge weights w : E R. The
travelling salesman problem (or TSP) for (G, w) is to find a Hamilton cycle in G
of minimum total weight: H = (V, C) is a Hamilton cycle and, among such, w(C)
is as small as possible. This problem is extremely hard to solve in practice, yet is
closely related to a number of problems of practical importance such as the vehicle
routing problem. Since many graphs do not admit even a single Hamilton cycle, it is
attractive to formulate all travelling salesman problems as problems on a complete
graph.
The complete graph K_n is a simple undirected graph on n vertices with n(n − 1)/2 edges, one joining each pair of distinct nodes. By allowing +∞ as an edge weight, we can reformulate the TSP (or the Hamiltonicity question) on any graph with n nodes as an equivalent TSP on K_n. We can also consider the Hamilton cycle problem and TSP on directed graphs, and this reduction to the (directed) complete graph works in much the same way for these.
A k-factor in a graph G is a spanning subgraph in which every vertex has degree
exactly k. Of course, a graph having any vertex of degree less than k contains no
k-factor. For example, a 1-factor in G is the same as a perfect matching in G and
a Hamilton cycle in G is an example of a 2-factor in G, but not all 2-factors are
Hamilton cycles (consider, for example, two disjoint cycles of length four in the 3-
cube). A 2-factor is equivalent to a collection of cycles in the graph which, combined,
pass through every vertex exactly once. A Hamilton cycle occurs as the special case
of a single cycle achieving this effect: it is a connected 2-factor.
¹These are named in honor of the Irish mathematician and physicist Sir William Rowan Hamilton who, in 1857, invented a game, the Icosian Game, based on finding such cycles in graphs.
Figure 1.9: A 1-factor is also called a perfect matching.
To achieve more versatility (and, hence, address a wider array of applications),
we generalize the above notion to a b-factor. Let b : V → Z be an integer-valued function on the vertex set V of a graph G. A b-factor in G is a spanning subgraph H = (V, S) in which each vertex u ∈ V has degree exactly b(u). Clearly, if b(u) < 0 or b(u) > deg(u) (the degree of vertex u) for any u ∈ V, then no such subgraph exists. But non-existence conditions beyond this get harder and harder to find. So we also have on our plate the task of finding good algorithms to find b-factors in graphs and minimum/maximum weight b-factors in weighted graphs.
1.4 The Menagerie
At the end of each chapter in these notes, we will describe one or more discrete
optimization problems for which we present no solution. These problems may
come from applications or may simply be exotic puzzles related to the material in
the chapter. We have three reasons to include such problems. First, it is important
for the student to see that not all problems are neatly described in terms of graphs
or matrices; such a formulation often requires one to be creative or to simplify the
problem. Second, a text can often give the impression that all of the answers are
known, that there is no room left for mathematical research. (On a related note, it
is amazing how many students complete freshman calculus erroneously believing that
formulas for antiderivatives are known for all algebraic functions.) Finally, it is good
Figure 1.10: This graph has a b-factor for the values b(u) given at its nodes.
to leave a few open problems in order for the talented student to try out her/his skill
at discovering and verifying the correctness of algorithms.
The optimal design of parking lots is an important and complex problem in indus-
try. The designer must consider a multitude of issues, including traffic flow, pedestrian
patterns, and zoning rules such as number of trees to be planted per 10,000 square
feet of pavement. A designer can also make aisles (typically 24 feet wide at minimum)
narrower by allowing only one-way traffic or placing parking spaces at an angle. (But
then these spaces need to be larger than the standard ones.) We describe only the
simplest version of this problem, ignoring all these issues as well as entrance and exit
locations.
Parking Lot Planning: Maximize the number of parking spaces in a given polygonal
region.
Input: A polygonal region in the plane.
Goal: Maximize the total weight of a legal configuration of rectangles in the region.
A rectangle 36 × 8.5k has weight (value) 2k and a rectangle 18 × 8.5k has weight k.
An arrangement of rectangles is legal if no two points in distinct rectangles are less
than 24 units apart.
When raking leaves, one does not wish to rake the same area twice. Leaves
are gathered into piles which are later gathered up by one's children. The optimal
location of these piles depends on the density of leaves across the region: one does
not wish to transport large amounts of leaves over long distances. We simplify this
problem in what, at first, seems a ridiculous way. We assume that there are a small
number, n, of leaves and an even smaller number, k, of leaf-pile locations. Since we
can approximate the density by a bunch of points marking the centers of clumps,
this is not such a bad approximation.
Raking Leaves: Minimize the amount of raking to move n leaves in the unit square
into k piles.
Input: A set L of n pairs (x_i, y_i) of points inside the unit square [0, 1] × [0, 1].
Goal: Find a set P of k points in [0, 1] × [0, 1] and a function f : L → P such that the raking distance Σ_{ℓ ∈ L} d(ℓ, f(ℓ)) is minimized. (Here, d(·, ·) is Euclidean distance in the plane.)
Exercises
Exercise 1.4.1. Prove that if there is a walk from u to v in graph G, then there is a
path from u to v in G.
Exercise 1.4.2. Prove that a graph on n vertices with no cycles has at most n − 1
edges. (Note that loops form cycles of length one and parallel edges between a pair of
vertices lead to cycles of length two.)
Exercise 1.4.3. Is it possible to have exactly one cut vertex? Is it possible that every
vertex in graph G is a cut vertex? How about all but one?
Exercise 1.4.4. Prove: Σ_v deg(v) = 2|E(G)|, where the sum is over all v ∈ V(G). What is the analogous identity for directed graphs?
Exercise 1.4.5. Use the result of the previous exercise to prove the Handshaking
Lemma: In any undirected simple graph, the number of vertices of odd degree is
always even.
Exercise 1.4.6. What is the maximum number of edges in a non-Hamiltonian simple
graph on six vertices?
Exercise 1.4.7. Find the smallest non-bipartite graph that contains no cycles of
length three.
Exercise 1.4.8. Prove that every Hamiltonian graph has a strong orientation.
Exercise 1.4.9. The n-prism is a graph on 2n vertices formed by taking two copies of
a cycle of length n (call these the inside cycle and outside cycle) and joining each
vertex on the inside cycle by a new edge to its corresponding vertex on the outside
cycle. Find an orientation of the 8-prism having strong components of sizes exactly
6, 4, 4, 1 and 1.
Exercise 1.4.10. Find a nowhere zero 4-flow in the graph of Figure 1.7.
Exercise 1.4.11. For the graph in Figure 1.9, find a function b on vertices satisfying 0 ≤ b(v) ≤ deg(v) for all vertices v such that no b-factor exists.
Two
Trees and the Greedy Algorithm
Nov. 1, 2012
Today we discuss trees and present the greedy method for finding a minimum
cost spanning tree. Trees are important for several reasons, but mainly because a
tree is, in a sense, the simplest or cheapest way to connect up a bunch of nodes in a
network. So we repeatedly find ourselves needing them, and needing them in a hurry.
The graphs we discuss today will all be undirected. A graph is acyclic if it contains
no cycles. If u and v are vertices in an acyclic graph G and v is reachable from u,
then there is exactly one path in G from u to v. (If there were two or more distinct
paths, then the union of two of these paths would contain a cycle. Think about this.)
An undirected acyclic graph is called a forest. Naturally, a forest consists of a bunch
of trees. A tree is a connected acyclic undirected graph. So each component of a
forest is a tree. An example appears in Figure 2.1.
Figure 2.1: A forest with four components. Each component is a tree.
Lemma 1. In a connected simple graph on n vertices, every spanning tree contains
exactly n − 1 edges. Any spanning subgraph with fewer than n − 1 edges is necessarily
disconnected; any subgraph including n or more edges must contain a cycle.
Let G = (V, E) be a graph and let H = (W, S) be a subgraph of G. We say H is a
spanning subgraph of G if W = V ; i.e., H includes all the vertices of G, but perhaps
not all the edges. For example, H = (V, ∅) is a spanning subgraph of G = (V, E)
with no edges and |V| components of size one. This trivial spanning subgraph is
the starting point of our greedy algorithm. We want to judiciously add edges to this
edge-less subgraph until we arrive at a "best" spanning tree of G; i.e., a spanning
subgraph which is a tree.
By a weighted graph (or edge-weighted graph) we mean an ordered pair (G, w)
where G = (V, E) is an undirected graph and w : E → R is a real-valued function on
the edges. The Minimum Spanning Tree (MST) problem is to find a spanning tree in
a given weighted graph having smallest possible total weight.
If H = (W, S) is a subgraph of graph G = (V, E) with edge weights w, then the weight of H is given by

w(H) = Σ_{e ∈ S} w(e).

Our goal here is to find, in G, a spanning tree T = (V, S) with the property that

w(T) ≤ w(T′)

for any other spanning tree T′ in G.
2.1 The greedy algorithm
Here is our first algorithm to solve this problem: the greedy algorithm.
Kruskal's Algorithm
Input: Weighted graph (G, w) with G = (V, E)
Output: Either a subset S ⊆ E such that T = (V, S) is a minimum weight spanning tree in G or a report that G is disconnected.
Description: Let n = |V| and m = |E|. As a pre-processing step, first sort the edge set E from lowest to highest weight. In other words, write

E = {e_1, e_2, ..., e_m}

so that

w(e_1) ≤ w(e_2) ≤ ... ≤ w(e_m).

Now initialize S = ∅ and consider the edges in turn, from e_1 up to e_m, examining each edge once. When considering edge e_k = [u, v], we ask if the current forest F = (V, S) already contains a path from u to v. If so, we reject this edge; if not, then we accept this edge and augment S to S ∪ {e_k}.
If we ever reach |S| = n − 1, we stop and give T = (V, S) as our spanning tree. If we finish examining all edges and |S| < n − 1, then we report that the graph G is not connected.
Sometimes when we present an algorithm in English, the devil is in the details.
Here, we have left to the reader the issue of deciding whether or not the current forest
F = (V, S) at some point in the algorithm contains a path from some node u to
another node v. One way to improve this component management is to intuitively
have each component elect a leader node for that component. Then, when edge
e = [u, v] is considered, we can ask whether the components currently containing u
and v have the same leader node. If so, then each is reachable from the other and the
edge e is rejected; if not, then the edge is accepted and we then have the problem of
efficiently updating the leader node.
One way to do this is to let the vertex set be identified with the first few positive integers, V = {1, 2, ..., n}, and to define the leader node of a component to be the smallest vertex in that component (in the natural ordering of integers). Then, we can define a function up(v) which is initialized to up(v) = v for all nodes v. When an edge e = [u, v] is added into S, we look at the larger of the two leader nodes, say up*(v) ≥ up*(u), and update up(up*(v)) to be equal to up(u). Then, when we later wish to find the leader node of this component, we start at a node v_0 = v and iterate v_{h+1} := up(v_h) until we reach a limit, up*(v_0) = up(v_h) = v_h, which will hold only when v_h is the smallest node in its component. (This still allows room for improvement; periodic tree balancing can help us avoid too many iterations of this up function.)
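Putting these pieces together, a minimal sketch of Kruskal's algorithm in Python might look as follows. This is our own illustration, not code from the notes: it sorts the edges, maintains an up table in the spirit of the leader-node scheme just described (with a slightly simplified merge rule and none of the tree balancing mentioned above), and stops as soon as n − 1 edges have been accepted.

```python
def kruskal(n, weighted_edges):
    """Minimum weight spanning tree by the greedy (Kruskal) method.
    n: number of vertices, labelled 1..n.  weighted_edges: list of (weight, u, v).
    Returns the list of accepted edges (u, v, w), or None if G is disconnected."""
    up = {v: v for v in range(1, n + 1)}        # up(v) = v initially

    def leader(v):                              # follow up pointers to the component leader
        while up[v] != v:
            v = up[v]
        return v

    S = []
    for w, u, v in sorted(weighted_edges):      # pre-processing: sort by weight
        lu, lv = leader(u), leader(v)
        if lu == lv:                            # u and v already joined: reject (would close a cycle)
            continue
        S.append((u, v, w))                     # accept the edge
        if lu < lv:                             # merge: larger leader points to the smaller
            up[lv] = lu
        else:
            up[lu] = lv
        if len(S) == n - 1:
            return S
    return None                                 # fewer than n - 1 edges accepted: G is disconnected

# Example: a 4-cycle 1-2-3-4-1 plus a heavy diagonal; the three cheapest edges form the tree.
print(kruskal(4, [(1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 4, 1), (5, 1, 3)]))
```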
The correctness of this algorithm hinges on several basic properties of trees, which
we now present.
Lemma 2. Let G = (V, E) be a finite connected undirected graph and let T = (V, S)
be a spanning subgraph of G. Then any two of the following properties imply the third:
T is acyclic
T is connected
T has |V| − 1 edges.
Conversely, if T is a spanning tree of G, then all three of these properties hold.
Proof: Exercise.
The next lemma is sometimes called the Exchange Axiom since it plays a role
in the definition of a matroid.
Lemma 3. Let G = (V, E) be a connected undirected graph and let T = (V, S) be a spanning tree in G. If e is any non-tree edge (i.e., e ∈ E − S), then the subgraph

T + e := (V, S ∪ {e})

contains exactly one cycle. Moreover, if e′ is any edge of this cycle, then the subgraph T′ = (V, (S ∪ {e}) − {e′}) is also a spanning tree.
Proof: Exercise.
As an example, we briefly summarize the progress of Kruskal's algorithm in Figure 2.2.
Figure 2.2: Given this graph, the greedy algorithm chooses edges [b, c], [c, e], [d, h], [g, j], [a, b], [c, g], [e, h], [f, i] and [c, f], rejecting, along the way, edges [b, e] and [a, d].
Now we prove that the greedy algorithm performs as promised. As in the description of the algorithm, let the edge set E = {e_1, ..., e_m} be ordered so that

w(e_1) ≤ w(e_2) ≤ ... ≤ w(e_m)

and let T* = (V, S*) be the spanning tree produced by Kruskal's algorithm. Let T = (V, S) be any other spanning tree in graph G and let e_j be chosen so that

j := min{h : e_h ∈ S − S*}.

We ask why the greedy algorithm did not choose edge e_j. Of course, the edge was rejected because its introduction would have created a cycle when added to the forest existing at iteration j of the method. But then, aside from edge e_j itself, this cycle consists only of edges from the set {e_1, ..., e_{j−1}}. And all of these edges e_h enjoy the property that w(e_h) ≤ w(e_j). Since T does not contain a cycle, there must be some edge e_i in this cycle (i ≠ j) which does not belong to T. So build a new tree T′ from T* by replacing edge e_i with edge e_j. By the above lemma, this is again a spanning tree. Since w(e_j) ≥ w(e_i), we have w(T′) ≥ w(T*). And T′ has one more edge in common with T than does our greedily-constructed tree T*. Repeating this exchange process, we obtain a sequence of spanning trees

T*, T′, T′′, ...

each one having more edges in common with T than the previous one and having weights

w(T*) ≤ w(T′) ≤ w(T′′) ≤ ...

Since T has finitely many edges, we eventually arrive at T and this string of inequalities gives us w(T*) ≤ w(T) as claimed.
2.2 Prim's Algorithm
In certain applications, it is natural to build a spanning tree starting from some root
vertex so that the edges chosen so far at any point in the algorithm induce a connected
acyclic subgraph.
Prim's Algorithm
Input: Weighted graph (G, w) with G = (V, E) and r ∈ V
Output: Either a subset S ⊆ E such that T = (V, S) is a minimum cost spanning tree in G or a report that G is disconnected.
Description: Initialize U = {r} and S = ∅. As long as there is an edge with one end in U and one end in Ū = V − U,
find the smallest weight edge e = [u, v] with one end u ∈ U and the other, v, in Ū
Update S to S ∪ {e}
Update U to U ∪ {v}
It is fairly clear that the algorithm produces a forest T = (V, S) with |S| = |U| − 1.
If Prim's algorithm terminates with U ≠ V, then there are no edges between U and
Ū and G contains no spanning tree. But when graph G is connected, this algorithm
can be shown to produce a spanning tree of minimum total weight. The proof is left
as an exercise.
Prim's algorithm must also be implemented with careful thought. Heineman,
Pollice and Selkow note that a priority queue is the natural choice of data structure
to maintain a list of vertices not yet in the tree, together with a regularly updated
measure of the smallest weight edge from each vertex to a vertex in U. They then
point out that, since all nodes are initially in the queue and no node is ever added
back into the queue, a binary heap data structure will suffice for this application.
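Here is one way the priority-queue implementation mentioned above might look in Python; this is our own sketch and is not taken from Heineman, Pollice and Selkow. Instead of updating keys in the heap, it simply pushes new candidate edges and skips stale entries when they are popped, a common workaround when the heap offers no decrease-key operation.

```python
import heapq

def prim(adj, r):
    """Minimum weight spanning tree by Prim's method, grown from root r.
    adj: dict mapping each vertex to a list of (weight, neighbor) pairs (undirected graph).
    Returns (S, connected) where S is the list of chosen edges (u, v, w)."""
    U = {r}                                   # vertices already joined to the tree
    S = []
    heap = [(w, r, v) for w, v in adj[r]]     # candidate edges leaving U, keyed by weight
    heapq.heapify(heap)
    while heap and len(U) < len(adj):
        w, u, v = heapq.heappop(heap)         # smallest weight edge on the heap
        if v in U:                            # both ends already in U: skip this stale entry
            continue
        S.append((u, v, w))                   # update S and U
        U.add(v)
        for w2, x in adj[v]:                  # new candidate edges now leaving U through v
            if x not in U:
                heapq.heappush(heap, (w2, v, x))
    return S, len(U) == len(adj)              # connected iff every vertex was reached

# Same 4-cycle-plus-diagonal example as before, written as adjacency lists.
adj = {1: [(1, 2), (4, 4), (5, 3)], 2: [(1, 1), (2, 3)],
       3: [(2, 2), (3, 4), (5, 1)], 4: [(3, 3), (4, 1)]}
print(prim(adj, 1))
```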
2.3 The Menagerie
Let us consider instead a directed graph G = (V, A) on n vertices and ask
what is the directed analogue of a spanning tree? There are two possible answers
here.
An arborescence in G with root r ∈ V is a set S ⊆ A of n − 1 arcs in G such that
the subgraph (V, S) contains a unique directed path from node r to any other node
in G. Given a weighted directed graph G with edge weights w and a node r ∈ V, we
may ask for a minimum weight arborescence rooted at r.
A digraph G is strongly connected if it contains a directed path joining any node
to any other. (So G is strongly connected if and only if it contains an arborescence
rooted at r for every vertex r ∈ V.) A natural question to ask is how to join up
all of these pairs of vertices in the cheapest possible way. Given that G is strongly
connected, we may ask for a minimum weight subgraph with the property that this
subgraph contains a directed path from any node r to any node t in G. This is really
a different problem from the arborescence problem; while an arborescence contains
exactly n − 1 arcs, we do not know how many arcs belong to the optimal subgraph
in this case.
We finish with a combinatorial game. In the Shannon switching game, a graph
and a pair of nodes X and Y in that graph are specified. Two players, Short and
Cut, take turns choosing edges from the graph. Edges chosen by Short are forever
secured and cannot thereafter be deleted; edges chosen by Cut are deleted and cannot
be subsequently secured. If, at some point, the subgraph secured by Short contains
a path from X to Y , Short wins. On the other hand, if at some point the graph
becomes disconnected, with X and Y in different components, then Cut wins.
Figure 2.3: Can you find a minimum weight strongly connected subgraph in this
digraph?
Exercises
Exercise 2.3.1. Apply Kruskal's algorithm to find a minimum cost spanning tree in
the graph in Figure 2.4.
Exercise 2.3.2. Apply Prim's algorithm, starting from node A, to find a minimum
cost spanning tree in the graph in Figure 2.5.
Exercise 2.3.3. Consider once again Shannon's switching game. A well-known the-
orem states that, if G contains a pair of edge disjoint spanning trees, then for any
vertices X and Y in the graph, Short has a winning strategy. So who wins for the
various choices of X and Y in the graph of Figure 2.6?
Figure 2.4: Apply Kruskal's algorithm
Figure 2.5: Apply Prim's algorithm starting at node A.
Figure 2.6: Play Shannon's switching game.
Three
Basic Search Trees
Nov. 6, 2012
Today we discuss a generic search algorithm for a connected undirected graph and
show how it specializes to the famous breadth-first search and depth-first search
algorithms in computer science.
Let us use the informal term bag to mean simply a set. We will later compare
two ways to move items in and out of our bag.
3.1 Generic Search
Generic Search Algorithm
Input: Connected graph G = (V, E) with root node r
Output: A subset S ⊆ E such that T = (V, S) is a spanning tree in G.
Description: Start with S = ∅. Throughout the algorithm, vertices will be split
into three groups: exhausted vertices, vertices in the bag, and unvisited vertices.
Initially, only the root node r is in the bag and all other vertices are marked unvis-
ited.
As long as there is something in the bag, do the following:
consider some node u in the bag
see if there is an edge e = [u, v] such that v is unvisited
if so
put the edge e into the tree: augment S to S ∪ {e}
put v into the bag
if there is no such edge e, then mark vertex u as exhausted and move u out
of the bag.
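To make the role of the bag concrete, here is a small Python sketch of the generic search; it is our own illustration, not part of the original notes. The bag is just a list together with a rule for which element to consider next: as the next section explains, always considering the oldest element gives breadth-first search, and always considering the newest gives depth-first search.

```python
def generic_search(adj, r, pick=0):
    """Grow a spanning tree of the connected graph adj (dict: vertex -> neighbors) from root r.
    pick=0 considers the oldest bag element (queue/BFS); pick=-1 the newest (stack/DFS).
    Returns the set S of tree edges as (parent, child) pairs."""
    S = set()
    bag = [r]
    visited = {r}                       # in the bag or already exhausted
    while bag:
        u = bag[pick]                   # consider some node u in the bag
        for v in adj[u]:
            if v not in visited:        # an edge [u, v] with v unvisited
                S.add((u, v))           # put the edge into the tree
                bag.append(v)           # put v into the bag
                visited.add(v)
                break
        else:                           # no such edge: u is exhausted
            bag.remove(u)
    return S

# A 5-cycle: any spanning tree has 4 edges.
adj = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
print(generic_search(adj, 1))
```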
This algorithm gives the computer programmer a great deal of freedom in imple-
mentation. Many choices need to be made and these choices achieve different desired
effects. But, in any case, we can prove that the algorithm does indeed return a span-
ning tree when applied to a connected graph G. Every time a node v is moved into
the bag, the number of edges in S increases by one. We claim that every node other
than the root r gets moved into the bag at some point and this gives |S| = |V| − 1.
Moreover, when a node v is moved into the bag, the current subgraph (V, S) contains
a path from v to the root r (why?) and so the algorithm ends with a connected
subgraph having |V| − 1 edges. By Lemma 1, this subgraph is a spanning tree.
Now suppose, by way of contradiction, that some vertex v ≠ r never enters the
bag. Let's look at the set W of vertices which at some point are in the bag. (In class,
I called this bag T, but that is not important.) Since we are assuming that W ≠ V
and that the graph is connected, there exist vertices w, x with the property that
w ∈ W, x ∉ W and [w, x] is an edge. Now examine the point in the algorithm where
node w is declared to be exhausted. At this point, x must be labelled unvisited
and therefore the edge e = [w, x] is considered by the algorithm and accepted into the
tree, thereby moving x into the bag. This contradicts our assumption that x ∉ W.
3.2 Breadth-first and depth-first search
Now we see what happens when the bag is implemented as a queue. A queue is
a first-in-first-out (or FIFO) data structure that implements a set. If we view the
elements in a queue as ordered horizontally, from left to right, an element is always
added at the right (at the end of the line) and when we consider, or remove, an
element from the queue, we always choose the leftmost member.
Breadth-First Search (bfs) Algorithm
Input: Connected graph G = (V, E) with root node r
Output: A subset S ⊆ E such that T = (V, S) is a breadth-first search tree rooted
at r in G.
Description: Start with S = ∅. Initially, only the root node r is in the queue and
all other vertices are marked unvisited.
As long as the queue is non-empty, do the following:
consider the first node u in the queue
see if there is an edge e = [u, v] such that v is unvisited
if so
put the edge e into the tree: augment S to S ∪ {e}
put v into the end of the queue
if there is no such edge e, then mark vertex u as exhausted and move u out
of the queue.
We have already proved above that this algorithm produces a spanning tree. This
sort of tree has some special properties. We may partition the vertex set into layers
L_0, L_1, L_2, ... as follows. Initialize L_0 = {r} and L_k = ∅ for k > 0. When an edge e = [u, v] is moved into S, one end of e (node u, say) is already in the queue and belongs to some layer L_i. So we put the other end v into layer L_{i+1}. In this manner, each node of the connected graph G ends up in a unique set L_k and the L_k form a partition of set V.
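A compact Python sketch of breadth-first search that also records the layers L_0, L_1, L_2, ... may make Lemma 4 below easier to check by hand. It is our own illustration, not code from the notes, and it differs harmlessly from the description above in that it examines all unvisited neighbors of u in one visit rather than returning to u repeatedly; the resulting tree and layers are the same.

```python
from collections import deque

def bfs_tree(adj, r):
    """Breadth-first search from r in a connected graph adj (dict: vertex -> neighbors).
    Returns (S, layer) where S is the set of tree edges and layer[v] = k means v is in L_k."""
    layer = {r: 0}                      # L_0 = {r}
    S = set()
    queue = deque([r])
    while queue:
        u = queue.popleft()             # first node in the queue
        for v in adj[u]:
            if v not in layer:          # v is unvisited
                S.add((u, v))           # tree edge
                layer[v] = layer[u] + 1 # u in L_i puts v into L_{i+1}
                queue.append(v)         # v goes to the end of the queue
    return S, layer

adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
S, layer = bfs_tree(adj, 1)
print(layer)   # {1: 0, 2: 1, 3: 1, 4: 2, 5: 3}: distances (in edges) from the root
```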
Lemma 4. Let G = (V, E) be a connected undirected graph, let r ∈ V, and let T = (V, S) be the breadth-first search spanning tree rooted at r in G. Then
if node v in G belongs to layer L_k, then the shortest (r, v)-path in G (in terms of number of edges) has length k and one such path is the unique (r, v)-path in T.
every non-tree edge in G joins vertices in the same layer L_k or in consecutive layers L_k and L_{k+1}.
Proof: The first part follows by induction. Of course, there is a path with zero edges from r to r and L_0 contains only r. If e = [u, v] is entered into S and v is entered into the queue, then the tree at this point contains a path of length one from u to v. By induction, we assume that u ∈ L_k for some k and that the tree contains a shortest path from r to u of length k. Appending edge e to this path gives a path of length k + 1 from r to v and we do indeed have v ∈ L_{k+1}.
Now suppose that v is a vertex in some layer L_k and G contains an (r, v)-path using fewer than k edges. Among such vertices, let's focus on one for which k is as small as possible. If

r = u_0, e_1, u_1, e_2, u_2, ..., e_ℓ, u_ℓ = v

is a path from r to v of length ℓ < k, then the subpath

r = u_0, e_1, u_1, e_2, u_2, ..., e_{ℓ−1}, u_{ℓ−1}

is a path from r to v′ := u_{ℓ−1} of length ℓ − 1. By minimality of k, we must have v′ ∈ L_{ℓ−1}. But ℓ − 1 is less than k and so our examination of node v′ would consider the edge e = [v′, v], forcing v into L_ℓ and not L_k, a contradiction.

Now the last part of the proof follows since any non-tree edge e = [u, v] can be appended to a shortest path from r to u (or r to v, if v is closer to the root) to get a path from r to v (resp. to u). If u ∈ L_k and v ∈ L_ℓ, we have without loss of generality k ≤ ℓ. So there is a path of length k from r to u and this yields a path of length k + 1 from r to v. Since ℓ is the minimum number of edges in a shortest (r, v)-path, we have k ≤ ℓ ≤ k + 1.
Next, we consider implementing our generic bag as a stack. A stack is a last-
in-first-out (or LIFO) data structure that implements a set. If we view the elements
in a stack as piled vertically, from bottom to top, an element is always added at the
top and when we consider, or remove, an element from the stack, we again choose
the top element.
Depth-First Search (dfs) Algorithm
Input: Connected graph G = (V, E) with root node r
Output: A subset S ⊆ E such that T = (V, S) is a depth-first search tree rooted at
r in G.
Description: Start with S = ∅. Initially, only the root node r is on the stack and
all other vertices are marked unvisited.
As long as the stack is non-empty, do the following:
consider the top node u on the stack
see if there is an edge e = [u, v] such that v is unvisited
if so
put the edge e into the tree: augment S to S ∪ {e}
push v onto the top of the stack
if there is no such edge e, then mark vertex u as exhausted and move (or
pop) u off the stack.
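For comparison, here is the stack version in the same style (again our own Python sketch rather than code from the notes). A node is popped only after all of its neighbors have been visited, exactly as in the description above.

```python
def dfs_tree(adj, r):
    """Depth-first search from r in a connected graph adj (dict: vertex -> neighbors).
    Returns the set S of tree edges as (parent, child) pairs."""
    S = set()
    visited = {r}
    stack = [r]
    while stack:
        u = stack[-1]                   # consider the top node on the stack
        for v in adj[u]:
            if v not in visited:        # found an edge [u, v] with v unvisited
                S.add((u, v))
                visited.add(v)
                stack.append(v)         # push v onto the stack
                break
        else:                           # u is exhausted: pop it off the stack
            stack.pop()
    return S

adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(dfs_tree(adj, 1))   # e.g. {(1, 2), (2, 4), (4, 3), (4, 5)}
```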
Again, our proof for the generic case shows that this algorithm produces a span-
ning tree. This sort of tree has its own special properties. We define an ancestor
relation on V : say that node u is an ancestor of node v if u lies on the unique (r, v)-
path in our DFS tree T. So, for example, each node is an ancestor of itself and r is
an ancestor of v for every v since G is connected.
Lemma 5. Let G = (V, E) be a connected undirected graph, let r ∈ V, and let
T = (V, S) be the depth-first search spanning tree rooted at r in G. Then
when a node u is marked exhausted by the algorithm, every node having u as
an ancestor is also exhausted;
every non-tree edge in G joins some vertex v to an ancestor u of that vertex.
Proof: Exercise.
In Figure 3.1, we give a sketch of a graph and in Figure 3.2, we give the two
search trees that our algorithms produce from this graph. In order to get a well-
defined answer, we adopt the following convention. When a query is made to the
data structure describing a graph, such as "Give me an edge with endpoint u", the
edge e = [u, v] with node label v smallest is returned first. The next time we make
the same query, the second smallest possible v is chosen, and so on. We adopt this
convention in the exercises below.
Figure 3.1: A graph to be searched efficiently. Use 1 as root node.
Figure 3.2: Depth-first and breadth-first search trees based at root node 1 for the
graph in Figure 3.1.
3.3 The Menagerie
Efficient search is a ubiquitous problem in computing. Every day, various businesses
are looking for more efficient ways to search graphs, to search the world-wide web,
to search databases. There is a substantial literature on all these topics. So we can
wander off in any number of directions with our excursion here.
An interval graph is a graph each of whose vertices v_i is identified with some interval [a_i, b_i] on the real number line. Adjacency is defined by non-empty intersection: v_i is adjacent to v_j if the intervals [a_i, b_i] and [a_j, b_j] have a point in common. In this case, can you find a more efficient way to visit every vertex than using DFS or BFS?
Exercise 3.3.1. Find breadth-first and depth-first search trees in the graph of Figure
3.3, starting at vertex 1.
Exercises
Exercise 3.3.2. Homework problems go here, eventually.
Figure 3.3: Compute BFS and DFS trees. Use 1 as root node.
Four
Shortest Path Problems
Nov. 8, 2012
In today's class, we look at the problem of finding the shortest path in a weighted
directed graph from a specified origin to a specified destination. We also look at some
variations on this problem without giving algorithms for their solution.
4.1 The Landscape of Problems
A path from node r to node t in a graph G = (V, E) (or a digraph G = (V, A)) is a sequence

P : r = u_0, e_1, u_1, e_2, u_2, ..., e_k, u_k = t

which alternates between vertices u_i and edges/arcs e_{i+1} in such a way that
e_i = [u_{i−1}, u_i] in the undirected case (e_i = (u_{i−1}, u_i) in the directed case);
the vertices u_0, ..., u_k are all distinct.
In a weighted graph or digraph, we aim to find a path from a node r to a node t of minimum total weight. So we have a weight function w : E → R on edges E (or on arcs, A, in the directed case) and we wish to minimize the length or weight of the path

w(e_1) + w(e_2) + ... + w(e_k).
There are a number of choices one must make in clearly defining a shortest path
problem:
Is the graph directed or undirected?
Is the graph finite or infinite?
Do we allow negative edge weights?
Do we allow negative-length cycles?
Do we seek shortest paths between all pairs of vertices? From one vertex to all
others? Or just from one origin r to one destination t?
Do we need only one path between r and t or would we prefer to find all such
paths of shortest length?
Do we insist on a correct answer or are we willing to allow some probability
that the path found is not shortest or that no path is found even if one exists?
The various answers to these questions lead to a range of algorithms and to subprob-
lems of varying hardness. The first algorithm we explore is the most important one.
Dijkstra's algorithm, published in 1959, takes as input a weighted finite digraph with
non-negative arc weights and a root node r. For this digraph, the algorithm finds
shortest paths from r to all nodes reachable from r. This algorithm is easily seen
to work for the undirected case; other extensions will be discussed after we give its
proof of correctness.
4.2 Dijkstra's algorithm
One of the most popular algorithms in computer science, used in many industries,
many times a day (perhaps even millions of times per second, if we combine internet
packet routing and GPS systems), is Dijkstra's algorithm for shortest paths.
The algorithm, in its simplest form, works on a digraph with a root node r and
computes shortest paths from r to all other vertices reachable from r. We assume
that all edge weights are non-negative. The algorithm also works just fine on an
undirected graph if we replace each edge [u, v] by the two arcs (u, v) and (v, u). In
many applications, we seek only a shortest path from r to a single node t in the
digraph; in this case, we can easily stop the algorithm when t becomes permanently
labelled (as defined below).
In our view of Dijkstra's algorithm, we maintain a laminar partition of the vertex set; in digraph G = (V, A), we have

V = P ∪ F ∪ U,

a disjoint union of three sets: the permanent set P, the frontier F, and the unvisited set U. At each iteration, one node moves from the frontier F to the permanent set P and this is repeated until all nodes are in P (or, in the single-path case, the target vertex t belongs to P).
Figure 4.1: Conceptual diagram of vertex partition in Dijkstra algorithm.
The algorithm constructs a shortest path tree T rooted at a node r. This tree
includes a shortest path from r to every node in the digraph which is reachable from
r.
Dijkstra's Algorithm
Input: Digraph G = (V, A) with arc weights w : A → R and root node r
Output: A shortest path tree rooted at r together with a length function ℓ(v) which gives the length of a shortest path in G from r to v for every node v ∈ V reachable from r.
Description: Start with

P = ∅, F = {r}, U = {v ∈ V | v ≠ r}.

Define ℓ(r) = 0 and ℓ(v) = +∞ for each v ≠ r. Initially, pred(v) is undefined for each vertex v.
As long as the frontier F is non-empty, do the following:
choose a node u ∈ F with ℓ(u) as small as possible.
for each arc e = (u, v) for which v ∈ F ∪ U, do the following:
if ℓ(u) + w(e) < ℓ(v), then
set ℓ(v) := ℓ(u) + w(e)
set pred(v) = u
put v into F if it's not already in
once every such arc out of u has been considered, move u into set P.
Figure 4.2: Example for Dijkstras algorithm.
Let's execute this algorithm on a small example. Consider the weighted digraph G = (V, A) shown in Figure 4.2. Starting at root node r, we carry out Dijkstra's algorithm and the various values computed by the algorithm are collected in the following table:
iteration   P              F          U              ℓ(r) ℓ(a) ℓ(b) ℓ(c) ℓ(d)   pred(a) pred(b) pred(c) pred(d)
0           ∅              {r}        {a, b, c, d}   0    ∞    ∞    ∞    ∞      -       -       -       -
1           {r}            {a, b}     {c, d}         0    4    7    ∞    ∞      r       r       -       -
2           {r, a}         {b, c}     {d}            0    4    6    12   ∞      r       a       a       -
3           {r, a, b}      {c, d}     ∅              0    4    6    11   9      r       a       b       b
4           {r, a, b, d}   {c}        ∅              0    4    6    10   9      r       a       d       b
5           V              ∅          ∅              0    4    6    10   9      r       a       d       b
Note that, since arc weights are assumed to be non-negative, no values can ever be
updated in the last iteration. So we can terminate the algorithm either when F = ∅
or when U = ∅ and F contains only one node.
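A standard way to implement the frontier F is with a binary heap keyed by the tentative labels ℓ(v). The sketch below is our own Python rendering of the algorithm just described, run on a small digraph of our own choosing whose final labels happen to agree with the table above; as in the Prim sketch earlier, stale heap entries are skipped rather than updated in place.

```python
import heapq

def dijkstra(adj, r):
    """Single-source shortest paths with non-negative arc weights.
    adj: dict mapping each node to a list of (weight, head) pairs for its outgoing arcs.
    Returns (ell, pred): shortest path lengths from r and predecessors on shortest paths."""
    ell = {r: 0}          # tentative labels; a missing key means +infinity
    pred = {}
    permanent = set()     # the set P of permanently labelled nodes
    heap = [(0, r)]       # the frontier F, keyed by ell
    while heap:
        lu, u = heapq.heappop(heap)      # node of F with smallest label
        if u in permanent:
            continue                     # stale entry: u was already made permanent
        permanent.add(u)
        for w, v in adj.get(u, []):      # examine every arc (u, v) out of u
            if v not in permanent and lu + w < ell.get(v, float("inf")):
                ell[v] = lu + w
                pred[v] = u
                heapq.heappush(heap, (ell[v], v))
    return ell, pred

# A small digraph of our own; its final labels match the table: r 0, a 4, b 6, c 10, d 9.
adj = {"r": [(4, "a"), (7, "b")], "a": [(2, "b"), (8, "c")],
       "b": [(5, "c"), (3, "d")], "d": [(1, "c")], "c": []}
print(dijkstra(adj, "r"))
```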
4.3 Proof of correctness
We want to be sure that our algorithms are mathematically correct. Devising a proof
of correctness not only gives us confidence that the process is reliable, but helps us
understand why it works and thereby guides us as we seek to invent algorithms of our
own.
Theorem 6. Let G = (V, A) be a digraph with non-negative arc weights w and
designated root node r. Upon termination of Dijkstra's algorithm
(a) the set P contains all nodes reachable from r in G;
(b) for v ∈ P, ℓ(v) gives the length of a shortest (r, v)-path in G;
(c) the tree T = (P, S) where S = {(u, v) : v ∈ P − {r}, u = pred(v)} is a shortest path tree in G rooted at r.
Proof: We prove only statement (b) about the function ℓ, leaving the other parts as exercises for the reader.
For v ∈ P, let d(v) denote the length of a shortest path from r to v in G. We prove that ℓ(v) = d(v), using induction on the order in which nodes enter the set P. The base case for this induction is v = r and, since ℓ(r) is initialized to zero (and there is no point in the algorithm where any ℓ value is increased), we have ℓ(r) = d(r) at termination.
Now suppose we are at some stage in the execution of the algorithm and node v is about to be moved into the permanently labelled set P. (This means that ℓ(v) ≤ ℓ(u) for all nodes u ∈ F at this point.) We now prove that, at this point in the execution of the algorithm, ℓ(v) = d(v). Assume, by way of contradiction, that d(v) < ℓ(v).
Consider a shortest (r, v)-path in G:

    r = u_0, e_1, u_1, . . . , e_k, u_k = v.

Since any subpath of a shortest path is also a shortest path, we have d(u_h) = w(e_1) + · · · + w(e_h) for 1 ≤ h ≤ k. Let u_j be the last node along this path that enters P before v does:

    j := max { h | 0 ≤ h < k, u_h ∈ P }.
Write u = u_j and u' = u_{j+1}. Then we have d(u') = d(u) + w(e) where e = (u, u'). By the induction hypothesis, d(u) = ℓ(u). And, at any point in the algorithm, d(u') ≤ ℓ(u'). So, just before u is moved to set P, arc e is examined and we are assured that

    ℓ(u') ≤ ℓ(u) + w(e) = d(u) + w(e) = d(u').
So ℓ(u') = d(u') and u' ≠ v, since we are assuming d(v) < ℓ(v). When v is selected by the algorithm, we have u' ∈ F with ℓ(u') = d(u') ≤ d(v) < ℓ(v), contradicting the choice of v over u' by the algorithm. This shows that our assumption d(v) < ℓ(v) is false and, by induction, we are done.
Note. Throughout the proof, we have relied heavily on the assumption that no arc weight is negative.
Note. If our goal is simply to find a shortest path from the root node r to a specific node t, the proof shows that we can stop once we have t ∈ P.
As an exercise, the reader is asked how one might adapt this algorithm to find a shortest (r, t)-path in an infinite graph. Assume that V is an infinite set, but that each node has only finite out-degree; so when we examine u ∈ F, there are only finitely many v with (u, v) an arc. Also assume that there is a path from r to t (i.e., one using only a finite number of arcs).
4.4 Other algorithms for shortest paths
As mentioned in the previous section, Dijkstra's algorithm is easily adapted to handle undirected graphs with non-negative edge weights. If some edges or arcs have negative weights, then the problem of finding shortest paths (a path having no repeated vertices) can become quite difficult to solve. In particular, if we allow negative length cycles, then the shortest path problem becomes NP-complete. A cycle C in a directed graph with arcs e_1, e_2, . . . , e_k has weight w(C) = w(e_1) + · · · + w(e_k) and is called a negative length cycle if w(C) < 0. Clearly the existence of such a cycle leads to the existence of walks of length less than n from node r to node t for any (negative) integer n whenever some (r, t)-path in G passes through some vertex on this cycle. The presence of such walks makes it harder to find an optimal path.
The Bellman-Ford algorithm (Shimon Even calls this "Ford's Algorithm") works on an arc-weighted digraph G = (V, A) with a root node r and allows negative-length arcs, provided there is no negative length cycle in G.
Bellman-Ford Algorithm
Input: Digraph G = (V, A) with arc weights w : A → R and root node r
Output: A length function ℓ(v) which gives the length of a shortest path in G from r to v for every node v ∈ V reachable from r, together with a predecessor function which describes a shortest path to each such v.
Description: Start with ℓ(r) = 0 and ℓ(v) = +∞ for v ≠ r. Initially pred(v) is undefined for all v.
As long as there is an arc e = (u, v) with ℓ(u) + w(e) < ℓ(v),
update ℓ(v) to ℓ(u) + w(e)
set pred(v) = u.
That's all there is to it. One unimaginative way to implement this algorithm is to order the arc set A = {e_1, e_2, . . . , e_m} and pass through these arcs in order, checking the condition for each one, as many times as needed. (The absence of negative length cycles guarantees that this process eventually terminates.) The value of this simple version is, first, that it shows that the algorithm has running time O(|V| · |A|) and, second, that it leads to a proof, by induction on the number of arcs in a shortest path, that the algorithm is correct. The details are left to the exercises.
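Here is one possible Python rendering of that arc-scanning scheme, written as a sketch under the assumption that no negative length cycle is reachable from r; the data representation (a list of arcs plus a weight dictionary) is chosen for illustration only.

```python
def bellman_ford(nodes, arcs, w, r):
    """Shortest path labels from r; arc weights may be negative.

    nodes: iterable of nodes; arcs: list of (u, v) pairs; w: dict mapping each
    arc (u, v) to its weight.  Assumes no negative length cycle is reachable
    from r, so at most |V| passes over the arc list are ever needed.
    """
    ell = {v: float("inf") for v in nodes}
    pred = {v: None for v in nodes}
    ell[r] = 0
    for _ in range(len(ell)):
        changed = False
        for (u, v) in arcs:                     # scan the arcs in a fixed order
            if ell[u] + w[(u, v)] < ell[v]:
                ell[v] = ell[u] + w[(u, v)]
                pred[v] = u
                changed = True
        if not changed:                         # no arc violates the condition
            break
    return ell, pred
```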
Finally, let us mention the all-pairs shortest path problem. Given a network, we are often tasked with finding a distance matrix for the graph. This matrix has rows and columns indexed by the vertices and (u, v)-entry equal to the length of a shortest path from u to v in the digraph. Note that this need not be a symmetric matrix (unless G is an undirected graph, for example). Also, the algorithm below gives only the length of a shortest path; as an exercise, the student is asked to devise a modification which also builds a matrix that indicates which route to take for every choice of u and v.
Floyd's Algorithm
Input: A finite directed graph G = (V, A) with non-negative arc weights w : A → R
Output: A distance function d : V × V → R such that d(u, v) is the length of a shortest (u, v)-path in G for all u, v ∈ V.
Description: Order the vertices V = {v_1, . . . , v_n} in any way. For each u, v ∈ V, define d_k(u, v) to be the length of a shortest (u, v)-path passing only through vertices in the set

    {v_1, . . . , v_k} ∪ {u, v}.
Of course, the initial values are given by

    d_0(u, v) = w(e) if e = (u, v) ∈ A, and d_0(u, v) = ∞ otherwise,

since d_0(u, v) optimizes only over paths using vertices u and v and no other vertices.
Now, for k = 1, 2, . . . , n, we build the function d_k from the previous one, d_{k−1}. For each u and each v in V, we compute

    d_k(u, v) := min { d_{k−1}(u, v) , d_{k−1}(u, v_k) + d_{k−1}(v_k, v) }.
[Interestingly, this can be phrased in terms of tropical arithmetic, an algebraic system which is rapidly gaining interest in the mathematical community.]
At the end, we have d(u, v) = d_n(u, v) for each pair of vertices u and v.
After initialization, this algorithm has an outer loop which is executed over n iterations. In each iteration, we must perform a comparison and update for each pair of vertices. So each iteration requires a constant times n^2 steps. Overall, this algorithm has running time O(n^3); that is, except for very small values of n, the number of basic computational steps is bounded above by a constant times n^3, where the graph has n vertices. The proof that it correctly finds distances is a proof by induction.
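A direct Python transcription of the recurrence follows; it is only a sketch (the dictionary-of-dictionaries table and the convention d(u, u) = 0, a detail suppressed in the text, are assumptions of this illustration). The triple loop makes the Θ(n^3) running time visible.

```python
def floyd(vertices, w):
    """All-pairs shortest path lengths via the d_k recurrence.

    vertices: list giving the ordering v_1, ..., v_n; w: dict mapping arcs
    (u, v) to their non-negative weights.  Returns d with d[u][v] the length
    of a shortest (u, v)-path, or infinity if no such path exists.
    """
    INF = float("inf")
    # d_0: only the arc (u, v) itself may be used
    d = {u: {v: w.get((u, v), INF) for v in vertices} for u in vertices}
    for u in vertices:
        d[u][u] = 0.0                     # the empty path from u to itself
    for vk in vertices:                   # k = 1, 2, ..., n
        for u in vertices:
            for v in vertices:
                if d[u][vk] + d[vk][v] < d[u][v]:
                    d[u][v] = d[u][vk] + d[vk][v]
    return d
```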
4.5 The Menagerie
The bi-directional path problem involves graphs whose edges have local orientations
at both of their endpoints. So an edge e = [u, v] can be directed into or out of u and,
independently, directed into or out of v. So there are four ways to attach these two
arrows to edge e. A bidirectional path from node r to node t in such a graph is a
sequence
    r = u_0, e_1, u_1, . . . , e_k, u_k = t
alternating between vertices and edges in such a way that
edge e_1 is directed out of r,
edge e_k is directed into t, and
at each internal node u = u_i, either e_i is directed into u while e_{i+1} is directed out of u, or e_i is directed out of u while e_{i+1} is directed into u.
(Note that we allow node repetition in such paths.)
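As an illustration of the definition (and nothing more), here is a small Python checker; the encoding of local orientations as a pair of 'in'/'out' labels per edge is an assumption of this sketch, since the notes do not fix a data structure.

```python
def is_bidirectional_path(edges, seq):
    """Check the three conditions above for a candidate bidirectional path.

    edges maps an edge name to (u, dir_u, v, dir_v), where dir_u and dir_v are
    each 'in' or 'out', the local orientation of the edge at its endpoints.
    seq alternates vertices and edge names, starting and ending with a vertex.
    """
    def direction_at(e, x):
        u, du, v, dv = edges[e]
        if x == u:
            return du
        if x == v:
            return dv
        raise ValueError("edge %s is not incident with node %s" % (e, x))

    verts, eds = seq[0::2], seq[1::2]
    for i, e in enumerate(eds):            # each edge joins consecutive vertices
        if {edges[e][0], edges[e][2]} != {verts[i], verts[i + 1]}:
            return False
    if direction_at(eds[0], verts[0]) != "out":     # leaves the origin r
        return False
    if direction_at(eds[-1], verts[-1]) != "in":    # enters the destination t
        return False
    for i in range(1, len(verts) - 1):     # opposite local directions inside
        if {direction_at(eds[i - 1], verts[i]),
            direction_at(eds[i], verts[i])} != {"in", "out"}:
            return False
    return True
```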
In Figure 4.3, we give an example of a bidirectional graph problem. In this example, there is a bidirectional path from A to H, and a different bidirectional path from H to A.
A graph-theoretic topic of current research is the Stackelberg shortest path problem. Here, we imagine ourselves as making a profit from some subset of the arcs in the graph, and we want to set prices so that customers, who simply find shortest paths from their origin to their destination regardless of whether the arcs they use belong to us or to our competitor, use our subnetwork as much as possible. Let's now try to make this precise.
Suppose G = (V, A) is a directed graph with origin r and destination s specified. Suppose that the arc set is partitioned into two sets, A_F (fixed price arcs) and A_P
Figure 4.3: A bidirectional graph.
(priceable arcs). We are given a weight function only on the fixed price arcs, w : A_F → R. The problem is to choose prices {w(e) | e ∈ A_P} in such a way as to maximize the sum of w(e) over those e lying both in A_P and in the arc set of a shortest (r, s)-path in G. (For simplicity, if several shortest paths exist, we consider the one which maximizes our revenue.)
For example, consider the graph shown in Figure 4.4, where A_P = {(a, b), (a, c), (c, e)}.
Figure 4.4: A Stackelberg shortest path problem.
Clearly, the optimal revenue we can obtain is 15 units and this is achieved by choosing weights with

    w(a, b) ≥ 5,    w(c, e) ≥ 7,    w(a, c) + w(c, e) = 15.
Exercises
Exercise 4.5.1. In the weighted digraph of Figure 4.5, apply Dijkstra's algorithm to find a shortest path tree rooted at node r. For each iteration of the algorithm, show the partition P, F, U as well as the values of the functions ℓ(·) and pred(·).
Figure 4.5: Use the Dijkstra algorithm to find shortest paths from node r to all nodes.
Exercise 4.5.2. In the weighted digraph of Figure 4.6, construct a shortest path tree
rooted at node A.
Exercise 4.5.3. In the partially weighted digraph of Figure 4.7, solve the Stackelberg shortest path problem for origin r and destination x for all vertices x. For which vertices is the value of the game unbounded? For which vertices are the edge weights {w(e) : e ∈ A_P} irrelevant?
Exercise 4.5.4. Prove statement (a) of Theorem 6: for any vertex v, we have v ∈ P at the end of the algorithm if and only if v is reachable from r in graph G.
Exercise 4.5.5. Prove statement (c) of Theorem 6: in the tree T = (P, S), where S = {(u, v) : v ∈ P \ {r}, u = pred(v)}, every path from r to any node v is a shortest (r, v)-path in G.
Figure 4.6: Find shortest paths from node A to all nodes.
Exercise 4.5.6. Prove the correctness of the Bellman-Ford algorithm. If d(v) denotes the true length of a shortest (r, v)-path in G, your induction hypothesis should be: assume that, after iteration k, ℓ(v) = d(v) for any v reachable from r via some shortest path using k or fewer edges.
Exercise 4.5.7. Prove the correctness of Floyd's algorithm. Your induction hypothesis should be: assume that, after iteration k, it holds for every pair of vertices u and v for which a shortest (u, v)-path exists using only vertices u, v and v_1, . . . , v_k that d_k(u, v) is the true distance from u to v in G.
Exercise 4.5.8. Describe how to modify Dijkstra's algorithm to find shortest paths in an infinite graph. Assume that each vertex has finite out-degree and that, for any vertex u, you have oracle access to the list of arcs {e | t(e) = u} (those with tail u) as well as their weights.
Exercise 4.5.9. Describe how to modify Floyd's algorithm to record actual routing information for shortest paths in addition to their length.
Figure 4.7: Solve the Stackelberg shortest path problem with origin r.
Five
A Crash Course in Linear Programming
Nov. 12, 2010
In today's class, we try to get a conceptual view of a beautiful subject which is integrally related to our study of discrete optimization. Linear programming is one of the most powerful pieces of twentieth century applied mathematics. Yet its main algorithm is a simple adaptation of Gauss-Jordan reduction. It is hard to overestimate the economic impact of linear programming: the subject has applications in practically all scientific, business and engineering disciplines. But we'll have to discuss this elsewhere; we have time only for a brief overview.
Linear programming affords a powerful duality theory that both explains and guides a number of discrete algorithms. The theorems of Weak Duality, Strong Duality and Complementary Slackness serve as unifying themes for the introduction of dual variables in combinatorial algorithms, local improvement rules, and stopping conditions. Our primary goal here is to survey these highlights of the theory in relation to the topics in our course. In particular, we aim to encapsulate strong duality and complementary slackness into simple forms that can be applied as needed.
5.1 Linear programming problems
We consider problems in which we are to maximize or minimize a linear function over all the non-negative solutions x to a linear system Ax = b. Of course, minimizing a dot product

    c^T x = c_1 x_1 + c_2 x_2 + · · · + c_n x_n

is the same as maximizing −c^T x, so there is no loss in restricting our attention to minimization problems only (or maximization, as we choose to do in the proofs below).
In the above paragraph, we are using matrix and vector notation. We have an m × n matrix A = [a_ij] and three column vectors

    x = (x_1, x_2, . . . , x_n)^T,    c = (c_1, c_2, . . . , c_n)^T,    b = (b_1, b_2, . . . , b_m)^T

of length n, n and m, respectively. So our definition of a linear programming problem (or LP, for short) in equality form is

    min c^T x subject to Ax = b, x ≥ 0

where the last inequality encodes the conditions that all variables x_j are non-negative. The linear function f(x) = c^T x is called the objective function and the equations Ax = b, together with the inequalities x ≥ 0, are called the constraints of the problem, these latter ones being the non-negativity constraints.
The matrix equation Ax = b encodes a set of m linear equations

    a_i1 x_1 + a_i2 x_2 + · · · + a_in x_n = b_i

in the variables x. Such an equation can be equivalently expressed as a set of two inequalities

    a_i1 x_1 + a_i2 x_2 + · · · + a_in x_n ≤ b_i
    a_i1 x_1 + a_i2 x_2 + · · · + a_in x_n ≥ b_i

or

    −a_i1 x_1 − a_i2 x_2 − · · · − a_in x_n ≥ −b_i
    a_i1 x_1 + a_i2 x_2 + · · · + a_in x_n ≥ b_i .

In this manner, any linear system of the form Ax = b can be expressed as a system of linear inequalities Ax ≥ b (for a different matrix A and right-hand side vector b, of course!).
A linear programming problem in standard form is a problem expressible as

    min c^T x subject to Ax ≥ b, x ≥ 0.

This is the most common form of LP studied. But by introducing extra variables that take up the slack between the right-hand side and the left-hand side, it can easily be
converted to a problem in equality form. (We will need to use these so-called slack
variables in the proof of the Strong Duality Theorem below.)
Given the above LP, a vector x is called a feasible solution for this problem if it satisfies the constraints Ax ≥ b and x ≥ 0. The set of all feasible solutions is called the feasible region. Geometrically, this is a polyhedron; it is a convex subset of Euclidean space R^n with flat sides and typically has a finite number of corners, called vertices. Examples of polyhedra are convex polygons in the plane, infinite wedges in the plane and the five Platonic solids: the tetrahedron, the octahedron, the cube, the icosahedron and the dodecahedron.
5.2 The shortest path problem
Let's next look at a simple example of a linear programming problem that arises in discrete optimization.
Consider the digraph G = (V, E) with V = {r, a, b, t}, E = {(r, a), (r, b), (a, b), (a, t), (b, t)} and arc weights

    e      (r,a)  (r,b)  (a,b)  (a,t)  (b,t)
    w(e)     2      5      2      4      1
Figure 5.1: Digraph G for the shortest path problem and the feasible region.
The problem of finding a shortest path from r to t in this digraph is formulated as a linear programming problem as follows. Introduce one variable x_e for each arc e, with the interpretation x_e = 1 if arc e lies on the shortest path and x_e = 0 otherwise. The path must include exactly one arc out of the origin node r, so we have

    x_(r,a) + x_(r,b) = 1.
At nodes a and b, the path can only enter if it leaves:

    x_(r,a) − x_(a,b) − x_(a,t) = 0;
    x_(r,b) + x_(a,b) − x_(b,t) = 0.

Finally, the path must include exactly one arc into the terminal node of the path, t:

    x_(a,t) + x_(b,t) = 1.
So we arrive at the linear formulation

    minimize    2x_(r,a) + 5x_(r,b) + 2x_(a,b) + 4x_(a,t) + 1x_(b,t)
    subject to  −x_(r,a) − x_(r,b)                       = −1
                 x_(r,a)          − x_(a,b) − x_(a,t)    =  0
                 x_(r,b)          + x_(a,b)   − x_(b,t)  =  0
                                     x_(a,t) + x_(b,t)   =  1
                 x_(r,a), x_(r,b), x_(a,b), x_(a,t), x_(b,t) ≥ 0
The feasible region for this LP is given in the above figure; observe that this triangular region belongs to a 2-dimensional subspace of a 5-dimensional space and the three vertices of the polyhedron correspond to the three paths from r to t in digraph G.
In matrix form, the above LP is expressed

    min c^T x subject to Ax = b, x ≥ 0

where we simplify x = [x_1, x_2, x_3, x_4, x_5]^T and have

    c = [2, 5, 2, 4, 1]^T,    b = [−1, 0, 0, 1]^T

and

    A = [ −1  −1   0   0   0 ]
        [  1   0  −1  −1   0 ]
        [  0   1   1   0  −1 ]
        [  0   0   0   1   1 ] .
This matrix A is known as the incidence matrix of the digraph G. It has one row for each vertex, one column for each arc and exactly two non-zero entries in each column, a +1 marking the head of the arc and a −1 marking the tail of the arc. The incidence matrix of a digraph has very special structure; in particular, every vertex of the feasible region for this problem has integer coordinates. (This is quite a remarkable phenomenon, but we won't have time to prove it, unfortunately. It hinges on the equally amazing fact that any square submatrix of A has determinant −1, 0 or 1.)
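For readers who want to experiment, the equality-form LP above can be handed to an off-the-shelf solver. The following sketch assumes SciPy is available; it is only an illustration, not part of the formulation itself. The reported optimum, 5, corresponds to the path r, a, b, t of weight 2 + 2 + 1.

```python
# Numerical check of the shortest path LP (assumes SciPy is installed).
from scipy.optimize import linprog

c = [2, 5, 2, 4, 1]                  # costs of (r,a), (r,b), (a,b), (a,t), (b,t)
A = [[-1, -1,  0,  0,  0],           # row r
     [ 1,  0, -1, -1,  0],           # row a
     [ 0,  1,  1,  0, -1],           # row b
     [ 0,  0,  0,  1,  1]]           # row t
b = [-1, 0, 0, 1]

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 5)
print(res.x, res.fun)                # x = (1, 0, 1, 0, 1), objective value 5
```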
5.3 Linear programming algorithms
In 1947, mathematician George Dantzig introduced a method for finding optimal solutions to linear programming problems. This simplex method is very simple indeed. Algebraically, we row reduce the linear system Ax = b just as in our linear algebra class, and this gives an equivalent linear system A'x = b' where A' has the form [I | N] and the solutions are easy to read off. Now, depending on a row-reduced version of vector c, we iteratively re-order the variables by moving one attractive variable from the N side to the I side and moving a less attractive variable the other way, thereby giving ourselves another (but easier) row reduction problem. Geometrically, this algorithm moves from corner to corner of the feasible region, hopping along edges on the boundary of the polyhedron in order to make the objective function c^T x smaller. As with all algorithms for linear programming, the stopping condition is tied to the Strong Duality Theorem, which we will present below. And let me not minimize the importance of the simplex method; this method and its many variants, such as the Phase I method, the Revised Simplex Method and the Dual Simplex Method, form a powerful suite of optimization tools and are well worth study.
The simplex method is rather easy to implement in practice (although efficient, numerically stable software for this algorithm commands a high price on the market). Industrial applications such as airline scheduling routinely involve thousands or even hundreds of thousands of variables. Remarkably, the commercial software can typically solve these LPs in a few days, weeks, or months at worst. Nevertheless, the number of row reductions needed to reach optimality can be exponential in the worst case: we learned only in the 1970s that the simplex method is not a polynomial time algorithm.
The first polynomial time algorithm for linear programming problems was introduced by the Russian mathematician Leonid Khachiyan in 1979. This ellipsoid method was a huge breakthrough, but due to its numerical instability, it has rarely been useful in practice and remains mostly a theoretical tool. Khachiyan's discovery set off a huge effort to find better algorithms that are also provably polynomial in their running time. In 1984, Narendra Karmarkar introduced a new interior point method that borrowed heavily from the theory of non-linear optimization. Karmarkar's method is also a polynomial time algorithm, and it has the advantage of being efficiently implementable in practice. For large practical problems, the Karmarkar algorithm often beats the simplex method, so good modern software for linear programming incorporates both approaches and makes intelligent transitions between them.
5.4 Linear programming duality
In spite of the greater occurrence of minimization problems in our course, let us now work with a maximization linear programming problem in standard form:

    max c^T x subject to Ax ≤ b, x ≥ 0.
If we combine constraints, we can sometimes build an implied constraint

    t_1 x_1 + · · · + t_n x_n ≤ w

where t_j = Σ_i y_i a_ij (1 ≤ j ≤ n) and w = Σ_i y_i b_i for some well-chosen multipliers y_1, y_2, . . . , y_m ≥ 0. Note that y ≥ 0 is enough to guarantee that this is an implied constraint: every feasible solution x satisfies Ax ≤ b and so therefore also satisfies y^T Ax ≤ y^T b.
If the stars align and we get lucky, it may be that this implied constraint, which I'll write t^T x ≤ w, also gives us an upper bound on the value of our objective function f(x) = c^T x. This brings us to the
Theorem 7 (Weak Duality Theorem). Let A be an m × n matrix, let c ∈ R^n and b ∈ R^m. Consider the two linear programming problems

    max c^T x                 min y^T b
    Ax ≤ b                    y^T A ≥ c^T
    x ≥ 0                     y ≥ 0

For every feasible solution x to the LP on the left (which we call the primal LP) and for every feasible solution y to the dual LP on the right, we have

    c^T x ≤ y^T b.
The proof of this theorem hinges on basic manipulations of inequalities. For example, if t_1 ≤ c_1 and t_2 ≤ c_2, then 5t_1 + 3t_2 ≤ 5c_1 + 3c_2. (But we cannot make the same conclusion for 5t_1 − 3t_2 vis-à-vis 5c_1 − 3c_2.) If we temporarily denote the n-vector y^T A by t^T, then we have, for x and y feasible solutions to their respective problems, t ≥ c and x ≥ 0, giving

    c^T x ≤ t^T x = (y^T A) x = y^T (Ax).
Likewise, since Ax ≤ b and y ≥ 0, we find

    y^T (Ax) ≤ y^T b
and these together give our result.
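To make the inequality concrete, here is a tiny numerical illustration in Python; the matrix, vectors and the two feasible solutions are made up for this sketch and do not come from the notes.

```python
# Weak duality on a made-up 2x2 instance: c^T x <= y^T b.
A = [[1, 2],
     [3, 1]]
b = [4, 6]
c = [3, 2]

x = [1, 1]       # primal feasible: Ax = (3, 4) <= (4, 6) and x >= 0
y = [0.6, 0.8]   # dual feasible: y^T A = (3.0, 2.0) >= c and y >= 0

cx = sum(ci * xi for ci, xi in zip(c, x))   # objective of the primal, = 5
yb = sum(yi * bi for yi, bi in zip(y, b))   # objective of the dual, = 7.2
assert cx <= yb
print(cx, yb)
```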
So, as we design algorithms that optimize over discrete sets, we have this added benefit when our problem can be formulated as an LP: these dual solution vectors y often have elementary combinatorial meaning, and each one we can find gives us a bound on how far we have left to go in our search for optimality. We can play this game of finding better and better upper bounds in the same way that we iterate to find better and better solutions. Unfortunately, in this game of combining constraints to build upper bounds, we may find that there are no suitable vectors y at all to choose from.
As an example, consider the linear programming problem

    maximize   −2x_1 + 3x_2 + 2x_3
    subject to   x_1 −  x_2 −  x_3 ≤ 10
                −x_1 + 2x_2 − 2x_3 ≤ 5
                 x_1, x_2, x_3 ≥ 0

It is easy to check that x = [4r, r, 3r]^T is a feasible solution for every positive real number r and that this solution has objective value f(x) = r, which is not bounded above by any non-negative combination of the constraints. This is an example of an unbounded LP. We will soon show that any LP which is feasible (i.e., has a non-empty feasible region) is either unbounded or has an optimal solution.
Certificate of optimality
Now suppose we happened upon a vector x which is feasible for the primal LP and a vector y which is feasible for the dual LP such that

    c^T x = y^T b.

Then we would know that each of these vectors was an optimal solution to its respective problem. We have a certificate of optimality! (This is important.) Not only that, but also in the above string of inequalities, we would be forced to have equality everywhere:

    c^T x = (y^T A) x = y^T (Ax) = y^T b.

If we think about this a bit, we find that whenever x_j ≠ 0 the corresponding values t_j and c_j must be identical (a sum of non-negative terms can be zero only if every term is zero). By the same token, whenever y_i is non-zero, the i-th entry of the vector Ax must be exactly b_i, no less. This gives us a very useful theorem for discrete problems. As above, we consider a primal-dual pair of linear programming problems, which we now denote by (P) and (D), respectively:
         max c^T x                  min y^T b
    (P)  Ax ≤ b               (D)   y^T A ≥ c^T
         x ≥ 0                      y ≥ 0
Theorem 8 (Complementary Slackness Theorem). If x is an optimal solution to problem (P) above and y is an optimal solution to problem (D) above, then this pair of vectors must satisfy the Complementary Slackness Conditions (CSC):

    for each j (1 ≤ j ≤ n), if x_j > 0, then Σ_{i=1}^{m} y_i a_ij = c_j
    for each i (1 ≤ i ≤ m), if y_i > 0, then Σ_{j=1}^{n} a_ij x_j = b_i
Please note that, using basic logic, these statements can be written in a variety of ways, and we use them differently for different applications. In particular, if we are given any solution to one of the problems and we are asked "is this solution optimal?", we can set up a linear system to search for a matching partner (whose existence, in the case it is truly optimal, will be guaranteed by the next theorem below). If we can find a partner which is feasible, then we have a proof of optimality. A more subtle matter is how to improve upon a feasible solution x if its unique partner y obtained from the CSC is not feasible for (D).
Strong duality
In order to prove the Strong Duality Theorem (SDT), we will make use of an old theorem from linear algebra called Farkas' Lemma.
Theorem 9 (Farkas' Lemma, 1902). Let M be an m × n matrix and let d ∈ R^m. Then EITHER
there exists a non-negative vector z ≥ 0 in R^n such that Mz = d
OR
there exists a vector w in R^m such that w^T M ≥ 0 and w^T d < 0,
NOT BOTH.
The "not both" part is easy to see: for if Mz = d, then w^T M z = w^T d for any vector w of appropriate length. Now if both z and w^T M are non-negative, then so also is their dot product. So w^T d ≥ 0 in this case as well. The proof of the either/or part of the theorem is beyond the scope of this course.
Now we have enough tools to state and prove the SDT for linear programming.
Theorem 10 (Strong Duality Theorem). If the primal problem (P) and the dual problem (D) each have at least one feasible solution, then they both have optimal solutions. Moreover, if x is an optimal solution to problem (P) and y is an optimal solution to problem (D), then c^T x = y^T b.
Let us re-state the theorem in order to make the utility of Farkas' Lemma more evident. We are saying that, if there exist non-negative vectors x and y such that Ax ≤ b and y^T A ≥ c^T, then there exist such vectors satisfying c^T x = y^T b. In other words, when both problems are feasible, we have, for any real number r, either an x, feasible for (P), with c^T x ≥ r, or a vector y, feasible for (D), with y^T b < r.
Proof of SDT: Suppose A is an m × n matrix. Assume that both the primal problem (P) and the dual problem (D) are feasible. Construct partitioned matrices

    M = [ A    I    0 ]        d = [ b ]        z = [ x   ]
        [ c^T  0   −1 ] ,          [ r ] ,          [ s   ]
                                                    [ x_0 ]

where s ≥ 0 and x_0 ≥ 0 are new variables. Applied to this choice of M and d, Farkas' Lemma says that either there is a non-negative z of length n + m + 1 with Mz = d or there is some vector w of length m + 1 satisfying w^T M ≥ 0 and w^T d < 0.
In the first alternative, we interpret z as above and unpack the block matrix M to find

    Ax + Is = b,    c^T x − x_0 = r.

Since s ≥ 0 and x_0 ≥ 0, these give Ax ≤ b and c^T x ≥ r. (And note that z ≥ 0 implies x ≥ 0.) In the alternative outcome of Farkas' Lemma, we interpret w^T = [y^T | y_0] and, looking at the last column of M, we get y_0 ≤ 0. One easily checks (using the feasibility of (P)) that y_0 cannot be zero, and that any such solution can be scaled by a positive constant to obtain another solution with y_0 = −1; thus we may assume y_0 = −1 without any loss of generality. In this case, the condition w^T M ≥ 0 gives us

    y^T A + y_0 c^T ≥ 0,  and  y^T I + y_0 0^T ≥ 0,

or

    y^T A − c^T ≥ 0,  and  y^T ≥ 0.

The second alternative of Farkas' Lemma also gives w^T d < 0, which reduces to y^T b < r, as desired. So the proof is complete. □
5.5 The Menagerie
There are several subjects similar to linear programming that are very important in applications. One of these is Integer Linear Programming. Here, we are given an m × n
matrix A and vectors c and b of lengths n and m, respectively, and we are asked to maximize or minimize c^T x over all non-negative vectors x ≥ 0 with integer entries.
Clearly each such problem has a linear relaxation, where we enlarge the feasible solution set to include also the non-integer vectors. But there are applications where this linear relaxation reveals very little information about the integer-valued problem.
Another extension which is receiving a lot of attention in the research community these days is Semidefinite Programming. A real symmetric n × n matrix X is said to be positive semidefinite (denoted X ⪰ 0) if z^T X z ≥ 0 for every vector z ∈ R^n. (Equivalently, X, which, being symmetric with real entries, must be diagonalizable over the real numbers, has only non-negative eigenvalues.) Now we use the shorthand ⟨C, X⟩ to denote the trace of the matrix product C^T X. Our generic semidefinite programming problem (SDP) has a symmetric n × n matrix C, a vector b of length m and a list A_1, . . . , A_m of symmetric n × n matrices as input data and asks us to

    max ⟨C, X⟩
    ⟨A_i, X⟩ = b_i    (1 ≤ i ≤ m)
    X ⪰ 0
Exercises
Exercise 5.5.1. Use Dijkstra's algorithm to find the optimal solution to the following linear programming problem.

    minimize    2z_1 + 12z_2 + 2z_3 + 36z_4 + 5z_5 + 12z_6 + 8z_7
    subject to   z_1                        − 2z_6 + z_7 = 0
                             2z_4                  + z_7 = 2
                2z_1       − 2z_3        − z_5           = 0
                      4z_2               + z_5           = 4
                        2z_3 + 4z_4      + z_5   + 4z_6  = 4
                z_1, z_2, z_3, z_4, z_5, z_6, z_7 ≥ 0
HINT: First subtract the second-to-last constraint from the last constraint so that exactly two constraints have non-zero right-hand side and each variable occurs in exactly two constraints. Then scale constraints and variables wisely (using new variables x_j = α_j z_j for a well-chosen α_j in each case) to transform the system to one resembling the LP at the beginning of this chapter.
Six
NP-coNP Predicates: A Glimpse of
Computational Complexity Theory
Nov. 15, 2010
In today's class, we work informally with a powerful and deep subject in computer science: computational complexity theory.
A precise presentation of the ideas here is beyond the scope of the course. So I am taking the unusual approach (for me, at least) of working informally. But, in fact, it's not too bad: all we need to do is axiomatize the idea of a polynomially computable function. That is, we take as clear the notion of a function (or algorithm) whose computation on input x requires some number of basic computational steps which is bounded above by a polynomial in the size of x. If we allow ourselves to cheat on this definition, we can go quite far.
6.1 Polynomial time predicates
When we work with problems in discrete optimization, we typically have a fixed set of allowable inputs to a problem. As an example, a problem might have the set of all graphs as input space, or all weighted graphs, or weighted digraphs with specified
origin and destination nodes for a path. Our challenge is to answer some question
about these inputs such as what is the smallest weight of a spanning tree? or
Which path is shortest?. For simplicity (and for clarity of thought), computer
scientists boil all of these down into TRUE/FALSE questions: Is there a spanning
tree in G having total weight less than 37? or Is the arc e = (b, c) included in a
shortest path from r to t in digraph G?
So we arrive at the concept of a predicate. This is a boolean function on the set of inputs. For each input x, the predicate p(·) either answers p(x) = T (true) or p(x) = F (false). Intuitively, one should think of a predicate as a property: for example, among all graphs (the inputs), some may be connected while the rest are not, so "p(G) = T if and only if G is connected" is a predicate. It should not bother us (at this point, at least) that we do not have a formula to compute the function p(·) or even any reasonable expression for this function. It just encodes the property of connectedness and, in this, we find it useful as a device.
Let's survey a few simple examples of predicates before we move on.
Graph Connectedness: Is a given graph connected?
Input: An undirected graph G = (V, E).
Property: G is a connected graph.
Primes: Is a given positive integer prime?
Input: A positive integer n.
Property: n is a prime number.
Factoring: Does a given positive integer have a factor below a given threshold?
Input: Integers m and n with 1 < m < n.
Property: n has a divisor d satisfying 1 < d < m.
Spanning Tree: Does graph G have a spanning tree of total weight below some threshold?
Input: A weighted undirected graph (G, w) and an integer K.
Property: G contains a spanning tree T of total weight w(T) < K.
Shortest Path: Does digraph G contain an (r, t)-path of length below some threshold?
Input: A weighted digraph (G, w), nodes r and t, and an integer K.
Property: G contains a directed path from r to t of length less than K.
Travelling Salesman Problem: Does a given graph admit a Hamilton cycle of total length below some threshold?
Input: A weighted graph (G, w) and an integer K.
Property: G contains a cycle of length |V(G)| of total weight less than K.
So a predicate simply divides some class of objects into those having some property and those which don't. That is, it splits a collection of instances into those for which a statement is true (so-called Yes instances) and those for which that statement is false (No instances). It is a completely abstract function, an oracle, which
simply reports true or false for every input. There is no formula or process that computes p(x); it just IS.
By contrast, we use the word "function" (and, especially, "polynomially computable function") to represent a function A(·) which is going to be our model of an algorithm. In order to tell how quickly such a function A can be computed, we first must have an informal notion of the size of an input.
Every finite amount of data (such as a graph or a matrix of rational numbers) can be encoded as a finite string of zeros and ones. This is how most computers store our data, so it seems fairly natural to us. We let {0, 1}* denote the set of all finite strings (or words) over the binary alphabet:

    {0, 1}* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, . . .}.

(Here ε denotes the empty string; ignore it.) For such a 01-string x, we let |x| denote the length of x. This is a natural notion of the size of the input to an algorithm or function. But there are many other encoding strategies for data and many other natural notions of size. For instance, the input to a problem may be encoded in the English language and the size of such an input may be taken to be the number of letters (and spaces) in it. In theoretical computer science, we basically take "reasonable encoding" as intuitively understood, with the requirement that, for any two reasonable encodings, there is a polynomially computable function that transforms one into the other. In most cases, the length of a binary string representing x works just fine.
Now we avoid discussing the technical topic of Turing Machines (big directed graphs with instructions on converting input strings into walks) by taking as intuitively clear the notion of polynomially computable functions. A function A(·) which takes inputs x and reports either T or F is polynomially computable if there is a polynomial f(n) such that, upon input x of length n, the computation of A(x) requires at most f(|x|) basic operations. The notion of "basic operation" depends on the context. It can include integer or rational arithmetic (given a fixed overall bound on the number of digits in a number), comparison of two integers, edges or vertices, retrieval of some object from a data structure, and more. But we choose not to make things more precise at this time. We just want to avoid mysterious things like addition of real numbers, which cannot even be carried out in finite time in most cases. (In fact, most real numbers cannot even be stored in a computer, since it only has a finite amount of memory!)
So our main objective is to seek out polynomially computable functions A(x) that compute predicates p(x) which are important in optimization. We say that A is an algorithm for predicate p if, for all valid inputs x to p, we have A(x) = p(x). The complexity class P of polynomial-time computable predicates consists of all of
those predicates p for which a polynomially computable function A() exists satisfying
A(x) = p(x) for all valid inputs x.
Note. The standardized encoding of all instances to all problems via 01-strings, as described above, gives us a way to view a predicate as a language. If we split the zero-one strings into those with p(x) = true and those with p(x) = false, then we can define L = {x | p(x) = T} and re-phrase our problem as "Does a 01-string x belong to L?" This approach, via languages, is more convenient if we are dealing with finite state automata and Turing machines. If you see these ideas in a computer science course, you are more likely to work with the language approach over the predicate approach we have chosen here. But they are equivalent.
6.2 Non-deterministic polynomial time
Sometimes a problem becomes easier to solve if we are able to make a guess about how its solution works. For example, suppose we are given a positive integer n and we are asked whether or not n is a prime number. To be precise, consider the predicate p(n) defined for integers n > 1 given by p(n) = T if n is composite and p(n) = F if n is prime. If we happen to guess a positive divisor of n, say 1 < d < n with d | n, then it is easy to answer the question: p(n) is true, n is composite, not prime. This idea of allowing for lucky guesses brings us to the concept of non-deterministic polynomial time.
A predicate p(·) is an NP predicate if, for each input x, p(x) is true if and only if there is some certificate (or "hint") y which is not too long and some efficient algorithm A(x, y) which evaluates to true on this pair of inputs. More precisely, we say predicate p(·) is in the complexity class NP if there exists a polynomial f and a polynomially computable function A(·, ·) such that, for all inputs x,

    p(x) = true  ⟺  ∃y (|y| ≤ f(|x|) and A(x, y) = true).

It is not our concern, in this context, how the certificate y is found for each x; it may require an exponential amount of computation to locate a valid y for a given x, but this computation is irrelevant in the above definition. We say that p ∈ NP provided we can find this polynomially computable function A of two arguments and a polynomial function f, which limits the size of y as a function of the size of x, with the property that p(x) = T if and only if we can find a y with length at most f(|x|) making A(x, y) true.
Let's look at an example of an NP predicate. A Hamilton cycle in a graph G on n vertices is a cycle in G of length n; i.e., a closed path that visits every vertex exactly once (and ends where it begins). Evidence so far suggests that finding a Hamilton
cycle in a graph is exponentially hard. Yet the predicate p(·) defined on graphs by p(G) = true if G contains a Hamilton cycle and p(G) = false otherwise, is indeed an NP predicate. To see this, we let y be an efficient description of the Hamilton cycle and A(G, y) a function that verifies that y is indeed a cycle of length n in the graph G. (It should be fairly clear that such an A can be computed in polynomial time, even if some details have been suppressed.)
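One possible choice of the verifier A(G, y) is sketched below in Python; the representation of G as a vertex count plus an edge set, and of the certificate y as a list of vertices, is an assumption made for this illustration.

```python
def verify_hamilton_certificate(n, edges, y):
    """A(G, y): check that y describes a Hamilton cycle in the graph G.

    n: number of vertices, labelled 0, ..., n-1; edges: a set of frozensets
    {u, v}; y: a list of n vertices, read cyclically.  Runs in polynomial time.
    """
    if len(y) != n or set(y) != set(range(n)):
        return False                      # must visit every vertex exactly once
    for i in range(n):
        u, v = y[i], y[(i + 1) % n]       # consecutive vertices, wrapping around
        if frozenset((u, v)) not in edges:
            return False                  # the claimed cycle uses a non-edge
    return True

# Example: the 5-cycle itself is Hamiltonian.
C5 = {frozenset((i, (i + 1) % 5)) for i in range(5)}
print(verify_hamilton_certificate(5, C5, [0, 1, 2, 3, 4]))    # True
```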
So there is a nice certificate for graphs with a Hamilton cycle. (Such a graph is called a Hamiltonian graph.) But no polynomially-sized certificate is known that clearly demonstrates that a given graph G is not Hamiltonian.
6.3 The big conjectures
It is important to note that the complexity class NP is not symmetric; a predicate p(·) may belong to NP while its negation ¬p(·) may not. (Here ¬p(x) = true if p(x) = false and ¬p(x) = false if p(x) = true.) The above example, Hamilton cycle, illustrates this asymmetry. Meanwhile, the class P is symmetric: a predicate is computable in polynomial time if and only if its negation is computable in polynomial time.
We define the complexity class coNP as the set of all negations of predicates in NP. In other words, a predicate p(·) is in coNP if and only if ¬p(·) ∈ NP. So a predicate p(·) is in coNP if and only if there is a polynomial f(n) and a polynomially computable function A(x, y) such that

    p(x) = F  ⟺  ∃y (|y| ≤ f(|x|) and A(x, y) = F).

We clearly have the following containments of complexity classes:

    P ⊆ NP,    P ⊆ coNP.
The following is the most famous unsolved problem in theoretical computer science:
Conjecture 11. P ≠ NP
This seems likely to be true, but we appear to be very far from a correct proof. If it is true, then many important combinatorial optimization problems cannot be solved exactly in polynomial time. This justifies the use of heuristics and of relaxations of hard problems that are within reach of today's computers.
The following, less well-known, conjecture is probably more important for discrete
optimization.
Conjecture 12. P = NP ∩ coNP
I tend to operate on the naive assumption that this conjecture is true. But, again, we appear to be many decades away from a proof. As mentioned above, we know that P ⊆ NP ∩ coNP. So the impact of the conjecture is the reverse inclusion: NP ∩ coNP ⊆ P. Informally, this says that if p(x) = T is easy to check (with the help of a hint y) and p(x) = F is also easy to verify (with the help of some other hint y'), then it should be easy to decide whether p(x) = T or p(x) = F without the help of any hint whatsoever. For this reason, we find it useful to frame our optimization problems as theorems about NP-coNP predicates.
6.4 Some examples of NP-coNP predicates
In this section, we simply list a bunch of examples of familiar predicates that are in NP ∩ coNP and, in some instances, discuss them briefly. The first one is proven in the first few weeks of a basic course in linear algebra.
Example 6.4.1 (Solvability of a Linear System). Let A be an m × n matrix (over the rational numbers, say) and let b be a vector of length m. Then EITHER
there is a vector x of length n satisfying Ax = b
OR
there is a vector y of length m satisfying y^T A = 0 and y^T b ≠ 0,
NOT BOTH.
It is easy to miss the point here. The reader may already know that this problem ("Given A and b, is the linear system Ax = b consistent?") is solvable in polynomial time. Gauss-Jordan reduction requires O(N^3) basic arithmetic operations where N = max(m, n). (Actually this is non-trivial: even for systems with rational entries, one must choose row operations carefully so as to prevent exponential blow-up in the size of the fractions appearing in the matrix.)
But, just for a moment, ignore the row reduction algorithm and just look at the two certificates individually. The first outcome in the theorem says that there is a vector x with Ax = b. Given this vector as a hint (or certificate), one can easily compute the matrix-vector product Ax and compare this result entry by entry to the given vector b. So the predicate

    p(A, b): "the linear system Ax = b has at least one solution x"

is an NP predicate: an input for which the answer is true always admits a certificate which is verifiable in polynomial time.
Likewise, the second outcome in the theorem says that there is a vector y with y^T A equal to the vector of all zeros and y^T b ≠ 0. Given this y as a certificate, we readily compute both products and verify that y^T A = 0 and y^T b ≠ 0. (Note that we do not need to worry about numerical error here if we are doing computations over the rational numbers.) So the predicate p(A, b) given above is also a coNP predicate: an input for which the answer is false always admits a certificate which is verifiable in polynomial time.
Here are two more examples of theorems characterizing NP-coNP predicates.
Example 6.4.2 (Farkas' Lemma Variant). Let A be an m × n matrix over the rational numbers and let b be a vector of length m. Then EITHER
there is a vector x ≥ 0 of length n satisfying Ax ≤ b
OR
there is a vector y ≥ 0 of length m satisfying y^T A ≥ 0 and y^T b < 0,
NOT BOTH.
Example 6.4.3. Let A be an m × n matrix with integer entries and let b be an integer vector of length m. Then EITHER
there is an integer vector x such that Ax = b
OR
there is a rational vector y satisfying y^T A all integer and y^T b ∉ Z,
NOT BOTH.
Exercise 6.4.1. The following predicate, taking a rational matrix A and rational vector b as inputs, is clearly in NP:

    p(A, b): "there is a vector x > 0 satisfying Ax = b."

(Here x > 0 indicates that every entry of x must be positive.) Is this predicate in coNP? What is a possible certificate?
At the beginning of Section 6.2, we showed that Primes is a coNP predicate. (Think about it.) In 1975, Vaughan Pratt used the following theorem from basic abstract algebra to prove that Primes is also in NP.
Lemma 13. For an integer n > 1, n is prime if and only if there is an integer g satisfying
1 < g < n,
g^(n−1) ≡ 1 (mod n), and
for each prime divisor q of n − 1, g^((n−1)/q) ≢ 1 (mod n).
It still requires a bit of thought to see how this implies that Primes is in NP. The person proving this must not only supply this magic integer g, but also all of the prime divisors of n − 1 and proofs that all of these are prime, etc., etc. Pratt showed that this certificate (which recursively gives these generators g and prime divisors q) is polynomial in log_2 n (the number of bits needed to encode n) and that the statements in the lemma can be verified for all of these factors in time polynomial in log_2 n.
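The non-recursive part of such a certificate check is short. The sketch below (an illustration only) verifies the three conditions of Lemma 13 for a claimed witness g, trusting the supplied list of prime divisors of n − 1; in a full Pratt certificate each of those primes would come with its own recursive certificate.

```python
def check_pratt_conditions(n, g, prime_divisors_of_n_minus_1):
    """Verify the conditions of Lemma 13 for the claimed witness g."""
    if not (1 < g < n):
        return False
    if pow(g, n - 1, n) != 1:                 # g^(n-1) = 1 (mod n)
        return False
    for q in prime_divisors_of_n_minus_1:
        if (n - 1) % q != 0:                  # q must actually divide n - 1
            return False
        if pow(g, (n - 1) // q, n) == 1:      # g^((n-1)/q) != 1 (mod n)
            return False
    return True

# n = 101 is prime: n - 1 = 100 = 2^2 * 5^2, and g = 2 works as a witness.
print(check_pratt_conditions(101, 2, [2, 5]))     # True
```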
So, by the above lemma and this analysis, Primes is an NP-coNP predicate. Guided by Conjecture 12, we should then expect a polynomial time algorithm which, given an integer n > 1, decides whether or not n is prime. The first such algorithm was discovered, with much fanfare, in 2002 by Agrawal, Kayal and Saxena.
Next consider the predicate Factoring given above. Several important cryptosystems, such as the RSA system used for session key generation in the PGP protocol, have their security tied to the assumption that it is hard to factor large integers. For example, if n = pq where p and q are 1000-digit primes, no one outside the classified community knows a general method for finding p and q, given n. We reduce this to a predicate as follows:
Factoring: Does a given positive integer have a factor below a given threshold?
Input: Integers m and n with 1 < m < n.
Property: n has a divisor d satisfying 1 < d < m.
This predicate is clearly in NP since the certificate d is easy to verify. But we now see that it is also in coNP using Pratt's result. In order to convince another that a given integer n has no factor d below m, we simply give as certificate the full prime factorization of n, say n = q_1 q_2 · · · q_k, with proofs that each q_i is indeed prime, and our partner can easily verify that each q_i ≥ m. So Factoring is another NP-coNP predicate. Should we expect it to be decidable in polynomial time?
Remark. Note that any polynomially computable function which decides the above predicate can indeed be converted into a polynomial time algorithm for factoring integers. To see this, we use a sort of bisection search. First ask if n has a divisor below m = √n; if so, try m = n^{1/4}, and so on, zeroing in on the number of bits in the smallest prime divisor of n.
Many important optimization problems on graphs are known to be in both NP
and coNP. We will see a few of these in later lectures.
6.5 NP-Complete and NP-hard problems
This section will be quite brief, in spite of the vast scope and importance of its topic. For our purposes, we simply need to know that a problem which has been shown to be NP-complete or NP-hard (or a computational problem whose solution allows for the solution of such a decision problem) is very unlikely to admit a fast algorithm. If we are faced with such a problem, we are best advised to choose one of the following courses of action:
avoid the problem entirely: check if there is a simpler formulation of our applied problem that does not involve such hard tasks;
satisfy ourselves with heuristic solutions that may not give the correct or optimal answer;
if we insist on a correct answer, resign ourselves to a very long wait as exponential search algorithms dig for one.
But the truth is much more complex than this. First, it may (astoundingly) be that our NP-complete problem does admit a polynomial-time solution. In that case, we have proven that P = NP and we become famous. Another issue is the concept of relaxation. Whenever we formulate a real-world problem for a computer to solve, we are building models, making simplifications, ignoring features or obstacles. So the art of avoiding a hard theoretical formulation of a real-world problem is a highly technical one and a valuable one; seeing the right formulation is not easy and requires a great deal of practice and knowledge. Next, a study of heuristics can be quite involved, bringing into play randomized algorithms, average case analysis, probability theory, calculus, and general all-around smarts. Finally, I avoided using the term "exhaustive search" in the last bullet above because practical solution of NP-complete problems almost never explores every possibility (or even 1% of the possible solutions), but for badly structured problems it almost always visits exponentially many solutions. The techniques are sophisticated: branch and bound, dynamic programming, cutting plane methods in integer linear programming, and more. Suffice it to say that we are just scratching the surface.
So what makes a problem NP-complete? Let p(·) and q(·) be predicates. We say that q is polynomially reducible to p if there is a polynomial time algorithm A which takes all possible inputs for q and converts them to inputs for p and satisfies

    p(A(x)) = T  ⟺  q(x) = T

for all inputs x. Note that our natural concern that A(x) is not too big an input relative to the size of x is taken care of by the fact that A does only a polynomial
number of basic computations. So there is a polynomial f(n) such that |A(x)| ≤ f(|x|) for all inputs x to q.
Definition 6.5.1. A problem p(·) is NP-complete if p ∈ NP and every problem q ∈ NP is polynomially reducible to p.
Before discussing examples of NP-complete problems, let's consider the impact of this definition. Suppose p is an NP-complete predicate. If we can find a polynomial time algorithm to decide p, then we have proven that P = NP: let q be any problem in NP; for any input x to q, compute the corresponding input A(x) to p; now apply this polynomial-time algorithm to decide p(A(x)); this decides q(x). So it seems that NP-complete problems are very special: they are the hardest problems in the complexity class NP. Anyone who can crack one of these tough nuts can solve any problem in NP efficiently!
Stephen Cook, in 1971, was the first to discover an NP-complete problem (and the first to introduce NP-completeness as a concept!).¹ Cook showed that boolean formula satisfiability is such a problem.
Let x_1, . . . , x_n be a collection of boolean variables: each can take on only one of two values, true or false. We let x̄_i denote the negation of x_i, being false when x_i is true and true when x_i is false. Using this notion together with boolean AND (∧), OR (∨) and parentheses for grouping, we build boolean functions. For example, here are two boolean functions in the variables x_1, x_2, x_3, x_4, x_5:
    B(x) = (x̄_1 ∨ x̄_2) ∧ (x_1 ∨ x_3) ∧ (x_1 ∨ x_4) ∧ (x̄_2 ∨ x_3) ∧ (x̄_3 ∨ x̄_4) ∧ (x_2 ∨ x_3 ∨ x_5) ∧ (x_4 ∨ x_5)

(this is a reachability question in a graph with four nodes and five edges) and
    B'(x) = ((x_1 ∨ x_2) ∧ (x̄_1 ∨ x̄_2)) ∧ ((x_2 ∨ x_3) ∧ (x̄_2 ∨ x̄_3)) ∧ ((x_3 ∨ x_4) ∧ (x̄_3 ∨ x̄_4))
            ∧ ((x_4 ∨ x_5) ∧ (x̄_4 ∨ x̄_5)) ∧ ((x_5 ∨ x_1) ∧ (x̄_5 ∨ x̄_1))
(this being a question of properly 2-coloring the vertices of the pentagon). The
boolean function B(x) has a solution, hence is said to be satisfiable; one can easily check that B(T, F, T, F, T) is true. The boolean function B'(x) has no solution, so B' is not satisfiable.
Satisfiability (SAT): Is a boolean function satisfiable?
Input: A boolean function B(x) of n variables x = (x_1, . . . , x_n).
Property: There exists at least one assignment of T and F to the variables x_i making B(x) true.
¹Around the same time, Leonid Levin made closely related discoveries in the USSR, but you can read a proper history from the experts; I am not an expert.
Theorem 14 (Cook, 1971). SAT is NP-complete.
One often hears that 3-SAT is the standard NP-complete problem. For a positive integer k, a k-SAT formula is a boolean function expressed as

    B(x) = (y_{1,1} ∨ y_{1,2} ∨ · · · ∨ y_{1,k}) ∧ (y_{2,1} ∨ y_{2,2} ∨ · · · ∨ y_{2,k}) ∧ · · · ∧ (y_{m,1} ∨ y_{m,2} ∨ · · · ∨ y_{m,k}),

where each boolean variable y_{i,j} stands for x_h or x̄_h for some h. So a k-SAT formula is an AND of some number m of clauses, each of these being an OR of exactly k variables and/or negations of variables. It turns out that, while 1-SAT problems are trivial to solve and 2-SAT problems can be solved using the ideas in this course, k-SAT problems for k ≥ 3 are much harder. In fact, any SAT problem can be converted in polynomial time to a 3-SAT problem. This is why it is common to replace SAT in Cook's Theorem by 3-SAT.
So satisfiability problems are very special: they are the "hardest" problems in NP. But Levin and others immediately showed that other natural problems in NP are just as hard: travelling salesman, Hamilton cycle and graph coloring are all among the elite NP-complete problems. Many computer scientists got many publications early on by discovering new NP-complete problems. But now there are so many of them that the discovery that some decision problem in NP is NP-complete is rarely worth publication in a journal. (Even Microsoft's Minesweeper computer game includes an NP-complete subproblem!) Research in computational complexity theory has gone in different directions in recent years.
Okay, the last thing I want to explain is the concept of an NP-hard problem. In short, a problem p is NP-hard if, for every problem q(·) in NP, there is a polynomial time algorithm, which may use as a subroutine (or "oracle") a hypothetical function that solves p in one computational step, which correctly answers q(x) for each input x. So there are two key differences between an NP-complete problem and an NP-hard problem. First, note that an NP-complete problem must reside in NP; we do not require this of NP-hard problems. Second, while an NP-complete problem has the property that every problem in NP is transformable to it in polynomial time, an NP-hard problem is such that, in using it to solve problems in NP, we may need to create a polynomial number of instances of this problem (not just one) and solve them all in order to get our answer. Nevertheless, discovering a polynomial time algorithm for any problem in either of these classes amounts to a proof that P = NP. And that would be big!
6.6 Landau notation
As we are looking at ecient algorithms, we need precise but not overly detailed
language to describe their running times. Of course, the same algorithm will typically
take longer (i.e., require a larger number of basic computational steps) when handling longer inputs. So we see the running time of algorithm A(x) as a function f(n) of the integer variable n = |x|. When inputs are represented as binary strings, there are 2^n choices for an input x of length n, and A(·) may have varying running times on these inputs. So we take the worst case and, as a first attempt, define f(n) to be the maximum number of steps taken by algorithm A on any input of length n. A second way to avoid technicalities and get to the essence of the matter is to ignore constants. Suppose, for example, that one of us considers the comparison of two binary values to constitute a single basic operation, while another points out that these two values must first be accessed, then compared and then the answer reported, amounting to a total of four elementary steps for this action. If our algorithm, on inputs of size n, performs at most n^3 such comparisons, then the estimates n^3 and 4n^3 for the number of basic computational steps differ wildly. But we consider these two estimates as basically the same. To make this precise, we employ Landau notation.
Landau notation not only allows us to suppress constants, but also to focus on behavior for large inputs only. For example, if an algorithm requires some large constant number of preparatory steps before it even looks at its input (implying that this number of steps is independent of n), then we want to ignore this constant as well in our comparison of this algorithm against another. (This is the point where practicing software engineers go bonkers and resort to simulations.) We are trying only to compare asymptotic rates of growth here.
We consider real-valued functions on the set N of positive integers. Let f : N → R and g : N → R. Let's start with the notation for upper bounds. We say f(n) is O(g(n)) provided there exists a positive constant c and a positive integer N such that, whenever n > N, we have f(n) ≤ c g(n). In logical notation, "f(n) is O(g(n))" means

    ∃c > 0, ∃N ∈ N  ∀n > N  (f(n) ≤ c g(n)).

For example, f(n) = 1000n^2 is O(n^2), it is O(n^3) and it is O(2^n), but it is not O(n). A function which is O(1) is exactly one which is bounded above by a constant.
Our notation for upper bounds automatically gives us language for lower bounds, but it is convenient to introduce a new symbol for this. We say f(n) is Ω(g(n)) if g(n) is O(f(n)). Logically, f(n) is Ω(g(n)) if

    ∃c > 0, ∃N ∈ N  ∀n > N  (f(n) ≥ c g(n)).
If we want to exactly nail the growth rate of a function (again ignoring constants and start-up costs reflected in its values for small n), we use the Θ notation. Simply stated, f(n) is Θ(g(n)) if f(n) is both O(g(n)) and Ω(g(n)). For example, .001n^4 is Θ(n^4); it is also O(n^5) but not Θ(n^5), and it is Ω(n^3 log n) but it is not Θ(n^3 log n).
Finally (for us, in this brief introduction), we say f(n) is o(g(n)) if

lim_{n→∞} f(n)/g(n) = 0.

So f(n) is negligible compared to g(n) for large n.
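Since these are asymptotic statements, no finite computation can prove them, but a small numerical check can make them concrete. The following Python sketch (my own illustration, not part of the original notes) tests whether f(n) ≤ c·g(n) holds over a finite range for the example f(n) = 1000n²:

def is_dominated(f, g, c, N, n_max=10_000):
    """Check f(n) <= c*g(n) for all N < n <= n_max.
    A finite sanity check only; it illustrates, but cannot prove, an O() statement."""
    return all(f(n) <= c * g(n) for n in range(N + 1, n_max + 1))

f = lambda n: 1000 * n**2
print(is_dominated(f, lambda n: n**2, c=1000, N=0))     # True:  f is O(n^2)
print(is_dominated(f, lambda n: n**3, c=1, N=1000))     # True:  f is O(n^3)
print(is_dominated(f, lambda n: n, c=10**6, N=0))       # False: f is not O(n)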
The O(·) (or Big-Oh) notation allows us to classify algorithms according to their asymptotic running time. Let A(·) be an algorithm taking a maximum of f(n) steps on inputs of size n for each positive integer n. We say algorithm A is linear time if f(n) is O(n). We say A is a quadratic time algorithm if f(n) is O(n²). And so on. We say A is a polynomial time algorithm if f(n) is O(n^k) for some integer k independent of n.
The O(·) notation also allows us to talk about various complexity classes of decision problems. For any monotone increasing function f : ℕ → ℝ, we define TIME(f(n)) to be the set of decision problems which are solved by some algorithm with running time O(f(n)). It is curious that, while Cantor's diagonal argument (a classical proof technique which shows that the real numbers are uncountable) lets us prove that the containment TIME(n^k) ⊆ TIME(n^{k+1}) is proper, we have trouble even demonstrating a single decision problem which is in TIME(n³) but not in TIME(n²), for example. So we know even less about the complexity class

P = ⋃_{k=1}^{∞} TIME(n^k).
6.7 The Menagerie
Consider a light bulb factory which on one given day produces a large number N of light bulbs. The Quality Control Office must subject bulbs to k tests, T_1, T_2, ..., T_k. Each test T_i has associated to it a cost per bulb c_i > 0 and a failure rate 0 ≤ p_i ≤ 1. A bulb which fails any one test is discarded, and therefore no further money is spent on testing rejected bulbs. If we assume that the failure probabilities are entirely independent, what is the optimal order in which the tests should be carried out if the goal is to minimize overall cost? (This problem comes to me from Jack Edmonds.)

We have k! solutions to choose from, each representable as a permutation σ of the k indices. Let q_i = 1 − p_i for convenience. According to the above assumptions, the overall cost associated to solution σ is

c_{σ(1)} N + c_{σ(2)} (q_{σ(1)} N) + c_{σ(3)} (q_{σ(1)} q_{σ(2)} N) + ··· = N Σ_{i=1}^{k} c_{σ(i)} Π_{h=1}^{i−1} (1 − p_{σ(h)}).
For example, if the data is

Test   T_1    T_2    T_3    T_4    T_5
c_i     1      3      5      7      9
p_i    .001   .002   .004   .008   .009
then the schedule T_1, T_2, T_4, T_5, T_3 of tests is expected to cost about 24.778N units, but this is not best possible: a cost of 24.756N is achievable.
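To make the arithmetic concrete, here is a small Python sketch (my own illustration, not part of the original notes) which evaluates the expected per-bulb cost of a schedule and then finds an optimal schedule by brute force over all 5! orderings:

from itertools import permutations

# Costs and failure probabilities for tests T_1, ..., T_5 from the table above.
c = [1, 3, 5, 7, 9]
p = [0.001, 0.002, 0.004, 0.008, 0.009]

def cost_per_bulb(order):
    """Expected testing cost per bulb for a given order (a tuple of 0-based test indices)."""
    total, survive = 0.0, 1.0   # survive = probability a bulb reaches the next test
    for i in order:
        total += c[i] * survive
        survive *= 1 - p[i]
    return total

print(round(cost_per_bulb((0, 1, 3, 4, 2)), 3))      # schedule T1,T2,T4,T5,T3: 24.778
best = min(permutations(range(5)), key=cost_per_bulb)
print(best, round(cost_per_bulb(best), 3))           # an optimal schedule costs 24.756

For this data the brute-force optimum agrees with the exchange-argument rule of scheduling tests in increasing order of the ratio c_i/p_i.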
The famous Knapsack Problem has a somewhat similar statement. We are given a list of n items, some of which are to be loaded into a knapsack for a treacherous journey. Each item i has a value v_i and a weight w_i. Our knapsack can handle some maximum total capacity W and we are tasked with finding that subset of items (without repetition) which has maximum total value subject to this overall weight constraint. As an example, a knapsack with W = 30 and six items with values and weights as follows

Item    1    2    3    4    5    6
v_i    10   14   16   18   21   22
w_i     4    5    8    9   11   13

permits an optimal solution of total value 63.
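The value 63 (one optimal choice is items 1, 2, 4 and 5, with total weight 29) can be checked with the standard dynamic program over capacities; the following Python sketch is my own illustration, not part of the original notes:

def knapsack(values, weights, W):
    """0/1 knapsack by dynamic programming over capacities 0..W.
    Returns the maximum total value achievable within capacity W."""
    best = [0] * (W + 1)                  # best[w] = max value using total weight at most w
    for v, wt in zip(values, weights):
        for w in range(W, wt - 1, -1):    # descending, so each item is used at most once
            best[w] = max(best[w], best[w - wt] + v)
    return best[W]

print(knapsack([10, 14, 16, 18, 21, 22], [4, 5, 8, 9, 11, 13], 30))   # 63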
Exercises
Exercise 6.7.1. Homework problems go here, eventually.
Seven
Network Flows
Nov. 16, 2010
In the next class meetings, we will work through the basic theory of flows in networks. Such problems arise, for example, in traffic engineering, oil transport, distribution systems, telecommunications and even in computer graphics. The central model for all of these problems is a directed graph with various source nodes (supplying some commodity) and various sink nodes (each demanding that commodity) and flow capacities on the arcs of the network. Our goal here is to maximize the flow of some single commodity from a single source to a single sink. Once we have understood this problem, we move on to the challenge of achieving these same goals at minimum cost.
7.1 Statement of the problem
Let G = (V, A) be a directed graph (or network) with specified source node r ∈ V and designated sink node t ∈ V. We assume that each arc e ∈ A has a capacity b(e) which limits the amount of flow achievable across that arc. If e = (u, v) is an arc, we denote the head of e by v and write h(e) = v, and we denote the tail of e by u, writing t(e) = u.

A flow from r to t in G is an assignment of real numbers f : A → ℝ to the arcs of the network satisfying the conservation of flow law at each internal node of the network (i.e., excluding only the source and sink):

Σ_{h(e)=v} f(e) = Σ_{t(e)=v} f(e),   for all v ≠ r, t.

This "flow in equals flow out" rule is fundamental to many mathematical problems, not just in discrete math.

A flow f is a feasible flow from r to t if, for each arc e, we have 0 ≤ f(e) ≤ b(e). (For short, we will call this a feasible (r, t)-flow in G.) The value of flow f is

Value(f) := Σ_{t(e)=r} f(e) − Σ_{h(e)=r} f(e);

i.e., the net amount of flow out of the source. Given the conservation of flow at the internal nodes, we see that this is equal to the net amount of flow into the sink:

Value(f) = Σ_{h(e)=t} f(e) − Σ_{t(e)=t} f(e).

The first problem we address is, given a network with a set of arc capacities and designated source and sink, to find a feasible flow of maximum value.
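These definitions translate directly into code. The Python sketch below uses my own conventions (a network is a dict mapping arcs (u, v) to capacities, a flow a dict mapping arcs to values); the sample data is the small graph of Figure 7.1 below, carrying the value-one flow along r, a, b, t:

def is_feasible_flow(cap, flow, r, t):
    """Check the capacity constraints and conservation of flow at every internal node."""
    if any(not (0 <= flow[e] <= cap[e]) for e in cap):
        return False
    nodes = {u for e in cap for u in e}
    for v in nodes - {r, t}:
        inflow = sum(flow[e] for e in cap if e[1] == v)
        outflow = sum(flow[e] for e in cap if e[0] == v)
        if inflow != outflow:
            return False
    return True

def flow_value(cap, flow, r):
    """Net amount of flow out of the source r."""
    return sum(flow[e] for e in cap if e[0] == r) - sum(flow[e] for e in cap if e[1] == r)

cap  = {('r','a'): 1, ('r','b'): 1, ('a','b'): 1, ('a','t'): 1, ('b','t'): 1}
flow = {('r','a'): 1, ('r','b'): 0, ('a','b'): 1, ('a','t'): 0, ('b','t'): 1}
print(is_feasible_flow(cap, flow, 'r', 't'), flow_value(cap, flow, 'r'))   # True 1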
7.2 The Ford-Fulkerson algorithm
The famous algorithm of Ford and Fulkerson for finding a flow of maximum value in a graph is very simple. It relies on a path-finding (or reachability) subroutine, but applies this in each iteration to a carefully designed auxiliary graph.

We begin with a flow of zero on every arc; this clearly satisfies the conservation laws and, naturally assuming b(e) ≥ 0 for all e, it is feasible. In each iteration of the algorithm, we augment the flow by changing flow along some path from r to t (but not necessarily a directed path). The small example in Figure 7.1 demonstrates why flow cannot be augmented in a greedy fashion.

For this reason, algorithms for network flows work with a sequence of auxiliary networks which reflect not only the original digraph G together with its arc capacities b, but also some current solution f to the flow problem in G. An ordered pair (u, v) in such a network will, naturally, be called an arc if (u, v) is an arc in the original digraph G but will be called a reverse arc if instead (v, u) is an arc in G. Given the original network G = (V, A), we define the set of all reverse arcs

Ā = { ē = (v, u) | e = (u, v) ∈ A }

and note that it is possible for both (u, v) and (v, u) to belong to Ā.
Let G = (V, A) be given with arc capacities (b(e) : e ∈ A), source r and sink t. Suppose we have a feasible (r, t)-flow f in G. We define the auxiliary network G(f) as follows:

- G(f) has vertex set V, the vertex set of the original network G;
Figure 7.1: In graph G = (V, A) with V = {r, a, b, t} and A = {(r, a), (r, b), (a, b), (a, t), (b, t)}, we have all capacities b(e) = 1 (given as circled values in the diagram). Here, the initial choice of path r, a, b, t leads to a flow of value one which cannot be improved without backtracking.
- G(f) has arc set A⁺ ∪ A⁻ where

  A⁺ = { e ∈ A : f(e) < b(e) },   A⁻ = { ē ∈ Ā : f(e) > 0 },

  where ē ∈ Ā is the reverse of arc e ∈ A;

- G(f) has arc capacities b′(e) = b(e) − f(e) for e ∈ A⁺ and b′(ē) = f(e) for ē ∈ A⁻.
Numerous examples will appear below.


Ford-Fulkerson Algorithm for Max-Flow

Input: Network G = (V, A) with source r, sink t and arc capacities (b(e) : e ∈ A)

Output: A feasible flow f : A → ℝ from r to t in G of maximum value K

Description: Initially define G_0 = (V, A_0) to be the graph G. In particular, A_0 = A. Initialize f_0(e) = 0 for all arcs e ∈ A. Initialize b_0(e) = b(e) for each e ∈ A.

As long as the auxiliary network G_k contains a path from r to t, do the following:

- Choose an (r, t)-path P ⊆ A_k in digraph G_k. (We call this a flow-augmenting path.)

- Compute the smallest residual capacity along this path:

  ε := min { b_k(e) : e ∈ P }.
- Update the flow along path P: define

  f_{k+1}(e) = f_k(e) + ε   if e ∈ P ∩ A;
  f_{k+1}(e) = f_k(e) − ε   if ē ∈ P ∩ Ā;
  f_{k+1}(e) = f_k(e)       otherwise.

- Build the next auxiliary network G(f_{k+1}), which we abbreviate to G_{k+1} = (V, A_{k+1}) where the new arc set is A_{k+1} := A_{k+1}⁺ ∪ A_{k+1}⁻, defined by

  A_{k+1}⁺ := { (u, v) | e = (u, v) ∈ A with f_{k+1}(e) < b(e) }

  and

  A_{k+1}⁻ := { (v, u) | e = (u, v) ∈ A with f_{k+1}(e) > 0 },

  and capacities

  b_{k+1}(e) := b(e) − f_{k+1}(e) if e ∈ A_{k+1}⁺;   b_{k+1}(ē) := f_{k+1}(e) if ē ∈ A_{k+1}⁻.

When no flow-augmenting path is found in the auxiliary network G_k, the flow f = f_k is optimal: its value K = Σ_{t(e)=r} f(e) − Σ_{h(e)=r} f(e) is best possible.
That's all there is to it! Just a lot of path-finding and construction of auxiliary networks.
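For concreteness, here is a compact Python sketch of the scheme just described (my own illustration, not the notes' exact pseudocode): it repeatedly finds any r-t path in the auxiliary network by depth-first search and augments along it. The sample network is the graph of Figure 7.1.

def ford_fulkerson(cap, r, t):
    """cap: dict mapping arcs (u, v) to capacities.  Returns (flow dict, flow value)."""
    flow = {e: 0 for e in cap}

    def residual(u):
        """Arcs leaving u in G(f): forward arcs with slack and reverse arcs with positive flow."""
        for (a, b), c in cap.items():
            if a == u and flow[(a, b)] < c:
                yield b, (a, b), +1, c - flow[(a, b)]
            if b == u and flow[(a, b)] > 0:
                yield a, (a, b), -1, flow[(a, b)]

    def find_path():
        """Depth-first search for an r-t path in G(f); returns a list of (arc, sign, slack)."""
        stack, seen = [(r, [])], {r}
        while stack:
            u, path = stack.pop()
            if u == t:
                return path
            for v, arc, sign, slack in residual(u):
                if v not in seen:
                    seen.add(v)
                    stack.append((v, path + [(arc, sign, slack)]))
        return None

    while (path := find_path()) is not None:
        eps = min(slack for _, _, slack in path)   # smallest residual capacity on the path
        for arc, sign, _ in path:
            flow[arc] += sign * eps                # increase on forward arcs, decrease on reverse

    value = sum(flow[e] for e in cap if e[0] == r) - sum(flow[e] for e in cap if e[1] == r)
    return flow, value

cap = {('r','a'): 1, ('r','b'): 1, ('a','b'): 1, ('a','t'): 1, ('b','t'): 1}
print(ford_fulkerson(cap, 'r', 't')[1])   # 2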
Theorem 15. If all arc capacities are integers and 0 ≤ b(e) ≤ B for all e ∈ A, then the Ford-Fulkerson algorithm runs in O(mnB) time, where n is the number of nodes and m is the number of arcs.

The example in Figure 7.2 shows why the integer B appears in our complexity bound. A naive choice of flow-augmenting path (r, a, b, t and then r, b, a, t) leads to 10,000 iterations in the implementation of the Ford-Fulkerson algorithm for this network! But it still reaches optimality after a finite number of steps. Even this cannot be guaranteed for the case where some arc capacities are irrational. (See Papadimitriou and Steiglitz [6, p. 126].)
Figure 7.2: Poor choice of flow-augmenting path leads to exponentially many iterations of the Ford-Fulkerson algorithm.
7.3 The Max-Flow Min-Cut Theorem
Nov. 18, 2010
The Ford-Fulkerson (F-F) algorithm terminates when, at some iteration k, the auxiliary network G_k contains no path from source r to sink t.

Recall our concept of a cut in a graph or digraph G = (V, A). For a set S ⊆ V, we let S̄ = V − S denote the complement of S in V and

(S; S̄) = { e = (u, v) ∈ A | u ∈ S, v ∈ S̄ }.

The capacity of such a cut is the sum of the capacities of the arcs it contains:

b(S; S̄) = Σ_{e ∈ (S; S̄)} b(e).

For nodes r, t ∈ V, a cut (S; S̄) is called an (r, t)-cut if r ∈ S and t ∈ S̄ (so that the cut separates, or cuts off, r from t). An example is given in Figure 7.3. Given a flow f in G, we use f(S; S̄) to denote the net flow across the cut:

f(S; S̄) = Σ_{e ∈ (S; S̄)} f(e) − Σ_{e ∈ (S̄; S)} f(e).

Lemma 16. For any feasible (r, t)-flow in G, the net flow out of the source r is equal to the net flow across any (r, t)-cut:

Value(f) = f(S; S̄).
Figure 7.3: For S = {r, c, d}, we have S̄ = {a, b, t} and the cut (S; S̄) has capacity b(S; S̄) = 2 + 2 + 5 = 9.
Proof: Suppose (S; S̄) is an (r, t)-cut. Then r ∈ S but t ∉ S. Since flow is conserved at every element of S other than r, we have

Value(f) = Σ_{t(e)=r} f(e) − Σ_{h(e)=r} f(e)
         = Σ_{s ∈ S} ( Σ_{t(e)=s} f(e) − Σ_{h(e)=s} f(e) )
         = Σ_{e ∈ (S; S̄)} f(e) − Σ_{e ∈ (S̄; S)} f(e)
         = f(S; S̄),

where the double sum in the second line above includes two terms (with opposite signs) involving every arc having both head and tail in S and only one term involving arcs with one end in S (a term +f(e) when t(e) lies in S and a term −f(e) when h(e) ∈ S).
Lemma 17 (Weak Duality Theorem for Network Flows). The value of any feasible (r, t)-flow is bounded above by the capacity of any (r, t)-cut.

Proof: Let f be a feasible (r, t)-flow and let (S; S̄) be an (r, t)-cut. Since f(e) ≤ b(e) for each arc e ∈ (S; S̄), we have

Σ_{e ∈ (S; S̄)} f(e) ≤ b(S; S̄),

and as f(e) ≥ 0 for each arc e ∈ (S̄; S), we have

Σ_{e ∈ (S̄; S)} f(e) ≥ 0.

By the previous lemma, we have

Value(f) = f(S; S̄) ≤ b(S; S̄).
Corollary 18. If f is a feasible (r, t)-flow and (S; S̄) is an (r, t)-cut satisfying

Value(f) = b(S; S̄),

then f is a flow of maximum value and (S; S̄) is a cut of minimum capacity.

Corollary 19. If all arc capacities are integers, then the Ford-Fulkerson algorithm terminates after at most b({r}; V − {r}) iterations.

Proof: Since ε ≥ 1 each time, the value of the flow increases by at least one unit in each iteration. Since the capacity b(S; S̄) of any (r, t)-cut (S; S̄) is an upper bound on the value of any flow (Lemma 17), the capacity of the trivial (r, t)-cut ({r}; V − {r}) is a valid upper bound on the total number of iterations.
Theorem 20 (Max-Flow Min-Cut Theorem). Let G = (V, A) be a digraph with source r and sink t and integer arc capacities (b(e) : e ∈ A). The maximum value of a feasible (r, t)-flow in G is equal to the minimum capacity of an (r, t)-cut in G.

Proof: First observe that, for any (r, t)-flow f and any (r, t)-cut (S; S̄), we have Value(f) ≤ b(S; S̄) by Lemma 17.

Now we look at the specific flow produced by the Ford-Fulkerson algorithm and the cut (S; S̄) where S is the set of all nodes in G reachable from r in the final iteration of the algorithm. Let k be the smallest integer such that the auxiliary network G_k = (V, A_k) contains no (r, t)-path. In this last iteration of the algorithm, we find that sink t is not reachable from source r in G_k, and S ⊆ V is the set of all nodes which are reachable from r in this network. Then we find that, in G_k, the cut (S; S̄) is the empty set. Shifting our attention back to the original network G, we claim that b(S; S̄) = Value(f). Given what we established above, this would prove not only the theorem but also the correctness of the algorithm.

We know from above that

Value(f) = f(S; S̄)

since all the other arcs cancel out.
Figure 7.4: Network with two commodities. Resource i is supplied at source r_i and demanded at sink t_i for i = 1, 2. Arc capacities are overall upper limits on the sum of the two flows across an arc.
But our last auxiliary graph contains no arc with tail in S and head in S̄. So, by definition of the auxiliary network, f(e) = b(e) for every arc e in (S; S̄) and f(e) = 0 for every arc e in the opposite cut (S̄; S). In other words, we have

f(S; S̄) = b(S; S̄).

This gives us exactly what we're after:

Value(f) = f(S; S̄) = b(S; S̄)

and the proof is complete.
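The proof also tells us how to read off a minimum cut from a maximum flow. The Python sketch below (my own illustration, reusing the hypothetical ford_fulkerson function given earlier) computes S as the set of nodes reachable from r in the auxiliary network G(f) and checks that the cut capacity matches the flow value:

def min_cut(cap, flow, r):
    """S = nodes reachable from r in G(f); returns (S, capacity of the cut out of S)."""
    S, stack = {r}, [r]
    while stack:
        u = stack.pop()
        for (a, b), c in cap.items():
            if a == u and flow[(a, b)] < c and b not in S:   # forward arc with residual capacity
                S.add(b); stack.append(b)
            if b == u and flow[(a, b)] > 0 and a not in S:   # usable reverse arc
                S.add(a); stack.append(a)
    capacity = sum(c for (a, b), c in cap.items() if a in S and b not in S)
    return S, capacity

cap = {('r','a'): 1, ('r','b'): 1, ('a','b'): 1, ('a','t'): 1, ('b','t'): 1}
flow, value = ford_fulkerson(cap, 'r', 't')
S, cut_capacity = min_cut(cap, flow, 'r')
print(value == cut_capacity)   # True: the max flow value equals the min cut capacity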
7.4 The Menagerie
An important problem in several industries is the problem of multicommodity flow. As in the single-commodity case we are studying, we have a network with a given capacity for each arc. But instead of transporting one resource, we must transport several distinct resources over the same system. Intuitively, instead of shipping only water through the pipes, one may view this as shipping water, oil and beer all through the same pipes (with some miraculous way of separating them in the end), still subject to the same overall capacity constraints. For example, in the network of Figure 7.4, it is possible to ship three units of Resource 1 from source r_1 to sink t_1 and also possible to ship three units of Resource 2 from source r_2 to sink t_2. But it is not possible to achieve a flow value of three (or even two) for one resource while keeping the value at three for the other resource.

Multicommodity flow problems arise in many applications, including VLSI design and shipping.
Exercises
Eight
Dinic's Algorithm for Network Flows
Nov. 22, 2010
We learned last week that the Ford-Fulkerson algorithm, while terminating after only a finite number of iterations when all arc capacities are rational, can sometimes require an exponential number of iterations. We now consider another approach to the maximum flow problem. The Dinic algorithm makes much better use of each auxiliary network, finding for each one a sort of maximal layered flow using a greedy depth-first search method.
8.1 The Dinic algorithm for maximum flow

The Dinic algorithm is a clever variation on the Ford-Fulkerson algorithm which improves the running time to O(|V|² |A|). (Note that there is an Edmonds-Karp variant of the Ford-Fulkerson method which achieves a running time of O(|V| |A|²).) We again use the concept of an auxiliary graph based on the current flow f; here we call this graph simply G(f). If G = (V, A) with arc capacities (b(e) : e ∈ A) and f : A → ℝ is a feasible flow on G, then G(f) has vertex set V and arc set

A(f) = A⁺ ∪ A⁻

where

A⁺ = { e = (u, v) | e ∈ A, f(e) < b(e) }   with capacities b⁺(e) = b(e) − f(e)

and

A⁻ = { ē = (v, u) | e = (u, v) ∈ A, f(e) > 0 }   with capacities b⁻(ē) = f(e).
Dinic's Algorithm

Input: A directed graph G = (V, A) with arc capacities b : A → ℝ, source r and sink t.

Output: A flow f : A → ℝ from r to t of maximum value and an (r, t)-cut (S; S̄) of minimum capacity in G.

Description: Begin with f(e) = 0 for all arcs e in the network. Construct the auxiliary graph G(f) = G. Apply the breadth-first search algorithm to determine if there is a path from r to t in G(f).

If there is no such path, then the zero flow is optimal. Let S be the set of nodes reachable from r in G(f) and let S̄ = V − S. Then (S; S̄) is an (r, t)-cut of capacity zero, so it is a cut of minimum capacity.

If there is such a path, then the breadth-first search algorithm partitions the vertices of G(f) into layers L_0, ..., L_ℓ, where we call the integer ℓ the length of the layered network.

With these layers, repeat the following as long as an (r, t)-path exists in G(f).

- Initialize the augmenting flow, setting f⁺(e) = 0 for all arcs e in G(f).

- Use the depth-first search strategy to find a maximal flow from r to t in the auxiliary graph G(f). This flow augmentation procedure is described in detail in a separate paragraph below.

- Update the flow. For each arc e ∈ A, add f⁺(e) to f(e); subtract f⁺(ē) from f(e) whenever G(f) has the reverse arc ē with non-zero augmenting flow.

- Based on this new flow f, build the new auxiliary network G(f).

- Apply breadth-first search to partition V into layers L_0, L_1, L_2, .... If t is reachable from r, go back and repeat this list of steps. If not, let S be the set of nodes reachable from r and let S̄ contain the remaining nodes.

When this process terminates, f is a flow from r to t of maximum value and (S; S̄) is an (r, t)-cut of minimum capacity.
That's the whole thing. The key part is the flow augmentation procedure. Here's how it works.
Flow Augmentation Procedure

Description: We have network G(f) with arc capacities b(e) > 0 for each arc. Begin with f⁺(e) = 0 for all arcs e in G(f) and mark all arcs initially as unblocked. We will use a stack to traverse this graph and build our augmenting flow.

- Let the current node be initialized to r and start with an empty stack.

- From the current node, seek an unblocked arc e from its layer to the next layer. (I.e., if L_i contains the current node u, seek an unblocked arc e = (u, v) with v in L_{i+1}.) Push the arc e onto the stack and make v the new current node.

- If we reach t via this procedure, then the arcs on the stack form an (r, t)-path in G(f). Let ε be the minimum of the residual augmenting capacities b(e) − f⁺(e) among the arcs on the stack. Increase f⁺(e) by ε for each of these arcs and, among these, mark the arc(s) with f⁺(e) = b(e) as blocked. Now empty the stack and set the current node to r.

- If, instead, we reach a situation where our current node v is not t and yet there are no unblocked arcs leading to the next layer, then do the following: pop the top arc e = (u, v) off the stack; mark this arc as blocked; and update the current node to u.

The procedure terminates when all arcs going from r to layer L_1 are blocked.
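A compact implementation of this scheme is sketched below in Python (my own code, not the notes' exact procedure). Breadth-first search builds the layering, a depth-first scan with a per-node "current arc" pointer plays the role of blocking arcs, and phases repeat until t is unreachable; the small instance at the end is the graph of Figure 7.1.

from collections import deque

class Dinic:
    """A sketch of the Dinic scheme: BFS layering, then a blocking (maximal layered) flow."""
    def __init__(self, n):
        self.adj = [[] for _ in range(n)]          # adj[u] = list of edge indices leaving u
        self.to, self.cap = [], []                 # edge i: head node and residual capacity

    def add_arc(self, u, v, c):
        self.adj[u].append(len(self.to)); self.to.append(v); self.cap.append(c)
        self.adj[v].append(len(self.to)); self.to.append(u); self.cap.append(0)  # reverse arc

    def _layers(self, r, t):
        self.level = [-1] * len(self.adj); self.level[r] = 0
        q = deque([r])
        while q:
            u = q.popleft()
            for i in self.adj[u]:
                if self.cap[i] > 0 and self.level[self.to[i]] < 0:
                    self.level[self.to[i]] = self.level[u] + 1
                    q.append(self.to[i])
        return self.level[t] >= 0

    def _augment(self, u, t, limit):
        if u == t:
            return limit
        while self.it[u] < len(self.adj[u]):
            i = self.adj[u][self.it[u]]; v = self.to[i]
            if self.cap[i] > 0 and self.level[v] == self.level[u] + 1:
                eps = self._augment(v, t, min(limit, self.cap[i]))
                if eps > 0:
                    self.cap[i] -= eps; self.cap[i ^ 1] += eps   # push flow, credit reverse arc
                    return eps
            self.it[u] += 1                                      # arc is blocked; never retry it
        return 0

    def max_flow(self, r, t):
        value = 0
        while self._layers(r, t):                                # one phase per layered network
            self.it = [0] * len(self.adj)
            while (eps := self._augment(r, t, float('inf'))) > 0:
                value += eps
        return value

# The small example of Figure 7.1: nodes r=0, a=1, b=2, t=3, all capacities 1.
D = Dinic(4)
for u, v in [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]:
    D.add_arc(u, v, 1)
print(D.max_flow(0, 3))   # 2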
8.2 An example
We now apply the Dinic algorithm to a simple example. A secondary goal here is to show, by example, an efficient way to annotate the steps of the algorithm and, in particular, the Flow Augmentation Procedure.

Our example network is given in Figure 8.1. Our original network has six nodes and nine arcs. As we build our flows and our corresponding auxiliary networks, the vertex set will always remain V = {r, s, t, u, v, w}. Our initial flow f_0 has f_0(e) = 0 for all arcs e. So the zero-th auxiliary network is G(f_0) = G.

Phase 1: Choosing source node r as our root, we build a breadth-first search tree, thereby partitioning G into layers. The length of this layered network is ℓ = 3.

Our depth-first strategy is described in detail in the Flow Augmentation Procedure. Here we record its execution by listing all values taken by the current node variable. In this list, when we find a path from r to t, we enclose this set of vertices in a box.
Figure 8.1: Example for the Dinic algorithm. We aim to find an (r, t)-flow in G of maximum value.

Figure 8.2: Layered network in phase 1 of the Dinic algorithm. We now use a depth-first strategy to greedily build a maximal layered augmenting flow.
Record of Current node: r, s, t, r, s, v, w, v, s, r, u, v, u, r, with ε = 2. The augmenting flow has f⁺(e) = 2 for e = (r, s) and e = (s, t) and f⁺(e) = 0 elsewhere. Arcs become blocked by the DFS algorithm in the following order: (s, t), (v, w), (s, v), (r, s), (u, v), (r, u).

Now we obtain flow f_1 = f_0 + f⁺ and record this in the table below. Then we move on to the next phase.
Phase 2: Again with source node r as our root, we build a breadth-first search tree in auxiliary graph G(f_1), partitioning the network into layers L_0, L_1, L_2, L_3.
Figure 8.3: Layered network in phase 2 of the Dinic algorithm. Our depth-first scan will only find paths r-s-v-t and r-u-v-t.
Here is our summary of the Flow Augmentation Procedure for this layered network.

Record of Current node: r, s, v, t, r, u, v, w, v, t, r, u, r, with ε = 5 and then ε = 2. The augmenting flow f⁺ is given by the data BELOW and arcs become blocked in the following order: (r, s), (v, w), (u, v), (r, u).

As before, we obtain our next flow f_2 = f_1 + f⁺ and record this in the table below. Then we move on to Phase 3.
Phase 3: From root r, we build a breadth-first search tree in the latest auxiliary graph G(f_2), this time obtaining a layered network of length ℓ = 4.

We summarize the Flow Augmentation Procedure as follows:

Record of Current node: r, u, s, v, t, r, u, s, u, r, with ε = 2 (blocking e = (s, v)).

Now we obtain our next flow f_3 = f_2 + f⁺ and record this in the table. Then we move on to Phase 4. But in Phase 4, our Breadth-First Search routine finds that t is not reachable from r. So the algorithm terminates. The flow f = f_3 is a flow of maximum value and the cut (S; S̄) with S = {r, u, s}, consisting of the nodes reachable from r in Phase 4, is a minimum capacity cut.
Figure 8.4: Layered network in phase 3 of the Dinic algorithm. Our depth-first scan will find only one path: r-u-s-v-t.
8.3 Analysis of the Dinic algorithm
The previous section presented the Dinic algorithm for maximum flow through a network. This section contains some of the analysis of this algorithm, but not a full proof.

Correctness: The proof that the algorithm finds both a max flow and a min cut is the same as for the Ford-Fulkerson algorithm: if there is no (r, t)-path in the auxiliary network G(f), then the partition S, S̄ of vertices into reachable (from r) and unreachable gives a cut in G whose forward capacity is all used up. So the capacity of the cut (S; S̄) is equal to the value of the current flow from r to t. As proved earlier in class, this implies that both are optimal.

Limit on Number of Phases: The key difference between Dinic and our earlier approach is that the choice to stick with the same auxiliary network until the augmenting flow is maximal (as opposed to discarding this network as soon as a single augmenting path is found, as in Ford-Fulkerson) provides us with a strict limit on the number of phases (or auxiliary networks) that the algorithm must pass through.

Each phase begins with a breadth-first search routine which partitions the auxiliary network into layers L_0, L_1, L_2, .... The vertices in L_i are reachable from r by a directed path of length i, but by no shorter path. Our goal is to prove that t moves to a higher layer (i.e., a layer with a larger subscript) in each phase. Once we prove this, we can obtain a good bound on the running time for the Dinic algorithm. Indeed, each layer is non-empty, so there can be at most |V| layers. This proves that there can be no more than |V| − 1 phases.
Claim: If, in the k-th phase, we have t ∈ L_{ℓ_k}, then ℓ_1 < ℓ_2 < ℓ_3 < ··· until, in some phase, t is unreachable from r. In other words, the length of the auxiliary network always increases.

Proof: Our first step is to prove that the length never decreases. More generally, we prove that no node can be closer to r in the (k + 1)-st phase than it was in the k-th phase.
Let L_0, L_1, ..., L_{ℓ_k} be the partition of the auxiliary network into layers at the beginning of the k-th phase and let L′_0, L′_1, ..., L′_{ℓ_{k+1}} be the partition of the next auxiliary network into layers at the beginning of the (k + 1)-st phase.

For each v ∈ V we show that, if L_i is the layer containing v in the k-th phase and L′_j is the layer containing v in the (k + 1)-st phase, then j ≥ i. We prove this by induction on i.

Certainly the statement holds for i = 0 since L_0 = L′_0 = {r}. Now assume it holds for all vertices in L_0 ∪ L_1 ∪ ··· ∪ L_{i−1} and let v be a vertex in L_i. Now suppose that u_0, u_1, ..., u_{j−1}, u_j is a directed path in the (k + 1)-st auxiliary network from u_0 = r to u_j = v. To prove that j ≥ i, we simply consider u_{j−1}. If j were smaller than i, then u_{j−1} would belong to L′_{j−1} and our induction hypothesis would give us j − 1 ≥ i − 1.

To be continued ...
8.4 The Menagerie
Another strange-but-useful variant of network ows goes here.
Exercises
Exercise 8.4.1. Carry out the Dinic Algorithm on the network shown in Figure 8.5.
Exercise 8.4.2. Explain in words the execution of the Dinic Algorithm on a network which is simply a directed path G = (V, A) with V = {v_0, v_1, ..., v_n}, A = {(v_{i−1}, v_i) : 1 ≤ i ≤ n}, arc capacities b(e) = b_i for e = (v_{i−1}, v_i), source node v_r and sink node v_t, where we assume t > r.
Figure 8.5: Find a maximum flow using the Dinic algorithm.
Nine
The Minimum Cost Flow Problem
Nov. 29, 2010
We now delve into one of the main algorithms of this course. A wide range of applied problems can be cast in the language of min-cost flow problems.
9.1 Finding minimum cost flows
We now present an algorithm to find a feasible flow in a network of given value and minimum cost. The next few pages are based on lectures of Jack Edmonds.

Min Cost Flow Problem (MCFP)

Input: Network G = (V, A); a source node r ∈ V; a sink node t ∈ V; the demand D given by D_t = K and D_r = −K, with D_u = 0 for u ∈ V − {r, t}; capacities b = (b(e) : e ∈ A) ≥ 0 on arcs; and costs p = (p(e) : e ∈ A) ≥ 0 on arcs.

Output: A feasible (r, t)-flow f = (f(e) : e ∈ A) in G of value K such that the total cost p · f := Σ_{e∈A} p(e) f(e) is minimum.

Recall that a feasible flow f satisfies

0 ≤ f ≤ b   and   Σ_{h(e)=i} f(e) − Σ_{t(e)=i} f(e) = 0 for all i ∈ V − {r, t},

where h(e) is the head of arc e and t(e) is the tail. The value of a flow f is equal to the net amount of flow into the sink t: K = Σ_{h(e)=t} f(e) − Σ_{t(e)=t} f(e).

We now present the Primal-dual algorithm using shortest paths (PDSP Algorithm, for short) for the MCFP.
PDSP Algorithm for MCFP

At stage k of the algorithm, k = 0, 1, 2, ..., we have a feasible flow f_k = (f_k(e) : e ∈ A) of amount K_k at node t (and of course amount −K_k at node r); and we have a node numbering y_k = (y_k(u) : u ∈ V) (the dual variables).

We assume by way of induction that f_k and y_k satisfy Edmonds' magic number conditions:

Magic Number Conditions (MNC): for each e ∈ A,

f_k(e) < b(e) ⟹ p_k(e) ≥ 0,   and   f_k(e) > 0 ⟹ p_k(e) ≤ 0,

where p_k(e) := p(e) − y_k(h(e)) + y_k(t(e)).

By the Magic Number Theorem to be proved below, the MNC imply that f_k is a cheapest feasible flow (i.e. one that minimizes p · f) of amount K_k at node t.

If K_k = K, the amount demanded, stop. Otherwise, as follows, either find an obstruction to more flow than amount K_k and stop, or else find a feasible f_{k+1} of amount K_{k+1} > K_k and a node numbering y_{k+1} satisfying the MNC.
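The Magic Number Conditions are easy to check mechanically. The following Python sketch (my own illustration; the function and variable names are not from the notes) computes the reduced cost p_k(e) and verifies the MNC for a given flow and node numbering. The zero flow together with y ≡ 0 satisfies them whenever all costs are nonnegative, which is how the algorithm starts with K_0 = 0:

def reduced_cost(p, y, e):
    """p_k(e) = p(e) - y(h(e)) + y(t(e)) for an arc e = (u, v) with tail u and head v."""
    u, v = e
    return p[e] - y[v] + y[u]

def satisfies_mnc(cap, cost, flow, y):
    """Check the Magic Number Conditions for flow f and node numbering y."""
    for e in cap:
        pk = reduced_cost(cost, y, e)
        if flow[e] < cap[e] and pk < 0:
            return False
        if flow[e] > 0 and pk > 0:
            return False
    return True

cap  = {('r','a'): 2, ('a','t'): 2}
cost = {('r','a'): 3, ('a','t'): 1}
zero = {e: 0 for e in cap}
print(satisfies_mnc(cap, cost, zero, {'r': 0, 'a': 0, 't': 0}))   # True: a valid start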
Details of the (k + 1)-st iteration

Construct the (k + 1)-st auxiliary network G(f_k) (as usual, a modification of G determined by the current flow f_k):

G_k = (V, A_k),   A_k = A_k⁺ ∪ A_k⁻,

where A_k⁺ = { e ∈ A : f_k(e) < b(e) } and A_k⁻ = { ē ∈ Ā : f_k(e) > 0 } (i.e., reverse arcs).

Recall: For any e ∈ A, ē means a new arc such that h(ē) = t(e) and t(ē) = h(e), and Ā = { ē : e ∈ A }.

Recall that

p_k(e) = p(e) − y_k(h(e)) + y_k(t(e))   (9.1)

for e ∈ A. Let p_k(ē) = −p_k(e) for ē ∈ Ā. Notice that if we use p(ē) = −p(e), we have p_k(ē) = p(ē) − y_k(h(ē)) + y_k(t(ē)), which is the same formula as for p_k(e). That is, for every e ∈ A ∪ Ā, formula (9.1) holds.

Let P_k be any shortest (i.e. cheapest) directed path in G_k from r to t relative to arc costs p_k = (p_k(e) : e ∈ A_k); or, if there is no P_k, then there is an obstruction as in max flow - min cut, so stop.
For each node u ∈ V, let d_k(u) be the distance of node u from node r in network G_k relative to p_k = (p_k(e) : e ∈ A_k), i.e.

d_k(u) = the minimum cost of a directed path in G_k from r to u,

or d_k(u) = ∞ if there is no directed path in G_k from r to u.

Notice the MNC mean exactly the same as: p_k(e) ≥ 0 for each e ∈ A_k. This is important because it follows that, to find d_k = (d_k(u) : u ∈ V) and some P_k, we can use an algorithm for shortest paths which assumes that p_k ≥ 0.

Note. For e ∈ A such that 0 < f_k(e) < b(e), we have both e and ē in G_k, and we have p_k(e) = 0 and p_k(ē) = 0. This happens often and it helps to make it easier to find d_k and P_k. In fact we may get d_k(u) = 0, for every u ∈ V, for a sequence of stages k. In this case our algorithm acts exactly like the max flow algorithm.
For each arc e, let

f_{k+1}(e) = f_k(e) + ε   for e ∈ P_k ∩ A;
f_{k+1}(e) = f_k(e) − ε   for ē ∈ P_k ∩ Ā;
f_{k+1}(e) = f_k(e)       for other e ∈ A,

where ε is taken as large as possible such that f_{k+1} = (f_{k+1}(e) : e ∈ A) satisfies 0 ≤ f_{k+1} ≤ b and such that the value (denoted K_{k+1}) of f_{k+1} into node t satisfies K_{k+1} ≤ K. That is,

ε := min ( { b(e) − f_k(e) : e ∈ P_k ∩ A } ∪ { f_k(e) : ē ∈ P_k ∩ Ā } ∪ { K − K_k } ).

Note. As in the max flow algorithm, f_{k+1} will satisfy the demands D_u = 0 for u ∈ V − {r, t}, no matter how ε is chosen.
Let y_{k+1} = y_k + d_k (i.e. for each u ∈ V, let y_{k+1}(u) = y_k(u) + d_k(u)).

Using merely the facts that P_k is a shortest directed path from r to t, and that each d_k(u) is the distance from r to u in G_k relative to (p_k(e) : e ∈ A_k), prove that

d_k(h(e)) ≤ p_k(e) + d_k(t(e)) for each e ∈ A_k

and that

d_k(h(e)) = p_k(e) + d_k(t(e)) for each e ∈ P_k.
For every e ∈ A ∪ Ā (not only e ∈ A_k), we have

p_{k+1}(e) = p(e) − y_{k+1}(h(e)) + y_{k+1}(t(e)) = p(e) − y_k(h(e)) + y_k(t(e)) − d_k(h(e)) + d_k(t(e)).

Thus p_{k+1}(e) = p_k(e) − d_k(h(e)) + d_k(t(e)). Thus p_{k+1}(e) ≥ 0 for e ∈ A_k and p_{k+1}(e) = 0 for e ∈ P_k.

The reader should now consider carefully what the implications are for these conditions.
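To see the overall shape of the method in code, here is a simplified Python sketch (my own illustration) of the successive-shortest-path idea that the PDSP algorithm refines. For brevity it finds each cheapest path P_k by running Bellman-Ford directly on the auxiliary network's costs, rather than maintaining the node numbers y_k that would allow a nonnegative-cost routine such as Dijkstra's algorithm:

def min_cost_flow(n, arcs, r, t, K):
    """arcs: list of (tail, head, capacity, cost).  Returns (flow value reached, total cost)."""
    INF = float('inf')
    # Residual representation: paired edges, so edge i ^ 1 is the reverse of edge i.
    to, cap, cost, head = [], [], [], [[] for _ in range(n)]
    for u, v, b, p in arcs:
        head[u].append(len(to)); to.append(v); cap.append(b); cost.append(p)
        head[v].append(len(to)); to.append(u); cap.append(0); cost.append(-p)

    value = total = 0
    while value < K:
        dist, pred = [INF] * n, [None] * n      # cheapest-path labels from r (Bellman-Ford)
        dist[r] = 0
        for _ in range(n - 1):
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i in head[u]:
                    if cap[i] > 0 and dist[u] + cost[i] < dist[to[i]]:
                        dist[to[i]] = dist[u] + cost[i]
                        pred[to[i]] = i
        if dist[t] == INF:
            break                               # obstruction: no more flow can reach t
        eps, v = K - value, t                   # epsilon: bounded by remaining demand ...
        while v != r:
            eps = min(eps, cap[pred[v]])        # ... and by residual capacities on the path
            v = to[pred[v] ^ 1]
        v = t
        while v != r:                           # augment along the cheapest path
            cap[pred[v]] -= eps; cap[pred[v] ^ 1] += eps
            total += eps * cost[pred[v]]
            v = to[pred[v] ^ 1]
        value += eps
    return value, total

arcs = [(0, 1, 2, 1), (1, 2, 2, 1), (0, 2, 2, 3)]   # r=0, internal node a=1, t=2
print(min_cost_flow(3, arcs, 0, 2, 3))              # (3, 7)

On this tiny three-arc network, the first augmentation sends two units along the cheap path r, a, t and the second sends the remaining unit along the direct arc, for a total cost of 7.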
9.2 Linear programming and the Magic Number Theorem
In the previous section, we considered the Primal-Dual Shortest Path (PDSP) algorithm for finding a minimum cost flow in a network. Here we prove that the algorithm works by first considering the linear programming formulation of the problem.

Suppose G = (V, A) is a network (directed graph) with source r, sink t, arc capacities (b(e) : e ∈ A) and arc costs (p(e) : e ∈ A), and K ≥ 0 is a target value for our flow.

A feasible flow in G from r to t is a function f : A → ℝ satisfying 0 ≤ f(e) ≤ b(e) for all arcs e ∈ A and "flow in = flow out" at each internal node:

Σ_{h(e)=v} f(e) = Σ_{t(e)=v} f(e)   (v ≠ r, t).

We seek a minimum cost feasible flow f of value K. So this may be formulated as a linear programming problem:

min Σ_{e∈A} p(e) f(e)
subject to
Σ_{h(e)=r} f(e) − Σ_{t(e)=r} f(e) = −K
Σ_{h(e)=v} f(e) − Σ_{t(e)=v} f(e) = 0   for v ≠ r, t
Σ_{h(e)=t} f(e) − Σ_{t(e)=t} f(e) = K
0 ≤ f(e) ≤ b(e)   for e ∈ A.
But we find it convenient to eliminate the constraint corresponding to the source node r since it is a linear combination of the other |V| − 1 equality constraints. With this modification, the dual linear programming problem can be written as follows:

max K z(t) − Σ_{e∈A} b(e) w(e)
subject to
z(u) + w(e) + p(e) ≥ z(v)   for all e = (u, v) ∈ A
z(r) = 0
w(e) ≥ 0   for e ∈ A,

where we have re-inserted z(r) as a constant to make the expressions uniform. (Alternatively, we could have left the v = r constraint in the primal; then our dual would have one free variable, which we could eliminate by setting z(r) = 0, arriving at the same result.)
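Since the primal problem is just a linear program, an off-the-shelf LP solver can be used to check small instances. The sketch below (my own example data and code, assuming SciPy is available) solves the three-arc network used earlier for K = 3, dropping the source constraint exactly as described above:

from scipy.optimize import linprog

# Arcs (tail, head) with capacities b and costs p; source 'r', sink 't', internal node 'a'.
arcs = [('r', 'a'), ('a', 't'), ('r', 't')]
b = [2, 2, 2]
p = [1, 1, 3]
K = 3

# Equality constraints on the flow variables f = (f_ra, f_at, f_rt):
A_eq = [[1, -1, 0],    # node a: flow in minus flow out equals 0
        [0,  1, 1]]    # node t: flow in equals K (the redundant row for r is dropped)
b_eq = [0, K]

res = linprog(c=p, A_eq=A_eq, b_eq=b_eq, bounds=list(zip([0]*3, b)), method="highs")
print(res.x, res.fun)   # optimal flow [2. 2. 1.] with total cost 7.0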
9.3 The Menagerie
Exercises
Exercise 9.3.1. Homework problems go here, eventually.
Bibliography
[1] J.A. Bondy and U.S.R. Murty. Graph Theory with Applications. Elsevier/North-Holland, New York, 1976.

[2] V. Chvátal. Linear Programming. Freeman, New York, 1983.

[3] S. Even. Graph Algorithms. Computer Science Press, Rockville, Maryland, 1979.

[4] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York, 1979.

[5] C.H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, Mass., 1994.

[6] C.H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Dover, Mineola, 1998.

[7] M. Sipser. Introduction to the Theory of Computation. PWS Pub. Co., Boston, 1997.

[8] R.J. Vanderbei. Linear Programming: Foundations and Extensions (3rd ed.). Springer, New York, 2008.