
COMP3111/3811 Algorithms 2

Lecture 1-1
Introduction

COMP3111/3811 Algorithms
Two hour lectures
Wednesday (11-12am, Carslaw Theatre 275)
Thursday (11-12am, Carslaw Theatre 275)
One hour tutorial
http://www.it.usyd.edu.au/~comp3111.html
Contact: comp3111@it.usyd.edu.au
Dr. Seokhee Hong
Consultation: Thursday 12-1pm, Madsen G86A.

1. Course Aims
strategies for solving search and optimisation
problems in graphs will be presented, including
network flow methods.
The unit will also provide a survey of algorithmic
approaches for which traditional analyses are not
appropriate.
These will include randomisation, online algorithms
and competitive analysis, and parallel and distributed
algorithms.
Problems drawn from such areas as networks,
systems and databases will be used to illustrate these
algorithmic approaches; for these, the student will
design algorithms and analyse their correctness and efficiency.
An introduction to intractable problems, NP-hardness,
and heuristics will also be given.

2. Learning Outcomes
Be familiar with a collection of core
algorithms
Be fluent in algorithm design paradigms:
divide & conquer, greedy algorithms,
dynamic programming.
Be able to analyze the correctness and
runtime performance of a given algorithm
Be familiar with the inherent complexity
(lower bounds & intractability) of some
problems
Be familiar with advanced data structures
Be able to apply techniques in practical
problems

Course Aims
Primary aim: Develop thinking ability
problem solving skills
(algorithm design and application)
formal thinking
(proof techniques & analysis)
Secondary aim: have fun with algorithms

3. Assumed Knowledge
Assumed knowledge: MATH 2009.
Prerequisite: COMP (2111 or 2811) or 2002
or 2902) and MATH(1004 or 1904 or 2009
or 2011) and MATH (1005 or 1905).
Prohibition: May not be counted with
COMP (3811 or 3001 or 3901).

COMP2111 Algorithm 1
a formal introduction to the analysis of
algorithms.
Commonly used data structures such as
lists, stacks, queues, priority queues,
search trees, hash tables and graphs are
all analysed according to a notion of
asymptotic complexity.
Design principles such as the greedy
strategy, divide and conquer, and dynamic
programming are covered, as well as
efficient techniques for searching within
graphs.

4. Assessment

Assignments (20%): 2 assignments
due in Week 7 and Week 10; submit to your tutor.
Tutorial: 10%
Written exam (70%): closed book
You must get at least 40% on the exam to pass the unit.

5. School Policies

You are required to visit this URL and carefully read the policies on
Academic Honesty, and
Special consideration due to illness and misadventure:
http://www.it.usyd.edu.au/current_ugrad/handbook2003/policies.html#acadhonesty
No late submissions allowed for any reason whatsoever.

6. Topics Covered

Approximate schedule: topics are subject to change
Week 1: introduction
Week 2: sorting
Week 3: divide and conquer
Week 4: greedy algorithms
Week 5: dynamic programming
Week 6: graph algorithms
Week 7: graph algorithms
Week 8: network flow
Week 9: advanced data structures
Week 10: amortized time complexity
Week 11: randomized algorithms
Week 12: NP-completeness / approximation algorithms
Week 13: review

Textbook & References


Introduction to Algorithms,
second edition
by Cormen, Leiserson, Rivest
& Stein, MIT Press, 2001
The Design and Analysis of
Computer Algorithms, by Aho,
Hopcroft and Ullman
Algorithmics: Theory & Practice, by
Brassard & Bratley
Fundamentals of data structures in
C(C++), by Horowitz, Sahni and
Anderson-Freed

Solving a Computational Problem


Step 1. Problem definition &
specification
specify input, output and
constraints
Step 2. Algorithm design & analysis
devise a correct & efficient
algorithm
Step 3. Implementation (coding)
Step 4. Testing
Step 5. Verification

Algorithm Analysis

Asymptotic Notation
Recurrence Relations
Proof Techniques
Inherent Complexity

GOAL:
Know how to write a problem specification.
Know how to measure the efficiency of an algorithm.
Know the difference between upper and lower bounds for an algorithm.
Be able to prove the correctness of an algorithm.

Algorithm Design Paradigms


Divide and Conquer
Dynamic Programming
Greedy Methods
If time permits
Backtracking
Branch and bound

Greedy Algorithms

Huffman Codes
Activity selection
Minimum Spanning Trees
Shortest Paths

GOAL:
Know when to use greedy algorithms and their
essential characteristics.
Be able to prove the correctness of a greedy
algorithm in solving an optimization problem.
Understand where minimum spanning trees and
shortest path computations arise in practice.

Divide and Conquer

Merge sort
Quick sort
Closest pair
Selection
GOAL:
Know when the divide-and-conquer paradigm is
an appropriate one, and the general structure of
such algorithms.
Be able to characterize their complexity using
techniques for solving recurrences.
Memorize the common case solutions for
recurrence relations.

Dynamic Programming
Longest common subsequences
Matrix chain multiplication
Optimal binary search tree
GOAL
Know what problem characteristics make it
appropriate to use dynamic programming and
how it differs from divide-and-conquer.
Be able to move systematically from one to
the other.

Graph Algorithms

Basic Graph Algorithms: DFS


Minimum Spanning Trees
Shortest Paths
Network Flow
GOAL
Know how graphs arise, their definition and
implications.
Be able to use the adjacency matrix
representation of a graph and the edge list
representation appropriately.
Understand and be able to use the techniques
as seen in the basic graph algorithms.

Advanced Topics
Randomized algorithm
NP-completeness
Approximation algorithm
If time permits,
Parallel algorithm
Online algorithm

Advanced Data Structures


Heap structures: Fibonacci heaps
Search trees: Red-Black trees
Disjoint Set representations
GOAL
Know the fundamental ideas behind
maintaining balance in insertions/deletion.
Be able to use these ideas in other balanced
tree data structures.
Understand what they can represent and why
they are useful.
Know the special features of the data structure
listed above.

Tutorial
Each student must attend one tutorial per week, as
allocated by the University timetable system.
Tutorials commence in week 2.
You should have read and answered the "prework" before you come to the tutorial.
All tutorial activity is done in groups of up to 3
people.
It is important to be able to explain your ideas to
others and to contribute effectively to a
collaborative solution.
The tutor will discuss the solution and comment
on issues raised by the exercise.
You must submit a one-page result to your tutor (10%).

COMP3811 Advanced Course

This Advanced unit covers all the material of COMP3111, plus extra topics.
The two units share the same lectures, but have different tutorials and assessment.
There may be some lectures on advanced topics in the tutorials.

Communications

Check the course website regularly, at least once per week.
Lecture notes will be available on the website.
Note that the lecture notes will not contain everything.
You need to attend class to add your own notes.
Some topics are subject to change.
Tutorials will be available on the web: download & print them, and think!
To contact me, send an email or use the consultation hour.
Suggested Reading for Week 1: chapters 2, 3, 4.

COMP3111/3811 Algorithms 2

Lecture 1-2

Goal
Asymptotic notations
motivation
Θ, O, Ω, o, ω
formal definition
know the difference

Growth of Functions

Example: Insertion Sort

Running Time
tj : the number of times the while-loop test is executed for that value of j
Worst case: the longest running time for any input of size n
  tj = j, for j = 2, 3, ..., n  =>  O(n²)
Best case: already sorted
  tj = 1, for j = 2, 3, ..., n  =>  Θ(n)
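For reference, a minimal Python sketch of insertion sort matching this analysis (the while-loop test is the one counted by tj):

def insertion_sort(a):
    """In-place insertion sort: O(n^2) worst case, Theta(n) if already sorted."""
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # this while-loop test runs t_j times for this value of j
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a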

Asymptotic Notation
Θ, O, Ω, o, ω
Used to describe the running times of algorithms
Instead of using exact running times, we use
asymptotic notation
Simple characterization of the algorithm's
efficiency
Compare the relative performance of algorithms
We are concerned with how the running time of
an algorithm increases with the size of the input
in the limit

Θ-notation
For a given function g(n), we denote by Θ(g(n)) the set of functions
Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that
0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0 }
A function f(n) belongs to Θ(g(n)) if there exist such constants c1, c2:
f(n) is a member of Θ(g(n)).

We say g(n) is an asymptotically tight bound for f(n).

Example
To show that (1/2)n² - 3n = Θ(n²),
we need to determine positive constants c1, c2, n0
such that c1 n² ≤ (1/2)n² - 3n ≤ c2 n² for all n ≥ n0.
Dividing by n²: c1 ≤ 1/2 - 3/n ≤ c2
1/2 - 3/n ≤ c2 : holds for n ≥ 1 by choosing c2 ≥ 1/2
c1 ≤ 1/2 - 3/n : holds for n ≥ 7 by choosing c1 ≤ 1/14

Example
To show that 6n³ ≠ Θ(n²), we use contradiction.
Suppose that c2, n0 exist such that 6n³ ≤ c2 n² for all n ≥ n0.
Then n ≤ c2/6: this cannot hold for arbitrarily large n, since c2 is a constant.

We can verify (1/2)n² - 3n = Θ(n²) by choosing
c1 = 1/14, c2 = 1/2, n0 = 7.

Example
To show that 3logn + loglogn = Θ(logn),
we need to determine positive constants c1, c2, n0
such that c1 logn ≤ 3logn + loglogn ≤ c2 logn for all n ≥ n0.
3logn + loglogn ≤ c2 logn : holds for n ≥ 2 by choosing c2 ≥ 4
c1 logn ≤ 3logn + loglogn : holds for n ≥ 2 by choosing c1 ≤ 3

Example
Informally, we can throw away lower-order terms and ignore the leading
coefficient of the highest-order term, since they are insignificant for large n.
10n² - 3n = Θ(n²)
f(n) = an² + bn + c = Θ(n²), where a, b, c are constants and a > 0
To compare orders of growth, look at the leading term.

O-notation
For a given function g(n), we denote by O(g(n)) the set of functions
O(g(n)) = {f(n): there exist positive constants c and n0 such that
0 ≤ f(n) ≤ c g(n) for all n ≥ n0 }
f(n) is a member of the set O(g(n)).
We say g(n) is an asymptotic upper bound for f(n).

To prove 7n - 2 = O(n),
we need to show that there exist positive constants c and n0
such that 7n - 2 ≤ cn for all n ≥ n0.
a possible choice: n0 = 1, c = 7

Example
f(n) = an² + bn + c = O(n²), where a, b, c are constants and a > 0

f(n) = an + b = O(n²), where a, b are constants and a > 0,
by choosing c = a + |b|, n0 = 1

Example
To prove 20n³ + 10nlogn + 5 = O(n³),
we need to show that there exist positive constants c and n0 such that
20n³ + 10nlogn + 5 ≤ c n³ for all n ≥ n0.
a possible choice: n0 = 1, c = 35

Ω-notation
For a given function g(n), we denote by Ω(g(n)) the set of functions
Ω(g(n)) = {f(n): there exist positive constants c and n0 such that
0 ≤ c g(n) ≤ f(n) for all n ≥ n0 }
f(n) is a member of the set Ω(g(n)).

We say g(n) is an asymptotic lower bound for f(n).

Example
To prove 3logn + loglogn = Ω(logn),
we need to show that there exist positive constants c and n0 such that
0 ≤ c g(n) ≤ f(n) for all n ≥ n0.
c logn ≤ 3logn + loglogn : holds for n ≥ 2 by choosing c ≤ 3
f(n) = an² + bn + c = Ω(n²), where a, b, c are constants and a > 0

Relations Between Θ, Ω, O
For any two functions g(n) and f(n),
f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)),
i.e., Θ(g(n)) = O(g(n)) ∩ Ω(g(n)).
"The running time is O(f(n))" means the worst case is O(f(n)).
"The running time is Ω(f(n))" means the best case is Ω(f(n)).

Asymptotic Notation in Equations

Can be used to replace expressions containing lower-order terms
For example,
4n³ + 3n² + 2n + 1 = 4n³ + 3n² + Θ(n)
                   = 4n³ + Θ(n²) = Θ(n³)
In equations, Θ(g(n)) always stands for an anonymous function f(n) ∈ Θ(g(n))
In the example above, Θ(n²) stands for 3n² + 2n + 1

o-notation
For a given function g(n), we denote by o(g(n)) the set of functions
o(g(n)) = {f(n): for any positive constant c > 0, there exists a
constant n0 > 0 such that 0 ≤ f(n) < c g(n) for all n ≥ n0 }
f(n) becomes insignificant relative to g(n) as n approaches infinity:
lim (n -> ∞) [f(n) / g(n)] = 0

We say g(n) is an upper bound for f(n) that is not asymptotically tight.
Example: 2n = o(n²), but 2n² ≠ o(n²)

ω-notation
For a given function g(n), we denote by ω(g(n)) the set of functions
ω(g(n)) = {f(n): for any positive constant c > 0, there exists a
constant n0 > 0 such that 0 ≤ c g(n) < f(n) for all n ≥ n0 }
f(n) becomes arbitrarily large relative to g(n) as n approaches infinity:
lim (n -> ∞) [f(n) / g(n)] = ∞
We say g(n) is a lower bound for f(n) that is not asymptotically tight.
Example: n²/2 = ω(n), but n²/2 ≠ ω(n²)

Example
f(n) = 12n² + 6n = o(n³):
for any positive constant c > 0, there exists a constant n0 > 0 such that
0 ≤ f(n) < c g(n) for all n ≥ n0.
Choose n0 = (12+6)/c; then for n ≥ n0, c n³ ≥ 12n² + 6n² > 12n² + 6n.

f(n) = 12n² + 6n = ω(n):
for any positive constant c > 0, there exists a constant n0 > 0 such that
0 ≤ c g(n) < f(n) for all n ≥ n0.
Choose n0 = c/12; then for n ≥ n0, 12n² + 6n > 12n² ≥ cn.

Comparison of Functions
Comparing f and g is analogous to comparing real numbers a and b:
f(n) = O(g(n))   is like   a ≤ b
f(n) = Ω(g(n))   is like   a ≥ b
f(n) = Θ(g(n))   is like   a = b
f(n) = o(g(n))   is like   a < b
f(n) = ω(g(n))   is like   a > b

Conclusion
Asymptotic notations: Θ, O, Ω, o, ω
definition
difference
usage
Your homework
Answer tutorial questions
Attend tutorial (Tue 2-3pm, Wed 12-1pm)
Suggested reading: chapter 6,7,8

COMP3111/3811 Algorithms 2

Lecture 2-1
Sorting
1. Sorting algorithms
2. Lower bound for sorting problem
3. Linear time sorting algorithms

1. Sorting Algorithms
1.1 Insertion sort: Θ(n²)
1.2 Merge sort: Θ(n lg n)
1.3 Heap sort: O(n lg n)
1.4 Quick sort:
Θ(n²) worst case, Θ(n lg n) average case
Bubble sort
Shell sort
Fun sort

[Knuth] sorting & searching

1.1 Insertion Sort

Θ(n²) worst case, Θ(n) best case
Very tight bound: hidden constant is small
Works fast for small-size input
In place

1.2 Merge Sort

Divide and conquer technique
Θ(n lg n) worst case
Asymptotically optimal
Not in place

1.3 Heap Sort


Use heap data structure
Heap: priority queue
extract maximum element in O(lg n) time
Insert new element in O(lg n) time
Leads to O(n lg n) sorting algorithm:
Build heap: constructed in Θ(n) time
Repeatedly extract largest remaining
element (constructing sorted list from
back to front)
In-place, Not Stable
optimal

1.4 Quick Sort


Divide and conquer method
Importance of partitioning algorithm
Choose a pivot element
In place, not stable
Θ(n²) worst case, Θ(n lg n) average case
Very popular algorithm
Works fast in practice

Worst Case Partitioning
Worst case: every partition splits into subproblems of size n-1 and 0, so the
subproblem sizes are n-1, n-2, n-3, ..., 2, 1 and each partition step costs Θ(n),
giving Θ(n²) overall.
[Figure: recursion trees contrasting a bad split (n-1, 1) with a good split
((n-1)/2, (n-1)/2).]

Average-Case Splitting
The combination of good and bad splits still results in T(n) = O(n log n).

2. Lower Bounds for Sorting

Comparison sort: a sorting method that determines the sorted order based only on
comparisons between input elements must take Ω(n lg n) comparisons in the worst
case. Thus, merge sort and heapsort are asymptotically optimal.
Other sorting methods (counting sort, radix sort, bucket sort) use operations
other than comparisons to determine the order and can do better: they run in
linear time.

Decision Tree Model
internal node: annotated by ai : aj for some i and j in the range 1 ≤ i, j ≤ n
leaf node: annotated by a permutation π(i)
A path from the root to a leaf corresponds to one execution of the sorting
algorithm (e.g., the leaf labelled ⟨a3, a2, a1⟩).

Lower Bound for Worst Case

Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case
(equivalently: any decision tree that sorts n elements has height Ω(n lg n)).
Proof: There are n! permutations of n elements, each permutation representing a
distinct sorted order, so the tree must have at least n! leaves. Since a binary
tree of height h has no more than 2^h leaves, we have
n! ≤ 2^h, so h ≥ lg(n!).
By Stirling's approximation, n! > (n/e)^n, so
h ≥ lg(n!) ≥ lg((n/e)^n) = n lg n - n lg e = Ω(n lg n).

3. Linear Time Sorting Algorithms
3.1 Counting Sort
3.2 Radix Sort
3.3 Bucket Sort

3.1 Counting Sort

Assumption: each of the n input elements is an integer in the range 0 to k.
For each element x, determine the number of elements less than x
(and use it as an index into the output array).
A: input array, B: output array, C: temporary array of size k+1
[Worked example: A = <2, 5, 3, 0, 2, 3, 0, 3>, k = 5.
 Step 1: C[i] = number of elements equal to i: C = <2, 0, 2, 3, 0, 1>.
 Step 2: C[i] = number of elements ≤ i: C = <2, 2, 4, 7, 7, 8>.
 Step 3: each A[j] is placed at position C[A[j]] of B, and C[A[j]] is decremented.]

Counting-Sort (A, B, k)
1. for i ← 0 to k
2.     do C[i] ← 0                           Θ(k)
3. for j ← 1 to length[A]
4.     do C[A[j]] ← C[A[j]] + 1              Θ(n)   (C[i] = # of elements equal to i)
5. for i ← 1 to k
6.     do C[i] ← C[i] + C[i-1]               Θ(k)   (C[i] = # of elements ≤ i)
7. for j ← length[A] downto 1
8.     do B[C[A[j]]] ← A[j]
9.        C[A[j]] ← C[A[j]] - 1              Θ(n)
Total: Θ(n+k). If k = O(n), then the worst case is Θ(n).
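A minimal Python sketch of the same procedure (0-indexed arrays instead of the 1-indexed pseudocode above):

def counting_sort(a, k):
    """Stable counting sort for integers in the range 0..k; Theta(n + k)."""
    n = len(a)
    c = [0] * (k + 1)
    for x in a:                      # C[i] = number of elements equal to i
        c[x] += 1
    for i in range(1, k + 1):        # C[i] = number of elements <= i
        c[i] += c[i - 1]
    b = [0] * n
    for x in reversed(a):            # place from the back to keep the sort stable
        b[c[x] - 1] = x
        c[x] -= 1
    return b

# counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5) -> [0, 0, 2, 2, 3, 3, 3, 5]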

3.2 Radix Sort

It was used by card-sorting machines to read punch cards.
The key idea is to sort on the least significant digit first, and then on the
remaining digits in sequential order. The sorting method used to sort each digit
must be stable (e.g., counting sort).
If we started with the most significant digit, we'd need extra storage.

[Counting sort worked example continued: the output array B is filled by scanning
A from the back; each A[j] is written to B[C[A[j]]] and C[A[j]] is then decremented.]

Algorithm Analysis (counting sort)
The overall time is Θ(n+k).
When we have k = O(n), the worst case is Θ(n).
Stable, but not in place.
No comparisons are made: it uses the actual values of the elements to index into an array.

Example: sorting 3-digit numbers with one stable pass per digit, least significant first:
  input:               392 356 446 928 631 532 495
  after pass 1 (1s):   631 392 532 495 356 446 928
  after pass 2 (10s):  928 631 532 446 356 392 495
  after pass 3 (100s): 356 392 446 495 532 631 928

Radix-Sort (A, d)
1. for i ← 1 to d
2.     do use a stable sort (counting sort) to sort array A on digit i      Θ(n+k) per pass
Total: Θ(d(n+k))

Algorithm Analysis
Each pass over the n d-digit numbers takes time Θ(n+k).
There are d passes, so the total time for radix sort is Θ(d(n+k)).
When d is a constant and k = O(n), radix sort runs in linear time.
Radix sort, if it uses counting sort as the intermediate stable sort, does not sort in place.
If primary memory storage is an issue, quicksort or other in-place sorting methods
may be preferable.
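A minimal Python sketch of LSD radix sort; each pass is a stable counting-sort-style distribution on one digit (assumes non-negative integers and a chosen base):

def radix_sort(a, d, base=10):
    """LSD radix sort of non-negative integers with at most d digits; Theta(d(n + base))."""
    for i in range(d):                       # least significant digit first
        buckets = [[] for _ in range(base)]
        for x in a:                          # stable pass on digit i
            buckets[(x // base**i) % base].append(x)
        a = [x for b in buckets for x in b]
    return a

# radix_sort([392, 356, 446, 928, 631, 532, 495], 3)
# -> [356, 392, 446, 495, 532, 631, 928]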

3.3 Bucket Sort


Counting sort and radix sort are good for integers.
For floating point numbers, try bucket sort or other
comparison-based methods.
Assumption: input is generated by a random process
that distributes the elements uniformly over interval
[0,1)   (0 ≤ A[i] < 1).
(Other ranges can be scaled accordingly.)
The basic idea is to divide the interval into n equal-sized
subintervals, or buckets, then insert the n input
numbers into the buckets.
The elements in each bucket are then sorted; lists from
all buckets are concatenated in sequential order to
generate output.

B: auxiliary array of linked lists (buckets)
[Figure: with n = 10, bucket i holds values in the interval [i/10, (i+1)/10).]

Bucket-Sort (A)
1. n ← length[A]
2. for i ← 1 to n
3.     do insert A[i] into list B[⌊n·A[i]⌋]
4. for i ← 0 to n-1
5.     do sort list B[i] with insertion sort
6. Concatenate the lists B[0], B[1], ..., B[n-1] together in order
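A minimal Python sketch of the same algorithm (it uses Python's built-in sort on each bucket where the pseudocode above uses insertion sort):

def bucket_sort(a):
    """Bucket sort for inputs uniformly distributed in [0, 1); Theta(n) expected time."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)    # bucket i holds values in [i/n, (i+1)/n)
    out = []
    for b in buckets:
        out.extend(sorted(b))            # sort each bucket, then concatenate in order
    return out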

Algorithm Analysis
Total running time: Θ(n) expected time.
All lines except line 5 take O(n) time in the worst case.
Let ni be a random variable denoting the number of elements placed in bucket B[i].
Total running time: T(n) = Θ(n) + Σ (i = 0 to n-1) O(ni²)
E[T(n)] = Θ(n) + Σ (i = 0 to n-1) O(E[ni²])
E[ni²] = 2 - 1/n (see textbook p. 176)
Linear expected time: Θ(n) + n · O(2 - 1/n) = Θ(n)

Conclusion
1. Various sorting algorithms
2. Lower bound for the sorting problem
3. Linear time sorting algorithms
Reading: chapters 4, 9

COMP3111/3811 Algorithms

Lecture 2-2

Order Statistics
Recurrences

1. Minimum (A)

1. min ← A[1]
2. for i ← 2 to length[A]
3.     do if min > A[i]
4.          then min ← A[i]
5. return min

T(n) = Θ(n) for Minimum(A) or Maximum(A)

Simutaneous Minimum & Maximum


1. One solution: 2n-1 comparisons
2. at most 3⌊n/2⌋ comparisons are sufficient
process (compare) elements in pairs
compare the smaller to the minimum
compare the larger to the maximum
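A minimal Python sketch of the pairing idea, using about 3⌊n/2⌋ comparisons:

def min_max(a):
    """Simultaneous minimum and maximum with roughly 3*floor(n/2) comparisons."""
    n = len(a)
    if n % 2:                        # odd n: initialise both to the first element
        lo = hi = a[0]
        start = 1
    else:                            # even n: compare the first pair
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for i in range(start, n - 1, 2):
        small, big = (a[i], a[i + 1]) if a[i] < a[i + 1] else (a[i + 1], a[i])
        if small < lo: lo = small    # compare the smaller of the pair to the minimum
        if big > hi: hi = big        # compare the larger of the pair to the maximum
    return lo, hi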

Algorithm SELECT: determine the ith smallest element

1 Divide the n elements of the input array into ⌈n/5⌉ groups of 5 elements each
  (at most one group is made up of the remaining n mod 5 elements).
2 Find the median of each group by insertion sort
  (take the lower median if the group has an even number of elements).
3 Use SELECT recursively to find the median-of-medians x of the ⌈n/5⌉ medians found in step 2.
4 Partition the input array around the median-of-medians x using a modified PARTITION.
  (Let k be such that x is the kth smallest element.)
5 If i = k then return x.
  If i < k then use SELECT recursively to find the ith smallest element on the low side.
  If i > k then use SELECT recursively to find the (i-k)th smallest element on the high side.
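A minimal Python sketch of SELECT (median of medians); it assumes distinct elements, as in the problem statement, and uses a 1-indexed i:

def select(a, i):
    """Return the i-th smallest element (1-indexed) of list a; worst-case O(n)."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # steps 1-2: lower medians of groups of 5 (sorted() stands in for insertion sort)
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # step 3: median of the medians
    x = select(medians, (len(medians) + 1) // 2)
    # step 4: partition around x (distinct elements assumed)
    low = [y for y in a if y < x]
    high = [y for y in a if y > x]
    k = len(low) + 1                 # x is the k-th smallest
    # step 5: recurse on the appropriate side
    if i == k:
        return x
    elif i < k:
        return select(low, i)
    else:
        return select(high, i - k)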

(1) Order Statistic

The ith order statistic of a set of n elements is the ith smallest element.
Minimum: the first order statistic (i = 1)
Maximum: the nth order statistic (i = n)
Median: the halfway point of the set
n is odd: only one median
n is even: two medians (lower & upper)
The selection problem can be specified as:
Input: a set A of n distinct numbers and a number i, with 1 ≤ i ≤ n
Output: the element x ∈ A that is larger than exactly i-1 other elements of A

2. Selection in
Worst-Case Linear Time
It finds the desired element(s) by
recursively partitioning the input array
Basic idea: to generate a good split
when array is partitioned using a
modified partition algorithm of quick
sort

Pictorial Analysis of Select


Algorithm Analysis
At least half of the medians found in step 2 are greater than or equal to the
median-of-medians x.
Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are greater
than x, except for the group that has fewer than 5 elements and the group
containing x itself.
The number of elements > x is at least 3(⌈(1/2)⌈n/5⌉⌉ - 2) ≥ 3n/10 - 6.
Similarly, the number of elements < x is at least 3n/10 - 6.
In the worst case, SELECT is called recursively on at most 7n/10 + 6 elements.

(2) Recurrences
Describe functions in terms of their values
on smaller inputs
Arise from Recursive call
Arise from Divide and Conquer

T(n) = Θ(1)                          if n ≤ c
T(n) = a T(n/b) + D(n) + C(n)        otherwise
Solution Methods
1. Substitution Method
2. Recursion Tree Method
3. Master Method

Example
To solve: T(n) = 2T(n/2) + n
Guess: T(n) = O(n lg n)
We need to prove T(n) ≤ c n lg n for an appropriate choice of the constant c > 0.
Assume that this bound holds for ⌊n/2⌋, i.e.
T(⌊n/2⌋) ≤ c ⌊n/2⌋ lg(⌊n/2⌋)
Substituting into the recurrence:
T(n) ≤ 2c ⌊n/2⌋ lg(⌊n/2⌋) + n
     ≤ c n lg(n/2) + n
     = c n lg n - c n lg 2 + n
     = c n lg n - cn + n
     ≤ c n lg n : true as long as c ≥ 1

Solving Recurrence
Steps 1, 2 and 4 take O(n) time. Step 3 takes time T(⌈n/5⌉) and step 5 takes time
at most T(7n/10 + 6).
T(n) ≤ Θ(1),                                  if n ≤ 140
T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n),        if n > 140
Substitution Method: guess T(n) ≤ cn
T(n) ≤ c ⌈n/5⌉ + c (7n/10 + 6) + an
     ≤ cn/5 + c + 7cn/10 + 6c + an
     = 9cn/10 + 7c + an
     = cn + (-cn/10 + 7c + an)
     ≤ cn   if (-cn/10 + 7c + an) ≤ 0
This holds when c ≥ 10a(n/(n-70)) for n > 70.
Because we assume n > 140, we have n/(n-70) ≤ 2.
So choosing c ≥ 20a will satisfy (-cn/10 + 7c + an) ≤ 0.

1. Substitution Method
Step 1. Guess the form of solution
Step 2. Use mathematical induction to
show that the solution works.
Works well when the solution is easy to
guess
No general way to guess the correct
solution
Can be used to establish an upper or
lower bound on the recurrence

Making a Good Guess


1. Use recursion tree
2. Guess a similar solution to one that you have
seen before
T(n) = 3T(n/3 + 5) + n
T(n) = 3T(n/3) + n
When n is large, the difference between n/3
and (n/3 + 5) is insignificant
3. to prove loose upper and lower bounds on the
recurrence and then reduce the range of
uncertainty.
Start with T(n) = Ω(n) & T(n) = O(n²),
then narrow down to T(n) = Θ(n log n)


Subtleties
When the math doesn't quite work out in the induction, try to revise your guess by
subtracting a lower-order term. For example:
T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + 1
We guess T(n) = O(n) and try to show T(n) ≤ cn for an appropriate choice of the
constant c > 0:
T(n) ≤ c⌊n/2⌋ + c⌈n/2⌉ + 1 = cn + 1, which does not imply T(n) ≤ cn.
New guess: T(n) ≤ cn - b, where b ≥ 0
T(n) ≤ (c⌊n/2⌋ - b) + (c⌈n/2⌉ - b) + 1
     = cn - 2b + 1
     ≤ cn - b   as long as b ≥ 1

Avoiding Pitfalls
Be careful not to misuse asymptotic notation. For example:
We can falsely "prove" T(n) = O(n) by guessing T(n) ≤ cn for T(n) = 2T(⌊n/2⌋) + n:
T(n) ≤ 2c⌊n/2⌋ + n
     ≤ cn + n
     = O(n)   -- Wrong!
We haven't proved the exact form of the induction hypothesis: T(n) ≤ cn.

Changing Variables
Use algebraic manipulation to turn an
unknown recurrence into one similar to what
you have seen before.
Consider T(n) = 2T(n^(1/2)) + lg n
Rename m = lg n and we have
T(2^m) = 2T(2^(m/2)) + m
Set S(m) = T(2^m) and we have
S(m) = 2S(m/2) + m, so S(m) = O(m lg m)
Changing back from S(m) to T(n), we have
T(n) = T(2^m) = S(m) = O(m lg m) = O(lg n lg lg n)

Example: Merge Sort


Merge-Sort (A, p, r)
INPUT: a sequence of n numbers stored in
array A
OUTPUT: an ordered sequence of n numbers
1. if p < r
2.     then q ← ⌊(p+r)/2⌋
3.          Merge-Sort (A, p, q)
4.          Merge-Sort (A, q+1, r)
5.          Merge (A, p, q, r)
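A minimal Python sketch of merge sort (returning a new sorted list rather than sorting A[p..r] in place):

def merge_sort(a):
    """Theta(n lg n) merge sort; not in place (uses auxiliary lists)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge step: Theta(n)
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]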

2. Recursion Tree Method


Recursion tree
Each node represents the cost of a single
subproblem
We sum the costs within each level to
obtain a set of per-level costs
Then we sum all the per-level costs to
determine the total cost of all levels of the
recursion
Best used to generate a good guess, then
verified by the substitution method

Analysis
Divide: computing the middle takes Θ(1)
Conquer: solving 2 subproblems takes 2T(n/2)
Combine: merging n elements takes Θ(n)
Total:
T(n) = Θ(1)              if n = 1
T(n) = 2T(n/2) + Θ(n)    if n > 1
T(n) = Θ(n lg n) (by the master theorem)

T(n) = 2T(n/2) + cn


Recurrence Tree

Example 1: T(n) = 2T(n/2) + cn (merge sort).
The root costs cn (the cost of merging two sublists); each level of the tree also
sums to cn (2·c(n/2) = cn, 4·c(n/4) = cn, ...). The tree has lg n + 1 levels, so
Total: cn lg n + cn, i.e. T(n) = Θ(n lg n).

Example 2: T(n) = 3T(n/4) + Θ(n²), written as T(n) = 3T(n/4) + cn².
Subproblem size for a node at depth i: n/4^i.
The tree has log4(n) + 1 levels (0, 1, ..., log4 n), since n/4^i = 1 when i = log4 n.
Number of nodes at depth i: 3^i, each with cost c(n/4^i)², so the cost at level i is
3^i · c(n/4^i)² = (3/16)^i cn².
Last level (depth log4 n): 3^(log4 n) = n^(log4 3) nodes, each with cost T(1), for a
total cost of Θ(n^(log4 3)).
Summing the per-level costs gives a decreasing geometric series dominated by the
root: Total = O(n²).

COMP3111/3811 Algorithms

Lecture 3-1
Recurrences

3. Master Method
Provides a cookbook method for solving recurrences of the form
T(n) = a T(n/b) + f(n)
Assumptions:
a ≥ 1 and b > 1 are constants
f(n) is an asymptotically positive function
T(n) is defined for nonnegative integers
We interpret n/b to mean either ⌊n/b⌋ or ⌈n/b⌉

The Master Theorem

With the recurrence T(n) = a T(n/b) + f(n), T(n) can be bounded asymptotically as follows:
1. If f(n) = O(n^(log_b a - ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).
2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n).
3. If f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a f(n/b) ≤ c f(n) for
   some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

Intuition: compare f(n) and n^(log_b a); the solution is determined by the larger function.
1. n^(log_b a) is larger: T(n) = Θ(n^(log_b a)).
2. same size: T(n) = Θ(n^(log_b a) lg n) = Θ(f(n) lg n).
3. f(n) is larger: T(n) = Θ(f(n)).

Notes:
1. f(n) must be polynomially smaller than n^(log_b a): f(n) must be asymptotically
   smaller than n^(log_b a) by a factor of n^ε for some constant ε > 0.
3. f(n) must be polynomially larger than n^(log_b a): f(n) must be asymptotically
   larger, and must satisfy a f(n/b) ≤ c f(n) (the regularity condition, satisfied
   by most polynomially bounded functions).

Simplified Master Theorem

Let a ≥ 1 and b > 1 be constants and let T(n) be the recurrence
T(n) = a T(n/b) + c n^k
defined for n ≥ 0.
1. If a > b^k, then T(n) = Θ(n^(log_b a)).
2. If a = b^k, then T(n) = Θ(n^k lg n).
3. If a < b^k, then T(n) = Θ(n^k).
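A small Python helper illustrating the three cases of the simplified theorem (the function name and its string output are just for illustration):

import math

def simplified_master(a, b, k):
    """Asymptotic solution of T(n) = a T(n/b) + c n^k, with a >= 1 and b > 1."""
    if a > b ** k:
        return "Theta(n^{:.3f})".format(math.log(a, b))   # n^(log_b a)
    elif a == b ** k:
        return "Theta(n^{} lg n)".format(k)
    else:
        return "Theta(n^{})".format(k)

# simplified_master(2, 2, 1)  -> "Theta(n^1 lg n)"   (merge sort)
# simplified_master(16, 4, 1) -> "Theta(n^2.000)"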

Example
T(n) = 3T(n/4) + n lg n
a = 3, b = 4, thus n^(log_b a) = n^(log_4 3) = O(n^0.793)
f(n) = n lg n = Ω(n^(log_4 3 + ε)) for ε ≈ 0.2: case 3.
Therefore, T(n) = Θ(f(n)) = Θ(n lg n)

T(n) = 2T(n/2) + n lg n
a = 2, b = 2, f(n) = n lg n, and n^(log_b a) = n^(log_2 2) = n
f(n) is asymptotically larger than n^(log_b a), but not polynomially larger
(the ratio f(n)/n^(log_b a) = n lg n / n = lg n is asymptotically less than n^ε
for any positive ε).
Thus, the Master Theorem doesn't apply here.

Example
T(n) = 16T(n/4) + n
a = 16, b = 4, thus n^(log_b a) = n^(log_4 16) = Θ(n²)
f(n) = n = O(n^(log_4 16 - ε)) with ε = 1: case 1.
Therefore, T(n) = Θ(n^(log_b a)) = Θ(n²)

T(n) = T(3n/7) + 1
a = 1, b = 7/3, and n^(log_b a) = n^(log_{7/3} 1) = n^0 = 1
f(n) = 1 = Θ(n^(log_b a)): case 2.
Therefore, T(n) = Θ(n^(log_b a) lg n) = Θ(lg n)

SELECT Algorithm Analysis

At least half of the medians found in step 2 are greater than or equal to the
median-of-medians x.
Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are greater
than x, except for the group that has fewer than 5 elements and the group
containing x itself.
The number of elements > x is at least 3(⌈(1/2)⌈n/5⌉⌉ - 2) ≥ 3n/10 - 6.
Similarly, the number of elements < x is at least 3n/10 - 6.
In the worst case, SELECT is called recursively on at most 7n/10 + 6 elements.

Solving Recurrence
Steps 1, 2 and 4 take O(n) time. Step 3 takes time T(⌈n/5⌉) and step 5 takes time
at most T(7n/10 + 6).
T(n) ≤ Θ(1),                                  if n ≤ 140
T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n),        if n > 140
Substitution Method: guess T(n) ≤ cn
T(n) ≤ c ⌈n/5⌉ + c (7n/10 + 6) + an
     ≤ cn/5 + c + 7cn/10 + 6c + an
     = 9cn/10 + 7c + an
     = cn + (-cn/10 + 7c + an)
     ≤ cn   if (-cn/10 + 7c + an) ≤ 0
This holds when c ≥ 10a(n/(n-70)) for n > 70.
Because we assume n > 140, we have n/(n-70) ≤ 2.
So choosing c ≥ 20a will satisfy (-cn/10 + 7c + an) ≤ 0.

COMP3111/3811 Algorithms 2

Lecture 3-2
Divide & Conquer
Closest Pair
Tree Drawing


1. Closest Pair

We consider the problem of finding the closest pair of points in a set Q of n ≥ 2
points (closest: Euclidean distance).
Application: traffic control (detect collisions).
Two points in Q may be coincident: then δ = 0.
Brute-force algorithm: compare all pairs, O(n²).
D&C algorithm: O(nlogn).

D&C Algorithm

Input: a subset P of Q, and arrays X and Y
X: the points of P sorted by x-coordinate
Y: the points of P sorted by y-coordinate
If |P| ≤ 3 then use the brute-force method.
If |P| > 3 then
Divide:
find a vertical line l that bisects P into PL and PR s.t. |PL| = ⌈|P|/2⌉ and |PR| = ⌊|P|/2⌋.
X (Y) is divided into XL (YL) and XR (YR).
Some points may lie on the line.

D&C Algorithm
Conquer:
L = CP (PL, XL, YL)
R = CP (PR, XR, YR)
= min (L, R)
Combine:
The closest pair:
either
or a pair of points one in PL and the other in
PR.
If there is a pair of points with distance less than
, they must reside in the 2-wide vertical strip
centered at line l.

Combine Algorithm
1. Y = Y all points not in the 2 -wide
vertical strip (Y : sorted)
2. For each point p in Y,
find points p in Y that are within
units of p (Only 7 points in Y that
follow p need to be considered).
compute distance from p to p
keep minimum
3. Return min( , )

Correctness
Why do we only need to consider the 7 points following each point p in array Y'?
Suppose the closest pair is pL in PL and pR in PR with distance δ' < δ:
they lie within a δ × 2δ rectangle centered at line l.
pL (pR) must be on or to the left (right) of l and less than δ units away from it.
pL and pR are within δ units of each other vertically.
At most 8 points can lie in the δ × 2δ rectangle centered at line l:
at most 4 points of PL (PR) can reside in the δ × δ square forming the left (right)
half of the rectangle (as points in PL are at least δ apart).


Implementation
Ensure XL, XR, YL, YR, Y' are sorted properly when they are passed to recursive calls.
We wish to form a sorted subset of a sorted array.
Dividing X into XL and XR is easy.

Running time
Presorting before the first recursive call: O(nlogn)
T(n) = O(1)             if n ≤ 3
T(n) = 2T(n/2) + O(n)   if n > 3
Therefore T(n) = O(nlogn)
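A minimal Python sketch of the O(n log n) algorithm; for simplicity it assumes the points are distinct so the y-sorted array can be split with a set membership test:

import math

def closest_pair(points):
    """Divide-and-conquer closest-pair distance; points is a list of distinct (x, y) tuples."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def cp(X, Y):
        n = len(X)
        if n <= 3:                                   # brute force on small inputs
            return min(dist(X[i], X[j]) for i in range(n) for j in range(i + 1, n))
        mid = n // 2
        XL, XR = X[:mid], X[mid:]
        left = set(XL)
        YL = [p for p in Y if p in left]             # split Y, preserving y-order
        YR = [p for p in Y if p not in left]
        delta = min(cp(XL, YL), cp(XR, YR))          # conquer
        lx = X[mid][0]                               # dividing vertical line
        strip = [p for p in Y if abs(p[0] - lx) < delta]
        for i, p in enumerate(strip):                # only the next 7 points matter
            for q in strip[i + 1:i + 8]:
                delta = min(delta, dist(p, q))
        return delta

    X = sorted(points)                               # presort by x
    Y = sorted(points, key=lambda p: p[1])           # presort by y
    return cp(X, Y)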

Implementation
Given a subset P and the array Y, partition P into PL and PR:
we need to form YL and YR, sorted by y-coordinate, in linear time.
Method: the opposite of the MERGE procedure in merge sort: split one sorted array
into two sorted arrays.
Examine the points in Y in order.
If a point Y[i] is in PL, append it to the end of YL; otherwise append it to the end of YR.

2. Tree Drawing Algorithm

Layered Drawing
rooted (binary) tree T
assign layers according to depth -> y-coordinates: y(v) = depth of v
How do we compute the x-coordinates?

Simple Method
an inorder traversal gives a layered grid drawing
two flaws:
too wide: width n-1
a parent vertex is not centered with respect to its children

D &C Algorithm
Divide
recursively apply the algorithm to draw the
left and right subtrees of T.
Conquer
move the drawings of subtrees until their
horizontal distance equals 2.
place the root r vertically one level above
and horizontally half way between its
children.
If there is only one child, place the root at
horizontal distance 1 from the child.


Implementation
Two traversals
step 1. postorder traversal
For each vertex v, recursively computes the
horizontal displacement of the left & right
children of v with respect to v.
step 2. preorder traversal
Computes x-coordinates of the vertices by
accumulating the displacements on the
path from each vertex to the root.

Postorder Traversal
Processing v: scan the right contour of the left subtree and the left contour of
the right subtree.
Accumulate the displacements of the vertices on the left & right contours.
Keep the maximum cumulative displacement at any depth.
Construction of the contour list for v with left subtree T' and right subtree T'':
case 1: height(T') = height(T'')
case 2: height(T') < height(T'')
case 3: height(T') > height(T'')

Postorder Traversal
It is necessary to travel down the contours of the two subtrees T' and T'' only as
far as the height of the subtree of lesser height.
The time spent processing vertex v in the postorder traversal is therefore
proportional to the minimum of the heights of T' and T''.
The sum of these minima over all vertices is no more than the number of vertices
of the tree; hence the traversal runs in linear time.

Postorder Traversal
left (right) contour: the sequence
of vertices vi such that vi is the
leftmost (rightmost)vertex of T
with depth i
In conquer step, we need to
follow the right contour of the left
subtree and the left contour of the
right subtree
After we process v, we maintain
the left & right contour of the
subtree rooted at v as a linked list

Postorder Traversal
L(T) (R(T)): the left (right) contour of T; let T' and T'' be the left and right
subtrees of v.
case 1: height(T') = height(T'')
  L(T) = v + L(T')
  R(T) = v + R(T'')
case 2: height(T') < height(T'')
  R(T) = v + R(T'')
  L(T) = v + L(T') + {the part of L(T'') starting from w}
    h: the height of T'
    w: the vertex on L(T'') whose depth is h+1
case 3: height(T') > height(T''): similar to case 2

COMP3111/3811 Algorithms 2

Lecture 4-1

Greedy Algorithms
(Chapter 16)


Motivation

Solving optimization problems:
Many real-world problems are optimization problems in that they attempt to find an
optimal solution among many possible candidate solutions.
A set of choices must be made in order to arrive at an optimal solution, subject
to some constraints.
There may be several solutions that achieve the optimal value.
Two common techniques:
Dynamic Programming (global)
Greedy Algorithms (local)

Greedy Algorithms

A greedy algorithm constructs the solution by making the choice that looks best at
the moment (a local optimum).
The hope: a locally optimal choice will lead to a globally optimal solution.

Greedy Algorithms
Greedy algorithms normally consist of:
Set (list) of candidates
Two other sets: chosen & rejected
Function that checks whether a particular set of
candidates provides a solution to the problem
Function that checks if a set of candidates is feasible
Selection function indicating at any time which is the
most promising candidate not yet used
Objective function giving the value of a solution; this is
the function we are trying to optimize

Generic Greedy Algorithm

// C is the set of all candidates
1. S ← ∅            // S is the set in which we construct the solution
2. while not solution(S) and C ≠ ∅ do
3.     x ← an element of C maximizing select(x)
4.     C ← C \ {x}
5.     if feasible(S ∪ {x}) then S ← S ∪ {x}
6. if solution(S) then return S
7. else return "there are no solutions"

For some problems, it works.
Note: greedy algorithms do not always find an optimal solution.
Dynamic programming can be overkill; greedy algorithms tend to be easier to code.

Step by Step Approach


Initially, the set of chosen candidates is empty
At each step, add to this set the best remaining
candidate; this is guided by selection function.
If enlarged set is no longer feasible, then remove the
candidate just added; else it stays.
Each time the set of chosen candidates is enlarged,
check whether the current set now constitutes a
solution to the problem.
When a greedy algorithm works correctly, the first
solution found in this way is always optimal.

Analysis
The selection function is usually based on the objective
function; they may be identical.
But, often there are several plausible ones.
At every step, the procedure chooses the best morsel it
can swallow, without worrying about the future.
It never changes its mind: once a candidate is included
in the solution, it is there for good; once a candidate is
excluded, it's never considered again.
Greedy algorithms do NOT always yield optimal
solutions, but for some problems they do.


Elements of Greedy Strategy


Greedy-choice property: A global optimal
solution can be arrived at by making locally
optimal (greedy) choices
Optimal substructure: an optimal solution to the
problem contains within it optimal solutions to
sub-problems
Be able to demonstrate that if A is an optimal
solution containing s1, then the set A' = A - {s1}
is an optimal solution to a smaller problem
without s1.

Example: when it works

Make change for the amount x = 67 (cents).
Use q = ⌊x/25⌋ = 2 quarters.
The remainder = x - 25q = 17, for which we use d = ⌊17/10⌋ = 1 dime.
Then the remainder = 17 - 10d = 7, so we use n = ⌊7/5⌋ = 1 nickel.
Finally, the remainder = 7 - 5n = 2, which requires p = ⌊2/1⌋ = 2 pennies.
The total number of coins used = q + d + n + p = 6.
Note: the above algorithm is optimal in that it uses the fewest number of coins
among all possible ways to make change for a given amount. However, this depends
on the denominations of the US currency system.

2. The Knapsack Problem


A thief breaks into a museum:
Fabulous paintings, sculptures, and jewels are
everywhere.
The thief has a good eye for the value of these
objects, and knows that each will fetch
hundreds or thousands of dollars on the
clandestine art collectors market.
But, the thief has only brought a single
knapsack to the scene of the robbery, and
can take away only what he can carry.
What items should the thief take to maximize
the haul?

1. Coin Changing Problem


Give change using the minimum number of coins
we often encounter at a cash register: receiving the
fewest numbers of coins to make change after
paying the bill for a purchase.
Example: the purchase is worth $5.27, how many
coins and what coins does a cash register return
after paying a $6 bill?
Greedy solution: always use larger coins first
For a given amount ($0.73), use as many quarters
($0.25) as possible without exceeding the amount.
Use as many dimes ($.10) for the remainder, then
use as many nickels ($.05) as possible. Finally, use
the pennies ($.01) for the rest.

Example: when it doesn't work


Example1: try a system that uses
denominations of 1-cent, 6-cent, and 7-cent
coins, and try to make change for x = 18 cents.
Greedy: 2 7-cents and 4 1-cents: 6 coins.
Optimal solution: use 3 6-cent coins.
Example 2: 16c due, coins of 12, 10, 5 and 1.
Greedy: 12, 1, 1, 1, 1: 5 coins
Optimal: 10, 5, 1: 3 coins
Moral: need to prove that greedy produces
optimal
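A minimal Python sketch of greedy change-making, illustrating both the US-coin case where it is optimal and the 1/6/7-cent case from Example 1 where it is not (it assumes a 1-unit coin is available so change always completes):

def greedy_change(amount, denominations):
    """Greedy change-making: repeatedly take the largest coin that still fits."""
    coins = []
    for d in sorted(denominations, reverse=True):
        while amount >= d:
            amount -= d
            coins.append(d)
    return coins

# greedy_change(67, [25, 10, 5, 1]) -> [25, 25, 10, 5, 1, 1]   (6 coins, optimal)
# greedy_change(18, [7, 6, 1])      -> [7, 7, 1, 1, 1, 1]      (6 coins; optimal is 6+6+6)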

More formally
the 0-1 knapsack problem:
The thief must choose among n items, where the ith
item worth vi dollars and weighs wi pounds
Carrying at most W pounds, maximize value
Note: assume vi, wi, and W are all integers
0-1: each item must be taken or left in entirety
the fractional knapsack problem:
Thief can take fractions of items
Think of items in 0-1 problem as gold ingots, in
fractional problem as buckets of gold dust


The Fractional Knapsack Problem:

Given n objects, each with a weight wi and a value vi, and given a knapsack of
total capacity W. The problem is to pack the knapsack with these objects in order
to maximize the total value of the objects packed, without exceeding the
knapsack's capacity.
More formally, let xi denote the fraction of object i to be included in the
knapsack, 0 ≤ xi ≤ 1, for 1 ≤ i ≤ n. The problem is to find values for the xi
such that
Σ (i=1 to n) xi wi ≤ W and Σ (i=1 to n) xi vi is maximized.
Note that we may assume Σ (i=1 to n) wi > W, because otherwise we would choose
xi = 1 for each i, which would be an obvious optimal solution.

The Optimal Knapsack Algorithm:

Input: an integer n, positive values wi and vi for 1 ≤ i ≤ n, and another positive
value W.
Output: n values xi such that 0 ≤ xi ≤ 1,
Σ (i=1 to n) xi wi ≤ W and Σ (i=1 to n) xi vi is maximized.

There seem to be 3 obvious greedy strategies:
(Max value) Sort the objects from the highest value to the lowest, then pick them
in that order.
(Min weight) Sort the objects from the lowest weight to the highest, then pick
them in that order.
(Max value/weight ratio) Sort the objects based on the value-to-weight ratios,
from the highest to the lowest, then select.
Example: Given n = 5 objects and a knapsack capacity W = 100.

Algorithm (of time complexity O(n lg n)):

(1) Sort the n objects from large to small based on the ratios vi/wi. We assume
    the arrays w[1..n] and v[1..n] store the respective weights and values after
    sorting.
(2) Initialize array x[1..n] to zeros.
(3) weight = 0; i = 1
(4) while (i ≤ n and weight < W) do
    (4.1) if weight + w[i] ≤ W then x[i] = 1
    (4.2) else x[i] = (W - weight) / w[i]
    (4.3) weight = weight + x[i] * w[i]
    (4.4) i++

          object i:   1     2     3     4     5
          w:          10    20    30    40    50
          v:          20    30    66    40    60
          v/w:        2.0   1.5   2.2   1.0   1.2

Strategy       x1   x2   x3   x4   x5    value
Max vi         0    0    1    0.5  1     146
Min wi         1    1    1    1    0     156
Max vi/wi      1    1    1    0    0.8   164
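A minimal Python sketch of the greedy algorithm above; on the example data it returns the value 164 of the max value/weight strategy:

def fractional_knapsack(w, v, W):
    """Greedy fractional knapsack: take items in decreasing value/weight ratio.
    Returns (total value, list of fractions x[i]); O(n lg n) for the sort."""
    n = len(w)
    order = sorted(range(n), key=lambda i: v[i] / w[i], reverse=True)
    x = [0.0] * n
    weight = value = 0.0
    for i in order:
        if weight + w[i] <= W:            # the whole object fits
            x[i] = 1.0
        else:                             # take only the fraction that still fits
            x[i] = (W - weight) / w[i]
        weight += x[i] * w[i]
        value += x[i] * v[i]
        if weight >= W:
            break
    return value, x

# fractional_knapsack([10, 20, 30, 40, 50], [20, 30, 66, 40, 60], 100)
# -> (164.0, [1.0, 1.0, 1.0, 0.0, 0.8])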

Optimal Substructure
Both variations exhibit optimal substructure
An optimal solution to the problem contains
optimal solutions to subproblems.
To show this for the 0-1 problem, consider the
most valuable load weighing at most W pounds
If we remove item j from the load, what do we
know about the remaining load?
A: remainder must be the most valuable load
weighing at most W - wj that thief could take
from museum, excluding item j

Solving The Knapsack Problem


The optimal solution to the fractional knapsack
problem can be found with a greedy algorithm
The optimal solution to the 0-1 knapsack problem
cannot be found with the same greedy strategy
Greedy strategy: take in order of dollars/pound
Example: 3 items weighing 10, 20, and 30 pounds,
knapsack can hold 50 pounds
Value: $60, $100, $120.


3. Minimizing Time in the System

A single server (a processor, a gas pump, a cashier in a bank, and so on) has n
customers to serve.
The service time required by each customer is known in advance: customer i will
take time ti, 1 ≤ i ≤ n.
We want to minimize
T = Σ (i = 1 to n) (time in system for customer i)
Example: we have 3 customers with t1 = 5, t2 = 10, t3 = 3.
Serving them in the order 3, 1, 2 gives T = 3 + (3+5) + (3+5+10) = 29, which is optimal.
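A minimal Python sketch of the greedy rule (serve in non-decreasing service time):

def schedule(times):
    """Greedy: serve customers in non-decreasing service time to minimize total time in system."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    total, elapsed = 0, 0
    for i in order:
        elapsed += times[i]      # customer i leaves the system at time `elapsed`
        total += elapsed
    return order, total

# schedule([5, 10, 3]) -> ([2, 0, 1], 29), matching the example above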

Optimality Proof
[Theorem] This greedy algorithm is always optimal.
(Proof) Let I = (i1, ..., in) be any permutation of the integers {1, 2, ..., n}.
If customers are served in the order I, the total time passed in the system by all
the customers is
T = ti1 + (ti1 + ti2) + (ti1 + ti2 + ti3) + ...
  = n·ti1 + (n-1)·ti2 + (n-2)·ti3 + ...
  = Σ (k = 1 to n) (n - k + 1) tik
[Table: the service order I = (i1, ..., ia, ..., ib, ..., in) before the exchange
of ia and ib, and (i1, ..., ib, ..., ia, ..., in) after, with the corresponding
service durations.]
By the exchange argument below, we can therefore improve any schedule in which a
customer is served before someone else who requires less service.
The only schedules that remain are those obtained by putting the customers in
non-decreasing order of service time.
All such schedules are equivalent and thus they are all optimal.

Designing Algorithm
Imagine an algorithm that builds the optimal schedule
step by step.
Suppose after serving customer i1, , im we add
customer j. The increase in T at this stage is
ti1 + + tim + tj
To minimize this increase, we need only to minimize tj.
This suggests a simple greedy algorithm: at each
step, add to the end of schedule the customer
requiring the least service among those remaining.

Suppose now that I is such that we can find 2 integers a and b with a < b and
tia > tib;
in other words, the ath customer is served before the bth customer even though a
needs more service time than b.
If we exchange the positions of these two customers, we obtain a new order of
service I'.
This order is preferable because
T(I)  = (n-a+1)tia + (n-b+1)tib + Σ (k=1 to n, k≠a,b) (n - k + 1) tik
T(I') = (n-a+1)tib + (n-b+1)tia + Σ (k=1 to n, k≠a,b) (n - k + 1) tik
T(I) - T(I') = (n-a+1)(tia - tib) + (n-b+1)(tib - tia)
             = (b-a)(tia - tib) > 0

COMP3111/3811 Algorithms 2

Lecture 4- 2

Greedy Algorithms
(Chapter 16)


Optimal Substructure
An optimal solution to the problem contains optimal
solutions to the subproblems
An optimal solution to the subproblem
+ the greedy choice
= an optimal solution to the original problem

Greedy Choice Property


Global optimal solution can be found by
making local optimal choice (greedy choice)
Examine a global optimal solution to some
subproblem.
Then show that the solution can be modified
to use the greedy choice, resulting in one
similar but smaller subproblem

4. Activity-Selection Problem
Example: get your money's worth out of a carnival
Buy a wristband that lets you onto any ride
Lots of rides, each starting and ending at different times
Your goal: ride as many rides as possible
Formally:
Given a set S of n activities
si = start time of activity i
fi = finish time of activity i
Find a max-size subset A of compatible activities
[Figure: 6 activities drawn as overlapping intervals on a timeline]

Assume (wlog) that f1 ≤ f2 ≤ ... ≤ fn

A Greedy Algorithm
So actual algorithm is simple:
Sort the activities by finish time
Schedule the first activity
Then schedule the next activity in
sorted list which starts after previous
activity finishes
Repeat until no more activities
Intuition is even more simple:
Always pick the shortest ride available
at the time
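A minimal Python sketch of the algorithm just described (sort by finish time, then greedily take each activity that starts after the previous one finishes):

def activity_selection(s, f):
    """Greedy activity selection; returns the indices of a maximum-size compatible subset."""
    if not s:
        return []
    order = sorted(range(len(s)), key=lambda i: f[i])   # sort by finish time
    chosen = [order[0]]                                  # schedule the first activity
    last_finish = f[order[0]]
    for i in order[1:]:
        if s[i] >= last_finish:                          # starts after the previous one finishes
            chosen.append(i)
            last_finish = f[i]
    return chosen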

Optimal Substructure
Let k be the minimum activity in A (i.e., the one with the earliest finish time).
Then A - {k} is an optimal solution to S' = {i ∈ S: si ≥ fk}.
Once activity #1 is selected, the problem reduces to finding an optimal solution
for activity selection over the activities in S that are compatible with #1.
Proof: if we could find an optimal solution B to S' with |B| > |A - {k}|, then
B ∪ {k} would be compatible and |B ∪ {k}| > |A|, a contradiction.

Greedy Choice Property

The activity selection problem also exhibits the greedy choice property:
a locally optimal choice leads to a globally optimal solution.
[Theorem] If S is an activity selection problem sorted by finish time, then there
exists an optimal solution A ⊆ S such that {1} ⊆ A.
Sketch of proof: if there is an optimal solution B that does not contain {1}, we
can always replace the first activity in B with {1}. The result has the same
number of activities, and is thus also optimal.


(Activity selection can be implemented as a recursive greedy algorithm or an
iterative greedy algorithm; it runs in O(n) time if the activities are already
sorted by finish time.)

5. Optimal 2-way Merge Patterns

Suppose there are 3 sorted lists L1, L2, and L3, of sizes 30, 20, and 10,
respectively, which need to be merged into a combined sorted list, but we can
merge only two at a time.
We intend to find an optimal merge pattern which minimizes the total number of
comparisons.

Example
Merge L1 & L2: 30 + 20 = 50 comparisons, resulting in a list of size 50.
Then merge that list & L3: 50 + 10 = 60 comparisons.
Total number of comparisons: 50 + 60 = 110.
Alternatively, merge L2 & L3: 20 + 10 = 30 comparisons, giving a list of size 30;
then merge that list with L1: 30 + 30 = 60 comparisons.
Total number of comparisons: 30 + 60 = 90.

Binary Merge Trees: depict the merge patterns using a binary tree, built from the
leaf nodes (the initial lists) towards the root, in which each merge of two nodes
creates a parent node whose size is the sum of the sizes of the two children.
Merge cost = sum of all weighted external path lengths.
[Figure: merging L1 and L2, then L3 gives cost = 30*2 + 20*2 + 10*1 = 110;
 merging L2 and L3, then L1 gives cost = 30*1 + 20*2 + 10*2 = 90.]

Optimal Binary Merge Tree Algorithm:

Input: n leaf nodes, each with an integer size, n ≥ 2.
Output: a binary tree with the given leaf nodes which has minimum total weighted
external path length.
Algorithm:
(1) create a min-heap T[1..n] based on the n initial sizes.
(2) while (the heap size ≥ 2) do
    (2.1) delete from the heap the two smallest values, call them a and b; create
          a parent node of size a + b for the nodes corresponding to these two values
    (2.2) insert the value (a + b) into the heap; it corresponds to the node
          created in Step (2.1)
When the algorithm terminates, there is a single value left in the heap whose
corresponding node is the root of the optimal binary merge tree.
Time complexity: O(n lg n): Step (1) takes O(n) time; Step (2) runs O(n)
iterations, each taking O(lg n) time.

[Worked example with 5 leaf nodes of sizes 2, 3, 5, 7, 9:
 Iteration 1: merge 2 and 3 into 5.
 Iteration 2: merge 5 and 5 into 10.
 Iteration 3: merge 7 and 9 (chosen among 7, 9, and 10) into 16.
 Iteration 4: merge 10 and 16 into 26.
 Cost = 2*3 + 3*3 + 5*2 + 7*2 + 9*2 = 57.]


6. Huffman codes
Widely used and effective technique for compressing
data: savings of 20% to 90% are typical depending on
file characteristics
Motivation: suppose we wish to save a text (ASCII) file
on the disk or to transmit it though a network using an
encoding scheme that minimizes the number of bits
required.
Without compression, characters are typically
encoded by their ASCII codes with 8 bits per character.
We can do better if we have the freedom to design our
own encoding.

Example
Given a text file that uses only 5 different
letters (a, e, i, s, t), the space character, and
the newline character.
Since there are 7 different characters, we
could use 3 bits per character because that
allows 8 bit patterns ranging from 000
through 111 (so we still have one pattern to spare).
The following table shows the encoding of
characters, their frequencies, and the size of
encoded (compressed) file.

Binary character code to represent each


character:
Fixed length code: each char is assigned
same fixed length codeword.
Variable length code: much better than fixed
length code, by giving frequent chars short
codeword and infrequent chars long
codeword.
Prefix code are codes in which no codeword
is also a prefix of some other codeword.

Character   Frequency   Fixed code   Total bits   Variable code   Total bits
a           10          000          30           001             30
e           15          001          45           01              30
i           12          010          36           10              24
s           3           011          9            00000           15
t           4           100          12           0001            16
space       13          101          39           11              26
newline     1           110          3            00001           5
Total       58                       174                          146

Fixed-length encoding: 174 bits; variable-length encoding: 146 bits.

If we can use variable lengths for the codes,


we can actually compress more as shown in
the above.
However, the codes must satisfy the property
that no code is the prefix of another code:
prefix code.

Huffman codes
Total # of bits of the encoded file = freq1 · length(code1) +
freq2 · length(code2) + ... + freqk · length(codek)
Huffman code: an optimal prefix code constructed by a greedy algorithm.
An optimal code for a file is always represented by a full binary tree, in which
every nonleaf node has two children.
Idea: start with |C| leaves and perform a sequence of |C|-1 merging operations to
create the final tree.
Greedy property: the smaller the frequency, the longer the codeword, to improve
the compression.
A priority queue can be used to find the two least-frequent objects to merge together.
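A minimal Python sketch of the greedy construction using a priority queue (heapq); applied to the frequency table above it yields a prefix code whose total encoded size is 146 bits:

import heapq
from itertools import count

def huffman(freqs):
    """Huffman coding: repeatedly merge the two least-frequent nodes.
    freqs: dict character -> frequency. Returns dict character -> codeword. O(n lg n)."""
    tie = count()                                      # tie-breaker so the heap never compares nodes
    heap = [(f, next(tie), c) for c, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)              # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):                    # internal node: 0 left, 1 right
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:                                          # leaf: record the codeword
            codes[node] = code or "0"
    walk(heap[0][2], "")
    return codes

# huffman({'a': 10, 'e': 15, 'i': 12, 's': 3, 't': 4, ' ': 13, '\n': 1})
# gives a prefix code with total size sum(f * len(code)) = 146 bits.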


How do we design an optimal prefix code (i.e., one with minimum total length) for
a given file?
We can depict the codes for the given collection of characters using a binary
tree: reading each code from left to right, we construct a binary tree from the
root, following the left branch when encountering a 0 and the right branch when
encountering a 1.
We do this for all the codes by constructing a single combined binary tree.
[Figure: the tree is built up incrementally, first for code 001, then for codes
001 and 01, then for 001, 01, and 10, and finally for all of 001, 01, 10, 00000,
0001, 11, and 00001.]
Note: each code terminates at a leaf node, by the prefix property.

Note that the encoded file size is equal to the total weighted external path
length if we assign the frequency to each leaf node.
[Figure: the code tree with leaf frequencies s:3, newline:1, t:4, a:10, e:15,
i:12, space:13.]
Total file size = 3*5 + 1*5 + 4*4 + 10*3 + 15*2 + 12*2 + 13*2 = 146, which is
exactly the total weighted external path length.

Also note that in an optimal prefix code, each node in the tree has either no
children or two.
[Figure: if a node x had only one child y, merging x and y would reduce the total
size.]
Thus, the optimal binary merge tree algorithm finds the optimal code (the Huffman
code).

(The resulting algorithm runs in O(nlogn).)

Correctness Proof: greedy-choice property

[Lemma 16.2] C: an alphabet, where each c in C has frequency f(c).
x, y: two characters in C having the lowest frequencies.
Then there is an optimal prefix code for C in which the codewords for x and y have
the same length and differ only in the last bit.
<Proof> Idea: take a tree T representing an optimal prefix code. Modify T to make
a tree T'' representing another optimal prefix code such that x and y appear as
sibling leaves of maximum depth in T'' (so their codewords have the same length
and differ only in the last bit).
a, b: two characters that are sibling leaves of maximum depth in T.
W.l.o.g. assume that f(a) ≤ f(b) and f(x) ≤ f(y).
Since f(x), f(y) are the two lowest frequencies, in order, and f(a), f(b) are two
arbitrary frequencies, f(x) ≤ f(a) and f(y) ≤ f(b).
Exchange the positions of a and x in T to produce T'.
Then exchange the positions of b and y in T' to produce T''.


dT(c): the depth of c's leaf in T (the length of the codeword for c)
B(T): the cost of the tree T (the number of bits required to encode the file)
B(T) = Σ (c in C) f(c) dT(c)
B(T) - B(T') = Σ (c in C) f(c) dT(c) - Σ (c in C) f(c) dT'(c)
  = f(x)dT(x) + f(a)dT(a) - f(x)dT'(x) - f(a)dT'(a)
  = f(x)dT(x) + f(a)dT(a) - f(x)dT(a) - f(a)dT(x)
  = (f(a) - f(x))(dT(a) - dT(x)) ≥ 0
f(a) - f(x): nonnegative (as x is a minimum-frequency leaf)
dT(a) - dT(x): nonnegative (as a is a leaf of maximum depth in T)

Similarly, exchanging y and b does not increase the cost, so B(T') - B(T'') is
nonnegative.
Therefore, B(T'') ≤ B(T). Since T is optimal, B(T) ≤ B(T''), hence B(T) = B(T''):
T'' is an optimal tree in which x and y appear as sibling leaves of maximum depth.

Optimal Substructure
[Lemma 16.3] C: an alphabet, where each c in C has frequency f(c).
x, y: two characters in C with minimum frequency.
C': the alphabet C' = C - {x, y} + {z}, with f(z) = f(x) + f(y).
T': any tree representing an optimal prefix code for C'.
Then T, obtained from T' by replacing the leaf node z with an internal node having
x and y as children, represents an optimal prefix code for C.
<Proof> First show that B(T) can be expressed in terms of B(T').
For each c in C - {x, y}, dT(c) = dT'(c), hence f(c)dT(c) = f(c)dT'(c).
Since dT(x) = dT(y) = dT'(z) + 1,
we have f(x)dT(x) + f(y)dT(y) = (f(x) + f(y))(dT'(z) + 1) = f(z)dT'(z) + (f(x) + f(y)).
We conclude B(T) = B(T') + f(x) + f(y), or B(T') = B(T) - f(x) - f(y).

Optimal Substructure
<Proof, continued>
We concluded that B(T) = B(T') + f(x) + f(y), i.e. B(T') = B(T) - f(x) - f(y).
We now prove the lemma by contradiction.
Suppose that T does not represent an optimal prefix code for C.
Then there is a tree T'' such that B(T'') < B(T).
W.l.o.g. (by Lemma 16.2), T'' has x and y as siblings.
Let T''' be the tree T'' with the parent of x and y replaced by a leaf z with
f(z) = f(x) + f(y).
Then B(T''') = B(T'') - f(x) - f(y)
             < B(T) - f(x) - f(y)
             = B(T'),
a contradiction, as T' represents an optimal prefix code for C'.
Thus, T must represent an optimal prefix code for C.

COMP3111/3811 Algorithms 2

Lecture 5-1/5-2
Dynamic Programming
(Chapter 15)

Motivation

Optimization Problems
A set of choices must be made in order to arrive at an optimal (min/max) solution,
subject to some constraints.
(There may be several solutions that achieve the optimal value.)
Two common techniques:
Dynamic Programming (global)
Greedy Algorithms (local)

Dynamic Programming
Similar to divide-and-conquer, it breaks
problems down into smaller problems that
are solved recursively.
In contrast, DP is applicable when the subproblems are not independent,
i.e. when sub-problems share sub-subproblems.
It solves every sub-sub-problem just once
and saves the result in a table to avoid
duplicated computation.

Applicability to Optimization Problems


Optimal sub-structure (principle of optimality):
for the global problem to be solved optimally,
each sub-problem should be solved optimally.
This is often violated due to sub-problem
overlaps. Often by being less optimal on
one problem, we may make a big savings on
another sub-problem.
Small number of sub-problems: Many NP-hard
problems can be formulated as DP problems,
but these formulations are not efficient,
because the number of sub-problems is
exponentially large. Ideally, the number of
sub-problems should be at most a polynomial
number.

Elements of DP Algorithms
Sub-structure: decompose problem into
smaller sub-problems. Express the solution
of the original problem in terms of solutions
for smaller problems.
Table-structure: Store the answers to the subproblem in a table, because sub-problem
solutions may be used many times.
Bottom-up computation: combine solutions
on smaller sub-problems to solve larger subproblems, and eventually arrive at a solution
to the complete problem.

Dynamic Programming
Divide & Conquer
independent subproblems
Dynamic Programming
subproblems are not independent
(subproblems share subproblems)
algorithm solves every subproblem just
once and then saves its answer in a table,
avoiding recomputation
applied to optimization problem: we want
to find an optimal solution with the
optimal (minimum or maximum) value
(there may be several solutions)

Dynamic Programming Algorithm

1. Assembly-Line Scheduling

4 steps
1. Characterize the structure of an optimal
solution
2. Recursively define the value of the optimal
solution
3. Compute the value of an optimal solution in
a bottom-up fashion
4. Construct an optimal solution from
computed information (can be omitted)

Two parallel assembly lines in a factory,


lines 1 and 2
Each line has n stations Si,1Si,n
For each j, S1, j does the same thing as S2, j ,
but it may take a different amount of
assembly time ai, j
Transferring away from line i after stage j
costs ti, j
Also entry time ei and exit time xi at
beginning and end

[Figure: two parallel assembly lines. Line 1 has stations S1,1 ... S1,n
with assembly times a1,1 ... a1,n; line 2 has stations S2,1 ... S2,n with
assembly times a2,1 ... a2,n. The chassis enters with entry times e1, e2,
may transfer to the other line after station j at cost ti,j, and the
completed auto exits with exit times x1, x2.]

Step 1: the structure of the fastest way through the factory

Optimal substructure: an optimal solution to a
problem (finding the fastest way through S1,j)
contains an optimal solution to subproblems
(finding the fastest way through either S1,j-1 or S2,j-1).
A fastest way through station S1,j
must go through station j-1 on either line 1 or line 2:
either the fastest way through S1,j-1 and then
directly through S1,j, or
the fastest way through S2,j-1, a transfer from
line 2 to line 1, and then through S1,j.
Recursive formula for the subproblems

Step 2: a recursive solution
Define the value of an optimal solution
recursively in terms of the optimal solutions
to subproblems.

fi[j]: the fastest possible time to get the chassis through Si,j

Fastest time to any given station
= min( fastest time through the previous station (same line),
       fastest time through the previous station (other line)
       + time it takes to switch lines )

Total time (our goal):
f* = min( f1[n] + x1, f2[n] + x2 )

f1[1] = e1 + a1,1
f1[j] = min( f1[j-1] + a1,j , f2[j-1] + t2,j-1 + a1,j )   if j >= 2
f2[1] = e2 + a2,1
f2[j] = min( f2[j-1] + a2,j , f1[j-1] + t1,j-1 + a2,j )   if j >= 2
li[j]: the line number (1 or 2) whose station j-1 is used
in a fastest way through station Si,j
l*: the line whose station n is used in a fastest
way through the entire factory
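The recurrences above map directly onto two arrays filled from left to right.
The following is a minimal Python sketch (illustrative only, not taken from the
notes); it assumes 0-indexed inputs a[i][j], t[i][j], e[i], x[i] corresponding
to ai,j, ti,j, ei, xi, and returns f* together with the line used last.

import math

def fastest_way(a, t, e, x, n):
    # a[i][j]: time at station S(i+1, j+1); t[i][j]: transfer cost after station j+1
    # e[i], x[i]: entry/exit times for line i+1 (everything 0-indexed)
    f1 = [0] * n; f2 = [0] * n
    l1 = [0] * n; l2 = [0] * n
    f1[0] = e[0] + a[0][0]
    f2[0] = e[1] + a[1][0]
    for j in range(1, n):
        # fastest way to S1,j: stay on line 1, or transfer from line 2
        if f1[j-1] + a[0][j] <= f2[j-1] + t[1][j-1] + a[0][j]:
            f1[j], l1[j] = f1[j-1] + a[0][j], 1
        else:
            f1[j], l1[j] = f2[j-1] + t[1][j-1] + a[0][j], 2
        # fastest way to S2,j: stay on line 2, or transfer from line 1
        if f2[j-1] + a[1][j] <= f1[j-1] + t[0][j-1] + a[1][j]:
            f2[j], l2[j] = f2[j-1] + a[1][j], 2
        else:
            f2[j], l2[j] = f1[j-1] + t[0][j-1] + a[1][j], 1
    # choose the better exit and report the line used at the last station
    if f1[n-1] + x[0] <= f2[n-1] + x[1]:
        return f1[n-1] + x[0], 1, l1, l2
    return f2[n-1] + x[1], 2, l1, l2

This runs in O(n) time, as stated in Step 3 below.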

Worked example (two lines, six stations each; the chassis enters with
e1 = 2 and e2 = 4):

j        1   2   3   4   5   6
f1[j]    9  18  20  24  32  35
f2[j]   12  16  22  25  30  37        f* = 38,  l* = 1

e.g. f1[3] = min{18+3, 16+1+3} = 20 and f2[3] = min{16+6, 18+3+6} = 22.
The tables l1[j], l2[j] record, for each station, which line's station j-1
was used on a fastest way.
[Figure: the corresponding assembly-line diagram with the fastest way highlighted.]

Step 3: computing the fastest time


Note that a naive recursive algorithm runs in
exponential time.
By computing the fi [ j] values in order of
increasing station number j, we can compute
the fastest way through the factory in time
O(n).

Step 4: constructing the fastest way


through the factory
line 1, station 6
line 2, station 5
line 2, station 4
line 1, station 3
line 2, station 2
line 1, station 1

2. Matrix-Chain Multiplication
Given a sequence of matrices A1 A2 ... An, and
dimensions p0, p1, ..., pn where Ai is of dimension
pi-1 x pi, determine the multiplication sequence
that minimizes the number of scalar operations.
This algorithm does not perform the
multiplications; it just figures out the best
order in which to perform them.

O(n)

COMP3111/3811 Algorithms 2

Lecture 6-1/6-2

Dynamic Programming
(Chapter 15)

Optimized Chain Operations


Determine the optimal sequence for performing
a series of operations. (the general class of the
problem is important in compiler design for
code optimization & in databases for query
optimization)
For example: given a series of matrices: A1An ,
we can parenthesize this expression however
we like, since matrix multiplication is
associative (but not commutative).
Multiply a p x q matrix A times a q x r matrix B,
the result will be a p x r matrix C. (# of columns
of A must be equal to # of rows of B.)


Matrix Multiplication
In particular, for 1 <= i <= p and 1 <= j <= r,
C[i, j] = Σ k = 1 to q of A[i, k] · B[k, j]
Observe that there are pr total entries in C
and each takes O(q) time to compute, thus
the total time to multiply two matrices is pqr.

Example
Consider 3 matrices: A1 be 5 x 4, A2 be
4 x 6, and A3 be 6 x 2.
Mult[((A1 A2)A3)] = (5x4x6) + (5x6x2) = 180
Mult[(A1 (A2A3 ))] = (4x6x2) + (5x4x2) = 88
Even for this small example, considerable
savings can be achieved by reordering
the evaluation sequence.

Cost of Naive Algorithm


The number of different ways of parenthesizing
n items is
P(n) = 1                                    if n = 1
P(n) = Σ k = 1 to n-1 of P(k)·P(n-k)        if n >= 2
This is related to the Catalan numbers (which in turn
are related to the number of different binary
trees on n nodes). Specifically, P(n) = C(n-1), where
C(n) = (1/(n+1)) C(2n, n) = Ω(4^n / n^(3/2))
and C(2n, n) stands for the number of ways
to choose n items out of 2n items in total.

Step1: the structure of an optimal


parenthesization

The problem of determining the optimal sequence of


multiplication is broken up into 2 parts:

Ai..j : the product of matrices Ai through Aj.
Ai..j is a pi-1 x pj matrix.

Q : How do we decide where to split the chain (what


k)?
A : Consider all possible values of k.

At the highest level, we are multiplying two


matrices together.
that is, for any k, 1 <= k <= n-1,
A1..n = (A1..k)(Ak+1..n)

Q : How do we parenthesize the subchains A1k &


Ak+1n?
A : Solve by recursively applying the same scheme.
NOTE: this problem satisfies the principle of
optimality.


Step 2: a recursive solution


For 1 <= i <= j <= n, let m[i, j] denote the minimum
number of multiplications needed to compute
Ai..j.
Example: minimum number of multiplications for A3..7:

A1 A2 A3 A4 A5 A6 A7 A8 A9, where m[3, 7] covers the sub-chain A3 ... A7.

Step 3: Computing the optimal costs

For a specific k,
(Ai ... Ak)(Ak+1 ... Aj)
= Ai..k (Ak+1 ... Aj)      (m[i, k] mults)
= Ai..k Ak+1..j            (m[k+1, j] mults)
= Ai..j                    (pi-1 pk pj mults)

For the solution, evaluate for all k and take the minimum:
m[i, j] = min over i <= k < j of ( m[i, k] + m[k+1, j] + pi-1 pk pj )

The optimal cost can be described as follows:
i = j: the sequence contains only 1 matrix, so m[i, j] = 0.
i < j: the chain can be split by considering each
k, i <= k < j, as Ai..k (pi-1 x pk) times Ak+1..j (pk x pj).
This suggests the following recursive rule for
computing m[i, j]:
m[i, i] = 0
m[i, j] = min over i <= k < j of ( m[i, k] + m[k+1, j] + pi-1 pk pj )   for i < j

[Example tables m and s for the chain A1 A2 A3 A4 A5 A6 with dimensions
30x35, 35x15, 15x5, 5x10, 10x20, 20x25, i.e. p = <30, 35, 15, 5, 10, 20, 25>.
The optimal cost is m[1,6] = 15125 and the optimal parenthesization is
((A1 (A2 A3))((A4 A5) A6)).]

m[2,5] = min over k of
{ m[2,2] + m[3,5] + p1 p2 p5 = 0    + 2500 + 35x15x20 = 13000,
  m[2,3] + m[4,5] + p1 p3 p5 = 2625 + 1000 + 35x5x20  = 7125,
  m[2,4] + m[5,5] + p1 p4 p5 = 4375 + 0    + 35x10x20 = 11375 }
= 7125

Matrix-Chain-Order(p)
1.  n ← length[p] - 1
2.  for i ← 1 to n                       // initialization: O(n) time
3.      do m[i, i] ← 0
4.  for L ← 2 to n                       // L = length of sub-chain
5.      do for i ← 1 to n - L + 1
6.          do j ← i + L - 1
7.             m[i, j] ← ∞
8.             for k ← i to j - 1
9.                 do q ← m[i, k] + m[k+1, j] + pi-1 pk pj
10.                if q < m[i, j]
11.                    then m[i, j] ← q
12.                         s[i, j] ← k
13. return m and s

Analysis
The array s[i, j] is used to extract the
actual sequence (see next).
There are 3 nested loops and each can
iterate at most n times, so the total
running time is Θ(n³).
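For reference, a bottom-up Python sketch of the same Θ(n³) computation.
This is only an illustration, not part of the lecture notes; the function
names matrix_chain_order and print_optimal_parens and the 1-indexed tables
are my own choices.

def matrix_chain_order(p):
    # p[0..n]: dimensions; matrix A_i is p[i-1] x p[i]
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]   # m[i][j]: min #mults for A_i..A_j
    s = [[0] * (n + 1) for _ in range(n + 1)]   # s[i][j]: optimal split point k
    for L in range(2, n + 1):                   # L = length of the sub-chain
        for i in range(1, n - L + 2):
            j = i + L - 1
            m[i][j] = float('inf')
            for k in range(i, j):               # try every split A_i..A_k | A_k+1..A_j
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j], s[i][j] = q, k
    return m, s

def print_optimal_parens(s, i, j):
    # reconstruct the parenthesization from the split table s
    if i == j:
        return "A%d" % i
    k = s[i][j]
    return "(" + print_optimal_parens(s, i, k) + print_optimal_parens(s, k + 1, j) + ")"

For p = [30, 35, 15, 5, 10, 20, 25] this gives m[1][6] = 15125 and rebuilds
((A1(A2A3))((A4A5)A6)), matching the example tables above.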


Step 4: constructing an optimal solution


Leave a split marker indicating where the best
split is (i.e. the value of k leading to minimum
values of m[i, j]).
We maintain a parallel array s[i, j] in which we
store the value of k providing the optimal split.
If s[i, j] = k, the best way to multiply the subchain Aij is to first multiply the sub-chain Aik
and then the sub-chain Ak+1j , and finally
multiply them together.
Intuitively s[i, j] tells us what multiplication to
perform last.
We only need to store s[i, j] if we have at least 2
matrices & j > i.

Example
The initial set of dimensions are <5, 4, 6, 2, 7>:
we are multiplying A1 (5x4) times A2 (4x6) times
A3 (6x2) times A4 (2x7).
Optimal sequence is (A1 (A2A3 )) A4.

Elements of Dynamic Programming


Optimal substructure
an optimal solution to the problem
contains within it optimal solutions to
subproblems
How many subproblems
How many choices
Overlapping subproblems
recursive algorithm revisit the same
subproblem again and again
Solve each subproblem once and then
store the solution in a table

3. Longest common subsequence


Motivation
In biological applications, we often want to
compare the DNA of two different organisms.
A strand of DNA consists of a string of molecules
called bases:
can be represented as a string over a finite set {A,
C, G, T},
where A=adenine, C=cytosine, G=guanine,
T=thymine
One goal of comparing two strands of DNA is to
determine how similar the two strands are.

Similarity can be defined in many different ways (S1 &


S2).
If one is a substring of the other
If the number of changes needed to turn one into
the other is small
Find a third strand S3:
the bases of S3 must appear in each of S1 & S2 in
the same order but not necessary consecutively
Example:
S1=ACCGGTCGAGTGCGCGGAAGCCGGCCGAA
S2 =GTCGTTCGGAATGCCGTTGCTCTGTAAA
S3 =GTCGTCGGAAGCCGGCCGAA


Longest Common Subsequence(LCS)


Subsequence
Given a sequence X = <x1, x2, ..., xm>,
another sequence Z = <z1, z2, ..., zk> is a
subsequence of X if there exists a strictly
increasing sequence <i1, i2, ..., ik> of
indices of X such that for all j = 1, 2, ..., k,
we have x_ij = zj.
Z = <B,C,D,B> is a subsequence of X =
<A, B, C, B, D, A, B> with index sequence
<2,3,5,7>.

Step1: characterizing a LCS


Brute-force approach:
enumerate all subsequences of X and check whether each is
also a subsequence of Y, keeping track of the longest one found.
There are 2^m subsequences of X, so this
requires exponential time.
LCS has an optimal substructure property:
subproblems correspond to pairs of prefixes of the
two input sequences.
Given a sequence X = <x1, x2, ..., xm>, we define the
ith prefix of X, for i = 0, 1, ..., m, as Xi = <x1, x2, ..., xi>.
X = <A,B,C,B,D,A,B>, then X4 = <A,B,C,B>.

<pf>1. If xm= yn, then zk = xm= yn and Zk-1 is an


LCS of Xm-1 and Yn-1
a. zk = xm= yn
if zk =/= xm, then we can append xm= yn to Z:
length k+1 -> contradiction
b. Zk-1 is an LCS of Xm-1 and Yn-1
Zk-1 is a length (k-1) common subsequence of
Xm-1 and Yn-1
Suppose that there is a common subsequence
W of Xm-1 and Yn-1with length > (k-1)
Then appending xm= yn to W: LCS of X and Y:
length >k
Contradiction the assumption that Z is an LCS
of X and Y.

Common Subsequence
Given two sequences X and Y, we say that a
sequence Z is a common subsequence of X and Y if Z
is a subsequence of both X and Y.
X=<A,B,C,B,D,A,B>, Y=<B,D,C,A,B,A>
subsequence <B,C,A>: common subsequence of X
and Y, but not a longest
<B,C,B,A>, <B,D,A,B>: an LCS of X and Y
Longest Common Subsequence(LCS) problem
Given two sequences X = < x1, x2, , xm> and Y= <
y1, y2, , yn>, we want to find a maximum-length
common subsequence of X and Y.

[Theorem] optimal substructure of LCS


X= < x1, x2, , xm> and Y= < y1, y2, , yn>,
Z= < z1, z2, , zk> be any LCS of X & Y
1. If xm= yn, then zk = xm= yn and Zk-1 is an LCS
of Xm-1 and Yn-1
2. If xm=/= yn, then zk =/= xm implies that Z is an
LCS of Xm-1 and Y
3. If xm=/= yn, then zk =/= yn implies that Z is an
LCS of X and Yn-1
An LCS of two sequences contains within it
an LCS of prefixes of the two sequences:
optimal substructure property

<pf> 2. If xm=/= yn, then zk =/= xm implies that Z is


an LCS of Xm-1 and Y
If zk =/= xm, then Z is a common subsequence
of Xm-1 and Y.
If there were a common subsequence W of Xm-1
and Y with length > k, then W would also be a
common subsequence of Xm and Y
Contradiction to the assumption that Z is an
LCS of X and Y.


Step 2: a recursive solution


There are either one or two subproblems to examine
1. If xm= yn, then
we must find an LCS of Xm-1 and Yn-1
appending xm= yn to this LCS yields an LCS of X and
Y.
2. If xm=/= yn, then we must solve two subproblems
finding an LCS of Xm-1 and Y, and
finding an LCS of X and Yn-1
Take the longer one
Overlapping subproblems: many subproblems share
sub-subproblems

Establish a recurrence for the value of an optimal


solution
c[i,j]: length of an LCS of the sequences Xi and Yj
c[i,j] = 0                              if i = 0 or j = 0
c[i,j] = c[i-1, j-1] + 1                if i, j > 0 and xi = yj
c[i,j] = max( c[i, j-1], c[i-1, j] )    if i, j > 0 and xi =/= yj

A condition in the problem restricts which


subproblems we may consider: subproblems are
ruled out due to conditions in the problem

Step 3: computing the length of an LCS


Naive approach: exponential time recursive algorithm
O(mn) time DP algorithm
There are only O(mn) distinct subproblems
We can use DP to compute the solutions bottom
up: c[0..m, 0..n].
It stores c[i,j] in row-major order: fills from the
first row of c from left to right
We also maintain table b[1..m, 1..n] for step4
b[i,j]: points to the table entry corresponding to
the optimal subproblem solution chosen when
computing c[i,j]

[Example c and b tables for
X = <A,B,C,B,D,A,B> and Y = <B,D,C,A,B,A>;
the arrows in b trace out the LCS <B,C,B,A>.]

The DP algorithm runs in O(mn) time.
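A minimal Python sketch of the O(mn) table computation just described (my own
illustration, not from the notes); the traceback loop plays the role of the b
table by re-deriving the direction of each optimal choice.

def lcs_length(X, Y):
    # c[i][j]: length of an LCS of X[:i] and Y[:j]
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    # trace back from c[m][n] to recover one LCS
    i, j, out = m, n, []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return c[m][n], ''.join(reversed(out))

For example, lcs_length("ABCBDAB", "BDCABA") returns (4, "BCBA"),
matching the example above.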

Step 4: constructing an LCS


PRINT-LCS(b,X,length[X],length[Y])

O(m+n) time



Improvement
O(nm) time and O(nm) space DP algorithm


O(nm) time and O(n+m) space DP algorithm,
if we only need to compute the length of an LCS.

In general, the DP solution is not unique.

Elements of Dynamic Programming


Optimal substructure
an optimal solution to the problem
contains within it optimal solutions to
subproblems
How many subproblems
How many choices
Overlapping subproblems
recursive algorithm revisit the same
subproblem again and again
Solve each subproblem once and then
store the solution in a table

Review: Dynamic Programming


Optimal substructure: if an optimal solution contains
within it optimal solutions to the subproblems.
1. Show that the problem consists of making choices.
2. Suppose that you are given the choice that leads to an
optimal solution.
3. Given the choice, determine which subproblems to
ensue and how to characterize the subproblems.
4. Show that the solutions to the subproblems used in
the optimal solution must be optimal by using cut &
paste technique.
Overlapping subproblems: when a recursive algorithm
revisits the same problem over & over again.


Review: Dynamic Programming

Step 1. Optimal substructure


Step 2. Recursive solution
Step 3. Compute the optimal costs
Step 4. Construct an optimal solution

# of choices
# of subproblems
Bottom-up approach

Review: Greedy methods

Review: Greedy methods

Steps:

1. Determine the optimal substructure


2. Develop a recursive solution
3. Prove that at any stage of the recursion,
one of the optimal choices is the greedy
choice.
4. Show that all but one of the subproblems
induced by the greedy choice is empty.
5. Develop a recursive algorithm
6. Convert to an iterative algorithm

Review: Greedy methods

Alternatively:

1. Cast the optimization problem as one in


which we make a choice and left with one
subproblem to solve
2. Prove that there is always an optimal
solution that makes the greedy choice
3. Show that (having made the greedy choice)
what remains is a subproblem with the
property: if we combine an optimal
solution to the subproblem with the greedy
choice, then we arrive an optimal solution.

4. Minimum-Weight Triangulation of
Convex Polygon
Motivitaion: computational geometry (graphics)
A polygon is a piecewise linear closed curve in
the plane.
We form a cycle by joining line segments
end to end.
The line segments are called the sides of the
polygon and the endpoints are called the
vertices.
A polygon is simple if it does not cross itself,
i.e. if the edges do not intersect one another
except for two consecutive edges sharing a
common vertex.

Greedy-choice property: a globally


optimal solution can be arrived by making
a locally optimal greedy choice.
Optimal substructure: if an optimal
solution contains within it optimal
solutions to the subproblems.
Top down

A simple polygon defines a region consisting of


points it encloses.
The points strictly within this region are in the
interior of this region, the points strictly on the
outside are in its exterior, and the polygon itself is
the boundary of this region.
A simple polygon is said to be convex if given any
two points on its boundary, the line segment between
them lies entirely in the union of the polygon and its
interior.
Convexity can also be defined by the interior
angles.
The interior angles of vertices of a convex polygon
are at most 180 degrees.


Example
[Figure: a simple polygon with labeled interior and exterior, and a
non-simple curve with a self-intersection.]

Triangulations

Given a convex polygon, assume that its vertices are


labeled in counterclockwise order P=<v0,,vn-1>.
Assume that indexing of vertices is done modulo n,
so v0 = vn.
This polygon has n sides, (vi-1 ,vi ).
Given two nonadjacent vertices vi and vj, where i < j, the line segment
(vi, vj) is a chord.
If the polygon is simple but not convex, a segment
must also lie entirely in the interior of P for it to be a
chord.
Any chord subdivides the polygon into two
polygons.

Triangulations
A triangulation of a convex polygon is a maximal
set T of chords.
Every chord that is not in T intersects the
interior of some chord in T.
Such a set of chords subdivides interior of a
polygon into set of triangles.
Dual graph of the triangulation is a graph whose
vertices are the triangles, and in which two
vertices share an edge if the triangles share a
common chord.
NOTE: the dual graph is a free tree.
In general, there are many possible
triangulations.

Minimum-Weight Convex Polygon


Triangulation
The number of possible triangulations is exponential in n,
the number of sides.
The best triangulation depends on the applications.
Our problem: Given a convex polygon, determine the
triangulation that minimizes the sum of the perimeters of
its triangles.

chord

Example

Correspondence to Binary Trees


In MCM, the associated binary tree is the
evaluation tree for the multiplication, where the
leaves of the tree correspond to the matrices,
and each node of the tree is associated with a
product of a sequence of two or more matrices.

Given three distinct vertices, vi , vj and vk , we define the


weight of the associated triangle by the weight function
w(vi , vj , vk) = |vi vj | + |vj vk | + |vk vi |,
where |vi vj | : length of the line segment (vi ,vj ).


Correspondence to Binary Trees

Binary Tree for Triangulation

Consider an (n+1)-sided convex polygon,


P=<v0,,vn> and fix one side of the polygon,
(v0 ,vn).
Consider a rooted binary tree whose root
node is the triangle containing side (v0 ,vn),
whose internal nodes are the nodes of the
dual tree, and whose leaves correspond to
the remaining sides of the tree.
The partitioning of a polygon into triangles is
equivalent to a binary tree with n leaves, and
vice versa.

The associated binary tree has n leaves, and


hence n-1 internal nodes.
Since each internal node other than the root
has one edge entering it, there are n-2 edges
between the internal nodes.

Example of Binary Tree for


Triangulation

Lemma
A triangulation of a simple polygon has n-2 triangles
and n-3 chords.
(Proof)
The result follows directly from the previous figure.
Each internal node corresponds to one triangle and each edge
between internal nodes corresponds to one chord of
triangulation.
If we consider an n-vertex polygon, then we'll have n-1 leaves,
and thus n-2 internal nodes (triangles) and n-3 edges
(chords).

DP Solution

For 1 <= i <= j <= n, let t[i, j] denote the minimum
weight triangulation for the subpolygon <vi-1, vi, ..., vj>.
We start with vi-1 rather than vi to keep the structure
as similar as possible to the matrix chain multiplication problem.

Observe: if we can compute t[i, j] for all i and j
(1 <= i <= j <= n), then the weight of the minimum
weight triangulation of the entire polygon will be t[1, n].

For the basis case, the weight of the trivial 2-sided
polygon is zero, implying that t[i, i] = 0 (the line (vi-1, vi)).

[Figure: a polygon on vertices v0, ..., v6; the minimum weight
triangulation of the subpolygon <v1, ..., v5> has weight t[2, 5].]

DP Solution

DP Solution

In general, to compute t[i, j], consider the
subpolygon <vi-1, vi, ..., vj>, where i <= j.
One of the chords of this polygon is the side (vi-1, vj).
We may split this subpolygon by introducing
a triangle whose base is this chord, and
whose third vertex is any vertex vk, where i <= k <= j-1.
This subdivides the polygon into two
subpolygons <vi-1, ..., vk> and <vk, ..., vj>, whose
minimum weights are t[i, k] and t[k+1, j].

We have the following recursive rule for computing t[i, j]:
t[i, i] = 0
t[i, j] = min over i <= k <= j-1 of ( t[i, k] + t[k+1, j] + w(vi-1, vk, vj) )   for i < j

Min-Weight-Polygon-Triangulation(V)
1.  n ← length[V] - 1                    // V = <v0, v1, ..., vn>
2.  for i ← 1 to n                       // initialization: O(n) time
3.      do t[i, i] ← 0
4.  for L ← 2 to n                       // L = length of sub-chain
5.      do for i ← 1 to n - L + 1
6.          do j ← i + L - 1
7.             t[i, j] ← ∞
8.             for k ← i to j - 1
9.                 do q ← t[i, k] + t[k+1, j] + w(vi-1, vk, vj)
10.                if q < t[i, j]
11.                    then t[i, j] ← q
12.                         s[i, j] ← k
13. return t and s
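The procedure above is the matrix-chain recurrence with the weight function
swapped in. A small Python sketch of the same interval DP (illustrative only;
it assumes the vertices are given as (x, y) coordinate pairs and that the
triangle weight is its perimeter, as defined earlier):

import math

def min_weight_triangulation(V):
    # V = [v0, v1, ..., vn]: vertices of a convex polygon in counterclockwise order
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    def w(i, k, j):                      # perimeter of triangle (v_{i-1}, v_k, v_j)
        return dist(V[i - 1], V[k]) + dist(V[k], V[j]) + dist(V[j], V[i - 1])
    n = len(V) - 1
    t = [[0.0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for L in range(2, n + 1):
        for i in range(1, n - L + 2):
            j = i + L - 1
            t[i][j] = float('inf')
            for k in range(i, j):
                q = t[i][k] + t[k + 1][j] + w(i, k, j)
                if q < t[i][j]:
                    t[i][j], s[i][j] = q, k
    return t[1][n], s                    # weight of the min-weight triangulation, split table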

5. Coin changing problem

Coin of denomination i has value di units; we must pay N units.
c[i,j]: min # of coins needed to pay j units (0 <= j <= N), using
only coins of denominations 1 to i (1 <= i <= n).
Table: c[1..n, 0..N]
c[i,0] = 0 for all i: initialization
Choose not to use a coin of denomination i:
c[i,j] = c[i-1,j]
Choose to use at least one coin of denomination i:
c[i,j] = 1 + c[i, j-di]
c[i,j] = min( c[i-1,j], 1 + c[i, j-di] )
       = infinity if i = 1 and j < d1

Example with n = 3 denominations (i = 1, 2, 3) and N = 8:
Amount j:  0 1 2 3 4 5 6 7 8
d1 = 1:    0 1 2 3 4 5 6 7 8
d2 = 4:    0 1 2 3 1 2 3 4 2
d3 = 6:    0 1 2 3 1 2 1 2 2
e.g. c[3,8] = min( c[2,8], 1 + c[3, 8-6] ) = min(2, 1+2) = 2

coins(N): fills c[1..n, 0..N]
for i = 1 to n
    c[i,0] ← 0
for i = 1 to n
    for j = 1 to N
        if i = 1 and j < d[1] then c[i,j] ← infinity
        else if i = 1 then c[i,j] ← 1 + c[1, j-d[1]]
        else if j < d[i] then c[i,j] ← c[i-1, j]
        else c[i,j] ← min( c[i-1,j], 1 + c[i, j-d[i]] )
return c[n,N] : O(nN) time
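A Python sketch of the coins table and the traceback described next
(illustrative only; the names coin_change and trace_coins, and the 1-indexed
list d with an unused d[0], are my own conventions):

def coin_change(d, N):
    # d[1..n]: coin denominations (d[0] unused); N: amount to pay
    n = len(d) - 1
    INF = float('inf')
    c = [[0] * (N + 1) for _ in range(n + 1)]   # c[i][j] as in the recurrence above
    for i in range(1, n + 1):
        for j in range(1, N + 1):
            if i == 1 and j < d[1]:
                c[i][j] = INF
            elif i == 1:
                c[i][j] = 1 + c[1][j - d[1]]
            elif j < d[i]:
                c[i][j] = c[i - 1][j]
            else:
                c[i][j] = min(c[i - 1][j], 1 + c[i][j - d[i]])
    return c                                    # c[n][N] is the answer

def trace_coins(c, d, N):
    # walk back from c[n][N]: "up" = no coin of denomination i, "left" = use one
    # assumes a solution exists (e.g. a 1-unit coin is available, as in the example)
    i, j, used = len(d) - 1, N, []
    while j > 0:
        if i > 1 and c[i][j] == c[i - 1][j]:
            i -= 1
        else:
            used.append(d[i]); j -= d[i]
    return used

With d = [None, 1, 4, 6] and N = 8 this gives c[3][8] = 2 and the trace
hands over two 4-unit coins, matching the table above.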


Constructing an optimal solution

Start from c[n,N] and trace back to c[0,0], checking where each
optimal value came from.
c[i,j] only tells us how many coins are needed; the trace recovers which ones.
If c[i,j] = c[i-1,j]:
    no coin of denomination i is necessary; move up to c[i-1,j].
If c[i,j] = 1 + c[i, j-di]:
    hand over one coin of denomination i (worth di); move left to c[i, j-di].
If c[i,j] = c[i-1,j] = 1 + c[i, j-di]: either move is valid.

COMP3111/3811 Algorithms 2

Lecture 7-1/7-2

Graph Algorithms
(Chapter 22)
1. Graphs
2. BFS
3. DFS
4. Topological Sort
5. Strongly Connected Components

1. Graphs
Graph: a collection of vertices or nodes,
connected by a collection of edges.

Digraphs/ Graphs
Directed Graph (or digraph) G = (V, E) consists
of a finite set V, called vertices or nodes, and E,
a set of ordered pairs, called edges of G. E is a
binary relation on V.
Self-loops are allowed.
Multiple edges are not allowed, though (v, w)
and (w, v) are distinct edges.

Motivation: many applications where there


is some connection or relationship or
interaction between pairs of objects
network communication & transportation
VLSI design & logic circuit design
surface meshes in CAD/CAM & GIS
path planning for autonomous agents
precedence constraints in scheduling

Examples of Digraphs & Graphs

Undirected Graph (or graph) G = (V, E) consists


of a finite set V of vertices, and a set E of
unordered pairs of distinct vertices, called edges
of G.
No self-loops are allowed.


Basic Terminology
Vertex w is adjacent to vertex v if there is an edge (v,w).
Given an edge e = (u,v) in an undirected graph, u and v are
the endpoints of e and e is incident on u (or on v).
In a digraph, u & v are the origin and destination. e leaves
u and enters v.
A digraph or graph is weighted if its edges are labeled
with numeric values.
In a digraph,
Out-degree of v: number of edges coming out of v
In-degree of v: number of edges coming in to v
In a graph, degree of v: no. of incident edges to v

Path vs. Cycle


Path: a sequence of vertices <v0, , vk> s.t. (vi-1,vi) is an
edge for i = 1 to k, in a digraph.
The length of the path is the number of edges, k.
w is reachable from u if there is a path from u to w.
A path is simple if all vertices are distinct.
Cycle: a path containing at least 1 edge and for which v0 =
vk, in a digraph.
A cycle is simple if, in addition, all vertices are distinct.
For graphs, the definitions are the same, but a simple
cycle must visit 3 distinct vertices.

Connectivity

Combinatorial Facts
In a graph:
0 <= e <= C(n,2) = n(n-1)/2 = O(n²)
Σ over v in V of deg(v) = 2e
In a digraph:
0 <= e <= n²
Σ over v in V of in-deg(v) = Σ over v in V of out-deg(v) = e
A graph is said to be sparse if e ∈ O(n),
and dense otherwise.

History on Cycles/Paths
Eulerian cycle is a cycle (not
necessarily simple) that visits every
edge of a graph exactly once.
Hamiltonian cycle (path) is a cycle (path
in a directed graph) that visits every
vertex exactly once.

Examples for Isomorphic Graphs

Acyclic: if a graph contains no simple cycles


Connected: if every vertex of a graph can reach
every other vertex
Connected: every pair of vertices is connected
by a path
Connected Components: equivalence classes of
vertices under is reachable from relation
Strongly connected: for any 2 vertices, they can
reach each other in a digraph
G = (V, E) and G' = (V', E') are isomorphic if there is a
bijection f : V -> V' s.t. (v, u) ∈ E iff (f(v), f(u)) ∈ E'.


Free Trees, Forests, and DAGs

Graph Representations
Let G = (V, E) be a digraph with n = |V| & e = |E|
Adjacency Matrix: an n x n matrix A; for 1 <= v, w <= n,
A[v,w] = 1 if (v,w) ∈ E and 0 otherwise.
If the digraph has edge weights, store them in the matrix.
Dense graphs: O(V²) memory

Free Tree

Forest

DAG

2. Breadth-First-Search (BFS)

Adjacency List: an array Adj[1..n] of pointers,
where for 1 <= v <= n, Adj[v] points to a linked list
containing the vertices which are adjacent to v.
If the edges have weights, then they may
also be stored in the linked list elements.
Sparse graphs: O(V+E) memory

Breadth-First-Search (BFS)

Given:
G = (V, E)
A distinguished source vertex s

Algorithm discovers all vertices at distance k


from s before discovering vertices at distance
k+1.

explores the edges of G to discover every vertex


that is reachable from s
Computes (shortest) distance from s to all
reachable vertices
Produces a breadth-first-tree with root s that
contains all reachable vertices

BFS colors each vertex:


white -- undiscovered
gray -- discovered but not done yet
black -- all adjacent vertices have been
discovered


BFS(G, s)
// white: undiscovered, gray: discovered but not done yet, black: finished
// Q: a queue of discovered vertices
// color[v]: color of v; d[v]: distance from s to v; π[v]: predecessor of v
 1. for each vertex u in V[G] - {s}
 2.     do color[u] ← WHITE
 3.        d[u] ← ∞
 4.        π[u] ← NIL
 5. color[s] ← GRAY
 6. d[s] ← 0
 7. π[s] ← NIL
 8. Q ← ∅
 9. enqueue(Q, s)
10. while Q ≠ ∅
11.     do u ← dequeue(Q)
12.        for each v in Adj[u]
13.            do if color[v] = WHITE
14.                then color[v] ← GRAY
15.                     d[v] ← d[u] + 1
16.                     π[v] ← u
17.                     enqueue(Q, v)
18.        color[u] ← BLACK
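The same procedure in Python, as a compact sketch (not the lecture's code;
it assumes the graph is an adjacency dict mapping each vertex to a list of
neighbours):

from collections import deque

def bfs(adj, s):
    # returns d (distance from s) and pi (predecessor), as in the pseudocode above
    d = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    color = {u: 'white' for u in adj}
    d[s], color[s] = 0, 'gray'
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            if color[v] == 'white':          # v is discovered for the first time
                color[v] = 'gray'
                d[v] = d[u] + 1
                pi[v] = u
                Q.append(v)
        color[u] = 'black'                   # all of u's neighbours have been examined
    return d, pi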

Analysis of BFS

Finding Shortest Paths

Initialization: O(V).
Traversal Loop
After initialization, each vertex is enqueued and
dequeued at most once, and each operation
takes O(1) : total time for queuing is O(V).
The adjacency list of each vertex is scanned at
most once: the sum of the lengths of all adjacency
lists is Θ(E).

Given an undirected graph and source


vertex s, the length of a path in a graph
(without edge weights) is the number of
edges on the path. Find the shortest
path from s to each other vertex in the
graph.

Summing up over all vertices => total running


time of BFS is O(V+E), linear in the size of the
adjacency list representation of graph.

Brute-Force: enumerate all simple paths


starting from s and keep track of the
shortest path arriving at each vertex.
There may be n! simple paths in a
graph

BFS for Shortest Paths

Shortest-path distance δ(s, v) from s to v: the minimum
number of edges in any path from vertex s to vertex v,
or ∞ if there is no path from s to v.
A path of length δ(s, v) from s to v is said
to be a shortest path from s to v.

[Figure: BFS from source S; vertices at distance 1, 2, 3 are
discovered in waves (undiscovered, discovered, finished).]

BFS for Shortest Paths


[Theorem] Let G = (V, E) be a directed or
undirected graph, and suppose that BFS is
run on G from a given source vertex s ∈ V.
Then, during its execution, BFS discovers
every vertex v ∈ V that is reachable from the
source s, and upon termination, d[v] = δ(s, v)
for all v ∈ V.
Moreover, for any vertex v ≠ s that is
reachable from s, one of the shortest paths
from s to v is a shortest path from s to π[v]
followed by the edge (π[v], v).

3. Depth-First-Search (DFS)
Explore edges out of the most recently
discovered vertex v
When all edges of v have been explored,
backtrack to explore edges leaving the
vertex from which v was discovered (its
predecessor)
Search as deep as possible first
Whenever a vertex v is discovered during a
scan of the adjacency list of an already
discovered vertex u, DFS records this event
by setting predecessor [v] to u.

DFS(G)
1. for each vertex u ∈ V[G]
2.     do color[u] ← WHITE
3.        π[u] ← NIL
4. time ← 0
5. for each vertex u ∈ V[G]
6.     do if color[u] = WHITE
7.        then DFS-Visit(u)

Breadth-First Tree
For a graph G = (V, E) with source s, the predecessor
subgraph of G is Gπ = (Vπ, Eπ) where
Vπ = {v ∈ V : π[v] ≠ NIL} ∪ {s}
Eπ = {(π[v], v) ∈ E : v ∈ Vπ - {s}}
The predecessor subgraph Gπ is a breadth-first tree if:
Vπ consists of the vertices reachable from s, and
for all v ∈ Vπ, there is a unique simple path from s to
v in Gπ that is also a shortest path from s to v in G.
The edges in Eπ are called tree edges (|Eπ| = |Vπ| - 1).
There are potentially many BFS trees for a given graph.

Depth-First Trees
The coloring scheme is the same as in BFS.
The predecessor subgraph of DFS is Gπ = (V, Eπ)
where Eπ = {(π[v], v) : v ∈ V and π[v] ≠ NIL}.
The predecessor subgraph Gπ forms a depth-first
forest composed of several depth-first trees.
The edges in Eπ are called tree edges.
Each vertex u has 2 timestamps:
d[u]: records when u is first discovered (grayed)
f[u]: records when the search finishes examining u (blackens u)
For every vertex u, d[u] < f[u].

DFS-Visit(u)
1. color[u] ← GRAY              // white vertex u has just been discovered
2. time ← time + 1; d[u] ← time
3. for each vertex v ∈ Adj[u]   // explore edge (u,v)
4.     do if color[v] = WHITE
5.        then π[v] ← u
6.             DFS-Visit(v)
7. color[u] ← BLACK             // blacken u; it is finished
8. time ← time + 1; f[u] ← time
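A Python sketch mirroring DFS / DFS-Visit above (illustrative only; it assumes
an adjacency dict and uses a recursive helper, which is fine for the small
examples used in these slides):

def dfs(adj):
    # adj: dict vertex -> list of neighbours; returns timestamps and predecessors
    color = {u: 'white' for u in adj}
    pi = {u: None for u in adj}
    d, f = {}, {}
    time = [0]                               # mutable counter shared by the helper

    def visit(u):
        color[u] = 'gray'
        time[0] += 1; d[u] = time[0]         # u is discovered
        for v in adj[u]:
            if color[v] == 'white':
                pi[v] = u
                visit(v)
        color[u] = 'black'
        time[0] += 1; f[u] = time[0]         # u is finished

    for u in adj:
        if color[u] == 'white':
            visit(u)
    return d, f, pi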


Analysis of DFS
The loops on lines 1-2 and 5-7 take Θ(V) time,
excluding the time to execute DFS-Visit.
DFS-Visit is called exactly once for each vertex
v ∈ V, when it is painted gray for the first time.
Lines 3-6 of DFS-Visit are executed |Adj[v]| times.
The total cost of executing DFS-Visit is
Σ over v in V of |Adj[v]| = Θ(E).
Total running time of DFS is Θ(V+E).

Properties of DFS
Predecessor subgraph G forms a forest
of trees (the structure of a depth-first tree
mirrors the structure of DFS-Visit)
The discovery and finishing time have
parenthesis structure, i.e. the parentheses
are properly nested.

Parenthesis Theorem
In any DFS of a graph G = (V, E), for any two
vertices u and v, exactly one of the
followings holds:
the interval [d[u], f[u]] and [d[v], f[v]] are
entirely disjoint
the interval [d[u], f[u]] is contained entirely
within the interval [d[v], f[v]], and u is a
descendant of v in the depth-first tree, or
the interval [d[v], f[v]] is contained entirely
within the interval [d[u], f[u]], and v is a
descendant of u in the depth-first tree

Nesting of Descendent Intervals


Vertex v is a proper descendant of vertex u
in the depth-first forest for a (direct or
undirected) graph G if and only if d[u] <
d[v] < f[v] < f[u]

White-Path Theorem
In a depth-first forest of a graph G, vertex v
is a descendant of vertex u if and only if at
the time d[u] that the search discovers u,
vertex v can be reached from u along a
path consisting entirely of white vertices.


Classification of Edges

Algorithm for Edge-Classification

DFS can be used to classify edges of G:


1. Tree edges: edges in the depth-first forest G .
Edge (u, v) is a tree edge if v was first
discovered by exploring edge (u, v).
2. Back edges: edges (u, v) connecting a vertex u
to an ancestor v in a depth-first tree. Selfloops are considered to be back edges.
3. Forward edges: nontree edges (u, v)
connecting a vertex u to a descendant v in a
depth-first tree.
4. Cross edges: all other edges.

Modify DFS so that each edge (u, v) can be


classified by the color of the vertex v that is
reachable when the edge is first explored:
1. WHITE indicates a tree edge
2. GRAY indicates a back edge
3. BLACK indicates a forward or cross edges

4. Topological Sort

Topological Sort

A directed acyclic graph (DAG) arises in many


applications where there are precedence or ordering
constraints (e.g. scheduling problems).
For instance, if there are a series of tasks to be
performed, and certain tasks must precede other
tasks (e.g. in construction you must build the first
floor before the second, but you can do the electrical
wiring while you install the windows).
In general, a precedence constraint graph is a DAG,
in which vertices are tasks and edge (u,v) means that
task u must be completed before task v begins.

Given a directed acyclic graph (DAG) G = (V,


E), topological sort is a linear ordering of all
vertices of the DAG such that if G contains an
edge (u, v), u appears before v in the ordering.

Theorem
In a depth-first search of an undirected graph
G, every edge of G is either a tree edge or a
back edge.

In general, there may be many legal


topological orders for a given DAG.

Topological-Sort (G)
1. Call DFS(G) to compute finishing time
f[v] for each vertex
2. As each vertex is finished, insert it onto
the front of linked list
3. Return the linked list of vertices
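The three steps above can be sketched in a few lines of Python (my own
illustration, not the lecture's code; it assumes the DAG is an adjacency dict
and reuses the DFS structure from the previous section):

def topological_sort(adj):
    # adj: dict vertex -> list of successors in a DAG
    # returns vertices in topological order (reverse order of DFS finishing times)
    color = {u: 'white' for u in adj}
    order = []                                # plays the role of the linked list

    def visit(u):
        color[u] = 'gray'
        for v in adj[u]:
            if color[v] == 'white':
                visit(v)
        color[u] = 'black'
        order.append(u)                       # u is finished

    for u in adj:
        if color[u] == 'white':
            visit(u)
    order.reverse()                           # appending + reversing simulates front-insertion
    return order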


Another Example

Consider the example given in the book, but
we do our DFS in a different order, so we get
a different final ordering. Both are
legitimate, given the precedence constraints.
[Figure: the clothing DAG (u-shorts, pants, belt, shirt, tie, jacket,
socks, shoes) with DFS discovery/finishing times such as
u-shorts 1/10, shoes 7/8 and socks 15/16.]
Final ordering: socks, shirt, tie, u-shorts, pants, shoes, belt, jacket

Analysis of Topological-Sort
Line 1: DFS takes Θ(V+E)
Line 2: each insertion takes O(1), for |V| vertices
Total: Θ(V+E) time

Lemma
A directed graph G is acyclic if and only if a
DFS of G yields no back edges.

A Compact Version: TopSort(G)
1. for each u ∈ V, color[u] = white;   // initialization
2. L = new linked_list;                // L: empty linked list
3. for each u ∈ V
4.     if (color[u] == white) TopVisit(u);
5. return L;                           // L gives the final order

TopVisit(u) {                          // start search at u
1. color[u] = gray;                    // mark u visited
2. for each v ∈ Adj(u)
3.     if (color[v] == white) TopVisit(v);
4. append u to the front of L          // finish & add u to L
}

5. Strongly Connected Components

Digraphs are often used in communication and
transportation networks, where people want to
know that the networks are complete in the
sense that from any location it is possible to
reach another location in the digraph.

A digraph is strongly connected, if for every


pair of vertices u, v V, u can reach v and vice
versa. We say that two vertices u and v are
mutually reachable.
Mutual reachability is an equivalence relation.

Component DAG
If we merge the vertices in each strong component into
a single super vertex, and join two super vertices (A, B)
if and only if there are vertices u ∈ A and v ∈ B such that
(u, v) ∈ E, then the resulting digraph, called the
component digraph, is necessarily acyclic.
[Example: a digraph with strong components {a, b, c}, {d, e}
and {f, g, h, i}, and its component DAG.]

Ordering DFS
Once the DFS starts within a given strong component, it
must visit every vertex within the component (and possibly
some others) before finishing.
If we don't start in reverse topological order,
then the search may "leak out" into other strong components.
However, by visiting components in reverse
topological order of the component tree, each
search cannot leak out into other components,
since they would have already been visited earlier
in the search.
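This two-pass idea (DFS, reverse the graph, DFS again in decreasing finish
time) is what the StrongComp procedure below carries out. A compact Python
sketch, assuming an adjacency-dict digraph (illustrative, not the lecture's code):

def strongly_connected_components(adj):
    color = {u: 'white' for u in adj}
    finish_order = []
    def visit(u, graph, out):                 # plain recursive DFS helper
        color[u] = 'gray'
        for v in graph[u]:
            if color[v] == 'white':
                visit(v, graph, out)
        out.append(u)                         # record u when it finishes
    for u in adj:                             # pass 1: DFS on G, by finishing time
        if color[u] == 'white':
            visit(u, adj, finish_order)
    rev = {u: [] for u in adj}                # build the reverse graph R
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    color = {u: 'white' for u in adj}
    components = []
    for u in reversed(finish_order):          # pass 2: DFS on R in decreasing f[u]
        if color[u] == 'white':
            comp = []
            visit(u, rev, comp)
            components.append(comp)           # each DFS tree is one strong component
    return components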


StrongComp(G)
1. Run DFS(G) to compute the finish time f[u] for each vertex u
2. Compute R = Reverse(G), reversing all edges of G
3. Sort the vertices of R (by CountingSort) in
   decreasing order of f[u]
4. Run DFS(R) using this order
5. Each DFS tree is a strong component; output the
   vertices of each tree in the DFS forest
Total running time is Θ(n + e)

COMP3111/3811 Algorithms 2

Lecture 8-1

Minimum Spanning Trees
(Chapter 23)

1. Minimum Spanning Trees


Problem: Connect a set of nodes by a network
of minimal total length
Applications:
Communication networks
Circuit design
Layout of highway systems
Motivation
To minimize the length of a connecting
network, it never pays to have cycles.
The resulting connection graph is
connected, undirected, and acyclic, i.e., a
(free) tree: this is the MST problem.

Formal Definition of MST


Given a connected, undirected, graph G = (V, E),
a spanning tree is an acyclic subset of edges
T E that connects all the vertices together.
Assuming G is weighted, we define the cost of a
spanning tree T to be the sum of the edge weights
in the spanning tree:
w(T) = Σ over (u,v) in T of w(u,v)
A minimum spanning tree (MST) is a spanning
tree of minimum weight.

Steiner Minimum Trees (SMT)


Given a undirected graph G = (V, E) with edge
weights and a subset of vertices V V, called
terminals.
We wish to compute a connected acyclic
subgraph of G that includes all terminals.
MST is just a SMT with V =V.

Not only do the edges sum to the same value,


but the same set of edge weights appear in
the two MSTs.
NOTE: An MST may not be unique.


Generic Approaches
Two greedy algorithms for computing MSTs:
Kruskal's Algorithm
Prim's Algorithm

Facts about (Free) Trees


A tree with n vertices has exactly n-1 edges
(|E| = |V| - 1)
There exists a unique path between any two
vertices of a tree
Adding any edge to a tree creates a unique
cycle; breaking any edge on this cycle restores
a tree

Generic-MST(G, w)
1. A ← ∅                    // A trivially satisfies the invariant
// lines 2-4 maintain the invariant
2. while A does not form a spanning tree
3.     do find an edge (u,v) that is safe for A
4.        A ← A ∪ {(u,v)}
5. return A                 // A is now an MST

When is an Edge Safe?


If we have computed a partial MST, and we
wish to know which edges can be added
that do NOT induce a cycle in the current
MST, any edge that crosses a respecting
cut is a possible candidate.
Intuition says that since all edges crossing
a respecting cut do not induce a cycle, then
the lightest edge crossing a cut is a natural
choice.

Intuition Behind Greedy MST
Maintain a subset of edges A (initially empty):
add edges one at a time, until A equals an MST.
A subset A ⊆ E is viable if A is a subset of the edges
of some MST.
An edge (u,v) ∈ E - A is safe if A ∪ {(u,v)} is viable,
i.e. (u,v) is a safe choice to add so that
A can still be extended to form an MST.
Note: if A is viable it cannot contain a cycle.
A generic greedy algorithm: repeatedly add
any safe edge to the current spanning tree.

Definitions
A cut (S, V-S) is just a partition of the vertices
into 2 disjoint subsets.
An edge (u, v) crosses the cut if one endpoint is
in S and the other is in V-S.
Given a subset of edges A, we say that a cut
respects A if no edge in A crosses the cut.
An edge of E is a light edge crossing a cut, if
among all edges crossing the cut, it has the
minimum weight (the light edge may not be
unique if there are duplicate edge weights).

Theorem:
Let G = (V, E) be a connected, undirected graph
with real-value weights on the edges.
Let A be a viable subset of E (i.e. a subset of some
MST), let (S, V-S) be any cut that respects A, and
let (u,v) be a light edge crossing this cut.
Then, the edge is safe for A.
Proof:
Show that A ∪ {(u,v)} is a subset of some MST:
1. Find an arbitrary MST T containing A
2. Use a cut-and-paste technique to find
   another MST T' that contains A ∪ {(u,v)}

Let T be any MST for G containing A.
We know such a tree exists because A is viable.
If (u, v) is in T then we are done.
If (u, v) is not in T, then add it to T, thus creating a cycle.
Since u and v are on opposite sides of the cut,
and since any cycle must cross the cut an
even number of times, there must be at least
one other edge (x, y) in T that crosses the cut.
The edge (x, y) is not in A (because the cut respects A).
By removing (x,y) we restore a spanning tree, T'.

Now we show:
T' is a minimum spanning tree
A ∪ {(u,v)} is a subset of T'
T' is an MST: we have
w(T') = w(T) - w(x,y) + w(u,v).
Since (u,v) is a light edge crossing the cut, we
have w(u,v) <= w(x,y). Thus w(T') <= w(T),
so T' is also a minimum spanning tree.
A ∪ {(u,v)} ⊆ T': remember that (x, y) is not in A.
Thus A ⊆ T - {(x, y)}, and thus
A ∪ {(u,v)} ⊆ (T - {(x, y)}) ∪ {(u,v)} = T'.

2. Kruskals Algorithm
Attempts to add edges to A in increasing order
of weight (lightest edge first)
If the next edge does not induce a cycle
among the current set of edges, then it is
added to A.
If it does, then this edge is passed over, and
we consider the next edge in order.
As this algorithm runs, the edges of A will
induce a forest on the vertices and the trees
of this forest are merged together until we
have a single tree containing all vertices.

Detecting a Cycle
We can perform a DFS on subgraph induced by
the edges of A, but this takes too much time.
Use disjoint set UNION-FIND data structure.
This data structure supports 3 operations:
Create-Set(u): create a set containing u.
Find-Set(u): Find the set that contains u.
Union(u, v): Merge the sets containing u and v.
Each can be performed in O(lg n) time.
The vertices of the graph will be elements to be
stored in the sets; the sets will be vertices in
each tree of A (stored as a simple list of edges).


MST-Kruskal(G, w)
1. A ← ∅                                               // initially A is empty
2. for each vertex v ∈ V[G]                            // O(V) time
3.     do Create-Set(v)                                // create a set for each vertex
4. sort the edges of E by nondecreasing weight w       // O(E lg E)
5. for each edge (u,v) ∈ E, in order by nondecreasing weight
6.     do if Find-Set(u) ≠ Find-Set(v)                 // u & v are in different trees
7.        then A ← A ∪ {(u,v)}
8.             Union(u,v)                              // O(E lg E) over all unions
9. return A
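A Python sketch of the same procedure, with a small UNION-FIND structure
standing in for Create-Set / Find-Set / Union (illustrative only; vertices are
assumed to be numbered 0..n-1 and edges given as (weight, u, v) triples):

def kruskal(n, edges):
    parent = list(range(n))
    rank = [0] * n
    def find(x):                              # Find-Set with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    def union(x, y):                          # Union by rank
        x, y = find(x), find(y)
        if rank[x] < rank[y]:
            x, y = y, x
        parent[y] = x
        if rank[x] == rank[y]:
            rank[x] += 1
    A = []
    for w, u, v in sorted(edges):             # edges in nondecreasing weight: O(E lg E)
        if find(u) != find(v):                # u and v are in different trees: no cycle
            A.append((u, v, w))
            union(u, v)
    return A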

Total running time is O(E lg E).

3. Prim's Algorithm
Consider the set of vertices S currently part of
the tree, and its complement (V - S).
We have a cut of the graph, and the current set of
tree edges A is respected by this cut.
Which edge should we add next? A light edge!

Basics of Prim s Algorithm

Implementation: Priority Queue

It works by adding leaves on at a time to the current


tree.
Start with the root vertex r (it can be any vertex).
At any time, the subset of edges A forms a single
tree. S = vertices of A.
At each step, a light edge connecting a vertex in
S to a vertex in V- S is added to the tree.
The tree grows until it spans all the vertices in V.

Priority queue implemented using heap can support the


following operations in O(lg n) time:
Insert (Q, u, key): Insert u with the key value key in Q
u = Extract_Min(Q): Extract the item with minimum
key value in Q
Decrease_Key(Q, u, new_key): Decrease the value of
us key value to new_key

Implementation Issues:
How to update the cut efficiently?
How to determine the light edge quickly?

All the vertices that are not in S (the set of vertices of the
edges in A) reside in a priority queue Q based on a key
field. When the algorithm terminates, Q is empty, and
A = {(v, π[v]) : v ∈ V - {r}}.

MST-Prim(G, w, r)
1.  Q ← V[G]
2.  for each vertex u ∈ Q                   // initialization: O(V) time
3.      do key[u] ← ∞
4.  key[r] ← 0                              // start at the root
5.  π[r] ← NIL                              // set parent of r to be NIL
6.  while Q ≠ ∅                             // until all vertices are in the MST
7.      do u ← Extract-Min(Q)               // vertex with the lightest edge
8.         for each v ∈ Adj[u]
9.             do if v ∈ Q and w(u,v) < key[v]
10.                then π[v] ← u
11.                     key[v] ← w(u,v)     // new lighter edge out of v
12.                     Decrease-Key(Q, v, key[v])
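A Python sketch of MST-Prim using the standard library heap (illustrative
only; the adjacency structure maps u to (v, w) pairs). One deviation worth
noting: heapq has no Decrease-Key, so stale heap entries are simply skipped
("lazy deletion") instead of being updated in place.

import heapq

def prim(adj, r):
    key = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    key[r] = 0
    in_mst = set()                             # plays the role of S = V - Q
    pq = [(0, r)]
    while pq:
        k, u = heapq.heappop(pq)
        if u in in_mst or k > key[u]:          # stale entry: skip it
            continue
        in_mst.add(u)
        for v, w in adj[u]:
            if v not in in_mst and w < key[v]: # found a lighter edge out of v
                key[v] = w
                pi[v] = u
                heapq.heappush(pq, (w, v))
    return pi                                  # MST edges are (pi[v], v) for v != r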


Analysis of Prim

Extracting the vertex from the queue: O(lg V)

For each incident edge, decreasing the key of


the neighboring vertex: O(lg V)

The other steps are constant time.

The overall running time: O(V lg V + E lg V) =


O(E lg V) : same as Kruskals

1. Definitions for Shortest Paths


Example: street map or road map
Cities or intersections: vertices
Distance between adjacent intersections:
weight of the edge.
Directed graph G=(V,E)
Weight function: E -> R
Weight of path from vertex u to vertex v: sum of
edge weights along the path.
shortest path ( (u,v)) from vertex u to vertex v:
path of minimum weight

COMP3111/3811 Algorithms 2

Lecture 8-2

Shortest Path
(Chapter 24)

Single-Source Shortest Paths


Given a directed graph G = (V, E) with edge
weights and a distinguished source vertex,
sV, determine the distance from the source
vertex to every vertex in the graph.
BFS finds shortest paths from a single source
vertex to all other vertices in O(n+e) time,
assuming the graph has no edge weights.
Edge weights can be negative; but in order for
the problem to be well-defined there must be
no cycle whose total cost is negative.

Variants
Single-destination shortest-paths problem:
Find a shortest path to a given destination
vertex t from every vertex v V.
Single-pair shortest-path problem: Find a
shortest path from u to v for given vertices u
and v.

Optimal substructure
A shortest path contains other shortest paths
within it
Greedy: Dijkstras algorithm
Dynamic: Floyd-Warshall algorithm
[lemma] Subpaths of shortest paths are
shortest paths
vi

All-pairs shortest-paths problem: Find a


shortest path from u to v for every pair of
vertices u and v.

v1

vj
vk


Triangle inequality
[Lemma] δ(s,v) <= δ(s,u) + w(u,v), for each edge (u,v) ∈ E
[Theorem] δ(s,v) <= δ(s,u) + δ(u,v)

Well-definedness
Negative-weight cycle on a path: shortest paths
are not well defined.
If there is a negative-weight cycle on some path from
s to v, we define δ(s,v) = -∞.
δ(s,v) = ∞: v is not reachable from s.

Cycles
Can a shortest path contain a cycle?
Negative-weight cycle
Positive-weight cycle
0-weight cycle
WLOG, we can assume that we find a cycle-free
shortest path.
at most |V|-1 edges

Representing shortest paths
During execution: predecessor subgraph
Similar to BFS:
for each v, maintain a predecessor π[v] of v.
At termination: shortest-path tree Gπ = (Vπ, Eπ)
Vπ: the set of vertices reachable from s
Gπ: a rooted tree with root s
For all v in Vπ, the unique path from s to v in
Gπ is a shortest path from s to v in G.


Relaxation
For each v, maintain d[v], shortest path estimate:
an upper bound on the weight of the shortest path for
each vertex v.
This value will always be greater than or equal to the
true shortest path distance from s to v.
Initially, d[v] = ∞ for all v and d[s] = 0.
As the algorithm goes on and sees more vertices, it
tries to update d[v] for each vertex in the graph, until
all d[v] values converge to true shortest distances.
Relaxing an edge (u,v): testing whether we can improve
the shortest path to v by going through u (if yes, then
update d[v])

O(V) time

Find Shortest Path by Relaxation


If the solution is not yet an optimal value, then push a
little closer to the optimum.
If we find a path from s to v shorter than d[v], then
update d[v].
Consider an edge (u,v) with weight w(u,v).
Suppose that we have already computed current
estimates on d[u] and d[v].
We know that there is a path from s to u of weight
d[u].
By taking this path and following it with the edge
(u,v) we get a path to v of length d[u]+w(u,v).
If this path is better than the existing path of length
d[v] to v, we should take it.

Relax(u, v, w)
1. if d[v] > d[u] + w(u,v)        // is the path through u shorter?
2.     then d[v] ← d[u] + w(u,v)  // yes, then take it
3.          π[v] ← u              // the shortest way back to the source is through u
                                  // (update the predecessor pointer)
NOTE: If we perform Relax(u, v, w) repeatedly
over all edges of the graph, all the d[v] values
will eventually converge to the true final
distance values from s.
How to do this most efficiently? (How many
times? In what order?)

2. Bellman-Ford Algorithm
Simple
Allows negative edge weights
Can test whether a negative-weight cycle is
reachable from the source
Relaxes each edge many times: progressively
decreasing the estimate d[v] until it achieves
the actual shortest-path weight δ(s,v)
O(VE) time
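A minimal Python sketch of Bellman-Ford (illustrative only; the graph is given
as a vertex list plus a list of (u, v, w) edge triples):

def bellman_ford(vertices, edges, s):
    d = {v: float('inf') for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):          # relax every edge |V|-1 times: O(VE)
        for u, v, w in edges:
            if d[u] + w < d[v]:                 # Relax(u, v, w)
                d[v] = d[u] + w
                pi[v] = u
    for u, v, w in edges:                       # one more pass detects a negative cycle
        if d[u] + w < d[v]:
            return None, None                   # negative-weight cycle reachable from s
    return d, pi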


3. Shortest paths in DAG


Consider weighted DAG
Shortest path: always well defined
no negative weight cycle
Use topological sort
Relax edges in topologically sorted order
just one pass
Runs in linear time (adjacency list): O(V+E)
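A short Python sketch of the DAG shortest-path idea just described
(illustrative only; it assumes a topologically sorted vertex order is supplied,
e.g. by the topological_sort sketch given earlier):

def dag_shortest_paths(adj, order, s):
    # adj: dict u -> list of (v, w); order: vertices in topological order; s: source
    d = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    d[s] = 0
    for u in order:                       # relax edges out of each vertex exactly once
        for v, w in adj[u]:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                pi[v] = u
    return d, pi                          # O(V+E) time overall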

Application
Determine critical paths in PERT(program
evaluation and review technique) chart
analysis
Edge: job to be performed
Edge weight: time required to perform the job
A path: a sequence of jobs that must be
performed in a particular order
Critical path: a longest path through the DAG
u

job (u,v) must be performed prior to job (v,x)


We can find a critical path by either
negating the edge weights and running DAG-SHORTEST-PATH, or
running DAG-SHORTEST-PATH with the modification:
Initialize-Single-Source: replace ∞ by -∞
Relax: replace > by <

COMP3111/3811 Algorithms 2

Lecture 9-1/9-2

Single-Source Shortest Paths
(Chapter 24)

4. Dijkstra's Algorithm
Non-negative edge weights.
Maintain a subset of vertices S ⊆ V for which we know
the true distance: d[u] = δ(s,u).
Initially S = ∅; set d[s] = 0 and all other d values to ∞.
Greedy approach:
select a vertex u from V - S with minimum d[u]
add u to S
relax all edges leaving u
Store the vertices of V - S in a priority queue (heap),
where the key value of each vertex u is d[u]: all queue
operations, including Decrease-Key, take O(log V) time.
Total: O(E log V) time.
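A Python sketch of Dijkstra's algorithm with the standard library heap
(illustrative only; as with Prim, heapq has no Decrease-Key, so stale entries
are skipped instead of updated):

import heapq

def dijkstra(adj, s):
    # adj: dict u -> list of (v, w) with w >= 0; s: source vertex
    d = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    d[s] = 0
    done = set()                              # plays the role of the set S
    pq = [(0, s)]
    while pq:
        du, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u]:
            if v not in done and du + w < d[v]:   # relax edge (u, v)
                d[v] = du + w
                pi[v] = u
                heapq.heappush(pq, (d[v], v))
    return d, pi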

[Worked example: Dijkstra's algorithm run step by step on a small
weighted graph starting from a with d[a] = 0; black vertices are in S.]

The Sequence of Relaxations
[Figure: the same example, showing the order in which edges are relaxed.]

Correctness

δ(s,v): length of a true shortest path from s to v.
Need to show: d[v] = δ(s,v) for each v when the
algorithm terminates.
[Lemma] When a vertex u is added to S, d[u] = δ(s,u).
Proof:
Suppose that at some point Dijkstra's
algorithm first attempts to add a vertex u to S
for which d[u] ≠ δ(s,u).
Note that d[u] is never less than δ(s,u), thus
d[u] > δ(s,u).

Proof (I)
Just prior to the insertion of u, consider the true
shortest path from s to u.
Because s S and u V - S, at some point this
path must first jump out of S.
Let (x, y) be the edge where it jumps out, so
that x S and y V - S.
(It might happen that x=s and/or y=u).

Proof (II): y ≠ u
We argue that y ≠ u after all.
Since x ∈ S we have d[x] = δ(s,x). (Remember that
u was the first vertex added to S that violated
this criterion.)
Since we applied relaxation to the edges leaving x when it was
added, we would have set d[y] = d[x] + w(x,y) = δ(s,y).
Since (x,y) is on a shortest path from s to u, it is
on a shortest path from s to y. Thus d[y] is now correct.
By hypothesis, d[u] is not correct, so u and y
cannot be the same.

Conclusion of Proof
Notice:
y appears somewhere along the shortest path
from s to u (but not at u).
All subsequent edges following y have weight >= 0,
therefore δ(s,y) <= δ(s,u), so d[y] = δ(s,y) <= δ(s,u) < d[u].
But d[u] <= d[y], because u was added before y, and
we add nodes with lower d values first.
Contradiction!

Correctness
We have just proved that, when a vertex u
is added to S, d[u] = δ(s,u).
Every vertex is eventually added to S, so
the algorithm assigns the correct
distances to all vertices.
This also guarantees that the shortest-paths
tree is correct, because relaxation
correctly updates the parent pointers.

All Pair Shortest Paths
(Chapter 25)

1. Definitions
n-vertex directed graph G = (V, E)
Adjacency-matrix representation: W = (wij)
wij = 0 if i = j
wij = the weight of the directed edge (i, j)   if i ≠ j and (i, j) ∈ E
wij = ∞                                        if i ≠ j and (i, j) ∉ E
Tabular output: n x n matrix D = (dij)
dij: weight of a shortest path from vertex i to vertex j,
i.e. δ(i, j), the shortest-path weight from vertex i to vertex j
Predecessor matrix: Π = (πij)
Predecessor subgraph: Gπ,i = (Vπ,i, Eπ,i)

2. Shortest paths & matrix multiplication

Step 1. The structure of a shortest path
Consider a shortest path p from i to j with at most m edges.
Decompose p into p' + (k, j), where
p' is a shortest path from i to k (with at most m-1 edges).
Then δ(i, j) = δ(i, k) + wkj.
[Figure: a path from i to k with <= m-1 edges, followed by the edge (k, j).]

Step 2. Recursive solution
lij(m): the minimum weight of any path from i to j containing at
most m edges
lij(0) = 0 if i = j
lij(0) = ∞ if i ≠ j
lij(m) = min{ lij(m-1), min over 1 <= k <= n of { lik(m-1) + wkj } }
       = min over 1 <= k <= n of { lik(m-1) + wkj }   (k: predecessor)

Step 3. Computing the shortest-path weights bottom up
Input: W = (wij)
Compute a series of matrices L(1), L(2), ..., L(n-1),
where L(m) = (lij(m))
L(1) = W
Actual shortest-path weights (assuming no negative-weight cycle):
δ(i, j) = lij(n-1) = lij(n) = lij(n+1) = ...

Matrix multiplication
cij = Σ over k = 1 to n of aik · bkj

Substitution
lij(m) = min over 1 <= k <= n of { lik(m-1) + wkj }
l(m-1) -> a
w      -> b
l(m)   -> c
min    -> +
+      -> ·
Each such "product" takes O(n³) time.

Compute shortest paths by extending shortest paths
edge by edge:
L(1) = L(0)·W = W
L(2) = L(1)·W = W²
L(3) = L(2)·W = W³
...
L(n-1) = L(n-2)·W = W^(n-1)
Each product: O(n³)
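A Python sketch of one min-plus "product" and the resulting slow all-pairs
algorithm (illustrative only; W is an n x n list-of-lists with 0 on the
diagonal and float('inf') where there is no edge):

def extend_shortest_paths(L, W):
    # one min-plus product: R[i][j] = min over k of ( L[i][k] + W[k][j] ); O(n^3)
    n = len(W)
    INF = float('inf')
    R = [[INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if L[i][k] + W[k][j] < R[i][j]:
                    R[i][j] = L[i][k] + W[k][j]
    return R

def slow_all_pairs_shortest_paths(W):
    # L(1) = W, L(2) = L(1)*W, ..., L(n-1); n-2 extensions of O(n^3) each: O(n^4)
    n = len(W)
    L = W
    for _ in range(2, n):
        L = extend_shortest_paths(L, W)
    return L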


Computing L(1), ..., L(n-1) one extension at a time,
each extension min over 1 <= k <= n of { lik(m-1) + wkj } costing O(n³),
gives O(n⁴) in total.

Improving the running time by repeated squaring:
L(1) = W
L(2) = W² = W·W
L(4) = W⁴ = W²·W²
L(8) = W⁸ = W⁴·W⁴
...
L(2^⌈lg(n-1)⌉) = W^(2^⌈lg(n-1)⌉) = W^(2^(⌈lg(n-1)⌉ - 1)) · W^(2^(⌈lg(n-1)⌉ - 1))
Since 2^⌈lg(n-1)⌉ >= n-1, we have L(2^⌈lg(n-1)⌉) = L(n-1).
Only ⌈lg(n-1)⌉ products are needed: O(n³ log n) time.

COMP3111/3811 Algorithms 2

Lecture 10- 1

All Pair Shortest Paths


(Chapter 25)

3. The Floyd-Warshall algorithm


Step 1. The structure of a shortest path
Intermediate vertex of a simple path p =<v1, v2,
, vl>: any vertex of p other than v1 & vl
Consider a subset {1, 2, .., k} of V for some k
Consider all paths from i to j whose
intermediate vertices are from {1, 2, .., k}
p: minimum weight path among them
Relationship between p and shortest paths from
i to j with intermediate vertices {1, 2, , k-1}:
whether or not k is an intermediate vertex of p


k is not an intermediate vertex of p:


k is an intermediate vertex of p: we break p into
p1 (i to k) and p2 (k to j)
Intermediate vertices of p1 & p2: {1, 2, , k-1}

Step 2. Recursive solution
dij(k): the weight of a shortest path from i to j with all
intermediate vertices in {1, 2, ..., k}
dij(k) = wij                                          if k = 0
dij(k) = min( dij(k-1), dik(k-1) + dkj(k-1) )         if k >= 1
D(n) = (dij(n)), and dij(n) = δ(i, j)

Step 3. Computing the shortest path weights


bottom up

dij(k) = min (dij(k-1) , dik(k-1) + dkj(k-1) )

O(n3)
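A minimal Python sketch of the Floyd-Warshall computation just described
(illustrative only; W is an n x n list-of-lists with 0 on the diagonal and
float('inf') where there is no edge):

def floyd_warshall(W):
    n = len(W)
    D = [row[:] for row in W]                 # D^(0) = W
    for k in range(n):                        # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D                                  # D[i][j] = delta(i, j); O(n^3) time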

Step 4. Constructing a shortest path
Predecessor matrix: Π = (πij)
Compute a sequence of matrices Π(0), Π(1), ..., Π(n), where
πij(k): predecessor of vertex j on a shortest path from i with
intermediate vertices in {1, 2, ..., k}
πij(0) = NIL   if i = j or wij = ∞
πij(0) = i     if i ≠ j and wij < ∞
πij(k) = πij(k-1)   if dij(k-1) <= dik(k-1) + dkj(k-1)
πij(k) = πkj(k-1)   if dij(k-1) >  dik(k-1) + dkj(k-1)

3. Transitive closure of directed graph


Given a directed graph, we may want to know
whether there is a path from i to j, for all pairs,
The transitive closure of G: a graph G* = (V, E*),
E*={(i,j): there is a path from i to j in G}
We can use Floyd-Warshall algorithm
Assign weight 1 to each edge of E
Run Floyd-Warshall algorithm
If there is a path from i to j, then we get dij < n
(otherwise dij = )


To save time & space, substitute logical AND for + and
logical OR for min:
tij(k) = 1 if there is a path from i to j with all
intermediate vertices in {1, 2, ..., k}; 0 otherwise
tij(n) = 1  iff  (i,j) ∈ E*
tij(0) = 0 if i ≠ j and (i,j) ∉ E
tij(0) = 1 if i = j or (i,j) ∈ E
tij(k) = tij(k-1) OR ( tik(k-1) AND tkj(k-1) )   if k >= 1

COMP3111/3811 Algorithms 2

Lecture 10- 2

Maximum Flow
(Chapter 26)

0. Motivation

Interpret a directed graph as a flow network:
Each directed edge is a conduit for some material.
Each conduit has a capacity: the maximum rate at which
material can flow through it.
Vertices are conduit junctions.
The source produces material at some rate, and the sink
consumes the material at the same rate.
Flow of the material at any point in the system: the rate
at which the material moves.
Flow conservation: the rate at which material enters a
vertex equals the rate at which it leaves the vertex.

A flow network can model:
Liquids flowing through pipes
Parts through assembly lines
Current through electrical networks
Information through communication networks

Maximum Flow problem:
We want to compute the greatest rate at which
material can be shipped from the source to the sink
without violating any capacity constraints.


Example of flow network

1. Flow Networks
flow network G=(V, E): a directed graph
Each edge (u, v) in E has a nonnegative
capacity c(u,v) >= 0
If (u,v) not in E: we assume c(u,v)=0
Two distinguished vertices : a source s and
a sink t
Assume G is connected: every vertex v lies
on a path from s to t

Flow: given a flow network G, a flow f is a real-valued function f: VxV -> R that satisfies the
following properties
Capacity constraints: for all u, v in V, we
require f(u,v) <= c(u,v)
Skew symmetry: for all u, v in V, we require
f(u,v) = -f(v,u)
Flow conservation: for all u in V-{s,t}, we
require Σ(v in V) f(u,v) = 0

Neither (u,v) nor (v,u) in E: f(u,v) = f(v,u) = 0

f(u,v): flow from u to v


positive, 0, or negative
Value of a flow f: |f| = Σ(v in V) f(s,v)
the total flow out of the source
Maximum flow problem: we wish to find a
flow of maximum value

Example of flow network

Total positive flow entering a vertex v:
Σ(u in V: f(u,v) > 0) f(u,v)

Total net flow at a vertex (other than s & t):
(Total positive flow leaving the vertex) - (Total
positive flow entering the vertex) = 0
Flow in equals flow out
Flow conservation property


Cancellation

Networks with multiple sources and sinks

WLOG, we can say positive flow goes either


from u to v or from v to u, but not both
if not true, can transform by cancellation to be
true

Set of m factories {s1, s2, ..., sm} and a set of n
warehouses {t1, t2, ..., tn}
Add a super source s

[Figure: cancellation example. Flows 2/3 on (u,v) and 1/2 on (v,u)
become 1/3 and 0/2 after cancelling 1 unit in each direction]

Directed edge (s, si) with capacity c(s, si) = ∞

Add a super sink t
Directed edge (ti, t) with capacity c(ti, t) = ∞

Implicit summation notation

A function applied to sets implies summation: simplicity
(eg) X, Y: sets of vertices, f(X,Y) = Σ(x in X) Σ(y in Y) f(x,y)
|f| = Σ(v in V) f(s,v) = f(s,V)
f(u,V) = 0: flow conservation
[lemma] G: flow network and f: flow of G. Then:
1. For all subsets X of V, f(X,X) = 0
2. For all subsets X, Y of V, f(X,Y) = - f(Y,X)
3. For all disjoint subsets X, Y and any subset Z of V, f(X U Y, Z) = f(X,Z) + f(Y,Z) and
f(Z, X U Y) = f(Z,X) + f(Z,Y)

2. Ford-Fulkerson method

Residual networks
Augmenting paths
Cuts

|f| = f(s,V)
    = f(V,V) - f(V-s,V)     : by 3
    = -f(V-s,V)             : by 1
    = f(V,V-s)              : by 2
    = f(V,t) + f(V,V-s-t)   : by 3
    = f(V,t)                : flow conservation


Residual networks
Residual capacity cf(u,v) = c(u,v) - f(u,v)
Residual network Gf = (V, Ef)
Ef = {(u,v) in VxV: cf (u,v)>0}
Gf : flow network with capacity cf
[lemma] G=(V,E): flow network, f: flow of G
Gf: residual network of G induced by f
f': flow in Gf
Then, the flow sum f+f' is a flow in G with value
|f+f'| = |f|+|f'|.
* Function VxV to R: (f+f')(u,v) = f(u,v) + f'(u,v)

Augmenting paths
Augmenting path p: simple path from s to t in Gf
Residual capacity of p:
cf (p)= min{cf (u,v): (u,v) in p}
[lemma] G=(V,E): flow network, f: flow of G, p:
augmenting path in Gf, define fp: VxV -> R by
fp (u,v) = cf (p) if (u,v) on p
= -cf (p) if (v,u) on p
= 0 otherwise
Then fp is a flow in Gf with |fp|= cf (p)>0
[corollary] define f': VxV -> R by f' = f + fp.
Then f' is a flow of G with |f'| = |f|+|fp| > |f|

Cuts of flow networks


A cut (S,T) of flow network G: partition of V
into S and T = V-S such that s in S and t in T
f(S,T): the net flow across the cut (S,T)
c(S,T): capacity of the cut
Minimum cut: a cut whose capacity is
minimum over all cuts of the network

[lemma]
f: flow of a flow network G, (S,T): a cut of G.
Then the net flow across (S,T) f(S,T) = |f|
<pf>
f(S,T) = f(S,V) - f(S,S)     : by 3
       = f(S,V)              : by 1
       = f(s,V) + f(S-s,V)   : by 3
       = f(s,V)              : flow conservation
       = |f|

Example: capacity c(S,T) = 12 + 14 = 26,
net flow f(S,T) = 12 + 11 - 4 = 19


max-flow min-cut theorem

[corollary]
The value of any flow f is bounded by the
capacity of any cut of G
<pf>
|f| = f(S,T) = Σ(u in S) Σ(v in T) f(u,v)
            <= Σ(u in S) Σ(v in T) c(u,v) = c(S,T)

[theorem] f: flow of a flow network G
Then, the following conditions are equivalent:
1. f is a maximum flow in G
2. The residual network Gf contains no augmenting
path
3. |f| = c(S,T) for some cut (S,T) of G
<pf> 1 => 2: by contradiction
3 => 1: |f| <= c(S,T) for every cut, by the corollary;
|f| = c(S,T) implies f is a maximum flow.

<pf> 2 => 3
Suppose that Gf has no augmenting path
Define S = {v in V | there is a path from s to v in Gf},
T = V-S
(S,T): a cut
For all vertices u in S and v in T, f(u,v) = c(u,v)
(otherwise (u,v) would be in Gf, and then v would be in S)
By the lemma, |f| = f(S,T) = c(S,T)

Ford-Fulkerson algorithm
O( E |f*| ), f*: maximum flow found by the algorithm,
with integral capacities
O(E): time to find an augmenting path by DFS or BFS


The Edmonds-Karp algorithm

Computing the augmenting path p with
breadth first search: shortest path from s to t
in the residual network, where each edge has
unit distance (weight)
O(VE^2)
[lemma] E-K algorithm: for all vertices v in V-{s,t}, the shortest path distance δf(s,v) in the
residual network Gf increases monotonically
with each flow augmentation
[theorem] E-K algorithm: the total # of flow
augmentations is O(VE)
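
A sketch of the Edmonds-Karp idea (BFS for the shortest augmenting path),
assuming capacities are given as an n x n matrix and s, t are vertex
indices; the flow matrix keeps skew symmetry, so residual capacities are
cap - flow.

    from collections import deque

    def edmonds_karp(cap, s, t):
        n = len(cap)
        flow = [[0] * n for _ in range(n)]
        max_flow = 0
        while True:
            # BFS in the residual network: shortest augmenting path from s to t
            parent = [-1] * n
            parent[s] = s
            q = deque([s])
            while q and parent[t] == -1:
                u = q.popleft()
                for v in range(n):
                    if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                        parent[v] = u
                        q.append(v)
            if parent[t] == -1:              # no augmenting path: f is maximum
                return max_flow
            # residual capacity of the path: min residual capacity of its edges
            cf = float('inf')
            v = t
            while v != s:
                u = parent[v]
                cf = min(cf, cap[u][v] - flow[u][v])
                v = u
            # augment along the path (skew symmetry: f(v,u) = -f(u,v))
            v = t
            while v != s:
                u = parent[v]
                flow[u][v] += cf
                flow[v][u] -= cf
                v = u
            max_flow += cf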

3. Maximum bipartite matching


Problem definition: given an undirected graph
G=(V,E), a matching M is a subset of the edges E
such that for every vertex v, at most one edge of M
is incident on v.
v is matched by matching M if some edge in M
is incident on v; otherwise unmatched
A maximum matching M is a matching of
maximum cardinality: |M| >= |M'| for any
matching M'
Bipartite graph: V can be partitioned into L U R,
L, R: disjoint and all edges in E go between L &
R

Application of maximum bipartite matching


Set L of machines & set R of tasks to be
performed simultaneously
Presence of edge (u,v) in E: machine u is
capable of performing task v

Finding a maximum bipartite matching


Use the Ford-Fulkerson method
Corresponding flow network G'=(V',E') for the
bipartite graph G
V' = V U {s, t}
E' = {(s,u): u in L}
   U {(u,v): u in L, v in R, and (u,v) in E}
   U {(v,t): v in R}
Assign unit capacity to each edge in E'
|E'| = O(E)


[lemma]
G=(V,E): a bipartite graph with vertex partition V
= L U R
G'=(V',E'): its corresponding flow network
If M is a matching in G, then there is an integer-valued flow f in G' with value |f| = |M|.
Conversely, if f is an integer-valued flow in G',
then there is a matching M in G with
cardinality |M| = |f|.

[theorem] if the capacity function c takes on
only integral values, then the maximum flow f
produced by the Ford-Fulkerson method has the
property that |f| is integer-valued.
Moreover, for all vertices u and v, the value
f(u,v) is an integer.
[corollary] the cardinality of a maximum matching
M in a bipartite graph G is the value of a
maximum flow f in its corresponding flow
network G'.
Running time: O(V E') = O(V E)
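
A sketch of this reduction, assuming L = {0..num_left-1}, R = {0..num_right-1},
edges is a list of (u, v) pairs with u in L and v in R, and that the
edmonds_karp routine sketched earlier is available as the max-flow solver.

    def max_bipartite_matching(num_left, num_right, edges):
        # Build the corresponding flow network G': super source s, super sink t,
        # unit capacity on every edge; the value of a max flow = |maximum matching|.
        n = num_left + num_right + 2
        s, t = n - 2, n - 1
        cap = [[0] * n for _ in range(n)]
        for u in range(num_left):
            cap[s][u] = 1                        # (s, u) for every u in L
        for v in range(num_right):
            cap[num_left + v][t] = 1             # (v, t) for every v in R
        for u, v in edges:
            cap[u][num_left + v] = 1             # (u, v) for every edge of G
        return edmonds_karp(cap, s, t)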

COMP3111/3811 Algorithms 2

Lecture 11- 1

Amortized Time Complexity
(Chapter 17)

Amortized analysis
Time required to perform a sequence of data structure
operations is averaged over all the operations
performed
Show that the average cost of an operation is small
No probability involved
Guarantee the average performance of each operation
in the worst case
Three techniques
1. Aggregate analysis
2. Accounting method
3. Potential method

Amortized analysis
1. Aggregate analysis
n operations take T(n) time
Average cost of an operation: T(n)/n
Imprecise: don't get a separate cost for each type of
operation
2. Accounting method
Charge each operation an (invented) amortized cost
Amount not used is stored in the bank
Later operations can use the stored work
Balance must not go negative
3. Potential method
Stored work (accounting method) viewed
as potential energy
Most flexible & powerful view


1. Aggregate analysis
A sequence of n operations: T(n) worst-case time
Amortized cost (worst case average cost): T(n)/n
This cost applies to each operation, even when
operations of different types appear in the sequence

Example: Stack Operations
Two stack operations
PUSH(S,x)
POP(S)
O(1) time
Total cost of a sequence of n PUSH & POP: O(n)

MULTIPOP(S,k): removes the top k objects of
stack S (or pops the entire stack if it has fewer than k objects)
Worst case cost of a MULTIPOP: O(n)
n PUSH, POP, MULTIPOP operations:
naive bound for the sequence: O(n^2)
Aggregate analysis
Sequence of n operations: O(n)
# of POPs (including pops within MULTIPOP): at most # of PUSH operations: O(n)
Average cost per operation: O(1)

2. Accounting method

Assign a differing charge to different operations:
the amortized cost

Credit can be used later on to help pay for operations


whose amortized cost is less than their actual cost

When an operation's amortized cost exceeds its actual
cost, the difference is assigned to specific objects in
the DS as credit

Must choose the amortized cost carefully

Σ(i=1 to n) ĉi >= Σ(i=1 to n) ci : upper bound

ĉi : amortized cost of the ith operation
ci : actual cost of the ith operation

Total credit: Σ(i=1 to n) ĉi - Σ(i=1 to n) ci (>= 0)

Example: Stack operations


Actual cost of operation
PUSH: 1
POP: 1

When we PUSH: use 1 to pay the actual cost,
and leave 1 credit on the pushed object
When we POP: we pay the actual cost using the
credit

MULTIPOP: min(k,s), s: stack size


Assign amortized cost
PUSH: 2
POP: 0
MULTIPOP: 0
Show that we can pay for any sequence of stack
operations by charging the amortized costs

Since each object on the stack has 1 credit and


the stack has a nonnegative # of objects, we can
ensure that the amount of credit is nonnegative
Thus for any sequence of n PUSH, POP,
MULTIPOP operations, the total amortized cost
is an upper bound on the total actual cost: O(n)
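
To make the example concrete, here is a minimal stack sketch with MULTIPOP;
the accounting argument applies to it directly: each pushed object carries 1
credit, so any sequence of n operations does O(n) actual work.

    class MultipopStack:
        def __init__(self):
            self.items = []

        def push(self, x):
            # actual cost 1, amortized cost 2 (1 pays the push, 1 credit stays on x)
            self.items.append(x)

        def pop(self):
            # actual cost 1, paid by the credit left on the popped object
            return self.items.pop() if self.items else None

        def multipop(self, k):
            # actual cost min(k, s); every pop is paid by a stored credit
            popped = []
            while self.items and len(popped) < k:
                popped.append(self.items.pop())
            return popped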


3. Potential method

Represent prepaid work as potential energy that can
be released to pay for future operations
The potential is associated with the DS as a whole rather
than with specific objects within the DS
Start with an initial DS D0 on which n operations are
performed
For each i = 1, 2, ..., n
ci: actual cost of the ith operation
Di: DS that results after applying the ith operation to Di-1
Potential function Φ: maps Di to Φ(Di), a real number:
the potential associated with Di

Amortized cost ĉi of the ith operation w.r.t. Φ:
ĉi = ci + Φ(Di) - Φ(Di-1)
Amortized cost = Actual cost + increase in potential

Total amortized cost of n operations:
Σ(i=1 to n) ĉi = Σ(i=1 to n) (ci + Φ(Di) - Φ(Di-1))
               = Σ(i=1 to n) ci + Φ(Dn) - Φ(D0)
Choose Φ with Φ(Di) >= Φ(D0) for all i, to guarantee that we pay in
advance
Different potential functions may yield different
amortized costs, yet each gives an upper bound on the actual cost

Example: Stack operations

Φ: # of objects in the stack
Empty stack: Φ(D0) = 0
Φ(Di) >= 0 = Φ(D0): the # of objects in the stack is never
negative (nonnegative potential)

ith operation on a stack with s objects is PUSH:
Φ(Di) - Φ(Di-1) = (s+1) - s = 1
ĉi = ci + Φ(Di) - Φ(Di-1) = 1 + 1 = 2
ith operation on a stack is MULTIPOP(S,k) and
k' = min(k,s) objects are popped off the stack:
Φ(Di) - Φ(Di-1) = -k'
ĉi = ci + Φ(Di) - Φ(Di-1) = k' - k' = 0
Similarly, amortized cost of POP: 0

Amortized cost of each of the three operations: O(1)
Total amortized cost of a sequence of n
operations: O(n)
Total amortized cost of n operations: an upper
bound on the total actual cost
Worst case cost of n operations: O(n)

COMP3111/3811 Algorithms 2

Lecture 11- 2

Advanced Data Structure:


Disjoint sets
(Chapter 21)

1. Disjoint set
Disjoint set data structure: maintain a
collection S = {S1, S2, ..., Sk} of disjoint
dynamic sets
Each set is identified by a representative
Operations:
MAKE-SET(x): create a new set whose only
member is x (x must not already be in another set)
UNION(x, y): unite the dynamic sets that
contain x and y, Sx and Sy, into a new set that
is the union of these two sets (destroy Sx, Sy)
FIND-SET(x): return a pointer to the
representative of the set containing x


Application
MST algorithm: Kruskal's algorithm
Finding the connected components of an
undirected graph

Representation
1. Linked list representation
2. Rooted tree representation: better time
complexity

2. Linked list representation


First object in each linked list: representative
Each object:
contain a set member
A pointer to the object containing the next
set member
A pointer back to the representative
Each list:
head: pointer to the representative
tail: pointer to the last object in the list


MAKE-SET, FIND-SET: O(1) time


UNION
1. Simple implementation
UNION(x,y): append x's list to the end of y's
list
Use tail to find where to append
must update the pointer to the representative
for each object on x's list: time linear in the
length of x's list
A sequence of m operations on n objects:
O(n^2) time
Amortized time of an operation: O(n)

n MAKE-SET: O(n)
n-1 UNION:
Σ(i=1 to n-1) i = O(n^2)
m = 2n-1 operations
Each operation: O(n)
amortized time complexity

2. Weighted-union heuristic
Each list maintains the length of the list
Always append the smaller list onto the longer
A single UNION: still O(n)
A sequence of m MAKE-SET, UNION, FIND-SET
operations (n of them MAKE-SET): O(m + n lg n)

3. Disjoint-set forests
Faster implementation
Represent sets by rooted trees
Each member points only to its parent
Root of each tree: representative
Naive algorithm: no faster than linked-list
Two heuristics:
Union by rank
Path compression
FIND-SET: follow parent pointers until the root (find
path: nodes visited on the path)
UNION: make the root of one tree point to the root of the other

Heuristics
A sequence of n-1 UNIONs can create a linear chain of n
nodes
Two heuristics: almost linear running time in the total
number m of operations
1. Union by rank
Similar to the weighted-union heuristic
Make the root of the tree with fewer nodes point to the
root of the tree with more nodes
For each node, we maintain a rank: an upper bound on
the height of the node
The root with smaller rank points to the root with
larger rank


Heuristics

FIND-SET(a)

2. Path compression
Simple & effective
Use it during FIND-SET operations
Make each node on the find path point
directly to the root

Pseudo code
1. UNION-by-rank
For each node x, rank[x]: upper bound on the height
of x (# of edges in the longest path between x and a
descendant leaf)
MAKE-SET: initial rank = 0
FIND-SET: rank unchanged
UNION
Roots with unequal rank: the root of higher rank becomes
the parent of the root of lower rank
Roots with equal rank: arbitrarily choose one of
the roots as the parent and increment its rank

Pseudo code
2. Path compression
FIND-SET: two pass method
One pass up the find path to find the root
Second pass back down the find path to update
each node s.t. it points directly to the root

Time complexity
Union by rank only: O(m log n)
Both union by rank & path compression: O(m α(n))
worst case time
α(n): a very slowly growing function
α(n) <= 4 for all practical purposes
Amortized cost of each MAKE-SET: O(1)
Amortized cost of each LINK: O(α(n))
Amortized cost of each FIND-SET: O(α(n))
A sequence of m MAKE-SET, UNION, FIND-SET
operations (n of them MAKE-SET), with union by rank and
path compression: O(m α(n)) worst case time
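
A sketch of the forest representation with both heuristics (union by rank and
path compression); element names here are arbitrary hashable values.

    class DisjointSets:
        def __init__(self):
            self.parent = {}
            self.rank = {}

        def make_set(self, x):
            self.parent[x] = x
            self.rank[x] = 0

        def find_set(self, x):
            # Path compression: point every node on the find path directly at the root
            if self.parent[x] != x:
                self.parent[x] = self.find_set(self.parent[x])
            return self.parent[x]

        def union(self, x, y):
            # Union by rank: the root of smaller rank points to the root of larger rank
            rx, ry = self.find_set(x), self.find_set(y)
            if rx == ry:
                return
            if self.rank[rx] < self.rank[ry]:
                rx, ry = ry, rx
            self.parent[ry] = rx
            if self.rank[rx] == self.rank[ry]:
                self.rank[rx] += 1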


COMP3111/3811 Algorithms 2

Lecture 12- 1

Randomized Algorithms


Worst case time complexity: O(n^2)

Expected running time: O(n log n)


Worst case time complexity: O(n^2)

Expected running time: O(n)

Las Vegas algorithm:

Success is guaranteed.
Running time is not.
Time required may vary from execution to execution,
even on the same input.
Examples: quicksort, selection

Monte Carlo algorithm:


Running time is guaranteed.
Success is not.
Can reduce the failure probability by running it many
times.
Example: primality testing
Simple algorithm
Complex analysis
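
A small Las Vegas example (randomized selection, one of the examples above):
the answer is always correct and only the running time is random, with
expected time O(n) and worst case O(n^2).

    import random

    def quickselect(a, k):
        # Return the k-th smallest element of a (k = 1 .. len(a))
        a = list(a)
        while True:
            pivot = random.choice(a)
            less = [x for x in a if x < pivot]
            equal = [x for x in a if x == pivot]
            if k <= len(less):
                a = less                              # answer is among the smaller elements
            elif k <= len(less) + len(equal):
                return pivot                          # pivot is the k-th smallest
            else:
                k -= len(less) + len(equal)
                a = [x for x in a if x > pivot]       # continue with the larger elements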

COMP3111/3811 Algorithms 2

NP-Completeness

Lecture 12-2/13-1

Some problems seem to be intractable:


as the problem size grows large, we are unable to
solve them in reasonable time

NP-completeness

What constitutes reasonable time complexity?


Standard working definition: polynomial time
On an input of size n the worst-case running
time is O(nk) for some constant k
Polynomial time: O(n^2), O(n^3), O(1), O(n lg n)
Not in polynomial time: O(2^n), O(n^n), O(n!)


Polynomial-Time Algorithms
Are some problems solvable in polynomial
time?
Of course: every algorithm we've studied
provides a polynomial-time solution to some
problem
We define P to be the class of problems
solvable in polynomial time
Are all problems solvable in polynomial time?
No: Turing's Halting Problem is not
solvable by any computer, no matter how
much time is given
Such problems are clearly intractable, not in
P

NP-Complete Problems
The NP-Complete problems are an interesting
class of problems whose status is unknown
Computable, but ...
No polynomial-time algorithm has been
discovered for an NP-Complete problem
But no superpolynomial lower bound has
been proved for any NP-Complete problem.
We call this the P = NP question
The biggest open problem in CS

Some problems seem to be intractable. For example:
An NP-Complete Problem:
Hamiltonian Cycles
An example of an NP-Complete problem:
A hamiltonian cycle of an undirected graph is a
simple cycle that contains every vertex
The hamiltonian-cycle problem: given a graph G,
does it have a hamiltonian cycle?
Cube?
Dodecahedron?
Grid graph?
A naive algorithm for solving the hamiltonian-cycle problem: Running time?

Nondeterminism

Think of a non-deterministic computer as a computer


that
1. magically guesses a solution, then
2. has to verify that it is correct

Better
If a solution exists, computer always guesses it
One way to imagine it: a parallel computer that
can freely spawn an infinite number of processes
Have one processor work on each possible
solution
All processors attempt to verify that their
solution works
If a processor succeeds, then the whole
machine succeeds

P and NP
P = the set of problems that can be solved
in polynomial time
NP = the set of problems that can be solved
in polynomial time by a nondeterministic
computer
Notes:
1. both P and NP are sets of problems,
not sets of algorithms
2. NP stands for nondeterministic
polynomial time

P and NP
Summary so far:
P = problems that can be solved in polynomial time
NP = problems for which a solution can be verified
in polynomial time
Unknown whether P = NP (most suspect not)
Hamiltonian-cycle problem is in NP:
No known way to solve it in polynomial time
Easy to verify a solution in polynomial time


NP-Complete Problems
NP-Complete problems are the hardest problems
in NP:
If any one NP-Complete problem can be solved
in polynomial time
then
every NP-Complete problem can be solved in
polynomial time
and in fact every problem in NP can be solved
in polynomial time (which would show P = NP)

The Scene
[Diagram: the landscape of problems. The Halting problem lies outside NP;
P (sorting, minimum spanning tree) sits inside NP; the NP-complete problems
(Hamiltonian cycle, Travelling salesperson) are the hardest problems in NP]

Thus: if you solve the hamiltonian-cycle problem in
O(n^100) time, you've proved that P = NP. You can
retire rich & famous.

Reduction
The crux of NP-Completeness is reducibility
Informally, a problem A can be reduced to
another problem B if any instance of A can
be easily rephrased as an instance of B,
the solution to which provides a solution to
the instance of A
What do you suppose "easily" means?
This rephrasing is called transformation
Intuitively: If A reduces to B, then A is no
harder to solve than B

Using Reductions
If A is polynomial-time reducible to B, we denote
this by A <=p B
Definition of NP-Complete:
A is NP-Complete if
A is in NP, and
all problems B in NP are reducible to A
Formally: A is NP-Complete if
A is in NP, and
for all B in NP, B <=p A
If A <=p B and A is NP-Complete, then B is also
NP-Complete
This is the key idea that you should take away


Reducibility
An example:
Problem A: Given a set of Booleans, is at least
one TRUE?
Problem B: Given a set of integers, is their sum
positive?
Transformation: (x1, x2, ..., xn) -> (y1, y2, ..., yn)
where yi = 1 if xi = TRUE, yi = 0 if xi = FALSE
Another example:
Solving linear equations is reducible to solving
quadratic equations
How can we easily use a quadratic-equation
solver to solve linear equations?
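
The first transformation above, written out as a tiny sketch: an instance of
problem A (booleans) is rephrased as an instance of problem B (integers) and
B's answer is returned unchanged (has_positive_sum stands in for an assumed
solver for B).

    def has_positive_sum(ints):
        # problem B: is the sum of the integers positive?
        return sum(ints) > 0

    def at_least_one_true(bools):
        # problem A, solved by reduction to B:
        # TRUE -> 1, FALSE -> 0, and "some xi is TRUE" iff "the sum is positive"
        return has_positive_sum([1 if b else 0 for b in bools])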

Review: Tractability
Some problems are undecidable: no computer
can solve them
Turing's Halting Problem
Other problems are decidable, but intractable
as the input grows large, it seems that we are
unable to solve them in reasonable time
Traveling salesperson
Hamilton cycle
Other problems are easy
Sorting
Minimum spanning tree


An Aside: Terminology
What is the difference between a problem
and an instance of that problem?
To formalize things, we will express
instances of problems as strings
How can we express an instance of the
hamiltonian cycle problem as a string?
To simplify things, we will worry only
about decision problems with a yes/no
answer
Many problems are optimization
problems, but we can often re-cast
those as decision problems

Why Prove NP-Completeness?


Nobody has proven that P != NP
But if you prove a problem NP-Complete,
most people accept that it is most likely
intractable
Therefore it can be important to prove
that a problem is NP-Complete
Don't need to come up with an efficient
algorithm
Can instead work on approximation
algorithms

NP-Hard and NP-Complete


If A is polynomial-time reducible to B, we denote
this A <=p B
Definition of NP-Hard and NP-Complete:
If all problems B in NP are reducible to A, then
A is NP-Hard
We say A is NP-Complete if A is NP-Hard
and A is in NP
If A <=p B and A is NP-Complete, then B is also
NP-Complete

Proving NP-Completeness
What steps do we have to take to prove a
problem P is NP-Complete?
Pick a known NP-Complete problem Q
Reduce Q to P
Describe a transformation that maps
instances of Q to instances of P, s.t.
"yes" for the instance of P iff "yes" for the instance of Q
Prove the transformation works
Prove it runs in polynomial time
Prove P is in NP

