
Introduction to analysis of algorithms

An algorithm is just the outline or idea behind a program. We express algorithms in pseudo-code: something resembling C or Pascal, but with some statements in English rather than within the programming language. It is expected that one could translate each pseudo-code statement to a small number of lines of actual code, easily and mechanically. This class covers the design of algorithms for various types of problems, as well as a mathematical analysis of those algorithms done independently of any actual computational experiments. The purpose of design of algorithms is obvious: one needs an algorithm in order to write a program. Analysis of algorithms is less obviously necessary, but has several purposes:

Analysis can be more reliable than experimentation. If we experiment, we only know the behavior of a program on certain specific test cases, while analysis can give us guarantees about the performance on all inputs.

It helps one choose among different solutions to problems. As we will see, there can be many different solutions to the same problem. A careful analysis and comparison can help us decide which one would be the best for our purpose, without requiring that all be implemented and tested.

We can predict the performance of a program before we take the time to write code. In a large project, if we waited until after all the code was written to discover that something runs very slowly, it could be a major disaster, but if we do the analysis first we have time to discover speed problems and work around them.

By analyzing an algorithm, we gain a better understanding of where the fast and slow parts are, and what to work on or work around in order to speed it up.

The Fibonacci numbers


We introduce algorithms via a "toy" problem: computation of Fibonacci numbers. It's one you probably wouldn't need to actually solve, but simple enough that it's easy to understand and maybe surprising that there are many different solutions. The Fibonacci story: Leonardo of Pisa (aka Fibonacci) was interested in many things, including a subject we now know as population dynamics: For instance, how quickly would a population of rabbits expand under appropriate conditions?

As is typical in mathematics (and analysis of algorithms is a form of mathematics), we make the problem more abstract to get an idea of the general features without getting lost in detail:

We assume that a pair of rabbits has a pair of children every year. These children are too young to have children of their own until two years later. Rabbits never die.

(The last assumption sounds stupid, but makes the problem simpler. After we have analyzed the simpler version, we could go back and add an assumption e.g. that rabbits die in ten years, but it wouldn't change the overall behavior of the problem very much.) We then express the number of pairs of rabbits as a function of time (measured as a number of years since the start of the experiment):

F(1) = 1 -- we start with one pair
F(2) = 1 -- they're too young to have children the first year
F(3) = 2 -- in the second year, they have a pair of children
F(4) = 3 -- in the third year, they have another pair
F(5) = 5 -- we get the first set of grandchildren

In general F(n) = F(n-1) + F(n-2): all the previous rabbits are still there (F(n-1)) plus we get one pair of children for every pair of rabbits we had two years ago (F(n-2)). The algorithmic problem we'll look at today: how to compute F(n)?

Formulas and floating point


You probably saw in Math 6a that F(n) = (x^n - (1-x)^n)/(x - (1-x)) where x = (1+sqrt 5)/2 ~ 1.618 is the golden ratio. This solution is often used as a standard example of the method of "generating functions". So this seems to give us an algorithm: plug x = 1.618 into the formula, evaluate it, and round to the nearest integer. Problem: how accurately do you have to know x to get the right answer? e.g. if you just use x=1.618, you get

F(3)=1.99992 -- close enough to 2?
F(16)=986.698 -- round to 987?
F(18)=2583.1 -- should be 2584

Instead, since F(n) is defined in terms of integers, it avoids these problems to stick with integer arithmetic.
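To make the rounding problem concrete, here is a small C sketch (my own addition, not part of the original notes) that plugs the truncated value x = 1.618 into the closed-form formula and prints what comes out:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = 1.618;                  /* truncated golden ratio */
    int tests[] = {3, 16, 18};         /* true values: F(3)=2, F(16)=987, F(18)=2584 */
    for (int k = 0; k < 3; k++) {
        int n = tests[k];
        /* closed form: (x^n - (1-x)^n) / (x - (1-x)) */
        double approx = (pow(x, n) - pow(1.0 - x, n)) / (x - (1.0 - x));
        printf("F(%d) from the formula with x=1.618: %.4f\n", n, approx);
    }
    return 0;
}

The results drift away from the true integer values as n grows, which is exactly the accuracy problem described above.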

A recursive algorithm
The original formula seems to give us a natural example of recursion: Algorithm 1:
int fib(int n)
{
    if (n <= 2) return 1;
    else return fib(n-1) + fib(n-2);
}

An example of the sort of basic question we study in this class is: how much time would this algorithm take? How should we measure time? The natural measure would be in seconds, but it would be nice to have an answer that didn't change every time Intel came out with a faster processor. We can measure time in terms of machine instructions; then dividing by a machine's speed (in instructions/second) would give the actual time we want. However, it is hard to guess from a piece of pseudo-code the exact number of instructions that a particular compiler would generate. To get a rough approximation of this, we try measuring in terms of lines of code.

Each call to fib executes either one or two lines. If n <= 2, we only execute one line (the if/return). If n = 3, we execute 2 lines for fib(3) itself, plus one each for the calls to fib(2) and fib(1): 4 in total. It's like the rabbits! Except for the two lines in each call, the time for n is the sum of the times for two smaller recursive calls.
time(n) = 2 + time(n-1) + time(n-2)

In general, any recursive algorithm such as this one gives us a recurrence relation: the time for any routine is the time within the routine itself, plus the time for the recursive calls. This gives us, in a very easy mechanical way, an equation like the one above, which we can then solve to find a formula for the time. In this case, the recurrence relation is very similar to the definition of the Fibonacci numbers. With some work, we can solve the equation, at least in terms of F(n): We think of the recursion as forming a tree. We draw one node, the root of the tree, for the first call; then any time the routine calls itself, we draw another child in the tree.
            F(5)
           /    \
        F(4)    F(3)
        /  \    /  \
     F(3) F(2) F(2) F(1)
     /  \
  F(2) F(1)

The four internal nodes of this tree for fib(5) take two lines each, while the five leaves take one line, so the total number of lines executed in all the recursive calls is 13. Note that, when we do this for any call to fib, the Fibonacci number F(i) at each internal node is just the number of leaves below that node, so the total number of leaves in the tree is just F(n). Remember that leaves count as one line of code, internal nodes as 2. To count internal nodes, use a basic fact about binary trees (trees in which each internal node has 2 children): the number of internal nodes always equals the number of leaves minus one. (You can prove this by induction: it's true if there's one leaf and no internal nodes, and it stays true if you add 2 children to a leaf.)

So there are F(n) lines executed at the leaves, and 2F(n)-2 at the internal nodes, for a total of 3F(n)-2. Let's double check this on a simple example: time(5) = 3F(5) - 2 = 3(5) - 2 = 13. This is kind of slow: e.g. for n=45 it takes over a billion steps. Maybe we can do better?

Dynamic programming
One idea: the reason we're so slow is that we keep recomputing the same subproblems over and over again. For instance, the tree above shows two computations of F(3). The second time we get to F(3), we're wasting effort computing it again, because we've already solved it once and the answer isn't going to change. Instead, let's solve each subproblem once and then look up the solution later when we need it, instead of repeatedly recomputing it. This easy idea leads to some complicated algorithms we'll see later in the section on dynamic programming, but here it's pretty simple: Algorithm 2:
int fib(int n)
{
    int f[n+1];
    f[1] = f[2] = 1;
    for (int i = 3; i <= n; i++)
        f[i] = f[i-1] + f[i-2];
    return f[n];
}

This is an iterative algorithm (one that uses loops instead of recursion) so we analyze it a little differently than we would a recursive algorithm. Basically, we just have to compute for each line, how many times that line is executed, by looking at which loops it's in and how many times each loop is executed.

Three lines are executed always. The first line in the loop is executed n-1 times (except for n=1), and the second line in the loop is executed n-2 times (except for n=1), so time(n) = (n-1) + (n-2) + 3 = 2n (except that time(1)=4). As an example, for n=45 it takes 90 steps, roughly 10 million times faster than the other program. Even if you don't do this very often, this is a big enough difference to notice, so the second algorithm is much better than the first.

Space complexity
Running time isn't the only thing we care about, or the only thing that can be analyzed mathematically. Programmer time and code length are important, but we won't discuss them here -- they're part of the subject of software engineering. However we will often analyze the amount of memory used by a program. If a program takes a lot of time, you can still run it, and just wait longer for the result. However if a program takes a lot of memory, you may not be able to run it at all, so this is an important parameter to understand.

Again, we analyze things differently for recursive and iterative programs. For an iterative program, it's usually just a matter of looking at the variable declarations (and storage allocation calls such as malloc() in C). For instance, algorithm 2 declares only an array of n+1 integers.

Analysis of recursive program space is more complicated: the space used at any time is the total space used by all recursive calls active at that time. Each recursive call in algorithm 1 takes a constant amount of space: some space for local variables and function arguments, but also some space for remembering where each call should return to. The calls active at any one time form a path in the tree we drew earlier, in which the argument at each node in the path is one or two units smaller than the argument at its parent. The length of any such path can be at most n, so the space needed by the recursive algorithm is again (some constant factor times) n. We abbreviate the "some constant factor times" using "O" notation: O(n).

It turns out that algorithm 2 can be modified to use a much smaller amount of space. Each step through the loop uses only the previous two values of F(n), so instead of storing these values in an array, we can simply use two variables. This requires some swapping around of values so that everything stays in the appropriate places: Algorithm 3:
int fib(int n)
{
    int a = 1, b = 1;
    for (int i = 3; i <= n; i++) {
        int c = a + b;
        a = b;
        b = c;
    }
    return b;
}

Here c represents f[i], b represents f[i-1], and a represents f[i-2]. The two extra assignments after the sum shift those values over in preparation for the next iteration. This algorithm uses roughly 4n lines to compute F(n), so it is slower than algorithm 2, but uses much less space.

Big "O" notation


There are better algorithms for Fibonacci numbers, but before we investigate that, let's take a side track and make our analysis a little more abstract. A problem with the analysis of the two algorithms above: what is a line of code? If I use whitespace to break a line into two, it doesn't change the program speed but does change the number of lines executed. And as mentioned before, if I buy a faster computer, it does change program speed but doesn't change the analysis. To avoid extraneous details like whitespace and computer type, we use "big O" notation. The idea: we already write the times as a function of n. Big O notation treats two functions as being roughly the same if one is c times the other where c is a constant (something that doesn't depend on n). So for instance we would replace 3F(n)-2 by O(F(n)) and both 2n and 4n by O(n). Formally, we say that
f(n)=O(g(n))

if there is some constant c such that, for all sufficiently large n,


f(n) <= c g(n)
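As a quick check of the definition against the bounds above: taking c = 3 gives 3F(n) - 2 <= 3F(n) for every n, so 3F(n) - 2 = O(F(n)); and taking c = 2 shows that the 2n bound for algorithm 2 is O(n).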

It is true that 4n=O(n), but it is also true that n=O(4n). Note, however, that O is not a symmetric relation: n=O(F(n)), but it is not true that F(n)=O(n). In practice we will usually only use O notation to simplify formulas, by ignoring constant factors and other extraneous details.

What is the point of O notation? First, it makes life easier by allowing us to be less careful about all the fine details of an algorithm's behavior. But also, it allows us to compare two algorithms easily. Algorithm 2 and algorithm 3 are both O(n). According to the number of lines executed, one is twice as fast as the other, but this ratio does not change as a function of n. Other factors (like the amount of time needed to allocate the large array in algorithm 2) may mean that in actual time, the algorithms are closer to each other; a more careful analysis is needed to determine which of the two to use. On the other hand, we know that 4n is much better than 3F(n)-2 for any reasonable value of n -- but this doesn't depend on the factor of 4 in the 4n time bound; it would be the same for 7n or 12n. For larger and larger n, the ratio of n to F(n) gets very large, so that very quickly any O(n) algorithm will be faster than any O(F(n)) algorithm. Replacing 4n by O(n) is an abstraction that lets us compare it to other functions without certain details (the 4) getting in the way.

Recursive powering
Algorithms 2 and 3 above aren't the best! Here's a mathematical trick with matrices:
[ 1 1 ]^n   [ F(n+1)  F(n)   ]
[ 1 0 ]   = [ F(n)    F(n-1) ]

(You don't have to remember much linear algebra to understand this -- just the formula for multiplying two symmetric 2x2 matrices:
[ a b ] [ d e ]   [ ad+be  ae+bf ]
[ b c ] [ e f ] = [ bd+ce  be+cf ]

        [ 1 1 ]
    A = [ 1 0 ]

You can then prove the result above by induction: assume that the equation above is true for some n, multiply both sides by another factor of A using the formula for matrix multiplication, and verify that the terms you get match the formula defining the Fibonacci numbers.) We can use this to define another iterative algorithm, using matrix multiplication. Although I will write this in C syntax, we are starting to get to pseudo-code, since C does not have matrix multiplication built into it the way I have written it below. The following algorithm initializes a matrix M to the identity matrix (the "zeroth power" of A) and then repeatedly multiplies M by A to form the (n-1)st power. Then by the formula above, the top left corner holds F(n), the value we want to return. Algorithm 4:
int fib(int n)
{
    int M[2][2] = {{1,0},{0,1}};
    for (int i = 1; i < n; i++)
        M = M * {{1,1},{1,0}};      /* pseudo-code: 2x2 matrix multiplication */
    return M[0][0];
}

This takes time O(n) (so much better than algorithm 1) but is probably somewhat slower than algorithm 2 or algorithm 3. (The big O notation hides the difference between these algorithms, so you have to be more careful to tell which is better.) Like algorithm 3, this uses only O(1) space. But we can compute M^n more quickly. The basic idea: if you want to compute e.g. 3^8, you can multiply eight 3's together one at a time (3*3*3*3*3*3*3*3), or you can repeatedly square: 3^2 = 9, 9^2 = 3^4 = 81, 81^2 = 3^8 = 6561. The squaring idea uses many fewer multiplications, since each one doubles the exponent rather than simply adding one to it. With some care, the same idea works for matrices, and can be extended to exponents other than powers of two. Algorithm 5:
int M[2][2] = {{1,0},{0,1}};

int fib(int n)
{
    matpow(n-1);
    return M[0][0];
}

void matpow(int n)
{
    if (n > 1) {
        matpow(n/2);
        M = M*M;                    /* pseudo-code: square the matrix */
    }
    if (n is odd)
        M = M*{{1,1},{1,0}};        /* pseudo-code: one more factor of A */
}

Basically all the time is in matpow, which is recursive: it tries to compute the nth power of A by squaring the (n/2)th power. However if n is odd, rounding down n/2 and squaring that power of A results in the (n-1)st power, which we "fix up" by multiplying one more factor of A. This is a recursive algorithm, so as usual we get a recurrence relation defining time, just by writing down the time spent in a call to matpow (O(1)) plus the time in each recursive call (only one recursive call, with argument n/2). So the recurrence is
time(n) = O(1) + time(n/2)

It turns out that this solves to O(log n). For the purposes of this class, we will use logarithms base 2, and round all logarithms to integers, so log n is basically the number of bits needed to write n down in binary. An equivalent way of defining it is the smallest value of i such that n < 2^i. But clearly if n < 2^i, n/2 < 2^(i-1) and conversely, so log n satisfies the recurrence log(n) = 1 + log(n/2). The recurrence defining the time for matpow is basically the same except with O(1) instead of 1. So the solution to the recurrence is just the sum of log n copies of O(1), which is O(log n).
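For concreteness, here is one way Algorithm 5 might look as actual C (my own sketch: it writes out the 2x2 product by hand, returns the matrix rather than updating a global M, and ignores integer overflow just as the pseudo-code does):

#include <stdio.h>

/* 2x2 integer matrix. */
typedef struct { long m[2][2]; } mat2;

static mat2 mat_mul(mat2 x, mat2 y)
{
    mat2 r;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r.m[i][j] = x.m[i][0] * y.m[0][j] + x.m[i][1] * y.m[1][j];
    return r;
}

/* Compute A^n by repeated squaring, where A = {{1,1},{1,0}}. */
static mat2 mat_pow(int n)
{
    mat2 A = {{{1, 1}, {1, 0}}};
    mat2 M = {{{1, 0}, {0, 1}}};   /* identity = A^0 */
    if (n > 1) {
        M = mat_pow(n / 2);        /* the floor(n/2)-th power ... */
        M = mat_mul(M, M);         /* ... squared */
    }
    if (n % 2 == 1)
        M = mat_mul(M, A);         /* fix up odd exponents with one more factor of A */
    return M;
}

int fib(int n)
{
    return (int) mat_pow(n - 1).m[0][0];   /* top-left entry of A^(n-1) is F(n) */
}

int main(void)
{
    printf("F(10) = %d\n", fib(10));       /* prints 55 */
    return 0;
}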

If n is 1 billion, log n would only be 30, and this algorithm would be better than algorithms 2 and 3 in the same way that they are better than algorithm 1. (This is actually somewhat cheating: to be able to use this for n equal to a billion you need to be able to write down the answer, which will have O(n) digits, and you need to be able to store variables with that many digits. Manipulating such large numbers would take more like O(n) steps per operation, where here we are only counting one step per integer multiplication or addition. But even if you used a special library for dealing with large numbers, algorithm 5 would be much faster than the other ones.) Actually you can get the original formula with 1.618^n to work using a similar repeated squaring trick, also with time O(log n). So to tell which is better you have to be more careful and not just use O-notation -- dealing with an integer matrix is somewhat simpler than having to compute floating point square roots, so it wins. Which is the sort of comparison that analysis of algorithms is all about...

Sequential and Binary Search


Example: looking up a topic in Baase. Suppose after Tuesday's application of matrix multiplication to Fibonacci numbers, that you wanted to know what she says about matrix multiplication. You could look it up in the index but it will give you many different pages to look at, some of which are only somewhat relevant. Or you could read through the table of contents until you find relevant looking titles (6.2 and 7.3).

Sequential search
The second method (reading through the table of contents) is an example of sequential search. Similar sorts of problems show up all the time in programming (e.g. operating systems have to look up file names in a directory, and the Unix system usually does it with sequential search). Very abstractly:
sequential search(list L, item x)
{
    for (each item y in the list)
        if (y matches x) return y
    return no match
}

This has many variants -- do you stop once you've found one match, or do you keep going until you've found all of them? do you represent the list using pointers, linked lists, or what? how do you indicate that there was no match? So we want to analyze this... To really understand the running time, we have to know how quick the "y matches x" part is -- everything else is straightforward. The way I've written it in pseudocode, that part still needs to be filled in. But we can still analyze the algorithm! We just measure the time in terms of the number of comparisons. Examples: does 8 appear in the list of the first 10 Fibonacci numbers? Does 9? Note that for 9, the algorithm has to go through the whole list. So the time seems to depend on both L and x. We want to be able to predict the time easily without running the algorithm, so saying
comparisons(x,L) = position of x in L

is true but not very informative.
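As a concrete instance, here is a C sketch of sequential search over an array of ints (my own addition), instrumented to count the "y matches x" comparisons the analysis talks about:

#include <stdio.h>

/* Sequential search; *comparisons counts the "y matches x" tests performed. */
int sequential_search(const int *L, int n, int x, int *comparisons)
{
    *comparisons = 0;
    for (int i = 0; i < n; i++) {
        (*comparisons)++;
        if (L[i] == x)
            return i;          /* position of the match */
    }
    return -1;                 /* no match */
}

int main(void)
{
    int fibs[10] = {1, 1, 2, 3, 5, 8, 13, 21, 34, 55};
    int c;
    int pos = sequential_search(fibs, 10, 8, &c);
    printf("searching for 8: position %d, %d comparisons\n", pos, c);   /* 8 is the 6th item, so 6 comparisons */
    pos = sequential_search(fibs, 10, 9, &c);
    printf("searching for 9: position %d, %d comparisons\n", pos, c);   /* not found, so all 10 comparisons */
    return 0;
}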

Methods of analysis
To be able to predict the time without having to look at the details of the input, we measure it as a function of the length of the input. Here x has basically constant length (depending on what an item is) and the length of L is just the number of items in it. So given a list L with n items, how many comparisons does the algorithm take? Answer: it depends. We want an answer that doesn't depend. There are various ways of getting one, by combining the times for different inputs with the same length.

Worst case analysis -- what is the most comparisons we could ever see no matter how perverse the input is?
time_wc(n) = max (input I of size n) time(I)

Best case analysis -- what is the fewest comparisons the algorithm could take if the input is well behaved?
time_bc(n) = min (input I of size n) time(I)

Average case analysis -- how much time would the algorithm take on "typical" input?

We assume that each input I of size n has a probability P[I] of being the actual input and use these probabilities to find a weighted average:
time_avg(n) = sum P[I] time(I)

These distinctions didn't make sense with Fibonacci numbers, because the time there was always a function of n, but here they can give different answers (as we'll see with sequential search).

Average case is probably the most important in general, but it is problematic: what is a "typical" input? You have to make some assumption about the probabilities, and your analysis will only be as accurate as the validity of your assumptions. Also note that it's possible to have an algorithm for which no input takes the "average" time -- e.g. if it takes either 1 step or 100 steps, the average may be around 50 even though no input actually takes 50 steps.

Worst case is what we usually do; it's easier than average case analysis, and it's useful because you can guarantee that the algorithm will not ever take longer than its worst case bound. It's also true that the average case is at most the worst case, no matter what probabilities you choose, so you can use worst case analysis to get some information about the average case without having to make assumptions about what a "typical" input looks like.

Best case is fun but not very useful.

Analysis of sequential search


The best case for sequential search is that it does one comparison, and matches x right away. In the worst case, sequential search does n comparisons, and either matches the last item in the list or doesn't match anything. The average case is harder to do. We know that the number of comparisons is the position of x in the list. But what is the typical position of x? One reasonable assumption: if x is in the list, it's equally likely to be anywhere in it, so P[pos] = 1/n.
average number of comparisons = sum_{i=1}^{n} (1/n) . i
                              = (1/n) sum_{i=1}^{n} i
                              = (n+1)/2.

But if x is not in the list, the number of comparisons is always n. So finding something takes half as long as not finding it, on average, with this definition of "typical". We can define a stronger version of "typical": suppose for any list, any permutation of the list is equally likely. Then we can average over all possible permutations:
average number of comparisons
    = sum_{i=1}^{n!} (1/n!) . (position of x in permutation i)
    = sum_{p=1}^{n} (1/n!) . p . (number of permutations with x in position p)
    = sum_{p=1}^{n} (1/n!) . p . (n-1)!
    = sum_{p=1}^{n} (1/n) . p
    = (n+1)/2.

So this assumption ends up giving the same analysis. A second point to be made about average case analysis: sometimes it makes sense to analyze different cases separately. The analysis above assumes x is always in the list; if x is not in the list, you always get n comparisons. You could make up a probability p that x is in or out of the list and combine the two numbers above to get a total average number of comparisons equal to pn + (1-p)(n+1)/2, but it makes more sense to just report both numbers separately.

Randomized algorithms
Sometimes it's useful to pay a little bit to reduce the uncertainty in the world -- e.g. with insurance, you know you'll pay a fixed amount instead of either paying nothing (if you stay healthy) or a lot (if you get appendicitis). The same concept applies to computer programs -- if the worst case is much larger than the average case, we might prefer to have a slightly more complicated program that reduces the worst case, as long as it doesn't increase the average case too much. For instance if you're programming the computer controlling a car, and you want to tell if you're in a crash and should activate the air bags, you don't want to be running some algorithm that usually takes half a second but maybe sometimes takes as much as five minutes.

Random numbers are very useful in this respect. They're also useful in making "average case" analysis apply even when the input itself is not random at all, or when we don't know a good definition for a "typical" input. The idea is to "scramble" the input so that it looks typical. We say that an algorithm is randomized if it uses random numbers. An algorithm that is not randomized is called deterministic. The "expected time" analysis of a randomized algorithm is measured in terms of time(input, sequence of random numbers). For some particular input I, the expected time of the algorithm is just the average over different sequences of random numbers:
sum_{random sequence R} Prob(R) . time(I,R)

The expected time of the algorithm on (worst case) inputs of length n is then computed by combining this formula with the previous formula for worst case analysis:
max_{input I of size n} sum_{random sequence R} Prob(R) . time(I,R)

This looks complicated, but isn't usually much harder than average case analysis. Here it is for sequential search. We want to scramble (x,L) so that position of x in L is random. Idea: pick a random permutation of L then do the sequential search.
randomized search(list L, item x)
{
    randomly permute L
    for (each item y in L)
        if (y matches x) return y
    return no match
}

This slows down the algorithm somewhat (because you have to take time to do the permutation) but may speed up the searching part. If you're just searching for a number in a list of numbers, this would be a pretty bad method, because the time for doing the random permutation would probably be more than the worst case for the original deterministic sequential search algorithm. However, if comparisons are very slow, much slower than the other steps in the algorithm, the total number of comparisons will dominate the overall time and this algorithm could be an improvement. Let's plug this algorithm into our formula for expected times:
time = max_{(x,L)} sum_{permutation p} probability(p) . time(x, p(L))

Note that there are n! permutations. Of those, there are (n-1)! such that x is in some given position i.
time = max_{(x,L)} sum_{i} sum_{perms L' with x at pos i} prob(perm) . time(x,L')
     = max_{(x,L)} sum_{i} #(perms with x at pos i) . (1/n!) . i
     = max_{(x,L)} sum_{i} ((n-1)!/n!) . i
     = max_{(x,L)} sum_{i} i/n
     = (n+1)/2

so the number of comparisons is exactly the same as the average case but now it doesn't matter what the list is. We'll see that same idea of using random permutation to avoid the worst case later, in the quicksort and quickselect algorithms. For both of these algorithms, the use of randomization decreases the running time enormously, from O(n^2) to O(n log n) or O(n). It is also sometimes possible to make stronger forms of analysis about random algorithms than just their expected time, for instance we could compute the variance of the running time, or prove statements such as that with very high probability, an algorithm uses time close to its expectation. This is important if one wants to be sure that the slow possibilities are very rare, but is usually much more complicated, so we won't do much of that sort of analysis in this class.
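Here is a runnable C sketch of the randomized search above (the Fisher-Yates shuffle of an index array and the use of rand() are my own choices; the original list is left untouched):

#include <stdlib.h>

/* Search L[0..n-1] for x in a random order, as in the randomized search above.
   Returns an index where x occurs, or -1 if there is no match. */
int randomized_search(const int *L, int n, int x)
{
    int *order = malloc((size_t)n * sizeof(int));
    if (order == NULL) return -1;
    for (int i = 0; i < n; i++) order[i] = i;
    /* Fisher-Yates shuffle: a uniformly random permutation of the positions. */
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int t = order[i]; order[i] = order[j]; order[j] = t;
    }
    int result = -1;
    for (int i = 0; i < n; i++) {        /* sequential search in permuted order */
        if (L[order[i]] == x) { result = order[i]; break; }
    }
    free(order);
    return result;
}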

Binary search
Let's go back to the original example -- finding matrix multiplication in Baase. I talked about looking it up in the table of contents (by sequential search) but also about looking it up in the index. The index of Baase and most other books has the useful property that it's alphabetized, so we can be smarter about our search. For instance, we could stop the sequential search whenever we found a y>x, and this would speed up the time for x not in L. But we can do much better, and this is basically what people do in alphabetized lists.

binary search(x, L)
{
    let n = length of L, i = n/2
    if (n = 0) return no match
    else if (L[i] matches x) return L[i]
    else if (L[i] > x) binary search(x, L[1..i-1])
    else binary search(x, L[i+1..n])
}

Recursion is not really necessary:


alternate search(x, L)
{
    let n = length of L
    let a = 1, b = n
    while (L[i = (a+b)/2] doesn't match) {
        if (L[i] > x) b = i-1
        else a = i+1
        if (a > b) return no match
    }
    return L[i]
}

Analysis: T(n) = O(1) + T(n/2) = O(log n). More precisely, in the worst case, T(n) = 2 + T(ceiling((n-1)/2)), which solves to approximately 2 log n (logarithm to base 2). So binary search is fast, but in order to use it we need to somehow get the list into sorted order -- this problem is known as sorting, and we'll see it in much detail next week.
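For reference, a minimal C sketch of the iterative version over a sorted array of ints (my own, using 0-based indices rather than the 1-based pseudo-code above):

/* Binary search in a sorted int array; returns an index of x, or -1 if absent. */
int binary_search(const int *L, int n, int x)
{
    int a = 0, b = n - 1;
    while (a <= b) {
        int i = a + (b - a) / 2;     /* midpoint, written to avoid overflow */
        if (L[i] == x)
            return i;
        else if (L[i] > x)
            b = i - 1;               /* continue in the left half */
        else
            a = i + 1;               /* continue in the right half */
    }
    return -1;                       /* a > b: no match */
}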

Sorting
How to alphabetize a list of words? Sort a list of numbers? Some other information? We saw last time one reason for doing this (so we can apply binary search) and the same problem comes up over and over again in programming.

Comparison sorting
This is a very abstract model of sorting. We assume we are given a list of objects to sort, and that there is some particular order in which they should be sorted. What is the minimum amount of information we can get away with and still be able to sort them? As a particular case of the sorting problem, we should be able to sort lists of two objects. But this is the same as comparing any two objects, to determine which comes first in the sorted order. (For now, we assume no two objects are equal, so one should always go before the other; most sorting algorithms can also handle objects that are "the same" but it complicates the problem.)

Algorithms that sort a list based only on comparisons of pairs (and not using other information about what is being sorted, for instance arithmetic on numbers) are called comparison sorting algorithms. Why do we care about this abstract and restrictive model of sorting?

We only have to write one routine to do sorting, which can be used over and over again without having to rewrite it and re-debug it for each new sorting problem you need to solve. In fact we don't even have to write that one routine; it is provided in the qsort() routine in the Unix library.

For some problems, it is not obvious how to do anything other than comparisons. (I gave an example from my own research, on a geometric problem of quadtree construction, which involved comparing points (represented as pairs of coordinates) by computing bitwise exclusive ors of the coordinates, comparing those numbers, and using the result to determine which coordinates to compare.)

It's easier to design and analyze algorithms without having to think about unnecessary problem-specific details.

Some comparison sorting algorithms work quite well, so there is not so much need to do something else.

Sorting algorithms
There are dozens of sorting algorithms. Baase covers around seven. We'll probably have time only for four: heapsort, merge sort, quicksort, and bucket sort. Each of these is useful as an algorithm, but also helps introduce some new ideas:

Heapsort shows how one can start with a slow algorithm (selection sort) and by adding some simple data structures transform it into a much better one.

Merge sort and quick sort are different examples of divide and conquer, a very general algorithm design technique in which one partitions an input into parts, solves the parts recursively, then recombines the subproblem solutions into one overall solution. The two differ in how they do the partition and recombination; merge sort allows any partition, but the result of the recursive solution to the parts is two interleaved sorted lists, which we must combine into one in a somewhat complicated way. Quick sort instead does a more complicated partition so that one subproblem contains all objects less than some value, and the other contains all objects greater than that value, but then the recombination stage is trivial (just concatenate). Quick sort is an example of randomization and average case analysis.

Bucket sort shows how abstraction is not always a good idea -- we can derive improved sorting algorithms for both numbers and alphabetical words by looking more carefully at the details of the objects being sorted.

Sorting time bounds


What sort of time bounds should we expect? First, how should we measure time? If we have a comparison sorting algorithm, we can't really say how many machine instructions it will take, because it will vary depending on how complicated the comparisons are. Since the comparisons usually end up dominating the overall time bound, we'll measure time in terms of the number of comparisons made. Sorting algorithms have a range of time bounds, but for some reason there are two typical time bounds for comparison sorting: mergesort, heapsort, and (the average case of) quicksort all take O(n log n), while insertion sort, selection sort, and the worst case of quicksort all take O(n^2). As we'll see, O(n log n) is the best you could hope to achieve, while O(n^2) is the worst -- it describes the amount of time taken by an algorithm that performs every possible comparison it could. O(n log n) is significantly faster than O(n^2):
n         10     100    1000    10^6       10^9
n log n   33     665    10^4    2 x 10^7   3 x 10^10
n^2       100    10^4   10^6    10^12      10^18

So even if you're sorting small lists it pays to use a good algorithm such as quicksort instead of a poor one like bubblesort. You don't even have the excuse that bubblesort is easier, since to get a decent sorting algorithm in a program you merely have to call qsort.

Lower bounds
A lower bound is a mathematical argument saying you can't hope to go faster than a certain amount. More precisely, every algorithm within a certain model of computation has a running time at least that amount. (This is usually proved for worst case running times but you could also do the same sort of thing for average case or best case if you want to.) This doesn't necessarily mean faster algorithms are completely impossible, but only that if you want to go faster, you can't stick with the abstract model; you have to look more carefully at the problem. So the linear time bound we'll see later for bucketsort won't contradict the n log n lower bounds we'll prove now.

Lower bounds are useful for two reasons. First, they give you some idea of how good an algorithm you could expect to find (so you know if there is room for further optimization). Second, if your lower bound is slower than the amount of time you want to actually spend solving a problem, the lower bound tells you that you'll have to break the assumptions of the model of computation somehow.

We'll prove lower bounds for sorting in terms of the number of comparisons. Suppose you have a sorting algorithm that only examines the data by making comparisons between pairs of objects (and doesn't use any random numbers; the model we describe can be extended to deal with randomized algorithms but it gets more complicated). We assume that we have some particular comparison sorting algorithm A, but that we don't know anything more about how it runs. Using that assumption, we'll prove that the worst case time for A has to be at least a certain amount, but since the only assumption we make on A is that it's a comparison sorting algorithm, this fact will be true for all such algorithms.

Decision trees
Given a comparison sorting algorithm A, and some particular number n, we draw a tree corresponding to the different sequences of comparisons A might make on an input of length n. If the first comparison the algorithm makes is between the objects at positions a and b, then it will make the same comparison no matter what other list of the same length is input, because in the comparison model we do not have any other information than n so far on which to make a decision. Then, for all lists in which a<b, the second comparison will always be the same, but the algorithm might do something different if the result of the first comparison is that a>b. So we can draw a tree, in which each node represents the positions involved at some comparison, and each path in the tree describes the sequence of comparisons and their results from a particular run of the algorithm. Each node will have two children, representing the possible behaviors of the program depending on the result of the comparison at that node. Here is an example for n=3.
                  1:2
               /       \
             <           >
           2:3           1:3
          /    \        /    \
        <       >     <       >
     1,2,3     1:3   2,1,3    2:3
              /   \          /   \
            <      >       <      >
         1,3,2   3,1,2   2,3,1   3,2,1

This tree describes an algorithm in which the first comparison is always between the first and second positions in the list (this information is denoted by the "1:2" at the root of the tree). If the object in position one is less than the object in position two, the next comparison will always be between the second and third positions in the list (the "2:3" at the root of the left subtree). If the second is less than the third, we can deduce that the input is already sorted, and we write "1,2,3" to denote the permutation of the input that causes it to be sorted. But if the second is greater than the third, there still remain two possible permutations to be distinguished between, so we make a third comparison "1:3", and so on. Any comparison sorting algorithm can always be put in this form, since the comparison it chooses to make at any point in time can only depend on the answers to previously asked comparisons. And conversely, a tree like this can be used as a sorting algorithm: for any given list, follow a path in the tree to determine which comparisons to be made and which permutation of the input gives a sorted order. This is a reasonable way to represent algorithms for sorting very small lists (such as the case n=3 above) but for larger values of n it works better to use pseudo-code. However this tree is also useful for discovering various properties of our original algorithm A.

The worst case number of comparisons made by algorithm A is just the length of the longest path in the tree. One can also determine the average case number of comparisons made, but this is more complicated. At each leaf in the tree, there are no more comparisons to be made -- therefore we know what the sorted order is. Each possible sorted order corresponds to a permutation, so there are at least n! leaves. (There might be more if, for instance, we have a stupid algorithm that tests whether a<c even after it has already discovered that a<b and b<c.)

The sorting lower bound

What is the longest path in a binary tree with k leaves? At least log k. (Proof: one of the two subtrees has at least half the leaves, so LP(k) >= 1 + LP(k/2); the result follows by induction.) So the number of comparisons to sort is at least log n!. This turns out to be roughly n log n; to distinguish lower bounds from upper bounds we write them a little differently, with a big Omega rather than a big O, so we write this lower bound as Omega(n log n). More precisely,
log n! = n log n - O(n).

A reasonably simple proof follows:


n! = product_{i=1}^{n} i

so
log n! = sum_{i=1}^{n} log i
       = sum_{i=1}^{n} log (n . i/n)
       = sum_{i=1}^{n} (log n - log(n/i))
       = n log n - sum_{i=1}^{n} log(n/i).

Let f(n) be the last term above, sum log(n/i); then we can write down a recurrence bounding f(n):
f(n) = sum_{i=1}^{n} log(n/i)
     = sum_{i=1}^{n/2} log(n/i) + sum_{i=n/2+1}^{n} log(n/i)

All of the terms in the first sum are equal to log(2 . (n/2)/i) = 1 + log((n/2)/i), and all of the terms in the second sum are logs of numbers between 1 and 2, and so are themselves numbers between 0 and 1. So we can simplify this equation to
f(n) <= n + sum_{i=1}^{n/2} log((n/2)/i)
      = n + f(n/2)

which solves to 2n and completes the proof that log n! >= n log n - 2n.

(Note: in class I got this argument slightly wrong and lost a factor of two in the recurrence for f(n).) We can get a slightly more accurate formula from Stirling's formula (which I won't prove):
n! ~ sqrt(2 pi n) (n/e)^n

so
log n! ~ n log n - 1.4427 n + 1/2 log n + 1.33

Let's compute a couple examples to see how accurate this is:


          log n!    formula gives
n=10      21.8      33.22 - 14.43 ~ 18.8
n=100     524.8     664.4 - 144.3 ~ 520.1

Enough math, let's do some actual algorithms.

Selection sort
To understand heap sort, let's start with selection sort. An experiment: I write a list of numbers, and once I'm done you tell me the sorted order.
5,2,100,19,22,7

How did you go about finding them? You probably looked through the list for the first number, then looked through it again for the next one, etc. One way of formalizing this process is called selection sort:
selection sort(list L)
{
    list X = empty
    while (L nonempty) {
        remove smallest element of L and add it to X
    }
}

Time analysis: there is one loop, executed n times. But the total time is not O(n). Remember we are counting comparisons. "Remove the smallest element of L" could take many comparisons. We need to look more carefully at this part of the loop. (The other part, adding an element to X, also depends on how we store X, but can be done in constant time for most reasonable implementations and in any case doesn't require any comparisons, which is what we're counting.) The obvious method of finding (and removing) the smallest element: scan L and keep track of the smallest object. So this produces a nested inner loop, time = O(length of L) so total time = O(sum i) = O(n^2). This is one of the slow algorithms. In fact it is as slow as possible: it always makes every possible comparison. Why am I describing it when there are so many better algorithms?
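Before moving on, here is the nested-loop method just described, written as an array version in C (my own sketch; it swaps the minimum into place rather than building a separate output list X, which changes the bookkeeping but not the comparison count):

/* Selection sort: repeatedly find the smallest remaining element and move it to the front. */
void selection_sort(int *a, int n)
{
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)   /* scan the rest of the list ... */
            if (a[j] < a[min])            /* ... one comparison per remaining element */
                min = j;
        int tmp = a[i]; a[i] = a[min]; a[min] = tmp;
    }
}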

Heap sort

Heap sort (invented by J. W. J. Williams) looks exactly like the pseudo-code above for selection sort, and simply uses some data structures to perform the main step of selection sort more quickly. The operations we need to perform are

Starting with a list L and turning it into a copy of whatever data structure we're using,
Finding the smallest object in the data structure, and
Removing the smallest element.

There are many suitable data structures, for instance the AVL trees studied in ICS 23. We'll describe here a structure called a binary heap. A heap also supports other possible operations, such as adding objects to the list; that's not useful in this algorithm but maybe later. (We will see heaps again when we talk about minimum spanning trees and shortest paths.) Simple analysis of heap sort: if we can build a data structure from our list in time X and finding and removing the smallest object takes time Y then the total time will be O(X + nY). In our case X will be O(n) and Y will be O(log n) so total time will be O(n + n log n) = O(n log n)

Heap data structure


We form a binary tree with certain properties:

The elements of L are placed on the nodes of the tree; each node holds one element and each element is placed on one node.

The tree is balanced, which as far as I'm concerned means that all paths have length O(log n); Baase uses a stronger property in which no two paths to a leaf differ in length by more than one.

(The heap property): If one node is a parent of another, the value at the parent is always smaller than the value at the child.

You can think of the heap property as being similar to a property of family trees -- a parent's birthday is always earlier than his or her children's birthdays. As another example, in a corporate hierarchy, the salary of a boss is (almost) always bigger than that of his or her underlings (the same idea with the order reversed: there the extreme value at the top is the biggest rather than the smallest). You can find the smallest heap element by looking at the root of the tree (just as the boss of the whole company has the biggest salary); this is easy to see, since any node in a tree has a smaller value than all its descendants (by transitivity).

How do we remove it? Say the company boss quits. How do we fill his place? We have to promote somebody. To satisfy the heap property, that will have to be the person with the biggest salary, and that must be one of his or her two direct underlings (the one of the two with the bigger salary). Promoting this person then leaves a vacancy lower down that we can fill in the same sort of way, and so on. In pseudo-code:
remove_node(node x)
{
    if (x is a leaf) delete it
    else if (no right child or left < right) {
        move value at left child to x
        remove_node(left child)
    }
    else if (no left child or right < left) {
        move value at right child to x
        remove_node(right child)
    }
}

(Baase has a more complicated procedure, since she wants to maintain a stronger balanced tree property. Essentially the idea is to pick someone at the bottom of the tree to be the new root, notice that that violates the heap property, and trade that value with its best child until it no longer causes a violation. This results in twice as many comparisons but has some technical advantages in terms of being able to store the heap in the same space as the sorted list you're constructing.) The number of comparison steps in this operation is then just the length of the longest path in the tree, O(log n). This fits into the comparison sorting framework because the only information we use to determine who should be promoted comes from comparing pairs of objects. The total number of comparisons in heapsort is then O(n log n) + how much time it takes to set up the heap.
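Here is a C sketch of the removal step on an array-based min-heap (my own additions: the root lives at index 0 and the children of node i are at 2i+1 and 2i+2). It uses the variant from the parenthetical above -- move an element from the bottom to the root and trade it down with its smaller child -- since that keeps the heap compact in an array:

/* Array-based min-heap: h[0] is the smallest element; children of node i are 2i+1 and 2i+2. */

/* Remove and return the minimum of a heap with *n elements, decrementing *n. */
int remove_min(int *h, int *n)
{
    int min = h[0];
    h[0] = h[--(*n)];                   /* move someone from the bottom to the root */
    int i = 0;
    for (;;) {
        int left = 2*i + 1, right = 2*i + 2, smallest = i;
        if (left  < *n && h[left]  < h[smallest]) smallest = left;
        if (right < *n && h[right] < h[smallest]) smallest = right;
        if (smallest == i) break;       /* heap property restored */
        int tmp = h[i]; h[i] = h[smallest]; h[smallest] = tmp;
        i = smallest;
    }
    return min;
}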

Three Divide and Conquer Sorting Algorithms


Today we'll finish heapsort, and describe both mergesort and quicksort. Why do we need multiple sorting algorithms? Different methods work better in different applications.

Heapsort uses close to the right number of comparisons but needs to move data around quite a bit. It can be done in a way that uses very little extra memory. It's probably good when memory is tight, and you are sorting many small items that come stored in an array.

Merge sort is good for data that's too big to have in memory at once, because its pattern of storage access is very regular. It also uses even fewer comparisons than heapsort, and is especially suited for data stored as linked lists.

Quicksort also uses few comparisons (somewhat more than the other two). Like heapsort it can sort "in place" by moving data in an array.

Heapification
Recall the idea of heapsort:
heapsort(list L)
{
    make heap H from L
    make empty list X
    while (H nonempty)
        remove smallest from H and add it to X
    return X
}

Remember that a heap is just a balanced binary tree in which the value at any node is smaller than the values at its children. We went over most of this last time. The total number of comparisons is n log n + however many are needed to make H. The only missing step: how to make a heap? To start with, we can set up a binary tree of the right size and shape, and put the objects into the tree in any old order. This is all easy and doesn't require any comparisons. Now we have to switch objects around to get them back in order. The divide and conquer idea: find natural subproblems, solve them recursively, and combine them to get an overall solution. Here the obvious subproblems are the subtrees. If we solve them recursively, we get something that is close to being a heap, except that perhaps the root doesn't satisfy the heap property. To make the whole thing a heap, we merely have to percolate that value down to a lower level in the tree.
heapify(tree T)
{
    if (T is nonempty) {
        heapify(left subtree)
        heapify(right subtree)
        let x = value at tree root
        while (node containing x doesn't satisfy heap property)
            switch values of node and its smallest child
    }
}

The while loop performs two comparisons per iteration, and takes at most log n iterations, so the time for this satisfies a recurrence
T(n) <= 2 T(n/2) + 2 log n

How to solve it?

Divide and conquer recurrences


In general, divide and conquer is based on the following idea. The whole problem we want to solve may be too big to understand or solve at once. We break it up into smaller pieces, solve the pieces separately, and combine the separate pieces together. We analyze this in some generality: suppose we have a pieces, each of size n/b, and merging takes time f(n). (In the heapification example a=b=2 and f(n)=O(log n), but it will not always be true that a=b -- sometimes the pieces will overlap.) The easiest way to understand what's going on here is to draw a tree with nodes corresponding to subproblems (labeled with the size of the subproblem):
              n
           /  |  \
        n/b  n/b  n/b
        /|\  /|\  /|\
       ...  ...  ...

For simplicity, let's assume n is a power of b, and that the recursion stops when n is 1. Notice that the size of a node depends only on its level:
size(i) = n/(b^i).

What is time taken by a node at level i?


time(i) = f(n/b^i)

How many levels can we have before we get down to n=1? For bottom level, n/b^i=1, so n=b^i and i=(log n)/(log b). How many items at level i? a^i. So putting these together we have
T(n) = sum_{i=0}^{(log n)/(log b)} a^i f(n/b^i)

This looks messy, but it's not too bad. There are only a few terms (logarithmically many) and often the sum is dominated by the terms at one end (f(n)) or the other (n^(log a/log b)). In fact, you will generally only be a logarithmic factor away from the truth if you approximate the solution by the sum of these two, O(f(n) + n^(log a/log b)). Let's use this to analyze heapification. By plugging in the parameters a=b=2, f(n)=2 log n, we get

T(n) = 2 sum_{i=0}^{log n} 2^i log(n/2^i)

Rewriting the same terms in the opposite order, this turns out to equal
T(n) = 2 sum_{i=0}^{log n} (n/2^i) log(2^i)
     = 2n sum_{i=0}^{log n} i/2^i
     <= 2n sum_{i=0}^{infinity} i/2^i
     = 4n

So heapification takes at most 4n comparisons and heapsort takes at most n log n + 4n. (There's an n log n - 1.44n lower bound so we're only within O(n) of the absolute best possible.) This was an example of a sorting algorithm where one part used divide and conquer. What about doing the whole algorithm that way?

Merge sort
According to Knuth, merge sort was one of the earliest sorting algorithms, invented by John von Neumann in 1945. Let's look at the combine step first. Suppose you have some data that's close to sorted -- it forms two sorted lists. You want to merge the two sorted lists quickly rather than having to resort to a general purpose sorting algorithm. This is easy enough:
merge(L1, L2)
{
    list X = empty
    while (neither L1 nor L2 empty) {
        compare first items of L1 & L2
        remove smaller of the two from its list
        add to end of X
    }
    catenate remaining list to end of X
    return X
}

Time analysis: in the worst case both lists become empty at about the same time, so everything has to be compared. Each comparison adds one item to X, so the worst case is |X|-1 = |L1|+|L2|-1 comparisons. One can do a little better sometimes, e.g. if L1 is smaller than most of L2.

Once we know how to combine two sorted lists, we can construct a divide and conquer sorting algorithm that simply divides the list in two, sorts the two recursively, and merges the results:
merge sort(L)
{
    if (length(L) < 2) return L
    else {
        split L into lists L1 and L2, each of n/2 elements
        L1 = merge sort(L1)
        L2 = merge sort(L2)
        return merge(L1, L2)
    }
}

This is simpler than heapsort (so easier to program) and works pretty well. How many comparisons does it use? We can use the analysis of the merge step to write down a recurrence:
C(n) <= n-1 + 2C(n/2)

As you saw in homework 1.31, for n = power of 2, the solution to this is n log n - n + 1. For other n, it's similar but more complicated. To prove this (at least the power of 2 version), you can use the formula above to produce
C(n) <= sum_{i=0}^{log n} 2^i (n/2^i - 1)
      = sum_{i=0}^{log n} (n - 2^i)
      = n (log n + 1) - (2n - 1)
      = n log n - n + 1

So the number of comparisons is even less than heapsort.
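A minimal array-based C sketch of the above (my own; the pseudo-code splits a list into two lists, while this version sorts a[lo..hi) in place with the help of a scratch buffer):

#include <stdlib.h>
#include <string.h>

/* Merge the sorted halves a[lo..mid) and a[mid..hi) using the scratch buffer tmp. */
static void merge(int *a, int *tmp, int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];   /* one comparison per output item */
    while (i < mid) tmp[k++] = a[i++];                 /* catenate whatever remains */
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

static void merge_sort_range(int *a, int *tmp, int lo, int hi)
{
    if (hi - lo < 2) return;                 /* lists of length 0 or 1 are already sorted */
    int mid = lo + (hi - lo) / 2;
    merge_sort_range(a, tmp, lo, mid);       /* sort the two halves recursively ... */
    merge_sort_range(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);              /* ... then merge them */
}

void merge_sort(int *a, int n)
{
    int *tmp = malloc((size_t)n * sizeof(int));
    if (tmp == NULL) return;                 /* allocation failed; leave input unsorted */
    merge_sort_range(a, tmp, 0, n);
    free(tmp);
}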

Quicksort
Quicksort, invented by Tony Hoare, follows a very similar divide and conquer idea: partition into two lists and put them back together again. It does more work on the divide side, less on the combine side. Merge sort worked no matter how you split the lists (one obvious way is to take the first n/2 and last n/2 elements; another is to take every other element). But if you could perform the splits so that everything in one list was smaller than everything in the other, this information could be used to make merging much easier: you could merge just by concatenating the lists.

How can we split so that everything in one list is smaller than everything in the other? e.g. for alphabetical order, you could split into A-M and N-Z. So we could use some split depending on what the data looks like, but we want a comparison sorting algorithm that works for any data. Quicksort uses a simple idea: pick one object x from the list, and split the rest into those before x and those after x.
quicksort(L)
{
    if (length(L) < 2) return L
    else {
        pick some x in L
        L1 = { y in L : y < x }
        L2 = { y in L : y > x }
        L3 = { y in L : y = x }
        quicksort(L1)
        quicksort(L2)
        return concatenation of L1, L3, and L2
    }
}

(We don't need to sort L3 because everything in it is equal).

Quicksort analysis
The partition step of quicksort takes n-1 comparisons. So we can write a recurrence for the total number of comparisons done by quicksort:
C(n) = n-1 + C(a) + C(b)

where a and b are the sizes of L1 and L2, generally satisfying a+b=n-1. In the worst case, we might pick x to be the minimum element in L. Then a=0, b=n-1, and the recurrence simplifies to C(n) = n-1 + C(n-1) = O(n^2). So this seems like a very bad algorithm. Why do we call it quicksort? How can we make it less bad? Randomization! Suppose we pick x = L[k] where k is chosen randomly. Then every value of a from 0 to n-1 is equally likely. To do average case analysis, we write out the sum over possible random choices of the probability of that choice times the time for that choice. Here the choices are the values of k, the probabilities are all 1/n, and the times can be described by formulas involving the time for the recursive calls to the algorithm. So the average case analysis of this randomized algorithm gives a recurrence:
C(n) = sum_{a=0}^{n-1} (1/n) [n - 1 + C(a) + C(n-a-1)]

To simplify the recurrence, note that if C(a) occurs one place in the sum, the same number will occur as C(n-a-1) in another term -- we rearrange the sum to group the two together. We can also take the (n-1) parts out of the sum, since the sum of n copies of (1/n)(n-1) is just n-1.
C(n) = n - 1 + sum_{a=0}^{n-1} (2/n) C(a)

The book gives two proofs that this is O(n log n). Of these, induction is easier. One useful idea here: we want to prove that f(n) is O(g(n)). The O() hides too much information, so instead we need to prove f(n) <= a g(n), but we don't know what value a should take. We work it out with a left as a variable, then use the analysis to see what values of a work. We have C(1) = 0 = a (1 log 1) for all a. Suppose C(i) <= a i log i for some a and all i < n. Then
C(n) = n-1 + sum_{i=0}^{n-1} (2/n) C(i)
     <= n-1 + sum_{i=2}^{n-1} (2/n) a i log i
     = n-1 + (2a/n) sum_{i=2}^{n-1} i log i
     <= n-1 + (2a/n) integral_{2}^{n} x log x dx
     = n-1 + (2a/n) (n^2 log n / 2 - n^2/4 - 2 ln 2 + 1)
     = n-1 + a n log n - an/2 - O(1)

and this will work if n-1 < an/2, and in particular if a=2. So we can conclude that C(n) <= 2 n log n. Note that this is worse than either merge sort or heap sort, and requires a random number generator to avoid being really bad. But it's pretty commonly used, and can be tuned in various ways to work better. (For instance, let x be the median of three randomly chosen values rather than just one value.)
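An in-place C sketch of randomized quicksort (my own; it partitions an array around a randomly chosen pivot instead of building the separate lists L1, L2, L3, and elements equal to the pivot simply end up to its right):

#include <stdlib.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Sort a[lo..hi] in place around a randomly chosen pivot. */
static void quicksort_range(int *a, int lo, int hi)
{
    if (lo >= hi) return;                      /* length < 2: already sorted */
    int k = lo + rand() % (hi - lo + 1);       /* pick x = a[k] with k random */
    swap(&a[k], &a[hi]);                       /* move the pivot out of the way */
    int x = a[hi], i = lo;
    for (int j = lo; j < hi; j++)              /* partition: smaller elements to the front */
        if (a[j] < x)
            swap(&a[i++], &a[j]);
    swap(&a[i], &a[hi]);                       /* put the pivot between the two parts */
    quicksort_range(a, lo, i - 1);
    quicksort_range(a, i + 1, hi);
}

void quicksort(int *a, int n)
{
    quicksort_range(a, 0, n - 1);
}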

Minimum Spanning Trees


Spanning trees
A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree. A graph may have many spanning trees; for instance the complete graph on four vertices
o---o
|\ /|
| X |
|/ \|
o---o

has sixteen spanning trees:


[Figure: the sixteen spanning trees of the four-vertex complete graph, drawn as subsets of the square's four sides and two diagonals; twelve of them are paths and four are stars.]

Minimum spanning trees


Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum of weights of its edges. Obviously, different trees have different lengths. The problem: how to find the minimum length spanning tree? This problem can be solved by many different algorithms. It is the topic of some very recent research. There are several "best" algorithms, depending on the assumptions you make:

A randomized algorithm can solve it in linear expected time. [Karger, Klein, and Tarjan, "A randomized linear-time algorithm to find minimum spanning trees", J. ACM, vol. 42, 1995, pp. 321-328.]

It can be solved in linear worst case time if the weights are small integers. [Fredman and Willard, "Trans-dichotomous algorithms for minimum spanning trees and shortest paths", 31st IEEE Symp. Foundations of Comp. Sci., 1990, pp. 719-725.]

Otherwise, the best solution is very close to linear but not exactly linear. The exact bound is O(m log beta(m,n)), where the beta function has a complicated definition: the smallest i such that log(log(log(...log(n)...))) is less than m/n, where the logs are nested i times. [Gabow, Galil, Spencer, and Tarjan, "Efficient algorithms for finding minimum spanning trees in undirected and directed graphs", Combinatorica, vol. 6, 1986, pp. 109-122.]

These algorithms are all quite complicated, and probably not that great in practice unless you're looking at really huge graphs. The book tries to keep things simpler, so it only describes one algorithm but (in my opinion) doesn't do a very good job of it.

I'll go through three simple classical algorithms (spending not so much time on each one).

Why minimum spanning trees?


The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them up with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn't a tree you can always remove some edges and save money. A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem. A convenient formal way of defining this problem is to find the shortest path that visits each point at least once. Note that if you have a path visiting all points exactly once, it's a special kind of tree. For instance in the example above, twelve of sixteen spanning trees are actually paths. If you have a path visiting some vertices more than once, you can always drop some edges to get a tree. So in general the MST weight is less than the TSP weight, because it's a minimization over a strictly larger set. On the other hand, if you draw a path tracing around the minimum spanning tree, you trace each edge twice and visit all points, so the TSP weight is less than twice the MST weight. Therefore this tour is within a factor of two of optimal. There is a more complicated way (Christofides' heuristic) of using minimum spanning trees to find a tour within a factor of 1.5 of optimal; I won't describe this here but it might be covered in ICS 163 (graph algorithms) next year.

How to find minimum spanning tree?


The stupid method is to list all spanning trees and find the minimum of the list. We already know how to find minima... But there are far too many trees for this to be efficient. It's also not really an algorithm, because you'd still need to know how to list all the trees. A better idea is to find some key property of the MST that lets us be sure that some edge is part of it, and use this property to build up the MST one edge at a time. For simplicity, we assume that there is a unique minimum spanning tree. (Problem 4.3 of Baase is related to this assumption.) You can get ideas like this to work without this assumption, but it becomes harder to state your theorems or write your algorithms precisely.

Lemma: Let X be any subset of the vertices of G, and let edge e be the smallest edge connecting X to G-X. Then e is part of the minimum spanning tree.

Proof: Suppose you have a tree T not containing e; then I want to show that T is not the MST. Let e=(u,v), with u in X and v not in X. Then because T is a spanning tree it contains a unique path from u to v, which together with e forms a cycle in G. This path has to include another edge f connecting X to G-X. T+e-f is another spanning tree (it has the same number of edges, and remains connected since you can replace any path containing f by one going the other way around the cycle). It has smaller weight than T since e has smaller weight than f. So T was not minimum, which is what we wanted to prove.

Kruskal's algorithm
We'll start with Kruskal's algorithm, which is easiest to understand and probably the best one for solving problems by hand.
Kruskal's algorithm:
    sort the edges of G in increasing order by length
    keep a subgraph S of G, initially empty
    for each edge e in sorted order
        if the endpoints of e are disconnected in S
            add e to S
    return S

Note that, whenever you add an edge (u,v), it's always the smallest edge connecting the part of S reachable from u with the rest of G, so by the lemma it must be part of the MST. This algorithm is known as a greedy algorithm, because it chooses at each step the cheapest edge to add to S. You should be very careful when trying to use greedy algorithms to solve other problems, since the greedy approach usually doesn't work. E.g. if you want to find a shortest path from a to b, it might be a bad idea to keep taking the shortest edges. The greedy idea only works in Kruskal's algorithm because of the key property we proved. Analysis: The line testing whether two endpoints are disconnected looks like it should be slow (linear time per iteration, or O(mn) total). But actually there are some complicated data structures that let us perform each test in close to constant time; this is known as the union-find problem and is discussed in Baase section 8.5 (I won't get to it in this class, though). The slowest part turns out to be the sorting step, which takes O(m log n) time.
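As one possible concrete version, here is a C sketch of Kruskal's algorithm using a small union-find structure with path compression. The Edge type, the fixed vertex bound, and the function names are assumptions of this example; a full implementation would add union by rank, as in the structure from Baase section 8.5.

    #include <stdlib.h>

    #define MAXV 1000                       /* assumed bound on the number of vertices */

    typedef struct { int u, v; double weight; } Edge;

    static int parent[MAXV];                /* union-find forest over the vertices */

    static int find(int x)                  /* find the root, compressing the path */
    {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    static int cmp_edge(const void *a, const void *b)
    {
        double d = ((const Edge *)a)->weight - ((const Edge *)b)->weight;
        return (d > 0) - (d < 0);
    }

    /* Store the n-1 MST edges of a connected graph in tree[]; return their count. */
    int kruskal(Edge edges[], int m, int n, Edge tree[])
    {
        int count = 0;
        for (int v = 0; v < n; v++) parent[v] = v;     /* every vertex is its own set */
        qsort(edges, m, sizeof(Edge), cmp_edge);       /* the O(m log n) sorting step */
        for (int i = 0; i < m && count < n - 1; i++) {
            int ru = find(edges[i].u), rv = find(edges[i].v);
            if (ru != rv) {                            /* endpoints disconnected in S */
                parent[ru] = rv;                       /* merge the two components */
                tree[count++] = edges[i];
            }
        }
        return count;
    }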

Prim's algorithm
Rather than build a subgraph one edge at a time, Prim's algorithm builds a tree one vertex at a time.
Prim's algorithm:
    let T be a single vertex x
    while (T has fewer than n vertices)
    {
        find the smallest edge connecting T to G-T
        add it to T
    }

Since each edge added is the smallest connecting T to G-T, the lemma we proved shows that we only add edges that should be part of the MST. Again, it looks like the loop has a slow step in it. But again, some data structures can be used to speed this up. The idea is to use a heap to remember, for each vertex, the smallest edge connecting T with that vertex.
Prim with heaps:
    make a heap of values (vertex,edge,weight(edge))
        initially (v,-,infinity) for each vertex
    let tree T be empty
    while (T has fewer than n vertices)
    {
        let (v,e,weight(e)) have the smallest weight in the heap
        remove (v,e,weight(e)) from the heap
        add v and e to T
        for each edge f=(u,v)
            if u is not already in T
                find value (u,g,weight(g)) in heap
                if weight(f) < weight(g)
                    replace (u,g,weight(g)) with (u,f,weight(f))
    }

Analysis: We perform n steps in which we remove the smallest element in the heap, and at most 2m steps in which we examine an edge f=(u,v). For each of those steps, we might replace a value on the heap, reducing its weight. (You also have to find the right value on the heap, but that can be done easily enough by keeping a pointer from the vertices to the corresponding values.) I haven't described how to reduce the weight of an element of a binary heap, but it's easy to do in O(log n) time. Alternatively, by using a more complicated data structure known as a Fibonacci heap, you can reduce the weight of an element in constant time. The result is a total time bound of O(m + n log n).
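For comparison, here is a C sketch of Prim's algorithm that avoids heaps entirely: it keeps, for each vertex not yet in T, the weight of the cheapest known edge connecting it to T in a plain array. This gives O(n^2) time, which is competitive for dense graphs. The adjacency-matrix representation, the fixed bound, and the names are assumptions of this example, and the graph is assumed connected.

    #include <float.h>

    #define MAXN 100                        /* assumed bound on the number of vertices */

    /* weight[u][v] = edge weight, or DBL_MAX if there is no edge (graph assumed
       connected).  On return, pred[v] is the other endpoint of the tree edge
       chosen for v, and the total MST weight is returned. */
    double prim(int n, double weight[MAXN][MAXN], int pred[MAXN])
    {
        double best[MAXN];                  /* cheapest known edge from T to each vertex */
        int inT[MAXN];
        double total = 0.0;

        for (int v = 0; v < n; v++) { best[v] = DBL_MAX; inT[v] = 0; pred[v] = -1; }
        best[0] = 0.0;                      /* start the tree at vertex 0 */

        for (int step = 0; step < n; step++) {
            int v = -1;
            for (int u = 0; u < n; u++)     /* smallest edge connecting T to G-T */
                if (!inT[u] && (v == -1 || best[u] < best[v])) v = u;
            inT[v] = 1;
            total += best[v];
            for (int u = 0; u < n; u++)     /* update cheapest edges out of the new T */
                if (!inT[u] && weight[v][u] < best[u]) {
                    best[u] = weight[v][u];
                    pred[u] = v;
                }
        }
        return total;
    }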

Boruvka's algorithm
(Actually Boruvka should be spelled with a small raised circle accent over the "u".) Although this seems a little complicated to explain, it's probably the easiest one for computer implementation, since it doesn't require any complicated data structures. The idea is to do steps like Prim's algorithm, in parallel all over the graph at the same time.
Boruvka's algorithm:
    make a list L of n trees, each a single vertex
    while (L has more than one tree)
        for each T in L, find the smallest edge connecting T to G-T
        add all those edges to the MST
            (causing pairs of trees in L to merge)

As we saw in Prim's algorithm, each edge you add must be part of the MST, so it must be ok to add them all at once. Analysis: This is similar to merge sort. Each pass reduces the number of trees by at least a factor of two (every tree merges with at least one other), so there are O(log n) passes. Each pass takes time O(m) (first figure out which tree each vertex is in, then for each edge test whether it connects two trees and is better than the ones seen before for the trees on either endpoint), so the total is O(m log n).
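Here is one possible C sketch of Boruvka's algorithm, again with invented names and a fixed vertex bound. It tracks which tree each vertex belongs to with a tiny union-find, and it assumes the graph is connected and the edge weights are distinct (matching the unique-MST assumption made above).

    #define MAXV 1000                            /* assumed bound on the number of vertices */

    typedef struct { int u, v; double weight; } Edge;

    static int comp[MAXV];                       /* union-find: which tree is each vertex in */

    static int find(int x)
    {
        if (comp[x] != x) comp[x] = find(comp[x]);
        return comp[x];
    }

    /* Return the total MST weight of a connected graph with distinct edge weights. */
    double boruvka(Edge edges[], int m, int n)
    {
        int best[MAXV];                          /* index of cheapest edge leaving each tree */
        double total = 0.0;
        int trees = n;

        for (int v = 0; v < n; v++) comp[v] = v;
        while (trees > 1) {
            for (int v = 0; v < n; v++) best[v] = -1;
            for (int i = 0; i < m; i++) {        /* cheapest edge out of each tree */
                int ru = find(edges[i].u), rv = find(edges[i].v);
                if (ru == rv) continue;
                if (best[ru] == -1 || edges[i].weight < edges[best[ru]].weight) best[ru] = i;
                if (best[rv] == -1 || edges[i].weight < edges[best[rv]].weight) best[rv] = i;
            }
            for (int v = 0; v < n; v++) {        /* add all the chosen edges at once */
                if (best[v] == -1 || find(v) != v) continue;
                Edge e = edges[best[v]];
                int ru = find(e.u), rv = find(e.v);
                if (ru != rv) {                  /* both sides may have picked this edge */
                    comp[ru] = rv;
                    total += e.weight;
                    trees--;
                }
            }
        }
        return total;
    }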

A hybrid algorithm
This isn't really a separate algorithm, but you can combine two of the classical algorithms and do better than either one alone. The idea is to do O(log log n) passes of Boruvka's algorithm, then switch to Prim's algorithm. Prim's algorithm then builds one large tree by connecting it with the small trees in the list L built by Boruvka's algorithm, keeping a heap which stores, for each tree in L, the best edge that can be used to connect it to the large tree. Alternately, you can think of collapsing the trees found by Boruvka's algorithm into "supervertices" and running Prim's algorithm on the resulting smaller graph. The point is that this reduces the number of remove min operations in the heap used by Prim's algorithm, to equal the number of trees left in L after Boruvka's algorithm, which is O(n / log n). Analysis: O(m log log n) for the first part, O(m + (n/log n) log n) = O(m + n) for the second, so O(m log log n) total.

NP-Completeness
So far we've seen a lot of good news: such-and-such a problem can be solved quickly (in close to linear time, or at least a time that is some small polynomial function of the input size). NP-completeness is a form of bad news: evidence that many important problems can't be solved quickly.

Why should we care?


These NP-complete problems really come up all the time. Knowing they're hard lets you stop beating your head against a wall trying to solve them, and do something better:

Use a heuristic. If you can't quickly solve the problem with a good worst case time, maybe you can come up with a method for solving a reasonable fraction of the common cases.

Solve the problem approximately instead of exactly. A lot of the time it is possible to come up with a provably fast algorithm that doesn't solve the problem exactly but comes up with a solution you can prove is close to right.

Use an exponential time solution anyway. If you really have to solve the problem exactly, you can settle down to writing an exponential time algorithm and stop worrying about finding a better solution.

Choose a better abstraction. The NP-complete abstract problem you're trying to solve presumably comes from ignoring some of the seemingly unimportant details of a more complicated real world problem. Perhaps some of those details shouldn't have been ignored, and make the difference between what you can and can't solve.

Classification of problems
The subject of computational complexity theory is dedicated to classifying problems by how hard they are. There are many different classifications; some of the most common and useful are the following. (One technical point: these are all really defined in terms of yes-or-no problems -- "does a certain structure exist?" rather than "how do I find the structure?")

P. Problems that can be solved in polynomial time. ("P" stands for polynomial.) These problems have formed the main material of this course.

NP. This stands for "nondeterministic polynomial time", where nondeterministic is just a fancy way of talking about guessing a solution. A problem is in NP if you can quickly (in polynomial time) test whether a solution is correct (without worrying about how hard it might be to find the solution). Problems in NP are still relatively easy: if only we could guess the right solution, we could then quickly test it.

NP does not stand for "non-polynomial". There are many complexity classes that are much harder than NP.

PSPACE. Problems that can be solved using a reasonable amount of memory (again defined formally as a polynomial in the input size) without regard to how much time the solution takes.

EXPTIME. Problems that can be solved in exponential time. This class contains most problems you are likely to run into, including everything in the previous three classes. It may be surprising that this class is not all-inclusive: there are problems for which the best algorithms take even more than exponential time.

Undecidable. For some problems, we can prove that there is no algorithm that always solves them, no matter how much time or space is allowed. One very uninformative proof of this is based on the fact that there are as many problems as there are real numbers, and only as many programs as there are integers, so there are not enough programs to solve all the problems. But we can also define explicit and useful problems which can't be solved.

Although defined theoretically, many of these classes have practical implications. For instance P is a very good approximation to the class of problems which can be solved quickly in practice -- usually if this is true, we can prove a polynomial worst case time bound, and conversely the polynomial time bounds we can prove are usually small enough that the corresponding algorithms really are practical. NP-completeness theory is concerned with the distinction between the first two classes, P and NP.

Examples of problems in different classes


Example 1: Long simple paths. A simple path in a graph is just one without any repeated edges or vertices. To describe the problem of finding long paths in terms of complexity theory, we need to formalize it as a yes-or-no question: given a graph G, vertices s and t, and a number k, does there exist a simple path from s to t with at least k edges? A solution to this problem would then consist of such a path. Why is this in NP? If you're given a path, you can quickly look at it and add up the length, double-checking that it really is a path with length at least k. This can all be done in linear time, so certainly it can be done in polynomial time. However, we don't know whether this problem is in P; I haven't told you a good way for finding such a path (with time polynomial in m, n, and k). And in fact this problem is NP-complete, so we believe that no such algorithm exists.
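To see concretely what membership in NP means, here is a hedged C sketch of the polynomial-time checker for this problem: given the graph as an adjacency matrix and a proposed path, it verifies that the path really is a simple path from s to t with at least k edges. The representation, fixed bound, and function name are assumptions of this example.

    #define MAXN 100                        /* assumed bound on the number of vertices */

    /* Return 1 if path[0..len] is a simple path from s to t with at least k edges
       in the graph given by adjacency matrix adj, else 0.  The check is clearly
       polynomial (in fact roughly linear) in the size of the input. */
    int check_long_path(int n, int adj[MAXN][MAXN], int s, int t, int k,
                        int path[], int len)
    {
        int seen[MAXN] = {0};
        if (len < k) return 0;                            /* not long enough */
        if (path[0] != s || path[len] != t) return 0;     /* wrong endpoints */
        for (int i = 0; i <= len; i++) {
            if (path[i] < 0 || path[i] >= n) return 0;    /* not a vertex */
            if (seen[path[i]]) return 0;                  /* repeated vertex: not simple */
            seen[path[i]] = 1;
        }
        for (int i = 0; i < len; i++)
            if (!adj[path[i]][path[i + 1]]) return 0;     /* missing edge */
        return 1;                                         /* a valid certificate */
    }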

There are algorithms that solve the problem; for instance, list all 2^m subsets of edges and check whether any of them solves the problem. But as far as we know there is no algorithm that runs in polynomial time. Example 2: Cryptography. Suppose we have an encryption function e.g.
code=RSA(key,text)

The "RSA" encryption works by performing some simple integer arithmetic on the code and the key, which consists of a pair (p,q) of large prime numbers. One can perform the encryption only knowing the product pq; but to decrypt the code you instead need to know a different product, (p-1)(q-1). A standard assumption in cryptography is the "known plaintext attack": we have the code for some message, and we know (or can guess) the text of that message. We want to use that information to discover the key, so we can decrypt other messages sent using the same key. Formalized as an NP problem, we simply want to find a key for which code=RSA(key,text). If you're given a key, you can test it by doing the encryption yourself, so this is in NP. The hard question is, how do you find the key? For the code to be strong we hope it isn't possible to do much better than a brute force search. Another common use of RSA involves "public key cryptography": a user of the system publishes the product pq, but doesn't publish p, q, or (p-1)(q-1). That way anyone can send a message to that user by using the RSA encryption, but only the user can decrypt it. Breaking this scheme can also be thought of as a different NP problem: given a composite number pq, find a factorization into smaller numbers. One can test a factorization quickly (just multiply the factors back together again), so the problem is in NP. Finding a factorization seems to be difficult, and we think it may not be in P. However there is some strong evidence that it is not NP-complete either; it seems to be one of the (very rare) examples of problems between P and NPcomplete in difficulty. Example 3: Chess.

Example 3: Chess. We've seen in the news recently a match between the world chess champion, Garry Kasparov, and a very fast chess computer, Deep Blue. The computer lost the match, but won one game and tied others. What is involved in chess programming? Essentially the sequences of possible moves form a tree: the first player has a choice of 20 different moves (most of which are not very good), after each of which the second player has a choice of many responses, and so on. Chess playing programs work by traversing this tree finding what the possible consequences would be of each different move.

The tree of moves is not very deep -- a typical chess game might last 40 moves, and it is rare for one to reach 200 moves. Since each move involves a step by each player, there are at most 400 positions involved in most games. If we traversed the tree of chess positions only to that depth, we would only need enough memory to store the 400 positions on a single path at a time. This much memory is easily available on the smallest computers you are likely to use. So perfect chess playing is a problem in PSPACE. (Actually one must be more careful in definitions. There is only a finite number of positions in chess, so in principle you could write down the solution in constant time. But that constant would be very large. Generalized versions of chess on larger boards are in PSPACE.)

The reason this deep game-tree search method can't be used in practice is that the tree of moves is very bushy, so that even though it is not deep it has an enormous number of vertices. We won't run out of space if we try to traverse it, but we will run out of time before we get even a small fraction of the way through. Some pruning methods, notably "alpha-beta search", can help reduce the portion of the tree that needs to be examined, but not enough to solve this difficulty. For this reason, actual chess programs instead only search a much smaller depth (such as up to 7 moves), at which point they don't have enough information to evaluate the true consequences of the moves and are forced to guess by using heuristic "evaluation functions" that measure simple quantities such as the total number of pieces left.

Example 4: Knots. If I give you a three-dimensional polygon (e.g. as a sequence of vertex coordinate triples), is there some way of twisting and bending the polygon around until it becomes flat? Or is it knotted? There is an algorithm for solving this problem, which is very complicated and has not really been adequately analyzed. However it runs in at least exponential time.

One way of proving that certain polygons are not knots is to find a collection of triangles forming a surface with the polygon as its boundary. However this is not always possible (without adding exponentially many new vertices) and even when possible it's NP-complete to find these triangles. There are also some heuristics based on finding a non-Euclidean geometry for the space outside of a knot that work very well for many knots, but are not known to work for all knots. So this is one of the rare examples of a problem that can often be solved efficiently in practice even though it is theoretically not known to be in P. Certain related problems in higher dimensions (is this four-dimensional surface equivalent to a four-dimensional sphere) are provably undecidable. Example 5: Halting problem. Suppose you're working on a lab for a programming class, have written your program, and start to run it. After five minutes, it is still going. Does this mean it's in an infinite loop, or is it just slow? It would be convenient if your compiler could tell you that your program has an infinite loop. However this is an undecidable problem: there is no program that will always correctly detect infinite loops. Some people have used this idea as evidence that people are inherently smarter than computers, since it shows that there are problems computers can't solve. However it's not clear to me that people can solve them either. Here's an example:
    #include <math.h>
    #include <stdlib.h>

    int main()
    {
        int x = 3;
        for (;;) {
            for (int a = 1; a <= x; a++)
                for (int b = 1; b <= x; b++)
                    for (int c = 1; c <= x; c++)
                        for (int i = 3; i <= x; i++)
                            /* pretend pow() is exact multiple-precision arithmetic */
                            if (pow(a,i) + pow(b,i) == pow(c,i))
                                exit(0);    /* halt if a counterexample is found */
            x++;
        }
    }

This program searches for solutions to Fermat's last theorem. Does it halt? (You can assume I'm using a multiple-precision integer package instead of built in integers, so don't worry about arithmetic overflow complications.) To be able to answer this, you have to understand the recent proof of Fermat's last theorem. There are many similar problems for which no proof is known, so we are clueless whether the corresponding programs halt.

Problems of complexity theory


The most famous open problem in theoretical computer science is whether P = NP. In other words, if it's always easy to check a solution, should it also be easy to find the solution? We have no reason to believe it should be true, so the expectation among most theoreticians is that it's false. But we also don't have a proof... So we have this nice construction of the complexity classes P and NP, but we can't even prove that there is one problem in NP that is not in P. So what good is the theory if it can't tell us how hard any particular problem is to solve?

NP-completeness
The theory of NP-completeness is a solution to the practical problem of applying complexity theory to individual problems. NP-complete problems are defined in a precise sense as the hardest problems in NP. Even though we don't know whether there is any problem in NP that is not in P, we can point to an NP-complete problem and say that if there are any hard problems in NP, that problem is one of the hard ones. (Conversely, if everything in NP is easy, those problems are easy. So NP-completeness can be thought of as a way of making the big P=NP question equivalent to smaller questions about the hardness of individual problems.) So if we believe that P and NP are unequal, and we prove that some problem is NP-complete, we should believe that it doesn't have a fast algorithm. For unknown reasons, most problems we've looked at in NP turn out either to be in P or NP-complete. So the theory of NP-completeness turns out to be a good way of showing that a problem is likely to be hard, because it applies to a lot of problems. But there are problems that are in NP, not known to be in P, and not likely to be NP-complete; for instance the code-breaking example I gave earlier.

Reduction
Formally, NP-completeness is defined in terms of "reduction" which is just a complicated way of saying one problem is easier than another.

We say that A is easier than B, and write A < B, if we can write down an algorithm for solving A that uses a small number of calls to a subroutine for B (with everything outside the subroutine calls being fast, polynomial time). There are several minor variations of this definition depending on the detailed meaning of "small" -- it may be a polynomial number of calls, a fixed constant number, or just one call. Then if A < B, and B is in P, so is A: we can write down a polynomial algorithm for A by expanding the subroutine calls to use the fast algorithm for B. So "easier" in this context means that if one problem can be solved in polynomial time, so can the other. It is possible for the algorithms for A to be slower than those for B, even though A < B. As an example, consider the Hamiltonian cycle problem. Does a given graph have a cycle visiting each vertex exactly once? Here's a solution, using longest path as a subroutine:
    for each edge (u,v) of G
        if there is a simple path of length n-1 from u to v
            return yes      // path + edge form a cycle
    return no

This algorithm makes m calls to a longest path subroutine, and does O(m) work outside those subroutine calls, so it shows that Hamiltonian cycle < longest path. (It doesn't show that Hamiltonian cycle is in P, because we don't know how to solve the longest path subproblems quickly.) As a second example, consider a polynomial time problem such as the minimum spanning tree. Then for every other problem B, B < minimum spanning tree, since there is a fast algorithm for minimum spanning trees using a subroutine for B. (We don't actually have to call the subroutine, or we can call it and ignore its results.)

Cook's Theorem
We are now ready to formally define NP-completeness. We say that a problem A in NP is NP-complete when, for every other problem B in NP, B < A. This seems like a very strong definition. After all, the notion of reduction we've defined above seems to imply that if B < A, then the two problems are very closely related; for instance Hamiltonian cycle and longest path are both about finding very similar structures in graphs. Why should there be a problem so closely related to all the different problems in NP? Theorem: an NP-complete problem exists.

We prove this by example. One NP-complete problem can be found by modifying the halting problem (which without modification is undecidable).

Bounded halting. This problem takes as input a program X and a number K. The problem is to find data which, when given as input to X, causes it to stop in at most K steps. To be precise, this needs some more careful definition: what language is X written in? What constitutes a single step? Also, for technical reasons, K should be specified in unary notation, so that the length of that part of the input is K itself rather than O(log K). For reasonable ways of filling in the details, this is in NP: to test if data is a correct solution, just simulate the program for K steps. This takes time polynomial in K and in the length of the program. (Here's one point at which we need to be careful: the program cannot perform unreasonable operations such as arithmetic on very large integers, because then we wouldn't be able to simulate it quickly enough.)

To finish the proof that this is NP-complete, we need to show that it's harder than anything else in NP. Suppose we have a problem A in NP. This means that we can write a program PA that tests solutions to A, and halts within polynomial time p(n) with a yes or no answer depending on whether the given solution is really a solution to the given problem. We can then easily form a modified program PA' that enters an infinite loop whenever it would halt with a no answer. If we could solve bounded halting, we could solve A by passing PA' and p(n) as arguments to a subroutine for bounded halting. So A < bounded halting. But this argument works for every problem in NP, so bounded halting is NP-complete.

How to prove NP-completeness in practice


The proof above of NP-completeness for bounded halting is great for the theory of NP-completeness, but doesn't help us understand other more abstract problems such as the Hamiltonian cycle problem. Most proofs of NP-completeness don't look like the one above; it would be too difficult to prove anything else that way. Instead, they are based on the observation that if A < B and B < C, then A < C. (Recall that these relations are defined in terms of the existence of an algorithm that calls subroutines. Given an algorithm that solves A with a subroutine for B, and an algorithm that solves B with a subroutine for C, we can just use the second algorithm to expand the subroutine calls of the first algorithm, and get an algorithm that solves A with a subroutine for C.)

As a consequence of this observation, if A is NP-complete, B is in NP, and A < B, then B is NP-complete. In practice that's how we prove NP-completeness: we start with one specific problem that we prove NP-complete, and we then prove that it's easier than lots of others, which must therefore also be NP-complete. So e.g. since Hamiltonian cycle is known to be NP-complete, and Hamiltonian cycle < longest path, we can deduce that longest path is also NP-complete.

Starting from the bounded halting problem we can show that it's reducible to a problem of simulating circuits (we know that computers can be built out of circuits, so any problem involving simulating computers can be translated to one about simulating circuits). So various circuit simulation problems are NP-complete, in particular Satisfiability, which asks whether there is an input to a Boolean circuit that causes its output to be one. Circuits look a lot like graphs, so from there it's another easy step to proving that many graph problems are NP-complete. Most of these proofs rely on constructing gadgets, small subgraphs that act (in the context of the graph problem under consideration) like Boolean gates and other components of circuits.

There are many problems already known to be NP-complete, and listed in the bible of the subject: Computers and Intractability: A Guide to the Theory of NP-Completeness, Michael R. Garey and David S. Johnson, W. H. Freeman, 1979. If you suspect a problem you're looking at is NP-complete, the first step is to look for it in Garey and Johnson. The second step is to find as similar a problem as you can in Garey and Johnson, and prove a reduction showing that similar problem to be easier than the one you want to solve. If neither of these works, you could always go back to the methods described in the rest of this class, and try to find an efficient algorithm...

Shortest Paths
The basic problem: Find the "best" way of getting from s to t where s and t are vertices in a graph. We measure "best" simply as the sum of edge lengths of a path. For instance the graph could be a map representing intersections as vertices, road segments as edges; you want to find either the shortest or fastest route from your house to ICS. Although both of these problems have different solutions, they are both shortest path problems; in one the length of an edge represents the actual mileage of a segment of road, while in the other it represents the time it would take to drive it, but in both cases the important fact is that the total length of a path is measured by adding the lengths of individual edges. For another example, which I mentioned in my first lecture on graph algorithms, the graph might have vertices representing airports, with edges representing possible flights, and the "length" of an edge measuring the cost of taking that flight; your problem would then be to find the cheapest flight from e.g. SNA to JFK. Note that these graphs may be directed; e.g. there may be a one-way road, or flights in one direction might have different costs than those the other way.

We are going to make a big assumption: that all the edges have lengths that are positive numbers. This is often but not always the case; it makes sense in the examples above, but it is conceivable that an airline could pay people to take certain routes, so that the lengths of those edges in the airport graph might be negative. We'll pretend this never happens. It makes the algorithms a lot easier. Later we'll see some special cases where we can handle negative weights.

Rather than computing one distance d(s,t), we'll compute d(s,x) for all vertices x. This is known as the single source shortest path problem (s is the source). It turns out that computing this extra information makes things easier, because then we can put together information about paths with fewer edges to get paths with more edges.

Paths from distances


Suppose we already know the distances d(s,x) from s to every other vertex. This isn't a solution to the shortest path problem, because we want to know actual paths having those distances. How can we find those paths? Note that there are two kinds of shortest paths: those formed by a single edge (s,t), and those in which the path from s to t goes through some other vertices; let's say x is the last vertex the path goes through before t. Then in the second case, the overall path must be formed by concatenating a path from s to x with edge (x,t). (We can view both types of shortest path as being similar if we think of the shortest path from s to s as being one with no edges in it.) Further, the path from s to x must itself be a shortest path (since otherwise concatenating the shortest path with (x,t) would decrease the length of the overall path). A final observation is that d(s,x) must be less than d(s,t), since d(s,t)=d(s,x)+length(x,t) and we are assuming all edges have positive length. Therefore if we only know the correct value of x we can find a shortest path:

Algorithm 1:
    for each vertex y in sorted order by d(s,y)
        let (x,y) be an edge with d(s,x) + length(x,y) = d(s,y)
        path(s,y) = path(s,x) + edge (x,y)

We will want to use something like this idea to compute shortest paths without already knowing their lengths. When we get to y in the loop, it will still be ok to use terms like d(s,x) as long as d(s,x) is less than d(s,y), because we will have already processed x in a previous iteration. But the pseudo-code above uses d(s,y) itself in two places, and those uses will not work, since we don't yet know d(s,y) when we process y. To get rid of the second use of d(s,y), in which we test it to determine which edge to use, we can notice that (because we are computing a shortest path) d(s,x)+length(x,y) will be smallest for the correct edge (x,y), so instead of testing it for equality with d(s,y) we can just find a minimum: Algorithm 2:
    for each vertex y in sorted order by d(s,y)
        let (x,y) be an edge with x already processed, minimizing d(s,x) + length(x,y)
        path(s,y) = path(s,x) + edge (x,y)
        d(s,y) = d(s,x) + length(x,y)

Dijkstra's algorithm
The only remaining use of d(s,y) in this algorithm is to determine what order to process the vertices in. Dijkstra's algorithm for shortest paths does this almost exactly like Prim's algorithm. Remember that in Prim's algorithm, we add vertices and edges one at a time to a tree, at each step choosing the shortest possible edge to add. Dijkstra's algorithm does the same thing, only choosing the edge to add at each step to be the one minimizing d(s,x)+length(x,y). Algorithm 3: (Dijkstra, basic outline)
    let T be a single vertex s
    while (T has fewer than n vertices)
    {
        find edge (x,y) with x in T and y not in T minimizing d(s,x) + length(x,y)
        add (x,y) to T
        d(s,y) = d(s,x) + length(x,y)
    }

The actual shortest paths can be found by following the path in T from s to t. This defines a structure known as a "shortest path tree". In practice it may sometimes be faster to build two trees, one from s and one from t, and stop when they run into each other (this usually ends up visiting less of the graph).

Just like with Prim's algorithm, we can use heaps to perform the hard part of each iteration (finding the best edge) in logarithmic time. Algorithm 4: (Dijkstra with heaps)
    make a heap of values (vertex,edge,distance)
        initially (v,-,infinity) for each vertex
    let tree T be empty
    while (T has fewer than n vertices)
    {
        let (v,e,d(v)) have the smallest distance in the heap
        remove (v,e,d(v)) from the heap
        add v and e to T
        set distance(s,v) to d(v)
        for each edge f=(v,u)
            if u is not already in T
                find value (u,g,d(u)) in heap
                if d(v) + length(f) < d(u)
                    replace (u,g,d(u)) with (u,f,d(v)+length(f))
    }

Just as in Prim's algorithm, this runs in time O(m log n) if you use binary heaps, or O(m + n log n) if you use Fibonacci heaps.
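For completeness, here is a C sketch of Dijkstra's algorithm in its simplest O(n^2) form, scanning an array for the next closest vertex instead of using a heap; the heap versions above are faster on sparse graphs, but the array version is short and illustrates algorithm 3 directly. The adjacency-matrix representation, fixed bound, and names are assumptions of this example.

    #include <float.h>

    #define MAXN 100                        /* assumed bound on the number of vertices */

    /* length[u][v] = length of edge (u,v), or DBL_MAX if absent; all lengths >= 0.
       Fills dist[] with d(s,v) and pred[] with the shortest path tree. */
    void dijkstra(int n, double length[MAXN][MAXN], int s,
                  double dist[MAXN], int pred[MAXN])
    {
        int done[MAXN];
        for (int v = 0; v < n; v++) { dist[v] = DBL_MAX; done[v] = 0; pred[v] = -1; }
        dist[s] = 0.0;

        for (int step = 0; step < n; step++) {
            int v = -1;                     /* next vertex y in order of d(s,y) */
            for (int u = 0; u < n; u++)
                if (!done[u] && (v == -1 || dist[u] < dist[v])) v = u;
            if (dist[v] == DBL_MAX) break;  /* remaining vertices are unreachable */
            done[v] = 1;
            for (int u = 0; u < n; u++)     /* relax edges out of v */
                if (!done[u] && length[v][u] != DBL_MAX &&
                    dist[v] + length[v][u] < dist[u]) {
                    dist[u] = dist[v] + length[v][u];
                    pred[u] = v;
                }
        }
    }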

Dijkstra and negative lengths


Dijkstra's algorithm does not work with negative edge weights. For instance, consider the following graph (assume the edges are all directed from left to right):
      2
    A-----B
     \   /
    3 \ / -2
       C

If we start with A, Dijkstra's algorithm will choose the edge (A,x) minimizing d(A,A)+length(edge), namely (A,B). It then sets d(A,B)=2 and chooses another edge (y,C) minimizing d(A,y)+length(y,C); the only choice is (A,C) and it sets d(A,C)=3. But it never finds the shortest path from A to B, via C, with total length 1.

Topological ordering and shortest paths


There is an important class of graphs in which shortest paths can be computed more quickly, in linear time. The idea is to go back to algorithms 1 and 2, which required you to visit the vertices in some order. In those algorithms we defined the order to be sorted by distance from s, which as we have seen works for positive weight edges, but not if there are negative weights. Here's another ordering that always works: define a topological ordering of a directed graph to be one in which, whenever we have an edge from x to y, the ordering visits x before y. If we can define such an ordering, then we can do something like algorithm 2, and be sure that the predecessor of a vertex x is always processed before we process x itself.

Algorithm 5: (shortest paths from topological order)
    for each vertex y in a topological ordering of G
        choose edge (x,y) minimizing d(s,x) + length(x,y)
        path(s,y) = path(s,x) + edge (x,y)
        d(s,y) = d(s,x) + length(x,y)

This runs in linear time (with the possible exception of finding the ordering), and works even when the graph has negative length edges. You can even use it to find longest paths: just negate the lengths of all the edges. The only catch is that it only works when we can find a topological ordering.
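Here is a possible C sketch of algorithm 5, assuming the topological ordering has already been computed (for example by the algorithm in the next section) and that the graph is given as an adjacency matrix; negative edge lengths are fine here. The names and fixed bound are assumptions of this example.

    #include <float.h>

    #define MAXN 100                        /* assumed bound on the number of vertices */

    /* order[] lists the vertices of a DAG in topological order; length[u][v] is the
       edge length (negative values allowed) or DBL_MAX if there is no edge.
       Computes d(s,v) for every vertex reachable from s. */
    void dag_shortest_paths(int n, const int order[], double length[MAXN][MAXN],
                            int s, double dist[MAXN], int pred[MAXN])
    {
        for (int v = 0; v < n; v++) { dist[v] = DBL_MAX; pred[v] = -1; }
        dist[s] = 0.0;

        for (int i = 0; i < n; i++) {
            int y = order[i];
            for (int x = 0; x < n; x++)     /* choose edge (x,y) minimizing d(s,x)+length(x,y) */
                if (length[x][y] != DBL_MAX && dist[x] != DBL_MAX &&
                    dist[x] + length[x][y] < dist[y]) {
                    dist[y] = dist[x] + length[x][y];
                    pred[y] = x;
                }
        }
    }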

Topological ordering and acyclic graphs


Define a directed acyclic graph (often known as a DAG for short) to be a directed graph, containing no cycle (a cycle is a set of edges forming a loop, and all pointing the same way around the loop). Theorem: a graph has a topological ordering if and only if it is a directed acyclic graph. One direction of the proof is simple: suppose G is not a DAG, so it has a cycle. In any ordering of G, one vertex of the cycle has to come first, but then one of the two cycle edges at that vertex would point the wrong way for the ordering to be topological. In the other direction, we have to prove that every graph without a topological ordering contains a cycle. We'll prove this by finding an algorithm for constructing topological orderings; if the algorithm ever gets stuck we'll be able to use that information to find a cycle. Algorithm 6: (topological ordering)
    list L = empty
    while (G is not empty)
        find a vertex v with no incoming edges
        delete v from G
        add v to L

If this algorithm terminates, L is a topological ordering, since we only add a vertex v when all its incoming edges have been deleted, at which point we know its predecessors are already all in the list. What if it doesn't terminate? The only thing that could go wrong is that we could be unable to find a vertex with no incoming edges. In this case all vertices have some incoming edge. We want to prove that in this case, G has a cycle. Start with any vertex s, follow its incoming edge backwards to another vertex t, follow its incoming edge backwards again, and so on, building a chain of vertices ...w->v->u->t->s. We can keep stepping backwards like this forever, but there's only a finite number of vertices in the graph. Therefore, we'll eventually run into a vertex we've seen before: u->w->v->u->t->s. In this case, u->w->v->u is a directed cycle.

This procedure always finds a directed cycle whenever algorithm 6 gets stuck, completing the proof of the theorem that a graph has a topological ordering if and only if it is a DAG. Incidentally this also proves that algorithm 6 finds a topological ordering whenever one exists, and that we can use algorithm 6 to test whether a graph is a DAG. Putting algorithm 6 together with the "stepping backwards" procedure provides a fast method of finding cycles in graphs that are not DAGs.

Finally, let's analyze the topological ordering algorithm. The key step (finding a vertex without incoming edges) seems to require scanning the whole graph, but we can speed it up with some really simple data structures: a count I[v] of the number of edges incoming to v, and a list K of vertices without incoming edges.

Algorithm 7: (topological ordering, detailed implementation)
    list K = empty
    list L = empty
    for each vertex v in G
        let I[v] = number of incoming edges to v
        if (I[v] = 0) add v to K
    while (G is not empty)
        remove a vertex v from K
        for each outgoing edge (v,w)
            decrement I[w]
            if (I[w] = 0) add w to K
        add v to L

It is not hard to see that this algorithm runs in linear time, so combining it with algorithm 5 we see that we can find shortest paths in DAGs in linear time.
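Finally, here is a C sketch of algorithm 7. It uses an adjacency matrix for brevity, which makes the initialization O(n^2); with adjacency lists the same logic runs in linear time as claimed. The names and fixed bound are assumptions of this example.

    #define MAXN 100                        /* assumed bound on the number of vertices */

    /* adj[v][w] != 0 iff there is an edge from v to w.  Fills L[] with a
       topological ordering and returns the number of vertices output; a return
       value smaller than n means the graph had a cycle (algorithm 6 "got stuck"). */
    int topological_order(int n, int adj[MAXN][MAXN], int L[MAXN])
    {
        int I[MAXN];                        /* I[v] = number of incoming edges to v */
        int K[MAXN], khead = 0, ktail = 0;  /* list K of vertices with no incoming edges */
        int count = 0;

        for (int v = 0; v < n; v++) I[v] = 0;
        for (int v = 0; v < n; v++)
            for (int w = 0; w < n; w++)
                if (adj[v][w]) I[w]++;
        for (int v = 0; v < n; v++)
            if (I[v] == 0) K[ktail++] = v;

        while (khead < ktail) {
            int v = K[khead++];             /* remove a vertex v from K */
            L[count++] = v;                 /* add v to L */
            for (int w = 0; w < n; w++)
                if (adj[v][w] && --I[w] == 0)
                    K[ktail++] = w;         /* w has lost its last incoming edge */
        }
        return count;
    }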
