
A compilation of the most commonly used data structures and algorithms, collected in the hope that it will be helpful in different ways.

Codechef Discussion
Data Structures and Algorithms

Binary Search
Topcoder.com

By lovro
topcoder member

Binary search is one of the fundamental algorithms in computer science. In order to explore it,
we'll first build up a theoretical backbone, then use that to implement the algorithm properly and
avoid those nasty off-by-one errors everyone's been talking about.

Finding a value in a sorted sequence

In its simplest form, binary search is used to quickly find a value in a sorted sequence (consider
a sequence an ordinary array for now). We'll call the sought value the target value for clarity.
Binary search maintains a contiguous subsequence of the starting sequence where the target
value is surely located. This is called the search space. The search space is initially the entire
sequence. At each step, the algorithm compares the median value in the search space to the
target value. Based on the comparison and because the sequence is sorted, it can then
eliminate half of the search space. By doing this repeatedly, it will eventually be left with a
search space consisting of a single element, the target value.

For example, consider the following sequence of integers sorted in ascending order and say we
are looking for the number 55:

0 5 13 19 22 41 55 68 72 81 98

We are interested in the location of the target value in the sequence so we will represent the
search space as indices into the sequence. Initially, the search space contains indices 1 through
11. Since the search space is really an interval, it suffices to store just two numbers, the low and
high indices. As described above, we now choose the median value, which is the value at index
6 (the midpoint between 1 and 11): this value is 41 and it is smaller than the target value. From
this we conclude not only that the element at index 6 is not the target value, but also that no
element at indices between 1 and 5 can be the target value, because all elements at these
indices are smaller than 41, which is smaller than the target value. This brings the search space
down to indices 7 through 11:

55 68 72 81 98

Proceeding in a similar fashion, we chop off the second half of the search space and are left
with:

55 68

Depending on how we choose the median of an even number of elements we will either find 55
in the next step or chop off 68 to get a search space of only one element. Either way, we
conclude that the index where the target value is located is 7.

If the target value was not present in the sequence, binary search would empty the search
space entirely. This condition is easy to check and handle. Here is some code to go with the
description:

binary_search(A, target):
   lo = 1, hi = size(A)
   while lo <= hi:
      mid = lo + (hi-lo)/2
      if A[mid] == target:
         return mid
      else if A[mid] < target:
         lo = mid+1
      else:
         hi = mid-1

   // target was not found
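The pseudocode above translates almost directly into C++. Here is a non-authoritative sketch using 0-based indexing (the worked example in the text uses 1-based indices, so the indices it finds are one lower), returning -1 when the target is absent:

```cpp
#include <vector>

// A sketch of the array binary search above, 0-based. Returns the index
// of target in the sorted vector A, or -1 when target is not present.
int binarySearch(const std::vector<int>& A, int target) {
    int lo = 0, hi = (int)A.size() - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow in lo + hi
        if (A[mid] == target)
            return mid;
        else if (A[mid] < target)
            lo = mid + 1;               // target is in the upper half
        else
            hi = mid - 1;               // target is in the lower half
    }
    return -1;  // target was not found
}
```

On the sample sequence from the text, this finds 55 at 0-based index 6 (the text's index 7).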

Complexity
Since each comparison halves the search space, we can assert and easily prove that binary search will never use more than (in big-oh notation) O(log N) comparisons to find the target value.

The logarithm is an awfully slowly growing function. In case you're not aware of just how
efficient binary search is, consider looking up a name in a phone book containing a million
names. Binary search lets you systematically find any given name using at most 21
comparisons. If you could manage a list containing all the people in the world sorted by name,
you could find any person in less than 35 steps. This may not seem feasible or useful at the
moment, but we'll soon fix that.

Note that this assumes that we have random access to the sequence. Trying to use binary
search on a container such as a linked list makes little sense and it is better to use a plain linear
search instead.

Binary search in standard libraries


C++'s Standard Template Library implements binary search in algorithms lower_bound,
upper_bound, binary_search and equal_range, depending exactly on what you need to do. Java
has a built-in Arrays.binarySearch method for arrays and the .NET Framework has
Array.BinarySearch.

You're best off using library functions whenever possible, since, as you'll see, implementing
binary search on your own can be tricky.
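For reference, here is a hedged sketch of those STL calls applied to the article's sample sequence (0-based indices); the helper names are mine:

```cpp
#include <algorithm>
#include <vector>

// Thin wrappers over the STL binary search family on a sorted vector.
// lower_bound: iterator to the first element >= target.
// upper_bound: iterator to the first element >  target.
// binary_search: true iff target occurs in the range.
int firstAtLeast(const std::vector<int>& v, int target) {
    return (int)(std::lower_bound(v.begin(), v.end(), target) - v.begin());
}

int firstGreater(const std::vector<int>& v, int target) {
    return (int)(std::upper_bound(v.begin(), v.end(), target) - v.begin());
}

bool contains(const std::vector<int>& v, int target) {
    return std::binary_search(v.begin(), v.end(), target);
}
```

std::equal_range returns the pair (lower_bound, upper_bound) in a single call.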

Beyond arrays: the discrete binary search


This is where we start to abstract binary search. A sequence (array) is really just a function
which associates integers (indices) with the corresponding values. However, there is no reason
to restrict our usage of binary search to tangible sequences. In fact, we can use the same

algorithm described above on any monotonic function f whose domain is the set of integers. The
only difference is that we replace an array lookup with a function evaluation: we are now looking
for some x such that f(x) is equal to the target value. The search space is now more formally a
subinterval of the domain of the function, while the target value is an element of the codomain.
The power of binary search begins to show now: not only do we need at most O(log N)
comparisons to find the target value, but we also do not need to evaluate the function more than
that many times. Additionally, in this case we aren't restricted by practical quantities such as
available memory, as was the case with arrays.
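As a small illustration of binary searching a monotonic function instead of an array, the sketch below finds the integer square root, i.e. the largest x with x*x <= n; the "function evaluation" is a single division, written that way so the product never overflows. The function name and the rounding-up median are my choices.

```cpp
// Binary search over a monotonic function f(x) = x*x rather than an
// array: find the largest x with x*x <= n (the integer square root).
// The predicate is evaluated as a division so mid*mid never overflows.
long long isqrt(long long n) {
    long long lo = 0, hi = n;
    while (lo < hi) {
        long long mid = lo + (hi - lo + 1) / 2;  // round up so lo = mid makes progress
        if (mid <= n / mid)     // equivalent to mid*mid <= n for mid >= 1
            lo = mid;           // mid still satisfies the predicate, keep it
        else
            hi = mid - 1;       // mid is too large, discard it
    }
    return lo;
}
```

Only O(log n) evaluations of the function are needed, even though the "sequence" of squares is never materialized.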

Taking it further: the main theorem


When you encounter a problem which you think could be solved by applying binary search, you
need some way of proving it will work. I will now present another level of abstraction which will
allow us to solve more problems, make proving binary search solutions very easy and also help
implement them. This part is a tad formal, but don't get discouraged, it's not that bad.

Consider a predicate p defined over some ordered set S (the search space). The search space
consists of candidate solutions to the problem. In this article, a predicate is a function which
returns a boolean value, true or false (we'll also use yes and no as boolean values). We use the
predicate to verify if a candidate solution is legal (does not violate some constraint) according to
the definition of the problem.

What we can call the main theorem states that binary search can be used if and only if for all x in S, p(x) implies p(y) for all y > x. This property is what we use when we discard the second half of the search space. It is equivalent to saying that ¬p(x) implies ¬p(y) for all y < x (the ¬ symbol denotes the logical not operator), which is what we use when we discard the first half of the search space. The theorem can easily be proven, although I'll omit the proof here to reduce clutter.

Behind the cryptic mathematics I am really stating that if you had a yes or no question (the
predicate), getting a yes answer for some potential solution x means that you'd also get
a yes answer for any element after x. Similarly, if you got a no answer, you'd get a no answer for
any element before x. As a consequence, if you were to ask the question for each element in
the search space (in order), you would get a series of no answers followed by a series
of yes answers.

Careful readers may note that binary search can also be used when a predicate yields a series
of yes answers followed by a series of no answers. This is true and complementing that
predicate will satisfy the original condition. For simplicity we'll deal only with predicates
described in the theorem.

If the condition in the main theorem is satisfied, we can use binary search to find the smallest
legal solution, i.e. the smallest x for which p(x) is true. The first part of devising a solution based
on binary search is designing a predicate which can be evaluated and for which it makes sense
to use binary search: we need to choose what the algorithm should find. We can have it find
either the first x for which p(x) is true or the last x for which p(x) is false. The difference between
the two is only slight, as you will see, but it is necessary to settle on one. For starters, let us
seek the first yes answer (first option).

The second part is proving that binary search can be applied to the predicate. This is where we
use the main theorem, verifying that the conditions laid out in the theorem are satisfied. The

proof doesn't need to be overly mathematical, you just need to convince yourself that p(x) implies p(y) for all y > x or that ¬p(x) implies ¬p(y) for all y < x. This can often be done by applying common sense in a sentence or two.

When the domain of the predicate is the integers, it suffices to prove that p(x) implies p(x+1) or that ¬p(x) implies ¬p(x-1); the rest then follows by induction.

These two parts are most often interleaved: when we think a problem can be solved by binary
search, we aim to design the predicate so that it satisfies the condition in the main theorem.

One might wonder why we choose to use this abstraction rather than the simpler-looking
algorithm we've used so far. This is because many problems can't be modeled as searching for
a particular value, but it's possible to define and evaluate a predicate such as "Is there an
assignment which costs x or less?", when we're looking for some sort of assignment with the
lowest cost. For example, the usual traveling salesman problem (TSP) looks for the cheapest
round-trip which visits every city exactly once. Here, the target value is not defined as such, but
we can define a predicate "Is there a round-trip which costs x or less?" and then apply binary
search to find the smallest x which satisfies the predicate. This is called reducing the original
problem to a decision (yes/no) problem. Unfortunately, we know of no way of efficiently
evaluating this particular predicate and so the TSP problem isn't easily solved by binary search,
but many optimization problems are.

Let us now convert the simple binary search on sorted arrays described in the introduction to
this abstract definition. First, let's rephrase the problem as: "Given an array A and a target
value, return the index of the first element in A equal to or greater than the target
value." Incidentally, this is more or less how lower_bound behaves in C++.

We want to find the index of the target value, thus any index into the array is a candidate
solution. The search space S is the set of all candidate solutions, thus an interval containing all
indices. Consider the predicate "Is A[x] greater than or equal to the target value?". If we were to
find the first x for which the predicate says yes, we'd get exactly what we decided we were looking for in the previous paragraph.

The condition in the main theorem is satisfied because the array is sorted in ascending order: if
A[x] is greater than or equal to the target value, all elements after it are surely also greater than
or equal to the target value.

If we take the sample sequence from before:

0 5 13 19 22 41 55 68 72 81 98

With the search space (indices):

1 2 3 4 5 6 7 8 9 10 11

And apply our predicate (with a target value of 55) to it we get:

no no no no no no yes yes yes yes yes

This is a series of no answers followed by a series of yes answers, as we were expecting.


Notice how index 7 (where the target value is located) is the first for which the predicate
yields yes, so this is what our binary search will find.

Implementing the discrete algorithm


One important thing to remember before beginning to code is to settle on what the two numbers
you maintain (lower and upper bound) mean. A likely answer is a closed interval which surely
contains the first x for which p(x) is true. All of your code should then be directed at maintaining
this invariant: it tells you how to properly move the bounds, which is where a bug can easily find
its way in your code, if you're not careful.

Another thing you need to be careful with is how high to set the bounds. By "high" I really mean
"wide" since there are two bounds to worry about. Every so often it happens that a coder
concludes during coding that the bounds he or she set are wide enough, only to find a
counterexample during intermission (when it's too late). Unfortunately, little helpful advice can
be given here other than to always double- and triple-check your bounds! Also, since execution
time increases logarithmically with the bounds, you can always set them higher, as long as it
doesn't break the evaluation of the predicate. Keep your eye out for overflow errors all around,
especially in calculating the median.

Now we finally get to the code which implements binary search as described in this and the
previous section:

binary_search(lo, hi, p):
   while lo < hi:
      mid = lo + (hi-lo)/2
      if p(mid) == true:
         hi = mid
      else:
         lo = mid+1

   if p(lo) == false:
      complain // p(x) is false for all x in S!

   return lo // lo is the least x for which p(x) is true

The two crucial lines are hi = mid and lo = mid+1. When p(mid) is true, we can discard the
second half of the search space, since the predicate is true for all elements in it (by the main
theorem). However, we can not discard mid itself, since it may well be the first element for
which p is true. This is why moving the upper bound to mid is as aggressive as we can do
without introducing bugs.

In a similar vein, if p(mid) is false, we can discard the first half of the search space, but this time
including mid. p(mid) is false so we don't need it in our search space. This effectively means we
can move the lower bound to mid+1.
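Putting the pieces together, the discrete algorithm can be sketched in C++ as follows; this is a hedged translation of the pseudocode, with the "complain" case signalled by returning one past the original upper bound (the name first_true is mine):

```cpp
#include <functional>

// Find the least x in [lo, hi] for which the monotonic predicate p is
// true. p must satisfy the main theorem: p(x) implies p(y) for y > x.
// Returns the original hi + 1 when p is false on the entire interval.
int first_true(int lo, int hi, const std::function<bool(int)>& p) {
    int orig_hi = hi;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (p(mid))
            hi = mid;        // mid may be the first true value, keep it
        else
            lo = mid + 1;    // mid is false, discard it
    }
    return p(lo) ? lo : orig_hi + 1;  // p is false for all x in S
}
```

The lower_bound-style search from the previous section is then first_true(0, n - 1, [&](int i) { return A[i] >= target; }).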

If we wanted to find the last x for which p(x) is false, we would devise (using a similar rationale
as above) something like:

// warning: there is a nasty bug in this snippet!


binary_search(lo, hi, p):
   while lo < hi:
      mid = lo + (hi-lo)/2 // note: division truncates
      if p(mid) == true:
         hi = mid-1
      else:
         lo = mid

   if p(lo) == true:
      complain // p(x) is true for all x in S!

   return lo // lo is the greatest x for which p(x) is false

You can verify that this satisfies our condition that the element we're looking for always be
present in the interval [lo, hi]. However, there is another problem. Consider what happens when
you run this code on some search space for which the predicate gives:

no yes

The code will get stuck in a loop. It will always select the first element as mid, but then will not
move the lower bound because it wants to keep the no in its search space. The solution is to
change mid = lo + (hi-lo)/2 to mid = lo + (hi-lo+1)/2, i.e. so that it rounds up instead of down.
There are other ways of getting around the problem, but this one is possibly the cleanest. Just
remember to always test your code on a two-element set where the predicate is false for the
first element and true for the second.
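For completeness, here is the corrected "last no" variant sketched in C++, with the rounding fix applied (the name last_false is mine; it returns one below the original lower bound when p is true everywhere):

```cpp
#include <functional>

// Find the greatest x in [lo, hi] for which the monotonic predicate p
// is false. The median rounds up -- (hi - lo + 1) / 2 -- which avoids
// the two-element infinite loop described in the text. Returns the
// original lo - 1 when p is true on the entire interval.
int last_false(int lo, int hi, const std::function<bool(int)>& p) {
    int orig_lo = lo;
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;  // round up instead of down
        if (p(mid))
            hi = mid - 1;   // mid is true, discard it
        else
            lo = mid;       // mid may be the last false value, keep it
    }
    return p(lo) ? orig_lo - 1 : lo;
}
```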

You may also wonder as to why mid is calculated using mid = lo + (hi-lo)/2 instead of the usual
mid = (lo+hi)/2. This is to avoid another potential rounding bug: in the first case, we want the
division to always round down, towards the lower bound. But division truncates, so when lo+hi
would be negative, it would start rounding towards the higher bound. Coding the calculation this
way ensures that the number divided is always positive and hence always rounds as we want it
to. Although the bug doesn't surface when the search space consists only of positive integers or
real numbers, I've decided to code it this way throughout the article for consistency.

Real numbers
Binary search can also be used on monotonic functions whose domain is the set of real
numbers. Implementing binary search on reals is usually easier than on integers, because you
don't need to watch out for how to move bounds:

binary_search(lo, hi, p):
   while we choose not to terminate:
      mid = lo + (hi-lo)/2
      if p(mid) == true:
         hi = mid
      else:
         lo = mid

   return lo // lo is close to the border between no and yes

Since the set of real numbers is dense, it should be clear that we usually won't be able to find
the exact target value. However, we can quickly find some x such that f(x) is within some
tolerance of the border between no and yes. We have two ways of deciding when to terminate:
terminate when the search space gets smaller than some predetermined bound (say 10^-12) or do a fixed number of iterations. On topcoder, your best bet is to just use a few hundred iterations; this will give you the best possible precision without too much thinking. 100 iterations will reduce the search space to approximately 10^-30 of its initial size, which should be enough for most (if not all) problems.
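As a concrete example of the fixed-iteration approach, the sketch below bisects for the square root of c, using the predicate "mid*mid >= c" as the no/yes border (the function name is mine):

```cpp
// Binary search on the reals with a fixed iteration count: approximate
// sqrt(c) as the border where the predicate x*x >= c turns from no to
// yes. 100 iterations shrink the interval by a factor of 2^100, far
// below double precision, so no tolerance bookkeeping is needed.
double sqrtByBisection(double c) {
    double lo = 0.0, hi = (c < 1.0) ? 1.0 : c;  // guarantees hi*hi >= c
    for (int iter = 0; iter < 100; ++iter) {
        double mid = lo + (hi - lo) / 2;
        if (mid * mid >= c)
            hi = mid;   // mid is on the "yes" side of the border
        else
            lo = mid;   // mid is on the "no" side of the border
    }
    return lo;  // lo is close to the border between no and yes
}
```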

If you need to do as few iterations as possible, you can terminate when the interval gets small,
but try to do a relative comparison of the bounds, not just an absolute one. The reason for this is
that doubles can never give you more than 15 decimal digits of precision so if the search space
contains large numbers (say on the order of billions), you can never get an absolute difference
of less than 10^-7.

Example
At this point I will show how all this talk can be used to solve a topcoder problem. For this I have
chosen a moderately difficult problem, FairWorkload, which was the division 1 level 2 problem in
SRM 169.

In the problem, a number of workers need to examine a number of filing cabinets. The cabinets
are not all of the same size and we are told for each cabinet how many folders it contains. We
are asked to find an assignment such that each worker gets a sequential series of cabinets to
go through and that it minimizes the maximum amount of folders that a worker would have to
look through.

After getting familiar with the problem, a touch of creativity is required. Imagine that we have an
unlimited number of workers at our disposal. The crucial observation is that, for some
number MAX, we can calculate the minimum number of workers needed so that each worker
has to examine no more than MAX folders (if this is possible). Let's see how we'd do that. Some
worker needs to examine the first cabinet so we assign any worker to it. But, since the cabinets
must be assigned in sequential order (a worker cannot examine cabinets 1 and 3 without
examining 2 as well), it's always optimal to assign him to the second cabinet as well, if this does
not take him over the limit we introduced (MAX). If it would take him over the limit, we conclude
that his work is done and assign a new worker to the second cabinet. We proceed in a similar
manner until all the cabinets have been assigned and assert that we've used the minimum
number of workers possible, with the artificial limit we introduced. Note here that the number of
workers is inversely proportional to MAX: the higher we set our limit, the fewer workers we will
need.

Now, if you go back and carefully examine what we're asked for in the problem statement, you
can see that we are really asked for the smallest MAX such that the number of workers required
is less than or equal to the number of workers available. With that in mind, we're almost done,
we just need to connect the dots and see how all of this fits in the frame we've laid out for
solving problems using binary search.

With the problem rephrased to fit our needs better, we can now examine the predicate "Can the workload be spread so that each worker has to examine no more than x folders, with the limited number of workers available?" We can use the described greedy algorithm to efficiently evaluate this predicate for any x. This concludes the first part of building a binary search solution; we now just have to prove that the condition in the main theorem is satisfied. But observe that increasing x relaxes the limit on the maximum workload, so we will need at most the same number of workers, never more. Thus, if the predicate says yes for some x, it will also say yes for all larger x.

To wrap it up, here's an STL-driven snippet which solves the problem:

int getMostWork( vector<int> folders, int workers ) {
    int n = folders.size();
    int lo = *max_element( folders.begin(), folders.end() );
    int hi = accumulate( folders.begin(), folders.end(), 0 );

    while ( lo < hi ) {
        int x = lo + (hi-lo)/2;

        int required = 1, current_load = 0;
        for ( int i=0; i<n; ++i ) {
            if ( current_load + folders[i] <= x ) {
                // the current worker can handle it
                current_load += folders[i];
            }
            else {
                // assign next worker
                ++required;
                current_load = folders[i];
            }
        }

        if ( required <= workers )
            hi = x;
        else
            lo = x+1;
    }

    return lo;
}

Note the carefully chosen lower and upper bounds: you could replace the upper bound with any sufficiently large integer, but the lower bound must not be less than the largest cabinet, to avoid the situation where a single cabinet would be too large for any worker, a case which would not be correctly handled by the predicate. An alternative would be to set the lower bound to zero, then handle too-small x's as a special case in the predicate.

To verify that the solution doesn't lock up, I used a small no/yes example with folders={1,1} and
workers=1.

The overall complexity of the solution is O(n log SIZE), where SIZE is the size of the search
space. This is very fast.

As you see, we used a greedy algorithm to evaluate the predicate. In other problems, evaluating
the predicate can come down to anything from a simple math expression to finding a maximum
cardinality matching in a bipartite graph.

Conclusion
If you've gotten this far without giving up, you should be ready to solve anything that can be
solved with binary search. Try to keep a few things in mind:

- Design a predicate which can be efficiently evaluated and to which binary search can be applied
- Decide on what you're looking for and code so that the search space always contains that (if it exists)
- If the search space consists only of integers, test your algorithm on a two-element set to be sure it doesn't lock up
- Verify that the lower and upper bounds are not overly constrained: it's usually better to relax them as long as it doesn't break the predicate

Quick Sort
geeksquiz.com

QuickSort is a Divide and Conquer algorithm. It picks an element as pivot and partitions the
given array around the picked pivot. There are many different versions of quickSort that pick
pivot in different ways.
1) Always pick first element as pivot.
2) Always pick last element as pivot (implemented below)
3) Pick a random element as pivot.
4) Pick median as pivot.

The key process in quickSort is partition(). The goal of partition() is, given an array and an element x of the array as pivot, to put x at its correct position in the sorted array, with all smaller elements (smaller than x) before x and all greater elements (greater than x) after x. All this should be done in linear time.

Partition Algorithm
There can be many ways to do partition, following code adopts the method given in CLRS book.

The logic is simple: we start from the leftmost element and keep track, as i, of the index of the last element known to be smaller than (or equal to) the pivot. While traversing, if we find an element smaller than or equal to the pivot, we increment i and swap the current element with arr[i]; otherwise we ignore the current element.

Implementation:
Following is C++ implementation of QuickSort.

/* A typical recursive implementation of quick sort */
#include <stdio.h>

// A utility function to swap two elements
void swap(int* a, int* b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* This function takes the last element as pivot, places the pivot element
   at its correct position in the sorted array, and places all smaller
   elements (smaller than the pivot) to the left of the pivot and all
   greater elements to the right of the pivot */
int partition (int arr[], int l, int h)
{
    int x = arr[h];  // pivot
    int i = (l - 1); // Index of last element smaller than or equal to pivot

    for (int j = l; j <= h - 1; j++)
    {
        // If current element is smaller than or equal to pivot
        if (arr[j] <= x)
        {
            i++; // increment index of smaller element
            swap(&arr[i], &arr[j]); // move current element into the smaller half
        }
    }
    swap(&arr[i + 1], &arr[h]);
    return (i + 1);
}

/* arr[] --> Array to be sorted, l --> Starting index, h --> Ending index */
void quickSort(int arr[], int l, int h)
{
    if (l < h)
    {
        int p = partition(arr, l, h); /* Partitioning index */
        quickSort(arr, l, p - 1);
        quickSort(arr, p + 1, h);
    }
}

/* Function to print an array */
void printArray(int arr[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

// Driver program to test above functions
int main()
{
    int arr[] = {10, 7, 8, 9, 1, 5};
    int n = sizeof(arr)/sizeof(arr[0]);
    quickSort(arr, 0, n-1);
    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}

Output:

Sorted array:
1 5 7 8 9 10

Analysis of QuickSort

The time taken by QuickSort in general can be written as follows:

T(n) = T(k) + T(n-k-1) + Θ(n)

The first two terms are for the two recursive calls, the last term is for the partition process. k is the number of elements which are smaller than the pivot.
The time taken by QuickSort depends upon the input array and partition strategy. Following are
three cases.

Worst Case: The worst case occurs when the partition process always picks the greatest or smallest element as pivot. With the above partition strategy, where the last element is always picked as pivot, the worst case occurs when the array is already sorted in increasing or decreasing order. The recurrence for the worst case is

T(n) = T(0) + T(n-1) + Θ(n)

which is equivalent to

T(n) = T(n-1) + Θ(n)

The solution of the above recurrence is Θ(n²).

Best Case: The best case occurs when the partition process always picks the middle element as pivot. The recurrence for the best case is

T(n) = 2T(n/2) + Θ(n)

The solution of the above recurrence is Θ(n log n). It can be solved using case 2 of the Master Theorem.

Average Case:
To do the average case analysis, we would need to consider all possible permutations of the array and calculate the time taken by each, which doesn't look easy.
We can get an idea of the average case by considering the case when partition puts O(n/10) elements in one set and O(9n/10) elements in the other. The recurrence for this case is

T(n) = T(n/10) + T(9n/10) + Θ(n)

The solution of the above recurrence is also O(n log n).

Although the worst case time complexity of QuickSort is O(n²), which is worse than that of many other sorting algorithms such as Merge Sort and Heap Sort, QuickSort is faster in practice because its inner loop can be implemented efficiently on most architectures and for most real-world data. QuickSort can be implemented in different ways by changing the choice of pivot, so that the worst case rarely occurs for a given type of data. However, Merge Sort is generally considered better when the data is huge and stored in external storage.
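As an illustration of changing the pivot choice, here is a hedged sketch of a randomized-pivot variant: a random element is swapped into the last position, after which the same partition scheme as above is applied. The function names are mine.

```cpp
#include <algorithm>
#include <cstdlib>

// Randomized-pivot quickSort: moving a uniformly chosen element to the
// end before the usual last-element partition makes the O(n^2) worst
// case very unlikely for any particular input ordering.
int randomizedPartition(int arr[], int l, int h)
{
    int r = l + std::rand() % (h - l + 1);  // random index in [l, h]
    std::swap(arr[r], arr[h]);              // use it as the pivot

    int x = arr[h];   // pivot
    int i = l - 1;    // index of last element <= pivot
    for (int j = l; j <= h - 1; j++)
    {
        if (arr[j] <= x)
        {
            i++;
            std::swap(arr[i], arr[j]);
        }
    }
    std::swap(arr[i + 1], arr[h]);
    return i + 1;
}

void randomizedQuickSort(int arr[], int l, int h)
{
    if (l < h)
    {
        int p = randomizedPartition(arr, l, h);
        randomizedQuickSort(arr, l, p - 1);
        randomizedQuickSort(arr, p + 1, h);
    }
}
```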

Merge Sort
geeksquiz.com

MergeSort is a Divide and Conquer algorithm. It divides the input array in two halves, calls itself for the two halves and then merges the two sorted halves. The merge() function is used for merging two halves. merge(arr, l, m, r) is the key process that assumes that arr[l..m] and arr[m+1..r] are sorted and merges the two sorted sub-arrays into one. See the following C implementation for details.

MergeSort(arr[], l, r)
If r > l
   1. Find the middle point to divide the array into two halves:
         middle m = (l+r)/2
   2. Call mergeSort for first half:
         Call mergeSort(arr, l, m)
   3. Call mergeSort for second half:
         Call mergeSort(arr, m+1, r)
   4. Merge the two halves sorted in step 2 and 3:
         Call merge(arr, l, m, r)

The following diagram from wikipedia shows the complete merge sort process for an example
array {38, 27, 43, 3, 9, 82, 10}. If we take a closer look at the diagram, we can see that the array
is recursively divided in two halves till the size becomes 1. Once the size becomes 1, the merge process comes into action and starts merging arrays back till the complete array is merged.

/* C program for merge sort */
#include <stdlib.h>
#include <stdio.h>

/* Function to merge the two halves arr[l..m] and arr[m+1..r] of array arr[] */
void merge(int arr[], int l, int m, int r)
{
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;

    /* create temp arrays */
    int L[n1], R[n2];

    /* Copy data to temp arrays L[] and R[] */
    for (i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];

    /* Merge the temp arrays back into arr[l..r] */
    i = 0;
    j = 0;
    k = l;
    while (i < n1 && j < n2)
    {
        if (L[i] <= R[j])
        {
            arr[k] = L[i];
            i++;
        }
        else
        {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    /* Copy the remaining elements of L[], if there are any */
    while (i < n1)
    {
        arr[k] = L[i];
        i++;
        k++;
    }

    /* Copy the remaining elements of R[], if there are any */
    while (j < n2)
    {
        arr[k] = R[j];
        j++;
        k++;
    }
}

/* l is for left index and r is right index of the sub-array
   of arr to be sorted */
void mergeSort(int arr[], int l, int r)
{
    if (l < r)
    {
        int m = l + (r-l)/2; // Same as (l+r)/2, but avoids overflow for large l and r
        mergeSort(arr, l, m);
        mergeSort(arr, m+1, r);
        merge(arr, l, m, r);
    }
}

/* UTILITY FUNCTIONS */
/* Function to print an array */
void printArray(int A[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", A[i]);
    printf("\n");
}

/* Driver program to test above functions */
int main()
{
    int arr[] = {12, 11, 13, 5, 6, 7};
    int arr_size = sizeof(arr)/sizeof(arr[0]);

    printf("Given array is \n");
    printArray(arr, arr_size);

    mergeSort(arr, 0, arr_size - 1);

    printf("\nSorted array is \n");
    printArray(arr, arr_size);
    return 0;
}

Output:

Given array is
12 11 13 5 6 7

Sorted array is
5 6 7 11 12 13

Time Complexity: Merge Sort is a recursive algorithm and its time complexity can be expressed as the following recurrence relation:

T(n) = 2T(n/2) + Θ(n)

The above recurrence can be solved either using the Recurrence Tree method or the Master method. It falls in case II of the Master method, and the solution of the recurrence is Θ(n log n). The time complexity of Merge Sort is Θ(n log n) in all 3 cases (worst, average and best), as merge sort always divides the array in two halves and takes linear time to merge the two halves.

Auxiliary Space: O(n)

Algorithmic Paradigm: Divide and Conquer

Sorting In Place: No in a typical implementation

Stable: Yes

Applications of Merge Sort


1) Merge Sort is useful for sorting linked lists in O(n log n) time. Other O(n log n) algorithms like Heap Sort and Quick Sort (average case O(n log n)) cannot be applied to linked lists.
2) Inversion Count Problem
3) Used in External Sorting
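As a sketch of application 2), the merge step can count inversions (pairs i < j with arr[i] > arr[j]) as a byproduct: whenever an element of the right half is emitted while elements of the left half remain, each remaining left element forms exactly one inversion with it. This is a minimal self-contained version; the function name is mine.

```cpp
#include <vector>

// Count inversions in arr[l..r] while merge-sorting it in place.
// When arr[j] from the right half is placed before the pending
// elements arr[i..m] of the left half, all m - i + 1 of them are
// larger than arr[j] and appear before it, so each is one inversion.
long long countInversions(std::vector<int>& arr, int l, int r)
{
    if (l >= r) return 0;
    int m = l + (r - l) / 2;
    long long inv = countInversions(arr, l, m) + countInversions(arr, m + 1, r);

    std::vector<int> merged;
    int i = l, j = m + 1;
    while (i <= m && j <= r)
    {
        if (arr[i] <= arr[j])
            merged.push_back(arr[i++]);
        else
        {
            inv += m - i + 1;          // all of arr[i..m] invert with arr[j]
            merged.push_back(arr[j++]);
        }
    }
    while (i <= m) merged.push_back(arr[i++]);
    while (j <= r) merged.push_back(arr[j++]);

    for (int k = l; k <= r; k++)       // copy the merged run back
        arr[k] = merged[k - l];
    return inv;
}
```

The overall running time stays O(n log n), the same as plain merge sort.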

Suffix Arrays
codechef.com

A suffix array is just a sorted array of all the suffixes of a given string.

As a data structure, it is widely used in areas such as data compression, bioinformatics and, in
general, in any area that deals with strings and string matching problems, so, as you can see, it is
of great importance to know efficient algorithms to construct a suffix array for a given string.

Please note that in this context, a suffix is simply a substring that runs to the end of the string, as you can see from the wikipedia link provided.

A suffix array will contain integers that represent the starting indexes of all the suffixes of a given string, after the aforementioned suffixes are sorted.

In some applications of suffix arrays, it is common to pad the string with a special character (like #, @ or $) that is not present in the alphabet used to represent the string and, as such, is considered smaller than all the other characters. (The reason why these special characters are used will hopefully become clearer ahead in this text.)

And, as a picture is worth more than a thousand words, below is a small scheme which represents the several suffixes of a string (on the left) along with the suffix array for the same string (on the right). The original string is attcatg$.

The above picture describes what we want to do, and our goal with this text will be to explore
different ways of doing this in the hope of obtaining a good solution.

We will enumerate some popular algorithms for this task and will actually implement some of
them in C++ (as you will see, some of them are trivial to implement but can be too slow, while
others have faster execution times at the cost of both implementation and memory complexity).

The naive algorithm

We shall begin our exploration of this very interesting topic by first studying the most naive algorithm available to solve our problem, which is also the simplest one to implement.

The key idea of the naive algorithm is using a good comparison-based sorting algorithm to
sort all the suffixes of a given string in the fastest possible way. Quick-sort does this task very
well.

However, we should remind ourselves that we are sorting strings, so either we use the
overloaded < operator as a comparator for strings (this is done internally in C++ for the
string data type) or we write our own string comparison function, which is basically the same
thing regarding time complexity, with the latter alternative costing us more time writing
code. As such, in my own implementation I chose to keep things simple and used the
built-in sort() function applied to a vector of strings. Since comparing two strings forces us to
iterate over all their characters, the time complexity of a string comparison is O(N), which means that:

In the naive approach, we are sorting N strings with an O(N log N) comparison-based
sorting algorithm. As comparing two strings takes O(N) time, we can conclude that the time
complexity of our naive approach is O(N^2 log N).

After sorting all the strings, we need to be able to "retrieve" the original index that each string
had initially so we can actually build the suffix array itself.

[Sidenote: As written on the image, the indexes just "come along for the ride".

To do this, I simply used a map as an auxiliary data structure, whose keys are the suffix
strings and whose values are the original indexes those suffixes had in the original string.
Retrieving these values is then trivial.]

Below, you can find the code for the naive algorithm for constructing the Suffix Array of a
given string entered by the user as input:

//Naive algorithm for the construction of the suffix array of a given string
#include <iostream>
#include <string>
#include <map>
#include <algorithm>
#include <vector>
using namespace std;

int main()
{
    string s;
    map<string,int> m;
    cin >> s;
    vector<string> v;
    for(int i = 0; i < (int)s.size(); i++)
    {
        m[s.substr(i)] = i;      // remember the original index of each suffix
        v.push_back(s.substr(i));
    }
    sort(v.begin(), v.end());
    for(int i = 0; i < (int)v.size(); i++)
    {
        cout << m[v[i]] << endl; // print the suffix array, one index per line
    }
    return 0;
}

As you can see from the above code snippet, the implementation of the naive approach is
straightforward and quite robust: there is little to no room for errors when one uses
built-in sorting functions.

However, such simplicity comes with an associated cost, and in this case that cost is
a relatively high time complexity which is actually impractical for most problems. So we need
to tune up this approach a bit and attempt to devise a better algorithm.

This is what will be done on the next section.

A clever approach to building the Suffix Array of a given string

As noted above, Suffix Array construction is simple, but an efficient Suffix Array
construction is hard.

However, after some thinking, we can actually form a very clear idea of why we are
performing so badly on this construction.

The reason why we are doing badly on the construction of the SA is that we are NOT
EXPLOITING the fact that the strings we are sorting are actually all part of the SAME original
string, and not random, unrelated strings.

However, how can this observation help us?

This observation can help us greatly because we can now work with tuples that contain the sort
ranks of only some characters of each suffix (grouped in powers of two): we first sort the
suffixes by their first two characters, then improve on that and sort them by their first
four characters, and so on, until we have reached a length at which we can be sure all the
suffixes are themselves sorted.

With this observation at hand, we can actually cut down the execution time of our SA
construction algorithm from O(N^2 log N) to O(N log^2 N).

Using the amazing work done by @gamabunta, I can provide his explanation of this approach
along with his pseudo-code, and later build a little upon it by providing an actual
C++ implementation of this idea:

@gamabunta's work

Let us consider the original array of suffixes, sorted only according to the first 2 characters. If
the first 2 characters are the same, we consider that the strings have the same sort index.

Sort-Index Suffix-Index

0 10: i
1 7: ippi
2 1: ississippi
2 4: issippi
3 0: mississippi
4 9: pi
5 8: ppi
6 3: sissippi
6 6: sippi
7 2: ssissippi
7 5: ssippi

Now, we wish to use the above array, and sort the suffixes according to their first 4
characters. To achieve this, we can assign 2-tuples to each string. The first value in the 2-tuple
is the sort-index of the respective suffix, from above. The second value in the 2-tuple is the
sort-index of the suffix that starts 2 positions later, again from above.

If the suffix has at most 2 characters, there is no suffix starting 2 positions later, so we keep
the second value in the 2-tuple as -1.

Sort-Index Suffix-Index Suffix-Index 2 positions later 2-tuple assigned

0 10: i -1 (0, -1)


1 7: ippi 9 (1, 4)
2 1: ississippi 3 (2, 6)
2 4: issippi 6 (2, 6)
3 0: mississippi 2 (3, 7)
4 9: pi -1 (4, -1)
5 8: ppi 10 (5, 0)
6 3: sissippi 5 (6, 7)
6 6: sippi 8 (6, 5)
7 2: ssissippi 4 (7, 2)
7 5: ssippi 7 (7, 1)

Now, we can call quick-sort and sort the suffixes according to their first 4 characters by using
the 2-tuples we constructed above! The result would be

Sort-Index Suffix-Index

0 10: i
1 7: ippi
2 1: ississippi
2 4: issippi
3 0: mississippi
4 9: pi
5 8: ppi
6 3: sissippi
7 6: sippi
8 2: ssissippi
9 5: ssippi

Similarly constructing the 2-tuples and performing quick-sort again will give us suffixes sorted
by their first 8 characters.

Thus, we can sort the suffixes by the following pseudo-code

SortIndex[][] = { 0 }

for i = 0 to N-1
SortIndex[0][i] = order index of the character at A[i]

doneTill = 1
step = 1

while doneTill < N


L[] = { (0,0,0) } // array of 3-tuples

for i = 0 to N-1
    L[i] = ( SortIndex[step - 1][i],
             SortIndex[step - 1][i + doneTill] if i + doneTill < N, else -1,
             i )
    // we store i so we can retrieve the suffix index after sorting

sort L

for i = 0 to N-1
SortIndex[step][L[i].thirdValue] =
SortIndex[step][L[i-1].thirdValue], if L[i] and L[i-1] have the same
first and second values
i, otherwise

++step
doneTill *= 2

The above algorithm will find the Suffix Array in O(N log^2 N).

end of @gamabunta's work

Below you can find a C++ implementation of the above pseudo-code:

#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;

#define MAXN 65536
#define MAXLG 17

char A[MAXN];

struct entry
{
    int nr[2]; // the 2-tuple of sort indexes
    int p;     // original position of the suffix
} L[MAXN];

int P[MAXLG][MAXN];
int N, i;
int stp, cnt;

bool cmp(const entry &a, const entry &b)
{
    return a.nr[0] == b.nr[0] ? a.nr[1] < b.nr[1] : a.nr[0] < b.nr[0];
}

int main()
{
    if (fgets(A, MAXN, stdin) == NULL) return 0;
    A[strcspn(A, "\r\n")] = '\0'; // strip the trailing newline

    for (N = strlen(A), i = 0; i < N; i++)
        P[0][i] = A[i] - 'a';

    for (stp = 1, cnt = 1; cnt < N; stp++, cnt *= 2)
    {
        for (i = 0; i < N; i++)
        {
            L[i].nr[0] = P[stp - 1][i];
            L[i].nr[1] = i + cnt < N ? P[stp - 1][i + cnt] : -1;
            L[i].p = i;
        }
        sort(L, L + N, cmp);
        for (i = 0; i < N; i++)
            P[stp][L[i].p] = i > 0 && L[i].nr[0] == L[i - 1].nr[0] &&
                             L[i].nr[1] == L[i - 1].nr[1]
                                 ? P[stp][L[i - 1].p]
                                 : i;
    }
    // After the last step, L[i].p holds the suffix array:
    // L[0].p is the start of the smallest suffix, and so on.
    return 0;
}

This concludes the explanation of a more efficient approach to building the suffix array of a
given string. The runtime is, as said above, O(N log^2 N).

Constructing (and explaining) the LCP array

The LCP array (Longest Common Prefix) is an auxiliary data structure to the suffix array. It
stores the lengths of the longest common prefixes between pairs of consecutive suffixes in the
suffix array.

So, if one has built the Suffix Array, it's relatively simple to actually build the LCP array.

In fact, using once again @gamabunta's amazing work, below there is the pseudo-code which
allows one to efficiently find the LCP array:

We can use the SortIndex array we constructed above to find the Longest Common Prefix,
between any two prefixes.

FindLCP (x, y)
    answer = 0

    for k = ceil(log N) down to 0
        if SortIndex[k][x] = SortIndex[k][y]
            // the sort-index is the same iff the first 2^k characters are the same
            answer += 2^k
            // now we wish to find the characters that are the same in the
            // remaining suffixes
            x += 2^k
            y += 2^k

    return answer

The LCP Array is the array of Longest Common Prefixes between the i-th suffix and the (i-1)-th
suffix in the Suffix Array. Each call of the above procedure takes O(log N) time, and it needs to
be called N times, so the LCP Array is built in a total of O(N log N) time.
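As a concrete C++ sketch of the pseudo-code above (the names buildP and findLCP are our own, and this is an illustration rather than a tuned implementation), the SortIndex table can be built with the same doubling idea and then queried for the LCP of any two suffixes:

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>
using namespace std;

// Build the SortIndex table: P[k][i] is the sort index of the suffix at i
// when suffixes are compared by their first 2^k characters.
vector<vector<int>> buildP(const string& s)
{
    int n = s.size();
    vector<vector<int>> P(1, vector<int>(n));
    for (int i = 0; i < n; i++) P[0][i] = s[i];
    for (int cnt = 1; cnt < n; cnt *= 2) {
        int stp = P.size();
        vector<tuple<int, int, int>> L(n);
        for (int i = 0; i < n; i++)
            L[i] = make_tuple(P[stp - 1][i],
                              i + cnt < n ? P[stp - 1][i + cnt] : -1, i);
        sort(L.begin(), L.end());
        P.push_back(vector<int>(n));
        for (int i = 0; i < n; i++)
            P[stp][get<2>(L[i])] =
                i > 0 && get<0>(L[i]) == get<0>(L[i - 1]) &&
                         get<1>(L[i]) == get<1>(L[i - 1])
                    ? P[stp][get<2>(L[i - 1])]
                    : i;
    }
    return P;
}

// LCP of the suffixes starting at x and y, walking the table top-down.
int findLCP(const vector<vector<int>>& P, int n, int x, int y)
{
    if (x == y) return n - x;
    int answer = 0;
    for (int k = (int)P.size() - 1; k >= 0 && x < n && y < n; k--)
        if (P[k][x] == P[k][y]) { // the first 2^k characters agree
            answer += 1 << k;
            x += 1 << k;
            y += 1 << k;
        }
    return answer;
}
```

For "mississippi", findLCP for the suffixes starting at 1 ("ississippi") and 4 ("issippi") returns 4, matching their common prefix "issi".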

Topcoder.com
A Suffix Array is the sorted array of suffixes of a text T. We represent it by an array of |T| integers: if SA[i] =
j, then the suffix T[j .. n] is the i-th smallest suffix.

1. Construction

Let S[i] be T[i .. n], and Compare(S[i], S[j], t) (it may be equal to '=', '<', or '>') be the comparison
of S[i] and S[j] based on their first t symbols. The main idea is that if we know the comparison
results considering first t symbols, we can find the result of Compare(S[i], S[j], 2t) in O(1). How?
There are 2 cases:

+ Compare(S[i], S[j], t) is not '=', then the result of Compare(S[i], S[j], 2t) is the same as
Compare(S[i], S[j], t).
+ Compare(S[i], S[j], t) is '=', then we should compare the last t characters of these two suffixes
(since the first t symbols are the same), i.e. the result is equal to Compare(T[i+t .. n], T[j+t .. n], t),
which is equal to Compare(S[i+t], S[j+t], t). And since we know the result of this, we can find the
result of Compare(S[i], S[j], 2t) in O(1).

Now, the suffix sorting algorithm is as follows:


+ At the first step, we partition the suffixes based on their first symbol. We will have some buckets,
and the first symbols of two suffixes which are in the same bucket are the same. We set t = 1.
(Compare(S[i], S[j], t) is '<' if Bucket[i] < Bucket[j] after the t-th step.)
+ At each other step:
- (*) Sort the suffixes based on their first 2t symbols,
- Change the bucket boundaries so that all suffixes with the same initial 2t characters are in the
same bucket.
- Set t = 2*t.

If we do the (*) using an O(n log n) sorting algorithm, we can construct the suffix array in O(n log^2
n). There are, however, better algorithms.
2. Pattern Matching
Since we have the sorted array of suffixes of T, and every substring of T is a prefix of a suffix, we
can answer the queries:
+ Does pattern P occur in T?
+ Count the occurrences of P in T.
in O( |P| log n ) using binary search ( O(log n) steps, each taking O(|P|) ),
and locate the patterns in O( |P| log n + occ ) where occ = number of occurrences.
There are, however, better algorithms for this part too.
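The counting query can be sketched with two binary searches over the suffix array; here we pair it with a naively built suffix array for brevity (buildSA and countOccurrences are our own names, not from any library):

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>
using namespace std;

// Naive O(n^2 log n) suffix array, good enough to demonstrate the queries.
vector<int> buildSA(const string& t)
{
    vector<int> sa(t.size());
    iota(sa.begin(), sa.end(), 0);
    sort(sa.begin(), sa.end(), [&](int a, int b) {
        return t.compare(a, string::npos, t, b, string::npos) < 0;
    });
    return sa;
}

// Every occurrence of p is a prefix of some suffix; those suffixes form a
// contiguous range in the suffix array, found with two binary searches.
int countOccurrences(const string& t, const vector<int>& sa, const string& p)
{
    auto lo = lower_bound(sa.begin(), sa.end(), p,
        [&](int idx, const string& pat) {
            return t.compare(idx, pat.size(), pat) < 0;
        });
    auto hi = upper_bound(sa.begin(), sa.end(), p,
        [&](const string& pat, int idx) {
            return t.compare(idx, pat.size(), pat) > 0;
        });
    return hi - lo;
}
```

For instance, with t = "AABAACAADAABAAABAA" and p = "AABA" (an example used again further down in this text), the count is 3, for the occurrences at indexes 0, 9 and 13.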

And here is a simple annotated suffix sorting code:


#include <iostream>
#include <cstring>
#include <algorithm>
#include <ctime>
using namespace std;
#define N 10000000

char str[N];
int H = 0, Bucket[N], nBucket[N], c;

struct Suffix
{
    int idx; // Suffix starts at idx, i.e. it's str[ idx .. L-1 ]

    // Compares two suffixes based on their first 2H symbols,
    // assuming we know the result for H symbols.
    bool operator<(const Suffix& sfx) const
    {
        if(H == 0) return str[idx] < str[sfx.idx];
        else if(Bucket[idx] == Bucket[sfx.idx])
            return (Bucket[idx + H] < Bucket[sfx.idx + H]);
        else
            return (Bucket[idx] < Bucket[sfx.idx]);
    }

    bool operator==(const Suffix& sfx) const
    {
        return !(*this < sfx) && !(sfx < *this);
    }
} Pos[N];

int UpdateBuckets(int L)
{
    int start = 0, id = 0, c = 0;
    for(int i = 0; i < L; i++)
    {
        // If Pos[i] is not equal to Pos[i-1], a new bucket has started.
        if(i != 0 && !(Pos[i] == Pos[i-1]))
        {
            start = i;
            id++;
        }
        // If there is a bucket with size larger than 1, we should continue.
        if(i != start)
            c = 1;
        // Bucket for the suffix starting at Pos[i].idx is id.
        nBucket[Pos[i].idx] = id;
    }
    memcpy(Bucket, nBucket, sizeof(int) * L);
    return c;
}

void SuffixSort(int L)
{
    for(int i = 0; i < L; i++) Pos[i].idx = i;
    // H == 0: sort based on the first character.
    sort(Pos, Pos + L);
    // Create the initial buckets.
    c = UpdateBuckets(L);
    for(H = 1; c; H *= 2)
    {
        // Sort based on the first 2*H symbols, assuming that we have
        // already sorted based on the first H characters.
        sort(Pos, Pos + L);
        // Update buckets based on the first 2*H symbols.
        c = UpdateBuckets(L);
    }
}

int main()
{
    cin >> str;
    int L = strlen(str) + 1;
    int cl = clock();
    SuffixSort(L);
    cerr << (clock() - cl) * 0.001 << endl;
    for(int i = 0; i < L; i++)
        cout << "'" << str + Pos[i].idx << "'" << endl;
    return 0;
}

Knuth-Morris-Pratt Algorithm (KMP)


Topcoder.com

Introduction to String Searching Algorithms

By TheLlama
topcoder member

The fundamental string searching (matching) problem is defined as follows: given two strings - a
text and a pattern, determine whether the pattern appears in the text. The problem is also known
as "the needle in a haystack problem."

The "Naive" Method

Its idea is straightforward -- for every position in the text, consider it a starting position of the
pattern and see if you get a match.

function brute_force(text[], pattern[])
{
    // let n be the size of the text and m the size of the
    // pattern

    for(i = 0; i < n; i++) {
        for(j = 0; j < m && i + j < n; j++)
            if(text[i + j] != pattern[j]) break;
            // mismatch found, break the inner loop
        if(j == m)
            ; // match found
    }
}

The "naive" approach is easy to understand and implement but it can be too slow in some cases.
If the length of the text is n and the length of the pattern m, in the worst case it may take as much
as (n * m) iterations to complete the task.

It should be noted though, that for most practical purposes, which deal with texts based on
human languages, this approach is much faster since the inner loop usually quickly finds a
mismatch and breaks. A problem arises when we are faced with different kinds of "texts," such
as the genetic code.

Rabin-Karp Algorithm (RK)

This is actually the "naive" approach augmented with a powerful programming technique - the
hash function.

Every string s[] of length m can be seen as a number H written in a positional numeral system in
base B (B >= size of the alphabet used in the string):

H = s[0] * B^(m-1) + s[1] * B^(m-2) + ... + s[m-2] * B^1 + s[m-1] * B^0



If we calculate the number H (the hash value) for the pattern and the same number for every
substring of length m of the text, then the inner loop of the "naive" method will disappear -
instead of comparing two strings character by character we will just have to compare two
integers.

A problem arises when m and B are big enough and the number H becomes too large to fit into
the standard integer types. To overcome this, instead of the number H itself we use its remainder
when divided by some other number M. To get the remainder we do not have to calculate H.
Applying the basic rules of modular arithmetic to the above expression:

A + B = C => (A % M + B % M) % M = C % M

A * B = C => ((A % M) * (B % M)) % M = C % M

We get:

H % M = (((s[0] % M) * (B^(m-1) % M)) % M + ((s[1] % M) * (B^(m-2) % M)) % M + ...

+ ((s[m-2] % M) * (B^1 % M)) % M + ((s[m-1] % M) * (B^0 % M)) % M) % M

The drawback of using remainders is that it may turn out that two different strings map to the
same number (it is called a collision). This is less likely to happen if M is sufficiently large and B
and M are prime numbers. Still this does not allow us to entirely skip the inner loop of the
"naive" method. However, its usage is significantly limited. We have to compare the "candidate"
substring of the text with the pattern character by character only when their hash values are
equal.

Obviously the approach described so far would be absolutely useless if we were not able to
calculate the hash value for every substring of length m in the text in just one pass through the
entire text. At first glance to do these calculations we will again need two nested loops: an outer
one -- to iterate through all possible starting positions -- and an inner one -- to calculate the hash
function for every starting position. Fortunately, this is not the case. Let's consider a string s[],
and let's suppose we are to calculate the hash value for every substring in s[] with length say m =
3. It is easy to see that:

H0 = H(s[0]..s[2]) = s[0] * B^2 + s[1] * B + s[2]

H1 = H(s[1]..s[3]) = s[1] * B^2 + s[2] * B + s[3]

H1 = (H0 - s[0] * B^2) * B + s[3]

In general:

Hi = ( H(i-1) - s[i-1] * B^(m-1) ) * B + s[i + m - 1]
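A quick way to convince oneself of this identity is to evaluate both sides on a tiny example (plain integers, no modulus yet; polyHash is our own helper name):

```cpp
#include <string>
using namespace std;

// Hash of the m characters of s starting at position i, base B, no modulus.
long long polyHash(const string& s, int i, int m, long long B)
{
    long long h = 0;
    for (int k = 0; k < m; k++)
        h = h * B + s[i + k];
    return h;
}
```

With s = "abcd", m = 3 and B = 256, the value (polyHash(s,0,3,B) - s[0] * B * B) * B + s[3] equals polyHash(s,1,3,B), exactly as the formula for Hi states.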

Applying again the rules of modular arithmetic, we get:



Hi % M = (((( H(i-1) % M - ((s[i-1] % M) * (B^(m-1) % M)) % M ) % M) * (B % M)) % M

+ s[i + m - 1] % M) % M

Obviously the value of ( H(i-1) - s[i-1] * B^(m-1) ) may be negative. Again, the rules of modular
arithmetic come into play:

A - B = C => (A % M - B % M + k * M) % M = C % M

Since the absolute value of the difference of the two remainders is between 0 and (M - 1), we can
safely use a value of 1 for k.

Pseudocode for RK follows:

// correctly calculates a mod b even if a < 0
function int_mod(int a, int b)
{
    return (a % b + b) % b;
}

function Rabin_Karp(text[], pattern[])
{
    // let n be the size of the text, m the size of the
    // pattern, B - the base of the numeral system,
    // and M - a big enough prime number

    if(n < m) return; // no match is possible

    // calculate the hash value of the pattern
    hp = 0;
    for(i = 0; i < m; i++)
        hp = int_mod(hp * B + pattern[i], M);

    // calculate the hash value of the first segment
    // of the text of length m
    ht = 0;
    for(i = 0; i < m; i++)
        ht = int_mod(ht * B + text[i], M);

    if(ht == hp)
        check character by character if the first
        segment of the text matches the pattern;

    // start the "rolling hash" - for every next character in
    // the text calculate the hash value of the new segment
    // of length m; E = B^(m-1) modulo M
    for(i = m; i < n; i++) {
        ht = int_mod(ht - int_mod(text[i - m] * E, M), M);
        ht = int_mod(ht * B, M);
        ht = int_mod(ht + text[i], M);

        if(ht == hp)
            check character by character if the
            current segment of the text matches
            the pattern;
    }
}

Unfortunately, there are still cases when we will have to run the entire inner loop of the "naive"
method for every starting position in the text -- for example, when searching for the pattern "aaa"
in the string "aaaaaaaaaaaaaaaaaaaaaaaaa" -- so in the worst case we will still need (n * m)
iterations. How do we overcome this?

Let's go back to the basic idea of the method -- to replace the string comparison character by
character by a comparison of two integers. In order to keep those integers small enough we have
to use modular arithmetic. This causes a "side effect" -- the mapping between strings and
integers ceases to be unique. So now whenever the two integers are equal we still have to
"confirm" that the two strings are identical by running character-by-character comparison. It can
become a kind of vicious circle.

The way to solve this problem is "rational gambling," or the so called "double hash" technique.
We "gamble" -- whenever the hash values of two strings are equal, we assume that the strings are
identical, and do not compare them character by character. To make the likelihood of a "mistake"
negligibly small we compute for every string not one but two independent hash values based on
different numbers B and M. If both are equal, we assume that the strings are identical.
Sometimes even a "triple hash" is used, but this is rarely justifiable from a practical point of
view.

The "pure" form of "the needle in a haystack problem" is considered too straightforward and is
rarely seen in programming challenges. However, the "rolling hash" technique used in RK is an
important weapon. It is especially useful in problems where we have to look at all substrings of
fixed length of a given text. An example is "the longest common substring problem": given two
strings find the longest string that is a substring of both. In this case, the combination of binary
search (BS) and "rolling hash" works quite well. The important point that allows us to use BS is
the fact that if the given strings have a common substring of length n, they also have at least one
common substring of any length m < n. And if the two strings do not have a common substring
of length n they do not have a common substring of any length m > n. So all we need is to run a
BS on the length of the string we are looking for. For every substring of the first string of the
length fixed in the BS, we insert it into a hash table using one hash value as an index, while a
second hash value (the "double hash") is stored in the table entry. For every substring of the fixed
length of the second string, we calculate the corresponding two hash values and check the table
to see if they have already been seen in the first string. A hash table based on open addressing is
very suitable for this task.
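A sketch of this combination in C++ follows. For brevity it uses a single polynomial hash plus a character-by-character confirmation on every hash hit, rather than the "double hash" gamble described above; all function names here are our own:

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

static const long long MOD = 1000000007LL, BASE = 131;

// Prefix hashes: h[i] is the hash of s[0..i-1].
static vector<long long> prefixHashes(const string& s)
{
    vector<long long> h(s.size() + 1, 0);
    for (size_t i = 0; i < s.size(); i++)
        h[i + 1] = (h[i] * BASE + s[i]) % MOD;
    return h;
}

static long long substrHash(const vector<long long>& h,
                            const vector<long long>& pw, int i, int len)
{
    return ((h[i + len] - h[i] * pw[len]) % MOD + MOD) % MOD;
}

// Do a and b share a substring of this length? Hash all windows of a,
// then roll through b; confirm character by character on every hash hit.
static bool hasCommonOfLength(const string& a, const string& b, int len,
                              const vector<long long>& ha,
                              const vector<long long>& hb,
                              const vector<long long>& pw)
{
    if (len == 0) return true;
    unordered_map<long long, int> seen; // hash -> starting position in a
    for (int i = 0; i + len <= (int)a.size(); i++)
        seen[substrHash(ha, pw, i, len)] = i;
    for (int j = 0; j + len <= (int)b.size(); j++) {
        auto it = seen.find(substrHash(hb, pw, j, len));
        if (it != seen.end() && a.compare(it->second, len, b, j, len) == 0)
            return true;
    }
    return false;
}

int longestCommonSubstring(const string& a, const string& b)
{
    vector<long long> ha = prefixHashes(a), hb = prefixHashes(b);
    int maxLen = (int)max(a.size(), b.size());
    vector<long long> pw(maxLen + 1, 1);
    for (int i = 1; i <= maxLen; i++) pw[i] = pw[i - 1] * BASE % MOD;
    int lo = 0, hi = (int)min(a.size(), b.size());
    while (lo < hi) { // binary search on the common substring length
        int mid = (lo + hi + 1) / 2;
        if (hasCommonOfLength(a, b, mid, ha, hb, pw)) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}
```

The binary search is valid exactly because of the monotonicity argued above: a common substring of length n implies common substrings of every shorter length.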

Of course in "real life" (real challenges) the number of the given strings may be greater than two,
and the longest substring we are looking for should not necessarily be present in all the given
strings. This does not change the general approach.

Another type of problems where the "rolling hash" technique is the key to the solution are those
that ask us to find the most frequent substring of a fixed length in a given text. Since the length is
already fixed we do not need any BS. We just use a hash table and keep track of the frequencies.

Knuth-Morris-Pratt Algorithm (KMP)

In some sense, the "naive" method and its extension RK reflect the standard approach of human
logic to "the needle in a haystack problem". The basic idea behind KMP is a bit different. Let's
suppose that we are able, after one pass through the text, to identify all positions where an
existing match with the pattern ends. Obviously, this will solve our problem. Since we know the
length of the pattern, we can easily identify the starting position of every match.

Is this approach feasible? It turns out that it is, when we apply the concept of the automaton. We
can think of an automaton as of a kind of abstract object, which can be in a finite number of
states. At each step some information is presented to it. Depending on this information and its
current state the automaton goes to a new state, uniquely determined by a set of internal rules.
One of the states is considered as "final". Every time we reach this "final" state we have found an
end position of a match.

The automaton used in KMP is just an array of "pointers" (which represents the "internal rules")
and a separate "external" pointer to some index of that array (which represents the "current
state"). When the next character from the text is presented to the automaton, the position of the
"external" pointer changes according to the incoming character, the current position, and the set
of "rules" contained in the array. Eventually a "final" state is reached and we can declare that we
have found a match.

The general idea behind the automaton is relatively simple. Let us consider the string

ABABAC

as a pattern, and let's list all its prefixes:

0  /the empty string/

1  A

2  AB

3  ABA

4  ABAB

5  ABABA

6  ABABAC

Let us now consider for each such listed string (prefix) the longest proper suffix (a suffix
different from the string itself), which is at the same time a prefix of it:

0  /the empty string/

1  /the empty string/

2  /the empty string/

3  A

4  AB

5  ABA

6  /the empty string/

It's easy to see that if we have at some point a partial match up to say the prefix (A B A B A) we
also have a partial match up to the prefixes (A B A), and (A) - which are both prefixes of the
initial string and suffix/prefixes of the current match. Depending on the next "incoming"
character from the text, three cases arise:

1. The next character is C. We can "expand" the match at the level of the prefix (A B A B A). In this
particular case this leads to a full match and we just notice this fact.
2. The next character is B. The partial match for the prefix (A B A B A) cannot be "expanded". The
best we can do is to return to the largest different partial match we have so far - the prefix (A B
A) and try to "expand" it. Now B "fits" so we continue with the next character from the text and
our current "best" partial match will become the string (A B A B) from our "list of prefixes".
3. The "incoming" character is, for example, D. The "journey" back to (A B A) is obviously
insufficient to "expand" the match. In this case we have to go further back to the second largest
partial match (the second largest proper suffix of the initial match that is at the same time a
prefix of it) - that is (A) and finally to the empty string (the third largest proper suffix in our
case). Since it turns out that there is no way to "expand" even the empty string using the
character D, we skip D and go to the next character from the text. But now our "best" partial
match so far will be the empty string.

In order to build the KMP automaton (or the so called KMP "failure function") we have to
initialize an integer array F[]. The indexes (from 0 to m - the length of the pattern) represent the
numbers under which the consecutive prefixes of the pattern are listed in our "list of prefixes"
above. Under each index is a "pointer" - that identifies the index of the longest proper suffix,
which is at the same time a prefix of the given string (or in other words F[i] is the index of next
best partial match for the string under index i). In our case (the string A B A B A C) the array F[]
will look as follows:

F[0] = 0

F[1] = 0

F[2] = 0

F[3] = 1

F[4] = 2

F[5] = 3

F[6] = 0

Notice that after initialization F[i] contains information not only about the largest next partial
match for the string under index i but also about every partial match of it. F[i] is the first best
partial match, F[F[i]] - is the second best, F[F[F[i]]] - the third, and so on. Using this information
we can calculate F[i] if we know the values F[k] for all k < i. The best next partial match of
string i will be the largest partial match of string i - 1 whose character that "expands" it is equal
to the last character of string i. So all we need to do is to check every partial match of string i - 1
in descending order of length and see if the last character of string i "expands" the match at this
level. If no partial match can be "expanded", then F[i] is the empty string. Otherwise F[i] is the
largest "expanded" partial match (after its "expansion").

In terms of pseudocode the initialization of the array F[] (the "failure function") may look like
this:

// Pay attention!
// the prefix under index i in the table above is
// is the string from pattern[0] to pattern[i - 1]
// inclusive, so the last character of the string under
// index i is pattern[i - 1]

function build_failure_function(pattern[])
{
    // let m be the length of the pattern

    F[0] = F[1] = 0; // always true

    for(i = 2; i <= m; i++) {
        // j is the index of the largest next partial match
        // (the largest suffix/prefix) of the string under
        // index i - 1
        j = F[i - 1];
        for( ; ; ) {
            // check to see if the last character of string i -
            // - pattern[i - 1] - "expands" the current "candidate"
            // best partial match - the prefix under index j
            if(pattern[j] == pattern[i - 1]) {
                F[i] = j + 1; break;
            }
            // if we cannot "expand" even the empty string
            if(j == 0) { F[i] = 0; break; }
            // else go to the next best "candidate" partial match
            j = F[j];
        }
    }
}

The automaton consists of the initialized array F[] ("internal rules") and a pointer to the index of
the prefix of the pattern that is the best (largest) partial match that ends at the current position in
the text ("current state"). The use of the automaton is almost identical to what we did in order to
build the "failure function". We take the next character from the text and try to "expand" the
current partial match. If we fail, we go to the next best partial match of the current partial match
and so on. According to the index where this procedure leads us, the "current state" of the
automaton is changed. If we are unable to "expand" even the empty string we just skip this
character, go to the next one in the text, and the "current state" becomes zero.

function Knuth_Morris_Pratt(text[], pattern[])
{
    // let n be the size of the text, m the
    // size of the pattern, and F[] - the
    // "failure function"

    build_failure_function(pattern[]);

    i = 0; // the initial state of the automaton is
           // the empty string

    j = 0; // the first character of the text

    for( ; ; ) {
        if(j == n) break; // we reached the end of the text

        // if the current character of the text "expands" the
        // current match
        if(text[j] == pattern[i]) {
            i++; // change the state of the automaton
            j++; // get the next character from the text
            if(i == m)
                ; // match found
        }

        // if the current state is not zero (we have not
        // reached the empty string yet) we try to
        // "expand" the next best (largest) match
        else if(i > 0) i = F[i];

        // if we reached the empty string and failed to
        // "expand" even it, we go to the next
        // character from the text; the state of the
        // automaton remains zero
        else j++;
    }
}
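For completeness, here is a concrete C++ rendering of the two routines above. It is a sketch with names of our own choosing; the one addition beyond the pseudocode is the line i = F[i] after a full match, which lets the search continue and report overlapping matches:

```cpp
#include <string>
#include <vector>
using namespace std;

// F[i] = length of the longest proper suffix of pattern[0..i-1]
// that is also a prefix of it (the "failure function").
vector<int> buildFailure(const string& pattern)
{
    int m = pattern.size();
    vector<int> F(m + 1, 0);
    for (int i = 2; i <= m; i++) {
        int j = F[i - 1];
        for (;;) {
            if (pattern[j] == pattern[i - 1]) { F[i] = j + 1; break; }
            if (j == 0) { F[i] = 0; break; }
            j = F[j];
        }
    }
    return F;
}

// Returns the starting positions of all matches of pattern in text.
vector<int> kmpSearch(const string& text, const string& pattern)
{
    vector<int> F = buildFailure(pattern), matches;
    int n = text.size(), m = pattern.size();
    int i = 0, j = 0; // i = automaton state, j = position in the text
    while (j < n) {
        if (text[j] == pattern[i]) {
            i++; j++;
            if (i == m) {              // full match ends at j - 1
                matches.push_back(j - m);
                i = F[i];              // continue, allowing overlaps
            }
        }
        else if (i > 0) i = F[i];      // fall back to the next best match
        else j++;                      // cannot expand even the empty string
    }
    return matches;
}
```

On the text "AABAACAADAABAAABAA" with the pattern "AABA" (the example used in the next section), this reports matches at 0, 9 and 13.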

Many problems in programming challenges focus more on the properties of KMP's "failure
function," rather than on its use for string matching. An example is: given a string (a quite long
one), find all its proper suffixes that are also prefixes of it. All we have to do is just to calculate
the "failure function" of the given string and using the information stored in it to print the
answer.

A typical problem seen quite often is: given a string find its shortest substring, such that the
concatenation of one or more copies of it results in the original string. Again the problem can be
reduced to the properties of the failure function. Let's consider the string

ABABAB

and all its proper suffix/prefixes in descending order:

1  ABAB

2  AB

3  /the empty string/

Every such suffix/prefix uniquely defines a string, which after being "inserted" in front of the
given suffix/prefix gives the initial string. In our case:

1  AB

2  ABAB

3  ABABAB

Every such "augmenting" string is a potential "candidate" for a string whose concatenation in one
or more copies results in the initial string. This follows from the fact that it is not only a
prefix of the initial string but also a prefix of the suffix/prefix it "augments". But that means that
the suffix/prefix now contains at least two copies of the "augmenting" string as a prefix (since it's
also a prefix of the initial string) and so on, provided, of course, that the suffix/prefix in
question is long enough. In other words, the length of a successful "candidate" must divide the
length of the initial string with no remainder.

So all we have to do in order to solve the given problem is to iterate through all proper
suffixes/prefixes of the initial string in descending order. This is just what the "failure function"
is designed for. We iterate until we find an "augmenting" string of the desired length (its length
divides with no remainder the length of the initial string) or get to the empty string, in which case
the "augmenting" string that meets the above requirement will be the initial string itself.
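The procedure just described translates into only a few lines once the failure function is available (shortestPeriod and failureFn are our own names for this sketch):

```cpp
#include <string>
#include <vector>
using namespace std;

// Failure function as in the KMP section: F[i] is the length of the
// longest proper suffix of s[0..i-1] that is also a prefix of it.
static vector<int> failureFn(const string& s)
{
    int m = s.size();
    vector<int> F(m + 1, 0);
    for (int i = 2; i <= m; i++) {
        int j = F[i - 1];
        while (true) {
            if (s[j] == s[i - 1]) { F[i] = j + 1; break; }
            if (j == 0) { F[i] = 0; break; }
            j = F[j];
        }
    }
    return F;
}

// Length of the shortest string whose repeated concatenation gives s:
// walk the proper suffix/prefixes in descending order and take the first
// "augmenting" string whose length divides |s|.
int shortestPeriod(const string& s)
{
    int n = s.size();
    vector<int> F = failureFn(s);
    for (int k = F[n]; k > 0; k = F[k]) {
        int cand = n - k;
        if (n % cand == 0) return cand;
    }
    return n; // only the string itself works
}
```

For "ABABAB" this returns 2 (the string "AB"), while for "AABAA" it returns 5, since no shorter string repeats into it.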

Rabin-Karp and Knuth-Morris-Pratt at topcoder

In the problem types mentioned above, we are dealing with relatively "pure" forms of RK, KMP
and the techniques that are the essence of these algorithms. While you're unlikely to encounter
these pure situations in a topcoder SRM, the drive towards ever more challenging topcoder
problems can lead to situations where these algorithms appear as one level in complex,
"multilayer" problems. The specific input size limitations favor this trend, since we will not be
presented as input with multimillion-character strings, but rather with a "generator", which may
be by itself algorithmic in nature. A good example is "InfiniteSoup," Division 1 - Level Three,
SRM 286.

Geeksforgeeks.org

Searching for Patterns | Set 2 (KMP Algorithm)

Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[])
that prints all occurrences of pat[] in txt[]. You may assume that n > m.

Examples:
1) Input:

txt[] = "THIS IS A TEST TEXT"


pat[] = "TEST"

Output:

Pattern found at index 10

2) Input:

txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"

Output:

Pattern found at index 0


Pattern found at index 9
Pattern found at index 13

Pattern searching is an important problem in computer science. When we do search for a string
in notepad/word file or browser or database, pattern searching algorithms are used to show the
search results.

We have discussed Naive pattern searching algorithm in the previous post. The worst case
complexity of Naive algorithm is O(m(n-m+1)). Time complexity of KMP algorithm is O(n) in
worst case.

KMP (Knuth Morris Pratt) Pattern Searching


The Naive pattern searching algorithm doesn't work well in cases where we see many matching
characters followed by a mismatching character. Following are some examples.

txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"

txt[] = "ABABABCABABABCABABABC"

pat[] = "ABABAC" (not a worst case, but a bad case for Naive)

The KMP matching algorithm uses the degenerating property of the pattern (the same
sub-patterns appearing more than once in the pattern) and improves the worst case complexity
to O(n). The basic idea behind KMP's algorithm is: whenever we detect a mismatch (after some
matches), we already know some of the characters in the text (since they matched the pattern
characters prior to the mismatch). We take advantage of this information to avoid re-matching
the characters that we know will match anyway.
KMP does some preprocessing over the pattern pat[] and constructs an auxiliary array
lps[] of size m (same as the size of the pattern). Here the name lps stands for "longest proper
prefix which is also a suffix". For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores the
length of the longest proper prefix which is also a suffix of the sub-pattern pat[0..i].

lps[i] = the longest proper prefix of pat[0..i]
         which is also a suffix of pat[0..i].

Examples:
For the pattern AABAACAABAA, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
For the pattern ABCDE, lps[] is [0, 0, 0, 0, 0]
For the pattern AAAAA, lps[] is [0, 1, 2, 3, 4]
For the pattern AAABAAA, lps[] is [0, 1, 2, 0, 1, 2, 3]
For the pattern AAACAAAAAC, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]

Searching Algorithm:
Unlike the Naive algorithm, where we slide the pattern by one, we use a value from lps[] to
decide the next sliding position. Let us see how we do that. When we compare pat[j] with txt[i]
and see a mismatch, we know that the characters pat[0..j-1] match txt[i-j..i-1], and we also know
that the first lps[j-1] characters of pat[0..j-1] are both a proper prefix and a suffix of it, which
means we do not need to match these lps[j-1] characters against the text again because we know
they will match anyway. See KMPSearch() in the below code for details.

Preprocessing Algorithm:
In the preprocessing part, we calculate values in lps[]. To do that, we keep track of the length of
the longest prefix suffix value (we use len variable for this purpose) for the previous index. We
initialize lps[0] and len as 0. If pat[len] and pat[i] match, we increment len by 1 and assign the
incremented value to lps[i]. If pat[i] and pat[len] do not match and len is not 0, we update len to
lps[len-1]. See computeLPSArray () in the below code for details.

#include<stdio.h>
#include<string.h>
#include<stdlib.h>

void computeLPSArray(char *pat, int M, int *lps);

void KMPSearch(char *pat, char *txt)
{
    int M = strlen(pat);
    int N = strlen(txt);

    // create lps[] that will hold the longest prefix suffix values for pattern
    int *lps = (int *)malloc(sizeof(int)*M);
    int j = 0;  // index for pat[]

    // Preprocess the pattern (calculate lps[] array)
    computeLPSArray(pat, M, lps);

    int i = 0;  // index for txt[]
    while(i < N)
    {
        if(pat[j] == txt[i])
        {
            j++;
            i++;
        }

        if (j == M)
        {
            printf("Found pattern at index %d \n", i-j);
            j = lps[j-1];
        }

        // mismatch after j matches
        else if(pat[j] != txt[i])
        {
            // Do not re-match the first lps[j-1] characters,
            // they will match anyway
            if(j != 0)
                j = lps[j-1];
            else
                i = i+1;
        }
    }
    free(lps);  // to avoid memory leak
}

void computeLPSArray(char *pat, int M, int *lps)
{
    int len = 0;  // length of the previous longest prefix suffix
    int i;

    lps[0] = 0;  // lps[0] is always 0
    i = 1;

    // the loop calculates lps[i] for i = 1 to M-1
    while(i < M)
    {
        if(pat[i] == pat[len])
        {
            len++;
            lps[i] = len;
            i++;
        }
        else  // (pat[i] != pat[len])
        {
            if( len != 0 )
            {
                // This is tricky. Consider the example AAACAAAA and i = 7.
                len = lps[len-1];

                // Also, note that we do not increment i here
            }
            else  // if (len == 0)
            {
                lps[i] = 0;
                i++;
            }
        }
    }
}

// Driver program to test above function
int main()
{
    char *txt = "ABABDABACDABABCABAB";
    char *pat = "ABABCABAB";
    KMPSearch(pat, txt);
    return 0;
}

Rabin-Karp Algorithm

geeksforgeeks.org

Searching for Patterns | Set 3 (Rabin-Karp Algorithm)

Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[])
that prints all occurrences of pat[] in txt[]. You may assume that n > m.

Examples:
1) Input:

txt[] = "THIS IS A TEST TEXT"


pat[] = "TEST"

Output:

Pattern found at index 10

2) Input:

txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"

Output:

Pattern found at index 0


Pattern found at index 9
Pattern found at index 13

The Naive string matching algorithm slides the pattern over the text one position at a time and,
after each slide, checks the characters at the current shift one by one, printing a match if all
characters match. Like the Naive algorithm, the Rabin-Karp algorithm also slides the pattern one
by one. But unlike the Naive algorithm, Rabin-Karp matches the hash value of the pattern with
the hash value of the current substring of the text, and only if the hash values match does it start
matching individual characters. So the Rabin-Karp algorithm needs to calculate hash values for
the following strings.

1) Pattern itself.
2) All the substrings of text of length m.

Since we need to efficiently calculate hash values for all substrings of size m of the text, we must
have a hash function with the following property: the hash at the next shift must be efficiently
computable from the current hash value and the next character in the text. That is,
hash(txt[s+1 .. s+m]) must be efficiently computable from hash(txt[s .. s+m-1]) and txt[s+m],
i.e., hash(txt[s+1 .. s+m]) = rehash(txt[s+m], hash(txt[s .. s+m-1])), and rehash must be an O(1)
operation.

The hash function suggested by Rabin and Karp calculates an integer value: the numeric value of
a string. For example, if all possible characters are digits from 1 to 10, the numeric value of
"122" will be 122. The number of possible characters is usually higher than 10 (256 in general)
and the pattern length can be large, so the numeric values cannot practically be stored as an
integer. Therefore, the numeric value is calculated using modular arithmetic to make sure that
the hash values fit in an integer variable (machine word). To do rehashing, we need to take off
the most significant digit and add the new least significant digit to the hash value. Rehashing is
done using the following formula.

hash( txt[s+1 .. s+m] ) = ( d * ( hash( txt[s .. s+m-1] ) - txt[s]*h ) + txt[s+m] ) mod q

hash( txt[s .. s+m-1] ) : Hash value at shift s.
hash( txt[s+1 .. s+m] ) : Hash value at next shift (or shift s+1)
d : Number of characters in the alphabet
q : A prime number
h : d^(m-1)
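Before the full implementation below, the rehash identity can be sanity-checked in isolation. The following sketch (our own; the helper names directHash and rollingHashConsistent are assumptions, not from the article) slides a window of size m over a string, applies the O(1) rehash step at each shift, and verifies it against a direct recomputation by Horner's rule.

```cpp
#include <string>
using namespace std;

// Directly computes hash(s[from .. from+m-1]) with Horner's rule mod q.
static int directHash(const string& s, int from, int m, int d, int q) {
    int h = 0;
    for (int k = 0; k < m; k++)
        h = (d*h + s[from + k]) % q;
    return h;
}

// Slides a window of size m over txt, applying the O(1) rehash step at
// each shift, and reports whether it always agrees with direct hashing.
bool rollingHashConsistent(const string& txt, int m, int d, int q) {
    int h = 1;
    for (int k = 0; k < m - 1; k++)
        h = (h*d) % q;                            // h = d^(m-1) mod q

    int t = directHash(txt, 0, m, d, q);          // hash of the first window
    for (int s = 0; s + m < (int)txt.size(); s++) {
        t = (d*(t - txt[s]*h) + txt[s + m]) % q;  // drop leading, add trailing
        if (t < 0)
            t += q;                               // keep the value non-negative
        if (t != directHash(txt, s + 1, m, d, q))
            return false;
    }
    return true;
}
```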

/* Following program is a C implementation of the Rabin Karp Algorithm
   given in the CLRS book */
#include<stdio.h>
#include<string.h>

// d is the number of characters in input alphabet
#define d 256

/* pat -> pattern
   txt -> text
   q   -> a prime number
*/
void search(char *pat, char *txt, int q)
{
    int M = strlen(pat);
    int N = strlen(txt);
    int i, j;
    int p = 0;  // hash value for pattern
    int t = 0;  // hash value for txt
    int h = 1;

    // The value of h would be "pow(d, M-1)%q"
    for (i = 0; i < M-1; i++)
        h = (h*d)%q;

    // Calculate the hash value of pattern and first window of text
    for (i = 0; i < M; i++)
    {
        p = (d*p + pat[i])%q;
        t = (d*t + txt[i])%q;
    }

    // Slide the pattern over text one by one
    for (i = 0; i <= N - M; i++)
    {
        // Check the hash values of current window of text and pattern.
        // If the hash values match then only check for characters one by one
        if ( p == t )
        {
            /* Check for characters one by one */
            for (j = 0; j < M; j++)
            {
                if (txt[i+j] != pat[j])
                    break;
            }
            if (j == M)  // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]
            {
                printf("Pattern found at index %d \n", i);
            }
        }

        // Calculate hash value for next window of text: Remove leading digit,
        // add trailing digit
        if ( i < N-M )
        {
            t = (d*(t - txt[i]*h) + txt[i+M])%q;

            // We might get negative value of t, converting it to positive
            if (t < 0)
                t = (t + q);
        }
    }
}

/* Driver program to test above function */
int main()
{
    char *txt = "GEEKS FOR GEEKS";
    char *pat = "GEEK";
    int q = 101;  // A prime number
    search(pat, txt, q);
    getchar();
    return 0;
}

The average and best case running time of the Rabin-Karp algorithm is O(n+m), but its
worst-case time is O(nm). The worst case of Rabin-Karp occurs when all characters of the
pattern and the text are the same, as the hash values of all the substrings of txt[] then match the
hash value of pat[]. For example, pat[] = "AAA" and txt[] = "AAAAAAA".

Tries

Geeksforgeeks.org

Trie | (Insert and Search)

Trie is an efficient information retrieval data structure. Using a trie, search complexity can be
brought to the optimal limit (key length). If we store keys in a binary search tree, a well balanced
BST will need time proportional to M * log N, where M is the maximum string length and N is
the number of keys in the tree. Using a trie, we can search for the key in O(M) time. The penalty,
however, is on the trie's storage requirements.

Every node of trie consists of multiple branches. Each branch represents a possible character of
keys. We need to mark the last node of every key as leaf node. A trie node field value will be
used to distinguish the node as leaf node (there are other uses of the value field). A simple
structure to represent nodes of the English alphabet can be as follows:

typedef struct trie_node trie_node_t;

struct trie_node
{
    int value; /* Used to mark leaf nodes */
    trie_node_t *children[ALPHABET_SIZE];
};

Inserting a key into a trie is a simple approach. Every character of the input key is inserted as an
individual trie node. Note that children is an array of pointers to next-level trie nodes. The key
character acts as an index into the array children. If the input key is new or an extension of an
existing key, we need to construct the non-existing nodes of the key and mark the leaf node. If
the input key is a prefix of an existing key in the trie, we simply mark the last node of the key as
a leaf. The key length determines the trie depth.

Searching for a key is similar to the insert operation; however, we only compare the characters
and move down. The search can terminate due to the end of the string or the lack of the key in
the trie. In the former case, if the value field of the last node is non-zero then the key exists in the
trie. In the second case, the search terminates without examining all the characters of the key,
since the key is not present in the trie.

The following picture explains construction of trie using keys given in the example below,

            root
          /  |   \
         t   a    b
         |   |    |
         h   n    y
         |   | \  |
         e   s  y e
        / \  |
       i   r w
       |   | |
       r   e e
             |
             r

In the picture, every node is of type trie_node_t. For example, the root is of type trie_node_t, and
its children for 'a', 'b' and 't' are filled; all other children of the root are NULL. Similarly, 'a' at
the next level has only one child ('n'); all its other children are NULL. The leaf nodes mark the
ends of complete keys.

Insert and search cost O(key_length); however, the memory requirement of a trie is
O(ALPHABET_SIZE * key_length * N), where N is the number of keys in the trie. There are
efficient representations of trie nodes (e.g. compressed trie, ternary search tree, etc.) to minimize
the memory requirements of a trie.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a)/sizeof(a[0]))

// Alphabet size (# of symbols)
#define ALPHABET_SIZE (26)

// Converts key current character into index
// use only 'a' through 'z' and lower case
#define CHAR_TO_INDEX(c) ((int)c - (int)'a')

// trie node
typedef struct trie_node trie_node_t;

struct trie_node
{
    int value;
    trie_node_t *children[ALPHABET_SIZE];
};

// trie ADT
typedef struct trie trie_t;

struct trie
{
    trie_node_t *root;
    int count;
};

// Returns new trie node (initialized to NULLs)
trie_node_t *getNode(void)
{
    trie_node_t *pNode = NULL;

    pNode = (trie_node_t *)malloc(sizeof(trie_node_t));

    if( pNode )
    {
        int i;

        pNode->value = 0;

        for(i = 0; i < ALPHABET_SIZE; i++)
            pNode->children[i] = NULL;
    }

    return pNode;
}

// Initializes trie (root is dummy node)
void initialize(trie_t *pTrie)
{
    pTrie->root = getNode();
    pTrie->count = 0;
}

// If not present, inserts key into trie
// If the key is prefix of trie node, just marks leaf node
void insert(trie_t *pTrie, char key[])
{
    int level;
    int length = strlen(key);
    int index;
    trie_node_t *pCrawl;

    pTrie->count++;
    pCrawl = pTrie->root;

    for( level = 0; level < length; level++ )
    {
        index = CHAR_TO_INDEX(key[level]);
        if( !pCrawl->children[index] )
            pCrawl->children[index] = getNode();

        pCrawl = pCrawl->children[index];
    }

    // mark last node as leaf
    pCrawl->value = pTrie->count;
}

// Returns non zero, if key presents in trie
int search(trie_t *pTrie, char key[])
{
    int level;
    int length = strlen(key);
    int index;
    trie_node_t *pCrawl;

    pCrawl = pTrie->root;

    for( level = 0; level < length; level++ )
    {
        index = CHAR_TO_INDEX(key[level]);

        if( !pCrawl->children[index] )
            return 0;

        pCrawl = pCrawl->children[index];
    }

    return (0 != pCrawl && pCrawl->value);
}

// Driver
int main()
{
    // Input keys (use only 'a' through 'z' and lower case)
    char keys[][8] = {"the", "a", "there", "answer", "any", "by", "bye",
                      "their"};
    trie_t trie;

    char output[][32] = {"Not present in trie", "Present in trie"};

    initialize(&trie);

    // Construct trie
    for(int i = 0; i < ARRAY_SIZE(keys); i++)
        insert(&trie, keys[i]);

    // Search for different keys
    printf("%s --- %s\n", "the", output[search(&trie, "the")] );
    printf("%s --- %s\n", "these", output[search(&trie, "these")] );
    printf("%s --- %s\n", "their", output[search(&trie, "their")] );
    printf("%s --- %s\n", "thaw", output[search(&trie, "thaw")] );

    return 0;
}

Topcoder.com

By luison9999
topcoder member

Introduction

There are many algorithms and data structures to index and search strings inside a text, some of
them are included in the standard libraries, but not all of them; the trie data structure is a good
example of one that isn't.

Let word be a single string and let dictionary be a large set of words. If we have a dictionary,
and we need to know if a single word is inside the dictionary, tries are a data structure that can
help us. But you may be asking yourself, "Why use tries if set<string> and hash tables can do the
same?" There are two main reasons:

Tries can insert and find strings in O(L) time (where L represents the length of a single word).
This is much faster than set<string>, and a bit faster than a hash table.
set<string> and hash tables can only find words in the dictionary that match exactly the single
word that we are looking for; a trie allows us to find words that have a single different character,
a common prefix, a missing character, etc.

Tries can be useful in TopCoder problems, but they also have a great number of applications in
software engineering. For example, consider a web browser. Do you know how the web browser
can autocomplete your text or show you many possibilities of the text that you could be writing?
Yes, with a trie you can do it very fast. Do you know how a spelling checker can verify that
every word that you type is in a dictionary? Again, a trie. You can also use a trie for suggested
corrections of the words that are present in the text but not in the dictionary.

What is a Trie?

You may read about how wonderful tries are, but maybe you don't know yet what tries are and
why they have this name. The word trie is an infix of the word "retrieval", because the trie can
find a single word in a dictionary with only a prefix of the word. The main idea of the trie data
structure consists of the following parts:

The trie is a tree where each vertex represents a single word or a prefix.

The root represents the empty string (""), the vertices that are direct sons of the root represent
prefixes of length 1, the vertices that are 2 edges of distance from the root represent prefixes
of length 2, the vertices that are 3 edges of distance from the root represent prefixes of length
3, and so on. In other words, a vertex that is k edges of distance from the root has an associated
prefix of length k.
Let v and w be two vertices of the trie, and assume that v is a direct father of w; then the prefix
associated with v must be a prefix of the one associated with w.

The next figure shows a trie with the words "tree", "trie", "algo", "assoc", "all", and "also."

Note that the vertices of the tree do not store entire prefixes or entire words. The idea is that the
program should remember the word that each vertex represents while moving down the tree.

Coding a Trie

Tries can be implemented in many ways; some of them can be used to find a set of words in the
dictionary where every word can be a little different from the target word, and other
implementations can provide us only with words that match the target word exactly. The
implementation of the trie that will be exposed here will consist only of finding words that
match exactly and counting the words that have some prefix. This implementation will be in
pseudocode because different coders use different programming languages.

We will code these functions:

addWord. This function will add a single string word to the dictionary.
countPrefixes. This function will count the number of words in the dictionary that have a
string prefix as a prefix.
countWords. This function will count the number of words in the dictionary that match exactly
with a given string word.

Our trie will only support letters of the English alphabet.

We need to also code a structure with some fields that indicate the values stored in each vertex.
As we want to know the number of words that match with a given string, every vertex should
have a field to indicate that this vertex represents a complete word or only a prefix (for
simplicity, a complete word is considered also a prefix) and how many words in the dictionary
are represented by that prefix (there can be repeated words in the dictionary). This task can be
done with only one integer field words.

Because we want to know the number of words that have a given string as a prefix, we need
another integer field prefixes that indicates how many words have the prefix of the vertex. Also,
each vertex must have references to all its possible sons (26 references). Knowing all these
details, our structure should have the following members:

structure Trie
    integer words;
    integer prefixes;
    reference edges[26];

And we also need the following functions:

initialize(vertex)
addWord(vertex, word);
integer countPrefixes(vertex, prefix);
integer countWords(vertex, word);

First of all, we have to initialize the vertexes with the following function:

initialize(vertex)
    vertex.words = 0
    vertex.prefixes = 0
    for i = 0 to 25
        vertex.edges[i] = NoEdge

The addWord function has two parameters: the vertex of the insertion and the word that
will be added. The idea is that when a string word is added to a vertex vertex, we add word to
the corresponding branch of vertex, cutting the leftmost character of word. If the needed branch
does not exist, we will have to create it. All the TopCoder languages can simulate the process of
cutting a character in constant time instead of creating a copy of the original string or moving
the other characters.

addWord(vertex, word)
    if isEmpty(word)
        vertex.words = vertex.words + 1
    else
        vertex.prefixes = vertex.prefixes + 1
        k = firstCharacter(word)
        if notExists(edges[k])
            edges[k] = createEdge()
            initialize(edges[k])
        cutLeftmostCharacter(word)
        addWord(edges[k], word)

The functions countWords and countPrefixes are very similar. If we are finding an empty string
we only have to return the number of words/prefixes associated with the vertex. If we are finding
a non-empty string, we should look in the corresponding branch of the tree, but if the branch
does not exist, we have to return 0.

countWords(vertex, word)
    if isEmpty(word)
        return vertex.words
    k = firstCharacter(word)
    if notExists(edges[k])
        return 0
    cutLeftmostCharacter(word)
    return countWords(edges[k], word)

countPrefixes(vertex, prefix)
    if isEmpty(prefix)
        return vertex.prefixes
    k = firstCharacter(prefix)
    if notExists(edges[k])
        return 0
    cutLeftmostCharacter(prefix)
    return countPrefixes(edges[k], prefix)

Complexity Analysis

In the introduction you read that the complexity of finding and inserting in a trie is linear, but we
have not done the analysis yet. In both insertion and finding, notice that going down a single
level in the tree is done in constant time, and every time the program goes down a level, a single
character is cut from the string. We can conclude that every function goes down at most L levels,
each in constant time, so the insertion and finding of a word in a trie can be done in O(L) time.
The memory used by a trie depends on the method used to store the edges and on how many
words share prefixes.
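The pseudocode above translates almost mechanically into C++. The following compact sketch is our own rendering (not from the article): it keeps the words and prefixes counters in each vertex and simulates "cutting the leftmost character" by advancing an index instead of copying the string, which is the constant-time trick mentioned earlier.

```cpp
#include <string>
using namespace std;

// Each vertex keeps the "words"/"prefixes" counters from the pseudocode.
struct Trie {
    int words, prefixes;
    Trie* edges[26];

    Trie() : words(0), prefixes(0) {
        for (int i = 0; i < 26; i++) edges[i] = 0;  // NoEdge
    }
    // "Cutting the leftmost character" is simulated by advancing pos.
    void addWord(const string& w, int pos = 0) {
        if (pos == (int)w.size()) { words++; return; }
        prefixes++;
        int k = w[pos] - 'a';
        if (!edges[k]) edges[k] = new Trie();
        edges[k]->addWord(w, pos + 1);
    }
    int countWords(const string& w, int pos = 0) const {
        if (pos == (int)w.size()) return words;
        int k = w[pos] - 'a';
        return edges[k] ? edges[k]->countWords(w, pos + 1) : 0;
    }
    int countPrefixes(const string& p, int pos = 0) const {
        if (pos == (int)p.size()) return prefixes;
        int k = p[pos] - 'a';
        return edges[k] ? edges[k]->countPrefixes(p, pos + 1) : 0;
    }
};
```

As in the pseudocode, repeated words are supported: inserting "tree" twice makes countWords("tree") return 2.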

Other Kinds of Tries

We used tries to store words with lowercase letters, but tries can be used to store many other
things. We can use bits or bytes instead of lowercase letters, and every data type can be stored in
the tree, not only strings. Let your imagination flow using tries! For example, suppose that you
want to find a word in a dictionary, but a single letter was deleted from the word. You can
modify the countWords function:

countWords(vertex, word, missingLetters)
    if isEmpty(word)
        return vertex.words
    k = firstCharacter(word)
    if notExists(edges[k]) and missingLetters = 0
        return 0
    else if notExists(edges[k])
        // Here we cut a character but we don't go lower in the tree
        cutLeftmostCharacter(word)
        return countWords(vertex, word, missingLetters - 1)
    else
        // We add the two possibilities: the first character
        // was deleted, plus the first character is present
        r = countWords(vertex, word, missingLetters - 1)
        cutLeftmostCharacter(word)
        r = r + countWords(edges[k], word, missingLetters)
        return r

The complexity of this function may be larger than the original, but it is faster than checking all
the subsets of characters of a word.

Depth First Traversal of a graph

Depth First Traversal of a graph (Search)

Geeksforgeeks.org

Depth First Traversal for a graph is similar to Depth First Traversal of a tree. The only catch here
is that, unlike trees, graphs may contain cycles, so we may come to the same node again. To
avoid processing a node more than once, we use a boolean visited array.
For example, in the following graph, we start traversal from vertex 2. When we come to vertex 0,
we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don't mark visited
vertices, then 2 will be processed again and it will become a non-terminating process. Depth
First Traversal of the following graph is 2, 0, 1, 3.

See this post for all applications of Depth First Traversal.


Following is a C++ implementation of simple Depth First Traversal. The implementation uses
the adjacency list representation of graphs. The STL's list container is used to store lists of
adjacent nodes.

#include<iostream>
#include <list>

using namespace std;

// Graph class represents a directed graph using adjacency list representation
class Graph
{
int V; // No. of vertices
list<int> *adj; // Pointer to an array containing adjacency lists
void DFSUtil(int v, bool visited[]); // A function used by DFS
public:
Graph(int V); // Constructor
void addEdge(int v, int w); // function to add an edge to graph
void DFS(int v); // DFS traversal of the vertices reachable from v
};

Graph::Graph(int V)
{
this->V = V;
adj = new list<int>[V];
}

void Graph::addEdge(int v, int w)


{
adj[v].push_back(w); // Add w to v's list.
}

void Graph::DFSUtil(int v, bool visited[])


{
// Mark the current node as visited and print it
visited[v] = true;
cout << v << " ";

// Recur for all the vertices adjacent to this vertex


list<int>::iterator i;
for(i = adj[v].begin(); i != adj[v].end(); ++i)
if(!visited[*i])
DFSUtil(*i, visited);
}

// DFS traversal of the vertices reachable from v. It uses recursive DFSUtil()
void Graph::DFS(int v)
{
// Mark all the vertices as not visited
bool *visited = new bool[V];
for(int i = 0; i < V; i++)
visited[i] = false;

// Call the recursive helper function to print DFS traversal


DFSUtil(v, visited);
}

int main()
{
// Create a graph given in the above diagram
Graph g(4);
g.addEdge(0, 1);
g.addEdge(0, 2);
g.addEdge(1, 2);
g.addEdge(2, 0);
g.addEdge(2, 3);
g.addEdge(3, 3);

cout << "Following is Depth First Traversal (starting from vertex 2) \n";
g.DFS(2);

return 0;
}

Output:

Following is Depth First Traversal (starting from vertex 2)


2 0 1 3

Note that the above code traverses only the vertices reachable from a given source vertex. All the
vertices may not be reachable from a given vertex (for example, in a disconnected graph). To do
a complete DFS traversal of such graphs, we must call DFSUtil() for every vertex. Also, before
calling DFSUtil(), we should check if it has already been printed by some other call of
DFSUtil(). The following implementation does the complete graph traversal even if some nodes
are unreachable. The difference from the above code is in DFS(), which now starts a traversal
from every still-unvisited vertex.

#include<iostream>
#include <list>
using namespace std;

class Graph
{
int V; // No. of vertices
list<int> *adj; // Pointer to an array containing adjacency lists
void DFSUtil(int v, bool visited[]); // A function used by DFS
public:
Graph(int V); // Constructor
void addEdge(int v, int w); // function to add an edge to graph
void DFS(); // prints DFS traversal of the complete graph
};

Graph::Graph(int V)
{
this->V = V;
adj = new list<int>[V];
}

void Graph::addEdge(int v, int w)


{
adj[v].push_back(w); // Add w to v's list.
}

void Graph::DFSUtil(int v, bool visited[])


{
// Mark the current node as visited and print it
visited[v] = true;
cout << v << " ";

// Recur for all the vertices adjacent to this vertex


list<int>::iterator i;
for(i = adj[v].begin(); i != adj[v].end(); ++i)
if(!visited[*i])
DFSUtil(*i, visited);
}

// The function to do DFS traversal. It uses recursive DFSUtil()


void Graph::DFS()
{
// Mark all the vertices as not visited
bool *visited = new bool[V];
for(int i = 0; i < V; i++)
visited[i] = false;

// Call the recursive helper function to print DFS traversal


// starting from all vertices one by one
for(int i = 0; i < V; i++)
if(visited[i] == false)
DFSUtil(i, visited);
}

int main()
{
// Create a graph given in the above diagram
Graph g(4);
g.addEdge(0, 1);
g.addEdge(0, 2);
g.addEdge(1, 2);
g.addEdge(2, 0);
g.addEdge(2, 3);
g.addEdge(3, 3);

cout << "Following is Depth First Traversal \n";
g.DFS();

return 0;
}

Time Complexity: O(V+E) where V is number of vertices in the graph and E is number of edges
in the graph.

Breadth First Traversal of a graph



Geeksforgeeks.org

Breadth First Traversal for a graph is similar to Breadth First Traversal of a tree (see method 2 of
this post). The only catch here is that, unlike trees, graphs may contain cycles, so we may come
to the same node again. To avoid processing a node more than once, we use a boolean visited
array. For simplicity, it is assumed that all vertices are reachable from the starting vertex.
For example, in the following graph, we start traversal from vertex 2. When we come to vertex 0,
we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don't mark visited
vertices, then 2 will be processed again and it will become a non-terminating process. Breadth
First Traversal of the following graph is 2, 0, 3, 1.

Following is a C++ implementation of simple Breadth First Traversal from a given source. The
implementation uses the adjacency list representation of graphs. The STL's list container is used
to store lists of adjacent nodes and the queue of nodes needed for BFS traversal.

// Program to print BFS traversal from a given source vertex. BFS(int s)


// traverses vertices reachable from s.
#include<iostream>
#include <list>

using namespace std;

// This class represents a directed graph using adjacency list representation


class Graph
{
int V; // No. of vertices
list<int> *adj; // Pointer to an array containing adjacency lists
public:
Graph(int V); // Constructor
void addEdge(int v, int w); // function to add an edge to graph
void BFS(int s); // prints BFS traversal from a given source s
};

Graph::Graph(int V)
{
this->V = V;
adj = new list<int>[V];
}

void Graph::addEdge(int v, int w)


{
adj[v].push_back(w); // Add w to v's list.
}

void Graph::BFS(int s)
{
// Mark all the vertices as not visited
bool *visited = new bool[V];
for(int i = 0; i < V; i++)
visited[i] = false;

// Create a queue for BFS


list<int> queue;

// Mark the current node as visited and enqueue it


visited[s] = true;
queue.push_back(s);

// 'i' will be used to get all adjacent vertices of a vertex


list<int>::iterator i;

while(!queue.empty())
{
// Dequeue a vertex from queue and print it
s = queue.front();
cout << s << " ";
queue.pop_front();

// Get all adjacent vertices of the dequeued vertex s


// If an adjacent vertex has not been visited, then mark it visited
// and enqueue it
for(i = adj[s].begin(); i != adj[s].end(); ++i)
{
if(!visited[*i])
{
visited[*i] = true;
queue.push_back(*i);
}
}
}
}

// Driver program to test methods of graph class


int main()
{
// Create a graph given in the above diagram
Graph g(4);
g.addEdge(0, 1);
g.addEdge(0, 2);
g.addEdge(1, 2);
g.addEdge(2, 0);
g.addEdge(2, 3);
g.addEdge(3, 3);

cout << "Following is Breadth First Traversal (starting from vertex 2) \n";
g.BFS(2);

return 0;
}

Output:

Following is Breadth First Traversal (starting from vertex 2)


2 0 3 1

Note that the above code traverses only the vertices reachable from a given source vertex. All the
vertices may not be reachable from a given vertex (for example, in a disconnected graph). To
print all the vertices, we can modify the BFS function to do the traversal starting from all nodes
one by one (like the modified DFS version).
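That modification can be sketched as follows. This is our own standalone function rather than a method of the Graph class above (the name BFSAll is an assumption): it restarts BFS from every still-unvisited vertex, mirroring the modified DFS, and collects the visit order.

```cpp
#include <list>
#include <vector>
using namespace std;

// Returns the BFS visit order over ALL vertices, restarting the search
// from each vertex that is still unvisited (handles disconnected graphs).
vector<int> BFSAll(int V, const vector< list<int> >& adj) {
    vector<bool> visited(V, false);
    vector<int> order;
    for (int s = 0; s < V; s++) {
        if (visited[s])
            continue;               // this component was already traversed
        visited[s] = true;
        list<int> queue;
        queue.push_back(s);
        while (!queue.empty()) {
            int u = queue.front();  // dequeue a vertex and record it
            queue.pop_front();
            order.push_back(u);
            for (list<int>::const_iterator it = adj[u].begin();
                 it != adj[u].end(); ++it)
                if (!visited[*it]) {
                    visited[*it] = true;
                    queue.push_back(*it);
                }
        }
    }
    return order;
}
```

On the example graph above plus one isolated vertex 4, the first pass covers 0..3 and a second pass picks up 4.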

Time Complexity: O(V+E) where V is number of vertices in the graph and E is number of edges
in the graph.

Also see Depth First Traversal

Dijkstra's Algorithm

Topcoder.com

Introduction to graphs and their data structures: Section 3

By gladius
topcoder member

Read Section 2

Finding the best path through a graph


An extremely common problem on TopCoder is to find the shortest path from one position to
another. There are a few different ways for going about this, each of which has different uses.
We will be discussing two different methods, Dijkstra using a Heap and the Floyd Warshall
method.

Dijkstra (Heap method)


Dijkstra using a Heap is one of the most powerful techniques to add to your TopCoder arsenal. It
essentially allows you to write a Breadth First search, and instead of using a Queue you use a
Priority Queue and define a sorting function on the nodes such that the node with the lowest cost
is at the top of the Priority Queue. This allows us to find the best path through a graph in O(m *
log(n)) time where n is the number of vertices and m is the number of edges in the graph.

Sidenote:
If you haven't seen big-O notation before then I recommend reading this.

First however, an introduction to the Priority Queue/Heap structure is in order. The Heap is a
fundamental data structure and is extremely useful for a variety of tasks. The property we are
most interested in though is that it is a semi-ordered data structure. What I mean by semi-ordered
is that we define some ordering on elements that are inserted into the structure, then the structure
keeps the smallest (or largest) element at the top. The Heap has the very nice property that
inserting an element or removing the top element takes O(log n) time, where n is the number of
elements in the heap. Simply getting the top value is an O(1) operation as well, so the Heap is
perfectly suited for our needs.

The fundamental operations on a Heap are:

1. Add - Inserts an element into the heap, putting the element into the correct ordered location.
2. Pop - Pops the top element from the heap, the top element will either be the highest or lowest
element, depending on implementation.
3. Top - Returns the top element on the heap.
4. Empty - Tests if the heap is empty or not.

Pretty close to the Queue or Stack, so it's only natural that we apply the same type of searching
principle that we have used before, except substitute the Heap in place of the Queue or Stack.
Our basic search routine (remember this one well!) will look something like this:

void dijkstra(node start) {


priorityQueue s;
s.add(start);
while (s.empty() == false) {
top = s.top();
s.pop();
mark top as visited;

check for termination condition (have we reached the target node?)


add all of top's unvisited neighbors to the priority queue.

}
}

Unfortunately, not all of the default language libraries used in TopCoder have an easy to use
priority queue structure.

C++ users are lucky to have an actual priority_queue<> structure in the STL, which is used as
follows:

#include <queue>
using namespace std;
priority_queue<type> pq;
1. Add - void pq.push(type)
2. Pop - void pq.pop()
3. Top - type pq.top()
4. Empty - bool pq.empty()

However, you have to be careful as the C++ priority_queue<> returns the *highest* element
first, not the lowest. This has been the cause of many solutions that should be O(m * log(n))
instead ballooning in complexity, or just not working.

To define the ordering on a type, there are a few different methods. The way I find most useful is
the following though:

Define your structure:


struct node {
int cost;
int at;
};

And we want to order by cost, so we define the less than operator for this structure as follows:

bool operator<(const node &leftNode, const node &rightNode) {


if (leftNode.cost != rightNode.cost) return leftNode.cost < rightNode.cost;
if (leftNode.at != rightNode.at) return leftNode.at < rightNode.at;
return false;
}

Even though we don't need to order by the 'at' member of the structure, we still do otherwise
elements with the same cost but different 'at' values may be coalesced into one value. The return
false at the end is to ensure that if two duplicate elements are compared the less than operator
will return false.

Java users unfortunately have to do a bit of makeshift work, as there is not a direct
implementation of the Heap structure. We can approximate it with the TreeSet structure which
will do full ordering of our dataset. It is less space efficient, but will serve our purposes fine.

import java.util.*;
TreeSet pq = new TreeSet();

1. Add - boolean add(Object o)


2. Pop - boolean remove(Object o)

In this case, we can remove anything we want, but pop should remove the first element, so we
will always call it like

this: pq.remove(pq.first());
3. Top - Object first()
4. Empty - int size()

To define the ordering we do something quite similar to what we use in C++:

class Node implements Comparable {


public int cost, at;

public int compareTo(Object o) {


Node right = (Node)o;
if (cost < right.cost) return -1;
if (cost > right.cost) return 1;
if (at < right.at) return -1;
if (at > right.at) return 1;
return 0;
}
}

C# users also have the same problem, so they need to approximate as well. Unfortunately, the
closest thing to what we want that is currently available is the SortedList class, and it does not
have the necessary speed (insertions and deletions are O(n) instead of O(log n)). There is no
suitable built-in class for implementing heap-based algorithms in C#, as the HashTable is not
suitable either.

Getting back to the actual algorithm now, the beautiful part is that it applies as well to graphs
with weighted edges as the Breadth First search does to graphs with un-weighted edges. So we
can now solve much more difficult problems (and more common on TopCoder) than is possible
with just the Breadth First search.

There are some extremely nice properties as well: since we are picking the node with the least
total cost so far to explore first, the first time we visit a node we have found the best path to that
node (unless there are negative weight edges in the graph). So we only have to visit each node
once, and the really nice part is that if we ever hit the target node, we know that we are done.

For the example here we will be using KiloManX, from SRM 181, the Div 1 1000. This is an
excellent example of the application of the Heap Dijkstra problem to what appears to be a
Dynamic Programming question initially. In this problem the edge weight between nodes
changes based on what weapons we have picked up. So in our node we at least need to keep
track of what weapons we have picked up, and the current amount of shots we have taken (which
will be our cost). The really nice part is that the weapons that we have picked up corresponds to
the bosses that we have defeated as well, so we can use that as a basis for our visited structure. If
we represent each weapon as a bit in an integer, we will have to store a maximum of 32,768
values (2^15, as there is a maximum of 15 weapons). So we can make our visited array simply
be an array of 32,768 booleans. Defining the ordering for our nodes is very easy in this case, we
want to explore nodes that have lower amounts of shots taken first, so given this information we
can define our basic structure to be as follows:

boolean visited[32768];

class node {
int weapons;
int shots;
// Define a comparator that puts nodes with less shots on top, appropriate to your language
};

Now we will apply the familiar structure to solve these types of problems.

int leastShots(String[] damageChart, int[] bossHealth) {


priorityQueue pq;

pq.push(node(0, 0));

while (pq.empty() == false) {


node top = pq.top();
pq.pop();

// Make sure we don't visit the same configuration twice


if (visited[top.weapons]) continue;
visited[top.weapons] = true;

// A quick trick to check if we have all the weapons, meaning we defeated all the bosses.
// We use the fact that (2^numWeapons - 1) will have all the numWeapons bits set to 1.
if (top.weapons == (1 << numWeapons) - 1)
return top.shots;

for (int i = 0; i < damageChart.length; i++) {


// Check if we've already visited this boss, then don't bother trying him again
if ((top.weapons >> i) & 1) continue;

// Now figure out the best amount of time in which we can destroy this boss, given the weapons we have.
// We initialize this value to the boss's health, as that is our default (with our KiloBuster).
int best = bossHealth[i];
for (int j = 0; j < damageChart.length; j++) {
if (i == j) continue;
if (((top.weapons >> j) & 1) && damageChart[j][i] != '0') {


// We have this weapon, so try using it to defeat this boss
int shotsNeeded = bossHealth[i] / (damageChart[j][i] - '0');
if (bossHealth[i] % (damageChart[j][i] - '0') != 0) shotsNeeded++;
best = min(best, shotsNeeded);
}
}

// Add the new node to be searched, showing that we defeated boss i, and we used 'best' shots to defeat him.
pq.add(node(top.weapons | (1 << i), top.shots + best));
}
}
}

There are a huge number of these types of problems on TopCoder; here are some excellent ones
to try out:

SRM 150 - Div 1 1000 - RoboCourier


SRM 194 - Div 1 1000 - IslandFerries
SRM 198 - Div 1 500 - DungeonEscape
TCCC '04 Round 4 - 500 - Bombman

Floyd-Warshall
Floyd-Warshall is a very powerful technique when the graph is represented by an adjacency
matrix. It runs in O(n^3) time, where n is the number of vertices in the graph. However, in
comparison to Dijkstra, which only gives us the shortest paths from one source to the targets,
Floyd-Warshall gives us the shortest paths from all sources to all target nodes. There are other
uses for Floyd-Warshall as well; it can be used to find connectivity in a graph (known as the
Transitive Closure of a graph).

First, however we will discuss the Floyd Warshall All-Pairs Shortest Path algorithm, which is the
most similar to Dijkstra. After running the algorithm on the adjacency matrix the element at
adj[i][j] represents the length of the shortest path from node i to node j. The pseudo-code for the
algorithm is given below:

for (k = 1 to n)
for (i = 1 to n)
for (j = 1 to n)
adj[i][j] = min(adj[i][j], adj[i][k] + adj[k][j]);

As you can see, this is extremely simple to remember and type. If the graph is small (less than
100 nodes) then this technique can be used to great effect for a quick submission.

An excellent problem to test this out on is the Division 2 1000 from SRM 184, TeamBuilder.

zobayer.blogspot.in

Dijkstra's algorithm is a single source shortest path (sssp) algorithm. Like BFS, this famous
graph searching algorithm is widely used in programming and problem solving, generally used to
determine shortest tour in a weighted graph.

This algorithm is almost similar to standard BFS, but instead of using a Queue data structure, it
uses a heap like data structure or a priority queue to maintain the weight order of nodes. Here is
the algorithm and pseudo-code. You can also look into Introduction to Algorithms (by C.L.R.S).

Here is a short implementation of this algorithm in C++. I assumed that all the edge-weights are
positive, and that the max possible distance is less than 2^20, which is set as the INF macro; the graph
supports MAX nodes numbered from 1 to MAX-1. These macros need to be set properly.

Input format:

N E // number of nodes and edges


E lines containing an edge (u, v, w) on each line
// edge(u, v, w) means weight of edge u-v is w
// u, v in range 1 to N
S // starting node

C++ code:

#include <cstdio>
#include <queue>
#include <vector>
using namespace std;

#define MAX 100001


#define INF (1<<20)
#define DEBUG if(0)
#define pii pair< int, int >
#define pb(x) push_back(x)

struct comp {
bool operator() (const pii &a, const pii &b) {
return a.second > b.second;
}
};

priority_queue< pii, vector< pii >, comp > Q;


vector< pii > G[MAX];
int D[MAX];
bool F[MAX];

int main() {
int i, u, v, w, sz, nodes, edges, starting;
DEBUG freopen("in.txt","r", stdin);

// create graph
scanf("%d %d", &nodes, &edges);
for(i=0; i<edges; i++) {
scanf("%d %d %d", &u, &v, &w);
G[u].pb(pii(v, w));
G[v].pb(pii(u, w)); // for undirected
}
scanf("%d", &starting);

// initialize graph
for(i=1; i<=nodes; i++) D[i] = INF;
D[starting] = 0;
Q.push(pii(starting, 0));

// dijkstra
while(!Q.empty()) {
u = Q.top().first;
Q.pop();
if(F[u]) continue;
sz = G[u].size();
DEBUG printf("visiting from %d:", u);
for(i=0; i<sz; i++) {
v = G[u][i].first;
w = G[u][i].second;
if(!F[v] && D[u]+w < D[v]) {
DEBUG printf(" %d,", v);
D[v] = D[u] + w;
Q.push(pii(v, D[v]));
}
}
DEBUG printf("\n");
F[u] = 1; // done with u
}

// result
for(i=1; i<=nodes; i++) printf("Node %d, min weight = %d\n", i, D[i]);
return 0;
}

This is a straightforward implementation; depending on the problem at hand, changes will need to
be made here and there.

Binary Indexed Tree

Topcoder.com

Introduction
We often need some sort of data structure to make our algorithms faster. In this article we will
discuss the Binary Indexed Trees structure. According to Peter M. Fenwick, this structure was
first used for data compression. Now it is often used for storing frequencies and manipulating
cumulative frequency tables.

Let's define the following problem: We have n boxes. Possible queries are
1. add marble to box i
2. sum marbles from box k to box l

The naive solution has time complexity of O(1) for query 1 and O(n) for query 2. Suppose we
make m queries. The worst case (when all queries are 2) has time complexity O(n * m). Using
some data structure (i.e. RMQ) we can solve this problem with the worst case time complexity of
O(m log n). Another approach is to use Binary Indexed Tree data structure, also with the worst
time complexity O(m log n) -- but Binary Indexed Trees are much easier to code, and require
less memory space, than RMQ.

Notation
BIT - Binary Indexed Tree
MaxVal - maximum value which will have non-zero frequency
f[i] - frequency of value with index i, i = 1 .. MaxVal
c[i] - cumulative frequency for index i (f[1] + f[2] + ... + f[i])
tree[i] - sum of frequencies stored in BIT with index i (what "index" means here will be described
later); sometimes we will write "tree frequency" instead of "sum of frequencies stored in BIT"
~num - complement of integer num (integer where each binary digit is inverted: 0 -> 1; 1 -> 0)
NOTE: Often we put f[0] = 0, c[0] = 0, tree[0] = 0, so sometimes I will just ignore index 0.

Basic idea
Each integer can be represented as sum of powers of two. In the same way, cumulative frequency
can be represented as sum of sets of subfrequencies. In our case, each set contains some
successive number of non-overlapping frequencies.

idx is some index of BIT. r is a position in idx of the last digit 1 (from left to right) in binary
notation. tree[idx] is sum of frequencies from index (idx - 2^r + 1) to index idx (look at the
Table 1.1 for clarification). We also write that idx is responsible for indexes from (idx - 2^r + 1)
to idx (note that responsibility is the key in our algorithm and is the way of manipulating the
tree).

      1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
f     1  0  2  1  1  3  0  4  2   5   2   2   3   1   0   2
c     1  1  3  4  5  8  8 12 14  19  21  23  26  27  27  29
tree  1  1  2  4  1  4  0 12  2   7   2  11   3   4   0  29

Table 1.1

      1  2     3  4     5  6     7  8     9  10     11  12     13  14      15  16
tree  1  1..2  3  1..4  5  5..6  7  1..8  9  9..10  11  9..12  13  13..14  15  1..16

Table 1.2 - table of responsibility



Image 1.3 - tree of responsibility for indexes (bar shows range of frequencies accumulated in top
element)

Image 1.4 - tree with tree frequencies

Suppose we are looking for cumulative frequency of index 13 (for the first 13 elements). In
binary notation, 13 is equal to 1101. Accordingly, we will calculate c[1101] = tree[1101] +
tree[1100] + tree[1000] (more about this later).

Isolating the last digit


NOTE: Instead of "the last non-zero digit," we will write only "the last digit."

There are times when we need to get just the last digit from a binary number, so we need an
efficient way to do that. Let num be the integer whose last digit we want to isolate. In binary
notation num can be represented as a1b, where a represents binary digits before the last digit
and b represents zeroes after the last digit.

Integer -num is equal to ~(a1b) + 1 = ~a0~b + 1. Since b consists of all zeroes, ~b consists of all
ones. Finally we have

-num = ~(a1b) + 1 = ~a0~b + 1 = ~a0(1...1) + 1 = ~a1(0...0) = ~a1b.

Now, we can easily isolate the last digit, using bitwise operator AND (in C++, Java it is &) with
num and -num:

   a1b
& ~a1b
--------------------
= (0...0)1(0...0)

Read cumulative frequency


If we want to read the cumulative frequency for some integer idx, we add tree[idx] to sum, subtract the
last bit of idx from idx (we can also say: remove the last digit; change the last digit to zero)
and repeat this while idx is greater than zero. We can use the following function (written in C++):

int read(int idx){


int sum = 0;
while (idx > 0){
sum += tree[idx];
idx -= (idx & -idx);
}
return sum;
}

Example for idx = 13; sum = 0:

iteration   idx         position of the last digit   idx & -idx   sum
1           13 = 1101   0                            1 (2^0)      3
2           12 = 1100   2                            4 (2^2)      14
3            8 = 1000   3                            8 (2^3)      26
4            0 = 0      ---                          ---          ---

Image 1.5 - arrows show path from index to zero which we use to get sum (image shows example
for index 13)

So, our result is 26. The number of iterations in this function is the number of bits in idx, which is at
most log MaxVal.

Time complexity: O(log MaxVal).


Code length: Up to ten lines.

Change frequency at some position and update tree


The concept is to update the tree frequency at all indexes which are responsible for the frequency whose
value we are changing. When reading the cumulative frequency at some index, we were removing the
last bit and going on. When changing some frequency by val in the tree, we should increment the value
at the current index (the starting index is always the one whose frequency is changed) by val, add the
last digit to the index and go on while the index is less than or equal to MaxVal. Function in C++:

void update(int idx ,int val){


while (idx <= MaxVal){
tree[idx] += val;
idx += (idx & -idx);
}
}

Let's show example for idx = 5:

iteration   idx            position of the last digit   idx & -idx
1            5 = 101       0                            1 (2^0)
2            6 = 110       1                            2 (2^1)
3            8 = 1000      3                            8 (2^3)
4           16 = 10000     4                            16 (2^4)
5           32 = 100000    ---                          ---

Image 1.6 - Updating tree (in brackets are tree frequencies before updating); arrows show path
while we update tree from index to MaxVal (image shows example for index 5)

Using algorithm from above or following arrows shown in Image 1.6 we can update BIT.

Time complexity: O(log MaxVal).


Code length: Up to ten lines.

Read the actual frequency at a position


We've described how we can read the cumulative frequency for an index. It is obvious that we cannot
just read tree[idx] to get the actual frequency for the value at index idx. One approach is to have
one additional array, in which we will separately store the frequencies for values. Both reading and
storing take O(1); memory space is linear. Sometimes it is more important to save memory, so
we will show how you can get the actual frequency for some value without using additional
structures.

Probably everyone can see that the actual frequency at a position idx can be calculated by calling
function read twice -- f[idx] = read(idx) - read(idx - 1) -- just by taking the difference of two
adjacent cumulative frequencies. This procedure always works in 2 * O(log n) time. If we write a
new function, we can get a slightly faster algorithm, with a smaller constant.

If two paths from two indexes to the root share a common part, then we can calculate the sum
until the paths meet, subtract the stored sums, and we get the sum of frequencies between those two
indexes. It is pretty simple to calculate the sum of frequencies between adjacent indexes, i.e. to read
the actual frequency at a given index.

Mark the given index with x, and its predecessor with y. We can represent (in binary notation) y as
a0(1...1), i.e. a0b where b consists of all ones. Then, x will be a1(0...0), i.e. a1b where b consists of
all zeroes. Using our algorithm for getting the sum of some index, let it be x: in the first iteration we
remove the last digit, so after the first iteration x will be a0(0...0); mark this new value with z.

Repeat the same process with y. Using our function for reading the sum we will remove the last
digits from the number (one by one). After several steps, our y (just to remind, it was a0(1...1))
will become a0(0...0), which is the same as z. Now, we can write our algorithm. Note that the only
exception is when x is equal to 0. Function in C++:

int readSingle(int idx){

int sum = tree[idx]; // sum will be decreased
if (idx > 0){ // special case
int z = idx - (idx & -idx); // make z first
idx--; // idx is not important any more, so instead of y, you can use idx
while (idx != z){ // at some iteration idx (y) will become z
sum -= tree[idx];
// subtract tree frequency which is between y and "the same path"
idx -= (idx & -idx);
}
}
return sum;
}

Here's an example for getting the actual frequency for index 12:

First, we will calculate z = 12 - (12 & -12) = 8, sum = 11

iteration   y           position of the last digit   y & -y    sum
1           11 = 1011   0                            1 (2^0)   9
2           10 = 1010   1                            2 (2^1)   2
3            8 = 1000   ---                          ---       ---

Image 1.7 - read actual frequency at some index in BIT


(image shows example for index 12)

Let's compare the algorithm for reading the actual frequency at some index when we use function
read twice, and the algorithm written above. Note that for each odd number, the algorithm above will
work in const time O(1), without any iteration. For almost every even number idx, it will work in c *
O(log idx), where c is strictly less than 1; compare this to read(idx) - read(idx - 1), which will work
in c1 * O(log idx), where c1 is always greater than 1.

Time complexity: c * O(log MaxVal), where c is less than 1.


Code length: Up to fifteen lines.

Scaling the entire tree by a constant factor


Sometimes we want to scale our tree by some factor. With the procedures described above it is
very simple. If we want to scale by some factor c, then each index idx should be updated by -(c -
1) * readSingle(idx) / c (because f[idx] - (c - 1) * f[idx] / c = f[idx] / c). Simple function in C++:

void scale(int c){


for (int i = 1 ; i <= MaxVal ; i++)
update(i , -(c - 1) * readSingle(i) / c);
}

This can also be done more quickly. Scaling is a linear operation. Each tree frequency is a linear
composition of some frequencies; if we scale each frequency by some factor, we also scale each tree
frequency by the same factor. Instead of the procedure above, which has time complexity
O(MaxVal * log MaxVal), we can achieve time complexity of O(MaxVal):

void scale(int c){


for (int i = 1 ; i <= MaxVal ; i++)
tree[i] = tree[i] / c;
}
Time complexity: O(MaxVal).
Code length: Just a few lines.

Find index with given cumulative frequency


The naive and most simple solution for finding an index with a given cumulative frequency is
simply iterating through all indexes, calculating the cumulative frequency, and checking if it's equal
to the given value. In case of negative frequencies it is the only solution. However, if we have
only non-negative frequencies in our tree (which means cumulative frequencies for greater indexes
are not smaller) we can use a logarithmic algorithm, which is a modification of binary search.
We go through all bits (starting with the highest one), build the index, compare the cumulative
frequency of the current index with the given value and, according to the outcome, take the lower or
higher half of the interval (just like in binary search). Function in C++:

// if in tree exists more than one index with the same
// cumulative frequency, this procedure will return
// one of them (we do not know which one)

// bitMask - initially, it is the greatest bit of MaxVal
// bitMask stores the interval which should be searched
int find(int cumFre){

int idx = 0; // this var is result of function

while ((bitMask != 0) && (idx < MaxVal)){ // nobody likes overflow :)


int tIdx = idx + bitMask; // we make midpoint of interval
if (cumFre == tree[tIdx]) // if it is equal, we just return tIdx
return tIdx;
else if (cumFre > tree[tIdx]){
// if tree frequency "can fit" into cumFre,
// then include it
idx = tIdx; // update index
cumFre -= tree[tIdx]; // set frequency for next loop
}
bitMask >>= 1; // half current interval
}
if (cumFre != 0) // maybe given cumulative frequency doesn't exist
return -1;
else
return idx;
}

// if in tree exists more than one index with the same


// cumulative frequency, this procedure will return
// the greatest one
int findG(int cumFre){
int idx = 0;

while ((bitMask != 0) && (idx < MaxVal)){


int tIdx = idx + bitMask;
if (cumFre >= tree[tIdx]){
// if current cumulative frequency is equal to cumFre,
// we are still looking for higher index (if exists)
idx = tIdx;
cumFre -= tree[tIdx];
}
bitMask >>= 1;
}
if (cumFre != 0)
return -1;
else
return idx;
}

Example for cumulative frequency 21 and function find:

First iteration: tIdx is 16; tree[16] is greater than 21; halve bitMask and continue.
Second iteration: tIdx is 8; tree[8] is less than 21, so we should include the first 8 indexes in the
result and remember idx, because we surely know it is part of the result; subtract tree[8] from
cumFre (we do not want to look for the same cumulative frequency again -- we are looking for
another cumulative frequency in the rest/another part of the tree); halve bitMask and continue.
Third iteration: tIdx is 12; tree[12] is greater than 9 (there is no way to overlap interval 1-8, in this
example, with some further intervals, because only interval 1-16 can overlap); halve bitMask and
continue.
Fourth iteration: tIdx is 10; tree[10] is less than 9, so we should update values; halve bitMask and
continue.
Fifth iteration: tIdx is 11; tree[11] is equal to 2; return index (tIdx).

Time complexity: O(log MaxVal).


Code length: Up to twenty lines.

2D BIT
BIT can be used as a multi-dimensional data structure. Suppose you have a plane with dots (with
non-negative coordinates). You make three queries:

1. set dot at (x , y)
2. remove dot from (x , y)
3. count the number of dots in rectangle (0 , 0), (x , y) - where (0 , 0) is the down-left corner, (x , y)
is the up-right corner and sides are parallel to the x-axis and y-axis.

If m is the number of queries, max_x is maximum x coordinate, and max_y is maximum y


coordinate, then the problem should be solved in O(m * log(max_x) * log(max_y)). In this case,
each element of the tree will contain an array - (tree[max_x][max_y]). Updating indexes of the
x-coordinate is the same as before. For example, suppose we are setting/removing dot (a , b). We
will call update(a , b , 1)/update(a , b , -1), where update is:

void update(int x , int y , int val){


while (x <= max_x){
updatey(x , y , val);
// this function should update array tree[x]
x += (x & -x);
}
}

The function updatey is the "same" as function update:

void updatey(int x , int y , int val){


while (y <= max_y){
tree[x][y] += val;
y += (y & -y);
}
}

It can be written in one function/procedure:

void update(int x , int y , int val){


int y1;
while (x <= max_x){
y1 = y;
while (y1 <= max_y){
tree[x][y1] += val;
y1 += (y1 & -y1);
}
x += (x & -x);
}
}

Image 1.8 - BIT is array of arrays, so this is two-dimensional BIT (size 16 x 8).
Blue fields are fields which we should update when we are updating index (5 , 3).

The modification for other functions is very similar. Also, note that BIT can be used as an n-
dimensional data structure.

Sample problem

SRM 310 - FloatingMedian


Problem 2:
Statement:
There is an array of n cards. Each card is put face down on the table. You have two
queries:
1. T i j (turn cards from index i to index j, including the i-th and j-th cards - a card which was
face down will be face up; a card which was face up will be face down)
2. Q i (answer 0 if the i-th card is face down, else answer 1)

Solution:
There is a solution in which each query (both 1 and 2) has time complexity O(log n). In array f
(of length n + 1) we will store each query T (i , j) - we set f[i]++ and f[j + 1]--. For each
card k between i and j (including i and j) the sum f[1] + f[2] + ... + f[k] will be increased by 1;
for all others it will be the same as before (look at image 2.0 for clarification). So our
solution will be the described sum (which is the same as the cumulative frequency) modulo 2.

Image 2.0

Use BIT to store (increase/decrease) frequency and read cumulative frequency.

Conclusion

Binary Indexed Trees are very easy to code.


Each query on a Binary Indexed Tree takes constant or logarithmic time.
Binary Indexed Trees require linear memory space.
You can use them as an n-dimensional data structure.

Segment Tree (with lazy propagation)

se7so.blogspot.in

A segment tree is a tree data structure for storing intervals, or segments. It allows querying which of the
stored segments contain a given point. It is, in principle, a static structure; that is, its content cannot be
modified once the structure is built. It only uses O(N lg(N)) storage.

A segment tree has only three operations: build_tree, update_tree, query_tree.

Building tree: to init the values of the tree's segments or intervals

Update tree: to update the value of an interval or segment

Query tree: to retrieve the value of an interval or segment

Example Segment Tree:

The first node will hold the information for the interval [i, j].
If i < j, the left and right sons will hold the information for the intervals [i, (i+j)/2] and [(i+j)/2 + 1, j].

Notice that the height of a segment tree for an interval with N elements is [log N] + 1. Here is how a
segment tree for the interval [0, 9] would look:

Order of growth of segment trees operations

build_tree: O(N lg(N))
update_tree: O(lg(N) + k)
query_tree: O(lg(N) + k)

k = number of retrieved intervals or segments

Show me your code

/**
 * In this code we have a very large array called arr, and a very large set of operations
 * Operation #1: Increment the elements within range [i, j] with value val
 * Operation #2: Get max element within range [i, j]
 * Build tree: build_tree(1, 0, N-1)
 * Update tree: update_tree(1, 0, N-1, i, j, value)
 * Query tree: query_tree(1, 0, N-1, i, j)
 */

#include<iostream>
#include<algorithm>
using namespace std;

#include<string.h>
#include<math.h>

#define N 20
#define MAX (1+(1<<6)) // Why? :D
#define inf 0x7fffffff

int arr[N];
int tree[MAX];

/**
 * Build and init tree
 */
void build_tree(int node, int a, int b) {
    if(a > b) return; // Out of range

    if(a == b) { // Leaf node
        tree[node] = arr[a]; // Init value
        return;
    }

    build_tree(node*2, a, (a+b)/2); // Init left child
    build_tree(node*2+1, 1+(a+b)/2, b); // Init right child

    tree[node] = max(tree[node*2], tree[node*2+1]); // Init root value
}

/**
 * Increment elements within range [i, j] with value value
 */
void update_tree(int node, int a, int b, int i, int j, int value) {
    if(a > b || a > j || b < i) // Current segment is not within range [i, j]
        return;

    if(a == b) { // Leaf node
        tree[node] += value;
        return;
    }

    update_tree(node*2, a, (a+b)/2, i, j, value); // Updating left child
    update_tree(1+node*2, 1+(a+b)/2, b, i, j, value); // Updating right child

    tree[node] = max(tree[node*2], tree[node*2+1]); // Updating root with max value
}

/**
 * Query tree to get max element value within range [i, j]
 */
int query_tree(int node, int a, int b, int i, int j) {
    if(a > b || a > j || b < i) return -inf; // Out of range

    if(a >= i && b <= j) // Current segment is totally within range [i, j]
        return tree[node];

    int q1 = query_tree(node*2, a, (a+b)/2, i, j); // Query left child
    int q2 = query_tree(1+node*2, 1+(a+b)/2, b, i, j); // Query right child

    int res = max(q1, q2); // Return final result

    return res;
}

int main() {
    for(int i = 0; i < N; i++) arr[i] = 1;

    build_tree(1, 0, N-1);

    update_tree(1, 0, N-1, 0, 6, 5); // Increment range [0, 6] by 5
    update_tree(1, 0, N-1, 7, 10, 12); // Increment range [7, 10] by 12
    update_tree(1, 0, N-1, 10, N-1, 100); // Increment range [10, N-1] by 100

    cout << query_tree(1, 0, N-1, 0, N-1) << endl; // Get max element in range [0, N-1]
}

Lazy Propagation

Sometimes plain segment tree operations won't fit into the time limit when the problem constraints are
too large; this is where lazy propagation comes in, on top of the segment tree.

In the current version, when we update a range, we branch into its children even if the segment is fully
covered by the range. In the lazy version we only mark the children as needing an update, and perform
the update when it is actually needed.
/**
 * In this code we have a very large array called arr, and a very large set of operations
 * Operation #1: Increment the elements within range [i, j] with value val
 * Operation #2: Get max element within range [i, j]
 * Build tree: build_tree(1, 0, N-1)
 * Update tree: update_tree(1, 0, N-1, i, j, value)
 * Query tree: query_tree(1, 0, N-1, i, j)
 */

#include<iostream>
#include<algorithm>
using namespace std;

#include<string.h>
#include<math.h>

#define N 20
#define MAX (1+(1<<6)) // Why? :D
#define inf 0x7fffffff

int arr[N];
int tree[MAX];
int lazy[MAX];

/**
 * Build and init tree
 */
void build_tree(int node, int a, int b) {
    if(a > b) return; // Out of range

    if(a == b) { // Leaf node
        tree[node] = arr[a]; // Init value
        return;
    }

    build_tree(node*2, a, (a+b)/2); // Init left child
    build_tree(node*2+1, 1+(a+b)/2, b); // Init right child

    tree[node] = max(tree[node*2], tree[node*2+1]); // Init root value
}

/**
 * Increment elements within range [i, j] with value value
 */
void update_tree(int node, int a, int b, int i, int j, int value) {
    if(lazy[node] != 0) { // This node needs to be updated
        tree[node] += lazy[node]; // Update it

        if(a != b) {
            lazy[node*2] += lazy[node]; // Mark child as lazy
            lazy[node*2+1] += lazy[node]; // Mark child as lazy
        }

        lazy[node] = 0; // Reset it
    }

    if(a > b || a > j || b < i) // Current segment is not within range [i, j]
        return;

    if(a >= i && b <= j) { // Segment is fully within range
        tree[node] += value;

        if(a != b) { // Not leaf node
            lazy[node*2] += value;
            lazy[node*2+1] += value;
        }

        return;
    }

    update_tree(node*2, a, (a+b)/2, i, j, value); // Updating left child
    update_tree(1+node*2, 1+(a+b)/2, b, i, j, value); // Updating right child

    tree[node] = max(tree[node*2], tree[node*2+1]); // Updating root with max value
}

/**
 * Query tree to get max element value within range [i, j]
 */
int query_tree(int node, int a, int b, int i, int j) {
    if(a > b || a > j || b < i) return -inf; // Out of range

    if(lazy[node] != 0) { // This node needs to be updated
        tree[node] += lazy[node]; // Update it

        if(a != b) {
            lazy[node*2] += lazy[node]; // Mark child as lazy
            lazy[node*2+1] += lazy[node]; // Mark child as lazy
        }

        lazy[node] = 0; // Reset it
    }

    if(a >= i && b <= j) // Current segment is totally within range [i, j]
        return tree[node];

int q1 = query_tree(node*2,
a, (a+b)/2, i, j); // Query left
child
int q2 = query_tree(1+node*2,
1+(a+b)/2, b, i, j); // Query right
child

int res = max(q1, q2); //


Return final result
86

return res;
}

int main() {
for(int i = 0; i < N; i++)
arr[i] = 1;

build_tree(1, 0, N-1);

memset(lazy, 0, sizeof lazy);

update_tree(1, 0, N-1, 0, 6,
5); // Increment range [0, 6] by 5
update_tree(1, 0, N-1, 7, 10,
12); // Incremenet range [7, 10] by
12
update_tree(1, 0, N-1, 10, N-
1, 100); // Increment range [10, N-1]
by 100

cout << query_tree(1, 0, N-1,


0, N-1) << endl; // Get max element
in range [0, N-1]
}

Z algorithm

Codeforces.com

Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest
substring starting from S[i] which is also a prefix of S, i.e. the maximum k such that S[j] = S[i+j] for all
0 ≤ j < k. Note that Z[i] = 0 means that S[0] ≠ S[i]. For easier terminology, we will refer to substrings which
are also a prefix as prefix-substrings.

The algorithm relies on a single, crucial invariant. As we iterate over the letters in the string (index i from
1 to n-1), we maintain an interval [L,R] which is the interval with maximum R such that 1 ≤ L ≤ i ≤ R and
S[L...R] is a prefix-substring (if no such interval exists, just let L = R = -1). For i = 1, we can simply compute
L and R by comparing S[0...] to S[1...]. Moreover, we also get Z[1] during this.

Now suppose we have the correct interval [L,R] for i-1 and all of the Z values up to i-1. We will
compute Z[i] and the new [L,R] by the following steps:

If i > R, then there does not exist a prefix-substring of S that starts before i and ends at or after i.
If such a substring existed, [L,R] would have been the interval for that substring rather than its
current value. Thus we "reset" and compute a new [L,R] by comparing S[0...] to S[i...] and get
Z[i] at the same time (Z[i] = R-L+1).

Otherwise, i ≤ R, so the current [L,R] extends at least to i. Let k = i-L. We know that
Z[i] ≥ min(Z[k], R-i+1) because S[i...] matches S[k...] for at least R-i+1 characters (they are in
the [L,R] interval which we know to be a prefix-substring). Now we have a few more cases to
consider.
If Z[k] < R-i+1, then there is no longer prefix-substring starting at S[i] (or else Z[k] would be
larger), meaning Z[i] = Z[k] and [L,R] stays the same. The latter is true because [L,R] only
changes if there is a prefix-substring starting at S[i] that extends beyond R, which we know is not
the case here.
If Z[k] ≥ R-i+1, then it is possible for S[i...] to match S[0...] for more than R-i+1 characters (i.e.
past position R). Thus we need to update [L,R] by setting L = i and matching from S[R+1]
forward to obtain the new R. Again, we get Z[i] during this.

The process computes all of the Z values in a single pass over the string, so we're done. Correctness is
inherent in the algorithm and is pretty intuitively clear.

Analysis
We claim that the algorithm runs in O(n) time, and the argument is straightforward. We never compare
characters at positions less than R, and every time we match a character R increases by one, so there are
at most n comparisons there. Lastly, we can only mismatch once for each i (it causes R to stop
increasing), so that's another at most n comparisons, giving O(n) total.

Code
Simple and short. Note that the optimization L = R = i is used when S[0] ≠ S[i] (it doesn't affect the
algorithm since at the next iteration i > R regardless).

int L = 0, R = 0;
for (int i = 1; i < n; i++) {
if (i > R) {
L = R = i;
while (R < n && s[R-L] == s[R]) R++;
z[i] = R-L; R--;
} else {
int k = i-L;
if (z[k] < R-i+1) z[i] = z[k];
else {
L = i;
while (R < n && s[R-L] == s[R]) R++;
z[i] = R-L; R--;
}
}
}

Application
One application of the Z Algorithm is the standard string matching problem of finding matches for a
pattern T of length m in a string S of length n. We can do this in O(n+m) time by using the Z Algorithm
on the string T#S (that is, concatenating T, #, and S) where # is a character that matches nothing. The
indices i with Z[i] = m correspond to matches of T in S.
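A rough sketch of this matching trick follows; the helper names `computeZ` and `matchPositions` are our own, and `#` is assumed to occur in neither string:

```cpp
#include <string>
#include <vector>
using namespace std;

// Z array computation, exactly as in the snippet above
vector<int> computeZ(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    int L = 0, R = 0;
    for (int i = 1; i < n; i++) {
        if (i > R) {
            L = R = i;
            while (R < n && s[R-L] == s[R]) R++;
            z[i] = R-L; R--;
        } else {
            int k = i-L;
            if (z[k] < R-i+1) z[i] = z[k];
            else {
                L = i;
                while (R < n && s[R-L] == s[R]) R++;
                z[i] = R-L; R--;
            }
        }
    }
    return z;
}

// All start positions of pattern T in text S, via the T#S concatenation
vector<int> matchPositions(const string& T, const string& S) {
    string concat = T + "#" + S;   // '#' assumed absent from both strings
    vector<int> z = computeZ(concat);
    vector<int> pos;
    int m = T.size();
    for (int i = m + 1; i < (int)concat.size(); i++)
        if (z[i] == m)
            pos.push_back(i - m - 1); // shift back to S's indexing
    return pos;
}
```

For example, `matchPositions("ab", "abcabab")` reports the starting indices of every occurrence of "ab" in the text.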

Lastly, to solve Problem B of Beta Round 93, we simply compute Z for the given string S, then iterate
over i from 1 to n-1. If Z[i] = n-i then we know the suffix starting at S[i] is a prefix, and if the largest Z value
we've seen so far is at least n-i, then we know some substring inside S also matches that prefix. That gives the
result.

int maxz = 0, res = 0;


for (int i = 1; i < n; i++) {
if (z[i] == n-i && maxz >= n-i) { res = n-i; break; }
maxz = max(maxz, z[i]);
}

Floyd Warshall Algorithm

Geeksforgeeks.org

The Floyd Warshall Algorithm is for solving the All Pairs Shortest Path problem. The problem is
to find shortest distances between every pair of vertices in a given edge weighted directed Graph.

Example:

Input:
graph[][] = { {0, 5, INF, 10},
{INF, 0, 3, INF},
{INF, INF, 0, 1},
{INF, INF, INF, 0} }
which represents the following graph
10
(0)------->(3)
| /|\
5 | |
| | 1
\|/ |
(1)------->(2)
3
Note that the value of graph[i][j] is 0 if i is equal to j
And graph[i][j] is INF (infinite) if there is no edge from vertex i to j.

Output:
Shortest distance matrix
0 5 8 9
INF 0 3 4
INF INF 0 1
INF INF INF 0

Floyd Warshall Algorithm


We initialize the solution matrix same as the input graph matrix as a first step. Then we update
the solution matrix by considering all vertices as an intermediate vertex. The idea is to one by
one pick all vertices and update all shortest paths which include the picked vertex as an
intermediate vertex in the shortest path. When we pick vertex number k as an intermediate
vertex, we already have considered vertices {0, 1, 2, .. k-1} as intermediate vertices. For every
pair (i, j) of source and destination vertices respectively, there are two possible cases.
1) k is not an intermediate vertex in shortest path from i to j. We keep the value of dist[i][j] as it
is.
2) k is an intermediate vertex in shortest path from i to j. We update the value of dist[i][j] as
dist[i][k] + dist[k][j].

The following figure is taken from the Cormen book. It shows the above optimal substructure
property in the all-pairs shortest path problem.

Following is C implementation of the Floyd Warshall algorithm.

// Program for Floyd Warshall Algorithm
#include<stdio.h>

// Number of vertices in the graph
#define V 4

/* Define Infinite as a large enough value. This value will be used
   for vertices not connected to each other */
#define INF 99999

// A function to print the solution matrix
void printSolution(int dist[][V]);

// Solves the all-pairs shortest path problem using Floyd Warshall algorithm
void floydWarshall(int graph[][V])
{
    /* dist[][] will be the output matrix that will finally have the shortest
       distances between every pair of vertices */
    int dist[V][V], i, j, k;

    /* Initialize the solution matrix same as input graph matrix. Or
       we can say the initial values of shortest distances are based
       on shortest paths considering no intermediate vertex. */
    for (i = 0; i < V; i++)
        for (j = 0; j < V; j++)
            dist[i][j] = graph[i][j];

    /* Add all vertices one by one to the set of intermediate vertices.
       ---> Before start of an iteration, we have shortest distances between all
            pairs of vertices such that the shortest distances consider only the
            vertices in set {0, 1, 2, .. k-1} as intermediate vertices.
       ---> After the end of an iteration, vertex no. k is added to the set of
            intermediate vertices and the set becomes {0, 1, 2, .. k} */
    for (k = 0; k < V; k++)
    {
        // Pick all vertices as source one by one
        for (i = 0; i < V; i++)
        {
            // Pick all vertices as destination for the
            // above picked source
            for (j = 0; j < V; j++)
            {
                // If vertex k is on the shortest path from
                // i to j, then update the value of dist[i][j]
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
            }
        }
    }

    // Print the shortest distance matrix
    printSolution(dist);
}

/* A utility function to print solution */
void printSolution(int dist[][V])
{
    printf("Following matrix shows the shortest distances"
           " between every pair of vertices \n");
    for (int i = 0; i < V; i++)
    {
        for (int j = 0; j < V; j++)
        {
            if (dist[i][j] == INF)
                printf("%7s", "INF");
            else
                printf("%7d", dist[i][j]);
        }
        printf("\n");
    }
}

// driver program to test above function
int main()
{
    /* Let us create the following weighted graph
            10
       (0)------->(3)
        |         /|\
      5 |          |
        |          | 1
       \|/         |
       (1)------->(2)
            3            */
    int graph[V][V] = { {0, 5, INF, 10},
                        {INF, 0, 3, INF},
                        {INF, INF, 0, 1},
                        {INF, INF, INF, 0}
                      };

    // Print the solution
    floydWarshall(graph);
    return 0;
}

Output:

Following matrix shows the shortest distances between every pair of vertices
0 5 8 9
INF 0 3 4
INF INF 0 1
INF INF INF 0

Time Complexity: O(V^3)

The above program only prints the shortest distances. We can modify the solution to print the
shortest paths also by storing the predecessor information in a separate 2D matrix.
Also, the value of INF can be taken as INT_MAX from limits.h to make sure that we handle the
maximum possible value. When we take INF as INT_MAX, we need to change the if condition
in the above program to avoid arithmetic overflow.

#include<limits.h>
#define INF INT_MAX

..........................
if (dist[i][k] != INF && dist[k][j] != INF &&
    dist[i][k] + dist[k][j] < dist[i][j])
        dist[i][j] = dist[i][k] + dist[k][j];
...........................
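The path-printing extension mentioned above can be sketched with a successor matrix; the names `nxt`, `path`, and `floydWarshallPath` are our own, and this is a sketch rather than the article's code:

```cpp
#include <vector>
using namespace std;

const int V = 4;           // number of vertices, as in the program above
const int INF = 99999;

int dist[V][V], nxt[V][V]; // nxt[i][j]: first hop on a shortest path i -> j

void floydWarshallPath(int graph[V][V]) {
    for (int i = 0; i < V; i++)
        for (int j = 0; j < V; j++) {
            dist[i][j] = graph[i][j];
            nxt[i][j] = (graph[i][j] != INF) ? j : -1;
        }
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (dist[i][k] != INF && dist[k][j] != INF &&
                    dist[i][k] + dist[k][j] < dist[i][j]) {
                    dist[i][j] = dist[i][k] + dist[k][j];
                    nxt[i][j] = nxt[i][k]; // take k's route for the first hop
                }
}

vector<int> path(int i, int j) { // vertices on one shortest path i -> j
    vector<int> p;
    if (nxt[i][j] == -1) return p; // unreachable
    p.push_back(i);
    while (i != j) { i = nxt[i][j]; p.push_back(i); }
    return p;
}
```

On the example graph above, `path(0, 3)` would recover the route 0 → 1 → 2 → 3 of total weight 9.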

Sparse Table(RMQ)
mayanknatani.wordpress.com

Range Minimum Query

Given an array, say A[0..N-1], we are queried for the minimum element from some index i to
some index j such that j >= i; the notation used in the rest of the post is RMQ(i,j), with j >= i.

Ex: A = [3 , 2 , -1 , 4 , 2]
RMQ(0,2) = -1

RMQ(3,4) = 2
RMQ(0,4) = -1

1. Naive Approach
Time Complexity : construction O(1) , Query O(N) <O(1),O(N)>

The idea is simple and straightforward: we trivially search for the minimum from index i to
index j by actually traversing the array between those indices. The worst-case time complexity
of the traversal is O(N).

#include <cstdio>

int RMQ(int A[],int s,int e){
    int minindex = s;
    for(int i=s;i<=e;i++)
        if(A[i]<A[minindex])
            minindex = i;
    return A[minindex];
}
// driver programme
int main(){
    int A[10] = {3,1,2,-1,5,4,6,0,9,8};
    int s,e;
    scanf("%d %d",&s,&e);
    printf("%d\n",RMQ(A,s,e));
    return 0;
}

2. Dynamic Programming Approach 1 (Trivial)

Time Complexity : Construction O(N^2) , Query O(1) <O(N^2),O(1)>
Writing a recursion for the given problem is no big task:

RMQ(i,j) = j,          if A[j] < A[RMQ(i,j-1)]
RMQ(i,j) = RMQ(i,j-1), otherwise

The recursion can easily be memoized by using a two dimensional array of size N*N and can be
written as:

M[i][j] = j,         if A[j] < A[M[i][j-1]]
M[i][j] = M[i][j-1], otherwise

#include <cstdio>
#define MAXN 100
int M[MAXN][MAXN];

void RMQ_DP(int A[],int N){
    for(int i=0;i<N;i++)
        M[i][i]=i;
    for(int i=0;i<N;i++){
        for(int j=i+1;j<N;j++){
            if(A[M[i][j-1]]>A[j])
                M[i][j]=j;
            else
                M[i][j]=M[i][j-1];
        }
    }
}
// driver programme
int main(){
    int A[10] = {3,1,2,-1,5,4,6,0,9,8};
    RMQ_DP(A,10);
    int s,e;
    scanf("%d %d",&s,&e);
    printf("%d\n",A[M[s][e]]);
    return 0;
}

3. Split and Query

Time Complexity : construction O(N) , Query O(sqrt(N)) <O(N),O(sqrt(N))>
Here we split the array into about sqrt(N) equal parts of size sqrt(N). The idea is to store the
index of the minimum of each part in another array, say M, of size about sqrt(N), and then for
every query we traverse the array M and the remaining elements of the array A.
Let us understand through an example:
A = [2,4,3,1,6,7,8,9,1,7]

As is clear from the (omitted) figure, M[0] stores the index of the minimum element from A[0] to
A[2], M[1] stores the index of the minimum element from A[3] to A[5], and so on, until M[3],
which stores the index of A[9] as there are no further elements.

#include <cstdio>
#include <cmath>
#define MAXN 100
int M[MAXN];
/* M[i] = the index of the minimum element from A[i*sqrt(N)] to
   A[i*sqrt(N)+sqrt(N)-1] */
int size_m; // stores the size of the M array

void RMQ_SPLIT(int A[],int N){
    size_m = 0;
    for(int i=0;i<N;){
        int minindex = i;
        for(int j=0;j<(int)sqrt(N) && i<N;j++){
            if(A[i]<A[minindex])
                minindex=i;
            i++;
        }
        M[size_m++]=minindex;
    }
}
int RMQ(int A[],int s,int e,int N){
    int start = s/(int)sqrt(N); // starting index into the M array
    int ans = A[s];
    int end = e/(int)sqrt(N);   // ending index into the M array
    for(int i=s;i<(start+1)*sqrt(N);i++)
        if(A[i]<ans)
            ans = A[i];
    for(int i=start+1;i<end;i++){
        if(A[M[i]]<ans)
            ans=A[M[i]];
    }
    for(int i=end*sqrt(N);i<=e;i++){
        if(A[i]<ans)
            ans = A[i];
    }
    return ans;
}
// driver programme
int main(){
    int A[10] = {3,1,2,-1,5,4,6,0,9,8};
    RMQ_SPLIT(A,10);
    int s,e;
    scanf("%d %d",&s,&e);
    printf("%d\n",RMQ(A,s,e,10));
    return 0;
}

4. Dynamic Programming Approach 2 (Sparse Table)

Time Complexity : Construction O(NlogN) Query O(1) <O(NlogN),O(1)>
To get an asymptotically faster time bound, we need to think of something beyond the trivial
comparison-based algorithms. The next algorithm makes use of a special data structure known as
a Sparse Table. A sparse table stores information from an index i to an index j that lies at a
specific distance from i.

Here we use a sparse table to store the minimum of the elements between index i and index
i+2^j-1. It can be better understood with the help of an example:
let us say, A = [2,4,3,1,6,7,8,9,1,7]
and the sparse table be a two dimensional array M( N*(log(N)+1) )

To compute M[i][j] we use dynamic programming:

M[i][j] = M[i][j-1],          if A[M[i][j-1]] <= A[M[i+2^(j-1)][j-1]]
M[i][j] = M[i+2^(j-1)][j-1],  otherwise

Now, after precomputation of the table M, the RMQ can be solved in O(1) as follows:
let k = log(j-i+1)

RMQ(i,j) = A[M[i][k]],        if A[M[i][k]] <= A[M[j-2^k+1][k]]
RMQ(i,j) = A[M[j-2^k+1][k]],  otherwise

#include <cstdio>
#define MAXN 1000
#define MAXLOGN 10
int M[MAXN][MAXLOGN];
void Compute_ST(int A[],int N){
    int i,j;
    for(i=0;i<N;i++)
        M[i][0]=i;
    for(j=1; 1<<j <= N; j++){
        for(i=0; i+(1<<(j-1)) < N; i++){
            if(A[M[i][j-1]]<=A[M[i+(1<<(j-1))][j-1]])
                M[i][j]=M[i][j-1];
            else
                M[i][j]=M[i+(1<<(j-1))][j-1];
        }
    }
}
int RMQ(int A[],int s,int e){
    int k=e-s;
    k=31-__builtin_clz(k+1); // k = log(e-s+1)
    if(A[M[s][k]]<=A[M[e-(1<<k)+1][k]])
        return A[M[s][k]];
    return A[M[e-(1<<k)+1][k]];
}
// driver programme
int main(){
    int A[10] = {3,1,2,-1,5,4,6,0,9,8};
    Compute_ST(A,10);
    int s,e;
    scanf("%d %d",&s,&e);
    printf("%d\n",RMQ(A,s,e));
    return 0;
}

5. Segment Trees
Time Complexity : Construction O(N) Query O(logN) <O(N),O(logN)>
Segment trees are one of the most popular and most powerful data structures for interval updates
and interval queries. Segment trees are mostly preferred over any of the methods described above
and are useful in most programming competitions. Segment trees are heap-like data
structures. (The figure showing the segment tree for an array of size 10 is omitted here.)

A segment tree can be stored in the form of an array of size 2^(logN+1)-1, say M. The left and the
right child of the node numbered x will be 2*x+1 and 2*x+2 respectively.

#include <cstdio>
#define MAXN 1000
#define MAXSIZE 10000
int M[MAXSIZE];

// node is the index into the segment tree M; start and end are indices of the array A
void BuildTree(int node,int start,int end,int A[],int N){
    if(start==end)
        M[node]=start;
    else{
        BuildTree(2*node+1,start,(start+end)/2,A,N);
        BuildTree(2*node+2,(start+end)/2+1,end,A,N);
        if(A[M[2*node+1]]<A[M[2*node+2]])
            M[node]=M[2*node+1];
        else
            M[node]=M[2*node+2];
    }
}

int RMQ(int node,int start,int end,int s,int e,int A[]){
    if(s<=start && e>=end)
        return M[node];
    else if(s>end || e<start)
        return -1;
    int q1 = RMQ(2*node+1,start,(start+end)/2,s,e,A);
    int q2 = RMQ(2*node+2,(start+end)/2+1,end,s,e,A);
    if(q1==-1)
        return q2;
    else if(q2==-1)
        return q1;
    if(A[q1]<A[q2])
        return q1;
    return q2;
}
// driver programme
int main(){
    int A[10] = {3,1,2,-1,5,4,6,0,9,8};
    BuildTree(0,0,10-1,A,10);
    int s,e;
    scanf("%d %d",&s,&e);
    printf("%d\n",A[RMQ(0,0,10-1,s,e,A)]);
    return 0;
}

Now, of the five approaches, the last two are generally preferred in most programming
competitions. Segment trees are more popular because they are more flexible than sparse tables:
they can also be used for interval updates, which are most often clubbed with interval queries in
programming challenges, while a sparse table cannot be used to update the array (as it is based
on precomputation).

Heap / Priority Queue / Heapsort


sourcetricks.com

Heap is a binary tree that stores priorities (or priority-element pairs) at the nodes.
It has the following properties:
o All levels except last level are full. Last level is left filled.
o Priority of a node is at least as large as that of its parent (min-heap), or
vice-versa (max-heap).
o If the smallest element is in the root node, it results in a min-heap.
o If the largest element is in the root node, it results in a max-heap.
o A heap can be thought of as a priority queue. The most important node will
always be at the root of the tree.
o Heaps can also be used to sort data, heap sort.
o The two most interesting operations in a heap are heapify-up and heapify-down.
Heapify-up (assumption min-heap)
Used to add a node to the heap. To add a node, it is inserted at
the last empty space and the heapify-up process is done.
When a node is added, its key is compared to its parent's. If the
parent key is larger than the current node's, they are swapped. The
process is repeated till the heap property is met.
Heapify-down
Used during removal of a node. The removed node is always the
root (the highest-priority element); the last node in the heap
replaces the root, and the heapify-down process is done.
The key of the parent node is compared with its children's. If
either child has a smaller key, the smaller child is swapped with
the parent. The process is repeated for the newly swapped node till
the heap property is met.

Implementation of a heap in C++. Demonstrates min-heap using arrays.


#include <iostream>
#include <vector>
#include <iterator>
using namespace std;

class Heap {
public:
Heap();
~Heap();
void insert(int element);
int deletemin();
void print();
int size() { return heap.size(); }
private:
int left(int parent);
int right(int parent);
int parent(int child);
void heapifyup(int index);
void heapifydown(int index);
private:
vector<int> heap;
};

Heap::Heap()
{
}

Heap::~Heap()
{
}
void Heap::insert(int element)
{
    heap.push_back(element);
    heapifyup(heap.size() - 1);
}

int Heap::deletemin()
{
int min = heap.front();
heap[0] = heap.at(heap.size() - 1);
heap.pop_back();
heapifydown(0);
return min;
}

void Heap::print()
{
vector<int>::iterator pos = heap.begin();
cout << "Heap = ";
while ( pos != heap.end() ) {
cout << *pos << " ";
++pos;
}
cout << endl;
}

void Heap::heapifyup(int index)
{
//cout << "index=" << index << endl;
//cout << "parent(index)=" << parent(index) << endl;
//cout << "heap[parent(index)]=" << heap[parent(index)] << endl;
//cout << "heap[index]=" << heap[index] << endl;
while ( ( index > 0 ) && ( parent(index) >= 0 ) &&
( heap[parent(index)] > heap[index] ) )
{
int tmp = heap[parent(index)];
heap[parent(index)] = heap[index];
heap[index] = tmp;
index = parent(index);
}
}

void Heap::heapifydown(int index)
{
    int child = left(index);
    // Choose the smaller of the two children, if both exist
    if ( ( child > 0 ) && ( right(index) > 0 ) &&
         ( heap[child] > heap[right(index)] ) )
    {
        child = right(index);
    }
    // Swap only when the child is smaller; otherwise the heap property already holds
    if ( ( child > 0 ) && ( heap[child] < heap[index] ) )
    {
        int tmp = heap[index];
        heap[index] = heap[child];
        heap[child] = tmp;
        heapifydown(child);
    }
}

int Heap::left(int parent)
{
int i = ( parent << 1 ) + 1; // 2 * parent + 1
return ( i < heap.size() ) ? i : -1;
}

int Heap::right(int parent)
{
int i = ( parent << 1 ) + 2; // 2 * parent + 2
return ( i < heap.size() ) ? i : -1;
}

int Heap::parent(int child)
{
if (child != 0)
{
int i = (child - 1) >> 1;
return i;
}
return -1;
}

int main()
{
// Create the heap
Heap* myheap = new Heap();
myheap->insert(700);
myheap->print();
myheap->insert(500);
myheap->print();
myheap->insert(100);
myheap->print();
myheap->insert(800);
myheap->print();
myheap->insert(200);
myheap->print();
myheap->insert(400);
myheap->print();
myheap->insert(900);
myheap->print();
myheap->insert(1000);
myheap->print();
myheap->insert(300);
myheap->print();
myheap->insert(600);
myheap->print();

    // Get priority element from the heap
    int heapSize = myheap->size();
for ( int i = 0; i < heapSize; i++ )
cout << "Get min element = " << myheap->deletemin() << endl;

// Cleanup
delete myheap;
}

OUTPUT:-
Heap = 700
Heap = 500 700
Heap = 100 700 500
Heap = 100 700 500 800
Heap = 100 200 500 800 700
Heap = 100 200 400 800 700 500
Heap = 100 200 400 800 700 500 900
Heap = 100 200 400 800 700 500 900 1000
Heap = 100 200 400 300 700 500 900 1000 800
Heap = 100 200 400 300 600 500 900 1000 800 700
Get min element = 100
Get min element = 200
Get min element = 300
Get min element = 400
Get min element = 500
Get min element = 600
Get min element = 700
Get min element = 800
Get min element = 900
Get min element = 1000
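For completeness, the C++ standard library ships a binary-heap-backed container adaptor; a minimal sketch of the same min-heap behaviour using `std::priority_queue` follows (this is our addition, not part of the original article — the helper name `sortedByHeap` is ours):

```cpp
#include <queue>
#include <vector>
#include <functional>
using namespace std;

// std::priority_queue is a max-heap by default; std::greater<int> turns it into
// a min-heap, matching the insert/deletemin behaviour of the class above
vector<int> sortedByHeap(const vector<int>& vals) {
    priority_queue<int, vector<int>, greater<int> > minheap;
    for (size_t i = 0; i < vals.size(); i++)
        minheap.push(vals[i]);        // O(log n) heapify-up per insert
    vector<int> out;
    while (!minheap.empty()) {
        out.push_back(minheap.top()); // smallest remaining element
        minheap.pop();                // O(log n) heapify-down internally
    }
    return out;
}
```

Repeatedly popping the top yields the elements in ascending order, which is exactly the heapsort idea mentioned earlier.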

Modular Multiplicative Inverse


comeoncodeon.wordpress.com

The modular multiplicative inverse of an integer a modulo m is an integer x such that

a*x ≡ 1 (mod m)

That is, it is the multiplicative inverse in the ring of integers modulo m. This is equivalent to

(a*x) mod m = 1

The multiplicative inverse of a modulo m exists if and only if a and m are coprime (i.e., if gcd(a,
m) = 1).

Let's see various ways to calculate the Modular Multiplicative Inverse:

1. Brute Force
We can calculate the inverse using a brute force approach where we multiply a with all possible
values x and find an x such that (a*x) mod m = 1. Here's a sample C++ code:

int modInverse(int a, int m) {
    a %= m;
    for(int x = 1; x < m; x++) {
        if((a*x) % m == 1) return x;
    }
    return -1; // no inverse exists when gcd(a, m) != 1
}

The time complexity of the above codes is O(m).

2. Using Extended Euclidean Algorithm


We have to find a number x such that ax = 1 (mod m). This can be written as ax = 1 +
my, which rearranges into ax - my = 1. Since x and y need not be positive, we can write it as
well in the standard form, ax + my = 1.

In number theory, Bézout's identity for two integers a, b is an expression ax + by = d, where x
and y are integers (called Bézout coefficients for (a,b)), such that d is a common divisor of a and
b. If d is the greatest common divisor of a and b then Bézout's identity ax + by = gcd(a,b) can be
solved using the Extended Euclidean Algorithm.

The Extended Euclidean Algorithm is an extension of the Euclidean algorithm. Besides finding
the greatest common divisor of integers a and b, as the Euclidean algorithm does, it also finds
integers x and y (one of which is typically negative) that satisfy Bézout's identity
ax + by = gcd(a,b). The Extended Euclidean Algorithm is particularly useful when a and b are
coprime, since x is the multiplicative inverse of a modulo b, and y is the multiplicative inverse of
b modulo a.

We will look at two ways to find the result of Extended Euclidean Algorithm.

Iterative Method
This method computes expressions of the form ri = a*xi + b*yi for the remainder in each step i of
the Euclidean algorithm. Each successive remainder ri can be written as the remainder of the
division of the previous two such numbers, which remainder can be expressed using the whole
quotient qi of that division as follows:

ri = ri-2 - qi*ri-1

By substitution, this gives:

ri = (a*xi-2 + b*yi-2) - qi*(a*xi-1 + b*yi-1)

which can be written

ri = a*(xi-2 - qi*xi-1) + b*(yi-2 - qi*yi-1)

The first two values are the initial arguments to the algorithm:

r1 = a = a*1 + b*0
r2 = b = a*0 + b*1

So the coefficients start out as x1 = 1, y1 = 0, x2 = 0, and y2 = 1, and the others are given by

xi = xi-2 - qi*xi-1
yi = yi-2 - qi*yi-1

The expression for the last non-zero remainder gives the desired results since this method
computes every remainder in terms of a and b, as desired.

So the algorithm looks like:

1. Apply the Euclidean algorithm, and let qn (n starting from 1) be the finite list of quotients in the
   division.
2. Initialize x0, x1 as 1, 0, and y0, y1 as 0, 1 respectively.
3. Then for each i so long as qi is defined:
   - Compute xi+1 = xi-1 - qi*xi
   - Compute yi+1 = yi-1 - qi*yi
   - Repeat the above after incrementing i by 1.
4. The answers are the second-to-last of xn and yn.

/* This function returns the gcd of a and b followed by
   the pair x and y of equation ax + by = gcd(a,b) */
pair<int, pair<int, int> > extendedEuclid(int a, int b) {
    int x = 1, y = 0;
    int xLast = 0, yLast = 1;
    int q, r, m, n;
    while(a != 0) {
        q = b / a;
        r = b % a;
        m = xLast - q * x;
        n = yLast - q * y;
        xLast = x, yLast = y;
        x = m, y = n;
        b = a, a = r;
    }
    return make_pair(b, make_pair(xLast, yLast));
}

int modInverse(int a, int m) {
    return (extendedEuclid(a,m).second.first + m) % m;
}

Recursive Method
This method attempts to solve the original equation directly, by reducing the dividend and
divisor gradually, from the first line to the last line; the last line can then be substituted with a
trivial value, and we work backward to obtain the solution.
Notice that the equation remains unchanged after decomposing the original dividend in terms of
the divisor plus a remainder, and then regrouping terms. So the algorithm looks like this:

1. If b = 0, the algorithm ends, returning the solution x = 1, y = 0.
2. Otherwise:
   o Determine the quotient q and remainder r of dividing a by b using the integer
     division algorithm.
   o Then recursively find coefficients s, t such that bs + rt = gcd(b, r).
   o Finally the algorithm returns the solution x = t, and y = s - qt.

Here's a C++ implementation:

/* This function returns the gcd of a and b followed by
   the pair x and y of equation ax + by = gcd(a,b) */
pair<int, pair<int, int> > extendedEuclid(int a, int b) {
    if(a == 0) return make_pair(b, make_pair(0, 1));
    pair<int, pair<int, int> > p;
    p = extendedEuclid(b % a, a);
    return make_pair(p.first,
        make_pair(p.second.second - p.second.first*(b/a), p.second.first));
}

int modInverse(int a, int m) {
    return (extendedEuclid(a,m).second.first + m) % m;
}

The time complexity of the above codes is O(log(m)^2).

3. Using Fermat's Little Theorem

Fermat's little theorem states that if m is a prime and a is an integer coprime to m, then a^(m-1) - 1
will be evenly divisible by m. That is, a^(m-1) ≡ 1 (mod m), or equivalently
a^(m-2) ≡ a^(-1) (mod m). Here's a sample C++ code:

/* This function calculates (a^b)%MOD */
int pow(int a, int b, int MOD) {
    long long x = 1, y = a; // long long intermediates avoid overflow of x*y
    while(b > 0) {
        if(b%2 == 1) {
            x = (x*y) % MOD;
        }
        y = (y*y) % MOD;
        b /= 2;
    }
    return (int)x;
}

int modInverse(int a, int m) {
    return pow(a,m-2,m);
}

The time complexity of the above codes is O(log(m)).

4. Using Euler's Theorem

Fermat's Little Theorem can only be used if m is a prime. If m is not a prime we can use Euler's
Theorem, which is a generalization of Fermat's Little Theorem. According to Euler's theorem, if
a is coprime to m, that is, gcd(a, m) = 1, then a^φ(m) ≡ 1 (mod m), where φ(m) is the Euler
Totient Function. Therefore the modular multiplicative inverse can be found directly:
a^(φ(m)-1) ≡ a^(-1) (mod m). The problem here is finding φ(m). If we know φ(m), then it is very
similar to the above method.
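A hedged sketch of this approach, assuming m is small enough to factor by trial division (the helper names `phi`, `modpow`, and `modInverseEuler` are ours):

```cpp
// Euler's totient via trial-division factorization of m
int phi(int m) {
    int result = m;
    for (int p = 2; p * p <= m; p++) {
        if (m % p == 0) {
            while (m % p == 0) m /= p;  // strip the prime factor p
            result -= result / p;       // multiply result by (1 - 1/p)
        }
    }
    if (m > 1) result -= result / m;    // one prime factor > sqrt(m) may remain
    return result;
}

// (a^b) % mod by repeated squaring
long long modpow(long long a, long long b, long long mod) {
    long long x = 1; a %= mod;
    while (b > 0) {
        if (b & 1) x = (x * a) % mod;
        a = (a * a) % mod;
        b >>= 1;
    }
    return x;
}

// a^(phi(m)-1) mod m, valid only when gcd(a, m) = 1
int modInverseEuler(int a, int m) {
    return (int)modpow(a, phi(m) - 1, m);
}
```

For instance, for m = 10 we have φ(10) = 4, so the inverse of 3 modulo 10 is 3^3 mod 10 = 7, and indeed 3 * 7 ≡ 1 (mod 10).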

Now let's take a slightly different question. Suppose you have to calculate the inverses of the first n
numbers. From the above, the best we can do is O(n log(m)). Can we do any better? Yes.

We can use a sieve to find a factor of every composite number less than n. So for composite numbers
inverse(i) = (inverse(i/factor(i)) * inverse(factor(i))) % m, and we can use either the Extended
Euclidean Algorithm or Fermat's Theorem to find the inverses of the primes. But we can still do
better.
a * (m / a) + m % a = m
(a * (m / a) + m % a) mod m = m mod m, or
(a * (m / a) + m % a) mod m = 0, or
(- (m % a)) mod m = (a * (m / a)) mod m.
Dividing both sides by (a * (m % a)), we get
inverse(a) mod m = (- (m/a) * inverse(m % a)) mod m

Here's a sample C++ code:

vector<int> inverseArray(int n, int m) {
    vector<int> modInverse(n + 1, 0);
    modInverse[1] = 1;
    for(int i = 2; i <= n; i++) {
        modInverse[i] = (-(m/i) * modInverse[m % i]) % m + m;
    }
    return modInverse;
}

The time complexity of the above code is O(n).

nCr % M
codechef.com

The first key idea is that of Lucas' Theorem.

Lucas' Theorem reduces nCr % M to

(n0Cr0 % M) * (n1Cr1 % M) * ... * (nkCrk % M)

Where,

(nk nk-1 ... n0) is the base M representation of n
(rk rk-1 ... r0) is the base M representation of r

Note, if any of the above terms is zero because ri > ni or any other degeneracy, then the
binomial coefficient nCr % M = 0.

Otherwise, none of the terms niCri in the product is divisible by M. But this is only
half the job done.
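A hedged sketch of the Lucas reduction for a prime M (the helper name `smallNCR` is ours; it computes nCr % M for 0 ≤ r ≤ n < M, here naively with a Fermat inverse, as discussed below):

```cpp
// Naive nCr % M for 0 <= r <= n < M, with M prime: multiply the numerator
// terms and divide by the denominator via a Fermat inverse
long long smallNCR(long long n, long long r, long long M) {
    if (r > n - r) r = n - r;
    long long num = 1, den = 1;
    for (long long i = 0; i < r; i++) {
        num = num * ((n - i) % M) % M;
        den = den * ((i + 1) % M) % M;
    }
    long long inv = 1, b = M - 2, a = den % M; // inv = den^(M-2) % M
    while (b > 0) { if (b & 1) inv = inv * a % M; a = a * a % M; b >>= 1; }
    return num * inv % M;
}

// Lucas: reduce C(n, r) % M (M prime) to a product over base-M digits
long long lucas(long long n, long long r, long long M) {
    long long res = 1;
    while (n > 0 || r > 0) {
        long long ni = n % M, ri = r % M;
        if (ri > ni) return 0;   // degenerate digit => the whole product is 0
        res = res * smallNCR(ni, ri, M) % M;
        n /= M; r /= M;
    }
    return res;
}
```

For example, `lucas(10, 3, 7)` agrees with C(10,3) = 120 taken modulo 7.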

Now you have to calculate nCr % M (ignoring subscripts for brevity) for some 0 ≤ r ≤ n < M

There are no ways around it, but to calculate

[ n! / ( r! (n-r)! ) ] % M

Without loss of generality, we can assume r ≤ n-r

Remember, you can always do the obvious: calculate the binomial and then take a modulo. This
is mostly not possible because the binomial will be too large to fit into either int or long long int
(and Big Int will be too slow).
This can then be simplified by using some clever ideas from Modular Arithmetic.

If A*B % M = 1, A and B are modular multiplicative inverses of each other.

For brevity, we say B % M = A^(-1) % M

It is not always possible to calculate modular multiplicative inverses. If A and M are not
coprime, finding such a B is not possible.
For example, A = 2, M = 4. You can never find a number B such that 2*B % 4 = 1

Most problems give us a prime M. This means calculating B is always possible for any A < M.
For other problems, look at the decomposition of M. In the codesprint problem you mentioned

142857 = 3^3 * 11 * 13 * 37

You can find the result of nCr % m for each m = 27, 11, 13, 37. Once you have the answers, you
can reconstruct the answer modulo 142857 using the Chinese Remainder Theorem. These answers
can be found by naive methods since each m is small.
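The CRT reconstruction step mentioned above could be sketched for two coprime moduli like this (the names `extgcd` and `crt` are ours; larger factorizations just fold the moduli in one at a time, and m1*m2 is assumed to fit comfortably in long long):

```cpp
// Extended Euclid: returns g = gcd(a,b) and fills x, y with a*x + b*y = g
long long extgcd(long long a, long long b, long long &x, long long &y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1, y1;
    long long g = extgcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Combine r1 (mod m1) and r2 (mod m2) into the unique residue mod m1*m2,
// assuming gcd(m1, m2) = 1
long long crt(long long r1, long long m1, long long r2, long long m2) {
    long long p, q;
    extgcd(m1, m2, p, q);         // m1*p + m2*q = 1
    long long mod = m1 * m2;
    long long res = (r1 * q % mod * m2 + r2 * p % mod * m1) % mod;
    return (res % mod + mod) % mod; // normalize into [0, mod)
}
```

For instance, combining x ≡ 2 (mod 3) with x ≡ 3 (mod 5) recovers x ≡ 8 (mod 15).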

I have also seen problems where M is a product of large primes, but square free. In these cases,
you can calculate the answers modulo the primes that M is composed of using modular inverses
(a little more about that below), and reconstruct the answer using CRT.

I am yet to see a problem where M is neither, but if there is one, I do not know of a general way to
calculate binomial coefficients (since you can neither calculate modular inverses, nor
brute force). I can dream of a problem where there are powers of small primes along with
square-free larger ones, for a Number Theory extravaganza.

There is one other way to calculate nCr for any M which is small enough (say M ≤ 5000), or for small
n and r (say r ≤ n ≤ 5000), by using the following recursion with memoization

C(n, r) = C(n-1, r) + C(n-1, r-1)

Since there are no divisions involved (no multiplications too) the answer is easy and precise to
calculate even if the actual binomials would be very large.

So, back to calculating

[ n! / ( r! (n-r)! ) ] % M, you can convert it to

n * (n-1) * ... * (n-r+1) * r^(-1) * (r-1)^(-1) * ... * 1^(-1)

Of course, each product is maintained modulo M.

This may be fast enough for problems where M is large and r is small.

But sometimes, n and r can be very large. Fortunately, such problems always have a small
enough M :D

The trick is, you pre-calculate factorials, modulo M and pre-calculate inverse factorials, modulo
M.

fact[n] = n * fact[n-1] % M
ifact[n] = modular_inverse(n) * ifact[n-1] % M

Modular Multiplicative Inverse for a prime M is in fact very simple. From Fermat's Little
Theorem

A^(M-1) % M = 1

Hence, A * A^(M-2) % M = 1
Or in other words,

A^(-1) % M = A^(M-2) % M, which is easy (and fast) to find using repeated squaring.

One last note to make this complete: modular inverses can also be found
using the Extended Euclid's Algorithm. I have only had to use it once or twice among all the
problems I ever solved.

Suffix Automaton
geeksforgeeks.org

Searching for Patterns | Set 5 (Finite Automata)

Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[])
that prints all occurrences of pat[] in txt[]. You may assume that n > m.

Examples:
1) Input:

txt[] = "THIS IS A TEST TEXT"


pat[] = "TEST"

Output:

Pattern found at index 10

2) Input:

txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"

Output:

Pattern found at index 0


Pattern found at index 9
Pattern found at index 13

Pattern searching is an important problem in computer science. When we search for a string
in a notepad/word file, browser or database, pattern searching algorithms are used to show the
search results.

We have discussed the following algorithms in the previous posts:

Naive Algorithm
KMP Algorithm
Rabin Karp Algorithm

In this post, we will discuss the Finite Automata (FA) based pattern searching algorithm. In the FA
based algorithm, we preprocess the pattern and build a 2D array that represents a Finite
Automaton. Construction of the FA is the main tricky part of this algorithm. Once the FA is built,
the searching is simple. In search, we simply need to start from the first state of the automaton and
the first character of the text. At every step, we consider the next character of the text, look up the next state
in the built FA and move to the new state. If we reach the final state, then the pattern is found in the text. Time
complexity of the search process is O(n).
Before we discuss FA construction, let us take a look at the following FA for pattern
ACACAGA.

The above diagrams represent graphical and tabular representations of the FA for pattern ACACAGA.

Number of states in the FA will be M+1 where M is the length of the pattern. The main thing in
constructing the FA is to get the next state from the current state for every possible character. Given a
character x and a state k, we can get the next state by considering the string pat[0..k-1]x, which
is basically the concatenation of pattern characters pat[0], pat[1] ... pat[k-1] and the character x. The
idea is to get the length of the longest prefix of the given pattern such that the prefix is also a suffix of
pat[0..k-1]x. The value of the length gives us the next state. For example, let us see how to get the
next state from current state 5 and character 'C' in the above diagram. We need to consider the
string pat[0..5]C, which is "ACACAC". The length of the longest prefix of the pattern such
that the prefix is a suffix of "ACACAC" is 4 ("ACAC"). So the next state (from state 5) is 4 for
character 'C'.

In the following code, computeTF() constructs the FA. The time complexity of the computeTF()
is O(m^3*NO_OF_CHARS) where m is length of the pattern and NO_OF_CHARS is size of
alphabet (total number of possible characters in pattern and text). The implementation tries all
possible prefixes starting from the longest possible that can be a suffix of pat[0..k-1]x. There
are better implementations to construct FA in O(m*NO_OF_CHARS) (Hint: we can use
something like lps array construction in KMP algorithm). We have covered the better
implementation in our next post on pattern searching.

#include<stdio.h>
#include<string.h>
#define NO_OF_CHARS 256

int getNextState(char *pat, int M, int state, int x)


{
// If the character c is same as next character in pattern,
// then simply increment state
if (state < M && x == pat[state])
return state+1;

int ns, i; // ns stores the result which is next state

// ns finally contains the longest prefix which is also suffix


// in "pat[0..state-1]c"

// Start from the largest possible value and stop when you find
// a prefix which is also suffix
for (ns = state; ns > 0; ns--)


{
if(pat[ns-1] == x)
{
for(i = 0; i < ns-1; i++)
{
if (pat[i] != pat[state-ns+1+i])
break;
}
if (i == ns-1)
return ns;
}
}

return 0;
}

/* This function builds the TF table which represents Finite Automata for a
given pattern */
void computeTF(char *pat, int M, int TF[][NO_OF_CHARS])
{
int state, x;
for (state = 0; state <= M; ++state)
for (x = 0; x < NO_OF_CHARS; ++x)
TF[state][x] = getNextState(pat, M, state, x);
}

/* Prints all occurrences of pat in txt */


void search(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);

int TF[M+1][NO_OF_CHARS];

computeTF(pat, M, TF);

// Process txt over FA.


int i, state=0;
for (i = 0; i < N; i++)
{
state = TF[state][txt[i]];
if (state == M)
{
printf ("\nPattern found at index %d", i-M+1);
}
}
}

// Driver program to test above function


int main()
{
char *txt = "AABAACAADAABAAABAA";
char *pat = "AABA";
search(pat, txt);
return 0;
}

Output:

Pattern found at index 0


Pattern found at index 9
Pattern found at index 13

Pattern Searching | Set 6 (Efficient Construction of Finite Automata)

The FA (Finite Automata) construction method discussed in previous post takes


O((m^3)*NO_OF_CHARS) time. FA can be constructed in O(m*NO_OF_CHARS) time. In this
post, we will discuss the O(m*NO_OF_CHARS) algorithm for FA construction. The idea is
similar to lps (longest prefix suffix) array construction discussed in the KMP algorithm. We use
previously filled rows to fill a new row.

The above diagrams represent graphical and tabular representations of the FA for pattern ACACAGA.

Algorithm:
1) Fill the first row. All entries in first row are always 0 except the entry for pat[0] character. For
pat[0] character, we always need to go to state 1.
2) Initialize lps as 0. lps for the first index is always 0.
3) Do following for rows at index i = 1 to M. (M is the length of the pattern)
..a) Copy the entries from the row at index equal to lps.
..b) Update the entry for the pat[i] character to i+1.
..c) Update lps as lps = TF[lps][pat[i]], where TF is the 2D array which is being constructed.

Implementation
Following is C implementation for the above algorithm.

#include<stdio.h>
#include<string.h>
#define NO_OF_CHARS 256

/* This function builds the TF table which represents Finite Automata for a
given pattern */
void computeTransFun(char *pat, int M, int TF[][NO_OF_CHARS])
{
int i, lps = 0, x;

// Fill entries in first row


for (x =0; x < NO_OF_CHARS; x++)
TF[0][x] = 0;
TF[0][pat[0]] = 1;

// Fill entries in other rows


for (i = 1; i<= M; i++)
{
// Copy values from row at index lps
for (x = 0; x < NO_OF_CHARS; x++)
TF[i][x] = TF[lps][x];

// Update the entry corresponding to this character


TF[i][pat[i]] = i + 1;

// Update lps for next row to be filled


if (i < M)
lps = TF[lps][pat[i]];
}
}

/* Prints all occurrences of pat in txt */


void search(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);

int TF[M+1][NO_OF_CHARS];

computeTransFun(pat, M, TF);

// process text over FA.


int i, j=0;
for (i = 0; i < N; i++)
{
j = TF[j][txt[i]];
if (j == M)
{
printf ("\n pattern found at index %d", i-M+1);
}
}
}

/* Driver program to test above function */


int main()
{
char *txt = "GEEKS FOR GEEKS";
char *pat = "GEEKS";
search(pat, txt);
getchar();
return 0;
}

Output:

pattern found at index 0


pattern found at index 10

Time Complexity for FA construction is O(M*NO_OF_CHARS). The code for search is same as
the previous post and time complexity for it is O(n).

Lowest Common Ancestor

geeksforgeeks.org

Lowest Common Ancestor in a Binary Tree | Set 1

Given a binary tree (not a binary search tree) and two values say n1 and n2, write a program to
find the least common ancestor.

Following is definition of LCA from Wikipedia:


Let T be a rooted tree. The lowest common ancestor between two nodes n1 and n2 is defined as
the lowest node in T that has both n1 and n2 as descendants (where we allow a node to be a
descendant of itself).

The LCA of n1 and n2 in T is the shared ancestor of n1 and n2 that is located farthest from the
root. Computation of lowest common ancestors may be useful, for instance, as part of a
procedure for determining the distance between pairs of nodes in a tree: the distance from n1 to
n2 can be computed as the distance from the root to n1, plus the distance from the root to n2,
minus twice the distance from the root to their lowest common ancestor. (Source Wiki)

We have discussed an efficient solution to find LCA in a Binary Search Tree. In a Binary Search
Tree, using BST properties, we can find LCA in O(h) time where h is the height of the tree. Such an
implementation is not possible in a Binary Tree as keys of Binary Tree nodes don't follow any order.
Following are different approaches to find LCA in a Binary Tree.

Method 1 (By Storing root to n1 and root to n2 paths):


Following is a simple O(n) algorithm to find the LCA of n1 and n2.
1) Find path from root to n1 and store it in a vector or array.
2) Find path from root to n2 and store it in another vector or array.
3) Traverse both paths till the values in arrays are same. Return the common element just before
the mismatch.

Following is C++ implementation of above algorithm.

// A O(n) solution to find LCA of two given values n1 and n2


#include <iostream>
#include <vector>
using namespace std;

// A Binary Tree node


struct Node
{
int key;
struct Node *left, *right;
};

// Utility function creates a new binary tree node with given key
Node * newNode(int k)
{
Node *temp = new Node;
temp->key = k;
temp->left = temp->right = NULL;
return temp;
}

// Finds the path from the root node to the node with the given key k. Stores the
// path in a vector path[], returns true if path exists otherwise false
bool findPath(Node *root, vector<int> &path, int k)
{
// base case
if (root == NULL) return false;

// Store this node in path vector. The node will be removed if


// not in path from root to k
path.push_back(root->key);

// See if the k is same as root's key


if (root->key == k)
return true;

// Check if k is found in left or right sub-tree


if ( (root->left && findPath(root->left, path, k)) ||
(root->right && findPath(root->right, path, k)) )
return true;

// If not present in subtree rooted with root, remove root from


// path[] and return false
path.pop_back();
return false;
}

// Returns LCA if node n1, n2 are present in the given binary tree,
// otherwise return -1
int findLCA(Node *root, int n1, int n2)
{
// to store paths to n1 and n2 from the root
vector<int> path1, path2;

// Find paths from root to n1 and root to n2. If either n1 or n2


// is not present, return -1
if ( !findPath(root, path1, n1) || !findPath(root, path2, n2))
return -1;

/* Compare the paths to get the first different value */


int i;
for (i = 0; i < path1.size() && i < path2.size() ; i++)
if (path1[i] != path2[i])
break;
return path1[i-1];
}

// Driver program to test above functions


int main()
{
// Let us create the Binary Tree shown in above diagram.
Node * root = newNode(1);
root->left = newNode(2);
root->right = newNode(3);
root->left->left = newNode(4);
root->left->right = newNode(5);
root->right->left = newNode(6);
root->right->right = newNode(7);
cout << "LCA(4, 5) = " << findLCA(root, 4, 5);
cout << "\nLCA(4, 6) = " << findLCA(root, 4, 6);
cout << "\nLCA(3, 4) = " << findLCA(root, 3, 4);
cout << "\nLCA(2, 4) = " << findLCA(root, 2, 4);


return 0;
}

Output:

LCA(4, 5) = 2
LCA(4, 6) = 1
LCA(3, 4) = 1
LCA(2, 4) = 2

Time Complexity: Time complexity of the above solution is O(n). The tree is traversed twice,
and then path arrays are compared.
Thanks to Ravi Chandra Enaganti for suggesting the initial solution based on this method.

Method 2 (Using Single Traversal)


Method 1 finds the LCA in O(n) time, but requires two tree traversals plus extra space for the path
arrays. If we assume that the keys n1 and n2 are present in the Binary Tree, we can find the LCA using a
single traversal of the Binary Tree and without extra storage for path arrays.
The idea is to traverse the tree starting from the root. If any of the given keys (n1 and n2) matches
with the root, then the root is the LCA (assuming that both keys are present). If the root doesn't match with any
of the keys, we recur for the left and right subtrees. The node which has one key present in its left
subtree and the other key present in its right subtree is the LCA. If both keys lie in the left subtree, then the
left subtree contains the LCA; otherwise the LCA lies in the right subtree.

/* Program to find LCA of n1 and n2 using one traversal of Binary Tree */


#include <iostream>
using namespace std;

// A Binary Tree Node


struct Node
{
struct Node *left, *right;
int key;
};

// Utility function to create a new tree Node


Node* newNode(int key)
{
Node *temp = new Node;
temp->key = key;
temp->left = temp->right = NULL;
return temp;
}

// This function returns pointer to LCA of two given values n1 and n2.
// This function assumes that n1 and n2 are present in Binary Tree
struct Node *findLCA(struct Node* root, int n1, int n2)
{
// Base case
if (root == NULL) return NULL;

// If either n1 or n2 matches with root's key, report


// the presence by returning root (Note that if a key is
// ancestor of the other, then the ancestor key becomes LCA)
if (root->key == n1 || root->key == n2)
return root;

// Look for keys in left and right subtrees


Node *left_lca = findLCA(root->left, n1, n2);
Node *right_lca = findLCA(root->right, n1, n2);

// If both of the above calls return non-NULL, then one key
// is present in one subtree and the other is present in the other,
// so this node is the LCA
if (left_lca && right_lca) return root;

// Otherwise check if left subtree or right subtree is LCA


return (left_lca != NULL)? left_lca: right_lca;
}

// Driver program to test above functions


int main()
{
// Let us create binary tree given in the above example
Node * root = newNode(1);
root->left = newNode(2);
root->right = newNode(3);
root->left->left = newNode(4);
root->left->right = newNode(5);
root->right->left = newNode(6);
root->right->right = newNode(7);
cout << "LCA(4, 5) = " << findLCA(root, 4, 5)->key;
cout << "\nLCA(4, 6) = " << findLCA(root, 4, 6)->key;
cout << "\nLCA(3, 4) = " << findLCA(root, 3, 4)->key;
cout << "\nLCA(2, 4) = " << findLCA(root, 2, 4)->key;
return 0;
}

Output:

LCA(4, 5) = 2
LCA(4, 6) = 1
LCA(3, 4) = 1
LCA(2, 4) = 2

Thanks to Atul Singh for suggesting this solution.

Time Complexity: Time complexity of the above solution is O(n) as the method does a simple
tree traversal in bottom up fashion.
Note that the above method assumes that keys are present in Binary Tree. If one key is present
and other is absent, then it returns the present key as LCA (Ideally should have returned NULL).

We can extend this method to handle all cases by passing two boolean variables v1 and v2. v1 is
set as true when n1 is present in tree and v2 is set as true if n2 is present in tree.

/* Program to find LCA of n1 and n2 using one traversal of Binary Tree.


It handles all cases even when n1 or n2 is not there in Binary Tree */
#include <iostream>
using namespace std;

// A Binary Tree Node


struct Node
{
struct Node *left, *right;
int key;
};

// Utility function to create a new tree Node


Node* newNode(int key)
{
Node *temp = new Node;
temp->key = key;
temp->left = temp->right = NULL;
return temp;
}

// This function returns pointer to LCA of two given values n1 and n2.
// v1 is set as true by this function if n1 is found
// v2 is set as true by this function if n2 is found
struct Node *findLCAUtil(struct Node* root, int n1, int n2, bool &v1, bool &v2)
{
// Base case
if (root == NULL) return NULL;

// If either n1 or n2 matches with root's key, report the presence


// by setting v1 or v2 as true and return root (Note that if a key
// is ancestor of other, then the ancestor key becomes LCA)
if (root->key == n1)
{
v1 = true;
return root;
}
if (root->key == n2)
{
v2 = true;
return root;
}

// Look for keys in left and right subtrees


Node *left_lca = findLCAUtil(root->left, n1, n2, v1, v2);
Node *right_lca = findLCAUtil(root->right, n1, n2, v1, v2);

// If both of the above calls return non-NULL, then one key
// is present in one subtree and the other is present in the other,
// so this node is the LCA
if (left_lca && right_lca) return root;

// Otherwise check if left subtree or right subtree is LCA


return (left_lca != NULL)? left_lca: right_lca;
}

// Returns true if key k is present in tree rooted with root


bool find(Node *root, int k)
{
// Base Case
if (root == NULL)
return false;

// If key is present at root, or in left subtree or right subtree,


// return true;
if (root->key == k || find(root->left, k) || find(root->right, k))
return true;

// Else return false


return false;
}

// This function returns LCA of n1 and n2 only if both n1 and n2 are present
// in tree, otherwise returns NULL;
Node *findLCA(Node *root, int n1, int n2)
{
// Initialize n1 and n2 as not visited
bool v1 = false, v2 = false;

// Find lca of n1 and n2 using the technique discussed above


Node *lca = findLCAUtil(root, n1, n2, v1, v2);

// Return LCA only if both n1 and n2 are present in tree


if ((v1 && v2) || (v1 && find(lca, n2)) || (v2 && find(lca, n1)))
return lca;

// Else return NULL


return NULL;
}

// Driver program to test above functions


int main()
{
// Let us create binary tree given in the above example
Node * root = newNode(1);
root->left = newNode(2);
root->right = newNode(3);
root->left->left = newNode(4);
root->left->right = newNode(5);
root->right->left = newNode(6);
root->right->right = newNode(7);
Node *lca = findLCA(root, 4, 5);
if (lca != NULL)
cout << "LCA(4, 5) = " << lca->key;
else
cout << "Keys are not present ";

lca = findLCA(root, 4, 10);


if (lca != NULL)
cout << "\nLCA(4, 10) = " << lca->key;
else
cout << "\nKeys are not present ";

return 0;
}

Output:

LCA(4, 5) = 2
Keys are not present

Thanks to Dhruv for suggesting this extended solution.

We will soon be discussing more solutions to this problem, considering the following:
1) If there are many LCA queries and we can take some extra preprocessing time to reduce the
time taken to find the LCA.
2) If a parent pointer is given with every node.

Counting Inversions
geeksforgeeks.org

Count Inversions in an array

Inversion Count for an array indicates how far (or close) the array is from being sorted. If the array
is already sorted then the inversion count is 0. If the array is sorted in reverse order then the inversion count
is the maximum.
Formally speaking, two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j.

Example:
The sequence 2, 4, 1, 3, 5 has three inversions (2, 1), (4, 1), (4, 3).

METHOD 1 (Simple)
For each element, count number of elements which are on right side of it and are smaller than it.

#include <stdio.h>

int getInvCount(int arr[], int n)
{
    int inv_count = 0;
    int i, j;

    for (i = 0; i < n - 1; i++)
        for (j = i + 1; j < n; j++)
            if (arr[i] > arr[j])
                inv_count++;

    return inv_count;
}

/* Driver program to test above function */
int main()
{
    int arr[] = {1, 20, 6, 4, 5};
    printf(" Number of inversions are %d \n", getInvCount(arr, 5));
    return 0;
}

Time Complexity: O(n^2)

METHOD 2 (Enhanced Merge Sort)


Suppose we know the number of inversions in the left half and right half of the array (let them be inv1
and inv2); what kinds of inversions are not accounted for in inv1 + inv2? The answer is the
inversions we have to count during the merge step. Therefore, to get the total number of inversions, we
need to add the number of inversions in the left subarray, the right subarray and merge().

How to get number of inversions in merge()?


In the merge process, let i be used for indexing the left sub-array and j for the right sub-array. At any step in
merge(), if a[i] is greater than a[j], then there are (mid - i) inversions, because the left and right
subarrays are sorted, so all the remaining elements in the left subarray (a[i], a[i+1] ... a[mid-1]) will
be greater than a[j].

The complete picture:

Implementation:

#include <stdio.h>
#include <stdlib.h>

int _mergeSort(int arr[], int temp[], int left, int right);
int merge(int arr[], int temp[], int left, int mid, int right);

/* This function sorts the input array and returns the
   number of inversions in the array */
int mergeSort(int arr[], int array_size)
{
    int *temp = (int *)malloc(sizeof(int) * array_size);
    return _mergeSort(arr, temp, 0, array_size - 1);
}

/* An auxiliary recursive function that sorts the input array and
   returns the number of inversions in the array. */
int _mergeSort(int arr[], int temp[], int left, int right)
{
    int mid, inv_count = 0;
    if (right > left)
    {
        /* Divide the array into two parts and call _mergeSort()
           for each of the parts */
        mid = (right + left) / 2;

        /* Inversion count will be sum of inversions in left-part, right-part
           and number of inversions in merging */
        inv_count = _mergeSort(arr, temp, left, mid);
        inv_count += _mergeSort(arr, temp, mid + 1, right);

        /* Merge the two parts */
        inv_count += merge(arr, temp, left, mid + 1, right);
    }
    return inv_count;
}

/* This function merges two sorted arrays and returns inversion count in
   the arrays. */
int merge(int arr[], int temp[], int left, int mid, int right)
{
    int i, j, k;
    int inv_count = 0;

    i = left; /* i is index for left subarray */
    j = mid;  /* j is index for right subarray */
    k = left; /* k is index for resultant merged subarray */

    while ((i <= mid - 1) && (j <= right))
    {
        if (arr[i] <= arr[j])
            temp[k++] = arr[i++];
        else
        {
            temp[k++] = arr[j++];

            /* this is tricky -- see above explanation/diagram for merge() */
            inv_count = inv_count + (mid - i);
        }
    }

    /* Copy the remaining elements of left subarray
       (if there are any) to temp */
    while (i <= mid - 1)
        temp[k++] = arr[i++];

    /* Copy the remaining elements of right subarray
       (if there are any) to temp */
    while (j <= right)
        temp[k++] = arr[j++];

    /* Copy back the merged elements to original array */
    for (i = left; i <= right; i++)
        arr[i] = temp[i];

    return inv_count;
}

/* Driver program to test above functions */
int main()
{
    int arr[] = {1, 20, 6, 4, 5};
    printf(" Number of inversions are %d \n", mergeSort(arr, 5));
    return 0;
}

Note that above code modifies (or sorts) the input array. If we want to count only inversions then
we need to create a copy of original array and call mergeSort() on copy.

Time Complexity: O(nlogn)


Algorithmic Paradigm: Divide and Conquer

pavelsimo.blogspot.in

Counting inversions in an array using Binary Indexed Tree

Introduction

Given an array A of N integers, an inversion of the array is defined as any pair of indexes (i, j) such that
i < j and A[i] > A[j].

For example, the array a={2,3,1,5,4} has three inversions: (1,3), (2,3), (4,5), for the pairs of
entries (2,1), (3,1), (5,4).

Traditionally the problem of counting the inversions in an array is solved by using a modified version of
Merge Sort. In this article we are going to explain another approach using a Binary Indexed Tree (BIT, also
known as a Fenwick Tree). The benefit of this method is that once you understand its mechanics, it can be
easily extended to many other problems.

Prerequisite

This article assumes that you have some basic knowledge of Binary Indexed Trees; if not, please first refer
to this tutorial.

Replacing the values of the array with indexes

Usually when we are implementing a BIT it is necessary to map the original values of the array to a new
range with values between [1, N], where N is the size of the array. This is due to the following reasons:

(1) The values in one or more A[i] entries are too high or too low
(e.g. 10^12 or -10^12).

For example imagine that we are given an array of 3 integers:

{1, 10^12, 5}

This means that if we want to construct a frequency table for our BIT data structure, we are going to
need an array of at least 10^12 elements. Believe me, not a good idea...

(2) The values in one or more A[i] entries are negative.


Because we are using arrays it's not possible to handle negative values in our BIT frequency table (e.g. we
are not able to do freq[-12]).

A simple way to deal with these issues is to replace the original values of the target array with indexes that
maintain their relative order.

For example, given the following array:

The first step is to make a copy of the original array A, let's call it B. Then we proceed to sort B in
non-descending order as follows:

Using binary search over the array B we are going to seek every element of the array A, and store
the resulting position indexes (1-based) in the new array A.

binary_search(B,9)=4 found at position 4 of the array B


binary_search(B,1)=1 found at position 1 of the array B
binary_search(B,0)=0 found at position 0 of the array B
binary_search(B,5)=3 found at position 3 of the array B
binary_search(B,4)=2 found at position 2 of the array B

The resulting array after incrementing each position by one is the following:

The following C++ code fragment illustrates the ideas previously explained:

for (int i = 0; i < N; ++i)
    B[i] = A[i];        // copy the content of array A to array B

sort(B, B + N);         // sort array B

for (int i = 0; i < N; ++i) {
    int ind = int(lower_bound(B, B + N, A[i]) - B);
    A[i] = ind + 1;
}

Counting inversions with the accumulated frequency

The idea of counting the inversions with a BIT is not too complicated: we start iterating over our target array in
reverse order; at each point we should ask ourselves the following question: "How many numbers less
than A[i] have already occurred in the array so far?" This number corresponds to the total number of
inversions beginning at the given index. For example, consider the following array {3,2,1}: when we
reach the element 3 we have already seen two terms that are less than the number 3, which are 2 and 1. This
means that the total number of inversions beginning at the term 3 is two.

Having these ideas in mind let's see how we can apply the BIT to answer the previous question:

1. read(idx) - accumulated frequency from index 1 to idx
2. update(idx,val) - update the accumulated frequency at point idx and update the tree.
3. cumulative frequency array - this array represents the cumulative frequencies (e.g.
c[3]=f[1]+f[2]+f[3]); as a note to the reader, this array is not part of the BIT itself - in this
article we use it as a way of illustrating the inner workings of this data structure.

Step 1: Initially the cumulative frequency table is empty; we start the process with the element 3, the
last one in our array.

how many numbers less than 3 have we seen so far?

x = read(3 - 1) = 0
inv_counter = inv_counter + x

update the count of 3's so far:

update(3, +1)
inv_counter = 0

Step 2: The cumulative frequency of the value 3 was increased in the previous step; this is why
read(4 - 1) counts the inversion (4,3).

how many numbers less than 4 have we seen so far?

x = read(4 - 1) = 1
inv_counter = inv_counter + x

update the count of 4's so far:

update(4, +1)
inv_counter = 1

Step 3: The term 1 is the lowest in our array; this is why there are no inversions beginning at 1.

how many numbers less than 1 have we seen so far?

x = read(1 - 1) = 0
inv_counter = inv_counter + x

update the count of 1's so far:

update(1, +1)
inv_counter = 1

Step 4: There is only one inversion involving the values 2 and 1.

how many numbers less than 2 have we seen so far?

x = read(2 - 1) = 1
inv_counter = inv_counter + x

update the count of 2's so far:

update(2, +1)
inv_counter = 2

Step 5: There are 4 inversions involving the term 5: (5,2), (5,1), (5,4) and (5,3).

how many numbers less than 5 have we seen so far?

x = read(5 - 1) = 4
inv_counter = inv_counter + x

update the count of 5's so far:

update(5, +1)
inv_counter = 6

The total number of inversions in the array is 6.


The overall time complexity of this solution is O(N log N); the following code corresponds to a
complete implementation of the ideas explained in this tutorial:

#include <algorithm>
#include <cstdio>
#include <cstring>

using namespace std;

typedef long long llong;

const int MAXN = 500020;

llong tree[MAXN], A[MAXN], B[MAXN];

llong read(int idx) {
    llong sum = 0;
    while (idx > 0) {
        sum += tree[idx];
        idx -= (idx & -idx);
    }
    return sum;
}

void update(int idx, llong val) {
    while (idx < MAXN) {   // stay inside the tree array
        tree[idx] += val;
        idx += (idx & -idx);
    }
}

int main(int argc, char *argv[]) {
    int N;
    while (1 == scanf("%d", &N)) {
        if (!N) break;
        memset(tree, 0, sizeof(tree));
        for (int i = 0; i < N; ++i) {
            scanf("%lld", &A[i]);
            B[i] = A[i];
        }
        sort(B, B + N);
        for (int i = 0; i < N; ++i) {
            int rank = int(lower_bound(B, B + N, A[i]) - B);
            A[i] = rank + 1;
        }
        llong inv_count = 0;
        for (int i = N - 1; i >= 0; --i) {
            llong x = read(A[i] - 1);
            inv_count += x;
            update(A[i], 1);
        }
        printf("%lld\n", inv_count);
    }
    return 0;
}
Extended Euclid's Algorithm


Codechef.com

We now look at the Extended Euclid's Algorithm, which, as you will be able to see, can be seen as the
reciprocal of modular exponentiation. We begin with the Euclidean Algorithm itself.

Foreword about the Euclidean Algorithm



The Euclidean Algorithm is possibly one of the oldest numerical algorithms still in use (its first
appearance goes back to 300 BC, making it over 2000 years old), and it is used to find the GCD
of two numbers, i.e., the greatest common divisor of both numbers.

It's easily implemented in C++ as:

#include <iostream>
#include <algorithm>
using namespace std;
#define builtin_gcd __gcd

int gcd(int a, int b)
{
    if (b == 0)
        return a;
    else
        return gcd(b, a % b);
}

int main()
{
cout << gcd(252,105) << endl;
cout << builtin_gcd(252,105) << endl;
return 0;
}

Also, please note that if you include the header <algorithm> in your code, you can actually use
the built-in gcd function, by renaming the language function __gcd (note the two underscore
characters to the left of gcd) to something you would like (in the code above, I renamed it to
builtin_gcd, just to distinguish it from my own implemented gcd function).

Note that I suggest a renaming of the built-in function solely for you not to use the full name
gcd, but something more convenient, but, you can also use gcd and everything will work
completely fine as well. :)

Returning to our algorithm discussion, it's easy to see that this algorithm finds the greatest
number that divides both numbers passed as arguments to the gcd() function.

The gcd() function has some interesting properties related to the arguments it receives, as well as to their
number. Two interesting properties are:

gcd(a,b) = 1 implies that the integers a and b are coprime (this will have implications
further on in this text);
It's possible to find the gcd of several numbers by finding the pairwise gcd of every 2
numbers, i.e., say we have three numbers a,b,c, then gcd(a,b,c) = gcd(gcd(a, b), c);
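The second property amounts to folding gcd over the list of numbers. A minimal Python sketch, using the standard library's math.gcd (gcd_many is a hypothetical helper name, not from the original text):

```python
from functools import reduce
from math import gcd

def gcd_many(*nums):
    # gcd(a, b, c, ...) = gcd(gcd(a, b), c, ...), folding pairwise from the left
    return reduce(gcd, nums)

print(gcd_many(252, 105, 21))  # 21
```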

This sums up the basic properties of the gcd, which will allow us to understand a small extension
to its algorithm, which will, in turn, allow us to understand how division works over a given
modulo, m (concept commonly known as modular multiplicative inverse).

An extension of Euclid's Algorithm

The main motivation for devising an extension of the original algorithm comes from the fact
that we might want to actually check that a given integer, say, d, is indeed the gcd of
two other integers, say a and b, i.e., we want to check that d = gcd(a,b).

As you might have noticed, it's not enough to check that d divides both a and b, to safely claim
that d is the largest number that does so, as this only shows that d is a common factor and not
necessarily the largest one.

To do so, we need to turn to a mathematical identity called Bézout's identity.

Bézout's identity

Bézout's identity states that given two numbers a and b, passed as arguments to the gcd
function, we can be sure that d = gcd(a,b) if and only if there are two integers x and y such that
the identity:

d = ax + by

holds.

This is, in very simple terms, Bézout's identity. (An outline of a proof may be found online.)

What our extended Euclid's algorithm will allow us to do is to simultaneously find the value of
d = gcd(a,b) and the values of x and y that actually "solve" (verify) Bézout's identity.

A simple implementation of the Extended Euclid's Algorithm in Python

Below you can find an implementation of the recursive version of this algorithm in the Python
language (I must admit I hadn't implemented it myself before, so I am also learning as I go,
although I believe implementing the non-recursive version in C++ shouldn't be too complicated):

def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

This now solves, as desired, our original issue and allows us to conclude without any doubt that
the value d on our original equation is indeed the gcd(a,b).
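As a quick sanity check, we can verify that the returned x and y really do satisfy Bézout's identity (egcd is repeated from above so the snippet runs on its own):

```python
def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

g, x, y = egcd(252, 105)
print(g, x, y)  # 21 -2 5
# Bézout's identity: d = ax + by
assert g == 252 * x + 105 * y
```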

An application: Computing the modular multiplicative inverse of a modulo m



The most common application of this algorithm (at least, as far as I know and in the context of
programming competitions) is the computation of the modular multiplicative inverse of a given
integer a modulo m.

It is given by:

x ≡ a^(-1) (mod m)

and mathematically speaking (as in, quoting Wikipedia), it is the multiplicative inverse in the
ring of integers modulo m.

What the above means is that we can multiply both sides by a and obtain the identity:

ax ≡ 1 (mod m)

This means that m is a divisor of ax - 1, which means we can write something like:

ax - 1 = qm,

where q is an integer multiple that will be discarded.

If we rearrange the above as:

ax - mq = 1

we can now see that the above equation has the exact same form as the equation that the
Extended Euclid's Algorithm solves (with a and m given as original parameters, x being the
inverse and q being a multiple we can discard), with a very subtle but important difference:
gcd(a,m) NEEDS to be 1.

What this basically means is that it is mandatory that a is coprime to the modulus, or else the
inverse won't exist.

To wrap this text up, I will now leave you the code in Python which finds the modular
multiplicative inverse of a modulo m using the Extended Euclid's Algorithm:

def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        return None  # modular inverse does not exist
    else:
        return x % m
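For instance, to invert 3 modulo 11 (egcd is repeated here so the snippet is self-contained):

```python
def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    g, y, x = egcd(b % a, a)
    return (g, x - (b // a) * y, y)

def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        return None  # modular inverse does not exist
    return x % m

print(modinv(3, 11))  # 4, since 3 * 4 = 12 ≡ 1 (mod 11)
print(modinv(4, 8))   # None, since gcd(4, 8) != 1
```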

Further explorations and a final note



Number Theory is a beautiful field of Mathematics, but it is at the same time, one of the most
vast and in my personal opinion, hardest fields to master.

The need of gcd(a,m) = 1, allows one to exploit this fact and use Euler's Theorem, along with
Euler's Totient Function to find the modular inverse as well.

In fact, on the popular and most widely spread case where the modulus, m, happens to be a prime
number, we can use the simple formula:

a^(m-2) (mod m)

to find the multiplicative inverse of a.

This result follows from Euler's Theorem directly.
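In Python, the built-in three-argument pow computes exactly this modular exponentiation, so for a prime modulus the two approaches agree:

```python
m = 11  # a prime modulus
a = 3

# Fermat's little theorem: a^(m-2) ≡ a^(-1) (mod m) when m is prime
inverse = pow(a, m - 2, m)
print(inverse)  # 4
assert (a * inverse) % m == 1
```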

Suffix Tree
cise.ufl.edu

The suffix tree for S is actually the compressed trie for the nonempty suffixes of the string S.
Since a suffix tree is a compressed trie, we sometimes refer to the tree as a trie and to its subtrees
as subtries.

The (nonempty) suffixes of the string S = peeper are peeper, eeper, eper, per, er, and
r. Therefore, the suffix tree for the string peeper is the compressed trie that contains the
elements (which are also the keys) peeper, eeper, eper, per, er, and r. The alphabet for
the string peeper is {e, p, r}. Therefore, the radix of the compressed trie is 3. If necessary, we
may use the mapping e -> 0, p -> 1, r -> 2, to convert from the letters of the string to
numbers. This conversion is necessary only when we use a node structure in which each node
has an array of child pointers. Figure 1 shows the compressed trie (with edge information) for the
suffixes of peeper. This compressed trie is also the suffix tree for the string peeper.

Figure 1 Compressed trie for the suffixes of peeper



Since the data in the information nodes D-I are the suffixes of peeper, each information node
need retain only the start index of the suffix it contains. When the letters in peeper are indexed
from left to right beginning with the index 1, the information nodes D-I need only retain the
indexes 6, 2, 3, 5, 1, and 4, respectively. Using the index stored in an information node, we
can access the suffix from the string S. Figure 2 shows the suffix tree of Figure 1 with each
information node containing a suffix index.

Figure 2 Modified compressed trie for the suffixes of peeper

The first component of each branch node is a reference to an element in that subtrie. We may
replace the element reference by the index of the first digit of the referenced element. Figure 3
shows the resulting compressed trie. We shall use this modified form as the representation for the
suffix tree.

Figure 3 Suffix tree for peeper

When describing the search and construction algorithms for suffix trees, it is easier to deal with a
drawing of the suffix tree in which the edges are labeled by the digits used in the move from a
branch node to a child node. The first digit of the label is the digit used to determine which child

is moved to, and the remaining digits of the label give the digits that are skipped over. Figure
4 shows the suffix tree of Figure 3 drawn in this manner.

Figure 4 A more humane drawing of a suffix tree

In the more humane drawing of a suffix tree, the labels on the edges on any root to information
node path spell out the suffix represented by that information node. When the digit number for
the root is not 1, the humane drawing of a suffix tree includes a head node with an edge to the
former root. This edge is labeled with the digits that are skipped over.

The string represented by a node of a suffix tree is the string spelled by the labels on the path
from the root to that node. Node A of Figure 4 represents the empty string epsilon, node C
represents the string pe, and node F represents the string eper.

Since the keys in a suffix tree are of different length, we must ensure that no key is a proper
prefix of another (see Keys With Different Length). Whenever the last digit of string S appears
only once in S, no suffix of S can be a proper prefix of another suffix of S. In the string
peeper, the last digit is r, and this digit appears only once. Therefore, no suffix of peeper is a
proper prefix of another. The last digit of data is a, and this last digit appears twice in data.
Therefore, data has two suffixes ata and a that begin with a. The suffix a is a proper prefix of
the suffix ata.

When the last digit of the string S appears more than once in S we must append a new digit (say
#) to the suffixes of S so that no suffix is a prefix of another. Optionally, we may append the new
digit to S to get the string S#, and then construct the suffix tree for S#. When this optional route is
taken, the suffix tree has one more suffix (#) than the suffix tree obtained by appending the
symbol # to the suffixes of S.
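A small Python check of this prefix condition (has_prefix_suffix is a hypothetical helper name, not from the original text) illustrates why data needs the extra digit while peeper does not:

```python
def has_prefix_suffix(s):
    # True iff some suffix of s is a proper prefix of another suffix of s
    suf = sorted(s[i:] for i in range(len(s)))
    # in sorted order, a prefix can only immediately precede its extension
    return any(suf[j + 1].startswith(suf[j]) for j in range(len(suf) - 1))

print(has_prefix_suffix("peeper"))  # False: the last digit r is unique
print(has_prefix_suffix("data"))    # True: a is a prefix of ata
print(has_prefix_suffix("data#"))   # False after appending the terminator
```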

Let's Find That Substring


But First, Some Terminology
Let n = |S| denote the length (i.e., number of digits) of the string whose suffix tree we are to
build. We number the digits of S from left to right beginning with the number 1. S[i] denotes
the ith digit of S, and suffix(i) denotes the suffix S[i]...S[n] that begins at digit i, 1 <= i
<= n.

On With the Search


A fundamental observation used when searching for a pattern P in a string S is that P appears in S

(i.e., P is a substring of S) iff P is a prefix of some suffix of S.
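This observation can be checked naively in Python, as a brute-force stand-in for the O(|P|) tree search described next (occurs is a hypothetical helper name):

```python
def occurs(S, P):
    # P is a substring of S iff P is a prefix of some suffix of S
    return any(S[i:].startswith(P) for i in range(len(S)))

print(occurs("peeper", "per"))   # True
print(occurs("peeper", "eeee"))  # False
```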

Suppose that P = P[1]...P[k] = S[i]...S[i+k-1]. Then, P is a prefix of suffix(i). Since


suffix(i) is in our compressed trie (i.e., suffix tree), we can search for P by using the strategy
to search for a key prefix in a compressed trie.

Let's search for the pattern P = per in the string S = peeper. Imagine that we have already
constructed the suffix tree (Figure 4) for peeper. The search starts at the root node A. Since P[1]
= p, we follow the edge whose label begins with the digit p. When following this edge, we
compare the remaining digits of the edge label with successive digits of P. Since these remaining
label digits agree with the pattern digits, we reach the branch node C. In getting to node C, we
have used the first two digits of the pattern. The third digit of the pattern is r, and so, from node
C we follow the edge whose label begins with r. Since this edge has no additional digits in its
label, no additional digit comparisons are done and we reach the information node I. At this
time, the digits in the pattern have been exhausted and we conclude that the pattern is in the
string. Since an information node is reached, we conclude that the pattern is actually a suffix of
the string peeper. In the actual suffix tree representation (rather than in the humane drawing),
the information node I contains the index 4 which tells us that the pattern P = per begins at
digit 4 of peeper (i.e., P = suffix(4)). Further, we can conclude that per appears exactly once
in peeper; the search for a pattern that appears more than once terminates at a branch node, not
at an information node.

Now, let us search for the pattern P = eeee. Again, we start at the root. Since the first character
of the pattern is e, we follow the edge whose label begins with e and reach the node B. The next
digit of the pattern is also e, and so, from node B we follow the edge whose label begins with e.
In following this edge, we must compare the remaining digits per of the edge label with the
following digits ee of the pattern. We find a mismatch when the first pair (p,e) of digits are
compared and we conclude that the pattern does not appear in peeper.

Suppose we are to search for the pattern P = p. From the root, we follow the edge whose label
begins with p. In following this edge, we compare the remaining digits (only the digit e remains)
of the edge label with the following digits (there aren't any) of the pattern. Since the pattern is
exhausted while following this edge, we conclude that the pattern is a prefix of all keys in the
subtrie rooted at node C. We can find all occurrences of the pattern by traversing the subtrie
rooted at C and visiting the information nodes in this subtrie. If we want the location of just one
of the occurrences of the pattern, we can use the index stored in the first component of the
branch node C (see Figure 3). When a pattern exhausts while following the edge to node X, we
say that node X has been reached; the search terminates at node X.

When searching for the pattern P = rope, we use the first digit r of P and reach the information
node D. Since the pattern has not been exhausted, we must check the remaining digits of the
pattern against those of the key in D. This check reveals that the pattern is not a prefix of the key
in D, and so the pattern does not appear in peeper.

The last search we are going to do is for the pattern P = pepe. Starting at the root of Figure 4,

we move over the edge whose label begins with p and reach node C. The next unexamined digit
of the search pattern is p. So, from node C, we wish to follow the edge whose label begins with p.
Since no edge satisfies this requirement, we conclude that pepe does not appear in the string
peeper.

Other Nifty Things You Can Do with a Suffix Tree


Once we have set up the suffix tree for a string S, we can tell whether or not S contains a pattern
P in O(|P|) time. This means that if we have a suffix tree for the text of Shakespeare's play
``Romeo and Juliet,'' we can determine whether or not the phrase wherefore art thou appears
in this play with lightning speed. In fact, the time taken will be that needed to compare up to 18
(the length of the search pattern) letters/digits. The search time is independent of the length of
the play.

Other interesting things you can do at lightning speed are:

1. Find all occurrences of a pattern P. This is done by searching the suffix tree for P. If P
appears at least once, the search terminates successfully either at an information node or
at a branch node. When the search terminates at an information node, the pattern
occurs exactly once. When we terminate at a branch node X, all places where the pattern
occurs can be found by visiting the information nodes in the subtrie rooted at X. This
visiting can be done in time linear in the number of occurrences of the pattern if we

(a)
Link all of the information nodes in the suffix tree into a chain, the linking is done in
lexicographic order of the represented suffixes (which also is the order in which the
information nodes are encountered in a left to right scan of the information nodes). The
information nodes of Figure 4 will be linked in the order E, F, G, H, I, D.
(b)
In each branch node, keep a reference to the first and last information node in the subtrie
of which that branch node is the root. In Figure 4, nodes A, B, and C keep the pairs (E,
D), (E, G), and (H,I), respectively. We use the pair (firstInformationNode,
lastInformationNode) to traverse the information node chain starting at
firstInformationNode and ending at lastInformationNode. This traversal yields all
occurrences of patterns that begin with the string spelled by the edge labels from the root
to the branch node. Notice that when (firstInformationNode,
lastInformationNode) pairs are kept in branch nodes, we can eliminate the branch
node field that keeps a reference to an information node in the subtrie (i.e., the field
element).

2. Find all strings that contain a pattern P. Suppose we have a collection S1, S2, ... Sk
of strings and we wish to report all strings that contain a query pattern P. For example,
the genome databank contains tens of thousands of strings, and when a researcher
submits a query string, we are to report all databank strings that contain the query string.

To answer queries of this type efficiently, we set up a compressed trie (we may call this a
multiple string suffix tree) that contains the suffixes of the string S1$S2$...$Sk#,
where $ and # are two different digits that do not appear in any of the strings S1, S2,
..., Sk. In each node of the suffix tree, we keep a list of all strings Si that are the start
point of a suffix represented by an information node in that subtrie.

3. Find the longest substring of S that appears at least m > 1 times. This query can be
answered in O(|S|) time in the following way:

(a)
Traverse the suffix tree labeling the branch nodes with the sum of the label lengths from
the root and also with the number of information nodes in the subtrie.
(b)
Traverse the suffix tree visiting branch nodes with information node count >= m.
Determine the visited branch node with longest label length.

Note that step (a) needs to be done only once. Following this, we can do step (b) for as
many values of m as is desired. Also, note that when m = 2 we can avoid determining the
number of information nodes in subtries. In a compressed trie, every subtrie rooted at a
branch node has at least two information nodes in it.

4. Find the longest common substring of the strings S and T. This can be done in time O(|S|
+ |T|) as below:

(a)
Construct a multiple string suffix tree for S and T (i.e., the suffix tree for S$T#).
(b)
Traverse the suffix tree to identify the branch node for which the sum of the label lengths
on the path from the root is maximum and whose subtrie has at least one information
node that represents a suffix that begins in S and at least one information node that
represents a suffix that begins in T.

Notice that the related problem to find the longest common subsequence of S and T is
solved in O(|S| * |T|) time using dynamic programming (see Exercise 15.22 of the
text).
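As a cross-check for item 3, here is a brute-force Python stand-in: it is quadratic or worse, unlike the O(|S|) suffix-tree method, and the helper names are hypothetical:

```python
def count_occurrences(S, sub):
    # count overlapping occurrences of sub in S
    return sum(S.startswith(sub, i) for i in range(len(S) - len(sub) + 1))

def longest_repeated(S, m=2):
    # longest substring of S that appears at least m times
    n = len(S)
    for length in range(n, 0, -1):          # try lengths from longest down
        for i in range(n - length + 1):
            sub = S[i:i + length]
            if count_occurrences(S, sub) >= m:
                return sub
    return ""

print(longest_repeated("peeper"))  # pe, which starts at digits 1 and 4
```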

How to Build Your Very Own Suffix Tree


Three Observations
To aid in the construction of the suffix tree, we add a longestProperSuffix field to each

branch node. The longestProperSuffix field of a branch node that represents the nonempty
string Y points to the branch node for the longest proper suffix of Y (this suffix is obtained by
removing the first digit from Y). The longestProperSuffix field of the root is not used.

Figure 5 shows the suffix tree of Figure 4 with longest proper suffix pointers (often, we refer to
the longest proper suffix pointer as simply the suffix pointer) included. Longest proper suffix
pointers are shown as red arrows. Node C represents the string pe. The longest proper suffix e of
pe is represented by node B. Therefore, the (longest proper) suffix pointer of C points to
node B. The longest proper suffix of the string e represented by node B is the empty string. Since
the root node represents the empty string, the longest proper suffix pointer of node B points to the
root node A.

Figure 5 Suffix tree of Figure 4 augmented with suffix pointers

Observation 1 If the suffix tree for any string S has a branch node that represents the string Y,
then the tree also has a branch node that represents the longest proper suffix Z of Y.
Proof Let P be the branch node for Y. Since P is a branch node, there are at least 2 different digits
x and y such that S has a suffix that begins with Yx and another that begins with Yy. Therefore, S
has a suffix that begins with Zx and another that begins with Zy. Consequently, the suffix
tree for S must have a branch node for Z.

Observation 2 If the suffix tree for any string S has a branch node that represents the string Y,
then the tree also has a branch node for each of the suffixes of Y.
Proof Follows from Observation 1.

Note that the suffix tree of Figure 5 has a branch node for pe. Therefore, it also must have branch
nodes for the suffixes e and epsilon of pe.

The concepts of the last branch node and the last branch index are useful in describing the
suffix tree construction algorithm. The last branch node for suffix suffix(i) is the parent of the
information node that represents suffix(i). In Figure 5, the last branch nodes for suffixes 1
through 6 are C, B, B, C, B, and A, respectively. For any suffix suffix(i), the last branch
index lastBranchIndex(i) is the index of the digit on which a branch is made at the last branch
node for suffix(i). In Figure 5, lastBranchIndex(1) = 3 because
suffix(1) = peeper; suffix(1) is represented by information node H whose parent is node C;
the branch at C is made using the third digit of suffix(1); and the third digit of suffix(1) is
S[3]. You may verify that lastBranchIndex[1:6] = [3, 3, 4, 6, 6, 6].

Observation 3 In the suffix tree for any string S, lastBranchIndex(i) <=


lastBranchIndex(i+1), 1 <= i < n.
Proof Left as an exercise.

Get Out That Hammer and Saw, and Start Building


To build your very own suffix tree, you must start with your very own string. We shall use the
string R = ababbabbaabbabb to illustrate the construction procedure. Since the last digit b of R
appears more than once, we append a new digit # to R and build the suffix tree for S = R# =
ababbabbaabbabb#. With an n = 16 digit string S, you can imagine that this is going to be a
rather long example. However, when you are done with this example, you will know everything
you ever wanted to know about suffix tree construction.
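Before diving into the pointer-based construction, it may help to see a naive O(n^2) builder that inserts the suffixes one by one into a compressed trie of nested dictionaries. This is a sketch only: it assumes the last digit of S is unique, builds none of the suffix pointers described below, and is not the linear-time algorithm this section develops:

```python
def build_suffix_tree(S):
    # branch node = dict mapping an edge label to either a subtree (dict)
    # or an information node (the 1-indexed start of a suffix)
    root = {}
    for i in range(len(S)):
        node, suf = root, S[i:]
        while True:
            label = next((l for l in node if l[0] == suf[0]), None)
            if label is None:
                node[suf] = i + 1  # new information node
                break
            # length of the common prefix of suf and the edge label
            k = 0
            while k < len(label) and k < len(suf) and label[k] == suf[k]:
                k += 1
            if k == len(label):
                node, suf = node[label], suf[k:]  # follow the edge and continue
            else:
                child = node.pop(label)  # split the edge at digit k
                node[label[:k]] = {label[k:]: child}
                node, suf = node[label[:k]], suf[k:]
    return root

print(build_suffix_tree("peeper"))
# {'e': {'eper': 2, 'per': 3, 'r': 5}, 'pe': {'eper': 1, 'r': 4}, 'r': 6}
# -- the same shape as Figures 1-3
```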

Construction Strategy
The suffix tree construction algorithm starts with a root node that represents the empty string.
This node is a branch node. At any time during the suffix tree construction process, exactly one
of the branch nodes of the suffix tree will be designated the active node. This is the node from
where the process to insert the next suffix begins. Let activeLength be the length of the string
represented by the active node. Initially, the root is the active node and activeLength = 0.
Figure 6 shows the initial configuration, the active node is shown in green.

Figure 6 Initial configuration for suffix tree construction

As we proceed, we shall add branch and information nodes to our tree. Newly added branch
nodes will be colored magenta, and newly added information nodes will be colored cyan. Suffix
pointers will be shown in red.

Suffixes are inserted into the tree in the order suffix(1), suffix(2), ..., suffix(n). The
insertion of the suffixes in this order is accomplished by scanning the string S from left to right.
Let tree(i) be the compressed trie for the suffixes suffix(1), ..., suffix(i), and let
lastBranchIndex(j, i) be the last branch index for suffix(j) in tree(i). Let minDistance
be a lower bound on the distance (measured by number of digits) from the active node to the last
branch index of the suffix that is to be inserted. Initially, minDistance = 0 as
lastBranchIndex(1,1) = 1. When inserting suffix(i), it will be the case that
lastBranchIndex(i,i) >= i + activeLength + minDistance.

To insert suffix(i+1) into tree(i), we must do the following:

1. Determine lastBranchIndex(i+1, i+1). To do this, we begin at the current active


node. The first activeLength number of digits of the new suffix (i.e., digits S[i+1],
S[i+2], ..., S[i + activeLength]) will be known to agree with the string
represented by the active node. So, to determine lastBranchIndex(i+1,i+1), we
examine digits activeLength + 1, activeLength + 2, ..., of the new suffix. These

digits are used to follow a path through tree(i) beginning at the active node and
terminating when lastBranchIndex(i+1,i+1) has been determined. Some efficiencies
result from knowing that lastBranchIndex(i+1,i+1) >= i + 1 + activeLength +
minDistance.
2. If tree(i) does not have a branch node X which represents the string
S[i+1]...S[lastBranchIndex(i+1,i+1)-1], then create such a branch node X.
3. Add an information node for suffix(i+1). This information node is a child of the
branch node X, and the label on the edge from X to the new information node is
S[lastBranchIndex(i+1, i+1)]...S[n].

Back to the Example


We begin by inserting suffix(1) into the tree tree(0) that is shown in Figure 6. The root is the
active node, activeLength = minDistance = 0. The first digit of suffix(1) is S[1] = a. No
edge from the active node (i.e., the root) of tree(0) has a label that begins with a (in fact, at this
time, the active node has no edge at all). Therefore, lastBranchIndex(1,1) = 1. So, we create
an information node and an edge whose label is the entire string. Figure 7 shows the result,
tree(1). The root remains the active node, and activeLength and minDistance are
unchanged.

Figure 7 After the insertion of the suffix ababbabbaabbabb#

In our drawings, we shall show the labels on edges that go to information nodes using the
notation i+, where i gives the index, in S, where the label starts and the + tells us that the label
goes to the end of the string. Therefore, in Figure 7, the edge label 1+ denotes the string
S[1]...S[n]. Figure 7 also shows the string S. The newly inserted suffix is shown in cyan.

To insert the next suffix, suffix(2), we again begin at the active node examining digits
activeLength + 1 = 1, activeLength + 2 = 2, ..., of the new suffix. Since, digit 1 of the
new suffix is S[2] = b and since the active node has no edge whose label begins with S[2] =
b, lastBranchIndex(2,2) = 2. Therefore, we create a new information node and an edge
whose label is 2+. Figure 8 shows the resulting tree. Once again, the root remains the active
node and activeLength and minDistance are unchanged.

Figure 8 After the insertion of the suffix babbabbaabbabb#

Notice that the tree of Figure 8 is the compressed trie tree(2) for suffix(1) and suffix(2).

The next suffix, suffix(3), begins at S[3] = a. Since the active node of tree(2) (i.e., the
root) has an edge whose label begins with a, lastBranchIndex(3,3) > 3. To determine
lastBranchIndex(3,3), we must see more digits of suffix(3). In particular, we need to see as
many additional digits as are needed to distinguish between suffix(1) and suffix(3). We first
compare the second digit S[4] = b of the new suffix and the second digit S[2] = b of the edge
label 1+. Since S[4] = S[2], we must do additional comparisons. The next comparison is
between the third digit S[5] = b of the new suffix and the third digit S[3] = a of the edge label
1+. Since these digits are different, lastBranchIndex(3,3) is determined to
be 5. At this time, we update minDistance to have the value 2. Notice that,
at this time, this is the max value possible for minDistance because
lastBranchIndex(3,3) = 5 = 3 + activeLength + minDistance.

To insert the new suffix, suffix(3), we split the edge of tree(2) whose label
is 1+ into two. The first split edge has the label 1,2 and the label for the
second split edge is 3+. In between the two split edges, we place a branch
node. Additionally, we introduce an information node for the newly inserted
suffix. Figure 9 shows the tree tree(3) that results. The edge label 1,2 is
shown as the digits S[1]S[2] = ab.

Figure 9 After the insertion of the suffix abbabbaabbabb#

The compressed trie tree(3) is incomplete because we have yet to put in the
longest proper suffix pointer for the newly created branch node D. The
longest suffix for this branch node is b, but the branch node for b does not
exist. No need to panic, this branch node will be the next branch node

created by us.

The next suffix to insert is suffix(4). This suffix is the longest proper
suffix of the most recently inserted suffix, suffix(3). The insertion process
for the new suffix begins by updating the active node by following the suffix
pointer in the current active node. Since the root has no suffix pointer, the
active node is not updated. Therefore, activeLength is unchanged also.
However, we must update minDistance to ensure lastBranchIndex(4,4) >= 4 +
activeLength + minDistance. It is easy to see that lastBranchIndex(i,i) <=
lastBranchIndex(i+1,i+1) for all i < n. Therefore, lastBranchIndex(i+1,i+1)
>= lastBranchIndex(i,i) >= i + activeLength + minDistance. To ensure
lastBranchIndex(i+1,i+1) >= i + 1 + activeLength + minDistance, we must
reduce minDistance by 1.

Since minDistance = 1, we start at the active node (which is still the root)
and move forward following the path dictated by S[4]S[5].... We do not
compare the first minDistance digits as we follow this path, because a match
is assured until we get to the point where digit minDistance + 1 (i.e., S[5])
of the new suffix is to be compared. Since the active node edge label that
begins with S[4] = b is more than one digit long, we compare S[5] and the
second digit S[3] = a of this edge's label. Since the two digits are
different, the edge is split in the same way we split the edge with label 1+.
The first split edge has the label 2,2 = b, and the label on the second split
edge is 3+; in between the two split edges, we place a new branch node F, a
new information node G is created for the newly inserted suffix, this
information node is connected to the branch node F by an edge whose label is
5+. Figure 10 shows the resulting structure.

Figure 10 After the insertion of the suffix bbabbaabbabb#

We can now set the suffix pointer from the branch node D that was created
when suffix(3) was inserted. This suffix pointer has to go to the newly
created branch node F.

The longest proper suffix of the string b represented by node F is the empty
string. So, the suffix pointer in node F is to point to the root node. Figure
11 shows the compressed trie with suffix pointers added. This trie is
tree(4).

Figure 11 Trie of Figure 10 with suffix pointers added

The construction of the suffix tree continues with an attempt to insert the
next suffix suffix(5). Since suffix(5) is the longest proper suffix of the
most recently inserted suffix suffix(4), we begin by following the suffix
pointer in the active node. However, the active node is presently the root
and it has no suffix pointer. So, the active node is unchanged. To preserve
the desired relationship among lastBranchIndex, activeLength, minDistance,
and the index (5) of the next suffix that is to be inserted, we must reduce
minDistance by one. So, minDistance becomes zero.

Since activeLength = 0, we need to examine digits of suffix(5) beginning with


the first one S[5]. The active node has an edge whose label begins with S[5]
= b. We follow the edge with label b comparing suffix digits and label
digits. Since all digits agree, we reach node F. Node F becomes the active
node (whenever we encounter a branch node during suffix digit examination,
the active node is updated to this encountered branch node) and activeLength
= 1. We continue the comparison of suffix digits using an edge from the
current active node. Since the next suffix digit to compare is S[6] = a, we
use an active node edge whose label begins with a (in case such an edge does
not exist, lastBranchIndex for the new suffix is activeLength + 1). This edge
has the label 3+. The digit comparisons terminate inside this label when
digit S[10] = a of the new suffix is compared with digit S[7] = b of the edge
label 3+. Therefore, lastBranchIndex(5,5) = 10. minDistance is set to its max
possible value, which is lastBranchIndex(5,5) - (index of suffix to be
inserted) - activeLength = 10 - 5 - 1 = 4.

To insert suffix(5), we split the edge (F,C) that is between nodes F and C.
The split takes place at digit splitDigit = 5 of the label of edge (F,C).
Figure 12 shows the resulting tree.

Figure 12 After the insertion of the suffix babbaabbabb#

Next, we insert suffix(6). Since this suffix is the longest proper suffix of
the last suffix suffix(5) that we inserted, we begin by following the suffix
link in the active node. This gets us to the tree root, which becomes the new
active node. activeLength becomes 0. Notice that when we follow a suffix
pointer, activeLength reduces by 1; the value of minDistance does not change
because lastBranchIndex(6,6) >= lastBranchIndex(5,5). Therefore, we still
have the desired relationship lastBranchIndex(6,6) >= 6 + activeLength +
minDistance.

From the new active node, we follow the edge whose label begins with a. When
an edge is followed, we do not compare suffix and label digits. Since
minDistance = 4, we are assured that the first mismatch will occur five or
more digits from here. Since the label ab that begins with a is 2 digits
long, we skip over S[6] and S[7] of the suffix, move to node D, make D the
active node, update activeLength to be 2 and minDistance to be 2, and examine
the label on the active node edge that begins with S[8] = b. The label on
this edge is 5+. We omit the comparisons with the first two digits of this
label because minDistance = 2 and immediately compare the fifth digit S[10] =
a of suffix(6) with the third digit S[7] = b of the edge label. Since these
are different, the edge is split at its third digit. The new branch node that
results from the edge split is the node that the suffix pointer of node H
of Figure 12 is to point to. Figure 13 shows the tree that results.

Figure 13 After the insertion of the suffix abbaabbabb#

Notice that following the last insertion, the active node is D, activeLength
= 2, and minDistance = 2.

Next, we insert suffix(7). Since this suffix is the longest proper suffix of
the suffix just inserted, we can use a short cut to do the insertion. The
short cut is to follow the suffix pointer in the current active node D.
By following this short cut, we skip over a number of digits that is 1 less
than activeLength. In our example, we skip over 2 - 1 = 1 digit of suffix(7).
The short cut guarantees a match between the skipped over digits and the
string represented by the node that is moved to. Node F becomes the new
active node and activeLength is reduced by 1. Once again, minDistance is
unchanged. (You may verify that whenever a short cut is taken, leaving
minDistance unchanged satisfies the desired relationship among
lastBranchIndex, activeLength, minDistance, and the index of the next suffix
that is to be inserted.)

To insert suffix(7), we use S[8] = b (recall that because of the short cut we
have taken to node F, we must skip over activeLength = 1 digit of the suffix)
to determine the edge whose label is to be examined. This gets us the label
5+. Again, since minDistance = 2, we are assured that digits S[8] and S[9] of
the suffix match with the first two digits of the edge label 5+. Since there
is a mismatch at the third digit of the edge label, the edge is split at the
third digit of its label. The suffix pointer of node J is to point to the
branch node that is placed between the two parts of the edge just split.
Figure 14 shows the result.

Figure 14 After the insertion of the suffix bbaabbabb#

Notice that following the last insertion, the active node is F, activeLength
= 1, and minDistance = 2. If lastBranchIndex(7,7) had turned out to be
greater than 10, we would increase minDistance to lastBranchIndex(7,7) - 7 -
activeLength.

To insert suffix(8), we first take the short cut from the current active node
F to the root. The root becomes the new active node, activeLength is reduced
by 1 and minDistance is unchanged. We start the insert process at the new
active node. Since minDistance = 2, we have to move at least 3 digits down
from the active node. The active node edge whose label begins with S[8] = b
has the label b. Since minDistance = 2, we must follow edge labels until we
have skipped 2 digits. Consequently, we move to node F. Node F becomes the
active node, minDistance is reduced by the length 1 of the label on the edge
(A,F) and becomes 1, activeLength is increased by the length of the label on
the edge (A,F) and becomes 1, and we follow the edge (F,H) whose label begins
with S[9] = a. This edge is to be split at the second digit of its edge
label. The suffix pointer of L is to point to the branch node that will be
inserted between the two edges created when edge (F,H) is split. Figure
15 shows the result.

Figure 15 After the insertion of the suffix baabbabb#

The next suffix to insert is suffix(9). From the active node F, we follow the
suffix pointer to node A, which becomes the new active node. activeLength is
reduced by 1 to zero, and minDistance is unchanged at 1. The active node
edge whose label begins with S[9] = a has the label ab. Since minDistance =
1, we compare the second digit of suffix(9) and the second digit of the edge
label. Since these two digits are different, the edge (A,D) is split at the
second digit of its label. Further, the suffix pointer of the branch node M
that was created when the last suffix was inserted into the tree is to point
to the branch node that will be placed between nodes A and D. Finally, since
the newly created branch node represents a string whose length is one, its
suffix pointer is to point to the root. Figure 16 shows the result.

Figure 16 After the insertion of the suffix aabbabb#



As you can see, creating a suffix tree can be quite tiring. Let's continue
though; we have, so far, inserted only the first 9 suffixes into our suffix
tree.

For the next suffix, suffix(10), we begin with the root A as the active node.
We would normally follow the suffix pointer in the active node to get to the
new active node from which the insert process is to start. However, the root
has no suffix pointer. Instead, we reduce minDistance by one. The new value
of minDistance is zero.

The insertion process begins by examining the active node edge (if any) whose
label begins with the first digit S[10] = a of suffix(10). Since the active
node has an edge whose label begins with a, additional digits are examined to
determine lastBranchIndex(10,10). We follow a search path from the active
node. This path is determined by the digits of suffix(10). Following this
path, we reach node J. By examining the label on the edge (J,E), we determine
that lastBranchIndex(10,10) = 16. Node J becomes the active node,
activeLength = 4, and minDistance = 2.

When suffix(10) is inserted, the edge (J,E) splits. The split is at the third
digit of this edge's label. Figure 17 shows the tree after the new suffix is
inserted.

Figure 17 After the insertion of the suffix abbabb#

To insert the next suffix, suffix(11), we first take a short cut by following
the suffix pointer at the active node. This pointer gets us to node L, which
becomes the new active node. At this time, activeLength is reduced by one and
becomes 3. Next, we need to move forward from L by a number of digits greater
than minDistance = 2. Since digit activeLength + 1 of suffix(11) is S[14] = b
we follow the b edge of L. We omit comparing the first minDistance digits of
this edge's label. The first comparison made is between S[16] = # (digit of
suffix) and S[7 + 2] = a (digit of edge label). Since these two digits are
different, edge (L,G) is to be split. Splitting this edge (at its third
digit) and setting the suffix pointer from the most recently created branch
node R gives us the tree of Figure 18.

Figure 18 After the insertion of the suffix bbabb#

To insert the next suffix, suffix(12), we first take the short cut from the
current active node L to the node N. Node N becomes the new active node, and
we begin comparing minDistance + 1 = 3 digits down from node N. Edge (N,H) is
split. Figure 19 shows the tree after this edge has been split and after the
suffix pointer from the most recently created branch node T has been set.

Figure 19 After the insertion of the suffix babb#

When inserting suffix(13), we follow the short cut from the active node N to
the branch node P. Node P becomes the active node and we are to move down the
tree by at least minDistance + 1 = 3 digits. The active node edge whose label
begins with S[14] = b is used first. We reach node D, which becomes the
active node, and minDistance becomes 1. At node D, we use the edge whose
label begins with S[15] = b. Since the label on this edge is two digits long,
and since the second digit of this label differs from S[16], this edge is to
split. Figure 20 shows the tree after the edge is split and the suffix
pointer from node V is set.

Figure 20 After the insertion of the suffix abb#

To insert suffix(14), we take the short cut from the current active node D to
the branch node F. Node F becomes the active node. From node F, we must move
down by at least minDistance + 1 = 2 digits. We use the edge whose label
begins with S[15] = b (S[15] is used because it is activeLength = 1 digits
from the start of suffix(14)). The split takes place at the second digit of
edge (F,L)'s label. Figure 21 shows the new tree.

Figure 21 After the insertion of the suffix bb#

The next suffix to insert begins at S[15] = b. We take the short cut from the
current active node F, to the root. The root is made the current active node
and then we move down by minDistance + 1 = 2 digits. We follow the active
node edge whose label begins with b and reach node F. A new information node
is added to F. The suffix pointer for the last branch node Z is set to point
to the current active node F, and the root becomes the new active node.
Figure 22 shows the new tree.

Figure 22 After the insertion of the suffix b#

Don't despair, only one suffix remains. Since no suffix is a proper prefix of
another suffix, we are assured that the root has no edge whose label
begins with the last digit of the string S. We simply insert an information
node as a child of the root. The label for the edge to this new information
node is the last digit of the string. Figure 23 shows the complete suffix
tree for the string S = ababbabbaabbabb#. The suffix pointers are not shown
as they are no longer needed; the space occupied by these pointers may be
freed.

Figure 23 Suffix tree for ababbabbaabbabb#

Complexity Analysis
Let r denote the number of different digits in the string S whose suffix tree is to be built (r is the
alphabet size), and let n be the number of digits (and hence the number of suffixes) of the string
S.

To insert suffix(i), we
(a)
Follow a suffix pointer in the active node (unless the active node is the root).
(b)
Then move down the existing suffix tree until minDistance digits have been crossed.
(c)
Then compare some number of suffix digits with edge label digits until
lastBranchIndex(i,i) is determined.
(d)
Finally insert a new information node and possibly also a branch node.
The total time spent in part (a) (over all n inserts) is O(n).

When moving down the suffix tree in part (b), no digit comparisons are made. Each move to a
branch node at the next level takes O(1) time. Also, each such move reduces the value of
minDistance by at least one. Since minDistance is zero initially and never becomes less than
zero, the total time spent in part (b) is O(n + total amount by which minDistance is
increased over all n inserts).

In part (c), O(1) time is spent determining whether lastBranchIndex(i,i) = i +
activeLength + minDistance. This is the case iff minDistance = 0 or the digit x at position
activeLength + minDistance + 1 of suffix(i) is not the same as the digit in position
minDistance + 1 of the label on the appropriate edge of the active node. When
lastBranchIndex(i,i) != i + activeLength + minDistance, lastBranchIndex(i,i) >
i + activeLength + minDistance and the value of lastBranchIndex(i,i) is determined by
making a sequence of comparisons between suffix digits and edge label digits (possibly
involving moves downwards to new branch nodes). For each such comparison that is made,
minDistance is increased by 1. This is the only circumstance under which minDistance
increases in the algorithm. So, the total time spent in part (c) is O(n + total amount by which
minDistance is increased over all n inserts). Since each unit increase in the value of
minDistance is the result of an equal compare between a digit at a new position (i.e., a position
from which such a compare has not been made previously) of the string S and an edge label digit,
the total amount by which minDistance is increased over all n inserts is O(n).

Part (d) takes O(r) time per insert, because we need to initialize the O(r) fields of the branch
node that may be created. The total time for part (d) is, therefore, O(nr).

So, the total time taken to build the suffix tree is O(nr). Under the assumption that the alphabet
size r is constant, the complexity of our suffix tree generation algorithm becomes O(n).

The use of branch nodes with as many children fields as the alphabet size is recommended only
when the alphabet size is small. When the alphabet size is large (and it may be as large as n,
making the above algorithm an O(n^2) algorithm), the use of a hash table results in an expected
time complexity of O(n). The space complexity changes from O(nr) to O(n).
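To make the trade-off concrete, here is a minimal C++ sketch (the struct names and the alphabet size are invented for illustration, not taken from the text) contrasting the two child-storage layouts for a branch node:

```cpp
#include <array>
#include <unordered_map>

// Array-based branch node: O(r) space per node, O(1) worst-case child lookup.
// Suitable when the alphabet size r is a small constant.
struct ArrayNode {
    static const int R = 4;                // e.g. the alphabet {a, b, c, #}
    std::array<ArrayNode*, R> child{};     // all R slots, most possibly null
};

// Hash-based branch node: space proportional to the children actually present,
// expected O(1) lookup. Preferable when r can be as large as n.
struct HashNode {
    std::unordered_map<char, HashNode*> child;
};
```

With ArrayNode, initializing a new branch node touches all R slots (the O(r) cost per insert noted above); with HashNode, a new node starts empty, which is what drops the expected total time to O(n).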

A divide-and-conquer algorithm that has a time and space complexity of O(n) (even when the
alphabet size is O(n)) is developed in Optimal suffix tree construction with large alphabets.

Exercises
1. Draw the suffix tree for S = ababab#.
2. Draw the suffix tree for S = aaaaaa#.
3. Draw the multiple string suffix tree for S1 = abba, S2 = bbbb, and S3 = aaaa.
4. Prove Observation 3.
5. Draw the trees tree(i), 1 <= i <= |S| for S = bbbbaaaabbabbaa#. Show the
active node in each tree. Also, show the longest proper suffix pointers.
6. Draw the trees tree(i), 1 <= i <= |S| for S = aaaaaaaaaaaa#. Show the active
node in each tree. Also, show the longest proper suffix pointers.
7. Develop the class SuffixTree. Your class should include a method to create the suffix
tree for a given string as well as a method to search a suffix tree for a given pattern. Test
the correctness of your methods.
8. Explain how you can obtain the multiple string suffix tree for S1, ..., Sk from that for
S1, ..., S(k-1). What is the time complexity of your proposed method?

Dynamic Programming: chapter from CLRS (essential)


Topcoder.com

By Dumitru
topcoder member

A significant part of the problems you are given can be solved with the help of dynamic
programming (DP for short). Being able to tackle problems of this type would greatly increase
your skill. I will try to help you understand how to solve problems using DP. The article is
based on examples, because raw theory is very hard to understand.

Note: If you're bored reading one section and you already know what's being discussed in it -
skip it and go to the next one.

Introduction (Beginner)

What is dynamic programming, and how can it be described?

A DP is an algorithmic technique which is usually based on a recurrent formula and one (or
some) starting states. A sub-solution of the problem is constructed from previously found ones.
DP solutions have a polynomial complexity which assures a much faster running time than other
techniques like backtracking, brute-force etc.

Now let's see the base of DP with the help of an example:

Given a list of N coins, their values (V1, V2, ... , VN), and the total sum S. Find the minimum
number of coins the sum of which is S (we can use as many coins of one type as we want), or
report that it's not possible to select coins in such a way that they sum up to S.

Now let's start constructing a DP solution:

First of all we need to find a state for which an optimal solution is found and with the help of
which we can find the optimal solution for the next state.

What does a "state" stand for?

It's a way to describe a situation, a sub-solution for the problem. For example, a state would be
the solution for sum i, where i <= S. A smaller state than state i would be the solution for any
sum j, where j < i.

How can we find it?

It is simple - for each coin j with Vj <= i, look at the minimum number of coins found for the
sum i-Vj (we have already found it previously). Let this number be m. If m+1 is less than the
minimum number of coins already found for the current sum i, then we write the new result for it.

For a better understanding let's take this example:


Given coins with values 1, 3, and 5.
And the sum S is set to be 11.

First of all we mark that for state 0 (sum 0) we have found a solution with a minimum number of
0 coins. We then go to sum 1. First, we mark that we haven't yet found a solution for this one (a
value of Infinity would be fine). Then we see that only coin 1 is less than or equal to the current
sum. Analyzing it, we see that for sum 1-V1= 0 we have a solution with 0 coins. Because we add
one coin to this solution, we'll have a solution with 1 coin for sum 1. It's the only solution yet
found for this sum. We write (save) it. Then we proceed to the next state - sum 2. We again see
that the only coin which is less or equal to this sum is the first coin, having a value of 1. The
optimal solution found for sum (2-1) = 1 is coin 1. This coin 1 plus the first coin will sum up to
2, and thus make a sum of 2 with the help of only 2 coins. This is the best and only solution for
sum 2. Now we proceed to sum 3. We now have 2 coins which are to be analyzed - first and
second one, having values of 1 and 3. Let's see the first one. There exists a solution for sum 2 (3
- 1) and therefore we can construct from it a solution for sum 3 by adding the first coin to it.
Because the best solution for sum 2 that we found has 2 coins, the new solution for sum 3 will
have 3 coins. Now let's take the second coin with value equal to 3. The sum for which this coin
needs to be added to make 3 , is 0. We know that sum 0 is made up of 0 coins. Thus we can
make a sum of 3 with only one coin - 3. We see that it's better than the previous found solution
for sum 3, which was composed of 3 coins. We update it and mark it as having only 1 coin. We
do the same for sum 4, and get a solution of 2 coins: 1+3. And so on.

Pseudocode:

Set Min[i] equal to Infinity for all of i
Min[0] = 0

For i = 1 to S
    For j = 0 to N - 1
        If (Vj <= i AND Min[i-Vj] + 1 < Min[i])
            Then Min[i] = Min[i-Vj] + 1

Output Min[S]

Here are the solutions found for all sums:

Sum | Min. nr. of coins | Coin value added to a smaller sum to obtain this sum (that sum in brackets)
  0 | 0 | -
  1 | 1 | 1 (0)
  2 | 2 | 1 (1)
  3 | 1 | 3 (0)
  4 | 2 | 1 (3)
  5 | 1 | 5 (0)
  6 | 2 | 3 (3)
  7 | 3 | 1 (6)
  8 | 2 | 3 (5)
  9 | 3 | 1 (8)
 10 | 2 | 5 (5)
 11 | 3 | 1 (10)

As a result we have found a solution of 3 coins which sum up to 11.

Additionally, by tracking data about how we got to a certain sum from a previous one, we can
find what coins were used in building it. For example: to sum 11 we got by adding the coin with
value 1 to a sum of 10. To sum 10 we got from 5. To 5 - from 0. This way we find the coins
used: 1, 5 and 5.
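The pseudocode above, together with the tracking idea, can be written out in a few lines of C++. This is a sketch of my own (the function name minCoinSet is invented), which computes Min[] bottom-up and then walks back through a prev[] array to recover the coins used:

```cpp
#include <vector>
#include <algorithm>
#include <climits>

// Bottom-up coin-change DP: minimum number of coins summing to S,
// plus reconstruction of one optimal coin multiset.
// Returns an empty vector if S cannot be formed (or if S = 0).
std::vector<int> minCoinSet(const std::vector<int>& V, int S) {
    const int INF = INT_MAX / 2;
    std::vector<int> Min(S + 1, INF), prev(S + 1, -1);
    Min[0] = 0;
    for (int i = 1; i <= S; ++i)
        for (int v : V)
            if (v <= i && Min[i - v] + 1 < Min[i]) {
                Min[i] = Min[i - v] + 1;
                prev[i] = v;                 // remember the coin that produced sum i
            }
    std::vector<int> coins;
    if (Min[S] >= INF) return coins;         // no solution exists
    for (int s = S; s > 0; s -= prev[s])     // walk back: e.g. 11 -> 10 -> 5 -> 0
        coins.push_back(prev[s]);
    return coins;
}
```

For V = {1, 3, 5} and S = 11 this returns three coins (1, 5, 5 in some order), matching the table and the walk-back described above.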

Having understood the basic way a DP is used, we may now see a slightly different approach to
it. It involves the change (update) of best solution yet found for a sum i, whenever a better
solution for this sum was found. In this case the states aren't calculated consecutively. Let's
consider the problem above. Start with having a solution of 0 coins for sum 0. Now let's try to
add the first coin (with value 1) to all sums already found. If the resulting sum t is composed of
fewer coins than the one previously found - we'll update the solution for it. Then we do the same
thing for the second coin, third coin, and so on for the rest of them. For example, we first add
coin 1 to sum 0 and get sum 1. Because we haven't yet found a possible way to make a sum of 1
- this is the best solution yet found, and we mark S[1]=1. By adding the same coin to sum 1, we'll
get sum 2, thus making S[2]=2. And so on for the first coin. After the first coin is processed, take
coin 2 (having a value of 3) and consecutively try to add it to each of the sums already found.
Adding it to 0, a sum 3 made up of 1 coin will result. Till now, S[3] has been equal to 3, thus the
new solution is better than the previously found one. We update it and mark S[3]=1. After adding
the same coin to sum 1, we'll get a sum 4 composed of 2 coins. Previously we found a sum of 4
composed of 4 coins; having now found a better solution we update S[4] to 2. The same thing is
done for next sums - each time a better solution is found, the results are updated.
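This "push forward" formulation, with coins in the outer loop, can be sketched as follows (my own hedged implementation, not code from the article):

```cpp
#include <vector>
#include <climits>

// Forward ("push") coin DP: for each coin, try to improve every sum
// reachable from an already-solved smaller sum.
std::vector<int> minCoinsForward(const std::vector<int>& V, int S) {
    const int INF = INT_MAX / 2;
    std::vector<int> best(S + 1, INF);
    best[0] = 0;
    for (int v : V)                          // process coins one at a time
        for (int t = v; t <= S; ++t)         // push coin v onto every smaller sum
            if (best[t - v] + 1 < best[t])
                best[t] = best[t - v] + 1;   // found a better solution for sum t
    return best;
}
```

Processing sums in increasing order within each coin allows the same coin to be reused, which matches the unlimited-supply assumption of the problem.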

Elementary

To this point, very simple examples have been discussed. Now let's see how to find a way for
passing from one state to another, for harder problems. For that we will introduce a new term
called recurrent relation, which makes a connection between a lower and a greater state.

Let's see how it works:

Given a sequence of N numbers - A[1] , A[2] , ..., A[N] . Find the length of the longest non-
decreasing sequence.

As described above we must first find how to define a "state" which represents a sub-problem
and thus we have to find a solution for it. Note that in most cases the states rely on lower states
and are independent from greater states.

Let's define a state i as being the longest non-decreasing sequence which has its last number
A[i]. This state carries only data about the length of this sequence. Note that for i < j, state i
is independent of state j. To find S[i], we look at all states j < i: if A[j] <= A[i] (so the
sequence ending at j may be extended by A[i]) and S[j]+1 > S[i], we make S[i] = S[j]+1. This
way we consecutively find the best solutions for each i, until the last state N.

Let's see what happens for a randomly generated sequence: 5, 3, 4, 8, 6, 7:

i | Length of the longest non-decreasing sequence of the first i numbers | Index from which we "arrived" at this one
1 | 1 | 1 (first number itself)
2 | 1 | 2 (second number itself)
3 | 2 | 2
4 | 3 | 3
5 | 3 | 3
6 | 4 | 5
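The O(N^2) recurrence above translates directly into code; here is a hedged C++ sketch (the function name is my own):

```cpp
#include <vector>
#include <algorithm>

// Longest non-decreasing subsequence, O(N^2):
// S[i] = length of the longest non-decreasing subsequence ending at A[i].
int longestNonDecreasing(const std::vector<int>& A) {
    int n = (int)A.size();
    if (n == 0) return 0;
    std::vector<int> S(n, 1);                  // every element alone has length 1
    for (int i = 1; i < n; ++i)
        for (int j = 0; j < i; ++j)
            if (A[j] <= A[i] && S[j] + 1 > S[i])
                S[i] = S[j] + 1;               // extend the sequence ending at j
    return *std::max_element(S.begin(), S.end());
}
```

For the sequence 5, 3, 4, 8, 6, 7 from the table, this returns 4 (the subsequence 3, 4, 6, 7).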

Practice problem:
Given an undirected graph G having N (1 < N <= 1000) vertices and positive weights. Find the
shortest path from vertex 1 to vertex N, or state that such a path doesn't exist.

Hint: At each step, among the vertices which weren't yet checked and for which a path from
vertex 1 was found, take the one which has the shortest path, from vertex 1 to it, yet found.

Try to solve the following problems from topcoder competitions:

ZigZag - 2003 TCCC Semifinals 3


BadNeighbors - 2004 TCCC Round 4
FlowerGarden - 2004 TCCC Round 1

Intermediate

Let's see now how to tackle bi-dimensional DP problems.

Problem:
A table composed of N x M cells, each having a certain quantity of apples, is given. You start
from the upper-left corner. At each step you can go down or right one cell. Find the maximum
number of apples you can collect.

This problem is solved in the same way as other DP problems; there is almost no difference.

First of all we have to find a state. The first thing that must be observed is that there are at most
2 ways we can come to a cell - from the left (if it's not situated on the first column) and from the
top (if it's not situated on the most upper row). Thus to find the best solution for that cell, we
have to have already found the best solutions for all of the cells from which we can arrive to the
current cell.

From above, a recurrent relation can be easily obtained:


S[i][j]=A[i][j] + max(S[i-1][j], if i>0 ; S[i][j-1], if j>0) (where i represents the row and j the
column of the table , its left-upper corner having coordinates {0,0} ; and A[i][j] being the
number of apples situated in cell i,j).

S[i][j] must be calculated by going from left to right in each row, processing the rows from
top to bottom, or by going from top to bottom in each column, processing the columns from
left to right.

Pseudocode:

For i = 0 to N - 1
For j = 0 to M - 1
S[i][j] = A[i][j] +
max(S[i][j-1], if j>0 ; S[i-1][j], if i>0 ; 0)

Output S[n-1][m-1]
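The pseudocode above can be fleshed out into a small C++ function (a sketch of my own, with an invented name):

```cpp
#include <vector>
#include <algorithm>

// Maximum apples collected on a path from the top-left to the bottom-right
// corner of a grid, moving only right or down one cell at a time.
int maxApples(const std::vector<std::vector<int>>& A) {
    int n = (int)A.size(), m = (int)A[0].size();
    std::vector<std::vector<int>> S(n, std::vector<int>(m, 0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            int best = 0;
            if (i > 0) best = std::max(best, S[i - 1][j]);  // came from above
            if (j > 0) best = std::max(best, S[i][j - 1]);  // came from the left
            S[i][j] = A[i][j] + best;                       // recurrent relation
        }
    return S[n - 1][m - 1];
}
```

The double loop visits cells row by row, left to right, which is exactly the calculation order required for the recurrence.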

Here are a few problems, from topcoder Competitions, for practicing:

AvoidRoads - 2003 TCO Semifinals 4



ChessMetric - 2003 TCCC Round 4

Upper-Intermediate

This section discusses how to deal with DP problems which have an additional condition besides
the values that must be calculated.

The following problem serves as a good example:

Given an undirected graph G having positive weights and N vertices.

You start with having a sum of M money. For passing through a vertex i, you must pay S[i]
money. If you don't have enough money - you can't pass through that vertex. Find the shortest
path from vertex 1 to vertex N, respecting the above conditions; or state that such path doesn't
exist. If there exist more than one path having the same length, then output the cheapest one.
Restrictions: 1 < N <= 100 ; 0 <= M <= 100 ; 0 <= S[i] <= 100 for each i.

As we can see, this is the same as the classical Dijkstra problem (finding the shortest path
between two vertices), with the exception that it has a condition. In the classical problem we
would have used a uni-dimensional array Min[i], which marks the length of the shortest path
found to vertex i. However, here we should also keep information about the money we have.
Thus it is reasonable to extend the array to something like Min[i][j], which represents the
length of the shortest path found to vertex i, with j money being left. In this way the problem
is reduced to the original path-finding algorithm. At each step we find the unvisited state (i,j)
for which the shortest path was found. We mark it as visited (not to use it later), and for each
of its neighbors we check whether the shortest path for them may be improved. If so - then
update it. We repeat this until no unvisited state to which a path was found remains. The
solution will be represented by Min[N-1][j] having the least value (and the greatest j possible
among the states having that value, i.e. among the shortest paths with the same length).

Pseudocode:

Set states(i,j) as unvisited for all (i,j)


Set Min[i][j] to Infinity for all (i,j)

Min[0][M]=0

While(TRUE)

Among all unvisited states(i,j) find the one for which Min[i][j]
is the smallest. Let this state found be (k,l).

If there wasn't found any state (k,l) for which Min[k][l] is


less than Infinity - exit While loop.

Mark state(k,l) as visited

For All Neighbors p of Vertex k

    If (l-S[p]>=0 AND
        Min[p][l-S[p]]>Min[k][l]+Dist[k][p])
    Then Min[p][l-S[p]]=Min[k][l]+Dist[k][p]
    i.e.
    If for state(k,l) there is enough money left for
    going to vertex p (l-S[p] represents the money that
    will remain after passing to vertex p), and the
    shortest path found for state(p,l-S[p]) is bigger
    than [the shortest path found for
    state(k,l)] + [distance from vertex k to vertex p],
    then set the shortest path for state(p,l-S[p]) to be
    equal to this sum.
End For

End While

Find the smallest number among Min[N-1][j] (for all j, 0<=j<=M);


if there is more than one such state, then take the one with the greatest
j. If there are no states (N-1,j) with value less than Infinity - then
such a path doesn't exist.
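A hedged C++ sketch of this state-extended Dijkstra follows (my own implementation of the pseudocode, using 0-indexed vertices; the function name and the adjacency-matrix convention "negative means no edge" are assumptions for illustration):

```cpp
#include <vector>
#include <algorithm>
#include <climits>

// Shortest path from vertex 0 to vertex n-1, where entering vertex i costs
// cost[i] money out of an initial budget m.  min_[i][j] = length of the
// shortest path found to vertex i arriving with j money left.
// dist[a][b] < 0 means "no edge".  Returns -1 if no feasible path exists.
// O(n^2 * m^2) worst case, like simple Dijkstra without a priority queue.
int shortestPathWithBudget(const std::vector<std::vector<int>>& dist,
                           const std::vector<int>& cost, int m) {
    const int INF = INT_MAX / 2;
    int n = (int)dist.size();
    std::vector<std::vector<int>> min_(n, std::vector<int>(m + 1, INF));
    std::vector<std::vector<bool>> vis(n, std::vector<bool>(m + 1, false));
    if (cost[0] > m) return -1;               // cannot even enter the start vertex
    min_[0][m - cost[0]] = 0;
    while (true) {
        int bk = -1, bl = -1;
        for (int i = 0; i < n; ++i)           // pick the cheapest unvisited state
            for (int j = 0; j <= m; ++j)
                if (!vis[i][j] && min_[i][j] < INF &&
                    (bk < 0 || min_[i][j] < min_[bk][bl])) { bk = i; bl = j; }
        if (bk < 0) break;                    // no reachable unvisited state left
        vis[bk][bl] = true;
        for (int p = 0; p < n; ++p) {
            if (dist[bk][p] < 0) continue;    // no edge k -> p
            int left = bl - cost[p];          // money left after entering p
            if (left >= 0 && min_[bk][bl] + dist[bk][p] < min_[p][left])
                min_[p][left] = min_[bk][bl] + dist[bk][p];
        }
    }
    int best = INF;
    for (int j = 0; j <= m; ++j)              // any amount of money may remain
        best = std::min(best, min_[n - 1][j]);
    return best >= INF ? -1 : best;
}
```

For example, with edges 0-1 and 1-2 of length 1, a direct edge 0-2 of length 5, and vertex 1 costing 10: a budget of 10 admits the length-2 path through vertex 1, while a budget of 9 forces the direct length-5 edge.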

Here are a few TC problems for practicing:

Jewelry - 2003 TCO Online Round 4


StripePainter - SRM 150 Div 1
QuickSums - SRM 197 Div 2
ShortPalindromes - SRM 165 Div 2

Advanced

The following problems will need some good observations in order to reduce them to a dynamic
solution.

Problem StarAdventure - SRM 208 Div 1:

Given a matrix with M rows and N columns (N x M). In each cell there's a number of apples.
You start from the upper-left corner of the matrix. You can go down or right one cell. You need
to arrive to the bottom-right corner. Then you need to go back to the upper-left cell by going
each step one cell left or up. Having arrived at this upper-left cell, you need to go again back to
the bottom-right cell.
Find the maximum number of apples you can collect.
When you pass through a cell - you collect all the apples left there.

Restrictions: 1 < N, M <= 50 ; each cell contains between 0 and 1000 apples inclusive.

First of all we observe that this problem resembles the classical one (described in Section 3 of
this article), in which you need to go only once from the top-left cell to the bottom-right one,
collecting the maximum possible number of apples. It would be better to try to reduce the
problem to this one. Take a good look into the statement of the problem - what can be reduced or
modified in a certain way to make it possible to solve using DP? First observation is that we can
consider the second path (going from bottom-right cell to the top-left cell) as a path which goes
from top-left to bottom-right cell. It makes no difference, because a path passed from bottom to
top, may be passed from top to bottom just in reverse order. In this way we get three paths going
from top to bottom. This somehow decreases the difficulty of the problem. We can consider these
3 paths as left, middle and right. When 2 paths intersect (like in the figure below)

we may consider them as in the following picture, without affecting the result:

This way we'll get 3 paths, which we may consider as being one left, one middle and the other -
right. More than that, we may see that for getting an optimal result they must not intersect
(except in the leftmost upper corner and rightmost bottom corner). So for each row y (except first
and last), the x coordinates of the lines (x1[y] , x2[y] and respectively x3[y] ) will be : x1[y] <
x2[y] < x3[y] . Having done that - the DP solution now becomes much clearer. Let's consider
the row y. Now suppose that for any configuration of x1[y-1] , x2[y-1] and x3[y-1] we have
already found the paths which collect the maximum number of apples. From them we can find
the optimal solution for row y. We now have to find only the way for passing from one row to the
next one. Let Max[i][j][k] represent the maximum number of apples collected till row y-1
inclusive, with three paths finishing at column i, j, and respectively k. For the next row y, add to
each Max[i][j][k] (obtained previously) the number of apples situated in cells (y,i) , (y,j) and
(y,k). Thus we move down at each step. After we made such a move, we must consider that the
paths may move in a row to the right. For keeping the paths out of an intersection, we must first
consider the move to the right of the left path, after this of the middle path, and then of the
right path. For a better understanding, think about the move to the right of the left path: take
every possible pair (j, k), and for each i (i < j) check whether the result at columns (i, j, k) can
be improved from the one at (i-1, j, k), i.e. by moving the left path one column to the right.
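A hedged sketch of this three-path DP follows. It is my own implementation of the reduction just described, not the article's code; it assumes at least 3 columns, nonnegative apple counts, and handles the first and last rows with the prefix-sum observation that all three paths start at the top-left cell and converge on the bottom-right one:

```cpp
#include <vector>
#include <algorithm>

// Three top-to-bottom paths with strictly increasing columns (left < middle <
// right) in every row between the two corners; each cell's apples count once.
// dp[i][j][k] = best total with the paths at columns i < j < k of the row.
int threePathApples(const std::vector<std::vector<int>>& A) {
    int rows = (int)A.size(), cols = (int)A[0].size();   // assumes cols >= 3
    const int NEG = -1000000000;
    auto cube = [&] {
        return std::vector<std::vector<std::vector<int>>>(cols,
            std::vector<std::vector<int>>(cols, std::vector<int>(cols, NEG)));
    };
    // Row 0: all paths start at (0,0) and fan out to the right, so the cells
    // collected are exactly columns 0..k of row 0 (each counted once).
    std::vector<int> pre(cols + 1, 0);
    for (int c = 0; c < cols; ++c) pre[c + 1] = pre[c] + A[0][c];
    auto dp = cube();
    for (int i = 0; i < cols; ++i)
        for (int j = i + 1; j < cols; ++j)
            for (int k = j + 1; k < cols; ++k)
                dp[i][j][k] = pre[k + 1];
    for (int y = 1; y < rows; ++y) {
        auto nxt = cube();
        // move all three paths down: they collect three disjoint cells
        for (int i = 0; i < cols; ++i)
            for (int j = i + 1; j < cols; ++j)
                for (int k = j + 1; k < cols; ++k)
                    if (dp[i][j][k] > NEG)
                        nxt[i][j][k] = dp[i][j][k] + A[y][i] + A[y][j] + A[y][k];
        if (y < rows - 1) {
            // right moves inside the row: left path first, then middle, then
            // right, so the paths never cross (the last row is handled below)
            for (int i = 1; i < cols; ++i)
                for (int j = i + 1; j < cols; ++j)
                    for (int k = j + 1; k < cols; ++k)
                        nxt[i][j][k] = std::max(nxt[i][j][k],
                                                nxt[i - 1][j][k] + A[y][i]);
            for (int i = 0; i < cols; ++i)
                for (int j = i + 2; j < cols; ++j)
                    for (int k = j + 1; k < cols; ++k)
                        nxt[i][j][k] = std::max(nxt[i][j][k],
                                                nxt[i][j - 1][k] + A[y][j]);
            for (int i = 0; i < cols; ++i)
                for (int j = i + 1; j < cols; ++j)
                    for (int k = j + 2; k < cols; ++k)
                        nxt[i][j][k] = std::max(nxt[i][j][k],
                                                nxt[i][j][k - 1] + A[y][k]);
        }
        dp = nxt;
    }
    // Last row: from columns (i,j,k) all paths sweep right to the corner; the
    // newly collected cells are columns i+1..cols-1 except j and k.
    std::vector<int> lpre(cols + 1, 0);
    for (int c = 0; c < cols; ++c) lpre[c + 1] = lpre[c] + A[rows - 1][c];
    int best = NEG;
    for (int i = 0; i < cols; ++i)
        for (int j = i + 1; j < cols; ++j)
            for (int k = j + 1; k < cols; ++k)
                if (dp[i][j][k] > NEG)
                    best = std::max(best, dp[i][j][k] + lpre[cols] - lpre[i + 1]
                                          - A[rows - 1][j] - A[rows - 1][k]);
    return best;
}
```

On a 3x3 grid three such paths can cover every cell, so the answer there is simply the total number of apples.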

TC problems for practicing:

MiniPaint - SRM 178 Div 1



Additional Note:
When you have read the description of a problem and started to solve it, first look at its restrictions.
If a polynomial-time algorithm should be developed, then it's possible that the solution may be of
DP type. In this case try to see if there exist such states (sub-solutions) with the help of which the
next states (sub-solutions) may be found. Having found that - think about how to pass from one
state to another. If it seems to be a DP problem, but you can't define such states, then try to
reduce the problem to another one (like in the example above, from Section 5).

Basic Data Structures

Topcoder.com

By timmac
topcoder member

Even though computers can perform literally millions of mathematical computations per second,
when a problem gets large and complicated, performance can nonetheless be an important
consideration. One of the most crucial aspects to how quickly a problem can be solved is how the
data is stored in memory.

To illustrate this point, consider going to the local library to find a book about a specific subject
matter. Most likely, you will be able to use some kind of electronic reference or, in the worst
case, a card catalog, to determine the title and author of the book you want. Since the books are
typically shelved by category, and within each category sorted by author's name, it is a fairly
straightforward and painless process to then physically select your book from the shelves.

Now, suppose instead you came to the library in search of a particular book, but instead of
organized shelves, were greeted with large garbage bags lining both sides of the room, each
arbitrarily filled with books that may or may not have anything to do with one another. It would
take hours, or even days, to find the book you needed, a comparative eternity. This is how
software runs when data is not stored in an efficient format appropriate to the application.

Simple Data Structures

The simplest data structures are primitive variables. They hold a single value, and beyond that,
are of limited use. When many related values need to be stored, an array is used. It is assumed
that the reader of this article has a solid understanding of variables and arrays.

A somewhat more difficult concept, though equally primitive, is the pointer. Pointers, instead of
holding an actual value, simply hold a memory address that, in theory, contains some useful
piece of data. Most seasoned C++ coders have a solid understanding of how to use pointers, and
many of the caveats, while fledgling programmers may find themselves a bit spoiled by more
modern "managed" languages which, for better or worse, handle pointers implicitly. Either way,
it should suffice to know that pointers "point" somewhere in memory, and do not actually store
data themselves.

A less abstract way to think about pointers is in how the human mind remembers (or cannot
remember) certain things. Many times, a good engineer may not necessarily know a particular
formula/constant/equation, but when asked, they could tell you exactly which reference to
check.

Arrays

Arrays are a very simple data structure, and may be thought of as a list of a fixed length. Arrays
are nice because of their simplicity, and are well suited for situations where the number of data
items is known (or can be programmatically determined). Suppose you need a piece of code to
calculate the average of several numbers. An array is a perfect data structure to hold the
individual values, since they have no specific order, and the required computations do not require
any special handling other than to iterate through all of the values. The other big strength of
arrays is that they can be accessed randomly, by index. For instance, if you have an array
containing a list of names of students seated in a classroom, where each seat is numbered 1
through n, then studentName[i] is a trivial way to read or store the name of the student in seat i.

An array might also be thought of as a pre-bound pad of paper. It has a fixed number of pages,
each page holds information, and is in a predefined location that never changes.
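The averaging example described above can be sketched in a few lines of C++; the function name and the sample values used below are illustrative, not from the article:

```cpp
#include <cstddef>

// Computes the average of the values in an array; the name and the
// sample values in the usage below are illustrative.
double averageOf(const double values[], std::size_t count) {
    if (count == 0) return 0.0; // avoid dividing by zero on an empty array
    double sum = 0.0;
    for (std::size_t i = 0; i < count; i++)
        sum += values[i]; // random access by index, the big strength of arrays
    return sum / count;
}
```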

Linked Lists

A linked list is a data structure that can hold an arbitrary number of data items, and can easily
change size to add or remove items. A linked list, at its simplest, is a pointer to a data node. Each
data node is then composed of data (possibly a record with several data values), and a pointer to
the next node. At the end of the list, the pointer is set to null.

By nature of its design, a linked list is great for storing data when the number of items is either
unknown, or subject to change. However, it provides no way to access an arbitrary item from the
list, short of starting at the beginning and traversing through every node until you reach the one
you want. The same is true if you want to insert a new node at a specific location. It is not
difficult to see the problem of inefficiency.

A typical linked list implementation would have code that defines a node, and looks something
like this:

class ListNode {
    String data;
    ListNode nextNode;
}
ListNode firstNode;

You could then write a method to add new nodes by inserting them at the beginning of the list:

ListNode newNode = new ListNode();
newNode.nextNode = firstNode;
firstNode = newNode;

Iterating through all of the items in the list is a simple task:

ListNode curNode = firstNode;
while (curNode != null) {
    ProcessData(curNode);
    curNode = curNode.nextNode;
}

A related data structure, the doubly linked list, helps this problem somewhat. The difference
from a typical linked list is that the root data structure stores a pointer to both the first and last
nodes. Each individual node then has a link to both the previous and next node in the list. This
creates a more flexible structure that allows travel in both directions. Even still, however, this is
rather limited.
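A doubly linked list along those lines might be sketched as follows; the struct and method names are illustrative, not from the article:

```cpp
#include <string>

// Minimal doubly linked list sketch: each node links to both neighbors,
// and the list tracks both ends so it can be walked in either direction.
struct DNode {
    std::string data;
    DNode* prev = nullptr;
    DNode* next = nullptr;
    explicit DNode(const std::string& d) : data(d) {}
};

struct DoublyLinkedList {
    DNode* first = nullptr;
    DNode* last = nullptr;

    // Append at the tail in O(1), something a singly linked list with
    // only a head pointer cannot do.
    void pushBack(const std::string& d) {
        DNode* node = new DNode(d);
        node->prev = last;
        if (last) last->next = node;
        else first = node;
        last = node;
    }
};
```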

Queues

A queue is a data structure that is best described as "first in, first out". A real world example of a
queue is people waiting in line at the bank. As each person enters the bank, he or she is
"enqueued" at the back of the line. When a teller becomes available, they are "dequeued" at the
front of the line.

Perhaps the most common use of a queue within a topcoder problem is to implement a Breadth
First Search (BFS). BFS means to first explore all states that can be reached in one step, then all
states that can be reached in two steps, and so on. A queue assists in implementing this solution
because it stores, in order, the states that have been reached and are waiting to be explored.

A common type of problem might be the shortest path through a maze. Starting with the point of
origin, determine all possible locations that can be reached in a single step, and add them to the
queue. Then, dequeue a position, and find all locations that can be reached in one more step, and
enqueue those new positions. Continue this process until either a path is found, or the queue is
empty (in which case there is no path). Whenever a "shortest path" or "least number of moves" is
requested, there is a good chance that a BFS, using a queue, will lead to a successful solution.

Most standard libraries, such as the Java API and the .NET framework, provide a Queue class
that provides these two basic interfaces for adding and removing items from a queue.

BFS-type problems appear frequently on challenges; on some problems, identifying BFS as the
right approach is simple and immediate, while other times it is not so obvious.

A queue implementation may be as simple as an array, and a pointer to the current position
within the array. For instance, if you know that you are trying to get from point A to point B on a
50x50 grid, and have determined that the direction you are facing (or any other details) are not
relevant, then you know that there are no more than 2,500 "states" to visit. Thus, your queue is
programmed like so:

class StateNode {
    int xPos;
    int yPos;
    int moveCount;
}

class MyQueue {
    StateNode[] queueData = new StateNode[2500];
    int queueFront = 0;
    int queueBack = 0;

    void Enqueue(StateNode node) {
        queueData[queueBack] = node;
        queueBack++;
    }

    StateNode Dequeue() {
        StateNode returnValue = null;
        if (queueBack > queueFront) {
            returnValue = queueData[queueFront];
            queueFront++;
        }
        return returnValue;
    }

    boolean isNotEmpty() {
        return (queueBack > queueFront);
    }
}

Then, the main code of your solution looks something like this. (Note that if our queue runs out
of possible states, and we still haven't reached our destination, then it must be impossible to get
there, hence we return the typical "-1" value.)

MyQueue queue = new MyQueue();
queue.Enqueue(initialState);
while (queue.isNotEmpty()) {
    StateNode curState = queue.Dequeue();
    if (curState == destState)
        return curState.moveCount;
    for (int dir = 0; dir < 4; dir++) {
        if (CanMove(curState, dir))
            queue.Enqueue(MoveState(curState, dir));
    }
}
return -1;

Stacks

Stacks are, in a sense, the opposite of queues, in that they are described as "last in, first out". The
classic example is the pile of plates at the local buffet. The workers can continue to add clean
plates to the stack indefinitely, but each visitor removes from the top of the stack the plate that
was added last.

While it may seem that stacks are rarely implemented explicitly, a solid understanding of how
they work, and how they are used implicitly, is worthwhile education. Those who have been
programming for a while are intimately familiar with the way the stack is used every time a
subroutine is called from within a program. Any parameters, and usually any local variables, are
allocated out of space on the stack. Then, after the subroutine has finished, the local variables are
removed, and the return address is "popped" from the stack, so that program execution can
continue where it left off before calling the subroutine.

An understanding of what this implies becomes more important as functions call other functions,
which in turn call other functions. Each function call increases the "nesting level" (the depth of
function calls, if you will) of the execution, and uses increasingly more space on the stack. Of
paramount importance is the case of a recursive function. When a recursive function continually
calls itself, stack space is quickly used as the depth of recursion increases. Nearly every seasoned
programmer has made the mistake of writing a recursive function that never properly returns,
and calls itself until the system throws up an "out of stack space" type of error.

Nevertheless, all of this talk about the depth of recursion is important, because stacks, even when
not used explicitly, are at the heart of a depth first search. A depth first search is typical when
traversing through a tree, for instance looking for a particular node in an XML document. The
stack is responsible for maintaining, in a sense, a trail of what path was taken to get to the current
node, so that the program can "backtrack" (e.g. return from a recursive function call without
having found the desired node) and proceed to the next adjacent node.
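The role the call stack plays in a depth-first search can be made explicit with std::stack. Below is a minimal sketch of such an iterative traversal; the adjacency-list representation and the function name are illustrative (graph representations are covered later in this compilation):

```cpp
#include <stack>
#include <vector>

// Depth-first traversal with an explicit std::stack instead of
// recursion; `graph` is an adjacency list. Returns the vertices in the
// order they were visited.
std::vector<int> dfsIterative(const std::vector<std::vector<int>>& graph, int start) {
    std::vector<bool> visited(graph.size(), false);
    std::vector<int> order;
    std::stack<int> pending; // plays the role of the call stack
    pending.push(start);
    while (!pending.empty()) {
        int v = pending.top();
        pending.pop();           // "backtracking" is just popping
        if (visited[v]) continue;
        visited[v] = true;
        order.push_back(v);
        for (int next : graph[v])
            if (!visited[next]) pending.push(next);
    }
    return order;
}
```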

Soma (SRM 198) is an excellent example of a problem solved with this type of approach.

Trees

Trees are a data structure consisting of one or more data nodes. The first node is called the
"root", and each node has zero or more "child nodes". The maximum number of children of a
single node, and the maximum depth of children are limited in some cases by the exact type of
data represented by the tree.

One of the most common examples of a tree is an XML document. The top-level document
element is the root node, and each tag found within that is a child. Each of those tags may have
children, and so on. At each node, the type of tag, and any attributes, constitutes the data for that
node. In such a tree, the hierarchy and order of the nodes is well defined, and an important part
of the data itself. Another good example of a tree is a written outline. The entire outline itself is a
root node containing each of the top-level bullet points, each of which may contain one or more
sub-bullets, and so on. The file storage system on most disks is also a tree structure.

Corporate structures also lend themselves well to trees. In a classical management hierarchy, a
President may have one or more vice presidents, each of whom is in charge of several managers,
each of whom presides over several employees.

PermissionTree (SRM 218) provides an unusual problem on a common file system.

bloggoDocStructure (SRM 214) is another good example of a problem using trees.



Binary Trees

A special type of tree is a binary tree. A binary tree also happens to be one of the most efficient
ways to store and read a set of records that can be indexed by a key value in some way. The idea
behind a binary tree is that each node has, at most, two children.

In the most typical implementations, the key value of the left node is less than that of its parent,
and the key value of the right node is greater than that of its parent. Thus, the data stored in a
binary tree is always indexed by a key value. When traversing a binary tree, it is simple to
determine which child node to traverse when looking for a given key value.

One might ask why a binary tree is preferable to an array of values that has been sorted. In either
case, finding a given key value (by traversing a binary tree, or by performing a binary search on
a sorted array) carries a time complexity of O(log n). However, adding a new item to a binary
tree is an equally simple operation. In contrast, adding an arbitrary item to a sorted array requires
some time-consuming reorganization of the existing data in order to maintain the desired
ordering.

If you have ever used a field guide to attempt to identify a leaf that you find in the wild, then this
is a good way to understand how data is found in a binary tree. To use a field guide, you start at
the beginning, and answer a series of questions like "is the leaf jagged, or smooth?" that have
only two possible answers. Based upon your answer, you are directed to another page, which
asks another question, and so on. After several questions have sufficiently narrowed down the
details, you are presented with the name, and perhaps some further information about your leaf.
If one were the editor of such a field guide, newly cataloged species could be added to field
guide in much the same manner, by traversing through the questions, and finally at the end,
inserting a new question that differentiates the new leaf from any other similar leaves. In the case
of a computer, the question asked at each node is simply "are you less than or greater than X?"
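A minimal sketch of such a key-indexed binary search tree is given below; the struct and function names are illustrative, not from the article:

```cpp
// Minimal binary search tree sketch: left child < parent < right child,
// so a lookup answers "less than or greater than X?" at each node.
struct TreeNode {
    int key;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
    explicit TreeNode(int k) : key(k) {}
};

// Inserting walks down exactly like a search, then attaches a new leaf.
TreeNode* insert(TreeNode* root, int key) {
    if (root == nullptr) return new TreeNode(key);
    if (key < root->key) root->left = insert(root->left, key);
    else if (key > root->key) root->right = insert(root->right, key);
    return root; // duplicates are ignored in this sketch
}

bool contains(const TreeNode* root, int key) {
    while (root != nullptr) {
        if (key == root->key) return true;
        root = (key < root->key) ? root->left : root->right;
    }
    return false;
}
```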

Priority Queues

In a typical breadth first search (BFS) algorithm, a simple queue works great for keeping track of
what states have been visited. Since each new state is one more operational step than the current
state, adding new locations to the end of the queue is sufficient to ensure that the quickest path is
found first. However, the assumption here is that each operation from one state to the next is a
single step.

Let us consider another example where you are driving a car, and wish to get to your destination
as quickly as possible. A typical problem statement might say that you can move one block
up/down/left/right in one minute. In such a case, a simple queue-based BFS works perfectly, and
is guaranteed to provide a correct result.

But what happens if we say that the car can move forward one block in two minutes, but requires
three minutes to make a turn and then move one block (in a direction different from how the car
was originally facing)? Depending on what type of move operation we attempt, a new state is not
simply one "step" from the current state, and the "in order" nature of a simple queue is lost.

This is where priority queues come in. Simply put, a priority queue accepts states, and internally
stores them in a method such that it can quickly pull out the state that has the least cost. (Since,
by the nature of a "shortest time/path" type of problem, we always want to explore the states of
least cost first.)

A real world example of a priority queue might be waiting to board an airplane. Individuals
arriving at their gate earlier will tend to sit closest to the door, so that they can get in line as soon
as they are called. However, those individuals with a "gold card", or who travel first class, will
always be called first, regardless of when they actually arrived.

One very simple implementation of a priority queue is just an array that searches (one by one)
for the lowest cost state contained within, and appends new elements to the end. Such an
implementation has a trivial time-complexity for insertions, but is painfully slow to pull objects
out again.

A special type of binary tree called a heap is typically used for priority queues. In a heap, the
root node is always less than (or greater than, depending on how your value of "priority" is
implemented) either of its children. Furthermore, this tree is a "complete tree" from the left. A
very simple definition of a complete tree is one where no branch is n + 1 levels deep until all
other branches are n levels deep. Furthermore, it is always the leftmost node(s) that are filled
first.

To extract a value from a heap, the root node (with the lowest cost or highest priority) is pulled.
The deepest, rightmost leaf then becomes the new root node. If the new root node is larger than
at least one of its children, then the root is swapped with its smallest child, in order to maintain
the property that the root is always less than its children. This continues downward as far as
necessary. Adding a value to the heap is the reverse. The new value is added as the next leaf, and
swapped upward as many times as necessary to maintain the heap property.

A convenient property of trees that are complete from the left is that they can be stored very
efficiently in a flat array. In general, element 0 of the array is the root, and elements 2k + 1 and
2k + 2 are the children of element k. The effect here is that adding the next leaf simply means
appending to the array.
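The heap operations just described can be sketched over such a flat array; the struct name is illustrative, and a min-heap ordering is assumed:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Min-heap stored in a flat vector, as described above: element 0 is
// the root, and elements 2k+1 and 2k+2 are the children of element k.
struct MinHeap {
    std::vector<int> a;

    void push(int value) {
        a.push_back(value);                      // add as the next leaf...
        std::size_t i = a.size() - 1;
        while (i > 0 && a[i] < a[(i - 1) / 2]) { // ...then swap upward
            std::swap(a[i], a[(i - 1) / 2]);
            i = (i - 1) / 2;
        }
    }

    int pop() {
        int top = a[0];
        a[0] = a.back(); // the last leaf becomes the new root
        a.pop_back();
        std::size_t i = 0;
        while (true) {   // swap downward with the smallest child
            std::size_t l = 2 * i + 1, r = 2 * i + 2, smallest = i;
            if (l < a.size() && a[l] < a[smallest]) smallest = l;
            if (r < a.size() && a[r] < a[smallest]) smallest = r;
            if (smallest == i) break;
            std::swap(a[i], a[smallest]);
            i = smallest;
        }
        return top;
    }
};
```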

Hash Tables

Hash tables are a unique data structure, and are typically used to implement a "dictionary"
interface, whereby a set of keys each has an associated value. The key is used as an index to
locate the associated values. This is not unlike a classical dictionary, where someone can find a
definition (value) of a given word (key).

Unfortunately, not every type of data is as easy to sort as a simple dictionary word, and this
is where the "hash" comes into play. Hashing is the process of generating a hash value (in this
case, typically a 32 or 64 bit integer) from a piece of data. This hash value then becomes a basis
for organizing and sorting the data. The hash value might be the first n bits of data, the last n bits
of data, a modulus of the value, or in some cases, a more complicated function. Using the hash
value, different "hash buckets" can be set up to store data. If the hash values are distributed
evenly (which is the case for an ideal hash algorithm), then the buckets will tend to fill up
evenly, and in many cases, most buckets will have no more than one or only a few objects in
them. This makes the search even faster.

A hash bucket containing more than one value is known as a "collision". The exact nature of
collision handling is implementation specific, and is crucial to the performance of the hash table.
One of the simplest methods is to implement a structure like a linked list at the hash bucket level,
so that elements with the same hash value can be chained together at the proper location. Other,
more complicated schemes may involve utilizing adjacent, unused locations in the table, or re-
hashing the hash value to obtain a new value. As always, there are good and bad performance
considerations (regarding time, size, and complexity) with any approach.
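A minimal chained hash table along those lines might look like this; the hash function, bucket count, and class names are all illustrative choices, not a standard interface:

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Sketch of a chained hash table mapping string keys to string values:
// collisions are resolved by keeping a linked list per bucket.
class ChainedHashTable {
public:
    ChainedHashTable() : buckets(kBuckets) {}

    void put(const std::string& key, const std::string& value) {
        auto& chain = buckets[hashOf(key)];
        for (auto& kv : chain)
            if (kv.first == key) { kv.second = value; return; } // overwrite existing key
        chain.push_back({key, value}); // collision or empty bucket: chain at this slot
    }

    // Returns the value, or an empty string if the key is absent.
    std::string get(const std::string& key) const {
        for (const auto& kv : buckets[hashOf(key)])
            if (kv.first == key) return kv.second;
        return "";
    }

private:
    static constexpr std::size_t kBuckets = 64;

    // A simple multiplicative hash, reduced modulo the bucket count.
    static std::size_t hashOf(const std::string& key) {
        std::size_t h = 0;
        for (char c : key) h = h * 31 + static_cast<unsigned char>(c);
        return h % kBuckets;
    }

    std::vector<std::list<std::pair<std::string, std::string>>> buckets;
};
```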

Another good example of a hash table is the Dewey decimal system, used in many libraries.
Every book is assigned a number based upon its subject matter: the 500s are all science
books, the 700s are all the arts, etc. Much like a real hash table, the speed at which a person
can find a given book depends upon how evenly the hash buckets are divided: it will
take longer to find a book about frogs in a library with many science materials than in a library
consisting mostly of classical literature.

In applications development, hash tables are a convenient place to store reference data, like state
abbreviations that link to full state names. In problem solving, hash tables are useful for
implementing a divide-and-conquer approach to knapsack-type problems. In LongPipes, we are
asked to find the minimum number of pipes needed to construct a single pipe of a given length,
and we have up to 38 pieces of pipe. By dividing this into two sets of 19, and calculating all
possible lengths from each set, we create hash tables linking the length of the pipe to the fewest
number of segments used. Then, for each constructed pipe in one set, we can easily look up,
whether or not we constructed a pipe of corresponding length in the other set, such that the two
join to form a complete pipe of the desired length.

logarithmic exponentiation
codechef.com

Fast Modulo Multiplication (Exponential Squaring)

also known as Exponential Squaring or Repeated Squaring

This is a very useful technique to have under your arsenal as a competitive programmer,
especially because such a technique often appears on maths-related problems and, sometimes, it
might be the difference between an AC verdict and a TLE verdict, so, for someone who still
doesn't know about it, I hope this tutorial will help :)

Main reason why the usage of repeated squaring is useful

This technique might seem a bit too complicated at first glance for a newbie; after all, say we
want to compute the number 3^10. We can simply write the following code and do a simple for
loop:

#include <iostream>
using namespace std;

int main()
{
    int base = 3;
    int exp = 10;
    int ans = 1;
    for(int i = 1; i <= exp; i++)
    {
        ans *= base;
    }
    cout << ans;
    return 0;
}

The above code will correctly compute the desired number, 59049. However, after we analyze it
carefully, we will obtain an insight which will be the key to develop a faster method.

Apart from being correct, the main issue with the above method is the number of
multiplications that are being executed.

We see that this method executes exactly exp-1 multiplications to compute the number n^exp,
which is not desirable when the value of exp is very large.

Usually, on contest problems, the idea of computing large powers of a number appears coupled
with the existence of a modulus value, i.e., as the values being computed can get very large, very
fast, the value we are looking to compute is usually something of the form:

n^exp % M,

where M is usually a large prime number (typically, 10^9 + 7).

Note that we could still use the modulus in our naive way of computing a large power: we simply
use modulus on all the intermediate steps and take modulus at the end, to ensure every
calculation is kept within the limit of "safe" data-types.
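That naive-with-modulus idea can be sketched as follows; the function name is illustrative:

```cpp
// Naive modular exponentiation: still exp multiplications, but reducing
// modulo M at every step keeps each intermediate value small enough to
// fit safely in a 64-bit integer.
long long naive_mod_pow(long long base, long long exp, long long M) {
    long long ans = 1 % M; // 1 % M handles the edge case M == 1
    base %= M;
    for (long long i = 0; i < exp; i++)
        ans = (ans * base) % M; // both factors are < M, so no overflow for M up to ~3e9
    return ans;
}
```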

The fast-exponentiation method: an implementation in C++



It is possible to find several formulas and several ways of performing fast exponentiation by
searching over the internet, but, there's nothing like implementing them on our own to get a
better feel about what we are doing :)

I will describe here the most naive method of performing repeated squaring. As found on
Wikipedia, the main formula is:

x^n = (x^2)^(n/2), if n is even, and x^n = x * (x^2)^((n-1)/2), if n is odd.

A brief analysis of this formula (an intuitive analysis, if you prefer), based both on its recursive
formulation and on its implementation, allows us to see that it uses only O(log n)
squarings and O(log n) multiplications!

This is a major improvement over the naive method described at the beginning of this text,
which used many more multiplication operations.

Below, I provide code which computes base^exp % 1000000007, based on the Wikipedia formula:

long long int fast_exp(long long int base, long long int exp)
{
    if(exp == 0)
        return 1;
    if(exp == 1)
        return base % 1000000007;
    if(exp % 2 == 0)
    {
        // square the half-sized subproblem in integer arithmetic;
        // using pow() here would lose precision for large values
        long long int half = fast_exp(base, exp/2);
        return (half * half) % 1000000007;
    }
    else
    {
        long long int half = fast_exp(base, (exp-1)/2);
        long long int ans = (half * half) % 1000000007;
        return (ans * (base % 1000000007)) % 1000000007;
    }
}

The 2^k-ary method for repeated squaring

Besides the recursive method detailed above, we can use yet another insight which allows us to
compute the value of the desired power.

The main idea is that we can expand the exponent in base 2 (or more generally, in base 2^k, with
k >= 1) and use this expansion to achieve the same result as above, but, this time, using less
memory.

Below you can find the code provided in the comment by @michal27:

ll fast_exp(ll base, ll exp) {
    ll res = 1;
    base %= MOD;
    while(exp > 0) {
        if(exp % 2 == 1) res = (res * base) % MOD;
        base = (base * base) % MOD; // base must be a 64-bit type, or base*base overflows
        exp /= 2;
    }
    return res;
}

These are the two most common ways of performing repeated squaring on a live contest.

Graphs
codechef.com

Graphs are everywhere! We define a graph as being composed of two sets, V and E, respectively
denoting the set of vertices and the set of edges. We say that vertices are connected by edges, so
an edge connects two vertices together. If we generically denote a graph as G, we usually denote
it by its two sets, such that the following notation is very common:

G(V, E) - the graph G composed of the vertex set V and the edge set E.



Below is a simple example of what a graph looks like (the description is in portuguese, but, it
should be easy to understand):

So we can immediately see that the small orange circles represent vertices and the black lines
represent edges.

Please note that while it might seem a bit formal and confusing at the beginning, when we talk
about graphs, this is the terminology we are going to use, as it is the only terminology we have
which allows us to understand all graph algorithms and to actually talk about graphs, so, for
someone who finds this a bit confusing, please twist your mind a bit :)

So, now we know how to define a graph and how it is formally represented. This is very
important for the remaining of this post, so, make sure you understand this well :)

However, graphs surely aren't only about defining a set of vertices and edges, are they?

Well, it turns out that things can get a little bit more complicated and there are some special
types of graphs besides the generic one, which was the one we just looked at. So, let us see what
more kinds of graphs we can have.

Special types of graphs



As it was mentioned on the foreword, graphs are used to model a lot of different everyday and
science problems, so, it was necessary to devise more particular types of graphs to suit such vast
problem-set. Let's see some definitions about these "special" graphs and examples of application
domains:

Type 1: Directed Graph

On a Directed Graph, the edges can have an associated direction.

This type of graph can be particularly useful if we are modelling, for example, a given system
of roads and/or highways to study traffic, as a given road can have a defined direction of travel.
Making the edges directed embeds this property in the graph.

Type 2: Directed Acyclic Graph

It is like the graph above, except that this particular type of graph cannot contain cycles, i.e.,
there is no path along the graph that starts at a vertex A and ends at vertex A again.

Type 3: Weighted Graph

In a weighted graph, the main characteristic is that the edges can have an associated "cost".

Mathematically speaking (and also in Operations Research), this "cost" need not be money. It
can model any quantity we desire, and it is always "attached" to the problem we are
modelling: if we are analysing a map, it can represent the cost of a given road due to a toll that's
placed there, or distances between cities.

In sports, it can represent the time taken by a professional paddling team to travel between two
checkpoints, etc. The possibilities are endless.

Besides these types there are also many other types of graphs (a tree, for instance, can be seen as
a particular type of graph) that were devised for more advanced applications and that I really
don't know much about yet, so these will be all the types of graphs I will talk about on this
introductory post.

Representing a graph so we can manipulate it in a computer problem

It turns out that one of the hardest things about graphs for me (besides wrapping your head
around the enormous amount of applications they have) is that literally everything about them is
highly dependent on the problem they are modelling, even the way we decide to represent
them.

However, there are not that many ways to represent a graph internally in computer memory and
even from that small universe we are going to focus solely on two representation schemes.

Representation via adjacency lists and representation via adjacency matrix.

For a better understanding, let us consider the graph below:

We can see now, thanks to our formal definition given earlier on this text, that this graph is
composed from two sets, the set of the vertices and the set of the edges.

We have 4 vertices and 4 edges.



However, the information concerning the number of vertices and edges alone is not sufficient
to describe a given graph because, as you might have figured out by now, the edges could be
placed differently (in a square, for example) and we wouldn't be able to know it.

This is why there are representations of graphs which allows us to "visualize" the graph in all of
its structure, from which the adjacency matrix will be the one we will study in the first place.

Adjacency Matrix

The adjacency matrix (let's call it A) is very simple to understand and, as the name itself says, it's
a representation based on a matrix of dimensions V x V, where its elements are as
follows:

A(i,j) -> 1, if there exists an edge between vertex i and vertex j;

A(i,j) -> 0, otherwise;

So, for the above graph, its adjacency matrix representation is as follows:

Adjacency List

The representation as an adjacency list is usually more convenient for representing a graph
internally, as it allows for an easier implementation of graph traversal methods. It is based, as
the name states, on having a list for each vertex that states the vertices to which that given
vertex is connected.

As a picture is worth a thousand words, here is the adjacency list representation of the above
graph:

here, the symbol / is used to denote end of list :)
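Both representations can be built from the same edge list. The sketch below does exactly that; note that the 4-vertex, 4-edge graph used in the usage example is an assumed one, since the article's figure is not reproduced here:

```cpp
#include <utility>
#include <vector>

// Building both representations from an edge list of an undirected graph.
struct GraphReps {
    std::vector<std::vector<int>> matrix; // A(i,j) = 1 iff an edge connects i and j
    std::vector<std::vector<int>> lists;  // lists[i] = vertices adjacent to i
};

GraphReps buildGraph(int vertices, const std::vector<std::pair<int, int>>& edges) {
    GraphReps g;
    g.matrix.assign(vertices, std::vector<int>(vertices, 0));
    g.lists.assign(vertices, {});
    for (const auto& e : edges) {
        int u = e.first, v = e.second;
        // undirected edges are stored in both directions
        g.matrix[u][v] = g.matrix[v][u] = 1;
        g.lists[u].push_back(v);
        g.lists[v].push_back(u);
    }
    return g;
}
```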

Now that we have looked into two of the most common ways of representing a graph, we are
ready to study the first of two traversal algorithms, Depth-First Search.

Understanding Depth-First Search

Depth-First Search, often known as DFS (from the initials of each word), is a
graph traversal algorithm where the vertices of the graph are explored in depth-first order, with
the search algorithm backtracking whenever it has reached the end of the search space, visiting
the remaining vertices as it tracks back.

This idea is illustrated on the figure below where the order by which vertices are visited is
described as a number (note that this is a tree, but, also, as discussed earlier, a tree is a particular
kind of graph, and it has the advantage of being easy to visualize).

As the vertices are first explored to a certain depth and then we must backtrack to explore the
remaining vertices, the DFS algorithm can be naturally implemented in a recursive fashion. Also
note that we assume here that each vertex is visited only once, so we can have, in
pseudo-code:

dfs(v):
    visited(v) = true
    for each vertex i adjacent to v
        if visited(i) = false
            dfs(i)

It's now important to mention two things:

The first one is that in Graph Theory, DFS is usually presented in an even more formal fashion,
mentioning colors and discovery and finish times for each vertex. From what I understood, in
programming contests the above routine is enough, and the purpose of the visited() array is
precisely to "replace" the color attributes that are related to discovery and finish times, which I
chose to omit to keep this post within the scope of programming competitions.

Finally, to end this explanation, what is more important to mention is that DFS usually is
implemented with something more besides only that code.

Usually we "add" to DFS something "useful" that serves the purpose of the problem we are
solving and that might or might not be related to two of most well known applications of DFS,
which are:

DFS applications

Connected Components;

Topological Sorting;
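One of those two applications, topological sorting, can be sketched directly on top of the DFS routine above. This assumes the input is a directed acyclic graph given as an adjacency list; the function names are illustrative:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// DFS-based topological sorting on a DAG: a vertex is appended to the
// order only after every vertex reachable from it has been finished,
// and the collected order is then reversed.
void topoVisit(int v, const std::vector<std::vector<int>>& graph,
               std::vector<bool>& visited, std::vector<int>& order) {
    visited[v] = true;
    for (int next : graph[v])
        if (!visited[next])
            topoVisit(next, graph, visited, order);
    order.push_back(v); // v is finished: all its successors are already placed
}

std::vector<int> topoSort(const std::vector<std::vector<int>>& graph) {
    std::vector<bool> visited(graph.size(), false);
    std::vector<int> order;
    for (std::size_t v = 0; v < graph.size(); v++)
        if (!visited[v])
            topoVisit(static_cast<int>(v), graph, visited, order);
    std::reverse(order.begin(), order.end());
    return order;
}
```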

Wrapping it all together: Solving FIRESC problem using what we know

FIRESC problem is the perfect problem for us to put our skills to test :)

Once we understand how to model the problem in terms of graphs, implementing it is simply a
matter of using one of the applications of DFS (finding connected components).

Simply put, we can model this in terms of graphs as follows:

Let us define a graph whose vertices are people. An undirected edge connects two people who
are friends.

On this graph, two vertices must be of the same color if they share an edge. This represents
that the two people have to use the same fire escape route. We wish to maximize the number
of colors used.

All the vertices in a connected component in this graph will have to be in the same color.

So as you see, we need to count the size of all connected components to obtain part of our
answer, and below I leave my commented implementation of this problem, largely based on the
work of @anton_lunyov and on the post found on this site (yes, I had to use google translate :p ).

#include <iostream>
#include <stdio.h>
#include <vector>
#include <iterator>
using namespace std;

vector<bool> visited; // this vector will mark visited vertices

// this will store the graph, represented internally as an adjacency list,
// because the adjacency list representation is the most suited to running
// the DFS procedure on a given graph
vector<vector<int> > graph;

// this will store the size of the current connected component
// (a problem-specific feature)
int sz_connect_comp = 0;

void dfs(int v)
{
    // the "useful feature" performed in this DFS; it can vary from problem to problem
    sz_connect_comp++;
    visited[v] = true;

    for(vector<int>::iterator it = graph[v].begin(); it != graph[v].end(); it++)
    {
        if(!visited[*it]) // note that *it is the adjacent vertex itself
        {
            dfs(*it);
        }
    }
}

int main()
{
    int t;
    cin >> t;
    while(t--)
    {
        int n, m;
        cin >> n >> m;
        graph = vector<vector<int> > (n); // initialization of the graph
        for(int i = 0; i < m; i++)
        {
            int u, v;
            cin >> u >> v;
            u--;
            v--;
            // edges are added both ways because the friendship relation is mutual
            graph[u].push_back(v);
            graph[v].push_back(u);
        }
        int res = 0;  // the number of fire escape routes
        int ways = 1; // the number of ways to choose drill captains
        visited = vector<bool> (n, 0); // initially mark all vertices as unvisited
        for(int u = 0; u < n; u++)
        {
            // if the vertex was visited we skip it
            if(visited[u] == true)
                continue;
            // if the vertex was not visited it starts a new component
            res++;               // so we increase res
            sz_connect_comp = 0; // reset sz_connect_comp
            dfs(u);              // and calculate it through the dfs, marking visited vertices
            // multiply ways by sz_connect_comp modulo 1000000007
            ways = (long long)sz_connect_comp * ways % 1000000007;
        }
        printf("%d %d\n", res, ways);
    }
    return 0;
}

Efficient Prime Factorization


geeksforgeeks.org

Efficient program to print all prime factors of a given number

Given a number n, write an efficient function to print all prime factors of n. For example, if the
input number is 12, then output should be 2 2 3. And if the input number is 315, then output
should be 3 3 5 7.

Following are the steps to find all prime factors.


1) While n is divisible by 2, print 2 and divide n by 2.
2) After step 1, n must be odd. Now start a loop from i = 3 to square root of n. While i divides n,
print i and divide n by i, increment i by 2 and continue.
3) If n is a prime number and is greater than 2, then n will not become 1 by above two steps. So
print n if it is greater than 2.

// Program to print all prime factors
#include <stdio.h>
#include <math.h>

// A function to print all prime factors of a given number n
void primeFactors(int n)
{
    // Print the number of 2s that divide n
    while (n%2 == 0)
    {
        printf("%d ", 2);
        n = n/2;
    }

    // n must be odd at this point, so we can skip one element (note i = i+2)
    for (int i = 3; i <= sqrt(n); i = i+2)
    {
        // While i divides n, print i and divide n
        while (n%i == 0)
        {
            printf("%d ", i);
            n = n/i;
        }
    }

    // This condition is to handle the case when n is a prime number
    // greater than 2
    if (n > 2)
        printf("%d ", n);
}

/* Driver program to test above function */
int main()
{
    int n = 315;
    primeFactors(n);
    return 0;
}

Output:

3 3 5 7

How does this work?


Steps 1 and 2 take care of composite numbers and step 3 takes care of prime numbers. To
prove that the complete algorithm works, we need to prove that steps 1 and 2 actually take care
of composite numbers. It is clear that step 1 takes care of even numbers. After step 1, all
remaining prime factors must be odd (any two odd primes differ by at least 2), and this
explains why i is incremented by 2.
Now the main part is that the loop runs only till the square root of n, not till n. To prove that this
optimization works, let us consider the following property of composite numbers:
every composite number has at least one prime factor less than or equal to its square root.
This property can be proved by contradiction. Let a and b be two factors of n such that
a*b = n. If both are greater than sqrt(n), then a*b > sqrt(n)*sqrt(n) = n, which contradicts the
expression a*b = n.

In step 2 of the above algorithm, we run a loop and do the following:

a) Find the least prime factor i (which must be less than or equal to sqrt(n)).
b) Remove all occurrences of i from n by repeatedly dividing n by i.
c) Repeat steps a and b for the reduced n with i = i + 2. Steps a and b are repeated till n becomes
either 1 or a prime number.

Combinatorics
topcoder.com

By x-ray
topcoder member

Introduction

Counting the objects that satisfy some criteria is a very common task in both TopCoder problems
and in real-life situations. The myriad ways of counting the number of elements in a set form one of
the main tasks in combinatorics, and I'll try to describe some basic aspects of it in this tutorial.
These methods are used in a range of applications, from discrete math and probability theory to
statistics, physics, biology, and more.

Combinatorial primitives

Let's begin with a quick overview of the basic rules and objects that we will reference later.

The rule of sum: if one object can be chosen in m ways and another object in n ways, and the two
choices are mutually exclusive, then there are m + n possible choices.

The rule of product: if a first object can be chosen in m ways and, after that, a second object in
n ways, then the ordered pair can be chosen in m*n ways.

For example, if we have three towns -- A, B and C -- and there are 3 roads from A to B and 5
roads from B to C, then we can get from A to C through B in 3*5 = 15 different ways.

These rules can be used for any finite collection of sets.

Permutation without repetition

When we choose k objects from an n-element set in such a way that the order matters and each
object can be chosen only once, the number of choices is:

n! / (n-k)!

For example, suppose we are planning the next 3 challenges and we have a set of 10 easy
problems to choose from. We will only use one easy problem in each contest, so we can choose
our problems in 10!/7! = 720 different ways.

Permutation (variation) with repetition



The number of possible choices of k objects from a set of n objects when order is important and
one object can be chosen more than once:

n^k

For example, if we have 10 different prizes that need to be divided among 5 people, we can do so
in 5^10 ways.

Permutation with repetition

The number of different permutations of n objects, where there are n1 indistinguishable objects
of type 1, n2 indistinguishable objects of type 2, ..., and nk indistinguishable objects of
type k (n1+n2+...+nk = n), is:

n! / (n1! * n2! * ... * nk!)

For example, if we have 97 coders and want to assign them to 5 rooms (rooms 1-4 get 20
coders each, while the 5th room gets 17), then there
are 97! / (20!^4 * 17!) possible ways to do it.

Combinations without repetition

In combinations we choose a set of elements (rather than an arrangement, as in permutations), so
the order doesn't matter. The number of different k-element subsets (when each element can be
chosen only once) of an n-element set is:

C(n, k) = n! / (k! * (n-k)!)

For example, if we have 7 different colored balls, we can choose any 3 of them
in C(7, 3) = 35 different ways.

Combination with repetition

Let's say we choose k elements from an n-element set, where the order doesn't matter and each
element can be chosen more than once. In that case, the number of different combinations is:

C(n+k-1, k)

For example, let's say we have 11 identical balls and 3 different pockets, and we need to
calculate the number of different ways of dividing these balls among the pockets. There would
be C(11+3-1, 11) = C(13, 11) = 78 different combinations.

It is useful to know that C(n+k-1, k) is also the number of nonnegative integer solutions of the
equation:

x1 + x2 + ... + xn = k

Why? It's easy to prove. Consider a vector (1, 1, ..., 1) consisting of (n+k-1) ones, in which we
want to substitute n-1 of the ones with zeroes in such a way that we get n groups of ones (some of
which may be empty); the number of ones in the i-th group will be the value of xi. The sum of
the xi will be k, because k ones are left after the substitution.

The Basics

Binary vectors

Some problems, and challenge problems are no exception, can be reformulated in terms of binary
vectors. Accordingly, some knowledge of the basic combinatorial properties of binary vectors is
rather important. Let's have a look at some simple things associated with them:

1. The number of binary vectors of length n is 2^n.

2. The number of binary vectors of length n with exactly k ones is C(n, k):

we just choose the k positions for our ones.

3. The number of ordered pairs (a, b) of binary vectors such that the distance between them is k
can be calculated as follows: C(n, k) * 2^n.

The distance between a and b is the number of components in which a and b differ -- for example,
the distance between (0, 0, 1, 0) and (1, 0, 1, 1) is 2.

Let a = (a1, a2, ..., an), b = (b1, b2, ..., bn), and let the distance between them be k. Next, let's
look at the sequence of pairs (a1, b1), (a2, b2), ..., (an, bn). There are exactly k indices i at which
ai != bi. Each of those pairs can be (0,1) or (1,0), so there are 2 variants, and each of the
remaining n-k pairs can be either (0,0) or (1,1), for another 2 variants. To calculate the answer
we choose the k indices at which the vectors differ in C(n, k) ways, then choose the differing
components in 2^k ways and the equal components in 2^(n-k) ways (both of which use the rule
of product), and in the end we just multiply all these numbers and get

C(n, k) * 2^k * 2^(n-k) = C(n, k) * 2^n.

Delving deeper

Now let's take a look at a very interesting and useful formula called the inclusion-exclusion
principle (also known as the sieve principle):

|A1 ∪ A2 ∪ ... ∪ An| = Σ|Ai| - Σ|Ai ∩ Aj| + Σ|Ai ∩ Aj ∩ Ak| - ... + (-1)^(n+1) * |A1 ∩ A2 ∩ ... ∩ An|

This formula is a generalization of:

|A ∪ B| = |A| + |B| - |A ∩ B|

There are many different problems that can be solved using the sieve principle, so let's focus our
attention on one of them. This problem is best known as Derangements. A derangement of a
finite set X is a bijection from X into X that doesn't have fixed points. A small example: for the set
X = {1, 2, 3} the bijection {(1,1), (2,3), (3,2)} is not a derangement, because of (1,1), but the
bijection {(1,2), (2,3), (3,1)} is a derangement. So let's turn back to the problem, the goal of
which is to find the number of derangements of an n-element set.

We have X = {1, 2, 3, ..., n}. Let:

A be the set of all bijections from X into X, |A| = n!,
A0 be the set of all derangements of X,
Ai (i ∈ X) be the set of bijections from X into X that contain (i,i),
AI (I ⊆ X) be the set of bijections from X into X that contain (i,i) for all i ∈ I,

so AI is the intersection of the Ai over i ∈ I, and |AI| = (n - |I|)!.

In the sieve principle's formula we have sums over all subsets I of a fixed size; in our case all the
|AI| with |I| = i are equal, so let's calculate them:

Σ over |I| = i of |AI| = C(n, i) * (n-i)!

(because there are exactly C(n, i) i-element subsets of X).

Now we just put that result into the sieve principle's formula:

|A0| = n! - C(n, 1)*(n-1)! + C(n, 2)*(n-2)! - ... + (-1)^n * C(n, n)*0!

And now the last step: from C(n, i)*(n-i)! = n!/i! we'll have the answer:

|A0| = n! * (1 - 1/1! + 1/2! - ... + (-1)^n / n!)

And the last remark: since the series in parentheses tends to 1/e, the number of derangements is
approximately n!/e.

This problem may not look very practical for use in TopCoder problems, but the thinking behind
it is rather important, and these ideas can be widely applied.

Another interesting method in combinatorics -- and one of my favorites, because of its elegance
-- is called the method of paths (or trajectories). The main idea is to find a geometrical
interpretation for the problem, in which we should calculate the number of paths of a special type. More
precisely, if we have two points A, B on a plane with integer coordinates, then we will operate
only with the shortest paths between A and B that pass only through the lines of the integer grid
and that can be done only in horizontal or vertical movements with length equal to 1. For
example:
All paths between A and B have the same length, equal to n+m (where n is the difference
between the x-coordinates and m is the difference between the y-coordinates). We can easily
calculate the number of all paths between A and B as follows:

C(n+m, n) = C(n+m, m)

Let's solve a famous problem using this method. The goal is to find the number of Dyck words
with a length of 2n. What is a Dyck word? It's a string consisting of exactly n X's and n Y's, and
matching this criterion: each prefix of the string has at least as many X's as Y's. For example,
XXYY and XYXY are Dyck words, but XYYX and YYXX are not.

Let's start the calculation process. We are going to build a geometrical analog of this problem, so
let's consider paths that go from point A(0, 0) to point B(n, n) and do not cross segment AB, but
can touch it (see examples for n=4).

It is obvious that these two problems are equivalent; we can just build a bijection in such a way:
step right - X, step up - Y.

Here's the main idea of the solution: find the number of paths from A to B that cross segment
AB, and call them incorrect. If a path is incorrect, it has points on the segment CD, where C =
(0, 1), D = (n-1, n). Let E be the point nearest to A that belongs to CD (and to the path). Let's
reflect AE, the part of our incorrect path from A to E, with respect to the line CD. After this
operation we'll get a path from F = (-1, 1) to B.

It should be easy to see that, for each path from F to B, we can build exactly one incorrect path
from A to B, so we've got a bijection. Thus, the number of incorrect paths from A to B
is C(2n, n-1). Now we can easily get the answer, by subtracting the number of incorrect paths
from all paths:

C(2n, n) - C(2n, n-1) = C(2n, n) / (n+1)

This number is also known as the nth Catalan number: Cn is the number of Dyck words of length
2n. These numbers also appear in many other problems; for example, Cn counts the number of
correct arrangements of n pairs of parentheses in an expression, and Cn is also the number of the
possible triangulations of a polygon with (n+2) vertices, and so on.

Using recurrence relations

Recurrence relations probably deserve their own separate article, but I should mention that they
play a great role in combinatorics. Particularly with regard to TopCoder, most calculation
problems seem to require coders to use some recurrence relation and find the solution for the
given values of the parameters.

If you'd like to learn more, check out these tutorials: An Introduction to Recursion, Recursion,
Part 2, and Dynamic Programming: From Novice to Advanced. Done reading? Let's take a look at
some examples.

ChristmasTree (SRM 331 Division Two Level Three):

We'll use DP to solve this -- it may not be the best way to tackle this problem, but it's the easiest
to understand. Let cnt[lev][r][g][b] be the number of possible ways to decorate the
first lev levels of the tree using r red, g green and b blue baubles. To make a recurrent step
calculating cnt[lev][r][g][b] we consider 3 variants:

cnt[lev][r][g][b] =

1) we fill the last level with one color (red, green or blue), so just:

= cnt[lev-1][r-lev][g][b] + cnt[lev-1][r][g-lev][b] + cnt[lev-1][r][g][b-lev] +

2) if (lev%2 == 0) we fill the last level with two colors (red+green, green+blue or red+blue),
then we calculate the number of possible decorations using the Permutation with
repetition formula. We'll get lev!/((lev/2)!)^2 possible variants for each pair of colors, so just:

+ lev!/((lev/2)!)^2 * (cnt[lev-1][r-lev/2][g-lev/2][b] + ... + cnt[lev-1][r][g-lev/2][b-lev/2]) +

3) if (lev%3 == 0) we fill the last level with three colors and, again using the Permutation with
repetition formula, we'll get lev!/((lev/3)!)^3 possible variants, so we'll get:

+ lev!/((lev/3)!)^3 * cnt[lev-1][r-lev/3][g-lev/3][b-lev/3]

(all cnt[l][i][j][k] with negative indices are 0).

DiceGames (SRM 349 Division One Level Two):

First we should do some preparation for the main part of the solution, by sorting the sides array
in increasing order and counting only the formations where the numbers on the dice go in non-
decreasing order. This preparation saves us from counting the same formation several times
(see SRM 349 - Problem Set & Analysis for additional explanation). Now we will only need to
build the recurrence relation, since the implementation is rather straightforward. Let ret[i][j] be
the number of different formations of the first i dice, with the last die equal
to j (so 1 <= i <= n and 1 <= j <= sides[i], where n is the number of elements in sides). Now we can
simply write the recurrence relation:

ret[i][j] = sum over k = 1..j of ret[i-1][k]

The answer will be the sum of ret[n][j] over all valid j.

Union Find/Disjoint Set


topcoder.com

Disjoint-set Data Structures



Introduction
Many times the efficiency of an algorithm depends on the data structures used in the algorithm.
A wise choice in the structure you use in solving a problem can reduce the time of execution, the
time to implement the algorithm and the amount of memory used. During SRM competitions we
are limited to a time limit of 2 seconds and 64 MB of memory, so the right data structure can
help you remain in competition. While some Data Structures have been covered before, in this
article we'll focus on data structures for disjoint sets.

The problem
Let's consider the following problem: in a room are N persons, and we define two persons to be
friends if they are directly or indirectly friends. If A is a friend of B, and B is a friend of
C, then A is a friend of C too. A group of friends is a group of persons where any two persons in
the group are friends. Given the list of persons that are directly friends, find the number of groups
of friends and the number of persons in each group. For example, let N = 5 and the list of friends
be: 1-2, 5-4, and 5-1. Here is the figure of the graph that represents the groups of friends: 1 and 2
are friends, then 5 and 4 are friends, and then 5 and 1 are friends; but 1 is a friend of 2, therefore
5 and 2 are friends, etc.

In the end there are 2 groups of friends: one group is {1, 2, 4, 5}, the other is {3}.

The solution
This problem can be solved using BFS, but let's see how to solve this kind of problem using data
structures for disjoint sets. First of all: a disjoint-set data structure is a structure that maintains a
collection S1, S2, S3, ..., Sn of dynamic disjoint sets. Two sets are disjoint if their intersection is
empty. For example, the sets {1, 2, 3} and {1, 5, 6} aren't disjoint because they have {1} in
common, but the sets {1, 2, 3} and {5, 6} are disjoint because their intersection is empty. In a data
structure of disjoint sets every set contains a representative, which is one member of the set.

Let's see how things will work with sets for the example of the problem. The groups will be
represented by sets, and the representative of each group is the person with the biggest index. At
the beginning there are 5 groups (sets): {1}, {2}, {3}, {4}, {5}. Nobody is anybody's friend and
everyone is the representative of his or her own group.

The next step is that 1 and 2 become friends; this means the group containing 1 and the group
containing 2 will become one group. This gives us these groups: {1, 2}, {3}, {4}, {5}, and the
representative of the first group becomes 2. Next, 5 and 4 become friends. The groups will be
{1, 2}, {3}, {4, 5}. And in the last step 5 and 1 become friends and the groups will be
{1, 2, 4, 5}, {3}. The representative of the first group will be 5 and the representative of the
second group will be 3. (We will see why we need representatives later.) At the end we have 2
sets, the first set
with 4 elements and the second with one, and this is the answer for the problem example: 2
groups, 1 group of 4 and one group of one.

Perhaps now you are wondering how you can check whether 2 persons are in the same group. This
is where the use of the representative elements comes in. Let's say we want to check whether 3 and
2 are in the same group: we will know this if the representative of the set that contains 3 is the
same as the representative of the set that contains 2. One representative is 5 and the other one is 3;
therefore 3 and 2 aren't in the same group of friends.

Some operations
Let's define the following operations:

CREATE-SET(x) - creates a new set with one element {x}.

MERGE-SETS(x, y) - merges the set that contains element x and the set that contains element y
into one set (x and y are in different sets). The original sets will be destroyed.

FIND-SET(x) - returns the representative, or a pointer to the representative, of the set that
contains element x.

The solution using these operations


Let's see the solution for our problem using these operations:

Read N;
for (each person x from 1 to N) CREATE-SET(x)
for (each pair of friends (x y) ) if (FIND-SET(x) != FIND-SET(y)) MERGE-
SETS(x, y)

Now if we want to see if 2 persons (x, y) are in same group we check if FIND-SET(x) == FIND-
SET(y).

We will analyze the running time of the disjoint-set data structure in terms of N and M, where N
is the number of times that CREATE-SET(x) is called and M is the total number of times that
CREATE-SET(x), MERGE-SETS(x, y) and FIND-SET(x) are called. Since the sets are disjoint,
each time MERGE-SETS(x, y) is called two sets are destroyed and one is created, giving
us one set less. So if there are n sets, after n-1 calls of MERGE-SETS(x, y) only one set will
remain. That's why the number of MERGE-SETS(x, y) calls is less than or equal to the number
of CREATE-SET(x) operations.

Implementation with linked lists


One way to implement disjoint-set data structures is to represent each set by a linked list. Each
element (object) will be in a linked list and will contain a pointer to the next element in the set
and another pointer to the representative of the set. Here is a figure of how the example of the
problem will look after all operations are made. The blue arrows are the pointers to the
representatives and the black arrows are the pointers to the next element in the sets. Representing
sets with linked lists, we obtain a complexity of O(1) for CREATE-SET(x) and FIND-SET(x):
CREATE-SET(x) just creates a new linked list whose only element (object) is x, and the
operation FIND-SET(x) just returns the pointer to the representative of the set that contains
element (object) x.

Now let's see how to implement the MERGE-SETS(x, y) operation. The easy way is to append
x's list onto the end of y's list. The representative of the new set is the representative of the
original set that contained y. We must update the pointer to the representative for each element
(object) originally on x's list, which takes time linear in the length of x's list. It's easy to
prove that, in the worst case, the complexity of the algorithm will be O(M^2) where M is the
number of MERGE-SETS(x, y) operations. With this implementation the complexity averages
O(N) per operation, where N represents the number of elements in all sets.

The weighted union heuristic


Let's see how a heuristic can make the algorithm more efficient. The heuristic is called the
"weighted-union heuristic." In this case, let's say that the representative of a set also contains
information about how many objects (elements) are in that set. The optimization is to
always append the smaller list onto the longer one and, in case of ties, append arbitrarily. This
brings the complexity of the algorithm to O(M + N*logN), where M is the number of operations
(FIND-SET, MERGE-SETS, CREATE-SET) and N is the number of CREATE-SET operations.
I will not prove why the complexity is this, but if you are interested you can find the proof
in the resources mentioned at the end of the article.

So far we have an algorithm that solves the problem in O(M + N*logN), where N is the number of
persons and M is the number of friendships, using O(N) memory. Still, BFS solves the
problem in O(M + N) time and O(N + M) memory. We can see that we have optimized the
memory but not the execution time.

Next step: root trees


The next step is to see what we can do for a faster implementation of disjoint-set data structures.
Let's represent sets by rooted trees, with each node containing one element and each tree
representing one set. Each element points only to its parent; the root of each tree is the
representative of that set and its own parent. Let's see, in steps, how the trees will look for the
example from the problem above.

Step 1: nobody is anybody's friend

We have 5 trees and each tree has a single element, which is the root and the representative of
that tree.

Step 2: 1 and 2 are friends, MERGE-SETS(1, 2):

The operation made is MERGE-SETS(1, 2). We have 4 trees: one tree contains 2 elements and
has root 1; the other trees have a single element.

Step 3: 5 and 4 are friends, MERGE-SETS(5, 4)

The operation made is MERGE-SETS(5, 4). Now we have 3 trees, 2 trees with 2 elements and
one tree with one element.

Step 4: 5 and 1 are friends, MERGE-SETS(5, 1)



The operation made is MERGE-SETS(5, 1). Now we have 2 trees, one tree has 4 elements and
the other one has only one element.

As we see so far the algorithm using rooted trees is no faster than the algorithm using linked
lists.

Two heuristics
Next we will see how, by using two heuristics, we will achieve the asymptotically fastest disjoint-
set data structure known so far, which is almost linear in the number of operations
made. These two heuristics are called "union by rank" and "path compression." The idea in
the first heuristic, union by rank, is to make the root of the tree with fewer nodes point to the
root of the tree with more nodes. For each node, we maintain a rank that approximates the
logarithm of the sub-tree size and is also an upper bound on the height of the node. When
MERGE-SETS(x, y) is called, the root with smaller rank is made to point to the root with larger
rank. The idea in the second heuristic, path compression, which is used by the operation FIND-
SET(x), is to make each node on the find path point directly to the root. This does not change any
ranks.

To implement a disjoint-set forest with these heuristics, we must keep track of ranks. With each
node x, we keep the integer value rank[x], which is greater than or equal to the number of edges
in the longest path between node x and a leaf of its subtree. When CREATE-SET(x) is called, the
initial rank[x] will be 0. When a MERGE-SETS(x, y) operation is made, the root of higher rank
becomes the parent of the root of lower rank or, in case of a tie, we arbitrarily choose one of
the roots as the parent and increment its rank.

Let's see how the algorithm will look.

Let P[x] = the parent of node x.

CREATE-SET(x)
    P[x] = x
    rank[x] = 0

MERGE-SETS(x, y)
    px = FIND-SET(x)
    py = FIND-SET(y)
    if (rank[px] > rank[py]) P[py] = px
    else P[px] = py
    if (rank[px] == rank[py]) rank[py] = rank[py] + 1

And the last operation looks like:

FIND-SET(x)
    if (x != P[x]) P[x] = FIND-SET(P[x])
    return P[x]

Now let's see how the heuristics helped the running time. If we use only the first heuristic,
union by rank, then we get the same running time we achieved with the weighted-union
heuristic when we used lists for representation. When we use both union by rank and path
compression, the worst-case running time is O(m * α(m,n)), where α(m,n) is the very slowly
growing inverse of Ackermann's function. In applications α(m,n) <= 4, which is why we can say
that the running time is linear in terms of m in practical situations. (For more details on
Ackermann's function or complexity see the references below.)

Back to the problem


The problem from the beginning of the article is solvable in O(N + M) time and O(N) memory
using the disjoint-set data structure. The difference in execution time is not big compared to the
BFS solution, but we don't need to keep the edges of the graph in memory. Now suppose the
problem were: in a room are N persons and you must handle Q queries. A query of the
form "x y 1" means that x becomes friends with y; a query of the form "x y 2" asks whether
x and y are in the same group of friends at that moment in time. In this case the solution with the
disjoint-set data structure is the fastest, giving a complexity of O(N + Q).

Practice
Disjoint-set data structures are a helpful tool for use in different algorithms, or even for solving
problems in an SRM. They are efficient and use a small amount of memory. They are useful in
applications like computing the shorelines of a terrain, classifying a set of atoms into
molecules or fragments, connected component labeling in image analysis, and others.

To practice what you've learned, try to solve GrafixMask, the Division 1 500 from SRM 211.
The idea is to keep track of all the blocks and consider each grid point as a node. Next, take each
node that isn't blocked and let (x, y) be the coordinate of its left, right, down or up neighbor;
if (x, y) is not blocked, then do the operation MERGE-SETS(node, node2). You should
also try to determine how disjoint-set data structures can be used in the solution of
RoadReconstruction from SRM 356. Disjoint-set data structures can also be used in
TopographicalImage from SRM 210 and PathFinding from SRM 156.

Knapsack problem
geeksforgeeks.org

Dynamic Programming | Set 10 ( 0-1 Knapsack Problem)

Given weights and values of n items, put these items in a knapsack of capacity W to get the
maximum total value in the knapsack. In other words, given two integer arrays val[0..n-1] and
wt[0..n-1], which represent values and weights associated with n items respectively, and an
integer W which represents the knapsack capacity, find the maximum-value subset of val[]
such that the sum of the weights of this subset is smaller than or equal to W. You cannot break an
item: either pick the complete item or don't pick it (the 0-1 property).

A simple solution is to consider all subsets of items and calculate the total weight and value of
each subset. Consider only the subsets whose total weight is smaller than or equal to W. From all
such subsets, pick the maximum-value subset.

1) Optimal Substructure:
To consider all subsets of items, there can be two cases for every item: (1) the item is included in
the optimal subset, (2) it is not included in the optimal subset.
Therefore, the maximum value that can be obtained from n items is the maximum of the
following two values:
1) Maximum value obtained by n-1 items and W weight (excluding the nth item).
2) Value of the nth item plus the maximum value obtained by n-1 items and W minus the weight
of the nth item (including the nth item).

If the weight of the nth item is greater than W, then the nth item cannot be included and case 1 is
the only possibility.

2) Overlapping Subproblems
Following is a recursive implementation that simply follows the recursive structure mentioned
above.

/* A Naive recursive implementation of 0-1 Knapsack problem */
#include<stdio.h>

// A utility function that returns maximum of two integers
int max(int a, int b) { return (a > b)? a : b; }

// Returns the maximum value that can be put in a knapsack of capacity W
int knapSack(int W, int wt[], int val[], int n)
{
    // Base Case
    if (n == 0 || W == 0)
        return 0;

    // If weight of the nth item is more than Knapsack capacity W, then
    // this item cannot be included in the optimal solution
    if (wt[n-1] > W)
        return knapSack(W, wt, val, n-1);

    // Return the maximum of two cases: (1) nth item included (2) not included
    else return max( val[n-1] + knapSack(W-wt[n-1], wt, val, n-1),
                     knapSack(W, wt, val, n-1) );
}

// Driver program to test above function
int main()
{
    int val[] = {60, 100, 120};
    int wt[] = {10, 20, 30};
    int W = 50;
    int n = sizeof(val)/sizeof(val[0]);
    printf("%d", knapSack(W, wt, val, n));
    return 0;
}

It should be noted that the above function computes the same subproblems again and again. See
the following recursion tree: K(1, 1) is evaluated twice. The time complexity of this naive
recursive solution is exponential (2^n).

In the following recursion tree, K() refers to knapSack(). The two


parameters indicated in the following recursion tree are n and W.
The recursion tree is for following sample inputs.
wt[] = {1, 1, 1}, W = 2, val[] = {10, 20, 30}

                     K(3, 2) ---------> K(n, W)
                    /       \
                   /         \
              K(2, 2)       K(2, 1)
              /   \          /   \
             /     \        /     \
        K(1, 2)  K(1, 1)  K(1, 1)  K(1, 0)
        /  \      /  \     /  \
       /    \    /    \   /    \
  K(0, 2) K(0, 1) K(0, 1) K(0, 0) K(0, 1) K(0, 0)
Recursion tree for Knapsack capacity 2 units and 3 items of 1 unit weight.

Since subproblems are evaluated again and again, this problem has the Overlapping Subproblems
property. So the 0-1 Knapsack problem has both properties (see this and this) of a dynamic
programming problem. Like other typical Dynamic Programming (DP) problems, recomputation
of the same subproblems can be avoided by constructing a temporary array K[][] in a bottom-up
manner. Following is a Dynamic Programming based implementation.

// A Dynamic Programming based solution for 0-1 Knapsack problem
#include<stdio.h>

// A utility function that returns maximum of two integers
int max(int a, int b) { return (a > b)? a : b; }

// Returns the maximum value that can be put in a knapsack of capacity W
int knapSack(int W, int wt[], int val[], int n)
{
    int i, w;
    int K[n+1][W+1];

    // Build table K[][] in bottom up manner
    for (i = 0; i <= n; i++)
    {
        for (w = 0; w <= W; w++)
        {
            if (i==0 || w==0)
                K[i][w] = 0;
            else if (wt[i-1] <= w)
                K[i][w] = max(val[i-1] + K[i-1][w-wt[i-1]], K[i-1][w]);
            else
                K[i][w] = K[i-1][w];
        }
    }

    return K[n][W];
}

int main()
{
    int val[] = {60, 100, 120};
    int wt[] = {10, 20, 30};
    int W = 50;
    int n = sizeof(val)/sizeof(val[0]);
    printf("%d", knapSack(W, wt, val, n));
    return 0;
}

Time Complexity: O(nW) where n is the number of items and W is the capacity of knapsack.

Aho-Corasick String Matching Algorithm

Implementation
Gist.github.com

#include <algorithm>
#include <iostream>
#include <iterator>
#include <numeric>
#include <sstream>
#include <fstream>
#include <cassert>
#include <climits>
#include <cstdlib>
#include <cstring>
#include <string>
#include <cstdio>
#include <vector>
#include <cmath>
#include <queue>
#include <deque>
#include <stack>
#include <list>
#include <map>
#include <set>

using namespace std;

#define foreach(x, v) for (typeof((v).begin()) x = (v).begin(); x != (v).end(); ++x)
#define For(i, a, b) for (int i = (a); i < (b); ++i)
#define D(x) cout << #x " is " << x << endl

/////////////////////////////////////////////////////////////////////////////
// Aho-Corasick's algorithm, as explained in
// http://dx.doi.org/10.1145/360825.360855
/////////////////////////////////////////////////////////////////////////////

const int MAXS = 6 * 50 + 10; // Max number of states in the matching machine.
                              // Should be equal to the sum of the lengths of all keywords.

const int MAXC = 26;          // Number of characters in the alphabet.

int out[MAXS]; // Output for each state, as a bitwise mask.
               // Bit i in this mask is on if the keyword with index i
               // appears when the machine enters this state.

// Used internally in the algorithm.
int f[MAXS];        // Failure function
int g[MAXS][MAXC];  // Goto function, or -1 if fail.

// Builds the string matching machine.
//
// words       - Vector of keywords. The index of each keyword is important:
//               "out[state] & (1 << i)" is > 0 if we just found word[i] in the text.
// lowestChar  - The lowest char in the alphabet. Defaults to 'a'.
// highestChar - The highest char in the alphabet. Defaults to 'z'.
//               "highestChar - lowestChar" must be <= MAXC, otherwise we will
//               access the g matrix outside its bounds and things will go wrong.
//
// Returns the number of states that the new machine has.
// States are numbered 0 up to the return value - 1, inclusive.
int buildMatchingMachine(const vector<string> &words, char lowestChar = 'a',
                         char highestChar = 'z') {
    memset(out, 0, sizeof out);
    memset(f, -1, sizeof f);
    memset(g, -1, sizeof g);

    int states = 1; // Initially, we just have the 0 state

    for (int i = 0; i < words.size(); ++i) {
        const string &keyword = words[i];
        int currentState = 0;
        for (int j = 0; j < keyword.size(); ++j) {
            int c = keyword[j] - lowestChar;
            if (g[currentState][c] == -1) { // Allocate a new node
                g[currentState][c] = states++;
            }
            currentState = g[currentState][c];
        }
        out[currentState] |= (1 << i); // There's a match of keywords[i] at node currentState.
    }

// State 0 should have an outgoing edge for all characters.


for (int c = 0; c < MAXC; ++c) {
if (g[0][c] == -1) {
g[0][c] = 0;
}
}

    // Now, let's build the failure function
    queue<int> q;
    for (int c = 0; c <= highestChar - lowestChar; ++c) { // Iterate over every possible input
// All nodes s of depth 1 have f[s] = 0
if (g[0][c] != -1 and g[0][c] != 0) {
f[g[0][c]] = 0;
q.push(g[0][c]);
}
}
while (q.size()) {
int state = q.front();
q.pop();
for (int c = 0; c <= highestChar - lowestChar; ++c) {
if (g[state][c] != -1) {
int failure = f[state];
while (g[failure][c] == -1) {
failure = f[failure];
}
failure = g[failure][c];
f[g[state][c]] = failure;
out[g[state][c]] |= out[failure]; // Merge out values
q.push(g[state][c]);
}
}
}

return states;
}

// Finds the next state the machine will transition to.
//
// currentState - The current state of the machine. Must be between
//                0 and the number of states - 1, inclusive.
// nextInput    - The next character that enters into the machine. Should be
//                between lowestChar and highestChar, inclusive.
// lowestChar   - Should be the same lowestChar that was passed to "buildMatchingMachine".
//
// Returns the next state the machine will transition to. This is an integer
// between 0 and the number of states - 1, inclusive.
int findNextState(int currentState, char nextInput, char lowestChar = 'a') {
int answer = currentState;
int c = nextInput - lowestChar;
while (g[answer][c] == -1) answer = f[answer];
return g[answer][c];
}

// How to use this algorithm:
//
// 1. Modify the MAXS and MAXC constants as appropriate.
// 2. Call buildMatchingMachine with the set of keywords to search for.
// 3. Start at state 0. Call findNextState to incrementally transition between states.
// 4. Check the out function to see if a keyword has been matched.
//
// Example:
//
// Assume keywords is a vector that contains {"he", "she", "hers", "his"} and
// text is a string that contains "ahishers".
//
// Consider this program:
//
// buildMatchingMachine(v, 'a', 'z');
// int currentState = 0;
// for (int i = 0; i < text.size(); ++i) {
//     currentState = findNextState(currentState, text[i], 'a');
//     if (out[currentState] == 0) continue; // Nothing new, let's move on to the next character.
//     for (int j = 0; j < keywords.size(); ++j) {
//         if (out[currentState] & (1 << j)) { // Matched keywords[j]
//             cout << "Keyword " << keywords[j] << " appears from "
//                  << i - keywords[j].size() + 1 << " to " << i << endl;
//         }
//     }
// }
//
// The output of this program is:
//
// Keyword his appears from 1 to 3
// Keyword he appears from 4 to 5
// Keyword she appears from 3 to 5
// Keyword hers appears from 4 to 7

/////////////////////////////////////////////////////////////////////////////
// End of Aho-Corasick's algorithm.
/////////////////////////////////////////////////////////////////////////////

int main() {
    vector<string> keywords;
    keywords.push_back("he");
    keywords.push_back("she");
    keywords.push_back("hers");
    keywords.push_back("his");
    string text = "ahishers";

    buildMatchingMachine(keywords, 'a', 'z');

    int currentState = 0;
    for (int i = 0; i < text.size(); ++i) {
        currentState = findNextState(currentState, text[i], 'a');
        if (out[currentState] == 0) continue; // Nothing new, let's move on to the next character.
        for (int j = 0; j < keywords.size(); ++j) {
            if (out[currentState] & (1 << j)) { // Matched keywords[j]
                cout << "Keyword " << keywords[j] << " appears from "
                     << i - keywords[j].size() + 1 << " to " << i << endl;
            }
        }
    }

    return 0;
}

Strongly Connected Components


geeksforgeeks.org

A directed graph is strongly connected if there is a path between all pairs of vertices. A strongly
connected component (SCC) of a directed graph is a maximal strongly connected subgraph. For
example, there are 3 SCCs in the following graph.

We can find all strongly connected components in O(V+E) time using Kosaraju's algorithm.
Following is the detailed Kosaraju's algorithm.
1) Create an empty stack S and do a DFS traversal of the graph. In the DFS traversal, after calling
recursive DFS for the adjacent vertices of a vertex, push the vertex to the stack.
2) Reverse the directions of all arcs to obtain the transpose graph.
3) One by one pop a vertex from S while S is not empty. Let the popped vertex be v. Take v as
source and do DFS (call DFSUtil(v)). The DFS starting from v prints the strongly connected
component of v.

How does this work?


The above algorithm is DFS based. It does DFS two times. DFS of a graph produces a single tree
if all vertices are reachable from the DFS starting point; otherwise DFS produces a forest. So
DFS of a graph with only one SCC always produces a tree. The important point to note is that DFS
may produce a tree or a forest when there is more than one SCC, depending upon the chosen
starting point. For example, in the above diagram, if we start DFS from vertex 0, 1 or 2, we
get a tree as output, and if we start from 3 or 4, we get a forest. To find and print all SCCs, we
would want to start DFS from vertex 4 (which is a sink vertex), then move to 3, which is a sink in
the remaining set (the set excluding 4), and finally any of the remaining vertices (0, 1, 2).

So how do we find this sequence of picking vertices as starting points of DFS? Unfortunately, there is no
direct way of getting this sequence. However, if we do a DFS of the graph and store vertices
according to their finish times, we make sure that the finish time of a vertex that connects to
other SCCs (other than its own SCC) will always be greater than the finish time of vertices in the
other SCC. For example, in DFS of the above example graph, the finish time of 0 is
always greater than that of 3 and 4 (irrespective of the sequence of vertices considered for DFS), and
the finish time of 3 is always greater than that of 4. DFS doesn't guarantee anything about other vertices; for example,
the finish times of 1 and 2 may be smaller or greater than those of 3 and 4 depending upon the sequence of
vertices considered for DFS. So to use this property, we do a DFS traversal of the complete graph and
push every finished vertex to a stack. In the stack, 3 always appears after 4, and 0 appears after both 3
and 4.
In the next step, we reverse the graph. Consider the graph of SCCs. In the reversed graph, the
edges that connect two components are reversed. So the SCC {0, 1, 2} becomes a sink and the
SCC {4} becomes a source. As discussed above, in the stack, we always have 0 before 3 and 4. So if
we do a DFS of the reversed graph using the sequence of vertices in the stack, we process vertices from
sink to source. That is what we wanted to achieve, and that is all that is needed to print SCCs one by
one.

Following is a C++ implementation of Kosaraju's algorithm.

// Implementation of Kosaraju's algorithm to print all SCCs


#include <iostream>
#include <list>
#include <stack>
using namespace std;

class Graph
{
int V; // No. of vertices
list<int> *adj; // An array of adjacency lists

// Fills Stack with vertices (in increasing order of finishing times)


// The top element of stack has the maximum finishing time
void fillOrder(int v, bool visited[], stack<int> &Stack);

// A recursive function to print DFS starting from v


void DFSUtil(int v, bool visited[]);
public:
Graph(int V);
void addEdge(int v, int w);

// The main function that finds and prints strongly connected components
void printSCCs();

// Function that returns reverse (or transpose) of this graph


Graph getTranspose();
};

Graph::Graph(int V)
{
this->V = V;
adj = new list<int>[V];
}

// A recursive function to print DFS starting from v


void Graph::DFSUtil(int v, bool visited[])
{
// Mark the current node as visited and print it
visited[v] = true;
cout << v << " ";

// Recur for all the vertices adjacent to this vertex


list<int>::iterator i;
for (i = adj[v].begin(); i != adj[v].end(); ++i)
if (!visited[*i])
DFSUtil(*i, visited);
}

Graph Graph::getTranspose()
{
Graph g(V);
for (int v = 0; v < V; v++)
{
// Recur for all the vertices adjacent to this vertex
list<int>::iterator i;
for(i = adj[v].begin(); i != adj[v].end(); ++i)
{
g.adj[*i].push_back(v);
}
}
return g;
}

void Graph::addEdge(int v, int w)


{
adj[v].push_back(w); // Add w to v's list.
}

void Graph::fillOrder(int v, bool visited[], stack<int> &Stack)


{
// Mark the current node as visited and print it
visited[v] = true;

// Recur for all the vertices adjacent to this vertex


list<int>::iterator i;
for(i = adj[v].begin(); i != adj[v].end(); ++i)
if(!visited[*i])
fillOrder(*i, visited, Stack);

// All vertices reachable from v are processed by now, push v to Stack


Stack.push(v);
}

// The main function that finds and prints all strongly connected components
void Graph::printSCCs()
{
stack<int> Stack;

// Mark all the vertices as not visited (For first DFS)


bool *visited = new bool[V];
for(int i = 0; i < V; i++)
visited[i] = false;

// Fill vertices in stack according to their finishing times


for(int i = 0; i < V; i++)
if(visited[i] == false)
fillOrder(i, visited, Stack);

// Create a reversed graph


Graph gr = getTranspose();

// Mark all the vertices as not visited (For second DFS)


for(int i = 0; i < V; i++)
visited[i] = false;

// Now process all vertices in order defined by Stack


while (Stack.empty() == false)
{
// Pop a vertex from stack
int v = Stack.top();
Stack.pop();

// Print Strongly connected component of the popped vertex


if (visited[v] == false)
{
gr.DFSUtil(v, visited);
cout << endl;
}
}
}

// Driver program to test above functions


int main()
{
// Create a graph given in the above diagram
Graph g(5);
g.addEdge(1, 0);
g.addEdge(0, 2);
g.addEdge(2, 1);
g.addEdge(0, 3);
g.addEdge(3, 4);

cout << "Following are strongly connected components in given graph \n";
g.printSCCs();

return 0;
}

Output:

Following are strongly connected components in given graph


0 1 2

3
4

Time Complexity: The above algorithm calls DFS, finds the reverse of the graph and again calls
DFS. DFS takes O(V+E) for a graph represented using an adjacency list. Reversing a graph also
takes O(V+E) time: for reversing the graph, we simply traverse all adjacency lists.

The above algorithm is asymptotically optimal, but there are other algorithms like
Tarjan's algorithm and the path-based algorithm which have the same time complexity but find SCCs using a single
DFS. Tarjan's algorithm is discussed in the following post.

Tarjan's Algorithm to find Strongly Connected Components

Applications:
SCC algorithms can be used as a first step in many graph algorithms that work only on strongly
connected graphs.
In social networks, a group of people is generally strongly connected (for example, students of
a class or any other common place). Many people in these groups generally like some common
pages or play common games. The SCC algorithms can be used to find such groups and suggest
the commonly liked pages or games to the people in the group who have not yet liked those pages
or played those games.

Bellman Ford algorithm


geeksforgeeks.com

Given a graph and a source vertex src in the graph, find the shortest paths from src to all vertices in the
given graph. The graph may contain negative weight edges.
We have discussed Dijkstra's algorithm for this problem. Dijkstra's algorithm is a Greedy
algorithm and its time complexity is O(E + V log V) (with the use of a Fibonacci heap). Dijkstra doesn't
work for graphs with negative weight edges; Bellman-Ford works for such graphs. Bellman-Ford
is also simpler than Dijkstra and suits distributed systems well. But the time complexity of
Bellman-Ford is O(VE), which is more than Dijkstra's.

Algorithm
Following are the detailed steps.

Input: Graph and a source vertex src


Output: Shortest distance to all vertices from src. If there is a negative weight cycle, then
shortest distances are not calculated, negative weight cycle is reported.

1) This step initializes distances from source to all vertices as infinite and distance to source
itself as 0. Create an array dist[] of size |V| with all values as infinite except dist[src] where src is
source vertex.

2) This step calculates shortest distances. Do following |V|-1 times where |V| is the number of
vertices in given graph.

..a) Do following for each edge u-v
......If dist[v] > dist[u] + weight of edge uv, then update dist[v]
.........dist[v] = dist[u] + weight of edge uv

3) This step reports if there is a negative weight cycle in the graph. Do following for each edge u-v:
......If dist[v] > dist[u] + weight of edge uv, then the graph contains a negative weight cycle.
The idea of step 3 is: step 2 guarantees shortest distances if the graph doesn't contain a negative
weight cycle. If we iterate through all edges one more time and get a shorter path for any vertex,
then there is a negative weight cycle.

How does this work? Like other Dynamic Programming problems, the algorithm calculates
shortest paths in a bottom-up manner. It first calculates the shortest distances for the shortest paths
which have at most one edge in the path. Then, it calculates shortest paths with at most 2 edges,
and so on. After the i-th iteration of the outer loop, the shortest paths with at most i edges are
calculated. There can be at most |V| - 1 edges in any simple path; that is why the outer loop
runs |V| - 1 times. The idea is, assuming that there is no negative weight cycle, if we have
calculated shortest paths with at most i edges, then an iteration over all edges guarantees to give
shortest paths with at most (i+1) edges (the proof is simple).

Example
Let us understand the algorithm with the following example graph.

Let the given source vertex be 0. Initialize all distances as infinite, except the distance to source
itself. Total number of vertices in the graph is 5, so all edges must be processed 4 times.

Let all edges be processed in the following order: (B,E), (D,B), (B,D), (A,B), (A,C), (D,C), (B,C),
(E,D). We get the following distances when all edges are processed the first time. The first row shows
initial distances. The second row shows distances when edges (B,E), (D,B), (B,D) and (A,B) are
processed. The third row shows distances when (A,C) is processed. The fourth row shows distances when
(D,C), (B,C) and (E,D) are processed.

The first iteration guarantees to give all shortest paths which are at most 1 edge long. We get the
following distances when all edges are processed a second time (the last row shows the final values).

The second iteration guarantees to give all shortest paths which are at most 2 edges long. The
algorithm processes all edges 2 more times. The distances are minimized after the second
iteration, so the third and fourth iterations don't update the distances.

Implementation:

// A C / C++ program for Bellman-Ford's single source shortest path algorithm.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

// a structure to represent a weighted edge in graph


struct Edge
{
int src, dest, weight;
};

// a structure to represent a connected, directed and weighted graph


struct Graph
{
// V-> Number of vertices, E-> Number of edges
int V, E;

    // graph is represented as an array of edges.
    struct Edge* edge;
};

// Creates a graph with V vertices and E edges


struct Graph* createGraph(int V, int E)
{
struct Graph* graph = (struct Graph*) malloc( sizeof(struct Graph) );
graph->V = V;
graph->E = E;

graph->edge = (struct Edge*) malloc( graph->E * sizeof( struct Edge ) );

return graph;
}

// A utility function used to print the solution


void printArr(int dist[], int n)
{
printf("Vertex Distance from Source\n");
for (int i = 0; i < n; ++i)
printf("%d \t\t %d\n", i, dist[i]);
}

// The main function that finds shortest distances from src to all other
// vertices using Bellman-Ford algorithm. The function also detects negative
// weight cycle
void BellmanFord(struct Graph* graph, int src)
{
int V = graph->V;
int E = graph->E;
int dist[V];

    // Step 1: Initialize distances from src to all other vertices as INFINITE
    for (int i = 0; i < V; i++)
        dist[i] = INT_MAX;
    dist[src] = 0;

// Step 2: Relax all edges |V| - 1 times. A simple shortest path from src
// to any other vertex can have at-most |V| - 1 edges
for (int i = 1; i <= V-1; i++)
{
for (int j = 0; j < E; j++)
{
int u = graph->edge[j].src;
int v = graph->edge[j].dest;
int weight = graph->edge[j].weight;
if (dist[u] + weight < dist[v])
dist[v] = dist[u] + weight;
}
}

    // Step 3: check for negative-weight cycles. The above step guarantees
    // shortest distances if graph doesn't contain negative weight cycle.
    // If we get a shorter path, then there is a cycle.
    for (int i = 0; i < E; i++)
    {
        int u = graph->edge[i].src;
        int v = graph->edge[i].dest;
        int weight = graph->edge[i].weight;
        if (dist[u] + weight < dist[v])
        {
            printf("Graph contains negative weight cycle\n");
            return; // distances are not meaningful in this case
        }
    }

printArr(dist, V);

return;
}

// Driver program to test above functions


int main()
{
/* Let us create the graph given in above example */
int V = 5; // Number of vertices in graph
int E = 8; // Number of edges in graph
struct Graph* graph = createGraph(V, E);

// add edge 0-1 (or A-B in above figure)


graph->edge[0].src = 0;
graph->edge[0].dest = 1;
graph->edge[0].weight = -1;

// add edge 0-2 (or A-C in above figure)


graph->edge[1].src = 0;
graph->edge[1].dest = 2;
graph->edge[1].weight = 4;

// add edge 1-2 (or B-C in above figure)


graph->edge[2].src = 1;
graph->edge[2].dest = 2;
graph->edge[2].weight = 3;

// add edge 1-3 (or B-D in above figure)


graph->edge[3].src = 1;
graph->edge[3].dest = 3;
graph->edge[3].weight = 2;

// add edge 1-4 (or B-E in above figure)


graph->edge[4].src = 1;
graph->edge[4].dest = 4;
graph->edge[4].weight = 2;

// add edge 3-2 (or D-C in above figure)


graph->edge[5].src = 3;
graph->edge[5].dest = 2;
graph->edge[5].weight = 5;

// add edge 3-1 (or D-B in above figure)


graph->edge[6].src = 3;
graph->edge[6].dest = 1;
graph->edge[6].weight = 1;

// add edge 4-3 (or E-D in above figure)


graph->edge[7].src = 4;
graph->edge[7].dest = 3;
graph->edge[7].weight = -3;

BellmanFord(graph, 0);

return 0;
}

Output:

Vertex Distance from Source


0 0
1 -1
2 2
3 -2
4 1

Notes
1) Negative weights are found in various applications of graphs. For example, instead of paying a
cost for a path, we may get some advantage if we follow the path.

2) Bellman-Ford works better than Dijkstra's for distributed systems. Unlike Dijkstra's, where we
need to find the minimum value over all vertices, in Bellman-Ford, edges are considered one
by one.

Exercise
1) The standard Bellman-Ford algorithm reports the shortest path only if there is no negative weight
cycle. Modify it so that it reports minimum distances even if there is a negative weight cycle.

2) Can we use Dijkstra's algorithm for shortest paths for graphs with negative weights? One idea
can be: calculate the minimum weight value, add a positive value (equal to the absolute value of the
minimum weight value) to all weights, and run Dijkstra's algorithm on the modified graph.
Will this algorithm work?

Heavy-light Decomposition
blog.anudeep2011.com

Why a Balanced Binary Tree is good?

Balanced Binary Tree

A balanced binary tree with N nodes has a height of log N. This gives us the following
properties:

You need to visit at most log N nodes to reach root node from any other node
You need to visit at most 2 * log N nodes to reach from any node to any other node in the tree

The log factor is always good in the world of competitive programming

Now, if a balanced binary tree with N nodes is given, then many queries can be done with O( log
N ) complexity. Distance of a path, Maximum/Minimum in a path, Maximum contiguous sum
etc etc.

Why a Chain is good?

A chain is a set of nodes connected one after another. It can be viewed as a simple array of
nodes/numbers. We can do many operations on an array of elements with O( log N )
complexity using segment tree / BIT / other data structures. You can read more about segment
trees here: A tutorial by Utkarsh.

Now, we know that Balanced Binary Trees and arrays are good for computation. We can do a lot
of operations with O( log N ) complexity on both the data structures.

Why an Unbalanced Tree is bad?

The height of an unbalanced tree can be arbitrary. In the worst case, we have to visit O( N ) nodes to
move from one node to another node. So unbalanced trees are not computation friendly. We
shall see how we can deal with unbalanced trees.

Consider this example..

Consider the following question: Given A and B, calculate the sum of all node values on the
path from A to B.

Here are details about the given images

1. The tree in the image has N nodes.


2. We need to visit N/3 nodes to travel from A to D.
3. We need to visit >N/3 nodes to travel from B to D.
4. We need to visit >N/2 nodes to travel from C to D.

It is clear that moving from one node to another can be up to O( N ) complexity.

This is important: What if we broke the tree into 3 chains as shown in the image below? Then we
consider each chain as an independent problem. We are dealing with chains, so we can use
Segment Trees/other data structures that work well on linear lists of data.

Here are the details after the trick

1. The tree still has N nodes, but it is DECOMPOSED into 3 chains each of size N/3. See 3 different
colors, each one is a chain.

2. A and D belong to the same chain. We can get the required sum of node values of path
from A to D in O( log N ) time using segment tree data structure.
3. B belongs to yellow chain, and D belongs to blue chain. Path from B to D can be broken as
B to T3 and T4 to D. Now we are dealing with 2 cases which are similar to the above case. We
can calculate required sum in O( log N ) time for B to T3 and O( log N ) time for T4 to D. Great,
we reduced this to O( log N ).
4. C belongs to red chain, and D belongs to blue chain. Path from C to D can be broken
as C to T1, T2 to T3 and T4 to D. Again we are dealing with 3 cases similar to 2nd case. So we can
again do it in O( log N ).

Awesome!! We used concepts of Decomposition and Segment Tree DS, reduced the query
complexity from O( N ) to O( log N ). As I said before, competitive programmers always love
the log factors

But wait, the tree in the example is special: only 2 nodes had degree greater than 2. We did a
simple decomposition and achieved better complexity, but in a general tree we need to do
something a little more complex to get better complexity. And that little more complex decomposition
is called Heavy-Light Decomposition.

Basic Idea

We will divide the tree into vertex-disjoint chains (meaning no two chains have a node in
common) in such a way that to move from any node in the tree to the root node, we will have to
change at most log N chains. To put it in other words, the path from any node to the root can be
broken into pieces such that each piece belongs to only one chain; then we will have no more
than log N pieces.

Let us assume that the above is done. So what? Now the path from any node A to any
node B can be broken into two paths: A to LCA( A, B ) and B to LCA( A, B ). So at this point
we need to only worry about paths of the following format: start at some node and go up the tree,
because A to LCA( A, B ) and B to LCA( A, B ) are both such paths.

What are we up to till now?

We assumed that we can break the tree into chains such that we will have to change at most log
N chains to move from any node up the tree to any other node.
Any path can be broken into two paths such that both paths start at some node and move up the
tree.
We already know that queries in each chain can be answered with O( log N ) complexity and
there are at most log N chains we need to consider per path. So on the whole we have an O( log^2
N ) complexity solution. Great!!

Till now I have explained how HLD can be used to reduce complexity. Now we shall see details
about how HLD actually decomposes the tree.

Note: My version of HLD is a little different from the standard one, but everything said
above still holds.

Terminology

Common tree related terminology can be found here.

Special Child : Among all child nodes of a node, the one with maximum sub-tree size is
considered as Special child. Each non leaf node has exactly one Special child.

Special Edge : For each non-leaf node, the edge connecting the node with its Special child is
considered as Special Edge.

Read the next 3 paras until you clearly understand every line of it, every line makes sense
(Hope!). Read it 2 times 3 times 10 times 2 power 10 times .. , until you understand!!

What happens if you go to each node, find the special child and special edge, and mark all special
edges green while the other edges stay black? Well, what happens is HLD. What would
the graph look like then? Colorful, yes. Not just colorful. The green edges actually form vertex-disjoint
chains and the black edges are the connectors between chains. Let us explore one chain:
start at the root, move to the special child of the root (there is only one special child, so easy pick), then
to its special child, and so on until you reach a leaf node; what you just traveled is a chain which
starts at the root node. Let us assume that the root node has m child nodes. Note that all m-1 normal
child nodes are starting nodes of some other chains.

What happens if you move from a node to a normal child node of it? This is the most important
part. When you move from a node to any of its normal children, the sub-tree size at least halves.
Consider a node X whose sub-tree size is s and which has m child nodes. If m=1, then the only
child is the special child (so there is no case of moving to a normal child). For m>=2, the sub-tree size of
any normal child is <= s/2. To prove that, let us talk about the sub-tree size of the special child. What
is the least sub-tree size possible for the special child? The answer is ceil( s/m ). To prove it, assume
it is less than ceil( s/m ). As this child is the one with maximum sub-tree size, all other normal child
nodes are at most as large as the special child, and m child nodes each smaller than ceil( s/m ) cannot
sum up to s; so we have a contradiction, and the minimum sub-tree size possible for the special child
is ceil( s/m ). This being said, any normal child is at most as large as the special child, and the two
together contain fewer than s nodes, so any normal child has at most s/2 nodes. So whenever you move
from a node to a normal child, the sub-tree size at least halves.

We stated earlier that to move from the root to any node (or vice versa) we need to change at most log
N chains. Here is the proof: changing a chain means we are moving from a node to a normal child,
so each time we change chains we at least halve the sub-tree size. For a tree of size N, we
can halve it at most log N times (why? Take a number and keep halving; let me know if it
takes more than ceil( log N ) steps).

At this point, we know what HLD is, we know why one has to use HLD, basic idea of HLD,
terminology and proof. We shall now see implementation details of HLD and few related
problems.

Implementation

Algorithm

HLD(curNode, Chain):
    Add curNode to curChain
    If curNode is LeafNode: return              // Nothing left to do
    sc := child node with maximum sub-tree size // sc is the special child
    HLD(sc, Chain)                              // Extend current chain to special child
    for each child node cn of curNode:          // For normal children
        if cn != sc: HLD(cn, newChain)          // As told above, for each normal child a new chain starts

The above algorithm correctly does HLD. But we will need a bit more information when
solving HLD related problems. We should be able to answer the following questions:

1. Given a node, to which chain does that node belong?
2. Given a node, what is the position of that node in its chain?
3. Given a chain, what is the head of the chain?
4. Given a chain, what is the length of the chain?

So let us see a C++ implementation which covers all of the above.

 1  int chainNo=0, chainHead[N], chainPos[N], chainInd[N], chainSize[N];
 2  void hld(int cur) {
 3      if(chainHead[chainNo] == -1) chainHead[chainNo] = cur;
 4      chainInd[cur] = chainNo;
 5      chainPos[cur] = chainSize[chainNo];
 6      chainSize[chainNo]++;
 7
 8      int ind = -1, mai = -1;
 9      for(int i = 0; i < adj[cur].sz; i++) {
10          if(subsize[ adj[cur][i] ] > mai) {
11              mai = subsize[ adj[cur][i] ];
12              ind = i;
13          }
14      }
15
16      if(ind >= 0) hld( adj[cur][ind] );
17
18      for(int i = 0; i < adj[cur].sz; i++) {
19          if(i != ind) {
20              chainNo++;
21              hld( adj[cur][i] );
22          }
23      }
24  }

Initially all entries of chainHead[] are set to -1. So in line 3, whenever a new chain is started,
the chain head is correctly assigned. As we add a new node to a chain, we note its position in the
chain and increment the chain length. In the first for loop (lines 9-14) we find the child node
which has maximum sub-tree size. The following if condition (line 16) fails for leaf nodes.
When the if condition passes, we extend the chain to the special child. In the second for loop (lines
18-23) we recursively call the function on all normal children. chainNo++ ensures that we are
creating a new chain for each normal child.

Example

Problem : SPOJ QTREE


Solution : Each edge has a number associated with it. Given 2 nodes A and B, we need to find
the edge on the path from A to B with maximum value. Clearly we can break the path
into A to LCA( A, B ) and B to LCA( A, B ), calculate the answer for each of them and take the
maximum of both. As mentioned above, as the tree need not be balanced, it may take up to O( N )
to travel from A to LCA( A, B ) and find the maximum. Let us use HLD as detailed above to
solve the problem.

Solution Link : Github Qtree.cpp (well commented solution)


I will not explain all the functions of the solution. I will explain how query works in detail

main() : Scans the tree, calls all required functions in order.


dfs() : Helper function. Sets up depth, subsize, parent of each node.
LCA() : Returns the Lowest Common Ancestor of two nodes
make_tree() : Segment tree construction
update_tree() : Segment tree update. Point update
query_tree() : Segment tree query. Range query
HLD() : Does HL-Decomposition, similar to the one explained above
change() : Performs the change operation as given in the problem statement

query() : We shall look at the query function in detail.

int query(int u, int v) {
    int lca = LCA(u, v);
    return max( query_up(u, lca), query_up(v, lca) );
}

We calculate LCA(u, v), then call the query_up function twice: once for the path from u to lca
and again for the path from v to lca. We take the maximum of the two as the answer.

query_up() : This is important. This function takes 2 nodes u, v such that v is an ancestor of u,
which means the path from u to v goes up the tree. We shall see how it works.

int query_up(int u, int v) {
    int uchain, vchain = chainInd[v], ans = -1;

    while(1) {
        uchain = chainInd[u];   // chain of the current position of u
        if(uchain == vchain) {
            int cur = query_tree(1, 0, ptr, posInBase[v]+1, posInBase[u]+1);
            if( cur > ans ) ans = cur;
            break;
        }
        int cur = query_tree(1, 0, ptr, posInBase[chainHead[uchain]], posInBase[u]+1);
        if( cur > ans ) ans = cur;
        u = chainHead[uchain];
        u = parent(u);
    }
    return ans;
}

uchain and vchain are the chain numbers in which u and v are present respectively. We have a
while loop which goes on until we move up from u to v. There are 2 cases. The first case is when
both u and v belong to the same chain; here we can query the range between u and v and stop,
because we have reached v.
The second case is when u and v are in different chains. Clearly v is higher up the tree than u, so
we need to move completely up u's chain and continue from the chain above it. We query from
chainHead[uchain] to u and update the answer. Now we need to change chains: the next node
after the current chain is the parent of chainHead[uchain].

Convex Hull
geeksforgeeks.com

Convex Hull | Set 1 (Jarvis's Algorithm or Wrapping)

Given a set of points in the plane, the convex hull of the set is the smallest convex polygon that
contains all of its points.

We strongly recommend to see the following post first.


How to check if two given line segments intersect?

The idea of Jarvis's Algorithm is simple: we start from the leftmost point (the point with
minimum x coordinate value) and keep wrapping points in counterclockwise direction. The
big question is, given a point p as the current point, how to find the next point in the output? The
idea is to use orientation() here. The next point is selected as the point that beats all other points
in counterclockwise orientation, i.e., the next point is q if, for any other point r, the triplet
(p, q, r) is counterclockwise. Following is the detailed algorithm.

1) Initialize p as leftmost point.


2) Do following while we don't come back to the first (or leftmost) point.
..a) The next point q is the point such that the triplet (p, q, r) is counterclockwise for any other
point r.
..b) next[p] = q (Store q as next of p in the output convex hull).
..c) p = q (Set p as q for next iteration).

// A C++ program to find convex hull of a set of points
// Refer http://www.geeksforgeeks.org/check-if-two-given-line-segments-intersect/
// for explanation of orientation()
#include <iostream>
using namespace std;

// Define Infinite (Using INT_MAX caused overflow problems)
#define INF 10000

struct Point
{
    int x;
    int y;
};

// To find orientation of ordered triplet (p, q, r).
// The function returns following values
// 0 --> p, q and r are colinear
// 1 --> Clockwise
// 2 --> Counterclockwise
int orientation(Point p, Point q, Point r)
{
    int val = (q.y - p.y) * (r.x - q.x) -
              (q.x - p.x) * (r.y - q.y);

    if (val == 0) return 0;  // colinear
    return (val > 0)? 1: 2;  // clock or counterclock wise
}

// Prints convex hull of a set of n points.
void convexHull(Point points[], int n)
{
    // There must be at least 3 points
    if (n < 3) return;

    // Initialize Result
    int next[n];
    for (int i = 0; i < n; i++)
        next[i] = -1;

    // Find the leftmost point
    int l = 0;
    for (int i = 1; i < n; i++)
        if (points[i].x < points[l].x)
            l = i;

    // Start from leftmost point, keep moving counterclockwise
    // until we reach the start point again
    int p = l, q;
    do
    {
        // Search for a point 'q' such that orientation(p, i, q) is
        // counterclockwise for all points 'i'
        q = (p+1)%n;
        for (int i = 0; i < n; i++)
            if (orientation(points[p], points[i], points[q]) == 2)
                q = i;

        next[p] = q;  // Add q to result as a next point of p
        p = q;        // Set p as q for next iteration
    } while (p != l);

    // Print Result
    for (int i = 0; i < n; i++)
        if (next[i] != -1)
            cout << "(" << points[i].x << ", " << points[i].y << ")\n";
}

// Driver program to test above functions
int main()
{
    Point points[] = {{0, 3}, {2, 2}, {1, 1}, {2, 1},
                      {3, 0}, {0, 0}, {3, 3}};
    int n = sizeof(points)/sizeof(points[0]);
    convexHull(points, n);
    return 0;
}

Output: The output is points of the convex hull.

(0, 3)
(3, 0)
(0, 0)
(3, 3)

Time Complexity: For every point on the hull we examine all the other points to determine the
next point. The time complexity is O(m*n), where n is the number of input points and m is the
number of output or hull points (m <= n). In the worst case, the time complexity is O(n^2). The
worst case occurs when all the points are on the hull (m = n).

Convex Hull | Set 2 (Graham Scan)

Given a set of points in the plane, the convex hull of the set is the smallest convex polygon that
contains all of its points.

We strongly recommend to see the following post first.


How to check if two given line segments intersect?

We have discussed Jarvis's Algorithm for Convex Hull. The worst case time complexity of
Jarvis's Algorithm is O(n^2). Using Graham's scan algorithm, we can find the Convex Hull in
O(nLogn) time. Following is Graham's algorithm.

Let points[0..n-1] be the input array.



1) Find the bottom-most point by comparing y coordinate of all points. If there are two points
with same y value, then the point with smaller x coordinate value is considered. Put the bottom-
most point at first position.

2) Consider the remaining n-1 points and sort them by polar angle in counterclockwise order
around points[0]. If the polar angle of two points is the same, then put the nearest point first.

3) Create an empty stack S and push points[0], points[1] and points[2] to S.

4) Process remaining n-3 points one by one. Do following for every point points[i]
4.1) Keep removing points from the stack while the orientation of the following 3 points is not
counterclockwise (i.e., they don't make a left turn).
a) Point next to top in stack
b) Point at the top of stack
c) points[i]
4.2) Push points[i] to S

5) Print contents of S

The above algorithm can be divided in two phases.

Phase 1 (Sort points): We first find the bottom-most point. The idea is to pre-process the points
by sorting them with respect to the bottom-most point. Once the points are sorted, they form a
simple closed path (see following diagram).

What should be the sorting criteria? Computation of actual angles would be inefficient since
trigonometric functions are not simple to evaluate. The idea is to use the orientation to compare
angles without actually computing them (see the compare() function below).

Phase 2 (Accept or Reject Points): Once we have the closed path, the next step is to traverse
the path and remove concave points on this path. How to decide which point to remove and
which to keep? Again, orientation helps here. The first two points in sorted array are always part
of Convex Hull. For remaining points, we keep track of recent three points, and find the angle
formed by them. Let the three points be prev(p), curr(c) and next(n). If orientation of these points
(considering them in same order) is not counterclockwise, we discard c, otherwise we keep it.

Following diagram shows step by step process of this phase (Source of these diagrams is Ref 2).

Following is C++ implementation of the above algorithm.

// A C++ program to find convex hull of a set of points


// Refer http://www.geeksforgeeks.org/check-if-two-given-line-segments-
intersect/
// for explanation of orientation()
#include <iostream>
#include <stack>
#include <stdlib.h>
using namespace std;

struct Point
{
int x;
int y;
};

// A global point needed for sorting points with reference to the first point
// Used in compare function of qsort()
Point p0;

// A utility function to find next to top in a stack


Point nextToTop(stack<Point> &S)
{
Point p = S.top();
S.pop();
Point res = S.top();
S.push(p);
return res;
}

// A utility function to swap two points


void swap(Point &p1, Point &p2)
{
Point temp = p1;
p1 = p2;
p2 = temp;
}

// A utility function to return square of distance between p1 and p2


int dist(Point p1, Point p2)
{
return (p1.x - p2.x)*(p1.x - p2.x) + (p1.y - p2.y)*(p1.y - p2.y);
}

// To find orientation of ordered triplet (p, q, r).


// The function returns following values
// 0 --> p, q and r are colinear
// 1 --> Clockwise
// 2 --> Counterclockwise
int orientation(Point p, Point q, Point r)
{
int val = (q.y - p.y) * (r.x - q.x) -
(q.x - p.x) * (r.y - q.y);

if (val == 0) return 0; // colinear


return (val > 0)? 1: 2; // clock or counterclock wise
}

// A function used by library function qsort() to sort an array of


// points with respect to the first point
int compare(const void *vp1, const void *vp2)
{
Point *p1 = (Point *)vp1;
Point *p2 = (Point *)vp2;

// Find orientation
int o = orientation(p0, *p1, *p2);
if (o == 0)
return (dist(p0, *p2) >= dist(p0, *p1))? -1 : 1;

return (o == 2)? -1: 1;


}

// Prints convex hull of a set of n points.


void convexHull(Point points[], int n)
{
// Find the bottommost point
int ymin = points[0].y, min = 0;
for (int i = 1; i < n; i++)
{
int y = points[i].y;

// Pick the bottom-most or chose the left most point in case of tie
if ((y < ymin) || (ymin == y && points[i].x < points[min].x))
ymin = points[i].y, min = i;


}

// Place the bottom-most point at first position


swap(points[0], points[min]);

// Sort n-1 points with respect to the first point. A point p1 comes
// before p2 in sorted output if p2 has larger polar angle (in
// counterclockwise direction) than p1
p0 = points[0];
qsort(&points[1], n-1, sizeof(Point), compare);

// Create an empty stack and push first three points to it.


stack<Point> S;
S.push(points[0]);
S.push(points[1]);
S.push(points[2]);

// Process remaining n-3 points


for (int i = 3; i < n; i++)
{
// Keep removing top while the angle formed by points next-to-top,
// top, and points[i] makes a non-left turn
while (orientation(nextToTop(S), S.top(), points[i]) != 2)
S.pop();
S.push(points[i]);
}

// Now stack has the output points, print contents of stack


while (!S.empty())
{
Point p = S.top();
cout << "(" << p.x << ", " << p.y <<")" << endl;
S.pop();
}
}

// Driver program to test above functions


int main()
{
Point points[] = {{0, 3}, {1, 1}, {2, 2}, {4, 4},
{0, 0}, {1, 2}, {3, 1}, {3, 3}};
int n = sizeof(points)/sizeof(points[0]);
convexHull(points, n);
return 0;
}

Output:

(0, 3)
(4, 4)
(3, 1)
(0, 0)

Time Complexity: Let n be the number of input points. The algorithm takes O(nLogn) time if
we use an O(nLogn) sorting algorithm.
The first step (finding the bottom-most point) takes O(n) time. The second step (sorting points)
takes O(nLogn) time. In the third step, every element is pushed and popped at most once, so
processing the points one by one takes O(n) time, assuming that the stack operations take
O(1) time. The overall complexity is O(n) + O(nLogn) + O(n), which is O(nLogn).

Line Intersection
geeksforgeeks.org

Given two line segments (p1, q1) and (p2, q2), find if the given line segments intersect with each
other.

Before we discuss the solution, let us define the notion of orientation. The orientation of an
ordered triplet of points in the plane can be counterclockwise, clockwise, or colinear. The
following diagram shows different possible orientations of (a, b, c).

Note the word ordered here. The orientation of (a, b, c) may be different from the orientation of
(c, b, a).

How is Orientation useful here?


Two segments (p1, q1) and (p2, q2) intersect if and only if one of the following two conditions is
satisfied:

1. General Case:
- (p1, q1, p2) and (p1, q1, q2) have different orientations and
- (p2, q2, p1) and (p2, q2, q1) have different orientations

2. Special Case
- (p1, q1, p2), (p1, q1, q2), (p2, q2, p1), and (p2, q2, q1) are all collinear and
- the x-projections of (p1, q1) and (p2, q2) intersect
- the y-projections of (p1, q1) and (p2, q2) intersect

Examples of General Case:

Examples of Special Case:

Following is C++ implementation based on above idea.

// A C++ program to check if two given line segments intersect


#include <iostream>
using namespace std;

struct Point
{
int x;
int y;
};

// Given three colinear points p, q, r, the function checks if


// point q lies on line segment 'pr'
bool onSegment(Point p, Point q, Point r)
{
if (q.x <= max(p.x, r.x) && q.x >= min(p.x, r.x) &&
q.y <= max(p.y, r.y) && q.y >= min(p.y, r.y))
return true;

return false;
}

// To find orientation of ordered triplet (p, q, r).


// The function returns following values
// 0 --> p, q and r are colinear
// 1 --> Clockwise
// 2 --> Counterclockwise
int orientation(Point p, Point q, Point r)
{
// See 10th slides from following link for derivation of the formula
// http://www.dcs.gla.ac.uk/~pat/52233/slides/Geometry1x1.pdf
int val = (q.y - p.y) * (r.x - q.x) -
(q.x - p.x) * (r.y - q.y);

if (val == 0) return 0; // colinear

return (val > 0)? 1: 2; // clock or counterclock wise


}

// The main function that returns true if line segment 'p1q1'


// and 'p2q2' intersect.
bool doIntersect(Point p1, Point q1, Point p2, Point q2)
{
// Find the four orientations needed for general and
// special cases
int o1 = orientation(p1, q1, p2);
int o2 = orientation(p1, q1, q2);
int o3 = orientation(p2, q2, p1);
int o4 = orientation(p2, q2, q1);

// General case
if (o1 != o2 && o3 != o4)
return true;

// Special Cases
// p1, q1 and p2 are colinear and p2 lies on segment p1q1
if (o1 == 0 && onSegment(p1, p2, q1)) return true;

// p1, q1 and p2 are colinear and q2 lies on segment p1q1


if (o2 == 0 && onSegment(p1, q2, q1)) return true;

// p2, q2 and p1 are colinear and p1 lies on segment p2q2


if (o3 == 0 && onSegment(p2, p1, q2)) return true;

// p2, q2 and q1 are colinear and q1 lies on segment p2q2


if (o4 == 0 && onSegment(p2, q1, q2)) return true;

return false; // Doesn't fall in any of the above cases


}

// Driver program to test above functions



int main()
{
struct Point p1 = {1, 1}, q1 = {10, 1};
struct Point p2 = {1, 2}, q2 = {10, 2};

doIntersect(p1, q1, p2, q2)? cout << "Yes\n": cout << "No\n";

p1 = {10, 0}, q1 = {0, 10};


p2 = {0, 0}, q2 = {10, 10};
doIntersect(p1, q1, p2, q2)? cout << "Yes\n": cout << "No\n";

p1 = {-5, -5}, q1 = {0, 0};


p2 = {1, 1}, q2 = {10, 10};
doIntersect(p1, q1, p2, q2)? cout << "Yes\n": cout << "No\n";

return 0;
}

Output:

No
Yes
No

Sieve of Eratosthenes
geeksforgeeks.org

Given a number n, print all primes smaller than or equal to n. It is also given that n is a small
number.
For example, if n is 10, the output should be 2, 3, 5, 7. If n is 20, the output should be 2, 3, 5,
7, 11, 13, 17, 19.

The sieve of Eratosthenes is one of the most efficient ways to find all primes smaller than n when
n is smaller than 10 million or so (Ref Wiki).

Following is the algorithm to find all the prime numbers less than or equal to a given integer n by
Eratosthenes method:

1. Create a list of consecutive integers from 2 to n: (2, 3, 4, ..., n).


2. Initially, let p equal 2, the first prime number.
3. Starting from p, count up in increments of p and mark each of these numbers greater
than p itself in the list. These numbers will be 2p, 3p, 4p, etc.; note that some of them
may have already been marked.
4. Find the first number greater than p in the list that is not marked. If there was no such
number, stop. Otherwise, let p now equal this number (which is the next prime), and
repeat from step 3.

When the algorithm terminates, all the numbers in the list that are not marked are prime.

Following is C++ implementation of the above algorithm. In the following implementation, a


boolean array arr[] of size n is used to mark multiples of prime numbers.

#include <stdio.h>
#include <string.h>

// marks all multiples of 'a' (greater than 'a' but less than or equal to 'n') as 1.
void markMultiples(bool arr[], int a, int n)
{
int i = 2, num;
while ( (num = i*a) <= n )
{
arr[ num-1 ] = 1; // minus 1 because index starts from 0.
++i;
}
}

// A function to print all prime numbers smaller than n


void SieveOfEratosthenes(int n)
{
// There are no prime numbers smaller than 2
if (n >= 2)
{
// Create an array of size n and initialize all elements as 0
bool arr[n];
memset(arr, 0, sizeof(arr));

/* Following property is maintained in the below for loop


arr[i] == 0 means i + 1 is prime
arr[i] == 1 means i + 1 is not prime */
for (int i=1; i<n; ++i)
{
if ( arr[i] == 0 )
{
//(i+1) is prime, print it and mark its multiples
printf("%d ", i+1);
markMultiples(arr, i+1, n);
}
}
}
}

// Driver Program to test above function


int main()
{
int n = 30;
printf("Following are the prime numbers below %d\n", n);
SieveOfEratosthenes(n);
return 0;
}

Output:

Following are the prime numbers below 30


2 3 5 7 11 13 17 19 23 29

Interval Tree
geeksforgeeks.org

Consider a situation where we have a set of intervals and we need the following operations to be
implemented efficiently.
1) Add an interval
2) Remove an interval
3) Given an interval x, find if x overlaps with any of the existing intervals.

Interval Tree: The idea is to augment a self-balancing Binary Search Tree (BST) like Red Black
Tree, AVL Tree, etc to maintain set of intervals so that all operations can be done in O(Logn)
time.

Every node of an Interval Tree stores the following information.


a) i: An interval which is represented as a pair [low, high]
b) max: Maximum high value in subtree rooted with this node.

The low value of an interval is used as the key to maintain order in the BST. The insert and delete
operations are the same as insert and delete in the self-balancing BST used.

The main operation is to search for an overlapping interval. Following is the algorithm for
searching an overlapping interval x in an Interval tree rooted with root.

Interval overlappingIntervalSearch(root, x)
1) If x overlaps with root's interval, return the root's interval.

2) If left child of root is not empty and the max in left child
is greater than x's low value, recur for left child

3) Else recur for right child.

How does the above algorithm work?


Let the interval to be searched be x. We need to prove this for the following two cases.

Case 1: When we go to right subtree, one of the following must be true.


a) There is an overlap in right subtree: This is fine as we need to return one overlapping interval.
b) There is no overlap in either subtree: We go to right subtree only when either left is NULL or
maximum value in left is smaller than x.low. So the interval cannot be present in left subtree.

Case 2: When we go to left subtree, one of the following must be true.


a) There is an overlap in left subtree: This is fine as we need to return one overlapping interval.
b) There is no overlap in either subtree: This is the most important part. We need to consider the
following facts.
. We went to left subtree because x.low <= max in left subtree.
. max in left subtree is the high of one of the intervals, let us say [a, max], in left subtree.
. Since x doesn't overlap with any node in left subtree, x.low must be smaller than a.
. All nodes in BST are ordered by low value, so all nodes in right subtree must have low value
greater than a.
. From the above facts, we can say all intervals in right subtree have low value greater than
x.low. So x cannot overlap with any interval in right subtree.

Implementation of Interval Tree:


Following is C++ implementation of Interval Tree. The implementation uses basic insert
operation of BST to keep things simple. Ideally it should be insertion of AVL Tree or insertion
of Red-Black Tree. Deletion from BST is left as an exercise.

#include <iostream>
using namespace std;

// Structure to represent an interval


struct Interval
{
int low, high;
};

// Structure to represent a node in Interval Search Tree


struct ITNode
{
Interval *i; // 'i' could also be a normal variable
int max;
ITNode *left, *right;
};

// A utility function to create a new Interval Search Tree Node


ITNode * newNode(Interval i)
{
ITNode *temp = new ITNode;


temp->i = new Interval(i);
temp->max = i.high;
temp->left = temp->right = NULL;
return temp;
};

// A utility function to insert a new Interval Search Tree Node


// This is similar to BST Insert. Here the low value of interval
// is used to maintain the BST property
ITNode *insert(ITNode *root, Interval i)
{
// Base case: Tree is empty, new node becomes root
if (root == NULL)
return newNode(i);

// Get low value of interval at root


int l = root->i->low;

// If root's low value is smaller, then new interval goes to


// left subtree
if (i.low < l)
root->left = insert(root->left, i);

// Else, new node goes to right subtree.


else
root->right = insert(root->right, i);

// Update the max value of this ancestor if needed


if (root->max < i.high)
root->max = i.high;

return root;
}

// A utility function to check if given two intervals overlap


bool doOVerlap(Interval i1, Interval i2)
{
if (i1.low <= i2.high && i2.low <= i1.high)
return true;
return false;
}

// The main function that searches a given interval i in a given


// Interval Tree.
Interval *intervalSearch(ITNode *root, Interval i)
{
// Base Case, tree is empty
if (root == NULL) return NULL;

// If given interval overlaps with root


if (doOVerlap(*(root->i), i))
return root->i;

// If left child of root is present and max of left child is
// greater than or equal to given interval's low value, then i may
// overlap with an interval in left subtree


if (root->left != NULL && root->left->max >= i.low)
return intervalSearch(root->left, i);

// Else interval can only overlap with right subtree


return intervalSearch(root->right, i);
}

void inorder(ITNode *root)


{
if (root == NULL) return;

inorder(root->left);

cout << "[" << root->i->low << ", " << root->i->high << "]"
<< " max = " << root->max << endl;

inorder(root->right);
}

// Driver program to test above functions


int main()
{
// Let us create interval tree shown in above figure
Interval ints[] = {{15, 20}, {10, 30}, {17, 19},
{5, 20}, {12, 15}, {30, 40}
};
int n = sizeof(ints)/sizeof(ints[0]);
ITNode *root = NULL;
for (int i = 0; i < n; i++)
root = insert(root, ints[i]);

cout << "Inorder traversal of constructed Interval Tree is\n";


inorder(root);

Interval x = {6, 7};

cout << "\nSearching for interval [" << x.low << "," << x.high << "]";
Interval *res = intervalSearch(root, x);
if (res == NULL)
cout << "\nNo Overlapping Interval";
else
cout << "\nOverlaps with [" << res->low << ", " << res->high << "]";
}

Output:

Inorder traversal of constructed Interval Tree is


[5, 20] max = 20
[10, 30] max = 30
[12, 15] max = 15
[15, 20] max = 40
[17, 19] max = 40
[30, 40] max = 40

Searching for interval [6,7]


Overlaps with [5, 20]

Applications of Interval Tree:


Interval tree is mainly a geometric data structure and often used for windowing queries, for
instance, to find all roads on a computerized map inside a rectangular viewport, or to find all
visible elements inside a three-dimensional scene (Source Wiki).

Interval Tree vs Segment Tree


Both segment and interval trees store intervals. Segment tree is mainly optimized for queries for
a given point, and interval trees are mainly optimized for overlapping queries for a given
interval.

Exercise:
1) Implement delete operation for interval tree.
2) Extend the intervalSearch() to print all overlapping intervals instead of just one.

Counting Sort
geeksforgeeks.org

Counting sort is a sorting technique based on keys between a specific range. It works by counting
the number of objects having distinct key values (a kind of hashing), then doing some arithmetic
to calculate the position of each object in the output sequence.

Let us understand it with the help of an example.

For simplicity, consider the data in the range 0 to 9.


Input data: 1, 4, 1, 2, 7, 5, 2
1) Take a count array to store the count of each unique object.
Index: 0 1 2 3 4 5 6 7 8 9
Count: 0 2 2 0 1 1 0 1 0 0

2) Modify the count array such that each element at each index
stores the sum of previous counts.
Index: 0 1 2 3 4 5 6 7 8 9
Count: 0 2 4 4 5 6 6 7 7 7

The modified count array indicates the position of each object in


the output sequence.

3) Output each object from the input sequence followed by


decreasing its count by 1.
Process the input data: 1, 4, 1, 2, 7, 5, 2. Position of 1 is 2.
Put data 1 at index 2 in output. Decrease count by 1 to place
next data 1 at an index 1 smaller than this index.

Following is C implementation of counting sort.



// C Program for counting sort


#include <stdio.h>
#include <string.h>
#define RANGE 255

// The main function that sorts the given string str in alphabetical order
void countSort(char *str)
{
// The output character array that will have sorted str
char output[strlen(str)];

// Create a count array to store count of individual characters and
// initialize count array as 0
int count[RANGE + 1], i;
memset(count, 0, sizeof(count));

// Store count of each character


for(i = 0; str[i]; ++i)
++count[str[i]];

// Change count[i] so that count[i] now contains actual position of


// this character in output array
for (i = 1; i <= RANGE; ++i)
count[i] += count[i-1];

// Build the output character array


for (i = 0; str[i]; ++i)
{
output[count[str[i]]-1] = str[i];
--count[str[i]];
}

// Copy the output array to str, so that str now


// contains sorted characters
for (i = 0; str[i]; ++i)
str[i] = output[i];
}

// Driver program to test above function


int main()
{
char str[] = "geeksforgeeks";//"applepp";

countSort(str);

printf("Sorted string is %s\n", str);


return 0;
}

Output:

Sorted string is eeeefggkkorss



Time Complexity: O(n+k) where n is the number of elements in input array and k is the range of
input.
Auxiliary Space: O(n+k)

Points to be noted:
1. Counting sort is efficient if the range of input data is not significantly greater than the number
of objects to be sorted. Consider the situation where the input sequence is in the range 1 to
10K and the data is 10, 5, 10K, 5K.
2. It is not a comparison based sorting. Its running time complexity is O(n) with space
proportional to the range of data.
3. It is often used as a sub-routine to another sorting algorithm like radix sort.
4. Counting sort uses a partial hashing to count the occurrence of the data object in O(1).
5. Counting sort can be extended to work for negative inputs also.

Exercise:
1. Modify above code to sort the input data in the range from M to N.
2. Modify above code to sort negative input data.
3. Is counting sort stable and online?
4. Thoughts on parallelizing the counting sort algorithm.

Probabilities
topcoder.com

By supernova
topcoder member

It has been said that life is a school of probability. A major effect of probability theory on
everyday life is in risk assessment. Let's suppose you have an exam and you are not so well
prepared. There are 20 possible subjects, but you only had time to prepare for 15. If two subjects
are given, what chances do you have to be familiar with both? This is an example of a simple
question inspired by the world in which we live today. Life is a very complex chain of events and
almost everything can be imagined in terms of probabilities.

Gambling has become part of our lives and it is an area in which probability theory is obviously
involved. Although gambling had existed since time immemorial, it was not until the seventeenth
century that the mathematical foundations finally became established. It all started with a simple
question directed to Blaise Pascal by Chevalier de Mr, a nobleman that gambled
frequently to increase his wealth. The question was whether a double six could be obtained on
twenty-four rolls of two dice.

As far as TopCoder problems are concerned, they're inspired by reality. You are presented with
many situations, and you are explained the rules of many games. While it's easy to recognize a
problem that deals with probability computations, the solution may not be obvious at all. This is
partly because probabilities are often overlooked for not being a common theme in programming
challenges. But it is not true and TopCoder has plenty of them! Knowing how to approach such
problems is a big advantage in TopCoder competitions and this article is to help you prepare for
this topic.

Before applying the necessary algorithms to solve these problems, you first need some
mathematical understanding. The next chapter presents the basic principles of probability. If you
already have some experience in this area, you might want to skip this part and go to the
following chapter: Step by Step Probability Computation. After that it follows a short discussion
on Randomized Algorithms and in the end there is a list with the available problems on
TopCoder. This last part is probably the most important. Practice is the key!

Basics

Working with probabilities is much like conducting an experiment. An outcome is the result of
an experiment or other situation involving uncertainty. The set of all possible outcomes of a
probability experiment is called a sample space. Each possible result of such a study is
represented by one and only one point in the sample space, which is usually denoted by S. Let's
consider the following experiments:

Rolling a die once


Sample space S = {1, 2, 3, 4, 5, 6}
Tossing two coins
Sample space S = {(Heads, Heads), (Heads, Tails), (Tails, Heads), (Tails,
Tails)}

We define an event as any collection of outcomes of an experiment. Thus, an event is a subset of


the sample space S. If we denote an event by E, we could say that E ⊆ S. If an event consists of a
single outcome in the sample space, it is called a simple event. Events which consist of more
than one outcome are called compound events.

What we are actually interested in is the probability of a certain event to occur, or P(E). By
definition, P(E) is a real number between 0 and 1, where 0 denotes the impossible event and 1
denotes the certain event (or the whole sample space).

As stated earlier, each possible outcome is represented by exactly one point in the sample space.
This leads us to the following formula:

P(E) = (number of outcomes favorable to E) / (total number of outcomes in S)

That is, the probability of an event to occur is calculated by dividing the number of favorable
outcomes (according to the event E) by the total number of outcomes (according to the sample
space S). In order to represent the relationships among events, you can apply the known
principles of set theory. Consider the experiment of rolling a die once. As we have seen
previously, the sample space is S = {1, 2, 3, 4, 5, 6}. Let's now consider the following events:

Event A = 'score > 3' = {4, 5, 6}
Event B = 'score is odd' = {1, 3, 5}
Event C = 'score is 7' = ∅ (the empty set)
A ∪ B = 'the score is > 3 or odd or both' = {1, 3, 4, 5, 6}
A ∩ B = 'the score is > 3 and odd' = {5}
A' = 'event A does not occur' = {1, 2, 3}

We have:

P(A ∪ B) = 5/6
P(A ∩ B) = 1/6
P(A') = 1 - P(A) = 1 - 1/2 = 1/2
P(C) = 0

The first step when trying to solve a probability problem is to be able to recognize the sample
space. After that, you basically have to determine the number of favorable outcomes. This is the
classical approach, but the way we implement it may vary from problem to problem. Let's take a
look at QuizShow (SRM 223, Div 1 - Easy). The key to solving this problem is to take into
account all the possibilities, which are not too many. After a short analysis, we determine the
sample space to be the following:

S = { (wager 1 is wrong, wager 2 is wrong, you are wrong),


(wager 1 is wrong, wager 2 is wrong, you are right),
(wager 1 is wrong, wager 2 is right, you are wrong),
(wager 1 is wrong, wager 2 is right, you are right),
(wager 1 is right, wager 2 is wrong, you are wrong),
(wager 1 is right, wager 2 is wrong, you are right),
(wager 1 is right, wager 2 is right, you are wrong),
(wager 1 is right, wager 2 is right, you are right) }

The problem asks you to find a wager that maximizes the number of favorable outcomes. In
order to compute the number of favorable outcomes for a certain wager, we need to determine
how many points the three players end with for each of the 8 possible outcomes. The idea is
illustrated in the following program:

int wager (vector<int> scores, int wager1, int wager2)
{
  int best, bet, odds, wage, I, J, K;
  best = 0; bet = 0;

  for (wage = 0; wage <= scores[0]; wage++)
  {
    odds = 0;
    // in 'odds' we keep the number of favorable outcomes
    for (I = -1; I <= 1; I = I + 2)
      for (J = -1; J <= 1; J = J + 2)
        for (K = -1; K <= 1; K = K + 2)
          if (scores[0] + I * wage > scores[1] + J * wager1 &&
              scores[0] + I * wage > scores[2] + K * wager2) { odds++; }

    if (odds > best) { bet = wage; best = odds; }
    // a better wager has been found
  }
  return bet;
}

Another good problem to start with is PipeCuts (SRM 233, Div 1 - Easy). This can be solved in a
similar manner. There is a finite number of outcomes and all you need to do is to consider them
one by one.

Let's now consider a series of n independent events: E1, E2, ... , En. Two surprisingly common
questions that may appear (and many of you have already encountered) are the following:

1. What is the probability that all events will occur?


2. What is the probability that at least one event will occur?

To answer the first question, we relate to the occurrence of the first event (call it E1). If E1 does
not occur, the hypothesis can no longer be fulfilled. Thus, it must be inferred that E1 occurs with
a probability of P(E1). This means there is a P(E1) chance we need to check for the occurrence
of the next event (call it E2). The event E2 occurs with a probability of P(E2) and we can
continue this process in the same manner. Because probability is by definition a real number
between 0 and 1, we can synthesize the probability that all events will occur in the following
formula:

    P(E1 ∩ E2 ∩ ... ∩ En) = P(E1) · P(E2) · ... · P(En)

The best way to answer the second question is to first determine the probability that no event will
occur and then take the complement. We have:

    P(at least one Ei occurs) = 1 - (1 - P(E1)) · (1 - P(E2)) · ... · (1 - P(En))

These formulae are very useful and you should try to understand them well before you move on.
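As a quick numerical sanity check of both formulas, here is a small sketch (the probabilities are arbitrary values chosen for illustration):

```python
# Sketch: checking the two formulas for independent events numerically.
import math

probs = [0.5, 0.25, 0.8]   # illustrative values for P(E1), P(E2), P(E3)

p_all = math.prod(probs)                              # P(all events occur)
p_at_least_one = 1 - math.prod(1 - p for p in probs)  # complement of "none occurs"

print(p_all)            # ≈ 0.1
print(p_at_least_one)   # ≈ 0.925
```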

BirthdayOdds
A good example to illustrate the probability concepts discussed earlier is the classical "Birthday
Paradox". It has been shown that if there are at least 23 people in a room, there is a more than
50% chance that at least two of them will share the same birthday. While this is not a paradox in
the real sense of the word, it is a mathematical truth that contradicts common intuition. The
TopCoder problem asks you to find the minimum number of people in order to be more than
minOdds% sure that at least two of them have the same birthday. One of the first things to notice
about this problem is that it is much easier to solve the complementary problem: "What is the
probability that N randomly selected people have all different birthdays?". The strategy is to start
with an empty room and put people in the room one by one, comparing their birthdays with those
of them already in the room:

int minPeople (int minOdds, int days)
{
  int nr;
  double target, p;

  target = 1 - (double) minOdds / 100;

  nr = 1;
  p = 1;

  while (p > target)
  {
    p = p * ( (double) 1 - (double) nr / days);
    nr ++;
  }

  return nr;
}
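A direct Python transcription of the routine (mine, not TopCoder's reference code) reproduces the classical result that 23 people are enough for more than 50% odds with 365 possible birthdays:

```python
# Sketch: Python port of the minPeople routine above.
def min_people(min_odds, days):
    target = 1 - min_odds / 100   # required probability that all birthdays differ
    nr, p = 1, 1.0
    while p > target:
        p *= 1 - nr / days        # one more person, all birthdays still different
        nr += 1
    return nr

print(min_people(50, 365))  # → 23, the classical "Birthday Paradox" answer
```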

This so-called "Birthday Paradox" has many real world applications and one of them is described
in the TopCoder problem called Collision (SRM 153, Div 1 - Medium). The algorithm is
practically the same, but one has to be careful about the events that may alter the sample space.

Sometimes a probability problem can be quite tricky. As we have seen before, the 'Birthday
Paradox' tends to contradict our common sense. But the formulas prove to us that the answer is
indeed correct. Formulas can help, but to become a master of probabilities you need one more
ingredient: "number sense" . This is partly innate ability and partly learned ability acquired
through practice. Take this quiz to assess your number sense and to also become familiar with
some of the common probability misconceptions.

Step by Step Probability Computation

In this chapter we will discuss some real TopCoder problems in which the occurrence of an event
is influenced by occurrences of previous events. We can think of it as a graph in which the nodes
are events and the edges are dependencies between them. This is a somewhat forced analogy, but
the way we compute the probabilities for different events is similar to the way we traverse the
nodes of a graph. We start from the root, which is the initial state and has a probability of 1.
Then, as we consider different scenarios, the probability is distributed accordingly.

NestedRandomness
This problem looked daunting to some people, but for those who figured it out, it was just a
matter of a few lines. For the first step, it is clear what we have to do: the function random(N)
is called and it returns a random integer uniformly distributed in the range 0 to N-1. Thus, every
integer in this interval has a probability of 1/N to occur. If we consider all these outcomes as
input for the next step, we can determine all the outcomes of the random(random(N)) call. To
understand this better, let's work out the case when N = 4.

After the first nesting all integers have the same probability to occur, which is 1/4.
For the second nesting there is a 1/4 chance for each of the following functions to be
called: random(0), random(1), random(2) and random(3). random(0) produces an error,
random(1) returns 0, random(2) returns 0 or 1 (each with a probability of 1/2) and
random(3) returns 0, 1 or 2.
As a result, for the third nesting, random(0) has a probability of 1/4 + 1/8 + 1/12 of
being called, random(1) has a probability of 1/8 + 1/12 of being called and random(2) has
a probability of 1/12 of being called.
Analogously, for the fourth nesting, the function random(0) has a probability of 1/4 of
being called, while random(1) has a probability of 1/24.
As for the fifth nesting, we can only call random(0), which produces an error. The whole
process is illustrated in the figure below.

NestedRandomness for N = 4

The source code for this problem is given below:

double probability (int N, int nestings, int target)
{
  int I, J, K;
  double A[1001], B[1001];
  // A[I] represents the probability of number I to appear

  for (I = 0; I < N; I++) A[I] = (double) 1 / N;

  for (K = 2; K <= nestings; K++)
  {
    for (I = 0; I < N; I++) B[I] = 0;
    // for each I between 0 and N-1 we call the function "random(I)"
    // as described in the problem statement
    for (I = 0; I < N; I++)
      for (J = 0; J < I; J++)
        B[J] += (double) A[I] / I;
    for (I = 0; I < N; I++) A[I] = B[I];
  }
  return A[target];
}
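Porting the routine to Python makes it easy to check the hand computation for N = 4 (the names below are my own):

```python
# Sketch: Python version of the step-by-step probability computation above.
def probability(N, nestings, target):
    A = [1 / N] * N                 # after the first nesting, each value has prob 1/N
    for _ in range(2, nestings + 1):
        B = [0.0] * N
        for i in range(1, N):       # random(i) returns 0..i-1, each with prob A[i]/i
            for j in range(i):
                B[j] += A[i] / i
        A = B
    return A[target]

# For N = 4 and two nestings, value 0 occurs with prob 1/4 + 1/8 + 1/12 = 11/24
print(probability(4, 2, 0))
```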

If you got the taste for this problem, here are another five you may want to try:

ChessKnight - assign each square a probability and for every move check the squares one by one
to compute the probabilities for the next move.
DiceThrows - determine the probability of each possible outcome for both players and then
compare the results.
RockSkipping - the same approach, just make sure you got the lake pattern correctly.
PointSystem - represent the event space as a matrix of possible scores (x, y).
VolleyBall - similar to PointSystem, but the scores may go up pretty high.

Let's now take a look at another TopCoder problem, GeneticCrossover, which deals
with conditional probability. Here, you are asked to predict the quality of an animal, based on
the genes it inherits from its parents. Considering the problem description, there are two
situations that may occur: a gene does not depend on another gene, or a gene is dependent.

For the first case, consider p the probability that the gene is to be expressed dominantly. There
are only 4 cases to consider:

at least one parent has two dominant genes (p = 1)
each parent has exactly one dominant gene (p = 0.75)
one parent has one dominant gene and the other has only recessive genes (p = 0.5)
both parents have two recessive genes (p = 0)
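All four cases follow from one expression used by the detchr code below: if the parents pass a recessive allele with probabilities r1 and r2, the child expresses the dominant trait with probability 1 - r1·r2. A tiny illustrative sketch (function name is mine):

```python
# Sketch: probability that a child expresses the dominant trait,
# given how many dominant alleles each parent carries (0, 1, or 2).
def dominant_prob(dom_alleles_p1, dom_alleles_p2):
    r1 = 1 - dom_alleles_p1 / 2   # prob parent 1 passes a recessive allele
    r2 = 1 - dom_alleles_p2 / 2   # prob parent 2 passes a recessive allele
    return 1 - r1 * r2            # child needs at least one dominant allele

print(dominant_prob(2, 0))  # → 1.0  (one parent has two dominant genes)
print(dominant_prob(1, 1))  # → 0.75 (each parent has exactly one, the Aa x Aa cross)
print(dominant_prob(1, 0))  # → 0.5
print(dominant_prob(0, 0))  # → 0.0
```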

Now let's take the case when a gene is dependent on another. This makes things a bit trickier as
the "parent" gene may also depend on another and so on ... To determine the probability that a
dependent gene is dominant, we take the events that each gene in the chain (starting with the
current gene) is dominant. In order for the current gene to be expressed dominantly, we need all
these events to occur. To do this, we take the product of probabilities for each individual event in
the chain. The algorithm works recursively. Here is the complete source code for this problem:

int n, d[200];
double power[200];

// here we determine the characteristic for each gene (in power[I]
// we keep the probability of gene I to be expressed dominantly)
double detchr (string p1a, string p1b, string p2a, string p2b, int nr)
{
  double p, p1, p2;
  p = p1 = p2 = 1.0;
  if (p1a[nr] <= 'Z') p1 = p1 - 0.5;
  // an uppercase letter denotes a dominant gene
  if (p1b[nr] <= 'Z') p1 = p1 - 0.5;
  if (p2a[nr] <= 'Z') p2 = p2 - 0.5;
  if (p2b[nr] <= 'Z') p2 = p2 - 0.5;
  p = 1 - p1 * p2;

  if (d[nr] != -1) power[nr] = p * detchr (p1a, p1b, p2a, p2b, d[nr]);
  // gene 'nr' is dependent on gene d[nr]
  else power[nr] = p;
  return power[nr];
}

double cross (string p1a, string p1b, string p2a, string p2b,
              vector<int> dom, vector<int> rec, vector<int> dependencies)
{
  int I;
  double fitness = 0.0;

  n = rec.size();
  for (I = 0; I < n; I++) d[I] = dependencies[I];
  for (I = 0; I < n; I++) power[I] = -1.0;
  for (I = 0; I < n; I++)
    if (power[I] == -1.0) detchr (p1a, p1b, p2a, p2b, I);
    // we check if the dominant character of gene I has
    // not already been computed
  for (I = 0; I < n; I++)
    fitness = fitness + (double) power[I]*dom[I] - (double) (1-power[I])*rec[I];
  // we compute the expected 'quality' of an animal based on the
  // probabilities of each gene to be expressed dominantly

  return fitness;
}
Randomized Algorithms

We call randomized algorithms those algorithms that use random numbers to make decisions
during their execution. Unlike deterministic algorithms that for a fixed input always give the
same output and the same running-time, a randomized algorithm behaves differently from
execution to execution. Basically, we distinguish two kinds of randomized algorithms:

1. Monte Carlo algorithms: may sometimes produce an incorrect solution - we bound the
probability of failure.
2. Las Vegas algorithms: always give the correct solution, the only variation is the running time -
we study the distribution of the running time.

Read these lecture notes from the College of Engineering at UIUC for an example of how these
algorithms work.

The main goal of randomized algorithms is to build faster, and perhaps simpler solutions. Being
able to tackle "harder" problems is also a benefit of randomized algorithms. As a result, these
algorithms have become a research topic of major interest and have already been utilized to more
easily solve many different problems.

An interesting question is whether such an algorithm may become useful in TopCoder


competitions. Some problems have many possible solutions, where a number of which are also
optimal. The classical approach is to check them one by one, in an established order. But it
cannot be guaranteed that the optima are uniformly distributed in the solution domain. Thus, a
deterministic algorithm may not find you an optimum quickly enough. The advantage of a
randomized algorithm is that there are actually no rules to set about the order in which the
solutions are checked and for the cases when the optima are clustered together, it usually
performs much better. See QueenInterference for a TopCoder example.

Randomized algorithms are particularly useful when faced with malicious attackers who
deliberately try to feed a bad input to the algorithm. Such algorithms are widely used
in cryptography, but it sometimes makes sense to also use them in TopCoder competitions. It
may happen that you have an efficient algorithm, but there are a few degenerate cases for which
its running time is significantly slower. Assuming the algorithm is correct, it has to run fast
enough for all inputs. Otherwise, all the points you earned for submitting that particular problem
are lost. This is why here, on TopCoder, we are interested in worst case execution time.

To challenge or not to challenge?


Another fierce coding challenge is now over and you have 15 minutes to look for other coders'
bugs. The random call in a competitor's submission is likely to draw your attention. This will
most likely fall into one of two scenarios:

1. the submission was just a desperate attempt and will most likely fail on many inputs.
2. the algorithm was tested rather thoroughly and the probability to fail (or time out) is virtually
null.

The first thing you have to do is to ensure it was not already unsuccessfully challenged (check
the coder's history). If it wasn't, it may deserve a closer look. Otherwise, you should ensure that
you understand what's going on before even considering a challenge. Also take into account
other factors such as coder rating, coder submission accuracy, submission time, number of
resubmissions or impact on your ranking.
Will "random" really work?
In most optimizing problems, the ratio between the number of optimal solutions and the total
number of solutions is not so obvious. An easy, but not so clever solution, is to simply try
generating different samples and see how the algorithm behaves. Running such a simulation is
usually pretty quick and may also give you some extra clues in how to actually solve the
problem.

Max = 1000000; attempt = 0;
while (attempt < Max)
{
  answer = solve_random (...);
  if (better (answer, optimum))
  // we found a better solution
  {
    optimum = answer;
    cout << "Solution " << answer << " found on step " << attempt << "\n";
  }
  attempt ++;
}

Building up the recurrence matrix to compute recurrences in O(logN) time


codechef.com

Advantages of using matrix form instead of the recurrence relationship itself



To use matrix exponentiation it's first necessary to understand why we would want to use it...
After all, methods such as classic DP and/or memoization are available and they are easier to
code.

The great advantage of matrix exponentiation is that its running time is simply O(k^3 * logN) (for
a matrix of dimensions k*k), which is critical when we are dealing with values as large as 10^15,
for example. It is used when the recurrence relation we derived is somehow entangled, in the
sense that the values it takes depend on more than one of the previous values. Using the base
cases of the recurrence relation and a repeated squaring/fast exponentiation algorithm, we have a
very efficient way of dealing with large input values :D I will try to illustrate this with an
example, the Tribonacci Numbers.

The Tribonacci Numbers

For those of you who had never heard the name before, this sequence of numbers is an
"expansion" of the fibonacci sequence that includes a third term on the sum of the previous two,
such that the formula looks like:

F(n) = F(n-1) + F(n-2) + F(n-3), F(1) = 1; F(2) = 1; F(3) = 2

as stated on WolframMathworld.

Of course, all the problems that arose when we tried to compute the fibonacci numbers via dp or
any other way become a lot more complicated with tribonacci numbers, and for N as large as
10^15, using dp will always be very slow, regardless of the time limit.

Understanding Matrix exponentiation

The basic idea behind matrix exponentiation, as stated earlier is to use the base cases of the
recurrence relationship in order to assemble a matrix which will allow us to compute values fast.

On our case we have:

F(1) = 1

F(2) = 1

F(3) = 2

And we now have a relationship that will go like this:

|f(4)|            |f(1)|
|f(3)| = MATRIX * |f(2)|
|f(2)|            |f(3)|

Now all that is left is to assemble the matrix... and that is done based both on the rules of matrix
multiplication and on the recursive relationship... Now, on our example we see that to obtain
f(4), the 1st line of the matrix needs to be composed only of ones, as f(4) = 1*f(3) + 1*f(2) +
1*f(1).

Now, denoting the unknown elements as *, we have:

|f(4)|   |1 1 1|   |f(1)|
|f(3)| = |* * *| * |f(2)|
|f(2)|   |* * *|   |f(3)|

For the second line, we want to obtain f(3), and the only possible way of doing it is by having:

|f(4)|   |1 1 1|   |f(1)|
|f(3)| = |0 0 1| * |f(2)|
|f(2)|   |* * *|   |f(3)|

To get the value of f(2), we can follow the same logic and get the final matrix:

|1 1 1|
|0 0 1|
|0 1 0|

To end it, we now need to generalize it, and, as we have 3 base cases, all we need to do to
compute the Nth tribonacci number in O(logN) time is to raise the matrix to the power N - 3 to
get:

|f(n)|     |1 1 1|^(N-3)   |f(1)|
|f(n-1)| = |0 0 1|       * |f(2)|
|f(n-2)|   |0 1 0|         |f(3)|

The power of the matrix can now be computed in O(logN) time using repeated squaring applied
to matrices and the method is complete... Below is the Python code that does this, computing the
number modulo 1000000007.

def matrix_mult(A, B):
    C = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    for i in range(3):
        for j in range(3):
            for k in range(3):
                C[i][k] = (C[i][k] + A[i][j] * B[j][k]) % 1000000007
    return C

def fast_exponentiation(A, n):
    if n == 1:
        return A
    else:
        if n % 2 == 0:
            A1 = fast_exponentiation(A, n // 2)
            return matrix_mult(A1, A1)
        else:
            return matrix_mult(A, fast_exponentiation(A, n - 1))

def solve(n):
    A = [[1, 1, 1], [0, 0, 1], [0, 1, 0]]
    A_n = fast_exponentiation(A, n - 3)
    return (A_n[0][0] + A_n[0][1] + A_n[0][2] * 2) % 1000000007
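To gain confidence in the matrix solution, it can be compared against a direct linear-time computation for small N (the reference routine below is my own sketch):

```python
# Sketch: a naive linear-time reference for the Tribonacci numbers,
# useful for cross-checking solve(n) above on small inputs.
def tribonacci_naive(n, mod=1000000007):
    a, b, c = 1, 1, 2          # F(1), F(2), F(3)
    if n <= 2:
        return 1
    for _ in range(n - 3):
        a, b, c = b, c, (a + b + c) % mod
    return c

print([tribonacci_naive(n) for n in range(1, 11)])
# → [1, 1, 2, 4, 7, 13, 24, 44, 81, 149]
```

For n >= 4, tribonacci_naive(n) should agree with solve(n); any mismatch would point to an off-by-one in the exponent N - 3.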

Network flow
topcoder.com

Maximum Flow: Section 1


By _efer_
topcoder member

Introduction

This article covers a problem that often arises in real life situations and, as expected, in
programming challenges, with Top Coder being no exception. It is addressed mostly to coders
who are not familiar with the subject, but it may prove useful to the more experienced as well.
Lots of papers have been written, and there are many algorithms known to solve this problem.
While they are not the fastest, the algorithms presented here have the advantage of being simple
and efficient, and because of this they are usually preferred during a challenge setup. The reader
is advised to read the article on graph theory first, as the concepts presented there are needed to
understand those presented here.

The Standard Maximum Flow Problem


So, what are we being asked for in a max-flow problem? The simplest form that the statement
could take would be something along the lines of: "A list of pipes is given, with different flow-
capacities. These pipes are connected at their endpoints. What is the maximum amount of water
that you can route from a given starting point to a given ending point?" or equivalently "A
company owns a factory located in city X where products are manufactured that need to be
transported to the distribution center in city Y. You are given the one-way roads that connect
pairs of cities in the country, and the maximum number of trucks that can drive along each road.
What is the maximum number of trucks that the company can send to the distribution center?"

A first observation is that it makes no sense to send a truck to any other city than Y, so every
truck that enters a city other than Y must also leave it. A second thing to notice is that, because
of this, the number of trucks leaving X is equal to the number of trucks arriving in Y.

Rephrasing the statement in terms of graph theory, we are given a network - a directed graph, in
which every edge has a certain capacity c associated with it, a starting vertex (the source, X in
the example above), and an ending vertex (the sink). We are asked to associate another
value f satisfying f ≤ c for each edge such that for every vertex other than the source and the sink,
the sum of the values associated to the edges that enter it must equal the sum of the values
associated to the edges that leave it. We will call f the flow along that edge. Furthermore, we are
asked to maximize the sum of the values associated to the arcs leaving the source, which is the
total flow in the network.

The image below shows the optimal solution to an instance of this problem, each edge being
labeled with the values f/c associated to it.

How to Solve It
Now how do we actually solve the problem? First, let us define two basic concepts for
understanding flow networks: residual networks and augmenting paths. Consider an arbitrary
flow in a network. The residual network has the same vertices as the original network, and one or
two edges for each edge in the original. More specifically, if the flow along the edge x-y is less
than the capacity there is a forward edge x-y with a capacity equal to the difference between the
capacity and the flow (this is called the residual capacity), and if the flow is positive there is a
backward edge y-x with a capacity equal to the flow on x-y. An augmenting path is simply a path
from the source to the sink in the residual network, whose purpose is to increase the flow in the
original one. It is important to understand that the edges in this path can point the "wrong way"
according to the original network. The path capacity of a path is the minimum capacity of an
edge along that path. Let's take the following example:

By considering the path X_A_C_Y, we can increase the flow by 1 - the edges X_A and A_C
have capacity of 3, as in the original network, but the edge C_Y has capacity 1, and we take the
minimum of these values to get the path capacity. Increasing the flow along this path with 1
yields the flow below:

The value of the current flow is now 2, and as shown in Figure 1, we could do better. So, let's try
to increase the flow. Clearly, there is no point in considering the directed paths X_A_C_Y or
X_B_D_E_Y as the edges C_Y and X_B, respectively, are filled to capacity. As a matter of fact,
there is no directed path in the network shown above, due to the edges mentioned above being
filled to capacity. At this point, the question that naturally comes to mind is: is it possible to
increase the flow in this case? And the answer is yes, it is. Let's take a look at the residual
network:

Let's consider the only path from X to Y here: X_A_C_B_D_E_Y. Note that this is not a path in
the directed graph, because C_B is walked in the opposite way. We'll use this path in order to
increase the total flow in the original network. We'll "push" flow on each of the edges, except for
C_B which we will use in order to "cancel" flow on B_C. The amount by which this operation
can be performed is limited by the capacities of all edges along the path (as shown in Figure 3b).
Once again we take the minimum, to conclude that this path also has capacity 1. Updating the
path in the way described here yields the flow shown in Figure 1a. We are left with the following
residual network where a path between the source and the sink doesn't exist:

This example suggests the following algorithm: start with no flow everywhere and increase the
total flow in the network while there is an augmenting path from the source to the sink with no
full forward edges or empty backward edges - a path in the residual network. The algorithm
(known as the Ford-Fulkerson method) is guaranteed to terminate: due to the capacities and
flows of the edges being integers and the path-capacity being positive, at each step we get a new
flow that is closer to the maximum. As a side note, the algorithm isn't guaranteed to even
terminate if the capacities are irrational.

What about the correctness of this algorithm? It is obvious that in a network in which a
maximum flow has been found there is no augmenting path, otherwise we would be able to
increase the maximum value of the flow, contradicting our initial assumption. If the converse of
this affirmation is true, so that when there is no augmenting path, the value of the flow has
reached its maximum, we can breathe a sigh of relief, our algo is correct and computes the
maximum flow in a network. This is known as the max-flow min-cut theorem and we shall
justify its correctness in a few moments.

A cut in a flow network is simply a partition of the vertices in two sets, let's call them A and B,
in such a way that the source vertex is in A and the sink is in B. The capacity of a cut is the sum
of the capacities of the edges that go from a vertex in A to a vertex in B. The flow of the cut is
the difference of the flows that go from A to B (the sum of the flows along the edges that have
the starting point in A and the ending point in B), respectively from B to A, which is exactly the
value of the flow in the network, due to the entering flow equals leaving flow - property, which
is true for every vertex other than the source and the sink.

Notice that the flow of the cut is less or equal to the capacity of the cut due to the constraint of
the flow being less or equal to the capacity of every edge. This implies that the maximum flow is
less or equal to every cut of the network. This is where the max-flow min-cut theorem comes in
and states that the value of the maximum flow through the network is exactly the value of the
minimum cut of the network. Let's give an intuitive argument for this fact. We will assume that
we are in the situation in which no augmenting path in the network has been found. Let's color in
yellow, like in the figure above, every vertex that is reachable by a path that starts from the
source and consists of non-full forward edges and of non-empty backward edges. Clearly the
sink will be colored in blue, since there is no augmenting path from the source to the sink. Now
take every edge that has a yellow starting point and a blue ending point. This edge will have the
flow equal to the capacity, otherwise we could have added this edge to the path we had at that
point and color the ending point in yellow. Note that if we remove these edges there will be no
directed path from the source to the sink in the graph. Now consider every edge that has a blue
starting point and a yellow ending point. The flow on this edge must be 0 since otherwise we
could have added this edge as a backward edge on the current path and color the starting point in
yellow. Thus, the value of the flow must equal the value of the cut, and since every flow is less
or equal to every cut, this must be a maximum flow, and the cut is a minimum cut as well.

In fact, we have solved another problem that at first glance would appear to have nothing to do
with maximum flow in a network, i.e. given a weighted directed graph, remove a minimum-
weighted set of edges in such a way that a given node is unreachable from another given node.
The result is, according to the max-flow min-cut theorem, the maximum flow in the graph, with
capacities being the weights given. We are also able to find this set of edges in the way described
above: we take every edge with the starting point marked as reachable in the last traversal of the
graph and with an unmarked ending point. This edge is a member of the minimum cut.

Augmenting-Path Algorithms
The neat part of the Ford-Fulkerson algorithm described above is that it gets the correct result no
matter how we solve (correctly!!) the sub-problem of finding an augmenting path. However,
every new path may increase the flow by only 1, hence the number of iterations of the algorithm
could be very large if we carelessly choose the augmenting path algorithm to use. The
function max_flow will look like this, regardless of the actual method we use for finding
augmenting paths:

int max_flow()
    result = 0
    while (true)
        // the function find_path returns the path capacity of the
        // augmenting path found
        path_capacity = find_path()
        // no augmenting path found
        if (path_capacity = 0) exit while
        else result += path_capacity
    end while
    return result

To keep it simple, we will use a 2-dimensional array for storing the capacities of the residual
network that we are left with after each step in the algorithm. Initially the residual network is just
the original network. We will not store the flows along the edges explicitly, but it's easy to figure
out how to find them upon the termination of the algorithm: for each edge x-y in the original
network the flow is given by the capacity of the backward edge y-x in the residual network. Be
careful though; if the reversed arc y-x also exists in the original network, this will fail, and it is
recommended that the initial capacity of each arc be stored somewhere, and then the flow along
the edge is the difference between the initial and the residual capacity.

We now require an implementation for the function find_path. The first approach that comes to
mind is to use a depth-first search (DFS), as it probably is the easiest to implement.
Unfortunately, its performance is very poor on some networks, and normally is less preferred to
the ones discussed next.

The next best thing in the matter of simplicity is a breadth-first search (BFS). Recall that this
search usually yields the shortest path in an un-weighted graph. Indeed, this also applies here to
get the shortest augmenting path from the source to the sink. In the following pseudocode we
will basically: find a shortest path from the source to the sink and compute the minimum
capacity of an edge (that could be a forward or a backward edge) along the path - the path
capacity. Then, for each edge along the path we reduce its capacity and increase the capacity of
the reversed edge with the path capacity.

int bfs()
    queue Q
    push source to Q
    mark source as visited
    keep an array from with the meaning: from[x] is the
    previous vertex visited in the shortest path from the source to x;
    initialize from with -1 (or any other sentinel value)
    while Q is not empty
        where = pop from Q
        for each vertex next adjacent to where
            if next is not visited and capacity[where][next] > 0
                push next to Q
                mark next as visited
                from[next] = where
                if next = sink
                    exit while loop
        end for
    end while
    // we compute the path capacity
    where = sink, path_cap = infinity
    while from[where] > -1
        prev = from[where] // the previous vertex
        path_cap = min(path_cap, capacity[prev][where])
        where = prev
    end while
    // we update the residual network; if no path is found the while
    // loop will not be entered
    where = sink
    while from[where] > -1
        prev = from[where]
        capacity[prev][where] -= path_cap
        capacity[where][prev] += path_cap
        where = prev
    end while
    // if no path is found, path_cap is infinity
    if path_cap = infinity
        return 0
    else return path_cap
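The pseudocode above translates almost line for line into a runnable sketch; the implementation and the tiny sample graph below are mine, not the article's figures:

```python
# Sketch: shortest-augmenting-path (BFS) max flow on a capacity matrix.
from collections import deque

def max_flow(capacity, source, sink):
    n = len(capacity)
    cap = [row[:] for row in capacity]    # residual capacities; original kept intact
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual network
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q and parent[sink] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[sink] == -1:            # no augmenting path left: done
            return total
        # path capacity = minimum residual capacity along the path
        path_cap = float('inf')
        v = sink
        while v != source:
            path_cap = min(path_cap, cap[parent[v]][v])
            v = parent[v]
        # update the residual network along the path
        v = sink
        while v != source:
            cap[parent[v]][v] -= path_cap
            cap[v][parent[v]] += path_cap
            v = parent[v]
        total += path_cap

# Tiny example: vertex 0 is the source, 3 the sink; the max flow is 3
C = [[0, 2, 2, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 2],
     [0, 0, 0, 0]]
print(max_flow(C, 0, 3))  # → 3
```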

As we can see, this is pretty easy to implement. As for its performance, it is guaranteed that this
takes at most N * M/2 steps, where N is the number of vertices and M is the number of edges in
the network. This number may seem very large, but it is over-estimated for most networks. For
example, in the network we considered 3 augmenting paths are needed which is significantly less
than the upper bound of 28. Due to the O(M) running time of BFS (implemented with adjacency
lists) the worst-case running time of the shortest-augmenting path max-flow algorithm
is O(N * M), but usually the algorithm performs much better than this.

Next we will consider an approach that uses a priority-first search (PFS), very similar to the
Dijkstra heap method explained here. In this method the augmenting path with the maximum
path capacity is preferred. Intuitively this should lead to a faster algorithm, since at each step we
increase the flow by the maximum possible amount. However, things are not always so, and
the BFS implementation has better running times on some networks. We assign as a priority to
each vertex the minimum capacity of a path (in the residual network) from the source to that
vertex. We process vertices in a greedy manner, as in Dijkstra's algorithm, in decreasing order of
priorities. When we get to the sink, we are done, since a path with a maximum capacity has been
found. We would like to implement this with a data structure that lets us efficiently find the vertex
with the highest priority and increase the priority of a vertex (when a new better path is found) -
this suggests the use of a heap, which has a space complexity proportional to the number of
vertices. In TopCoder matches we may find it faster and easier to implement this with a priority
queue or some other data structure that approximates one, even though the space required might
grow to be proportional to the number of edges. This is how the following pseudocode is
implemented. We also define a structure node that has the members vertex and priority with the
above meaning. Another field from is needed to store the previous vertex on the path.

int pfs()
priority queue PQ
push node(source, infinity, -1) to PQ
keep the array from as in bfs()
// if no augmenting path is found, path_cap will remain 0
path_cap = 0
while PQ is not empty
node aux = pop from PQ
where = aux.vertex, cost = aux.priority
if we already visited where continue
from[where] = aux.from
if where = sink
path_cap = cost
exit while loop
mark where as visited
for each vertex next adjacent to where
if capacity[where][next] > 0
new_cost = min(cost, capacity[where][next])
push node(next, new_cost, where) to PQ
end for
end while
// update the residual network
where = sink
while from[where] > -1
prev = from[where]
capacity[prev][where] -= path_cap
capacity[where][prev] += path_cap
where = prev
end while
return path_cap

The analysis of its performance is pretty complicated, but it may prove worthwhile to remember
that with PFS at most 2M lg U steps are required, where U is the maximum capacity of an edge in
the network. As with BFS, this number is a lot larger than the actual number of steps for most
networks. Combine this with the O(M lg M) complexity of the search to get the worst-case
running time of this algorithm.
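Assuming an adjacency-matrix representation with a global capacity array (our own conventions for this sketch), the PFS pseudocode might look like this in C++, with std::priority_queue standing in for the heap:

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <queue>
#include <vector>

const int MAXN = 100;
int capacity[MAXN][MAXN];   // residual capacities
int n;                      // number of vertices

struct node {
    int vertex, priority, from;
};
// order nodes by priority so the largest-capacity path is expanded first
bool operator<(const node &a, const node &b) { return a.priority < b.priority; }

// One PFS phase: find a maximum-capacity augmenting path and push flow
// along it; returns the path capacity, or 0 if no augmenting path exists.
int pfs(int source, int sink) {
    std::vector<int> from(n, -1);
    std::vector<bool> visited(n, false);
    std::priority_queue<node> pq;
    pq.push({source, INT_MAX, -1});
    int path_cap = 0;
    while (!pq.empty()) {
        node aux = pq.top(); pq.pop();
        int where = aux.vertex, cost = aux.priority;
        if (visited[where]) continue;
        visited[where] = true;
        from[where] = aux.from;
        if (where == sink) { path_cap = cost; break; }
        for (int next = 0; next < n; ++next)
            if (!visited[next] && capacity[where][next] > 0)
                pq.push({next, std::min(cost, capacity[where][next]), where});
    }
    // update the residual network (skipped entirely when no path was found)
    for (int v = sink; path_cap > 0 && from[v] != -1; v = from[v]) {
        capacity[from[v]][v] -= path_cap;
        capacity[v][from[v]] += path_cap;
    }
    return path_cap;
}

int max_flow(int source, int sink) {
    int result = 0, path_cap;
    while ((path_cap = pfs(source, sink)) > 0)
        result += path_cap;
    return result;
}
```

Note that a vertex may be pushed several times with different priorities; the visited check on pop discards the stale entries, which is why the space can grow with the number of edges.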

Now that we know what these methods are all about, which of them do we choose when we are
confronted with a max-flow problem? The PFS approach seems to have a better worst-case
performance, but in practice their performance is pretty much the same. So, the method that one
is more familiar with may prove more adequate. Personally, I prefer the shortest-path method, as
I find it easier to implement during a challenge and less error prone.

Max-Flow/Min-Cut Related Problems


How to recognize max-flow problems? Often they are hard to detect, and they usually boil down to
maximizing the movement of something from one location to another. We need to look at the
constraints when we think we have a working solution based on maximum flow - they should
allow at least an O(N^3) approach. If the number of locations is large, another algorithm (such as
dynamic programming or greedy) is more appropriate.

The problem description might suggest multiple sources and/or sinks. For example, in the sample
statement in the beginning of this article, the company might own more than one factory and
multiple distribution centers. How can we deal with this? We should try to convert this to a
network that has a unique source and sink. In order to accomplish this we will add two "dummy"
vertices to our original network - we will refer to them as super-source and super-sink. In
addition to this we will add an edge from the super-source to every ordinary source (a factory).
As we don't have restrictions on the number of trucks that each factory can send, we should
assign to each edge an infinite capacity. Note that if we had such restrictions, we should have
assigned to each edge a capacity equal to the number of trucks each factory could send.
Likewise, we add an edge from every ordinary sink (distribution centers) to the super-sink with
infinite capacity. A maximum flow in this newly built network is the solution to the problem - the
sources now become ordinary vertices, and they are subject to the entering-flow equals leaving-flow
property. You may want to keep this in your bag of tricks, as it may prove useful in many
problems.
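A sketch of this trick on an adjacency-matrix network (the function name and conventions here are ours, not from the article):

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Wire a super-source and super-sink into a capacity matrix.
// `cap` is the original network; `sources` and `sinks` list the ordinary
// sources and sinks. Returns the index of the new super-source; the
// super-sink sits at the next index.
int add_super_nodes(std::vector<std::vector<int>> &cap,
                    const std::vector<int> &sources,
                    const std::vector<int> &sinks) {
    int n = cap.size();
    // grow the matrix by two vertices: n = super-source, n + 1 = super-sink
    for (auto &row : cap) row.resize(n + 2, 0);
    cap.resize(n + 2, std::vector<int>(n + 2, 0));
    const int INF = INT_MAX / 2;                // effectively unlimited
    for (int s : sources) cap[n][s] = INF;      // super-source -> each source
    for (int t : sinks)   cap[t][n + 1] = INF;  // each sink -> super-sink
    return n;
}
```

If the factories had individual limits, the INF capacities from the super-source would simply be replaced by those limits.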

What if we are also given the maximum number of trucks that can drive through each of the
cities in the country (other than the cities where the factory and the distribution center are
located)? In other words we have to deal with vertex-capacities too. Intuitively, we should be
able to reduce this to maximum-flow, but we must find a way to take the capacities from vertices
and put them back on edges, where they belong. Another nice trick comes into play. We will
build a network that has two times more vertices than the initial one. For each vertex we will
have two nodes: an in-vertex and an out-vertex, and we will direct each edge x-y from the out-
vertex of x to the in-vertex of y. We can assign them the capacities from the problem statement.
Additionally we can add an edge for each vertex from the in to the out-vertex. The capacity this
edge will be assigned is obviously the vertex-capacity. Now we just run max-flow on this
network and compute the result.
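The vertex-splitting construction might be sketched like this, with our own indexing convention (the in-vertex of v stays at index v, the out-vertex goes to index v + n):

```cpp
#include <cassert>
#include <vector>

// Convert vertex capacities to edge capacities by splitting each vertex v
// into an in-vertex (index v) and an out-vertex (index v + n).
// `cap` is the original n x n capacity matrix, `vertex_cap[v]` the capacity
// of vertex v; the result is a 2n x 2n matrix.
std::vector<std::vector<int>> split_vertices(
        const std::vector<std::vector<int>> &cap,
        const std::vector<int> &vertex_cap) {
    int n = cap.size();
    std::vector<std::vector<int>> res(2 * n, std::vector<int>(2 * n, 0));
    for (int v = 0; v < n; ++v)
        res[v][v + n] = vertex_cap[v];          // in-vertex -> out-vertex
    for (int x = 0; x < n; ++x)
        for (int y = 0; y < n; ++y)
            if (cap[x][y] > 0)
                res[x + n][y] = cap[x][y];      // out-x -> in-y
    return res;
}
```

When running max-flow on the result, the source and sink should either be given unlimited vertex capacities, or the search should start at the source's out-vertex and end at the sink's in-vertex.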

Maximum flow problems may appear out of nowhere. Let's take this problem for instance: "You
are given the in and out degrees of the vertices of a directed graph. Your task is to find the edges
(assuming that no edge can appear more than once)." First, notice that we can perform this
simple test at the beginning. We can compute the number M of edges by summing the out-
degrees or the in-degrees of the vertices. If these numbers are not equal, clearly there is no graph
that could be built. This doesn't solve our problem, though. There are some greedy approaches
that come to mind, but none of them work. We will combine the tricks discussed above to give a
max-flow algorithm that solves this problem. First, build a network that has 2 (in/out) vertices
for each initial vertex. Now draw an edge from every out vertex to every in vertex. Next, add a
super-source and draw an edge from it to every out-vertex. Add a super-sink and draw an edge
from every in vertex to it. We now need some capacities for this to be a flow network. It should
be pretty obvious what the intent with this approach is, so we will assign the following
capacities: for each edge drawn from the super-source we assign a capacity equal to the out-
degree of the vertex it points to. As there may be only one arc from a vertex to another, we
assign a 1 capacity to each of the edges that go from the outs to the ins. As you can guess, the
capacities of the edges that enter the super-sink will be equal to the in-degrees of the vertices. If
the maximum flow in this network equals M - the number of edges, we have a solution, and for
each edge between the out and in vertices that has a flow along it (which is maximum 1, as the
capacity is 1) we can draw an edge between corresponding vertices in our graph. Note that both
x-y and y-x edges may appear in the solution. This is very similar to the maximum matching in a
bipartite graph that we will discuss later. An example is given below where the out-degrees are
(2, 1, 1, 1) and the in-degrees (1, 2, 1, 1).
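Putting the pieces together, the whole construction might be sketched end to end. Everything below - the names, the small BFS max-flow helper, and the assumption that self-loops are not allowed - is our own framing of the idea:

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <queue>
#include <utility>
#include <vector>

// A minimal BFS max-flow on an adjacency matrix, used only as a helper here.
int max_flow(std::vector<std::vector<int>> &cap, int s, int t) {
    int n = cap.size(), total = 0;
    while (true) {
        std::vector<int> from(n, -1);
        std::queue<int> q;
        q.push(s); from[s] = s;
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v = 0; v < n; ++v)
                if (from[v] == -1 && cap[u][v] > 0) { from[v] = u; q.push(v); }
        }
        if (from[t] == -1) return total;
        int path_cap = INT_MAX;
        for (int v = t; v != s; v = from[v])
            path_cap = std::min(path_cap, cap[from[v]][v]);
        for (int v = t; v != s; v = from[v]) {
            cap[from[v]][v] -= path_cap;
            cap[v][from[v]] += path_cap;
        }
        total += path_cap;
    }
}

// Build the network described above: source -> out-v with capacity outdeg[v],
// out-v -> in-w (v != w, i.e. no self-loops) with capacity 1, and
// in-w -> sink with capacity indeg[w]. Returns the edge list of a realizing
// graph, or an empty list if none exists.
std::vector<std::pair<int,int>> realize(const std::vector<int> &outdeg,
                                        const std::vector<int> &indeg) {
    int n = outdeg.size();
    int S = 2 * n, T = 2 * n + 1;          // out-v = v, in-w = n + w
    std::vector<std::vector<int>> cap(2 * n + 2, std::vector<int>(2 * n + 2, 0));
    int m = 0, need = 0;
    for (int v = 0; v < n; ++v) { cap[S][v] = outdeg[v]; m += outdeg[v]; }
    for (int w = 0; w < n; ++w) { cap[n + w][T] = indeg[w]; need += indeg[w]; }
    for (int v = 0; v < n; ++v)
        for (int w = 0; w < n; ++w)
            if (v != w) cap[v][n + w] = 1;
    std::vector<std::pair<int,int>> edges;
    if (need != m) return edges;                 // degree sums differ
    if (max_flow(cap, S, T) != m) return edges;  // not realizable
    for (int v = 0; v < n; ++v)
        for (int w = 0; w < n; ++w)
            if (v != w && cap[v][n + w] == 0)    // saturated => edge v -> w
                edges.push_back({v, w});
    return edges;
}
```

For the out-degrees (2, 1, 1, 1) and in-degrees (1, 2, 1, 1) of the example, this recovers a graph with 5 edges.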

Some other problems may ask us to separate two locations minimally; these can usually be
reduced to a minimum cut in a network. Two examples will be discussed here, but
first let's take the standard min-cut problem and make it sound more like a TopCoder problem.
We learned earlier how to find the value of the min-cut and how to find an arbitrary min-cut. In
addition to this, we would now like to have a minimum cut with the minimum number of edges. An
idea would be to modify the original network in such a way that the minimum cut here is
the minimum cut with the fewest edges in the original one. Notice what happens if we
multiply each edge capacity by a constant T. Clearly, the value of the maximum flow is
multiplied by T, thus the value of the minimum cut is T times bigger than the original. A
minimum cut in the original network is a minimum cut in the modified one as well. Now suppose
we add 1 to the capacity of each edge. Is a minimum cut in the original network a minimum cut
in this one? The answer is no, as we can see in Figure 8 shown below, if we take T = 2.

Why did this happen? Take an arbitrary cut. The value of the cut will be T times the original
value of the cut, plus the number of edges in it. Thus, a non-minimum cut in the first place could
become minimum if it contains just a few edges. This is because the constant might not have
been chosen properly in the beginning, as is the case in the example above. We can fix this by
choosing T large enough to neutralize the difference in the number of edges between cuts in the
network. In the above example T = 4 would be enough, but to generalize, we take T = 10, one
more than the number of edges in the original network, and one more than the number of edges
that could possibly be in a minimum-cut. It is now true that a minimum-cut in the new network is
minimum in the original network as well. However the converse is not true, and it is to our
advantage. Notice how the difference between minimum cuts is now made by the number of
edges in the cut. So we just find the min-cut in this new network to solve the problem correctly.
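The arithmetic of this scaling trick fits in a few lines (a sketch with our own names; T must be larger than the number of edges that could appear in any cut):

```cpp
#include <cassert>

// With T larger than the number of edges M, replace each capacity c by
// T * c + 1. A min cut in the scaled network then encodes both the original
// cut value and its edge count:
//   scaled_cut = T * original_value + edge_count,  with edge_count <= M < T,
// so the two parts can be recovered by division and remainder.
int scale_capacity(int c, int T) { return T * c + 1; }

int original_cut_value(int scaled_cut, int T) { return scaled_cut / T; }
int cut_edge_count(int scaled_cut, int T) { return scaled_cut % T; }
```

For example, with T = 10, a cut of original value 4 containing 3 edges has scaled value 43, from which both parts are recovered exactly.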

Let's further illustrate the min-cut pattern: "An undirected graph is given. What is the
minimum number of edges that should be removed in order to disconnect the graph?" In other
words the problem asks us to remove some edges in order for two nodes to be separated. This
should ring a bell - a minimum cut approach might work. So far we have only seen maximum
flow in directed graphs, but now we are facing an undirected one. This should not be a very big
problem though, as we can direct the graph by replacing every (undirected) edge x-y with two
arcs: x-y and y-x. In this case the value of the min-cut is the number of edges in it, so we assign a
1 capacity to each of them. We are not asked to separate two given vertices, but rather to
disconnect optimally any two vertices, so we must take every pair of vertices and treat them as
the source and the sink and keep the best one from these minimum-cuts. An improvement can be
made, however. Take one vertex, let's say vertex numbered 1. Because the graph should be
disconnected, there must be another vertex unreachable from it. So it suffices to treat vertex 1 as
the source and iterate through every other vertex and treat it as the sink.
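The improved scheme might be sketched as follows, with a small BFS max-flow as a helper; the names and the matrix representation are our own:

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <queue>
#include <utility>
#include <vector>

// A minimal BFS max-flow on an adjacency matrix; `cap` is taken by value
// so every call starts from a fresh copy of the network.
int max_flow(std::vector<std::vector<int>> cap, int s, int t) {
    int n = cap.size(), total = 0;
    while (true) {
        std::vector<int> from(n, -1);
        std::queue<int> q;
        q.push(s); from[s] = s;
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v = 0; v < n; ++v)
                if (from[v] == -1 && cap[u][v] > 0) { from[v] = u; q.push(v); }
        }
        if (from[t] == -1) return total;
        int path_cap = INT_MAX;
        for (int v = t; v != s; v = from[v])
            path_cap = std::min(path_cap, cap[from[v]][v]);
        for (int v = t; v != s; v = from[v]) {
            cap[from[v]][v] -= path_cap;
            cap[v][from[v]] += path_cap;
        }
        total += path_cap;
    }
}

// Edge connectivity of an undirected graph: direct every edge both ways with
// capacity 1, fix vertex 0 as the source, and try every other vertex as the
// sink, keeping the smallest max-flow (= min-cut) value.
int edge_connectivity(const std::vector<std::pair<int,int>> &edges, int n) {
    std::vector<std::vector<int>> cap(n, std::vector<int>(n, 0));
    for (const auto &e : edges)
        cap[e.first][e.second] = cap[e.second][e.first] = 1;
    int best = INT_MAX;
    for (int t = 1; t < n; ++t)
        best = std::min(best, max_flow(cap, 0, t));
    return best;
}
```

This runs n - 1 max-flow computations instead of a quadratic number, relying on the observation that vertex 0 must end up on one side of any disconnecting cut.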

What if instead of edges we now have to remove a minimum number of vertices to disconnect
the graph? Now we are asked for a different min-cut, composed of vertices. We must somehow
convert the vertices to edges though. Recall the problem above where we converted vertex-
capacities to edge-capacities. The same trick works here. First "un-direct" the graph as in the
previous example. Next, double the number of vertices and handle the edges the same way: an edge
x-y is directed from the out-x vertex to the in-y vertex. Then convert each vertex to an edge by
adding a 1-capacity arc from the in-vertex to the out-vertex. Now for each pair of vertices we must solve the
sub-problem of minimally separating them. So, just like before take each pair of vertices and
treat the out-vertex of one of them as the source and the in-vertex of the other one as the sink
(this is because the only arc leaving the in-vertex is the one that goes to the out-vertex) and take
the lowest value of the maximum flow. This time we can't improve on the quadratic number of
steps needed, because the first vertex may itself be part of an optimal solution, and by always
considering it as the source we would miss such a case.

Maximum Bipartite Matching


This is one of the most important applications of maximum flow, and a lot of problems can be
reduced to it. A matching in a graph is a set of edges such that no vertex is touched by more than
one edge. Obviously, a matching with a maximum cardinality is a maximum matching. For a
general graph, this is a hard problem to deal with.

Let's direct our attention towards the case where the graph is bipartite - its vertices can be split
into two sets such that there is no edge connecting vertices from the same set. In this case, it may
sound like this: "Each of your employees can handle a given set of jobs. Assign a job to as many
of them as you can."

A bipartite graph can be built in this case: the first set consists of your employees while the
second one contains the jobs to be done. There is an edge from an employee to each of the jobs
he could be assigned. An example is given below:

So, Joe can do jobs B, C and D while Mark wouldn't mind being assigned jobs A, D or E. This is
a happy case in which each of your employees is assigned a job:

In order to solve the problem we first need to build a flow network. Just as we did in the
multiple-source multiple-sink problem we will add two "dummy" vertices: a super-source and a
super-sink, and we will draw an edge from the super-source to each of the vertices in set A
(employees in the example above) and from each vertex in set B to the super-sink. In the end,
each unit of flow will be equivalent to a match between an employee and a job, so each edge will
be assigned a capacity of 1. If we would have assigned a capacity larger than 1 to an edge from
the super-source, we could have assigned more than one job to an employee. Likewise, if we
would have assigned a capacity larger than 1 to an edge going to the super-sink, we could have
assigned the same job to more than one employee. The maximum flow in this network will give
us the cardinality of the maximum matching. It is easy to find out whether a vertex in set B is
matched with a vertex x in set A as well. We look at each edge connecting x to a vertex in set B,
and if the flow is positive along one of them, there exists a match. As for the running time, the
number of augmenting paths is limited by min(|A|, |B|), where |X| denotes the cardinality of
set X, making the running time O(NM), where N is the number of vertices and M the number of
edges in the graph.

A note on implementation is in order. We could implement the maximum bipartite
matching just like in the pseudocode given earlier. Usually though, we might want to consider
the particularities of the problem before getting to the implementation part, as they can save time
or space. In this case, we could drop the 2-dimensional array that stored the residual network and
replace it with two one-dimensional arrays: one of them stores the match in set B (or a sentinel
value if it doesn't exist) for each element of set A, while the other is the other way around. Also,
notice that each augmenting path has capacity 1, as it contributes with just a unit of flow. Each
element of set A can be the first (well, the second, after the super-source) in an augmenting path
at most once, so we can just iterate through each of them and try to find a match in set B. If an
augmenting path exists, we follow it. This might lead to de-matching other elements along the
way, but because we are following an augmenting path, no element will eventually remain
unmatched in the process.
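The two one-dimensional-array idea described above can be sketched as a standalone matcher. This is an illustrative rewrite, not code from the article; the names (BipartiteMatcher, matchA, matchB, adj) are made up:

```cpp
#include <vector>
using std::vector;

// adj[a] lists the vertices of B adjacent to vertex a of A.
// matchA[a] / matchB[b] hold the current match, or -1 (the sentinel value).
struct BipartiteMatcher {
    vector<vector<int>> adj;
    vector<int> matchA, matchB;
    vector<bool> visited;

    BipartiteMatcher(int nA, int nB)
        : adj(nA), matchA(nA, -1), matchB(nB, -1), visited(nB) {}

    void addEdge(int a, int b) { adj[a].push_back(b); }

    // DFS for an augmenting path starting at vertex a of set A.
    bool tryMatch(int a) {
        for (int b : adj[a]) {
            if (visited[b]) continue;
            visited[b] = true;
            // b is free, or its current match can be re-matched elsewhere
            if (matchB[b] == -1 || tryMatch(matchB[b])) {
                matchA[a] = b;
                matchB[b] = a;
                return true;
            }
        }
        return false;
    }

    int maxMatching() {
        int result = 0;
        for (int a = 0; a < (int)adj.size(); ++a) {
            visited.assign(visited.size(), false);
            if (tryMatch(a)) ++result;
        }
        return result;
    }
};
```

Each element of A starts at most one augmenting-path search, so the whole matching runs in O(|A| * E) time, consistent with the O(NM) bound mentioned above.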

Now let's solve some TopCoder problems!

RookAttack
Problem Statement

This problem asks us to place a maximum number of rooks on a rows x cols chessboard with
some squares cut out. The idea behind this might be a little hard to spot, but once this is done, we
get into a standard maximum bipartite-matching problem.

Notice that at most one rook can be placed on each row or column. In other words, each row
corresponds at most to one column where a rook can be placed. This suggests a bipartite
matching where set A is composed of elements corresponding to every row of the board, while
set B consists of the columns. For each row add edges to every column if the corresponding
square is not cut out of the board. Now we can just run maximum bipartite-matching in this
network and compute the result. Since there are at most rows * cols edges and at most
min(rows, cols) augmenting paths, each found in O(rows * cols) time, the time complexity of the
algorithm is O(rows * cols * min(rows, cols)).

In the C++ code below BFS is used for finding an augmenting-path:


class RookAttack
{ // a list of the non-empty squares for each row
vector<int> lst[300];
// in these arrays we keep the matches found for every row and column
int row_match[300], col_match[300];
// we search for an augmenting path starting with row source
bool find_match(int source) {
// from[x] = the row-vertex that precedes x in the path
int from[300], where, match;
memset(from, -1, sizeof(from));
from[source] = source;
deque<int> q;
q.push_back(source);

bool found_path = false;


while (!found_path && !q.empty()) {
// where = current row-vertex we are in
where = q.front(); q.pop_front();
// we take every uncut square in the current row
for (int i = 0; i < lst[where].size(); ++ i) {
match = lst[where][i];
// next = the row matched with column match
int next = col_match[match];
if (where != next) {
// no row is matched with column match, thus
// we found an augmenting path
if (next == -1) {
found_path = true;
break;
}
// a check whether we already visited
// the row-vertex next
if (from[next] == -1) {
q.push_back(next);
from[next] = where;
}
}
}
}
if (!found_path)
return false;
while (from[where] != where) {
// we de-match where from its current match (aux)
// and match it with match
int aux = row_match[where];
row_match[where] = match;
col_match[match] = where;
where = from[where];
match = aux;
}
// at this point where = source
row_match[where] = match;
col_match[match] = where;
return true;
}

public:
int howMany(int rows, int cols, vector<string> cutouts)
{ // build lst from cutouts; column j should appear in
  // row i's list if square (i, j) is present on the board

int ret = 0;
memset(row_match, -1, sizeof(row_match));
memset(col_match, -1, sizeof(col_match));
// we try to find a match for each row
for (int i = 0; i < rows; ++ i)
ret += find_match(i);
return ret;
}
};

Let's take a look at the DFS version, too. We can implement the find_match function like this:
for each non-empty square in the current row, try to match the row with its corresponding column
and call find_match recursively to attempt to find a new match for that column's current match (if
the current match doesn't exist, an augmenting path is found). If one is found, we can
perform the desired match. Note that to make this run in time we must not visit the same column
(or row) twice. Notice that the C++ code below is extremely short:
// note: visited[] must be cleared before each top-level call
bool find_match(int where) {
// the previous column was not matched
if (where == -1)
return true;
for (int i = 0; i < lst[where].size(); ++ i) {
int match = lst[where][i];
if (visited[match] == false) {
visited[match] = true;
if (find_match(col_match[match])) {
col_match[match] = where;
return true;
}
}
}
return false;
}
This runs in time because the number of augmenting paths is the same for both versions. The
only difference is that BFS finds the shortest augmenting path while DFS may find a longer one. As
implementation speed is an important factor in TopCoder matches, in this case it is a good deal
to use the slower, but easier to code, DFS version.

The following version of the problem is left as an exercise for the reader: to try and place as
many rooks as possible on the board in such a way that the number of rooks on each row is equal
to the number of rooks on each column (it is allowed for two rooks to attack each other).

Graduation
Problem Statement

In this problem we are given a set of requirements, each stating that a number of classes should
be taken from a given set of classes. Each class may be taken once and fulfills a single
requirement. Actually, the last condition is what makes the problem harder, and excludes the
idea of a greedy algorithm. We are also given a set of classes already taken. If it weren't for this,
to ensure the minimality of the return, the size of the returned string would have been (if a
solution existed) the sum of the number of classes for each requirement. Now as many classes as
possible must be used from this set.

At first glance, this would have been a typical bipartite-matching problem if every requirement
had been fulfilled by taking just a single class. Set A would have consisted of the classes
available (all characters with ASCII code in the range 33-126, except for the numeric characters
'0'-'9'), while the set of requirements would have played the role of set B. This can be taken care
of easily. Each requirement will contribute to set B with a number of elements equal to the
number of classes that must be taken in order to fulfill it - in other words, split each requirement
into several requirements. At this point, a bipartite-matching algorithm can be used, but care
should be allotted to the order in which we iterate through the set of classes and match a class
with a requirement.

It is important to understand that any order to iterate through set A can be considered when
solving the standard bipartite-matching problem. For example, it doesn't matter what element
from set A we choose to be the first one to be matched. Consider the solution found by the
algorithm containing this element x from A, matched with an element y from B. Also, we should
consider any optimal solution. Clearly, in the optimal solution, y must be matched with some element z from
A, otherwise we could add the pair x-y to the matching, contradicting its optimality. Then, we can
just exchange z with x to come up with a solution of the same cardinality,
which completes the proof.

That being said, to gain as much as possible from the classes already taken we first must match
each of these with a requirement. If, after completing this step, all requirements are fulfilled, we
just need to return the empty string, as there is no need for taking more classes. Now we have to
deal with the requirement that the return must be the first in lexicographic order. It should be
obvious now that the other classes must be considered in increasing order. If a match is found for
a class, that class is added to the return value. In the end, if not every requirement is fulfilled, we
don't have a solution. The implementation is left as an exercise for the reader.

As a final note, it is possible to speed things up a bit. To achieve this, we drop the idea of
splitting each requirement. Instead, we modify the capacities of the edges connecting the
requirements with the super-sink: they will now be equal to the number of classes to be taken for
each requirement. Then we can just go on with the same approach as above.

Parking
Problem Statement

In this problem we have to match each of the cars with a parking spot. Additionally the time it
takes for all cars to find a parking spot must be minimized. Once again we build a bipartite
graph: set A is the set that consists of the cars and set B contains the parking spots. Each edge
connecting elements from different sets has as the cost (and not the capacity!) the time required
for the car to get to the parking spot. If the spot is unreachable, we can assign it an infinite cost
(or remove it). These costs are determined by running breadth-first search.

For solving it, assume that the expected result is less than or equal to a constant D. Then, there
exists a matching in which each edge connecting a car and a parking spot has the cost less than
or equal to D. Thus, removing edges with cost greater than D will have no effect on the solution.
This suggests a binary search on D, removing all edges with cost greater than D, and then
performing a maximum bipartite-matching algorithm. If a matching exists in which every car can
drive to a parking spot, we can decrease D; otherwise, we must increase it.
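A minimal sketch of this binary-search variant, assuming the pairwise travel times have already been computed into a cost matrix (with -1 marking unreachable spots). The names (ParkingBS, feasible, tryCar) are illustrative, not from the problem statement:

```cpp
#include <vector>
using std::vector;

// cost[i][j] = time for car i to reach spot j, or -1 if unreachable.
struct ParkingBS {
    vector<vector<int>> cost;
    vector<int> spotMatch;      // spot -> car, or -1
    vector<bool> seen;
    int limit;                  // current bound D on edge cost

    // Kuhn-style augmenting-path search restricted to edges of cost <= limit.
    bool tryCar(int car) {
        for (int s = 0; s < (int)cost[car].size(); ++s) {
            if (cost[car][s] < 0 || cost[car][s] > limit || seen[s]) continue;
            seen[s] = true;
            if (spotMatch[s] == -1 || tryCar(spotMatch[s])) {
                spotMatch[s] = car;
                return true;
            }
        }
        return false;
    }

    // Can every car park using only edges of cost <= D?
    bool feasible(int D) {
        limit = D;
        spotMatch.assign(cost[0].size(), -1);
        for (int c = 0; c < (int)cost.size(); ++c) {
            seen.assign(cost[0].size(), false);
            if (!tryCar(c)) return false;
        }
        return true;
    }

    // Binary search for the smallest feasible D, or -1 if none exists.
    int minTime(int maxCost) {
        if (!feasible(maxCost)) return -1;
        int lo = 0, hi = maxCost;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (feasible(mid)) hi = mid; else lo = mid + 1;
        }
        return lo;
    }
};
```

This performs O(log maxCost) full matchings; the priority-first search described next avoids the logarithmic factor by growing D only as far as needed.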

However, there is a faster and more elegant solution using a priority-first search. Instead of
keeping D fixed as above, we could try to successively increase D whenever we find that it is too
low. We will start with D = 0. Then we iterate through each of the cars and try to find an
augmenting path in which no edge has a cost larger than D. If none exists, we increase D until
one path exists. Obviously, we will increase it with the smallest possible amount. In order to
achieve this, we will search for the augmenting path with the smallest cost - the cost of the path
is the maximum cost of an edge on that path. This can be done with a priority-first search similar
to the PFS augmenting-path algorithm presented in the first section of the article. C++ code
follows:
struct node {

int where, cost, from;


node(int _where, int _cost, int _from): where(_where),
cost(_cost), from(_from) {};
};
bool operator < (node a, node b) {
return a.cost > b.cost;
}

int minTime(vector<string> park)


{
// build a cost matrix cost[i][j] = cost of getting from car i to
// parking spot j, by doing a BFS
// vertices 0, 1, ..., N - 1 will represent the cars, and
// vertices N, N + 1, ..., N + M - 1 will represent
// the parking spots; N + M will be the super-sink
int D = 0, sink = N + M;
int car_match[105], park_match[105];
memset(car_match, -1, sizeof(car_match));
memset(park_match, -1, sizeof(park_match));

for (int source = 0; source < N; ++ source) {


bool visited[210];
memset(visited, false, sizeof(visited));
int from[210];
memset(from, -1, sizeof(from));
priority_queue<node> pq;
pq.push(node(source, 0, -1));
while (!pq.empty()) {
int cst = pq.top().cost, where = pq.top().where,
_from = pq.top().from;
pq.pop();
if (visited[where]) continue;
visited[where] = true;
from[where] = _from;
// if where is a car try all M parking spots
if (where < N) {
for (int i = 0; i < M; ++ i) {
// if the edge doesn't exist or this car
// is already matched with this parking spot
if (cost[where][i] == infinity ||
car_match[where] == i) continue;
int ncst = max(cst, cost[where][i]);
// the i-th parking spot is N + i
pq.push(node(N + i, ncst, where));
}
}
else {
// if this parking spot is unmatched we found
// the augmenting path with minimum cost
if (park_match[where - N] == -1) {
from[sink] = where;
// if D needs to be increased, increase it
D = max(D, cst);
break;
}
// otherwise we follow the backward edge
int next = park_match[where - N];

int ncst = max(cst, cost[next][where]);


pq.push(node(next, ncst, where));
}
}

int where = from[sink];


// if no augmenting path is found we have no solution
if (where == -1)
return -1;
// follow the augmenting path
while (from[where] > -1) {
int prev = from[where];
// if where is a parking spot the edge (prev, where)
// is a forward edge and the match must be updated
if (where >= N) {
car_match[prev] = where;
park_match[where - N] = prev;
}
where = prev;
}
}

return D;
}

Minimum Cost Flow, Part 1: Key Concepts

By Zealint
topcoder member

This article covers the so-called "min-cost flow" problem, which has many applications for both
TopCoder competitors and professional programmers. The article is targeted to readers who are
not familiar with the subject, with a focus more on providing a general understanding of the
ideas involved rather than heavy theory or technical details; for a more in-depth look at this
topic, check out the references at the end of this article, in particular [1]. In addition, readers of
this article should be familiar with the basic concepts of graph theory -- including shortest paths
[4], paths with negative cost arcs, negative cycles [1] -- and maximum flow theory's basic
algorithms [3].

The article is divided into three parts. In Part 1, we'll look at the problem itself. Part 2
will describe three basic algorithms, and some applications of the problem will be covered
in Part 3.

Statement of the Problem


What is the minimum cost flow problem? Let's begin with some important terminology.

Let G = (V,E) be a directed network defined by a set V of vertexes (nodes) and a set E of edges
(arcs). For each edge (i,j) in E we associate a capacity uij that denotes the maximum amount that
can flow on the edge. Each edge (i,j) also has an associated cost cij that denotes the cost per
unit of flow on that edge.

We associate with each vertex i in V a number bi. This value represents the supply/demand of the
vertex. If bi > 0, node i is a supply node; if bi < 0, node i is a demand node (its demand is equal
to -bi). We call vertex i a transshipment node if bi is zero.

For simplification, let's call G a transportation network and write G = (V, E, b, u, c) in case we
want to show all the network parameters explicitly.

Figure 1. An example of the transportation network. In this network we have 2 supply vertexes (with
supply values 5 and 2), 3 demand vertexes (with demand values 1, 4 and 2), and 1 transshipment
node. Each edge is labeled with two numbers, capacity and cost, separated by a comma.

Representing the flow on arc (i,j) in E by xij, we can obtain the optimization model for the
minimum cost flow problem:

    minimize z(x) = sum over (i,j) in E of cij * xij

subject to

    ( sum over j of xij ) - ( sum over j of xji ) = bi, for each i in V
    0 <= xij <= uij, for each (i,j) in E

The first constraint states that the total outflow of a node minus the total inflow of the node must
be equal to the mass balance (supply/demand value) of that node. These are known as the mass
balance constraints. Next, the flow bound constraints model physical capacities or restrictions imposed
on the flow's range. As you can see, this optimization model describes a typical relationship
between warehouses and shops, for example, in a case where we have only one kind of product.
We need to satisfy the demand of each shop by transferring goods from the subset of
warehouses, while minimizing the expenses on transportation.

This problem could be solved using simplex-method, but in this article we concentrate on some
other ideas related to network flow theory. Before we move on to the three basic algorithms used
to solve the minimum cost flow problem, let's review the necessary theoretical base.

Finding a solution
When does the minimum cost flow problem have a feasible (though not necessarily optimal)
solution? How do we determine whether it is possible to translate the goods or not?

If the sum of all bi is nonzero, then the problem has no solution, because either the supply or the
demand dominates in the network and the mass balance constraints cannot all be satisfied.

We can easily avoid this situation, however, if we add a special node r with the supply/demand
value br = -(sum of all bi). Now we have two options: if the sum of all bi is positive (supply
dominates), then for each node i with bi > 0 we add an arc (i,r) with infinite capacity and zero
cost; otherwise (demand dominates), for each node i with bi < 0, we add an arc (r,i) with the
same properties. Now we have a new network in which the bi sum to zero -- and it is easy to
prove that this network has the same optimal value of the objective function.

Consider the vertex r as a rubbish or scrap dump. If the shops' demand is less than what the
warehouses supply, then we have to eject the useless goods as rubbish. Otherwise, we take the
missing goods from the dump. This would be considered shady in real life, of course, but for our
purposes it is very convenient. Keep in mind that, in this case, we cannot say what exactly the
"solution" of the corrected (with scrap) problem is. And it is up to the reader how to classify the
emergency uses of the "dump." For example, we can suggest that goods remain in the
warehouses or some of the shops' demands remain unsatisfied.

Even if the bi sum to zero, we are not sure that the edge capacities allow us to transfer enough flow
from supply vertexes to demand ones. To determine whether the network has a feasible flow, we want
to find any way of transfer that satisfies all the problem's constraints. Of course, this feasible
solution is not necessarily optimal, but if it is absent we cannot solve the problem.

Let us introduce a source node s and a sink node t. For each node i with bi > 0, we add a
source arc (s,i) to G with capacity bi and cost 0. For each node i with bi < 0, we add a sink
arc (i,t) to G with capacity -bi and cost 0.

Figure 2. Maximum flow in the transformed network. For simplicity we are ignoring the costs.

The new network is called a transformed network. Next, we solve a maximum flow problem
from s to t (ignoring costs, see fig.2). If the maximum flow saturates all the source and sink arcs,
then the problem has a feasible solution; otherwise, it is infeasible. As for why this approach
works, we'll leave its proof to the reader.
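The feasibility check just described can be sketched as follows, using a plain Edmonds-Karp maximum flow over an adjacency-matrix residual network; this is an illustrative sketch, and the names (Feasibility, addEdge, feasible) are made up:

```cpp
#include <algorithm>
#include <climits>
#include <queue>
#include <vector>
using std::vector;

// Feasibility check for supplies/demands b[] on a directed network:
// add a source s and sink t, connect s to supply nodes and demand nodes
// to t, run a maximum flow (Edmonds-Karp) and test whether it saturates
// every source arc.
struct Feasibility {
    int n;                         // original nodes plus s and t
    vector<vector<int>> cap;       // residual capacity matrix
    Feasibility(int nodes) : n(nodes + 2), cap(n, vector<int>(n, 0)) {}

    void addEdge(int u, int v, int c) { cap[u][v] += c; }

    int maxflow(int s, int t) {
        int total = 0;
        while (true) {
            vector<int> par(n, -1);            // BFS parents
            par[s] = s;
            std::queue<int> q;
            q.push(s);
            while (!q.empty() && par[t] == -1) {
                int u = q.front(); q.pop();
                for (int v = 0; v < n; ++v)
                    if (par[v] == -1 && cap[u][v] > 0) { par[v] = u; q.push(v); }
            }
            if (par[t] == -1) break;           // no augmenting path left
            int aug = INT_MAX;
            for (int v = t; v != s; v = par[v]) aug = std::min(aug, cap[par[v]][v]);
            for (int v = t; v != s; v = par[v]) { cap[par[v]][v] -= aug; cap[v][par[v]] += aug; }
            total += aug;
        }
        return total;
    }

    // b[i] > 0: supply, b[i] < 0: demand (indices 0..n-3 are original nodes).
    // Feasible iff the maximum flow equals the total supply.
    bool feasible(const vector<int>& b) {
        int s = n - 2, t = n - 1, supply = 0;
        for (int i = 0; i < (int)b.size(); ++i) {
            if (b[i] > 0) { addEdge(s, i, b[i]); supply += b[i]; }
            else if (b[i] < 0) addEdge(i, t, -b[i]);
        }
        return maxflow(s, t) == supply;
    }
};
```

Costs are ignored here on purpose: only the existence of a feasible flow is being tested, exactly as in Figure 2.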

Having found a maximum flow, we can now remove source, sink, and all adjacent arcs and
obtain a feasible flow in G. How do we detect whether the flow is optimal or not? Does this flow
minimize costs of the objective function z? We usually verify "optimality conditions" for the
answer to these questions, but let us put them on hold for a moment and discuss some
assumptions.

Now, suppose that we have a network that has a feasible solution. Does it have an optimal
solution? If our network contains the negative cost cycle of infinite capacity then the objective
function will be unbounded. However, in some tasks, we are able to assign finite capacity to each
uncapacitated edge escaping such a situation.

So, from the theoretical point of view, for any minimum cost flow problem we have to check
some conditions: The supply/demand balance, the existence of a feasible solution, and the last
situation with uncapacitated negative cycles. These are necessary conditions for resolving the
problem. But from the practical point of view, we can check the conditions while the solution is
being found.

Assumptions
In understanding the basics of network flow theory it helps to make some assumptions, although
sometimes they can lead to a loss of generality. Of course, we could solve the problems without
these assumptions, but the solutions would rapidly become too complex. Fortunately, these
assumptions are not as restrictive as they might seem.

Assumption 1. All data (uij, cij, bi) are integral.


As we have to deal with a computer, which works with rational numbers, this assumption is not

restrictive in practice. We can convert rational numbers to integers by multiplying by a suitably
large number.

Assumption 2. The network is directed.


If the network were undirected, we would transform it into a directed one. Unfortunately, this
transformation requires the edge costs to be nonnegative. Let's validate this assumption.

To transform an undirected case to a directed one, we replace each undirected edge connecting
vertexes i and j by two directed edges (i,j) and (j,i), both with the capacity and cost of the
replaced arc. To establish the correctness of this transformation, first note that for the undirected
arc we have the constraint xij + xji <= uij and the term cij * (xij + xji) in the objective
function. Given that cij >= 0, we see that in some optimal solution either xij or xji will be zero. We
call such a solution non-overlapping. Now it is easy to make sure (and we leave it to the reader)
that every non-overlapping flow in the original network has an associated flow in the
transformed network with the same cost, and vice versa.

Assumption 3. All costs associated with edges are nonnegative.


This assumption imposes a loss of generality. We will show below that if a network with
negative costs had no negative cycle it would be possible to transform it into one with
nonnegative costs. However, one of the algorithms (namely cycle-canceling algorithm) which we
are going to discuss is able to work without this assumption.

For each vertex i in V, let's denote by pi(i) a number associated with the vertex and call it the
potential of node i. Next define the reduced cost of an edge (i,j) as

    cij^pi = cij - pi(i) + pi(j)

How does our objective value change? Let's denote the reduced value by z^pi(x). Evidently, if
pi = 0, then z^pi(x) = z(x).

For other values of pi we obtain the following result:

    z^pi(x) = z(x) - sum over i in V of ( pi(i) * bi )



For a fixed pi, the difference z(x) - z^pi(x) is constant. Therefore, a flow that minimizes z^pi(x)
also minimizes z(x) and vice versa. We have proved:

Theorem 1. For any node potential pi, the minimum cost flow problems with edge costs cij or
cij^pi have the same optimal solutions. Moreover, z^pi(x) = z(x) - sum over i in V of ( pi(i) * bi ).

The following result contains very useful properties of reduced costs.

Theorem 2. Let G be a transportation network. Suppose P is a directed path from vertex k to
vertex l. Then for any node potential pi:

    sum over (i,j) in P of cij^pi = ( sum over (i,j) in P of cij ) - pi(k) + pi(l)

Suppose W is a directed cycle. Then for any node potential pi:

    sum over (i,j) in W of cij^pi = sum over (i,j) in W of cij

This theorem implies the following reasoning. Let's introduce a vertex s and, for each node i in V,
add an arc (s,i) to G with some positive capacity and zero cost. Suppose that for each i in V the
number d(i) denotes the length of the shortest path from s to i with respect to the cost function c.
(Reminder: there is no negative length cycle.) If so, one can justify (or read it in [2]) that for each
edge (i,j) in E the shortest path optimality condition is satisfied:

    d(j) <= d(i) + cij

Since d(j) <= d(i) + cij, choosing the potential pi(i) = -d(i) yields
cij^pi = cij + d(i) - d(j) >= 0. Moreover, applying theorem 2,


we can note that if G contains a negative cycle, it remains negative for any node potential in the
reduced network. So, if the transportation network has no negative cycle, we will be able to
reduce the costs and make them nonnegative by finding the shortest paths from the introduced
vertex s; otherwise, our assumption doesn't work. If the reader asks how to find the shortest path
in a graph with negative costs, I'll refer you back to the basics of graph theory: one can use the
Bellman-Ford (label-correcting) algorithm to achieve this goal [1, 2].

Remember this reduced cost technique, since it appears in many applications and other
algorithms (for example, Johnson's algorithm for all pair shortest path in sparse networks uses it
[2]).
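A minimal sketch of the technique, assuming an implicit extra vertex s connected to every node by a zero-cost arc (so every distance starts at 0); the names Edge and reducedCosts are illustrative:

```cpp
#include <vector>
using std::vector;

struct Edge { int from, to, cost; };

// Reduced-cost technique: Bellman-Ford from the implicit vertex s yields
// shortest distances d(i). With the potential pi(i) = -d(i), the reduced
// cost cij + d(i) - d(j) of every edge becomes nonnegative (assuming the
// graph has no negative cycle).
vector<int> reducedCosts(int n, const vector<Edge>& edges) {
    vector<int> d(n, 0);                 // d(i) = shortest distance from s
    for (int pass = 0; pass + 1 < n; ++pass)   // n-1 passes suffice
        for (const Edge& e : edges)
            if (d[e.from] + e.cost < d[e.to])
                d[e.to] = d[e.from] + e.cost;
    vector<int> reduced;
    for (const Edge& e : edges)
        reduced.push_back(e.cost + d[e.from] - d[e.to]);
    return reduced;
}
```

By theorem 2, every cycle keeps its total cost under this transformation, so optimizing over the reduced costs optimizes the original problem.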

Assumption 4. The supply/demand values at the vertexes satisfy the condition that the bi sum to
zero, and the minimum cost flow problem has a feasible solution.

This assumption is a consequence of the "Finding a Solution" section of this article. If the
network doesn't satisfy the first part of this assumption, we can either say that the problem has no
solution or make corresponding transformation according to the steps outlined in that section. If
the second part of the assumption isn't met then the solution doesn't exist.

By making these assumptions we do transform our original transportation network. However,
many problems are often given in a form that already satisfies all the assumptions.

Minimum Cost Flow, Part 2: Algorithms


By Zealint
topcoder member

In Part 1, we looked at the basics of minimum cost flow. In this section, we'll look at three
algorithms that can be applied to minimum cost flow problems.

Working with Residual Networks

Let's consider the concept of residual networks from the perspective of min-cost flow theory.
You should be familiar with this concept thanks to maximum flow theory, so we'll just extend it
to minimum cost flow theory.

We start with the following intuitive idea. Let G be a network and x be a feasible solution of the
minimum cost flow problem. Suppose that an edge (i,j) in E carries xij units of flow. We define
the residual capacity of the edge (i,j) as rij = uij - xij. This means that we can send an
additional rij units of flow from vertex i to vertex j. We can also cancel the existing flow xij on
the arc if we send up to xij units of flow from j to i over the arc (i,j). Now note that sending a unit
of flow from i to j along the arc (i,j) increases the objective function by cij, while sending a unit
of flow from j to i on the same arc decreases the flow cost by cij.

Figure 1. The transportation network from Part 1. (a) A feasible solution. (b) The residual
network with respect to the found feasible solution.

Based on these ideas we define the residual network with respect to the given flow x as follows.
Suppose we have a transportation network G = (V,E). A feasible solution x engenders a new
(residual) transportation network, which we define as Gx = (V,Ex), where Ex is the
set of residual edges corresponding to the feasible solution x.

What is Ex? We replace each arc (i,j) in E by two arcs (i,j), (j,i): the arc (i,j) has cost cij and
(residual) capacity rij = uij - xij, and the arc (j,i) has cost -cij and (residual) capacity rji=xij. Then
we construct the set Ex from the new edges with a positive residual capacity. Look at Figure 1 to
make sure that you understand the construction of the residual network.

You can notice immediately that such a definition of the residual network has some technical
difficulties. Let's sum them up:

If G contains both the edges (i,j) and (j,i) (remember assumption 2) the residual network
may contain four edges between i and j (two parallel arcs from i to j and two contrary).
To avoid this situation we have two options. First, transform the original network into one
that contains either edge (i,j) or edge (j,i), but not both, by splitting the
vertexes i and j. Second, represent the network by an adjacency list, which handles
parallel arcs. We could even use two adjacency matrixes if it were more convenient.
Let's imagine now that we have a lot of parallel edges from i to j with different costs.
Unfortunately, we can't merge them by summarizing their capacities, as we could do
while we were finding the maximum flow. So, we need to keep each of the parallel edges
in our data structure separate.

The proof of the fact that there is a one-to-one correspondence between the original and residual
networks is outside the scope of this article, but you could prove all the necessary theorems as it
was done within the maximum flow theory, or by reading [1].

Cycle-canceling Algorithm

This section describes the negative cycle optimality conditions and, as a consequence, cycle-
canceling algorithm. We are starting with this important theorem:

Theorem 1 (Solution Existence). Let G be a transportation network. Suppose that G contains no


uncapacitated negative cost cycle and there exists a feasible solution of the minimum cost flow
problem. Then the optimal solution exists.

Proof. One can see that the minimum cost flow problem is a special case of the linear
programming problem. The latter is well known to have an optimal solution if it has a feasible
solution and its objective function is bounded. Evidently, if G doesn't contain an uncapacitated
negative cycle then the objective function of the minimum cost flow problem is bounded from
below -- therefore, the assertion of the theorem follows forthwith.

We will use the following theorem without proof, because we don't want our article to be
overloaded with difficult theory, but you can read the proof in [1].

Theorem 2 (Negative Cycle Optimality Conditions). Let x* be a feasible solution of a minimum


cost flow problem. Then x* is an optimal solution if and only if the residual
network Gx* contains no negative cost (directed) cycle.

Figure 2. Cycle-Canceling Algorithm, example of the network from Figure 1. (a) We have a
feasible solution of cost 54. (b) A negative cycle 1-2-3-1 is detected in the residual network. Its
cost is -1 and capacity is 1. (c) The residual network after augmentation along the cycle. (d)
Another negative cost cycle 3-4-5-3 is detected. It has cost -2 and capacity 3. (e) The residual
network after augmentation. It doesn't contain negative cycles. (f) Optimal flow cost value is
equal to 47.

This theorem gives the cycle-canceling algorithm for solving the minimum cost flow problem.
First, we use any maximum flow algorithm [3] to establish a feasible flow in the network
(remember assumption 4). Then the algorithm attempts to improve the objective function by
finding negative cost cycles in the residual network and augmenting the flow on these cycles. Let
us specify a program in pseudo code like it is done in [1].

Cycle-Canceling

1 Establish a feasible flow x in the network


2 while ( Gx contains a negative cycle ) do
3 identify a negative cycle W
4 delta <- min { rij : (i,j) in W }
5 augment delta units of flow along the cycle W
6 update Gx

How many iterations does the algorithm perform? First, note that due to assumption 1 all the data
is integral. After line 1 of the program we have an integral feasible solution x. It implies the
integrality of Gx. In each iteration of the cycle in line 2 the algorithm finds the minimum residual
capacity delta of the found negative cycle. In the first iteration delta will be an integer. Therefore, the
modified residual capacities will be integers, too. And in all subsequent iterations the residual
capacities will be integers again. This reasoning implies:

Theorem 3 (Integrality Property). If all edge capacities and supplies/demands on vertexes are
integers, then the minimum cost flow problem always has an integer solution.

The cycle-canceling algorithm works in cases when the minimum cost flow problem has an
optimal solution and all the data is integral and we don't need any other assumptions.

Now let us denote the maximum capacity of an arc by U and the maximum absolute value of an
arc cost by C. Suppose that m denotes the number of edges in G and n denotes the number of
vertexes. For a minimum cost flow problem, the absolute value of the objective function is
bounded by mCU. Any cycle canceling decreases the objective function by a strictly positive
amount. Since we are assuming that all data is integral, the algorithm terminates
within O(mCU) iterations. One can use an O(nm) algorithm for identifying a negative cycle (for
instance, the Bellman-Ford or label-correcting algorithm [1]), and obtain an overall complexity
of O(nm^2 * CU) for the algorithm.
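One round of the loop above can be sketched as follows, assuming a feasible flow has already been established and the residual network is given as a list of paired forward/backward arcs (an illustrative representation, not the article's):

```cpp
#include <algorithm>
#include <climits>
#include <vector>
using std::vector;

struct Arc { int from, to, cost, residual; };

// One round of cycle canceling on a residual network. 'arcs' must hold
// both directions of every edge in consecutive pairs, so that arc k^1 is
// the reverse of arc k. Returns the cost saved by canceling one negative
// cycle, or 0 if none exists.
int cancelOneCycle(int n, vector<Arc>& arcs) {
    // Bellman-Ford with an implicit source: all distances start at 0, so
    // a relaxation still happening in the n-th pass proves a negative cycle.
    vector<long long> dist(n, 0);
    vector<int> pre(n, -1);                     // predecessor arc index
    int last = -1;
    for (int pass = 0; pass < n; ++pass) {
        last = -1;
        for (int k = 0; k < (int)arcs.size(); ++k) {
            const Arc& a = arcs[k];
            if (a.residual > 0 && dist[a.from] + a.cost < dist[a.to]) {
                dist[a.to] = dist[a.from] + a.cost;
                pre[a.to] = k;
                last = a.to;
            }
        }
        if (last == -1) return 0;               // converged: no negative cycle
    }
    // 'last' leads into a negative cycle; walk back n steps to land on it.
    int v = last;
    for (int i = 0; i < n; ++i) v = arcs[pre[v]].from;
    // Collect the cycle and its bottleneck residual capacity delta.
    vector<int> cyc;
    int u = v, delta = INT_MAX;
    do {
        int k = pre[u];
        cyc.push_back(k);
        delta = std::min(delta, arcs[k].residual);
        u = arcs[k].from;
    } while (u != v);
    // Augment delta units along the cycle and report the improvement.
    int saved = 0;
    for (int k : cyc) {
        arcs[k].residual -= delta;
        arcs[k ^ 1].residual += delta;
        saved -= arcs[k].cost * delta;          // cycle cost is negative
    }
    return saved;
}
```

Repeating this until it returns 0 is exactly the loop in the pseudocode; each round runs the O(nm) negative cycle detection once.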

Successive Shortest Path Algorithm

The previous algorithm solves the maximum flow problem as a subtask. The successive shortest
path algorithm searches for the maximum flow and optimizes the objective function
simultaneously. It solves the so-called max-flow-min-cost problem by using the following idea.

Suppose we have a transportation network G and we have to find an optimal flow across it. As it
is described in the "Finding a Solution" section we transform the network by adding two
vertexes s and t (source and sink) and some edges as follows. For each node i in V with bi > 0,
we add a source arc (s,i) with capacity bi and cost 0. For each node i in V with bi < 0, we add a
sink arc (i,t) with capacity -bi and cost 0.

Then, instead of searching for the maximum flow as usual, we send flow from s to t along the
shortest path (with respect to arc costs). Next we update the residual network, find another
shortest path and augment the flow again, etc. The algorithm terminates when the residual
network contains no path from s to t (the flow is maximal). Since the flow is maximal, it
corresponds to a feasible solution of the original minimum cost flow problem. Moreover, this
solution will be optimal (and we are going to explain why).

The successive shortest path algorithm can be used when G contains no negative cost cycles.
Otherwise, we cannot say exactly what "the shortest path" means. Now let us justify the
successive shortest path approach. When the current flow has zero value, the transportation
network G doesn't contain a negative cost cycle (by hypothesis). Suppose that after some
augmenting steps we have flow x and Gx still contains no negative cycles. If x is maximal then it
is optimal, according to theorem 2. Otherwise, let us denote the next successfully found shortest
path in Gx by P.

Figure 3. How could a negative cycle appear in a residual network?

Suppose that after augmenting the current flow x along path P a negative cost cycle W turned up
in the residual network. Before augmenting there were no negative cycles. This means that there
was an edge (i,j) in P (or a subpath from i to j in P) whose reversal (j,i) closed cycle W after the
augmentation. Evidently, we could choose another path from s to t, which goes from s to i, then
from i to j along edges of W, then from j to t. Moreover, the cost of this path is less than the cost
of P. We have a contradiction to the supposition that P is the shortest.

What do we have? After the last step we have a feasible solution and the residual network
contains no negative cycle. The latter is the criterion of optimality.

A simple analysis shows that the algorithm performs at most O(nB) augmentations, where B is
an upper bound on the largest supply of any node. Indeed, each augmentation strictly
decreases the residual capacity of some source arc (which equals the supply of the corresponding
node), and thanks to the integrality property it decreases by at least one unit. By using
an O(nm) algorithm for finding a shortest path (there may be negative edges), we achieve
an O(n²mB) complexity for the successive shortest path algorithm.

Successive Shortest Path


1 Transform network G by adding source and sink
2 Initial flow x is zero
3 while ( Gx contains a path from s to t ) do
4 Find any shortest path P from s to t
5 Augment current flow x along P
6 update Gx

Let us reveal the meaning of node potentials from assumption 3. As it is said within assumption
3, we are able to make all edge costs nonnegative by using, for instance, Bellman-Ford's
algorithm. Since working with residual costs doesn't change shortest paths (by theorem 2, part 1)
we can work with the transformed network and use Dijkstra's algorithm to find the successive
shortest path more efficiently. However, we need to keep the edge costs nonnegative on each
iteration -- for this purpose, we update node potentials and reduce costs right after the shortest
path has been found. The reduce cost function could be written in the following manner:

Reduce Cost ( π )
1 For each (i,j) in Ex do
2 cij ← cij + π(i) - π(j)
3 crev(i,j) ← 0

Having found the successive shortest path we need to update the node potentials. For each i in V the
new potential π(i) is equal to the length of the shortest path from s to i. After having reduced the cost
of each arc, we will see that the arcs along shortest paths from s will have zero cost, while the
arcs which lie outside of any shortest path to any vertex will have a positive cost. That is why we
assign zero cost to any reversal arc crev(i,j) in the Reduce Cost procedure in line 3. The
augmentation (along the found path) adds the reversal arc (j,i) and, due to the fact that the (reduced)
cost cij = 0, we make crev(i,j) = 0 beforehand.

Why have we denoted the cost of the reversal arc by crev(i,j) instead of cji? Because the network may
contain both arcs (i,j) and (j,i) (remember assumption 2 and the "Working with Residual Networks"
section). For other arcs (which lie outside of the augmenting path) this forcible assignment does
nothing, because their reversal arcs will not appear in the residual network. Now we propose a
pseudo-code program:

Successive Shortest Path with potentials


1 Transform network G by adding source and sink
2 Initial flow x is zero
3 Use Bellman-Ford's algorithm to establish potentials π
4 Reduce Cost ( π )
5 while ( Gx contains a path from s to t ) do
6 Find any shortest path P from s to t
7 Reduce Cost ( π )
8 Augment current flow x along P
9 update Gx

Before starting the cycle in line 5 we calculate node potentials π and make all costs
nonnegative. We use the same array of costs c when reducing. In line 6 we use Dijkstra's
algorithm to establish a shortest path with respect to the reduced costs. Then we reduce the costs and
augment the flow along the path. After the augmentation all costs will remain nonnegative and in the
next iteration Dijkstra's algorithm will work correctly.

Figure 4. The Successive shortest Path Algorithm. (a) Initial task. (b) Node potentials are
calculated after line 3 of the program. (c) Reduced costs after line 4. (d) The first augmenting
path s-1-2-3-4-t of capacity 2 is found and new node potentials are calculated. (e) The residual
network with reduced costs. (f) The second augmenting path s-1-3-4-t of capacity 1 is found. (g)
The residual network with reduced costs. (h) The third shortest augmenting path s-1-3-5-t and
new node potentials are found. (i) The residual network contains no augmenting paths. (j) The
reconstructed transportation network. Optimal flow has cost 12.

We use Bellman-Ford's algorithm only once, to remove negative costs from the edges. It
takes O(nm) time. Then, O(nB) times, we use Dijkstra's algorithm, which takes
either O(n²) (simple implementation) or O(mlogn) (heap implementation for sparse networks, [4]) time.
Summing up, we obtain an O(n³B) working time estimate for the simple implementation
and O(nmBlogn) when using a heap. One could even use Fibonacci heaps to
obtain O(nlogn+m) complexity for Dijkstra's shortest path algorithm; however, I wouldn't
recommend doing so, because this approach works badly in practice.

Primal-Dual Algorithm

The primal-dual algorithm for the minimum cost flow problem is similar to the successive
shortest path algorithm in the sense that it also uses node potentials and shortest path algorithm
to calculate them. Instead of augmenting the flow along one shortest path, however, this
algorithm increases flow along all the shortest paths at once. For this purpose in each step it uses
any maximum flow algorithm to find the maximum flow through the so-called admissible
network, which contains only those arcs in Gx with a zero reduced cost. We denote the
admissible residual network with respect to flow x by Gx0. Let's explain the idea by using a
pseudo-code program.

Primal-Dual
1 Transform network G by adding source and sink
2 Initial flow x is zero
3 Use Bellman-Ford's algorithm to establish potentials π
4 Reduce Cost ( π )
5 while ( Gx contains a path from s to t ) do
6 Calculate node potentials π using Dijkstra's algorithm
7 Reduce Cost ( π )
8 Establish a maximum flow y from s to t in the admissible network Gx0
9 x x + y
10 update Gx

For a better illustration look at Figure 5.



Figure 5. Primal-Dual algorithm. (a) Example network. (b) Node potentials are calculated. (c)
The maximum flow in the admissible network. (d) Residual network and new node potentials. (e)
The maximum flow in the admissible network. (f) Residual network with no augmenting paths.
(g) The optimal solution.

As mentioned above, the primal-dual algorithm sends flow along all shortest paths at once;
therefore, proof of correctness is similar to the successive shortest path one.

First, the primal-dual algorithm guarantees that the number of iterations doesn't exceed O(nB), just
as for the successive shortest path algorithm. Moreover, since we establish a maximum flow
in Gx0, the resulting residual network Gx contains no directed path from vertex s to vertex t consisting
entirely of arcs of zero cost. Consequently, the distance between s and t increases by at least one
unit. These observations give a bound of min{nB,nC} on the number of iterations which the
primal-dual algorithm performs. Keep in mind, though, that the algorithm incurs the additional
expense of solving a maximum flow problem at every iteration. However, in practice both the
successive shortest path and the primal-dual algorithm work fast enough within the constraint of
50 vertexes and reasonable supply/demand values and costs.

Minimum Cost Flow, Part 3: Applications


By Zealint
topcoder member

The last part of the article introduces some well known applications of the minimum cost flow
problem. Some of the applications are described according to [1].

The Assignment Problem

There are a number of agents and a number of tasks. Any agent can be assigned to perform any
task, incurring some cost that may vary depending on the agent-task assignment. We have to get
all tasks performed by assigning exactly one agent to each task in such a way that the total cost
of the assignment is minimal with respect to all such assignments.

In other words, consider we have a square matrix with n rows and n columns. Each cell of the
matrix contains a number. Let's denote by cij the number which lies at the intersection of the i-th
row and the j-th column of the matrix. The task is to choose a subset of the numbers from the matrix
in such a way that each row and each column has exactly one number chosen and the sum of the
chosen numbers is as small as possible. For example, assume we had a matrix like this:

In this case, we would choose numbers 3, 4, and 3 with sum 10. In other words, we have to find an
integral solution of the following linear programming problem:

minimize Σi Σj cij·xij

subject to

Σj xij = 1 for each i = 1, ..., n
Σi xij = 1 for each j = 1, ..., n
xij ∈ {0, 1}

If binary variable xij = 1 we will choose the number from cell (i,j) of the given matrix.
Constraints guarantee that each row and each column of the matrix will have only one number
chosen. Evidently, the problem has a feasible solution (one can choose all diagonal numbers). To
find the optimal solution of the problem we construct the bipartite transportation network as it is
drawn in Figure 1. Each edge (i,j') of the graph has unit capacity and cost cij. All supplies and
demands are equal to 1 and -1 respectively. Evidently, a minimum cost flow solution corresponds
to an optimal assignment and vice versa. Thanks to the left-to-right directed edges the network
contains no negative cycles, and one is able to solve the problem with complexity of O(n³). Why? Hint: use
the successive shortest path algorithm.

Figure 1. Full weighted bipartite network for the assignment problem. Each edge has capacity 1
and cost according to the number in the given matrix.

The assignment problem can also be represented as a minimum-weight perfect matching in a weighted
bipartite graph. The problem allows some extensions:

Suppose that there is a different number of supply and demand nodes. The objective
might be to find a maximum matching with a minimum weight.
Suppose that we have to choose not one but k numbers in each row and each column. We
could easily solve this task if we considered supplies and demands to be equal to k and -k
(instead of 1 and -1) respectively.

However, we should point out that, due to the special structure of the assignment problem, there are
more efficient algorithms to solve it. For instance, the Hungarian algorithm has complexity
of O(n³), but it works much more quickly in practice.

Discrete Location Problems

Suppose we have n building sites and we have to build n new facilities on these sites. The new
facilities interact with m existing facilities. The objective is to assign each new facility i to the
available building site j in such a way that minimizes the total transportation cost between the
new and existing facilities. One example is the location of hospitals, fire stations etc. in the city;
in this case we can treat population concentrations as the existing facilities.

Let's denote by dkj the distance between existing facility k and site j; and the total transportation
cost per unit distance between the new facility i and the existing one k by wik. Let's denote the
assignment by binary variable xij (xij = 1 if the new facility i is placed on site j). Given an
assignment x we can get the corresponding transportation cost between the new facility i and the
existing facility k:

Σj wik·dkj·xij

Thus the total transportation cost is given by

Σi Σj ( Σk wik·dkj )·xij

Note that cij = Σk wik·dkj is the cost of locating the new facility i at site j. Appending the
necessary conditions, we obtain another instance of the assignment problem.

The Transportation Problem

The minimum cost flow problem is well known as the transportation problem in the statement of a
network. But there is a special case of the transportation problem, called the transportation
problem in the statement of a matrix. We can obtain the optimization model for this case as follows.

minimize Σi Σj cij·xij

subject to

Σj xij = bi for each warehouse i = 1, ..., m
Σi xij = dj for each shop j = 1, ..., n
0 ≤ xij ≤ uij

For example, suppose that we have a set of m warehouses and a set of n shops. Each
warehouse i has nonnegative supply value bi while each shop j has nonnegative demand value dj.
We are able to transfer goods from a warehouse i directly to a shop j at the cost cij per unit of
flow.

Figure 2. Formulating the transportation problem as a minimum cost flow problem. Each edge
connecting a vertex i and a vertex j' has capacity uij and cost cij.

There is an upper bound on the amount of flow between each warehouse i and each
shop j, denoted by uij. Minimizing the total transportation cost is the objective. Representing the flow
from a warehouse i to a shop j by xij we obtain the model above. Evidently, the assignment
problem is a special case of the transportation problem in the statement of a matrix, which in turn
is a special case of the minimum cost flow problem.

Optimal Loading of a Hopping Airplane

We took this application from [1]. A small commuter airline uses a plane with the capacity to
carry at most p passengers on a "hopping flight." The hopping flight visits the cities 1, 2, ..., n, in
a fixed sequence. The plane can pick up passengers at any node and drop them off at any other
node.

Let bij denote the number of passengers available at node i who want to go to node j, and
let fij denote the fare per passenger from node i to node j.

The airline would like to determine the number of passengers that the plane should carry
between the various origins and destinations in order to maximize the total fare per trip while
never exceeding the plane capacity.

Figure 3. Formulating the hopping plane flight problem as a minimum cost flow problem.

Figure 3 shows a minimum cost flow formulation of this hopping plane flight problem. The
network contains data for only those arcs with nonzero costs and with finite capacities: Any arc
without an associated cost has a zero cost; any arc without an associated capacity has an infinite
capacity.

Consider, for example, node 1. Three types of passengers are available at node 1, those whose
destination is node 2, node 3, or node 4. We represent these three types of passengers by the
nodes 1-2, 1-3, and 1-4 with supplies b12, b13, and b14. A passenger available at any such node,
say 1-3, either boards the plane at its origin node by flowing through the arc (1-3,1), thus
incurring a cost of -f13 units, or never boards the plane, which we represent by the flow through
the arc (1-3,3).

We invite the reader to establish a one-to-one correspondence between feasible passenger routings
and feasible flows in the minimum cost flow formulation of the problem.

Dynamic Lot Sizing

Here's another application that was first outlined in [1]. In the dynamic lot-size problem, we wish
to meet prescribed demand dj for each of K periods j = 1, 2, ..., K by either producing an
amount aj in period j and/or by drawing upon the inventory Ij-1 carried from the previous period.
Figure 4 shows the network for modeling this problem.

The network has K+1 vertexes: the j-th vertex, for j = 1, 2, ..., K, represents the j-th planning
period; node 0 represents the "source" of all production. The flow on the "production
arc" (0,j) prescribes the production level aj in period j, and the flow on the "inventory carrying
arc" (j,j+1) prescribes the inventory level Ij to be carried from period j to period j+1.
309

Figure 4. Network flow model of the dynamic lot-size problem.

The mass balance equation for each period j models the basic accounting equation: incoming
inventory plus production in that period must equal the period's demand plus the final inventory
at the end of the period. The mass balance equation for vertex 0 indicates that during the
planning periods 1, 2, ..., K, we must produce all of the demand (we are assuming zero
beginning and zero final inventory over the planning horizon).

geeksforgeeks.com
Ford-Fulkerson Algorithm for Maximum Flow Problem

Given a graph which represents a flow network where every edge has a capacity, and
two vertices source s and sink t in the graph, find the maximum possible flow from s to t with the
following constraints:

a) Flow on an edge doesn't exceed the given capacity of the edge.

b) Incoming flow is equal to outgoing flow for every vertex except s and t.

For example, consider the following graph from CLRS book.


[Figure: ford_fulkerson1]

The maximum possible flow in the above graph is 23.


[Figure: ford_fulkerson2]

Ford-Fulkerson Algorithm
The following is the simple idea of the Ford-Fulkerson algorithm:
1) Start with initial flow as 0.
2) While there is an augmenting path from source to sink:
Add this path-flow to flow.
3) Return flow.

Time Complexity: The time complexity of the above algorithm is O(max_flow * E). We run a loop
while there is an augmenting path. In the worst case, we may add 1 unit of flow in every iteration.
Therefore the time complexity becomes O(max_flow * E).

How to implement the above simple algorithm?


Let us first define the concept of Residual Graph which is needed for understanding the
implementation.
Residual Graph of a flow network is a graph which indicates additional possible flow. If there is a
path from source to sink in residual graph, then it is possible to add flow. Every edge of a
residual graph has a value called residual capacity which is equal to original capacity of the
edge minus current flow. Residual capacity is basically the current capacity of the edge.
Let us now talk about implementation details. Residual capacity is 0 if there is no edge between
two vertices of the residual graph. We can initialize the residual graph as the original graph, as there is no
initial flow and initially the residual capacity is equal to the original capacity. To find an augmenting
path, we can either do a BFS or DFS of the residual graph. We have used BFS in the below
implementation. Using BFS, we can find out if there is a path from source to sink. BFS also
builds parent[] array. Using the parent[] array, we traverse through the found path and find
possible flow through this path by finding minimum residual capacity along the path. We later
add the found path flow to overall flow.
The important thing is, we need to update the residual capacities in the residual graph. We subtract the
path flow from all edges along the path and we add the path flow along the reverse edges. We need
to add the path flow along reverse edges because we may later need to send flow in the reverse direction
(see the following video for an example).

http://www.youtube.com/watch?v=-8MwfgB-lyM

Following is a C++ implementation of the Ford-Fulkerson algorithm. To keep things simple, the graph is
represented as a 2D matrix.
// C++ program for implementation of Ford Fulkerson algorithm
#include <iostream>
#include <limits.h>
#include <string.h>
#include <queue>
using namespace std;

// Number of vertices in given graph


#define V 6

/* Returns true if there is a path from source 's' to sink 't' in


residual graph. Also fills parent[] to store the path */
bool bfs(int rGraph[V][V], int s, int t, int parent[])
{
// Create a visited array and mark all vertices as not visited
bool visited[V];
memset(visited, 0, sizeof(visited));

// Create a queue, enqueue source vertex and mark source vertex


// as visited
queue <int> q;
q.push(s);
visited[s] = true;
parent[s] = -1;

// Standard BFS Loop


while (!q.empty())
{
int u = q.front();
q.pop();
for (int v=0; v<V; v++)


{
if (visited[v]==false && rGraph[u][v] > 0)
{
q.push(v);
parent[v] = u;
visited[v] = true;
}
}
}

// If we reached sink in BFS starting from source, then return


// true, else false
return (visited[t] == true);
}

// Returns the maximum flow from s to t in the given graph


int fordFulkerson(int graph[V][V], int s, int t)
{
int u, v;

// Create a residual graph and fill the residual graph with


// given capacities in the original graph as residual capacities
// in residual graph
int rGraph[V][V]; // Residual graph where rGraph[i][j] indicates
// residual capacity of edge from i to j (if there
// is an edge. If rGraph[i][j] is 0, then there is not)
for (u = 0; u < V; u++)
for (v = 0; v < V; v++)
rGraph[u][v] = graph[u][v];
int parent[V]; // This array is filled by BFS and to store path

int max_flow = 0; // There is no flow initially

// Augment the flow while there is a path from source to sink


while (bfs(rGraph, s, t, parent))
{
// Find minimum residual capacity of the edges along the
// path filled by BFS. Or we can say find the maximum flow
// through the path found.
int path_flow = INT_MAX;
for (v=t; v!=s; v=parent[v])
{
u = parent[v];
path_flow = min(path_flow, rGraph[u][v]);
}

// update residual capacities of the edges and reverse edges


// along the path
for (v=t; v != s; v=parent[v])
{
u = parent[v];
rGraph[u][v] -= path_flow;
rGraph[v][u] += path_flow;
}

// Add path flow to overall flow


max_flow += path_flow;
}

// Return the overall flow


return max_flow;
}

// Driver program to test above functions


int main()
{
// Let us create a graph shown in the above example
int graph[V][V] = { {0, 16, 13, 0, 0, 0},
{0, 0, 10, 12, 0, 0},
{0, 4, 0, 0, 14, 0},
{0, 0, 9, 0, 0, 20},
{0, 0, 0, 7, 0, 4},
{0, 0, 0, 0, 0, 0}
};

cout << "The maximum possible flow is " << fordFulkerson(graph, 0, 5);

return 0;
}

Output:

The maximum possible flow is 23

The above implementation of the Ford-Fulkerson algorithm is called the Edmonds-Karp algorithm. The
idea of Edmonds-Karp is to use BFS in the Ford-Fulkerson implementation, as BFS always picks a
path with the minimum number of edges. When BFS is used, the worst case time complexity can be
reduced to O(VE²). The above implementation uses the adjacency matrix representation, though,
where BFS takes O(V²) time, so the time complexity of the above implementation is O(EV³) (refer
to the CLRS book for a proof of the time complexity).

This is an important problem as it arises in many practical situations. Examples include
maximizing the transportation with given traffic limits and maximizing packet flow in computer
networks.

Exercise:

Modify the above implementation so that it runs in O(VE²) time.

Find minimum s-t cut in a flow network

In a flow network, an s-t cut is a cut that requires the source s and the sink t to be in different
subsets, and it consists of edges going from the source's side to the sink's side. The capacity of
an s-t cut is defined by the sum of the capacities of the edges in the cut-set. (Source: Wiki)
The problem discussed here is to find minimum capacity s-t cut of the given network. Expected
output is all edges of the minimum cut.

For example, in the following flow network, example s-t cuts are {{0, 1}, {0, 2}}, {{0, 2}, {1,
2}, {1, 3}}, etc. The minimum s-t cut is {{1, 3}, {4, 3}, {4, 5}}, which has capacity 12+7+4 =
23.

We strongly recommend reading the below post first.


Ford-Fulkerson Algorithm for Maximum Flow Problem

Minimum Cut and Maximum Flow


Like Maximum Bipartite Matching, this is another problem which can be solved using the
Ford-Fulkerson Algorithm. This is based on the max-flow min-cut theorem.

The max-flow min-cut theorem states that in a flow network, the amount of maximum flow is
equal to capacity of the minimum cut. See CLRS book for proof of this theorem.

From Ford-Fulkerson, we get capacity of minimum cut. How to print all edges that form the
minimum cut? The idea is to use residual graph.

Following are steps to print all edges of minimum cut.

1) Run Ford-Fulkerson algorithm and consider the final residual graph.

2) Find the set of vertices that are reachable from source in the residual graph.

3) All edges which are from a reachable vertex to non-reachable vertex are minimum cut edges.
Print all such edges.

Following is C++ implementation of the above approach.

// C++ program for finding minimum cut using Ford-Fulkerson
#include <iostream>
#include <limits.h>
#include <string.h>
#include <queue>
using namespace std;

// Number of vertices in given graph
#define V 6

/* Returns true if there is a path from source 's' to sink 't' in
   residual graph. Also fills parent[] to store the path */
bool bfs(int rGraph[V][V], int s, int t, int parent[])
{
    // Create a visited array and mark all vertices as not visited
    bool visited[V];
    memset(visited, 0, sizeof(visited));

    // Create a queue, enqueue source vertex and mark source vertex
    // as visited
    queue <int> q;
    q.push(s);
    visited[s] = true;
    parent[s] = -1;

    // Standard BFS Loop
    while (!q.empty())
    {
        int u = q.front();
        q.pop();

        for (int v=0; v<V; v++)
        {
            if (visited[v]==false && rGraph[u][v] > 0)
            {
                q.push(v);
                parent[v] = u;
                visited[v] = true;
            }
        }
    }

    // If we reached sink in BFS starting from source, then return
    // true, else false
    return (visited[t] == true);
}

// A DFS based function to find all reachable vertices from s. The function
// marks visited[i] as true if i is reachable from s. The initial values in
// visited[] must be false. We can also use BFS to find reachable vertices
void dfs(int rGraph[V][V], int s, bool visited[])
{
    visited[s] = true;
    for (int i = 0; i < V; i++)
        if (rGraph[s][i] && !visited[i])
            dfs(rGraph, i, visited);
}

// Prints the minimum s-t cut
void minCut(int graph[V][V], int s, int t)
{
    int u, v;

    // Create a residual graph and fill the residual graph with
    // given capacities in the original graph as residual capacities
    // in residual graph
    int rGraph[V][V]; // rGraph[i][j] indicates residual capacity of edge i-j
    for (u = 0; u < V; u++)
        for (v = 0; v < V; v++)
            rGraph[u][v] = graph[u][v];

    int parent[V];  // This array is filled by BFS to store the path

    // Augment the flow while there is a path from source to sink
    while (bfs(rGraph, s, t, parent))
    {
        // Find minimum residual capacity of the edges along the
        // path filled by BFS. Or we can say find the maximum flow
        // through the path found.
        int path_flow = INT_MAX;
        for (v=t; v!=s; v=parent[v])
        {
            u = parent[v];
            path_flow = min(path_flow, rGraph[u][v]);
        }

        // update residual capacities of the edges and reverse edges
        // along the path
        for (v=t; v != s; v=parent[v])
        {
            u = parent[v];
            rGraph[u][v] -= path_flow;
            rGraph[v][u] += path_flow;
        }
    }

    // Flow is maximum now, find vertices reachable from s
    bool visited[V];
    memset(visited, false, sizeof(visited));
    dfs(rGraph, s, visited);

    // Print all edges that are from a reachable vertex to
    // non-reachable vertex in the original graph
    for (int i = 0; i < V; i++)
        for (int j = 0; j < V; j++)
            if (visited[i] && !visited[j] && graph[i][j])
                cout << i << " - " << j << endl;

    return;
}

// Driver program to test above functions
int main()
{
    // Let us create a graph shown in the above example
    int graph[V][V] = { {0, 16, 13, 0, 0, 0},
                        {0, 0, 10, 12, 0, 0},
                        {0, 4, 0, 0, 14, 0},
                        {0, 0, 9, 0, 0, 20},
                        {0, 0, 0, 7, 0, 4},
                        {0, 0, 0, 0, 0, 0}
                      };

    minCut(graph, 0, 5);

    return 0;
}
Output:

1 - 3
4 - 3
4 - 5

K-d tree
rosettacode.org

A k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing
points in a k-dimensional space. k-d trees are a useful data structure for several applications,
such as searches involving a multidimensional search key (e.g. range searches and nearest
neighbor searches). k-d trees are a special case of binary space partitioning trees.

k-d trees are not suitable, however, for efficiently finding the nearest neighbor in high
dimensional spaces. As a general rule, if the dimensionality is k, the number of points in the data,
N, should satisfy N >> 2^k. Otherwise, when k-d trees are used with high-dimensional data, most of the
points in the tree will be evaluated, the efficiency is no better than exhaustive search, and
other methods such as approximate nearest-neighbor search should be used instead.

Task: Construct a k-d tree and perform a nearest neighbor search for two example data sets:

1. The Wikipedia example data of [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)].
2. 1000 3-d points uniformly distributed in a 3-d cube.

For the Wikipedia example, find the nearest neighbor to point (9, 2). For the random data, pick a
random location and find the nearest neighbor.

In addition, instrument your code to count the number of nodes visited in the nearest neighbor
search. Count a node as visited if any field of it is accessed.

Output should show the point searched for, the point found, the distance to the point, and the
number of nodes visited.

There are variant algorithms for constructing the tree. You can use a simple median strategy or
implement something more efficient. Variants of the nearest neighbor search include nearest N
neighbors, approximate nearest neighbor, and range searches. You do not have to implement
these. The requirement for this task is specifically the nearest single neighbor. Also there are
algorithms for inserting, deleting, and balancing k-d trees. These are also not required for the
task.

Example:

Using a Quickselect-style median algorithm. Compared to unbalanced trees (random insertion),
it takes slightly longer (maybe half a second or so) to construct a million-node tree, though
the average lookup visits about 1/3 fewer nodes.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>

#define MAX_DIM 3
struct kd_node_t{
double x[MAX_DIM];
struct kd_node_t *left, *right;
};

inline double
dist(struct kd_node_t *a, struct kd_node_t *b, int dim)
{
double t, d = 0;
while (dim--) {
t = a->x[dim] - b->x[dim];
d += t * t;
}
return d;
}

inline void swap(struct kd_node_t *x, struct kd_node_t *y)
{
    double tmp[MAX_DIM];
    memcpy(tmp, x->x, sizeof(tmp));
    memcpy(x->x, y->x, sizeof(tmp));
    memcpy(y->x, tmp, sizeof(tmp));
}

/* see quickselect method */
struct kd_node_t*
find_median(struct kd_node_t *start, struct kd_node_t *end, int idx)
{
    if (end <= start) return NULL;
    if (end == start + 1)
        return start;

    struct kd_node_t *p, *store, *md = start + (end - start) / 2;


double pivot;
while (1) {
pivot = md->x[idx];

swap(md, end - 1);


for (store = p = start; p < end; p++) {
if (p->x[idx] < pivot) {
if (p != store)
swap(p, store);
store++;
}
}
swap(store, end - 1);

/* median has duplicate values */


if (store->x[idx] == md->x[idx])
return md;

if (store > md) end = store;


else start = store;
}
}

struct kd_node_t*
make_tree(struct kd_node_t *t, int len, int i, int dim)
{
struct kd_node_t *n;

if (!len) return 0;

if ((n = find_median(t, t + len, i))) {


i = (i + 1) % dim;
n->left = make_tree(t, n - t, i, dim);
n->right = make_tree(n + 1, t + len - (n + 1), i, dim);
}
return n;
}

/* global variable, so sue me */


int visited;

void nearest(struct kd_node_t *root, struct kd_node_t *nd, int i, int dim,
struct kd_node_t **best, double *best_dist)
{
double d, dx, dx2;

if (!root) return;
d = dist(root, nd, dim);
dx = root->x[i] - nd->x[i];
dx2 = dx * dx;

visited ++;

if (!*best || d < *best_dist) {


*best_dist = d;
*best = root;
}

/* if chance of exact match is high */


if (!*best_dist) return;

if (++i >= dim) i = 0;

nearest(dx > 0 ? root->left : root->right, nd, i, dim, best,


best_dist);
if (dx2 >= *best_dist) return;
nearest(dx > 0 ? root->right : root->left, nd, i, dim, best,
best_dist);
}

#define N 1000000
#define rand1() (rand() / (double)RAND_MAX)
#define rand_pt(v) { v.x[0] = rand1(); v.x[1] = rand1(); v.x[2] = rand1(); }
int main(void)
{
int i;
struct kd_node_t wp[] = {
{{2, 3}}, {{5, 4}}, {{9, 6}}, {{4, 7}}, {{8, 1}}, {{7, 2}}
};
struct kd_node_t this = {{9, 2}};
struct kd_node_t *root, *found, *million;
double best_dist;

root = make_tree(wp, sizeof(wp) / sizeof(wp[1]), 0, 2);

visited = 0;
found = 0;
nearest(root, &this, 0, 2, &found, &best_dist);

printf(">> WP tree\nsearching for (%g, %g)\n"


"found (%g, %g) dist %g\nseen %d nodes\n\n",
this.x[0], this.x[1],
found->x[0], found->x[1], sqrt(best_dist), visited);

million = calloc(N, sizeof(struct kd_node_t));


srand(time(0));
for (i = 0; i < N; i++) rand_pt(million[i]);

root = make_tree(million, N, 0, 3);


rand_pt(this);

visited = 0;
found = 0;
nearest(root, &this, 0, 3, &found, &best_dist);

printf(">> Million tree\nsearching for (%g, %g, %g)\n"


"found (%g, %g, %g) dist %g\nseen %d nodes\n",
this.x[0], this.x[1], this.x[2],
found->x[0], found->x[1], found->x[2],
sqrt(best_dist), visited);

/* search many random points in million tree to see average behavior.


tree size vs avg nodes visited:
10 ~ 7
100 ~ 16.5
1000 ~ 25.5
10000 ~ 32.8
100000 ~ 38.3
1000000 ~ 42.6
10000000 ~ 46.7 */
int sum = 0, test_runs = 100000;
for (i = 0; i < test_runs; i++) {
found = 0;
visited = 0;
rand_pt(this);
nearest(root, &this, 0, 3, &found, &best_dist);
sum += visited;
}
printf("\n>> Million tree\n"
"visited %d nodes for %d random findings (%f per lookup)\n",
sum, test_runs, sum/(double)test_runs);

// free(million);

return 0;
}
Output:
>> WP tree

searching for (9, 2)

found (8, 1) dist 1.41421

seen 3 nodes

>> Million tree

searching for (0.29514, 0.897237, 0.941998)

found (0.296093, 0.896173, 0.948082) dist 0.00624896

seen 44 nodes

>> Million tree

visited 4271442 nodes for 100000 random findings (42.714420 per lookup)

Queue
sourcetricks.com

Deque is an abbreviation for double-ended queue.

It is a data structure in which elements can be added or removed only at the front and back
of the queue.

A typical deque implementation supports the following operations: insert an element at the front,
insert an element at the back, remove an element from the front, remove an element from the
back, peek at the front element and peek at the back element.

A simple way to implement a deque is with a doubly linked list.

With a doubly linked list, all of the deque operations can be achieved in O(1) time.

A general purpose deque implementation can be used to mimic specialized behaviors like
stacks and queues.

For example, a deque behaves as a stack when elements are inserted at the back (Push) and
removed from the back (Pop).

Similarly, a deque behaves as a queue when elements are inserted at the back (Enqueue) and
removed from the front (Dequeue).

Deque is also supported in C++ Standard Template Library.

EXAMPLE:- Implement a deque using doubly linked lists.

#include <iostream>
using namespace std;

class DequeEmptyException
{
public:
DequeEmptyException()
{
cout << "Deque empty" << endl;
}
};

// Each node in a doubly linked list


class Node
{
public:
int data;
Node* next;
Node* prev;
};

class Deque
{
private:
Node* front;
Node* rear;
int count;

public:
Deque()
{
front = NULL;
rear = NULL;

count = 0;
}

void InsertFront(int element)


{
// Create a new node
Node* tmp = new Node();
tmp->data = element;
tmp->next = NULL;
tmp->prev = NULL;

if ( isEmpty() ) {
// Add the first element
front = rear = tmp;
}
else {
// Prepend to the list and fix links
tmp->next = front;
front->prev = tmp;
front = tmp;
}

count++;
}

int RemoveFront()
{
if ( isEmpty() ) {
throw new DequeEmptyException();
}

// Data in the front node


int ret = front->data;

// Delete the front node and fix the links


Node* tmp = front;
if ( front->next != NULL )
{
front = front->next;
front->prev = NULL;
}
else
{
front = NULL;
}
count--;
delete tmp;

return ret;
}

void InsertBack(int element)


{
// Create a new node
Node* tmp = new Node();
tmp->data = element;
tmp->next = NULL;
tmp->prev = NULL;

if ( isEmpty() ) {

// Add the first element


front = rear = tmp;
}
else {
// Append to the list and fix links
rear->next = tmp;
tmp->prev = rear;
rear = tmp;
}

count++;
}

int RemoveBack()
{
if ( isEmpty() ) {
throw new DequeEmptyException();
}

// Data in the rear node


int ret = rear->data;

// Delete the front node and fix the links


Node* tmp = rear;
if ( rear->prev != NULL )
{
rear = rear->prev;
rear->next = NULL;
}
else

{
rear = NULL;
}
count--;
delete tmp;

return ret;
}

int Front()
{
if ( isEmpty() )
throw new DequeEmptyException();

return front->data;
}

int Back()
{
if ( isEmpty() )
throw new DequeEmptyException();

return rear->data;
}

int Size()
{
return count;
}

bool isEmpty()
{
return count == 0 ? true : false;
}
};

int main()
{
// Stack behavior using a general dequeue
Deque q;
try {
if ( q.isEmpty() )
{
cout << "Deque is empty" << endl;
}

// Push elements
q.InsertBack(100);
q.InsertBack(200);
q.InsertBack(300);

// Size of queue
cout << "Size of dequeue = " << q.Size() << endl;

// Pop elements
cout << q.RemoveBack() << endl;
cout << q.RemoveBack() << endl;
cout << q.RemoveBack() << endl;
}
catch (...) {

cout << "Some exception occured" << endl;


}

// Queue behavior using a general dequeue


Deque q1;
try {
if ( q1.isEmpty() )
{
cout << "Deque is empty" << endl;
}

// Push elements
q1.InsertBack(100);
q1.InsertBack(200);
q1.InsertBack(300);

// Size of queue
cout << "Size of dequeue = " << q1.Size() << endl;

// Pop elements
cout << q1.RemoveFront() << endl;
cout << q1.RemoveFront() << endl;
cout << q1.RemoveFront() << endl;
}
catch (...) {
cout << "Some exception occured" << endl;
}
}

OUTPUT:

Deque is empty
Size of dequeue = 3
300
200
100
Deque is empty
Size of dequeue = 3
100
200
300

EXAMPLE:- Using the STL deque

#include <iostream>
#include <deque>
using namespace std;

int main()
{
// Stack behavior using a STL deque
deque<int> q;
try {
if ( q.empty() )
{
cout << "Deque is empty" << endl;
}

// Push elements
q.push_back(100);
q.push_back(200);
q.push_back(300);

// Size of queue
cout << "Size of deque = " << q.size() << endl;

// Pop elements
cout << q.back() << endl;
q.pop_back();
cout << q.back() << endl;
q.pop_back();
cout << q.back() << endl;
q.pop_back();
}
catch (...) {
cout << "Some exception occured" << endl;
}

// Queue behavior using a STL deque


deque<int> q1;
try {
if ( q1.empty() )
{
cout << "Deque is empty" << endl;
}

// Push elements
q1.push_back(100);

q1.push_back(200);
q1.push_back(300);

// Size of queue
cout << "Size of deque = " << q1.size() << endl;

// Pop elements
cout << q1.front() << endl;
q1.pop_front();
cout << q1.front() << endl;
q1.pop_front();
cout << q1.front() << endl;
q1.pop_front();
}
catch (...) {
cout << "Some exception occured" << endl;
}
}

OUTPUT:-

Deque is empty
Size of deque = 3
300
200
100
Deque is empty
Size of deque = 3
100
200
300

Binary Search Tree


sourcetricks.com

A Binary Search Tree (BST) is a binary tree (each node has at most 2 children).
It is also referred to as a sorted or ordered binary tree.
BST has the following properties. (notes from wikipedia)
o The left subtree of a node contains only nodes with keys less than the node's
key.
o The right subtree of a node contains only nodes with keys greater than the
node's key.
o Both the left and right subtrees must also be binary search trees.

BST Operations:-
o Searching in a BST
Examine the root node. If tree is NULL value doesn't exist.
If value equals the key in root search is successful and return.
If value is less than root, search the left sub-tree.
If value is greater than root, search the right sub-tree.
Continue until the value is found or the sub tree is NULL.
Time complexity. Average: O(log n), Worst: O(n) if the BST is unbalanced
and resembles a linked list.

o Insertion in BST
Insertion begins as a search.
Compare the key with the root. If not equal, search the left or right sub tree.
When a leaf node is reached, add the new node to the left or right based on
the value.
Time complexity. Average: O(log n), Worst O(n)

o Deletion in BST
There are three possible cases to consider:
Deleting a leaf (node with no children): Deleting a leaf is easy, as
we can simply remove it from the tree.
Deleting a node with one child: Remove the node and replace it
with its child.
Deleting a node with two children: Call the node to be deleted N.
Do not delete N. Instead, choose either its in-order successor
node or its in-order predecessor node, R. Replace the value of N
with the value of R, then delete R.

A sample implementation of BST using C++


#include <iostream>
using namespace std;

// A generic tree node class


class Node {
int key;
Node* left;

Node* right;
Node* parent;
public:
Node() { key=-1; left=NULL; right=NULL; parent = NULL;};
void setKey(int aKey) { key = aKey; };
void setLeft(Node* aLeft) { left = aLeft; };
void setRight(Node* aRight) { right = aRight; };
void setParent(Node* aParent) { parent = aParent; };
int Key() { return key; };
Node* Left() { return left; };
Node* Right() { return right; };
Node* Parent() { return parent; };
};

// Binary Search Tree class


class Tree {
Node* root;
public:
Tree();
~Tree();
Node* Root() { return root; };
void addNode(int key);
Node* findNode(int key, Node* parent);
void walk(Node* node);
void deleteNode(int key);
Node* min(Node* node);
Node* max(Node* node);
Node* successor(int key, Node* parent);
Node* predecessor(int key, Node* parent);
private:
void addNode(int key, Node* leaf);
void freeNode(Node* leaf);
};

// Constructor
Tree::Tree() {
root = NULL;
}

// Destructor
Tree::~Tree() {
freeNode(root);
}

// Free the node


void Tree::freeNode(Node* leaf)
{
if ( leaf != NULL )
{
freeNode(leaf->Left());
freeNode(leaf->Right());
delete leaf;
}
}

// Add a node [O(height of tree) on average]


void Tree::addNode(int key)

{
// No elements. Add the root
if ( root == NULL ) {
cout << "add root node ... " << key << endl;
Node* n = new Node();
n->setKey(key);
root = n;
}
else {
cout << "add other node ... " << key << endl;
addNode(key, root);
}
}

// Add a node (private)


void Tree::addNode(int key, Node* leaf) {
if ( key <= leaf->Key() )
{
if ( leaf->Left() != NULL )
addNode(key, leaf->Left());
else {
Node* n = new Node();
n->setKey(key);
n->setParent(leaf);
leaf->setLeft(n);
}
}
else
{
if ( leaf->Right() != NULL )
addNode(key, leaf->Right());
else {
Node* n = new Node();
n->setKey(key);
n->setParent(leaf);
leaf->setRight(n);
}
}
}

// Find a node [O(height of tree) on average]


Node* Tree::findNode(int key, Node* node)
{
    if ( node == NULL )
        return NULL;
    else if ( node->Key() == key )
        return node;
    else if ( key <= node->Key() )
        return findNode(key, node->Left());
    else
        return findNode(key, node->Right());
}

// Print the tree


void Tree::walk(Node* node)

{
if ( node )
{
cout << node->Key() << " ";
walk(node->Left());
walk(node->Right());
}
}

// Find the node with min key


// Traverse the left sub-tree recursively
// till left sub-tree is empty to get min
Node* Tree::min(Node* node)
{
    if ( node == NULL )
        return NULL;

    if ( node->Left() )
        return min(node->Left());
    else
        return node;
}

// Find the node with max key


// Traverse the right sub-tree recursively
// till right sub-tree is empty to get max
Node* Tree::max(Node* node)
{
    if ( node == NULL )
        return NULL;

    if ( node->Right() )
        return max(node->Right());
    else
        return node;
}

// Find successor to a node


// Find the node, then take the node with the minimum key
// in its right sub-tree to get the in-order successor
Node* Tree::successor(int key, Node *node)
{
    Node* thisKey = findNode(key, node);
    if ( thisKey && thisKey->Right() )
        return min(thisKey->Right());
    return NULL;
}

// Find predecessor to a node


// Find the node, then take the node with the maximum key
// in its left sub-tree to get the in-order predecessor
Node* Tree::predecessor(int key, Node *node)
{
    Node* thisKey = findNode(key, node);
    if ( thisKey && thisKey->Left() )
        return max(thisKey->Left());
    return NULL;
}

// Delete a node
// (1) If leaf just delete
// (2) If only one child delete this node and replace
// with the child
// (3) If 2 children. Find the predecessor (or successor).
// Delete the predecessor (or successor). Replace the
// node to be deleted with the predecessor (or successor).
void Tree::deleteNode(int key)
{
    // Find the node.
    Node* thisKey = findNode(key, root);
    if ( thisKey == NULL )
        return;

    // (1)
    if ( thisKey->Left() == NULL && thisKey->Right() == NULL )
    {
        if ( thisKey->Key() > thisKey->Parent()->Key() )
            thisKey->Parent()->setRight(NULL);
        else
            thisKey->Parent()->setLeft(NULL);

        delete thisKey;
        return;
    }

    // (2)
    if ( thisKey->Left() == NULL && thisKey->Right() != NULL )
    {
        if ( thisKey->Key() > thisKey->Parent()->Key() )
            thisKey->Parent()->setRight(thisKey->Right());
        else
            thisKey->Parent()->setLeft(thisKey->Right());

        thisKey->Right()->setParent(thisKey->Parent());
        delete thisKey;
        return;
    }
    if ( thisKey->Left() != NULL && thisKey->Right() == NULL )
    {
        if ( thisKey->Key() > thisKey->Parent()->Key() )
            thisKey->Parent()->setRight(thisKey->Left());
        else
            thisKey->Parent()->setLeft(thisKey->Left());

        thisKey->Left()->setParent(thisKey->Parent());
        delete thisKey;
        return;
    }

    // (3)
    if ( thisKey->Left() != NULL && thisKey->Right() != NULL )
    {
        Node* sub = predecessor(thisKey->Key(), thisKey);
        if ( sub == NULL )
            sub = successor(thisKey->Key(), thisKey);

        // The predecessor (max of the left sub-tree) has no right
        // child, and the successor (min of the right sub-tree) has
        // no left child, so splice in its only possible child.
        Node* child = sub->Left() ? sub->Left() : sub->Right();
        if ( sub->Parent()->Key() <= sub->Key() )
            sub->Parent()->setRight(child);
        else
            sub->Parent()->setLeft(child);
        if ( child )
            child->setParent(sub->Parent());

        thisKey->setKey(sub->Key());
        delete sub;
    }
}

// Test main program


int main() {
Tree* tree = new Tree();

// Add nodes
tree->addNode(300);
tree->addNode(100);
tree->addNode(200);
tree->addNode(400);
tree->addNode(500);

// Traverse the tree


tree->walk(tree->Root());
cout << endl;

// Find nodes
if ( tree->findNode(500, tree->Root()) )
cout << "Node 500 found" << endl;
else
cout << "Node 500 not found" << endl;

if ( tree->findNode(600, tree->Root()) )
cout << "Node 600 found" << endl;
else
cout << "Node 600 not found" << endl;

// Min & Max


cout << "Min=" << tree->min(tree->Root())->Key() << endl;
cout << "Max=" << tree->max(tree->Root())->Key() << endl;

// Successor and Predecessor


cout << "Successor to 300=" <<
tree->successor(300, tree->Root())->Key() << endl;
cout << "Predecessor to 300=" <<
tree->predecessor(300, tree->Root())->Key() << endl;

// Delete a node
tree->deleteNode(300);

// Traverse the tree


tree->walk(tree->Root());
cout << endl;

delete tree;
return 0;
}

OUTPUT:-
add root node ... 300
add other node ... 100
add other node ... 200
add other node ... 400
add other node ... 500
300 100 200 400 500

Node 500 found


Node 600 not found
Min=100
Max=500
Successor to 300=400
Predecessor to 300=200
200 100 400 500

Binary Search Tree | Set 1 (Search and Insertion)


geeksquiz.com

A Binary Search Tree is a node-based binary tree data structure which has the following
properties:

The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
The left and right subtree each must also be a binary search tree.
There must be no duplicate nodes.

The above properties of Binary Search Tree provide an ordering among keys so that the
operations like search, minimum and maximum can be done fast. If there is no ordering, then we
may have to compare every key to search a given key.

Searching a key
To search a given key in a Binary Search Tree, we first compare it with the root; if the key is
present at the root, we return the root. If the key is greater than the root's key, we recur for the
right subtree of the root node. Otherwise we recur for the left subtree.

// C function to search a given key in a given BST


struct node* search(struct node* root, int key)
{
// Base Cases: root is null or key is present at root
if (root == NULL || root->key == key)
return root;

// Key is greater than root's key


if (root->key < key)
return search(root->right, key);

// Key is smaller than root's key


return search(root->left, key);
}

Insertion of a key
A new key is always inserted at leaf. We start searching a key from root till we hit a leaf node.
Once a leaf node is found, the new node is added as a child of the leaf node.

100 100
/ \ Insert 40 / \
20 500 ---------> 20 500
/ \ / \
10 30 10 30
\
40
// C program to demonstrate insert operation in binary search tree
#include<stdio.h>
#include<stdlib.h>

struct node
{
int key;
struct node *left, *right;
};

// A utility function to create a new BST node


struct node *newNode(int item)
{
struct node *temp = (struct node *)malloc(sizeof(struct node));
temp->key = item;
temp->left = temp->right = NULL;
return temp;
}

// A utility function to do inorder traversal of BST


void inorder(struct node *root)
{
if (root != NULL)
{
inorder(root->left);
printf("%d ", root->key);
inorder(root->right);
}
}

/* A utility function to insert a new node with given key in BST */


struct node* insert(struct node* node, int key)
{
/* If the tree is empty, return a new node */
if (node == NULL) return newNode(key);

/* Otherwise, recur down the tree */


if (key < node->key)
node->left = insert(node->left, key);
else

node->right = insert(node->right, key);

/* return the (unchanged) node pointer */


return node;
}

// Driver Program to test above functions


int main()
{
/* Let us create following BST
50
/ \
30 70
/ \ / \
20 40 60 80 */
struct node *root = NULL;
root = insert(root, 50);
insert(root, 30);
insert(root, 20);
insert(root, 40);
insert(root, 70);
insert(root, 60);
insert(root, 80);

// print inorder traversal of the BST


inorder(root);

return 0;
}

Output:

20 30 40 50 60 70 80

Time Complexity: The worst case time complexity of search and insert operations is O(h) where
h is height of Binary Search Tree. In worst case, we may have to travel from root to the deepest
leaf node. The height of a skewed tree may become n and the time complexity of search and
insert operation may become O(n).

Binary Search Tree | Set 2 (Delete)


January 30, 2014

We have discussed BST search and insert operations. In this post, the delete operation is discussed.
When we delete a node, three possibilities arise.

1) Node to be deleted is leaf: Simply remove from the tree.

50 50
/ \ delete(20) / \
30 70 ---------> 30 70
/ \ / \ \ / \
20 40 60 80 40 60 80

2) Node to be deleted has only one child: Copy the child to the node and delete the child

50 50
/ \ delete(30) / \
30 70 ---------> 40 70
\ / \ / \
40 60 80 60 80

3) Node to be deleted has two children: Find inorder successor of the node. Copy contents of the
inorder successor to the node and delete the inorder successor. Note that inorder predecessor can
also be used.

50 60
/ \ delete(50) / \
40 70 ---------> 40 70
/ \ \
60 80 80

The important thing to note is that the inorder successor is needed only when the right child is
not empty. In that case, the inorder successor can be obtained by finding the minimum value in
the right subtree of the node.

// C program to demonstrate delete operation in binary search tree

#include <stdio.h>
#include <stdlib.h>

struct node
{
    int key;
    struct node *left, *right;
};

// A utility function to create a new BST node
struct node *newNode(int item)
{
    struct node *temp = (struct node *)malloc(sizeof(struct node));
    temp->key = item;
    temp->left = temp->right = NULL;
    return temp;
}

// A utility function to do inorder traversal of BST
void inorder(struct node *root)
{
    if (root != NULL)
    {
        inorder(root->left);
        printf("%d ", root->key);
        inorder(root->right);
    }
}

/* A utility function to insert a new node with given key in BST */
struct node* insert(struct node* node, int key)
{
    /* If the tree is empty, return a new node */
    if (node == NULL) return newNode(key);

    /* Otherwise, recur down the tree */
    if (key < node->key)
        node->left = insert(node->left, key);
    else
        node->right = insert(node->right, key);

    /* return the (unchanged) node pointer */
    return node;
}

/* Given a non-empty binary search tree, return the node with minimum
   key value found in that tree. Note that the entire tree does not
   need to be searched. */
struct node *minValueNode(struct node* node)
{
    struct node* current = node;

    /* loop down to find the leftmost leaf */
    while (current->left != NULL)
        current = current->left;

    return current;
}

/* Given a binary search tree and a key, this function deletes the key
   and returns the new root */
struct node* deleteNode(struct node* root, int key)
{
    // base case
    if (root == NULL) return root;

    // If the key to be deleted is smaller than the root's key,
    // then it lies in the left subtree
    if (key < root->key)
        root->left = deleteNode(root->left, key);

    // If the key to be deleted is greater than the root's key,
    // then it lies in the right subtree
    else if (key > root->key)
        root->right = deleteNode(root->right, key);

    // if key is same as root's key, then this is the node
    // to be deleted
    else
    {
        // node with only one child or no child
        if (root->left == NULL)
        {
            struct node *temp = root->right;
            free(root);
            return temp;
        }
        else if (root->right == NULL)
        {
            struct node *temp = root->left;
            free(root);
            return temp;
        }

        // node with two children: Get the inorder successor (smallest
        // in the right subtree)
        struct node* temp = minValueNode(root->right);

        // Copy the inorder successor's content to this node
        root->key = temp->key;

        // Delete the inorder successor
        root->right = deleteNode(root->right, temp->key);
    }
    return root;
}

// Driver Program to test above functions
int main()
{
    /* Let us create following BST
              50
           /     \
          30      70
         /  \    /  \
        20   40  60   80 */
    struct node *root = NULL;
    root = insert(root, 50);
    root = insert(root, 30);
    root = insert(root, 20);
    root = insert(root, 40);
    root = insert(root, 70);
    root = insert(root, 60);
    root = insert(root, 80);

    printf("Inorder traversal of the given tree \n");
    inorder(root);

    printf("\nDelete 20\n");
    root = deleteNode(root, 20);
    printf("Inorder traversal of the modified tree \n");
    inorder(root);

    printf("\nDelete 30\n");
    root = deleteNode(root, 30);
    printf("Inorder traversal of the modified tree \n");
    inorder(root);

    printf("\nDelete 50\n");
    root = deleteNode(root, 50);
    printf("Inorder traversal of the modified tree \n");
    inorder(root);

    return 0;
}
Output:

Inorder traversal of the given tree


20 30 40 50 60 70 80
Delete 20
Inorder traversal of the modified tree
30 40 50 60 70 80
Delete 30
Inorder traversal of the modified tree
40 50 60 70 80
Delete 50
Inorder traversal of the modified tree
40 60 70 80

Time Complexity: The worst case time complexity of delete operation is O(h) where h is height
of Binary Search Tree. In worst case, we may have to travel from root to the deepest leaf node.
The height of a skewed tree may become n and the time complexity of delete operation may
become O(n)

Quick Select
sourcetricks.com

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n
elements.
It runs in O(n) time on average; the worst case is O(n^2), although the median-of-medians pivot
rule brings even the worst case down to linear. A typical selection-by-sorting method would need
at least O(n log n) time.
This algorithm is identical to quick sort except that it does only a partial sort: after each
partition the pivot is in its final sorted position, so we already know which partition our desired
element lies in.

Quick select implementation in C++


#include <iostream>
using namespace std;

// A simple print function


void print(int *input)
{
for ( int i = 0; i < 5; i++ )
cout << input[i] << " ";
cout << endl;
}

int partition(int* input, int p, int r)


{
int pivot = input[r];

while ( p < r )
{
while ( input[p] < pivot )
p++;

while ( input[r] > pivot )


r--;

if ( input[p] == input[r] )
p++;
else if ( p < r ) {
int tmp = input[p];
input[p] = input[r];
input[r] = tmp;
}
}

return r;
}

int quick_select(int* input, int p, int r, int k)


{
if ( p == r ) return input[p];
int j = partition(input, p, r);
int length = j - p + 1;
if ( length == k ) return input[j];
else if ( k < length ) return quick_select(input, p, j - 1, k);
else return quick_select(input, j + 1, r, k - length);
}

int main()
{
int A1[] = { 100, 400, 300, 500, 200 };
cout << "1st order element " << quick_select(A1, 0, 4, 1) << endl;
int A2[] = { 100, 400, 300, 500, 200 };
cout << "2nd order element " << quick_select(A2, 0, 4, 2) << endl;
int A3[] = { 100, 400, 300, 500, 200 };
cout << "3rd order element " << quick_select(A3, 0, 4, 3) << endl;
int A4[] = { 100, 400, 300, 500, 200 };
cout << "4th order element " << quick_select(A4, 0, 4, 4) << endl;
int A5[] = { 100, 400, 300, 500, 200 };
cout << "5th order element " << quick_select(A5, 0, 4, 5) << endl;
}

OUTPUT:-
1st order element 100
2nd order element 200
3rd order element 300
4th order element 400
5th order element 500

rosettacode.org

Use the quickselect algorithm on the vector

[9, 8, 7, 6, 5, 0, 1, 2, 3, 4]

To show the first, second, third, ... up to the tenth largest member of the vector, in order, here on
this page.

Note: Quicksort has a separate task.

C
#include <stdio.h>
#include <string.h>

int qselect(int *v, int len, int k)


{
# define SWAP(a, b) { tmp = v[a]; v[a] = v[b]; v[b] = tmp; }
int i, st, tmp;

for (st = i = 0; i < len - 1; i++) {


if (v[i] > v[len-1]) continue;
SWAP(i, st);
st++;
}

SWAP(len-1, st);

return k == st ? v[st]
     : st > k  ? qselect(v, st, k)
               : qselect(v + st, len - st, k - st);
}

int main(void)
{
# define N (sizeof(x)/sizeof(x[0]))
int x[] = {9, 8, 7, 6, 5, 0, 1, 2, 3, 4};
int y[N];

int i;
for (i = 0; i < 10; i++) {
memcpy(y, x, sizeof(x)); // qselect modifies array
printf("%d: %d\n", i, qselect(y, 10, i));
}

return 0;
}
Output:
0: 0
1: 1
2: 2
3: 3

4: 4
5: 5
6: 6
7: 7
8: 8
9: 9
C++
Library

It is already provided in the standard library as std::nth_element(). Although the standard


does not explicitly mention what algorithm it must use, the algorithm partitions the sequence into
those less than the nth element to the left, and those greater than the nth element to the right, like
quickselect; the standard also guarantees that the complexity is "linear on average", which fits
quickselect.

#include <algorithm>
#include <iostream>

int main() {
for (int i = 0; i < 10; i++) {
int a[] = {9, 8, 7, 6, 5, 0, 1, 2, 3, 4};
std::nth_element(a, a + i, a + sizeof(a)/sizeof(*a));
std::cout << a[i];
if (i < 9) std::cout << ", ";
}
std::cout << std::endl;

return 0;
}
Output:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Implementation

A more explicit implementation:

#include <iterator>
#include <algorithm>
#include <functional>
#include <cstdlib>
#include <ctime>
#include <iostream>

template <typename Iterator>


Iterator select(Iterator begin, Iterator end, int n) {
typedef typename std::iterator_traits<Iterator>::value_type T;
while (true) {
Iterator pivotIt = begin + std::rand() % std::distance(begin, end);
std::iter_swap(pivotIt, end-1); // Move pivot to end
pivotIt = std::partition(begin, end-1, std::bind2nd(std::less<T>(),
*(end-1)));
std::iter_swap(end-1, pivotIt); // Move pivot to its final place
if (n == pivotIt - begin) {

return pivotIt;
} else if (n < pivotIt - begin) {
end = pivotIt;
} else {
n -= pivotIt+1 - begin;
begin = pivotIt+1;
}
}
}

int main() {
std::srand(std::time(NULL));
for (int i = 0; i < 10; i++) {
int a[] = {9, 8, 7, 6, 5, 0, 1, 2, 3, 4};
std::cout << *select(a, a + sizeof(a)/sizeof(*a), i);
if (i < 9) std::cout << ", ";
}
std::cout << std::endl;

return 0;
}
Output:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9

STL (C++)
topcoder.com

Power up C++ with the Standard Template Library: Part I
By DmitryKorolev
topcoder member

Perhaps you are already using C++ as your main programming language to solve topcoder
problems. This means that you have already used STL in a simple way, because arrays and
strings are passed to your function as STL objects. You may have noticed, though, that many
coders manage to write their code much more quickly and concisely than you.

Or perhaps you are not a C++ programmer, but want to become one because of the great
functionality of this language and its libraries (and, maybe, because of the very short solutions
you've read in topcoder practice rooms and competitions).

Regardless of where you're coming from, this article can help. In it, we will review some of the
powerful features of the Standard Template Library (STL), a great tool that can sometimes save
you a lot of time in an algorithm competition.

The simplest way to get familiar with STL is to begin from its containers.

Containers

Any time you need to operate with many elements you require some kind of container. In native
C (not C++) there was only one type of container: the array.

The problem is not that arrays are limited (though, for example, it's impossible to determine the
size of an array at runtime). Instead, the main problem is that many problems require a container
with greater functionality.

For example, we may need one or more of the following operations:

Add some string to a container.


Remove a string from a container.
Determine whether a string is present in the container.
Return a number of distinct elements in a container.
Iterate through a container and get a list of added strings in some order.

Of course, one can implement this functionality in an ordinary array. But the trivial
implementation would be very inefficient. You can create a tree- or hash-based structure to solve
it in a faster way, but think a bit: does the implementation of such a container depend on the
elements we are going to store? Do we have to re-implement the module to make it work, for
example, for points on a plane rather than strings?

If not, we can develop the interface for such a container once, and then use it everywhere for data
of any type. That, in short, is the idea of STL containers.

Before we begin
When the program is using STL, it should #include the appropriate standard headers. For most
containers the title of standard header matches the name of the container, and no extension is
required. For example, if you are going to use stack, just add the following line at the beginning
of your program:

#include <stack>

Container types (and algorithms, functors, and all of STL as well) are defined not in the global
namespace, but in a special namespace called "std". Add the following line after your includes
and before the code begins:

using namespace std;



Another important thing to remember is that the element type of a container is a template
parameter. Template parameters are specified with angle brackets ("< >") in code. For example:

vector<int> N;

When making nested constructions, make sure that the "brackets" are not directly following one
another; leave a blank between them.

vector< vector<int> > CorrectDefinition;


vector<vector<int>> WrongDefinition; // Wrong: compiler may be confused by
'operator >>'

Vector

The simplest STL container is vector. Vector is just an array with extended functionality. By the
way, vector is the only container that is backward-compatible to native C code this means that
vector actually IS the array, but with some additional features.

vector<int> v(10);
for(int i = 0; i < 10; i++) {
v[i] = (i+1)*(i+1);
}
for(int i = 9; i > 0; i--) {
v[i] -= v[i-1];
}

Actually, when you type

vector<int> v;

the empty vector is created. Be careful with constructions like this:

vector<int> v[10];

Here we declare 'v' as an array of 10 vector<int>s, which are initially empty. In most cases, this
is not what we want. Use parentheses instead of brackets here. The most frequently used feature
of vector is that it can report its size.

int elements_count = v.size();

Two remarks: first, size() is unsigned, which may sometimes cause problems. Accordingly, I
usually define a macro, something like sz(C), that returns the size of C as an ordinary signed int.
Second, it's not a good practice to compare v.size() to zero if you want to know whether the
container is empty. You're better off using the empty() function:

bool is_nonempty_notgood = (v.size() >= 0); // Try to avoid this


bool is_nonempty_ok = !v.empty();

This is because not all the containers can report their size in O(1), and you definitely should not
require counting all elements in a double-linked list just to ensure that it contains at least one.

Another very popular function to use in vector is push_back. Push_back adds an element to the
end of vector, increasing its size by one. Consider the following example:

vector<int> v;
for(int i = 1; i < 1000000; i *= 2) {
v.push_back(i);
}
int elements_count = v.size();

Don't worry about memory allocation -- vector will not allocate just one element each time.
Instead, vector allocates more memory than it actually needs when adding new elements with
push_back. The only thing you should worry about is memory usage, but at topcoder this may
not matter. (More on vector's memory policy later.)

When you need to resize vector, use the resize() function:

vector<int> v(20);
for(int i = 0; i < 20; i++) {
v[i] = i+1;
}
v.resize(25);
for(int i = 20; i < 25; i++) {
v[i] = i*2;
}

The resize() function makes the vector contain the required number of elements. If you require
fewer elements than the vector already contains, the last ones will be deleted. If you ask the
vector to grow, it will enlarge its size and fill the newly created elements with zeroes.

Note that if you use push_back() after resize(), it will add elements AFTER the newly allocated
size, but not INTO it. In the example above the size of the resulting vector is 25, while if we use
push_back() in a second loop, it would be 30.

vector<int> v(20);
for(int i = 0; i < 20; i++) {
v[i] = i+1;
}
v.resize(25);
for(int i = 20; i < 25; i++) {
v.push_back(i*2); // Writes to elements with indices [25..30), not [20..25)!
}

To clear a vector, use the clear() member function. This function makes vector contain 0 elements.
It does not set the elements to zeroes -- watch out -- it completely erases the container.

There are many ways to initialize vector. You may create vector from another vector:

vector<int> v1; // ...


vector<int> v2 = v1;
vector<int> v3(v1);

The initialization of v2 and v3 in the example above are exactly the same.

If you want to create a vector of specific size, use the following constructor:

vector<int> Data(1000);

In the example above, the data will contain 1,000 zeroes after creation. Remember to use
parentheses, not brackets. If you want vector to be initialized with something else, write it in
such manner:

vector<string> names(20, "Unknown");

Remember that you can create vectors of any type.

Multidimensional arrays are very important. The simplest way to create the two-dimensional
array via vector is to create a vector of vectors.

vector< vector<int> > Matrix;

It should be clear to you now how to create the two-dimensional vector of given size:

int N, M;
// ...
vector< vector<int> > Matrix(N, vector<int>(M, -1));

Here we create a matrix of size N*M and fill it with -1.

The simplest way to add data to vector is to use push_back(). But what if we want to add data
somewhere other than the end? There is the insert() member function for this purpose. And there
is also the erase() member function to erase elements, as well. But first we need to say a few
words about iterators.

You should remember one more very important thing: when vector is passed as a parameter to
some function, a copy of the vector is actually created. It may take a lot of time and memory to
create new vectors when they are not really needed. Actually, it's hard to find a task where
copying a vector is REALLY needed when passing it as a parameter. So, you should never
write:

void some_function(vector<int> v) { // Never do it unless you're sure what you do!
// ...
}

Instead, use the following construction:

void some_function(const vector<int>& v) { // OK
// ...
}

If you are going to change the contents of vector in the function, just omit the const modifier.

void modify_vector(vector<int>& v) { // Correct
v[0]++;
}

Pairs

Before we come to iterators, let me say a few words about pairs. Pairs are widely used in STL.
Simple problems, like topcoder SRM 250 and easy 500-point problems, usually require some
simple data structure that fits well with pair. STL std::pair is just a pair of elements. The simplest
form would be the following:

template<typename T1, typename T2> struct pair {


T1 first;
T2 second;
};

A pair<int,int> is simply a pair of integer values. At a more complex level, pair<string, pair<int,
int> > is a pair of a string and two integers. In the second case, the usage may be like this:

pair<string, pair<int,int> > P;


string s = P.first; // extract string
int x = P.second.first; // extract first int
int y = P.second.second; // extract second int

The great advantage of pairs is that they have built-in operations to compare themselves. Pairs
are compared first-to-second element. If the first elements are not equal, the result will be based
on the comparison of the first elements only; the second elements will be compared only if the
first ones are equal. The array (or vector) of pairs can easily be sorted by STL internal functions.

For example, if you want to sort an array of integer points so that they form a polygon, it's a
good idea to put them into a vector< pair<double, pair<int,int> > >, where each element of the vector
is { polar angle, { x, y } }. One call to the STL sorting function will give you the desired order of
points.

Pairs are also widely used in associative containers, which we will speak about later in this
article.

Iterators

What are iterators? In STL, iterators are the most general way to access data in containers.
Consider a simple problem: reverse an array A of N ints. Let's begin with a C-like solution:

void reverse_array_simple(int *A, int N) {


int first = 0, last = N-1; // First and last indices of elements to be swapped
while(first < last) { // Loop while there is something to swap
swap(A[first], A[last]); // swap(a,b) is the standard STL function
first++; // Move first index forward
last--; // Move last index back
}
}

This code should be clear to you. It's pretty easy to rewrite it in terms of pointers:

void reverse_array(int *A, int N) {


int *first = A, *last = A+N-1;
while(first < last) {
swap(*first, *last);
first++;
last--;
}
}

Look at this code, at its main loop. It uses only four distinct operations on pointers 'first' and
'last':

compare pointers (first < last),


get value by pointer (*first, *last),
increment pointer, and
decrement pointer

Now imagine that you are facing the second problem: reverse the contents of a double-linked
list, or a part of it. The first code, which uses indexing, will definitely not work. At least, it will
not work in time, because it's impossible to get an element by index in a double-linked list in O(1),
only in O(N), so the whole algorithm would work in O(N^2).

But look: the second code can work for ANY pointer-like object. The only restriction is that the
object can perform the operations described above: take value (unary *), comparison (<), and
increment/decrement (++/--). Objects with these properties that are associated with containers
are called iterators. Any STL container may be traversed by means of an iterator. Although not
often needed for vector, its very important for other container types.

So, what do we have? An object with syntax very much like a pointer. The following operations
are defined for iterators:

get value of an iterator, int x = *it;


increment and decrement iterators it1++, it2--;
compare iterators by '!=' and by '<'
add an immediate to iterator it += 20; <=> shift 20 elements forward
get the distance between iterators, int n = it2-it1;

But instead of pointers, iterators provide much greater functionality. Not only can they operate
on any container, they may also perform, for example, range checking and profiling of container
usage.

And the main advantage of iterators, of course, is that they greatly increase the reuse of code:
your own algorithms, based on iterators, will work on a wide range of containers, and your own
containers, which provide iterators, may be passed to a wide range of standard functions.

Not all types of iterators provide all the potential functionality. In fact, there are so-called
"normal iterators" and "random access iterators". Simply put, normal iterators may be compared
with == and !=, and they may also be incremented and decremented. They may not be
subtracted and we cannot add a value to a normal iterator. Basically, it's impossible to
implement the described operations in O(1) for all container types. In spite of this, the function
that reverses an array should look like this:

template<typename T> void reverse_array(T *first, T *last) {


if(first != last) {
while(true) {
swap(*first, *last);
first++;
if(first == last) {
break;
}
last--;
if(first == last) {
break;
}
}
}
}

The main difference between this code and the previous one is that we don't use the "<"
comparison on iterators, just the "==" one. Again, don't panic if you are surprised by the
function prototype: template is just a way to declare a function that works on any appropriate
parameter types. This function should work perfectly on pointers to any object types and with all
normal iterators.

Let's return to the STL. STL algorithms always use two iterators, called "begin" and "end." The
end iterator points not to the last object, however, but to the first invalid object, or the object
directly following the last one. It's often very convenient.

Each STL container has member functions begin() and end() that return the begin and end
iterators for that container.

Based on these principles, c.begin() == c.end() if and only if c is empty, and c.end() - c.begin()
will always be equal to c.size(). (The last sentence is valid in cases when iterators can be
subtracted, i.e. begin() and end() return random access iterators, which is not true for all kinds of
containers. See the prior example of the double-linked list.)

The STL-compliant reverse function should be written as follows:

template<typename T> void reverse_array_stl_compliant(T *begin, T *end) {


// We should at first decrement 'end'
// But only for non-empty range
if(begin != end)
{
end--;
if(begin != end) {
while(true) {
swap(*begin, *end);
begin++;
if(begin == end) {
break;
}
end--;
if(begin == end) {
break;
}
}
}
}
}

Note that this function does the same thing as the standard function std::reverse(begin, end)
that can be found in the algorithm module (#include <algorithm>).

In addition, any object with enough functionality can be passed as an iterator to STL algorithms
and functions. That is where the power of templates comes in! See the following examples:

vector<int> v;
// ...
vector<int> v2(v);

vector<int> v3(v.begin(), v.end()); // v3 equals to v2

int data[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31 };


vector<int> primes(data, data+(sizeof(data) / sizeof(data[0])));

The last line constructs a vector from an ordinary C array. The term 'data' without an
index is treated as a pointer to the beginning of the array. The term 'data + N' points to the N-th
element, so, when N is the size of the array, 'data + N' points to the first element not in the array; thus
'data + length of data' can be treated as the end iterator for array 'data'. The expression
'sizeof(data)/sizeof(data[0])' returns the size of the array data, but only when 'data' is a true array,
not a pointer, so don't use it anywhere except in such constructions. (C programmers will agree with me!)

Furthermore, we can even use the following constructions:

vector<int> v;
// ...
vector<int> v2(v.begin(), v.begin() + (v.size()/2));

It creates the vector v2 that is equal to the first half of vector v.

Here is an example of reverse() function:

int data[10] = { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 };


reverse(data+2, data+6); // the range { 5, 7, 9, 11 } is now { 11, 9, 7,
5 };

Each container also has the rbegin()/rend() functions, which return reverse iterators. Reverse
iterators are used to traverse the container in backward order. Thus:

vector<int> v;
vector<int> v2(v.rbegin()+(v.size()/2), v.rend());

will create v2 with first half of v, ordered back-to-front.

To create an iterator object, we must specify its type. The type of iterator can be constructed by a
type of container by appending ::iterator, ::const_iterator, ::reverse_iterator or
::const_reverse_iterator to it. Thus, vector can be traversed in the following way:

vector<int> v;
// ...
// Traverse all container, from begin() to end()
for(vector<int>::iterator it = v.begin(); it != v.end(); it++) {
(*it)++; // Increment the value the iterator is pointing to
}

I recommend you use '!=' instead of '<', and 'empty()' instead of 'size() != 0' -- for some container
types, it's just very inefficient to determine which of the iterators precedes the other.

Now you know of the STL algorithm reverse(). Many STL algorithms are declared in the same way:
they get a pair of iterators -- the beginning and end of a range -- and return an iterator.

The find() algorithm looks for appropriate elements in an interval. If the element is found, the
iterator pointing to the first occurrence of the element is returned. Otherwise, the return value
equals the end of interval. See the code:

vector<int> v;
for(int i = 1; i < 100; i++) {
v.push_back(i*i);
}

if(find(v.begin(), v.end(), 49) != v.end()) {


// ...
}

To get the index of element found, one should subtract the beginning iterator from the result of
find():

int i = find(v.begin(), v.end(), 49) - v.begin();


if(i < v.size()) {
// ...
}

Remember to #include <algorithm> in your source when using STL algorithms.

The min_element and max_element algorithms return an iterator to the respective element. To
get the value of the min/max element, like in find(), use *min_element(...) or *max_element(...); to
get the index in an array, subtract the begin iterator of the container or range:

int data[5] = { 1, 5, 2, 4, 3 };
vector<int> X(data, data+5);
int v1 = *max_element(X.begin(), X.end()); // Value of max element in vector
int i1 = min_element(X.begin(), X.end()) - X.begin(); // Index of min element in vector

int v2 = *max_element(data, data+5); // Value of max element in array
int i3 = min_element(data, data+5) - data; // Index of min element in array

Now you may see that a useful macro would be:

#define all(c) c.begin(), c.end()

Don't put the whole right-hand side of this macro into parentheses -- that would be wrong!

Another good algorithm is sort(). It's very easy to use. Consider the following examples:

vector<int> X;

// ...

sort(X.begin(), X.end()); // Sort array in ascending order
sort(all(X)); // Sort array in ascending order, use our #define
sort(X.rbegin(), X.rend()); // Sort array in descending order using reverse iterators

Compiling STL Programs

One thing worth pointing out here is STL error messages. Because the STL is distributed as source
code, and compilers need that source to build efficient executables, one of STL's habits is
unreadable error messages.

For example, if you pass a vector<int> as a const reference parameter (as you should do) to some
function:

void f(const vector<int>& v) {
for(
vector<int>::iterator it = v.begin(); // hm... where's the error?..
// ...
// ...
}

The error here is that you are trying to create the non-const iterator from a const object with the
begin() member function (though identifying that error can be harder than actually correcting it).
The right code looks like this:

int f(const vector<int>& v) {
int r = 0;
// Traverse the vector using const_iterator
for(vector<int>::const_iterator it = v.begin(); it != v.end(); it++) {
r += (*it)*(*it);
}
return r;
}

In spite of this, let me tell you about a very important feature of GNU C++ called 'typeof'. This
operator is replaced by the type of an expression during compilation. Consider the following
example:

typeof(a+b) x = (a+b);

This will create the variable x with a type matching the type of the (a+b) expression. Beware that
typeof(v.size()) is unsigned for any STL container type. But the most important application of
typeof for topcoder is traversing a container. Consider the following macro:

#define tr(container, it) \
for(typeof(container.begin()) it = container.begin(); it != container.end(); it++)

By using this macro we can traverse every kind of container, not only vector. It will
produce a const_iterator for a const object and a normal iterator for a non-const object, and you will
never get an error here.

int f(const vector<int>& v) {
int r = 0;
tr(v, it) {
r += (*it)*(*it);
}
return r;
}

Note: I did not put additional parentheses on the #define line in order to improve its readability.
See the summary below for more correct #define statements that you can experiment with in
practice rooms.

The traversing macro is not really necessary for vectors, but it's very convenient for more complex
data types, where indexing is not supported and iterators are the only way to access data. We will
speak about this later in this article.

Data manipulation in vector


One can insert an element into vector by using the insert() function:

vector<int> v;
// ...
v.insert(v.begin() + 1, 42); // Insert value 42 after the first element

All elements from the second (index 1) to the last will be shifted right one element to leave a place
for the new element. If you are planning to add many elements, it's not good to do many shifts --
you're better off calling insert() one time. So, insert() has an interval form:

vector<int> v;
vector<int> v2;

// ..

// Shift all elements from the second to the last by the appropriate number of elements.
// Then copy the contents of v2 into v.
v.insert(v.begin() + 1, all(v2));

Vector also has a member function erase, which has two forms. Guess what they are:

erase(iterator);
erase(begin iterator, end iterator);

In the first case, a single element of the vector is deleted. In the second case, the interval specified
by the two iterators is erased from the vector.

The insert/erase technique is common, but not identical for all STL containers.

String

There is a special container for manipulating strings. The string container has a few differences
from vector<char>. Most of the differences come down to string manipulation functions and
memory management policy.

String has a substring function without iterators, just indices:

string s = "hello";
string
s1 = s.substr(0, 3), // "hel"
s2 = s.substr(1, 3), // "ell"
s3 = s.substr(0, s.length()-1), // "hell"
s4 = s.substr(1); // "ello"

Beware of (s.length()-1) on an empty string: because s.length() is unsigned, unsigned(0)-1 is
definitely not what you are expecting!

Set

It's always hard to decide which kind of container to describe first -- set or map. My opinion is
that, if the reader has a basic knowledge of algorithms, beginning from 'set' should be easier to
understand.

Suppose we need a container with the following features:

add an element, but do not allow duplicates


remove elements
get count of elements (distinct elements)
check whether elements are present in set

This is quite a frequently used task. STL provides a special container for it -- set. Set can add,
remove and check the presence of a particular element in O(log N), where N is the count of objects
in the set. While adding elements to set, duplicates are discarded. The count of
elements in the set, N, is returned in O(1). We will speak of the algorithmic implementation of
set and map later -- for now, let's investigate its interface:

set<int> s;

for(int i = 1; i <= 100; i++) {


s.insert(i); // Insert 100 elements, [1..100]
}

s.insert(42); // does nothing, 42 already exists in set

for(int i = 2; i <= 100; i += 2) {


s.erase(i); // Erase even values
}

int n = int(s.size()); // n will be 50

The push_back() member may not be used with set. This makes sense: since the order of elements in
set does not matter, push_back() is not applicable here.

Since set is not a linear container, it's impossible to take an element of set by index. Therefore,
the only way to traverse the elements of set is to use iterators.

// Calculate the sum of elements in set


set<int> S;
// ...
int r = 0;
for(set<int>::const_iterator it = S.begin(); it != S.end(); it++) {
r += *it;
}

It's more elegant to use the traversing macro here. Why? Imagine you have a set< pair<string, pair<
int, vector<int> > > >. How would you traverse it? Write down the iterator type name? Oh, no. Use our
traversing macro instead.

set< pair<string, pair< int, vector<int> > > > SS;


int total = 0;
tr(SS, it) {
total += it->second.first;
}

Notice the 'it->second.first' syntax. Since 'it' is an iterator, we need to take the object from 'it'
before operating on it. So, the correct syntax would be '(*it).second.first'. However, it's easier to write
'something->' than '(*something).'. The full explanation would be quite long -- just remember that,
for iterators, both syntaxes are allowed.

To determine whether some element is present in set, use the 'find()' member function. Don't be
confused, though: there are several 'find()'s in STL. There is a global algorithm 'find()', which
takes two iterators and an element, and works in O(N). It is possible to use it for searching for an element
in set, but why use an O(N) algorithm when there exists an O(log N) one? While searching in set
and map (and also in multiset/multimap, hash_map/hash_set, etc.) do not use global find --
instead, use the member function 'set::find()'. Like 'ordinary' find, set::find will return an iterator, either
to the element found, or to 'end()'. So, the element presence check looks like this:

set<int> s;
// ...

if(s.find(42) != s.end()) {
// 42 is present in set
}
else {
// 42 is not present in set
}

Another algorithm that works in O(log N) when called as a member function is count. Some
people think that

if(s.count(42) != 0) {
// ...
}

or even

if(s.count(42)) {
// ...
}

is easier to write. Personally, I don't think so. count() in set/map adds nothing: the element
is either present or not. As for me, I prefer to use the following two macros:

#define present(container, element) (container.find(element) !=


container.end())
#define cpresent(container, element) (find(all(container),element) !=
container.end())

(Remember that all(c) stands for c.begin(), c.end())

Here, 'present()' returns whether the element is present in a container that has the member function
'find()' (i.e. set/map, etc.), while 'cpresent' is for vector.

To erase an element from set use the erase() function.

set<int> s;
// ...
s.insert(54);
s.erase(29);

The erase() function also has the interval form:

set<int> s;
// ..

set<int>::iterator it1, it2;


it1 = s.find(10);
it2 = s.find(100);

// Will work if it1 and it2 are valid iterators, i.e. values 10 and 100 are present in the set.
s.erase(it1, it2); // Note that 10 will be deleted, but 100 will remain in the container

Set has an interval constructor:

int data[5] = { 5, 1, 4, 2, 3 };
set<int> S(data, data+5);

It gives us a simple way to get rid of duplicates in a vector and sort it:

vector<int> v;
// ...
set<int> s(all(v));
vector<int> v2(all(s));

Here 'v2' will contain the same elements as 'v' but sorted in ascending order and with duplicates
removed.

Any comparable elements can be stored in set. This will be described later.

Map

There are two explanations of map. The simple explanation is the following:

map<string, int> M;
M["Top"] = 1;
M["Coder"] = 2;
M["SRM"] = 10;

int x = M["Top"] + M["Coder"];

if(M.find("SRM") != M.end()) {
M.erase(M.find("SRM")); // or even M.erase("SRM")
}

Very simple, isn't it?

Actually map is very much like set, except it contains not just values but pairs <key, value>.
Map ensures that at most one pair with a specific key exists. Another quite pleasant thing is that
map has operator [] defined.

Traversing map is easy with our 'tr()' macro. Notice that the iterator will be an std::pair of key and
value. So, to get the value, use it->second. An example follows:

map<string, int> M;
// ...
int r = 0;
tr(M, it) {
r += it->second;
}

Don't change the key of a map element through an iterator, because it may break the integrity of the
map's internal data structure (see below).

There is one important difference between map::find() and map::operator []. While map::find()
will never change the contents of map, operator [] will create an element if it does not exist. In
some cases this could be very convenient, but it's definitely a bad idea to use operator [] many
times in a loop, when you do not want to add new elements. That's why operator [] may not be
used if map is passed as a const reference parameter to some function:

void f(const map<string, int>& M) {
if(M["the meaning"] == 42) { // Error! Cannot use [] on const map objects!
}
if(M.find("the meaning") != M.end() && M.find("the meaning")->second == 42) { // Correct
cout << "Don't Panic!" << endl;
}
}

Notice on Map and Set

Internally map and set are almost always stored as red-black trees. We do not need to worry
about the internal structure; the thing to remember is that the elements of map and set are always
sorted in ascending order while traversing these containers. And that's why it's strongly not
recommended to change the key value while traversing map or set: if you make a modification
that breaks the order, it will lead to improper functionality of the container's algorithms, at least.

But the fact that the elements of map and set are always ordered can be practically used while
solving topcoder problems.

Another important thing is that operators ++ and -- are defined on iterators in map and set. Thus,
if the value 42 is present in set, and it's not the first or the last one, then the following code will
work:

set<int> S;
// ...
set<int>::iterator it = S.find(42);
set<int>::iterator it1 = it, it2 = it;
it1--;

it2++;
int a = *it1, b = *it2;

Here 'a' will contain the first neighbor of 42 to the left and 'b' the first one to the right.

More on algorithms

It's time to speak about algorithms a bit more deeply. Most algorithms are declared in the
#include <algorithm> standard header. At first, STL provides three very simple algorithms:
min(a,b), max(a,b), swap(a,b). Here min(a,b) and max(a,b) return the minimum and maximum
of two elements, while swap(a,b) swaps two elements.

Algorithm sort() is also widely used. The call to sort(begin, end) sorts an interval in ascending
order. Notice that sort() requires random access iterators, so it will not work on all containers.
However, you probably won't ever call sort() on set, which is already ordered.

You've already heard of the algorithm find(). The call to find(begin, end, element) returns the
iterator where the element first occurs, or end if the element is not found. Similarly,
count(begin, end, element) returns the number of occurrences of an element in a container or a
part of a container. Remember that set and map have the member functions find() and count(),
which work in O(log N), while std::find() and std::count() take O(N).

Other useful algorithms are next_permutation() and prev_permutation(). Let's speak about
next_permutation. The call to next_permutation(begin, end) makes the interval [begin, end) hold
the next permutation of the same elements, or returns false if the current permutation is the last
one. Accordingly, next_permutation makes many tasks quite easy. If you want to check all
permutations, just write:

vector<int> v;

for(int i = 0; i < 10; i++) {


v.push_back(i);
}

do {
Solve(..., v);
} while(next_permutation(all(v)));

Don't forget to ensure that the elements in the container are sorted before your first call to
next_permutation(...). Their initial state should form the very first permutation; otherwise, some
permutations will not be checked.

String Streams

You often need to do some string processing/input/output. C++ provides two interesting objects
for it: 'istringstream' and 'ostringstream'. They are both declared in #include <sstream>.

The istringstream object allows you to read from a string just as you do from standard input. It's
easiest to show with source code:

void f(const string& s) {

// Construct an object to parse strings


istringstream is(s);

// Vector to store data


vector<int> v;

// Read integer while possible and add it to the vector


int tmp;
while(is >> tmp) {
v.push_back(tmp);
}
}

The ostringstream object is used to do formatted output. Here is the code:

string f(const vector<int>& v) {

// Construct an object to do formatted output


ostringstream os;

// Copy all elements from vector<int> to string stream as text


tr(v, it) {
os << ' ' << *it;
}

// Get string from string stream


string s = os.str();

// Remove first space character


if(!s.empty()) { // Beware of empty string here
s = s.substr(1);
}

return s;
}

Summary

To go on with STL, I would like to summarize the list of templates to be used. This will simplify
the reading of code samples and, I hope, improve your topcoder skills. The short list of templates
and macros follows:

typedef vector<int> vi;


typedef vector<vi> vvi;
typedef pair<int,int> ii;
#define sz(a) int((a).size())
#define pb push_back
#define all(c) (c).begin(),(c).end()
#define tr(c,i) for(typeof((c).begin()) i = (c).begin(); i != (c).end(); i++)
#define present(c,x) ((c).find(x) != (c).end())
#define cpresent(c,x) (find(all(c),x) != (c).end())

The container vector<int> is here because it's really very popular. Actually, I found it convenient
to have short aliases for many containers (especially for vector<string>, vector<ii>, vector<
pair<double, ii> >). But this list only includes the macros that are required to understand the
following text.
following text.

Another note to keep in mind: when a token from the left-hand side of a #define appears in the
right-hand side, it should be placed in parentheses to avoid many nontrivial problems.

Power up C++ with the Standard Template


Library: Part II: Advanced Uses
By DmitryKorolev

topcoder member

In this tutorial we will use some macros and typedefs from Part I of the tutorial.

Creating Vector from Map

As you already know, map actually contains pairs of elements. So you can write it like this:

map<string, int> M;
// ...
vector< pair<string, int> > V(all(M)); // remember all(c) stands for
(c).begin(),(c).end()

Now the vector will contain the same elements as the map. Of course, the vector will be sorted, as the map is.
This feature may be useful if you are not planning to change elements in the map any more but want
to use indices of elements in a way that is impossible in map.

Copying data between containers

Let's take a look at the copy(...) algorithm. The prototype is the following:

copy(from_begin, from_end, to_begin);

This algorithm copies elements from the first interval to the second one. The second interval
should have enough space available. See the following code:

vector<int> v1;
vector<int> v2;

// ...

// Now copy v2 to the end of v1
v1.resize(v1.size() + v2.size()); // Ensure v1 has enough space
copy(all(v2), v1.end() - v2.size()); // Copy v2 elements right after v1's ones

Another good feature to use in conjunction with copy is inserters. I will not describe it here due
to limited space but look at the code:

vector<int> v;
// ...
set<int> s;
// add some elements to set
copy(all(v), inserter(s, s.end()));

The last line means:

tr(v, it) {
// remember traversing macros from Part I
s.insert(*it);
}

But why use our own macro (which works only in gcc) when there is a standard function? It's
good STL practice to use standard algorithms like copy, because it will be easier for others to
understand your code.

To insert elements into a vector with push_back, use back_inserter; front_inserter is available
for the deque container. And in some cases it is useful to remember that the first two arguments for
copy may be not only begin/end, but also rbegin/rend, which copy data in reverse order.

Merging lists
Another common task is to operate with sorted lists of elements. Imagine you have two lists of
elements -- A and B, both ordered. You want to get a new list from these two. There are four
common operations here:

'union' the lists, R = A+B


intersect the lists, R = A*B
set difference, R = A*(~B) or R = A-B
set symmetric difference, R = A XOR B

STL provides four algorithms for these tasks: set_union(...), set_intersection(...),


set_difference(...) and set_symmetric_difference(...). They all have the same calling conventions,
so let's look at set_intersection. A free-styled prototype would look like this:

end_result = set_intersection(begin1, end1, begin2, end2, begin_result);

Here [begin1,end1) and [begin2,end2) are the input lists. The 'begin_result' is the iterator from
where the result will be written. But the size of the result is unknown, so this function returns the
end iterator of output (which determines how many elements are in the result). See the example
for usage details:

int data1[] = { 1, 2, 5, 6, 8, 9, 10 };
int data2[] = { 0, 2, 3, 4, 7, 8, 10 };

vector<int> v1(data1, data1+sizeof(data1)/sizeof(data1[0]));
vector<int> v2(data2, data2+sizeof(data2)/sizeof(data2[0]));

vector<int> tmp(max(v1.size(), v2.size()));

vector<int> res = vector<int>(tmp.begin(), set_intersection(all(v1), all(v2), tmp.begin()));

Look at the last line. We construct a new vector named 'res'. It is constructed via interval
constructor, and the beginning of the interval will be the beginning of tmp. The end of the
interval is the result of the set_intersection algorithm. This algorithm will intersect v1 and v2 and
write the result to the output iterator, starting from 'tmp.begin()'. Its return value will actually be
the end of the interval that forms the resulting dataset.

One comment that might help you understand it better: if you would like to get just the number of elements in the set intersection, use int cnt = set_intersection(all(v1), all(v2), tmp.begin()) - tmp.begin();

Actually, I would never use a construction like 'vector<int> tmp'. I don't think it's a good idea to allocate memory for each invocation of a set_* algorithm. Instead, I define a global or static variable of appropriate type and sufficient size. See below:

set<int> s1, s2;


for(int i = 0; i < 500; i++) {
s1.insert(i*(i+1) % 1000);
s2.insert(i*i*i % 1000);
}

static int temp[5000]; // greater than we need

vector<int> res = vi(temp, set_symmetric_difference(all(s1), all(s2), temp));
int cnt = set_symmetric_difference(all(s1), all(s2), temp) - temp;

Here 'res' will contain the symmetric difference of the input datasets.

Remember, input datasets need to be sorted to use these algorithms. So, another important thing
to remember is that, because sets are always ordered, we can use set-s (and even map-s, if you
are not scared by pairs) as parameters for these algorithms.

These algorithms work in a single pass, in O(N1+N2), where N1 and N2 are the sizes of the input datasets.

Calculating Algorithms

Yet another interesting algorithm is accumulate(...). If called for a vector of int-s with the third parameter zero, accumulate(...) will return the sum of the elements in the vector:

vector<int> v;
// ...
int sum = accumulate(all(v), 0);

The result of accumulate() call always has the type of its third argument. So, if you are not sure
that the sum fits in integer, specify the third parameter's type directly:

vector<int> v;
// ...
long long sum = accumulate(all(v), (long long)0);

Accumulate can even calculate the product of values. The fourth parameter holds the binary operation to use in the calculation. So, if you want the product:

vector<int> v;
// ...
double product = accumulate(all(v), double(1), multiplies<double>());
// don't forget to start with 1 !

Another interesting algorithm is inner_product(...). It calculates the scalar product of two


intervals. For example:

vector<int> v1;
vector<int> v2;
for(int i = 0; i < 3; i++) {

v1.push_back(10-i);
v2.push_back(i+1);
}
int r = inner_product(all(v1), v2.begin(), 0);

'r' will hold (v1[0]*v2[0] + v1[1]*v2[1] + v1[2]*v2[2]), or (10*1+9*2+8*3), which is 52.

As for accumulate, the type of the return value of inner_product is defined by the last parameter, which is the initial value for the result. So, you may use inner_product for the hyperplane object in multidimensional space: just write inner_product(all(normal), point.begin(), -shift).

It should be clear to you now that inner_product requires only the increment operation from iterators, so queues and sets can also be used as parameters. A convolution filter for calculating a nontrivial median value could look like this:

set<int> ordered_data(all(data));
int n = sz(data); // int n = int(data.size());
vector<int> convolution_kernel(n);
for(int i = 0; i < n; i++) {
convolution_kernel[i] = (i+1)*(n-i);
}
double result = double(inner_product(all(ordered_data),
convolution_kernel.begin(), 0)) / accumulate(all(convolution_kernel), 0);

Of course, this code is just an example -- practically speaking, it would be faster to copy values
to another vector and sort it.

It's also possible to write a construction like this:

vector<int> v;
// ...
int r = inner_product(all(v), v.rbegin(), 0);

This will evaluate V[0]*V[N-1] + V[1]*V[N-2] + ... + V[N-1]*V[0], where N is the number of elements in 'v'.

Nontrivial Sorting

Actually, sort(...) uses the same technique as all STL:

all comparison is based on 'operator <'

This means that you only need to override 'operator <'. Sample code follows:

struct fraction {
int n, d; // (n/d)

// ...
bool operator < (const fraction& f) const {
if(false) {
return (double(n)/d) < (double(f.n)/f.d);
// Try to avoid this, you're the topcoder!
}
else {
return n*f.d < f.n*d;
}
}
};

// ...

vector<fraction> v;

// ...

sort(all(v));

In cases of nontrivial fields, your object should have default and copy constructor (and, maybe,
assignment operator -- but this comment is not for topcoders).

Remember the prototype of 'operator <' : return type bool, const modifier, parameter const
reference.

Another possibility is to create the comparison functor. Special comparison predicate may be
passed to the sort(...) algorithm as a third parameter. Example: sort points (that are
pair<double,double>) by polar angle.

typedef pair<double, double> dd;

const double epsilon = 1e-6;

struct sort_by_polar_angle {
dd center;
// Constructor from any iterator type
// Just find and store the center
template<typename T> sort_by_polar_angle(T b, T e) {
int count = 0;
center = dd(0,0);
while(b != e) {
center.first += b->first;
center.second += b->second;
b++;
count++;
}
double k = count ? (1.0/count) : 0;
center.first *= k;
center.second *= k;
}
// Compare two points, return true if the first one is earlier
// than the second one looking by polar angle
// Remember that when writing a comparator, you should
// override not operator < but operator ()

bool operator () (const dd& a, const dd& b) const {
double p1 = atan2(a.second-center.second, a.first-center.first);
double p2 = atan2(b.second-center.second, b.first-center.first);
return p1 + epsilon < p2;
}
};

// ...

vector<dd> points;

// ...

sort(all(points), sort_by_polar_angle(all(points)));

This code example is complex enough, but it does demonstrate the abilities of STL. I should
point out that, in this sample, all code will be inlined during compilation, so it's actually really
fast.

Also remember that 'operator <' should always return false for equal objects. This is very important; for the reason why, see the next section.

Using your own objects in Maps and Sets

Elements in set and map are ordered. It's the general rule. So, if you want to enable using of your
objects in set or map you should make them comparable. You already know the rule of
comparisons in STL:

all comparison is based on 'operator <'

Again, you should understand it in this way: "I only need to implement operator < for objects to
be stored in set/map."

Imagine you are going to make a 'struct point' (or 'class point'). We want to intersect some line segments and make a set of intersection points (sound familiar?). Due to finite computer precision, some points will be the same while their coordinates differ a bit. Here is what you should write:

const double epsilon = 1e-7;

struct point {
double x, y;

// ...

// Declare operator < taking precision into account



bool operator < (const point& p) const {


if(x < p.x - epsilon) return true;
if(x > p.x + epsilon) return false;
if(y < p.y - epsilon) return true;
if(y > p.y + epsilon) return false;
return false;
}
};

Now you can use set<point> or map<point, string>, for example, to look up whether some point is already present in the list of intersections. An even more advanced approach: use map<point, vector<int> > and store the list of indices of segments that intersect at this point.

It's an interesting concept that for STL 'equal' does not mean 'the same', but we will not delve
into it here.

Memory management in Vectors

As has been said, vector does not reallocate memory on each push_back(). Indeed, when push_back() is invoked, vector really allocates more memory than is needed for one additional element. Most STL implementations of vector double the allocated size when push_back() is invoked and no spare memory is left. This may not be good for practical purposes, because your program may eat up twice as much memory as you need. There are two easy ways to deal with it, and one complex way to solve it.

The first approach is to use the reserve() member function of vector. This function asks vector to allocate additional memory. Vector will not enlarge on push_back() operations until the size specified by reserve() is reached.

Consider the following example. You have a vector of 1,000 elements and its allocated size is
1024. You are going to add 50 elements to it. If you call push_back() 50 times, the allocated size
of vector will be 2048 after this operation. But if you write

v.reserve(1050);

before the series of push_back(), vector will have an allocated size of exactly 1050 elements.

If you are a rapid user of push_back(), then reserve() is your friend.

By the way, it's a good pattern to use v.reserve() followed by copy(..., back_inserter(v)) for vectors.

Another situation: after some manipulations with vector you have decided that no more adding
will occur to it. How do you get rid of the potential allocation of additional memory? The
solution follows:

vector<int> v;
// ...
vector<int>(all(v)).swap(v);

This construction means the following: create a temporary vector with the same contents as v, and then swap this temporary vector with 'v'. After the swap, the original oversized v will be disposed of. But, most likely, you won't need this during SRMs.

The proper and complex solution is to develop your own allocator for the vector, but that's
definitely not a topic for a topcoder STL tutorial.

Implementing real algorithms with STL

Armed with STL, let's go on to the most interesting part of this tutorial: how to implement real
algorithms efficiently.

Depth-first search (DFS)

I will not explain the theory of DFS here (instead, read the relevant section of gladius's Introduction to Graphs and Data Structures tutorial), but I will show you how STL can help.

At first, imagine we have an undirected graph. The simplest way to store a graph in STL is to use lists of vertices adjacent to each vertex. This leads to the vector< vector<int> > W structure, where W[i] is the list of vertices adjacent to i. Let's verify our graph is connected via DFS:

/*
Reminder from Part 1:
typedef vector<int> vi;
typedef vector<vi> vvi;
*/

int N; // number of vertices


vvi W; // graph
vi V; // V is a visited flag

void dfs(int i) {
if(!V[i]) {
V[i] = true;
for_each(all(W[i]), dfs);
}
}

bool check_graph_connected_dfs() {
int start_vertex = 0;
V = vi(N, false);
dfs(start_vertex);

return (find(all(V), 0) == V.end());


}

That's all. The STL algorithm 'for_each' calls the specified function, 'dfs', for each element in the range. In the check_graph_connected_dfs() function we first create the visited array (of correct size and filled with zeroes). After DFS we have either visited all vertices or not; this is easy to determine by searching for at least one zero in V, by means of a single call to find().

A note on for_each: the last argument of this algorithm can be almost anything that can be called like a function. It may be not only a global function, but also adapters, standard algorithms, and even member functions. In the last case you will need the mem_fun or mem_fun_ref adapters, but we will not touch on those now.

One note on this code: I don't recommend the use of vector<bool>. Although in this particular case it's quite safe, you're better off not using it. Use the predefined vi (vector<int>). It's quite OK to assign true and false to ints in vi. Of course, it requires 8*sizeof(int) = 8*4 = 32 times more memory, but it works well in most cases and is quite fast on topcoder.

A word on other container types and their usage

Vector is so popular because it's the simplest array container. In most cases you only require the functionality of an array from vector; but, sometimes, you may need a more advanced container.

It is not good practice to begin investigating the full functionality of some STL container during
the heat of a Single Round Match. If you are not familiar with the container you are about to use,
you'd be better off using vector or map/set. For example, stack can always be implemented via vector, and it's much faster to act this way if you don't remember the syntax of the stack container.

STL provides the following containers: list, stack, queue, deque, priority_queue. I've found list and deque quite useless in SRMs (except, probably, for very special tasks based on these containers). But queue and priority_queue are worth saying a few words about.

Queue

Queue is a data type that has three operations, all in O(1) amortized:

add an element to the back (push)
remove an element from the front (pop)
get the first unfetched element (front)

In other words, queue is a FIFO buffer.

Breadth-first search (BFS)

Again, if you are not familiar with the BFS algorithm, please refer back to this topcoder
tutorial first. Queue is very convenient to use in BFS, as shown below:

/*
Graph is considered to be stored as an adjacency list.
Also we consider the graph to be undirected.

vvi is vector< vector<int> >
W[v] is the list of vertices adjacent to v
*/

int N; // number of vertices


vvi W; // lists of adjacent vertices

bool check_graph_connected_bfs() {
int start_vertex = 0;
vi V(N, false);
queue<int> Q;
Q.push(start_vertex);
V[start_vertex] = true;
while(!Q.empty()) {
int i = Q.front();
// get the head element from the queue
Q.pop();
tr(W[i], it) {
if(!V[*it]) {
V[*it] = true;
Q.push(*it);
}
}
}
return (find(all(V), 0) == V.end());
}

More precisely, queue supports front(), back(), push() (== push_back()), and pop() (== pop_front()). If you also need push_front() and pop_back(), use deque. Deque provides the listed operations in O(1) amortized.

There is an interesting application of queue and map when implementing a shortest path search
via BFS in a complex graph. Imagine that we have the graph, vertices of which are referenced by
some complex object, like:

pair< pair<int,int>, pair< string, vector< pair<int, int> > > >

(this case is quite usual: a complex data structure may define the position in some game, a Rubik's cube situation, etc.)

Consider that we know the path we are looking for is quite short, and the total number of positions is also small. If all edges of this graph have the same length of 1, we could use BFS to find a way in this graph. A section of pseudocode follows:

// Some very hard data structure

typedef pair< pair<int,int>, pair< string, vector< pair<int, int> > > > POS;

// ...

int find_shortest_path_length(POS start, POS finish) {

map<POS, int> D;
// shortest path length to this position
queue<POS> Q;

D[start] = 0; // start from here


Q.push(start);

while(!Q.empty()) {
POS current = Q.front();
// Peek the front element
Q.pop(); // remove it from queue

int current_length = D[current];

if(current == finish) {
return D[current];
// shortest path is found, return its length
}

tr(all possible paths from 'current', it) {
if(!D.count(*it)) {
// same as if(D.find(*it) == D.end()), see Part I
// This location was not visited yet
D[*it] = current_length + 1;
Q.push(*it);
}
}
}

// Path was not found


return -1;
}

// ...

If the edges have different lengths, however, BFS will not work. We should use Dijkstra instead.
It's possible to implement such a Dijkstra via priority_queue -- see below.

Priority_Queue

Priority queue is a binary heap: a data structure that can perform three operations:

push any element (push)
view the top element (top)
pop the top element (pop)

For the application of STL's priority_queue see the TrainRobber problem from SRM 307.

Dijkstra

In the last part of this tutorial I'll describe how to efficiently implement Dijkstra's algorithm in a sparse graph using STL containers. Please look through this tutorial for information on Dijkstra's algorithm.

Consider we have a weighted directed graph that is stored as vector< vector< pair<int,int> > > G,
where

G.size() is the number of vertices in our graph
G[i].size() is the number of vertices directly reachable from the vertex with index i
G[i][j].first is the index of the j-th vertex reachable from vertex i
G[i][j].second is the length of the edge heading from vertex i to vertex G[i][j].first

We assume this, as defined in the following two code snippets:

typedef pair<int,int> ii;
typedef vector<ii> vii;
typedef vector<vii> vvii;

Dijkstra via priority_queue

Many thanks to misof for spending the time to explain to me why the complexity of this
algorithm is good despite not removing deprecated entries from the queue.

vi D(N, 987654321);
// distance from start vertex to each vertex

priority_queue<ii, vector<ii>, greater<ii> > Q;

// priority_queue with reverse comparison operator,
// so top() will return the least distance
// initialize the start vertex, suppose it's zero
D[0] = 0;
Q.push(ii(0,0));

// iterate while queue is not empty
while(!Q.empty()) {

// fetch the nearest element
ii top = Q.top();
Q.pop();

// v is vertex index, d is the distance
int v = top.second, d = top.first;

// this check is very important
// we analyze each vertex only once
// the other occurrences of it on the queue (added earlier)
// will have greater distance
if(d <= D[v]) {
// iterate through all outcoming edges from v
tr(G[v], it) {
int v2 = it->first, cost = it->second;
if(D[v2] > D[v] + cost) {
// update distance if possible
D[v2] = D[v] + cost;
// add the vertex to queue
Q.push(ii(D[v2], v2));

}
}
}
}

I will not comment on the algorithm itself in this tutorial, but you should notice the
priority_queue object definition. Normally, priority_queue<ii> will work, but the top() member
function will return the largest element, not the smallest. Yes, one of the easy solutions I often
use is just to store not distance but (-distance) in the first element of a pair. But if you want to
implement it in the proper way, you need to change the comparison operation of priority_queue to the reverse one. The comparison function is the third template parameter of priority_queue, while the second parameter is the storage type for the container. So, you should write priority_queue<ii, vector<ii>, greater<ii> >.

Dijkstra via set

Petr gave me this idea when I asked him about an efficient Dijkstra implementation in C#. While implementing Dijkstra we use the priority_queue to add elements to the queue of vertices being analyzed in O(log N) and fetch them in O(log N). But there is a container besides priority_queue that can provide us with this functionality: it's set! I've experimented a lot and found that the performance of Dijkstra based on priority_queue and on set is the same.

So, here's the code:

vi D(N, 987654321);

// start vertex
set<ii> Q;
D[0] = 0;
Q.insert(ii(0,0));

while(!Q.empty()) {

// again, fetch the element closest to start
// from the queue organized via a set
ii top = *Q.begin();
Q.erase(Q.begin());
int v = top.second, d = top.first;

// here we do not need to check whether the distance
// is optimal, because new vertices are always
// added to the queue in the proper way in this implementation

tr(G[v], it) {
int v2 = it->first, cost = it->second;
if(D[v2] > D[v] + cost) {
// this operation can not be done with priority_queue,
// because it does not support DECREASE_KEY
if(D[v2] != 987654321) {
Q.erase(Q.find(ii(D[v2],v2)));
}
D[v2] = D[v] + cost;
Q.insert(ii(D[v2], v2));
}
}
}

One more important thing: STL's priority_queue does not support the DECREASE_KEY operation. If you need this operation, set may be your best bet.

I've spent a lot of time trying to understand why the code that removes elements from the queue (with set) works as fast as the first one.

These two implementations have the same complexity and run in the same time. Also, I've set up practical experiments, and the performance is exactly the same (the difference is about ~0.1% of time).

As for me, I prefer to implement Dijkstra via set, because with set the logic is simpler to understand, and we don't need to remember about overriding the greater<int> predicate.

What is not included in STL

If you have made it this far in the tutorial, I hope you have seen that STL is a very powerful tool,
especially for topcoder SRMs. But before you embrace STL wholeheartedly, keep in mind what
is NOT included in it.

First, STL does not have BigIntegers. If a task in an SRM calls for huge calculations, especially multiplication and division, you have three options:

use a pre-written template
use Java, if you know it well
say "Well, it was definitely not my SRM!"

I would recommend option number one.

Nearly the same issue arises with the geometry library. STL does not have geometry support, so
you have those same three options again.

The last thing, and sometimes a very annoying thing, is that STL does not have a built-in string splitting function. This is especially annoying, given that this function is included in the default template for C++ in the ExampleBuilder plugin! But actually I've found that the use of istringstream(s) in trivial cases and sscanf(s.c_str(), ...) in complex cases is sufficient.

Those caveats aside, though, I hope you have found this tutorial useful, and I hope you find the
STL a useful addition to your use of C++. Best of luck to you in the Arena!

Note from the author: In both parts of this tutorial I recommend the use of some templates to
minimize the time required to implement something. I must say that this suggestion should
always be up to the coder. Aside from whether templates are a good or bad tactic for SRMs, in
everyday life they can become annoying for other people who are trying to understand your
code. While I did rely on them for some time, ultimately I reached the decision to stop. I
encourage you to weigh the pros and cons of templates and to consider this decision for
yourself.

Maximum Bipartite Matching


Geeksforgeeks.org

A matching in a Bipartite Graph is a set of edges chosen in such a way that no two edges share an endpoint. A maximum matching is a matching of maximum size (maximum number of edges). In other words, a matching is maximum if, when any edge is added to it, it is no longer a matching. There can be more than one maximum matching for a given Bipartite Graph.

Why do we care?
There are many real world problems that can be formulated as Bipartite Matching. For example, consider the following problem:
There are M job applicants and N jobs. Each applicant has a subset of jobs that he/she is interested in. Each job opening can only accept one applicant and a job applicant can be appointed to only one job. Find an assignment of jobs to applicants such that as many applicants as possible get jobs.

We strongly recommend reading the following post first.

Ford-Fulkerson Algorithm for Maximum Flow Problem

Maximum Bipartite Matching and Max Flow Problem


The Maximum Bipartite Matching (MBP) problem can be solved by converting it into a flow network (see this video to learn how we arrive at this conclusion). Following are the steps.

1) Build a Flow Network

There must be a source and a sink in a flow network. So we add a source and add edges from the source to all applicants. Similarly, we add edges from all jobs to the sink. The capacity of every edge is marked as 1 unit.

2) Find the maximum flow.


We use Ford-Fulkerson algorithm to find the maximum flow in the flow network built in step 1.
The maximum flow is actually the MBP we are looking for.

How to implement the above approach?


Let us first define input and output forms. Input is in the form of an Edmonds matrix, a 2D array bpGraph[M][N] with M rows (for M job applicants) and N columns (for N jobs). The value bpGraph[i][j] is 1 if the i-th applicant is interested in the j-th job, otherwise 0.
Output is the maximum number of applicants that can get jobs.

A simple way to implement this is to create a matrix that represents adjacency matrix
representation of a directed graph with M+N+2 vertices. Call the fordFulkerson() for the matrix.
This implementation requires O((M+N)*(M+N)) extra space.

Extra space can be reduced and the code simplified using the fact that the graph is bipartite and the capacity of every edge is either 0 or 1. The idea is to use a DFS traversal to find a job for an applicant (similar to an augmenting path in Ford-Fulkerson). We call bpm() for every applicant; bpm() is the DFS based function that tries all possibilities to assign a job to the applicant.

In bpm(), we try, one by one, all the jobs that an applicant u is interested in, until we find a job or all jobs are tried without luck. For every job we try, we do the following:
If a job is not assigned to anybody, we simply assign it to the applicant and return true. If a job is assigned to somebody else, say x, then we recursively check whether x can be assigned some other job. To make sure that x doesn't get the same job again, we mark job v as seen before we make the recursive call for x. If x can get another job, we change the applicant for job v and return true. We use an array matchR[0..N-1] that stores the applicants assigned to different jobs.

If bpm() returns true, it means that there is an augmenting path in the flow network, and 1 unit of flow is added to the result in maxBPM().

// A C++ program to find maximal Bipartite matching.
#include <iostream>
#include <string.h>
using namespace std;

// M is number of applicants and N is number of jobs
#define M 6
#define N 6

// A DFS based recursive function that returns true if a
// matching for vertex u is possible
bool bpm(bool bpGraph[M][N], int u, bool seen[], int matchR[])
{
    // Try every job one by one
    for (int v = 0; v < N; v++)
    {
        // If applicant u is interested in job v and v is
        // not visited
        if (bpGraph[u][v] && !seen[v])
        {
            seen[v] = true; // Mark v as visited

            // If job 'v' is not assigned to an applicant OR
            // previously assigned applicant for job v (which is matchR[v])
            // has an alternate job available.
            // Since v is marked as visited in the above line, matchR[v]
            // in the following recursive call will not get job 'v' again
            if (matchR[v] < 0 || bpm(bpGraph, matchR[v], seen, matchR))
            {
                matchR[v] = u;
                return true;
            }
        }
    }
    return false;
}

// Returns maximum number of matching from M to N
int maxBPM(bool bpGraph[M][N])
{
    // An array to keep track of the applicants assigned to
    // jobs. The value of matchR[i] is the applicant number
    // assigned to job i, the value -1 indicates nobody is
    // assigned.
    int matchR[N];

    // Initially all jobs are available
    memset(matchR, -1, sizeof(matchR));

    int result = 0; // Count of jobs assigned to applicants
    for (int u = 0; u < M; u++)
    {
        // Mark all jobs as not seen for next applicant.
        bool seen[N];
        memset(seen, 0, sizeof(seen));

        // Find if the applicant 'u' can get a job
        if (bpm(bpGraph, u, seen, matchR))
            result++;
    }
    return result;
}

// Driver program to test above functions
int main()
{
    // Let us create a bpGraph shown in the above example
    bool bpGraph[M][N] = { {0, 1, 1, 0, 0, 0},
                           {1, 0, 0, 1, 0, 0},
                           {0, 0, 1, 0, 0, 0},
                           {0, 0, 1, 1, 0, 0},
                           {0, 0, 0, 0, 0, 0},
                           {0, 0, 0, 0, 0, 1} };

    cout << "Maximum number of applicants that can get job is "
         << maxBPM(bpGraph);
    return 0;
}

Output:

Maximum number of applicants that can get job is 5

Manacher's Algorithm
The Longest Palindrome Substring (Manacher's algorithm)

Given a string, find a longest palindrome substring.


We could use a generalized suffix tree that stores the original string and its reverse, which gives an O(N) algorithm. However, here we give a better one with less space overhead that is still O(N): Manacher's algorithm. If we check the string from left to right, we can leverage the palindrome checks we did previously; this follows from the symmetry of palindromes. The main idea is as follows:

Create an array P[], where P[i] is the length of the longest palindrome centered at location i. Here i is not an index into the original string: the locations we check for palindromes include both the characters of the string and the gaps between characters, so for a string of length l, P[] has length 2*l+1.
Our goal is to fill in P[]. For a particular position, we compare its left and right neighbors; if they are equal, we extend the check further. Otherwise, the longest palindrome centered at that location has been found.
However, we can be smarter: we can leverage previously computed P[i] when we calculate P[x] for x > i.
So we add two pointers, p1 and p2, which point to the left and right of the current location i such that |i-p1| = |i-p2| and p2 > i > p1. We know p1 refers to a palindrome t and i refers to a palindrome s. If the first character of t is strictly to the right of the first character of s, then P[p2] = P[p1].
Otherwise, if the first character of t is not strictly to the right of the first character of s, we have P[p2] >= r - p2, where r is the right bound of the palindrome centered at i. We then need to check whether the palindrome at p2 can be longer than r - p2. The good thing is that we only need to examine the characters beyond the length r - p2.
When the first character of t is strictly to the right of the first character of s, we do not need to move the current center (i). Only when the first character of t is not strictly to the right of the first character of s do we move the current center to p2.
The total cost is O(N).

The code is as follows:

void manacher(const string &s)
{
    int len = s.size();
    if(len == 0) return;

    int m[2*len+1];
    m[0] = 0;
    m[1] = 1;
    // "cur" is the current center
    // "r" is the right bound of the palindrome
    // that is centered at the current center
    int cur, r;
    r = 2;
    cur = 1;

    for(int p2 = 2; p2 < 2*len+1; p2++)
    {
        int p1 = cur - (p2-cur);
        // if p1 is negative, we need to
        // move "cur" forward
        while(p1 < 0)
        {
            cur++;
            r = m[cur] + cur;
            p1 = cur - (p2-cur);
        }

        // If the first character of t is
        // strictly on the right of the
        // first character of s
        if(m[p1] < r - p2)
            m[p2] = m[p1];
        // otherwise
        else
        {
            // reset "cur"
            cur = p2;
            int k = r - p2;
            if(k < 0) k = 0;
            while(1)
            {
                if((p2+k+1) & 1)
                {
                    if(p2+k+1 < 2*len+1 && p2-k-1 >= 0 && s[(p2+k)/2] == s[(p2-k-2)/2])
                        k++;
                    else break;
                }
                else
                {
                    if(p2+k+1 < 2*len+1 && p2-k-1 >= 0)
                        k++;
                    else break;
                }
            }
            r = p2 + k;
            m[p2] = k;
        }
    }
}

Stable Marriage Problem


geeksforgeeks.org

Given N men and N women, where each person has ranked all members of the opposite sex in
order of preference, marry the men and women together such that there are no two people of
opposite sex who would both rather have each other than their current partners. If there are no
such people, all the marriages are stable (Source Wiki).

Consider the following example.



Let there be two men m1 and m2, and two women w1 and w2.
Let m1's list of preferences be {w1, w2}
Let m2's list of preferences be {w1, w2}
Let w1's list of preferences be {m1, m2}
Let w2's list of preferences be {m1, m2}

The matching { {m1, w2}, {w1, m2} } is not stable because m1 and w1 would prefer each other
over their assigned partners. The matching {m1, w1} and {m2, w2} is stable because there are
no two people of opposite sex that would prefer each other over their assigned partners.

It is always possible to form stable marriages from lists of preferences (see references for proof). Following is the Gale-Shapley algorithm to find a stable matching:
The idea is to iterate over all free men while any free man is available. Every free man proposes to the women in his preference list, in order. For every woman he proposes to, he checks if the woman is free; if yes, they both become engaged. If the woman is not free, then she either says no to him or dumps her current engagement according to her preference list. So an engagement done once can be broken if a woman gets a better option. Following is the complete algorithm from Wiki:

Initialize all men and women to free
while there exist a free man m who still has a woman w to propose to
{
    w = m's highest ranked such woman to whom he has not yet proposed
    if w is free
        (m, w) become engaged
    else some pair (m', w) already exists
        if w prefers m to m'
            (m, w) become engaged
            m' becomes free
        else
            (m', w) remain engaged
}

Input & Output: Input is a 2D matrix of size (2*N)*N where N is the number of women or men. Rows from 0 to N-1 represent preference lists of men and rows from N to 2*N-1 represent preference lists of women. So men are numbered from 0 to N-1 and women are numbered from N to 2*N-1. The output is the list of married pairs.

Following is C++ implementation of the above algorithm.

// C++ program for stable marriage problem


#include <iostream>
#include <string.h>
#include <stdio.h>
using namespace std;

// Number of Men or Women


#define N 4

// This function returns true if woman 'w' prefers man 'm1' over man 'm'
bool wPrefersM1OverM(int prefer[2*N][N], int w, int m, int m1)
{
    // Walk through w's preference list (most preferred first)
    for (int i = 0; i < N; i++)
    {
        // If m1 comes before m in the list of w, then w prefers her
        // current engagement, don't do anything
        if (prefer[w][i] == m1)
            return true;

        // If m comes before m1 in w's list, then free her current
        // engagement and engage her with m
        if (prefer[w][i] == m)
            return false;
    }
    return false; // not reached if both men appear in w's list
}

// Prints stable matching for N men and N women. Men are numbered 0 to
// N-1. Women are numbered N to 2*N-1.
void stableMarriage(int prefer[2*N][N])
{
    // Stores partners of women. This is our output array that
    // stores pairing information. The value of wPartner[i]
    // indicates the partner assigned to woman N+i. Note that
    // the woman numbers are between N and 2*N-1. The value -1
    // indicates that the (N+i)'th woman is free.
    int wPartner[N];

    // An array to store availability of men. If mFree[i] is
    // false, then man 'i' is free, otherwise engaged.
    bool mFree[N];

    // Initialize all men and women as free
    memset(wPartner, -1, sizeof(wPartner));
    memset(mFree, false, sizeof(mFree));
    int freeCount = N;

    // While there are free men
    while (freeCount > 0)
    {
        // Pick the first free man (we could pick any)
        int m;
        for (m = 0; m < N; m++)
            if (mFree[m] == false)
                break;

        // One by one go to all women according to m's preferences.
        // Here m is the picked free man.
        for (int i = 0; i < N && mFree[m] == false; i++)
        {
            int w = prefer[m][i];

            // If the woman of preference is free, w and m become
            // partners (note that the partnership may be changed
            // later, so we can say they are engaged, not married)
            if (wPartner[w-N] == -1)
            {
                wPartner[w-N] = m;
                mFree[m] = true;
                freeCount--;
            }
            else // If w is not free
            {
                // Find current engagement of w
                int m1 = wPartner[w-N];

                // If w prefers m over her current engagement m1,
                // then break the engagement between w and m1 and
                // engage m with w.
                if (wPrefersM1OverM(prefer, w, m, m1) == false)
                {
                    wPartner[w-N] = m;
                    mFree[m] = true;
                    mFree[m1] = false;
                }
            } // End of else
        } // End of the for loop that goes to all women in m's list
    } // End of main while loop

    // Print the solution
    cout << "Woman Man" << endl;
    for (int i = 0; i < N; i++)
        cout << " " << i+N << "\t" << wPartner[i] << endl;
}

// Driver program to test above functions


int main()
{
    int prefer[2*N][N] = { {7, 5, 6, 4},
                           {5, 4, 6, 7},
                           {4, 5, 6, 7},
                           {4, 5, 6, 7},
                           {0, 1, 2, 3},
                           {0, 1, 2, 3},
                           {0, 1, 2, 3},
                           {0, 1, 2, 3},
                         };
    stableMarriage(prefer);
    return 0;
}

Output:

Woman Man
 4    2
 5    1
 6    3
 7    0

Hungarian Algorithm
Topcoder.com

By x-ray
TopCoder Member

Introduction

Are you familiar with the following situation? You open the Div I Medium and don't know how
to approach it, while a lot of people in your room submitted it in less than 10 minutes. Then, after
the contest, you find out in the editorial that this problem can be simply reduced to a classical
one. If yes, then this tutorial will surely be useful for you.

Problem statement

In this article we'll deal with one optimization problem, which can be informally defined as:

Assume that we have N workers and N jobs that should be done. For each pair (worker, job) we
know the salary that should be paid to the worker for him to perform the job. Our goal is to
complete all jobs minimizing the total cost, while assigning each worker to exactly one job and
vice versa.

Converting this problem to a formal mathematical definition we can form the following
equations:

C = (cij) - cost matrix, where cij is the cost for worker i to perform job j.

X = (xij) - resulting binary matrix, where xij = 1 if and only if the ith worker is assigned to the jth job.

sum over j of xij = 1 for each i - one worker to one job assignment.

sum over i of xij = 1 for each j - one job to one worker assignment.

minimize sum over i,j of cij * xij - total cost function.

We can also rephrase this problem in terms of graph theory. Let's look at the jobs and workers as
if they were a bipartite graph, where each edge between the ith worker and jth job has weight
cij. Then our task is to find a minimum-weight matching in the graph (the matching will consist of
N edges, because our bipartite graph is complete).

Small example just to make things clearer:

General description of the algorithm

This problem is known as the assignment problem. The assignment problem is a special case of
the transportation problem, which in turn is a special case of the min-cost flow problem, so it can
be solved using algorithms that solve the more general cases. Also, our problem is a special case
of the binary integer linear programming problem (which is NP-hard). But, due to the specifics of
the problem, there are more efficient algorithms to solve it. We'll handle the assignment problem
with the Hungarian algorithm (or Kuhn-Munkres algorithm). I'll illustrate two different
implementations of this algorithm, both graph theoretic: one easy and fast to implement with
O(n^4) complexity, and the other one with O(n^3) complexity, but harder to implement.

There are also implementations of the Hungarian algorithm that do not use graph theory. Rather,
they just operate on the cost matrix, making different transformations of it (see [1] for a clear
explanation). We won't touch these approaches, because they're less practical for TopCoder needs.

O(n^4) algorithm explanation

As mentioned above, we are dealing with a bipartite graph. The main idea of the method is the
following: suppose we've found a perfect matching using only edges of weight 0 (hereinafter
called "0-weight edges"). Obviously, these edges will be the solution of the assignment problem.
If we can't find a perfect matching on the current step, then the Hungarian algorithm changes the
weights of the available edges in such a way that new 0-weight edges appear and these
changes do not influence the optimal solution.

To clarify, let's look at the step-by-step overview:

Step 0)

A. For each vertex from left part (workers) find the minimal outgoing edge and subtract its
weight from all weights connected with this vertex. This will introduce 0-weight edges (at least
one).

B. Apply the same procedure for the vertices in the right part (jobs).

Actually, this step is not necessary, but it decreases the number of main cycle iterations.

Step 1)

A. Find the maximum matching using only 0-weight edges (for this purpose you can use a max-
flow algorithm, an augmenting path algorithm, etc.).

B. If it is perfect, then the problem is solved. Otherwise find the minimum vertex cover V (for
the subgraph with 0-weight edges only); the best way to do this is to use König's theorem.

Step 2) Let delta be the minimum weight among the edges not covered by V (edges with neither
endpoint in the vertex cover), and adjust the weights using the following rule: subtract delta from
the weight of every edge with neither endpoint in V, and add delta to the weight of every edge
with both endpoints in V (edges with exactly one endpoint in V are unchanged).

Step 3) Repeat Step 1 until solved.

But there is a nuance here; finding the maximum matching in step 1 on each iteration will cause
the algorithm to become O(n^5). In order to avoid this, on each step we can just modify the
matching from the previous step, which only takes O(n^2) operations.

It's easy to see that no more than n^2 iterations will occur, because every time at least one edge
becomes 0-weight. Therefore, the overall complexity is O(n^4).

O(n^3) algorithm explanation

Warning! In this section we will deal with the maximum-weighted matching problem. It's
obviously easy to transform the minimum problem to the maximum one, just by setting:

    w'(x,y) = -w(x,y)    or    w'(x,y) = M - w(x,y), where M = max w(x,y)

Before discussing the algorithm, let's take a look at some of the theoretical ideas. Let's start off
by considering a complete bipartite graph G=(V,E), where V = X ∪ Y (X is the set of workers,
Y is the set of jobs, |X| = |Y| = n), and w(x,y) is the weight of edge (x,y).

Vertex and set neighborhood

Let v ∈ V. Then N(v) = {u ∈ V : (v,u) ∈ E} is v's neighborhood, or all vertices that share an edge
with v.

Let S ⊆ V. Then N(S) = the union of N(v) over all v ∈ S is S's neighborhood, or all vertices that
share an edge with a vertex in S. (When the neighborhood is taken within the equality subgraph
defined below, we write Nl(v) and Nl(S).)

Vertex labeling

This is simply a function l: V → R (for each vertex we assign some number called a label).
Let's call this labeling feasible if it satisfies the following condition:
l(x) + l(y) ≥ w(x,y) for every edge (x,y). In other words, the sum of the labels of the vertices
on both sides of a given edge is greater than or equal to the weight of that edge.

Equality subgraph

Let Gl = (V, El) be a spanning subgraph of G (in other words, it includes all vertices from G). If Gl
includes only those edges (x,y) which satisfy the following condition:
l(x) + l(y) = w(x,y), then it is an equality subgraph. In other
words, it only includes those edges of the bipartite graph on which the feasible labeling is tight.

Now we're ready for the theorem which provides the connection between equality subgraphs and
maximum-weighted matching:

If M* is a perfect matching in the equality subgraph Gl, then M* is a maximum-weighted
matching in G.

The proof is rather straightforward, but if you want you can do it for practice. Let's continue with
a few final definitions:

Alternating path and alternating tree

Consider we have a matching M (M ⊆ E, no two edges of M sharing a vertex).

A vertex v ∈ V is called matched if it is an endpoint of some edge in M; otherwise it is
called exposed (free, unmatched).

(In the diagram below, W1, W2, W3, J1, J3, J4 are matched, W4, J2 are exposed)

Path P is called alternating if its edges alternate between M and E\M. (For example, (W4, J4,
W3, J3, W2, J2) and (W4, J1, W1) are alternating paths)

If the first and last vertices in alternating path are exposed, it is called augmenting (because we
can increment the size of the matching by inverting edges along this path, therefore matching
unmatched edges and vice versa). ((W4, J4, W3, J3, W2, J2) - augmenting alternating path)

A tree which has a root in some exposed vertex, and a property that every path starting in the
root is alternating, is called an alternating tree. (Example on the picture above, with root in W4)

That's all for the theory, now let's look at the algorithm:

First let's have a look on the scheme of the Hungarian algorithm:

Step 0. Find some initial feasible vertex labeling and some initial matching.

Step 1. If M is perfect, then it's optimal, so the problem is solved. Otherwise, some exposed
x ∈ X exists; set S = {x}, T = ∅. (x is the root of the alternating tree we're going to build.)
Go to step 2.

Step 2. If Nl(S) ≠ T go to step 3, else Nl(S) = T. Find

    delta = min { l(x) + l(y) - w(x,y) : x ∈ S, y ∉ T }        (1)

and replace the existing labeling with the next one:

    l'(v) = l(v) - delta   if v ∈ S
    l'(v) = l(v) + delta   if v ∈ T                            (2)
    l'(v) = l(v)           otherwise

Now Nl'(S) ≠ T, so replace Nl(S) with Nl'(S).

Step 3. Find some vertex y ∈ Nl(S) \ T. If y is exposed, then an alternating path from x (root
of the tree) to y exists; augment the matching along this path and go to step 1. If y is matched in M
with some vertex z, add (z,y) to the alternating tree and set S = S ∪ {z}, T = T ∪ {y}; go to
step 2.

And now let's illustrate these steps by considering an example and writing some code.

As an example we'll use the previous one, but first let's transform it to the maximum-weighted
matching problem, using the second method from the two described above. (See Picture 1)

Picture 1

Here are the global variables that will be used in the code:

#define N 55              //max number of vertices in one part
#define INF 100000000     //just infinity

int cost[N][N];           //cost matrix
int n, max_match;         //n workers and n jobs
int lx[N], ly[N];         //labels of X and Y parts
int xy[N];                //xy[x] - vertex that is matched with x,
int yx[N];                //yx[y] - vertex that is matched with y
bool S[N], T[N];          //sets S and T in algorithm
int slack[N];             //as in the algorithm description
int slackx[N];            //slackx[y] such a vertex, that
                          // l(slackx[y]) + l(y) - w(slackx[y],y) = slack[y]
int prev[N];              //array for memorizing alternating paths

Step 0:

It's easy to see that the following initial labeling will be feasible:

    lx(x) = max over y of w(x,y) for every x ∈ X,    ly(y) = 0 for every y ∈ Y

And as an initial matching we'll use an empty one. So we'll get equality subgraph as on Picture 2.
The code for initializing is quite easy, but I'll paste it for completeness:

void init_labels()
{
    memset(lx, 0, sizeof(lx));
    memset(ly, 0, sizeof(ly));
    for (int x = 0; x < n; x++)
        for (int y = 0; y < n; y++)
            lx[x] = max(lx[x], cost[x][y]);
}

The next three steps will be implemented in one function, which will correspond to a single
iteration of the algorithm. When the algorithm halts, we will have a perfect matching, that's why
we'll have n iterations of the algorithm and therefore (n+1) calls of the function.

Step 1

According to this step we need to check whether the matching is already perfect, if the answer is
positive we just stop algorithm, otherwise we need to clear S, T and alternating tree and then find
some exposed vertex from the X part. Also, in this step we are initializing a slack array, I'll
describe it on the next step.

void augment()                         //main function of the algorithm
{
    if (max_match == n) return;        //check whether matching is already perfect
    int x, y, root;                    //just counters and root vertex
    int q[N], wr = 0, rd = 0;          //q - queue for bfs, wr,rd - write and read pos in queue
    memset(S, false, sizeof(S));       //init set S
    memset(T, false, sizeof(T));       //init set T
    memset(prev, -1, sizeof(prev));    //init array prev - for the alternating tree
    for (x = 0; x < n; x++)            //finding root of the tree
        if (xy[x] == -1)
        {
            q[wr++] = root = x;
            prev[x] = -2;
            S[x] = true;
            break;
        }

    for (y = 0; y < n; y++)            //initializing slack array
    {
        slack[y] = lx[root] + ly[y] - cost[root][y];
        slackx[y] = root;
    }

Step 2

On this step, the alternating tree is completely built for the current labeling, but the augmenting
path hasn't been found yet, so we need to improve the labeling. It will add new edges to the
equality subgraph, giving an opportunity to expand the alternating tree. This is the main idea of
the method; we are improving the labeling until we find an augmenting path in the equality
graph corresponding to the current labeling. Let's turn back to step 2. There we just change
labels using formulas (1) and (2), but using them in an obvious manner will cause the algorithm
to have O(n^4) time. So, in order to avoid this we use a slack array, computed in O(n) time in
step 1 and then only updated incrementally:

    slack[y] = min { lx(x) + ly(y) - w(x,y) : x ∈ S }

Then we just need O(n) to calculate a delta (see (1)):

    delta = min { slack[y] : y ∉ T }

Updating slack:
1) On step 3, when a vertex x moves from X\S to S, updating slack takes O(n).
2) On step 2, when updating the labeling, it also takes O(n), because:

    slack[y] = slack[y] - delta   for every y ∉ T

So we get O(n) instead of O(n^2) as in the straightforward approach.


Here's code for the label updating function:

void update_labels()
{
    int x, y, delta = INF;          //init delta as infinity
    for (y = 0; y < n; y++)         //calculate delta using slack
        if (!T[y])
            delta = min(delta, slack[y]);
    for (x = 0; x < n; x++)         //update X labels
        if (S[x]) lx[x] -= delta;
    for (y = 0; y < n; y++)         //update Y labels
        if (T[y]) ly[y] += delta;
    for (y = 0; y < n; y++)         //update slack array
        if (!T[y])
            slack[y] -= delta;
}

Step 3

In step 3, first we build an alternating tree starting from some exposed vertex, chosen at the
beginning of each iteration. We will do this using breadth-first search algorithm. If on some step
we meet an exposed vertex from the Y part, then finally we can augment our path, finishing up
with a call to the main function of the algorithm. So the code will be the following:

1) Here's the function that adds new edges to the alternating tree:

void add_to_tree(int x, int prevx)
//x - current vertex, prevx - vertex from X before x in the alternating path,
//so we add edges (prevx, xy[x]), (xy[x], x)
{
    S[x] = true;                    //add x to S
    prev[x] = prevx;                //we need this when augmenting
    for (int y = 0; y < n; y++)     //update slacks, because we add new vertex to S
        if (lx[x] + ly[y] - cost[x][y] < slack[y])
        {
            slack[y] = lx[x] + ly[y] - cost[x][y];
            slackx[y] = x;
        }
}

2) And now, the end of the augment() function:

//second part of augment() function
    while (true)                                       //main cycle
    {
        while (rd < wr)                                //building tree with bfs cycle
        {
            x = q[rd++];                               //current vertex from X part
            for (y = 0; y < n; y++)                    //iterate through all edges in equality graph
                if (cost[x][y] == lx[x] + ly[y] && !T[y])
                {
                    if (yx[y] == -1) break;            //an exposed vertex in Y found, so
                                                       //augmenting path exists!
                    T[y] = true;                       //else just add y to T,
                    q[wr++] = yx[y];                   //add vertex yx[y], which is matched
                                                       //with y, to the queue
                    add_to_tree(yx[y], x);             //add edges (x,y) and (y,yx[y]) to the tree
                }
            if (y < n) break;                          //augmenting path found!
        }
        if (y < n) break;                              //augmenting path found!

        update_labels();                               //augmenting path not found, so improve labeling
        wr = rd = 0;
        for (y = 0; y < n; y++)
            //in this cycle we add edges that were added to the equality graph as a
            //result of improving the labeling, we add edge (slackx[y], y) to the tree if
            //and only if !T[y] && slack[y] == 0, also with this edge we add another one
            //(y, yx[y]) or augment the matching, if y was exposed
            if (!T[y] && slack[y] == 0)
            {
                if (yx[y] == -1)                       //exposed vertex in Y found - augmenting path exists!
                {
                    x = slackx[y];
                    break;
                }
                else
                {
                    T[y] = true;                       //else just add y to T,
                    if (!S[yx[y]])
                    {
                        q[wr++] = yx[y];               //add vertex yx[y], which is matched with y, to the queue
                        add_to_tree(yx[y], slackx[y]); //and add edges (slackx[y],y) and (y,yx[y]) to the tree
                    }
                }
            }
        if (y < n) break;                              //augmenting path found!
    }

    if (y < n)                                         //we found augmenting path!
    {
        max_match++;                                   //increment matching
        //in this cycle we inverse edges along augmenting path
        for (int cx = x, cy = y, ty; cx != -2; cx = prev[cx], cy = ty)
        {
            ty = xy[cx];
            yx[cy] = cx;
            xy[cx] = cy;
        }
        augment();                                     //recall function, go to step 1 of the algorithm
    }
} //end of augment() function

The only thing in the code that hasn't been explained yet is the procedure that goes after labels are
updated. Say we've updated labels and now we need to complete our alternating tree; to do this
and to keep the algorithm in O(n^3) time (it's only possible if we use each edge no more than one
time per iteration) we need to know what edges should be added without iterating through all of
them. The answer is to use BFS to add edges only from those vertices in Y that are not in T and
for which slack[y] = 0 (it's easy to prove that in this way we'll add all edges and keep the
algorithm O(n^3)). See the picture below for an explanation:

At last, here's the function that implements Hungarian algorithm:

int hungarian()
{
    int ret = 0;                    //weight of the optimal matching
    max_match = 0;                  //number of vertices in current matching
    memset(xy, -1, sizeof(xy));
    memset(yx, -1, sizeof(yx));
    init_labels();                  //step 0
    augment();                      //steps 1-3
    for (int x = 0; x < n; x++)     //forming answer there
        ret += cost[x][xy[x]];
    return ret;
}

To see all this in practice let's complete the example started on step 0.
[Figure: step-by-step diagrams of the run - alternating trees are built, labels are updated
(with delta = 1 or 2 at each update), and augmenting paths are found, until the optimal
matching is reached.]

Finally, let's talk about the complexity of this algorithm. On each iteration we increment the
matching, so we have n iterations. On each iteration each edge of the graph is used no more than
one time when finding an augmenting path, so we've got O(n^2) operations. Concerning labels, we
update the slack array each time a vertex from X is inserted into S, and this happens no more than
n times per iteration; updating slack takes O(n) operations, so again we've got O(n^2). Updating
labels happens no more than n times per iteration (because we add at least one vertex from Y to T
each time), and it takes O(n) operations - again O(n^2). So the total complexity of this
implementation is O(n^3).

Line Sweep Algorithms


topcoder.com

By bmerry
TopCoder Member

A previous series of articles covered the basic tools of computational geometry. In this article I'll
explore some more advanced algorithms that can be built from these basic tools. They are all
based on the simple but powerful idea of a sweep line: a vertical line that is conceptually swept
across the plane. In practice, of course, we cannot simulate all points in time and so we consider
only some discrete points.

In several places I'll refer to the Euclidean and Manhattan distances. The Euclidean distance is
the normal, everyday distance given by Pythagoras' Theorem. The Manhattan distance between
points (x1, y1) and (x2, y2) is the distance that must be travelled while moving only horizontally
or vertically, namely |x1 - x2| + |y1 - y2|. It is called the Manhattan distance because the roads in
Manhattan are laid out in a grid and so the Manhattan distance is the distance that must be
travelled by road (it is also called the "taxicab distance," or more formally the L1 metric).

In addition, a balanced binary tree is used in some of the algorithms. Generally you can just use a
set in C++ or a TreeSet in Java, but in some cases this is insufficient because it is necessary to
store extra information in the internal nodes.

Closest pair
Given a set of points, find the pair that is closest (with either metric). Of course, this can be
solved in O(N^2) time by considering all the pairs, but a line sweep can reduce this to O(N log N).

Suppose that we have processed points 1 to N - 1 (ordered by X) and the shortest distance we
have found so far is h. We now process point N and try to find a point closer to it than h. We
maintain a set of all already-processed points whose X coordinates are within h of point N, as
shown in the light grey rectangle. As each point is processed, it is added to the set, and when we
move on to the next point or when h is decreased, points are removed from the set. The set is
ordered by y coordinate. A balanced binary tree is suitable for this, and accounts for the log N
factor.

To search for points closer than h to point N, we need only consider points in the active set, and
furthermore we need only consider points whose y coordinates are in the range yN - h to yN + h
(those in the dark grey rectangle). This range can be extracted from the sorted set in O(log N)
time, but more importantly the number of elements is O(1) (the exact maximum will depend on
the metric used), because the separation between any two points in the set is at least h. It follows
that the search for each point requires O(log N) time, giving a total of O(N log N).

Line segment intersections


We'll start by considering the problem of returning all intersections in a set of horizontal and
vertical line segments. Since horizontal lines don't have a single X coordinate, we have to
abandon the idea of sorting objects by X. Instead, we have the idea of an event: an X coordinate
at which something interesting happens. In this case, the three types of events are: start of a
horizontal line, end of a horizontal line, and a vertical line. As the sweep line moves, we'll keep
an active set of horizontal lines cut by the sweep line, sorted by Y value (the red lines in the
figure).

To handle either of the horizontal line events, we simply need to add or remove an element from
the set. Again, we can use a balanced binary tree to guarantee O(log N) time for these operations.
When we hit a vertical line, a range search immediately gives all the horizontal lines that it cuts.
If horizontal or vertical segments can overlap there is some extra work required, and we must
also consider whether lines with coincident endpoints are considered to intersect, but none of this
affects the computational complexity.

If the intersections themselves are required, this takes O(N log N + I) time for I intersections. By
augmenting the binary tree structure (specifically, by storing the size of each sub-tree in the root
of that sub-tree), it is possible to count the intersections in O(N log N) time.

In the more general case, lines need not be horizontal or vertical, so lines in the active set can
exchange places when they intersect. Instead of having all the events pre-sorted, we have to use a
priority queue and dynamically add and remove intersection events. At any point in time, the
priority queue contains events for the end-points of line-segments, but also for the intersection
points of adjacent elements of the active set (providing they are in the future). Since there are
O(N + I) events that will be reached, and each requires O(log N) time to update the active set and
the priority queue, this algorithm takes O(N log N + I log N) time. The figure below shows the
future events in the priority queue (blue dots); note that not all future intersections are in the
queue, either because one of the lines isn't yet active, or because the two lines are not yet
adjacent in the active list.

Area of the union of rectangles


Given a set of axis-aligned rectangles, what is the area of their union? Like the line-intersection
problem, we can handle this by dealing with events and active sets. Each rectangle has two
events: left edge and right edge. When we cross the left edge, the rectangle is added to the active
set. When we cross the right edge, it is removed from the active set.

We now know which rectangles are cut by the sweep line (red in the diagram), but we actually
want to know the length of sweep line that is cut (the total length of the solid blue segments).
Multiplying this length by the horizontal distance between events gives the area swept out
between those two events.

We can determine the cut length by running the same algorithm in an inner loop, but rotated 90
degrees. Ignore the inactive rectangles, and consider a horizontal sweep line that moves top-
down. The events are now the horizontal edges of the active rectangles, and every time we cross
one, we can simply increment or decrement a counter that says how many rectangles overlap at
the current point. The cut length increases as long as the counter is non-zero. Of course, we do
not increase it continuously, but rather while moving from one event to the next.

With the right data structures, this can be implemented in O(N^2) time (hint: use a boolean array
to store the active set rather than a balanced binary tree, and pre-sort the entire set of horizontal
edges). In fact the inner line sweep can be replaced by some clever binary tree manipulation to
reduce the overall time to O(N log N), but that is more a problem in data structures than in
geometry, and is left as an exercise for the reader. The algorithm can also be adapted to answer
similar questions, such as the total perimeter length of the union or the maximum number of
rectangles that overlap at any point.

Convex hull
The convex hull of a set of points is the smallest convex polygon that surrounds the entire set,
and has a number of practical applications. An efficient method that is often used in challenges is
the Graham scan [2], which requires a sort by angle. This isn't as easy as it looks at first, since
computing the actual angles is expensive and introduces problems with numeric error. A simpler
yet equally efficient algorithm is due to Andrew [1], and requires only a sort by X for a line
sweep (although Andrew's original paper sorts by Y and has a few optimizations I won't discuss
here).

Andrew's algorithm splits the convex hull into two parts, the upper and lower hull. Usually these
meet at the ends, but if more than one point has minimal (or maximal) X coordinate, then they
are joined by a vertical line segment. We'll describe just how to construct the upper hull; the
lower hull can be constructed in similar fashion, and in fact can be built in the same loop.

To build the upper hull, we start with the point with minimal X coordinate, breaking ties by
taking the largest Y coordinate. After this, points are added in order of X coordinate (always
taking the largest Y value when multiple points have the same X value). Of course, sometimes
this will cause the hull to become concave instead of convex:

The black path shows the current hull. After adding point 7, we check whether the last triangle
(5, 6, 7) is convex. In this case it isn't, so we delete the second-last point, namely 6. The process
is repeated until a convex triangle is found. In this case we also examine (4, 5, 7) and delete 5
before examining (1, 4, 7) and finding that it is convex, before proceeding to the next point. This
is essentially the same procedure that is used in the Graham scan, but proceeding in order of X
coordinate rather than in order of the angle made with the starting point. It may at first appear
that this process is O(N^2) because of the inner backtracking loop, but since no point can be
deleted more than once it is in fact O(N). The algorithm overall is O(N log N), because the
points must initially be sorted by X coordinate.

Manhattan minimum spanning tree


We can create even more powerful algorithms by combining a line sweep with a divide-and-
conquer algorithm. One example is computing the minimum spanning tree of a set of points,
where the distance between any pair of points is the Manhattan distance. This is essentially the
algorithm presented by Guibas and Stolfi [3].

We first break this down into a simpler problem. Standard MST algorithms for general graphs
(e.g., Prim's algorithm) can compute the MST in O((E + N) log N) time for E edges. If we can
exploit geometric properties to reduce the number of edges to O(N), then this is merely O(N log
N). In fact we can consider, for each point P, only its nearest neighbors in each of the 8 octants
of the plane (see the figure below). The figure shows the situation in just one of the octants, the
West-Northwest one. Q is the closest neighbour (with the dashed line indicating points at the
same Manhattan distance as Q), and R is some other point in the octant. If PR is an edge in a
spanning tree, then it can be removed and replaced by either PQ or QR to produce a better
spanning tree, because the shape of the octant guarantees that |QR| ≤ |PR|. Thus, we do not need
to consider PR when building the spanning tree.

This reduces the problem to that of finding the nearest neighbour in each octant. We'll just
consider the octant shown; the others are no different and can be handled by symmetry. It should
be clear that within this octant, finding the nearest neighbour is equivalent to just finding the
point with the largest value of x - y, subject to an upper bound on x + y and a lower bound on y,
and this is the form in which we'll consider the problem.

Now imagine for the moment that the lower bound on y did not exist. In this case we could solve
the problem for every P quite easily: sweep through the points in increasing order of x + y, and Q
will be the point with the largest x - y value of those seen so far. This is where the divide-and-
conquer principle comes into play: we partition the point set into two halves with a horizontal
line, and recursively solve the problem for each half. For points P in the upper half, nothing
further needs to be done, because points in the bottom half cannot play Q to their P. For the
bottom half, we have to consider that by ignoring the upper half so far we may have missed some
closer points. However, we can take these points into account in a similar manner as before: walk
through all the points in x + y order, keeping track of the best point in the top half (largest x - y
value), and for each point in the bottom half, checking whether this best top-half point is better
than the current neighbour.

So far I have blithely assumed that any set of points can be efficiently partitioned on Y and also
walked in x + y order without saying how this should be done. In fact, one of the most beautiful
aspects of this class of divide-and-conquer plus line-sweep algorithms is that it has essentially
the same structure as a merge sort, to the point that a merge-sort by x + y can be folded into the
algorithm in such a way that each subset is sorted on x + y just when this is needed (the points
initially all being sorted on Y). This gives the algorithm a running time of O(N log N).

The idea of finding the closest point within an angle range can also be used to solve the
Euclidean MST problem, but the O(N log N) running time is no longer guaranteed in the worst
case, because the distance is no longer a linear equation. It is actually possible to compute the
Euclidean MST in O(N log N) time, because it is a subset of the Delaunay triangulation.

LCP
codeforces.com

Kasai's algorithm is pretty easy and works in O(n).

Let's look at two consecutive suffixes in the suffix array, with indexes i1 and i1+1 in the suffix
array. If their lcp > 0, delete the first letter from both of them. We can easily see that the new
strings keep the same relative order, and that the lcp of the new strings is exactly lcp − 1.

Now look at the string obtained from suffix i by deleting its first character. Obviously it is some
suffix of the string too; let its index in the suffix array be i2. Consider the lcp of suffixes i2 and
i2+1: it is at least the lcp − 1 mentioned above. This follows from a property of the lcp array, in
particular that lcp(i,j) = min(lcp_i, lcp_{i+1}, ..., lcp_{j-1}).

And finally, let's build the algorithm on the observations above. We will need an additional
array rank[n], which contains the index in the suffix array of the suffix starting at position i.
First we calculate the lcp of the suffix with index rank[0]. Then we iterate through the suffixes
in the order in which they appear in the string and calculate lcp[rank[i]] in the naive way,
BUT starting from lcp[rank[i-1]] − 1. It is easy to see that this gives an O(n) algorithm, because
at each step the lcp decreases by no more than 1 (except when rank[i] = n − 1).

Implementation:

vector<int> kasai(string s, vector<int> sa)
{
    int n=s.size(), k=0;
    vector<int> lcp(n,0);
    vector<int> rank(n,0);

    for(int i=0; i<n; i++) rank[sa[i]]=i;

    for(int i=0; i<n; i++, k?k--:0)
    {
        if(rank[i]==n-1) {k=0; continue;}
        int j=sa[rank[i]+1];
        while(i+k<n && j+k<n && s[i+k]==s[j+k]) k++;
        lcp[rank[i]]=k;
    }
    return lcp;
}

Gaussian Elimination
compprog.wordpress.com

Gauss algorithm for solving n linear equations with n unknowns. I also give a sample
implementation in C.

The Problem

Let's say you want to solve the following system of 3 equations with 3 unknowns (this is the
system in the input file gauss.in given at the end):

2x + y − z = 8
−3x − y + 2z = −11
−2x + y + 2z = −3

Humans learn that there are two ways to solve this system: reduction and substitution.
Unfortunately, neither of these methods is well suited to a computer.

A simple algorithm (and the one used everywhere even today), was discovered by Gauss more
than two hundred years ago. Since then, some refinements have been found, but the basic
procedure remains unchanged.

Gaussian Elimination

Start by writing the system in matrix form:



If you recall how matrix multiplication works, you'll see that's true. If not, it's enough to notice
how the matrix is written: the coefficients of x, y and z are written, side by side, as the rows of a
3×3 matrix; x, y and z are then written as the rows of a 3×1 matrix; finally, what's left of the
equality sign is written, one value under the other, as a 3×1 matrix.

So far, this doesn't actually help, but it does make the following process easier to write. The goal
is, through simple transformations, to reach a system in which each unknown stands alone in its
own equation, i.e. x = a, y = b, z = c, where a, b and c are known.

How do you transform the initial system into the above one? Here's Gauss' idea.

Start with the initial system, then perform some operations to get 0s on the first column, on every
row but the first.

The operations mentioned are: multiplying the first row by -3/2 and subtracting it from the
second, then multiplying the first row by -1 and subtracting it from the third. What is -3/2? It's
the first element of the second row divided by the first element of the first row. And -1? It's the
first element of the third row divided by the first element of the first row. NOTE: the changes to
row 1 are never actually written back into the matrix.

For now we're done with row 1, so we move on to row 2. The goal here is to get every entry in
the second column below row 2 to 0. We do this by multiplying the second row by 4 (i.e. 2 / (1 /
2)) and subtracting it from the third row.

Now it's easy to find the value of z. Just multiply the third row by -1 (i.e. -1/1).

ERRATA: The 7 in the above matrix should be an 8.



Knowing the value of z, we can now eliminate it from the other two equations.

Now, we can find the value of y and eliminate y from the first equation.

And, finally, the value of x is:

And with that, we're done.

The Programme

Unfortunately, this is easier said than done. The actual computer programme has to take into
account divisions by zero and numerical instabilities. This adds to the complexity of
forwardSubstitution().

Here's the code in C (gauss.c):

#include <stdio.h>

int n;
float a[10][11];

void forwardSubstitution() {
    int i, j, k, max;
    float t;

    for (i = 0; i < n; ++i) {
        max = i;
        for (j = i + 1; j < n; ++j)
            if (a[j][i] > a[max][i])
                max = j;

        for (j = 0; j < n + 1; ++j) {
            t = a[max][j];
            a[max][j] = a[i][j];
            a[i][j] = t;
        }

        for (j = n; j >= i; --j)
            for (k = i + 1; k < n; ++k)
                a[k][j] -= a[k][i] / a[i][i] * a[i][j];

        /* for (k = 0; k < n; ++k) {
            for (j = 0; j < n + 1; ++j)
                printf("%.2f\t", a[k][j]);
            printf("\n");
        } */
    }
}

void reverseElimination() {
    int i, j;

    for (i = n - 1; i >= 0; --i) {
        a[i][n] = a[i][n] / a[i][i];
        a[i][i] = 1;
        for (j = i - 1; j >= 0; --j) {
            a[j][n] -= a[j][i] * a[i][n];
            a[j][i] = 0;
        }
    }
}

void gauss() {
    int i, j;

    forwardSubstitution();
    reverseElimination();

    for (i = 0; i < n; ++i) {
        for (j = 0; j < n + 1; ++j)
            printf("%.2f\t", a[i][j]);
        printf("\n");
    }
}

int main(int argc, char *argv[]) {
    int i, j;

    FILE *fin = fopen("gauss.in", "r");
    fscanf(fin, "%d", &n);
    for (i = 0; i < n; ++i)
        for (j = 0; j < n + 1; ++j)
            fscanf(fin, "%f", &a[i][j]);
    fclose(fin);

    gauss();

    return 0;
}

In the above code, the first two for loops of forwardSubstitution() just swap two rows so as to
diminish the possibility of bad rounding errors. Also, if the programme exits with a division-by-
zero error, that probably means the system cannot be solved.

And here's the input file for the example (save it as gauss.in):
3
2 1 -1 8
-3 -1 2 -11
-2 1 2 -3

Pollard Rho Integer Factorization


cs.colorado.edu

Pollard's Rho Algorithm is a very interesting and quite accessible algorithm for factoring
numbers. It is not the fastest algorithm by far but in practice it outperforms trial division
by many orders of magnitude. It is based on very simple ideas that can be used in other
contexts as well.

History

The original reference for this algorithm is a paper by John M. Pollard (who has many
contributions to computational number theory).

Pollard, J. M. (1975), "A Monte Carlo method for factorization", BIT Numerical
Mathematics 15 (3): 331-334

Improvements were suggested by Richard Brent in a follow up paper that appeared in 1980

Brent, Richard P. (1980), "An Improved Monte Carlo Factorization Algorithm",
BIT 20: 176-184, doi:10.1007/BF01933190

A good reference for this algorithm is the book by Cormen, Leiserson and Rivest, where they
discuss integer factorization and Pollard's rho algorithm.

Problem Setup

Let us assume that N = pq is a number to be factorized, with p ≠ q. Our goal is to find one of the
factors p or q (the other can be found by dividing N by it).

We saw the trial division algorithm

Naive Trial Division Algorithm

int main (int argc, char * const argv []) {
    int N, i;
    // Read in the number N
    scanf("%d", &N);
    printf("You entered N = %d \n", N);
    if (N % 2 == 0) {
        puts("2 is a factor");
        exit(1);
    }

    for (i = 3; i < N; i += 2) {
        if (N % i == 0) {
            printf("%d is a factor \n", i);
            exit(1);
        }
    }

    printf("No factor found. \n");
    return 1;
}

Let us try an even more atrocious version that uses random numbers. Note that the code below is
not perfect but it will do.

"I am feeling very lucky today" Algorithm

int main (int argc, char * const argv []) {
    int N, i;
    // Read in the number N
    scanf("%d", &N);
    printf("You entered N = %d \n", N);

    i = 2 + rand(); // rand() gets a number from 0 to RAND_MAX

    if (N % i == 0) {
        printf(" I found %d \n", i);
        exit(1);
    }

    printf("go fishing!\n");
    return 1;
}

The "I am feeling lucky" algorithm (no offense to Google) generates a random number between
2 and N − 1 and checks if we found a factor. What are the odds of finding a factor?

Very simple: we have precisely two factors to find, p and q, in a space of N − 1 numbers.
Therefore, the probability is 2/(N − 1). If N ≈ 10^10 (a 10 digit number), the chances are very much
not in our favor: about 1 in 5,000,000,000. The odds of winning Lotto are much better than this.

Put another way, we have to repeatedly run the "I am feeling lucky" algorithm
approximately N/2 times with different random numbers to find an answer. This is absolutely no
better than trial division.

Improving the Odds with Birthday Trick

There is a simple trick to improve the odds and it is a very useful one too. It is called the
Birthday Trick. Let us illustrate this trick.

Suppose we pick a number uniformly at random from 1 to 1000. What are the chances that we
land on the number 42? The answer is very simple: 1/1000. Each number is equally probable,
and getting 42 is just as likely as picking any other number.

But suppose we modify the problem a little: we pick two random numbers i, j from 1 to 1000.
What are the odds that i − j = 42? Note that i and j need not be different. There are 958 · 2
ordered pairs (i, j) with |i − j| = 42, so the probability becomes (2 · 958)/(1000 · 1000), which
works out to roughly 1/500.

Rather than insist that we pick one number and it be exactly 42, if we are allowed to pick two
and just ask that their difference be 42, the odds improve.

What if we pick k numbers x1, ..., xk in the range [1, 1000] and ask whether any two numbers
xi, xj satisfy xi − xj = 42? How do the odds vary with k? The calculation is not hard, but let us
write a simple program to empirically estimate the odds.

Computing Birthday Paradox Probability

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int i, j, k, success;
    int nTrials = 100000, nSucc = 0, l;
    int *p;

    /* Read in the number k */
    printf("Enter k:");
    scanf("%d", &k);
    printf("\n You entered k = %d \n", k);
    if (k < 2) {
        printf(" select a k >= 2 please. \n");
        return 1;
    }
    /* Allocate memory for our purposes */
    p = (int *) malloc(k * sizeof(int));

    // nTrials = number of times to repeat the experiment.
    for (j = 0; j < nTrials; ++j) {
        success = 0;
        // Each experiment will generate k random variables
        // and check if the difference between
        // any two of the generated variables is exactly 42.
        // The loop below folds both in.
        for (i = 0; i < k; ++i) {
            // Generate the random numbers between 1 and 1000
            p[i] = 1 + (int) (1000.0 * (double) rand() / (double) RAND_MAX);

            // Check if a difference of 42 has been achieved already
            for (l = 0; l < i; ++l)
                if (p[l] - p[i] == 42 || p[i] - p[l] == 42) {
                    success = 1; // Success: we found a diff of 42
                    break;
                }
        }

        if (success == 1) { // We track the number of successes so far.
            nSucc++;
        }
    }
    // Probability is simply estimated by number of successes / number of trials.
    printf("Est. probability is %f \n", (double) nSucc / (double) nTrials);
    // Do not forget to clean up the memory.
    free(p);

    return 1;
}

You can run the code for various values of k starting from k>=2.

k Prob. Est.

2 0.0018

3 0.0054

4 0.01123

5 0.018

6 0.0284

10 0.08196

15 0.18204

20 0.30648

25 0.4376

30 0.566

100 0.9999

This shows a curious property. Around k = 30 there is more than a half probability of success.
We have a large set of 1000 numbers, yet generating only about 30 random numbers gives us a
half probability of success. Around k = 100, we are virtually assured of success. This important
observation is called the birthday problem or the birthday paradox.

Suppose we pick a person at random and ask what is the probability that their birthday is
April 1st. Well, the answer is 1/365, assuming that no leap years exist.

We number the days in the year from 1 to 365; April 1st is roughly day number 91. Since
each day is equally probable as a birthday, we get 1/365. We can play the same game as before.

Let us take k ≥ 2 people at random and ask the probability that two of them have the same
birthday. A quick modification of our difference-of-42 code yields the answers we want.

Exploring the birthday paradox

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int i, j, k, success;
    int nTrials = 100000, nSucc = 0, l;
    int *p;
    printf("Enter k:");
    scanf("%d", &k);
    printf("\n You entered k = %d \n", k);

    p = (int *) malloc(k * sizeof(int));

    // Repeat the experiment nTrials times
    for (j = 0; j < nTrials; ++j) {
        success = 0;
        for (i = 0; i < k; ++i) {
            // Generate the random numbers between 1 and 365
            p[i] = 1 + (int) (365 * (double) rand() / (double) RAND_MAX);
            // Check if two birthdays coincide
            for (l = 0; l < i; ++l)
                if (p[l] - p[i] == 0) {
                    success = 1;
                    break;
                }
        }
        if (success == 1) {
            nSucc++;
        }
    }

    printf("Est. probability is %f \n", (double) nSucc / (double) nTrials);

    return 1;
}

We can see that with k = 10 people there is roughly an 11% chance of having two people with
the same birthday. With k = 23 people, we have roughly a 50% chance. Our class size is k = 58,
which gives roughly a 99% chance of having two people with the same birthday. We get
mathematical certainty, a 100% chance, with k ≥ 366.

If we have N days in a year (N = 365 on this planet) then with k ≈ √N people we have a 50%
chance of a birthday collision.

Imagine a version of Star Trek where the Enterprise docks at a strange new planet and the crew
are unable to find out how long a year is. Captain Kirk and Officer Spock land on the planet and
walk over to the birth records. They toss coins to pick people at random and see how many
people give them even odds of a birthday collision. From this they can back out the number of
days in the year (the planet's revolution period divided by its rotational period).

This is all well and good, you say. How does this help us at all?

Applying Birthday Paradox to Factoring

Let us go back to the "I am feeling lucky" algorithm. We are given N = pq and we randomly
picked a number between 1 and N. The chances of landing on either p or q are quite small. So we
have to repeat the algorithm many, many times to get to even odds.

We can ask a different question. Instead of picking just one number, we can pick k numbers for
some k. Let these numbers be x1, ..., xk. We can now ask if xi − xj divides N for some pair i, j.

The difference between the former scheme and latter is exactly the difference between picking
one person and asking if their birthday falls on April the 1st (a specific day) or picking k people
and asking if any two among the k share a birthday. We can already see that for k roughly around
√N, we get even odds. So, very roughly speaking, we improve the chances from about
1/N to about 1/√N. Therefore, for a 10 digit number, instead of doing 10^10 reps, we can pick
k = 10^5 random numbers and do the test above.

But unfortunately, this does not save us any effort. With k = 10^5 numbers, we still do k² = 10^10

pairwise comparisons and divisions. Bah... there's got to be a better way.

We can do something even better.

We can pick numbers x1, ..., xk and, instead of asking if xi − xj divides N, ask if
GCD(xi − xj, N) > 1. In other words, we ask if xi − xj and N have a non-trivial factor in common.
This at once increases the chances of success.

If we ask how many numbers (other than 1 and N) divide N, we have just 2: p and q.

If we ask how many numbers x satisfy GCD(x, N) > 1, we have quite a few now:

p, 2p, 3p, 4p, 5p, ..., (q − 1)p, q, 2q, 3q, ..., (p − 1)q

Precisely, we have p + q − 2 of these numbers.

So a simple scheme is as follows:

Pick k numbers x1, ..., xk uniformly at random between 2 and N − 1.

Ask if GCD(xi − xj, N) > 1 for some pair. If yes, then GCD(xi − xj, N) is a factor of N (either p or q).

But there is already a problem: we need to pick k on the order of N^(1/4) numbers and do
pairwise comparisons. This is already too much to store in memory. If N ≈ 10^30, we are
storing about 10^8 numbers in memory.

To get to Pollard's rho algorithm, we want to do things so that we just have two numbers in
memory.

Pollard's Rho Algorithm

Therefore, Pollard's rho algorithm works like this. We would like to generate k numbers
x1, ..., xk and check them pairwise, but we cannot. The next best thing is to generate random
numbers one by one and check two consecutive numbers. We keep doing this for a while and
hopefully we are going to get lucky.

We use a function f that will generate pseudo-random numbers. In other words, we will keep
applying f and it will generate numbers that look and feel random. Not every function does this,
but one function with this mysterious pseudo-random property is f(x) = (x² + a) mod N, where
we generate a using a random number generator.

We start with x1 = 2 or some other number. We compute x2 = f(x1), x3 = f(x2), and so on. The
general rule is x(n+1) = f(xn).
We can start off with a naive algorithm and start to fix the problems as we go.

Pollard's Rho Algorithm Scheme

x := 2;
while ( .. exit criterion .. )
    y := f(x);
    p := GCD( | y - x | , N);
    if ( p > 1 )
        return "Found factor: p";
    x := y;
end

return "Failed. :-("

Let us take N = 55 and f(x) = (x² + a) mod N for some fixed a.

xn    xn+1    GCD(|xn - xn+1|, N)
 2      6      1
 6     38      1
38     16      1
16     36      5

You can see that in many cases this works. But in some cases, it goes into an infinite loop
because the function f cycles. When that happens, we keep repeating the same set of value pairs
xn and xn+1 and never stop.

For example, we can make up a pseudo-random function f which gives us a sequence like

2, 10, 16, 23, 29, 13, 16, 23, 29, 13, ...

In this case, we will keep cycling and never find a factor. How do we detect that the cycling has
happened?

Solution #1 is to keep all the numbers seen so far, x1, ..., xn, and check whether xn = xl for some
previous l < n. This runs into our memory crunch again, as n is going to be large in practice.

Solution #2 is a clever algorithm by Floyd. To illustrate it, suppose we are running on a long
circular race track; how do we know we have completed one cycle? We could use solution #1
and remember everything we have seen so far. But a cleverer solution is to have two runners A
and B, with B running twice as fast as A. They start at the same position, and when B overtakes
A, we know that B has cycled around at least once.

Pollard's Rho Algorithm with Floyd's Cycle Detection

a := 2;
b := 2;
while ( b != a )
    a := f(a);       // a runs once
    b := f(f(b));    // b runs twice as fast
    p := GCD( | b - a | , N);
    if ( p > 1 )
        return "Found factor: p";
end

return "Failed. :-("

If the algorithm fails, we simply find a new function f or start from a new random seed for a,b.

Now we have derived the full Pollard's rho algorithm with Floyd's cycle detection scheme.

Topological Sorting
Geeksforgeeks.org

Topological sorting for a Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for
every directed edge u → v, vertex u comes before v in the ordering. Topological sorting for a graph
is not possible if the graph is not a DAG.

For example, a topological sorting of the following graph is 5 4 2 3 1 0. There can be more
than one topological sorting for a graph. For example, another topological sorting of the
following graph is 4 5 2 3 1 0. The first vertex in topological sorting is always a vertex with in-
degree 0 (a vertex with no incoming edges).


Topological Sorting vs Depth First Traversal (DFS):


In DFS, we print a vertex and then recursively call DFS for its adjacent vertices. In topological
sorting, we need to print a vertex before its adjacent vertices. For example, in the given graph,
the vertex 5 should be printed before vertex 0, but unlike DFS, the vertex 4 should also be
printed before vertex 0. So topological sorting is different from DFS. For example, a DFS of
the above graph is 5 2 3 1 0 4, but it is not a topological sorting.

Algorithm to find Topological Sorting:


We recommend first seeing the implementation of DFS here. We can modify DFS to find a
topological sorting of a graph. In DFS, we start from a vertex, print it, and then recursively call
DFS for its adjacent vertices. In topological sorting, we use a temporary stack. We don't print the
vertex immediately; we first recursively call topological sorting for all its adjacent vertices, then
push it to a stack. Finally, we print the contents of the stack. Note that a vertex is pushed to the
stack only when all of its adjacent vertices (and their adjacent vertices and so on) are already in
the stack.

Following is C++ implementation of topological sorting. Please see the code for Depth First
Traversal for a disconnected Graph and note the differences between the second code given
there and the below code.

// A C++ program to print topological sorting of a DAG


#include<iostream>
#include <list>
#include <stack>
using namespace std;

// Class to represent a graph


class Graph
{
int V; // No. of vertices

// Pointer to an array containing adjacency listsList


list<int> *adj;

// A function used by topologicalSort


void topologicalSortUtil(int v, bool visited[], stack<int> &Stack);
public:
Graph(int V); // Constructor

// function to add an edge to graph


void addEdge(int v, int w);

// prints a Topological Sort of the complete graph


void topologicalSort();
};

Graph::Graph(int V)
{
this->V = V;
adj = new list<int>[V];
}

void Graph::addEdge(int v, int w)
{
    adj[v].push_back(w); // Add w to v's list.
}

// A recursive function used by topologicalSort


void Graph::topologicalSortUtil(int v, bool visited[], stack<int> &Stack)
{
    // Mark the current node as visited.
    visited[v] = true;

// Recur for all the vertices adjacent to this vertex


list<int>::iterator i;
for (i = adj[v].begin(); i != adj[v].end(); ++i)
if (!visited[*i])
topologicalSortUtil(*i, visited, Stack);

// Push current vertex to stack which stores result


Stack.push(v);
}

// The function to do Topological Sort. It uses recursive topologicalSortUtil()


void Graph::topologicalSort()
{
stack<int> Stack;

// Mark all the vertices as not visited


bool *visited = new bool[V];
for (int i = 0; i < V; i++)
visited[i] = false;

// Call the recursive helper function to store Topological Sort


// starting from all vertices one by one
for (int i = 0; i < V; i++)
if (visited[i] == false)
topologicalSortUtil(i, visited, Stack);

// Print contents of stack



while (Stack.empty() == false)


{
cout << Stack.top() << " ";
Stack.pop();
}
}

// Driver program to test above functions


int main()
{
// Create a graph given in the above diagram
Graph g(6);
g.addEdge(5, 2);
g.addEdge(5, 0);
g.addEdge