Beruflich Dokumente
Kultur Dokumente
org/articles/the-maximum-subsequence-problem
think about it
Entries Log
0 1 2 -3 3 -1 0 -4 0 -1 -4 2 4 1 1 3 1 0 -2 -3 -3 -2 3 1 1 4 5 -3 -2 -1 ...
Given a sequence of numbers, find the maximum sum of a contiguous subsequence of those
numbers.
As an example, the maximum sum contiguous subsequence of 0, -1, 2, -1, 3, -1, 0 would be 4 (= 2 + -1 + 3).
This problem is generally known as the maximum sum contiguous subsequence problem and if you
haven’t encountered it before, I’d recommend trying to solve it before reading on. Even if you have
encountered it before, I’ll invite you to read on anyway — it’s well worth another look.
Programming Pearl
The maximum sum contiguous subsequence problem appears in Jon Bentley’s “Programming Pearls”. He first
presents a brute force solution which examines all possible contiguous subsequences of the initial sequence and
returns the maximum sum of these subsequences.
def generate_pairs(n):
"Generate all pairs (i, j) such that 0 <= i <= j < n"
for i in range(n):
for j in range(i, n):
yield i, j
def max_sum_subsequence(seq):
"Return the max-sum contiguous subsequence of the input sequence."
return max(sum(seq[i:j])
for i, j in generate_pairs(len(seq) + 1))
It’s a straightforward piece of code, though note the + 1 which ensures that we slice to the end of seq, and also
that we include empty slices, which sum to 0, handling the case when every item in the sequence is negative.
The trouble is, the algorithm is of cubic complexity: to process just 6 hours of logged activity takes over 2
minutes on a 2GHz Intel Core Duo MacBook, and the cubic nature of the algorithm means we’d quickly fail to
process more substantial log files in real time.
1 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem
A simple optimisation eliminates the repeated calls to sum by accumulating the input sequence — the red line in
the graph above. Subtracting element i-1 from element j of this cumulative sequence gives us the sum of
elements in the range i, j of the original sequence. We won’t study the code for this quadratic solution — it
doesn’t add much to our analysis. Again, some care is needed to avoid fencepost problems.
We won’t look at the divide-and-conquer NlogN solution either. It’s hard to understand, and we can do far
better.
Linear Solution
There is a linear solution. The idea is to scan the sequence from start to finish keeping track of maxsofar, the
maximum sum of a contiguous subsequence seen so far, and maxendinghere, the maximum sum of a
contiguous subsequence which ends at the current position. Bentley’s pseudo-code reads:
maxsofar = 0
maxendinghere = 0
for i = [0, n)
/* invariant: maxendinghere and maxsofar are accurate
are accurate for x[0..i-1] */
maxendinghere = max(maxendinghere + x[i], 0)
maxsofar = max(maxsofar, maxendinghere)
def max_sum_subsequence(seq):
maxsofar = 0
maxendinghere = 0
for s in seq:
# invariant: maxendinghere and maxsofar are accurate
# are accurate up to s
maxendinghere = max(maxendinghere + s, 0)
maxsofar = max(maxsofar, maxendinghere)
return maxsofar
Now, this is a fabulous solution. Bentley describes it as subtle. Such a succinct code snippet hardly looks
subtle, but I agree, the loop body does take a bit of understanding:
maxendinghere = max(maxendinghere + s, 0)
maxsofar = max(maxsofar, maxendinghere)
Well, essentially maxendinghere is what’s accumulating the subsequences — it keeps rolling the next element
into itself. Should this accumulated sum ever become negative we know that the subsequence-which-ends-here
we’re currently tracking is worse than the empty subsequence-which-restarts-here; so we can reset our
subsequence accumulator, and the first clause of the loop invariant still holds. Combine this with the
observation that maxsofar tracks peaks in maxendinghere and we’re done.
The loop-invariant comment provides a good example of how comments can help us understand an algorithm,
even though the code is minimal and the variable names are well-chosen.
Streaming Solution
I prefer to think of this problem in terms of streams — lazily evaluated sequences. Think of our log file as
generating a stream of numbers:
... 0 1 2 -3 3 -1 0 -4 0 -1 -4 2 4 1 1 3 1 0 -2 -3 -3 -2 3 1 1 4 5 -3 -2 -1 ...
The first thing we do is transform this stream to generate another stream, the cumulative sum of numbers
2 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem
seen so far. It’s an integration of sorts. You’ll remember we already used this stream, or an in-memory version
of it, in our quadratic solution to the problem: the difference between points on it yields subsequence-sums.
Stream Accumulate
We generate the accumulated stream from our original stream like this:
def stream_accumulate(stream):
total = 0
for s in stream:
total += s
yield total
The graph below samples the first five minutes of this stream. The red line accumulates values from the pale
grey line.
These accumulated numbers represent the number of members who have entered the club since we started
tracking them. On our graph, the maximum sum contiguous subsequence is simply the greatest Y-increase
between any two points on this graph. X’s mark these points on the graph above. (Note: it’s not the Y-range of
the graph we want since our X-values are time-ordered, and we require X1 <= X2).
Stream Floor
A second transformation yields the floor of the accumulated stream.
import sys
def stream_floor(stream):
m = 0
for s in stream:
m = min(m, s)
yield m
(Note that, for our purposes, the floor of the stream isn’t exactly the stream of minimum values taken by the
stream — we enforce a baseline at zero. It would be better to allow clients of this function to supply an optional
baseline value, but I wanted the simplest possible code that shows the idea.)
Here’s a graph plotting the accumulated entries alongside the floor of these entries.
We’re very close to what we want now. We can track Y-increases on the graph just by generating the difference
between the accumulated stream and its floor — the shading on the graph.
Stream Diff
3 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem
Here’s an implementation of stream_diff. We can’t just plug a minus sign “-” into the mapping function, so we
have to use the less wieldy operator.sub.
import itertools
import operator
import itertools
The final graph shows us the difference between the accumulated entry count and its floor. I’ve also added the
ceiling of this stream as a thick red line (I’m sure you can figure out how to implement stream_ceiling), and
this ceiling represents the stream of maximum sum contiguous subsequences.
We’ve re-labelled the lines Max-so-far and Max-ending-here because they’re the stream of values taken by
the variables maxsofar and maxendinghere during Bentley’s clever solution to the maximum sum contiguous
subsequence problem. I think we’re in a better position to understand how this solution works now.
A final solution to the maximum sum contiguous subsequence problem reads like this. We’ve pushed the
general purpose stream transformation functions into a separate module, stream.py.
import itertools
import stream
def max_sum_subsequence_stream(ss):
"Return the stream of max sum contiguous subsequences of the input iterable."
accu1, accu2 = itertools.tee(stream.accumulate(ss))
return stream.ceil(stream.diff(accu1,
stream.floor(accu2, baseline=0)))
def max_sum_subsequence(ss):
"Return the max sum of a contiguous subsequence of the input iterable."
return stream.last(max_sum_subsequence_stream(ss))
The iterable supplied to max_sum_subsequence has its last value read, and should therefore be bounded if we
want the function to return. We haven’t supplied arguments to extract a portion of this iterable (to generate
maximum subsequences for the club on a particular day, for example) because that’s what itertools.islice
is for.
4 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem
Perhaps we’d like to know if the maximum sum subsequence reaches a plateau; that is, it stays on a level for a
while.
stream.py
"General purpose stream generation functions."
import itertools
def ceil(stream):
"""Generate the stream of maximum values from the input stream.
def accumulate(stream):
"""Generate partial sums from the stream.
>>> last('abc')
'c'
>>> last([], default=-1)
-1
"""
s = default
for s in stream:
pass
return s
if __name__ == "__main__":
import doctest
doctest.testmod()
5 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem
Stream on…
1. The maximum sum contiguous subsequence problem is described in “Programming Pearls” by Jon
Bentley.
3. Streams are a natural fit with functional programming, and well supported by languages like Scheme and
Haskell. Python also handles them nicely: look into generators, generator expressions, the itertools
module, and study test_generators.py carefully.
4. If you liked this article, try more Word Aligned articles tagged “streams”. And if you like puzzles, there
are more articles tagged “puzzles” too.
5. The graphs in this article are generated using the Google chart API, which is both useful and a fine
example of how to design and document a programming interface.
Thomas Guest
6 of 6 27/3/2011 10:52 AM