Sie sind auf Seite 1von 6

The Maximum Sum contiguous subsequence problem http://wordaligned.

org/articles/the-maximum-subsequence-problem

think about it

The Maximum Sum contiguous subsequence problem


2007-12-17 • Algorithms, Puzzles, Python, Functional Programming, Scheme, Streams, Charts • 8 Comments and 1
Reaction

Welcome to the Pearly Gates Contents


Welcome to the Pearly Gates
The Pearly Gates club never closes. Its public entrance, a revolving door, just keeps Programming Pearl
Linear Solution
on spinning. With each rotation some punters enter and others leave. The club’s
Streaming Solution
owners would like to track this traffic. Specifically, they’d like to know the maximum
Stream Accumulate
increase in people entering the club over a given period. Stream Floor
Stream Diff
The starting point is to track the people who enter/leave with each spin of the door. Streams and Collections
Here’s a 5 minute sample of that information. Negative numbers mean more people Stream on…
left than entered during a particular cycle.

Entries Log
0 1 2 -3 3 -1 0 -4 0 -1 -4 2 4 1 1 3 1 0 -2 -3 -3 -2 3 1 1 4 5 -3 -2 -1 ...

Here’s the same information plotted on a graph.

The archetypal problem we’d like to solve can be stated:

Given a sequence of numbers, find the maximum sum of a contiguous subsequence of those
numbers.

As an example, the maximum sum contiguous subsequence of 0, -1, 2, -1, 3, -1, 0 would be 4 (= 2 + -1 + 3).

This problem is generally known as the maximum sum contiguous subsequence problem and if you
haven’t encountered it before, I’d recommend trying to solve it before reading on. Even if you have
encountered it before, I’ll invite you to read on anyway — it’s well worth another look.

Programming Pearl
The maximum sum contiguous subsequence problem appears in Jon Bentley’s “Programming Pearls”. He first
presents a brute force solution which examines all possible contiguous subsequences of the initial sequence and
returns the maximum sum of these subsequences.

A Python implementation might read:

def generate_pairs(n):
"Generate all pairs (i, j) such that 0 <= i <= j < n"
for i in range(n):
for j in range(i, n):
yield i, j

def max_sum_subsequence(seq):
"Return the max-sum contiguous subsequence of the input sequence."
return max(sum(seq[i:j])
for i, j in generate_pairs(len(seq) + 1))

It’s a straightforward piece of code, though note the + 1 which ensures that we slice to the end of seq, and also
that we include empty slices, which sum to 0, handling the case when every item in the sequence is negative.
The trouble is, the algorithm is of cubic complexity: to process just 6 hours of logged activity takes over 2
minutes on a 2GHz Intel Core Duo MacBook, and the cubic nature of the algorithm means we’d quickly fail to
process more substantial log files in real time.

1 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem

A simple optimisation eliminates the repeated calls to sum by accumulating the input sequence — the red line in
the graph above. Subtracting element i-1 from element j of this cumulative sequence gives us the sum of
elements in the range i, j of the original sequence. We won’t study the code for this quadratic solution — it
doesn’t add much to our analysis. Again, some care is needed to avoid fencepost problems.

We won’t look at the divide-and-conquer NlogN solution either. It’s hard to understand, and we can do far
better.

Linear Solution
There is a linear solution. The idea is to scan the sequence from start to finish keeping track of maxsofar, the
maximum sum of a contiguous subsequence seen so far, and maxendinghere, the maximum sum of a
contiguous subsequence which ends at the current position. Bentley’s pseudo-code reads:

maxsofar = 0
maxendinghere = 0
for i = [0, n)
/* invariant: maxendinghere and maxsofar are accurate
are accurate for x[0..i-1] */
maxendinghere = max(maxendinghere + x[i], 0)
maxsofar = max(maxsofar, maxendinghere)

This translates directly into Python.

def max_sum_subsequence(seq):
maxsofar = 0
maxendinghere = 0
for s in seq:
# invariant: maxendinghere and maxsofar are accurate
# are accurate up to s
maxendinghere = max(maxendinghere + s, 0)
maxsofar = max(maxsofar, maxendinghere)
return maxsofar

Now, this is a fabulous solution. Bentley describes it as subtle. Such a succinct code snippet hardly looks
subtle, but I agree, the loop body does take a bit of understanding:

maxendinghere = max(maxendinghere + s, 0)
maxsofar = max(maxsofar, maxendinghere)

Why does this work?

Well, essentially maxendinghere is what’s accumulating the subsequences — it keeps rolling the next element
into itself. Should this accumulated sum ever become negative we know that the subsequence-which-ends-here
we’re currently tracking is worse than the empty subsequence-which-restarts-here; so we can reset our
subsequence accumulator, and the first clause of the loop invariant still holds. Combine this with the
observation that maxsofar tracks peaks in maxendinghere and we’re done.

The loop-invariant comment provides a good example of how comments can help us understand an algorithm,
even though the code is minimal and the variable names are well-chosen.

Streaming Solution
I prefer to think of this problem in terms of streams — lazily evaluated sequences. Think of our log file as
generating a stream of numbers:

... 0 1 2 -3 3 -1 0 -4 0 -1 -4 2 4 1 1 3 1 0 -2 -3 -3 -2 3 1 1 4 5 -3 -2 -1 ...

The first thing we do is transform this stream to generate another stream, the cumulative sum of numbers

2 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem

seen so far. It’s an integration of sorts. You’ll remember we already used this stream, or an in-memory version
of it, in our quadratic solution to the problem: the difference between points on it yields subsequence-sums.

Stream Accumulate
We generate the accumulated stream from our original stream like this:

def stream_accumulate(stream):
total = 0
for s in stream:
total += s
yield total

The graph below samples the first five minutes of this stream. The red line accumulates values from the pale
grey line.

These accumulated numbers represent the number of members who have entered the club since we started
tracking them. On our graph, the maximum sum contiguous subsequence is simply the greatest Y-increase
between any two points on this graph. X’s mark these points on the graph above. (Note: it’s not the Y-range of
the graph we want since our X-values are time-ordered, and we require X1 <= X2).

Stream Floor
A second transformation yields the floor of the accumulated stream.

import sys

def stream_floor(stream):
m = 0
for s in stream:
m = min(m, s)
yield m

(Note that, for our purposes, the floor of the stream isn’t exactly the stream of minimum values taken by the
stream — we enforce a baseline at zero. It would be better to allow clients of this function to supply an optional
baseline value, but I wanted the simplest possible code that shows the idea.)

Here’s a graph plotting the accumulated entries alongside the floor of these entries.

We’re very close to what we want now. We can track Y-increases on the graph just by generating the difference
between the accumulated stream and its floor — the shading on the graph.

Stream Diff

3 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem

Here’s an implementation of stream_diff. We can’t just plug a minus sign “-” into the mapping function, so we
have to use the less wieldy operator.sub.

import itertools
import operator

def stream_diff(s, t):


return itertools.imap(operator.sub, s, t)

Alternatively, we could generate the new stream with an explicit loop:

import itertools

def stream_diff(s, t):


for ss, tt in itertools.izip(s, t):
yield ss - tt

The final graph shows us the difference between the accumulated entry count and its floor. I’ve also added the
ceiling of this stream as a thick red line (I’m sure you can figure out how to implement stream_ceiling), and
this ceiling represents the stream of maximum sum contiguous subsequences.

We’ve re-labelled the lines Max-so-far and Max-ending-here because they’re the stream of values taken by
the variables maxsofar and maxendinghere during Bentley’s clever solution to the maximum sum contiguous
subsequence problem. I think we’re in a better position to understand how this solution works now.

Streams and Collections


Please don’t imagine these streams are bloated. They may be infinite (remember the Pearly Gates club never
closes!) but that doesn’t mean they take up much space. The graphs shown represent snapshots of their
activity, and at no point do our presented algorithms actually store a five minute buffer of entries.

A final solution to the maximum sum contiguous subsequence problem reads like this. We’ve pushed the
general purpose stream transformation functions into a separate module, stream.py.

import itertools
import stream

def max_sum_subsequence_stream(ss):
"Return the stream of max sum contiguous subsequences of the input iterable."
accu1, accu2 = itertools.tee(stream.accumulate(ss))
return stream.ceil(stream.diff(accu1,
stream.floor(accu2, baseline=0)))

def max_sum_subsequence(ss):
"Return the max sum of a contiguous subsequence of the input iterable."
return stream.last(max_sum_subsequence_stream(ss))

The iterable supplied to max_sum_subsequence has its last value read, and should therefore be bounded if we
want the function to return. We haven’t supplied arguments to extract a portion of this iterable (to generate
maximum subsequences for the club on a particular day, for example) because that’s what itertools.islice
is for.

Note that max_sum_subsequence_stream() may be more useful to clients than max_sum_subsequence().


Suppose, for example, we’re only interested when the maximum sum subsequence exceeds 100. We can do
this directly by connecting itertools.dropwhile() to our function.

def max_subseq_exceeds(seq, limit=100):


max_sub_s = max_sum_subsequence_stream(seq)
return itertools.dropwhile(lambda s: s <= limit, max_sub_s)

4 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem

Perhaps we’d like to know if the maximum sum subsequence reaches a plateau; that is, it stays on a level for a
while.

Here’s the stream module.

stream.py
"General purpose stream generation functions."
import itertools

def floor(stream, baseline=None):


"""Generate the stream of minimum values from the input stream.

The baseline, if supplied, is an upper limit for the floor.


>>> ff = floor((1, 2, -2, 3))
>>> assert list(ff) == [1, 1, -2, -2]
>>> ff = floor((1, 2, -2, 3), 0)
>>> assert list(ff) == [0, 0, -2, -2]
"""
stream = iter(stream)
m = baseline
if m is None:
try:
m = stream.next()
yield m
except StopIteration:
pass
for s in stream:
m = min(m, s)
yield m

def ceil(stream):
"""Generate the stream of maximum values from the input stream.

>>> top = ceil([0, -1, 2, -2, 3])


>>> assert list(top) == [0, 0, 2, 2, 3]
"""
stream = iter(stream)
try:
M = stream.next()
yield M
except StopIteration:
pass
for s in stream:
M = max(M, s)
yield M

def accumulate(stream):
"""Generate partial sums from the stream.

>>> accu = accumulate([1, 2, 3, 4])


>>> assert list(accu) == [1, 3, 6, 10]
"""
total = 0
for s in stream:
total += s
yield total

def diff(s, t):


"""Generate the differences between two streams

If the streams are of unequal length, the shorter is truncated.


>>> dd = diff([2, 4, 6, 8], [1, 2, 3])
>>> assert list(dd) == [1, 2, 3]
"""
import operator
return itertools.imap(operator.sub, s, t)

def last(stream, default=None):


"""Return the last item in the stream or the default if the stream is empty.

>>> last('abc')
'c'
>>> last([], default=-1)
-1
"""
s = default
for s in stream:
pass
return s

if __name__ == "__main__":
import doctest
doctest.testmod()

5 of 6 27/3/2011 10:52 AM
The Maximum Sum contiguous subsequence problem http://wordaligned.org/articles/the-maximum-subsequence-problem

Stream on…
1. The maximum sum contiguous subsequence problem is described in “Programming Pearls” by Jon
Bentley.

2. My favourite introduction to computer programming, “Structure and Interpretation of Computer


Programs”, has lots to say about streams, and suggests they have a role in concurrent programming and
modelling time.

3. Streams are a natural fit with functional programming, and well supported by languages like Scheme and
Haskell. Python also handles them nicely: look into generators, generator expressions, the itertools
module, and study test_generators.py carefully.

4. If you liked this article, try more Word Aligned articles tagged “streams”. And if you like puzzles, there
are more articles tagged “puzzles” too.

5. The graphs in this article are generated using the Google chart API, which is both useful and a fine
example of how to design and document a programming interface.

Thomas Guest

6 of 6 27/3/2011 10:52 AM

Das könnte Ihnen auch gefallen