Armed and Dangerous


Sex, software, politics, and firearms. Life's simple pleasures

Python speed optimization in the real world


Posted on 2013-03-24 by esr

I shipped reposurgeon 2.29 a few minutes ago. The main improvement in this version is speed: it now reads in and analyzes Subversion repositories at a clip of more than 11,000 commits per minute. This is, in case you are in any doubt, ridiculously fast: faster than the native Subversion tools do it, and certainly far faster than any of the rival conversion utilities can manage. It's well over an order of magnitude faster than when I began seriously tuning for speed three weeks ago. I've learned some interesting lessons along the way.
The impetus for this tune-up was the Battle for Wesnoth repository. The project's senior devs finally decided to move from Subversion to git recently. I wasn't actively involved in the decision myself, since I've been semi-retired from Wesnoth for a while, but I supported it and was naturally the person they turned to to do the conversion. Doing surgical runs on that repository rubbed my nose in the fact that code with good enough performance on a repository 500 or 5000 commits long won't necessarily cut it on a repository with over 56000 commits. Two-hour waits for the topological-analysis phase of each load to finish were kicking my ass; I decided that some serious optimization effort seemed like a far better idea than twiddling my thumbs.
First I'll talk about some things that didn't work.
pypy, which is alleged to use fancy JIT compilation techniques to speed up a lot of Python programs, failed miserably on this one. My pypy runs were 20%-30% slower than plain Python. The pypy site warns that pypy's optimization methods can be defeated by tricky, complex code, and perhaps that accounts for it; reposurgeon is nothing if not algorithmically dense.
cython didn't emulate pypy's comic pratfall, but didn't deliver any speed gains distinguishable from noise either. I wasn't very surprised by this; what it can compile is mainly control structure, which I didn't expect to be a substantial component of the runtime compared to (for example) string-bashing during stream-file parsing.
My grandest (and perhaps nuttiest) plan was to translate the program into a Lisp dialect with a decent compiler. Why Lisp? Well, I needed (a) a language with unlimited-extent types that (b) could be compiled to machine code for speed, and (c) minimized the semantic distance from Python to ease translation (that last point is why you Haskell and ML fans should refrain from even drawing breath to ask your obvious question; instead, go read this). After some research I found Steel Bank Common Lisp (SBCL) and began reading up on what I'd need to do to translate Python to it.
The learning process was interesting. Lisp was my second language; I loved it and was already expert in it by 1980, well before I learned C. But since 1982 the only Lisp programs I've written have been Emacs modes. I've done a whole hell of a lot of those, including some of the most widely used ones like GDB and VC, but semantically Emacs Lisp is a sort of living fossil, a coelacanth from the 1970s, dynamic scoping and all. Common Lisp, and more generally the evolution of Lisp implementations with decent alien type bindings, passed me by. And by the time Lisp got good enough for standalone production use in modern environments I already had Python in hand.
So, for me, reading the SBCL and Common Lisp documentation was a strange mixture of learning a new language and returning to very old roots. Yay for lexical scoping! I recoded about 6% of reposurgeon in SBCL, then hit a couple of walls. One of the lesser walls was a missing feature in Common Lisp corresponding to the __str__ special method in Python. Lisp types don't know how to print themselves, and as it turns out reposurgeon relies on this capability in various and subtle ways. Another problem was that I couldn't easily see how to duplicate Python's subprocess-control interface at all, let alone portably across Common Lisp implementations.
But the big problem was CLOS, the Common Lisp Object System. I like most of the rest of Common Lisp now that I've studied it. OK, it's a bit baroque and heavyweight, and I can see where it's had a couple of kitchen sinks pitched in; if I were choosing a language on purely esthetic grounds I'd prefer Scheme. But I could get comfortable with it, except for CLOS.
But me no buts about multimethods and the power of generics; I get that, OK? I see why it was done the way it was done, but the brute fact remains that CLOS is an ugly pile of ugly. More to the point in this particular context, CLOS objects are quite unlike Python objects (which are in many ways more like CL defstructs). It was the impedance mismatch between Python and CLOS objects that really sank my translation attempt, which I had originally hoped could be done without seriously messing with the architecture of the Python code. Alas, that was not to be. Which refocused me on algorithmic methods of improving the Python code.
Now I'll talk about what did work.
What worked, ultimately, was finding operations that have instruction costs O(n**2) in the number of commits and squashing them. At this point a shout-out goes to Julien "FrnchFrgg" Rivaud, a very capable hacker trying to use reposurgeon for some work on the Blender repository. He got interested in the speed problem (the Blender repo is also quite large) and was substantially helpful with both patches and advice. Working together, we memoized some expensive operations and eliminated others, often by incrementally computing reverse-lookup pointers when linking objects together in order to avoid having to traverse the entire repository later on.
Even just finding all the O(n**2) operations isn't necessarily easy in a language as terse and high-level as Python; they can hide in very innocuous-looking code and method calls. The biggest bad boy in this case turned out to be child-node computation. Fast-import streams express "is a child of" directly; for obvious reasons, a repository analysis often has to look at all the children of a given parent. This operation blows up quite badly on very large repositories even if you memoize it; the only way to make it fast is to precompute all the reverse lookups and update them when you update the forward ones.
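To make the technique concrete, here is a minimal sketch of the reverse-lookup idea, with invented class and function names rather than reposurgeon's actual ones:

    class Commit:
        def __init__(self, mark):
            self.mark = mark
            self.parents = []    # forward links, as in the fast-import stream
            self.children = []   # reverse links, maintained incrementally

    def add_parent(child, parent):
        """Link two commits, updating both directions at once so the
        reverse lookup never has to be recomputed by a full scan."""
        child.parents.append(parent)
        parent.children.append(child)

    # The children of a commit are now an O(1) attribute access instead of
    # an O(n) sweep over every event in the repository.
    def children_of(commit):
        return commit.children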
Another time sink (the last one to get solved) was identifying all tags and resets attached to a particular commit. The brute-force method (look through all tags for any with a from member matching the commit's mark) is expensive mainly because to look through all tags you have to look through all the events in the stream, and that's expensive when there are 56K of them. Again, the solution was to give each commit a list of back-pointers to the tags that reference it and make sure all the mutation operations update it properly.
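A sketch of how such back-pointers might be kept in sync with the mutation operations (hypothetical names, not reposurgeon's actual code):

    class Commit:
        def __init__(self, mark):
            self.mark = mark
            self.attachments = []   # tags and resets that reference this commit

    class Tag:
        def __init__(self, name):
            self.name = name
            self.committish = None

        def attach(self, commit):
            """Point the tag at a commit; the mutation updates the back-pointer too."""
            if self.committish is not None:
                self.committish.attachments.remove(self)
            self.committish = commit
            commit.attachments.append(self)

    # Finding all tags on a commit is now just commit.attachments,
    # with no sweep over the whole event stream.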
It all came good in the end. In the last benchmarking run before I shipped 2.29 it processed 56424 commits in 303 seconds. That's 186 commits per second, 11160 per minute. That's good enough that I plan to lay off serious speed-tuning efforts; the gain probably wouldn't be worth the increased code complexity.
UPDATE: A week later, after more speed-tuning, mainly by Julien (because it was still slow on the very large repo he's working with), analysis speed is up to 282 commits/sec (16920 per minute) and a curious thing has occurred. pypy now actually produces a speedup, up to around 338 commits/sec (20280 per minute). We don't know why, but apparently the algorithmic optimizations somehow gave pypy's JIT better traction. This is particularly odd because the density of the code actually increased.
This entry was posted in Software and tagged reposurgeon by esr. Bookmark the permalink
[http://esr.ibiblio.org/?p=4861] .

120 THOUGHTS ON PYTHON SPEED OPTIMIZATION IN THE REAL WORLD

Foo
on 2013-03-24 at 19:55:07 said:

Has been a pleasure watching hackers at work on irc, and the early warning for the blog post :-)
Foo Quuxman

Pingback: Python speed optimization in the real world | dropsafe

Phil
on 2013-03-24 at 20:23:45 said:

Hmmm. I learned a new word today: Memoization. I've had few formal programming classes, and none recently. I keep up with programming trends by lurking on various lists, but that often shows me techniques without naming them. Anyways, I regularly memoize functions but never knew there was a formal name for it.
Said Achmiz
on 2013-03-24 at 20:51:33 said:

Is it 11k commits per second or per minute? First paragraph says second, last paragraph says
minute.

esr
on 2013-03-24 at 21:01:30 said:

>Is it 11k commits per second or per minute? First paragraph says second, last paragraph says
minute.
Typo. I got it right the second time; I've fixed the incorrect first instance.

esr
on 2013-03-24 at 21:07:03 said:

>Anyways, I regularly memoize functions but never knew there was a formal name for it.
Oddly enough, my situation was the opposite: I knew the word, but how to memoize systematically was something I'd never learned until these last three weeks. I don't write code that is both performance-critical and compute-bound very often, so I haven't before had enough use for this technique to nail it down.

Patrick Maupin
on 2013-03-24 at 21:53:43 said:

Python is faster than a lot of people think it is.


But not if you code it like you would C.

You have to figure out how to let most of the looping happen inside C builtins.
Usually, if it needs to go fast, someone has already made a library.
Occasionally, I will write C or Pyrex/Cython to speed it up.
But the last time that happened was in 2003.
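A toy illustration of that point about C builtins; this is not from the thread, just a hedged demonstration that a builtin doing the looping in C tends to beat an explicit Python-level loop (exact numbers vary by machine and interpreter):

    import timeit

    data = list(range(1_000_000))

    def python_loop_sum(xs):
        # the interpreter executes every iteration of this loop
        total = 0
        for x in xs:
            total += x
        return total

    def builtin_sum(xs):
        # the loop runs inside the C implementation of the builtin
        return sum(xs)

    print("explicit loop:", timeit.timeit(lambda: python_loop_sum(data), number=10))
    print("builtin sum:  ", timeit.timeit(lambda: builtin_sum(data), number=10))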

Joshua Kronengold
on 2013-03-25 at 00:55:29 said:

Nice writing, although I'm somewhat surprised that you haven't discovered what I (as someone who frequently works with big data setups) have long since determined: that when the going gets slow, it's time to pull out a profiler and see if some part of your codebase is running -far- more often than you've anticipated; a sure sign that something upstream of it is suffering big-O problems.

esr
on 2013-03-25 at 01:22:17 said:

>when the going gets slow, it's time to pull out a profiler and see if some part of your codebase is running -far- more often than you've anticipated; a sure sign that something upstream of it is suffering big-O problems.
I'm well aware of the principle. Unfortunately, my experience is that Python profilers suck rather badly; you generally end up having to write your own instrumentation to gather timings, which is what I did in this case. It helped me find the obscured O(n**2) operations.

John Wiseman
on 2013-03-25 at 03:11:12 said:

One of the lesser walls was a missing feature in Common Lisp corresponding to the __str__ special method in Python.
You want print-object: http://www.lispworks.com/documentation/HyperSpec/Body/f_pr_obj.htm
Yaroslav Fedevych
on 2013-03-25 at 05:06:07 said:

> you generally end up having to write your own instrumentation to gather timings, which is what I did
in this case.
Do you deem it good enough to show the rest of the world?

Beat Bolli
on 2013-03-25 at 08:08:31 said:

Looks like a classical runtime/memory trade-off. Have you compared the working set size before and
after the speedup?

esr
on 2013-03-25 at 08:12:37 said:

>Looks like a classical runtime/memory trade-off. Have you compared the working set size before
and after the speedup?
It is most certainly that. I didn't bother measuring the working set because the only metric of it that mattered to me was "doesn't trigger noticeable swapping."

esr
on 2013-03-25 at 08:13:26 said:

>Do you deem it good enough to show the rest of the world?
Look at the implementation of the timings command.
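This is not reposurgeon's actual timings command, but a minimal sketch of the kind of hand-rolled instrumentation being discussed: accumulate wall-clock time per labeled phase and dump a report at the end (all names here are invented for illustration).

    import time
    from collections import defaultdict

    phase_times = defaultdict(float)   # accumulated seconds per labeled phase

    class timed:
        """Context manager that adds the elapsed time of a block to a named bucket."""
        def __init__(self, label):
            self.label = label
        def __enter__(self):
            self.start = time.time()
        def __exit__(self, *exc):
            phase_times[self.label] += time.time() - self.start

    def topo_analysis():
        time.sleep(0.1)   # stand-in for an expensive phase

    with timed("topological analysis"):
        topo_analysis()

    for label, secs in sorted(phase_times.items(), key=lambda kv: -kv[1]):
        print(f"{secs:8.2f}s  {label}")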

esr
on 2013-03-25 at 08:20:17 said:

>You want print-object: http://www.lispworks.com/documentation/HyperSpec/Body/f_pr_obj.htm


The function print-object is called by the Lisp printer; it should not be called by the user.
Anyway, this looks like an analogue of Python repr(), not print; it's supposed to print a representation that's invertible (can be fed back to read-eval). I use str() for dumping the fast-import stream representations of objects, which is not invertible by Python itself.

JustSaying
on 2013-03-25 at 08:21:47 said:

Big-O optimization trumps (or at worst equals, in lucky cases) any compiler-aware information, because the degrees of freedom in the semantics not modeled by the language (and the declared types) are always a superset. Yet another reason why computers will never program themselves creatively, and why I think the Singularity is nonsense.
I don't know enough about the details of CLOS or defstructs to grasp the detailed reasons for the claimed impedance mismatch between CLOS and Python.
Programmers are rightfully proud when they achieve an order-of-magnitude gain in performance. I don't see programmers run away from their babies and disappear into thin air without ever bragging to anyone of their accomplishment. How lonely that would be otherwise.

Nancy Lebovitz
on 2013-03-25 at 08:49:17 said:

Checking to make sure I understand: Memoization is looking up and recording the data you're likely to keep needing instead of looking it up every time you need it?

esr
on 2013-03-25 at 09:15:07 said:
>Checking to make sure I understand: Memoization is looking up and recording the data you're likely to keep needing instead of looking it up every time you need it?
Correct. It works when the results of an expensive function (a) change slowly, and (b) are small and cheap to store. Also there has to be a way to know when the cached results have become invalid so you can clear the cache.
Since you're not a programmer, I'll add that big-O notation is a way of talking about how your computation costs scale up with the size of your input data. O(1) is constant time, O(n) is linear in the size of the input set, O(n**2) is as the square of the size, O(2**n) as the number of subsets of the data set. Also you'll see O(log n), typically associated with the cost of finding a specified item in a tree or hash table. And O(n log n), which is the expected cost function of various good sorting algorithms. In general, O(1) < O(log n) < O(n) < O(n log n) < O(n**2) < O(2**n). Normally anything O(n log n) or below is tolerable, O(n**2) is pretty bad, and O(2**n) is unusably slow.
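To tie the memoization idea to something concrete, here is a minimal sketch in Python, with an explicit way to clear the cache when the underlying data changes (the decorator and example function are invented for illustration, not taken from reposurgeon):

    import functools

    def memoized(func):
        """Cache results keyed by arguments; expose a hook to clear the cache
        after the underlying data has been mutated."""
        cache = {}
        @functools.wraps(func)
        def wrapper(*args):
            if args not in cache:
                cache[args] = func(*args)
            return cache[args]
        wrapper.invalidate = cache.clear
        return wrapper

    @memoized
    def squares_up_to(n):
        # stand-in for an expensive, slowly-changing computation
        return [i * i for i in range(n)]

    squares_up_to(10)           # computed once
    squares_up_to(10)           # served from the cache
    squares_up_to.invalidate()  # call this after the inputs change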

Rick C
on 2013-03-25 at 09:15:11 said:

Nancy, it would be more accurate to say you record the results of complex calculations and then
reuse the stored result later, rather than recalculate it every time.

Shenpen
on 2013-03-25 at 10:08:19 said:

> One of the lesser walls was a missing feature in Common Lisp corresponding to the __str__ special method in Python. Lisp types don't know how to print themselves, and as it turns out reposurgeon relies on this capability in various and subtle ways.
Does it also rely on everybody using this and doing it in a sensible, readable way in their classes?
Also, have you checked Jython?

iajrz
on 2013-03-25 at 10:13:57 said:

Rick: all calculations are functions, aren't they? But if you had to do a look-up which requires expensive/extensive/recurrent parsing, can that be called a calculation?
It is still good for memoization.

The Monster
on 2013-03-25 at 10:29:15 said:

I'm a big believer that a data structure with one-way pointers is vastly inferior to one that includes back-pointers. With back-pointers, you can always traverse the structure in any direction. Without them, you have to do searches, which are always expensive, and progressively more expensive as the structure grows.
I, too, was unfamiliar with the verb "memoize", but have made use of the idea behind it many times. At my last job, I wrote some utility programs that had to know where to find some files that weren't stored in well-known locations (but were very unlikely to move once they'd been put in a given place, because that was a PITA). Since a "find" is a very expensive operation, I made the utility installer dispatch an "at now" job to do the "find" once and cache the result in a specific location that the other utilities knew about.

esr
on 2013-03-25 at 10:44:42 said:

>Does it also rely on everybody using this and doing it in a sensible, readable way in their classes?
My code doesn't assume that every class in the universe has a sensible __str__, but it does assume that almost every class defined in reposurgeon has its own __str__ that is useful for progress/debugging messages, and (this is the key point) the system str() will recurse down through all such methods when told to print an arbitrary structure.
>Also, have you checked Jython?
No. Is there any reason I should expect it to be faster than CPython? I thought it was mainly aimed at allowing programmers to use the Java library classes, rather than at performance per se.

Mike E
on 2013-03-25 at 11:17:53 said:

Memoization; good to know the name for that. I found myself doing that extensively while trying to
work through the problems at Project Euler (projecteuler.net), which is a marvelous resource with a
series of incrementally more difficult mathematical programming puzzles for those interested in such
a thing.

Random832
on 2013-03-25 at 11:41:19 said:

> the system str() will recurse down through all such methods when told to print an arbitrary
structure.
I'm not sure what this means. The closest thing I can think of is the fact that system structure types (such as list/tuple/dict) will call str (or, looks like it's actually repr at least half the time) on their children.
That's a kind of narrow definition of "arbitrary structure" for my taste.

esr
on 2013-03-25 at 12:01:56 said:

>That's a kind of narrow definition of "arbitrary structure" for my taste.

Perhaps I was unclear. I created __str__ methods for all the classes that are parts of Repository; the effect is that when requesting structure dumps for debugging instrumentation I can just say str() on whatever implicit object pointer I have and the intuitively useful thing will happen. I don't know how to duplicate this effect in CL. What it would probably require is for the system print function to magically call a str generic whenever it reaches a CLOS object.
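A small Python sketch of the pattern being described: each class defines its own __str__, and the top-level dump composes the dumps of its parts. The class names here are invented for illustration, not reposurgeon's.

    class Blob:
        def __init__(self, mark, data):
            self.mark, self.data = mark, data
        def __str__(self):
            return f"blob\nmark {self.mark}\ndata {len(self.data)}\n{self.data}"

    class Commit:
        def __init__(self, mark, comment, blobs):
            self.mark, self.comment, self.blobs = mark, comment, blobs
        def __str__(self):
            # the commit's dump explicitly composes the dumps of its parts, so
            # str() on the top object yields a readable dump of the whole structure
            parts = ["commit refs/heads/master", f"mark {self.mark}", self.comment]
            parts.extend(str(b) for b in self.blobs)
            return "\n".join(parts)

    c = Commit(":2", "initial revision", [Blob(":1", "hello\n")])
    print(c)   # print() calls str(), which recurses through the __str__ methods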

Patrick Maupin
on 2013-03-25 at 12:07:18 said:

@The Monster:

I, too, was unfamiliar with the verb "memoize", but have made use of the idea behind it many times. At my last job, I wrote some utility programs that had to know where to find some files that weren't stored in well-known locations (but were very unlikely to move once they'd been put in a given place, because that was a PITA). Since a "find" is a very expensive operation, I made the utility installer dispatch an "at now" job to do the "find" once and cache the result in a specific location that the other utilities knew about.
Congratulations! You reinvented bash's command hash. :-)
But seriously, this is a great idea, and like most great ideas, will have multiple independent inventions by multiple clever people.

Garrett
on 2013-03-25 at 12:08:56 said:

I would just follow up on esr's excellent overview of big-O notation above with one point which is often missed by developers. The impact of the algorithm is usually seen as data sets grow larger. For small data sets, the complexity of the operation frequently is overtaken by other concerns.
To provide a mundane example: a car goes much faster than you can walk, but if you are a city-dweller it's probably faster to walk to your neighbour's house than to drive.

Adam
on 2013-03-25 at 12:23:26 said:

> O(1) < O(log n) < O(n) < O(n log n) < O(n**2) < O(2**n)
While that's theoretically true, it's interesting to note that in practice, O(1) = O(log n). For typical problems, you should just mentally macro-expand "log2 n" to 30. The only way you're going to get it different enough from 30 to make any difference is to have n be so small that the operation in question is effectively instant. For example, to shave a mere 1/3 from that "constant" requires n to decrease by three orders of magnitude.
Maybe you want to work on an atypical problem. For the biggest problem most people could possibly attempt, (log2 n) < 60. For Google, it might be 70. For the crackpot who wants to count every elementary particle in the observable universe, 300 is an upper bound.

Jeff Read
on 2013-03-25 at 12:23:51 said:

Well, looks like the Boston Lisp folks called it. At their last meeting a couple of weeks ago, some of
them predicted that you would:
a) discover that your speed problem is better solved by algorithmic optimization than by switching to
a faster language or compiler;
b) write a post critiquing the shortcomings of Common Lisp.
They were pretty spot on, except they thought you would critique CL's lack of libraries, not the ugliness of CLOS. :)

esr
on 2013-03-25 at 12:41:11 said:

>esr's excellent overview of big-O notation

Entertainingly, one of the downsides of being an entirely self-taught programmer is that I didn't learn big-O notation or the associated reflexes until relatively late in my career. It wasn't intuitive for me until, oh, probably less than five years ago.

esr
on 2013-03-25 at 12:53:53 said:

>They were pretty spot on, except they thought you would critique CL's lack of libraries, not the ugliness of CLOS. :)
And I might have gotten to that if I'd gotten around CLOS.

John Wiseman
on 2013-03-25 at 13:12:16 said:

>The function print-object is called by the Lisp printer; it should not be called by the user.
Correct. You define a custom print-object method on your data types, and it is called by the implementation whenever you cause a value of that type to be printed, by calling print, prin1, format, or whatever. Just like you don't explicitly call __str__.
> Anyway, this looks like an analogue of Python repr(), not print; it's supposed to print a representation that's invertible (can be fed back to read-eval).
It is used for both, actually. If *print-readably* is T, then it must either print a readable (invertible) representation or throw an error (repr mode). Otherwise, it can print whatever it wants (str mode).

John Wiseman
on 2013-03-25 at 13:33:48 said:

Lispers usually use the print-unreadable-object helper macro. See


http://clhs.lisp.se/Body/m_pr_unr.htm for an example.

Federico
on 2013-03-25 at 14:46:32 said:

> One of the lesser walls was a missing feature in Common Lisp corresponding to the __str__ special method in Python. Lisp types don't know how to print themselves, and as it turns out reposurgeon relies on this capability in various and subtle ways. Another problem was that I couldn't easily see how to duplicate Python's subprocess-control interface
What about CL's much-hyped ability to have new features added very easily (I'm thinking of Paul Graham's writings): adding a macro would not have solved your problems? Not worth your time? Too tricky?

Far
on 2013-03-25 at 15:05:02 said:

For the subprocess-control interface, did you mean synchronous execution (waiting for results), or asynchronous? In the first case, UIOP:RUN-PROGRAM does it for you quite portably. In the second
case, there is no perfect answer, but EXECUTOR does a decent job on the major implementations (SBCL, CCL and a few more).
CLOS is ugly but (1) it's more expressive and powerful than any other object system I've heard of (e.g. multiple inheritance, multiple dispatch, method combinations, accessors, meta-object protocol, etc.), and (2) you can hide the ugly behind suitable macros, and many people have.
Regarding __str__ and print-method, see John Wiseman's answer; though in this case you might want to define your own serialize-object method and have a mixin defining a print-object method that wraps a call to that in a print-unreadable-object.

esr
on 2013-03-25 at 15:23:12 said:

>What about CL's much-hyped ability to have new features added very easily (I'm thinking of Paul Graham's writings): adding a macro would not have solved your problems? Not worth your time? Too tricky?
Dunno. Would have looked into it more deeply, but CLOS blocked the translation. Now that I know SBCL exists, though, I'll probably do a project in it from scratch sometime and learn these things.

esr
on 2013-03-25 at 15:25:08 said:

>though in this case you might want to define your own serialize-object method and have a mixin
defining a print-object method that wraps a call to that in a print-unreadable-object.
Yes, I thought the answer would be something much like that.
Good to know that UIOP:RUN-PROGRAM exists; next time I try something like this I'll look it up.

dtsund
on 2013-03-25 at 15:52:00 said:

>Entertainingly, one of the downsides of being an entirely self-taught programmer is that I didn't learn
big-O notation or the associated reflexes until relatively late in my career. It wasn't intuitive for me until, oh, probably less than five years ago.
Weren't you also a mathematician, at least briefly? My first exposure to the notation was in Real Analysis, after which grasping it in a CS context was almost trivial.

Jay Maynard
on 2013-03-25 at 15:52:23 said:

Let's go up a metalevel. I was mildly surprised you considered switching languages at all before attacking the algorithm's speed issues. This seems unlike you. How did you get there?

Jeff Read
on 2013-03-25 at 16:02:30 said:

CLOS is ugly but (1) it's more expressive and powerful than any other object system I've heard of (e.g. multiple inheritance, multiple dispatch, method combinations, accessors, meta-object protocol, etc.), and (2) you can hide the ugly behind suitable macros, and many people have.
Historically there was T's object system: as powerful as CLOS but actually beautiful.
The closest I can find in a modern running Scheme is RScheme's object system, but RScheme has sadly been lacking in maintenance or interest and is still quite riddled with bugs.

esr
on 2013-03-25 at 16:06:53 said:

>Let's go up a metalevel. I was mildly surprised you considered switching languages at all before attacking the algorithm's speed issues. This seems unlike you. How did you get there?
You're right, it was unlike me (on the evidence anyone else has available). I've actually been wondering if anyone would notice this and bring it up.
At the time I began looking at Lisp, I believed, mistakenly, that I had already found and optimized
out the stuff that could be attacked that way. In my defense, I will note that the remaining O(n**2) code was pretty well obscured; it took a couple of weeks of concentrated attention by two able hackers to find it, and that was after I'd built the machinery for gathering timings.

esr
on 2013-03-25 at 16:16:05 said:

>Weren't you also a mathematician, at least briefly?

I was, but my concentration was in abstract algebra, logic, and finite mathematics. I didn't actually learn a lot of real analysis (I had a fondness for topology that was unrelated to my main interests, but I approached it through set and group theory rather than differential geometry). It may also be that big-O notation wasn't as prominent then (in the 1970s) as it later became, so I'd have been less likely to encounter it even if I had been learning more on the continuous side.

BRM aka Brian R. Marshall


on 2013-03-25 at 16:37:46 said:

Another note for non-programmers


A profiler is a tool to determine how much time is spent running different parts of a program. As
ESR noted, sometimes it is better to add some code to the program to get the required results.
(Such code generally isn't used/run when not trying to speed up the program.)
Sometimes, at least as a first try, a programmer can tell from the code where it is worth trying to
speed things up.
In any case, this kind of analysis is very useful. A junior/lousy programmer may attempt to speed up
a program by reworking code that obviously can be made to run faster. But if a program takes 10
minutes to run and this code accounts for only 10 seconds of that time, it is a waste of time trying to
speed it up. Even if it can be made to run 10 times faster, the program run time goes from 600
(590+10) seconds to 591 (590+1) seconds.
Sometimes this kind of improvement is worse than a waste of time. The code may be written in a
way that makes it obvious what it is supposed to do and that it is, in fact, doing it. Reworking code
that makes unimportant improvements but also makes the code obscure and subtle is bad practice.

Peter Scott
on 2013-03-25 at 17:11:19 said:

> Also you'll see O(log n), typically associated with the cost of finding a specified item in a tree or hash table.
Small correction: hash table insertion and lookup are expected O(1), not O(lg n).
(At worst, they are O(n), but this degenerate case hardly ever happens unless you piss off a cryptographer.)

JustSaying
on 2013-03-25 at 19:42:53 said:

@Adam:
It's interesting to note that in practice, O(1) = O(log n). For typical problems, you should just mentally macro-expand "log2 n" to 30. The only way you're going to get it different enough from 30 to make any difference is to have n be so small that the operation in question is effectively instant. For example, to shave a mere 1/3 from that "constant" requires n to decrease by three orders of magnitude.
Why are you claiming that a 300% efficiency increase is irrelevant (equivalent to constant) in the presence of iterations that range over 3 orders of magnitude?
log n is still log n, not constant.
Are you claiming that no such cases occur?

JustSaying
on 2013-03-25 at 19:53:25 said:

@esr:
You're right, it was unlike me (on the evidence anyone else has available). I've actually been wondering if anyone would notice this and bring it up.

I had assumed that you wanted to test out whether there was a fundamental advantage of your long-lost love over your new one. I have observed that you favor continuity of code bases over other considerations, so I should have realized I was wrong. Perhaps I was distracted.

Jay Maynard
on 2013-03-26 at 03:37:13 said:

JustSaying: I think the point is that going from 30 to 1 is almost never enough improvement to be worth doing, and it's effectively linear (when you have a 2**30 scale factor on input producing a scale factor of 30 on output, there are much bigger fish to fry).

Jay Maynard
on 2013-03-26 at 03:47:33 said:

>I will note that the remaining O(n**2) code was pretty well obscured; it took a couple of weeks of
concentrated attention by two able hackers to find it
Are there any O(n**2) traps within Python itself we can avoid that you found, or was this all your algorithm's fault?

another user
on 2013-03-26 at 04:11:02 said:

Did you consider profiling reposurgeon for performance bottlenecks, rewriting the relevant pieces of
code in C/C++ and using bindings? I personally like boost-python.
Maybe if you keep 90% of code written in Python and rewrite 10% of performance-critical code in C,
you can approach the speed of a C program.

JustSaying
on 2013-03-26 at 06:10:04 said:
almost never
That is why I asked if there are no cases. I can't think of a case at the moment, but I am skeptical of saying there are none. I think log n is still log n and I should remember it as that, while also factoring in that it might nearly always be too low of a priority. All of us experienced programmers, I am sure, share BRM's experience that obfuscating code for insignificant efficiency gains is myopic.

Winter
on 2013-03-26 at 08:30:10 said:

@JustSaying
I think log n is still log n and I should remember it as that, while also factoring in that it might nearly
always be too low of a priority.
O(log n) vs O(n) corresponds to [c1 * log(n) + d1] N. In practice the constants may be so large that n
> N is out of your reach.
So, your implementation might indeed scale as O(log n), but it could still run much slower for
practical n.

Winter
on 2013-03-26 at 08:32:16 said:

Sorry, html filter messed up my comment:


@JustSaying
I think log n is still log n and I should remember it as that, while also factoring in that it might nearly
always be too low of a priority.
O(log n) vs O(n) corresponds to [c1 * log(n) + d1] < [c2 * n + d2] for some n > N. In practice the constants may be so large that n > N is out of your reach.
So, your implementation might indeed scale as O(log n), but it could still run much slower for practical n.

Adam
on 2013-03-26 at 11:12:22 said:

Of course lg n isn't *really* a constant, but it's often useful to think of it that way. It's also useful at times to assume a spherical cow.
They say that premature optimization is the root of all evil. If you need to sort the items in a dropdown box, you're probably fine to use an n^2 sort. Those are fast and easy to code up, which means fewer bugs. It's a dropdown box, so your user experience will be crap if you have more than a few dozen items anyway. At most, you'll add a few milliseconds, which isn't noticeable. When n is small enough, even the difference between O(n) and O(n^2) doesn't matter. An extra lg n is completely irrelevant. That's one of Knuth's "small efficiencies".
However, some optimizations aren't premature. If you have lists of a billion items, n^2 sorts are out of the question. Let's say you're typically sorting a billion items. Then lg n is 30. Assume that once in a while, you need to sort 10 billion items. Then lg n is a hair over 33. That adds 11% to your runtime. Instead of spending 10x longer processing 10x more items, you'll have to spend 11x longer. The difference is negligible: an order-of-magnitude change in the input size in either direction affects your total runtime by only 11% more than an O(n) algorithm would. Given a gigantic three-orders-of-magnitude change of input size, the lg n factor results in only 66%. That's not nothing, but it's also not the real problem. You'll need more memory before you'll need more CPU.
In short, when lg n really varies, n is small enough that the entire operation doesn't matter. When n is large enough to matter, lg n varies so little that the variation doesn't matter. Not much anyway.
You could improve your spherical cow model by making it an oblate spheroid, adding another smaller one as a head, and adding four cylindrical legs, but that won't change the air resistance enough to stop the cow from making a big mess when it hits the ground.

esr
on 2013-03-26 at 11:14:30 said:

>Are there any O(n**2) traps within Python itself we can avoid that you found, or was this all your algorithm's fault?
I don't know that yet. It was probably all my code, but there could be O(n**2) traps within Python as well.

Jay Maynard
on 2013-03-26 at 11:15:41 said:

>boost-python
*twitch*
Merciful $DEITY. Boost is bad enough. Don't inflict it on Python.

esr
on 2013-03-26 at 11:19:42 said:

>Did you consider profiling reposurgeon for performance bottlenecks, rewriting the relevant pieces of
code in C/C++ and using bindings?
Yes, for about a half-second. Then I realized how ridiculous the idea was and abandoned it.
That strategy only works when the stuff you need to do fast fits in C's type ontology without incurring so much code complexity that you end up with more problems than you started with. There was no chance that would be true of reposurgeon's internals, none at all.

Garrett
on 2013-03-26 at 11:46:07 said:

@JustSaying:
I work with filesystems for a living. When you have a large on-disk data structure you need to search,
loading another block off of disk is a big cost. OTOH, searching that block in memory is
comparatively cheap. For some of our data structures we use binary or hash trees to locate the block
we need, but then pack the block as an array. This avoids extra pointers and allows us to cram a few
more entries per block. In these cases, cutting the number of block loads from 20 to 10 can be a big
savings if the operation must occur in real-time for a client (as opposed to a background processing
operation). Spinning rust is slow.

JustSaying
on 2013-03-26 at 12:28:40 said:

@esr:
O(1) is constant time [...] O(log n), typically associated with the cost of finding a specified item in a tree or hash table.
O(1) is typically associated with the cost of accessing a specified item in an array by index.
@Winter: O(log n) vs O(n) corresponds to [c1 * log(n) + d1] < [c2 * n + d2]
@Adam:
In short, when lg n really varies, n is small enough that the entire operation doesn't matter.
That is only if c1 is small relative to d1 and the universe.
The curve for log(n) flattens faster than even sqrt.
The sacrosanct rule to not do premature optimization appears to be deprecated under open extension, because profiling isn't available.
If your caller must call you a billion times (perhaps deep in some nested function hierarchy), and you are employing a log(n) tree or hash instead of an array, then the difference in application performance can be 300% for n = 1000, 400% for n = 10,000, 500% for n = 100,000, etc.
So log(n) is never the same as constant. The cow is never spherical, except when we touch him only one way (Steve Jobs).

Falstaff
on 2013-03-26 at 12:39:34 said:

@esr: I didn't bother measuring the working set because the only metric of it that mattered to me was "doesn't trigger noticeable swapping."
Sure, for your current number of commits. Now, using caching, the design trades off runtime for an upper limit based on the memory of the box.

esr
on 2013-03-26 at 13:41:41 said:

>Now, using caching, the design trades off runtime for an upper limit based on the memory of the
box.
Indeed so. It's easier to buy memory than more processor speed these days.

Winter
on 2013-03-26 at 14:52:44 said:

@JustSaying
The constants for O(log n) tend to be larger than for O(n), else you would have tried the log n
algorithm first. And indeed, log n matters if n is in the billions. But at that point, you are tweaking all
algorithms.

Julien "FrnchFrgg" Rivaud


on 2013-03-26 at 18:26:07 said:

> Looks like a classical runtime/memory trade-off. Have you compared the working set size before
and after the speedup?
TL;DR: see below.
In fact, I should tell that before I worked on refactoring for speed, I began searching for ways to cut a lot of the memory used by reposurgeon. Most of the gain was obtained by using __slots__ on most instantiated structures, but I did some dict eviction and copy-on-write optimization on a really memory-hungry part: the filemaps.
Reposurgeon already was optimized in that regard (Eric had already implemented a rather good COW scheme for PathMaps), but the fact that PathMap snapshotting required a new dictionary each time, to be able to replace an entry by its copy later, was taking its toll. So I devised a tweak to take snapshots even less often, then a further optimization which is a real memory-usage/code-complexity tradeoff.
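For readers unfamiliar with the __slots__ trick mentioned above, here is a minimal illustration of the idea (a generic example, not reposurgeon's classes): declaring __slots__ removes the per-instance __dict__, which matters when you hold hundreds of thousands of small objects in memory.

    import sys

    class EventWithDict:
        def __init__(self, mark, comment):
            self.mark = mark
            self.comment = comment

    class EventWithSlots:
        __slots__ = ("mark", "comment")   # fixed attribute set, no per-instance __dict__
        def __init__(self, mark, comment):
            self.mark = mark
            self.comment = comment

    a = EventWithDict(":1", "tweak")
    b = EventWithSlots(":1", "tweak")
    # the dict-backed instance pays for a whole dictionary per object
    print(sys.getsizeof(a) + sys.getsizeof(a.__dict__))
    print(sys.getsizeof(b))   # noticeably smaller; the saving multiplies over 56K+ events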
Returning to simpler structures would probably gain some speed too, but the fact is that on my machine, reposurgeon still tops out at 75% of my 4GB of RAM when converting the Blender repository, and I suspect Battle for Wesnoth to be such a contender too. Sure, one can trade computational cost and even code readability for memory, but the bargain is not the same when you can trade 200MB of temporary memory for an O(n**2) to expected-O(n) reduction (e.g. store previous hits in a set/dict instead of searching them backwards in the already-seen list) as when you trade 2GB of memory used through the whole import for only a constant factor (one of the costs of the smart COW PathMap over a list of dicts is that built-in types don't have interpreter overhead, and in fact run at C speed, but that's only a constant factor rather than a whole new complexity class).
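The set-instead-of-backward-search trick mentioned in parentheses looks roughly like this (a generic sketch, not the actual reposurgeon code):

    def dedup_quadratic(events):
        # O(n**2): every membership test rescans the list built so far
        seen = []
        for e in events:
            if e not in seen:
                seen.append(e)
        return seen

    def dedup_linear(events):
        # expected O(n): membership tests against a set are O(1) on average,
        # at the price of a little extra temporary memory
        seen, out = set(), []
        for e in events:
            if e not in seen:
                seen.add(e)
                out.append(e)
        return out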
As for the optimization itself, it is amusing to note that Eric and I actually started optimizing for speed each in his own corner, without coordinating. At first we were doing orthogonal changes, then as the set of molasses shrank we began stepping on each other's toes^W^W^W^W^W collaborating more ;-) Also note that while Eric says he drove his optimizations by profiling, I was less smart and just wandered in the code searching for unpythonic or unpretty code (to my eyes), at the risk of premature or over-optimization. I was more seeking refactors for clarity and code compactness (and iterators galore, because I love them too much for my own good) than real speed optimizations; it just happens that I seem to find ugly O(n**2) code.
The last thing that can explain why Eric didn't find the places to optimize at first sight is that reposurgeon is big and its internal structures were made to mirror the fast-import format at first. This legacy still shows a lot. That decision was sane at the time, when reposurgeon was less complex and able than now, and there still are several tangible benefits to this similarity, like the ability of reposurgeon to round-trip fast-import streams to the exact character. But over the course of time, and especially in the last few weeks where Eric and I started to optimize, internal objects like Commits have come to track more and more of their relationships with their surroundings, to the point that now they collectively maintain in memory the whole DAG, in both directions.
At first, Commits only stored the marks of their parents. To find parents, a sweep over the complete set of events was needed, because a mark is only a string containing a colon and a number, and marks aren't even necessarily consecutive. Eric made that computation remember its results, then switched altogether to storing the commit objects instead, diverging from fast-import towards a graph representation. For children, I first memoized the function searching for all commits whose parents contained self, then replaced that altogether by code that stores the children list on commits but keeps it synchronized at all times with parent lists. And for tags/resets, Eric and I both tried to make commits know which tags/resets pointed to them, always kept in sync with the information on tags telling where they point to.
TL;DR: Some of the inefficiencies were hidden, but most of them were due to the lack of information stored. Some loops that were only O(n) were actually called O(n) times by another function, which in a codebase that dense is not easy to spot, and it was not possible to make the inner loop more efficient short of doing large refactors. All these problems combined tend to make a poor human's brain automatically sweep over them and search for some other more palatable optimization. The needed refactors were difficult to do, not because the end result isn't known but because the transition must be painless, and that's difficult when there are tricky corner cases
everywhere.
Keeping commits very small and ensuring each state was correct was an imperative goal for me. Kudos to Eric and his approach to writing code, documentation, and test suites at the same time, or else none of these refactorings could have happened for fear of breaking everything. And I broke a lot of things, but noticed right away. Some parts of the code were actually relying on invariants that came from the fact that parent and children lists were generated at first! Finding those was hard and a blocker for the refactorings.
I already said far too much for a small comment, sorry for that :-(

Sigivald
on 2013-03-26 at 18:38:48 said:

BRM said: Sometimes this kind of improvement is worse than a waste of time. The code may be written in a way that makes it obvious what it is supposed to do and that it is, in fact, doing it. Reworking code that makes unimportant improvements but also makes the code obscure and subtle is bad practice.
True fact.
Premature optimization is a related problem.
First, see if it's slow.
Then, see what part of it is actually making it slow.
Then fix that part.
(And if, as in the quote above, the speed improvement is minor compared to the added complexity, don't fix it.)

esr
on 2013-03-26 at 18:53:08 said:

>(Eric had already implemented a rather good COW scheme for PathMaps)
Actually that code wasn't mine. Somebody named Greg Hudson wrote it in an attempt to reduce
memory footprint, and in so doing enabled me to solve a fiendishly subtle bug in branch processing that had stalled the completion of the Subversion reader for six months. To invoke it, the repository had to contain a Subversion branch creation, followed by a deletion, followed by a move of another branch to the deleted name.
I still don't know what exactly was wrong with my original implementation, but a small generalization of Hudson's code (from CoW filepath sets to CoW filepath maps) enabled me to use it to remove a particular O(n**2) ancestry computation in which I suspected the bug was lurking. Happily that suspicion proved correct.

Jakub Narebski
on 2013-03-26 at 20:48:55 said:

By the way, Eric, what profiler did you try to use, and what are you missing in it? What features would you like to see in a profiler?

esr
on 2013-03-26 at 21:47:41 said:

>By the way, Eric, what profiler did you try to use, and what are you missing in it? What features would you like to see in a profiler?
The stock Python profiler. Unfortunately, it's pretty bad about assigning time to method calls! I've always thought this was odd given that the standard style is so OO.
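For context, the stock profiler referred to here is presumably the cProfile module; typical usage looks something like the sketch below (a generic example, not the reposurgeon workload):

    import cProfile
    import pstats

    def work():
        # stand-in for the code you want to measure
        return sum(i * i for i in range(200_000))

    cProfile.run("work()", "profile.out")
    stats = pstats.Stats("profile.out")
    # sort by cumulative time to see which call chains dominate the runtime
    stats.sort_stats("cumulative").print_stats(20)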

Far
on 2013-03-26 at 23:43:09 said:

BTW, SBCL has SB-SPROF for profiling, which is quite informative, though it is not obvious at first
how to read the results.

Far
on 2013-03-26 at 23:55:19 said:

(Also, if you do complex shell pipes or string substitutions, INFERIOR-SHELL:RUN is a richer front-end on top of UIOP:RUN-PROGRAM. An implementation of it on top of EXECUTOR:RUN-PROGRAM or IOLIB:SPAWN would be nice, but hasn't been done yet.)

Shenpen
on 2013-03-27 at 05:22:59 said:

>Entertainingly, one of the downsides of being an entirely self-taught programmer


Actually I think the standard schoolish way of learning theory, then hands-on experience, then more work experience, is not useful at all in the two fields I was taught, programming/database design and business administration. We memorize and barf back theoretical definitions which we don't care about, because we have no idea what they are good for and they are often too formal to seem really useful; take an exam, forget them, and later on it is hard to apply them to practical problems, or even realize that the problems we face have anything to do with them.
It would be better to do hands-on practice first, try to figure out solutions, usually fail, then be told to do it X way without an explanation, and then learn the theory of why we were told so.
Example: I remember memorizing, not really understanding, taking an exam on, and then promptly forgetting database normalization: 3NF, BCNF, 4NF. Then years later actually designing databases, figuring out a common-sense way of doing it, then realizing this actually sounds something like BCNF. Then I went back to the textbook, looked up 4NF, and actually my design got better. And then realizing it is all too slow and we have to denormalize for speed :-)
Same with business administration: only after many years of work did I get the philosophy of accounting, and going back to the textbook it started to make sense.
What would a world be like in which every construction engineer first worked as a mason and carpenter?

Ivan Shvedunov
on 2013-03-27 at 05:59:09 said:

It would be nice if you told us more about your dissatisfaction with CLOS. I can think of lack of dot notation and therefore somewhat inconvenient slot access, although it doesn't bother me very much,
and for situations when it does there are with-slots and with-accessors. Maybe decorators for classes/generic functions/methods would not hurt, either (macrology and MOP help, but in some situations I would indeed prefer decorators as they're easier to combine, and using MOP may switch off compiler optimizations for CLOS). Other than that, is the problem reduced to the fact that CLOS is unlike Python's object model? If so, I'm not sure whether it's a problem of CLOS or one of Python. I for one often miss CLOS when I write Python or JavaScript code. Besides multimethods/MOP/etc. there are other good sides to it; for instance, using (func object) instead of object.method notation makes completion and call tips work much better. Also, CLOS is very well suited for live editing, when you make modifications without restarting your program; that's usually hard to achieve in JS and very hard in Python.

esr
on 2013-03-27 at 08:39:11 said:

>It would be nice if you told us more about your dissatisfaction with CLOS.
Dot notation would have been nice, but that's just syntax and un-Lispy (though you should look into e7 if any documentation for it is still on the web). I think the feature that stuck most in my craw was having to declare stub generics on penalty of a style warning from the compiler. Bletch! I dislike the requirement that all methods be globally exposed, too.
For this particular translation, I wanted a class system that simulated Python behavior more closely. I'm sure this could be done with sufficiently complex macro wrappers, but that seemed like a forbidding amount of work and possibly dangerous to maintainability.

The Monster
on 2013-03-27 at 09:04:36 said:

> It's easier to buy memory than more processor speed these days.
The original driver for 64-bit architectures was people who wanted to cache their entire database in RAM, and the 32-bit machines couldn't address enough memory to do that.

Jeff Read
on 2013-03-27 at 11:56:06 said:
What would a world be like in which every construction engineer first worked as a mason and carpenter?
My dad served as a mentor to a couple of UConn mech eng students a few years back for their senior project. His big complaint was that while they were smart and knew their physics, they didn't know how to machine at all. He thought it terribly important that an engineer gain experience as a machinist, since a technical drawing is basically a set of instructions to the machinist who will actually make the part.

tpmoney
on 2013-03-27 at 20:26:31 said:

> Actually I think the standard schoolish way of learning theory, then hands-on
>experience, then more work experience, is not useful at all in the two fields I
>was taught, programming/database design and business administration.
This was my constant complaint about my CS classes. They taught plenty of theory of the various
modern programming techniques, but there was so little practical application, and what little there
was was so contrived (a square is a rectangle is a shape for class inheritance for example) that while
I could give you the reasons why you would want to do these things on an intellectual level, I had no
gut understanding of why you would go through the extra work.

BRM aka Brian R. Marshall


on 2013-03-27 at 22:54:01 said:

Tangential to the matter at hand, but


Probably anyone who is into database design has heard this one, but
1NF, 2NF and 3NF can be described as:
"The key, the whole key and nothing but the key."

Jakub Narebski
on 2013-03-28 at 05:37:26 said:

> [...] what little there was was so contrived (a square is a rectangle is a shape for class inheritance
for example) [...]
Particularly because the square/rectangle relationship is just a bad fit and a bad example of OOP inheritance (where a more specialized class is usually extended, not limited).

Patrick Maupin
on 2013-03-28 at 10:02:40 said:

@Jakub:
where a more specialized class is usually extended, not limited
That's a really good observation.

LS
on 2013-03-28 at 11:17:57 said:

where a more specialized class is usually extended, not limited

That's a really good observation.
Yes, but it just goes to show that while OOP is a good fit for many problems, it doesn't make things much easier. Coming up with a really good set of classes, with the right responsibilities, is difficult. Finding hidden gotchas in the inheritance hierarchy is difficult. It's only after you've struggled quite a while with these issues that you end up with a good set of classes that make the actual program construction easy.
This is not really an OOP thing. If you're doing plain old procedural programming, the hard part is figuring out how to partition the problem. Once you do that, everything seems to fall into place.
Either way, what you are doing is actually trying to understand the problem you are trying to solve. That's the hard part.

William Newman
on 2013-03-28 at 11:28:09 said:
ESR wrote: "Dot notation [for CLOS] would have been nice, but that's just syntax and un-Lispy."
It's not what you're looking for, but you might at least be amused that I often use a macro DEF.STRUCT which is a fairly thin layer over stock CL:DEFSTRUCT which, among other things, makes accessor names use #\. instead of #\- as the separator between structure class name and slot name. (E.g., after (DEF.STRUCT PLANAR-POINT X Y), PLANAR-POINT.X is the name of an accessor function.)
More seriously, when you talked earlier about the apparent limitations of CL for printing objects, my impression is that the CL printer is more powerful and flexible than in most languages. It has some ugly misfeatures in its design (e.g., making stream printing behavior depend too much on global special variables instead of per-stream settings). It tends to be slow. But it is fundamentally functional and flexible enough that on net I'd list it as an advantage of CL vs. most other languages in something like Peter Norvig's chart. The feature I've pushed hardest on is *PRINT-READABLY* coupled with complementary readmacros to allow the object to be read back in. In CL, these hooks are expressive enough to let me do tricks like writing out a complex cyclic data structure at the REPL, and later scrolling back in the REPL or even in the transcript of a previous session, cutting out the printed form, pasting it into my new REPL prompt, and getting the same thing. (Of course the implementor of the readmacros needs to decide how the scheme copes with technicalities like shared structure, e.g. by memoization or not.) I am not an expert in Python 2.x or Ocaml or Haskell, but I've read about them and written thousands of lines of each, and it's not clear to me that their printer/reader configurability is powerful enough to support this.

esr
on 2013-03-28 at 13:14:35 said:

>More seriously, when you talked earlier about the apparent limitations of CL for printing objects, my
impression is that the CL printer is more powerful and flexible than in most languages.
You may well be right. It wouldn't surprise me if you were.
But you're falling into a trap here that I find often besets Lisp advocates (and as I criticize, remember
that I love the language myself). You're confusing theoretical availability with practicality to hand. As
you note, and as I had previously noticed, print behavior has ugly dependencies on global variables.
Separately, supposing your "more powerful" is really there, it is difficult to use, requiring arcane and
poorly documented invocations. Contrast this with Python str(), which ten minutes after you've first
seen it looks natural enough that any fool can use it.
Programming languages should not be tavern puzzles. The Lispy habit of saying "yes, you can do X
provided you're willing to sacrifice a goat at midnight and then dance widdershins around a flowerpot"
is one of the reasons Lisp advocates are often dismissed as semi-crackpots. Yes, LISP has
tremendous power, but it also has terrible affordances. Also see why I didn't program in it outside of
Emacs for 30 years.

Jeff Read
on 2013-03-28 at 13:22:53 said:

Those of you who are concerned about software patents have reason to celebrate: Uniloc just got
handed its ass in their patent suit against Rackspace. In the very same East Texas district court
that patent trolls venue-shop to get patent-troll-friendly rulings. Uniloc is a notorious, and
heretofore rather successful, patent troll; basically if you do any sort of license verification for a piece
of proprietary software, expect to be sued by Uniloc.
The defense cited not only In re Bilski but two other, more recent cases: Cybersource v. Retail
Decisions and Dealertrack v. Huber, which establish that for purposes of the machine-or-transformation
test of patentability, a general-purpose computer is not a specific enough machine,
and transformation of data is not sufficient transformation.
Given the way the sausage that is law gets made in Murka, I'm not going to say it's game over for
software patentholders yet. But their job just got a whole lot harder.

John Wiseman
on 2013-03-28 at 14:17:20 said:

> You're confusing theoretical availability with practicality to hand


Writing the equivalent of simple __str__ and __repr__ methods in Lisp is very easy; it's not some
theoretically-powerful-but-practically-difficult beast. You just have to know that print-object and
*print-readably* exist, like you have to know that __str__ and __repr__ exist.
If you want to support pretty-printing or printing cyclic data structures in a way that they can be read
back in, then you need to learn some more Lisp, but that's actually not hard either (well, except
pretty-printing; that can be a beast). As far as I know neither is even possible in Python using the
standard for printing & reading.
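For readers who have not met the Python half of this comparison, here is a minimal sketch (the class name and fields are invented for illustration) of how __str__ and __repr__ divide the work:

class Commit:
    def __init__(self, mark, comment):
        self.mark = mark
        self.comment = comment

    def __repr__(self):
        # aim for something that can be pasted back into a REPL
        return "Commit(%r, %r)" % (self.mark, self.comment)

    def __str__(self):
        # human-readable form, used by print and str()
        return "commit %s: %s" % (self.mark, self.comment)

c = Commit(":42", "fix typo")
print(str(c))                        # commit :42: fix typo
print(repr(c))                       # Commit(':42', 'fix typo')
assert eval(repr(c)).mark == c.mark  # repr round-trips for this simple case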

Jeff Read
on 2013-03-28 at 15:26:58 said:

Programming languages should not be tavern puzzles. The Lispy habit of saying "yes, you
can do X provided you're willing to sacrifice a goat at midnight and then dance widdershins
around a flowerpot" is one of the reasons Lisp advocates are often dismissed as semi-crackpots.
s/Programming languages/Operating systems/g
s/Lisp/Linux/g

Patrick Maupin
on 2013-03-28 at 15:41:18 said:

@John Wiseman:
If you want to support pretty-printing or printing cyclic data structures in a way that they
can be read back in, then you need to learn some more Lisp, but that's actually not hard
either (well, except pretty-printing; that can be a beast). As far as I know neither is even
possible in Python using the standard for printing & reading.
In the general case, there is usually no real reason to worry about printing for round-tripping in
Python, because pickle handles things like circular references quite nicely.
As far as the other goes, there are several ways to pretty print things, including leveraging the
standard str() functions by providing your own __str__.
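A small sketch of that pickle round-trip point, with an invented cyclic structure:

import pickle

node = {"name": "a", "next": None}
node["next"] = node              # deliberately cyclic

blob = pickle.dumps(node)        # pickle records the shared structure
copy = pickle.loads(blob)
assert copy["next"] is copy      # the cycle survives the round trip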

Jessica Boxer
on 2013-03-28 at 15:47:36 said:

@esr
> Unfortunately, my experience is that Python profilers suck rather badly
Whatever happened to "batteries included"?
Improving the performance of anything beyond a trivial program without a profiler is like painting a
portrait wearing a blindfold. It is a plain observable fact that programs don't spend their time where
programmers think they do. It is much more fun to write a cool optimization than an effective one.
Nonetheless, it sounds like you recognize this and implemented a custom, Rube Goldberg profiler.
Which reminds me of the aphorism that those who don't use UNIX are doomed to reinvent it, badly.[*]
One more reason not to do Python, as if there weren't enough already.
[*] BTW, I know that isn't actually what Henry Spencer said, but I didn't want to use the real one
since plainly ESR is not lacking understanding here, just tools.

esr
on 2013-03-28 at 16:48:09 said:

>Whatever happened to "batteries included"?


It's a question I've wondered about myself in this case. There aren't many places where Python fails
to live up to its billing; this is one. Actually, the most serious one I can think of offhand.
>Nonetheless, it sounds like you recognize this and implemented a custom, Rube Goldberg profiler.
That's too negative. What I did is often useful in conjunction with profilers even when they don't suck:
I sampled a timer after each phase in my repo analysis and reported both elapsed time and
percentages. When several different phases call (for example) the same lookup-commit-by-mark
code, custom instrumentation of the phases can tell you things that function timings alone will not.
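Roughly the shape of that instrumentation, as a sketch rather than the actual reposurgeon code (phase names invented):

import time

class PhaseTimer:
    "Accumulate wall-clock time per named phase and report percentages."
    def __init__(self):
        self.last = time.time()
        self.phases = []
    def mark(self, name):
        now = time.time()
        self.phases.append((name, now - self.last))
        self.last = now
    def report(self):
        total = sum(t for _, t in self.phases) or 1e-9
        for name, t in self.phases:
            print("%-16s %8.2fs %5.1f%%" % (name, t, 100.0 * t / total))

timer = PhaseTimer()
# ... read and parse the Subversion dump ...
timer.mark("parsing")
# ... topological analysis ...
timer.mark("topo-analysis")
timer.report()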

esr
on 2013-03-28 at 16:51:42 said:

>s/Lisp/Linux/g
Linux is not even within orders of magnitude as bad as Lisp is this way; they're really not
comparable. The real-world evidence for that is penetration levels.

Jay Maynard
on 2013-03-28 at 17:17:39 said:

Jessica, what is *your* weapon of choice for the problem space Python occupies?
esr
on 2013-03-28 at 17:33:40 said:

>Jessica, what is *your* weapon of choice for the problem space Python occupies?
I'm curious about that myself. I would rate Ruby approximately as good (though with weaker library
support), Perl somewhat inferior due to long-term maintainability issues, and nothing else anywhere
near as good.

Jeff Read
on 2013-03-28 at 18:10:40 said:

I'm curious about that myself. I would rate Ruby approximately as good (though with
weaker library support), Perl somewhat inferior due to long-term maintainability issues, and
nothing else anywhere near as good.
Given Jessica's putative requirements (must be statically typed and work with the .NET framework),
Boo would be the closest thing to Python; but really, C# is a good enough language that few people
working within those constraints have a reason to switch away from it.

Jessica Boxer
on 2013-03-28 at 18:21:03 said:

I'm not really sure what problem space Python occupies. It seems to me that every programming
problem is its domain, according to its advocates.
Nonetheless, as I have said here a number of times, as a general programming language I think C# is
the best system I have used ("system" including all the peripheral items that make a language usable).
The problem space Eric is referring to, what I want to call "batch tools", I find C# excellent for that
kind of work.
I doubt you love that answer, but there it is.
Most of my programming is done in C# and JavaScript. FWIW, you all should totally check out
angularjs as a helper for javascript. It is super cool, very useful, and has probably tripled my speed
in writing browser-side code.
I have never used Ruby, but I have read a little about it and know someone who has a lot of expertise.
Anything that describes itself as "the good parts of perl" is unlikely to be appealing to me because I
don't think perl has any good parts.

Jessica Boxer
on 2013-03-28 at 18:27:37 said:

@esr
> That's too negative. What I did is often useful in conjunction with profilers even when they don't
suck: I sampled a timer after each phase in my repo analysis and reported both elapsed time and
percentages.
I didn't read your code, but FWIW, a sampling profiler is more than adequate for 95% of profiling
needs. Seems to me that you just created the batteries required, assuming you made it general
enough.
Certainly for optimizing what you need is "show me the top five places my code spends most of its
time", which is what that gives you. So Rube Goldberg be damned, sounds like a great tool you built.

Patrick Maupin
on 2013-03-28 at 22:30:02 said:

@esr:
It's a question I've wondered about myself in this case. There aren't many places where
Python fails to live up to its billing; this is one. Actually, the most serious one I can think of
offhand.
In my experience, one of the stock profilers (cProfile) works quite well. But you do have to take into
consideration the number of calls that are made to a given method (this data is reported as well as
total time spent in each call). An attribute lookup for a call is quite properly assigned to the calling
function.
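For concreteness, a minimal sketch of driving cProfile this way; the profiled function is a stand-in, not real reposurgeon code:

import cProfile
import pstats

def analyze(repo):
    # stand-in for the expensive analysis phase
    return sorted(repo)

profiler = cProfile.Profile()
profiler.enable()
analyze(list(range(100000)))
profiler.disable()

# the ncalls and cumtime columns report call counts alongside time per call
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)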
@Jessica Boxer:

The problem space Eric is referring to, what I want to call "batch tools", I find C# excellent
for that kind of work.
I agree that C# is a good language. Like Python, its domain space is huge, so it's worthwhile honing
your abilities on a general-purpose tool like either one of these rather than remembering arcane batch
syntax.
But I'm not going to use a Microsoft OS and I'm not going to use a non-Microsoft implementation of
C#, so I'm not using C#.

uma
on 2013-03-28 at 23:12:30 said:

esr:
A combination of clojure and jython is one possibility

rrenaud
on 2013-03-28 at 23:32:45 said:

You have a performance problem, and your first instinct is to rewrite the code in a different language,
rather than find algorithmic bottlenecks? Maybe you should stop hating on computer science
education, and start taking some CS classes?

esr
on 2013-03-28 at 23:56:23 said:

>You have a performance problem, and your first instinct is to rewrite the code in a different
language, rather than find algorithmic bottlenecks? Maybe you should stop hating on computer
science education, and start taking some CS classes?
How do you pack that many misconceptions into two sentences? It must take both native talent and
a lot of practice.

esr
on 2013-03-28 at 23:57:32 said:

>A combination of clojure and jython is one possibility


Intriguing thought. I may try it on a future project.

Jay Maynard
on 2013-03-29 at 01:59:31 said:

I have a ready-made C# project that I could hack on if I felt the need, though something tells me that
diving into a 500 KLOC package as an introduction to a language may not be that good an idea;
after all, I learned to hate C++ from diving into a then-800 KLOC package.

Jay Maynard
on 2013-03-29 at 02:03:23 said:

Jessica, I'd say Python's problem space is that group of programs for which an interpreted language
is good enough, little to no bit-bashing is needed, and its I/O capabilities are good enough. Yeah,
that's a pretty wide domain, but by no means every programming problem.

Jeff Read
on 2013-03-29 at 12:20:44 said:

A combination of clojure and jython is one possibility


Holy crap, if you thought CL had warts, wait till you get a load of Clojure. I tried wrapping my head
around it for a joke project. I'd been joking around on Reddit about L33tStart, a fictional init(1)
replacement written in ClojureScript and running on Node, and decided that such a blasphemous
thing should really, actually exist.
It didn't take much exposure to Clojure(Script) for me to discover that I was allergic. That, combined
with Clojure's community of twenty-something naïfs (holy shit, guys, Rich Hickey is such a genius!
Check out this talk of his (only available in embedded video with no transcript): he's discovered that
if you minimize side effects and code in strictly functional style, your programs become simpler and
more tractable!) is enough to turn me right off the language and actively discourage other smart folks
from adopting it.
Anyway, Clojure is strictly only as powerful as JScheme or Kawa, so if you like Scheme you can
use one of those and gain all of Clojure's Java-interop advantages, plus the awesomeness of working
directly in (a somewhat reduced form of) Scheme.

rrenaud
on 2013-03-29 at 13:02:46 said:

So you find the algorithmic bottlenecks and fix them in Python. Then you begin a failed translation to
Lisp for no reason?

Random832
on 2013-03-29 at 13:58:10 said:

rrenaud, you've messed up your reading comprehension somewhere, or just didn't read through the
comments from before your initial post: he _thought_ he found them, then attempted a rewrite, then
found more.
As he said on March 25 at 4:06 pm: "At the time I began looking at Lisp, I believed, mistakenly, that
I had already found and optimized out the stuff that could be attacked that way. In my defense, I will
note that the remaining O(n**2) code was pretty well obscured; it took a couple of weeks of
concentrated attention by two able hackers to find it, and that was after I'd built the machinery for
gathering timings."

Jay Maynard
on 2013-03-29 at 15:26:26 said:

rrenaud: Why do you think I said that was unlike Eric? Unlike you, apparently, I do know him
personally and have what I think is a decent grasp on his hacking style, and the idea that he'd
commence a port for performance reasons before making sure every last drop of speed was wrung
out of it algorithmically is something that he'd normally ridicule with vigor.

esr
on 2013-03-29 at 16:12:35 said:

>the idea that he'd commence a port for performance reasons before making sure every last drop of
speed was wrung out of it algorithmically is something that he'd normally ridicule with vigor.
Indeed. But to be fair, I didn't actually give enough information in the OP to exclude the following two
theories: (1) Eric had a momentary attack of brain-damage and behaved in a way he would normally
ridicule, (2) Eric had a momentary attack of "oooh, look at the shiny Lisp" and put more effort into
thinking about a port to that specific language than the evidence justified.
Neither theory is true, mind you. But I can't entirely blame anyone for entertaining them, because I
didn't convey the whole sequence of events exactly.
rrenaud's biggest mistake was to suppose that I hate CS education; in fact, while I have little use for
the version taught everywhere but a handful of excellent schools like MIT/CMU/Stanford, "mild
contempt" would be a far better description than "hate". If these places were doing their job properly,
hackers at my skill level and above wouldn't be rara aves, and I wish that were the case, because
there's lots of work to go around.
His funniest mistake was that he thought CS education would fix the mistake he believed me to be
making. See above.

Jay Maynard
on 2013-03-29 at 19:58:24 said:

(2) was the theory I'd come up with, figuring you had a sudden need to connect with your roots or
something. Like I occasionally fire up a CP/M system.

Jay Maynard
on 2013-03-29 at 20:00:33 said:

And CS education, to me, seems to be a good way to train people to be computing theorists, which
is almost entirely orthogonal to hacking ability. I've never had a single CS course, have no plans to do
so, and my hacking abilities are at a level I consider adequate to get the job done.
uma
on 2013-03-29 at 23:35:55 said:

esr:
Another possibility is chicken scheme and cython, with possibly a thin layer of C glue.
http://www.call-cc.org/

Bruce H.
on 2013-03-30 at 00:01:33 said:

My experience with performance tuning is that you get the greatest gains by starting with a really bad
algorithm. Fortunately, there are a lot of those lying around.

janzert
on 2013-03-30 at 03:05:40 said:

It would be interesting to see the performance of pypy on the post optimization version. The question
being, did the algorithmic optimization that was done help or hurt the relative performance of pypy?

esr
on 2013-03-30 at 06:30:12 said:

>It would be interesting to see the performance of pypy on the post optimization version. The
question being, did the algorithmic optimization that was done help or hurt the relative performance of
pypy?
It's easy enough to run that test that I'm doing it now. Timing stock Python on the 56K-commit
benchmark repo: 270 sec (208 commits/sec). Same with pypy: 178 sec (315 commits/sec). That's
interesting: actually a significant speedup this time. I wasn't seeing that when I wrote the OP, quite
the reverse in fact. Something Julien or I did in the last six days has made a significant difference;
might be worth running a bisection to find out what.

Russ Nelson
on 2013-03-31 at 02:08:21 said:

Warning: Jay and Jessica, if you fail to appreciate Python as the transcendent language of the gods,
you will be replaced by a small Python script after the Singularity!

esr
on 2013-03-31 at 02:35:32 said:

>Warning: Jay and Jessica, if you fail to appreciate Python as the transcendent language of the
gods, you will be replaced by a small Python script after the Singularity!
There is another theory which states this has already occurred.

Jay Maynard
on 2013-03-31 at 08:16:48 said:

Heh. Python is *my* weapon of choice for the problems it can handle.

Jacob Halln
on 2013-03-31 at 19:11:20 said:

Go see the people in the PyPy channel on Freenode about why your code is slow. Slowness is
considered to be a bug, unless your code is too short to overcome warmup.

Jeff Read
on 2013-04-01 at 17:54:59 said:

Warning: Jay and Jessica, if you fail to appreciate Python as the transcendent language of
the gods, you will be replaced by a small Python script after the Singularity!
Any singularity based on Python will itself meet a day of reckoning with the Gods of the Copybook
Headings, who insist that bugs caught at runtime are orders of magnitude more expensive to fix than
bugs caught at compile time.
Traceback (most recent call last):
File "/usr/bin/singularity.py", line 8643, in run_ai
File "/usr/lib/python2.7/dist-packages/ai/ai.py", line 137406, in get_neuron_state
File "/usr/lib/python2.7/dist-packages/ai/neuralnet.py", line 99205, in query_neuron
File "/usr/lib/python2.7/dist-packages/ai/neuron.py", line 20431, in query_synapse
TypeError: expected object of type SynapseConfiguration, got NoneType
Strong static typing systems are not put into languages just to make your lives miserable, folks.

Jakub Narebski
on 2013-04-01 at 21:34:34 said:

@Jeff Read: Strong typing does not necessarily mean static typing, ask ML (or Haskell, I'm not sure
which), with its implied types (and correctness checking that can discover errors in an algorithm by
type mismatch).

Patrick Maupin
on 2013-04-01 at 23:42:35 said:

@Jakub Narebski:
There are actually three different things that get conflated on typing:
strength
static/dynamic
explicit/implicit
Python has reasonably strong typing that is dynamic and implicit.
Some newer languages have strong typing that is static and implicit.
Typing on older languages is usually explicit. Even C#, which has implicit local variables, still
requires variable declarations for those. You tell the compiler "here's a variable, figure it out based
on its use."
Strong is usually good. Static is usually good. Implicit is usually good. It wasn't until recently that
you could have all three.
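A tiny illustration of those axes using Python itself, which sits at strong/dynamic/implicit:

x = 3             # no declaration needed: typing is implicit
x = "three"       # no compile-time check stops the rebinding: typing is dynamic
try:
    x + 4         # but no silent coercion either: typing is strong
except TypeError as e:
    print("strong typing caught it: %s" % e)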

Jeff Read
on 2013-04-04 at 18:49:36 said:

Strong typing does not necessarily mean static typing, ask ML (or Haskell, I'm not sure
which), with its implied types (and correctness checking that can discover errors in an
algorithm by type mismatch).
Never said it did. I chose the phrase "strong static typing" specifically to contrast with weak static
typing (e.g., C) and strong dynamic typing (e.g., Python, Lisp).
Also, both Haskell and ML support type inference.

Alexander Todorov
on 2013-04-07 at 17:53:02 said:

I'm well aware of the principle. Unfortunately, my experience is that Python profilers suck rather badly;
you generally end up having to write your own instrumentation to gather timings, which is what I did
in this case. It helped me find the obscured O(n**2) operations.
Did you use any profiling tools at all? I'm interested to hear if there are any ready-made tools that
can run against your Python code and suggest meaningful improvements.
