[Haskell-cafe] the case of the 100-fold program speedup
sethg at ropine.com
Tue Nov 14 16:06:31 EST 2006
One of Alan Perlis's "Epigrams in Programming" is "A language that
doesn't affect the way you think about programming, is not worth
knowing". I recently had an experience that demonstrated this principle.
I had to write some code that took a polygon (encoded in WKT, a standard
format for geographic information exchange), computed its intersection
with several rectangles, and then compute the centroid of each one of
these intersections. No problem, I thought. People who know more than
me about this geometry stuff have written an open-source library, OGR,
which has functions for parsing WKT, computing intersections, computing
centroids, and computing all sorts of other things. And there are
Python bindings for OGR so I don't have to wrestle with C++ to solve my
problem. So I wrote a Python program that used OGR to crunch the
numbers, ran it through the data set we needed to process, and...it took
over sixteen hours to run. Since we want to have this program process
this data set with every build of our software, adding sixteen hours to
our build time was not a pleasant thought.
Then I thought about some of the things I'd read about optimization of
functional programming languages, and said, "Hey! The algorithm for
clipping the polygon to the rectangle is producing a sequence of
vertices, and the algorithm for computing the centroid is consuming a
sequence of vertices. I can do some deforestation here!"
It took me a week to figure out the right algorithm for combining these
two procedures and write some almost-working code that implemented it.
It took a co-worker of mine another few days to find the bugs that had
eluded me. But after the program worked for my test cases, I ran it
again on the full data set...and it took ten minutes.
In other words, I sped up the code by two orders of magnitude. And I
didn't even have to drop into C++ to do the number-crunching; I used OGR
to parse the WKT strings and to look up the coordinates of each vertex
of the polygon, but the algorithm itself was implemented in Python.
More information about the Haskell-Cafe