What I learned from my first serious attempt at low-level Haskell programming

Stefan O'Rear stefanor at cox.net
Wed Apr 4 19:11:31 EDT 2007


As a learning exercise, I rewrote and re-optimized
Data.Binary.Builder yesterday.

1. Intuition is NOT your friend.  Most obvious pessimizations I made
   were actually wins, and vice versa.

2. Parameters are very expensive.  The type of our building functions
   (ignoring CPS for the time being) was MBA# -> Int# -> [ByteString],
   where MBA# abbreviates MutableByteArray# and the Int# is the
   current write pointer.  Adding an extra Int# to cache the size of
   the array (rather than calling sizeofMutableByteArray# each time)
   slowed the code down ~2x.  Conversely, moving the write pointer
   into the byte array (storing it in bytes 0#, 1#, 2#, and 3#) sped
   the code up by 4x.

3. MBA# is just as fast as Addr#, and garbage collected to boot.

4. You can't keep track of which version of the code is which, what is
   a regression, and what is an enhancement.  Don't even try.  Next
   time I try something like this I will make as much use of darcs as
   possible.

5. State# threads clog the optimizer quite effectively.  Replacing
   st(n-1)# with realWorld# everywhere I could count on data
   dependencies to do the same job doubled performance.

6. The inliner is a bit too greedy.  Moving the slow-path code out of
   singleton into popSingleton doesn't help by itself, because
   popSingleton is only used once and gets inlined straight back; but
   if I explicitly {-# NOINLINE popSingleton #-}, the code for
   singleton itself becomes much smaller, and inlinable (15% perf
   gain).  Plus the new singleton doesn't allocate memory, so I can
   use even MORE realWorld#s.

And probably a few more I forgot about because of #4.
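
As a rough illustration of #2, here is a minimal sketch of keeping the
write offset inside the buffer itself instead of threading it through
as an extra Int# argument.  This is not the code from hackedbuilder:
the name putChar8, the 64-byte buffer, and the layout (offset stored in
the first machine word with writeIntArray#, payload starting at byte 8)
are assumptions made for the example, whereas the original kept the
pointer in bytes 0# to 3#.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts
import GHC.ST (ST (..), runST)

-- Assumed layout: machine word 0 holds the write offset (in bytes);
-- payload bytes start at byte 8, which leaves room for an Int on
-- both 32-bit and 64-bit targets.

-- Append one 8-bit character.  Apart from the state token, the only
-- parameter is the buffer; the offset is loaded from and stored back
-- into the buffer.
putChar8 :: MutableByteArray# s -> Char# -> State# s -> State# s
putChar8 mba c s0 =
  case readIntArray# mba 0# s0 of
    (# s1, off #) ->
      case writeCharArray# mba off c s1 of
        s2 -> writeIntArray# mba 0# (off +# 1#) s2

-- Write "hi", then read back the two bytes and the final offset.
demo :: String
demo = runST (ST (\s0 ->
  case newByteArray# 64# s0 of
    (# s1, mba #) ->
      case writeIntArray# mba 0# 8# s1 of          -- initial offset = 8
        s2 -> case putChar8 mba 'h'# s2 of
          s3 -> case putChar8 mba 'i'# s3 of
            s4 -> case readIntArray# mba 0# s4 of
              (# s5, end #) -> case readCharArray# mba 8# s5 of
                (# s6, a #) -> case readCharArray# mba 9# s6 of
                  (# s7, b #) ->
                    (# s7, [C# a, C# b] ++ show (I# end) #)))

main :: IO ()
main = putStrLn demo   -- prints "hi10"
```

Whether this layout beats an explicit Int# parameter depends on the GHC
version and the surrounding code, so measure rather than assume (see #1).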

The code is online at http://members.cox.net/stefanor/hackedbuilder if anyone cares (but see #4).
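
For #6, the shape of the fast-path/slow-path split can be sketched on a
pure model of a chunked buffer.  The Buffer type and the names push,
pushSlow and toChunks are hypothetical stand-ins for singleton and
popSingleton, not the real Builder code; the only point is where the
NOINLINE pragma goes.

```haskell
import Data.List (foldl')
import Data.Word (Word8)

-- Hypothetical pure model of a chunked buffer (not the real Builder).
data Buffer = Buffer
  { bufCap   :: !Int        -- capacity of each chunk
  , bufUsed  :: !Int        -- bytes in the current chunk
  , bufBytes :: [Word8]     -- current chunk, newest byte first
  , bufDone  :: [[Word8]]   -- finished chunks, newest chunk first
  }

emptyBuffer :: Int -> Buffer
emptyBuffer cap = Buffer cap 0 [] []

-- Fast path: one comparison and one cons, small enough to inline.
push :: Word8 -> Buffer -> Buffer
push w buf
  | bufUsed buf < bufCap buf =
      buf { bufUsed = bufUsed buf + 1, bufBytes = w : bufBytes buf }
  | otherwise = pushSlow w buf

-- Slow path: retire the full chunk and start a new one.  It has a
-- single call site, so without the pragma GHC may inline it back
-- into push.
pushSlow :: Word8 -> Buffer -> Buffer
pushSlow w buf =
  buf { bufUsed = 1, bufBytes = [w], bufDone = bufBytes buf : bufDone buf }
{-# NOINLINE pushSlow #-}

-- Recover the chunks in write order, dropping empty ones.
toChunks :: Buffer -> [[Word8]]
toChunks buf =
  reverse (map reverse (filter (not . null) (bufBytes buf : bufDone buf)))

main :: IO ()
main = print (toChunks (foldl' (flip push) (emptyBuffer 2) [1 .. 5]))
-- prints [[1,2],[3,4],[5]]
```

The pragma changes nothing about the result, only what the optimizer
sees at each call site of push.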

Some parting numbers (Builder7 is my current version, Builder1 is the
unmodified rossp/kolmodin builder):

stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder7 Bench.hs ; time ./Bench 2 10000000
330000000

real    0m5.580s
user    0m5.540s
sys     0m0.032s
stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder7 -DUNROLL Bench.hs ; time ./Bench 2 10000000
330000000

real    0m2.948s
user    0m2.908s
sys     0m0.036s
stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder1 Bench.hs ; time ./Bench 2 10000000
330000000

real    0m55.708s
user    0m54.695s
sys     0m0.208s
stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder1 -DUNROLL Bench.hs ; time ./Bench 2 10000000
330000000

real    0m25.888s
user    0m25.546s
sys     0m0.156s
stefan@stefans:~/hackedbuilder$ gcc -O2 -march=pentium4 CBuilder.c -o CBuilder ; time ./CBuilder 10000000

real    0m0.861s
user    0m0.860s
sys     0m0.000s

Stefan

