[Haskell-cafe] What I learned from my first serious attempt
low-level Haskell programming
Stefan O'Rear
stefanor at cox.net
Wed Apr 4 19:11:31 EDT 2007
As a learning exercise, I rewrote and re-optimized
Data.Binary.Builder yesterday.
1. Intuition is NOT your friend. Most obvious pessimizations I made
were actually wins, and vice versa.
2. Parameters are very expensive. The type of our builder functions
(ignoring CPS for the time being) was MBA# -> Int# -> [ByteString],
where MBA# is MutableByteArray# and the Int# is the current write
pointer. Adding an extra Int# to cache the size of the array (rather
than calling sMBA#, i.e. sizeofMutableByteArray#, each time) slowed
the code down ~2x. Conversely, moving the write pointer into the
byte array itself (storing it in bytes 0#, 1#, 2#, and 3#) sped the
code up 4x.
3. MBA# is just as fast as Addr#, and garbage collected to boot.
4. You can't keep track of which version of the code is which, what is
a regression, and what is an enhancement. Don't even try. Next
time I try something like this I will make as much use of darcs as
possible.
5. State# threads clog the optimizer quite effectively. Replacing
st(n-1)# with realWorld# everywhere I could count on data
dependencies to do the same job doubled performance.
6. The inliner is a bit too greedy. Removing the slow-path code from
singleton doesn't help because popSingleton is only used once; but
if I explicitly {-# NOINLINE popSingleton #-}, the code for
singleton itself becomes much smaller, and inlinable (15% perf
gain). Plus the new singleton doesn't allocate memory, so I can
use even MORE realWorld#s.
And probably a few more I forgot about because of #4.
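To make #2 concrete, here is a minimal sketch of keeping the write pointer inside the buffer rather than passing it as a parameter. It is not the code from the hacked builder: the names Buf, newBuf, putByte, bytesWritten and peekByte are made up for illustration, it uses ST instead of threading State# by hand, it reserves a full 8-byte header (word 0 holds the offset) instead of bytes 0# to 3#, and it does no bounds checking.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts
import GHC.ST (ST (..), runST)

-- Hypothetical layout: machine word 0 of the array holds the current
-- write offset in bytes; the payload starts at byte 8.
data Buf s = Buf (MutableByteArray# s)

-- Allocate a buffer with room for the header plus `cap` payload bytes
-- and initialise the stored write offset to the first payload byte.
newBuf :: Int -> ST s (Buf s)
newBuf (I# cap) = ST $ \s0 ->
  case newByteArray# (cap +# 8#) s0 of
    (# s1, mba #) -> case writeIntArray# mba 0# 8# s1 of
      s2 -> (# s2, Buf mba #)

-- Append one byte. The only argument besides the byte is the array:
-- the offset is read from and written back to the header.
-- No bounds check: the caller must not exceed the capacity.
putByte :: Buf s -> Char -> ST s ()
putByte (Buf mba) (C# c) = ST $ \s0 ->
  case readIntArray# mba 0# s0 of
    (# s1, off #) -> case writeCharArray# mba off c s1 of
      s2 -> case writeIntArray# mba 0# (off +# 1#) s2 of
        s3 -> (# s3, () #)

-- Number of payload bytes written so far.
bytesWritten :: Buf s -> ST s Int
bytesWritten (Buf mba) = ST $ \s0 ->
  case readIntArray# mba 0# s0 of
    (# s1, off #) -> (# s1, I# (off -# 8#) #)

-- Read back payload byte i (again unchecked).
peekByte :: Buf s -> Int -> ST s Char
peekByte (Buf mba) (I# i) = ST $ \s0 ->
  case readCharArray# mba (8# +# i) s0 of
    (# s1, c #) -> (# s1, C# c #)

demo :: (Int, Char)
demo = runST $ do
  buf <- newBuf 16
  mapM_ (putByte buf) "abc"
  n <- bytesWritten buf
  c <- peekByte buf 1
  pure (n, c)

main :: IO ()
main = print demo  -- prints (3,'b')
```

Whether this layout beats an extra Int# argument will depend on the compiler version and the surrounding code, so it is something to measure rather than assume (see #1).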
The code is online at http://members.cox.net/stefanor/hackedbuilder if anyone cares (but see #4).
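For readers who have not seen the trick in #6, the following is a small sketch of the general pattern: keep the fast path tiny and push the rarely-taken slow path into a function marked NOINLINE, so the caller stays small enough to inline. The Buffer type and the names pushByte and flushAndPush are invented for this example (they stand in for singleton and popSingleton only loosely), and an IORef counter replaces the real byte array.

```haskell
module Main (main) where

import Control.Monad (replicateM_)
import Data.IORef

-- Hypothetical buffer: just counters, no actual bytes.
data Buffer = Buffer
  { used     :: IORef Int  -- bytes currently in the buffer
  , capacity :: Int        -- buffer size
  , flushes  :: IORef Int  -- how often the slow path ran
  }

-- Fast path: a compare and a store. Small enough to inline at call sites.
pushByte :: Buffer -> IO ()
pushByte b = do
  n <- readIORef (used b)
  if n < capacity b
    then writeIORef (used b) (n + 1)
    else flushAndPush b
{-# INLINE pushByte #-}

-- Slow path: kept out of line so it does not bloat pushByte,
-- even though it has only one call site.
flushAndPush :: Buffer -> IO ()
flushAndPush b = do
  modifyIORef' (flushes b) (+ 1)  -- pretend to emit the full chunk
  writeIORef (used b) 1           -- new chunk now holds the pushed byte
{-# NOINLINE flushAndPush #-}

-- Push `count` bytes into a buffer of size `cap`;
-- return (number of flushes, bytes left in the buffer).
runDemo :: Int -> Int -> IO (Int, Int)
runDemo cap count = do
  b <- Buffer <$> newIORef 0 <*> pure cap <*> newIORef 0
  replicateM_ count (pushByte b)
  (,) <$> readIORef (flushes b) <*> readIORef (used b)

main :: IO ()
main = runDemo 4 10 >>= print  -- prints (2,2)
```

Comparing the Core (for instance with -ddump-simpl) with and without the NOINLINE pragma is a reasonable way to check whether the pragma has the intended effect for a given GHC version.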
Some parting numbers: (Builder7 is my current version, Builder1 is the
unmodified rossp/kolmodin builder)
stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder7 Bench.hs ; time ./Bench 2 10000000
330000000
real 0m5.580s
user 0m5.540s
sys 0m0.032s
stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder7 -DUNROLL Bench.hs ; time ./Bench 2 10000000
330000000
real 0m2.948s
user 0m2.908s
sys 0m0.036s
stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder1 Bench.hs ; time ./Bench 2 10000000
330000000
real 0m55.708s
user 0m54.695s
sys 0m0.208s
stefan@stefans:~/hackedbuilder$ ghc -v0 --make -O2 -fforce-recomp -DBUILDER=Builder1 -DUNROLL Bench.hs ; time ./Bench 2 10000000
330000000
real 0m25.888s
user 0m25.546s
sys 0m0.156s
stefan@stefans:~/hackedbuilder$ gcc -O2 -march=pentium4 CBuilder.c -o CBuilder ; time ./CBuilder 10000000
real 0m0.861s
user 0m0.860s
sys 0m0.000s
Stefan