[Haskell-cafe] [RFC] benchmarks of bytestrings, teaser

Sat Dec 15 17:44:27 EST 2007

On Sat, 2007-12-15 at 11:59 -0800, Don Stewart wrote:
> firefly:
> > What do you think the relative speeds are of the six small haskell
> > programs at the end of this email?
> > 
> > All they do is read from stdin and count the number of spaces they see.
> > There are two that use strict bytestrings, two that use lazy
> > bytestrings, and two that use the standard Haskell strings.  Three use a
> > recursive function with an accumulator parameter and three use a foldl
> > with a lambda function.
> > 
> > Say the fastest one takes the time 1.  How much time will the others
> > take?
> > 
> > And how about memory?  How much memory do you think they require?  Let's
> > say we feed a 150MB(*) file into each of them, how many megabytes do you
> > think they end up using (as seen from the OS, not in terms of how big
> > the live heap is)?
> > 
> > I'm going to post full benchmarks + analysis on Wednesday.
> 
> How are you compiling these programs, by the way?  ghc-6.8.2 -O2  ?
> (-O2 is required for bytestrings :)

With -O2.  I have measured with 6.8.1, 6.9.20071119, 6.9.20071208
(approx), and 6.9.200712xx (as of yesterday or today).  The picture
changes very little with the compiler, if it changes at all.

I have run them on three very different microarchitectures (2GHz
Athlon64 3000+, 1667MHz Core Duo, 600MHz Pentium III).

All the measurements are scripted.  'make phase1' compiles the
benchmarks, creates input files of various sizes, and runs each
benchmark once to gather information about memory use (peak RSS + ghc
RTS' own information about allocations and gcs), page faults, and an
strace.  This phase is not timing sensitive so I can browse the web and
listen to the music etc. while running it.

'make phase2' runs each benchmark a number of times, calculates the
average time for each + the relative size of the standard deviation +
how much user+sys is different from real (as reported by bash' built-in
time command).  A report with barcharts indicating relative time and
relative peak RSS is generated in either pure ASCII or in UTF-8 (with
fractional-width block chars so the charts look nice and have high
resolution).  If the measurements are deemed to be bad (too high
standard deviation or user+sys doesn't add up to real) then the barchart
is done with '%' characters.  The "quality indicators" for each timing
test are always printed out next to each bar, so we know how close we
are to being perfect or, conversely, how bad the measurements are.

There is a script that analyzes the I/O pattern and sums it up (as "4375
x read(3, 4096, ...) = 4096  followed by 1 x read(3, 4096, ...) = 1242
followed by 1 x read(3, 4096, ...) = 0" an similar).

There are a set of simple I/O programs in C so we can compare ghc's
performance with "speed of light", and so different I/O strategies can
be compared in a cleaner, purer form.

There are also 'make cache' (runs cachegrind), 'make
hs/xxxx.doc' (creates a file with source code + core + stg + c-- +
assembly + times for a given benchmark), etc.

'make sysinfo' creates a file with information about the Linux
distribution used (/etc/lsb-release), kernel version (uname -a), CPU
(/proc/cpuinfo), and compilers used (ghc --version, gcc --version).

'make zipdata' creates a zip file of about 20K with all the raw time
measurements + the sysinfo.

I also have a set of scripts that installs each ghc version so 1) it is
easy for me to repeat the tests on various machines and 2) you can see
exactly which ghc versions I use and repeat the experiments yourself.

-Peter