[Haskell-cafe] Re: Haskell performance

Thu Dec 20 06:07:36 EST 2007

On Thu, 2007-12-20 at 10:37 +0000, Simon Peyton-Jones wrote:
> Don, and others,
> 
> This thread triggered something I've had at the back of my mind for some time.
> 
> The traffic on Haskell Cafe suggests that there is a lot of interest
> in the performance of Haskell programs.  However, at the moment we
> don't have any good *performance* regression tests for GHC. We have
> zillions of behavioural regression tests (this program should compile,
> this one should fail), but nothing much on performance. We have the
> nofib suite, but it's pretty static these days.  Peter's set of
> benchmarks are great (if very specific to strings etc, but that's
> fine), and it'd be a pity if they now sink beneath the waves.

They won't!  I have set up a Mercurial repository at
http://vax64.dyndns.org/repo/hg/ together with the GHC install scripts
I've used.

Once the basic string performance is under control, I intend to expand
the suite with more advanced parsing, I/O, and backend benchmarks.

I like Parsec.  But it seems to hang on to a bit more memory than it
should and I think it should be faster than it is.

Fast I/O is not simple, and to do it really well one probably needs to
use threading and mmap() in combination.  mmap() alone is usually not
very fast unless the file is already in the operating system's cache.

And the backend.  Ouch.  The frontend is absolutely fantastic and does
heroic stuff -- but the backend... apart from having many phases, it
doesn't do much ;)

> What would be v helpful would be a regression suite aimed at
> performance, that benchmarked GHC (and perhaps other Haskell
> compilers) against a set of programs, regularly, and published the
> results on a web page, highlighting regressions.  Kind of like the
> Shootout, only just for Haskell, and with many more programs.

I don't see why a lot of that couldn't be added to the framework I have.
It's GPLv2 :)
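To make "highlighting regressions" a bit more concrete, here is a minimal Haskell sketch of the comparison step.  It assumes every run yields (benchmark, metric) values where lower is better (run time, allocation, peak memory, code size, compile time) and flags anything that grew by more than a relative threshold.  The type and function names, the benchmark names and the numbers in main are all hypothetical, made up purely for illustration; none of this is taken from the repository above.

```haskell
-- Sketch of regression highlighting: compare each (benchmark, metric)
-- against a baseline and flag values that got worse by more than a
-- relative threshold. Assumes "lower is better" for every metric.
-- All names and numbers here are hypothetical.
module Main (main) where

import Text.Printf (printf)

-- | ((benchmark name, metric name), measured value)
type Results = [((String, String), Double)]

data Regression = Regression
  { benchmark :: String
  , metric    :: String
  , oldValue  :: Double
  , newValue  :: Double
  } deriving (Show, Eq)

-- | Relative change from old to new; 0.10 means 10% worse.
relChange :: Double -> Double -> Double
relChange old new = (new - old) / old

-- | Every (benchmark, metric) present in both runs whose value grew by
-- more than 'threshold' (a fraction, e.g. 0.05 for 5%).
findRegressions :: Double -> Results -> Results -> [Regression]
findRegressions threshold baseline current =
  [ Regression b m old new
  | ((b, m), new) <- current
  , Just old <- [lookup (b, m) baseline]
  , old > 0                              -- avoid dividing by zero
  , relChange old new > threshold
  ]

main :: IO ()
main = mapM_ report (findRegressions 0.05 baseline current)
  where
    -- Made-up numbers, purely to exercise the comparison.
    baseline = [ (("wc", "runtime_s"), 1.20), (("wc", "peak_kb"), 5000)
               , (("parsec-json", "runtime_s"), 3.00) ]
    current  = [ (("wc", "runtime_s"), 1.22), (("wc", "peak_kb"), 6500)
               , (("parsec-json", "runtime_s"), 2.70) ]

    report :: Regression -> IO ()
    report r = printf "REGRESSION %s/%s: %.2f -> %.2f (+%.1f%%)\n"
                 (benchmark r) (metric r) (oldValue r) (newValue r)
                 (100 * relChange (oldValue r) (newValue r))
```

A per-metric threshold would be a natural refinement, since compile time and code size are much less noisy than run time.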

> Like Hackage, it should be easy to add a new program.  It'd be good to
> measure run-time, but allocation count, peak memory use, code size,

My framework captures the allocation count but doesn't use it for
anything.  It gets its peak memory figures from /proc/self/status
(which it captures, together with /proc/self/maps, through an LD_PRELOAD
trick).  GHC's '+RTS -sstderr' output seemed a bit unreliable in my
experience, so I fell back to asking the operating system.
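For anyone who wants to try the "ask the operating system" route without the LD_PRELOAD machinery, the following is a minimal Haskell sketch that reads the peak figures directly from /proc/self/status.  It assumes Linux, where VmPeak (peak virtual size) and VmHWM (peak resident set size) are reported in kB; the helper name statusFieldKb is hypothetical, and this is not the code from the repository.

```haskell
-- Sketch: read peak-memory figures from /proc/self/status (Linux only).
-- VmPeak is the peak virtual size, VmHWM the peak resident set size,
-- both reported in kB. Not the LD_PRELOAD code from the repository.
module Main (main) where

import Data.List (isPrefixOf)
import Data.Maybe (listToMaybe)
import Text.Read (readMaybe)

-- | Find a line of the form "Field:   <number> kB" in the text of a
-- /proc/<pid>/status file and return the number, if present.
statusFieldKb :: String -> String -> Maybe Integer
statusFieldKb field contents =
  listToMaybe
    [ n
    | line <- lines contents
    , (field ++ ":") `isPrefixOf` line
    , (numStr : "kB" : _) <- [words (drop (length field + 1) line)]
    , Just n <- [readMaybe numStr]
    ]

main :: IO ()
main = do
  -- In a real harness this would run just before exit, so the peaks
  -- cover the whole run.
  status <- readFile "/proc/self/status"
  let showKb = maybe "n/a" (\kb -> show kb ++ " kB")
  putStrLn ("VmPeak: " ++ showKb (statusFieldKb "VmPeak" status))
  putStrLn ("VmHWM:  " ++ showKb (statusFieldKb "VmHWM" status))
```

VmHWM is usually the more useful of the two for comparing programs, since the peak virtual size can be far larger than the memory actually touched.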

Getting stable timings, plus a good estimate of how reliable the
measurements are, is also important (my code already does this).
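One simple way to get "stable times plus a quality estimate" is to keep sampling until the relative standard error of the mean is small enough.  The Haskell sketch below does that; it is not how the repository does it, the names and thresholds are hypothetical, and it assumes GHC 8.4 or later for GHC.Clock.getMonotonicTime.  The workload takes the run index so that GHC cannot share one evaluated result across runs.

```haskell
-- Sketch: repeat a timed measurement until the relative standard error
-- of the mean drops below a target, or a run limit is reached.
-- Names and thresholds are hypothetical.
module Main (main) where

import Control.Exception (evaluate)
import GHC.Clock (getMonotonicTime)

-- | Mean, sample standard deviation, and relative standard error of the
-- mean (standard error divided by the mean). Needs at least two samples.
summarize :: [Double] -> (Double, Double, Double)
summarize xs = (mean, sd, rse)
  where
    n    = fromIntegral (length xs)
    mean = sum xs / n
    sd   = sqrt (sum [(x - mean) ^ (2 :: Int) | x <- xs] / (n - 1))
    rse  = sd / sqrt n / mean

-- | Wall-clock time, in seconds, of one run of an IO action.
timeOnce :: IO a -> IO Double
timeOnce act = do
  t0 <- getMonotonicTime
  _  <- act
  t1 <- getMonotonicTime
  pure (t1 - t0)

-- | Sample 'work' (given the run index) until the relative standard
-- error is below 'target' after at least 'minRuns' runs (use >= 2),
-- or until 'maxRuns' runs have been made.
measureUntilStable :: Int -> Int -> Double -> (Int -> IO a) -> IO [Double]
measureUntilStable minRuns maxRuns target work = go []
  where
    go samples
      | k >= maxRuns = pure samples
      | k >= minRuns
      , let (_, _, rse) = summarize samples
      , rse < target = pure samples
      | otherwise = do
          t <- timeOnce (work k)
          go (t : samples)
      where
        k = length samples

main :: IO ()
main = do
  -- Placeholder workload; 'evaluate' forces the sum inside the timed region.
  samples <- measureUntilStable 5 200 0.01
               (\i -> evaluate (sum [1 .. 100000 + i]))
  let (mean, sd, rse) = summarize samples
  putStrLn ("runs:           " ++ show (length samples))
  putStrLn ("mean (s):       " ++ show mean)
  putStrLn ("std dev (s):    " ++ show sd)
  putStrLn ("rel. std error: " ++ show rse)
```

The stopping rule is just one simple choice; a fuller harness might also discard warm-up runs or report confidence intervals.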

>  compilation time are also good (and rather more stable) numbers to
> capture.
> 
> Does anyone feel like doing this?  It'd be a great service.  No need
> to know anything much about GHC.

I think I've made a start but this is clearly not something I'm willing
to take on by myself.

-Peter