[Haskell-cafe] Re: proposal: HaBench, a Haskell Benchmark Suite

Simon Marlow marlowsd at gmail.com
Fri Jun 25 10:45:58 EDT 2010


On 25/06/2010 14:01, Andy Georges wrote:

>
> Right. I have the distinct feeling this is a major gap in the
> Haskell world. SPEC evolved over time to include larger benchmarks
> that still exercise the various parts of the hardware, so that the
> benchmarks do not suddenly achieve a large improvement on a new
> architecture/implementation simply because, e.g., a larger cache lets
> the working sets stay in cache for the entire execution. The
> Haskell community has nothing that remotely resembles a decent suite.
> You could do experiments and show that over 10K iterations the
> average execution time per iteration goes from 500ms to 450ms, but
> what does this really mean?
>
>> We have a need not just for plain Haskell benchmarks, but
>> benchmarks that test
>>
>> - GHC extensions, so we can catch regressions
>> - parallelism (see nofib/parallel)
>> - concurrency (see nofib/smp)
>> - the garbage collector (see nofib/gc)
>>
>> I tend to like quantity over quality: it's very common to get just
>> one benchmark in the whole suite that shows a regression or
>> exercises a particular corner of the compiler or runtime.  We
>> should only keep benchmarks that have a tunable input size,
>> however.
>
> I would suggest that the first category might be made up of
> microbenchmarks, as I do not think it is really needed for
> performance per se. However, the other categories really need
> long-running benchmarks that (preferably) use heaps of RAM, even
> when they're well tuned.

The categories you mention aren't necessarily distinct: we have several 
microbenchmarks that run for a long time and use a lot of heap.  For 
testing the GC, as with other parts of the system, we need both 
microbenchmarks and larger programs.  Different people want different 
things from a benchmark suite: if you're demonstrating the efficacy of 
an optimisation or a language implementation, then you want just the 
"real" benchmarks, whereas if you're a compiler developer you probably 
want the microbenchmarks too, because investigating their performance 
tends to be more tractable, and the hope is that if you optimise all the 
microbenchmarks then the real programs will take care of themselves (it 
doesn't always work like that, but it's a good way to start).

So I still very much like the approach taken by the venerable nofib 
suite, which includes not only the "real" programs but also the 
microbenchmarks and the small programs; you don't have to use these in 
published results, but they're invaluable to us compiler developers, and 
having a shared framework for all the benchmarks makes things a lot easier.
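
To be concrete about "tunable input size": all I mean is that the 
amount of work should scale with a parameter we control from the 
outside, roughly along these lines (a made-up sketch, not an actual 
nofib program; queens is just a stand-in for whatever the benchmark 
really computes):

    -- Hypothetical sketch: any program whose work grows with a
    -- command-line parameter would do equally well.
    module Main (main) where

    import System.Environment (getArgs)

    -- Count the solutions to the n-queens problem; the running time
    -- grows quickly with n, so n is the tunable knob.
    queens :: Int -> Int
    queens n = length (go n)
      where
        go 0 = [[]]
        go k = [ q:qs | qs <- go (k-1), q <- [1..n], safe q qs ]
        safe q qs = and [ q /= c && abs (q - c) /= d
                        | (d, c) <- zip [1..] qs ]

    main :: IO ()
    main = do
      [sz] <- getArgs              -- e.g. ./queens 12
      print (queens (read sz))

That way the same program can serve as a quick smoke test with a small 
input and as a long-running benchmark with a large one.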

If we made it *really easy* for people to submit their own programs 
(e.g. using 'darcs send') then we might get a lot of contributions, from 
which we could cherry-pick for the "real" benchmark suite, while 
keeping most/all of the submissions for the "full" suite.  Similarly, we 
should make it really easy for people to run the benchmark suite on 
their own machines and compilers - make the tools cabal-installable, 
with easy ways to generate results.
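
For running and reporting, one possible shape (purely as an 
illustration; I'm not proposing a particular harness here) would be a 
small cabal package per benchmark whose executable uses something like 
the criterion library to produce the numbers:

    -- Sketch only: assumes the criterion package is available; fib is
    -- a placeholder for whatever the submitted benchmark computes.
    import Criterion.Main (defaultMain, bench, whnf)

    fib :: Int -> Integer
    fib 0 = 0
    fib 1 = 1
    fib n = fib (n-1) + fib (n-2)

    main :: IO ()
    main = defaultMain
      [ bench "fib/25" (whnf fib 25)
      , bench "fib/30" (whnf fib 30)
      ]

Submitting a benchmark would then just mean sending a small package 
containing an executable like that, and running the suite would be a 
cabal install followed by invoking the executables.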

> I'm definitely interested. If I want to make a strong case for my
> current research, I really need benchmarks that can be used.
> Additionally, coming up with a good suite and characterising it can
> easily result in a decent paper, one that is certain to be cited
> numerous times. I think it would have to be a group/community effort
> though. I've looked through the apps on the Haskell wiki pages, but
> there's not much usable there, imho. I'd like to illustrate this with
> the example of the DaCapo benchmark suite [2,3]. It took a while, but
> now everybody in the Java camp is (or should be) using these
> benchmarks. Saying that we simply do not want to do this is not a
> tenable position.

Oh, don't get me wrong - we absolutely do want to do this, it's just 
difficult to get motivated to actually do it.  It's great that you're 
interested, I'll help any way that I can, and I'll start by digging up 
some suggestions for benchmarks.

Cheers,
	Simon


>
>
> -- Andy
>
>
> [1] Computer systems are dynamical systems, Todd Mytkowicz, Amer
> Diwan, and Elizabeth Bradley, Chaos 19, 033124 (2009);
> doi:10.1063/1.3187791 (14 pages).
> [2] The DaCapo benchmarks: Java benchmarking development and
> analysis, Stephen Blackburn et al., OOPSLA 2006.
> [3] Wake up and smell the coffee: evaluation methodology for the
> 21st century, Stephen Blackburn et al., CACM 2008.
>


