[Haskell-cafe] Re: proposal: HaBench, a Haskell Benchmark Suite

Fri Jun 25 08:39:41 EDT 2010

On 25/06/2010 00:24, Andy Georges wrote:

> I've picked up the HaBench/nofib/nobench issue again, as I need a
> decent set of real applications to explore what people these days
> call split compilation. We have a framework that was able to explore
> GCC optimisations for a multi-objective search space [1] (the
> downside there was that these optimisations depend on each other and
> must be done in a certain order), and we extended it to explore a JIT
> compiler (for Java, in our case) [2], which posed its own problems.
> Going one step further, we'd like to explore the tradeoffs that can be
> made when compiling at different levels: source to bytecode (in some
> sense) and bytecode to native code. Given that LLVM is quickly
> becoming a state-of-the-art framework, and given the recent GHC
> support for it, we figured that Haskell would be an excellent vehicle
> for our exploration and research (the fact that some people at our lab
> have a soft spot for Haskell helps too). Which brings me back to
> benchmarks.
>
> Are there any inputs available that allow the 'real' part of the suite
> to run for a sufficiently long time? We're going to use criterion in
> any case, given our own expertise with rigorous benchmarking [3,4],
> but since we've argued in the past against short-running apps on
> managed runtime systems [5], we'd love to have programs that run for
> at least on the order of seconds while doing useful work. All
> pointers are much appreciated.

The short answer is no, although some of the benchmarks have tunable 
input sizes (mainly the spectral ones) and you can 'make mode=slow' to 
run those with larger inputs.
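As a rough illustration of what a tunable input size can look like, here is a minimal Haskell sketch in which the problem size comes from the first command-line argument and falls back to a default. The n-queens workload, the `parseSize` helper and the default size of 8 are hypothetical choices for this example; this is not necessarily how nofib wires up 'mode=slow'.

```haskell
import System.Environment (getArgs)

-- | Count the ways to place n non-attacking queens on an n x n board.
-- (Hypothetical workload: its running time grows quickly with n.)
queens :: Int -> Int
queens n = length (go n)
  where
    -- go k: all safe placements for the first k columns,
    -- most recently placed queen first.
    go :: Int -> [[Int]]
    go 0 = [[]]
    go k = [q : qs | qs <- go (k - 1), q <- [1 .. n], safe q qs]

    -- A new queen in row q is safe if it shares neither a row nor a
    -- diagonal with the queen placed d columns earlier.
    safe :: Int -> [Int] -> Bool
    safe q qs = and [q /= c && abs (q - c) /= d | (d, c) <- zip [1 ..] qs]

-- | Hypothetical default size, used when no valid argument is given.
defaultSize :: Int
defaultSize = 8

-- | Read the size from the first argument, falling back to the default.
parseSize :: [String] -> Int
parseSize (s : _) | [(n, "")] <- reads s = n
parseSize _ = defaultSize

main :: IO ()
main = do
  args <- getArgs
  let n = parseSize args
  putStrLn ("queens " ++ show n ++ " = " ++ show (queens n))
```

Running the compiled program as `queens 12` instead of plain `queens` gives a much longer run from the same source, which is the property a 'slow' mode relies on.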

More generally, the nofib suite really needs an overhaul or replacement.
Unfortunately it's a tiresome job and nobody really wants to do it.
There have been various abortive efforts, including nobench and HaBench.
Meanwhile we in the GHC camp continue to use nofib, mainly because we
have some tool infrastructure set up to digest the results
(nofib-analyse). Unfortunately nofib has steadily degraded in
usefulness over time due to both faster processors and improvements in
GHC, such that most of the programs now run for less than 0.1s and are
ignored by the tools when calculating averages over the suite.
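The filtering step described above might look something like this minimal sketch: benchmarks whose baseline run is shorter than 0.1s are dropped before the new/old time ratios are averaged. The `(name, oldSecs, newSecs)` layout and the use of a geometric mean are assumptions made for illustration; nofib-analyse's actual rules may differ in detail.

```haskell
-- | Runs shorter than this (in seconds) are ignored, following the
-- 0.1s cut-off mentioned above.
minRunTime :: Double
minRunTime = 0.1

-- | Geometric mean of positive numbers; Nothing for an empty list.
geometricMean :: [Double] -> Maybe Double
geometricMean [] = Nothing
geometricMean xs = Just (exp (sum (map log xs) / fromIntegral (length xs)))

-- | Average the new/old run-time ratios over benchmarks whose baseline
-- time is at least minRunTime. Each entry is (name, oldSecs, newSecs);
-- this layout is hypothetical.
summarise :: [(String, Double, Double)] -> Maybe Double
summarise results =
  geometricMean [new / old | (_, old, new) <- results, old >= minRunTime]

main :: IO ()
main = do
  -- Made-up timings: "b" is too short to count towards the average.
  let results = [("a", 2.0, 1.0), ("b", 0.05, 0.5), ("c", 1.0, 2.0)]
  print (summarise results)
```

With these made-up timings only "a" and "c" contribute, and their ratios of 0.5 and 2 cancel out to a mean of about 1. The more programs fall under the cut-off, the fewer programs the average actually rests on.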

We have a need not just for plain Haskell benchmarks, but benchmarks 
that test

  - GHC extensions, so we can catch regressions
  - parallelism (see nofib/parallel)
  - concurrency (see nofib/smp)
  - the garbage collector (see nofib/gc)
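To give a flavour of the concurrency category, here is a minimal sketch of a thread-ring microbenchmark that uses only `Control.Concurrent` from base. A token is passed around a ring of lightweight threads through `MVar`s, and both the ring size and the number of hops are tunable. The program and its defaults are a hypothetical example written for illustration, not a program taken from nofib/smp.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)
import System.Environment (getArgs)

-- | Pass a token around a ring of nThreads threads until it has been
-- handed on 'hops' times; returns the (1-based) id of the thread that
-- ends up holding it.
threadRing :: Int -> Int -> IO Int
threadRing nThreads hops
  | nThreads < 1 = ioError (userError "threadRing: need at least one thread")
  | otherwise = do
      inboxes@(first : _) <- replicateM nThreads newEmptyMVar
      done <- newEmptyMVar
      -- Thread i reads from inbox i and writes to inbox i+1 (wrapping).
      let nexts = drop 1 inboxes ++ [first]
      forM_ (zip3 [1 ..] inboxes nexts) $ \(i, inbox, next) ->
        forkIO (worker i inbox next done)
      putMVar first hops
      takeMVar done
  where
    worker :: Int -> MVar Int -> MVar Int -> MVar Int -> IO ()
    worker i inbox next done = loop
      where
        loop = do
          t <- takeMVar inbox
          if t == 0
            then putMVar done i
            else putMVar next (t - 1) >> loop

main :: IO ()
main = do
  args <- getArgs
  -- Hypothetical defaults; read fails on non-numeric arguments.
  let (nThreads, hops) = case map read args of
        [t, h] -> (t, h)
        _ -> (503, 1000000)
  winner <- threadRing nThreads hops
  print winner
```

The threads left blocked once the token stops are discarded quietly, because `forkIO` ignores `BlockedIndefinitelyOnMVar` in forked threads. Compiling with `-threaded` and running with `+RTS -N` would let the same program exercise the multi-core scheduler too.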

I tend to favour quantity over quality: it's very common for just one
benchmark in the whole suite to show a regression or to exercise a
particular corner of the compiler or runtime. We should only keep
benchmarks that have a tunable input size, however.

Criterion works best on programs that run for short periods of time, 
because it runs the benchmark at least 100 times, whereas for exercising 
the GC we really need programs that run for several seconds.  I'm not 
sure how best to resolve this conflict.
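One possible compromise for GC-heavy programs is to give up on many short samples and instead time a handful of long runs directly. The sketch below uses a hypothetical allocation-heavy workload (many short-lived binary trees, with depth and iteration count as tunable inputs) and times one run with `getCPUTime` from base. The workload and its defaults are assumptions for illustration, not an existing nofib/gc program.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)
import System.Environment (getArgs)
import Text.Printf (printf)

-- | A binary tree. The Int label makes each tree depend on its
-- iteration number, so the tree should not be floated out and shared
-- between iterations.
data Tree = Leaf | Node !Int Tree Tree

-- | Build a complete tree of the given depth, labelled with i.
make :: Int -> Int -> Tree
make i d
  | d <= 0 = Node i Leaf Leaf
  | otherwise = Node i (make i (d - 1)) (make i (d - 1))

-- | Count the nodes, forcing the whole tree to be built.
size :: Tree -> Int
size Leaf = 0
size (Node _ l r) = 1 + size l + size r

-- | Build and discard 'iterations' trees of the given depth,
-- returning the total number of nodes built.
workload :: Int -> Int -> Int
workload depth iterations =
  foldl' (+) 0 [size (make i depth) | i <- [1 .. iterations]]

main :: IO ()
main = do
  args <- getArgs
  -- Hypothetical defaults: raise them until one run takes several
  -- seconds. read fails on non-numeric arguments.
  let (depth, iterations) = case map read args of
        [d, n] -> (d, n)
        _ -> (18, 100)
  start <- getCPUTime
  total <- evaluate (workload depth iterations)
  end <- getCPUTime
  -- getCPUTime reports picoseconds.
  let secs = fromIntegral (end - start) / 1e12 :: Double
  printf "nodes: %d, CPU time: %.2fs\n" total secs
```

Running the compiled program with `+RTS -s` also makes the GHC runtime print its own allocation and GC statistics. Repeating such a run a few times and reporting the spread keeps some statistical footing without needing a hundred iterations.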

Meanwhile, I've been collecting pointers to interesting programs that 
cross my radar, in anticipation of waking up with an unexpectedly free 
week in which to pull together a benchmark suite... clearly 
overoptimistic!  But I'll happily pass these pointers on to anyone with 
the inclination to do it.

Cheers,
	Simon

> Or if any of you out there have (recent) apps with inputs that are
> open source ... let us know.
>
> -- Andy
>
>
> [1] COLE: Compiler Optimization Level Exploration, Kenneth Hoste and
>     Lieven Eeckhout, CGO 2008
> [2] Automated Just-In-Time Compiler Tuning, Kenneth Hoste, Andy
>     Georges and Lieven Eeckhout, CGO 2010
> [3] Statistically Rigorous Java Performance Evaluation, Andy Georges,
>     Dries Buytaert and Lieven Eeckhout, OOPSLA 2007
> [4] Java Performance Evaluation through Rigorous Replay Compilation,
>     Andy Georges, Lieven Eeckhout and Dries Buytaert, OOPSLA 2008
> [5] How Java Programs Interact with Virtual Machines at the
>     Microarchitectural Level, Lieven Eeckhout, Andy Georges and Koen
>     De Bosschere, OOPSLA 2003