[Haskell-cafe] Re: [Haskell] ANNOUNCE: The Fibon benchmark suite (v0.2.0)

Fri Nov 12 11:10:34 EST 2010

Hi Jason,

Sorry for the delayed response. Thanks for pointing out the darcs-benchmark
package. I had not seen it before, and there may be some room for sharing
infrastructure. Parsing the runtime stats is pretty easy, but comparing
different runs, computing statistics, and generating tables are tasks both
tools need, so they are good candidates for common code.
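
To make that concrete, here is a minimal sketch of the parse-and-compare
step. It is written in Python only for brevity and is not code from Fibon or
darcs-benchmark. It assumes the stats arrive as the list of ("name", "value")
pairs that GHC prints for `+RTS -t --machine-readable`, possibly preceded by
a line echoing the command. The function names and the sample numbers are
made up for illustration.

```python
import ast

def parse_rts_stats(text):
    """Parse `+RTS -t --machine-readable` output into a dict of floats."""
    # Assumption: the stats are a bracketed list of ("name", "value") string
    # pairs. Anything before the first '[' or after the last ']' is ignored
    # (for example, a leading line that echoes the command).
    start = text.index("[")
    end = text.rindex("]") + 1
    pairs = ast.literal_eval(text[start:end])
    return {name: float(value) for name, value in pairs}

def compare_runs(base, new, keys):
    """Return (metric, base, new, percent change) rows for the given keys."""
    rows = []
    for key in keys:
        b, n = base[key], new[key]
        change = (n - b) * 100.0 / b if b else float("nan")
        rows.append((key, b, n, change))
    return rows

def format_table(rows):
    """Render the comparison rows as a fixed-width plain-text table."""
    lines = [f"{'metric':<24}{'base':>16}{'new':>16}{'change':>10}"]
    for key, b, n, change in rows:
        lines.append(f"{key:<24}{b:>16.3f}{n:>16.3f}{change:>+9.1f}%")
    return "\n".join(lines)

# Hypothetical stats from two runs of the same benchmark (made-up values).
BASE = """
 [("bytes allocated", "1000000")
 ,("num_GCs", "10")
 ,("mutator_cpu_seconds", "2.000")
 ,("GC_cpu_seconds", "0.500")
 ]
"""
NEW = """
 [("bytes allocated", "800000")
 ,("num_GCs", "8")
 ,("mutator_cpu_seconds", "1.500")
 ,("GC_cpu_seconds", "0.400")
 ]
"""

if __name__ == "__main__":
    keys = ["bytes allocated", "mutator_cpu_seconds", "GC_cpu_seconds"]
    rows = compare_runs(parse_rts_stats(BASE), parse_rts_stats(NEW), keys)
    print(format_table(rows))
```

The parsing really is the small part. The comparison and table code is what
both tools would need in some form, and a real version would also have to
aggregate over repeated runs rather than compare single samples.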

On a related note, when I uploaded the fibon package, I put it in a new
"Benchmarking" category as opposed to the existing "Testing" category. In my
mind testing is more for correctness and benchmarking is for performance. I
think it would be useful to include other benchmarking packages
(darcs-benchmark, criterion) in that category.

--------------------------------------------------
From: "Jason Dagit" <dagit at codersbase.com>
Sent: Tuesday, November 09, 2010 7:58 PM
To: "David Peixotto" <dmp at rice.edu>
Cc: <haskell at haskell.org>; <haskell-cafe at haskell.org>
Subject: Re: [Haskell] ANNOUNCE: The Fibon benchmark suite (v0.2.0)

> On Tue, Nov 9, 2010 at 5:47 PM, David Peixotto <dmp at rice.edu> wrote:
>
>>
>> On Nov 9, 2010, at 3:45 PM, Jason Dagit wrote:
>>
>> I have a few questions:
>>   * What differentiates fibon from criterion?  I see both use the
>> statistics package.
>>
>>
>> I think the two packages have different benchmarking targets.
>>
>> Criterion allows you to easily test individual functions and gives some
>> help with benchmarking in the presence of lazy evaluation. If some code
>> runs for only a short time, it will run it multiple times to get
>> sensible timings. Criterion does a much more sophisticated statistical
>> analysis of the results, but I hope to incorporate that into the Fibon
>> analysis in the future.
>>
>> Fibon is a more traditional benchmarking suite like SPEC or nofib. My
>> interest is using it to test compiler optimizations. It can only
>> benchmark at the whole program level by running an executable. It checks
>> that the program produces the correct output, can collect extra metrics
>> generated by the program, separates collecting results from analyzing
>> results, and generates tables directly comparing the results from
>> different benchmark runs.
>>
>>   * Does it track memory statistics?  I glanced at the FAQ but didn't see
>> anything about it.
>>
>>
>> Yes, it can read memory statistics dumped by the GHC runtime. It has
>> built-in support for reading the stats dumped by
>> `+RTS -t --machine-readable`, which include things like bytes allocated
>> and time spent in GC.
>>
>
> Oh, I see.  In that case, it's more similar to darcs-benchmark, except
> that darcs-benchmark is tailored specifically to benchmarking darcs.
> Where they overlap is parsing the RTS statistics, running the whole
> program, and tabular reports.  Darcs-benchmark adds to that an embedded
> DSL for specifying operations to do on the repository between benchmarks
> (and translating those operations to runnable shell snippets).
>
> I wonder if Fibon and darcs-benchmark could share common infrastructure
> beyond the statistics package.  It sure sounds like it to me.  Perhaps
> some collaboration is in order.
>
>
>>   * Are the numbers in the sample output seconds or milliseconds?  What
>> is the stddev (e.g., what does the distribution of run-times look like)?
>>
>>
>> I'm not sure which results you are referring to exactly (the numbers in
>> the announcement were lines of code). I picked benchmarks that all ran
>> for at least a second (and hopefully longer) with compiler optimizations
>> enabled. On an 8-core Xeon, the median time over all benchmarks is 8.43
>> seconds, mean time is 12.57 seconds and standard deviation is 14.56
>> seconds.
>>
>
> I probably read your email too fast, sorry.  Thanks for the clarification.
>
> Thanks,
> Jason
>