nofib comparisons between 7.0.4, 7.4.2, 7.6.1, and 7.6.2

Wed Feb 6 17:04:22 CET 2013

On Wed, Feb 6, 2013 at 2:09 AM, Simon Marlow <marlowsd at gmail.com> wrote:

> This is slightly off topic, but I wanted to plant this thought in people's
> brains: we shouldn't place much significance in the average of a bunch of
> benchmarks (even the geometric mean), because it assumes that the
> benchmarks have a sensible distribution, and we have no reason to expect
> that to be the case.  For example, in the results above, we wouldn't expect
> a 14.7% reduction in runtime to be seen in a typical program.
>
> Using the median might be slightly more useful, which here would be
> something around 0% for runtime, though still technically dodgy.  When I
> get around to it I'll modify nofib-analyse to report medians instead of GMs.
>

Using the geometric mean as a way to summarize the results isn't that bad.
See "How not to lie with statistics: the correct way to summarize benchmark
results" (http://ece.uprm.edu/~nayda/Courses/Icom6115F06/Papers/paper4.pdf).
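A minimal sketch of the two summaries, in Python with made-up runtimes. None of this is nofib-analyse code; the numbers and the `geometric_mean` helper are hypothetical. It shows the property the paper relies on: the geometric mean of ratios gives a consistent answer whichever compiler you pick as the baseline, because the geometric mean of B/A is exactly the reciprocal of the geometric mean of A/B.

```python
import math
from statistics import median


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed in log space."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical runtimes (seconds) for five benchmarks under two compilers.
old = [1.0, 2.0, 4.0, 0.5, 3.0]
new = [1.0, 1.0, 4.0, 0.5, 3.3]

new_over_old = [n / o for n, o in zip(new, old)]
old_over_new = [o / n for n, o in zip(new, old)]

gm = geometric_mean(new_over_old)

# Swapping the baseline only inverts the geometric mean, so the
# summary does not depend on which compiler is called "old".
assert math.isclose(gm * geometric_mean(old_over_new), 1.0)

print(f"geomean new/old = {gm:.3f}")
print(f"median  new/old = {median(new_over_old):.3f}")
```

With these numbers the geometric mean comes out around 0.887, an apparent speedup of about 11% driven entirely by one benchmark, while the median stays at 1.000, which is the effect Simon describes above.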

That being said, I think the most useful thing to do is to look at the big
losers, as they're often regressions. Making some class of programs much
worse while improving the geometric mean overall is often worse than
changing nothing at all.

-- Johan