[Haskell-cafe] long-range performance comparisons (GHC over the years)

Wed Jan 6 14:03:26 UTC 2016

Dear Cafe,

I recently noticed a performance problem
due to a fusion rule not firing
for a straightforward piece of code.

In fact it turned out this was already fixed in HEAD,
see https://ghc.haskell.org/trac/ghc/ticket/11344
https://ghc.haskell.org/trac/ghc/ticket/9848
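
(To illustrate the kind of code where one expects fusion: the
following is a hypothetical sketch, not the code from the tickets
above. The name sumDoubled is made up for illustration. With
optimisation on, one would hope that the intermediate lists are
never built, but whether the rules actually fire depends on the
GHC version and flags.)

```haskell
module Main (main) where

-- Hypothetical example, NOT the code from the tickets above.
-- [1 .. n] is a list producer, map both consumes and produces,
-- and foldr is a list consumer, so this pipeline is a candidate
-- for foldr/build fusion.
sumDoubled :: Int -> Int
sumDoubled n = foldr (+) 0 (map (* 2) [1 .. n])

main :: IO ()
main = print (sumDoubled 1000)
```

Compiling with -O and -ddump-rule-firings lists the rewrite rules
that fired, which is one way to check for this kind of problem.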

What worries me is that such a regression
had been sitting there for over a year
(and the fix did not make it into 7.10.3), and I got to thinking:

     What long-range performance metering do we have?

So I tried to make nofib runs for some ghc-{6,7} versions:
http://www.imn.htwk-leipzig.de/~waldmann/etc/nofib/comparison-k.text

This is slightly broken (not all tests can be built
for all compilers, and I don't know how to fix this)
but there are some interesting numbers already.

It seems nofib programs are self-contained (not using
any libraries) so they are mainly using numbers, lists, tuples,
and user-defined data. This is the heart of (traditional) Haskell,
so this is supposed to work really well.

The table shows that there are a lot of benchmarks
where performance has been increasing. That's good.

But not for all! We should certainly ignore all runtimes
that are small in absolute terms. I think it is most interesting
to look at allocation numbers. A few examples from this list:

* exp3_8  (allocation goes up 50 % from 6.* to 7.*)
  this is addition of Peano numbers.

* gcd (allocation goes up 20 % from 7.8 to 7.10)
  using Integers, tuples (for extended Euclid),
  lists (for control)

* tak (runtime goes up 20 % from 7.6 to 7.8)
  the plain Takeuchi function, just Int and recursion
  (it should not allocate at all?)

(and I confirmed these by manually running them for more inputs,
all measurements done on debian on x86_64  X5365,
ghc-6.* installed from binary packages, ghc-7.* built from source)
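
For readers who do not have nofib at hand, here is a minimal sketch
of two of the functions mentioned above, in their textbook form.
This is an assumption about the general shape of such programs,
not a copy of the nofib sources. The names Nat, add, fromInt and
toInt are chosen for illustration only.

```haskell
module Main (main) where

-- Takeuchi function on plain Int: only comparison, subtraction
-- and recursion, no data structures.
tak :: Int -> Int -> Int -> Int
tak x y z
  | y < x     = tak (tak (x - 1) y z) (tak (y - 1) z x) (tak (z - 1) x y)
  | otherwise = z

-- Peano naturals (illustrative definition).
data Nat = Z | S Nat

-- Addition by recursion on the first argument.
add :: Nat -> Nat -> Nat
add Z     n = n
add (S m) n = S (add m n)

-- Conversion helpers; fromInt maps non-positive numbers to Z.
fromInt :: Int -> Nat
fromInt k
  | k <= 0    = Z
  | otherwise = S (fromInt (k - 1))

toInt :: Nat -> Int
toInt Z     = 0
toInt (S n) = 1 + toInt n

main :: IO ()
main = do
  print (tak 18 12 6)
  print (toInt (add (fromInt 3) (fromInt 4)))
```

If the program is linked with -rtsopts, running it with +RTS -s
prints the total bytes allocated, which is the number to compare
across compiler versions.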

So, can this be explained? Improved?

I think we should resist the temptation to change
these benchmarks (using seq and ! and Int# and whatnot).
Assuming nofib contains typical code,
it is the task of the compiler to handle it well.

In case you're wondering about my motivation - this was
prompted by teaching, I wanted to show that ghc creates
efficient code (by fusion) - but it's not just for the show,
I generally try to believe in what I teach and I do rely
on this for my real code. (Well, by definition, "real" for me
might still be "academic" for others...)

- Johannes.