Benchmarking harnesses for a more modern nofib?

Mon Apr 4 07:45:23 UTC 2016

Is anyone currently working on, or interested in helping with, a new benchmark suite for Haskell?  Perhaps, packaging up existing apps and app benchmarks into a new benchmark suite that gives a broad picture of Haskell application performance today?
I would love to see this done.  nofib is showing its age.

An incentive is this: we benchmark GHC against nofib pretty regularly, and pay attention to regressions.   If your program is in the benchmark suite, it’s more likely that its performance will be good and stay good.

The tension is that, to be usable, it must be possible to actually run the benchmark suite, on a variety of platforms, without consuming too much time.

·         nofib has zero package dependencies.  Adding some dependencies is fine, but adding zillions is not.  Often they can be cut down, because some of the dependencies relate to incidental features of the benchmark that can be stubbed out.

·         More seriously, for figures to be comparable we have to compare the same code.  So any package dependencies must be hard dependencies on particular versions. And as GHC moves on, those packages may require (hopefully minor) updates to stay working.

·         Test data and test environment can be a challenge, especially for things like web servers.  Again we don’t want to force the developer to install too much other stuff.
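
To make the version-pinning point concrete, here is a rough sketch of what hard dependencies could look like in a cabal.project file.  This is only an illustration, not an agreed layout for any suite: the directory name is hypothetical, and the package names and version numbers are placeholders rather than a tested set.

```
-- Hypothetical layout: one directory per benchmark program.
packages: benchmarks/*/

-- Pin every dependency to one exact version so that two runs
-- compare the same library code.  Versions below are placeholders.
constraints: warp ==3.2.2,
             bytestring ==0.10.6.0
```

As GHC moves on, these pins would have to be bumped by hand, and figures from before and after a bump would not be directly comparable.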

All that said, it must be possible to do MUCH better than we are doing right now, with a 20-year-old suite!  Please do join Ryan in working on this.

Simon

From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Ryan Newton
Sent: 04 April 2016 06:06
To: ghc-devs at haskell.org; Haskell Cafe <haskell-cafe at haskell.org>
Subject: Benchmarking harnesses for a more modern nofib?

Hi all,

Is anyone currently working on, or interested in helping with, a new benchmark suite for Haskell?  Perhaps, packaging up existing apps and app benchmarks into a new benchmark suite that gives a broad picture of Haskell application performance today?

Background: We run nofib, and we run the shootout benchmarks.  But when we want to evaluate basic changes to GHC optimizations or data representation, these really don't give us a full picture of whether a change is beneficial.

A few years ago, fibon<https://hackage.haskell.org/package/fibon> tried to gather some Hackage benchmarks.  This may work even better with Stackage, where there are 180 benchmark suites among the 1770 packages currently.

Also, these days companies are building substantial apps in Haskell.  Which substantial apps could or should go into a benchmark suite?  I see Warp and other web server benchmarks<http://www.infoq.com/news/2015/04/web-frameworks-benchmark-2015> all over the web.  But is there a harness that can time some of this code while running inside a single-machine, easy-setup benchmark suite?
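
To illustrate the kind of harness I have in mind, here is a minimal sketch that depends only on base (it assumes a base recent enough to provide GHC.Clock).  The names timeAction, workload and benchmarks are made up for this example, and workload is a stand-in for a real application entry point.  It takes a single wall-clock sample per benchmark, so it is nowhere near a statistically careful tool.

```haskell
module Main (main) where

import Control.Exception (evaluate)
import Control.Monad (forM_)
import GHC.Clock (getMonotonicTime)
import Text.Printf (printf)

-- | Run an IO action once and return its wall-clock duration in seconds.
timeAction :: IO a -> IO Double
timeAction act = do
  start <- getMonotonicTime
  _ <- act
  end <- getMonotonicTime
  pure (end - start)

-- | Hypothetical stand-in for a real application workload.
workload :: Int -> Int
workload n = sum [(x * x) `mod` 7 | x <- [1 .. n]]

-- | Named benchmarks.  'evaluate' forces the Int result, so the work
-- happens inside the timed region rather than lazily afterwards.
benchmarks :: [(String, IO ())]
benchmarks =
  [ ("sum-squares-1e5", () <$ evaluate (workload 100000))
  , ("sum-squares-1e6", () <$ evaluate (workload 1000000))
  ]

main :: IO ()
main = forM_ benchmarks $ \(name, act) -> do
  secs <- timeAction act
  printf "%-20s %.4f s\n" name secs
```

A real suite would want repeated runs, allocation counts, and some way to start and stop things like a web server under test, which is the part I don't know a good existing answer for.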

Best,
  -Ryan
