Attempt at a real world benchmark

Moritz Angermann moritz at
Fri Dec 9 01:50:34 UTC 2016


let me thank you for perusing this!

>> I am not sure how useful this is going to be:
>>  + Tests lots of common and important real-world libraries.
>>  − Takes a lot of time to compile, includes CPP macros and C code.
>> (More details in the README linked above).
> another problem with the approach of taking modern real-world code:
> It uses a lot of non-boot libraries that are quite compiler-close and
> do low-level stuff (e.g. using Template Haskell, or stuff like that). If
> we add that to nofib, we’d have to maintain its compatibility with GHC
> as we continue developing GHC, probably using lots of CPP. This was
> less an issue with the Haskell98 code in nofib.
> But is there a way to test realistic modern code without running into
> this problem?

What are the reasons, besides fragmentation, for keeping a modern
real-world test suite outside of GHC (maybe even maintained by a
different set of people)?

At some point you would also end up with a matrix of performance
measurements, due to the evolution of both the library and GHC. Pinning
the library version to profile against GHC will likely end, at some
point, in incompatibility with GHC; pinning the GHC version will
similarly end, at some point, in the inability to compile the library.

However, if both are always updated, how could one discriminate
performance regressions of the library from regressions due to changes
in GHC?


What measurements did you collect? Are they broken down per module?
Something I’ve recently had some success with is dumping measurements
into influxdb[1] (or a similar data-point collection service) and
hooking that up to grafana[2] for visualization.
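To make the idea concrete, here is a minimal sketch of how per-module
compile measurements could be rendered in InfluxDB's line protocol
before being POSTed to the database (the measurement name, tags, and
field names below are made up for illustration, not from any actual
benchmark harness):

```python
# Sketch: render one per-module GHC compile measurement as an InfluxDB
# line-protocol data point. Escaping of spaces/commas in tag values is
# omitted for brevity; module names like "Data.Text" need no escaping.

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render a point as: measurement,tag=v,... field=v,... timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

# Hypothetical example: compile time and allocation for one module.
point = to_line_protocol(
    "ghc_compile",                                   # measurement name (invented)
    {"module": "Data.Text", "ghc": "8.0.2"},         # tags: module + compiler
    {"time_s": 1.42, "alloc_bytes": 123456789},      # fields: the measurements
    1481248234000000000,                             # nanosecond timestamp
)
print(point)
# ghc_compile,ghc=8.0.2,module=Data.Text alloc_bytes=123456789,time_s=1.42 1481248234000000000
```

A batch of such lines can then be written to InfluxDB's HTTP write
endpoint, and Grafana pointed at the database for per-module trend
graphs over time.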



More information about the ghc-devs mailing list