Attempt at a real world benchmark

Fri Dec 9 05:54:05 UTC 2016

> On Dec 9, 2016, at 1:00 PM, Joachim Breitner <mail at joachim-breitner.de> wrote:
> 
> Hi,
> 
> Am Freitag, den 09.12.2016, 09:50 +0800 schrieb Moritz Angermann:
>> Hi,
>> 
>> let me thank you for pursuing this!
>> 
>>>> I am not sure how useful this is going to be:
>>>>  + Tests lots of common and important real-world libraries.
>>>>  − Takes a lot of time to compile, includes CPP macros and C code.
>>>> (More details in the README linked above).
>>> 
>>> another problem with the approach of taking modern real-world code:
>>> It uses a lot of non-boot libraries that are quite compiler-close and
>>> do low-level stuff (e.g. using Template Haskell, or stuff like the). If
>>> we add that to nofib, we’d have to maintain its compatibility with GHC
>>> as we continue developing GHC, probably using lots of CPP. This was
>>> less an issue with the Haskell98 code in nofib.
>>> 
>>> But is there a way to test realistic modern code without running into
>>> this problem?
>> 
>> 
>> what are the reasons besides fragmentation for a modern real-world test
>> suite outside of ghc (maybe even maintained by a different set of people)?
> 
> I am not sure what you are saying. Are you proposing to maintain a
> benchmark set outside GHC, or did you get the impression that I am
> proposing it?

Yes, that’s what *I* am proposing, for the reasons I mentioned; one I
did not yet mention is time. Running nofib already takes time, and in my
experience adding more time-consuming performance tests would make it less
likely that they get run. As I see this as being almost completely
scriptable, this could live outside of ghc, I think.

> 
>> At some point you would also end up having a matrix of performance
>> measurements due to the evolution of the library and the evolution of ghc.
>> Fixing the library to profile against ghc will likely end at some point in
>> incompatibility with ghc. Fixing ghc will similarly at some point end with
>> the inability to compile the library.
> 
> My motivation right now is to provide something to measure GHC, so this
> would involve fixing the library. And that is what I am worried about:
> Too much maintenance effort in keeping this large piece of code
> compatible with GHC.

Well, we won’t know until we try :-)

> But maybe it is ok if it is part of nofib, and hence of GHC, so that every
> breaking change in GHC can immediately be accounted for in the
> benchmark code.
> 
> A nice side effect of this might be that GHC developers can get a
> better idea of how much code their change breaks.

I’m not much of a fan of this, but that’s just my opinion :-)

>> 
>> What measurements did you collect? Are these broken down per module?
> 
> Nothing yet, this is on the TODO list.
> 
>> Something I’ve recently had some success with was dumping measurements
>> into influxdb[1] (or a similar data point collection service) and hooking
>> that up to grafana[2] for visualization.
> 
> Nice! Although these seem to be tailored for data-over-time, not
> data-over-commit. This mismatch in the data model was part of the
> motivation for me to create gipeda, which powers
> https://perf.haskell.org/ghc/

Assuming we confine this to a particular branch, or discriminate by branch,
commits would be measured in sequence anyway. The timestamp could be the
time at which the measurement is reported, and the respective ghc commit hash
would end up being an annotation. While this is not very pretty (and I would
hope that grafana has some other way to enrich the hover tooltips), it could
present a flexible solution without requiring additional engineering effort.
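
To make the data model above concrete, here is a minimal sketch of how one
measurement could be rendered in InfluxDB's line protocol, with the reporting
time as the timestamp and the ghc commit hash attached as a tag. Everything
named here is an assumption for illustration only: the measurement name
`ghc_perf`, the tag names `benchmark`, `branch` and `commit`, the field
`seconds`, and the helper functions themselves. Storing the commit as a tag
is just one way to carry the "annotation"; it is not taken from any existing
setup.

```python
import time
from typing import Optional


def escape_tag(value: str) -> str:
    """Escape the characters that are special in line-protocol tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line(benchmark: str, branch: str, commit: str, seconds: float,
            reported_at_ns: Optional[int] = None) -> str:
    """Render one measurement as a single line-protocol record.

    The timestamp is the time of reporting (nanoseconds since the epoch),
    not the commit date; the commit hash travels along as a tag.
    """
    if reported_at_ns is None:
        reported_at_ns = time.time_ns()
    # Hypothetical tag names; sorted so the output is deterministic.
    tags = {"benchmark": benchmark, "branch": branch, "commit": commit}
    tag_str = ",".join(f"{key}={escape_tag(val)}"
                       for key, val in sorted(tags.items()))
    # "ghc_perf" and "seconds" are made-up names for this sketch.
    return f"ghc_perf,{tag_str} seconds={float(seconds)!r} {reported_at_ns}"


if __name__ == "__main__":
    print(to_line("nofib/spectral", "master", "abc1234", 1.5))
```

Discriminating by branch then amounts to filtering on the `branch` tag, and
the commit hash is available per data point for whatever the dashboard can
show on hover.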

However, if gipeda is sufficient, please ignore my comment :)

Cheers,
 moritz