Attempt at a real world benchmark

Fri Dec 9 09:50:09 UTC 2016

I have wanted telemetry for years.  ("Telemetry" is the term Microsoft, and I think others, use for the phone-home feature.)

It would tell us how many people are using GHC; currently I have literally no idea.

It could tell us which language features are most used.

Perhaps it could tell us about performance, but I'm not sure how we could make use of that info without access to the actual source.

The big issues are (a) design and implementation effort, and (b) dealing with privacy concerns.  I think (b) used to be a big deal, but nowadays people mostly assume that their software is doing telemetry, so it feels more plausible.  But someone would need to work out whether it had to be opt-in or opt-out, and how to actually make it work in practice.

Simon

|  -----Original Message-----
|  From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
|  Bardur Arantsson
|  Sent: 09 December 2016 07:32
|  To: ghc-devs at haskell.org
|  Subject: Re: Attempt at a real world benchmark
|  
|  On 2016-12-08 17:04, Joachim Breitner wrote:
|  > Hi,
|  >
|  > Am Donnerstag, den 08.12.2016, 01:03 -0500 schrieb Joachim Breitner:
|  >> I am not sure how useful this is going to be:
|  >>  + Tests lots of common and important real-world libraries.
|  >>  − Takes a lot of time to compile, includes CPP macros and C code.
|  >> (More details in the README linked above).
|  >
|  > another problem with the approach of taking modern real-world code:
|  > It uses a lot of non-boot libraries that are quite compiler-close and
|  > do low-level stuff (e.g. using Template Haskell, or stuff like the).
|  > If we add that to nofib, we’d have to maintain its compatibility with
|  > GHC as we continue developing GHC, probably using lots of CPP. This
|  > was less of an issue with the Haskell98 code in nofib.
|  >
|  > But is there a way to test realistic modern code without running into
|  > this problem?
|  >
|  
|  This may be a totally crazy idea, but has any thought been given to a
|  "Phone Home"-type model?
|  
|  Very simplistic approach:
|  
|    a) Before it compiles, GHC computes a hash of the file.
|    b) GHC has internal profiling "markers" in its compilation pipeline.
|    c) GHC sends those "markers" + hash to some semi-centralized
|       highly-available service somewhere under *.haskell.org.
|  
|  The idea is that "hashes are equal" => "performance should be
|  comparable". Ideally, it'd probably be best to have the full source,
|  but that may be a tougher sell, obviously.
|  
|  (Obviously would have to be opt-in, either way.)
|  
|  There are a few obvious problems with this, but an obvious win would
|  be that it could be done on a massively *decentralized* scale. Most
|  problematic part might be that it wouldn't be able to track things
|  like "I changed $this_line and now it compiles twice as slow".
|  
|  Actually, now that I think about it: What if this were integrated
|  into the Cabal infrastructure? Any project on (e.g.) GitHub that
|  wanted to opt in could specify "upload-perf-numbers: True" in its
|  .cabal file and build using Travis, and voila!
|  
|  What do you think? Totally crazy, or could it be workable?
|  
|  Regards,
|  
|  _______________________________________________
|  ghc-devs mailing list
|  ghc-devs at haskell.org
|  http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
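
For concreteness, the three-step scheme in the quoted message (hash the file, collect per-phase "markers", send both) might look roughly like the sketch below. It is in Python only for brevity, and everything in it is an assumption rather than anything GHC does today: the phase names, the timings, the JSON layout, the `opt_in` flag and the function names `source_hash` and `build_report` are all hypothetical. Nothing is sent over the network; the sketch only builds the payload that a client would upload.

```python
import hashlib
import json


def source_hash(source: bytes) -> str:
    """Step (a): hash the file contents, so the source itself is never uploaded."""
    return hashlib.sha256(source).hexdigest()


def build_report(source: bytes, markers: dict, opt_in: bool = False):
    """Steps (b) and (c): pair the hash with per-phase timings.

    Returns the JSON payload a client would upload, or None when the
    user has not opted in (the quoted message assumes opt-in).
    """
    if not opt_in:
        return None
    report = {"hash": source_hash(source), "markers": markers}
    # sort_keys makes the payload deterministic, so identical inputs
    # produce byte-identical reports.
    return json.dumps(report, sort_keys=True)


if __name__ == "__main__":
    src = b"module Main where\nmain = putStrLn \"hello\"\n"
    # Hypothetical phase names and timings in milliseconds.
    timings = {"parse": 12.5, "typecheck": 40.1, "codegen": 33.0}
    print(build_report(src, timings, opt_in=True))
```

Keying reports on the hash is what would let a server compare timings for the same file across GHC versions. As the quoted message notes, any edit changes the hash, so a scheme like this cannot track the effect of small source changes.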