Re: RTS changes affect runtime when they shouldn’t

David Feuer david at well-typed.com
Sun Sep 24 18:11:03 UTC 2017


I think changes to the RTS, code generator, and general heap layout are exactly where we *do* want to worry about these very low-level details. Changes to type checking, desugaring, Core-to-Core transformations, etc. probably are not, because it's just too hard to tease out the relationship between what they do and what instructions are emitted in the end.


David Feuer
Well-Typed, LLP
-------- Original message --------
From: Sven Panne <svenpanne at gmail.com>
Date: 9/24/17 2:00 PM (GMT-05:00)
To: Joachim Breitner <mail at joachim-breitner.de>
Cc: ghc-devs at haskell.org
Subject: Re: RTS changes affect runtime when they shouldn’t
2017-09-23 21:06 GMT+02:00 Joachim Breitner <mail at joachim-breitner.de>:

> what I want to do is to reliably catch regressions.


The main question is: which kind of regressions do you want to catch? Do
you care about the runtime as experienced by the user? Then measure the
runtime. Do you care about code size? Then measure the code size. And so
on. Measuring something like the number of fetched instructions as an
indicator of the experienced runtime is a basically useless exercise,
unless you are doing this on an ancient RISC processor where each
instruction takes a fixed number of cycles.
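
To make that concrete: a minimal sketch of timing what you actually care
about, using only base (the fib workload and its argument are placeholders
of mine, not anything from this thread):

import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)  -- CPU time in picoseconds, base only

-- Placeholder workload; substitute whatever you actually care about.
fib :: Int -> Integer
fib n = go n 0 1
  where
    go 0 a _ = a
    go k a b = go (k - 1) b $! a + b

main :: IO ()
main = do
  start <- getCPUTime
  _ <- evaluate (fib 200000)  -- force the result inside the timed region
  end <- getCPUTime
  let ms = fromIntegral (end - start) / 1.0e9 :: Double  -- ps -> ms
  putStrLn ("CPU time: " ++ show ms ++ " ms")

In practice a harness like criterion also takes care of clock resolution,
iteration and statistics for you.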


> What are the odds that a change to the Haskell compiler (in particular to
> Core2Core transformations) will cause a significant increase in runtime
> without a significant increase in instruction count?
> (Honest question, not rhetoric.)
>

The odds are actually quite high, especially when you define "significant"
as "changing a few percent" (which we do!). Just a few examples from
current CPUs:

   * If the branch predictor does not have enough information to do
better, it assumes that backward branches are taken (think: loops) and
forward branches are not taken (so "exceptional" code should be moved out
of the common, straight-line path). If an innocent-looking change alters
the code layout, you can easily get a very measurable difference in
runtime even if the number of executed instructions stays exactly the
same (see the sketch after this list).

   * Even if the number of instructions changes only a tiny bit, that can
be just enough to make caching much worse and/or to make the loop stream
detector fail to detect a loop.
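
To illustrate the branch-prediction point, here is a minimal sketch in
Haskell (all names and constants are mine, not from this thread): both
traversals do essentially the same work on the same values, but the
sorted input presents a perfectly predictable branch while the
pseudo-shuffled one does not. Whether and how strongly the effect shows
up depends on the GHC version, the optimization flags and the CPU:

import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

n :: Int
n = 10000000

-- Same values overall; only the order (and hence the branch pattern the
-- CPU sees in the fold below) differs between the two inputs.
sortedInput, shuffledInput :: [Int]
sortedInput   = [0 .. n - 1]
shuffledInput = map (\i -> (i * 7919) `mod` n) [0 .. n - 1]
  -- a permutation of [0 .. n-1], since gcd 7919 n == 1

-- The conditional is the branch whose predictability we vary.
countAboveHalf :: [Int] -> Int
countAboveHalf = foldl' (\acc x -> if x > n `div` 2 then acc + 1 else acc) 0

timeIt :: String -> [Int] -> IO ()
timeIt label xs = do
  start <- getCPUTime
  c <- evaluate (countAboveHalf xs)
  end <- getCPUTime
  putStrLn (label ++ ": " ++ show ((end - start) `div` 1000000)  -- ps -> us
            ++ " us, count = " ++ show c)

main :: IO ()
main = do
  timeIt "sorted  " sortedInput
  timeIt "shuffled" shuffledInput

Running the two variants separately under "perf stat" would show whether
the branch-miss counters actually differ.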

There are lots of other scenarios, so in a nutshell: Measure what you
really care about, not something you think might be related to that.

As already mentioned in another reply, "perf" can give you very detailed
hints about how well your program uses the pipeline, caches, branch
prediction etc. Perhaps the performance dashboard should really collect
these, too; that would remove a lot of guesswork.
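
On the collecting side, a hedged sketch: perf's hardware counters have to
be gathered from outside the process (e.g. by running the benchmark under
"perf stat"), but since GHC 8.2 the program itself can report allocation
and GC figures via GHC.Stats that a dashboard could log alongside them.
This assumes the program is run with +RTS -T:

import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled  -- True only when run with +RTS -T
  if not enabled
    then putStrLn "Run me with +RTS -T to enable RTS statistics."
    else do
      s <- getRTSStats
      putStrLn ("allocated bytes:  " ++ show (allocated_bytes s))
      putStrLn ("GC count:         " ++ show (gcs s))
      putStrLn ("mutator CPU (ns): " ++ show (mutator_cpu_ns s))
      putStrLn ("elapsed (ns):     " ++ show (elapsed_ns s))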

