Re: RTS changes affect runtime when they shouldn’t

Sat Sep 23 18:45:36 UTC 2017

2017-09-21 0:34 GMT+02:00 Sebastian Graf <sgraf1337 at gmail.com>:

> [...] The only real drawback I see is that instruction count might skew
> results, because AFAIK it doesn't properly take the architecture (pipeline,
> latencies, etc.) into account. It might be just OK for the average program,
> though.
>

It really depends on what you're trying to measure: the raw instruction
count is basically useless if you want a number that has any connection
to the real time taken by the program. The number of cycles per CPU
instruction varies by about 2 orders of magnitude across instruction
types on modern architectures, see e.g. the Skylake section in
http://www.agner.org/optimize/instruction_tables.pdf (IMHO a must-read for
anyone doing serious optimizations/measurements on the assembly level). And
these numbers don't even include the effects of caches, pipeline stalls,
branch prediction, execution units/ports, etc., which can easily add
another 1 or 2 orders of magnitude.
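
To make the point concrete, here is a toy sketch in Python. The per-instruction
costs are made-up illustrative values, not numbers from the instruction tables
linked above, and the model deliberately ignores caches, pipelining and
everything else mentioned. It only shows that two instruction mixes with the
same instruction count can have very different cycle estimates.

```python
# Hypothetical per-instruction costs in cycles. These values are
# illustrative assumptions, not taken from any real instruction table.
COST_CYCLES = {"add": 1, "div": 40}


def instruction_count(mix):
    """Total number of instructions executed, regardless of kind."""
    return sum(mix.values())


def estimated_cycles(mix):
    """Naive cycle estimate: each instruction costs its fixed latency."""
    return sum(COST_CYCLES[op] * n for op, n in mix.items())


# Two hypothetical programs that execute the same number of instructions.
mix_a = {"add": 1000, "div": 0}
mix_b = {"add": 900, "div": 100}

for name, mix in (("a", mix_a), ("b", mix_b)):
    print(name, instruction_count(mix), estimated_cycles(mix))
```

Both mixes report 1000 instructions, but under these assumed costs the second
one is estimated at 4900 cycles versus 1000 for the first.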

So what can one do? It basically boils down to a choice:

   * Use a stable number like the instruction count (the "Instructions
Read" (Ir) events), which has no real connection to the speed of a program.

   * Use a relatively volatile number like real time and/or cycles used,
which is what your users will care about. If you put a non-trivial amount
of work into your compiler, you can make these numbers a bit more stable
(e.g. by making the code layout/alignment more stable), but you will still
get quite different numbers if you switch to another CPU
generation/manufacturer.
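
For the second option, one common way to tame the volatility a little is to
repeat the measurement and report summary statistics rather than a single
run. The following is a minimal Python sketch of that idea. The names
`time_runs`, `summarize` and `workload` are hypothetical helpers for
illustration, not part of any GHC or nofib tooling, and the results remain
specific to the machine they were measured on.

```python
import statistics
import time


def time_runs(fn, runs=10):
    """Call fn `runs` times and return the wall-clock durations in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce a list of durations to min, median and max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def workload():
    """Stand-in for the program under test (an assumption for this sketch)."""
    return sum(i * i for i in range(100_000))


samples = time_runs(workload, runs=10)
print(summarize(samples))
```

The spread between min and max gives a rough idea of how noisy the
measurement is on a given machine.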

A bit tragic, but that's life in 2017... :-}