Re: RTS changes affect runtime when they shouldn’t

Sat Sep 23 19:08:23 UTC 2017

On 2017-09-23 20:45, Sven Panne wrote:
> 2017-09-21 0:34 GMT+02:00 Sebastian Graf <sgraf1337 at gmail.com>:
> 
>     [...] The only real drawback I see is that instruction count might
>     skew results, because AFAIK it doesn't properly take the
>     architecture (pipeline, latencies, etc.) into account. It might be
>     just OK for the average program, though.
> 
> 
> It really depends on what you're trying to measure: The raw instruction
> count is basically useless if you want to have a number which has any
> connection to the real time taken by the program. The average number of
> cycles per CPU instruction varies by 2 orders of magnitude on modern
> architectures, see e.g. the Skylake section
> in http://www.agner.org/optimize/instruction_tables.pdf (IMHO a
> must-read for anyone doing serious optimizations/measurements on the
> assembly level). And these numbers don't even include the effects of the
> caches, pipeline stalls, branch prediction, execution units/ports, etc.
> etc. which can easily add another 1 or 2 orders of magnitude.
> 
> So what can one do? It basically boils down to a choice:
> 
>    * Use a stable number like the instruction count (the "Instructions
> Read" (Ir) events), which has no real connection to the speed of a program.
> 
>    * Use a relatively volatile number like real time and/or cycles used,
> which is what your users will care about. If you put a non-trivial
> amount of work into your compiler, you can make these numbers a bit more
> stable (e.g. by making the code layout/alignment more stable), but you
> will still get quite different numbers if you switch to another CPU
> generation/manufacturer.
> 
> A bit tragic, but that's life in 2017... :-}
> 
> 

I may be missing something, since I have only skimmed the thread, but:
why not track all of these things and correlate them with individual
runs? The Linux 'perf' tool can retrieve a *lot* of interesting numbers,
especially cache hit rates, branch prediction hit rates, etc.
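
To make the idea a bit more concrete, here is a rough Python sketch of
what I have in mind. It is only an illustration under some assumptions:
it relies on the CSV output of 'perf stat -x,' (counter value in the
first field, event name in the third), the event list is just an
example, and the helper names (parse_perf_csv, measure, correlate_with)
are made up for this mail and are not part of any existing benchmark
tooling.

```python
import csv
import io
import math
import subprocess


def parse_perf_csv(text):
    """Parse `perf stat -x,` output into {event_name: counter_value}."""
    counters = {}
    for row in csv.reader(io.StringIO(text)):
        # Assumed layout: value, unit, event name, ... (later fields ignored).
        if len(row) < 3 or row[0].startswith("#"):
            continue
        try:
            counters[row[2]] = float(row[0])
        except ValueError:
            # Rows such as "<not counted>" carry no usable number.
            continue
    return counters


def measure(cmd, events=("instructions", "cycles", "cache-misses", "branch-misses")):
    """Run cmd once under perf stat and return its counters (needs Linux perf).

    perf stat reports on stderr by default, so the benchmark's own stderr
    ends up in the same stream; lines that do not parse are skipped.
    Event names may carry a modifier suffix such as ':u' on some setups.
    """
    perf = ["perf", "stat", "-x,", "-e", ",".join(events), "--"] + list(cmd)
    done = subprocess.run(perf, stdout=subprocess.DEVNULL,
                          stderr=subprocess.PIPE, text=True)
    return parse_perf_csv(done.stderr)


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def correlate_with(runs, target="cycles"):
    """Correlate every counter present in all runs with the target counter."""
    ys = [r[target] for r in runs]
    names = set.intersection(*(set(r) for r in runs)) - {target}
    return {name: pearson([r[name] for r in runs], ys) for name in sorted(names)}
```

Usage would be something like
runs = [measure(["./bench"]) for _ in range(20)] followed by
correlate_with(runs), where ./bench stands for whatever benchmark binary
is being measured. A counter whose coefficient is close to 1 or -1 moves
together with the cycle count across runs and is a candidate explanation
for the variation; a plain correlation coefficient will of course not
prove causation.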

Regards,