[Haskell-cafe] Cache miss performance costs for Haskell programs?

Tue Aug 30 15:28:11 UTC 2016

Hi Rob,

Am Dienstag, den 30.08.2016, 16:23 +0100 schrieb Rob Stewart:
> I'd really like to see how badly Haskell runtimes are affected as
> cache hit rates decrease. Is anyone aware of any empirical studies, or
> papers, or blog posts, that show examples where:
> 
> For the same Haskell program, increasing inlining causes lower cache
> hit rates, which slows down runtime due to costly cycles to fetch from
> main memory more often.

note quite inlinine, but I have had an experience like this when
evaluating the benefit of the Call Arity program analysis. Here is the
relevant excerpt from my thesis (§3.5.3, you can find the thesis at
http://www.joachim-breitner.de/thesis/):

    Initially, I attempted to use the actual run time measurements, but it
    turned out to be a mostly pointless endeavour. For example the knights
    benchmark would become 9% slower when enabling Call Arity (i.e. when
    comparing (A) to (B)), a completely unexpected result, given that the
    changes to the GHC Core code were reasonable. Further investigation
    using performance data obtained from the CPU indicated that with the
    changed code, the CPU’s instruction decoder was idling for more cycles,
    hinting at cache effects and/or bad program layout.

    Indeed: When I compiled the code with the compiler flag -g, which
    includes debugging information in the resulting binary, but should oth-
    erwise not affect the relative performance characteristics much, the un-
    expected difference vanished. I conclude that non-local changes to the
    Haskell or Core code will change the layout of the generated program
    code in unpredictable ways and render such run time measurements
    mostly meaningless.

    This conclusion has been drawn before [MDHS09], and recently, tools
    to mitigate this effect, e.g. by randomising the code layout [CB13], were
    created. Unfortunately, these currently target specific C compilers, so I
    could not use them here.

    In the following measurements, I avoid this problem by not measuring
    program execution time, but simply by counting the number of instruc-
    tions performed. This way, the variability in execution time due to code
    layout does not affect the results. To obtain the instruction counts I em-
    ploy valgrind [NS07], which runs the benchmarks on a virtual CPU and
    thus produces more reliable and reproducible measurements.

Greetings,
Joachim
-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • https://www.joachim-breitner.de/
  XMPP: nomeata at joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: This is a digitally signed message part
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20160830/5642a6e7/attachment.sig>