[Haskell-cafe] Cache miss performance costs for Haskell programs?

Tue Aug 30 18:01:45 UTC 2016

2016-08-30 17:57 GMT+02:00 Rob Stewart <robstewart57 at gmail.com>:

> [...] I'm interested in hearing about Haskell developers who've used
> Valgrind, or perf, to start caring about ensuring minimising
> executable size, by injecting NOINLINE pragmas or removing INLINE
> pragmas.
>

Note that the size of an executable and memory cache hit rates are only
some of the many things heavily influencing performance. You can easily get
a slowdown of an order of magnitude if your program e.g. interacts badly
with branch prediction or if it has pseudo-dependencies between
instructions due to partial register writes. One can work around these
issues on a low level:

   * If the branch predictor has no clue, backward jumps are normally
assumed to be taken (loops!) and forward jumps are assumed to be not taken.
So your code generator should lay out the common case in a straight
line.

   * Use the right dependency-breaking instructions when needed, see e.g.
section 3.5.1.8 "Clearing Registers and Dependency Breaking Idioms" in
http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf
.

   * If you ever wondered why e.g. the Intel processors have various
seemingly identical instructions, the answer is: Some registers are
internally "typed", and you pay a relatively high cost if you access them in
an "untyped" manner.
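
A minimal sketch of the first point, in C for illustration: GCC and Clang
provide __builtin_expect, which tells the compiler which branch outcome is
expected so that it can place the common case on the fall-through path. The
function sum_valid and the LIKELY/UNLIKELY macros are made up for this
example, and "negative inputs are rare" is purely an assumption about the
data.

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_expect is a GCC/Clang builtin; on other compilers the
   macros fall back to the plain condition (no layout hint at all). */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(c)   __builtin_expect(!!(c), 1)
#define UNLIKELY(c) __builtin_expect(!!(c), 0)
#else
#define LIKELY(c)   (c)
#define UNLIKELY(c) (c)
#endif

/* Hypothetical example: sum the non-negative entries of xs and count
   the negative ones as errors. We assume errors are rare, so the
   error branch is marked unlikely and the summation stays on the
   straight-line path. */
long sum_valid(const int *xs, size_t n, size_t *errors)
{
    assert(xs != NULL || n == 0);
    assert(errors != NULL);

    long total = 0;
    *errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(xs[i] < 0)) {
            (*errors)++;        /* rare path */
            continue;
        }
        total += xs[i];         /* common path */
    }
    return total;
}
```

The hint only influences code layout, never the result. Whether it helps at
all has to be measured, e.g. by comparing the branch-misses count reported by
perf stat with and without it.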

If LLVM is used as the backend, this should be handled automatically, at
least if we give the right hints to it.

In a nutshell: Size matters only sometimes. ;-) If you really care about
performance in detail, you have to look at lots of perf counters, e.g.
pipeline stalls, branch mispredictions, etc.