<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">2016-08-30 17:57 GMT+02:00 Rob Stewart <span dir="ltr"><<a href="mailto:robstewart57@gmail.com" target="_blank">robstewart57@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">[...] I'm interested in hearing about Haskell developers who've used<br>

Valgrind, or perf, to start caring about ensuring minimising<br>

executable size, by injecting NOINLINE pragmas or removing INLINE<br>

pragmas.<br></blockquote><div><br></div><div>Note that executable size and cache hit rates are only two of the many factors that heavily influence performance. You can easily get a slowdown of an order of magnitude if, e.g., your program interacts badly with branch prediction or has pseudo-dependencies between instructions caused by partial register writes. These issues can be worked around at a low level:</div><div><br></div><div>   * If the branch predictor has no information about a branch, backward jumps are normally predicted taken (loops!) and forward jumps are predicted not taken. So your code generator should lay out the common case as straight-line, fall-through code.</div><div><br></div><div>   * Use the right dependency-breaking instructions when needed (e.g. xor eax, eax to zero a register); see e.g. section 3.5.1.8 "Clearing Registers and Dependency Breaking Idioms" in <a href="http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf">http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf</a>.</div><div><br></div><div>   * If you ever wondered why, e.g., Intel processors have several seemingly identical instructions (such as MOVAPS, MOVAPD and MOVDQA), the answer is: some registers are internally "typed", and you pay a relatively high cost (a bypass delay) if you access them in an "untyped" manner.</div><div><br></div><div>If LLVM is used as the backend, this should be handled automatically, at least if we give it the right hints.</div><div><br></div><div>In a nutshell: size matters only sometimes. ;-) If you really care about performance in detail, you have to look at lots of perf counters, e.g. pipeline stalls, branch mispredictions, etc.</div></div></div></div>
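<div>To illustrate the branch-layout point at the source level, here is a minimal C sketch (an illustration, not code from this thread). It assumes GCC or Clang, whose __builtin_expect builtin tells the compiler which way a branch usually goes; the compiler may then move the rare path out of line so that the common case falls through in a straight line. Whether it actually does so depends on the compiler and optimization level. The function sum_valid, the UNLIKELY macro and the "negative entry means error" convention are hypothetical.</div>

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_expect is a GCC/Clang builtin; on other compilers the
   macro degrades to the plain condition (no layout hint at all). */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical example: sum the non-negative entries of xs and count
   the (rare) negative entries in *errors. Marking the error branch as
   unlikely invites the compiler to move it out of line, so the common
   path through the loop body is straight-line fall-through code. */
long sum_valid(const int *xs, size_t n, size_t *errors)
{
    assert(errors != NULL);
    long total = 0;
    *errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(xs[i] < 0)) {
            ++*errors;      /* rare case: candidate for out-of-line code */
            continue;
        }
        total += xs[i];     /* common case: stays on the fall-through path */
    }
    return total;
}
```

<div>For example, with int data[] = {3, 1, 4, -1, 5, 9, 2, 6}, sum_valid(data, 8, &amp;errors) returns 30 and sets errors to 1. Comparing the output of gcc -O2 -S with and without the hint is one way to check whether the compiler really changed the block layout; the hint changes only code placement, never the result.</div>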