<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head><body><div>I think changes to the RTS, code generator, and general heap layout are exactly where we *do* want to worry about these very low-level details. Changes in type checking, desugaring, core-to-core, etc., probably are not, because it's just too hard to tease out the relationship between what they do and what instructions are emitted in the end.</div><div><br></div><div><br></div><div><br></div><div id="composer_signature"><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><div style="font-size:85%;color:#575757">David Feuer</div><div style="font-size:85%;color:#575757">Well-Typed, LLP</div></div><div><br></div><div style="font-size:100%;color:#000000"><!-- originalMessage --><div>-------- Original message --------</div><div>From: Sven Panne &lt;svenpanne@gmail.com&gt; </div><div>Date: 9/24/17  2:00 PM  (GMT-05:00) </div><div>To: Joachim Breitner &lt;mail@joachim-breitner.de&gt; </div><div>Cc: ghc-devs@haskell.org </div><div>Subject: Re: RTS changes affect runtime when they shouldn’t </div><div><br></div></div>2017-09-23 21:06 GMT+02:00 Joachim Breitner &lt;mail@joachim-breitner.de&gt;:<br><br>> what I want to do is to reliably catch regressions.<br><br><br>The main question is: Which kind of regressions do you want to catch? Do<br>you care about runtime as experienced by the user? Measure the runtime. Do<br>you care about code size? Measure the code size. etc. etc. 
Measuring things<br>like the number of fetched instructions as an indicator for the experienced<br>runtime is basically a useless exercise, unless you do this on ancient RISC<br>processors, where each instruction takes a fixed number of cycles.<br><br><br>> What are the odds that a change to the Haskell compiler (in particular to<br>> Core2Core<br>> transformations) will cause a significant increase in runtime without a<br>>  significant increase in instruction count?<br>> (Honest question, not rhetoric).<br>><br><br>The odds are actually quite high, especially when you define "significant"<br>as "changing a few percent" (which we do!). Just a few examples from<br>current CPUs:<br><br>   * If the branch predictor does not have enough information to do better, it<br>assumes that backward branches are taken (think: loops) and forward<br>branches are not taken (so you should put "exceptional" code out of the<br>common, straight-line code). If some innocent-looking change alters the code<br>layout, you can easily get a very measurable difference in runtime<br>even if the number of executed instructions stays exactly the same.<br><br>   * Even if the number of instructions changes only a tiny bit, it could<br>be just enough to make caching much worse and/or make<br>the loop stream detector fail to detect a loop.<br><br>There are lots of other scenarios, so in a nutshell: Measure what you<br>really care about, not something you think might be related to that.<br><br>As already mentioned in another reply, "perf" can give you very detailed<br>hints about how well your program uses the pipeline, caches, branch<br>prediction etc. Perhaps the performance dashboard should really collect<br>these, too; this would remove a lot of guesswork.<br></body></html>