[Haskell-cafe] Data.Set optimisations [was: library for set of heterogeneous values?]

Mon Aug 15 08:15:20 UTC 2016

Thanks David,
> Yes, constructor order makes a difference, weird as that seems. When GHC desugars a pattern  match, it always reorders the case branches into constructor order. Sometimes this is obviously good; certain large case expressions are turned into jump tables. Sometimes this is not so good.  One of my predecessors (probably Milan Straka, but I'm not sure) ran a lot of benchmarks and picked the constructor order that was usually best.
Yes it was Milan [1], [2] refers to [3], which has the explanation. (And yes it seems weird.)
"indirect jumps are particularly expensive on a modern processor architecture, because they fox the branch prediction hardware, leading to a stall of 10 or more cycles depending on the length of the pipeline." [Introduction. The paper goes on to estimate a penalty around 7 to 9 % in general GHC code.]
Milan benchmarked a penalty around 10% for Data.Set. The prediction hardware doesn't seem very clever: it "predicts" any test-and-jump will not jump but will barrel on through the straight-line code. GHC's test is for the constructor not being the first in the decl. So the optimal straight-line code comes from declaring first your most likely constructor.
That seems to me worrying for case/constructor branching code in general. Do we pay that cost for every pattern-match on a list or a Maybe, for example? Their definitions in the Prelude put the constructors in the 'wrong' order. (Although presumably those two get optimised to smithereens.)
> Milan Straka wrote a paper [that's 2] that, among other things, recommended adding an additional singleton constructor. I don't actually know why he didn't do that; I can try to ask him. It's possible that it made some things better and others worse.
Performance seemed somewhat better for certain functions. (And none were worse.) OTOH you would pay a penalty of another indirect jump. So perhaps the extra code complexity just wasn't worth it. Perhaps the combination of pointer tagging and inlining the recursive call wrung the best performance out of it. (Inlining works, despite recursion, because every 'visible' function has a helper `go` that isn't inlined.)

[1] The performance of Haskell CONTAINERS package, Haskell Symposium 2010, Milan Straka
[2] Adams' Trees Revisited, 2011, Milan Straka
[3] Faster Laziness Using Dynamic Pointer Tagging, ICFP 2007, Simon M, Alexey Y, Simon PJ


AntC
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20160815/ca956f02/attachment.html>