<div dir="ltr">I've gotten substantial speedups (>30%) in the past (e.g. in the hashtables package) by replacing search loops with branchless code, but the technique works best in situations where you have a long run of straight-line integer code for the CPU to chew on; for branchless to be a win you need to feed enough instructions into the pipeline such that the savings of getting 3 instructions per cycle and avoiding paying for branch mispredictions outweighs the savings you would get from not doing the work. I don't think the use case you posted falls into that category.</div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Apr 19, 2015 at 12:44 AM, Joachim Breitner <span dir="ltr"><<a href="mailto:mail@joachim-breitner.de" target="_blank">mail@joachim-breitner.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear devs,<br>
<br>
in <a href="https://ghc.haskell.org/trac/ghc/ticket/10124" target="_blank">https://ghc.haskell.org/trac/ghc/ticket/10124</a> bgamari suggested that<br>
code like<br>
<br>
f :: Int -> Bool<br>
f a = case a of<br>
1 -> True<br>
5 -> True<br>
8 -> True<br>
9 -> True<br>
11 -> True<br>
19 -> True<br>
_ -> False<br>
{-# NOINLINE f #-}<br>
<br>
should not compile to a series of conditional jumps, but rather a<br>
branchless code akin to<br>
<br>
let !p = a ==# 1 `orI#` a ==# 5 `orI#` a ==# 8 `orI#` a ==# 9<br>
`orI#` a ==# 11 `orI#` a ==# 19<br>
case p of<br>
1 -> do_something<br>
0 -> do_something_else<br>
<br>
Subsequently, I refactored the way we produce Cmm code from STG, opening<br>
the possibility to implement this optimization at that stage¹.<br>
<br>
But when I then added this implementation, I could not measure any<br>
runtime improvement, see<br>
<a href="https://ghc.haskell.org/trac/ghc/ticket/10124#comment:15" target="_blank">https://ghc.haskell.org/trac/ghc/ticket/10124#comment:15</a><br>
<br>
So my question to the general audience: Is such branchless code really<br>
better than the current, branching code? Can someone provide us with an<br>
example that shows that it is better? Do I need to produce different<br>
branchless assembly?<br>
<br>
If it is not better, I can again refactor the switch generation and<br>
simplify it a bit again.<br>
<br>
Hmm, only now I see that rwbarton has replied there. Not sure why I<br>
missed that. Anyways, more voices are welcome :-)<br>
<br>
Greetings,<br>
Joachim<br>
<br>
¹ This should not preclude an implementation on the Core level, which<br>
SPJ preferred. But I improved other aspects of the code generation as<br>
well, so this is worthwhile on its own.<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Joachim “nomeata” Breitner<br>
<a href="mailto:mail@joachim-breitner.de">mail@joachim-breitner.de</a> • <a href="http://www.joachim-breitner.de/" target="_blank">http://www.joachim-breitner.de/</a><br>
Jabber: <a href="mailto:nomeata@joachim-breitner.de">nomeata@joachim-breitner.de</a> • GPG-Key: 0xF0FBF51F<br>
Debian Developer: <a href="mailto:nomeata@debian.org">nomeata@debian.org</a><br>
</font></span><br>_______________________________________________<br>
ghc-devs mailing list<br>
<a href="mailto:ghc-devs@haskell.org">ghc-devs@haskell.org</a><br>
<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Gregory Collins <<a href="mailto:greg@gregorycollins.net" target="_blank">greg@gregorycollins.net</a>></div>
</div>