<div dir="ltr">the optimization manual ben mentions is <a href="http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html">http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html</a><div><br></div><div>it does a very good job laying out the scope of both <b>generality</b> and <b>impact</b> for a huge range of systemsy tricks.</div><div><br></div><div>there are certain conditions when that transformation can be a win, but they generally correspond with *inner loop* scenarios where bad branch prediction heavily impacts performance. One class of algorithms that might provide some grist for the optimization mill would be those in the hacker's delight book <a href="http://www.hackersdelight.org/">http://www.hackersdelight.org/</a></div><div><br></div><div>I have written some code that has that bit fiddling vs bad branch prediction encoding tradeoff before, i'll see if i can dig any of it up if you'd like!</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Apr 19, 2015 at 10:41 AM, Ben Gamari <span dir="ltr"><<a href="mailto:ben@smart-cactus.org" target="_blank">ben@smart-cactus.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Joachim Breitner <<a href="mailto:mail@joachim-breitner.de">mail@joachim-breitner.de</a>> writes:<br>
<br>
> Dear devs,<br>
><br>
> in <a href="https://ghc.haskell.org/trac/ghc/ticket/10124" target="_blank">https://ghc.haskell.org/trac/ghc/ticket/10124</a> bgamari suggested that<br>
> code like<br>
><br>
> f :: Int -> Bool<br>
> f a = case a of<br>
> 1 -> True<br>
> 5 -> True<br>
> 8 -> True<br>
> 9 -> True<br>
> 11 -> True<br>
> 19 -> True<br>
> _ -> False<br>
> {-# NOINLINE f #-}<br>
><br>
> should not compile to a series of conditional jumps, but rather a<br>
> branchless code akin to<br>
><br>
> let !p = a ==# 1 `orI#` a ==# 5 `orI#` a ==# 8 `orI#` a ==# 9<br>
> `orI#` a ==# 11 `orI#` a ==# 19<br>
> case p of<br>
> 1 -> do_something<br>
> 0 -> do_something_else<br>
><br>
</span>Hi Joachim,<br>
<br>
Thanks for trying this and I'm sorry to hear that the optimization<br>
hasn't yielded the improvement I would have expected. I've been a bit<br>
silent on the issue as I've been working on finishing up my<br>
PhD. Hopefully in a couple of months after I'm settled in Germany I'll<br>
have time to delve into this more deeply.<br>
<span class=""><br>
> Subsequently, I refactored the way we produce Cmm code from STG, opening<br>
> the possibility to implement this optimization at that stage¹.<br>
><br>
> But when I then added this implementation, I could not measure any<br>
> runtime improvement, see<br>
> <a href="https://ghc.haskell.org/trac/ghc/ticket/10124#comment:15" target="_blank">https://ghc.haskell.org/trac/ghc/ticket/10124#comment:15</a><br>
><br>
> So my question to the general audience: Is such branchless code really<br>
> better than the current, branching code? Can someone provide us with an<br>
> example that shows that it is better? Do I need to produce different<br>
> branchless assembly?<br>
><br>
</span>Have you had a look at the Intel Optimization manual? I'll admit that I<br>
haven't looked myself but I'd imagine they'd probably have a few things<br>
to say about this.<br>
<br>
Cheers,<br>
<br>
- Ben<br>
<br>_______________________________________________<br>
ghc-devs mailing list<br>
<a href="mailto:ghc-devs@haskell.org">ghc-devs@haskell.org</a><br>
<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs</a><br>
<br></blockquote></div><br></div>