<div dir="ltr">the optimization manual ben mentions is <a href="http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html">http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html</a><div><br></div><div>it does a very good job laying out the scope of both <b>generality</b> and <b>impact</b> for a huge range of systemsy tricks.</div><div><br></div><div>there are certain conditions when that transformation can be a win, but they generally correspond with *inner loop* scenarios where bad branch prediction heavily impacts performance. One class of algorithms that might provide some grist for the optimization mill would be those in the hacker's delight book <a href="http://www.hackersdelight.org/">http://www.hackersdelight.org/</a></div><div><br></div><div>I have written some code that has that bit fiddling vs bad branch prediction encoding tradeoff before, i'll see if i can dig any of it up if you'd like!</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Apr 19, 2015 at 10:41 AM, Ben Gamari <span dir="ltr"><<a href="mailto:ben@smart-cactus.org" target="_blank">ben@smart-cactus.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Joachim Breitner <<a href="mailto:mail@joachim-breitner.de">mail@joachim-breitner.de</a>> writes:<br>

<br>

> Dear devs,<br>

><br>

> in  <a href="https://ghc.haskell.org/trac/ghc/ticket/10124" target="_blank">https://ghc.haskell.org/trac/ghc/ticket/10124</a> bgamari suggested that<br>

> code like<br>

><br>

>         f :: Int -> Bool<br>

>         f a = case a of<br>

>                 1  -> True<br>

>                 5  -> True<br>

>                 8  -> True<br>

>                 9  -> True<br>

>                 11 -> True<br>

>                 19 -> True<br>

>                 _  -> False<br>

>         {-# NOINLINE f #-}<br>

><br>

> should not compile to a series of conditional jumps, but rather a<br>

> branchless code akin to<br>

><br>

>         let !p = a ==# 1 `orI#` a ==# 5 `orI#` a ==# 8 `orI#` a ==# 9<br>

>                          `orI#` a ==# 11 `orI#` a ==# 19<br>

>         case p of<br>

>           1 -> do_something<br>

>           0 -> do_something_else<br>

><br>

</span>Hi Joachim,<br>

<br>

Thanks for trying this and I'm sorry to hear that the optimization<br>

hasn't yielded the improvement I would have expected. I've been a bit<br>

silent on the issue as I've been working on finishing up my<br>

PhD. Hopefully in a couple of months after I'm settled in Germany I'll<br>

have time to delve into this more deeply.<br>

<span class=""><br>

> Subsequently, I refactored the way we produce Cmm code from STG, opening<br>

> the possibility to implement this optimization at that stage¹.<br>

><br>

> But when I then added this implementation, I could not measure any<br>

> runtime improvement, see<br>

> <a href="https://ghc.haskell.org/trac/ghc/ticket/10124#comment:15" target="_blank">https://ghc.haskell.org/trac/ghc/ticket/10124#comment:15</a><br>

><br>

> So my question to the general audience: Is such branchless code really<br>

> better than the current, branching code? Can someone provide us with an<br>

> example that shows that it is better? Do I need to produce different<br>

> branchless assembly?<br>

><br>

</span>Have you had a look at the Intel Optimization manual? I'll admit that I<br>

haven't looked myself but I'd imagine they'd probably have a few things<br>

to say about this.<br>

<br>

Cheers,<br>

<br>

- Ben<br>

<br>_______________________________________________<br>

ghc-devs mailing list<br>

<a href="mailto:ghc-devs@haskell.org">ghc-devs@haskell.org</a><br>

<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs</a><br>

<br></blockquote></div><br></div>