[GHC] #9661: Branchless ==# is compiled to branchy code
GHC
ghc-devs at haskell.org
Fri Oct 3 22:20:25 UTC 2014
#9661: Branchless ==# is compiled to branchy code
-------------------------------------+-------------------------------------
Reporter: dfeuer | Owner:
Type: feature | Status: new
request | Milestone:
Priority: normal | Version: 7.9
Component: Compiler | Keywords:
Resolution: | Architecture: Unknown/Multiple
Operating System: | Difficulty: Unknown
Unknown/Multiple | Blocked By:
Type of failure: Runtime | Related Tickets:
performance bug |
Test Case: |
Blocking: |
Differential Revisions: |
-------------------------------------+-------------------------------------
Comment (by dfeuer):
Replying to [comment:7 rwbarton]:
> I think GHC is doing the right thing here, or at least not the wrong
thing.
>
> It's wrong to think of `(==#)` as "branchless equality". It's just a
function whose value is `1#` if its arguments are equal, and `0#` if they
aren't. This fact needs to be visible to GHC in order to optimize code
like
...
> and this is the purpose of the `litEq` rule.
I wonder if there's another way to accomplish that, but see below.
> As a result, GHC may not output assembly containing the number of
branches that you intended. This is the normal function of an optimizing
compiler; if GHC was not free to replace your program with an equivalent
one, then users would be forced to perform all kinds of optimizations by
hand to get efficient code.
With the exception of some bottom-of-the-stack `base` code with module
dependency nightmares, people who use such primops are usually trying to
do something specific for performance reasons, and we should be careful
not to interfere excessively if we can avoid doing so. Your preferred
`==#` can be implemented in terms of mine, I believe, like this:
{{{#!hs
bartonEq x y = case feuerEq x y of
1# -> 1#
_ -> 0#
}}}
Going the other way around, however, is impossible. If you prefer, we can
give `feuerEq` a different name so current code will continue to work the
same.
> I understand that depending on the characteristics of the input, the
"branchless" code may be faster (e.g. for the C function above, `x`
randomly drawn from the set 0 (49%), 5 (49%), 10 (2%)). However, with
certain other distributions of the input value, the "branchy" version is
likely to be faster. So there is no right or wrong answer here.
Of course.
> In any case, I would like to see some benchmarks showing that there are
performance gains to be had in real programs from control over
branchlessness before expending any effort on implementation.
That ship has long since sailed—the massive breaking primBool change was
undertaken specifically to support this sort of thing.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9661#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list