[GHC] #9661: Branchless ==# is compiled to branchy code
GHC
ghc-devs at haskell.org
Fri Oct 3 21:14:51 UTC 2014
#9661: Branchless ==# is compiled to branchy code
-------------------------------------+-------------------------------------
Reporter: dfeuer | Owner:
Type: feature | Status: new
request | Milestone:
Priority: normal | Version: 7.9
Component: Compiler | Keywords:
Resolution: | Architecture: Unknown/Multiple
Operating System: | Difficulty: Unknown
Unknown/Multiple | Blocked By:
Type of failure: Runtime | Related Tickets:
performance bug |
Test Case: |
Blocking: |
Differential Revisions: |
-------------------------------------+-------------------------------------
Changes (by rwbarton):
* type: bug => feature request
Comment:
I think GHC is doing the right thing here, or at least not the wrong
thing.
It's wrong to think of `(==#)` as "branchless equality". It's just a
function whose value is `1#` if its arguments are equal, and `0#` if they
aren't. This fact needs to be visible to GHC in order to optimize code
like
{{{#!hs
f :: Int -> (Int,Int)
f x = if x == 0 -- defined in terms of ==#
then (x', 1) -- here GHC can inline and then constant-fold x' to
257
else (x-1, x')
where x' = 1000 * x + 257
}}}
and this is the purpose of the `litEq` rule. As a result, GHC may not
output assembly containing the number of branches that you intended. This
is the normal function of an optimizing compiler; if GHC was not free to
replace your program with an equivalent one, then users would be forced to
perform all kinds of optimizations by hand to get efficient code.
I would note that this behavior is not specific to GHC. If you compile
this C program with gcc (Debian 4.9.1-14) at optimization level `-O1` or
higher
{{{#!c
void g();
void h();
void f(int x)
{
if ((x == 0) OR (x == 5))
g();
else
h();
}
}}}
you will get identical assembly output (with two branches, not an `or`
instruction) whether you set `-DOR=|` or `-DOR=||`. So C does not give you
control over branchlessness either.
I understand that depending on the characteristics of the input, the
"branchless" code may be faster (e.g. for the C function above, `x`
randomly drawn from the set 0 (49%), 5 (49%), 10 (2%)). However, with
certain other distributions of the input value, the "branchy" version is
likely to be faster. So there is no right or wrong answer here.
In general it takes a lot of engineering effort to get a compiler to
reliably produce two different optimized outputs for two different
programs that it can see are semantically identical. In this case maybe
there is an easy solution: add a new `secretlyEquality#` which is the same
as `(==#)`, but where that fact is hidden from GHC for long enough for
`secretlyEquality#` to survive the stages that transform `(==#)`
unscathed. This approach is intrinsically somewhat fragile (for example,
the LLVM backend may introduce additional branches anyways), but it has
the merit of ease of implementation.
In any case, I would like to see some benchmarks showing that there are
performance gains to be had in real programs from control over
branchlessness before expending any effort on implementation.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9661#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list