[GHC] #14644: Improve cmm/assembly for pattern matches with two constants.
GHC
ghc-devs at haskell.org
Tue Jan 9 13:03:24 UTC 2018
#14644: Improve cmm/assembly for pattern matches with two constants.
-------------------------------------+-------------------------------------
Reporter: AndreasK | Owner: AndreasK
Type: task | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.2
(CodeGen) | Keywords: Codegen, CMM,
Resolution: | Patterns, Pattern Matching
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D4294
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by AndreasK):
* owner: (none) => AndreasK
* differential: => Phab:D4294
Comment:
I ran a simple Benchmark and the results seem almost too good with >10%
gains in almost all cases.
If it really is that good then, well, I guess good for us.
Assuming I didn't introduce an error somewhere I assume the combination of
smaller code and less instructions on some paths is just better in
general.
> If we e.g. assume that all Ints happen equally often in the example
above, it would be best to check for <1 and >7 first, so we would get
roughly 1.5 comparisons on average. Depending on the number of registers
available, you can even get away with 1 subtraction and 1 unsigned
comparison for this range check, a classic C hack (ab)using wrap-around
for unsigned ints.
I thought about this a bit. While true it also means the first comparison
has a hard time with prediction if the numbers are a random distribution
over the range. So two 99.9% correctly predicted comparisons might still
be better even without considering code size.
`range check, jg, je, cmp, je, j <default>` could still be better. But not
sure if I will look into that and it might warrant it's own ticket as GHC
also uses this for range checks on jump tables I think.
Code used for the test:
{{{
f_test :: Int -> Int
f_test 111 = 1111
f_test 222 = 2222
f_test _ = -1
run = print . sum . map f_test
--main = print . sum . map f_test $ ([(-1500000000::Int) .. 0])
main = do
args <- getArgs :: IO [String]
if null args
then putStrLn "Usage: pos|neg|both"
else case (head args) of
"pos" -> run [0 .. (1500000000::Int)]
"neg" -> run [-1500000000::Int .. 0]
"both" -> run [-750000000::Int .. 750000000::Int]
"negrep" -> run . concat . replicate 100000000 $ [-311, -322,
-333, -444]
"posrep" -> run . concat . replicate 100000000 $ [311, 322,
333, 444]
"unpre" -> run . concat . replicate 100000000 $ [111, 322,
-333, 222]
}}}
I get these results for the current approach (range) and mine (if) on an
6700K
=== With inlining:
{{{
benchmarking execute: range pos
mean 1.196 s (1.196 s .. 1.198 s)
benchmarking execute: if pos
mean 806.0 ms (803.5 ms .. 807.3 ms)
benchmarking execute: range neg
mean 2.384 s (2.383 s .. 2.385 s)
benchmarking execute: if neg
mean 1.841 s (1.841 s .. 1.842 s)
benchmarking execute: range negrep
mean 840.1 ms (838.1 ms .. 841.3 ms)
benchmarking execute: if negrep
mean 728.6 ms (728.3 ms .. 728.8 ms)
benchmarking execute: range unpre
mean 852.2 ms (851.1 ms .. 853.1 ms)
benchmarking execute: if unpre
mean 789.7 ms (789.3 ms .. 789.9 ms)
}}}
=== With inlining disabled on f_test
{{{
benchmarking execute: range pos
mean 2.385 s (2.383 s .. 2.386 s)
benchmarking execute: if pos
mean 2.383 s (2.382 s .. 2.384 s)
benchmarking execute: range neg
mean 2.845 s (2.839 s .. 2.848 s)
benchmarking execute: if neg
mean 2.047 s (2.041 s .. 2.053 s)
benchmarking execute: range negrep
mean 1.204 s (1.201 s .. 1.205 s)
benchmarking execute: if negrep
mean 829.4 ms (828.4 ms .. 830.1 ms)
benchmarking execute: range unpre
mean 1.165 s (1.164 s .. 1.166 s)
benchmarking execute: if unpre
mean 1.118 s (1.117 s .. 1.119 s)
}}}
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14644#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list