[GHC] #14644: Improve cmm/assembly for pattern matches with two constants.

Tue Jan 9 13:03:24 UTC 2018

#14644: Improve cmm/assembly for pattern matches with two constants.
-------------------------------------+-------------------------------------
        Reporter:  AndreasK          |                Owner:  AndreasK
            Type:  task              |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
  (CodeGen)                          |             Keywords:  Codegen, CMM,
      Resolution:                    |  Patterns, Pattern Matching
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):  Phab:D4294
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by AndreasK):

 * owner:  (none) => AndreasK
 * differential:   => Phab:D4294

Comment:

 I ran a simple Benchmark and the results seem almost too good with >10%
 gains in almost all cases.

 If it really is that good then, well, I guess good for us.

 Assuming I didn't introduce an error somewhere I assume the combination of
 smaller code and less instructions on some paths is just better in
 general.

 > If we e.g. assume that all Ints happen equally often in the example
 above, it would be best to check for <1 and >7 first, so we would get
 roughly 1.5 comparisons on average. Depending on the number of registers
 available, you can even get away with 1 subtraction and 1 unsigned
 comparison for this range check, a classic C hack (ab)using wrap-around
 for unsigned ints.

 I thought about this a bit. While true it also means the first comparison
 has a hard time with prediction if the numbers are a random distribution
 over the range. So two 99.9% correctly predicted comparisons might still
 be better even without considering code size.

 `range check, jg, je, cmp, je, j <default>` could still be better. But not
 sure if I will look into that and it might warrant it's own ticket as GHC
 also uses this for range checks on jump tables I think.

 Code used for the test:
 {{{
 f_test :: Int -> Int
 f_test  111 = 1111
 f_test  222 = 2222
 f_test  _ = -1

 run = print . sum . map f_test

 --main = print . sum . map f_test $ ([(-1500000000::Int) .. 0])
 main = do
     args <- getArgs :: IO [String]
     if null args
         then putStrLn "Usage: pos|neg|both"
         else case (head args) of
             "pos" -> run [0 .. (1500000000::Int)]
             "neg" -> run [-1500000000::Int .. 0]
             "both" -> run [-750000000::Int .. 750000000::Int]
             "negrep" -> run . concat . replicate 100000000 $ [-311, -322,
 -333, -444]
             "posrep" -> run . concat . replicate 100000000 $ [311, 322,
 333, 444]
             "unpre" -> run . concat . replicate 100000000 $ [111, 322,
 -333, 222]
 }}}

 I get these results for the current approach (range) and mine (if) on an
 6700K

 === With inlining:
 {{{
 benchmarking execute: range pos
 mean                 1.196 s    (1.196 s .. 1.198 s)
 benchmarking execute: if pos
 mean                 806.0 ms   (803.5 ms .. 807.3 ms)

 benchmarking execute: range neg
 mean                 2.384 s    (2.383 s .. 2.385 s)
 benchmarking execute: if neg
 mean                 1.841 s    (1.841 s .. 1.842 s)

 benchmarking execute: range negrep
 mean                 840.1 ms   (838.1 ms .. 841.3 ms)
 benchmarking execute: if negrep
 mean                 728.6 ms   (728.3 ms .. 728.8 ms)

 benchmarking execute: range unpre
 mean                 852.2 ms   (851.1 ms .. 853.1 ms)
 benchmarking execute: if unpre
 mean                 789.7 ms   (789.3 ms .. 789.9 ms)
 }}}

 === With inlining disabled on f_test
 {{{
 benchmarking execute: range pos
 mean                 2.385 s    (2.383 s .. 2.386 s)
 benchmarking execute: if pos
 mean                 2.383 s    (2.382 s .. 2.384 s)

 benchmarking execute: range neg
 mean                 2.845 s    (2.839 s .. 2.848 s)
 benchmarking execute: if neg
 mean                 2.047 s    (2.041 s .. 2.053 s)

 benchmarking execute: range negrep
 mean                 1.204 s    (1.201 s .. 1.205 s)
 benchmarking execute: if negrep
 mean                 829.4 ms   (828.4 ms .. 830.1 ms)

 benchmarking execute: range unpre
 mean                 1.165 s    (1.164 s .. 1.166 s)
 benchmarking execute: if unpre
 mean                 1.118 s    (1.117 s .. 1.119 s)
 }}}

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14644#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler