[GHC] #9646: Simplifer non-determinism leading to 8 fold difference in run time performance

GHC ghc-devs at haskell.org
Wed Oct 1 11:02:17 UTC 2014


#9646: Simplifer non-determinism leading to 8 fold difference in run time
performance
-------------------------------------+-------------------------------------
              Reporter:  erikd       |            Owner:
                  Type:  bug         |           Status:  new
              Priority:  normal      |        Milestone:
             Component:  Compiler    |          Version:  7.8.3
            Resolution:              |         Keywords:
      Operating System:  Linux       |     Architecture:  x86_64 (amd64)
       Type of failure:  Runtime     |       Difficulty:  Unknown
  performance bug                    |       Blocked By:
             Test Case:              |  Related Tickets:
              Blocking:              |
Differential Revisions:              |
-------------------------------------+-------------------------------------

Comment (by erikd):

 Amos Robinson has helped make some progress on this.

 The problem is with the `StrictPrim` monad (defined in the file
 `Common/GHC/Ineteger/StrictPrim.hs` of the ghc-perfbug-test project) which
 basically an ST monad with a bunch of explicit bang patterns to force
 strict evaluation.

 The big difference between the slow and the fast version is that in the
 slow version of the .dump-simpl output of these inner loops have an extra
 lambda eg on line 39 of `slow-Natural.dump-simpl`:

 {{{
 (\ eta -> (# eta, vx #)) `cast` ...;
 }}}

 The `eta` variable in this case is the state token for the `StrictPrim`
 monad.

 Furthermore, Amos Robinson was able to re-write the inner loop from:

 {{{#!hs
     innerLoop2 !xi !yi !carryhi !carrylo !sum
         | yi < n2 = do
             x <- indexWordArrayM arr1 xi
             y <- indexWordArrayM arr2 yi
             let (# !cry0, !prod #) = timesWord2 x y
                 (# !cry1, !sum1 #) = plusWord2 prod sum
                 (# !tcryhi, !crylo #) = plusWord2C carrylo cry0 cry1
                 !cryhi = plusWord carryhi tcryhi
             innerLoop2 (xi - 1) (yi + 1) cryhi crylo sum1
         | otherwise = do return $! (carryhi, carrylo, sum)
 }}}

 to

 {{{#!hs
     innerLoop2 !xi !yi !carryhi !carrylo !sum =
         StrictPrim $ \s ->
             if yi < n2
                 then
                     let StrictPrim go = do
                         x <- indexWordArrayM arr1 xi
                         y <- indexWordArrayM arr2 yi
                         let (# !cry0,   !prod  #) = timesWord2 x y
                             (# !cry1,   !sum1  #) = plusWord2  prod sum
                             (# !tcryhi, !crylo #) = plusWord2C carrylo
 cry0 cry1
                             !cryhi                = plusWord   carryhi
 tcryhi
                         innerLoop2 (xi - 1) (yi + 1) cryhi crylo sum1
                     in go s
                 else (# s, (carryhi, carrylo, sum) #)
 }}}

 and this new formulation *always* compiles to run fast.

 After some discussion with Amos, we came to the following conclusions:

 1. The simplifier should be deterministic. The same input file, with the
 same compiler flags should result in the same output (preferably the
 output that performs better).

 2. It is not un-reasonable to expect the compiler to apply the
 transformation that Amos appied manually.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9646#comment:9>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list