[GHC] #9646: Simplifer non-determinism leading to 8 fold difference in run time performance
GHC
ghc-devs at haskell.org
Wed Oct 1 11:02:17 UTC 2014
#9646: Simplifer non-determinism leading to 8 fold difference in run time
performance
-------------------------------------+-------------------------------------
Reporter: erikd | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.3
Resolution: | Keywords:
Operating System: Linux | Architecture: x86_64 (amd64)
Type of failure: Runtime | Difficulty: Unknown
performance bug | Blocked By:
Test Case: | Related Tickets:
Blocking: |
Differential Revisions: |
-------------------------------------+-------------------------------------
Comment (by erikd):
Amos Robinson has helped make some progress on this.
The problem is with the `StrictPrim` monad (defined in the file
`Common/GHC/Ineteger/StrictPrim.hs` of the ghc-perfbug-test project) which
basically an ST monad with a bunch of explicit bang patterns to force
strict evaluation.
The big difference between the slow and the fast version is that in the
slow version of the .dump-simpl output of these inner loops have an extra
lambda eg on line 39 of `slow-Natural.dump-simpl`:
{{{
(\ eta -> (# eta, vx #)) `cast` ...;
}}}
The `eta` variable in this case is the state token for the `StrictPrim`
monad.
Furthermore, Amos Robinson was able to re-write the inner loop from:
{{{#!hs
innerLoop2 !xi !yi !carryhi !carrylo !sum
| yi < n2 = do
x <- indexWordArrayM arr1 xi
y <- indexWordArrayM arr2 yi
let (# !cry0, !prod #) = timesWord2 x y
(# !cry1, !sum1 #) = plusWord2 prod sum
(# !tcryhi, !crylo #) = plusWord2C carrylo cry0 cry1
!cryhi = plusWord carryhi tcryhi
innerLoop2 (xi - 1) (yi + 1) cryhi crylo sum1
| otherwise = do return $! (carryhi, carrylo, sum)
}}}
to
{{{#!hs
innerLoop2 !xi !yi !carryhi !carrylo !sum =
StrictPrim $ \s ->
if yi < n2
then
let StrictPrim go = do
x <- indexWordArrayM arr1 xi
y <- indexWordArrayM arr2 yi
let (# !cry0, !prod #) = timesWord2 x y
(# !cry1, !sum1 #) = plusWord2 prod sum
(# !tcryhi, !crylo #) = plusWord2C carrylo
cry0 cry1
!cryhi = plusWord carryhi
tcryhi
innerLoop2 (xi - 1) (yi + 1) cryhi crylo sum1
in go s
else (# s, (carryhi, carrylo, sum) #)
}}}
and this new formulation *always* compiles to run fast.
After some discussion with Amos, we came to the following conclusions:
1. The simplifier should be deterministic. The same input file, with the
same compiler flags should result in the same output (preferably the
output that performs better).
2. It is not un-reasonable to expect the compiler to apply the
transformation that Amos appied manually.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/9646#comment:9>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list