[GHC] #10678: integer-gmp's runS seems unnecessarily expensive

Fri Jul 24 09:16:58 UTC 2015

#10678: integer-gmp's runS seems unnecessarily expensive
-------------------------------------+-------------------------------------
        Reporter:  rwbarton          |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  Compiler          |                 Version:  7.10.1
  (CodeGen)                          |
      Resolution:                    |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:
-------------------------------------+-------------------------------------

Comment (by simonpj):

 Reid, indeed it is known: #5916.  (If you find any other tickets about
 this, please add them.)

 I think the best approach would be as in comment:16 of
 ticket:5916#comment:16
  * '''Inline `runS` very late, in `CorePrep`'''

 Some thoughts about this.
  * We'd want a single, magical function which has this behaviour.  Things
 like `unsafeDupablePerformIO` should call `runS`.

  * Now I think about it, I wonder if the magical function should instead
 be this:
 {{{
 runRW# :: (State# s -> (# State s, a)) -> (# State s, a #)
 {-# NOINLINE runRW# #-}
 runRW# f = f realWorld#
 }}}
  That is, all `runRW#` does is to apply its argument to a fresh state
 token.

  Now we can define your `runS` thus:
 {{{
   runS f = case ruNRW# f of (# _, r #) -> r
 }}}
  I think we can inline `runS` freely, which is good because it means that
 more code is exposed to the optimiser (in particular that `case`).

  * Moreover, if we do this, I think we now don't need `lazy`.  Because
 `runRW#`'s strictness signature won't be strict in the `r` part; see `Note
 [unsafeDupablePerformIO has a lazy RHS]` in `GHC.IO`.

  * If `CorePrep` saw `runRW# (f a b)` it can generate `f a b realWorld#`.
 But if it sees `runRW# (\s.e)`, it should generate `e[realWorld#/s]`.
 That is, it should do the beta-reduction on the fly.  That's slightly
 annoying because `CorePrep` doesn't currently carry around a substitution.
 But I suppose that you could literally call `substExpr`; this doesn't
 happen much.

  * Alternatively maybe this could be done in the code generator,
 effectively treating `runRW` as a primop.  That would be a good plan,
 except that by the time we get to STG the program is in ANF, so instead of
 `runRW# (f a b)` we'd have
 {{{
 let g = f a b in ruNRW# g
 }}}
   and before we see the `runRW#` we'll have generated code to allocate a
 closure for `g`.  So we'd need to allow STG syntax (for specified primops)
 to have expressions, not just atoms, as the arguments to the primop.

  This might actually be a Jolly Good Thing.  At the moment `catch# (\s ->
 e1) (\x s -> e2)` allocates two closures before it gets to the `catch#`,
 which is pretty stupid since the first thing we do after pushing the
 exception-catch frame is to execute `e1`.

 I think changes like this would make modest but significant improvements
 to many programs.  If you'd like to dig into it, I'd gladly help.

 Simon

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10678#comment:2>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler