[GHC] #13966: Skip-less stream fusion

Tue Jul 18 12:33:16 UTC 2017

#13966: Skip-less stream fusion
-------------------------------------+-------------------------------------
        Reporter:  jmspiewak         |                Owner:  (none)
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.1-rc3
      Resolution:                    |             Keywords:  JoinPoints
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 This is a very interesting example, thank you.

 I note that in HEAD it fuses just fine.  I have not yet figured out
 exactly why, but I want to look at this code, from teh Description:
 {{{
 letrec {
   $wloop [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int
   [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []]
   $wloop
     = \ (ww1_s9e8 :: Int#) ->
         case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of {
           False ->
             case remInt# ww1_s9e8 2# of {
               __DEFAULT -> $wloop (+# ww1_s9e8 1#);
               0# ->
                 Main.Yield1
                   @ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#))
 (GHC.Types.I# ww1_s9e8)
             };
           True -> Main.Done1 @ Int @ Int
         }; } in
 joinrec {
   $wloop1 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
   [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []]
   $wloop1 (ww1_s9ef :: Int#) (ww2_s9ej :: Int#)
     = case $wloop ww2_s9ej of {
         Done1 -> ww1_s9ef;
         Yield1 s'_a497 x_a498 ->
           case x_a498 of { GHC.Types.I# y_a66i ->
           case s'_a497 of { GHC.Types.I# ww4_X9hA ->
           jump $wloop1 (+# ww1_s9ef y_a66i) ww4_X9hA
           }
           }
       }; } in
 jump $wloop1 0# 1#
 }}}
 Once GHC gets the program into this state, it's not going to be able to
 optimise it.   HEAD somehow avoids this dead end, but I hate things where
 GHC can get stuck in a dead end.  I think this code ''ought'' to optimise
 just fine.  Here's why.

 Look at thar functionn `$wloop`.   It's not a join point becuaes it's not
 tail-called in the body of the `letrec`.   But suppose we transform
 `$wloop` like this:
 {{{
 let {
   $wloop [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Step1 Int Int
   [LclId, Arity=1, Str=<S,U>, Unf=OtherCon []]
   $wloop
     = \x. joinrec $j (ww1_s9e8 :: Int#)
             = case tagToEnum# @ Bool (># ww1_s9e8 ww_s9ep) of {
                 False ->
                   case remInt# ww1_s9e8 2# of {
                     __DEFAULT -> jump $j (+# ww1_s9e8 1#);
                     0# ->
                       Main.Yield1
                         @ Int @ Int (GHC.Types.I# (+# ww1_s9e8 1#))
 (GHC.Types.I# ww1_s9e8)
                   };
                 True -> Main.Done1 @ Int @ Int
           in jump $j x
 }}}
 Now `$wloop` is non-recursive, so we can inline it at its only call site,
 in `$wloop1`:
 {{{
 joinrec {
   $wloop1 [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
   [LclId[JoinId(2)], Arity=2, Str=<S,U><S,U>, Unf=OtherCon []]
   $wloop1 (ww1_s9ef :: Int#) (ww2_s9ej :: Int#)
     = case (...the body of $wloop...) of {
         Done1 -> ww1_s9ef;
         Yield1 s'_a497 x_a498 ->
           case x_a498 of { GHC.Types.I# y_a66i ->
           case s'_a497 of { GHC.Types.I# ww4_X9hA ->
           jump $wloop1 (+# ww1_s9ef y_a66i) ww4_X9hA
           }
           }
       }; } in
 jump $wloop1 0# 1#
 }}}
 And now the right fusion will happen.

 The crucial bit was the transformation of `$wloop`: we took a tail-
 recursive function, introduced a (recursive) join point into it, which
 made it non-recursive.  Even if nothing further happens, the
 implementation of `$wloop` is a bit more efcient because the tail call is
 just a branch.  But the big thing here is that we can now inline `$wloop`
 in `$wloop1`.

 So the idea is this: a transformation to turn a tail-recursive function
 definition into one that is implemented with a recursive join point.  If
 we had such a transformation, it'd get us out of the dead end.

 Actually, there's a variant of the Static Argument Transformation
 ([wiki:StaticArgumentTransformation]) at work here.  Consider
 {{{
 f x y = case y of
           A -> f x y'
           B -> e2
           C -> e3
 }}}
 Here the first argument to `f` is "static"; that is, the same in every
 call.  So we can transform like this
 {{{
 f x y = joinrec $j y = case y of
                           A -> $j y'
                           B -> e2
                           C -> e3
         in $j y
 }}}
 Note that `x` isn't passed around in every iteration.

 There's a GHC module `SAT.hs` which does the static argument
 transformation, but it is not join-point aware.  So we should fix that.

 One reason we don't currently do SAT all the time is that the results are
 a bit ambiguous; see Andre Santos's thesis for more, cited on
 [wiki:StaticArgumentTransformation].  BUT I think that some (maybe most)
 of the problems with SAT may go away if we restrict SAT to tail-recursive
 functions that we can turn into `joinrecs`.

 Any volunteers?

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13966#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler