[GHC] #8763: forM_ [1..N] does not get fused (allocates 50% more)

GHC ghc-devs at haskell.org
Tue Sep 4 11:56:39 UTC 2018


#8763: forM_ [1..N] does not get fused (allocates 50% more)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  7.6.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #7206             |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by sgraf):

 Sure will do. Let's begin with the original problem. There's two functions
 in [[attachment:reprof.hs]]. One is `f`, having a loop of the `forM_
 [2..n] body` form, the other is `g` which has the `forM_ [2,3..n] body`
 form. The former doesn't allocate (56kB in total), whereas the latter
 allocates quite a lot (960MB).

 Why is that? The arithmetic sequences desugar, rewrite and specialise to
 `build (\c t -> eftIntFB c z 2 n)` and `build (\c z -> efdtIntFB c z 2 3
 n)`, respectively. That cancels away with `forM_`'s implementation in
 terms of `foldr`:

 {{{
   forM_= flip mapM_
   mapM_ body
 = {- definition -}
   foldr ((>>) . body) (return ())
 = {- eta expand the section -}
   foldr (\x k -> body x >> k) (return ())
 = {- (>>) of ST, written as `s -> (a, s)` for lighter syntax -}
   foldr (\x k s -> case (body x s) of (_, s') -> k s') (return ())
 }}}

 Note how `k` occurs in tail position within the lambda. Now, this cancels
 with the definition of `ef(d)tIntFB`:

 {{{
   foldr (\x k s -> case (body x s) of (_, s') -> k s') (return ()) . build
 (\c z -> efdtIntFB c z 2 3 n)
 = {- foldr/build -}
   efdtIntFB (\x k s -> case (body x s) of (_, s') -> k s') (return ()) 2 3
 n
 }}}

 So, that lambda is what is bound to `c` when `ef(d)tIntFB` gets inlined.

 Now, the implementation of `eftIntFB` only has a single occurrence of `c`,
 so it will inline properly and bind its `k` parameter (the continuation in
 tail position) to the recursive `go` call here
 [[https://github.com/ghc/ghc/blob/fa3143c76ac77ee96fd89559cacc089205abaa20/libraries/base/GHC/Enum.hs#L522]].
 The call to `k` turns into a tail call within `go`s body:

 {{{
 go x = I# x `c` if isTrue# (x ==# y)
        then n
        else go (x +# 1#)
 ==> c = \x k s -> case (body x s) of (_, s') -> k s'
 go x s =
   case (body (I# x) s) of
     (_, s') ->
       if isTrue# (x ==# y)
       then n s'
       else go (x +# 1#) s'
 }}}

 And `go` can be made a join point.

 The same isn't possible in the current `efdtIntFB`, because it duplicates
 `c` by branching on whether to count up or down (and also within the loop
 itself, anyway):

 {{{
 efdtIntFB :: (Int -> r -> r) -> r -> Int# -> Int# -> Int# -> r
 efdtIntFB c n x1 x2 y
  | isTrue# (x2 >=# x1) = efdtIntUpFB c n x1 x2 y
  | otherwise = efdtIntDnFB c n x1 x2 y
 }}}

 Now, when `efdtInt{Up,Dn}FB` gets inlined, we end up with a let-bound `c`
 that still tail-calls its argument `k`, and is even tail-called itself
 within `go_up`/`go_dn` (here
 https://github.com/ghc/ghc/blob/fa3143c76ac77ee96fd89559cacc089205abaa20/libraries/base/GHC/Enum.hs#L588):

 {{{
 let c x k s = case (body x s) of (_, s') -> k s'
 in ...
    let go_up x
          | isTrue# (x ># y') = I# x `c` n
          | otherwise = I# x `c` go_up (x +# delta)
 }}}

 Note that `go_up` tail calls `c` and passes itself as the `k` parameter.
 If `c` was inlined, all would be fine and `go_up` would turn into a join
 point. That's not the case because `c` is duplicated in `efdtIntFB` and
 then one more time in `efdtInt{Up,Dn}FB`. My first implementation in
 comment:52 (for which you provided a simplification in comment:54) dealt
 with the latter, while the idea in comment:60 is supposed to deal with the
 former. Sadly, case-of-case seems to undo the painful de-duplication of
 the `c` parameter (that's what comment:71 is about).

 Why doesn't LLF help here? Well, lifting out `c` to top-level gets rid of
 allocations for `c`, but there's still at least the allocation for the
 thunk for `go_up (x+1)` (the `Int` box goes away because of strictness).
 Also, the call to `go_up` is still an unknown call, as opposed to the
 simple join call we would get by inlining `c`.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:74>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list