[GHC] #8763: forM_ [1..N] does not get fused (10 times slower than go function)

Thu Apr 12 09:28:27 UTC 2018

#8763: forM_ [1..N] does not get fused (10 times slower than go function)
-------------------------------------+-------------------------------------
        Reporter:  nh2               |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  7.6.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #7206             |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by sgraf):

 Here's an implementation of `efdtIntFB` that fits our requirements:

 {{{
 data Direction = Up | Dn deriving Eq

 direction :: Int# -> Int# -> Direction
 direction from to
   | isTrue# (to >=# from) = Up
   | otherwise = Dn

 efdtIntFB :: (Int -> r -> r) -> r -> Int# -> Int# -> Int# -> r
 efdtIntFB c n x1 x2 y = emit first x1
   where
     -- We can safely emit the first element if an iteration
     -- 'moves closer' to @y at . That's exactly the case when
     -- @dir_x2@ coincides with @dir_y at .
     !first  = dir_x2 == dir_y
     !dir_x2 = direction x1 x2
     !dir_y  = direction x1 y
     -- We need the overflow flag in 'emit'.
     (# delta, delta_ovf #) = x2 `subIntC#` x1

     -- | Think of @emit :: Maybe Int -> [Int]@, only unboxed.
     -- If the argument is 'Nothing', we reached the end of the list.
     -- If the argument is 'Just', we emit an element, compute
     -- the next candidate, validate it and recurse.
     emit False _ = n
     emit True  x = I# x `c` emit next_ok next
       where
         -- Check that @next@ didn't move past @y at .
         -- Also, overflow is only allowed iff the computation for
         -- @delta@ overflowed.
         (# next, next_ovf #) = addIntC# x delta
         !next_ok =  isTrue# (next_ovf ==# delta_ovf)
                  && direction next y == dir_y
                  -- TODO: evaluate strict && for branchless code
 {-# INLINE[0] efdtIntFB #-}
 }}}

 Some pros:

 - I find this much easier to understand. No complicated invariants, etc.
 - No Up/Dn variants to maintain. Still, if the direction is statically
 known, constant folding and inlining will simplify stuff to the equivalent
 code.
 - As a result, no more duplication of `c` occurrences
 - Also no more duplication of `n` occurrences

 Cons:

 - `emit`s closure is 4 words big (2 words bigger than the closure of the
 original `go_up` helper) in the most general form. It's unfortunate that
 we can't pack together `dir_y` and `delta_ovf` into a single word without
 confusing constant folding. This would need either some kind of constant
 propagation through bit fields (out of scope for GHC, I think) or a
 smarter closure allocation theme that packs together non-pointer payloads.
 - We pay for the generalisation of Up/Dn variants by having to compare
 with `dir_y` all the time.
 - `base` lacks `addWordC#` primitives, which I'll probably add now

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8763#comment:60>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler