[GHC] #14068: Loopification using join points

Fri Mar 23 21:40:24 UTC 2018

#14068: Loopification using join points
-------------------------------------+-------------------------------------
        Reporter:  nomeata           |                Owner:  nomeata
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:  JoinPoints
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #13966 #14067     |  Differential Rev(s):  Phab:D3811
  #14827                             |
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by kavon):

 Replying to [comment:52 nomeata]:
 > > Question: how beneficial is it to loopify a top-level function?
 >
 > Well, we still want to use jumps for the recursive calls, rather than
 > normal function calls, even if it is top-level, right?

 In the end, they'll still be emitted as jumps since they're tail calls.
 Considering the transformation in isolation (i.e., ignoring knock-on
 effects like inlining), using join-point throws instead of tail-recursive
 calls theoretically allows us to cheapen the iteration overhead in the
 following ways:

 1. For joinrecs whose RHS contains a non-tail call, we can avoid a stack
 check and stack pointer bumps on each iteration, since the join
 continuation can keep reusing the stack frame set up on the initial entry
 to the function. This depends on whether StackLayout in Cmm is optimized
 to do this.

 2. Optimizing argument-passing, such as by moving static arguments out of
 the recursive throws, spilling rarely used arguments in high-pressure
 loops, or allowing code generation to pick registers with a smaller
 instruction encoding (which LLVM loves to do for x86_64).

 3. Aligning a hot loop's header. Many x86_64 CPUs prefer 16-byte aligned
 jump targets, but because we add info tables just before a function label,
 a function's body may be only 8-byte aligned. Code generators can more
 easily align the target of a join-point throw, since it is less likely to
 have an info table attached to it.
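
 To make the shape of the transformation concrete, here is a minimal
 source-level sketch of what loopifying a top-level function amounts to.
 The names (sumStepsRec, sumStepsLoop, go) are made up for illustration
 and do not come from the patch. Whether GHC actually turns the local
 `go` into a joinrec, and whether any of the three savings above
 materialize, depends on the GHC version, the flags, and the backend.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Before: a top-level function whose only recursion is a self tail call.
-- The argument `step` never changes, yet it is passed on every iteration.
sumStepsRec :: Int -> Int -> Int -> Int
sumStepsRec step !acc n
  | n <= 0    = acc
  | otherwise = sumStepsRec step (acc + step * n) (n - 1)

-- After (loopified by hand): the wrapper is entered once, and the local
-- `go` is only ever called saturated and in tail position, which is the
-- shape a join point requires. `step` is now a free variable of the loop
-- rather than a loop argument (item 2 above).
-- Assumption: this is only a source-level picture of the idea, not the
-- Core that the compiler would produce.
sumStepsLoop :: Int -> Int -> Int -> Int
sumStepsLoop step acc0 n0 = go acc0 n0
  where
    go !acc n
      | n <= 0    = acc
      | otherwise = go (acc + step * n) (n - 1)
```

 Both functions compute the same result; the difference is only in which
 label the back edge targets: the function's own entry point (which
 carries an info table) in the first case, and a local label in the
 second.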

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14068#comment:53>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler