Loop unrolling + fusion ?

Wed Mar 18 20:18:29 EDT 2009

Recursion unfolding spec, 2nd attempt. 

The main difference is to look at groups of mutually recursive
definitions as a whole, rather than trying to think about individual
definitions. That step actually seems sufficient to address most of
the shortcomings raised so far, such as avoiding runaway INLINE
or using PEEL/UNROLL also for mutually recursive definitions. 
I've also interpreted Max's comments as most of the existing 
INLINE restriction still making sense for recursive INLINE,
with small clarifications.

In the following, let REC({f g ..}) denote the set of all identifiers
belonging to the recursion involving f, g, .. (f, g, .. in REC({f g ..})
or in {-# INLINE f g .. #-} are required to belong to the same 
recursion).

{-# NOINLINE f #-}
   as now: no unfolding of f

{-# INLINE f #-} 
   as now: for non-recursive f only, unfold definition of f at call
   sites of f (might in future be taken as go-ahead for analysis-based
   recursion unfolding)

{-# INLINE f g .. PEEL n #-}
   new: unfold definitions of the named identifiers at their call
   sites *outside* their recursion group REC({f g ..}). In other
   words, *entries into* REC({f g ..}) via f, g, .. are unfolded.

   (for the special case of loops this corresponds to loop peeling)

{-# INLINE f g .. UNROLL m #-}
   new: unfold definitions of the named identifiers at their call
   sites *inside* their recursion group REC({f g ..}). In other
   words, *cross-references inside* REC({f g ..}) via f, g, .. are
   unfolded.

   (for the special case of loops this corresponds to loop unrolling)

{-# INLINE f g .. PEEL n UNROLL m #-}
   combine the previous two

   The numeric parameters are to be interpreted as if each call to
   f, g, .. was annotated with both PEEL and UNROLL limits for the 
   whole recursion group REC({f g ..}), starting with the limits from
   the pragmas (write f_n_m for a call to f with PEEL limit n and
   UNROLL limit m), to be decreased for every PEEL or UNROLL 
   action, as follows (REC({f g}) = {f g h}, in these examples):

1. let {-# INLINE f g PEEL n UNROLL m #-}
       f .. = .. f_?_? .. g_?_? .. h_0_0 ..
       g .. = .. f_?_? .. g_?_? .. h_0_0 .. 
       h .. = .. f_?_? .. g_?_? .. h_0_0 .. 
   in ..|f_n_m|..

   --PEEL-->

   let {-# INLINE f g PEEL n UNROLL m #-}
       f .. = .. f_?_? .. g_?_? .. h_0_0 ..
       g .. = .. f_?_? .. g_?_? .. h_0_0 .. 
       h .. = .. f_?_? .. g_?_? .. h_0_0 .. 
   in ..|.. f_(n-1)_0 .. g_(n-1)_0 .. h_0_0 ..|..

   Notes: - unfolding produces copies of definition bodies
          - the PEEL limit at the call site decides the PEEL
            limit for all calls to REC({f g}) in the inlined
            copy; this limit decreases with each PEEL step
         - since peeling unfolds code into call sites from outside
            the recursion, the UNROLL limits of calls to REC({f g})
            are effectively 0 in the inlined copy
          - only calls to identifiers named in the INLINE pragma
            can be peeled (f and g here), calls to other members of
            the same recursion remain unaffected (h here), having
            effective limits of 0

2. let {-# INLINE f g PEEL n UNROLL m #-}
       f .. = .. f_0_m .. g_?_? .. h_0_0 ..
       g .. = .. f_?_? .. g_?_? .. h_0_0 .. 
       h .. = .. f_?_? .. g_?_? .. h_0_0 .. 
   in ..

   --UNROLL-->

   let {-# INLINE f g PEEL n UNROLL m #-}
       f .. = .. .. f_0_(m-1) .. g_0_(m-1) .. h_0_0 .. .. g_?_? .. h_0_0 ..
       g .. = .. f_?_? .. g_?_? .. h_0_0 .. 
       h .. = .. f_?_? .. g_?_? .. h_0_0 .. 
   in ..

   Notes: - unfolding produces copies of definition bodies
          - the UNROLL limit at the call site decides the UNROLL
            limit for all calls to REC({f g}) in the inlined copy; this 
            limit decreases with each UNROLL step
          - peeling conceptually precedes unrolling (PEEL limit needs 
            to reach 0 before unrolling commences), to avoid peeling 
            unrolled definitions (this corresponds to an existing restriction
            of no inlining into definitions to be inlined; 
          - unrolling unfolds copies of the original definitions, not the
             already unrolled ones, again corresponding to the existing
             inlining restriction (TODO: how to specify this avoidance 
             of unrolling unrolled defs in this form of local rule spec?)
          - only calls to identifiers named in the INLINE pragma
            can be unrolled (f and g here), calls to other members of
            the same recursion remain unaffected (h here), having
            effective limits of 0

   Peeling and unrolling stop when the respective count annotation has
   reached 0. Peeling precedes unrolling, to avoid ambiguities in the
   size of the peeled definitions. Note that mutual recursion is the
   domain of PEEL, while UNROLL only applies to (mutual) recursion.

   {-# INLINE f PEEL n #-}, for n>0, corresponds to worker/
   wrapper transforms (previously done manually) + inline wrapper,
   and should therefore also be taken as a hint for the compiler to 
   try the static argument transformation for f (the "worker").

   Non-supporting implementations should treat these as INLINE
   pragmas (same warning/ignore or automatic unfold behaviour).
   This might be easier to accomplish if INLINE PEEL/UNROLL
   were implemented as separate pragmas, even though they are
   refinements of INLINE conceptually.

   About the current side-conditions for INLINE pragmas:

    - no functions inlined into f: 

      still makes sense for PEEL, needs to be adapted with an exception
      for UNROLL, in that we want to be able to unroll into the function 
      being unrolled, but we want to use the original body for the unrolling, 
      not an already unrolled one (else unrolling would be exponential rather
      than linear); this appears to be in line with existing work on INLINE

    - no float-in/float-out/cse: 

        similar to existing INLINE

    - no worker/wrapper transform in strictness analyser: 

        similar to existing INLINE

    - loop breakers: 

       PEEL/UNROLL have their own limits, applicable to the whole
       recursion group, creating intrinsic loop breakers when the counters 
       run out. Every PEEL or UNROLL action creates calls with smaller
       counters in the inlined copies, if the calls go into the same recursion.

If this is an improvement on the first version, and after correcting any
obvious issues, I should put it on the ghc trac wiki somewhere, and 
create a feature request ticket.

Claus