# Loop unrolling + fusion ?

Claus Reinke claus.reinke at talk21.com
Fri Mar 6 19:07:34 EST 2009

```> My preferred spec would be roughly
>
> {-# NOINLINE f #-}
>    as now
>
> {-# INLINE f #-}
>    works as now, which is for non-recursive f only (might in future
>    be taken as go-ahead for analysis-based recursion unfolding)
>
> {-# INLINE f PEEL n #-}
>    inline calls *into* recursive f (called loop peeling for loops)
>
> {-# INLINE f UNROLL m #-}
>    inline recursive calls to f *inside* f (called loop unrolling for loops)
>
> {-# INLINE f PEEL n UNROLL m #-}
>    combine the previous two
>
>    The numeric parameters are to be interpreted as if each call to
>    f was annotated with both PEEL and UNROLL limits, to be
>    decreased as appropriate for every PEEL or UNROLL action.

hmm, "appropriate" is one of those words that shouldn't occur in specs,
not even rough ones, so let's flesh this out a bit, by abstract example.

let f = ..f.. in f{n,m} -PEEL-> let f = ..f.. in ..f{n-1,m}..

let f = ..f{n,m}.. in .. -UNROLL-> let f = ..|..f{n,m-1}..|.. in ..

In words: the call being peeled/unrolled disappears, being replaced by
a copy of the definition, in which the decremented counts are applied to
the calls of the same function created by unfolding. Is that specific enough?

>    Peeling and unrolling stop when the respective count annotation
>    has reached 0. Note that mutual recursion is the domain of PEEL,
>    while UNROLL only applies to direct recursion.
>
>    {-# INLINE f PEEL n #-}, for n>0, corresponds to worker/
>    wrapper transforms (previously done manually) + inline wrapper,
>    and should therefore also be taken as a hint for the compiler to
>    try the static argument transformation for f (the "worker").
>
>    Non-supporting implementations should treat these as INLINE
>    pragmas (same warning/ignore or automatic unfold behaviour).
>
> Since we are talking about a refinement of the INLINE pragma, we
> also need to look at that pragma's existing subtleties:-(
>
> - no functions inlined into f: should be subject to override by
>    INLINE pragmas (even for the non-recursive case?)
> - no float-in/float-out/cse: ??
> - no worker/wrapper transform in strictness analyser: we do get the
>    same effect from INLINE PEEL, so this should be okay, right?
> - loop breakers: PEEL/UNROLL have their own limits, creating
>    an intrinsic loop breaker when the counters run out

Loop breakers are still needed, in spite of the explicit limits. Consider

let {odd x = ..even{1,0}..; even x = ..odd{1,0}..} in odd{1,0} n

Peeling odd gives a call to even, peeling of which gives a fresh, not
decremented, call to odd! Unless one makes a copy of the whole
mutual recursion, with the odd calls adjusted. This might be easier
to handle in your "unfolding as a separate core2core pass" scenario,