Loop unrolling + fusion ?

Mon Mar 9 18:44:27 EDT 2009

>>>> let f = ..f.. in f{n,m} -PEEL-> let f = ..f.. in ..f{n-1,m}..
>>
>>> Probably what you intend here is that you create one copy of the
>>> definition every round rather than one per call site, is that right?
>>
>> I don't think so - ultimately, the point of both peeling and unrolling is to
>> unfold a definition into a use site, to enable further optimizations, not
>> just to move from a recursive to a non-recursive definition. We could try to
>> do it in two steps, as you suggest, but that would expose us to the
>> heuristics of GHC inlining again (will or won't it inline the
>> new shared definition?), in the middle of a user-annotation-based unfolding.
> 
> Ah - I was thinking of something a bit different, where:
> 
> * PEEL / UNROLL pragmas duplicate the method body once per level of
> peeling / unrolling and fix up the recursive calls as appropriate
> * The user optionally adds an INLINE pragma to the function if he
> additionally wants to be SURE that those duplicates get inlined at the
> use sites

Ok, I suspected as much. You'd need to make the 'INLINE f' apply
to the generated 'fN', of course.

> This means that PEEL / UNROLL represent nice logically-orthogonal bits
> of functionality to INLINE-ing.

Usually, I'm all for orthogonality, and for more knobs to allow hand-tuning
of things that have no automatically reachable optimal solutions. In this case, 
however, I'm not sure anything would be gained. I recall that your hand-
unrolled code was written in a similar style, and assumed that it was a 
question of style, which GHC would inline into the same code.

But if you annotate all your unrolled and peeled new definitions as 
NOINLINE, do you still get the optimizations you want? There are 
probably a few GHC optimizations that can "look through" non-recursive
lets, but RULES are not among those.

For loop-style recursion, there'd be only one use per definition, so inlining
would be the default and there'd be no difference, but for non-loop-style
recursion, inlining might not happen, and so no further optimizations would
be enabled. Off the top of my head, I can't think of a case where that
would lead to improved code, but as I'm discovering, I'm not very familiar
with the details of what optimizations GHC is actually doing (though this
is quite helpful: 
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/HscMain )
so I might be missing something?

> Furthermore, I'm not too keen on duplicating method bodies at call
> sites willy-nilly because it may lead to increased allocations (of the
> function closures) in inner loops. At least if you bind the duplicated
> methods at the same level as the thing you are duplicating you only
> increase the dynamic number of closures created by a constant factor!

Yes, every form of INLINE has its limits. But if users say they want 
inlining (or peeling or unrolling or any other form of unfolding), that's 
what they should get, including those worrysome duplications. The 
idea is to create lots of added code (in order to remove abstractions 
that might hide optimization opportunities), which will then be simplified 
to something smaller (or at least better performing) than what we 
started out with. Providing the means to fine tune the amount of 
duplications might be useful, but preventing them entirely is not an option.

Claus