Loop unrolling + fusion ?

Sat Mar 7 04:28:51 EST 2009

2009/3/7 Claus Reinke <claus.reinke at talk21.com>:
> hmm, "appropriate" is one of those words that shouldn't occur in specs,
> not even rough ones, so let's flesh this out a bit, by abstract example.
>
> let f = ..f.. in f{n,m} -PEEL-> let f = ..f.. in ..f{n-1,m}..

Probably what you intend here is that you create one copy of the
definition every round rather than one per call site, is that right?
In the case of mutual recursion, I suppose something like this should
happen:

f = ... g ...
g = ... f ...

==>

f = ... g ...
g = ... f ...
f1 = ... g ...
g1 = ... f ...

i.e. after peeling f1 and g1 are free to be inlined into the use site
if GHC decides that is a good idea. Similarly for two rounds of
peeling you would get:

f = ... g ...
g = ... f ...
f1 = ... g ...
g1 = ... f ...
f2 = ... g1 ...
g2 = ... f1 ...

> let f = ..f{n,m}.. in .. -UNROLL-> let f = ..|..f{n,m-1}..|.. in ..

Similarly I suppose you intended that you get one copy of the body per
UNROLL, rather than per call-site? i.e:

f = ... f ...

==>

f = ... f1 ...
f1 = ... f2 ...
f2 = ... f ...

I'm not completely convinced that this doesn't make sense for mutual recursion:

f = ... g ...
g = ... f ...

==>

f = ... g ...
g = ... f1 ...
f1 = ... g1 ...
g1 = ... f ...

I'm not quite sure how to generalize that though :-)

>>   Non-supporting implementations should treat these as INLINE
>>   pragmas (same warning/ignore or automatic unfold behaviour).

Maybe they SHOULD do, but there are a lot of compilers out there in
the real world that won't :-). Making these entirely new pragmas feels
better to me.

I spoke to Simon PJ about these pragmas and he didn't sound terribly
enthusiatic - but he suggested they would be a nice use case for
compiler plugins :-). Plugins would only be capable of dealing with
UNROLL / PEEL as new pragmas. Of course, this kind of relies on us
getting plugins into the HEAD sometime...

>> - no functions inlined into f: should be subject to override by
>>   INLINE pragmas (even for the non-recursive case?)

If UNROLL / PEEL are seperate annotations we won't prevent inlining
into the UNROLLed/PEELed thing. But that might be bad! What if we
have:

x = BIG

{-# UNROLL f 3 #-}
f = ... x ... f ...

Now if we unconditionally inline x into f as the only use site we will
end up bloating up the code if we later run the unroller. However, if
we unroll first then the simplifier won't inline x and things will be
good. So perhaps you are right to say that this should be an extension
of INLINE.

As for the more general question about whether you should inline stuff
inside INLINEs at all - well, AFAIK the latest work on this by Simon
means that stuff /will/ be inlined inside them, but if that body is
subsequently inlined what gets inlined is the /original/ body as the
user wrote it in his source code. This improves performance when for
some reason a value doesn't get inlined.

>> - no float-in/float-out/cse: ??

The restriction on CSE is principally for NOINLINE things, to prevent
messing with RULEs by changing identifiers around. I'm not sure if
that is relevant here.

No float-in makes sure we don't increase the size of things we are
going to INLINE. This is important with UNROLL / PEEL for the same
reason as above - another argument for this being an extension of
INLINE so we inherit its semantics.

I don't actually know why the no-float-out restriction exists (after
all, it only makes the body smaller!) so I'm not sure what the right
thing to do would be there.

>> - no worker/wrapper transform in strictness analyser: we do get the   same
>> effect from INLINE PEEL, so this should be okay, right?

Maybe I don't understand what you mean, but I don't think this is
true. For example, w/w can unpack a strict argument of product type,
but I dont' think PEEL will let you achieve that.

This restriction exists to prevent losing INLINE pragmas:

"""
Note [Don't w/w inline things]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It's very important to refrain from w/w-ing an INLINE function
If we do so by mistake we transform
	f = __inline (\x -> E)
into
	f = __inline (\x -> case x of (a,b) -> fw E)
	fw = \ab -> (__inline (\x -> E)) (a,b)
and the original __inline now vanishes, so E is no longer
inside its __inline wrapper.  Death!  Disaster!
"""

So we might want to prevent w/wing UNROLL/PEEL for the same reasons..
but if we do the UNROLL/PEEL "pass" early enough (i.e. before
strictness - which is quite late in the pipeline) then this issue will
go away.

>> - loop breakers: PEEL/UNROLL have their own limits, creating
>>   an intrinsic loop breaker when the counters run out

> This might be easier to
> handle in your "unfolding as a separate core2core pass" scenario, where the
> pass might keep track of unfoldings already done (instead of trying to
> encode that information locally, in annotations).

I think that makes most sense. If we run it early enough we would be
reasonably sure our program was close to what the user intended and
hence could sidestep some of the restrictions on INLINE pragmas that
are discussed above by simply not applying them to UNROLL/PEEL, though
this doesn't feel like a terribly satisfactory solution.

Cheers,
Max