Loop unrolling + fusion ?

Fri Mar 27 17:32:13 EDT 2009

> I can't see any issues with this version of the spec.

Thanks. From the silence, we seemed to have lost the innocent 
bystanders? Anyway, for those who haven't noticed, there is now 
a feature request ticket (for that good feeling of closing it when this
is finally implemented;-) as well as a wiki page describing the issues, 
spec, and examples:

http://hackage.haskell.org/trac/ghc/ticket/3123
http://hackage.haskell.org/trac/ghc/wiki/Inlining

> I think in the implementation it makes most sense to do this as a
> core2core pass at an early stage in the pipeline, probably via plugins
> (so will have to wait until I get those into HEAD). 

What are the plans for plugin support? I do think plugins will be
useful, but inlining is pretty central to the existing optimizer transformations,
isn't it? Would the transformation code differ much between in-GHC
and via-plugins? Perhaps the transformation pass could be implemented
now, and later moved out into a plugin, possibly along with other passes.

I have also been wondering about the relation between rewrite RULES
and plugins. Assuming we can find a more convenient syntax, aren't
plugin+syb-based rewrites going to be more expressive, with more
control than RULES? Or is the syntactic/compiletime overhead going 
to remain so high that both RULES and plugins will be kept in GHC?

(cf the recent thread on "optimization and rewrite rules questions"
http://www.haskell.org/pipermail/glasgow-haskell-users/2009-February/016702.html )

>In the case of
> PEEL, we don't want to identify all call sites directly and do the
> substitution in the pass so we should just output some bindings which
> will certainly be inlined into call sites later on. So, the
> transformation should be bottom up on the Core syntax tree and when it
> meets a recursive group of bindings we should do something like this:
> 
> {-# INLINE f g PEEL 3 UNROLL 2 #-}
> f = ... g ... f ... h ...
> g = ... g ... f ... h ...
> h = ... g ... f ... h ...
> 
> =(my pass)=>
> 
> -- Temporary copies of f and g - dead code
> f_old = ... g_old ... f_old ... h ...
> g_old = ... g_old ... f_old ... h ...
> -- H unchanged for now, might get PEELed stuff inlined later
> h = ... g .. f ... h ...

You mean UNROLLed stuff (PEEL is only for entries into the group).

> -- Top level unrolled definiiton - if we weren't doing peeling, these
> would be the new f and g
> f_unrolled = ... g_unrolled_1 ... f_unrolled_1 ... h ...
> g_unrolled = ... g_unrolled_1 ... f_unrolled_1 ... h ...
> 
> -- Unrolled iteration. Will get inlined into f_unrolled / g_unrolled soon
> {-# INLINE f_unrolled_1 g_unrolled_1 #-}
> f_unrolled_1 = ... g_unrolled ... f_unrolled ... h ...
> g_unrolled_1 = ... g_unrolled ... f_unrolled ... h ...

Ah, yes, we need to be unambiguous about the interpretation of the
counters:-) I was thinking of n+1 (adding n copies to the original), you 
are thinking of n (adding copies until there are n).

> -- One level of peeling
> {-# INLINE f_1 g_1 #-}
> f_1 = ... g_unrolled ... f_unrolled ... h ...
> g_1 = ... g_unrolled ... f_unrolled ... h ...
> 
> -- Second level of peeling
> {-# INLINE f_2 g_2 #-}
> f_2 = ... g_1 ... f_1 ... h ...
> g_2 = ... g_1 ... f_1 ... h ...
> 
> -- Final level of peeling and new definitions for f and g. Inline pragmas
> -- make sure all of this gets inlined at the call site
> {-# INLINE f g #-}
> f = ... g_2 ... f_2 ... h ...
> g = ... g_2 ... f_2 ... h ...

Wait, now you are counting to n+1 for PEEL and to n for UNROLL?

> =(after the simplifier has run - effectively - there are a few
> harmless lies here)=>
> 
> -- NB: I haven't shown inlining of the new f and g here, but it /will/ happen
> h = ... g .. f ... h ...

Since we are interpreting recursive groups as single entities, and there
is usually no inlining into definitions that will get inlined, we will have to
specify this carefully.

> -- I've inlined the inner unrolled iteration at every /call site/
> within the top level unrolled iteration, as per
> -- the pragmas. Noone actualy calls this unrolled thing directly
> though, since we used PEEL as well
> f_unrolled = ... (... g_unrolled ... f_unrolled ... h ...) ... (...
> g_unrolled ... f_unrolled ... h ...) ... h ...
> g_unrolled = ... (... g_unrolled ... f_unrolled ... h ...) ... (...
> g_unrolled ... f_unrolled ... h ...) ... h ...

Again, we have to make sure of this interpretation.

> -- This huge chunk of code gets inlined at every call site, which in
> turn call through to the unrolled bodies
> {-# INLINE f g #-}
> f = ... (... (... g_unrolled ... f_unrolled ... h ...) ... (...
> g_unrolled ... f_unrolled ... h ...) ... h ...) ... (... (...
> g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ...
> f_unrolled ... h ...) ... h ...) ... h ...
> g = ... (... (... g_unrolled ... f_unrolled ... h ...) ... (...
> g_unrolled ... f_unrolled ... h ...) ... h ...) ... (... (...
> g_unrolled ... f_unrolled ... h ...) ... (... g_unrolled ...
> f_unrolled ... h ...) ... h ...) ... h ...

So this would be the result of inlining all the PEEL instances into 'f' and 'g'. 

> By ensuring that f and g are tagged INLINE we get the existing INLINE
> restrictions automatically in later Core passes.

So the INLINE gets added after your pass is through, so that it isn't
affected, but later passes are. But what if there are multiple such PEEL/
UNROLL definitions handled by the one pass? Since the pass doesn't
do general INLINE, that is out of the way, but wouldn't it still PEEL
stuff from one group into the definitions from another group, even if
those definitions themselves are about to be PEELed/INLINEd? And
do we want that or not?

> I think that this example transformation matches your spec - am I right?

Looks mostly right, apart from the ambiguities I mentioned. Could 
you please add your implementation sketch to the wiki page?

Claus