Loop unrolling + fusion ?

Wed Mar 11 18:29:56 EDT 2009

Claus, Max

| > My preferred spec would be roughly
| >
| > {-# NOINLINE f #-}
| >    as now
| >
| > {-# INLINE f #-}
| >    works as now, which is for non-recursive f only (might in future
| >    be taken as go-ahead for analysis-based recursion unfolding)
| >
| > {-# INLINE f PEEL n #-}
| >    inline calls *into* recursive f (called loop peeling for loops)
| >
| > {-# INLINE f UNROLL m #-}
| >    inline recursive calls to f *inside* f (called loop unrolling for loops)
| >
| > {-# INLINE f PEEL n UNROLL m #-}
| >    combine the previous two
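
To make the one-liners above concrete, here is a by-hand sketch of what PEEL 1 and UNROLL 1 might produce for a small recursive function. The function names (sumList, sumListUnrolled, caller, callerPeeled) are hypothetical and chosen only for illustration, and the reading of PEEL and UNROLL is an assumption based on the quoted spec, not a description of anything GHC implements.

```haskell
-- Original recursive definition (hypothetical example function).
sumList :: [Int] -> Int
sumList []     = 0
sumList (x:xs) = x + sumList xs

-- Assumed effect of UNROLL 1: the recursive call *inside* the body is
-- replaced by one more copy of the body, so each real recursive call
-- now consumes up to two list elements.
sumListUnrolled :: [Int] -> Int
sumListUnrolled []     = 0
sumListUnrolled (x:xs) =
  x + case xs of
        []     -> 0
        (y:ys) -> y + sumListUnrolled ys

-- A call site outside the recursive function.
caller :: [Int] -> Int
caller zs = 1 + sumList zs

-- Assumed effect of PEEL 1: the call *into* sumList is unfolded once at
-- the call site; later iterations still go through the original sumList.
callerPeeled :: [Int] -> Int
callerPeeled zs =
  1 + case zs of
        []     -> 0
        (x:xs) -> x + sumList xs
```

Both transformed versions should return the same results as the originals; the hoped-for benefit is that the exposed case expressions give the simplifier (and fusion rules) more to work with.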

Sounds as if you two are evolving a good design, thank you.  I am not following the details closely, but I have the advantage of being able to chat to Max directly.

Suggestion: if after discussion you think this is a valuable thing to do, write a GHC-Trac-Wiki page describing the design as precisely as possible (e.g. with examples; I find the above one-liners hard to grok), along with any major design alternatives.  Ideally include a few indicative measurements, obtained by by-hand transformations, that show there are real benefits to be had.

For implementation, there are two routes: either totally built-in, or using a Core-to-Core plug-in.  The thing I like about the latter is that it can be done without having GHC HQ in the critical path, because we (I) tend to slow things down, being a uniprocessor.  We don't have the plug-in capability yet, but I'm encouraging Max to polish it up so that we do.  I think it'd be a very valuable facility.

Simon