Recursive functions and constant parameter closures (inlining/strictness analyzer question)

Wed Jun 25 04:48:36 EDT 2008

2008/6/23 Dan Doel <dan.doel at gmail.com>:
> On Monday 23 June 2008, Isaac Dupree wrote:
>> there's no chance for the lower-level near code generation to
>> reverse-SAT to eliminate the heap usage? (which would obviously be a
>> different optimization that might be useful in other ways too, if it
>> could be made to work) (did someone say that -fvia-C with gcc -O3
>> managed to do that sometimes?)
>
> I said something similar, but I don't actually know the details. What I have
> is a SAT and non-SAT version of a benchmark:
>
> In the SAT version, the native code generator is noticeably slower
> than -fvia-c -optc-O3, so the latter is doing something right. In the non-SAT
> version, -fvia-c is somewhat faster than the SAT version, but the NCG is just
> as fast. I don't know what's going on on the C end in the first case that
> makes it better, though (heap allocation is the same (low) in all cases;
> via-c doesn't fix large heap allocation when it happens in my experience).

I've tested this on my own machine (i386-apple-darwin) and I can't
replicate it. For me (with 6.8.2), the SATed version is faster by
almost 40%, and there is no essentially no difference between the NCG
and GCC (GCC is maybe a fraction faster).

In any event GHC will not automatically SAT this function because it
has only one static argument. Indeed, I'm not quite sure why it should
be faster with SAT at all, and my assembly-fu is still too weak to be
able to work it out from the code generator output.

> As an aside, I finally got around to compiling 6.9 from a day or so ago, and
> noticed that one of the functions I'd mentioned earlier does get SATed to the
> bad version (which causes lots of heap allocation in the program) because it
> has two static arguments.

I devoted yesterday entirely to SAT and came up with some refinements
to my earlier heuristics which shave a couple of percent of
allocations and runtime off of nofib compared with the existing ones,
and also resolve this problem. I found that my tail call detection
from the day before was a bit broken. Now that it is fixed I use it to
avoid SATing functions which make tail calls, but with the important
wrinkle that if the function has static function parameters we should
SAT it anyway because this lets us do much more inlining and is a
bigger win than avoiding allocation. This criterion avoids the heap
allocation you are experiencing in your particular example.

I've also added rule generation to rewrite instances of functions that
are SATed purely because they have static function parameters back
into the non-SATed versions if they do not experience inlinig. This
prevents the worst cases that I got before when always SATing
functions with static function parameters.

What's more, I've found that most if not all SAT opportunities are
found early in the pipeline by these new criteria, so we can get away
with just one SAT pass instead of the two I reported earlier.

All in all, these developments are pretty promising and I suspect I'll
be able to integrate them into HEAD in time for 6.10, which will
hopefully alleviate some of your worries about the efficacy of SAT.

Cheers,
Max