Recursive functions and constant parameter closures (inlining/strictness analyzer question)

Mon Jun 23 19:11:04 EDT 2008

2008/6/23 Dan Doel <dan.doel at gmail.com>:
> On Monday 23 June 2008, Isaac Dupree wrote:
>> there's no chance for the lower-level near code generation to
>> reverse-SAT to eliminate the heap usage?

It might be possible to do this by utilising the rules system. It's
something I had briefly considered but not really looked into. As
there seems to be a lot of interest in this topic I'll probably devote
some more time to the issue tomorrow and see if we can't get SAT
working really well.

>> (did someone say that -fvia-C with gcc -O3
>> managed to do that sometimes?)
>
> I said something similar, but I don't actually know the details. What I have
> is a SAT and non-SAT version of a benchmark
>
> (snip)
>
> In the SAT version, the native code generator is noticeably slower
> than -fvia-c -optc-O3, so the latter is doing something right. In the non-SAT
> version, -fvia-c is somewhat faster than the SAT version, but the NCG is just
> as fast. I don't know what's going on on the C end in the first case that
> makes it better, though (heap allocation is the same (low) in all cases;
> via-c doesn't fix large heap allocation when it happens in my experience).

Having a compact benchmark is great, I'll take a look at what's going
on tomorrow, though it will involve my plumbing the as-yet unknown
depths of the code generation system :-)

> As an aside, I finally got around to compiling 6.9 from a day or so ago, and
> noticed that one of the functions I'd mentioned earlier does get SATed to the
> bad version (which causes lots of heap allocation in the program) because it
> has two static arguments. However, marking it NOINLINE seemed to keep it from
> having the negative effects. Does that pragma keep SAT from doing its thing,
> and if so, will that continue to be viable in cases where we decide we know
> better than GHC?

I assume you mean "shift" as all the others have only one static
argument. Personally I find the behaviour of not SATing if something
is NOINLINE surprising and I suspect something else is at work. Is is
possible for you to send me a self-contained example of the
problematic function that demonstrates this off-list that I can play
with myself?

It's worth noting that if you think SAT is a bad idea you can always
resort to -fno-static-argument-transformation (IIRC) in an OPTIONS_GHC
pragma.

Incidentally I think the introduction of some sort of tail call
detection in the SAT would resolve the problems with all the examples
you've presented so far since they are all straight loops. Annoyingly
this is not a clear win because I think doing this in general will
impede loop invariant lifting e.g consider this tail recursive
function:

foo x y z = let invariant = x * y in foo x y (z + 1)

This is static in x and y so we might like to SAT it:

foo x y z = let foo' z = (let invariant = x * y in foo' (z + 1)) in foo' z

This lets us take out the invariant in a later pass (CSE):

foo x y z = let invariant = x * y in let foo' z = foo' (z + 1) in foo' z

I don't >think< GHC is currently smart enough to detect this without
SAT. A smart optimization might try and evaluate the relative cost of
the SAT (heap allocation) and not doing the invariant lifting
(redundant computation) and make some kind of decision about which is
the lesser evil. There may even be a third option which may avoid
closure creation:

foo x y z = let foo' x y z invariant = (foo' x y (z + 1) invariant) in
foo' x y z (x * y)

(I think this is right). It corresponds operationally to computing the
invariant once at the start of a loop and then carrying it on the
stack rather than via the closure as with the option above, avoiding
touching the heap. I'm uncertain about how different this really is
though.

Anyway, let's see what further investigations bring tomorrow,

Cheers,
Max