penalty for proper tail recursion
Jan-Willem Maessen
jmaessen@alum.mit.edu
Mon, 5 Nov 2001 11:43:34 -0500
Richard Uhtenwoldt <greon@best.com> asks:
> What is the penalty that compilers pay these days for allowing
> "proper" tail-recursion (that does not grow the stack)?
These days it's often little or none. Here's what I know from working
very intensively on compiling (Eager) Haskell to GCC.
1) GCC has always done a fairly good job with self-tail-recursion.
Oddly, I don't exploit this as I use a few tricks to do a tiny bit
better.
2) GCC's latest version does a pretty good job with sibling call
optimization (==tail-recursive calls to other C procedures). I've
worked exclusively with gcc 3.0 prereleases for the past 9 months
and it has simplified implementation tremendously.
3) Sibling call optimization is not infallible (yet?), a fact I
discovered mainly by poring over the assembly code for portions of
my run-time system. It seems to be good enough in practice, but
there's a fallback mechanism in my implementation just in case. I
use the same mechanism to handle various other more common
conditions, though, so I didn't consider this a great burden in
practice. It doesn't appear to actually kick in due to missed
sibling call opportunities.
4) Writing your own GC means maintaining a "shadow stack": An explicit
data structure which allows the pointers on the stack to be found.
There are a number of ways of doing this, but all of them pretty
much require that the *Haskell* compiler do the work of saving and
restoring pointer variables around calls. This is a drawback of
implementing precise garbage collection on top of C. Haskell
compilers do enough allocation that conservative garbage collection
isn't very attractive.
I mention shadow stacks because they are the other (I would say
dominant) source of overhead when compiling to C. It subverts
register and frame slot allocation, two things the C compiler
should do well.
5) In GCC, shadow stacks can be efficient to pull off---compile without a
frame pointer, and declare a register global to hold the shadow
stack pointer. Net change in number of available registers: none.
But the loss in register/frame allocation is still annoying.
I've stuck with an implementation that could be turned into
bone-standard C99 code with a few days' work. The performance
advantages of gcc (register globals, sibling call optimization, and
per-procedure control over x86 calling conventions being the big ones)
are such that I have no plans to put in the effort. I would get a
working, but positively lousy, implementation.
My understanding is that ghc -fvia-C also depends on GCC for properly
tail-recursive output (otherwise they can use a trampoline as
described in your message). They post-process the assembly code
output in order to replace tail-recursive calls with jumps. This
approach is less portable but reliably eliminates tail calls.
The code is in GHC and wasn't hard to read when last I looked. Note
that they *also* post-process the assembly code output to move info
tables into the code segment. That code *is* a bit complicated, and
messages over the years suggest that people looking for one trick
often find themselves lost in the code for the other, as one program
does them both.
-Jan-Willem Maessen
Eager Haskell project
jmaessen@mit.edu