penalty for proper tail recursion
Mon, 5 Nov 2001 11:43:34 -0500
Richard Uhtenwoldt <email@example.com> asks:
> What is the penalty that compilers pay these days for allowing
> "proper" tail-recursion (that does not grow the stack)?
These days it's often little or none. Here's what I know from working
very intensively on compiling (Eager) Haskell to GCC.
1) GCC has always done a fairly good job with self-tail-recursion.
Oddly, I don't exploit this as I use a few tricks to do a tiny bit
2) GCC's latest version does a pretty good job with sibling call
optimization (==tail-recursive calls to other C procedures). I've
worked exclusively with gcc 3.0 prereleases for the past 9 months
and it has simplified implementation tremendously.
3) Sibling call optimization is not infallible (yet?), a fact I
discovered mainly by poring over the assembly code for portions of
my run-time system. It seems to be good enough in practice, but
there's a fallback mechanism in my implementation just in case. I
use the same mechanism to handle various other more common
conditions, though, so I didn't consider this a great burden in
practice. It doesn't appear to actually kick in due to missed
sibling call opportunities.
4) Writing your own GC means maintaining a "shadow stack": An explicit
data structure which allows the pointers on the stack to be found.
There are a number of ways of doing this, but all of them pretty
much require that the *Haskell* compiler do the work of saving and
restoring pointer variables around calls. This is a drawback of
implementing precise garbage collection on top of C. Haskell
compilers do enough allocation that conservative garbage collection
isn't very attractive.
I mention shadow stacks because they are the other (I would say
dominant) source of overhead when compiling to C. It subverts
register and frame slot allocation, two things the C compiler
should do well.
5) In GCC, shadow stacks can be efficient to pull off---compile without a
frame pointer, and declare a register global to hold the shadow
stack pointer. Net change in number of available registers: none.
But the loss in register/frame allocation is still annoying.
I've stuck with an implementation that could be turned into
bone-standard C99 code with a few days' work. The performance
advantages of gcc (register globals, sibling call optimization, and
per-procedure control over x86 calling conventions being the big ones)
are such that I have no plans to put in the effort. I would get a
working, but positively lousy, implementation.
My understanding is that ghc -fvia-C also depends on GCC for properly
tail-recursive output (otherwise they can use a trampoline as
described in your message). They post-process the assembly code
output in order to replace tail-recursive calls with jumps. This
approach is less portable but reliably eliminates tail calls.
The code is in GHC and wasn't hard to read when last I looked. Note
that they *also* post-process the assembly code output to move info
tables into the code segment. That code *is* a bit complicated, and
messages over the years suggest that people looking for one trick
often find themselves lost in the code for the other, as one program
does them both.
Eager Haskell project