Why do we have stack overflows?

Tue May 8 03:43:29 EDT 2007

Stefan O'Rear wrote:
> On Thu, May 03, 2007 at 05:36:45PM -0700, Brandon Michael Moore wrote:
>> On Thu, May 03, 2007 at 04:59:58PM -0700, John Meacham wrote:
>>> I believe it is because a stack cannot be garbage collected, and must be
>>> traversed as roots for every garbage collection. I don't think there are
>>> any issues with a huge stack per se, but it does not play nice with
>>> garbage collection so may hurt your performance and memory usage in
>>> unforeseen ways.
>> Isn't it just the top of the stack that has to be treadted as a root?
>> (maybe you need to walk the stack to find exception handlers and so on.)
>> Maybe it shouldn't be so much worse than a heap. The Chicken Scheme
>> system allocates everything on the C stack, and runs some sort of
>> compacting collector when it is about to fill.
> 
> GHC uses a simple exponential-backoff algorithm for handling stacks.
> When the stack pointer passes the stack limit, the thread yields to
> the scheduler, where the stack size is doubled, and the old stack is
> moved.  Perhaps instead we could modify the algorithm such that up to
> 16K stack size the behaivor is the same, but use linked lists for
> larger? 
> 
> 1. Allocate a new chunk, of size 16KB.
> 
> 2. Copy the topmost 1KB of stack to the new block.  This decreases
>    storage efficiency slightly (but not much in time - memcpy is
>    several times faster than anything the current ghc code generator
>    can generate), but it avoids a nasty corner case (stack size
>    fluctuating across 0 mod 16K) by acting as a form of hysteresis.
> 
> 3. Create a special frame at the bottom of the new stack chunk that
>    returns into a stack underflow handler, thus avoiding the need for
>    yet another conditional. 

GHC in the distant past (pre-4.0) had linked stack chunks, but we discarded the 
idea when the RTS was rewritten, mainly for simplicity.  It was before my time, 
so I don't know all the war stories, but I think it turned out to be hairy 
enough to avoid it in the redesign.  At least it would require separate TSO and 
STACK objects, the underflow frame, code to avoid performance degradation at the 
boundary (e.g. as you say, copy the top 1k of stack to the new chunk), stack 
walking gets more complicated (several places in the RTS do this).

My impression is that the performance of stack growth isn't usually a limiting 
factor, so I'm not sure it's worth adding this complexity.  Benchmarks can be 
convincing, though...

Cheers,
	Simon