[GHC] #8189: Default to infinite stack size?

Wed Aug 28 00:52:38 UTC 2013

#8189: Default to infinite stack size?
------------------------------------+-------------------------------------
       Reporter:  nh2               |             Owner:
           Type:  bug               |            Status:  new
       Priority:  normal            |         Milestone:
      Component:  Compiler          |           Version:  7.6.3
       Keywords:                    |  Operating System:  Unknown/Multiple
   Architecture:  Unknown/Multiple  |   Type of failure:  Runtime crash
     Difficulty:  Unknown           |         Test Case:
     Blocked By:                    |          Blocking:
Related Tickets:                    |
------------------------------------+-------------------------------------
 http://www.haskell.org/pipermail/haskell-cafe/2013-August/108570.html

 It looks like any code of the form

 {{{
 ... = do
   x1 <- m1
   x2 <- m2
   [do something with x1 and m2]
 }}}

 will stack overflow when used with a strict monad (like IO) and
 sufficiently large data to deal with (a 1m element list is enough to
 exceed the default 8MB stack size limit).

 Other examples include:

 {{{
      sequence (m:ms) = do x <- m
                           xs <- sequence ms
                           return (x:xs)
 }}}

 and

 {{{
 liftM2, liftM3, ...
 }}}

 By slightly modifying your program to do the same thing via heap
 allocation (e.g. using a difference list in sequence), a stack-overflow
 rejected program turns into a valid program.

 This means that valid and idiomatic code like

 {{{
 xs <- replicateM n someIOAction
 xs <- mapM someIOAction [1..n]
 }}}

 will eventually crash, which has an impact on practically all Haskell
 software and libraries out there.

 For examples, see
 https://github.com/ndmitchell/shake/blob/e0e0a43/Development/Shake/Database.hs#L394
 https://github.com/haskell-suite/haskell-src-exts/issues/27
 as well as the mapM usages in cabal, haddock, any database binding that
 allows querying a lists of results, etc.

 While some argue that ''"you shouldn't use lists of that size"'',
 ''"allocating a 150 MB list is wrong"'' or that they ''"carefully avoid
 any use of sequence/mapM due to the problem"'' pointed out above, I think
 that collecting a million results of a strict monadic action into a linked
 list using standard functions like mapM is a common pattern and valid use
 case. The stack based evaluation that GHC performs is an optimization and
 should be transparent to the user; it should not re-define some data to
 live in a fixed-size memory region, and it should work just as well as it
 does in GHCi.

 It seems that a default infinite stack size would mitigate this.

 A drawback might be that the stack space overflows we get so far to
 quickly spot a "wrong" program would not indicate certain errors any more.
 I currently doubt the usefulness of this debugging method, since it does
 not work reliably in the presence of optimization, doing the same thing
 via heap allocation makes the programs "correct", and it rejects valid
 programs if the input data is large enough.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8189>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler