My stack overflow was due to addListToFM

senganb@ia.nsc.com senganb@ia.nsc.com
Wed, 28 Feb 2001 17:07:43 -0700 (MST)


Hi,

Interestingly given the recent discussion about foldl versus foldl'
I'd like to report that the stack overflow I was seeing was due to
addListToFM which is defined using foldl although addToFM is strict
in FiniteMap according to ghc:

addToFM :: ... {PrelBase.Ord key} -> FiniteMap key elt -> key -> elt -> FiniteMap key elt {-## __A 4 __S LSLL __U ...

So, I'd like to request that FiniteMap be changed to use foldl'.

Compiler implementors, I'd also like to request that your compiler
emit a warning in such cases if at all possible given that this has
since even the best haskell programmers seem to miss this case
(the authors of FiniteMap).

---------------------------------------------------------------------------------

For those of you wondering how I found this, I used the following heuristic:

1 Build your program with -prof -auto-all -caf-all
2 Run your program with -h so that Main.hp is created.
3 Then look at the various functions that are reported to use a lot of heap.
4 Grep through Main.hp for the function that happens to have a recent rapid increase of heap.
5 Then look at its code.

In my case I got
   find_atoms/add_atom_equalities 33808
   find_atoms/add_atom_equalities 186128
   find_atoms/add_atom_equalities 338448
   find_atoms/add_atom_equalities 490768
   find_atoms/add_atom_equalities 643088
   find_atoms/add_atom_equalities 795408
   find_atoms/add_atom_equalities 947728
   find_atoms/add_atom_equalities 1100048
   find_atoms/add_atom_equalities 1233976
   find_atoms/add_atom_equalities 1302328
   find_atoms/add_atom_equalities 1370680
   find_atoms/add_atom_equalities 1439032
   find_atoms/add_atom_equalities 1507384
   find_atoms/add_atom_equalities 1575736
   find_atoms/add_atom_equalities 1644088
   find_atoms/add_atom_equalities 1697552
just before the crash.

Reasoning
---------

Since GHC optimizes tail recursion, the cause of a stack overflow must be something else, eg:
a you've got a non-tail recursive function
b your tail recursive function has built a long spine on the heap of functions that are strict in their 2cd argument
  so that you need to evaluate the bottom-most element of the spine (which causes you to push the address
  of all the functions along the spine onto the stack)
c or...

I assumed b since I know my code doesn't contain many cases of a.
If b is the case, since the function that is overflowing is being evaluated,
we know that it's going to build up its spine quickly and then try to evalute it,
find that each function on the spine is strict in its second argument and then die. Therefore heuristic step 4.

You also need it to use a fair bit of heap to actually be bad enough to overflow the stack (therefore heuristic step 3).

Sengan