Why is GHC so much worse than JHC when computing the Ackermann function?

> There's something strange going on in this example. For instance, setting
> (-M) heap limits as low as 40K seems to have no effect, even though the
> program easily uses more than 8G.

Apparently the only thing it allocates is stack chunks.

I managed to produce a version of this program that runs approximately
as fast as the OCaml one by manually unrolling the main loop [1], but
it still has to be run with +RTS -kc1M to avoid the memory leak.

[1] https://gist.github.com/23Skidoo/5425891
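
For readers without the original program at hand, here is a minimal
sketch of the kind of code under discussion. It is the textbook
two-argument Ackermann function, not the benchmark from the thread and
not the unrolled version in [1]; the file name Ack.hs and the small
input are assumptions chosen for illustration. The inner call in the
last equation is not a tail call, so deep recursion grows the stack
rather than the heap.

```haskell
module Main (main) where

-- Textbook Ackermann function (illustrative only; not the thread's
-- benchmark and not the unrolled version linked in [1]).
ack :: Int -> Int -> Int
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
-- The inner 'ack m (n - 1)' must finish before the outer call can
-- start, so every level of nesting occupies stack space.
ack m n = ack (m - 1) (ack m (n - 1))

main :: IO ()
main = print (ack 2 3)  -- deliberately small input
```

Assuming the file is saved as Ack.hs, it can be built with
"ghc -O2 -rtsopts Ack.hs" and run as "./Ack +RTS -kc1M -RTS" to set the
stack chunk size to 1M, as suggested above. Larger arguments will be
needed to see any difference in memory use.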

