Space explosion
Jyrinx
jyrinx_list@mindspring.com
Wed, 25 Dec 2002 17:47:10 -0800
I'm tinkering with a little program I wrote a while back to generate
probability distributions and averages for rolling various kinds and
quanitities of dice and applying various rules (for instance, roll four
six-sided dice, drop the lowest, and add the other three). I'm trying to
optimize the thing, and I've run into a space leak. I've found that, if
I tell it to roll four 20-sided dice and add, it finishes in under a
second and heap usage peaks at under 2MB (not sterling, but not
catastrophic); but if I make it five 20-sided dice, it runs out of stack
space after thrashing for a while (and getting partway through the final
output); for six dice, it thrashes into oblivion. (Running WinXP with
256MB memory.)
What puzzles me is that, according to GHC's heap-usage-by-call-center
graph, the hog isn't the code that generates every single outcome of
rolling the dice (and we're talking about four 20-sided dice here ...),
whose heap usage is constant (go GHC! :-) ), but the one that turns the
list of outcomes (for instance, [2,3,3,4,4,4 ... 7,7,7,7,7,7 ...
11,11,12] for adding two 6-sided dice) into a probability distribution
(like [(2,1),(3,2),(4,3) ... (7,6) ... (11,2),(12,1)]). It is this:
> -- Specs contains the options for the simulation; rollBounds simply
finds the lowest and highest possible rolls
> -- type RollValue = Int
> -- type RollValueDist = Array RollValue Int
> probDist :: Specs -> [RollValue] -> RollValueDist
> probDist specs rolls =
> accumArray (+) 0 (rollBounds specs) [(r, 1) | r <- rolls]
It should be doing an in-place array update as it evaluates the
comprehension, but apparently something's being kept in memory longer
than it has an excuse for. I tried another, more hand-coded version, to
get rid of the mid-calculation comprehension and some possible overhead:
> probDist specs =
> foldl update orig
> where orig = array bounds [(r, 0) | r <- range bounds]
> update dist r = dist // [(r, 1 + dist!r)]
> bounds = rollBounds specs
But this one was even worse. I've sprinkled $! around quite liberally,
to no avail; also, I'm compiling with -O, and I've tried various RTS
options with no radical change in behavior.
What's going on here? The array constructed has to be big, but not
*that* big (77 Ints for four 20-sided) ... Is there an inherent
inefficiency with GHC's arrays when used like this? I can't come up with
a better algorithm - making a zeroed array with an element for each
outcome, then adding 1 to the element for each outcome as it is read in
seems straightforward enough. (Would a FiniteMap be any better? I'm not
sure how ...) Thanks for any help ... I'm starting to get the hang of
Haskell, but some of these semantic subtleties are driving me nutty ...
Luke Maurer
jyrinx_list@mindspring.com