[Haskell-beginners] Space leak in a fold even after the usual fixes

Stephen Blackheath [to Haskell-Beginners] mutilating.cauliflowers.stephen at blacksapphire.com
Mon Apr 12 16:40:43 EDT 2010


Travis,

IMHO most Haskell features give a great benefit with a very small cost,
but laziness is the exception:  It gives a great benefit, but at a
fairly significant cost.  That cost comes in the form of a learning
curve for dealing with space behaviour.

I would add that even though Haskell has some faults, it is overall an
amazingly practical language - possibly the best around at this point in
time.  Out of all the gripes that people rightly make about certain
details of Haskell, the only one that causes any real grief is space
behaviour.  If we can ease the pain of space behaviour, then Haskell can
be all pleasure.  I hope I can help with that!

The first thing to understand is that "a `seq` b" means "when you
evaluate me, evaluate a and b, and return b".  It doesn't actually force
evaluation, it just ties the valuation of one thing to the evaluation of
another.  It's equivalent to just "b" except that it also implies the
evaluation of a.

The next thing to understand is that "a `seq` b" only causes evaluation
of the outermost constructor of a.  For some "strict" types the
outermost constructor is the whole value, so `seq` causes the whole
expression to be evaluated, e.g.

  Int
  Bool
  newtype Metres = Metres Float
  data Point2 = Point2 !Int !Int

Even if it's a simple type, any unevaluated expression can potentially
consume large amounts of memory, because the runtime system can't
garbage collect whatever data structures are needed to evaluate it.
Then you can get space leaks.  With these strict types, life is easy -
you can just use `seq` or foldl' or such like, and then you can
guarantee that these references are no longer hanging around.

But a lot of types are lazy.  Here are some examples.  Their contained
values aren't evaluated by a simple `seq`:

  Maybe a
  data Metres = Metres Float
  (a,b)
  [a]
  Data.Set.Set a

Another point is that the "a" in "a `seq` b" needs to be a let binding
so it's the same "a" as you use in other places, e.g.

  let total' = total + value
      n' = n + 1
  in total' `seq` n' `seq` (total, n)

Because of the two `seq`s, the value of this expression is now strict,
so that using `seq` on it will force the whole thing, even though it is
of type (a, b) which is normally a lazy type.

So this means it's possible to make a normally lazy value - or specific
parts of it - strict by making sure that each publicly exposed function
that constructs the value forces evaluation using appropriate `seq`s.
For example, the keys in a Data.Map structure are treated strictly, and
the values lazily.  This strictness is guaranteed by each function that
manipulates Data.Map being written so as to give this guarantee, and
then a promise is made in the API documentation.  It's important to
document space behaviour, since the caller does need to know it.

So that's how to control evaluation directly.  The
Control.Parallel.Strategies module from 'parallel' contains helpers for
managing evaluation (useful in non-parallel software too).  I think
ultimately if you want to put Haskell to practice use, you do need to be
aware of Haskell's evaluation behaviour.  Until you really understand
it, you will be occasionally plagued by space leaks.  In my experience,
space leaks have been tractable with a combination of space profiling
(which is unavoidable sometimes for big programs) and good
understanding, which has taken a bit of effort to acquire.


Steve

Travis Erdman wrote:
> The driver of my algorithm looks like this:
> 
> foldl' processfn nullarray (take arg infinitelist)
> 
> where processfn takes an array and the next set of inputs,
> processes, and delivers a new updated array (using the
> Data.Vector library).
>  
> Apparently, I have a space leak ... the "maximum residency"
> varies linearly with the size of "arg" supplied, garbage
> collection consumes ~75% of CPU time, and, if the arg is too
> big, the whole thing crashes with an out of memory
> error.  The algorithm should operate in constant
> space.
> 
> As you can see, I'm using a strict left-fold and also have
> made the accumulating array strict in the processfn
> definition with a bang pattern.  So, I'm sort of at a
> loss as to how to resolve this.
> 
> The help provided on this list has been outstanding, thanks
> to all of you; hope you have something left in the tank for
> this one!
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners



More information about the Beginners mailing list