[Haskell-cafe] Real-time garbage collection for Haskell

Thu Mar 4 02:43:28 EST 2010

On 2010-03-01 19:37 +0000 (Mon), Thomas Schilling wrote:

> A possible workaround would be to sprinkle lots of 'rnf's around your
> code....

As I learned rather to my chagrin on a large project, you generally
don't want to do that. I spent a couple of days writing instances
of NFData and loading up my application with rnfs, and then watched
performance fall into a sinkhole.

I believe that the problem is that rnf traverses the entirety of a large
data structure even if it's already fully evaluated and doesn't need
traversal. My guess is that doing this frequently on data structures
(such as Maps) of anything more than tiny size was blowing out my cache.
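
To make the cost concrete, here is a minimal sketch of the pattern that
hurts, assuming the deepseq and containers packages. The function name
addAndForceAll and the toy [Int] values are hypothetical, not taken from
the project described here. Each call walks the whole map again, so n
inserts cost on the order of n^2 traversal steps even though all but
one entry was already evaluated.

```haskell
import Control.DeepSeq (deepseq)
import Data.List (foldl')
import qualified Data.Map as Map

-- Anti-pattern (hypothetical helper): insert one entry, then deep-force
-- the ENTIRE map. deepseq has no way to know that the older entries are
-- already in normal form, so it traverses all of them on every call.
addAndForceAll :: Map.Map Int [Int] -> Int -> Map.Map Int [Int]
addAndForceAll m i =
    let m' = Map.insert i [i, i + 1] m
    in m' `deepseq` m'

main :: IO ()
main = do
    -- 2000 inserts, each followed by a full traversal of the map so far.
    let m = foldl' addAndForceAll Map.empty [1 .. 2000]
    print (Map.size m)
```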

I switched strategies to forcing a deep(ish) evaluation of only
newly constructed data instead. For example, after inserting a newly
constructed object into a Map, I would look it up and force evaluation
only of the result of that lookup. That solved my space leak problem and
made things chug along quite nicely.
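
A minimal sketch of that insert-then-lookup strategy, assuming the
deepseq and containers packages. The name insertForced and the toy
values are hypothetical; this illustrates the idea and is not the code
from the project.

```haskell
import Control.DeepSeq (NFData, rnf)
import Data.List (foldl')
import qualified Data.Map as Map

-- Hypothetical helper: insert a value, look it up again, and fully
-- evaluate only that one value. The rest of the map is not traversed
-- beyond the path the lookup follows. The forcing happens when the
-- returned map is itself evaluated to weak head normal form.
insertForced :: (Ord k, NFData v) => k -> v -> Map.Map k v -> Map.Map k v
insertForced k v m =
    let m' = Map.insert k v m
    in case Map.lookup k m' of
         Just v' -> rnf v' `seq` m'
         Nothing -> m'  -- not expected: k was just inserted

main :: IO ()
main = do
    -- foldl' evaluates the accumulator at each step, which triggers
    -- the rnf on each newly inserted value.
    let step acc i = insertForced i [i, i * 2] acc
        m = foldl' step Map.empty [1 .. 1000 :: Int]
    print (Map.size m)
```

Forcing the value before inserting it (rnf v `seq` Map.insert k v m)
would have the same effect here; the lookup version simply mirrors the
description above.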

Understanding the general techniques for this sort of thing and seeing
where you're likely to need to apply them isn't all that difficult, once
you understand the problem. (It's probably much easier if you don't have
to work it all out for yourself, as I did. Someone needs to write the
"how to manage laziness in Haskell" guide.) The difficult part of it is
that you've really got to stay on top of it, because if you don't, the
space leaks come back and you have to go find them again. It feels a
little like dealing with buffers and their lengths in C.

On 2010-03-01 16:06 -0500 (Mon), Job Vranish wrote:

> All of our toplevel inputs will be strict, and if we keep our
> frame-to-frame state strict, our variances in runtimes, given the same
> inputs, should be quite low modulo the GC.

This is exactly the approach I need to take for the trading system. I
basically have various (concurrent) loops that process input, update
state, and possibly generate output. The system runs for about six
hours, processing five million or so input messages with other loops
running anywhere from hundreds of thousands to millions of times. The
trick is to make sure that I never, ever start a new loop with an
unevaluated thunk referring to data needed only by the previous loop,
because otherwise I just grow and grow and grow....
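
One common way to keep loop state from accumulating thunks is strict
constructor fields combined with a strict fold. The following is only a
sketch under that assumption; LoopState, step, and the Int "messages"
are hypothetical stand-ins and not part of the trading system.

```haskell
import Data.List (foldl')

-- Hypothetical per-loop state. The strictness annotations (!) mean
-- that building a LoopState evaluates both fields, so no chain of
-- suspended (+) applications can survive from one iteration to the next.
data LoopState = LoopState
    { msgCount :: !Int
    , runTotal :: !Int
    }

-- Process one input message (here just an Int) and produce the next state.
step :: LoopState -> Int -> LoopState
step (LoopState n t) msg = LoopState (n + 1) (t + msg)

main :: IO ()
main = do
    -- foldl' evaluates the state to weak head normal form after every
    -- message; with strict fields that evaluates the whole state.
    let final = foldl' step (LoopState 0 0) [1 .. 1000]
    print (msgCount final, runTotal final)
```

Strict fields only evaluate each field to weak head normal form, so a
field holding a lazy structure (a list, or a Map with lazy values)
would still need its new contents forced separately.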

Some tool to help with this would be wonderful. There's something for
y'all to think about.

On 2010-03-01 22:01 +0000 (Mon), Thomas Schilling wrote:

> As Job and John have pointed out, though, laziness per se doesn't seem
> to be an issue, which is good to hear. Space leaks might, but there is
> no clear evidence that they are particularly harder to avoid than in
> strict languages.

As I mentioned above, overall I find them so. Any individual space
leak you're looking at is easy to fix, but the constant vigilance is
difficult.

cjs
-- 
Curt Sampson         <cjs at cynic.net>         +81 90 7737 2974
             http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.    --George Bernard Shaw