[Haskell-cafe] mapM is supralinear?

Tue Sep 27 13:42:30 CEST 2011

Here's my feeble understanding of GC:

1. GC runs when after some specified amount of allocations
2. GC runs in time proportional to the live heap, which it needs to
   traverse.

This means that for a long running mapM, like any other computation
generating a long list, (1) GC will run a number of times proportional to
the length of the list, and (2) each run will have a cost
proportional to the length of the list.  I.e. a linear algorithm is now
quadratic.

A lazy mapM (or mapM_), consuming the list as fast as it is generated,
will of course keep the list short/heap small, and thus the cost of each
GC is constant (for some value of "constant").

I suppose generational GC will help in practice, since the young
generation gets to start near the end of the list, but it will still be
linear in generated length, and you still need major GCs too,
occasionally.

Also, I guess mapM is more vulnerable to this, since the operations (IO,
say) involved in building the list likely do short-lived allocations,
triggering more GCs than simpler, pure computations would.

Do let me know if this is probably a terribly naive view.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants