Make lines stricter to fix a space leak

Daniel Fischer at
Fri Sep 24 21:45:15 EDT 2010

On Saturday 25 September 2010 02:55:24, Ian Lynagh wrote:
> On Fri, Sep 24, 2010 at 09:21:19PM +0200, Daniel Fischer wrote:
> > Proposal: A stricter implementation of lines.
> >
> > Reason: The current implementation causes a space leak (cf.
> >, at least in
> > GHC.
> >
> > The proposed implementation fixes the leak at the small cost of being
> > stricter if the first _|_ in the String is the first character of a
> > line.
> I think this changes the definition from one that currently has a space
> leak with GHC, to one which necessarily must have a space leak (as all
> of l must be held in memory while we look for a newline).

No, with the proposed implementation, e.g. counting line lengths runs in 
constant space:

./newTest 50000000 2 +RTS -s
  18,070,120,752 bytes allocated in the heap
   1,721,193,144 bytes copied during GC
          34,164 bytes maximum residency (1642 sample(s))
          45,204 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)             

  Generation 0: 32825 collections,     0 parallel, 12.62s, 12.98s elapsed
  Generation 1:  1642 collections,     0 parallel,  0.52s,  0.59s elapsed


./origTest 50000000 2 +RTS -s
  18,070,120,940 bytes allocated in the heap
   3,966,182,576 bytes copied during GC
     446,080,496 bytes maximum residency (23 sample(s))
      10,059,688 bytes maximum slop
             872 MB total memory in use (7 MB lost due to fragmentation)

  Generation 0: 34444 collections,     0 parallel, 13.81s, 24.92s elapsed
  Generation 1:    23 collections,     0 parallel,  6.55s, 27.90s elapsed

Since break (== '\n') immediately constructs a pair, the pattern match also 
succeeds immediately after the first character has been scanned and thus 
the first line is availabe as it is scanned.
If nothing else holds a reference to it, it can be garbage-collected 
incrementally as it is delivered and consumed.

> IIRC GHC's GC does have an optimisation which I believe would apply
> here, where it reduces essentially (snd (x, y)) in the heap to y, but
> that that optimisation isn't always good enough (it has a threshold for
> how deep it is willing to reduce, and a small loop like this will
> generates pairs and selectors faster than the GC is willing to reduce
> them).
> I thought we had a ticket about that, but I can't find it now.
> If that was fixed then we could have constant space usage.

If that was fixed, that'd be much better of course, since the tuple space 
leak is a common problem.
However, it's long known and not yet fixed for all cases, so it's probably 
hard to fix completely. In the meantime, patching things locally looks like 
the best option to me.

> Thanks
> Ian


More information about the Libraries mailing list