Make lines stricter to fix a space leak
daniel.is.fischer at web.de
Fri Sep 24 21:45:15 EDT 2010
On Saturday 25 September 2010 02:55:24, Ian Lynagh wrote:
> On Fri, Sep 24, 2010 at 09:21:19PM +0200, Daniel Fischer wrote:
> > Proposal: A stricter implementation of lines.
> > Reason: The current implementation causes a space leak (cf.
> > http://homepages.inf.ed.ac.uk/wadler/papers/leak/leak.ps), at least in
> > GHC.
> > The proposed implementation fixes the leak at the small cost of being
> > stricter if the first _|_ in the String is the first character of a
> > line.
> I think this changes the definition from one that currently has a space
> leak with GHC, to one which necessarily must have a space leak (as all
> of l must be held in memory while we look for a newline).
No, with the proposed implementation, e.g. counting line lengths runs in
./newTest 50000000 2 +RTS -s
18,070,120,752 bytes allocated in the heap
1,721,193,144 bytes copied during GC
34,164 bytes maximum residency (1642 sample(s))
45,204 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 32825 collections, 0 parallel, 12.62s, 12.98s elapsed
Generation 1: 1642 collections, 0 parallel, 0.52s, 0.59s elapsed
./origTest 50000000 2 +RTS -s
18,070,120,940 bytes allocated in the heap
3,966,182,576 bytes copied during GC
446,080,496 bytes maximum residency (23 sample(s))
10,059,688 bytes maximum slop
872 MB total memory in use (7 MB lost due to fragmentation)
Generation 0: 34444 collections, 0 parallel, 13.81s, 24.92s elapsed
Generation 1: 23 collections, 0 parallel, 6.55s, 27.90s elapsed
Since break (== '\n') immediately constructs a pair, the pattern match also
succeeds immediately after the first character has been scanned and thus
the first line is availabe as it is scanned.
If nothing else holds a reference to it, it can be garbage-collected
incrementally as it is delivered and consumed.
> IIRC GHC's GC does have an optimisation which I believe would apply
> here, where it reduces essentially (snd (x, y)) in the heap to y, but
> that that optimisation isn't always good enough (it has a threshold for
> how deep it is willing to reduce, and a small loop like this will
> generates pairs and selectors faster than the GC is willing to reduce
> I thought we had a ticket about that, but I can't find it now.
> If that was fixed then we could have constant space usage.
If that was fixed, that'd be much better of course, since the tuple space
leak is a common problem.
However, it's long known and not yet fixed for all cases, so it's probably
hard to fix completely. In the meantime, patching things locally looks like
the best option to me.
More information about the Libraries