[Haskell-cafe] iteratee-compress space leak?

John Lato jwlato at gmail.com
Tue Feb 22 14:03:22 CET 2011


2011/2/22 Michael A Baikov <pacak at bk.ru>

>
> -----Original Message-----
>
> > Hi Maciej,
> >
> > Thanks for looking into this.
> >
> > > After looking into the problem (or rather at your code), I believe it
> > > has nothing to do with iteratee-compress. I get similar behaviour
> > > and results when I replace "joinIM $ enumInflate GZip
> > > defaultDecompressParams chunkedRead" with chunkedRead. (Memory use is
> > > smaller, but that difference is due to decompression, not to iteratee.)
> > >
> >
> > This is due to "printLines".  Whether it's a bug depends on what the
> > correct behavior of "printLines" should be.
> >
> > "printLines" currently only prints lines that are terminated by an EOL
> > (either "\n" or "\r\n").  This means that it needs to hold on to the
> > entire stream received until it finds EOL, and then prints the stream, or
> > drops it if it reaches EOF first.  In your case, the stream generated by
> > "convStream consChunk printLines" is just a stream of numbers without any
> > EOL, where the length is dependent on the specified block size.  This
> > causes the space leak.
> >
> > If I change the behavior of "printLines" to print lines that aren't
> > terminated by EOL, the leak could be fixed.  Whether that behavior is
> > more useful than the present, I don't know.  Alternatively, if you insert
> > some newlines into your stream this could be improved as well.
> >
> > As a result of investigating this, I realized that
> > Data.Iteratee.ListLike.break can be very inefficient in cases where the
> > predicate is not satisfied relatively early. I should actually provide an
> > enumeratee interface for it.  So thanks very much for (indirectly)
> > suggesting that.
>
> Actually, I can give you the full source code - it also uses
> attoparsec-iteratee. It leaks with iteratee-compress and works fine
> without it. The whole idea: get a bytestring from access.log, convert it
> to a stream of data objects with usernames and bytes downloaded, and then
> feed this stream into an iteratee which will collect all the data into one
> big Map ByteString Integer.
>

I'm not familiar with iteratee-compress, but you could be getting hit by
Map's laziness: Data.Map is lazy in its values, so per-user totals can pile
up as unevaluated thunks.  Instead of a map, could you use something like
hashmap[1], bytestring-trie[2], or Johan's new containers library[3]?
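
If you want to stay with Data.Map, a strict accumulation might look roughly
like the sketch below. It is only an illustration under assumptions: the
Entry type and the tally function are made-up names, not taken from your
code, and Data.Map.Strict needs containers 0.5 or later (with older
versions, insertWith' from Data.Map plays the same role).

```haskell
module Main where

import qualified Data.ByteString.Char8 as B
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for one parsed access.log record:
-- (username, bytes downloaded).
type Entry = (B.ByteString, Integer)

-- foldl' forces the map at every step, and Data.Map.Strict.insertWith
-- forces the combined value before storing it, so no chain of (+)
-- thunks builds up behind each key.
tally :: [Entry] -> M.Map B.ByteString Integer
tally = foldl' step M.empty
  where
    step m (user, n) = M.insertWith (+) user n m

main :: IO ()
main = do
  let entries =
        [ (B.pack "alice", 100)
        , (B.pack "bob", 50)
        , (B.pack "alice", 25)
        ]
  print (M.toList (tally entries))
```

In an iteratee the same step function would be the body of the fold; the
point is only that both the map and the stored sums get forced as each
record arrives.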

Also, I've recently posted a minor update to iteratee which includes an
enumeratee version of break and an alternative to printLines that doesn't
retain data, which you may find useful.
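
To make the retention problem from the quoted discussion concrete, here is a
small model using plain lists of String chunks. It is not the iteratee API;
pendingAfter and bufferedSizes are hypothetical names used only for this
illustration. It tracks how much text must stay buffered after each chunk
when output is emitted only on EOL.

```haskell
module Main where

-- Hypothetical helper: the text that is still buffered after one more
-- chunk arrives, if complete lines are emitted only when "\n" is seen.
-- Everything after the last newline has to be kept.
pendingAfter :: String -> String -> String
pendingAfter pending chunk =
  reverse (takeWhile (/= '\n') (reverse (pending ++ chunk)))

-- Size of the buffer after each chunk has been consumed.
bufferedSizes :: [String] -> [Int]
bufferedSizes = map length . drop 1 . scanl pendingAfter ""

main :: IO ()
main = do
  -- No EOL anywhere: the buffer grows with every chunk.
  print (bufferedSizes ["12", "34", "56"])
  -- Regular EOLs: the buffer stays small.
  print (bufferedSizes ["12\n", "34\n5", "6\n"])
```

With no newline in the input the buffered size keeps growing with the
stream, which is the behaviour a non-retaining consumer avoids.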

Cheers,
John

[1] http://hackage.haskell.org/package/hashmap
[2] http://hackage.haskell.org/package/bytestring-trie
[3] http://hackage.haskell.org/package/unordered-containers