[Haskell-cafe] iteratee-compress space leak?
jwlato at gmail.com
Tue Feb 22 14:03:22 CET 2011
2011/2/22 Michael A Baikov <pacak at bk.ru>
> -----Original Message-----
> > Hi Maciej,
> > Thanks for looking into this.
> > > After looking into the problem (or rather into your code), I believe the
> > > problem has nothing to do with iteratee-compress. I get similar behaviour
> > > and results when I replace "joinIM $ enumInflate GZip
> > > defaultDecompressParams chunkedRead" with chunkedRead. (Memory use is
> > > smaller, but that is due to decompression, not a fault of iteratee.)
> > >
> > This is due to "printLines". Whether it's a bug depends on what the
> > behavior of "printLines" should be.
> > "printLines" currently only prints lines that are terminated by an EOL
> > (either "\n" or "\r\n"). This means that it needs to hold on to the
> > stream received until it finds an EOL, and then prints the stream, or drops
> > it if it reaches EOF first. In your case, the stream generated by
> > "consChunk printLines" is just a stream of numbers without any EOL, whose
> > length depends on the specified block size. This causes the space
> > leak.
> > If I change the behavior of "printLines" to print lines that aren't
> > terminated by EOL, the leak could be fixed. Whether that behavior is
> > more useful than the present one, I don't know. Alternatively, if you
> > insert some newlines into your stream this could be improved as well.
> > As a result of investigating this, I realized that
> > Data.Iteratee.ListLike.break can be very inefficient in cases where the
> > predicate is not satisfied relatively early. I should actually provide an
> > enumeratee interface for it. So thanks very much for (indirectly)
> > suggesting that.
> Actually I can give you the full source code - it also uses attoparsec-iteratee.
> It leaks with iteratee-compress and works fine without it.
> The whole idea - get a bytestring from access.log, convert it to a stream of
> data objects with usernames and bytes downloaded, and then feed this stream
> into an iteratee which will collect all the data into one big Map ByteString
> Integer.
I'm not familiar with iteratee-compress, but you could be getting hit by
Map's laziness. Instead of a map, could you use something like hashmap,
bytestring-trie, or Johan's new containers library?
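
To illustrate the laziness concern, here is a minimal sketch of a strict accumulation. It assumes a containers release that provides Data.Map.Strict, which is not one of the libraries named above, and the names "totals" and "sample" are hypothetical. With the lazy Data.Map, insertWith (+) can leave a growing chain of unevaluated additions per key; the strict variant forces each sum as it is inserted.

```haskell
import qualified Data.ByteString.Char8 as B
import           Data.List             (foldl')
import qualified Data.Map.Strict       as M

-- | Sum the bytes downloaded per user (hypothetical record shape:
-- a pair of username and byte count).
-- Data.Map.Strict.insertWith evaluates the combined value to WHNF,
-- so no chain of (+) thunks builds up behind a key.
totals :: [(B.ByteString, Integer)] -> M.Map B.ByteString Integer
totals = foldl' step M.empty
  where
    step acc (user, bytes) = M.insertWith (+) user bytes acc

main :: IO ()
main = mapM_ print (M.toList (totals sample))
  where
    -- Made-up sample data, for illustration only.
    sample =
      [ (B.pack "alice", 100)
      , (B.pack "bob",   50)
      , (B.pack "alice", 25)
      ]
```

The same step function could serve as the body of a fold-style iteratee; the point is only that both the fold and the map values are strict.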
Also, I've recently posted a minor update to iteratee which includes an
enumeratee version of break and an alternative to printLines that doesn't
retain data, which you may find useful.
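
For reference, the retention behaviour of printLines described in the quoted message can be modelled with plain lists. This is only a sketch under my own assumptions, not the iteratee library's implementation; "feed" and "linesFrom" are hypothetical names. Input that never contains an EOL stays in the pending buffer until EOF, and the flag chooses whether that tail is then emitted or dropped.

```haskell
import Data.List (foldl')

-- | Feed one chunk to the line splitter.
-- Returns the lines completed by this chunk and the still-pending text.
feed :: String -> String -> ([String], String)
feed pending chunk = go (pending ++ chunk)
  where
    go s = case break (== '\n') s of
      (line, '\n' : rest) ->
        let (ls, p) = go rest in (stripCR line : ls, p)
      (rest, _) -> ([], rest)  -- no EOL yet: everything stays buffered
    -- Treat "\r\n" as an EOL too by dropping a trailing '\r'.
    stripCR l
      | not (null l) && last l == '\r' = init l
      | otherwise                      = l

-- | Run over all chunks. If the flag is True, an unterminated tail is
-- emitted at EOF; if False it is dropped.
linesFrom :: Bool -> [String] -> [String]
linesFrom flushTail chunks =
  let step (acc, p) c = let (ls, p') = feed p c in (acc ++ ls, p')
      (out, pending)  = foldl' step ([], "") chunks
  in if flushTail && not (null pending) then out ++ [pending] else out

main :: IO ()
main = do
  print (linesFrom False ["12", "34", "56"])         -- []
  print (linesFrom True  ["12", "34", "56"])         -- ["123456"]
  print (linesFrom False ["ab\r", "\ncd\n", "ef"])   -- ["ab","cd"]
```

In the first case the pending buffer grows with every chunk, which is the retention pattern at issue.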