[Haskell-cafe] Increasing memory use in stream computation
Arie Peterson
ariep
Thu Oct 10 18:23:56 UTC 2013
Hi Bertram,
> Unfortunately, I don't know. I'll intersperse some remarks and
> propose an alternative to stream fusion at the end, which allows
> your test program to run in constant space.
>
>
> A quicker way to spot the increased memory usage is to look at GC
> statistics. I used
>
> > ./Test +RTS -Sstderr 2>&1 | grep 'Gen: 1'
>
> [?]
Thanks for the suggestion.
> I had a glimpse at the core code generated by ghc, but the amount of
> code is overwhelming. From reading the source code, and as far as my
> intuition goes, the code *should* run in constant space.
Yes, my intuition says the same. For me though, this is only based on an
informal imperative interpretation of the code, not on any understanding of
the Stream internals.
> As an experiment, I rewrote the code using difference lists, and the
> result ran in constant memory. I then tried to abstract this idea into
> a nice data type. (I lost the low-level difference list code on the way,
> but the code was quite hard to read anyway.)
> [?]>
> I'll attach the full code below (it's a separate module, "Stream.hs"
> that can be imported instead of Data.Stream for your small example.)
> With that replacement, the code runs in constant space and becomes
> about 3x faster. Using the 'singleton' function for 'return' results
> in an additional, but very modest (about 10%) speedup.
>
> I wonder whether that approach scales up to your real code.
Awesome, thanks for all this work!
Unfortunately, the modified full program does not use constant memory. I
replaced Data.Stream by your Stream, and added one more function "append" to
it (which was very natural, given the difference list nature). This version
increases its resident size by about 1MB after roughly 10 minutes, and then
again after another 10 minutes.
This makes it a significant improvement over Data.Stream, memory-wise, but I'm
not sure if it will be enough to perform the entire computation without
running out of memory.
More information about the Haskell-Cafe
mailing list