[Haskell-cafe] Increasing memory use in stream computation
Bertram Felgenhauer
bertram.felgenhauer
Thu Oct 10 17:03:41 UTC 2013
Arie Peterson wrote:
> (Sorry for the long email.)
>
> Summary: why does the attached program have non-constant memory use?
Unfortunately, I don't know. I'll intersperse some remarks and
propose an alternative to stream fusion at the end, which allows
your test program to run in constant space.
> ==== A simple program ====
>
> When running the program, the resident memory quickly grows to about 3.5 MB
> (which I am fine with); this stays constant for a long time, but after about 7
> minutes, it starts to grow further. The growth is slow, but I really would
> hope this program to run in constant memory.
A quicker way to spot the increased memory usage is to look at GC
statistics. I used
    ./Test +RTS -Sstderr 2>&1 | grep 'Gen: 1'

    569904   8192   65488  0.00  0.00   0.01   0.01  0  0  (Gen: 1)
    516520   9768   67080  0.00  0.00   4.23   4.23  0  0  (Gen: 1)
    513824  14136   71448  0.00  0.00   8.43   8.44  0  0  (Gen: 1)
    515856  16728   74040  0.00  0.00  12.70  12.75  0  0  (Gen: 1)
    515416  19080   76392  0.00  0.00  17.01  17.11  0  0  (Gen: 1)
    515856  22248   79560  0.00  0.00  21.33  21.48  0  0  (Gen: 1)
    514936  25080   82392  0.00  0.00  25.65  25.84  0  0  (Gen: 1)
    514936  28632   85944  0.00  0.00  29.94  30.16  0  0  (Gen: 1)
    513512  32328   89640  0.00  0.00  34.24  34.48  0  0  (Gen: 1)
    515224  37032  127112  0.00  0.00  38.35  38.62  0  0  (Gen: 1)
Note the increasing values in the third column; that's the live bytes
after each major GC.
> ==== The code ====
>
> Note that I added an instance for Monad Stream, using concatMap. This is
> implicitly used in the definition of the big stream.
>
> The source of Data.Stream contains many alternative implementations of concat
> and concatMap, and alludes to the difficulty of making it fuse properly. Could
> it be that the fusion did not succeed in this case?
I had a glance at the Core generated by GHC, but the amount of code
is overwhelming. From reading the source code, and as far as my
intuition goes, the code *should* run in constant space.
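For concreteness, I assume the instance in question looks roughly like
this (only concatMap is actually named above, so the rest is guesswork
about Data.Stream's API):

    -- A guess at the instance described above: only 'concatMap' is named in
    -- the thread; 'S.stream' (building a Stream from a list) is my assumption
    -- about Data.Stream's exports. On GHC >= 7.10 you would also need
    -- Functor and Applicative instances for this to compile.
    import qualified Data.Stream as S

    instance Monad S.Stream where
        return x = S.stream [x]     -- one-element stream
        m >>= f  = S.concatMap f m  -- bind is flip concatMap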
As an experiment, I rewrote the code using difference lists, and the
result ran in constant memory. I then tried to abstract this idea into
a nice data type. (I lost the low-level difference list code on the way,
but the code was quite hard to read anyway.)
I ended up with an odd mixture of a difference list and a continuation
that should be applied to each element:
    data Stream a where
        Stream :: (forall r. (a -> r) -> [r] -> [r]) -> Stream a
with Stream s representing the list s id []. The motivation for the
(a -> r) argument is that it makes fmap trivial:
    fmap f (Stream s) = Stream (\g -> s (g . f))
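For concreteness, converting back to a list and building a one-element
stream can be written like this (my helper names; the attached module
may differ):

    -- the data type above needs {-# LANGUAGE GADTs, RankNTypes #-}
    toList :: Stream a -> [a]
    toList (Stream s) = s id []      -- Stream s represents the list s id []

    singleton :: a -> Stream a
    singleton x = Stream (\g rest -> g x : rest)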
I'll attach the full code below (it's a separate module, "Stream.hs",
that can be imported instead of Data.Stream for your small example).
With that replacement, the code runs in constant space and becomes
about 3x faster. Using the 'singleton' function for 'return' results
in an additional, but very modest (about 10%) speedup.
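For reference, here is one way the concatMap-based instances could look
for this representation; this is only a sketch, and the attached
Stream.hs may differ in the details:

    -- Sketch: instantiate the quantified r at the difference-list type
    -- [r] -> [r], so every element contributes the segment produced by f.
    concatMapS :: (a -> Stream b) -> Stream a -> Stream b
    concatMapS f (Stream s) =
        Stream (\g rest ->
            foldr ($) rest (s (\a -> case f a of Stream t -> t g) []))

    instance Functor Stream where
        fmap f (Stream s) = Stream (\g -> s (g . f))   -- as above

    instance Applicative Stream where
        pure      = singleton
        fs <*> xs = concatMapS (\f -> fmap f xs) fs

    instance Monad Stream where
        return  = singleton        -- the 'singleton' variant mentioned above
        m >>= f = concatMapS f m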
I wonder whether that approach scales up to your real code.
Enjoy,
Bertram
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Stream.hs
Type: text/x-haskell
Size: 1064 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20131010/7e77825a/attachment.hs>