[Haskell-cafe] Increasing memory use in stream computation
Thu Oct 10 14:39:38 UTC 2013
On 10/10/13 14:02, Arie Peterson wrote:
> (Sorry for the long email.)
> Summary: why does the attached program have non-constant memory use?
Looking at the heap profile graph (generated with +RTS -h, no need to
compile with profiling) I see the increasing memory use is split about
evenly between STACK and BLACKHOLE. I don't know what that means or why
it occurs, but replacing the definition of `small` with the following
solved that problem for me:
small = V.fromList <$> S.stream (replicateM 7 [-1,0,0,1])
I get the same output, 3999744, from both your version and my changed version.
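For reference, the list that the replacement hands to S.stream can be
inspected on its own, using only base. This is a minimal sketch; the name
`candidates` is mine and does not appear in the attached program, and the
element type Int is an assumption:

```haskell
import Control.Monad (replicateM)

-- All length-7 lists whose entries are drawn from the four-element
-- list [-1,0,0,1] (0 appears twice, so it is chosen twice as often).
-- `candidates` is an illustrative name, not one from the original program.
candidates :: [[Int]]
candidates = replicateM 7 [-1, 0, 0, 1]

main :: IO ()
main = do
  print (length candidates)                 -- 4^7 = 16384
  print (all ((== 7) . length) candidates)  -- True
  print (head candidates)                   -- [-1,-1,-1,-1,-1,-1,-1]
```

So `small` ranges over 16384 fixed-size vectors, which by itself is far too
little data to explain memory growth of many gigabytes.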
> ==== Introduction ====
> I've written a program to do a big computation. Unfortunately, the computation
> takes a very long time (expectedly), and the memory use increases slowly
> (unexpectedly), until it fills up the entire memory and swap space of the
> computer (many gigabytes).
> The rough structure of the program is:
> - create a long (up to 20 million) list of objects;
> - compute a number for each of those objects;
> - compute the sum of the resulting list.
> I switched the intermediate data structure from a list to a Stream (from the
> stream-fusion package), hoping to fix the memory issue. It decreased both the
> memory use and the rate of its increase, but after a long time, the program
> still uses up all available memory.
> ==== A simple program ====
> After many hours of cutting down my program, I now have a small program
> (attached) that shows the same behaviour. It uses only said stream-fusion
> package, and vector. (I haven't yet tried to cut out the use of vector. I hope
> it is irrelevant, because all vectors are of fixed small size.)
> I compile the program with ghc-7.6.1 using
>> ghc --make -threaded -rtsopts -with-rtsopts="-M1G -K128M" -O2 -main-is
>> Test.main Test
> The rts options may not be strictly necessary: I added them at some point to
> allow the use of multiple cores, and to prevent the program from crashing the
> machine by using all available memory.
> When running the program, the resident memory quickly grows to about 3.5 MB
> (which I am fine with); this stays constant for a long time, but after about 7
> minutes, it starts to grow further. The growth is slow, but I would really
> expect this program to run in constant memory.
> ==== The code ====
> Note that I added an instance for Monad Stream, using concatMap. This is
> implicitly used in the definition of the big stream.
> The source of Data.Stream contains many alternative implementations of concat
> and concatMap, and alludes to the difficulty of making it fuse properly. Could
> it be that the fusion did not succeed in this case?
> Thanks for any help!
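To illustrate how a Monad instance built on concatMap ends up being used
implicitly by do-notation, here is a small self-contained sketch. It does not
use the stream-fusion package: the `Stream` newtype below is a hypothetical
list-backed stand-in, and `toList` and `pairs` are names I made up, so it says
nothing about whether fusion succeeds in the real program. The Functor and
Applicative instances are only there because current GHC versions require
them for a Monad instance.

```haskell
-- Hypothetical stand-in for a stream type, backed by a plain list.
-- This is NOT Data.Stream from stream-fusion.
newtype Stream a = Stream { toList :: [a] }

instance Functor Stream where
  fmap f (Stream xs) = Stream (map f xs)

instance Applicative Stream where
  pure x = Stream [x]
  Stream fs <*> Stream xs = Stream [f x | f <- fs, x <- xs]

instance Monad Stream where
  -- Bind is concatMap with its arguments flipped.
  Stream xs >>= f = Stream (concatMap (toList . f) xs)

-- Each `<-` below desugars to (>>=), hence to one concatMap.
pairs :: Stream (Int, Int)
pairs = do
  x <- Stream [1, 2]
  y <- Stream [10, 20]
  return (x, y)

main :: IO ()
main = print (toList pairs)  -- [(1,10),(1,20),(2,10),(2,20)]
```

Every generator in a do-block over such a type adds one nested concatMap,
which is the shape that the comments in the Data.Stream source describe as
hard to fuse.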