[Haskell-cafe] Increasing memory use in stream computation

Claude Heiland-Allen claude
Thu Oct 10 14:39:38 UTC 2013


Hi Arie,

On 10/10/13 14:02, Arie Peterson wrote:
> (Sorry for the long email.)
> 
> Summary: why does the attached program have non-constant memory use?

Looking at the heap profile graph (generated with +RTS -h, no need to
compile with profiling) I see the increasing memory use is split about
evenly between STACK and BLACKHOLE.  I don't know what that means or why
it occurs, but replacing `small` solved that problem for me:

small = V.fromList <$> S.stream (replicateM 7 [-1,0,0,1])

I get the same output 3999744 from your version and my changed version.


Claude



> 
> 
> ==== Introduction ====
> 
> I've written a program to do a big computation. Unfortunately, the computation 
> takes a very long time (expectedly), and the memory use increases slowly 
> (unexpectedly), until it fills up the entire memory and swap space of the 
> computer (many gigabytes).
> 
> The rough structure of the program is:
> 
> ? create a long (up to 20 million) list of objects;
> ? compute a number for each of those objects;
> ? compute the sum of the resulting list.
> 
> I switched the intermediate data structure from a list to a Stream (from the 
> stream-fusion package), hoping to fix the memory issue. It decreased both the 
> memory use and the rate of its increase, but after a long time, the program 
> still uses up all available memory.
> 
> ==== A simple program
> 
> After many hours of cutting down my program, I now have a small program 
> (attached) that shows the same behaviour. It uses only said stream-fusion 
> package, and vector. (I haven't yet tried to cut out the use of vector. I hope 
> it is irrelevant, because all vectors are of fixed small size.)
> 
> I compile the program with ghc-7.6.1 using
> 
>> ghc --make -threaded -rtsopts -with-rtsopts="-M1G -K128M" -O2 -main-is
>>   Test.main Test
> 
> The rts options may not be strictly necessary: I added them at some point to 
> allow the use of multiple cores, and to prevent the program from crashing the 
> machine by using all available memory.
> 
> When running the program, the resident memory quickly grows to about 3.5 MB 
> (which I am fine with); this stays constant for a long time, but after about 7 
> minutes, it starts to grow further. The growth is slow, but I really would 
> hope this program to run in constant memory.
> 
> ==== The code ====
> 
> Note that I added an instance for Monad Stream, using concatMap. This is 
> implicitly used in the definition of the big stream.
> 
> The source of Data.Stream contains many alternative implementations of concat 
> and concatMap, and alludes to the difficulty of making it fuse properly. Could 
> it be that the fusion did not succeed in this case?
> 
> 
> Thanks for any help!
> 
> Regards,
> 
> Arie

-- 
http://mathr.co.uk





More information about the Haskell-Cafe mailing list