[Haskell-cafe] help optimizing memory usage for a program

Manlio Perillo manlio_perillo at libero.it
Mon Mar 2 16:34:28 EST 2009

Kenneth Hoste ha scritto:
> [...]
> The 26m/700MB I mentioned on my blog was on my ancient PowerBook G4 
> (1.5GHz PowerPC G4, 1.25G).
> I redid the same experiment on our iMac (Core2 Duo, 2.0 GHz, 3.0G), i.e.:
> - read in all the data
> - count the number of keys in the IntMap (which should be 17,770, i.e. 
> the number of movies)
> - compute the mean overall movie rating (which should be 3.6033)
> That task was done in 3m42s, using just 632M of memory, using the 
> following command:
> ./netflix ./training_set/ +RTS -A128M -s
> The -A option makes sure GC isn't cleaning up stuff too frequently, 
> while -s just reports some statistics.

Ok, thanks.
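For reference, the mean computation over that structure can be sketched like this (a hedged sketch, assuming the ratings live in the Word8 array of each per-movie pair):

```haskell
import qualified Data.IntMap as IM
import Data.Array.Unboxed (UArray, elems, listArray)
import Data.Word (Word8)

-- The per-movie value is a pair of unboxed arrays: customer ids and
-- their Word8 ratings.  The overall mean walks every ratings array.
meanRating :: IM.IntMap (UArray Int Int, UArray Int Word8) -> Double
meanRating m = fromIntegral total / fromIntegral count
  where
    ratings = concatMap (elems . snd) (IM.elems m)
    total   = sum (map fromIntegral ratings) :: Int
    count   = length ratings
```

(On an empty map this divides 0 by 0; the real program always has all 17,770 movies loaded before the mean is taken.)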

> The way in which I'm reading in the data is somewhat different from yours.
> I construct the IntMap from the ground up, i.e. starting with an empty 
> IntMap and using foldM,
> together with a function readMovie with the following type:
> readMovie :: IntMap (UArray Int Int, UArray Int Word8) -> FilePath -> IO 
> (IntMap (UArray Int Int, UArray Int Word8))
> In readMovie, I'm using the 'insert' function provided by the IntMap 
> module, which just inserts a new key-value pair
> into the existing IntMap.

This is the same thing I was doing in a previous version.
Some tests showed me that the performance (and memory usage) was 
practically the same, so I switched to the new implementation.

> Your approach is very different: you create 17,770 IntMaps with a single 
> key/value pair in them,
> and then use union to combine them all.
> I profiled your approach on the same iMac (using the same +RTS options),
> and it needed 4m33s to run, using 948M of memory.

This is very strange, because in my tests I got very similar execution 
time and memory usage.

Did you run my program, or did you just modify your code?
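For concreteness, the two construction strategies can be sketched like this (a toy example with a plain Int payload instead of the (UArray Int Int, UArray Int Word8) pair; for distinct keys both build the same map):

```haskell
import qualified Data.IntMap as IM
import Data.List (foldl')

-- Kenneth's approach: start from 'empty' and 'insert' each pair.
buildByInsert :: [(Int, Int)] -> IM.IntMap Int
buildByInsert = foldl' (\m (k, v) -> IM.insert k v m) IM.empty

-- The other approach: make a 'singleton' per pair and 'union' them in.
-- Each intermediate singleton becomes garbage immediately, which is
-- presumably what stresses the GC.
buildByUnion :: [(Int, Int)] -> IM.IntMap Int
buildByUnion = foldl' (\m (k, v) -> IM.union m (IM.singleton k v)) IM.empty
```

Note that on a duplicate key the two differ: 'insert' replaces the old value, while 'union' is left-biased and keeps it; with one entry per movie this does not matter here.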

> I think my approach is turning out better because I'm:
> - building up the IntMap using 'empty' and 'insert', instead of 
> combining 17,770 'singleton' IntMaps
>   (which probably results in better GC behavior)
> - using UArray instead of Urr (although I don't know if that actually 
> makes a difference here)
> I hope this helps you with figuring out what the bottleneck is on your 
> side.
> It took me several days to come up with this approach, with the help 
> from various Haskellers at IRC,
> so I'm surely no expert...
> I've thrown my current code online at 
> http://boegel.kejo.be/files/Netflix_read-and-parse_24-02-2009.hs ,
> let me know if it's helpful in any way...

Thanks for sharing!

I have executed your program:

   real	6m6.394s
   user	1m6.444s
   sys	0m7.180s

   640 MB usage

So memory usage doesn't change with new versions of GHC
(did you try the new parallel garbage collector?)

As for running time, it is my hard disk that is slower (the CPU is the 
same, but I have 2G of RAM).

After reading your code I have a suspicion.
I use ByteString.Lazy to read the file contents, while you instead use a 
strict ByteString.

So I have updated the code to use a strict ByteString (maybe an even 
better solution is to use bytestring-mmap? [1])
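The change itself is small: with the strict interface the whole file is materialized in one go, so no lazy chunks (or the open handle) are retained after parsing. A sketch, with a hypothetical parseLine for a "customerId,rating" line format (the real Netflix files are laid out differently):

```haskell
import qualified Data.ByteString.Char8 as B   -- strict interface
import Data.Maybe (mapMaybe)

-- Hypothetical parser for one "customerId,rating" line.
parseLine :: B.ByteString -> Maybe (Int, Int)
parseLine l = do
  (cust, rest) <- B.readInt l
  (rating, _)  <- B.readInt (B.drop 1 rest)  -- skip the comma
  return (cust, rating)

-- B.readFile reads the whole file strictly, unlike the lazy variant,
-- so the handle is closed and no chunk thunks stay alive afterwards.
readRatings :: FilePath -> IO [(Int, Int)]
readRatings path = do
  s <- B.readFile path
  return (mapMaybe parseLine (B.lines s))
```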

I have executed the program, using the same RTS flags as yours:

   real	6m13.523s
   user	0m53.931s
   sys	0m7.812s

   815 MB usage

This is a huge improvement!

Now I have to check whether using insert will further improve memory usage.
It's unfortunate that the GC is not friendly with the singleton + union 
solution.

> Also, I was indeed using GHC 6.10.1, although I'm unsure to what extent 
> that matters.
> greetings,
> Kenneth

[1] unfortunately, with bytestring-mmap I'm not sure I have control over
     resource usage

Regards   Manlio

More information about the Haskell-Cafe mailing list