[Haskell-cafe] help optimizing memory usage for a program
Manlio Perillo
manlio_perillo at libero.it
Mon Mar 2 12:16:10 EST 2009
Bulat Ziganshin wrote:
> Hello Manlio,
>
> Monday, March 2, 2009, 6:30:51 PM, you wrote:
>
>> The process-data-1 program parses the entire dataset using about 1.4 GB
>> of memory (a 3x increase).
>
>> This is strange.
>> The memory required should be proportional to the number of ratings.
>> The culprit may be IntMap, or the garbage collector that does not
>> release memory to the operating system (or, worse, does not
>> deallocate all the temporary memory it uses).
>
> GHC has two garbage collectors.
> [...]
I already tried enabling compacting collection (the -c RTS option).
Here are some numbers
(parsing only 5000 movie files, with process-data-1):
1) With the default collection algorithm:
real 1m4.599s
user 0m49.131s
sys 0m1.492s
409 MB
2) With the -c option:
real 1m45.197s
user 0m59.248s
sys 0m1.640s
418 MB
So memory usage is essentially unchanged (and the run is slower).
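For anyone reproducing this: the switch is passed to the runtime
system on the command line, e.g. (the data directory argument is just
a placeholder):

    $ ./process-data-1 +RTS -c -RTS <data-dir>

and likewise +RTS -F1 -RTS for the growing factor tried below.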
>
> moreover, you may set up a "growing factor". With a g.f. of
> 1.5, for example, memory will be collected once the heap becomes 1.5x
> larger than the real memory usage after the last GC. This effectively
> guarantees that the memory overhead will never exceed this factor.
>
Thanks.
This seems to be effective (though it also reduces performance).
3) With the -F1 option (a factor of 1 means the heap is collected as
soon as it grows past the live data, so major collections run almost
continuously):
real 9m59.789s
user 9m41.844s
sys 0m5.608s
264 MB
I still have to parse the whole data set, to check whether the memory
usage stays acceptable.
> Look at the GHC manual, RTS switches section. And lastly, GHC never
> returns memory to the OS; there should be a ticket on this.
>
Python has the same problem.
By the way: I wrote the first version of this program, which parses
the Netflix training data set, in D.
I also used ncpu * 1.5 threads to parse the files concurrently.
However, execution was *really* slow, due to garbage collection.
I also tried disabling garbage collection and manually running a
collection cycle from time to time (every 200 files parsed), but
performance was the same.
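For what it's worth, the same experiment is easy to replay in Haskell:
System.Mem.performGC forces a collection, so it can be triggered every
N files. A minimal sketch (parseFile is a stand-in for the real
per-file parsing code):

    import Control.Monad (forM_, when)
    import System.Mem (performGC)

    -- Stand-in for the real parser; only the driver loop matters here.
    parseFile :: FilePath -> IO ()
    parseFile _ = return ()

    parseAll :: [FilePath] -> IO ()
    parseAll files =
        forM_ (zip [1 :: Int ..] files) $ \(i, f) -> do
            parseFile f
            -- Force a collection every 200 files, as in the D version.
            when (i `mod` 200 == 0) performGC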
Running the Haskell version with -F1 *seems* (I have to check) to be as
slow as the D version.
So, it seems that for this class of problems it is better to do manual
memory management (and it is not even hard).
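In Haskell, a middle ground short of full manual management is to keep
the bulk of the data unboxed, so the collector has almost no boxed
nodes to trace. A sketch of one possible representation (the 24/8 bit
packing is my assumption; Netflix user ids fit in 24 bits):

    import Data.Array.Unboxed (UArray, listArray)
    import Data.Bits (shiftL, (.|.))
    import Data.Word (Word32)

    -- Pack one rating into a single word: user id in the low 24 bits,
    -- rating in the high 8.
    pack :: Int -> Int -> Word32
    pack userId rating =
        fromIntegral rating `shiftL` 24 .|. fromIntegral userId

    -- All ratings for one movie as a flat unboxed array: one heap
    -- object per movie instead of one boxed node per rating.
    mkMovie :: [(Int, Int)] -> UArray Int Word32
    mkMovie rs = listArray (0, length rs - 1) (map (uncurry pack) rs)

At 4 bytes per rating, the roughly 100 million ratings would take
about 400 MB of payload, close to the expected size of the data.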
When I find some time I will try to write a C++ version (I also have a
Python version, but it is very memory inefficient).
At least I hope I can serialize the parsed data to a binary file, and
then read it back with optimized memory usage.
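The binary package should make that step cheap; a sketch, assuming the
parsed data ends up in an IntMap keyed by movie id (the Ratings type
here is hypothetical):

    import Data.Binary (decodeFile, encodeFile)
    import qualified Data.IntMap as IM
    import Data.Word (Word32)

    -- Hypothetical shape of the parsed data: movie id -> packed ratings.
    type Ratings = IM.IntMap [Word32]

    saveRatings :: FilePath -> Ratings -> IO ()
    saveRatings = encodeFile

    loadRatings :: FilePath -> IO Ratings
    loadRatings = decodeFile

Loading the binary file back then skips the text parsing entirely.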
Thanks,
Manlio Perillo