[Haskell-beginners] Memory tuning

Thorsten Hater th at tp1.rub.de
Mon Oct 11 17:29:30 EDT 2010


 On 11.10.2010 17:54, Daniel Fischer wrote:
> On Monday 11 October 2010 16:14:11, Thorsten Hater wrote:
>> Not only is the actual memory consumption quite high, but the
>> GC/computation ratio is almost 1:1.
> You construct a largish map (about 350000 entries), doing a lot (more than 
> a million) of insertWith's.
> That means frequent rebalancing (for new circumferences) and many updates 
> of already present values (which in all likelihood are heap allocated 
> Ints), both give the garbage collector a lot of work. Further, you 
> construct a lot of lists, which need to be GCed too.
> Also, the memory overhead of Maps is nontrivial, I think five or six words 
> per item, so a Map Int Int needs about size*8(or 7)*sizeof(Int) bytes.
>
> You can reduce the GC time by specifying a larger allocation area (e.g.
> +RTS -A16M -RTS, the default size is 512K) thus making GC less frequent.
>
>> I assume that there is something to be done regarding strictness,
> Nothing obvious.
> But for the computation pattern you use, since the ratio of possible values 
> to occurring values is not too high, using accumArray (on an Unboxed array) 
> is far better.
>
>> but a series of experiments with
>> BangPatterns return no yields at all (using foldr instead of foldl'
>> produces stack overflow)
> Yes, using foldr with insert(With)s into Maps is a Bad Idea™
>
Thank you for pointing me to Unboxed Arrays and accumarray, it works
like a charm.
The new statistics:

      35,180,328 bytes allocated in the heap
          13,608 bytes copied during GC
      12,027,912 bytes maximum residency (2 sample(s))
         594,520 bytes maximum slop
              13 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    43 collections,     0 parallel,  0.00s,  0.00s elapsed
  Generation 1:     2 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.14s  (  0.17s elapsed)
  GC    time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.14s  (  0.17s elapsed)

  %GC time       0.0%  (0.6% elapsed)

  Alloc rate    251,288,057 bytes per MUT second

  Productivity 100.0% of total user, 82.9% of total elapsed

I don't think there is much left to do in terms of optimization.
So I'm left with the question of when to use Data.Map, if accumarray
is so much better even for heavy number crunching as this.
Especially when Data.Map advertises as efficient.
(Even without Unboxed, normal Arrays beat Maps by a factor of 2)
Best regards.


More information about the Beginners mailing list