[Haskell-cafe] Boxed Mutable Arrays

Wed Dec 16 13:00:12 EST 2009

On Tue, Dec 15, 2009 at 12:00 PM, Gregory Collins
<greg at gregorycollins.net>wrote:

> Bulat Ziganshin <bulat.ziganshin at gmail.com> writes:
>
> > now i see what you mean. no, i mean trivial transformation. #650 says
> > about slow GC. why it's slow? because once you made any update to the
> > array, the entire array is marked as updated and scanned on next minor
> > GC (which occurs after every 512 kbytes allocated, afaik). let's
> > replace big array (say, of 10,000 elements) with array of 100 arrays
> > of 100 elements each. now, between minor GCs only some of arrays will
> > be changed and entire amount of memory to be scanned will become less
>
> I actually tried this, and modified Data.HashTable to use a two-tiered
> chunked dynamic array as you suggest. In some limited testing it was
> only about 5% faster, so I gave up on it -- you get some GC time back
> but you also pay a significant indirection penalty by adding an extra
> cache line miss for every operation.
>

This will depend very much on usage patterns and array size. As that array
size approaches the size of available memory the difference becomes
enormous. Then any single entry mutation forces a linear scan of all of
memory. With a two-tier design you only scan a square root of that. I ran
benchmarks on a similar hack a year or two ago. Since the difference is
asymptotic, you can make it as large of a percentage as you want just by
using a bigger hash table/array. I would hazard that your test case wasn't
large enough to see the problem at its worst, or your costs were dominated
by other factors than GC.

-Edward Kmett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20091216/17e7db26/attachment.html