Faster Array#/MutableArray# copies

Fri Feb 18 02:18:11 CET 2011

Hi,

Daniel Peebles has done a great job adding new primops for copying
arrays (both Array# and MutableArray#). This gave me a nice speedup
for my hash-array mapped trie (HAMT) data structure, which is now
about as fast as Data.Map for inserts but about 5.5x faster for
lookups. (It should also have a lower per key/value overhead.)

However, for the HAMT is more than twice as slow for inserts compared
to IntMap and copying arrays is still the bottleneck.

C compilers, like gcc, go to great lengths making memcpy fast and I
was thinking that we might be able to steal a trick or two from them.
I'd like some feedback on these ideas:

 * Could we try to make the array primops inline? If so, would that
mean that if e.g. the number of elements to copy is known at the
(Haskell) call site it would also be known in C-- and we could get the
benefits of knowing that?

 * Could we use built-in compiler rules to catch array copies of known
length and replace them with e.g. unrolled loops? My particular use
case involves copying small arrays (size: 1-32). Ideally this should
be as fast as copying a tuple of the corresponding size but I'm pretty
sure we're far off that goal.

 * Can we use SSE instructions?

 * Can we get the C memcpy code inlined into the C-- source (crazy, I
know). If so we could perhaps benefit directly from optimizations in
libc.

Johan