[Haskell-cafe] Optimizing hash array mapped tries
Edward Z. Yang
ezyang at MIT.EDU
Mon Feb 22 02:15:28 EST 2010
Hello all,
I spent this weekend attempting to implement Phil Bagwell's Hash
Array Mapped Tries in Haskell. You can view the fruits of this work at
this repository:
http://github.com/ezyang/hamt
Unfortunately, I'm getting pretty abysmal performance out of the
implementation I wrote, so I'm hoping to get some performance tuning
insights here.
My cost centers look about like this:
COST CENTRE    MODULE      time%  alloc%
i-Full-update  HAMT         29.1    31.4
lookup'        HAMT         13.7     4.3
main           Main         12.8    19.7
subkey         HAMT         11.5    10.0
i-Bitmap       HAMT          9.4    12.0
i-Full         HAMT          8.1    12.9
insert'        HAMT          5.1     0.6
popCount       PopCount      4.7     1.0
And my GC stats look like:
./HAMTTest 256000 +RTS -t
275576630962922
<<ghc: 410718260 bytes, 788 GCs, 19646424/46880276 avg/max bytes residency (9 samples),
138M in use, 0.00 INIT (0.00 elapsed), 1.14 MUT (1.13 elapsed), 1.88 GC (2.00 elapsed) :ghc>>
+ 0:03.16
./IntMapTest 256000 +RTS -t
-710684043813
<<ghc: 172126372 bytes, 329 GCs, 5552955/10483896 avg/max bytes residency (9 samples),
31M in use, 0.00 INIT (0.00 elapsed), 0.51 MUT (0.51 elapsed), 0.62 GC (0.67 elapsed) :ghc>>
+ 0:01.18
IntMap is included for comparison. The HAMT runs about 3-5x slower than
IntMap and uses about 4x more memory (and gets correspondingly
penalized when garbage collection occurs).
My observations/suspicions are as such:
* i-Full-update essentially copies a 32-element vector, with a change to
  one element. I think I am getting brutally punished for this, in
  terms of both memory usage and runtime. What I'm curious about is
  whether this is intrinsic to the algorithm, or whether it's
  something particular to how GHC compiles it.
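For concreteness, here is a hedged sketch (not the repository's exact code) of what an update to a full 32-way node amounts to with Data.Vector; `updateFull` is an illustrative name:

```haskell
import qualified Data.Vector as V

-- Updating one child of a full 32-way node with (//) allocates a fresh
-- 32-element vector and copies the other 31 pointers across; there is no
-- in-place mutation in a persistent structure.
updateFull :: V.Vector a -> Int -> a -> V.Vector a
updateFull children ix child = children V.// [(ix, child)]
```

If this sketch matches what i-Full-update does, each insert that hits a Full node costs on the order of 32 words of fresh allocation, which would line up with its high alloc% above.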
* Bagwell suggests that each node in the Array Mapped Trie should take
  only two words. I think I'm losing another two words to storing
  bounds information in the vector, as well as the costs of boxing (though
  I can't tell if GHC is boxing or not). I've done some playing around
  with the data declaration, but its current form seems to result in
  the fastest implementation. I wonder if this is what is causing the
  4x ballooning of memory, or if it's something else. I haven't tried
  rewriting the code to use GHC's arrays.
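For readers following along, a minimal sketch of the node layout and bit-twiddling a HAMT relies on (constructor and function names here are illustrative, not necessarily the repository's, and plain lists stand in for the packed child arrays):

```haskell
import Data.Bits (popCount, shiftR, (.&.), bit)
import Data.Word (Word32)

-- A Bitmap node keeps a 32-bit occupancy bitmap plus a packed list of
-- only the present children; a Full node holds all 32 children.
data HAMT v
    = Empty
    | Leaf !Word32 v
    | Bitmap {-# UNPACK #-} !Word32 [HAMT v]
    | Full [HAMT v]

-- Extract the 5-bit subkey for a given trie level (0, 1, 2, ...).
subkey :: Word32 -> Int -> Int
subkey h level = fromIntegral ((h `shiftR` (5 * level)) .&. 31)

-- Index of a subkey's child within a Bitmap node's packed array:
-- count the occupied slots below it. (Modern GHC exposes popCount in
-- Data.Bits, which can compile to a hardware POPCNT instruction.)
sparseIndex :: Word32 -> Int -> Int
sparseIndex bitmap sk = popCount (bitmap .&. (bit sk - 1))
```

The bitmap-plus-popcount trick is what keeps sparse nodes from paying for 32 slots each, so any per-node overhead the array representation adds (bounds words, boxing) eats directly into that saving.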
* Increasing the heap size seems to generally improve the performance of
  the HAMT; a 1G heap on a 512K sample set reduces the difference to
  about 1.5-2x slower. It's hard to tell if the memory allocation
  system is working for or against me.
What bothers me is that even if I could magically wave away all the GC
time in the HAMT, it would only barely beat IntMap. Maybe the algorithm
is simply not as good as a good ol' Patricia trie; but then again, maybe
not.
Comments and suggestions greatly appreciated.
Thanks,
Edward