[Haskell-cafe] ANN: bytestring-trie 0.1.4
wren ng thornton
wren at freegeek.org
Tue Jan 13 00:00:45 EST 2009
Don Stewart wrote:
> Do you have any benchmarks comparing dictionaries against Map ByteString
> Int, or Map String Int?
I haven't personally run them, but Mark Wotton has compared
[(ByteString,Int)] vs (Map ByteString Int) vs (Trie Int) version 0.1.1
or 0.1.2 and using data from /usr/share/dict/words:
9:22 ~/projects/current/MapBench % ghc --make Test.hs
../../Microbench.hs && ./Test
[1 of 2] Compiling Microbench ( ../../Microbench.hs,
../../Microbench.o )
[2 of 2] Compiling Main ( Test.hs, Test.o )
Linking Test ...
* alist lookup:
160.641ns per iteration / 6225.07 per second.
* map lookup:
0.881ns per iteration / 1135623.22 per second.
* Trie lookup:
0.243ns per iteration / 4116930.41 per second.
The benchmarking code requires hacking microbench to use Integers
instead of Ints, to avoid counter overflow. I haven't yet worked the
benchmarking code into the Darcs repo, but it's available to those
interested. I'll add it to the repo soon.
The cost of inserting elements is higher for Trie than Map (I don't have
the numbers). One possible reason for this is that the generic alterBy
doesn't know whether it's inserting or deleting, so it uses smart
constructors along the spine which adds the cost of consistency checks.
I've planned to add a dedicated implementation for insertBy (or
modifyBy, or some such name) in order to test this hypothesis.
The cost of merging is still unknown, but big-endian patricia trees have
better asymptotic complexity than the balanced trees used by Map, so
Trie is probably better there too.
Another improvement in the works is a smarter algorithm for splitting
strings. Once it's in place, this should speed up all three of the main
algorithms.
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list