[Haskell-cafe] ANN: bytestring-trie 0.1.4

Tue Jan 13 00:00:45 EST 2009

Don Stewart wrote:
> Do you have any benchmarks comparing dictionaries against Map ByteString
> Int, or Map String Int?

I haven't personally run them, but Mark Wotton has compared 
[(ByteString,Int)] vs (Map ByteString Int) vs (Trie Int) version 0.1.1 
or 0.1.2 and using data from /usr/share/dict/words:

9:22 ~/projects/current/MapBench % ghc --make  Test.hs
../../Microbench.hs && ./Test
[1 of 2] Compiling Microbench       ( ../../Microbench.hs, 
../../Microbench.o )
[2 of 2] Compiling Main             ( Test.hs, Test.o )
Linking Test ...
* alist lookup:
   160.641ns per iteration / 6225.07 per second.
* map lookup:
   0.881ns per iteration / 1135623.22 per second.
* Trie lookup:
   0.243ns per iteration / 4116930.41 per second.

The benchmarking code requires hacking microbench to use Integers 
instead of Ints, to avoid counter overflow. I haven't yet worked the 
benchmarking code into the Darcs repo, but it's available to those 
interested. I'll add it to the repo soon.

The cost of inserting elements is higher for Trie than Map (I don't have 
the numbers). One possible reason for this is that the generic alterBy 
doesn't know whether it's inserting or deleting, so it uses smart 
constructors along the spine which adds the cost of consistency checks. 
I've planned to add a dedicated implementation for insertBy (or 
modifyBy, or some such name) in order to test this hypothesis.

The cost of merging is still unknown, but big-endian patricia trees have 
better asymptotic complexity than the balanced trees used by Map, so 
Trie is probably better there too.

Another improvement in the works is a smarter algorithm for splitting 
strings. Once it's in place, this should speed up all three of the main 
algorithms.

-- 
Live well,
~wren