[Haskell-cafe] ANN: bytestring-trie 0.1.0
wren ng thornton
wren at freegeek.org
Sat Dec 20 01:06:10 EST 2008
--------------------------------------------
-- Announcing: bytestring-trie 0.1.0 (beta)
--------------------------------------------
An efficient finite map from (byte)strings to values.
The implementation is based on big-endian patricia trees, like
Data.IntMap. We first trie on the Word8 elements of a Data.ByteString,
sharing string prefixes where possible, and then trie on the big-endian
bit representation of those elements. Patricia trees have efficient
algorithms for union and other merging operations, but they're also
quick for lookups and insertions.
--------------------------------------------
-- Future Extensions
--------------------------------------------
* I've done spot checking to make sure it works, but haven't constructed
an extensive test suite. Does anyone know of a good data set already out
there for testing corner cases? I'm sure other dictionary writers have
come up with one.
* A Word8 is much smaller than the architecture's natural Word size.
Therefore it'd be more efficient for the trie on bits to read off a
whole Word at a time. This work is in progress, but I need help from
people with 64-bit and big-endian machines in order to verify the code
works on those architectures.
* Using ByteStrings allows for trieing on any packed byte representation
of a value, but they're not Strings. Integration with
Data.ByteString.Char8, utf8-string, and utf8-light should happen.
* The current implementation also only accepts strict ByteStrings. When
chopping up strings to share prefixes we share the smaller string. For
very long strings when many deletions are expected, this can still hold
onto more memory than necessary. Accepting lazy ByteStrings or adding
heuristics for when to copy prefixes instead of sharing will fix this.
--------------------------------------------
-- Links
--------------------------------------------
Homepage:
http://code.haskell.org/~wren/
Hackage:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie
Darcs:
http://code.haskell.org/~wren/bytestring-trie/
Haddock (Darcs version):
http://code.haskell.org/~wren/bytestring-trie/dist/doc/html/bytestring-trie/
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list