[Haskell-cafe] ANN: bytestring-trie 0.1.0
Sterling Clover
s.clover at gmail.com
Sun Dec 21 13:17:56 EST 2008
Thanks for working on this! A nice efficient bytestring-trie is the
sort of data structure we should have had in Haskell for some time
now, and I'm sure I'll be giving it a great deal of use.
Regards,
Sterl.
On Dec 20, 2008, at 1:06 AM, wren ng thornton wrote:
> --------------------------------------------
> -- Announcing: bytestring-trie 0.1.0 (beta)
> --------------------------------------------
>
> An efficient finite map from (byte)strings to values.
>
> The implementation is based on big-endian patricia trees, like
> Data.IntMap. We first trie on the Word8 elements of a
> Data.ByteString, sharing string prefixes where possible, and then
> trie on the big-endian bit representation of those elements.
> Patricia trees have efficient algorithms for union and other
> merging operations, but they're also quick for lookups and insertions.
>
>
> --------------------------------------------
> -- Future Extensions
> --------------------------------------------
>
> * I've done spot checking to make sure it works, but haven't
> constructed an extensive test suite. Does anyone know of a good
> data set already out there for testing corner cases? I'm sure other
> dictionary writers have come up with one.
>
> * A Word8 is much smaller than the architecture's natural Word
> size. Therefore it'd be more efficient for the trie on bits to read
> off a whole Word at a time. This work is in progress, but I need
> help from people with 64-bit and big-endian machines in order to
> verify the code works on those architectures.
>
> * Using ByteStrings allows for trieing on any packed byte
> representation of a value, but they're not Strings. Integration
> with Data.ByteString.Char8, utf8-string, and utf8-light should happen.
>
> * The current implementation also only accepts strict ByteStrings.
> When chopping up strings to share prefixes we share the smaller
> string. For very long strings when many deletions are expected,
> this can still hold onto more memory than necessary. Accepting lazy
> ByteStrings or adding heuristics for when to copy prefixes instead
> of sharing will fix this.
>
>
> --------------------------------------------
> -- Links
> --------------------------------------------
>
> Homepage:
> http://code.haskell.org/~wren/
>
> Hackage:
> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/
> bytestring-trie
>
> Darcs:
> http://code.haskell.org/~wren/bytestring-trie/
>
> Haddock (Darcs version):
> http://code.haskell.org/~wren/bytestring-trie/dist/doc/html/
> bytestring-trie/
>
> --
> Live well,
> ~wren
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list