[Haskell-cafe] ANN: TernaryTrees-0.1.1.1 - An efficient ternary
tree implementation of Sets and Maps
wren ng thornton
wren at freegeek.org
Thu Jul 2 20:59:01 EDT 2009
Don Stewart wrote:
> wren:
>> Alex Mason wrote:
>>> TernaryTrees is a package that extends Data.Set ad Data.Map with some
>>> ternary tree structures, based on the article
>>> [http://www.pcplus.co.uk/node/3074/] .
>> For the string (or rather ByteString) version:
>>
>> http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie
>>
>> Which has a number of other significant performance improvements (e.g.
>> node fusion, ByteString instead of String) and a highly expressive
>> interface. Because it uses ByteStrings it can trie any type which can be
>> serialized into a vector of bits[1], albeit indirectly.
>>
>> The real trick with tries is not in just having them[2], it's in having
>> the right interface to make use of what they're good at. For example, if
>> I have multiple tries, I'd like to merge them without doing it element
>> by element[3]. Or if I know I'm going to be making a number of similar
>> queries, it'd be nice if I could cache my position in the trie[4] to
>> avoid repeating the work for the prefixes of all my queries[5]. Using
>> tricks like these leads to significant improvements over using them like
>> hashtables; tries aren't hashtables just like lists aren't arrays.
>
> Do you have benchmarks?
Somewhere in my email archive (care of Mark Wotton). I'll see if I can
dig them up this weekend. The biggest issue here is finding nice
datasets (and tasks) to give reasonable benchmarks for. Reading in all
of /usr/dict (or the Brown corpus) and looking up all keys only gives
one perspective (or two), and not necessarily the most helpful one for
"real world" use. I haven't found any good dataset/task suites like
there are for the Language Benchmarks Game, though I'd love to hear
about one.
The tries /= hashtables comment stems from discussions on various
haskell blogs with people inventing their own (or wanting to benchmark
Data.Map vs hashtables vs tries vs bloomfilters). As a drop-in
replacement tries will perform adequately, but they're nothing
overwhelming; the overwhelming comes from changing the usage algorithms
to match the "stride" of the datastructure. I don't think I have links
to these discussions anymore to pull up code examples.
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list