[Haskell-cafe] Re: [ANN] An efficient lazy suffix tree library

ChrisK haskell at list.mightyreason.com
Mon Aug 27 13:57:19 EDT 2007


Gleb Alexeyev wrote:
> Bryan O'Sullivan wrote:
>> I just posted a library named suffixtree to Hackage.
>>
>> http://www.serpentine.com/software/suffixtree/
>>
>> It implements Giegerich and Kurtz's lazy construction algorithm, with
>> a few tweaks for better performance and resource usage.
>>
>> API docs:
>>
>> http://darcs.serpentine.com/suffixtree/dist/doc/html/Data-SuffixTree.html
>>
>> I've tested it on multi-megabyte input strings.
>>
> I think I found a bug:
> import qualified Data.SuffixTree as T
> 
>> T.countRepeats "ab" (T.construct "abab")
> 1


That is almost certainly because the algorithm expects the source string to have
a unique character at its end.

The library should either make that clear or add such a character on its own.

Otherwise the "ab" suffix is a prefix of the "abab" suffix and the shorter one
gets lost.  If you end in "$" then "ab$" cannot merge with "abab$" and there are
no distinct suffixes a and b for which (isPrefixOf a b) is true.

Example:

> *Data.SuffixTree> countRepeats "ab" (construct "abab")
> 1
> *Data.SuffixTree> countRepeats "ab" (construct "abab$")
> 2

-- 
Chris



More information about the Haskell-Cafe mailing list