[Haskell-cafe] Finding longest common prefixes in a list
Twan van Laarhoven
twanvl at gmail.com
Sat Jan 21 14:18:28 CET 2012
On 2012-01-20 23:44, Gwern Branwen wrote:
> On Fri, Jan 20, 2012 at 1:57 PM, Twan van Laarhoven<twanvl at gmail.com> wrote:
>> Here is some example code (untested):
>
> Well, you're right that it doesn't work. I tried to fix the crucial
> function, 'atLeastThisManyDescendants', but it's missing something
> because varying parts doesn't much affect the results when I try it
> out on example input - it either returns everything or nothing, it
> seems:
> atLeastThisManyDescendants :: Int -> Trie a -> [CommonPrefix a]
> atLeastThisManyDescendants minD trie@(Trie l d t')
> | d < minD = []
> | null forChildren = [Prefix [] trie]
> | otherwise = forChildren
> where
> forChildren = [ Prefix (x:pfx) nms
> | (x,t) <- Map.toList t'
> , Prefix pfx nms <- atLeastThisManyDescendants l t ]
It should be "atLeastThisManyDescendants minD t", minD is a threshold for the
minimum numer of descendants, and it stays the same in the recursive call.
That's what you get for not testing your code :)
With the correct function I get a result like:
*Main> mapM_ (print . prefix) $ atLeastThisManyDescendants 4 test1
"gumi-"
"luka-"
"miku-a"
"miku-h"
"miku-m"
"miku-n"
"miku-p"
"miku-r"
"miku-s"
"miku-t"
"rin-"
Notice that there are lots of "miku-X" prefixes found. This is probably not what
you want. What exactly do you want the algorithm to do? For example, "" is
obviously a prefix of every string, but it is not very long. On the other hand,
each string is a prefix of itself, but that prefix is shared by only one string
(usually).
By the way, the sort and compare adjacent pairs approach corresponds to
"atLeastThisManyDescendants 2".
Twan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 2012-01-21 common-prefix.hs
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120121/b04c9466/attachment.asc>
More information about the Haskell-Cafe
mailing list