[Haskell-cafe] Finding longest common prefixes in a list

Sat Jan 21 14:18:28 CET 2012

On 2012-01-20 23:44, Gwern Branwen wrote:
> On Fri, Jan 20, 2012 at 1:57 PM, Twan van Laarhoven<twanvl at gmail.com>  wrote:
>> Here is some example code (untested):
>
> Well, you're right that it doesn't work. I tried to fix the crucial
> function, 'atLeastThisManyDescendants', but it's missing something
> because varying parts doesn't much affect the results when I try it
> out on example input - it either returns everything or nothing, it
> seems:
> atLeastThisManyDescendants :: Int -> Trie a -> [CommonPrefix a]
> atLeastThisManyDescendants minD trie@(Trie l d t')
>    | d < minD = []
>    | null forChildren = [Prefix [] trie]
>    | otherwise = forChildren
>  where
>    forChildren = [ Prefix (x:pfx) nms
>                  | (x,t) <- Map.toList t'
>                  , Prefix pfx nms <- atLeastThisManyDescendants l t ]

It should be "atLeastThisManyDescendants minD t", minD is a threshold for the 
minimum numer of descendants, and it stays the same in the recursive call.

That's what you get for not testing your code :)

With the correct function I get a result like:

     *Main> mapM_ (print . prefix) $ atLeastThisManyDescendants 4 test1
     "gumi-"
     "luka-"
     "miku-a"
     "miku-h"
     "miku-m"
     "miku-n"
     "miku-p"
     "miku-r"
     "miku-s"
     "miku-t"
     "rin-"

Notice that there are lots of "miku-X" prefixes found. This is probably not what 
you want. What exactly do you want the algorithm to do? For example, "" is 
obviously a prefix of every string, but it is not very long. On the other hand, 
each string is a prefix of itself, but that prefix is shared by only one string 
(usually).

By the way, the sort and compare adjacent pairs approach corresponds to 
"atLeastThisManyDescendants 2".

Twan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 2012-01-21 common-prefix.hs
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120121/b04c9466/attachment.asc>