[Haskell-beginners] Longest common prefix of a list of lists

Daniel Fischer daniel.is.fischer at googlemail.com
Fri Apr 29 21:15:13 CEST 2011


On Friday 29 April 2011 20:17:27, Aai wrote:
> I don't know for sure if the following is elegant or not, but:

In a way, yes. It's a composition of library functions, so you delegate the 
explicit recursion to those. There are a few points where you can increase 
elegance and performance here, see inline comments.

But it's very inefficient and the suggestions don't change that.

> 
> a = "aaabbbbbbcccccc"
> b = "aaabbbccccccd"
> c = "aaabbbbbbbbbccccffffdd"
> 
> takeCommonPrefix xs =
>   flip take xs. length . filter (`isPrefixOf`xs) . drop 1. inits

Don't use filter here, as soon as you hit an initial segment which isn't a 
prefix of xs, no later one will be, so it's a job for takeWhile.

Also, you already have the common prefixes, so why count and take from xs?
Just take the longest:

takeCommonPrefix xs = last . takeWhile (`isPrefixOf` xs) . inits

> 
> *Main> foldl1 takeCommomPrefix  [a,b,c]
> "aaabbb"
> 
> *Main> foldl1 takeCommonPrefix  [a,b,c,""]
> ""
> 
> May be the use of 'length' is a bit ugly. :-)

Yes, and unneeded. But perforance-wise, that is not what hurts.

The trouble is, if xs and ys have a longest common prefix of length p, you 
check whether (take k ys) is a prefix of xs for k <- [0 .. p+1], so that is 
an O(p^2) algorithm. Also, you can't start delivering until the longest 
common prefix has been determined, which may cause high memory requirements 
and bad performance.

Consider

a = replicate 1000000 'a'
b = replicate 1000001 'a'
c = "abacab"

foldl1 takeCommonPrefix [a,b,c] =
    takeCommonPrefix ab c
      where
        ab = takeCommonPrefix a b

If the common prefix is delivered incrementally, all we need is the first 
two characters to determine the result. That's fast and needs very little 
memory. If the common prefix has to be entirely determined before it can be 
delivered, you need to have two one-million characters long Strings in 
memory at the same time, which needs a couple dozen MB, and to determine 
the longest common prefix of a and b by the above method, you need about 
5*10^11 Char comparisons, so it'll need a bit of time too.

> 
> Hallo Daniel Fischer, je schreef op 29-04-11 16:32:
> > -- to get the common prefix of two lists, we use explicit recursion, I
> > don't see an elegant way to avoid that.



More information about the Beginners mailing list