[Haskell-beginners] Longest common prefix of a list of lists
Daniel Fischer
daniel.is.fischer at googlemail.com
Fri Apr 29 21:15:13 CEST 2011
On Friday 29 April 2011 20:17:27, Aai wrote:
> I don't know for sure if the following is elegant or not, but:
In a way, yes. It's a composition of library functions, so you delegate the
explicit recursion to those. There are a few points where you can increase
elegance and performance here, see inline comments.
But it's very inefficient and the suggestions don't change that.
>
> a = "aaabbbbbbcccccc"
> b = "aaabbbccccccd"
> c = "aaabbbbbbbbbccccffffdd"
>
> takeCommonPrefix xs =
> flip take xs. length . filter (`isPrefixOf`xs) . drop 1. inits
Don't use filter here, as soon as you hit an initial segment which isn't a
prefix of xs, no later one will be, so it's a job for takeWhile.
Also, you already have the common prefixes, so why count and take from xs?
Just take the longest:
takeCommonPrefix xs = last . takeWhile (`isPrefixOf` xs) . inits
>
> *Main> foldl1 takeCommomPrefix [a,b,c]
> "aaabbb"
>
> *Main> foldl1 takeCommonPrefix [a,b,c,""]
> ""
>
> May be the use of 'length' is a bit ugly. :-)
Yes, and unneeded. But perforance-wise, that is not what hurts.
The trouble is, if xs and ys have a longest common prefix of length p, you
check whether (take k ys) is a prefix of xs for k <- [0 .. p+1], so that is
an O(p^2) algorithm. Also, you can't start delivering until the longest
common prefix has been determined, which may cause high memory requirements
and bad performance.
Consider
a = replicate 1000000 'a'
b = replicate 1000001 'a'
c = "abacab"
foldl1 takeCommonPrefix [a,b,c] =
takeCommonPrefix ab c
where
ab = takeCommonPrefix a b
If the common prefix is delivered incrementally, all we need is the first
two characters to determine the result. That's fast and needs very little
memory. If the common prefix has to be entirely determined before it can be
delivered, you need to have two one-million characters long Strings in
memory at the same time, which needs a couple dozen MB, and to determine
the longest common prefix of a and b by the above method, you need about
5*10^11 Char comparisons, so it'll need a bit of time too.
>
> Hallo Daniel Fischer, je schreef op 29-04-11 16:32:
> > -- to get the common prefix of two lists, we use explicit recursion, I
> > don't see an elegant way to avoid that.
More information about the Beginners
mailing list