[Haskell-cafe] Memoizing longest-common-subsequence

Tue Aug 1 08:04:21 EDT 2006

I wanted a longest common subsequence function and a bit of Googling
failed to turn up a functional one, except for in a scary bit of darcs.
So, I tried making a memoized functional version of the LCS delta
algorithm on the problem's Wikipedia page. It's not the fastest, but
it's simple and should be polynomial, growing with the product of input
sequence lengths. I've not played with unboxing or strictness here yet,
but with the help of Data.Array I have:

longestCommonSubsequence xs ys =
    let lastIndex = (length xs, length ys)
        memo = array ((0, 0), lastIndex)
                     [ ((xi, yi), deltaLCS (xi, x) (yi, y)) |
                       (xi, x) <- zip [0..] (undefined : xs),
                       (yi, y) <- zip [0..] (undefined : ys) ] 
        deltaLCS (0, _) _ = (0, [])
        deltaLCS _ (0, _) = (0, [])
        deltaLCS (xl, x) (yl, y) =
            if (x == y)
            then let (xylShrunk, xysShrunk) = memo ! (xl-1, yl-1)
                 in (xylShrunk + 1, x : xysShrunk)
            else let xaShrunk@(xlShrunk, _) = memo ! (xl-1, yl)
                     yaShrunk@(ylShrunk, _) = memo ! (xl, yl-1)
                 in if xlShrunk > ylShrunk then xaShrunk else yaShrunk
        in reverse (snd (memo ! lastIndex))

I haven't looked much at optimizing this further so there are probably
yet some easy wins to be had - I'd be interested to see what they are.
Still, I thought it worth sharing as an example of laziness making 
memoization easy in a simple functional longest common subsequence 
example.

Take this as your cue to point out the much better LCS algorithm that
already exists in the standard libraries, that I couldn't find. (-:

-- Mark