diff in Haskell: clarification
Andrew J Bromage
ajb@spamcop.net
Fri, 22 Nov 2002 13:49:13 +1100
G'day all.
On Fri, Nov 22, 2002 at 05:13:07AM +1100, Fergus Henderson wrote:
> Would a Mercury version help? The Mercury distribution includes a
> Mercury version of Myer's algorithm: it's in the directory `samples/diff'.
Disclaimer: I wrote the Mercury version.
That particular algorithm heavily relies on destructively updated
arrays, which don't map neatly onto Haskell lists. In addition,
it's pretty complicated (all the caching between passes, mostly). It's
also optimised for very long sequences, which may not help you here.
Just for jollies, here's a Haskell version of Hirschberg's LCSS
algorithm. It's O(N^2) time but O(N) space at any given point in
time, assuming eager evaluation. You should be able to make diff out
of this. You should also be able to find many opportunities for
optimisation here.
@article{360861,
author = {D. S. Hirschberg},
title = {A linear space algorithm for computing maximal common subsequences},
journal = {Communications of the ACM},
volume = {18},
number = {6},
year = {1975},
issn = {0001-0782},
pages = {341--343},
doi = {http://doi.acm.org/10.1145/360825.360861},
publisher = {ACM Press},
}
Cheers,
Andrew Bromage
module Lcss ( lcss ) where
algb :: (Eq a) => [a] -> [a] -> [Int]
algb xs ys
= 0 : algb1 xs [ (y,0) | y <- ys ]
where
algb1 [] ys' = map snd ys'
algb1 (x:xs) ys'
= algb1 xs (algb2 0 0 ys')
where
algb2 _ _ [] = []
algb2 k0j1 k1j1 ((y,k0j):ys)
= let kjcurr = if x == y then k0j1+1 else max k1j1 k0j
in (y,kjcurr) : algb2 k0j kjcurr ys
algc :: (Eq a) => Int -> Int -> [a] -> [a] -> [a] -> [a]
algc m n xs [] = id
algc m n [x] ys = if x `elem` ys then (x:) else id
algc m n xs ys
= algc m2 k xs1 (take k ys) . algc (m-m2) (n-k) xs2 (drop k ys)
where
m2 = m `div` 2
xs1 = take m2 xs
xs2 = drop m2 xs
l1 = algb xs1 ys
l2 = reverse (algb (reverse xs2) (reverse ys))
k = findk 0 0 (-1) (zip l1 l2)
findk k km m [] = km
findk k km m ((x,y):xys)
| x+y >= m = findk (k+1) k (x+y) xys
| otherwise = findk (k+1) km m xys
lcss :: (Eq a) => [a] -> [a] -> [a]
lcss xs ys = algc (length xs) (length ys) xs ys []