[Haskell-cafe] [darcs #222] performance problem with massive
changes in a single file
David Roundy via RT
bugs at darcs.net
Sun Feb 20 09:31:16 EST 2005
I'm cc'ing haskell-cafe on this darcs bug, since I imagine there might
be a student (or professor) hanging around there who might be interested
in the implementation of an efficient LCS algorithm in haskell.
[markjugg - Sun Feb 20 09:15:57 2005]:
> Typically in software, 1000's of line aren't changed at the time, which
> is why this hasn't been an big issue before.
Indeed. The issue here is that darcs uses the exact Hunt-Szymanski LCS
algorithm when it computes the diff, while GNU diff uses an approximate
--but faster--algorithm, which works "almost" as well. The approximate
algorithm is reasonably well-documented, but more complex to implement
than the exact one, just because it's harder to understand.
This would be a reasonable project for an interested haskell CS student.
The LCS is implemented in pure haskell 98, and is of signature
lcs :: Ord a => [a] -> [a] -> [a]
It's an interesting problem that's easily testable, and has different
scaling behavior in different limits (finite alphabet, infinite
alphabet, permutations of unique objects).
David (ever optimistic for new contributors to darcs!)
http://darcs.net
More information about the Haskell-Cafe
mailing list