[Haskell-cafe] [darcs #222] performance problem with massive
changes in a single file
David Roundy via RT
bugs at darcs.net
Sun Feb 20 09:31:16 EST 2005
I'm cc'ing haskell-cafe on this darcs bug, since I imagine there might
be a student (or professor) hanging around there who might be interested
in the implementation of an efficient LCS algorithm in haskell.
[markjugg - Sun Feb 20 09:15:57 2005]:
> Typically in software, 1000's of line aren't changed at the time, which
> is why this hasn't been an big issue before.
Indeed. The issue here is that darcs uses the exact Hunt-Szymanski LCS
algorithm when it computes the diff, while GNU diff uses an approximate
--but faster--algorithm, which works "almost" as well. The approximate
algorithm is reasonably well-documented, but more complex to implement
than the exact one, just because it's harder to understand.
This would be a reasonable project for an interested haskell CS student.
The LCS is implemented in pure haskell 98, and is of signature
lcs :: Ord a => [a] -> [a] -> [a]
It's an interesting problem that's easily testable, and has different
scaling behavior in different limits (finite alphabet, infinite
alphabet, permutations of unique objects).
David (ever optimistic for new contributors to darcs!)
More information about the Haskell-Cafe