[Haskell-cafe] Re: Memoizing longest-common-subsequence
jupdike at gmail.com
Tue Aug 22 15:08:47 EDT 2006
> (Sorry for the late reply; have been on holiday.)
No problem. Your email system was kind enough to say when you'd be back :-)
> I've used it to diff fairly large files (hundreds of K's, if not Megs)
> where there were few differences. It seemed to perform OK, and in cases
> where GNU diff (or whatever comes with MSYS) failed.
Thanks for posting the code. It works on pretty large data sets (for
example, a thousand Strings each), and I have a hunch that if I use
Data.ByteString it would even work fast enough on my quarter-megabyte text
files (split on words, ~40,000 and ~50,000 words each) to use in place
of GNU sdiff or diff. Did you use FastPackedString or ByteString to
get the performance you alluded to?
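To make the word-level idea concrete, here is a minimal sketch of what I
have in mind. It is not the code you posted: it is the textbook
quadratic LCS recurrence, memoized through a lazy array, and the name
lcsLength is just something made up for illustration. It only assumes
Data.ByteString.Char8 (for pack and words) and Data.Array.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Array

-- Length of the longest common subsequence of two word lists.
-- The table is a lazy array whose cells refer to each other, so every
-- (i, j) entry is computed at most once (O(n*m) time and space).
-- Illustrative sketch only; lcsLength is a hypothetical name.
lcsLength :: [B.ByteString] -> [B.ByteString] -> Int
lcsLength xs ys = table ! (0, 0)
  where
    n  = length xs
    m  = length ys
    xa = listArray (0, n - 1) xs
    ya = listArray (0, m - 1) ys
    -- One extra row and column hold the base case (empty suffix).
    table = array ((0, 0), (n, m))
      [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. m] ]
    cell i j
      | i == n || j == m = 0
      | xa ! i == ya ! j = 1 + table ! (i + 1, j + 1)
      | otherwise        = max (table ! (i + 1, j)) (table ! (i, j + 1))
```

For example, lcsLength (B.words (B.pack "the quick brown fox"))
(B.words (B.pack "the lazy brown fox")) gives 3. The full n*m table is
fine for inputs of around a thousand words each, but it would be far too
large for my ~40,000 by ~50,000 word files, which is why a Myers-style
diff looks like the better fit there.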
I'll return with the results of my experiments with ByteString and
Diff, although I imagine it should be pretty fast since darcs is able
to get acceptable speed on large datasets using (I think) ByteString
and a Haskell implementation of Myers diff.