[Haskell-cafe] Re: Memoizing longest-common-subsequence

Bayley, Alistair Alistair_Bayley at invescoperpetual.co.uk
Wed Aug 23 04:06:50 EDT 2006

> From: Jared Updike [mailto:jupdike at gmail.com] 
> Thanks for posting the code. It works on pretty large data sets (for
> example, a thousand Strings each) and I have a hunch that if I use
> Data.ByteString it would even work fast enough on my quarter meg text
> files (split on words, ~40,000 and ~50,000 words each) to use in place
> of GNU sdiff or diff. Did you use FastPackedString or ByteString to
> get performance you alluded to?

You'll notice that the primary interface is:
  diff :: Eq a => [a] -> [a] -> [Modification a]

So it's not optimised for any kind of String input.

The performance I'm alluding to is not something I've tested with any rigor, or even metrics to support. I simply needed to compare a couple of large text files which GNU diff didn't handle (I think when the input is over a certain size it breaks it into chunks and diffs the chunks). I tried this code, and it did the job in a reasonable time (my PC has 1G of memory, and that helps when the input lists have to reside entirely in memory). That was using naïve String-IO too, so there might well be some mileage in a FPS-specific implementation.

I profiled the code by generating large inputs in memory, and then invoking the various interfaces, so as to isolate the testing from IO performance. And that was just to find the performance hotspots in the algorithm, not to compare against other diff implementations.

Confidentiality Note: The information contained in this message,
and any attachments, may contain confidential and/or privileged
material. It is intended solely for the person(s) or entity to
which it is addressed. Any review, retransmission, dissemination,
or taking of any action in reliance upon this information by
persons or entities other than the intended recipient(s) is
prohibited. If you received this in error, please contact the
sender and delete the material from any computer.

More information about the Haskell-Cafe mailing list