Request for code review - Knuth Morris Pratt for Data.Sequence

Justin Bailey jgbailey at gmail.com
Tue Sep 4 12:57:11 EDT 2007


Using the code developed for ByteStrings by myself, Christ Kuklewicz
and Daniel Fischer, I've implemented Knuth-Morris-Pratt substring
searching on Data.Sequence "Seq" values. Attached you'll find the
library in kmp.zip.safe. The algorithm is implemented in the module
Data.Sequence.KMP.

At the root, SpeedTest.hs can be compiled on Windows with the
"prof_compile.bat" file included (you'll need to install the
"regex-dfa" and "regex-base" packages from
Hackage to build). SpeedTest searches for a known value in the 7 MB
file "endo.dna" (which can be downloaded from
http://www.icfpcontest.org/endo.zip) using several different
algorithms and methods: strict and lazy bytestrings, regular
expressions, and the KMP algorithm for "Seq" values.

SpeedTest is pretty fast but I worry about its space usage. It may
just be the nature of Seq values, but I cannot get it to run in
constant space when I think it should be. I'm especially interested in
help here.

All comments and feedback are welcome. Since the zip file includes a
darcs context file, feel free to send patches.

Justin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kmp.zip.safe
Type: application/octet-stream
Size: 9785 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/libraries/attachments/20070904/d680edf8/kmp.zip.obj


More information about the Libraries mailing list