[Haskell-cafe] Re: Knuth Morris Pratt for Lazy Bytestrings
implementation
ChrisK
haskell at list.mightyreason.com
Wed Aug 1 15:23:58 EDT 2007
I am still working on improving your code.
And I have a "Is this a bug?" question:
The "lookup = computeLookup pat" defines lookup to take an Int which represents
the index into pat, where this index is 0 based and the 0th item is the head of pat.
Now look at "matchLength :: Int; matchLength = let ... in matchLength' (B.drop
(fromIntegral patShift) pat) str 0"
That starts the "pat" from a shorter version that starts at 'patShift' and sets
the last argument to matchLength' to 0.
matchLength' is in the ... as:
let matchLength' pat str cnt | B.null pat = cnt
matchLength' pat str cnt | B.null str = 0
matchLength' pat str cnt | (B.head pat == B.head str) =
matchLength' (B.drop 1 pat) (B.drop 1 str) (cnt + 1)
| otherwise = cnt
I see that cnt is incremented until
* all of pat is matched, return cnt
* all of str is used, return 0
* a mismatch is found, return cnt
So a mismatch means that the character at index (patShift+cnt) in the original
'pat' had the mismatch.
Now I see this that uses the 'cnt' returned from matchLength:
shiftAmt = {-# SCC "kmpMatch_shiftAmt" #-} lookup matchLength
And this is the bug. lookup should be given (shiftAmt + matchLength).
This was not clear or obvious since
* Nearly everything is Int or Int64
* The name "pat" in matchLength' shadows "pat" in kmpMatch
This was not caught by quickcheck since it is an internal value and only rarely
will trigger an error where a legitimate match is missed. If there is no match
then this bug does not cause an incorrect answer (I think).
I will fix this and continue hacking on it.
--
Chris
More information about the Haskell-Cafe
mailing list