[Haskell-cafe] Haskell KMP(Knuth-Morris-Pratt) algorithm

larry.liuxinyu liuxinyu95 at gmail.com
Thu Mar 3 10:25:07 CET 2011


I read about some KMP implementation in Haskell including:

 [1] Richard Bird. ``Pearls of Functional algorithm design''
 [2] http://twan.home.fmf.nl/blog/haskell/Knuth-Morris-Pratt-in-Haskell.details
 [3] http://www.haskell.org/haskellwiki/Runtime_compilation
 [4] LazyString version

[1] builds a infinite lazy state transfer trees, while [3] uses index
to build overlap table.

I created a version which isn't as efficient as in [1]. Just for fun:

failure :: (Eq a)=> ([a], [a]) -> ([a], [a])
failure ([], ys) = ([], ys)
failure (xs, ys) = fallback (init xs) (last xs:ys) where
    fallback as bs | as `isSuffixOf` xs = (as, bs)
                   | otherwise = fallback (init as) (last as:bs)

kmpSearch2 :: (Eq a) => [a] -> [a] ->[Int]
kmpSearch2 ws txt = snd $ foldl f (([], ws), []) (zip txt [1..]) where
    f (p@(xs, (y:ys)), ns) (x, n) | x == y = if ys==[] then ((xs++[y],
ys), ns++[n])
                                             else ((xs++[y], ys), ns)
                                  | xs == [] = (p, ns)
                                  | otherwise = f (failure p, ns) (x,
    f (p, ns) e = f (failure p, ns) e

The function failure just follows the idea that in case (xs, ys) fails
matching some letter c in text,
where xs++ys = pattern and c!= head ys, it means we must fallback to
(xs', ys') so that
  xs' = longest { s: s is prefix of xs AND s is suffix of xs }

The bad thing is that failure can't memorize what it has compute
before, for example, as pattern = "ababc"
and we fails at ("abab", "c"), then we call function failure to get
the new one as ("ab", "abc").
After several matches, we fails again at ("abab", "c"), failure can't
just return ("ab", "abc") what it has
been compute already. It has too do the same work again.

Function f inside kmpSearch2 is in fact a state-transfer function. If
we try to use some data structure (for example tree) to memorize the
results which failure function calculated, we can finally reach to the
idea in [1].


More information about the Haskell-Cafe mailing list