[Haskell-cafe] Haskell KMP(Knuth-Morris-Pratt) algorithm

larry.liuxinyu liuxinyu95 at gmail.com
Thu Mar 3 10:58:30 CET 2011


Hi,

Here is Richard Bird's version for reference. I changed it a bit.

data State a = E | S a (State a) (State a)

matched (S (_, []) _ _) = True
matched _ = False

kmpSearch4 :: (Eq a) => [a] -> [a] -> [Int]
kmpSearch4 ws txt = snd $ foldl tr (root, []) (zip txt [1..]) where
    root = build E ([], ws)
    build fails (xs, []) = S (xs, []) fails E
    build fails s@(xs, (y:ys)) = S s fails succs where
        succs = build' (fst (tr (fails, []) (y, 0))) (xs++[y], ys)
    tr (E, ns) _ = (root, ns)
    tr ((S (xs, ys) fails succs), ns) (x, n)
        | [x] `isPrefixOf` ys = if matched succs then (succs, ns++[n])
else (succs, ns)
        | otherwise = tr (fails, ns) (x, n)

In the program, tr is the transfer function applied to the state tree.
And build function is used to build the automaton.

Best regards.
--
LIU

On Mar 3, 5:25 pm, "larry.liuxinyu" <liuxiny... at gmail.com> wrote:
> Hi,
>
> I read about some KMP implementation in Haskell including:
>
>  [1] Richard Bird. ``Pearls of Functional algorithm design''
>  [2]http://twan.home.fmf.nl/blog/haskell/Knuth-Morris-Pratt-in-Haskell.de...
>  [3]http://www.haskell.org/haskellwiki/Runtime_compilation
>  [4] LazyString version
>
> [1] builds a infinite lazy state transfer trees, while [3] uses index
> to build overlap table.
>
> I created a version which isn't as efficient as in [1]. Just for fun:
>
> failure :: (Eq a)=> ([a], [a]) -> ([a], [a])
> failure ([], ys) = ([], ys)
> failure (xs, ys) = fallback (init xs) (last xs:ys) where
>     fallback as bs | as `isSuffixOf` xs = (as, bs)
>                    | otherwise = fallback (init as) (last as:bs)
>
> kmpSearch2 :: (Eq a) => [a] -> [a] ->[Int]
> kmpSearch2 ws txt = snd $ foldl f (([], ws), []) (zip txt [1..]) where
>     f (p@(xs, (y:ys)), ns) (x, n) | x == y = if ys==[] then ((xs++[y],
> ys), ns++[n])
>                                              else ((xs++[y], ys), ns)
>                                   | xs == [] = (p, ns)
>                                   | otherwise = f (failure p, ns) (x,
> n)
>     f (p, ns) e = f (failure p, ns) e
>
> The function failure just follows the idea that in case (xs, ys) fails
> matching some letter c in text,
> where xs++ys = pattern and c!= head ys, it means we must fallback to
> (xs', ys') so that
>   xs' = longest { s: s is prefix of xs AND s is suffix of xs }
>
> The bad thing is that failure can't memorize what it has compute
> before, for example, as pattern = "ababc"
> and we fails at ("abab", "c"), then we call function failure to get
> the new one as ("ab", "abc").
> After several matches, we fails again at ("abab", "c"), failure can't
> just return ("ab", "abc") what it has
> been compute already. It has too do the same work again.
>
> Function f inside kmpSearch2 is in fact a state-transfer function. If
> we try to use some data structure (for example tree) to memorize the
> results which failure function calculated, we can finally reach to the
> idea in [1].
>
> --
> LIUhttp://sites.google.com/site/algoxy/
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-C... at haskell.orghttp://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list