[Haskell-cafe] Substring replacements

Branimir Maksimovic bmaxa at hotmail.com
Thu Dec 15 02:24:52 EST 2005

>From: ajb at spamcop.net
>To: haskell-cafe at haskell.org
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Thu, 15 Dec 2005 00:25:19 -0500
>G'day all.
>Quoting Branimir Maksimovic <bmaxa at hotmail.com>:
> > After seeing that your program is fastest (I've also tried one from
> > http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
> > that good in converting to search replace?)
>You probably did it right, but you could post your version to the
>list if you want me to take a look.
Oh, here it is but just don;t laugh :)
I've hacked with unsafePerformIO as I din't know
how to remove IO from match any other way.

searchReplaceKMP :: String->String->String -> String
searchReplaceKMP sr rp s
    | not (null remaining) = found++rp
                             ++ searchReplaceKMP sr rp remaining
    | otherwise = found
        (found,remaining) = unsafePerformIO $ matchKMP sr s

matchKMP :: (Monad m, Eq a) => [a] -> ([a] -> m ([a],[a]))
matchKMP []
    = error "Can't match empty list"
matchKMP xs
    = matchfunc []
        matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs))
        dofail = \ps xs -> case xs of
                                [] -> fail "can't match"
                                (y:ys) -> matchfunc (y:ps) ys

type PartialMatchFunc m a = [a] -> [a] -> m ([a], [a])

makeMatchFunc :: (Monad m, Eq a) => [PartialMatchFunc m a] -> [(a, Int)]
                -> PartialMatchFunc m a
makeMatchFunc prev []
    = \ps xs -> return (reverse (drop ((length prev)-1) ps), xs)
makeMatchFunc prev ((x,failstate):ms)
    = thisf
        mf = makeMatchFunc (thisf:prev) ms
        failcont = prev !! (length prev - failstate - 1)
        thisf = \ps xs -> case xs of
                                [] -> fail "can't match"
                                (y:ys) -> if (x == y) then mf (y:ps) ys
                                                else failcont ps xs

overlap :: (Eq a) => [a] -> [Int]
overlap str
    = overlap' [0] str
        overlap' prev []
          = reverse prev
        overlap' prev (x:xs)
          = let get_o o
                 | o <= 1 || str !! (o-2) == x = o
                 | otherwise = get_o (1 + prev !! (length prev - o + 1))
                in overlap' (get_o (head prev + 1):prev) xs

These are timings (it's performance is about the same as Rabin-Karp):

$ time searchr.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
searchr.exe: user error (can't match)

real    0m22.187s
user    0m0.015s
sys     0m0.015s

bmaxa at MAXA ~/tutorial
$ ghc -fglasgow-exts  -O2 searchr.hs --make  -o searchr.exe
Chasing modules from: searchr.hs
Compiling Main             ( searchr.hs, searchr.o )
Linking ...

bmaxa at MAXA ~/tutorial
$ time searchr.exe
Working very long

real    0m8.110s
user    0m0.031s
sys     0m0.016s

>When I wrote the RunTimeCompilation code, it wasn't intended to be a
>shining example of efficiency, merely an illustration.  Remember
>that it's doing TWO things: compiling the pattern to code, and then
>performing the search.  The compilation phase is likely to be much
>slower than the search, so the speedup (if any!) would only be realised
>the SECOND time that you searched a string using the same pattern.
>(Assuming you re-used the compiled match code, of course!)

Oh, that explaines it. Actually this has to be converted to searchReplace
in order to be fast, but I don;t know how (yet) as your program
is pretty complicated to my humble Haskell skills.
I think that your technique can be usefull with Aho-Corasick algorithm
as it first constructs finite automaton from tree, then performs search.
So, I'll guess I'll try first Boyer-Moore, then Aho-Corasick, eventually run
time compilation, but this is too advanced for me for now.

Greetings, Bane.

Express yourself instantly with MSN Messenger! Download today it's FREE! 

More information about the Haskell-Cafe mailing list