[Haskell-cafe] Substring replacements
Branimir Maksimovic
bmaxa at hotmail.com
Thu Dec 15 02:24:52 EST 2005
>From: ajb at spamcop.net
>To: haskell-cafe at haskell.org
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Thu, 15 Dec 2005 00:25:19 -0500
>
>G'day all.
>
>Quoting Branimir Maksimovic <bmaxa at hotmail.com>:
>
> > After seeing that your program is fastest (I've also tried one from
> > http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
> > that good in converting to search replace?)
>
>You probably did it right, but you could post your version to the
>list if you want me to take a look.
>
Oh, here it is but just don;t laugh :)
I've hacked with unsafePerformIO as I din't know
how to remove IO from match any other way.
searchReplaceKMP :: String->String->String -> String
searchReplaceKMP sr rp s
| not (null remaining) = found++rp
++ searchReplaceKMP sr rp remaining
| otherwise = found
where
(found,remaining) = unsafePerformIO $ matchKMP sr s
matchKMP :: (Monad m, Eq a) => [a] -> ([a] -> m ([a],[a]))
matchKMP []
= error "Can't match empty list"
matchKMP xs
= matchfunc []
where
matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs))
dofail = \ps xs -> case xs of
[] -> fail "can't match"
(y:ys) -> matchfunc (y:ps) ys
type PartialMatchFunc m a = [a] -> [a] -> m ([a], [a])
makeMatchFunc :: (Monad m, Eq a) => [PartialMatchFunc m a] -> [(a, Int)]
-> PartialMatchFunc m a
makeMatchFunc prev []
= \ps xs -> return (reverse (drop ((length prev)-1) ps), xs)
makeMatchFunc prev ((x,failstate):ms)
= thisf
where
mf = makeMatchFunc (thisf:prev) ms
failcont = prev !! (length prev - failstate - 1)
thisf = \ps xs -> case xs of
[] -> fail "can't match"
(y:ys) -> if (x == y) then mf (y:ps) ys
else failcont ps xs
overlap :: (Eq a) => [a] -> [Int]
overlap str
= overlap' [0] str
where
overlap' prev []
= reverse prev
overlap' prev (x:xs)
= let get_o o
| o <= 1 || str !! (o-2) == x = o
| otherwise = get_o (1 + prev !! (length prev - o + 1))
in overlap' (get_o (head prev + 1):prev) xs
--------------------------------------------------------------------------------------
These are timings (it's performance is about the same as Rabin-Karp):
$ time searchr.exe
Working:seasearch replace able seaseasearch baker seasearch charlie
searchr.exe: user error (can't match)
real 0m22.187s
user 0m0.015s
sys 0m0.015s
bmaxa at MAXA ~/tutorial
$ ghc -fglasgow-exts -O2 searchr.hs --make -o searchr.exe
Chasing modules from: searchr.hs
Compiling Main ( searchr.hs, searchr.o )
Linking ...
bmaxa at MAXA ~/tutorial
$ time searchr.exe
Working very long
True
Done
real 0m8.110s
user 0m0.031s
sys 0m0.016s
>When I wrote the RunTimeCompilation code, it wasn't intended to be a
>shining example of efficiency, merely an illustration. Remember
>that it's doing TWO things: compiling the pattern to code, and then
>performing the search. The compilation phase is likely to be much
>slower than the search, so the speedup (if any!) would only be realised
>the SECOND time that you searched a string using the same pattern.
>(Assuming you re-used the compiled match code, of course!)
Oh, that explaines it. Actually this has to be converted to searchReplace
in order to be fast, but I don;t know how (yet) as your program
is pretty complicated to my humble Haskell skills.
I think that your technique can be usefull with Aho-Corasick algorithm
as it first constructs finite automaton from tree, then performs search.
So, I'll guess I'll try first Boyer-Moore, then Aho-Corasick, eventually run
time compilation, but this is too advanced for me for now.
Greetings, Bane.
_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE!
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
More information about the Haskell-Cafe
mailing list