[Haskell-cafe] Substring replacements

Branimir Maksimovic bmaxa at hotmail.com
Wed Dec 14 04:16:36 EST 2005




>From: Daniel Fischer <daniel.is.fischer at web.de>
>To: "Branimir Maksimovic" <bmaxa at hotmail.com>
>CC: Haskell-Cafe at haskell.org
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Tue, 13 Dec 2005 11:23:29 +0100
>

After seeing that your program is fastest (I've also tried one from
http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
that good in converting to search replace?) I've decided to
try with Rabin-Karp algorithm.
This algorithm performs same operation as straightforward search,
but compares hashes instead of chars.
With ability to rotate hash (remove first, add next) characters
there is also optimisation, that hash is calculated only for single
next character rather again for whole substring.
Unfortunatelly on my machine it is very cheap to compare
characters so with my test hashing overweights character compare,
except in your test when hash searching is faster then straightforward
search.

This is best I can write in terms of performance and readability.
I've tried with getFst that returns Maybe but it was slower so I decided
to return '\0' in case that argument is empty list, which renders '\0'
unusable, but then I really doubt that 0 will be used in strings.

-- Rabin-Karp string search algorithm, it is very effective in searching of 
set
-- of patterns of length n on same string
-- this program is for single pattern search, but can be crafted
-- for multiple patterns of length m

hSearchReplace :: String -> String -> String -> String
hSearchReplace sr rp xs
    | not (null remaining) = found ++ rp
                             ++ hSearchReplace sr rp (drop (length sr) 
remaining)
    | otherwise = found
    where
    (found,remaining) = hSearch sr xs

hSearch :: String -> String -> (String,String)
hSearch sr xs = hSearch' sr xs hcmp ""
    where
        hsrch = hash sr
        hcmp = hash $ take ls xs
        cmp = take ls xs
        ls = length sr

        hSearch' [] xs _ _= (xs,[])
        hSearch' sr [] _ fndFail = (reverse fndFail,[])
        hSearch' srch xxs@(x:xs) hcmps fndFail
            = if hsrch == hcmps
                 then if isPrefixOf srch xxs
                         then (reverse fndFail,xxs)
                         else searchAgain
                 else searchAgain
            where
            searchAgain
             = hSearch' srch xs
                        (hashRotate (getFst xxs) (getFst nextxxs) (ls-1) 
hcmps)
                        (x:fndFail)
            nextxxs = drop ls xxs

getFst :: String -> Char
getFst [] = '\0'
getFst (a:as) = a

hash :: String -> Int
hash str
    =  hash' str (length str - 1)
    where
    hash' :: String -> Int -> Int
    hash' [] _ = 0
    hash' (s:str) pow  = (101 ^ pow) *(fromEnum s)
                         + hash' str (pow-1)

hashRotate :: Char -> Char -> Int -> Int -> Int
hashRotate cout cin pow hsh
    = (hsh - ((101 ^ pow) * (fromEnum cout)))*101
      + (fromEnum cin)



Greetings, Bane.

_________________________________________________________________
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/



More information about the Haskell-Cafe mailing list