[Haskell-cafe] Substring replacements (was: Differences in optimisiation ...)

Sat Dec 10 16:56:10 EST 2005

Am Samstag, 10. Dezember 2005 02:51 schrieben Sie:
> From: Daniel Fischer <daniel.is.fischer at web.de>
>
> >To: "Branimir Maksimovic" <bmaxa at hotmail.com>
> >CC: Haskell-Cafe at haskell.org
> >Subject: Re: Differences in optimisiation with interactive and compiled mo
> >Date: Fri, 9 Dec 2005 23:27:00 +0100
> >
> >Still doesn't work, though:
> >
> >*Main> searchr "hahal" "jupp" "hahahalala"
> >"hahahalala"
> >
> >The problem is that the string to replace may contain a repeated pattern
> >and the pattern that begins the actual occurence might be consumed before
> > a failure is detected.
>
> Yes, I've corrected it. Now it is just 25% faster and that is only with -O2
> flag.
> Here is whole thing, I hope there are no more bugs left :) :
>
None that sprang to my eyes. However, on my machine, yours is not faster than 
Lemmih's.
Now, using the new Strings, I get the following times:
                 -O2           -O1               no opt
Lemmih's: 38.9 sec    38.9 sec    76.7 sec
Yours     : 40.1 sec     41.5 sec  131.1 sec
Mine       : 32.9 sec     33.1 sec    82.8 sec.

However, there's a problem with Lemmih's replace:

*Main> searchr "ababcab" "###" "ababcababcabab"
"###abcab"
*Main> replace "ababcab" "###" "ababcababcabab"
"ababc###ab"

due to the fact that Lemmih's version scans the input from right to left
(that's easily changed by a few reverses, though -- but costly for long 
inputs), more serious is

Prelude Main> replace "ja" "aja" "jjjjjjja"
"ajajajajajajaja".

The fastest -- and nicely simple above -- that I could come up with is

replace :: String -> String -> String -> String
replace _ _ "" = ""
replace "" _ str = str
replace src dst inp
    = process inp
      where
        n = length src
	process "" = ""
	process st@(c:cs)
	  | src `isPrefixOf` st = dst ++ process (drop n st)
	  | otherwise           = c:process cs

It's roughly 10% faster than my other version on "seasearch" ...
and if you try it on

main2 :: IO ()
main2 = let src = replicate 1000 'r'
            dst = " # "
	    str = replicate 999 'r' ++ 'c': replicate 1000 'r'
	    out = replace src dst $ concat $ replicate 500 str
	    out1 = replace src dst $ concat $ replicate 501 str
	in do putStrLn $ "Working very long"
	      putStrLn $ show (out == out1) ++ "\nDone"

you'll see a real difference. I'm not sure, why your algorithm pays a so much 
higher penalty, though. Maybe, it'll be faster if you make searchr' &c local 
functions? I'll try.

Cheers,
Daniel