[Haskell-cafe] Substring replacements

Daniel Fischer daniel.is.fischer at web.de
Thu Dec 22 05:24:24 EST 2005


Am Mittwoch, 21. Dezember 2005 15:20 schrieb Daniel Fischer:
> > >i'm 90% sure that straightforward method must be faster for one-time
> > >searches.
>
> I'm far less sure of that. If you have a really short search-pattern, I
> think that probably straightforward search is indeed faster (at least, if
> the search-pattern isn't supplied at compile-time). But if you have a
> pattern of length 100, say, and lots of relatively long partial matches,
> chances are that KMP will move on something like 50 chars at once, which
> would have to be re-checked by straightforward. I've no idea, when that
> would outweigh the extra time of building the KMP-arrays, though. In my
> test -- extremely unrealistic, but provides a more or less worst case
> example --
> straightforward must make roughly half a million character comparisons to
> pass each 'c', while KMP does 2000 (+/-2) (one way through the array and
> back), so there are situations where KMP definitely is faster. But
> ordinarily, on my computer, your version of straightforward is 10-15%
> faster than KMP (search-patterns are now supplied on the command line --
> which has no big impact; searched string is " able sea..."; all compiled
> with -O2; NOINLINE makes no difference -- at least with optimisations on
> --; without optimisations, KMP suffers far worse than straightforward).
>

I've now tested with
main = do args <- getArgs
          n <- case args of
	         (ar:_) -> readIO ar `catch` (\_ -> return 400)
		 _      -> return 500
	  let src = replicate n 'r'
	      dst = " # "
	      str = replicate (n-1) 'r' ++ 'c': replicate n 'r'
	      k   = if n < 200 then ceiling (5e6 / fromIntegral n)
	               else ceiling (1e7 / fromIntegral n)
	      out = searchReplace src dst $ concat $ replicate k str
	  putStr "Working with "
	  print n
	  print $ length out
	  putStrLn "Done"

and KMP wins for n == 1 and n >= 12, also for " able seasea...", KMP wins for 
search-patterns of length 1 and patterns with no partial matches (and some 
others), but generally -- on my thing -- Bulat's version wins.

Cheers,
Daniel



More information about the Haskell-Cafe mailing list