[Haskell-cafe] Substring replacements
Daniel Fischer
daniel.is.fischer at web.de
Thu Dec 22 05:24:24 EST 2005
Am Mittwoch, 21. Dezember 2005 15:20 schrieb Daniel Fischer:
> > >i'm 90% sure that straightforward method must be faster for one-time
> > >searches.
>
> I'm far less sure of that. If you have a really short search-pattern, I
> think that probably straightforward search is indeed faster (at least, if
> the search-pattern isn't supplied at compile-time). But if you have a
> pattern of length 100, say, and lots of relatively long partial matches,
> chances are that KMP will move on something like 50 chars at once, which
> would have to be re-checked by straightforward. I've no idea, when that
> would outweigh the extra time of building the KMP-arrays, though. In my
> test -- extremely unrealistic, but provides a more or less worst case
> example --
> straightforward must make roughly half a million character comparisons to
> pass each 'c', while KMP does 2000 (+/-2) (one way through the array and
> back), so there are situations where KMP definitely is faster. But
> ordinarily, on my computer, your version of straightforward is 10-15%
> faster than KMP (search-patterns are now supplied on the command line --
> which has no big impact; searched string is " able sea..."; all compiled
> with -O2; NOINLINE makes no difference -- at least with optimisations on
> --; without optimisations, KMP suffers far worse than straightforward).
>
I've now tested with
main = do args <- getArgs
n <- case args of
(ar:_) -> readIO ar `catch` (\_ -> return 400)
_ -> return 500
let src = replicate n 'r'
dst = " # "
str = replicate (n-1) 'r' ++ 'c': replicate n 'r'
k = if n < 200 then ceiling (5e6 / fromIntegral n)
else ceiling (1e7 / fromIntegral n)
out = searchReplace src dst $ concat $ replicate k str
putStr "Working with "
print n
print $ length out
putStrLn "Done"
and KMP wins for n == 1 and n >= 12, also for " able seasea...", KMP wins for
search-patterns of length 1 and patterns with no partial matches (and some
others), but generally -- on my thing -- Bulat's version wins.
Cheers,
Daniel
More information about the Haskell-Cafe
mailing list