[Haskell-cafe] Substring replacements
daniel.is.fischer at web.de
Fri Dec 23 11:43:33 EST 2005
I'm not sure what your point is, let's try to enlighten me.
On Wednesday, 21 December 2005 at 16:30, you wrote:
> Hello Daniel,
> Wednesday, December 21, 2005, 5:20:18 PM, you wrote:
> DF> ordinarily, on my computer, your version of straightforward is 10-15%
> DF> faster than KMP (search-patterns are now supplied on the command line
> DF> -- which has no big impact;
> of course. the search pattern can be compiled into the search
> algorithm just at the time of program execution. just for example, if
> you have code
> main = do [x] <- getArgs
>           y <- return $ map (\n -> fac $ read x) [1..10^6]
> then `fac $ read x` will be computed just one time. in the same way, if
> you perform many searches with the same pattern - the compiler must
> initialize the array just one time and use it for all searches
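The sharing described above can be sketched like this (a hypothetical, naive `search`, not Bulat's code -- for KMP, the shared value would be the failure array rather than a mere length):

```haskell
import Data.List (tails)

-- Giving 'search' the pattern first means anything bound in the 'where'
-- clause -- here just the pattern length, for KMP it would be the failure
-- array -- is computed at most once per pattern and then shared by every
-- text searched with that pattern.
search :: String -> String -> [Int]
search pat = \text -> [ i | (i, t) <- zip [0 ..] (tails text), match t ]
  where
    plen    = length pat            -- computed once per pattern
    match t = take plen t == pat    -- naive prefix comparison

main :: IO ()
main = do
  let searchAble = search "able"    -- pattern "compiled" just one time
  print (searchAble "unable table") -- [2,8]
  print (searchAble " able sea")    -- [1]
```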
Errrh, what do you mean here? If I want to replace each occurrence of a pattern
within a String, of course I want the arrays built only once; destroying and
rebuilding them would be deliberate stupidity, so that can't be your point.
And also if we interactively search and replace, as in an editor, we'd hold on
to the arrays until we get the message 'no more of that, next pattern, please'.
> DF> searched string is " able sea..."; all compiled with -O2;
> DF> NOINLINE makes no difference -- at least with optimisations on --
> why? i think that is the right way to check speed of optimized program
> without pre-compiling search pattern to search/replace algorithm at
> compile time
? What has NOINLINE to do with precompiling the search pattern?
BTW, I think the KMP algorithm is too large to be inlined anyway (unless
we explicitly asked for it), so I wouldn't expect any NOINLINE effect in the
first place.
And about RunTimeCompilation, I don't quite understand the Wiki page. What I
imagined might be the point there is that, if I know the search pattern
beforehand, I might import the general algorithm, write a function

searchReplacePattern = searchReplace "pattern"

and the arrays might then be built at compile time and included in the object
code -- but I've no idea whether compilers are clever enough to do that (I
believe it should be possible to write compilers which would, but is it
worth the trouble -- or is that sort of optimisation easy and routinely
done?) -- and why that would be called RunTimeCompilation, I cannot imagine.
So then, the thing that might make a difference, would be not to give the
search pattern at compile-time, which I did. Of course, since we search a
long String, the time needed to build the arrays is minute in relation to the
time needed to traverse the String, but in the more realistic situation of a
shorter String (50 kB instead of 48 MB), it would be significant.
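For concreteness, the per-pattern work being discussed is the KMP failure table; here is a sketch of how it can be built in O(pattern length) with a lazy self-referencing array (my own illustration, assuming a non-empty pattern -- not the implementation from this thread):

```haskell
import Data.Array

-- border ! i is the length of the longest proper border (prefix that is
-- also a suffix) of the first i+1 pattern characters. Building it costs
-- O(length pat) -- negligible against traversing a 48 MB text, but worth
-- paying only once when many 50 kB texts share the same pattern.
failureTable :: String -> Array Int Int
failureTable pat = border
  where
    n      = length pat
    parr   = listArray (0, n - 1) pat
    border = listArray (0, n - 1) (0 : map f [1 .. n - 1])
    f i = go (border ! (i - 1))        -- earlier entries are already defined
      where
        go k
          | parr ! i == parr ! k = k + 1
          | k == 0               = 0
          | otherwise            = go (border ! (k - 1))

main :: IO ()
main = print (elems (failureTable "ababc"))  -- [0,0,1,2,0]
```

The lazy-array trick works because each entry only consults strictly earlier entries, so GHC's non-strict `listArray` ties the knot without any mutation.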
> DF> ; without optimisations, KMP suffers far worse than straightforward).
> this test is meaningless
Well, it's not really a test of the algorithm, but of ghc's optimiser, and
I've forgotten the exact numbers, but KMP compiled without optimisation is
much, much slower than with -O2, so the optimiser does a really great job
here.
> >> KMP is O(m) while straightforward is O(m*n).
> DF> Where m is the length of the input and n is the length of the
> DF> searched-for pattern, I think?
> DF> But these are worst-case complexities, I believe; ordinarily,
> DF> straightforward will be O(m), too.
> and have longer init time (and take more space), so on typical
> searches it will be worser than trivial algorithm
I believe you've misread here. I'm saying that while the worst-case complexity
for the straightforward (or trivial) algorithm is O(n*m) -- as witnessed by my
test, which was deliberately designed to be baaaaad for it -- usually, in
most real situations, it will have O(m) complexity, too.
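The worst-case/typical-case distinction can be made concrete by counting character comparisons (a hypothetical helper of mine, not code from the thread): on a pathological input like pattern "aa...ab" over a text of all 'a's, the naive algorithm re-scans nearly the whole pattern at every position, roughly n*m comparisons, while on typical text it mismatches almost immediately and stays close to m.

```haskell
import Data.List (tails)

-- Total character comparisons a naive prefix match performs when tried
-- at every position of the text.
comparisons :: String -> String -> Int
comparisons pat text = sum [ cmps pat t | t <- tails text ]
  where
    cmps (p : ps) (c : cs)
      | p == c    = 1 + cmps ps cs   -- match: keep comparing
      | otherwise = 1                -- mismatch: stop after one comparison
    cmps _ _ = 0                     -- pattern or text exhausted

main :: IO ()
main = do
  let bad = replicate 9 'a' ++ "b"
  print (comparisons bad (replicate 1000 'a'))  -- close to 10 * 1000
  print (comparisons "xa" (replicate 1000 'a')) -- 1000: instant mismatches
```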
And I -- like you -- believe that for typical searches the straightforward
algorithm will be faster (at least with a clever implementation, like yours).
P.S.: I'd really appreciate another attempt at explaining RunTimeCompilation
to me (might well be a URL of a paper or something).