[Haskell-cafe] Substring replacements

Branimir Maksimovic bmaxa at hotmail.com
Tue Dec 13 08:22:33 EST 2005




>From: Daniel Fischer <daniel.is.fischer at web.de>
>To: "Branimir Maksimovic" <bmaxa at hotmail.com>
>CC: Haskell-Cafe at haskell.org
>Subject: Re: [Haskell-cafe] Substring replacements
>Date: Tue, 13 Dec 2005 11:23:29 +0100
>
>Am Montag, 12. Dezember 2005 16:28 schrieben Sie:
> > From: Daniel Fischer <daniel.is.fischer at web.de>
> >
> > >To: "Branimir Maksimovic" <bmaxa at hotmail.com>
> > >CC: Haskell-Cafe at haskell.org
> > >Subject: Re: [Haskell-cafe] Substring replacements
> > >Date: Mon, 12 Dec 2005 16:15:46 +0100
> > >
> > >Earlier today:
> > > > Sorry, but
> > > > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> > > > "abababaaba"
> > > >
> > > > I haven't analyzed the algorithm, so I don't know why exactly this
> > >
> > >fails.
> > >
> > > > I'll take a look sometime soon.
> > >
> > >I found the problem (one at least).
> > >Say the pattern to be replaced begins with 'a' and we have a 
>sufficiently
> > >long
> > >match with the pattern starting at the first 'a' in the String. Upon
> > >encountering the second 'a', while the first pattern still matches, you
> > >start
> > >pushing onto the rollback-stack. But that isn't inspected anymore, so 
>if
> > >the
> > >actual occurence of the pattern starts at the third (or fourth, n-th)
> > >occurence of 'a' and that is already pushed onto the rollback, you miss
> > > it.
> >
> > I've corrected this with adjusting rollback position. if rollBack is 
>null
> > then
> > search for rollback starts at second character if not starts at same as
> > searhed
> > character because I skip what was searched. That's all.
> > Though I'm not so sure now when I read this.
> >
>Still not working:
>
>*New> searchReplace "abababc" "#" "ababababababc"
>"ababababababc"
>*New> searchReplace1 "abababc" "#" "ababababababc"
>"ababababababc"
>

Yes, perhaps you've missed another post of mine. I've noticed
that problem when pattern repeats more then 2 times and gave up
because now whatever I do, your version is always fastest.

>
> > >So the question is, can we find a cheap test to decide whether to use 
>KMP
> > >or
> > >Bulat's version?

Just interleave string with  search hits with one with no seacrh (that means 
partial too)
hits, and your version will gain in speed.
More partial matches and full search matches Bulat's version will gain in
speed.
Longer search strings, your version will have gains.

> >
> > In real world situation your KMP will always be fastest on average.
> > I like that we are not using C arrays as then we have advantage
> > of lazyness and save on memory usage. C++ program will be faster
> > on shorter strings but on this large strings will loose due memory
> > latency. and with your test, both programs are very fast.
> >
> > Greetings, Bane.
> >
>
>On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 
>20%
>faster than my KMP on your test -- btw, I unboxed the pat array, which gave 
>a
>bit of extra speed, but not much.

I think that's because on your machine Bulat's version have better 
perfromance
with CPU cache.
I don;t know but now your version is 25% faster with my test on P4
hyperthreaded.

your new version:
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m8.734s
user    0m0.015s
sys     0m0.000s

Bulat's version:

bmaxa at MAXA ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real    0m11.734s
user    0m0.015s
sys     0m0.015s

3 secs difference now.

>And apologies to Sebastian Sylvan, I also included an unboxed version of 
>bord,
>built from the boxed version, and that sped things further up -- not much,
>again, but there it is.

On my machine you got another 10-15% of boost with unboxed arrays.

>I wonder about this difference, -10% on one system and +20% on another 
>system,
>ist that normal?

Different caching schemes on CPU's perhaps? different memory latencies?
hyperthreading helps your version? more code and data, perhaps because
of that it pays the price on your machine?

Greetings, Bane.

_________________________________________________________________
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/



More information about the Haskell-Cafe mailing list