[Haskell-cafe] Substring replacements
Daniel Fischer
daniel.is.fischer at web.de
Tue Dec 13 05:23:29 EST 2005
Am Montag, 12. Dezember 2005 16:28 schrieben Sie:
> From: Daniel Fischer <daniel.is.fischer at web.de>
>
> >To: "Branimir Maksimovic" <bmaxa at hotmail.com>
> >CC: Haskell-Cafe at haskell.org
> >Subject: Re: [Haskell-cafe] Substring replacements
> >Date: Mon, 12 Dec 2005 16:15:46 +0100
> >
> >Earlier today:
> > > Sorry, but
> > > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> > > "abababaaba"
> > >
> > > I haven't analyzed the algorithm, so I don't know why exactly this
> >
> >fails.
> >
> > > I'll take a look sometime soon.
> >
> >I found the problem (one at least).
> >Say the pattern to be replaced begins with 'a' and we have a sufficiently
> >long
> >match with the pattern starting at the first 'a' in the String. Upon
> >encountering the second 'a', while the first pattern still matches, you
> >start
> >pushing onto the rollback-stack. But that isn't inspected anymore, so if
> >the
> >actual occurence of the pattern starts at the third (or fourth, n-th)
> >occurence of 'a' and that is already pushed onto the rollback, you miss
> > it.
>
> I've corrected this with adjusting rollback position. if rollBack is null
> then
> search for rollback starts at second character if not starts at same as
> searhed
> character because I skip what was searched. That's all.
> Though I'm not so sure now when I read this.
>
Still not working:
*New> searchReplace "abababc" "#" "ababababababc"
"ababababababc"
*New> searchReplace1 "abababc" "#" "ababababababc"
"ababababababc"
> >So the question is, can we find a cheap test to decide whether to use KMP
> >or
> >Bulat's version?
>
> In real world situation your KMP will always be fastest on average.
> I like that we are not using C arrays as then we have advantage
> of lazyness and save on memory usage. C++ program will be faster
> on shorter strings but on this large strings will loose due memory
> latency. and with your test, both programs are very fast.
>
> Greetings, Bane.
>
On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 20%
faster than my KMP on your test -- btw, I unboxed the pat array, which gave a
bit of extra speed, but not much.
And apologies to Sebastian Sylvan, I also included an unboxed version of bord,
built from the boxed version, and that sped things further up -- not much,
again, but there it is.
I wonder about this difference, -10% on one system and +20% on another system,
ist that normal?
Cheers, Daniel
------------------------------------------------------------------------------------------
Up-To-Date version of KMP:
import Data.Array.Unboxed (UArray, listArray, (!))
import qualified Data.Array as A (array, (!), elems)
searchReplace :: String -> String -> String -> String
searchReplace "" _ str = str
searchReplace src@(c:cs) dst str = process 0 str ""
where
len = {-# scc "len" #-} length src
pat :: UArray Int Char
pat = {-# scc "pat" #-} listArray (0,len-1) src
bord ={-# scc "bord" #-} A.array (0,len) $ (0,-1):(1,0):
[(i+1,getBord (pat!i) i + 1) | i <- [1 .. len-1]]
getBord s n
| m < 0 = m
| s == pat!m = m
| otherwise = getBord s m
where
m = bord A.! n
bor :: UArray Int Int
bor = listArray (0,len) $ A.elems bord
getBor s n
| m < 0 || s == pat!m = m
| otherwise = getBor s m
where
m = bor!n
process n str _ | n >= len = {-# scc "process" #-} dst ++ process 0 str
""
process _ "" mat = {-# scc "process" #-} reverse mat
process 0 (s:st) _
| s == c = {-# scc "process" #-} process 1 st [s]
| otherwise = {-# scc "process" #-} s:process 0 st ""
process n str@(s:st) mat
| s == pat!n = {-# scc "process" #-} process (n+1) st (s:mat)
| otherwise = {-# scc "process" #-}
let j = getBor s n
(ret,skip) = splitAt j mat
in if j < 0 then reverse mat ++ process 0 str ""
else reverse skip ++ process (j+1) st (s:ret)
More information about the Haskell-Cafe
mailing list