[Haskell-cafe] Substring replacements

Wed Dec 14 11:10:20 EST 2005

Hi, Bane and all,

Am Dienstag, 13. Dezember 2005 14:22 schrieben Sie:
> > > In real world situation your KMP will always be fastest on average.
> > > I like that we are not using C arrays as then we have advantage
> > > of lazyness and save on memory usage. C++ program will be faster
> > > on shorter strings but on this large strings will loose due memory
> > > latency. and with your test, both programs are very fast.

Yesterday, I did the unspeakable -- I wrote a C-version. Smashes 
Haskell-performance for short enough Strings (factor 10 for my test, factor 
2.2 for Bane's), but once it starts swapping, we catch up, and for really 
large Strings I dare say we'd win far out.

I also managed to get my KMP still faster, using take and drop instead of 
splitAt helps a lot (Bane, the use of 'break' in my transcript of yours was 
what slowed it down, I reintroduced searchr''' and both are equal).
I'm not quite sure, whether that indeed helps, but I also chose to use 
listArray for the boxed array of borders.

Now it's
searchReplace :: String -> String -> String -> String
searchReplace "" _ str = str
searchReplace src@(c:cs) dst str = process 0 str ""
    where
      len = length src
      pat :: UArray Int Char
      pat = listArray (0,len-1) src
      bord = A.listArray (0,len) $ (-1):0:
                       [getBord (pat!i) i + 1 | i <- [1 .. len-1]]
      getBord s n
         | m < 0      = m
	 | s == pat!m = m
	 | otherwise  = getBord s m
	   where
	     m = bord A.! n
      bor :: UArray Int Int
      bor = listArray (0,len) $ A.elems bord
      getBor s n
         | m < 0 || s == pat!m = m
	 | otherwise = getBor s m
	   where
	     m = bor!n
      process n str _ | n >= len = dst ++ process 0 str ""
      process _ "" mat = reverse mat
      process 0 (s:st) _
         | s == c    = process 1 st [s]
	 | otherwise = s:process 0 st ""
      process n str@(s:st) mat
         | s == pat!n = process (n+1) st (s:mat)
	 | otherwise  =
	    case getBor s n of
	      -1 -> reverse mat ++ process 0 str ""
	      0  -> reverse mat ++ process 1 st [s]
	      j  -> reverse (drop j mat) ++ process (j+1) st (s:take j mat)

gives a speedup of roughly 10% on my box versus yesterday's version.
> > >
> > > Greetings, Bane.
> >
> >On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about
> >20%
> >faster than my KMP on your test -- btw, I unboxed the pat array, which
> > gave a
> >bit of extra speed, but not much.
>
> I think that's because on your machine Bulat's version have better
> perfromance
> with CPU cache.
> I don;t know but now your version is 25% faster with my test on P4
> hyperthreaded.

Errrr, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with 
computers.

>
> your new version:
> $ time srchrep.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real    0m8.734s
> user    0m0.015s
> sys     0m0.000s
>
> Bulat's version:
>
> bmaxa at MAXA ~/tutorial
> $ time replace1.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real    0m11.734s
> user    0m0.015s
> sys     0m0.015s
>
> 3 secs difference now.
>
> >And apologies to Sebastian Sylvan, I also included an unboxed version of
> >bord,
> >built from the boxed version, and that sped things further up -- not much,
> >again, but there it is.
>
> On my machine you got another 10-15% of boost with unboxed arrays.
>
> >I wonder about this difference, -10% on one system and +20% on another
> >system,
> >ist that normal?
>
> Different caching schemes on CPU's perhaps? different memory latencies?
> hyperthreading helps your version? more code and data, perhaps because
> of that it pays the price on your machine?
>
> Greetings, Bane.
>
Well, whatever. Upto now, on my box, Bulat's is still the fastest for your 
test -- though I've narrowed the gap quite a bit.

Cheers,
Daniel