[Haskell-cafe] Substring replacements

Daniel Fischer daniel.is.fischer at web.de
Fri Dec 16 04:22:33 EST 2005


Am Freitag, 16. Dezember 2005 03:36 schrieben Sie:
> From: Daniel Fischer <daniel.is.fischer at web.de>
> >Any improvements are welcome, certainly some of you can do much better.
>
> It is fast on my machine except that you are using Map to lookup
> for badChar which is O(log n).
> I;ve placed this instead:
>       badChar :: UArray Int Int
>       badChar  = array (0,255) ([(i,-1) | i <- [0..255]] ++ proc src 0)
>       proc [] _ = []
>       proc (s:st) i = (ord s,i):proc  st (i+1)
>       getBc c = badChar ! ord c
>
> which gaved it significant boost, O(1) lookup.

Yes, but Char has 1114112 values, and I'm not sure whether such a large array 
would still be better, especially since, presumably, the Map will usually not 
be deeper than five layers, say. But if we restrict ourselves to extended 
ASCII Strings, an array certainly is better.
And maybe, instead of using two arrays, bmGs0 and bmGs, a mutable array (those 
are DiffArrays, I think -- I'll check that out) would also improve it.

> Now it's faster then brute force method but 10% slower then KMP
> with my test.
> I've also performed tests on dual Xeon linux box and results are
> proportionally
> the same as on my intel windows box.
> KMP wins again 10% better then BM and 20-30% better then straightforward
> search,
> which means that KMP is well suited for non indexed strings.
>
> >Cheers,
> >Daniel
> >
> >P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is
> >somewhat
> >fussy.
>
> Yes, BM is for indexed structures.
>
> Greetings, Bane.
>
Cheers,
Daniel


More information about the Haskell-Cafe mailing list