[Haskell-cafe] splitting strings

Matthew Brecknell haskell at brecknell.org
Thu Mar 1 23:45:55 EST 2007


"h." said:
> splitS :: String -> String -> [String]
> splitS a b = splitA a b
>     where
>     z = length b - 1
>     splitA []     _      = [""]
>     splitA (c:cs) (d:ds) | c == d && fst s == ds = "" : splitA (snd s) b
>                          | otherwise = (c : head r) : tail r
>         where
>         r = splitA cs b
>         s = splitAt z cs
> 
> How could it be optimized on speed?

[Brief pause while I do pennance for considering optimisation without
profiling]...

At first glance, your implementation is probably not too bad. Consider
the worst case, for some n and m:

splitS (replicate n '.') (replicate m '.' ++ "!")

I think your splitS is O(nm) in the worst case, and it's hard to imagine
improving on that [1]. In more typical cases, where the delimiter
doesn't have a repeated prefix, you'll get closer to O(n). Further, your
splitS is about as lazy as it could be, so you won't do any more work
than necessary if the output is only partially consumed.

So unless I've missed something, you've just got the constant factor to
work with. You could perhaps fuse a loop or two, to reduce the number of
times sub-lists are traversed and relinked, although your compiler may
already be doing this for you. For example, you could combine the use of
splitAt with the string comparison (==) to arrive at something like
this:

> splitS [] _ = [[]]
> splitS s [] = map return s
> splitS a@(x:xs) b = maybe d c t where
>   d = (x : head r) : tail r
>   c s = [] : splitS s b 
>   t = takePrefix a b
>   r = splitS xs b
> 
> takePrefix (x:xs) (y:ys) | x == y = takePrefix xs ys
> takePrefix s [] = Just s
> takePrefix _ _ = Nothing

Often, hand-coded loop fusion reduces clarity of expression, but in this
case I think it is an improvement, because takePrefix is a potentially
useful abstraction in its own right. There may be more you can do, but
that's a start. The other question you might like to ask yourself is
whether it can be written more clearly, for example without using
explicit recursion.

[1] - unless you write code to analyse the delimiter for repeated
prefixes before proceeding to split the string. Doing that, you could
probably achieve O(n + f(m)), for some f(m).



More information about the Haskell-Cafe mailing list