Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Kim-Ee Yeoh ky3 at atamo.com
Mon Oct 13 22:10:18 UTC 2014


On Tue, Oct 14, 2014 at 1:17 AM, David Feuer <david.feuer at gmail.com> wrote:

> I've done the benchmarks and the results are clear: the implementation I
> gave is faster than the one you gave and the one in Data.List in all cases.
> Yours is usually faster than the one in Data.List, but slower for very
> short lists.


The 2x factor you're seeing over Andreas's diminishes once we put slightly
more effort into an apples-to-apples comparison.

1. Instead of drop (length xs) ys, let's define the equivalent

dropl :: [b] -> [a] -> [a]
dropl (_:xs) (_:ys) = dropl xs ys
dropl [] ys = ys
dropl xs [] = []

i.e. dropl xs ys ~ drop (length xs) ys.

Now with Andreas's original version:

xs `isSuffixOfA` ys =  xs == dropl (dropl xs ys) ys

that 2x gap narrows down to 10% for most cases.

2. There's that fast path which you optimize for, where the needle is
patently too long to be in the haystack. To narrow the semantic gap, we can
write

dropm :: [b] -> [a] -> Maybe [a]
dropm (_:xs) (_:ys) = dropm xs ys
dropm [] ys = Just ys
dropm _ []  = Nothing

xs `isSuffixOfA` ys    =  maybe False id $ do
   zs <- dropm xs ys
   ws <- dropm zs ys    -- ws = needle-sized slice of the end of haystack
   return $ xs == ws

Now, the long-needle-short-haystack case also becomes merely 10% off.

I'm -1 on both your proposal and the no.2 code because it's too much
verbosity for uncertain gain. The benchmarks look good, but is the function
even the bottleneck? For folks that care deeply about speed, lists is
almost always the wrong datatype anyway.

I'm weakly +1 on no.1 because it beats the current double reverse
definition!

-- Kim-Ee
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20141014/2e53e7e5/attachment-0001.html>


More information about the Libraries mailing list