Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Mon Oct 13 23:02:40 UTC 2014

I've switched my allegiance from my own version to Milan's, because it's a
tad faster, and also less verbose. One thing to watch out for: although I
don't particularly like this, the current version in Data.List makes  []
`isSuffixOf` undefined = True.  Unless there's a general consensus that
this doesn't matter, we need to preserve it.

I don't think Milan's version is too terribly verbose. I tested something
very similar to your #1 that was proposed on IRC by Reid Barton, and the
numbers just didn't look too wonderful. I don't think Milan's version is
too terribly verbose, and I think it's clearer than your #2. As for the
depth of caring about speed, I generally disagree with you: lists are
actually very good for some purposes, and, moreover, even when they're
*not*, people use them anyway and then other people wait for that code to
run.

On Mon, Oct 13, 2014 at 6:10 PM, Kim-Ee Yeoh <ky3 at atamo.com> wrote:

>
> On Tue, Oct 14, 2014 at 1:17 AM, David Feuer <david.feuer at gmail.com>
> wrote:
>
>> I've done the benchmarks and the results are clear: the implementation I
>> gave is faster than the one you gave and the one in Data.List in all cases.
>> Yours is usually faster than the one in Data.List, but slower for very
>> short lists.
>
>
> The 2x factor you're seeing over Andreas's diminishes once we put slightly
> more effort into an apples-to-apples comparison.
>
> 1. Instead of drop (length xs) ys, let's define the equivalent
>
> dropl :: [b] -> [a] -> [a]
> dropl (_:xs) (_:ys) = dropl xs ys
> dropl [] ys = ys
> dropl xs [] = []
>
> i.e. dropl xs ys ~ drop (length xs) ys.
>
> Now with Andreas's original version:
>
> xs `isSuffixOfA` ys =  xs == dropl (dropl xs ys) ys
>
> that 2x gap narrows down to 10% for most cases.
>
> 2. There's that fast path which you optimize for, where the needle is
> patently too long to be in the haystack. To narrow the semantic gap, we can
> write
>
> dropm :: [b] -> [a] -> Maybe [a]
> dropm (_:xs) (_:ys) = dropm xs ys
> dropm [] ys = Just ys
> dropm _ []  = Nothing
>
> xs `isSuffixOfA` ys    =  maybe False id $ do
>    zs <- dropm xs ys
>    ws <- dropm zs ys    -- ws = needle-sized slice of the end of haystack
>    return $ xs == ws
>
> Now, the long-needle-short-haystack case also becomes merely 10% off.
>
> I'm -1 on both your proposal and the no.2 code because it's too much
> verbosity for uncertain gain. The benchmarks look good, but is the function
> even the bottleneck? For folks that care deeply about speed, lists is
> almost always the wrong datatype anyway.
>
> I'm weakly +1 on no.1 because it beats the current double reverse
> definition!
>
> -- Kim-Ee
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20141013/c10750f8/attachment.html>