Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

David Feuer david.feuer at gmail.com
Mon Oct 13 18:17:01 UTC 2014


I've done the benchmarks and the results are clear: the implementation I
gave is faster than the one you gave and the one in Data.List in all cases.
Yours is usually faster than the one in Data.List, but slower for very
short lists.
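For readers following along, here is a sketch of the kind of one-pass implementation being proposed (a reconstruction for this post; the helper names `dropLength` and `dropLengthMaybe` are illustrative, and the actual patch may differ in details):

```haskell
-- Sketch of the one-pass suffix check: walk the needle and haystack
-- together to find the haystack's "overhang", then compare the needle
-- against the needle-length suffix -- no length, no reverse.
isSuffixOf :: Eq a => [a] -> [a] -> Bool
ns `isSuffixOf` hs =
  maybe False (\delta -> ns == dropLength delta hs) (dropLengthMaybe ns hs)

-- dropLength xs ys = drop (length xs) ys, but without computing a
-- length and without forcing more of ys than necessary.
dropLength :: [a] -> [b] -> [b]
dropLength []     ys     = ys
dropLength _      []     = []
dropLength (_:xs) (_:ys) = dropLength xs ys

-- Like dropLength, but Nothing if ys runs out first
-- (i.e. the needle is longer than the haystack).
dropLengthMaybe :: [a] -> [b] -> Maybe [b]
dropLengthMaybe []     ys     = Just ys
dropLengthMaybe _      []     = Nothing
dropLengthMaybe (_:xs) (_:ys) = dropLengthMaybe xs ys
```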

Test code:

import Criterion.Main
import Data.Bits ((.&.))
import qualified Data.List as L

needles =
  [ ("shortNeedle", ".exe")
  , ("longNeedle", "And who by fire? Who by water? Who in the sunshine? Who in the nighttime?") ]

haystacks =
  [ ("shortHaystack", "C:\\Program Files\\Windows\\Dunno.exe")
  , ("longHaystack", take 10000 $ map toEnum $ iterate (\x -> (x*131) .&. 0xFFFF) 7) ]

-- isSuffixOf and isSuffixOfA are the two candidate implementations
-- under test; L.isSuffixOf is the current Data.List version.
impls = [("lib", L.isSuffixOf), ("Feuer", isSuffixOf), ("Abel", isSuffixOfA)]

tests =
  [ ("simple", \impl (needle, haystack) -> fromEnum $ needle `impl` haystack)
  , ("use", \impl (needle, haystack) ->
        if needle `impl` haystack
        then sum (map fromEnum haystack)
        else product (map fromEnum haystack)) ]

main = defaultMain
  [ bgroup (unwords [fst test, fst needle, "in", fst haystack])
      [ bench (fst impl) $ whnf (snd test $ snd impl) (snd needle, snd haystack)
      | impl <- impls ]
  | test <- tests, needle <- needles, haystack <- haystacks ]
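(For completeness: `isSuffixOfA` above is the simplified drop/length variant from your message quoted at the bottom of this thread, reproduced here so the benchmark is self-contained:)

```haskell
-- The length-based variant benchmarked as "Abel": the simplified
-- drop/length version from Andreas's message quoted below.
isSuffixOfA :: Eq a => [a] -> [a] -> Bool
[] `isSuffixOfA` _  = True
xs `isSuffixOfA` ys = xs == drop (length ys - length xs) ys
```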


Results:

benchmarking simple shortNeedle in shortHaystack/lib
time                 244.6 ns   (243.9 ns .. 245.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 245.2 ns   (244.8 ns .. 246.5 ns)
std dev              2.030 ns   (950.4 ps .. 4.378 ns)

benchmarking simple shortNeedle in shortHaystack/Feuer
time                 234.5 ns   (234.1 ns .. 235.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 234.6 ns   (234.2 ns .. 235.3 ns)
std dev              1.628 ns   (678.1 ps .. 2.758 ns)

benchmarking simple shortNeedle in shortHaystack/Abel
time                 268.1 ns   (267.8 ns .. 268.4 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 268.0 ns   (267.8 ns .. 268.4 ns)
std dev              947.8 ps   (538.1 ps .. 1.668 ns)


benchmarking simple shortNeedle in longHaystack/lib
time                 100.5 μs   (100.2 μs .. 100.9 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 100.8 μs   (100.5 μs .. 101.4 μs)
std dev              1.511 μs   (1.035 μs .. 2.095 μs)

benchmarking simple shortNeedle in longHaystack/Feuer
time                 55.66 μs   (55.61 μs .. 55.73 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 55.68 μs   (55.63 μs .. 55.79 μs)
std dev              222.0 ns   (128.7 ns .. 415.0 ns)

benchmarking simple shortNeedle in longHaystack/Abel
time                 94.60 μs   (94.39 μs .. 94.94 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 94.51 μs   (94.44 μs .. 94.68 μs)
std dev              332.8 ns   (183.6 ns .. 617.7 ns)


benchmarking simple longNeedle in shortHaystack/lib
time                 480.6 ns   (480.2 ns .. 481.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 480.6 ns   (480.1 ns .. 481.6 ns)
std dev              2.217 ns   (1.025 ns .. 4.032 ns)

benchmarking simple longNeedle in shortHaystack/Feuer
time                 187.0 ns   (186.8 ns .. 187.2 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 187.0 ns   (186.8 ns .. 187.4 ns)
std dev              922.0 ps   (427.3 ps .. 1.598 ns)

benchmarking simple longNeedle in shortHaystack/Abel
time                 340.9 ns   (339.4 ns .. 343.3 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 345.0 ns   (341.6 ns .. 351.0 ns)
std dev              14.40 ns   (9.622 ns .. 20.77 ns)
variance introduced by outliers: 60% (severely inflated)


benchmarking simple longNeedle in longHaystack/lib
time                 100.3 μs   (100.2 μs .. 100.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 100.4 μs   (100.3 μs .. 100.9 μs)
std dev              811.3 ns   (333.5 ns .. 1.575 μs)

benchmarking simple longNeedle in longHaystack/Feuer
time                 56.29 μs   (56.19 μs .. 56.41 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 56.23 μs   (56.19 μs .. 56.32 μs)
std dev              197.1 ns   (116.3 ns .. 342.1 ns)

benchmarking simple longNeedle in longHaystack/Abel
time                 94.46 μs   (94.32 μs .. 94.66 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 94.49 μs   (94.39 μs .. 94.70 μs)
std dev              459.8 ns   (277.2 ns .. 763.7 ns)


benchmarking use shortNeedle in shortHaystack/lib
time                 831.8 ns   (828.4 ns .. 836.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 831.2 ns   (829.3 ns .. 834.6 ns)
std dev              8.468 ns   (5.220 ns .. 13.26 ns)

benchmarking use shortNeedle in shortHaystack/Feuer
time                 819.4 ns   (816.7 ns .. 822.7 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 821.2 ns   (818.0 ns .. 828.2 ns)
std dev              15.89 ns   (7.988 ns .. 25.90 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking use shortNeedle in shortHaystack/Abel
time                 853.5 ns   (851.8 ns .. 855.4 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 852.6 ns   (851.5 ns .. 855.6 ns)
std dev              5.462 ns   (2.422 ns .. 10.51 ns)


benchmarking use shortNeedle in longHaystack/lib
time                 261.8 μs   (259.2 μs .. 264.7 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 260.2 μs   (259.5 μs .. 261.4 μs)
std dev              2.854 μs   (1.438 μs .. 4.475 μs)

benchmarking use shortNeedle in longHaystack/Feuer
time                 225.0 μs   (221.9 μs .. 227.1 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 221.0 μs   (220.2 μs .. 222.4 μs)
std dev              3.487 μs   (2.598 μs .. 4.385 μs)

benchmarking use shortNeedle in longHaystack/Abel
time                 244.5 μs   (232.5 μs .. 254.3 μs)
                     0.992 R²   (0.990 R² .. 0.997 R²)
mean                 232.1 μs   (229.4 μs .. 237.7 μs)
std dev              11.45 μs   (6.248 μs .. 16.89 μs)
variance introduced by outliers: 47% (moderately inflated)


benchmarking use longNeedle in shortHaystack/lib
time                 1.088 μs   (1.085 μs .. 1.091 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.087 μs   (1.086 μs .. 1.090 μs)
std dev              6.936 ns   (4.270 ns .. 9.691 ns)

benchmarking use longNeedle in shortHaystack/Feuer
time                 787.4 ns   (785.8 ns .. 789.9 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 786.7 ns   (785.9 ns .. 788.3 ns)
std dev              3.761 ns   (1.408 ns .. 6.358 ns)

benchmarking use longNeedle in shortHaystack/Abel
time                 930.7 ns   (927.7 ns .. 934.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 928.3 ns   (926.7 ns .. 931.3 ns)
std dev              7.241 ns   (3.785 ns .. 13.45 ns)


benchmarking use longNeedle in longHaystack/lib
time                 262.3 μs   (257.8 μs .. 266.1 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 258.6 μs   (257.7 μs .. 260.0 μs)
std dev              3.979 μs   (2.350 μs .. 5.888 μs)

benchmarking use longNeedle in longHaystack/Feuer
time                 224.9 μs   (222.1 μs .. 226.9 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 221.2 μs   (220.4 μs .. 222.3 μs)
std dev              3.168 μs   (2.143 μs .. 4.010 μs)

benchmarking use longNeedle in longHaystack/Abel
time                 233.1 μs   (231.4 μs .. 234.5 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 231.2 μs   (230.7 μs .. 232.0 μs)
std dev              2.099 μs   (1.419 μs .. 2.627 μs)

On Sun, Oct 12, 2014 at 12:02 PM, Andreas Abel <abela at chalmers.se> wrote:

> If you talk about "additional memory" you assume that after `xs isSuffixOf
> ys`, ys is no longer needed.  Is this the typical use case?
> At least not in the Agda codebase, according to my quick grep...
>
> ./Packaging/Database.hs:
>   dbEntries = filter (".conf" `isSuffixOf`) filePaths
>
> ./TypeChecking/Monad/Open.hs:
>   unless (ctx `isSuffixOf` ctx') $ fail $ "thing out of context (" ++ show
> ctx ++ " is not a sub context of " ++ show ctx' ++ ")"
>
> ./Interaction/Options.hs:
>   isLiterate file = ".lagda" `isSuffixOf` file
>
> ./Syntax/Parser.hs:
>   if "lagda" `isSuffixOf` filePath file then
>
> As I said, optimizations should be backed by benchmarks, especially when
> trading a perspicuous implementation for a more complicated one...
>
>
> On 12.10.2014 17:40, David Feuer wrote:
>
>> The haystack and the shared copy are only a needle's-length apart. The
>> first stage traverses H and N until one of them runs out, holding a copy
>> of the beginning of each. This requires at most O(min{|H|, |N|})
>> additional memory (the original needle and the needle-length prefix of
>> the haystack). Assuming we didn't run out of haystack before we ran out
>> of needle (which assumption seems generally likely), we proceed to the
>> next step, traversing H and (drop |N| H) together. During this phase we
>> hold O(|N|) additional memory: the needle itself and the needle-length
>> portion of the longer of the two haystack remnants we hold. Note that in
>> the second phase, we *don't* hold on to the beginning of the haystack,
>> because we will never need it again! Then in the final phase, when the
>> shorter haystack remnant has run out, we still have the same O(|N|)
>> memory, which is now the needle itself and the needle-length suffix of
>> the haystack, and (==) traverses them both together. Thus the total
>> additional memory residency for this algorithm is O(min {|H|, |N|}).
>> Using the length approach requires that you hold the beginning of the
>> haystack for traversal while running length. To put it another way, you
>> force the entire spine of the haystack to run length, and then start
>> from the beginning. If the haystack is produced lazily, this potentially
>> requires O(|H|) extra memory. Since you also check the length of the
>> needle, your total additional memory comes to O(max {|H|, |N|}). I
>> apologize if my horrid variable names obscured what goes on in the
>> algorithm I described.
>>
>> On Sun, Oct 12, 2014 at 10:14 AM, Andreas Abel <abela at chalmers.se> wrote:
>>
>>     Well, you also traverse the haystack twice, in your getEndChunk
>>     function you simultaneously traverse the haystack and a (shared)
>>     copy of it.  Why is this so much better?
>>
>>     I guess without data (timings, heap-profiles) we do not get further
>>     here.
>>
>>     On 11.10.2014 14:47, David Feuer wrote:
>>
>>         It can be, but your way traverses the entire haystack *twice*. The
>>         memory needs are equivalent to the reversing version, which I
>>         consider
>>         unacceptable.
>>
>>
>>     I do not understand this comment.  reverse needs O(n) memory, length
>>     O(1).
>>
>>     Cheers,
>>     Andreas
>>
>>         On Sat, Oct 11, 2014 at 5:57 AM, Andreas Abel
>>         <abela at chalmers.se> wrote:
>>
>>              David, the essence of your idea seems to be to mutually
>>              drop the correct number of elements from needle and hay
>>              and then compare for
>>              equality.  Here is a concise implementation of your idea in
>>         terms of
>>              drop:
>>
>>              isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
>>              [] `isSuffixOf` _  =  True
>>              xs `isSuffixOf` ys =  xs == drop (length (drop (length xs) ys)) ys
>>
>>              This can be further simplified to
>>
>>              isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
>>              [] `isSuffixOf` _  =  True
>>              xs `isSuffixOf` ys =  xs == drop (length ys - length xs) ys
>>
>>              which is a very easy to understand program, I think,
>>         without need to
>>              reverse lists.
>>
>>
>>
>>         _______________________________________________
>>         Libraries mailing list
>>         Libraries at haskell.org
>>         http://www.haskell.org/mailman/listinfo/libraries
>>
>>
>>
>>     --
>>     Andreas Abel  <><      Du bist der geliebte Mensch.
>>
>>     Department of Computer Science and Engineering
>>     Chalmers and Gothenburg University, Sweden
>>
>>     andreas.abel at gu.se
>>     http://www2.tcs.ifi.lmu.de/~abel/
>>
>>
>>
>>
>>
>
>