Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Andreas Abel abela at chalmers.se
Mon Oct 13 20:57:28 UTC 2014


Nice, good work! --Andreas

On 13.10.2014 20:17, David Feuer wrote:
> I've done the benchmarks and the results are clear: the implementation I
> gave is faster than the one you gave and the one in Data.List in all
> cases. Yours is usually faster than the one in Data.List, but slower for
> very short lists.
>
> Test code:
>
> needles =
>    [ ("shortNeedle", ".exe")
>    , ("longNeedle", "And who by fire? Who by water? Who in the sunshine? Who in the nighttime?")
>    ]
>
> haystacks =
>    [ ("shortHaystack", "C:\\Program Files\\Windows\\Dunno.exe")
>    , ("longHaystack", take 10000 $ map toEnum $ iterate (\x -> (x*131) .&. 0xFFFF) 7)
>    ]
>
> impls = [("lib", L.isSuffixOf), ("Feuer", isSuffixOf), ("Abel", isSuffixOfA)]
>
> tests =
>    [ ("simple", \impl (needle, haystack) -> fromEnum $ needle `impl` haystack)
>    , ("use",    \impl (needle, haystack) ->
>                     if needle `impl` haystack
>                     then sum (map fromEnum haystack)
>                     else product (map fromEnum haystack))
>    ]
>
> main = defaultMain
>    [ bgroup (unwords [fst test, fst needle, "in", fst haystack])
>        [ bench (fst impl) $ whnf (snd test $ snd impl) (snd needle, snd haystack)
>        | impl <- impls ]
>    | test <- tests, needle <- needles, haystack <- haystacks ]
>
>
> Results:
>
> benchmarking simple shortNeedle in shortHaystack/lib
> time                 244.6 ns   (243.9 ns .. 245.6 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 245.2 ns   (244.8 ns .. 246.5 ns)
> std dev              2.030 ns   (950.4 ps .. 4.378 ns)
>
> benchmarking simple shortNeedle in shortHaystack/Feuer
> time                 234.5 ns   (234.1 ns .. 235.1 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 234.6 ns   (234.2 ns .. 235.3 ns)
> std dev              1.628 ns   (678.1 ps .. 2.758 ns)
>
> benchmarking simple shortNeedle in shortHaystack/Abel
> time                 268.1 ns   (267.8 ns .. 268.4 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 268.0 ns   (267.8 ns .. 268.4 ns)
> std dev              947.8 ps   (538.1 ps .. 1.668 ns)
>
>
> benchmarking simple shortNeedle in longHaystack/lib
> time                 100.5 μs   (100.2 μs .. 100.9 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 100.8 μs   (100.5 μs .. 101.4 μs)
> std dev              1.511 μs   (1.035 μs .. 2.095 μs)
>
> benchmarking simple shortNeedle in longHaystack/Feuer
> time                 55.66 μs   (55.61 μs .. 55.73 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 55.68 μs   (55.63 μs .. 55.79 μs)
> std dev              222.0 ns   (128.7 ns .. 415.0 ns)
>
> benchmarking simple shortNeedle in longHaystack/Abel
> time                 94.60 μs   (94.39 μs .. 94.94 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 94.51 μs   (94.44 μs .. 94.68 μs)
> std dev              332.8 ns   (183.6 ns .. 617.7 ns)
>
>
> benchmarking simple longNeedle in shortHaystack/lib
> time                 480.6 ns   (480.2 ns .. 481.1 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 480.6 ns   (480.1 ns .. 481.6 ns)
> std dev              2.217 ns   (1.025 ns .. 4.032 ns)
>
> benchmarking simple longNeedle in shortHaystack/Feuer
> time                 187.0 ns   (186.8 ns .. 187.2 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 187.0 ns   (186.8 ns .. 187.4 ns)
> std dev              922.0 ps   (427.3 ps .. 1.598 ns)
>
> benchmarking simple longNeedle in shortHaystack/Abel
> time                 340.9 ns   (339.4 ns .. 343.3 ns)
>                       1.000 R²   (0.999 R² .. 1.000 R²)
> mean                 345.0 ns   (341.6 ns .. 351.0 ns)
> std dev              14.40 ns   (9.622 ns .. 20.77 ns)
> variance introduced by outliers: 60% (severely inflated)
>
>
> benchmarking simple longNeedle in longHaystack/lib
> time                 100.3 μs   (100.2 μs .. 100.5 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 100.4 μs   (100.3 μs .. 100.9 μs)
> std dev              811.3 ns   (333.5 ns .. 1.575 μs)
>
> benchmarking simple longNeedle in longHaystack/Feuer
> time                 56.29 μs   (56.19 μs .. 56.41 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 56.23 μs   (56.19 μs .. 56.32 μs)
> std dev              197.1 ns   (116.3 ns .. 342.1 ns)
>
> benchmarking simple longNeedle in longHaystack/Abel
> time                 94.46 μs   (94.32 μs .. 94.66 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 94.49 μs   (94.39 μs .. 94.70 μs)
> std dev              459.8 ns   (277.2 ns .. 763.7 ns)
>
>
> benchmarking use shortNeedle in shortHaystack/lib
> time                 831.8 ns   (828.4 ns .. 836.1 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 831.2 ns   (829.3 ns .. 834.6 ns)
> std dev              8.468 ns   (5.220 ns .. 13.26 ns)
>
> benchmarking use shortNeedle in shortHaystack/Feuer
> time                 819.4 ns   (816.7 ns .. 822.7 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 821.2 ns   (818.0 ns .. 828.2 ns)
> std dev              15.89 ns   (7.988 ns .. 25.90 ns)
> variance introduced by outliers: 23% (moderately inflated)
>
> benchmarking use shortNeedle in shortHaystack/Abel
> time                 853.5 ns   (851.8 ns .. 855.4 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 852.6 ns   (851.5 ns .. 855.6 ns)
> std dev              5.462 ns   (2.422 ns .. 10.51 ns)
>
>
> benchmarking use shortNeedle in longHaystack/lib
> time                 261.8 μs   (259.2 μs .. 264.7 μs)
>                       1.000 R²   (0.999 R² .. 1.000 R²)
> mean                 260.2 μs   (259.5 μs .. 261.4 μs)
> std dev              2.854 μs   (1.438 μs .. 4.475 μs)
>
> benchmarking use shortNeedle in longHaystack/Feuer
> time                 225.0 μs   (221.9 μs .. 227.1 μs)
>                       0.999 R²   (0.999 R² .. 1.000 R²)
> mean                 221.0 μs   (220.2 μs .. 222.4 μs)
> std dev              3.487 μs   (2.598 μs .. 4.385 μs)
>
> benchmarking use shortNeedle in longHaystack/Abel
> time                 244.5 μs   (232.5 μs .. 254.3 μs)
>                       0.992 R²   (0.990 R² .. 0.997 R²)
> mean                 232.1 μs   (229.4 μs .. 237.7 μs)
> std dev              11.45 μs   (6.248 μs .. 16.89 μs)
> variance introduced by outliers: 47% (moderately inflated)
>
>
> benchmarking use longNeedle in shortHaystack/lib
> time                 1.088 μs   (1.085 μs .. 1.091 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 1.087 μs   (1.086 μs .. 1.090 μs)
> std dev              6.936 ns   (4.270 ns .. 9.691 ns)
>
> benchmarking use longNeedle in shortHaystack/Feuer
> time                 787.4 ns   (785.8 ns .. 789.9 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 786.7 ns   (785.9 ns .. 788.3 ns)
> std dev              3.761 ns   (1.408 ns .. 6.358 ns)
>
> benchmarking use longNeedle in shortHaystack/Abel
> time                 930.7 ns   (927.7 ns .. 934.0 ns)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 928.3 ns   (926.7 ns .. 931.3 ns)
> std dev              7.241 ns   (3.785 ns .. 13.45 ns)
>
>
> benchmarking use longNeedle in longHaystack/lib
> time                 262.3 μs   (257.8 μs .. 266.1 μs)
>                       0.999 R²   (0.999 R² .. 1.000 R²)
> mean                 258.6 μs   (257.7 μs .. 260.0 μs)
> std dev              3.979 μs   (2.350 μs .. 5.888 μs)
>
> benchmarking use longNeedle in longHaystack/Feuer
> time                 224.9 μs   (222.1 μs .. 226.9 μs)
>                       0.999 R²   (0.999 R² .. 1.000 R²)
> mean                 221.2 μs   (220.4 μs .. 222.3 μs)
> std dev              3.168 μs   (2.143 μs .. 4.010 μs)
>
> benchmarking use longNeedle in longHaystack/Abel
> time                 233.1 μs   (231.4 μs .. 234.5 μs)
>                       1.000 R²   (1.000 R² .. 1.000 R²)
> mean                 231.2 μs   (230.7 μs .. 232.0 μs)
> std dev              2.099 μs   (1.419 μs .. 2.627 μs)
>
> On Sun, Oct 12, 2014 at 12:02 PM, Andreas Abel <abela at chalmers.se> wrote:
>
>     If you talk about "additional memory", you assume that after
>     `xs `isSuffixOf` ys`, ys is no longer needed.  Is this the typical
>     use case?  At least not in the Agda codebase, according to my
>     quick grep...
>
>     ./Packaging/Database.hs:
>        dbEntries = filter (".conf" `isSuffixOf`) filePaths
>
>     ./TypeChecking/Monad/Open.hs:
>        unless (ctx `isSuffixOf` ctx') $ fail $ "thing out of context ("
>     ++ show ctx ++ " is not a sub context of " ++ show ctx' ++ ")"
>
>     ./Interaction/Options.hs:
>        isLiterate file = ".lagda" `isSuffixOf` file
>
>     ./Syntax/Parser.hs:
>        if "lagda" `isSuffixOf` filePath file then
>
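The call sites above all test a short, fixed suffix against a possibly long string. A minimal self-contained sketch of that usage pattern (the helper names here are illustrative, not taken from the Agda code):

```haskell
import Data.List (isSuffixOf)

-- Filter paths by a fixed extension, as in the Packaging/Database.hs
-- call site quoted above.
confFiles :: [FilePath] -> [FilePath]
confFiles = filter (".conf" `isSuffixOf`)

-- Test a single path, as in the Interaction/Options.hs call site above.
isLiterate :: FilePath -> Bool
isLiterate file = ".lagda" `isSuffixOf` file
```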
>     As I said, optimizations should be backed by benchmarks, especially
>     when trading a perspicuous implementation for a more complicated one...
>
>
>     On 12.10.2014 17:40, David Feuer wrote:
>
>         The haystack and the shared copy are only a needle's-length
>         apart.  The first stage traverses H and N until one of them
>         runs out, holding a copy of the beginning of each.  This
>         requires at most O(min{|H|, |N|}) additional memory (the
>         original needle and the needle-length prefix of the haystack).
>         Assuming we didn't run out of haystack before we ran out of
>         needle (which seems generally likely), we proceed to the next
>         step, traversing H and (drop |N| H) together.  During this
>         phase we hold O(|N|) additional memory: the needle itself and
>         the needle-length portion of the longer of the two haystack
>         remnants we hold.  Note that in the second phase, we *don't*
>         hold on to the beginning of the haystack, because we will
>         never need it again!  Then in the final phase, when the
>         shorter haystack remnant has run out, we still have the same
>         O(|N|) memory, which is now the needle itself and the
>         needle-length suffix of the haystack, and (==) traverses them
>         both together.  Thus the total additional memory residency
>         for this algorithm is O(min{|H|, |N|}).  Using the length
>         approach requires that you hold the beginning of the haystack
>         for traversal while running length.  To put it another way,
>         you force the entire spine of the haystack to run length, and
>         then start from the beginning.  If the haystack is produced
>         lazily, this potentially requires O(|H|) extra memory.  Since
>         you also check the length of the needle, your total
>         additional memory comes to O(max{|H|, |N|}).  I apologize if
>         my horrid variable names obscured what goes on in the
>         algorithm I described.
>
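The two-phase walk described above can be sketched as follows. This is only an illustration of the idea with invented names, not David's actual patch: walk needle and haystack together to drop a needle-length prefix of the haystack, then advance the remainder and the haystack in lockstep so that what is left of the haystack is exactly its needle-length suffix.

```haskell
-- Sketch of the two-phase suffix check (illustrative names, not the
-- library implementation).
isSuffixOfSketch :: Eq a => [a] -> [a] -> Bool
needle `isSuffixOfSketch` haystack =
    case dropEach needle haystack of
      Nothing   -> False                   -- haystack shorter than needle
      Just rest -> needle == alignEnd rest haystack
  where
    -- Phase 1: drop one haystack element per needle element;
    -- Nothing if the haystack runs out first.
    dropEach []     hs     = Just hs
    dropEach (_:ns) (_:hs) = dropEach ns hs
    dropEach _      []     = Nothing
    -- Phase 2: advance both lists together; when the first runs out,
    -- the second is the needle-length suffix of the haystack.
    alignEnd (_:xs) (_:ys) = alignEnd xs ys
    alignEnd _      ys     = ys
```

Note that phase 2 never touches the front of the haystack again, which is the source of the memory savings David describes.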
>         On Sun, Oct 12, 2014 at 10:14 AM, Andreas Abel
>         <abela at chalmers.se> wrote:
>
>              Well, you also traverse the haystack twice: in your
>              getEndChunk function you simultaneously traverse the
>              haystack and a (shared) copy of it.  Why is this so much
>              better?
>
>              I guess without data (timings, heap profiles) we do not
>              get further here.
>
>              On 11.10.2014 14:47, David Feuer wrote:
>
>                  It can be, but your way traverses the entire
>                  haystack *twice*.  The memory needs are equivalent
>                  to the reversing version, which I consider
>                  unacceptable.
>
>
>              I do not understand this comment.  reverse needs O(n)
>              memory, length only O(1).
>
>              Cheers,
>              Andreas
>
>                  On Sat, Oct 11, 2014 at 5:57 AM, Andreas Abel wrote:
>
>                       David, the essence of your idea seems to be:
>                       mutually drop the correct number of elements
>                       from needle and hay and then compare for
>                       equality.  Here is a concise implementation of
>                       your idea in terms of drop:
>
>                       isSuffixOf :: forall a . (Eq a) => [a] -> [a] -> Bool
>                       [] `isSuffixOf` _  =  True
>                       xs `isSuffixOf` ys =  xs == drop (length (drop (length xs) ys)) ys
>
>                       This can be further simplified to
>
>                       isSuffixOf :: forall a . (Eq a) => [a] -> [a] -> Bool
>                       [] `isSuffixOf` _  =  True
>                       xs `isSuffixOf` ys =  xs == drop (length ys - length xs) ys
>
>                       which is a very easy-to-understand program, I
>                       think, with no need to reverse lists.
>
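For what it's worth, the two drop-based definitions quoted above can be checked against Data.List.isSuffixOf by exhaustive comparison on short inputs. A quick self-contained check (plain Haskell, no benchmarking; the function names here are chosen for the example):

```haskell
import Data.List (isSuffixOf)

-- Abel's first formulation: a double drop, with no explicit subtraction.
isSuffixOfDrop :: Eq a => [a] -> [a] -> Bool
[] `isSuffixOfDrop` _  = True
xs `isSuffixOfDrop` ys = xs == drop (length (drop (length xs) ys)) ys

-- Abel's simplified formulation using explicit lengths.
isSuffixOfLen :: Eq a => [a] -> [a] -> Bool
[] `isSuffixOfLen` _  = True
xs `isSuffixOfLen` ys = xs == drop (length ys - length xs) ys

-- Exhaustively compare all three on every pair of Bool lists of
-- length at most 4.
agree :: Bool
agree = and
  [ isSuffixOfDrop xs ys == isSuffixOf xs ys
      && isSuffixOfLen xs ys == isSuffixOf xs ys
  | n <- [0 .. 4], m <- [0 .. 4]
  , xs <- sequencesOf n, ys <- sequencesOf m ]
  where
    sequencesOf :: Int -> [[Bool]]
    sequencesOf 0 = [[]]
    sequencesOf k = [ b : bs | b <- [False, True], bs <- sequencesOf (k - 1) ]
```

Note that both formulations also handle the needle-longer-than-haystack case correctly, since drop with an argument of zero or less returns the whole list.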


-- 
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

andreas.abel at gu.se
http://www2.tcs.ifi.lmu.de/~abel/


More information about the Libraries mailing list