Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf
Andreas Abel
abela at chalmers.se
Mon Oct 13 20:57:28 UTC 2014
Nice, good work! --Andreas
On 13.10.2014 20:17, David Feuer wrote:
> I've done the benchmarks and the results are clear: the implementation I
> gave is faster than the one you gave and the one in Data.List in all
> cases. Yours is usually faster than the one in Data.List, but slower for
> very short lists.
>
> Test code:
>
> -- Criterion benchmark; isSuffixOf (the proposed implementation) and
> -- isSuffixOfA (the drop-based one) are assumed to be in scope.
> import Criterion.Main (bench, bgroup, defaultMain, whnf)
> import Data.Bits ((.&.))
> import qualified Data.List as L
>
> needles =
>   [ ("shortNeedle", ".exe")
>   , ("longNeedle", "And who by fire? Who by water? Who in the sunshine? Who in the nighttime?") ]
>
> haystacks =
>   [ ("shortHaystack", "C:\\Program Files\\Windows\\Dunno.exe")
>   , ("longHaystack", take 10000 $ map toEnum $ iterate (\x -> (x*131) .&. 0xFFFF) 7) ]
>
> impls = [("lib", L.isSuffixOf), ("Feuer", isSuffixOf), ("Abel", isSuffixOfA)]
>
> tests = [ ("simple", \impl (needle, haystack) -> fromEnum $ needle `impl` haystack)
>         , ("use", \impl (needle, haystack) ->
>                     if needle `impl` haystack
>                       then sum (map fromEnum haystack)
>                       else product (map fromEnum haystack)) ]
>
> main = defaultMain
>   [ bgroup (unwords [fst test, fst needle, "in", fst haystack])
>       [ bench (fst impl) $ whnf (snd test $ snd impl) (snd needle, snd haystack)
>       | impl <- impls ]
>   | test <- tests, needle <- needles, haystack <- haystacks ]
>
>
> Results:
>
> benchmarking simple shortNeedle in shortHaystack/lib
> time 244.6 ns (243.9 ns .. 245.6 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 245.2 ns (244.8 ns .. 246.5 ns)
> std dev 2.030 ns (950.4 ps .. 4.378 ns)
>
> benchmarking simple shortNeedle in shortHaystack/Feuer
> time 234.5 ns (234.1 ns .. 235.1 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 234.6 ns (234.2 ns .. 235.3 ns)
> std dev 1.628 ns (678.1 ps .. 2.758 ns)
>
> benchmarking simple shortNeedle in shortHaystack/Abel
> time 268.1 ns (267.8 ns .. 268.4 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 268.0 ns (267.8 ns .. 268.4 ns)
> std dev 947.8 ps (538.1 ps .. 1.668 ns)
>
>
> benchmarking simple shortNeedle in longHaystack/lib
> time 100.5 μs (100.2 μs .. 100.9 μs)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 100.8 μs (100.5 μs .. 101.4 μs)
> std dev 1.511 μs (1.035 μs .. 2.095 μs)
>
> benchmarking simple shortNeedle in longHaystack/Feuer
> time 55.66 μs (55.61 μs .. 55.73 μs)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 55.68 μs (55.63 μs .. 55.79 μs)
> std dev 222.0 ns (128.7 ns .. 415.0 ns)
>
> benchmarking simple shortNeedle in longHaystack/Abel
> time 94.60 μs (94.39 μs .. 94.94 μs)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 94.51 μs (94.44 μs .. 94.68 μs)
> std dev 332.8 ns (183.6 ns .. 617.7 ns)
>
>
> benchmarking simple longNeedle in shortHaystack/lib
> time 480.6 ns (480.2 ns .. 481.1 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 480.6 ns (480.1 ns .. 481.6 ns)
> std dev 2.217 ns (1.025 ns .. 4.032 ns)
>
> benchmarking simple longNeedle in shortHaystack/Feuer
> time 187.0 ns (186.8 ns .. 187.2 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 187.0 ns (186.8 ns .. 187.4 ns)
> std dev 922.0 ps (427.3 ps .. 1.598 ns)
>
> benchmarking simple longNeedle in shortHaystack/Abel
> time 340.9 ns (339.4 ns .. 343.3 ns)
> 1.000 R² (0.999 R² .. 1.000 R²)
> mean 345.0 ns (341.6 ns .. 351.0 ns)
> std dev 14.40 ns (9.622 ns .. 20.77 ns)
> variance introduced by outliers: 60% (severely inflated)
>
>
> benchmarking simple longNeedle in longHaystack/lib
> time 100.3 μs (100.2 μs .. 100.5 μs)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 100.4 μs (100.3 μs .. 100.9 μs)
> std dev 811.3 ns (333.5 ns .. 1.575 μs)
>
> benchmarking simple longNeedle in longHaystack/Feuer
> time 56.29 μs (56.19 μs .. 56.41 μs)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 56.23 μs (56.19 μs .. 56.32 μs)
> std dev 197.1 ns (116.3 ns .. 342.1 ns)
>
> benchmarking simple longNeedle in longHaystack/Abel
> time 94.46 μs (94.32 μs .. 94.66 μs)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 94.49 μs (94.39 μs .. 94.70 μs)
> std dev 459.8 ns (277.2 ns .. 763.7 ns)
>
>
> benchmarking use shortNeedle in shortHaystack/lib
> time 831.8 ns (828.4 ns .. 836.1 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 831.2 ns (829.3 ns .. 834.6 ns)
> std dev 8.468 ns (5.220 ns .. 13.26 ns)
>
> benchmarking use shortNeedle in shortHaystack/Feuer
> time 819.4 ns (816.7 ns .. 822.7 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 821.2 ns (818.0 ns .. 828.2 ns)
> std dev 15.89 ns (7.988 ns .. 25.90 ns)
> variance introduced by outliers: 23% (moderately inflated)
>
> benchmarking use shortNeedle in shortHaystack/Abel
> time 853.5 ns (851.8 ns .. 855.4 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 852.6 ns (851.5 ns .. 855.6 ns)
> std dev 5.462 ns (2.422 ns .. 10.51 ns)
>
>
> benchmarking use shortNeedle in longHaystack/lib
> time 261.8 μs (259.2 μs .. 264.7 μs)
> 1.000 R² (0.999 R² .. 1.000 R²)
> mean 260.2 μs (259.5 μs .. 261.4 μs)
> std dev 2.854 μs (1.438 μs .. 4.475 μs)
>
> benchmarking use shortNeedle in longHaystack/Feuer
> time 225.0 μs (221.9 μs .. 227.1 μs)
> 0.999 R² (0.999 R² .. 1.000 R²)
> mean 221.0 μs (220.2 μs .. 222.4 μs)
> std dev 3.487 μs (2.598 μs .. 4.385 μs)
>
> benchmarking use shortNeedle in longHaystack/Abel
> time 244.5 μs (232.5 μs .. 254.3 μs)
> 0.992 R² (0.990 R² .. 0.997 R²)
> mean 232.1 μs (229.4 μs .. 237.7 μs)
> std dev 11.45 μs (6.248 μs .. 16.89 μs)
> variance introduced by outliers: 47% (moderately inflated)
>
>
> benchmarking use longNeedle in shortHaystack/lib
> time 1.088 μs (1.085 μs .. 1.091 μs)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 1.087 μs (1.086 μs .. 1.090 μs)
> std dev 6.936 ns (4.270 ns .. 9.691 ns)
>
> benchmarking use longNeedle in shortHaystack/Feuer
> time 787.4 ns (785.8 ns .. 789.9 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 786.7 ns (785.9 ns .. 788.3 ns)
> std dev 3.761 ns (1.408 ns .. 6.358 ns)
>
> benchmarking use longNeedle in shortHaystack/Abel
> time 930.7 ns (927.7 ns .. 934.0 ns)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 928.3 ns (926.7 ns .. 931.3 ns)
> std dev 7.241 ns (3.785 ns .. 13.45 ns)
>
>
> benchmarking use longNeedle in longHaystack/lib
> time 262.3 μs (257.8 μs .. 266.1 μs)
> 0.999 R² (0.999 R² .. 1.000 R²)
> mean 258.6 μs (257.7 μs .. 260.0 μs)
> std dev 3.979 μs (2.350 μs .. 5.888 μs)
>
> benchmarking use longNeedle in longHaystack/Feuer
> time 224.9 μs (222.1 μs .. 226.9 μs)
> 0.999 R² (0.999 R² .. 1.000 R²)
> mean 221.2 μs (220.4 μs .. 222.3 μs)
> std dev 3.168 μs (2.143 μs .. 4.010 μs)
>
> benchmarking use longNeedle in longHaystack/Abel
> time 233.1 μs (231.4 μs .. 234.5 μs)
> 1.000 R² (1.000 R² .. 1.000 R²)
> mean 231.2 μs (230.7 μs .. 232.0 μs)
> std dev 2.099 μs (1.419 μs .. 2.627 μs)
>
> On Sun, Oct 12, 2014 at 12:02 PM, Andreas Abel <abela at chalmers.se> wrote:
>
> If you talk about "additional memory" you assume that after evaluating
> xs `isSuffixOf` ys, ys is no longer needed. Is this the typical use
> case? At least not in the Agda codebase, according to my quick grep...
>
> ./Packaging/Database.hs:
> dbEntries = filter (".conf" `isSuffixOf`) filePaths
>
> ./TypeChecking/Monad/Open.hs:
> unless (ctx `isSuffixOf` ctx') $ fail $ "thing out of context ("
> ++ show ctx ++ " is not a sub context of " ++ show ctx' ++ ")"
>
> ./Interaction/Options.hs:
> isLiterate file = ".lagda" `isSuffixOf` file
>
> ./Syntax/Parser.hs:
> if "lagda" `isSuffixOf` filePath file then
>
> As I said, optimizations should be backed by benchmarks, especially
> when trading a perspicuous implementation for a more complicated one...
>
>
> On 12.10.2014 17:40, David Feuer wrote:
>
> The haystack and the shared copy are only a needle's-length apart. The
> first stage traverses H and N until one of them runs out, holding a
> copy of the beginning of each. This requires at most O(min {|H|, |N|})
> additional memory (the original needle and the needle-length prefix of
> the haystack). Assuming we didn't run out of haystack before we ran out
> of needle (which assumption seems generally likely), we proceed to the
> next step, traversing H and (drop |N| H) together. During this phase we
> hold O(|N|) additional memory: the needle itself and the needle-length
> portion of the longer of the two haystack remnants we hold. Note that
> in the second phase, we *don't* hold on to the beginning of the
> haystack, because we will never need it again! Then in the final phase,
> when the shorter haystack remnant has run out, we still have the same
> O(|N|) memory, which is now the needle itself and the needle-length
> suffix of the haystack, and (==) traverses them both together. Thus the
> total additional memory residency for this algorithm is
> O(min {|H|, |N|}).
>
> Using the length approach requires that you hold the beginning of the
> haystack for traversal while running length. To put it another way, you
> force the entire spine of the haystack to run length, and then start
> from the beginning. If the haystack is produced lazily, this
> potentially requires O(|H|) extra memory. Since you also check the
> length of the needle, your total additional memory comes to
> O(max {|H|, |N|}). I apologize if my horrid variable names obscured
> what goes on in the algorithm I described.
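>
> A minimal sketch of that two-phase traversal (illustrative names; not
> necessarily the exact code benchmarked as "Feuer" above):
>
>     -- Phase 1 drops |N| elements from the haystack (failing if the
>     -- haystack is too short); phase 2 walks the full haystack and the
>     -- remnant together, so that when the remnant runs out, what is
>     -- left of the full haystack is exactly its length-|N| suffix.
>     isSuffixOfSketch :: Eq a => [a] -> [a] -> Bool
>     ns `isSuffixOfSketch` hs = maybe False (ns ==) (phase1 ns hs)
>       where
>         phase1 (_:xs) (_:ys) = phase1 xs ys
>         phase1 []     ys     = Just (phase2 hs ys)
>         phase1 _      []     = Nothing
>         phase2 (_:xs) (_:ys) = phase2 xs ys
>         phase2 xs     _      = xs
>
>     -- ghci> ".exe" `isSuffixOfSketch` "C:\\Program Files\\Windows\\Dunno.exe"
>     -- True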
>
> On Sun, Oct 12, 2014 at 10:14 AM, Andreas Abel <abela at chalmers.se> wrote:
>
> Well, you also traverse the haystack twice: in your getEndChunk
> function you simultaneously traverse the haystack and a (shared)
> copy of it. Why is this so much better?
>
> I guess without data (timings, heap-profiles) we do not get further
> here.
>
> On 11.10.2014 14:47, David Feuer wrote:
>
> It can be, but your way traverses the entire haystack *twice*. The
> memory needs are equivalent to the reversing version, which I
> consider unacceptable.
>
>
> I do not understand this comment. reverse needs O(n) memory,
> length O(1).
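>
> For reference, the "reversing version" mentioned here is presumably
> along the lines of the reverse-based definition; a minimal sketch:
>
>     import Data.List (isPrefixOf)
>
>     -- Builds full reversed copies of both lists before comparing,
>     -- hence the O(n) memory remark above.
>     isSuffixOfRev :: Eq a => [a] -> [a] -> Bool
>     xs `isSuffixOfRev` ys = reverse xs `isPrefixOf` reverse ys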
>
> Cheers,
> Andreas
>
> On Sat, Oct 11, 2014 at 5:57 AM, Andreas Abel wrote:
>
> David, the essence of your idea seems to be: mutually drop the correct
> number of elements from needle and haystack, then compare for equality.
> Here is a concise implementation of your idea in terms of drop:
>
>     isSuffixOf :: forall a . (Eq a) => [a] -> [a] -> Bool
>     [] `isSuffixOf` _  = True
>     xs `isSuffixOf` ys = xs == drop (length (drop (length xs) ys)) ys
>
> This can be further simplified to
>
>     isSuffixOf :: forall a . (Eq a) => [a] -> [a] -> Bool
>     [] `isSuffixOf` _  = True
>     xs `isSuffixOf` ys = xs == drop (length ys - length xs) ys
>
> which is a very easy to understand program, I think, without the need
> to reverse lists.
>
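> A quick consistency check one might run over the variants in this
> thread (assuming QuickCheck is available; isSuffixOfA here names the
> simplified drop-based version quoted above):
>
>     import qualified Data.List as L
>     import Test.QuickCheck (quickCheck)
>
>     -- The simplified drop-based version from above.
>     isSuffixOfA :: Eq a => [a] -> [a] -> Bool
>     [] `isSuffixOfA` _  = True
>     xs `isSuffixOfA` ys = xs == drop (length ys - length xs) ys
>
>     -- It should agree with Data.List.isSuffixOf on all inputs.
>     prop_agree :: [Int] -> [Int] -> Bool
>     prop_agree xs ys = (xs `isSuffixOfA` ys) == (xs `L.isSuffixOf` ys)
>
>     main :: IO ()
>     main = quickCheck prop_agree
>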
--
Andreas Abel  <><  You are the beloved human being.
Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden
andreas.abel at gu.se
http://www2.tcs.ifi.lmu.de/~abel/