Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Andreas Abel abela at chalmers.se
Sun Oct 12 16:02:54 UTC 2014


If you talk about "additional memory" you assume that after `xs isSuffix 
ys`, ys is no longer needed.  Is this the typical use case?
At least not in the Agda codebase, according to my quick grep...

./Packaging/Database.hs:
   dbEntries = filter (".conf" `isSuffixOf`) filePaths

./TypeChecking/Monad/Open.hs:
   unless (ctx `isSuffixOf` ctx') $ fail $ "thing out of context (" ++ 
show ctx ++ " is not a sub context of " ++ show ctx' ++ ")"

./Interaction/Options.hs:
   isLiterate file = ".lagda" `isSuffixOf` file

./Syntax/Parser.hs:
   if "lagda" `isSuffixOf` filePath file then

As I said, optimizations should be backed by benchmarks, especially when 
trading a perspicuous implementation for a more complicated one...

On 12.10.2014 17:40, David Feuer wrote:
> The haystack and the shared copy are only a needle's-length apart. The
> first stage traverses H and N until one of them runs out, holding a copy
> of the beginning  of each. This requires at most O(min{|H|, |N|})
> additional memory (the original needle and the needle-length prefix of
> the haystack). Assuming we didn't run out of haystack before we ran out
> of needle (which assumption seems generally likely), we proceed to the
> next step, traversing H and (drop |N| H) together. During this phase we
> hold O(|N|) additional memory: the needle itself and the needle-length
> portion of the longer of the two haystack remnants we hold. Note that in
> the second phase, we *don't* hold on to the beginning of the haystack,
> because we will never need it again! Then in the final phase, when the
> shorter haystack remnant has run out, we still have the same O(|N|)
> memory, which is now the needle itself and the needle-length suffix of
> the haystack, and (==) traverses them both together. Thus the total
> additional memory residency for this algorithm is O(min {|H|, |N|}).
> Using the length approach requires that you hold the beginning of the
> haystack for traversal while running length. To put it another way, you
> force the entire spine of the haystack to run length, and then start
> from the beginning. If the haystack is produced lazily, this potentially
> requires O(|H|) extra memory. Since you also check the length of the
> needle, your total additional memory comes to O(max {|H|, |N|}). I
> apologize if my horrid variable names obscured what goes on in the
> algorithm I described.
>
> On Sun, Oct 12, 2014 at 10:14 AM, Andreas Abel <abela at chalmers.se
> <mailto:abela at chalmers.se>> wrote:
>
>     Well, you also traverse the haystack twice, in your getEndChunk
>     function you simultaneously traverse the haystack and a (shared)
>     copy of it.  Why is this so much better?
>
>     I guess without data (timings, heap-profiles) we do not get further
>     here.
>
>     On 11.10.2014 14:47, David Feuer wrote:
>
>         It can be, but your way traverses the entire haystack *twice*. The
>         memory needs are equivalent to the reversing version, which I
>         consider
>         unacceptable.
>
>
>     I do not understand this comment.  reverse needs O(n) memory, length
>     O(1).
>
>     Cheers,
>     Andreas
>
>         On Sat, Oct 11, 2014 at 5:57 AM, Andreas Abel <abela at chalmers.se
>         <mailto:abela at chalmers.se>
>         <mailto:abela at chalmers.se <mailto:abela at chalmers.se>>> wrote:
>
>              David, the essence of your idea seems mutually drop the correct
>              number of elements from needle and hay and then compare for
>              equality.  Here is a concise implementation of your idea in
>         terms of
>              drop:
>
>              isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
>              [] `isSuffixOf` _  =  True
>              xs `isSuffixOf` ys =  xs == drop (length (drop (length xs)
>         ys)) ys
>
>              This can be further simplified to
>
>              isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
>              [] `isSuffixOf` _  =  True
>              xs `isSuffixOf` ys =  xs == drop (length ys - length xs) ys
>
>              which is a very easy to understand program, I think,
>         without need to
>              reverse lists.
>
>
>
>         _________________________________________________
>         Libraries mailing list
>         Libraries at haskell.org <mailto:Libraries at haskell.org>
>         http://www.haskell.org/__mailman/listinfo/libraries
>         <http://www.haskell.org/mailman/listinfo/libraries>
>
>
>
>     --
>     Andreas Abel  <><      Du bist der geliebte Mensch.
>
>     Department of Computer Science and Engineering
>     Chalmers and Gothenburg University, Sweden
>
>     andreas.abel at gu.se <mailto:andreas.abel at gu.se>
>     http://www2.tcs.ifi.lmu.de/~__abel/ <http://www2.tcs.ifi.lmu.de/~abel/>
>
>
>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>


-- 
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

andreas.abel at gu.se
http://www2.tcs.ifi.lmu.de/~abel/


More information about the Libraries mailing list