Proposal: Make a very slight change to the semantics of Data.List.isSuffixOf

Sun Oct 12 15:40:43 UTC 2014

The haystack and the shared copy are only a needle's-length apart. The
first stage traverses H and N until one of them runs out, holding a copy of
the beginning  of each. This requires at most O(min{|H|, |N|}) additional
memory (the original needle and the needle-length prefix of the haystack).
Assuming we didn't run out of haystack before we ran out of needle (which
assumption seems generally likely), we proceed to the next step, traversing
H and (drop |N| H) together. During this phase we hold O(|N|) additional
memory: the needle itself and the needle-length portion of the longer of
the two haystack remnants we hold. Note that in the second phase, we
*don't* hold on to the beginning of the haystack, because we will never
need it again! Then in the final phase, when the shorter haystack remnant
has run out, we still have the same O(|N|) memory, which is now the needle
itself and the needle-length suffix of the haystack, and (==) traverses
them both together. Thus the total additional memory residency for this
algorithm is O(min {|H|, |N|}). Using the length approach requires that you
hold the beginning of the haystack for traversal while running length. To
put it another way, you force the entire spine of the haystack to run
length, and then start from the beginning. If the haystack is produced
lazily, this potentially requires O(|H|) extra memory. Since you also check
the length of the needle, your total additional memory comes to O(max {|H|,
|N|}). I apologize if my horrid variable names obscured what goes on in the
algorithm I described.

On Sun, Oct 12, 2014 at 10:14 AM, Andreas Abel <abela at chalmers.se> wrote:

> Well, you also traverse the haystack twice, in your getEndChunk function
> you simultaneously traverse the haystack and a (shared) copy of it.  Why is
> this so much better?
>
> I guess without data (timings, heap-profiles) we do not get further here.
>
> On 11.10.2014 14:47, David Feuer wrote:
>
>> It can be, but your way traverses the entire haystack *twice*. The
>> memory needs are equivalent to the reversing version, which I consider
>> unacceptable.
>>
>
> I do not understand this comment.  reverse needs O(n) memory, length O(1).
>
> Cheers,
> Andreas
>
>  On Sat, Oct 11, 2014 at 5:57 AM, Andreas Abel <abela at chalmers.se
>> <mailto:abela at chalmers.se>> wrote:
>>
>>     David, the essence of your idea seems mutually drop the correct
>>     number of elements from needle and hay and then compare for
>>     equality.  Here is a concise implementation of your idea in terms of
>>     drop:
>>
>>     isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
>>     [] `isSuffixOf` _  =  True
>>     xs `isSuffixOf` ys =  xs == drop (length (drop (length xs) ys)) ys
>>
>>     This can be further simplified to
>>
>>     isSuffixOf        :: forall a . (Eq a) => [a] -> [a] -> Bool
>>     [] `isSuffixOf` _  =  True
>>     xs `isSuffixOf` ys =  xs == drop (length ys - length xs) ys
>>
>>     which is a very easy to understand program, I think, without need to
>>     reverse lists.
>>
>>
>>
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://www.haskell.org/mailman/listinfo/libraries
>>
>>
>
> --
> Andreas Abel  <><      Du bist der geliebte Mensch.
>
> Department of Computer Science and Engineering
> Chalmers and Gothenburg University, Sweden
>
> andreas.abel at gu.se
> http://www2.tcs.ifi.lmu.de/~abel/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20141012/2805dc8c/attachment.html>