Pre-proposal discussion: add a version of dropWhileEnd with different laziness properties to Data.List

Mon Sep 29 00:34:45 UTC 2014

That decision is up to the user. Data.List provides a number of sequence
functions that aren't efficient for lists, including take, drop, splitAt,
init, last, reverse, zip, unzip, and (!!). dropWhileEnd and dropWhileLE
aren't any worse, I don't think.

On Sun, Sep 28, 2014 at 6:59 PM, Greg Weber <greg at gregweber.info> wrote:

> Should the code be using Seq instead of a List?
>
> On Sun, Sep 28, 2014 at 3:53 PM, David Feuer <david.feuer at gmail.com>
> wrote:
>
>> One good option might be dropWhileR, to match Data.Sequence, but I'd only
>> want to use that if the dropWhileEnd name were deprecated in favor of
>> takeUntilLastSatisfying or whatever—otherwise it's just too confusing.
>> On Sep 28, 2014 4:39 PM, "David Feuer" <david.feuer at gmail.com> wrote:
>>
>>> BACKGROUND:
>>>
>>> A somewhat common idiom I discovered digging around the GHC tree is the
>>> use of
>>>
>>>     reverse . dropWhile p . reverse
>>>
>>> to remove some unwanted things from the end of a list. The lists
>>> involved tend to be short (e.g., file names, package names, lines of user
>>> input, etc.) and the amount removed from the end of the list tends to be
>>> short (a trailing newline, the last part of a filename, extra spaces,
>>> etc.). I initially intended to replace all of these with
>>> Data.List.dropWhileEnd p. Unfortunately, my benchmarking showed that this
>>> had a substantial negative impact on performance. Data.List.dropWhileEnd is
>>> defined like this:
>>>
>>>     dropWhileEnd p = foldr (\x r -> if p x && null r then [] else x:r) []
>>>
>>> This is lazy in the *spine* of the list--it can "flush its buffer" and
>>> produce list elements any time p x is found to be false. This is the best
>>> you can do if you need to process a very large list and don't want to have
>>> to load the whole thing into memory, and/or your predicate is *very* cheap.
>>> Unfortunately, it pays a steep price: for non-trivial p, it's strict in the
>>> *elements* of the list. This makes it unsuitable, from a performance
>>> standpoint, for most of the purposes for which I've seen reverse .
>>> dropWhile p . reverse used.
>>>
>>> CONCEPT:
>>>
>>> Add (by some name) a function that is lazy in the elements while strict
>>> in the spine:
>>>
>>> dropWhileEndLE :: (a -> Bool) -> [a] -> [a]
>>> -- specification:  dropWhileEndLE p = reverse . dropWhile p . reverse
>>> dropWhileEndLE p = foldr (\x r -> if null r && p x then [] else x:r) []
>>>
>>> This is a good bit faster than the double-reversing version, presumably
>>> because it presumably allocates its buffer stack on the GHC stack rather
>>> than spraying junk across the heap.
>>>
>>> If I were a Time Lord, I'd go back in time and add to the library,
>>> instead of dropWhileEnd as it is, a function with a name like
>>>
>>>     takeUntilLastSatisfying p = dropWhileEnd (not . p)
>>>
>>> which has a name that explains better what it does, and I'd define
>>> dropWhileEnd to be dropWhileEndLE. But I'm not a Time Lord, so I'd like to
>>> hear what name or naming approach other people would recommend.
>>>
>>
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://www.haskell.org/mailman/listinfo/libraries
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20140928/645525e0/attachment.html>