Pre-proposal discussion: add a version of dropWhileEnd with different laziness properties to Data.List

Sun Sep 28 22:59:58 UTC 2014

Should the code be using Seq instead of a List?

On Sun, Sep 28, 2014 at 3:53 PM, David Feuer <david.feuer at gmail.com> wrote:

> One good option might be dropWhileR, to match Data.Sequence, but I'd only
> want to use that if the dropWhileEnd name were deprecated in favor of
> takeUntilLastSatisfying or whatever—otherwise it's just too confusing.
> On Sep 28, 2014 4:39 PM, "David Feuer" <david.feuer at gmail.com> wrote:
>
>> BACKGROUND:
>>
>> A somewhat common idiom I discovered digging around the GHC tree is the
>> use of
>>
>>     reverse . dropWhile p . reverse
>>
>> to remove some unwanted things from the end of a list. The lists involved
>> tend to be short (e.g., file names, package names, lines of user input,
>> etc.) and the amount removed from the end of the list tends to be short (a
>> trailing newline, the last part of a filename, extra spaces, etc.). I
>> initially intended to replace all of these with Data.List.dropWhileEnd p.
>> Unfortunately, my benchmarking showed that this had a substantial negative
>> impact on performance. Data.List.dropWhileEnd is defined like this:
>>
>>     dropWhileEnd p = foldr (\x r -> if p x && null r then [] else x:r) []
>>
>> This is lazy in the *spine* of the list--it can "flush its buffer" and
>> produce list elements any time p x is found to be false. This is the best
>> you can do if you need to process a very large list and don't want to have
>> to load the whole thing into memory, and/or your predicate is *very* cheap.
>> Unfortunately, it pays a steep price: for non-trivial p, it's strict in the
>> *elements* of the list. This makes it unsuitable, from a performance
>> standpoint, for most of the purposes for which I've seen reverse .
>> dropWhile p . reverse used.
>>
>> CONCEPT:
>>
>> Add (by some name) a function that is lazy in the elements while strict
>> in the spine:
>>
>> dropWhileEndLE :: (a -> Bool) -> [a] -> [a]
>> -- specification:  dropWhileEndLE p = reverse . dropWhile p . reverse
>> dropWhileEndLE p = foldr (\x r -> if null r && p x then [] else x:r) []
>>
>> This is a good bit faster than the double-reversing version, presumably
>> because it presumably allocates its buffer stack on the GHC stack rather
>> than spraying junk across the heap.
>>
>> If I were a Time Lord, I'd go back in time and add to the library,
>> instead of dropWhileEnd as it is, a function with a name like
>>
>>     takeUntilLastSatisfying p = dropWhileEnd (not . p)
>>
>> which has a name that explains better what it does, and I'd define
>> dropWhileEnd to be dropWhileEndLE. But I'm not a Time Lord, so I'd like to
>> hear what name or naming approach other people would recommend.
>>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20140928/0cfb8619/attachment-0001.html>