Pre-proposal discussion: add a version of dropWhileEnd with different laziness properties to Data.List

David Feuer david.feuer at gmail.com
Sun Sep 28 22:53:35 UTC 2014


One good option might be dropWhileR, to match Data.Sequence, but I'd only
want to use that if the dropWhileEnd name were deprecated in favor of
takeUntilLastSatisfying or whatever—otherwise it's just too confusing.
On Sep 28, 2014 4:39 PM, "David Feuer" <david.feuer at gmail.com> wrote:

> BACKGROUND:
>
> A somewhat common idiom I discovered digging around the GHC tree is the
> use of
>
>     reverse . dropWhile p . reverse
>
> to remove some unwanted things from the end of a list. The lists involved
> tend to be short (e.g., file names, package names, lines of user input,
> etc.) and the amount removed from the end of the list tends to be short (a
> trailing newline, the last part of a filename, extra spaces, etc.). I
> initially intended to replace all of these with Data.List.dropWhileEnd p.
> Unfortunately, my benchmarking showed that this had a substantial negative
> impact on performance. Data.List.dropWhileEnd is defined like this:
>
>     dropWhileEnd p = foldr (\x r -> if p x && null r then [] else x:r) []
>
> This is lazy in the *spine* of the list--it can "flush its buffer" and
> produce list elements any time p x is found to be false. This is the best
> you can do if you need to process a very large list and don't want to have
> to load the whole thing into memory, and/or your predicate is *very* cheap.
> Unfortunately, it pays a steep price: for non-trivial p, it's strict in the
> *elements* of the list. This makes it unsuitable, from a performance
> standpoint, for most of the purposes for which I've seen reverse .
> dropWhile p . reverse used.
>
> CONCEPT:
>
> Add (by some name) a function that is lazy in the elements while strict in
> the spine:
>
> dropWhileEndLE :: (a -> Bool) -> [a] -> [a]
> -- specification:  dropWhileEndLE p = reverse . dropWhile p . reverse
> dropWhileEndLE p = foldr (\x r -> if null r && p x then [] else x:r) []
>
> This is a good bit faster than the double-reversing version, presumably
> because it presumably allocates its buffer stack on the GHC stack rather
> than spraying junk across the heap.
>
> If I were a Time Lord, I'd go back in time and add to the library, instead
> of dropWhileEnd as it is, a function with a name like
>
>     takeUntilLastSatisfying p = dropWhileEnd (not . p)
>
> which has a name that explains better what it does, and I'd define
> dropWhileEnd to be dropWhileEndLE. But I'm not a Time Lord, so I'd like to
> hear what name or naming approach other people would recommend.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20140928/4eefdb35/attachment.html>


More information about the Libraries mailing list