Pre-proposal discussion: add a version of dropWhileEnd with different laziness properties to Data.List

Carter Schonwald carter.schonwald at gmail.com
Mon Sep 29 01:11:42 UTC 2014


on the flip side, how can we encourage more usage of Data.Sequence ? :)

On Sun, Sep 28, 2014 at 8:34 PM, David Feuer <david.feuer at gmail.com> wrote:

> That decision is up to the user. Data.List provides a number of sequence
> functions that aren't efficient for lists, including take, drop, splitAt,
> init, last, reverse, zip, unzip, and (!!). dropWhileEnd and dropWhileLE
> aren't any worse, I don't think.
>
> On Sun, Sep 28, 2014 at 6:59 PM, Greg Weber <greg at gregweber.info> wrote:
>
>> Should the code be using Seq instead of a List?
>>
>> On Sun, Sep 28, 2014 at 3:53 PM, David Feuer <david.feuer at gmail.com>
>> wrote:
>>
>>> One good option might be dropWhileR, to match Data.Sequence, but I'd
>>> only want to use that if the dropWhileEnd name were deprecated in favor of
>>> takeUntilLastSatisfying or whatever—otherwise it's just too confusing.
>>> On Sep 28, 2014 4:39 PM, "David Feuer" <david.feuer at gmail.com> wrote:
>>>
>>>> BACKGROUND:
>>>>
>>>> A somewhat common idiom I discovered digging around the GHC tree is the
>>>> use of
>>>>
>>>>     reverse . dropWhile p . reverse
>>>>
>>>> to remove some unwanted things from the end of a list. The lists
>>>> involved tend to be short (e.g., file names, package names, lines of user
>>>> input, etc.) and the amount removed from the end of the list tends to be
>>>> short (a trailing newline, the last part of a filename, extra spaces,
>>>> etc.). I initially intended to replace all of these with
>>>> Data.List.dropWhileEnd p. Unfortunately, my benchmarking showed that this
>>>> had a substantial negative impact on performance. Data.List.dropWhileEnd is
>>>> defined like this:
>>>>
>>>>     dropWhileEnd p = foldr (\x r -> if p x && null r then [] else x:r)
>>>> []
>>>>
>>>> This is lazy in the *spine* of the list--it can "flush its buffer" and
>>>> produce list elements any time p x is found to be false. This is the best
>>>> you can do if you need to process a very large list and don't want to have
>>>> to load the whole thing into memory, and/or your predicate is *very* cheap.
>>>> Unfortunately, it pays a steep price: for non-trivial p, it's strict in the
>>>> *elements* of the list. This makes it unsuitable, from a performance
>>>> standpoint, for most of the purposes for which I've seen reverse .
>>>> dropWhile p . reverse used.
>>>>
>>>> CONCEPT:
>>>>
>>>> Add (by some name) a function that is lazy in the elements while strict
>>>> in the spine:
>>>>
>>>> dropWhileEndLE :: (a -> Bool) -> [a] -> [a]
>>>> -- specification:  dropWhileEndLE p = reverse . dropWhile p . reverse
>>>> dropWhileEndLE p = foldr (\x r -> if null r && p x then [] else x:r) []
>>>>
>>>> This is a good bit faster than the double-reversing version, presumably
>>>> because it presumably allocates its buffer stack on the GHC stack rather
>>>> than spraying junk across the heap.
>>>>
>>>> If I were a Time Lord, I'd go back in time and add to the library,
>>>> instead of dropWhileEnd as it is, a function with a name like
>>>>
>>>>     takeUntilLastSatisfying p = dropWhileEnd (not . p)
>>>>
>>>> which has a name that explains better what it does, and I'd define
>>>> dropWhileEnd to be dropWhileEndLE. But I'm not a Time Lord, so I'd like to
>>>> hear what name or naming approach other people would recommend.
>>>>
>>>
>>> _______________________________________________
>>> Libraries mailing list
>>> Libraries at haskell.org
>>> http://www.haskell.org/mailman/listinfo/libraries
>>>
>>>
>>
>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20140928/4c8688cd/attachment.html>


More information about the Libraries mailing list