Pre-proposal discussion: add a version of dropWhileEnd with different laziness properties to Data.List

David Feuer david.feuer at gmail.com
Mon Sep 29 01:26:30 UTC 2014


I think the Foldable/Traversable work for 7.10 will help with a lot of
these issues. Beyond that: make it faster if possible, make less-capable
versions that are faster, and make it easier to find the right version for
the job. A complicated way might be to use the "bundle" approach that the
vector fusion folks have pioneered, but there are surely other ways. But I
don't think any of that discussion takes away from the need to have really
great list support.

On Sun, Sep 28, 2014 at 9:11 PM, Carter Schonwald <
carter.schonwald at gmail.com> wrote:

> on the flip side, how can we encourage more usage of Data.Sequence ? :)
>
> On Sun, Sep 28, 2014 at 8:34 PM, David Feuer <david.feuer at gmail.com>
> wrote:
>
>> That decision is up to the user. Data.List provides a number of sequence
>> functions that aren't efficient for lists, including take, drop, splitAt,
>> init, last, reverse, zip, unzip, and (!!). dropWhileEnd and dropWhileLE
>> aren't any worse, I don't think.
>>
>> On Sun, Sep 28, 2014 at 6:59 PM, Greg Weber <greg at gregweber.info> wrote:
>>
>>> Should the code be using Seq instead of a List?
>>>
>>> On Sun, Sep 28, 2014 at 3:53 PM, David Feuer <david.feuer at gmail.com>
>>> wrote:
>>>
>>>> One good option might be dropWhileR, to match Data.Sequence, but I'd
>>>> only want to use that if the dropWhileEnd name were deprecated in favor of
>>>> takeUntilLastSatisfying or whatever—otherwise it's just too confusing.
>>>> On Sep 28, 2014 4:39 PM, "David Feuer" <david.feuer at gmail.com> wrote:
>>>>
>>>>> BACKGROUND:
>>>>>
>>>>> A somewhat common idiom I discovered digging around the GHC tree is
>>>>> the use of
>>>>>
>>>>>     reverse . dropWhile p . reverse
>>>>>
>>>>> to remove some unwanted things from the end of a list. The lists
>>>>> involved tend to be short (e.g., file names, package names, lines of user
>>>>> input, etc.) and the amount removed from the end of the list tends to be
>>>>> short (a trailing newline, the last part of a filename, extra spaces,
>>>>> etc.). I initially intended to replace all of these with
>>>>> Data.List.dropWhileEnd p. Unfortunately, my benchmarking showed that this
>>>>> had a substantial negative impact on performance. Data.List.dropWhileEnd is
>>>>> defined like this:
>>>>>
>>>>>     dropWhileEnd p = foldr (\x r -> if p x && null r then [] else x:r)
>>>>> []
>>>>>
>>>>> This is lazy in the *spine* of the list--it can "flush its buffer" and
>>>>> produce list elements any time p x is found to be false. This is the best
>>>>> you can do if you need to process a very large list and don't want to have
>>>>> to load the whole thing into memory, and/or your predicate is *very* cheap.
>>>>> Unfortunately, it pays a steep price: for non-trivial p, it's strict in the
>>>>> *elements* of the list. This makes it unsuitable, from a performance
>>>>> standpoint, for most of the purposes for which I've seen reverse .
>>>>> dropWhile p . reverse used.
>>>>>
>>>>> CONCEPT:
>>>>>
>>>>> Add (by some name) a function that is lazy in the elements while
>>>>> strict in the spine:
>>>>>
>>>>> dropWhileEndLE :: (a -> Bool) -> [a] -> [a]
>>>>> -- specification:  dropWhileEndLE p = reverse . dropWhile p . reverse
>>>>> dropWhileEndLE p = foldr (\x r -> if null r && p x then [] else x:r) []
>>>>>
>>>>> This is a good bit faster than the double-reversing version,
>>>>> presumably because it presumably allocates its buffer stack on the GHC
>>>>> stack rather than spraying junk across the heap.
>>>>>
>>>>> If I were a Time Lord, I'd go back in time and add to the library,
>>>>> instead of dropWhileEnd as it is, a function with a name like
>>>>>
>>>>>     takeUntilLastSatisfying p = dropWhileEnd (not . p)
>>>>>
>>>>> which has a name that explains better what it does, and I'd define
>>>>> dropWhileEnd to be dropWhileEndLE. But I'm not a Time Lord, so I'd like to
>>>>> hear what name or naming approach other people would recommend.
>>>>>
>>>>
>>>> _______________________________________________
>>>> Libraries mailing list
>>>> Libraries at haskell.org
>>>> http://www.haskell.org/mailman/listinfo/libraries
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://www.haskell.org/mailman/listinfo/libraries
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20140928/32c5f4c1/attachment-0001.html>


More information about the Libraries mailing list