Proposal for Data.List.splitBy

Sun Jan 18 09:11:22 EST 2009

Duncan Coutts wrote:
> On Sun, 2009-01-18 at 12:02 +0100, Marcus D. Gabriel wrote:
>   
>> Brent Yorgey wrote:
>>     
>>>> P2. There should be no information loss, that is, keep the delimiters,
>>>> keep the separators, keep the parts of the original list xs that satisfy
>>>> a predicate p, do not lose information about the beginning and the end
>>>> of the list relative to the first and last elements of the list
>>>> respectively. The user of the function decides what to discard.
>>>>
>>>> P3. A split list should be unsplittable so as to recover the original
>>>> list xs. (I made up the word unsplittable.) (P2 implies P3, but let us
>>>> state this anyway.)
>>>>         
>>> I'm not sure I agree with this.
>>>       
>> Thanks for stating this.  Dropping P3 would change my
>> thinking about this topic, that is, if we drop P3, then
>> I would prefer that no splitter functions are added to
>> Data.List and that it is left as is.
>>
>>     
>>> The problem is that much (most?) of
>>> the time, people looking for a split function want to discard
>>> delimiters; for example, if you have a string like "foo;bar;baz" and
>>> you want to split it into ["foo","bar","baz"].
>>>       
>> I agree with this comment when thinking about strings and what
>> I would do most of the time and from a pragmatic point of view.
>>     
>
> Indeed, the existing Data.List.words is certainly lossy and deliberately
> so. It's also useful and widely used.
>
> On the other hand it is a widely held view that Data.List.lines should
> not be lossy, i.e. that Data.List.unlines . Data.List.lines should be the
> identity. In the current implementation of lines and unlines this is not
> the case, because of the way it handles a trailing newline.
>
> Duncan
>   

That is an argument, if I ever read one, for not placing any
fundamental splitter functions that are lossy in Data.List.
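
To make P2 and P3 from the quoted text concrete, here is a minimal
sketch of a splitter that loses nothing. The names splitKeeping and
dropDelimiters are hypothetical and are not part of Data.List; the
sketch only assumes that delimiters are single elements chosen by a
predicate.

```haskell
-- Hypothetical name, not part of Data.List.
-- Splits at every element satisfying p and keeps each delimiter as its
-- own one-element chunk, so that concat (splitKeeping p xs) == xs.
-- Empty chunks are kept, so a leading or trailing delimiter stays visible.
splitKeeping :: (a -> Bool) -> [a] -> [[a]]
splitKeeping p xs = case break p xs of
  (chunk, [])       -> [chunk]
  (chunk, d : rest) -> chunk : [d] : splitKeeping p rest

-- Hypothetical helper: the caller explicitly chooses to lose information.
-- The result of splitKeeping alternates chunk, delimiter, chunk, ...,
-- so dropping every second element discards exactly the delimiters.
dropDelimiters :: [[a]] -> [[a]]
dropDelimiters (chunk : _delim : rest) = chunk : dropDelimiters rest
dropDelimiters chunks                  = chunks
```

With these, splitKeeping (== ';') "foo;bar;baz" gives
["foo",";","bar",";","baz"], concat recovers the original string, and
dropDelimiters reduces it to ["foo","bar","baz"] for the user who
wants the lossy result.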

The user of these functions should explicitly choose to lose
information.  Then the documentation in the Haskell 98 report
might have stated instead something like

  unlines (lines xs) == xs iff xs is empty or ends with '\n'

which would at least be up front.
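
A small sketch for checking this condition against the Prelude's lines
and unlines. The names roundTrips and emptyOrEndsWithNewline are made
up for illustration; the empty string is included as a special case
because it round-trips even though it has no trailing newline.

```haskell
-- True when unlines . lines returns its input unchanged.
roundTrips :: String -> Bool
roundTrips xs = unlines (lines xs) == xs

-- The condition under which the round trip is expected to hold:
-- the input is empty or its last character is a newline.
emptyOrEndsWithNewline :: String -> Bool
emptyOrEndsWithNewline xs = null xs || last xs == '\n'

-- The law as a property over any input string.
lawHolds :: String -> Bool
lawHolds xs = roundTrips xs == emptyOrEndsWithNewline xs
```

For instance, roundTrips "a\nb\n" is True, while roundTrips "a\nb" is
False: lines forgets that the final newline was missing and unlines
adds one back.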

Cheers,
- Marcus