Proposal for Data.List.splitBy

Sun Jan 18 11:14:50 EST 2009

Duncan Coutts wrote:
> On Sun, 2009-01-18 at 15:11 +0100, Marcus D. Gabriel wrote:
>   
>> Duncan Coutts wrote:
>>     
>>> On Sun, 2009-01-18 at 12:02 +0100, Marcus D. Gabriel wrote:
>>>   
>>>       
>>>> Brent Yorgey wrote:
>>>>     
>>>>         
>>>>>> P2. There should be no information loss, that is, keep the
>>>>>> delimiters, keep the separators, keep the parts of the original
>>>>>> list xs that satisfy a predicate p, do not lose information about
>>>>>> the beginning and the end of the list relative to the first and
>>>>>> last elements of the list respectively. The user of the function
>>>>>> decides what to discard.
>>>>>>
>>>>>> P3. A split list should be unsplittable so as to recover the original
>>>>>> list xs. (I made up the word unsplittable.) (P2 implies P3, but let us
>>>>>> state this anyway.)
>>>>>>         
>>>>>>             
>>>>> I'm not sure I agree with this.
>>>>>       
>>>>>           
>>>> Thanks for stating this.  Dropping P3 would change my
>>>> thinking about this topic, that is, if we drop P3, then
>>>> I would prefer that no splitter functions are added to
>>>> Data.List and that it is left as is.
>>>>
>>>>     
>>>>         
>>>>> The problem is that much (most?) of
>>>>> the time, people looking for a split function want to discard
>>>>> delimiters; for example, if you have a string like "foo;bar;baz" and
>>>>> you want to split it into ["foo","bar","baz"].
>>>>>       
>>>>>           
>>>> I agree with this comment when thinking about strings and what
>>>> I would do most of the time and from a pragmatic point of view.
>>>>     
>>>>         
>>> Indeed, the existing Data.List.words is certainly lossy and deliberately
>>> so. It's also useful and widely used.
>>>
>>> On the other hand it is a widely held view that Data.List.lines should
>>> not be lossy, i.e. that Data.List.unlines . Data.List.lines should be the
>>> identity. In the current implementation of lines and unlines this is not
>>> the case because of the way a trailing newline is handled.
>>>       
>
>   
>> An argument for not placing any fundamental splitter functions
>> in Data.List that are lossy if I ever read one.
>>
>> The user of these functions should explicitly choose to lose
>> information.  Then the documentation in the Haskell 98 report
>> might have stated instead something like
>>
>>   unlines . lines == id iff xs ends with '\n'
>>
>> which would at least be up front.
>>     
>
> Of course, the properties should have been specified. But are you also
> really saying that 'words' should have been omitted from the List
> module?
>   

Good catch, very good catch.  Historically, it is there, so
pragmatically, it stays.
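
To make the two behaviours concrete, here is a minimal sketch that
checks the round trips using only the Prelude.  The names
roundTripsLines and roundTripsWords are made up for this message and
are not part of any library.

```haskell
-- Round-trip checks for the Prelude functions under discussion.
-- roundTripsLines and roundTripsWords are hypothetical helper names.

-- True when splitting into lines and joining again gives back the input.
roundTripsLines :: String -> Bool
roundTripsLines xs = unlines (lines xs) == xs

-- True when splitting into words and joining again gives back the input.
roundTripsWords :: String -> Bool
roundTripsWords xs = unwords (words xs) == xs

-- A few inputs paired with the results of both checks.
samples :: [(String, Bool, Bool)]
samples =
  [ (s, roundTripsLines s, roundTripsWords s)
  | s <- ["foo\nbar\n", "foo\nbar", "foo bar", "foo  bar"]
  ]
```

With these definitions, roundTripsLines holds for "foo\nbar\n" but not
for "foo\nbar", since unlines always appends a final newline, and
roundTripsWords fails for "foo  bar" because words discards the run of
spaces.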

This goes back to my question about module, package, or library design.
A set of Data.List design criteria could have been set up such that

> unlines :: [String] -> String
> unwords :: [String] -> String

were not in List of Haskell 98 but in a smaller, specific module,
something similar to Data.Char for example.  I recall that when
I was first learning Haskell and browsed List under Hugs, these
two functions stood out compared to everything else simply because
of their type signature.

Well, Duncan, you have rather undermined the perspective that I was
trying to construct.  You have pushed my view toward Alex's, namely
simply leaving Data.List alone and moving forward with Data.List.Split.

Assuming this, then from a very pragmatic point of view, I would
tell a programmer new to Haskell to study Data.List first and, if
what they are looking for is not there, to try Data.List.Split.
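
For reference, the kind of splitter that P2 and P3 describe could be
sketched roughly as follows.  This is only an illustration under my
own assumptions: Chunk, splitKeep, unsplit, and dropDelims are
hypothetical names, not the Data.List.Split interface, and the choice
to group adjacent delimiters into one run is just one possibility.

```haskell
-- Hypothetical sketch of a lossless splitter; not a library API.

-- A piece of the input: a run of elements that satisfy the
-- predicate (Delim) or a run of elements that do not (Text).
data Chunk a = Delim [a] | Text [a]
  deriving (Eq, Show)

-- Split a list into maximal alternating runs, keeping everything.
splitKeep :: (a -> Bool) -> [a] -> [Chunk a]
splitKeep _ [] = []
splitKeep p xs@(x:_)
  | p x       = let (ds, rest) = span p xs  in Delim ds : splitKeep p rest
  | otherwise = let (ts, rest) = break p xs in Text ts  : splitKeep p rest

-- P3: concatenating the chunks recovers the original list.
unsplit :: [Chunk a] -> [a]
unsplit = concatMap items
  where
    items (Delim ds) = ds
    items (Text ts)  = ts

-- The user explicitly chooses to lose information.
dropDelims :: [Chunk a] -> [[a]]
dropDelims cs = [ts | Text ts <- cs]
```

Here dropDelims (splitKeep (== ';') "foo;bar;baz") gives
["foo","bar","baz"], while unsplit . splitKeep p is the identity for
any predicate p.  Because delimiter runs are merged, this sketch never
yields empty parts between adjacent delimiters; a real design would
have to decide that point explicitly.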

Still, I am curious what Brent will write.

Cheers,
- Marcus