[Haskell-cafe] How to split this string.

Thu Jan 5 11:02:37 CET 2012

Steve Horne <sh006d3592 at blueyonder.co.uk> writes:

> On 02/01/2012 11:12, Jon Fairbairn wrote:
>> max <mk at mtw.ru> writes:
>>
>>> I want to write a function whose behavior is as follows:
>>>
>>> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
>>> "string2\r\nstring3", "string4"]
>>>
>>> Note the sequence "\r\n", which is ignored. How can I do this?
>> cabal install split
>>
>> then do something like
>>
>>     import Data.List (groupBy)
>>     import Data.List.Split (splitOn)
>>
>>     rn '\r' '\n' = True
>>     rn _ _ = False
>>
>>     required_function = fmap concat . splitOn ["\n"] . groupBy rn
>>
>> (though that might be an abuse of groupBy)
>>
> Sadly, it turns out that not only is this an abuse of
> groupBy, but it has (I think) a subtle bug as a result.

It does indeed. Thanks. That was pretty much what I feared.

> Explanation (best guess) - the function passed to groupBy,
> according to the docs, is meant to test whether two values
> are 'equal'. I'm guessing the assumption is that the
> function will effectively treat values as belonging to
> equivalence classes. That implies some rules such as...

Right.  This issue has come up from time to time since groupBy
was first written. Each time, someone pops up to justify the
present behaviour, but I can never remember the justification.
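
For reference, here is a minimal sketch of how groupBy behaves. As far
as I can tell it mirrors the definition in the Haskell Report; the name
groupByFirst is made up here to avoid clashing with Data.List. The
point to notice is that every candidate is tested against the first
element of the current group, not against its immediate neighbour.

```haskell
-- Sketch of the Report-style groupBy (groupByFirst is a hypothetical name).
-- Each group is the head x plus the longest prefix of the rest whose
-- elements all satisfy (eq x), i.e. are "equal" to the FIRST element.
groupByFirst :: (a -> a -> Bool) -> [a] -> [[a]]
groupByFirst _  []       = []
groupByFirst eq (x : xs) = (x : ys) : groupByFirst eq zs
  where
    (ys, zs) = span (eq x) xs

-- With a test that is not an equivalence relation the effect is visible:
-- 2 stays in the first group because 1 <= 2, even though 3 <= 2 is False.
example :: [[Int]]
example = groupByFirst (<=) [1, 3, 2, 0]
-- [[1,3,2],[0]]
```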

> In the context of this \r\n test function, this behaviour
> will, I guess, result in \r\n\n being combined into one group.
> The second \n will therefore not be seen as a valid
> splitting point.

Correct. In my defence, I did say “do something like” :-)
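
To make the failure concrete, here is a small sketch that needs only
base. The helper splitOnElem is a hypothetical stand-in for what
splitOn ["\n"] from the split package does in my suggestion (split a
list at every element equal to the separator); the rest is as posted.

```haskell
import Data.List (groupBy)

rn :: Char -> Char -> Bool
rn '\r' '\n' = True
rn _    _    = False

-- Hypothetical stand-in for splitOn ["\n"]: split at every element
-- equal to sep, dropping the separators.
splitOnElem :: Eq a => a -> [a] -> [[a]]
splitOnElem sep xs = case break (== sep) xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnElem sep rest

requiredFunction :: String -> [String]
requiredFunction = fmap concat . splitOnElem "\n" . groupBy rn

-- groupBy tests every character against the leading '\r', so the
-- second '\n' is swallowed into the same group:
grouped :: [String]
grouped = groupBy rn "a\r\n\nb"
-- ["a","\r\n\n","b"]

-- ...and the bare "\n" is therefore never seen as a separator:
broken :: [String]
broken = requiredFunction "a\r\n\nb"
-- ["a\r\n\nb"], where ["a\r\n","b"] was wanted
```

The original example still comes out right, because no "\r\n" in it is
followed directly by another "\n".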

> Personally, I think this is a tad disappointing. Given that
> groupBy cannot check or enforce that its test respects
> equivalence classes, it should ideally give results that
> make as much sense as possible either way. That said, even
> if the test was always given adjacent elements, there's
> still room for a different order of processing the list
> (left-to-right or right-to-left) to give different results -
> and in any case, maybe it's more efficient the way it is.

Looking back at the libraries list, I get the impression that
there was a suggestion to change the behaviour of groupBy, but
it doesn’t seem to have happened.
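
For what it is worth, here is a sketch of a grouping function that
tests adjacent elements instead, under which the \r\n trick does what
was intended. The names groupByAdjacent, splitOnElem and splitLines
are made up for illustration, and this is not a claim about what the
proposed library change looked like. It uses only base.

```haskell
rn :: Char -> Char -> Bool
rn '\r' '\n' = True
rn _    _    = False

-- Group elements so that each one is related to its immediate
-- successor within the group (hypothetical name, not in Data.List).
-- Built with foldr, so it inspects the grouped tail before emitting
-- anything and is not suitable for infinite lists.
groupByAdjacent :: (a -> a -> Bool) -> [a] -> [[a]]
groupByAdjacent rel = foldr step []
  where
    step x ((y : ys) : groups) | rel x y = (x : y : ys) : groups
    step x groups                        = [x] : groups

-- Hypothetical stand-in for splitOn ["\n"] from the split package.
splitOnElem :: Eq a => a -> [a] -> [[a]]
splitOnElem sep xs = case break (== sep) xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnElem sep rest

-- Split on bare "\n" while leaving "\r\n" pairs inside the pieces.
splitLines :: String -> [String]
splitLines = fmap concat . splitOnElem "\n" . groupByAdjacent rn

-- splitLines "a\r\n\nb"  gives  ["a\r\n","b"]
```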

-- 
Jón Fairbairn                                 Jon.Fairbairn at cl.cam.ac.uk
http://www.chaos.org.uk/~jf/Stuff-I-dont-want.html  (updated 2010-09-14)