[Haskell-cafe] Re: Why is there no splitBy in the list module?

Matthew Pocock matthew.pocock at ncl.ac.uk
Thu Jul 13 05:32:59 EDT 2006


As someone who's not used these library methods before, I would expect splitBy 
and splitLines to work differently to each other. When splitting into lines, 
I would assume that it is repeatedly applying the regular expression 
"([^t]*)(t|$)" where t is the line-terminator. You return the first group each 
time, and discard the rest. The 2nd group also handles the end-of-string 
boundary condition.
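
A minimal sketch of that terminator-style reading, written as a plain
recursive function rather than with a regex library. The name splitLinesBy is
hypothetical and not an existing library function; the predicate stands in for
"t".

```haskell
-- Hypothetical helper: terminator-style splitting, mirroring "([^t]*)(t|$)".
-- Each step returns the run of non-terminators and discards one terminator.
splitLinesBy :: (a -> Bool) -> [a] -> [[a]]
splitLinesBy _ [] = []                       -- nothing left to match
splitLinesBy isTerm xs =
  case break isTerm xs of
    (chunk, [])       -> [chunk]             -- second group matched end-of-input
    (chunk, _ : rest) -> chunk : splitLinesBy isTerm rest
```

With (== '\n') as the predicate this should agree with Prelude.lines on the
examples quoted below, since a trailing terminator produces no extra empty
element.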

As others have said, I would expect splitBy to return all of the zero-length 
matches as well - interleaving a "[^t]*" match-and-return with a "t" 
match-and-discard. The collapsed form of the output is the same as 
interleaving a "[^t]+" match-and-return with a "t*" match-and-discard.
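
Both forms can be sketched in a few lines of Haskell. This is only an
illustration of the behaviour described above, not a proposed library
definition; splitByCollapsed and roundTrips are hypothetical names introduced
here.

```haskell
import Data.List (intercalate)

-- Separator-style splitting: every separator ends one piece and starts the
-- next, so zero-length pieces are kept.
splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy isSep xs =
  case break isSep xs of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitBy isSep rest

-- Hypothetical name: the collapsed form, obtained by dropping empty pieces.
splitByCollapsed :: (a -> Bool) -> [a] -> [[a]]
splitByCollapsed isSep = filter (not . null) . splitBy isSep

-- Hypothetical check: for a single separator character, joining the pieces
-- with that character rebuilds the input.
roundTrips :: Char -> String -> Bool
roundTrips c s = intercalate [c] (splitBy (== c) s) == s
```

On "aabbaca" with (== 'a') the first function gives ["","","bb","c",""] and
the second gives ["bb","c"], matching the two candidate results quoted below.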

Matthew

On Thursday 13 July 2006 10:16, Jon Fairbairn wrote:
> On 2006-07-12 at 23:24BST "Brian Hulley" wrote:
> > Christian Maeder wrote:
> > > Donald Bruce Stewart schrieb:
> > >> Question over whether it should be:
> > >>     splitBy (=='a') "aabbaca" == ["","","bb","c",""]
> > >>   or
> > >>     splitBy (=='a') "aabbaca" == ["bb","c"]
> > >>
> > >> I argue the second form is what people usually want.
> > >
> > > Yes, the second form is needed for "words", but the first form is
> > > needed for "lines", where one final empty element needs to be removed
> > > from your version!
> > >
> > > Prelude> lines "a\nb\n"
> > > ["a","b"]
> > > Prelude> lines "a\n\nb\n\n"
> > > ["a","","b",""]
> >
> > Prelude.lines and Prelude.unlines treat '\n' as a terminator instead of a
> > separator. I'd argue that this is poor design, since information is lost
> > ie lines . unlines === id whereas unlines . lines =/= id whereas if '\n'
> > had been properly conceived of as a separator, the identity would hold.
>
> Hooray!  I've been waiting to ask "Why aren't we asking what
> laws hold for these operations?" but now you've saved me the
> effort. I've been bitten by unlines . lines /= id already;
> it's something we could gainfully change without wrecking
> too much code, methinks.
>
> > So I vote for the first option ie:
> >
> >     splitBy (=='a') "aabbaca" == ["","","bb","c",""]
>
> Seconded.
>
> As far as naming is concerned, since this is a declarative
> language, surely we shouldn't be using active verbs like
> this? (OK I lost that argument way back in the mists of
> Haskell 0.0 with take. Before then I called "take" "first":
> "first n some_list" reads perfectly well).
>
>  Jón

