[Haskell-cafe] Re: Why is there no splitBy in the list module?

Wed Jul 12 14:29:53 EDT 2006

On Wed, 12 Jul 2006, Donald Bruce Stewart wrote:

> I vote for this, currently implemented in Data.ByteString:
>         
>     -- | split on characters
>     split        :: Char -> String -> [String]
> 
>     -- | split on predicate *
>     splitBy      :: (Char -> Bool) -> String -> [String]
> 
> and
>     -- | split on a string
>     tokens       :: String -> String -> [String]

OED on "token":
    3b [Computing] The smallest meaningful unit of information
       in sequence of data for a compiler.

I think that's more or less what it means to me, too.  It may be
possible to come up with a name that is more likely to suggest
what it does and less likely to collide with identifiers used
elsewhere.  Maybe "splits", but anyway ideally including "split".

Of course technically we seem to be talking about lists, but
this last one is surely mostly about strings.

> Question over whether it should be:
>     splitBy (=='a') "aabbaca" == ["","","bb","c",""]
>   or
>     splitBy (=='a') "aabbaca" == ["bb","c"]
> 
> I argue the second form is what people usually want.

People will want both.  The second form can be computed from the
first, because all it does is discard information about the input
string (the empty fields), but for the same reason the first
can't be derived from the second.  (I'm not the first to say
that, but since mail to this list has been arriving out of order,
here it is again.)
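
To make that concrete, here is a minimal sketch.  This splitBy is
a hypothetical list-based definition written for illustration, not
the Data.ByteString code, and splitByNonEmpty is a made-up name
for the second form.

```haskell
-- Sketch only: hypothetical list-based definitions, not the
-- Data.ByteString implementation discussed in the thread.

-- | First form: split on a predicate, keeping empty fields.
splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy p xs = case break p xs of
  (field, [])       -> [field]                   -- no separator left
  (field, _ : rest) -> field : splitBy p rest    -- drop the separator

-- | Second form: the first form with the empty fields filtered out.
-- (The name splitByNonEmpty is an assumption for this example.)
splitByNonEmpty :: (a -> Bool) -> [a] -> [[a]]
splitByNonEmpty p = filter (not . null) . splitBy p

main :: IO ()
main = do
  print (splitBy (== 'a') "aabbaca")          -- ["","","bb","c",""]
  print (splitByNonEmpty (== 'a') "aabbaca")  -- ["bb","c"]
```

Going the other way is not possible: ["bb","c"] is also what you
get from "bbac" or "abbaaac", so the positions of the separators
are gone.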

The convention I know, possibly coming from the world of UNIX
shell tools, is that the default white space split is type 2, but
a split on any other string is type 1.  The UNIX shell does that,
as do awk, Python ...  (Perl is awk gone horribly wrong, so it
presumably does too, but if it doesn't, it's the exception that
proves the rule.)
It has worked for a lot of people who do a lot of splitting, for
a lot of years.
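
In Haskell terms the convention might look like the sketch below.
The Prelude's words already behaves as the type 2 white space
split; the split defined here is a hypothetical type 1 version
with the signature quoted above, not the Data.ByteString one.

```haskell
-- Sketch only: split is a hypothetical String version written for
-- this example; words comes from the Prelude.

-- | Type 1: split on one delimiter character, keeping empty fields.
split :: Char -> String -> [String]
split d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : split d rest

main :: IO ()
main = do
  -- White space: runs of blanks collapse, no empty fields (type 2).
  print (words "  one   two  ")    -- ["one","two"]
  -- Explicit delimiter: every field is kept, empty or not (type 1).
  print (split ':' "root::0:0")    -- ["root","","0","0"]
```

The empty field matters in the second case: in colon-separated
records a missing value still has to hold its place.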

	Donn Cave, donn at drizzle.com