Prelude function suggestions

Wed Jul 28 12:09:26 EDT 2004

 > [split, chop, all that]

How about biting the bullet and providing a real
"tokenizer"? I have had the problem, for instance, of having
to split a text into lines when it used \r\n as the EOL
marker, not just \n. So I couldn't use 'lines'.

Judging by the 'split' (or 'chop') proposals I've seen so
far, I wouldn't be able to use them for that purpose either,
because they don't support multi-character delimiters.

Shooting from the hip, I'd say this more general function
would do the trick:

  tokenize :: (a -> Bool) -> (a -> Bool) -> [a] -> [[a]]

The first function returns 'True' if the current input
element is part of a valid token. The second function (the
"skipper") would return 'True' if the current element is
ignorable "whitespace".

The input "foo bar  \t claus \r\n stuff", for instance,
could be tokenized into ["foo", "bar", "claus", "stuff"] by
something along the lines of the following function call:

  tokenize isAlphaNum isSpace "input string"

Basically, the 'tokenize' function would consume input until
the first function says 'False'. Then it would consume (and
drop) input until the second function says 'False'. And so
on, until the end of the input string is reached. One would
have to think about what 'tokenize' should do if _both_
functions say 'False' for an input element, but IMHO that
could just be an 'error'.
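
The behaviour described above could be sketched roughly as follows. This
is only one possible reading of the proposal, not a reference
implementation: the helper name 'go' and the argument names 'isTok' and
'isSkip' are made up for illustration, and the choice to let the token
predicate win when both predicates say 'True' is an assumption the post
does not settle.

```haskell
import Data.Char (isAlphaNum, isSpace)

-- Sketch of the proposed function. 'isTok' and 'isSkip' are
-- illustrative names for the two predicates.
tokenize :: (a -> Bool) -> (a -> Bool) -> [a] -> [[a]]
tokenize isTok isSkip = go
  where
    go [] = []
    go xs@(x:_)
      -- Take the longest run of token elements as one token.
      | isTok x   = let (tok, rest) = span isTok xs in tok : go rest
      -- Drop a run of ignorable elements.
      | isSkip x  = go (dropWhile isSkip xs)
      -- Neither predicate accepts the element: treated as an error here.
      | otherwise = error "tokenize: element is neither token nor skippable"
```

With this sketch, the example from above gives:

  tokenize isAlphaNum isSpace "foo bar  \t claus \r\n stuff"
    == ["foo", "bar", "claus", "stuff"]

Because the result list is built lazily, the 'error' only surfaces once
the offending element is actually reached.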

I think that would be a nice addition to the standard
library, and 'split' (or 'chop') would simply be specialized
versions of this one.
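
As a hedged illustration of that specialization, the sketch below builds
two helpers on top of one possible 'tokenize'. The names 'splitChar' and
'crlfLines' are hypothetical, and 'tokenize' is restated compactly only
so that the block compiles on its own. Note the assumption baked in:
runs of delimiters are skipped as a whole, so empty fields and blank
lines disappear, which may not be what every 'split' proposal intends.

```haskell
-- One possible 'tokenize', restated so this block is self-contained.
tokenize :: (a -> Bool) -> (a -> Bool) -> [a] -> [[a]]
tokenize isTok isSkip = go
  where
    go [] = []
    go xs@(x:_)
      | isTok x   = let (tok, rest) = span isTok xs in tok : go rest
      | isSkip x  = go (dropWhile isSkip xs)
      | otherwise = error "tokenize: element is neither token nor skippable"

-- Hypothetical 'split' on a single delimiter element.
splitChar :: Eq a => a -> [a] -> [[a]]
splitChar sep = tokenize (/= sep) (== sep)

-- Hypothetical line splitter that accepts \n, \r\n, or \r as EOL.
crlfLines :: String -> [String]
crlfLines = tokenize (`notElem` "\r\n") (`elem` "\r\n")
```

For example:

  splitChar ',' "a,,b"                == ["a", "b"]
  crlfLines "one\r\ntwo\r\n\r\nthree" == ["one", "two", "three"]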

Peter