[Haskell-cafe] "Best practices" for Text

Mon Nov 8 05:28:50 EST 2010

> For starters, I'm considering writing a polyparse [1] instance for
> Text. However, even with the current Bytestring instances for
> polyparse there seems to be an emphasis on character-based parsing.

Polyparse is not very character-oriented at all; I tend to write a separate lexer, and then write the parsers over an application-specific token stream.

But ByteStrings are certainly character-oriented, and since many people like to mix lexing with parsing, I included an instance for BS.  I imagine if someone wants to lex directly from a Text rather than a String or a BS, then that process is likely to be very character-oriented as well.

Having said that, on those occasions when I do parse directly from a String-like input, almost all of the parsers use a "word" parser (i.e. multi-character, space-separated) as if it were a primitive.  Such a word parser would almost certainly make heavy use of (break isSpace), unless there is a better alternative in Text?
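
As a rough illustration of the kind of "word" primitive described above, here is a minimal sketch over Data.Text. The name `word` and its Maybe-returning shape are assumptions made for this example only and are not polyparse's API; the only library calls used are `T.break` and `T.stripStart` from the text package.

```haskell
import Data.Char (isSpace)
import qualified Data.Text as T

-- | Hypothetical "word" primitive (not part of polyparse):
-- skip leading whitespace, then split off the next run of
-- non-space characters. Returns Nothing if no word remains.
word :: T.Text -> Maybe (T.Text, T.Text)
word input
  | T.null w  = Nothing
  | otherwise = Just (w, rest)
  where
    -- T.break plays the role of (break isSpace) on String.
    (w, rest) = T.break isSpace (T.stripStart input)
```

The remaining input is returned alongside the word, so the function could be wrapped as a parser step in whichever combinator library is in use.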

> Is that the correct way of doing things? For example, what would be
> the best way to try to parse a text value when you don't care about
> case?

When case is irrelevant, I tend to (map toUpper) over both the input stream, and any textual arguments to individual parsers.
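
A minimal sketch of the same upper-casing idea applied to Text, assuming a hypothetical helper `keywordCI` (not part of polyparse or the text package). It upper-cases both the keyword and the input with `T.toUpper`, then uses `T.stripPrefix` to match.

```haskell
import qualified Data.Text as T

-- | Hypothetical helper: match a keyword at the start of the
-- input, ignoring case. Both sides are upper-cased first, so the
-- returned remainder is the upper-cased rest of the input.
keywordCI :: T.Text -> T.Text -> Maybe T.Text
keywordCI kw input =
  T.stripPrefix (T.toUpper kw) (T.toUpper input)
```

One caveat: unlike (map toUpper) on a String, `T.toUpper` performs full Unicode case conversion, so the result can differ in length from the input (for example, "ß" becomes "SS"). The text package also provides `T.toCaseFold`, which is intended for caseless comparison and may be the better choice if the original casing of the input does not need to be preserved.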

Regards,
    Malcolm