[Haskell-cafe] Token parsers in parsec consume trailing whitespace

Sat Dec 12 15:03:49 EST 2009

I recently ran into a nasty little bug: token parsers in
parsec consume trailing whitespace, so they were consuming newlines
and thus bamboozling a higher-level "sepBy" combinator.  I
replaced my instances of 'natural' with 'read <$> many1 digit',
but this gave rise to the following questions:

1. Is there a more elegant way of doing number parsing?  In
particular, are there token parsers that don't consume trailing
whitespace, or is there a better way to do this with the
primitives?
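
For reference, here is a minimal sketch of the workaround I mean,
written against Text.Parsec (parsec 3).  The names 'nat' and
'natWithRest' are just ones I made up for illustration; the second
exists only to show what input is left over after the number.

```haskell
import Text.Parsec (ParseError, digit, getInput, many1, parse)
import Text.Parsec.String (Parser)

-- Parse a non-negative integer and stop right after the last digit.
-- many1 digit yields a non-empty, all-digit string, so read cannot fail.
nat :: Parser Integer
nat = read <$> many1 digit

-- Illustration helper (hypothetical name): return the parsed number
-- together with whatever input the parser left unconsumed.
natWithRest :: String -> Either ParseError (Integer, String)
natWithRest = parse ((,) <$> nat <*> getInput) "<input>"
```

On the input "42 \nrest" this leaves " \nrest" untouched, so a
surrounding 'sepBy newline' still gets to see the newline.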

2. It seems that the "token" approach to parsing lends itself
to a different style than the one I'm using.  Instead of
assuming that each of your parsers consumes exactly what it
needs and no more, you assume that each one consumes what it
needs plus any trailing spaces.  Thus, code that looks like:

    do
        foo <- fooParser
        spaces
        bar <- barParser
        spaces
        baz <- bazParser
        return $ FooBarBaz foo bar baz

becomes:

    FooBarBaz <$> fooParser <*> barParser <*> bazParser

And instead of using sepBy you just use many.  One of the problems
I see with this approach is that if I were using sepBy newline, the
new token-oriented parser would have no way of distinguishing
"foo bar baz\nfoo bar baz" from "foo bar baz foo bar baz", which
is something I might want to care about.
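
One possible middle ground, sketched below under my own assumptions,
is a token wrapper that skips only spaces and tabs, so that newlines
stay visible to sepBy.  'hspaces', 'lexemeH', 'word', 'record' and
'parseRecords' are hypothetical names, not anything parsec provides,
and a "record" here is simply three words on one line.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Skip horizontal whitespace only; newlines are left for the caller.
hspaces :: Parser ()
hspaces = skipMany (oneOf " \t")

-- Hypothetical token wrapper: run p, then drop trailing spaces and tabs.
lexemeH :: Parser a -> Parser a
lexemeH p = p <* hspaces

word :: Parser String
word = lexemeH (many1 letter)

-- One record is three words on a single line.
record :: Parser (String, String, String)
record = (,,) <$> word <*> word <*> word

-- Records are separated by newlines; eof rejects leftover input.
parseRecords :: String -> Either ParseError [(String, String, String)]
parseRecords = parse ((record `sepBy` newline) <* eof) "<input>"
```

With this, "foo bar baz\nfoo bar baz" parses as two records, while
"foo bar baz foo bar baz" is rejected, because the second "foo" is
neither a newline nor the end of input.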

Which method do you prefer?

3. Not so much a question as a comment: when parsing entire files,
be sure to add the eof combinator at the end!
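
To illustrate with a small sketch (the parser 'numbers' and the
names 'lenient' and 'strict' are made up for this example): without
eof, the parser happily succeeds on a prefix of the input and
silently ignores the rest.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A comma-separated list of non-negative integers.
numbers :: Parser [Int]
numbers = (read <$> many1 digit) `sepBy` char ','

-- Without eof: stops at the first thing it cannot parse and succeeds.
lenient :: String -> Either ParseError [Int]
lenient = parse numbers "<input>"

-- With eof: any leftover input becomes a parse error.
strict :: String -> Either ParseError [Int]
strict = parse (numbers <* eof) "<input>"
```

On "1,2 oops", 'lenient' returns the list [1,2] and drops " oops",
whereas 'strict' reports an error at the space.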

Cheers,
Edward