Character escape codes in Parsec

Sat Mar 29 17:02:22 EDT 2008

Hi all,

I'm using the version of Text.ParserCombinators.Parsec.Token that
ships with GHC 6.8.2. I'm trying to use the default stringLiteral
parser built by makeTokenParser to parse strings that may contain hex
ASCII escape codes of the form:
\x [hex-digit] [hex-digit]

For example, "\x0a" should parse to "\n", and "\x0aE" should parse
to "\nE".

I don't get this behavior with Parsec, though: "\x0aE" parses to
"\174". That is 0xAE, so the trailing E is being consumed as a third
hex digit.
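
Here's a minimal reproduction. It assumes the parsec package's
Text.ParserCombinators.Parsec compatibility modules and the stock
emptyDef language definition; lexer and parseLiteral are just names
I picked for illustration.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

-- Token parser built from the default (empty) language definition.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Run the library's stringLiteral parser over a piece of source text.
-- The input must include the surrounding double quotes.
parseLiteral :: String -> Either ParseError String
parseLiteral = parse (P.stringLiteral lexer) ""
```

With this, parseLiteral "\"\\x0aE\"" gives Right "\174" rather than
Right "\nE".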

From the source, I can see that a hex escape code is defined as "\x"
followed by an arbitrary-length sequence of hex digits. I think this
is wrong, because in ASCII such an escape sequence has exactly two
digits.
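
For comparison, here's a sketch of the two-digit behavior I'd expect.
It is not Parsec's code: hexEscape2, asciiStringLiteral and
parseAscii are hypothetical names, and the literal parser is
deliberately tiny (plain characters and \xHH escapes only).

```haskell
import Data.Char (chr, digitToInt)
import Text.ParserCombinators.Parsec

-- Hypothetical hex escape: "\x" followed by exactly two hex digits,
-- so any further hex digit is left for the rest of the string.
hexEscape2 :: Parser Char
hexEscape2 = do
  _ <- try (string "\\x")
  d1 <- hexDigit
  d2 <- hexDigit
  return (chr (16 * digitToInt d1 + digitToInt d2))

-- Hypothetical string literal: plain characters and \xHH escapes only.
asciiStringLiteral :: Parser String
asciiStringLiteral =
  between (char '"') (char '"') (many (hexEscape2 <|> noneOf "\"\\"))

-- Convenience wrapper that discards the parse error.
parseAscii :: String -> Maybe String
parseAscii = either (const Nothing) Just . parse asciiStringLiteral ""
```

Here parseAscii "\"\\x0aE\"" gives Just "\nE", because hexEscape2
stops after two digits and the E is read as an ordinary character.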

I can't tell from the source whether the default token parser in
Parsec is supposed to parse ASCII strings. If it is, isn't this a bug
in Parsec? If not, and it's meant to be able to handle Unicode or
something similar, I think the documentation should say so more
clearly.

I'd welcome any enlightenment.

Thanks,
Tim

-- 
Tim Chevalier * http://cs.pdx.edu/~tjc * Often in error, never in doubt
"and the things I'm working on are invisible to everyone"--Meg Hutchinson