[Haskell-cafe] Parsec - separating Parsing from Lexing

Ryan Ingram ryani.spam at gmail.com
Mon Nov 30 15:21:34 EST 2009

Hi Fernando.  I tried this approach for a toy language as well, and I
was unhappy with it.

I have found that, with Parsec, it is best to *not* split your parsing
completely into "tokenization" and "parsing" phases, but rather to
interleave them.  Instead of

> tokenize :: Parser [MJVal]

write
> token :: Parser MJVal

Then use something like the following:

> tokenSatisfies :: (MJVal -> Bool) -> Parser MJVal
> tokenSatisfies f = try $ do
>    t <- token
>    if (f t) then return t else fail "No parse"

> program :: Parser Program
> program = do
>    tokenSatisfies (== Program_)
>    programName <- identifier
>    -- etc.
>    return $ Program programName ...
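
To make the interleaved approach concrete, here is a minimal, self-contained sketch using the parsec 3 package. It assumes a hypothetical four-constructor subset of `MJVal` and a hypothetical `Program` AST that stores only the program name. It renames `token` to `mjToken` because `Text.Parsec` already exports a function called `token`, and it writes `identifier` as one more `try`-wrapped token check.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical subset of the MJVal token type (the real one has many
-- more constructors for keywords, operators and literals).
data MJVal
  = Program_
  | Ident_ String
  | LBrace_
  | RBrace_
  deriving (Eq, Show)

-- Hypothetical AST: only the program name is kept.
data Program = Program String
  deriving (Eq, Show)

-- Lex exactly one token from the character stream, then skip trailing
-- whitespace. Named mjToken because Text.Parsec already exports `token`.
mjToken :: Parser MJVal
mjToken = (brace <|> word) <* spaces
  where
    brace = (LBrace_ <$ char '{') <|> (RBrace_ <$ char '}')
    word = do
      c  <- letter
      cs <- many (alphaNum <|> char '_')
      let w = c : cs
      -- Keywords are recognised only after the whole word is read, so
      -- "programX" lexes as an identifier rather than Program_ + "X".
      return (if w == "program" then Program_ else Ident_ w)

-- Lex one token and accept it only if it satisfies the predicate.
-- `try` rewinds the characters mjToken consumed when the token is rejected.
tokenSatisfies :: (MJVal -> Bool) -> Parser MJVal
tokenSatisfies f = try $ do
  t <- mjToken
  if f t then return t else unexpected (show t)

-- Hypothetical helper: accept an identifier token and return its name.
identifier :: Parser String
identifier = try $ do
  t <- mjToken
  case t of
    Ident_ name -> return name
    _           -> unexpected (show t)

-- Program = "program" ident "{" "}"   (declarations omitted)
program :: Parser Program
program = do
  spaces
  _ <- tokenSatisfies (== Program_)
  programName <- identifier
  _ <- tokenSatisfies (== LBrace_)
  _ <- tokenSatisfies (== RBrace_)
  eof
  return (Program programName)
```

In GHCi, `parse program "" "program testProgram { }"` should give `Right (Program "testProgram")`, while `"programX foo { }"` is rejected, because the lexer reads the whole word before deciding whether it is a keyword.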

There is a bit of an inefficiency in using "try": whenever an alternative
of a "choice" fails, the next alternative lexes the same token again, so a
single token can be re-lexed several times. Still, I've found this to be the
simplest solution, and parsing time rarely dominates the overall running
time.
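
The fully separated design Fernando asked about is also possible: Parsec's `tokenPrim` lets a parser consume any list type, so the parser can run directly over the `[MJVal]` his lexer already produces. This is a different technique from the interleaving recommended above, shown only for comparison. A side benefit is that `try` then backtracks over tokens that were lexed once, instead of re-lexing characters. The sketch assumes a hypothetical four-constructor subset of `MJVal` and a hypothetical `Program` AST holding only the name; `TokParser`, `satisfyTok` and `tok` are illustrative names, not part of parsec.

```haskell
import Text.Parsec

-- Hypothetical subset of the lexer's token type.
data MJVal = Program_ | Ident_ String | LBrace_ | RBrace_
  deriving (Eq, Show)

-- Hypothetical AST: only the program name is kept.
data Program = Program String
  deriving (Eq, Show)

-- A Parsec parser whose input stream is a list of tokens, not a String.
type TokParser a = Parsec [MJVal] () a

-- Consume one token if the function returns Just. Positions here just
-- count tokens; a real lexer would attach a SourcePos to each token.
satisfyTok :: (MJVal -> Maybe a) -> TokParser a
satisfyTok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

-- Expect one specific token.
tok :: MJVal -> TokParser ()
tok want = satisfyTok (\t -> if t == want then Just () else Nothing)

-- Accept an identifier token and return its name.
identifier :: TokParser String
identifier = satisfyTok isIdent
  where
    isIdent (Ident_ name) = Just name
    isIdent _             = Nothing

-- Program = "program" ident "{" "}"   (declarations omitted)
program :: TokParser Program
program = do
  tok Program_
  name <- identifier
  tok LBrace_
  tok RBrace_
  eof
  return (Program name)
```

For example, `parse program "" [Program_, Ident_ "testProgram", LBrace_, RBrace_]` should return `Right (Program "testProgram")`. The price of this design is error reporting: because these tokens carry no source positions, positions in error messages only count tokens, unless the lexer is changed to emit each token together with its `SourcePos`.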

  -- ryan

On Tue, Nov 10, 2009 at 11:23 AM, Fernando Henrique Sanches
<fernandohsanches at gmail.com> wrote:
> Hello.
> I'm currently implementing a MicroJava compiler for a college assignment
> (the source language was specified; the implementation language was our
> choice).
> I've successfully implemented the lexer using Parsec. It has the type String
> -> Parser [MJVal], where MJVal is the type of all the possible tokens.
> However, I don't know how to implement the parser, or at least how to do so
> while keeping it separate from the lexer.
> For example, the MicroJava grammar specifies:
> Program = "program" ident {ConstDecl | VarDecl | ClassDecl}
>           "{" {MethodDecl} "}".
> The natural solution (for me) would be:
> program = do
>   string "program"
>   programName <- identifier
>   ...
> However, I can't do this because the file is already tokenized; what I have
> is something like:
> [Program_, identifier_ "testProgram", lBrace_, ...]
> for the example program:
> program testProgram {
> ...
> How should I implement the parser separated from the lexer? That is, how
> should I parse Tokens instead of Strings in the "Haskell way"?
> Fernando Henrique Sanches
