[Haskell-cafe] Parsec - separating Parsing from Lexing
Ryan Ingram
ryani.spam at gmail.com
Mon Nov 30 15:21:34 EST 2009
Hi Fernando. I tried this approach for a toy language as well, and I
was unhappy with it.
I have found that, with Parsec, it is best to *not* split your parsing
completely into "tokenization" and "parsing" phases, but rather to
interleave them. Instead of
> tokenize :: Parser [MJVal]
make
> token :: Parser MJVal
Then use something like the following:
> tokenSatisfies :: (MJVal -> Bool) -> Parser MJVal
> tokenSatisfies f = try $ do
>     t <- token
>     if f t then return t else fail "No parse"
> program :: Parser Program
> program = do
>     tokenSatisfies (== Program_)
>     programName <- identifier
>     -- etc.
>     return $ Program programName ...
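To make the fragments above concrete, here is a minimal self-contained sketch of the interleaved style. The MJVal constructors, the one-field Program type, and the lexing rules are assumptions made up for illustration, not the real MicroJava definitions. The single-token lexer is called mjToken here only because Parsec already exports a function named token.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical token type; the real MJVal has many more constructors.
data MJVal = Program_ | Identifier_ String | LBrace_ | RBrace_
  deriving (Show, Eq)

-- Hypothetical, truncated AST: only the program name is kept.
newtype Program = Program String
  deriving (Show, Eq)

-- Lex exactly one token, then skip trailing whitespace.
mjToken :: Parser MJVal
mjToken = (symbolTok <|> wordTok) <* spaces
  where
    symbolTok = (LBrace_ <$ char '{') <|> (RBrace_ <$ char '}')
    wordTok = do
      w <- (:) <$> letter <*> many alphaNum
      return (if w == "program" then Program_ else Identifier_ w)

-- Lex one token and keep it only if it passes the test;
-- "try" rewinds the input when the test fails.
tokenSatisfies :: (MJVal -> Bool) -> Parser MJVal
tokenSatisfies f = try $ do
  t <- mjToken
  if f t then return t else fail "No parse"

-- Lex one token and succeed only if it is an identifier.
identifier :: Parser String
identifier = try $ do
  t <- mjToken
  case t of
    Identifier_ name -> return name
    _                -> fail "expected identifier"

-- Simplified grammar: "program" ident "{" "}"
program :: Parser Program
program = do
  spaces
  _ <- tokenSatisfies (== Program_)
  programName <- identifier
  _ <- tokenSatisfies (== LBrace_)
  _ <- tokenSatisfies (== RBrace_)
  eof
  return (Program programName)
```

Running parse program "" "program testProgram { }" should give Right (Program "testProgram"); the declarations and methods of the full grammar are left out.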
Using "try" is somewhat inefficient: every failing alternative of a
"choice" re-lexes the same token. Still, I've found this to be the
simplest solution, and parsing time rarely dominates your running
time.
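If the re-lexing ever does matter, one alternative is to keep the separate lexer and run Parsec over the token list itself. The following is only a sketch under assumptions: the MJVal constructors and Program type are made up for illustration, the helper names matchToken and tok are hypothetical, and source positions are faked by bumping the column once per token. It relies on Parsec's tokenPrim, which consumes input only when the token is accepted, so single-token parsers need no "try".

```haskell
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)

-- Hypothetical token type; the real MJVal has many more constructors.
data MJVal = Program_ | Identifier_ String | LBrace_ | RBrace_
  deriving (Show, Eq)

-- Hypothetical, truncated AST: only the program name is kept.
newtype Program = Program String
  deriving (Show, Eq)

-- A parser whose input stream is a list of tokens, not a String.
type MJParser = Parsec [MJVal] ()

-- Accept the next token if the function maps it to Just a result.
-- Real positions are not tracked: each token advances the column by one.
matchToken :: (MJVal -> Maybe a) -> MJParser a
matchToken = tokenPrim show advance
  where
    advance pos _ _ = incSourceColumn pos 1

-- Match one specific token.
tok :: MJVal -> MJParser MJVal
tok expected = matchToken (\t -> if t == expected then Just t else Nothing)

-- Match an identifier token and return its name.
identifier :: MJParser String
identifier = matchToken getName
  where
    getName (Identifier_ name) = Just name
    getName _                  = Nothing

-- Simplified grammar: "program" ident "{" "}"
program :: MJParser Program
program = do
  _ <- tok Program_
  programName <- identifier
  _ <- tok LBrace_
  _ <- tok RBrace_
  eof
  return (Program programName)
```

Here parse program "" [Program_, Identifier_ "testProgram", LBrace_, RBrace_] should give Right (Program "testProgram"). The cost is that error messages point at token indices unless the lexer also records a position for each token.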
-- ryan
On Tue, Nov 10, 2009 at 11:23 AM, Fernando Henrique Sanches
<fernandohsanches at gmail.com> wrote:
> Hello.
> I'm currently implementing a MicroJava compiler for a college assignment
> (the language to implement was specified; the implementation language was
> our choice).
> I've successfully implemented the lexer using Parsec. It has the type String
> -> Parser [MJVal], where MJVal are all the possible tokens.
> However, I don't know how to implement the parser, or at least how to do it
> keeping it distinguished from the lexer.
> For example, the MicroJava grammar specifies:
> Program = "program" ident {ConstDecl | VarDecl | ClassDecl}
> "{" {MethodDecl} "}".
> The natural solution (for me) would be:
> program = do
>     string "program"
>     programName <- identifier
>     ...
> However, I can't do this because the file is already tokenized; what I have
> is something like:
> [Program_, identifier_ "testProgram", lBrace_, ...]
> for the example program:
> program testProgram {
> ...
> How should I implement the parser separated from the lexer? That is, how
> should I parse Tokens instead of Strings in the "Haskell way"?
> Fernando Henrique Sanches