[Haskell-cafe] Parselib sample

Tue Jun 1 14:45:24 EDT 2010

Hi Kashyap

There's a C parser for Happy (LR) - I long while ago I converted this
to Frown (also LR) - both Happy and Frown are parser generators that
take a grammar description and generate a Haskell module that
implements the Parser. Personally I prefer Frown, I find the input
syntax a bit nicer than Happy but unfortunately Frown has disappeared
since its author Ralf Hinze moved to Oxford.

As I said, the C grammar is generally presented in LR form (e.g. in
the K&R book, in Harbison & Steele and in the ISO spec). Other C like
languages e.g. the GLSL shading language tend to be in LR form as well
- I've an untested Happy parser for GLSL in my source repository.
Using an LR grammar with Parsec or ParseLib would be fatal - your
program would go into an infinite loop on all but the most trivial
input. You would have to rewrite the grammar into LL form - Parsec
(and ParseLib) have some helpers to do this, and quite often you can
write a Parsec grammar quite close to the LR one by using the - many,
many1 - combinators for repetition, plus sometimes the - chain -
combinators. Unfortunately its still quite a bit of effort to convert
to LL - it took me about a days work to write a Parsec parser for
GHC-core from the LR Happy grammar and GHC-core is much simpler than
GLSL or C. Also the Parsec parser was much slower.

Parsec is very powerful - it can handle large lookahead - other LL
parsers tend to be LL(1) where 1 is one token lookahead. But while
Parsec has the power, lookahead is costly and large lookahead will
make the parser slow and use a lot of memory. Parsec can handle
context-sensitive parsing - this is often very useful when writing
parsers where you are writing a parser for an ad-hoc file format
rather than a grammar. You can often do some tricks with
context-sensitive parsing that would take a lot of extra work to do
with context-free parsing. Finally, its commonly used as a
scanner-less parser - where you can use all the features to parse at
the character level rather than delegate to a separate scanner.

I've thought about writing an article for The Monad Reader - moving
from Graham Hutton's parsers to Parsec, if there's any interest I'll
look into doing it. For the time being the main difference is probably
that Parsec parsers generally use the TokenParser module for some of
the combinators that Parselib provides directly. Using the TokenParser
module requires a couple of tricks - import it qualified and re-define
unqualified versions of the parsers you need - int, symbol, identifier
- etc.

Best wishes

Stephen