Fwd: [Haskell-cafe] Parsing in Practice

Wed Oct 19 02:05:24 EDT 2005

[Sending it again to haskell-cafe]

On 10/18/05, Tom Hawkins <tom at confluent.org> wrote:
> I am writing a parser for a big, ugly, standard language and I need to
> decide between using either Happy or Parsec.

I wrote a parser for a big, ugly, non-standard language - Transact-SQL
from MSSQL.

> I currently have a priliminary LALR(1) grammar, so a port to Happy would
> be relatively easy.

In such situation I would try Happy first, but with some cautiousness.

I also started with LALR(1) parsing in ocamlyacc, but I didn't manage
to cover the whole language with this approach. The more productions
I added the more conflicts I had. Of course I tried to remove the conflicts,
but often when I added new productions I had to repeat the work. It is
possible that my understanding of LALR parsing was insufficient, but the
problem is that the set of LALR grammars is not closed under composition
(as I've read in some paper on GLR parsing).

Do you know that Happy supports GLR parsing?

> But, I'm wondering if life would be easier if I chose Parsec's combinator
> parsing instead.

>From my experience it was indeed quite nice. However, there were some
parts of the grammer that I didn't know how to handle without using "try",
which could result in worse efficiency and error reporting.

> It's error reporting seems to be top notch

So long as you can avoid using "try" for non-terminals.

> and it's "optional", "many", and "sepBy1" combinators are very elegant.

Indeed they are, as they correspond nicely to E in EBNF ;-)

> However, I have a few concerns with Parsec.  First is performance; what
> factor of slow-down should I expect?

I only have a Parsec parser, so I can't compare.  I dimly remember that
my parser had an average efficiency, ie. it could parse about 100kB
per second on P3 850.

There is a parser-combinator version of a parser for Haskell which you
could compare to the one from GHC.

> Second is bug prevention.  I don't have much experience writing LL(n)
> grammars, so how easy is it to introduce bugs in a Parsec grammar?

I always feared that this would be a problem, but it wasn't. I used a
sample of about 1MB of application code to test, and I didn't get to
using QuickCheck.

> Even though I hate debugging LALR(1) parsing ambiguities,
> it prevents problems.

Yes, when you fight the conflicts, you get static guarantees in return.
I wonder how it is with GLR?

Best regards
Tomasz