Advance notice that I'd like to make Cabal depend on parsec

Thu Mar 14 18:02:39 CET 2013

On Thu, 2013-03-14 at 09:39 -0700, Jason Dagit wrote:

> > Why did I choose parsec? Practicality dictates that I can only use
> > things in the core libraries, and the nearest thing we have to that is
> > the parser lib that is in the HP. I tried to use happy but I could not
> > construct a grammar/lexer combo to handle the layout (also, happy is not
> > exactly known for its great error messages).
> >
> 
> Failed attempt aside for a moment, I think you should reconsider happy. Can
> you learn how to do layout from reading the GHC source?

Yes I looked at it, though Haskell's layout is a bit different.

> The happy documentation that explains how to attach a monad (you could
> use it to communicate between alex and happy for layout info) is a bit
> misleading but I have examples I can share with you.

Yes, that's what I was doing. I've used happy with monadic lexers with
feedback between the lexer and parser before, e.g. when I wrote the C
parser now used in language-c.

> I haven't specifically tackled the layout problem but I could try to
> make a parser if it would help.
> 
> One major benefit of using happy is that the productions of the grammar can
> be analyzed for shift/shift and shift/reduce conflicts.

Right, I know and that's great. For example there's no way I could have
extended the C89 grammar I started with to cover C99 and GNU C
extensions without the aid of that analysis.

In this case I could not for the life of me construct a grammar that
didn't have conflicts.

Now it's plausible that now that I have worked out a grammar using
parsec that I could have another go with happy and make it work, though
I'd have to do the layout rather differently from how I do it with
parsec. I was so pleased to finally have something work, I didn't feel
like going back and trying it with happy again.

I'd be happy to show you the code I've got with parsec and you can have
a go with happy.

> The equivalent analysis doesn't appear to be possible in parsec. In
> theory, applicative parsers should allow for this but my understanding
> is that parsec does not have this feature for its applicative subset.

Right, it doesn't.

> Other benefits are: a) GHC can certainly use parers generated by it, b) the
> generated code uses common dependencies, c) it's fast, d) it's expressive.

Yes, I started with happy for all those reasons. The speed isn't a
problem here. I'm using a fast lexer using alex and profiling indicates
that still almost all the time is spent in the lexer and very little in
the parser. (And that's after I submitted a patch to alex which gets us
a 30% perf improvement.)

About dependencies. So if we got it working with happy, there is still
the issue that we need to parse the individual fields. The way
the .cabal (and other files like ghc-pkg input files) work is that we
parse the outline and then use individual parsers on the fields. For the
latter we use a type class with a parser and pretty printer. That
approach using a type class more or less requires that we use a parser
combinator approach, rather than a monolithic happy style parser. And
it's actually the field parsers that are a large part of the problem:
they give us no error messages and their performance is atrocious
(that's where we get the massive memory blowups). I think happy just
isn't suitable there, so I'd want to use parsec (or any other decent
combinator lib) for that part anyway.

> What is it about happy parser errors that you don't like? Do you know
> examples where parsec does a better job?

Happy doesn't really give parser errors at all as such. It tells you
where it failed and you can poke at the token stream and do what you
like. It doesn't tell you what production you're in, what set of tokens
it was expecting, nothing. Parsec tells us what tokens it was expecting
and it tells us what production it was in and it has code to take that
info and generate reasonable error messages from it (which I've extended
to include the line in question and a visual position indicator).

The reason ghc's parser error messages are so bad is exactly because
happy doesn't really give us anything to work with. See frown for an
example of how we can do better, while still using an LALR(1) approach.

Duncan