Parser.y rewrite with parser combinators

Iavor Diatchki iavor.diatchki at gmail.com
Mon Oct 8 22:00:14 UTC 2018


Hello,

my experience with complex parsers written using parsing combinators is
that they tend to be quite difficult to modify and have any kind of
assurance that now you haven't broken something else.   While reduce-reduce
errors are indeed annoying, you at least know that there is some sort of
issue you need to address.   With a combinator based parser, you basically
have to do program verification, or more pragmatically, have a large test
suite and hope that you tested everything.

I think the current approach is actually quite reasonable:  use the Happy
grammar to parse out the basic structure of the program, without trying to
be completely precise, and then have a separate pass that validates and
fixes up the results.   While this has the draw-back of some constructors
being in the "wrong place", there are also benefits---namely we can report
better parse errors.  Also, with the new rewrite of HsSyn, we should be
able to mark such constructors as only usable in the parsing pass, so later
passes wouldn't need to worry about them.

-Iavor











On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs <
ghc-devs at haskell.org> wrote:

> I'm no parser expert, but a parser that was easier to understand and
> modify, and was as fast as the current one, sounds good to me.
>
> It's a tricky area though; e.g. the layout rule.
>
> Worth talking to Simon Marlow.
>
> Simon
>
>
>
> | -----Original Message-----
> | From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Vladislav
> | Zavialov
> | Sent: 08 October 2018 21:44
> | To: ghc-devs <ghc-devs at haskell.org>
> | Subject: Parser.y rewrite with parser combinators
> |
> | Hello devs,
> |
> | Recently I've been working on a couple of parsing-related issues in
> | GHC. I implemented support for the -XStarIsType extension, fixed
> | parsing of the (!) type operator (Trac #15457), allowed using type
> | operators in existential contexts (Trac #15675).
> |
> | Doing these tasks required way more engineering effort than I expected
> | from my prior experience working with parsers due to complexities of
> | GHC's grammar.
> |
> | In the last couple of days, I've been working on Trac #1087 - a
> | 12-year old parsing bug. After trying out a couple of approaches, to
> | my dismay I realised that fixing it properly (including support for
> | bang patterns inside infix constructors, etc) would require a complete
> | rewrite of expression and pattern parsing logic.
> |
> | Worse yet, most of the work would be done outside Parser.y in Haskell
> | code instead, in RdrHsSyn helpers. When I try to keep the logic inside
> | Parser.y, in every design direction I face reduce/reduce conflicts.
> |
> | The reduce/reduce conflicts are the worst.
> |
> | Perhaps it is finally time to admit that Haskell syntax with all of
> | the GHC cannot fit into a LALR grammar?
> |
> | The extent of hacks that we have right now just to make parsing
> | possible is astonishing. For instance, we have dedicated constructors
> | in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
> | EViewPat, ELazyPat). That is, one of the fundamental types (that the
> | type checker operates on) has four additional constructors that exist
> | due to a reduce/reduce conflict between patterns and expressions.
> |
> | I propose a complete rewrite of GHC's parser to use recursive descent
> | parsing with monadic parser combinators.
> |
> | 1. We could significantly simplify parsing logic by doing things in a
> | more direct manner. For instance, instead of parsing patterns as
> | expressions and then post-processing them, we could have separate
> | parsing logic for patterns and expressions.
> |
> | 2. We could fix long-standing parsing bugs like Trac #1087 because
> | recursive descent offers more expressive power than LALR (at the cost
> | of support for left recursion, which is not much of a loss in
> | practice).
> |
> | 3. New extensions to the grammar would require less engineering effort.
> |
> | Of course, this rewrite is a huge chunk of work, so before I start, I
> | would like to know that this work would be accepted if done well.
> | Here's what I want to achieve:
> |
> | * Comparable performance. The new parser could turn out to be faster
> | because it would do less post-processing, but it could be slower
> | because 'happy' does all the sorts of low-level optimisations. I will
> | consider this project a success only if comparable performance is
> | achieved.
> |
> | * Correctness. The new parser should handle 100% of the syntactic
> | constructs that the current parser can handle.
> |
> | * Error messages. The new error messages should be of equal or better
> | quality than existing ones.
> |
> | * Elegance. The new parser should bring simplification to other parts
> | of the compiler (e.g. removal of pattern constructors from HsExpr).
> | And one of the design principles is to represent things by dedicated
> | data structures, in contrast to the current state of affairs where we
> | represent patterns as expressions, data constructor declarations as
> | types (before D5180), etc.
> |
> | Let me know if this is a good/acceptable direction of travel. That's
> | definitely something that I personally would like to see happen.
> |
> | All the best,
> | - Vladislav
> | _______________________________________________
> | ghc-devs mailing list
> | ghc-devs at haskell.org
> |
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.hask
> | ell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-
> | devs&data=02%7C01%7Csimonpj%40microsoft.com
> %7C19181de5c6bd493ab07a08d
> | 62d5edbe0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636746282778542095
> | &sdata=lFRt1t4k3BuuRdyOqwOYTZcLPRB%2BtFJwfFtgMpNLxW0%3D&reserved=
> | 0
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20181008/a590596e/attachment.html>


More information about the ghc-devs mailing list