Parser.y rewrite with parser combinators
Simon Peyton Jones
simonpj at microsoft.com
Mon Oct 8 22:04:31 UTC 2018
use the Happy grammar to parse out the basic structure of the program, without trying to be completely precise, and then have a separate pass that validates and fixes up the results.
Incidentally, we use this for operator fixity and precedence, where the fixup is done in the renamer, and for that purpose it works really well.
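To illustrate the kind of fix-up meant here, below is a minimal sketch, not GHC's actual renamer code. It assumes the parser hands over a flat chain of operands and operators with no precedence applied, and that a later pass re-associates the chain once fixities are known. The names `Chain`, `Fixities` and `reassociate` are hypothetical and exist only for this example.

```haskell
import qualified Data.Map.Strict as Map

data Assoc = AssocL | AssocR deriving (Eq, Show)

-- Hypothetical fixity table: operator name -> (precedence, associativity).
type Fixities = Map.Map String (Int, Assoc)

data Expr = Lit Int | BinOp String Expr Expr deriving (Eq, Show)

-- What the (assumed) parser produces: e0 op1 e1 op2 e2 ..., unresolved.
data Chain = Chain Expr [(String, Expr)]

-- Re-associate the flat chain by precedence climbing.
reassociate :: Fixities -> Chain -> Expr
reassociate fixities (Chain e0 rest0) = fst (go 0 e0 rest0)
  where
    -- Operators without a declaration default to infixl 9, as in Haskell.
    fixityOf op = Map.findWithDefault (9, AssocL) op fixities

    go :: Int -> Expr -> [(String, Expr)] -> (Expr, [(String, Expr)])
    go minPrec lhs ((op, rhs) : rest)
      | prec >= minPrec =
          let -- A left-associative operator must not capture itself on the right.
              nextMin = if assoc == AssocL then prec + 1 else prec
              (rhs', rest') = go nextMin rhs rest
          in go minPrec (BinOp op lhs rhs') rest'
      where
        (prec, assoc) = fixityOf op
    -- Either the chain is exhausted or the next operator binds too loosely.
    go _ lhs rest = (lhs, rest)
```

With `+` declared as `infixl 6` and `*` as `infixl 7`, the chain for `1 + 2 * 3` comes out with the multiplication nested under the addition, even though the parser itself knew nothing about either operator.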
From: Iavor Diatchki <iavor.diatchki at gmail.com>
Sent: 08 October 2018 23:00
To: Simon Peyton Jones <simonpj at microsoft.com>
Cc: vlad.z.4096 at gmail.com; ghc-devs <ghc-devs at haskell.org>
Subject: Re: Parser.y rewrite with parser combinators
my experience with complex parsers written using parsing combinators is that they tend to be quite difficult to modify while keeping any kind of assurance that you haven't broken something else. While reduce-reduce errors are indeed annoying, you at least know that there is some sort of issue you need to address. With a combinator-based parser, you basically have to do program verification, or more pragmatically, have a large test suite and hope that you tested everything.
I think the current approach is actually quite reasonable: use the Happy grammar to parse out the basic structure of the program, without trying to be completely precise, and then have a separate pass that validates and fixes up the results. While this has the draw-back of some constructors being in the "wrong place", there are also benefits---namely we can report better parse errors. Also, with the new rewrite of HsSyn, we should be able to mark such constructors as only usable in the parsing pass, so later passes wouldn't need to worry about them.
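As a rough sketch of the "parse loosely, then validate" idea (not the actual RdrHsSyn code), assume the grammar produces a single expression-like tree that also covers patterns, and a separate pass checks it once the context is known. The constructor names `EWildPat` and `EAsPat` come from the discussion in this thread; the simplified `Expr` and `Pat` types and the functions `checkPattern` and `checkExpr` are assumptions made for illustration.

```haskell
-- Loose tree produced by the (assumed) grammar: patterns and expressions share it.
data Expr
  = EVar String
  | ECon String [Expr]   -- constructor applied to arguments
  | EApp Expr Expr       -- general function application
  | EWildPat             -- `_`, only meaningful in a pattern
  | EAsPat String Expr   -- `x@p`, only meaningful in a pattern
  deriving (Eq, Show)

-- Precise pattern type used after validation.
data Pat
  = PVar String
  | PCon String [Pat]
  | PWild
  | PAs String Pat
  deriving (Eq, Show)

-- Called once the context shows the tree must be a pattern.
checkPattern :: Expr -> Either String Pat
checkPattern (EVar x)      = Right (PVar x)
checkPattern (ECon c args) = PCon c <$> traverse checkPattern args
checkPattern EWildPat      = Right PWild
checkPattern (EAsPat x e)  = PAs x <$> checkPattern e
checkPattern e@(EApp _ _)  = Left ("not a valid pattern: " ++ show e)

-- Called once the context shows the tree must be an expression.
checkExpr :: Expr -> Either String Expr
checkExpr EWildPat      = Left "wildcard `_` is only allowed in a pattern"
checkExpr (EAsPat _ _)  = Left "as-pattern is only allowed in a pattern"
checkExpr (ECon c args) = ECon c <$> traverse checkExpr args
checkExpr (EApp f x)    = EApp <$> checkExpr f <*> checkExpr x
checkExpr e@(EVar _)    = Right e
```

Because the validation pass knows which construct it was expecting, it can report a targeted message instead of a generic parse error, which is the benefit mentioned above.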
On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs <ghc-devs at haskell.org> wrote:
I'm no parser expert, but a parser that was easier to understand and modify, and was as fast as the current one, sounds good to me.
It's a tricky area though; e.g. the layout rule.
Worth talking to Simon Marlow.
| -----Original Message-----
| From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Vladislav
| Sent: 08 October 2018 21:44
| To: ghc-devs <ghc-devs at haskell.org>
| Subject: Parser.y rewrite with parser combinators
| Hello devs,
| Recently I've been working on a couple of parsing-related issues in
| GHC. I implemented support for the -XStarIsType extension, fixed
| parsing of the (!) type operator (Trac #15457), allowed using type
| operators in existential contexts (Trac #15675).
| Doing these tasks required way more engineering effort than I expected
| from my prior experience working with parsers due to complexities of
| GHC's grammar.
| In the last couple of days, I've been working on Trac #1087 - a
| 12-year old parsing bug. After trying out a couple of approaches, to
| my dismay I realised that fixing it properly (including support for
| bang patterns inside infix constructors, etc) would require a complete
| rewrite of expression and pattern parsing logic.
| Worse yet, most of the work would be done outside Parser.y in Haskell
| code instead, in RdrHsSyn helpers. When I try to keep the logic inside
| Parser.y, in every design direction I face reduce/reduce conflicts.
| The reduce/reduce conflicts are the worst.
| Perhaps it is finally time to admit that Haskell syntax with all of
| the GHC extensions cannot fit into a LALR grammar?
| The extent of hacks that we have right now just to make parsing
| possible is astonishing. For instance, we have dedicated constructors
| in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
| EViewPat, ELazyPat). That is, one of the fundamental types (that the
| type checker operates on) has four additional constructors that exist
| due to a reduce/reduce conflict between patterns and expressions.
| I propose a complete rewrite of GHC's parser to use recursive descent
| parsing with monadic parser combinators.
| 1. We could significantly simplify parsing logic by doing things in a
| more direct manner. For instance, instead of parsing patterns as
| expressions and then post-processing them, we could have separate
| parsing logic for patterns and expressions.
| 2. We could fix long-standing parsing bugs like Trac #1087 because
| recursive descent offers more expressive power than LALR (at the cost
| of support for left recursion, which is not much of a loss in practice).
| 3. New extensions to the grammar would require less engineering effort.
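A minimal sketch of what points 1 and 2 could look like, using a tiny hand-rolled combinator type rather than any particular library or the proposed GHC design. Patterns and expressions get separate parsers and separate result types, and a `chainl1`-style loop stands in for left recursion. All names here (`Parser`, `pat`, `expr`, `chainl1`, the `Pat` and `Expr` types) are assumptions for illustration, and the toy grammar covers only variables, numbers, `_`, `x@p` and subtraction.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isAlpha, isDigit, isSpace)

-- A parser consumes a prefix of the input or fails with Nothing.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Monad Parser where
  Parser p >>= f = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> runParser (f a) rest

instance Alternative Parser where
  empty = Parser (const Nothing)
  -- Try the left parser; fall back to the right one on failure.
  Parser p <|> Parser q = Parser $ \s -> maybe (q s) Just (p s)

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c : cs) | ok c -> Just (c, cs)
  _               -> Nothing

-- Run a parser, then skip trailing whitespace.
token :: Parser a -> Parser a
token p = p <* many (satisfy isSpace)

symbol :: Char -> Parser Char
symbol = token . satisfy . (==)

name :: Parser String
name = token (some (satisfy isAlpha))

-- Dedicated result types: no pattern constructors inside Expr.
data Pat  = PVar String | PWild | PAs String Pat deriving (Eq, Show)
data Expr = EVar String | ENum Int | ESub Expr Expr deriving (Eq, Show)

-- Patterns have their own parser: `_`, `x@p`, or a variable.
pat :: Parser Pat
pat = wild <|> asOrVar
  where
    wild = PWild <$ symbol '_'
    asOrVar = do
      x <- name
      (PAs x <$> (symbol '@' *> pat)) <|> pure (PVar x)

-- Left-associative operator chain written as a loop instead of left recursion.
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= loop
  where
    loop acc = (do f <- op; y <- p; loop (f acc y)) <|> pure acc

atom :: Parser Expr
atom = (ENum . read <$> token (some (satisfy isDigit))) <|> (EVar <$> name)

expr :: Parser Expr
expr = atom `chainl1` (ESub <$ symbol '-')
```

A rule such as `expr : expr '-' atom` cannot be transcribed directly, since `expr` would call itself before consuming any input; the loop in `chainl1` is the usual workaround and still yields a left-nested tree.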
| Of course, this rewrite is a huge chunk of work, so before I start, I
| would like to know that this work would be accepted if done well.
| Here's what I want to achieve:
| * Comparable performance. The new parser could turn out to be faster
| because it would do less post-processing, but it could be slower
| because 'happy' does all the sorts of low-level optimisations. I will
| consider this project a success only if comparable performance is achieved.
| * Correctness. The new parser should handle 100% of the syntactic
| constructs that the current parser can handle.
| * Error messages. The new error messages should be of equal or better
| quality than existing ones.
| * Elegance. The new parser should bring simplification to other parts
| of the compiler (e.g. removal of pattern constructors from HsExpr).
| And one of the design principles is to represent things by dedicated
| data structures, in contrast to the current state of affairs where we
| represent patterns as expressions, data constructor declarations as
| types (before D5180), etc.
| Let me know if this is a good/acceptable direction of travel. That's
| definitely something that I personally would like to see happen.
| All the best,
| - Vladislav
| ghc-devs mailing list
| ghc-devs at haskell.org