Parser.y rewrite with parser combinators
Simon Peyton Jones
simonpj at microsoft.com
Mon Oct 8 21:26:31 UTC 2018
I'm no parser expert, but a parser that was easier to understand and modify, and was as fast as the current one, sounds good to me.
It's a tricky area though; e.g. the layout rule.
Worth talking to Simon Marlow.
Simon
| -----Original Message-----
| From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Vladislav
| Zavialov
| Sent: 08 October 2018 21:44
| To: ghc-devs <ghc-devs at haskell.org>
| Subject: Parser.y rewrite with parser combinators
|
| Hello devs,
|
| Recently I've been working on a couple of parsing-related issues in
| GHC. I implemented support for the -XStarIsType extension, fixed
| parsing of the (!) type operator (Trac #15457), allowed using type
| operators in existential contexts (Trac #15675).
|
| Doing these tasks required way more engineering effort than I expected
| from my prior experience working with parsers due to complexities of
| GHC's grammar.
|
| In the last couple of days, I've been working on Trac #1087 - a
| 12-year old parsing bug. After trying out a couple of approaches, to
| my dismay I realised that fixing it properly (including support for
| bang patterns inside infix constructors, etc) would require a complete
| rewrite of expression and pattern parsing logic.
|
| Worse yet, most of the work would be done outside Parser.y in Haskell
| code instead, in RdrHsSyn helpers. When I try to keep the logic inside
| Parser.y, in every design direction I face reduce/reduce conflicts.
|
| The reduce/reduce conflicts are the worst.
|
| Perhaps it is finally time to admit that Haskell syntax with all of
| the GHC cannot fit into a LALR grammar?
|
| The extent of hacks that we have right now just to make parsing
| possible is astonishing. For instance, we have dedicated constructors
| in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
| EViewPat, ELazyPat). That is, one of the fundamental types (that the
| type checker operates on) has four additional constructors that exist
| due to a reduce/reduce conflict between patterns and expressions.
|
| I propose a complete rewrite of GHC's parser to use recursive descent
| parsing with monadic parser combinators.
|
| 1. We could significantly simplify parsing logic by doing things in a
| more direct manner. For instance, instead of parsing patterns as
| expressions and then post-processing them, we could have separate
| parsing logic for patterns and expressions.
|
| 2. We could fix long-standing parsing bugs like Trac #1087 because
| recursive descent offers more expressive power than LALR (at the cost
| of support for left recursion, which is not much of a loss in
| practice).
|
| 3. New extensions to the grammar would require less engineering effort.
|
| Of course, this rewrite is a huge chunk of work, so before I start, I
| would like to know that this work would be accepted if done well.
| Here's what I want to achieve:
|
| * Comparable performance. The new parser could turn out to be faster
| because it would do less post-processing, but it could be slower
| because 'happy' does all the sorts of low-level optimisations. I will
| consider this project a success only if comparable performance is
| achieved.
|
| * Correctness. The new parser should handle 100% of the syntactic
| constructs that the current parser can handle.
|
| * Error messages. The new error messages should be of equal or better
| quality than existing ones.
|
| * Elegance. The new parser should bring simplification to other parts
| of the compiler (e.g. removal of pattern constructors from HsExpr).
| And one of the design principles is to represent things by dedicated
| data structures, in contrast to the current state of affairs where we
| represent patterns as expressions, data constructor declarations as
| types (before D5180), etc.
|
| Let me know if this is a good/acceptable direction of travel. That's
| definitely something that I personally would like to see happen.
|
| All the best,
| - Vladislav
| _______________________________________________
| ghc-devs mailing list
| ghc-devs at haskell.org
| https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmail.hask
| ell.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fghc-
| devs&data=02%7C01%7Csimonpj%40microsoft.com%7C19181de5c6bd493ab07a08d
| 62d5edbe0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636746282778542095
| &sdata=lFRt1t4k3BuuRdyOqwOYTZcLPRB%2BtFJwfFtgMpNLxW0%3D&reserved=
| 0
More information about the ghc-devs
mailing list