Parser.y rewrite with parser combinators

Vanessa McHale vanessa.mchale at iohk.io
Tue Oct 9 03:46:54 UTC 2018


I actually have some experience in this department, having authored both
madlang <http://hackage.haskell.org/package/madlang> and language-ats
<http://hackage.haskell.org/package/language-ats>. Parsers using
combinators alone are more brittle than parsers using Happy, at least
for human-facing languages.

I'm also not sure what exactly parser combinators provide over Happy:
Happy has parameterized productions (macros, in effect) that can emulate
e.g. between and many. Drawing up a minimal example might be a good idea.
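
Something along these lines, perhaps - a rough sketch only, with made-up
module, token, and rule names (none of this is taken from GHC's actual
Parser.y), showing parameterized productions standing in for 'many' and
'between':

{
-- Illustrative only: a tiny standalone grammar, not GHC's Parser.y.
module MiniParser (parseIdents, Token(..)) where
}

%name parseIdents idents
%tokentype { Token }
%error { parseError }

%token
  ident { TIdent $$ }
  '('   { TOpen }
  ')'   { TClose }

%%

-- many(p): zero or more occurrences of p, built left-recursively
many_rev(p)
  : {- empty -}       { [] }
  | many_rev(p) p     { $2 : $1 }

many(p)
  : many_rev(p)       { reverse $1 }

-- parens(p): p between '(' and ')', i.e. 'between'
parens(p)
  : '(' p ')'         { $2 }

one_ident :: { String }
  : ident             { $1 }

ident_list :: { [String] }
  : many(one_ident)   { $1 }

idents :: { [String] }
  : parens(ident_list)  { $1 }

{
parseError :: [Token] -> a
parseError _ = error "parse error"

data Token = TIdent String | TOpen | TClose
}

Given a token list, parseIdents [TOpen, TIdent "x", TIdent "y", TClose]
should come back as ["x","y"], and the same templates extend to
sepBy-style rules by adding a separator parameter.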


On 10/08/2018 05:24 PM, Vladislav Zavialov wrote:
>> complex parsers written using parsing combinators is that they tend to be quite difficult to modify and still have any kind of assurance that you haven't broken something else
> That's true regardless of implementation technique; parsers are rather
> delicate. A LALR-based parser generator does provide more information
> when it detects shift/reduce and reduce/reduce conflicts, but I never
> found this information useful. It was always quite the opposite of
> being helpful - an indication that a LALR parser could not handle my
> change and I had to look for workarounds.
>
>> With a combinator-based parser, you basically have to do program verification or, more pragmatically, have a large test suite and hope that you tested everything.
> Even when doing modifications to Parser.y, I relied mainly on the test
> suite to determine whether my change was right (and the test suite
> always caught many issues). A large test suite is the best approach
> both for 'happy'-based parsers and for combinator-based parsers.
>
>> and then have a separate pass that validates and fixes up the results
> That's where my concern lies. This separate pass is confusing (at
> least for me - it's not the most straightforward thing to parse
> something incorrectly and then restructure it), it is hard to modify,
> and it does not handle corner cases (e.g. #1087).
>
> Since we have all this Haskell code that does a significant portion of
> processing, why even bother with having a LALR pass before it?
>
>> namely we can report better parse errors
> I don't think that's true; we can achieve better error messages with
> recursive descent.
>
>> Also, with the new rewrite of HsSyn, we should be able to mark such constructors as only usable in the parsing pass, so later passes wouldn't need to worry about them.
> Not completely true: GhcPs-parametrized structures are the final
> output of parsing, so at least the renamer will face these
> constructors.
>
> On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki <iavor.diatchki at gmail.com> wrote:
>> Hello,
>>
>> my experience with complex parsers written using parsing combinators is that they tend to be quite difficult to modify and still have any kind of assurance that you haven't broken something else. While reduce/reduce conflicts are indeed annoying, you at least know that there is some sort of issue you need to address. With a combinator-based parser, you basically have to do program verification or, more pragmatically, have a large test suite and hope that you tested everything.
>>
>> I think the current approach is actually quite reasonable: use the Happy grammar to parse out the basic structure of the program, without trying to be completely precise, and then have a separate pass that validates and fixes up the results. While this has the drawback of some constructors being in the "wrong place", there are also benefits - namely, we can report better parse errors. Also, with the new rewrite of HsSyn, we should be able to mark such constructors as only usable in the parsing pass, so later passes wouldn't need to worry about them.
>>
>> -Iavor
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs <ghc-devs at haskell.org> wrote:
>>> I'm no parser expert, but a parser that was easier to understand and modify, and was as fast as the current one, sounds good to me.
>>>
>>> It's a tricky area though; e.g. the layout rule.
>>>
>>> Worth talking to Simon Marlow.
>>>
>>> Simon
>>>
>>>
>>>
>>> | -----Original Message-----
>>> | From: ghc-devs <ghc-devs-bounces at haskell.org> On Behalf Of Vladislav
>>> | Zavialov
>>> | Sent: 08 October 2018 21:44
>>> | To: ghc-devs <ghc-devs at haskell.org>
>>> | Subject: Parser.y rewrite with parser combinators
>>> |
>>> | Hello devs,
>>> |
>>> | Recently I've been working on a couple of parsing-related issues in
>>> | GHC. I implemented support for the -XStarIsType extension, fixed
>>> | parsing of the (!) type operator (Trac #15457), allowed using type
>>> | operators in existential contexts (Trac #15675).
>>> |
>>> | Doing these tasks required way more engineering effort than I expected
>>> | from my prior experience working with parsers, due to the complexities
>>> | of GHC's grammar.
>>> |
>>> | In the last couple of days, I've been working on Trac #1087 - a
>>> | 12-year-old parsing bug. After trying out a couple of approaches, to
>>> | my dismay I realised that fixing it properly (including support for
>>> | bang patterns inside infix constructors, etc.) would require a complete
>>> | rewrite of expression and pattern parsing logic.
>>> |
>>> | Worse yet, most of the work would be done not in Parser.y but in the
>>> | RdrHsSyn helpers, in Haskell code. When I try to keep the logic inside
>>> | Parser.y, I face reduce/reduce conflicts in every design direction.
>>> |
>>> | The reduce/reduce conflicts are the worst.
>>> |
>>> | Perhaps it is finally time to admit that Haskell syntax with all of
>>> | the GHC extensions cannot fit into a LALR grammar?
>>> |
>>> | The extent of hacks that we have right now just to make parsing
>>> | possible is astonishing. For instance, we have dedicated constructors
>>> | in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
>>> | EViewPat, ELazyPat). That is, one of the fundamental types (that the
>>> | type checker operates on) has four additional constructors that exist
>>> | due to a reduce/reduce conflict between patterns and expressions.
>>> |
>>> | I propose a complete rewrite of GHC's parser to use recursive descent
>>> | parsing with monadic parser combinators.
>>> |
>>> | 1. We could significantly simplify parsing logic by doing things in a
>>> | more direct manner. For instance, instead of parsing patterns as
>>> | expressions and then post-processing them, we could have separate
>>> | parsing logic for patterns and expressions.
>>> |
>>> | 2. We could fix long-standing parsing bugs like Trac #1087 because
>>> | recursive descent offers more expressive power than LALR (at the cost
>>> | of support for left recursion, which is not much of a loss in
>>> | practice).
>>> |
>>> | 3. New extensions to the grammar would require less engineering effort.
>>> |
>>> | Of course, this rewrite is a huge chunk of work, so before I start, I
>>> | would like to know that this work would be accepted if done well.
>>> | Here's what I want to achieve:
>>> |
>>> | * Comparable performance. The new parser could turn out to be faster
>>> | because it would do less post-processing, but it could be slower
>>> | because 'happy' does all sorts of low-level optimisations. I will
>>> | consider this project a success only if comparable performance is
>>> | achieved.
>>> |
>>> | * Correctness. The new parser should handle 100% of the syntactic
>>> | constructs that the current parser can handle.
>>> |
>>> | * Error messages. The new error messages should be at least as good as
>>> | the existing ones.
>>> |
>>> | * Elegance. The new parser should bring simplification to other parts
>>> | of the compiler (e.g. removal of pattern constructors from HsExpr).
>>> | And one of the design principles is to represent things by dedicated
>>> | data structures, in contrast to the current state of affairs where we
>>> | represent patterns as expressions, data constructor declarations as
>>> | types (before D5180), etc.
>>> |
>>> | Let me know if this is a good/acceptable direction of travel. That's
>>> | definitely something that I personally would like to see happen.
>>> |
>>> | All the best,
>>> | - Vladislav
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs