<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>I actually have some experience in this department, having
authored both <a
href="http://hackage.haskell.org/package/madlang">madlang</a>
and <a
href="http://hackage.haskell.org/package/language-ats">language-ats</a>.
Parsers using combinators alone are more brittle than parsers
using Happy, at least for human-facing languages.<br>
<br>
I'm also not sure what exactly parser combinators provide over
Happy. Happy has parameterized productions ("macros") that can emulate
e.g. <tt>between</tt> and <tt>many</tt>.
Drawing up a minimal example might be a good idea. <br>
</p>
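<p>As a starting point for such a minimal example, here is a sketch
of Happy's parameterized productions ("macros") emulating <tt>many</tt>
and <tt>between</tt>. It is a grammar fragment only: the rule names are
illustrative rather than taken from GHC's Parser.y, and it assumes
<tt>ident</tt>, <tt>'('</tt> and <tt>')'</tt> are declared tokens.<br>
</p>
<pre wrap="">-- Emulate 'many': accumulate with left recursion (cheap for an LALR
-- parser), then reverse once at the end.
many_rev(p) : {- empty -}                  { [] }
            | many_rev(p) p                { $2 : $1 }

many(p)     : many_rev(p)                  { reverse $1 }

-- Emulate 'between': keep only the middle component.
between(open, p, close) : open p close     { $2 }

-- Example use: a parenthesised run of identifiers.
idents       : many(ident)                 { $1 }
paren_idents : between('(', idents, ')')   { $1 }
</pre>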
<br>
<div class="moz-cite-prefix">On 10/08/2018 05:24 PM, Vladislav
Zavialov wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAHZ0NLoxuqN=VSque4ev+cPTJuWb8mJFtPk6vqE1aHr1K5WWkg@mail.gmail.com">
<blockquote type="cite">
<pre wrap="">complex parsers written using parser combinators is that they tend to be quite difficult to modify with any kind of assurance that you haven't broken something else
</pre>
</blockquote>
<pre wrap="">
That's true regardless of implementation technique; parsers are rather
delicate. An LALR-based parser generator does provide more information
when it detects shift/reduce and reduce/reduce conflicts, but I never
found this information useful. It was always quite the opposite of
helpful - an indication that an LALR parser could not handle my
change and that I had to look for workarounds.
</pre>
<blockquote type="cite">
<pre wrap="">With a combinator-based parser, you basically have to do program verification or, more pragmatically, have a large test suite and hope that you tested everything.
</pre>
</blockquote>
<pre wrap="">
Even when modifying Parser.y, I relied mainly on the test
suite to determine whether my change was right (and the test suite
always caught many issues). A large test suite is the best approach
both for 'happy'-based parsers and for combinator-based parsers.
</pre>
<blockquote type="cite">
<pre wrap="">and then have a separate pass that validates and fixes up the results
</pre>
</blockquote>
<pre wrap="">
That's where my concern lies. This separate pass is confusing (at
least for me - it's not the most straightforward thing to parse
something incorrectly and then restructure it), it is hard to modify,
and it does not handle corner cases (e.g. Trac #1087).
Since we have all this Haskell code that does a significant portion of
the processing, why even bother with an LALR pass before it?
</pre>
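<pre wrap="">
To make the shape of that separate pass concrete, here is a minimal,
self-contained sketch of the idea (the toy types and the exprToPat
function are illustrative stand-ins, not GHC's actual RdrHsSyn code):

import Data.Char (isUpper)

-- The grammar parses everything in pattern position as an expression;
-- a later pass reinterprets the result as a pattern and rejects forms
-- that are not valid patterns.
data Expr
  = EVar String        -- x or Just
  | EApp Expr Expr     -- f x
  | EWild              -- _   (only meaningful once reinterpreted)
  | EAs String Expr    -- x@e (likewise)

data Pat
  = PVar String
  | PCon String [Pat]
  | PWild
  | PAs String Pat

exprToPat :: Expr -> Either String Pat
exprToPat (EVar v)
  | isUpper (head v)   = Right (PCon v [])
  | otherwise          = Right (PVar v)
exprToPat EWild        = Right PWild
exprToPat (EAs v e)    = fmap (PAs v) (exprToPat e)
exprToPat e@(EApp _ _) =
  case collect e [] of
    (EVar c, args) | isUpper (head c)
      -> fmap (PCon c) (traverse exprToPat args)
    _ -> Left "parse error: expression is not a valid pattern"
  where
    collect (EApp f x) acc = collect f (x : acc)
    collect f acc          = (f, acc)
</pre>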
<blockquote type="cite">
<pre wrap="">namely we can report better parse errors
</pre>
</blockquote>
<pre wrap="">
I don't think that's true; we can achieve better error messages with
recursive descent.
</pre>
<blockquote type="cite">
<pre wrap="">Also, with the new rewrite of HsSyn, we should be able to mark such constructors as only usable in the parsing pass, so later passes wouldn't need to worry about them.
</pre>
</blockquote>
<pre wrap="">
Not completely true: GhcPs-parametrized structures are the final
output of parsing, so at least the renamer will still face these
constructors.
On Tue, Oct 9, 2018 at 1:00 AM Iavor Diatchki <a class="moz-txt-link-rfc2396E" href="mailto:iavor.diatchki@gmail.com"><iavor.diatchki@gmail.com></a> wrote:
</pre>
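<pre wrap="">
For reference, a minimal sketch of the mechanism being discussed here
(the "Trees That Grow" style of extension fields): the parse-only
constructor gets a field whose type is uninhabited after parsing, so a
later pass can dismiss it. The names (Phase, Exp, XWild) are made up
for illustration and are not GHC's actual HsSyn:

{-# LANGUAGE DataKinds, TypeFamilies #-}

import Data.Void (Void, absurd)

data Phase = Parsed | Renamed

-- The extension field's type is chosen per compiler phase by a type
-- family.
data Exp (p :: Phase)
  = Var  (XVar p) String
  | Wild (XWild p)               -- the "EWildPat"-style constructor

type family XVar  (p :: Phase)
type family XWild (p :: Phase)

type instance XVar  'Parsed  = ()
type instance XVar  'Renamed = ()
type instance XWild 'Parsed  = ()      -- constructible while parsing
type instance XWild 'Renamed = Void    -- impossible after renaming

-- A later pass can dismiss the parse-only case without a panic:
renamedVarName :: Exp 'Renamed -> String
renamedVarName (Var _ s) = s
renamedVarName (Wild v)  = absurd v
</pre>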
<blockquote type="cite">
<pre wrap="">
Hello,
my experience with complex parsers written using parser combinators is that they tend to be quite difficult to modify with any kind of assurance that you haven't broken something else. While reduce/reduce conflicts are indeed annoying, you at least know that there is some sort of issue you need to address. With a combinator-based parser, you basically have to do program verification or, more pragmatically, have a large test suite and hope that you tested everything.
I think the current approach is actually quite reasonable: use the Happy grammar to parse out the basic structure of the program, without trying to be completely precise, and then have a separate pass that validates and fixes up the results. While this has the drawback of some constructors being in the "wrong place", there are also benefits---namely we can report better parse errors. Also, with the new rewrite of HsSyn, we should be able to mark such constructors as only usable in the parsing pass, so later passes wouldn't need to worry about them.
-Iavor
On Mon, Oct 8, 2018 at 2:26 PM Simon Peyton Jones via ghc-devs <a class="moz-txt-link-rfc2396E" href="mailto:ghc-devs@haskell.org"><ghc-devs@haskell.org></a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">
I'm no parser expert, but a parser that was easier to understand and modify, and was as fast as the current one, sounds good to me.
It's a tricky area though; e.g. the layout rule.
Worth talking to Simon Marlow.
Simon
| -----Original Message-----
| From: ghc-devs <a class="moz-txt-link-rfc2396E" href="mailto:ghc-devs-bounces@haskell.org"><ghc-devs-bounces@haskell.org></a> On Behalf Of Vladislav
| Zavialov
| Sent: 08 October 2018 21:44
| To: ghc-devs <a class="moz-txt-link-rfc2396E" href="mailto:ghc-devs@haskell.org"><ghc-devs@haskell.org></a>
| Subject: Parser.y rewrite with parser combinators
|
| Hello devs,
|
| Recently I've been working on a couple of parsing-related issues in
| GHC. I implemented support for the -XStarIsType extension, fixed
| parsing of the (!) type operator (Trac #15457), and allowed using type
| operators in existential contexts (Trac #15675).
|
| Doing these tasks required way more engineering effort than I expected
| from my prior experience working with parsers, due to the complexities
| of GHC's grammar.
|
| In the last couple of days, I've been working on Trac #1087 - a
| 12-year old parsing bug. After trying out a couple of approaches, to
| my dismay I realised that fixing it properly (including support for
| bang patterns inside infix constructors, etc) would require a complete
| rewrite of expression and pattern parsing logic.
|
| Worse yet, most of the work would be done outside Parser.y, in Haskell
| code in the RdrHsSyn helpers. When I try to keep the logic inside
| Parser.y, in every design direction I face reduce/reduce conflicts.
|
| The reduce/reduce conflicts are the worst.
|
| Perhaps it is finally time to admit that Haskell syntax with all of
| the GHC extensions cannot fit into an LALR grammar?
|
| The extent of hacks that we have right now just to make parsing
| possible is astonishing. For instance, we have dedicated constructors
| in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
| EViewPat, ELazyPat). That is, one of the fundamental types (that the
| type checker operates on) has four additional constructors that exist
| due to a reduce/reduce conflict between patterns and expressions.
|
| I propose a complete rewrite of GHC's parser to use recursive descent
| parsing with monadic parser combinators.
|
| 1. We could significantly simplify parsing logic by doing things in a
| more direct manner. For instance, instead of parsing patterns as
| expressions and then post-processing them, we could have separate
| parsing logic for patterns and expressions.
|
| 2. We could fix long-standing parsing bugs like Trac #1087 because
| recursive descent offers more expressive power than LALR (at the cost
| of giving up left recursion, which is not much of a loss in
| practice).
|
| 3. New extensions to the grammar would require less engineering effort.
|
| Of course, this rewrite is a huge chunk of work, so before I start, I
| would like to know that this work would be accepted if done well.
| Here's what I want to achieve:
|
| * Comparable performance. The new parser could turn out to be faster
| because it would do less post-processing, but it could be slower
| because 'happy' does all sorts of low-level optimisations. I will
| consider this project a success only if comparable performance is
| achieved.
|
| * Correctness. The new parser should handle 100% of the syntactic
| constructs that the current parser can handle.
|
| * Error messages. The new error messages should be of equal or better
| quality than the existing ones.
|
| * Elegance. The new parser should bring simplification to other parts
| of the compiler (e.g. removal of pattern constructors from HsExpr).
| And one of the design principles is to represent things by dedicated
| data structures, in contrast to the current state of affairs where we
| represent patterns as expressions, data constructor declarations as
| types (before D5180), etc.
|
| Let me know if this is a good/acceptable direction of travel. That's
| definitely something that I personally would like to see happen.
|
| All the best,
| - Vladislav
</pre>
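<pre wrap="">
For a sense of what point 1 of the proposal quoted above could look
like in practice, here is a minimal sketch using the parsec library,
with separate parsers for a toy expression and pattern grammar. The
Expr/Pat types and the grammar they accept are illustrative only and
are not a proposed GHC design:

import Text.Parsec
import Text.Parsec.String (Parser)

data Expr = EVar String | ECon String | EApp Expr Expr  deriving Show
data Pat  = PVar String | PWild | PCon String [Pat]     deriving Show

-- Consume trailing whitespace after a token.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

varName, conName :: Parser String
varName = lexeme ((:) <$> lower <*> many alphaNum)
conName = lexeme ((:) <$> upper <*> many alphaNum)

parens :: Parser a -> Parser a
parens = between (lexeme (char '(')) (lexeme (char ')'))

-- Expressions: variables, constructors, parentheses, and left-nested
-- application.
pExpr :: Parser Expr
pExpr = foldl1 EApp <$> many1 pExprAtom
  where
    pExprAtom = EVar <$> varName
            <|> ECon <$> conName
            <|> parens pExpr

-- Patterns are parsed directly, so pattern-only syntax such as
-- wildcards never needs an expression-side constructor.
pPat :: Parser Pat
pPat = PCon <$> conName <*> many pPatAtom
   <|> pPatAtom

pPatAtom :: Parser Pat
pPatAtom = PWild <$  lexeme (char '_')
       <|> PVar <$> varName
       <|> flip PCon [] <$> conName
       <|> parens pPat

-- e.g. parse pPat  "" "Just (Pair _ x)"
--      parse pExpr "" "f (g x) y"
</pre>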
</blockquote>
</blockquote>
<pre wrap="">_______________________________________________
ghc-devs mailing list
<a class="moz-txt-link-abbreviated" href="mailto:ghc-devs@haskell.org">ghc-devs@haskell.org</a>
<a class="moz-txt-link-freetext" href="http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs">http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs</a>
</pre>
</blockquote>
</body>
</html>