Parser.y rewrite with parser combinators
Vladislav Zavialov
vlad.z.4096 at gmail.com
Mon Oct 8 20:44:06 UTC 2018
Hello devs,
Recently I've been working on a couple of parsing-related issues in
GHC. I implemented support for the -XStarIsType extension, fixed
parsing of the (!) type operator (Trac #15457), allowed using type
operators in existential contexts (Trac #15675).
Doing these tasks required way more engineering effort than I expected
from my prior experience working with parsers due to complexities of
GHC's grammar.
In the last couple of days, I've been working on Trac #1087 - a
12-year old parsing bug. After trying out a couple of approaches, to
my dismay I realised that fixing it properly (including support for
bang patterns inside infix constructors, etc) would require a complete
rewrite of expression and pattern parsing logic.
Worse yet, most of the work would be done outside Parser.y in Haskell
code instead, in RdrHsSyn helpers. When I try to keep the logic inside
Parser.y, in every design direction I face reduce/reduce conflicts.
The reduce/reduce conflicts are the worst.
Perhaps it is finally time to admit that Haskell syntax with all of
the GHC cannot fit into a LALR grammar?
The extent of hacks that we have right now just to make parsing
possible is astonishing. For instance, we have dedicated constructors
in HsExpr to make parsing patterns possible (EWildPat, EAsPat,
EViewPat, ELazyPat). That is, one of the fundamental types (that the
type checker operates on) has four additional constructors that exist
due to a reduce/reduce conflict between patterns and expressions.
I propose a complete rewrite of GHC's parser to use recursive descent
parsing with monadic parser combinators.
1. We could significantly simplify parsing logic by doing things in a
more direct manner. For instance, instead of parsing patterns as
expressions and then post-processing them, we could have separate
parsing logic for patterns and expressions.
2. We could fix long-standing parsing bugs like Trac #1087 because
recursive descent offers more expressive power than LALR (at the cost
of support for left recursion, which is not much of a loss in
practice).
3. New extensions to the grammar would require less engineering effort.
Of course, this rewrite is a huge chunk of work, so before I start, I
would like to know that this work would be accepted if done well.
Here's what I want to achieve:
* Comparable performance. The new parser could turn out to be faster
because it would do less post-processing, but it could be slower
because 'happy' does all the sorts of low-level optimisations. I will
consider this project a success only if comparable performance is
achieved.
* Correctness. The new parser should handle 100% of the syntactic
constructs that the current parser can handle.
* Error messages. The new error messages should be of equal or better
quality than existing ones.
* Elegance. The new parser should bring simplification to other parts
of the compiler (e.g. removal of pattern constructors from HsExpr).
And one of the design principles is to represent things by dedicated
data structures, in contrast to the current state of affairs where we
represent patterns as expressions, data constructor declarations as
types (before D5180), etc.
Let me know if this is a good/acceptable direction of travel. That's
definitely something that I personally would like to see happen.
All the best,
- Vladislav
More information about the ghc-devs
mailing list