[Haskell-cafe] Haskell grammar ambiguity
Li-yao Xia
lysxia at gmail.com
Wed Jun 16 06:45:22 UTC 2021
Hello Café,
With the BlockArguments proposal[1], the formal grammar of Haskell has
become much nicer, with all of the "boring" expression constructs (i.e.,
other than function or operator application) under one non-terminal:
https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0090-block-arguments.rst#proposed-change-specification
Its one flaw, inherited from Haskell2010, is ambiguity. I wonder whether
that ambiguity is necessary.
Currently, a meta-rule fixes that flaw:
> The grammar is ambiguous regarding the extent of lambda abstractions,
let expressions, and conditionals. The ambiguity is resolved by the
meta-rule that each of these constructs extends as far to the right as
possible.
That is quite informal, so interpreting it takes a bit of thinking.
1. Does the rule make sense? That seems quite likely, but at the same
time it seems to depend on some good behavior of the grammar. Like,
aren't there grammars where such an "extend far to the right" meta-rule
would itself be ambiguous? What's the key criterion for that meta-rule
to make sense?
2. How to implement this rule? It might be obvious, with shift-reduce
parsing ("shift always"). It also seems feasible with Earley parsing
(the only other technique I am familiar with). But, either way, you have
to dig into the internals of the chosen parsing method to resolve
ambiguities.
Instead, what if the grammar were unambiguous to start with? I share an
attempt at the bottom of this email. Which brings me to my main two
questions:
3. Is this grammar unambiguous?
4. Does it match the meta-rule?
I also can't help but think about how it might look like in a
hypothetical language report.
5. What are the pros and cons of using a ambiguous or unambiguous
grammar for a programming language standard?
A plausible starting point is thus:
- An ambiguous grammar may be more readable (since there are strictly
more choices available), especially if you care more about the AST than
the concrete syntax.
- An unambiguous grammar is easier to implement: just throw it into an
Earley parser.
This unambiguous grammar can somewhat easily be "flattened" into the
ambiguous one from the report. That challenges the assumption that
ambiguous grammars are generally more readable. In fact, the current
ambiguous grammar already exposes other concrete syntax concerns (via
the distinction between exp, infixexp, fexp, and aexp). In contrast,
turning an ambiguous grammar into an unambiguous one takes a bit of
effort (when it's possible at all). So if there are ambiguities worth
keeping, is there a clear threshold?
One other possible issue is that non-ambiguity is a fragile property, so
extending the grammar requires extra care. Furthermore, even if the
updated grammar is unambiguous, it might not be obvious that it still
encodes the intent behind the meta-rule. That meta-rule may still be
worth clarifying as a requirement for language extensions at least.
Haskell syntax can be pretty, but the process behind it is a wee bit
complicated.
Regards,
Li-yao
---
# Unambiguous grammar for expressions (some rules omitted for brevity)
# Compared to Haskell2010+BlockArguments, we move the rules for if,
lambdas and lets into their own non-terminal rexp, and add four extra
rules in exp, ensuring that rexp occurs only to the right end of
function/operator applications.
# Possibly simplifiable more
# * and + represent repetition.
# Expressions.
exp ->
iexp "::" type
| iexp
# EXTRA RULES with rexp
| rexp
| iexp qop rexp
| iexp qop frexp
| frexp
# EXTRA SYMBOL rexp
# Expressions with right-recursive syntax.
rexp ->
"if" exp "then" exp "else" exp
| "\\" var "->" exp
| "let" decls "in" exp
# EXTRA (AUXILIARY) SYMBOL frexp
# Function application with a trailing block argument.
frexp -> aexp aexp* rexp
# Infix expression (was infixexp). OLD
iexp -> fexp (qop fexp)*
# Optional function applications. OLD
fexp -> aexp
| aexp aexp+
# Atomic expressions. OLD with removed if/lambda/let
aexp -> qvar
| "[" "]"
| "(" ")"
| qcon
| "(" exp ")"
| "do" "{" stmts "}"
| "case" exp "of" "{" alts "}"
More information about the Haskell-Cafe
mailing list