[Haskell-cafe] Haskell grammar ambiguity

Wed Jun 16 06:45:22 UTC 2021

Hello Café,

With the BlockArguments proposal[1], the formal grammar of Haskell has 
become much nicer, with all of the "boring" expression constructs (i.e., 
other than function or operator application) under one non-terminal:

https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0090-block-arguments.rst#proposed-change-specification

Its one flaw, inherited from Haskell2010, is ambiguity. I wonder whether 
that ambiguity is necessary.

Currently, a meta-rule fixes that flaw:

 > The grammar is ambiguous regarding the extent of lambda abstractions, 
let expressions, and conditionals. The ambiguity is resolved by the 
meta-rule that each of these constructs extends as far to the right as 
possible.

That is quite informal, so interpreting it takes a bit of thinking.

1. Does the rule make sense? That seems quite likely, but at the same 
time it seems to depend on some good behavior of the grammar. Like, 
aren't there grammars where such an "extend far to the right" meta-rule 
would itself be ambiguous? What's the key criterion for that meta-rule 
to make sense?

2. How to implement this rule? It might be obvious, with shift-reduce 
parsing ("shift always"). It also seems feasible with Earley parsing 
(the only other technique I am familiar with). But, either way, you have 
to dig into the internals of the chosen parsing method to resolve 
ambiguities.

Instead, what if the grammar were unambiguous to start with? I share an 
attempt at the bottom of this email. Which brings me to my main two 
questions:

3. Is this grammar unambiguous?

4. Does it match the meta-rule?

I also can't help but think about how it might look like in a 
hypothetical language report.

5. What are the pros and cons of using a ambiguous or unambiguous 
grammar for a programming language standard?

A plausible starting point is thus:

- An ambiguous grammar may be more readable (since there are strictly 
more choices available), especially if you care more about the AST than 
the concrete syntax.
- An unambiguous grammar is easier to implement: just throw it into an 
Earley parser.

This unambiguous grammar can somewhat easily be "flattened" into the 
ambiguous one from the report. That challenges the assumption that 
ambiguous grammars are generally more readable. In fact, the current 
ambiguous grammar already exposes other concrete syntax concerns (via 
the distinction between exp, infixexp, fexp, and aexp). In contrast, 
turning an ambiguous grammar into an unambiguous one takes a bit of 
effort (when it's possible at all). So if there are ambiguities worth 
keeping, is there a clear threshold?

One other possible issue is that non-ambiguity is a fragile property, so 
extending the grammar requires extra care. Furthermore, even if the 
updated grammar is unambiguous, it might not be obvious that it still 
encodes the intent behind the meta-rule. That meta-rule may still be 
worth clarifying as a requirement for language extensions at least.

Haskell syntax can be pretty, but the process behind it is a wee bit 
complicated.

Regards,
Li-yao

---

# Unambiguous grammar for expressions (some rules omitted for brevity)

# Compared to Haskell2010+BlockArguments, we move the rules for if, 
lambdas and lets into their own non-terminal rexp, and add four extra 
rules in exp, ensuring that rexp occurs only to the right end of 
function/operator applications.

# Possibly simplifiable more

# * and + represent repetition.

# Expressions.
exp ->
     iexp "::" type
   | iexp
   # EXTRA RULES with rexp
   | rexp
   | iexp qop rexp
   | iexp qop frexp
   | frexp

# EXTRA SYMBOL rexp
# Expressions with right-recursive syntax.
rexp ->
     "if" exp "then" exp "else" exp
   | "\\" var "->" exp
   | "let" decls "in" exp

# EXTRA (AUXILIARY) SYMBOL frexp
# Function application with a trailing block argument.
frexp -> aexp aexp* rexp

# Infix expression (was infixexp). OLD
iexp -> fexp (qop fexp)*

# Optional function applications. OLD
fexp -> aexp
   | aexp aexp+

# Atomic expressions. OLD with removed if/lambda/let
aexp -> qvar
   | "[" "]"
   | "(" ")"
   | qcon
   | "(" exp ")"
   | "do" "{" stmts "}"
   | "case" exp "of" "{" alts "}"