prefix operators

Thu Jul 8 04:45:41 EDT 2010

On Thu, Jul 08, 2010 at 07:09:29AM +0000, Simon Peyton-Jones wrote:
> (ie as infix operators) and I have to squizzle around to re-interpret them as prefix operators.  Not very cool.  Something unified would be a Good Thing.

So, after thinking about it some, I think there may be a somewhat
elegant solution. 

The other day I found myself writing a prolog parser in haskell, an
interesting thing about prolog is that it is a pure operator precedence
grammar[1]. Meaning that the entire grammar can be defined by a list of
symbols, their fixities and their priorities. An example of a definition
for a prolog-like language is 
http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/Builtin-Operators.html
Now, a really nice thing about operator precedence grammars is that they
admit a very simple linear parsing algorithm[2] and they are quite simple
to understand.

So, Why not utilize the nice properties of this style of grammar when
defining haskell, we already attempt to interpret infix operators in the
grammar BNF proper, but then go and refix them anyway in the fixity
fixups pass, probably with something very similar to an operator
precedence parser. so the idea is basically to get rid of the initial
dummy parsing of expressions altogether and parse expressions as a pure
operator precedence grammar in the fixups pass. This will allow seamless
handling of prefix operators and likely simplify the formal description
of the language to boot.

So, for the most part the grammar will look like it does now, except
when we get to expressions, patterns, and types, we just parse them
uniformly as a sequence of atomic nodes. These may be variables, but
also may be things like infix operators, or even an entire parenthesized
term or case expression. These can be recursive, a parenthesized
expression will itself contain a list of nodes. 

Now, we can simply define the fixity resolution pass as applying the
appropriate operator precedence grammar to each list of nodes, producing
expressions, types, or patterns. The really nice thing is that we are
under no obligation to use the same operator precedence grammar for each
context, we can always tell from the parsing context whether we are parsing
a type, expression, or pattern, so we just use the appropriate grammar,
for instance, we will augment the grammar with the prefix '~' in
patterns, and the prefix '!' (for strictness) in types. '!' can be
defined as prefix in patterns and infix in expressions simply by using a
slightly different precedence table when interpreting them. This also
makes very clear how user defined fixities are used, they are simply
appended to the precedence table in this pass. Turning on and off bang
patterns with a switch is also extremely easy, just omit that prefix
operator from the table when they are switched off. no need to mess with
the lexer or the parser.

I also suspect we can produce much better error messages with this
strategy.

        John

[1]  http://en.wikipedia.org/wiki/Operator-precedence_parser 
[2] http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm

-- 
John Meacham - ⑆repetae.net⑆john⑈ - http://notanumber.net/