Application letters at the Haskell workshop: suggestion
Sun, 16 Sep 2001 15:30:24 +0200
Marcin 'Qrczak' Kowalczyk wrote (on 16-09-01 09:30 +0000):
> Getting the right descriptions of what was expected or unexpected is not
> trivial. For example when there is no separate lexer, we rarely have
> anything besides raw characters as "unexpected". We have something
> more descriptive only if the grammar explicitly calls 'unexpected'
> after successfully parsing something. We really don't want to give
> a message like "expecting 'e', found 'i'" when the real cause is
> "expecting 'then', found 'thickness'".
A bit off-topic, but I have used combinator parsers in Haskell (not just Parsec)
where the lexical and syntactic parts were handled in the same grammar. That
experience led me to conclude that the traditional separation between the two,
à la Lex and Yacc, does have its merits, and for just this reason: by
stratifying the grammar you introduce an abstraction boundary which, I think,
agrees better with the way programmers, at least, have learned to reason about
syntax. (That boundary is also an ideal place to introduce cuts to prevent
backtracking.)
IMO, combining the two stages has two main advantages: you can use the same
formalism throughout and, in the case of Haskell-style parsers, you can
modularize the grammar into libraries. But viewing lexemes as non-terminals is
mostly a disadvantage.
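To make the error-message point concrete, here is a minimal sketch of the stratified approach. It assumes a toy `if ... then ... else ...` grammar over identifiers. The `Token` and `Expr` types, `lexer`, `parseIf`, and the message wording are all hypothetical. To keep the example dependent on `base` alone, the token list is threaded by hand rather than through Parsec. The lexer commits to the longest alphanumeric run, so the parser only ever sees whole lexemes and never backtracks into one.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- Stage 1 output: whole lexemes (hypothetical token type).
data Token = TKeyword String | TIdent String
  deriving (Eq, Show)

-- Hypothetical reserved words for the toy grammar.
keywords :: [String]
keywords = ["if", "then", "else"]

-- Stage 1: the lexer. It takes the longest alphanumeric run and commits to it,
-- so the later stage can never backtrack into the middle of a lexeme.
lexer :: String -> Either String [Token]
lexer [] = Right []
lexer s@(c : cs)
  | isSpace c = lexer cs
  | isAlpha c =
      let (word, rest) = span isAlphaNum s
          tok | word `elem` keywords = TKeyword word
              | otherwise            = TIdent word
      in (tok :) <$> lexer rest
  | otherwise = Left ("unexpected character " ++ show c)

-- Stage 2 output: a toy abstract syntax tree.
data Expr = If String String String
  deriving (Eq, Show)

-- Human-readable description of a token, used in error messages.
describe :: Token -> String
describe (TKeyword k) = "keyword '" ++ k ++ "'"
describe (TIdent x)   = "identifier '" ++ x ++ "'"

-- Consume one specific keyword, or report which lexeme was found instead.
expectKeyword :: String -> [Token] -> Either String [Token]
expectKeyword k (TKeyword k' : ts) | k == k' = Right ts
expectKeyword k (t : _) = Left ("expecting '" ++ k ++ "', found " ++ describe t)
expectKeyword k []      = Left ("expecting '" ++ k ++ "', found end of input")

-- Consume one identifier.
ident :: [Token] -> Either String (String, [Token])
ident (TIdent x : ts) = Right (x, ts)
ident (t : _) = Left ("expecting identifier, found " ++ describe t)
ident []      = Left "expecting identifier, found end of input"

-- Stage 2: parse "if <ident> then <ident> else <ident>" over tokens, not characters.
parseIf :: [Token] -> Either String Expr
parseIf ts0 = do
  ts1 <- expectKeyword "if" ts0
  (c, ts2) <- ident ts1
  ts3 <- expectKeyword "then" ts2
  (t, ts4) <- ident ts3
  ts5 <- expectKeyword "else" ts4
  (e, ts6) <- ident ts5
  case ts6 of
    [] -> Right (If c t e)
    (tok : _) -> Left ("expecting end of input, found " ++ describe tok)

-- Run the two stages in sequence.
parse :: String -> Either String Expr
parse src = lexer src >>= parseIf

main :: IO ()
main = do
  print (parse "if x then y else z")
  print (parse "if x thickness y else z")
```

With this split, `parse "if x thickness y else z"` fails with a message naming the identifier `thickness` where `then` was expected, instead of complaining about the character `i`.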
More generally, one might imagine stratifying a large grammar even further by
feeding the parser's output to another parser. Traditionally we do this to
handle context-sensitive conditions that Yacc-style parser technology cannot
express; for example, static analyzers and type checkers usually check
context-sensitive conditions. But if your second-stage parser emits abstract
syntax trees, maybe you could have a third-stage parser which emits
declaration blocks or modules.
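One way to sketch such a pipeline is to make the parser type polymorphic in its input element type, so the same combinators serve every stage. Everything in the block below is a hypothetical stand-in and not Parsec's API: the `Parser t a` type over lists of any element type `t`, `satisfy`, `many0`, `many1`, and the `Item` and `Module` types. It uses the combinators twice. The first use is over `Char` for a lexical rule. The second is over `Item`, standing in for second-stage output, to group a flat list of declarations into modules.

```haskell
import Control.Monad (ap)
import Data.Char (isAlpha)

-- A hypothetical parser type, polymorphic in the input element type t,
-- so the same combinators work on characters, tokens, or AST items.
newtype Parser t a = Parser { runParser :: [t] -> Maybe (a, [t]) }

instance Functor (Parser t) where
  fmap f (Parser p) = Parser $ \ts -> fmap (\(a, rest) -> (f a, rest)) (p ts)

instance Applicative (Parser t) where
  pure x = Parser $ \ts -> Just (x, ts)
  (<*>) = ap

instance Monad (Parser t) where
  Parser p >>= k = Parser $ \ts -> case p ts of
    Nothing -> Nothing
    Just (a, rest) -> runParser (k a) rest

-- Accept one input element if the test function maps it to a result.
satisfy :: (t -> Maybe a) -> Parser t a
satisfy f = Parser $ \ts -> case ts of
  (t : rest) | Just a <- f t -> Just (a, rest)
  _ -> Nothing

-- Zero or more repetitions; p must consume input whenever it succeeds.
many0 :: Parser t a -> Parser t [a]
many0 p = Parser go
  where
    go ts = case runParser p ts of
      Nothing -> Just ([], ts)
      Just (a, rest) -> fmap (\(xs, rest') -> (a : xs, rest')) (go rest)

-- One or more repetitions.
many1 :: Parser t a -> Parser t [a]
many1 p = do
  x <- p
  xs <- many0 p
  pure (x : xs)

-- Lexical stage: the combinators instantiated at t = Char.
word :: Parser Char String
word = many1 (satisfy (\c -> if isAlpha c then Just c else Nothing))

-- Hypothetical second-stage output: a flat list of top-level items.
data Item = ModuleHeader String | Decl String
  deriving (Eq, Show)

-- Third-stage output: items grouped into modules.
data Module = Module String [String]
  deriving (Eq, Show)

-- Third stage: the same combinators instantiated at t = Item.
moduleP :: Parser Item Module
moduleP = do
  name <- satisfy header
  decls <- many0 (satisfy decl)
  pure (Module name decls)
  where
    header (ModuleHeader n) = Just n
    header _ = Nothing
    decl (Decl d) = Just d
    decl _ = Nothing

-- Succeed only if every item is consumed.
groupModules :: [Item] -> Maybe [Module]
groupModules items = case runParser (many0 moduleP) items of
  Just (ms, []) -> Just ms
  _ -> Nothing

main :: IO ()
main = do
  print (runParser word "hello world")
  print (groupModules [ModuleHeader "A", Decl "f", Decl "g", ModuleHeader "B", Decl "h"])
```

Nothing in `moduleP` depends on how its items were produced, so each stage can be written and tested on its own. Error reporting is deliberately left out here: failure is just `Nothing`.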
Frank Atanassow, Information & Computing Sciences, Utrecht University
Padualaan 14, PO Box 80.089, 3508 TB Utrecht, Netherlands
Tel +31 (030) 253-3261 Fax +31 (030) 251-379