Question about Haskell lexer and parser.

Simon Marlow simonmar@microsoft.com
Wed, 28 May 2003 10:24:26 +0100


=20
> Question is really about layout rules.
> If the first lexeme of a module is not a "module" keyword, we insert
> {n}, where n is the indentation of first lexeme. Then we apply
> function L to the list of lexemes and stack of layouts:
>=20
>   L ({n}:lexemes) []
>=20
> One of first case definitions of L covers this situation:
>   L ({n}:ts) []
>      | n>0 =3D {:L ts [n] -- n>0 means that layout isn't explicit.
> It's obvious that after our first {n} comes usual lexeme, like Var.
> This situaiton is covered in the following L rule:
>=20
>   L (t:ts) (m:ms)
>      | m /=3D 0 =3D }:L (t:ts) ms -- Haskell Report says that we here
>                               -- should recognize parsing errors
>                               -- but how we can do that in lexer?

This rule only applies if there is a parse error on the next token (t).
The report explains what this means in "Note 5".

It does mean that layout processing can't be done entirely in the lexer,
the parser has to be involved too.  This is what makes parsing layout in
Haskell hard.  Take a look at some of the existing Haskell parsers to
see how it can be done: there are two in GHC's source tree (one in GHC
itself, another in libraries/haskell-src).  Hugs's parser is written in
C but uses similar techniques.  nhc98's parser uses parsing combinators
IIRC.

Cheers,
	Simon