[Haskell-cafe] Question on Lexing Haskell syntax

Oleg Grenrus oleg.grenrus at iki.fi
Wed Nov 1 00:27:57 UTC 2023


Yes, the "communication between lexer and parser" is exactly what GHC does.

Amelia has a nice post about it,
https://amelia.how/posts/parsing-layout.html, which made it click for me.

Note that you don't actually need to use alex and happy: you can use a
hand-written lexer and parsec (or alex and parsec, ...). The key insight
is to have a stateful lexer and to control it from the parser.
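
A minimal sketch of that idea, using only base and a hand-rolled
state-passing parser rather than parsec or happy. Everything here
(Tok, Raw, LexState, nextTok, closeBlock, parseExpr) is a hypothetical
name made up for illustration, the grammar only covers let/in with
variables and numbers, and it is not how GHC structures its lexer. The
point is closeBlock: when the parser wants the block to end but the
lexer did not produce a virtual close brace, the parser pops the
lexer's layout stack itself.

```haskell
data Tok = TLet | TIn | TEq | TVar String | TNum Int
         | VOpen | VSemi | VClose | TEOF
  deriving (Eq, Show)

-- A raw token with its column and whether it is the first token on its line.
data Raw = Raw { rawCol :: Int, rawFirst :: Bool, rawTok :: Tok }

data LexState = LexState
  { input    :: [Raw]
  , layouts  :: [Int]   -- columns of open implicit blocks, innermost first
  , wantOpen :: Bool    -- the previous token was `let`
  }

data Expr = Let [(String, Expr)] Expr | Var String | Num Int
  deriving (Eq, Show)

-- The lexer: inserts virtual tokens based on indentation alone.
nextTok :: LexState -> (Tok, LexState)
nextTok st = case input st of
  [] -> case layouts st of
          []       -> (TEOF, st)
          (_ : ls) -> (VClose, st { layouts = ls })
  (r : rs)
    | wantOpen st ->
        (VOpen, st { input = r { rawFirst = False } : rs
                   , layouts = rawCol r : layouts st, wantOpen = False })
    | rawFirst r, (l : _) <- layouts st, rawCol r == l ->
        (VSemi, st { input = r { rawFirst = False } : rs })
    | rawFirst r, (l : ls) <- layouts st, rawCol r < l ->
        (VClose, st { layouts = ls })
    | otherwise ->
        (rawTok r, st { input = rs, wantOpen = rawTok r == TLet })

type P a = LexState -> Maybe (a, LexState)

expect :: Tok -> P ()
expect t st = case nextTok st of
  (t', st') | t == t' -> Just ((), st')
  _                   -> Nothing

-- Parse-error rule: if the lexer did not close the block, the parser
-- pops the lexer's layout context and carries on without consuming input.
closeBlock :: P ()
closeBlock st = case nextTok st of
  (VClose, st') -> Just ((), st')
  _ -> case layouts st of
         (_ : ls) -> Just ((), st { layouts = ls })
         []       -> Nothing

expr :: P Expr
expr st = case nextTok st of
  (TVar v, st1) -> Just (Var v, st1)
  (TNum n, st1) -> Just (Num n, st1)
  (TLet, st1) -> do
    (_, st2)    <- expect VOpen st1
    (bs, st3)   <- binds st2
    (_, st4)    <- closeBlock st3
    (_, st5)    <- expect TIn st4
    (body, st6) <- expr st5
    Just (Let bs body, st6)
  _ -> Nothing

binds :: P [(String, Expr)]
binds st = do
  (b, st1) <- bind st
  case nextTok st1 of
    (VSemi, st2) -> do
      (bs, st3) <- binds st2
      Just (b : bs, st3)
    _ -> Just ([b], st1)

bind :: P (String, Expr)
bind st = case nextTok st of
  (TVar v, st1) -> do
    (_, st2) <- expect TEq st1
    (e, st3) <- expr st2
    Just ((v, e), st3)
  _ -> Nothing

parseExpr :: [Raw] -> Maybe Expr
parseExpr raws = fst <$> expr (LexState raws [] False)
```

For `let x = 1 in x` on a single line, indentation never closes the
block, so it is closeBlock that pops the context when it sees `in`.
When `in` starts a new, less indented line, the lexer emits the virtual
close brace on its own and the parser does not have to intervene.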

The grammar in Amelia's post is a bit too strict: e.g. GHC accepts real
semis in virtual layout, and also empty "statements" in between, so we
can write

    \x y z -> case x of True -> y;;;;;; False -> z

but it's easy (at least in parsec) to adjust the parser grammar to
accept those.
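
A rough sketch of such an adjustment in parsec. It works directly on
characters and only covers the separator part of the grammar: Alt,
ident, alt, alts and parseAlts are hypothetical names, and a case
alternative is simplified to `identifier -> identifier`. A real parser
would run over the lexer's tokens and treat virtual and real semicolons
alike.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One case alternative, simplified to `pattern -> result`.
data Alt = Alt String String
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

ident :: Parser String
ident = lexeme (many1 letter)

semi :: Parser Char
semi = lexeme (char ';')

alt :: Parser Alt
alt = Alt <$> ident <* lexeme (string "->") <*> ident

-- Any number of semicolons may appear before, between, and after the
-- alternatives, so empty "statements" are accepted.
alts :: Parser [Alt]
alts = many semi *> (alt `sepEndBy` many1 semi)

parseAlts :: String -> Either ParseError [Alt]
parseAlts = parse (spaces *> alts <* eof) "<input>"
```

With this, `parseAlts "True -> y;;;;;; False -> z"` yields the two
alternatives, since the run of semicolons is consumed as a single
separator.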

Or, you can *approximate* the parse-error rule with the "alternative
layout rule" [1], which can be implemented as a pass between lexing and
parsing, or as a stateful lexer (but in this case the parser won't need
to adjust the lexer's state). GHC has an undocumented
AlternativeLayoutRule extension, so you can experiment with it to see
what it accepts (look for tests in the GHC source for examples). It
handles let-in bindings well enough.
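
To illustrate just the "pass between lexing and parsing" shape, here is
a drastically simplified sketch. It is not the alternative layout rule
from [1] and not what GHC implements: it ignores indentation, explicit
braces and every layout keyword other than `let`. Tok and insertBraces
are hypothetical names. It only shows that the let/in case can be
handled from the token stream alone, without feedback from the parser:
`let` opens an implicit block, and the next `in` closes the innermost
open one.

```haskell
data Tok = TLet | TIn | TOther String | VOpen | VClose
  deriving (Eq, Show)

-- Insert virtual braces around let-blocks using only the token stream.
-- The Int counts the implicit blocks that are currently open.
insertBraces :: [Tok] -> [Tok]
insertBraces = go 0
  where
    go :: Int -> [Tok] -> [Tok]
    go open (TLet : ts) = TLet : VOpen : go (open + 1) ts
    go open (TIn : ts)
      | open > 0 = VClose : TIn : go (open - 1) ts
    go open (t : ts) = t : go open ts
    go open [] = replicate open VClose   -- close whatever is still open
```

The parser then sees ordinary open and close tokens and needs no access
to any lexer state.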

[1] https://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html 

- Oleg

On 1.11.2023 0.31, Travis Athougies wrote:
> According to the Haskell report [1] (See Note 5), a virtual `}` token
> is inserted if parsing the next token would cause a parse error and the
> indentation stack is non-empty.
>
> I'm trying to lex and parse Haskell source and this sort of interplay
> (which requires two-way communication between lexer and parser) makes
> it very difficult to write a conformant implementation.
>
> I can't change the standard (obviously), but I'm wondering if this is
> actually what GHC (de facto the only Haskell compiler) does, or if it
> applies some other rule. If so, does anyone know the exact mechanism of
> its implementation?
>
> I've been programming Haskell for more than a decade, and while I have
> an intuitive understanding of the indentation rules, I would have
> assumed the source could be lexed without also having a parser. In
> particular, the note seems to imply that the main purpose of this is to
> properly lex `let`/`in` bindings. Perhaps there's an alternate
> equivalent rule?
>
> Curious to hear others' thoughts.
>
> Travis
>
> [1]
> https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.