[Haskell-cafe] Question on Lexing Haskell syntax

Brandon Allbery allbery.b at gmail.com
Wed Nov 1 00:32:12 UTC 2023


Feedback between lexer and parser isn't exactly unusual. Consider that
parsing a C `typedef` generally needs to feed back to the lexer so uses
will be recognized properly.

On Wed, Nov 1, 2023 at 12:28 AM Oleg Grenrus <oleg.grenrus at iki.fi> wrote:

> Yes, the "communication between lexer and parser" is exactly what GHC does.
>
> Amelia has a nice post about it
> https://amelia.how/posts/parsing-layout.html which made it click it for
> me.
>
> Note, you don't actually need to use alex and happy, you can do
> hand-written lexer and parsec (or alex and parsec, ...). The key insight is
> to have stateful lexer, and control it from the parser.
>
> Amelia's post grammar is a bit too strict, e.g. GHC accepts real semis in
> virtual layout, and also empty "statements" in between, so we can write
>
>    \x y z -> case x of True -> y;;;;;; False -> z
>
> but that's easy (at least in parsec) to adjust the parser grammar to
> accept those.
>
> Or, you can *approximate* the parse-error rule with "alternative layout
> rule" [1], which can be implemented as a pass between lexing and parsing,
> or as a stateful lexer (but in this case parser won't need to adjust
> lexer's state). GHC has an undocumented AlternativeLayoutRule extension, so
> you can experiment with it to see what it accepts (look for tests in GHC
> source for examples). It handles let-in bindings well enough.
>
> [1] https://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html
> which can be imp
>
> - Oleg
>
> On 1.11.2023 0.31, Travis Athougies wrote:
>
> According to the Haskell report [1] (See Note 5), a virtual `}` token
> is inserted if parsing the next token would cause a parse error and the
> indentation stack is non-empty.
>
> I'm trying to lex and parse Haskell source and this sort of interplay
> (which requires two-way communication between lexer and parser) makes
> it very difficult to write a conformant implementation.
>
> I can't change the standard (obviously), but I'm wondering if this is
> actually what GHC (de facto the only Haskell compiler) does, or if it
> applies some other rule. If so, does anyone know the exact mechanism of
> its implementation?
>
> I've been programming Haskell for more than a decade, and while I have
> an intuitive understanding of the indentation rules, I would have
> assumed the source could be lexed without also having a parser. In
> particular, the note seems to imply that the main purpose of this is to
> properly lex `let`/`in` bindings. Perhaps there's an alternate
> equivalent rule?
>
> Curious to hear other's thoughts.
>
> Travis
>
> [1]
>
> https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3
>
> > _______________________________________________ > Haskell-Cafe mailing
> list > To (un)subscribe, modify options or view archives go to: >
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe > Only
> members subscribed via the mailman list are allowed to post.
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.



-- 
brandon s allbery kf8nh
allbery.b at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20231101/142266e7/attachment.html>


More information about the Haskell-Cafe mailing list