<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
Yes, the "communication between lexer and parser" is exactly what
GHC does.<br>
<br>
Amelia has a nice post about it,
<a class="moz-txt-link-freetext" href="https://amelia.how/posts/parsing-layout.html">https://amelia.how/posts/parsing-layout.html</a>, which made it click
for me.<br>
<br>
Note that you don't actually need to use alex and happy: you can use a
hand-written lexer and parsec (or alex and parsec, ...). The key
insight is to have a stateful lexer and to control it from the parser.<br>
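To make the "stateful lexer controlled from the parser" idea concrete, here is a minimal sketch in plain Haskell (base only, no alex/happy/parsec). It assumes a toy grammar with only `let` blocks and whitespace-separated words, supports no explicit braces, and skips the report's check that a nested block must be indented further than the enclosing one. All names (`next`, `popLayout`, `unread`, `binds`, `parseExpr`, ...) are hypothetical. The layout stack lives in the parser's state. When `binds` meets a token it cannot accept, it pops the innermost implicit block itself, which is the parse-error(t) case of the report, handled here in just this one place.<br>

```haskell
-- Toy sketch (hypothetical names): a layout-aware lexer the parser can steer.
import Control.Monad (ap, unless)
import Data.Char (isSpace)

data Raw = Raw String Int Int  -- raw word: text, line, column (1-based)
-- Tokens seen by the parser; VOpen, VSemi and VClose are virtual layout tokens.
data Tok = TWord String | VOpen | VSemi | VClose | TEOF deriving (Eq, Show)
data Expr = Var String | Let [(String, Expr)] Expr deriving (Eq, Show)

-- Splits each line into whitespace-separated words, remembering their columns.
rawLex :: String -> [Raw]
rawLex src = [Raw w ln c | (ln, l) <- zip [1 ..] (lines src), (c, w) <- cols 1 l]
  where
    cols _ [] = []
    cols c s@(ch : rest)
      | isSpace ch = cols (c + 1) rest
      | otherwise = let (w, rest') = break isSpace s in (c, w) : cols (c + length w) rest'

-- The lexer's state lives inside the parser's state, so the parser can edit it.
data St = St
  { input    :: [Raw]  -- raw tokens not yet consumed
  , pending  :: [Tok]  -- tokens pushed back by the parser
  , layout   :: [Int]  -- columns of the enclosing implicit layout blocks
  , lastLine :: Int    -- line of the last raw token emitted
  , openNext :: Bool   -- True right after `let`: the next token opens a block
  }

newtype P a = P { runP :: St -> Either String (a, St) }
instance Functor P where fmap f m = m >>= pure . f
instance Applicative P where { pure a = P $ \s -> Right (a, s); (<*>) = ap }
instance Monad P where P g >>= k = P $ \s -> g s >>= \(a, s') -> runP (k a) s'

getS :: P St
getS = P $ \s -> Right (s, s)
putS :: St -> P ()
putS s = P $ \_ -> Right ((), s)
failP :: String -> P a
failP e = P $ \_ -> Left e

-- Lexer: yields the next token, running the layout algorithm on the fly.
next :: P Tok
next = do
  s <- getS
  case (pending s, input s, layout s) of
    (t : ts, _, _) -> putS s { pending = ts } >> pure t
    (_, [], _ : ms) -> putS s { layout = ms } >> pure VClose  -- EOF closes open blocks
    (_, [], []) -> pure TEOF
    (_, Raw w ln c : rest, stack)
      | openNext s ->  -- the first token after `let` fixes the block's column
          putS s { openNext = False, layout = c : stack, lastLine = ln } >> pure VOpen
      | ln > lastLine s, m : _ <- stack, c == m -> putS s { lastLine = ln } >> pure VSemi
      | ln > lastLine s, m : ms <- stack, c < m -> putS s { layout = ms } >> pure VClose
      | otherwise ->
          putS s { input = rest, lastLine = ln, openNext = w == "let" } >> pure (TWord w)

unread :: Tok -> P ()
unread t = getS >>= \s -> putS s { pending = t : pending s }

-- The parser's half of parse-error(t): pop the innermost implicit block, if any.
popLayout :: P Bool
popLayout = do
  s <- getS
  case layout s of
    _ : ms -> putS s { layout = ms } >> pure True
    [] -> pure False

-- expr ::= let { bind ; ... } in expr | word      bind ::= word = expr
expr :: P Expr
expr = do
  t <- next
  case t of
    TWord "let" -> do
      open <- next
      unless (open == VOpen) (failP "expected a block after let")
      bs <- binds
      kw <- next
      unless (kw == TWord "in") (failP ("expected in, got " ++ show kw))
      Let bs <$> expr
    TWord v -> pure (Var v)
    _ -> failP ("unexpected " ++ show t)

binds :: P [(String, Expr)]
binds = do
  toks <- sequence [next, next]
  b <- case toks of
    [TWord v, TWord "="] -> (,) v <$> expr
    _ -> failP "expected a binding"
  t <- next
  case t of
    VSemi -> (b :) <$> binds
    VClose -> pure [b]
    _ -> do  -- parse-error(t): behave as if a virtual '}' stood before t
      ok <- popLayout
      unless ok (failP ("unexpected " ++ show t))
      unread t
      pure [b]

parseExpr :: String -> Either String Expr
parseExpr src = fst <$> runP (expr <* eof) (St (rawLex src) [] [] 0 False)
  where eof = next >>= \t -> unless (t == TEOF) (failP ("trailing " ++ show t))
```

Loaded into GHCi, `parseExpr "let x = a in x"` gives `Right (Let [("x",Var "a")] (Var "x"))`. The `in` on the same line produces no layout token, so the closing brace comes only from the parser calling `popLayout`.<br>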
<br>
The grammar in Amelia's post is a bit too strict: e.g. GHC accepts
real (explicit) semicolons inside a virtual layout block, and also
empty "statements" between them, so we can write<br>
<br>
\x y z -> case x of True -> y;;;;;; False -> z<br>
<br>
but it's easy (at least in parsec) to adjust the parser grammar to
accept those.<br>
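For example, if the lexer keeps explicit and virtual semicolons as separate tokens, the grammar can treat them alike and allow runs of them. Below is a small list-based sketch (not parsec), assuming the body of a `case` has already been laid out and ends in a virtual close brace. The `Tok` constructors and the `alts` function are hypothetical.<br>

```haskell
-- Hypothetical tokens: explicit ';' (TSemi) and layout-inserted ';' (VSemi)
-- stay distinct in the lexer but are treated alike by the grammar below.
data Tok = TWord String | TArrow | TSemi | VSemi | VClose
  deriving (Eq, Show)

isSemi :: Tok -> Bool
isSemi t = t == TSemi || t == VSemi

-- alts ::= semi* [ alt ( semi+ alt )* semi* ] '}'     alt ::= word '->' word
-- Returns the alternatives and the tokens after the closing (virtual) brace.
alts :: [Tok] -> Either String ([(String, String)], [Tok])
alts ts = case dropWhile isSemi ts of
  VClose : rest -> Right ([], rest)              -- empty (or all-semicolon) block
  TWord p : TArrow : TWord e : rest -> case rest of
    VClose : rest' -> Right ([(p, e)], rest')    -- last alternative
    t : _ | isSemi t -> do                       -- one or more separators follow
      (more, rest') <- alts rest
      Right ((p, e) : more, rest')
    _ -> Left ("expected ';' or '}' after the alternative for " ++ p)
  other -> Left ("unexpected " ++ show (take 1 other))
```

Fed the body of the example above (`True -> y;;;;;; False -> z` followed by a virtual `}`), `alts` returns both alternatives and silently skips the empty statements. Two alternatives with no separator between them are still rejected.<br>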
<p>Or, you can *approximate* the parse-error rule with the "alternative
layout rule" [1], which can be implemented as a pass between
lexing and parsing, or as a stateful lexer (but in this case the
parser won't need to adjust the lexer's state). GHC has an
undocumented AlternativeLayoutRule extension, so you can
experiment with it to see what it accepts (look for tests in the GHC
source for examples). It handles let-in bindings well enough.<br>
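Here is a rough sketch of such a pass, only in the spirit of [1]. It is not the rule as specified there or as GHC implements it. On top of the usual layout algorithm, it closes an implicit block when it meets `in` and the innermost block was opened by `let`, or when it meets `)` while a block opened inside the parentheses is still open. All names (`layoutPass`, `Ctx`, `rawLex`, ...) are hypothetical.<br>

```haskell
-- Rough sketch (hypothetical names), not the actual alternative layout rule.
import Data.Char (isAlpha, isSpace)

data Raw = Raw String Int Int   -- text, line, column (1-based)
data Tok = T String | VOpen | VSemi | VClose deriving (Eq, Show)
data Ctx = Implicit Int Bool    -- layout column; True if the block was opened by `let`
         | Paren                -- an explicit '(' still waiting for its ')'

-- A context-free lexer: words, single parentheses, and runs of other symbols.
rawLex :: String -> [Raw]
rawLex src = [Raw w ln c | (ln, l) <- zip [1 ..] (lines src), (c, w) <- cols 1 l]
  where
    cols _ [] = []
    cols c s@(ch : rest)
      | isSpace ch = cols (c + 1) rest
      | ch `elem` "()" = (c, [ch]) : cols (c + 1) rest
      | otherwise = let (w, rest') = span (same ch) s in (c, w) : cols (c + length w) rest'
    same a b = isAlpha a == isAlpha b && not (isSpace b) && b `notElem` "()"

-- Pure token-stream pass: no feedback from the parser is needed.
layoutPass :: [Raw] -> [Tok]
layoutPass = go [] 0 Nothing
  where
    -- go contexts lastLine opening tokens; opening = Just isLet right after a keyword
    go :: [Ctx] -> Int -> Maybe Bool -> [Raw] -> [Tok]
    go stack _ _ [] = [VClose | Implicit _ _ <- stack]  -- close implicit blocks at EOF
    go stack lastLn open rs@(Raw w ln c : rest)
      | Just isLet <- open = VOpen : go (Implicit c isLet : stack) ln Nothing rs
      | ln > lastLn, Implicit m _ : _ <- stack, c == m = VSemi : go stack ln Nothing rs
      | ln > lastLn, Implicit m _ : outer <- stack, c < m = VClose : go outer lastLn Nothing rs
      | w == "in", Implicit _ True : outer <- stack = VClose : go outer lastLn Nothing rs
      | w == ")", Implicit _ _ : outer <- stack = VClose : go outer lastLn Nothing rs
      | otherwise = T w : go stack' ln open' rest
      where
        stack' = case (w, stack) of
          ("(", _) -> Paren : stack
          (")", Paren : outer) -> outer
          _ -> stack
        open' = if w `elem` ["let", "where", "of", "do"] then Just (w == "let") else Nothing
```

E.g. `layoutPass (rawLex "let x = a in x")` yields `[T "let",VOpen,T "x",T "=",T "a",VClose,T "in",T "x"]` without any help from the parser.<br>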
<br>
[1]
<a class="moz-txt-link-freetext" href="https://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html">https://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html</a>
<br>
<br>
- Oleg<br>
<br>
</p>
On 1.11.2023 0.31, Travis Athougies wrote:<br>
<blockquote type="cite">According to the Haskell report [1] (see
Note 5), a virtual `}` token is inserted if parsing the next token
would cause a parse error and the indentation stack is non-empty.<br>
<br>
I'm trying to lex and parse Haskell source and this sort of
interplay (which requires two-way communication between lexer and
parser) makes it very difficult to write a conformant implementation.<br>
<br>
I can't change the standard (obviously), but I'm wondering if this
is actually what GHC (de facto the only Haskell compiler) does, or
if it applies some other rule. If so, does anyone know the exact
mechanism of its implementation?<br>
<br>
I've been programming Haskell for more than a decade, and while I
have an intuitive understanding of the indentation rules, I would
have assumed the source could be lexed without also having a parser.
In particular, the note seems to imply that the main purpose of this
is to properly lex `let`/`in` bindings. Perhaps there's an alternate
equivalent rule?<br>
<br>
Curious to hear others' thoughts.<br>
<br>
Travis<br>
<br>
[1]<br>
<a class="moz-txt-link-freetext" href="https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3">https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3</a><br>
</blockquote>
<span style="white-space: pre-wrap; display: block; width: 98vw;">> _______________________________________________
> Haskell-Cafe mailing list
</span><br>
</body>
</html>