[Haskell-cafe] Question on Lexing Haskell syntax

Oleg Grenrus oleg.grenrus at iki.fi
Thu Nov 2 21:01:34 UTC 2023


That is, if you want to have distinct tokens for multiplication and 
pointer type asterisks. I don't see why you would need to do that 
distinction in the lexer though. You can just lex * as an asterisk, and 
later in parser figure out what that asterisk meant. The same way 
various parenthesis and brackets, or comma are often overloaded in the 
programming languages, but that doesn't complicate their lexing in any way.

The Haskell indentation is much more complicated.

Your example illustrates that parser cannot operate (decide between 
variable definition or an expression) without also processing typedef 
statements. So C forces part of renaming to be done in parsing. That is 
unfortunate coupling, but it's different coupling.

- Oleg

On 2.11.2023 22.50, Tom Smeding wrote:
> The fun (?) thing about C syntax is that you _cannot_ defer this. 
> Consider the following (invalid) C program:
>
> int main(void) {
>   t * x;
>   int t;
>   t * x;
> }
>
> When I pass this through gcc, what I get is:
>
> file.c: In function ‘main’:
> file.c:2:3: error: unknown type name ‘t’
>     2 |   t * x;
>       |   ^
> file.c:4:5: error: invalid operands to binary * (have ‘int’ and ‘int *’)
>     4 |   t * x;
>       |     ^
>
> The first 't * x' statement was parsed as a declaration of the 
> variable 'x' with as type 't*'. The second such statement was parsed 
> as a multiplication. The difference in behaviour is the declaration of 
> 't' as a variable in between.
>
> When starting this email I thought that the default was the other way 
> round, i.e. 't * x' is parsed as a multiplication unless 't' is 
> defined as a type; this would be accomplished by e.g. 'typedef int 
> t;'. However it seems that the default, at least in gcc 13.2.1, is a 
> variable declaration. Luckily (?), the point stands that to lex C, if 
> you want to distinguish multiplication from the pointer type symbol, 
> you need communication from the parser.
>
> - Tom
>
> On 01/11/2023 01:51, Oleg Grenrus wrote:
>>
>> In C, AFAIU you can (and probably should) defer `typedef` usage 
>> recognition to a separate "renamer/ name resolution" pass. In Haskell 
>> we are forced to do name resolution after parsing, as we don't need 
>> to declare stuff before use. Even so, separate pass is usually a good 
>> idea anyway, you are better equipped to produce good error messages. 
>> In fact GHC does even more: it defers the unbound names reporting to 
>> the type checking phase, so it can give the types to unbound 
>> variables, like:
>>
>>     Prelude> x : "foo"
>>     <interactive>:2:1: error: Variable not in scope: x :: Char
>>
>> - Oleg
>>
>> On 1.11.2023 2.32, Brandon Allbery wrote:
>>> Feedback between lexer and parser isn't exactly unusual. Consider 
>>> that parsing a C `typedef` generally needs to feed back to the lexer 
>>> so uses will be recognized properly.
>>>
>>> On Wed, Nov 1, 2023 at 12:28 AM Oleg Grenrus <oleg.grenrus at iki.fi> 
>>> wrote:
>>>
>>>     Yes, the "communication between lexer and parser" is exactly
>>>     what GHC does.
>>>
>>>     Amelia has a nice post about it
>>>     https://amelia.how/posts/parsing-layout.html which made it click
>>>     it for me.
>>>
>>>     Note, you don't actually need to use alex and happy, you can do
>>>     hand-written lexer and parsec (or alex and parsec, ...). The key
>>>     insight is to have stateful lexer, and control it from the parser.
>>>
>>>     Amelia's post grammar is a bit too strict, e.g. GHC accepts real
>>>     semis in virtual layout, and also empty "statements" in between,
>>>     so we can write
>>>
>>>        \x y z -> case x of True -> y;;;;;; False -> z
>>>
>>>     but that's easy (at least in parsec) to adjust the parser
>>>     grammar to accept those.
>>>
>>>     Or, you can *approximate* the parse-error rule with "alternative
>>>     layout rule" [1], which can be implemented as a pass between
>>>     lexing and parsing, or as a stateful lexer (but in this case
>>>     parser won't need to adjust lexer's state). GHC has an
>>>     undocumented AlternativeLayoutRule extension, so you can
>>>     experiment with it to see what it accepts (look for tests in GHC
>>>     source for examples). It handles let-in bindings well enough.
>>>
>>>     [1]
>>>     https://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html
>>>     which can be imp
>>>
>>>     - Oleg
>>>
>>>     On 1.11.2023 0.31, Travis Athougies wrote:
>>>>     According to the Haskell report [1] (See Note 5), a virtual `}`
>>>>     token
>>>>     is inserted if parsing the next token would cause a parse error
>>>>     and the
>>>>     indentation stack is non-empty.
>>>>
>>>>     I'm trying to lex and parse Haskell source and this sort of
>>>>     interplay
>>>>     (which requires two-way communication between lexer and parser)
>>>>     makes
>>>>     it very difficult to write a conformant implementation.
>>>>
>>>>     I can't change the standard (obviously), but I'm wondering if
>>>>     this is
>>>>     actually what GHC (de facto the only Haskell compiler) does, or
>>>>     if it
>>>>     applies some other rule. If so, does anyone know the exact
>>>>     mechanism of
>>>>     its implementation?
>>>>
>>>>     I've been programming Haskell for more than a decade, and while
>>>>     I have
>>>>     an intuitive understanding of the indentation rules, I would have
>>>>     assumed the source could be lexed without also having a parser. In
>>>>     particular, the note seems to imply that the main purpose of
>>>>     this is to
>>>>     properly lex `let`/`in` bindings. Perhaps there's an alternate
>>>>     equivalent rule?
>>>>
>>>>     Curious to hear other's thoughts.
>>>>
>>>>     Travis
>>>>
>>>>     [1]
>>>>     https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3
>>>     > _______________________________________________ > Haskell-Cafe mailing list > To (un)subscribe, modify options
>>>     or view archives go to: >
>>>     http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe >
>>>     Only members subscribed via the mailman list are allowed to post.
>>>     _______________________________________________
>>>     Haskell-Cafe mailing list
>>>     To (un)subscribe, modify options or view archives go to:
>>>     http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>>     Only members subscribed via the mailman list are allowed to post.
>>>
>>>
>>>
>>> -- 
>>> brandon s allbery kf8nh
>>> allbery.b at gmail.com
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20231102/c57983e2/attachment.html>


More information about the Haskell-Cafe mailing list