Lexing / Parsing and final token

Alan & Kim Zimmerman alan.zimm at gmail.com
Tue Jan 19 23:03:18 UTC 2021


FYI, I did the horrible thing for now; optimisations welcome.

The change is at [1]

Alan

[1]
https://gitlab.haskell.org/ghc/ghc/-/commit/742273a94c187f51e3b143f9c206c42024486ecf?merge_request_iid=2418
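
For anyone following along, here is a minimal sketch of the "track two
tokens back" bookkeeping discussed in the quoted messages below. It is not
the code in the commit above. Tok, Located, Window, push and lastBeforeEof
are hypothetical names using a toy token type rather than GHC's real lexer
types, and the offsets in any example are made up.

```haskell
import Data.List (foldl')

-- Toy stand-in for the lexer's token type (assumption: the real GHC
-- token type has many more constructors than these).
data Tok = TSemi | TCCurly | TComment String | TOther String | TEof
  deriving (Eq, Show)

-- A token together with a hypothetical end offset in the file.
data Located = Located { tokEnd :: Int, tok :: Tok }
  deriving (Eq, Show)

-- Rolling window holding the two most recently seen tokens.
data Window = Window { prev2 :: Maybe Located, prev1 :: Maybe Located }
  deriving (Eq, Show)

emptyWindow :: Window
emptyWindow = Window Nothing Nothing

-- Shift the window along by one token; this is the per-token housekeeping.
push :: Window -> Located -> Window
push (Window _ p1) t = Window p1 (Just t)

-- The token immediately before EOF, if the stream ends in TEof and has
-- at least one token before it.
lastBeforeEof :: [Located] -> Maybe Located
lastBeforeEof ts =
  case foldl' push emptyWindow ts of
    Window (Just p) (Just (Located _ TEof)) -> Just p
    _                                       -> Nothing
```

The point of the window is that the answer does not depend on which token
happens to come last (ITsemi, ITccurly or a comment in the real lexer); the
cost is the extra update on every token.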

On Tue, 19 Jan 2021 at 22:04, Alan & Kim Zimmerman <alan.zimm at gmail.com>
wrote:

> And if there is a comment after the '}' and then more blank lines, the
> last token is a comment.
>
> If no curlies, it is an ITsemi for the last location, after the comment.
>
> So my hacky scheme of using ITsemi as the means to track the last gap is
> not viable.
>
> And I don't want to put extra housekeeping on every token to track two
> tokens back, not just one. Back to the drawing board.
>
> Thanks
>   Alan
>
>
> On Tue, 19 Jan 2021 at 21:59, Richard Eisenberg <rae at richarde.dev> wrote:
>
>> So, I think there's your answer: the last token might be ITccurly, not
>> ITsemi. It seems that the "insert invisible curlies and semis" rule is
>> taken more literally for semis than for curlies.
>>
>> Richard
>>
>> On Jan 19, 2021, at 4:58 PM, Alan & Kim Zimmerman <alan.zimm at gmail.com>
>> wrote:
>>
>> Changing it to remove the final ';' gives a last token of ITccurly.
>>
>> Changing it to
>>
>> module Bug where
>> x = 5
>> y = 6
>>
>> gives a last token of ITsemi.
>>
>> Alan
>>
>> On Tue, 19 Jan 2021 at 21:50, Richard Eisenberg <rae at richarde.dev> wrote:
>>
>>> That's bizarre. Does it still happen with explicit braces?
>>>
>>> Just to test, I tried
>>>
>>> module Bug where {
>>> x = 5;
>>> y = 6;
>>> };
>>>
>>> and GHC rejected it because of the trailing ;.
>>>
>>> Richard
>>>
>>> > On Jan 19, 2021, at 4:35 PM, Alan & Kim Zimmerman <alan.zimm at gmail.com>
>>> > wrote:
>>> >
>>> > I am (still) working on !2418 to bring the API Annotations into the
>>> > GHC ParsedSource, and making good progress.
>>> >
>>> > I am currently making a rough port of ghc-exactprint, to ensure I can
>>> > get all the tests around modifying the AST to work.
>>> >
>>> > One of the last pieces is being able to capture the spacing from the
>>> > last token in the file to the EOF. I guess technically it is the second
>>> > last token.
>>> >
>>> > Empirically (calling getTokenStream), it seems this is always ITsemi.
>>> > I am not sure how this comes about, as the `module` parsing rule in
>>> > Parser.y ends with body or body2, and those both finish with an actual or
>>> > virtual '}'.
>>> >
>>> > Can I rely on the token before ITEof always being ITsemi?
>>> >
>>> > Alan