How to use Lexer.lexer to produce closing braces as well?

Edward Kmett ekmett at gmail.com
Wed Aug 18 16:05:18 UTC 2021


Unfortunately, the current parsing rules for Haskell aren't fully
phase-separable like this.

If you look at the rules for Layout token insertion in the Haskell report
the 9th rule requires that in the event the parser encounters a parse
error it should insert a virtual close brace and continue on!

Otherwise you couldn't parse things like *let **{** foo = bar **}** in
baz *where
the {}'s are virtual without reframing *let* and *in* as a different kind
of paired opening and closing brace or using other hacks in the grammar. It
is quite difficult to hack around all the ways parses can go wrong.

The main downside this has from a language standpoint is you simply can't
properly lex Haskell without more or less fully parsing Haskell.

-Edward

On Wed, Aug 18, 2021 at 7:22 AM Kwanghoon Choi <lazyswamp at gmail.com> wrote:

>
> Hi,
>
> I have recently been playing with GHC's Lexer.lexer in the ghc-parser-lib
> package.
>
> Given
>
>    module HelloWorld where
>
>    main = putStrLn "Hello World!\n"
>
> it produces
>
>    stack exec -- lexer-exe ./examples/HelloWorld.hs
>    Lexing&Parsing: ./examples/HelloWorld.hs
>    module at (1, 1): module
>    CONID at (1, 8): CONID
>    where at (1, 19): where
>    vocurly at (3, 1): vocurly    <==== { is inserted automatically!!
>    VARID at (3, 1): VARID
>    = at (3, 6): =
>    VARID at (3, 8): VARID
>    STRING at (3, 17): STRING
>    ; at (4, 1): ;
>
> By the example above, the lexer automatically inserts an opening brace
> (i.e. vocurly) right after 'where'. But it does not insert a matching
> closing brace (i.e., vccurly), which would lead to a failure in parsing a
> list of tokens produced by the lexer.
>
> My question is how to use the GHC lexer to produce closing braces as well.
>
> All my code is available
>  - https://github.com/kwanghoon/hslexer
>
> To save your time, the relevant part of the code is as follows:
>
> In app/HaskellLexer.hs,
>
>     singleHaskellToken :: P (Located Token)
>     singleHaskellToken =
>       Lexer.lexer False
>         (\locatedToken -> P (\pstate -> POk pstate locatedToken))
>
>     tokInfos :: [Terminal Token] -> P (Line, Column, [Terminal Token])
>     tokInfos s = do
>       locatedToken <- singleHaskellToken
>       case locatedToken of
>         L srcspan ITeof ->
>           let (start_line, start_col, end_line, end_col) =
> srcSpanToLineCol srcspan in
>           return (end_line, end_col, s)
>
>         L srcspan tok ->
>           let (start_line, start_col, end_line, end_col) =
> srcSpanToLineCol srcspan in
>           tokInfos (Terminal (fromToken tok) start_line start_col (Just
> tok) : s)
>
> Thanks in advance
>
> Best regards,
>
> Kwanghoon
>
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20210818/1ca5a2c6/attachment.html>


More information about the ghc-devs mailing list