How does GHC implement layout?

Sun Apr 4 18:52:47 UTC 2021

Hi Alexis,

I wasn't sure what the "alternative layout rule" is either, so I did some
googling. It appears never to have been documented properly. The
following link has pointers to the commit that introduced it (in 2009!);
the pointers are in some of the comments rather than in the main ticket:

https://ghc-tickets.haskell.narkive.com/htgwkF80/13087-alternativelayoutrule-breaks-lambdacase

Overall, I do think that Haskell's layout rule is more complicated than it
needs to be, mostly because of the rule that requires inserting a
"virtual close curly" on a parse error. This means that the parser and
lexer have to communicate. I've implemented a few languages with layout,
and I usually use a simpler version of the rule that just omits that case.
The benefit is that layout can then be implemented as a simple
pre-processing pass over the token stream, which is much simpler to
specify and implement. The drawback is that sometimes you have to write
programs in a slightly different way, but nothing that can't easily be
worked around.
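To make the "simple pre-processing pass" concrete, here is a minimal
Haskell sketch of the Haskell 2010 Report's layout algorithm (section
10.3) with the parse-error(t) rule dropped. It is an illustration, not
GHC's code and not necessarily what any particular language uses. The
names Tok, Ann, tokenize, annotate and layout are hypothetical; the
tokenizer is a toy that only splits on spaces, braces and semicolons, and
the Report's module-level {n} rule is skipped, so inputs are treated as
snippets.

```haskell
module Main where

-- Hypothetical types: a raw token with its 1-based line and column.
data Tok = Tok { tokText :: String, tokLine :: Int, tokCol :: Int }
  deriving Show

-- Annotated tokens, after the Haskell 2010 Report, section 10.3:
-- Open n is the {n} marker and Indent n is the <n> marker.
data Ann = Plain String | Open Int | Indent Int
  deriving (Eq, Show)

-- Toy tokenizer (an assumption for the demo): spaces separate words and
-- '{', '}' and ';' are always single-character tokens. Tabs are not handled.
tokenize :: String -> [Tok]
tokenize src = concat (zipWith lineToks [1 ..] (lines src))
  where
    lineToks ln = go 1
      where
        go _ [] = []
        go col s@(c : cs)
          | c == ' '       = go (col + 1) cs
          | c `elem` "{};" = Tok [c] ln col : go (col + 1) cs
          | otherwise      =
              let (w, rest) = break (`elem` " {};") s
              in Tok w ln col : go (col + length w) rest

-- Insert {n} after let/where/do/of when no explicit '{' follows, and <n>
-- before the first token on each line unless a {n} was just inserted.
annotate :: [Tok] -> [Ann]
annotate = go 0 False
  where
    go _ _ [] = []
    go prevLine opened (t : rest) =
      let indent = [Indent (tokCol t) | tokLine t /= prevLine, not opened]
          open   = openMarker t rest
      in indent ++ [Plain (tokText t)] ++ open
           ++ go (tokLine t) (not (null open)) rest

    openMarker t rest
      | tokText t `notElem` ["let", "where", "do", "of"] = []
      | otherwise = case rest of
          []       -> [Open 0]
          next : _ | tokText next == "{" -> []
                   | otherwise           -> [Open (tokCol next)]

-- The Report's function L, minus the parse-error(t) rule (its Note 5).
-- The second argument is the stack of layout contexts (0 = explicit brace).
layout :: [Ann] -> [Int] -> Either String [String]
layout (Indent n : ts) (m : ms)
  | n == m = (";" :) <$> layout ts (m : ms)
  | n < m  = ("}" :) <$> layout (Indent n : ts) ms
layout (Indent _ : ts) ms = layout ts ms
layout (Open n : ts) (m : ms)
  | n > m = ("{" :) <$> layout ts (n : m : ms)
layout (Open n : ts) []
  | n > 0 = ("{" :) <$> layout ts [n]
layout (Open n : ts) ms = (["{", "}"] ++) <$> layout (Indent n : ts) ms
layout (Plain "}" : ts) (0 : ms) = ("}" :) <$> layout ts ms
layout (Plain "}" : _) _ = Left "unmatched explicit '}'"
layout (Plain "{" : ts) ms = ("{" :) <$> layout ts (0 : ms)
layout (Plain t : ts) ms = (t :) <$> layout ts ms
layout [] [] = Right []
layout [] (m : ms)
  | m /= 0    = ("}" :) <$> layout [] ms
  | otherwise = Left "unclosed explicit '{'"

main :: IO ()
main = do
  print (layout (annotate (tokenize "f = do\n  x\n  y\nz")) [])
  -- Without parse-error(t), the implicit block opened after `let` only
  -- closes at end of input, so `in` ends up inside it.
  print (layout (annotate (tokenize "let x = 1 in x")) [])
  -- Starting `in` on a new line at a smaller indentation closes it in time.
  print (layout (annotate (tokenize "let x = 1\nin x")) [])
```

The second and third runs show the trade-off: with the rule dropped,
`let x = 1 in x` on one line leaves `in` inside the implicit block, while
putting `in` on its own line at a smaller indentation (or using explicit
braces) gives the intended token stream. That is the kind of "slightly
different way" of writing programs that the simpler scheme requires.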

My feeling is that it'd be pretty tricky to do layout in the parser with
grammar rules, but you may be able to do something with the parser state.
I wonder how different it would end up looking, though: in a way, that's
exactly what we are doing now; it's just that some of the state is in the
lexer.
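As an illustration of "doing something with the parser state", the
parse-error(t) rule can be driven by the parser if the layout context
stack lives in the parser's state. The sketch below is hypothetical and
is not how GHC is structured: a hand-rolled parser monad P (an invented
name, as are step, closeImplicit and the rest) over tokens that are
assumed to already carry the Report's {n} and <n> markers, for a toy
language of let expressions. Rather than detecting parse errors in
general, it applies a restricted form of the rule at one point in the
grammar: when a binding is followed by something other than ; or }, binds
calls closeImplicit to pop the innermost implicit context and treat that
as a virtual close brace. Explicit braces and empty layout blocks are not
handled.

```haskell
module Main where

-- Annotated tokens (Report section 10.3): Open n is {n}, Indent n is <n>.
-- This sketch assumes the lexer has already inserted them.
data Ann = Plain String | Open Int | Indent Int
  deriving (Eq, Show)

data Expr = Var String | Let [(String, Expr)] Expr
  deriving (Eq, Show)

-- Hypothetical parser state: pending tokens plus the layout context stack.
data St = St [Ann] [Int]

newtype P a = P { runP :: St -> Either String (a, St) }

instance Functor P where
  fmap f (P p) = P $ \s -> fmap (\(a, s') -> (f a, s')) (p s)

instance Applicative P where
  pure a = P $ \s -> Right (a, s)
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad P where
  P p >>= k = P $ \s -> p s >>= \(a, s') -> runP (k a) s'

failP :: String -> P a
failP msg = P (const (Left msg))

-- One step of the Report's function L, except parse-error(t), which the
-- parser triggers itself via closeImplicit.
step :: St -> Either String (String, St)
step (St (Indent n : ts) (m : ms))
  | n == m = Right (";", St ts (m : ms))
  | n < m  = Right ("}", St (Indent n : ts) ms)
step (St (Indent _ : ts) ms) = step (St ts ms)
step (St (Open n : ts) ms)
  | n > enclosing = Right ("{", St ts (n : ms))
  | otherwise     = Left "empty layout blocks are not handled in this sketch"
  where enclosing = case ms of { m : _ -> m; [] -> 0 }
step (St (Plain t : ts) ms) = Right (t, St ts ms)
step (St [] (_ : ms)) = Right ("}", St [] ms)
step (St [] []) = Right ("<eof>", St [] [])

token :: P String
token = P step

-- Look at the next logical token without committing any stack changes.
peek :: P String
peek = P $ \s -> fmap (\(t, _) -> (t, s)) (step s)

expect :: String -> P ()
expect want = do
  t <- token
  if t == want then pure () else failP ("expected " ++ want ++ ", got " ++ t)

-- Restricted parse-error(t): close the innermost implicit block, if any.
closeImplicit :: P ()
closeImplicit = P $ \(St ts ms) -> case ms of
  _ : rest -> Right ((), St ts rest)
  []       -> Left "parse error: no implicit block to close"

ident :: P String
ident = do
  t <- token
  if t `elem` ["let", "in", "=", ";", "{", "}", "<eof>"]
    then failP ("unexpected " ++ t)
    else pure t

expr :: P Expr
expr = do
  t <- peek
  if t == "let"
    then do
      _ <- token
      expect "{"
      bs <- binds
      expect "in"
      Let bs <$> expr
    else Var <$> ident

binds :: P [(String, Expr)]
binds = do
  b <- bind
  t <- peek
  case t of
    ";" -> token >> (b :) <$> binds
    "}" -> token >> pure [b]
    _   -> closeImplicit >> pure [b]  -- e.g. `in` on the same line

bind :: P (String, Expr)
bind = do
  v <- ident
  expect "="
  e <- expr
  pure (v, e)

parseExpr :: [Ann] -> Either String Expr
parseExpr anns = fst <$> runP expr (St anns [])

main :: IO ()
main =
  -- `let x = y in x` on one line: `x` is at column 5, so {5} follows `let`.
  print (parseExpr [Plain "let", Open 5, Plain "x", Plain "=", Plain "y",
                    Plain "in", Plain "x"])
```

Here the one-line `let x = y in x` parses because the parser, not a
separate pass, decides when the implicit block ends. A full
implementation has to apply the rule wherever the grammar would reject
the next token, which is why the lexer and parser end up sharing state.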

-Iavor

On Sat, Apr 3, 2021 at 5:05 PM Alexis King <lexi.lambda at gmail.com> wrote:

> Hi all,
>
> I’m wondering if there are any resources that discuss the design of GHC’s
> implementation of layout. (I haven’t been able to find any.) From looking
> at the code, here’s what I’ve gathered so far:
>
>    - Layout is implemented in the lexer (compiler/GHC/Parser/Lexer.x).
>
>    - The implementation is similar in some respects to the approach
>    described in the Haskell Report, but still fairly different. Virtual braces
>    and semicolons are inserted during the lexing process itself with the
>    assistance of Alex lexer states (aka “start codes”).
>
>    - In order to handle particularly tricky cases like
>
>        if e then do x; y else z
>
>
>    where the virtual close brace must be inserted in the middle of a
>    line, tokens such as in and else are given special context-sensitive
>    treatment. This appears to be quite subtle.
>
> Overall, I can mostly follow the code, but I still have a few unanswered
> questions:
>
>    - The layout-related code consistently uses the phrase “alternative
>    layout rule”—what does “alternative” mean here? Does it refer to GHC’s
>    implementation of layout? Or maybe it refers to
>    NondecreasingIndentation? It isn’t clear.
>
>    - The implementation of layout seems quite complex, in large part
>    because it has to worry about parsing concerns in the lexer in order to
>    handle tricky cases like the one I provided above. Is there a reason all
>    this is done in the lexer, rather than deferring more of the work to
>    the parser?
>
> I’ve found remarkably little information about implementing layout in
> general, so perhaps I’m missing some resources or keywords to search for,
> but any information or perspectives would be appreciated!
>
> Thanks,
> Alexis
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>