<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>That is, if you want to have distinct tokens for the
      multiplication and pointer-type asterisks. I don't see why you
      would need to make that distinction in the lexer, though. You can
      just lex * as an asterisk, and later, in the parser, figure out
      what that asterisk meant. In the same way, parentheses, brackets,
      and commas are often overloaded in programming languages, but
      that doesn't complicate their lexing in any way.<br>

      <br>

      The Haskell indentation is much more complicated.<br>

      <br>

      Your example illustrates that the parser cannot operate (decide
      between a variable definition and an expression) without also
      processing typedef statements. So C forces part of renaming to be
      done during parsing. That is unfortunate coupling, but it's a
      different coupling (between parser and renamer, not between lexer
      and parser).<br>

      <br>

      - Oleg<br>

      <br>

    </p>
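    <p>A minimal sketch of that idea, for a tiny C-like fragment only.
      The names (Token, Stmt, lexer, parseStmt) are made up for
      illustration, and the fallback to multiplication for an unknown
      name is an assumption of the sketch, not what gcc does. The lexer
      emits a single TStar token; only the parser looks at the set of
      known type names.</p>

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- Tokens: the lexer emits one TStar token and never guesses its meaning.
data Token = TIdent String | TStar | TSemi
  deriving (Eq, Show)

-- A context-free lexer for a tiny fragment of C-like syntax.
lexer :: String -> [Token]
lexer [] = []
lexer (c:cs)
  | isSpace c = lexer cs
  | c == '*'  = TStar : lexer cs
  | c == ';'  = TSemi : lexer cs
  | isAlpha c = let (rest, cs') = span isAlphaNum cs
                in TIdent (c : rest) : lexer cs'
  | otherwise = error ("unexpected character: " ++ [c])

-- What the parser decides a statement of the form `a * b;` means.
data Stmt
  = PointerDecl String String   -- type name, variable name
  | Multiply String String      -- left operand, right operand
  deriving (Eq, Show)

-- The parser, not the lexer, consults the list of known type names.
-- Assumption of this sketch: an unknown name is treated as a variable.
parseStmt :: [String] -> [Token] -> Maybe Stmt
parseStmt typeNames [TIdent a, TStar, TIdent b, TSemi]
  | a `elem` typeNames = Just (PointerDecl a b)
  | otherwise          = Just (Multiply a b)
parseStmt _ _ = Nothing

main :: IO ()
main = do
  print (parseStmt ["t"] (lexer "t * x;"))
  print (parseStmt []    (lexer "t * x;"))
```

    <p>Both calls lex the input to the same token list; the result
      differs only because the parser was given a different set of type
      names.</p>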

    <div class="moz-cite-prefix">On 2.11.2023 22.50, Tom Smeding wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:0e8ba474-58d7-493a-a6c9-3b2d70df8825@tomsmeding.com">


      The fun (?) thing about C syntax is that you _cannot_ defer this.

      Consider the following (invalid) C program:<br>

      <br>

      int main(void) {<br>

        t * x;<br>

        int t;<br>

        t * x;<br>

      }<br>

      <br>

      When I pass this through gcc, what I get is:<br>

      <br>

      file.c: In function ‘main’:<br>

      file.c:2:3: error: unknown type name ‘t’<br>

          2 |   t * x;<br>

            |   ^<br>

      file.c:4:5: error: invalid operands to binary * (have ‘int’ and

      ‘int *’)<br>

          4 |   t * x;<br>

            |     ^<br>

      <br>

      The first 't * x' statement was parsed as a declaration of the
      variable 'x' with type 't*'. The second such statement was
      parsed as a multiplication. The difference in behaviour comes from
      the declaration of 't' as a variable in between.<br>

      <br>

      When starting this email I thought that the default was the other
      way round, i.e. that 't * x' is parsed as a multiplication unless
      't' is defined as a type, which could be done with e.g. 'typedef
      int t;'. However, it seems that the default, at least in gcc
      13.2.1, is a variable declaration. Luckily (?), the point stands
      that to lex C, if you want to distinguish multiplication from the
      pointer type symbol, you need communication from the parser.<br>

      <br>

      - Tom<br>

      <br>

      <div class="moz-cite-prefix">On 01/11/2023 01:51, Oleg Grenrus

        wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:61950d60-4aab-4c61-54e1-62fbe9b93905@iki.fi">


        <p>In C, AFAIU you can (and probably should) defer `typedef`
          usage recognition to a separate "renamer / name resolution"
          pass. In Haskell we are forced to do name resolution after
          parsing, as we don't need to declare things before use. Even
          so, a separate pass is usually a good idea anyway, as you are
          better equipped to produce good error messages. In fact, GHC
          does even more: it defers the reporting of unbound names to the
          type checking phase, so it can give the types of unbound
          variables, like:<br>

          <br>

              Prelude&gt; x : "foo"<br>
              &lt;interactive&gt;:2:1: error: Variable not in scope: x :: Char<br>

          <br>

          - Oleg<br>

        </p>
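        <p>A minimal sketch of such a separate pass, assuming a toy
          expression type. Expr and unbound are hypothetical names for
          illustration and have nothing to do with GHC's actual renamer.
          The parser is assumed to produce the tree without looking at
          scopes; the pass then collects every unbound name in one go
          instead of stopping at the first one.</p>

```haskell
import Data.List (nub)

-- A tiny expression language, as it might come out of a parser that
-- performs no name resolution at all.
data Expr
  = Var String
  | Lit Int
  | App Expr Expr
  | Let String Expr Expr   -- let x = e1 in e2 (non-recursive)
  deriving Show

-- Renamer pass: walk the tree with the names currently in scope and
-- collect every variable that is used but not bound.
unbound :: [String] -> Expr -> [String]
unbound scope expr = nub (go scope expr)
  where
    go env (Var x)
      | x `elem` env = []
      | otherwise    = [x]
    go _   (Lit _)       = []
    go env (App f a)     = go env f ++ go env a
    go env (Let x e1 e2) = go env e1 ++ go (x : env) e2

main :: IO ()
main = do
  -- let y = 1 in f (x y), with only `f` in scope at top level
  let e = Let "y" (Lit 1) (App (Var "f") (App (Var "x") (Var "y")))
  print (unbound ["f"] e)
```

        <p>Because the whole tree is available, the pass can report all
          problems together, and a later phase could attach more
          information (such as an inferred type) to each reported
          name.</p>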

        <div class="moz-cite-prefix">On 1.11.2023 2.32, Brandon Allbery

          wrote:<br>

        </div>

        <blockquote type="cite"

cite="mid:CAKFCL4Vy7sgY=y-fz4v22i2Zm0CrbpwfZeAKyzHke=eczDkLkQ@mail.gmail.com">


          <div dir="ltr">Feedback between lexer and parser isn't exactly

            unusual. Consider that parsing a C `typedef` generally needs

            to feed back to the lexer so uses will be recognized

            properly.</div>

          <br>

          <div class="gmail_quote">

            <div dir="ltr" class="gmail_attr">On Wed, Nov 1, 2023 at

              12:28 AM Oleg Grenrus &lt;<a
                href="mailto:oleg.grenrus@iki.fi" moz-do-not-send="true"
                class="moz-txt-link-freetext">oleg.grenrus@iki.fi</a>&gt;

              wrote:<br>

            </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              <div> Yes, the "communication between lexer and parser" is

                exactly what GHC does.<br>

                <br>

                Amelia has a nice post about it <a

                  href="https://amelia.how/posts/parsing-layout.html"

                  target="_blank" moz-do-not-send="true"

                  class="moz-txt-link-freetext">https://amelia.how/posts/parsing-layout.html</a>

                which made it click for me.<br>

                <br>

                Note, you don't actually need to use alex and happy; you
                can use a hand-written lexer and parsec (or alex and
                parsec, ...). The key insight is to have a stateful lexer
                and to control it from the parser.<br>
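                <p>A heavily simplified sketch of that insight, not
                  GHC's implementation: there is no indentation
                  tracking, a `let` holds exactly one binding, and all
                  names (LexState, openBlock, closeOnError, expr) are
                  hypothetical. The lexer state counts open implicit
                  blocks; the parser opens one after `let` and, when it
                  meets an `in` that cannot continue the block, asks the
                  lexer to close the innermost one, which plays the role
                  of the virtual `}` from the parse-error rule.</p>

```haskell
import Data.Char (isDigit)

data Token = TLet | TIn | TEq | TIdent String | TNum Int
  deriving (Eq, Show)

-- Whitespace-separated tokens only; enough for the example.
tokenize :: String -> [Token]
tokenize = map classify . words
  where
    classify "let" = TLet
    classify "in"  = TIn
    classify "="   = TEq
    classify w
      | all isDigit w = TNum (read w)
      | otherwise     = TIdent w

-- Lexer state: unread tokens plus the number of open implicit blocks.
data LexState = LexState { input :: [Token], layout :: Int }

next :: LexState -> Maybe (Token, LexState)
next st = case input st of
  []     -> Nothing
  (t:ts) -> Just (t, st { input = ts })

-- Called by the parser after `let`: open an implicit block.
openBlock :: LexState -> LexState
openBlock st = st { layout = layout st + 1 }

-- Called by the parser when the next token cannot continue the block:
-- close the innermost implicit block, if any (the virtual `}`).
closeOnError :: LexState -> Maybe LexState
closeOnError st
  | layout st > 0 = Just st { layout = layout st - 1 }
  | otherwise     = Nothing

data Expr = Var String | Num Int | Let String Expr Expr
  deriving (Eq, Show)

expr :: LexState -> Maybe (Expr, LexState)
expr st0 = do
  (t, st1) <- next st0
  case t of
    TIdent x -> Just (Var x, st1)
    TNum n   -> Just (Num n, st1)
    TLet     -> do
      (TIdent x, st2) <- next (openBlock st1)
      (TEq, st3)      <- next st2
      (rhs, st4)      <- expr st3
      -- `in` is an error inside the block, so pop the layout context.
      st5 <- case input st4 of
               (TIn : _) -> closeOnError st4
               _         -> Nothing
      (TIn, st6)      <- next st5
      (body, st7)     <- expr st6
      Just (Let x rhs body, st7)
    _ -> Nothing

-- Succeeds only if all input is consumed and every block is closed.
parseExpr :: String -> Maybe Expr
parseExpr s = case expr (LexState (tokenize s) 0) of
  Just (e, LexState [] 0) -> Just e
  _                       -> Nothing

main :: IO ()
main = print (parseExpr "let x = let y = 1 in y in x")
```

                <p>On a single line there is no indentation to end the
                  block, so only the parser can tell the lexer where it
                  stops; that is the two-way communication in
                  miniature.</p>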

                <br>

                The grammar in Amelia's post is a bit too strict, e.g. GHC
                accepts real semicolons in virtual layout, and also empty
                "statements" in between, so we can write<br>
                <br>
                   \x y z -> case x of True -> y;;;;;; False -> z<br>
                <br>
                but it's easy (at least in parsec) to adjust the
                parser grammar to accept those.<br>
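                <p>A minimal sketch of such an adjustment. It uses ReadP
                  from base as a stand-in for parsec so that it has no
                  dependencies, and a bare word as a stand-in for a real
                  statement or case alternative; semis, item, items and
                  parseItems are made-up names. The only change from a
                  strict grammar is that the separator is a run of one
                  or more semicolons, also allowed at the start and at
                  the end.</p>

```haskell
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP

-- One or more semicolons, each optionally followed by whitespace.
semis :: ReadP ()
semis = skipMany1 (char ';' *> skipSpaces)

-- A stand-in for a real statement or case alternative: just a word.
item :: ReadP String
item = munch1 isAlpha <* skipSpaces

-- Items separated by runs of semicolons; leading and trailing
-- semicolons are allowed, so empty "statements" are accepted.
items :: ReadP [String]
items = skipSpaces *> optional semis *> sepEndBy item semis <* eof

-- Take the first complete parse, if there is one.
parseItems :: String -> Maybe [String]
parseItems s = case readP_to_S items s of
  ((xs, _) : _) -> Just xs
  []            -> Nothing

main :: IO ()
main = do
  print (parseItems "a;;;;;; b")
  print (parseItems "; a ; b ;;")
  print (parseItems "a b")
```

                <p>The first two inputs both yield the items a and b,
                  while the last one is rejected because no semicolon
                  separates the two items.</p>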

                <p>Or, you can *approximate* the parse-error rule with the
                  "alternative layout rule" [1], which can be
                  implemented as a pass between lexing and parsing, or
                  as a stateful lexer (but in this case the parser won't
                  need to adjust the lexer's state). GHC has an undocumented
                  AlternativeLayoutRule extension, so you can experiment
                  with it to see what it accepts (look for tests in the GHC
                  source for examples). It handles let-in bindings well
                  enough.<br>

                  <br>

                  [1] <a

href="https://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html"

                    target="_blank" moz-do-not-send="true"

                    class="moz-txt-link-freetext">https://www.mail-archive.com/haskell-prime@haskell.org/msg01938.html</a>

                  <br>

                  <br>

                  - Oleg<br>

                  <br>

                </p>

                On 1.11.2023 0.31, Travis Athougies wrote:<br>

                <blockquote type="cite">According to the Haskell report

                  [1] (See Note 5), a virtual `}` token<br>

                  is inserted if parsing the next token would cause a

                  parse error and the<br>

                  indentation stack is non-empty.<br>

                  <br>

                  I'm trying to lex and parse Haskell source and this

                  sort of interplay<br>

                  (which requires two-way communication between lexer

                  and parser) makes<br>

                  it very difficult to write a conformant

                  implementation.<br>

                  <br>

                  I can't change the standard (obviously), but I'm

                  wondering if this is<br>

                  actually what GHC (de facto the only Haskell compiler)

                  does, or if it<br>

                  applies some other rule. If so, does anyone know the

                  exact mechanism of<br>

                  its implementation?<br>

                  <br>

                  I've been programming Haskell for more than a decade,

                  and while I have<br>

                  an intuitive understanding of the indentation rules, I

                  would have<br>

                  assumed the source could be lexed without also having

                  a parser. In<br>

                  particular, the note seems to imply that the main

                  purpose of this is to<br>

                  properly lex `let`/`in` bindings. Perhaps there's an

                  alternate<br>

                  equivalent rule?<br>

                  <br>

                  Curious to hear others' thoughts.<br>

                  <br>

                  Travis<br>

                  <br>

                  [1]<br>

                  <a

href="https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3"

                    target="_blank" moz-do-not-send="true"

                    class="moz-txt-link-freetext">https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3</a><br>

                </blockquote>

                <span style="white-space:pre-wrap;display:block;width:98vw">> _______________________________________________

> Haskell-Cafe mailing list

> To (un)subscribe, modify options or view archives go to:

> <a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a>

> Only members subscribed via the mailman list are allowed to post.

</span><br>

              </div>

            </blockquote>

          </div>

          <br clear="all">

          <div><br>

          </div>

          <span class="gmail_signature_prefix">-- </span><br>

          <div dir="ltr" class="gmail_signature">

            <div dir="ltr">

              <div>

                <div dir="ltr">

                  <div>brandon s allbery kf8nh</div>

                  <div><a href="mailto:allbery.b@gmail.com"

                      target="_blank" moz-do-not-send="true"

                      class="moz-txt-link-freetext">allbery.b@gmail.com</a></div>

                </div>

              </div>

            </div>

          </div>

        </blockquote>

        <br>


      </blockquote>

      <br>

      <br>


    </blockquote>

  </body>

</html>