<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>The idea of making Cmm roundtrippable comes up every now and

      then. <br>

      While the ability to feed dump output back to GHC for debugging or

      similar purposes is useful, in the end we always ended up

      prioritizing one of the many other things that needed

      doing.<br>

      <br>

      Or in other words, making Cmm (more) roundtrippable seems inherently

      useful. <br>

      However, it's questionable how much it is worth breaking things

      like .cmm code that exists in libraries for it.<br>

      So if you want to work towards this, it should be with the goal of

      avoiding breakage.<br>

      <br>

      There are likely also a lot of corner cases to consider, which

      might make this more complicated than it sounds.<br>

      Ultimately this is up to you and your mentor. But if I understand

      correctly you have about 5 weeks left for GSoC, so getting full

      Cmm roundtrip ability into a state where it can be merged into GHC

      during that time might be too optimistic, depending on your

      Haskell/parser/GHC experience.<br>

      <br>

      As a GHC maintainer, the most useful thing for us would therefore

      be incremental patches which take Cmm closer to being

      roundtrippable. That would also allow you to get at least

      some work that benefits the GHC project into the tree, even if you

      end up not making it all the way to full roundtrip capability.<br>

      <br>

      On the pure technical aspects:<br>

      -------------<br>

      <br>

      <blockquote type="cite">Create a separate parser ...</blockquote>

      <br>

      1. Creating a separate parser is not viable, at least not if you

      want the GHC team to maintain it. It would likely bitrot and break

      on the next change to Cmm, and would only cause increased

      maintenance overhead.</p>

    <p>

      <blockquote type="cite">Extend the current parser with a dedicated

        block</blockquote>

      Having blocks à la C seems fine. Your suggestion seems different,

      however. It's unclear from your example how those blocks would

      work exactly. Is <code>low_level_unwrapped</code> a label? If so,

      can we goto it? Is it a keyword? Something else entirely?</p>

    <p>If the main issue is the "offset" string in the generated case,

      I'm fine with deleting that from the pretty printer. I'm not sure

      it does anything of value, so removing it from the output seems

      fine (see <code>pprCmmGraph</code>).<br>

      <br>

      > If we introduce this new “exact” low-level form, it's

      possible the existing low-level mode could become redundant. We

      might then have:<br>

      <br>

      What changes are you planning that make the new parser/syntax

      incompatible with the old one? Can't you just modify the current

      parser, maybe with some slight changes to the pretty printer, in a

      way that makes it mostly backwards compatible?</p>
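    <p>To make the "incremental" idea a bit more concrete, progress could

      be tracked with a roundtrip check along the lines of the sketch

      below. This is only an illustration on a toy language, not GHC

      code: the types <code>Expr</code> and <code>Stmt</code> and all

      function names here are made up for the example and are not the

      real Cmm AST, parser or pretty printer.</p>

```haskell
-- Toy stand-ins for Cmm expressions and statements.
-- Hypothetical types for illustration only, not GHC's Cmm AST.
data Expr = Reg String | Add Expr Expr
  deriving (Eq, Show)

data Stmt = Assign String Expr | Jump Expr
  deriving (Eq, Show)

-- Toy pretty printer: never emits parentheses.
pprExpr :: Expr -> String
pprExpr (Reg r)   = r
pprExpr (Add a b) = pprExpr a ++ " + " ++ pprExpr b

pprStmt :: Stmt -> String
pprStmt (Assign x e) = x ++ " = " ++ pprExpr e ++ ";"
pprStmt (Jump e)     = "jump " ++ pprExpr e ++ ";"

-- Toy parser over whitespace-separated tokens.
-- '+' is parsed left-associatively.
parseExpr :: [String] -> Maybe Expr
parseExpr (t : ts)
  | t /= "+" = go (Reg t) ts
  where
    go acc [] = Just acc
    go acc ("+" : r : rest)
      | r /= "+" = go (Add acc (Reg r)) rest
    go _ _ = Nothing
parseExpr _ = Nothing

-- Strip one trailing ';' if present.
dropSemi :: String -> String
dropSemi str = case reverse str of
  (';' : rest) -> reverse rest
  _            -> str

parseStmt :: String -> Maybe Stmt
parseStmt s = case words (dropSemi s) of
  ("jump" : ts)  -> fmap Jump (parseExpr ts)
  (x : "=" : ts) -> fmap (Assign x) (parseExpr ts)
  _              -> Nothing

-- The property we care about: parsing the printed form
-- gives back exactly the statement we started with.
roundTrips :: Stmt -> Bool
roundTrips s = parseStmt (pprStmt s) == Just s
```

    <p>In this toy setup <code>Assign "x" (Add (Reg "R1") (Reg "R2"))</code>

      roundtrips, while <code>Assign "x" (Add (Reg "R1") (Add (Reg "R2")

      (Reg "R3")))</code> does not, because the printer drops the nesting

      and the parser re-associates to the left. Corner cases of that

      kind are what I would expect to show up for real Cmm as well, and

      each one fixed would be a reasonable patch on its own.</p>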

    <p>> <code>aeson</code> adds a large dependency footprint, and

      likely wouldn't be suitable for inclusion in GHC.<br>

      <br>

      Yes, <code>aeson</code> seems unsuitable.</p>

    <p>> Lastly—I’ve heard that parts of the Cmm pipeline may

      currently be under refactoring.<br>

      <br>

      This is the first I've heard of this, so I wonder where this

      information came from. There could always be changes to those

      sorts of things because, at the end of the day, they are compiler

      internals. But I'm not aware of any big planned changes in the

      near future.<br>

      <br>

      Cheers<br>

      Andreas</p>

    <div class="moz-cite-prefix">On 28/07/2025 02:16, Diego Antonio

      Rosario Palomino wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAONcbWLjEePZCQOzy9JPF1zp=MWAzxZXSn7L8Y8yGf7qC_sa9A@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">

        <p>Hello GHC devs,</p>

        <p>I'm currently working on Cmm documentation and tooling

          improvements as part of my Google Summer of Code project. One

          of my core goals is to make Cmm roundtrip serializable.</p>

        <p>Right now, the in-memory Cmm data structure—generated

          programmatically (e.g., from STG via GHC)—can be

          pretty-printed, and Cmm can also be parsed. However, the

          pretty-printed version is not compatible with the parser. That

          is, we cannot take the output of the pretty printer and feed

          it directly back into the parser.</p>

        <p>Example:</p>

        <p>Parseable version:</p>

        <pre><code>sum {

 cr:

  bits64 x;

  x = R1 + R2;

  R1 = x;

  jump %ENTRY_CODE(Sp(0))[R1];

}

</code></pre>

        <p>Pretty-printed version:</p>

        <pre><code>sum() { // []

  { info_tbls: []

    stack_info: arg_space: 8

  }

  {offset

    cf: // global

      _ce::I64 = R1 + R2;

      R1 = _ce::I64;

      call (I64[Sp + 0 * 8])(R1) args: 8, res: 0, upd: 8;

  }

}

</code></pre>

        <p>Another example:</p>

        <p>Parseable version:</p>

        <pre><code>simple_sum_4 { // [R2, R1]

  cr: // global

    bits64 _cq;

    _cq = R2;

    bits64 _cp;

    _cp = R1;

    R1 = _cq + _cp;

    jump (bits64[Sp])[R1];

}

</code></pre>

        <p>Pretty-printed version:</p>

        <pre><code>simple_sum_4() { // []

  { info_tbls: []

    stack_info: arg_space: 8

  }

  {offset

    cs: // global

      _cq::I64 = R2;

      _cr::I64 = R1;

      R1 = _cq::I64 + _cr::I64;

      call (I64[Sp])(R1) args: 8, res: 0, upd: 8;

  }

}

</code></pre>

        <p>While it’s possible to write parseable Cmm that resembles the

          pretty-printed version (and hence the internal ADT), they

          don’t fully match—mainly because the parser inserts inferred

          fields using convenience functions.</p>

        <p>Proposal:</p>

        <p>To make roundtrip serialization possible, I propose

          supporting a new syntax that matches the pretty printer output

          exactly.</p>

        <p>There are a couple of design options:</p>

        <ol>

          <li>

            <p>Create a separate parser that accepts the pretty-printed

              syntax. Files could then use either the current parser or

              the new strict one.</p>

          </li>

          <li>

            <p>Extend the current parser with a dedicated block syntax

              like:</p>

          </li>

        </ol>

        <pre><code>low_level_unwrapped {

  ...

}

</code></pre>

        <p>This second option is the one my mentor recommends, as it may

          better reflect GHC developers' preferences. In this mode, the

          parser would not insert any inferred data and would expect the

          input to match the pretty-printed form exactly.</p>

        <p>This would enable a true roundtrip:</p>

        <ul>

          <li>

            <p>Compile Haskell to Cmm (in-memory AST)</p>

          </li>

          <li>

            <p>Pretty-print and write it to disk (wrapped in

              low_level_unwrapped { ... })</p>

          </li>

          <li>

            <p>Later read it back using the parser and continue with

              codegen</p>

          </li>

        </ul>

        <p>Optional future direction:</p>

        <p>As a side note: currently the parser has both a “high-level”

          and a “low-level” mode. The low-level mode resembles the AST

          more closely but still inserts some inferred data.</p>

        <p>If we introduce this new “exact” low-level form, it's

          possible the existing low-level mode could become redundant.

          We might then have:</p>

        <ul>

          <li>

            <p>High-level syntax</p>

          </li>

          <li>

            <p>New low-level (exact)</p>

          </li>

          <li>

            <p>And possibly deprecate the current low-level variant</p>

          </li>

        </ul>

        <p>I’d be interested in your thoughts on whether that direction

          makes sense.</p>

        <p>Serialization libraries?</p>

        <p>One technically possible—but likely unacceptable—alternative

          would be to derive serialization via a library like <code>aeson</code>.

          That would enable serializing and deserializing the Cmm AST

          directly. However, I understand that <code>aeson</code> adds

          a large dependency footprint, and likely wouldn't be suitable

          for inclusion in GHC.</p>

        <p>Final question:</p>

        <p>Lastly—I’ve heard that parts of the Cmm pipeline may

          currently be under refactoring. If that’s the case, could you

          point me to which parts (parser, pretty printer, internal

          representation, etc.) are being modified? I’d like to align my

          efforts accordingly and avoid conflicts.</p>

        <p>Thanks very much for your time and input! I'm happy to

          iterate on this based on your feedback.</p>

        <p>Best regards,<br>

          Diego Antonio Rosario Palomino<br>

          GSoC 2025 – Cmm Documentation & Tooling</p>

      </div>

      <br>


    </blockquote>

  </body>

</html>