Operating on HsSyn

Mon Jul 31 09:18:48 UTC 2017

Am I far off?
I don’t think you are far off.  All I’m after is the debug-printer use-case: I want to be able to see the contents of the tree, including the PostRn annotations.  I don’t care about round-tripping it.

Simon

From: Shayan Najd [mailto:sh.najd at gmail.com]
Sent: 28 July 2017 15:32
To: Simon Peyton Jones <simonpj at microsoft.com>
Cc: ghc-devs at haskell.org; Alan & Kim Zimmerman <alan.zimm at gmail.com>; Jacques Carette <carette at mcmaster.ca>
Subject: Re: Operating on HsSyn

On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

Right. So if I understand correctly, with this semantics, `Outputable` is somewhere between pretty printing as often used in program manipulation libraries (like Haskell-Src-Exts (HSE)) which is closer to syntax, and `Show` which is closer to Haskell representation.
(There are also "exact printers" (as in HSE) that are even closer to syntax in some sense.)
Often, pretty printers generate only grammatically valid terms, not the ones polluted with extra annotations (hence grammatically invalid), e.g., what is the grammatically valid form of `OpApp` printed via `Outputable` that includes the fixity annotation.

If I recall correctly, we have briefly studied these in the past summer, we came up with some roundtrip correctness criteria, like the following (bar error handling; assume valid input):

 (parser . prettyPrint . parser) = id

[paging in Jacques]

The reason I am trying to flesh out the semantics is the /big/ gains on code reuse later on in the process: one does not need to define a separate pretty printing library for Haskell syntax, and can reuse the well-tested and well-maintained one in GHC.

Reformulating part of your concern, based on my understanding (if I may), the questions is: what is the proper design of an "outputer" (debug-printer?) where /annotated/ terms can be pretty-printed including any printable (pretty?showable?) annotations.
In particular, we may want to take advantage of extensibility of data types for this.
Am I far off?

Note: with proper design, an extensible debug-printer can still subsume corresponding pretty-printers.

On Fri, Jul 28, 2017 at 2:36 PM, Simon Peyton Jones <simonpj at microsoft.com<mailto:simonpj at microsoft.com>> wrote:
I have been under the impression that we don't even want to print those.
On the contrary, inside GHC I /do/ want to print them. Otherwise how can I see what the renamer has done?

Simon

From: Shayan Najd [mailto:sh.najd at gmail.com<mailto:sh.najd at gmail.com>]
Sent: 28 July 2017 12:20
To: Simon Peyton Jones <simonpj at microsoft.com<mailto:simonpj at microsoft.com>>
Cc: ghc-devs at haskell.org<mailto:ghc-devs at haskell.org>; Alan & Kim Zimmerman <alan.zimm at gmail.com<mailto:alan.zimm at gmail.com>>
Subject: Re: Operating on HsSyn

Before all this, we may need to discuss a bit about the intended semantics of
`Outputable`: does it need to print `PostRn`, or `PostTc` fields; or `Out`
suffixed constructors?  If not, then we only need to write a set of instances
for the base growable AST, once and for all.  Such instances will be polymorphic
on the extension descriptor `p`, and do not need to mention the constraints like
`(PostRn p (IdP p)`, since these are just extensions and not part of the base
growable AST.  Or, am I missing something about the intended semantics of
`Outputable`?

You write

So today we never print these annotations, to avoid bloating the instance contexts, which can be painful.

I have been under the impression that we don't even want to print those.

Of course, there are scenarios (like `Show` instances) where we do want to write
compositional / generic functions that take into account the extensions.
Here is my abstract overview of the scenario, that may help the discussion.
Consider data types `A`, `B`, and `C` (say, one AST datatype per compiler phase) that
 are defined as extensions to a base datatype `T`:

> A = T XA
> B = T XB
> C = T XC

where `X*`s are extension descriptors.
Now, say we want to a define functions `f_A`, `f_B`, and `f_C` over `A`, `B`, and `C`.
We have two main alternatives:
(a) either we write these  (manually or using the deriving mechanism) separately
(b) or we write a generic / parametric function `g` over `T`, and reuse that to define `f_*`s

Of course, (b) is preferable in theory , but not always possible or preferable in practice.
In which case, we can always resort to (a).
The more varying are the definitions of `f_A`, `f_B`, and `f_C` the more parametric should
`g` get, as this is the case for any generic function.

With a correct design, I believe, these are all independent of Trees that Grow story itself:
we are now not only trying to reuse data types, and functions agnostic towards extensions
(pretty printers in my view of their semantics), but also reuse functions with parametric /
varying behaviour with respect to extensions.

/Shayan

On Fri, Jul 28, 2017 at 10:18 AM, Simon Peyton Jones <simonpj at microsoft.com<mailto:simonpj at microsoft.com>> wrote:
Devs,

Shayan is working away on “Trees that grow”… do keep it on your radar:

To: ghc-devs
Sent: 25 May 2017 23:49
Do take a look at this:

•         We propose to re-engineer HsSyn itself.  This will touch a lot of code.

•         But it’s very neat, and will bring big long-term advantages

•         And we can do it a bit at a time

The wiki page https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow has the details.  It’s entirely an internal change, not a change to GHC’s specification, so it’s independent of the GHC proposals process.  But I’d value the opinion of other GHC devs

Meanwhile I have a question. When pretty-printing HsSyn we often have a situation like this:

  data Pass = Parsed | Renamed | Typechecked

  data HsExpr (p :: Pass) = HsVar (IdP p) | ....

  type famliy IdP p where

     IdP Parsed      = RdrName

     IdP Renamed     = Name

     IdP Typechecked = Id

  instance (Outputable (IdP p)) => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

The (ppr v) requires (Outputable (IdP p)), hence the context.

Moreover, and more seriously, there are things we just can't pretty-print
right now.  For example, HsExpr has this data constructor:

  data HsExpr p = ...

    | OpApp       (LHsExpr p)

                  (LHsExpr p)

                  (PostRn p Fixity)

                  (LHsExpr p)

To pretty-print the third argument, we'd need to add

  instance (Outputable (IdP p),

            Outputable (PostRn p Fixity))   -- New

        => Outputable (HsExpr p) where

     ppr (HsVar v) = ppr v

and that gets onerous.  So today we never print these annotations, to avoid bloating the instance contexts, which can be painful.  It bit me yesterday.

We have bitten that bullet for the Data class: look at HsExtension.DataId, which abbreviates the long list of dictionaries:

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , Data (PostRn p (IdP p))

    , Data (PostRn p (Located Name))

    , Data (PostRn p Bool)

    , Data (PostRn p Fixity)

     ,..and nine more... )

Let me note in passing that [wiki:QuantifiedContexts<https://ghc.haskell.org/trac/ghc/wiki/QuantifiedContexts>] would make this somewhat shorter

  type DataId p =

    ( Data p

    , ForallX Data p

    , Data (NameOrRdrName (IdP p))

    , Data (IdP p)

    , forall t. Data t => Data (PostRn p t))

But we still need one item in this list for each type function,
and I am worried about how this scales to the
[wiki:ImplementingTreesThatGrow<https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow>] story, when we have a type
function for each data constructor -- and there are a lot of data
constructors.

So I have four questions

  1.  I think we should probably use a superclass instead of a type synonym

class (Data p, ForallX Data p, ....) => DataId p where {}

Why?  Only one argument to pass, and to pass on to successive calls.  I see no downside.

  1.  Shall we treat Outputable like Data?  (I.e. make an abbreviation for a long list of Outputable instances and use it everywhere)
  2.  I thought of another way to do it (pass a token); see below

  1.  Are there any other ways?

Token passing idea.

Perhaps instead of passing lots of functions, we pass a singleton token
that encodes the pass, like this:

  instance (PassC p) => Outputable (HsExpr p) where

     ppr (HsVar v) = case getPass :: IsPass p of

                       IsParsed      -> ppr v

                       IsRenamed     -> ppr v

                       IsTypechecked -> ppr v

The three ppr's are at different types, of course; that's the point.
The infrastructure is something like

  class PassC p where

    getPass :: IsPass p

  data IsPass p where

    IsParsed      :: IsPass Parsed

    IsRenamed     :: IsParsed Renamed

    IsTypechecked :: IsParsed Typechecked

  instance PassC Parsed where getPass = IsParsed

    ...etc...

Now we could sweep away all those OutputableX classes,
replacing them with dynamic tests on the singletons IsParsed etc.
This would have advantages:

- Probably faster: there's a dynamic test, but many fewer dictionary
  arguments and higher-order function dispatch

- Only one dictionary to pass; programming is easier.

The big downside is that it's not extensible: it works only because
we know the three cases.  But the "Trees that Grow" story really doesn't
scale well to pretty-printing: so maybe we should just give up on that?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20170731/b495ce97/attachment-0001.html>