Operating on HsSyn

Alan & Kim Zimmerman alan.zimm at gmail.com
Fri Jul 28 17:49:04 UTC 2017


Well, in GHC as it stands now (from 8.2.1), the original formulation is
correct.

The current pretty printer does not reproduce layout, but reparsing a
pretty printed ParsedSource will faithfully reproduce the original
ParsedSource (except for the Locations).

This is useful for constructing hsSyn fragments programmatically and
generating usable source from them.

I am hoping (intending) that a future version will allow the other
formulation too.

Alan

On 28 July 2017 at 19:38, Shayan Najd <sh.najd at gmail.com> wrote:

> by
>
>  (parser . prettyPrint . parser) = id
>
>
> I meant
>
>  (prettyPrint . parser . prettyPrint) = id
>
> for a valid input.
>
>
> On Fri, Jul 28, 2017 at 4:32 PM, Shayan Najd <sh.najd at gmail.com> wrote:
>
>> On the contrary, inside GHC I /do/ want to print them. Otherwise how can
>>> I see what the renamer has done?
>>
>>
>> Right. So if I understand correctly, with this semantics, `Outputable` is
>> somewhere between pretty printing as often used in program manipulation
>> libraries (like Haskell-Src-Exts (HSE)) which is closer to syntax, and
>> `Show` which is closer to Haskell representation.
>> (There are also "exact printers" (as in HSE) that are even closer to
>> syntax in some sense.)
>> Often, pretty printers generate only grammatically valid terms, not the
>> ones polluted with extra annotations (hence grammatically invalid), e.g.,
>> what is the grammatically valid form of `OpApp` printed via `Outputable`
>> that includes the fixity annotation.
>>
>> If I recall correctly, we have briefly studied these in the past summer,
>> we came up with some roundtrip correctness criteria, like the following
>> (bar error handling; assume valid input):
>>
>>  (parser . prettyPrint . parser) = id
>>
>> [paging in Jacques]
>>
>> The reason I am trying to flesh out the semantics is the /big/ gains on
>> code reuse later on in the process: one does not need to define a separate
>> pretty printing library for Haskell syntax, and can reuse the well-tested
>> and well-maintained one in GHC.
>>
>> Reformulating part of your concern, based on my understanding (if I may),
>> the questions is: what is the proper design of an "outputer"
>> (debug-printer?) where /annotated/ terms can be pretty-printed including
>> any printable (pretty?showable?) annotations.
>> In particular, we may want to take advantage of extensibility of data
>> types for this.
>> Am I far off?
>>
>> Note: with proper design, an extensible debug-printer can still subsume
>> corresponding pretty-printers.
>>
>>
>> On Fri, Jul 28, 2017 at 2:36 PM, Simon Peyton Jones <
>> simonpj at microsoft.com> wrote:
>>
>>> I have been under the impression that we don't even want to print those.
>>>
>>> On the contrary, inside GHC I /do/ want to print them. Otherwise how can
>>> I see what the renamer has done?
>>>
>>>
>>>
>>> Simon
>>>
>>>
>>>
>>> *From:* Shayan Najd [mailto:sh.najd at gmail.com]
>>> *Sent:* 28 July 2017 12:20
>>> *To:* Simon Peyton Jones <simonpj at microsoft.com>
>>> *Cc:* ghc-devs at haskell.org; Alan & Kim Zimmerman <alan.zimm at gmail.com>
>>> *Subject:* Re: Operating on HsSyn
>>>
>>>
>>>
>>> Before all this, we may need to discuss a bit about the intended
>>> semantics of
>>>
>>> `Outputable`: does it need to print `PostRn`, or `PostTc` fields; or
>>> `Out`
>>>
>>> suffixed constructors?  If not, then we only need to write a set of
>>> instances
>>>
>>> for the base growable AST, once and for all.  Such instances will be
>>> polymorphic
>>>
>>> on the extension descriptor `p`, and do not need to mention the
>>> constraints like
>>>
>>> `(PostRn p (IdP p)`, since these are just extensions and not part of the
>>> base
>>>
>>> growable AST.  Or, am I missing something about the intended semantics of
>>>
>>> `Outputable`?
>>>
>>>
>>>
>>> You write
>>>
>>>
>>>
>>> So today we never print these annotations, to avoid bloating the
>>> instance contexts, which can be painful.
>>>
>>>
>>>
>>> I have been under the impression that we don't even want to print those.
>>>
>>>
>>>
>>> Of course, there are scenarios (like `Show` instances) where we do want
>>> to write
>>>
>>> compositional / generic functions that take into account the extensions.
>>>
>>> Here is my abstract overview of the scenario, that may help the
>>> discussion.
>>>
>>> Consider data types `A`, `B`, and `C` (say, one AST datatype per
>>> compiler phase) that
>>>
>>>  are defined as extensions to a base datatype `T`:
>>>
>>>
>>>
>>> > A = T XA
>>>
>>> > B = T XB
>>>
>>> > C = T XC
>>>
>>>
>>>
>>> where `X*`s are extension descriptors.
>>>
>>> Now, say we want to a define functions `f_A`, `f_B`, and `f_C` over `A`,
>>> `B`, and `C`.
>>>
>>> We have two main alternatives:
>>>
>>> (a) either we write these  (manually or using the deriving mechanism)
>>> separately
>>>
>>> (b) or we write a generic / parametric function `g` over `T`, and reuse
>>> that to define `f_*`s
>>>
>>>
>>>
>>> Of course, (b) is preferable in theory , but not always possible or
>>> preferable in practice.
>>>
>>> In which case, we can always resort to (a).
>>>
>>> The more varying are the definitions of `f_A`, `f_B`, and `f_C` the more
>>> parametric should
>>>
>>> `g` get, as this is the case for any generic function.
>>>
>>>
>>>
>>> With a correct design, I believe, these are all independent of Trees
>>> that Grow story itself:
>>>
>>> we are now not only trying to reuse data types, and functions agnostic
>>> towards extensions
>>>
>>> (pretty printers in my view of their semantics), but also reuse
>>> functions with parametric /
>>>
>>> varying behaviour with respect to extensions.
>>>
>>>
>>>
>>> /Shayan
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jul 28, 2017 at 10:18 AM, Simon Peyton Jones <
>>> simonpj at microsoft.com> wrote:
>>>
>>> Devs,
>>>
>>>
>>>
>>> Shayan is working away on “Trees that grow”… do keep it on your radar:
>>>
>>>
>>>
>>> *To:* ghc-devs
>>> *Sent:* 25 May 2017 23:49
>>>
>>> Do take a look at this:
>>>
>>>
>>>
>>> ·         We propose to re-engineer HsSyn itself.  This will touch a
>>> *lot* of code.
>>>
>>> ·         But it’s very neat, and will bring big long-term advantages
>>>
>>> ·         And we can do it a bit at a time
>>>
>>>
>>>
>>> The wiki page https://ghc.haskell.org/trac/g
>>> hc/wiki/ImplementingTreesThatGrow has the details.  It’s entirely an
>>> internal change, not a change to GHC’s specification, so it’s independent
>>> of the GHC proposals process.  But I’d value the opinion of other GHC devs
>>>
>>>
>>>
>>> Meanwhile I have a question. When pretty-printing HsSyn we often have a
>>> situation like this:
>>>
>>>
>>>
>>>   data Pass = Parsed | Renamed | Typechecked
>>>
>>>
>>>
>>>   data HsExpr (p :: Pass) = HsVar (IdP p) | ....
>>>
>>>
>>>
>>>   type famliy IdP p where
>>>
>>>      IdP Parsed      = RdrName
>>>
>>>      IdP Renamed     = Name
>>>
>>>      IdP Typechecked = Id
>>>
>>>
>>>
>>>   instance (Outputable (IdP p)) => Outputable (HsExpr p) where
>>>
>>>      ppr (HsVar v) = ppr v
>>>
>>>
>>>
>>> The (ppr v) requires (Outputable (IdP p)), hence the context.
>>>
>>>
>>>
>>> Moreover, and more seriously, there are things we just can't pretty-print
>>>
>>> right now.  For example, HsExpr has this data constructor:
>>>
>>>
>>>
>>>   data HsExpr p = ...
>>>
>>>     | OpApp       (LHsExpr p)
>>>
>>>                   (LHsExpr p)
>>>
>>>                   (PostRn p Fixity)
>>>
>>>                   (LHsExpr p)
>>>
>>>
>>>
>>> To pretty-print the third argument, we'd need to add
>>>
>>>
>>>
>>>   instance (Outputable (IdP p),
>>>
>>>             Outputable (PostRn p Fixity))   -- New
>>>
>>>         => Outputable (HsExpr p) where
>>>
>>>      ppr (HsVar v) = ppr v
>>>
>>>
>>>
>>> and that gets onerous.  *So today we never print these annotations*, to
>>> avoid bloating the instance contexts, which can be painful.  It bit me
>>> yesterday.
>>>
>>>
>>>
>>> We have bitten that bullet for the Data class: look at
>>> HsExtension.DataId, which abbreviates the long list of dictionaries:
>>>
>>>
>>>
>>>   type DataId p =
>>>
>>>     ( Data p
>>>
>>>     , ForallX Data p
>>>
>>>     , Data (NameOrRdrName (IdP p))
>>>
>>>     , Data (IdP p)
>>>
>>>     , Data (PostRn p (IdP p))
>>>
>>>     , Data (PostRn p (Located Name))
>>>
>>>     , Data (PostRn p Bool)
>>>
>>>     , Data (PostRn p Fixity)
>>>
>>>      ,..and nine more... )
>>>
>>>
>>>
>>> Let me note in passing that [wiki:QuantifiedContexts
>>> <https://ghc.haskell.org/trac/ghc/wiki/QuantifiedContexts>] would make
>>> this somewhat shorter
>>>
>>>
>>>
>>>   type DataId p =
>>>
>>>     ( Data p
>>>
>>>     , ForallX Data p
>>>
>>>     , Data (NameOrRdrName (IdP p))
>>>
>>>     , Data (IdP p)
>>>
>>>     , forall t. Data t => Data (PostRn p t))
>>>
>>>
>>>
>>> But we still need one item in this list for each type function,
>>>
>>> and I am worried about how this scales to the
>>>
>>> [wiki:ImplementingTreesThatGrow
>>> <https://ghc.haskell.org/trac/ghc/wiki/ImplementingTreesThatGrow>]
>>> story, when we have a type
>>>
>>> function for each data constructor -- and there are a *lot* of data
>>>
>>> constructors.
>>>
>>>
>>>
>>> *So I have four questions*
>>>
>>>
>>>
>>>    1. I think we should probably use a superclass instead of a type
>>>    synonym
>>>
>>>
>>>
>>> class (Data p, ForallX Data p, ....) => DataId p where {}
>>>
>>>
>>>
>>> Why?  Only one argument to pass, and to pass on to successive calls.  I
>>> see no downside.
>>>
>>>
>>>
>>>    1. Shall we treat Outputable like Data?  (I.e. make an abbreviation
>>>    for a long list of Outputable instances and use it everywhere)
>>>    2. I thought of another way to do it (pass a token); see below
>>>
>>>
>>>
>>>    1. Are there any other ways?
>>>
>>>
>>>
>>>
>>>
>>> *Token passing idea.*
>>>
>>>
>>>
>>> Perhaps instead of passing lots of functions, we pass a singleton token
>>>
>>> that encodes the pass, like this:
>>>
>>>
>>>
>>>   instance (PassC p) => Outputable (HsExpr p) where
>>>
>>>      ppr (HsVar v) = case getPass :: IsPass p of
>>>
>>>                        IsParsed      -> ppr v
>>>
>>>                        IsRenamed     -> ppr v
>>>
>>>                        IsTypechecked -> ppr v
>>>
>>>
>>>
>>> The three ppr's are at different types, of course; that's the point.
>>>
>>> The infrastructure is something like
>>>
>>>
>>>
>>>   class PassC p where
>>>
>>>     getPass :: IsPass p
>>>
>>>
>>>
>>>   data IsPass p where
>>>
>>>     IsParsed      :: IsPass Parsed
>>>
>>>     IsRenamed     :: IsParsed Renamed
>>>
>>>     IsTypechecked :: IsParsed Typechecked
>>>
>>>
>>>
>>>   instance PassC Parsed where getPass = IsParsed
>>>
>>>     ...etc...
>>>
>>>
>>>
>>> Now we could sweep away all those OutputableX classes,
>>>
>>> replacing them with dynamic tests on the singletons IsParsed etc.
>>>
>>> This would have advantages:
>>>
>>>
>>>
>>> - Probably faster: there's a dynamic test, but many fewer dictionary
>>>
>>>   arguments and higher-order function dispatch
>>>
>>>
>>>
>>> - Only one dictionary to pass; programming is easier.
>>>
>>>
>>>
>>> The big downside is that it's not extensible: it works only because
>>>
>>> we know the three cases.  But the "Trees that Grow" story really doesn't
>>>
>>> scale well to pretty-printing: so maybe we should just give up on that?
>>>
>>>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20170728/fbbcee5c/attachment-0001.html>


More information about the ghc-devs mailing list