GHC AST Annotations

p.k.f.holzenspies at utwente.nl
Fri Sep 26 08:07:48 UTC 2014


Dear Alan,


Nice going and thanks for undertaking yet another useful AST transformation!


A few thoughts (do with them as you see fit):


- Always called "ann"; doesn't this require OverloadedRecordFields? You're in danger of delaying your modification (scheduled to land in 7.10). Other than that, as before, from a design perspective: yes please.


- In terms of presentation/comments; when I first started looking at (i.e. traversing, selectively printing etc.) the AST, I was always really annoyed that every child in the tree has one extra step of indirection, due to the location annotations being "L loc thing", as opposed to a loc-field as part of the thing. I would simply call it annotation (no talk of external tool writers). In time, I hope GHC-annotations also move to that field.
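
To make the indirection point concrete, here is a minimal sketch contrasting the two shapes on a toy expression type. `Span`, `ExprW`, `ExprF`, `varsW` and `varsF` are hypothetical names invented for illustration; only the `L loc thing` shape and the `unLoc` helper mirror what GHC actually provides, and the toy `Located` here is a simplification of GHC's own.

```haskell
-- Toy span; GHC's real SrcSpan carries file names and line/column ranges.
data Span = Span Int Int deriving (Eq, Show)

-- Wrapper style, mirroring the shape of GHC's "L loc thing".
data Located a = L Span a deriving (Eq, Show)

unLoc :: Located a -> a
unLoc (L _ x) = x

data ExprW
  = VarW String
  | AppW (Located ExprW) (Located ExprW)
  deriving (Eq, Show)

-- Field style: the location is part of the node itself.
data ExprF
  = VarF Span String
  | AppF Span ExprF ExprF
  deriving (Eq, Show)

-- Collect variable names; every recursive call must first unwrap an L.
varsW :: Located ExprW -> [String]
varsW e = case unLoc e of
  VarW n   -> [n]
  AppW f x -> varsW f ++ varsW x

-- Same traversal; children are reached directly.
varsF :: ExprF -> [String]
varsF (VarF _ n)   = [n]
varsF (AppF _ f x) = varsF f ++ varsF x
```

Both traversals compute the same result; the difference is only that the wrapper style needs the extra `unLoc` step at every child.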


Regards,

Philip





________________________________
From: Alan & Kim Zimmerman <alan.zimm at gmail.com>
Sent: 23 September 2014 20:57
To: Richard Eisenberg
Cc: ghc-devs at haskell.org
Subject: Re: GHC AST Annotations

I have created https://ghc.haskell.org/trac/ghc/ticket/9628 for this, and have decided to first tackle adding a type parameter to the entire AST, so that tool writers can add custom information as required.

My first stab at this is as follows:

```
data HsModule r name
  = HsModule {
      ann :: r, -- ^ Annotation for external tool writers
      hsmodName :: Maybe (Located ModuleName),
        -- ^ @Nothing@: \"module X where\" is omitted (in which case the next
        --     field is Nothing too)
      hsmodExports :: Maybe [LIE name],
     ....
```

Salient points

1. It comes as the first type parameter, and is called r
2. It gets added as the first field of the syntax element
3. It is always called ann
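
The three points above can be sketched on a toy syntax tree. This is only an illustration under stated assumptions: `Module`, `Decl`, `modAnn` and `stripAnn` are hypothetical names, not GHC's real `HsModule`. One thing the sketch brings out is that standard Haskell lets several constructors of one data type share the field name `ann`, but two different record types declared in the same module cannot both declare it without a language extension, so the toy `Module` uses `modAnn` instead.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Toy module type: annotation parameter r comes first, annotation field first.
-- The field is named modAnn because Decl (below, same module) already uses ann.
data Module r = Module
  { modAnn   :: r
  , modName  :: Maybe String
  , modDecls :: [Decl r]
  } deriving (Eq, Show, Functor)

-- Within a single data type, constructors may share the field name ann
-- as long as it has the same type in each of them.
data Decl r
  = Sig  { ann :: r, declName :: String }
  | Bind { ann :: r, declName :: String, body :: String }
  deriving (Eq, Show, Functor)

-- A tool that does not care about annotations can erase them uniformly.
stripAnn :: Functor f => f r -> f ()
stripAnn = fmap (const ())
```

Because the annotation is a type parameter, a derived `Functor` instance is enough for a tool to swap in its own annotation type across the whole tree.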

Before undertaking this particular change, I would appreciate some feedback.

Regards
  Alan

On Thu, Aug 28, 2014 at 8:34 PM, Alan & Kim Zimmerman <alan.zimm at gmail.com> wrote:
This does have the advantage of being explicit. I modelled the initial proposal on HSE as a proven solution, and I think that they were trying to keep it non-invasive, to allow both an annotated and non-annotated AST.

I think the key question is whether it is acceptable to sprinkle this kind of information throughout the AST. For someone interested in source-to-source conversions (like me) this is great, others may find it intrusive.

The other question, which is probably orthogonal to this, is whether we want the annotation to be a parameter to the AST, which allows it to be overridden by various tools for various purposes, or fixed as in Richard's suggestion.

A parameterised annotation allows the annotations to be manipulated via something like the following class from HSE:

-- | AST nodes are annotated, and this class allows manipulation of the annotations.
class Functor ast => Annotated ast where

  -- | Retrieve the annotation of an AST node.
  ann :: ast l -> l

  -- | Change the annotation of an AST node. Note that only the annotation of the node itself is affected, and not
  --   the annotations of any child nodes. If all nodes in the AST tree are to be affected, use fmap.
  amap :: (l -> l) -> ast l -> ast l
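
As a rough sketch of how such a class behaves, here is an instance for a toy expression type. `Expr`, `Var` and `App` are hypothetical names for illustration; the class itself is repeated from above so the block stands alone.

```haskell
{-# LANGUAGE DeriveFunctor #-}

class Functor ast => Annotated ast where
  ann  :: ast l -> l
  amap :: (l -> l) -> ast l -> ast l

-- Toy AST with the annotation as the first field of every constructor.
data Expr l
  = Var l String
  | App l (Expr l) (Expr l)
  deriving (Eq, Show, Functor)

instance Annotated Expr where
  ann (Var l _)   = l
  ann (App l _ _) = l

  -- Only the annotation of the top node changes; children are left alone.
  amap f (Var l n)   = Var (f l) n
  amap f (App l a b) = App (f l) a b
```

For a tree such as `App 0 (Var 1 "f") (Var 2 "x")`, `amap (+1)` touches only the outer `0`, whereas `fmap (+1)` rewrites all three annotations.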

Alan


On Thu, Aug 28, 2014 at 7:11 PM, Richard Eisenberg <eir at cis.upenn.edu> wrote:
For what it's worth, my thought is not to use SrcSpanInfo (which, to me, is the wrong way to slice the abstraction) but instead to add SrcSpan fields to the relevant nodes. For example:

  | HsDo        SrcSpan              -- of the word "do"
                BlockSrcSpans
                (HsStmtContext Name) -- The parameterisation is unimportant
                                     -- because in this context we never use
                                     -- the PatGuard or ParStmt variant
                [ExprLStmt id]       -- "do":one or more stmts
                PostTcType           -- Type of the whole expression

...

data BlockSrcSpans = LayoutBlock Int  -- the parameter is the indentation level
                                 ...  -- stuff to track the appearance of any semicolons
                   | BracesBlock ...  -- stuff to track the braces and semicolons


The way I understand it, the SrcSpanInfo proposal means that we would have lots of empty SrcSpanInfos, no? Most interior nodes don't need one, I think.
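
A minimal sketch of this per-node approach on a toy AST follows. Everything here is an assumption made for illustration: `Span`, `BlockSpans`, `Expr` and `markerSpans` are hypothetical, and the fields given to `LayoutBlock` and `BracesBlock` are one guess at what the elided parts above might hold, not a proposal for their actual contents.

```haskell
-- Toy span: line, start column, end column. GHC's real SrcSpan is richer.
data Span = Span Int Int Int deriving (Eq, Show)

-- Assumed contents, for illustration only.
data BlockSpans
  = LayoutBlock Int [Span]        -- indentation level, spans of any semicolons
  | BracesBlock Span Span [Span]  -- open brace, close brace, semicolons
  deriving (Eq, Show)

data Expr
  = Var String                    -- no syntactic markers, so no span fields
  | Do Span BlockSpans [Expr]     -- span of the word "do", then block layout
  deriving (Eq, Show)

-- Gather the spans of all syntactic markers in source order of nesting.
markerSpans :: Expr -> [Span]
markerSpans (Var _)           = []
markerSpans (Do kw blk stmts) = kw : blockSpans blk ++ concatMap markerSpans stmts

blockSpans :: BlockSpans -> [Span]
blockSpans (LayoutBlock _ semis)   = semis
blockSpans (BracesBlock o c semis) = o : c : semis
```

Nodes such as `Var` carry no marker information at all, which is the contrast with a uniform annotation that would be present but empty on most interior nodes.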

Popping up a level, I do support the idea of including this info in the AST.

Richard

On Aug 28, 2014, at 11:54 AM, Simon Peyton Jones <simonpj at microsoft.com> wrote:

> In general I’m fine with this direction of travel. Some specifics:
>
> - You’d have to be careful to document, for every data constructor in HsSyn, what the association is between the [SrcSpan] in the SrcSpanInfo and the “sub-entities”.
> - Many of the sub-entities will have their own SrcSpanInfo wrapped around them, so there’s some unhelpful duplication. Maybe you only want the SrcSpanInfo to list the [SrcSpan]s for the sub-entities (like the syntactic keywords) that do not show up as children in the syntax tree?
> Anyway do by all means create a GHC Trac wiki page to describe your proposed design, concretely.
>
> Simon
>
> From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Alan & Kim Zimmerman
> Sent: 28 August 2014 15:00
> To: ghc-devs at haskell.org
> Subject: GHC AST Annotations
>
> Now that the landmines have hopefully been cleared from the AST via [1] I would like to propose changing the location information in the AST.
>
> Right now the locations of syntactic markers such as do/let/where/in/of in the source are discarded from the AST, although they are retained in the rich token stream.
>
> The haskell-src-exts package deals with this by using the SrcSpanInfo data type [2], which contains the SrcSpan as per the current GHC Located type but also has a list of SrcSpans for the syntactic markers, depending on the particular AST fragment being annotated.
>
> In addition, the annotation type is provided as a parameter to the AST, so that it can be changed as required, see [3].
>
> The motivation for this change is then
>
> 1. Simplify the roundtripping and modification of source by explicitly capturing the missing location information for the syntactic markers.
>
> 2. Allow the annotation to be a parameter so that it can be replaced with a different one in tools, for example HaRe would include the tokens for the AST fragment leaves.
>
> 3. Aim for some level of compatibility with haskell-src-exts so that tools developed for it could be easily ported to GHC, for example exactprint [4].
>
>
>
> I would like feedback as to whether this would be acceptable, or if the same goals should be achieved a different way.
>
>
>
> Regards
>
>   Alan
>
>
>
>
> [1] https://phabricator.haskell.org/D157
>
> [2] http://hackage.haskell.org/package/haskell-src-exts-1.15.0.1/docs/Language-Haskell-Exts-SrcLoc.html#t:SrcSpanInfo
>
> [3] http://hackage.haskell.org/package/haskell-src-exts-1.15.0.1/docs/Language-Haskell-Exts-Annotated-Syntax.html#t:Annotated
>
> [4] http://hackage.haskell.org/package/haskell-src-exts-1.15.0.1/docs/Language-Haskell-Exts-Annotated-ExactPrint.html#v:exactPrint
>


