GHC AST Annotations

Simon Peyton Jones simonpj at
Thu Aug 28 20:38:45 UTC 2014

> I think the key question is whether it is acceptable to sprinkle this kind of information throughout the AST. For someone interested in source-to-source conversions (like me) this is great; others may find it intrusive.
It’s probably not too bad if you use record syntax; thus
  | HsDo  { hsdo_do_loc :: SrcSpan              -- of the word "do"
          , hsdo_blocks :: BlockSrcSpans
          , hsdo_ctxt   :: HsStmtContext Name
          , hsdo_stmts  :: [ExprLStmt id]
          , hsdo_type   :: PostTcType }
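
A minimal sketch of why record syntax keeps the extra fields unobtrusive, assuming simplified stand-in types: SrcSpan, Stmt, Expr and the field names doLoc and doStmts below are hypothetical and are not GHC's real definitions. A consumer that ignores locations matches only on the field it needs, so it should not have to change when further location fields are added.

```haskell
-- Simplified stand-ins for GHC's types (hypothetical, for illustration only).
data SrcSpan = SrcSpan { spanStart :: (Int, Int), spanEnd :: (Int, Int) }
  deriving (Show, Eq)

newtype Stmt = Stmt String
  deriving (Show, Eq)

data Expr
  = Var String
  | Do { doLoc   :: SrcSpan   -- location of the word "do"
       , doStmts :: [Stmt] }
  deriving (Show, Eq)

-- A consumer that does not care about locations: the record pattern
-- names only the field it uses and ignores the rest.
stmtCount :: Expr -> Int
stmtCount Do { doStmts = ss } = length ss
stmtCount _                   = 0

-- A source-to-source tool can still get at the keyword location.
doKeywordSpan :: Expr -> Maybe SrcSpan
doKeywordSpan Do { doLoc = l } = Just l
doKeywordSpan _                = Nothing
```

With a positional constructor, by contrast, every pattern match on the node has to be updated whenever a field is added.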


From: Alan & Kim Zimmerman [mailto:alan.zimm at]
Sent: 28 August 2014 19:35
To: Richard Eisenberg
Cc: Simon Peyton Jones; ghc-devs at
Subject: Re: GHC AST Annotations

This does have the advantage of being explicit. I modelled the initial proposal on HSE as a proven solution, and I think that they were trying to keep it non-invasive, to allow both an annotated and non-annotated AST.
I think the key question is whether it is acceptable to sprinkle this kind of information throughout the AST. For someone interested in source-to-source conversions (like me) this is great; others may find it intrusive.
The other question, which is probably orthogonal to this, is whether we want the annotation to be a parameter to the AST, which allows it to be overridden by various tools for various purposes, or fixed as in Richard's suggestion.
A parameterised annotation allows the annotations to be manipulated via something like this class from HSE:

-- | AST nodes are annotated, and this class allows manipulation of the annotations.
class Functor ast => Annotated ast where

  -- | Retrieve the annotation of an AST node.
  ann :: ast l -> l

  -- | Change the annotation of an AST node. Note that only the annotation of the node itself is affected, and not
  --   the annotations of any child nodes. If all nodes in the AST tree are to be affected, use fmap.
  amap :: (l -> l) -> ast l -> ast l
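
To illustrate how such a class might be used, here is a small sketch with a toy AST. The Expr type and the stripAnn helper are hypothetical and exist only for this example; they are not part of HSE or GHC. The class itself is repeated so the block stands alone.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- The class as quoted above from haskell-src-exts.
class Functor ast => Annotated ast where
  ann  :: ast l -> l
  amap :: (l -> l) -> ast l -> ast l

-- A toy annotated AST (hypothetical; not a real GHC or HSE syntax tree).
data Expr l
  = Lit l Int
  | Add l (Expr l) (Expr l)
  deriving (Show, Eq, Functor)

instance Annotated Expr where
  ann (Lit l _)   = l
  ann (Add l _ _) = l

  -- Only the annotation on the root node changes; children are untouched.
  amap f (Lit l n)   = Lit (f l) n
  amap f (Add l a b) = Add (f l) a b

-- Replace every annotation in the tree, giving an effectively
-- unannotated AST from an annotated one.
stripAnn :: Functor ast => ast l -> ast ()
stripAnn = fmap (const ())
```

Because the annotation is a type parameter, a tool could instantiate it with its own type (a token list, say) without touching the AST definition, which is the flexibility a fixed SrcSpan field would give up.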


On Thu, Aug 28, 2014 at 7:11 PM, Richard Eisenberg <eir at> wrote:
For what it's worth, my thought is not to use SrcSpanInfo (which, to me, is the wrong way to slice the abstraction) but instead to add SrcSpan fields to the relevant nodes. For example:

  | HsDo        SrcSpan              -- of the word "do"
                (HsStmtContext Name) -- The parameterisation is unimportant
                                     -- because in this context we never use
                                     -- the PatGuard or ParStmt variant
                [ExprLStmt id]       -- "do":one or more stmts
                PostTcType           -- Type of the whole expression


data BlockSrcSpans = LayoutBlock Int  -- the parameter is the indentation level
                                 ...  -- stuff to track the appearance of any semicolons
                   | BracesBlock ...  -- stuff to track the braces and semicolons

The way I understand it, the SrcSpanInfo proposal means that we would have lots of empty SrcSpanInfos, no? Most interior nodes don't need one, I think.
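
The trade-off can be sketched with toy types. SrcSpanInfo below mirrors the shape of the haskell-src-exts type (a span plus a list of marker spans); everything else, including SrcSpan, ExprA, ExprB and emptyPointNodes, is a hypothetical simplification and not GHC's actual AST.

```haskell
-- Simplified stand-in for a source span (hypothetical): start and end offsets.
data SrcSpan = SrcSpan Int Int
  deriving (Show, Eq)

-- Same shape as haskell-src-exts' SrcSpanInfo: the node's own span
-- plus the spans of its syntactic markers.
data SrcSpanInfo = SrcSpanInfo
  { srcInfoSpan   :: SrcSpan
  , srcInfoPoints :: [SrcSpan]
  } deriving (Show, Eq)

-- Style A: every node carries a SrcSpanInfo.
data ExprA
  = VarA SrcSpanInfo String    -- no keywords, so srcInfoPoints is []
  | DoA  SrcSpanInfo [ExprA]   -- srcInfoPoints holds the span of "do"
  deriving (Show, Eq)

-- Style B: only nodes that have keywords carry an extra span.
data ExprB
  = VarB SrcSpan String
  | DoB  SrcSpan SrcSpan [ExprB]  -- whole expression, then the word "do"
  deriving (Show, Eq)

-- Count the nodes in a style-A tree whose marker list is empty.
emptyPointNodes :: ExprA -> Int
emptyPointNodes (VarA i _)  = fromEnum (null (srcInfoPoints i))
emptyPointNodes (DoA i es)  =
  fromEnum (null (srcInfoPoints i)) + sum (map emptyPointNodes es)
```

In style A the meaning of each list position has to be documented per constructor, while in style B the field's position and comment carry that information.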

Popping up a level, I do support the idea of including this info in the AST.


On Aug 28, 2014, at 11:54 AM, Simon Peyton Jones <simonpj at> wrote:

> In general I’m fine with this direction of travel. Some specifics:
> · You’d have to be careful to document, for every data constructor in HsSyn, what the association is between the [SrcSpan] in the SrcSpanInfo and the “sub-entities”.
> · Many of the sub-entities will have their own SrcSpanInfo wrapped around them, so there’s some unhelpful duplication. Maybe you only want the SrcSpanInfo to list the [SrcSpan]s for the sub-entities (like the syntactic keywords) that do not show up as children in the syntax tree?
> Anyway do by all means create a GHC Trac wiki page to describe your proposed design, concretely.
> Simon
> From: ghc-devs [mailto:ghc-devs-bounces at] On Behalf Of Alan & Kim Zimmerman
> Sent: 28 August 2014 15:00
> To: ghc-devs at
> Subject: GHC AST Annotations
> Now that the landmines have hopefully been cleared from the AST via [1], I would like to propose changing the location information in the AST.
> Right now the locations of syntactic markers such as do/let/where/in/of in the source are discarded from the AST, although they are retained in the rich token stream.
> The haskell-src-exts package deals with this by using the SrcSpanInfo data type [2], which contains the SrcSpan as per the current GHC Located type but also has a list of SrcSpans for the syntactic markers, depending on the particular AST fragment being annotated.
> In addition, the annotation type is provided as a parameter to the AST, so that it can be changed as required, see [3].
> The motivation for this change is then:
> 1. Simplify the roundtripping and modification of source by explicitly capturing the missing location information for the syntactic markers.
> 2. Allow the annotation to be a parameter so that it can be replaced with a different one in tools; for example, HaRe would include the tokens for the AST fragment leaves.
> 3. Aim for some level of compatibility with haskell-src-exts so that tools developed for it could be easily ported to GHC, for example exactprint [4].
> I would like feedback as to whether this would be acceptable, or if the same goals should be achieved in a different way.
> Regards
>   Alan
> [1]
> [2]
> [3]
> [4]