GHC AST Annotations

Neil Mitchell ndmitchell at gmail.com
Sat Aug 30 21:18:03 UTC 2014


Since Alan is trying to do something for HaRe that I want for HLint on
top of haskell-src-exts, he asked me for my opinions on the proposal.
There seem to be two approaches to take:

* Add SrcSpans throughout. The HSE approach of having a list of inner
source spans is nasty - the details of which source span goes where
are entirely undocumented and hard to discover. Even worse, for things
like instance, which may or may not have a where after it, the number
of inner SrcSpans changes. Simon's idea of hsdo_do_loc is much cleaner,
and easily extends to Maybe SrcSpan if the keyword is optional.

* Having the annotation be a type parameter gives much greater
flexibility. In particular, it would let you mark certain nodes as
being added/deleted. However, since SrcSpan has an Int in it, you can
always pass around a separate IntMap and make the SrcSpan really be an
index into more detailed information. It's nasty, but only the people
who use it pay for it.
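
To make the two options concrete, here is a rough sketch. Everything in
it is a toy assumption for illustration: Span stands in for GHC's
SrcSpan, InstDecl is not the real HsSyn constructor, and keyedSpan is a
hypothetical encoding that hides the side-table key in the line field.

```haskell
import qualified Data.IntMap.Strict as IntMap

-- Toy stand-in for GHC's SrcSpan (an assumption, not the real type).
data Span = Span
  { spanLine :: Int
  , spanCol  :: Int
  , spanLen  :: Int
  } deriving (Eq, Show)

-- First bullet: one named field per keyword, Maybe when it is optional.
data InstDecl = InstDecl
  { instKeywordLoc :: Span        -- of the word "instance"
  , instWhereLoc   :: Maybe Span  -- of the word "where", if present
  , instHead       :: String
  } deriving (Eq, Show)

-- Second bullet: richer information lives in a separate side table.
data Change = Added | Deleted | Unchanged
  deriving (Eq, Show)

type SideTable = IntMap.IntMap Change

-- Hypothetical encoding: hide the side-table key in the line field.
keyedSpan :: Int -> Span
keyedSpan key = Span { spanLine = key, spanCol = 0, spanLen = 0 }

-- Spans with no entry in the table are treated as unchanged.
lookupChange :: SideTable -> Span -> Change
lookupChange table sp = IntMap.findWithDefault Unchanged (spanLine sp) table
```

Only a tool that wants the extra information has to build and thread
the SideTable; everyone else sees ordinary spans.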

Both approaches have disadvantages. You could always combine both
ideas, and have a SrcSpan and entirely separately an annotation (which
defaults to (), rather than SrcSpanInfo), but maybe that's too much
extra baggage on the AST.
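
A minimal sketch of what the combined form might look like, again with
toy types that are assumptions rather than anything in GHC or HSE: each
node has a fixed Span and, separately, an annotation parameter that the
parser would fill with ().

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Toy span: start line, start column, end line, end column.
data Span = Span Int Int Int Int
  deriving (Eq, Show)

-- Every node carries a fixed Span plus a tool-chosen annotation 'ann'.
data Expr ann
  = Var ann Span String
  | App ann Span (Expr ann) (Expr ann)
  deriving (Eq, Show, Functor)

-- What a parser would hand back: no annotation payload at all.
type ParsedExpr = Expr ()

-- One annotation a refactoring tool might choose to use instead.
data Change = Kept | Added | Deleted
  deriving (Eq, Show)

-- Re-annotate every node; the spans are left untouched.
markKept :: ParsedExpr -> Expr Change
markKept = fmap (const Kept)

-- Annotation of the outermost node only.
topAnn :: Expr ann -> ann
topAnn (Var a _ _)   = a
topAnn (App a _ _ _) = a
```

The cost is the extra field on every constructor, which is the baggage
in question.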

Thanks, Neil


On Sat, Aug 30, 2014 at 3:32 PM, Alan & Kim Zimmerman
<alan.zimm at gmail.com> wrote:
> A further use case would be to be able to convert all the locations to be
> relative, or include a relative portion, so that as tools manipulate the AST
> by adding or removing parts the layout can be preserved.
>
> I think I may need to make a wip branch for this and experiment, it is
> always easier to comment on concrete things.
>
> Alan
>
>
> On Thu, Aug 28, 2014 at 10:38 PM, Simon Peyton Jones <simonpj at microsoft.com>
> wrote:
>>
>> I think the key question is whether it is acceptable to sprinkle this
>> kind of information throughout the AST. For someone interested in
>> source-to-source conversions (like me) this is great, others may find it
>> intrusive.
>>
>> It’s probably not too bad if you use record syntax; thus
>>
>>   | HsDo  { hsdo_do_loc :: SrcSpan              -- of the word "do"
>>           , hsdo_blocks :: BlockSrcSpans
>>           , hsdo_ctxt   :: HsStmtContext Name
>>           , hsdo_stmts  :: [ExprLStmt id]
>>           , hsdo_type   :: PostTcType }
>>
>>
>>
>> Simon
>>
>>
>>
>> From: Alan & Kim Zimmerman [mailto:alan.zimm at gmail.com]
>> Sent: 28 August 2014 19:35
>> To: Richard Eisenberg
>> Cc: Simon Peyton Jones; ghc-devs at haskell.org
>> Subject: Re: GHC AST Annotations
>>
>>
>>
>> This does have the advantage of being explicit. I modelled the initial
>> proposal on HSE as a proven solution, and I think that they were trying to
>> keep it non-invasive, to allow both an annotated and non-annotated AST.
>>
>> I think the key question is whether it is acceptable to sprinkle this
>> kind of information throughout the AST. For someone interested in
>> source-to-source conversions (like me) this is great, others may find it
>> intrusive.
>>
>> The other question, which is probably orthogonal to this, is whether we
>> want the annotation to be a parameter to the AST, which allows it to be
>> overridden by various tools for various purposes, or fixed as in Richard's
>> suggestion.
>>
>> A parameterised annotation allows the annotations to be manipulated via
>> something like the following, as in HSE:
>>
>> -- | AST nodes are annotated, and this class allows manipulation of the
>> -- annotations.
>> class Functor ast => Annotated ast where
>>
>>   -- | Retrieve the annotation of an AST node.
>>   ann :: ast l -> l
>>
>>   -- | Change the annotation of an AST node. Note that only the annotation
>>   -- of the node itself is affected, and not the annotations of any child
>>   -- nodes. if all nodes in the AST tree are to be affected, use fmap.
>>   amap :: (l -> l) -> ast l -> ast l
>>
>>
>>
>> Alan
>>
>>
>>
>> On Thu, Aug 28, 2014 at 7:11 PM, Richard Eisenberg <eir at cis.upenn.edu>
>> wrote:
>>
>> For what it's worth, my thought is not to use SrcSpanInfo (which, to me,
>> is the wrong way to slice the abstraction) but instead to add SrcSpan fields
>> to the relevant nodes. For example:
>>
>>   | HsDo        SrcSpan              -- of the word "do"
>>                 BlockSrcSpans
>>                 (HsStmtContext Name) -- The parameterisation is unimportant
>>                                      -- because in this context we never use
>>                                      -- the PatGuard or ParStmt variant
>>                 [ExprLStmt id]       -- "do":one or more stmts
>>                 PostTcType           -- Type of the whole expression
>>
>> ...
>>
>> data BlockSrcSpans = LayoutBlock Int  -- the parameter is the indentation level
>>                                  ...  -- stuff to track the appearance of any semicolons
>>                    | BracesBlock ...  -- stuff to track the braces and semicolons
>>
>>
>> The way I understand it, the SrcSpanInfo proposal means that we would have
>> lots of empty SrcSpanInfos, no? Most interior nodes don't need one, I think.
>>
>> Popping up a level, I do support the idea of including this info in the
>> AST.
>>
>> Richard
>>
>>
>> On Aug 28, 2014, at 11:54 AM, Simon Peyton Jones <simonpj at microsoft.com>
>> wrote:
>>
>> > In general I’m fine with this direction of travel. Some specifics:
>> >
>> > * You’d have to be careful to document, for every data constructor
>> >   in HsSyn, what the association is between the [SrcSpan] in the
>> >   SrcSpanInfo and the “sub-entities”.
>> > * Many of the sub-entities will have their own SrcSpanInfo wrapped
>> >   around them, so there’s some unhelpful duplication. Maybe you only
>> >   want the SrcSpanInfo to list the [SrcSpan]s for the sub-entities
>> >   (like the syntactic keywords) that do not show up as children in
>> >   the syntax tree?
>> >
>> > Anyway do by all means create a GHC Trac wiki page to describe your
>> > proposed design, concretely.
>> >
>> > Simon
>> >
>> > From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Alan &
>> > Kim Zimmerman
>> > Sent: 28 August 2014 15:00
>> > To: ghc-devs at haskell.org
>> > Subject: GHC AST Annotations
>> >
>> > Now that the landmines have hopefully been cleared from the AST via [1]
>> > I would like to propose changing the location information in the AST.
>> >
>> > Right now the locations of syntactic markers such as do/let/where/in/of
>> > in the source are discarded from the AST, although they are retained in the
>> > rich token stream.
>> >
>> > The haskell-src-exts package deals with this by means of the
>> > SrcSpanInfo data type [2], which contains the SrcSpan as per the current GHC
>> > Located type but also has a list of SrcSpans for the syntactic markers,
>> > depending on the particular AST fragment being annotated.
>> >
>> > In addition, the annotation type is provided as a parameter to the AST,
>> > so that it can be changed as required, see [3].
>> >
>> > The motivation for this change is then
>> >
>> > 1. Simplify the roundtripping and modification of source by explicitly
>> > capturing the missing location information for the syntactic markers.
>> >
>> > 2. Allow the annotation to be a parameter so that it can be replaced
>> > with a different one in tools, for example HaRe would include the tokens for
>> > the AST fragment leaves.
>> >
>> > 3. Aim for some level of compatibility with haskell-src-exts so that tools
>> > developed for it could be easily ported to GHC, for example exactprint [4].
>> >
>> >
>> >
>> > I would like feedback as to whether this would be acceptable, or if the
>> > same goals should be achieved a different way.
>> >
>> >
>> >
>> > Regards
>> >
>> >   Alan
>> >
>> >
>> >
>> >
>> > [1] https://phabricator.haskell.org/D157
>> >
>> > [2]
>> > http://hackage.haskell.org/package/haskell-src-exts-1.15.0.1/docs/Language-Haskell-Exts-SrcLoc.html#t:SrcSpanInfo
>> >
>> > [3]
>> > http://hackage.haskell.org/package/haskell-src-exts-1.15.0.1/docs/Language-Haskell-Exts-Annotated-Syntax.html#t:Annotated
>> >
>> > [4]
>> > http://hackage.haskell.org/package/haskell-src-exts-1.15.0.1/docs/Language-Haskell-Exts-Annotated-ExactPrint.html#v:exactPrint
>> >
>>
>> > _______________________________________________
>> > ghc-devs mailing list
>> > ghc-devs at haskell.org
>> > http://www.haskell.org/mailman/listinfo/ghc-devs
>>
>>
>
>
>

