TTG: Handling Source Locations

Tue Feb 12 14:19:20 UTC 2019

> On Feb 12, 2019, at 5:19 AM, Shayan Najd <sh.najd at gmail.com> wrote:
> 
> About the new code, the convention is straightforward: anytime you
> destruct an AST node, assume a wrapper node inside (add an extra
> case), or use the smart constructors/pattern synonyms.

Aha! This, I did not know. So, you're saying that all the consumers of the GHC AST need to remember to use dL every time they pattern-match. With the new design, using dL when it's unnecessary doesn't hurt, but forgetting it is problematic. So: just use it every time. My problem, though, is that this is just a convention -- no one checks it. It would be easy to forget.

> On Feb 12, 2019, at 6:00 AM, Simon Peyton Jones via ghc-devs <ghc-devs at haskell.org> wrote:
> 
> One way to think of it is this: we can now put SrcSpans where they make sense, rather than everywhere.

This has some logic to it, but I'm not quite sold. Another way of saying this is that the new design favors flexibility for the producer, at the cost of requiring consumers to be aware of and consistently apply the convention Shayan describes above. The problem is, though, that if the producer is stingy in adding source locations, the consumer won't know which locations are expected to be informative. Is the consumer expected to collect locations from a variety of places and try to combine them somehow? I doubt it. So this means that the flexibility for the producer isn't really there -- the type system will accept arbitrary choices of where to put locations, but consumers won't get the locations where they expect them.

>   We can still say (Located t) in places where we want to guarantee a SrcSpan.

This seems to go against the TTG philosophy. We can do this in, say, the return type of a function, but we can't in the AST proper, because that's shared among a number of clients, some of whom don't want the source locations.

> 
> Yes, this lets us add more than one; that's redundant but not harmful.

I disagree here. If we add locations to a node twice, then we'll have to use dL twice to find the underlying constructor. This is another case there the type system offers the producer flexibility but hamstrings the consumer.

> On Feb 12, 2019, at 7:32 AM, Vladislav Zavialov <vladislav at serokell.io> wrote:
> 
> I claim an SrcSpan makes sense everywhere, so this is not a useful
> distinction. Think about it as code provenance, an AST node always
> comes from somewhere

I agree with this observation. Perhaps SrcSpan is a bad name, and SrcProvenance is better. We could even imagine using the new HasCallStack feature to track where generated code comes from (perhaps only in DEBUG compilers). Do we need to do this today? I'm not sure there's a crying need. But philosophically, we are able to attach a provenance to every slice of AST, so there's really no reason for uninformative locations.

> My concrete proposal: let's just put SrcSpan in the extension fields
> of each node

I support this counter-proposal. Perhaps if it required writing loads of extra type instances, I wouldn't be as much in favor. But we already have to write those instances -- they just change from NoExt to SrcSpan. This seems to solve all the problems nicely, at relatively low cost. And, I'm sure it's more efficient at runtime than either the previous ping-pong style or the current scenario, as we can pattern-match on constructors directly, requiring one less pointer-chase or function call.

One downside of this proposal is that it means that more care will have to be taken when setting the extension field of AST nodes after a pass, making sure to preserve the location. (This isn't really all that different from location-shuffling today.) A quick mock-up shows that record-updates can make this easier:

> data Phase = Parsed | Renamed
> 
> data Exp p = Node (XNode p) Int
> 
> type family XNode (p :: Phase)
> type instance XNode p = NodeExt p
> 
> data NodeExt p where
>   NodeExt :: { flag :: Bool, fvs :: RenamedOnly p String } -> NodeExt p
> 
> type family RenamedOnly p t where
>   RenamedOnly Parsed _ = ()
>   RenamedOnly Renamed t = t
> 
> example :: Exp Parsed
> example = Node (NodeExt { flag = True, fvs = () }) 5
> 
> rename :: Exp Parsed -> Exp Renamed
> rename (Node ext n) = Node (ext { fvs = "xyz" }) n

Note that the extension point is a record type that has a field available only after renaming. We can then do a type-changing record update when producing the renamed node, preserving the flag in the code above. What's sad is that, if there were no renamer-only field, we couldn't do a type-changing empty record update as the default case. (Haskell doesn't have empty record updates. Perhaps it should. They would be useful in doing a type-change on a datatype with a phantom index. A clever compiler could even somehow ensure that such a record update is completely compiled away.) In any case, this little example is essentially orthogonal to my points above, and the choice of whether to use records or other structures are completely local to the extension point. I just thought it might make for a nice style.

Thanks,
Richard