Update on HIE Files

Zubin Duggal zubin.duggal at gmail.com
Tue Jun 26 10:48:24 UTC 2018


Hello all,

I've been working on the HIE Files GSoC project
(https://ghc.haskell.org/trac/ghc/wiki/HIEFiles).

The design of the data structure, as well as the traversal of GHC's ASTs to
collect all the relevant info, is mostly complete.

We traverse the Renamed and Typechecked ASTs to collect the following info
about each SrcSpan:

1) Its type, if it corresponds to a binding, pattern or expression
2) Details about any tokens in the original source corresponding to this
span (keywords, symbols, etc.)
3) The set of Constructor/Type pairs that correspond to this span in the
GHC AST
4) Details about all the identifiers that occur at this SrcSpan
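
As a rough illustration of the four items above, the per-span record could
look something like the sketch below. This is not the actual data structure:
Span, TypeText, Identifier and SpanInfo are hypothetical stand-ins for GHC's
SrcSpan, Type, Name/ModuleName and whatever the real record is called.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-in for GHC's SrcSpan: start and end as (line, column).
data Span = Span
  { spanStart :: (Int, Int)
  , spanEnd   :: (Int, Int)
  } deriving (Eq, Ord, Show)

-- Placeholders; the real structure would hold GHC's Type, Name and ModuleName.
type TypeText   = String
type Identifier = String

-- Everything recorded about one span, mirroring items 1-4 above.
data SpanInfo = SpanInfo
  { infoType        :: Maybe TypeText            -- 1) only for bindings, patterns, expressions
  , infoTokens      :: [String]                  -- 2) keywords, symbols, etc.
  , infoAstNodes    :: Set.Set (String, String)  -- 3) (constructor, type) pairs from the GHC AST
  , infoIdentifiers :: [Identifier]              -- 4) identifiers occurring at this span
  } deriving (Eq, Show)

-- Example: an occurrence of the variable `x`, used as an Int expression.
exampleInfo :: (Span, SpanInfo)
exampleInfo =
  ( Span (3, 9) (3, 10)
  , SpanInfo
      { infoType        = Just "Int"
      , infoTokens      = []
      , infoAstNodes    = Set.fromList [("HsVar", "HsExpr")]
      , infoIdentifiers = ["x"]
      }
  )
```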

For each occurrence of an identifier (Name or ModuleName), we store its
type (if it has one), and classify it as one of the following based on how
it occurs:

1) Use
2) Import/Export
3) Pattern Binding, along with the scope of the binding and, if it occurs as
part of a top-level declaration, do binding or let/where binding, the span
of the entire binding location (including the RHS)
4) Value Binding, along with whether it is an instance binding or not, its
scope, and the span of its entire binding site, including the RHS
5) Type Declaration, i.e. a type signature (class or regular) such as
foo :: ...
6) Declaration (class, type, instance, data, type family, etc.)
7) Type variable binding, along with its scope (which takes into account
ScopedTypeVariables)
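
One way to picture this classification is as a sum type with one constructor
per case. The sketch below is only an assumption about the shape: the names
Context, Scope, DeclKind and SigKind are made up for illustration, and the
two-constructor Scope is a simplification of what the wiki page describes.

```haskell
-- Hypothetical stand-in for GHC's SrcSpan: start and end as (line, column).
data Span = Span { spanStart :: (Int, Int), spanEnd :: (Int, Int) }
  deriving (Eq, Show)

-- Simplified scope: visible in the whole module, or only within one span.
data Scope = ModuleScope | LocalScope Span
  deriving (Eq, Show)

data SigKind  = ClassSig | RegularSig
  deriving (Eq, Show)

data DeclKind = ClassDecl | TypeSynDecl | InstanceDecl | DataDecl | FamilyDecl
  deriving (Eq, Show)

-- How an identifier occurs, following cases 1-7 above.
data Context
  = Use                              -- 1)
  | ImportExport                     -- 2)
  | PatternBind Scope (Maybe Span)   -- 3) scope, and full binding span when available
  | ValueBind Bool Scope Span        -- 4) instance binding?, scope, full binding span
  | TypeDeclaration SigKind          -- 5) foo :: ...
  | Declaration DeclKind             -- 6)
  | TyVarBind Scope                  -- 7)
  deriving (Eq, Show)

-- The span of the whole binding site (including the RHS), if one was recorded.
bindingSite :: Context -> Maybe Span
bindingSite (PatternBind _ site) = site
bindingSite (ValueBind _ _ site) = Just site
bindingSite _                    = Nothing
```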

I have updated the wiki page with more details about the Scopes associated
with bindings:
https://ghc.haskell.org/trac/ghc/wiki/HIEFiles#Scopeinformationaboutsymbols

These annotated SrcSpans are then arranged into an interval/rose tree to aid
lookups.
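
To give an idea of why this arrangement helps lookups, here is a minimal
sketch of a rose tree keyed by spans, with a function that finds the smallest
node containing a given span. SpanTree, contains and smallestContaining are
hypothetical names, spans are simplified to (line, column) pairs with an
exclusive end column, and the lookup relies on sibling spans being disjoint.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical stand-in for GHC's SrcSpan: start and end as (line, column).
data Span = Span { spanStart :: (Int, Int), spanEnd :: (Int, Int) }
  deriving (Eq, Show)

-- True when the first span fully encloses (or equals) the second.
contains :: Span -> Span -> Bool
contains outer inner =
  spanStart outer <= spanStart inner && spanEnd inner <= spanEnd outer

-- A rose tree in which every child's span lies inside its parent's span.
data SpanTree a = SpanTree
  { treeSpan     :: Span
  , treeInfo     :: a
  , treeChildren :: [SpanTree a]
  }

-- Descend into whichever child still contains the query; stop when none does.
smallestContaining :: Span -> SpanTree a -> Maybe (SpanTree a)
smallestContaining query node
  | not (treeSpan node `contains` query) = Nothing
  | otherwise =
      case mapMaybe (smallestContaining query) (treeChildren node) of
        (deeper : _) -> Just deeper
        []           -> Just node

-- Spans for the one-line source `f x = x + 1`.
exampleTree :: SpanTree String
exampleTree =
  SpanTree (Span (1, 1) (1, 12)) "binding f"
    [ SpanTree (Span (1, 1) (1, 2)) "binder f" []
    , SpanTree (Span (1, 3) (1, 4)) "pattern x" []
    , SpanTree (Span (1, 7) (1, 12)) "expression x + 1"
        [ SpanTree (Span (1, 7) (1, 8)) "variable x" []
        , SpanTree (Span (1, 11) (1, 12)) "literal 1" []
        ]
    ]
```

Because only one child can contain the query at each level, a lookup visits
a single path from the root rather than the whole tree.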

We assume that SrcSpans never partially overlap: any two SrcSpans that
occur in the Renamed/Typechecked ASTs are either equal, disjoint, or one
strictly contains the other. This assumption has mostly held up so far
while testing on the entire ghc:HEAD tree, other than one case where the
typechecker strips out parentheses from the original source, which has
been patched (see https://ghc.haskell.org/trac/ghc/ticket/15242).
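
The assumption can be stated as a small check over pairs of spans. The sketch
below is illustrative only: Relation, relate and wellNested are hypothetical
names, and spans are treated as half-open ranges of (line, column) positions,
which may not match GHC's SrcSpan conventions exactly.

```haskell
import Data.List (tails)

-- Hypothetical stand-in for GHC's SrcSpan: start (inclusive), end (exclusive).
data Span = Span { spanStart :: (Int, Int), spanEnd :: (Int, Int) }
  deriving (Eq, Show)

data Relation
  = Equal
  | Disjoint
  | FirstContainsSecond
  | SecondContainsFirst
  | PartialOverlap        -- the case the tree construction assumes never happens
  deriving (Eq, Show)

contains :: Span -> Span -> Bool
contains outer inner =
  spanStart outer <= spanStart inner && spanEnd inner <= spanEnd outer

-- Classify how two spans relate to each other.
relate :: Span -> Span -> Relation
relate a b
  | a == b                                               = Equal
  | spanEnd a <= spanStart b || spanEnd b <= spanStart a = Disjoint
  | a `contains` b                                       = FirstContainsSecond
  | b `contains` a                                       = SecondContainsFirst
  | otherwise                                            = PartialOverlap

-- True when no pair of spans in the list partially overlaps.
wellNested :: [Span] -> Bool
wellNested spans =
  and [ relate a b /= PartialOverlap | (a : rest) <- tails spans, b <- rest ]
```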

I have also written functions that look up the binding site (including RHS)
and scope of an identifier from the tree. When tested on the ghc:HEAD tree,
they succeed in looking up scopes for almost all symbol occurrences in all
source files, and I've also checked that the calculated scope contains all
the occurrences of the symbol. The few cases where this check fails are
those where the SrcSpans have been mangled by CPP (see
https://ghc.haskell.org/trac/ghc/ticket/15279).
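
The scope check described above amounts to something like the following
sketch. It is not the code from the branch: Scope, inScope and uncovered are
hypothetical names, and a symbol's scope is modelled as a list of simple
scopes, any one of which may cover an occurrence.

```haskell
-- Hypothetical stand-in for GHC's SrcSpan: start and end as (line, column).
data Span = Span { spanStart :: (Int, Int), spanEnd :: (Int, Int) }
  deriving (Eq, Show)

contains :: Span -> Span -> Bool
contains outer inner =
  spanStart outer <= spanStart inner && spanEnd inner <= spanEnd outer

-- Simplified scope: the whole module, or everything inside one span.
data Scope = ModuleScope | LocalScope Span
  deriving (Eq, Show)

inScope :: Scope -> Span -> Bool
inScope ModuleScope    _          = True
inScope (LocalScope s) occurrence = s `contains` occurrence

-- Occurrences that none of the symbol's scopes cover; empty when the check passes.
uncovered :: [Scope] -> [Span] -> [Span]
uncovered scopes = filter (\occurrence -> not (any (`inScope` occurrence) scopes))

-- Example: `let y = 2 in y + y` on line 5, where y scopes over the body.
exampleScopes :: [Scope]
exampleScopes = [LocalScope (Span (5, 14) (5, 19))]

exampleOccurrences :: [Span]
exampleOccurrences = [Span (5, 14) (5, 15), Span (5, 18) (5, 19)]
```

An occurrence whose span has been shifted, for instance by CPP, would show up
in the result of uncovered instead of silently passing.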

The code for this currently lives here:
https://github.com/haskell/haddock/compare/ghc-head...wz1000:hiefile-2

Moving forward, the plan for the rest of the summer is:

1) Move this into the GHC tree and add a flag that controls whether .hie
files are generated
2) Write serializers and deserializers for this info
3) Teach the GHC PackageDb about .hie files
4) Rewrite Haddock's --hyperlinked-source to use .hie files.