Tracking down instances from use-sites

Tue Jun 26 17:21:33 UTC 2018

Christopher Done <chrisdone at gmail.com> writes:

> Hi all,
>
> Given a TypecheckedModule, what's the most direct way given a Var
> expression retrieved from the AST, to determine:
>
> 1) that it's a class method e.g. `read`
> 2) that it's a generic call (no instance chosen) e.g. `Read a => String -> a`
> 3) or if it's a resolved instance, then which instance is it and which
> package, module and declaration is that defined in?
>
> Starting with this file that has a TypecheckedModule in it:
> https://gist.github.com/chrisdone/6fcb9f1cba6324148d481fcd4eab6af6#file-ghc-api-hs-L23
>
> I presume at this point that instance resolution has taken place. I'm
> not sure that dictionaries or chosen instances are inserted into the
> AST, or whether just the resolved types are inserted e.g. `String ->
> Int`, where I want e.g. `Read Int`, which might lead me to finding
> the matching instance from an InstEnv or so.
>
> I'd like to do some analyses of Haskell codebases, and the fact that
> calls to class methods are opaque is a bit of a road-blocker. Any
> handy tips? Prior work?
>
> It'd be neat in tooling to just hit a goto-definition key on `read`
> and be taken to the instance implementation rather than the class
> definition.
>
Indeed that would be great.

I believe (1) is quite straightforward: You can recognize a class
operation by looking at the function's IdDetails (specifically looking
for ClassOpId). This contains the Class to which the method belongs.

Getting back to the instance is a bit trickier. I'll admit I don't know
whether there is a convenient way to do this. However, I can try to fill
in some background and give a few ideas. First let's review how
typeclass evidence is represented in HsSyn (apologies if this is
already known). For concreteness, let's consider the program,

    showList :: Show a => [a] -> String
    showList x = show x

After typechecking this will likely turn into something like (taken from
the output of -ddump-tc -fprint-typechecker-elaboration):

    AbsBindsSig [a_a1hj] [$dShow_a1hl]
        {Exported type: Hi.showList :: forall a. Show a => [a] -> String
                        [LclId]
        Bind: showList_a1hk x_azo = show @ [a_a1hj] $dShow_a1hn x_azo
        Evidence: EvBinds{[W] $dShow_a1hn
                            = GHC.Show.$fShow[] @[a_a1hj] [$dShow_a1hl]}}

This AbsBindsSig represents a binding abstracted over a dictionary
argument ($dShow_a1hl :: Show a_a1hj). The "Evidence" section gives
a list of evidence bindings which the desugarer will wrap the RHS in; in
this case the typechecker has built a `Show [a_a1hj]` instance from the
`Show a => Show [a]` instance defined in GHC.Show and the abstracted
`$dShow_a1hl` dictionary.

The `show` call site will then look something like this in HsSyn:

    HsApp
      (HsWrap
          (WpEvApp $dShow_a1hn)
          (HsWrap
              (WpTyApp [a_a1hj])
              (HsVar GHC.Show.show)))
      (HsVar x_azo)

Here the typechecker has wrapped the `show` in (show x_azo) in a pair
of HsWrappers which apply its type and dictionary arguments.
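
To make the traversal concrete, here is a minimal, self-contained sketch
of how one might peel the wrappers off such a call site to recover the
method and the dictionary variable applied to it. The `Expr` and
`Wrapper` types below are toy stand-ins written for this illustration,
not GHC's real HsExpr and HsWrapper, and `headAndEvidence` is a
hypothetical helper name; the real types carry far more constructors.

```haskell
-- Toy stand-ins for GHC's HsWrapper and HsExpr (assumption: heavily
-- simplified, with names and types rendered as plain strings).
data Wrapper
  = WpTyApp String   -- a type argument
  | WpEvApp String   -- the evidence (dictionary) variable being applied
  deriving (Eq, Show)

data Expr
  = HsVar String
  | HsWrap Wrapper Expr
  | HsApp Expr Expr
  deriving (Eq, Show)

-- | Walk down the function position of an expression, returning the head
-- variable together with the evidence variables applied to it, in
-- application order (innermost wrapper first).
headAndEvidence :: Expr -> (String, [String])
headAndEvidence = go []
  where
    go evs (HsVar v)              = (v, evs)
    go evs (HsApp f _)            = go evs f
    go evs (HsWrap (WpEvApp d) e) = go (d : evs) e
    go evs (HsWrap (WpTyApp _) e) = go evs e

-- The `show x` call site from the example above.
showCall :: Expr
showCall =
  HsApp
    (HsWrap (WpEvApp "$dShow_a1hn")
      (HsWrap (WpTyApp "[a_a1hj]") (HsVar "GHC.Show.show")))
    (HsVar "x_azo")

main :: IO ()
main = print (headAndEvidence showCall)
-- prints ("GHC.Show.show",["$dShow_a1hn"])
```

The evidence variable recovered this way is the one to look up in the
enclosing binding's evidence bindings.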

This suggests an approach to identify "generic" call sites (item (2)
above): look at whether the RHS of the evidence binding for the call
site's dictionary is headed by a lambda-bound dictionary or not. In the
above case we see that it is not lambda-bound but rather headed by a
concrete dictionary function: `GHC.Show.$fShow[]`. You can tell that
this is a dictionary function by looking at its IdDetails (specifically,
it is of the DFunId variety).

By contrast if we have a generic call-site:

    printIt :: Show a => a -> IO ()
    printIt x = putStrLn $ show x

We see that the evidence binding is headed by a lambda-bound dictionary:

    AbsBindsSig [a_a1AP] [$dShow_a1AR]
      {Exported type: printIt :: forall a. Show a => a -> IO ()
                      [LclId]
      Bind: printIt_a1AQ x_a12W
              = putStrLn $ show @ a_a1AP $dShow_a1AV x_a12W
      Evidence: EvBinds{[W] $dShow_a1AV = $dShow_a1AR}}
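
The distinction between the two cases can be sketched as a small
classifier that follows evidence bindings until it reaches either a
dictionary function or a lambda-bound dictionary. This is a toy model
under stated assumptions: variables are plain strings, the sets of
DFunId variables and lambda-bound dictionaries are passed in as lists,
and `classify`, `EvBinds` and `Evidence` are hypothetical names. With
the real GHC API one would instead inspect each Id's IdDetails and the
EvBinds attached to the binding.

```haskell
-- Result of classifying the dictionary used at a call site.
data Evidence
  = Concrete String    -- headed by a dictionary function (a DFunId)
  | Generic String     -- headed by a lambda-bound dictionary
  | Unresolved String  -- no binding found, or the bindings are cyclic
  deriving (Eq, Show)

-- Each evidence binding maps a dictionary variable to the head of its
-- RHS and that head's dictionary arguments (toy representation).
type EvBinds = [(String, (String, [String]))]

-- | Follow evidence bindings from the given dictionary variable.
classify :: [String]  -- variables known to be dictionary functions
         -> [String]  -- lambda-bound (abstracted) dictionaries
         -> EvBinds
         -> String
         -> Evidence
classify dfuns lamBound binds = go []
  where
    go seen d
      | d `elem` dfuns    = Concrete d
      | d `elem` lamBound = Generic d
      | d `elem` seen     = Unresolved d  -- guard against cyclic bindings
      | otherwise =
          case lookup d binds of
            Just (hd, _args) -> go (d : seen) hd
            Nothing          -> Unresolved d

main :: IO ()
main = do
  -- showList: $dShow_a1hn = GHC.Show.$fShow[] @[a_a1hj] [$dShow_a1hl]
  print (classify ["GHC.Show.$fShow[]"] ["$dShow_a1hl"]
           [("$dShow_a1hn", ("GHC.Show.$fShow[]", ["$dShow_a1hl"]))]
           "$dShow_a1hn")
  -- printIt: $dShow_a1AV = $dShow_a1AR
  print (classify [] ["$dShow_a1AR"]
           [("$dShow_a1AV", ("$dShow_a1AR", []))]
           "$dShow_a1AV")
-- prints:
--   Concrete "GHC.Show.$fShow[]"
--   Generic "$dShow_a1AR"
```

Note that this sketch only classifies the head of the evidence term; the
showList case is concrete at the head even though one of its arguments
is the lambda-bound `$dShow_a1hl`.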

Of course, in the case that you have a concrete dictionary you *also*
want to know the source location of the instance declaration from which
it arose. I'm afraid this may be quite challenging as this isn't
information we currently keep. Currently interface files don't really
keep any information that might be useful to IDE tooling users. It's
possible that we could add such information, although it's unclear
exactly what this would look like. It would be great to hear more from
tooling users regarding what information they would like to see.

Also relevant here is the HIE file GSoC project [1] being worked on this
summer by Zubin Duggal (CC'd).

> Also, listing all functions that use throw# or functions defined in
> terms of throw# or FFI calls would be helpful, especially for doing
> audits. If I could immediately list all partial functions in a
> project, then list all call-sites, it would be a very convenient way
> when doing an audit to see whether partial functions (such as head)
> are used with the proper preconditions or not.
>
This may be non-trivial; you may be able to get something along these
lines out of the strictness signature present in IdInfo. However, I
suspect this will be a bit fragile (e.g. we don't even run demand
analysis with -O0 IIRC).

Cheers,

- Ben

[1] https://ghc.haskell.org/trac/ghc/wiki/HIEFiles