A couple of GHC-API questions

Fri Jul 25 18:30:34 UTC 2014

On July 24, 2014 at 23:49:10, Simon Peyton Jones (simonpj at microsoft.com) wrote:
> | I think part of the confusion is coming from the fact that we don’t
> | process each module individually, resolve all of the types, and
> | serialize them to disk somewhere. Rather, when we check a module, we
> | parse in ALL of the specifications it could possibly refer to (using
> | the transitive closure of the module imports) and try to resolve all of
> | them before moving to the actual constraint generation and solving
> | step.
>  
> Yes that makes more sense.
>  
> But it's not a good fit with GHC's way of working. You want to take some random module M,  
> parse it afresh to find its LIQUID bits, and then resolve what those things mean. But that  
> *necessarily* means that you must reconstruct the top-level lexical environment,  
> which is formed from the import and top-level declarations. Constructing this environment  
> -- the GlobalRdrEnv -- is not a simple matter. Much of RnNames does precisely that.
>  
> Alas, we do not serialise this environment into M's interface file M.hi. We could do so,  
> and doing so would be somewhat useful for GHCi, but it's absolutely not needed in the usual  
> way of things.

This is basically what I expected, and I wouldn’t want to argue for adding more things to the .hi files unless there are other compelling use cases.

> So I suppose what you could do is to reconstruct it afresh using GHC's existing code to  
> do so. If you call RnNames.rnImports on the import declarations of your module, you'll  
> get back a GlobalRdrEnv in which you can look up the RdrNames that occur in the LIQUID declarations.  
>  
> Actually that'll deal only with the imported things. You also want the locally defined  
> top-level things. To do that you could use things like getLocalNonValBinders (see the  
> call in RnSource.rnSrcDecls), but it might be easier to take the TypeEnv computed from  
> the module's interface file, and turn that back into a bunch of global binders.
>  
> But it's not so easy. What about
>  
> module M( f ) where
>  
> {-# LIQUID f :: { x | ...g... } -> Int
> f = blah
>  
> g = blah
>  
> Here g is only mentioned form f's LIQUID spec. So g will have disappeared altogether from  
> M's interface file, discarded as dead code.

Thanks for these pointers! I’m not too concerned about the dead code issue since Liquid Types generally won’t refer to top-level binders (they’re either going to be functions, which can’t appear in the types in the first place, or constants, which can be inlined).

> The natural alternative would be to serialise the LIQUID things into M.hi. That would  
> entail parsing and renaming them in GHC, and extending the interface file format to handle  
> it. I think it would be simpler. But it would be a bit more invasive of GHC.
>  
>  
> There is some support for serialising random stuff into interface files, called "annotations".  
> You write
> {-# ANN f #-}
> and GHC will record in the interface file that f is associated with expression. See http://www.haskell.org/ghc/docs/latest/html/users_guide/extending-ghc.html#annotation-pragmas  

Yes, I’ve looked at the annotations stuff in the past and have imagined converting LiquidHaskell to a GHC Plugin that pulls the types out of annotations. I *think* it could work, though there are two immediate wrinkles:

1. The docs say that annotations can only be applied to top-level binders/declarations. We’ve found it *very* helpful to annotate nested binders in some cases to reduce the inference burden (it’s also extremely helpful for debugging the code/spec). Is this restriction in place because of the aforementioned issue of not serializing the GlobalRdrEnv? If so, could the restriction be lifted with the caveat that annotations on nested binders will *not* be exported? I think that would be sufficient for us as long as we can get a hold of the Core before any sort of inlining happens.

2. The docs say that you can only annotate binders that are declared in the same module. We’ve similarly found it useful to be able to *assume* stronger types for certain imported functions, though this is of course unsound. Could we also lift the same-module restriction, again with the caveat that the annotations will *not* be exported?

These are just some things to think about, I doubt I would have time to attempt serious changes to LiquidHaskell along these lines until next year at the earliest.

> Maybe there's a neater way to do this, that neither involves changing GHC nor all the faff  
> above. If so I'd be happy to change GHC to accommodate the neater way. I'd like GHC to accommodate  
> extensions like LH much more easily.

I still think that a Template Haskell-based approach could be really nice without requiring any changes to GHC (beyond the profiling issue we’ve run into). The reason I like this idea is that it feels more lightweight to me, users would just express their specifications as Haskell values, and they (and we) wouldn’t have to worry about making sure that the right flags are passed to GHC to find imported modules etc.

Has anyone ever tried to blend Template Haskell with Annotations/Plugins? I’m imagining something like the following

    t_head      = head ::: [liquid| {v:[a] | length v > 0} -> a|]
    head []     = error
    head (x:xs) = x
    {-# ANN head t_head #-}

Here the spec for head exists alongside the implementation as a Haskell value, so it’s immediately available for other tools to build on top of, but it’s also attached to head as an Annotation so that Plugins like a future version of the liquidhaskell verifier, or perhaps an optimization pass, can make use of it during compilation. The ANN pragma seems a bit redundant here, but I really like the idea of having these types readily available at multiple levels and I don’t quite see how to accomplish that with Template Haskell or Annotations alone.

Thanks for all of the comments!

Eric