Separating typechecking and type error reporting in two passes?

Tue Nov 29 16:20:19 UTC 2016

Hi,

the following is a rough idea that I came up with while pondering
#12864, which is about better (and fewer!) error messages when the user
forgot an argument to a function, or swapped their order.

The current scheme of type checking and error reporting is the
following (please correct me if I am wrong, I have actually done little
in this area so far):

    The typechecker traverses the syntax tree from top to bottom. Along
    the way, it keeps track of the current context in terms of SDoc
    fragments of the form “In the second argument of…”. It generates
    constraints (e.g. the argument type of a function is the type of the
    actual argument), which it either solves, or propagate outwards as
    “wanted constraints”, which refer to their context, and refer to a
    “hole” in the program where, if this constraint gets solved later,
    some form of evidence gets filled in.
    At the end, if there are any wanted constraints left, they are
    reported as type errors, together with the context they are
    mentioned.
    But with -fdefer-type-errors, they are not reported, but the
    evidence holes are filled in with essentially calls to `error` that
    print the error message at runtime.

The problem I see with this approach is that the type checker has to
remember interesting bits about context that might possibly eventually
be relevant when reporting the error. In particular, it makes it harder
 to report multiple related error messages at once.

So I am wondering if this approach would be better:

    The type checker does *not* bother keeping track of context: It
    traverses the syntax tree, creates constraints, and unsolvable
    constraints get filled with “error evidence”. Basically as if
    -fdefer-type-errors is one.

    Then a second pass is done over the syntax tree. This pass does keep
    track of the context. Whenever it finds some error evidence, it
    reports it.

I see the following advantages:

 * The type checker code gets a bit simpler and tidier.
 * As most compiled programs are type correct [citation needed], the 
   type checker, but not the error reporting, is compiler-performance-
   critical. This might make type checking a (possibly insignificant) 
   tad faster.
 * The error reporting pass can “look around” more. For example, before
   recursing into a pair `(e1,e2)`, it could check for the special case
   of `(e1 ▷ TyError Int Bool, e1 ▷ TyError Bool Int)` and report this
   as one error “Tuple with swapped arguments” instead of two errors.
   The #12864 is a more elaborate application of this.
 * There could even be an intermediate pass over the syntax tree
   between typechecking and error reporting that “transform” or
   “optimizes” the AST by moving type errors around, by coalescing
   error or other things, all with the aim of giving easier to
   understand error messages.
 * (This is getting into the area of wild dreams:)
   Library authors could somehow express custom error messages
   for certain patterns of the form
   If the AST now contains an application of `Library.foo` where
   the first argument is a type error `TyError Int Bool`, then
   give this particular domain-specific error message.

I don’t see any disadvantages now, but there probably are some, and I
would be grateful about feedback from those who actually have worked
with this part of GHC before.

Thanks,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • https://www.joachim-breitner.de/
  XMPP: nomeata at joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20161129/82d88e7a/attachment.sig>