Separating typechecking and type error reporting in two passes?
Joachim Breitner
mail at joachim-breitner.de
Tue Nov 29 16:20:19 UTC 2016
Hi,
the following is a rough idea that I came up with while pondering
#12864, which is about better (and fewer!) error messages when the user
forgot an argument to a function, or swapped their order.
The current scheme of type checking and error reporting is the
following (please correct me if I am wrong, I have actually done little
in this area so far):
The typechecker traverses the syntax tree from top to bottom. Along
the way, it keeps track of the current context in terms of SDoc
fragments of the form “In the second argument of…”. It generates
constraints (e.g. the argument type of a function is the type of the
actual argument), which it either solves, or propagate outwards as
“wanted constraints”, which refer to their context, and refer to a
“hole” in the program where, if this constraint gets solved later,
some form of evidence gets filled in.
At the end, if there are any wanted constraints left, they are
reported as type errors, together with the context they are
mentioned.
But with -fdefer-type-errors, they are not reported, but the
evidence holes are filled in with essentially calls to `error` that
print the error message at runtime.
The problem I see with this approach is that the type checker has to
remember interesting bits about context that might possibly eventually
be relevant when reporting the error. In particular, it makes it harder
to report multiple related error messages at once.
So I am wondering if this approach would be better:
The type checker does *not* bother keeping track of context: It
traverses the syntax tree, creates constraints, and unsolvable
constraints get filled with “error evidence”. Basically as if
-fdefer-type-errors is one.
Then a second pass is done over the syntax tree. This pass does keep
track of the context. Whenever it finds some error evidence, it
reports it.
I see the following advantages:
* The type checker code gets a bit simpler and tidier.
* As most compiled programs are type correct [citation needed], the
type checker, but not the error reporting, is compiler-performance-
critical. This might make type checking a (possibly insignificant)
tad faster.
* The error reporting pass can “look around” more. For example, before
recursing into a pair `(e1,e2)`, it could check for the special case
of `(e1 ▷ TyError Int Bool, e1 ▷ TyError Bool Int)` and report this
as one error “Tuple with swapped arguments” instead of two errors.
The #12864 is a more elaborate application of this.
* There could even be an intermediate pass over the syntax tree
between typechecking and error reporting that “transform” or
“optimizes” the AST by moving type errors around, by coalescing
error or other things, all with the aim of giving easier to
understand error messages.
* (This is getting into the area of wild dreams:)
Library authors could somehow express custom error messages
for certain patterns of the form
If the AST now contains an application of `Library.foo` where
the first argument is a type error `TyError Int Bool`, then
give this particular domain-specific error message.
I don’t see any disadvantages now, but there probably are some, and I
would be grateful about feedback from those who actually have worked
with this part of GHC before.
Thanks,
Joachim
--
Joachim “nomeata” Breitner
mail at joachim-breitner.de • https://www.joachim-breitner.de/
XMPP: nomeata at joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
Debian Developer: nomeata at debian.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20161129/82d88e7a/attachment.sig>
More information about the ghc-devs
mailing list