Separating typechecking and type error reporting in two passes?
Simon Peyton Jones
simonpj at microsoft.com
Wed Nov 30 10:51:34 UTC 2016
| Then a second pass is done over the syntax tree. This pass does
| keep
| track of the context. Whenever it finds some error evidence, it
| reports it.
The syntax tree is a big type. A second pass would be a fairly big deal. But doable.
You'd need to be able to look at the nature of the error; SDocs won't do. Maybe errors become a data type. That might be a good idea but it would be another big deal.
I very much doubt that you'll be able to discard the context information from the type checker. Maybe some of it. I can't say exactly why, it's a gut feel for now.
There is a rich literature on type errors, worth digging into. Alejandro Serrano and his adviser Juraiaan Hage are thinking about this at the moment. SherLoc was interesting too; worked for Haskell.
Lennart gave a nice talk at the Haskell Implementors meeting last year about error provenance. Maybe available online?
Simon
| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of
| Joachim Breitner
| Sent: 29 November 2016 16:20
| To: ghc-devs at haskell.org
| Subject: Separating typechecking and type error reporting in two
| passes?
|
| Hi,
|
| the following is a rough idea that I came up with while pondering
| #12864, which is about better (and fewer!) error messages when the
| user forgot an argument to a function, or swapped their order.
|
|
| The current scheme of type checking and error reporting is the
| following (please correct me if I am wrong, I have actually done
| little in this area so far):
|
| The typechecker traverses the syntax tree from top to bottom.
| Along
| the way, it keeps track of the current context in terms of SDoc
| fragments of the form “In the second argument of…”. It generates
| constraints (e.g. the argument type of a function is the type of
| the
| actual argument), which it either solves, or propagate outwards as
| “wanted constraints”, which refer to their context, and refer to a
| “hole” in the program where, if this constraint gets solved later,
| some form of evidence gets filled in.
| At the end, if there are any wanted constraints left, they are
| reported as type errors, together with the context they are
| mentioned.
| But with -fdefer-type-errors, they are not reported, but the
| evidence holes are filled in with essentially calls to `error`
| that
| print the error message at runtime.
|
| The problem I see with this approach is that the type checker has to
| remember interesting bits about context that might possibly eventually
| be relevant when reporting the error. In particular, it makes it
| harder to report multiple related error messages at once.
|
|
| So I am wondering if this approach would be better:
|
| The type checker does *not* bother keeping track of context: It
| traverses the syntax tree, creates constraints, and unsolvable
| constraints get filled with “error evidence”. Basically as if
| -fdefer-type-errors is one.
|
| Then a second pass is done over the syntax tree. This pass does
| keep
| track of the context. Whenever it finds some error evidence, it
| reports it.
|
|
| I see the following advantages:
|
| * The type checker code gets a bit simpler and tidier.
| * As most compiled programs are type correct [citation needed], the
| type checker, but not the error reporting, is compiler-performance-
| critical. This might make type checking a (possibly insignificant)
| tad faster.
| * The error reporting pass can “look around” more. For example,
| before
| recursing into a pair `(e1,e2)`, it could check for the special
| case
| of `(e1 ▷ TyError Int Bool, e1 ▷ TyError Bool Int)` and report this
| as one error “Tuple with swapped arguments” instead of two errors.
| The #12864 is a more elaborate application of this.
| * There could even be an intermediate pass over the syntax tree
| between typechecking and error reporting that “transform” or
| “optimizes” the AST by moving type errors around, by coalescing
| error or other things, all with the aim of giving easier to
| understand error messages.
| * (This is getting into the area of wild dreams:)
| Library authors could somehow express custom error messages
| for certain patterns of the form
| If the AST now contains an application of `Library.foo` where
| the first argument is a type error `TyError Int Bool`, then
| give this particular domain-specific error message.
|
| I don’t see any disadvantages now, but there probably are some, and I
| would be grateful about feedback from those who actually have worked
| with this part of GHC before.
|
| Thanks,
| Joachim
|
| --
| Joachim “nomeata” Breitner
| mail at joachim-breitner.de •
| https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.j
| oachim-
| breitner.de%2F&data=02%7C01%7Csimonpj%40microsoft.com%7C41dcebe5f26043
| 94f94108d41873a1f0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636160
| 332322948063&sdata=hoylWVvBSQjqGmm0ICnr3nDFz80waxDStDOjWDPQb5A%3D&rese
| rved=0
| XMPP: nomeata at joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
| Debian Developer: nomeata at debian.org
More information about the ghc-devs
mailing list