Separating typechecking and type error reporting in two passes?

Alex Rozenshteyn rpglover64 at gmail.com
Tue Nov 29 16:48:28 UTC 2016


One concern I have is with the claim that "most compiled programs are type
correct". This has emphatically not been my experience while developing
Haskell. Often, I edit and recompile to find the next type error to fix, or
the new type of the hole; this is especially easy (and automatic) in emacs
with flycheck.

It's also the case that compiling correct programs is (or at least appears
to me to be) much slower than compiling incorrect ones (with GHC 8.0.1, at
least). I worry that this proposal might slow down type errors
significantly. Of course, someone with more (basically any) knowledge of
GHC internals can correct me and allay my fear.

On Tue, Nov 29, 2016 at 8:20 AM Joachim Breitner <mail at joachim-breitner.de>
wrote:

> Hi,
>
> the following is a rough idea that I came up with while pondering
> #12864, which is about better (and fewer!) error messages when the user
> forgot an argument to a function, or swapped their order.
>
>
> The current scheme of type checking and error reporting is the
> following (please correct me if I am wrong, I have actually done little
> in this area so far):
>
>     The typechecker traverses the syntax tree from top to bottom. Along
>     the way, it keeps track of the current context in terms of SDoc
>     fragments of the form “In the second argument of…”. It generates
>     constraints (e.g. the argument type of a function is the type of the
>     actual argument), which it either solves, or propagate outwards as
>     “wanted constraints”, which refer to their context, and refer to a
>     “hole” in the program where, if this constraint gets solved later,
>     some form of evidence gets filled in.
>     At the end, if there are any wanted constraints left, they are
>     reported as type errors, together with the context they are
>     mentioned.
>     But with -fdefer-type-errors, they are not reported, but the
>     evidence holes are filled in with essentially calls to `error` that
>     print the error message at runtime.
>
> The problem I see with this approach is that the type checker has to
> remember interesting bits about context that might possibly eventually
> be relevant when reporting the error. In particular, it makes it harder
>  to report multiple related error messages at once.
>
>
> So I am wondering if this approach would be better:
>
>     The type checker does *not* bother keeping track of context: It
>     traverses the syntax tree, creates constraints, and unsolvable
>     constraints get filled with “error evidence”. Basically as if
>     -fdefer-type-errors is one.
>
>     Then a second pass is done over the syntax tree. This pass does keep
>     track of the context. Whenever it finds some error evidence, it
>     reports it.
>
>
> I see the following advantages:
>
>  * The type checker code gets a bit simpler and tidier.
>  * As most compiled programs are type correct [citation needed], the
>    type checker, but not the error reporting, is compiler-performance-
>    critical. This might make type checking a (possibly insignificant)
>    tad faster.
>  * The error reporting pass can “look around” more. For example, before
>    recursing into a pair `(e1,e2)`, it could check for the special case
>    of `(e1 ▷ TyError Int Bool, e1 ▷ TyError Bool Int)` and report this
>    as one error “Tuple with swapped arguments” instead of two errors.
>    The #12864 is a more elaborate application of this.
>  * There could even be an intermediate pass over the syntax tree
>    between typechecking and error reporting that “transform” or
>    “optimizes” the AST by moving type errors around, by coalescing
>    error or other things, all with the aim of giving easier to
>    understand error messages.
>  * (This is getting into the area of wild dreams:)
>    Library authors could somehow express custom error messages
>    for certain patterns of the form
>    If the AST now contains an application of `Library.foo` where
>    the first argument is a type error `TyError Int Bool`, then
>    give this particular domain-specific error message.
>
> I don’t see any disadvantages now, but there probably are some, and I
> would be grateful about feedback from those who actually have worked
> with this part of GHC before.
>
> Thanks,
> Joachim
>
> --
> Joachim “nomeata” Breitner
>   mail at joachim-breitner.dehttps://www.joachim-breitner.de/
>   XMPP: nomeata at joachim-breitner.de • OpenPGP-Key: 0xF0FBF51F
>   Debian Developer: nomeata at debian.org
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20161129/6aa21719/attachment.html>


More information about the ghc-devs mailing list