Interested to help with error messages

Sat Jun 3 15:50:34 UTC 2017

CCing,
   * Alfredo Di Napoli for his on-going work in this area
   * Shivansh Rai for his interest in contributing
   * David Luposchainsky for his recent pretty-printer library
   * Richard Eisenberg due to his participation in #8809
   * Bartosz for his participation in #10735
   * Alan Zimmerman for his interest in Haskell tooling

My apologies for the tome that follows. I have been thinking about this
problem recently and think an overview of where we stand would be helpful.

Siddharth Bhat <siddu.druid at gmail.com> writes:

> Hello all,
>
> Thanks for all the work that's put into GHC :)
>
Thanks for your interest in helping!

> I've tried to get into GHC development before, but I was unsuccessful,
> mostly because I didn't dedicate enough time to understanding the problem
> at hand & exploring the codebase.
>
> I'd like to give it another shot. This time, I think I have a clear vision
> of what I want to help with: Have haskell's error messages be easier to
> read and understand.
>
> 1. Colors and layout to highlight important parts of the error messages

As I say below, I think #8809 will provide a good foundation for
improvements here. More on this below.

> 2. Clear formatting & naming of errors, so they're easily googleable,
> stack-overflow able, etc.

Indeed this is a great goal. Do you have a list of error messages that
you think are particularly egregious in this respect? Are you advocating
that we give error classes unique identifiers (e.g. as rustc does IIRC)
or are you merely suggesting that we improve the wording of the existing
messages?

> 3. better hints with error messages, and perhaps integrated lints(?).

This sounds like a noble goal, but it's a bit unclear how you get there.
We currently do try to give hints where possible, but of course we could
always offer more. It would be helpful to have a set of concrete
examples to discuss.

> 4. I don't know if this is already possible, but allowing GHC errors to be
> shipped off as JSON or something to interested tooling.
>
Indeed, this would be great. Thanks to Matthew Pickering we already
offer some limited form of this in 8.2 [1], but I think having more
structured error documents as suggested in #8809 would make this even
nicer.

[1] https://downloads.haskell.org/~ghc/master/users-guide//debugging.html?highlight=json#ghc-flag--ddump-json

The State of #8809
==================

> I saw this ticket on trac: https://ghc.haskell.org/trac/ghc/ticket/8809
> I would like to take this up, but I'd like help / pointers and stuff. I
> have GHC setup, I know how to use phabricator, but.. where do I start? :)
>
This ticket has recently seen quite a bit of activity and I've been
meaning to write down some thoughts on it. Here it goes:

Currently Alfredo Di Napoli is working [2] on the `pretty` library to
both improve performance and allow us to drop GHC's fork (see #10735),
perhaps to use annotated pretty-printer documents. Meanwhile, David
Luposchainsky, has recently released [3] his `prettyprinter` library
which may serve as a drop-in replacement to `pretty` and handles all of
the cases that Alfredo is working on. Moreover, Shivansh Rai has also
recently expressed interest in helping out with this effort.

All of this is great news: I have been hoping we'd get Idris-style
errors for quite some time. However, given how many hands we have in
this area, we should be careful not to step on each toes. Below I'll
describe the various facets of the task as I see them.

[2] https://github.com/haskell/pretty/pull/43
[3] https://www.reddit.com/r/haskell/comments/6e62i5/ann_prettyprinter_10_ending_the_wadlerleijen_zoo/

# Choice of pretty printer

It seems like we first need to resolve the question of whether switching
from (our fork of) `pretty` to the `prettyprinter` library is
worthwhile. The argument for moving to `prettyprinter` is its support
for optimized infinite-band-width layout, which is one of the things
holding us back from moving back to `pretty`.

However, there are two impediments to switching,

 * `prettyprinter` depends upon the `text` package while GHC does not.
   Making GHC dependent on `text` is an option, but we should be
   careful. Adding a dependency has a non-trivial cost (GHC build times
   rise, GHC API users are stuck using whatever dependency versions GHC
   uses, release engineering is a bit more complicated).

   Currently GHC has its own abstractions for working with text
   efficiently, LitString and FastString. FastString is used throughout
   the compiler, including the pretty-printer, and represents a
   dense UTF-8 buffer (and a hash for quick comparison). It's not clear that we
   would want to move it to `text` as this would introduce UTF-8/UTF-16
   conversion.

 * `prettyprinter` doesn't support rendering to `String`. This
   essentially means that we either use `Text` or fork the package.
   However, if we have already decided on depending on `text`, then
   perhaps the former isn't so bad.

It's unclear to me exactly how difficult switching would be compared to
finishing up the work Alfredo has started on `pretty`. Alfredo, what is
your opinion?

If we decide against moving to `prettyprinter`, then we will need to
finish up something like Alfredo's `pretty` patches to rid GHC of its
fork.

# Representing rich error messages in GHC

In my opinion we should avoid baking more stylistic decisions (e.g. printing
types in red, terms in blue) into the modules like TcErrors which produce
error messages. This is why I propose that we use annotated
pretty-printer documents in #8809 (see comment 3). This would allow us
to represent the typical things seen in GHC error messages (e.g. types,
terms, spans, binders, etc.) in structured form, allowing the error
message consumer (e.g. GHC itself, a GHC API user, or some JSON error
sink) to make decisions about how to present these elements to the user.

I think this approach give us a much better story for dealing with the
problems currently solved by flags like -fprint-runtime-reps,
-fprint-explicit-kinds, etc., especially for users using an IDE.

As far as I can recall, there was still a bit of disagreement
surrounding whether the values carried by the error message should be
statically or dynamically typed. In particular, Richard Eisenberg
advocated that error message documents look like,

    -- A dynamically typed value embedded in an error message
    data ErrItem = forall a. (Outputable a, Typeable a). ErrItem a

    type ErrDoc = Doc ErrItem

Whereas I argue that this would quickly become unmaintainable,
especially when one considers GHC API users. Rather, I say that we
should encode the "vocabulary" of things that may appear in an error
message explicitly,

    data ErrItem = ErrType Type
                 | ErrSpan Span
                 | ErrTerm HsExpr
                 | ErrInstance ClsInst
                 | ErrVar  Var
                 | ...

While there are good arguments for both options, although I think that
in balance an explicit approach will be better for consumers. Anyways,
this is a question that will need to be answered.

Once there is consensus I think it shouldn't be too difficult to move
things forward. The change can be made incrementally and for the most
part should only touch a few modules (with the bulk in TcErrors).

## What do we represent?

There is also the question of what the vocabulary of embeddable items
should consist of. I think the above are pretty non-controversial but I
can think of a variety of items which would more precisely capture
some common patterns,

    data ErrItem = ...
                 | ErrExpectedActual Type Type
                   -- ^ e.g. "Expected type: ty1, Actual type: ty2"
                 | ErrContext Type
                   -- ^ Like ErrType but specifically captures a context
                 | ErrPotentialInstances [ClsInst]
                   -- ^ A list of potentially matching instances
                 | ...

Exactly how far we want to go is something that would need to be
decided. I think we would want to start with the minimal set initially
proposed and then introduce additional items as we gain experience with
the scheme.

# Using rich error messages

Once we have GHC producing rich error documents we can teach GHC's
command line driver to prettify them. We can also teach haskell-mode,
ghc-mod, and friends to preserve their structure to give the user an
Idris-like experience.

Exactly how many stylistic decisions we want GHC to make is a tricky
question; this is prime territory for bike-shedding and people tend to
have rather strong aesthetic beliefs; keeping things simple while
satisfying all tastes may be a challenge.

# Summary

Above I discussed several tasks and a few questions,

 * We need to decide on whether David's `prettyprinter` library is right
   for GHC; having a prototype patch introducing it to the tree would
   help in evaluating this. Alfredo, what is your opinion here?

 * If not we need to drop our fork of `pretty` in favor of upstream

 * We need consensus on whether Idris-style annotated pretty-printer
   documents are the right approach for GHC (I think we are close to
   this)

 * If we want annotated documents, should the items be statically or
   dynamically typed?

 * Once these questions are resolved we can start introducing
   annotations into GHC's error documents (this shouldn't be hard)

 * Then we can teach GHC and associated tooling to pretty-print these
   rich messages prettily

There is certainly a fair bit of work here although it's not
obvious how to parallelize it across all of the interested
parties. Regardless, I would be happy to advise on any bit of this.

Cheers,

- Ben

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 487 bytes
Desc: not available
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20170603/55d29c4d/attachment.sig>