Proposal: require Haddock comment for every new top-level function and type in GHC source code

Austin Seipp austin at
Wed Jul 2 16:04:07 UTC 2014

I also support this proposal (I actually tripped up on these exact
functions as well!) As stated elsewhere, I think this should be the
case for all top-level functions, not just exported ones.

Of course, I'd also like it if this rule explicitly extended to
top-level data types, type classes, etc as well. I believe that was
the intention but I'm just making sure. :)

(Finally, I actually would like some kind of mechanical enforcement of
this, but I don't think it has to be a hard rule - we shouldn't reject
things on that basis alone. I'm not sure how we would do that anyway,

On Fri, Jun 27, 2014 at 4:51 AM, Johan Tibell <johan.tibell at> wrote:
> Hi!
> I found myself exploring new parts of the GHC code base the last few weeks
> (exciting!), which again reminded me of my biggest frustration when working
> on GHC: the lack of per-function/type (Haddock) comments.
> GHC code is sometimes commented with "notes", which are great but tend to
> (1) mostly cover the exceptional cases and (2) talk about the implementation
> of a function, not how a caller might use it or why.
> Lack of documentation, in GHC and other software projects, usually has (at
> least) two causes:
> Programmers comment code they think is "complex enough to warrant a
> comment". The problem is that the author is usually a poor judge of what's
> complex enough, because he/she is too familiar with the code and tends to
> under-document code when following this principle.
> Documenting is boring and tends to have little benefit the person writing to
> documentation. Given lack of incentives we tend to document less than we
> ought to.
> I've only seen one successful way to combat the lack of documentation that
> stems from the above: have the project's style guide mandate that top-level
> functions and types (or at least those that are exported) have
> documentation. This works well at Google.
> Anecdote: we have one code base inside Google that was until recently exempt
> from this rule and documentation is almost completely absent in that code
> base, even though hundreds of engineers work on and need to understand it
> every day. This breeds institutional knowledge problems i.e. if the author
> of a core piece of code leaves, lots of knowledge is lost.
> Proposal: I propose that we require that new top-level functions and types
> have Haddock comments, even if they start out as a single, humble sentence.
> I've found that putting even that one sentence (1) helps new users and (2)
> establishes a place for improvements to be made. There's a strong "broken
> window" effect to lack of comments, in that lack of comments breeds more
> lack of comments as developers follow established practices.
> We should add this requirement to the style guide. Having it as a written
> down policy tends to prevent having to re-hash the whole argument about
> documentation over and over again. This has also helped us a lot at Google,
> because programmers can spend endless amount of time arguing about comments,
> placement of curly braces, etc. and having a written policy helps cut down
> on that.
> To give an idea of how to write good comments, here are two examples of
> undocumented code I ran into in GHC and how better comments would have
> helped.
> First example
> In compiler/nativeGen/X86/Instr.hs there's a (local) function called mkRUR,
> which is a helper function use when computing instruction register usage.
> The first question that I asked upon seeing uses of that function was "what
> does RUR stand for?" Given the context the function is in, I guessed it
> stands for read-update-read, because R is used to mean "read" in the
> enclosing function and "updating" is related to "reading" so that must be
> what U stands for. It turns out that it stands for RegUsageReadonly. Here's
> a comment that would have captured, in a single sentence, what this function
> is for:
>     -- | Create register usage info for instruction that only
>     -- reads registers.
>     mkRUR src = src' `seq` RU src' []
>         where src' = filter (interesting platform) src
> That already a big improvement. A note about the register filtering, which
> means that not all registers you pass to the function will be recorded as
> being read in the end, could also be useful.
> Aside: providing a type signature, which would have made it clear that the
> return type is RU, might also have helped in this particular case.
> Second example
> In the same file there a function called x86_regUsageOfInstr. It's the
> function that encloses the local function mkRUR above.
> I could figure out that this function has something to do with register
> usage, of the instruction passed as an argument, and that register usage is
> important for the register allocator. However, trying to understand in more
> detail what that meant was more of challenge than it needed to be. First, a
> comment more clearly explaining what computing register usage means in
> practice would be helpful:
>     -- | Returns which registers are read and written by this
>     -- instruction, as a (read, written) pair. This info is used
>     -- by the register allocator.
>     x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
> The reason mentioning that the return value is essentially a (read, written)
> pair is helpful is because the body of the function a big case statement
> full of lines like this one:
>     GCMP _ src1 src2 -> mkRUR [src1,src2]
>     ...
>     FDIV _ src  dst  -> usageRM src dst
> It's not immediately clear that all the various helper functions used here
> just end up computing a pair of the above form. A top-level comment lets you
> understand what's going on without understanding exactly what all these
> helper functions are doing.
> Thoughts?
> -- Johan
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at


Austin Seipp, Haskell Consultant
Well-Typed LLP,

More information about the ghc-devs mailing list