Proposal: require Haddock comment for every new top-level function and type in GHC source code

Edward Kmett ekmett at
Wed Jul 2 15:56:09 UTC 2014

That has a high chance of backfiring and requiring everyone to use do {
...; ... } with explicit braces and semis. ;)


On Wed, Jul 2, 2014 at 4:08 AM, Simon Marlow <marlowsd at> wrote:

> Agreed, let's do it.  Thanks for the well-argued proposal.
> Next up: consistent style :-)
> Cheers,
> Simon
> On 27/06/2014 10:51, Johan Tibell wrote:
>> Hi!
>> I found myself exploring new parts of the GHC code base the last few
>> weeks (exciting!), which again reminded me of my biggest frustration
>> when working on GHC: the lack of per-function/type (Haddock) comments.
>> GHC code is sometimes commented with "notes", which are great but tend
>> to (1) mostly cover the exceptional cases and (2) talk about the
>> implementation of a function, not how a caller might use it or why.
>> Lack of documentation, in GHC and other software projects, usually has
>> (at least) two causes:
>>   * Programmers comment code they think is "complex enough to warrant a
>>     comment". The problem is that the author is usually a poor judge of
>>     what's complex enough, because he/she is too familiar with the code
>>     and tends to under-document code when following this principle.
>>   * Documenting is boring and tends to have little benefit the person
>>     writing to documentation. Given lack of incentives we tend to
>>     document less than we ought to.
>> I've only seen one successful way to combat the lack of documentation
>> that stems from the above: have the project's style guide mandate that
>> top-level functions and types (or at least those that are exported) have
>> documentation. This works well at Google.
>> Anecdote: we have one code base inside Google that was until recently
>> exempt from this rule and documentation is almost completely absent in
>> that code base, even though hundreds of engineers work on and need to
>> understand it every day. This breeds institutional knowledge problems
>> i.e. if the author of a core piece of code leaves, lots of knowledge is
>> lost.
>> *Proposal: *I propose that we require that new top-level functions and
>> types have Haddock comments, even if they start out as a single, humble
>> sentence.
>> I've found that putting even that one sentence (1) helps new users and
>> (2) establishes a place for improvements to be made. There's a strong
>> "broken window" effect to lack of comments, in that lack of comments
>> breeds more lack of comments as developers follow established practices.
>> We should add this requirement to the style guide. Having it as a
>> written down policy tends to prevent having to re-hash the whole
>> argument about documentation over and over again. This has also helped
>> us a lot at Google, because programmers can spend endless amount of time
>> arguing about comments, placement of curly braces, etc. and having a
>> written policy helps cut down on that.
>> To give an idea of how to write good comments, here are two examples of
>> undocumented code I ran into in GHC and how better comments would have
>> helped.
>> *First example*
>> In compiler/nativeGen/X86/Instr.hs there's a (local) function called
>> mkRUR, which is a helper function use when computing instruction
>> register usage.
>> The first question that I asked upon seeing uses of that function was
>> "what does RUR stand for?" Given the context the function is in, I
>> guessed it stands for read-update-read, because R is used to mean "read"
>> in the enclosing function and "updating" is related to "reading" so that
>> must be what U stands for. It turns out that it stands for
>> RegUsageReadonly. Here's a comment that would have captured, in a single
>> sentence, what this function is for:
>>      -- | Create register usage info for instruction that only
>>      -- reads registers.
>>      mkRUR src = src' `seq` RU src' []
>>          where src' = filter (interesting platform) src
>> That already a big improvement. A note about the register filtering,
>> which means that not all registers you pass to the function will be
>> recorded as being read in the end, could also be useful.
>> Aside: providing a type signature, which would have made it clear that
>> the return type is RU, might also have helped in this particular case.
>> *Second example*
>> In the same file there a function called x86_regUsageOfInstr. It's the
>> function that encloses the local function mkRUR above.
>> I could figure out that this function has something to do with register
>> usage, of the instruction passed as an argument, and that register usage
>> is important for the register allocator. However, trying to understand
>> in more detail what that meant was more of challenge than it needed to
>> be. First, a comment more clearly explaining what computing register
>> usage means in practice would be helpful:
>>      -- | Returns which registers are read and written by this
>>      -- instruction, as a (read, written) pair. This info is used
>>      -- by the register allocator.
>>      x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
>> The reason mentioning that the return value is essentially a (read,
>> written) pair is helpful is because the body of the function a big case
>> statement full of lines like this one:
>>      GCMP _ src1 src2 -> mkRUR [src1,src2]
>>      ...
>>      FDIV _ src  dst  -> usageRM src dst
>> It's not immediately clear that all the various helper functions used
>> here just end up computing a pair of the above form. A top-level comment
>> lets you understand what's going on without understanding exactly what
>> all these helper functions are doing.
>> Thoughts?
>> -- Johan
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at
>>  _______________________________________________
> ghc-devs mailing list
> ghc-devs at
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the ghc-devs mailing list