Proposal: require Haddock comment for every new top-level function and type in GHC source code

Wed Jul 2 08:08:10 UTC 2014

Agreed, let's do it.  Thanks for the well-argued proposal.

Next up: consistent style :-)

Cheers,
Simon

On 27/06/2014 10:51, Johan Tibell wrote:
> Hi!
>
> I found myself exploring new parts of the GHC code base the last few
> weeks (exciting!), which again reminded me of my biggest frustration
> when working on GHC: the lack of per-function/type (Haddock) comments.
>
> GHC code is sometimes commented with "notes", which are great but tend
> to (1) mostly cover the exceptional cases and (2) talk about the
> implementation of a function, not how a caller might use it or why.
>
> Lack of documentation, in GHC and other software projects, usually has
> (at least) two causes:
>
>   * Programmers comment code they think is "complex enough to warrant a
>     comment". The problem is that the author is usually a poor judge of
>     what's complex enough, because he/she is too familiar with the code
>     and tends to under-document code when following this principle.
>   * Documenting is boring and tends to have little benefit the person
>     writing to documentation. Given lack of incentives we tend to
>     document less than we ought to.
>
> I've only seen one successful way to combat the lack of documentation
> that stems from the above: have the project's style guide mandate that
> top-level functions and types (or at least those that are exported) have
> documentation. This works well at Google.
>
> Anecdote: we have one code base inside Google that was until recently
> exempt from this rule and documentation is almost completely absent in
> that code base, even though hundreds of engineers work on and need to
> understand it every day. This breeds institutional knowledge problems
> i.e. if the author of a core piece of code leaves, lots of knowledge is
> lost.
>
> *Proposal: *I propose that we require that new top-level functions and
> types have Haddock comments, even if they start out as a single, humble
> sentence.
>
> I've found that putting even that one sentence (1) helps new users and
> (2) establishes a place for improvements to be made. There's a strong
> "broken window" effect to lack of comments, in that lack of comments
> breeds more lack of comments as developers follow established practices.
>
> We should add this requirement to the style guide. Having it as a
> written down policy tends to prevent having to re-hash the whole
> argument about documentation over and over again. This has also helped
> us a lot at Google, because programmers can spend endless amount of time
> arguing about comments, placement of curly braces, etc. and having a
> written policy helps cut down on that.
>
> To give an idea of how to write good comments, here are two examples of
> undocumented code I ran into in GHC and how better comments would have
> helped.
>
> *First example*
> In compiler/nativeGen/X86/Instr.hs there's a (local) function called
> mkRUR, which is a helper function use when computing instruction
> register usage.
>
> The first question that I asked upon seeing uses of that function was
> "what does RUR stand for?" Given the context the function is in, I
> guessed it stands for read-update-read, because R is used to mean "read"
> in the enclosing function and "updating" is related to "reading" so that
> must be what U stands for. It turns out that it stands for
> RegUsageReadonly. Here's a comment that would have captured, in a single
> sentence, what this function is for:
>
>      -- | Create register usage info for instruction that only
>      -- reads registers.
>      mkRUR src = src' `seq` RU src' []
>          where src' = filter (interesting platform) src
>
> That already a big improvement. A note about the register filtering,
> which means that not all registers you pass to the function will be
> recorded as being read in the end, could also be useful.
>
> Aside: providing a type signature, which would have made it clear that
> the return type is RU, might also have helped in this particular case.
>
> *Second example*
> In the same file there a function called x86_regUsageOfInstr. It's the
> function that encloses the local function mkRUR above.
>
> I could figure out that this function has something to do with register
> usage, of the instruction passed as an argument, and that register usage
> is important for the register allocator. However, trying to understand
> in more detail what that meant was more of challenge than it needed to
> be. First, a comment more clearly explaining what computing register
> usage means in practice would be helpful:
>
>      -- | Returns which registers are read and written by this
>      -- instruction, as a (read, written) pair. This info is used
>      -- by the register allocator.
>      x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
>
> The reason mentioning that the return value is essentially a (read,
> written) pair is helpful is because the body of the function a big case
> statement full of lines like this one:
>
>      GCMP _ src1 src2 -> mkRUR [src1,src2]
>      ...
>      FDIV _ src  dst  -> usageRM src dst
>
> It's not immediately clear that all the various helper functions used
> here just end up computing a pair of the above form. A top-level comment
> lets you understand what's going on without understanding exactly what
> all these helper functions are doing.
>
> Thoughts?
>
> -- Johan
>
>
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>