Proposal: require Haddock comment for every new top-level function and type in GHC source code

Manuel M T Chakravarty chak at cse.unsw.edu.au
Thu Jul 3 04:38:33 UTC 2014


Which makes a lot of GHC code more readable — I’m serious!

Manuel

PS: I have resisted it for a while, but after slogging through GHC for extended periods, I’ve come to appreciate the additional clarity in large and tricky functions (e.g., in the type checker & renamer).

Edward Kmett <ekmett at gmail.com>:
> That has a high chance of backfiring and requiring everyone to use do { ...; ... } with explicit braces and semis. ;)
> 
> -Edward
> 
> 
> On Wed, Jul 2, 2014 at 4:08 AM, Simon Marlow <marlowsd at gmail.com> wrote:
> Agreed, let's do it.  Thanks for the well-argued proposal.
> 
> Next up: consistent style :-)
> 
> Cheers,
> Simon
> 
> 
> On 27/06/2014 10:51, Johan Tibell wrote:
> Hi!
> 
> I found myself exploring new parts of the GHC code base the last few
> weeks (exciting!), which again reminded me of my biggest frustration
> when working on GHC: the lack of per-function/type (Haddock) comments.
> 
> GHC code is sometimes commented with "notes", which are great but tend
> to (1) mostly cover the exceptional cases and (2) talk about the
> implementation of a function, not how a caller might use it or why.
> 
> Lack of documentation, in GHC and other software projects, usually has
> (at least) two causes:
> 
>   * Programmers comment code they think is "complex enough to warrant a
> 
>     comment". The problem is that the author is usually a poor judge of
>     what's complex enough, because he/she is too familiar with the code
>     and tends to under-document code when following this principle.
>   * Documenting is boring and tends to have little benefit the person
> 
>     writing to documentation. Given lack of incentives we tend to
>     document less than we ought to.
> 
> I've only seen one successful way to combat the lack of documentation
> that stems from the above: have the project's style guide mandate that
> top-level functions and types (or at least those that are exported) have
> documentation. This works well at Google.
> 
> Anecdote: we have one code base inside Google that was until recently
> exempt from this rule and documentation is almost completely absent in
> that code base, even though hundreds of engineers work on and need to
> understand it every day. This breeds institutional knowledge problems
> i.e. if the author of a core piece of code leaves, lots of knowledge is
> lost.
> 
> *Proposal: *I propose that we require that new top-level functions and
> 
> types have Haddock comments, even if they start out as a single, humble
> sentence.
> 
> I've found that putting even that one sentence (1) helps new users and
> (2) establishes a place for improvements to be made. There's a strong
> "broken window" effect to lack of comments, in that lack of comments
> breeds more lack of comments as developers follow established practices.
> 
> We should add this requirement to the style guide. Having it as a
> written down policy tends to prevent having to re-hash the whole
> argument about documentation over and over again. This has also helped
> us a lot at Google, because programmers can spend endless amount of time
> arguing about comments, placement of curly braces, etc. and having a
> written policy helps cut down on that.
> 
> To give an idea of how to write good comments, here are two examples of
> undocumented code I ran into in GHC and how better comments would have
> helped.
> 
> *First example*
> 
> In compiler/nativeGen/X86/Instr.hs there's a (local) function called
> mkRUR, which is a helper function use when computing instruction
> register usage.
> 
> The first question that I asked upon seeing uses of that function was
> "what does RUR stand for?" Given the context the function is in, I
> guessed it stands for read-update-read, because R is used to mean "read"
> in the enclosing function and "updating" is related to "reading" so that
> must be what U stands for. It turns out that it stands for
> RegUsageReadonly. Here's a comment that would have captured, in a single
> sentence, what this function is for:
> 
>      -- | Create register usage info for instruction that only
>      -- reads registers.
>      mkRUR src = src' `seq` RU src' []
>          where src' = filter (interesting platform) src
> 
> That already a big improvement. A note about the register filtering,
> which means that not all registers you pass to the function will be
> recorded as being read in the end, could also be useful.
> 
> Aside: providing a type signature, which would have made it clear that
> the return type is RU, might also have helped in this particular case.
> 
> *Second example*
> 
> In the same file there a function called x86_regUsageOfInstr. It's the
> function that encloses the local function mkRUR above.
> 
> I could figure out that this function has something to do with register
> usage, of the instruction passed as an argument, and that register usage
> is important for the register allocator. However, trying to understand
> in more detail what that meant was more of challenge than it needed to
> be. First, a comment more clearly explaining what computing register
> usage means in practice would be helpful:
> 
>      -- | Returns which registers are read and written by this
>      -- instruction, as a (read, written) pair. This info is used
>      -- by the register allocator.
>      x86_regUsageOfInstr :: Platform -> Instr -> RegUsage
> 
> The reason mentioning that the return value is essentially a (read,
> written) pair is helpful is because the body of the function a big case
> statement full of lines like this one:
> 
>      GCMP _ src1 src2 -> mkRUR [src1,src2]
>      ...
>      FDIV _ src  dst  -> usageRM src dst
> 
> It's not immediately clear that all the various helper functions used
> here just end up computing a pair of the above form. A top-level comment
> lets you understand what's going on without understanding exactly what
> all these helper functions are doing.
> 
> Thoughts?
> 
> -- Johan
> 
> 
> 
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
> 
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
> 
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/ghc-devs/attachments/20140703/303e960a/attachment-0001.html>


More information about the ghc-devs mailing list