RFC: style cleanup & guidelines for GHC, and related bikeshedding

Richard Eisenberg eir at cis.upenn.edu
Thu Jul 3 00:32:51 UTC 2014

I have mixed feelings on all of this.

First, a disclaimer: I have a significant (~10,000 lines of difference, perhaps) branch and would be hit hard by this change. (Branch is at github.com/goldfirere/ghc under the "nokinds" branch.) That said, if I'm careful as I'm merging, I could probably make this less painful by merging up to the commit right before the changeover, then merging just whitespace changes, then merging everything afterward. It probably wouldn't be much worse than merges are for me, anyway. (They're already terrible, but that's strictly my problem.)

But, I'm not sure of the practical benefits of doing any of this. The aesthete in me really wants this change to happen -- these inconsistencies and the tabs are surely a blemish. At the same time, I don't think my understanding of the code has ever really been hindered by these problems. If anything, I think a rigid style guide would slow me down a bit, because the perfectionist in me would require making sure all my code conformed well. Now, I try to have my code match the surrounding context, but I can see that it's not critical I get it bang on. So, addressing advocates of this change: why do you want it? Is it just for the sake of beauty (not to diminish the importance of beauty)? Or do the problems outlines here properly trip you up?

If the answers are mostly about beauty, that doesn't kill the proposal, but it allows us to evaluate the pros and cons a little more crisply.


On Jul 2, 2014, at 3:59 PM, Austin Seipp <austin at well-typed.com> wrote:

> Hi *,
> With that out of the way - today on IRC, there was some discussion
> about some stylistic/consistency issues in GHC, and being spurred by
> Johans recent proposal for top-level documentation, I figured perhaps
> we should beat the drum on this issue as well.
> The TL;DR is that GHC has a lot of inconsistent style issues,
> including things like:
> - Mixing literate haskell with non-literate haskell files
> - Legacy code with tabs and spaces intermixed
> - Related to the last one, trailing whitespace
> - Mixing styles of do notation in different parts of the compiler
> (braces vs no braces)
> - Probably things like indentation mismatches even in the same code
> - Probably many other things I've missed, obvious or not.
> These issues by themselves aren't too bad, but together they make the
> coding style for GHC very inconsistent, and this hurts maintainability
> a bit I feel. Furthermore, some of these issues block related
> improvements - for example,
> https://ghc.haskell.org/trac/ghc/ticket/9230 which is probably quite
> reasonable will likely be a bit annoying to implement until GHC itself
> is de-tabbed - we use -Werror during ./validate. This particular issue
> is what started the discussion.
> Also, with developers now using arcanist and phabricator, they have
> linting enabled for new patches, but they will often warn about
> surrounding issues, mostly tabs and trailing spaces. This is a bit
> annoying for submitters, and would be fixed by enforcing it.
> First attack plan
> ~~~~~~~~~~~~~~~
> So, to start, I'd like to propose that we make some guidelines for
> these kinds of things, and also a plan to fix some of them. To start:
> #1) We should really consider going ahead and detabbing the remaining
> files that have them. We already enforce this on new commits with git
> hooks, but by doing this, we can make -fwarn-tabs a default flag and
> then validate with -Werror in the development process.
> #2) Similarly, we should kill all the trailing whitespace. (I think
> this is less controversial than #1)
> #3) We should most certainly move the remaining files from literate
> haskell to non-literate haskell. Most of the files in the compiler are
> already in this form, and the literate haskell documentation can't be
> used to generate PDFs or anything similar. I suggest we get rid of it.
> More Haskell users use non-literate files anyway. This is probably the
> least controversial.
> Merge issues
> ~~~~~~~~~~~~~~~~~
> The reason we haven't done the above three things historically is that
> it makes merge conflicts nastier. A useful approximation suggested on
> IRC might be to detab and remove whitespace for files older than a
> certain date (say, 6 months).
> However, in general I'm thinking perhaps it's best to go ahead and
> bite the bullet. maybe. I'd like to know what other people think! If
> we have a vote and most people are in favor of doing this, maybe we
> should really do it.
> I'd especially like to hear about this if you have an outstanding branch.
> Some numbers on these issues
> ~~~~~~~~~~~~~~~~~~~~~~~~
> Here are some quick numbers on where most of the tabs reside, as well
> as the breakdown of literate files vs non-literate files.
> NOTE: these tests occurred in the 'compiler' subdirectory of the GHC
> repository, which is where most of the relevant code is.
> $ find . -type f -iname '*.hs' | wc -l
> 206
> $ find . -type f -iname '*.lhs' | wc -l
> 194
> Non-literate wins by a slim margin! But having the compiler divided in
> half is really not a good thing IMO...
> NOTE: this counts the number of lines which have tabs in them. It does
> not count the total number of tab occurrences.
> $ for x in `echo */`; do echo -n "$x:\t\t"; find $x -type f -regex
> '.*\.\(lhs\|hs\)' | xargs grep -P '\t' | wc -l; done
> basicTypes/: 919
> cbits/: 0
> cmm/: 38
> codeGen/: 0
> coreSyn/: 843
> deSugar/: 545
> ghci/: 90
> hsSyn/: 120
> iface/: 213
> llvmGen/: 0
> main/: 8
> nativeGen/: 1213
> parser/: 19
> prelude/: 182
> profiling/: 39
> rename/: 188
> simplCore/: 754
> simplStg/: 0
> specialise/: 0
> stgSyn/: 0
> stranal/: 336
> typecheck/: 1171
> types/: 301
> utils/: 220
> vectorise/: 0
> From these numbers, we can see a few useful things at least, primarily
> that there are definitely some places where removing tabs should be
> easy. For example, parser/, profiling/, main/, and cmm/ can all be
> de-tabbed without much of a problem, I think.
> nativeGen is very often not touched, so even though it has a *huge*
> amount of tabs, it can likely be de-tabbed as well with minimal
> impact.
> Other style issues
> ~~~~~~~~~~~~~~~~~
> We should also discuss some related issues, like what general
> block-width to use for indentations, naming conventions, and other
> stuff. However, I leave this all to you, and perhaps it is best we
> split that part off into a separate thread. Some things I'd like you
> all to consider:
> - Block width for indentation
> - Naming conventions (we use camelCase and_underscores_sometimes
> which isReally_confusing)
> - Import/export styles (I think we have some sloppiness here too)
> - Other things worth arguing forever about.
> Thoughts on the above issues?
> -- 
> Regards,
> Austin Seipp, Haskell Consultant
> Well-Typed LLP, http://www.well-typed.com/
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs

More information about the ghc-devs mailing list