# RFC: style cleanup & guidelines for GHC, and related bikeshedding

Simon Marlow marlowsd at gmail.com
Tue Jul 8 14:31:58 UTC 2014

Austin didn't mention this, so I will: we have a wiki page for style

It has a pretty clear set of guidelines for imports/exports, for example
(that we don't follow as much as we should).

I'd be in favour of changing .lhs files to .hs files, replacing all the
\begin{code}...\end{code} with -}...{-.  As I said in my reply to Simon,
literate source files aren't providing any real benefit to us, and in
the name of consistency this would be a positive step.  I'm all in
favour of gardening the code base to clean up things like this.

However, the best time for a big stylistic sweep is a time that
minimizes the number of merges we have to do across these commits.  That
would be just before we branch for a new major release; hopefully at
that point most of the feature branches will be merged and we're not
going to merge any further patches into the previous release branch.

I'm less enthusiastic about fixing whitespace things.  It's a tough
call, but I'm guessing that fixing it would cause more pain than not
fixing it.  Opinions might differ, and I wouldn't mind at all if the
consensus were to do a whitespace sweep too.

One other thing I'd like to propose is an 80-column limit on new code.
Personally I've always used an 80-column limit for various reasons.
This is the biggest bikeshed ever and we could talk all day about it,
but here are a couple of concrete points that I think are uncontroversial:

- there has to be *some* limit, so that we know how wide to make our
windows.  The only valid discussion is what the limit should be.

- Phabricator's side-by-side diffs are hard to read on a laptop screen
when lines go beyond 80 columns.

And I think 80 is a good enough number, especially for Haskell where you
can pack a lot into an 80-column line.  Phabricator is already flagging
up >80 column lines in its linter, which is its default setting.

Cheers,
Simon

On 02/07/2014 12:59, Austin Seipp wrote:
> Hi *,
>
> First off, WARNING: BIKESHEDDING AHEAD.
>
> With that out of the way - today on IRC, there was some discussion
> about some stylistic/consistency issues in GHC, and being spurred by
> Johans recent proposal for top-level documentation, I figured perhaps
> we should beat the drum on this issue as well.
>
> The TL;DR is that GHC has a lot of inconsistent style issues,
> including things like:
>
>   - Legacy code with tabs and spaces intermixed
>   - Related to the last one, trailing whitespace
>   - Mixing styles of do notation in different parts of the compiler
> (braces vs no braces)
>   - Probably things like indentation mismatches even in the same code
>   - Probably many other things I've missed, obvious or not.
>
> These issues by themselves aren't too bad, but together they make the
> coding style for GHC very inconsistent, and this hurts maintainability
> a bit I feel. Furthermore, some of these issues block related
> improvements - for example,
> https://ghc.haskell.org/trac/ghc/ticket/9230 which is probably quite
> reasonable will likely be a bit annoying to implement until GHC itself
> is de-tabbed - we use -Werror during ./validate. This particular issue
> is what started the discussion.
>
> Also, with developers now using arcanist and phabricator, they have
> linting enabled for new patches, but they will often warn about
> surrounding issues, mostly tabs and trailing spaces. This is a bit
> annoying for submitters, and would be fixed by enforcing it.
>
> First attack plan
> ~~~~~~~~~~~~~~~
>
> So, to start, I'd like to propose that we make some guidelines for
> these kinds of things, and also a plan to fix some of them. To start:
>
>   #1) We should really consider going ahead and detabbing the remaining
> files that have them. We already enforce this on new commits with git
> hooks, but by doing this, we can make -fwarn-tabs a default flag and
> then validate with -Werror in the development process.
>
>   #2) Similarly, we should kill all the trailing whitespace. (I think
> this is less controversial than #1)
>
>   #3) We should most certainly move the remaining files from literate
> haskell to non-literate haskell. Most of the files in the compiler are
> already in this form, and the literate haskell documentation can't be
> used to generate PDFs or anything similar. I suggest we get rid of it.
> More Haskell users use non-literate files anyway. This is probably the
> least controversial.
>
> Merge issues
> ~~~~~~~~~~~~~~~~~
>
> The reason we haven't done the above three things historically is that
> it makes merge conflicts nastier. A useful approximation suggested on
> IRC might be to detab and remove whitespace for files older than a
> certain date (say, 6 months).
>
> However, in general I'm thinking perhaps it's best to go ahead and
> bite the bullet. maybe. I'd like to know what other people think! If
> we have a vote and most people are in favor of doing this, maybe we
> should really do it.
>
>
> Some numbers on these issues
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> Here are some quick numbers on where most of the tabs reside, as well
> as the breakdown of literate files vs non-literate files.
>
> NOTE: these tests occurred in the 'compiler' subdirectory of the GHC
> repository, which is where most of the relevant code is.
>
> LITERATE vs NON-LITERATE:
>
> $find . -type f -iname '*.hs' | wc -l > 206 > >$ find . -type f -iname '*.lhs' | wc -l
> 194
>
> Non-literate wins by a slim margin! But having the compiler divided in
> half is really not a good thing IMO...
>
> NUMBER OF TABS PER SUBDIRECTORY:
>
> NOTE: this counts the number of lines which have tabs in them. It does
> not count the total number of tab occurrences.
>
> $for x in echo */; do echo -n "$x:\t\t"; find \$x -type f -regex
> '.*\.$$lhs\|hs$$' | xargs grep -P '\t' | wc -l; done
> basicTypes/: 919
> cbits/: 0
> cmm/: 38
> codeGen/: 0
> coreSyn/: 843
> deSugar/: 545
> ghci/: 90
> hsSyn/: 120
> iface/: 213
> llvmGen/: 0
> main/: 8
> nativeGen/: 1213
> parser/: 19
> prelude/: 182
> profiling/: 39
> rename/: 188
> simplCore/: 754
> simplStg/: 0
> specialise/: 0
> stgSyn/: 0
> stranal/: 336
> typecheck/: 1171
> types/: 301
> utils/: 220
> vectorise/: 0
>
>  From these numbers, we can see a few useful things at least, primarily
> that there are definitely some places where removing tabs should be
> easy. For example, parser/, profiling/, main/, and cmm/ can all be
> de-tabbed without much of a problem, I think.
>
> nativeGen is very often not touched, so even though it has a *huge*
> amount of tabs, it can likely be de-tabbed as well with minimal
> impact.
>
> Other style issues
> ~~~~~~~~~~~~~~~~~
>
> We should also discuss some related issues, like what general
> block-width to use for indentations, naming conventions, and other
> stuff. However, I leave this all to you, and perhaps it is best we
> split that part off into a separate thread. Some things I'd like you
> all to consider:
>
>   - Block width for indentation
>   - Naming conventions (we use camelCase and_underscores_sometimes
> which isReally_confusing)
>   - Import/export styles (I think we have some sloppiness here too)
>   - Other things worth arguing forever about.
>
> Thoughts on the above issues?
>