# RFC: style cleanup & guidelines for GHC, and related bikeshedding

Manuel M T Chakravarty chak at cse.unsw.edu.au
Wed Jul 16 00:49:49 UTC 2014

Simon Marlow <marlowsd at gmail.com>:
> Austin didn't mention this, so I will: we have a wiki page for style
>
>
> It has a pretty clear set of guidelines for imports/exports, for example (that we don't follow as much as we should).
>
> I'd be in favour of changing .lhs files to .hs files, replacing all the \begin{code}...\end{code} with -}...{-.  As I said in my reply to Simon, literate source files aren't providing any real benefit to us, and in the name of consistency this would be a positive step.  I’m all in favour of gardening the code base to clean up things like this.

Yes!

> However, the best time for a big stylistic sweep is a time that minimizes the number of merges we have to do across these commits.  That would be just before we branch for a new major release; hopefully at that point most of the feature branches will be merged and we're not going to merge any further patches into the previous release branch.
>
> I'm less enthusiastic about fixing whitespace things.  It's a tough call, but I'm guessing that fixing it would cause more pain than not fixing it.  Opinions might differ, and I wouldn’t mind at all if the consensus were to do a whitespace sweep too.

I think, the tabs should go — it’s been dragging on for a long time now. (The rest is less important.)

> One other thing I'd like to propose is an 80-column limit on new code. Personally I've always used an 80-column limit for various reasons. This is the biggest bikeshed ever and we could talk all day about it, but here are a couple of concrete points that I think are uncontroversial:
>
> - there has to be *some* limit, so that we know how wide to make our
>   windows.  The only valid discussion is what the limit should be.
>
> - Phabricator's side-by-side diffs are hard to read on a laptop screen
>   when lines go beyond 80 columns.
>
> And I think 80 is a good enough number, especially for Haskell where you can pack a lot into an 80-column line.  Phabricator is already flagging up >80 column lines in its linter, which is its default setting.

I used to be a 80 column guy, but moved away from that the last years. But you are right, there must be an upper limit and, if >80 is a problem for code reviews, then it’s a reasonable choice.

Cheers,
Manuel

> On 02/07/2014 12:59, Austin Seipp wrote:
>> Hi *,
>>
>> First off, WARNING: BIKESHEDDING AHEAD.
>>
>> With that out of the way - today on IRC, there was some discussion
>> about some stylistic/consistency issues in GHC, and being spurred by
>> Johans recent proposal for top-level documentation, I figured perhaps
>> we should beat the drum on this issue as well.
>>
>> The TL;DR is that GHC has a lot of inconsistent style issues,
>> including things like:
>>
>>  - Legacy code with tabs and spaces intermixed
>>  - Related to the last one, trailing whitespace
>>  - Mixing styles of do notation in different parts of the compiler
>> (braces vs no braces)
>>  - Probably things like indentation mismatches even in the same code
>>  - Probably many other things I've missed, obvious or not.
>>
>> These issues by themselves aren't too bad, but together they make the
>> coding style for GHC very inconsistent, and this hurts maintainability
>> a bit I feel. Furthermore, some of these issues block related
>> improvements - for example,
>> https://ghc.haskell.org/trac/ghc/ticket/9230 which is probably quite
>> reasonable will likely be a bit annoying to implement until GHC itself
>> is de-tabbed - we use -Werror during ./validate. This particular issue
>> is what started the discussion.
>>
>> Also, with developers now using arcanist and phabricator, they have
>> linting enabled for new patches, but they will often warn about
>> surrounding issues, mostly tabs and trailing spaces. This is a bit
>> annoying for submitters, and would be fixed by enforcing it.
>>
>> First attack plan
>> ~~~~~~~~~~~~~~~
>>
>> So, to start, I'd like to propose that we make some guidelines for
>> these kinds of things, and also a plan to fix some of them. To start:
>>
>>  #1) We should really consider going ahead and detabbing the remaining
>> files that have them. We already enforce this on new commits with git
>> hooks, but by doing this, we can make -fwarn-tabs a default flag and
>> then validate with -Werror in the development process.
>>
>>  #2) Similarly, we should kill all the trailing whitespace. (I think
>> this is less controversial than #1)
>>
>>  #3) We should most certainly move the remaining files from literate
>> haskell to non-literate haskell. Most of the files in the compiler are
>> already in this form, and the literate haskell documentation can't be
>> used to generate PDFs or anything similar. I suggest we get rid of it.
>> More Haskell users use non-literate files anyway. This is probably the
>> least controversial.
>>
>> Merge issues
>> ~~~~~~~~~~~~~~~~~
>>
>> The reason we haven't done the above three things historically is that
>> it makes merge conflicts nastier. A useful approximation suggested on
>> IRC might be to detab and remove whitespace for files older than a
>> certain date (say, 6 months).
>>
>> However, in general I'm thinking perhaps it's best to go ahead and
>> bite the bullet. maybe. I'd like to know what other people think! If
>> we have a vote and most people are in favor of doing this, maybe we
>> should really do it.
>>
>>
>> Some numbers on these issues
>> ~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Here are some quick numbers on where most of the tabs reside, as well
>> as the breakdown of literate files vs non-literate files.
>>
>> NOTE: these tests occurred in the 'compiler' subdirectory of the GHC
>> repository, which is where most of the relevant code is.
>>
>> LITERATE vs NON-LITERATE:
>>
>> $find . -type f -iname '*.hs' | wc -l >> 206 >> >>$ find . -type f -iname '*.lhs' | wc -l
>> 194
>>
>> Non-literate wins by a slim margin! But having the compiler divided in
>> half is really not a good thing IMO...
>>
>> NUMBER OF TABS PER SUBDIRECTORY:
>>
>> NOTE: this counts the number of lines which have tabs in them. It does
>> not count the total number of tab occurrences.
>>
>> $for x in echo */; do echo -n "$x:\t\t"; find \$x -type f -regex
>> '.*\.$$lhs\|hs$$' | xargs grep -P '\t' | wc -l; done
>> basicTypes/: 919
>> cbits/: 0
>> cmm/: 38
>> codeGen/: 0
>> coreSyn/: 843
>> deSugar/: 545
>> ghci/: 90
>> hsSyn/: 120
>> iface/: 213
>> llvmGen/: 0
>> main/: 8
>> nativeGen/: 1213
>> parser/: 19
>> prelude/: 182
>> profiling/: 39
>> rename/: 188
>> simplCore/: 754
>> simplStg/: 0
>> specialise/: 0
>> stgSyn/: 0
>> stranal/: 336
>> typecheck/: 1171
>> types/: 301
>> utils/: 220
>> vectorise/: 0
>>
>> From these numbers, we can see a few useful things at least, primarily
>> that there are definitely some places where removing tabs should be
>> easy. For example, parser/, profiling/, main/, and cmm/ can all be
>> de-tabbed without much of a problem, I think.
>>
>> nativeGen is very often not touched, so even though it has a *huge*
>> amount of tabs, it can likely be de-tabbed as well with minimal
>> impact.
>>
>> Other style issues
>> ~~~~~~~~~~~~~~~~~
>>
>> We should also discuss some related issues, like what general
>> block-width to use for indentations, naming conventions, and other
>> stuff. However, I leave this all to you, and perhaps it is best we
>> split that part off into a separate thread. Some things I'd like you
>> all to consider:
>>
>>  - Block width for indentation
>>  - Naming conventions (we use camelCase and_underscores_sometimes
>> which isReally_confusing)
>>  - Import/export styles (I think we have some sloppiness here too)
>>  - Other things worth arguing forever about.
>>
>> Thoughts on the above issues?
>>
> _______________________________________________
> ghc-devs mailing list