RFC: style cleanup & guidelines for GHC, and related bikeshedding
Manuel M T Chakravarty
chak at cse.unsw.edu.au
Wed Jul 16 00:49:49 UTC 2014
Simon Marlow <marlowsd at gmail.com>:
> Austin didn't mention this, so I will: we have a wiki page for style
>
> https://ghc.haskell.org/trac/ghc/wiki/Commentary/CodingStyle
>
> It has a pretty clear set of guidelines for imports/exports, for example (that we don't follow as much as we should).
>
> I'd be in favour of changing .lhs files to .hs files, replacing all the \begin{code}...\end{code} with -}...{-. As I said in my reply to Simon, literate source files aren't providing any real benefit to us, and in the name of consistency this would be a positive step. I’m all in favour of gardening the code base to clean up things like this.
Yes!
> However, the best time for a big stylistic sweep is a time that minimizes the number of merges we have to do across these commits. That would be just before we branch for a new major release; hopefully at that point most of the feature branches will be merged and we're not going to merge any further patches into the previous release branch.
>
> I'm less enthusiastic about fixing whitespace things. It's a tough call, but I'm guessing that fixing it would cause more pain than not fixing it. Opinions might differ, and I wouldn’t mind at all if the consensus were to do a whitespace sweep too.
I think, the tabs should go — it’s been dragging on for a long time now. (The rest is less important.)
> One other thing I'd like to propose is an 80-column limit on new code. Personally I've always used an 80-column limit for various reasons. This is the biggest bikeshed ever and we could talk all day about it, but here are a couple of concrete points that I think are uncontroversial:
>
> - there has to be *some* limit, so that we know how wide to make our
> windows. The only valid discussion is what the limit should be.
>
> - Phabricator's side-by-side diffs are hard to read on a laptop screen
> when lines go beyond 80 columns.
>
> And I think 80 is a good enough number, especially for Haskell where you can pack a lot into an 80-column line. Phabricator is already flagging up >80 column lines in its linter, which is its default setting.
I used to be a 80 column guy, but moved away from that the last years. But you are right, there must be an upper limit and, if >80 is a problem for code reviews, then it’s a reasonable choice.
Cheers,
Manuel
> On 02/07/2014 12:59, Austin Seipp wrote:
>> Hi *,
>>
>> First off, WARNING: BIKESHEDDING AHEAD.
>>
>> With that out of the way - today on IRC, there was some discussion
>> about some stylistic/consistency issues in GHC, and being spurred by
>> Johans recent proposal for top-level documentation, I figured perhaps
>> we should beat the drum on this issue as well.
>>
>> The TL;DR is that GHC has a lot of inconsistent style issues,
>> including things like:
>>
>> - Mixing literate haskell with non-literate haskell files
>> - Legacy code with tabs and spaces intermixed
>> - Related to the last one, trailing whitespace
>> - Mixing styles of do notation in different parts of the compiler
>> (braces vs no braces)
>> - Probably things like indentation mismatches even in the same code
>> - Probably many other things I've missed, obvious or not.
>>
>> These issues by themselves aren't too bad, but together they make the
>> coding style for GHC very inconsistent, and this hurts maintainability
>> a bit I feel. Furthermore, some of these issues block related
>> improvements - for example,
>> https://ghc.haskell.org/trac/ghc/ticket/9230 which is probably quite
>> reasonable will likely be a bit annoying to implement until GHC itself
>> is de-tabbed - we use -Werror during ./validate. This particular issue
>> is what started the discussion.
>>
>> Also, with developers now using arcanist and phabricator, they have
>> linting enabled for new patches, but they will often warn about
>> surrounding issues, mostly tabs and trailing spaces. This is a bit
>> annoying for submitters, and would be fixed by enforcing it.
>>
>> First attack plan
>> ~~~~~~~~~~~~~~~
>>
>> So, to start, I'd like to propose that we make some guidelines for
>> these kinds of things, and also a plan to fix some of them. To start:
>>
>> #1) We should really consider going ahead and detabbing the remaining
>> files that have them. We already enforce this on new commits with git
>> hooks, but by doing this, we can make -fwarn-tabs a default flag and
>> then validate with -Werror in the development process.
>>
>> #2) Similarly, we should kill all the trailing whitespace. (I think
>> this is less controversial than #1)
>>
>> #3) We should most certainly move the remaining files from literate
>> haskell to non-literate haskell. Most of the files in the compiler are
>> already in this form, and the literate haskell documentation can't be
>> used to generate PDFs or anything similar. I suggest we get rid of it.
>> More Haskell users use non-literate files anyway. This is probably the
>> least controversial.
>>
>> Merge issues
>> ~~~~~~~~~~~~~~~~~
>>
>> The reason we haven't done the above three things historically is that
>> it makes merge conflicts nastier. A useful approximation suggested on
>> IRC might be to detab and remove whitespace for files older than a
>> certain date (say, 6 months).
>>
>> However, in general I'm thinking perhaps it's best to go ahead and
>> bite the bullet. maybe. I'd like to know what other people think! If
>> we have a vote and most people are in favor of doing this, maybe we
>> should really do it.
>>
>> I'd especially like to hear about this if you have an outstanding branch.
>>
>> Some numbers on these issues
>> ~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Here are some quick numbers on where most of the tabs reside, as well
>> as the breakdown of literate files vs non-literate files.
>>
>> NOTE: these tests occurred in the 'compiler' subdirectory of the GHC
>> repository, which is where most of the relevant code is.
>>
>> LITERATE vs NON-LITERATE:
>>
>> $ find . -type f -iname '*.hs' | wc -l
>> 206
>>
>> $ find . -type f -iname '*.lhs' | wc -l
>> 194
>>
>> Non-literate wins by a slim margin! But having the compiler divided in
>> half is really not a good thing IMO...
>>
>> NUMBER OF TABS PER SUBDIRECTORY:
>>
>> NOTE: this counts the number of lines which have tabs in them. It does
>> not count the total number of tab occurrences.
>>
>> $ for x in `echo */`; do echo -n "$x:\t\t"; find $x -type f -regex
>> '.*\.\(lhs\|hs\)' | xargs grep -P '\t' | wc -l; done
>> basicTypes/: 919
>> cbits/: 0
>> cmm/: 38
>> codeGen/: 0
>> coreSyn/: 843
>> deSugar/: 545
>> ghci/: 90
>> hsSyn/: 120
>> iface/: 213
>> llvmGen/: 0
>> main/: 8
>> nativeGen/: 1213
>> parser/: 19
>> prelude/: 182
>> profiling/: 39
>> rename/: 188
>> simplCore/: 754
>> simplStg/: 0
>> specialise/: 0
>> stgSyn/: 0
>> stranal/: 336
>> typecheck/: 1171
>> types/: 301
>> utils/: 220
>> vectorise/: 0
>>
>> From these numbers, we can see a few useful things at least, primarily
>> that there are definitely some places where removing tabs should be
>> easy. For example, parser/, profiling/, main/, and cmm/ can all be
>> de-tabbed without much of a problem, I think.
>>
>> nativeGen is very often not touched, so even though it has a *huge*
>> amount of tabs, it can likely be de-tabbed as well with minimal
>> impact.
>>
>> Other style issues
>> ~~~~~~~~~~~~~~~~~
>>
>> We should also discuss some related issues, like what general
>> block-width to use for indentations, naming conventions, and other
>> stuff. However, I leave this all to you, and perhaps it is best we
>> split that part off into a separate thread. Some things I'd like you
>> all to consider:
>>
>> - Block width for indentation
>> - Naming conventions (we use camelCase and_underscores_sometimes
>> which isReally_confusing)
>> - Import/export styles (I think we have some sloppiness here too)
>> - Other things worth arguing forever about.
>>
>> Thoughts on the above issues?
>>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
More information about the ghc-devs
mailing list