Proposal: accept pull requests on GitHub

Austin Seipp austin at well-typed.com
Thu Sep 3 04:41:46 UTC 2015


(JFYI: I hate to announce my return with a giant novel of
negative-nancy-ness about a proposal that just came up. I'm sorry
about this!)

TL;DR: I'm strongly -1 on this, because I think it introduces a lot of
associated costs for everyone, the benefits aren't really clear, and I
think it obscures the real core issue about "how do we get more
contributors" and how to make that happen. Needless to say, GitHub
does not magically solve both of these AFAICS.

As is probably already widely known, I'm fairly against GitHub because
I think at best its tools are mediocre and inappropriate for GHC - but
I also don't think this proposal or the alternatives stemming from it
are very good, and that it reduces visibility of the real, core
complaints about what is wrong. Some of those problems may be with
Phabricator, but it's hard to sort the wheat from the chaff, so to
speak.

For one, having two code review tools of any form is completely
bonkers, TBQH. This is my biggest 'obvious' blocker. If we're going to
switch, we should just switch. Having to have people decide how to
contribute with two tools is as crazy as having two VCSs and just a
way of asking people to get *more* confused, and have us answer more
questions. That's something we need to avoid.

For the same reason, I'm also not a fan of 'use third party thing to
augment other thing to remove its deficiencies making it OK', because
the problem is _it adds surface area_ and other problems in other
cases. It is a solution that should be considered a last resort,
because it is a logical solution that applies to everything. If we
have a bot that moves GH PRs into Phab and then review them there, the
surface area of what we have to maintain and explain has suddenly
exploded: because now instead of 1 thing we have 3 things (GH, Phab,
bot) and the 3 interactions between them, for a multiplier of *six*
things we have to deal with. And then we use reviewable,io, because GH
reviews are terrible, adding a 4th mechanism? It's rube goldberg-ian.
We can logically 'automate' everything in all ways to make all
contributors happy, but there's a real *cognitive* overhead to this
and humans don't scale as well as computers do. It is not truly
'automated away' if the cognitive burden is still there.

I also find it extremely strange to tell people "By the way, this
method in which you've contributed, as was requested by community
members, is actually a complete proxy for the real method of
contributing, you can find all your imported code here". How is this
supposed to make contribution *easier* as opposed to just more
confusing? Now you've got the impression you're using "the real thing"
when in reality it's shoved off somewhere else to have the nitpicking
done. Just using Phabricator would be less complicated, IMO, and much
more direct.

The same thing goes for reviewable.io. Adding it as a layer over
GitHub just makes the surface area larger, and puts less under our
control. And is it going to exist in the same form in 2 or 3 years?
Will it continue to offer the same tools, the same workflows that we
"like", and what happens when we hit a wall? It's easy to say
"probably" or "sure" to all this, until we hit something we dislike
and have no possibility of fixing.

And once you do all this, BTW, you can 'never go back'. It seems so
easy to just say 'submit pull requests' once and nothing else, right?
Wrong. Once you commit to that infrastructure, it is *there* and
simply taking it out from under the feet of those using it is not only
unfortunate, it is *a huge timesink to undo it all*. Which amounts to
it never happening. Oh, but you can import everything elsewhere! The
problem is you *can't* import everything, but more importantly you
can't *import my memories in another way*, so it's a huge blow to
contributors to ask them about these mental time sinks, then to forget
them all. And as your project grows, this becomes more of a memory as
you made a first and last choice to begin with.

Phabricator was 'lucky' here because it had the gateway into being the
first review tool for us. But that wasn't because it was *better* than
GitHub. It was because we were already using it, and it did not
interact badly with our other tools or force us to compromise things -
so the *cost* was low. The cost is immeasurably higher by default
against GitHub because of this, at least to me. That's just how it is
sometimes.

Keep in mind there is a cost to everything and how you fix it. GitHub
is not a simple patch to add a GHC feature. It is a question that
fundamentally concerns itself with the future of the project for a
long time. The costs must be analyzed more aggressively. Again,
Phabricator had 'first child' preferential treatment. That's not
something we can undo now.

I know this sounds like a lot of ad hoc mumbo jumbo, but please bear
with me: we need to identify the *root issue* here to fix it.
Otherwise we will pay for the costs of an improper fix for a long
time, and we are going to keep having this conversation over, and over
again. And we need to weigh in the cost of fixing it, which is why I
mention that so much.

So with all this in mind, you're back to just using GitHub. But again
GitHub is quite mediocre at best. So what is the point of all this?
It's hinted at here:

> the number of contributions will go up, commits will be smaller, and there will be more of them per pull request (contributors will be able to put style changes and refactorings into separate commits, without jumping through a bunch of hoops).

The real hint is that "the number of contributions will go up". That's
a noble goal and I think it's at the heart of this proposal.

Here's the meat of it question: what is the cost of achieving this
goal? That is, what amount of work is sufficient to make this goal
realizable, and finally - why is GitHub *the best use of our time for
achieving this?* That's one aspect of the cost - that it's the best
use of the time. I feel like this is fundamentally why I always seem
to never 'get' this argument, and I'm sure it's very frustrating on
behalf of the people who have talked to me about it and like GitHub.
But I feel like I've never gotten a straight answer for GHC.

If the goal is actually "make more people contribute", that's pretty
broad. I can make that very easy: give everyone who ever submits a
patch push access. This is a legitimate way to run large projects that
has worked. People will almost certainly be more willing to commit,
especially when overhead on patch submission is reduced so much. Why
not just do that instead? It's not like we even mandate code review,
although we could. You could reasonably trust CI to catch and revert
things a lot of the time for people who commit directly to master. We
all do it sometimes.

I'm being serious about this. I can start doing that tomorrow because
the *cost is low*, both now and reasonably speaking into some
foreseeable future. It is one of many solutions to raw heart of the
proposal. GitHub is not a low cost move, but also, it is a *long term
cost* because of the technical deficiencies it won't aim to address
(merge commits are ugly, branch reviews are weak, ticket/PR namespace
overlaps with Trac, etc etc) or that we'll have to work around.

That means that if we want GitHub to fix the "give us more
contributors" problem, and it has a high cost, it not only has _to fix
the problem_, it also has to do that well enough to offset its cost. I
don't think it's clear that is the case right now, among a lot of
other solutions.

I don't think the root issue is "We _need_ GitHub to get more
contributors". It sounds like the complaint is more "I don't like how
Phabricator works right now". That's an important distinction, because
the latter is not only more specific, it's more actionable:

  - Things like Arcanist can be tracked as a Git submodule. There is
little to no pain in this, it's low cost, and it can always be
synchronized with Phabricator. This eliminates the "Must clone
arcanist" and "need to upgrade arcanist" points.

  - Similarly when Phabricator sometimes kills a lot of builds, it's
because I do an upgrade. That's mostly an error on my part and I can
simply schedule upgrades regularly, barring hotfixes or somesuch. That
should basically eliminate these. The other build issues are from
picking the wrong base commit from the revision, I think, which I
believe should be fixable upstream (I need to get a solid example of
one that isn't a mega ultra patch.)

  - If Harbormaster is not building dependent patches as mentioned in
WhyNotPhabricator, that is a bug, and I have not been aware of it.
Please make me aware of it so I can file bugs! I seriously don't look
at _every_ patch, I need to know this. That could have probably been
fixed ASAP otherwise.

  - We can get rid of the awkwardness of squashes etc by using
Phabricator's "immutable" history, although it introduces merge
commits. Whether this is acceptable is up to debate (I dislike merge
commits, but could live with it).

  - I do not understand point #3, about answering questions. Here's
the reality: every single one of those cases is *almost always an
error*. That's not a joke. Forgetting to commit a file, amending
changes in the working tree, and specifying a reviewer are all total
errors as it stands today. Why is this a minus? It catches a useful
class of 'interaction bugs'. If it's because sometimes Phabricator
yells about build arifacts in the tree, those should be .gitignore'd.
If it's because you have to 'git stash' sometimes, this is fairly
trivial IMO. Finally, specifying reviewers IS inconvenient, but
currently needed. We could easily assign a '#reviewers' tag that would
add default reviewers.
    - In the future, Phabricator will hopefully be able to
automatically assign the right reviewers to every single incoming
patch, based on the source file paths in the tree, using the Owners
tool. Technically, we could do that today if we wanted, it's just a
little more effort to add more Herald rules. This will be far, far
more robust than anything GitHub can offer, and eliminates point #3.

  - Styling, linting etc errors being included, because reviews are
hard to create: This is tangential IMO. We need to just bite the
bullet on this and settle on some lint and coding styles, and apply
them to the tree uniformly. The reality is *nobody ever does style
changes on their own*, and they are always accompanied by a diff, and
they always have to redo the work of pulling them out, Phab or not.
Literally 99% of the time we ask for this, it happens this way.
Perhaps instead we should just eliminate this class of work by just
running linters over all of the source code at once, and being happy
with it.

  Doing this in fact has other benefits: like `arc lint` will always
_correctly_ report when linting errors are violated. And we can reject
patches that violate them, because they will always be accurate.

  - As for some of the quotes, some of them are funny, but the real
message lies in the context. :) In particular, there have been several
cases (such as the DWARF work) where the idea was "write 30 commits
and put them on Phabricator". News flash: *this is bad*, no matter
whether you're using Phabricator or not, because it makes reviewing
the whole thing immensely difficult from a reviewer perspective. The
point here is that we can clear this up by being more communicative
about what we expect of authors of large patches, and communicating
your intent ASAP so we can get patches in as fast as possible. Writing
a patch is the easiest part of the work.

And more:

  - Clean up the documentation, it's a mess. It feels nice that
everything has clear, lucid explanations on the wiki, but the wiki is
ridiculously massive and we have a tendancy for 'link creep' where we
spread things out. The contributors docs could probably stand to be
streamlined. We would have to do this anyway, moving to GitHub or not.

  - Improve the homepage, directly linking to this aforementioned page.

  - Make it clear what we expect of contributors. I feel like a lot of
this could be explained by having a 5 minute drive-by guide for
patches, and then a longer 10-minute guide about A) How to style
things, B) How to format your patches if you're going to contribute
regularly, C) Why it is this way, and D) finally links to all the
other things you need to know. People going into Phabricator expecting
it to behave like GitHub is a problem (more a cultural problem IMO but
that's another story), and if this can't be directly fixed, the best
thing to do is make it clear why it isn't.

Those are just some of the things OTTOMH, but this email is already
way too long. This is what I mean though: fixing most of these is
going to have *seriously smaller cost* than moving to GitHub. It does
not account for "The GitHub factor" of people contributing "just
because it's on GitHub", but again, that value has to outweigh the
other costs. I'm not seriously convinced it does.

I know it's work to fix these things. But GitHub doesn't really
magically make a lot of our needs go away, and it's not going to
magically fix things like style or lint errors, the fact Travis-CI is
still pretty insufficient for us in the long term (and Harbormaster is
faster, on our own hardware, too), or that it will cause needlessly
higher amounts of spam through Trac and GitHub itself. I don't think
settling on it as - what seems to be - a first resort, is a really
good idea.


On Wed, Sep 2, 2015 at 4:09 PM, Niklas Hambüchen <mail at nh2.me> wrote:
> On 02/09/15 22:42, Kosyrev Serge wrote:
>> As a wild idea -- did anyone look at /Gitlab/ instead?
>
> Hi, yes. It does not currently have a sufficient review functionality
> (cannot handle multiple revisions easily).
>
> On 02/09/15 20:51, Simon Marlow wrote:
>> It might feel better
>> for the author, but discovering what changed between two branches of
>> multiple commits on github is almost impossible.
>
> I disagree with the first part of this: When the UI of the review tool
> is good, it is easy to follow. But there's no open-source implementation
> of that around.
>
> I agree that it is not easy to follow on Github.
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>



-- 
Regards,

Austin Seipp, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/


More information about the ghc-devs mailing list