Moving Haddock development out of GHC tree

Fri Aug 8 05:25:01 UTC 2014

Hello,

This is a slightly long e-mail, but I ask that you voice your opinion if
you have ever changed the GHC API. You can skim over the details; simply
know that this proposal saves me a vast amount of time, allows me to try
to find contributors and doesn't impact GHC negatively. It seems like a
win-win scenario for GHC and Haddock. The GHC team's workflow does not
change and will not require any new commitment: I do all the work and I
think it's a 1 line change in sync-all when the transition is ready.
Here it is:

It is no secret that many core Haskell projects lack developer hands and
Haddock is no exception: the current maintainers are Simon Hengel and
myself. Simon does not have much time so currently all the issues and
updates are up to me. Ideally I would like some more people to come and
hack on Haddock, but there are a couple of problems with trying to
recruit folk for this:

1. Interacting with the GHC API is not the easiest thing. This is
Haddock's problem but I thought I'd mention it here.

2. Haddock resides directly in the GHC tree and it is currently
*required* that it compiles with GHC HEAD. This is a huge barrier to
entry for anyone: today I wanted to make a fairly simple change but it
still took me 3 validate runs to be at least somewhat confident that I
didn't break much in GHC. On top of this I had help from Edward Z. Yang
on IRC and information from him on what exactly the issue was. If I were
to do everything alone it would have taken even more validates. A
validate is not fast on my machine by any means: it takes an hour or two.

Here is what I want to do unless there are major objections: I want to
move the active development away from the GHC tree. Below is how it
would work. For simplicity please imagine that we have *just* released
7.8.3.

* Haddock development would concentrate on supporting the last public
release of GHC: I stop developing against GHC HEAD and currently would
develop against 7.8.3.

* GHC itself checks out Haddock as a submodule as it does now. The only
difference is that it points at whatever commit worked last. Let us
assume it is the Haddock 2.14.3 release commit. The vital difference
from the current state is that GHC will no longer track changes in the
master branch.

* Now when the GHC API changes, things proceed as they normally do:
whoever is responsible for the changes pops into the Haddock submodule,
applies the patches necessary for Haddock to build with HEAD and
everyone is happy. What does *not* happen is these patches going into
master: I ignore them and keep working with 7.8.3.

* When a GHC release rolls around, I update Haddock to work with the new
API so that people with the new release can still use it. Once it works
against the new API, GHC can start tracking from that commit onwards and
proceed as usual.

Here are the advantages:

* I don't have to work against GHC HEAD. This means I don't have to
build GHC HEAD and I don't need to worry about GHC API changes. I don't
waste 2-4 hours building before hacking and validating after hacking to
make any minor changes and to make sure I haven't broken anything.

* More importantly, anyone who wants to write a patch for Haddock can
now do so easily: all they need is a recent compiler rather than being
forced to build HEAD. Building and validating against HEAD is a **huge**
barrier to entry.

* I only have to care about GHC API changes once a release and not twice
a week. I think PatternSynonyms have changed 4 times in a month but the
end result at release time is the same and that's what people care about.

* It is less work for anyone changing the GHC API: they only have to
deal with their own changes and not with my changes which add features
or whatever.

* If I break something in Haddock HEAD, GHC is not affected.

* If Haddock's binary interface doesn't change, we may even allow more
versions of GHC to be compatible through CPP and other such trickery. If
we were to do it today, it would be an increased burden on the GHC team
to deal with those.

* I can release as often as I want against the same compiler version.
Currently doing this requires backporting features (see the v2.14
branch) which is a massive pain. I no longer have to tell the users
‘yes, your bug is fixed but to get it you need to compile GHC HEAD or
wait 6-12 months until the next GHC release’. I have to do this a lot.
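
To illustrate the CPP idea from the list above, here is a minimal sketch
of how one source file could pick a code path per compiler version. It
is an assumption about how such support might look, not actual Haddock
code: apiGeneration is a hypothetical name made up for this example. The
only real piece is __GLASGOW_HASKELL__, the version macro GHC defines
when CPP is enabled (708 for 7.8.x, 710 for 7.10.x).

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- Hypothetical helper (not part of Haddock): names the GHC API
-- generation this module was compiled against. The branch is chosen
-- at compile time by the C preprocessor, so only one definition of
-- apiGeneration ever reaches the type checker.
apiGeneration :: String
#if __GLASGOW_HASKELL__ >= 710
apiGeneration = "7.10 or later"
#elif __GLASGOW_HASKELL__ >= 708
apiGeneration = "7.8"
#else
apiGeneration = "older than 7.8"
#endif

main :: IO ()
main = putStrLn ("Built against GHC API: " ++ apiGeneration)
```

In real code each branch would hold the version-specific GHC API calls
rather than a string; the point is only that the differences stay local
to a few guarded definitions.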

Here are the disadvantages and why I think they don't make a big difference:

* GHC HEAD doesn't get any new-and-cool features that we might
implement. I say this doesn't matter because no one uses varying GHC
HEAD versions to develop actual software, documentation and all. What I
mean to say is that the only user of the Haddock that's developed in GHC
tree is GHC itself. The only case where GHC actually used in-tree
Haddock was when Herbert generated documentation for base-4.7 early for
me to eye before the release. Even this doesn't matter because so close
to the release I'll already have the existing GHC API integrated anyway.
Again, it does not matter if GHC HEAD itself doesn't get pretty operator
rendering or whatever right when I implement it because no one cares
about it until it's release time. I know that many people simply set
HADDOCK_DOCS=NO to save time. The actual users only care about Haddock
that works with 7.6.x, 7.8.x, 7.10.x; only GHC cares about in-betweens
and only for the purpose of being able to build and validate.

* The GHC team can't easily contribute features and get them back
immediately. In part it doesn't matter because of the previous point,
and in the last year or so there were no features contributed directly
from GHC except those necessary to keep Haddock compiling. This just
means there's no demand for such a close relationship.

* Haddock-affecting changes in the GHC parser don't ‘take effect’
straight away. This is my loss and, considering how infrequently such
changes happen, it's a tiny price to pay to have to wait until release.

* …that's it, no other disadvantages that I can think of, but that's why
I'm sending it to the list to review!

What's worth mentioning is that the no-external-dependencies thing still
applies because even though we no longer need to compile against HEAD,
we still need to compile against the tree at release time.

In summary:

My life gets easier because I stop wasting it on playing with the whole
GHC tree. The GHC team's life gets easier because they don't have to
deal with the changes I make. My life gets even easier because I only
have to make big API updates once a release. I can actually start
looking for contributors.

When a release rolls around, GHC and Haddock ‘meet up’, we make sure it
all works, release happens, GHC starts tracking from that point and we
part ways until the next release.

What do you think? If there are no major objections in one week then I
will assume I am good to go with this.

Transition from the current setup:
If I receive some patches I was promised, I will make a 2.14.4
bugfix/compat release, make sure that master is up to date, and then
create something like a GHC-tracking branch from master for GHC to
track. I will then abandon that branch and not push to it unless it is
GHC release time. The next commit in master will bring Haddock to a
state where it works with 7.8.3: yes, this means removing all new API
stuff until 7.10 or 7.8.4 or whatever. GHC API changes go onto
GHC-tracking while all the stuff I write goes into master. When GHC
makes a release or is about to, I make master work with that and make
GHC-tracking point to that instead.

Thanks!
-- 
Mateusz K.
