[GHC DevOps Group] CI effort status

Gershom B gershomb at gmail.com
Mon Feb 5 03:57:31 UTC 2018


Let me articulate my question a bit more clearly. Looking at the
devops group charter
(https://ghc.haskell.org/trac/ghc/wiki/DevOpsGroupCharter), it says
the following about the goals:


The mission of the GHC DevOps Group is to

* to take leadership of the devops aspects of GHC,
* to resource it better, and
* to broaden the sense of community ownership and control of GHC.


Further it says under “Resources”:

"The GHC DevOps Group identifies the ongoing and one-off devops
requirements of GHC. It develops and manages the strategies and
projects to implement the needed tools, processes, and documentation
to meet those requirements. To that end and on the basis of actionable
project plans, it seeks to obtain the necessary resources from
organisations that rely on GHC as a production-ready tool. By doing
this, we aim to unlock more resources than are currently available. At
the same time, we seek broad community ownership to minimise the load
on any single contributor and to avoid a single point of failure."

My concern is that, at the moment, there has been discussion regarding
devops aspects, and perhaps a broadened sense of community ownership
and control.

But I do not see better resourcing, although the initial contributions
of CI configurations were certainly a good kickstart. As such, I do
not see community ownership in the sense of the latter paragraph —
i.e. in the sense that it will “minimise the load on any single
contributor” and thus “avoid a single point of failure.”

The way this works, as I understand it, is a quid-pro-quo. In order to
accomplish goals with regard to regularity of GHC releases,
streamlined processes, etc., there needs to be at least some infusion
of resources, presumably “unlocked” from “organisations that rely on
GHC as a production-ready tool”.

Otherwise this quickly becomes expecting a variety of new work from
the same cast of characters, just with more voices on a mailinglist
chiming in with proposals as to what they would like to see accomplished.

I am well aware that assembling resources and pulling them together is
_hard_, and many attempts to do so founder. I’ve been a participant in
any number of foundered attempts myself over the years, or attempts
that have accomplished a few useful things, but far from even the
modest initial goals they set out with.

But I do not want this aspect of the DevOps Group charter to fade from
consciousness — getting these resources is not automatic. It requires
constant shaking of tree branches, and constant attempts to
reformulate problems and break them down in ways that make them more
amenable to collaboration, as well as not-infrequent followup on partial
commitments or indications towards such in the past, to try to pin
down their concrete implementation.

What I am seeing right now is that there is a danger of settling into
a “new status quo” with no new resources, and I think that would not
be a good thing for the future prospects of the DevOps Group, and
probably quickly lead to it being yet another stillborn effort.

I am not offering anything at the moment here — I have nothing _to_
offer. But this is my attempt to provide a gentle “poke” to all those
on the list who thought they might have some ability to play a role in
this to please be forthcoming with some proposals as to how they might
help. One important aspect to bear in mind is that at this point, it
seems to me that money is _not_ the issue. That is to say, if someone
with experience wrangling CI volunteered some skilled ops time, that
would be ideal. But if there was a proffer of
some such time, but from e.g. some associated contractor who would
need to be paid to help out, then it would probably be feasible to
fund that as well, from any number of sources.

The plan for migrating CI is only partially complete. The working
hypothesis is that when it is complete, it will mean less work in the
long-run. But in my opinion, dealing with someone else’s flaky boxes
(e.g. those of CircleCI) is not much better than dealing with your own
flaky boxes, except you have to now bother other people to figure out
more stuff for you. So if we could get someone with experience to be a
deputy CircleCI ombudsman or the like, and take charge of some aspect
of this work, I think we would have a much greater chance of A)
success, and B) genuinely distributing the workload more widely.

Best,
Gershom


On February 4, 2018 at 10:19:51 PM, Manuel Chakravarty
(mchakravarty at me.com) wrote:

> Hi Gershom,
>
> Ben is surely the main actor and he has put considerable effort into this. However, we (Tweag) did help out initially writing some of the original CI configurations.
>
> Having said that, it would be absolutely fabulous if other developers could help out. Please let Ben and me know if you know anybody who would be happy to help!
>
> Cheers,
> Manuel
>
> > 05.02.2018 13:41 Gershom B :
> >
> > A question from an observer here -- my understanding was that part of
> > the plan with the shift in CI infrastructure was that the burden
> > would be lifted from Ben's exclusive shoulders here and there would be
> > some greater division of labor, which is made possible in part by
> > using shared standard services rather than self-hosted solutions. But
> > at the moment I see reports largely of Ben continuing to try to
> > resolve issues and move this plan forward on his own. Is there still
> > some medium-term plan to have a more collective effort in this
> > transition?
> >
> > --Gershom
> >
> >
> > On Sat, Feb 3, 2018 at 12:28 PM, Ben Gamari wrote:
> >> Ben Gamari writes:
> >>
> >>> Manuel M T Chakravarty writes:
> >>>
> >>>> Hi Ben,
> >>>>
> >>>> I meant to post on https://github.com/appveyor/ci/issues/517
> >>>> to request an increased
> >>>> limit, but didn’t get around to it yet. If you’d be able to put a
> >>>> request on that issue, that’d be great.
> >>>>
> >>> Sure, I'm on it.
> >>>
> >> I have created a new GHCAppveyor (due to name length constraints)
> >> Appveyor [1] project and configured it to pull from the ghc/ghc GitHub
> >> mirror. I have also requested, and was granted, the typical build time
> >> limit extension to 90 minutes.
> >>
> >> Unfortunately, it seems that even 90 minutes is insufficient to even
> >> finish a build, much less run the testsuite, under Appveyor's build
> >> environment. Given where the build was terminated, I would guess that
> >> it would need at least another 10 minutes of compilation to make it to
> >> the testsuite. On top of this the testsuite will require another ~35
> >> minutes (as it is quite heavy on process spawning, which is very
> >> expensive on Windows).
> >>
> >> I haven't yet inquired as to whether a further build time extension
> >> would be possible. However, I am not hopeful that our plan of using
> >> Appveyor will be feasible without purchasing build time.
> >>
> >>
> >> On the CircleCI front, I have been continuing work to clear up the
> >> remaining build failures. At this point only two remain:
> >>
> >> * I have a patch (D4360) to fix T11489 by running our build jobs as an
> >> unprivileged user
> >>
> >> * scc01 appears to be slightly non-deterministic; I am investigating
> >> this.
> >>
> >> Unfortunately the CircleCI infrastructure is still exhibiting a fair
> >> amount of flakiness. See, for instance, this build which is shown to be
> >> "Cancelled" despite having finished (and having apparently been run at
> >> least twice). Judging from the build history, this seems to be a fairly
> >> regular occurrence. I have contacted CircleCI about this but have not
> >> yet heard back.
> >>
> >> I am also occasionally seeing rather extreme variance in test times.
> >> In particular the linux-llvm target usually completes in around 4 hours
> >> 20 minutes, but sometimes takes over 5 hours, resulting in the build
> >> timing out. It appears that the build hangs during the testsuite run
> >> (e.g. [2]); it's not impossible that this is due to a bug in the
> >> testsuite driver but I have been able to reproduce this neither locally
> >> nor remotely on CircleCI infrastructure so it has proved to be a tough
> >> nut to crack.
> >>
> >> Cheers,
> >>
> >> - Ben
> >>
> >> [1] https://ci.appveyor.com/project/GHCAppveyor/ghc
> >> [2] https://circleci.com/gh/ghc/ghc/1558
> >>
> >> _______________________________________________
> >> Ghc-devops-group mailing list
> >> Ghc-devops-group at haskell.org
> >> https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group
> >>
>

