[GHC DevOps Group] CI effort status

Manuel M T Chakravarty manuel.chakravarty at tweag.io
Wed Feb 7 02:34:20 UTC 2018


I would more than welcome concrete offers of resources or suggestions on how to get more resources. Mathieu and I have worked towards getting additional resources since we announced the group, but these things (apparently) take time. We surely could use the help of everybody involved in this group!

Cheers,
Manuel

PS: Just a gentle reminder that several Tweag people (including me) have spent Tweag time on this effort. This certainly doesn’t match the investment of Facebook or Microsoft, but it serves as constructive proof that this is not just for large firms.

> 07.02.2018 00:31 Simon Peyton Jones <simonpj at microsoft.com>:
> 
> Thanks Gershom.
> 
> I think of the devops group as
> 
> 1 Broadening "ownership" of GHC's development and release processes,
>  so that a larger group of people feel that they can influence and
>  contribute to GHC's development, and hence feel more comfortable
>  making GHC mission-critical to their business or other plans
> 
> 2 Making it more likely that what we do with GHC actually matches
>  what GHC's users want
> 
> 3 Broadening and deepening the pool of stakeholders who are
>  willing to contribute time and/or money to making GHC into the
>  solidly reliable tool that they need.   (Currently we have
>  Microsoft, Facebook, IOHK contributing directly, I think.)
> 
> I think Gershom's message is really about (3).  To me, progress on
> (1) and (2) will help to make the case for (3).  But I don’t want
> to lose sight of (3).  The factor that precipitated the devops group's
> formation was a sudden awareness about how vulnerable we are, as a 
> community, to a very small number of supporters.
> 
> Discussion at ICFP made me think that several other companies would
> consider making donations, if (a) we had a compelling case that it'd
> be money well spent, and (b) the actual process worked. For (b) I think
> some would prefer a central fund; others might prefer a specific task
> or set of tasks to fund.  The discussion on mechanism is a bit stalled
> I think.
> 
> We don't currently have a crisis.  But I think there may already be
> things for which application of money might help: e.g. paying for CircleCI
> cycles rather than spending Ben's time trying to shoehorn everything into
> the for-free limits.  Maybe Appveyor is similar.
> 
> So I would very much welcome it if the Devops group could take, as an
> important task (even if it is not day-to-day urgent), working out a
> sustainable model for GHC's maintenance, support, CI, and releases.
> 
> Simon
> 
> 
> |  -----Original Message-----
> |  From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org]
> |  On Behalf Of Gershom B
> |  Sent: 05 February 2018 03:58
> |  To: Manuel Chakravarty <mchakravarty at me.com>
> |  Cc: ghc-devops-group at haskell.org
> |  Subject: Re: [GHC DevOps Group] CI effort status
> |  
> |  Let me articulate my question a bit more clearly. Looking at the
> |  devops group charter
> |  (https://ghc.haskell.org/trac/ghc/wiki/DevOpsGroupCharter), it says
> |  the following about the goals:
> |  
> |  
> |  The mission of the GHC DevOps Group is to
> |  
> |  * to take leadership of the devops aspects of GHC,
> |  * to resource it better, and
> |  * to broaden the sense of community ownership and control of GHC.
> |  
> |  
> |  Further it says under “Resources”:
> |  
> |  "The GHC DevOps Group identifies the ongoing and one-off devops
> |  requirements of GHC. It develops and manages the strategies and
> |  projects to implement the needed tools, processes, and documentation
> |  to meet those requirements. To that end and on the basis of actionable
> |  project plans, it seeks to obtain the necessary resources from
> |  organisations that rely on GHC as a production-ready tool. By doing
> |  this, we aim to unlock more resources than are currently available. At
> |  the same time, we seek broad community ownership to minimise the load
> |  on any single contributor and to avoid a single point of failure."
> |  
> |  My concern is that, at the moment, there has been discussion regarding
> |  devops aspects, and perhaps a broadened sense of community ownership
> |  and control.
> |  
> |  But I do not see better resourcing, although the initial contributions
> |  of CI configurations were certainly a good kickstart. As such, I do
> |  not see community ownership in the sense of the latter paragraph —
> |  i.e. in the sense that it will “minimise the load on any single
> |  contributor” and thus “avoid a single point of failure.”
> |  
> |  The way this works, as I understand it, is a quid-pro-quo. In order to
> |  accomplish goals with regards to regularity of GHC releases,
> |  streamlined processes, etc., there needs to be at least some infusion
> |  of resources, presumably “unlocked” from "organisations that rely on
> |  GHC as a production-ready tool”.
> |  
> |  Otherwise this quickly becomes expecting a variety of new work from
> |  the same cast of characters, just with more voices on a mailinglist
> |  chiming in with proposals as to what they would like to see accomplished.
> |  
> |  I am well aware that assembling resources and pulling them together is
> |  _hard_, and many attempts to do so founder. I’ve been a participant in
> |  any number of foundered attempts myself over the years, or attempts
> |  that have accomplished a few useful things, but far from even the
> |  modest initial goals they set out with.
> |  
> |  But I do not want this aspect of the DevOps Group charter to fade from
> |  consciousness — getting these resources is not automatic. It requires
> |  constant shaking of tree branches, and constant attempts to
> |  reformulate problems and break them down in ways that make more
> |  collaboration amenable — as well as not-infrequent followup on partial
> |  commitments or indications towards such in the past, to try to pin
> |  down their concrete implementation.
> |  
> |  What I am seeing right now is that there is a danger of settling into
> |  a “new status quo” with no new resources, and I think that would not
> |  be a good thing for the future prospects of the DevOps Group and would
> |  probably quickly lead to it being yet another stillborn effort.
> |  
> |  I am not offering anything at the moment here — I have nothing _to_
> |  offer. But this is my attempt to provide a gentle “poke” to all those
> |  on the list who thought they might have some ability to play a role in
> |  this to please be forthcoming with some proposals as to how they might
> |  help. One important aspect to bear in mind is that at this point, it
> |  seems to me that money is _not_ the issue. That is to say, if there
> |  was a volunteer of some skilled-ops time which had experience
> |  wrangling with CI, that would be ideal. But if there was a proffer of
> |  some such time, but from e.g. some associated contractor who would
> |  need to be paid to help out, then it would probably be feasible to
> |  fund that as well, from any number of sources.
> |  
> |  The plan for migrating CI is only partially complete. The working
> |  hypothesis is that when it is complete, it will mean less work in the
> |  long-run. But in my opinion, dealing with someone else’s flaky boxes
> |  (e.g. those of CircleCI) is not much better than dealing with your own
> |  flaky boxes, except you have to now bother other people to figure out
> |  more stuff for you. So if we could get someone with experience to be a
> |  deputy CircleCI ombudsman or the like, and take charge of some aspect
> |  of this work, I think we would have a much greater chance of A)
> |  success, and B) genuinely distributing the workload more widely.
> |  
> |  Best,
> |  Gershom
> |  
> |  
> |  On February 4, 2018 at 10:19:51 PM, Manuel Chakravarty
> |  (mchakravarty at me.com) wrote:
> |  
> |  > Hi Gershom,
> |  >
> |  > Ben is surely the main actor and he has put considerable effort into
> |  > this. However, we (Tweag) did help out initially by writing some of
> |  > the original CI configurations.
> |  >
> |  > Having said that, it would be absolutely fabulous if other
> |  > developers could help out. Please let Ben and me know if you know
> |  > anybody who would be happy to help!
> |  >
> |  > Cheers,
> |  > Manuel
> |  >
> |  > > 05.02.2018 13:41 Gershom B :
> |  > >
> |  > > A question from an observer here -- my understanding was that part
> |  > > of the plan with the shift in CI infrastructure was that the burden
> |  > > would be lifted from Ben's exclusive shoulders here, and there would
> |  > > be some greater division of labor, which is made possible in part by
> |  > > using shared standard services rather than self-hosted solutions.
> |  > > But at the moment I see reports largely of Ben continuing to try to
> |  > > resolve issues and move this plan forward on his own. Is there still
> |  > > some medium-term plan to have a more collective effort in this
> |  > > transition?
> |  > >
> |  > > --Gershom
> |  > >
> |  > >
> |  > > On Sat, Feb 3, 2018 at 12:28 PM, Ben Gamari wrote:
> |  > >> Ben Gamari writes:
> |  > >>
> |  > >>> Manuel M T Chakravarty writes:
> |  > >>>
> |  > >>>> Hi Ben,
> |  > >>>>
> |  > >>>> I meant to post on
> |  > >>>> https://github.com/appveyor/ci/issues/517
> |  > >>>> to request an increased limit, but didn’t get around to it yet.
> |  > >>>> If you’d be able to put a request on that issue, that’d be great.
> |  > >>>>
> |  > >>> Sure, I'm on it.
> |  > >>>
> |  > >> I have created a new Appveyor project [1], named GHCAppveyor (due
> |  > >> to name length constraints), and configured it to pull from the
> |  > >> ghc/ghc GitHub mirror. I have also requested, and was granted, the
> |  > >> typical build time limit extension to 90 minutes.
> |  > >>
> |  > >> Unfortunately, it seems that even 90 minutes is insufficient to
> |  > >> finish a build, much less run the testsuite, under Appveyor's
> |  > >> build environment. Given where the build was terminated, I would
> |  > >> guess that it would need at least another 10 minutes of compilation
> |  > >> to make it to the testsuite. On top of this, the testsuite will
> |  > >> require another ~35 minutes (as it is quite heavy on process
> |  > >> spawning, which is very expensive on Windows).
> |  > >>
> |  > >> I haven't yet inquired as to whether a further build time extension
> |  > >> would be possible. However, I am not hopeful that our plan of using
> |  > >> Appveyor will be feasible without purchasing build time.
> |  > >>
> |  > >>
> |  > >> On the CircleCI front, I have been continuing work to clear up the
> |  > >> remaining build failures. At this point only two remain:
> |  > >>
> |  > >> * I have a patch (D4360) to fix T11489 by running our build jobs as
> |  > >> an unprivileged user
> |  > >>
> |  > >> * scc01 appears to be slightly non-deterministic; I am
> |  > >> investigating this.
> |  > >>
> |  > >> Unfortunately the CircleCI infrastructure is still exhibiting a
> |  > >> fair amount of flakiness. See, for instance, this build which is
> |  > >> shown to be "Cancelled" despite having finished (and having
> |  > >> apparently been run at least twice). Judging from the build
> |  > >> history, this seems to be a fairly regular occurrence. I have
> |  > >> contacted CircleCI about this but have not yet heard back.
> |  > >>
> |  > >> I am also occasionally seeing rather extreme variance in test times.
> |  > >> In particular, the linux-llvm target usually completes in around 4
> |  > >> hours 20 minutes, but sometimes takes over 5 hours, resulting in the
> |  > >> build timing out. It appears that the build hangs during the
> |  > >> testsuite run (e.g. [2]); it's not impossible that this is due to a
> |  > >> bug in the testsuite driver, but I have been able to reproduce this
> |  > >> neither locally nor remotely on CircleCI infrastructure, so it has
> |  > >> proved to be a tough nut to crack.
> |  > >>
> |  > >> Cheers,
> |  > >>
> |  > >> - Ben
> |  > >>
> |  > >> [1] https://ci.appveyor.com/project/GHCAppveyor/ghc
> |  > >> [2] https://circleci.com/gh/ghc/ghc/1558
> |  > >>
> |  > >> _______________________________________________
> |  > >> Ghc-devops-group mailing list
> |  > >> Ghc-devops-group at haskell.org
> |  > >> https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group
> |  > >>
> |  >


