Windows testsuite failures

Phyx lonetiger at gmail.com
Fri Jan 17 07:38:52 UTC 2020


On Fri, Jan 17, 2020 at 7:02 AM Ömer Sinan Ağacan <omeragacan at gmail.com>
wrote:

> > Now we have rewritten the CI and it's pointing out actual issues in the
> > compiler. And your suggestion is well let's just ignore it.
>
> When is the last time Windows CI caught an actual bug? All I see is random
> system failures [1, 2, 3].
>

[1]: The CI user is missing symbolic link privileges; something has gone
wrong with the permissions on that CI machine.
      The testsuite already has code to either symlink or copy. We should
fix the permissions, add permission detection to the Python code, or
switch to copying.
[2]: git checkout error; the disk is probably full. Testsuite runs tend to
create a lot of temp files which aren't cleaned up. Over time the disk
fills and you get errors such as these.
      There's a cron job to periodically clean these, but of course that is
prone to a race condition. This can be made more reliable by using OS event
triggers instead of a cron job, i.e. monitor for "disk 80% full" events and
run the cleanup when one fires.
[3]: It's either trying to execute a non-executable file or something it
executed loaded a shared library for a different architecture. Hard to tell
which one by just that output. Will need more logs.
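
For [1] and [2], a rough sketch of what the detection and the cleanup hook
could look like. This is not the testsuite driver's actual code: every
function name here is made up for illustration, the 80% threshold is just
the figure mentioned above, and the wiring of an OS event trigger is left
out (the cleanup function is only what such a trigger might call).

```python
import os
import shutil


def can_symlink(probe_dir):
    """Return True if the current user may create symlinks in probe_dir."""
    target = os.path.join(probe_dir, "probe_target")
    link = os.path.join(probe_dir, "probe_link")
    with open(target, "w"):
        pass
    try:
        os.symlink(target, link)
    except (OSError, NotImplementedError):
        # On Windows this usually means the symlink privilege is missing.
        return False
    else:
        os.remove(link)
        return True
    finally:
        os.remove(target)


def link_or_copy(src, dst, symlinks_ok):
    """Symlink src to dst when allowed, otherwise fall back to a copy."""
    if symlinks_ok:
        os.symlink(src, dst)
    else:
        shutil.copy2(src, dst)


def disk_fraction_used(path):
    """Fraction (0.0 to 1.0) of the disk holding path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def cleanup_if_nearly_full(path, temp_root, threshold=0.80, used_fraction=None):
    """Empty temp_root when the disk holding path is at least threshold full.

    used_fraction can be passed in by the caller (e.g. from an event
    payload); otherwise it is measured here. Returns True if a cleanup ran.
    """
    if used_fraction is None:
        used_fraction = disk_fraction_used(path)
    if used_fraction < threshold:
        return False
    for entry in os.scandir(temp_root):
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path, ignore_errors=True)
        else:
            os.remove(entry.path)
    return True
```

The idea would be to call can_symlink once at driver start-up and pass the
result to link_or_copy, so a runner with broken permissions degrades to
copying instead of failing the job.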

Now, to answer your question: [4] and [5] are issues the CI caught that
were quite important.

[4] https://gitlab.haskell.org/ghc/ghc/issues/17480
[5] https://gitlab.haskell.org/ghc/ghc/issues/17691

Just to name two. I can go on: plugin test failures, which pointed out that
someone submitted a patch tested only on ELF that broke loading of plugins
as non-shared objects, etc.
The list is quite long.


>
> It must be catching *some* bugs, but that's a rare event in my experience.
>

Sure, if you go "ahh it's just Windows that's broken" and don't look at the
underlying issues.


> Sure, I don't write Windows-specific code (e.g. IO manager, or library
> code),
> but then why am I fighting the Windows CI literally every day, it makes no
> sense. Give an option to skip Windows CI for my patches.
>
> > How about you use some of that energy to help instead of taking the easy
> > way? And I bet you're going to say you don't care about Windows to which
> > I would say I don't care about the non-threaded runtime and wish we would
> > get rid of it. But can't always get what you want.
>
> I'm not suggesting we release buggy GHCs for Windows or stop Windows
> support.
>

I'm sorry, how is disabling the Windows CI not exactly that? Disabling the
CI just means you test it even less. Testing it even less means that by the
time you get to testing it, the issues are too many to fix. Over time you
just stop trying and stop releasing it. So sorry, how *exactly* is your
suggestion not exactly that?


>
> > And to say we'll actually fix anything before release doesn't align with
> > what I've seen so far, which had me scrambling last minute to ensure we
> > can release Windows instead of making releases without it.
>
> Are you saying we skip a platform we support when it's buggy? That makes no
> sense. I don't know when did Windows become a first-tier platform but
> since it
> is now we should be releasing Windows binaries similar to Linux and OSX
> binaries.
>

It's *always* been a tier one platform as far as I can tell. It's certainly
been for the past 6 years.


> It's not uncommon to do some testing for every patch, and do more
> comprehensive
> testing before releases. We did this many times in other projects in the
> past
> and I know some other compilers do this today.
>

Yes, but a project that doesn't test a tier one platform during
development, which is what you want to do, is not treating it as tier one.
Which means you won't fix it for release.


>
> > Quite frankly I don't need you to tell me to submit MRs to fix it since
> > that's what I spent again a lot of time doing. Or maybe you would like to
> > pay my paycheck so I can spend more than a considerable amount of my free
> > time on it.
>
> I wish someone paid me for the time I wasted because I'm only paid by the
> time I
> spend productively. I'd be happier waiting for the CI then.
>

Yeah, not waiting for CI is how we got in this mess in the first place.

Tamar.


> Ömer
>
> [1]: https://gitlab.haskell.org/ghc/ghc/-/jobs/237457
> [2]: https://gitlab.haskell.org/osa1/ghc/-/jobs/238236
> [3]: https://gitlab.haskell.org/osa1/ghc/-/jobs/237279
>
> Phyx <lonetiger at gmail.com> wrote on Fri, 17 Jan 2020 at 09:49:
> >
> > Oh I spent a non-insignificant amount of time back in the Phabricator
> > days to make the CI stable. Now because people were committing to master
> > directly without going through CI it was always a cat and mouse game and
> > I gave up eventually.
> >
> > Now we have rewritten the CI and it's pointing out actual issues in the
> > compiler. And your suggestion is well let's just ignore it.
> >
> > How about you use some of that energy to help instead of taking the easy
> > way? And I bet you're going to say you don't care about Windows to which
> > I would say I don't care about the non-threaded runtime and wish we would
> > get rid of it. But can't always get what you want.
> >
> > And to say we'll actually fix anything before release doesn't align with
> > what I've seen so far, which had me scrambling last minute to ensure we
> > can release Windows instead of making releases without it.
> >
> > Quite frankly I don't need you to tell me to submit MRs to fix it since
> > that's what I spent again a lot of time doing. Or maybe you would like to
> > pay my paycheck so I can spend more than a considerable amount of my free
> > time on it.
> >
> > Kind regards,
> > Tamar
> >
> >
> > Sent from my Mobile
> >
> > On Fri, Jan 17, 2020, 06:17 Ömer Sinan Ağacan <omeragacan at gmail.com> wrote:
> >>
> >> We release more often than once in 6 months.
> >>
> >> We clearly have no idea how to test on Windows. If you know how to do it
> >> then feel free to submit a MR. Otherwise blocking every MR indefinitely
> >> is worse than testing Windows less frequently.
> >>
> >> Ömer
> >>
> >> Phyx <lonetiger at gmail.com> wrote on Fri, 17 Jan 2020 at 09:10:
> >> >
> >> > Sure because only testing once every 6 months is a very very good
> >> > idea...
> >> >
> >> > Sent from my Mobile
> >> >
> >> > On Fri, Jan 17, 2020, 06:03 Ömer Sinan Ağacan <omeragacan at gmail.com> wrote:
> >> >>
> >> >> Hi Ben,
> >> >>
> >> >> Can we please disable Windows CI? I've spent more time fighting the CI
> >> >> than doing useful work this week, it's really frustrating.
> >> >>
> >> >> Since we have no idea how to fix it maybe we should test Windows only
> >> >> before a release, manually (and use bisect in case of regressions).
> >> >>
> >> >> Ömer
> >> >>
> >> >> Ben Gamari <ben at smart-cactus.org> wrote on Tue, 14 Jan 2020 at 14:30:
> >> >> >
> >> >> > Hi all,
> >> >> >
> >> >> > Currently Windows CI is a bit flaky due to some unfortunately rather
> >> >> > elusive testsuite driver bugs. Progress in resolving this has been a
> >> >> > bit slow due to travel over the last week but I will be back home
> >> >> > tomorrow and should be able to resolve the issue soon thereafter.
> >> >> >
> >> >> > Cheers,
> >> >> >
> >> >> > - Ben
> >> >> > _______________________________________________
> >> >> > ghc-devs mailing list
> >> >> > ghc-devs at haskell.org
> >> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>