<div dir="ltr"><div dir="ltr"><div>Hi all,</div><div><br></div><div>This weekend at ZuriHac, I wanted to start making it possible to explore fragile tests.</div><div><br></div><div>Fragile tests are tests that pass nondeterministically. Marking a test fragile means that it does not influence the overall success of the testsuite. It's been a way of sweeping problems under the rug.</div><div><br></div><div>For a few years, the GHC test infrastructure has been recording fragile test results in a database. We record whether each test passed or failed.</div><div><br></div><div>Now we can start peeking under the rug to see what kinds of patterns have developed.</div><div><br></div><div>I was mostly interested in whether or not fragile tests are truly fragile. It looks like many are not.</div><div><br></div><div>The dashboard buckets results by month. Every cell shows the pass rate for a given month (column) and test (row).</div><div><br></div><div>Red cells mean 0% success; green cells mean 100% success. Shades of yellow are everything in between.</div><div><br></div><div>It's remarkable how much green there is. One would assume even fragile tests would fail for legitimate reasons sometimes!</div><div><br></div><div>At this point, the visualization is mostly good for seeing large trends. Any other observations would be appreciated!</div><div><br></div><div>The dashboard is interactive at <a href="https://grafana.gitlab.haskell.org/goto/XcuO3ZUIg?orgId=2">https://grafana.gitlab.haskell.org/goto/XcuO3ZUIg?orgId=2</a></div><div><br></div><div><img src="cid:ii_lx90t5tn0" alt="image.png" width="558" height="265"><br></div><div><br></div><div>-Bryan</div><div><br></div><div><br></div></div></div>