CI: auto-retry known random Cypress E2E failures + PHPUnit's dreaded "RecursiveDirectoryIterator failed to open directory" (since Drupal 11.2) + ComponentAudit DB-dependent result order + Playwright flakiness
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3571997. -->
Reported by: [wim leers](https://www.drupal.org/user/99777)
Related to !617 !616 !591 !552
>>>
<h3 id="overview">Overview</h3>
<h4>PHPUnit</h4>
<p>1️⃣ 97% of the PHPUnit CI job failures are due to this pattern:</p>
<pre>UnexpectedValueException: RecursiveDirectoryIterator::__construct(/builds/project/canvas/web/sites/simpletest/36558090/files/config/sync): Failed to open directory: No such file or directory</pre>
<p>Example: <a href="https://git.drupalcode.org/project/canvas/-/jobs/8379417#L1868">https://git.drupalcode.org/project/canvas/-/jobs/8379417#L1868</a></p>
<p>This only started happening with Drupal 11.2; before that, it <em>never</em> happened. IOW: it's due to upstream changes, either in Drupal core, in PHPUnit, or in some other dependency.</p>
<p>It sounds very race condition-y, and it probably is: upon manually retrying, the failure jumps arbitrarily between different tests. Fixing the underlying cause is out of scope for Canvas, though.</p>
<p>2️⃣ 2% of the PHPUnit CI job failures are due to DB-specific sort order: <a href="https://git.drupalcode.org/project/canvas/-/jobs/8416113">https://git.drupalcode.org/project/canvas/-/jobs/8416113</a></p>
<p>4️⃣ 1% of the PHPUnit CI job failures are due to <code>ComponentAudit</code> suffering from DB-specific sort order: <a href="https://www.drupal.org/project/canvas/issues/3571997#comment-16473385">#32</a> + <a href="https://www.drupal.org/project/canvas/issues/3571997#comment-16474925">#37</a></p>
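<p>To illustrate why 2️⃣ and 4️⃣ fail only some of the time: when a query sorts on a non-unique column (or does not sort at all), the order of tied rows is up to the database, so an assertion on exact order can pass on one DB and fail on another. The sketch below is not Canvas code; the <code>usages</code> table, its columns and <code>fetch_usages()</code> are hypothetical. It only shows the general remedy of adding a unique tie-breaker to the sort.</p>

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usages (id INTEGER PRIMARY KEY, component TEXT)")
    conn.executemany(
        "INSERT INTO usages (id, component) VALUES (?, ?)",
        [(3, "hero"), (1, "hero"), (2, "card")],
    )
    return conn


def fetch_usages(conn: sqlite3.Connection) -> list:
    # Sorting by `component` alone leaves the two "hero" rows tied, and the
    # database may return tied rows in any order. Adding the unique `id`
    # as a second sort key makes the result order fully determined.
    query = "SELECT id, component FROM usages ORDER BY component, id"
    return conn.execute(query).fetchall()


rows = fetch_usages(make_demo_db())
# rows == [(2, "card"), (1, "hero"), (3, "hero")]
```

<p>Where the order genuinely does not matter, the alternative is to have the test compare results order-insensitively instead.</p>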
<h4>Cypress E2E</h4>
<p>3️⃣ Similar story to 1️⃣ for some of the Cypress E2E tests. They're a lot more reliable now than a year ago, but sometimes (seemingly when there's high CI infra load) there's a very high failure rate, and we find ourselves re-testing time and time again.</p>
<p>Most notably: <code>global-regions-interact.cy.js</code>.</p>
<h4>Playwright</h4>
<p><strong>5️⃣ There's a low failure rate, but now that 1️⃣ + 2️⃣ + 3️⃣ have been solved, it's the leading cause of CI failure!</strong></p>
<p>See <a href="https://www.drupal.org/project/canvas/issues/3571997#comment-16474940">#39</a> for details + proposal.</p>
<h3 id="proposed-resolution">Proposed resolution</h3>
<p>Automatically retry, so we don't have to retry manually! 😬</p>
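<p>As a rough sketch of the idea (not the actual change in the linked merge requests): retry a failed command only when its output matches a known random-failure signature, so that genuine failures still fail on the first attempt. The names <code>KNOWN_FLAKY_PATTERNS</code>, <code>is_known_flaky()</code> and <code>run_with_retries()</code> are hypothetical; the pattern is derived from the log line quoted in the overview.</p>

```python
import re
import subprocess
from typing import Callable, Sequence, Tuple

# Signatures of known random failures (assumption: matched against the
# combined stdout/stderr of the test command).
KNOWN_FLAKY_PATTERNS = [
    re.compile(
        r"RecursiveDirectoryIterator::__construct\(.*\): Failed to open directory"
    ),
]

Runner = Callable[[Sequence[str]], Tuple[int, str]]


def is_known_flaky(output: str) -> bool:
    """Return True if the output contains a known random-failure signature."""
    return any(pattern.search(output) for pattern in KNOWN_FLAKY_PATTERNS)


def run_command(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def run_with_retries(
    cmd: Sequence[str], max_retries: int = 2, runner: Runner = run_command
) -> Tuple[int, int]:
    """Run cmd; retry up to max_retries times, but only for known flaky failures.

    Returns (final exit code, number of attempts made).
    """
    attempts = 0
    while True:
        attempts += 1
        code, output = runner(cmd)
        if code == 0:
            return 0, attempts
        # Give up when retries are exhausted or the failure is not a known
        # random one: unknown failures must surface immediately.
        if attempts > max_retries or not is_known_flaky(output):
            return code, attempts
```

<p>The <code>runner</code> parameter exists only so the retry logic can be exercised without spawning a real process. Limiting retries to known signatures is a deliberate choice here: a blanket retry would also hide newly introduced flakiness.</p>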
<h3 id="ui-changes">User interface changes</h3>
<p>None, except for those of us working on Canvas:<br>
<img src="https://www.drupal.org/files/issues/2026-02-06/Screenshot%202026-02-06%20at%205.25.57%E2%80%AFPM.png"></p>
<p>🥳</p>
> Related issue: [Issue #2862699](https://www.drupal.org/node/2862699)
> Related issue: [Issue #3572371](https://www.drupal.org/node/3572371)
> Related issue: [Issue #3562563](https://www.drupal.org/node/3562563)
> Related issue: [Issue #3582249](https://www.drupal.org/node/3582249)