CI: auto-retry known random Cypress E2E failures + PHPUnit's dreaded "RecursiveDirectoryIterator failed to open directory" (since Drupal 11.2) + ComponentAudit DB-dependent result order + Playwright flakiness
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3571997. -->
Reported by: [wim leers](https://www.drupal.org/user/99777)

Related to !617 !616 !591 !552
>>>

<h3 id="overview">Overview</h3>
<h4>PHPUnit</h4>
<p>1&#65039;&#8419; 97% of the PHPUnit CI job failures are due to this pattern:</p>
<pre>UnexpectedValueException: RecursiveDirectoryIterator::__construct(/builds/project/canvas/web/sites/simpletest/36558090/files/config/sync): Failed to open directory: No such file or directory</pre>
<p>Example: <a href="https://git.drupalcode.org/project/canvas/-/jobs/8379417#L1868">https://git.drupalcode.org/project/canvas/-/jobs/8379417#L1868</a></p>
<p>This has only started happening since Drupal 11.2; before that, it <em>never</em> happened. In other words, it's due to upstream changes: either in Drupal core, in PHPUnit, or in some other dependency.</p>
<p>It looks like a race condition (and it probably is), but fixing it is out of scope for Canvas. The fact that the failure jumps arbitrarily between different tests when retrying manually supports this.</p>
<p>2&#65039;&#8419; 2% of the PHPUnit CI job failures are due to DB-specific sorts: <a href="https://git.drupalcode.org/project/canvas/-/jobs/8416113">https://git.drupalcode.org/project/canvas/-/jobs/8416113</a></p>
<p>4&#65039;&#8419; 1% of the PHPUnit CI job failures are due to <code>ComponentAudit</code> suffering from DB-specific sorts: <a href="https://www.drupal.org/project/canvas/issues/3571997#comment-16473385">#32</a> + <a href="https://www.drupal.org/project/canvas/issues/3571997#comment-16474925">#37</a></p>
<h4>Cypress E2E</h4>
<p>3&#65039;&#8419; Similar story to 1&#65039;&#8419; for some of the Cypress E2E tests. They're a lot more reliable now than a year ago, but sometimes (seemingly when there's high CI infra load), there's a very high failure rate.
And we find ourselves re-testing time and time again.</p>
<p>Most notably: <code>global-regions-interact.cy.js</code>.</p>
<h4>Playwright</h4>
<p><strong>5&#65039;&#8419; There's a low failure rate, but given 1&#65039;&#8419; + 2&#65039;&#8419; + 3&#65039;&#8419; have been solved, it's now the leading cause of CI failure!</strong></p>
<p>See <a href="https://www.drupal.org/project/canvas/issues/3571997#comment-16474940">#39</a> for details + proposal.</p>
<h3 id="proposed-resolution">Proposed resolution</h3>
<p>Automatically retry, so we don't have to retry manually! &#128556;</p>
<h3 id="ui-changes">User interface changes</h3>
<p>None, except for us working on Canvas:<br>
<img src="https://www.drupal.org/files/issues/2026-02-06/Screenshot%202026-02-06%20at%205.25.57%E2%80%AFPM.png"></p>
<p>&#129395;</p>

> Related issue: [Issue #2862699](https://www.drupal.org/node/2862699)
> Related issue: [Issue #3572371](https://www.drupal.org/node/3572371)
> Related issue: [Issue #3562563](https://www.drupal.org/node/3562563)
> Related issue: [Issue #3582249](https://www.drupal.org/node/3582249)
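
The "retry only known random failures" idea from the proposed resolution could be sketched roughly as follows. This is an illustrative Python sketch, not the change made in the related merge requests: `KNOWN_FLAKY_PATTERNS`, `is_known_flaky` and `run_with_retry` are hypothetical names, the only pattern listed is the `RecursiveDirectoryIterator` message quoted in the overview, and the test job is assumed to be available as a callable that returns its exit code and output.

```python
import re
from typing import Callable, Tuple

# Hypothetical allow-list of failure signatures considered random.
# The single entry is derived from the error message quoted in the overview.
KNOWN_FLAKY_PATTERNS = [
    re.compile(
        r"RecursiveDirectoryIterator::__construct\(.*\): Failed to open directory"
    ),
]


def is_known_flaky(output: str) -> bool:
    """Return True if the job output matches a known random failure."""
    return any(pattern.search(output) for pattern in KNOWN_FLAKY_PATTERNS)


def run_with_retry(
    run: Callable[[], Tuple[int, str]], max_attempts: int = 3
) -> Tuple[int, str, int]:
    """Run a job, retrying only when it fails with a known random failure.

    Returns (exit_code, output, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        exit_code, output = run()
        # Success, or a failure that is not on the allow-list: stop here so
        # that genuine regressions are reported immediately.
        if exit_code == 0 or not is_known_flaky(output):
            return exit_code, output, attempt
    # Every attempt hit a known random failure; report the last one.
    return exit_code, output, max_attempts
```

Matching on the failure signature, rather than retrying every failed job, keeps real regressions visible on the first attempt while sparing contributors the manual re-run for the known random cases.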