Add non-required visual regression tests for the four built UI elements
--- AI TRACKER METADATA ---
Update Summary: Add non-required visual regression tests for the four built UI elements
Check-in Date: MM/DD/YYYY (US format) [When we should see progress/get an update]
Due Date: MM/DD/YYYY (US format) [When the issue should be fully completed]
Blocked by: [#XXXXXX] (New issues on new lines)
Additional Collaborators:
AI Tracker found here: https://www.drupalstarforge.ai/
--- END METADATA ---
### Problem/Motivation
We ship four compiled front-end bundles, built by the `js_build` CI job:
| # | Element | Source | Build output |
|---|---------|--------|--------------|
| 1 | mdxeditor | `ui/mdxeditor` | `ui/mdxeditor/dist` |
| 2 | json-schema-editor | `ui/json-schema-editor` | `ui/json-schema-editor/dist` |
| 3 | default-tools-editor | `ui/default-tools-editor` | `ui/default-tools-editor/dist` |
| 4 | CKEditor plugin (ai_ckeditor) | `modules/ai_ckeditor` | `modules/ai_ckeditor/js/build` |
A build can succeed (npm exits 0) while the rendered widget is broken — missing CSS, a component that fails to mount, a bundler/dependency change that silently drops part of the UI. Our PHPUnit suite does not catch "it compiled but looks wrong". We want a lightweight visual check that each of the four elements **renders** the way we expect, so a bad build is visible in MR pipelines.
### Goal
Add a visual regression test that, for each of the four built elements:
1. Renders the built widget in a browser (we already run FunctionalJavascript against Selenium in CI).
2. Captures a screenshot.
3. Compares it pixel-for-pixel against a committed baseline image using ImageMagick.
4. **Fails the job if the difference exceeds the threshold**, but the job is **not required** (`allow_failure: true`) so it never blocks a merge — it surfaces as a warning.
5. **Stores the captured screenshot as a CI artifact**, so when a change is legitimate the maintainer can download it and commit it as the new baseline.
6. On failure, prints an error message that **links to the documentation** explaining how to review the diff and update the baseline.
### Proposed resolution
#### Test implementation
- One FunctionalJavascript test (extending the existing `BaseClassFunctionalJavascriptTests`, which already has `takeScreenshot()` / `getDriver()->getScreenshot()`), using either a data provider or one test method per element. For each element: visit the route/page that renders it, wait for the JS app to finish mounting, then take the screenshot.
- The CKEditor case needs a node/edit form with a CKEditor-enabled field and the AI plugin loaded.
- The three `ui/` apps need a host page that mounts each built bundle (a small test route in a test module, or the real admin screen where each is used).
- Fix the **viewport size** (e.g. 1280×900) and disable animations/caret blink so screenshots are deterministic.
#### Baselines (in repo)
- Commit one baseline PNG per element under, e.g., `tests/visual-regression/baselines/{mdxeditor,json-schema-editor,default-tools-editor,ckeditor}.png`.
- **Baselines must be generated inside the CI Chromium/Selenium image**, not on a developer laptop — host font/antialiasing differences are the #1 cause of false diffs. The docs (below) describe how to regenerate them in-pipeline.
#### Comparison + threshold
Use ImageMagick `compare`, counting differing pixels as a fraction of the total, with a fuzz factor to absorb subpixel/antialiasing noise:
```bash
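THRESHOLD=0.1  # maximum allowed percentage of differing pixels
# compare writes the count of pixels differing beyond the fuzz tolerance to stderr.
# It exits 1 when the images differ, so "|| true" keeps "set -e" from aborting early.
diff=$(compare -metric AE -fuzz 5% baseline.png actual.png /tmp/diff.png 2>&1 || true)
diff=${diff%% *}  # keep the first token in case the output carries extra text
[[ $diff =~ ^[0-9.e+]+$ ]] || { echo "compare failed: $diff" >&2; exit 2; }
total=$(identify -format "%[fx:w*h]" baseline.png)
pct=$(awk -v d="$diff" -v t="$total" 'BEGIN { print (d / t) * 100 }')
# fail if pct > THRESHOLD
if awk -v p="$pct" -v t="$THRESHOLD" 'BEGIN { exit !(p > t) }'; then exit 1; fi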
```
**Recommended threshold: 0.1% of pixels, with `-fuzz 5%`.** Rationale: the original suggestion of 0.02% is likely too tight. Chromium antialiasing and font subpixel rendering can produce diffs in the 0.02–0.05% range between otherwise-identical runs, which would make the check flaky. A 0.1% threshold with a small fuzz still catches the failures we care about (a blank or unstyled widget or a missing component typically differs by many percent) while tolerating render jitter. Treat this as a **starting value to tune** once we see the spread of real baseline noise; tighten it if runs prove stable. The `diff.png` (highlighted delta) should also be saved as an artifact.
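
To make the threshold concrete: at the example 1280×900 viewport, 0.1% corresponds to 1,152 pixels (fewer if screenshots are cropped to the widget). The sketch below shows that arithmetic and a pass/fail check. `pixel_budget` and `exceeds_threshold` are hypothetical helper names for illustration, not existing project code.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that illustrate the threshold arithmetic only.

# Print how many pixels may differ for a given image size and threshold (%).
pixel_budget() {
  local width=$1 height=$2 threshold_pct=$3
  awk -v w="$width" -v h="$height" -v t="$threshold_pct" \
    'BEGIN { printf "%.0f\n", w * h * t / 100 }'
}

# Succeed (exit 0) when the differing-pixel share exceeds the threshold (%).
exceeds_threshold() {
  local diff_pixels=$1 total_pixels=$2 threshold_pct=$3
  awk -v d="$diff_pixels" -v n="$total_pixels" -v t="$threshold_pct" \
    'BEGIN { exit !((d / n) * 100 > t) }'
}

pixel_budget 1280 900 0.1   # prints 1152
```

For example, `exceeds_threshold 3917 1152000 0.1` succeeds (about 0.34% of pixels differ), while `exceeds_threshold 1000 1152000 0.1` does not (about 0.09%).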
#### CI
- Either extend the existing FunctionalJavascript job or add a dedicated `visual_regression` job that `needs: [js_build]` (it requires the built artifacts).
- `allow_failure: true`.
- `artifacts:` should always upload the captured screenshots and `diff.png` (`when: always`), named so they can be dropped straight into `tests/visual-regression/baselines/` to promote them.
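
Since artifacts are uploaded even on failure, it may help if the job's script checks all four elements before exiting, so one failing element does not hide the screenshots and diffs of the others. A minimal sketch, assuming element names that match the proposed baseline filenames; `run_all` and the per-element check command passed to it are hypothetical.

```bash
#!/usr/bin/env bash
# Element names follow the proposed baseline filenames.
ELEMENTS=(mdxeditor json-schema-editor default-tools-editor ckeditor)

# Run a per-element check command for every element, then report overall status.
# The check command (first argument) is hypothetical; it receives the element
# name and should return non-zero when that element exceeds the threshold.
run_all() {
  local check_cmd=$1 failed=0 element
  for element in "${ELEMENTS[@]}"; do
    if ! "$check_cmd" "$element"; then
      echo "FAILED: $element" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

The job would call something like `run_all check_element` as its last step, so the non-zero status still marks the job as failed (shown as a warning under `allow_failure: true`).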
#### Error message → documentation
On failure the test/job must print something like:
> Visual regression for **mdxeditor** exceeded threshold (0.34% > 0.1%). The element may not have built correctly, or this is an intentional change. To review the diff and update the baseline, see: https://project.pages.drupalcode.org/ai/developers/visual_regression_testing/ (docs/developers/visual_regression_testing.md)
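
One way the comparison script could emit that message is sketched below. `report_failure` is a hypothetical helper name; the wording and URL are copied from the example message above.

```bash
#!/usr/bin/env bash
# URL taken from the example failure message in this issue.
DOCS_URL="https://project.pages.drupalcode.org/ai/developers/visual_regression_testing/"

# Hypothetical helper: print the failure message for one element to stderr.
report_failure() {
  local element=$1 actual_pct=$2 threshold_pct=$3
  {
    printf 'Visual regression for %s exceeded threshold (%s%% > %s%%).\n' \
      "$element" "$actual_pct" "$threshold_pct"
    printf 'The element may not have built correctly, or this is an intentional change.\n'
    printf 'To review the diff and update the baseline, see: %s\n' "$DOCS_URL"
  } >&2
}
```

Calling `report_failure mdxeditor 0.34 0.1` prints the three-line message to stderr, which keeps it visible in the job log.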
### Documentation (part of this issue)
Add `docs/developers/visual_regression_testing.md` and a nav entry in `mkdocs.yml`, covering:
1. What the test does and which four elements it covers.
2. How to read the failure (threshold, where the captured screenshot + `diff.png` artifacts are in the pipeline).
3. How to decide if a diff is a real regression or an intentional change.
4. **How to update a baseline**: download the captured screenshot artifact from the failing job and commit it over the matching file in `tests/visual-regression/baselines/`. Because the artifact was rendered in the CI image, its fonts/antialiasing match; do not regenerate baselines locally.
5. How to add a fifth element when a new built UI is introduced.
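
The baseline update step could be documented with a small helper along these lines. This is a sketch under assumptions: `promote_baseline` is a hypothetical name, and the artifact is assumed to be named `<element>.png` as proposed in the CI section.

```bash
#!/usr/bin/env bash
# Baseline directory as proposed in this issue; overridable for testing.
BASELINE_DIR="${BASELINE_DIR:-tests/visual-regression/baselines}"

# Hypothetical helper: copy a downloaded screenshot artifact over its baseline.
# Usage: promote_baseline <artifact_dir> <element>
promote_baseline() {
  local artifact_dir=$1 element=$2
  local src="$artifact_dir/$element.png"
  local dest="$BASELINE_DIR/$element.png"
  if [ ! -f "$src" ]; then
    echo "No captured screenshot at $src" >&2
    return 1
  fi
  mkdir -p "$BASELINE_DIR"
  cp "$src" "$dest"
  echo "Updated $dest; review the change before committing."
}
```

After unpacking the job artifacts into a local directory, `promote_baseline path/to/artifacts mdxeditor` replaces the mdxeditor baseline. The same call creates a new baseline file when a fifth element is added.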
### Remaining tasks
- [ ] Decide host pages/routes to render each of the 4 elements (incl. CKEditor field setup).
- [ ] Add the FunctionalJavascript test + deterministic viewport/animation settings.
- [ ] Generate and commit the 4 baseline PNGs from the CI image.
- [ ] Add the comparison script + threshold (0.1% / fuzz 5%, tunable).
- [ ] Wire up the CI job: `needs: [js_build]`, `allow_failure: true`, `artifacts: when: always`.
- [ ] Failure message links to the doc.
- [ ] Write `docs/developers/visual_regression_testing.md` + add to `mkdocs.yml` nav.
### Considerations / gotchas
- **Environment determinism is everything**: baselines must be generated in the CI image; otherwise every contributor sees false diffs.
- Pin the Chromium/driver versions if possible; a browser upgrade will require regenerating all baselines (document this).
- Keep screenshots reasonably small/cropped to the widget to reduce noise and artifact size.