Add non-required visual regression tests for the four built UI elements (#3586461) · Issues · project / ai


--- AI TRACKER METADATA ---
Update Summary: Add non-required visual regression tests for the four built UI elements
Check-in Date: MM/DD/YYYY (US format) [When we should see progress/get an update]
Due Date: MM/DD/YYYY (US format) [When the issue should be fully completed]
Blocked by: [#XXXXXX] (New issues on new lines)
Additional Collaborators:
AI Tracker found here: https://www.drupalstarforge.ai/
--- END METADATA ---

### Problem/Motivation

We ship four compiled front-end bundles, built by the `js_build` CI job:

| # | Element | Source | Build output |
|---|---------|--------|--------------|
| 1 | mdxeditor | `ui/mdxeditor` | `ui/mdxeditor/dist` |
| 2 | json-schema-editor | `ui/json-schema-editor` | `ui/json-schema-editor/dist` |
| 3 | default-tools-editor | `ui/default-tools-editor` | `ui/default-tools-editor/dist` |
| 4 | CKEditor plugin (ai_ckeditor) | `modules/ai_ckeditor` | `modules/ai_ckeditor/js/build` |

A build can succeed (npm exits 0) while the rendered widget is broken: missing CSS, a component that fails to mount, or a bundler/dependency change that silently drops part of the UI. Our PHPUnit suite does not catch "it compiled but looks wrong". We want a lightweight visual check that each of the four elements **renders** the way we expect, so a bad build is visible in MR pipelines.

### Goal

Add a visual regression test that, for each of the four built elements:

1. Renders the built widget in a browser (we already run FunctionalJavascript against Selenium in CI).
2. Captures a screenshot.
3. Compares it pixel-for-pixel against a committed baseline image using ImageMagick.
4. **Fails the job if the difference exceeds the threshold**, but the job is **not required** (`allow_failure: true`), so it never blocks a merge and instead surfaces as a warning.
5. **Stores the captured screenshot as a CI artifact**, so when a change is legitimate the maintainer can download it and commit it as the new baseline.
6. On failure, prints an error message that **links to the documentation** explaining how to review the diff and update the baseline.

### Proposed resolution

#### Test implementation

- One FunctionalJavascript test (extending the existing `BaseClassFunctionalJavascriptTests`, which already has `takeScreenshot()` / `getDriver()->getScreenshot()`), with either a data provider or one method per element. For each element: visit the route/page that renders it, wait for the JS app to finish mounting, then screenshot.
- The CKEditor case needs a node/edit form with a CKEditor-enabled field and the AI plugin loaded.
- The three `ui/` apps need a host page that mounts each built bundle (a small test route in a test module, or the real admin screen where each is used).
- Fix the **viewport size** (e.g. 1280×900) and disable animations/caret blink so screenshots are deterministic.

#### Baselines (in repo)

- Commit one baseline PNG per element under, e.g., `tests/visual-regression/baselines/{mdxeditor,json-schema-editor,default-tools-editor,ckeditor}.png`.
- **Baselines must be generated inside the CI Chromium/Selenium image**, not on a developer laptop: host font/antialiasing differences are the #1 cause of false diffs. The docs (below) describe how to regenerate them in-pipeline.
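
As a small illustration of the baseline layout above, a pre-flight step could confirm that one baseline PNG exists per element before any comparison runs. This is only a sketch: the function name `missing_baselines` and the `ELEMENTS` variable are hypothetical, and the directory is the example path suggested above, not a settled location.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the project).
# Element names follow the example baseline file names in this issue.
ELEMENTS="mdxeditor json-schema-editor default-tools-editor ckeditor"

# Prints, one per line, every element whose baseline PNG is missing
# from the directory given as $1. Prints nothing when all are present.
missing_baselines() {
  local dir="$1" element
  for element in $ELEMENTS; do
    [ -f "$dir/$element.png" ] || printf '%s\n' "$element"
  done
}
```

A CI step might call `missing_baselines tests/visual-regression/baselines` and fail early with a clear message if the output is non-empty, rather than letting the image comparison report a confusing error.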
#### Comparison + threshold

Use ImageMagick `compare`, counting differing pixels as a fraction of the total, with a fuzz factor to absorb subpixel/antialiasing noise:

```bash
# AE = count of pixels differing beyond the fuzz tolerance.
# compare writes the metric to stderr and exits non-zero when the images differ,
# hence "2>&1" and "|| true".
diff=$(compare -metric AE -fuzz 5% baseline.png actual.png /tmp/diff.png 2>&1 || true)
total=$(identify -format "%[fx:w*h]" baseline.png)
pct=$(awk -v d="$diff" -v t="$total" 'BEGIN { print 100 * d / t }')
# fail if pct > THRESHOLD (THRESHOLD is a percentage, e.g. 0.1)
awk -v p="$pct" -v max="$THRESHOLD" 'BEGIN { if (p > max) exit 1 }'
```

**Recommended threshold: 0.1% of pixels, with `-fuzz 5%`.** Rationale: the original suggestion of 0.02% is likely too tight. Chromium antialiasing and font subpixel rendering routinely produce diffs in the 0.02–0.05% range between otherwise-identical runs, which would make the check flaky. 0.1% plus a small fuzz still trivially catches the failures we care about (a blank or unstyled widget or a missing component differs by many percent) while tolerating render jitter. Treat this as a **starting value to tune** once we see the spread of real baseline noise; tighten it if runs prove stable.

The `diff.png` (highlighted delta) should also be saved as an artifact.

#### CI

- Either extend the existing FunctionalJavascript job or add a dedicated `visual_regression` job that `needs: [js_build]` (it requires the built artifacts).
- `allow_failure: true`.
- `artifacts:` should always upload the captured screenshots and `diff.png` (`when: always`), named so they can be dropped straight into `tests/visual-regression/baselines/` to promote them.

#### Error message → documentation

On failure the test/job must print something like:

> Visual regression for **mdxeditor** exceeded threshold (0.34% > 0.1%). The element may not have built correctly, or this is an intentional change.
> To review the diff and update the baseline, see: https://project.pages.drupalcode.org/ai/developers/visual_regression_testing/ (docs/developers/visual_regression_testing.md)

### Documentation (part of this issue)

Add `docs/developers/visual_regression_testing.md` and a nav entry in `mkdocs.yml`, covering:

1. What the test does and which four elements it covers.
2. How to read the failure (threshold, where the captured screenshot + `diff.png` artifacts are in the pipeline).
3. How to decide if a diff is a real regression or an intentional change.
4. **How to update a baseline**: download the captured screenshot artifact from the failing job and commit it over the matching file in `tests/visual-regression/baselines/`. Always take the image from a CI run rather than a local render, so fonts/antialiasing match.
5. How to add a fifth element when a new built UI is introduced.

### Remaining tasks

- [ ] Decide host pages/routes to render each of the 4 elements (incl. CKEditor field setup).
- [ ] Add the FunctionalJavascript test + deterministic viewport/animation settings.
- [ ] Generate and commit the 4 baseline PNGs from the CI image.
- [ ] Add the comparison script + threshold (0.1% / fuzz 5%, tunable).
- [ ] Wire up the CI job: `needs: [js_build]`, `allow_failure: true`, `artifacts: when: always`.
- [ ] Failure message links to the doc.
- [ ] Write `docs/developers/visual_regression_testing.md` + add to `mkdocs.yml` nav.

### Considerations / gotchas

- **Environment determinism is everything**: baselines must be regenerated in the CI image; otherwise every contributor sees false diffs.
- Pin the Chromium/driver versions if possible; a browser upgrade will require regenerating all baselines (document this).
- Keep screenshots reasonably small/cropped to the widget to reduce noise and artifact size.
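
The threshold decision and the failure message described under "Comparison + threshold" and "Error message → documentation" can be sketched as plain shell functions. This is a minimal sketch, not the project's script: the function names (`diff_pct`, `exceeds_threshold`, `failure_message`, `check_element`) are hypothetical, and the differing-pixel count and total pixel count are assumed to have been obtained already (for example from `compare -metric AE` and `identify`).

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of the project).
DOCS_URL="https://project.pages.drupalcode.org/ai/developers/visual_regression_testing/"

# Differing pixels as a percentage of the total, rounded for display only.
diff_pct() {
  awk -v d="$1" -v t="$2" 'BEGIN { printf "%.2f", 100 * d / t }'
}

# Exit 0 when the unrounded percentage is strictly above the threshold.
# Args: differing pixels, total pixels, threshold in percent.
exceeds_threshold() {
  awk -v d="$1" -v t="$2" -v max="$3" 'BEGIN { if (100 * d / t > max) exit 0; exit 1 }'
}

# Failure text modelled on the example message in this issue.
# Args: element name, measured percent, threshold percent.
failure_message() {
  printf 'Visual regression for %s exceeded threshold (%s%% > %s%%). ' "$1" "$2" "$3"
  printf 'The element may not have built correctly, or this is an intentional change.\n'
  printf 'To review the diff and update the baseline, see: %s\n' "$DOCS_URL"
}

# Returns 1 and prints the failure message on stderr when over threshold.
# Args: element name, differing pixels, total pixels, threshold percent.
check_element() {
  local element="$1" diff_pixels="$2" total_pixels="$3" threshold="$4"
  if exceeds_threshold "$diff_pixels" "$total_pixels" "$threshold"; then
    failure_message "$element" "$(diff_pct "$diff_pixels" "$total_pixels")" "$threshold" >&2
    return 1
  fi
  return 0
}
```

For the suggested 1280×900 viewport (1,152,000 pixels), a call such as `check_element mdxeditor 3917 1152000 0.1` would report a failure at roughly 0.34%. The decision is made on the unrounded value so that display rounding cannot change a result near the threshold.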
