Shared dataset registry: design discussion
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3586840. -->
Reported by: [zorz](https://www.drupal.org/user/1924846)
>>>

<h3 id="summary-problem-motivation">Problem/Motivation</h3> <p>This issue is a design conversation, filed as a sibling to <a href="https://www.drupal.org/project/ai_agents_test/issues/3585124">#3585124</a> (research: merging ai_agents_test with ai_eval). It is not a recommendation and not a unilateral commitment by the ai_eval maintainer. The purpose is to make the registry idea legible to the AI Initiative team so the boundary discussion can happen in writing, in parallel with the convergence ticket.</p> <p>Datasets in ai_eval today are flat YAML files at a configured path on the local filesystem. This works for single-site, single-team usage but creates friction in three concrete scenarios:</p> <ol> <li>Multi-site agencies maintaining the same eval suite across many client installations: a copy-paste workflow with no shared source of truth.</li> <li>Reproducibility and traceability under EU AI Act-style obligations: "what was version 1.2 of dataset X on date Y" has no immutable answer in a filesystem-based model.</li> <li>Convergence with ai_agents: <code>ai_agents_test</code> needs test fixtures of essentially the same shape as ai_eval datasets, and two parallel storage layers will diverge.</li> </ol> <p>A user-facing UI editor for local files (<a href="https://www.drupal.org/project/ai_eval/issues/3586770">#3586770</a>) would help case (1) but not (2) or (3), and risks freezing the wrong abstraction.</p> <h4 id="summary-steps-reproduce">Steps to reproduce</h4> <p>Not applicable. This is a design discussion.</p> <h3 id="summary-proposed-resolution">Proposed resolution</h3> <p>Two concerns, deliberately split so the boundary question can be answered without dragging the implementation question along with it:</p> <ul> <li><strong>(a) Where the dataset entity lives.</strong> Schema, fields, validation rules, versioning semantics. 
This is the boundary question that overlaps with <a href="https://www.drupal.org/project/ai_agents_test/issues/3585124">#3585124</a>.</li> <li><strong>(b) The registry recipe.</strong> Versioned HTTP-served storage, browse/edit UI, OAuth2-protected writes, audit log. Once (a) is decided, (b) does not depend on the answer: it can ship in ai_eval regardless of where the entity lives.</li> </ul> <p>Three options for (a). No recommendation is made pending input from the AI Initiative team:</p> <h4>Option A: entity in <code>drupal/ai</code> core</h4> <ul> <li>Aligns with <a href="https://www.drupal.org/project/ai_agents_test/issues/3585124#comment-NNNNN">@lbesenyei's comment #4</a> on #3585124 ("shared test dataset config entity in core").</li> <li>ai_eval, ai_agents_test, and AiLlm all consume the same entity type.</li> <li>Schema governance lives with the AI Initiative.</li> <li>ai_eval's role: graders, gates, optimizer, results. The registry recipe (b) still makes sense in ai_eval.</li> <li>Cost: requires a core change, and that change needs a known timeline before downstream modules can rely on it.</li> </ul> <h4>Option B: entity in ai_eval</h4> <ul> <li>What the original draft of this issue assumed.</li> <li>ai_eval defines the entity; ai_agents_test reads it through the adapter pattern from #3585124.</li> <li>Schema governance lives with ai_eval, with an explicit invitation to ai_agents_test contributors.</li> <li>Cost: schema decisions made in ai_eval may not get full AI Initiative buy-in retroactively.</li> </ul> <h4>Option C: hybrid</h4> <ul> <li>Entity type and base schema in <code>drupal/ai</code> core.</li> <li>Bundles, registry recipe, and editing UX in ai_eval.</li> <li>Per-domain extensions live on bundles defined by consumers (ai_eval ships <code>rag</code>, <code>agent</code>, <code>classification</code>, <code>judge_validation</code> bundles; ai_agents_test could ship <code>agent_fixture</code> as another bundle on the same entity type).</li> <li>Cost: more moving parts, slowest to design, 
but cleanest separation of governance from product.</li> </ul> <p>Concern (b), <strong>the registry recipe</strong>, is sketched below for shape, regardless of which option wins on (a):</p> <ul> <li>New module <code>ai_eval_registry</code> (sub-module of ai_eval). Recipe-installable. Any Drupal site can be a registry.</li> <li>JSON:API endpoints for the read hot path (<code>GET dataset {id} version {v}</code>). OAuth2 for writes.</li> <li>UI: standard Drupal entity forms, a structured/YAML editor, a diff view, and a per-version audit log.</li> <li>Consuming side: new <code>@AiEvalDatasetSource</code> plugin type. The existing file-based loader becomes the default <code>FileSource</code> plugin, and a new <code>RegistrySource</code> plugin reads from the registry. Eval target config grows a <code>dataset_source</code> field; existing configs default to file and keep working unchanged.</li> <li>Caching: stale-on-failure on the consumer side. If the registry is unreachable, the consumer serves its last cached copy, so registry downtime does not break eval runs.</li> </ul> <p>What this is not:</p> <ul> <li>Not a unilateral commitment. The boundary question (a) is open, and implementation depends on the answer.</li> <li>Not a replacement for file-based datasets. FileSource stays the default for single-site users; the registry is opt-in.</li> <li>Not a hosted-product pitch. The reference implementation is open-source Drupal; anyone can self-host. A public PointBlank instance might exist for convenience but is not the gating path.</li> <li>Not a position on graders or gates. Graders, gates, optimization, scoring, and judge validation stay in ai_eval regardless of where the dataset entity lives.</li> </ul> <h3 id="summary-remaining-tasks">Remaining tasks</h3> <p>Open decisions (these gate code work):</p> <ol> <li><strong>Pick option A, B, or C for the dataset entity location.</strong> Input requested from @marcus_johansson, @lbesenyei, @yautja_cetanu. 
If the answer is "core" but the core change is not on a known timeline, please say so explicitly so consumers can decide whether to wait or to ship in ai_eval and migrate later.</li> <li><strong>Bundles or single-shape entity?</strong> Mature eval ecosystems (Hugging Face Datasets, LangSmith, DeepEval, Argilla) all model per-domain extensions: a RAG eval row carries different fields from an agent-routing eval row, and both differ from a classification eval row. Bundles, or an equivalent per-domain extension mechanism, look necessary regardless of which option above wins. Formal schema work will be tracked separately.</li> <li><strong>Convergence with ai_agents_test fixtures.</strong> Same entity type with separate bundles, or two distinct entity types?</li> <li><strong>Auth model for registry writes.</strong> OAuth2 against the registry, or piggyback on the consuming site's existing auth?</li> <li><strong>Canonical instance.</strong> Self-host only? A PointBlank-hosted reference instance? A long-term move to an official registry hosted on drupal.org? This is the lowest-priority decision and can be revisited.</li> </ol> <p>Implementation tasks (gated on the decisions above):</p> <ul> <li>Define the <code>eval_dataset</code> entity schema in the agreed module location, with versioning semantics.</li> <li>Implement the <code>@AiEvalDatasetSource</code> plugin type in ai_eval. 
Refactor the existing loader into a <code>FileSource</code> plugin.</li> <li>Implement the <code>RegistrySource</code> plugin (HTTP fetch, cache, stale-on-failure).</li> <li>Implement the registry side: entity (or entity consumer), JSON:API config, OAuth2 scopes, admin UI.</li> <li>Recipe to install registry mode on a fresh site.</li> <li>Drush migration command for existing local datasets.</li> <li>Documentation in README and integration.md.</li> </ul> <h3 id="summary-ui-changes">User interface changes</h3> <p>On the consuming Drupal site (any option):</p> <ul> <li>Eval target form gains a <code>dataset_source</code> select (file, registry).</li> <li>When the source is registry, the dataset field becomes a structured input: registry URL, dataset ID, and version pin.</li> <li>Settings form gains a registry credentials section (OAuth2 client ID/secret) when at least one target uses a registry source.</li> </ul> <p>On a registry-mode site (new, ships in ai_eval regardless of option):</p> <ul> <li>Admin browse/list of dataset entities with version history and diff view.</li> <li>Entity form for creating and editing datasets, with a structured editor (YAML or table view) and JSON Schema validation surfaced inline.</li> <li>Per-version immutable audit log view.</li> </ul> <h3 id="summary-api-changes">API changes</h3> <p>Independent of which option wins on (a):</p> <ul> <li>New plugin type <code>@AiEvalDatasetSource</code> in ai_eval. Existing direct file-loader call sites are refactored to go through the plugin manager. <code>FileSource</code> ships as the default plugin and preserves current behavior.</li> <li>New service interface for dataset retrieval: <code>DatasetSourceInterface::load(string $id, ?string $version): array</code>.</li> <li><code>EvalTarget</code> config entity grows a <code>dataset_source</code> property (default: <code>file</code>). 
Existing config keeps working without change.</li> <li>JSON:API endpoints exposed by registry-mode sites: <ul> <li><code>GET /jsonapi/eval_dataset/{id}/version/{v}</code>: read.</li> <li><code>POST /jsonapi/eval_dataset/{id}/versions</code>: write (creates a new immutable version), OAuth2-protected.</li> <li><code>GET /jsonapi/eval_dataset/{id}/versions</code>: list versions.</li> </ul> </li> <li>OAuth2 scopes: <code>ai_eval_registry:read</code>, <code>ai_eval_registry:write</code>.</li> </ul> <p>Specific to option B (entity in ai_eval):</p> <ul> <li>ai_eval ships the entity type definition and bundle storage handlers.</li> </ul> <p>Specific to options A and C (entity in core):</p> <ul> <li>Core ships the entity type. ai_eval ships bundles (option C) or only consumes it (option A).</li> </ul> <h3 id="summary-data-model-changes">Data model changes</h3> <p>On the consuming site (any option):</p> <ul> <li><code>EvalTarget</code> config entity adds a <code>dataset_source</code> string property (default: <code>file</code>).</li> <li>No schema changes to existing tables.</li> </ul> <p>Dataset entity (location depends on the outcome of decision 1):</p> <ul> <li>Content entity with revisions enabled.</li> <li>Bundles by dataset type (<code>rag</code>, <code>agent</code>, <code>classification</code>, <code>judge_validation</code>) for per-domain field schemas, in line with conventions across mature eval ecosystems. A single-shape entity remains an option for discussion under decision 2 above.</li> <li>Base fields shared by all bundles: <code>label</code>, <code>machine_name</code>, <code>owner</code>, <code>visibility</code> (public/private/org), <code>tags</code>, <code>schema_version</code>, <code>questions</code> (JSON payload), <code>changelog</code>, <code>created</code>, <code>changed</code>.</li> <li>Bundle-specific fields declared by the bundle's owning module.</li> <li>Revisions are immutable once published. 
Edits create new revisions; old revisions remain addressable by version pin.</li> <li>Audit log: rely on Drupal core revisions plus standard <code>watchdog</code> entries. No new tables.</li> </ul>

> Related issue: [Issue #3586770](https://www.drupal.org/node/3586770)

> Related issue: [Issue #3585124](https://www.drupal.org/node/3585124)
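
The consuming-side plan under "API changes" routes every dataset load through a source plugin chosen by the target's `dataset_source` value, with `file` as the default. Below is a minimal, language-neutral sketch of that dispatch in Python. It is written for brevity and is not the Drupal PHP implementation. `DatasetSource` mirrors the proposed `DatasetSourceInterface::load()`. `DatasetSourceManager`, the in-memory `FileSource` and the `dataset`/`version` target keys are hypothetical stand-ins.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

Row = Dict[str, object]


class DatasetSource(ABC):
    """Python stand-in for the proposed DatasetSourceInterface."""

    @abstractmethod
    def load(self, dataset_id: str, version: Optional[str] = None) -> List[Row]:
        """Return a dataset's rows, optionally pinned to a version."""


class FileSource(DatasetSource):
    """Default source. YAML parsing is replaced by an in-memory dict here."""

    def __init__(self, datasets: Dict[str, List[Row]]) -> None:
        self._datasets = datasets

    def load(self, dataset_id: str, version: Optional[str] = None) -> List[Row]:
        # Assumption: local files are unversioned, so a version pin is ignored.
        return self._datasets[dataset_id]


class DatasetSourceManager:
    """Hypothetical stand-in for the plugin manager: source id -> source."""

    DEFAULT = "file"

    def __init__(self, sources: Dict[str, DatasetSource]) -> None:
        self._sources = sources

    def for_target(self, target: Dict[str, str]) -> DatasetSource:
        # Targets saved before this change have no dataset_source key and
        # fall back to "file", so they keep working unchanged.
        name = target.get("dataset_source", self.DEFAULT)
        if name not in self._sources:
            raise KeyError(f"Unknown dataset source: {name}")
        return self._sources[name]

    def load_for_target(self, target: Dict[str, str]) -> List[Row]:
        # "dataset" and "version" are hypothetical target keys.
        return self.for_target(target).load(target["dataset"], target.get("version"))
```

Call sites only ever talk to the manager. Adding `RegistrySource` later is then a registration change, not a change to every caller.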
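
The `RegistrySource` task lists "HTTP fetch, cache, stale-on-failure". The sketch below shows one way those three pieces could fit together in Python. It assumes a few things the proposal does not specify:

- The target always pins a version, because the only read endpoint proposed is `GET /jsonapi/eval_dataset/{id}/version/{v}`.
- Entries are cached per URL with a TTL.
- The HTTP call is injected as a `fetch` callable. In practice that callable would be a GET against the URL plus JSON decoding.

`RegistrySource`'s constructor arguments, `version_url` and the TTL are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple
from urllib.parse import quote

Row = Dict[str, object]
Fetcher = Callable[[str], List[Row]]  # URL -> decoded rows; raises on failure


def version_url(registry_url: str, dataset_id: str, version: str) -> str:
    """Build the proposed read endpoint GET /jsonapi/eval_dataset/{id}/version/{v}."""
    return "{}/jsonapi/eval_dataset/{}/version/{}".format(
        registry_url.rstrip("/"), quote(dataset_id, safe=""), quote(version, safe="")
    )


class RegistrySource:
    """Hypothetical RegistrySource with a stale-on-failure cache."""

    def __init__(self, registry_url: str, fetch: Fetcher, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._registry_url = registry_url
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[Row]]] = {}  # url -> (fetched_at, rows)

    def load(self, dataset_id: str, version: str) -> List[Row]:
        url = version_url(self._registry_url, dataset_id, version)
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: skip the network entirely
        try:
            rows = self._fetch(url)
        except (OSError, ValueError):
            # OSError covers urllib's URLError/HTTPError; ValueError covers bad JSON.
            if cached is not None:
                return cached[1]  # registry unreachable: serve the stale copy
            raise  # nothing cached yet, so the failure has to surface
        self._cache[url] = (now, rows)
        return rows
```

A failed refresh leaves the cached timestamp untouched, so every later call retries the registry until one succeeds. A run with no cached copy still fails loudly rather than evaluating against nothing.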
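
The data model requires published revisions to be immutable, with edits creating new revisions and old ones staying addressable by version pin. The write endpoint likewise "creates new immutable version". The following Python sketch models only those semantics as an append-only store; it is not Drupal's revision system. Two details are assumptions, not decisions from this issue:

- Versions are sequential integers rendered as strings.
- An unpinned read resolves to the latest version.

`DatasetVersionStore` and its methods are hypothetical names.

```python
import copy
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


class DatasetVersionStore:
    """Hypothetical append-only store: published versions never change."""

    def __init__(self) -> None:
        # dataset_id -> [(changelog, rows), ...]; list position + 1 is the version.
        self._versions: Dict[str, List[Tuple[str, List[Row]]]] = {}

    def publish(self, dataset_id: str, rows: List[Row], changelog: str) -> str:
        history = self._versions.setdefault(dataset_id, [])
        # Deep-copy so later edits to the caller's list cannot leak into this version.
        history.append((changelog, copy.deepcopy(rows)))
        return str(len(history))

    def versions(self, dataset_id: str) -> List[Tuple[str, str]]:
        # (version, changelog) pairs, oldest first, as a "list versions" call might return.
        history = self._versions.get(dataset_id, [])
        return [(str(i), log) for i, (log, _) in enumerate(history, start=1)]

    def get(self, dataset_id: str, version: Optional[str] = None) -> List[Row]:
        history = self._versions[dataset_id]
        # Assumption: an unpinned read resolves to the latest version.
        index = len(history) if version is None else int(version)
        if not 1 <= index <= len(history):
            raise KeyError(f"{dataset_id} has no version {version}")
        # Return a copy so readers cannot mutate the stored version either.
        return copy.deepcopy(history[index - 1][1])
```

A pin such as `"1"` therefore returns the same rows indefinitely. This is what lets the question "what was version 1.2 of dataset X on date Y" have a fixed answer.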