Add a Moderation Guardrail plugin (configurable moderation provider/model as a guardrail)
## Goal
Add a single configurable **Moderation Guardrail** plugin to the **Guardrails** system, so any moderation provider/model can be used as a guardrail. Configuring this one plugin is enough to achieve "each moderation provider becomes a possible guardrail", and it lays the foundation for folding external moderation into Guardrails.
This issue covers **only creating the plugin**. The migration of existing `ai.external_moderation` configuration onto guardrail sets, and the deprecation of the old moderation runner, are handled separately in **#3586528** (which is blocked by this issue).
## Background
External moderation currently lives in AI Core as a parallel mechanism to Guardrails (#3479913):
- **Config:** `ai.external_moderation` → key `moderations` (schema `type: ignore`). Each entry is `{ provider: <chat provider id>, models: ["<providerId>__<modelId>", …], tags: "<comma,separated>" }`.
- **Runtime:** `src/EventSubscriber/ModeratePreRequestEventSubscriber.php` subscribes to `PreGenerateResponseEvent`, matches entries by the request's **provider id** and **tags** (`matchConfigs()`), runs each configured moderation model via `$provider->moderation($input, $model_id)->getNormalized()->isFlagged()`, and throws `AiUnsafePromptException` when flagged.
So moderation is currently: **pre-request only, input only, hard-stop, scoped by provider + tags**, invoked through the `moderation` operation type.
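
To make the existing entry shape concrete, here is a minimal sketch of how one `moderations` entry could be split into explicit provider/model pairs and a tag list. The helper name `sketch_parse_moderation_entry()` and the sample values are hypothetical and not part of the AI module; splitting on the first `__` only is an assumption about how combined ids should be read.

```php
<?php

declare(strict_types=1);

/**
 * Splits one `ai.external_moderation` entry into explicit parts.
 *
 * Hypothetical helper for illustration only; not part of the AI module.
 *
 * @param array{provider: string, models: list<string>, tags: string} $entry
 *   One entry from the `moderations` key.
 *
 * @return array{provider: string, models: list<array{provider: string, model: string}>, tags: list<string>}
 *   The chat provider id, the moderation provider/model pairs, and the tags.
 */
function sketch_parse_moderation_entry(array $entry): array {
  $models = [];
  foreach ($entry['models'] as $combined) {
    // Assumption: split on the first "__" only, in case a model id contains
    // a double underscore itself. A missing model id becomes ''.
    [$provider_id, $model_id] = explode('__', $combined, 2) + [1 => ''];
    $models[] = ['provider' => $provider_id, 'model' => $model_id];
  }

  // Tags are stored as one comma-separated string; trim and drop empties.
  $tags = array_values(array_filter(
    array_map('trim', explode(',', $entry['tags'])),
    static fn (string $tag): bool => $tag !== '',
  ));

  return ['provider' => $entry['provider'], 'models' => $models, 'tags' => $tags];
}

// Sample values are illustrative, not taken from a real site configuration.
$parsed = sketch_parse_moderation_entry([
  'provider' => 'openai',
  'models' => ['openai__omni-moderation-latest'],
  'tags' => 'chat, assistant',
]);
```

Each resulting provider/model pair corresponds to one moderation call in the current runner, which is the unit the new plugin would be configured with.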
Guardrails (also in core) is the newer abstraction for exactly this job:
- **Plugin type** `ai_guardrail` — attribute `Drupal\ai\Attribute\AiGuardrail`, manager `plugin.manager.ai_guardrail`, namespace `Plugin/AiGuardrail`, base `AiGuardrailPluginBase`, interface `AiGuardrailInterface` (`label()`, `isAvailable()`, `processInput(InputInterface): GuardrailResultInterface`, `processOutput(OutputInterface): GuardrailResultInterface`). `NonDeterministicGuardrailInterface` marks guardrails that need an AI provider — moderation is exactly this.
- **Result types:** `PassResult`, `StopResult` (carries `getScore()`), `RewriteInputResult`, `RewriteOutputResult`.
- **Config entities:** `ai_guardrail` (a configured plugin instance: `guardrail` + `guardrail_settings`) and `ai_guardrail_set` (`stop_threshold`, `pre_generate_guardrails`, `post_generate_guardrails`).
Making moderation "just another guardrail" lets it reuse the existing global/agent/automator wiring (e.g. the agent `guardrail_set` property, and #3586447 for Automators).
## Proposed approach
Add a **Moderation Guardrail** plugin in `src/Plugin/AiGuardrail/ModerationGuardrail.php`:
- `#[AiGuardrail(id: 'moderation_guardrail', label: new TranslatableMarkup('Moderation Guardrail'), description: …)]`
- Implements `NonDeterministicGuardrailInterface` (needs an AI provider) and `NonStreamableGuardrailInterface`.
- A **single configurable plugin** (not a per-provider deriver): its `guardrail_settings` select the moderation **provider** + **model** (and optional provider config). Site builders create one configured `ai_guardrail` entity per moderation provider/model they want.
- **Input only:** implement `processInput()` — run `$provider->moderation($input, $model_id)->getNormalized()->isFlagged()` and return a `StopResult` (with a score) when flagged, `PassResult` otherwise. `processOutput()` returns a pass (no output moderation in this scope). This preserves today's input-only behaviour; output moderation can be a follow-up.
- **Stop via result, not exception:** the throw-based `AiUnsafePromptException` stop is replaced by returning `StopResult`; the guardrail set's `stop_threshold` decides whether the request halts. A `StopResult` is sufficient — no BC shim for the old exception is planned.
- New schema: `ai.guardrail.settings.moderation_guardrail` (provider id, model id, optional `llm_config`-style mapping, violation message).
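
The input-only, stop-via-result behaviour described above can be sketched in standalone PHP (8.1 or later). This is not the module's code: every `Sketch*` class is a hypothetical stand-in so the example runs without Drupal. The constructor signatures, the `getMessage()` method, and the choice of `1.0` as the score for a flagged input are all assumptions. Only `getScore()` on the stop result and the `isFlagged()` check come from the description in this issue.

```php
<?php

declare(strict_types=1);

// Stand-ins for GuardrailResultInterface, PassResult and StopResult.
// Constructors and getMessage() are assumptions; getScore() is from the issue.
interface SketchGuardrailResult {}

final class SketchPassResult implements SketchGuardrailResult {}

final class SketchStopResult implements SketchGuardrailResult {

  public function __construct(
    private readonly string $message,
    private readonly float $score,
  ) {}

  public function getMessage(): string {
    return $this->message;
  }

  public function getScore(): float {
    return $this->score;
  }

}

// Stand-in for the provider call chain
// $provider->moderation($input, $model_id)->getNormalized()->isFlagged().
interface SketchModerationClient {

  public function isFlagged(string $input, string $model_id): bool;

}

final class SketchModerationGuardrail {

  public function __construct(
    private readonly SketchModerationClient $client,
    private readonly string $modelId,
    private readonly string $violationMessage,
  ) {}

  public function processInput(string $input): SketchGuardrailResult {
    if ($this->client->isFlagged($input, $this->modelId)) {
      // Assumption: a flagged input maps to the maximum score of 1.0.
      return new SketchStopResult($this->violationMessage, 1.0);
    }
    return new SketchPassResult();
  }

  public function processOutput(string $output): SketchGuardrailResult {
    // Output moderation is out of scope here, so this always passes.
    return new SketchPassResult();
  }

}

// Fake client that flags any input containing the word "attack".
$client = new class implements SketchModerationClient {

  public function isFlagged(string $input, string $model_id): bool {
    return str_contains($input, 'attack');
  }

};

$guardrail = new SketchModerationGuardrail($client, 'omni-moderation-latest', 'Blocked by moderation.');
$flagged = $guardrail->processInput('plan an attack');
$clean = $guardrail->processInput('hello world');
```

In the real plugin the result would be handed back to the guardrail set, whose `stop_threshold` decides whether the request halts. Nothing is thrown from the guardrail itself.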
## Resolved decisions
* **Stop semantics** → returning a `StopResult` (with `stop_threshold`) is sufficient; the old `AiUnsafePromptException` path is dropped, no compatibility shim.
* **Plugin shape** → a single configurable `moderation_guardrail` plugin (provider/model chosen in settings), not a per-provider deriver.
* **Output moderation** → input only for this issue; `processOutput()` is a no-op pass. Output moderation can be a follow-up.
* **Naming** → the plugin is named **Moderation Guardrail** (`moderation_guardrail`). See #3586471.
## Resources
* Migration / deprecation sibling (blocked by this issue): #3586528
* External moderation → core: #3479913
* Guardrails naming: #3586471 · Guardrails on Automators: #3586447 · During-generate modes: #3586491
* Current runner (reference for behaviour to reproduce): `src/EventSubscriber/ModeratePreRequestEventSubscriber.php`
* Guardrail plugin type: `src/Attribute/AiGuardrail.php`, `src/Guardrail/AiGuardrailInterface.php`, `src/Guardrail/AiGuardrailPluginManager.php`
* Guardrail config entities: `src/Entity/AiGuardrail.php`, `src/Entity/AiGuardrailSet.php`
## Decision
<!--Fill in before closing: summarise what was decided and the key reason. Leave empty until resolved.-->