Feature: Translation result caching and cross-field deduplication
## Problem/Motivation
`ai_translate` re-sends every field to the LLM on every translation run, with no memory of previous translations. This causes two avoidable costs:
1. **Re-translation of unchanged content.** On a re-translation run (for example, after fixing one typo or updating a single field), every unchanged field is sent to the LLM again, even though its existing translation is still valid.
2. **Repeated translation of identical strings.** Paragraph-heavy entities and Layout Builder pages often contain the same short string in multiple components (shared CTA labels, navigation titles, standard headings). Each occurrence is translated independently, even though a single translation would suffice.
On a large multilingual site the two effects compound: avoidable token spend and rate-limit pressure accumulate with every translation run.
This issue is related to #3585527 (batch multiple fields per request), which addresses per-field redundancy within a single run.
**Caching** and **deduplication** address redundancy across runs and across identical strings within a run. Both optimisations are complementary and independent.
---
## Proposed resolution
Add an opt-in cache and deduplication layer inside the translation pipeline, between field extraction and the LLM call. Existing interfaces are not changed, and behaviour is identical to today when the feature is not enabled.
**Deduplication (within a single run):**
- After extracting all field texts for an entity, compute a SHA-256 hash of each string.
- Group field keys by hash; send each unique string to the LLM only once.
- Map the result back to every field that shared that string.
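
The three steps above can be sketched as follows. This is a minimal illustration, written in Python for brevity rather than in the module's PHP; `translate_deduplicated`, `sha256_hex` and the `translate` callable are hypothetical names, not part of `ai_translate`.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def translate_deduplicated(
    fields: Dict[str, str],
    translate: Callable[[str], str],
) -> Dict[str, str]:
    """Translate each unique source string once and fan the result out.

    `fields` maps a field key to its source text; `translate` stands in
    for the LLM call (hypothetical signature: text in, translation out).
    """
    # Group field keys by the hash of their source text.
    keys_by_hash: Dict[str, List[str]] = defaultdict(list)
    text_by_hash: Dict[str, str] = {}
    for field_key, text in fields.items():
        digest = sha256_hex(text)
        keys_by_hash[digest].append(field_key)
        text_by_hash[digest] = text

    # One translator call per unique string, mapped back to every field
    # that shared it.
    results: Dict[str, str] = {}
    for digest, field_keys in keys_by_hash.items():
        translated = translate(text_by_hash[digest])
        for field_key in field_keys:
            results[field_key] = translated
    return results
```

With two fields containing "Read more" and one containing "Contact", the `translate` callable is invoked twice, and both "Read more" fields receive the same result.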
**Caching (across runs):**
- Before any LLM call, check a Drupal cache backend keyed on `ai_translate:src_lang:tgt_lang:sha256`.
- On a cache hit, return the cached translation immediately.
- On a cache miss, translate and write the result to the cache.
- Rely on Drupal's standard cache tags and expiry handling; no bespoke cache management is needed.
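
A minimal sketch of the lookup flow, assuming the key layout shown above. It is written in Python for brevity; a plain dictionary stands in for the Drupal cache backend, and `cache_id`, `translate_cached` and the `translate` callable are hypothetical names. Cache tags and expiry are deliberately left out.

```python
import hashlib
from typing import Callable, Dict


def cache_id(src_lang: str, tgt_lang: str, text: str) -> str:
    """Build a key of the form ai_translate:src_lang:tgt_lang:sha256."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"ai_translate:{src_lang}:{tgt_lang}:{digest}"


def translate_cached(
    text: str,
    src_lang: str,
    tgt_lang: str,
    cache: Dict[str, str],
    translate: Callable[[str, str, str], str],
) -> str:
    """Return a cached translation, calling `translate` only on a miss.

    `cache` is an in-memory stand-in for a cache bin; `translate` is a
    placeholder for the LLM call.
    """
    cid = cache_id(src_lang, tgt_lang, text)
    if cid in cache:
        # Cache hit: no LLM request is made.
        return cache[cid]
    # Cache miss: translate, then store the result for later runs.
    translated = translate(text, src_lang, tgt_lang)
    cache[cid] = translated
    return translated
```

Because the language pair and the hash of the source text are both part of the key, the same string translated into a second language gets its own entry, and an edited string produces a different key and therefore a miss.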
**Effect:**
- Re-translating an entity whose content has not changed costs 0 LLM requests.
- A Layout Builder page with 10 identical CTA labels triggers 1 LLM call instead of 10.
- Cache entries follow Drupal's normal cache lifecycle. Because the key contains the hash of the source text, an edited string never matches a stale entry.
**Implementation approach we have working:** We built this in a custom module during a production project. The core logic lives in a `translateMetadata()` method that intercepts the batch of extracted field items, runs the hash/cache lookup pipeline, calls the bulk or single-field translator for misses only, and maps results back. We are happy to share the implementation as a starting point or patch.
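
The flow described above (hash, cache lookup, translate misses only, map back) could look roughly like this. It is an illustrative Python sketch, not the actual `translateMetadata()` code: `translate_fields` and `translate_bulk` are hypothetical names, a dictionary stands in for the cache backend, and the bulk translator is assumed to return one output per input in the same order.

```python
import hashlib
from typing import Callable, Dict, List


def translate_fields(
    fields: Dict[str, str],
    src_lang: str,
    tgt_lang: str,
    cache: Dict[str, str],
    translate_bulk: Callable[[List[str]], List[str]],
) -> Dict[str, str]:
    """Deduplicate, consult the cache, and translate only the misses."""

    def cid_for(text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"ai_translate:{src_lang}:{tgt_lang}:{digest}"

    # Step 1: collapse identical strings onto a single cache ID.
    text_by_cid: Dict[str, str] = {}
    for text in fields.values():
        text_by_cid.setdefault(cid_for(text), text)

    # Step 2: split the unique strings into cache hits and misses.
    translated_by_cid: Dict[str, str] = {}
    misses: List[str] = []
    for cid in text_by_cid:
        if cid in cache:
            translated_by_cid[cid] = cache[cid]
        else:
            misses.append(cid)

    # Step 3: send the misses to the translator in one bulk call.
    if misses:
        outputs = translate_bulk([text_by_cid[cid] for cid in misses])
        if len(outputs) != len(misses):
            raise ValueError("translator returned an unexpected result count")
        for cid, output in zip(misses, outputs):
            cache[cid] = output
            translated_by_cid[cid] = output

    # Step 4: map each translation back to every field that used it.
    return {key: translated_by_cid[cid_for(text)] for key, text in fields.items()}
```

For a page with ten identical CTA labels and one heading, the first run makes a single bulk call containing two strings; a second run over the same content makes no translator call at all.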
---
## Remaining tasks
- [ ] Decide on cache backend (injectable; default `cache.default`)
- [ ] Decide on opt-in surface (config flag vs. always-on)
- [ ] Align implementation with `TextTranslator` / `TextTranslatorInterface` conventions
- [ ] Tests: cache hit, cache miss, deduplication within a run, cross-run invalidation
- [ ] phpcs (Drupal + DrupalPractice) clean; GitLab CI green
- [ ] Maintainer review
---
## Questions for maintainers
1. Is there interest in receiving this as a contribution?
2. Preferred scope: always-on optimisation or opt-in config flag?
3. Preferred cache backend: `cache.default`, a dedicated `cache.ai_translate` bin, or injectable?
## User interface changes
None. This is a transparent performance optimisation.
## API changes
Additive only: a new optional cache layer is inserted into the translation pipeline. `TextTranslator::translateContent()` and all existing interfaces remain unchanged.