Draft: Symfony AI Store integration with AI (no changes to ai_search)

Description

Integration Plan: symfony/ai-store into the Drupal AI Module

Date: 2026-05-27
Scope: Core ai module — ai_search submodule changes are out of scope for this document but are referenced as a downstream consumer.


1. Executive Summary

The Drupal ai module currently ships its own VDB-provider abstraction (AiVdbProviderInterface) and embedding pipeline that are tightly coupled to Drupal's plugin system and the Search API. The symfony/ai-store library (already present in vendor/symfony/ai-store) provides a backend-agnostic, pipeline-oriented abstraction for vectorization, document indexing, and retrieval, together with 25+ ready-made store adapters (Qdrant, Weaviate, Milvus, pgvector, Pinecone, OpenSearch, …).

The goal of this plan is to:

  1. Bridge Drupal's existing embedding and VDB abstractions into symfony's interfaces — zero breaking changes for existing providers and consumers.
  2. Centralize vectorization, indexing, and querying behind three new Drupal services so all submodules and contrib modules share one pipeline.
  3. Enable any symfony/ai-store bridge package (e.g. symfony/qdrant-store) to be plugged in as a Drupal VDB backend with minimal glue code.

2. Current Architecture

2.1 AI Provider / Embedding System

| Component | Path | Role |
|---|---|---|
| AiProviderInterface | src/AiProviderInterface.php | All AI providers implement this; exposes embeddings() |
| EmbeddingsInterface | src/OperationType/Embeddings/EmbeddingsInterface.php | Operation type interface: embeddings(input, model): EmbeddingsOutput |
| EmbeddingsInput | src/OperationType/Embeddings/EmbeddingsInput.php | DTO wrapping the text (or image) to embed |
| EmbeddingsOutput | src/OperationType/Embeddings/EmbeddingsOutput.php | DTO with getNormalized(): array (float vector) |
| AiProviderPluginManager | src/AiProviderPluginManager.php | Drupal plugin manager; discovers providers via the #[AiProvider] attribute |
| EmbeddingsTrait | src/Traits/OperationType/EmbeddingsTrait.php | Default maxEmbeddingsInput() / embeddingsVectorSize() for providers |

2.2 Vector Database System

| Component | Path | Role |
|---|---|---|
| AiVdbProviderInterface | src/AiVdbProviderInterface.php | All VDB providers implement this |
| AiVdbProviderClientBase | src/Base/AiVdbProviderClientBase.php | Abstract base class; also implements Search API integration |
| AiVdbProviderPluginManager | src/AiVdbProviderPluginManager.php | Discovers VDB plugins via the #[AiVdbProvider] attribute |
| VdbSimilarityMetrics | src/Enum/VdbSimilarityMetrics.php | CosineSimilarity, EuclideanDistance, InnerProduct |

Registered VDB backends: azure_ai_search, milvus, pinecone, postgres (pgvector), sqlite.

2.3 Key AiVdbProviderInterface Operations

Collection management : getCollections, createCollection, dropCollection
Data write            : insertIntoCollection, deleteFromCollection, deleteItems, deleteAllItems
Query                 : querySearch(…), vectorSearch(…, QueryInterface $query, …)
Utility               : getVdbIds, getRawEmbeddingFieldName, ping, isSetup

2.4 Chunking / Tokenization

  • ai.text_chunker → src/Service/TextChunker.php
  • ai.tokenizer → src/Utility/Tokenizer.php
  • Both are currently used by the ai_search submodule's embedding strategies; no centralized indexing service exists in the core ai module.

3. symfony/ai-store Overview

3.1 Core Abstractions

| Interface | Role |
|---|---|
| StoreInterface | add(VectorDocument[]), remove(id[]), query(QueryInterface), supports() |
| ManagedStoreInterface | setup(), drop(): optional collection lifecycle |
| IndexerInterface | index($input, $options): high-level entry point |
| RetrieverInterface | retrieve(string $query, $options): iterable<VectorDocument> |
| VectorizerInterface | vectorize($values, $options): accepts a string, Stringable, EmbeddableDocumentInterface, or an array of these (signature as sketched in §5.1) |
| TransformerInterface | Mutate a stream of documents (chunking, cleaning, summarizing) |
| FilterInterface | Remove documents from a stream before indexing |
| LoaderInterface | Load documents from a source (file, URL, CSV, RSS, …) |

3.2 Core Value Objects

| Class | Key Fields |
|---|---|
| TextDocument | id, content, Metadata |
| VectorDocument | id, Vector, Metadata, score |
| Metadata | ArrayObject subclass; reserved keys _text, _parent_id, _source, _title, _summary |
| Vector | Float array wrapper |

3.3 Indexing Pipeline

DocumentProcessor: Filters → Transformers → Vectorize (batched) → Store::add()
DocumentIndexer  : accepts EmbeddableDocument(s), delegates to DocumentProcessor
SourceIndexer    : accepts source path/URL, uses LoaderInterface, delegates to DocumentProcessor

3.4 Retrieval Pipeline

Retriever:
  1. Dispatch PreQueryEvent (allow query expansion / modification)
  2. Build query:  TextQuery | VectorQuery | HybridQuery  (based on store capabilities)
  3. StoreInterface::query()
  4. Dispatch PostQueryEvent (allow reranking / filtering)
  5. Yield VectorDocuments

3.5 Available Query Types

| Class | Description |
|---|---|
| VectorQuery | Wraps a Vector for similarity search |
| TextQuery | Wraps a string (or string array, OR logic) for full-text search |
| HybridQuery | Combines Vector + text with configurable semanticRatio (0–1) |

3.6 Available Bridge Packages (installable via Composer)

Qdrant, Weaviate, Milvus, Pinecone, ChromaDB, Elasticsearch, OpenSearch, Meilisearch, ManticoreSearch, pgvector, SQLite, MongoDB, Redis, Supabase, SurrealDB, Neo4j, ClickHouse, MariaDB, Azure AI Search, AWS S3 Vectors, Cloudflare Vectorize, Typesense, Vektor.


4. Gap Analysis and Integration Challenges

4.1 Vectorizer: Different Platform Abstractions

symfony's Vectorizer uses symfony/ai-platform's PlatformInterface::invoke(), while the Drupal AI module uses AiProviderInterface::embeddings(). The two are incompatible at the API level.

Resolution: A DrupalAiVectorizerAdapter bridges Drupal providers → VectorizerInterface without touching the symfony Platform stack.

4.2 Store: The Search API Coupling

AiVdbProviderInterface::vectorSearch() requires a \Drupal\search_api\Query\QueryInterface parameter. This is the main obstacle to wrapping existing VDB providers as StoreInterface.

// Current signature — requires Search API
public function vectorSearch(
    string $collection_name,
    array $vector_input,
    array $output_fields,
    \Drupal\search_api\Query\QueryInterface $query,  // <-- problem
    string $filters = '',
    int $limit = 10, int $offset = 0, string $database = 'default',
): array;

Resolution: Introduce a new optional interface AiVdbDirectSearchInterface (see §5.2) with a vectorSearchDirect() method whose signature has no Search API dependency. DrupalVdbStoreAdapter uses it when the provider implements it, and otherwise falls back to vectorSearch() with a no-op stub query (see §5.3). AiVdbProviderClientBase gains a default implementation that delegates to vectorSearch() via a minimal no-op query, so every existing provider that extends the base class keeps working immediately.

4.3 Collection / Namespace Management

StoreInterface has no concept of collections or namespaces. symfony stores are typically scoped to a single collection at construction time. AiVdbProviderInterface manages multiple named collections.

Resolution: DrupalVdbStoreAdapter is constructed with a fixed $collectionName + $database. Collection creation/management remains on AiVdbProviderInterface and is performed by the factory service before handing out the adapter.

4.4 ID Mapping

Drupal entity IDs are composite strings (e.g. entity:node/1:en). symfony's VectorDocument expects int|string IDs. Chunked documents add a suffix (e.g. entity:node/1:en#chunk-2).

Resolution: The DrupalEntityDocumentMapper (§5.5) encodes/decodes Drupal IDs into strings accepted by symfony, and writes the original Drupal ID into Metadata::_source.

4.5 Metadata Conventions

Drupal VDB providers store ad-hoc metadata arrays; symfony uses Metadata with typed accessors and reserved underscore-prefixed keys.

Resolution: The mapper flattens Drupal metadata into Metadata, using _source for the Drupal entity ID, _title for the entity label, and custom namespace keys (drupal_entity_type, drupal_bundle, etc.) for Drupal-specific fields.

4.6 Batch Vectorization

Drupal AI providers are called one item at a time, whereas symfony's Vectorizer uses Capability::INPUT_MULTIPLE to optionally batch.

Resolution: The adapter defaults to sequential calls and will support batching once providers expose a batch embeddings method.
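
The sequential default and the later batch path can share one entry point. The following is a minimal sketch in plain PHP, not module code: vectorizeAll(), $embedOne and $embedBatch are hypothetical names, and the callables stand in for provider calls.

```php
<?php
declare(strict_types=1);

/**
 * Embeds texts one at a time, or in batches when a batch callable is given.
 * ASSUMPTION: both callables are stand-ins for calls to a Drupal AI provider.
 *
 * @param list<string> $texts
 * @param callable(string): list<float> $embedOne
 * @param (callable(list<string>): list<list<float>>)|null $embedBatch
 * @return list<list<float>> Vectors in the same order as $texts.
 */
function vectorizeAll(array $texts, callable $embedOne, ?callable $embedBatch = null, int $batchSize = 32): array
{
    $texts = array_values($texts);
    if ($embedBatch === null) {
        // Sequential path: one provider call per text.
        return array_map($embedOne, $texts);
    }
    $vectors = [];
    foreach (array_chunk($texts, max(1, $batchSize)) as $batch) {
        // Batched path: one provider call per chunk of texts.
        array_push($vectors, ...$embedBatch($batch));
    }
    return $vectors;
}
```

Callers would see the same return shape on either path, so switching a provider to batching later should not affect consumers.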

4.7 Dependency Declaration

symfony/ai-store is currently in vendor/ but not in composer.json of the ai module.

Resolution: Add it to require (see §6.1).


5. Proposed Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Consumers (contrib / custom)                   │
│  ai_search  │  ai_automators  │  ai_chatbot  │  custom modules   │
└──────┬──────────────┬───────────────┬──────────────┬────────────┘
       │              │               │              │
┌──────▼──────────────▼───────────────▼──────────────▼────────────┐
│                 Centralized Drupal Services  (new)                │
│   ai.indexing_pipeline   ai.retrieval_pipeline   ai.vectorization│
└──────┬──────────────────────────────────┬────────────────────────┘
       │                                  │
┌──────▼──────────────┐     ┌─────────────▼──────────────────────┐
│  Bridge Layer (new) │     │  Bridge Layer (new)                 │
│                     │     │                                     │
│ DrupalAiVectorizer  │     │ DrupalVdbStoreAdapter               │
│ Adapter             │     │ (StoreInterface wrapping            │
│ (VectorizerInterface│     │  AiVdbProviderInterface)            │
│  wrapping Drupal    │     │                                     │
│  AI providers)      │     │ SymfonyStoreBridgePlugin            │
└──────┬──────────────┘     │ (AiVdbProviderInterface wrapping    │
       │                    │  any symfony StoreInterface)        │
       │                    └─────────────┬───────────────────────┘
       │                                  │
┌──────▼──────────────┐     ┌─────────────▼──────────────────────┐
│ Existing Drupal     │     │ Existing Drupal VDB Providers       │
│ AI Providers        │     │ (milvus, pinecone, pgvector, …)     │
│ (openai, ollama, …) │     │                                     │
└─────────────────────┘     │ OR symfony Store Bridges            │
                            │ (qdrant-store, weaviate-store, …)   │
                            └─────────────────────────────────────┘

5.1 DrupalAiVectorizerAdapter

Path: src/Bridge/SymfonyAiStore/DrupalAiVectorizerAdapter.php
Implements: Symfony\AI\Store\Document\VectorizerInterface

Responsibilities:

  • Accepts a Drupal AiProviderInterface instance (resolved from AiProviderPluginManager), a provider ID, and a model ID.
  • For a string|Stringable input: wraps in EmbeddingsInput, calls $provider->embeddings(), returns a Vector from EmbeddingsOutput::getNormalized().
  • For an EmbeddableDocumentInterface input: does the above, wraps result as VectorDocument with the document's ID and metadata.
  • For an array input: iterates and processes each element (sequential; batch support is a future enhancement).
  • Optionally implements \Psr\Log\LoggerAwareInterface for observability.
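
// Sketch, not final code. Drupal namespaces follow the paths in §2.1; the
// symfony class locations are assumed and should be checked against the
// installed symfony/ai-store and symfony/ai-platform versions.
namespace Drupal\ai\Bridge\SymfonyAiStore;

use Drupal\ai\AiProviderPluginManager;
use Drupal\ai\OperationType\Embeddings\EmbeddingsInput;
use Psr\Log\LoggerAwareInterface;
use Psr\Log\LoggerInterface;
use Psr\Log\NullLogger;
use Symfony\AI\Platform\Vector\Vector;
use Symfony\AI\Store\Document\EmbeddableDocumentInterface;
use Symfony\AI\Store\Document\VectorDocument;
use Symfony\AI\Store\Document\VectorizerInterface;

final class DrupalAiVectorizerAdapter implements VectorizerInterface, LoggerAwareInterface {
    public function __construct(
        private readonly AiProviderPluginManager $providerManager,
        private readonly string $providerId,
        private readonly string $modelId,
        private LoggerInterface $logger = new NullLogger(),
    ) {}

    // Required by LoggerAwareInterface.
    public function setLogger(LoggerInterface $logger): void {
        $this->logger = $logger;
    }

    public function vectorize(string|\Stringable|EmbeddableDocumentInterface|array $values, array $options = []): Vector|VectorDocument|array {
        // Arrays are processed sequentially; batching is a future enhancement.
        if (is_array($values)) {
            return array_map(fn($v) => $this->vectorizeSingle($v, $options), $values);
        }
        return $this->vectorizeSingle($values, $options);
    }

    private function vectorizeSingle(mixed $value, array $options): Vector|VectorDocument {
        $text = $value instanceof EmbeddableDocumentInterface ? (string) $value->getContent() : (string) $value;
        $provider = $this->providerManager->createInstance($this->providerId);
        $output = $provider->embeddings(new EmbeddingsInput($text), $this->modelId);
        $vector = new Vector($output->getNormalized());

        // Documents keep their ID and metadata; plain text yields a bare Vector.
        return $value instanceof EmbeddableDocumentInterface
            ? new VectorDocument($value->getId(), $vector, $value->getMetadata())
            : $vector;
    }
}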

5.2 New Optional Interface: AiVdbDirectSearchInterface

Path: src/AiVdbDirectSearchInterface.php

Decouples vector search from Search API. Existing providers are not required to implement this — it is entirely additive.

interface AiVdbDirectSearchInterface {
    /**
     * Perform a vector similarity search without Search API dependency.
     *
     * @param array $vector_input   Float array (embedding vector)
     * @param array $output_fields  Fields to return in results
     * @param array $filters        Simple key=>value filter map
     * @return array                Matching records with optional score field
     */
    public function vectorSearchDirect(
        string $collection_name,
        array $vector_input,
        array $output_fields,
        array $filters = [],
        int $limit = 10,
        int $offset = 0,
        string $database = 'default',
    ): array;
}

AiVdbProviderClientBase gains a default implementation of vectorSearchDirect() that:

  1. Converts the $filters array into the string format prepareFilters() already accepts.
  2. Calls $this->vectorSearch(…, $query = new NullSearchApiQuery()) — a minimal internal stub that produces empty conditions.

This means all existing providers get vectorSearchDirect() for free without any changes to their own code.
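
The two steps above could look roughly like the sketch below. It is self-contained and uses stand-ins throughout: QueryStandIn, NullQuery, DirectSearch, ProviderBase and RecordingProvider are hypothetical names, the method signatures are shortened compared to §5.2, and the filter expression syntax is a placeholder because each provider's prepareFilters() defines the real one.

```php
<?php
declare(strict_types=1);

/** Stand-in for \Drupal\search_api\Query\QueryInterface, reduced to a marker. */
interface QueryStandIn {}

/** Hypothetical no-op query that carries no conditions. */
final class NullQuery implements QueryStandIn {}

/** Simplified form of AiVdbDirectSearchInterface (fewer parameters than §5.2). */
interface DirectSearch
{
    public function vectorSearchDirect(string $collection, array $vector, array $filters = [], int $limit = 10): array;
}

abstract class ProviderBase implements DirectSearch
{
    /** Existing Search-API-coupled method that each provider already implements. */
    abstract public function vectorSearch(string $collection, array $vector, QueryStandIn $query, string $filters = '', int $limit = 10): array;

    /** Default implementation: convert the filters, then delegate with a no-op query. */
    public function vectorSearchDirect(string $collection, array $vector, array $filters = [], int $limit = 10): array
    {
        return $this->vectorSearch($collection, $vector, new NullQuery(), $this->filtersToString($filters), $limit);
    }

    /** ASSUMPTION: placeholder expression syntax, not any provider's real format. */
    protected function filtersToString(array $filters): string
    {
        $parts = [];
        foreach ($filters as $field => $value) {
            // json_encode() quotes strings and leaves numbers bare.
            $parts[] = sprintf('%s == %s', $field, json_encode($value, JSON_THROW_ON_ERROR));
        }
        return implode(' && ', $parts);
    }
}

/** Test double that records what the base class passes through. */
final class RecordingProvider extends ProviderBase
{
    public array $lastCall = [];

    public function vectorSearch(string $collection, array $vector, QueryStandIn $query, string $filters = '', int $limit = 10): array
    {
        $this->lastCall = ['collection' => $collection, 'query' => $query, 'filters' => $filters, 'limit' => $limit];
        return [];
    }
}
```

A provider that extends the base class inherits vectorSearchDirect() unchanged and can override it later with a native implementation.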

5.3 DrupalVdbStoreAdapter

Path: src/Bridge/SymfonyAiStore/DrupalVdbStoreAdapter.php
Implements: Symfony\AI\Store\StoreInterface, Symfony\AI\Store\ManagedStoreInterface

Wraps one AiVdbProviderInterface instance scoped to a single collection.

| StoreInterface method | Maps to |
|---|---|
| add(VectorDocument[]) | insertIntoCollection($collectionName, $data); converts VectorDocument to the flat array format providers expect |
| remove(string[]) | deleteFromCollection($collectionName, $ids) |
| query(VectorQuery) | vectorSearchDirect() if the provider implements AiVdbDirectSearchInterface, else vectorSearch() with a null stub query |
| query(TextQuery) | querySearch() with a filter built from the text |
| query(HybridQuery) | Executes both vector and text paths, merges results using Reciprocal Rank Fusion (mirrors symfony's CombinedStore) |
| supports($class) | Returns true for VectorQuery; returns true for TextQuery and HybridQuery only if the provider's querySearch() is meaningful |
| setup($options) | createCollection(); reads dimension and metric_type from $options |
| drop($options) | dropCollection() |
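
For the HybridQuery row, Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over the result lists it appears in. The sketch below shows the general technique in plain PHP; reciprocalRankFusion() is a hypothetical helper, k = 60 is a commonly used default rather than a value taken from this plan, and the code is not copied from symfony's CombinedStore.

```php
<?php
declare(strict_types=1);

/**
 * Merges ranked ID lists with Reciprocal Rank Fusion.
 *
 * @param list<list<string>> $rankings One best-first ID list per search path.
 * @param int $k Smoothing constant (assumed default of 60).
 * @return array<string, float> Fused scores keyed by ID, best first.
 */
function reciprocalRankFusion(array $rankings, int $k = 60): array
{
    $scores = [];
    foreach ($rankings as $ranking) {
        foreach (array_values($ranking) as $position => $id) {
            // Ranks are 1-based: the top hit contributes 1 / (k + 1).
            $scores[$id] = ($scores[$id] ?? 0.0) + 1.0 / ($k + $position + 1);
        }
    }
    // Sort by score, highest first, keeping the ID keys.
    arsort($scores);
    return $scores;
}

// Example: one list from the vector path, one from the text path.
$vectorHits = ['node/1/en', 'node/2/en', 'node/3/en'];
$textHits = ['node/2/en', 'node/4/en'];
$fused = reciprocalRankFusion([$vectorHits, $textHits]);
```

Because only ranks are used, the raw vector distances and text scores never need to be put on a common scale. Note that PHP converts purely numeric string IDs to integer array keys, which is harmless for composite IDs like the ones above.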

Key design notes:

  • The adapter is collection-scoped: one adapter instance = one collection. The factory service creates adapters for the configured collection.
  • Metadata conversion: VectorDocument::getMetadata() is stored as a JSON-encoded extra field; on retrieval it is decoded back.
  • Score: raw distance returned by VDB providers is mapped to VectorDocument::withScore().
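
The metadata round trip described above might be sketched as follows. This is plain PHP with hypothetical names: toProviderRow(), metadataFromProviderRow() and the metadata_json field are illustrative, and the real flat array layout depends on each VDB provider.

```php
<?php
declare(strict_types=1);

/**
 * Flattens one document into a provider row with JSON-encoded metadata.
 * ASSUMPTION: the field names 'id', 'vector' and 'metadata_json' are illustrative.
 *
 * @param list<float> $vector
 * @param array<string, mixed> $metadata
 * @return array{id: string, vector: list<float>, metadata_json: string}
 */
function toProviderRow(string $id, array $vector, array $metadata): array
{
    return [
        'id' => $id,
        'vector' => $vector,
        // JSON_THROW_ON_ERROR surfaces unencodable metadata instead of storing false.
        'metadata_json' => json_encode($metadata, JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE),
    ];
}

/**
 * Decodes the metadata of a row returned by the provider.
 *
 * @param array<string, mixed> $row
 * @return array<string, mixed>
 */
function metadataFromProviderRow(array $row): array
{
    $json = $row['metadata_json'] ?? '{}';
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}
```

Storing metadata as one JSON field keeps the provider schema fixed, at the cost of not being able to filter on individual metadata keys inside the VDB.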

5.4 SymfonyStoreBridgePlugin

Path: src/Bridge/SymfonyAiStore/SymfonyStoreBridgePlugin.php
Extends: AiVdbProviderClientBase
Purpose: Base class for Drupal VDB provider plugins that delegate to a symfony StoreInterface.

This enables contrib/custom modules to ship thin Drupal plugins for any symfony store bridge package (e.g. a QdrantVdbProvider plugin that wraps symfony/qdrant-store) with minimal boilerplate.

Responsibilities:

  • Implements the full AiVdbProviderInterface contract.
  • Stores the wrapped StoreInterface instance (injected or constructed in create()).
  • Maps collection management to ManagedStoreInterface::setup()/drop() if available, else no-op.
  • Maps insertIntoCollection() → StoreInterface::add().
  • Maps deleteFromCollection() → StoreInterface::remove().
  • Maps vectorSearch() / vectorSearchDirect() → StoreInterface::query(VectorQuery).
  • Implements AiVdbDirectSearchInterface natively.

Concrete bridge plugins extend this class, supply the symfony store instance, and add their own settings form (host, API key, etc.).

5.5 DrupalEntityDocumentMapper

Path: src/Mapper/DrupalEntityDocumentMapper.php

Converts Drupal entities and Search API items to/from TextDocument/VectorDocument.

Entity → TextDocument:
  id        = "<entity_type>/<entity_id>/<langcode>"
  content   = concatenated text fields (configurable)
  metadata:
    _source        = "<entity_type>/<entity_id>/<langcode>"
    _title         = entity label
    drupal_entity_type  = "node"
    drupal_bundle       = "article"
    drupal_langcode     = "en"
    drupal_url          = absolute URL

VectorDocument → result array:
  id         = parsed back to entity_type/entity_id/langcode
  score      = similarity score
  metadata   = passed through

Chunked documents append #chunk-N to the ID and set Metadata::KEY_PARENT_ID.
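
The ID scheme above can be sketched as a pair of helpers. The function names encodeDocumentId() and decodeDocumentId() are hypothetical, and the sketch assumes that entity type, entity ID and langcode never contain "/" or "#".

```php
<?php
declare(strict_types=1);

/** Builds "<entity_type>/<entity_id>/<langcode>", plus "#chunk-N" for chunks. */
function encodeDocumentId(string $entityType, string $entityId, string $langcode, ?int $chunk = null): string
{
    $id = sprintf('%s/%s/%s', $entityType, $entityId, $langcode);
    return $chunk === null ? $id : sprintf('%s#chunk-%d', $id, $chunk);
}

/**
 * Parses an encoded ID back into its parts.
 *
 * @return array{entity_type: string, entity_id: string, langcode: string, chunk: ?int}
 */
function decodeDocumentId(string $id): array
{
    $pattern = '~^(?<type>[^/#]+)/(?<id>[^/#]+)/(?<lang>[^/#]+)(?:#chunk-(?<chunk>\d+))?$~';
    if (preg_match($pattern, $id, $m) !== 1) {
        throw new InvalidArgumentException("Unrecognized document ID: $id");
    }
    return [
        'entity_type' => $m['type'],
        'entity_id' => $m['id'],
        'langcode' => $m['lang'],
        // The chunk group is absent when the ID has no "#chunk-N" suffix.
        'chunk' => isset($m['chunk']) && $m['chunk'] !== '' ? (int) $m['chunk'] : null,
    ];
}
```

For a chunk, the parent ID is the same string without the suffix, which is the value a mapper would presumably store under Metadata::KEY_PARENT_ID.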

5.6 Centralized Drupal Services

Three new services replace scattered per-module implementations; a fourth, the store factory, supports them internally.

ai.vectorization_service → VectorizationService

Path: src/Service/VectorizationService.php

- getVectorizer(string $provider_id, string $model_id): VectorizerInterface
- vectorize(string $text, string $provider_id, string $model_id): array   // float[]
- vectorizeBatch(array $texts, string $provider_id, string $model_id): array  // float[][]

Internally:

  • Creates (or caches) a DrupalAiVectorizerAdapter per provider+model combination.
  • Dispatches Drupal events before/after vectorization for observability (ai_observability submodule integration).
  • Caches results in the cache.ai backend, keyed by a hash of the input text.
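
A minimal sketch of the caching step, using an in-memory array in place of the cache.ai backend. CachingVectorizer and its method names are hypothetical. The sketch also makes one assumption beyond the text above: the cache key includes the provider and model IDs, since vectors from different models are not interchangeable.

```php
<?php
declare(strict_types=1);

/** Hypothetical in-memory stand-in for the caching done by VectorizationService. */
final class CachingVectorizer
{
    /** @var array<string, list<float>> */
    private array $cache = [];

    /** @param \Closure(string): list<float> $embed Computes the embedding on a cache miss. */
    public function __construct(
        private readonly \Closure $embed,
        private readonly string $providerId,
        private readonly string $modelId,
    ) {}

    /** @return list<float> */
    public function vectorize(string $text, bool $cacheable = true): array
    {
        if (!$cacheable) {
            // Bypass the cache entirely, e.g. for dynamic content.
            return ($this->embed)($text);
        }
        $key = self::cacheKey($this->providerId, $this->modelId, $text);
        // The right-hand side only runs when the key is not cached yet.
        return $this->cache[$key] ??= ($this->embed)($text);
    }

    /** ASSUMPTION: key covers provider, model and text, separated by NUL bytes. */
    public static function cacheKey(string $providerId, string $modelId, string $text): string
    {
        return 'ai_vector:' . hash('sha256', $providerId . "\0" . $modelId . "\0" . $text);
    }
}
```

Hashing keeps the key length fixed regardless of input size, and the $cacheable flag leaves room for callers to opt out per request.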

ai.indexing_pipeline → IndexingPipelineService

Path: src/Service/IndexingPipelineService.php

- indexEntity(EntityInterface $entity, array $config): void
- indexDocuments(iterable $documents, array $config): void
- indexFromSource(string $source, array $config): void

Internally:

  • Builds a symfony DocumentProcessor from config:
    • Vectorizer: DrupalAiVectorizerAdapter (from ai.vectorization_service)
    • Store: DrupalVdbStoreAdapter (wrapping the configured VDB provider)
    • Transformers: at minimum a TextSplitTransformer (replaces current ai.text_chunker for this pipeline)
    • Filters: pluggable via Drupal tagged services
  • For entity input, uses DrupalEntityDocumentMapper to produce TextDocument instances.
  • Ensures the collection exists before indexing (via AiVdbProviderPluginManager::ensureCollectionExists()).

ai.retrieval_pipeline → RetrievalPipelineService

Path: src/Service/RetrievalPipelineService.php

- retrieve(string $query, array $config, int $limit = 10): array<VectorDocument>
- retrieveAsEntities(string $query, array $config, int $limit = 10): array<EntityInterface>

Internally:

  • Builds a symfony Retriever with:
    • VectorizerInterface: DrupalAiVectorizerAdapter
    • StoreInterface: DrupalVdbStoreAdapter
  • Dispatches PreQueryEvent / PostQueryEvent — contrib modules can register rerankers or query expanders via Drupal's event system.
  • Supports hybrid queries when the underlying store (via DrupalVdbStoreAdapter) reports supports(HybridQuery::class).

ai.symfony_store_factory → SymfonyStoreFactory

Path: src/Service/SymfonyStoreFactory.php

- createStore(string $vdb_provider_id, array $collection_config): StoreInterface

Looks up the named VDB provider plugin, constructs a DrupalVdbStoreAdapter wrapping it, ensures collection exists, returns the adapter. Used internally by IndexingPipelineService and RetrievalPipelineService.


6. Implementation Plan

Phase 1 — Dependency and Scaffold (no functional changes)

  1. Add symfony/ai-store to composer.json of the ai module (require section, ^0.9).
  2. Create directory structure:
    web/modules/contrib/ai/src/Bridge/SymfonyAiStore/
    web/modules/contrib/ai/src/Mapper/
    web/modules/contrib/ai/src/Service/         (already exists partially)
  3. Verify autoloading — the module already uses PSR-4 under Drupal\ai; new subdirectories are picked up automatically.

Phase 2 — New Optional Interface AiVdbDirectSearchInterface

  1. Create src/AiVdbDirectSearchInterface.php.
  2. Add default implementation of vectorSearchDirect() to AiVdbProviderClientBase:
    • Converts array $filters to a string using the same logic as prepareFilters().
    • Creates a minimal no-op internal query stub (private inner class, no public API).
    • Calls $this->vectorSearch(…) with that stub.
  3. Mark existing VDB provider plugins as implementing AiVdbDirectSearchInterface progressively (optional, no deadline).

Backward compatibility: Zero changes to AiVdbProviderInterface. Existing providers continue to work exactly as before.

Phase 3 — DrupalAiVectorizerAdapter

  1. Create src/Bridge/SymfonyAiStore/DrupalAiVectorizerAdapter.php.
  2. Write unit tests in tests/src/Unit/Bridge/DrupalAiVectorizerAdapterTest.php using existing mock provider infrastructure.
  3. Register as a factory in ai.services.yml:
    ai.drupal_vectorizer_adapter_factory:
      class: Drupal\ai\Bridge\SymfonyAiStore\DrupalAiVectorizerAdapterFactory
      arguments: ['@ai.provider', '@logger.channel.ai']

Phase 4 — DrupalVdbStoreAdapter

  1. Create src/Bridge/SymfonyAiStore/DrupalVdbStoreAdapter.php.
  2. Implement add(), remove(), query(), supports(), setup(), drop().
  3. Write integration tests against the InMemory VDB provider (test infrastructure already exists).
  4. Register factory:
    ai.symfony_store_factory:
      class: Drupal\ai\Service\SymfonyStoreFactory
      arguments: ['@ai.vdb_provider', '@logger.channel.ai']

Phase 5 — SymfonyStoreBridgePlugin Base Class

  1. Create src/Bridge/SymfonyAiStore/SymfonyStoreBridgePlugin.php extending AiVdbProviderClientBase.
  2. Add documentation and a code example showing how to write a concrete plugin for a symfony bridge package (e.g. qdrant).
  3. No service registration — this is a base class for other plugins.

Phase 6 — DrupalEntityDocumentMapper

  1. Create src/Mapper/DrupalEntityDocumentMapper.php.
  2. Define a DrupalEntityDocumentMapperInterface so it can be swapped or decorated.
  3. Register as a service:
    ai.entity_document_mapper:
      class: Drupal\ai\Mapper\DrupalEntityDocumentMapper
      arguments: ['@entity_type.manager', '@url_generator']

Phase 7 — Centralized Services

  1. Create VectorizationService, IndexingPipelineService, RetrievalPipelineService.
  2. Inject all dependencies via constructor; use lazy loading where appropriate.
  3. Add service definitions to ai.services.yml (see §7).
  4. Dispatch Drupal events at key pipeline steps to integrate with ai_observability and ai_logging.

Phase 8 — Documentation and Migration Guide

  1. Add docblock-level usage examples to each new class.
  2. Write a developer-facing guide in docs/symfony-ai-store.md explaining how to:
    • Use ai.indexing_pipeline to index Drupal entities.
    • Use ai.retrieval_pipeline to retrieve by semantic similarity.
    • Write a new VDB backend using SymfonyStoreBridgePlugin.
  3. Note which ai_search functionality can be migrated to the new centralized services in a future minor release.

7. Service Definitions (additions to ai.services.yml)

services:

  # Factory for DrupalAiVectorizerAdapter instances
  ai.drupal_vectorizer_adapter_factory:
    class: Drupal\ai\Bridge\SymfonyAiStore\DrupalAiVectorizerAdapterFactory
    arguments:
      - '@ai.provider'
      - '@logger.channel.ai'

  # Factory for DrupalVdbStoreAdapter instances
  ai.symfony_store_factory:
    class: Drupal\ai\Service\SymfonyStoreFactory
    arguments:
      - '@ai.vdb_provider'
      - '@logger.channel.ai'

  # Entity ↔ TextDocument mapping
  ai.entity_document_mapper:
    class: Drupal\ai\Mapper\DrupalEntityDocumentMapper
    arguments:
      - '@entity_type.manager'
      - '@url_generator'

  # Centralized vectorization
  ai.vectorization_service:
    class: Drupal\ai\Service\VectorizationService
    arguments:
      - '@ai.drupal_vectorizer_adapter_factory'
      - '@cache.ai'
      - '@event_dispatcher'
      - '@logger.channel.ai'

  # Centralized indexing pipeline
  ai.indexing_pipeline:
    class: Drupal\ai\Service\IndexingPipelineService
    arguments:
      - '@ai.vectorization_service'
      - '@ai.symfony_store_factory'
      - '@ai.entity_document_mapper'
      - '@ai.vdb_provider'
      - '@event_dispatcher'
      - '@logger.channel.ai'

  # Centralized retrieval pipeline
  ai.retrieval_pipeline:
    class: Drupal\ai\Service\RetrievalPipelineService
    arguments:
      - '@ai.vectorization_service'
      - '@ai.symfony_store_factory'
      - '@ai.entity_document_mapper'
      - '@event_dispatcher'
      - '@logger.channel.ai'

8. New File Structure

web/modules/contrib/ai/
├── composer.json                               # + symfony/ai-store ^0.9
├── ai.services.yml                             # + new service definitions (§7)
└── src/
    ├── AiVdbDirectSearchInterface.php          # NEW — optional, Search-API-free vector search
    ├── Base/
    │   └── AiVdbProviderClientBase.php         # MODIFIED — adds default vectorSearchDirect()
    ├── Bridge/
    │   └── SymfonyAiStore/
    │       ├── DrupalAiVectorizerAdapter.php        # NEW — VectorizerInterface wrapping Drupal AI provider
    │       ├── DrupalAiVectorizerAdapterFactory.php # NEW — factory service
    │       ├── DrupalVdbStoreAdapter.php            # NEW — StoreInterface wrapping AiVdbProviderInterface
    │       └── SymfonyStoreBridgePlugin.php         # NEW — base class for symfony-store-backed VDB plugins
    ├── Mapper/
    │   ├── DrupalEntityDocumentMapperInterface.php  # NEW — contract for entity-to-document mapping
    │   └── DrupalEntityDocumentMapper.php           # NEW — default implementation
    └── Service/
        ├── SymfonyStoreFactory.php                  # NEW — creates scoped DrupalVdbStoreAdapter instances
        ├── VectorizationService.php                 # NEW — centralized vectorization with caching
        ├── IndexingPipelineService.php              # NEW — centralized indexing using DocumentProcessor
        └── RetrievalPipelineService.php             # NEW — centralized retrieval using Retriever

Files marked MODIFIED receive only additive changes (a new method on the base class), and composer.json and ai.services.yml gain new entries. All other existing files remain unchanged.


9. Backward Compatibility Guarantees

| Artifact | Status |
|---|---|
| AiVdbProviderInterface | Unchanged; no added required methods |
| AiProviderInterface | Unchanged |
| EmbeddingsInterface, EmbeddingsInput, EmbeddingsOutput | Unchanged |
| AiVdbProviderPluginManager | Unchanged |
| AiVdbProviderClientBase | Additive only; one new method with a default implementation |
| Existing VDB provider plugins | No changes required |
| ai_search submodule | No changes required; continues to use the existing path |
| Custom modules using AiVdbProviderInterface | No changes required |

New interfaces (AiVdbDirectSearchInterface, DrupalEntityDocumentMapperInterface) are opt-in. No existing code is forced to implement them.


10. Risk Register

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| symfony/ai-store API changes (marked experimental) | Medium | Medium | Pin to ^0.9; wrap behind Drupal interfaces so symfony changes are isolated to bridge classes |
| Search API QueryInterface unavailable at runtime | Low | Medium | The vectorSearchDirect() path explicitly avoids it; Search API stays a require-dev dependency |
| Performance regression from sequential vectorization | Low | Low | Add batch path in DrupalAiVectorizerAdapter once providers expose batch embeddings; cache layer in VectorizationService covers repeated requests |
| ID collisions in chunked documents | Low | Medium | Use deterministic suffix scheme (<id>#chunk-<index>) and document it; mapper validates uniqueness |
| Metadata overflow in VDB providers that impose field-size limits | Low | Low | Mapper truncates oversized metadata fields with a configurable max length |

11. Open Questions

  1. Should ai.indexing_pipeline replace ai_search's indexing path immediately, or coexist?
    Recommendation: coexist in the initial release. Mark the ai_search path as "will migrate to ai.indexing_pipeline in a future minor" in a @deprecated notice.

  2. Should SymfonyStoreBridgePlugin ship in the core ai module or a separate ai_symfony_store submodule?
    A submodule keeps the core module lean and avoids forcing symfony store adapters on all installations. Recommended: modules/ai_symfony_store/.

  3. Event naming for indexing/retrieval hooks — should they mirror symfony's PreQueryEvent/PostQueryEvent or follow the Drupal AI module's existing event naming conventions?
    Recommendation: follow Drupal conventions for Drupal-dispatched events, but document the symfony event counterparts for bridge consumers.

  4. How should the ai.indexing_pipeline handle re-indexing (update vs insert)?
    VDB providers differ; recommend exposing an $upsert = true option that calls deleteFromCollection() before insertIntoCollection() when the document already exists, matching current ai_search behavior.

  5. Caching strategy for VectorizationService — cache by content hash is correct for static content, but dynamic content (e.g. user-generated text) should bypass the cache. Expose a $cacheable flag in the API.
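
The $upsert option from question 4 could behave as in the sketch below. It is self-contained and hypothetical: InMemoryCollection stands in for one VDB collection, its two methods borrow names from AiVdbProviderInterface but use simplified signatures, and indexRows() is an illustrative helper.

```php
<?php
declare(strict_types=1);

/** Hypothetical in-memory stand-in for one VDB collection. */
final class InMemoryCollection
{
    /** @var list<array{id: string, text: string}> */
    private array $rows = [];

    public function insertIntoCollection(array $row): void
    {
        // Plain append: inserting the same ID twice creates a duplicate.
        $this->rows[] = $row;
    }

    /** @param list<string> $ids */
    public function deleteFromCollection(array $ids): void
    {
        $this->rows = array_values(array_filter(
            $this->rows,
            static fn (array $row): bool => !in_array($row['id'], $ids, true),
        ));
    }

    /** @return list<string> */
    public function ids(): array
    {
        return array_column($this->rows, 'id');
    }
}

/**
 * Inserts rows; with $upsert, first deletes rows that share an ID.
 *
 * @param list<array{id: string, text: string}> $rows
 */
function indexRows(InMemoryCollection $collection, array $rows, bool $upsert = true): void
{
    if ($upsert) {
        $collection->deleteFromCollection(array_column($rows, 'id'));
    }
    foreach ($rows as $row) {
        $collection->insertIntoCollection($row);
    }
}
```

One caveat for chunked documents: a re-indexed entity may produce fewer chunks than before, so a real implementation would probably need to delete by parent ID rather than only by the IDs being written.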

AI Compliance

Check the one that best describes your usage, or leave all unchecked if AI was not significantly used.

  • AI Assisted Code
    Mainly written by a human; AI used for autocomplete or partial generation under full human supervision.

  • AI Generated Code
    Mainly generated by AI, reviewed and approved by a human before this MR was created.

  • Vibe Coded
    Generated by AI and only functionally reviewed before this MR was created.

Edited by Artem Dmitriiev
