[Plan] AI Search 2.0.x and roadmap to stable
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3485449. -->
Reported by: [yautja_cetanu](https://www.drupal.org/user/626050)
>>>
<h2>Backports needed for AI Search 1.0.x stable release</h2>
Duplicated from the list below in _shoulds_:
<ol>
<li>[2x only] #3584015 - 2x branch is currently broken as a result of Symfony Platform integration</li>
<li>[2x only, but could be backported] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3547137" title="Status: Closed (fixed)">#3547137: Title is duplicated in chunk if present in the index as contextual content</a></span> - Fix double title</li>
<li>[2x only, needs backport to 1x] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3520606" title="Status: Closed (fixed)">#3520606: Long node titles can trigger exception 'The minimum overlap cannot be equal to or exceed the maximum chunk size.' when indexing content</a></span> - <del>Likely an edge case when chunk size is set too small to be useful perhaps (causing contextual space to be then also tiny as a percentage of total space per chunk)</del> Turns out this is a more fundamental issue caused by mixing strlen and token counts. It's ready for review, but we should get it in before stable.</li>
<li>[1x version since 2x cannot be backported] #3584014 - Ideally we get this in as well</li>
</ol>
<h2>Change records to go into AI Search 2.0.x release notes</h2>
<p><strong>Changes VDB Provider module maintainers must make</strong></p>
<ol>
<li>[2x only] VDB Providers should depend on <a href="https://www.drupal.org/project/ai_search">https://www.drupal.org/project/ai_search</a> instead, which will require AI core 2.0.x. So:<br>
<pre>"require": {
  ...
  "drupal/ai_search": "^2.0",
  ...
},</pre></li>
<li>[2x only] AiVdbProviderInterface.php has a new method to support Grouping in the vector database, which improves performance when a search request wants single entities returned rather than chunks. If your VDB Provider uses AiVdbProviderClientBase.php, no change is needed. See <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3526390" title="Status: Closed (fixed)">#3526390: Improve the AI Search recursive retrieval of a specific quantity of results</a></span>.</li>
<li>[2x only] If you use Milvus already and want to take advantage of the Grouping, you will need to fully drop your Collection (Milvus does not support modifications to the collection schema). It will be auto-recreated by AI Search if you resave your Search API Server, after which you can reindex your content into it. This essentially creates a new drupal_entity_id field at the top level (previously it was only stored in the metadata field). The Grouping feature checks (and caches) whether the new field exists; if it does not, it carries on as it used to, with no harm done.</li>
<li>[1x covers this] SearchApiAiSearchBackend now sets $query->setOption('search_api_ai_excluded_entity_ids') to help with recursive retrieval scenarios. If your VDB Provider can support that, check whether that option is set on the $query and, if so, add it as a filter to your vector search to improve performance. See <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3526390" title="Status: Closed (fixed)">#3526390: Improve the AI Search recursive retrieval of a specific quantity of results</a></span>.</li>
<li>[2x only] AiVdbProviderClientBase constructor arguments have changed, removing key.repository and event_dispatcher services. They should be added back in to your specific provider if you need them (e.g. see Milvus and Pinecone VDB Providers). See <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3539214" title="Status: Closed (fixed)">#3539214: Remove unused dependencies from \Drupal\ai\Base\AiVdbProviderClientBase </a></span>.</li>
<li>[2x only] SearchApiAiVdbProviderBase extends AiVdbProviderClientBase for VDB Providers that want to use AI Search's Search API integration. Its arguments have similarly changed, so if you override the constructor, your VDB Provider will need to be updated.</li>
<li>[2x only] If you manually create EmbeddingsInput in any custom or contrib module, consider using the new optional cacheability option from <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3552522" title="Status: Closed (fixed)">#3552522: Cache embedding lookups</a></span>. For user-entered content where the content may be repeated, it avoids extra embeddings calls and can therefore save money. E.g. it is useful for hybrid search where a user enters the same keywords (or in fact doesn't change keywords) but updates exposed filters. By contrast, for indexing, caching is probably not desired, as the embeddings are likely of unique content rather than repeated content.</li>
</ol>
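<p>For the <code>search_api_ai_excluded_entity_ids</code> option above, here is a minimal sketch of how a VDB Provider might turn the excluded IDs into a filter. The helper name <code>build_excluded_ids_filter()</code>, the <code>not in [...]</code> filter syntax, and the assumption that the option holds a flat list of entity IDs are all illustrative and not taken from AI Search or any provider; adapt them to whatever your vector database expects.</p>

```php
<?php

declare(strict_types=1);

/**
 * Builds a "not in" filter clause from excluded entity IDs.
 *
 * Hypothetical helper: its name, the filter syntax and the default field
 * name are assumptions for illustration, not part of AI Search.
 *
 * @param mixed $excluded
 *   Value read from the query option; assumed to be a flat list of IDs.
 * @param string $field
 *   Name of the field holding the entity ID in the vector database.
 *
 * @return string|null
 *   A filter expression, or NULL when there is nothing to exclude.
 */
function build_excluded_ids_filter(mixed $excluded, string $field = 'drupal_entity_id'): ?string {
  // Treat a missing, empty or non-array option as "no exclusions".
  if (!is_array($excluded) || $excluded === []) {
    return NULL;
  }
  // Cast IDs to strings and drop duplicates.
  $ids = array_values(array_unique(array_map('strval', $excluded)));
  // JSON-encode each ID so that quotes inside an ID are escaped.
  $quoted = array_map(
    static fn(string $id): string => json_encode($id, JSON_THROW_ON_ERROR),
    $ids
  );
  return sprintf('%s not in [%s]', $field, implode(', ', $quoted));
}

// In a provider, the value would come from the Search API query, e.g.:
// $excluded = $query->getOption('search_api_ai_excluded_entity_ids', []);
echo build_excluded_ids_filter(['12', '15', '12']), PHP_EOL;
```

<p>Returning NULL when nothing is excluded lets the caller skip the filter entirely, so providers that ignore the option keep behaving as before.</p>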
<p><strong>Changes VDB Provider module maintainers should consider</strong></p>
<ol>
<li>[2x only] ::getConfig() method is now optional for VDB Provider Plugins. The base class now just calls ::isSetup(). It has therefore been removed from the interface. See <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3539212" title="Status: Closed (fixed)">#3539212: Remove the required \Drupal\ai\AiVdbProviderInterface::getConfig() and use ::isSetup() instead in base class</a></span>.</li>
</ol>
<p><strong>General new features</strong></p>
<ol>
<li>[1x covers this] Searches can be done by pre-supplied vector input. Useful e.g. for similarity search where you want to find nodes similar to an already vectorised node. See <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3489566" title="Status: Closed (fixed)">#3489566: Support vector search by user supplied vectors in AI Search</a></span>.</li>
</ol>
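<p>To illustrate why a pre-supplied vector helps with similarity search, the sketch below ranks stored items against the vector of an item that is already indexed, with no embeddings call involved. It is a plain PHP illustration under assumptions: in practice the vector database does the ranking, and the function names and sample vectors here are hypothetical.</p>

```php
<?php

declare(strict_types=1);

/**
 * Cosine similarity between two equal-length numeric vectors.
 */
function cosine_similarity(array $a, array $b): float {
  if ($a === [] || count($a) !== count($b)) {
    throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
  }
  $dot = $normA = $normB = 0.0;
  foreach ($a as $i => $value) {
    $dot += $value * $b[$i];
    $normA += $value ** 2;
    $normB += $b[$i] ** 2;
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if ($normA == 0.0 || $normB == 0.0) {
    return 0.0;
  }
  return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Ranks stored items by similarity to an item that is already vectorised.
 *
 * @param string $sourceId
 *   Key of the already vectorised item in $stored.
 * @param array $stored
 *   Map of item ID to its stored vector (sample data, not a real index).
 * @param int $limit
 *   Maximum number of results.
 *
 * @return array
 *   Map of item ID to similarity score, best match first.
 */
function rank_similar(string $sourceId, array $stored, int $limit = 3): array {
  $scores = [];
  foreach ($stored as $id => $vector) {
    // Skip the source item itself.
    if ((string) $id === $sourceId) {
      continue;
    }
    // Reuse the stored vector: nothing is re-embedded here.
    $scores[$id] = cosine_similarity($stored[$sourceId], $vector);
  }
  arsort($scores);
  return array_slice($scores, 0, $limit, TRUE);
}

$stored = [
  'node:1' => [1.0, 0.0],
  'node:2' => [0.9, 0.1],
  'node:3' => [0.0, 1.0],
];
print_r(array_keys(rank_similar('node:1', $stored, 2)));
```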
<h2 id="summary-problem-motivation">Problem/Motivation</h2>
<p>[2x only] With AI Search coming into Drupal-CMS it would be good to bring AI Search out of experimental so it can be used a lot more. Below is a list of features and links to issues that I think need to change before AI Search can really be used at scale, meaning that we should have fewer API-breaking changes.</p>
<p>[2x only] The current plan is to move to AI Search stable for 1.0.x first. 2.0.x will come at a later date as it requires coordinated releases with the VDB Provider maintainers (which in most cases are AI module maintainers as well).</p>
<p>[Note] It should generally be possible for site builders to upgrade from AI Search 1x to 2x; the problem lies with the coordinated releases of modules depending on AI Search (VDB Providers, AI Related Content, AI Search Block, etc.) to adapt to the breaking changes described above.</p>
<h3 id="summary-proposed-resolution">Proposed resolution</h3>
<p><strong>Must:</strong></p>
<ul>
<li>[2x only] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3486166" title="Status: Closed (fixed)">#3486166: AI Search: Pass metric type to vector search function</a></span> - Because it's an interface change</li>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai_vdb_provider_milvus/issues/3501690" title="Status: Closed (fixed)">#3501690: Re-indexing on node save adds new chunks instead of replacing existing chunks</a></span> - Because it damages the index on node save.</li>
<li>[1x covers this as best it can without the grouping noted above] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3526390" title="Status: Closed (fixed)">#3526390: Improve the AI Search recursive retrieval of a specific quantity of results</a></span> - Because currently it's hard to reliably get a desired quantity of results.</li>
<li>[2x only] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3538476" title="Status: Closed (fixed)">#3538476: Change filter typehint from string to mixed</a></span> - Breaking change, incorrect types</li>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3489566" title="Status: Closed (fixed)">#3489566: Support vector search by user supplied vectors in AI Search</a></span> - I've come around to thinking similarity search is quite essential - we do it quite inefficiently if we regenerate embeddings to retrieve such results</li>
<li>[2x only, but could be backported] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3547137" title="Status: Closed (fixed)">#3547137: Title is duplicated in chunk if present in the index as contextual content</a></span> - Fix double title</li>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3546567" title="Status: Closed (fixed)">#3546567: AI Search Database & SOLR boost plugins are missing schema</a></span> - We shouldn't have broken schema for stable release</li>
<li>[2x only] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3540885" title="Status: Closed (fixed)">#3540885: Create an 'embedding' object that can be validated</a></span> - Breaking change needs coordinated release</li>
<li>[2x only] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3539212" title="Status: Closed (fixed)">#3539212: Remove the required \Drupal\ai\AiVdbProviderInterface::getConfig() and use ::isSetup() instead in base class</a></span> - Breaking change. I don't think we actually need to deprecate, though, if we are not merging until 2.0.x and leaving experimental at that point</li>
<li>[2x only] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3539029" title="Status: Closed (fixed)">#3539029: Use SearchApiAiSearchBackend::getClient() and remove duplicate logic</a></span> - Should be simple enough to get in, might as well change while we do other AI Search changes</li>
<li>[2x only, needs backport to 1x] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3520606" title="Status: Closed (fixed)">#3520606: Long node titles can trigger exception 'The minimum overlap cannot be equal to or exceed the maximum chunk size.' when indexing content</a></span> - <del>Likely an edge case when chunk size is set too small to be useful perhaps (causing contextual space to be then also tiny as a percentage of total space per chunk)</del> Turns out this is a more fundamental issue caused by mixing strlen and token counts. It's ready for review, but we should get it in before stable.</li>
<li>[2x only] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3539214" title="Status: Closed (fixed)">#3539214: Remove unused dependencies from \Drupal\ai\Base\AiVdbProviderClientBase </a></span> - Because it's a breaking change, so it needs a coordinated release</li>
</ul>
<p><strong>Should:</strong></p>
<ul>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-3"><a href="https://www.drupal.org/project/ai/issues/3491445" title="Status: Closed (duplicate)">#3491445: Add search api integration test</a></span> - Not sure how far we can take coverage, but the more the better</li>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3491446" title="Status: Closed (fixed)">#3491446: Solr 'boost' of results should find results that are not found by traditional Solr search</a></span> - To check</li>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3491447" title="Status: Closed (fixed)">#3491447: Improve 'Boost' plugin descriptions to better reflect that they both boost existing result relevance + find results that otherwise would not have been found</a></span> - Decided not to change the name but instead add more description</li>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3548193" title="Status: Closed (fixed)">#3548193: Fix ValueError when metric is not defined in AI Search NewServerEventSubscriber</a></span> - I think this should get in but needs review</li>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link drupalorg-gitlab-link-wrapper"><a href="https://git.drupalcode.org/project/ai_search/-/work_items/3549911" class="drupalorg-gitlab-link">https://git.drupalcode.org/project/ai_search/-/work_items/3549911</a></span> - If there is access-controlled content indexed in Milvus, the desired number of results may not be returned</li>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3536096" title="Status: Closed (fixed)">#3536096: AiVdbProviderClientBase calls undefined method deleteFromCollection</a></span> - Bugfix for deleting items</li>
<li>[2x only] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3477973" title="Status: Closed (fixed)">#3477973: Dynamically load Tokenizer after selecting Embedding Engine</a></span> - Makes the tokenizer more reliable. It's an AI Core interface change, but I expect all Providers extend the base class, so maybe not an issue, hence the should</li>
<li>[2x only] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3552522" title="Status: Closed (fixed)">#3552522: Cache embedding lookups</a></span> - This would be a huge win for reducing token usage</li>
</ul>
<p><strong>Could:</strong></p>
<ul>
<li>[2x only] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3487487" title="Status: Closed (fixed)">#3487487: Improve AI Search Module Indexing to Handle Long-Running Chunk Embedding Processes</a></span></li>
<li>[wont fix] <span class="drupalorg-gitlab-issue-link drupalorg-gitlab-link-wrapper"><a href="https://git.drupalcode.org/project/ai_search/-/work_items/3539213" class="drupalorg-gitlab-link">https://git.drupalcode.org/project/ai_search/-/work_items/3539213</a></span> - This to me would be a significantly worse breaking change because it would be hard to have update hooks for. Other breaking changes are more just needing VDB Provider coordinated releases</li>
<li><span class="drupalorg-gitlab-issue-link drupalorg-gitlab-link-wrapper"><a href="https://git.drupalcode.org/project/ai_search/-/work_items/3549349" class="drupalorg-gitlab-link">https://git.drupalcode.org/project/ai_search/-/work_items/3549349</a></span> - Maybe an edge case but in a testing tool (in the explorer) so definitely not a blocker</li>
<li>[1x covers this] <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-13"><a href="https://www.drupal.org/project/ai/issues/3525914" title="Status: Needs work">#3525914: Clearly explain AI Search + AI Agents + AI Assistants combination setup</a></span> - Documentation shouldn't be a 'should'; it should be a 'must' IMO. Needs review from those who try this with fresh eyes</li>
</ul>
> Related issue: [Issue #3491316](https://www.drupal.org/node/3491316)
> Related issue: [Issue #3485451](https://www.drupal.org/node/3485451)