AI Search Bug Fixes and Improvements
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3565210. -->
Reported by: [mbatterton](https://www.drupal.org/user/1574576)
Related to !16
>>>
<h3 id="summary-problem-motivation">Problem/Motivation</h3>
<p>During testing of AI Search 2.0.0-alpha1 with OpenAI embeddings and the Milvus vector database, several critical issues were discovered:</p>
<p>1. **Word2Vec Processor Compatibility**: When complex query structures were passed to the search backend, the recursive array flattening did not handle nested arrays properly, so embeddings were generated from empty strings.</p>
<p>2. **Null Pointer Issues**: Missing null checks on server instances could cause crashes in several processor classes.</p>
<p>3. **Solr Boosting Algorithm**: The hybrid search implementation using `elevateIds` completely replaced the Solr results rather than boosting the AI results, reducing search effectiveness.</p>
<p>4. **Developer Experience**: No warning is shown when developers configure fulltext search on embedding fields, leading to confusion about semantic versus keyword matching.</p>
<p>5. **Documentation Gaps**: There is no troubleshooting guidance for common issues, such as exact phrase searches not working with vector databases.</p>
<h3 id="summary-proposed-resolution">Proposed resolution</h3>
<h4>1.
Fix Word2Vec/Complex Query Compatibility</h4>
<p>**File**: `src/Plugin/search_api/backend/SearchApiAiSearchBackend.php`</p>
<p>In the `getSearchVectorInput()` method (around line 794), implement recursive array flattening that properly handles nested query structures:</p>
<div class="codeblock">
<pre>&lt;?php
$flatten = function($array) use (&amp;$flatten) {
  $result = [];
  foreach ($array as $key =&gt; $value) {
    // Skip metadata keys that start with # (for example, #conjunction).
    if (is_string($key) &amp;&amp; strpos($key, '#') === 0) {
      continue;
    }
    if (is_array($value)) {
      // Recurse into nested groups and append their words in order.
      $result = array_merge($result, $flatten($value));
    }
    elseif (is_string($value)) {
      $result[] = $value;
    }
  }
  return $result;
};
$search_words = implode(' ', $flatten($search_words));
?&gt;</pre>
</div>
<p>This ensures that complex query structures like `{"#conjunction":"OR","0":"champagne"}` are properly converted to searchable text.</p>
<h4>2.
Add Null Safety Checks</h4>
<p>**Files**:<br>
- `src/Plugin/search_api/processor/DatabaseBoostByAiSearch.php` (lines 30-31)<br>
- `src/Plugin/search_api/processor/ScoreThreshold.php` (lines 30-31)<br>
- `src/Plugin/search_api/processor/BoostByAiSearchBase.php` (lines 177-179)</p>
<p>Add a null check before accessing the server backend:</p>
<div class="codeblock">
<pre>&lt;?php
$server = $index-&gt;getServerInstance();
if ($server &amp;&amp; $server-&gt;getBackendId() === 'search_api_db') {
  return TRUE;
}
return FALSE;
?&gt;</pre>
</div>
<p>Also skip non-string keywords, so that nested arrays from complex queries are not processed as text:</p>
<div class="codeblock">
<pre>&lt;?php
foreach ($keywords as $keyword) {
  // Skip non-string keywords (e.g., nested arrays from complex queries).
  if (!is_string($keyword)) {
    continue;
  }
  // ... rest of processing
}
?&gt;</pre>
</div>
<h4>3. Improve Solr Boost Algorithm</h4>
<p>**File**: `src/Plugin/search_api/processor/SolrBoostByAiSearch.php`</p>
<p>Replace the `elevateIds` approach with weighted boost queries (around lines 112-126):</p>
<div class="codeblock">
<pre>&lt;?php
// Use boost queries to boost AI results rather than elevateIds, which
// appears to replace results entirely. Build weighted boost queries based
// on each ID's position in the AI search results.
$boost_queries = [];
$position = count($param_parts);
foreach ($param_parts as $id) {
  // Boost decreases with position: first result gets highest boost.
  $boost_value = 100 * $position;
  $boost_queries[] = 'id:"' . $id . '"^' . $boost_value;
  $position--;
}

if (!empty($boost_queries)) {
  $solarium_query-&gt;addParam('bq', $boost_queries);
}
?&gt;</pre>
</div>
<p>Also update the query combination logic (around line 107) to use OR:</p>
<div class="codeblock">
<pre>&lt;?php
if ($query = $solarium_query-&gt;getQuery()) {
  // Match documents found by either the keyword query or the AI result IDs.
  $query = $query . ' OR id:("' . implode('" "', $param_parts) . '")';
  $solarium_query-&gt;setQuery($query);
}
?&gt;</pre>
</div>
<h4>4. Add Developer Warnings</h4>
<p>**File**: `src/Plugin/search_api/backend/SearchApiAiSearchBackend.php`</p>
<p>Add warnings when fulltext search is used on embedding fields (around line 426):</p>
<div class="codeblock">
<pre>&lt;?php
if ($query-&gt;getKeys()) {
  $fulltext_fields = $query-&gt;getOption('search_api_fulltext_fields', []);
  if (!empty($fulltext_fields)) {
    foreach ($fulltext_fields as $field_id) {
      $field = $index-&gt;getField($field_id);
      if ($field &amp;&amp; $field-&gt;getType() === 'embedding') {
        // Log once for site builders and show a message in the UI.
        $this-&gt;logger-&gt;warning('Fulltext search is being used on embedding field "@field". This performs semantic similarity matching, not exact keyword matching. Results may not match traditional search expectations.', [
          '@field' =&gt; $field_id,
        ]);
        $this-&gt;messenger-&gt;addWarning($this-&gt;t('Note: Searching on AI-embedded fields performs semantic similarity matching based on meaning, not exact keyword matching. For exact matches, use a traditional search backend or add scalar fields.'));
        // One warning per query is enough.
        break;
      }
    }
  }
}
?&gt;</pre>
</div>
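<p>To make the flattening from resolution 1 easier to check in isolation, here is a minimal standalone sketch. The function name `flatten_search_keys()` and the sample keys array are hypothetical and only illustrate the behavior; the module itself would use the closure inside `getSearchVectorInput()`.</p>

```php
<?php

/**
 * Hypothetical helper: flattens a nested keys array into a list of words.
 *
 * Mirrors the closure proposed in resolution 1. The function name is an
 * illustration only and is not part of the module.
 */
function flatten_search_keys(array $keys): array {
  $result = [];
  foreach ($keys as $key => $value) {
    // Keys starting with "#" (such as #conjunction) carry metadata, not text.
    if (is_string($key) && strpos($key, '#') === 0) {
      continue;
    }
    if (is_array($value)) {
      // Recurse into nested groups and append their words in order.
      $result = array_merge($result, flatten_search_keys($value));
    }
    elseif (is_string($value)) {
      $result[] = $value;
    }
  }
  return $result;
}

// Assumed sample input: an OR group that contains a nested AND group.
$keys = [
  '#conjunction' => 'OR',
  0 => 'champagne',
  1 => [
    '#conjunction' => 'AND',
    0 => 'sparkling',
    1 => 'wine',
  ],
];

$search_words = implode(' ', flatten_search_keys($keys));
echo $search_words, PHP_EOL;
// Prints: champagne sparkling wine
```

<p>An input that contains only metadata keys flattens to an empty list, so callers may still want to guard against sending an empty string to the embedding provider.</p>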
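<p>The position-based weighting from resolution 3 can also be sketched on its own, without Solarium. The helper name `build_boost_queries()`, the `$step` parameter, and the sample IDs are hypothetical. The escaping of quotes and backslashes is an added assumption that is not in the proposed patch; it keeps an unusual ID from breaking out of the quoted phrase.</p>

```php
<?php

/**
 * Hypothetical helper: builds position-weighted boost clauses for a list of IDs.
 *
 * The first ID receives the largest boost and the last ID the smallest,
 * following the "100 * position" rule from resolution 3.
 */
function build_boost_queries(array $ids, int $step = 100): array {
  $boost_queries = [];
  $position = count($ids);
  foreach ($ids as $id) {
    // Assumption (not in the original patch): escape backslashes and double
    // quotes so the ID stays inside the quoted phrase.
    $escaped = addcslashes((string) $id, '"\\');
    $boost_queries[] = 'id:"' . $escaped . '"^' . ($step * $position);
    $position--;
  }
  return $boost_queries;
}

// Assumed sample IDs, ordered from best to worst AI match.
$ids = ['doc-a', 'doc-b', 'doc-c'];
$boost_queries = build_boost_queries($ids);

foreach ($boost_queries as $clause) {
  echo $clause, PHP_EOL;
}
// Prints:
// id:"doc-a"^300
// id:"doc-b"^200
// id:"doc-c"^100
```

<p>Because the boost scales with the number of IDs, a long AI result list produces much larger top boosts than a short one. Whether that is desirable depends on how the boosts interact with the keyword relevance scores.</p>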