Change how users select `tokenizer chat model` on AI Search / Search API server
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3472212. -->
Reported by: [jackbravo](https://www.drupal.org/user/138388)
>>>

<h3 id="summary-problem-motivation">Problem/Motivation</h3>
<p>The <strong>Tokenizer chat model</strong> select input on the AI Search Server configuration page can list models that the current implementation of the <strong>Drupal\ai\Utility\Tokenizer</strong> class does not support. Selecting such a model causes indexing to fail.</p>
<h4 id="summary-steps-reproduce">Steps to reproduce</h4>
<ol>
<li>Enable an AI provider other than OpenAI, for example Ollama with a model such as llama3.1, qwen2, or gemma2.</li>
<li>Configure an AI Search server (with Milvus, which is the only option right now).</li>
<li>Note that the <strong>Tokenizer chat model</strong> select input offers those models as options.</li>
<li>Configure an AI Search index.</li>
<li>Index the content. Indexing fails with the following error:</li>
</ol>
<p><code>InvalidArgumentException: Unknown model name: llama3.1:latest in Yethee\Tiktoken\EncoderProvider-&gt;getForModel() (line 123 of /var/www/html/vendor/yethee/tiktoken/src/EncoderProvider.php).</code></p>
<h3 id="summary-proposed-resolution">Proposed resolution</h3>
<p>A few suggestions:</p>
<ol>
<li>Instead of showing all enabled models (many of which may not be supported by the current <strong>Yethee\Tiktoken\EncoderProvider</strong>), show only the currently supported models.</li>
<li>Also provide a simple character-based splitting option alongside the specialized token-based one.</li>
<li>Provide helper text that guides users of the module in choosing between the available options.</li>
<li>Provide a sensible default value, perhaps <strong>gpt-4 -&gt; cl100k_base</strong>.</li>
</ol>
<h3 id="summary-remaining-tasks">Remaining tasks</h3>
<h3 id="summary-ui-changes">User interface changes</h3>
<h3 id="summary-api-changes">API changes</h3>
<h3 id="summary-data-model-changes">Data model changes</h3>
issue