Run popular provider API tests on tagging
>>> [!note] Migrated issue <!-- Drupal.org comment --> <!-- Migrated from issue #3575615. --> Reported by: [marcus_johansson](https://www.drupal.org/user/385947) Related to !1262 >>> <p>[Tracker]<br> <strong>Update Summary: </strong>[One-line status update for stakeholders]<br> <strong>Short Description: </strong>[One-line issue summary for stakeholders]<br> <strong>Check-in Date: </strong>MM/DD/YYYY<br> <em>Metadata is used by the <a href="https://www.drupalstarforge.ai/" title="AI Tracker">AI Tracker.</a> Docs and additional fields <a href="https://www.drupalstarforge.ai/ai-dashboard/docs" title="AI Issue Tracker Documentation">here</a>.</em><br> [/Tracker]</p> <h3 id="summary-problem-motivation">Problem/Motivation</h3> <p>We recently had a regression in <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3572765" title="Status: Closed (fixed)">#3572765: Regression: setting a parameter empty class breaks certain models in LiteLLM</a></span>, caused by a change that made other providers start working; see <span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai/issues/3567784" title="Status: Closed (fixed)">#3567784: Tools Function Input should give back an empty json schema skeleton</a></span>. This made two things clear that were unclear before:</p> <p>1. Providers are highly sensitive to deviations in input format, and the format is not standardized even with the OpenAI chat completion model.<br> 2. 
LiteLLM does not fully abstract the models: the initial change made the models work for Mistral on LiteLLM but broke AWS on LiteLLM.</p> <p>Since testing this by hand is quite a chore, we should have a way in GitLab to test the most common chat requests against the most common APIs and, given that API keys/settings exist locally, also expose a way to run the same tests locally.</p> <p>Since this could be quite expensive and someone has to provide keys, we will have to think about the best way of doing this. The following is the list of the most popular providers with more than 20 usages:</p> <table> <thead> <tr> <th>provider</th> <th>total_usage</th> </tr> </thead> <tbody> <tr> <td>ai_provider_openai</td> <td>8742</td> </tr> <tr> <td>ai_provider_anthropic</td> <td>5437</td> </tr> <tr> <td>ai_provider_amazeeio</td> <td>2810</td> </tr> <tr> <td>gemini_provider</td> <td>705</td> </tr> <tr> <td>ai_provider_litellm</td> <td>552</td> </tr> <tr> <td>ai_provider_azure</td> <td>371</td> </tr> <tr> <td>ai_provider_ollama</td> <td>244</td> </tr> <tr> <td>ai_provider_mistral</td> <td>125</td> </tr> <tr> <td>ai_provider_aws_bedrock</td> <td>123</td> </tr> <tr> <td>ai_provider_deepl</td> <td>107</td> </tr> <tr> <td>ai_provider_deepseek</td> <td>96</td> </tr> <tr> <td>ai_provider_dxpr</td> <td>84</td> </tr> <tr> <td>ai_provider_huggingface</td> <td>75</td> </tr> <tr> <td>ai_provider_groq</td> <td>67</td> </tr> <tr> <td>elevenlabs</td> <td>66</td> </tr> <tr> <td>ai_provider_google_vertex</td> <td>60</td> </tr> <tr> <td>ai_provider_perplexity</td> <td>37</td> </tr> <tr> <td>ai_provider_lmstudio</td> <td>35</td> </tr> <tr> <td>ai_provider_openrouter</td> <td>29</td> </tr> <tr> <td>ai_provider_x</td> <td>21</td> </tr> </tbody> </table> <p>I would argue that the ones over 100 are important to test against, plus elevenlabs for its unique usage and google vertex, since it's one of the three enterprise engines (with AWS and Azure). 
I'll also add Fireworks because I have a free key there.</p> <p>Not all of these providers have all the capabilities for chat, and some are for other operation types. My suggestion is that we test the capabilities according to this document, but via code: <a href="https://project.pages.drupalcode.org/ai/1.2.x/developers/testing_an_ai_provider/">https://project.pages.drupalcode.org/ai/1.2.x/developers/testing_an_ai_provider/</a></p> <p>For the providers, we need API keys that can be saved in GitLab. What is known so far:</p> <table> <tr> <th>Provider</th> <th>Means of connecting</th> <th>Notes</th> </tr> <tr> <td>ai_provider_openai</td> <td>? Free tier will not work</td> <td></td> </tr> <tr> <td>ai_provider_anthropic</td> <td>? Free tier will not work</td> <td></td> </tr> <tr> <td>ai_provider_amazeeio</td> <td>Talk to Dan or Matthew</td> <td>Should test multiple models</td> </tr> <tr> <td>gemini_provider</td> <td>? Free tier will not work</td> <td></td> </tr> <tr> <td>ai_provider_litellm</td> <td>If we have keys, it's <a href="https://www.litellm.ai/#pricing">free for open source</a>.</td> <td>Test other models than AmazeeIO, like OpenAI</td> </tr> <tr> <td>ai_provider_azure</td> <td>Marcus can provide on demand</td> <td></td> </tr> <tr> <td>ai_provider_ollama</td> <td>Check if we can provide it <a href="https://sebastianpdw.medium.com/serverless-llm-inference-with-ollama-29596ba5dd4e">serverless</a></td> <td></td> </tr> <tr> <td>ai_provider_mistral</td> <td>Free tier should work</td> <td></td> </tr> <tr> <td>ai_provider_aws_bedrock</td> <td>? On demand account should work</td> <td></td> </tr> <tr> <td>ai_provider_deepl</td> <td>Free account should work</td> <td></td> </tr> <tr> <td>ai_provider_google_vertex</td> <td>? 
On demand account should work</td> <td></td> </tr> <tr> <td>elevenlabs</td> <td>Marcus has key</td> <td></td> </tr> <tr> <td>fireworks</td> <td>Marcus has key</td> <td></td> </tr> </table> <h3 id="summary-proposed-resolution">Proposed resolution</h3> <ul> <li>Figure out how we can get keys.</li> <li>Implement kernel tests that we can run for each of these providers and add a group to them, using the actual setup as it works in the module.</li> <li>Set up GitLab to only run this on tagging.</li> </ul> <h3 id="summary-remaining-tasks">Remaining tasks</h3> <h3>Optional: Other details as applicable (e.g., User interface changes, API changes, Data model changes)</h3> <h3 id="summary-ai-usage">AI usage (if applicable)</h3> <p>[ ] AI Assisted Issue<br> This issue was generated with AI assistance, but was reviewed and refined by the creator.</p> <p>[ ] AI Assisted Code<br> This code was mainly generated by a human, with AI autocompleting or parts AI generated, but under full human supervision.</p> <p>[ ] AI Generated Code<br> This code was mainly generated by an AI with human guidance, and reviewed, tested, and refined by a human.</p> <p>[ ] Vibe Coded<br> This code was generated by an AI and has only been functionally tested.</p>
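<p>As a rough illustration of the provider selection described under Problem/Motivation (usage over 100, plus elevenlabs, google vertex and Fireworks), here is a minimal Python sketch. It is not part of the module: the function names, the <code>AI_TEST_KEY_*</code> environment variable convention and the rule "skip a provider when no key is set" are all assumptions made for illustration, and the real tests are proposed as Drupal kernel tests. Providers that need no key, such as a local Ollama, would need separate handling.</p>

```python
import os

# Usage counts copied from the provider table in this issue.
PROVIDER_USAGE = {
    "ai_provider_openai": 8742,
    "ai_provider_anthropic": 5437,
    "ai_provider_amazeeio": 2810,
    "gemini_provider": 705,
    "ai_provider_litellm": 552,
    "ai_provider_azure": 371,
    "ai_provider_ollama": 244,
    "ai_provider_mistral": 125,
    "ai_provider_aws_bedrock": 123,
    "ai_provider_deepl": 107,
    "ai_provider_deepseek": 96,
    "ai_provider_dxpr": 84,
    "ai_provider_huggingface": 75,
    "ai_provider_groq": 67,
    "elevenlabs": 66,
    "ai_provider_google_vertex": 60,
    "ai_provider_perplexity": 37,
    "ai_provider_lmstudio": 35,
    "ai_provider_openrouter": 29,
    "ai_provider_x": 21,
}

# Providers the issue wants tested regardless of their usage count.
EXTRA_PROVIDERS = ("elevenlabs", "ai_provider_google_vertex", "fireworks")

USAGE_THRESHOLD = 100


def select_providers(usage, threshold=USAGE_THRESHOLD, extras=EXTRA_PROVIDERS):
    """Return providers with usage above the threshold, then the extras."""
    selected = [name for name, count in usage.items() if count > threshold]
    for name in extras:
        if name not in selected:
            selected.append(name)
    return selected


def key_variable(provider):
    """Hypothetical naming convention for the variable holding a provider key."""
    return "AI_TEST_KEY_" + provider.upper()


def runnable_providers(providers, env=None):
    """Keep only providers whose key variable is set and non-empty.

    Pass a dict as env to check something other than the process environment.
    """
    env = os.environ if env is None else env
    return [p for p in providers if env.get(key_variable(p))]


if __name__ == "__main__":
    chosen = select_providers(PROVIDER_USAGE)
    available = runnable_providers(chosen)
    for provider in chosen:
        status = "run" if provider in available else "skip (no key)"
        print(f"{provider}: {status}")
```

<p>Run locally, this lists which of the selected providers have a key in the environment; the same check could decide which provider tests a tag pipeline executes and which it skips.</p>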
issue