Research: abstraction layer for model fine-tuning across providers (#3585694) · Issues · project / ai_initiative


>>> [!note] Migrated issue
Reported by: [marcus_johansson](https://www.drupal.org/user/385947)
>>>

[Tracker]
Update Summary: [One-line status update for stakeholders]
Short Description: Research whether the AI module should ship a provider-agnostic fine-tuning interface now that the File API makes dataset upload possible, or whether each provider module should handle fine-tuning independently.
Check-in Date: MM/DD/YYYY
[/Tracker]

<h3 id="summary-problem-motivation">Problem/Motivation</h3>

The first version of the File API for different providers is landing in AI 1.4.x. This was the last missing primitive blocking fine-tuning workflows: training datasets can now be uploaded to a provider through the module's normalized interface. With that in place, anyone interested in model fine-tuning can start building a module for it.

Multiple major providers already ship fine-tuning APIs:

<ul>
<li><a href="https://developers.openai.com/api/docs/guides/model-optimization">OpenAI - Model Optimization</a></li>
<li><a href="https://docs.fireworks.ai/fine-tuning/fine-tuning-models">Fireworks AI - Fine-tuning Models</a></li>
<li><a href="https://docs.aws.amazon.com/bedrock/latest/userguide/custom-model-fine-tuning.html">AWS Bedrock - Custom Model Fine-tuning</a></li>
<li><a href="https://www.together.ai/fine-tuning">Together AI - Fine-tuning</a></li>
<li><a href="https://docs.runpod.io/fine-tune">RunPod - Fine-tune (to Hugging Face)</a></li>
</ul>

The core question is: should the AI module define a shared abstraction for fine-tuning (an interface, operation type, or plugin system that all providers implement), or should each provider module handle fine-tuning on its own with no shared contract?

Getting this wrong early is expensive. If we ship a premature abstraction that doesn't fit the provider APIs well, every provider module is locked into a bad shape.
If we don't abstract at all and every provider module reinvents the workflow, site builders can't swap providers without relearning a different UI and data model.

The provider APIs listed above already differ significantly in how they handle training-data format (JSONL vs conversational vs instruction pairs), job lifecycle (synchronous vs async polling vs webhook), hyperparameter exposure, base-model eligibility, result-model registration, and cost/billing feedback. Whether those differences are surface-level (easily normalized) or fundamental (would produce a leaky abstraction) is the research question.

<h3 id="summary-proposed-resolution">Proposed resolution</h3>

This is a research task. The deliverable is a written recommendation (posted as a comment here) with enough detail that the maintainers can decide the direction before any code is written.

<ul>
<li>Map the fine-tuning API of each provider listed above across these dimensions: training-data format and upload mechanism, supported base models, job creation parameters and hyperparameters, job lifecycle (create/poll/cancel/list), result-model naming and how the fine-tuned model becomes available for inference, and billing/cost reporting.</li>
<li>Identify the common denominator: which steps are shared by all five providers and could be normalized behind a single interface without losing meaningful functionality.</li>
<li>Identify the provider-specific surface: which capabilities only one or two providers expose (e.g. Bedrock's provisioned throughput for custom models, RunPod's push-to-Hugging-Face output) and would be lost or awkwardly faked behind a shared abstraction.</li>
<li>Evaluate the three possible directions: (a) a full <code>FineTuningInterface</code> in AI core that providers implement, with a shared job entity, training-set config entity, and admin UI; (b) a minimal shared contract (e.g.
just the dataset format and a "create job" + "check status" method pair) with everything else left to the provider module; (c) no shared abstraction: each provider module ships its own fine-tuning surface and the AI module stays out of it.</li>
<li>For each direction, evaluate: how much of the File API (now landing in 1.4.x) can be reused for dataset upload; whether the existing operation-type plugin pattern is the right shape or whether fine-tuning needs a different plugin type (it's a management/lifecycle operation, not a per-request inference call); and what the minimum viable surface looks like for a site builder who just wants to "upload a JSONL and get a fine-tuned model back".</li>
<li>Check whether Symfony AI or any other framework in the PHP ecosystem has already defined a fine-tuning abstraction worth aligning with.</li>
<li>Recommend a direction with a concrete scope for a first implementation (which provider to target first, which subset of the API to normalize, what to leave out of v1).</li>
</ul>

<h3 id="summary-ai-usage">AI usage (if applicable)</h3>

[x] AI Assisted Issue
This issue was generated with AI assistance, but was reviewed and refined by the creator.
[ ] AI Assisted Code
[ ] AI Generated Code
[ ] Vibe Coded - This issue was created with the help of AI
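
To make direction (b) from the proposed resolution easier to discuss, here is a rough, language-neutral sketch of what a "create job" + "check status" pair might look like. It is written in Python purely for illustration; the actual module would be PHP. Every name in it (`MinimalFineTuning`, `JobState`, `JobStatus`, `FakeProvider`, `wait_for_model`, the `ft:` model-name format) is hypothetical and not taken from the AI module or from any provider SDK. It also assumes the dataset has already been uploaded through the File API and is referenced by an opaque file ID.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class JobStatus(Enum):
    """Hypothetical normalized lifecycle states; real providers expose more."""
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class JobState:
    job_id: str
    status: JobStatus
    # Provider-side name of the fine-tuned model; None until the job succeeds.
    result_model: str | None = None


class MinimalFineTuning(Protocol):
    """Hypothetical minimal contract: only the two shared methods."""

    def create_job(self, base_model: str, dataset_file_id: str) -> str:
        """Start a job from an already-uploaded dataset and return a job ID."""
        ...

    def check_status(self, job_id: str) -> JobState:
        """Return the current state of a previously created job."""
        ...


class FakeProvider:
    """In-memory stand-in for a provider module; it calls no real API."""

    def __init__(self) -> None:
        self._polls: dict[str, int] = {}
        self._base_models: dict[str, str] = {}

    def create_job(self, base_model: str, dataset_file_id: str) -> str:
        job_id = f"job-{len(self._polls) + 1}"
        self._polls[job_id] = 0
        self._base_models[job_id] = base_model
        return job_id

    def check_status(self, job_id: str) -> JobState:
        # Pretend the job finishes on the third status check.
        self._polls[job_id] += 1
        if self._polls[job_id] < 3:
            return JobState(job_id, JobStatus.RUNNING)
        model = f"ft:{self._base_models[job_id]}:{job_id}"
        return JobState(job_id, JobStatus.SUCCEEDED, model)


def wait_for_model(provider: MinimalFineTuning, job_id: str, max_polls: int = 10) -> str:
    """Poll until the job succeeds and return the fine-tuned model name."""
    for _ in range(max_polls):
        state = provider.check_status(job_id)
        if state.status is JobStatus.SUCCEEDED and state.result_model:
            return state.result_model
        if state.status is JobStatus.FAILED:
            raise RuntimeError(f"fine-tuning job {job_id} failed")
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Note what the sketch leaves out: hyperparameters, cancel/list, webhooks, and cost reporting are all absent. Whether those can stay provider-specific without making the shared contract useless is exactly what the research should answer.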
