Create ai_recipe_audio_transcription: transcribe + summarize audio media via cron
>>> [!note] Migrated issue <!-- Drupal.org comment --> <!-- Migrated from issue #3585306. --> Reported by: [marcus_johansson](https://www.drupal.org/user/385947) >>> <p>[Tracker]<br> <strong>Update Summary: </strong>[One-line status update for stakeholders]<br> <strong>Short Description: </strong>Ship a new drupal.org recipe project, ai_recipe_audio_transcription, that adds transcription and summary long_text fields to the audio media type and populates them via AI Automators on cron.<br> <strong>Check-in Date: </strong>MM/DD/YYYY<br> [/Tracker]</p> <h3 id="summary-problem-motivation">Problem/Motivation</h3> <p>We already ship <a href="https://www.drupal.org/project/ai_recipe_image_classification">ai_recipe_image_classification</a> as a reference recipe for AI-powered media enrichment. It installs the core image media type, adds an image description + tags field, and wires AI Automators against <code>chat_with_image_vision</code>. The equivalent story for audio media is missing and is a common ask: take an uploaded audio file, run it through a speech-to-text provider, and write the transcription to the media entity so it becomes searchable and usable in views, plus a short summary so editors can triage quickly.</p> <p>This should ship as a separate contrib recipe on drupal.org under the <code>ai_recipe_</code> prefix (name suggestion: <code>ai_recipe_audio_transcription</code>), following the same shape and dependency conventions as <code>ai_recipe_image_classification</code>, so maintainers can keep the two evolving in parallel.</p> <h3 id="summary-proposed-resolution">Proposed resolution</h3> <p>Create a new drupal.org project <code>ai_recipe_audio_transcription</code> that, when applied to a site with the <code>ai</code> module and a speech-to-text provider configured, produces the behaviour below.</p> <ul> <li>Base the recipe on Drupal CMS's media recipe when it already provides an audio media type. 
If Drupal CMS's media recipe does not include audio media, have <code>ai_recipe_audio_transcription</code> depend on the relevant core/contrib audio-media recipe (or install audio media itself) so the recipe can be applied standalone too. Decide this with the maintainers; the preference is to detect and reuse what's already there and fall back to installing otherwise.</li> <li>Enable the required modules (mirror <code>ai_recipe_image_classification</code>'s install list): <code>ai</code>, <code>ai_automators</code>, plus whatever the audio media type itself needs.</li> <li>Verify via <code>ai.settings.verifySetupAi</code> that the <code>speech_to_text</code> operation type has a default model configured, and that the <code>chat</code> operation type has one too (needed for the summary step). Fail the recipe application with a clear error if either is missing.</li> <li>Attach two new long_text fields to the audio media bundle: <code>field_audio_transcription</code> (the full transcribed body) and <code>field_audio_summary</code> (a 2-3 sentence editor-friendly summary).</li> <li>Configure the first AI Automator against <code>field_audio_transcription</code> using the <code>speech_to_text</code> operation type with the default model. Source: the audio media's file field. Trigger: cron (not on save) so the editor doesn't wait for a long transcription round-trip.</li> <li>Configure the second AI Automator against <code>field_audio_summary</code> using the <code>chat</code> operation type. Source: <code>field_audio_transcription</code>. Prompt it for a 2-3 sentence summary. Trigger: cron, ordered after the transcription automator so it never summarises stale or empty text. Skip the summary when the transcription is empty.</li> <li>Add the two new fields to <code>views.view.media</code> (and to <code>views.view.media_library</code> across all its displays) as exposed filters, matching <code>ai_recipe_image_classification</code>'s pattern. 
Site-builders should be able to search audio media by transcription text and by summary from the standard media listing.</li> <li>Ship a README that documents: required modules, the speech-to-text and chat provider prerequisites, how cron scheduling interacts with the two automators, and how to trigger a one-off backfill via the VBO actions.</li> </ul> <p><strong>Open questions:</strong></p> <ul> <li>Does Drupal CMS's media recipe currently include an audio media type, or do we always need to install one? The answer determines whether the recipe depends on Drupal CMS's media recipe or ships its own audio-media dependency.</li> </ul> <h3 id="summary-ai-usage">AI usage (if applicable)</h3> <p>[x] AI Assisted Issue<br> This issue was generated with AI assistance, but was reviewed and refined by the creator.</p> <p>[ ] AI Assisted Code</p> <p>[ ] AI Generated Code</p> <p>[ ] Vibe Coded</p> <p>- <strong>This issue was created with the help of AI</strong></p>
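
<p>As a starting point for discussion, the install and verification steps from the proposed resolution might look roughly like the <code>recipe.yml</code> sketch below. It is not a tested recipe and rests on several assumptions: it takes the fallback path of depending on core's <code>audio_media_type</code> recipe (the open question above is still undecided), the name and description strings are placeholders, and the argument key under <code>verifySetupAi</code> is a guess that should be checked against the <code>ai</code> module and <code>ai_recipe_image_classification</code> before use. The field storage, field instance, automator and view changes would ship as separate config and are not shown here.</p>

```yaml
# Sketch only: not a tested recipe.
name: 'AI Audio Transcription'          # placeholder label
description: 'Adds transcription and summary fields to audio media and fills them via AI Automators on cron.'
type: 'AI'

recipes:
  # Assumption: fall back to core's audio media type recipe.
  # Swap for Drupal CMS's media recipe if it turns out to provide audio media.
  - core/recipes/audio_media_type

install:
  - ai
  - ai_automators

config:
  actions:
    ai.settings:
      verifySetupAi:
        # Assumed key name; confirm against the ai module's config action.
        operation_types_has_default_model:
          - speech_to_text   # needed by the transcription automator
          - chat             # needed by the summary automator
```

<p>Keeping both operation types in a single verification step means applying the recipe stops before any fields or automators are created when either default model is missing.</p>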