[Meta] AI Image and Media Agents Track
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3541825. -->
Reported by: [yautja_cetanu](https://www.drupal.org/user/626050)
>>>
<h2>Track Summary</h2>
<p>This issue outlines the scope, technical specifications, and user stories for the <strong>AI Image and Media Agents Track</strong>, designed to integrate AI capabilities into Drupal's media module and ecosystem.</p>
<ul>
<li><strong>Description:</strong> Empower the Media library with AI to make it easier to find specific media and to improve the creation of image metadata that specifically takes advantage of what AI Search can offer. It may also include tools for transforming images directly in Drupal.</li>
<li><strong>Workstream:</strong> 1A: Smart Content/Page Creation - Allows AI Agents to find images in the library and insert them into landing pages, bringing landing pages to life.</li>
<li><strong>Business Value:</strong> Provides a collection of tools that address many of the problems people have with finding and manipulating images.</li>
<li><strong>Lead:</strong> [TBD]</li>
</ul>
<h3>Introduction and Scenarios</h3>
<p>The Drupal AI Media Track will enhance the media module by allowing users to interact with their media library through natural language and AI-powered tools. This feature will utilize Drupal AI agents and sub-agents to interpret user commands and translate them into specific media operations.</p>
<h4>Sample scenarios include:</h4>
<ul>
<li>An event organizer wants to bring speaker headshots into their event landing page.</li>
<li>A non-profit organization wants to find stock images in its library that convey the feel of its new initiative for its landing page.</li>
<li>A content creator needs help fitting an existing library image into a specific landing page, with good focus, sizing, and so on at every responsive breakpoint.</li>
</ul>
<h3>Scope</h3>
<p>The project's scope covers:</p>
<ul>
<li>Development of the Drupal AI Media Finder Agent for finding specific media assets. This agent can be used as a chatbot, within the Experience Builder (XB) AI Agent workflow, or directly through a UI.</li>
<li>Integration with the existing Drupal AI Agent framework.</li>
<li>Identification and conversion of key parameters from natural language input (e.g., which search filters to apply based on the user's goal and intention).</li>
<li>Programmatic creation and configuration of media entities based on context and metadata to enable more powerful AI Search.</li>
<li>AI-powered image transformation and optimization tools.</li>
<li>A potential media dashboard that uses AI to perform a variety of transformations and operations on existing media.</li>
</ul>
<h3>Technical Specifications</h3>
<p>The Drupal AI Image and Media Agents Track will process natural language instructions using an LLM to derive specific media operations and search parameters.</p>
<h4>Example instruction:</h4>
<p>"Find a landscape image from our summer campaign that would work well as a hero banner for our environmental initiative page."</p>
<p>Parameters extracted from this instruction would include the following (TODO: align these with the Media search parameters):</p>
<ul>
<li>Image type: <code>landscape</code></li>
<li>Time period: <code>summer campaign</code></li>
<li>Use case: <code>hero banner</code></li>
<li>Context: <code>environmental initiative</code></li>
<li>Aspect ratio: <code>wide/banner format</code></li>
</ul>
<p>These parameters will then be passed to function call plugins to find, update, or transform the media entity.</p>
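<p>A minimal sketch of how the extracted parameters might be validated before they are passed to a function call plugin, assuming the LLM is prompted to reply with a JSON object. It is written in Python for illustration only; a Drupal implementation would be PHP. The field names (<code>image_type</code>, <code>time_period</code>, <code>use_case</code>, <code>context</code>, <code>aspect_ratio</code>) mirror the list above but are hypothetical until they are aligned with the Media search parameters, and <code>ALLOWED_IMAGE_TYPES</code> is an assumed vocabulary.</p>

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional

# Assumed vocabulary for the orientation field; the real values must come
# from the Media search parameters (see the TODO above).
ALLOWED_IMAGE_TYPES = {"landscape", "portrait", "square"}


@dataclass
class MediaSearchParams:
    # Hypothetical fields mirroring the example parameters.
    image_type: Optional[str] = None    # e.g. "landscape"
    time_period: Optional[str] = None   # e.g. "summer campaign"
    use_case: Optional[str] = None      # e.g. "hero banner"
    context: Optional[str] = None       # e.g. "environmental initiative"
    aspect_ratio: Optional[str] = None  # e.g. "wide/banner format"

    def to_filters(self) -> dict:
        """Return only the parameters that were actually set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


def parse_llm_output(raw: str) -> MediaSearchParams:
    """Validate the LLM's JSON reply instead of trusting it blindly."""
    data = json.loads(raw)  # Raises ValueError on malformed JSON.
    if not isinstance(data, dict):
        raise ValueError("LLM output must be a JSON object")
    known = {f.name for f in fields(MediaSearchParams)}
    cleaned = {}
    for key, value in data.items():
        # Ignore keys outside the schema, non-strings, and empty strings.
        if key in known and isinstance(value, str) and value.strip():
            cleaned[key] = value.strip()
    if "image_type" in cleaned:
        image_type = cleaned["image_type"].lower()
        # Unrecognized orientations are treated as "not specified".
        if image_type in ALLOWED_IMAGE_TYPES:
            cleaned["image_type"] = image_type
        else:
            del cleaned["image_type"]
    return MediaSearchParams(**cleaned)


raw = ('{"image_type": "Landscape", "time_period": "summer campaign", '
       '"use_case": "hero banner", "context": "environmental initiative", '
       '"aspect_ratio": "wide/banner format", "mood": "calm"}')
print(parse_llm_output(raw).to_filters())
```

<p>Validating the reply this way drops unknown keys (such as <code>mood</code> above) and unrecognized values instead of forwarding them to search, which also helps limit what a manipulated prompt can push into the query.</p>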
<h3>Priority - User Stories</h3>
<p>The following user stories cover the AI Media Track capabilities:</p>
<table>
<thead>
<tr>
<th>User Story</th>
<th>Description</th>
<th>Component/Feature</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>As a Site Administrator, I want the AI Media Agent to be available as a tool for other AI agents, so that automated workflows can find and use appropriate media.</strong></td>
<td>This focuses on exposing media search capabilities as an API/tool for other agents.</td>
<td>Media Agent API/Tool</td>
</tr>
<tr>
<td><strong>As a Content Editor, I want to search for images using natural language descriptions, so that I can find relevant media without knowing exact filenames.</strong></td>
<td>This involves the core functionality of AI-powered search, interpreting conceptual descriptions and matching them to media library items.</td>
<td>AI Search Media Library</td>
</tr>
<tr>
<td><strong>As a Developer, I want the AI Media Agent to capture visual screenshots of pages, so that these can be used for visual review and comparison.</strong></td>
<td>This enables visual regression testing and page review capabilities.</td>
<td>Visual Review Agents</td>
</tr>
<tr>
<td><strong>As a Site Administrator, I want AI to automatically generate detailed metadata for uploaded images, so that they can be found more easily through search.</strong></td>
<td>This covers AI analysis of images to create rich, searchable metadata including descriptions, contexts, and use cases.</td>
<td>AI Metadata Generation</td>
</tr>
<tr>
<td><strong>As a Content Editor, I want to override media metadata (like alt-text) for specific page contexts, so that the same image can have different descriptions based on usage.</strong></td>
<td>This enables context-specific metadata overrides without changing the base media entity.</td>
<td>Media Metadata Overrides</td>
</tr>
<tr>
<td><strong>As a Site Administrator, I want AI to extract contextual metadata from the pages where images are used, so that search results are more relevant.</strong></td>
<td>This involves analyzing the context around image usage to enhance searchability.</td>
<td>Contextual Metadata Extraction</td>
</tr>
<tr>
<td><strong>As a Developer, I want comprehensive documentation for the AI Media Agent API, so that I can extend its functionality or integrate it with other modules.</strong></td>
<td>This addresses the need for clear technical documentation and a robust API.</td>
<td>Documentation</td>
</tr>
</tbody>
</table>
<h3>Priority - Issues and Epics</h3>
<h4>Pre-requisites</h4>
<ul>
<li><a href="https://www.drupal.org/project/ai_initiative/issues/3545342">AI Media Discovery: Store and retrieve extended media data in vector database</a></li>
<li><a href="https://www.drupal.org/project/ai_initiative/issues/3545343">AI Media Discovery: Investigate JavaScript scanners as a method of extracting information from images</a></li>
<li><a href="https://www.drupal.org/project/ai_initiative/issues/3545344">AI Media Discovery: Investigate JavaScript scanners as a method of extracting contextual information about media from web pages</a></li>
</ul>
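<p>The first prerequisite can be illustrated with a minimal in-memory stand-in for a vector database: each media item's extended metadata is stored with an embedding, and search ranks items by cosine similarity to the query embedding. This is a sketch under assumptions: the 3-dimensional vectors below are made up (real embedding models produce hundreds or thousands of dimensions), and a real implementation would use the vector database backend configured for AI Search rather than brute-force search over a Python dictionary.</p>

```python
import math
from dataclasses import dataclass, field


@dataclass
class MediaRecord:
    media_id: int
    # Extended metadata (AI-generated or editor-written) that search indexes.
    description: str
    embedding: list[float]
    extra: dict = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors instead of dividing by zero.
    return dot / norm if norm else 0.0


class InMemoryMediaIndex:
    """Stand-in for a vector database: brute-force nearest-neighbor search."""

    def __init__(self) -> None:
        self._records: dict[int, MediaRecord] = {}

    def upsert(self, record: MediaRecord) -> None:
        # Re-indexing a media item replaces its previous vector.
        self._records[record.media_id] = record

    def search(self, query_embedding: list[float],
               top_k: int = 3) -> list[tuple[int, float]]:
        scored = [
            (r.media_id, cosine_similarity(query_embedding, r.embedding))
            for r in self._records.values()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = InMemoryMediaIndex()
index.upsert(MediaRecord(1, "Night-time city skyline", [0.9, 0.1, 0.0]))
index.upsert(MediaRecord(2, "Sunny beach landscape, summer campaign", [0.1, 0.9, 0.2]))
index.upsert(MediaRecord(3, "Speaker headshot", [0.0, 0.2, 0.9]))
print(index.search([0.2, 0.8, 0.1], top_k=2))
```

<p>Brute-force scoring grows linearly with the size of the library, which is one reason the Risks section lists performance overhead for large media libraries; vector databases typically rely on approximate nearest-neighbor indexes instead.</p>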
<h4>Phase 1: Core Search and Discovery for XB AI (Essential)</h4>
<ul>
<li><strong><span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-1"><a href="https://www.drupal.org/project/canvas/issues/3541872" title="Status: Active">#3541872: Provide the AI Search Media Library as a tool/Agent</a></span></strong> - Expose AI search of the media library as a tool that agents can use, for example to build chatbots that find media or to power other forms of automation.
<ul>
<li><strong>Problem:</strong> For AI Agents - Many tasks in which AI helps build something will benefit from AI being able to find good images.</li>
<li><strong>Workstream:</strong> AI Agents can provide images to landing pages from the end user's own library, making pages feel more relevant instead of relying on placeholder images (1A: Smart Content/Page Creation)</li>
</ul>
</li>
<li><strong><span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-1"><a href="https://www.drupal.org/project/canvas/issues/3541873" title="Status: Active">#3541873: AI Search enabled Media Library</a></span></strong> - Enable the use of vector databases and/or agentic search to find images in the media library.
<ul>
<li><strong>Problem:</strong> For Content Editors - When adding an image, you often know conceptually what you want (e.g., a nighttime landscape picture), but Drupal's media library only allows searching by name, which makes such images difficult to find.</li>
<li><strong>Workstream:</strong> Humans need to be able to find media effectively before this capability is provided to agents for automation (1A: Smart Content/Page Creation)</li>
</ul>
</li>
<li><strong>Visual Review Agents</strong> - The ability to scan pages and capture screenshots that can be saved and reused elsewhere in a process, or provided to AI to help it improve images and page layouts.
<ul>
<li><strong>Problem:</strong> For AI Agents - Currently, when a user asks AI to produce something visual, AI often gets things wrong the first time, so humans have to keep going back to the AI to improve its results.</li>
<li><strong>Workstream:</strong> Important for all XB-related agents so that pages look much better the first time round (1A: Smart Content/Page Creation)</li>
</ul>
</li>
</ul>
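<p>For the Phase 1 tool/Agent epic, the sketch below shows one way media search could be described to an agent as a callable tool: a JSON-Schema-style parameter definition plus a dispatcher that validates the agent's arguments before running the search. The tool name <code>search_media_library</code>, its parameters, and <code>fake_search</code> are hypothetical; in Drupal this would be a PHP function call plugin in the AI module, whose definition format is not reproduced here.</p>

```python
import json
from typing import Callable

# Hypothetical tool definition in the JSON-Schema style that many LLM
# tool-calling APIs accept.
SEARCH_MEDIA_TOOL = {
    "name": "search_media_library",
    "description": "Find images in the site's media library that match a "
                   "natural-language description.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What the image should show."},
            "limit": {"type": "integer", "minimum": 1, "maximum": 20},
        },
        "required": ["query"],
    },
}


def dispatch_tool_call(arguments_json: str,
                       search: Callable[[str, int], list[dict]]) -> str:
    """Validate an agent's tool-call arguments, run the search, return JSON."""
    args = json.loads(arguments_json)
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        return json.dumps({"error": "query must be a non-empty string"})
    limit = args.get("limit", 5)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 20:
        limit = 5  # Fall back to a safe default rather than failing.
    results = search(query.strip(), limit)
    # Return only what the calling agent needs to place the media.
    return json.dumps({"results": [{"media_id": r["media_id"], "alt": r["alt"]}
                                   for r in results]})


def fake_search(query: str, limit: int) -> list[dict]:
    # Hypothetical stand-in for the AI Search backend: substring match on alt text.
    library = [
        {"media_id": 7, "alt": "Night-time landscape over a lake"},
        {"media_id": 9, "alt": "Speaker headshot"},
    ]
    return [m for m in library if query.lower() in m["alt"].lower()][:limit]


print(dispatch_tool_call('{"query": "landscape"}', fake_search))
```

<p>Returning only the media ID and alt text keeps the tool's output small and gives the calling agent what it needs to insert the image into a page.</p>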
<h4>Phase 2: Improved Semantic Search of Media (Nice to Have)</h4>
<ul>
<li><strong>Media Metadata Overrides</strong> - Allow metadata on a media entity, such as alt text, to be overridden when the media is used in a specific field or page.
<ul>
<li><strong>Problem:</strong> For Content Editors - Alt text is defined on the media entity and can't be changed per usage, but different pages may need context-specific alt text.</li>
<li><strong>Workstream:</strong> Analytics agents may need to change image metadata used for accessibility or search on specific pages (3: Performance Intelligence)</li>
</ul>
</li>
<li><strong>AI Relevant and Generated Detailed Search Metadata</strong> - Create agents that can generate unstructured, lengthy, and detailed metadata (use cases, descriptions) that humans can edit and search can index.
<ul>
<li><strong>Problem:</strong> For AI Agents - AI Agents benefit from the kind of context about images that humans simply know. RAG search needs relevant, detailed metadata to find good semantic matches.</li>
<li><strong>Workstream:</strong> AI Agents will provide more relevant images to the landing pages (1A: Smart Content/Page Creation)</li>
</ul>
</li>
<li><strong>AI Generated Contextual Image Metadata</strong> - Use AI to generate AI-relevant metadata about an image from the content surrounding the place where it was first uploaded.
<ul>
<li><strong>Problem:</strong> For Content Editors - People struggle to find images because search needs contextual information that lives not on the media entity but on the page where the image was first used (e.g., a headshot on a team page).</li>
<li><strong>Workstream:</strong> AI Agents will provide more relevant images to the landing pages (1B: Improvements - Smart Content/Page Creation)</li>
</ul>
</li>
</ul>
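<p>The Media Metadata Overrides epic can be sketched as a lookup with a fallback: the media entity keeps its base alt text, and a specific usage (a field on a page) may store an override that takes precedence. The <code>Media</code> and <code>OverrideStore</code> classes and the <code>(page, field)</code> usage key are hypothetical illustrations, not an existing Drupal storage model.</p>

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical usage key: (page path, field name), e.g. ("node/12", "field_hero").
Usage = tuple[str, str]


@dataclass
class Media:
    media_id: int
    metadata: dict[str, str]  # Base values stored on the entity, e.g. {"alt": "..."}


@dataclass
class OverrideStore:
    # Overrides keyed by (media_id, usage); the base entity is never modified.
    overrides: dict[tuple[int, Usage], dict[str, str]] = field(default_factory=dict)

    def set_override(self, media_id: int, usage: Usage, key: str, value: str) -> None:
        self.overrides.setdefault((media_id, usage), {})[key] = value

    def resolve(self, media: Media, usage: Optional[Usage], key: str) -> Optional[str]:
        """Return the usage-specific value if one exists, else the base value."""
        if usage is not None:
            value = self.overrides.get((media.media_id, usage), {}).get(key)
            if value is not None:
                return value
        return media.metadata.get(key)


photo = Media(42, {"alt": "Jane Doe smiling"})
store = OverrideStore()
store.set_override(42, ("node/12", "field_speakers"), "alt", "Jane Doe, keynote speaker")
print(store.resolve(photo, ("node/12", "field_speakers"), "alt"))
print(store.resolve(photo, ("node/99", "field_team"), "alt"))
```

<p>Because overrides live alongside the usage rather than on the entity, the same image can carry different descriptions on different pages while its base metadata stays intact.</p>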
<h3>Future Considerations</h3>
<p>Future enhancements may include:</p>
<ul>
<li>Integration with external DAM systems</li>
<li>Advanced AI image generation capabilities</li>
<li>Video and multimedia support</li>
<li>Multi-language metadata support</li>
<li>Batch processing capabilities for existing media libraries</li>
<li>Integration with third-party AI services (Midjourney, DALL-E, etc.)</li>
</ul>
<h3>Risks</h3>
<p>Potential risks identified are:</p>
<ul>
<li>LLM inaccuracies in metadata generation</li>
<li>Complexity of where functionality should reside (AI module, Media Module, or separate module)</li>
<li>Performance overhead for large media libraries</li>
<li>Security vulnerabilities from natural language interpretation</li>
<li>Instability and breaking changes due to ongoing Drupal AI module development</li>
<li>Prompt injection from uploaded images/videos with hidden text</li>
<li>Copyright and licensing concerns with AI-generated or AI-modified content. For now, AI media generation is out of scope; the focus is on finding existing images and inserting them.</li>
</ul>
<h3>Deadlines</h3>
<ul>
<li>Phase 1 Completion: [TBD]</li>
<li>Phase 2 Completion: [TBD]</li>
<li>Initial Beta Release: [TBD]</li>
<li>Production Ready: [TBD]</li>
</ul>
<h3>Next Steps</h3>
<p>The next steps involve:</p>
<ol>
<li>Review and approval of this document</li>
<li>Prioritization of Phase 1 epics</li>
<li>Assignment of development resources</li>
<li>Creation of individual issues for each epic</li>
<li>Establishment of testing and feedback processes</li>
<li>Integration planning with the Experience Builder team</li>
</ol>
<h3>References</h3>
<p>Related documentation:</p>
<ul>
<li>Drupal AI Strategy Document: <a href="https://new.drupal.org/assets/2025-06/Drupal-AI-Strategy-June-25_0.pdf">https://new.drupal.org/assets/2025-06/Drupal-AI-Strategy-June-25_0.pdf</a></li>
<li>AI Views Agent Track (reference implementation)</li>
<li>Experience Builder Integration Guidelines</li>
</ul>
<h3>Future Phases</h3>
<p>The following user stories cover capabilities planned for future phases of the AI Media Track:</p>
<table>
<thead>
<tr>
<th>User Story</th>
<th>Description</th>
<th>Component/Feature</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>As a Content Editor, I want AI to automatically detect focal points in images, so that crops and responsive versions maintain the most important parts of the image.</strong></td>
<td>This involves ML/AI analysis to identify the visual focus of images for better automatic cropping.</td>
<td>AI Focal Point Analysis</td>
</tr>
<tr>
<td><strong>As a Content Editor, I want to use AI-powered image transformation tools (crop, resize, optimize), so that I can prepare images for specific uses without external software.</strong></td>
<td>This covers integration with AI image manipulation services for in-Drupal image editing.</td>
<td>AI Image Transformation</td>
</tr>
<tr>
<td><strong>As a Content Editor, I want an AI assistant to guide me through image optimization options, so that I can make the best choices for my specific use case.</strong></td>
<td>This involves a chatbot interface that helps users understand and apply available image tools.</td>
<td>Image Augmentation Assistant</td>
</tr>
<tr>
<td><strong>As a Content Editor, I want AI to suggest responsive image styles based on my layout, so that images are optimized for all device sizes automatically.</strong></td>
<td>This covers automatic calculation and application of appropriate image styles for responsive designs.</td>
<td>Responsive Image Optimization</td>
</tr>
</tbody>
</table>
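<p>The sizing math behind the Responsive Image Optimization story can be sketched as follows, assuming a simple grid in which an image spans some columns separated by fixed gutters. The breakpoint values, grid settings, and 1x/2x pixel densities are hypothetical examples, not Drupal defaults; an agent would then map the resulting sizes onto image styles.</p>

```python
import math
from dataclasses import dataclass


@dataclass
class Breakpoint:
    name: str
    container_px: int  # Width of the content container at this breakpoint.
    columns: int       # Total grid columns.
    span: int          # Columns the image occupies.
    gutter_px: int     # Space between adjacent columns.


def rendered_width(bp: Breakpoint) -> float:
    """CSS pixel width of an image spanning `span` of `columns` columns."""
    column_px = (bp.container_px - (bp.columns - 1) * bp.gutter_px) / bp.columns
    # The image covers `span` columns plus the gutters between them.
    return bp.span * column_px + (bp.span - 1) * bp.gutter_px


def image_style_sizes(bp: Breakpoint, aspect_ratio: float,
                      densities: tuple[int, ...] = (1, 2)) -> list[tuple[int, int]]:
    """(width, height) pairs to generate for each pixel density."""
    width = rendered_width(bp)
    sizes = []
    for density in densities:
        w = math.ceil(width * density)
        # Derive height from the target ratio so every size has the same shape.
        sizes.append((w, round(w / aspect_ratio)))
    return sizes


desktop = Breakpoint("desktop", container_px=1200, columns=12, span=6, gutter_px=24)
mobile = Breakpoint("mobile", container_px=360, columns=4, span=4, gutter_px=16)
for bp in (desktop, mobile):
    print(bp.name, rendered_width(bp), image_style_sizes(bp, aspect_ratio=3 / 2))
```

<p>Deriving each height from the same target aspect ratio keeps every generated size consistent, so moving between breakpoints changes the resolution but not the shape of the image.</p>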
<h4>Phase 3: Basic AI Image Manipulation (Aspirational)</h4>
<ul>
<li><strong>Image Augmentation AI Assistant + Agent</strong> - A chatbot assistant to help users understand available tools and provide ideas based on image purpose.
<ul>
<li><strong>Problem:</strong> For Content Editors/Agents - With so many different tools available, users need help understanding what is possible.</li>
<li><strong>Workstream:</strong> Enables agents to change not just page content and layout but also the images themselves (3: Performance Intelligence)</li>
</ul>
</li>
<li><strong>Responsive Image Styles and Resizing Agents</strong> - AI-assisted calculation and application of appropriate image styles for responsive designs.
<ul>
<li><strong>Problem:</strong> For Content Editors - Working out image sizes that avoid distortion across multiple columns and responsive breakpoints currently requires complicated math.</li>
<li><strong>Workstream:</strong> Helps AI-generated landing pages make better use of the image library across all design versions (3: Performance Intelligence)</li>
</ul>
</li>
<li><strong>AI Image Crops</strong> - Using AI to optimize and intelligently crop images based on focal points and use case.
<ul>
<li><strong>Problem:</strong> For Content Editors/Agents - Combining image crops with focal point analysis could make images much nicer and more effective. Currently, people pre-crop images because Drupal's tools are complicated.</li>
<li><strong>Workstream:</strong> Analytics agents may perform these operations to achieve more effective landing pages (3: Performance Intelligence)</li>
</ul>
</li>
<li><strong>AI-powered Focal Point Analysis</strong> - Create a tool that uses AI or ML to identify the likely focal point of an image for better cropping and resizing.
<ul>
<li><strong>Problem:</strong> For AI Agents - Many tools benefit from knowing the focal point; resizing and cropping without this knowledge could cut out the most useful part of the image.</li>
<li><strong>Workstream:</strong> Makes other AI augmentation features more effective (3: Performance Intelligence)</li>
</ul>
</li>
<li><strong>Space for Image Augmentation with AI</strong> - Create a unified interface for AI-powered image manipulation tools within Drupal.
<ul>
<li><strong>Problem:</strong> For Content Editors - AI provides many new tools, but they need an easy-to-use, consistent space in which to be used.</li>
<li><strong>Workstream:</strong> Tools need to be provided to humans first, to ensure they work, before they are automated (3: Performance Intelligence)</li>
</ul>
</li>
</ul>
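<p>The AI Image Crops and Focal Point Analysis epics above could meet in a function like the one below: given the image size, a focal point (which an AI or ML model would supply; here it is simply an input), and a target aspect ratio, it picks the largest crop with that ratio and centers it on the focal point as far as the image edges allow. This is a common heuristic offered as a sketch, not the algorithm of any existing Drupal module.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    x: int
    y: int
    width: int
    height: int


def focal_point_crop(img_w: int, img_h: int, focal_x: float, focal_y: float,
                     target_ratio: float) -> Crop:
    """Largest crop with width/height == target_ratio, centered on the focal point.

    focal_x and focal_y are relative coordinates in [0, 1].
    """
    if not (0 <= focal_x <= 1 and 0 <= focal_y <= 1):
        raise ValueError("focal point must be within the image")
    if img_w / img_h > target_ratio:
        # Image is wider than the target: keep full height, trim the sides.
        crop_h = img_h
        crop_w = round(img_h * target_ratio)
    else:
        # Image is taller than (or equal to) the target: keep full width.
        crop_w = img_w
        crop_h = round(img_w / target_ratio)
    # Center the crop on the focal point, then clamp it inside the image.
    x = round(focal_x * img_w - crop_w / 2)
    y = round(focal_y * img_h - crop_h / 2)
    x = min(max(x, 0), img_w - crop_w)
    y = min(max(y, 0), img_h - crop_h)
    return Crop(x, y, crop_w, crop_h)


# A 4000x3000 photo whose subject sits right of center.
print(focal_point_crop(4000, 3000, 0.8, 0.4, 1.0))     # square thumbnail
print(focal_point_crop(4000, 3000, 0.8, 0.4, 16 / 9))  # wide hero banner
```

<p>For the square crop, the window keeps the full height and shifts right until it meets the image edge, so the subject stays in frame; without a focal point, a centered crop would place the subject near the edge of the frame.</p>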
<h4>Phase 4 - Advanced AI Image Manipulation (or never)</h4>
<ul>
<li><strong>AI Image Generation Tools</strong> - Integration with specialized image generation tools like Midjourney for creating images from scratch.
<ul>
<li><strong>Problem:</strong> For Content Editors/Agents - There may be situations where generating images from scratch is appropriate, especially for full automation with a human in the loop.</li>
<li><strong>Workstream:</strong> Could support automated page generation when appropriate images don't exist (3: Performance Intelligence)</li>
</ul>
</li>
<li><strong>AI Image Transformation Suite (DreamStudio Integration)</strong> - Integration with AI augmentation tools including inpaint, outpaint, background removal/replacement, recolor, style transfer, upscale, and variations.
<ul>
<li><strong>Problem:</strong> For Content Editors/Agents - Many different image manipulation problems need solving with advanced AI tools.</li>
<li><strong>Workstream:</strong> Analytics agents may perform these operations to achieve more effective landing pages (Currently: NONE - For future consideration)</li>
</ul>
</li>
</ul>
> Related issue: [Issue #3530701](https://www.drupal.org/node/3530701)
> Related issue: [Issue #3541873](https://www.drupal.org/node/3541873)
> Related issue: [Issue #3541872](https://www.drupal.org/node/3541872)
> Related issue: [Issue #3541812](https://www.drupal.org/node/3541812)