[META] Context source plugin feature (context from PDF/MD/Word/URL/etc)
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3569310. -->
Reported by: [kristen pol](https://www.drupal.org/user/8389)
Related to !103
>>>

<p>[Tracker]<br>
<strong>Update Summary: </strong>[One-line status update for stakeholders]<br>
<strong>Check-in Date: </strong>MM/DD/YYYY<br>
<strong>Blocked by: </strong>[#XXXXXX] (New issues on new lines)<br>
<strong>Additional Collaborators: </strong> @username1, @username2<br>
<em>Metadata is used by the <a href="https://www.drupalstarforge.ai/" title="AI Tracker">AI Tracker</a>. Docs and additional fields <a href="https://www.drupalstarforge.ai/ai-dashboard/docs" title="AI Issue Tracker Documentation">here</a>.</em><br>
[/Tracker]</p>
<h3 id="summary-problem-motivation">Problem/Motivation</h3>
<p>We want a generic mechanism for context source plugins that load context from sources such as a:</p>
<ul>
<li>PDF</li>
<li>Markdown (.md) file</li>
<li>Word document</li>
<li>Google Doc</li>
<li>Single webpage (URL)</li>
</ul>
<p>Each plugin converts the source content to Markdown and inserts it into the content field of the ai_context_item content entity.</p>
<p>The source provides only the initial content. The user can then edit it in the ai_context_item content entity form, just as they do for manually entered context.</p>
<p>For now, we can assume that external documents are publicly accessible. A future feature would allow private documents, which would require access control handling.</p>
<p>We also need a way to resync with the source, which will overwrite any changes the user has made.
Resync is most relevant for external sources, such as a web page with brand guidelines.</p>
<h3 id="summary-proposed-resolution">Proposed resolution</h3>
<p>See discussions in:</p>
<p><span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai_context/issues/3547034" title="Status: Closed (fixed)">#3547034: [Spike] Research URL support for CCC</a></span><br>
<span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai_context/issues/3547035" title="Status: Closed (fixed)">#3547035: [Spike] Research PDF upload support for CCC</a></span></p>
<p>We will use the new <a href="https://www.drupal.org/project/document_loader">Document Loader</a> system.</p>
<p>We might use the <a href="https://www.drupal.org/project/ai_file_to_text">AI File to Text</a> module, which:</p>
<ul>
<li>Extracts content from Word (.docx, .doc), OpenDocument (.odt, .ods), PDF, CSV, TXT, and Markdown (.md) files.</li>
<li>Offers three output formats: plain text, styled HTML, and Markdown.</li>
</ul>
<h3 id="summary-remaining-tasks">Target date or deadline</h3>
<p>April 2026</p>
<h3 id="summary-remaining-tasks">Remaining tasks</h3>
<ul>
<li>Design workflow</li>
<li>Create plugin architecture</li>
<li>Implement key plugins</li>
<li>See additional child issues</li>
</ul>
<h3 id="summary-ai-usage">AI usage (if applicable)</h3>
<p>[ ] AI Assisted Issue<br>
This issue was generated with AI assistance, but was reviewed and refined by the creator.</p>
<p>[ ] AI Assisted Code<br>
This code was mainly generated by a human, with AI autocompleting or generating parts of it, but under full human supervision.</p>
<p>[ ] AI Generated Code<br>
This code was mainly generated by an AI with human guidance, and reviewed, tested, and refined by a human.</p>
<p>[ ] Vibe Coded<br>
This code was generated by an AI and has only been functionally tested.</p>

> Related issue: [Issue #3580850](https://www.drupal.org/node/3580850)
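
One way to picture the plugin contract and the resync behavior described in this issue is the small sketch below. It is written in Python purely for illustration and is not Drupal code: `ContextSource`, `InMemorySource`, `ContextItem`, `load_markdown`, and `sync` are all hypothetical names, and a real implementation would presumably build on Drupal's plugin system and the Document Loader module instead.

```python
from dataclasses import dataclass
from typing import Protocol


class ContextSource(Protocol):
    """Hypothetical plugin contract: fetch a source and return it as Markdown."""

    def load_markdown(self) -> str: ...


@dataclass
class InMemorySource:
    """Stand-in for a PDF, Word, or URL plugin; it just returns the text it holds."""

    text: str

    def load_markdown(self) -> str:
        return self.text


@dataclass
class ContextItem:
    """Toy model of an ai_context_item with a single editable content field."""

    source: ContextSource
    content: str = ""

    def sync(self) -> None:
        # Replace the content field with the source's Markdown.
        # Any manual edits made since the last sync are discarded.
        self.content = self.source.load_markdown()


source = InMemorySource("# Brand guidelines\n\nUse sentence case.")
item = ContextItem(source)

item.sync()                                             # initial import
item.content += "\n\nEditor note."                      # user edits the imported text
source.text = "# Brand guidelines\n\nUse title case."   # source changes upstream
item.sync()                                             # resync drops the editor note
```

The point of the sketch is the trade-off the issue calls out: because `sync` overwrites the content field wholesale, a resync keeps the item faithful to the source at the cost of any local edits, so the workflow design likely needs an explicit confirmation step before resyncing.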