[META] Context source plugin feature (context from PDF/MD/Word/URL/etc)
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3569310. -->
Reported by: [kristen pol](https://www.drupal.org/user/8389)
Related to !103
>>>
<p>[Tracker]<br>
<strong>Update Summary: </strong>[One-line status update for stakeholders]<br>
<strong>Check-in Date: </strong>MM/DD/YYYY<br>
<strong>Blocked by: </strong>[#XXXXXX] (New issues on new lines)<br>
<strong>Additional Collaborators: </strong> @username1, @username2<br>
<em>Metadata is used by the <a href="https://www.drupalstarforge.ai/" title="AI Tracker">AI Tracker.</a> Docs and additional fields <a href="https://www.drupalstarforge.ai/ai-dashboard/docs" title="AI Issue Tracker Documentation">here</a>.</em><br>
[/Tracker]</p>
<h3 id="summary-problem-motivation">Problem/Motivation</h3>
<p>We want a generic mechanism for adding context source plugins, such as loading context from a:</p>
<ul>
<li>PDF</li>
<li>MD file</li>
<li>Word doc</li>
<li>Google doc</li>
<li>Single webpage (URL)</li>
</ul>
<p>The mechanism should then convert the loaded information to Markdown and insert it into the content field of the ai_context_item content entity.</p>
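<p>As a rough illustration of the shape such a mechanism could take, here is a minimal, language-neutral sketch in Python (the real implementation would be a Drupal plugin type). Every name in it (<code>ContextSourcePlugin</code>, <code>MarkdownFileSource</code>, <code>ContextItem</code>, <code>import_context</code>, the <code>markdown_file</code> plugin ID) is hypothetical and not part of ai_context or Document Loader. Only the simplest source, an MD file, is implemented.</p>

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


class ContextSourcePlugin(ABC):
    """Hypothetical plugin contract: convert raw source data to Markdown."""

    plugin_id: str

    @abstractmethod
    def to_markdown(self, raw: bytes) -> str:
        """Return the source content as Markdown."""


class MarkdownFileSource(ContextSourcePlugin):
    """MD files are already Markdown, so only decode and normalise newlines."""

    plugin_id = "markdown_file"

    def to_markdown(self, raw: bytes) -> str:
        return raw.decode("utf-8").replace("\r\n", "\n").strip() + "\n"


@dataclass
class ContextItem:
    """Stand-in for the ai_context_item entity; only its content field is modelled."""

    content: str = ""


# Assumed registry keyed by plugin ID; a PDF, Word, or URL plugin would be added here.
REGISTRY: Dict[str, ContextSourcePlugin] = {}


def register(plugin: ContextSourcePlugin) -> None:
    REGISTRY[plugin.plugin_id] = plugin


def import_context(item: ContextItem, plugin_id: str, raw: bytes) -> ContextItem:
    """Convert the raw source with the chosen plugin and store the Markdown."""
    item.content = REGISTRY[plugin_id].to_markdown(raw)
    return item


register(MarkdownFileSource())
item = import_context(ContextItem(), "markdown_file", b"# Brand\r\nUse blue.\r\n")
```

<p>Keeping the contract down to "raw data in, Markdown out" would let each new source type be added without touching the entity or its form.</p>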
<p>Note that the source will provide the initial information, but the user can edit it using the ai_context_item content entity form as they do for manually-entered context.</p>
<p>For now, we can assume external documents are publicly accessible. A future feature would allow private documents, which would need access control handling.</p>
<p>We also need a way to resync with the source, which will wipe out any changes the user has made. This is more relevant for external sources, such as a web page with brand guidelines.</p>
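<p>The resync behavior can be sketched as follows, again in Python and under stated assumptions: <code>SyncedContextItem</code>, <code>fetch_markdown</code>, and the boolean return value are hypothetical and chosen only for illustration. The point is that a resync replaces the stored content wholesale, so any user edits are lost.</p>

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyncedContextItem:
    """Stand-in for a context item that remembers how to re-read its source."""

    content: str
    # Hypothetical callback that re-fetches the source and returns Markdown.
    fetch_markdown: Callable[[], str]

    def edit(self, new_content: str) -> None:
        """Simulate a user editing the content through the entity form."""
        self.content = new_content

    def resync(self) -> bool:
        """Overwrite content with a fresh copy of the source.

        Returns True when the stored content differed from the source,
        meaning local edits (or stale content) were discarded.
        """
        fresh = self.fetch_markdown()
        changed = fresh != self.content
        self.content = fresh
        return changed


item = SyncedContextItem(content="# Guidelines\n", fetch_markdown=lambda: "# Guidelines\n")
item.edit("# Guidelines\nMy local note\n")
discarded = item.resync()
```

<p>A return value like this could let the UI warn the user before or after their changes are wiped out. Whether to warn or confirm is a workflow decision for this issue.</p>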
<h3 id="summary-proposed-resolution">Proposed resolution</h3>
<p>See discussions in:</p>
<p><span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai_context/issues/3547034" title="Status: Closed (fixed)">#3547034: [Spike] Research URL support for CCC</a></span><br>
<span class="drupalorg-gitlab-issue-link project-issue-status-info project-issue-status-7"><a href="https://www.drupal.org/project/ai_context/issues/3547035" title="Status: Closed (fixed)">#3547035: [Spike] Research PDF upload support for CCC</a></span></p>
<p>We will use the new <a href="https://www.drupal.org/project/document_loader">Document Loader</a> system.</p>
<p>We might use the <a href="https://www.drupal.org/project/ai_file_to_text">AI File to Text</a> module, which:</p>
<ul>
<li>Extracts content from Word (.docx, .doc), OpenDocument (.odt, .ods), PDF, CSV, TXT, and Markdown (.md) files.</li>
<li>Offers three output formats: plain text, styled HTML, or Markdown.</li>
</ul>
<h3 id="summary-remaining-tasks">Target date or deadline</h3>
<p>April 2026</p>
<h3 id="summary-remaining-tasks">Remaining tasks</h3>
<ul>
<li>Design workflow</li>
<li>Create plugin architecture</li>
<li>Implement key plugins</li>
<li>See additional child issues</li>
</ul>
<h3 id="summary-ai-usage">AI usage (if applicable)</h3>
<p>[ ] AI Assisted Issue<br>
This issue was generated with AI assistance, but was reviewed and refined by the creator.</p>
<p>[ ] AI Assisted Code<br>
This code was mainly generated by a human, with AI autocompleting or parts AI generated, but under full human supervision.</p>
<p>[ ] AI Generated Code<br>
This code was mainly generated by an AI with human guidance, and reviewed, tested, and refined by a human.</p>
<p>[ ] Vibe Coded<br>
This code was generated by an AI and has only been functionally tested.</p>
> Related issue: [Issue #3580850](https://www.drupal.org/node/3580850)