[Meta] AI Logging/Observability
>>> [!note] Migrated issue
<!-- Drupal.org comment -->
<!-- Migrated from issue #3533109. -->
Reported by: [breidert](https://www.drupal.org/user/135619)
Related to !897
>>>
<h2>Overview</h2>
<p>This is a meta issue to address logging and observability requirements for Drupal AI.</p>
<h2>Problem / Motivation</h2>
<p>AI agentic systems, with their autonomous decision-making, tool use, and multi-step planning, present unique monitoring challenges that go beyond traditional logging. Effective observability for these systems is crucial not just for detecting errors, but for truly understanding why and how an agent behaves. This deep insight is vital for efficient troubleshooting, optimizing performance, managing costs, and ensuring responsible AI deployment.</p>
<h2>Report / Analysis</h2>
<p>To address this challenge, research was conducted and a report was written on how to move forward.</p>
<p><a href="https://docs.google.com/document/d/1TFsgOdkj56icU11E-cr-sxtvsRIq8SlDRCpgpCYyyuE/edit?tab=t.0">The report is available here</a>; comment access can be requested.</p>
<p>This report explores the conceptual approaches to monitoring AI agents, details the essential data typically collected, provides examples of how this data can be visualized, and surveys the monitoring strategies of leading AI providers. A key industry trend is a move from fragmented logging to standardized, end-to-end observability, often powered by OpenTelemetry. The focus is on capturing granular details of workflow execution, operational performance, and critical quality and safety evaluations. This comprehensive approach is essential for continuous improvement and building trust in AI systems.</p>
<h2>Results / How to move forward</h2>
<p>A robust observability strategy for Drupal AI can be built on Drupal's existing OpenTelemetry integration. This involves systematically using traces, spans, attributes, and events to capture detailed AI functionality, from simple LLM calls to complex multi-agent systems with guardrails. This approach enables comprehensive monitoring and analysis while leaving data collection and visualization to third-party services like Grafana, which is especially beneficial for development environments.</p>
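<p>As a rough illustration of how traces, spans, attributes, and events relate to each other, here is a minimal, standard-library-only Python sketch of the data model. It is not the OpenTelemetry SDK and not the Drupal implementation. The <code>Tracer</code> and <code>Span</code> classes, the span names, and the agent and model names are all hypothetical; the <code>gen_ai.*</code> attribute keys are modeled on OpenTelemetry's GenAI semantic conventions and should be treated as illustrative only.</p>

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """One timed unit of work, e.g. an agent run or a single LLM call."""
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)
    events: list[tuple[str, dict[str, Any]]] = field(default_factory=list)
    children: list["Span"] = field(default_factory=list)
    start: float = 0.0
    end: Optional[float] = None

    def add_event(self, name: str, attributes: Optional[dict[str, Any]] = None) -> None:
        # An event is a point-in-time annotation inside the span.
        self.events.append((name, dict(attributes or {})))


class Tracer:
    """Hypothetical tracer: nests spans using a simple stack."""

    def __init__(self) -> None:
        self.roots: list[Span] = []
        self._stack: list[Span] = []

    @contextmanager
    def span(self, name: str, attributes: Optional[dict[str, Any]] = None):
        s = Span(name, dict(attributes or {}), start=time.monotonic())
        # Attach to the currently open span, or start a new trace root.
        (self._stack[-1].children if self._stack else self.roots).append(s)
        self._stack.append(s)
        try:
            yield s
        finally:
            s.end = time.monotonic()
            self._stack.pop()


# One trace: an agent run containing an LLM call and a guardrail check.
tracer = Tracer()
with tracer.span("agent.run", {"agent.name": "content_editor"}):
    with tracer.span("llm.chat", {"gen_ai.request.model": "example-model"}) as call:
        call.attributes["gen_ai.usage.input_tokens"] = 120
        call.attributes["gen_ai.usage.output_tokens"] = 45
    with tracer.span("guardrail.check") as guard:
        guard.add_event("guardrail.passed", {"rule": "pii_filter"})
```

<p>The root span represents the whole agent run; each nested span carries its own attributes (model, token usage) and events (guardrail outcome), which is the kind of structure a backend such as Grafana could then visualize.</p>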
<h2>Sub-Tasks for implementation</h2>
<ul>
<li>Co-maintain or re-create the OpenTelemetry module</li>
<li>Update the AI Logging sub-module to support OpenTelemetry</li>
<li>Update DDEV with something like Grafana</li>
<li>Have AI Logging listen to AI Core and AI Agents events and correctly record them as OpenTelemetry traces, spans, attributes, and events</li>
<li>Upgrade the existing "Simple" logging implementation</li>
<li>Document the approach and make it available to AI module developers</li>
</ul>
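<p>The event-listening sub-task above could work roughly as follows: a listener opens a span when a request starts and closes it when the matching response arrives, correlating the two by a request ID. This is a minimal Python sketch under stated assumptions, not the AI module's actual event API. The event names <code>ai.pre_request</code> and <code>ai.post_request</code>, the payload keys, and the <code>EventDispatcher</code> and <code>SpanRecorder</code> classes are all hypothetical.</p>

```python
import time
from collections import defaultdict
from typing import Any, Callable

Payload = dict[str, Any]


class EventDispatcher:
    """Hypothetical stand-in for an event dispatcher."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Callable[[Payload], None]]] = defaultdict(list)

    def add_listener(self, event_name: str, listener: Callable[[Payload], None]) -> None:
        self._listeners[event_name].append(listener)

    def dispatch(self, event_name: str, payload: Payload) -> None:
        for listener in self._listeners[event_name]:
            listener(payload)


class SpanRecorder:
    """Turns paired pre/post events into finished span records."""

    def __init__(self) -> None:
        self.open_spans: dict[str, Payload] = {}
        self.finished: list[Payload] = []

    def on_pre_request(self, payload: Payload) -> None:
        # Open a span keyed by the (assumed) request ID.
        self.open_spans[payload["request_id"]] = {
            "name": "llm.chat",
            "attributes": {"model": payload.get("model")},
            "start": time.monotonic(),
        }

    def on_post_request(self, payload: Payload) -> None:
        span = self.open_spans.pop(payload["request_id"], None)
        if span is None:
            return  # No matching pre-event; nothing to close.
        span["attributes"]["output_tokens"] = payload.get("output_tokens")
        span["end"] = time.monotonic()
        self.finished.append(span)


dispatcher = EventDispatcher()
recorder = SpanRecorder()
dispatcher.add_listener("ai.pre_request", recorder.on_pre_request)
dispatcher.add_listener("ai.post_request", recorder.on_post_request)

dispatcher.dispatch("ai.pre_request", {"request_id": "r1", "model": "example-model"})
dispatcher.dispatch("ai.post_request", {"request_id": "r1", "output_tokens": 45})
```

<p>Keeping span bookkeeping in a separate listener means AI Core and AI Agents only need to dispatch events; the logging sub-module decides how those events become spans.</p>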