Idea: AI Cost & Usage module — local cost dashboard + opt-in aggregated community telemetry
## Summary

Explore building an **AI Cost & Usage** module that gives site owners a local dashboard of their AI spending and usage, and (optionally, opt-in) shares aggregated, obfuscated usage data back to a community endpoint so the AI Initiative can see how the AI module and its providers are actually being used in the wild.

## Motivation

There are two needs that overlap nicely:

1. **Site owners** routinely ask for visibility into AI costs: tokens per request, $/day, which model/provider/feature is responsible for spend, and what their usage looks like compared to similar sites. Today this is opaque unless they wire up provider dashboards themselves.
2. **The AI Initiative and the broader community** have no signal beyond the Drupal.org project usage counts (https://new.drupal.org/project/usage/ai) about how the AI module is *actually used*: which providers, which submodules, which features. That data is necessary to make sane decisions about what to deprecate, where to invest, and how to talk about adoption with ecosystem partners.

The project usage data alone is noisy: it heavily reflects local dev sites (usage drops over holidays), and many production sites disable update reporting. We need something with finer granularity than "module installed yes/no" but with strong privacy guarantees.

A cost dashboard is a natural carrier for usage telemetry: it has a legitimate reason to aggregate per-provider/per-model counts locally, and a legitimate reason to "call home" already (to pull current provider pricing). Adding an optional "share aggregated usage" checkbox on top of that surface is a much smaller ask than a stand-alone telemetry module.

## Proposed scope

A new contrib module (working name: `ai_cost_usage` or `ai_cost_dashboard`) that:

### Local features (always on)

- Records per-request usage metadata against each AI operation: provider, model, operation type, input/output token counts, latency, and the calling module/agent where available.
- Computes cost using current per-model pricing tables.
- Fetches up-to-date pricing periodically from a community endpoint (pulling pricing data is the natural justification for outbound traffic).
- Exposes a dashboard: cost over time, top providers/models, top feature areas, comparisons against the site's own history.

### Optional shared telemetry (opt-in)

- A clearly labeled "share aggregated usage data" toggle, surfaced during initial setup with UX that encourages adoption.
- Pushes obfuscated, aggregated rollups (no prompts, no completions, no PII, no per-user data) to a community endpoint on a schedule.
- In return, the dashboard shows community averages for opted-in sites ("your average input tokens per prompt vs. the community", "your model mix vs. the community"), so there's a tangible benefit to opting in.

The reciprocity is the key UX hook: opting in unlocks comparison data the site couldn't compute on its own.

## Relationship to existing work

- **AI Observability** already provides OpenTelemetry export, but it's opt-in and targets the site owner's own infrastructure. It doesn't address the community-level signal problem and doesn't help non-observability-savvy site owners see their costs.
- **Drupal.org project usage** is too coarse and known to be noisy.
- This module would sit alongside Observability, not replace it. A site could enable both.

## Open questions for the initiative

These need decisions before code; the point of this issue is to align on direction:

- **Packaging**: is this a submodule of the AI module, a separate top-level contrib module, or a recipe? Is it shipped in Drupal CMS? On by default with an off switch, or off by default?
- **Privacy / ToS**: what exactly is in the shared payload? How is it obfuscated/aggregated? What consent surface is required (site-owner click-through, ToS link, etc.)? Who reviews the privacy framing? Does this need legal review on Drupal.org's side before a community endpoint exists?
- **Endpoint**: where does the aggregated data land? Drupal.org infrastructure? A separately hosted service operated by the initiative? Who owns it long-term and how is it funded?
- **Pricing data source**: where do the current per-model pricing tables come from, who maintains them, and how often are they refreshed? Is this a manual curation effort or pulled from provider APIs where available?
- **Schema stability**: usage data is only useful if rollups stay comparable over time. What's the minimum schema, and how do we evolve it without breaking historical comparisons?
- **Interaction with AI Observability**: should the cost module consume Observability data when it's enabled, or always maintain its own collection path? Either way, the two modules should not instrument the same requests twice.

## Proposed next steps

1. Spike a minimal local-only version: usage recording + a basic dashboard, no telemetry. Validate that the recording layer is cheap and accurate.
2. Draft the shared-telemetry payload schema and privacy framing as a separate sub-issue, so it can be reviewed independently before any code calls home.
3. Decide packaging (submodule / standalone / recipe) based on how it integrates with Drupal CMS install flows.
4. Identify owners for the community endpoint side of the equation.

## AI Usage

- [x] AI Assisted Issue: This issue was generated with AI assistance, but was reviewed and refined by the creator.
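
To make the recording, cost computation, and rollup steps from "Proposed scope" concrete, here is a minimal sketch. It is written in Python purely for brevity (an actual Drupal module would be PHP), and everything in it is an assumption for discussion: the `UsageRecord` fields mirror the metadata listed under "Local features", while the pricing table, its placeholder rates, the provider/model names, and the rollup shape are all hypothetical, not a proposed schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical pricing table: USD per 1,000,000 tokens.
# Provider names, model names, and rates are placeholders, not real prices.
PRICING = {
    ("example_provider", "example-small"): {"input": Decimal("0.50"), "output": Decimal("1.50")},
    ("example_provider", "example-large"): {"input": Decimal("5.00"), "output": Decimal("15.00")},
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


@dataclass(frozen=True)
class UsageRecord:
    """Per-request metadata kept locally; never contains prompts or completions."""
    provider: str
    model: str
    operation: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    caller: Optional[str] = None  # calling module/agent, where available


def request_cost(record: UsageRecord, pricing=PRICING) -> Decimal:
    """Cost of one request in USD. Raises KeyError if the model has no pricing entry."""
    rates = pricing[(record.provider, record.model)]
    total = record.input_tokens * rates["input"] + record.output_tokens * rates["output"]
    return total / TOKENS_PER_PRICE_UNIT


def rollup(records):
    """Aggregate records per (provider, model, operation).

    The result carries only counts and averages: no caller, no per-request
    rows, no per-user data. This is the kind of payload an opt-in share
    could send, under the assumptions stated above.
    """
    totals = defaultdict(lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0})
    for r in records:
        bucket = totals[(r.provider, r.model, r.operation)]
        bucket["requests"] += 1
        bucket["input_tokens"] += r.input_tokens
        bucket["output_tokens"] += r.output_tokens
    return [
        {
            "provider": provider,
            "model": model,
            "operation": operation,
            "requests": b["requests"],
            "avg_input_tokens": b["input_tokens"] / b["requests"],
            "avg_output_tokens": b["output_tokens"] / b["requests"],
        }
        for (provider, model, operation), b in sorted(totals.items())
    ]


records = [
    UsageRecord("example_provider", "example-small", "chat", 1000, 500, 420, "ai_ckeditor"),
    UsageRecord("example_provider", "example-small", "chat", 2000, 100, 610),
    UsageRecord("example_provider", "example-large", "chat", 10, 20, 900),
]
local_spend = sum(request_cost(r) for r in records)
shared_payload = rollup(records)
```

Using `Decimal` rather than floats keeps small per-request amounts from accumulating rounding error in the dashboard totals. Whether a rollup at this granularity is coarse enough to share is exactly the kind of question the privacy sub-issue would need to settle.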