redpanda-data · micheleRP · Apr 29, 2026 · Apr 28, 2026
@@ -1,4 +1,100 @@
 = Token Budgets and Limits
-:description: Control AI costs with token budgets and rate limits.
+:description: See what AI spending the Agentic Data Plane records automatically, where to view it, and what cap-management capabilities arrive after GA.
+:page-topic-type: overview
+:personas: platform_admin, evaluator
+// TODO: confirm persona vocabulary. The Governance V0 PRD names HoT (Head of Trust), CIO/CFO, CISO, and FDE; this page uses canonical docs-team-standards personas. Confirm with docs-team-standards owner whether to add `executive` and `security_admin` (or equivalents) so the metadata matches the PRD audience.
+:learning-objective-1: Identify what spending data the Agentic Data Plane records automatically
+:learning-objective-2: Locate where to view spend in the dashboard, in transcripts, and through breakdown queries
+:learning-objective-3: Recognize which cap-management capabilities ship at GA versus arrive in a later release
 
-// TODO: Add content
+include::ROOT:partial$adp-la.adoc[]
+
+The Agentic Data Plane (ADP) records every LLM call routed through the gateway as a spending event, starting when your first agent or MCP server runs. The current release lets you read that data through the governance dashboard, through individual transcripts, and through breakdown queries by provider, model, user, organization, or provider type. Configurable caps, halt-versus-notify enforcement, alerts, and per-tenant cap-setting arrive in a later release.
+
+After reading this page, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
+
+== What ADP records automatically
+
+Every LLM call routed through AI Gateway becomes a *spending event*. Each event captures:
+
+* Input tokens, output tokens, and cached tokens.
+* Total cost (in microcents).
+* Request count.
+* The provider, model, user, and organization context the call ran under.
+
+Events flow through a Kafka pipeline and roll up into queryable storage. No setup is required: spending is captured from the moment your first agent or MCP server runs through the gateway.
+
+[NOTE]
+====
+Cost is reported in *microcents*, where one microcent is one millionth of a cent: 1 cent = 1,000,000 microcents and $1 = 100,000,000 microcents. Divide `total_cost_microcents` by 100,000,000 to convert to dollars.
+====
+
+// TODO: confirm whether spending events are captured by default for every deployment, or whether some deployments require an opt-in flag. Open Q A1 in the companion plan.
+
+== Where to view your spend
+
+Spend data is available through three read surfaces: the governance dashboard, transcripts, and breakdown queries.
+
+[cols="1,3"]
+|===
+|Surface |Use it for
+
+|*Governance dashboard*
+|Summary cards (total spend, agent count, request count, trend), a provider-breakdown chart, an events timeline, and tables of agents and MCP servers. This is the single view across your whole deployment. See xref:governance:dashboard/index.adoc[Read the governance overview].
+
+|*Transcripts*
+|Per-call cost on individual executions. Useful when investigating a specific agent run or debugging a cost anomaly. See xref:observability:transcripts.adoc[Read a transcript].
+
+|*Breakdown queries*
+|Aggregated spend by *provider*, *model*, *user*, *organization*, or *provider type*. Available through the dashboard's provider-breakdown widget and through `GetSpendingBreakdown` for programmatic access.
+|===
+
+The breakdown dimensions all read from the same `SpendingFilter` shape: a time range plus optional `provider_name`, `model_id`, `user_id`, or `organization_id`. Combine dimensions to scope a query (for example, "all spend on Anthropic for user `alice` in April").
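+
+As a rough illustration of how the optional fields combine, the following Python sketch models the filter locally and applies it to a few in-memory events. The dimension names come from the text above, but the `SpendingEvent` class, the `start` and `end` fields, the `matches` helper, the half-open time range, and all sample values are assumptions made for this example. It is not the ADP client API.
+

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SpendingEvent:
    # Hypothetical local record; not the real event schema.
    timestamp: datetime
    provider_name: str
    model_id: str
    user_id: str
    organization_id: str
    total_cost_microcents: int


@dataclass(frozen=True)
class SpendingFilter:
    # A time range plus optional dimensions. `start`/`end` are assumed names.
    start: datetime
    end: datetime
    provider_name: Optional[str] = None
    model_id: Optional[str] = None
    user_id: Optional[str] = None
    organization_id: Optional[str] = None


def matches(event: SpendingEvent, f: SpendingFilter) -> bool:
    """True when the event is in [start, end) and equals every dimension that is set."""
    if not (f.start <= event.timestamp < f.end):
        return False
    for field in ("provider_name", "model_id", "user_id", "organization_id"):
        wanted = getattr(f, field)
        if wanted is not None and getattr(event, field) != wanted:
            return False
    return True


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Sample data, invented for illustration.
events = [
    SpendingEvent(utc(2026, 4, 3), "anthropic", "model-a", "alice", "org-1", 1200),
    SpendingEvent(utc(2026, 4, 9), "anthropic", "model-a", "bob", "org-1", 800),
    SpendingEvent(utc(2026, 4, 12), "openai", "model-b", "alice", "org-1", 500),
    SpendingEvent(utc(2026, 5, 1), "anthropic", "model-a", "alice", "org-1", 300),
]

# "All spend on Anthropic for user alice in April."
april_alice = SpendingFilter(
    start=utc(2026, 4, 1),
    end=utc(2026, 5, 1),
    provider_name="anthropic",
    user_id="alice",
)
total = sum(e.total_cost_microcents for e in events if matches(e, april_alice))
print(total)
```

+
+Unset dimensions act as wildcards in this sketch, so adding a field only narrows the result. Whether the real service treats the end of the time range as inclusive or exclusive is not stated on this page.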
+
+// TODO: confirm `user_id` and `organization_id` are populated automatically from request context (OIDC claims) or require setup. Open Q A2 in the companion plan.
+
+== Guardrail evaluator cost
+
+Some guardrail evaluators call an LLM to do their work. A toxicity classifier, for example, runs the request or response through a separate model and accrues per-call cost in the process. Regex-based PII detection incurs no LLM cost, but any LLM-based evaluator does.
+
+Guardrail evaluator cost surfaces in the same spending pipeline as user-facing LLM calls. The cost is attributed to the *evaluator's configured upstream provider*, which is usually a small classifier model separate from the user-facing LLM, so per-provider breakdowns separate the two automatically.
+
+For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails.adoc[Configure guardrails].
+
+// TODO: confirm with eng that guardrail evaluator cost flows into the same SpendingService as user-facing LLM cost (vs. a separate stream). Open Q A3 in the companion plan, also flagged on the Guardrails plan.
+
+== Multi-tenant patterns at GA — viewing only
+
+The `SpendingFilter` exposes `organization_id` and `user_id`, so every dashboard query and every API call can scope to a single tenant or user. Use this to:
+
+* See per-tenant spend in the dashboard's provider-breakdown view.
+* Pull per-user cost reports through `GetSpendingBreakdown`.
+* Identify which organization or user is driving the highest cost on a specific provider.
+
+In the current release, you can *see* per-tenant spend, but you cannot *cap* it. Cap-setting at any scope is a later-release feature.
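+
+The per-organization comparison listed above amounts to a group-by over breakdown results. The sketch below ranks organizations by spend on one provider using plain Python. The row layout, the `spend_by_org` helper, and the sample values are hypothetical. In practice the rows would come from a `GetSpendingBreakdown` response, whose exact shape this page does not define.
+

```python
from collections import defaultdict

# Hypothetical rows: (organization_id, provider_name, total_cost_microcents).
rows = [
    ("org-1", "anthropic", 1200),
    ("org-2", "anthropic", 4500),
    ("org-1", "openai", 900),
    ("org-2", "anthropic", 700),
    ("org-3", "anthropic", 2000),
]


def spend_by_org(rows, provider_name):
    """Sum cost per organization for one provider, highest spender first."""
    totals = defaultdict(int)
    for org_id, provider, cost in rows:
        if provider == provider_name:
            totals[org_id] += cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = spend_by_org(rows, "anthropic")
top_org, top_cost = ranking[0]
print(top_org, top_cost)
```

+
+Costs stay as integer microcents throughout, which avoids floating-point rounding until a final conversion for display.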
+
+// TODO: confirm whether `organization_id` is multi-tenant-aware in the public ADP API at GA, or whether it's an internal-only field. The proto exposes the filter; runtime population is the open item. Open Q B1 in the companion plan.
+
+== Coming in a later release
+
+Cap management arrives after GA, per the Governance V0 PRD. The planned feature set includes:
+
+* *Configurable caps*: Set a maximum spend per period (daily, monthly), per scope (organization, agent, user), and per resource (provider, model).
+* *Halt vs. notify behavior*: When a cap is reached, choose whether the gateway blocks new requests (halt) or continues serving while alerting an operator (notify).
+* *Per-agent caps*: Limit each agent's spend independently of organization-wide caps.
+* *Alert hooks*: Receive webhook, email, or chat notifications when a cap is approached or exceeded.
+* *Multi-tenant cap-setting*: Set per-tenant caps with override semantics.
+
+Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails.adoc[Configure guardrails]) for selective request blocking.
+
+// TODO: once the cap-management surface lands, replace this section with a forward link to the configuration how-to. If cap-management content grows beyond a single section, split this page into a sub-folder. Open Q C1 in the companion plan.
+
+== Next steps
+
+* Open the dashboard to see your current spend: xref:governance:dashboard/index.adoc[Read the governance overview].
+* Investigate a specific agent's cost: xref:observability:transcripts.adoc[Read a transcript].
+* Configure platform-level safety filtering: xref:governance:guardrails.adoc[Configure guardrails].