= Token Budgets and Limits
:description: See what AI spending the Agentic Data Plane records automatically, where to view it, and what cap-management capabilities arrive after GA.
:page-topic-type: overview
:personas: platform_admin, evaluator
// TODO: confirm persona vocabulary. The Governance V0 PRD names HoT (Head of Trust), CIO/CFO, CISO, and FDE; this page uses canonical docs-team-standards personas. Confirm with docs-team-standards owner whether to add `executive` and `security_admin` (or equivalents) so the metadata matches the PRD audience.
:learning-objective-1: Identify what spending data the Agentic Data Plane records automatically
:learning-objective-2: Locate where to view spend in the dashboard, in transcripts, and through breakdown queries
:learning-objective-3: Recognize which cap-management capabilities ship at GA versus arrive in a later release

include::ROOT:partial$adp-la.adoc[]

The Agentic Data Plane (ADP) records every LLM call as a spending event, starting as soon as your first agent or MCP server runs through AI Gateway. In the current release, you can read that data in three places: the governance dashboard, individual transcripts, and breakdown queries by provider, model, user, organization, or provider type. Configurable caps, halt-vs-notify enforcement, alerts, and per-tenant cap-setting arrive in a later release.

After reading this page, you will be able to:

* [ ] {learning-objective-1}
* [ ] {learning-objective-2}
* [ ] {learning-objective-3}

== What ADP records automatically

Every LLM call routed through AI Gateway becomes a *spending event*. Each event captures:

* Input tokens, output tokens, and cached tokens.
* Total cost (in microcents).
* Request count.
* The provider, model, user, and organization context the call ran under.

Events flow through a Kafka pipeline and roll up into queryable storage. Capture requires no setup.
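
The following Python sketch shows one possible shape for a single spending event. `total_cost_microcents` and the four context fields come from this page. The token and request-count field names, and all example values, are hypothetical. The sketch is not ADP's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendingEvent:
    """Hypothetical shape of one spending event; token field names are illustrative."""
    input_tokens: int
    output_tokens: int
    cached_tokens: int
    total_cost_microcents: int  # cost unit as reported by ADP
    request_count: int
    # Context the call ran under (same names as the SpendingFilter dimensions).
    provider_name: str
    model_id: str
    user_id: str
    organization_id: str


# Example event with made-up values.
event = SpendingEvent(
    input_tokens=1_200,
    output_tokens=350,
    cached_tokens=800,
    total_cost_microcents=4_500,
    request_count=1,
    provider_name="anthropic",
    model_id="example-model",
    user_id="alice",
    organization_id="org-1",
)
```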

[NOTE]
====
Cost is reported in *microcents*: 1 cent = 100 microcents, and $1 = 10,000 microcents. To convert to dollars, divide `total_cost_microcents` by 10,000.
====
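
The following minimal Python sketch performs the dollar conversion using the factor stated in the note. The function name is hypothetical. `Decimal` is used so that summed costs don't pick up floating-point rounding error.

```python
from decimal import Decimal

# Conversion factor stated on this page: $1 = 10,000 microcents.
MICROCENTS_PER_DOLLAR = 10_000


def microcents_to_dollars(total_cost_microcents: int) -> Decimal:
    """Convert an integer microcent amount to dollars without float rounding error."""
    return Decimal(total_cost_microcents) / MICROCENTS_PER_DOLLAR


print(microcents_to_dollars(25_000))  # 2.5
```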

// TODO: confirm whether spending events are captured by default for every deployment, or whether some deployments require an opt-in flag. Open Q A1 in the companion plan.

== Where to view your spend

This page doesn't display spend. Use the following surfaces to read it:

[cols="1,3"]
|===
|Surface |Use it for

|*Governance dashboard*
|Summary cards (total spend, agent count, request count, trend), a provider breakdown chart, an events timeline, and tables of agents and MCP servers. Use it for a single view across your whole deployment. See xref:governance:dashboard/index.adoc[Read the governance overview].

|*Transcripts*
|Per-call cost on individual executions. Useful when investigating a specific agent run or debugging a cost anomaly. See xref:observability:transcripts.adoc[Read a transcript].

|*Breakdown queries*
|Aggregated spend by *provider*, *model*, *user*, *organization*, or *provider type*. Available through the dashboard's provider-breakdown widget and through `GetSpendingBreakdown` for programmatic access.
|===

The breakdown dimensions all read from the same `SpendingFilter` shape: a time range plus optional `provider_name`, `model_id`, `user_id`, or `organization_id`. Combine dimensions to scope a query (for example, "all spend on Anthropic for user `alice` in April").
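
To show how the dimensions combine, here is a client-side Python sketch that mirrors the `SpendingFilter` fields and applies them to in-memory event dicts. The class is a hypothetical illustration, not the ADP client or proto. It also assumes an exclusive end time and lowercase provider names such as `anthropic`. The example reproduces the query above, with an arbitrary year.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SpendingFilter:
    """Hypothetical client-side mirror of the SpendingFilter fields named on this page."""
    start: datetime
    end: datetime  # exclusive upper bound (an assumption of this sketch)
    provider_name: Optional[str] = None
    model_id: Optional[str] = None
    user_id: Optional[str] = None
    organization_id: Optional[str] = None

    def matches(self, event: dict) -> bool:
        """Return True if the event is in the time range and matches every set dimension."""
        if not (self.start <= event["timestamp"] < self.end):
            return False
        for name in ("provider_name", "model_id", "user_id", "organization_id"):
            wanted = getattr(self, name)
            # Unset (None) dimensions don't narrow the query.
            if wanted is not None and event.get(name) != wanted:
                return False
        return True


# "All spend on Anthropic for user alice in April" (year chosen arbitrarily).
april_alice_anthropic = SpendingFilter(
    start=datetime(2025, 4, 1, tzinfo=timezone.utc),
    end=datetime(2025, 5, 1, tzinfo=timezone.utc),
    provider_name="anthropic",
    user_id="alice",
)
```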

// TODO: confirm `user_id` and `organization_id` are populated automatically from request context (OIDC claims) or require setup. Open Q A2 in the companion plan.

== Guardrail evaluator cost

Some guardrail evaluators call an LLM to do their work. A toxicity classifier, for example, runs the request or response through a separate model and accrues per-call cost in the process. Regex-based PII detection incurs no model cost, but any LLM-based evaluator does.

Guardrail evaluator cost surfaces in the same spending pipeline as user-facing LLM calls. The evaluator's cost is attributed to the *evaluator's configured upstream provider* — usually a small classifier model, separate from the user-facing LLM — so per-provider breakdowns separate the two automatically.
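
The following Python sketch shows why a per-provider breakdown separates the two kinds of cost: each event is summed under its own provider. The provider names and costs are made up. The function is a hypothetical local aggregation, not `GetSpendingBreakdown` itself.

```python
from collections import defaultdict


def spend_by_provider(events):
    """Sum total_cost_microcents per provider_name over an iterable of event dicts."""
    totals = defaultdict(int)
    for event in events:
        totals[event["provider_name"]] += event["total_cost_microcents"]
    return dict(totals)


events = [
    {"provider_name": "user-llm-provider", "total_cost_microcents": 40_000},   # user-facing call
    {"provider_name": "classifier-provider", "total_cost_microcents": 1_500},  # guardrail evaluator call
    {"provider_name": "user-llm-provider", "total_cost_microcents": 22_000},   # user-facing call
]
print(spend_by_provider(events))
# {'user-llm-provider': 62000, 'classifier-provider': 1500}
```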

For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails.adoc[Configure guardrails].

// TODO: confirm with eng that guardrail evaluator cost flows into the same SpendingService as user-facing LLM cost (vs. a separate stream). Open Q A3 in the companion plan, also flagged on the Guardrails plan.

== Multi-tenant patterns at GA — viewing only

The `SpendingFilter` exposes `organization_id` and `user_id`, so every dashboard query and every API call can scope to a single tenant or user. Use this to:

* See per-tenant spend in the dashboard's provider-breakdown view.
* Pull per-user cost reports through `GetSpendingBreakdown`.
* Identify which organization or user is driving the highest cost on a specific provider.
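
To illustrate the last item, here is a Python sketch that finds the top spender on one provider by either `organization_id` or `user_id`. It assumes event dicts already fetched for a time range. The function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def top_spender(
    events: Iterable[dict], provider_name: str, dimension: str = "organization_id"
) -> Optional[Tuple[str, int]]:
    """Return (value, total microcents) for the biggest spender on one provider, or None."""
    totals: Counter = Counter()
    for event in events:
        if event["provider_name"] == provider_name:
            totals[event[dimension]] += event["total_cost_microcents"]
    if not totals:
        return None
    return totals.most_common(1)[0]


events = [
    {"provider_name": "anthropic", "organization_id": "org-a", "user_id": "alice", "total_cost_microcents": 30_000},
    {"provider_name": "anthropic", "organization_id": "org-b", "user_id": "bob", "total_cost_microcents": 45_000},
    {"provider_name": "anthropic", "organization_id": "org-a", "user_id": "carol", "total_cost_microcents": 20_000},
    {"provider_name": "other", "organization_id": "org-b", "user_id": "bob", "total_cost_microcents": 90_000},
]
print(top_spender(events, "anthropic"))             # ('org-a', 50000)
print(top_spender(events, "anthropic", "user_id"))  # ('bob', 45000)
```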

In the current release, you can *see* per-tenant spend, but you cannot *cap* it. Cap-setting at any scope arrives in a later release.

// TODO: confirm whether `organization_id` is multi-tenant-aware in the public ADP API at GA, or whether it's an internal-only field. The proto exposes the filter; runtime population is the open item. Open Q B1 in the companion plan.

== Coming in a later release

According to the Governance V0 PRD, cap management arrives after GA. The planned feature set includes:

* *Configurable caps* — set a maximum spend per period (daily, monthly), per scope (organization, agent, user), per resource (provider, model).
* *Halt vs. notify behavior* — when a cap is reached, choose whether the gateway blocks new requests (halt) or continues serving while alerting an operator (notify).
* *Per-agent caps* — limit each agent's spend independently of organization-wide caps.
* *Alert hooks* — webhook, email, or chat notifications when a cap is approached or exceeded.
* *Multi-tenant cap-setting* — per-tenant caps with override semantics.
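
The following conceptual Python sketch contrasts the two planned behaviors for a single cap. It only restates the halt and notify descriptions above. It does not reflect the planned configuration surface or API, and every name in it is hypothetical.

```python
from enum import Enum
from typing import Callable


class CapBehavior(Enum):
    HALT = "halt"      # block new requests once the cap is reached
    NOTIFY = "notify"  # keep serving, but alert an operator


def allow_request(
    spent_microcents: int,
    cap_microcents: int,
    behavior: CapBehavior,
    alert: Callable[[str], None],
) -> bool:
    """Decide whether a request may proceed under one spend cap (illustrative only)."""
    if spent_microcents < cap_microcents:
        return True  # under the cap: serve normally
    if behavior is CapBehavior.HALT:
        return False  # halt: block new requests
    alert(f"spend cap reached: {spent_microcents} >= {cap_microcents} microcents")
    return True  # notify: keep serving, but alert an operator
```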

Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails.adoc[Configure guardrails]) for selective request blocking.

// TODO: once the cap-management surface lands, replace this section with a forward link to the configuration how-to. If cap-management content grows beyond a single section, split this page into a sub-folder. Open Q C1 in the companion plan.

== Next steps

* Open the dashboard to see your current spend: xref:governance:dashboard/index.adoc[Read the governance overview].
* Investigate a specific agent's cost: xref:observability:transcripts.adoc[Read a transcript].
* Configure platform-level safety filtering: xref:governance:guardrails.adoc[Configure guardrails].