diff --git a/modules/governance/pages/budgets.adoc b/modules/governance/pages/budgets.adoc
index 80a8eed..bf2f073 100644
--- a/modules/governance/pages/budgets.adoc
+++ b/modules/governance/pages/budgets.adoc
@@ -1,4 +1,100 @@
 = Token Budgets and Limits
-:description: Control AI costs with token budgets and rate limits.
+:description: See what AI spending the Agentic Data Plane records automatically, where to view it, and what cap-management capabilities arrive after GA.
+:page-topic-type: overview
+:personas: platform_admin, evaluator
+// TODO: confirm persona vocabulary. The Governance V0 PRD names HoT (Head of Trust), CIO/CFO, CISO, and FDE; this page uses canonical docs-team-standards personas. Confirm with docs-team-standards owner whether to add `executive` and `security_admin` (or equivalents) so the metadata matches the PRD audience.
+:learning-objective-1: Identify what spending data the Agentic Data Plane records automatically
+:learning-objective-2: Locate where to view spend in the dashboard, in transcripts, and through breakdown queries
+:learning-objective-3: Recognize which cap-management capabilities ship at GA versus arrive in a later release
 
-// TODO: Add content
+include::ROOT:partial$adp-la.adoc[]
+
+The Agentic Data Plane records every LLM call as a spending event from the moment your first agent or MCP server runs through the gateway. The current release lets you read that data through the governance dashboard, through individual transcripts, and through breakdown queries by provider, model, user, organization, or provider type. Configurable caps, halt-vs-notify enforcement, alerts, and per-tenant cap-setting arrive in a later release.
+
+After reading this page, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
+
+== What ADP records automatically
+
+Every LLM call routed through AI Gateway becomes a *spending event*.
+Each event captures:
+
+* Input tokens, output tokens, and cached tokens.
+* Total cost (in microcents).
+* Request count.
+* The provider, model, user, and organization context the call ran under.
+
+Events flow through a Kafka pipeline and roll up into queryable storage. No setup is required: spending is captured from the moment your first agent or MCP server runs through the gateway.
+
+[NOTE]
+====
+Cost is reported in *microcents*. 1 cent = 100 microcents, $1 = 10,000 microcents. Divide `total_cost_microcents` by 10,000 to convert to dollars.
+====
+
+// TODO: confirm whether spending events are captured by default for every deployment, or whether some deployments require an opt-in flag. Open Q A1 in the companion plan.
+
+== Where to view your spend
+
+This page doesn't display spend itself. The dashboard, transcripts, and breakdown queries are the read surfaces:
+
+[cols="1,3"]
+|===
+|Surface |Use it for
+
+|*Governance dashboard*
+|Summary cards (total spend, agent count, request count, trend), a provider breakdown chart, an events timeline, and tables of agents and MCP servers. This is the single-pane-of-glass view across your whole deployment. See xref:governance:dashboard/index.adoc[Read the governance overview].
+
+|*Transcripts*
+|Per-call cost on individual executions. Useful when investigating a specific agent run or debugging a cost anomaly. See xref:observability:transcripts.adoc[Read a transcript].
+
+|*Breakdown queries*
+|Aggregated spend by *provider*, *model*, *user*, *organization*, or *provider type*. Available through the dashboard's provider-breakdown widget and through `GetSpendingBreakdown` for programmatic access.
+|===
+
+The breakdown dimensions all read from the same `SpendingFilter` shape: a time range plus any combination of the optional `provider_name`, `model_id`, `user_id`, and `organization_id` fields. Combine dimensions to scope a query (for example, "all spend on Anthropic for user `alice` in April").
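
The filter shape and the cost conversion described above can be sketched in Python. This is a minimal illustration, not the ADP client API: the `SpendingFilter` dataclass here is a hypothetical local mirror of the proto message, the `start` and `end` field names and the 2025 dates are assumptions, and the conversion factor is the one stated in the note on this page.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# Conversion factor as stated in the note on this page.
MICROCENTS_PER_DOLLAR = 10_000


@dataclass(frozen=True)
class SpendingFilter:
    """Hypothetical local mirror of the SpendingFilter shape (not the real proto)."""

    start: datetime  # assumed field name for the start of the time range
    end: datetime    # assumed field name for the end of the time range
    provider_name: Optional[str] = None
    model_id: Optional[str] = None
    user_id: Optional[str] = None
    organization_id: Optional[str] = None

    def to_params(self) -> dict:
        """Keep only the dimensions that were set, so unset ones do not narrow the query."""
        return {key: value for key, value in asdict(self).items() if value is not None}


def microcents_to_dollars(total_cost_microcents: int) -> float:
    """Convert a total_cost_microcents value to dollars using the page's factor."""
    return total_cost_microcents / MICROCENTS_PER_DOLLAR


# "All spend on Anthropic for user alice in April" (the year is illustrative).
april_anthropic_alice = SpendingFilter(
    start=datetime(2025, 4, 1, tzinfo=timezone.utc),
    end=datetime(2025, 5, 1, tzinfo=timezone.utc),
    provider_name="anthropic",
    user_id="alice",
)
```

Calling `april_anthropic_alice.to_params()` yields only the time range, `provider_name`, and `user_id`, which mirrors how leaving a dimension unset keeps the query broad on that dimension.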
+
+// TODO: confirm `user_id` and `organization_id` are populated automatically from request context (OIDC claims) or require setup. Open Q A2 in the companion plan.
+
+== Guardrail evaluator cost
+
+Some guardrail evaluators call an LLM to do their work. A toxicity classifier, for example, runs the request or response through a separate model and accrues per-call cost in the process. Regex-based PII detection incurs no LLM cost, but any LLM-based evaluator does.
+
+Guardrail evaluator cost surfaces in the same spending pipeline as user-facing LLM calls. The evaluator's cost is attributed to the *evaluator's configured upstream provider* (usually a small classifier model, separate from the user-facing LLM), so per-provider breakdowns separate the two automatically.
+
+For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails.adoc[Configure guardrails].
+
+// TODO: confirm with eng that guardrail evaluator cost flows into the same SpendingService as user-facing LLM cost (vs. a separate stream). Open Q A3 in the companion plan, also flagged on the Guardrails plan.
+
+== Multi-tenant patterns at GA: viewing only
+
+The `SpendingFilter` exposes `organization_id` and `user_id`, so every dashboard query and every API call can be scoped to a single tenant or user. Use this to:
+
+* See per-tenant spend in the dashboard's provider-breakdown view.
+* Pull per-user cost reports through `GetSpendingBreakdown`.
+* Identify which organization or user is driving the highest cost on a specific provider.
+
+In the current release, you can *see* per-tenant spend; you cannot *cap* per-tenant spend. Cap-setting at any scope is a later-release feature.
+
+// TODO: confirm whether `organization_id` is multi-tenant-aware in the public ADP API at GA, or whether it's an internal-only field. The proto exposes the filter; runtime population is the open item. Open Q B1 in the companion plan.
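
As an illustration of the last use in the list above (finding which organization drives the highest cost on each provider), the following Python sketch aggregates breakdown rows client-side. The row shape is an assumption: this page does not document the `GetSpendingBreakdown` response, so the dictionaries and the sample values below are hypothetical stand-ins that reuse the field names mentioned on this page.

```python
from collections import defaultdict


def top_spender_by_provider(rows):
    """Return {provider_name: (organization_id, total_microcents)} for the highest spender."""
    # Sum cost per (provider, organization) pair.
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[row["provider_name"]][row["organization_id"]] += row["total_cost_microcents"]
    # Pick the organization with the largest summed cost for each provider.
    return {
        provider: max(orgs.items(), key=lambda item: item[1])
        for provider, orgs in totals.items()
    }


# Hypothetical rows; real data would come from breakdown queries.
sample_rows = [
    {"provider_name": "anthropic", "organization_id": "org-a", "total_cost_microcents": 30_000},
    {"provider_name": "anthropic", "organization_id": "org-b", "total_cost_microcents": 20_000},
    {"provider_name": "anthropic", "organization_id": "org-a", "total_cost_microcents": 20_000},
    {"provider_name": "openai", "organization_id": "org-b", "total_cost_microcents": 12_000},
]

top_spenders = top_spender_by_provider(sample_rows)
# top_spenders == {"anthropic": ("org-a", 50000), "openai": ("org-b", 12000)}
```

The same aggregation works per user by keying on `user_id` instead of `organization_id`, assuming that field is populated for your deployment.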
+
+== Coming in a later release
+
+Cap management arrives after GA, per the Governance V0 PRD. The planned feature set includes:
+
+* *Configurable caps*: set a maximum spend per period (daily, monthly), per scope (organization, agent, user), and per resource (provider, model).
+* *Halt vs. notify behavior*: when a cap is reached, choose whether the gateway blocks new requests (halt) or continues serving while alerting an operator (notify).
+* *Per-agent caps*: limit each agent's spend independently of organization-wide caps.
+* *Alert hooks*: webhook, email, or chat notifications when a cap is approached or exceeded.
+* *Multi-tenant cap-setting*: per-tenant caps with override semantics.
+
+Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails.adoc[Configure guardrails]) for selective request blocking.
+
+// TODO: once the cap-management surface lands, replace this section with a forward link to the configuration how-to. If cap-management content grows beyond a single section, split this page into a sub-folder. Open Q C1 in the companion plan.
+
+== Next steps
+
+* Open the dashboard to see your current spend: xref:governance:dashboard/index.adoc[Read the governance overview].
+* Investigate a specific agent's cost: xref:observability:transcripts.adoc[Read a transcript].
+* Configure platform-level safety filtering: xref:governance:guardrails.adoc[Configure guardrails].