feat(rook): add gateway usage accounting #763
Conversation
Deploying corvus with Cloudflare Pages

| Latest commit: | a532f60 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://6796dd7c.corvus-42x.pages.dev |
| Branch Preview URL: | https://feat-rook-usage-accounting-6.corvus-42x.pages.dev |
⚠️ Warning: Rate limit exceeded

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered again. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

ℹ️ Review info — ⚙️ Run configuration: Path: .coderabbit.yaml; Review profile: ASSERTIVE; Plan: Pro. 📒 Files selected for processing (9)
📝 Walkthrough

This PR implements real usage accounting for the Rook gateway by adding a persistent `usage_events` ledger.

Changes: Usage Accounting Implementation
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant GatewayHandler as Gateway Handler
    participant UsageService as Usage Service
    participant SqliteDb as SQLite DB
    participant AdminAPI as Admin API Handler

    Note over Client,AdminAPI: Usage Recording Flow
    Client->>GatewayHandler: POST /v1/chat/completions
    activate GatewayHandler
    GatewayHandler->>GatewayHandler: Parse request, start timer
    GatewayHandler->>GatewayHandler: Route to upstream
    alt Upstream Success
        GatewayHandler->>GatewayHandler: Extract tokens from response
    else Upstream Error or Route Failure
        GatewayHandler->>GatewayHandler: Classify outcome
    end
    GatewayHandler->>UsageService: spawn_usage_record(event)
    activate UsageService
    UsageService->>SqliteDb: insert_usage_event(StoredUsageEvent)
    activate SqliteDb
    SqliteDb->>SqliteDb: Write row to usage_events
    deactivate SqliteDb
    deactivate UsageService
    GatewayHandler->>Client: Response
    deactivate GatewayHandler

    Note over Client,AdminAPI: Usage Query Flow
    Client->>AdminAPI: GET /api/usage?period=day&limit=10
    activate AdminAPI
    AdminAPI->>AdminAPI: Parse query, compute window
    AdminAPI->>UsageService: summary(UsageSummaryQuery)
    activate UsageService
    UsageService->>SqliteDb: aggregate_totals(since, until)
    activate SqliteDb
    SqliteDb->>SqliteDb: SUM requests, tokens over window
    deactivate SqliteDb
    UsageService->>SqliteDb: aggregate_group(logical_model, limit)
    activate SqliteDb
    SqliteDb->>SqliteDb: GROUP BY logical_model, order, limit
    deactivate SqliteDb
    UsageService->>AdminAPI: UsageSummary {totals, by_model, …}
    deactivate UsageService
    AdminAPI->>AdminAPI: Build UsageSummaryView with RFC3339 window
    AdminAPI->>Client: JSON {available: true, window, totals, by_model, …}
    deactivate AdminAPI
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested labels
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

✅ Passed checks (4 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Review rate limit: 0/1 reviews remaining; refill in 1 minute and 17 seconds.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
openspec/specs/gateway/spec.md (1)
Line 21: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win — Stale "placeholder" language in the Purpose section contradicts R25.

Line 21 still reads "placeholder usage endpoint", but R25 now mandates that `GET /api/usage` returns real, persisted accounting data. A reader of the Purpose section alone would incorrectly conclude the endpoint is still a stub.

📝 Suggested fix

```diff
-settings, and the placeholder usage endpoint, including redacted response views and coexistence
+settings, and the usage accounting endpoint, including redacted response views and coexistence
```

As per coding guidelines, docs must stay aligned with code changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/specs/gateway/spec.md` at line 21, Update the Purpose section's wording that currently reads "placeholder usage endpoint" to reflect R25 by describing the endpoint as a real, persisted usage/accounting endpoint (e.g., mention that GET /api/usage returns real persisted accounting data per R25) and remove any implication that it is a stub; ensure the updated sentence references GET /api/usage and R25 so readers know the endpoint is implemented and returns persisted data.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/rook/migrations/0007_usage_events.sql`:
- Around line 21-25: The current single-column indexes
(idx_usage_events_occurred_at, idx_usage_events_logical_model,
idx_usage_events_vendor, idx_usage_events_account_id, idx_usage_events_outcome)
cause poor plan choices for the /api/usage time-bounded, grouped queries; add
composite time-window indexes that pair occurred_at with each grouping key to
support the WHERE occurred_at range plus GROUP BY on
logical_model/vendor/outcome/account_id (e.g., indexes on (logical_model,
occurred_at), (vendor, occurred_at), (outcome, occurred_at), and optionally
(account_id, occurred_at)); update the migration to create these composite
indexes alongside or instead of the single-column ones so SQLite can use the
range filter and grouping key together for efficient aggregation.
In `@clients/rook/src/admin/handlers.rs`:
- Around line 252-255: handle_get_usage currently accepts Query<GetUsageQuery>
which lets Axum produce its own rejection for malformed params; change the
extractor to Query<Result<GetUsageQuery,
axum::extract::rejection::QueryRejection>> (or the generic Query<Result<_, _>>)
so you can match on Err and return your structured error shape: map parsing
errors to (StatusCode::BAD_REQUEST, Json(AdminErrorResponse { ... })) and on
Ok(query) continue normal processing to produce Json<UsageSummaryView>; update
references inside handle_get_usage to use the unwrapped query on success and
ensure AdminErrorResponse is used for all query-parsing failures.
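Stripped of the Axum plumbing, the error-shaping idea is "parse the query yourself, and map every failure to a structured 400". A minimal std-only sketch of that idea — the `GetUsageQuery` fields and defaults here are assumptions, and a `(u16, String)` tuple stands in for `(StatusCode, Json<AdminErrorResponse>)`:

```rust
// Hypothetical simplified form of the admin usage query.
#[derive(Debug, PartialEq)]
struct GetUsageQuery {
    period: String,
    limit: u32,
}

// Every parse failure becomes a structured (status, message) error the
// handler controls, rather than a framework-generated rejection.
fn parse_usage_query(raw: &str) -> Result<GetUsageQuery, (u16, String)> {
    let mut period = "day".to_string(); // assumed default
    let mut limit = 10u32; // assumed default
    for pair in raw.split('&').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| (400, format!("malformed query pair: {pair}")))?;
        match key {
            "period" => match value {
                "hour" | "day" | "month" => period = value.to_string(),
                other => return Err((400, format!("invalid period: {other}"))),
            },
            "limit" => {
                limit = value
                    .parse()
                    .map_err(|_| (400, format!("invalid limit: {value}")))?
            }
            other => return Err((400, format!("unknown parameter: {other}"))),
        }
    }
    Ok(GetUsageQuery { period, limit })
}
```

In the real handler the same mapping happens by matching on the `Query<Result<_, QueryRejection>>` extractor instead of hand-parsing.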
In `@clients/rook/src/admin/types.rs`:
- Around line 582-610: Add a non-empty grouped entry to the test to assert the
flattened group shape: construct a UsageGroupView in the
UsageSummaryView.by_model (e.g., key "gpt-4o" with an aggregate
UsageAggregateView having requests = 1) and serialize to json; then add
assertions that json["by_model"][0]["key"] == "gpt-4o" and
json["by_model"][0]["requests"] == 1 to verify that UsageGroupView (with
#[serde(flatten)]) flattens key and aggregate fields at the same level.
- Around line 261-266: The UsageSummaryWindowView struct currently types the
since and until fields as String which permits invalid timestamps; change the
types of UsageSummaryWindowView::since and ::until from String to
chrono::DateTime<chrono::Utc> and remove the .to_rfc3339() conversions at call
sites (e.g., in the handler that builds UsageSummaryWindowView) so you pass
DateTime<Utc> directly; rely on chrono's serde feature to produce RFC3339 on
serialization and update any imports/uses referencing these fields accordingly.
In `@clients/rook/src/db/usage.rs`:
- Around line 157-174: The query currently interpolates a raw &str `column` into
the SQL (in the format! block) creating a SQL injection risk; replace that with
a small allowlisted enum (e.g. Column { ProjectId, OrganizationId, Model, ... })
and change the function signature (the function that builds this SQL — e.g.
summarize_usage) to accept this enum instead of &str; map the enum to the exact
identifier string via a match (returning safe literals like "project_id",
"organization_id", etc.) and use that matched literal in the format! call,
leaving all other values as SQL parameters bound via sqlx::query_as, then update
all call sites to pass the enum variant. Ensure no user-controlled string is
ever interpolated directly into the SQL.
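The allowlist pattern can be sketched in a few lines of std-only Rust; the enum variants, column names, and query shape below are illustrative, not the module's actual code:

```rust
// Hypothetical allowlist of grouping columns: only these variants can ever
// reach the SQL text, so no caller-supplied string is interpolated.
#[derive(Clone, Copy)]
enum GroupColumn {
    LogicalModel,
    Vendor,
    Outcome,
    AccountId,
}

impl GroupColumn {
    // Each variant maps to a fixed identifier literal.
    fn as_sql(self) -> &'static str {
        match self {
            GroupColumn::LogicalModel => "logical_model",
            GroupColumn::Vendor => "vendor",
            GroupColumn::Outcome => "outcome",
            GroupColumn::AccountId => "account_id",
        }
    }
}

// Sketch of the query builder: the grouping column comes from the enum,
// the time bounds stay as bound parameters (?1, ?2), and `limit` is a plain
// integer, so none of them can carry SQL text.
fn group_query(column: GroupColumn, limit: u32) -> String {
    format!(
        "SELECT {col}, COUNT(*) AS requests, COALESCE(SUM(total_tokens), 0) AS total_tokens \
         FROM usage_events WHERE occurred_at >= ?1 AND occurred_at < ?2 \
         GROUP BY {col} ORDER BY requests DESC LIMIT {limit}",
        col = column.as_sql(),
    )
}
```

Call sites then pass `GroupColumn::Vendor` rather than `"vendor"`, and an attacker has no string to inject through.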
- Around line 81-95: The three bindings for event.prompt_tokens,
event.completion_tokens, and event.total_tokens silently convert overflowed u64
-> i64 to NULL via .and_then(|v| i64::try_from(v).ok()); change them to
propagate the conversion error instead of swallowing it: replace each
.and_then(... .ok()) with code that attempts i64::try_from(value) and returns
Err on failure (e.g., i64::try_from(value).map_err(|e| /* wrap into the
function's error type */)?), so the calling function (the usage persistence
routine) observes and returns the conversion error rather than persisting NULL;
use the same error-wrapping convention used elsewhere in this module (see how
latency_ms is handled) when mapping the conversion error.
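A std-only sketch of the suggested conversion — the helper name and the plain `String` error type are placeholders for the module's real error convention:

```rust
// Hypothetical helper mirroring the suggested fix: a u64 token count that
// does not fit in a signed 64-bit column becomes an error the caller must
// handle, instead of being silently persisted as NULL.
fn token_count_to_db(field: &str, value: Option<u64>) -> Result<Option<i64>, String> {
    value
        .map(|v| {
            i64::try_from(v)
                .map_err(|_| format!("{field} value {v} does not fit in an i64 column"))
        })
        .transpose()
}
```

`None` stays `None` (the field genuinely wasn't reported), while overflow now surfaces as an error rather than a lost data point.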
In `@clients/rook/src/gateway/handlers.rs`:
- Around line 135-161: spawn_usage_record currently fire-and-forgets the ledger
write (tokio::spawn calling usage.record(event).await) which can lose writes on
crash; change it to perform the persistence inline or via a bounded background
worker that supports flush-on-shutdown: either await usage.record(event).await
directly inside spawn_usage_record and return/propagate errors, or push the
StoredUsageEvent into an existing bounded sender and ensure the worker drains
and awaits pending writes during application shutdown; update references to
spawn_usage_record, usage.record, and any tokio::spawn usage so the record is
not dropped silently (log errors as before but ensure the call completes or is
flushed on shutdown).
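The bounded-worker-with-flush-on-shutdown option can be illustrated with std threads and a `sync_channel`; the real service would use a bounded async channel and `await` the writes, and `UsageEvent` here is a stand-in for `StoredUsageEvent`:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for StoredUsageEvent.
struct UsageEvent {
    logical_model: String,
}

// Bounded background worker: producers send through a sync_channel, so a
// full queue applies backpressure instead of growing without bound.
// Dropping the last sender is the shutdown signal; the worker keeps
// draining until the queue is empty, so join() doubles as flush-on-shutdown.
fn start_usage_worker(
    capacity: usize,
) -> (mpsc::SyncSender<UsageEvent>, thread::JoinHandle<usize>) {
    let (tx, rx) = mpsc::sync_channel::<UsageEvent>(capacity);
    let handle = thread::spawn(move || {
        let mut persisted = 0usize;
        // recv() only errors once all senders are gone AND the buffer is
        // drained, so no accepted event is dropped at shutdown.
        while let Ok(event) = rx.recv() {
            // The real service would persist here (usage.record(event)).
            let _ = event.logical_model;
            persisted += 1;
        }
        persisted
    });
    (tx, handle)
}
```

Joining the worker during application shutdown gives the "flush pending writes" guarantee the comment asks for, without blocking the request path on each write.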
- Around line 341-353: The current code calls spawn_usage_record(...) with
stream: true and outcome "success" immediately when starting the SSE, causing
truncated/aborted streams to be misreported; instead, wrap the response stream
so that you only call spawn_usage_record(...) after the stream terminally
completes or errors (use the stream's completion/finalization handler), passing
the same UsageRecordInput fields (started_at, logical_model:
request.model.clone(), metric_context.clone_static(), account_id, tokens, etc.)
but set stream: true and outcome to "success" on clean completion or to an error
outcome/status and proper token counts on failure; replace the immediate call at
the start with deferred calls inside the stream finalizer (referencing
spawn_usage_record, UsageRecordInput, started_at, logical_model, metric_context,
account_id, and TokenUsageParts::none()) so ledger events reflect actual stream
termination rather than stream open.
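The deferred-recording idea — emit the ledger event only at terminal stream state — has a compact synchronous analogue: a wrapper whose finalizer runs on drop and reports success only if the inner sequence was exhausted. This is a sketch, not the actual fix (which would wrap the async SSE body stream), and the outcome strings are illustrative:

```rust
// Synchronous sketch of deferred usage recording: the callback fires exactly
// once, when the wrapped sequence reaches a terminal state, not when it opens.
struct RecordOnFinish<I, F: FnMut(&str)> {
    inner: I,
    completed: bool,
    record: F,
}

impl<I: Iterator, F: FnMut(&str)> Iterator for RecordOnFinish<I, F> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        let item = self.inner.next();
        if item.is_none() {
            // Clean exhaustion: the stream finished on its own.
            self.completed = true;
        }
        item
    }
}

impl<I, F: FnMut(&str)> Drop for RecordOnFinish<I, F> {
    fn drop(&mut self) {
        // The single ledger write happens at terminal state: "success" on
        // clean completion, "aborted" if the consumer dropped the stream early.
        let outcome = if self.completed { "success" } else { "aborted" };
        (self.record)(outcome);
    }
}
```

With this shape, a client that disconnects mid-stream produces an aborted record instead of the misleading "success" logged at stream open.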
In `@clients/rook/src/transport/rate_limit.rs`:
- Line 123: The prune closure is using
now.duration_since(state.window_started_at) which can panic if a newer
window_started_at is inserted before the async mutex is acquired; update the
comparison in pruning() (and any similar uses in check()) to call
now.saturating_duration_since(state.window_started_at) < ttl so late timestamps
won’t cause a panic—ensure you import/use Instant::saturating_duration_since and
replace the duration_since call inside windows.retain(|_, state| ...)
accordingly.
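A minimal illustration of the suggested predicate; `is_expired` is a hypothetical helper, but `Instant::saturating_duration_since` is the real std API and is documented to return a zero duration when the other instant is later:

```rust
use std::time::{Duration, Instant};

// Hypothetical prune predicate: a window stamped "in the future" by a racing
// insert yields Duration::ZERO and is simply kept, so the retain closure
// never has to handle an underflow or panic.
fn is_expired(now: Instant, window_started_at: Instant, ttl: Duration) -> bool {
    now.saturating_duration_since(window_started_at) >= ttl
}
```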
---
Outside diff comments:
In `@openspec/specs/gateway/spec.md`:
- Line 21: Update the Purpose section's wording that currently reads
"placeholder usage endpoint" to reflect R25 by describing the endpoint as a
real, persisted usage/accounting endpoint (e.g., mention that GET /api/usage
returns real persisted accounting data per R25) and remove any implication that
it is a stub; ensure the updated sentence references GET /api/usage and R25 so
readers know the endpoint is implemented and returns persisted data.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 32ace91b-6833-46a1-9b14-85669029de73
📒 Files selected for processing (14)
- clients/rook/migrations/0007_usage_events.sql
- clients/rook/src/admin/handlers.rs
- clients/rook/src/admin/mod.rs
- clients/rook/src/admin/types.rs
- clients/rook/src/db/mod.rs
- clients/rook/src/db/usage.rs
- clients/rook/src/gateway/handlers.rs
- clients/rook/src/registry/mod.rs
- clients/rook/src/server/mod.rs
- clients/rook/src/services/mod.rs
- clients/rook/src/services/usage.rs
- clients/rook/src/transport/rate_limit.rs
- openspec/specs/gateway/rook-acceptance-regression-matrix.md
- openspec/specs/gateway/spec.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: sonar
- GitHub Check: pr-checks
- GitHub Check: semgrep-cloud-platform/scan
- GitHub Check: submit-gradle
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
- clients/rook/src/services/mod.rs
- clients/rook/src/transport/rate_limit.rs
- clients/rook/src/services/usage.rs
- clients/rook/src/admin/handlers.rs
- clients/rook/src/db/mod.rs
- clients/rook/src/db/usage.rs
- clients/rook/src/admin/types.rs
- clients/rook/src/server/mod.rs
- clients/rook/src/gateway/handlers.rs
- clients/rook/src/admin/mod.rs
- clients/rook/src/registry/mod.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
- clients/rook/src/services/mod.rs
- clients/rook/src/transport/rate_limit.rs
- clients/rook/src/services/usage.rs
- clients/rook/migrations/0007_usage_events.sql
- clients/rook/src/admin/handlers.rs
- clients/rook/src/db/mod.rs
- openspec/specs/gateway/spec.md
- clients/rook/src/db/usage.rs
- clients/rook/src/admin/types.rs
- clients/rook/src/server/mod.rs
- clients/rook/src/gateway/handlers.rs
- clients/rook/src/admin/mod.rs
- clients/rook/src/registry/mod.rs
- openspec/specs/gateway/rook-acceptance-regression-matrix.md
**/*.{md,mdx}
⚙️ CodeRabbit configuration file
**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.
Files:
- openspec/specs/gateway/spec.md
- openspec/specs/gateway/rook-acceptance-regression-matrix.md
🔇 Additional comments (8)
openspec/specs/gateway/rook-acceptance-regression-matrix.md (1)
49-49: LGTM — traceability updates are accurate and consistent with the implementation.

The `#594` caveat, `#599` evidence command list, and the Implemented status for usage accounting all align with the PR's delivered artifacts and caveats (cost deferred, streaming token fields unknown).

Also applies to: 69-69, 75-75
openspec/specs/gateway/spec.md (1)
1137-1181: R25 and the `UsageSummaryView` contract are technically accurate and implementation-aligned.

- Redaction rules, bounded metadata, and `route_rejected`/`unrouted` semantics match the persistence layer.
- The `estimated_cost_usd: null` rule aligns with `Option<f64>` always being `None` today.
- The group array contract ("same aggregate fields as totals") correctly describes the `#[serde(flatten)]` output from `UsageGroupView`.

Also applies to: 3053-3090
clients/rook/src/admin/types.rs (2)
252-259: `UsageSummaryPeriod` derivations and `#[serde(rename_all)]` are correct.

`"hour"`/`"day"`/`"month"` snake_case output matches the spec, `#[default]` on `Day` satisfies `Default` for query-parameter defaulting, and both `Serialize`/`Deserialize` are needed since the type is used as both a query input and a response field.

268-296: Aggregate, group, and summary view types are correctly defined.

`UsageAggregateView` correctly derives only `PartialEq` (not `Eq`) because `f64` doesn't implement `Eq`. `UsageGroupView` with `#[serde(flatten)]` on `aggregate` produces `{ "key": "...", "requests": ..., ... }`, matching the spec's "key + same aggregate fields as totals" contract. No field name collision exists since `UsageAggregateView` has no `key` field. `estimated_cost_usd: Option<f64>` serializes as `null` by default (no `skip_serializing_if`), which is the correct behavior per spec.

clients/rook/src/services/mod.rs (1)

13-13: LGTM — follows the established module export pattern.

clients/rook/src/db/usage.rs (3)

5-60: LGTM — struct definitions match the schema.

Field types, optionality, and naming all align precisely with the `usage_events` schema defined in `0007_usage_events.sql`.

122-148: LGTM — parameterized query with proper error mapping.

260-329: LGTM — test coverage for the core accounting paths.

Both time-window filtering and limit-clamped group aggregation are covered. Token arithmetic in the `event()` helper correctly exercises the `COALESCE(SUM(...))` paths.
Addressed the reviewed findings that apply to the current code. Summary:
Validation:
Related Issues
Fixes #685
Summary
Implements real Rook gateway usage accounting in place of the previous `/api/usage` placeholder.

- Adds a persistent `usage_events` ledger and usage service wiring through `RookRegistry`.
- Records usage events for `/v1/chat/completions`, including invalid requests, unrouted requests, upstream failures, buffered successes, and streaming successes.

Tested Information
Ran the following validation locally:
All passed.
Pre-commit and pre-push hooks also passed, including the offline doc link check.
Documentation Impact
- openspec/specs/gateway/spec.md
- openspec/specs/gateway/rook-acceptance-regression-matrix.md

Breaking Changes
No breaking changes expected.
`/api/usage` now returns the documented real usage summary instead of the former placeholder response; this is the intended feature change for #685.

Checklist