feat: add per-request logs for proxy and MITM traffic #98
Conversation
Capture proxy and transparent-MITM requests into a per-vault request
log, surfaced through a new Logs tab in the dashboard and a
`GET /v1/vaults/{name}/logs` API. The MITM and explicit-proxy paths
share a single `requestlog.Sink`, so both ingress paths feed the same
pipeline with consistent fields (actor, matched service, credential
keys injected, status, latency).
- Store: new `request_logs` table + migration 039, CRUD in sqlite.go,
methods added to the Store interface.
- Pipeline: `internal/requestlog` package with Sink + SQLite backend
and a background retention worker that trims by age and per-vault
row cap.
- Server: `AttachLogSink` wires the sink at startup; the proxy and
MITM handlers fan events into it on every request.
- API: `GET /v1/vaults/{name}/logs` returns paginated logs for a vault
(vault-member auth); skill docs (CLI + HTTP) updated to match.
- UI: Logs tab under each vault with filtering and latency/status
rendering; AllAgentsTab updated to route through the new layout.
- Ops: new env vars + CLI flags for retention knobs documented in
.env.example, docs/reference/cli.mdx, and
docs/self-hosting/environment-variables.mdx.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every entry in the `where` slice is a static predicate string; all user input flows through `args` as `?` placeholders. Matches the existing `#nosec G202` pattern at sqlite.go:2684 for the same safe-concat shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```go
		Path:           r.Path,
		MatchedService: r.MatchedService,
		CredentialKeys: r.CredentialKeys,
		Status:         r.Status,
		LatencyMs:      r.LatencyMs,
		ErrorCode:      r.ErrorCode,
		ActorType:      r.ActorType,
		ActorID:        r.ActorID,
		CreatedAt:      r.CreatedAt.Format(time.RFC3339),
	}
}
```

```go
	// When the caller is tailing (passed after=) and no new rows exist, we
	// echo back their cursor so the next poll stays on the same high-water
	// mark without a separate round-trip to learn "still nothing new".
	if latestID == 0 {
		latestID = opts.After
	}
```

```go
	// next_cursor is the id to pass as `before` on the next page going back
	// in time. Nil when we don't have a full page (end of history) or when
	// the caller is tailing forward (`after` set).
	var nextCursor any
	if opts.After == 0 && len(rows) == limit {
		nextCursor = rows[len(rows)-1].ID
	}

	jsonOK(w, map[string]any{
		"logs":        items,
		"next_cursor": nextCursor,
```
🔴 The live-tail polling query always uses ORDER BY id DESC LIMIT even when an after= cursor is set, so during any burst inserting more than the page limit (default 50) rows between polls, only the 50 newest are returned and latestID is set from rows[0].ID (the highest). The frontend advances latestIdRef.current to that value, permanently skipping the older burst rows — they remain in the DB but never appear in live tail. Fix by using ORDER BY id ASC LIMIT for after= queries and reading latestID from the last row.
Extended reasoning...
Root cause: In internal/store/sqlite.go, ListRequestLogs always appends ORDER BY id DESC LIMIT ? to the query regardless of whether opts.After is set. For the tailing case the intent is to fetch rows newer than the cursor, but descending order means the LIMIT is applied starting from the newest matching row.
Concrete code path: During a tailing poll the frontend calls GET /v1/vaults/{name}/logs?after=100&limit=50. SQLite executes WHERE id > 100 ORDER BY id DESC LIMIT 50. If 200 new rows arrived (IDs 101-300), the query returns IDs 300, 299, ..., 251. In handle_logs.go lines 92-93, latestID is set from rows[0].ID which is 300. The response carries latest_id: 300.
Why existing code does not prevent this: The frontend guard in LogsView.tsx line 123 (if data.latest_id > latestIdRef.current) prevents rollback but unconditionally accepts a forward jump. After receiving latest_id: 300 it updates latestIdRef.current to 300 and the next poll sends after=300. Records 101-250 are now permanently unreachable from live tail - no catch-up or gap-detection mechanism exists.
Step-by-step proof:
- State: latestIdRef.current = 100; 200 new rows inserted (IDs 101-300) during the 3s poll interval.
- Poll fires: GET ...?after=100&limit=50 -> SQL: WHERE id > 100 ORDER BY id DESC LIMIT 50 -> returns rows 300, 299, ..., 251.
- latestID = rows[0].ID = 300 (handle_logs.go line 92).
- Response: {logs: [300..251], latest_id: 300}.
- Frontend prepends rows 300-251; guard advances cursor to 300.
- Next poll: after=300 - rows 101-250 are permanently skipped from live tail.
Impact: Under any traffic rate exceeding limit/poll_interval (default ~16 req/s), some fraction of live-tailed logs will silently vanish from the UI on every burst. Records remain in the DB and are accessible via load-more (historical paging with before=), but the real-time tailing view is lossy.
Correct fix: When opts.After > 0, use ORDER BY id ASC LIMIT ? so the cursor advances incrementally through the burst. In handle_logs.go, read latestID from rows[len(rows)-1].ID (last/highest in ASC order) instead of rows[0].ID. A burst of 200 rows is then delivered in at most ceil(200/50)=4 consecutive polls with no gaps.
```go
func (s *Server) handleProxy(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	event := brokercore.ProxyEvent{
		Ingress: brokercore.IngressExplicit, // was: Ingress: "explicit",
		Method:  r.Method,
		Path:    r.URL.Path,
	}
	// Captured as the handler resolves the request; Emit hands off the
	// terminal snapshot to both the slog line and the log sink.
	var (
		logVaultID   string
		logActorType string
		logActorID   string
	)
	emit := func(status int, errCode string) {
		event.Emit(s.logger, start, status, errCode)
		if s.logSink != nil {
			s.logSink.Record(r.Context(), requestlog.FromEvent(event, logVaultID, logActorType, logActorID))
		}
	}

	// 1. Parse target host and path from /proxy/{target_host}/{path...}
```
🔴 In handleProxy, logVaultID is declared as an empty string and only assigned after vault resolution, but emit() is called in five early-exit paths before that assignment, producing Record{VaultID: ""} entries. When SQLiteSink flushes its 250ms batch, InsertRequestLogs wraps all rows in a single transaction; the FK constraint vault_id REFERENCES vaults(id) with PRAGMA foreign_keys=ON rejects the empty-string row, the deferred tx.Rollback() fires, and every valid record that landed in the same batch window is silently dropped. Fix: guard the logSink.Record() call with logVaultID != "".
Extended reasoning...
Root cause — declaration vs. assignment ordering
In handleProxy (internal/server/handle_proxy.go), logVaultID is declared as var logVaultID string (zero value = "") at lines 118-122 of the new code, and is only assigned at line 170 after resolveVaultForSession returns successfully. The emit closure captures logVaultID by reference and calls s.logSink.Record(..., requestlog.FromEvent(event, logVaultID, ...)). Any emit() invocation that fires before line 170 therefore passes logVaultID = "".
Five early-exit paths trigger this
Looking at the added code in handleProxy:
- Line 133 - empty trimmed path - emit(400, bad_request)
- Line 140 - empty targetHost - emit(400, bad_request)
- Line 152 - invalid host - emit(400, bad_request)
- Line 162 - sess == nil - emit(403, forbidden)
- Line 167 - resolveVaultForSession error - emit(0, vault_error)
Each produces a requestlog.Record with VaultID = "" that is enqueued into the SQLiteSink channel.
Why the FK constraint fires
Migration 039 defines: vault_id TEXT NOT NULL REFERENCES vaults(id) ON DELETE CASCADE. The SQLite DSN is opened with _pragma=foreign_keys(on) (see internal/store/sqlite.go). An empty string is not NULL so it passes the NOT NULL check, but it violates the FK constraint because no vault with id = "" exists.
Why the entire batch is lost
InsertRequestLogs (internal/store/sqlite.go) begins a transaction, prepares a single INSERT statement, and loops over all rows. If any stmt.ExecContext returns a FK error, the function returns immediately, triggering the defer tx.Rollback(). The entire batch, including all valid records that happened to be flushed in the same 250ms window alongside the error-path record, is silently discarded with no retry.
Concrete proof-of-concept
Suppose at T=0ms a bad-URL request arrives (emit(400, bad_request) fires with logVaultID = ""). At T=10ms ten legitimate proxy requests from vault abc-123 succeed and their records land in the channel. At T=250ms the ticker fires, commit is called with a batch of 11 records. Row 1 has vault_id = "". SQLite rejects it. tx.Rollback() fires. All 11 rows are gone. The commit error is logged at ERROR level, but the valid records are permanently lost with no indication of which vaults were affected.
Fix
The simplest one-line fix is to guard the Record call inside emit:
```go
if s.logSink != nil && logVaultID != "" {
	s.logSink.Record(r.Context(), requestlog.FromEvent(event, logVaultID, logActorType, logActorID))
}
```
Error-path requests before vault resolution are inherently un-attributable to a vault, so skipping them is semantically correct. Alternatively, InsertRequestLogs could skip rows where vault_id == "" rather than failing the whole batch, which would add defence-in-depth but does not fix the root cause.
```tsx
    latestIdRef.current = 0;
    setRows([]);
    setNextCursor(null);
    setError("");
    setLoading(true);
    loadInitial();
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [endpoint, filterQS]);

  // Poll for new rows (tailing).
  useEffect(() => {
    if (pollMs <= 0 || paused) return;
    const id = setInterval(() => {
      pollNew();
    }, pollMs);
    return () => clearInterval(id);
```
🔴 The pollNew() guard if (latestIdRef.current === 0) return permanently disables live tailing for any vault that has no logs when the Logs tab is opened. Because the server echoes latest_id: 0 for an empty vault and the client sets latestIdRef.current = data.latest_id || 0, the ref never advances past 0, so every subsequent interval tick returns immediately — the green "Live" badge pulses but no polls are ever sent. The same stuck state is reproduced whenever a filter change produces zero results, because the filter useEffect resets the ref to 0 before calling loadInitial(). Fix: replace the numeric guard with a separate boolean initializedRef that is set to true only after loadInitial() has completed.
Extended reasoning...
What the bug is and how it manifests
pollNew() in web/src/components/LogsView.tsx contains the guard if (latestIdRef.current === 0) return. The intent is to block polls before loadInitial() has run. However the ref is also the live-tail cursor: it is set from data.latest_id after each initial load. When a vault is empty, the server returns latest_id: 0, which causes the ref to remain 0 indefinitely. Every 3-second interval tick fires pollNew(), hits the guard, and exits without issuing any network request. The UI shows a pulsing green "Live" badge, misleadingly suggesting that tailing is active.
The specific code path that triggers it
- Component mounts; `latestIdRef.current = 0` (initial value, `useRef<number>(0)`).
- `loadInitial()` fetches `/v1/vaults/{name}/logs?limit=50` with no `after=` param, so `opts.After = 0` on the server.
- Server (`handle_logs.go` ~line 115): vault has no rows so `latestID` stays 0; the fallback `latestID = opts.After` also gives 0; response: `{"logs":[], "latest_id":0}`.
- Client sets `latestIdRef.current = data.latest_id || 0`, i.e. 0.
- `setInterval` fires every `pollMs` (default 3000 ms) and calls `pollNew()`.
- `pollNew()` ~line 117: `if (latestIdRef.current === 0) return` exits immediately every time. No HTTP request is issued.
No recovery path exists: unless the page is reloaded after at least one log row exists, the ref stays 0 permanently.
Why existing code does not prevent it
The guard was added to avoid sending poll requests before loadInitial() completes (which would fire after=0 queries that return the full page on every tick and prepend duplicate rows). The problem is that the guard conflates two distinct states: "initial load not yet done" (latestIdRef transiently 0) vs. "vault is empty" (latestIdRef permanently 0). Using the cursor value as a readiness sentinel is incorrect for any vault whose legitimate cursor stays 0.
The filter-reset reproducer
The filter useEffect explicitly resets latestIdRef.current = 0 before calling loadInitial(). If the new filter returns zero matching rows (e.g. switching to "Errors" on a vault with no errors), the cursor is reset to 0 and the same stuck state is reproduced — live tailing stops until the page is reloaded.
Step-by-step proof
- Open a brand-new vault that has received no proxied requests.
- Navigate to the Logs tab → `loadInitial()` returns `{logs:[], latest_id:0}` → `latestIdRef.current = 0`.
- Wait 3+ seconds and observe the browser network tab — no `/logs?after=0` request is ever issued.
- Send a proxied request through the vault (e.g. via `curl` through the proxy) → a row is inserted in the DB.
- The UI table remains empty because `pollNew()` is still short-circuiting.
- Reload the page → `loadInitial()` now returns `{logs:[{id:1,...}], latest_id:1}` → `latestIdRef.current = 1` → subsequent polls work correctly.
Impact
The live-tailing feature — a headline capability of this PR — is silently non-functional for the most common scenario: opening the Logs tab on any new or idle vault. Users see the green pulsing "Live" indicator and may incorrectly conclude there is simply no traffic.
How to fix
Introduce a separate `initializedRef = useRef(false)` boolean. Set it to `true` inside `loadInitial()` once the response is processed. Change the `pollNew()` guard to `if (!initializedRef.current) return`. Reset `initializedRef.current = false` in the filter `useEffect` alongside the existing resets. This correctly separates "not yet initialized" from "vault is empty with a valid cursor of 0".
Three bugs flagged on #98, all fixed:

1. Live-tail burst loss (sqlite.go, handle_logs.go): `ListRequestLogs` always used `ORDER BY id DESC LIMIT`, so a burst of more than page_size rows between polls would return the newest N and silently skip the older ones. Tail queries (`After > 0`) now use `ORDER BY id ASC`; the handler reverses before responding so the API contract stays newest-first. Regression test in sqlite_test.go pins the ordering.
2. FK-violation batch rollback (handle_proxy.go): Early-exit paths (missing host, invalid target, no session, vault_error) called `emit()` before `logVaultID` was set, producing `Record{VaultID: ""}` entries. `vault_id` has a FK constraint — the empty-string row rejects the whole 250ms SQLiteSink batch, discarding every valid record that rode with it. Guard the `Record()` call with `logVaultID != ""` so un-attributable errors stay in slog only.
3. Live-tail stuck on empty vault (LogsView.tsx): `pollNew`'s `if (latestIdRef.current === 0) return` conflated "not yet initialized" with "vault legitimately has cursor 0", so opening the Logs tab on an empty vault permanently disabled polling. The same stuck state reproduced after any filter change that returns zero rows. Replaced with a separate `initializedRef` boolean, cleared in the filter reset and set in `loadInitial` on success.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three review findings addressed in f242fd1.

1. Live-tail burst loss (handle_logs.go:129 + sqlite.go)
2. FK-violation batch rollback (handle_proxy.go:129)
3. Live-tail stuck on empty vault (LogsView.tsx:85)

`go test ./...` and `golangci-lint run ./...` both green.
Resolved conflict in internal/server/server.go: #97 (merged) dropped the ipAuth wrapper from /v1/service-catalog, /v1/skills/{cli,http}, and /v1/mitm/ca.pem. Kept that change and preserved this branch's new /v1/vaults/{name}/logs route alongside it. Also applied De Morgan's to two QF1001 staticcheck findings in the new TestListRequestLogsTailOrdering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Capture proxy and transparent-MITM requests into a per-vault request log, surfaced through a new Logs tab and `GET /v1/vaults/{name}/logs`.
- The MITM and explicit-proxy paths share a single `requestlog.Sink`, so both ingress paths feed the same pipeline with consistent fields.

Changes
Store
- New `request_logs` table + two indexes.
- Methods added to the `Store` interface.

Pipeline
- `Sink` interface, SQLite-backed implementation, retention worker.
- `AttachLogSink` on the Server wires the sink at startup; the explicit-proxy and MITM handlers fan events into it.

API + docs
- `GET /v1/vaults/{name}/logs` (vault-member auth) returns paginated logs.

UI
- `AllAgentsTab` and `router.tsx` updated to integrate the new layout.

Test plan
- `go test ./...` green
- `/proxy/*` and MITM `HTTPS_PROXY` flows: confirm rows land in the Logs tab with correct actor / status / latency / credential-keys fields
- `GET /v1/vaults/{name}/logs`

Notes
- Based on `main`. Does not depend on #97 (fix: exempt loopback from TierAuth so local vault run doesn't get 429'd — rate-limit loopback fix) — they touch disjoint hunks.
- `make build` will re-embed `web/dist` into the binary.

🤖 Generated with Claude Code