feat: add tiered rate limiting with owner-tunable settings #94

dangtony98 merged 2 commits into main
Conversation
Introduce internal/ratelimit — a tiered in-memory limiter with four
public tiers plus an internal failure counter:
- AUTH: unauthenticated surface (login, register, forgot/reset,
verify, OAuth, invite/approval-token redemption, MITM CONNECT).
Sliding-window keyed on client IP; login additionally keys on the
normalized email (reject if either bucket is exhausted).
- PROXY: /proxy/* and MITM forward, keyed on (actor, vault). Token
bucket for smooth traffic + per-scope concurrency semaphore so
slow upstream calls can't pile up on one scope.
- AUTHED: everything behind requireAuth. Per-actor token bucket.
- GLOBAL: server-wide RPS ceiling + total in-flight cap applied at
the outer mux.
- VERIFY_FAIL (internal): failure counter for verification and
password-reset codes; not surfaced in the UI.
Both ingress paths share one Registry so switching ingress can't
bypass PROXY limits. The registry reloads on settings writes, no
restart required.
Operator controls:
- AGENT_VAULT_RATELIMIT_PROFILE={default,strict,loose,off}
- AGENT_VAULT_RATELIMIT_LOCK=true pins settings; UI becomes read-only
- AGENT_VAULT_RATELIMIT_<TIER>_<KNOB> per-tier env overrides
- Manage Instance → Settings → Rate Limiting pane: profile dropdown
+ advanced per-tier overrides with debounced live preview and
per-tier source labels (env / override / default)
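As a rough illustration, an operator's env file might combine these controls like so. The values are illustrative; only `AUTH_MAX` is confirmed by the test plan in this PR — other tier/knob names follow the `<TIER>_<KNOB>` pattern and are assumptions here.

```shell
# Pin a profile and lock the UI pane read-only.
AGENT_VAULT_RATELIMIT_PROFILE=strict
AGENT_VAULT_RATELIMIT_LOCK=true

# Per-tier override (env always beats UI). AUTH_MAX appears in the
# PR's test plan; the value 3 is an example.
AGENT_VAULT_RATELIMIT_AUTH_MAX=3
```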
Adjacent hardening bundled because they enable the rate-limit
design:
- MITM inner http.Server grew ReadTimeout/WriteTimeout/IdleTimeout
(slow-loris defense — without these a drip-feed tunnel could pin
a PROXY concurrency slot).
- /proxy and MITM forward cap forwarded bodies at 64 MB
(brokercore.MaxProxyBodyBytes).
- CONNECT-flood limiter runs before ParseProxyAuth so a bad-auth
flood can't burn CPU.
Docs (.env.example, docs/self-hosting/environment-variables.mdx,
docs/reference/cli.mdx, README.md) and both agent skill files
updated for 429/Retry-After semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tokenBucketMap.size and semaphore.capacity had no callers; CI's unused check failed on them. The analogous slidingWindow.size stays (used in tests) and semaphore.holders stays (used by keyedSemaphore eviction). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```go
ipDecision := s.rateLimit.Allow(ratelimit.TierAuth, "ip:"+ip)
emailDecision := s.rateLimit.Allow(ratelimit.TierAuth, "email:"+strings.ToLower(strings.TrimSpace(req.Email)))
if !ipDecision.Allow || !emailDecision.Allow {
	d := ipDecision
	if !emailDecision.Allow {
		d = emailDecision
	}
	ratelimit.WriteDenial(w, d, "Too many login attempts, try again later")
	return
}
```
🔴 The login handler at handle_auth.go:595-596 double-charges the TierAuth IP sliding window on every request — the route is already wrapped in the ipAuth middleware that records one slot, and then handleLogin records a second slot using the identical key, halving the effective per-IP login rate limit to 5 attempts instead of the configured 10. Additionally, both ipDecision and emailDecision are evaluated unconditionally before the denial check, so in the edge case where the middleware allows (bucket at max-1) but the in-handler IP check then finds the bucket full, the email Allow() call still executes and consumes an email-bucket slot. Fix: remove the in-handler IP Allow() call and rely solely on ipAuth middleware for IP enforcement; keep only the email check in-handler since the body must be parsed before the email is known.
**Extended reasoning**
Double-charge of the IP sliding window (bug_004)
The login route is registered as s.requireInitialized(ipAuth(limitBody(s.handleLogin))) (server.go:571), where ipAuth = s.tier(ratelimit.TierAuth, s.ipKeyer()). The HandlerFunc middleware (middleware.go:32) calls r.Allow(TierAuth, "ip:"+clientIP(r)) and records one timestamp in the sliding window before dispatching to the handler. Inside handleLogin, line 595 independently calls s.rateLimit.Allow(ratelimit.TierAuth, "ip:"+ip) with the identical tier and key. Since both calls go to the same slidingWindow under the same mutex, each appends a distinct timestamp. The result: every POST /v1/auth/login request consumes two TierAuth slots for the client IP instead of one, halving the effective rate limit. With the default Auth.Max=10, legitimate users hit the 429 wall after only 5 login attempts.
Step-by-step proof of double-charge:
- Request 1 from 1.2.3.4: the `ipAuth` middleware calls `Allow(TierAuth, "ip:1.2.3.4")` → bucket has 0 entries < 10 → appends timestamp → bucket now has 1 entry → Allow → handler runs → `handleLogin` calls `Allow(TierAuth, "ip:1.2.3.4")` again → sees 1 entry < 10 → appends second timestamp → bucket now has 2 entries.
- After 5 sequential requests: bucket has 10 entries (5 from middleware + 5 from handler).
- Request 6: the `ipAuth` middleware sees 10 entries ≥ 10 → returns Deny/429 → handler never runs.
- Effective limit: 5 attempts, not 10.
This regression was introduced by this PR: the old code used loginIPLimiter (an independent slidingWindowLimiter not shared with any middleware), so there was no double-count. After migration to the shared Registry, the in-handler IP call was not removed when ipAuth was added to the route registration. The comment in middleware.go:84-86 explicitly references "login's email bucket" as the intended in-handler check, implying the email-only check was the design intent.
Unconditional email Allow() evaluation and the refutation of the extreme DoS scenario (bug_003)
The original bug report describes an attack where an attacker with an already-exhausted IP bucket continues sending requests and drains the victim's email bucket. One verifier correctly refutes this specific scenario: when the IP bucket is fully exhausted, the ipAuth middleware itself returns 429 before calling next, so handleLogin never executes and the email Allow() at line 596 is never reached. This gate is effective and the refutation is correct for the sequential, single-IP attack vector described.
However, the unconditional evaluation of both ipDecision and emailDecision before the combined denial check does create a real (if narrower) problem in the edge-case where the middleware allows a request that the handler's in-handler IP check then denies. This can occur when the sliding window has partially expired entries between requests, leaving the count at an odd value (e.g., 9): the middleware sees 9 < 10, records timestamp #10, allows the request; the handler's ipDecision then sees 10 ≥ 10 and denies — but emailDecision at line 596 still runs unconditionally and records an attempt in the victim's email bucket. The old short-circuit evaluation (if !loginIPLimiter.allow(ip) || !loginEmailLimiter.allow(req.Email)) prevented this by never evaluating the email check when the IP check failed.
Root cause and fix
The root cause for both issues is the redundant in-handler Allow(TierAuth, "ip:"+ip) call at handle_auth.go:595. Removing it and relying solely on the ipAuth middleware for IP enforcement (as was the design intent) fixes the double-charge and removes the unconditional evaluation risk. The email check at line 596 must remain in the handler because the request body must be decoded to obtain the email address — the middleware cannot do this before dispatching. The fix is minimal: delete the ipDecision variable and its Allow call, and simplify the denial check to only test emailDecision.
```go
func (r *Registry) Reload(newCfg Config) {
	old := r.cfg.Load()
	r.cfg.Store(&newCfg)
	r.mu.Lock()
	defer r.mu.Unlock()
	for t := Tier(0); t < tierCount; t++ {
		if r.sliding[t] != nil {
			r.sliding[t].reconfigure(newCfg.Tiers[t])
		}
		if r.buckets[t] != nil {
			r.buckets[t].reconfigure(newCfg.Tiers[t])
		}
		if r.kSemaphores[t] != nil {
			r.kSemaphores[t].reconfigure(newCfg.Tiers[t])
		} else if t == TierProxy && newCfg.Tiers[t].Concurrency > 0 {
			// Boot config had Concurrency=0 so no sem was allocated;
			// a UI / env change re-enabled it — build it now.
			r.kSemaphores[t] = newKeyedSemaphore(newCfg.Tiers[t], 2*time.Second)
		}
		if r.failures[t] != nil {
			r.failures[t].reconfigure(newCfg.Tiers[t])
		}
	}
	// Rebuild global sem if capacity changed.
	if old == nil || newCfg.Tiers[TierGlobal].Concurrency != old.Tiers[TierGlobal].Concurrency {
		r.globalSem = newSemaphore(newCfg.Tiers[TierGlobal].Concurrency, 500*time.Millisecond)
	}
	if r.globalRate != nil {
		r.globalRate.reconfigure(TierConfig{
			Rate: newCfg.Tiers[TierGlobal].Rate,
```
🔴 When the server starts with AGENT_VAULT_RATELIMIT_PROFILE=off (or any config with cfg.Off=true), Registry.build() returns early at line 83-85, leaving all limiter fields (sliding[], buckets[], failures[], globalRate) as nil. If an admin later switches the profile to default via the UI, Reload() is called with Off=false, but it only reconfigure()s limiters that are already non-nil — it has no creation paths for sliding[t], buckets[t], failures[t], or globalRate. After the reload, r.cfg.Load().Off is false so the early-return guards in Allow(), AllowGlobalRPS(), and FailureCheck() are bypassed, but each method immediately finds nil and falls through to AllowOK(0,0), silently disabling all rate limiting.
**Extended reasoning**
The bug in detail
When New(cfg) is called with cfg.Off=true, build() (registry.go lines 83-85) returns immediately without initializing any limiter. All of r.sliding[], r.buckets[], r.failures[], r.globalRate, r.globalSem, and r.kSemaphores[] remain at their zero values (nil).
The failing Off→On transition
When an instance owner changes the profile from off to default via the UI, handleUpdateRateLimitSetting calls applyRateLimitSettingToRegistry, which calls r.rateLimit.Reload(cfg) with Off=false. Inside Reload() (lines 50-67), the loop only calls reconfigure() on non-nil limiters. The only creation path that exists is for kSemaphores[TierProxy] (the explicit else-if branch at lines 59-62). No creation paths exist for sliding[t], buckets[t], failures[t], or globalRate. The globalSem is rebuilt unconditionally when capacity changes (old=0 != new=512), but that is the only other allocation that fires.
Why it silently fails open
After Reload(), r.cfg.Load().Off is false, so the early-return guards in Allow(), AllowGlobalRPS(), and FailureCheck() are bypassed. Each method then checks its respective field for nil and falls through to AllowOK(0,0) — the same result as Off mode, but without the early return. This means:
- `Allow(TierAuth, key)` returns AllowOK — login floods, registration spams, MITM CONNECT floods unchecked
- `Allow(TierAuthed, key)` returns AllowOK — authenticated CRUD abuse unchecked
- `AllowGlobalRPS()` returns AllowOK (`globalRate` nil) — global RPS ceiling gone
- `FailureCheck(TierVerifyFailure, key)` returns true (`failures[t]` nil) — verification-code brute force unchecked
Step-by-step proof
- Server starts with `AGENT_VAULT_RATELIMIT_PROFILE=off`: `LoadFromEnv()` returns `cfg.Off=true` → `ratelimit.New(cfg)` → `build(cfg)` returns immediately → all per-tier limiter fields are nil.
- Admin opens Manage Instance → Settings → Rate Limiting, changes dropdown to `default`, clicks Save. `handleUpdateRateLimitSetting` → `applyRateLimitSettingToRegistry` → `resolveRateLimitConfig` returns a config with `Off=false` and populated tier configs.
- `s.rateLimit.Reload(newCfg)`: the loop finds `r.sliding[TierAuth] == nil` (skips), `r.buckets[TierAuthed] == nil` (skips), `r.failures[TierVerifyFailure] == nil` (skips); post-loop `r.globalRate == nil` (skips). Only `globalSem` and `kSemaphores[TierProxy]` are created.
- Every subsequent call to `r.Allow(TierAuth, "ip:1.2.3.4")` bypasses the Off guard (Off is false), finds `r.sliding[TierAuth] == nil`, falls through all nil-checks, and returns `AllowOK(0,0)`. Rate limiting is silently disabled system-wide except for proxy concurrency.
Fix
Apply the same creation-path pattern already used for `kSemaphores[TierProxy]` consistently to all other limiter types. Add `else if` branches in the loop to construct `sliding[t]`, `buckets[t]`, and `failures[t]` when nil and the new config calls for them (check the `Algorithm` field and non-zero parameters). Add a symmetric nil-check creation path for `globalRate` after the loop. Alternatively, detect the Off→On transition (`old.Off && !newCfg.Off`) at the top of `Reload()` and delegate to `r.build(newCfg)` directly.
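The alternative fix — rebuilding on the Off→On flip — can be sketched with a deliberately reduced registry. `Config`, `registry`, and `slidingStub` here are hypothetical single-limiter stand-ins; the real `Registry` holds per-tier arrays, but the transition logic is the same shape:

```go
package main

import "fmt"

// Config is a reduced stand-in: Off mirrors the profile=off flag,
// Max stands in for the tier parameters.
type Config struct {
	Off bool
	Max int
}

type slidingStub struct{ max int }

func (s *slidingStub) reconfigure(c Config) { s.max = c.Max }

type registry struct {
	cfg     Config
	sliding *slidingStub // stays nil when built with Off=true
}

func (r *registry) build(c Config) {
	r.cfg = c
	if c.Off {
		return // Off mode: no limiters allocated (the bug's precondition)
	}
	r.sliding = &slidingStub{max: c.Max}
}

func (r *registry) Reload(newCfg Config) {
	// Proposed fix: an Off→On flip rebuilds from scratch, so limiters
	// that were never allocated at boot come into existence.
	if r.cfg.Off && !newCfg.Off {
		r.build(newCfg)
		return
	}
	r.cfg = newCfg
	if r.sliding != nil {
		r.sliding.reconfigure(newCfg)
	}
}

func main() {
	r := &registry{}
	r.build(Config{Off: true})
	r.Reload(Config{Off: false, Max: 10})
	fmt.Println(r.sliding != nil) // true: limiter exists after Off→On
}
```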
```go
	if !ok {
		b = &bucket{tokens: m.burst, lastRefill: now}
		m.buckets[key] = b
		m.evictIfNeededLocked(now)
	} else {
		elapsed := now.Sub(b.lastRefill).Seconds()
		if elapsed > 0 {
			b.tokens = math.Min(m.burst, b.tokens+elapsed*m.rate)
			b.lastRefill = now
		}
	}

	if b.tokens >= 1 {
		b.tokens--
		return AllowOK(int(b.tokens), int(m.burst))
	}

	// Wait for one token to refill.
	need := 1 - b.tokens
	wait := time.Duration(need / m.rate * float64(time.Second))
	if wait < time.Second {
		wait = time.Second
	}
	return Deny("rate", wait, int(m.burst))
}

// evictIfNeededLocked is called under m.mu. Prefers to drop buckets
// whose tokens are near full (idle keys — zero fairness impact). If
// that isn't enough (every bucket is hot), falls back to evicting the
// oldest-by-lastRefill entry so the map stays bounded even under
```
🔴 In tokenBucketMap.allow() (internal/ratelimit/bucket.go:78-81), newly-created buckets are initialized with tokens=burst and added to the map, then evictIfNeededLocked is called immediately — but the eviction predicate (tokens >= burst-0.0001) is always true for the fresh bucket, so under map pressure the new entry is deleted from the map before the current request returns. The local *bucket pointer survives eviction (Go pointer semantics), so the first request is allowed and b.tokens-- runs on the orphaned struct; on the next request for the same key a brand-new full-burst bucket is created again, permanently resetting the per-key rate limit. The fix is to pass the newly-added key as a skipKey to evictIfNeededLocked, mirroring the identical protection already present in keyedSemaphore.evictLocked(skipKey string).
**Extended reasoning**
Bug mechanics. In tokenBucketMap.allow() (bucket.go:78-81), when a key is absent from the map a new bucket struct is created with tokens = m.burst, inserted into m.buckets[key], and then evictIfNeededLocked is called. The eviction loop (bucket.go:113-116) removes every entry where b.tokens >= m.burst - 0.0001. Because the freshly-created bucket has exactly tokens == burst, it trivially satisfies this condition and is a candidate for deletion on the very iteration that follows its insertion.
Why the new entry is always the victim under adversarial pressure. When the map is at capacity and every existing entry is hot (tokens near zero from heavy use), the new entry is the only one that satisfies the near-full predicate. Go's map iteration order is randomized, so the new key may be reached on the first step; as soon as it is deleted the map shrinks back to maxKeys and eviction stops. The fallback oldest-by-lastRefill loop is never reached — the new entry was just inserted with lastRefill = now and would be the newest, not oldest, anyway.
Why the current request is still allowed (the orphan pointer problem). Go's garbage collector does not free the bucket struct because the local variable b *bucket inside allow() still holds a reference. After evictIfNeededLocked returns, execution continues: b.tokens >= 1 is true (tokens == burst), b.tokens-- executes on the now-untracked struct, and AllowOK is returned to the caller. The request is allowed, but the decrement has no lasting effect.
Why the next request sees a fresh full-burst bucket. On the next call to allow(key), m.buckets[key] returns !ok because the key was evicted. A new bucket with tokens = burst is created and the entire cycle repeats. Under adversarial distinct-key flooding that keeps the map saturated with hot keys, every request — regardless of how frequently the same key appears — receives a fresh full-burst allocation, completely defeating per-key rate limiting on TierProxy ((actor, vault) keys) and TierAuthed (actor keys).
Design inconsistency confirms the fix. keyedSemaphore.evictLocked(skipKey string) (semaphore.go:130-158) explicitly skips the newly-added key during eviction to prevent exactly this scenario. tokenBucketMap.evictIfNeededLocked lacks the equivalent skipKey parameter — this is a clear omission. The fix: add skipKey string to evictIfNeededLocked, skip that key in the near-full loop, and pass key from allow() to it.
Step-by-step proof (maxKeys=3, existing hot keys A/B/C with tokens≈0, attacker sends key D each request):
- `allow("D")` called; `m.buckets["D"]` → not found. `b = &bucket{tokens: burst, lastRefill: now}` allocated on heap.
- `m.buckets["D"] = b` → map size = 4, exceeds maxKeys=3. `evictIfNeededLocked(now)` iterates map.
- A.tokens≈0, B.tokens≈0, C.tokens≈0 — none pass `>= burst-0.0001`.
- D.tokens==burst — passes predicate; `delete(m.buckets, "D")` called.
- len==3 <= maxKeys → eviction returns.
- Back in `allow`: `b` points to orphaned struct; `b.tokens--`; `AllowOK` returned. Request allowed.
- Same attacker sends key D again: `m.buckets["D"]` → not found → fresh burst bucket → same eviction cycle → allowed again.
- Repeat indefinitely: an attacker using distinct (or the same) keys bypasses per-key rate limiting entirely while hot keys fill the map.
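The skipKey fix can be shown on a deliberately reduced token-bucket map. Names and the `burst-0.0001` tolerance follow the review's description; the rest (`bucketMap`, refill logic stripped out) is a hypothetical reduction — the point is only that the freshly-inserted key is exempt from the near-full eviction heuristic:

```go
package main

import "fmt"

type bucket struct{ tokens float64 }

// bucketMap is a reduced token-bucket map: no refill, no locking —
// just enough to demonstrate eviction with a skipKey.
type bucketMap struct {
	burst   float64
	maxKeys int
	buckets map[string]*bucket
}

func (m *bucketMap) allow(key string) bool {
	b, ok := m.buckets[key]
	if !ok {
		b = &bucket{tokens: m.burst}
		m.buckets[key] = b
		m.evictIfNeeded(key) // pass the new key so eviction skips it
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func (m *bucketMap) evictIfNeeded(skipKey string) {
	if len(m.buckets) <= m.maxKeys {
		return
	}
	for k, b := range m.buckets {
		if k == skipKey {
			continue // never evict the entry we just created
		}
		// Near-full buckets belong to idle keys — drop them first.
		if b.tokens >= m.burst-0.0001 {
			delete(m.buckets, k)
			if len(m.buckets) <= m.maxKeys {
				return
			}
		}
	}
}

func main() {
	m := &bucketMap{burst: 5, maxKeys: 1, buckets: map[string]*bucket{
		"idle": {tokens: 5}, // near-full: an idle key, safe to drop
	}}
	m.allow("hot")
	_, fresh := m.buckets["hot"]
	_, idle := m.buckets["idle"]
	fmt.Println(fresh, idle) // true false: new key survives, idle key evicted
}
```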
**Summary**

- `internal/ratelimit` — a tiered in-memory limiter with four public tiers (AUTH, PROXY, AUTHED, GLOBAL) plus an internal `VERIFY_FAIL` failure counter. Replaces the scattered per-handler sliding-window limiters in `handle_auth.go` / `handle_oauth.go` / `handle_users.go`.
- `/proxy/*` and the MITM forward handler share one `Registry` so an agent can't bypass PROXY limits by switching ingress. Enforcement is unified in `Registry.EnforceProxy` (per-scope token bucket + per-scope concurrency semaphore).
- Owner-tunable settings: profile dropdown (`default` / `strict` / `loose` / `off`) + advanced per-tier overrides with a debounced server-side preview and per-tier source labels (env / override / default). Registry reloads without a restart.
- Env control: `AGENT_VAULT_RATELIMIT_LOCK=true` (UI becomes read-only) and/or per-tier `AGENT_VAULT_RATELIMIT_<TIER>_<KNOB>` overrides. Env always beats UI.

**Tiers at a glance**

| Tier | Surface | Key | Mechanism |
| --- | --- | --- | --- |
| AUTH | unauthenticated endpoints | client IP (login also keys on email) | sliding window |
| PROXY | `/proxy/*` + MITM forward | `(actor, vault)` | token bucket + concurrency semaphore |
| AUTHED | everything behind `requireAuth` | actor | token bucket |
| GLOBAL | whole server | — | RPS ceiling + in-flight cap |

**Adjacent hardening bundled**

- The MITM inner `http.Server` gained `ReadTimeout`/`WriteTimeout`/`IdleTimeout` — without these a drip-feed tunnel could pin a PROXY concurrency slot.
- `/proxy/*` and MITM forward now cap forwarded bodies at 64 MB (`brokercore.MaxProxyBodyBytes`).
- The CONNECT-flood limiter runs before `ParseProxyAuth` so bad-auth floods can't burn CPU on session lookups.

**Docs**

`.env.example`, `docs/self-hosting/environment-variables.mdx`, `docs/reference/cli.mdx`, `README.md`, and both agent skill files (`cmd/skill_cli.md`, `cmd/skill_http.md`) updated for 429 / `Retry-After` semantics.

**Test plan**

- `go test -race ./...` green
- `go vet ./...` clean
- `npx tsc --noEmit` clean in `web/`
- With `AGENT_VAULT_RATELIMIT_AUTH_MAX=3`, hit `/v1/auth/register` 5× → first 3 return `201`, 4th+5th `429` with `Retry-After`, `X-RateLimit-Limit: 3`, `X-RateLimit-Remaining: 0`.
- `Manage Instance → Settings → Rate Limiting` in a browser: pick `strict`, confirm the advanced-table numbers shrink live; set a per-tier override, confirm the row's `source` label flips to `override`; save + refresh and confirm the persisted value survives; set `AGENT_VAULT_RATELIMIT_LOCK=true`, restart, confirm the pane renders read-only with the "Pinned by env" banner.
- Hit `/proxy` and MITM ingresses concurrently with a shared token to confirm the PROXY concurrency semaphore applies across both paths.

🤖 Generated with Claude Code