feat: add tiered rate limiting with owner-tunable settings #94

dangtony98 merged 2 commits into main
Conversation
Introduce internal/ratelimit — a tiered in-memory limiter with four
public tiers plus an internal failure counter:
- AUTH: unauthenticated surface (login, register, forgot/reset,
verify, OAuth, invite/approval-token redemption, MITM CONNECT).
Sliding-window keyed on client IP; login additionally keys on the
normalized email (reject if either bucket is exhausted).
- PROXY: /proxy/* and MITM forward, keyed on (actor, vault). Token
bucket for smooth traffic + per-scope concurrency semaphore so
slow upstream calls can't pile up on one scope.
- AUTHED: everything behind requireAuth. Per-actor token bucket.
- GLOBAL: server-wide RPS ceiling + total in-flight cap applied at
the outer mux.
- VERIFY_FAIL (internal): failure counter for verification and
password-reset codes; not surfaced in the UI.
Both ingress paths share one Registry so switching ingress can't
bypass PROXY limits. The registry reloads on settings writes, no
restart required.
Operator controls:
- AGENT_VAULT_RATELIMIT_PROFILE={default,strict,loose,off}
- AGENT_VAULT_RATELIMIT_LOCK=true pins settings; UI becomes read-only
- AGENT_VAULT_RATELIMIT_<TIER>_<KNOB> per-tier env overrides
- Manage Instance → Settings → Rate Limiting pane: profile dropdown
+ advanced per-tier overrides with debounced live preview and
per-tier source labels (env / override / default)
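As a rough illustration, an operator's env file might combine these controls like so. The values are illustrative; only `AUTH_MAX` is confirmed by the test plan in this PR — other tier/knob names follow the `<TIER>_<KNOB>` pattern and are assumptions here.

```shell
# Pin a profile and lock the UI pane read-only.
AGENT_VAULT_RATELIMIT_PROFILE=strict
AGENT_VAULT_RATELIMIT_LOCK=true

# Per-tier override (env always beats UI). AUTH_MAX appears in the
# PR's test plan; the value 3 is an example.
AGENT_VAULT_RATELIMIT_AUTH_MAX=3
```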
Adjacent hardening bundled because they enable the rate-limit
design:
- MITM inner http.Server grew ReadTimeout/WriteTimeout/IdleTimeout
(slow-loris defense — without these a drip-feed tunnel could pin
a PROXY concurrency slot).
- /proxy and MITM forward cap forwarded bodies at 64 MB
(brokercore.MaxProxyBodyBytes).
- CONNECT-flood limiter runs before ParseProxyAuth so a bad-auth
flood can't burn CPU.
Docs (.env.example, docs/self-hosting/environment-variables.mdx,
docs/reference/cli.mdx, README.md) and both agent skill files
updated for 429/Retry-After semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tokenBucketMap.size and semaphore.capacity had no callers; CI's unused check failed on them. The analogous slidingWindow.size stays (used in tests) and semaphore.holders stays (used by keyedSemaphore eviction). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```go
ipDecision := s.rateLimit.Allow(ratelimit.TierAuth, "ip:"+ip)
emailDecision := s.rateLimit.Allow(ratelimit.TierAuth, "email:"+strings.ToLower(strings.TrimSpace(req.Email)))
if !ipDecision.Allow || !emailDecision.Allow {
	d := ipDecision
	if !emailDecision.Allow {
		d = emailDecision
	}
	ratelimit.WriteDenial(w, d, "Too many login attempts, try again later")
	return
}
```
🔴 The login handler at handle_auth.go:595-596 double-charges the TierAuth IP sliding window on every request — the route is already wrapped in the ipAuth middleware that records one slot, and then handleLogin records a second slot using the identical key, halving the effective per-IP login rate limit to 5 attempts instead of the configured 10. Additionally, both ipDecision and emailDecision are evaluated unconditionally before the denial check, so in the edge case where the middleware allows (bucket at max-1) but the in-handler IP check then finds the bucket full, the email Allow() call still executes and consumes an email-bucket slot. Fix: remove the in-handler IP Allow() call and rely solely on ipAuth middleware for IP enforcement; keep only the email check in-handler since the body must be parsed before the email is known.
**Extended reasoning**
Double-charge of the IP sliding window (bug_004)
The login route is registered as s.requireInitialized(ipAuth(limitBody(s.handleLogin))) (server.go:571), where ipAuth = s.tier(ratelimit.TierAuth, s.ipKeyer()). The HandlerFunc middleware (middleware.go:32) calls r.Allow(TierAuth, "ip:"+clientIP(r)) and records one timestamp in the sliding window before dispatching to the handler. Inside handleLogin, line 595 independently calls s.rateLimit.Allow(ratelimit.TierAuth, "ip:"+ip) with the identical tier and key. Since both calls go to the same slidingWindow under the same mutex, each appends a distinct timestamp. The result: every POST /v1/auth/login request consumes two TierAuth slots for the client IP instead of one, halving the effective rate limit. With the default Auth.Max=10, legitimate users hit the 429 wall after only 5 login attempts.
Step-by-step proof of double-charge:
- Request 1 from 1.2.3.4: the `ipAuth` middleware calls `Allow(TierAuth, "ip:1.2.3.4")` → bucket has 0 entries < 10 → appends timestamp → bucket now has 1 entry → Allow → handler runs → `handleLogin` calls `Allow(TierAuth, "ip:1.2.3.4")` again → sees 1 entry < 10 → appends second timestamp → bucket now has 2 entries.
- After 5 sequential requests: bucket has 10 entries (5 from middleware + 5 from handler).
- Request 6: the `ipAuth` middleware sees 10 entries ≥ 10 → returns Deny/429 → handler never runs.
- Effective limit: 5 attempts, not 10.
This regression was introduced by this PR: the old code used loginIPLimiter (an independent slidingWindowLimiter not shared with any middleware), so there was no double-count. After migration to the shared Registry, the in-handler IP call was not removed when ipAuth was added to the route registration. The comment in middleware.go:84-86 explicitly references "login's email bucket" as the intended in-handler check, implying the email-only check was the design intent.
Unconditional email Allow() evaluation and the refutation of the extreme DoS scenario (bug_003)
The original bug report describes an attack where an attacker with an already-exhausted IP bucket continues sending requests and drains the victim's email bucket. One verifier correctly refutes this specific scenario: when the IP bucket is fully exhausted, the ipAuth middleware itself returns 429 before calling next, so handleLogin never executes and the email Allow() at line 596 is never reached. This gate is effective and the refutation is correct for the sequential, single-IP attack vector described.
However, the unconditional evaluation of both ipDecision and emailDecision before the combined denial check does create a real (if narrower) problem in the edge-case where the middleware allows a request that the handler's in-handler IP check then denies. This can occur when the sliding window has partially expired entries between requests, leaving the count at an odd value (e.g., 9): the middleware sees 9 < 10, records timestamp #10, allows the request; the handler's ipDecision then sees 10 ≥ 10 and denies — but emailDecision at line 596 still runs unconditionally and records an attempt in the victim's email bucket. The old short-circuit evaluation (if !loginIPLimiter.allow(ip) || !loginEmailLimiter.allow(req.Email)) prevented this by never evaluating the email check when the IP check failed.
Root cause and fix
The root cause for both issues is the redundant in-handler Allow(TierAuth, "ip:"+ip) call at handle_auth.go:595. Removing it and relying solely on the ipAuth middleware for IP enforcement (as was the design intent) fixes the double-charge and removes the unconditional evaluation risk. The email check at line 596 must remain in the handler because the request body must be decoded to obtain the email address — the middleware cannot do this before dispatching. The fix is minimal: delete the ipDecision variable and its Allow call, and simplify the denial check to only test emailDecision.
```go
func (r *Registry) Reload(newCfg Config) {
	old := r.cfg.Load()
	r.cfg.Store(&newCfg)
	r.mu.Lock()
	defer r.mu.Unlock()
	for t := Tier(0); t < tierCount; t++ {
		if r.sliding[t] != nil {
			r.sliding[t].reconfigure(newCfg.Tiers[t])
		}
		if r.buckets[t] != nil {
			r.buckets[t].reconfigure(newCfg.Tiers[t])
		}
		if r.kSemaphores[t] != nil {
			r.kSemaphores[t].reconfigure(newCfg.Tiers[t])
		} else if t == TierProxy && newCfg.Tiers[t].Concurrency > 0 {
			// Boot config had Concurrency=0 so no sem was allocated;
			// a UI / env change re-enabled it — build it now.
			r.kSemaphores[t] = newKeyedSemaphore(newCfg.Tiers[t], 2*time.Second)
		}
		if r.failures[t] != nil {
			r.failures[t].reconfigure(newCfg.Tiers[t])
		}
	}
	// Rebuild global sem if capacity changed.
	if old == nil || newCfg.Tiers[TierGlobal].Concurrency != old.Tiers[TierGlobal].Concurrency {
		r.globalSem = newSemaphore(newCfg.Tiers[TierGlobal].Concurrency, 500*time.Millisecond)
	}
	if r.globalRate != nil {
		r.globalRate.reconfigure(TierConfig{
			Rate: newCfg.Tiers[TierGlobal].Rate,
```
🔴 When the server starts with AGENT_VAULT_RATELIMIT_PROFILE=off (or any config with cfg.Off=true), Registry.build() returns early at line 83-85, leaving all limiter fields (sliding[], buckets[], failures[], globalRate) as nil. If an admin later switches the profile to default via the UI, Reload() is called with Off=false, but it only reconfigure()s limiters that are already non-nil — it has no creation paths for sliding[t], buckets[t], failures[t], or globalRate. After the reload, r.cfg.Load().Off is false so the early-return guards in Allow(), AllowGlobalRPS(), and FailureCheck() are bypassed, but each method immediately finds nil and falls through to AllowOK(0,0), silently disabling all rate limiting.
**Extended reasoning**
The bug in detail
When New(cfg) is called with cfg.Off=true, build() (registry.go lines 83-85) returns immediately without initializing any limiter. All of r.sliding[], r.buckets[], r.failures[], r.globalRate, r.globalSem, and r.kSemaphores[] remain at their zero values (nil).
The failing Off→On transition
When an instance owner changes the profile from off to default via the UI, handleUpdateRateLimitSetting calls applyRateLimitSettingToRegistry, which calls r.rateLimit.Reload(cfg) with Off=false. Inside Reload() (lines 50-67), the loop only calls reconfigure() on non-nil limiters. The only creation path that exists is for kSemaphores[TierProxy] (the explicit else-if branch at lines 59-62). No creation paths exist for sliding[t], buckets[t], failures[t], or globalRate. The globalSem is rebuilt unconditionally when capacity changes (old=0 != new=512), but that is the only other allocation that fires.
Why it silently fails open
After Reload(), r.cfg.Load().Off is false, so the early-return guards in Allow(), AllowGlobalRPS(), and FailureCheck() are bypassed. Each method then checks its respective field for nil and falls through to AllowOK(0,0) — the same result as Off mode, but without the early return. This means:
- `Allow(TierAuth, key)` returns AllowOK — login floods, registration spams, MITM CONNECT floods unchecked
- `Allow(TierAuthed, key)` returns AllowOK — authenticated CRUD abuse unchecked
- `AllowGlobalRPS()` returns AllowOK (`globalRate` nil) — global RPS ceiling gone
- `FailureCheck(TierVerifyFailure, key)` returns true (`failures[t]` nil) — verification-code brute force unchecked
Step-by-step proof
- Server starts with `AGENT_VAULT_RATELIMIT_PROFILE=off`: `LoadFromEnv()` returns `cfg.Off=true` → `ratelimit.New(cfg)` → `build(cfg)` returns immediately → all per-tier limiter fields are nil.
- Admin opens Manage Instance → Settings → Rate Limiting, changes dropdown to `default`, clicks Save. `handleUpdateRateLimitSetting` → `applyRateLimitSettingToRegistry` → `resolveRateLimitConfig` returns a config with `Off=false` and populated tier configs.
- `s.rateLimit.Reload(newCfg)`: the loop finds `r.sliding[TierAuth] == nil` (skips), `r.buckets[TierAuthed] == nil` (skips), `r.failures[TierVerifyFailure] == nil` (skips); post-loop `r.globalRate == nil` (skips). Only `globalSem` and `kSemaphores[TierProxy]` are created.
- Every subsequent call to `r.Allow(TierAuth, "ip:1.2.3.4")` bypasses the Off guard (Off is false), finds `r.sliding[TierAuth] == nil`, falls through all nil-checks, and returns `AllowOK(0,0)`. Rate limiting is silently disabled system-wide except for proxy concurrency.
Fix
Apply the same creation-path pattern already used for `kSemaphores[TierProxy]` consistently to all other limiter types. Add `else if` branches in the loop to construct `sliding[t]`, `buckets[t]`, and `failures[t]` when nil and the new config calls for them (check the `Algorithm` field and non-zero parameters). Add a symmetric nil-check creation path for `globalRate` after the loop. Alternatively, detect the Off→On transition (`old.Off && !newCfg.Off`) at the top of `Reload()` and delegate to `r.build(newCfg)` directly.
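The alternative fix — rebuilding on the Off→On flip — can be sketched with a deliberately reduced registry. `Config`, `registry`, and `slidingStub` here are hypothetical single-limiter stand-ins; the real `Registry` holds per-tier arrays, but the transition logic is the same shape:

```go
package main

import "fmt"

// Config is a reduced stand-in: Off mirrors the profile=off flag,
// Max stands in for the tier parameters.
type Config struct {
	Off bool
	Max int
}

type slidingStub struct{ max int }

func (s *slidingStub) reconfigure(c Config) { s.max = c.Max }

type registry struct {
	cfg     Config
	sliding *slidingStub // stays nil when built with Off=true
}

func (r *registry) build(c Config) {
	r.cfg = c
	if c.Off {
		return // Off mode: no limiters allocated (the bug's precondition)
	}
	r.sliding = &slidingStub{max: c.Max}
}

func (r *registry) Reload(newCfg Config) {
	// Proposed fix: an Off→On flip rebuilds from scratch, so limiters
	// that were never allocated at boot come into existence.
	if r.cfg.Off && !newCfg.Off {
		r.build(newCfg)
		return
	}
	r.cfg = newCfg
	if r.sliding != nil {
		r.sliding.reconfigure(newCfg)
	}
}

func main() {
	r := &registry{}
	r.build(Config{Off: true})
	r.Reload(Config{Off: false, Max: 10})
	fmt.Println(r.sliding != nil) // true: limiter exists after Off→On
}
```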
```go
	if !ok {
		b = &bucket{tokens: m.burst, lastRefill: now}
		m.buckets[key] = b
		m.evictIfNeededLocked(now)
	} else {
		elapsed := now.Sub(b.lastRefill).Seconds()
		if elapsed > 0 {
			b.tokens = math.Min(m.burst, b.tokens+elapsed*m.rate)
			b.lastRefill = now
		}
	}

	if b.tokens >= 1 {
		b.tokens--
		return AllowOK(int(b.tokens), int(m.burst))
	}

	// Wait for one token to refill.
	need := 1 - b.tokens
	wait := time.Duration(need / m.rate * float64(time.Second))
	if wait < time.Second {
		wait = time.Second
	}
	return Deny("rate", wait, int(m.burst))
}

// evictIfNeededLocked is called under m.mu. Prefers to drop buckets
// whose tokens are near full (idle keys — zero fairness impact). If
// that isn't enough (every bucket is hot), falls back to evicting the
// oldest-by-lastRefill entry so the map stays bounded even under
```
🔴 In tokenBucketMap.allow() (internal/ratelimit/bucket.go:78-81), newly-created buckets are initialized with tokens=burst and added to the map, then evictIfNeededLocked is called immediately — but the eviction predicate (tokens >= burst-0.0001) is always true for the fresh bucket, so under map pressure the new entry is deleted from the map before the current request returns. The local *bucket pointer survives eviction (Go pointer semantics), so the first request is allowed and b.tokens-- runs on the orphaned struct; on the next request for the same key a brand-new full-burst bucket is created again, permanently resetting the per-key rate limit. The fix is to pass the newly-added key as a skipKey to evictIfNeededLocked, mirroring the identical protection already present in keyedSemaphore.evictLocked(skipKey string).
**Extended reasoning**
Bug mechanics. In tokenBucketMap.allow() (bucket.go:78-81), when a key is absent from the map a new bucket struct is created with tokens = m.burst, inserted into m.buckets[key], and then evictIfNeededLocked is called. The eviction loop (bucket.go:113-116) removes every entry where b.tokens >= m.burst - 0.0001. Because the freshly-created bucket has exactly tokens == burst, it trivially satisfies this condition and is a candidate for deletion on the very iteration that follows its insertion.
Why the new entry is always the victim under adversarial pressure. When the map is at capacity and every existing entry is hot (tokens near zero from heavy use), the new entry is the only one that satisfies the near-full predicate. Go's map iteration order is randomized, so the new key may be reached on the first step; as soon as it is deleted the map shrinks back to maxKeys and eviction stops. The fallback oldest-by-lastRefill loop is never reached — the new entry was just inserted with lastRefill = now and would be the newest, not oldest, anyway.
Why the current request is still allowed (the orphan pointer problem). Go's garbage collector does not free the bucket struct because the local variable b *bucket inside allow() still holds a reference. After evictIfNeededLocked returns, execution continues: b.tokens >= 1 is true (tokens == burst), b.tokens-- executes on the now-untracked struct, and AllowOK is returned to the caller. The request is allowed, but the decrement has no lasting effect.
Why the next request sees a fresh full-burst bucket. On the next call to allow(key), m.buckets[key] returns !ok because the key was evicted. A new bucket with tokens = burst is created and the entire cycle repeats. Under adversarial distinct-key flooding that keeps the map saturated with hot keys, every request — regardless of how frequently the same key appears — receives a fresh full-burst allocation, completely defeating per-key rate limiting on TierProxy ((actor, vault) keys) and TierAuthed (actor keys).
Design inconsistency confirms the fix. keyedSemaphore.evictLocked(skipKey string) (semaphore.go:130-158) explicitly skips the newly-added key during eviction to prevent exactly this scenario. tokenBucketMap.evictIfNeededLocked lacks the equivalent skipKey parameter — this is a clear omission. The fix: add skipKey string to evictIfNeededLocked, skip that key in the near-full loop, and pass key from allow() to it.
Step-by-step proof (maxKeys=3, existing hot keys A/B/C with tokens≈0, attacker sends key D each request):
- `allow("D")` called; `m.buckets["D"]` → not found. `b = &bucket{tokens: burst, lastRefill: now}` allocated on heap.
- `m.buckets["D"] = b` → map size = 4, exceeds maxKeys=3. `evictIfNeededLocked(now)` iterates map.
- A.tokens≈0, B.tokens≈0, C.tokens≈0 — none pass `>= burst-0.0001`.
- D.tokens==burst — passes predicate; `delete(m.buckets, "D")` called.
- len==3 <= maxKeys → eviction returns.
- Back in `allow`: `b` points to orphaned struct; `b.tokens--`; `AllowOK` returned. Request allowed.
- Same attacker sends key D again: `m.buckets["D"]` → not found → fresh burst bucket → same eviction cycle → allowed again.
- Repeat indefinitely: an attacker using distinct (or the same) keys bypasses per-key rate limiting entirely while hot keys fill the map.
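The skipKey fix can be shown on a deliberately reduced token-bucket map. Names and the `burst-0.0001` tolerance follow the review's description; the rest (`bucketMap`, refill logic stripped out) is a hypothetical reduction — the point is only that the freshly-inserted key is exempt from the near-full eviction heuristic:

```go
package main

import "fmt"

type bucket struct{ tokens float64 }

// bucketMap is a reduced token-bucket map: no refill, no locking —
// just enough to demonstrate eviction with a skipKey.
type bucketMap struct {
	burst   float64
	maxKeys int
	buckets map[string]*bucket
}

func (m *bucketMap) allow(key string) bool {
	b, ok := m.buckets[key]
	if !ok {
		b = &bucket{tokens: m.burst}
		m.buckets[key] = b
		m.evictIfNeeded(key) // pass the new key so eviction skips it
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func (m *bucketMap) evictIfNeeded(skipKey string) {
	if len(m.buckets) <= m.maxKeys {
		return
	}
	for k, b := range m.buckets {
		if k == skipKey {
			continue // never evict the entry we just created
		}
		// Near-full buckets belong to idle keys — drop them first.
		if b.tokens >= m.burst-0.0001 {
			delete(m.buckets, k)
			if len(m.buckets) <= m.maxKeys {
				return
			}
		}
	}
}

func main() {
	m := &bucketMap{burst: 5, maxKeys: 1, buckets: map[string]*bucket{
		"idle": {tokens: 5}, // near-full: an idle key, safe to drop
	}}
	m.allow("hot")
	_, fresh := m.buckets["hot"]
	_, idle := m.buckets["idle"]
	fmt.Println(fresh, idle) // true false: new key survives, idle key evicted
}
```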
**Summary**

- `internal/ratelimit` — a tiered in-memory limiter with four public tiers (AUTH, PROXY, AUTHED, GLOBAL) plus an internal `VERIFY_FAIL` failure counter. Replaces the scattered per-handler sliding-window limiters in `handle_auth.go` / `handle_oauth.go` / `handle_users.go`.
- `/proxy/*` and the MITM forward handler share one `Registry` so an agent can't bypass PROXY limits by switching ingress. Enforcement is unified in `Registry.EnforceProxy` (per-scope token bucket + per-scope concurrency semaphore).
- Owner-tunable settings: profile dropdown (`default` / `strict` / `loose` / `off`) + advanced per-tier overrides with a debounced server-side preview and per-tier source labels (env / override / default). Registry reloads without a restart.
- Env control: `AGENT_VAULT_RATELIMIT_LOCK=true` (UI becomes read-only) and/or per-tier `AGENT_VAULT_RATELIMIT_<TIER>_<KNOB>` overrides. Env always beats UI.

**Tiers at a glance**

| Tier | Surface | Key | Mechanism |
| --- | --- | --- | --- |
| AUTH | unauthenticated endpoints | client IP (login also keys on email) | sliding window |
| PROXY | `/proxy/*` + MITM forward | `(actor, vault)` | token bucket + concurrency semaphore |
| AUTHED | everything behind `requireAuth` | actor | token bucket |
| GLOBAL | whole server | — | RPS ceiling + in-flight cap |

**Adjacent hardening bundled**

- The MITM inner `http.Server` gained `ReadTimeout`/`WriteTimeout`/`IdleTimeout` — without these a drip-feed tunnel could pin a PROXY concurrency slot.
- `/proxy/*` and MITM forward now cap forwarded bodies at 64 MB (`brokercore.MaxProxyBodyBytes`).
- The CONNECT-flood limiter runs before `ParseProxyAuth` so bad-auth floods can't burn CPU on session lookups.

**Docs**

`.env.example`, `docs/self-hosting/environment-variables.mdx`, `docs/reference/cli.mdx`, `README.md`, and both agent skill files (`cmd/skill_cli.md`, `cmd/skill_http.md`) updated for 429 / `Retry-After` semantics.

**Test plan**

- `go test -race ./...` green
- `go vet ./...` clean
- `npx tsc --noEmit` clean in `web/`
- With `AGENT_VAULT_RATELIMIT_AUTH_MAX=3`, hit `/v1/auth/register` 5× → first 3 return `201`, 4th+5th `429` with `Retry-After`, `X-RateLimit-Limit: 3`, `X-RateLimit-Remaining: 0`.
- `Manage Instance → Settings → Rate Limiting` in a browser: pick `strict`, confirm the advanced-table numbers shrink live; set a per-tier override, confirm the row's `source` label flips to `override`; save + refresh and confirm the persisted value survives; set `AGENT_VAULT_RATELIMIT_LOCK=true`, restart, confirm the pane renders read-only with the "Pinned by env" banner.
- Hit `/proxy` and MITM ingresses concurrently with a shared token to confirm the PROXY concurrency semaphore applies across both paths.

🤖 Generated with Claude Code