
fix(shard): defer shard lock unlock so panics inside eviction do not strand writers #421

Open

SAY-5 wants to merge 1 commit into allegro:main from SAY-5:fix/shard-defer-unlock-401

Conversation


@SAY-5 SAY-5 commented Apr 21, 2026

What

Fixes #401.

cacheShard.set and cacheShard.append acquired s.lock and released it with explicit s.lock.Unlock() calls along each return path. Any panic raised between the Lock() and the first Unlock() - for example the makeslice: len out of range the user hit inside readEntry / providedOnRemoveWithReason during an onEvict on a corrupted entry - unwound the stack with no deferred unlock in place and left the shard permanently write-locked. Every subsequent Set / Append / Delete on that shard's key space then blocked forever while the rest of the app kept running, which is the hang reported as 'cache seems deadlocked' in production.
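
For illustration, a minimal standalone sketch of the failure mode (plain sync.RWMutex, not bigcache internals): a recovered panic between Lock() and Unlock() leaves the mutex held and strands the next writer.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	// Simulate the buggy path: Lock() with no deferred Unlock(), then a
	// panic that a caller recovers from. The mutex stays locked forever.
	func() {
		defer func() { _ = recover() }()
		mu.Lock()
		panic("makeslice: len out of range")
	}()

	acquired := make(chan struct{})
	go func() {
		mu.Lock() // every later writer parks here
		defer mu.Unlock()
		close(acquired)
	}()

	select {
	case <-acquired:
		fmt.Println("lock acquired (never happens)")
	case <-time.After(time.Second):
		fmt.Println("writer stranded: the shard-style lock was never released")
	}
}
```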

Fix

Acquire the lock at the top of each function and release it with defer, so that a panic inside the eviction / serialization path can still be caught by any higher-level recover while the shard is always unlocked as the goroutine unwinds. The now-redundant manual Unlock calls along the existing return sites are removed.
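
A sketch of the resulting shape, with cacheShard reduced to just the lock and the body of set elided (the real struct and function are of course larger):

```go
package bigcache

import "sync"

type cacheShard struct {
	lock sync.RWMutex
	// ...remaining fields elided for the sketch
}

// After the fix: the deferred Unlock runs even while a panic from the
// eviction/serialization work unwinds the stack, so the shard is released
// before any higher-level recover fires. Previously each return path called
// s.lock.Unlock() explicitly, and a panic in between skipped all of them.
func (s *cacheShard) set(key string, hashedKey uint64, entry []byte) error {
	s.lock.Lock()
	defer s.lock.Unlock()

	// ...hash-collision check, queue push, eviction callbacks (may panic)...
	return nil
}
```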

The runtime hot path is functionally unchanged: the defer adds a single deferred call per Set where there previously was none, which is negligible next to the map and queue work on either side.
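
For anyone who wants to measure that claim, a hypothetical micro-benchmark (not part of this PR) comparing the two unlock styles:

```go
package bigcache

import (
	"sync"
	"testing"
)

// lockExplicit mirrors the old shape: unlock called directly on the path.
func lockExplicit(mu *sync.RWMutex) {
	mu.Lock()
	mu.Unlock()
}

// lockDeferred mirrors the new shape: unlock deferred at function entry.
func lockDeferred(mu *sync.RWMutex) {
	mu.Lock()
	defer mu.Unlock()
}

func BenchmarkExplicitUnlock(b *testing.B) {
	var mu sync.RWMutex
	for i := 0; i < b.N; i++ {
		lockExplicit(&mu)
	}
}

func BenchmarkDeferredUnlock(b *testing.B) {
	var mu sync.RWMutex
	for i := 0; i < b.N; i++ {
		lockDeferred(&mu)
	}
}
```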

Verification

Locally on macOS, with Go 1.26.2:

  • gofmt -s -l shard.go: clean
  • go vet ./...: clean
  • go test -race -count=1 -short -run 'TestCacheSet|TestCacheGet|TestCacheLen|TestCacheCapacity|TestCacheInitialCapacity|TestCacheDel' ./...: pass

Note on TestCacheReset: this test fails on unmodified master as well (expected: int(1337), actual: int(725)), so it appears to be a pre-existing flake rather than a regression from this change. Happy to investigate separately if desired.

Closes #401

fix(shard): defer shard lock unlock so panics inside eviction do not strand writers

cacheShard.set and cacheShard.append acquired s.lock with explicit
s.lock.Unlock() calls along each return path. Any panic raised between
the Lock() and the first Unlock() - for example the makeslice
len-out-of-range the user hit inside readEntry/providedOnRemoveWithReason
during an onEvict on a corrupted entry (allegro#401) - unwound the stack
with no deferred unlock in place and left the shard permanently
write-locked. Every subsequent Set / Append / Delete on that shard's
key space then blocked forever while the rest of the app kept running,
producing a hang that only ever manifested as 'cache seems deadlocked'
in production.

Acquire the lock at the top of each function and release it with defer,
so that a panic inside the eviction or serialization path can still be
caught by any higher-level recover while the shard is always unlocked
as the goroutine unwinds. The now-redundant manual Unlock calls along
the existing return sites are removed. The runtime hot path is
functionally unchanged; the defer adds a single deferred call per Set
where there was none, negligible next to the map + queue work on
either side.

Closes allegro#401

Signed-off-by: SAY-5 <SAY-5@users.noreply.github.com>
Collaborator

@janisz janisz left a comment


Unfortunately it does not solve the panic, it only allows handling it gracefully.



Development

Successfully merging this pull request may close these issues.

A panic occurred, causing the lock to remain engaged. panic: makeslice: len out of range (#401)
