
fix: make stamps removal sync and data race in psh chunk tests #5446

Open
sbackend123 wants to merge 3 commits into master from fix/remove-stamp-items

Conversation

@sbackend123
Contributor

@sbackend123 sbackend123 commented Apr 26, 2026

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

  1. Collect all matching *StampItem entries in a slice during the iteration, then Delete them only after the iterator returns.

  2. Separate problem: fix the data race in the TestPushChunkToNextClosest, TestMultiplePushesAsForwarder, and TestPushChunkToClosestErrorAttemptRetry tests.

The record.bytes() method in streamtest returned a direct reference to the internal slice without holding the mutex, causing a data race when tests read stream records while background goroutines were still writing to them. Fixed by acquiring the lock and returning a copy of the slice (see the sketch after this list).

  3. Fix the gsoc test timeout (2s is too small for CI) and fix TestSubscribe.
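
For point 2, a minimal sketch of the copy-under-lock pattern; the record type and field names below are simplified stand-ins for the streamtest record, not the exact upstream code:

```go
import "sync"

// record is a hypothetical, simplified stand-in for the streamtest record.
type record struct {
	mu sync.Mutex
	b  []byte
}

// bytes returns a snapshot of the buffer taken under the lock.
// Returning r.b directly would hand the caller a slice that background
// goroutines may still be writing to, which is the data race the push
// sync tests tripped over.
func (r *record) bytes() []byte {
	r.mu.Lock()
	defer r.mu.Unlock()
	cp := make([]byte, len(r.b))
	copy(cp, r.b)
	return cp
}
```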

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Context:
removeStampItems deletes all StampItem rows for a given batch by prefix-iterating the stamper store and calling store.Delete for each entry. For a time this used a helper goroutine and an unbuffered channel: the iterator sent each *StampItem to the goroutine, which called Delete outside the callback.

That pattern was carried over from an earlier fire-and-forget SetExpired implementation (node startup) where the goal was not to block; it was not a good fit for the later synchronous API, HandleStampExpiry, where callers and tests expect all matching rows to be gone when the function returns.
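
A minimal sketch of the synchronous shape described in point 1 of the description; the store interface, names, and error handling below are simplified stand-ins rather than the actual storage package API:

```go
// stampItem and stamperStore are illustrative stand-ins for the real
// *StampItem and the stamper store used by pkg/postage.
type stampItem struct{ key string }

type stamperStore interface {
	// Iterate calls fn for every item under prefix; implementations
	// typically hold a read lock for the whole walk.
	Iterate(prefix string, fn func(it stampItem)) error
	Delete(it stampItem) error
}

// removeStampItems collects every matching item first and deletes only
// after the iterator has returned (and released its lock), so the call
// is synchronous: when it returns, all matching rows are gone.
func removeStampItems(st stamperStore, batchPrefix string) error {
	var toDelete []stampItem
	if err := st.Iterate(batchPrefix, func(it stampItem) {
		toDelete = append(toDelete, it)
	}); err != nil {
		return err
	}
	for _, it := range toDelete {
		_ = st.Delete(it) // error handling is discussed in the review thread below
	}
	return nil
}
```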

Bonus problem:
The previous implementation could also deadlock.

Many stores protect Iterate and Delete with a lock, often an RWMutex.
With the goroutine + unbuffered channel pattern and more than one StampItem for the same batch prefix, the following scenario can occur (see the sketch after this list):

  • The iterator’s callback does the first send on the channel. The consumer receives and calls Delete. Delete blocks: it must wait for the write lock, but the iterator still holds the read lock for the walk.
  • The callback returns; the walk continues to the next key. The second send blocks: the unbuffered channel is empty but the consumer is stuck inside the first Delete (not reading the next value).
  • The walk cannot complete while the callback is blocked on send. The read lock is not released until the walk finishes, so the first Delete never acquires the write lock.
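
To make the scenario concrete, here is a toy reproduction of the old goroutine + unbuffered channel shape against a small RWMutex-guarded store; every name below is illustrative, not the real stamper store:

```go
import (
	"strings"
	"sync"
)

type item struct{ key string }

type lockedStore struct {
	mu   sync.RWMutex
	data map[string]item
}

// Iterate holds the read lock for the entire walk.
func (s *lockedStore) Iterate(prefix string, fn func(it item)) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for k, it := range s.data {
		if strings.HasPrefix(k, prefix) {
			fn(it)
		}
	}
}

// Delete needs the write lock, so it blocks while an iteration is running.
func (s *lockedStore) Delete(it item) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, it.key)
}

// removeStampItemsOld mirrors the previous shape: the iterator callback
// sends each item to a helper goroutine over an unbuffered channel, and
// the goroutine calls Delete while the walk still holds the read lock.
// With two or more matching items this deadlocks exactly as described
// in the bullets above.
func removeStampItemsOld(s *lockedStore, prefix string) {
	itemC := make(chan item)
	done := make(chan struct{})
	go func() {
		defer close(done)
		for it := range itemC {
			s.Delete(it) // blocks: write lock is held off by the ongoing walk
		}
	}()
	s.Iterate(prefix, func(it item) {
		itemC <- it // second send blocks: the consumer is stuck inside Delete
	})
	close(itemC)
	<-done
}
```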

Related Issue (Optional)

Screenshots (if appropriate):

AI Disclosure

  • This PR contains code that has been generated by an LLM.
  • I have reviewed the AI generated code thoroughly.
  • I possess the technical expertise to responsibly review the code generated in this PR.

@sbackend123 sbackend123 marked this pull request as ready for review April 26, 2026 16:30
@sbackend123 sbackend123 changed the title from "fix: make stamps removal sync" to "fix: make stamps removal sync and data race in psh chunk tests" on Apr 27, 2026
@sbackend123 sbackend123 force-pushed the fix/remove-stamp-items branch from c21902b to 15e6607 Compare April 27, 2026 12:16
Comment thread pkg/postage/service.go
}

for _, item := range toDelete {
_ = ps.store.Delete(item)
Contributor


here the error is swallowed, while other branches return an error. i would at least check whether an error occurred or not, and return an error (i'm not a big fan of errors.Join as a wall of errors is usually not very helpful - having perhaps just the first error returned is fine while continuing to execute the loop)
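
One way to apply the suggestion, sketched against the loop quoted above (keep the first failure, keep deleting the rest, return it at the end); this is illustrative, not the committed code:

```go
var firstErr error
for _, item := range toDelete {
	if err := ps.store.Delete(item); err != nil && firstErr == nil {
		firstErr = err // remember the first failure but keep deleting the rest
	}
}
return firstErr
```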


Labels

None yet

Projects

None yet


3 participants