feat(worker): cooperative finisher merging from Redis processed set #730

Open
thomasrockhu-codecov wants to merge 2 commits into tom/fix-finisher-blocking-timeout from tom/cooperative-finisher-merging


@thomasrockhu-codecov commented Feb 27, 2026

Summary

Stacked on #729.

  • Instead of each finisher task only merging its own chord's 1-2 uploads, the lock holder now queries the full Redis "processed" set and merges ALL pending uploads in one lock acquisition.
  • Adds an early exit path: when a finisher starts and finds the Redis "processed" set is empty (because a previous cooperative finisher already merged everything), it returns immediately without even trying the lock.
  • Reduces base_retry_countdown from 200s to 30s so retried finishers check back sooner, discover there's nothing to merge, and exit quickly.
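The flow described above can be sketched as follows. This is a minimal illustration, not the PR's code: `InMemoryState`, `run_finisher`, and the returned status strings are hypothetical names, a Python set stands in for the Redis "processed" set, and a `threading.Lock` stands in for the distributed lock.

```python
import threading


class InMemoryState:
    """Hypothetical stand-in for the Redis "processed" set of one commit."""

    def __init__(self, upload_ids):
        self.processed = set(upload_ids)


def run_finisher(state, lock, merged_log):
    """Sketch of one finisher task under cooperative merging."""
    # Early exit: a previous finisher already merged everything,
    # so there is no reason to touch the lock at all.
    if not state.processed:
        return "early_exit"

    # Non-blocking acquire. A real task would schedule a retry after
    # a countdown instead of returning a string.
    if not lock.acquire(blocking=False):
        return "retry"

    try:
        # Inside the lock, read the FULL pending set rather than only
        # the uploads this task was originally scheduled for.
        pending = sorted(state.processed)
        if not pending:
            # Another lock holder merged everything between our
            # early-exit check and our lock acquisition.
            return "nothing_to_merge"
        merged_log.append(pending)
        state.processed.difference_update(pending)
        return "merged"
    finally:
        lock.release()
```

Running three finishers in sequence against five pending uploads, the first one merges all five in a single lock acquisition and the other two take the early exit without acquiring the lock.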

Before vs After

For a commit with 100+ parallel CI uploads (e.g., stacks-network with 167 finisher tasks):

| Metric | Before | After |
| --- | --- | --- |
| Finishers doing real work | ~100 (each merges 1-2 uploads) | ~1-3 (each merges everything pending) |
| Lock acquisitions | ~100 | ~1-3 |
| Worker-minutes burned | ~100+ | ~5 |
| Time to notification | Hours (serialized) | Minutes |

Key changes

  • ProcessingState.get_all_processed_uploads(): new method using smembers (no MERGE_BATCH_SIZE cap)
  • _process_reports_with_lock: inside the lock, calls _reconstruct_processing_results to get ALL pending uploads from Redis instead of using the chord's processing_results
  • run_impl: early exit when ProcessingState.get_upload_numbers() shows nothing pending
  • Both LockManager instances use base_retry_countdown=30 for faster retry cycling
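To illustrate the difference between a capped read and the unbounded `smembers` read, here is a small sketch. Everything in it is an assumption made for illustration: `FakeRedis` is an in-memory stand-in exposing only `sadd` and `smembers`, the key layout is invented, the `MERGE_BATCH_SIZE` value is arbitrary, and the way the capped method truncates its result is a guess at the general idea rather than the repository's implementation.

```python
MERGE_BATCH_SIZE = 5  # illustrative value only; the real constant is not stated in this PR


class FakeRedis:
    """In-memory stand-in that mimics SADD/SMEMBERS returning bytes."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *values):
        self._sets.setdefault(key, set()).update(str(v).encode() for v in values)

    def smembers(self, key):
        return set(self._sets.get(key, set()))


class ProcessingStateSketch:
    """Hypothetical, simplified version of ProcessingState."""

    def __init__(self, redis_client, repo_id, commit_sha):
        self._redis = redis_client
        # Key layout is an assumption made for this sketch.
        self.processed_key = f"processed/{repo_id}/{commit_sha}"

    def get_uploads_for_merging(self):
        # Capped read: at most MERGE_BATCH_SIZE upload ids per call.
        members = sorted(int(m) for m in self._redis.smembers(self.processed_key))
        return set(members[:MERGE_BATCH_SIZE])

    def get_all_processed_uploads(self):
        # Unbounded read: every member of the processed set.
        return {int(m) for m in self._redis.smembers(self.processed_key)}
```

With twelve upload ids in the set, the capped method returns five of them while the unbounded method returns all twelve, which is what lets a single lock holder merge the whole backlog.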

Test plan

  • Added test_get_all_processed_uploads_returns_full_set for the new ProcessingState method
  • Added test_early_exit_when_redis_processed_set_empty verifying early exit path
  • Added test_cooperative_merge_uses_all_redis_uploads verifying lock holder merges all uploads
  • Added test_cooperative_merge_exits_when_lock_holder_already_merged verifying graceful exit on race
  • CI passes

Made with Cursor


Note

Medium Risk
Changes core upload-finishing/merge concurrency behavior (lock usage + Redis-driven selection), which could affect report completeness or notification timing if Redis state is inconsistent or races occur. Added coverage reduces risk but this is still a production-critical workflow change.

Overview
Cooperative finisher merging: the lock-holder finisher now reconstructs merge inputs from the full Redis processed set (via new ProcessingState.get_all_processed_uploads() using smembers) and merges/cleans up all pending uploads at once, instead of only its chord’s subset.

Less redundant work: run_impl adds an early-exit path when Redis shows both processing and processed are empty (another finisher already merged), and both lock acquisitions now use a shorter base_retry_countdown=30. Tests were updated/added to cover the full-set fetch, early-exit, and cooperative-merge race scenarios.

Written by Cursor Bugbot for commit 977b1fa. This will update automatically on new commits.

@cursor cursor bot left a comment
Cursor Bugbot has reviewed your changes and found 1 potential issue.

Instead of each finisher task only merging its own chord's 1-2 uploads,
the lock holder now queries the full Redis "processed" set and merges
ALL pending uploads in one lock acquisition. This means one finisher
does the work of many, and redundant finishers exit immediately.

Changes:
- Add get_all_processed_uploads() to ProcessingState (unbounded smembers)
- _process_reports_with_lock: after acquiring lock, reconstruct
  processing_results from Redis instead of using chord arguments
- Early exit in run_impl when Redis processed set is empty (another
  finisher already merged our uploads)
- Reduce base_retry_countdown to 30s (from 200s default) so retried
  finishers discover "nothing to merge" sooner and exit faster

Made-with: Cursor
- Skip the Redis-based early exit when processing_results were
  reconstructed from DB (Redis TTL may have expired while uploads
  still need merging).
- Add _setup_mock_redis_for_processing helper so tests exercising
  the full merge flow correctly bypass the cooperative early exit.
- Update _reconstruct_processing_results callers to use
  get_all_processed_uploads instead of get_uploads_for_merging.

Made-with: Cursor
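
The guard added in the second commit can be read as a two-input decision. The sketch below is an assumption-labeled illustration: `should_exit_early` and both parameter names are hypothetical, and the actual condition in `run_impl` may be structured differently.

```python
def should_exit_early(redis_has_pending: bool, reconstructed_from_db: bool) -> bool:
    """Decide whether a finisher may return before trying the lock."""
    # If the results were rebuilt from the database, the Redis keys may
    # simply have expired, so an empty Redis set proves nothing.
    if reconstructed_from_db:
        return False
    # Otherwise an empty Redis state means another finisher already merged.
    return not redis_has_pending
```

Only one of the four input combinations exits early: Redis shows nothing pending and the results were not reconstructed from the database.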
@thomasrockhu-codecov force-pushed the tom/fix-finisher-blocking-timeout branch from 716ae09 to 3e575fe on March 5, 2026 22:37
@thomasrockhu-codecov force-pushed the tom/cooperative-finisher-merging branch from 67bc08f to 977b1fa on March 5, 2026 22:37