
feat(worker): DB-backed ProcessingState (replaces Redis) #749

Merged
thomasrockhu-codecov merged 10 commits into tomhu/finisher-source-of-truth from tomhu/processing-state-db-all
Mar 11, 2026
Conversation

thomasrockhu-codecov (Contributor) commented Mar 10, 2026

Summary

Replaces Redis-based upload processing state tracking with the database as the single source of truth. ProcessingState now reads and writes Upload.state_id directly, eliminating Redis sets for processing/processed/merged state.

Changes by layer

  1. UploadState.MERGED enum — new state (db_id=6) representing uploads fully merged into the master report
  2. ProcessingState DB-backed queries — get_upload_numbers and get_uploads_for_merging use Upload.state_id counts/filters, scoped to coverage reports (report_type IS NULL OR report_type = 'coverage')
  3. Processor dual-write — process_upload passes db_session to ProcessingState. Only state_id is set to PROCESSED (not the legacy state string) to avoid tripping the finisher's idempotency check
  4. Finisher DB reads — the finisher passes db_session and reads merge candidates from the DB. mark_uploads_as_merged includes a PROCESSED state guard
  5. MERGED lifecycle — update_uploads sets state="merged" and state_id=MERGED after a successful merge
  6. Remove dual-write — db_session is now required on ProcessingState. All Redis operations removed. mark_uploads_as_processing is a no-op (uploads start as UPLOADED). The safety-net finisher trigger and clear_in_progress_uploads are resilient to transaction failures
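
The state_id-based counting in item 2 can be sketched with stdlib sqlite3 standing in for the real SQLAlchemy/Postgres session; the table shape, db_id values, and function name here follow the PR description but are simplified for illustration:

```python
import sqlite3

# db_ids per the PR summary and review comments: UPLOADED=1, PROCESSED=2, MERGED=6.
UPLOADED, PROCESSED, MERGED = 1, 2, 6

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (id INTEGER PRIMARY KEY, report_type TEXT, state_id INTEGER)")
conn.executemany(
    "INSERT INTO uploads (report_type, state_id) VALUES (?, ?)",
    [(None, UPLOADED), ("coverage", PROCESSED), ("coverage", PROCESSED), ("test_results", PROCESSED)],
)

def get_upload_numbers(conn):
    # Counts scoped to coverage reports, mirroring the PR's filter:
    # report_type IS NULL OR report_type = 'coverage'
    row = conn.execute(
        """
        SELECT SUM(state_id = ?) AS processing,
               SUM(state_id = ?) AS processed
        FROM uploads
        WHERE report_type IS NULL OR report_type = 'coverage'
        """,
        (UPLOADED, PROCESSED),
    ).fetchone()
    return {"processing": row[0], "processed": row[1]}

print(get_upload_numbers(conn))  # {'processing': 1, 'processed': 2}
```

Note how the test_results upload is excluded by the report_type filter, which is what keeps this change from interfering with the bundle analysis and test results pipelines.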

Key design decisions

  • state vs state_id separation: The processor only sets state_id. The legacy state string is set after merging, preserving the finisher's idempotency check
  • Coverage-only scope: DB queries filter by report_type IS NULL OR report_type = 'coverage' to avoid interfering with bundle analysis / test results pipelines
  • Best-effort cleanup: clear_in_progress_uploads and the safety-net finisher trigger are wrapped in try/except since they run in error-recovery paths where the transaction may be aborted
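
The best-effort cleanup decision can be sketched as follows; the function and class names are hypothetical, and the actual cleanup query is elided:

```python
import logging

log = logging.getLogger(__name__)

def clear_in_progress_uploads_best_effort(db_session) -> None:
    # Runs in error-recovery paths where the transaction may already be
    # aborted, so a failure here must never mask the original error.
    try:
        db_session.execute("UPDATE ...")  # real cleanup query elided in this sketch
        db_session.commit()
    except Exception:
        log.warning("best-effort cleanup failed; uploads stay UPLOADED", exc_info=True)

class AbortedSession:
    # Stand-in for a session whose transaction has already failed.
    def execute(self, *_):
        raise RuntimeError("transaction aborted")
    def commit(self):
        pass

clear_in_progress_uploads_best_effort(AbortedSession())  # logs a warning, does not raise
```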

Test plan

  • Unit tests for all ProcessingState DB paths
  • Unit tests for upload task schedule_task signature
  • Unit tests for processor process_upload
  • Integration tests (test_full_upload, test_full_carryforward)
  • Finisher test updated for MERGED state assertions

Note

Medium Risk
Replaces Redis-based processing/merging state with database queries and state transitions, affecting core upload/merge orchestration and finisher triggering. Risk is mitigated by scoping DB queries to coverage reports and adding extensive unit tests, but regressions could block merges or notifications if state transitions are wrong.

Overview
Upload processing state is now DB-backed instead of Redis-backed. ProcessingState now requires a SQLAlchemy db_session and derives processing/processed counts and merge candidates from Upload.state_id (coverage-only: report_type IS NULL OR report_type = 'coverage').

The processor and finisher were updated to pass db_session and to transition uploads via state_id (UPLOADED → PROCESSED → MERGED), with mark_uploads_as_processing becoming a no-op and clear_in_progress_uploads best-effort marking stuck UPLOADED uploads as ERROR. UploadTask.schedule_task now takes db_session so coverage scheduling can initialize ProcessingState, and new/updated tests assert the DB-driven lifecycle and merge/error outcomes.
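
The state_id lifecycle can be sketched as a minimal enum; the db_id values for UPLOADED and PROCESSED (1 and 2) come from the review comments below and MERGED=6 from the PR summary, while any other states the real UploadState enum carries are omitted here:

```python
from enum import Enum

class UploadState(Enum):
    UPLOADED = 1   # initial state at upload creation
    PROCESSED = 2  # processor done, waiting for merge
    MERGED = 6     # incorporated into the master report

    @property
    def db_id(self) -> int:
        return self.value

# The lifecycle driven by state_id:
lifecycle = [UploadState.UPLOADED, UploadState.PROCESSED, UploadState.MERGED]
print(" -> ".join(s.name for s in lifecycle))  # UPLOADED -> PROCESSED -> MERGED
```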

Written by Cursor Bugbot for commit eb9c82e.

thomasrockhu-codecov changed the title from "feat(worker): DB-backed ProcessingState with dual-write" to "feat(worker): DB-backed ProcessingState (replaces Redis)" Mar 10, 2026
sentry bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.21%. Comparing base (9add94b) to head (9726587).
⚠️ Report is 10 commits behind head on tomhu/finisher-source-of-truth.

⚠️ Current head 9726587 differs from pull request most recent head eb9c82e

Please upload reports for the commit eb9c82e to get more accurate results.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
apps/worker/services/processing/state.py 92.59% 2 Missing ⚠️
Additional details and impacted files
@@                        Coverage Diff                         @@
##           tomhu/finisher-source-of-truth     #749      +/-   ##
==================================================================
- Coverage                           92.25%   92.21%   -0.05%     
==================================================================
  Files                                1304     1304              
  Lines                               47973    47909      -64     
  Branches                             1628     1628              
==================================================================
- Hits                                44259    44179      -80     
- Misses                               3405     3421      +16     
  Partials                              309      309              
Flag Coverage Δ
workerintegration 58.64% <84.21%> (-0.06%) ⬇️
workerunit 90.20% <84.21%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown.


Add an optional db_session parameter to ProcessingState. When provided,
all methods use DB queries (Upload.state_id) instead of Redis sets.
When omitted, behavior is unchanged (Redis path).

DB-backed methods:
- get_upload_numbers: COUNT by state_id (UPLOADED=processing, PROCESSED=processed)
- mark_upload_as_processed: UPDATE state_id to PROCESSED
- mark_uploads_as_merged: UPDATE state_id to MERGED
- get_uploads_for_merging: SELECT WHERE state_id=PROCESSED LIMIT batch_size
- mark_uploads_as_processing / clear_in_progress_uploads: no-op (DB path)

No callers change in this PR -- this is a pure capability addition.

Made-with: Cursor
Activate the DB-backed state path in process_upload() by passing
db_session to ProcessingState. The processor now writes PROCESSED
state to the database instead of Redis.

Also removes the should_trigger_postprocessing check and direct
finisher triggering from the processor -- this orphaned-task recovery
will be replaced by the gate key mechanism in a later PR.

Made-with: Cursor
The DB-backed path was skipping Redis writes, but the finisher still
reads from Redis. Keep writing to both until the finisher migrates
to DB-backed state in a later PR.

Made-with: Cursor
When db_session is present, both the DB path and the Redis fall-through
path were incrementing CLEARED_UPLOADS. Move the Redis srem inside the
DB block and return early so the metric is only counted once.

Made-with: Cursor
1. Add Redis srem inside the DB block so stale entries in the Redis
   "processed" set are cleaned up during dual-write.
2. Add PROCESSED state filter to prevent accidentally overwriting
   ERROR-state uploads.

Made-with: Cursor
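
The PROCESSED state guard described above can be sketched with sqlite3 standing in for the real session; the ERROR db_id used here (5) is a guess for illustration only:

```python
import sqlite3

UPLOADED, PROCESSED, ERROR, MERGED = 1, 2, 5, 6  # ERROR's db_id is assumed for this sketch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (id INTEGER PRIMARY KEY, state_id INTEGER)")
conn.executemany("INSERT INTO uploads (id, state_id) VALUES (?, ?)",
                 [(1, PROCESSED), (2, ERROR)])

def mark_uploads_as_merged(conn, upload_ids):
    # The guard: only PROCESSED uploads transition to MERGED, so an upload
    # that moved to ERROR after being selected is never silently overwritten.
    placeholders = ",".join("?" * len(upload_ids))
    conn.execute(
        f"UPDATE uploads SET state_id = ? WHERE id IN ({placeholders}) AND state_id = ?",
        (MERGED, *upload_ids, PROCESSED),
    )

mark_uploads_as_merged(conn, [1, 2])
print(dict(conn.execute("SELECT id, state_id FROM uploads")))  # {1: 6, 2: 5}
```

Upload 1 advances to MERGED while upload 2 keeps its ERROR state untouched.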
Change update_uploads() to write state_id=MERGED, state="merged"
for successful uploads instead of PROCESSED. This completes the
semantic distinction: PROCESSED means "processor done, waiting for
merge" while MERGED means "incorporated into the master report."

Safe because the finisher's idempotency check already recognizes
the "merged" state (done in the previous PR).

Made-with: Cursor
- Make db_session a required parameter on ProcessingState
- Remove all Redis operations (sadd, srem, smove, scard, srandmember)
- Remove PROCESSING_STATE_TTL, _redis_key(), get_redis_connection import
- mark_uploads_as_processing is now an explicit no-op (uploads already
  exist with state_id=UPLOADED which get_upload_numbers counts)
- Pass db_session through upload.py schedule_task chain
- Remove redis_state workaround in processing.py safety-net trigger,
  reuse the existing DB-backed state instance
- Remove all Redis-only and Redis-mock unit tests

Made-with: Cursor
This runs in a finally block where the DB transaction may already be
in a failed state. Wrap in try/except so it doesn't mask the original
error. The upload stays UPLOADED, which is safe.

Made-with: Cursor
Don't rely on the task framework's finally cleanup to persist the
MERGED state — commit immediately after the update.

Made-with: Cursor
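
Why the immediate commit matters can be demonstrated with two sqlite3 connections standing in for the processor's and finisher's separate DB sessions (the real code uses SQLAlchemy sessions against Postgres):

```python
import os
import sqlite3
import tempfile

UPLOADED, PROCESSED = 1, 2

path = os.path.join(tempfile.mkdtemp(), "state.db")
writer = sqlite3.connect(path)  # stands in for the processor's session
writer.execute("CREATE TABLE uploads (id INTEGER PRIMARY KEY, state_id INTEGER)")
writer.execute("INSERT INTO uploads VALUES (1, ?)", (UPLOADED,))
writer.commit()

reader = sqlite3.connect(path)  # stands in for the finisher's separate session

writer.execute("UPDATE uploads SET state_id = ? WHERE id = 1", (PROCESSED,))
# Uncommitted: the "finisher" still sees the old state.
before = reader.execute("SELECT state_id FROM uploads WHERE id = 1").fetchone()[0]

writer.commit()  # commit BEFORE dispatching the finisher task
after = reader.execute("SELECT state_id FROM uploads WHERE id = 1").fetchone()[0]

print(before, after)
```

Until the writer commits, the reader observes UPLOADED; dispatching the finisher before that commit is exactly the visibility gap the Bugbot comment below flags.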
thomasrockhu-codecov force-pushed the tomhu/processing-state-db-all branch from 9726587 to 2d7ec3f on March 11, 2026 18:13
thomasrockhu-codecov changed the base branch from main to tomhu/single-finisher-gate March 11, 2026 18:13
```diff
 class ProcessingState:
-    def __init__(self, repoid: int, commitsha: str) -> None:
-        self._redis = get_redis_connection()
+    def __init__(self, repoid: int, commitsha: str, db_session: Session) -> None:
```

Bug: The call to ProcessingState() in upload_finisher.py is missing the required db_session argument, which will cause a TypeError at runtime.
Severity: CRITICAL

Suggested Fix

Update the ProcessingState instantiation in upload_finisher.py to pass the db_session argument, which is available in the run_impl method's scope. The call should be ProcessingState(repoid, commitid, db_session).

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: apps/worker/services/processing/state.py#L82

Potential issue: The `ProcessingState.__init__` method was updated to require a
`db_session` argument. However, the instantiation of `ProcessingState` in
`upload_finisher.py` (line 303) was not updated to pass this required argument. The
`db_session` is available in the scope of the calling `run_impl` method. This omission
will cause a `TypeError` every time the `UploadFinisherTask` is executed, which will
crash the task and block the entire coverage report finalization pipeline.

Update upload_finisher to construct ProcessingState with db_session after DB-only state migration so reconstruction and merge readiness checks use the new required interface.

Made-with: Cursor
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

```python
assert upload.state_id == UploadState.PROCESSED.db_id
# state string is not updated by the processor -- the finisher sets it
# after merging (to avoid triggering the finisher's idempotency check early)
assert upload.state == "started"
```

Test assertion contradicts mock making test always fail

High Severity

The new assertions at lines 91–95 expect upload.state_id == UploadState.PROCESSED.db_id, but ProcessingState is fully mocked at line 56, making mark_upload_as_processed a no-op MagicMock. The upload is created with state_id=UploadState.UPLOADED.db_id (value 1), and nothing changes it, so the assertion comparing against PROCESSED.db_id (value 2) will always raise AssertionError.


```python
celery_app.tasks[upload_finisher_task_name].apply_async(
    kwargs=finisher_kwargs
)
```

No DB commit before dispatching async finisher task

Medium Severity

mark_upload_as_processed sets state_id=PROCESSED on the ORM object but the transaction is never committed before the finisher is dispatched via apply_async(). The finisher runs in a separate DB session and cannot see uncommitted changes. With the previous Redis-based approach, smove was immediately visible cross-process. The finisher's get_uploads_for_merging() may find zero PROCESSED uploads, falling through to a legacy fallback path.


Base automatically changed from tomhu/single-finisher-gate to tomhu/finisher-source-of-truth March 11, 2026 18:45
thomasrockhu-codecov merged commit eb9c82e into tomhu/finisher-source-of-truth Mar 11, 2026
25 of 28 checks passed
thomasrockhu-codecov deleted the tomhu/processing-state-db-all branch March 11, 2026 18:45