feat(worker): pass db_session to ProcessingState in finisher#745

Open
thomasrockhu-codecov wants to merge 5 commits into tomhu/processing-state-db-path from tomhu/finisher-passes-db-session

Conversation


@thomasrockhu-codecov (Contributor) commented Mar 10, 2026

Summary

  • Passes db_session to ProcessingState in the upload finisher, activating DB-backed state tracking.
  • Simplifies _reconstruct_processing_results by removing the Redis-expiry fallback path -- DB state is durable and doesn't expire like Redis keys (24h TTL).
  • Removes _find_started_uploads_with_reports entirely (no longer needed).
  • Fixes the idempotency check to recognize "merged" state alongside "processed" and "error", preventing re-merge of already-merged uploads when the finisher retries.

Net -173 lines of code.

Stacked on #742.

Test plan

  • CI passes
  • Idempotency test includes a merged-state upload and verifies skip
  • Reconstruction test verifies empty return when no PROCESSED uploads exist
  • Existing finisher behavior (merging, notifications, lock handling) unchanged
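The idempotency fix described above can be sketched as follows. This is a minimal illustration, not the actual worker code: `Upload`, `should_skip_finisher`, and `FINAL_STATES` are hypothetical stand-ins for the real SQLAlchemy model and finisher logic.

```python
from dataclasses import dataclass

# States treated as final by the fixed check; "merged" is the new addition.
FINAL_STATES = ("processed", "merged", "error")

@dataclass
class Upload:
    id_: int
    state: str

def should_skip_finisher(uploads_in_db, upload_ids):
    # Only skip if ALL uploads exist in DB and ALL are in final states
    if len(uploads_in_db) != len(upload_ids):
        return False
    return all(u.state in FINAL_STATES for u in uploads_in_db)
```

Before this PR, a retrying finisher would not recognize "merged" as final and could re-merge uploads that had already been merged.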

Made with Cursor


Note

Medium Risk
Changes how upload completion/merge readiness is determined by switching the finisher to DB-backed ProcessingState and simplifying reconstruction logic; mistakes could cause missed merges or duplicate finisher work. Also adjusts upload state transitions (PROCESSED/MERGED) and Redis cleanup, which can affect retry/recovery paths.

Overview
Switches upload processing/finisher coordination to persist upload progress in the DB by passing db_session into ProcessingState (processor + finisher) and dual-writing Redis only for lightweight counting.

Simplifies orphaned/retry recovery by reconstructing finisher processing_results from DB Upload.state_id == PROCESSED instead of Redis TTL fallbacks, and tightens finisher idempotency to treat merged as a final state.

Updates ProcessingState to remove DB short-circuits, keep Redis sets in sync during DB-driven clears/merges, and adds/adjusts tests to assert processor marks state_id as PROCESSED while leaving the string state untouched until merging.
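The DB-backed reconstruction described above can be illustrated with a small sketch. The names here are assumptions for illustration (`PROCESSED_DB_ID`, the `Upload` stand-in, and the result shape); the real code queries `Upload.state_id` through SQLAlchemy.

```python
from dataclasses import dataclass

PROCESSED_DB_ID = 5  # hypothetical enum value for the PROCESSED state

@dataclass
class Upload:
    id_: int
    state_id: int = 0

def reconstruct_processing_results(uploads):
    # DB state is durable (no 24h TTL), so a plain filter on state_id
    # replaces the old Redis-expiry fallback path.
    return [
        {"upload_id": u.id_, "successful": True}
        for u in uploads
        if u.state_id == PROCESSED_DB_ID
    ]
```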

Written by Cursor Bugbot for commit b1ae759. This will update automatically on new commits.


codecov-notifications bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.



sentry bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.21%. Comparing base (3b30446) to head (b1ae759).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@                        Coverage Diff                         @@
##           tomhu/processing-state-db-path     #745      +/-   ##
==================================================================
- Coverage                           92.25%   92.21%   -0.04%     
==================================================================
  Files                                1304     1304              
  Lines                               47949    47933      -16     
  Branches                             1628     1628              
==================================================================
- Hits                                44233    44203      -30     
- Misses                               3407     3421      +14     
  Partials                              309      309              
Flag               Coverage            Δ
workerintegration  58.62% <88.88%>     (+0.10%) ⬆️
workerunit         90.24% <100.00%>    (-0.10%) ⬇️

Flags with carried forward coverage won't be shown.


@thomasrockhu-codecov force-pushed the tomhu/finisher-passes-db-session branch from 17aa4ce to debeee6 on March 10, 2026 14:12
@thomasrockhu-codecov force-pushed the tomhu/processing-state-db-path branch 2 times, most recently from 6ce1a54 to 9c656c6 on March 10, 2026 14:19
@thomasrockhu-codecov force-pushed the tomhu/finisher-passes-db-session branch from debeee6 to 90240f9 on March 10, 2026 14:19
@thomasrockhu-codecov marked this pull request as ready for review on March 10, 2026 14:23

# Load Upload records from database to get arguments
uploads = db_session.query(Upload).filter(Upload.id_.in_(upload_ids)).all()


Bug: The upload processor updates Redis state but not the database. When the finisher reconstructs state from the DB, it finds no uploads, causing them to be orphaned and not merged.
Severity: HIGH

Suggested Fix

Ensure the database state is synchronized with the Redis state before the finisher's reconstruction logic is called. The processor task should update the Upload.state_id field in the database to PROCESSED when an upload is finished, in addition to updating the Redis sets.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: apps/worker/tasks/upload_finisher.py#L181

Potential issue: When an upload processor task triggers the finisher directly for
recovery, the finisher attempts to reconstruct the list of processed uploads. The
finisher now creates a `ProcessingState` with a `db_session`, causing it to query the
database for uploads with `state_id == PROCESSED.db_id`. However, the processor only
updates the state in Redis and does not set the `state_id` in the database. As a result,
the finisher's query finds no uploads, causing them to be silently ignored and not
merged, which leads to incomplete coverage reports.


Contributor Author


Addressed by PR 2b (stacked below): the processor dual-writes to both Redis and DB, so the finisher's DB-backed queries find the uploads.
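The dual-write the author describes can be sketched with in-memory stand-ins. `FakeRedis`, `FakeUpload`, the key format, and `PROCESSED_DB_ID` are all illustrative assumptions, not the worker's real API; the point is only that the processor marks an upload PROCESSED in both stores so the finisher's DB-backed query finds it.

```python
PROCESSED_DB_ID = 5  # hypothetical enum value

class FakeRedis:
    """Minimal stand-in for the redis-py set API used here."""
    def __init__(self):
        self.sets = {}
    def sadd(self, key, *members):
        self.sets.setdefault(key, set()).update(members)
    def scard(self, key):
        return len(self.sets.get(key, set()))

class FakeUpload:
    def __init__(self, id_):
        self.id_ = id_
        self.state_id = None

def mark_upload_processed(redis, uploads_by_id, repoid, commitid, upload_id):
    key = f"upload/processing/{repoid}/{commitid}/processed"
    redis.sadd(key, upload_id)                           # Redis: lightweight counting
    uploads_by_id[upload_id].state_id = PROCESSED_DB_ID  # DB: durable source of truth
```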

@thomasrockhu-codecov force-pushed the tomhu/processing-state-db-path branch from 9c656c6 to e0b3bdf on March 10, 2026 14:30
@thomasrockhu-codecov force-pushed the tomhu/finisher-passes-db-session branch from 90240f9 to f87f322 on March 10, 2026 14:31
@thomasrockhu-codecov force-pushed the tomhu/finisher-passes-db-session branch from f87f322 to 2204abd on March 10, 2026 14:45
@thomasrockhu-codecov force-pushed the tomhu/processing-state-db-path branch from e0b3bdf to 696d952 on March 10, 2026 15:10
@thomasrockhu-codecov force-pushed the tomhu/finisher-passes-db-session branch from 2204abd to 3b5f4ba on March 10, 2026 15:10
Activate the DB-backed state path in process_upload() by passing
db_session to ProcessingState. The processor now writes PROCESSED
state to the database instead of Redis.

Also removes the should_trigger_postprocessing check and direct
finisher triggering from the processor -- this orphaned-task recovery
will be replaced by the gate key mechanism in a later PR.

Made-with: Cursor
@thomasrockhu-codecov force-pushed the tomhu/processing-state-db-path branch from 696d952 to 3b30446 on March 10, 2026 15:20
@thomasrockhu-codecov force-pushed the tomhu/finisher-passes-db-session branch from 3b5f4ba to 9baf0ce on March 10, 2026 15:21
Comment on lines 209 to 216
  # Only skip if ALL uploads exist in DB and ALL are in final states
  if len(uploads_in_db) == len(upload_ids):
      all_already_processed = all(
-         upload.state in ("processed", "error") for upload in uploads_in_db
+         upload.state in ("processed", "merged", "error")
+         for upload in uploads_in_db
      )
      if all_already_processed:
          log.info(

Bug: The idempotency check incorrectly treats the intermediate state='processed' as a final state, causing post-processing steps like notifications to be skipped on task retry after a crash.
Severity: HIGH

Suggested Fix

Remove 'processed' from the list of terminal states in the idempotency check at the beginning of run_impl. The check should only consider truly final states, such as 'merged' and 'error', to prevent the task from exiting prematurely when an intermediate state is present upon retry.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: apps/worker/tasks/upload_finisher.py#L209-L216

Potential issue: The idempotency check in `run_impl` now treats
`upload.state='processed'` as a terminal state, causing the task to exit prematurely on
a retry. However, the state is set to 'processed' midway through the task's execution,
before post-processing steps like sending notifications are complete. If the task
crashes or times out after setting the state to 'processed' but before finishing, a
subsequent retry will incorrectly assume all work is done and skip sending notifications
or invalidating the cache. This leads to a situation where a report is successfully
merged, but the user is never notified.

Contributor Author


Fixed in PR 2b: processor no longer sets state string to 'processed'. The idempotency check works correctly now.

The DB-backed path was skipping Redis writes, but the finisher still
reads from Redis. Keep writing to both until the finisher migrates
to DB-backed state in a later PR.

Made-with: Cursor
When db_session is present, both the DB path and the Redis fall-through
path were incrementing CLEARED_UPLOADS. Move the Redis srem inside the
DB block and return early so the metric is only counted once.

Made-with: Cursor
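The single-count fix in the commit message above can be sketched roughly as follows. All names here (`clear_uploads`, the set-based Redis stand-in, the dict counter) are illustrative, not the real worker API; the shape of the fix is moving the Redis removal inside the DB block and returning early.

```python
CLEARED_UPLOADS = {"count": 0}  # stand-in for a metrics counter

def clear_uploads(db_session, redis_set, upload_ids):
    if db_session is not None:
        # DB path: keep the Redis set in sync and count the metric once...
        for uid in upload_ids:
            redis_set.discard(uid)
        CLEARED_UPLOADS["count"] += len(upload_ids)
        return  # ...and skip the Redis fall-through below
    # Redis-only path (no db_session)
    for uid in upload_ids:
        redis_set.discard(uid)
    CLEARED_UPLOADS["count"] += len(upload_ids)
```

Without the early return, an upload cleared on the DB path would fall through and increment the counter a second time.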
Activate the DB-backed state path in the upload finisher:

- Pass db_session to ProcessingState, using DB as source of truth
  for upload state instead of Redis sets
- Simplify _reconstruct_processing_results: remove Redis-expiry
  fallback since DB state is durable (no TTL)
- Remove _find_started_uploads_with_reports (no longer needed)
- Fix idempotency check to recognize "merged" state, preventing
  re-merge of already-merged uploads when finisher retries

Made-with: Cursor
@thomasrockhu-codecov force-pushed the tomhu/finisher-passes-db-session branch from 9baf0ce to ae13f48 on March 10, 2026 15:46
1. Add Redis srem inside the DB block so stale entries in the Redis
   "processed" set are cleaned up during dual-write.
2. Add PROCESSED state filter to prevent accidentally overwriting
   ERROR-state uploads.

Made-with: Cursor
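Point 2 of the commit message above (the PROCESSED filter) can be sketched like this. The state ids and dict-based model are hypothetical stand-ins for the real enum and SQLAlchemy UPDATE.

```python
PROCESSED, MERGED, ERROR = 5, 6, 7  # hypothetical state ids

def mark_uploads_as_merged(uploads, upload_ids):
    for upload in uploads:
        # The PROCESSED filter ensures an upload already in ERROR
        # is never flipped to MERGED by the merge transition.
        if upload["id"] in upload_ids and upload["state_id"] == PROCESSED:
            upload["state_id"] = MERGED
            upload["state"] = "merged"
```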

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


  # Only skip if ALL uploads exist in DB and ALL are in final states
  if len(uploads_in_db) == len(upload_ids):
      all_already_processed = all(
          upload.state in ("processed", "error") for upload in uploads_in_db

Uncommitted DB state after mark_uploads_as_merged

High Severity

state.mark_uploads_as_merged(upload_ids) now performs a DB UPDATE (setting state_id=MERGED and state="merged") because state was changed to include db_session. However, this call happens after the db_session.commit() at line 370, and no subsequent commit is guaranteed. In particular, the early-return path where remaining_uploads > 0 never commits these changes. The uncommitted MERGED transition gets rolled back when the session closes, leaving uploads stuck in PROCESSED state. On the next finisher invocation, get_uploads_for_merging returns these uploads again (matching state_id == PROCESSED), causing them to be merged into the report a second time, producing duplicate coverage data. Before this PR, mark_uploads_as_merged only did Redis srem (no db_session), so no commit was needed.
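One way to address the report above is to commit the MERGED writes before any early return. This is a hedged sketch under stand-in names (`FakeSession`, `finish_merge`), not the actual fix; the real session is a SQLAlchemy session whose uncommitted work is rolled back on close.

```python
class FakeSession:
    """Stand-in for a transactional session that rolls back on close."""
    def __init__(self):
        self.pending = []
        self.committed = []
    def add(self, change):
        self.pending.append(change)
    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []
    def close(self):
        self.pending = []  # uncommitted changes are lost

def finish_merge(session, merged_ids, remaining_uploads):
    for uid in merged_ids:
        session.add(("merged", uid))
    session.commit()  # commit BEFORE any early return
    if remaining_uploads > 0:
        return "waiting"  # early-return path no longer loses the MERGED writes
    return "done"
```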

