Skip to content

fix(worker): mark uploads as error on permanent finisher failure#728

Closed
thomasrockhu-codecov wants to merge 1 commit intofix/catch-bot-exceptions-in-load-commit-difffrom
fix/mark-uploads-error-on-permanent-failure
Closed

fix(worker): mark uploads as error on permanent finisher failure#728
thomasrockhu-codecov wants to merge 1 commit intofix/catch-bot-exceptions-in-load-commit-difffrom
fix/mark-uploads-error-on-permanent-failure

Conversation

@thomasrockhu-codecov
Copy link
Contributor

@thomasrockhu-codecov thomasrockhu-codecov commented Feb 26, 2026

Summary

  • When the upload_finisher fails permanently (unrecoverable exception, soft time limit, or max lock retries exceeded), uploads remained in "started" state
  • The next upload to the same commit would re-discover these stuck uploads and spawn another finisher that fails again, creating a retry loop that inflated the queue (~60k tasks)
  • Add _mark_uploads_as_error() helper that best-effort transitions uploads to "error" state, breaking the cycle
  • Called from all four permanent failure paths: SoftTimeLimitExceeded, generic Exception, and both LockRetry max-retries-exceeded handlers
  • The helper rolls back the (possibly broken) DB session first and is wrapped in try/except so a dead DB connection does not cause a secondary failure

Stacked on #727.

Test plan

  • test_generic_exception_marks_uploads_as_error — creates uploads in "started" state, triggers an unrecoverable error, verifies uploads transition to "error"
  • test_soft_time_limit_marks_uploads_as_error — same for SoftTimeLimitExceeded
  • CI passes

Made with Cursor


Note

Medium Risk
Changes UploadFinisherTask failure handling to mutate upload state in the DB on several permanent-failure paths; mistakes could incorrectly mark uploads as failed or impact retry behavior.

Overview
Prevents upload-finisher retry loops by best-effort marking affected uploads as error when the finisher hits a permanent failure (soft time limit, unexpected exception, or lock retry exhaustion).

Adds _mark_uploads_as_error() (rollback + bulk update + commit, guarded by try/except) and wires it into the permanent failure handlers, with new unit tests asserting uploads transition from started/UPLOADED to error/ERROR for generic exceptions and SoftTimeLimitExceeded.

Written by Cursor Bugbot for commit 0d60ee9. This will update automatically on new commits. Configure here.

When the upload_finisher fails permanently (unrecoverable exception,
soft time limit, or max lock retries exceeded), uploads remained in
"started" state. The next upload to the same commit would re-discover
these stuck uploads and spawn another finisher that fails again,
creating a retry loop that inflated the queue.

Add _mark_uploads_as_error() helper that best-effort transitions
uploads to "error" state, breaking the cycle. Called from all four
permanent failure paths: SoftTimeLimitExceeded, generic Exception,
and both LockRetry max-retries-exceeded handlers.

The helper rolls back the (possibly broken) DB session first and is
wrapped in try/except so a dead DB connection does not cause a
secondary failure.

Made-with: Cursor
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Upload.state_id: UploadState.ERROR.db_id,
},
synchronize_session="fetch",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing state filter may revert processed uploads to error

High Severity

_mark_uploads_as_error updates all uploads matching upload_ids regardless of their current state. When _handle_finisher_lock hits max lock retries (or a SoftTimeLimitExceeded/Exception fires during that phase), uploads have already been committed as "processed" by the earlier _process_reports_with_lock call. The unfiltered update will revert them from "processed" to "error", incorrectly discarding successful work. The query needs an additional filter like Upload.state == "started" to only affect uploads that are genuinely stuck.

Additional Locations (1)

Fix in Cursor Fix in Web

"repoid": repoid,
},
)
self._mark_uploads_as_error(db_session, upload_ids)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committed error state alters caller's control flow unintentionally

Medium Severity

When _process_reports_with_lock hits max lock retries, calling _mark_uploads_as_error commits the uploads to "error" state before returning. Back in run_impl, the remaining_uploads query filters on UploadState.UPLOADED — which now finds zero matches — so execution falls through to _handle_finisher_lock, triggering notifications and post-processing even though report merging never happened. Before this change, uploads stayed in UPLOADED state, remaining_uploads > 0 held true, and run_impl returned early.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant