resource/aws_batch_compute_environment: Retry delete on stale JobQueue relationship#46927

Open
quinnjr wants to merge 2 commits into hashicorp:main from quinnjr:b-batch_compute_environment-delete-retry-jq-relationship

@quinnjr quinnjr commented Mar 13, 2026

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request

Description

When an aws_batch_compute_environment is replaced (via create_before_destroy / Pulumi deleteBeforeReplace: false), the provider correctly:

  1. Creates the new compute environment
  2. Updates the referencing aws_batch_job_queue to point to the new CE
  3. Attempts to delete the old CE

Step 3 consistently fails with:

ClientException: Cannot delete, found existing JobQueue relationship.

This happens because AWS Batch is eventually consistent: the job queue update returns success from the API, but the old CE still shows a job queue relationship when the delete is attempted moments later.

This PR wraps the DeleteComputeEnvironment call in tfresource.Retry, retrying on the specific ClientException containing "existing JobQueue relationship". This follows the same pattern used in the EKS cluster delete path for ResourceInUseException (internal/service/eks/cluster.go).

Affected Resource(s)

  • aws_batch_compute_environment

How Has This Been Tested?

  • The change compiles cleanly (go build ./internal/service/batch/...)
  • go vet passes
  • Existing unit tests pass (go test ./internal/service/batch/... -run "TestExpand|TestFlatten")
  • Existing acceptance test TestAccBatchComputeEnvironment_disappears covers the delete path
  • Reproduced the original error in a production Pulumi deployment with AWS Batch (EC2 compute type with BEST_FIT_PROGRESSIVE allocation strategy and a regular service role)

AI Disclosure

AI assistance (Claude) was used to research the provider's retry patterns across other services (EKS, ECR, S3) and to draft the initial code structure. The fix was reviewed and validated by a human contributor who experienced the bug firsthand. All code has been read, understood, and verified.

References

Made with Cursor

…e relationship

When a compute environment is replaced (create_before_destroy), the
referencing job queue is updated to point to the new CE before the old
one is deleted. Due to AWS Batch eventual consistency, the old CE may
still show a job queue relationship when the delete is attempted,
causing "Cannot delete, found existing JobQueue relationship".

Wrap DeleteComputeEnvironment in tfresource.Retry to retry on this
specific ClientException, matching the EKS cluster delete pattern.

Fixes hashicorp#46925

Made-with: Cursor
@quinnjr quinnjr requested a review from a team as a code owner March 13, 2026 18:34
@github-actions
Contributor

Community Guidelines

This comment is added to every new Pull Request to provide quick reference to how the Terraform AWS Provider is maintained. Please review the information below, and thank you for contributing to the community that keeps the provider thriving! 🚀

Voting for Prioritization

  • Please vote on this Pull Request by adding a 👍 reaction to the original post to help the community and maintainers prioritize it.
  • Please see our prioritization guide for additional information on how the maintainers handle prioritization.
  • Please do not leave +1 or other comments that do not add relevant new information or questions; they generate extra noise for others following the Pull Request and do not help prioritize the request.

Pull Request Authors

  • Review the contribution guide relating to the type of change you are making to ensure all of the necessary steps have been taken.
  • Whether or not the branch has been rebased will not impact prioritization, but doing so is always a welcome surprise.

@github-actions bot added the needs-triage, service/batch, and size/XS labels Mar 13, 2026
@dosubot bot added the bug label Mar 13, 2026
Labels

  • bug: Addresses a defect in current functionality.
  • needs-triage: Waiting for first response or review from a maintainer.
  • service/batch: Issues and PRs that pertain to the batch service.
  • size/XS: Managed by automation to categorize the size of a PR.

Development

Successfully merging this pull request may close these issues.

aws_batch_compute_environment replacement fails with "Cannot delete, found existing JobQueue relationship" despite create_before_destroy
