[AI Generated] Azure: surface per-resource errors on truncated DeploymentFailed#4438
[AI Generated] Azure: surface per-resource errors on truncated DeploymentFailed#4438
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves Azure ARM deployment failure diagnostics in LISA by collecting and surfacing per-resource deployment operation errors when Azure returns truncated or otherwise unactionable top-level HttpResponseError details.
Changes:
- Add a helper to list ARM deployment operations and extract failed sub-resource errors.
- Extend
_deploy()error handling to optionally append these sub-resource errors when top-level deployment failures are aggregated/truncated or missing details.
✅ AI Test Selection — PASSED1 test case(s) selected (view run) Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-arm64 latest
Test case details
|
Catch Azure-specific exceptions (HttpResponseError, ResourceNotFoundError) explicitly and log them at debug. Keep a broad fallback for unexpected errors but log with exc_info=True so the traceback is preserved instead of being silently swallowed. The original error path is unchanged: this helper still never raises. Addresses review comment on PR #4438.
Catch Azure-specific exceptions (HttpResponseError, ResourceNotFoundError) explicitly and log them at debug. Keep a broad fallback for unexpected errors but log with exc_info=True so the traceback is preserved instead of being silently swallowed. The original error path is unchanged: this helper still never raises. Addresses review comment on PR #4438.
f446593 to
a95fe38
Compare
⏭️ AI Test Selection — SKIPPED1 test case(s) selected (view run) Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-arm64 latest Selected test cases |
Catch Azure-specific exceptions (HttpResponseError, ResourceNotFoundError) explicitly and log them at debug. Keep a broad fallback for unexpected errors but log with exc_info=True so the traceback is preserved instead of being silently swallowed. The original error path is unchanged: this helper still never raises. Addresses review comment on PR #4438.
a95fe38 to
20caf6d
Compare
Catch Azure-specific exceptions (HttpResponseError, ResourceNotFoundError) explicitly and log them at debug. Keep a broad fallback for unexpected errors but log with exc_info=True so the traceback is preserved instead of being silently swallowed. The original error path is unchanged: this helper still never raises. Addresses review comment on PR #4438.
20caf6d to
08ad4be
Compare
⏭️ AI Test Selection — SKIPPED1 test case(s) selected (view run) Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-arm64 latest Selected test cases |
Catch Azure-specific exceptions (HttpResponseError, ResourceNotFoundError) explicitly and log them at debug. Keep a broad fallback for unexpected errors but log with exc_info=True so the traceback is preserved instead of being silently swallowed. The original error path is unchanged: this helper still never raises. Addresses review comment on PR #4438.
08ad4be to
5ad13c5
Compare
⏭️ AI Test Selection — SKIPPED1 test case(s) selected (view run) Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-arm64 latest Selected test cases |
Catch Azure-specific exceptions (HttpResponseError, ResourceNotFoundError) explicitly and log them at debug. Keep a broad fallback for unexpected errors but log with exc_info=True so the traceback is preserved instead of being silently swallowed. The original error path is unchanged: this helper still never raises. Addresses review comment on PR #4438.
5ad13c5 to
1c1b0f0
Compare
⏭️ AI Test Selection — SKIPPED1 test case(s) selected (view run) Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-arm64 latest Selected test cases |
✅ AI Test Selection — PASSED1 test case(s) selected (view run) Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-arm64 latest
Test case details
|
Catch Azure-specific exceptions (HttpResponseError, ResourceNotFoundError) explicitly and log them at debug. Keep a broad fallback for unexpected errors but log with exc_info=True so the traceback is preserved instead of being silently swallowed. The original error path is unchanged: this helper still never raises. Addresses review comment on PR #4438.
1c1b0f0 to
48a3d30
Compare
✅ AI Test Selection — PASSED1 test case(s) selected (view run) Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-arm64 latest
Test case details
|
Summary
When an Azure ARM deployment fails, LISA currently only surfaces the top-level
HttpResponseError. In several common cases the top-level message is unactionable:DeploymentFailedwith the body"aggregated deployment error is too large"(Azure truncates the inner details when there are many failed sub-operations).ResourceDeploymentFailure: Encountered internal server error. Diagnostic information: timestamp '...', subscription id '...', tracking id '...'— only a tracking id is returned, no per-resource error.DeploymentFailedwith no nesteddetails.In these cases users have to re-run with
keep_environment:alwaysand dig through the Portal /az deployment operation group listto find which resource actually failed and why.Change
In
lisa/sut_orchestrator/azure/platform_.py:_collect_deployment_operation_errors()which callsself._rm_client.deployment_operations.list(resource_group_name, AZURE_DEPLOYMENT_NAME)after a failed deploy and extracts per-resourcetargetResource+ nestedstatusMessage.errorfor every operation whoseprovisioningState == "Failed"._deploy()'sHttpResponseErrorhandler, call this helper as a fallback when the top-level message looks aggregated/truncated, the code isResourceDeploymentFailure, orDeploymentFailedhas no nesteddetails. Append the collected sub-resource errors to both the log and the raisedLisaException.try/exceptand logs at debug on failure, so this never makes diagnostics worse.After this change a previously opaque error becomes:
Validation
smoke_test(success path) —Provisioning.smoke_test: PASSED(134.981 s case, 539 s total). Fallback does not affect the success path.smoke_testwithoutosdisk_size_in_gb:64(deliberate failure) — produces a normalDeploymentFailedwith details intact, fallback correctly skipped, error message identical to current behavior.The fallback branch itself triggers on real-world
ResourceDeploymentFailure/ aggregated-error responses, which are non-deterministic from Azure's side and can't be reliably synthesized; the helper is defensive and is exercised through the success / normal-failure paths above.🤖 Generated with the assistance of GitHub Copilot.