fix: stop retrying 429s caused by Fabric capacity limits #2539

BrendanWalsh merged 3 commits into master from
Conversation
Fabric returns HTTP 429 with body code `CapacityLimitExceeded` when serverless capacity is exhausted. Unlike transient OpenAI rate limits, these are non-retryable: retrying causes infinite loops and hangs.

This change inspects 429 response bodies in `sendWithRetries`. If the body contains `CapacityLimitExceeded`, the response is returned immediately instead of entering the retry/backoff loop. Normal rate-limit 429s continue to retry with exponential backoff as before. The response entity is buffered (`BufferedHttpEntity`) before inspection so callers can still read it after the method returns.

Adds 3 regression tests:
- CapacityLimitExceeded 429 returns immediately (no retry)
- CapacityLimitExceeded 429 ignores Retry-After header
- Non-capacity 429 with body still retries normally
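The short-circuit described above boils down to a small predicate; a minimal sketch follows (the name `shouldRetry429` is illustrative, not the actual SynapseML internals, and it mirrors the PR's substring check rather than full JSON parsing):

```scala
object RetryDecision {
  // A 429 is retryable unless the body carries the Fabric capacity code.
  // Capacity-exceeded 429s short-circuit; all other 429s keep backing off.
  def shouldRetry429(bodyStr: String): Boolean =
    !bodyStr.contains("CapacityLimitExceeded")
}
```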
Hey @BrendanWalsh 👋! We use semantic commit messages to streamline the release process. Examples of commit messages with semantic prefixes:

To test your commit locally, please follow our guide on building from source.
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Snapshot Warnings: Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files: None
Pull request overview
This PR updates SynapseML’s HTTP retry logic to avoid infinite retry loops when Fabric serverless capacity is exhausted by treating 429 responses with CapacityLimitExceeded as non-retryable, while preserving existing retry behavior for other 429 scenarios.
Changes:
- Inspect 429 response bodies for `CapacityLimitExceeded` and return the 429 immediately (no retry).
- Buffer 429 response entities so the body can be inspected without preventing callers from reading it later.
- Add regression tests covering capacity-exceeded vs non-capacity 429 behavior (including `Retry-After` precedence).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| core/src/main/scala/com/microsoft/azure/synapse/ml/io/http/HTTPClients.scala | Adds 429 body inspection + buffering to short-circuit retries for Fabric capacity-limit errors. |
| core/src/test/scala/com/microsoft/azure/synapse/ml/io/split1/VerifySendWithRetries.scala | Adds tests validating non-retry behavior for capacity-limit 429s and unchanged retries for other 429s. |
```scala
// Buffer the response entity so the body can be inspected
// and still returned to the caller if we don't retry
if (response.getEntity != null) {
  response.setEntity(new BufferedHttpEntity(response.getEntity))
}
val bodyStr = Option(response.getEntity)
  .flatMap(e => Try(IOUtils.toString(e.getContent, "UTF-8")).toOption)
  .getOrElse("")
```
BufferedHttpEntity will buffer the entire 429 response payload in memory. If a service returns a large body (or misbehaves), this can create avoidable memory pressure during retries. Consider guarding this with a size cap (e.g., based on Content-Length) or reading only a bounded prefix sufficient to detect the error code, and skip buffering/inspection when the entity is too large.
Suggested change (replace the buffering block above with a size-capped version):

```scala
val max429InspectionBytes = 1024 * 1024L
val bodyStr = Option(response.getEntity).flatMap { entity =>
  val contentLength = entity.getContentLength
  if (contentLength >= 0 && contentLength <= max429InspectionBytes) {
    // Buffer only small, known-size entities so the body can be
    // inspected and still returned to the caller if we don't retry.
    response.setEntity(new BufferedHttpEntity(entity))
    Option(response.getEntity)
      .flatMap(e => Try(IOUtils.toString(e.getContent, "UTF-8")).toOption)
  } else {
    logWarning(
      s"Skipping 429 response body inspection for ${request.getURI} " +
        s"because Content-Length=$contentLength exceeds $max429InspectionBytes bytes " +
        s"or is unknown")
    None
  }
}.getOrElse("")
```
```scala
val bodyStr = Option(response.getEntity)
  .flatMap(e => Try(IOUtils.toString(e.getContent, "UTF-8")).toOption)
  .getOrElse("")
```
The 429 body inspection reads from e.getContent but does not close the InputStream. Even with a BufferedHttpEntity (likely a ByteArrayInputStream), leaving streams unclosed is a resource-leak pattern. Wrap the read in a construct that always closes the stream (e.g., Using.resource or explicit try/finally).
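A minimal sketch of the always-close pattern the comment suggests, using `scala.util.Using` (Scala 2.13+); the helper name is illustrative, not SynapseML's actual code:

```scala
import java.io.InputStream
import scala.io.Source
import scala.util.Using

object ReadAndClose {
  // Illustrative helper: read a UTF-8 body and guarantee the stream is
  // closed even if the read throws, via Using.resource.
  def readBody(in: InputStream): String =
    Using.resource(in) { stream =>
      Source.fromInputStream(stream, "UTF-8").mkString
    }
}
```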
```scala
if (bodyStr.contains("CapacityLimitExceeded")) {
  // Fabric capacity-exceeded 429s are NOT transient rate limits —
  // retrying will not help and causes hangs
  logWarning(s"Capacity limit exceeded (non-retryable 429) on ${request.getURI}: $bodyStr")
  true
} else {
```
bodyStr.contains("CapacityLimitExceeded") is a brittle detector and could produce false positives/negatives (e.g., the substring appearing in a message, different JSON shapes, casing). Since the codebase already uses JSON parsing (e.g., spray-json in core), consider parsing the response JSON and checking error.code == "CapacityLimitExceeded" explicitly before deciding not to retry.
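As a middle ground between the plain substring check and the full spray-json parse the review recommends, a dependency-free sketch can at least anchor the match to the `code` field; the object and method names here are illustrative only:

```scala
import scala.util.matching.Regex

object CapacityCheck {
  // Stricter than bodyStr.contains: require the token to appear as the
  // value of a "code" field, so the string occurring inside an unrelated
  // message does not trigger a false positive. The review's full fix
  // would parse the JSON and check error.code explicitly.
  private val CodeField: Regex = "\"code\"\\s*:\\s*\"CapacityLimitExceeded\"".r

  def isCapacityLimitExceeded(body: String): Boolean =
    CodeField.findFirstIn(body).isDefined
}
```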
```scala
if (bodyStr.contains("CapacityLimitExceeded")) {
  // Fabric capacity-exceeded 429s are NOT transient rate limits —
  // retrying will not help and causes hangs
  logWarning(s"Capacity limit exceeded (non-retryable 429) on ${request.getURI}: $bodyStr")
```
This logs the full 429 response body ($bodyStr) at warning level. Error bodies can contain sensitive identifiers or be unexpectedly large, which can lead to data exposure and log bloat. Prefer logging only the extracted error code (and possibly a truncated message/first N chars) rather than the entire payload.
Suggested change:

```scala
// before:
logWarning(s"Capacity limit exceeded (non-retryable 429) on ${request.getURI}: $bodyStr")
// suggested:
logWarning(s"Capacity limit exceeded (non-retryable 429, code=CapacityLimitExceeded) " +
  s"on ${request.getURI}")
```
```scala
val start = System.currentTimeMillis()
val response = HandlingUtils.sendWithRetries(
  client, request, Array(100, 100, 100))
val elapsed = System.currentTimeMillis() - start
val code = response.getStatusLine.getStatusCode
response.close()
client.close()

assert(code === 429, "Capacity-exceeded 429 should be returned immediately, not retried")
assert(requestCount.get() === 1, "Should not retry on CapacityLimitExceeded")
assert(elapsed < 1000, s"Should return immediately, took ${elapsed}ms")
```
The elapsed-time assertions (e.g., < 1000ms) can be flaky under CI contention/GC pauses since they depend on wall-clock timing rather than behavior. Consider removing the timing check (the requestCount assertion already validates no retry) or using a more forgiving bound to reduce intermittent failures.
- Guard BufferedHttpEntity with Content-Length cap (1MB) to avoid buffering unexpectedly large 429 response payloads
- Remove full response body from log warning to prevent data exposure
- Remove tight elapsed-time assertion from capacity test (requestCount already validates no-retry); widen Retry-After test bound to 4s
The Content-Length guard was skipping body inspection when getContentLength() returns -1 (chunked/unknown encoding). This meant capacity-exceeded 429s sent without Content-Length would still retry forever. The guard now skips only when Content-Length is known AND exceeds the 1MB cap; unknown-length responses are inspected. Added a regression test for a chunked capacity-exceeded 429.
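The corrected guard condition is small enough to isolate; a sketch under illustrative names (`InspectionGuard`, `shouldInspect` are not the actual SynapseML identifiers):

```scala
object InspectionGuard {
  val Max429InspectionBytes: Long = 1024 * 1024L

  // Inspect when the length is unknown (-1, e.g. chunked transfer
  // encoding) or known and within the cap; skip only when the length
  // is known AND exceeds the cap.
  def shouldInspect(contentLength: Long): Boolean =
    contentLength < 0 || contentLength <= Max429InspectionBytes
}
```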
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2539 +/- ##
=======================================
Coverage 84.66% 84.66%
=======================================
Files 335 335
Lines 17747 17753 +6
Branches 1595 1615 +20
=======================================
+ Hits 15025 15031 +6
   Misses      2722     2722
Problem

Fabric returns HTTP 429 with body code `"CapacityLimitExceeded"` when serverless capacity is exhausted. Unlike transient OpenAI rate limits, these are not retryable: the capacity is genuinely full and retrying just causes infinite retry loops and hangs.

The current `sendWithRetries` in `HTTPClients.scala` treats all 429s identically, entering an unlimited exponential backoff retry loop regardless of the error type.

Fix

On receiving a 429, the response body is now inspected for the `CapacityLimitExceeded` code. The response entity is buffered via `BufferedHttpEntity` before inspection so callers can still read the body after `sendWithRetries` returns.

Tests

3 new regression tests in `VerifySendWithRetries`:
- 429 with CapacityLimitExceeded body is not retried
- 429 with CapacityLimitExceeded ignores Retry-After header
- 429 with non-capacity error body still retries normally

All 12 tests pass (9 existing + 3 new).