Optimize queue configuration and implement rate limiting for API #640
Merged
Updates the system to generate 5 tasks and 5 events instead of 10. This change simplifies batch processing, reduces resource consumption, and improves performance by handling smaller input sizes. Adjustments include schema validations, prompt instructions, fallback data slicing, and related messaging updates.
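One way to keep the schema validation, prompt instruction, and fallback slicing in step is to derive all three from a single constant. The sketch below is illustrative only: the PR's actual code is not shown here, and every name (ITEMS_PER_BATCH, isValidBatch, sliceFallback, buildPromptInstruction) is hypothetical.

```typescript
// Hypothetical shared constant; the value 5 comes from the description above.
const ITEMS_PER_BATCH = 5;

// Validation: a generated batch is accepted only if it has exactly the expected size.
function isValidBatch<T>(items: readonly T[]): boolean {
  return items.length === ITEMS_PER_BATCH;
}

// Fallback: trim a larger static fallback list down to the batch size.
function sliceFallback<T>(fallback: readonly T[]): T[] {
  return fallback.slice(0, ITEMS_PER_BATCH);
}

// Prompt: the instruction text quotes the same number the validator enforces.
function buildPromptInstruction(kind: "tasks" | "events"): string {
  return `Generate exactly ${ITEMS_PER_BATCH} ${kind}.`;
}
```

With this shape, changing the batch size again would be a one-line edit rather than a search across schema, prompt, and fallback code.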
Reduces the default number of items in a seed run from 20 to 10, optimizing resource usage during initialization. Introduces a new utility function to update the total item count for a seed run, enabling dynamic adjustments and improving flexibility in managing seed runs.
Introduces a rate limiter that caps the worker at 4 jobs per 10 seconds, keeping chat API usage within Azure quota limits. Lowers the default concurrency from 8 to 2 to match the new rate limit. This prevents bursts of API requests that would trigger rate-limit errors (HTTP 429) and keeps both background jobs and interactive chat running smoothly.
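For readers unfamiliar with this kind of limit, here is a minimal sliding-window sketch of "at most 4 job starts per 10 seconds". It is not the PR's implementation (queue libraries usually provide their own limiter option); the class name, method name, and injectable clock are all assumptions made for illustration.

```typescript
// Hypothetical sketch: allow at most `max` job starts within any `durationMs` window.
class SlidingWindowLimiter {
  private starts: number[] = [];
  private readonly max: number;
  private readonly durationMs: number;
  private readonly now: () => number;

  constructor(max: number, durationMs: number, now: () => number = Date.now) {
    this.max = max;
    this.durationMs = durationMs;
    this.now = now; // injectable clock so the behavior can be checked without waiting
  }

  /** Returns 0 if a job may start now (and records the start), else the ms to wait. */
  tryAcquire(): number {
    const t = this.now();
    // Forget starts that have fallen out of the window.
    this.starts = this.starts.filter((s) => t - s < this.durationMs);
    if (this.starts.length < this.max) {
      this.starts.push(t);
      return 0;
    }
    // The oldest recorded start decides when the next slot opens.
    return this.starts[0] + this.durationMs - t;
  }
}
```

A worker loop would call tryAcquire() before picking up a job and sleep for the returned number of milliseconds when it is non-zero. Concurrency (2 in this PR) is a separate knob: it bounds how many jobs run at once, while the limiter bounds how many start per window.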
KevyVo
commented
May 7, 2026
lionello
approved these changes
May 7, 2026
This pull request introduces robust rate limit handling and retry logic for all AI/LLM and embedding calls, improves reliability under quota constraints, and adjusts the demo seed data and worker configuration for better quota management.
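The retry handling described above might look roughly like the following sketch: retry only on HTTP 429, prefer the server-supplied wait when one is present, and otherwise back off exponentially. The error shape (status, retryAfterSeconds), the option names, and the function names are hypothetical; the PR's real wrapper and the fields Azure returns are not shown on this page.

```typescript
// Hypothetical shape of a failed call: an HTTP status plus an optional server-provided wait.
interface RateLimitInfo {
  status: number;
  retryAfterSeconds?: number;
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable for testing
}

function isRateLimited(err: unknown): err is RateLimitInfo {
  return typeof err === "object" && err !== null && (err as RateLimitInfo).status === 429;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Prefer the server's hint; otherwise double the delay on each attempt.
function delayFor(err: RateLimitInfo, attempt: number, baseDelayMs: number): number {
  if (err.retryAfterSeconds !== undefined) return err.retryAfterSeconds * 1000;
  return baseDelayMs * 2 ** (attempt - 1);
}

async function withRateLimitRetry<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Non-429 errors and exhausted attempts propagate to the caller unchanged.
      if (!isRateLimited(err) || attempt >= opts.maxAttempts) throw err;
      await sleep(delayFor(err, attempt, opts.baseDelayMs));
    }
  }
}
```

Wrapping every LLM and embedding call in one helper like this keeps the retry policy in a single place, which would also make it straightforward to enable only for a particular cloud provider.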
It seems the Azure API returns hard limits in its response, which I took into account when making these changes. Since these changes only affect Azure, should we enable this logic only for that cloud?
Here is the data of how these settings came to be:
Samples Checklist
✅ All good!