Phase 5: Script & Health Hardening #242
Draft
paultranvan wants to merge 5 commits into hardening/phase-4 from
- Add check_service_health() helper function with 3-second timeout
- Probe LLM and VLM services concurrently using asyncio.gather
- Return HTTP 503 when LLM (critical) is unhealthy
- Return HTTP 200 with degraded status when only VLM (non-critical) is down
- Include response_time_ms metrics for each service
- Use httpx.AsyncClient with proper timeout and exception handling
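For reviewers, the status logic in this commit can be sketched roughly as follows. This is not the code in api.py: to keep it dependency-free, the httpx.AsyncClient call is swapped for an injected async `probe` callable (a hypothetical stand-in), and the status strings and response shape are assumptions based on the "status/checks" JSON mentioned in this PR.

```python
import asyncio
import time
from typing import Awaitable, Callable, Tuple

# Hypothetical stand-in for the real httpx request: returns True when the service answers OK.
Probe = Callable[[], Awaitable[bool]]


async def check_service_health(name: str, probe: Probe, timeout: float = 3.0) -> dict:
    """Run one probe under a timeout and report its status and elapsed time."""
    start = time.perf_counter()
    try:
        ok = await asyncio.wait_for(probe(), timeout=timeout)
        status = "healthy" if ok else "unhealthy"
    except asyncio.TimeoutError:
        status = "timeout"
    except Exception:
        # Connection errors and similar failures count as unhealthy, not as a crash.
        status = "unreachable"
    elapsed_ms = round((time.perf_counter() - start) * 1000, 1)
    return {"name": name, "status": status, "response_time_ms": elapsed_ms}


async def health_check(llm_probe: Probe, vlm_probe: Probe, timeout: float = 3.0) -> Tuple[int, dict]:
    """Probe both services concurrently and derive the HTTP status code."""
    llm, vlm = await asyncio.gather(
        check_service_health("llm", llm_probe, timeout),
        check_service_health("vlm", vlm_probe, timeout),
    )
    if llm["status"] != "healthy":
        code, overall = 503, "unhealthy"  # LLM is critical
    elif vlm["status"] != "healthy":
        code, overall = 200, "degraded"   # VLM is non-critical
    else:
        code, overall = 200, "healthy"
    return code, {"status": overall, "checks": {"llm": llm, "vlm": vlm}}
```

Because both probes run under `asyncio.gather`, the worst-case latency of the endpoint is roughly one timeout rather than the sum of both.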
- Add restore_state dict to track partitions_created, files_added, files_failed, chunks_inserted, errors
- Fix MilvusDB init failure TODO: now returns 1 to stop execution
- Distinguish critical vs non-critical failures: file insert failures log and continue instead of raising
- Track partition creation on first successful file add
- Log progress milestones every 100 files processed
- Log final summary with counts and first 10 errors
- Cap error list at 100 entries to prevent memory issues
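A minimal sketch of how this bookkeeping might look, for readers who want the shape without opening scripts/restore.py. The dict keys come from the commit message; the helper names, the `insert_file` callable, and the choice of a list for partitions_created are hypothetical.

```python
import logging

logger = logging.getLogger("restore")

MAX_ERRORS = 100      # cap on stored error messages
PROGRESS_EVERY = 100  # log a milestone every N files


def new_restore_state() -> dict:
    return {
        "partitions_created": [],  # assumed to be a list of partition names
        "files_added": 0,
        "files_failed": 0,
        "chunks_inserted": 0,
        "errors": [],
    }


def restore_files(files, insert_file, state: dict) -> dict:
    """Insert each (partition, file_id) pair; log and continue on per-file failures.

    insert_file is a hypothetical callable returning the number of chunks inserted.
    """
    for processed, (partition, file_id) in enumerate(files, start=1):
        try:
            chunks = insert_file(partition, file_id)
        except Exception as exc:
            # Non-critical failure: count it, keep a bounded sample, move on.
            state["files_failed"] += 1
            if len(state["errors"]) < MAX_ERRORS:
                state["errors"].append(f"{partition}/{file_id}: {exc}")
            logger.warning("Failed to insert %s/%s: %s", partition, file_id, exc)
        else:
            state["files_added"] += 1
            state["chunks_inserted"] += chunks
            # A partition counts as created on its first successful file add.
            if partition not in state["partitions_created"]:
                state["partitions_created"].append(partition)
        if processed % PROGRESS_EVERY == 0:
            logger.info("Processed %d files (%d failed)", processed, state["files_failed"])

    logger.info(
        "Restore summary: %d partitions, %d files added, %d failed, %d chunks",
        len(state["partitions_created"]), state["files_added"],
        state["files_failed"], state["chunks_inserted"],
    )
    for err in state["errors"][:10]:
        logger.info("Error: %s", err)
    return state
```

Note that files_failed keeps counting past the cap; only the stored messages are bounded.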
- Replace bare exception handler with proper rollback flow
- Log critical failure with restore_state context
- Roll back in reverse order: VDB first (prevents orphaned vectors), then RDB (cascades to files)
- Log per-partition rollback success/failure
- Log rollback summary with partition count
- Re-raise exception after rollback for proper exit code
- Rollback loop handles empty partition list (early failures)
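The rollback flow described here could be sketched as below. The function names and the two delete callables are hypothetical placeholders for the real VDB and RDB clients, and skipping the RDB delete when the VDB delete fails is an assumption made for this sketch, not something the commit message states.

```python
import logging

logger = logging.getLogger("restore")


def rollback(state: dict, vdb_delete_partition, rdb_delete_partition) -> int:
    """Undo created partitions; returns how many were rolled back successfully."""
    partitions = state.get("partitions_created", [])
    rolled_back = 0
    for partition in partitions:  # an empty list (early failure) is simply a no-op
        try:
            vdb_delete_partition(partition)  # VDB first, so no vectors are orphaned
            rdb_delete_partition(partition)  # then RDB, which cascades to files
        except Exception as exc:
            logger.error("Rollback failed for partition %s: %s", partition, exc)
        else:
            rolled_back += 1
            logger.info("Rolled back partition %s", partition)
    logger.info("Rollback summary: %d/%d partitions", rolled_back, len(partitions))
    return rolled_back


def run_restore(do_restore, state: dict, vdb_delete_partition, rdb_delete_partition) -> None:
    """Run the restore; on a critical failure, roll back and re-raise."""
    try:
        do_restore(state)
    except Exception:
        logger.critical("Restore failed, rolling back; restore_state=%s", state)
        rollback(state, vdb_delete_partition, rdb_delete_partition)
        raise  # preserve the original exception so the script exits non-zero
```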
The LLM/VLM base_url config includes the API path (e.g. http://host:8000/v1/), but the /health endpoint is at the service root. Strip /v1 before probing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
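One way to derive the probe URL, shown only as a sketch: the helper name `health_url` is hypothetical and the actual commit may strip the suffix differently.

```python
from urllib.parse import urlsplit, urlunsplit


def health_url(base_url: str) -> str:
    """Drop a trailing /v1 from base_url and point at /health instead."""
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    return urlunsplit((parts.scheme, parts.netloc, path + "/health", "", ""))
```

With the example from the commit message, `health_url("http://host:8000/v1/")` gives `http://host:8000/health`; a base_url without the /v1 suffix is left alone apart from the appended /health.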
The enhanced health check now returns JSON with status/checks instead of plain text. Update assertion to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- /health_check endpoint with concurrent LLM/VLM service probes and response time metrics

Changes
- api.py: check_service_health() helper, asyncio.gather for concurrent probes, HTTP 503 if LLM down, 200 degraded if only VLM down
- scripts/restore.py: restore_state dict, critical vs non-critical failure handling, reverse-order rollback (VDB first, then RDB), progress logging every 100 files, error cap at 100

Test plan
- ruff check openrag/ passes

🤖 Generated with Claude Code