Phase 5: Script & Health Hardening by paultranvan · Pull Request #242 · linagora/openrag

paultranvan · 2026-02-11T13:54:00Z

Summary

Enhance /health_check endpoint with concurrent LLM/VLM service probes and response time metrics
Harden restore script with state tracking, rollback on critical failure, and progress logging

Changes

api.py: add check_service_health() helper; run the LLM and VLM probes concurrently with asyncio.gather; return HTTP 503 if the LLM is down and HTTP 200 with a degraded status if only the VLM is down
scripts/restore.py: add a restore_state dict; separate critical from non-critical failures; roll back in reverse order (VDB first, then RDB); log progress every 100 files; cap the error list at 100 entries

Test plan

All 98 existing tests pass
ruff check openrag/ passes
Health endpoint responds within ~3s even with slow services
Restore script rolls back on critical failure

🤖 Generated with Claude Code

coderabbitai · 2026-02-11T13:54:08Z

Important

Review skipped

Draft detected.


- Add check_service_health() helper function with 3-second timeout
- Probe LLM and VLM services concurrently using asyncio.gather
- Return HTTP 503 when LLM (critical) is unhealthy
- Return HTTP 200 with degraded status when only VLM (non-critical) is down
- Include response_time_ms metrics for each service
- Use httpx.AsyncClient with proper timeout and exception handling
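The probe-and-aggregate flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from api.py: the PR uses httpx.AsyncClient, while this sketch takes injectable probe coroutines so it stays dependency-free. The `Probe` type, the `health_check` function name, the "healthy"/"unhealthy" status strings, and the exact layout of the `checks` dict are assumptions.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical probe type: an async callable that raises when the service is down.
Probe = Callable[[], Awaitable[None]]


async def check_service_health(name: str, probe: Probe, timeout: float = 3.0) -> dict:
    """Run one probe under a timeout and record how long it took."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
        status = "healthy"
    except Exception:  # covers timeouts and any probe error
        status = "unhealthy"
    elapsed_ms = round((time.perf_counter() - start) * 1000, 1)
    return {"name": name, "status": status, "response_time_ms": elapsed_ms}


async def health_check(llm_probe: Probe, vlm_probe: Probe, timeout: float = 3.0):
    """Return (http_status, body); the LLM is critical, the VLM is not."""
    # Both probes run concurrently, so the total wait is bounded by one timeout.
    results = await asyncio.gather(
        check_service_health("llm", llm_probe, timeout),
        check_service_health("vlm", vlm_probe, timeout),
    )
    checks = {r.pop("name"): r for r in results}
    if checks["llm"]["status"] != "healthy":
        return 503, {"status": "unhealthy", "checks": checks}
    if checks["vlm"]["status"] != "healthy":
        return 200, {"status": "degraded", "checks": checks}
    return 200, {"status": "healthy", "checks": checks}
```

Because the probes are gathered rather than awaited one after the other, a slow VLM does not add to the time spent waiting on the LLM, which is consistent with the "responds within ~3s" item in the test plan.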

- Add restore_state dict to track partitions_created, files_added, files_failed, chunks_inserted, errors
- Fix MilvusDB init failure TODO: now returns 1 to stop execution
- Distinguish critical vs non-critical failures: file insert failures log and continue instead of raising
- Track partition creation on first successful file add
- Log progress milestones every 100 files processed
- Log final summary with counts and first 10 errors
- Cap error list at 100 entries to prevent memory issues
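A rough sketch of the state tracking listed above, under stated assumptions: the key names come from the commit message, but `restore_files`, the `add_file` callback, the `(partition, file_id)` input shape, and storing `partitions_created` as a list are hypothetical and may not match scripts/restore.py.

```python
import logging

logger = logging.getLogger("restore")

MAX_ERRORS = 100      # error cap named in the commit message
PROGRESS_EVERY = 100  # progress milestone named in the commit message


def new_restore_state() -> dict:
    """Fresh state; partitions_created is assumed to be a list of names."""
    return {
        "partitions_created": [],
        "files_added": 0,
        "files_failed": 0,
        "chunks_inserted": 0,
        "errors": [],
    }


def restore_files(files, add_file, state=None) -> dict:
    """files yields (partition, file_id); add_file returns a chunk count or raises."""
    if state is None:
        state = new_restore_state()
    for processed, (partition, file_id) in enumerate(files, start=1):
        try:
            chunks = add_file(partition, file_id)
        except Exception as exc:
            # Non-critical failure: count it, keep a bounded error list, continue.
            state["files_failed"] += 1
            if len(state["errors"]) < MAX_ERRORS:
                state["errors"].append(f"{partition}/{file_id}: {exc}")
            logger.warning("Failed to restore %s/%s: %s", partition, file_id, exc)
        else:
            state["files_added"] += 1
            state["chunks_inserted"] += chunks
            # A partition counts as created on its first successful file add.
            if partition not in state["partitions_created"]:
                state["partitions_created"].append(partition)
        if processed % PROGRESS_EVERY == 0:
            logger.info("Progress: %d files processed", processed)
    logger.info(
        "Restore summary: %d added, %d failed, %d chunks, %d partitions",
        state["files_added"], state["files_failed"],
        state["chunks_inserted"], len(state["partitions_created"]),
    )
    for err in state["errors"][:10]:  # only the first 10 errors are logged
        logger.info("  error: %s", err)
    return state
```

Capping the list while still incrementing `files_failed` keeps the counts accurate even when most of the error messages are dropped.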

- Replace bare exception handler with proper rollback flow
- Log critical failure with restore_state context
- Roll back in reverse order: VDB first (prevents orphaned vectors), then RDB (cascades to files)
- Log per-partition rollback success/failure
- Log rollback summary with partition count
- Re-raise exception after rollback for proper exit code
- Rollback loop handles empty partition list (early failures)
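The rollback flow above might look roughly like the sketch below. It only illustrates the ordering and the re-raise; `run_with_rollback` and the two delete callbacks are hypothetical stand-ins for whatever the script actually calls on the vector and relational stores.

```python
import logging

logger = logging.getLogger("restore")


def run_with_rollback(restore, state, delete_vdb_partition, delete_rdb_partition):
    """Run restore(state); on a critical failure, undo created partitions and re-raise."""
    try:
        restore(state)
    except Exception:
        logger.critical("Critical restore failure, state: %s", state)
        partitions = state["partitions_created"]
        rolled_back = 0
        # An empty list (failure before any partition existed) simply skips the loop.
        for partition in partitions:
            try:
                delete_vdb_partition(partition)  # vectors first, so none are orphaned
                delete_rdb_partition(partition)  # then the relational rows
                rolled_back += 1
                logger.info("Rolled back partition %s", partition)
            except Exception:
                # A failed rollback of one partition should not stop the others.
                logger.exception("Rollback failed for partition %s", partition)
        logger.info("Rollback summary: %d/%d partitions", rolled_back, len(partitions))
        raise  # preserve the original exception so the process exits non-zero
```

Re-raising with a bare `raise` keeps the original traceback, so the caller sees the failure that triggered the rollback rather than a rollback-related error.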


paultranvan force-pushed the hardening/phase-4 branch from b866d39 to 955a9bf Compare February 11, 2026 14:07

paultranvan force-pushed the hardening/phase-5 branch from db17b2d to 6129de9 Compare February 11, 2026 14:07

paultranvan force-pushed the hardening/phase-4 branch from 955a9bf to 01275d6 Compare February 12, 2026 17:51

paultranvan force-pushed the hardening/phase-5 branch 2 times, most recently from 5faca93 to 805e2bc Compare February 12, 2026 17:54

paultranvan force-pushed the hardening/phase-4 branch from a70f5c2 to f535e4b Compare February 12, 2026 21:31

paultranvan and others added 5 commits February 12, 2026 22:31

fix(05-01): strip /v1 path from health probe URLs

4d1d428

The LLM/VLM base_url config includes the API path (e.g. http://host:8000/v1/) but the /health endpoint is at the service root. Strip /v1 before probing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
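One way to derive the probe URL from the configured base_url is sketched below. The `health_url` helper is hypothetical and the actual fix in api.py may strip the path differently; it assumes the API path is a trailing /v1 segment, as in the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def health_url(base_url: str) -> str:
    """Drop a trailing /v1 segment from base_url and append /health."""
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    # Query string and fragment are discarded; they do not apply to the probe.
    return urlunsplit((parts.scheme, parts.netloc, path + "/health", "", ""))
```

For instance, `health_url("http://host:8000/v1/")` gives `http://host:8000/health`, and a base_url without the /v1 suffix is left at its root.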

fix(05-01): update health check API test for new JSON response

d47a074

The enhanced health check now returns JSON with status/checks instead of plain text. Update assertion to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

paultranvan force-pushed the hardening/phase-5 branch from 805e2bc to d47a074 Compare February 12, 2026 21:31

paultranvan wants to merge 5 commits into hardening/phase-4 from hardening/phase-5
