feat(BA-5577): handle operation client recovery when monitor client is healthy #10807

Draft
jopemachine wants to merge 2 commits into main from feat/BA-5577-operation-client-recovery

@jopemachine
Member

Summary

  • When the operation client is broken but the monitor client is healthy, selectively reconnect only the operation client instead of tearing down both
  • Expose operation/monitor sub-component health status through the internal health API (/health endpoint) so external monitoring can distinguish their states
  • Add ValkeyOperationPingable protocol for health checkers to detect MonitoringValkeyClient and report per-sub-component health
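The recovery decision described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeClient`, `SelectiveReconnector`, and `check_and_recover()` are hypothetical names, and creating a fresh `FakeClient` stands in for whatever reconnection the real `MonitoringValkeyClient` performs.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for a Valkey connection; `healthy` controls ping."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    async def ping(self) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} client is unreachable")


class SelectiveReconnector:
    """Hypothetical holder of an operation client and a monitor client."""

    def __init__(self, operation: FakeClient, monitor: FakeClient) -> None:
        self.operation = operation
        self.monitor = monitor
        self.events: list[str] = []  # records which recovery path was taken

    async def _is_healthy(self, client: FakeClient) -> bool:
        try:
            await client.ping()
        except ConnectionError:
            return False
        return True

    async def check_and_recover(self) -> None:
        # Nothing to do while the operation client answers pings.
        # (A monitor-only failure is outside the scope of this sketch.)
        if await self._is_healthy(self.operation):
            return
        if await self._is_healthy(self.monitor):
            # Operation broken, monitor healthy: replace only the operation client.
            self.operation = FakeClient("operation")
            self.events.append("reconnect_operation_only")
        else:
            # Both broken: tear down and recreate both clients.
            self.operation = FakeClient("operation")
            self.monitor = FakeClient("monitor")
            self.events.append("full_reconnect")


async def demo() -> list[str]:
    monitor = FakeClient("monitor")
    r = SelectiveReconnector(FakeClient("operation", healthy=False), monitor)
    await r.check_and_recover()
    assert r.monitor is monitor  # the healthy monitor client was left untouched
    r.operation.healthy = False
    r.monitor.healthy = False
    await r.check_and_recover()
    return r.events
```

Running `asyncio.run(demo())` exercises both paths in order: first the operation-only reconnect, then the full reconnect once both clients fail.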

Changes

  • MonitoringValkeyClient: Add _reconnect_operation_only(), _is_monitor_healthy(), ping_operation_client() methods; modify monitor loop for selective reconnection
  • ComponentHealthStatus / ComponentConnectivityStatus: Add optional sub_components field
  • ValkeyHealthChecker: Detect ValkeyOperationPingable clients and report operation/monitor sub-component health separately
  • HealthProbe: Pass through sub-component data to DTOs
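One way a health checker might detect the extra capability and report sub-component health is a runtime-checkable protocol. The sketch below assumes that approach; `OperationPingable`, `ComponentStatus`, `BasicClient`, `MonitoringClient`, and `check()` are hypothetical stand-ins, and only the method name `ping_operation_client()` and the field name `sub_components` come from the change list above. Treating the plain `ping()` as the monitor-side check is also an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class OperationPingable(Protocol):
    """Hypothetical protocol: clients that can ping their operation client."""

    async def ping_operation_client(self) -> None: ...


@dataclass
class ComponentStatus:
    """Hypothetical status record with an optional per-sub-component map."""

    name: str
    is_healthy: bool
    sub_components: Optional[dict[str, bool]] = None


class BasicClient:
    async def ping(self) -> None:
        return None


class MonitoringClient(BasicClient):
    def __init__(self, operation_ok: bool = True) -> None:
        self.operation_ok = operation_ok

    async def ping_operation_client(self) -> None:
        if not self.operation_ok:
            raise ConnectionError("operation client is down")


async def _ok(coro) -> bool:
    try:
        await coro
    except ConnectionError:
        return False
    return True


async def check(name: str, client: BasicClient) -> ComponentStatus:
    # Assumption: the plain ping() reflects the monitor side of the client.
    monitor_ok = await _ok(client.ping())
    if not isinstance(client, OperationPingable):
        # Basic clients expose no sub-components.
        return ComponentStatus(name, monitor_ok)
    operation_ok = await _ok(client.ping_operation_client())
    return ComponentStatus(
        name,
        is_healthy=monitor_ok and operation_ok,
        sub_components={"operation": operation_ok, "monitor": monitor_ok},
    )
```

With this shape, a consumer of the health endpoint could tell an operation-only failure apart from a total outage by reading `sub_components`, while basic clients keep reporting a single flag.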

Test plan

  • test_selective_reconnection_operation_only_when_monitor_healthy — operation-only reconnect when monitor is healthy
  • test_full_reconnect_when_both_unhealthy — full reconnect when both are broken
  • test_monitor_loop_selective_reconnect_integration — end-to-end with monitor loop
  • test_ping_operation_client — direct operation client ping
  • test_reports_sub_components_for_monitoring_client — health checker sub-component reporting
  • test_no_sub_components_for_basic_client — basic clients have no sub-components
  • test_detects_operation_client_failure — health checker detects operation-only failure

🤖 Generated with Claude Code

@github-actions github-actions bot added the size:L (100~500 LoC) and comp:common (Related to Common component) labels Apr 6, 2026
@jopemachine jopemachine marked this pull request as draft April 6, 2026 06:06
jopemachine added a commit that referenced this pull request Apr 6, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the feat/BA-5578-valkey-client-acquire-pattern branch from 334323b to c7ad9d0 Compare April 6, 2026 09:26
@jopemachine jopemachine force-pushed the feat/BA-5577-operation-client-recovery branch from adb02bb to 1193c22 Compare April 6, 2026 09:45
Base automatically changed from feat/BA-5578-valkey-client-acquire-pattern to main April 7, 2026 05:36
jopemachine and others added 2 commits April 9, 2026 14:25
… sub-components

When the operation client is broken but the monitor client is healthy,
reconnect only the operation client instead of tearing down both.
Expose operation/monitor sub-component health status through the
internal health API so external monitoring can distinguish their states.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the feat/BA-5577-operation-client-recovery branch from 1193c22 to 20388fb Compare April 9, 2026 05:27