fix(docker): add HEALTHCHECK instruction to Dockerfile#1485
fix(docker): add HEALTHCHECK instruction to Dockerfile#1485dknos wants to merge 2 commits intoNVIDIA:mainfrom
Conversation
Enables Docker and orchestrators to detect unhealthy containers and trigger automatic restarts. Also adds hadolint ignore comments for pre-existing warnings (DL3006, SC2015). Fixes NVIDIA#1430 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdded HEALTHCHECK instructions to Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
Dockerfile (1)
177-178: Consider adding a startup grace period to reduce cold-start false negatives.A short
--start-periodtypically improves reliability when services take time to bootstrap.Proposed tweak
-HEALTHCHECK --interval=30s --timeout=5s --retries=3 \ - CMD curl -sf http://127.0.0.1:18789/health || exit 1 +HEALTHCHECK --interval=30s --timeout=5s --start-period=45s --retries=3 \ + CMD curl -fsS http://127.0.0.1:18789/health || exit 1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 177 - 178, The Dockerfile's HEALTHCHECK currently lacks a startup grace period, causing cold-starts to be reported as failures; update the HEALTHCHECK directive that uses "HEALTHCHECK --interval=30s --timeout=5s --retries=3 CMD curl -sf http://127.0.0.1:18789/health || exit 1" to include a suitable --start-period (for example --start-period=60s or another value appropriate for your service) so Docker will wait before beginning health probes, keeping the rest of the flags intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Dockerfile`:
- Around line 174-179: The Dockerfile change added a HEALTHCHECK block
(HEALTHCHECK --interval=30s --timeout=5s --retries=3 \ CMD curl -sf
http://127.0.0.1:18789/health || exit 1) but the same directive must also be
added to Dockerfile.base; open Dockerfile.base and add an identical HEALTHCHECK
directive (same interval, timeout, retries and CMD curl -sf
http://127.0.0.1:18789/health || exit 1) at the analogous location where health
checks are defined in Dockerfile (near the end or after the service start/EXPOSE
steps) so both images include the same container health probe.
---
Nitpick comments:
In `@Dockerfile`:
- Around line 177-178: The Dockerfile's HEALTHCHECK currently lacks a startup
grace period, causing cold-starts to be reported as failures; update the
HEALTHCHECK directive that uses "HEALTHCHECK --interval=30s --timeout=5s
--retries=3 CMD curl -sf http://127.0.0.1:18789/health || exit 1" to include a
suitable --start-period (for example --start-period=60s or another value
appropriate for your service) so Docker will wait before beginning health
probes, keeping the rest of the flags intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| # Health check: poll the gateway's /health endpoint so Docker (and Compose) | ||
| # can detect and restart unhealthy containers in standalone deployments. | ||
| # Ref: https://github.com/NVIDIA/NemoClaw/issues/1430 | ||
| HEALTHCHECK --interval=30s --timeout=5s --retries=3 \ | ||
| CMD curl -sf http://127.0.0.1:18789/health || exit 1 | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Dockerfiles present:"
fd -i '^Dockerfile(\.base)?$' | sort
echo
echo "HEALTHCHECK declarations by file:"
fd -i '^Dockerfile(\.base)?$' --exec rg -n '^\s*HEALTHCHECK\b' {}Repository: NVIDIA/NemoClaw
Length of output: 199
Add HEALTHCHECK to Dockerfile.base as well.
Issue #1430 requires HEALTHCHECK in both Dockerfile and Dockerfile.base. The change currently only addresses Dockerfile. Add the same HEALTHCHECK directive to Dockerfile.base at the appropriate location.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Dockerfile` around lines 174 - 179, The Dockerfile change added a HEALTHCHECK
block (HEALTHCHECK --interval=30s --timeout=5s --retries=3 \ CMD curl -sf
http://127.0.0.1:18789/health || exit 1) but the same directive must also be
added to Dockerfile.base; open Dockerfile.base and add an identical HEALTHCHECK
directive (same interval, timeout, retries and CMD curl -sf
http://127.0.0.1:18789/health || exit 1) at the analogous location where health
checks are defined in Dockerfile (near the end or after the service start/EXPOSE
steps) so both images include the same container health probe.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
Dockerfile.base (1)
128-129: Derived test image inherits insufficient base HEALTHCHECKThe base HEALTHCHECK (lines 128-129) only validates that
nodecan execute, not that the actual service is running. The verification confirms thattest/Dockerfile.sandbox(line 8 of provided context) inherits from${BASE_IMAGE}but does not define its own HEALTHCHECK override, leaving it vulnerable to false-positive health reports.To mitigate this, either require all child images to define a service-level HEALTHCHECK probe, or parameterize the base command so inheritors can inject a meaningful healthcheck (e.g., a startup script or port check).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile.base` around lines 128 - 129, The base HEALTHCHECK is too superficial (only running node) and causes child images like test/Dockerfile.sandbox to inherit a false-positive probe; add a build-time argument (e.g., ARG HEALTHCHECK_CMD) and use it in the HEALTHCHECK instruction so inheritors can supply a meaningful check, or alternatively remove the base HEALTHCHECK and document that every child image must define its own HEALTHCHECK; update the HEALTHCHECK line to reference HEALTHCHECK_CMD and ensure test/Dockerfile.sandbox sets HEALTHCHECK_CMD (or defines its own HEALTHCHECK) so the probe actually validates the service.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@Dockerfile.base`:
- Around line 128-129: The base HEALTHCHECK is too superficial (only running
node) and causes child images like test/Dockerfile.sandbox to inherit a
false-positive probe; add a build-time argument (e.g., ARG HEALTHCHECK_CMD) and
use it in the HEALTHCHECK instruction so inheritors can supply a meaningful
check, or alternatively remove the base HEALTHCHECK and document that every
child image must define its own HEALTHCHECK; update the HEALTHCHECK line to
reference HEALTHCHECK_CMD and ensure test/Dockerfile.sandbox sets
HEALTHCHECK_CMD (or defines its own HEALTHCHECK) so the probe actually validates
the service.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ee49aa74-0422-41e5-815b-0ce53bf2d577
📒 Files selected for processing (2)
DockerfileDockerfile.base
🚧 Files skipped from review as they are similar to previous changes (1)
- Dockerfile
Summary
HEALTHCHECKinstruction toDockerfilefor container health monitoringTest plan
HEALTHCHECKrunsdocker inspectFixes #1430
🤖 Generated with Claude Code
Summary by CodeRabbit