
fix(docker): add HEALTHCHECK instruction to Dockerfile#1485

Open
dknos wants to merge 2 commits intoNVIDIA:mainfrom
dknos:fix/dockerfile-healthcheck-1430

dknos commented Apr 5, 2026

Summary

  • Adds HEALTHCHECK instruction to Dockerfile for container health monitoring
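  • Enables Docker/Compose to detect unhealthy containers in standalone deployments, so an orchestrator or watchdog can restart them (Docker restart policies on their own react only to container exit, not to health status)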
  • Adds hadolint ignore comments for pre-existing warnings (DL3006, SC2015)

Test plan

  • Build Docker image and verify HEALTHCHECK runs
  • Confirm container reports healthy status via docker inspect
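The second test-plan step can be scripted. A minimal sketch, assuming the image has already been started as a named container (the helper name `wait_for_healthy` and its defaults are hypothetical, not part of the PR): it polls `docker inspect` until Docker reports `healthy`, stops early on `unhealthy`, and treats a missing `.State.Health` as "no HEALTHCHECK in effect".

```bash
#!/usr/bin/env bash
# Hypothetical helper for the test plan: poll Docker's health status for a
# container until it is "healthy", fails, or the attempt budget runs out.
# Usage: wait_for_healthy <container> [attempts] [delay_seconds]
wait_for_healthy() {
  local container="$1" attempts="${2:-20}" delay="${3:-3}"
  local status="" i
  for ((i = 1; i <= attempts; i++)); do
    # .State.Health is only present when a HEALTHCHECK is in effect.
    status="$(docker inspect \
      --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
      "$container" 2>/dev/null)" || status="missing"
    case "$status" in
      healthy)
        echo "healthy"
        return 0
        ;;
      unhealthy | none | missing)
        # Terminal states: probe failing, no HEALTHCHECK, or no such container.
        echo "$status"
        return 1
        ;;
    esac
    # Any other value (normally "starting") means: keep waiting.
    sleep "$delay"
  done
  echo "timeout (last status: ${status})"
  return 1
}
```

For example, after `docker run -d --name nemoclaw-test <image>`, running `wait_for_healthy nemoclaw-test 30 5` waits up to roughly two and a half minutes for the first healthy report (the container name here is illustrative).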

Fixes #1430

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Container health monitoring is now enabled for both base and gateway images: the system performs periodic runtime and gateway checks, automatically tracking container health to improve deployment reliability and observability.

Enables Docker and orchestrators to detect unhealthy containers
and trigger automatic restarts. Also adds hadolint ignore comments
for pre-existing warnings (DL3006, SC2015).

Fixes NVIDIA#1430

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot commented Apr 5, 2026

📝 Walkthrough

Added HEALTHCHECK instructions to Dockerfile and Dockerfile.base to let Docker probe runtime health (gateway endpoint in the production image; a Node.js runtime check in the base image). Also added hadolint suppression comments; no other runtime/build steps changed.

Changes

  • Cohort: Health Monitoring
    Files: Dockerfile, Dockerfile.base
    Summary: Added HEALTHCHECK directives: Dockerfile probes http://127.0.0.1:18789/health via curl -sf; Dockerfile.base runs node -e "process.exit(0)". Added hadolint ignore comments (DL3006, SC2015). No other build/runtime commands altered.
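Docker interprets only three exit codes from a HEALTHCHECK command: 0 (healthy), 1 (unhealthy) and 2 (reserved). `curl -f` exits with 22 on an HTTP error status and 7 when nothing is listening, which is why the gateway probe ends in `|| exit 1`. Below is a minimal sketch of the same probe as a reusable shell function; the function name and the `--max-time` value are illustrative assumptions, while the URL is the one used in the PR.

```bash
#!/usr/bin/env bash
# Hypothetical shell equivalent of the gateway HEALTHCHECK in the Dockerfile.
# HEALTHCHECK only understands exit codes 0 (healthy) and 1 (unhealthy), so
# every failure (curl 7 = connection refused, 22 = HTTP >= 400, 28 = timeout,
# or shell status 127 if curl is not installed) is collapsed to 1.
probe_gateway() {
  local url="${1:-http://127.0.0.1:18789/health}"
  # -f: fail on HTTP errors; -s: no progress meter; -S: still print errors,
  # so the failure reason is captured in Docker's health log.
  if curl -fsS --max-time 5 "$url" >/dev/null; then
    return 0
  fi
  return 1
}
```

Run inside the container, `probe_gateway; echo $?` should print 0 once the gateway answers on port 18789.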

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hop inside the container bright,
I curl a little check by light,
A tiny ping, a cheerful beat,
Now Docker knows when things are neat —
Hooray for healthy hops tonight! 🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and concisely describes the main change: adding a HEALTHCHECK instruction to Dockerfile, which aligns with the primary objective of the PR.
  • Linked Issues check: ✅ Passed. The PR addresses all coding requirements from issue #1430: HEALTHCHECK instructions added to both Dockerfile and Dockerfile.base to enable container health monitoring in standalone Docker deployments.
  • Out of Scope Changes check: ✅ Passed. All changes are scope-appropriate: HEALTHCHECK additions to Dockerfile and Dockerfile.base, plus hadolint suppression comments for pre-existing linter warnings. No unrelated modifications detected.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
Dockerfile (1)

177-178: Consider adding a startup grace period to reduce cold-start false negatives.

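A short --start-period reduces false negatives when the gateway takes time to bootstrap: probe failures during the start period are not counted toward --retries, and the first successful probe ends the grace period early.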

Proposed tweak
-HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
-    CMD curl -sf http://127.0.0.1:18789/health || exit 1
+HEALTHCHECK --interval=30s --timeout=5s --start-period=45s --retries=3 \
+    CMD curl -fsS http://127.0.0.1:18789/health || exit 1
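To see why the grace period matters, here is a deliberately simplified model of Docker's health state machine (an illustrative sketch, not Docker's implementation): probes fire every `--interval`, failures that occur before the first success and inside `--start-period` are ignored, and `--retries` consecutive counted failures flip the status to `unhealthy`. It ignores `--timeout` and `--start-interval`; the function name and argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified model of HEALTHCHECK state (assumption: probes fire at
# t = interval, 2*interval, ...; --timeout and --start-interval are ignored).
# Usage: simulate_health <interval> <start_period> <retries> <result>...
# where each result is a probe exit code: 0 = success, non-zero = failure.
simulate_health() {
  local interval="$1" start_period="$2" retries="$3"
  shift 3
  local status="starting" failures=0 t=0 started=0 result
  for result in "$@"; do
    t=$((t + interval))
    if [ "$result" -eq 0 ]; then
      # Any success marks the container healthy and resets the streak.
      status="healthy"
      failures=0
      started=1
    elif [ "$started" -eq 0 ] && [ "$t" -le "$start_period" ]; then
      # Failure inside the grace period, before any success: not counted.
      :
    else
      failures=$((failures + 1))
      if [ "$failures" -ge "$retries" ]; then
        status="unhealthy"
      fi
    fi
  done
  echo "$status"
}
```

With `--interval=30s --retries=3` and a gateway whose first three probes fail, `simulate_health 30 0 3 1 1 1` reports `unhealthy` at the 90-second mark, while `simulate_health 30 45 3 1 1 1` is still `starting` because the first failure fell inside the 45-second grace period.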
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 177 - 178, The Dockerfile's HEALTHCHECK currently
lacks a startup grace period, causing cold-starts to be reported as failures;
update the HEALTHCHECK directive that uses "HEALTHCHECK --interval=30s
--timeout=5s --retries=3 CMD curl -sf http://127.0.0.1:18789/health || exit 1"
to include a suitable --start-period (for example --start-period=60s or another
value appropriate for your service) so Docker will wait before beginning health
probes, keeping the rest of the flags intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 315a1d49-e077-4507-a236-379162a409cd

📥 Commits

Reviewing files that changed from the base of the PR and between c99e3e8 and 27f1319.

📒 Files selected for processing (1)
  • Dockerfile

Comment on lines +174 to +179
# Health check: poll the gateway's /health endpoint so Docker (and Compose)
# can detect and restart unhealthy containers in standalone deployments.
# Ref: https://github.com/NVIDIA/NemoClaw/issues/1430
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
CMD curl -sf http://127.0.0.1:18789/health || exit 1


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Dockerfiles present:"
fd -i '^Dockerfile(\.base)?$' | sort

echo
echo "HEALTHCHECK declarations by file:"
fd -i '^Dockerfile(\.base)?$' --exec rg -n '^\s*HEALTHCHECK\b' {}

Repository: NVIDIA/NemoClaw

Length of output: 199


Add HEALTHCHECK to Dockerfile.base as well.

Issue #1430 requires HEALTHCHECK in both Dockerfile and Dockerfile.base. The change currently only addresses Dockerfile. Add the same HEALTHCHECK directive to Dockerfile.base at the appropriate location.
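One way to keep the two images from drifting apart again is a small CI guard in the spirit of the verification script above. A sketch, assuming a bash step in CI; the function name is hypothetical, and it only checks that an instruction is present (an explicit `HEALTHCHECK NONE` would also pass).

```bash
#!/usr/bin/env bash
# Hypothetical CI guard: fail when any given Dockerfile has no HEALTHCHECK
# instruction. Usage: require_healthcheck Dockerfile Dockerfile.base
require_healthcheck() {
  local f missing=0
  for f in "$@"; do
    # Dockerfile instructions are case-insensitive; comment lines start with
    # '#', so a commented-out HEALTHCHECK does not count.
    if ! grep -Eiq '^[[:space:]]*HEALTHCHECK([[:space:]]|$)' "$f"; then
      echo "missing HEALTHCHECK: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A CI job could then run `require_healthcheck Dockerfile Dockerfile.base` and fail the build on a non-zero exit.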

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 174 - 179, The Dockerfile change added a HEALTHCHECK
block (HEALTHCHECK --interval=30s --timeout=5s --retries=3 \ CMD curl -sf
http://127.0.0.1:18789/health || exit 1) but the same directive must also be
added to Dockerfile.base; open Dockerfile.base and add an identical HEALTHCHECK
directive (same interval, timeout, retries and CMD curl -sf
http://127.0.0.1:18789/health || exit 1) at the analogous location where health
checks are defined in Dockerfile (near the end or after the service start/EXPOSE
steps) so both images include the same container health probe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot left a comment

🧹 Nitpick comments (1)
Dockerfile.base (1)

128-129: Derived test image inherits insufficient base HEALTHCHECK

The base HEALTHCHECK (lines 128-129) only validates that node can execute, not that the actual service is running. The verification confirms that test/Dockerfile.sandbox (line 8 of provided context) inherits from ${BASE_IMAGE} but does not define its own HEALTHCHECK override, leaving it vulnerable to false-positive health reports.

To mitigate this, either require all child images to define a service-level HEALTHCHECK probe, or parameterize the base command so inheritors can inject a meaningful healthcheck (e.g., a startup script or port check).
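A sketch of the parameterization option, under stated assumptions. The Dockerfile reference does not list HEALTHCHECK among the instructions that support variable substitution, and build-time ARG values are not present when the probe runs, so the override hook is safer as a runtime ENV value read by a script. The script path `/usr/local/bin/healthcheck.sh`, the `HEALTHCHECK_URL` variable, and the function name are hypothetical. Dockerfile.base would use `HEALTHCHECK --interval=30s --timeout=5s --retries=3 CMD /usr/local/bin/healthcheck.sh`, and a child image such as test/Dockerfile.sandbox would either set `ENV HEALTHCHECK_URL=...`, declare its own HEALTHCHECK (only the last one takes effect), or opt out with `HEALTHCHECK NONE`.

```bash
#!/usr/bin/env bash
# Hypothetical /usr/local/bin/healthcheck.sh baked into Dockerfile.base.
# Child images opt in to a real service probe by setting the runtime
# environment variable HEALTHCHECK_URL (ENV, not ARG); without it, the probe
# falls back to the base image's original runtime-only check.
healthcheck() {
  if [ -n "${HEALTHCHECK_URL:-}" ]; then
    # Service-level probe: fails on connection errors and HTTP >= 400.
    curl -fsS --max-time 5 "$HEALTHCHECK_URL" >/dev/null || return 1
  else
    # Fallback: only proves that the Node.js runtime can start.
    node -e "process.exit(0)" || return 1
  fi
  return 0
}
```

The installed script would end with a plain `healthcheck` call so that its exit status (0 or 1) becomes the HEALTHCHECK result.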

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.base` around lines 128 - 129, The base HEALTHCHECK is too
superficial (only running node) and causes child images like
test/Dockerfile.sandbox to inherit a false-positive probe; add a build-time
argument (e.g., ARG HEALTHCHECK_CMD) and use it in the HEALTHCHECK instruction
so inheritors can supply a meaningful check, or alternatively remove the base
HEALTHCHECK and document that every child image must define its own HEALTHCHECK;
update the HEALTHCHECK line to reference HEALTHCHECK_CMD and ensure
test/Dockerfile.sandbox sets HEALTHCHECK_CMD (or defines its own HEALTHCHECK) so
the probe actually validates the service.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee49aa74-0422-41e5-815b-0ce53bf2d577

📥 Commits

Reviewing files that changed from the base of the PR and between 27f1319 and 8d48ef6.

📒 Files selected for processing (2)
  • Dockerfile
  • Dockerfile.base
🚧 Files skipped from review as they are similar to previous changes (1)
  • Dockerfile


Development

Successfully merging this pull request may close these issues.

No Dockerfile HEALTHCHECK — Unhealthy Containers Not Detected in Standalone Docker Deployments - IssueFinder - SN 08

2 participants