
fix: improve healthcheck #349

Merged
lindesvard merged 1 commit into main from fix/healthchecks
Apr 24, 2026

Conversation

@lindesvard
Contributor

@lindesvard lindesvard commented Apr 23, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Kubernetes-compatible health endpoints (/healthz/live, /healthz/ready) for better deployment integration
    • Enhanced health checks with concurrent dependency verification and granular failure reporting
    • Added heartbeat monitoring for event consumer processes
  • Bug Fixes

    • Health checks now gracefully tolerate individual dependency failures and provide detailed per-dependency status
    • Improved graceful shutdown handling for worker processes
  • Chores

    • Updated dependency versions

@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

This PR introduces comprehensive health check improvements across the API and worker services. It adds concurrent dependency health checks (Redis, Database, ClickHouse) with individual failure tolerance in the API healthcheck controller, implements Kubernetes-compatible health endpoints (/healthz/live, /healthz/ready) in the worker service with heartbeat monitoring for events consumers, and creates new utility modules for graceful shutdown state management and events consumer heartbeat tracking.

Changes

Cohort / File(s) — Summary
Health Check Endpoints
apps/api/src/controllers/healthcheck.controller.ts, apps/worker/src/index.ts
Reworked /healthcheck to concurrently evaluate Redis, DB, and ClickHouse with per-dependency error capture; added Kubernetes endpoints /healthz/live (always 200) and /healthz/ready (shutdown-aware with events heartbeat monitoring). Both endpoints now use tryCatch for graceful failure handling and return structured dependency results.
Graceful Shutdown & Heartbeat Utilities
apps/worker/src/utils/graceful-shutdown.ts, apps/worker/src/utils/worker-heartbeat.ts, apps/worker/src/boot-workers.ts
Introduced two new utility modules: graceful-shutdown managing shutdown state flags and worker-heartbeat tracking events consumer activity timestamps. Boot workers now conditionally enable heartbeat on events shard startup and update it on BullMQ/groupmq lifecycle events; error and termination handlers mark process as shutting down before graceful exit.
Common Try-Catch Utility
packages/common/src/try-catch.ts, packages/common/index.ts
Added typed discriminated union TryCatchResult<T, E> and generic tryCatch helper supporting both thunks and direct promises, returning { ok, data, error } structure for safe error handling across the codebase.
Dependency Upgrade
pnpm-workspace.yaml
Updated groupmq from 2.0.0-next.3 to 2.0.0-next.4 in workspace catalog.
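The tryCatch helper summarized above can be sketched roughly as follows. This is a sketch based only on the shape described in the summary ({ ok, data, error } discriminated union, accepting both thunks and direct promises); the actual code in packages/common/src/try-catch.ts may differ in detail:

```typescript
// Sketch of the described helper; exact implementation may differ.
type TryCatchResult<T, E = Error> =
  | { ok: true; data: T; error: null }
  | { ok: false; data: null; error: E };

async function tryCatch<T, E = Error>(
  input: Promise<T> | (() => Promise<T>),
): Promise<TryCatchResult<T, E>> {
  try {
    // Support both a thunk and a direct promise, per the summary.
    const data = await (typeof input === 'function' ? input() : input);
    return { ok: true, data, error: null };
  } catch (error) {
    return { ok: false, data: null, error: error as E };
  }
}
```

Callers can then branch on `ok` without try/catch at each call site.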

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API Healthcheck
    participant Redis as Redis
    participant DB as Database
    participant CH as ClickHouse

    Client->>API: GET /healthcheck
    par Concurrent Checks
        API->>Redis: PING
        API->>DB: SELECT 1
        API->>CH: SELECT 1
    and
        rect rgba(255, 0, 0, 0.5)
            Note over Redis: Executes async
        end
        rect rgba(255, 0, 0, 0.5)
            Note over DB: Executes async
        end
        rect rgba(255, 0, 0, 0.5)
            Note over CH: Executes async
        end
    end
    
    alt All Dependencies Succeed
        Redis-->>API: Connected
        DB-->>API: Responsive
        CH-->>API: Available
        API->>Client: 200 OK {ready: true, dependencies: {...}}
    else Any Dependency Fails
        Redis-->>API: Error
        DB-->>API: Error/Connected
        CH-->>API: Error/Available
        API->>Client: 503 Service Unavailable {ready: false, dependencies: {...}, errors: {...}}
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • feat: graceful shutdown #205: Modifies API healthcheck controller behavior and introduces/consumes graceful-shutdown utilities, with parallel rework of health check dependencies and shutdown handling patterns.

Poem

🐰 Hark, the services now beat as one,
Health checks dance, concurrent and fun,
Redis, Database, ClickHouse aligned,
Graceful shutdown, no stray threads left behind,
A heartbeat keeps the workers alive!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.18%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: improving healthcheck functionality across the API and worker services with concurrent dependency checks, better error handling, and Kubernetes-compatible endpoints.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
apps/worker/src/index.ts (2)

29-29: Consider making the stale threshold configurable via env.

EVENTS_HEARTBEAT_STALE_MS is hard-coded to 60s. Since it's tightly coupled to EVENT_BLOCKING_TIMEOUT_SEC (which is already env-configurable), operators who tune the blocking timeout upward will silently get false-positive 503s. An env knob (or derivation from EVENT_BLOCKING_TIMEOUT_SEC * N) would avoid this.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/worker/src/index.ts` at line 29, EVENTS_HEARTBEAT_STALE_MS is hard-coded
to 60_000 and can become inconsistent with the env-configured
EVENT_BLOCKING_TIMEOUT_SEC; make the stale threshold configurable by reading an
env var (e.g., EVENTS_HEARTBEAT_STALE_MS) or derive it from
EVENT_BLOCKING_TIMEOUT_SEC (e.g., multiply by a safety factor) in the same
module where EVENTS_HEARTBEAT_STALE_MS is defined so operators can tune it via
environment and avoid false-positive 503s when EVENT_BLOCKING_TIMEOUT_SEC is
increased.
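The suggested env-driven threshold could be shaped roughly like this. A sketch only: the ×3 safety factor and the 20-second fallback are illustrative assumptions, not values from the PR:

```typescript
// Sketch: prefer an explicit env override, otherwise derive the stale
// threshold from EVENT_BLOCKING_TIMEOUT_SEC with a safety factor.
// The factor of 3 and the 20s fallback are illustrative assumptions.
function resolveStaleMs(env: Record<string, string | undefined>): number {
  const explicit = Number(env.EVENTS_HEARTBEAT_STALE_MS);
  if (Number.isFinite(explicit) && explicit > 0) return explicit;
  const blockingSec = Number(env.EVENT_BLOCKING_TIMEOUT_SEC) || 20;
  return blockingSec * 3 * 1000;
}
```

With this, raising EVENT_BLOCKING_TIMEOUT_SEC automatically raises the stale threshold unless an operator pins it explicitly.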

78-120: DRY: this /healthcheck handler is a near-verbatim duplicate of the API's healthcheck controller.

The same Redis/DB/CH probe logic, the same dependencies/dependencyErrors/failed|workingDependencies shape, and the same 200/503 decision live in apps/api/src/controllers/healthcheck.controller.ts (lines 12–58). Drift between the two is inevitable. Consider extracting a shared helper (e.g. probeDependencies() returning { status, body, log }) into @openpanel/common or a small @openpanel/health package, and calling it from both endpoints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/worker/src/index.ts` around lines 78 - 120, The worker healthcheck
duplicates the API controller's probe logic; extract the shared probe into a new
helper (e.g. probeDependencies) in a common package (like `@openpanel/common` or
`@openpanel/health`) that performs the Promise.all probes (using
getRedisCache().ping(), db.$executeRaw`SELECT 1`, chQuery('SELECT 1')), computes
dependencies and dependencyErrors, determines status (200/503), builds
failedDependencies and workingDependencies, and returns a structured result
(suggested shape: { status, body, log }) so both the worker's handler in
apps/worker/src/index.ts and the API's healthcheck.controller.ts can call
probeDependencies(), use the returned status to set response status/body (ready,
dependencies, failedDependencies, workingDependencies), and log via logger.warn
using the returned log payload.
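One way the suggested probeDependencies helper could be shaped — a sketch only. Probe functions are injected here to keep the example self-contained; the real helper would wire in getRedisCache().ping(), db.$executeRaw, and chQuery as the comment describes:

```typescript
type Probe = () => Promise<unknown>;

// Sketch of the suggested shared helper. Both the API controller and the
// worker endpoint would call this with their dependency probes.
async function probeDependencies(probes: Record<string, Probe>) {
  const results = await Promise.all(
    Object.keys(probes).map(async (name) => {
      try {
        await probes[name]!();
        return { name, ok: true, error: null as string | null };
      } catch (e) {
        return { name, ok: false, error: e instanceof Error ? e.message : String(e) };
      }
    }),
  );
  const dependencies = Object.fromEntries(results.map((r) => [r.name, r.ok]));
  const failedDependencies = results.filter((r) => !r.ok).map((r) => r.name);
  const dependencyErrors = Object.fromEntries(
    results.filter((r) => !r.ok).map((r) => [r.name, r.error]),
  );
  const status = failedDependencies.length === 0 ? 200 : 503;
  return {
    status,
    body: { ready: status === 200, dependencies, failedDependencies },
    log: dependencyErrors, // payload for logger.warn on failure
  };
}
```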
packages/common/src/try-catch.ts (1)

12-14: Unsafe cast of caught value to E.

catch (error) can receive non-Error values (strings, objects, null), but this blindly casts to E (default Error). Current callers use error?.message with optional chaining so it degrades to undefined rather than crashing, but a safer default would be to normalize via error instanceof Error ? error : new Error(String(error)) when E defaults to Error. Not blocking — flagging for future hardening.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/common/src/try-catch.ts` around lines 12 - 14, The catch block
currently unsafely casts the caught value to E (catch (error) { return { ok:
false, data: null, error: error as E }; }), which can be non-Error; instead
normalize the caught value before returning: when the generic E is the default
Error type, convert non-Error values to an Error (e.g., error instanceof Error ?
error : new Error(String(error))) and then return that normalizedError (cast to
E only as a last step); update the catch in try-catch.ts to build a safe
normalizedError and return { ok: false, data: null, error: normalizedError as E
} so callers can reliably access .message without runtime surprises.
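The normalization the comment proposes could look roughly like this — a hedged sketch, assuming E defaults to Error as in the PR:

```typescript
// Sketch of the suggested hardening: never hand callers a non-Error.
function normalizeError(caught: unknown): Error {
  return caught instanceof Error ? caught : new Error(String(caught));
}

async function tryCatchSafe<T>(fn: () => Promise<T>) {
  try {
    return { ok: true as const, data: await fn(), error: null };
  } catch (caught) {
    // Rejections with strings/objects/null become real Error instances.
    return { ok: false as const, data: null, error: normalizeError(caught) };
  }
}
```

This keeps `error.message` safe to read at every call site without optional chaining.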
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/worker/src/boot-workers.ts`:
- Around line 157-158: Remove the non-existent drained listener (the
worker.on('drained', markEventsActivity) line) and instead implement a periodic
idle heartbeat: set up a timer (e.g., setInterval) that calls markEventsActivity
at a safe interval shorter than the readiness timeout (e.g., 30s) so idle
workers stay healthy; start this timer when the Worker (groupmq Worker) is
initialized and ensure you clear it on terminal events such as 'closed', 'error'
or 'graceful-timeout' to avoid leaks, leaving the existing
worker.on('completed', markEventsActivity) intact for job completion updates.

In `@apps/worker/src/index.ts`:
- Around line 131-152: The readiness endpoint /healthz/ready can return 200
before workers are booted because app.listen(PORT, ...) starts the HTTP server
prior to bootWorkers()/bootCron() calling enableEventsHeartbeat(); change the
startup sequencing so readiness is false until boot complete: either delay
app.listen until after bootWorkers()/bootCron() resolve or add an explicit
booted flag (e.g., bootComplete = false -> true after bootWorkers()/bootCron()
finish) and update the /healthz/ready handler (which currently uses
isShuttingDown() and getEventsHeartbeat()/EVENTS_HEARTBEAT_STALE_MS) to return
503 when bootComplete is false; ensure the flag is set only after
enableEventsHeartbeat() is active so the endpoint never reports ready before
workers are running.

---

Nitpick comments:
In `@apps/worker/src/index.ts`:
- Line 29: EVENTS_HEARTBEAT_STALE_MS is hard-coded to 60_000 and can become
inconsistent with the env-configured EVENT_BLOCKING_TIMEOUT_SEC; make the stale
threshold configurable by reading an env var (e.g., EVENTS_HEARTBEAT_STALE_MS)
or derive it from EVENT_BLOCKING_TIMEOUT_SEC (e.g., multiply by a safety factor)
in the same module where EVENTS_HEARTBEAT_STALE_MS is defined so operators can
tune it via environment and avoid false-positive 503s when
EVENT_BLOCKING_TIMEOUT_SEC is increased.
- Around line 78-120: The worker healthcheck duplicates the API controller's
probe logic; extract the shared probe into a new helper (e.g. probeDependencies)
in a common package (like `@openpanel/common` or `@openpanel/health`) that performs
the Promise.all probes (using getRedisCache().ping(), db.$executeRaw`SELECT 1`,
chQuery('SELECT 1')), computes dependencies and dependencyErrors, determines
status (200/503), builds failedDependencies and workingDependencies, and returns
a structured result (suggested shape: { status, body, log }) so both the
worker's handler in apps/worker/src/index.ts and the API's
healthcheck.controller.ts can call probeDependencies(), use the returned status
to set response status/body (ready, dependencies, failedDependencies,
workingDependencies), and log via logger.warn using the returned log payload.

In `@packages/common/src/try-catch.ts`:
- Around line 12-14: The catch block currently unsafely casts the caught value
to E (catch (error) { return { ok: false, data: null, error: error as E }; }),
which can be non-Error; instead normalize the caught value before returning:
when the generic E is the default Error type, convert non-Error values to an
Error (e.g., error instanceof Error ? error : new Error(String(error))) and then
return that normalizedError (cast to E only as a last step); update the catch in
try-catch.ts to build a safe normalizedError and return { ok: false, data: null,
error: normalizedError as E } so callers can reliably access .message without
runtime surprises.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 19defc40-3643-456f-a719-f80a0e74c3f8

📥 Commits

Reviewing files that changed from the base of the PR and between 7cb7299 and 4955767.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (8)
  • apps/api/src/controllers/healthcheck.controller.ts
  • apps/worker/src/boot-workers.ts
  • apps/worker/src/index.ts
  • apps/worker/src/utils/graceful-shutdown.ts
  • apps/worker/src/utils/worker-heartbeat.ts
  • packages/common/index.ts
  • packages/common/src/try-catch.ts
  • pnpm-workspace.yaml

Comment on lines +157 to +158
worker.on('completed', markEventsActivity);
worker.on('drained', markEventsActivity);

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect groupmq's GroupWorker implementation for emitted events.
fd -t d '^groupmq$' node_modules 2>/dev/null | head -5
rg -nP "emit\s*\(\s*['\"](completed|drained)" --glob '**/groupmq/**' -C2 2>/dev/null || \
  rg -nP "emit\s*\(\s*['\"](completed|drained)" -C2

Repository: Openpanel-dev/openpanel

Length of output: 49


🌐 Web query:

groupmq 2.0.0-next.4 Worker events completed drained

💡 Result:

For “groupmq 2.0.0-next.4 Worker events completed drained” there isn’t enough evidence in the sources I found to answer what GroupMQ emits for “completed” and “drained” worker events in version 2.0.0-next.4 specifically. What I could verify: 1) GroupMQ is a Redis-backed per-group FIFO queue for Node/TypeScript, with a BullMQ-compatible API (site + README). It provides a Worker class and worker options (including concurrency), but the sources I found do not describe any “worker events” named “completed” or “drained”, nor do they document event payload semantics. 2) The only “completed/drained” event behavior I found is for BullMQ (different library). In BullMQ, there is a “drained” event and it has been discussed/changed regarding when it emits (e.g., only once when the worker is idle; and older versions emitting more often). This does not establish GroupMQ’s 2.0.0-next.4 behavior. 3) I located GroupMQ releases up to 2.0.0-next.1 (2026-03-12). I did not find a 2.0.0-next.4 release page/changelog entry in the searched results, so I couldn’t corroborate “2.0.0-next.4” content. Conclusion: I cannot confidently state what “completed” and “drained” worker events are in groupmq 2.0.0-next.4, or whether “drained” exists / how it fires. If you want, paste the exact code snippet where you attach the listeners (e.g., worker.on('completed'...) and worker.on('drained'...)) or share the groupmq 2.0.0-next.4 changelog / worker source file path, and I can re-run a targeted search against the exact implementation and produce a definitive, version-specific answer.


🏁 Script executed:

#!/bin/bash
# 1. Find package files and check groupmq version
fd -type f -name "package*.json" | head -20 | xargs grep -l "groupmq" 2>/dev/null | head -3

# 2. Extract groupmq version from lock/package files
fd -type f -name "package*.json" | xargs cat 2>/dev/null | grep -A 2 '"groupmq"' | head -15

Repository: Openpanel-dev/openpanel

Length of output: 408


🏁 Script executed:

#!/bin/bash
# 3. Read the boot-workers.ts file mentioned in review to see context
cat -n apps/worker/src/boot-workers.ts | sed -n '140,170p'

Repository: Openpanel-dev/openpanel

Length of output: 1416


🏁 Script executed:

#!/bin/bash
# 4. Search for groupmq package in node_modules and find its main export
find node_modules -maxdepth 2 -type d -name groupmq 2>/dev/null | head -1

Repository: Openpanel-dev/openpanel

Length of output: 49


🏁 Script executed:

#!/bin/bash
# 5. Look for GroupWorker class definition in repository
rg -n "class GroupWorker" --type ts --type tsx --type js --type jsx

Repository: Openpanel-dev/openpanel

Length of output: 94


🏁 Script executed:

#!/bin/bash
# Check for groupmq in package locks and version
cat package-lock.json 2>/dev/null | jq '.packages | to_entries[] | select(.key | contains("groupmq")) | {key, version: .value.version}' 2>/dev/null || \
cat pnpm-lock.yaml 2>/dev/null | grep -A 2 "groupmq" | head -10 || \
grep "groupmq" yarn.lock 2>/dev/null | head -5

Repository: Openpanel-dev/openpanel

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Find groupmq location (likely in node_modules)
find node_modules -maxdepth 2 -name "groupmq" -type d 2>/dev/null

Repository: Openpanel-dev/openpanel

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Search for GroupWorker class definition - use correct rg syntax
rg -n "class GroupWorker" -t ts -t js

Repository: Openpanel-dev/openpanel

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Search for groupmq mentions anywhere in the repo
rg "groupmq" -i | head -20

Repository: Openpanel-dev/openpanel

Length of output: 2333


🏁 Script executed:

#!/bin/bash
# Check package.json structure
ls -la package.json 2>/dev/null && head -30 package.json

Repository: Openpanel-dev/openpanel

Length of output: 1319


🏁 Script executed:

#!/bin/bash
# Find where GroupWorker is imported from
rg "GroupWorker" -B 2 -A 2 | head -40

Repository: Openpanel-dev/openpanel

Length of output: 1407


🏁 Script executed:

#!/bin/bash
# Check if groupmq is a workspace package (internal) or external dependency
ls -la packages/groupmq 2>/dev/null || echo "Not found in packages"
cat pnpm-workspace.yaml | grep -A 20 "packages:"

Repository: Openpanel-dev/openpanel

Length of output: 385


🏁 Script executed:

#!/bin/bash
# Check the actual groupmq source - look for it in node_modules or as workspace package
find . -path ./node_modules -prune -o -type f -name "*.ts" -print | xargs grep -l "class GroupWorker\|export.*GroupWorker" 2>/dev/null | head -5

Repository: Openpanel-dev/openpanel

Length of output: 49


🌐 Web query:

Openpanel-dev groupmq 2.0.0-next.4 Worker class events completed drained emit

💡 Result:

For Openpanel-dev/groupmq, the “drained” and “completed” worker/job lifecycle events you mention are not found in the official GroupMQ docs/pages returned by the research. The only worker events explicitly documented in the Openpanel-dev/groupmq README/docs are: ready, completed, failed, error, closed, and graceful-timeout. Additionally, GroupMQ’s documented job completion flow includes moving a successfully processed job into a :completed retention sorted set and unlocking the group lock, but this is internal state management—not an exposed “drained” event. So, based on the sources located for this query, there is no evidence of a “Worker class events … drained … completed … emit” sequence in groupmq 2.0.0-next.4. What is evidenced: - Worker emits documented events including completed (and other events), and drained is not listed. - Job completion flow marks the job as completed in the queue’s :completed sorted set and unlocks the group. If you can share the exact file/snippet from groupmq 2.0.0-next.4 you’re looking at (or the expected event names), I can re-check precisely whether drained exists under a different name, is part of QueueEvents (not Worker), or was removed/changed in the 2.0.0-next.x line.


🏁 Script executed:

#!/bin/bash
# Look at node_modules to find groupmq source
ls -la node_modules/groupmq 2>/dev/null | head -20
# If it exists, check main entry point
cat node_modules/groupmq/package.json 2>/dev/null | jq '.main, .exports'

Repository: Openpanel-dev/openpanel

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Check the actual event listeners being used in boot-workers.ts
grep -n "worker.on\|markEventsActivity" apps/worker/src/boot-workers.ts | head -20

Repository: Openpanel-dev/openpanel

Length of output: 193


🏁 Script executed:

#!/bin/bash
# Try to find groupmq Worker implementation in node_modules
find node_modules/groupmq -name "*.js" -o -name "*.ts" 2>/dev/null | head -20

Repository: Openpanel-dev/openpanel

Length of output: 49


🌐 Web query:

site:github.com/Openpanel-dev/groupmq Worker class emit events implementation

💡 Result:

In libraries like GroupMQ (a BullMQ-inspired Redis job queue for Node.js/TypeScript), the Worker class emits events such as 'ready', 'completed', 'failed', 'error', 'closed', and 'graceful-timeout'. These are listened to using standard Node.js EventEmitter syntax: worker.on('event', handler). Example implementation from GroupMQ documentation: worker.on('ready', () => { console.log('Worker is ready'); }); worker.on('completed', (job) => { console.log('Job completed:', job.id); }); worker.on('failed', (job) => { console.log('Job failed:', job.id, job.failedReason); }); worker.on('error', (error) => { console.error('Worker error:', error); }); worker.on('closed', () => { console.log('Worker closed'); }); worker.on('graceful-timeout', (job) => { console.log('Job exceeded graceful timeout:', job.id); }); // Remove listeners: worker.off('completed', handler); worker.removeAllListeners(); The Worker class extends Node.js EventEmitter, allowing it to emit these lifecycle and job status events for monitoring and handling in queue processing systems.


🏁 Script executed:

#!/bin/bash
# Check the actual GroupMQ source code for Worker.emit calls
# GroupMQ is at catalog version 2.0.0-next.4, search for its source on GitHub
# Let's look at the local boot-workers more carefully to understand the context
cat -n apps/worker/src/boot-workers.ts | sed -n '1,50p'

Repository: Openpanel-dev/openpanel

Length of output: 2182


🏁 Script executed:

#!/bin/bash
# Check what markEventsActivity does and how often it's called
rg -n "markEventsActivity" -B 5 -A 5 apps/worker/src/

Repository: Openpanel-dev/openpanel

Length of output: 2485


🌐 Web query:

site:github.com/Openpanel-dev/groupmq "drained" event Worker emit

💡 Result:

No evidence found of a "drained" event emitted by a Worker in any job queue library, including BullMQ, groupmq, Bee-Queue, or others. Searches across official documentation and code repositories (primarily groupmq README and BullMQ-inspired libraries) show Worker events like 'ready', 'completed', 'failed', 'error', 'closed', and 'graceful-timeout', but no 'drained' event. The query may refer to a custom implementation, a different library, or a misremembered event name. In Bee-Queue, 'drained' exists but as a queue event when no jobs remain, not emitted by Worker.


🏁 Script executed:

#!/bin/bash
# Let's check the actual GroupMQ source from the commit referenced in web results
# Look for any emit calls in groupmq that mention "drained"
rg "emit.*drained" --type ts --type js 2>/dev/null || echo "No 'drained' emit found"

# Also check what events ARE emitted by the Worker class
rg "this\.emit|worker\.emit" apps/worker/src/boot-workers.ts

Repository: Openpanel-dev/openpanel

Length of output: 90


🌐 Web query:

site:github.com/Openpanel-dev/groupmq Worker.emit all events 2.0.0-next

💡 Result:

In the groupmq library (a Redis-backed per-group FIFO queue for Node.js/TypeScript), Worker instances extend TypedEventEmitter and emit the following events: - 'ready': Emitted when the worker is ready. - 'completed': Emitted when a job completes, with the Job object. - 'failed': Emitted when a job fails, with the Job object and failedReason. - 'error': Emitted on worker errors, with the Error. - 'closed': Emitted when the worker closes. - 'graceful-timeout': Emitted when a job exceeds graceful timeout, with the Job object. No version 2.0.0-next or Cloudflare Workers context found matching the query. Latest groupmq version is v1.1.0 (Nov 2025), with next tags like 1.1.1-next.1. Use worker.on(event, handler) to listen and worker.off or worker.removeAllListeners to remove.


Remove the worker.on('drained') listener — the event does not exist in groupmq 2.0.0-next.4.

The groupmq Worker class emits only: 'ready', 'completed', 'failed', 'error', 'closed', and 'graceful-timeout'. There is no 'drained' event. This means the heartbeat will only refresh when jobs complete; during idle periods (no jobs in queue), markEventsActivity() never fires, and the readiness probe will report 503 after 60 seconds.

Either implement a separate idle heartbeat mechanism (e.g., a periodic timer that calls markEventsActivity() at shorter intervals) or accept that the readiness probe will cycle to unhealthy during sustained idle periods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/worker/src/boot-workers.ts` around lines 157 - 158, Remove the
non-existent drained listener (the worker.on('drained', markEventsActivity)
line) and instead implement a periodic idle heartbeat: set up a timer (e.g.,
setInterval) that calls markEventsActivity at a safe interval shorter than the
readiness timeout (e.g., 30s) so idle workers stay healthy; start this timer
when the Worker (groupmq Worker) is initialized and ensure you clear it on
terminal events such as 'closed', 'error' or 'graceful-timeout' to avoid leaks,
leaving the existing worker.on('completed', markEventsActivity) intact for job
completion updates.
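The periodic idle heartbeat suggested above could be sketched as follows. The 30-second interval is the comment's example value; markEventsActivity is the PR's existing heartbeat setter, and the terminal events named are those groupmq documents ('closed', 'error', 'graceful-timeout'):

```typescript
// Sketch: refresh the heartbeat while the worker is idle, and return a
// stop() so callers can clear the timer on terminal worker events.
function startIdleHeartbeat(
  markEventsActivity: () => void,
  intervalMs = 30_000,
): () => void {
  const timer = setInterval(markEventsActivity, intervalMs);
  // In boot-workers.ts, stop() would be wired to
  // worker.on('closed' | 'error' | 'graceful-timeout', ...) to avoid leaks.
  return function stop() {
    clearInterval(timer);
  };
}
```

Combined with the existing `worker.on('completed', markEventsActivity)`, the readiness probe then stays healthy through sustained idle periods.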

Comment thread apps/worker/src/index.ts
Comment on lines +131 to 152
app.get('/healthz/ready', (req, res) => {
if (isShuttingDown()) {
res.status(503).json({ ready: false, reason: 'shutting down' });
return;
}

const { enabled, lastActivityAt } = getEventsHeartbeat();
if (enabled) {
const idleMs = Date.now() - lastActivityAt;
if (idleMs > EVENTS_HEARTBEAT_STALE_MS) {
res.status(503).json({
ready: false,
reason: 'events consumer heartbeat stale',
idleMs,
thresholdMs: EVENTS_HEARTBEAT_STALE_MS,
});
return;
}
}

res.status(200).json({ ready: true });
});

⚠️ Potential issue | 🟡 Minor

Startup race: /healthz/ready returns 200 before workers are booted.

app.listen(PORT, …) at line 154 runs before bootWorkers() at line 159, and enableEventsHeartbeat() is only called inside bootWorkers. In the window between the HTTP server starting and bootWorkers() completing, enabled is false, so this endpoint reports ready even though no events consumer is running yet. If K8s scrapes readiness in that window it may route traffic / mark the pod ready prematurely.

Consider flipping an explicit "booted" flag after bootWorkers()/bootCron() resolve and returning 503 until then, or starting the HTTP server after boot.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/worker/src/index.ts` around lines 131 - 152, The readiness endpoint
/healthz/ready can return 200 before workers are booted because app.listen(PORT,
...) starts the HTTP server prior to bootWorkers()/bootCron() calling
enableEventsHeartbeat(); change the startup sequencing so readiness is false
until boot complete: either delay app.listen until after
bootWorkers()/bootCron() resolve or add an explicit booted flag (e.g.,
bootComplete = false -> true after bootWorkers()/bootCron() finish) and update
the /healthz/ready handler (which currently uses isShuttingDown() and
getEventsHeartbeat()/EVENTS_HEARTBEAT_STALE_MS) to return 503 when bootComplete
is false; ensure the flag is set only after enableEventsHeartbeat() is active so
the endpoint never reports ready before workers are running.
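The booted-flag approach could be sketched as a pure readiness decision, which keeps it testable. Names like bootComplete are illustrative; isShuttingDown and the heartbeat accessors are the PR's, modeled here as plain inputs:

```typescript
// Sketch: readiness gated on boot completion, shutdown state, and
// heartbeat staleness. bootComplete would flip to true only after
// bootWorkers()/bootCron() resolve and enableEventsHeartbeat() is active.
interface ReadinessInput {
  bootComplete: boolean;
  shuttingDown: boolean;
  heartbeat: { enabled: boolean; lastActivityAt: number };
  staleMs: number;
  now: number;
}

function readiness(input: ReadinessInput): { status: number; reason?: string } {
  if (!input.bootComplete) return { status: 503, reason: 'booting' };
  if (input.shuttingDown) return { status: 503, reason: 'shutting down' };
  if (input.heartbeat.enabled) {
    const idleMs = input.now - input.heartbeat.lastActivityAt;
    if (idleMs > input.staleMs) {
      return { status: 503, reason: 'events consumer heartbeat stale' };
    }
  }
  return { status: 200 };
}
```

The /healthz/ready handler would then just serialize this result, closing the window where K8s could mark the pod ready before workers exist.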

@lindesvard lindesvard merged commit 195b72c into main Apr 24, 2026
6 checks passed
