
[clickhouse] Skip emitting empty storage_health payloads #23553

Merged
sangeetashivaji merged 3 commits into master from
sangeeta.shivajirao/clickhouse-skip-empty-storage-health
May 1, 2026
Conversation

Contributor

sangeetashivaji commented Apr 30, 2026

Summary

Skip the storage_health event emission in ClickhousePartsAndMerges._emit_events when every collection (parts, merges, mutations, replication queue, detached parts, thresholds) is empty. Previously the integration emitted a payload every collection cycle regardless of whether the queries returned anything, including when every collector caught an exception and returned [].
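The guard described above can be sketched as a simple emptiness check. This is an illustrative sketch only: the function name `should_emit` and the payload shape are assumptions, not the integration's actual code, which lives in `ClickhousePartsAndMerges._emit_events`.

```python
def should_emit(collections: dict) -> bool:
    """Emit a storage_health payload only when at least one collection has rows."""
    return any(collections.values())


# Total collection failure: every collector caught an exception and returned [].
empty = {
    'parts': [], 'merges': [], 'mutations': [],
    'replication_queue': [], 'detached_parts': [], 'thresholds': [],
}
assert should_emit(empty) is False  # payload is skipped

# Fresh self-hosted instance: only thresholds populated, so emission still happens.
fresh = dict(empty, thresholds=[{'name': 'max_parts_in_total', 'value': 100000}])
assert should_emit(fresh) is True
```

Because `any()` short-circuits on the first non-empty collection, the check is a no-op for any instance with real data.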

Why this matters

The integration was sending one storage_health event per collection cycle (default 60s) for every
monitored ClickHouse instance, even when there was nothing to report. Two scenarios where this is
undesirable:

  • Idle / empty instances: a fresh ClickHouse with no user tables produces no useful storage_health
    data, but the integration was still emitting an empty payload every minute.
  • Total collection failure: when every collector hits an exception (auth issue, network blip,
    ClickHouse restart, restricted system tables on managed services), all eight collectors return [].
    The integration was emitting a payload claiming "everything is empty," which is misleading — we don't
    actually know the state of the database in that case.

The new guard makes the agent emit only when at least one collection has rows. For any production
instance with actual data, this is a no-op.
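The total-failure scenario follows from the collector pattern implied above: each collector swallows query failures and returns an empty list, so a full outage previously looked identical to a genuinely empty database. A minimal sketch of that pattern, assuming a generic `query_fn` rather than the integration's real collector signatures:

```python
import logging

log = logging.getLogger(__name__)


def collect_safely(query_fn):
    """Run one collector; on any failure (auth issue, network blip, restart,
    restricted system tables) return [] instead of raising."""
    try:
        return list(query_fn())
    except Exception:
        log.debug('collection failed', exc_info=True)
        return []


def failing_query():
    raise ConnectionError('ClickHouse restarting')


rows = collect_safely(failing_query)
assert rows == []  # indistinguishable from "no data" without the new guard
```

With the new guard, an all-`[]` result now means "don't emit" rather than "emit a payload claiming everything is empty".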

Behavior in real environments

The guard short-circuits only when every collection is empty:

| Scenario | Result |
| --- | --- |
| Healthy production with data | parts and thresholds populated → emits as before |
| Healthy ClickHouse Cloud with data | parts populated → emits as before |
| Idle moment, busy database | parts/thresholds populated → emits as before |
| Fresh self-hosted, no user tables yet | thresholds populated → emits thresholds-only payload |
| Fresh ClickHouse Cloud, no user data, restricted system tables | all empty → skips |
| All collection queries fail (auth/network/restart) | all empty → skips |

For any production customer with actual data, this is a no-op.

Test plan

  • test_emit_events_shape — all collections populated, payload contains every section
  • test_emit_events_uses_query_activity_channel_not_metadata — emits via the query-activity
    channel rather than the metadata channel (updated to use non-empty input)
  • test_emit_events_skips_when_all_collections_empty (new) — every collection empty, no
    emission
  • test_collect_and_emit_skips_when_all_collectors_empty (new) — end-to-end: every _collect_*
    returns empty, _collect_and_emit produces no emission
  • test_collect_and_emit_runs_with_partial_failures — pre-existing, still passes: when only one
    collector returns rows, emission still happens
$ ddev --no-interactive test clickhouse -- -k "emit_events"
3 passed
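The new skip test can be approximated in isolation. This sketch fakes the job with a stub rather than the real check harness, and the inline `any()` check stands in for the guard inside `_emit_events`; names other than the collection keys are hypothetical.

```python
from unittest import mock


def run_guarded_emit(collections):
    """Drive a stub job through the guard; return the list of emitted payloads."""
    emitted = []
    job = mock.Mock()
    job.emit = emitted.append
    if any(collections.values()):  # the guard under test
        job.emit({'kind': 'storage_health', **collections})
    return emitted


empty = {name: [] for name in (
    'parts', 'merges', 'mutations',
    'replication_queue', 'detached_parts', 'thresholds',
)}
assert run_guarded_emit(empty) == []  # all collections empty -> no emission

# Partial failure still emits, matching test_collect_and_emit_runs_with_partial_failures.
partial = dict(empty, parts=[{'table': 't', 'rows': 1}])
assert len(run_guarded_emit(partial)) == 1
```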

🤖 Generated with Claude Code

Skip the database_monitoring_query_activity emission when every parts-and-merges
collection (parts, merges, mutations, replication queue, detached parts, thresholds)
is empty. This is the case for fresh ClickHouse Cloud instances with no user data
and restricted system tables, and for transient total-collection-failure scenarios.

Empty storage_health payloads previously wedged dbm-events-processor partitions on
the dd-go side; that processor is being fixed independently in
DataDog/dd-go#234748. This change also avoids emitting one useless Kafka message
per collection cycle for any genuinely-idle/empty instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mpty

Pins the wiring between _collect_and_emit and _emit_events for the case where
every collection query fails or returns no rows, which is the scenario that
fires when all queries hit exceptions (auth/network) or on a fresh, restricted
ClickHouse Cloud instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

dd-octo-sts Bot commented Apr 30, 2026

Validation Report

All 20 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | ✅ |
| ci | Validate CI configuration and Codecov settings | ✅ |
| codeowners | Validate every integration has a CODEOWNERS entry | ✅ |
| config | Validate default configuration files against spec.yaml | ✅ |
| dep | Verify dependency pins are consistent and Agent-compatible | ✅ |
| http | Validate integrations use the HTTP wrapper correctly | ✅ |
| imports | Validate check imports do not use deprecated modules | ✅ |
| integration-style | Validate check code style conventions | ✅ |
| jmx-metrics | Validate JMX metrics definition files and config | ✅ |
| labeler | Validate PR labeler config matches integration directories | ✅ |
| legacy-signature | Validate no integration uses the legacy Agent check signature | ✅ |
| license-headers | Validate Python files have proper license headers | ✅ |
| licenses | Validate third-party license attribution list | ✅ |
| metadata | Validate metadata.csv metric definitions | ✅ |
| models | Validate configuration data models match spec.yaml | ✅ |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | ✅ |
| package | Validate Python package metadata and naming | ✅ |
| readmes | Validate README files have required sections | ✅ |
| saved-views | Validate saved view JSON file structure and fields | ✅ |
| version | Validate version consistency between package and changelog | ✅ |



codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.11%. Comparing base (46de832) to head (b26b6cc).
⚠️ Report is 6 commits behind head on master.


Contributor

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 92.91%

🔗 Commit SHA: b26b6cc

```python
    mock.patch('datadog_checks.clickhouse.parts_and_merges.datadog_agent') as agent_mock,
):
    agent_mock.get_version.return_value = '7.64.0'
    job._collect_and_emit()
```
Contributor Author
make every collector empty

@sangeetashivaji sangeetashivaji changed the title [clickhouse][dbm] Skip emitting empty storage_health payloads [clickhouse] Skip emitting empty storage_health payloads Apr 30, 2026
@sangeetashivaji sangeetashivaji added this pull request to the merge queue May 1, 2026
Merged via the queue into master with commit 3c4e189 May 1, 2026
50 checks passed
@sangeetashivaji sangeetashivaji deleted the sangeeta.shivajirao/clickhouse-skip-empty-storage-health branch May 1, 2026 17:15
@dd-octo-sts dd-octo-sts Bot added this to the 7.79.0 milestone May 1, 2026