feat(nutanix): add support for alerts tracking by NouemanKHAL · Pull Request #23538 · DataDog/integrations-core

NouemanKHAL · 2026-04-30T09:17:12Z

What does this PR do?

Motivation

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

Replace generic _collect-based alert flow with custom two-mode logic: - First run: fetch only unresolved alerts (isResolved eq false), cache their extIds, and set the lastUpdatedTime cursor - Subsequent runs: fetch alerts by lastUpdatedTime, detect new opens, resolutions, and alerts created-and-resolved between runs Track open alert IDs in persistent cache so resolution detection survives agent restarts.

…-op behavioral tests

Emit nutanix.alert.open=1 each cycle for every unresolved alert, tagged with ext_id, severity, alertType, status, cluster, and source entity. Submit a one-shot 0 when an alert is detected as resolved or deleted, giving each alert its own time series for individual lifecycle monitoring. Reconcile open-alert state against the unresolved-alerts API on every cycle instead of using a persistent cursor cache. Removes orphaned metric emissions when resolutions are missed (agent downtime spanning a resolution, alert deletion, or cache survival across Prism Centrals). Distinguish acknowledged-but-unresolved alerts with ntnx_alert_status:acknowledged and event alert_type "warning". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the single nutanix.alert.open gauge with three per-state metrics that each represent one stage of the alert lifecycle: - nutanix.alert.open: 1 while alert is unresolved + unacknowledged - nutanix.alert.acknowledged: 1 while alert is unresolved + acknowledged - nutanix.alert.resolved: 1 once when alert enters the resolved state State transitions emit an explicit 0 to the previous state's gauge (open->ack zeroes nutanix.alert.open, ack->resolved zeroes nutanix.alert.acknowledged), so per-alert monitor cases recover cleanly when the alert leaves a state. The metric name itself encodes the state, so monitor queries don't need a tag filter to disambiguate. Add an exported monitor template at assets/monitors/alerts.json combining nutanix.alert.open and nutanix.alert.resolved into a per-ext_id formula. Cleanup pass to follow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Monitor template (assets/monitors/alerts.json): - replace placeholder title and description with real text - drop the exported monitor id, restriction_policy, silenced, priority - normalize created_at/last_updated_at to YYYY-MM-DD per convention - replace the personal @ mention in the message with @your-team - add integration:nutanix tag Register the monitor in manifest.json under assets.monitors. Code: - extract _submit_state_metric(alert, state, value) helper to remove the triplicated `gauge(f"alert.{state}", v, tags=_build_alert_tags(...))` pattern from emit_open_alert_metrics, the ack-transition branch in _reconcile_alerts, and _emit_resolution_event. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add three Tier-1 tags to alert events and metrics: - ntnx_originating_cluster_name: distinguishes the cluster the alert originated on from the cluster it's reported against. Critical when Prism Central federates managed clusters; today both cases collapse into ntnx_cluster_name. - ntnx_alert_user_defined: lets operators filter custom alert policies apart from Nutanix's built-in alerts. - ntnx_alert_service: identifies the Nutanix subsystem (e.g. IAMv2) when the alert carries a serviceName (sparse but high-signal). Parameterize _add_cluster_name_tag with a tag_name argument so the same helper handles both clusterUUID and originatingClusterUUID lookups. Drop the redundant ntnx_alert_status tag from per-state metrics: the metric name (nutanix.alert.open / .acknowledged / .resolved) already encodes the state, so the tag adds noise without information. Events keep the tag for filtering in the Events Explorer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- fold emit_open_alert_metrics into _reconcile_alerts (one final loop emits :1 to the current state for each tracked alert), removing a public method and its check.py call site - drop _submit_state_metric's extra_tags parameter (only one caller used it for ntnx_alert_auto_resolved, which is now appended once during _emit_resolution_event's tag construction) - in _emit_resolution_event, build the metric tag set once and reuse it for both the prev-state :0 and the .resolved :1, instead of calling _build_alert_tags three times per resolution - strip narrative docstrings on internal helpers (_reconcile_alerts, _submit_state_metric, _emit_resolution_event) and the inline narrative comments inside _reconcile_alerts; rely on the code being self-evident - replace _fixture_alert's hand-rolled file open with conftest's load_fixture (gains the existing fixture cache for free, drops the os/json imports) - remove test_alerts_collection (subsumed by test_alerts_first_run_collects_only_unresolved, which already asserts every event's status tag, no resolution events on first run, and cache size); fold the only unique assertions (event_type and source_type_name) into the surviving test - delete narrative inline comments throughout test_alerts.py Net: ~120 lines removed, no behavior change, all 119 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous bulk strip removed too much. Restore comments that orient the reader to test phases (first run / second run), explain non-obvious mock setup (acknowledgment override, datetime past fixture window), and state assertion intent (why two distinct cluster tags matter, why we pick an unack'd alert as the transition target). Kept removed: pure narration like "Verify message rendering" and "Verify tags" labels — those just label sections of assertions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- hoist severity->alert_type map to a module-level constant (_SEVERITY_TO_ALERT_TYPE) so it isn't rebuilt on every _process_alert call; collapses the 5-line dict literal + lookup into a single line at the call site - inline the now-unused `severity` and `status` locals in _process_alert (severity goes inline in the alert_type expression, status goes inline in the _build_alert_tags call); shaves ~4 lines without losing clarity - shrink _build_alert_tags's 5-line docstring to a one-liner - parametrize the two _emit_resolution_event tests (auto vs manual resolution) into a single pytest.mark.parametrize case; drops ~30 lines of fixture-dict duplication while keeping coverage All 119 unit tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add five tests for scenarios the prior suite didn't deterministically exercise: - ack -> resolved: closes nutanix.alert.acknowledged (not .open) and emits nutanix.alert.resolved=1. Previously only incidentally covered depending on fixture iteration order. - acknowledged -> open (un-ack): symmetric to the existing open->ack transition test; verifies :0 to .acknowledged and :1 to .open. - Filter excludes a previously-tracked alert: documents current behavior — reconciliation treats filter exclusion as resolution and emits a spurious resolution event + nutanix.alert.resolved=1. Operators changing filter rules should be aware. - Empty unresolved list: cold-start no-op (no events, no metrics). - Alert deleted in Nutanix (_get_alert returns None): falls back to the cached alert dict and emits the bare "Resolved" msg_text branch. 124 unit tests pass (up from 119). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

NouemanKHAL and others added 12 commits April 15, 2026 18:20

feat(nutanix): add aggregation_key and status tag to alert events

b1aeb26

feat(nutanix): add _emit_resolution_event method for alert resolution

afba61a

test(nutanix): add resolution detection, created-and-resolved, and no…

c85e673

…-op behavioral tests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(nutanix): add support for alerts tracking#23538

feat(nutanix): add support for alerts tracking#23538
NouemanKHAL wants to merge 12 commits intomasterfrom
noueman/nutanix-alert-resolution-tracking

NouemanKHAL commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NouemanKHAL commented Apr 30, 2026

What does this PR do?

Motivation

Review checklist (to be filled by reviewers)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant