feat(monitoring)!: bump kube-prometheus-stack to 83.4.3 and migrate to Gateway API #45

Merged: grifonas merged 2 commits into main from feat/monitoring-kps-83-gateway-api on Apr 15, 2026

@grifonas

Summary

  • Bump kube-prometheus-stack Helm chart from 72.6.2 → 83.4.3
  • Migrate Grafana and Alertmanager exposure from nginx Ingress to Gateway API HTTPRoute (native route.main.enabled: true in the chart's values schema)
  • Convert additionalPrometheusRulesMap from list form (silently tolerated by 72.x) to proper map form required by 83.x
  • Fix a handful of typos in variable descriptions

Breaking changes

  • Removed variables:
    • grafana_ingress_class_name
    • alert_manager_ingress_class_name
    • cert_manager_cluster_issuer_name (no longer used: certificates are now issued by cert-manager watching the annotation on the Gateway, not per Ingress)
  • New variables (both required):
    • grafana_gateway_parent_ref — object with name, namespace, section_name fields, identifying the Gateway listener the Grafana HTTPRoute attaches to
    • alert_manager_gateway_parent_ref — same shape for Alertmanager
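
For illustration, the module side of one of these inputs might look roughly like the sketch below. Only the variable name, its three fields, and `route.main.enabled` come from this PR. The string field types, the `grafana_route_values` local, the use of `yamlencode`, and the `parentRefs` key under `route.main` are assumptions, and should be checked against the chart's values schema rather than read as the module's actual code.

```hcl
# Sketch only: field types and the values wiring are assumptions.
variable "grafana_gateway_parent_ref" {
  description = "Gateway listener that the Grafana HTTPRoute attaches to."
  type = object({
    name         = string
    namespace    = string
    section_name = string
  })
  # No default, so the variable is required.
}

locals {
  # Hypothetical Helm values fragment enabling the chart's native HTTPRoute
  # for Grafana and attaching it to the caller-supplied Gateway listener.
  grafana_route_values = yamlencode({
    grafana = {
      route = {
        main = {
          enabled = true
          parentRefs = [{
            name        = var.grafana_gateway_parent_ref.name
            namespace   = var.grafana_gateway_parent_ref.namespace
            sectionName = var.grafana_gateway_parent_ref.section_name
          }]
        }
      }
    }
  })
}
```

The snake_case `section_name` on the Terraform side maps to the Gateway API's camelCase `sectionName` field on the parent reference.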

Callers must pass the parent ref that matches their cluster's shared Envoy Gateway. For Contiamo EKS, for example, that is name envoy-tailscale, namespace envoy-gateway-system, section_name https-ctmo.
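
A caller for the Contiamo EKS case might then look like the following sketch. The module label and the `source` address are placeholders, and pointing both routes at the same listener is an assumption. The variable names, field names, and example values are taken from this description.

```hcl
# Hypothetical caller; the module label and source address are placeholders.
module "monitoring" {
  source = "git::https://example.com/org/terraform-modules.git//monitoring"

  # Gateway listener that the Grafana HTTPRoute attaches to.
  grafana_gateway_parent_ref = {
    name         = "envoy-tailscale"
    namespace    = "envoy-gateway-system"
    section_name = "https-ctmo"
  }

  # Same object shape for Alertmanager. Reusing the same listener here is an
  # assumption, not something this PR states.
  alert_manager_gateway_parent_ref = {
    name         = "envoy-tailscale"
    namespace    = "envoy-gateway-system"
    section_name = "https-ctmo"
  }

  # Other module inputs omitted.
}
```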

Verified on Contiamo EKS

Applied via Scalr run `run-v0p7o7eb6ovret179` using the commit-pinned module ref (?ref=082744c). Post-apply checks:

  • Grafana HTTPRoute `monitoring-stack-grafana` Accepted, serving at https://grafana.ctmo.io (HTTP 200 on /api/health)
  • Alertmanager HTTPRoute `monitoring-stack-kube-prom-alertmanager` Accepted, serving at https://alertmanager.ctmo.io (HTTP 200)
  • additionalPrometheusRulesMap now renders as two PrometheusRule objects, `monitoring-stack-kube-prom-blackbox-exporter` and `monitoring-stack-kube-prom-contiamo-rules`. The previously orphaned `kube-prometheus-stack-0` rule (from the list-form render under 72.x) was replaced cleanly by Helm's three-way merge
  • No Ingress resources remain in the monitoring namespace

Rollout story

The first two apply attempts errored with `PrometheusRule monitoring-stack-kube-prom-%!s(int=0)`. The `%!s(int=0)` suffix is what Go's fmt package emits when a `%s` verb is given an integer: the 83.x template ran `printf "%s" $name` with `$name` being an integer list index. Root cause: 72.x accepted additionalPrometheusRulesMap as a list, whereas 83.x expects a map keyed by rule name. Commit 082744c fixes this by converting the value to map form.
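
The difference between the two forms can be sketched as below. The local names are hypothetical, the rule groups are elided as empty lists, and the map keys are inferred from the rendered rule names reported under "Verified on Contiamo EKS" rather than copied from the module.

```hcl
locals {
  # List form: tolerated by 72.x. Under 83.x the template iterates over
  # integer indices, which produced the "%!s(int=0)" name suffix.
  rules_list_form = {
    additionalPrometheusRulesMap = [
      { groups = [] }, # rule groups elided
    ]
  }

  # Map form: each key is a string used as the PrometheusRule name suffix.
  # Keys are assumed from the rendered names, not taken from the module.
  rules_map_form = {
    additionalPrometheusRulesMap = {
      "blackbox-exporter" = { groups = [] } # rule groups elided
      "contiamo-rules"    = { groups = [] } # rule groups elided
    }
  }
}
```

With string keys, the template's `printf "%s" $name` receives a string and no longer hits the bad-verb path.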

A secondary issue: Scalr's module cache kept reusing the stale 9c35f38 render across fresh runs even after the fix landed on the branch. The workaround was to pin the caller's ref to the commit SHA, which changed the go-getter source string and forced a fresh cache entry. The project-ops caller will move off the SHA pin and back to a proper version tag (e.g. v0.15.0) once this PR merges and release-please tags a release.
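
As a sketch of that workaround, the caller's source line changes only in its `ref` query parameter. The repository address below is a placeholder, and `v0.15.0` is the example tag mentioned above, not a release that exists yet.

```hcl
# Hypothetical caller in project-ops; the source address is a placeholder.
module "monitoring" {
  # Temporary: pin to the fix commit so the source string differs from the
  # cached one and a fresh module download is forced.
  source = "git::https://example.com/org/terraform-modules.git//monitoring?ref=082744c"

  # After release-please tags a release, switch back to a version tag, e.g.:
  # source = "git::https://example.com/org/terraform-modules.git//monitoring?ref=v0.15.0"

  # Module inputs omitted.
}
```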

Follow-up

  • After merge, let release-please tag a new monitoring module release
  • Update Contiamo/eks-cluster/monitoring.tf in project-ops to reference the new tag instead of ref=082744c

🤖 Generated with Claude Code

@grifonas grifonas requested a review from a team as a code owner April 15, 2026 18:29
@grifonas grifonas merged commit dbd7e2b into main Apr 15, 2026
1 check passed
@grifonas grifonas deleted the feat/monitoring-kps-83-gateway-api branch April 15, 2026 18:29