feat: add gateway pools for high-availability failover #6050

Open
bernie-g wants to merge 12 commits into main from feat/gateway-pools

bernie-g (Contributor) commented Apr 16, 2026

Context

When a gateway goes down, every feature that depends on it (dynamic secrets, k8s auth, PAM) stops working with no failover. Gateway Pools solve this by allowing users to create a named collection of gateways sharing network access. The platform picks a random healthy member at request time, providing automatic failover.
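
As a rough illustration of "pick a random healthy member at request time", here is a minimal sketch. The `Gateway` shape and the `isHealthy` flag are assumptions made for this example, and although the commit notes mention a function named `pickRandomHealthyGateway`, its real signature is not shown in this PR, so the version below is hypothetical.

```typescript
// Hypothetical gateway shape; the real model is not shown in this PR.
type Gateway = { id: string; name: string; isHealthy: boolean };

// Returns a uniformly random healthy member, or undefined when none is healthy.
// `random` is injectable so the choice can be made deterministic in tests.
function pickRandomHealthyGateway(
  members: Gateway[],
  random: () => number = Math.random
): Gateway | undefined {
  const healthy = members.filter((g) => g.isHealthy);
  if (healthy.length === 0) return undefined;
  // Math.random() is in [0, 1), so the index is always within bounds.
  const index = Math.floor(random() * healthy.length);
  return healthy[index];
}

const examplePool: Gateway[] = [
  { id: "gw-1", name: "us-east-a", isHealthy: false },
  { id: "gw-2", name: "us-east-b", isHealthy: true },
  { id: "gw-3", name: "us-east-c", isHealthy: true }
];

// Either gw-2 or gw-3 is returned; the unhealthy gw-1 is never chosen.
const chosenGateway = pickRandomHealthyGateway(examplePool);
```

Because the choice is made per request, a member that becomes unhealthy simply drops out of the candidate list on the next call. How the caller should react when no member is healthy is not described in the PR, so the sketch just returns `undefined`.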

This is an enterprise-only feature. V1 scope: only Kubernetes auth supports pool selection. Other consumers (dynamic secrets, PAM, app connections) keep gateway-only selection and will be added in follow-ups.

New API endpoints:

  • POST/GET/PATCH/DELETE /api/v2/gateway-pools (pool CRUD)
  • POST/DELETE /api/v2/gateway-pools/:poolId/memberships (member management)
  • GET /api/v2/gateway-pools/:poolId/resources (connected resources)
  • Modified k8s auth attach/update/get endpoints to accept gatewayPoolId
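
The route templates above can be turned into concrete URLs with a small helper. This is only a sketch: the three templates are copied from the list, while `ROUTES` and `fillPath` are hypothetical names, and request and response bodies are left out because the PR description does not show them.

```typescript
// Route templates copied from the PR description.
const ROUTES = {
  pools: "/api/v2/gateway-pools",
  memberships: "/api/v2/gateway-pools/:poolId/memberships",
  resources: "/api/v2/gateway-pools/:poolId/resources"
} as const;

// Hypothetical helper: substitutes `:name` placeholders with URL-encoded values
// and fails loudly if a placeholder has no value.
function fillPath(template: string, params: Record<string, string>): string {
  return template.replace(/:([A-Za-z]+)/g, (_match: string, key: string) => {
    const value = params[key];
    if (value === undefined) throw new Error(`missing path param: ${key}`);
    return encodeURIComponent(value);
  });
}

// Builds the URL for listing the resources connected to one pool.
const resourcesUrl = fillPath(ROUTES.resources, { poolId: "pool-123" });
```

Encoding the parameter keeps an unexpected character in an ID from changing the shape of the path.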

Database changes:

  • New gateway_pools table (id, orgId, name)
  • New gateway_pool_memberships join table (many-to-many, CASCADE deletes)
  • New gatewayPoolId column on identity_kubernetes_auths (FK, SET NULL on delete)
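
To make the two delete behaviours concrete, the sketch below models the three tables as in-memory arrays and applies what the foreign keys would do if a pool row were removed directly. The row types and `deletePoolRow` are hypothetical, and the sketch assumes the CASCADE on the join table fires when the referenced pool is deleted; it is not the migration from this PR.

```typescript
// Hypothetical in-memory rows mirroring the tables listed above.
type PoolRow = { id: string; orgId: string; name: string };
type MembershipRow = { poolId: string; gatewayId: string };
type KubernetesAuthRow = { id: string; gatewayPoolId: string | null };

type Tables = {
  pools: PoolRow[];
  memberships: MembershipRow[];
  k8sAuths: KubernetesAuthRow[];
};

// Removes the pool row, drops its memberships (CASCADE), and clears any
// k8s auth reference to it (SET NULL). Returns new arrays; input is untouched.
function deletePoolRow(tables: Tables, poolId: string): Tables {
  return {
    pools: tables.pools.filter((p) => p.id !== poolId),
    memberships: tables.memberships.filter((m) => m.poolId !== poolId),
    k8sAuths: tables.k8sAuths.map((a) =>
      a.gatewayPoolId === poolId ? { ...a, gatewayPoolId: null } : a
    )
  };
}

const beforeDelete: Tables = {
  pools: [{ id: "pool-1", orgId: "org-1", name: "prod-pool" }],
  memberships: [
    { poolId: "pool-1", gatewayId: "gw-1" },
    { poolId: "pool-1", gatewayId: "gw-2" }
  ],
  k8sAuths: [{ id: "auth-1", gatewayPoolId: "pool-1" }]
};

const afterDelete = deletePoolRow(beforeDelete, "pool-1");
```

Under SET NULL the k8s auth row survives the delete with its pool reference cleared, whereas the membership rows disappear with the pool.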

Screenshots

Steps to verify the change

  1. Create a gateway pool in Organization Settings > Networking > Gateways > Gateway Pools tab
  2. Add gateways to the pool via the pool detail sheet
  3. In Identity > Kubernetes Auth, verify the GatewayPicker dropdown shows both individual gateways and pools (with health status)
  4. Attach a k8s auth to a pool, verify it saves correctly
  5. Verify connected resources count updates in the pool table
  6. Verify pool deletion is blocked when referenced by a k8s auth config
  7. Verify non-enterprise users see the upgrade prompt on the Gateway Pools tab
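
Steps 5 and 6 both hinge on counting the k8s auth configs that reference a pool. The sketch below shows one way such a count and the deletion guard could look; the row shape and both function names are hypothetical, and it leaves out the transaction that the commit notes say wraps the count and the delete.

```typescript
// Hypothetical row shape; only the gatewayPoolId column is named in the PR.
type KubernetesAuthConfig = { identityId: string; gatewayPoolId: string | null };

// Counts the k8s auth configs that still point at the pool (step 5).
function countConnectedResources(poolId: string, configs: KubernetesAuthConfig[]): number {
  return configs.filter((c) => c.gatewayPoolId === poolId).length;
}

// Throws while the pool is still referenced, so the caller never reaches the delete (step 6).
function assertPoolCanBeDeleted(poolId: string, configs: KubernetesAuthConfig[]): void {
  const count = countConnectedResources(poolId, configs);
  if (count > 0) {
    throw new Error(`Pool ${poolId} is still used by ${count} Kubernetes auth config(s)`);
  }
}

const exampleConfigs: KubernetesAuthConfig[] = [
  { identityId: "id-1", gatewayPoolId: "pool-1" },
  { identityId: "id-2", gatewayPoolId: null }
];
```

With these rows, deleting `pool-1` would be rejected, while a pool that no config references passes the guard.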

Type

  • Fix
  • Feature
  • Improvement
  • Breaking
  • Docs
  • Chore

Checklist

  • Title follows the conventional commit format: type(scope): short description (scope is optional, e.g., fix: prevent crash on sync or fix(api): handle null response).
  • Tested locally
  • Updated docs (if needed)
  • Updated CLAUDE.md files (if needed)
  • Read the contributing guide

mintlify bot commented Apr 16, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project: infisical · Status: 🟢 Ready · Preview: View Preview · Updated (UTC): Apr 16, 2026, 3:56 PM

maidul98 (Collaborator) commented Apr 16, 2026

Snyk checks have passed. No issues have been found so far.

Scan engine: Open Source Security · Critical: 0 · High: 0 · Medium: 0 · Low: 0 · Total: 0 issues

bernie-g changed the base branch from main to feat/gateway-enrollment-tokens April 16, 2026 15:57
bernie-g changed the base branch from feat/gateway-enrollment-tokens to main April 16, 2026 19:37
Introduces gateway pools as an enterprise feature that provides automatic
failover when a gateway goes down. The platform picks a random healthy
member from the pool at request time.

Backend:
- New tables: gateway_pools, gateway_pool_memberships
- gatewayPoolId column on identity_kubernetes_auths
- Full CRUD + membership API under /api/v2/gateway-pools
- Connected resources endpoint for pools
- RBAC with separate GatewayPool permission subject
- Enterprise license gate on all pool endpoints
- Audit log events for pool CRUD and membership changes
- Random healthy gateway selection via pickRandomHealthyGateway

Frontend:
- Gateway Pools sub-tab with segmented toggle in Gateways page
- Pool detail sheet (v3) with member management and health checks
- Reusable GatewayPicker component with grouped sections
- Pool health badges, connected resources drawer
- Kubernetes auth form updated to support pool selection
- Enterprise upgrade prompt for non-enterprise users
- Add org-scoping check when attaching gateway pool to k8s auth
- Wrap pool deletion count+delete in transaction to prevent race condition
- Add MAX_GATEWAY_POOLS_PER_ORG=50 limit with advisory lock
- Add orgId parameter to findByIdWithMembers DAL method
- Fix useAddIdentityKubernetesAuth invalidating wrong query key
- Add gatewayPool: false to getDefaultOnPremFeatures free-tier defaults
- Add gatewayPool: false to test mock license-fns
- Remove MAX_GATEWAY_POOLS_PER_ORG limit and advisory lock
- Use slugSchema for pool names (max 32, lowercase alphanumeric + hyphens)
- Match gateway form pattern: manual state + safeParse on submit
- Add client-side name uniqueness check before API call
- Close Add Gateway popover after selecting a gateway
- Update migration column width to match slug max length
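
The naming rule mentioned above (max 32 characters, lowercase alphanumeric plus hyphens) and the client-side uniqueness check can be sketched as follows. This is not the project's `slugSchema`: the regular expression, the minimum length of 1, and both function names are assumptions made for illustration.

```typescript
// Assumed pattern: 1 to 32 characters, each a lowercase letter, digit, or hyphen.
const POOL_NAME_PATTERN = /^[a-z0-9-]{1,32}$/;

// True when the name satisfies the stated character and length rules.
function isValidPoolName(name: string): boolean {
  return POOL_NAME_PATTERN.test(name);
}

// Client-side uniqueness check against names that are already loaded.
function isPoolNameAvailable(name: string, existingNames: string[]): boolean {
  return existingNames.indexOf(name) === -1;
}

const loadedPoolNames = ["prod-pool", "staging-pool"];
const canSubmit =
  isValidPoolName("eu-west-pool") && isPoolNameAvailable("eu-west-pool", loadedPoolNames);
```

A client-side check like this only saves a round trip; the server still has to enforce uniqueness, since two clients can submit the same name at once.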
bernie-g marked this pull request as ready for review April 16, 2026 19:49

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6eb9042d6e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread backend/src/services/identity-kubernetes-auth/identity-kubernetes-auth-service.ts Outdated
Comment thread backend/src/services/identity-kubernetes-auth/identity-kubernetes-auth-service.ts Outdated
Comment thread backend/src/ee/services/gateway-pool/gateway-pool-service.ts
Comment thread backend/src/keystore/keystore.ts Outdated
Comment thread frontend/src/hooks/api/identities/mutations.tsx
- Move connected resources count to DAL subquery (eliminates N+1)
- Fix PoolDetailSheet treating null health status as unreachable
- Remove dead CreateGatewayPool advisory lock constant
- Remove spurious gateway pool invalidation from Azure auth delete
- Add warning log for unhealthy pool during k8s auth update
- Fix nullish coalescing dropping explicit null on gatewayPoolId clear
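
The last item describes a common pitfall with the `??` operator: it falls back on both `undefined` and `null`, so an explicit `null` sent to clear a field is silently replaced by the stored value. The sketch below reproduces the pattern with hypothetical names; the actual code changed in this PR is not shown here.

```typescript
// Hypothetical update payload: the field may be absent, a pool ID, or an explicit null.
type KubernetesAuthUpdate = { gatewayPoolId?: string | null };

// Buggy pattern: `??` treats an explicit null like "not provided" and keeps the old value.
function resolvePoolIdWithNullish(update: KubernetesAuthUpdate, current: string | null): string | null {
  return update.gatewayPoolId ?? current;
}

// Safer pattern: fall back only when the field is absent (undefined).
function resolvePoolIdExplicit(update: KubernetesAuthUpdate, current: string | null): string | null {
  return update.gatewayPoolId !== undefined ? update.gatewayPoolId : current;
}

const clearRequest: KubernetesAuthUpdate = { gatewayPoolId: null };
const keptByMistake = resolvePoolIdWithNullish(clearRequest, "pool-1"); // "pool-1"
const clearedAsIntended = resolvePoolIdExplicit(clearRequest, "pool-1"); // null
```

Distinguishing `undefined` (leave unchanged) from `null` (clear) is what lets a PATCH-style update detach a pool.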

2 participants