feat: add gateway pools for high-availability failover #6050
Introduces gateway pools as an enterprise feature that provides automatic failover when a gateway goes down. The platform picks a random healthy member from the pool at request time.

Backend:
- New tables: gateway_pools, gateway_pool_memberships
- gatewayPoolId column on identity_kubernetes_auths
- Full CRUD + membership API under /api/v2/gateway-pools
- Connected resources endpoint for pools
- RBAC with separate GatewayPool permission subject
- Enterprise license gate on all pool endpoints
- Audit log events for pool CRUD and membership changes
- Random healthy gateway selection via pickRandomHealthyGateway

Frontend:
- Gateway Pools sub-tab with segmented toggle in Gateways page
- Pool detail sheet (v3) with member management and health checks
- Reusable GatewayPicker component with grouped sections
- Pool health badges, connected resources drawer
- Kubernetes auth form updated to support pool selection
- Enterprise upgrade prompt for non-enterprise users
- Add org-scoping check when attaching gateway pool to k8s auth
- Wrap pool deletion count+delete in transaction to prevent race condition
- Add MAX_GATEWAY_POOLS_PER_ORG=50 limit with advisory lock
- Add orgId parameter to findByIdWithMembers DAL method
- Fix useAddIdentityKubernetesAuth invalidating wrong query key
- Add gatewayPool: false to getDefaultOnPremFeatures free-tier defaults
- Add gatewayPool: false to test mock license-fns
- Remove MAX_GATEWAY_POOLS_PER_ORG limit and advisory lock
- Use slugSchema for pool names (max 32, lowercase alphanumeric + hyphens)
- Match gateway form pattern: manual state + safeParse on submit
- Add client-side name uniqueness check before API call
- Close Add Gateway popover after selecting a gateway
- Update migration column width to match slug max length
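The name rules in this commit (at most 32 characters, lowercase alphanumerics and hyphens, unique on the client before the API call) can be sketched roughly as below. This is not the repository's slugSchema; `validatePoolName`, `POOL_NAME_PATTERN`, and the error strings are hypothetical, and the real schema may apply further rules (for example on leading or trailing hyphens) that the commit message does not state.

```typescript
// Hypothetical constants mirroring the limits named in the commit message.
const POOL_NAME_MAX_LENGTH = 32;
const POOL_NAME_PATTERN = /^[a-z0-9-]+$/;

type NameCheck = { ok: true } | { ok: false; error: string };

// Validates a pool name on the client before any request is sent.
// `existingNames` stands in for the pool names already loaded in the UI.
function validatePoolName(name: string, existingNames: string[]): NameCheck {
  if (name.length === 0 || name.length > POOL_NAME_MAX_LENGTH) {
    return { ok: false, error: `Name must be 1-${POOL_NAME_MAX_LENGTH} characters` };
  }
  if (!POOL_NAME_PATTERN.test(name)) {
    return { ok: false, error: "Name may contain only lowercase letters, digits, and hyphens" };
  }
  if (existingNames.includes(name)) {
    return { ok: false, error: "A pool with this name already exists" };
  }
  return { ok: true };
}
```

A client-side uniqueness check like this only improves feedback in the form; the server still has to enforce uniqueness, since two users can submit the same name concurrently.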
- Move connected resources count to DAL subquery (eliminates N+1)
- Fix PoolDetailSheet treating null health status as unreachable
- Remove dead CreateGatewayPool advisory lock constant
- Remove spurious gateway pool invalidation from Azure auth delete
- Add warning log for unhealthy pool during k8s auth update
- Fix nullish coalescing dropping explicit null on gatewayPoolId clear
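The last item describes a common pitfall with `??`: it treats `null` and `undefined` alike, so an explicit `null` sent to clear a field falls back to the stored value. The fix itself is not shown in this thread; the sketch below only illustrates the class of bug, and the type and function names are hypothetical.

```typescript
// Hypothetical update payload: undefined = "not provided", null = "clear the pool".
type KubeAuthUpdate = { gatewayPoolId?: string | null };

// Buggy: `??` falls back on null as well as undefined,
// so a request to clear the pool silently keeps the current one.
function resolvePoolIdBuggy(input: KubeAuthUpdate, current: string | null): string | null {
  return input.gatewayPoolId ?? current;
}

// Fixed: only undefined means "keep the current value"; null passes through.
function resolvePoolId(input: KubeAuthUpdate, current: string | null): string | null {
  return input.gatewayPoolId !== undefined ? input.gatewayPoolId : current;
}
```

With a stored value of `"pool-1"`, an update of `{ gatewayPoolId: null }` keeps `"pool-1"` in the buggy version and yields `null` in the fixed one.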
    logger.warn(
      { gatewayPoolId: effectiveGatewayPoolId },
      "No healthy gateways in pool, skipping connectivity validation for k8s auth update"
    );
Why do we skip this? The auth wouldn't be usable, right? That would be misleading for Kubernetes auth logins.
Because of this scenario:
- User creates a kube auth with a healthy gateway pool and saves it
- At some point all of the gateways in the pool go down
- User wants to make an unrelated change to the kube auth such as adding an allowed namespace
- User can't save their changes anymore because none of the gateways in the pool are healthy
What do you think?
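The behaviour discussed in this thread can be sketched as follows: on update, validate connectivity through a healthy member if one exists, otherwise log a warning and let the save proceed. All names here (`PoolMember`, `planConnectivityValidation`, the `warn` callback standing in for `logger.warn`) are hypothetical; the actual service code is only partly visible in the diff above.

```typescript
// Hypothetical member shape; only the health flag matters for this decision.
type PoolMember = { id: string; isHealthy: boolean };

type ValidationPlan =
  | { action: "validate"; gatewayId: string }
  | { action: "skip"; reason: string };

// Decides whether a k8s auth update should run connectivity validation.
// With no healthy member, the update is not blocked: a warning is emitted
// and validation is skipped so unrelated edits can still be saved.
function planConnectivityValidation(
  members: PoolMember[],
  warn: (message: string) => void
): ValidationPlan {
  const healthy = members.find((m) => m.isHealthy);
  if (healthy === undefined) {
    const reason = "No healthy gateways in pool, skipping connectivity validation";
    warn(reason);
    return { action: "skip", reason };
  }
  return { action: "validate", gatewayId: healthy.id };
}
```

The cost of this choice is the one raised in the review: a saved configuration is not guaranteed to be reachable at save time, so the warning log is the only signal until a gateway recovers.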
Context
When a gateway goes down, every feature that depends on it (dynamic secrets, k8s auth, PAM) stops working with no failover. Gateway Pools solve this by allowing users to create a named collection of gateways sharing network access. The platform picks a random healthy member at request time, providing automatic failover.
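A minimal sketch of the selection step, assuming each pool member carries a boolean health flag. The function name `pickRandomHealthyGateway` comes from the commit message, but the signature, the `GatewayMember` shape, and the injectable `random` parameter are assumptions made for illustration.

```typescript
// Hypothetical shape; the real gateway record likely has more fields.
type GatewayMember = { id: string; isHealthy: boolean };

// Returns a uniformly random healthy member, or undefined if none is healthy.
// `random` is injectable so the choice can be made deterministic in tests.
function pickRandomHealthyGateway(
  members: GatewayMember[],
  random: () => number = Math.random
): GatewayMember | undefined {
  const healthy = members.filter((m) => m.isHealthy);
  if (healthy.length === 0) return undefined;
  return healthy[Math.floor(random() * healthy.length)];
}
```

Because unhealthy members are filtered out on every call, a gateway that goes down simply stops being chosen on the next request, which is where the failover comes from.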
This is an enterprise-only feature. V1 scope: only Kubernetes auth supports pool selection. Other consumers (dynamic secrets, PAM, app connections) keep gateway-only selection and will be added in follow-ups.
New API endpoints:
- `POST/GET/PATCH/DELETE /api/v1/gateway-pools` (pool CRUD)
- `POST/DELETE /api/v1/gateway-pools/:poolId/memberships` (member management)
- `GET /api/v1/gateway-pools/:poolId/resources` (connected resources)
- `gatewayPoolId`

Database changes:
- `gateway_pools` table (id, orgId, name)
- `gateway_pool_memberships` join table (many-to-many, CASCADE deletes)
- `gatewayPoolId` column on `identity_kubernetes_auths` (FK, SET NULL on delete)