fix: ability to change ports with systemd#354
Conversation
📝 WalkthroughWalkthroughAdded a new end-to-end test validating database port changes across nodes with replication verification. Updated instance resource logic to preserve connection information during port updates and implemented a restart mechanism that checks for pending restarts after Patroni initialization. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 0 |
TIP This summary will be updated as you push new changes. Give us feedback
aeb80fb to
6f15645
Compare
Fixes a bug where changing the Patroni or Postgres port would cause the update operation to fail on systemd. This happened because, in the update method, we were setting the connection info based on the updated spec. This change means we copy the connection info from the previous instance state to retrieve the previously active Patroni port. Then, it adds a restart operation to handle cases where the Postgres port has changed. PLAT-548
6f15645 to
c80b7d5
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
server/internal/database/instance_resource.go (2)
103-116: Fallback path can re-introduce the original bug.When
previous.ConnectionInfoisnil(e.g., an older persisted state from beforeConnectionInfoexisted, or any corrupted/partial state), this falls back toupdateConnectionInfo(ctx, rc), which rebuilds the connection info from the new spec — exactly the behavior that caused the systemd port-change failure this PR is fixing. The per-PR-295 learning confirmsupdateConnectionInfointentionally uses the desired spec values.That fallback is acceptable as a last resort, but consider either:
- Logging a warning when it fires so the re-emergence of the bug is observable, or
- Only falling back when
previousitself is the zero/empty resource, and otherwise erroring (stop-the-world) since we know the pre-existing instance had some active ports we can’t discover.Also:
resource.FromContext[*InstanceResource](rc, r.Identifier())assumespreviousis non-nil whenevererr == nil. Worth a quick nil-guard at Line 112 (if previous != nil && previous.ConnectionInfo != nil) since a nilprevioushere would panic while dereferencing, masking the real problem.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/instance_resource.go` around lines 103 - 116, In Update, avoid silently rebuilding connection info from the new spec when a previous saved state is present but missing ConnectionInfo: add a nil-guard for previous (check result of resource.FromContext[*InstanceResource](rc, r.Identifier()) before dereferencing) and change the fallback so it either logs a warning when updateConnectionInfo(ctx, rc) is used as last resort or returns an error (use recordError) when previous is non‑zero but missing ConnectionInfo; reference the Update method, the previous variable from resource.FromContext, the ConnectionInfo field, updateConnectionInfo, and recordError to locate and implement the nil check plus the conditional logging/error behavior.
350-366: LGTM onrestartIfNeeded.Status is fetched once,
PendingRestartis nil-checked before dereference,ScheduleRestartis called exactly once (no retry/idempotency concerns perScheduleRestart's contract inserver/internal/patroni/client.go), and we re-wait for the running state. Errors are wrapped with useful context.One small optional tightening: on a very fast Postgres restart, Patroni can still report
state=runningbetween the schedule request and the actual restart, soWaitForPatroniRunningat Line 361 may return before the restart has started. In practice the subsequentGetPrimaryInstanceIDcall works around it, but you could re-checkPendingRestart == false(or poll until thepostmaster_start_timeadvances) for a tighter guarantee. Optional — the current shape is fine for the stated use case.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/instance_resource.go` around lines 350 - 366, The current restartIfNeeded flow can return before the actual restart has begun because Patroni may briefly report state=running; after calling client.ScheduleRestart and WaitForPatroniRunning, add a short polling loop that re-fetches client.GetInstanceStatus (using the same ctx) and verifies PendingRestart is either nil or false (or alternatively poll the instance's postmaster_start_time until it advances) with a small timeout/retry interval; update restartIfNeeded to only return nil once PendingRestart clears (or postmaster_start_time advances) and otherwise return a wrapped error on timeout. Use the existing functions GetInstanceStatus, ScheduleRestart, WaitForPatroniRunning and keep the timeout/polling logic confined to restartIfNeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@server/internal/database/instance_resource.go`:
- Around line 103-116: In Update, avoid silently rebuilding connection info from
the new spec when a previous saved state is present but missing ConnectionInfo:
add a nil-guard for previous (check result of
resource.FromContext[*InstanceResource](rc, r.Identifier()) before
dereferencing) and change the fallback so it either logs a warning when
updateConnectionInfo(ctx, rc) is used as last resort or returns an error (use
recordError) when previous is non‑zero but missing ConnectionInfo; reference the
Update method, the previous variable from resource.FromContext, the
ConnectionInfo field, updateConnectionInfo, and recordError to locate and
implement the nil check plus the conditional logging/error behavior.
- Around line 350-366: The current restartIfNeeded flow can return before the
actual restart has begun because Patroni may briefly report state=running; after
calling client.ScheduleRestart and WaitForPatroniRunning, add a short polling
loop that re-fetches client.GetInstanceStatus (using the same ctx) and verifies
PendingRestart is either nil or false (or alternatively poll the instance's
postmaster_start_time until it advances) with a small timeout/retry interval;
update restartIfNeeded to only return nil once PendingRestart clears (or
postmaster_start_time advances) and otherwise return a wrapped error on timeout.
Use the existing functions GetInstanceStatus, ScheduleRestart,
WaitForPatroniRunning and keep the timeout/polling logic confined to
restartIfNeeded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c46ce314-445b-4d92-b5fc-43b90def10d9
📒 Files selected for processing (2)
e2e/port_change_test.goserver/internal/database/instance_resource.go
Summary
Fixes a bug where changing the Patroni or Postgres port would cause the update operation to fail on systemd. This happened because, in the update method, we were setting the connection info based on the updated spec.
This change means we copy the connection info from the previous instance state to retrieve the previously active Patroni port. Then, it adds a restart operation to handle cases where the Postgres port has changed.
Changes
InstanceResource.Updatewith a fallback to the current spec (the previous behavior) when the previous state is malformed.InstanceResourceTesting
The
TestPortChangeE2E exercises this fix:Notes for Reviewers
PLAT-548