fix: update existing instances first in node update#368
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughE2E test added for replica addition functionality. Core functions updated to accept current state and distinguish existing versus new replicas. Operation sequencing changed to update existing replicas first. Golden test fixtures updated to reflect new operation expectations. Node update strategy signatures modified to support state-aware replica handling. Changes
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Category | Results |
|---|---|
| Complexity | 1 medium |
🟢 Metrics 0 complexity · 0 duplication
Metric Results Complexity 0 Duplication 0
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
server/internal/database/operations/update_nodes_test.go (1)
31-37: Nit: emptyNodeResource{}in start state is unusual.Other test cases populate
Name/InstanceIDson the start-stateNodeResource. Here it's left empty (just to satisfy state structure). For consistency with the surrounding cases, consider populating it like the others — purely cosmetic.♻️ Optional consistency tweak
start: makeState(t, []resource.Resource{ instance1.Instance, - &database.NodeResource{}, + &database.NodeResource{ + Name: "n1", + InstanceIDs: []string{instance1.InstanceID()}, + }, }, instance1.InstanceDependencies, ),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/internal/database/operations/update_nodes_test.go` around lines 31 - 37, The start state for this test uses an empty &database.NodeResource{} which is inconsistent with other cases; update the test's start state (the makeState call that currently includes &database.NodeResource{}) to populate the NodeResource's Name and InstanceIDs fields (matching the values used in surrounding cases) so the initial state mirrors other tests and removes the empty placeholder.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e/add_replica_test.go`:
- Around line 62-85: The call to db.Update in add_replica_test.go ignores the
returned error from DatabaseFixture.Update; change the call to capture the error
and fail the test on error (e.g., err := db.Update(...); require.NoError(t, err)
or if err != nil { t.Fatalf(...) }) so the test stops with a clear message when
the replica-add update fails; locate the db.Update call in add_replica_test.go
and replace the discarded return with an explicit error check using the test
helper (require.NoError or t.Fatalf) to surface the update failure immediately.
---
Nitpick comments:
In `@server/internal/database/operations/update_nodes_test.go`:
- Around line 31-37: The start state for this test uses an empty
&database.NodeResource{} which is inconsistent with other cases; update the
test's start state (the makeState call that currently includes
&database.NodeResource{}) to populate the NodeResource's Name and InstanceIDs
fields (matching the values used in surrounding cases) so the initial state
mirrors other tests and removes the empty placeholder.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3a4319b2-e5d2-4e44-bc63-2eaa3bde69a5
📒 Files selected for processing (9)
e2e/add_replica_test.goserver/internal/database/operations/golden_test/TestUpdateDatabase/adding_a_replica.jsonserver/internal/database/operations/golden_test/TestUpdateDatabase/adding_a_replica_with_primary_update.jsonserver/internal/database/operations/golden_test/TestUpdateDatabase/adding_multiple_replicas_concurrent.jsonserver/internal/database/operations/golden_test/TestUpdateDatabase/adding_multiple_replicas_rolling.jsonserver/internal/database/operations/update_database.goserver/internal/database/operations/update_database_test.goserver/internal/database/operations/update_nodes.goserver/internal/database/operations/update_nodes_test.go
Modifies the node update process to ensure that existing instances are updated before adding any new replicas. This ensures that we apply configuration updates to the primary and other existing replicas first. This fixes a bug for systemd where the new replica could be unable to bootstrap from the primary due to missing pg_hba.conf entries. PLAT-566
cfc62b5 to
95f6441
Compare
moizpgedge
left a comment
There was a problem hiding this comment.
Tested the exact ticket scenario on dev-lima . Created a 3-node database, deleted n3, then reused the same host as a replica for n2.
Create 3-node database:
cp1-req create-database '{
"spec": {
"database_name": "plat566",
"database_users": [{"username": "admin", "password": "password", "db_owner": true, "attributes": ["LOGIN", "SUPERUSER"]}],
"port": 0, "patroni_port": 0,
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] },
{ "name": "n2", "host_ids": ["host-2"] },
{ "name": "n3", "host_ids": ["host-3"] }
]
}
}'
Delete n3:
cp1-req update-database 230fbbce-6644-465c-8e49-d2cb63fbd116 '{
"spec": {
"database_name": "plat566",
"database_users": [{"username": "admin", "db_owner": true, "attributes": ["LOGIN", "SUPERUSER"]}],
"port": 0, "patroni_port": 0,
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] },
{ "name": "n2", "host_ids": ["host-2"] }
]
}
}'
Confirmed n3 fully cleaned up via task log 019dd469-9309-7555-9b86-5d48e59201a5.
Reuse host-3 as n2 replica:
cp1-req update-database 230fbbce-6644-465c-8e49-d2cb63fbd116 '{
"spec": {
"database_name": "plat566",
"database_users": [{"username": "admin", "db_owner": true, "attributes": ["LOGIN", "SUPERUSER"]}],
"port": 0, "patroni_port": 0,
"nodes": [
{ "name": "n1", "host_ids": ["host-1"] },
{ "name": "n2", "host_ids": ["host-2", "host-3"] }
]
}
}'
Watch task:
cp-follow-task -s database -e 230fbbce-6644-465c-8e49-d2cb63fbd116 -t 019dd46e-7a97-7ef2-a4be-e2daf45b65aa
**Verify replica:**
curl -s http://localhost:3010/v1/databases/230fbbce-6644-465c-8e49-d2cb63fbd116 | jq '.instances[] | select(.node_name == "n2" and .host_id == "host-3")'
Replica 230fbbce-6644-465c-8e49-d2cb63fbd116-n2-ant97dj4 (host-3 reused) came up with:
"patroni_state": "running",
"role": "replica",
"state": "available"
Unit tests:
go test ./server/internal/database/operations/...
All pass including the new adding_a_replica and adding_a_replica_with_primary_update cases.
APPROVED
Summary
Modifies the node update process to ensure that existing instances are updated before adding any new replicas. This ensures that we apply configuration updates to the primary and other existing replicas first.
This fixes a bug for systemd where the new replica could be unable to bootstrap from the primary due to missing pg_hba.conf entries.
Testing
This issue was most noticeable on systemd. To run the E2E test:
To reproduce the issue by hand:
Notes for Reviewers
The E2E test that I added includes a second node. That wasn't necessary to reproduce this particular issue, but including it made a better general test.
PLAT-566