Skip to content

fix: embedded etcd should only connect to self#366

Merged
jason-lynch merged 1 commit intomainfrom
fix/PLAT-581/embedded-should-only-connect-to-self
Apr 27, 2026
Merged

fix: embedded etcd should only connect to self#366
jason-lynch merged 1 commit intomainfrom
fix/PLAT-581/embedded-should-only-connect-to-self

Conversation

@jason-lynch
Copy link
Copy Markdown
Member

@jason-lynch jason-lynch commented Apr 24, 2026

Summary

Fixes a bug where the etcd clients in hosts with embedded etcd were configured to connect to all cluster members that existed when the client was initialized. The correct behavior is that hosts with embedded etcd should only connect to themselves. This was the original intent and functionality, but I changed it while implementing support for remote Etcd. I did a few different implementations of the remote Etcd feature before settling on the current one, and I think this was just an accidental inclusion from a different implementation of that feature.

PLAT-581

Testing

This PR includes a cluster test for the issue:

# Using TEST_RERUN_FAILS in case the OS steals our ephemeral ports
make test-cluster TEST_RERUN_FAILS=2 CLUSTER_TEST_RUN=TestRollingAddRemove

To run the scenario by hand:

# start the servers in an uninitialized state
make dev-reset
make dev-watch

# join 4 hosts to the cluster
cp1-req init-cluster
cp2-req join-cluster $(cp1-req get-join-token)
cp3-req join-cluster $(cp1-req get-join-token)

# stop host-1
cp-docker-compose stop host-1

# remove host-1 - remember that this is asynchronous, so watch the logs for it to finish
cp2-req remove-host host-1

# add another host to the cluster
cp4-req join-cluster $(cp2-req get-join-token)

Note that our dev setup is slightly different than the scenario in the ticket and our cluster test because hosts 4-6 are in client mode. It still reproduces and validates the issue because the failure occurs in host-2.

Fixes a bug where the etcd clients in hosts with embedded etcd were
configured to connect to all cluster members that existed when the
client was initialized. This was the original intent and functionality,
but I changed it while implementing support for remote Etcd. I think
this was just an accidental inclusion from a different implementation of
the remote Etcd feature.

PLAT-581
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

The PR adds a new rolling add/remove host test for cluster management and refactors internal state update ordering. Changes include reordering the Cluster.Remove method's state updates and updating EmbeddedEtcd's client configuration to use ClientEndpoints instead of cluster URLs.

Changes

Cohort / File(s) Summary
Cluster Test Suite
clustertest/add_remove_host_test.go, clustertest/cluster_test.go
New TestRollingAddRemove test exercises iterative host addition and removal with cluster initialization. Cluster.Remove method refactored to update host state and recreate client before waiting on removal task completion rather than after.
Embedded Etcd Configuration
server/internal/etcd/embedded.go
GetClient method updated to build clientConfig using ClientEndpoints instead of cluster ClientURLs, with added clarification comment about connecting only to the embedded node's endpoint.

Poem

🐰 A cluster grows, then shrinks with care,
Adding hosts and removing there,
The etcd endpoint takes new ground,
State updates dance without a sound,
Rolling changes, smooth and fair! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing embedded etcd to connect only to itself rather than all cluster members.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description is comprehensive and follows the template structure with Summary, Testing, and clear documentation of the bug fix with testing instructions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/PLAT-581/embedded-should-only-connect-to-self

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 duplication

Metric Results
Duplication 0

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Comment thread clustertest/add_remove_host_test.go
@jason-lynch jason-lynch merged commit 7810b77 into main Apr 27, 2026
4 checks passed
@jason-lynch jason-lynch deleted the fix/PLAT-581/embedded-should-only-connect-to-self branch April 27, 2026 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants