Skip to content

Add handshaking for Raft CLUSTER MEET#3532

Open
murphyjacob4 wants to merge 2 commits intovalkey-io:cluster-v2from
murphyjacob4:cluster-v2-handshake
Open

Add handshaking for Raft CLUSTER MEET#3532
murphyjacob4 wants to merge 2 commits intovalkey-io:cluster-v2from
murphyjacob4:cluster-v2-handshake

Conversation

@murphyjacob4
Copy link
Copy Markdown
Contributor

Building off of #3530 - adds a simple handshake request/response between nodes.

Does not perform any Raft group mutations - the nodes are tracked as RAFT_ROLE_NON_MEMBER and are subject to removal by timeout.

The expectation is that nodes will decide individually who will join each other based on a somewhat deterministic mechanism to be implemented in a future PR.

Signed-off-by: Jacob Murphy <jkmurphy@google.com>
@murphyjacob4 murphyjacob4 force-pushed the cluster-v2-handshake branch from 1319c7c to 9417eeb Compare April 18, 2026 00:43
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 18, 2026

Codecov Report

❌ Patch coverage is 70.60703% with 92 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.51%. Comparing base (6b93134) to head (daba9d1).

Files with missing lines Patch % Lines
src/cluster_raft.c 67.94% 92 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##           cluster-v2    #3532      +/-   ##
==============================================
- Coverage       76.59%   76.51%   -0.09%     
==============================================
  Files             160      161       +1     
  Lines           79288    79582     +294     
==============================================
+ Hits            60733    60890     +157     
- Misses          18555    18692     +137     
Files with missing lines Coverage Δ
src/cluster.c 90.66% <100.00%> (-0.12%) ⬇️
src/cluster_legacy.c 91.24% <100.00%> (-0.51%) ⬇️
src/cluster_state.c 88.74% <100.00%> (+0.63%) ⬆️
src/config.c 78.33% <ø> (ø)
src/server.c 89.45% <100.00%> (-0.11%) ⬇️
src/server.h 100.00% <ø> (ø)
src/cluster_raft.c 67.94% <67.94%> (ø)

... and 22 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep both our prototypes alive for a while to discover parth forward and compare them. I think we can wait with merging this to cluster-v2.

Yours has the key-value style Raft with CAS, binary protocol. Mine has domain-specific entry types, text protocol. Both probably have pros and cons. I don't feel strongly about the text/binary protocol nor the CAS style. I picked what seemed to be the simplest possible protocol to start with because I wanted to move forward to see how much of the existing cluster tests I could get passing. I believe the learnings from making all existing things work (automatic failover with ranks, manual failover with pause writes, etc) has value regardless, and I think it'd be not that hard to change the wire protocol later.

However, clusterDelSlot needs the refactoring from this PR. I'd like to merge it. It should be strait-forward to extract it from this PR. I can extract it and open it as a separate PR...

Comment thread src/cluster_bus.h

/* Clean up any protocol-specific data associated with a node before it is deleted.
* If NULL, no protocol-specific cleanup is performed. */
void (*cleanupNode)(clusterNode *node);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice this missing piece too.

cluster_link.c calls clusterDelNode which is defined in cluster_legacy.c but declared in cluster_state.h. It's incomplete work from the cluster protocol separation refactoring.

I see you moved clusterDelNode to cluster_state.c and made it call this callback.

It'd be good to have this as a separate commit or PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extracted this part to a separate PR now: #3542

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants