Add handshaking for Raft CLUSTER MEET #3532
murphyjacob4 wants to merge 2 commits into valkey-io:cluster-v2
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
1319c7c to 9417eeb
9417eeb to daba9d1
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | cluster-v2 | #3532 | +/- |
|---|---|---|---|
| Coverage | 76.59% | 76.51% | -0.09% |
| Files | 160 | 161 | +1 |
| Lines | 79288 | 79582 | +294 |
| Hits | 60733 | 60890 | +157 |
| Misses | 18555 | 18692 | +137 |
zuiderkwast left a comment
I think we can keep both our prototypes alive for a while to discover the path forward and compare them. I think we can wait with merging this to cluster-v2.
Yours has the key-value style Raft with CAS and a binary protocol. Mine has domain-specific entry types and a text protocol. Both probably have pros and cons. I don't feel strongly about the text/binary protocol nor the CAS style. I picked what seemed to be the simplest possible protocol to start with because I wanted to move forward to see how much of the existing cluster tests I could get passing. I believe the learnings from making all existing things work (automatic failover with ranks, manual failover with pause writes, etc.) have value regardless, and I don't think it'd be hard to change the wire protocol later.
However, clusterDelSlot needs the refactoring from this PR. I'd like to merge it. It should be straightforward to extract it from this PR. I can extract it and open it as a separate PR...
/* Clean up any protocol-specific data associated with a node before it is deleted.
 * If NULL, no protocol-specific cleanup is performed. */
void (*cleanupNode)(clusterNode *node);
I noticed this missing piece too.
cluster_link.c calls clusterDelNode which is defined in cluster_legacy.c but declared in cluster_state.h. It's incomplete work from the cluster protocol separation refactoring.
I see you moved clusterDelNode to cluster_state.c and made it call this callback.
It'd be good to have this as a separate commit or PR.
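For readers following the thread, here is a minimal sketch of the callback pattern being discussed: a protocol-agnostic delete function that invokes an optional `cleanupNode` hook before freeing the node. Only the `cleanupNode` signature and the "NULL means no cleanup" rule come from the snippet above; the struct layouts and the names `clusterProtocolSketch`, `delNodeSketch`, `createNodeSketch`, and `raftCleanupNode` are hypothetical stand-ins, not the code in this PR.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the real clusterNode. */
typedef struct clusterNode {
    char name[41];
    void *protocol_data; /* state owned by the protocol layer, may be NULL */
} clusterNode;

/* Hypothetical per-protocol callback table. */
typedef struct clusterProtocolSketch {
    /* Clean up any protocol-specific data before the node is deleted.
     * If NULL, no protocol-specific cleanup is performed. */
    void (*cleanupNode)(clusterNode *node);
} clusterProtocolSketch;

/* Counts hook invocations so the behavior can be observed. */
static int cleanup_calls = 0;

/* Example hook: releases the protocol-specific allocation. */
static void raftCleanupNode(clusterNode *node) {
    free(node->protocol_data);
    node->protocol_data = NULL;
    cleanup_calls++;
}

/* Allocate a node; protocol_bytes == 0 means no protocol-specific state. */
static clusterNode *createNodeSketch(const char *name, size_t protocol_bytes) {
    clusterNode *node = calloc(1, sizeof(*node));
    assert(node != NULL);
    snprintf(node->name, sizeof(node->name), "%s", name);
    node->protocol_data = protocol_bytes ? malloc(protocol_bytes) : NULL;
    return node;
}

/* Protocol-agnostic deletion: run the hook if one is set, then free. */
static void delNodeSketch(const clusterProtocolSketch *proto, clusterNode *node) {
    assert(proto != NULL && node != NULL);
    if (proto->cleanupNode != NULL) proto->cleanupNode(node);
    free(node);
}
```

With this shape, the shared delete path needs no knowledge of which protocol is active; a protocol that keeps no per-node state simply leaves the hook NULL.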
I've extracted this part to a separate PR now: #3542
Building off of #3530 - adds a simple handshake request/response between nodes.
Does not perform any Raft group mutations - the nodes are tracked as RAFT_ROLE_NON_MEMBER and are subject to removal by timeout.
The expectation is that nodes will decide individually who will join each other based on a somewhat deterministic mechanism to be implemented in a future PR.
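The tracking behavior described above could look roughly like the sketch below: a completed handshake records the peer as `RAFT_ROLE_NON_MEMBER`, and a periodic sweep drops non-members that have not been seen within a timeout. Only `RAFT_ROLE_NON_MEMBER` is taken from the description; the second role value, the fixed-size table, the 15 second timeout, the assumption that the timeout is measured from the last handshake, and every function name are illustrative guesses, not the PR's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PEERS_SKETCH 8
#define NON_MEMBER_TIMEOUT_MS 15000 /* hypothetical value */

/* RAFT_ROLE_NON_MEMBER is named in the PR; RAFT_ROLE_MEMBER_SKETCH is a placeholder. */
typedef enum { RAFT_ROLE_NON_MEMBER, RAFT_ROLE_MEMBER_SKETCH } raftRoleSketch;

typedef struct {
    char addr[64];
    raftRoleSketch role;
    long long last_seen_ms;
    int in_use;
} peerSketch;

typedef struct {
    peerSketch peers[MAX_PEERS_SKETCH];
} peerTableSketch;

/* Record a completed handshake. A known peer only has its timestamp refreshed;
 * a new peer is added as a non-member. No Raft group mutation happens here.
 * Returns NULL if the table is full. */
static peerSketch *onHandshake(peerTableSketch *t, const char *addr, long long now_ms) {
    peerSketch *free_slot = NULL;
    for (size_t i = 0; i < MAX_PEERS_SKETCH; i++) {
        peerSketch *p = &t->peers[i];
        if (p->in_use && strcmp(p->addr, addr) == 0) {
            p->last_seen_ms = now_ms;
            return p;
        }
        if (!p->in_use && free_slot == NULL) free_slot = p;
    }
    if (free_slot == NULL) return NULL;
    snprintf(free_slot->addr, sizeof(free_slot->addr), "%s", addr);
    free_slot->role = RAFT_ROLE_NON_MEMBER;
    free_slot->last_seen_ms = now_ms;
    free_slot->in_use = 1;
    return free_slot;
}

/* Drop non-members whose last handshake is older than the timeout.
 * Returns the number of peers removed. */
static int expireNonMembers(peerTableSketch *t, long long now_ms) {
    int removed = 0;
    for (size_t i = 0; i < MAX_PEERS_SKETCH; i++) {
        peerSketch *p = &t->peers[i];
        if (p->in_use && p->role == RAFT_ROLE_NON_MEMBER &&
            now_ms - p->last_seen_ms > NON_MEMBER_TIMEOUT_MS) {
            p->in_use = 0;
            removed++;
        }
    }
    return removed;
}

/* Count the peers currently tracked. */
static int countPeers(const peerTableSketch *t) {
    int n = 0;
    for (size_t i = 0; i < MAX_PEERS_SKETCH; i++) n += t->peers[i].in_use;
    return n;
}
```

In this sketch, peers promoted to any other role are never swept, which matches the idea that only nodes still waiting to join are subject to removal by timeout.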