Add BLOCKED_ASYNC blocking type and orchestrate cluster protocol dispatchers#3536

Open
zuiderkwast wants to merge 2 commits intovalkey-io:cluster-v2from
zuiderkwast:blocked-async

@zuiderkwast
Contributor

Add a new blocking type for async operations. While a command executes, the client can be blocked and then unblocked either within the same command execution (synchronously) or at a later point (asynchronously).

Functions added in blocked.c API:

  • blockClientAsync: Block the client and return a handle.
  • consumeBlockedClientAsyncHandle: Returns the client associated with a handle, if it's still connected. If the client has disconnected, NULL is returned. Called before sending replies and unblocking the client.
  • unblockClientAsync: Called after adding the reply (addReply) to the client.

This allows cluster protocol implementations (e.g. Raft cluster) to defer the reply until the operation is committed, while the legacy gossip implementation completes synchronously. The cluster implementation itself doesn't need to handle the blocking; it just calls the completion callbacks whenever it likes.
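A minimal sketch of that orchestration, under stated assumptions: every name below (`sketchClient`, `sketchDispatchMeet`, `sketchGossipMeet`, `sketchRaftMeet`, `sketchRaftCommit`) is hypothetical and not taken from the patch. For brevity the completion context is the client itself, whereas the PR passes a handle so that a disconnected client can be detected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a client; not the real Valkey struct. */
typedef struct sketchClient {
    int blocked; /* 1 while the client waits for completion */
    int replies; /* number of replies added so far */
} sketchClient;

typedef void (*sketchDoneFn)(void *ctx);
typedef void (*sketchMeetFn)(sketchDoneFn done, void *ctx);

/* Completion callback owned by the dispatcher: reply, then unblock. */
void sketchOnDone(void *ctx) {
    sketchClient *c = ctx;
    c->replies++;
    c->blocked = 0;
}

/* Gossip-like protocol: invokes the completion before returning. */
void sketchGossipMeet(sketchDoneFn done, void *ctx) {
    done(ctx);
}

/* Raft-like protocol: remembers the completion until the commit. */
typedef struct {
    sketchDoneFn done;
    void *ctx;
} sketchPending;

sketchPending sketch_pending = {NULL, NULL};

void sketchRaftMeet(sketchDoneFn done, void *ctx) {
    sketch_pending.done = done;
    sketch_pending.ctx = ctx;
}

/* Called when the (imaginary) log entry is committed. */
void sketchRaftCommit(void) {
    if (sketch_pending.done != NULL) {
        sketchDoneFn done = sketch_pending.done;
        void *ctx = sketch_pending.ctx;
        sketch_pending.done = NULL;
        sketch_pending.ctx = NULL;
        done(ctx);
    }
}

/* Dispatcher: block first, then hand the completion to the protocol.
 * The protocol never touches the blocking state itself. */
void sketchDispatchMeet(sketchClient *c, sketchMeetFn meet) {
    assert(c != NULL && meet != NULL);
    c->blocked = 1;
    meet(sketchOnDone, c);
}
```

With the gossip-like callback, the client is already unblocked when `sketchDispatchMeet` returns. With the Raft-like one, it stays blocked until `sketchRaftCommit` runs.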

Cluster bus type callbacks orchestrated with the blocking: slotChange (ADDSLOTS, DELSLOTS, ADDSLOTSRANGE, DELSLOTSRANGE, FLUSHSLOTS, SETSLOT NODE), meet, setReplicaOf, forgetNode, and failover.

Tests for the blocking mechanism itself are added using a new DEBUG command variant:

DEBUG SLEEP <seconds> ASYNC

Add a new blocking type for async operations (e.g. Raft commit).
When a client is blocked with BLOCKED_ASYNC, the operation handle
is owned by the caller. On disconnect or timeout, no special cleanup
is needed — the handle will be consumed later and the client lookup
will return NULL.

Add blockedAsyncCreate/blockedAsyncConsume API in blocked.c. An
async handle (opaque blockedAsyncHandle struct) wraps a client ID
so that a completion callback can safely look up the client even
if it disconnected while waiting.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
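
The handle described in the commit message above can be illustrated roughly as follows. This is a sketch, not the code from the patch: the struct layout, the client table, and the names `sketchAsyncCreate`, `sketchAsyncConsume` and `sketchLookupClientById` are assumptions made for illustration. The point it shows is that the handle stores a client ID rather than a pointer, so consuming it after a disconnect yields NULL instead of a dangling pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real Valkey types. */
typedef struct sketchClient {
    uint64_t id;
    int connected;
} sketchClient;

#define SKETCH_MAX_CLIENTS 8
sketchClient *sketch_clients[SKETCH_MAX_CLIENTS];

/* The handle holds only the client ID, never a client pointer. */
typedef struct sketchAsyncHandle {
    uint64_t client_id;
} sketchAsyncHandle;

/* Linear lookup in the toy client table; NULL if not found. */
sketchClient *sketchLookupClientById(uint64_t id) {
    for (int i = 0; i < SKETCH_MAX_CLIENTS; i++) {
        sketchClient *c = sketch_clients[i];
        if (c != NULL && c->id == id && c->connected) return c;
    }
    return NULL;
}

/* Create a handle for a client that is about to be blocked. */
sketchAsyncHandle *sketchAsyncCreate(sketchClient *c) {
    assert(c != NULL);
    sketchAsyncHandle *h = malloc(sizeof(*h));
    if (h != NULL) h->client_id = c->id;
    return h;
}

/* Free the handle and return the client, or NULL if it is gone. */
sketchClient *sketchAsyncConsume(sketchAsyncHandle *h) {
    assert(h != NULL);
    sketchClient *c = sketchLookupClientById(h->client_id);
    free(h);
    return c;
}
```

Because the handle is owned by the caller and freed on consumption, a disconnect needs no cleanup of its own in this sketch: the later lookup simply fails.
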
Block the client using blockClientAsync before calling vtable
callbacks that may complete asynchronously: slotChange (ADDSLOTS,
DELSLOTS, ADDSLOTSRANGE, DELSLOTSRANGE, FLUSHSLOTS, SETSLOT NODE),
meet, setReplicaOf, forgetNode, and failover. The completion
callbacks use consumeBlockedClientAsyncHandle and unblockClientAsync.

This allows protocol implementations (e.g. Raft) to defer the
reply until the operation is committed, while the legacy gossip
implementation completes synchronously inside call().

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
@zuiderkwast zuiderkwast changed the title Add BLOCKED_ASYNC blocking type and orchestrate cluster command dispatchers Add BLOCKED_ASYNC blocking type and orchestrate cluster protocol dispatchers Apr 19, 2026
@codecov

codecov Bot commented Apr 19, 2026

Codecov Report

❌ Patch coverage is 93.87755% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.61%. Comparing base (6b93134) to head (ba75641).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/blocked.c | 87.09% | 4 Missing ⚠️ |
| src/cluster.c | 94.87% | 2 Missing ⚠️ |

Additional details and impacted files
@@              Coverage Diff               @@
##           cluster-v2    #3536      +/-   ##
==============================================
+ Coverage       76.59%   76.61%   +0.01%     
==============================================
  Files             160      160              
  Lines           79288    79370      +82     
==============================================
+ Hits            60733    60807      +74     
- Misses          18555    18563       +8     

| Files with missing lines | Coverage Δ |
|---|---|
| src/debug.c | 55.87% <100.00%> (+1.03%) ⬆️ |
| src/server.c | 89.59% <100.00%> (+0.03%) ⬆️ |
| src/server.h | 100.00% <ø> (ø) |
| src/cluster.c | 90.73% <94.87%> (-0.05%) ⬇️ |
| src/blocked.c | 90.75% <87.09%> (-0.38%) ⬇️ |

... and 23 files with indirect coverage changes

@zuiderkwast
Contributor Author

One downside of making commands like CLUSTER ADDSLOTS blocking is that they can't be used in MULTI-EXEC, which is what valkey-cli --cluster fix does.

We don't have to mark these commands with NO_MULTI if we instead detect that we're in MULTI and fail explicitly in clusterCommand(), only when Raft-cluster is used. Maybe we can live with that? Hopefully, clusters with Raft and ASM don't need to be fixed...
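
A rough sketch of such a check, to make the idea concrete. Everything here is hypothetical: the `sketchBusType` enum, the `in_multi` field, the function name and the error text are placeholders, not existing Valkey identifiers or messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol selector; not a real Valkey enum. */
typedef enum { SKETCH_BUS_GOSSIP, SKETCH_BUS_RAFT } sketchBusType;

/* Hypothetical client stand-in. */
typedef struct sketchClient {
    int in_multi;      /* 1 if the command runs inside MULTI-EXEC */
    const char *error; /* error reply, if any (placeholder text) */
} sketchClient;

/* Returns 1 if the command may proceed, 0 if it was rejected.
 * Only the protocol that may block rejects commands inside MULTI. */
int sketchCheckMultiAllowed(sketchClient *c, sketchBusType bus) {
    assert(c != NULL);
    if (bus == SKETCH_BUS_RAFT && c->in_multi) {
        c->error = "ERR command not allowed inside MULTI with this cluster protocol";
        return 0;
    }
    return 1;
}
```

With this shape, the gossip path keeps accepting the commands inside MULTI-EXEC, so existing tooling is only affected when the Raft-based protocol is in use.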

The alternative is to keep CLUSTER MEET, CLUSTER ADDSLOTS and similar non-blocking like in legacy, returning OK immediately. Test framework's start_cluster sends CLUSTER ADDSLOTSRANGE directly after CLUSTER MEET and with gossip this works. Control planes may do the same. Maybe we can make that work in Raft too, so perhaps I should mark this PR as a draft, or perhaps we can go with a hybrid of some form, making some commands blocking and others not.
