Conversation
Claude finished @iamquang95's task in 3m 3s — Code Review
Summary
This PR implements the DKG reliable-broadcast (…). A few issues are noted below.
Findings
[Medium] TODO comment must be removed before merge
[Medium]
[Medium]
[Low]
[Low] Reviewer comment already addressed
[Low]
Parity Matrix
Tests
Open Questions
Force-pushed from 5b783a5 to 720c50b
@claude use git to fetch the Go reference checkout (charon at v1.7.1) in the working directory, then review the PR again
Claude finished @iamquang95's task in 4m 19s — Code Review (with Go Reference attempt)
| Finding | Status |
|---|---|
| TODO comment at `protocol.rs:20` | ✅ Resolved — extracted to `crates/p2p/src/protobuf.rs`; peerinfo now uses the shared helper too |
| `next_op_id.checked_add(1).unwrap_or(0)` | ✅ Resolved — now uses `wrapping_add(1)` (`behaviour.rs:88`) |
| `.expect()` in `merge_known_peers` | ✅ Resolved — function now returns `Result<Vec<PeerId>>` and uses `with_context(...)?` (`examples/bcast.rs:268-284`) |
| Dedup check order | Still present — see finding below |
Findings
[Medium] Inbound sig-response write uses RECEIVE_TIMEOUT instead of SEND_TIMEOUT
`crates/dkg/src/bcast/handler.rs:413` — The signature response is written back to the client using `RECEIVE_TIMEOUT` (60 s) instead of `SEND_TIMEOUT` (62 s). All outbound operations in `on_connection_event` use `SEND_TIMEOUT` for consistency, but this inbound-side write uses the wrong constant. This causes the write to time out 2 seconds earlier than the outbound send path on the other side, and the constant name is semantically misleading (writing is a send, not a receive).
[Low] Dedup runs after check() in handle_inbound_sig_request
`crates/dkg/src/bcast/handler.rs:381-404` — The current order is: read → lookup handler → `check()` → hash → dedup. A peer that repeatedly submits a request where `check()` fails will never be recorded in the dedup store. In practice, a failing-check request is still rejected, so this is not a correctness issue — but it means a misbehaving peer can repeatedly submit different payloads for the same `(peer, msg_id)` without ever triggering `DuplicateMismatchingHash`. The dedup guard only fires on the second accepted request. Without the Go reference, I can't confirm whether Go performs dedup before or after the application check — but it is worth verifying.
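The ordering concern can be sketched in isolation. Below is a hypothetical, stdlib-only model (the `Dedup`/`Verdict` names and the integer `(peer, msg_id)` key are illustrative, not the PR's API) showing how recording the hash before the application check would let a second, mismatching payload trip the duplicate guard:

```rust
use std::collections::HashMap;

// Hypothetical dedup store keyed by (peer, msg_id), recording the payload hash.
#[derive(Default)]
struct Dedup {
    seen: HashMap<(u32, u32), u64>,
}

#[derive(Debug)]
enum Verdict {
    Accepted,
    DuplicateMismatchingHash,
}

impl Dedup {
    /// Dedup-first ordering: record/compare the hash before any application
    /// check, so repeated mismatching payloads are caught even when the
    /// check itself would have rejected them.
    fn record(&mut self, key: (u32, u32), hash: u64) -> Verdict {
        if let Some(prev) = self.seen.get(&key) {
            if *prev != hash {
                return Verdict::DuplicateMismatchingHash;
            }
        }
        self.seen.insert(key, hash);
        Verdict::Accepted
    }
}

fn main() {
    let mut dedup = Dedup::default();
    // First submission for (peer 1, msg 7) is recorded.
    assert!(matches!(dedup.record((1, 7), 0xAA), Verdict::Accepted));
    // Resubmitting the same payload hash is fine.
    assert!(matches!(dedup.record((1, 7), 0xAA), Verdict::Accepted));
    // A different payload for the same key now trips the guard.
    assert!(matches!(
        dedup.record((1, 7), 0xBB),
        Verdict::DuplicateMismatchingHash
    ));
}
```

In the PR's current order, the `record` step only runs after `check()` succeeds, so the third call above would never be reached for always-failing payloads.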
[Low] handle_inbound_broadcast has no dedup guard
`crates/dkg/src/bcast/handler.rs:424-462` — Inbound broadcast messages (`/msg` subprotocol) verify signatures but do not record a dedup key. If a broadcaster retransmits the same fully-signed `BCastMessage` (e.g., retry logic), the `callback()` is invoked multiple times on the receiver. The `dedup` store is threaded into the handler but not used in this path. This may be intentional if callers are required to be idempotent, but it should be confirmed against the Go implementation.
[Low] Broadcaster does not invoke callback on itself
`crates/dkg/src/bcast/behaviour.rs` / `crates/dkg/src/bcast/handler.rs` — The broadcast originator signs the message, collects all peer signatures, verifies them, and fans out the `BCastMessage` to all other peers. But `handler.callback()` is only triggered by `handle_inbound_broadcast`, which only runs on inbound streams. The originating node never receives its own broadcast, so the callback is never invoked locally. This is plausibly correct (the originator already has the message), but differs from a subscriber model where all nodes, including the originator, receive the finalized broadcast. Worth confirming against Go behavior.
Parity Matrix
| Component | Rust | Go ref needed | Notes |
|---|---|---|---|
| Protocol IDs `/charon/dkg/bcast/1.0.0/{sig,msg}` | ✅ `mod.rs:29-32` | No | Hardcoded strings match expected format |
| `RECEIVE_TIMEOUT = 60s`, `SEND_TIMEOUT = 62s` | ✅ `mod.rs:35-38` | Yes | Values need confirmation from Go |
| Dedup by `(peer_id, msg_id)` | ✅ `handler.rs:43-48` | Yes | Only on sig path, not bcast path |
| `hash_any = SHA256(type_url ‖ value)` | ✅ `protocol.rs:16-21` | No | Cross-lang test vector present |
| `sign_any` / `verify_signatures` | ✅ `protocol.rs:24-53` | Yes | Uses secp256k1 via `pluto_k1util` |
| Sequential broadcast (one at a time) | ✅ `behaviour.rs:58` | Yes | `Option<BroadcastState>` |
| Fail-on-disconnect during active broadcast | ✅ `behaviour.rs:104-113` | Yes | Checked on `ConnectionClosed` |
| Only last-connection-closed triggers failure | ✅ `behaviour.rs:380-387` | Yes | Uses `HashSet<ConnectionId>` per peer |
| Single-node cluster completes immediately | ✅ `behaviour.rs:262-265` | Yes | `dispatches.is_empty()` path |
| Shared `write/read_protobuf` helper | ✅ `p2p/src/protobuf.rs` | N/A | Extracted from peerinfo, now shared |
| Response timeout on sig write | `handler.rs:413` uses `RECEIVE_TIMEOUT` | Yes | Should be `SEND_TIMEOUT` — see finding above |
| Broadcaster invokes own callback | ❌ Not implemented | Yes | May be intentional |
| Broadcast dedup | ❌ Missing on `/msg` path | Yes | May be intentional |
Tests
- `protocol.rs::hash_any_matches_go_expected_value` — cross-language test vector ✅
- `protocol.rs::verify_signatures_rejects_invalid_count_length_and_order` ✅
- `component.rs::duplicate_message_id_registration_fails` ✅
- `behaviour.rs::broadcast_round_trip_and_duplicate_semantics` — covers full round-trip, dedup mismatch, and unknown message ID paths ✅
- No tests for single-node cluster, broadcaster self-callback, or inbound broadcast dedup
Open Questions (for Go reference verification)
- Does Go's `handle_inbound_sig_request` run the application check before or after dedup?
- Does the Go bcast `/msg` handler have a dedup guard?
- Does Go invoke the local callback on the broadcasting node?
- Are the 60s/62s timeout constants confirmed from Go source?
- Does Go write the sig response with the "send" timeout or the "receive" timeout?
This matches the Charon implementation
Charon doesn't dedup
Yes, they use 1 minute and 1 minute + 2 seconds
Should we convert it to "ready for review"?
```rust
            return Poll::Ready(event);
        }

        while let Poll::Ready(Some(command)) = self.command_rx.poll_recv(cx) {
```
Suggested change:

```diff
-        while let Poll::Ready(Some(command)) = self.command_rx.poll_recv(cx) {
+        if let Poll::Ready(Some(command)) = self.command_rx.poll_recv(cx) {
```
There could be more than one command; using `while` is more efficient because it drains all pending commands in one pass instead of waiting for the next poll
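The difference is easy to see with a plain `std::sync::mpsc` channel (a stand-in for the PR's tokio `command_rx`, not its actual code): a `while let` over `try_recv` handles every queued command in a single pass, whereas a single receive would handle only one per wakeup:

```rust
use std::sync::mpsc;

// Drain every command already queued, mirroring the `while let` pattern
// from the review: one poll pass handles all pending commands.
fn drain_all(rx: &mpsc::Receiver<u32>) -> Vec<u32> {
    let mut drained = Vec::new();
    while let Ok(cmd) = rx.try_recv() {
        drained.push(cmd);
    }
    drained
}

fn main() {
    let (tx, rx) = mpsc::channel();
    for cmd in [1u32, 2, 3] {
        tx.send(cmd).unwrap();
    }
    // All three queued commands are handled in a single pass.
    assert_eq!(drain_all(&rx), vec![1, 2, 3]);
    // A second pass finds nothing left to do.
    assert!(drain_all(&rx).is_empty());
}
```

With an `if let`, each of the three commands would need its own wakeup of the poll loop.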
Claude finished @iamquang95's task in 3m 0s — Code Review (Updated — 2026-03-30)
| Finding | Status |
|---|---|
| TODO comment at `protocol.rs` | ✅ Resolved — extracted to `p2p/src/protobuf.rs` |
| `wrapping_add(1)` for `next_op_id` | ✅ Resolved |
| `merge_known_peers` returns `Result` | ✅ Resolved |
| `RECEIVE_TIMEOUT` on sig response write | ✅ Confirmed intentional (matches Charon) |
| Dedup runs after `check()` | ✅ Confirmed intentional (matches Charon) |
| No dedup guard on `/msg` path | ✅ Confirmed intentional (matches Charon) |
| Broadcaster does not invoke own callback | ✅ Confirmed intentional (matches Charon) |
Status of varex83's 2026-03-27 Review Comments
| Comment | Status |
|---|---|
| Dummy handler for relays | ✅ Resolved — `connection_handler_for_peer()` returns `dummy::ConnectionHandler` for non-peers |
| `local_peer_id` to `P2PContext` | ✅ Resolved — `p2p_context.local_peer_id()` used throughout |
| Events for debugging | ✅ Resolved — `Event::BroadcastCompleted` and `Event::BroadcastFailed` |
| Connection tracking via P2P context | ✅ Resolved — `p2p_context.peer_store_lock()` |
| `NotifyHandler::Any` | ✅ Resolved — used at `behaviour.rs:229,311` |
| Error as enum | ✅ Resolved — `Error` and `Failure` enums in `error.rs` |
| `while let` poll_recv (`behaviour.rs:438`) | ✅ Resolved |
| `while let` active_futures (`handler.rs:314`) | ✅ Resolved |
| Split `poll` into separate functions (`handler.rs:323`) | ✅ Resolved — `on_connection_event` delegates to `handle_fully_negotiated_inbound`, `handle_fully_negotiated_outbound`, `handle_dial_upgrade_error` |
| `pub mod handler` in `mod.rs` (marked "Unused") | See Remaining Findings below |
| Component could live in behaviour impl | See Remaining Findings below |
| `broadcast()` async/event-driven | See Remaining Findings below |
Remaining Findings
[Low] pub mod handler exposes nothing — should be pub(crate)
`crates/dkg/src/bcast/mod.rs:8` — `pub mod handler;` is visible externally, but every item inside `handler` (`Handler`, `DedupStore`, `InEvent`, `OutEvent`, `PendingOpen`) is `pub(crate)`. Nothing outside the crate can reach these types, making the `pub` visibility dead. This is what varex83 flagged as "Unused". Change to `pub(crate) mod handler` (or `mod handler` if it is only needed within the module).
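For illustration, a minimal crate layout (the module and type names are toy stand-ins, not the PR's) showing that `pub(crate)` keeps the module fully usable inside the crate while removing it from the public API surface:

```rust
// `pub(crate)` makes `handler` reachable anywhere inside this crate
// but invisible to downstream crates — matching the fact that all of
// its items are already `pub(crate)`.
pub(crate) mod handler {
    pub(crate) struct Handler {
        pub(crate) id: u32,
    }
}

pub mod bcast {
    use crate::handler::Handler;

    // Still usable within the crate even though `handler` is not exported.
    pub fn handler_id() -> u32 {
        Handler { id: 42 }.id
    }
}

fn main() {
    assert_eq!(bcast::handler_id(), 42);
}
```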
[Open] broadcast() is now fire-and-forget — result tracking is caller's responsibility
`crates/dkg/src/bcast/component.rs:102-119` — The current `broadcast()` sends a `BroadcastCommand` and returns `Ok(())` immediately — the result arrives later via `Event::BroadcastCompleted` / `Event::BroadcastFailed`. This matches the event-driven design agreed upon. However, since `broadcast()` no longer surfaces the broadcast result, callers must now drive the swarm event loop and match `Event::BroadcastFailed` to detect failures. The doc comment on `broadcast()` should be updated to state this explicitly, so callers know the `Ok(())` return only means the command was enqueued, not that it succeeded.
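The enqueue-then-event contract can be sketched with stdlib channels (all names here are hypothetical stand-ins for the PR's `Component`, command channel, and events):

```rust
use std::sync::mpsc;

// Hypothetical events mirroring BroadcastCompleted/BroadcastFailed.
#[derive(Debug, PartialEq)]
enum Event {
    BroadcastCompleted(u64),
    BroadcastFailed(u64),
}

struct Component {
    command_tx: mpsc::Sender<u64>,
}

impl Component {
    /// Ok(()) only means the command was enqueued, not that the
    /// broadcast succeeded; the outcome arrives later as an Event.
    fn broadcast(&self, msg_id: u64) -> Result<(), &'static str> {
        self.command_tx.send(msg_id).map_err(|_| "behaviour gone")
    }
}

fn main() {
    let (command_tx, command_rx) = mpsc::channel();
    let (event_tx, event_rx) = mpsc::channel();
    let component = Component { command_tx };

    // Enqueue succeeds immediately, before any broadcast work happens.
    assert!(component.broadcast(7).is_ok());

    // Stand-in for the swarm loop: process the command, emit the result.
    let msg_id = command_rx.recv().unwrap();
    event_tx.send(Event::BroadcastCompleted(msg_id)).unwrap();

    // The caller detects success or failure only by matching the event.
    assert_eq!(event_rx.recv().unwrap(), Event::BroadcastCompleted(7));
}
```

This is the behaviour the updated doc comment would need to spell out.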
[Open] Component as a separate struct vs. merging into Behaviour impl
`crates/dkg/src/bcast/component.rs` — varex83's suggestion to move `Component` into the behaviour impl is still pending. The current split is clean and functional, but the discussion (@iamquang95: "I see, yeah I will try that") was unresolved at the time of this review. If iamquang95 decides to keep the current two-struct design, the rationale (a cloneable user-facing handle separate from the swarm-owned behaviour) should be documented on the struct.
Overall Assessment
The implementation is functionally solid and all hard findings from earlier reviews have been resolved. The three items above are low/discussion-level. The two most important things to decide before marking ready-to-review are:
- Whether `pub mod handler` needs to be restricted (easy fix).
- Whether the `Component`/`Behaviour` split stays or gets merged (design decision between @iamquang95 and @varex83).
Once those are settled, the PR is in good shape to mark ready for review.
This follows the Rust libp2p event-driven design
Only
varex83 left a comment
Great! LGTM! Really like the implementation; left a few comments
```rust
use std::fmt;

pub enum Failure {
    /// The operation timed out.
    Timeout,
    /// The remote peer does not support the protocol.
    Unsupported,
    /// The operation failed due to an I/O error.
    Io {
        /// The underlying error message.
        message: String,
    },
    /// The operation failed for another reason.
    Other {
        /// The underlying error message.
        message: String,
    },
}

impl Failure {
    /// Creates a new [`Failure::Io`] value.
    pub fn io(error: impl fmt::Display) -> Self {
        Self::Io {
            message: error.to_string(),
        }
    }

    /// Creates a new [`Failure::Other`] value.
    pub fn other(error: impl fmt::Display) -> Self {
        Self::Other {
            message: error.to_string(),
        }
    }
}

impl fmt::Display for Failure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Timeout => f.write_str("operation timed out"),
            Self::Unsupported => f.write_str("protocol negotiation failed"),
            Self::Io { message } => write!(f, "i/o error: {message}"),
            Self::Other { message } => f.write_str(message),
        }
    }
}
```
I think we can use `thiserror` here; it does pretty much the same
```rust
    pub fn set_local_peer_id(&self, peer_id: PeerId) {
        *self
            .local_peer_id
            .write()
            .expect("Failed to write local peer id") = Some(peer_id);
    }
```
I think we can set it once and make it read-only.
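One stdlib option for set-once, read-only-after semantics is `std::sync::OnceLock`, which also removes the `RwLock` and the `.expect()`. A sketch with a hypothetical context type (the real field holds a libp2p `PeerId`, modeled here as a `String`):

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for P2PContext; the real type holds a PeerId.
struct P2PContext {
    local_peer_id: OnceLock<String>,
}

impl P2PContext {
    fn new() -> Self {
        Self { local_peer_id: OnceLock::new() }
    }

    /// Sets the peer id exactly once; a second call fails instead of
    /// silently overwriting, and returns the rejected value.
    fn set_local_peer_id(&self, peer_id: String) -> Result<(), String> {
        self.local_peer_id.set(peer_id)
    }

    /// Lock-free read; None until the first successful set.
    fn local_peer_id(&self) -> Option<&String> {
        self.local_peer_id.get()
    }
}

fn main() {
    let ctx = P2PContext::new();
    assert!(ctx.set_local_peer_id("peer-a".to_string()).is_ok());
    // A second set is rejected rather than overwriting.
    assert!(ctx.set_local_peer_id("peer-b".to_string()).is_err());
    assert_eq!(ctx.local_peer_id().unwrap(), "peer-a");
}
```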
```rust
/// Swarm-owned behaviour for reliable broadcast.
pub struct Behaviour {
    peers: Arc<Vec<PeerId>>,
```
We can use `known_peers` from `P2PContext`

Should fix: #132