
feat(l1): use permits for p2p requests #6523

Open
iovoid wants to merge 5 commits into main from feat/p2p-request-permit

Conversation

@iovoid
Contributor

@iovoid iovoid commented Apr 23, 2026

Motivation

Currently we have peer selection functions, such as get_best_peer, which consider the number of in-flight requests to a given peer so that load stays capped at what the peer can handle.

However, these functions simply return a peer id, and there is:

  • no guarantee the slot will remain available when the request is actually made
  • several tasks searching for a peer at the same time can all be handed the same answer

To work around this, callers do an extra inc_requests manually, which makes it easy to forget the matching dec_requests and thus leak request slots. This currently happens in several snapsync-related functions.

On top of that, the inc_requests/request/dec_requests pattern isn't cancellation-safe.

Description

We solve all these issues at once by having the peer selection functions return a permit to make one request to the peer, similar to what tokio's channels do.

Whenever the permit is dropped (consumed on send, discarded without being used, or released when its task is cancelled), the slot is freed. This makes it very hard to unintentionally leak request slots.
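The mechanism can be sketched with a minimal, self-contained RAII permit. All names here are hypothetical stand-ins: an atomic counter plays the role of the peer table entry, whereas the real implementation sends a fire-and-forget dec_requests message to the peer-table actor on drop.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicI64, Ordering};

// Hypothetical sketch: the counter stands in for a peer's in-flight count.
struct RequestPermit {
    requests: Arc<AtomicI64>,
}

impl RequestPermit {
    fn acquire(requests: &Arc<AtomicI64>, cap: i64) -> Option<Self> {
        // Reserve the slot at selection time, not at send time.
        let prev = requests.fetch_add(1, Ordering::SeqCst);
        if prev >= cap {
            requests.fetch_sub(1, Ordering::SeqCst);
            return None;
        }
        Some(RequestPermit { requests: requests.clone() })
    }
}

impl Drop for RequestPermit {
    fn drop(&mut self) {
        // The slot is freed on every exit path: send, early return, cancel.
        self.requests.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let requests = Arc::new(AtomicI64::new(0));
    {
        let _permit = RequestPermit::acquire(&requests, 1).expect("slot free");
        assert_eq!(requests.load(Ordering::SeqCst), 1);
        // A second acquire fails while the slot is held.
        assert!(RequestPermit::acquire(&requests, 1).is_none());
    } // _permit drops here, releasing the slot
    assert_eq!(requests.load(Ordering::SeqCst), 0);
    println!("ok");
}
```

The key property is that the decrement lives in Drop, so no caller can forget it and cancellation cannot skip it.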

iovoid added 4 commits April 23, 2026 16:05
Replaces manual inc_requests/dec_requests calls with a RequestPermit
type that atomically reserves a slot at peer selection time and
releases it via Drop. Closes a class of leaks that used to happen
when spawned workers panicked between inc and dec.

- get_best_peer, get_best_peer_excluding, get_random_peer return
  Option<(H256, PeerConnection, RequestPermit)> and bump the selected
  peer's requests counter under &mut self before returning.
- Permit's Drop fires a fire-and-forget dec_requests.
- inc_requests removed from the public protocol trait.
- PeerHandler::make_request helper deleted; callers use
  connection.outgoing_request directly with the permit bound as _permit.
- Four spawn+channel flows (account range, bytecodes, storage range,
  header download) route the permit through the completion channel;
  permit drops on receive or with the spawned task on cancellation.
- request_state_trienodes, request_storage_trienodes take
  RequestPermit instead of PeerTable; unused peer_id parameter dropped.
- update_pivot destructures the permit so it lives through the retry
  loop.

ask_peer_head_number (head-probe path via get_peer_connections) is
intentionally untracked here; the follow-up get_best_n_peers commit
restores tracking.
@github-actions github-actions Bot added the L1 Ethereum client label Apr 23, 2026
@github-actions

github-actions Bot commented Apr 23, 2026

Lines of code report

Total lines added: 56
Total lines removed: 62
Total lines changed: 118

Detailed view
+------------------------------------------------------+-------+------+
| File                                                 | Lines | Diff |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/peer_handler.rs         | 568   | -28  |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/peer_table.rs           | 1123  | +56  |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/snap/client.rs          | 1165  | -16  |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/healing/state.rs   | 388   | -6   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/healing/storage.rs | 616   | -1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/snap_sync.rs       | 1138  | -11  |
+------------------------------------------------------+-------+------+

@iovoid iovoid marked this pull request as ready for review April 24, 2026 12:01
@iovoid iovoid requested a review from a team as a code owner April 24, 2026 12:01
@ethrex-project-sync ethrex-project-sync Bot moved this to In Review in ethrex_l1 Apr 24, 2026
@github-actions

🤖 Kimi Code Review

This PR introduces a RequestPermit RAII guard to replace manual inc_requests/dec_requests bookkeeping. This is a significant reliability improvement—preventing resource leaks if tasks panic or error out early—and the implementation is largely correct.

Approval with minor suggestions:

  1. Clarify the saturating_sub comment (peer_table.rs:510-512)
    The comment mentions i64::saturating_sub saturating at i64::MIN, but the .max(0) call is what actually clamps the value. Suggest updating to:

    // Clamp at 0: a stale permit drop firing after a peer
    // disconnect+reconnect could otherwise push `requests` negative.
    // (i64::saturating_sub would saturate at i64::MIN without the max.)
  2. Explicit permit drop in ask_peer_head_number (peer_handler.rs:53-103)
    The function takes _permit: RequestPermit but the leading underscore suggests it's unused. Since the permit's drop side-effect is load-bearing, consider renaming to permit (without underscore) and adding an explicit drop(permit); before the final Ok(number)/Err(...) returns, or a comment like // Permit drops here on return, releasing the slot.

  3. Consistency in download_chunk_from_peer (peer_handler.rs:456-462)
    The permit is held for the entire function duration (dropped on return). This is correct if the caller expects the slot to remain occupied during processing, but differs from other workers (e.g., request_account_range_worker) that explicitly drop(permit) immediately after the wire response. Verify this is intentional; if processing is CPU-bound, consider dropping early to free the peer for other requests.

  4. Redundant permit drop (snap/client.rs:462, peer_handler.rs:417, etc.)
    Explicit drop(permit) after outgoing_request is correct and clear, though not strictly necessary (it would drop at end-of-scope anyway). Keeping it is fine for clarity.
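The clamp behaviour from point 1 can be checked in isolation. This is a sketch in hypothetical free-function form; the real handler mutates the peer entry in place.

```rust
// Sketch of the clamp discussed in point 1 (names assumed, not the real API).
fn clamped_dec(requests: i64) -> i64 {
    // saturating_sub only guards against wrap at i64::MIN;
    // .max(0) is what keeps a stale permit drop from going negative.
    requests.saturating_sub(1).max(0)
}

fn main() {
    assert_eq!(clamped_dec(3), 2);
    assert_eq!(clamped_dec(0), 0); // stale drop on a reset counter stays at 0
    println!("ok");
}
```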

Correctness verification:

  • The Drop impl uses dec_requests (fire-and-forget), which is safe since Drop::drop is synchronous and cannot await.
  • get_best_n_peers correctly increments counters for each returned peer atomically under &mut self before returning the permits.
  • update_pivot (snap_sync.rs) properly consumes the permit on each attempt, preventing the double-counting bug that existed with the old MAX_RETRIES_PER_PEER loop.

No security vulnerabilities or consensus-critical bugs detected. The refactor reduces the surface area for request-count desynchronization.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review



Code Review: PR #6523: feat(l1): use permits for p2p requests

Overview

This PR replaces the manual inc_requests/dec_requests call pairs with a RAII RequestPermit guard that is issued atomically at peer-selection time and released on drop. This eliminates the dec_requests-on-try_recv pattern (which was non-cancellation-safe and easy to forget), ensures slot accounting can't leak under task cancellation, and removes the now-unnecessary make_request wrapper. The design is sound and the motivation is excellent.


Correctness Issues

1. Dead _peer_id parameter in get_block_header

peer_handler.rs:598:

pub async fn get_block_header(
    &mut self,
    _peer_id: H256,       // silently ignored
    connection: &mut PeerConnection,
    _permit: RequestPermit,
    block_number: u64,
) -> ...

The _peer_id parameter was previously forwarded to make_request, but that indirection is gone. The callers (e.g. snap_sync.rs) still pass peer_id, and it vanishes silently. Either remove the parameter from the signature, or use it for logging/diagnostics so errors can report which peer failed.


2. Inconsistent permit lifetime in download_chunk_from_peer

peer_handler.rs:455–487: The permit is held as _permit for the full function body — including the are_block_headers_chained validation — while all snap workers (request_account_range_worker, request_storage_ranges_worker, the bytecode task) explicitly drop(permit) immediately after outgoing_request().await. Holding the slot through local validation unnecessarily delays freeing the peer for the next spawned task in the sync loop.

Suggested fix:

let response = connection.outgoing_request(request, PEER_REPLY_TIMEOUT).await;
drop(_permit);  // or: let _ = _permit;
if let Ok(RLPxMessage::BlockHeaders(...)) = response { ... }

3. has_eligible_peer clones a PeerConnection it immediately discards

peer_table.rs:794:

fn handle_has_eligible_peer(...) -> bool {
    self.do_get_best_peer(&msg.capabilities).is_some()
}

do_get_best_peer calls do_get_best_peer_excluding, which calls peer_data.connection.clone() for the winning peer before returning it — the clone is then thrown away by .is_some(). This is called on every rotation of update_pivot when no peer passes the exclusion filter. A lighter probe that only checks can_try_more_requests and capability overlap — without touching connection — would be more efficient.


4. request_state_trienodes / request_storage_trienodes hold permits through validation

snap/client.rs:1055–1093: Both functions accept _permit: RequestPermit but don't drop it until the function returns, meaning the permit outlives the wire call and covers trie node hash verification. This is inconsistent with the pattern established in the workers. For request_state_trienodes in particular, the node hash loop could be non-trivial for large batches. An explicit drop(_permit) after connection.outgoing_request().await would be consistent and release the slot sooner.


Behavioral Change Bundled in the Refactoring

5. update_pivot: MAX_RETRIES_PER_PEER removed

snap_sync.rs:789: The old code tried each peer up to MAX_RETRIES_PER_PEER = 3 times before excluding it. The new code tries each peer exactly once per rotation. A peer suffering a single transient failure (e.g. a momentary timeout on a healthy peer) is now excluded for the full rotation and won't be retried until the next rotation + backoff. Under a small peer set this could stall the pivot update loop longer than before. If the one-attempt-per-rotation strategy is intentional, it should be documented in the commit/PR body; if not, it is a regression.


Minor / Nitpick

6. _permit naming in function signatures

Using _permit: RequestPermit (versus permit) suppresses the "unused variable" lint but implies the value is unused/immediately dropped — when in reality it is intentionally kept alive until end-of-scope. The opposite pattern, let _permit = permit; inside the function body, makes this explicit. Alternatively, the existing doc-comments explain it well enough; consider also adding a one-line inline comment at the _permit declaration site (e.g. // held for RAII slot release on return).

7. do_get_best_n_peers weight snapshot is stale for subsequent picks

peer_table.rs:992–1030: Candidates are sorted by weight_peer(score, requests) at snapshot time. The handler then increments requests for each returned peer in sequence. The first pick's increment is not reflected in the weights used to order the second pick, third pick, etc. For the typical MAX_PEERS_TO_ASK = 3 the error is negligible, but worth noting if n grows.


What's Well Done

  • The #[must_use] attribute on RequestPermit is exactly right.
  • Increment-under-&mut self in the handler (before returning the permit) is atomic with respect to the actor's message queue, so there's no selection/reservation race.
  • The saturating_sub(1).max(0) guard in handle_dec_requests correctly prevents a stale permit from pushing a freshly reconnected peer's counter negative.
  • Replacing get_peer_connections (returned all peers regardless of capacity) with get_best_n_peers (filters by capacity and sorts by weight) is strictly better.
  • has_eligible_peer as a read-only probe (no slot reservation) is the right tool for the capacity-vs-rotation diagnostic in update_pivot.
  • yield_now() added when the download queue is drained but tasks are in-flight prevents a busy-wait spin on that path.
  • Removal of the duplicated comment block around inc_requests is welcome cleanup.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

Findings

  1. Reserving a peer before checking whether there is queued work creates peer-slot churn and can starve other sync paths. get_best_peer() now increments requests immediately when it returns a permit (peer_table.rs), but several worker loops in snap/client.rs only discover that the work queue is empty afterwards. In the “all work is in-flight” state, the loop repeatedly reserves and drops peers without sleeping/yielding, which can make peers look busy and delay unrelated requests. Pop the task first, or add the same yield_now/sleep handling used in request_block_headers().

  2. request_block_headers() now reserves up to 5 peers up front but probes them sequentially, so idle peers stay marked busy while waiting on the first timeout. The reservation happens in get_best_n_peers() (peer_table.rs), and the requests are still issued one-by-one (peer_handler.rs). If peer 1 stalls for PEER_REPLY_TIMEOUT, peers 2-5 are blocked from body download / snap / pivot work even though no wire request has been sent to them yet. Either reserve/query one peer at a time, or actually issue these head-number probes in parallel.

  3. The new permit model still allows stale decrements to affect a reconnected peer’s live request count. RequestPermit only stores peer_id, and its Drop always sends dec_requests (peer_table.rs). After remove_peer() and a reconnect, a stale permit can decrement the new session’s requests counter. Clamping to zero avoids negatives, but it does not stop a stale drop from freeing a slot that belongs to a new in-flight request. This needs a session/generation token, not just peer_id.

No EVM/gas/EIP logic is touched here; the review risk is in sync scheduling and peer-accounting correctness. I couldn’t run cargo check in this environment because rustup could not create temp files on the read-only filesystem.
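A generation token along the lines of finding 3 could look like the sketch below. All names are hypothetical; the real table is an actor keyed by H256, collapsed here to a plain map so the idea is runnable in isolation.

```rust
use std::collections::HashMap;

// Hypothetical sketch: a permit remembers the generation it was issued
// under, and a decrement from an older generation (pre-disconnect) is
// ignored instead of being clamped.
#[derive(Default)]
struct PeerSlot {
    generation: u64,
    requests: i64,
}

#[derive(Default)]
struct PeerTable {
    peers: HashMap<u64, PeerSlot>, // keyed by a stand-in peer id
}

impl PeerTable {
    // Returns the (peer, generation) pair a permit would carry.
    fn inc_requests(&mut self, peer: u64) -> (u64, u64) {
        let slot = self.peers.entry(peer).or_default();
        slot.requests += 1;
        (peer, slot.generation)
    }

    fn dec_requests(&mut self, peer: u64, generation: u64) {
        if let Some(slot) = self.peers.get_mut(&peer) {
            // Stale drop from a previous session: no effect.
            if slot.generation == generation {
                slot.requests = (slot.requests - 1).max(0);
            }
        }
    }

    fn reconnect(&mut self, peer: u64) {
        let slot = self.peers.entry(peer).or_default();
        slot.generation += 1; // new session invalidates outstanding permits
        slot.requests = 0;
    }
}

fn main() {
    let mut table = PeerTable::default();
    let stale = table.inc_requests(7);    // permit issued in generation 0
    table.reconnect(7);                   // peer drops and comes back
    let live = table.inc_requests(7);     // new in-flight request, generation 1
    table.dec_requests(stale.0, stale.1); // stale drop: ignored
    assert_eq!(table.peers[&7].requests, 1);
    table.dec_requests(live.0, live.1);
    assert_eq!(table.peers[&7].requests, 0);
    println!("ok");
}
```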


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR replaces the manual inc_requests/dec_requests pattern with a RequestPermit RAII guard that is returned atomically alongside the peer connection from all selection functions (get_best_peer, get_best_peer_excluding, get_random_peer, get_best_n_peers). Dropping the permit — whether by explicit drop, normal return, or task cancellation — automatically decrements the in-flight counter, eliminating the slot-leak class of bugs and making all request paths cancellation-safe.

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style/clarity items with no correctness impact.

The core permit mechanism is sound — atomic increment inside the actor handler, RAII decrement on drop, #[must_use] guard, and a max(0) clamp for stale drops after disconnect. No double-decrement paths exist. The only open items are a dead _peer_id parameter, a minor comment gap in do_get_best_n_peers, and the intentional removal of per-peer retries in update_pivot — all P2.

No files require special attention; peer_handler.rs has the minor dead-parameter issue.

Important Files Changed

Filename Overview
crates/networking/p2p/peer_table.rs Introduces RequestPermit RAII guard; atomically increments requests inside selection handlers and decrements on drop; removes public inc_requests; adds get_best_n_peers and read-only has_eligible_peer.
crates/networking/p2p/peer_handler.rs Removes make_request wrapper and all manual inc_requests/dec_requests calls; passes RequestPermit through functions; dead _peer_id parameter lingers in get_block_header public API.
crates/networking/p2p/snap/client.rs Snap workers now accept permit: RequestPermit, drop it immediately after outgoing_request returns, and proceed with pure computation; removed Clone from StorageTaskResult; cleaned up worker error paths.
crates/networking/p2p/sync/healing/state.rs Threads permit through to request_state_trienodes; removes peer_table clone; straightforward adaptation.
crates/networking/p2p/sync/healing/storage.rs Threads permit through to request_storage_trienodes; removes peer_table clone; straightforward adaptation.
crates/networking/p2p/sync/snap_sync.rs Adopts has_eligible_peer probe and consumes permit in get_block_header; removes MAX_RETRIES_PER_PEER inner loop — one attempt per peer per rotation now, which is a behaviour change from the previous 3-retry policy.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant PT as PeerTableServer
    participant W as Worker Task
    participant P as Peer (network)

    C->>PT: get_best_peer(caps)
    Note over PT: peer.requests += 1 atomically
    PT-->>C: (peer_id, connection, RequestPermit)

    C->>W: tokio::spawn(worker(connection, permit, ...))
    Note over C: permit moved into worker

    W->>P: connection.outgoing_request(...)
    P-->>W: response

    W->>W: drop(permit)
    Note over W: Drop impl fires dec_requests(peer_id)
    W->>PT: dec_requests (fire-and-forget cast)
    Note over PT: peer.requests -= 1

    W->>C: tx.send(result)

Comments Outside Diff (1)

  1. crates/networking/p2p/peer_table.rs, line 502-530 (link)

    P2 do_get_best_n_peers snapshots requests before increments

    do_get_best_n_peers takes &self and snapshots each peer's requests value at filter/sort time. handle_get_best_n_peers then immediately increments requests for every returned peer (up to n times). Because the entire handler runs under &mut self in a single actor tick there is no race condition, but the sort order used to pick the top-n candidates is based on pre-increment weights. In practice this is benign, but a brief comment that the sort is intentionally a pre-increment snapshot would prevent future confusion.



Comment on lines 586 to +591
Ok(self.peer_table.peer_count().await?)
}

/// Requests a single block header by number from an already-selected peer.
/// Consumes a `RequestPermit` reserved by the caller at peer selection
/// time; the permit drops when this function returns, releasing the slot.


P2 Unused _peer_id parameter in public API

_peer_id: H256 is accepted by get_block_header but never used inside the function — it was only consumed by the removed make_request call. All call sites still pass peer_id explicitly (e.g. in snap_sync.rs). The parameter can be dropped from the signature; callers already hold peer_id for the record_success/record_failure calls they make after the function returns.

Suggested change
    Ok(self.peer_table.peer_count().await?)
}

/// Requests a single block header by number from an already-selected peer.
/// Consumes a `RequestPermit` reserved by the caller at peer selection
/// time; the permit drops when this function returns, releasing the slot.
pub async fn get_block_header(
    &mut self,
    connection: &mut PeerConnection,
    _permit: RequestPermit,
    block_number: u64,
) -> Result<Option<BlockHeader>, PeerHandlerError> {

Comment on lines 789 to +800
"Trying to update pivot to {new_pivot_block_number} with peer {peer_id} (score: {peer_score})"
);

// Try up to MAX_RETRIES_PER_PEER times with this specific peer.
// Both Ok(None) and recoverable errors count as a failure and advance
// through retries; on exhaustion, the peer is excluded and we rotate.
let mut peer_failures: u64 = 0;
for attempt in 0..MAX_RETRIES_PER_PEER {
let outcome = peers
.get_block_header(peer_id, &mut connection, new_pivot_block_number)
.await;

match outcome {
Ok(Some(pivot)) => {
// Success — reward peer and return
peers.peer_table.record_success(peer_id)?;
#[cfg(feature = "metrics")]
ethrex_metrics::sync::METRICS_SYNC.inc_pivot_update("success");
info!("Successfully updated pivot");

{
let mut diag = diagnostics.write().await;
diag.push_pivot_change(super::PivotChangeEvent {
timestamp: current_unix_time(),
old_pivot_number: block_number,
new_pivot_number: pivot.number,
outcome: "success".to_string(),
failure_reason: None,
});
diag.pivot_block_number = Some(pivot.number);
diag.pivot_timestamp = Some(pivot.timestamp);
let pivot_age = current_unix_time().saturating_sub(pivot.timestamp);
diag.pivot_age_seconds = Some(pivot_age);
METRICS
.pivot_timestamp
.store(pivot.timestamp, std::sync::atomic::Ordering::Relaxed);
}
let block_headers = peers
.request_block_headers(block_number + 1, pivot.hash())
.await?
.ok_or(SyncError::NoBlockHeaders)?;
block_sync_state
.process_incoming_headers(block_headers.into_iter())
.await?;
*METRICS.sync_head_hash.lock().await = pivot.hash();
return Ok(pivot);
}
Ok(None) => {
peers.peer_table.record_failure(peer_id)?;
peer_failures += 1;
let peer_score = peers.peer_table.get_score(peer_id).await?;
warn!(
"update_pivot: peer {peer_id} returned None (attempt {}/{MAX_RETRIES_PER_PEER}, score: {peer_score})",
attempt + 1
);
#[cfg(feature = "metrics")]
ethrex_metrics::sync::METRICS_SYNC.inc_pivot_update("peer_none");
}
Err(e) if e.is_recoverable() => {
peers.peer_table.record_failure(peer_id)?;
peer_failures += 1;
warn!(
"update_pivot: peer {peer_id} failed with {e} (attempt {}/{MAX_RETRIES_PER_PEER})",
attempt + 1
);
#[cfg(feature = "metrics")]
ethrex_metrics::sync::METRICS_SYNC.inc_pivot_update("peer_error");
}
Err(e) => {
// Non-recoverable error (e.g., dead peer table actor,
// storage full) — surface it.
return Err(SyncError::PeerHandler(e));
// One attempt per peer per rotation. A peer that fails is excluded for
// this rotation and will be retried (with backoff) in the next one.
let outcome = peers
.get_block_header(peer_id, &mut connection, permit, new_pivot_block_number)
.await;

match outcome {
Ok(Some(pivot)) => {
peers.peer_table.record_success(peer_id)?;


P2 Retry count regression: 3 retries per peer removed

MAX_RETRIES_PER_PEER (was 3) has been removed. Now each peer in a rotation gets exactly one attempt, and a failure immediately pushes it onto excluded_peers. With MAX_ROTATIONS = 5 and exponential backoff between rotations, transient network hiccups that previously resolved within 3 tries will now consume an entire rotation and incur a full backoff delay before the same peer is tried again. If "one shot per rotation" is intentional, a short inline comment confirming this would be helpful.


Contributor Author


It's easier to implement, equally correct and simpler this way.


Labels

L1 Ethereum client

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

1 participant