prefetcher: builder-phase prefetch + streaming worker pool #2192
Conversation
Replace per-call Prefetch() invocations with a long-running PrefetchStream that spans the block lifetime. A fixed worker pool pulls transactions from a channel, eliminating pool-startup overhead across the ~500 builder-mode and ~20 idle-mode calls previously issued per block. Idle and builder phases share the same stream, acting as swappable tx providers. Phase handoff uses a two-signal interrupt: hardKill for permanent exit and evmAbort for soft, per-phase aborts that discard in-flight idle work before the builder provider takes over. Prefetch(block, ...) keeps its original signature as a thin wrapper so blockchain.go behavior stays byte-identical.
…mers, builder contribution metric
Correlate tail latency with prefetch miss rate so we can tell whether the
slowest block txs are the ones we failed to prefetch or whether prefetch
quality is not the lever.
Three additions:
* slow_tx_tracker 10-min log: each of the top-K slowest txs is now
annotated inline with MGas/s and a prefetched/not-prefetched flag.
MGas/s uses integer math on nanoseconds — per-tx durations are in
the tens of microseconds and float seconds lose precision.
* worker/txApplyDuration/{prefetched,notPrefetched}: side-by-side p50/
p95/p99 of the two populations. The aggregate worker/txApplyDuration
stays for dashboard compatibility.
* worker/prefetch/builder_added_percent: share of block txs that the
builder phase prefetched, isolating the payoff of the new builder-
phase prefetch beyond the existing overall miss-rate metric.
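The integer-math point in the first bullet can be made concrete. A minimal sketch, assuming the formula MGas/s = gas × 1000 / ns; the tracker's actual field names are not shown in this PR excerpt:

```go
package main

import "fmt"

// mgasPerSec computes MGas/s using only integer math on nanoseconds.
// Per-tx durations are in the tens of microseconds, where float seconds
// lose precision:
//   MGas/s = gas / seconds / 1e6 = gas * 1e9 / ns / 1e6 = gas * 1000 / ns.
func mgasPerSec(gasUsed uint64, durNs int64) int64 {
	if durNs <= 0 {
		return 0 // guard against zero/negative durations from clock skew
	}
	return int64(gasUsed) * 1000 / durNs
}

func main() {
	// A plain transfer (21k gas) applied in 100µs runs at 210 MGas/s.
	fmt.Println(mgasPerSec(21_000, 100_000)) // 210
}
```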
Sizes the buffer to the idle provider's theoretical per-loop burst (IdleGasBudget / 21k ≈ 4762 at default config) so the scan never blocks on a full channel. Workers drain at ~200k tx/s so the channel sits mostly empty in practice; the buffer exists to absorb the brief scan-phase burst where idle dumps a heap's worth of txs. Memory cost: 4096 * 8 B + channel header ≈ 33 KB, negligible. Handoff drain is O(actual-occupancy), not capacity, so a larger buffer doesn't slow phase transitions.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
@claude review once
Claude finished @lucca30's task in 3m 52s
Code Review: prefetcher builder-phase prefetch + streaming worker pool
Overall Assessment
This is a well-structured PR that introduces a builder-phase prefetch path alongside the existing idle-phase prefetcher. The architecture is clean: a single long-lived worker pool shared across phases, with a soft interrupt (evmAbort) separating them. No critical or high-severity issues found. One medium finding on metric attribution accuracy, and several low/nit observations below.
Medium
1. Race window in builder-phase metric attribution
…mports fixes
Review feedback: the old onSuccess closure checked genParams.builderStarted.Load() to decide whether to attribute a successful prefetch to the builder phase. That check races with buildAndCommitBlock's builderStarted.Store(true), which fires before runPrefetcher reaches the handoff — an idle-phase tx whose EVM execution finishes in the gap between those two moments would be miscounted as builder. Impact was metric-only but the old comment overstated the guarantee. Route attribution through a dedicated inBuilderPhase *atomic.Bool that the coordinator flips to true only after the handoff completes (evmAbort drain + reset). Any onSuccess firing after that point is known to come from post-handoff work, so builder_added_percent now reflects genuine builder-phase contribution. Also fix goimports formatting in core/state_prefetcher.go and miner/worker.go flagged by CI lint.
Code review
Found 1 issue. Checked for bugs and CLAUDE.md compliance.
Bug: spin loop in streamIdleBatch when the stream channel fills mid-batch. The comment at line 2412 says dropped sends (buffer full) are not retried — but the current behavior is worse than not retrying: it burns through every remaining transaction doing useless work. Suggested fix: return on the full-channel branch instead of shifting past the dropped tx and continuing to walk the heap.
Review feedback: when the stream channel fills mid-batch, the default branch in streamIdleBatch was dropping the tx but still calling txs.Shift() and continuing to walk the heap. Since dropped sends don't subtract from the gas budget or populate localPrefetched, the outer loop's viability check (nextViableIdleTx) kept returning valid txs, so every remaining entry burned Peek + Shift cycles only to drop. Return on the default branch instead. The outer runIdleTxProvider loop re-snapshots the pool every ~100ms, by which time workers have drained the channel and there's capacity for new sends. Dropping one tx per batch and re-entering the outer loop is strictly cheaper than walking the rest of the heap dropping everything.
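A minimal sketch of the fixed batch loop, assuming a non-blocking send into the stream channel. streamBatch is an illustrative stand-in for streamIdleBatch, with txs simplified to ints:

```go
package main

import "fmt"

// streamBatch pushes txs into ch with non-blocking sends. On the first full
// send it returns instead of continuing to walk the batch: dropped sends are
// never retried, so walking on would only burn cycles dropping everything.
func streamBatch(ch chan<- int, batch []int) (sent int) {
	for _, tx := range batch {
		select {
		case ch <- tx:
			sent++
		default:
			// Channel full: stop here and let the outer loop re-snapshot
			// later, once workers have drained the channel.
			return sent
		}
	}
	return sent
}

func main() {
	ch := make(chan int, 2)
	fmt.Println(streamBatch(ch, []int{1, 2, 3, 4})) // 2: stops at the first full send
}
```

Returning on the default branch is strictly cheaper than the old behavior, which kept shifting the heap and dropping every remaining entry.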
Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
… resolves
Review feedback on buildTxPlan / scanOverflow / collectPlanBatch:
* buildTxPlan decremented remaining before calling ltx.Resolve(), so when a LazyTransaction failed to resolve (tx evicted between heap listing and resolution) the budget was consumed for a tx that never entered the plan. Reorder: charge gas only after Resolve succeeds, matching scanOverflow. The prefetched-skip path keeps consuming gas intentionally — those txs are still bound for the block.
* scanOverflow and collectPlanBatch called prefetchedHashes.Load() without a nil guard, while the sibling buildTxPlan explicitly guarded. Production is safe because commitWork always initializes the map, but the API inconsistency is a real footgun for unit tests or future callers. Add the missing guards so all three functions accept a nil *sync.Map as "no hashes known, don't skip".
No functional change in the current call paths; tests unchanged.
@claude Review it again. Do a careful analysis of the block building lifecycle and of how prefetch interacts with block building, checking whether any introduced code may lead to a panic, dead code, or a performance decrease. This PR aims to widen the prefetching window by also prefetching while the block is building. Mechanisms like buildTxPlan were introduced to better predict the most probable txs to be included, so we can improve our prefetch precision.
Claude finished @lucca30's task in 9m 23s
Deep Review: Builder-Phase Prefetch + Block Building Lifecycle
Note: The three findings from the prior reviews (buildTxPlan gas-before-resolve, nil guards, idle batch spin loop) have all been properly fixed in later commits.
Block Building ↔ Prefetch Lifecycle Trace
Traced the full flow to verify correctness of the two-phase design. Lifecycle verdict: the sequencing is sound, and channel creation precedes the prefetcher goroutine launch.
Panic Analysis
Findings
1. 🔴 Worker goroutines in PrefetchStream lack their own recover.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #2192 +/- ##
===========================================
+ Coverage 51.93% 52.10% +0.17%
===========================================
Files 884 884
Lines 155494 155891 +397
===========================================
+ Hits 80754 81227 +473
+ Misses 69522 69456 -66
+ Partials 5218 5208 -10
... and 22 files with indirect coverage changes
Review feedback: worker goroutines in PrefetchStream ran without their own recover. Go's recover only catches panics in its own goroutine — the parent runPrefetcher's recover does not protect the children. Any panic in statedb.Copy / ApplyMessage / IntermediateRoot / precompile edge cases would therefore kill the node even though prefetching runs on a throwaway state copy and is best-effort by definition. Add defer/recover per worker (new chain/prefetch/worker/panic meter) and match it on the sendPlan goroutine in fillTransactions (reuses the existing worker/prefetch/panic meter). Also documents the residual attribution edge where a worker stuck in IntermediateRoot (not interruptible by evmAbort) can still reach onSuccess after inBuilderPhase flips. Metric-only impact, bounded by at most one tx per block; added a comment instead of a runtime fix since the window is tiny in practice and closing it would require a more invasive barrier.
The block building lifecycle looks correct: channel setup happens before the prefetcher goroutine starts, provider teardown before channel close prevents sends-after-close, and the defer on shutdown() covers all exit paths. No panics, dead code, or correctness regressions found. The two new nit-level issues found by this review (gas limit overestimation in the second sendPlan call, and a metrics overcount for interrupted txs) are attached as inline comments — they affect only prefetch efficiency and metric accuracy, not block building correctness. The spin loop and nil-guard issues from prior reviews remain unaddressed and worth fixing before merge.
Extended reasoning...
Overview
This PR introduces a builder-phase prefetch path over ~600 lines, centered on miner/worker.go and core/state_prefetcher.go, plus tests. The review focused on the block building lifecycle, potential panics, dead code, and performance regressions as requested.
Block Building Lifecycle
The sequencing is correct: buildAndCommitBlock creates builderPlanCh and builderGasFreedCh before calling fillTransactions, ensuring the prefetcher goroutine always finds a valid channel. genParams.planWg.Wait() before channel close prevents any send-to-closed-channel race from sendPlan goroutines. The deferred shutdown() covers all exit paths from runPrefetcher. No deadlock or goroutine leak paths were found.
New Issues Found (Nit-Level)
Two new nit-level bugs are attached as inline comments: (1) the second sendPlan call uses the full block gas limit rather than the remaining gas after priority txs, causing buildTxPlan to over-plan normal-tier txs that will be rejected — wasting prefetch worker capacity; (2) txIndex is incremented in processTx before the interrupt flag is checked, so early-abort txs inflate blockPrefetchTxsValidMeter. Both are metrics/efficiency issues only; block building correctness is unaffected.
Security Risks
No security risks. The prefetch path operates on throwaway state copies and cannot affect committed block state. All channels are properly bounded (4096 buffer) with non-blocking sends, preventing DoS from a stuck consumer.
Level of Scrutiny
This is a production-critical path in the Bor block builder. The streaming worker pool, two-signal interrupt scheme, and idle→builder handoff are novel concurrency patterns that warrant human review before merge, especially given several issues identified across multiple review passes that remain unaddressed (spin loop in streamIdleBatch, nil guards in scanOverflow/collectPlanBatch, gas ordering in buildTxPlan).
…cution
Round 4 of review fixes — four independent issues:
* sendPlan gasLimit stale for the second call. The closure captured env.header.GasLimit once, but the second invocation (normal-tier txs) runs after commitTransactions has already consumed gas from env.gasPool. Plan was over-sized, wasting prefetch capacity on txs that the builder would reject. Thread gasLimit as an explicit argument and pass env.gasPool.Gas() on the second call (with a nil guard for the first, where env.gasPool is still nil).
* prefetchOneTx early-interrupt return skipped fails.Add(1). txIndex is incremented unconditionally in processTx, so every interrupt-aborted tx was counted as a successful prefetch in blockPrefetchTxsValidMeter. The function docstring already promises fails is bumped on every (0,false) return; the interrupt path was the one branch that didn't.
* runPrefetcher shutdown (evmAbort + close(txsCh) + <-streamDone) was sequential, so a panic in runIdleTxProvider or runBuilderTxProvider unwound past it. The PrefetchStream goroutine and its N workers would then block forever on `range txsCh` (hardKill is only checked after a dequeue). Wrap the close in a sync.Once-guarded closure and defer it so the channel is released on every exit path, panics included.
* Builder phase could double-execute an in-flight plan tx. scanOverflow checked prefetchedHashes, which is only written by onSuccess after EVM completion — a tx forwarded in one batch but still mid-EVM when the next batch's overflow scan ran could be re-emitted and picked up by a second worker. Track locally-sent hashes in runBuilderTxProvider (single-threaded, plain map is fine) and have scanOverflow consult it alongside prefetchedHashes.
- scanOverflow: break instead of Pop on gas-too-large so accumulated
extendedBudget can cover price-leading accounts on later iterations;
popping permanently evicted them from the long-lived overflow heap.
- runBuilderTxProvider: populate sentThisPhase with the current plan
batch before scanOverflow so bonus collection can't re-emit a tx
already returned by collectPlanBatch in the same window.
- PrefetchStream worker recover: increment ctx.fails so the
{valid,invalid} meters stay consistent with txIndex when a panic
aborts prefetchOneTx mid-flight.
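The sync.Once-guarded shutdown mentioned above can be sketched in isolation. This is a minimal illustration with a simplified runPrefetcher; the inner recover exists only to contain the demo panic, and the names are taken from the commit description.

```go
package main

import (
	"fmt"
	"sync"
)

// runPrefetcher demonstrates the fix: deferring a once-wrapped close
// guarantees the stream channel is released on every exit path, including a
// panic unwinding out of a provider.
func runPrefetcher(txsCh chan int, provider func()) {
	var once sync.Once
	shutdown := func() { once.Do(func() { close(txsCh) }) }
	defer shutdown() // fires even if provider panics

	defer func() { recover() }() // contain the provider panic for this sketch
	provider()
	shutdown() // normal path; the deferred call becomes a no-op via Once
}

func main() {
	ch := make(chan int)
	runPrefetcher(ch, func() { panic("provider blew up") })
	_, open := <-ch
	fmt.Println(open) // false: the channel was closed despite the panic
}
```

Without the deferred once-guarded close, workers blocked on `range txsCh` would never observe the close and would leak for the life of the process.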
Covers the helpers that prior review rounds kept surfacing bugs in:
- scanOverflow: zero budget, budget growth preserves popped accounts (regression test for the h.Pop() → break fix), in-flight skip without budget cost, prefetched skip without budget cost.
- forwardTxs: sent-hash recording, dropped-on-full channel must not record, nil sentThisPhase is safe.
- collectPlanBatch: closed planCh, timer-fires-on-empty, freed-gas accumulation, prefetched filter, closed gas channel nilled out.
- runBuilderTxProvider: no-duplicate-forward invariant across interleaved planCh + freed-gas signals — regression test for the within-iteration dedup fix (sentThisPhase pre-population).
defer shutdown()

// Phase 1: idle tx provider — streams pool txs until builder flips or hardKill fires.
w.runIdleTxProvider(txsCh, header, genParams, hardKill)

// Phase 2: builder tx provider, if we actually switched modes.
if genParams.builderStarted != nil && genParams.builderStarted.Load() && !hardKill.Load() {
	// Handoff: abort in-flight idle work and drain buffered idle txs so only
	// builder txs reach the pool from here on. Then clear abort and run builder.
	// Any in-flight idle EVM execution aborts via evmAbort; workers finish their
	// current tx quickly (IntermediateRoot is the only non-interruptible work)
	// and move on. Workers that pick up a drained-but-not-gone tx see evmAbort=true
	// and skip it.
	evmAbort.Store(true)
	drainTxChan(txsCh)
	evmAbort.Store(false)
	// Flip phase attribution only after the handoff is complete. From here
	// on, every successful prefetch is genuinely builder-phase work.
	inBuilderPhase.Store(true)

	w.runBuilderTxProvider(txsCh, header, genParams, hardKill)
}
🔴 Phase 2 of runPrefetcher is gated by a one-shot genParams.builderStarted.Load() check (worker.go:2394) performed immediately after runIdleTxProvider returns. Because runIdleTxProvider's shouldExit() returns true on gas-pool exhaustion (worker.go:2451), and with the default PrefetchGasLimitPercent=100 streamIdleBatch can drain the full header gas budget in a single sub-ms iteration of non-blocking sends, the idle provider routinely exits while buildAndCommitBlock is still blocked inside prepareWork's Bor primary-producer wait (<-time.After(delay), typically 1-2s). In that ordering the Phase 2 guard reads false, runBuilderTxProvider is skipped entirely, and the subsequent plan / freed-gas sends silently drop — defeating the PR's headline builder-phase prefetch feature exactly under the busy-pool conditions it targets. Block correctness is unaffected (prefetch is best-effort), but worker/prefetch/builder_added_percent is driven to 0% on affected blocks. Fix: replace the one-shot check with a bounded block-select on builderStarted + hardKill, OR change runIdleTxProvider to stall (not exit) on gas exhaustion so the budget is a soft rate limit, not a termination signal.
Extended reasoning...
What the bug is
runPrefetcher runs two phases in sequence. Phase 1 (runIdleTxProvider, worker.go:2438) pushes speculative pool txs into txsCh. Phase 2 (runBuilderTxProvider, worker.go:2394) reads the builder's plan + freed-gas signals. Phase 2 is gated by a one-time non-blocking builderStarted.Load() check:
// worker.go:2391-2409
w.runIdleTxProvider(txsCh, header, genParams, hardKill)
if genParams.builderStarted != nil && genParams.builderStarted.Load() && !hardKill.Load() {
// handoff + runBuilderTxProvider
}

The problem is the exit conditions of Phase 1 (shouldExit, worker.go:2448-2452):
shouldExit := func() bool {
return interrupt.Load() ||
(genParams.builderStarted != nil && genParams.builderStarted.Load()) ||
totalGasPool.Gas() == 0
}

Any one of three independent conditions ends Phase 1 — including gas exhaustion with no builder-start signal at all.
The race
- runPrefetcher calls makeHeader(genParams, false) at worker.go:2329 (waitOnPrepare=false) — it starts immediately. buildAndCommitBlock calls prepareWork → makeHeader(genParams, true) → engine.Prepare. For Bor under Giugliano with successionNumber=0 (primary producer), Prepare does <-time.After(delay) at consensus/bor/bor.go:1156-1168, waiting for the block-time slot — typically 1-2 seconds. builderStarted.Store(true) only fires in buildAndCommitBlock after prepareWork returns.
- With default PrefetchGasLimitPercent=100, totalGasPool is seeded with the full header.GasLimit (~30M gas on Polygon PoS). streamIdleBatch walks the heap doing non-blocking select-sends into a 4096-buffer channel, calling totalGasPool.SubGas(ltx.Gas) each time (worker.go:2498-2513). With a populated pool of ~21k-gas txs, a single loop iteration does ~1428 sends in milliseconds and zeroes the gas pool.
- shouldExit() returns true on totalGasPool.Gas()==0; waitUntilNextLoop sees this immediately and returns; the outer for !shouldExit() loop exits. runIdleTxProvider returns while builderStarted is still false.
- The one-shot guard at worker.go:2394 reads false → Phase 2 is skipped entirely. shutdown() closes txsCh; workers drain and exit; <-streamDone releases; the prefetcher is gone.
What happens next (silent drops)
Shortly after, buildAndCommitBlock creates builderPlanCh (buf 4096) and builderGasFreedCh (buf 256), flips builderStarted=true, then runs fillTransactions. The sendPlan goroutines do non-blocking sends onto builderPlanCh; commitTransactions does non-blocking sends of freed gas onto builderGasFreedCh. No one is reading either channel. Sends accumulate into the 4096 buffer, then drop silently once full. Zero builder-phase prefetch coverage for the block.
Step-by-step proof
1. commitWork launches the runPrefetcher goroutine, then synchronously calls buildAndCommitBlock → prepareWork.
2. buildAndCommitBlock blocks in engine.Prepare at consensus/bor/bor.go:1166: <-time.After(delay) — ~1-2s for primary-producer block-time alignment.
3. runPrefetcher goroutine: makeHeader(..., false) returns immediately (no wait). runIdleTxProvider starts with totalGasPool = 30_000_000.
4. streamIdleBatch: loopGasLimit = min(30M, 30M) = 30M. Non-blocking send loop; for each successful send, totalGasPool.SubGas(21_000). After ~1428 iterations, totalGasPool.Gas() == 0. Elapsed: low ms.
5. waitUntilNextLoop first iteration: shouldExit() returns true (totalGasPool.Gas()==0); returns. The outer for !shouldExit() condition fails → runIdleTxProvider returns.
6. Still in prepareWork's time.After(delay), so builderStarted is false.
7. Line 2394: genParams.builderStarted.Load() == false → Phase 2 skipped.
8. shutdown(): evmAbort.Store(true), close(txsCh), <-streamDone. runPrefetcher returns.
9. Some time later, prepareWork returns in the main goroutine. buildAndCommitBlock creates builderPlanCh/builderGasFreedCh, sets builderStarted=true, runs fillTransactions.
10. sendPlan goroutines push to builderPlanCh — no reader. commitTransactions' freed-gas sends push to builderGasFreedCh — no reader. Buffers fill; further sends hit the non-blocking select's default arm and drop.
11. Block commits with zero builder-phase prefetch coverage. worker/prefetch/builder_added_percent = 0% for the block.
Why existing mitigations don't save it
The only thing that could save it is if workers drain txsCh so slowly during step 4 that streamIdleBatch fills the 4096 buffer and returns early before exhausting the budget, then waitUntilNextLoop sleeps 100ms checking every 10ms, during which builderStarted might flip. That is a load-dependent escape: it requires workers to drain slower than the idle send rate. On a busy pool with normal workers, the send loop outpaces worker drain comfortably, making the bug the common path, not the exception.
Impact
Block-building correctness: unaffected (prefetch is best-effort). But the PR's headline metric worker/prefetch/builder_added_percent is driven to 0% exactly in the scenarios the PR is designed for — primary producer (succession=0) post-Giugliano under a busy pool with default config. The builder-phase prefetch feature, which is the entire value proposition of this PR, is silently defeated whenever this ordering wins. The verifier observes that the devnet measurements reported in the PR description (3-5% steady-state) already reflect this: that number is the ceiling under the current design, not the floor.
How to fix
Any of:
- Phase 2 guard becomes a block-select with hardKill as escape, with a short grace deadline:
  select {
  case <-builderStartedCh: // turn builderStarted into a signalling channel, or poll with a timer
  case <-hardKillSignal:
      goto shutdown
  case <-time.After(blockBuildDeadline):
      goto shutdown
  }
- Drop gas exhaustion from shouldExit — treat the gas budget as a soft rate limit. When exhausted, stall runIdleTxProvider with a short sleep and re-check builderStarted/hardKill instead of returning.
- Shrink the default idle gas budget so exhaustion is unusual (e.g. 10-20% of header gas), matching the PR's own tight-cap experiments that show this effect inverting.
if extendedBudget > 0 {
	// Mark the plan batch as in-flight before the overflow scan so
	// scanOverflow won't re-emit the same tx within this iteration
	// (collectPlanBatch returns before forwardTxs records hashes).
	for _, tx := range batch {
		sentThisPhase[tx.Hash()] = struct{}{}
	}
	var bonus []*types.Transaction
	bonus, extendedBudget = scanOverflow(overflowHeap, extendedBudget, genParams.prefetchedTxHashes, sentThisPhase)
	batch = append(batch, bonus...)
}

block := types.NewBlock(header, &types.Body{Transactions: transactions}, nil, trie.NewStackTrie(nil))
result := prefetcher.Prefetch(block, throwaway, w.vmConfig(), true, interruptPrefetch)
forwardTxs(txsCh, batch, sentThisPhase)

// Use the actual gas used from prefetch result and mark successful transactions
if result != nil {
	totalGasPool.SubGas(result.TotalGasUsed)
	for _, txHash := range result.SuccessfulTxs {
		txsAlreadyPrefetched[txHash] = struct{}{}
		// Store in shared map for coverage metrics
		if genParams.prefetchedTxHashes != nil {
			genParams.prefetchedTxHashes.Store(txHash, struct{}{})
		}
		if builderDone || interrupt.Load() {
			return
		}
	}
🟡 In runBuilderTxProvider (miner/worker.go:2596-2608), the pre-populate of sentThisPhase at lines 2600-2602 and the remaining -= ltx.Gas; h.Shift() inside scanOverflow at lines 1948-1950 both mutate persistent state as if forwardTxs will succeed, but forwardTxs at lines 2619-2628 silently drops on a full txsCh buffer. Two consequences: (1) a dropped plan tx leaves a stale sentThisPhase entry that causes future scanOverflow iterations to h.Shift() the tx out of the overflow heap, making it permanently unreachable for overflow prefetch; (2) a dropped overflow (bonus) tx is doubly lost — shifted out of the one-shot overflowHeap, its gas subtracted from extendedBudget, yet never recorded anywhere. Impact is metric-only (reduces builder_added_percent); no block-building correctness, consensus, or security impact. Fix: only mutate sentThisPhase/extendedBudget/heap on a successful send, or use a transient per-iteration set for within-iteration dedup while refunding budget and deferring h.Shift() on drop.
Extended reasoning...
What the bug is
runBuilderTxProvider mutates three pieces of persistent state as if every forwardTxs call succeeds, but forwardTxs (worker.go:2619-2628) is non-blocking and silently drops when txsCh (buffer 4096) is full:
func forwardTxs(ch chan<- *types.Transaction, txs []*types.Transaction, sentThisPhase map[common.Hash]struct{}) {
for _, tx := range txs {
select {
case ch <- tx:
if sentThisPhase != nil {
sentThisPhase[tx.Hash()] = struct{}{}
}
default: // <-- silent drop
}
}
}

The upstream mutations that leak when a send is dropped are:
- Plan path (worker.go:2600-2602): sentThisPhase is pre-populated with every tx in batch before scanOverflow or forwardTxs runs. If a plan tx is later dropped by forwardTxs, the stale entry persists across iterations (sentThisPhase is declared outside the for loop at line 2587).
- Overflow path (worker.go:1948-1950): scanOverflow executes remaining -= ltx.Gas; bonus = append(...); h.Shift() before forwardTxs runs. Drop → tx is gone from overflowHeap, extendedBudget was decremented, tx is in no tracking set.
Why existing code does not prevent it
sentThisPhase is only supposed to track in-flight hashes (a semantic the code even states in comments at 2581-2586), but the pre-populate at 2600-2602 adds every batch member unconditionally — conflating scheduled with in-flight. On a txsCh-full drop, the stale entry becomes an incorrect cross-iteration exclusion. For the overflow path, overflowHeap is built once at line 2576 and never refilled, so h.Shift() removal is permanent.
Concrete step-by-step proof — stale sentThisPhase leak
1. Iteration N: batch=[T1, T2] from collectPlanBatch; extendedBudget > 0.
2. Pre-populate adds {T1, T2} to sentThisPhase.
3. scanOverflow returns empty bonus (nothing more fits).
4. forwardTxs([T1, T2]): T1 sent OK; channel buffer now full; T2 hits the default arm and is dropped. Only T1's hash is (re)recorded in sentThisPhase — T2 was already added by the pre-populate.
5. Workers drain the channel.
6. Iteration N+1: collectPlanBatch returns empty batch; extendedBudget still > 0 (or grows via gasFreedCh).
7. scanOverflow walks overflowHeap; T2 is at (or near) the top since it remains in the pool.
8. The sentThisPhase[T2] check at line 1932 hits → h.Shift() → T2 is removed from overflowHeap.
9. T2 is never prefetched in this block cycle: not in prefetchedHashes (never executed), gone from overflowHeap, gone from planCh (already consumed).
Concrete step-by-step proof — overflow path permanent loss
1. Iteration N: extendedBudget = 30000 from freed gas.
2. scanOverflow encounters T3 at the head of overflowHeap with Gas = 21000. remaining -= 21000 (extendedBudget now 9000); bonus = [T3]; h.Shift() removes T3 from overflowHeap permanently.
3. forwardTxs(batch ∪ [T3]): prior batch fills txsCh; T3 hits default and is dropped.
4. T3 is not in sentThisPhase (the drop path doesn't record), not in prefetchedHashes (never executed), and not in overflowHeap (shifted out). extendedBudget was reduced by 21000 with zero yield.
Impact
Block-building correctness is entirely unaffected: prefetch is explicitly best-effort throwaway state and the committed block is determined by the real commitTransactions path, not the prefetcher. There is no consensus or security impact — no mutations to persistent state, no changes to validator/sprint/snapshot logic, no panic or crash surface. The observable impact is bounded degradation of the worker/prefetch/builder_added_percent metric in txsCh-saturation windows. Given the 4096-deep buffer plus N≈4·NumCPU/5 workers draining continuously, realistic trigger conditions are narrow (sustained bursts where provider outpaces drain) and per-block impact is at most a handful of unprefetched txs.
How to fix
Option A (simpler): populate sentThisPhase only inside forwardTxs on the successful-send branch, and use a transient per-iteration set passed to scanOverflow for within-iteration dedup. This makes sentThisPhase semantically match its comment.
Option B (more invasive): on a forwardTxs drop of a bonus tx, refund the budget (extendedBudget += tx.Gas) and track unsent bonus txs separately for retry. Since h.Shift() can't be undone cheaply, practical implementations will want Option A together with a lightweight retry queue for the overflow path.
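Option A can be sketched end to end. This is an illustration under stated assumptions, not the PR's code: hashes are simplified to ints, planIteration is a hypothetical helper standing in for one runBuilderTxProvider iteration, and scanOverflow is injected as a callback.

```go
package main

import "fmt"

// planIteration builds a transient per-iteration skip set (batch ∪ already
// sent) for within-iteration dedup, then records into the persistent
// sentThisPhase only on the successful-send branch. A dropped send therefore
// leaves no stale cross-iteration exclusion.
func planIteration(ch chan<- int, batch []int, sentThisPhase map[int]struct{},
	scanOverflow func(skip map[int]struct{}) []int) {
	inFlight := make(map[int]struct{}, len(batch)) // transient, discarded after this call
	for _, tx := range batch {
		inFlight[tx] = struct{}{}
	}
	for h := range sentThisPhase {
		inFlight[h] = struct{}{}
	}
	batch = append(batch, scanOverflow(inFlight)...)
	for _, tx := range batch {
		select {
		case ch <- tx:
			sentThisPhase[tx] = struct{}{} // persistent state only on success
		default:
			// dropped: deliberately not recorded, so later iterations can retry
		}
	}
}

func main() {
	ch := make(chan int, 2) // room for two sends; the third drops
	sent := map[int]struct{}{}
	scan := func(skip map[int]struct{}) []int {
		if _, ok := skip[1]; ok { // within-iteration dedup still works
			return []int{3}
		}
		return []int{1}
	}
	planIteration(ch, []int{1, 2}, sent, scan)
	_, stale := sent[3]
	fmt.Println(len(sent), stale) // 2 false: the dropped tx left no stale entry
}
```

This keeps sentThisPhase semantically matching its comment (hashes that are actually in flight) while preserving the dedup guarantee the pre-populate was added for.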
Severity
All six verifier confirmations (three for bug_001, three for bug_002) independently converged on nit. This is a real semantic defect in the deduplication/accounting design that slightly undermines the PR's headline efficiency metric, but has no correctness, consensus, or security consequences. Prefetch operates on throwaway state and the block-building path is entirely independent.
// Report prefetch coverage percentage
if len(env.txs) > 0 && genParams != nil && genParams.prefetchedTxHashes != nil {
	prefetchedCount := 0
	builderAddedCount := 0

	// Count how many block transactions were prefetched
	for _, tx := range env.txs {
		if _, ok := genParams.prefetchedTxHashes.Load(tx.Hash()); ok {
			prefetchedCount++
		}
		if genParams.builderPrefetchedTxHashes != nil {
			if _, ok := genParams.builderPrefetchedTxHashes.Load(tx.Hash()); ok {
				builderAddedCount++
			}
		}
	}

	// Miss rate (0-100, higher = worse).
	missRate := int64((len(env.txs) - prefetchedCount) * 100 / len(env.txs))
	prefetchMissRateHistogram.Update(missRate)

	// Builder-added share (0-100): block txs the builder phase prefetched on
	// its own. Only emitted when the builder phase actually ran.
	if genParams.builderPrefetchedTxHashes != nil {
		builderAdded := int64(builderAddedCount * 100 / len(env.txs))
		prefetchBuilderAddedHistogram.Update(builderAdded)
	}
}
🟡 The deferred metrics block in commit() (worker.go:2754-2780) reads genParams.builderPrefetchedTxHashes and prefetchedTxHashes while PrefetchStream workers may still be draining in-flight txs from txsCh and writing to those maps via onSuccess. close(builderPlanCh) in buildAndCommitBlock only signals runBuilderTxProvider to return; the <-streamDone barrier is awaited in the separate prefetcher goroutine, not the main goroutine that calls commit(). sync.Map makes this safe (no data race), but the reads happen before the last onSuccess writes land, so worker/prefetch/builder_added_percent — the PR's headline feature metric — is systematically under-counted by up to ~numWorkers txs per block under load.
Extended reasoning...
What the bug is
In buildAndCommitBlock (miner/worker.go), the sequence around block commit is:
- Line 2265: genParams.planWg.Wait() — waits for sendPlan goroutines.
- Line 2268-2270: close(genParams.builderGasFreedCh).
- Line 2273-2275: close(genParams.builderPlanCh) — causes runBuilderTxProvider's collectPlanBatch to return builderDone=true, and runBuilderTxProvider returns.
- Line 2306: w.commit(work.copy(), ..., genParams) — runs in the same (main) goroutine.
Meanwhile, the prefetcher goroutine launched at line 2191 (go func() { ... w.runPrefetcher(...) }) is separate. When runBuilderTxProvider returns inside it, runPrefetcher then executes shutdown() (close txsCh) and <-streamDone. Crucially, the main buildAndCommitBlock goroutine does not wait on any of that before calling commit().
The specific race
commit()'s deferred block at worker.go:2754-2780 iterates env.txs and calls .Load() on both genParams.prefetchedTxHashes and genParams.builderPrefetchedTxHashes. At that instant, PrefetchStream workers (launched inside runPrefetcher) can still be executing any tx that was dequeued from txsCh before the channel was closed. Each such worker runs ApplyMessage + IntermediateRoot(true) (~1-10ms of wall time) and then invokes onSuccess(hash, gasUsed), which calls .Store on both maps. These .Store calls can happen after the defer's .Load calls read the maps.
Why existing code does not prevent it
The close(builderPlanCh) handshake only terminates the provider side of the pipeline — it tells runBuilderTxProvider there are no more plan txs. It does not drain or wait on the worker side. The <-streamDone barrier that does wait for all workers is sequenced inside runPrefetcher (after shutdown()), which runs in the prefetcher goroutine — not the main goroutine. sync.Map prevents a data-race crash, but it does not give the reader a consistent snapshot: any Store completing after the reader's Load is simply missed.
Impact
Metric-only. Block-building correctness and consensus are entirely unaffected (prefetch is best-effort throughout). The impact is systematic under-counting of worker/prefetch/builder_added_percent — the PR description calls this the "headline metric for judging whether the new phase is pulling its weight." Late onSuccess writes are bounded per block by the number of workers (max(1, 4*NumCPU/5), ~6 on an 8-core machine), so the bias is small but systematic. FinalizeAndAssemble runs inside commit() before the defer fires and acts as a natural buffer, which is why the devnet 3-5% number isn't zero — but under saturated pool/worker load the bias tilts further downward.
How to fix
Any of: (a) await <-streamDone from buildAndCommitBlock before calling commit() (requires surfacing the barrier via genParams); (b) snapshot the maps at close(builderPlanCh) time into a slice, then compute the metric from the snapshot; (c) move the metric computation downstream of the prefetcher's exit.
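Option (a) could look like the following sketch, assuming a hypothetical streamDone field surfaced on the generate-params struct; the names are illustrative, not the PR's code.

```go
package main

import "fmt"

// generateParams models only the field fix (a) would add; the real struct
// carries the channels and maps discussed above.
type generateParams struct {
	streamDone chan struct{} // closed by runPrefetcher after all workers exit
}

// awaitPrefetcher blocks until the PrefetchStream barrier fires, so the
// deferred metric read in commit() observes every onSuccess Store.
func awaitPrefetcher(p *generateParams) {
	if p.streamDone != nil {
		<-p.streamDone
	}
}

func main() {
	p := &generateParams{streamDone: make(chan struct{})}
	go func() { close(p.streamDone) }() // runPrefetcher's exit path
	awaitPrefetcher(p)                  // returns only after the barrier fires
	fmt.Println("metric read is now safe")
}
```

buildAndCommitBlock would call awaitPrefetcher(genParams) between close(genParams.builderPlanCh) and w.commit(...); the nil guard keeps callers that never started a stream unaffected.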
Step-by-step proof
1. runBuilderTxProvider loop reaches forwardTxs for batch [T1, T2]; both land in txsCh (buffered).
2. Workers W1 and W2 dequeue T1 and T2 respectively and begin ApplyMessage + IntermediateRoot(true).
3. buildAndCommitBlock reaches line 2274: close(builderPlanCh).
4. collectPlanBatch's next select iteration reads planCh closed → returns builderDone=true. runBuilderTxProvider returns.
5. Main goroutine continues from line 2275 into switch err → reaches line 2306: w.commit(work.copy(), ..., genParams).
6. commit() runs FinalizeAndAssemble, submits to taskCh, returns. Total wall time: tens of ms, typically longer than W1/W2's remaining EVM work — but not always, and not deterministically.
7. commit()'s defer fires. It calls .Load(T1.Hash) and .Load(T2.Hash) on builderPrefetchedTxHashes.
8. If W1 or W2 has not yet reached its onSuccess call at the moment of the .Load (e.g. still inside IntermediateRoot), that tx is not counted — even though it was genuinely prefetched during the builder phase.
9. prefetchBuilderAddedHistogram receives a value lower than the true builder contribution for this block.
@claude ultra-review this PR. Context:
Please focus only on angles the above does not already cover:
1. Concurrency correctness. Trace every interaction between
2. Resource-accounting invariants, proven not just asserted:
3. Perf regression on the hot import path. Compare the new
4. Failure modes:

Scope exclusions:
Claude finished @lucca30's task in 5m 26s

Ultra-Review: Streaming Prefetcher Concurrency & Correctness
All prior review findings (88c4e44..827c017) verified as addressed. This review covers only the four scoped areas. One actionable finding; the remainder is clean.

1. Concurrency Correctness

Traced all interactions between the providers, workers, and shutdown signals:
- Lost-wakeup during the idle → builder handoff: clean.
- Double-close of txsCh: protected by sync.Once.
- Goroutine stall ordering: clean; channel close unblocks all paths.
- Panic during handoff that skips shutdown(): covered by the sync.Once defer.
- Concurrent hardKill + evmAbort: atomic stores; workers prioritize hardKill.

No concurrency issues found.

2. Resource-Accounting Invariants

(a) txIndex == valid + invalid on every exit path
| Path | txIndex incremented? | fails incremented? | Counted? |
|---|---|---|---|
| processTx → prefetchOneTx succeeds | Yes (line 196) | No | valid |
| processTx → prefetchOneTx interrupt | Yes (line 196) | Yes (line 233) | invalid |
| processTx → prefetchOneTx error (any) | Yes (line 196) | Yes (lines 239, 247, 258) | invalid |
| processTx → worker panic in prefetchOneTx | Yes (line 196) | Yes (recover at line 137) | invalid |
| runWorker → evmAbort=true → continue | No | No | not counted (correct) |
| runWorker → hardKill=true → return | No | No | not counted (correct) |
At stream exit (lines 147-149):

```go
processed := ctx.txIndex.Load()
blockPrefetchTxsValidMeter.Mark(processed - ctx.fails.Load())
blockPrefetchTxsInvalidMeter.Mark(ctx.fails.Load())
```

valid + invalid = (processed - fails) + fails = processed = txIndex. Invariant holds on all exit paths. ✅
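The invariant can be exercised with plain atomics. A hypothetical miniature of the two counters (not the real stream context) shows why valid + invalid always equals processed when both are bumped on the same path:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// streamCtx mirrors just the two counters the accounting table tracks; the
// real stream context has more fields — this is a minimal model.
type streamCtx struct {
	txIndex atomic.Int64 // incremented once per tx handed to the worker
	fails   atomic.Int64 // incremented on interrupt/error/panic paths
}

func (c *streamCtx) processTx(failed bool) {
	c.txIndex.Add(1)
	if failed {
		c.fails.Add(1)
	}
}

func main() {
	var ctx streamCtx
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ctx.processTx(i%7 == 0) // some paths fail
		}(i)
	}
	wg.Wait()

	processed := ctx.txIndex.Load()
	valid := processed - ctx.fails.Load()
	invalid := ctx.fails.Load()
	fmt.Println(valid+invalid == processed, processed == 1000)
}
```

Because fails is only ever incremented on a path that already incremented txIndex, the identity holds regardless of interleaving.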
(b) No tx hash reaches txsCh twice in a single runBuilderTxProvider invocation
Three dedup layers:
1. collectPlanBatch (lines 2665-2666): checks prefetchedHashes.Load(tx.Hash()) — skips txs already completed by idle/prior builder work.
2. sentThisPhase pre-populated (lines 2600-2602) with current plan-batch hashes before scanOverflow runs — prevents within-iteration plan-vs-overflow duplicates (fixed in 62a72c0).
3. forwardTxs (line 2624) records each successfully-sent tx in sentThisPhase — prevents cross-iteration re-emission.
The one gap: forwardTxs drops on a full txsCh buffer (default branch, line 2626-2627), but the pre-populate at lines 2600-2602 already added the hash to sentThisPhase unconditionally. A dropped plan tx still has a sentThisPhase entry, so it won't be re-emitted by scanOverflow in future iterations. However, a dropped bonus tx from scanOverflow was h.Shift()-ed out of the overflow heap (line 1950) and its hash was not pre-populated in sentThisPhase (only plan txs are pre-populated). If it was added to sentThisPhase by forwardTxs on the successful-send branch, it's tracked. If it was dropped, it's gone from the heap and not in sentThisPhase — it can't reappear. So no hash can reach txsCh twice. ✅
(c) builderPrefetchedTxHashes is only written while inBuilderPhase==true
onSuccess at line 2363: if inBuilderPhase.Load() && .... inBuilderPhase is set to true at line 2406, strictly after evmAbort.Store(false) at line 2403. The residual edge case (worker completing IntermediateRoot during the handoff, acknowledged in comments at lines 2351-2356) could misattribute at most 1 tx per block. Documented and acceptable. ✅
3. Perf Regression on Hot Import Path
The import path (blockchain.go:782) calls Prefetch(block, throwaway, vmCfg, false, followupInterrupt), which now wraps PrefetchStream via a pre-filled channel.
Strictly new work per tx on the import path vs the old errgroup code:
- 1 channel send (non-blocking, buffer == len(txs)) + 1 channel receive per tx
- 1 txIndex.Add(1) atomic increment (old code used loop variable i directly)
- 2 nil-checks (evmAbort, onSuccess) that short-circuit immediately since both are nil
Total overhead: ~2-5 nanoseconds per tx. Negligible against statedb.Copy() + ApplyMessage + IntermediateRoot (~1-10ms each).
Structural improvement: Old code called statedb.Copy() in the dispatching goroutine before spawning work. New code calls statedb.Copy() inside the worker goroutine (prefetchOneTx line 244), allowing copies to happen concurrently — slightly lower latency under contention.
Worker pool size: Identical — max(1, 4*runtime.NumCPU()/5) in both old and new.
No performance regression on the import path. ✅
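The wrapper shape the import-path analysis relies on can be sketched as follows, with an illustrative work callback standing in for the real stream (names are assumptions, not the PR's API):

```go
package main

import "fmt"

// prefetchOnce shows the one-shot wrapper pattern: pre-fill a buffered
// channel sized to len(txs), close it, then hand it to the streaming pool.
// With buffer == len(txs) every send completes without blocking, which is
// the "1 channel send + 1 channel receive per tx" overhead counted above.
func prefetchOnce(txs []string, stream func(<-chan string)) {
	ch := make(chan string, len(txs)) // buffer == len(txs): sends never block
	for _, tx := range txs {
		ch <- tx
	}
	close(ch) // workers drain and exit; no evmAbort/onSuccess on this path
	stream(ch)
}

func main() {
	seen := 0
	prefetchOnce([]string{"t1", "t2", "t3"}, func(ch <-chan string) {
		for range ch {
			seen++ // stand-in for ApplyMessage + IntermediateRoot
		}
	})
	fmt.Println(seen)
}
```

Closing the pre-filled channel is what lets the stream terminate on its own: range loops exit once the buffer drains, so the wrapper needs no extra shutdown signal.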
4. Failure Modes
builderPlanCh closed before runBuilderTxProvider starts
builderPlanCh is created at line 2223 and closed at line 2274, both in buildAndCommitBlock (main goroutine). runBuilderTxProvider runs in the prefetcher goroutine. The ordering: builderStarted.Store(true) (line 2225) → runIdleTxProvider exits (sees builderStarted=true) → handoff → runBuilderTxProvider starts.
Can close(builderPlanCh) at line 2274 fire before runBuilderTxProvider starts? Yes — if fillTransactions + planWg.Wait() completes before the prefetcher goroutine reaches line 2408. In that case, collectPlanBatch reads from a closed planCh and gets builderDone=true immediately (line 2660). runBuilderTxProvider returns immediately. Any plan txs were buffered in the 4096-deep channel and consumed; any freed-gas signals were buffered in the 256-deep builderGasFreedCh. These are lost. This is acceptable: the entire block was already built; prefetching for an already-committed block is pointless. ✅
Empty txPool at buildOverflowHeap time
buildOverflowHeap (line 2635-2641) calls Pending() which returns an empty map. newTransactionsByPriceAndNonce with empty txs returns an empty heap. scanOverflow immediately breaks at line 2917 (ltx == nil). No bonus txs emitted. No crash, no wasted work. ✅
Stale pool snapshot (overflow heap built at T, consumed at T+100ms after pool churned)
buildOverflowHeap is called once at the start of runBuilderTxProvider (line 2576). The overflow heap is then scanned repeatedly over the 2ms batch windows. Pool churn means:
- Txs in the overflow heap may have been evicted from the pool → ltx.Resolve() returns nil → h.Pop() at line 1944, skipped gracefully.
- New txs arriving after the snapshot are invisible to the overflow scan → missed opportunity, but prefetch is best-effort. The plan channel covers the primary path (builder's own view).
No crash, no incorrect behavior. Stale snapshot is a deliberate design tradeoff acknowledged in the PR. ✅
Concurrent hardKill + evmAbort set from different goroutines mid-phase
Already covered in Analysis 1. Both are atomic.Bool; no ordering dependency. Workers prioritize hardKill. The shutdown() function sets evmAbort=true then close(txsCh), both inside sync.Once. If hardKill=true races with shutdown(), workers exit on whichever signal they see first. ✅
🟡 One Remaining Unfixed Finding
prepareWork early-return bypasses defer interruptPrefetch.Store(true) — zombie prefetcher on prepareWork failure
File: miner/worker.go lines 2209-2228
```go
func (w *worker) buildAndCommitBlock(..., interruptPrefetch *atomic.Bool) {
	work, err := w.prepareWork(genParams, w.makeWitness)
	if err != nil {
		return // ← line 2212: exits BEFORE the defer at line 2228 is registered
	}
	...
	defer interruptPrefetch.Store(true) // ← line 2228: too late
```

The prefetcher goroutine (spawned unconditionally at line 2191) is never signaled to stop. It runs runIdleTxProvider, which loops until gas exhaustion (totalGasPool.Gas() == 0), burning through the full header gas budget (~100M gas at default PrefetchGasLimitPercent=100) across 4*NumCPU/5 workers doing throwaway EVM executions.
This was flagged in the previous review round but not fixed.
Why it's bounded, not a leak: runPrefetcher's deferred shutdown() (via sync.Once) ensures txsCh is closed once runIdleTxProvider returns, so the PrefetchStream goroutine and workers self-terminate. No goroutine leak — just wasted CPU.
Production trigger: prepareWork → engine.Prepare can fail via UnauthorizedSignerError, Heimdall snapshot/span lookup failures, or makeEnv → stateless.NewWitness() errors. On a Giugliano primary producer, Prepare includes the succession-delay time.After — but the runPrefetcher call to makeHeader(..., false) skips this wait and starts streaming immediately, so the zombie is active for the full duration of the wasted work.
Impact: ~100M gas of throwaway EVM work across ~6 workers per prepareWork failure. No correctness/consensus impact, but real CPU waste on an uncommon-but-realistic error path. If prepareWork fails repeatedly (e.g., prolonged Heimdall issues), zombie prefetchers from successive commitWork calls compete for CPU.
Fix: Move the defer before the err check:
```go
func (w *worker) buildAndCommitBlock(..., interruptPrefetch *atomic.Bool) {
	defer interruptPrefetch.Store(true) // ← moved up: covers ALL exit paths
	work, err := w.prepareWork(genParams, w.makeWitness)
	if err != nil {
		return
	}
```

Summary
| Area | Verdict |
|---|---|
| Lost-wakeup in handoff | Clean |
| Double-close of txsCh | Protected by sync.Once |
| Goroutine stall ordering | Clean — channel close unblocks all paths |
| Panic during handoff | sync.Once defer covers it; PrefetchStream goroutine self-terminates |
| txIndex == valid + invalid | Holds on all 6 exit paths (verified per-path) |
| No-dup in runBuilderTxProvider | Three dedup layers cover all reachable cases |
| builderPrefetchedTxHashes attribution | Correctly gated; 1-tx residual acknowledged |
| Import path perf | ~2-5ns overhead per tx; statedb.Copy() moved to worker (slight improvement) |
| builderPlanCh closed early | Graceful — collectPlanBatch sees closed channel, exits |
| Empty pool | Graceful — empty heap, no work |
| Stale snapshot | By design — Resolve() nil-check handles evicted txs |
| Concurrent kill+abort | Atomic stores, no ordering dependency, workers prioritize hardKill |
| prepareWork early-return | Unfixed — zombie prefetcher burns CPU until gas exhaustion |
The concurrency design is sound. The sync.Once shutdown, two-signal interrupt model, and channel lifecycle are correctly composed. The one actionable item is the defer placement in buildAndCommitBlock — a one-line move.
Branch: lmartins/lmartins/builder-prefetch-mode-switch
Move `defer interruptPrefetch.Store(true)` to the top of buildAndCommitBlock so the prefetcher goroutine is signaled to stop on every return path — including the early return when prepareWork fails (UnauthorizedSignerError, Heimdall snapshot/span failures, stateless.NewWitness errors). Previously the defer sat after the err check, so on prepareWork failure runIdleTxProvider would loop until gas exhaustion, burning ~100M gas of throwaway EVM work across the worker pool before self-terminating. No correctness/consensus impact; fixes a real CPU-waste path on an uncommon-but-realistic error branch. Bounded by PrefetchStream's sync.Once shutdown (no goroutine leak), but the wasted work was material under prolonged Heimdall issues when consecutive commitWork calls could pile up zombie prefetchers.
Pass sentThisPhase into collectPlanBatch and skip txs already present, mirroring the existing scanOverflow dedup pattern. Closes the third edge of the builder-phase dedup matrix: 1. within-iter plan↔overflow — covered by pre-populate sentThisPhase before scanOverflow (commit 62a72c0) 2. plan→overflow cross-iter — covered by scanOverflow's sentThisPhase read (commit 88c4e44) 3. overflow→plan cross-iter — this commit Scenario: scanOverflow emits tx T in iteration N (h.Shift() past T, sentThisPhase[T]=true). Worker W1 begins multi-ms EVM on T. In iteration N+1, collectPlanBatch reads a buffered copy of T from planCh; prefetchedHashes is still empty because onSuccess hasn't fired, so T slips through and gets forwarded a second time. Impact is wasted worker capacity only (throwaway state, no consensus/correctness effect), but the race opens wider exactly on contract-heavy workloads where builder-phase prefetch matters most. Kurtosis run #2 scenario E didn't catch this because value-transfer EVM finishes in ~10µs, well inside the 2ms batch window — so prefetchedHashes raced fast enough to plug the gap. Added TestCollectPlanBatch_SkipsInflight as the regression test.
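The shape of this fix can be sketched with plain maps — an illustrative collector, not the PR's collectPlanBatch — showing why the in-flight check must look at sentThisPhase and not only at completed work:

```go
package main

import "fmt"

// collectBatch sketches the fixed collector: a plan tx is skipped if it was
// already forwarded this phase (in-flight, onSuccess not yet fired) even
// though the completed-work map does not contain it yet.
func collectBatch(plan []string, prefetched, sentThisPhase map[string]bool) []string {
	var batch []string
	for _, h := range plan {
		if prefetched[h] || sentThisPhase[h] {
			continue // completed earlier, or in-flight from scanOverflow
		}
		batch = append(batch, h)
	}
	return batch
}

func main() {
	sent := map[string]bool{"T": true} // scanOverflow emitted T in iteration N
	prefetched := map[string]bool{}    // T's onSuccess hasn't fired yet
	batch := collectBatch([]string{"T", "U"}, prefetched, sent)
	fmt.Println(batch) // T must not be forwarded a second time
}
```

Checking only the completed-work map reproduces the bug: during the multi-ms EVM window T is in neither map's "done" state, so without the sentThisPhase read it would be forwarded twice.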
Purpose
Today's prefetcher only runs in a speculative idle phase: before block-building starts, it scans the tx pool and warms state for what it guesses the block might contain. Whatever it misses — late-arriving p2p txs, txs unlocked by freed gas, anything the pool view didn't reflect at guess-time — pays the full cache-miss cost at commit time. On contract-heavy blocks this is the tail of the miss-rate distribution.
This PR adds three more prefetch phases, synchronized to the live block build, so the prefetcher stops guessing and starts following what the builder is actually about to do. All three target exactly the gap the idle phase leaves: near-certain txs the builder will commit, warmed with near-zero speculation.
Prefetch lifecycle (after this PR)
One long-lived worker pool per block; four sequential tx providers feed it. The pool is never torn down between phases — the mode switch is a provider swap, not a prefetcher restart.
1. Idle: runs while buildAndCommitBlock is still assembling its environment, scanning the pool speculatively.
2. Upfront plan: ahead of the commitTransactions pass (priority + normal), sendPlan clones the price-nonce heap; buildTxPlan walks it with remainingGas() as budget and emits every non-prefetched tx that fits.
3. Per-tx: inside commitTransactions, right before each apply, the tx is sent to builderPlanCh as the builder reaches it.
4. Freed-gas overflow: each commit's surplus (ltx.Gas − actualUsed) is delivered via builderGasFreedCh; the overflow heap is scanned when enough budget accumulates, picking up txs earlier passed over via Pop() as "too large".

All three builder phases dedup against prefetchedTxHashes and a sentThisPhase local set, so no tx is re-executed. The three-edge dedup matrix (plan↔overflow within iter, plan→overflow cross-iter, overflow→plan cross-iter) is fully closed.

Handoff sequence (idle → builder)
Coordinated by a two-signal interrupt to avoid pool teardown:
1. Builder signals builderStarted → coordinator sets evmAbort.Store(true). In-flight idle EVM execution aborts via the EVM interrupt; workers entering the loop see the flag and skip.
2. Coordinator drains any queued idle txs from txsCh non-blockingly.
3. Coordinator sets evmAbort.Store(false). Workers resume, now fed by the builder provider (upfront-plan + per-tx + freed-gas overflow).

No duplicate prefetches, no lost builder txs, no worker pool churn.
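The drain step can be sketched with a select/default loop — an illustrative helper, assuming stale idle txs are simply discarded — whose cost is proportional to how many txs are actually buffered, never to the channel's capacity:

```go
package main

import "fmt"

// drainNonBlocking pulls whatever idle txs are currently buffered and stops
// the moment the channel is empty: O(occupancy), not O(capacity).
func drainNonBlocking(ch chan string) (discarded int) {
	for {
		select {
		case _, ok := <-ch:
			if !ok {
				return // channel closed: nothing left to drain
			}
			discarded++ // stale idle tx, dropped before builder takes over
		default:
			return // buffer empty: handoff proceeds immediately
		}
	}
}

func main() {
	ch := make(chan string, 4096) // capacity doesn't matter, occupancy does
	ch <- "idle-tx-1"
	ch <- "idle-tx-2"
	fmt.Println(drainNonBlocking(ch))
}
```

This is why growing the buffer doesn't slow phase transitions: the default branch fires as soon as the buffer is empty, regardless of how large it could have been.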
Precision impact
New headline metric: worker/prefetch/builder_added_percent — fraction of a block's txs the builder phase (upfront plan + per-tx + overflow combined) warmed on its own, beyond what idle had already done. Attributes each prefetch completion to the phase it fired under, so operators can see at a glance how much of a block's cache warming idle alone could not have reached.

Also split: worker/txApplyDuration/{prefetched,notPrefetched} — quantifies the cache-miss penalty this lifecycle is closing.

Together these two metrics answer: how many txs needed builder-phase help, and how much apply-time that help saved.
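A plausible shape for the headline computation, assuming a set of builder-attributed hashes is kept per block (illustrative maps and percent convention, not the PR's code):

```go
package main

import "fmt"

// builderAddedPercent: of the txs that ended up in the block, what share
// completed prefetch during the builder phase. Integer percent to match
// the metric name's _percent suffix.
func builderAddedPercent(blockTxs []string, builderPrefetched map[string]bool) int {
	if len(blockTxs) == 0 {
		return 0 // empty block contributes nothing
	}
	added := 0
	for _, h := range blockTxs {
		if builderPrefetched[h] {
			added++
		}
	}
	return added * 100 / len(blockTxs)
}

func main() {
	block := []string{"a", "b", "c", "d", "e"}
	builder := map[string]bool{"b": true, "d": true} // warmed during build
	fmt.Println(builderAddedPercent(block, builder)) // 2 of 5 → 40
}
```

The denominator is the block's committed txs, not the number of prefetch attempts, which is what isolates the builder phase's contribution from the idle phase's.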
Implementation notes
- Workers range over txsCh (buffer 4096, ≈33 KB); phase switches change the producer only, not the pool.
- Prefetch(block, ...) keeps its original signature as a thin wrapper around PrefetchStream. Same topology, same parallelism, same PrefetchResult shape.
- Each worker executes on its own statedb.Copy() — no shared state, no consensus-path coupling.

Validation
- go build ./... + golangci-lint run clean.
- go test -race ./core/... ./miner/... — 121s, all prefetch + slow-tx tests pass. Includes 13 new unit tests on the pure primitives (scanOverflow, forwardTxs, collectPlanBatch, buildTxPlan, no-duplicate-forward invariant across 10k iterations) and 3 stream-lifecycle integration tests.
- diffguard --base origin/develop — no new complexity violations.
- processed == fails + successful in every PrefetchStream exit (2180+ closures).
- chain_prefetch_worker_panic meter = 1050, harness log count = 1050 (exact match). Node kept producing blocks throughout.
- prepareWork early-return bypassing defer interruptPrefetch.Store(true) — fixed.