
fix(op-acceptance-tests): stabilize TestUnsafeGapFillAfterUnsafeReorg_RestartL2CL#20199

Merged
wwared merged 2 commits into develop from
aj/fix/unsafe-gap-fill-reorg-flake
Apr 22, 2026


@ajsutton ajsutton commented Apr 21, 2026

Claude: Opened by Claude on behalf of @ajsutton. Closes #19936.

Fixes two independent races in TestUnsafeGapFillAfterUnsafeReorg_RestartL2CL. First, ReorgTriggered returns as soon as the reorg block is rewritten and does not wait for the sequencer to rebuild past the verifier's frozen unsafe head, so the immediately-following seqUnsafe > verUnsafe assertion could sample the sequencer mid-rebuild and fail (matching the CI evidence of seq:18 ver:18). Fixed by capturing the verifier's unsafe head right after L2CLB.Stop() and waiting on Reached(eth.Unsafe, verUnsafeFrozen.Number+1, 30) before the assertion, making the ordering structurally guaranteed.

Second, the two pre-reorg Matched calls used a 5-attempt (~10s) budget that assumed the verifier was in lockstep with the sequencer. Under cold-start contention (e.g. a slow op-reth cold start) the sequencer can burst-produce 100+ blocks before the verifier begins syncing, and 10s isn't enough to close the gap. Bumped both budgets to 30 attempts (60s), consistent with the existing post-stop budget of 50 attempts.

@ajsutton ajsutton force-pushed the aj/fix/unsafe-gap-fill-reorg-flake branch 2 times, most recently from 637551e to 4a8ada2 Compare April 21, 2026 02:57
@ajsutton ajsutton marked this pull request as ready for review April 21, 2026 03:02
@ajsutton ajsutton requested a review from a team as a code owner April 21, 2026 03:02
…_RestartL2CL

ReorgTriggered only waits until the block at the reorg-target height has
been rewritten on the sequencer; it does not wait for the sequencer to
rebuild past the verifier's frozen unsafe head. The test then asserts
seq.Number > ver.Number, which races against the sequencer producing the
next block after the reorg. In failing CI (op-geth, pipeline 123064, job
4866963) both heads were observed at 18 because the sequencer had
reorged and rebuilt exactly up to the verifier's frozen height but had
not yet extended past it.

Capture the verifier's unsafe head immediately after stopping L2CLB (at
which point the verifier cannot advance — its CL is down and the preset
has NoDiscovery), then after ReorgTriggered returns wait for the
sequencer to reach verFrozen.Number + 1 before sampling the heads for
the assertion. This makes the seq > ver ordering structurally guaranteed
rather than racing with post-reorg block production.

Refs: #19936
…afeGapFillAfterUnsafeReorg_RestartL2CL

The two pre-reorg Matched calls used only 5 attempts (~10s). This assumes
the verifier is already in lockstep with the sequencer, which breaks when
the verifier starts late relative to the sequencer and has to catch up
tens of blocks — e.g. slow op-reth cold start or a gossipsub mesh that
takes time to form under CI contention. When this happens the sequencer
produces a burst of blocks to catch up to wall-clock genesis time, leaving
the verifier well behind. 10s is not enough to close that gap via gossip.

Bump both Matched budgets to 30 attempts (60s) — consistent with the
existing post-stop gap-close budget of 50 attempts — so the verifier can
finish catching up before the test perturbs the system. The happy path
still returns on first success, so fast runs are unaffected.
@ajsutton ajsutton force-pushed the aj/fix/unsafe-gap-fill-reorg-flake branch from 4a8ada2 to 3d8c640 Compare April 21, 2026 22:38
@wwared wwared added this pull request to the merge queue Apr 22, 2026
Merged via the queue into develop with commit 11b9948 Apr 22, 2026
67 checks passed
@wwared wwared deleted the aj/fix/unsafe-gap-fill-reorg-flake branch April 22, 2026 12:21

Development

Successfully merging this pull request may close these issues.

flaky test: TestUnsafeGapFillAfterUnsafeReorg_RestartL2CL

2 participants