fix(op-node): probe engine finalized head on EL-sync startup#20139
Draft
sebastianst wants to merge 1 commit intodevelopfrom
Draft
fix(op-node): probe engine finalized head on EL-sync startup#20139sebastianst wants to merge 1 commit intodevelopfrom
sebastianst wants to merge 1 commit intodevelopfrom
Conversation
When op-node is configured with --syncmode=execution-layer, NewEngineController unconditionally initializes syncStatus to syncStatusWillStartEL. After a restart against an already-synced engine, op-node stalls there indefinitely: SyncStep backs off on isEngineInitialELSyncing(), and the only place that transitions out of syncStatusWillStartEL is insertUnsafePayload — which requires a fresh unsafe payload that may not arrive (the sequencer gossips blocks the engine already has). Add MaybeSkipELSyncIfEngineAlreadySynced: a startup guard that queries the engine's finalized head while in syncStatusWillStartEL. If the engine is already synced (non-genesis finalized head, and SupportsPostFinalizationELSync is not set), transition directly to syncStatusFinishedEL and emit ResetEngineRequestEvent so op-node's in-memory heads are populated via FindL2Heads. The emit happens after the mutex is released to avoid re-entering OnEvent under the lock. Call it at the top of SyncStep before the EL-sync backoff — it's a no-op once syncStatus has transitioned, so it runs at most a few times on startup. The finalized-head check that drives both startup probing and the existing transition inside insertUnsafePayload is factored out into checkEngineAlreadySynced. Fixes #18468 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e451773 to
90669a8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the stall described in #18468: when op-node is configured with
--syncmode=execution-layer,NewEngineControllerunconditionally initializessyncStatus = syncStatusWillStartEL. After a restart against an already-synced engine, op-node sits in that state forever:SyncStepbacks off onisEngineInitialELSyncing(), and the only place that transitions out ofsyncStatusWillStartELisinsertUnsafePayload— which requires a fresh unsafe payload to arrive. If the sequencer is only gossipping blocks the engine already has, op-node and reth deadlock (reth logsno consensus updates received for a while; op-node logsunsafe=0 safe=0 el_syncing=trueindefinitely).Approach
MaybeSkipELSyncIfEngineAlreadySyncedonEngineController: whilesyncStatus == syncStatusWillStartEL, probe the engine's finalized head. If it's a non-genesis block (andSupportsPostFinalizationELSyncis not set), transition directly tosyncStatusFinishedELand emitResetEngineRequestEventso op-node's in-memory heads are populated viaFindL2Heads. The emit happens after the mutex is released, becauseResetEngineRequestEvent's handler re-acquires the same lock viaOnEvent.SyncStepbefore the EL-sync backoff. It's a no-op oncesyncStatushas transitioned, so it runs at most a few times on startup.insertUnsafePayload'ssyncStatusWillStartELbranch) intocheckEngineAlreadySynced, used by both call sites.Scope notes
CLSyncmode,syncStatusnever transitions into any EL-syncing state, soisEngineInitialELSyncing()is always false there and the FCU-reduction changes don't alter the gossip→FCU path. The real bug is the startup-stall in ELSync mode, which this PR addresses.CLSyncmode. op-node has no direct signal for that today. A follow-up would need op-node to inspect engine FCU responses (SYNCING vs VALID) to track engine-initiated sync state.Test plan
go build ./op-node/...— cleango vet ./op-node/...— cleango test ./op-node/rollup/engine/...— pass (newTestMaybeSkipELSyncIfEngineAlreadySyncedwith four sub-tests)go test ./op-node/rollup/{driver,derive,attributes,sequencing,finality}/...— passop-e2e/actions/sync/...— requires forge-artifacts build; deferred to CIFixes #18468
🤖 Generated with Claude Code