Closes #3098.

When a block producer sends preconfirmation updates, sentry nodes optimistically treat the included transactions as committed, removing them from the mempool and marking their inputs as spent. If the producer crashes and re-produces a block at the same height without those transactions, the mempool is left in a stale state: inputs stay marked as spent and outputs linger in `extracted_outputs`, preventing re-submission of rolled-back transactions and causing dependents to reference non-existent UTXOs.

This PR makes preconfirmed transactions tentative until the canonical block at their height is imported. On import, preconfirmed txs present in the block are confirmed and their tracking is cleared; those absent are rolled back by restoring inputs, purging dependents, and emitting `SqueezedOut` notifications.

It also adds integration tests: re-insertion after rollback, dependent eviction, normal confirmation, and stale-height cleanup.

- [x] Breaking changes are clearly marked as such in the PR description and changelog
- [x] New behavior is reflected in tests
- [x] [The specification](https://github.com/FuelLabs/fuel-specs/) matches the implemented behavior (link update PR if changes are needed)
- [ ] I have reviewed the code myself
- [x] I have created follow-up issues caused by this PR and linked them here
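The confirm-or-roll-back decision described above can be sketched roughly as follows. This is a minimal illustration, not the txpool code from this PR: `TentativeTracker`, `Tentative`, `preconfirm`, `on_block_imported`, and the integer id types are all hypothetical names, and the sketch assumes only the exact imported height is resolved (dependent purging, `SqueezedOut` notifications, and stale-height cleanup are left out).

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical stand-ins for the real transaction and UTXO identifiers.
type TxId = u32;
type UtxoId = u32;

/// A preconfirmed transaction and the inputs it tentatively spent.
#[derive(Debug, Clone, PartialEq)]
struct Tentative {
    tx_id: TxId,
    inputs: Vec<UtxoId>,
}

/// Hypothetical tracker for preconfirmations that are not yet canonical.
#[derive(Debug, Default)]
struct TentativeTracker {
    /// Preconfirmed transactions grouped by the block height they target.
    by_height: BTreeMap<u32, Vec<Tentative>>,
    /// Inputs marked as spent only because of a preconfirmation.
    tentative_spent: HashSet<UtxoId>,
}

impl TentativeTracker {
    /// Record a preconfirmation and tentatively mark its inputs as spent.
    fn preconfirm(&mut self, height: u32, tx: Tentative) {
        self.tentative_spent.extend(tx.inputs.iter().copied());
        self.by_height.entry(height).or_default().push(tx);
    }

    /// Resolve every preconfirmation tracked at `height` against the
    /// canonical block. Returns the ids of transactions that must be
    /// rolled back because the block omitted them.
    fn on_block_imported(&mut self, height: u32, block_txs: &HashSet<TxId>) -> Vec<TxId> {
        let mut rolled_back = Vec::new();
        for tx in self.by_height.remove(&height).unwrap_or_default() {
            // Either way the inputs stop being tentative: confirmed inputs
            // are spent for real, rolled-back inputs become spendable again.
            for input in &tx.inputs {
                self.tentative_spent.remove(input);
            }
            if !block_txs.contains(&tx.tx_id) {
                rolled_back.push(tx.tx_id);
            }
        }
        rolled_back
    }
}
```

With two transactions preconfirmed at height 5 and a canonical block that contains only the first, `on_block_imported(5, ..)` returns the second transaction's id and leaves no tentative state behind.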
PR Summary

Medium Risk

Overview

Fixes PoA HA edge cases by (1) grouping reconciliation “votes” by […]

Prevents a PoA leader deadlock after reconciliation import by introducing a shared […]

Adds txpool preconfirmation rollback: tracks tentative preconfirmations by block height, records “tentative” spent inputs, ignores late preconfirmations below the canonical tip, and on block import rolls back preconfirmed tx state (including dependent coin- and contract-based transactions) when the canonical block omits them, with a new dedicated test suite.

Reviewed by Cursor Bugbot for commit 6241f5c.
…same-block dea… (#3271) …dlock (#3269)

## Summary

- Fixes a PoA reconciliation deadlock observed on devnet 2026-04-17 where the same block ended up on all 6 Redis nodes with three different epochs, causing permanent livelock
- `unreconciled_blocks` now groups votes by `block_id` only, tracking max epoch as a tiebreaker. Identical blocks written during re-promotion storms count toward quorum.
- Added a regression test that reproduces the exact production error string

## The bug

During re-promotion storms (two pods racing for leadership), the same block can be written to different Redis nodes with different epochs. The old vote grouping `(epoch, block_id)` fragmented these identical blocks into separate vote groups:

```
Node state (same block_id, different epoch stamps):
  1a-0, 1a-1, 1b-1: epoch 268  → vote group A, count=3
  1b-0:             epoch 269  → vote group B, count=1
  1c-0, 1c-1:       epoch 270  → vote group C, count=2  ← max-epoch winner

Required quorum: 4. Winner count: 2 → repair attempted.
Repair writes the winner to all 6 nodes → HEIGHT_EXISTS on every node
(each has SOME entry at that height) → Written=0 → total=2 < quorum.
Permanent livelock.
```

## The fix

Group by `block_id` alone; track max epoch per block_id as the tiebreaker when block_ids genuinely differ:

```rust
// Before
HashMap::<(u64, BlockId), (usize, SealedBlock)>
vote_key = (*epoch, block.entity.id())
winner = max_by_key(epoch)

// After
HashMap::<BlockId, (u64, usize, SealedBlock)>  // (max_epoch, count, block)
vote_key = block.entity.id()
winner = max_by_key(max_epoch)
```

**Behavior change:**

- Same block with multiple epochs → single vote group → counts as a single block on N nodes → reconciles directly without repair (this fixes the deadlock)
- Genuinely different blocks at same height → picks higher-epoch block → same behavior as before

## Test plan

- [x] New test `leader_state__when_same_block_has_different_epochs_across_nodes_then_reconciles_without_repair` reproduces the exact production error without the fix (`"Backlog unresolved at height 1: repair failed to reach quorum"`) and passes with it
- [x] All 9 existing `leader_state__*` tests still pass
- [ ] Deploy to devnet and verify the stuck authority recovers

Co-authored-by: Brandon Kite <brandonkite92@gmail.com>
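The grouping change above can be illustrated with a small self-contained tally. This is a sketch under stated assumptions rather than the code in `unreconciled_blocks`: `pick_winner` and the 4-byte `BlockId` alias are hypothetical, the `SealedBlock` payload is omitted, and ties between distinct blocks with the same max epoch are not handled.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real block identifier type.
type BlockId = [u8; 4];

/// Tally per-node `(epoch, block_id)` entries by `block_id` only and
/// return `(block_id, max_epoch, count)` for the group with the highest
/// max epoch, or `None` when there are no entries.
fn pick_winner(entries: &[(u64, BlockId)]) -> Option<(BlockId, u64, usize)> {
    // block_id -> (max_epoch, count)
    let mut votes: HashMap<BlockId, (u64, usize)> = HashMap::new();
    for (epoch, id) in entries {
        let slot = votes.entry(*id).or_insert((0, 0));
        slot.0 = slot.0.max(*epoch); // keep the highest epoch seen for this block
        slot.1 += 1; // every node holding this block counts as one vote
    }
    votes
        .into_iter()
        .map(|(id, (max_epoch, count))| (id, max_epoch, count))
        .max_by_key(|&(_, max_epoch, _)| max_epoch)
}
```

Fed the devnet state from the bug description (one block id stamped with epochs 268, 268, 268, 269, 270, 270), this tally yields a single group with max epoch 270 and count 6, which meets the required quorum of 4 without any repair step.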
…port (#3261) (#3274)

cherry-pick #3261

## Summary

- Fixes a deadlock in the PoA service that caused a 30-minute block production outage on testnet (April 9, 2026)
- After a FENCING_ERROR, reconciliation imports a block via `execute_and_commit`, which marks it as `Source::Network`. The SyncTask sees this and transitions from `Synced` → `NotSynced`. On the next iteration, `ensure_synced()` blocks forever: the leader can't produce while blocked, and the SyncTask needs a locally-produced block to recover. Classic deadlock.
- Fix: add a reconciliation watermark (`Arc<AtomicU32>`) shared between `MainTask` and `SyncTask`. Before importing reconciliation blocks, `MainTask` sets the watermark to the max height. `SyncTask` treats blocks at heights ≤ the watermark as locally produced, staying `Synced`.

## Details

**Root cause chain:**

1. `importer.rs:584-585`: `execute_and_commit` always uses `ImportResult::new_from_network()`
2. `sync.rs:186-203`: SyncTask transitions `Synced → NotSynced` on non-local block with height > current
3. `service.rs:501-521`: `ensure_synced()` blocks on `sync_state.changed()` when `NotSynced`
4. Deadlock: leader blocked in `ensure_synced()`, SyncTask waiting for locally-produced block that can never arrive

**Why a watermark:** A bool flag has a race condition: the SyncTask may not poll the broadcast channel until after the flag is cleared. The watermark encodes a permanent fact ("all blocks up to height N were reconciled") that never needs clearing.
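A rough sketch of the watermark idea, assuming a heavily simplified state machine: `next_state`, `mark_reconciled`, and `new_watermark` are hypothetical helpers and not the functions in `sync.rs` or `service.rs`. Only `Arc<AtomicU32>` and the use of `fetch_max` come from the description above.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq)]
enum SyncState {
    Synced,
    NotSynced,
}

/// Create the watermark that both tasks would share (starts at height 0).
fn new_watermark() -> Arc<AtomicU32> {
    Arc::new(AtomicU32::new(0))
}

/// Raise the watermark before importing reconciliation blocks.
/// `fetch_max` never lowers the value, so it never needs clearing.
fn mark_reconciled(watermark: &AtomicU32, max_height: u32) {
    watermark.fetch_max(max_height, Ordering::Release);
}

/// Simplified block handler for a task that is currently `Synced`:
/// a network block above the current height flips the state to
/// `NotSynced` unless it sits at or below the reconciliation watermark.
fn next_state(
    current_height: u32,
    block_height: u32,
    is_local: bool,
    watermark: &AtomicU32,
) -> SyncState {
    let reconciled = block_height <= watermark.load(Ordering::Acquire);
    if !is_local && !reconciled && block_height > current_height {
        SyncState::NotSynced
    } else {
        SyncState::Synced
    }
}
```

Because the watermark only ever grows, it does not matter when the handler gets around to polling: a reconciled height stays reconciled, which is the property a resettable bool flag lacks.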
**Files changed (all within `fuel-core-poa`):**

- `sync.rs`: Add `reconciliation_watermark` field, check it in block handler
- `service.rs`: Create shared watermark, set via `fetch_max` during reconciliation
- `service_test.rs`: Add deadlock reproduction test

## Test plan

- [x] `sync_task__network_block_at_reconciliation_height_causes_not_synced_without_watermark`: confirms bug mechanism (network block → NotSynced)
- [x] `sync_task__network_block_within_watermark_stays_synced`: verifies watermark prevents NotSynced; blocks above watermark still trigger it
- [x] `main_task__reconciliation_import_does_not_deadlock_leader`: full service-level deadlock reproduction (fails without fix, passes with)
- [x] All 51 existing `fuel-core-poa` tests pass

Co-authored-by: Brandon Kite <brandonkite92@gmail.com>
Co-authored-by: Green Baneling <XgreenX9999@gmail.com>
Co-authored-by: Hannes Karppila <2204863+Dentosal@users.noreply.github.com>
## Linked Issues/PRs

Cherrypick #3272
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```lua
if entry_height ~= nil and entry_height < posted_height then
    stop_scan = true
    break
end
```
Lua early-stop may miss HEIGHT_EXISTS on out-of-order streams
Medium Severity
The early-stop optimization halts the `XREVRANGE` scan when it finds an `entry_height < posted_height`. Since `XREVRANGE` iterates by stream ID (timestamp order), not by height value, this assumes heights are strictly monotonically increasing in the stream. If a partial write from a failed leader left an orphan at a higher height and then a new leader (with a valid epoch) successfully wrote at a lower height, the stream could contain out-of-order heights. In that case, the scan would stop early and miss an existing entry at `posted_height`, bypassing the `HEIGHT_EXISTS` safety check that prevents forks.
This is a fair assumption, since `write_block.lua` enforces that any new height is greater than the stream's current height.
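To make the trade-off in this thread concrete, here is a toy model of the scan, written in Rust rather than Lua and not taken from the script under review. `height_exists_early_stop` and `height_exists_full_scan` are hypothetical names, and the stream is reduced to a slice of heights in insertion order.

```rust
/// Walk the stream newest-first, as a reverse range scan would, and stop
/// at the first entry whose height is below `posted_height`.
fn height_exists_early_stop(stream: &[u32], posted_height: u32) -> bool {
    for &entry_height in stream.iter().rev() {
        if entry_height == posted_height {
            return true;
        }
        if entry_height < posted_height {
            // Only sound when heights never decrease along the stream.
            break;
        }
    }
    false
}

/// Reference check that inspects every entry regardless of ordering.
fn height_exists_full_scan(stream: &[u32], posted_height: u32) -> bool {
    stream.contains(&posted_height)
}
```

On a monotonic stream such as `[1, 2, 3, 4]` the two checks agree for every height. They diverge only on a stream like `[1, 2, 5, 3]`, where the early stop misses the existing entry at height 5, so the optimization is safe exactly as long as the write path keeps heights increasing.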


Version 0.47.4
Changed
Fixed