Rollback unsuccessful preconfs in the mempool#3264

Merged
Dentosal merged 6 commits into master from dento/mempool-rollback-failed-preconfs
Apr 16, 2026
@Dentosal
Member

Dentosal commented Apr 14, 2026

Closes #3098.

Problem

When a block producer sends preconfirmation updates, sentry nodes optimistically treat the included transactions as committed, removing them from the mempool and marking their inputs as spent. If the producer crashes and re-produces a block at the same height without those transactions, the mempool is left in a stale state: inputs stay marked as spent and outputs linger in extracted_outputs, preventing re-submission of rolled-back transactions and causing dependents to reference non-existent UTXOs.

Solution

This PR makes preconfirmed transactions tentative until the canonical block at their height is imported. On import, preconfirmed txs present in the block are confirmed and their tracking is cleared; those absent are rolled back by restoring inputs, purging dependents, and emitting SqueezedOut notifications. It also adds integration tests: re-insertion after rollback, dependent eviction, normal confirmation, and stale-height cleanup.
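A minimal sketch of the reconciliation step described above, assuming the worker keeps tentative preconfirmations in a `BTreeMap<BlockHeight, HashSet<TxId>>` (the same shape as the `tentative_preconfs` field in the Bugbot diff below). `TxId` and `BlockHeight` are simplified to integers, and `TentativePreconfs`, `Reconciliation` and `on_block_imported` are hypothetical names. The real worker also restores inputs, purges dependents and emits `SqueezedOut` for rolled-back txs; this sketch only shows the confirm-versus-rollback split.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical stand-ins for fuel-core's TxId and BlockHeight.
type TxId = u64;
type BlockHeight = u32;

/// Outcome for each tentatively preconfirmed tx at one height.
#[derive(Debug, Default, PartialEq)]
struct Reconciliation {
    confirmed: Vec<TxId>,
    rolled_back: Vec<TxId>,
}

/// Tentative preconfirmations keyed by the height they claim to be in.
#[derive(Default)]
struct TentativePreconfs {
    by_height: BTreeMap<BlockHeight, HashSet<TxId>>,
}

impl TentativePreconfs {
    /// Remember that `tx_id` was preconfirmed for `height`.
    fn record(&mut self, height: BlockHeight, tx_id: TxId) {
        self.by_height.entry(height).or_default().insert(tx_id);
    }

    /// Called when the canonical block at `height` is imported: txs that
    /// appear in the block are confirmed, the rest must be rolled back.
    /// Tracking for this height is cleared either way.
    fn on_block_imported(
        &mut self,
        height: BlockHeight,
        block_tx_ids: &HashSet<TxId>,
    ) -> Reconciliation {
        let mut result = Reconciliation::default();
        if let Some(tentative) = self.by_height.remove(&height) {
            for tx_id in tentative {
                if block_tx_ids.contains(&tx_id) {
                    result.confirmed.push(tx_id);
                } else {
                    result.rolled_back.push(tx_id);
                }
            }
        }
        // HashSet iteration order is unspecified; sort for determinism.
        result.confirmed.sort_unstable();
        result.rolled_back.sort_unstable();
        result
    }
}
```

Removing the height entry on import, rather than only the confirmed ids, is what keeps tracking from outliving the canonical block it refers to.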

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

Dentosal self-assigned this Apr 14, 2026
@cursor

cursor Bot commented Apr 14, 2026

PR Summary

Medium Risk
Touches core txpool commit/preconfirmation paths and input/output accounting; incorrect edge cases could cause mempool inconsistency or unintended eviction, though changes are well-covered by new integration tests.

Overview
Preconfirmation handling in txpool_v2 is changed to be tentative until the canonical block is imported: the worker now tracks preconfirmed txids by tentative height, confirms them when they appear in the block, and otherwise rolls them back (restoring spent inputs, clearing extracted_outputs, and emitting SqueezedOut for dependents).

This adds new rollback plumbing across the pool: SpentInputs now records/clears tentative spends, ExtractedOutputs can query contracts created by a tx for cleanup, and the collision manager tracks Input::Contract users so contract-dependent txs admitted only via preconfirmed ContractCreated outputs are evicted on rollback; late preconfirmations at or below the canonical tip are ignored. Integration tests are added to cover reinsertion, dependent eviction (coins + contracts), extracted-first regression, stale-height cleanup, and late-preconf ignoring.
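An illustrative model of what "SpentInputs now records/clears tentative spends" could look like. The names `record_tentative_spend`, `unspend_preconfirmed` and `tentative_spent` come from this PR's discussion, but the simplified `InputKey` enum, the `confirm` method and the storage layout are assumptions made for the sketch, not the actual txpool_v2 code.

```rust
use std::collections::{HashMap, HashSet};

type TxId = u64;

// Hypothetical, simplified input key; the real InputKey in txpool_v2 differs.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum InputKey {
    Tx(TxId),
    Utxo(TxId, u16), // (creating tx, output index)
}

#[derive(Default)]
struct SpentInputs {
    /// Live "already spent" markers checked during admission.
    spent: HashSet<InputKey>,
    /// Inputs spent on a preconfirmation, kept so they can be undone.
    tentative_spent: HashMap<TxId, Vec<InputKey>>,
}

impl SpentInputs {
    /// Mark the tx and its inputs as spent, remembering what was marked.
    fn record_tentative_spend(&mut self, tx_id: TxId, inputs: Vec<InputKey>) {
        self.spent.insert(InputKey::Tx(tx_id));
        self.spent.extend(inputs.iter().copied());
        self.tentative_spent.insert(tx_id, inputs);
    }

    /// The canonical block included the tx: keep the spends, drop the undo log.
    fn confirm(&mut self, tx_id: TxId) {
        self.tentative_spent.remove(&tx_id);
    }

    /// The canonical block omitted the tx: clear every marker it set.
    fn unspend_preconfirmed(&mut self, tx_id: TxId) {
        self.spent.remove(&InputKey::Tx(tx_id));
        if let Some(inputs) = self.tentative_spent.remove(&tx_id) {
            for key in inputs {
                self.spent.remove(&key);
            }
        }
    }

    fn is_spent(&self, key: &InputKey) -> bool {
        self.spent.contains(key)
    }
}
```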

Reviewed by Cursor Bugbot for commit f989aab.

Dentosal marked this pull request as ready for review April 14, 2026 17:30
Dentosal requested review from a team, MitchTurner and xgreenx as code owners April 14, 2026 17:30

cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Late preconfirmation causes spurious rollback of valid dependents
    • The worker now ignores preconfirmations at or below already-processed canonical heights, preventing stale entries from triggering rollback against later blocks, and a regression test covers the late-arrival scenario.

Preview (b6bf02a282)
diff --git a/crates/services/txpool_v2/src/pool_worker.rs b/crates/services/txpool_v2/src/pool_worker.rs
--- a/crates/services/txpool_v2/src/pool_worker.rs
+++ b/crates/services/txpool_v2/src/pool_worker.rs
@@ -138,6 +138,7 @@
                     pool: tx_pool,
                     view_provider,
                     tentative_preconfs: BTreeMap::new(),
+                    latest_processed_block_height: None,
                 };
 
                 tokio_runtime.block_on(async {
@@ -276,6 +277,8 @@
     /// Used to roll back stale preconfirmations when the canonical block at
     /// that height does not include those transactions.
     tentative_preconfs: BTreeMap<BlockHeight, HashSet<TxId>>,
+    /// The highest canonical block height already processed by this worker.
+    latest_processed_block_height: Option<BlockHeight>,
 }
 
 impl<View, TxStatusManager> PoolWorker<View, TxStatusManager>
@@ -494,6 +497,10 @@
 
     fn process_block(&mut self, block_result: SharedImportResult) {
         let block_height = *block_result.sealed_block.entity.header().height();
+        self.latest_processed_block_height = Some(
+            self.latest_processed_block_height
+                .map_or(block_height, |latest| latest.max(block_height)),
+        );
 
         let confirmed_tx_ids: HashSet<TxId> = block_result
             .tx_status
@@ -570,6 +577,25 @@
         tx_id: TxId,
         status: PreConfirmationStatus,
     ) {
+        let preconfirmed_height = match &status {
+            PreConfirmationStatus::Success(status) => Some(status.tx_pointer.block_height()),
+            PreConfirmationStatus::Failure(status) => Some(status.tx_pointer.block_height()),
+            PreConfirmationStatus::SqueezedOut(_) => None,
+        };
+
+        if let (Some(height), Some(latest_processed)) =
+            (preconfirmed_height, self.latest_processed_block_height)
+            && height <= latest_processed
+        {
+            tracing::debug!(
+                "Ignoring late preconfirmation for tx {} at height {} (latest processed {})",
+                tx_id,
+                height,
+                latest_processed
+            );
+            return;
+        }
+
         let (outputs, block_height) = match &status {
             PreConfirmationStatus::Success(status) => {
                 self.pool.process_preconfirmed_committed_transaction(tx_id);

diff --git a/crates/services/txpool_v2/src/tests/tests_preconf_rollback.rs b/crates/services/txpool_v2/src/tests/tests_preconf_rollback.rs
--- a/crates/services/txpool_v2/src/tests/tests_preconf_rollback.rs
+++ b/crates/services/txpool_v2/src/tests/tests_preconf_rollback.rs
@@ -11,6 +11,7 @@
         block::Block,
         consensus::Sealed,
     },
+    entities::coins::coin::CompressedCoin,
     fuel_tx::{
         Output,
         TxPointer,
@@ -332,3 +333,80 @@
 
     service.stop_and_await().await.unwrap();
 }
+
+/// A preconfirmation that arrives after its canonical block has already been
+/// imported must be ignored, otherwise a later block import can roll back valid
+/// dependents of that already-committed transaction.
+#[tokio::test]
+async fn late_preconfirmation_does_not_rollback_valid_dependents() {
+    // Given
+    let (block_sender, block_receiver) = tokio::sync::mpsc::channel(10);
+    let mut universe = TestPoolUniverse::default();
+    let (output_a, unset_input_a) = universe.create_output_and_input();
+    let tx_parent = universe.build_script_transaction(None, Some(vec![output_a.clone()]), 1);
+    let tx_parent_id = tx_parent.id(&Default::default());
+
+    // Seed DB with the parent output as already committed in canonical block 1.
+    let (owner, amount, asset_id) = match &output_a {
+        Output::Coin {
+            to,
+            amount,
+            asset_id,
+        } => (*to, *amount, *asset_id),
+        _ => panic!("Expected a coin output"),
+    };
+    let mut coin = CompressedCoin::default();
+    coin.set_owner(owner);
+    coin.set_amount(amount);
+    coin.set_asset_id(asset_id);
+    universe
+        .database_mut()
+        .data
+        .lock()
+        .unwrap()
+        .coins
+        .insert(UtxoId::new(tx_parent_id, 0), coin);
+
+    let service = universe.build_service(
+        None,
+        Some(MockImporter::with_block_provider(block_receiver)),
+    );
+    service.start_and_await().await.unwrap();
+
+    // Canonical block at height 1 already contains tx_parent.
+    block_sender
+        .send(make_block_import(1, &[tx_parent_id]))
+        .await
+        .unwrap();
+    tokio::time::sleep(std::time::Duration::from_millis(50)).await;
+
+    // Late preconfirmation for the already-processed height 1.
+    universe.send_preconfirmation(
+        tx_parent_id,
+        make_preconf_success(tx_parent_id, 1, output_a),
+    );
+    tokio::time::sleep(std::time::Duration::from_millis(50)).await;
+
+    // tx_child is valid because tx_parent output is in canonical DB.
+    let input_a = unset_input_a.into_input(UtxoId::new(tx_parent_id, 0));
+    let tx_child = universe.build_script_transaction(Some(vec![input_a]), None, 2);
+    let tx_child_id = tx_child.id(&Default::default());
+    service.shared.insert(tx_child).await.unwrap();
+    universe
+        .await_expected_tx_statuses_submitted(vec![tx_child_id])
+        .await;
+
+    // When a later block is imported, no rollback should be triggered from the
+    // late preconfirmation.
+    block_sender.send(make_block_import(2, &[])).await.unwrap();
+    tokio::time::sleep(std::time::Duration::from_millis(50)).await;
+
+    // Then the valid dependent remains in the pool.
+    let found = service.shared.find(vec![tx_child_id]).await.unwrap();
+    assert!(
+        found[0].is_some(),
+        "tx_child should stay in pool; late preconfirmation must be ignored"
+    );
+
+    service.stop_and_await().await.unwrap();
+}


Comment thread crates/services/txpool_v2/src/pool_worker.rs
Member

MitchTurner left a comment


The logic looks good to me.

I've noticed that our code has a lot more comments than it has in the past. I assume this is due to agents we are using. I'm of the opinion that we include an AGENTS.md file in our repos to ensure that we follow some standards. I don't mind the comments in the domain code--although I don't like a ton--I definitely don't like so many in the tests though.

nit: the tests are using given/when/then kinda, but aren't following the send__when_x_then_y_happens pattern, which in conjunction with removing the comments would make them easier to read.

spender_of_inputs: HashMap<TxId, Vec<InputKey>>,
/// Inputs permanently spent during preconfirmation processing, saved so
/// they can be rolled back if the preconfirmation turns out to be stale.
tentative_spent: HashMap<TxId, Vec<InputKey>>,
Member


Is there any chance that this will grow indefinitely?

Seems fine to me. It's not good to test internal values anyway, but worth considering if this is a "leak" in any way.

Member Author


I don't think so, unless blocks are not getting imported at all. These are always cleared when the associated block height gets imported.
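A sketch of why this map stays bounded as long as blocks keep being imported, assuming that all tentative entries at or below the imported height are drained together (the PR lists a stale-height cleanup test, but this exact behavior is an assumption here). `drain_up_to` is a hypothetical helper built on `BTreeMap::split_off`.

```rust
use std::collections::{BTreeMap, HashSet};

type TxId = u64;
type BlockHeight = u32;

/// Hypothetical helper: remove and return every tentative entry at or
/// below `imported`, leaving only entries for future heights tracked.
fn drain_up_to(
    tentative: &mut BTreeMap<BlockHeight, HashSet<TxId>>,
    imported: BlockHeight,
) -> BTreeMap<BlockHeight, HashSet<TxId>> {
    match imported.checked_add(1) {
        Some(next) => {
            // split_off(&next) keeps keys < next in `tentative` and returns
            // keys >= next; swap so `tentative` keeps only future heights.
            let above = tentative.split_off(&next);
            std::mem::replace(tentative, above)
        }
        // imported == BlockHeight::MAX: every entry is at or below it.
        None => std::mem::take(tentative),
    }
}
```

Under this assumption, the map can only hold heights above the latest imported one, so it grows only if imports stall, which matches the reply above.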

@Dentosal
Member Author

> I've noticed that our code has a lot more comments than it has in the past. I assume this is due to agents we are using. I'm of the opinion that we include an AGENTS.md file in our repos to ensure that we follow some standards.

Sounds reasonable.

> I don't mind the comments in the domain code--although I don't like a ton--I definitely don't like so many in the tests though.

I personally like having more comments rather than fewer. IMO our codebases are very under-commented when it comes to the reasons why things look the way they do, and this makes it rather hard to follow, e.g., what a field is used for. This is especially nice in tests, which are most of the time written once and rarely touched again.

> nit: the tests are using given/when/then kinda, but aren't following the send__when_x_then_y_happens pattern, which in conjunction with removing the comments would make them easier to read.

I much prefer the style where the human readable text is in a comment block and the test name is mostly used as a shorthand identifier.

@MitchTurner
Member

> I much prefer the style where the human readable text is in a comment block and the test name is mostly used as a shorthand identifier.

In its best form, the send__ style of tests ends up being a series of small tests that let you quickly check the behavior/coverage. So if your tests all start looking like each other and are only 10-20 lines long, you can quickly understand the scope of all your tests. I.e., if I'm checking coverage of my code, having 10 short tests is much less cognitive overhead, since the differences are encapsulated in the name of one variable and the expect in the //then block, rather than having to read each test anew.

That's the vision.

@Voxelot
Member

Voxelot commented Apr 15, 2026

I think there is still a rollback gap for spent-input cleanup in the producer-local extracted-first path.

The happy paths seem fine:

  1. Sentry / non-extracted path
    • T is still in txpool when the preconfirmation arrives
    • process_preconfirmed_committed_transaction(T) finds T in tx_id_to_storage_id
    • it records rollback state with record_tentative_spend(T, inputs)
    • if canon later omits T, unspend_preconfirmed(T) restores those inputs from tentative_spent
  2. Local producer / extracted path where T becomes canonical
    • T is extracted first, so maybe_spend_inputs(T, inputs) marks T’s inputs as spent in spent_inputs and stores the input list in spender_of_inputs[T]
    • later the local preconfirmation arrives and spend_inputs_by_tx_id(T) drains spender_of_inputs[T]
    • if the canonical block includes T, this is fine: the inputs are supposed to remain spent, and no rollback is needed

The unhappy path is local producer + rollback:

  1. T is extracted for production
  2. maybe_spend_inputs(T, inputs) does two things:
    • marks T’s inputs as spent in spent_inputs
    • stores the same input list in spender_of_inputs[T]
  3. T is removed from pool storage/selection
  4. local preconfirmation for T arrives
  5. process_preconfirmed_committed_transaction(T) calls spend_inputs_by_tx_id(T)
  6. spend_inputs_by_tx_id(T) drains spender_of_inputs[T], but it does NOT clear those inputs from spent_inputs:
    • it removes the bookkeeping entry
    • it leaves/reapplies the live spent markers in spent_inputs
  7. because T was already extracted, it is no longer in tx_id_to_storage_id, so the branch that calls record_tentative_spend(T, inputs) is skipped
  8. later the canonical block omits T
  9. rollback_preconfirmed_transaction(T) calls unspend_preconfirmed(T)
  10. unspend_preconfirmed(T) only clears:
    • InputKey::Tx(T)
    • the per-input keys saved in tentative_spent[T]
  11. but tentative_spent[T] was never populated in this extracted-first path
  12. so T’s coin/message input keys remain present in spent_inputs

That is the bug: the original inputs remain marked spent in spent_inputs, but the metadata needed to clear them was lost when spender_of_inputs[T] was drained and never copied into tentative_spent[T]. This means transactions spending these inputs will be silently rejected by the authority node (i.e., P2P message rejection) until the entries expire from the LRU cache, blocking the user from retrying their transaction!

Suggested fix direction:

  • when a preconfirmation arrives for an already-extracted tx, preserve the original input keys in rollbackable state before spend_inputs_by_tx_id(T) drains spender_of_inputs[T]
  • in other words, the extracted / maybe_spent state needs to transition into tentative_spent, even when the tx is no longer present in tx_id_to_storage_id

I think this needs a regression test covering:

  • producer extracts T
  • producer locally processes preconfirmation for T
  • canonical block at that height omits T
  • T’s original inputs are no longer present in spent_inputs
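A hedged model of the extracted-first gap together with the suggested fix direction: when a preconfirmation arrives for a tx that is no longer in the pool, move its input keys from `spender_of_inputs` into `tentative_spent` instead of dropping them. `SpentModel`, `extract`, `on_preconfirmed` and `rollback` are hypothetical stand-ins for `maybe_spend_inputs`, `process_preconfirmed_committed_transaction` and `unspend_preconfirmed`, and `in_pool` stands in for `tx_id_to_storage_id`. The fix that actually landed may differ in detail.

```rust
use std::collections::{HashMap, HashSet};

type TxId = u64;
// Hypothetical input key: (creating tx, output index).
type InputKey = (TxId, u16);

#[derive(Default)]
struct SpentModel {
    spent_inputs: HashSet<InputKey>,
    spender_of_inputs: HashMap<TxId, Vec<InputKey>>,
    tentative_spent: HashMap<TxId, Vec<InputKey>>,
    /// Stands in for tx_id_to_storage_id: txs still held in the pool.
    in_pool: HashMap<TxId, Vec<InputKey>>,
}

impl SpentModel {
    /// Extraction for block production (the maybe_spend_inputs step).
    fn extract(&mut self, tx: TxId) {
        if let Some(inputs) = self.in_pool.remove(&tx) {
            self.spent_inputs.extend(inputs.iter().copied());
            self.spender_of_inputs.insert(tx, inputs);
        }
    }

    /// Preconfirmation handling with the suggested fix applied.
    fn on_preconfirmed(&mut self, tx: TxId) {
        let inputs = if let Some(inputs) = self.in_pool.remove(&tx) {
            // Sentry path: the tx was still in the pool.
            inputs
        } else if let Some(inputs) = self.spender_of_inputs.remove(&tx) {
            // Extracted-first path: the buggy version drained this entry
            // and discarded it; here the keys are kept for rollback.
            inputs
        } else {
            return;
        };
        self.spent_inputs.extend(inputs.iter().copied());
        self.tentative_spent.insert(tx, inputs);
    }

    /// Canonical block omitted `tx`: clear the markers it set.
    fn rollback(&mut self, tx: TxId) {
        if let Some(inputs) = self.tentative_spent.remove(&tx) {
            for key in &inputs {
                self.spent_inputs.remove(key);
            }
        }
    }
}
```

The regression scenario listed above maps onto this model as: insert T into `in_pool`, call `extract`, then `on_preconfirmed`, then `rollback`, and assert that T's input is no longer in `spent_inputs`.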

@Voxelot
Copy link
Copy Markdown
Member

Voxelot commented Apr 15, 2026

Another potential rollback gap to consider is transactions that were admitted only because a preconfirmed tx temporarily created a contract.

The coin-dependent rollback path looks covered, but I do not see equivalent cleanup for contract-created dependents. The happy path is straightforward: if a preconfirmed tx P creates contract C, and a dependent tx D using Input::Contract(C, ...) is admitted while that preconfirmation is live, then everything is fine if the canonical block later includes P, because C really exists.

The concerning case is when P is preconfirmed, txpool records that temporary contract existence in extracted_outputs, and D is admitted because validation accepts extracted_outputs.contract_exists(...), but the canonical block at that height later omits P. Rollback clears the extracted outputs for P, but rollback_preconfirmed_transaction(P) appears to remove dependents only through get_coins_spenders(P). I do not see a symmetric cleanup path for txs that were admitted only because the temporary contract existed.

So this looks like a separate rollback edge case: the temporary contract creation can be removed, but a dependent tx that was only valid because of that temporary contract may still remain in the pool.

It would probably be worth either adding symmetric contract-dependent tracking alongside get_coins_spenders(...), or deriving those dependents from the dependency graph during rollback. I think this also needs a regression test covering a preconfirmed tx that creates a contract, a dependent tx inserted via Input::Contract(...), and then rollback when the canonical block omits the creator.
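One possible shape for the symmetric contract tracking suggested above, as a sketch only: a hypothetical `ContractDeps` that remembers which contracts a preconfirmed tx created and which pool txs use each contract via `Input::Contract`. `ContractId` is shortened for illustration, and the sketch assumes the rolled-back contract does not also exist in the canonical database, in which case its users would remain valid.

```rust
use std::collections::{HashMap, HashSet};

type TxId = u64;
// Hypothetical short contract id for illustration.
type ContractId = [u8; 4];

#[derive(Default)]
struct ContractDeps {
    /// Contracts created by a preconfirmed tx (its ContractCreated outputs).
    created_by: HashMap<TxId, Vec<ContractId>>,
    /// Pool txs that reference each contract through an Input::Contract.
    contract_users: HashMap<ContractId, HashSet<TxId>>,
}

impl ContractDeps {
    fn record_created(&mut self, creator: TxId, contract: ContractId) {
        self.created_by.entry(creator).or_default().push(contract);
    }

    fn record_user(&mut self, user: TxId, contract: ContractId) {
        self.contract_users.entry(contract).or_default().insert(user);
    }

    /// On rollback of `creator`, return the txs that must be evicted
    /// because a contract they depend on no longer exists.
    fn dependents_to_evict(&mut self, creator: TxId) -> Vec<TxId> {
        let mut evict = Vec::new();
        for contract in self.created_by.remove(&creator).unwrap_or_default() {
            if let Some(users) = self.contract_users.remove(&contract) {
                evict.extend(users);
            }
        }
        // HashSet order is unspecified; sort for determinism.
        evict.sort_unstable();
        evict
    }
}
```

Evicted txs would then go through the same dependent purge and `SqueezedOut` path as coin spenders.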

@Voxelot
Copy link
Copy Markdown
Member

Voxelot commented Apr 15, 2026

A third thing worth considering is stale preconfirmations that arrive after the node has already imported a newer canonical block.

This feels less like a pure cleanup/rollback issue and more like an admission check: I do not see process_preconfirmed_transaction(...) rejecting a preconfirmation whose tx_pointer.block_height() is already at or below the node’s current canonical height.

That means if the node has already imported canonical height H, and a delayed preconfirmation later arrives for some height <= H, txpool can still apply it immediately:

  • mark the tx as committed from txpool’s perspective
  • insert its resolved_outputs into extracted_outputs
  • admit dependents against those outputs
  • mark inputs spent

This is only really a problem if the late preconfirmation is for a tx that did not actually end up in the canonical block at the height referenced by its tx_pointer. In that case, the preconfirmation is now stale, but txpool will still temporarily treat it as live. That can block the tx’s original inputs from being usable, and can also temporarily admit dependents against outputs that never became canonical.

This PR will eventually reconcile that state on a later block import, so it is not necessarily permanent corruption. But it does create a real stale-acceptance window where txpool can temporarily treat old, non-canonical preconfirmed state as live until the next block import arrives.

So I think there is a real correctness risk here, not just a defensive tidy-up. If stale preconfirmations can arrive late, they should probably be ignored before mutating txpool state whenever preconf_height <= current_canonical_height.

I think this needs a regression test covering:

  • node imports canonical block at height H
  • delayed preconfirmation later arrives for height <= H
  • the referenced tx is not in the canonical block at that height
  • txpool ignores the preconfirmation and does not mutate spent_inputs / extracted_outputs
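A minimal sketch of the admission check proposed above, assuming the worker tracks the latest imported canonical height as an `Option<BlockHeight>`. `Preconf` and `should_apply` are hypothetical simplifications of `PreConfirmationStatus` and the worker logic; like the Bugbot diff in this thread, the sketch lets `SqueezedOut` through because it carries no block height.

```rust
type BlockHeight = u32;

/// Hypothetical mirror of the preconfirmation variants; only Success and
/// Failure carry the height from their tx_pointer.
enum Preconf {
    Success { height: BlockHeight },
    Failure { height: BlockHeight },
    SqueezedOut,
}

/// Returns true if the preconfirmation may still mutate txpool state,
/// i.e. its height is strictly above the latest imported canonical height.
fn should_apply(preconf: &Preconf, latest_canonical: Option<BlockHeight>) -> bool {
    let height = match preconf {
        Preconf::Success { height } | Preconf::Failure { height } => *height,
        // No height to compare against, so it is not filtered here.
        Preconf::SqueezedOut => return true,
    };
    match latest_canonical {
        Some(tip) => height > tip,
        // Nothing imported yet: every height is still in the future.
        None => true,
    }
}
```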

Comment thread crates/services/txpool_v2/src/collision_manager/basic.rs
@Dentosal
Member Author

@Voxelot Addressed all these in 01410e0. I mostly followed your suggested fixes. I added tests for all of them too.


cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 84b0c4a.

Comment thread crates/services/txpool_v2/src/pool.rs
Member

Voxelot left a comment


Fixes look good; not sure why CI isn't happy, though. Will approve again if needed once CI is passing.

Dentosal merged commit 85eedad into master Apr 16, 2026
40 checks passed
Dentosal deleted the dento/mempool-rollback-failed-preconfs branch April 16, 2026 20:18
Dentosal added a commit that referenced this pull request Apr 16, 2026
Closes #3098.

When a block producer sends preconfirmation updates, sentry nodes
optimistically treat the included transactions as committed, removing
them from the mempool and marking their inputs as spent. If the producer
crashes and re-produces a block at the same height without those
transactions, the mempool is left in a stale state: inputs stay marked
as spent and outputs linger in `extracted_outputs`, preventing
re-submission of rolled-back transactions and causing dependents to
reference non-existent UTXOs.

This PR makes preconfirmed transactions tentative until the canonical
block at their height is imported. On import, preconfirmed txs present
in the block are confirmed and their tracking is cleared; those absent
are rolled back by restoring inputs, purging dependents, and emitting
`SqueezedOut` notifications. It also adds integration tests:
re-insertion after rollback, dependent eviction, normal confirmation,
and stale-height cleanup.

- [x] Breaking changes are clearly marked as such in the PR description
and changelog
- [x] New behavior is reflected in tests
- [x] [The specification](https://github.com/FuelLabs/fuel-specs/)
matches the implemented behavior (link update PR if changes are needed)

- [ ] I have reviewed the code myself
- [x] I have created follow-up issues caused by this PR and linked them
here
Comment on lines +351 to +358
Input::Contract(ContractInput { contract_id, .. }) => {
    if let Some(users) = self.contract_users.get_mut(contract_id) {
        users.retain(|id| id != &tx_id);
        if users.is_empty() {
            self.contract_users.remove(contract_id);
        }
    }
}
Collaborator


I think that if you remove transaction IDs here, you will not have access to them during rollback_preconfirmed_transaction: we remove them when a transaction is included in a block during production, while rollback_preconfirmed_transaction is called after the block is done. I guess this should only affect the authority, so maybe it is fine; sentries should still be able to work with it. But maybe we want to move this cleanup into process_block.

Member Author


Follow-up: #3268

Dentosal added a commit that referenced this pull request Apr 17, 2026
MitchTurner pushed a commit that referenced this pull request Apr 17, 2026


Development

Successfully merging this pull request may close these issues.

Rollback unsuccessful preconfs in the mempool

4 participants