Rollback unsuccessful preconfs in the mempool#3264
Conversation
PR SummaryMedium Risk Overview This adds new rollback plumbing across the pool: Reviewed by Cursor Bugbot for commit f989aab. Bugbot is set up for automated code reviews on this repo. Configure here. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Late preconfirmation causes spurious rollback of valid dependents
- The worker now ignores preconfirmations at or below already-processed canonical heights, preventing stale entries from triggering rollback against later blocks, and a regression test covers the late-arrival scenario.
Or push these changes by commenting:
@cursor push b6bf02a282
Preview (b6bf02a282)
diff --git a/crates/services/txpool_v2/src/pool_worker.rs b/crates/services/txpool_v2/src/pool_worker.rs
--- a/crates/services/txpool_v2/src/pool_worker.rs
+++ b/crates/services/txpool_v2/src/pool_worker.rs
@@ -138,6 +138,7 @@
pool: tx_pool,
view_provider,
tentative_preconfs: BTreeMap::new(),
+ latest_processed_block_height: None,
};
tokio_runtime.block_on(async {
@@ -276,6 +277,8 @@
/// Used to roll back stale preconfirmations when the canonical block at
/// that height does not include those transactions.
tentative_preconfs: BTreeMap<BlockHeight, HashSet<TxId>>,
+ /// The highest canonical block height already processed by this worker.
+ latest_processed_block_height: Option<BlockHeight>,
}
impl<View, TxStatusManager> PoolWorker<View, TxStatusManager>
@@ -494,6 +497,10 @@
fn process_block(&mut self, block_result: SharedImportResult) {
let block_height = *block_result.sealed_block.entity.header().height();
+ self.latest_processed_block_height = Some(
+ self.latest_processed_block_height
+ .map_or(block_height, |latest| latest.max(block_height)),
+ );
let confirmed_tx_ids: HashSet<TxId> = block_result
.tx_status
@@ -570,6 +577,25 @@
tx_id: TxId,
status: PreConfirmationStatus,
) {
+ let preconfirmed_height = match &status {
+ PreConfirmationStatus::Success(status) => Some(status.tx_pointer.block_height()),
+ PreConfirmationStatus::Failure(status) => Some(status.tx_pointer.block_height()),
+ PreConfirmationStatus::SqueezedOut(_) => None,
+ };
+
+ if let (Some(height), Some(latest_processed)) =
+ (preconfirmed_height, self.latest_processed_block_height)
+ && height <= latest_processed
+ {
+ tracing::debug!(
+ "Ignoring late preconfirmation for tx {} at height {} (latest processed {})",
+ tx_id,
+ height,
+ latest_processed
+ );
+ return;
+ }
+
let (outputs, block_height) = match &status {
PreConfirmationStatus::Success(status) => {
self.pool.process_preconfirmed_committed_transaction(tx_id);
diff --git a/crates/services/txpool_v2/src/tests/tests_preconf_rollback.rs b/crates/services/txpool_v2/src/tests/tests_preconf_rollback.rs
--- a/crates/services/txpool_v2/src/tests/tests_preconf_rollback.rs
+++ b/crates/services/txpool_v2/src/tests/tests_preconf_rollback.rs
@@ -11,6 +11,7 @@
block::Block,
consensus::Sealed,
},
+ entities::coins::coin::CompressedCoin,
fuel_tx::{
Output,
TxPointer,
@@ -332,3 +333,80 @@
service.stop_and_await().await.unwrap();
}
+
+/// A preconfirmation that arrives after its canonical block has already been
+/// imported must be ignored, otherwise a later block import can roll back valid
+/// dependents of that already-committed transaction.
+#[tokio::test]
+async fn late_preconfirmation_does_not_rollback_valid_dependents() {
+ // Given
+ let (block_sender, block_receiver) = tokio::sync::mpsc::channel(10);
+ let mut universe = TestPoolUniverse::default();
+ let (output_a, unset_input_a) = universe.create_output_and_input();
+ let tx_parent = universe.build_script_transaction(None, Some(vec![output_a.clone()]), 1);
+ let tx_parent_id = tx_parent.id(&Default::default());
+
+ // Seed DB with the parent output as already committed in canonical block 1.
+ let (owner, amount, asset_id) = match &output_a {
+ Output::Coin {
+ to,
+ amount,
+ asset_id,
+ } => (*to, *amount, *asset_id),
+ _ => panic!("Expected a coin output"),
+ };
+ let mut coin = CompressedCoin::default();
+ coin.set_owner(owner);
+ coin.set_amount(amount);
+ coin.set_asset_id(asset_id);
+ universe
+ .database_mut()
+ .data
+ .lock()
+ .unwrap()
+ .coins
+ .insert(UtxoId::new(tx_parent_id, 0), coin);
+
+ let service = universe.build_service(
+ None,
+ Some(MockImporter::with_block_provider(block_receiver)),
+ );
+ service.start_and_await().await.unwrap();
+
+ // Canonical block at height 1 already contains tx_parent.
+ block_sender
+ .send(make_block_import(1, &[tx_parent_id]))
+ .await
+ .unwrap();
+ tokio::time::sleep(std::time::Duration::from_millis(50)).await;
+
+ // Late preconfirmation for the already-processed height 1.
+ universe.send_preconfirmation(
+ tx_parent_id,
+ make_preconf_success(tx_parent_id, 1, output_a),
+ );
+ tokio::time::sleep(std::time::Duration::from_millis(50)).await;
+
+ // tx_child is valid because tx_parent output is in canonical DB.
+ let input_a = unset_input_a.into_input(UtxoId::new(tx_parent_id, 0));
+ let tx_child = universe.build_script_transaction(Some(vec![input_a]), None, 2);
+ let tx_child_id = tx_child.id(&Default::default());
+ service.shared.insert(tx_child).await.unwrap();
+ universe
+ .await_expected_tx_statuses_submitted(vec![tx_child_id])
+ .await;
+
+ // When a later block is imported, no rollback should be triggered from the
+ // late preconfirmation.
+ block_sender.send(make_block_import(2, &[])).await.unwrap();
+ tokio::time::sleep(std::time::Duration::from_millis(50)).await;
+
+ // Then the valid dependent remains in the pool.
+ let found = service.shared.find(vec![tx_child_id]).await.unwrap();
+ assert!(
+ found[0].is_some(),
+ "tx_child should stay in pool; late preconfirmation must be ignored"
+ );
+
+ service.stop_and_await().await.unwrap();
+}This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
MitchTurner
left a comment
There was a problem hiding this comment.
The logic looks good to me.
I've noticed that our code has a lot more comments than it has in the past. I assume this is due to agents we are using. I'm of the opinion that we include an AGENTS.md file in our repos to ensure that we follow some standards. I don't mind the comments in the domain code--although I don't like a ton--I definitely don't like so many in the tests though.
nit: the test are using given/when/then kinda, but aren't following the send__when_x_then_y_happens pattern which in conjunction with removing the comments would make them easier to read.
| spender_of_inputs: HashMap<TxId, Vec<InputKey>>, | ||
| /// Inputs permanently spent during preconfirmation processing, saved so | ||
| /// they can be rolled back if the preconfirmation turns out to be stale. | ||
| tentative_spent: HashMap<TxId, Vec<InputKey>>, |
There was a problem hiding this comment.
Is there any chance that his will grow indefinitely?
Seems fine to me. It's not good to test internal values anyway, but worth considering if this is a "leak" in any way.
There was a problem hiding this comment.
I don't think so, unless blocks are not getting imported at all. These are always cleared for when the associated block height gets imported.
Sounds reasonable.
I personally like having more comments rather than less. IMO our codebases are very under-commented when it comes to reasons why things look like they do, and this makes it rather it hard to follow i.e. what a field is used for. This is especially nice in tests which are most of the time written once and rarely touched again.
I much prefer the style where the human readable text is in a comment block and the test name is mostly used as a shorthand identifier. |
In it's best form, the That's the vision. |
|
I think there is still a rollback gap for spent-input cleanup in the producer-local extracted-first path. The happy paths seem fine:
The unhappy path is local producer + rollback:
That is the bug: the original inputs remain marked spent in Suggested fix direction:
I think this needs a regression test covering:
|
|
Another potential rollback gap to consider is transactions that were admitted only because a preconfirmed tx temporarily created a contract. The coin-dependent rollback path looks covered, but I do not see equivalent cleanup for contract-created dependents. The happy path is straightforward: if a preconfirmed tx The concerning case is when So this looks like a separate rollback edge case: the temporary contract creation can be removed, but a dependent tx that was only valid because of that temporary contract may still remain in the pool. It would probably be worth either adding symmetric contract-dependent tracking alongside |
|
A third thing worth considering is stale preconfirmations that arrive after the node has already imported a newer canonical block. This feels less like a pure cleanup/rollback issue and more like an admission check: I do not see That means if the node has already imported canonical height
This is only really a problem if the late preconfirmation is for a tx that did not actually end up in the canonical block at the height referenced by its This PR will eventually reconcile that state on a later block import, so it is not necessarily permanent corruption. But it does create a real stale-acceptance window where txpool can temporarily treat old, non-canonical preconfirmed state as live until the next block import arrives. So I think there is a real correctness risk here, not just a defensive tidy-up. If stale preconfirmations can arrive late, they should probably be ignored before mutating txpool state whenever I think this needs a regression test covering:
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 84b0c4a. Configure here.
Voxelot
left a comment
There was a problem hiding this comment.
fixes look good, not sure why ci isn't happy though. will approve again if needed once xi is passing.
Closes #3098. When a block producer sends preconfirmation updates, sentry nodes optimistically treat the included transactions as committed, removing them from the mempool and marking their inputs as spent. If the producer crashes and re-produces a block at the same height without those transactions, the mempool is left in a stale state: inputs stay marked as spent and outputs linger in `extracted_outputs`, preventing re-submission of rolled-back transactions and causing dependents to reference non-existent UTXOs. This PR makes preconfirmed transactions tentative until the canonical block at their height is imported. On import, preconfirmed txs present in the block are confirmed and their tracking is cleared; those absent are rolled back by restoring inputs, purging dependents, and emitting `SqueezedOut` notifications. It also adds integration tests: re-insertion after rollback, dependent eviction, normal confirmation, and stale-height cleanup. - [x] Breaking changes are clearly marked as such in the PR description and changelog - [x] New behavior is reflected in tests - [x] [The specification](https://github.com/FuelLabs/fuel-specs/) matches the implemented behavior (link update PR if changes are needed) - [ ] I have reviewed the code myself - [x] I have created follow-up issues caused by this PR and linked them here
| Input::Contract(ContractInput { contract_id, .. }) => { | ||
| if let Some(users) = self.contract_users.get_mut(contract_id) { | ||
| users.retain(|id| id != &tx_id); | ||
| if users.is_empty() { | ||
| self.contract_users.remove(contract_id); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
I think removing transaction IDs here, you will not have access to them during rollback_preconfirmed_transaction. Because we remove it when a transaction is included in the block during production, while you call rollback_preconfirmed_transaction after block is done, so you will not have access to them. It only should affect authority, I guess, so maybe it is fine. I think sentries should still be able to work with it. But maybe we want to move this clean up int oprocess_block
Closes #3098. When a block producer sends preconfirmation updates, sentry nodes optimistically treat the included transactions as committed, removing them from the mempool and marking their inputs as spent. If the producer crashes and re-produces a block at the same height without those transactions, the mempool is left in a stale state: inputs stay marked as spent and outputs linger in `extracted_outputs`, preventing re-submission of rolled-back transactions and causing dependents to reference non-existent UTXOs. This PR makes preconfirmed transactions tentative until the canonical block at their height is imported. On import, preconfirmed txs present in the block are confirmed and their tracking is cleared; those absent are rolled back by restoring inputs, purging dependents, and emitting `SqueezedOut` notifications. It also adds integration tests: re-insertion after rollback, dependent eviction, normal confirmation, and stale-height cleanup. - [x] Breaking changes are clearly marked as such in the PR description and changelog - [x] New behavior is reflected in tests - [x] [The specification](https://github.com/FuelLabs/fuel-specs/) matches the implemented behavior (link update PR if changes are needed) - [ ] I have reviewed the code myself - [x] I have created follow-up issues caused by this PR and linked them here
Closes #3098. When a block producer sends preconfirmation updates, sentry nodes optimistically treat the included transactions as committed, removing them from the mempool and marking their inputs as spent. If the producer crashes and re-produces a block at the same height without those transactions, the mempool is left in a stale state: inputs stay marked as spent and outputs linger in `extracted_outputs`, preventing re-submission of rolled-back transactions and causing dependents to reference non-existent UTXOs. This PR makes preconfirmed transactions tentative until the canonical block at their height is imported. On import, preconfirmed txs present in the block are confirmed and their tracking is cleared; those absent are rolled back by restoring inputs, purging dependents, and emitting `SqueezedOut` notifications. It also adds integration tests: re-insertion after rollback, dependent eviction, normal confirmation, and stale-height cleanup. - [x] Breaking changes are clearly marked as such in the PR description and changelog - [x] New behavior is reflected in tests - [x] [The specification](https://github.com/FuelLabs/fuel-specs/) matches the implemented behavior (link update PR if changes are needed) - [ ] I have reviewed the code myself - [x] I have created follow-up issues caused by this PR and linked them here


Closes #3098.
Problem
When a block producer sends preconfirmation updates, sentry nodes optimistically treat the included transactions as committed, removing them from the mempool and marking their inputs as spent. If the producer crashes and re-produces a block at the same height without those transactions, the mempool is left in a stale state: inputs stay marked as spent and outputs linger in
extracted_outputs, preventing re-submission of rolled-back transactions and causing dependents to reference non-existent UTXOs.Solution
This PR makes preconfirmed transactions tentative until the canonical block at their height is imported. On import, preconfirmed txs present in the block are confirmed and their tracking is cleared; those absent are rolled back by restoring inputs, purging dependents, and emitting
SqueezedOutnotifications. It also adds integration tests: re-insertion after rollback, dependent eviction, normal confirmation, and stale-height cleanup.Checklist
Before requesting review