Enhance DomainAdapter to retrieve missing UTXOs from archive store #962

Open
nicolasLuduena wants to merge 2 commits into main from fix-get-utxos-u5c

nicolasLuduena (Member) commented Apr 9, 2026

This pull request enhances the LedgerContext implementation for DomainAdapter by improving the retrieval of UTXO (Unspent Transaction Output) data. The main improvement is a fallback mechanism that attempts to fetch missing UTXOs directly from archived blocks if they are not found in the state store. This makes UTXO lookups more robust and comprehensive.

Key improvements to UTXO retrieval:

  • Added logic to detect missing UTXOs after the initial state store lookup and attempt to recover them by:
    • Grouping missing references by transaction hash.
    • Looking up the block slot containing each transaction via the index store.
    • Fetching the corresponding block from the archive store and decoding it.
    • Extracting the relevant transaction outputs and inserting them into the result set.
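
The steps above can be sketched roughly as follows. This is a minimal, std-only illustration and not the PR's code: `Stores`, `Block`, and the type aliases are hypothetical in-memory stand-ins for the state, index, and archive stores, and a "decoded block" is reduced to an era plus its transactions' encoded outputs.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the real adapter uses pallas/dolos types instead.
type TxHash = [u8; 32];
type TxoRef = (TxHash, u32); // (transaction hash, output index)
type Slot = u64;
type Utxo = (u16, Vec<u8>); // (era, encoded output)

/// A decoded block, reduced to what the fallback needs.
struct Block {
    era: u16,
    txs: Vec<(TxHash, Vec<Vec<u8>>)>, // (tx hash, encoded outputs)
}

/// In-memory substitutes for the state, index, and archive stores.
struct Stores {
    state: HashMap<TxoRef, Utxo>,
    index: HashMap<TxHash, Slot>,
    archive: HashMap<Slot, Block>,
}

fn get_utxos(stores: &Stores, refs: &[TxoRef]) -> HashMap<TxoRef, Utxo> {
    // Stage 1: ordinary state-store lookup.
    let mut result: HashMap<TxoRef, Utxo> = refs
        .iter()
        .filter_map(|r| stores.state.get(r).map(|v| (*r, v.clone())))
        .collect();

    // Stage 2: group the refs that are still missing by transaction hash,
    // so each block is looked up at most once per transaction.
    let mut by_tx: HashMap<TxHash, Vec<TxoRef>> = HashMap::new();
    for r in refs.iter().filter(|r| !result.contains_key(*r)) {
        by_tx.entry(r.0).or_default().push(*r);
    }

    for (tx_hash, txo_refs) in by_tx {
        // Resolve tx hash -> slot -> block -> transaction; skip on any miss.
        let Some(slot) = stores.index.get(&tx_hash) else { continue };
        let Some(block) = stores.archive.get(slot) else { continue };
        let Some((_, outputs)) = block.txs.iter().find(|(h, _)| *h == tx_hash) else {
            continue;
        };
        // Copy each requested output into the result, tagged with the block era.
        for r in txo_refs {
            if let Some(output) = outputs.get(r.1 as usize) {
                result.insert(r, (block.era, output.clone()));
            }
        }
    }
    result
}
```

In this sketch a ref whose transaction is unknown to the index, or whose output index is out of range, is simply absent from the returned map.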

Supporting changes:

  • Imported HashMap and MultiEraBlock, plus the ArchiveStore and IndexStore traits, to support the new logic.

Summary by CodeRabbit

  • Bug Fixes

    • Improved UTXO lookup reliability with a multi-stage retrieval that falls back to archived blocks to reconstruct missing outputs, ensuring more complete and consistent access to past transactions.
  • Chores

    • Removed the legacy WAL-based historical input scanning path.

coderabbitai Bot commented Apr 9, 2026

📝 Walkthrough

After a state-store lookup, DomainAdapter::get_utxos now detects missing UTXOs and reconstructs them by grouping refs by transaction hash, resolving slots via indexes, fetching archived block bytes, decoding blocks, locating transactions, and inserting outputs into the result map. get_historical_utxos was removed.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| UTXO resolution & archive fallback: src/adapters/mod.rs | Reworked DomainAdapter::get_utxos: build dolos_refs, query StateStore::get_utxos, detect missing refs, group missing refs by tx-hash, resolve slot via indexes().slot_by_tx_hash, fetch block bytes via archive().get_block_by_slot, decode with MultiEraBlock::decode, locate transactions by hash, extract outputs and insert (era, encoded_output) into the returned UtxoMap. Removed get_historical_utxos (WAL-based path). |

Sequence Diagram

sequenceDiagram
    participant Caller as Caller
    participant StateStore as StateStore
    participant Indexes as Indexes
    participant Archive as Archive
    participant Decoder as BlockDecoder
    participant Result as ResultMap

    Caller->>StateStore: get_utxos(dolos_refs)
    StateStore-->>Caller: initial UtxoMap
    Caller->>Caller: compute missing refs
    alt no missing refs
        Caller-->>Result: return UtxoMap
    else missing refs exist
        Caller->>Caller: group refs by tx_hash
        loop per tx_hash
            Caller->>Indexes: slot_by_tx_hash(tx_hash)
            Indexes-->>Caller: slot
            Caller->>Archive: get_block_by_slot(slot)
            Archive-->>Caller: block_bytes
            Caller->>Decoder: MultiEraBlock::decode(block_bytes)
            Decoder-->>Caller: decoded_block (era, txs...)
            Caller->>Caller: find tx where tx.hash() == tx_hash
            Caller->>Result: insert (tx_output.index -> (block.era, output.encode()))
        end
        Caller-->>Result: return populated UtxoMap
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • scarmuega

Poem

🐇 I hopped through slots and hashed domain,

I chased the outputs down the chain,
When state was sparse and refs were thin,
I fetched the block and searched within,
Now missing UTXOs return again.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title accurately summarizes the main enhancement: adding fallback logic to retrieve missing UTXOs from the archive store when they are not found in the state store. |

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/adapters/mod.rs`:
- Around line 209-216: The current pattern using let Ok(Some(...)) =
self.indexes().slot_by_tx_hash(&tx_hash_bytes) else { continue } (and similarly
for self.archive().get_block_by_slot and MultiEraBlock::decode) collapses Err
into “not found”; instead match each call explicitly: handle Ok(Some(...)) as
before, treat Ok(None) as a genuine not-found (continue), and propagate or
return Err for any Err results so archive/index/decoding failures are not
swallowed. Update the call sites (self.indexes().slot_by_tx_hash,
self.archive().get_block_by_slot, MultiEraBlock::decode) to use match or ? to
return errors, and adjust the enclosing function’s return type if needed to
return Result<Option<...>, E> rather than silently continuing on Err.
- Around line 198-235: DomainAdapter::get_utxos currently reconstructs missing
outputs from ArchiveStore (via indexes().slot_by_tx_hash,
archive().get_block_by_slot, MultiEraBlock::decode) and returns them into the
current-state API; change this so the fallback does not expose historical/spent
outputs: either remove the ArchiveStore reconstruction entirely (leave missing
refs absent) or, if you keep reconstruction, validate each reconstructed output
against the current StateStore (use StateStore APIs / the same lookup used for
present UTXOs) to ensure the output is still unspent before inserting into
result; update get_utxos accordingly and keep get_historical_utxos as the sole
path for historical-only reads.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 316af519-02f9-4e05-9f82-2f3a43966c5a

📥 Commits

Reviewing files that changed from the base of the PR and between d90cd81 and 6734f71.

📒 Files selected for processing (1)
  • src/adapters/mod.rs

Comment thread src/adapters/mod.rs
Comment on lines +198 to +235
+        let missing: Vec<_> = refs.iter().filter(|r| !result.contains_key(r)).collect();
+        if missing.is_empty() {
+            return Some(result);
+        }

-        let some = dolos_core::StateStore::get_utxos(self.state(), refs)
-            .ok()?
-            .into_iter()
-            .map(|(k, v)| {
-                let era = v.0.try_into().expect("era out of range");
-                (k.into(), (era, v.1.clone()))
-            })
-            .collect();

-        Some(some)
+        let mut by_tx: HashMap<Vec<u8>, Vec<&pallas::interop::utxorpc::TxoRef>> = HashMap::new();
+        for txo_ref in &missing {
+            by_tx.entry(txo_ref.0.to_vec()).or_default().push(txo_ref);
+        }

+        for (tx_hash_bytes, txo_refs) in by_tx {
+            let Ok(Some(slot)) = self.indexes().slot_by_tx_hash(&tx_hash_bytes) else {
+                continue;
+            };
+            let Ok(Some(block_bytes)) = self.archive().get_block_by_slot(&slot) else {
+                continue;
+            };
+            let Ok(block) = MultiEraBlock::decode(&block_bytes) else {
+                continue;
+            };

+            let block_txs = block.txs();
+            let Some(tx) = block_txs
+                .iter()
+                .find(|tx| tx.hash().as_ref() == tx_hash_bytes.as_slice())
+            else {
+                continue;
+            };

+            let outputs = tx.outputs();
+            let era = block.era();

+            for txo_ref in txo_refs {
+                let Some(output) = outputs.get(txo_ref.1 as usize) else {
+                    continue;
+                };
+                result.insert(*txo_ref, (era, output.encode()));
+            }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== LedgerContext implementations ==\n'
rg -nP --type=rust -C3 'impl\s+pallas::interop::utxorpc::LedgerContext\s+for\s+\w+|fn\s+get_utxos\s*\('

printf '\n== Historical lookup definitions / call sites ==\n'
rg -nP --type=rust -C3 '\bfn\s+get_historical_utxos\s*\(|(?<!fn\s)\bget_historical_utxos\s*\('

printf '\n== Non-definition get_utxos call sites ==\n'
rg -nP --type=rust -C3 '(?<!fn\s)\bget_utxos\s*\('

Repository: txpipe/dolos

Length of output: 13638


🏁 Script executed:

# Check comments and context around the fallback code in DomainAdapter
sed -n '150,250p' src/adapters/mod.rs

# Look for any documentation or comments about why ArchiveStore fallback exists
rg -B5 -A5 'fn get_utxos' src/adapters/mod.rs

# Check if there are tests that verify this fallback behavior
rg -B3 -A3 'get_utxos.*ArchiveStore|ArchiveStore.*get_utxos|historical.*fallback|fallback.*historical' --type=rust

Repository: txpipe/dolos

Length of output: 3910


🏁 Script executed:

# Check git history for when this fallback was added and any commit messages
git log --oneline -p -- src/adapters/mod.rs | grep -A 20 "get_utxos" | head -100

# Look for any documentation or design notes about LedgerContext
find . -name "*.md" -o -name "*.txt" | xargs grep -l "LedgerContext\|get_utxos\|historical.*utxo" 2>/dev/null | head -5

# Check if StateStore.get_utxos is designed to handle spent outputs (i.e., does it return None for spent?)
rg -B5 -A15 'fn get_utxos.*StateStore|impl.*StateStore.*{' crates/core/src/state.rs | head -80

# Look for comments or documentation explaining StateStore behavior
rg -B5 -A10 'pub trait StateStore|// StateStore|StateStore.*current' crates/core/src/state.rs

Repository: txpipe/dolos

Length of output: 1875


🏁 Script executed:

# Search for StateStore trait definition and documentation
rg -A30 'pub trait StateStore' crates/core/src/state.rs

# Look for comments in the StateStore implementation files
rg 'StateStore.*current|current.*state|unspent|spent' crates/core/src/state.rs -B2 -A2

# Check AGENTS.md or other docs that were mentioned in learnings
cat AGENTS.md 2>/dev/null | grep -A10 -B10 "StateStore\|current.*state\|historical"

# Look for whether get_utxos is expected to return spent outputs or only current state
rg -B5 -A5 'spent|unspent' src/adapters/mod.rs crates/cardano/src/estart/loading.rs

Repository: txpipe/dolos

Length of output: 10288


Confirm: this fallback should not return historical/spent outputs.

The fallback mechanism in DomainAdapter::get_utxos violates the documented storage semantics. Per AGENTS.md, StateStore represents "current ledger state (the 'world view')" while ArchiveStore is strictly "historical block storage". The fallback reconstructs missing outputs directly from ArchiveStore without validating whether they remain in the current unspent set. A call site in estart/loading.rs (line 70) explicitly expects "outputs that are still unspent".

The separate get_historical_utxos method confirms this is an intentional API boundary. Either:

  1. Remove the ArchiveStore fallback entirely (missing outputs stay missing), or
  2. Add validation that reconstructed outputs are still present in the current StateStore before returning them

The fallback as currently written allows spent outputs to leak into the current-state API.
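
One way to picture the API boundary described here is to keep the current-state read and the historical read as two separately named entry points, so that a caller has to opt in to spent outputs. The sketch below is an assumption-laden illustration, not the project's code: `Ledger` and its two maps are hypothetical in-memory stand-ins for StateStore and ArchiveStore, and only the method names `get_utxos` and `get_historical_utxos` come from the discussion above.

```rust
use std::collections::HashMap;

// Hypothetical simplified types for illustration only.
type TxoRef = ([u8; 32], u32); // (transaction hash, output index)
type Output = Vec<u8>;

struct Ledger {
    /// Stand-in for the state store: outputs that are still unspent.
    unspent: HashMap<TxoRef, Output>,
    /// Stand-in for the archive: every output ever produced, spent or not.
    archived: HashMap<TxoRef, Output>,
}

impl Ledger {
    /// Current-state read: refs that are not in the unspent set stay absent.
    fn get_utxos(&self, refs: &[TxoRef]) -> HashMap<TxoRef, Output> {
        refs.iter()
            .filter_map(|r| self.unspent.get(r).map(|o| (*r, o.clone())))
            .collect()
    }

    /// Historical read: may return spent outputs, which is why it has its own name.
    fn get_historical_utxos(&self, refs: &[TxoRef]) -> HashMap<TxoRef, Output> {
        let mut found = self.get_utxos(refs);
        for r in refs {
            if !found.contains_key(r) {
                if let Some(o) = self.archived.get(r) {
                    found.insert(*r, o.clone());
                }
            }
        }
        found
    }
}
```

With this split, a caller that needs "outputs that are still unspent" uses `get_utxos` and never sees archive-only data, while a caller resolving inputs of past transactions uses `get_historical_utxos`.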

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/adapters/mod.rs` around lines 198 - 235, DomainAdapter::get_utxos
currently reconstructs missing outputs from ArchiveStore (via
indexes().slot_by_tx_hash, archive().get_block_by_slot, MultiEraBlock::decode)
and returns them into the current-state API; change this so the fallback does
not expose historical/spent outputs: either remove the ArchiveStore
reconstruction entirely (leave missing refs absent) or, if you keep
reconstruction, validate each reconstructed output against the current
StateStore (use StateStore APIs / the same lookup used for present UTXOs) to
ensure the output is still unspent before inserting into result; update
get_utxos accordingly and keep get_historical_utxos as the sole path for
historical-only reads.

Comment thread src/adapters/mod.rs
Comment on lines +209 to +216
+            let Ok(Some(slot)) = self.indexes().slot_by_tx_hash(&tx_hash_bytes) else {
+                continue;
+            };
+            let Ok(Some(block_bytes)) = self.archive().get_block_by_slot(&slot) else {
+                continue;
+            };
+            let Ok(block) = MultiEraBlock::decode(&block_bytes) else {
+                continue;

⚠️ Potential issue | 🟠 Major

Don’t collapse archive/index failures into “not found”.

This fallback currently returns a partial Some(result) when the index lookup, archive read, or block decode errors out. That makes backend corruption/transient outages indistinguishable from a genuine missing ref.

Proposed fix
         for (tx_hash_bytes, txo_refs) in by_tx {
-            let Ok(Some(slot)) = self.indexes().slot_by_tx_hash(&tx_hash_bytes) else {
-                continue;
-            };
-            let Ok(Some(block_bytes)) = self.archive().get_block_by_slot(&slot) else {
-                continue;
-            };
-            let Ok(block) = MultiEraBlock::decode(&block_bytes) else {
-                continue;
-            };
+            let Some(slot) = self.indexes().slot_by_tx_hash(&tx_hash_bytes).ok()? else {
+                continue;
+            };
+            let Some(block_bytes) = self.archive().get_block_by_slot(&slot).ok()? else {
+                continue;
+            };
+            let block = MultiEraBlock::decode(&block_bytes).ok()?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/adapters/mod.rs` around lines 209 - 216, The current pattern using let
Ok(Some(...)) = self.indexes().slot_by_tx_hash(&tx_hash_bytes) else { continue }
(and similarly for self.archive().get_block_by_slot and MultiEraBlock::decode)
collapses Err into “not found”; instead match each call explicitly: handle
Ok(Some(...)) as before, treat Ok(None) as a genuine not-found (continue), and
propagate or return Err for any Err results so archive/index/decoding failures
are not swallowed. Update the call sites (self.indexes().slot_by_tx_hash,
self.archive().get_block_by_slot, MultiEraBlock::decode) to use match or ? to
return errors, and adjust the enclosing function’s return type if needed to
return Result<Option<...>, E> rather than silently continuing on Err.
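
The distinction between "not found" and "lookup failed" can be illustrated with a small std-only sketch. Everything here is hypothetical: `Index`, `LookupError`, and `resolve_slots` are stand-ins invented for the example, and only the method name `slot_by_tx_hash` mirrors the call discussed above. The point is the `match` on `Result<Option<_>, _>`: `Ok(None)` continues, while `Err` is propagated with `?`.

```rust
use std::collections::HashMap;

/// Hypothetical error type for the illustration.
#[derive(Debug, PartialEq)]
enum LookupError {
    Backend(String),
}

/// Hypothetical index store; `broken` simulates a backend failure.
struct Index {
    slots: HashMap<Vec<u8>, u64>,
    broken: bool,
}

impl Index {
    fn slot_by_tx_hash(&self, tx_hash: &[u8]) -> Result<Option<u64>, LookupError> {
        if self.broken {
            return Err(LookupError::Backend("index unavailable".to_string()));
        }
        Ok(self.slots.get(tx_hash).copied())
    }
}

/// Resolves each hash to its slot. A hash unknown to the index is skipped,
/// but a backend error aborts the whole call instead of looking like a miss.
fn resolve_slots(index: &Index, hashes: &[Vec<u8>]) -> Result<HashMap<Vec<u8>, u64>, LookupError> {
    let mut resolved = HashMap::new();
    for hash in hashes {
        match index.slot_by_tx_hash(hash)? {
            Some(slot) => {
                resolved.insert(hash.clone(), slot);
            }
            None => continue, // genuine not-found
        }
    }
    Ok(resolved)
}
```

A caller can then tell an empty result (nothing indexed for those hashes) apart from an `Err` (the store could not answer), which the `let Ok(Some(..)) = .. else { continue }` form does not allow.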

nicolasLuduena changed the title from "Fix u5c domain adapter" to "Enhance DomainAdapter to retrieve missing UTXOs from archive store" on Apr 9, 2026